Removing Text Before and After dialogue.

Reply to Removing Text Before and After dialogue. on Wed, 20 Sep 2023 16:50:01 GMT

Borderless Media — Wed, 20 Sep 2023 16:50:01 GMT

I want to thank you guys for this. Yesterday I tried Peter’s code first and it worked, but there were still some things left behind. Eventually I found a way to cancel them out, so great work either way.

Terry, for some reason my Notepad++ froze when I used your method with Replace All.

Guy038, yours worked perfectly. It got rid of everything and also spaced out the lines. I intended to put them into a Word doc and realized that Word automatically spaces out lines, so it seems I put you guys through a bit more work than needed, but those codes got the job done. I won’t be needing further help on this, I believe.

Thanks again and warm regards

Reply to Removing Text Before and After dialogue. on Wed, 20 Sep 2023 08:03:04 GMT

guy038 — Wed, 20 Sep 2023 08:03:04 GMT

Hello, @borderless-media, @peterjones, @terry-R and All,

An alternative to @terry-R’s solution could be:

Open your file or select the right tab
Move to the very beginning of the file ( Ctrl + Home )
Open the Replace dialog ( Ctrl + H )
Untick all the checkbox options
SEARCH [^\x{4E00}-\x{9FFF}]+|(?-s)(.+)(?=



REPLACE ?1$0\r\n\r\n    ( or    ?1$0\n\n if you deal with Unix files )


Select the Regular expression search mode


Click on the Replace All button


Here you are !

I must admit that I initially did not think about the Chinese character range. Special thanks for that clever idea, Terry ;-))
Best Regards,
guy038
P.S. :
This regex S/R also works if a run of Chinese characters is split over several lines. After the replacement, the run is displayed on a single line again!
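As a rough illustration of the P.S. above, here is a minimal Python sketch of only the first alternative of the SEARCH expression, [^\x{4E00}-\x{9FFF}]+, replaced with nothing. The rest of the expression is cut off in this post, so the sketch does not try to reproduce it. The function name and the sample string are made up for the example.

```python
import re

# First alternative only: one or more characters outside U+4E00..U+9FFF.
# Line breaks fall outside that range, so they are removed as well.
NON_CJK = re.compile(r"[^\u4e00-\u9fff]+")

def join_split_run(text):
    """Delete every run of characters outside the range (hypothetical helper)."""
    return NON_CJK.sub("", text)

# Hypothetical sample: one run of Chinese characters broken over two lines.
sample = "你好\r\n世界"
print(join_split_run(sample))
```

Because the line break between the two halves is itself outside the range, it is deleted and the run ends up on one line. The blank lines between dialogues are not produced by this sketch; in the actual S/R they come from the part of the expression and the replacement that are not reproduced here.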



Reply to Removing Text Before and After dialogue. on Tue, 19 Sep 2023 20:58:38 GMT
Terry R — Tue, 19 Sep 2023 20:58:38 GMT
@Borderless-Media said in Removing Text Before and After dialogue.:

I want to remove everything before and after each of the Chinese dialogues so that the Chinese texts are all that remains. I’d also like for them to have 1 blank line in-between them so they are readable

I think I may have the answer. I cannot claim all the credit: I looked at an old post by @guy038 to find the hex range of Chinese characters first. I then made an assumption based on the example you provided, namely that every group of Chinese characters starts with a Chinese character and ends at the < character. That was because I noted that in the last group of Chinese characters there also appeared a space (see the raised .)
Anyway, for what it is worth, it did produce the desired result (including a blank line after each group of Chinese characters).
Using the Replace function with the search mode set to “regular expression”, we have:

Find What:(([\x{4E00}-\x{9FFF}].+?)(?=<))|.\R?

Replace with:?{1}${1}\r\n\r\n
For an explanation we have:

(([\x{4E00}-\x{9FFF}].+?)(?=<)) - find a Chinese character (at the next position), if so then continue finding characters and stop when the next one is a <.

|.\R? - this is the alternation, so if the previous part did not find a Chinese character, we grab this (one) character and any possible EOL (end of line)

?{1}${1}\r\n\r\n - in the replacement, if the first part of the find regex did find characters (group 1), we return that group of Chinese characters followed by the end-of-line twice: this adds a carriage return & line feed behind the Chinese character group and then a second carriage return & line feed. The alternate code has no parentheses around it (so no group definition), and that is because we want to consume it (destroy/delete), not return any of those characters.
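For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of the explanation above. It is an adaptation, not the same expression: Python's re module has no \R and no ?{1} conditional replacement, so the sketch uses a replacement function, [\s\S] to consume any single character (line breaks included), and [^\r\n] in place of the dot so a group stays on one line. The function name and the sample markup are assumptions made up for the example.

```python
import re

# Group 1: starts at a character in U+4E00..U+9FFF, then lazily takes more
# characters on the same line and stops just before the next "<".
# Second alternative: any single character (line breaks included), no group.
DIALOGUE = re.compile(r"([\u4e00-\u9fff][^\r\n]+?)(?=<)|[\s\S]")

def keep_dialogue(text, eol="\r\n"):
    """Keep group 1 followed by two line endings; delete everything else."""
    def replace(match):
        if match.group(1) is not None:
            return match.group(1) + eol + eol
        return ""
    return DIALOGUE.sub(replace, text)

# Hypothetical sample; the second group contains a space, as in the post.
sample = "<p begin=\"1\">你好，世界</p>\r\n<p begin=\"2\">再见 。</p>\r\n"
print(repr(keep_dialogue(sample)))
```

As in the explanation, anything matched by the ungrouped alternative is simply dropped, while punctuation and spaces inside a group survive because the group runs up to the next < rather than stopping at the first non-Chinese character.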
Terry



Reply to Removing Text Before and After dialogue. on Tue, 19 Sep 2023 18:39:39 GMT
PeterJones — Tue, 19 Sep 2023 18:39:39 GMT
@Borderless-Media said in Removing Text Before and After dialogue.:

I want to remove everything before and after each of the Chinese dialogues so that the chinese texts are all that remains. I’d also like for them to have 1 blank line in-between them so they are readable.

If you really want to delete all the tags, and just leave things that aren’t part of the tags, it’s not that hard to do with regex.  So assuming you have a backup of your data, what I would suggest is:

Delete from each < to its corresponding > (assuming no <...> pair is nested inside another; normally none is in valid XML)

FIND = (?s)<.*?>

REPLACE = \r\n

SEARCH MODE = Regular Expression

REPLACE ALL

This finds each smallest <...> pair and replaces it with a newline. This will likely leave multiple newlines between some pieces of text.


Combine multiple newlines into one:

FIND = (\r\n)+

REPLACE = \r\n (if you just want a single line break) or \r\n\r\n (if you want double-spaced lines)

SEARCH MODE = Regular Expression

REPLACE ALL

What this does do: gets rid of tags (i.e., the stuff between <...> pairs) but leaves all content.
What this does not do: verify whether the stuff that’s left is Chinese text.  If you had Russian or Arabic or Hebrew or English or … elsewhere, it would still be there after this.
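The two steps above can also be sketched in Python, for anyone who prefers to test on a copy of the data outside the editor. This is only an illustration under assumptions: the function name, the separator parameter, and the sample text are made up, and Windows (\r\n) line endings are assumed as in the steps.

```python
import re

def strip_tags(text, separator="\r\n\r\n"):
    """Two-step tag removal (hypothetical helper).

    separator="\r\n" gives single line breaks, "\r\n\r\n" double spacing.
    """
    # Step 1: replace each smallest <...> pair with a newline.
    # (?s) lets "." cross line breaks, so a tag split over lines still matches.
    without_tags = re.sub(r"(?s)<.*?>", "\r\n", text)
    # Step 2: collapse every run of newlines into the chosen separator.
    return re.sub(r"(?:\r\n)+", separator, without_tags)

# Hypothetical sample with one Chinese and one English line.
sample = "<p>你好</p>\r\n<p>Hello</p>"
print(repr(strip_tags(sample)))
```

Note that "Hello" survives in the result, which matches the caveat above: this approach removes tags only and does not check whether the remaining text is Chinese. A separator is also left at the very start and end of the text wherever a tag stood there.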