    Finding and replacing all data between two points?

    • Joey Flaig

      Hey everyone, I’m editing some game transcripts to trim out unneeded information. I’m trying to figure out how to find and replace all data between two points. The following is what the formatting looks like now. I want to be able to delete everything between </ORIGINAL N°00#> and </COMMENT N°00#>:

      <SPEAKER N°001>MAKOTO NAEGI</SPEAKER N°001>
      <ORIGINAL N°001>
      <CLT 4>The entrance is still blocked by that giant hunk of
      metal.<CLT>
      </ORIGINAL N°001>
      <JAPANESE N°001>
      <CLT 4>入口は…鉄の塊で塞がれてしまっている…<CLT>
      </JAPANESE N°001>
      <TRANSLATED N°001>

      </TRANSLATED N°001>
      <COMMENT N°001>
      </COMMENT N°001>

      <SPEAKER N°002>MAKOTO NAEGI</SPEAKER N°002>
      <ORIGINAL N°002>
      <CLT 4>This mailbox doesn’t seem important right now.<CLT>
      </ORIGINAL N°002>
      <JAPANESE N°002>
      <CLT 4>このレターケースは…
      今回の事件には関係ないよな…<CLT>
      </JAPANESE N°002>
      <TRANSLATED N°002>

      </TRANSLATED N°002>
      <COMMENT N°002>
      </COMMENT N°002>

      It’s a lot of unnecessary data that adds about 2 thousand extra pages to the document.
      I successfully did it last year but for the life of me can’t remember how, and Google isn’t helping. I had to change the N°### in the Find box for every new line of dialogue (and make sure it was the same for /ORIGINAL and /COMMENT, or it’d cut out vital information from subsequent entries). So I’d do 001, then 002, then 003, etc. The successful result looks like this:
      <SPEAKER N°038>KIYOTAKA ISHIMARU</SPEAKER N°038>
      <ORIGINAL N°038>
      Okay, then let’s split up and start investigating!

      <SPEAKER N°039>KIYOTAKA ISHIMARU</SPEAKER N°039>
      <ORIGINAL N°039>
      When you’re done, everyone meet back up at the dining
      hall and we’ll share what we found!

      Considering the massive amount of data in this document, manually searching for, highlighting, and deleting each instance of this is incredibly time-consuming and really hurts my hands. Can anyone help me figure out how to do this? I’d just be replacing the found range with either a space or nothing. (Also apologies in advance if this is hard to understand, I can’t for the life of me figure out how to word any of this!)

      • Ekopalypse

        How about using a regular expression?
        Find: (?s)(?<=</ORIGINAL N°(\d{3})>).*?(?=</COMMENT N°\1>)
        Replace with: leave empty
        You have to use the Replace All button; the Replace button doesn’t work in this case. Take it with a grain of salt and test it carefully.

        • Ekopalypse @Joey Flaig

          @Joey-Flaig

          You can use Mark All first to see what would get replaced.

          • Ekopalypse @Joey Flaig

            @Joey-Flaig

            Actually, I assume you want to keep the comment tag intact, which means that instead
            of replacing with nothing you might consider using \r\n<COMMENT N°\1>\r\n
            as the replacement. This also assumes that the line endings are Windows EOLs (CRLF).
            If they are Linux EOLs (LF) instead, then use \n<COMMENT N°\1>\n
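
For anyone scripting this outside Notepad++, the same idea could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the function name `trim_between_tags` is made up, the two delimiters are captured in groups instead of being matched with look-arounds, and the EOL style is guessed from the text itself.

```python
import re

# Hypothetical helper, not from the thread: keeps </ORIGINAL N°nnn> and
# </COMMENT N°nnn>, drops everything between them, and re-inserts the
# opening <COMMENT N°nnn> tag so the comment element stays intact.
BLOCK_RE = re.compile(r"(</ORIGINAL N°(\d{3})>).*?(</COMMENT N°\2>)", re.DOTALL)


def trim_between_tags(text):
    # Assumption: the file uses one EOL style throughout.
    eol = "\r\n" if "\r\n" in text else "\n"

    def rebuild(match):
        number = match.group(2)
        return f"{match.group(1)}{eol}<COMMENT N°{number}>{eol}{match.group(3)}"

    return BLOCK_RE.sub(rebuild, text)
```

Because the closing tag is matched with the backreference `\2`, a block whose two numbers differ is left untouched.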

            • guy038

              Hello @joey-flaig, @ekopalypse and All,

              Welcome to the Notepad++ Community !

              Of course, @ekopalypse's solutions work but, to my mind, Joey, deleting from </ORIGINAL N°###> to </COMMENT N°00#> does not seem logical !

              Because either you want to delete the whole block :

              <ORIGINAL N°###>
              <CLT 4>The entrance is still blocked by that giant hunk of
              metal.<CLT>
              </ORIGINAL N°###>
              

              Or you prefer to keep it and delete from the following line <JAPANESE N°001> to the line </COMMENT N°00#>, inclusive ?

              So, could you specify, using the sample text below, exactly which line numbers you wish to delete ?

               1  <SPEAKER N°###>MAKOTO NAEGI</SPEAKER N°###>
               2  <ORIGINAL N°###>
               3  <CLT 4>The entrance is still blocked by that giant hunk of
               4  metal.<CLT>
               5  </ORIGINAL N°###>
               6  <JAPANESE N°###>
               7  <CLT 4>入口は…鉄の塊で塞がれてしまっている…<CLT>
               8  </JAPANESE N°###>
               9  <TRANSLATED N°###>
              10  
              11  </TRANSLATED N°###>
              12  <COMMENT N°###>
              13  </COMMENT N°###>
              

              At present, I understand that you want lines 5 to 13 to be deleted, for each ### number !

              Once the range of lines to be deleted is clearly defined, the appropriate regex S/R will be easy to build ;-))

              See you later,

              Best Regards,

              guy038

              • Joey Flaig @guy038

                @guy038

                Since the game is an English localization, they included the original Japanese text. There’s also a lot of things that start with </ that I don’t need – I’m not using any style or formatting, just the English text itself. So I only want to keep lines 1 - 4 and delete lines 5 - 13 for the ‘giant hunk of metal’ dialogue. Some of the dialogue goes more or less than two lines though so it can’t just be based on line numbers, if that makes sense? For example, the ‘mailbox doesn’t seem important’ dialogue I want to delete lines 4 - 13.

                • guy038

                  Hi, @joey-flaig, @ekopalypse and All,

                  Ah, perfect, and sorry for asking for additional information ! So, finally, you do want what you described in your initial post, i.e. to delete all the lines from </ORIGINAL N°###> to </COMMENT N°###>, for each dialogue ;-))

                  Then, the right regex S/R is straightforward :

                  • Open the Replace dialog ( Ctrl + H )

                  • Type in the regex (?s-i)^\h*</ORIGINAL\x20N°(\d+)>.+?</COMMENT\x20N°\1>\R, in the Find what: zone

                  • Leave the Replace with: zone EMPTY

                  • Preferably tick the Wrap around option

                  • Choose the Regular expression search mode

                  • Click once on the Replace All button, or several times on the Replace button

                  Et voilà !


                  Notes :

                  • First, the in-line modifiers (?s-i) :

                    • Forces the regex engine to interpret any dot ( . ) as representing any single character, even EOL characters

                    • Forces the search to be processed in a case-sensitive way

                  • Then, the part ^\h*</ORIGINAL\x20N°(\d+)> looks for the string </ORIGINAL N°, in that exact case, preceded by possible horizontal blank chars ( \h* ) and followed by a non-empty run of digits, which is stored as group 1 thanks to the enclosing parentheses ( (\d+) ), and finally followed by the > character

                  • Now the last part </COMMENT\x20N°\1>\R searches for the string </COMMENT N°, in that exact case, followed by the contents of group 1 ( \1 = the number of each dialogue ), the > character and the EOL character(s) of the line ( \R )

                  • The middle part .+? represents the smallest non-empty range of any characters between the matches of the first part and the last part

                  • As the replacement zone is EMPTY, each multi-line block </ORIGINAL N°###>…</COMMENT N°###> is then deleted
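
For readers who would rather script this than click through the dialog, a rough Python equivalent is sketched below. Python's `re` module has no `\h` or `\R`, so `[ \t]*` and `\r?\n` are used as stand-ins; that substitution and the function name `strip_after_original` are assumptions of this sketch, not part of the Notepad++ recipe.

```python
import re

# Stand-ins (assumptions of this sketch): [ \t]* for \h*, \r?\n for \R.
DELETE_RE = re.compile(
    r"^[ \t]*</ORIGINAL N°(\d+)>"  # closing ORIGINAL tag, number saved as group 1
    r".+?"                         # shortest run of anything, EOLs included (DOTALL)
    r"</COMMENT N°\1>\r?\n",       # closing COMMENT tag with the same number, plus its EOL
    re.DOTALL | re.MULTILINE,
)


def strip_after_original(text):
    """Return (cleaned_text, number_of_blocks_removed)."""
    return DELETE_RE.subn("", text)
```

Python regexes are case-sensitive by default, which corresponds to the `-i` part of `(?s-i)`.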

                  Best regards,

                  guy038

                  • Joey Flaig

                    @Ekopalypse Thank you for the insight!

                    @guy038 And omg this did all 5,145 instances with one click… You just saved me HOURS (days if it weren’t for find and replace) of work! Thank you so completely much!!! Now I can process the rest of the transcript easy as pie too!!! <3

                    • Joey Flaig @guy038

                      @guy038
                      Since you’re already here, quick unrelated question - since this was formatted for a certain aesthetic, sentences cut off and continue on a new line past a certain point.
                      Like:
                      <CLT 4>The entrance is still blocked by that giant hunk of
                      metal.<CLT>
                      Would there be any way to find and replace those line breaks? I’m using some direct quotes from the transcript in my writings, and it’d save a lot of frustration to auto-format them instead of doing it manually for every single dialogue that goes on for more than a line. So it’d look like this:
                      <CLT 4> The entrance is still blocked by that giant hunk of metal. <CLT>

                      • guy038

                        Hi, @joey-flaig, @ekopalypse and All,

                        Glad to see that this S/R saved you some hours of work ;-))

                        No problem solving this new and easy challenge, Joey !


                        So, this could be expressed, in plain language, as :

                        Replace any end-of-line character(s) with a space character only if the following line :

                        • Does not start with the string <CLT 4>

                        • Does end with the string <CLT>

                        This leads to the regex solution :

                        SEARCH (?-is)\R(?=(?!<CLT 4>).+<CLT>$)

                        REPLACE \x20


                        Notes :

                        • First, the in-line modifiers (?-is) :

                          • Forces the regex engine to interpret any dot ( . ) as representing a single standard character only, NOT EOL character(s)

                          • Forces the search to be processed in a case-sensitive way

                        • Then, the \R syntax will match any EOL ( \r\n in Windows files, \n in Unix files and \r in Mac files )

                        • But only if the look-ahead condition (?=.+<CLT>$) is true, i.e. if the following line ends with the string <CLT>

                        • And only if the nested negative look-ahead (?!<CLT 4>), right after \R, is also true, i.e. the following line does not begin with the string <CLT 4>
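
As a cross-check outside Notepad++, the two conditions above can be sketched in Python. Since Python's `re` has no `\R`, an explicit EOL alternation is used, and the `$` anchor is replaced by an "EOL or end of text" check so that CRLF files behave the same; both substitutions, and the name `join_clt_continuation`, are assumptions of this sketch.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # stand-in for \R

JOIN_CLT_RE = re.compile(
    EOL
    + r"(?="                   # look ahead at the following line:
    + r"(?!<CLT 4>)"           #   it must not start with <CLT 4>
    + r"[^\r\n]+<CLT>"         #   and it must end with <CLT>
    + r"(?:" + EOL + r"|\Z))"  #   ($ written as an explicit EOL-or-end check)
)


def join_clt_continuation(text):
    # Replace each qualifying EOL with one space.
    return JOIN_CLT_RE.sub(" ", text)
```

Like the original search, this only touches the EOL directly before a line ending in `<CLT>`, so plain dialogue without the colour tags is left alone.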

                        Best Regards,

                        guy038

                        • Joey Flaig

                          @guy038
                          Oh, this helps! I forgot to clarify - stuff marked by <CLT> is just colored text to denote in-game importance or the protagonist’s internal thoughts, and most of the dialogue doesn’t actually have it; it just looks like this:

                          <SPEAKER N°004>MONOKUMA</SPEAKER N°004>
                          <ORIGINAL N°004>
                          Well then. Since you’re all giving it your best, your
                          generous headmaster will give you a little hint!

                          So your solution got some, but not all!

                          • guy038

                            Hi, @joey-flaig,

                            You said :

                            So your solution got some, but not all!

                            I don’t understand exactly what you mean ! If the <CLT 4>...........<CLT> part, multi-line or not, is totally absent, as below, my previous regex ( which I slightly changed, BTW ! ) does not match any EOL char, anyway !?

                            <SPEAKER N°001>MAKOTO NAEGI</SPEAKER N°001>
                            <ORIGINAL N°001>
                            </ORIGINAL N°001>
                            <JAPANESE N°001>
                            <CLT 4>入口は…鉄の塊で塞がれてしまっている…<CLT>
                            </JAPANESE N°001>
                            <TRANSLATED N°001>
                            
                            </TRANSLATED N°001>
                            <COMMENT N°001>
                            </COMMENT N°001>
                            

                            BR

                            guy038

                            • Joey Flaig

                              Ah, I’m sorry for the confusion! What I mean is that not all dialogue in the game is between <CLT> coding. The regex you made gets rid of the line break in the dialogue, but only when the dialogue is preceded by <CLT>. So said regex doesn’t remove the line break for all the dialogue, just the portions that are preceded by <CLT>. I’m looking to remove all dialogue line breaks. Does that make more sense? I can show you screencaps if needed!

                              • guy038

                                Hello, @joey-flaig and All,

                                Ah, OK ! If you want to wipe out any empty line, or any line containing only blank characters, there are two solutions :

                                • Use the built-in N++ commands Edit > Line Operations > Remove Empty Lines or Edit > Line Operations > Remove Empty Lines (Containing Blank characters)

                                • Use the search regex (^\R)+ or ^(\h*\R)+, respectively, and leave the Replace with: zone EMPTY
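
For comparison, the two regexes could be approximated in Python as below. This is a sketch under the assumption that the text uses LF or CRLF line endings (`\r?\n` stands in for `\R`, `[ \t]` for `\h`); the function name `remove_empty_lines` is invented for the example.

```python
import re

# Approximations of (^\R)+ and ^(\h*\R)+ with Python's re syntax.
EMPTY_LINES_RE = re.compile(r"^(?:\r?\n)+", re.MULTILINE)
BLANK_LINES_RE = re.compile(r"^(?:[ \t]*\r?\n)+", re.MULTILINE)


def remove_empty_lines(text, include_blank=False):
    # include_blank=True also removes lines holding only spaces or tabs.
    pattern = BLANK_LINES_RE if include_blank else EMPTY_LINES_RE
    return pattern.sub("", text)
```

Neither variant touches a line break that sits in the middle of a sentence, since such a break is never at the start of a line.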

                                BR
                                guy038

                                • Joey Flaig @guy038

                                  @guy038
                                  Oooh, both these solutions are really handy! They don’t quite work for what I’m looking for though, because there aren’t empty lines, but rather unnecessary line breaks in the middle of sentences. So they didn’t do anything.

                                  Like this

                                  After playing around with the Line Operations feature, I found one called Join Lines. If I select both lines 32 and 33 and hit Join Lines, it gives me the desired effect of merging them into one.

                                  Now that I know a little better what Notepad++ can do, I’ve come up with a more precise way of putting it. Here’s what I want: I’m looking for some way to automatically Join Lines in places where dialogue is more than one line (and therefore contains a line break/Return). The text that would be checked for this always starts on the line after the <ORIGINAL N°00#> line. (Otherwise it would join the coding, SPEAKER, and ORIGINAL lines with all the dialogue automatically, and that would be a pain to take apart later.) Some kind of process or find and replace would be ideal for this because there are thousands of lines I’d have to manually look through.

                                  This seems to be getting more and more complicated! Does this make more sense now?

                                  • guy038

                                    Hi @joey-flaig and All,

                                    Ah, I clearly see, now, what you’re looking for ;-)) Actually, you don’t care about the tags <CLT 4> and <CLT> at all !

                                    So, the magic regex is :

                                    SEARCH (?-si)^<ORIGINAL N°\d+>\R.+?\K\h*\R\h*(?!<)

                                    REPLACE \x20

                                    • Tick, preferably, the Wrap around option

                                    • Select the Regular expression search mode

                                    • Due to the \K syntax, you must use the Replace All button, exclusively

                                    Et voilà !


                                    Notes :

                                    • First, as usual, the in-line modifiers (?-si) :

                                      • Forces the regex engine to interpret any dot ( . ) as representing a single standard character only, NOT EOL characters, (?-s)

                                      • Forces the search to be processed in a case-sensitive way, (?-i)

                                    • Then, from the beginning of a line ( ^ ), the regex searches for the string <ORIGINAL N°, followed by some digits ( \d+ ), followed by the > symbol and then its EOL chars ( \R )

                                    • Now, on the line following the line <ORIGINAL N°###>, it looks for the shortest range of standard characters ( .+? )

                                    • At this point, the \K syntax discards everything matched so far and resets the start of the match to the current position. So, from now on, the final search looks for possible blank characters followed by EOL chars, followed, again, by possible blank chars ( \h*\R\h* )

                                    • But this match succeeds ONLY IF the negative look-ahead (?!<) is verified, i.e. if the next line does not begin with the < symbol ( => it is, simply, the continuation of the previous line's text )

                                    • In that case, the EOL characters, possibly surrounded by blank characters, are replaced with a single space character ( \x20 )

                                    Best Regards,

                                    guy038

                                    P.S. :

                                    • Searching for \h*\R\h* and replacing with \x20 ensures that the two parts of the text will always be joined by a single space character ;-))

                                    • If, between the tags <ORIGINAL N°###> and </ORIGINAL N°###>, you have more than 2 lines of text, like below :

                                    <ORIGINAL N°001>
                                    <CLT 4>The entrance
                                        is still blocked
                                            by that
                                    giant
                                       hunk of
                                    metal.<CLT>
                                    </ORIGINAL N°001>
                                    

                                    No problem ! Just click, several times, on the Replace All button, until the message Replace All: 0 occurrences were replaced. occurs ;-))

                                    And you’ll be left with :

                                    <ORIGINAL N°001>
                                    <CLT 4>The entrance is still blocked by that giant hunk of metal.<CLT>
                                    </ORIGINAL N°001>
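
The repeated Replace All clicks can be mimicked in a script with a loop. The Python sketch below is an approximation, not the Notepad++ regex itself: Python's `re` has no `\K`, so the part before it is captured and written back, `[ \t]` and `\r?\n` stand in for `\h` and `\R`, and the look-ahead additionally requires some non-blank text on the next line. The function name `join_dialogue_lines` is made up.

```python
import re

JOIN_RE = re.compile(
    r"^(<ORIGINAL N°\d+>\r?\n"  # the opening tag line ...
    r"[^\r\n]+?)"               # ... plus the text after it (kept, like the part before \K)
    r"[ \t]*\r?\n[ \t]*"        # one EOL with surrounding blanks (this is what gets replaced)
    r"(?=[^<\s])",              # the next line must hold text that does not start with <
    re.MULTILINE,
)


def join_dialogue_lines(text):
    """Join one wrapped line per block per pass; return (text, passes_needed)."""
    passes = 0
    while True:
        text, count = JOIN_RE.subn(lambda m: m.group(1) + " ", text)
        if count == 0:
            return text, passes
        passes += 1
```

Each pass joins only the first break after every `<ORIGINAL N°###>` line, which is why a loop (or several clicks) is needed for dialogue wrapped over more than two lines.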
                                    
                                    • Joey Flaig

                                      Yeah, this is what I was looking to do! This regex looks great… but it sometimes fuses the line indicating a speaker change (the ones with all the dashes that look like ----------------) with the dialogue! Any way to tweak the regex to prevent that?

                                      https://imgur.com/OhYLhBM

                                      If it matters, the speaker change lines contain 60 dashes!

                                      • guy038

                                        Hi, @joey-flaig and All,

                                        I’ve got no much time, right now ! So try the exact regex, below :

                                        SEARCH (?-si)^<ORIGINAL N°\d+>\R.+?\K\h*\R\h*+(?!<|-{5,})

                                        REPLACE \x20

                                        I’ll explain it when I’m back, in about six hours !

                                        Cheers,

                                        guy038

                                        • Alan Kilborn @guy038

                                          @guy038 said:

                                          I’ve got no much time, right now

                                          And it seems like you could have a full time job, writing this person’s regexes!

                                          • guy038

                                            Hello, @joey-flaig and All,

                                            So yesterday, my wife and I were accompanying 7-8 year old children ( our daughter’s class + 2 other classes ) on a school trip and then, in the late afternoon, I preferred to watch the USA-Spain match of the women’s world football championship ! Of course, I wanted to know who France’s opponent would be ;-))


                                            Ah ! I understood why the separation line, made up of 60 dashes, was missing in Joey’s original post ! Simply because, due to the Markdown syntax, any line of at least 3 dashes produces a thin gray horizontal rule instead ;-))

                                            So, Joey, just forget my quick previous post and let’s recap the whole process :


                                            We’ll start with this sample text of 4 blocks, between the lines of dashes. In two of them, I placed some multi-line text inside the section <CLT 4>.............<CLT>

                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°001>MAKOTO NAEGI</SPEAKER N°001>
                                            <ORIGINAL N°001>
                                            <CLT 4>Just some   
                                                
                                               
                                                 text to test
                                            my new
                                            regex !   
                                            <CLT>
                                            </ORIGINAL N°001>
                                            <JAPANESE N°001>
                                            <CLT 4>入口は…鉄の塊で塞がれてしまっている…<CLT>
                                            </JAPANESE N°001>
                                            <TRANSLATED N°001>
                                            
                                            </TRANSLATED N°001>
                                            <COMMENT N°001>
                                            </COMMENT N°001>
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°002>MAKOTO NAEGI</SPEAKER N°002>
                                            <ORIGINAL N°002>
                                            <CLT 4>This mailbox doesn’t seem important right now.<CLT>
                                            </ORIGINAL N°002>
                                            <JAPANESE N°002>
                                            <CLT 4>このレターケースは…
                                            今回の事件には関係ないよな…<CLT>
                                            </JAPANESE N°002>
                                            <TRANSLATED N°002>
                                            
                                            </TRANSLATED N°002>
                                            <COMMENT N°002>
                                            </COMMENT N°002>
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°003>MAKOTO NAEGI</SPEAKER N°003>
                                            <ORIGINAL N°003>
                                            <CLT 4>The entrance is still blocked by that giant hunk of
                                            metal.<CLT>
                                            </ORIGINAL N°003>
                                            <JAPANESE N°003>
                                            <CLT 4>入口は…鉄の塊で塞がれてしまっている…<CLT>
                                            </JAPANESE N°003>
                                            <TRANSLATED N°003>
                                            
                                            </TRANSLATED N°003>
                                            <COMMENT N°003>
                                            </COMMENT N°003>
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°004>MAKOTO NAEGI</SPEAKER N°004>
                                            <ORIGINAL N°004>
                                            <CLT 4>An other   
                                                 text to see
                                            how the
                                            regex works<CLT>
                                            </ORIGINAL N°004>
                                            <JAPANESE N°004>
                                            <CLT 4>入口は…鉄の塊で塞がれてしまっている…<CLT>
                                            </JAPANESE N°004>
                                            <TRANSLATED N°004>
                                            
                                            </TRANSLATED N°004>
                                            <COMMENT N°004>
                                            </COMMENT N°004>
                                            

                                            Using the regex S/R, below, we keep the interesting part of each block, from Joey’s point of view, of course !

                                            SEARCH (?s-i)^\h*</ORIGINAL\x20N°(\d+)>.+?</COMMENT\x20N°\1>\R

                                            REPLACE Leave EMPTY

                                            And we get :

                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°001>MAKOTO NAEGI</SPEAKER N°001>
                                            <ORIGINAL N°001>
                                            <CLT 4>Just some   
                                                
                                               
                                                 text to test
                                            my new
                                            regex !   
                                            <CLT>
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°002>MAKOTO NAEGI</SPEAKER N°002>
                                            <ORIGINAL N°002>
                                            <CLT 4>This mailbox doesn’t seem important right now.<CLT>
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°003>MAKOTO NAEGI</SPEAKER N°003>
                                            <ORIGINAL N°003>
                                            <CLT 4>The entrance is still blocked by that giant hunk of
                                            metal.<CLT>
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°004>MAKOTO NAEGI</SPEAKER N°004>
                                            <ORIGINAL N°004>
                                            <CLT 4>An other   
                                                 text to see
                                            how the
                                            regex works<CLT>
                                            

                                            Now, in order to change every multi-line section <CLT 4>.............<CLT> into a single-line one, use the following regex S/R :

                                            SEARCH (?-s)(?:(?!<|>|-).)+?\K(?:\h*\R\h*)++(?=(<CLT>)|.+)

                                            REPLACE ?1:\x20

                                            And here is your expected text :

                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°001>MAKOTO NAEGI</SPEAKER N°001>
                                            <ORIGINAL N°001>
                                            <CLT 4>Just some text to test my new regex !<CLT>
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°002>MAKOTO NAEGI</SPEAKER N°002>
                                            <ORIGINAL N°002>
                                            <CLT 4>This mailbox doesn’t seem important right now.<CLT>
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°003>MAKOTO NAEGI</SPEAKER N°003>
                                            <ORIGINAL N°003>
                                            <CLT 4>The entrance is still blocked by that giant hunk of metal.<CLT>
                                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                            <SPEAKER N°004>MAKOTO NAEGI</SPEAKER N°004>
                                            <ORIGINAL N°004>
                                            <CLT 4>An other text to see how the regex works<CLT>
                                            

                                            Notes on the 2nd regex :

                                            • First, the in-line modifier (?-s) means that the regex char . represents a single standard character only, never an EOL char !

                                            • Then, the part (?:(?!<|>|-).)+? looks for the smallest non-empty range of standard characters, different from <, > and -, in a non-capturing group, up to the next main part

                                            • This main part is \K(?:\h*\R\h*)++ which first resets the start of the match ( \K ), then searches for any non-empty run of EOL chars, possibly surrounded by blank characters, placed in a non-capturing group, without any backtracking in that part, due to the possessive quantifier ++

                                            • The final part (?=(<CLT>)|.+) is a look-ahead structure which is practically always true ! Indeed, the block of EOL chars must be followed either by the string <CLT>, stored as group 1, or by any non-empty range of standard chars

                                            • The main advantage of this odd syntax is that, if the EOL chars are followed by the specific string <CLT>, we don’t replace them with a space character but simply delete this EOL block, which is achieved with the conditional replacement ?1:\x20, meaning :

                                              • If group 1 exists, replace with the EMPTY string

                                              • If group 1 does not exist, replace with a space character
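
The conditional replacement is the part that translates least directly to other tools, so here is a hedged Python sketch of it. Python's `re` has neither `\K` nor the `?1:` replacement syntax, so the text before the break is captured as group 1 and a replacement function decides between "nothing" and "one space". As an extra guard that is not in the regex above, the look-ahead here also refuses to join onto a line starting with `<` or with five or more dashes (the `-{5,}` idea from the earlier quick fix). The name `join_wrapped_text` is invented.

```python
import re

JOIN_RE = re.compile(
    r"([^<>\-\r\n]+?)"              # group 1: text before the break (kept, stands in for \K)
    r"(?:[ \t]*\r?\n[ \t]*)+"       # one or more EOLs with surrounding blanks
    r"(?=(<CLT>)|(?!-{5,})[^<\s])"  # group 2 is set only when the closing <CLT> follows
)


def _replacement(match):
    # Mirrors ?1:\x20 : nothing before <CLT>, one space otherwise.
    glue = "" if match.group(2) else " "
    return match.group(1) + glue


def join_wrapped_text(text):
    return JOIN_RE.sub(_replacement, text)
```

A single pass is enough here, because every EOL block is matched on its own rather than only the first one after an `<ORIGINAL N°###>` line.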

                                            Best Regards,

                                            guy038
