Community
    • Login

    Replacing text from x to y

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    13 Posts 4 Posters 937 Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • guy038G
      guy038
      last edited by guy038

      Hello, @mleczny-sernik and All,

      I’ve got a nice solution which is a bit difficult to explain, because it uses a particular kind of regex, called recursive regex ! You get this kind of regex when a subroutine call to a group, with the syntax (?#), is located within the group it refers to !


      Here is the road map :

      • Open the Replace dialog ( Ctrl + H )

      • SEARCH (?-i)^\h*history=(\{(?:[^{}]++|(?1))*\})\R+

      • REPLACE Leave EMPTY

      • Tick, preferably, the Wrap around option

      • Select the Regular expression search mode

      • Click, either, once on the Replace All button or several times on the Replace button

      Voila !


      • Note that the inner part of the regex (\{(?:[^{}]++|(?1))*\}), between the outer parentheses, define a group 1 which matches, successively :

        • An opening brace character {

        • Then, in a non-capturing group (?:•••), repeated 0 or more times, matching either :

          • The largest range of characters, all different from { and }, without any possible backtracking due to its atomic quantifier ++

          • A subroutine call (?1) to group1 which is, itself, an inner block {•••••}

        • An ending brace character }


      So, this regex will match any history section IF, of course, it contains a well-balanced number of { and } delimiters, even an empty history section as history={} with its line-break chars ;-))

      Best Regards,

      guy038

      Neil SchipperN 1 Reply Last reply Reply Quote 1
      • Neil SchipperN
        Neil Schipper @Mleczny Sernik
        last edited by

        @Mleczny-Sernik Hi. There are a few things about your description that are open to interpretation.

        I have a regex solution that is very rigid, and only matches the type of whitespace, and the amount of indentation, shown in your sample.

        Since the regex is deleting several lines, I prefer, as a starting point, to delete too little rather than too much.

        I will assume the block you want deleted:

        • always starts exactly with an empty line, then (on next line) a single TAB, then the text “history={”
        • always ends exactly with: a line that contains one TAB, then text ‘}’, and nothing else

        If the files you are processing are human-typed there’s a good chance that variations in whitespace, or commenting, will cause this solution to be incomplete. If the files are machine made, chances are better the solution will meet your need.

        Find what: ^\r\n^\thistory={.*?^\t}\r\n
        Ensure the “Replace with:” entry is completely empty
        Do check the “. matches newline”

        Do not rely on it without a lot of testing.

        You may wish to enable (Np++ menu) “View - Show Symbol - Show all chars” when examining files before and after applying the regex.

        1 Reply Last reply Reply Quote 0
        • Neil SchipperN
          Neil Schipper @guy038
          last edited by

          @guy038 Hi. Your solution is far, far, far more sophisticated than anything I could have come up with.

          However, re:

          IF, of course, it contains a well-balanced number of { and } delimiters

          what if a goofball, perhaps someone named something like Neil, was the author of one of these files, and in the course of development, he left one of the files to be processed looking something like this:

          	history={
          		owner = FRA
          		add_core_of = CHA
          		victory_points = {
          			2081 1
          		}
          		//structures = {  // obsolete name! clean out after testing
          		buildings = {
          			infrastructure = 2
          
          		}
          
          	}
          
          

          This would be a problem, I believe.

          1 Reply Last reply Reply Quote 0
          • Terry RT
            Terry R
            last edited by

            @Neil-Schipper said in Replacing text from x to y:

            what if a goofball, perhaps someone named something like Neil, was the author of one of these files, and in the course of development, he left one of the files to be processed looking something like this:

            In every solution, whether it be a regex; simple to complex like @guy038 one here; through pythonscript code, through UDL’s, they ALL rely on data integrity.

            So if someone comes to the forum seeking help and shows a “sample” of their data, the solution, whatever it may be will be based on that example. We may give a caveat, such as @guy038 did here, the need for balanced delimiters. In the end it is always the OP’s responsibility to provide enough “evidence” that we can trust our solution to act appropriately on the data.

            If you were to read back through a lot of the posts in this forum, you will find however that the OP’s generally have a naive view of their data. Often the forum member(s) who are striving to help are the ones to ask the “right” questions about the data and thus jolt the OP enough that they gain a new respect for their data. Only through that process can the solution providers believe enough in their solution to provide it, albeit sometimes with a caveat.

            Often solution providers will also suggest running the solution over a copy of the data and vetting the result before fully integrating it into their workflow.

            I think your question, whilst having some merit is over thinking the process. OP’s ask for help. Solution providers may ask for additional information and/or examples. A solution is then provided, sometimes with information on how it works and what is required of the data to get valid results and it’s left to the OP to test. Hopefully the OP comes back with a “thank you” (amazing how many times we NEVER get that) so we know it solved their problem, or a gotcha so the process repeats with the additional information loaded in.

            So don’t sweat it and get too bogged down in what-ifs when helping someone. You learn to rely on judgement and sometimes in the end the solution doesn’t work through no fault of the person helping.

            Terry

            1 Reply Last reply Reply Quote 0
            • guy038G
              guy038
              last edited by guy038

              Hi, @mleczny-sernik, @neil-schipper, @terry-r and All,

              Neil, to solve the case you mention, we have 3 possibilities :

              • The first possibility is quite obvious :

                • Open the Find dialog ( Ctrl + F )

                • Type in a { or a } char in the Find what zone

                • Tick the Wrap around option

                • Click on the Count button

                => The number of { chars must be identical to the number of } chars ! If not, the program’s logic is broken and I wish you good luck to identify the missing or extra brace !

              • The second possibility is to decide that an escaped { or } char will be considered as a normal character, different from [{}].

                • => Any allowed character, inside a {•••••} section, is represented by the regex (?:\\[{}]|[^{}])

                • Then, the search regex to use becomes (?-i)^\h*history=(\{(?:(?:\\[{}]|[^{}])++|(?1))*\})\R+

                • Of course, your comment line must then be re-written as //structures = \{ // obsolete name! clean out after testing

              • The third possibility is to use the following regex which identifies the longest contiguous zone with an identical number of opening brace and ending brace character(s)

                • This magic regex is (?:[^{}]*(\{(?:[^{}]++|(?1))*\}))+[^{}]*|[^{}]+

                • Do not tick the Wrap around and perform some tests, moving the caret to different locations, in your program, which should help you, to some extent, to spot the guilty brace char !

                • Note that a simple text, without any brace char, is also matched by this regex. Logical, this text contains the same number of opening and ending brace chars : 0 ;-))

              BR

              guy038

              P.S. :

              You may test the regex (?:[^{}]*(\{(?:[^{}]++|(?1))*\}))+[^{}]*|[^{}]+ against this text, pasted in a new tab :

              {{{{ab{{{cd{{{}}}}}ef}}}}}}
              1234  567  89198765  43210x
                           0
              
              {{ab{{{{cd{{{ef{{}}}}}gh}}}}ijkl}}}}
              12  3456  789  1119876  5432    10xx
                             010
              
              {{{{{{ab{cd{ef{{{}}}}}gh}ijkl}}}mn}}}}}
              123456  7  8  91119876  5    432  10xxx
                             010
              
              {{01ab{cd{ef23gh{ij45kl}mn}op{{qr67st}uvwx}34}yz}}128956}abc
              12    3  4      5      4  3  45      4     3  2  10      x
              

              For example , move your cursor at the beginning of each line and click on the Find Next button ! This damn regex is never wrong !!

              Neil SchipperN 1 Reply Last reply Reply Quote 0
              • Neil SchipperN
                Neil Schipper @guy038
                last edited by Neil Schipper

                @guy038

                Hi Guy,

                I played with your bullet 3 regex against your sample text. It works, and it’s very cool.

                If you wished (and you do not need to prove yourself as someone who likes challenges – I for sure would not try it) you could try to make it so that on successive Find Next clicks, rather than resume from text after the entire prior match, it could crawl from where prior match started to the next open brace and go from there, allowing user to see the enclosed region shrink bit by bit… Leading to the next challenge: going left bit by bit, and watching the region grow… maybe that’s too many levels of super-cool, I dunno… and then I remember that I/we already get matching brace highlighting (maybe only for some known languages?) (I forget if it’s a native Np++ feature or provided by a plugin.)

                I tried the regex in your 2nd bullet and although it still works against the original sample code, it’s not working for me with that extra line I threw in, with the open brace but backslash-escaped. Don’t know why, and I’m not inclined to try to debug it; I’m really a noob in regex flow control.

                Your conclusion in Bullet 1 is not strictly correct (unless file were stripped of all comments) but it’s not worth sweating about.

                Neil SchipperN 1 Reply Last reply Reply Quote 0
                • Neil SchipperN
                  Neil Schipper @Neil Schipper
                  last edited by

                  @Neil-Schipper said in Replacing text from x to y:

                  it’s not working for me

                  Sorry, I should have said why: it’s matching one more level of right brace.

                  1 Reply Last reply Reply Quote 0
                  • guy038G
                    guy038
                    last edited by

                    Hi, @mleczny-sernik, @neil-schipper, @terry-r and All,

                    Ah…Ok ! So, I improve this magic regex to this version :

                    (?:[^{}]*(\{(?:[^{}]++|(?1))*\}))+[^{}]*|[^{}]+|}+     I just added the alternative |}+ at the end of the regex

                    Again, test it against the text below :

                    • Move your caret/cursor right before each ¤ char, first

                    • Note that I indicated, with the ^ character, the char changed in the 5 following lines ¤{{{{ab{{{cd....., in comparison with the first one

                    • Right above each line ¤{{{{ab{{{cd..... :

                      • Any range -.....- represents a match of the main part (?:[^{}]*(\{(?:[^{}]++|(?1))*\}))+[^{}]*|[^{}]+

                      • Any range b.....b represents a match of the final part {+

                    ---------------------------b----------------------------------bb------------------------------------bbb--------------------------------------------------------b---
                    ¤{{{{ab{{{cd{{{}}}}}ef}}}}}}{{ab{{{{cd{{{ef{{}}}}}gh}}}}ijkl}}}}{{{{{{ab{cd{ef{{{}}}}}gh}ijkl}}}mn}}}}}{{01ab{cd{ef23gh{ij45kl}mn}op{{qr67st}uvwx}34}yz}}128956}abc
                    
                    
                    ---------------------------------------------------------------b------------------------------------bbb--------------------------------------------------------b---
                    ¤{{{{ab{{{cd{{{}}}}}ef}}}}}{{{ab{{{{cd{{{ef{{}}}}}gh}}}}ijkl}}}}{{{{{{ab{cd{ef{{{}}}}}gh}ijkl}}}mn}}}}}{{01ab{cd{ef23gh{ij45kl}mn}op{{qr67st}uvwx}34}yz}}128956}abc
                                               ^
                    
                    ---------------------------b------------------------------------------------------------------------bbb--------------------------------------------------------b---
                    ¤{{{{ab{{{cd{{{}}}}}ef}}}}}}{{ab{{{{cd{{{ef{{}{}}}gh}}}}ijkl}}}}{{{{{{ab{cd{ef{{{}}}}}gh}ijkl}}}mn}}}}}{{01ab{cd{ef23gh{ij45kl}mn}op{{qr67st}uvwx}34}yz}}128956}abc
                                                                  ^
                    
                    ---------------------------b----------------------------------bb----------------------------------bbbbb--------------------------------------------------------b---
                    ¤{{{{ab{{{cd{{{}}}}}ef}}}}}}{{ab{{{{cd{{{ef{{}}}}}gh}}}}ijkl}}}}{{{}{{ab{cd{ef{{{}}}}}gh}ijkl}}}mn}}}}}{{01ab{cd{ef23gh{ij45kl}mn}op{{qr67st}uvwx}34}yz}}128956}abc
                                                                                       ^
                    
                    ---------------------------b----------------------------------bb------------------------------------bbb------------------------------------------------bb------b---
                    ¤{{{{ab{{{cd{{{}}}}}ef}}}}}}{{ab{{{{cd{{{ef{{}}}}}gh}}}}ijkl}}}}{{{{{{ab{cd{ef{{{}}}}}gh}ijkl}}}mn}}}}}{{01ab{cd}ef23gh{ij45kl}mn}op{{qr67st}uvwx}34}yz}}128956}abc
                                                                                                                                    ^
                    
                    ---------------------------------------------------------------------------------------------------bbbb--------------------------------------------------------b---
                    ¤{{{{ab{{{cd{{{}}}}}ef}}}}}{{{ab{{{{cd{{{ef{{}{}}}gh}}}}ijkl}}}}{{{}{{ab{cd{ef{{{}}}}}gh}ijkl}}}mn}}}}}{{01ab{cd}ef23gh{ij45kl}mn}op{{qr67st{uvwx}34}yz}}128956}abc
                                               ^                  ^                    ^                                            ^                           ^
                    

                    In this last example, the regex automatically advances to the next well-balanced range of characters ! I note, with an lower-case letter s ( for skipped ), the character(s) skipped :

                    ---------------------------------------------------------------------------------------------------bb---bbs--------------------------ss--ssss--------------s------
                    ¤{{{{ab{{{cd{{{}}}}}ef}}}}}{{{ab{{{{cd{{{ef{{}{}}}gh}}}}ijkl}}}}{{{}{{ab{cd{ef{{{}}}}}gh}ijkl}}}mn}}}abc}}{{01ab{cd}ef23gh{ijkl}mn}op{{12{{{{34{}{qr}st{uv}{{34}yz
                    

                    Now, regarding the second bullet of my previous post (?-i)^\h*history=(\{(?:(?:\\[{}]|[^{}])++|(?1))*\})\R+, if I escape the opening brace, in you comment line, as a convention which changes the brace as an ordinary character, giving this history section :

                    history={
                    		owner = FRA
                    		add_core_of = CHA
                    		victory_points = {
                    			2081 1
                    		}
                    		//structures = \{  // obsolete name! clean out after testing
                    		buildings = {
                    			infrastructure = 2
                    
                    		}
                    
                    	}
                    

                    The regex does match all the section ! So could you show me an example where the regex fails to match this history block ?


                    Finally, my answer, in bullet 1 was, indeed, rather vague and I suppose that most IDE has tools to identify and correct the unmatched blocks of programming languages ! But running first my simple work-around means necessarily an error when numbers are different, isn’t it ?

                    Best Regards,

                    guy038

                    Neil SchipperN 1 Reply Last reply Reply Quote 0
                    • Mleczny SernikM
                      Mleczny Sernik
                      last edited by

                      Thanks everyone for help :) It worked

                      1 Reply Last reply Reply Quote 0
                      • Neil SchipperN
                        Neil Schipper @guy038
                        last edited by

                        @guy038 said in Replacing text from x to y:

                        The regex does match all the section ! So could you show me an example where the regex fails to match this history block ?

                        I’m seeing it match past end of state’s } (includes all trailing newlines) rather than past history’s as desired:
                        cacd9520-ecb5-439d-9db2-dca4f4b6db26-image.png
                        v7.9.5 64-bit

                        1 Reply Last reply Reply Quote 1
                        • guy038G
                          guy038
                          last edited by guy038

                          @neil-schipper,

                          Ah, yes ! I needed to indicate two consecutive \ characters in the regex ( in order to search for a literal \ char ! )

                          So the correct regex is rather :

                          (?-i)^\h*history=(\{(?:(?:\\\[{}]|[^{}])++|(?1))*\})\R+

                          BR

                          guy038

                          Neil SchipperN 1 Reply Last reply Reply Quote 0
                          • Neil SchipperN
                            Neil Schipper @guy038
                            last edited by

                            @guy038 said in Replacing text from x to y:

                            So the correct regex

                            Confirmed. In hindsight I should’ve spotted it myself – every instance of backslash-leftbracket on this site should be treat with suspicion and caution.

                            1 Reply Last reply Reply Quote 1
                            • First post
                              Last post
                            The Community of users of the Notepad++ text editor.
                            Powered by NodeBB | Contributors