Community
    • Login

    Changing Data inside XML element

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    27 Posts 5 Posters 5.3k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • guy038G
      guy038
      last edited by guy038

      Hello, @matthew-allshorn and All,

      First, because of a wrong Markdown syntax, some part of your post was not displayed ! So, allow me to show the real contents of your post !

      Matthew said :

      Have this as an example:

      <ProductName><![CDATA[Product Name Could be anything here L/XL]]></ProductName>
      <ProductName><![CDATA[Product Name Could be anything here S/XXL]]></ProductName>

      Have 7000 products, around 500 of them have “/” in the product names, which is coursing an issue in another process. So want to ignore anything before and after the “/” inside square brackets and only replace the “/” for the <ProductName>.

      How can I do a find and replace on the “/” to a “-” so that it will not course me issues in another process, please?


      Now, I’ve got a regex solution :

      SEARCH (<!\\[CDATA\\[|\G)[^]\r\n]*?\K/

      REPLACE \x2d


      Notes :

      • This regex may change several slash / characters, present in a single <![CDATA[ .............. ]]> area

      • This regex keeps the slash char(s) when found outside a <![CDATA[ .............. ]]> area

      Test it against that text :

      <ProductName><![CDATA[Product/Name Could be/anything/here S/XXL]]></ProductName><p>Test/Test / Test</p>
      

      Best Regards,

      guy038

      1 Reply Last reply Reply Quote 1
      • Alan KilbornA
        Alan Kilborn
        last edited by Alan Kilborn

        I think the regexes shown above are victims of this site not being able to slow backslash followed by [ correctly!

        Also, is there a reason the technique shown HERE is not employed to solve this one?

        astrosofistaA 1 Reply Last reply Reply Quote 1
        • guy038G
          guy038
          last edited by guy038

          Hello, @matthew-allshorn, @alan-kilborn and All,

          @Alan-kilborn :

          BUt I did use this method ! I just forgot the (?!\A) look-head in front of \G !

          Remember my notations :

          • BR = <!\\[CDATA\\[

          • ER = [^]\r\n]*? which, implicitly, define the area where to search for a slash / character. indeed, the regex searches for the shortest range of characters, different from a closing square bracket ] and EOL chars, till a / character. So the area, where to search for a /, begins right after the literal <![CDATA[ string and ends right before the ]]> string

          • SR = /

          • RR = \x2d


          @matthew-allshorn

          So, the safe method is :

          • Move the caret to the very beginning of current file ( Ctrl + Home )

          • Open the Replace dialog ( Ctrl + H )

            • SEARCH (?-i)(<!\\[CDATA\\[|(?!\A)\G)[^]\r\n]*?\K/

            • REPLACE \x2d

            • Select the Regular expression search mode

            • Click once on the Replace All mode


          @Alan-kilborn :

          May be, the notation ER, for Excluded Regex, has not the appropriate name. Globally, this regex, built with a negative class char [^...] or with a negative look-ahead syntax ((?!•••••••).)*?, defines , at the same time :

          • A zone, beginning with BR, where the search of SR is possible

          • And, implicitly, any other areas, where the search of SR is strictly forbidden

          As soon as a char of the area is not valid, the regex must skip one or more characters to, possibly, match an other SR string, But this possibility is not allowed because of the \G assertion, which forces consecutive matches. So, this overall syntax guaranties that a new match attempt of SR should occur after a next BR string !

          Best Regards,

          guy038

          Alan KilbornA 1 Reply Last reply Reply Quote 1
          • Alan KilbornA
            Alan Kilborn @guy038
            last edited by Alan Kilborn

            @guy038 said in Changing Data inside XML element:

            But I did use this method

            I guess what I meant is, why not point back to the generic solution, during solving it specifically?
            That way people that want to learn have an opportunity, and perhaps could apply the general techniques themselves to their future problems. Maybe I’m too hopeful. :-)

            But for me, I’m always reading postings here trying to pick off new techniques. If I had seen that this one was solved using a technique I already knew about, I’d save myself some time and stop reading. :-)


            May be, the notation ER, for Excluded Regex, has not the appropriate name…

            I like thinking of it as “end Exclusion Region” :-)


            Also, did you see my earlier statement?:

            I think the regexes shown above are victims of this site not being able to slow backslash followed by [ correctly!

            Because without proper escaping of the [, this isn’t going to be a valid regex:

            8427b9ea-efe8-45b1-a8ec-dc121df6f821-image.png

            Alan KilbornA 1 Reply Last reply Reply Quote 2
            • Alan KilbornA
              Alan Kilborn @Alan Kilborn
              last edited by Alan Kilborn

              @Alan-Kilborn said in Changing Data inside XML element:

              Because without proper escaping of the [, this isn’t going to be a valid regex

              Due to site problems with posting backslash-then-[, maybe best to do \Q[\E instead!

              1 Reply Last reply Reply Quote 0
              • guy038G
                guy038
                last edited by

                Hello, @matthew-allshorn, @alan-kilborn and All,

                Oh, my God ! Thanks, Alan, I’ve just updated my two previous posts, in order to get the right \\[ syntax !

                Now, Alan, your idea of ER for Excluded Region, with the same letters, is really clever :-)) And I think the BR notation could also refer to the Beginning Region. Do your agree to this idea ?

                We could also choose the SR notation for Start Region, but in this case, we need to change the old SR by SD ( Search Definition ) and the old RR by RD ( Replacement Definition !

                Cheers,

                guy038

                Alan KilbornA 1 Reply Last reply Reply Quote 2
                • Alan KilbornA
                  Alan Kilborn @guy038
                  last edited by

                  @guy038 said in Changing Data inside XML element:

                  Now, Alan, your idea of ER for Excluded Region, with the same letters, is really clever :-)) And I think the BR notation could also refer to the Beginning Region. Do your agree to this idea ?
                  We could also choose the SR notation for Start Region, but in this case, we need to change the old SR by SD ( Search Definition ) and the old RR by RD ( Replacement Definition !

                  I’m always agreeable to nice patterns and things that help make something easier to understand. :-)

                  1 Reply Last reply Reply Quote 0
                  • guy038G
                    guy038
                    last edited by

                    Hi, @alan-kilborn,

                    So, Alan, do you think these notations acceptable and my formulation clear enough ?


                    • Let SD (Search Definition ) be the regex which defines the char, string or expression to be searched

                    • Let RD (Replacement Definition ) be the regex which defines the char, string or expression which must replace the SD expression

                    • Let SR ( Start Region ) be the regex which defines the start of the area where the search for SD, must begin

                    • Let ER ( Excluded Region) be the regex which defines, implicitly, the area where the search for SD, must stop

                    Then, the generic regex can be expressed :

                    SEARCH (?s)(?:SR|(?!\A)\G)(?:(?!ER).)*?\K(?:SD) OR (?-s)(?:BR|(?!\A)\G)(?:(?!ER).)*?\K(?:SD)

                    REPLACE RD


                    Important :

                    • You must use, at least, the v7.9.1 N++ release, so that the \A assertion is correctly handled

                    • You must, move the caret at the very beginning of current file ( Ctrl + Home ), in case of a Replace All operation

                    BR

                    guy038

                    P.S. :

                    Thereafter, if we agreed, I’ll try to get all of my posts, speaking of these *notations and will update them, accordingly !

                    Alan KilbornA 2 Replies Last reply Reply Quote 1
                    • Alan KilbornA
                      Alan Kilborn @guy038
                      last edited by

                      @guy038

                      I think I misspoke before with my “end Exclusion Region” phraseology.
                      I really meant (I think) to not use the word “exclusion”.
                      Sure, that’s what the functionality in the overall regex is, but from a user instruction perspective, it confuses. User is better to think in terms of start and end rather than start and exclusion.

                      How about:

                      • BSR (Begin Search-region Regex)
                      • ESR (End Search-region Regex)
                      • FR (Find Regex)
                      • RR (Replace Regex)

                      Of course, in the end, totally up to you.

                      1 Reply Last reply Reply Quote 0
                      • astrosofistaA
                        astrosofista @Alan Kilborn
                        last edited by

                        @Alan-Kilborn said in Changing Data inside XML element:

                        I think the regexes shown above are victims of this site not being able to slow backslash followed by [ correctly!

                        Not in my post, I used the \Q \E escape sequence to avoid that.

                        Cheers

                        1 Reply Last reply Reply Quote 1
                        • Alan KilbornA
                          Alan Kilborn @guy038
                          last edited by Alan Kilborn

                          @guy038

                          Also, can we encapsulate the (?s) and (?-s) difference somehow, so you don’t have to publish TWO regexes that differ only in that leading element?

                          Actually, shouldn’t it be up to the user to use (?s) or (?-s) as appropriate in their SR, ER, etc expressions?

                          You have one use of . in the generic expression, perhaps it should have its own (?s) or (?-s) specifier that only relates to it?

                          I suppose this gets even more complicated, but I’ll leave it to a “big regex thinker” like you. :-)

                          @astrosofista said :

                          Not in my post, I used the \Q \E escape sequence to avoid that.

                          Right, I was of course meaning the others.
                          But, it is probably bad to modify the regex you want to write, just so that it meets some criterion of the site it is posted on!

                          1 Reply Last reply Reply Quote 1
                          • guy038G
                            guy038
                            last edited by guy038

                            Hello, @matthew-allshorn, @alan-kilborn and All,

                            Back to my @matthew-allshorn’s search regex :

                            SEARCH (?-i)(<!\\[CDATA\\[|(?!\A)\G)[^]\r\n]*?\K/

                            I could have slightly modified it, as Regex_A, below which, in turn, could also be expressed, according to the generic syntax, as Regex_B

                            Regex_A :   (?x-i)  (?:  <!\\[CDATA\\[  |  (?!\A)\G)      [^]]      *?  \K  /
                            
                            Regex_B :   (?x-i)  (?:  <!\\[CDATA\\[  |  (?!\A)\G)  (?: (?!]).)   *?  \K  /
                            
                            

                            Now, Alan, I fully endorse your new abbreviations ! So :

                            • Let FR (Find Regex ) be the regex which defines the char, string or expression to be searched

                            • Let RR (Replacement Regex ) be the regex which defines the char, string or expression which must replace the FR expression

                            • Let BSR ( Begin Search-region Regex ) be the regex which defines the beginning of the area where the search for FR, must start

                            • Let ESR ( End Search-region Regex) be the regex which defines, implicitly, the area where the search for FR, must end

                            Then, the generic regex can be expressed :

                            SEARCH (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)

                            REPLACE RR


                            Important :

                            • You must use, at least, the v7.9.1 N++ release, so that the \A assertion is correctly handled

                            • You must, move the caret at the very beginning of current file ( Ctrl + Home ), if you plane a Replace All operation

                            • If all instances of FR, to match, are located in the same line as BSR, use (?-s) ( instead of (?s) ), in the appropriate non-capturing group

                            • If the BSR expression to match is independent from case, use (?i) ( instead of (?-i) ), in the appropriate non-capturing group

                            • If the FR expression to match is independent from case, use (?i) ( instead of (?-i) ), in the appropriate non-capturing group.

                            • Depending of the FR contents, when case is not concerned, you may, either :

                              • Omit its non-capturing group and, if necessary, use the $0 syntax, in the replacement regex RR

                              • Define a group for some part of FR, which will be re-used in the replacement regex RR


                            Alan, do you like it that way ?

                            Best Regards,

                            guy038

                            Alan KilbornA 2 Replies Last reply Reply Quote 1
                            • Alan KilbornA
                              Alan Kilborn @guy038
                              last edited by

                              @guy038 said in Changing Data inside XML element:

                              Alan, do you like it that way ?

                              I do have some suggestions; will post back after I have some more time to consider it.

                              1 Reply Last reply Reply Quote 2
                              • Alan KilbornA
                                Alan Kilborn @guy038
                                last edited by Alan Kilborn

                                @guy038 said in Changing Data inside XML element:

                                (?-i:BSR|(?!\A)\G)(?s:(?!ESR).)*?\K(?-i:FR)

                                So I experimented a bit, and I found that it could be judged a bit “fragile”, from its higher-level intent. Example:

                                overall regex:

                                (?x)    (?-i:    (?s:B.*?S.*?R)    |(?!\A)\G)    (?s:(?!E.*?S.*?R).)    *?\K    (?s-i:F.*?R)
                                

                                data set 1:

                                B
                                S
                                R
                                
                                FR
                                
                                F
                                R
                                
                                E
                                S
                                R
                                
                                FR
                                

                                result 1 (intended):

                                622a2226-481c-4854-b26b-a47c48743997-image.png

                                data set 2:

                                B
                                S
                                R
                                
                                FR
                                
                                F
                                
                                
                                E
                                S
                                R
                                
                                FR
                                
                                

                                result 2 (unintended):

                                4c2ac72f-10c3-43d3-b8f7-9e0c14f3fa2f-image.png

                                I certainly understand the reason for failure of result 2.

                                1 Reply Last reply Reply Quote 1
                                • guy038G
                                  guy038
                                  last edited by guy038

                                  @Alan-kilborn and All,

                                  Alan, you’re cheating a bit ! Let’s me explain :


                                  I’ll use this sample text :

                                  B
                                  S
                                  R
                                  FR
                                  
                                  F
                                  R
                                  
                                  F
                                  
                                  F
                                  
                                  
                                  R
                                  
                                  
                                  R
                                  
                                  F
                                  
                                  FR
                                  E
                                  S
                                  R
                                  FR
                                  
                                  F
                                  R
                                  
                                  F
                                  
                                  F
                                  
                                  
                                  R
                                  
                                  
                                  R
                                  
                                  F
                                  
                                  FR
                                  
                                  FR
                                  

                                  Your FR regex is (?s-i:F.*?R) which means : “Search for the shortest range, possibly empty, of any char, EOL included, between a F letter and a R letter, with that case”

                                  So, using only the regex (?s-i:F.*?R), it matches, in the BSR •••••••• ESR area :

                                  • The string FR, line 9

                                  • The string FCRLFR, lines 11 and 12

                                  • The string FCRLFCRLFFCRLFCRLFCRLFR, between the lines 14 and 19

                                  • The string FCRLFCRLFFR, between the lines 24 and 26


                                  Now, let’s consider this other FR regex (?s-i:F\R*R). This regex searches for a F, with that case, followed with a range of consecutive EOL chars, possibly empty, till a R letter, with that case. I think this definition is close to what we expect to :

                                  Using this new version (?s-i:F\R*R), it matches, in the BSR •••••••• ESR area :

                                  • The string FR, line 9

                                  • The string FCRLFR, lines 11 and 12

                                  • The string FCRLFCRLFCRLFR, between the lines 16 and 19

                                  • The string FR, line 26


                                  So, if we use this overall regex ( only the FR part have been changed ) :

                                  (?x)    (?-i:    (?s:B.*?S.*?R)    |  (?!\A)\G)    (?s:(?!E.*?S.*?R).)    *?\K    (?s-i:F\R*R)
                                  

                                  it correctly matches and marks :

                                  • The string FR, line 9, right after the BSR

                                  • The string FCRLFR, lines 11 and 12

                                  • The string FCRLFCRLFCRLFR, between the lines 16 and 19

                                  • The string FR, line 26, right before the ESR

                                  • And every other match of the simple regex (?s-i:F\R*R), located after the ESR, are discarded, as expected ;-))

                                  Refer the picture, below, where I’m using the regex in line 3 :

                                  12be4451-8fc2-4639-9073-8e7b3e970033-image.png


                                  In summary, this generic regex seems quite robust !!

                                  Best Regards,

                                  guy038

                                  1 Reply Last reply Reply Quote 1
                                  • Alan KilbornA
                                    Alan Kilborn
                                    last edited by

                                    So the part that bothers me (slightly) is that as a user, I want to think of specifying the search region and the find regexes as independent entities.
                                    Maybe another way of saying it is that I’d like the search region to have precedence over the find.
                                    In my example 2 this doesn’t happen, and maybe there is no way to have it happen.

                                    With this, I run the risk of writing a bad replacement, that could be data dependent.
                                    Meaning that I could craft something that works well for a few cases at the top of a file where I’m testing it, but perhaps not all cases farther down in the file (or worse, multiple files).
                                    So I do my “few cases” analysis, deem it working, and then tell it to blast away at the rest of the file(s), and find out some time later that I’ve corrupted my data with errant replacements.

                                    Normally, in such a risky situation – if I know/suspect it to be so – I could tell myself “Don’t use this technique in a Replace All situation, just do a Replace (with implied Find Next) so that you can verify each replacement.” But… this technique uses \K and individual-replace doesn’t work (certainly not your fault @guy038 – are the devs ever going to make it work?).

                                    Perhaps there are just always going to be such risks with certain techniques.

                                    1 Reply Last reply Reply Quote 2
                                    • guy038G
                                      guy038
                                      last edited by guy038

                                      Hi, @alan-kilborn and All,

                                      I tried to study the \K behavior, in case of a simple replacement with the Replace button and, unfortunately, I cannot deduce a general rule for that assertion :-((

                                      • Open a new tab and paste the simple expression abgcd ef ghi j

                                      • Select one line of the list , below, containing a regex

                                      • Open the Replace dialog( Ctrl + H )

                                      • For all the examples, the Replace with field contains only the @ character

                                      • Click once on the Find Next button

                                      • Then, click several times on the Replace button … till no replacement occurs

                                      =>

                                      Results, after :
                                      
                                          - A "CTRL + HOME" operation
                                      
                                          - SELECTION of ONE regex or LINE, below
                                      
                                      	- A "CTRL + H" operation
                                      
                                          - ONE click on the "FIND NEXT" button
                                      
                                          - SEVERAL clicks on the "REPLACE" button  ( NOT the "REPLACE ALL" button ! )
                                      
                                      against the string  "abgcd ef ghi j", PASTED in a NEW tab :
                                      
                                      
                                      (?x-s).+?\K\x20           # KO  ( 3 SPACE chars NOT changed )
                                      (?x-s).*?\K\x20           # OK  ( 3 SPACE chars CHANGED into @ )
                                      
                                      (?x-is)(ab.*?\Kg|.*?\Kp)  # KO  ( TWO "g" NOT changed )
                                      (?x-is)(ab.*?\Kg|.*?\Kh)  # KO  ( TWO "g" NOT changed, "h" CHANGED into @ only  )
                                      (?x-is)(ab.*?\Kg|.*?\Kg)  # OK  ( TWO "g" CHANGED into @ )
                                      
                                      (?x-is)(.*?\Kg|.*?\Kp)    # OK  ( TWO "g" CHANGED  into @ )
                                      (?x-is)(.*?\Kg|.*?\Kh)    # OK  ( TWO "g" and "h" CHANGED into @  )
                                      (?x-is)(.*?\Kg|.*?\Kg)    # OK  ( Two "g" CHANGED into @ )
                                      
                                      (?x-is)a.*?\Kg            # KO  ( TWO "g" NOT changed )
                                      (?x-is)\l.*?\Kg           # KO  ( TWO "g" NOT changed )
                                      
                                      (?x-is).*?\Kg             # OK  ( TWO "g" CHANGED into @ )
                                      (?x-is)(ab|)*?\Kg         # OK  ( TWO "g" CHANGED into @ )
                                      (?x-is)(ab|\G)*?\Kg       # OK  ( TWO "g" CHANGED into @ )
                                      

                                      Although the \K behavior is rather difficult to interpret, it seems, however, from the last example (?x-is)(ab|\G)*?\Kg, that the generic regex is compatible with several “step by step” replacements, using the Replace button only. So :

                                      • Paste the text, below, in a new tab :
                                      F
                                      
                                      R
                                      
                                      FR
                                      B
                                      S
                                      R
                                      FR
                                      
                                      F
                                      R
                                      
                                      F
                                      
                                      F
                                      
                                      
                                      R
                                      
                                      
                                      R
                                      
                                      F
                                      
                                      FR
                                      E
                                      S
                                      R
                                      FR
                                      
                                      F
                                      R
                                      
                                      F
                                      
                                      F
                                      
                                      
                                      R
                                      
                                      
                                      R
                                      
                                      F
                                      
                                      FR
                                      
                                      FR
                                      
                                      • Move back to the very beginning ( Ctrl + Home )

                                      • Open the Replace dialog ( Ctrl + H )

                                      SEARCH    (?x)  (?: (?s-i:B.*?S.*?R)  |  (?!\A)\G )  (?s-i:(?!E.*?S.*?R).)*?  \K  (?s-i:F\R*R)   #  With my FR version
                                      
                                      REPLACE   @
                                      
                                      • Click once on the Find Next button ( it should select the first FR, after “BSR” )

                                      • Then, click four times on the Replace button

                                      => One at a time, the selected matched zone is replaced with the @ sign ;-))

                                      and you should get that picture :

                                      f6b5af75-fa54-47d5-8b17-d294c1d8a33b-image.png


                                      Now, Alan, it quite possible to avoid the \K assertion, with that syntax :

                                      SEARCH    (?x)  (  (?: (?s-i:B.*?S.*?R)  |  (?!\A)\G )  (?s-i:(?!E.*?S.*?R).)*?  )  (?s-i:F\R*R)  #  With my FR version
                                      
                                      REPLACE   \1@
                                      

                                      Remark that the replacement is \1@ ( instead of @ ). After running this S/R, in the same conditions as above, we get similar results ;-)

                                      However, note that, if we just search for the FR or try to mark the FR zones, that second syntax, without \K, is quite annoying and not really exact, as it does select FR, but it also selects :

                                      • The BSR part and anything till FR

                                      • Anything since last match till FR


                                      Thirdly, Alan, regarding the precedence of the region to search for, over the effective FR match :

                                      As your FR regex (?s-i:F.*?R) contains a range of any char, EOL included, you need to restrict this area to an area not containing the ESR, too ! Thus, this fact changes your regex to :

                                      SEARCH    (?x)  (?: (?s-i:B.*?S.*?R)  |  (?!\A)\G )  (?s-i:(?!E.*?S.*?R).)*?  \K  (?s-i:F((?!E.*?S.*?R).)*?R)   #  With your FR version
                                                                                                                        <--- Find Regex---- FR --->
                                      REPLACE   @
                                      

                                      Of course, your regex does not match exactly like my FR regex ! But the matches are correct and correctly limited in the BSR •••• ESR area. Moreover, the step by step replacement is still effective and give the picture below :

                                      84b156af-2571-49ac-a03f-405b9cc89009-image.png

                                      Best Regards,

                                      guy038

                                      P.S. :

                                      An other example :

                                      Let’s consider this text :

                                      /
                                      
                                      /
                                      
                                      The licenses/for most {software are/designed to/take away your} freedom/to share/and change it.
                                      
                                      {By contrast, the/GNU General Public/License is } intended to/guarantee/your {freedom/to share and/change} free/software
                                      
                                      This General/Public License/applies to {most of/the Free Software/Foundation's software}
                                      
                                      {/}
                                      
                                      /
                                      /
                                      
                                      • Paste it in a new tab

                                      • Move back to the beginning ( Ctrl + Home )

                                      Now, let’s suppose that we want to change any / character into a @ character, but ONLY in areas delimited by curly braces { •••• }. Note that, in our sample, the second line contains two { ••••• } zones. So :

                                      • FR = /

                                      • RR = @

                                      • BSR = \{

                                      • ESR = \}

                                      And gives this overall S/R :

                                      SEARCH    (?x-si) (?:  \{  |  (?!\A)\G  )  (?: (?!\}). )*?  \K  /
                                      
                                      REPLACE   @
                                      

                                      Remark that the part (?:(?!ESR).)*? , implicitly, defines an area which does not contain any } character till the next / char to search for. In other words, it tries to find a / before a possible further } character !

                                      => After 1 click on the Find Next button and 9 clicks on the Replace button, you’ll see that all occurrences of /, which are included in areas between curly braces ONLY, have been replaced, as expected, with the @ symbol ;-))

                                      Alan KilbornA 1 Reply Last reply Reply Quote 1
                                      • Alan KilbornA
                                        Alan Kilborn @guy038
                                        last edited by Alan Kilborn

                                        @guy038

                                        So I was intrigued by your \K results, but let me confine my response to just this small part of your posting:

                                        (?x-s).+?\K\x20           # KO  ( 3 SPACE chars NOT changed )
                                        (?x-s).*?\K\x20           # OK  ( 3 SPACE chars CHANGED into @ )
                                        
                                        abgcd ef ghi j
                                                  1111  <- tens, and
                                        01234567890123  <- ones position in doc
                                        

                                        But rather than looking at it as a user and regex guru as you did, I cheated – because I’m not the same guru – and looked at the N++ source code.

                                        So, what Notepad++ does, when you click on the Replace button is, it runs its Find Next code search (transparently) and sets a variable called “nextFind”. This search starts at the minimum position of the current selection (or just the caret position if no selection is active) and proceeds toward higher positions in the doc.

                                        If what comes out of that search is a match of the same selected text position range as you began with, then the replacement is made.

                                        If what results from the Find Next search is some different text selected (or you had nothing selected originally), the replace is not done, but the new text selection is now the nearest match higher in the doc (presumably convenient for your next press of Replace!).

                                        On actual replacement, it then moves the selection to the following match it finds (meaning that internally it runs another Find Next). For the case where the replacement was made, the following search starts at the very righthand side of the newly inserted text.
                                        That’s kind of a wordy explanation for this part, but hopefully it makes some sense.

                                        Okay, so what does this all this mean for regexes using \K, specifically yours above where the first one doesn’t work and the second one does ?

                                        analysis of “first” regex:

                                        (?x-s).+?\K\x20

                                        move caret to start-of-file, data as stated above: abgcd ef ghi j

                                        first press of Replace button:
                                        active selection range at time of button press: (0,0)
                                        code runs at Replace press:
                                        calculates “nextFind” selection range = (5,6)
                                        no (selection range) equivalency to (0,0) so NOTHING IS REPLACED
                                        selection moves to (5,6) which is the space between d and e

                                        second press of Replace button:
                                        active selection range at time of button press: (5,6)
                                        code runs at Replace press:
                                        calculates “nextFind” selection range = (8,9)
                                        no (selection range) equivalency to (5,6) so NOTHING IS REPLACED
                                        selection moves to (8,9) which is the space between f and g

                                        third press of Replace button:
                                        active selection range at time of button press: (8,9)
                                        code runs at Replace press:
                                        calculates “nextFind” selection range = (12,13)
                                        no (selection range) equivalency to (8,9) so NOTHING IS REPLACED
                                        selection moves to (12,13) which is the space between i and j

                                        etc.

                                        analysis of “second” regex:

                                        (?x-s).*?\K\x20

                                        move caret to start-of-file, data as stated above: abgcd ef ghi j

                                        first press of Replace button:
                                        active selection range at time of button press: (0,0)
                                        code runs at Replace press:
                                        calculates “nextFind” selection range = (5,6)
                                        no (selection range) equivalency to (0,0) so NOTHING IS REPLACED
                                        selection moves to (5,6) (the space between d and e)

                                        second press of Replace button:
                                        active selection range at time of button press: (5,6)
                                        code runs at Replace press:
                                        calculates “nextFind” selection range = (5,6)
                                        have (selection range) equivalency to (5,6) so REPLACEMENT IS MADE
                                        selection moves to (8,9) which is the next match between the f and g

                                        third press of Replace button:
                                        active selection range at time of button press: (8,9)
                                        code runs at Replace press:
                                        calculates “nextFind” selection range = (8,9)
                                        have (selection range) equivalency to (8,9) so REPLACEMENT IS MADE
                                        selection moves to (12,13), which is the next match between the i and j

                                        etc.

                                        Here’s the conclusion I draw from this:

                                        If you use \K in a regex, if the part of the regex to the left of the \K matches as zero-length, the Replace button press WILL work to replace data. If, however, the part of the regex to the left of \K matches one or more characters, a press of Replace will NOT perform a textual substitution, but will rather just move to the next higher match.

                                        Following this rule: Because the “first” regex demands that a minimum of one character be matched to the left of \K, no Replace ments are made. Because the “second” regex has a minimum-of-zero requirement, when it does match zero, that match comes to the left of \K, so a Replace is allowed to actually replace.

                                        Probably a zero-length match to the left of \K in a regex isn’t very useful often; perhaps that is why I don’t recall hearing of \K and single-step replace “working only sometimes” in the past? Typically, it is, “doesn’t work”.

                                        1 Reply Last reply Reply Quote 2
                                        • guy038G
                                          guy038
                                          last edited by guy038

                                          Hello, @alan-kilborn and All,

                                          You said :

                                          If you use \K in a regex, if the part of the regex to the left of the \K matches as zero-length, the Replace button press WILL work to replace data. If, however, the part of the regex to the left of \K matches one or more characters, a press of Replace will NOT perform a textual substitution, but will rather just move to the next higher match.

                                          Not totally exact, Alan !

                                          For instance, let’s imagine this text, in a new tab, beginning with two blank lines :

                                          
                                          
                                          78464 13232178913 894654465464 12231
                                          
                                          52632abc9526271 026238121 945135 s1658
                                          
                                          6479123789 456134 978941 13454
                                          
                                          46464646l 4567861341 128978 10313
                                          
                                          111386460abc9564 6240 17868913345100544 4867864
                                          

                                          Note that the lines 5 and 11, only, contains the string abc


                                          And let’s use a simple regex, derived from our generic regex, presently studied in the previous posts :

                                          (?-s)(abc|(?!\A)\G).*?\K\x20

                                          Now, in this new tab, containing the sample text :

                                          • Move back to the beginning ( Ctrl + Home )

                                          • Open the Replace dialog Ctrl + H )

                                            • SEARCH (?-s)(abc|(?!\A)\G).*?\K\x20

                                            • REPLACE @

                                          • Click once on the Replace button ( identical to a click on the Find Next button ! )

                                          => The first space character, of line 5, is matched. As the caret was initially on a blank line, it cannot match any standard character. So the part (?-s)(?!\A)\G.*?\K\x20, with the second alternative, has not been used by the regex engine

                                          Thus, the regex engine uses, necessarily, the first alternative ( the regex (?-s)abc.*?\K\x20 ), which indeed, contains a part, before \K, which is not a zero-length area, as equal to (?-s)abc.*?

                                          However, if you click a second time on the Replace button, the first space char of line 5 is, as expected, changed into the @ symbol !

                                          Of course, any further space char, located in lines containing the abc string, are replaced, in the same way, after successive clicks on the Replace button


                                          You also said :

                                          Probably a zero-length match to the left of \K in a regex isn’t very useful often;

                                          But it just so happens that our generic regex S/R :

                                          SEARCH (?s-i:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?s-i:FR)

                                          REPLACE RR

                                          seems to work correctly with “step-by-step” replacement ;-))

                                          Note that the search syntax above corresponds to the particular case where :

                                          • The search is sensible to the case of letters

                                          • The search may extend to a multi-lines area

                                          • No subset of FR, is needed in the replacement regex => All groups are non-capturing ones

                                          Best Regards,

                                          guy038

                                          Alan KilbornA 1 Reply Last reply Reply Quote 2
                                          • Alan KilbornA
                                            Alan Kilborn @guy038
                                            last edited by

                                            @guy038 said in Changing Data inside XML element:

                                            Not totally exact, Alan !

                                            But it just so happens that our generic regex S/R…

                                            Ok, then, back to the drawing board.
                                            More analysis work on this for me!
                                            I’ll return with the answers.
                                            Well…if I find them.

                                            Alan KilbornA 1 Reply Last reply Reply Quote 2
                                            • First post
                                              Last post
                                            The Community of users of the Notepad++ text editor.
                                            Powered by NodeBB | Contributors