    Changing Data inside XML element

      Alan Kilborn @guy038

      @guy038

      So I was intrigued by your \K results, but let me confine my response to just this small part of your posting:

      (?x-s).+?\K\x20           # KO  ( 3 SPACE chars NOT changed )
      (?x-s).*?\K\x20           # OK  ( 3 SPACE chars CHANGED into @ )
      
      abgcd ef ghi j
                1111  <- tens, and
      01234567890123  <- ones position in doc
      

      But rather than looking at it as a user and regex guru as you did, I cheated – because I’m not the same guru – and looked at the N++ source code.

      So, what Notepad++ does, when you click on the Replace button is, it runs its Find Next code search (transparently) and sets a variable called “nextFind”. This search starts at the minimum position of the current selection (or just the caret position if no selection is active) and proceeds toward higher positions in the doc.

      If what comes out of that search is a match of the same selected text position range as you began with, then the replacement is made.

      If what results from the Find Next search is some different text selected (or you had nothing selected originally), the replace is not done, but the new text selection is now the nearest match higher in the doc (presumably convenient for your next press of Replace!).

      On actual replacement, it then moves the selection to the following match it finds (meaning that internally it runs another Find Next). For the case where the replacement was made, the following search starts at the very righthand side of the newly inserted text.
      That’s kind of a wordy explanation for this part, but hopefully it makes some sense.
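A rough model of that logic in Python may make it easier to follow. This is only a sketch of the behaviour as described above, not the actual Notepad++ code. The names `find_from`, `press_replace` and `run` are invented for the illustration. Python's `re` module has no \K, so the part after \K is written as capture group 1 and the span of that group is taken as the reported match. The two patterns stand in for the two regexes at the top of this post.

```python
import re

def find_from(pattern, text, pos):
    # Span reported by a search that starts at pos, or None if nothing matches.
    # Group 1 plays the role of "everything after \K" (assumption of this sketch).
    m = pattern.search(text, pos)
    return m.span(1) if m else None

def press_replace(pattern, text, sel, replacement):
    # One press of Replace. Returns (new_text, new_selection, replaced?).
    found = find_from(pattern, text, min(sel))        # internal Find Next
    if found is None:
        return text, sel, False
    if sel[0] != sel[1] and found == sel:             # same range as the selection
        text = text[:sel[0]] + replacement + text[sel[1]:]
        resume = sel[0] + len(replacement)            # right side of the inserted text
        nxt = find_from(pattern, text, resume)
        return text, nxt or (resume, resume), True
    return text, found, False                         # just move the selection

def run(pattern, text, presses):
    # Start with the caret at offset 0 and press Replace several times.
    sel, log = (0, 0), []
    for _ in range(presses):
        text, sel, replaced = press_replace(pattern, text, sel, "@")
        log.append((sel, replaced))
    return text, log

DATA = "abgcd ef ghi j"
FIRST = re.compile(r".+?( )")    # stand-in for (?x-s).+?\K\x20
SECOND = re.compile(r".*?( )")   # stand-in for (?x-s).*?\K\x20

print(run(FIRST, DATA, 3))
print(run(SECOND, DATA, 3))
```

With the first pattern the selection only walks from space to space and the text stays unchanged; with the second, every press after the first one replaces a space.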

      Okay, so what does all this mean for regexes using \K, specifically yours above where the first one doesn’t work and the second one does ?

      analysis of “first” regex:

      (?x-s).+?\K\x20

      move caret to start-of-file, data as stated above: abgcd ef ghi j

      first press of Replace button:
      active selection range at time of button press: (0,0)
      code runs at Replace press:
      calculates “nextFind” selection range = (5,6)
      no (selection range) equivalency to (0,0) so NOTHING IS REPLACED
      selection moves to (5,6) which is the space between d and e

      second press of Replace button:
      active selection range at time of button press: (5,6)
      code runs at Replace press:
      calculates “nextFind” selection range = (8,9)
      no (selection range) equivalency to (5,6) so NOTHING IS REPLACED
      selection moves to (8,9) which is the space between f and g

      third press of Replace button:
      active selection range at time of button press: (8,9)
      code runs at Replace press:
      calculates “nextFind” selection range = (12,13)
      no (selection range) equivalency to (8,9) so NOTHING IS REPLACED
      selection moves to (12,13) which is the space between i and j

      etc.

      analysis of “second” regex:

      (?x-s).*?\K\x20

      move caret to start-of-file, data as stated above: abgcd ef ghi j

      first press of Replace button:
      active selection range at time of button press: (0,0)
      code runs at Replace press:
      calculates “nextFind” selection range = (5,6)
      no (selection range) equivalency to (0,0) so NOTHING IS REPLACED
      selection moves to (5,6) (the space between d and e)

      second press of Replace button:
      active selection range at time of button press: (5,6)
      code runs at Replace press:
      calculates “nextFind” selection range = (5,6)
      have (selection range) equivalency to (5,6) so REPLACEMENT IS MADE
      selection moves to (8,9) which is the next match between the f and g

      third press of Replace button:
      active selection range at time of button press: (8,9)
      code runs at Replace press:
      calculates “nextFind” selection range = (8,9)
      have (selection range) equivalency to (8,9) so REPLACEMENT IS MADE
      selection moves to (12,13), which is the next match between the i and j

      etc.

      Here’s the conclusion I draw from this:

      If you use \K in a regex, if the part of the regex to the left of the \K matches as zero-length, the Replace button press WILL work to replace data. If, however, the part of the regex to the left of \K matches one or more characters, a press of Replace will NOT perform a textual substitution, but will rather just move to the next higher match.

      Following this rule: Because the “first” regex demands that a minimum of one character be matched to the left of \K, no replacements are made. Because the “second” regex has a minimum-of-zero requirement, when it does match zero, that match comes to the left of \K, so a Replace is allowed to actually replace.

      Probably a zero-length match to the left of \K in a regex isn’t very useful often; perhaps that is why I don’t recall hearing of \K and single-step replace “working only sometimes” in the past? Typically, it is, “doesn’t work”.
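The rule can be checked with a small Python sketch. It is an approximation under stated assumptions: Python's `re` has no \K, so capture group 1 stands for the part after \K, and the helper name `restart` is invented here. It restarts the search at the left edge of the selected space (offset 5) and shows what was consumed to the left of \K and which range would be reported.

```python
import re

DATA = "abgcd ef ghi j"
FIRST = re.compile(r".+?( )")    # stand-in for (?x-s).+?\K\x20
SECOND = re.compile(r".*?( )")   # stand-in for (?x-s).*?\K\x20

def restart(pattern, text, sel_start):
    # Re-run the search from the start of the current selection.
    m = pattern.search(text, sel_start)
    if m is None:
        return None
    return {
        "left_of_K": text[m.start():m.start(1)],  # text consumed before \K
        "reported": m.span(1),                    # range the editor would select
    }

print(restart(FIRST, DATA, 5))   # non-empty left part, reported range moves on
print(restart(SECOND, DATA, 5))  # empty left part, reported range equals (5, 6)
```

Only when the left part is empty does the reported range coincide with the existing selection (5,6), which is the condition for the replacement to happen.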

        guy038

        Hello, @alan-kilborn and All,

        You said :

        If you use \K in a regex, if the part of the regex to the left of the \K matches as zero-length, the Replace button press WILL work to replace data. If, however, the part of the regex to the left of \K matches one or more characters, a press of Replace will NOT perform a textual substitution, but will rather just move to the next higher match.

        Not totally exact, Alan !

        For instance, let’s imagine this text, in a new tab, beginning with two blank lines :

        
        
        78464 13232178913 894654465464 12231
        
        52632abc9526271 026238121 945135 s1658
        
        6479123789 456134 978941 13454
        
        46464646l 4567861341 128978 10313
        
        111386460abc9564 6240 17868913345100544 4867864
        

        Note that only lines 5 and 11 contain the string abc


        And let’s use a simple regex, derived from our generic regex, presently studied in the previous posts :

        (?-s)(abc|(?!\A)\G).*?\K\x20

        Now, in this new tab, containing the sample text :

        • Move back to the beginning ( Ctrl + Home )

        • Open the Replace dialog ( Ctrl + H )

          • SEARCH (?-s)(abc|(?!\A)\G).*?\K\x20

          • REPLACE @

        • Click once on the Replace button ( identical to a click on the Find Next button ! )

        => The first space character of line 5 is matched. As the caret was initially on a blank line, the second alternative, i.e. the part (?-s)(?!\A)\G.*?\K\x20, cannot match any standard character there. So it has not been used by the regex engine

        Thus, the regex engine uses, necessarily, the first alternative ( the regex (?-s)abc.*?\K\x20 ), which indeed, contains a part, before \K, which is not a zero-length area, as equal to (?-s)abc.*?

        However, if you click a second time on the Replace button, the first space char of line 5 is, as expected, changed into the @ symbol !

        Of course, any further space char, located in a line containing the abc string, is replaced in the same way, after successive clicks on the Replace button
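The sequence of clicks above can be imitated with a short Python sketch. It rests on several assumptions that are not taken from Notepad++ itself: Python's `re` supports neither \K nor \G, so the part after \K is capture group 1, and \G is imitated by allowing the second alternative only at the offset where the search starts (and never at offset 0, because of (?!\A)). The names `find_from`, `press_replace` and `click` are invented for the illustration.

```python
import re

TEXT = (
    "\n\n"
    "78464 13232178913 894654465464 12231\n\n"
    "52632abc9526271 026238121 945135 s1658\n\n"
    "6479123789 456134 978941 13454\n\n"
    "46464646l 4567861341 128978 10313\n\n"
    "111386460abc9564 6240 17868913345100544 4867864\n"
)

ABC_ALT = re.compile(r"abc.*?( )")  # first alternative; group 1 = text after \K
G_ALT = re.compile(r".*?( )")       # second alternative; valid at the search start only

def find_from(text, pos):
    # Imitates (?-s)(abc|(?!\A)\G).*?\K\x20 searched from offset pos.
    m = ABC_ALT.match(text, pos)
    if m is None and pos > 0:             # (?!\A)\G : search start, but not offset 0
        m = G_ALT.match(text, pos)
    if m is None:                         # later offsets: only the abc alternative
        m = ABC_ALT.search(text, pos + 1)
    return m.span(1) if m else None

def press_replace(text, sel, replacement="@"):
    found = find_from(text, sel[0])
    if found is None:
        return text, sel
    if sel[0] != sel[1] and found == sel:
        text = text[:sel[0]] + replacement + text[sel[1]:]
        resume = sel[0] + len(replacement)
        return text, find_from(text, resume) or (resume, resume)
    return text, found

def click(times):
    # Caret at the very beginning, then `times` clicks on Replace.
    text, sel = TEXT, (0, 0)
    for _ in range(times):
        text, sel = press_replace(text, sel)
    return text.split("\n")

print(click(2)[4])   # line 5 after the second click
```

In this model the first click only selects the first space of line 5, the second click replaces it, and the spaces of the lines without abc are never touched.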


        You also said :

        Probably a zero-length match to the left of \K in a regex isn’t very useful often;

        But it just so happens that our generic regex S/R :

        SEARCH (?s-i:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?s-i:FR)

        REPLACE RR

        seems to work correctly with “step-by-step” replacement ;-))

        Note that the search syntax above corresponds to the particular case where :

        • The search is sensitive to the case of letters

        • The search may extend to a multi-line area

        • No subset of FR is needed in the replacement regex => All groups are non-capturing ones

        Best Regards,

        guy038

          Alan Kilborn @guy038

          @guy038 said in Changing Data inside XML element:

          Not totally exact, Alan !

          But it just so happens that our generic regex S/R…

          Ok, then, back to the drawing board.
          More analysis work on this for me!
          I’ll return with the answers.
          Well…if I find them.

            Alan Kilborn @Alan Kilborn

            @guy038

            So this is all very confusing until one breaks it down! :-)
            And even then it is still confusing. :-(

            Following your steps:

            Now, in this new tab, containing the sample text :
            Move back to the beginning ( Ctrl + Home )
            Open the Replace dialog ( Ctrl + H )
            SEARCH (?-s)(abc|(?!\A)\G).*?\K\x20
            REPLACE @
            Click once on the Replace button ( identical to a click on the Find Next button ! )

            Then, without doing anything else (without pressing any more buttons), you say:

            Thus, the regex engine uses, necessarily, the first alternative ( the regex (?-s)abc.*?\K\x20 ), which indeed, contains a part, before \K, which is not a zero-length area, as equal to (?-s)abc.*?

            Which I wholly agree with.

            But… the point where I talked (in my previous post) about a zero-length string to the left of \K has not come into play (yet)!

            We are left sitting and looking at:

            [screenshot of the document]

            Note the single space selected in line 5.

            At this point Replace is pressed a second time.
            The internal find is again run and the match that occurs is going to effectively be this part of the regex:

            (?!\A)\G.*?\K\x20

            and that IS going to be zero-length to the left of the \K !

            The key point might be that \G will match because it by definition “matches at the start of string-to-search at the first attempt”. And because this search is indeed a new search, we are AT the first attempt. The start of the string-to-search is going to be the left side of the selected space character.

            So in summary, to the left of the \K for the match we have:

            • (?!\A) -> match of zero-length
            • \G -> match of zero-length (from “key point” just above)
            • .*? -> match of zero-length (because \x20 will match next and since we are a minimal match, we match zero)

            Add up all those zeroes to obtain: 0+0+0=0

            And thus my postulate from the previous posting seems to hold: If the match to the left of \K is a zero-length match, the replacement WILL be made. So in this case the @ replaces the space.

            Any time the Replace button is pressed, it only replaces if the current selection matches the find-expression – do we agree on this? Regex or not…right?

            Because \K “cancels out” what comes before it in a regex (and it must cancel it very deeply in the regex engine – because a lookbehind, (?<=...), doesn’t have the same difficulty \K does), the only way a current selection is going to match an expression using \K is if what is to the left of the \K has no length to it.

            I’d be happy to have you poke holes in my logic here! :-)

              guy038

              Hello, @alan-kilborn and All,

              Alan, I ran numerous tests and, I’m afraid, this whole story does not depend on the \G assertion at all, but only on the \K one !! Yes, really not easy :-((

              So here is, below, how I imagine the process after a click on either the Find Next or the Replace button of the Replace dialog, whatever the search mode. Note that we will not speak about the Replace All behavior at all !


              After any click on the Find Next button, the regex engine starts searching for a match of the search regex :

              • From the end of a present normal selection

              • From the present position of the caret, if NO selection exists

              Then :

              • IF NO match exists, the overall process stops

              • IF a match has been found, it is selected


              After any click on the Replace button, the regex engine starts searching for a match of the search regex :

              • From the start of a present normal selection

              • From the present position of the caret, if NO selection exists

              Then :

              • IF NO match exists, the overall process stops

              • IF a match has been found :

                • IF the match is strictly identical to the previous selection :

                  • The selection/match is replaced with the replacement regex

                  • The current position is reset to the location, right after the end of the replacement regex

                  • And the regex engine re-starts searching, further on, for a next match of the search regex :

                    • IF such a match can be found, it is selected

                    • IF NO match exists, the overall process stops

                • IF the match is different from the previous selection OR IF no previous selection exists :

                  • This match is just selected, without any replacement process
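The two procedures above can be written down as a small Python sketch. It is only a model of the steps as listed, not Notepad++ code: the function names `press_find_next` and `press_replace` are invented, the replacement is treated as literal text, and the example uses a plain regex without \K.

```python
import re

def press_find_next(pattern, text, sel):
    # Find Next: the search starts at the END of the selection (or at the caret).
    m = pattern.search(text, sel[1])
    return m.span() if m else sel

def press_replace(pattern, text, sel, replacement):
    # Replace: the search starts at the START of the selection (or at the caret).
    m = pattern.search(text, sel[0])
    if m is None:
        return text, sel                              # no match: stop
    if sel[0] != sel[1] and m.span() == sel:          # match identical to selection
        text = text[:sel[0]] + replacement + text[sel[1]:]
        resume = sel[0] + len(replacement)            # right after the replacement
        nxt = pattern.search(text, resume)
        return text, (nxt.span() if nxt else (resume, resume))
    return text, m.span()                             # different match: only select it

pattern = re.compile("abc")
text = "123abc456abc0abc789abc0"

sel = press_find_next(pattern, text, (0, 0))          # selects the first abc
print(sel)
print(press_replace(pattern, text, sel, "@"))         # replaces it, selects the next abc
```

Without \K the match found from the start of the selection is always the selection itself, which is why the step-by-step replacement works in that case.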

              So, the complete solution, in order that the step by step replacement works correctly, is that the current search, in any mode, matches the previous selection, whatever the way that selection was obtained :

              • By a previous click on the Replace button ( standard case of subsequent replacements )

              • By a previous click on the Find Next button ( standard case of a first replacement )

              • By a manual selection or move of the caret, generated intentionally, by the user !

              Thus :

              • In case of a search, in the Normal or Extended mode OR if no \K assertion is used, in Regex mode :

              => The previous selection, generated by a click on the Find Next or Replace button AND the present match, met at the start of the selection, are always identical, meaning that the Step by Step replacement feature is always functional !

              • In the particular case of the \K assertion, in Regex mode, the present selection, before clicking on the Replace button, is the text matched by the part of the regex located after the \K structure. So the rule is :

              => A simple replacement can occur ONLY IF the overall regex, run again from the start of that selection, matches exactly the text matched by the part of the regex located after the \K syntax !

              In other words, if we consider the general regex form ........\K••••••••, this means that :

              The overall regex ........\K•••••••• must match, exactly, the same expression as the •••••••• sub-regex


              For instance :

              • (A) The regex (?-s).*?\Kabc does match the sub-regex abc, after \K => The step-by-step replacement is possible

              • (B) Similarly, the regex (?-s).*?\Kabc\d does match the sub-regex abc\d, after \K => The step-by-step replacement is OK

              • (C) And the regex (?-s).*?\K\dabc does match the sub-regex \dabc, after \K => The step-by-step replacement will work

              But, on the contrary :

              • (D) The regex (?-s)\d.*?\Kabc does not match the sub-regex abc, after \K => The step-by-step replacement is not possible

              • (E) Similarly, the regex (?-s)\d.*?\K\dabc does not match the sub-regex \dabc, after \K => The step-by-step replacement is KO

              • (F) Now, the regex (?-s)\d.*?\K\d?abc never matches the sub-regex \d?abc, after \K, either => The step-by-step replacement will not work. However, by clicking successively on the Replace button, it produces :

                • Firstly, a selection of the \dabc string, if that string is preceded, itself, with a digit

                • Secondly, a selection of the abc string

              • (G) Note that the use of the regex (?-s)\d?.*?\K\d?abc, is more difficult to understand, although logical : by clicking successively on the Replace button, it produces, either :

                • A selection of abc, if the previous selection was \dabc

                • A replacement by the @ symbol, if the previous selection was abc

              • (H) Finally, note that, when the first \d? expression is made lazy, the resulting regex (?-s)\d??.*?\K\d?abc does match the sub-regex \d?abc, after \K. So :

                • The step-by-step replacement works nicely

                • Any click on the Replace button, from the second one onward, does change the matched string \dabc into a @ symbol


              • Test all these regexes, above, against the text 123abc456abc0abc789abc0, pasted in a new tab

              • The Replace regex is just @, in all cases

              • Select the Regular expression search mode

              • Click only on the Replace or Find Next button ( Do not use the Replace All button )
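For readers without Notepad++ at hand, the regexes (A) to (H) can be tried against the same text with the Python sketch below. It is a model under assumptions, not the editor itself: Python's `re` has no \K, so the part after \K is written as capture group 1, the (?-s) prefix is dropped because `.` does not match line breaks by default in Python, and the helper name `presses` is invented. Each run starts with the caret at offset 0 and records what three successive clicks on Replace would do.

```python
import re

TEXT = "123abc456abc0abc789abc0"

CASES = {
    "A": r".*?(abc)",          # (?-s).*?\Kabc
    "B": r".*?(abc\d)",        # (?-s).*?\Kabc\d
    "C": r".*?(\dabc)",        # (?-s).*?\K\dabc
    "D": r"\d.*?(abc)",        # (?-s)\d.*?\Kabc
    "E": r"\d.*?(\dabc)",      # (?-s)\d.*?\K\dabc
    "F": r"\d.*?(\d?abc)",     # (?-s)\d.*?\K\d?abc
    "G": r"\d?.*?(\d?abc)",    # (?-s)\d?.*?\K\d?abc
    "H": r"\d??.*?(\d?abc)",   # (?-s)\d??.*?\K\d?abc
}

def presses(regex, count=3):
    # Log of `count` clicks on Replace, caret initially at offset 0.
    pattern, text, sel, log = re.compile(regex), TEXT, (0, 0), []
    for _ in range(count):
        m = pattern.search(text, sel[0])              # search from selection start
        if m is None:
            break
        found = m.span(1)                             # part after \K
        if sel[0] != sel[1] and found == sel:         # identical: replace with @
            text = text[:sel[0]] + "@" + text[sel[1]:]
            log.append("replace")
            resume = sel[0] + 1
            nxt = pattern.search(text, resume)
            sel = nxt.span(1) if nxt else (resume, resume)
        else:                                         # different: only select
            log.append("select " + text[found[0]:found[1]])
            sel = found
    return log

for name, regex in CASES.items():
    print(name, presses(regex))
```

In this model (A), (B), (C) and (H) replace from the second click onward, (D), (E) and (F) only ever move the selection, and (G) replaces once the selection has shrunk to abc.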


              Alan, the crucial point is to move the caret or create a normal selection, by yourself, before one or some click(s) on the Replace button, and observe the results with the hope of deducing a logical behavior, as I did ! This will allow you to confirm or deny my assertions ;-))

              Best Regards,

              guy038

                guy038

                Hi, All,

                See my results from testing @sasumner’s build, regarding the \K assertion and the step-by-step replacement :

                https://github.com/notepad-plus-plus/notepad-plus-plus/issues/8434#issuecomment-784583153

                See, also, an interesting helping post, of @ArkadiuszMichalski, regarding how to test new N++ builds !

                https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9493#issuecomment-781344297

                Best Regards,

                guy038

                  Michael Vincent @guy038

                  @guy038 said in Changing Data inside XML element:

                  See, also, an interesting helping post, of @ArkadiuszMichalski, regarding how to test new N++ builds !
                  https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9493#issuecomment-781344297

                  Seems to be official now:

                  https://github.com/notepad-plus-plus/notepad-plus-plus/wiki/Testing

                  Good job @ArkadiuszMichalski !

                  Cheers.
