
Need an explanation of Find and Replace with the regex option

    Andrea Seyfarth @Andrea Seyfarth
    last edited by Feb 7, 2018, 1:19 PM

    @Andrea-Seyfarth

    Here is the newly embedded image:

    Thanks to Peter for this tip

      Scott Sumner @Andrea Seyfarth
      last edited by Feb 7, 2018, 1:28 PM

      @Andrea-Seyfarth

      The image just posted is of dubious value; @PeterJones already posted it, as well as solving your original problem … or is there something new to keep this going?

        Andrea Seyfarth @Andrea Seyfarth
        last edited by Feb 7, 2018, 1:53 PM

        I tested the new suggestions, but I always get “Can’t find the next …”.
        It seems as if Notepad++ treats the search syntax as part of the text, not as a command.
        I tried it with the Extended search mode (in German “erweitert”) and with the Regular expression mode (in German “reguläre Ausdrücke”). I tried Peter’s last two expressions one by one, using copy and paste, but the problem is not solved …

        Here is the latest sample:

        Thanks

        Andrea

          PeterJones
          last edited by Feb 7, 2018, 2:06 PM

          It found the highlighted text for me:
          [Screenshot hosted on Imgur]

            guy038
            posted Feb 7, 2018, 3:07 PM, last edited by guy038 Feb 7, 2018, 5:05 PM

            Hello, @andre-seyfarth,

            I’m not totally sure that I’ve found the problem, but I would advise ticking, in the Suchen (Find) dialog, the Am Ende von vorne beginnen option (Wrap around in English).

            With that option set, when you begin your search, for instance, in the middle of your file, the regex engine first searches for occurrences from that location to the very end of your file. Then it goes back to the very beginning of the file and continues searching up to the position where your cursor was before the search/replace operation!
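The Wrap around behaviour described above amounts to two passes over the text. Here is a minimal Python sketch of that idea; `find_next_wrapping` is a hypothetical helper, and Python's `re` module only stands in for Notepad++'s own search engine, so this illustrates the search order, not how Notepad++ implements it.

```python
import re


def find_next_wrapping(pattern, text, cursor):
    """Hypothetical helper: emulate Find Next with Wrap around enabled."""
    regex = re.compile(pattern, re.MULTILINE)
    # Pass 1: from the cursor position to the very end of the text.
    match = regex.search(text, cursor)
    if match is None:
        # Pass 2: wrap to the beginning and search up to the cursor position.
        match = regex.search(text, 0, cursor)
    return match


text = "alpha\nbeta\ngamma\n"
cursor = text.index("gamma")  # the cursor sits after the only "beta" line

found = find_next_wrapping(r"^beta$", text, cursor)
print(found.start(), found.group())
```

Without the second pass, the search starting at `cursor` returns `None` here, which is roughly what a search without Wrap around reports as "not found".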

            Best Regards,

            guy038

              Andrea Seyfarth @guy038
              last edited by Feb 8, 2018, 2:32 PM

              @guy038
              I tried that option too, but I got no result …
              Is there a setting in the program itself that prevents the search routine from recognizing the syntax as syntax rather than as part of normal text?
              My hunch is that the search routine of Notepad++ can’t recognize the syntax at all, or simply ignores the fact that it is syntax ;-)

              Here is the new sample with the Wrap around option:

              Thanks to all of you for your patience. I don’t know why none of your suggestions work in my version of Notepad++
              when I use the same options you have suggested and paste the syntax into the “Find what” field.
              Every time, before I push the “Find Next” button, I compare the syntax you have written with the one I pasted into the field. I don’t dare push the “Replace” button because I don’t want to damage the file.

              kind regards

              Andrea

                PeterJones
                last edited by Feb 8, 2018, 3:46 PM

                The screenshot you showed is back to using the old regex, which didn’t allow for spaces before the II and required a newline at the end (so it won’t match the very last occurrence); we already showed that this was wrong for your data, and explained why. Please try again with one of the new ones, like (?s)^[\x20\x09]*II\x20.+?Steuerklasse\x20VI[\x20\x09]*$.
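As a cross-check outside Notepad++, here is a Python sketch that runs Peter's pattern unchanged on a small hypothetical sample with indented lines (spaces and a tab) and no newline after the last line. One assumption to note: Notepad++ lets ^ and $ match at every line boundary by default, whereas Python's `re` needs the `re.MULTILINE` flag for that.

```python
import re

# Peter's Find what pattern, unchanged.
PATTERN = r"(?s)^[\x20\x09]*II\x20.+?Steuerklasse\x20VI[\x20\x09]*$"

# Hypothetical sample: indented lines, and no newline after the last line.
text = (
    "111,11I Header\n"
    "    II      222,22\n"
    "    V       555.55  Steuerklassen V und IV\n"
    "    VI      666.66  Steuerklasse VI\n"
    "111,11I Header\n"
    "\tII      222,22\n"
    "\tVI      666.66  Steuerklasse VI"
)

# re.MULTILINE mimics Notepad++'s line-based ^ and $.
matches = re.findall(PATTERN, text, flags=re.MULTILINE)
print(len(matches))
for match in matches:
    print(repr(match))
```

Both blocks should be found, including the tab-indented one at the very end of the text.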

                  PeterJones
                  last edited by Feb 9, 2018, 2:29 PM

                  I know you haven’t asked for it, but I assume that once you get my regex to match, you’ll soon notice something I saw.

                  If you use the regex (?s)^[\x20\x09]*II\x20.+?Steuerklasse\x20VI[\x20\x09]*$ on the file quoted here (which is a simplified version of your file):

                  111,11I Header
                      II      222,22
                      III     333.33
                      IV      444.44
                      V       555.55  Steuerklassen V und IV
                      VI      666.66  Steuerklasse VI
                  111,11I Header
                      II      222,22
                      III     333.33
                      IV      444.44
                      V       555.55  Steuerklassen V und IV
                      VI      666.66  NoMatch VI
                  111,11I Header
                      II      222,22
                      III     333.33
                      IV      444.44
                      V       555.55  Steuerklassen V und IV
                      VI      666.66  Steuerklasse VI
                  111,11I Header
                      II      222,22
                      III     333.33
                      IV      444.44
                      V       555.55  Steuerklassen V und IV
                      VI      666.66  NoMatch VI
                  111,11I Header
                      II      222,22
                      III     333.33
                      IV      444.44
                      V       555.55  Steuerklassen V und IV
                      VI      666.66  Steuerklasse VI
                  

                  Then the second match will span the second and third “paragraph” (as shown on the left side of the image below). (I have defined “paragraph” to mean the entire 6-line grouping.)
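The behaviour Peter describes can be reproduced with a Python sketch like this one, which rebuilds his five-paragraph sample (the `paragraph` helper is hypothetical) and counts how many Header lines end up inside each match. Python's `re` with `re.MULTILINE` only approximates Notepad++'s engine.

```python
import re


def paragraph(vi_label):
    """Hypothetical helper: one 6-line "paragraph" of the simplified sample."""
    return (
        "111,11I Header\n"
        "    II      222,22\n"
        "    III     333.33\n"
        "    IV      444.44\n"
        "    V       555.55  Steuerklassen V und IV\n"
        f"    VI      666.66  {vi_label} VI\n"
    )


labels = ["Steuerklasse", "NoMatch", "Steuerklasse", "NoMatch", "Steuerklasse"]
text = "".join(paragraph(label) for label in labels)

PATTERN = r"(?s)^[\x20\x09]*II\x20.+?Steuerklasse\x20VI[\x20\x09]*$"
matches = re.findall(PATTERN, text, flags=re.MULTILINE)

for number, match in enumerate(matches, start=1):
    # A match confined to one paragraph contains no "Header" line at all.
    print(number, match.count("Header"))
```

The second and third matches each swallow one Header line, because the lazy .+? runs past a NoMatch paragraph into the next one.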

                  However, if you modify the regex to (?s)^[\x20\x09]*II\x20(?:.(?!^[\x20\x09]*II\x20))+?Steuerklasse\x20VI[\x20\x09]*$, it will match just the third “paragraph”, as seen in the lower right, which I assume is what you’ll eventually want.
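A Python sketch of the modified regex on the same kind of sample (again with a hypothetical `paragraph` helper and `re.MULTILINE` standing in for Notepad++'s default line anchors) suggests that each match now stays within a single paragraph:

```python
import re


def paragraph(vi_label):
    """Hypothetical helper: one 6-line "paragraph" of the simplified sample."""
    return (
        "111,11I Header\n"
        "    II      222,22\n"
        "    III     333.33\n"
        "    IV      444.44\n"
        "    V       555.55  Steuerklassen V und IV\n"
        f"    VI      666.66  {vi_label} VI\n"
    )


labels = ["Steuerklasse", "NoMatch", "Steuerklasse", "NoMatch", "Steuerklasse"]
text = "".join(paragraph(label) for label in labels)

# Peter's modified pattern, split over two lines for readability.
PATTERN = (
    r"(?s)^[\x20\x09]*II\x20(?:.(?!^[\x20\x09]*II\x20))+?"
    r"Steuerklasse\x20VI[\x20\x09]*$"
)
matches = re.findall(PATTERN, text, flags=re.MULTILINE)

for match in matches:
    # Neither a Header line nor a NoMatch paragraph should be inside a match.
    print(match.count("Header"), "NoMatch" in match)
```

The three matches should be the first, third and fifth paragraphs, each on its own.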

                  Explanation: the .+? from the original regex will match any character, one or more times. By replacing that with (?:.(?!^[\x20\x09]*II\x20))+?, I am able to restrict it to match any character, one or more times, as long as that character isn’t followed by II at the beginning of a line. To break it down:

                  • (?:___) = this wrapper says “match ___ as a group, but (if also using a replace string) don’t capture it into $1”. (You don’t say whether you’re ever going to try a Replace to go with this Find What, so I thought I’d make it safe.)
                  • (?:___)+? = match one or more of this group, but make it as short as possible (so if you’ve got two matching “paragraphs” in a row, it won’t be greedy and will only highlight one “paragraph” at a time)
                  • Moving inside, the dot . still matches one character.
                  • .(?!___) = this second parenthetical is a “negative lookahead assertion” (as indicated by the ?!). Together, this means “look for any one character, as long as it’s not followed by ___”. The lookahead does not “use up” any characters, so .(?!___) still matches only one character.
                  • By using ^[\x20\x09]*II\x20 (which is what we used to define the start of the match, earlier) as the ___ in the negative lookahead, we are saying “we don’t want our standard start-of-match sequence to be anywhere inside our match”.
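The building blocks in this breakdown can each be tried in isolation. This Python sketch uses tiny made-up strings (not from the thread) to show that (?:...) captures nothing, that a negative lookahead is zero-width, and how the greedy and lazy forms of (?:.(?!b))+ differ:

```python
import re

# (?:...) groups without capturing, so there is nothing to put into $1.
grouped = re.match(r"(?:ab)+", "ababx")
print(grouped.group(), grouped.groups())

# A negative lookahead is zero-width: the first "a" is rejected because it is
# followed by "b"; the second "a" matches, and the match is one character long.
single = re.search(r"a(?!b)", "abac")
print(single.start(), single.group())

# .(?!b) repeated: each step consumes one character that is not followed by
# "b", so the greedy form stops before "z", and the lazy +? form takes just one.
greedy = re.match(r"(?:.(?!b))+", "xyzb")
lazy = re.match(r"(?:.(?!b))+?", "xyzb")
print(greedy.group(), lazy.group())
```

In Peter's regex the role of "b" is played by ^[\x20\x09]*II\x20, so the repeated dot can never step onto the start of another II line.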

                  (I had seen this problem before I made yesterday’s post, but I didn’t have a solution at that point, so wasn’t going to mention it. :-) I thought I was going to have to hand it off to @guy038 if @Andrea-Seyfarth asked about it… but I thought of the negative lookahead while lying awake in bed before my alarm went off this morning. I tried it as soon as I could, and it worked… Hopefully this helps.)

                    guy038
                    posted Feb 9, 2018, 7:00 PM, last edited by guy038 Feb 9, 2018, 7:03 PM

                    Hello, @andrea-seyfarth, @peterjones and All

                    Here are two regexes:

                    Regex A: (?s)^\h*II\x20.+?\R\h*VI\x20(?-s).+\R?

                    Regex B: (?s)^\h*II\x20((?!Header).)+?Steuerklasse\x20VI(\R|\z)

                    which I tested against the text below:

                    111,11I Header
                        I       000.00
                        II      222,22
                        III     333.33
                        IV      444.44
                        V       555.55  Steuerklassen V und IV
                        VI      666.66  Steuerklasse VI
                        VII     777.77  Steuerklasse VII
                        VIII    888.88  Steuerklasse VIII
                    111,11I Header
                        I       000.00
                        II      222,22
                        III     333.33
                        IV      444.44
                        V       555.55  Steuerklassen V und IV
                        VI      666.66  ----- NO MATCH 1 ----- VI
                        VII     777.77  Steuerklasse VII
                        VIII    888.88  Steuerklasse VIII
                    111,11I Header
                        I       000.00
                        II      222,22
                        III     333.33
                        IV      444.44
                        V       555.55  Steuerklassen V und IV
                        VI      666.66  Steuerklasse VI
                        VII     777.77  Steuerklasse VII
                        VIII    888.88  Steuerklasse VIII
                    111,11I Header
                        I       000.00
                        II      222,22
                        III     333.33
                        IV      444.44
                        V       555.55  Steuerklassen V und IV
                        VI      666.66  ----- NO MATCH 2 ----- VI
                        VII     777.77  Steuerklasse VII
                        VIII    888.88  Steuerklasse VIII
                    111,11I Header
                        I       000.00
                        II      222,22
                        III     333.33
                        IV      444.44
                        V       555.55  Steuerklassen V und IV
                        VI      666.66  ----- NO MATCH 3 ----- VI
                        VII     777.77  Steuerklasse VII
                        VIII    888.88  Steuerklasse VIII
                    111,11I Header
                        I       000.00
                        II      222,22
                        III     333.33
                        IV      444.44
                        V       555.55  Steuerklassen V und IV
                        VI      666.66  Steuerklasse VI
                    

                    Regex A catches the entire FIVE lines, from the line beginning with Roman numeral II to the line beginning with Roman numeral VI (each possibly preceded by horizontal blank characters), whatever their contents.
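To check regex A outside Notepad++, here is a Python sketch that rebuilds the test text above (the `paragraph` helper is hypothetical). Python's `re` has no \h or \R and does not accept a (?-s) switch in the middle of a pattern, so the sketch assumes these substitutions: \h becomes [ \t] (spaces and tabs only), \R becomes (?:\r\n|\n|\r), and (?-s).+ becomes [^\r\n]+. Boost's \h and \R cover a few more characters, so this is an approximation.

```python
import re


def paragraph(vi_label, with_tail=True):
    """Hypothetical helper: one block of the test text, as a list of lines."""
    lines = [
        "111,11I Header",
        "    I       000.00",
        "    II      222,22",
        "    III     333.33",
        "    IV      444.44",
        "    V       555.55  Steuerklassen V und IV",
        f"    VI      666.66  {vi_label} VI",
    ]
    if with_tail:
        lines += [
            "    VII     777.77  Steuerklasse VII",
            "    VIII    888.88  Steuerklasse VIII",
        ]
    return lines


labels = ["Steuerklasse", "----- NO MATCH 1 -----", "Steuerklasse",
          "----- NO MATCH 2 -----", "----- NO MATCH 3 -----"]
lines = [line for label in labels for line in paragraph(label)]
lines += paragraph("Steuerklasse", with_tail=False)  # last block ends at VI
text = "\n".join(lines)  # no line break after the very last line

# Regex A with \h -> [ \t], \R -> (?:\r\n|\n|\r) and (?-s).+ -> [^\r\n]+
REGEX_A = r"(?s)^[ \t]*II\x20.+?(?:\r\n|\n|\r)[ \t]*VI\x20[^\r\n]+(?:\r\n|\n|\r)?"
matches = re.findall(REGEX_A, text, flags=re.MULTILINE)

for match in matches:
    print(len(match.splitlines()), "lines")
```

Every block is matched here, NO MATCH or not, since regex A only looks at where the II and VI lines start.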

                    And regex B catches the entire FIVE lines beginning with Roman numeral II (possibly preceded by horizontal blank characters) and ending with the string Steuerklasse VI, ONLY IF the string Header cannot be found at any position in the shortest multi-line sequence of characters between \h*II\x20 and Steuerklasse\x20VI!
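A similar Python sketch for regex B, again with a hypothetical `paragraph` helper and assumed translations: \h becomes [ \t], \R becomes (\r\n|\n|\r), and \z is written as Python's \Z, which matches only at the very end of the string. It keeps guy038's capturing groups, so the whole match is read with `group(0)`.

```python
import re


def paragraph(vi_label, with_tail=True):
    """Hypothetical helper: one block of the test text, as a list of lines."""
    lines = [
        "111,11I Header",
        "    I       000.00",
        "    II      222,22",
        "    III     333.33",
        "    IV      444.44",
        "    V       555.55  Steuerklassen V und IV",
        f"    VI      666.66  {vi_label} VI",
    ]
    if with_tail:
        lines += [
            "    VII     777.77  Steuerklasse VII",
            "    VIII    888.88  Steuerklasse VIII",
        ]
    return lines


labels = ["Steuerklasse", "----- NO MATCH 1 -----", "Steuerklasse",
          "----- NO MATCH 2 -----", "----- NO MATCH 3 -----"]
lines = [line for label in labels for line in paragraph(label)]
lines += paragraph("Steuerklasse", with_tail=False)  # last block ends at VI
text = "\n".join(lines)  # no line break after the very last line

# Regex B with \h -> [ \t], \R -> (\r\n|\n|\r) and \z -> \Z
REGEX_B = r"(?s)^[ \t]*II\x20((?!Header).)+?Steuerklasse\x20VI(\r\n|\n|\r|\Z)"
matches = [m.group(0) for m in re.finditer(REGEX_B, text, flags=re.MULTILINE)]

for match in matches:
    print(len(match.splitlines()), "lines:", match.splitlines()[-1].strip())
```

On this text it should return three matches (the first, third and last blocks), each five lines long, with none of them containing a NO MATCH line.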

                    So, Peter, as you can see, I used the negative lookahead (?!Header), which is tested at every position, just before each . is matched => the syntax ((?!Header).)+?. Note also that I preferred to grab the entire lines, with their end-of-line characters! So, when the replacement zone is empty, no blank line remains afterwards :-))
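The point about grabbing whole lines can be checked with a small Python sketch. It uses a hypothetical one-block sample, regex B translated to Python ([ \t] for \h, (\r\n|\n|\r) for \R, \Z for \z; an approximation of Boost's escapes), and, for comparison, a hypothetical variant that stops right after Steuerklasse VI. Both are used with an empty replacement.

```python
import re

text = (
    "111,11I Header\n"
    "    I       000.00\n"
    "    II      222,22\n"
    "    III     333.33\n"
    "    IV      444.44\n"
    "    V       555.55  Steuerklassen V und IV\n"
    "    VI      666.66  Steuerklasse VI\n"
    "    VII     777.77  Steuerklasse VII\n"
)

# Regex B (Python translation) ends by consuming the line break ...
with_eol = r"(?s)^[ \t]*II\x20((?!Header).)+?Steuerklasse\x20VI(\r\n|\n|\r|\Z)"
# ... whereas this hypothetical variant stops right after "Steuerklasse VI".
without_eol = r"(?s)^[ \t]*II\x20((?!Header).)+?Steuerklasse\x20VI"

cleaned = re.sub(with_eol, "", text, flags=re.MULTILINE)
leftover = re.sub(without_eol, "", text, flags=re.MULTILINE)
print(repr(cleaned))
print(repr(leftover))
```

Only the variant without the trailing line break leaves an empty line between the I line and the VII line.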

                    I also used the alternative (\R|\z), just in case the very last line is a VI 666.66 Steuerklasse VI line without any line break!
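Why the \z branch matters can be seen in a short Python sketch on a hypothetical final block whose VI line has no trailing line break. Here \R is written as (\r\n|\n|\r) and \z as Python's \Z (end of string only); both are assumed translations of the Boost syntax.

```python
import re

# Hypothetical last block of a file: the VI line has no trailing line break.
text = (
    "111,11I Header\n"
    "    II      222,22\n"
    "    VI      666.66  Steuerklasse VI"
)

# Python spelling of (\R) alone versus (\R|\z).
eol_only = r"(?s)^[ \t]*II\x20((?!Header).)+?Steuerklasse\x20VI(\r\n|\n|\r)"
eol_or_end = r"(?s)^[ \t]*II\x20((?!Header).)+?Steuerklasse\x20VI(\r\n|\n|\r|\Z)"

print(re.search(eol_only, text, flags=re.MULTILINE))
print(re.search(eol_or_end, text, flags=re.MULTILINE).group(0))
```

With the line-break-only version there is no match at all, while the version with the end-of-text alternative still finds the block.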

                    Best Regards,

                    guy038

                      PeterJones
                      last edited by Feb 9, 2018, 7:12 PM

                      The word “Header” was my invention, and not in any of @Andrea-Seyfarth’s examples, so that shouldn’t be used for a regex we suggest to her. Sorry for muddying the waters with my example file.

                      I like your cleaner \h for the horizontal space (that escape sequence hadn’t yet been stored in my long-term regex memory; some day, maybe even today, it will be).

                      Thanks for continually sharing your regex expertise with us. I’m always amazed by your expressions, and the quality of your explanations.

                      Hopefully, we’ve helped Andrea in the process. :-)

                      The Community of users of the Notepad++ text editor.