    need of explanation of find and replace with option regex

    • guy038G
      guy038
      last edited by

      Hello @andrea-seyfarth,

      I think that the correct regex, in your case, is:

      SEARCH (?s)^II\x20.+?Steuerklasse\x20VI\R

      REPLACE Leave Empty

      Notes :

      • As usual, the modifier (?s) forces the dot special character to match any single character, including End of Line characters. You may omit this part, but in that case the “. matches newline” option must be ticked

      • Then, the part ^II\x20 tries to match the literal string II followed by a space character, at beginning of lines

      • Near the end, Steuerklasse\x20VI tries to match the literal string Steuerklasse, followed by a space character and the string VI

      • In the middle, the lazy syntax .+? looks for the shortest range of any characters between the regex II\x20 and the regex Steuerklasse\x20VI

      • The \R form, at the very end of the search regex, catches the End of Line character(s) of the last line, whatever the file type (Windows, Unix or Mac)

      • As this total amount of text is to be deleted, the replacement zone is just empty

      Remarks :

      • It’s important to note that the greedy syntax (?s)^II\x20.+Steuerklasse\x20VI\R would grab all text between the first regex II\x20 of the file and the last regex Steuerklasse\x20VI of the file!

      • Of course, you may replace any syntax \x20 with a single space character !
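
A minimal sketch of the lazy-versus-greedy difference described in the remarks, assuming Python's `re` module as a stand-in for the regex engine in Notepad++. The sample text is hypothetical, and because Python's `re` has no `\R`, a literal `\n` is used in its place.

```python
import re

# Hypothetical sample: two blocks, each running from a line that starts
# with "II " to a line that ends with "Steuerklasse VI".
SAMPLE = (
    "I 111\n"
    "II 222\n"
    "VI 666 Steuerklasse VI\n"
    "I 111\n"
    "II 222\n"
    "VI 666 Steuerklasse VI\n"
)

# re.MULTILINE makes ^ match at the start of every line.
# "\n" is an assumption standing in for \R, which Python's re lacks.
LAZY = re.compile(r"(?s)^II\x20.+?Steuerklasse\x20VI\n", re.MULTILINE)
GREEDY = re.compile(r"(?s)^II\x20.+Steuerklasse\x20VI\n", re.MULTILINE)

lazy_matches = [m.group(0) for m in LAZY.finditer(SAMPLE)]
greedy_matches = [m.group(0) for m in GREEDY.finditer(SAMPLE)]

# The lazy form finds each block separately; the greedy form swallows
# everything from the first "II " line to the last "Steuerklasse VI".
print(len(lazy_matches), len(greedy_matches))

# Replacing with an empty string deletes the matched blocks.
print(LAZY.sub("", SAMPLE))
```

With the lazy form, only the `I 111` lines survive the replacement; with the greedy form, the second `I 111` line would be deleted as well.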

      Cheers,

      guy038

      • Andrea SeyfarthA
        Andrea Seyfarth
        last edited by

        Hi guy038,
        The search with your suggested syntax doesn’t work in Notepad ++

        I copied the syntax (?s)^II\x20.+?Steuerklasse\x20VI\R into the field search and pressed the “Find Next” - Button, nothing happened, I only got the message “Can’t find the text (?s)^II\x20.+?Steuerklasse\x20VI\R”

        I tested it with all 3 search modes (normal, advanced and regex), and in regex mode with the “. matches newline” option both checked and unchecked. I repeated every search mode with the “Match case” option checked, too.
        I’m using the latest version in German (7.5.4) on Windows 8.1, 64-bit.

        Is there another setting, I have to set?

        kind regards

        Andrea

        • Claudia FrankC
          Claudia Frank @Andrea Seyfarth
          last edited by

          @Andrea-Seyfarth

          The \R at the end might be the issue if your last line (Steuerklasse VI) doesn’t have a carriage return set.

          Cheers
          Claudia

          • PeterJonesP
            PeterJones
            last edited by

            I believe Claudia is right (my experiments matched hers)

            Moreover, if you change the final \R to a $ instead, it will match whether you’ve got a newline at the end or not: (?s)^II\x20.+?Steuerklasse\x20VI$
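
A small sketch of why the trailing `\R` can prevent a match on the last line, again assuming Python's `re` as a stand-in. The sample string is hypothetical, and `EOL` is only a rough substitute for `\R`, which Python's `re` does not provide.

```python
import re

# Hypothetical data whose last line has NO trailing line break.
SAMPLE = "II 222\nVI 666 Steuerklasse VI"

# Rough stand-in for \R (an assumption; Python's re has no \R).
EOL = r"(?:\r\n|\n|\r)"

with_linebreak = re.compile(
    r"(?s)^II\x20.+?Steuerklasse\x20VI" + EOL, re.MULTILINE
)
with_dollar = re.compile(r"(?s)^II\x20.+?Steuerklasse\x20VI$", re.MULTILINE)

# Requiring a line break fails when the file ends right after "VI".
found_linebreak = with_linebreak.search(SAMPLE)

# "$" is zero-width: it matches at the end of the text and also just
# before a line break, so both variants of the file are matched.
found_dollar = with_dollar.search(SAMPLE)
found_dollar_nl = with_dollar.search(SAMPLE + "\n")

print(found_linebreak is None, found_dollar is not None)
```

Note that `$` does not consume the line break, so after replacing with an empty string an empty line is left behind where the match ended.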

            • Andrea SeyfarthA
              Andrea Seyfarth @Claudia Frank
              last edited by

              @Claudia-Frank

              thanks for suggesting another possible solution, but I tested it and I’m still not getting any matches.
              However, I compared your debug info with mine and found the following differences:

              Notepad++ v7.5.4 (64-bit)
              Build time : Jan 1 2018 - 01:50:29
              Path : C:\Program Files\Notepad++\notepad++.exe
              Admin mode : OFF
              Local Conf mode : OFF
              OS : Windows 8.1 (64-bit)
              Plugins : DSpellCheck.dll mimeTools.dll NppConverter.dll
              and NppExport.dll is failing

              I think that perhaps the bold-marked parts make the difference between working and not working.
              But how can I change the bold-marked settings?

              I don’t think the reason lies in the italic-marked settings (bit size or OS).

              And how can I insert a screenshot in this reply?
              Copy and paste doesn’t work and I can’t find any button to insert an image …
              I’d like you to see my input and settings for comparison.
              I could only give you my debug info, because I could copy that text with the option “copy debug info into clipboard”.

              I look forward to your response,
              many thanks

              Andrea

              • Claudia FrankC
                Claudia Frank @Andrea Seyfarth
                last edited by Claudia Frank

                @Andrea-Seyfarth

                Hi Andrea,

                the forum doesn’t have an option to upload images but the
                used html renderer (markdown) knows a syntax to embed external images.
                So what most do is to upload the image to a hoster like imgur.com and then,
                once the image has been created, right-click on the image and do save image address
                and use the syntax

                ![](HERE_IMAGE_ADDRESS)
                

                to embed it into the post.

                I don’t think the bolded lines have any effect on it but who knows.
                Admin mode is ON if you right-click to the npp shortcut and do
                run as administrator, if you don’t do it -> OFF

                Local Conf mode is ON if you have a file doLocalConf.xml
                in the same folder where your notepad++.exe is, if you don’t have that file -> OFF
                The file is empty, only its existence is needed to switch from off to on.
                It basically informs npp that it should not use/create the user config files under
                %APPDATA% but rather use the ones in the install directory.

                The export plugin gives the additional features to copy the text in rtf and/or html format.

                Did you try Peter’s suggestion as well?
                Or could you provide more real data (obfuscated if needed), to see if something else
                breaks the regex?

                Cheers
                Claudia

                • Andrea SeyfarthA
                  Andrea Seyfarth @PeterJones
                  last edited by

                  @PeterJones
                  Hi Peter,

                  I tested your version with $ at the end of the last line too,
                  but I got the same error message again.
                  In my answer to Claudia I included the debug info; perhaps it will help, but I don’t know where I can change the bold-marked information (admin mode and local conf mode).

                  • Scott SumnerS
                    Scott Sumner @Andrea Seyfarth
                    last edited by

                    @Andrea-Seyfarth

                    At this point it is probably best to do what @Claudia-Frank suggested and post some real data. I describe how to reasonably do that (post some data) in this thread. Try to avoid a straight paste of data into a posting as you did in the first post in this thread, as often data contains special markdown commands.

                    Another possibility is to use a site like www.textuploader.com and then post the link in a reply here. Of course, as Claudia said if your data is “sensitive” you may want to replace any sensitive info with different (but format-equivalent) dummy data first.

                    • Andrea SeyfarthA
                      Andrea Seyfarth @Claudia Frank
                      last edited by

                      @Claudia-Frank
                      @Scott-Sumner
                      @Peter Jones

                      thanks for the tip with the link to imgur and other webdrives, here is the image:

                      Screenshot: search window with error message

                      I tested it with Peter’s suggestion as well and got the same error message.
                      I can give you the whole file, because it is no secret: it’s the latest tax table for wages and you can get it on the web. But it is too big for my purpose and you can get it only as a PDF file. I converted it into a text file and am trying to shorten it, because I’m only interested in tax class I (not married, no bonus for children, no additional tax for church/religion).
                      Before I can import it into an xls(x) file, it needs a slight trimming, because there are too many irregularities for a clean import.

                      And here is the file
                      Textdatei_zum_Probieren

                      if you can’t click the link, copy the URL below and paste it into your browser’s address bar
                      http://textuploader.com/dhj7x

                      The text link will be available for one week from today.

                      Thank you for your support to all

                      Andrea

                      • PeterJonesP
                        PeterJones
                        last edited by PeterJones

                        @Andrea-Seyfarth ,

                        Use ![](https://i.imgur.com/IlpoqlI.jpg) to embed it – the initial exclamation point will allow the forum to render the image, saving us from having to click to the site:

                        In the first image, you can see that your roman-numeraled lines are all indented. The regex assumed that the II were the first characters on the line. It’s easy to fix:

                        (?s)^[\x20\x09]*II\x20.+?Steuerklasse\x20VI$

                        The [\x20\x09]* will match 0 or more spaces or tabs between the start-of-line and the II, so it won’t matter whether it’s space-indented, tab-indented, or not indented. (I wanted to use the \s whitespace escape, but it also matches newline, even with (?-s:\s*).)

                        If there’s any possibility of trailing spaces after the VI, you might want to use
                        (?s)^[\x20\x09]*II\x20.+?Steuerklasse\x20VI[\x20\x09]*$
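
To illustrate the indentation issue, here is a short sketch, assuming Python's `re` as a stand-in for the Notepad++ engine and a hypothetical three-line sample with indented lines and trailing spaces.

```python
import re

# Hypothetical sample: the roman-numeral lines are indented, and the
# last line carries two trailing spaces.
SAMPLE = (
    "111,11I Header\n"
    "    II      222,22\n"
    "    VI      666.66  Steuerklasse VI  \n"
)

# Original form: expects "II " to be the very first characters of a line.
strict = re.compile(r"(?s)^II\x20.+?Steuerklasse\x20VI$", re.MULTILINE)

# Tolerant form: optional spaces/tabs before "II" and after "VI".
tolerant = re.compile(
    r"(?s)^[\x20\x09]*II\x20.+?Steuerklasse\x20VI[\x20\x09]*$",
    re.MULTILINE,
)

strict_match = strict.search(SAMPLE)      # no line starts with "II "
tolerant_match = tolerant.search(SAMPLE)  # indentation is now allowed

print(strict_match is None)
print(repr(tolerant_match.group(0)))
```

The strict pattern finds nothing, which is the same symptom as the “Can’t find the text” message, while the tolerant pattern matches from the indented II line through the trailing spaces after VI.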

                        • Scott SumnerS
                          Scott Sumner @Andrea Seyfarth
                          last edited by

                          @Andrea-Seyfarth

                          See why posting actual data is a good thing? It allowed the whitespace on the lines before the II to be seen by readers of this thread! :-)
                          And the problem to be detected and solved–quickly! :-D

                          • Andrea SeyfarthA
                            Andrea Seyfarth @Andrea Seyfarth
                            last edited by

                            @Andrea-Seyfarth

                            here is the new embedding of the image:

                            Thanks to Peter for this tip

                            • Scott SumnerS
                              Scott Sumner @Andrea Seyfarth
                              last edited by

                              @Andrea-Seyfarth

                              Image just posted is of dubious value; @PeterJones already posted it as well as solved your original problem…or is there something new to keep this going?

                              • Andrea SeyfarthA
                                Andrea Seyfarth @Andrea Seyfarth
                                last edited by

                                @Andrea-Seyfarth said:

                                @Andrea-Seyfarth

                                here is the new embedding of the image:

                                Thanks to Peter for this tip

                                I tested the new suggestions, but I always get “Can’t find the text …”
                                It seems as if Notepad++ thinks the search syntax is part of the text, not a command.
                                I tried it with the option advanced (in German “erweitert”) and with the option regex (in German “reguläre Ausdrücke”). I tried Peter’s last two expressions one by one by copy and paste, but the problem is not solved …

                                here is the last sample:

                                Thanks

                                Andrea

                                • PeterJonesP
                                  PeterJones
                                  last edited by

                                  It found the highlighted text for me
                                  Imgur

                                  • guy038G
                                    guy038
                                    last edited by guy038

                                    Hello, @andrea-seyfarth,

                                    Not totally sure that I’ve found the problem, but I would advise ticking, in the Suchen dialog, the Am Ende von vorne beginnen option (Wrap around option in English)

                                    With that option set, when you begin your search, for instance, in the middle of your file, the regex engine first searches for occurrences from that location to the very end of your file. Then it goes back to the very beginning of the file and continues searching until it reaches the initial location of your cursor.
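
The wrap-around behaviour can be sketched roughly as follows. This is only an illustration in Python of the search order described above, not how Notepad++ implements it; `find_next`, `TEXT` and `CURSOR` are hypothetical names.

```python
import re
from typing import Optional

def find_next(pattern: re.Pattern, text: str, cursor: int,
              wrap: bool) -> Optional[re.Match]:
    """Search forward from the cursor; optionally wrap to the file start."""
    match = pattern.search(text, cursor)
    if match is None and wrap:
        # Second pass: restart at the top and accept only a match that
        # begins before the original cursor position.
        match = pattern.search(text)
        if match is not None and match.start() >= cursor:
            match = None
    return match

PATTERN = re.compile(r"(?s)^II\x20.+?Steuerklasse\x20VI$", re.MULTILINE)

# Hypothetical file content: the only matching block sits above the cursor.
TEXT = "II 222\nVI 666 Steuerklasse VI\nI 111\n"
CURSOR = TEXT.index("I 111") + 2  # cursor placed inside the last line

without_wrap = find_next(PATTERN, TEXT, CURSOR, wrap=False)
with_wrap = find_next(PATTERN, TEXT, CURSOR, wrap=True)

print(without_wrap is None, with_wrap is not None)
```

Without wrapping, a cursor placed below the last matching block yields no result even though the regex itself is fine.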

                                    Best Regards,

                                    guy038

                                    • Andrea SeyfarthA
                                      Andrea Seyfarth @guy038
                                      last edited by

                                      @guy038
                                      I tried that option too, but I got no result …
                                      Is there a setting in the program itself that prevents the search routine from recognizing the syntax as a syntax rather than as part of normal text?
                                      That’s my hunch: that the search routine of Notepad++ can’t recognize the syntax, or simply ignores the fact that it is a syntax ;-)

                                      Here is the new sample with option Wrap around:

                                      thanks to all for your patience. I don’t know why all your suggestions don’t work in my version of Notepad++
                                      when I use the same options you have suggested and insert the syntax into the field “search what” by copy and paste.
                                      Every time before I push the “Find Next” button, I compare the syntax you have written with the one I have pasted into the field. I don’t dare to push the “Replace” button because I don’t want to damage the file.

                                      kind regards

                                      Andrea

                                      • PeterJonesP
                                        PeterJones
                                        last edited by

                                        The screenshot you showed is back to using the old regex, which didn’t allow for spaces before the II, and required a newline at the end (so won’t match the very last occurrence) – we already showed that was wrong for your data, and explained why. Please try it again with one of the new ones, like (?s)^[\x20\x09]*II\x20.+?Steuerklasse\x20VI[\x20\x09]*$.

                                        • PeterJonesP
                                          PeterJones
                                          last edited by

                                          I know you haven’t asked for it, but I assume if you ever make my regex match, you’ll soon be noticing something I saw.

                                          If you use the regex (?s)^[\x20\x09]*II\x20.+?Steuerklasse\x20VI[\x20\x09]*$ on the file quoted here (which is a simplified version of your file):

                                          111,11I Header
                                              II      222,22
                                              III     333.33
                                              IV      444.44
                                              V       555.55  Steuerklassen V und IV
                                              VI      666.66  Steuerklasse VI
                                          111,11I Header
                                              II      222,22
                                              III     333.33
                                              IV      444.44
                                              V       555.55  Steuerklassen V und IV
                                              VI      666.66  NoMatch VI
                                          111,11I Header
                                              II      222,22
                                              III     333.33
                                              IV      444.44
                                              V       555.55  Steuerklassen V und IV
                                              VI      666.66  Steuerklasse VI
                                          111,11I Header
                                              II      222,22
                                              III     333.33
                                              IV      444.44
                                              V       555.55  Steuerklassen V und IV
                                              VI      666.66  NoMatch VI
                                          111,11I Header
                                              II      222,22
                                              III     333.33
                                              IV      444.44
                                              V       555.55  Steuerklassen V und IV
                                              VI      666.66  Steuerklasse VI
                                          

                                          Then the second match will be the second and third “paragraph” (as shown in the left side of the image below). (I have defined “paragraph” to mean the entire 6-line grouping)

                                          However, if you modify the regex to (?s)^[\x20\x09]*II\x20(?:.(?!^[\x20\x09]*II\x20))+?Steuerklasse\x20VI[\x20\x09]*$, it will match just the third “paragraph”, as seen in the lower-right, which I am assuming is what you’ll eventually want.

                                          Explanation: the .+? from the original regex will match any character, one or more times. By replacing that with (?:.(?!^[\x20\x09]*II\x20))+?, I am able to restrict it to match any character, one or more times, as long as that character isn’t followed by II at the beginning of a line. To break it down:

                                          • (?:___) = this wrapper says “match ___ as a group, but (if also using a replace string) don’t capture it into the $1”. (You don’t say whether you’re ever going to try a Replace to go with this Find What, so I thought I’d make it safe.)
                                          • (?:___)+? = match one or more of this group, but make it as short as possible (so if you’ve got two matching “paragraphs” in a row, it won’t be greedy and will only highlight one “paragraph” at a time)
                                          • Moving inside, the dot . still matches one character.
                                          • .(?!___) = this second parenthetical is a “negative lookahead assertion” (as indicated by the ?!). Together, this means “look for any one character, as long as it’s not followed by the ___”. The lookahead does not “use up” any characters, so .(?!___) still only matches one character
                                          • By using ^[\x20\x09]*II\x20 (which was what we used to define the start-of-match, earlier) as the ___ in the negative lookahead, we are saying “we don’t want our standard start-of-match sequence to be anywhere inside our match”.

                                          (I had seen this problem before I made yesterday’s post, but I didn’t have a solution at that point, so wasn’t going to mention it. :-) I thought I was going to have to hand it off to @guy038 if @Andrea-Seyfarth asked about it… but I thought of the negative lookahead while lying awake in bed before my alarm went off this morning. I tried it as soon as I could, and it worked… Hopefully this helps.)

                                          • guy038G
                                            guy038
                                            last edited by guy038

                                            Hello, @andrea-seyfarth, @peterjones and All

                                            Here are two regexes :

                                            Regex A : (?s)^\h*II\x20.+?\R\h*VI\x20(?-s).+\R?

                                            Regex B : (?s)^\h*II\x20((?!Header).)+?Steuerklasse\x20VI(\R|\z)

                                            that I tested against the text below :

                                            111,11I Header
                                                I       000.00
                                                II      222,22
                                                III     333.33
                                                IV      444.44
                                                V       555.55  Steuerklassen V und IV
                                                VI      666.66  Steuerklasse VI
                                                VII     777.77  Steuerklasse VII
                                                VIII    888.88  Steuerklasse VIII
                                            111,11I Header
                                                I       000.00
                                                II      222,22
                                                III     333.33
                                                IV      444.44
                                                V       555.55  Steuerklassen V und IV
                                                VI      666.66  ----- NO MATCH 1 ----- VI
                                                VII     777.77  Steuerklasse VII
                                                VIII    888.88  Steuerklasse VIII
                                            111,11I Header
                                                I       000.00
                                                II      222,22
                                                III     333.33
                                                IV      444.44
                                                V       555.55  Steuerklassen V und IV
                                                VI      666.66  Steuerklasse VI
                                                VII     777.77  Steuerklasse VII
                                                VIII    888.88  Steuerklasse VIII
                                            111,11I Header
                                                I       000.00
                                                II      222,22
                                                III     333.33
                                                IV      444.44
                                                V       555.55  Steuerklassen V und IV
                                                VI      666.66  ----- NO MATCH 2 ----- VI
                                                VII     777.77  Steuerklasse VII
                                                VIII    888.88  Steuerklasse VIII
                                            111,11I Header
                                                I       000.00
                                                II      222,22
                                                III     333.33
                                                IV      444.44
                                                V       555.55  Steuerklassen V und IV
                                                VI      666.66  ----- NO MATCH 3 ----- VI
                                                VII     777.77  Steuerklasse VII
                                                VIII    888.88  Steuerklasse VIII
                                            111,11I Header
                                                I       000.00
                                                II      222,22
                                                III     333.33
                                                IV      444.44
                                                V       555.55  Steuerklassen V und IV
                                                VI      666.66  Steuerklasse VI
                                            

                                            The regex A catches the entire FIVE lines, from the line beginning with Roman numeral II to the line beginning with Roman numeral VI (with possible horizontal blank characters before each), whatever their contents

                                            And the regex B catches the entire FIVE lines, beginning with Roman numeral II (with possible horizontal blank characters before) and ending with the string Steuerklasse VI, ONLY IF the string Header cannot be found at any position of the shortest multi-line sequence of characters between the regex \h*II\x20 and the regex Steuerklasse\x20VI!

                                            So, Peter, as you can see, I used the negative look-ahead (?!Header), which is tested at each position of the dot => the syntax ((?!Header).)+?. Note also that I preferred to get the entire lines, with their End of Line chars! So, when the replacement zone is empty, no blank line remains afterwards :-))

                                            I also used the alternative (\R|\z), just in case the very last line would be a line VI 666.66 Steuerklasse VI, without any line-break !
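
For readers who want to try both regexes outside Notepad++, here is an approximate Python translation. It is a sketch under several assumptions: Python's `re` lacks `\h`, `\R` and `\z`, so `[ \t]`, an explicit line-break alternation and `\Z` are substituted, and the mid-pattern `(?-s)` is replaced by scoping `(?s:...)` around the part that needs it. The `block` helper is hypothetical and rebuilds the test text above.

```python
import re

def block(vi_text: str, tail: bool = True) -> str:
    """Build one block of the test text (hypothetical helper)."""
    lines = [
        "111,11I Header",
        "    I       000.00",
        "    II      222,22",
        "    III     333.33",
        "    IV      444.44",
        "    V       555.55  Steuerklassen V und IV",
        f"    VI      666.66  {vi_text}",
    ]
    if tail:
        lines += [
            "    VII     777.77  Steuerklasse VII",
            "    VIII    888.88  Steuerklasse VIII",
        ]
    return "\n".join(lines)

# The last block ends on the VI line, with no trailing line break.
SAMPLE = "\n".join([
    block("Steuerklasse VI"),
    block("----- NO MATCH 1 ----- VI"),
    block("Steuerklasse VI"),
    block("----- NO MATCH 2 ----- VI"),
    block("----- NO MATCH 3 ----- VI"),
    block("Steuerklasse VI", tail=False),
])

H = r"[ \t]"               # stand-in for \h (horizontal blank)
EOL = r"(?:\r\n|\n|\r)"    # stand-in for \R

# Regex A: the dot matches line breaks only inside the (?s:...) group.
REGEX_A = re.compile(
    r"^" + H + r"*II\x20(?s:.+?)" + EOL + H + r"*VI\x20.+" + EOL + r"?",
    re.MULTILINE,
)
# Regex B: \Z stands in for \z (very end of the text).
REGEX_B = re.compile(
    r"^" + H + r"*II\x20(?s:((?!Header).)+?)Steuerklasse\x20VI(?:"
    + EOL + r"|\Z)",
    re.MULTILINE,
)

a_lines = [len(m.group(0).splitlines()) for m in REGEX_A.finditer(SAMPLE)]
b_lines = [len(m.group(0).splitlines()) for m in REGEX_B.finditer(SAMPLE)]

print(len(a_lines), len(b_lines))
```

In this translation, regex A matches the II to VI lines of all six blocks, while regex B matches only the three blocks whose VI line ends with Steuerklasse VI, including the final one that has no line break.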

                                            Best Regards,

                                            guy038
