    Why won't this PCRE regular expression work in Notepad++ when it works on regex101?

      BlohoJo

      I have a PCRE Regular Expression that observes common restrictions and finds and selects a specified phrase at the beginning of sentences, only if it is preceded by a sentence that also contains the phrase. The first sentence’s phrase is not selected.

      Example:

      Phrase A: Words
      Phrase A: Words
      Phrase A: Words
      Phrase B: Words
      Phrase B: Words
      Phrase A: Words
      Phrase A: Words
      Phrase B: Words
      Phrase A: Words
      Phrase A: Words
      

      So in the above sequence, the user specifies Phrase A: in the expression, and it then selects only Phrase A: in lines 2, 3, 7, and 10:

      Phrase A: Words
      [Phrase A: ] Words
      [Phrase A: ] Words
      Phrase B: Words
      Phrase B: Words
      Phrase A: Words
      [Phrase A: ] Words
      Phrase B: Words
      Phrase A: Words
      [Phrase A: ] Words
      

      The expression is:
      /(?<=Phrase A: ).*[\r\n]+\KPhrase A: /gm

      It doesn’t work with numbers, and only works if the phrase is at the beginning of a line:
      https://regex101.com/r/X9xGdf/1

      However, it doesn’t work in Notepad++. I’ve tried both Notepad++'s native tools and the MultiReplace plugin; neither works.

      Is this a bug, something I’m doing wrong, or just something that’s not possible in Notepad++? (For what it’s worth, I also tried UltraEdit; it didn’t work there either.)

        Terry R @BlohoJo

        @BlohoJo said in Why won't this PCRE regular expression work in Notepad++ when it works on regex101?:

        However, it doesn’t work in Notepad++

        Not sure why you consider it doesn’t work because I replicated your setup (data and regular expression) and got what you wanted. See:

        [image: e3216d45-c680-494b-9af2-26d00d365d55-image.png]

        About all I can say is you included the leading forward slash and the trailing /gm. They aren’t a part of the expression. You also mention it not working with numbers, what do you mean?
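To make the delimiter point concrete: regex101 displays an expression as /pattern/flags, and only the middle part belongs in the Find what box. Here is a minimal Python sketch that separates the two; `split_regex101` is a hypothetical helper name invented for illustration, not part of any tool mentioned in this thread.

```python
def split_regex101(literal: str) -> tuple[str, str]:
    """Split a '/pattern/flags' literal into (pattern, flags).

    Hypothetical helper for illustration; assumes '/' is the delimiter.
    """
    if not literal.startswith("/") or "/" not in literal[1:]:
        raise ValueError("expected a literal of the form /pattern/flags")
    # The last '/' separates the pattern from the trailing flags.
    pattern, _, flags = literal[1:].rpartition("/")
    return pattern, flags


pattern, flags = split_regex101(r"/(?<=Phrase A: ).*[\r\n]+\KPhrase A: /gm")
print(pattern)  # the text to paste into the Find what box
print(flags)    # regex101 flags, not part of the expression itself
```

In Notepad++ the job of the g flag is done by buttons such as Find All, Mark All, or Replace All rather than by anything typed into the expression.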

        Please recheck for yourself and come back with a similar image to what I included showing your issue if you can show it doesn’t work for you.

        I’m wondering if you used AI to create this expression and really know nothing about regular expressions; if so, you are in dangerous territory. Using something and relying on the outcome without any knowledge of how it works will only lead to disaster.

        Terry

          BlohoJo @BlohoJo

          I apologize, I swore I was on the latest version, but I wasn’t. After updating to version 8.7.8, the expression now works for the “Mark” tab and also the “Find all”.

          However, I was quite shocked to find that I now have no way to do what I want which is to DELETE the found phrases.

          I can’t select or delete marked text. I need to delete just the marked text, not the entire line, so Bookmarks don’t help here.

          If I do a “Find all” it opens in a separate panel again with the searched phrase highlighted, there is no way to select just the phrase. I would need to be able to do a column mode selection in the search results panel, but you can’t. All I can do is double click on one highlighted phrase and it selects only a single phrase on a single line.

          Find/replace doesn’t work here due to the nature of the regular expression. If I try to do that, it misses about half of the phrases because find/replace works line by line, which affects the results.

          The simplest solution would be a “select marked text,” then the user could copy, paste over, delete, or do anything that you can do with a selection.

          So for now, I guess I just have to use the online resource regex101, as all I have to do is click on “Substitution” and I get the result I need instantly. I’d rather be able to do it locally though.

            Terry R @BlohoJo

            @BlohoJo said in Why won't this PCRE regular expression work in Notepad++ when it works on regex101?:

            However, I was quite shocked to find that I now have no way to do what I want which is to DELETE the found phrases.

            Now is the time to put forward the issue you have. So far you have only given scraps of information.

            As you did in the first post, show the example in one code box before the change happens, then the same example with the changes made in another box. When the marked text is removed, should the remaining line be padded with spaces, appended to the previous line, or shifted left because characters were removed?

            Expand on the problem detail. I sort of get what you want but there are some gaps in the details which could affect the solution.

            I will say that a possibility might be to reverse the lines before deleting text and reverse them again. It’s only a guess as I say, there are gaps in the details.

            Terry

              BlohoJo @Terry R

              @Terry-R

              [image]

              I need to be able to delete the "Phrase A: " highlighted (marked) text in the above image. That’s it.

                Terry R @BlohoJo

                @BlohoJo
                So you still haven’t answered my questions, so I’ll just provide this answer and hope either I have it right or it will make you realise you may need to expand on what you really need.

                First run the “Reverse Line Order” under the Edit, Line Operations menu
                Then using the Replace function:
                Find What: (?-s)^(Phrase A: )(?=.*\R\1)
                Replace With: nothing in this field
                Lastly run the “Reverse Line Order” under the Edit, Line Operations menu

                Bear in mind there are often a few solutions available so others may also provide their input. It will be up to you to decide which fits your needs the best.

                Terry

                PS you can record all this as a macro and assign a shortcut so it is easy to re-run, especially if you are handing the job onto someone with less knowledge than yourself.

                  Coises @BlohoJo

                  @BlohoJo said in Why won't this PCRE regular expression work in Notepad++ when it works on regex101?:

                  However, I was quite shocked to find that I now have no way to do what I want which is to DELETE the found phrases.

                  I can’t select or delete marked text. I need to delete just the marked text, not the entire line, so Bookmarks don’t help here.

                  One option is to use the Search… dialog in the Columns++ plugin. (You can install it from Plugins Admin.)

                  That search acts in an indicated region; if you begin with nothing selected, it will set the search region to the entire document.

                  On the dropdown menu for the Count button there is an option Select All. If you start with nothing selected, enter your search expression in the Find what box, set the Search Mode to Regular expression and then choose Select All, I believe that will do exactly what you want. You can then close the dialog and press the Delete key to delete the selected text.

                    Lycan Thrope @BlohoJo

                    @BlohoJo ,
                    Not that I’m anywhere near an expert with Regex, but first off, Notepad++ does not use PCRE, it uses Boost which is patterned somewhat after PCRE. Unless I misunderstand the conditions you want to test for, I did a very simple Search, Mark All, came up with the right solution, put it back into Replace mode and ran my Boost/NPP regex and it seems to have done exactly what your first example shows you desired. Not to take from @Terry-R or @Coises suggestions with the following regex and pics to show results.

                    Find What = Phrase A\:\h?
                    Replace With = 
                    Search Mode = REGULAR EXPRESSION
                    Dot Matches Newline = NOT CHECKED
                    

                    Pics show the results:
                    Mark Tab:
                    PCRENOTBOOST.PNG

                    Replace Tab: Replace All (selected)
                    PCRENOTBOOST2.PNG

                      Terry R @Lycan Thrope

                      @Lycan-Thrope said in Why won't this PCRE regular expression work in Notepad++ when it works on regex101?:

                      and it seems to have done exactly what your first example shows you desired.

                      Actually I think you missed the OP’s need. They said:
                      and finds and selects a specified phrase at the beginning of sentences, only if it is preceded by a sentence that also contains the phrase. The first sentence’s phrase is not selected.

                      When I showed what my “Mark” result was using his regex he confirmed that the highlighted text was exactly what he wanted removed. Unfortunately as he found out, and I had alluded to in an earlier post, the removal was an entirely different matter unless a reversal of lines was used.

                      So if a line has the string Phrase A:, then remove that string if the preceding line also contained that string. So when replacing, say line 9 contains that string. If line 8 also had it, then remove the string from line 9. Now check line 10, if that had the same as line 9… Oh wait, I’ve just removed that from line 9, but as I’m a regex, I don’t remember that I did that so I don’t remove it from line 10. But I should have!

                      This is all due to the lookbehind function which doesn’t help as we’ve possibly already changed the line before and now don’t know what it was. So what I’ve done is reversed the line order, so now it’s a lookahead. As such we’ve yet to process the next line so we CAN use the lookahead function with certainty.
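The difference between the two orders can be sketched in Python. This is a simplified line-by-line model of the behaviour described above, not Notepad++'s actual search code; the function names and the `SAMPLE` list are made up for illustration.

```python
PHRASE = "Phrase A: "

SAMPLE = [
    "Phrase A: Words", "Phrase A: Words", "Phrase A: Words",
    "Phrase B: Words", "Phrase B: Words",
    "Phrase A: Words", "Phrase A: Words",
    "Phrase B: Words",
    "Phrase A: Words", "Phrase A: Words",
]


def strip_top_down(lines):
    """Model a single top-down pass: each test sees the already edited previous line."""
    out = []
    for line in lines:
        if out and PHRASE in out[-1] and line.startswith(PHRASE):
            out.append(line[len(PHRASE):])
        else:
            out.append(line)
    return out


def strip_reversed(lines):
    """Model the reverse / replace / reverse recipe."""
    rev = lines[::-1]
    out = []
    for i, line in enumerate(rev):
        # Lookahead: the following line (the original predecessor) is still untouched.
        nxt = rev[i + 1] if i + 1 < len(rev) else ""
        if line.startswith(PHRASE) and nxt.startswith(PHRASE):
            out.append(line[len(PHRASE):])
        else:
            out.append(line)
    return out[::-1]


top_down = strip_top_down(SAMPLE)
reversed_way = strip_reversed(SAMPLE)
print(top_down[2])      # line 3 keeps its phrase in the top-down model
print(reversed_way[2])  # line 3 is stripped when working on reversed lines
```

In this model the top-down pass leaves line 3 untouched because line 2 has already lost its phrase, while the reversed pass strips lines 2, 3, 7, and 10.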

                      Terry

                        Alan Kilborn

                        @Coises

                        It seems like the suggestion of using a plugin to solve the OP’s problem is a bit “much” when the problem can be solved with Notepad++ itself/alone.

                        However, as OP did ask for a means to “select” search matches, and Notepad++ can’t do that, suggesting the plugin makes some sense. But, for any future readers of this with a similar need, replacing search matches with nothing effectively deletes them – you don’t need a means to “select” them so that you can then hit the Delete key to remove them.

                          guy038

                          Hello, @blohojo, @terry-r, @coises, @lycan-thrope, @alan-kilborn and All,

                          Indeed, this tricky replacement cannot be done correctly in a single pass because, as soon as the string Phrase A: has been deleted from a line, the conditions for a match on the next line no longer exist !

                          Thus, instead of trying to DELETE anything, we’ll try, first, to ADD something to the matched lines !


                          So, from your INPUT text, below :

                          Phrase A: Words
                          Phrase A: Words
                          Phrase A: Words
                          Phrase B: Words
                          Phrase B: Words
                          Phrase A: Words
                          Phrase A: Words
                          Phrase B: Words
                          Phrase A: Words
                          Phrase A: Words
                          

                          the following regex S/R will add, for example, the ¶ symbol at the beginning of any line that needs to be matched :

                          FIND (?-s)(?<=Phrase A: ).*\R\K(?=Phrase A: )

                          REPLACE ¶

                          After a click on the Replace All button, it produces this temporary OUTPUT :

                          Phrase A: Words
                          ¶Phrase A: Words
                          ¶Phrase A: Words
                          Phrase B: Words
                          Phrase B: Words
                          Phrase A: Words
                          ¶Phrase A: Words
                          Phrase B: Words
                          Phrase A: Words
                          ¶Phrase A: Words
                          

                          Then, the trivial S/R, below, would delete the string ¶Phrase A:\x20 at the beginning of every marked line :

                          FIND (?-i)^¶Phrase A:\x20

                          REPLACE Leave EMPTY

                          And you get your expected OUTPUT text :

                          Phrase A: Words
                          Words
                          Words
                          Phrase B: Words
                          Phrase B: Words
                          Phrase A: Words
                          Words
                          Phrase B: Words
                          Phrase A: Words
                          Words
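The two-step idea can also be tried outside Notepad++. The sketch below is a Python adaptation, not the Boost expressions above: Python's `re` module has no `\K` or `\R`, so step 1 uses a capture group and `\n` instead. The function name and the choice of `¶` as the sentinel are assumptions for illustration.

```python
import re

MARK = "¶"  # sentinel; assumed not to occur anywhere in the input


def remove_repeated_phrase(text, phrase="Phrase A: "):
    """Tag the lines to change first, then strip tag and phrase together."""
    if MARK in text:
        raise ValueError("sentinel already present; pick another character")
    p = re.escape(phrase)
    # Step 1: put the sentinel in front of each line that starts with the
    # phrase and directly follows a line containing the phrase.
    marked = re.sub(rf"({p}.*\n)(?={p})", rf"\1{MARK}", text)
    # Step 2: delete the sentinel together with the phrase it tags.
    return re.sub(rf"(?m)^{re.escape(MARK)}{p}", "", marked)


sample = (
    "Phrase A: Words\nPhrase A: Words\nPhrase A: Words\n"
    "Phrase B: Words\nPhrase B: Words\n"
    "Phrase A: Words\nPhrase A: Words\n"
    "Phrase B: Words\n"
    "Phrase A: Words\nPhrase A: Words\n"
)
result = remove_repeated_phrase(sample)
print(result)
```

Because step 1 only adds characters and never removes the phrase, every line still sees its predecessor intact when it is tested.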
                          

                          However, I admit that the @coises’s solution, with the Plugins > Columns++ > Search... option and using the (?-s)(?<=Phrase A: ).*\R\KPhrase A:\x20 regex, is more simple and obvious !


                          Note that if, in the first S/R, I had written (?-s)(?<=^Phrase A: ).*\R\K(?=Phrase A: ), the second match, in line 3, would not have been found: by then line 2 already begins with ¶, and the regex would have required that the word Phrase BEGINS the line !


                          IMPORTANT note :

                          If a MARK operation only concerns zero-length strings, like my first S/R, it just returns the message Mark: 0 matches in entire file or the message Mark: 0 matches from caret to end of file, instead of the message Mark: # matches..., with # > 0. However, as shown above, the Replace All operation did perform the needed replacements ! So, I think it’s a Notepad++ bug and I should create an issue about it !

                          Of course, the fact that no highlighting occurs is quite logical because one cannot highlight zero-length strings !! And I suppose this leads to the 0 matches result :-((

                          Best Regards,

                          guy038

                          P.S. :

                          If I use my first search regex (?-s)(?<=Phrase A: ).*\R\K(?=Phrase A: ), with the Columns++ plugin, and click on the Select All button, it does find 4 zero-length matches !

                            guy038

                            Hi, All,

                            As promised, here is my issue on GitHub :

                            https://github.com/notepad-plus-plus/notepad-plus-plus/issues/16279

                            BR

                            guy038

                              Lycan Thrope @Terry R

                              @Terry-R said in Why won't this PCRE regular expression work in Notepad++ when it works on regex101?:

                              Actually I think you missed the OP’s need. They said:

                              I did say, I may have misunderstood what he was testing for. I thought the first line being highlighted by a background color indicated it was chosen, so yeah that’s my bad…and I did say I’m no regex expert. :-)

                              I kind of agree with @Alan-Kilborn and @guy038 how he maybe should have proceeded, but obviously per @guy038 , @Coises solution was a better option for him. I don’t know, I’m not currently working in his plugin, so wouldn’t have known if that would work. I’m still trying to work out some kind of solution, but seem not to be able to treat the regex like a programming logic, or don’t know the proper syntax of the Boost regex to try and make something work the way we want.
                              Since most of you accomplished guys aren’t seemingly able to solve it, I don’t hold out hope for myself, either, but at least it’s making me read more of the documentation and found some new things to look at and learn, like \G (little L). Just learning and will be wrong. :-)

                                Coises @Lycan Thrope

                                @Lycan-Thrope said in Why won't this PCRE regular expression work in Notepad++ when it works on regex101?:

                                I kind of agree with @Alan-Kilborn and @guy038 how he maybe should have proceeded, but obviously per @guy038 , @Coises solution was a better option for him.

                                It’s kind of a weird regular expression. Had I been trying to solve this problem from scratch, I probably would have done something like @guy038 suggested.

                                The only reason I brought up Columns++ was because the original poster succeeded in getting the expression to match exactly what he wanted, but replacement with an empty string didn’t work. He wanted a way to select all the matches so he could delete them, and that is something Columns++ search offers that Notepad++ search does not.

                                The thing that makes this tricky is that regular expression replacements occur after each corresponding match; it doesn’t first match everything, then do the replacements. And for this problem, some matches (third and subsequent lines beginning with “Phrase A: ”) fail once the previously matched text has been replaced.

                                @guy038’s suggestion is the normal way of solving this sort of problem. You “mark” each match by adding some character that you know doesn’t occur in the file, then you use those marks to make the changes you need in a second step. I don’t think there is, in general, a way to do it in a single step (though of course in any specific case there might be a “trick”).

                                Columns++ offers an alternative approach by allowing multi-selection of all the matches without changing the file. In this case, just pressing delete was enough for the second step; in a more complex case, Columns++ allows you to reset the search region to match the multi-selection, so you could then search, or search and replace, within the previous matches. I think it’s powerful, though completely non-standard. You could always accomplish the same thing with the “add a character that isn’t used in the file” technique, though.
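The "find every match first, change the text afterwards" idea can be sketched in Python. This only illustrates the principle and is not how Columns++ is implemented; the function names are hypothetical, and the pattern is a Python adaptation because the `re` module lacks `\K`.

```python
import re


def find_spans(text, phrase="Phrase A: "):
    """Collect (start, end) of every phrase to delete without editing the text."""
    p = re.escape(phrase)
    # The phrase to delete is captured inside a lookahead, so it is not
    # consumed and the same line can serve as the predecessor of the next one.
    pattern = re.compile(rf"{p}.*\n(?=({p}))")
    return [m.span(1) for m in pattern.finditer(text)]


def delete_spans(text, spans):
    """Delete the spans from right to left so earlier offsets stay valid."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + text[end:]
    return text


sample = (
    "Phrase A: Words\nPhrase A: Words\nPhrase A: Words\n"
    "Phrase B: Words\nPhrase B: Words\n"
    "Phrase A: Words\nPhrase A: Words\n"
    "Phrase B: Words\n"
    "Phrase A: Words\nPhrase A: Words\n"
)
spans = find_spans(sample)
result = delete_spans(sample, spans)
print(len(spans))
print(result)
```

Since all spans are computed against the unmodified text, no deletion can invalidate a later match, which is the same property the multi-selection gives.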

                                  guy038

                                  Hi All,

                                  See my important and final comment about my own issue :

                                  https://github.com/notepad-plus-plus/notepad-plus-plus/issues/16279#issuecomment-2726313083

                                  BR

                                  guy038
