    Anyone can help with this regex?

    • Shayne Z.

      So I have data like the following:

      1.gooddata
      2.gooddata
      3.gooddata
      FF

      random
      notrelevant


      header


      4.gooddata
      5.gooddata
      6.gooddata
      FF

      and it goes over and over again. My question is, how do I use regex to find “FF” as a start point and delete everything in between the “FF” and “- - - - - -” so the final output would be like this:

      1.gooddata
      2.gooddata
      3.gooddata
      4.gooddata
      5.gooddata
      6.gooddata

      Many thanks for reading my post.

      • dail

        Search for FF.*?- - - - - - and make sure to check the box that says . matches newlines

        In general if you have any starting string S and ending string E you can just put .*? in between them like S.*?E

        Edit: Well this would get you part of the way I think…
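
As a rough illustration of the S.*?E idea outside Notepad++, here is a minimal Python sketch. The function name delete_between and the sample text are made up for this example, and the sample assumes that the “- - - - - -” line really appears in the data. re.DOTALL plays the role of the “. matches newline” checkbox.

```python
import re

def delete_between(text, start, end):
    """Delete the shortest span that begins with `start` and ends with `end`."""
    # re.escape treats both delimiters as literal text;
    # re.DOTALL lets "." cross line breaks, like ". matches newline".
    pattern = re.compile(re.escape(start) + r".*?" + re.escape(end), re.DOTALL)
    return pattern.sub("", text)

# Hypothetical sample: the dashed line is an assumption about the real data.
sample = "1.gooddata\nFF\n\nrandom\n- - - - - -\n2.gooddata\n"
cleaned = delete_between(sample, "FF", "- - - - - -")
```

The line breaks around the deleted span survive, so a blank line is left behind. That is the “part of the way” caveat from the edit above.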

        • Scott Sumner

          This should do it, as best I can tell from your description of the data (i.e., without getting too crazy about trying to catch possible situations you didn’t describe, for example, are there space characters after your FF data on the lines…):

          Find what box:

          (?s)FF\R.*?FF\R
          

          Replace with box: make sure it is empty!

          Search Mode: Regular expression

          • tomas-chrastina

            Hi,

            I’m not sure if your sample is complete. I can also see a header section there that you didn’t mention when you talked about just FF and - - - - - -, so I’m not sure whether it’s all part of the text.

            But try this:

            1. Backup your file !!!
            2. CTRL + H (Replace)
            3. Find what: ^((FF|header)[\s\S]*?- - - - - -|\s*)$[\r\n]+
              Replace with: (empty => delete)
              Search Mode: Regular expression
            4. Replace All

            My short explanation of: ^((FF|header)[\s\S]*?- - - - - -|\s*)$[\r\n]+

            • Look for line starting with FF OR header. If found, select all following text, until you reach - - - - - -.
            • In addition (OR) select blank lines.
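
To try this pattern outside Notepad++, a small Python sketch follows. The sample text is an assumption: it supposes that a “- - - - - -” line closes both the FF block and the header block, which the posted data does not show. re.MULTILINE is needed so that ^ and $ work per line, as they do in Notepad++.

```python
import re

# Same pattern as in the post; MULTILINE makes ^ and $ match at line boundaries.
CLEANUP = re.compile(r"^((FF|header)[\s\S]*?- - - - - -|\s*)$[\r\n]+", re.MULTILINE)

# Hypothetical sample: the dashed lines are assumed, they are not in the posted data.
SAMPLE = (
    "1.gooddata\n"
    "2.gooddata\n"
    "FF\n"
    "\n"
    "random\n"
    "notrelevant\n"
    "- - - - - -\n"
    "header\n"
    "- - - - - -\n"
    "3.gooddata\n"
    "4.gooddata\n"
)

def clean(text):
    """Remove FF/header blocks (up to the dashed line) and blank lines."""
    return CLEANUP.sub("", text)

result = clean(SAMPLE)
```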

            That’s as much as I can get from your text. But if there are some spaces or anything else that differs, just post updated data so we can update the pattern to match it.

            For a complete technical explanation of the pattern, paste the expression into Regex101.

            • guy038

              Hello Shayne Z. and All,

              I think I’ve got a general regex which lets you search for and delete the smallest range between two strings, let’s say ABC and XYZ, INCLUDING the two lines that contain these strings ABC and XYZ. So :

              • The first line deleted will be the line containing the string ABC. This line may be any of these four forms : ABC or ABC789 or 123ABC or 123ABC789.

              • The nearest line, containing the string XYZ, will be the last line deleted. This line, as well, may be any of the four forms : XYZ or XYZ789 or 123XYZ or 123XYZ789

              • Every line, even blank or empty ones, between the two lines above, will be deleted


              This regex does work for particular cases such as :

              • A single line, containing the two strings ABC and XYZ

              • Two consecutive lines, containing ABC, then XYZ

              • Lines containing several start delimiter ABC and/or end delimiter XYZ

              • Lines with a mixed form of these two delimiters, as, for instance, the line 123ABC456XYZ789XYZ012ABC345ABCXYZ6789

              Of course, you must replace the example delimiters ABC and XYZ, by your own strings, used as delimiters !


              So, just follow the few steps, below :

              • Select a range of text, ONLY IF you want to restrict the deletion to a part of your file

              • Open the Replace dialog ( CTRL + H )

              • Choose the Regular expression search mode

              • Check, preferably, the Match case option

              • Check the In selection option, if you previously selected some amount of text

              • In the Find what zone, type in (?-s)^.*ABC(?s).*?(?-s)XYZ.*(\R|\z)

              • Leave the Replace With zone EMPTY

              • Finally, click on the Replace All button

              Et voilà !


              Some explanations :

              • The (?-s) syntax is a modifier which means that the DOT character does NOT match the END of LINE characters ( \r, \n or \r\n ). Note that the opposite form, (?s), means that, from now on, the DOT matches absolutely ANY character !

              • The regex ^.*ABC matches from a beginning of line to the last string ABC found, further, in the SAME line

              • The regex (?s).*? matches any character, EVEN the END of LINE character(s), till the nearest string XYZ, found, further, even some lines after !

              • The regex (?-s)XYZ.* matches the string XYZ, then any standard character, on the SAME line, till its END of LINE character(s)

              • Finally, the regex (\R|\z) matches any EOL character(s) ( \r\n in a Windows file, \n in an UNIX file or \r in an old MAC file ) OR the VERY end of the file
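
For readers who want to reproduce this outside Notepad++, below is a minimal Python sketch of the same idea. It is an approximation, not the Boost regex itself: Python's re module has no \R, and recent Python versions reject a global (?s) in the middle of a pattern, so the sketch scopes the "dot matches newline" behaviour with (?s:...) and assumes \n or \r\n line endings. The function name delete_line_blocks is hypothetical.

```python
import re

def delete_line_blocks(text, start, end):
    """Delete whole lines from the one containing `start`
    through the nearest following line containing `end`."""
    pattern = re.compile(
        r"^.*" + re.escape(start)   # start of line up to the LAST `start` on that line
        + r"(?s:.*?)"               # lazily cross line breaks up to the nearest `end`
        + re.escape(end) + r".*"    # `end` and the rest of its line
        + r"(?:\n|\Z)",             # the line break, or the very end of the text
        re.MULTILINE,
    )
    return pattern.sub("", text)

# Invented sample text for the illustration.
sample = "keep 1\nxx ABC yy\nmid\nzz XYZ ww\nkeep 2\nABCXYZ\nkeep 3\n"
cleaned = delete_line_blocks(sample, "ABC", "XYZ")
```

Because "." does not match \n by default in Python, only the middle part needs the scoped (?s:...) group.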


              IMPORTANT :

              The way I put the different option modifiers, in the regex above, allows you to use regexes, instead of fixed strings, as delimiters :-) For instance, let’s suppose that :

              • The first line to delete would be a line containing the string ABC and, further, on the same line, the string DEF,

              • The last line to delete would be a line containing the string UVW and, further, on the same line, the string XYZ

              In that case, the search regex, above, would become :

              (?-s)^.*ABC.*DEF(?s).*?(?-s)UVW.*XYZ.*(\R|\z)
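
The regex-as-delimiter variant can be tried in Python too. Again this is only an approximation of the Boost regex, assuming \n line endings, with (?s:.*?) standing in for the (?s) ... (?-s) switch. The sample text is invented for the illustration.

```python
import re

# Start line must contain ABC and, later on the same line, DEF;
# end line must contain UVW and, later on the same line, XYZ.
BLOCK = re.compile(r"^.*ABC.*DEF(?s:.*?)UVW.*XYZ.*(?:\n|\Z)", re.MULTILINE)

sample = (
    "ABC alone\n"            # kept: no DEF on this line
    "1 ABC 2 DEF 3\n"        # first deleted line
    "middle\n"
    "UVW only\n"             # not an end line: no XYZ after UVW
    "4 UVW 5 XYZ 6\n"        # last deleted line
    "keep too\n"
)

cleaned = BLOCK.sub("", sample)
```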

              Best regards,

              guy038

              • guy038

                Hi All,

                I forgot to give an example of the general search/replace detailed in my previous post !

                So, taking the upper-case string ABC as the start delimiter and the upper-case string XYZ as the end delimiter leads to the regex :

                • SEARCH = (?-s)^.*ABC(?s).*?(?-s)XYZ.*(\R|\z)

                • REPLACE = NOTHING

                The text, below :

                This line, containing ABC, will be deleted
                This is a BLOCK
                
                of text which will			 
                be DELETED
                
                as well as this line XYZ
                This piece of text
                
                will NOT be DELETED
                
                but the BLOCK of the TWO NEXT ONES will
                ABC
                XYZ
                This text, with some blank lines,
                
                
                won't be modified, but the NEXT line will !
                ABCXYZ
                
                The BLOCK of the TWO NEXT lines, below, will be DELETED
                12345ABC 67890 ABC
                --- XYZ XYZ ---
                
                as well as this LAST block, below
                --- ABC --- XYZ --- ABC  
                
                --- ABC --- XYZ --- XYZ --- ABC --- ABCXYZ ---
                

                will be CHANGED into :

                This piece of text
                
                will NOT be DELETED
                
                but the BLOCK of the TWO NEXT ONES will
                This text, with some blank lines,
                
                
                won't be modified, but the NEXT line will !
                
                The BLOCK of the TWO NEXT lines, below, will be DELETED
                
                as well as this LAST block, below
                

                Cheers,

                guy038

                • pbarney @guy038

                  @guy038, I swear, I learn so much from you. I had no idea that (?s) and (?-s) could be used anywhere within a search string (and more than once!)

                  That is so creative and really a good example of outside the box thinking. It opens up a whole new class of text manipulations for me.

                  Thank you for taking the time to share your black-belt level regex experience. If you had a tip jar, I’d put some money in it!

                  • mkupper @pbarney

                    @pbarney Something I’m starting to use more are things like (?-i:sub-expression). In that example the ignore-case flag is turned off just for whatever the sub-expression is.

                    I’ll also often spread expressions out over multiple lines using free spacing mode but I turn free-spacing off within lines of my regexp:

                    (?x-i) # Mark shutdown/startup log lines
                    ^[01][0-9]/[0-3][0-9]/20[0-9][0-9]\ [012][0-9]:[0-5][0-9]\ (?:
                    (?-x:Shut down.*\R+)|
                    (?-x:Start up.*\R+)|
                    (?-x:Logged in. Device was booted at (?'booted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9])(?: adjusted to (?'adjusted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9]))?\R)
                    )
                    
                    • Line 1 starts free-spacing mode and turns ignore-case off, so matching is case-sensitive.
                    • Line 2 matches the timestamp at the start of the log lines I’m dealing with.
                    • Lines 3, 4, and 5 are the various types of log lines I’m interested in. As I don’t want to have to be alert for \ escaping spaces while I’m in free-spacing mode, I turn free-spacing off for the body of each line using (?-x:sub-expression\R+).

                    This allows me to focus on the regexp syntax one line at a time. I can select one line and verify with Ctrl+F that the pattern works. It also makes it easy to add or remove lines; the only thing to watch for is that the last line must not have a trailing |.

                    The commonly used (?:sub-expression) non-capturing group is a subset of this system.
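
Python's re module supports a similar trick: as far as I can tell, a scoped group such as (?-x:...) switches free-spacing off inside a re.VERBOSE pattern. The sketch below is a cut-down, hypothetical version of the log pattern above. It uses Python's (?P<name>...) named groups, since re does not accept the (?'name'...) form, and it drops \R, which re does not have.

```python
import re

# Free-spacing (verbose) mode outside, literal spaces inside each (?-x:...) group.
LOG_LINE = re.compile(
    r"""
    ^[01][0-9]/[0-3][0-9]/20[0-9][0-9]\ [012][0-9]:[0-5][0-9]\ (?:   # timestamp, escaped spaces
    (?-x:Shut down.*)|
    (?-x:Start up.*)|
    (?-x:Logged in\. Device was booted at (?P<booted>[0-9/]+ [0-9:]+))
    )$
    """,
    re.VERBOSE,
)

def booted_at(line):
    """Return the boot timestamp from a 'Logged in' line, or None."""
    m = LOG_LINE.match(line)
    return m.group("booted") if m else None
```

If free-spacing were still active inside the groups, "Shut down" would be read as "Shutdown", so a line containing "Shut down" is a quick check that the scoping works.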

                    • gerdb42 @mkupper

                      @mkupper Don’t feed these (AI-)Trolls. What Bio-Brain would dig up a 10-year-old thread just to post some praise?

                      • pbarney @gerdb42

                        @gerdb42 said in Anyone can help with this regex?:

                        @mkupper Don’t feed these (AI-)Trolls.

                        Dude. Look at my post history.

                        What Bio-Brain would dig up a 10-year-old thread just to post some praise?

                        How about someone who does a search to find an answer instead of just posting a new question that might have been answered a dozen times before?

                        How about someone who appreciates the consistently high-quality posts of someone who puts a lot of time in here to help people without a thought of reward?

                        Relax, Mr. hall monitor. There’s a reason that topics remain open; some questions and answers remain relevant for a long, long time.

                        • guy038

                          Hello, @mkupper, @pbarney, @gerdb42 and All,

                          Oh, yes, @mkupper, your use of the (?-x:sub_expression) is quite interesting and I’ve never thought of such syntax, before !

                          (?x-i) # Mark shutdown/startup log lines
                          ^[01][0-9]/[0-3][0-9]/20[0-9][0-9]\ [012][0-9]:[0-5][0-9]\ (?:
                          (?-x:Shut down.*\R+)|
                          (?-x:Start up.*\R+)|
                          (?-x:Logged in. Device was booted at (?'booted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9])(?: adjusted to (?'adjusted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9]))?\R)
                          )
                          

                          Note that when there are only one or two characters to modify on each line of a multi-line regex, you could use a composite regex like the one below :

                          (?x-i) # 'RESPECT case' mode and 'FREE-spacing' mode :
                          
                                   # Ignore any amount of NON-ESCAPED '\s' chars which lie OUTSIDE a CHARACTER CLASS
                                   # Ignore ANY text beginning with a NON-ESCAPED # character till the end of CURRENT line
                          
                          # Mark shutdown/startup log lines
                          ^
                          [01][0-9]/[0-3][0-9]/20[0-9][0-9]\ [012][0-9]:[0-5][0-9][ ]
                          (?:
                            (Shut[ ]down.*\R+) |
                            (Start\ up.*\R+)   |
                            (?-x:Logged in. Device was booted at (?'booted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9])(?: adjusted to (?'adjusted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9]))?\R)
                          )
                          

                          Best Regards,

                          guy038

                          • mkupper @guy038

                            @guy038 said in Anyone can help with this regex?:

                            your use of the (?-x:sub_expression) is quite interesting and I’ve never thought of such syntax, before !

                            Something that puzzles me in the Boost manual is the first part of the Modifiers section which has

                            (?imsx-imsx … ) alters which of the perl modifiers are in effect within the pattern, changes take effect from the point that the block is first seen and extend to any enclosing ). Letters before a ‘-’ turn that perl modifier on, letters afterward, turn it off.

                            The thing that bugs me is the ... part and the words changes take effect from the point that the block is first seen and extend to any enclosing.

                            Is the space, dot-dot-dot, space supposed to be a sub-expression?

                            The syntax on the next line in the manual, “(?imsx-imsx:pattern) applies the specified modifiers to pattern only”, makes perfect sense, as the colon is the delimiter. Is there a way to have a sub-expression or pattern when using (?imsx-imsx ... )?

                            I understand the (?imsx-imsx) style syntax to turn flags on and off, but why is space, dot-dot-dot, space in the manual?

                            • guy038

                              Hi, @mkupper and All,

                              IMO, it’s probably a typo ! I suppose that it just means (?imsx-imsx: ... ) with anything after the colon till the ending parenthesis.


                              For example, I tested all the cases below, and indeed, the only correct syntax seems to be : (?i:pattern)

                              (?i!pattern)  => 'Find: Invalid Regular Expression' message
                              (?i"pattern)  => 'Find: Invalid Regular Expression' message
                              (?i#pattern)  => 'Find: Invalid Regular Expression' message
                              (?i$pattern)  => 'Find: Invalid Regular Expression' message
                              (?i%pattern)  => 'Find: Invalid Regular Expression' message
                              (?i&pattern)  => 'Find: Invalid Regular Expression' message
                              (?i'pattern)  => 'Find: Invalid Regular Expression' message
                              (?i(pattern)  => 'Find: Invalid Regular Expression' message
                              (?i)pattern)  => 'Find: Invalid Regular Expression' message
                              (?i*pattern)  => 'Find: Invalid Regular Expression' message
                              (?i+pattern)  => 'Find: Invalid Regular Expression' message
                              (?i,pattern)  => 'Find: Invalid Regular Expression' message
                              (?i.pattern)  => 'Find: Invalid Regular Expression' message
                              (?i-pattern)  => 'Find: Invalid Regular Expression' message
                              (?i/pattern)  => 'Find: Invalid Regular Expression' message
                              (?i0pattern)  => 'Find: Invalid Regular Expression' message
                              
                              (?i:pattern)  =>  Match any string 'pattern' WHATEVER its case
                              
                              (?i;pattern)  => 'Find: Invalid Regular Expression' message
                              (?i<pattern)  => 'Find: Invalid Regular Expression' message
                              (?i=pattern)  => 'Find: Invalid Regular Expression' message
                              (?i>pattern)  => 'Find: Invalid Regular Expression' message
                              (?i?pattern)  => 'Find: Invalid Regular Expression' message
                              (?i@pattern)  => 'Find: Invalid Regular Expression' message
                              (?iApattern)  => 'Find: Invalid Regular Expression' message
                              (?i[pattern)  => 'Find: Invalid Regular Expression' message
                              (?i\pattern)  => 'Find: Invalid Regular Expression' message
                              (?i]pattern)  => 'Find: Invalid Regular Expression' message
                              (?i^pattern)  => 'Find: Invalid Regular Expression' message
                              (?i_pattern)  => 'Find: Invalid Regular Expression' message
                              (?i`pattern)  => 'Find: Invalid Regular Expression' message
                              (?iapattern)  => 'Find: Invalid Regular Expression' message
                              (?i{pattern)  => 'Find: Invalid Regular Expression' message
                              (?i|pattern)  => 'Find: Invalid Regular Expression' message
                              (?i}pattern)  => 'Find: Invalid Regular Expression' message
                              (?i~pattern)  => 'Find: Invalid Regular Expression' message
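
Python's re module follows the same (?flags:pattern) convention, so the scoping is easy to check in a short sketch. This is Python's engine, not Boost, so error messages and edge cases will differ; the helper names are made up.

```python
import re

# The i flag applies only inside the group; "def" stays case-sensitive.
SCOPED = re.compile(r"(?i:abc)def")

def matches(text):
    """True if the whole text matches the scoped pattern."""
    return SCOPED.fullmatch(text) is not None

def is_valid(pattern):
    """True if Python's re accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False
```

For instance, matches("ABCdef") holds while matches("ABCDEF") does not, and is_valid("(?i!pattern)") is False because only a colon, a hyphen or a closing parenthesis may follow the flag letters.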
                              

                              Best Regards,

                              guy038

                              • mkupper @guy038

                                @guy038 You may have some fun with (?P:...)

                                It’s not included in the documentation but is supported by Boost. https://stackoverflow.com/questions/10059673/named-regular-expression-group-pgroup-nameregexp-what-does-p-stand-for has a fascinating background.

                                I found that as I wondered if there were any valid flags other than [smix]. I also found that Boost does not care if you use a flag more than once. If a flag is both before and after the - then it’s turned off. Boost does not complain about (?-:...)

                                • guy038

                                  Hello, @mkupper and All,

                                  Ah… OK. So I ran another series of tests, below :

                                  (?!:pattern)  =>  Search any empty string, NOT followed by the string ':pattern'  => So, roughly, match any EMPTY string
                                  
                                  (?":pattern)  =>  'Find Invalid Regular Expression' message
                                  
                                  (?#:pattern)  =>  Search any empty string, followed by the comment ':pattern'      => So, roughly, match any EMPTY string
                                  
                                  (?$:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?%:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?&:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?':pattern)  =>  'Find Invalid Regular Expression' message
                                  (?(:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?):pattern)  =>  'Find Invalid Regular Expression' message
                                  (?*:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?+:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?,:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?.:pattern)  =>  'Find Invalid Regular Expression' message
                                  
                                  (?-:pattern)  =>  Match any string 'pattern', according to the 'Match case' option 'ON' or 'OFF'
                                  
                                  (?/:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?0:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?1:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?2:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?3:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?4:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?5:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?6:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?7:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?8:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?9:pattern)  =>  'Find Invalid Regular Expression' message
                                  
                                  (?::pattern)  =>  Match any string ':pattern', according to the 'Match case' option 'ON' or 'OFF'
                                  
                                  (?;:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?<:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?=:pattern)  =>  'Find Invalid Regular Expression' message
                                  
                                  (?>:pattern)  =>  Match any ATOMIC string ':pattern', according to the 'Match case' option 'ON' or 'OFF'
                                  
                                  (??:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?@:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?A:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?B:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?C:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?D:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?E:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?F:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?G:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?H:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?I:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?J:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?K:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?L:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?M:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?N:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?O:pattern)  =>  'Find Invalid Regular Expression' message
                                  
                                  (?P:pattern)  =>  Match any string 'pattern', according to the 'Match case' option 'ON' or 'OFF'
                                  (?Ppattern)   =>  
                                  
                                  (?Q:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?R:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?S:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?T:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?U:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?V:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?W:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?X:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?Y:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?Z:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?[:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?\:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?]:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?^:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?_:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?`:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?a:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?b:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?c:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?d:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?e:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?f:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?g:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?h:pattern)  =>  'Find Invalid Regular Expression' message
                                  
                                  (?i:pattern)  =>  Match any string 'pattern', WHATEVER its case
                                  
                                  (?j:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?k:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?l:pattern)  =>  'Find Invalid Regular Expression' message
                                  
                                  (?m:pattern)  =>  Match any string 'pattern', according to the 'Match case' option 'ON' or 'OFF'
                                  
                                  (?n:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?o:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?p:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?q:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?r:pattern)  =>  'Find Invalid Regular Expression' message
                                  
                                  (?s:pattern)  =>  Match any string 'pattern', according to the 'Match case' option 'ON' or 'OFF'
                                  
                                  (?t:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?u:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?v:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?w:pattern)  =>  'Find Invalid Regular Expression' message
                                  
                                  (?x:pattern)  =>  Match any string 'pattern', according to the 'Match case' option 'ON' or 'OFF'
                                  
                                  (?y:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?z:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?{:pattern)  =>  'Find Invalid Regular Expression' message
                                  
                                  (?|:pattern)  =>  Match any string ':pattern', according to the 'Match case' option 'ON' or 'OFF'
                                  
                                  (?}:pattern)  =>  'Find Invalid Regular Expression' message
                                  (?~:pattern)  =>  'Find Invalid Regular Expression' message
                                  

                                  The syntaxes (?P<Name>Regex) and (?P=Name), described in your stackoverflow article, are NOT correct with the present Boost implementation within Notepad++

                                  Refer to these two links :

                                  https://www.regular-expressions.info/refext.html

                                  https://www.regular-expressions.info/refreplacebackref.html

                                  For each, once opened, select, if necessary, the Boost choice in the first drop-down list and the Python choice in the second drop-down list and compare… !

                                  For instance, the regex (?P<Test>\d+) triggers the Invalid Regular Expression message, whereas the syntaxes (?<Test>\d+) or (?'Test'\d+) do find any NON-empty range of digits

                                  In the same way, the regex (?<Test>\d+)ABC(?P=Test) is not valid, whereas the syntaxes (?<Test>\d+)ABC\g<Test> or (?-i)(?<Test>\d+)ABC\k<Test> do find any string ABC preceded and followed by the same string of digits

                                  Test my assumptions with the example text, below :

                                  1ABC2345
                                  123ABC123
                                  12345ABC9
                                  01ABC01234
                                  ABC
                                  12345ABC12345
                                  123ABC456
                                  6789ABC89
                                  

                                  Note that the (?P:pattern) syntax, in the first part of this post, looks like a (?P<Name>Regex) syntax where the name part is simply replaced by a colon ?!


                                  With the example text above, see also the main difference between :

                                  • Searching with any of the 12 following regex syntaxes :

                                    • (?-i)(?<Test>\d+)ABC\g{Test}

                                    • (?-i)(?<Test>\d+)ABC\g<Test>

                                    • (?-i)(?<Test>\d+)ABC\g'Test'

                                    • (?-i)(?<Test>\d+)ABC\k{Test}

                                    • (?-i)(?<Test>\d+)ABC\k<Test>

                                    • (?-i)(?<Test>\d+)ABC\k'Test'

                                    • (?-i)(?'Test'\d+)ABC\g{Test}

                                    • (?-i)(?'Test'\d+)ABC\g<Test>

                                    • (?-i)(?'Test'\d+)ABC\g'Test'

                                    • (?-i)(?'Test'\d+)ABC\k{Test}

                                    • (?-i)(?'Test'\d+)ABC\k<Test>

                                    • (?-i)(?'Test'\d+)ABC\k'Test'

                                  • And searching with any of the following 4 regexes :

                                    • (?-i)(?<Test>\d+)ABC(?&Test)

                                    • (?-i)(?<Test>\d+)ABC(?P>Test)

                                    • (?-i)(?'Test'\d+)ABC(?&Test)

                                    • (?-i)(?'Test'\d+)ABC(?P>Test)


                                  • In the first case, the last part of the regex, after the string ABC, is a back-reference to the present value of the named group Test

                                  • In the second case, the last part of the regex, after the string ABC, is a back-reference to the named group Test itself. So, these four regexes should match every line of my example text except the line containing the string ABC alone !
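A rough way to check the difference between the two cases outside Notepad++, assuming Python's re module as a stand-in. Python uses (?P<Test>...) and (?P=Test) rather than the Boost syntaxes listed above, and it has no (?&Test) subroutine call, so the second case is only approximated here by writing \d+ out a second time. The names SAMPLE, backref and subroutine_like are made up for this sketch.

```python
import re

# The example text from this post, one entry per line.
SAMPLE = """1ABC2345
123ABC123
12345ABC9
01ABC01234
ABC
12345ABC12345
123ABC456
6789ABC89"""

# First case: back-reference to the VALUE captured by group Test.
# (Python spelling; Notepad++ would use \g{Test}, \k<Test>, etc.)
backref = re.compile(r"(?P<Test>\d+)ABC(?P=Test)")

# Second case, approximated: Python's re has no (?&Test), so the
# PATTERN of group Test (\d+) is simply repeated after ABC.
subroutine_like = re.compile(r"(?P<Test>\d+)ABC\d+")

lines = SAMPLE.splitlines()
backref_hits = [ln for ln in lines if backref.search(ln)]
subroutine_hits = [ln for ln in lines if subroutine_like.search(ln)]

print(backref_hits)     # ['123ABC123', '01ABC01234', '12345ABC12345', '6789ABC89']
print(subroutine_hits)  # every line except 'ABC'
```

Note that 6789ABC89 is found by the back-reference version only because the search may start in the middle of the line (89ABC89); the match does not cover the whole line.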

                                  Remark : references to a named group are case-sensitive, so the name must be written exactly as it was defined. Otherwise, a Find: Invalid Regular expression message is returned !


                                  In replacement, if you need to refer to a named group, you can use the $+{Test} syntax. However, note that it will always rewrite the value of named group Test when it was defined and not the last value of group Test !

                                  Best Regards,

                                  guy038

                                  • Alan KilbornA
                                    Alan Kilborn @guy038
                                    last edited by Alan Kilborn

                                    @guy038 said:

                                    In the first case, the last part of the regex, after the string ABC, is a back-reference to the present value of the named group Test

                                    Thus, using the first regex (of the first case), (?-i)(?<Test>\d+)ABC\g{Test}, against the data on the sample line 1ABC2345, that line won’t be matched because the Test group captured 1 (for a match to occur, that data would have to start with 1ABC1).

                                    In the second case, the last part of the regex, after the string ABC, is a back-reference to the named group Test itself

                                    Using the first regex (of the second case), (?-i)(?<Test>\d+)ABC(?&Test), the sample line data, 1ABC2345, will be matched, because the instruction is to use the regex of the named group (\d+), not the captured data from the actual match (so any sequence of digits occurring before ABC and a sequence of any digits after).

                                    • Alan KilbornA
                                      Alan Kilborn @guy038
                                      last edited by

                                      @guy038 said:

                                      In replacement, if you need to refer to a named group, you can use the $+{Test} syntax. However, note that it will always rewrite the value of named group Test when it was defined and not the last value of group Test !

                                      I suppose if you wanted to use the “last value of group Test” in the replacement, you could add a capture group, i.e.,

                                      (?-i)(?<Test>\d+)ABC(?<foo>(?&Test))

                                      and then $+{foo} would be available for use in the replacement string.

                                      So, in the 1ABC2345 test line, $+{foo} would expand to 2345. (And $+{Test} would be 1.)
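The same idea can be sketched in Python, with the caveat that Python's re module lacks (?&Test), so \d+ is repeated inside the foo group instead. The replacement syntax \g<foo> is Python's own, not Notepad++'s $+{foo}, and the variable names are only illustrative.

```python
import re

# Approximation of (?-i)(?<Test>\d+)ABC(?<foo>(?&Test)):
# the subroutine call is replaced by a literal copy of \d+.
pattern = re.compile(r"(?P<Test>\d+)ABC(?P<foo>\d+)")

m = pattern.search("1ABC2345")
test_value = m.group("Test")  # digits before ABC
foo_value = m.group("foo")    # digits after ABC

# Both named groups are available in the replacement string.
swapped = pattern.sub(r"\g<foo>ABC\g<Test>", "1ABC2345")

print(test_value, foo_value, swapped)  # 1 2345 2345ABC1
```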

                                      • guy038G
                                        guy038
                                        last edited by

                                        Hello, @alan-kilborn, and All,

                                        Alan, you’ve understood all my stuff quite correctly, and you even went further with your last example using $+{Test} and $+{foo}, which I had not thought of !

                                        Best Regards,

                                        guy038

                                        The Community of users of the Notepad++ text editor.