Anyone can help with this regex?
-
Hi All,
I forgot to give an example of the general S/R, detailed in my previous post !
So, taking the upper-case string ABC as the start delimiter and the upper-case string XYZ as the end delimiter leads to the regex :
-
SEARCH =
(?-s)^.*ABC(?s).*?(?-s)XYZ.*(\R|\z)
-
REPLACE =
NOTHING
The text, below :
This line, containing ABC, will be deleted This is a BLOCK of text which will be DELETED as well as this line XYZ This piece of text will NOT be DELETED but the BLOCK of the TWO NEXT ONES will ABC XYZ This text, with some blank lines, won't be modified, but the NEXT line will ! ABCXYZ The BLOCK of the TWO NEXT lines, below, will be DELETED 12345ABC 67890 ABC --- XYZ XYZ --- as well as this LAST block, below --- ABC --- XYZ --- ABC --- ABC --- XYZ --- XYZ --- ABC --- ABCXYZ ---
will be CHANGED into :
This piece of text will NOT be DELETED but the BLOCK of the TWO NEXT ONES will This text, with some blank lines, won't be modified, but the NEXT line will ! The BLOCK of the TWO NEXT lines, below, will be DELETED as well as this LAST block, below
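For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. It is a translation, not the Boost regex itself: Python's re module has no \R and does not accept (?s) in the middle of a pattern, so the sketch assumes \n line endings and uses the scoped form (?s:...) for the lazy middle part. The name delete_blocks and the sample text are made up for illustration.

```python
import re

# Assumed Python translation of the Boost regex in this post:
#   ^.*ABC      rest of the line up to the start delimiter (dot stops at \n)
#   (?s:.*?)    shortest run, newlines included, up to the end delimiter
#   XYZ.*       end delimiter and the rest of its line
#   (?:\n|\Z)   the line break, or the very end of the text
BLOCK = re.compile(r"^.*ABC(?s:.*?)XYZ.*(?:\n|\Z)", re.MULTILINE)


def delete_blocks(text: str) -> str:
    """Remove every block of whole lines running from ABC to XYZ."""
    return BLOCK.sub("", text)


sample = (
    "keep 1\n"
    "drop ABC here\n"
    "middle\n"
    "end XYZ tail\n"
    "keep 2\n"
    "ABCXYZ\n"
    "keep 3\n"
)

if __name__ == "__main__":
    print(delete_blocks(sample), end="")
```

Only the three "keep" lines should survive, since the multi-line block and the single ABCXYZ line are both matched as whole lines.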
Cheers,
guy038
-
-
@guy038, I swear, I learn so much from you. I had no idea that (?s) and (?-s) could be used anywhere within a search string (and more than once!)
That is so creative and really a good example of outside the box thinking. It opens up a whole new class of text manipulations for me.
Thank you for taking the time to share your black-belt level regex experience. If you had a tip jar, I’d put some money in it!
-
@pbarney Something I’m starting to use more are things like (?-i:sub-expression). In that example the ignore-case flag is turned off just for whatever the sub-expression is.
I’ll also often spread expressions out over multiple lines using free-spacing mode, but I turn free-spacing off within lines of my regexp:
(?x-i) # Mark shutdown/startup log lines
^[01][0-9]/[0-3][0-9]/20[0-9][0-9]\ [012][0-9]:[0-5][0-9]\ (?:
(?-x:Shut down.*\R+)|
(?-x:Start up.*\R+)|
(?-x:Logged in. Device was booted at (?'booted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9])(?: adjusted to (?'adjusted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9]))?\R)
)
- Line 1 starts free-spacing mode and turns ignore-case off (the -i), so matching is case-sensitive.
- Line 2 matches the time stamp at the start of the log lines I’m dealing with.
- Lines 3, 4, and 5 are the various types of log lines I’m interested in. As I don’t want to have to be alert for \ escaping spaces while I’m in free-spacing mode, I turn free-spacing off for the body of each line using (?-x:sub-expression\R+)
This allows me to focus on the regexp syntax one line at a time. I can select one line and with Ctrl+F verify that the pattern works. It also makes it easy to add/remove lines; the only thing to stay alert for is that the last line must not have a trailing |.
The commonly used (?:sub-expression) non-capturing group is a subset of this system.
-
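The scoped-modifier idea carries over to other engines. As a rough illustration, here is a Python sketch assuming the standard re module, whose (?-x:...) and (?-i:...) groups behave like the Boost ones described above. The log format, the name LOG_LINE and the sample strings are hypothetical, and the pattern is a simplified stand-in rather than the one from this post.

```python
import re

# Free-spacing (VERBOSE) and ignore-case are on for the whole pattern;
# individual alternatives switch them off locally with scoped groups.
LOG_LINE = re.compile(
    r"""
    ^ [01][0-9] / [0-3][0-9] / 20[0-9][0-9] \    # date, then an escaped space
      [012][0-9] : [0-5][0-9] \                  # time, then an escaped space
    (?:
        (?-x:Shut down.*) |     # free-spacing off, so the space is literal
        (?-x:Start up.*)  |
        (?-xi:Logged in\. Device was booted at .*)   # case-sensitive as well
    )
    """,
    re.VERBOSE | re.IGNORECASE | re.MULTILINE,
)

if __name__ == "__main__":
    for line in (
        "01/02/2024 10:15 SHUT DOWN now",
        "01/02/2024 10:15 Shutdown",
        "01/02/2024 10:15 logged in. device was booted at 09:00",
    ):
        print(line, "->", bool(LOG_LINE.search(line)))
```

Inside each (?-x:...) group the spaces are matched literally, so only the timestamp line needs escaped spaces.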
@mkupper Don’t feed these (AI-)Trolls. What Bio-Brain would dig up a 10-year-old thread just to post some praise?
-
@gerdb42 said in Anyone can help with this regex?:
@mkupper Don’t feed these (AI-)Trolls.
Dude. Look at my post history.
What Bio-Brain would dig up a 10-year-old thread just to post some praise?
How about someone who does a search to find an answer instead of just posting a new question that might have been answered a dozen times before?
How about someone who appreciates the consistently high-quality posts of someone who puts a lot of time in here to help people without a thought of reward?
Relax, Mr. hall monitor. There’s a reason that topics remain open; some questions and answers remain relevant for a long, long time.
-
Hello, @mkupper, @pbarney, @gerdb42 and All,
Oh, yes, @mkupper, your use of the (?-x:sub_expression) is quite interesting and I’ve never thought of such syntax, before !
(?x-i) # Mark shutdown/startup log lines
^[01][0-9]/[0-3][0-9]/20[0-9][0-9]\ [012][0-9]:[0-5][0-9]\ (?:
(?-x:Shut down.*\R+)|
(?-x:Start up.*\R+)|
(?-x:Logged in. Device was booted at (?'booted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9])(?: adjusted to (?'adjusted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9]))?\R)
)
Note that, when only one or two chars need escaping in a line of a multi-line regex, you could use a composite regex, like below :
(?x-i)  # 'RESPECT case' mode and 'FREE-spacing' mode :
        # Ignore any amount of NON-ESCAPED '\s' chars which lie OUTSIDE a CHARACTER CLASS
        # Ignore ANY text beginning with a NON-ESCAPED # character till the end of CURRENT line
# Mark shutdown/startup log lines
^ [01][0-9]/[0-3][0-9]/20[0-9][0-9]\ [012][0-9]:[0-5][0-9][ ]
(?:
(Shut[ ]down.*\R+) |
(Start\ up.*\R+) |
(?-x:Logged in. Device was booted at (?'booted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9])(?: adjusted to (?'adjusted'[01][0-9]/[0-3][0-9]/20[0-9][0-9] [012][0-9]:[0-5][0-9]:[0-5][0-9]))?\R)
)
Best Regards,
guy038
-
@guy038 said in Anyone can help with this regex?:
your use of the (?-x:sub_expression) is quite interesting and I’ve never thought of such syntax, before !
Something that puzzles me in the Boost manual is the first part of the Modifiers section, which has
(?imsx-imsx … ) alters which of the perl modifiers are in effect within the pattern, changes take effect from the point that the block is first seen and extend to any enclosing ). Letters before a ‘-’ turn that perl modifier on, letters afterward, turn it off.
The thing that bugs me is the ... part and the words "changes take effect from the point that the block is first seen and extend to any enclosing )".
Is the space, dot-dot-dot, space supposed to be a sub-expression?
The syntax on the next line in the manual,
(?imsx-imsx:pattern) applies the specified modifiers to pattern only.
makes perfect sense as the colon is the delimiter. Is there a way to have a sub-expression or pattern when using (?imsx-imsx ... ) ?
I understand the (?imsx-imsx) style syntax to turn flags on and off, but why is space, dot-dot-dot, space in the manual?
-
Hi, @mkupper and All,
IMO, it’s probably a typo ! I suppose that it just means
(?imsx-imsx: ... )
with anything after the colon till the ending parenthesis.
For example, I tested all the cases below, and indeed, the only correct syntax seems to be :
(?i:pattern)
(?i!pattern)  =>  'Find: Invalid Regular Expression' message
(?i"pattern)  =>  'Find: Invalid Regular Expression' message
(?i#pattern)  =>  'Find: Invalid Regular Expression' message
(?i$pattern)  =>  'Find: Invalid Regular Expression' message
(?i%pattern)  =>  'Find: Invalid Regular Expression' message
(?i&pattern)  =>  'Find: Invalid Regular Expression' message
(?i'pattern)  =>  'Find: Invalid Regular Expression' message
(?i(pattern)  =>  'Find: Invalid Regular Expression' message
(?i)pattern)  =>  'Find: Invalid Regular Expression' message
(?i*pattern)  =>  'Find: Invalid Regular Expression' message
(?i+pattern)  =>  'Find: Invalid Regular Expression' message
(?i,pattern)  =>  'Find: Invalid Regular Expression' message
(?i.pattern)  =>  'Find: Invalid Regular Expression' message
(?i-pattern)  =>  'Find: Invalid Regular Expression' message
(?i/pattern)  =>  'Find: Invalid Regular Expression' message
(?i0pattern)  =>  'Find: Invalid Regular Expression' message
(?i:pattern)  =>  Match any string 'pattern' WHATEVER its case
(?i;pattern)  =>  'Find: Invalid Regular Expression' message
(?i<pattern)  =>  'Find: Invalid Regular Expression' message
(?i=pattern)  =>  'Find: Invalid Regular Expression' message
(?i>pattern)  =>  'Find: Invalid Regular Expression' message
(?i?pattern)  =>  'Find: Invalid Regular Expression' message
(?i@pattern)  =>  'Find: Invalid Regular Expression' message
(?iApattern)  =>  'Find: Invalid Regular Expression' message
(?i[pattern)  =>  'Find: Invalid Regular Expression' message
(?i\pattern)  =>  'Find: Invalid Regular Expression' message
(?i]pattern)  =>  'Find: Invalid Regular Expression' message
(?i^pattern)  =>  'Find: Invalid Regular Expression' message
(?i_pattern)  =>  'Find: Invalid Regular Expression' message
(?i`pattern)  =>  'Find: Invalid Regular Expression' message
(?iapattern)  =>  'Find: Invalid Regular Expression' message
(?i{pattern)  =>  'Find: Invalid Regular Expression' message
(?i|pattern)  =>  'Find: Invalid Regular Expression' message
(?i}pattern)  =>  'Find: Invalid Regular Expression' message
(?i~pattern)  =>  'Find: Invalid Regular Expression' message
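This kind of delimiter experiment can also be automated. The sketch below is a Python analogue, not a test of Boost or Notepad++: it assumes the standard re module and treats "compiles without re.error" as the equivalent of not getting the Invalid Regular Expression message. The helper name accepted_delimiters is made up for illustration.

```python
import re
import string


def accepted_delimiters(flag: str = "i") -> list[str]:
    """Return the punctuation chars c for which (?<flag>c pattern) compiles."""
    accepted = []
    for ch in string.punctuation:
        candidate = f"(?{flag}{ch}pattern)"
        try:
            re.compile(candidate)
        except re.error:
            # Python's counterpart of an 'Invalid Regular Expression' message
            continue
        accepted.append(ch)
    return accepted


if __name__ == "__main__":
    print(accepted_delimiters())
```

With Python's engine, too, the colon is the only punctuation character accepted after the flag letter, which is consistent with reading the manual's (?imsx-imsx ... ) as the (?imsx-imsx:pattern) form.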
Best Regards,
guy038
-
@guy038 You may have some fun with (?P:...)
It’s not included in the documentation but is supported by Boost. https://stackoverflow.com/questions/10059673/named-regular-expression-group-pgroup-nameregexp-what-does-p-stand-for has a fascinating background.
I found that as I wondered if there were any valid flags other than [smix]. I also found that Boost does not care if you use a flag more than once. If a flag is both before and after the - then it’s turned off. Boost does not complain about (?-:...)
-
Hello, @mkupper and All,
Ah… OK. So I ran another series of tests, below :
(?!:pattern)  =>  Search any empty string, NON followed with the string ':pattern'
              =>  So, roughly, match any EMPTY string
(?":pattern)  =>  'Find Invalid Regular Expression' message
(?#:pattern)  =>  Search any empty string, followed with the comment ':pattern'
              =>  So, roughly, match any EMPTY string
(?$:pattern)  =>  'Find Invalid Regular Expression' message
(?%:pattern)  =>  'Find Invalid Regular Expression' message
(?&:pattern)  =>  'Find Invalid Regular Expression' message
(?':pattern)  =>  'Find Invalid Regular Expression' message
(?(:pattern)  =>  'Find Invalid Regular Expression' message
(?):pattern)  =>  'Find Invalid Regular Expression' message
(?*:pattern)  =>  'Find Invalid Regular Expression' message
(?+:pattern)  =>  'Find Invalid Regular Expression' message
(?,:pattern)  =>  'Find Invalid Regular Expression' message
(?.:pattern)  =>  'Find Invalid Regular Expression' message
(?-:pattern)  =>  Match any string 'pattern', according to the 'Match case' option 'ON' or 'OFF'
(?/:pattern)  =>  'Find Invalid Regular Expression' message
(?0:pattern)  =>  'Find Invalid Regular Expression' message
(?1:pattern)  =>  'Find Invalid Regular Expression' message
(?2:pattern)  =>  'Find Invalid Regular Expression' message
(?3:pattern)  =>  'Find Invalid Regular Expression' message
(?4:pattern)  =>  'Find Invalid Regular Expression' message
(?5:pattern)  =>  'Find Invalid Regular Expression' message
(?6:pattern)  =>  'Find Invalid Regular Expression' message
(?7:pattern)  =>  'Find Invalid Regular Expression' message
(?8:pattern)  =>  'Find Invalid Regular Expression' message
(?9:pattern)  =>  'Find Invalid Regular Expression' message
(?::pattern)  =>  Match any string ':pattern', according to the 'Match case' option 'ON' or 'OFF'
(?;:pattern)  =>  'Find Invalid Regular Expression' message
(?<:pattern)  =>  'Find Invalid Regular Expression' message
(?=:pattern)  =>  'Find Invalid Regular Expression' message
(?>:pattern)  =>  Match any ATOMIC string ':pattern', according to the 'Match case' option 'ON' or 'OFF'
(??:pattern)  =>  'Find Invalid Regular Expression' message
(?@:pattern)  =>  'Find Invalid Regular Expression' message
(?A:pattern)  =>  'Find Invalid Regular Expression' message
(?B:pattern)  =>  'Find Invalid Regular Expression' message
(?C:pattern)  =>  'Find Invalid Regular Expression' message
(?D:pattern)  =>  'Find Invalid Regular Expression' message
(?E:pattern)  =>  'Find Invalid Regular Expression' message
(?F:pattern)  =>  'Find Invalid Regular Expression' message
(?G:pattern)  =>  'Find Invalid Regular Expression' message
(?H:pattern)  =>  'Find Invalid Regular Expression' message
(?I:pattern)  =>  'Find Invalid Regular Expression' message
(?J:pattern)  =>  'Find Invalid Regular Expression' message
(?K:pattern)  =>  'Find Invalid Regular Expression' message
(?L:pattern)  =>  'Find Invalid Regular Expression' message
(?M:pattern)  =>  'Find Invalid Regular Expression' message
(?N:pattern)  =>  'Find Invalid Regular Expression' message
(?O:pattern)  =>  'Find Invalid Regular Expression' message
(?P:pattern)  =>  Match any string 'pattern', according to the 'Match case' option 'ON' or 'OFF'
(?Ppattern)   =>
(?Q:pattern)  =>  'Find Invalid Regular Expression' message
(?R:pattern)  =>  'Find Invalid Regular Expression' message
(?S:pattern)  =>  'Find Invalid Regular Expression' message
(?T:pattern)  =>  'Find Invalid Regular Expression' message
(?U:pattern)  =>  'Find Invalid Regular Expression' message
(?V:pattern)  =>  'Find Invalid Regular Expression' message
(?W:pattern)  =>  'Find Invalid Regular Expression' message
(?X:pattern)  =>  'Find Invalid Regular Expression' message
(?Y:pattern)  =>  'Find Invalid Regular Expression' message
(?Z:pattern)  =>  'Find Invalid Regular Expression' message
(?[:pattern)  =>  'Find Invalid Regular Expression' message
(?\:pattern)  =>  'Find Invalid Regular Expression' message
(?]:pattern)  =>  'Find Invalid Regular Expression' message
(?^:pattern)  =>  'Find Invalid Regular Expression' message
(?_:pattern)  =>  'Find Invalid Regular Expression' message
(?`:pattern)  =>  'Find Invalid Regular Expression' message
(?a:pattern)  =>  'Find Invalid Regular Expression' message
(?b:pattern)  =>  'Find Invalid Regular Expression' message
(?c:pattern)  =>  'Find Invalid Regular Expression' message
(?d:pattern)  =>  'Find Invalid Regular Expression' message
(?e:pattern)  =>  'Find Invalid Regular Expression' message
(?f:pattern)  =>  'Find Invalid Regular Expression' message
(?g:pattern)  =>  'Find Invalid Regular Expression' message
(?h:pattern)  =>  'Find Invalid Regular Expression' message
(?i:pattern)  =>  Match any string 'pattern', WHATEVER its case
(?j:pattern)  =>  'Find Invalid Regular Expression' message
(?k:pattern)  =>  'Find Invalid Regular Expression' message
(?l:pattern)  =>  'Find Invalid Regular Expression' message
(?m:pattern)  =>  Match any string 'pattern', according to the 'Match case' option 'ON' or 'OFF'
(?n:pattern)  =>  'Find Invalid Regular Expression' message
(?o:pattern)  =>  'Find Invalid Regular Expression' message
(?p:pattern)  =>  'Find Invalid Regular Expression' message
(?q:pattern)  =>  'Find Invalid Regular Expression' message
(?r:pattern)  =>  'Find Invalid Regular Expression' message
(?s:pattern)  =>  Match any string 'pattern', according to the 'Match case' option 'ON' or 'OFF'
(?t:pattern)  =>  'Find Invalid Regular Expression' message
(?u:pattern)  =>  'Find Invalid Regular Expression' message
(?v:pattern)  =>  'Find Invalid Regular Expression' message
(?w:pattern)  =>  'Find Invalid Regular Expression' message
(?x:pattern)  =>  Match any string 'pattern', according to the 'Match case' option 'ON' or 'OFF'
(?y:pattern)  =>  'Find Invalid Regular Expression' message
(?z:pattern)  =>  'Find Invalid Regular Expression' message
(?{:pattern)  =>  'Find Invalid Regular Expression' message
(?|:pattern)  =>  Match any string ':pattern', according to the 'Match case' option 'ON' or 'OFF'
(?}:pattern)  =>  'Find Invalid Regular Expression' message
(?~:pattern)  =>  'Find Invalid Regular Expression' message
The syntaxes (?P<Name>Regex) and (?P=Name), described in your stackoverflow article, are NOT correct with the present Boost implementation within Notepad++
Refer to these two links :
https://www.regular-expressions.info/refext.html
https://www.regular-expressions.info/refreplacebackref.html
For each, once opened, select, if necessary, the Boost choice in the first drop-down list and the Python choice in the second drop-down list and compare… !
For instance, the regex (?P<Test>\d+) triggers the Invalid Regular Expression message, whereas the syntaxes (?<Test>\d+) or (?'Test'\d+) do find any NON-empty range of digits
In the same way, the regex (?<Test>\d+)ABC(?P=Test) is not valid, whereas the syntaxes (?<Test>\d+)ABC\g<Test> or (?-i)(?<Test>\d+)ABC\k<Test> do find any string ABC surrounded by the same string of digits
Test my assumptions with the example text, below :
1ABC2345
123ABC123
12345ABC9
01ABC01234
ABC
12345ABC12345
123ABC456
6789ABC89
Note that the (?P:pattern) syntax, in the first part of this post, looks like a (?P<Name>Regex) syntax, where the name part is just replaced by a colon ?!
With the example text above, see also the main difference between :
- Searching with any of the 12 following regex syntaxes :
  - (?-i)(?<Test>\d+)ABC\g{Test}
  - (?-i)(?<Test>\d+)ABC\g<Test>
  - (?-i)(?<Test>\d+)ABC\g'Test'
  - (?-i)(?<Test>\d+)ABC\k{Test}
  - (?-i)(?<Test>\d+)ABC\k<Test>
  - (?-i)(?<Test>\d+)ABC\k'Test'
  - (?-i)(?'Test'\d+)ABC\g{Test}
  - (?-i)(?'Test'\d+)ABC\g<Test>
  - (?-i)(?'Test'\d+)ABC\g'Test'
  - (?-i)(?'Test'\d+)ABC\k{Test}
  - (?-i)(?'Test'\d+)ABC\k<Test>
  - (?-i)(?'Test'\d+)ABC\k'Test'
- And searching with any of the 4 following regexes :
  - (?-i)(?<Test>\d+)ABC(?&Test)
  - (?-i)(?<Test>\d+)ABC(?P>Test)
  - (?-i)(?'Test'\d+)ABC(?&Test)
  - (?-i)(?'Test'\d+)ABC(?P>Test)
- In the first case, the last part of the regex, after the string ABC, is a back-reference to the present value of the named group Test
- In the second case, the last part of the regex, after the string ABC, is a back-reference to the named group Test itself. So, these four regexes should match any line of my example text but the ABC string alone !
Remark : any reference to a named group must be case-sensitive. Otherwise, a Find: Invalid Regular expression message is returned !
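To see the two cases side by side on the example text, here is a small Python sketch. It is only an analogue: Python's re module spells the named group (?P<Test>...) and the back-reference (?P=Test), and it has no (?&Test) subroutine call, so the second case is emulated by simply writing the group's sub-pattern \d+ a second time. The names BACKREF, SAME_SUBPATTERN and matches are made up for illustration.

```python
import re

SAMPLE = [
    "1ABC2345", "123ABC123", "12345ABC9", "01ABC01234",
    "ABC", "12345ABC12345", "123ABC456", "6789ABC89",
]

# First case: the digits after ABC must equal the text captured by Test.
BACKREF = re.compile(r"(?P<Test>\d+)ABC(?P=Test)")

# Second case, emulated: re-apply the sub-pattern of Test, not its value.
SAME_SUBPATTERN = re.compile(r"(?P<Test>\d+)ABC\d+")


def matches(pattern: re.Pattern, lines: list[str]) -> list[str]:
    """Return the matched text of every line the pattern finds a match in."""
    return [m.group(0) for line in lines if (m := pattern.search(line))]


if __name__ == "__main__":
    print("back-reference :", matches(BACKREF, SAMPLE))
    print("same subpattern:", matches(SAME_SUBPATTERN, SAMPLE))
```

The back-reference version only matches where the same digits appear on both sides of ABC (possibly as part of a longer run), while the sub-pattern version matches every line except the bare ABC one.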
In replacement, if you need to refer to a named group, you can use the $+{Test} syntax. However, note that it will always rewrite the value of named group Test when it was defined and not the last value of group Test !
Best Regards,
guy038
-
-
@guy038 said:
In the first case, the last part of the regex, after the string ABC, is a back-reference to the present value of the named group Test
Thus, using the first regex (of the first case), (?-i)(?<Test>\d+)ABC\g{Test}, against the data on the sample line 1ABC2345, that line won’t be matched because the Test group captured 1 (for a match to occur, that data would have to start with 1ABC1).
In the second case, the last part of the regex, after the string ABC, is a back-reference to the named group Test itself
Using the first regex (of the second case), (?-i)(?<Test>\d+)ABC(?&Test), the sample line data, 1ABC2345, will be matched, because the instruction is to use the regex of the named group (\d+), not the captured data from the actual match (so any sequence of digits occurring before ABC and a sequence of any digits after).
-
@guy038 said:
In replacement, if you need to refer to a named group, you can use the $+{Test} syntax. However, note that it will always rewrite the value of named group Test when it was defined and not the last value of group Test !
I suppose if you wanted to use the “last value of group Test” in the replacement, you could add a capture group, i.e.,
(?-i)(?<Test>\d+)ABC(?<foo>(?&Test))
and then $+{foo} would be available for use in the replacement string.
So, in the 1ABC2345 test line, $+{foo} would expand to 2345. (And $+{Test} would be 1.)
-
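The same trick can be sketched in Python, with two caveats that are assumptions of this illustration: Python's re module has no (?&Test), so the second group repeats the \d+ sub-pattern by hand, and the replacement syntax is \g<name> rather than $+{name}. The helper name swap_sides is hypothetical.

```python
import re

# Test captures the digits before ABC; foo captures the digits after it.
# (In Boost, foo would wrap the subroutine call (?&Test) instead of \d+.)
PATTERN = re.compile(r"(?P<Test>\d+)ABC(?P<foo>\d+)")


def swap_sides(line: str) -> str:
    """Rewrite '<Test>ABC<foo>' as '<foo>ABC<Test>' using both named groups."""
    return PATTERN.sub(r"\g<foo>ABC\g<Test>", line)


if __name__ == "__main__":
    m = PATTERN.search("1ABC2345")
    print(m.group("Test"), m.group("foo"))
    print(swap_sides("1ABC2345"))
```

Wrapping the second occurrence in its own named group is what makes the later value available in the replacement, independently of what Test captured.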
Hello, @alan-kilborn, and All,
Alan, you’ve just understood all my stuff quite correctly, and even more regarding your last example with $+{Test} and $+{foo}, which I did not think of !
Best Regards,
guy038