need of explanation of find and replace with option regex
-
Hello, @andrea-seyfarth, @peterjones and All
Here are two regexes :
Regex A :
(?s)^\h*II\x20.+?\R\h*VI\x20(?-s).+\R?
Regex B :
(?s)^\h*II\x20((?!Header).)+?Steuerklasse\x20VI(\R|\z)
that I tested against the text below :
111,11I Header
I 000.00
II 222,22
III 333.33
IV 444.44
V 555.55 Steuerklassen V und IV
VI 666.66 Steuerklasse VI
VII 777.77 Steuerklasse VII
VIII 888.88 Steuerklasse VIII
111,11I Header
I 000.00
II 222,22
III 333.33
IV 444.44
V 555.55 Steuerklassen V und IV
VI 666.66 ----- NO MATCH 1 ----- VI
VII 777.77 Steuerklasse VII
VIII 888.88 Steuerklasse VIII
111,11I Header
I 000.00
II 222,22
III 333.33
IV 444.44
V 555.55 Steuerklassen V und IV
VI 666.66 Steuerklasse VI
VII 777.77 Steuerklasse VII
VIII 888.88 Steuerklasse VIII
111,11I Header
I 000.00
II 222,22
III 333.33
IV 444.44
V 555.55 Steuerklassen V und IV
VI 666.66 ----- NO MATCH 2 ----- VI
VII 777.77 Steuerklasse VII
VIII 888.88 Steuerklasse VIII
111,11I Header
I 000.00
II 222,22
III 333.33
IV 444.44
V 555.55 Steuerklassen V und IV
VI 666.66 ----- NO MATCH 3 ----- VI
VII 777.77 Steuerklasse VII
VIII 888.88 Steuerklasse VIII
111,11I Header
I 000.00
II 222,22
III 333.33
IV 444.44
V 555.55 Steuerklassen V und IV
VI 666.66 Steuerklasse VI
The regex A catches the entire FIVE lines, from the line beginning with the Roman number II to the line beginning with the Roman number VI ( each with possible horizontal blank characters before ), whatever their contents.
And the regex B catches the entire FIVE lines, beginning with the Roman number II ( with possible horizontal blank characters before ) and ending with the string Steuerklasse VI, ONLY IF the string Header cannot be found, at any position, in the smallest multi-line sequence of characters between the regex \h*II\x20 and the regex Steuerklasse\x20VI !
So, Peter, as you can see, I used the negative look-ahead (?!Header), which is tested at each position of that sequence => the syntax ((?!Header).)+?
Note also that I preferred to get the entire lines, with their End of Line chars ! So, when the replacement zone is empty, no blank line remains afterwards :-))
I also used the alternative (\R|\z), just in case the very last line would be a line VI 666.66 Steuerklasse VI, without any line-break !
Best Regards,
guy038
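For anyone who wants to reproduce the two behaviours outside Notepad++, here is a minimal Python sketch. It is only an approximation of the regexes above, under several assumptions: Python's re module has no \h, \R or standalone (?-s), so [ \t], \n and the scoped group (?-s:...) are used instead, \Z stands in for \z, and (?m) is added because ^ does not match at line starts by default in Python. The sample text is rebuilt with one entry per line and \n line breaks, which is my reading of the test text, not something the post states.

```python
import re

# Sample text rebuilt from the post (assumed layout: one entry per line, "\n" breaks).
HEAD = [
    "111,11I Header", "I 000.00", "II 222,22", "III 333.33",
    "IV 444.44", "V 555.55 Steuerklassen V und IV",
]
TAIL = ["VII 777.77 Steuerklasse VII", "VIII 888.88 Steuerklasse VIII"]
VI_LINES = [
    "VI 666.66 Steuerklasse VI",
    "VI 666.66 ----- NO MATCH 1 ----- VI",
    "VI 666.66 Steuerklasse VI",
    "VI 666.66 ----- NO MATCH 2 ----- VI",
    "VI 666.66 ----- NO MATCH 3 ----- VI",
    "VI 666.66 Steuerklasse VI",
]

lines = []
for i, vi_line in enumerate(VI_LINES):
    lines += HEAD + [vi_line]
    if i < len(VI_LINES) - 1:      # the last block stops at its VI line
        lines += TAIL
TEXT = "\n".join(lines)            # no line break after the very last line

# Regex A: from a line starting with "II " to the end of the next line starting with "VI ".
REGEX_A = re.compile(r"(?ms)^[ \t]*II\x20.+?\n[ \t]*VI\x20(?-s:.+)\n?")

# Regex B: same start, but every consumed character is guarded by (?!Header),
# and the match must end with "Steuerklasse VI" followed by a line break or end of text.
REGEX_B = re.compile(r"(?ms)^[ \t]*II\x20((?!Header).)+?Steuerklasse\x20VI(\n|\Z)")

matches_a = [m.group(0) for m in REGEX_A.finditer(TEXT)]
matches_b = [m.group(0) for m in REGEX_B.finditer(TEXT)]
print(len(matches_a), len(matches_b))  # 6 3

# Empty replacement with regex B: the matched lines vanish, line breaks included.
cleaned = REGEX_B.sub("", TEXT)
print(cleaned)
```

In this sketch regex A matches all six blocks, while regex B skips the three NO MATCH blocks: from their II line, the search for Steuerklasse VI would have to run past the next Header, which the look-ahead forbids. Because each match of B includes its final line break, the empty replacement leaves no blank line behind.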
-
The word “Header” was my invention, and not in any of @Andrea-Seyfarth’s examples, so that shouldn’t be used for a regex we suggest to her. Sorry for muddying the waters with my example file.
I like your cleaner \h for the horizontal space (that escape sequence hadn’t yet been stored in my long-term regex memory; some day, maybe even today, it will be).
Thanks for continually sharing your regex expertise with us. I’m always amazed by your expressions, and the quality of your explanations.
Hopefully, we’ve helped Andrea in the process. :-)