Generic Regex: Replacing in a specific zone of text
This regex S/R restricts a replacement to a specific zone of text, possibly repeated, spanning one or several consecutive lines.

This is particularly useful when dealing with XML or HTML, if you need to make modifications only within a specific start-tag / end-tag range.
- Let FR ( Find Regex ) be the regex which defines the char, string or expression to be searched
- Let RR ( Replacement Regex ) be the regex which defines the char, string or expression which must replace the FR expression
- Let BSR ( Begin Search-region Regex ) be the regex which defines the beginning of the area where the search for FR must start
- Let ESR ( End Search-region Regex ) be the regex which defines the end of the area where the search for FR must stop

Then, the generic regex can be expressed as :

SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)
REPLACE RR

When the BSR and the different matches of the FR regex are all located in a single line, any line-ending char(s) will implicitly break the \G chain. The ESR part is then useless and the generic regex can be simplified into :

SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)
REPLACE RR
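For readers who want to experiment outside Notepad++, here is a minimal sketch of the same idea in Python. It is not the regex above: Python’s standard re module supports neither \G nor \K, so the sketch swaps in a two-pass approach (first isolate each zone, then substitute inside it). The function name replace_in_zones and its parameters are hypothetical, and the RR argument uses Python’s replacement syntax ( \1 or \g<1> ) rather than $1 or ${1}. Unlike the Notepad++ regex, an FR match in this sketch can never extend past the ESR.

```python
import re

def replace_in_zones(text, bsr, esr, fr, rr):
    """Replace each FR match with RR, but only between a BSR match and the next ESR match."""
    # Pass 1: a zone is a BSR match followed by every char that does not start an ESR match.
    # re.DOTALL lets the zone body span several lines.
    zone = re.compile(rf"(?P<begin>{bsr})(?P<body>(?:(?!{esr}).)*)", re.DOTALL)
    find = re.compile(fr)
    # Pass 2: substitute inside the zone body only; the BSR text itself is kept unchanged.
    return zone.sub(lambda m: m.group("begin") + find.sub(rr, m.group("body")), text)

# Hypothetical sample: only the dashes inside <t>...</t> should change.
sample = "<a>x-y</a> <t>x-y\nz--w</t> <a>p-q</a>"
print(replace_in_zones(sample, r"<t>", r"</t>", r"-+", " "))
```

If no ESR follows a BSR, the zone simply runs to the end of the text, which mirrors what the \G chain does when nothing stops it.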
IMPORTANT :

- You must use, at least, the v7.9.1 N++ release, so that the \A assertion is correctly handled
- You must move the caret to the very beginning of the current file ( Ctrl + Home )
- If you perform a simple search, without any replacement, just click several times on the Find Next button to see the different zones that the future replacement will affect
- As soon as a replacement is needed, you must click on the Replace All button, exclusively. It will then perform a global replacement on the entire file
NOTES :

- Each non-capturing group, relative to the BSR, ESR and FR regexes, may be prefixed with the s or -s modifiers :
    - If the BSR and/or ESR and/or FR regexes may match EOL characters, use the s modifier in the appropriate non-capturing group(s)
    - If the BSR and/or ESR and/or FR regexes do not match EOL characters, use the -s modifier in the appropriate non-capturing group(s)
- Each non-capturing group, relative to the BSR, ESR and FR regexes, may be prefixed with the i or -i modifiers :
    - If the BSR and/or ESR and/or FR regexes are sensitive to case, use the -i modifier in the appropriate non-capturing group(s)
    - If the BSR and/or ESR and/or FR regexes are insensitive to case, use the i modifier in the appropriate non-capturing group(s)
- Of course, these modifiers may not be necessary ( for instance, when searching for an exact string or for non-letter characters )
- Note that the generic regexes, above, illustrate the case where :
    - These two generic regexes are sensitive to case => The -i modifier is present everywhere in the definitions
    - The ESR region of the first regex may span several lines => The s modifier is present in the ESR non-capturing group
- The FR regex may define a group, between parentheses, which can be re-used in the RR regex with the \# or ${#} syntaxes, where # represents an integer
- The RR regex may contain the $0 syntax, which refers to the whole FR match, or re-use a group previously defined in the FR regex
Here are two examples to illustrate how to build real regex S/R from these generic ones !

First, let’s imagine that you want to delete any part within parentheses in any range of text <descrip>............</descrip>, only, located in a single line

- Paste the XML text, below, in a new tab :

<iden>123456 (START)</iden>
<name>Case_1</name>
<descrip>This is a (short) text to (easily) see the results (of the modifications)</descrip>
<param>val (250)</param>
<iden>123456</iden>
<name>Case_2</name>
<descrip>And the (obvious) changes occur only in (the) "descrip" tag</descrip>
<param>val (500)</param>
<iden>123456 (END)</iden>
<name>Case_3</name>
<descrip>All (the) other tags are (just) untouched</descrip>
<param>val (999)</param>
- As all the parts to delete are contained in a single line, we can use the simplified formulation :
    - SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)
    - REPLACE RR
- Obviously, as we want to delete, the RR regex is empty. So, the Replace with field will be left empty
- Now, the FR regex represents a space char followed by the shortest text between parentheses => FR = (?:\x20\(.+?\)) We do not need any case modifier as this regex does not refer to letters !
- The BSR regex is simply the literal string <descrip>, with this exact case. So BSR = (?-i:<descrip>)
Finally, the functional regex S/R to use is :

- SEARCH (?-s)(?-i:<descrip>|(?!\A)\G).*?\K(?:\x20\(.+?\))
- REPLACE Leave EMPTY
- Open the Replace dialog ( Ctrl + H )
- Untick all options
- Select the Regular expression search mode
- Move to the very beginning of the current file ( Ctrl + Home )
- Click several times on the Find Next button to verify that the FR regex matches what you want ! In the present case, it matches a space followed by text between parentheses
- Again, move to the very beginning of the current file ( Ctrl + Home )
- Click, once only, on the Replace All button

=> As expected, all text between parentheses, in the <descrip> tags only, has been deleted, while the parentheses present in other tags are untouched !
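To check this expected result outside Notepad++, here is a small sketch in Python. It assumes the standard re module, which has no \G or \K, so it emulates the single-line case in two steps: take each zone from <descrip> to the end of its line, then delete the FR matches inside that zone. The function name delete_parens_in_descrip is hypothetical, and the sample is a shortened copy of the text above.

```python
import re

def delete_parens_in_descrip(text):
    # Zone: from the literal <descrip> to the end of its line ('.' stops at a line break).
    zone = re.compile(r"<descrip>.*")
    # FR: a space followed by the shortest text between parentheses; RR: the empty string.
    return zone.sub(lambda m: re.sub(r"\x20\(.+?\)", "", m.group(0)), text)

sample = (
    "<iden>123456 (START)</iden>\n"
    "<descrip>This is a (short) text to (easily) see the results (of the modifications)</descrip>\n"
    "<param>val (250)</param>\n"
)
print(delete_parens_in_descrip(sample))
```

Only the <descrip> line loses its parenthesised parts; the <iden> and <param> lines come back unchanged.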
In the second example, we’ll try to replace any number of consecutive dash characters with a single space char in any range <text>..........</text>, possibly split over several lines

- Paste the following XML text in a new tab :

<val>37--001</val>
<text>This-is -a</text>
<pos>4-1234</pos>
<val>37--002</val>
<text>-small---example</text>
<pos>9-0012</pos>
<val>37--003</val>
<text>-of-text- which-</text>
<pos>1-9999</pos>
<val>37--004</val>
<text>need -to-be- modi fied</text>
<pos>0-0000</pos>
- As, this time, the <text>..........</text> zone may be spread over several lines, we’ll use the first generic regex :
    - SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)
    - REPLACE RR
- Obviously, the RR regex is simply \x20
- Now, the FR regex represents one or more consecutive dashes => FR is just -+, as the non-capturing group is not needed here
- The BSR regex is simply the literal string <text>, with this exact case => BSR = (?-si:<text>)
- The ESR regex is the literal string </text>, with this exact case. So the ESR regex, within its non-capturing group, is (?s-i:(?!</text>).)
Then, the real regex S/R to use is :

- SEARCH (?-si:<text>|(?!\A)\G)(?s-i:(?!</text>).)*?\K-+
- REPLACE \x20
- Open the Replace dialog ( Ctrl + H )
- Untick all options
- Select the Regular expression search mode
- Move to the very beginning of the current file ( Ctrl + Home )
- Click several times on the Find Next button to verify that the FR regex matches what you want ! In the present case, it matches any consecutive range of dash chars
- Again, move to the very beginning of the current file ( Ctrl + Home )
- Click, once only, on the Replace All button

=> As expected, every range of consecutive dashes, in the <text> tags only, has been replaced with a single space char, and the other dash characters, present in other tags, are kept !
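The multi-line case can be sketched in Python as well. Again, this is an emulation with the standard re module and not the \G / \K regex itself: the zone is everything after <text> that does not begin a </text>, and the dashes are replaced inside that zone only. The names ZONE and dashes_to_space are hypothetical, and the position of the line break inside the sample <text> zone is an assumption made for the illustration.

```python
import re

# Zone: everything after <text>, up to but not including </text>.
# re.DOTALL lets '.' match line breaks, so a zone may span several lines.
ZONE = re.compile(r"(?<=<text>)(?:(?!</text>).)*", re.DOTALL)

def dashes_to_space(text):
    # FR: one or more consecutive dashes; RR: a single space.
    return ZONE.sub(lambda m: re.sub(r"-+", " ", m.group(0)), text)

# Assumed layout: the <text> zone is split over two lines.
sample = "<val>37--001</val>\n<text>This-is\n-a</text>\n<pos>4-1234</pos>\n"
print(dashes_to_space(sample))
```

The dashes of the <val> and <pos> lines stay as they are, because those lines never fall inside a zone.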
Two other examples regarding this generic regex ! In these, we’ll further restrict the replacements to the part of each zone located before a # character !

Paste the text below in a new tab :

<iden>123456 (START)</iden>
<name>Case_1</name>
<descrip>This is a (short) text to (easily) see the results (of the modifications)# (12345) test (67890)</descrip>
<param>val (250)</param>
<iden>123456</iden>
<name>Case_2</name>
<descrip>And the (obvious) changes occur only in (the) "descrip" tag # Parentheses (Yeaah) OK</descrip>
<param>val (500)</param>
<iden>123456 (END)</iden>
<name>Case_3</name>
<descrip>All (the) other tags are (just) untouched #(This is) the end (of the test)</descrip>
<param>val (999)</param>

In this first example, with single-line <descrip> tags, two solutions are possible :
- Use the complete generic regex (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) where ESR = # which leads to the functional S/R :
    - SEARCH (?-s)(?-i:<descrip>|(?!\A)\G)((?!#).)*?\K(?:\x20\(.+?\))
    - REPLACE Leave EMPTY

=> This time, in addition to replacing only in each <descrip>..........</descrip> zone, NO replacement will occur after the # character of each <descrip> tag !

- Use the simplified solution and add an ESR condition, as a look-ahead, at the end of the regex, giving this generic variant (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)(?=.*ESR)
    - SEARCH (?-s)(?-i:<descrip>|(?!\A)\G).*?\K(?:\x20\(.+?\))(?=.*#)
    - REPLACE Leave EMPTY

However, this other solution requires that every <descrip> tag contains a comment zone with a # char
Now, paste this other text below in a new tab :

<val>37--001</val>
<text>This-is -a--very---< # Dashes - - - OK/text>
<pos>4-1234</pos>
<val>37--002</val>
<text>-small----#---example</text>
<pos>9-0012</pos>
<val>37--003</val>
<text>-of-a-text- which-</text>
<pos>1-9999</pos>
<val>37--004</val>
<text>need -to-be- modi fied # but - not - there</text>
<pos>0-0000</pos>

This second example is a multi-line replacement, in each <text>.............</text> zone only, and also limited to the part before a # char, which may be present or not

Of course, we’ll have to use the complete generic regex (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) but, instead of a single (?!ESR), we’ll have to use this variant : (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR_1)(?!ESR_2).)*?\K(?-si:FR)

So, the functional regex S/R becomes :
- SEARCH (?-si:<text>|(?!\A)\G)(?s-i:(?!</text>)(?!#).)*?\K-+
- REPLACE \x20

=> ONLY IF a sequence of dashes is located in a <text>..........</text> zone AND, moreover, before a possible # char, it will be replaced with a single space character

As you can verify, the third multi-line <text>.............</text> zone does not contain any # char. Thus, all the dash characters of that <text> tag are replaced with a single space char !
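The two-end-condition variant can be checked with the same kind of Python sketch. As before, the standard re module lacks \G and \K, so this is an emulation: the zone stops at whichever comes first, </text> or a # char, and dashes are replaced only inside it. The names ZONE and dashes_to_space_before_hash are hypothetical, and the line break placed inside the second sample zone is an assumption.

```python
import re

# Zone: everything after <text> that starts neither </text> (ESR_1) nor # (ESR_2).
ZONE = re.compile(r"(?<=<text>)(?:(?!</text>)(?!#).)*", re.DOTALL)

def dashes_to_space_before_hash(text):
    # FR: one or more consecutive dashes; RR: a single space.
    return ZONE.sub(lambda m: re.sub(r"-+", " ", m.group(0)), text)

sample = (
    "<text>-small----#---example</text>\n"   # zone stops at the # char
    "<pos>9-0012</pos>\n"                    # outside any zone
    "<text>-of-a-text-\nwhich-</text>\n"     # no # char: zone runs up to </text>
)
print(dashes_to_space_before_hash(sample))
```

In the first zone, the dashes after the # char survive; in the last one, which has no # char, every run of dashes becomes a space.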
Reminder :

- You must use, at least, the v7.9.1 N++ release, so that the \A assertion is correctly handled
- Move to the very beginning of the file before any Find Next sequence or Replace All operation
- Do not click on the step-by-step Replace button