Generic Regex: Replacing in a specific zone of text
This regex S/R restricts a replacement to a specific zone of text, possibly repeated, spanning one or several consecutive lines. This is particularly useful when dealing with XML or HTML, if you need to make modifications only within a specific start and end tag range.
- Let FR (Find Regex) be the regex which defines the char, string or expression to be searched
- Let RR (Replacement Regex) be the regex which defines the char, string or expression which must replace the FR expression
- Let BSR (Begin Search-region Regex) be the regex which defines the beginning of the area where the search for FR must start
- Let ESR (End Search-region Regex) be the regex which defines the end of the area where the search for FR must stop
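Then, the generic regex can be expressed as:

SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)

REPLACE RR

Here, \G only matches at the position where the previous match ended, and (?!\A) prevents it from matching at the very beginning of the file. \K then discards everything matched so far, so that only the FR part is replaced.

When the BSR and all the matches of the FR regex are located on a single line, the dot cannot match line-ending chars under (?-s), so any line-ending char(s) will implicitly break the chain of \G matches. The ESR part is then unnecessary and the generic regex can be simplified to:

SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)

REPLACE RR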
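For readers who want to see the mechanics outside Notepad++, here is a rough Python sketch of what the generic regex does. Python's built-in re module supports neither \G nor \K, so the loop below only emulates them: an anchored match() call plays the role of \G, and copying the skipped text unchanged plays the role of \K. The function name replace_in_zone and its parameters are hypothetical, the replacement is treated as a literal string, and edge cases may differ from the Boost engine used by Notepad++.

```python
import re

def replace_in_zone(text, bsr, esr, fr, replacement):
    # Hypothetical helper: emulates BSR / ESR / FR zones with plain `re` calls.
    begin = re.compile(bsr)
    # One "step": lazily skip characters that do not start ESR, then match FR.
    step = re.compile(rf'(?s:(?!{esr}).)*?(?P<fr>{fr})')
    pieces, pos = [], 0
    while True:
        opened = begin.search(text, pos)
        if opened is None:
            break
        pieces.append(text[pos:opened.end()])  # copy text up to and including BSR
        pos = opened.end()
        while True:
            # Like \G: each attempt is anchored where the previous match ended.
            hit = step.match(text, pos)
            if hit is None or hit.end() == pos:
                break
            # Like \K: keep the skipped text, replace only the FR part.
            pieces.append(text[pos:hit.start('fr')])
            pieces.append(replacement)
            pos = hit.end()
    pieces.append(text[pos:])
    return ''.join(pieces)

multi_line = replace_in_zone("a-b <t>c-d\ne--f</t> g-h", "<t>", "</t>", "-+", " ")
# Passing a line break as ESR mimics the simplified, single-line form.
single_line = replace_in_zone("x (a)\n<d>y (b) z (c)</d>\nw (e)",
                              "<d>", r"\n", r"\x20\(.+?\)", "")
print(multi_line)
print(single_line)
```

Only the dashes between `<t>` and `</t>` and the parentheses on the `<d>` line are touched; everything outside a zone is copied as-is.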
IMPORTANT:

- You must use at least the v7.9.1 N++ release, so that the \A assertion is correctly handled
- You must move the caret to the very beginning of the current file (Ctrl + Home)
- If you perform a simple search, without any replacement, just click several times on the Find Next button to see the different zones that the future replacement will affect
- As soon as a replacement is needed, you must use the Replace All button exclusively. It will perform a global replacement on the entire file
NOTES:

- Each non-capturing group, relative to the BSR, ESR and FR regexes, may be prefixed with the s or -s modifiers:
  - If the BSR and/or ESR and/or FR regexes may match EOL characters, use the s modifier in the appropriate non-capturing group(s)
  - If the BSR and/or ESR and/or FR regexes do not match EOL characters, use the -s modifier in the appropriate non-capturing group(s)
- Each non-capturing group, relative to the BSR, ESR and FR regexes, may be prefixed with the i or -i modifiers:
  - If the BSR and/or ESR and/or FR regexes are sensitive to case, use the -i modifier in the appropriate non-capturing group(s)
  - If the BSR and/or ESR and/or FR regexes are insensitive to case, use the i modifier in the appropriate non-capturing group(s)
- Of course, these modifiers may not be necessary (for instance, when searching for an exact string or for non-letter characters)
- Note that the generic regexes above show the case where:
  - These two generic regexes are sensitive to case => The -i modifier is present everywhere in the definitions
  - The ESR region of the first regex may span several lines => The s modifier in the ESR non-capturing group
- The FR regex may define a group, between parentheses, which can be re-used in the RR regex with the \# or ${#} syntaxes, where # represents an integer
- The RR regex may contain the $0 syntax, which refers to each whole match (here, the text matched by FR, since \K discards what precedes it), or re-use a group previously defined in the FR regex
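As a quick way to experiment with these modifiers outside Notepad++, the sketch below uses Python's re module, which offers comparable scoped flags (s, -s, i, -i) inside non-capturing groups. This is only an analogy: Notepad++ uses the Boost regex engine, and Python writes group references in the replacement as \1 and \g<0> rather than \1, ${1} or $0. The test strings are made up for the illustration.

```python
import re

# s modifier: the dot also matches line-ending characters inside the group.
dot_crosses_eol = re.search(r'(?s:a.b)', 'a\nb') is not None
# -s modifier: the dot stops at a line break, even with DOTALL set globally.
dot_stops_at_eol = re.search(r'(?-s:a.b)', 'a\nb', re.DOTALL) is None
# i modifier: letters match regardless of case inside the group.
any_case = re.search(r'(?i:<descrip>)', '<DESCRIP>') is not None
# -i modifier: exact case is required, even with IGNORECASE set globally.
exact_case = re.search(r'(?-i:<descrip>)', '<DESCRIP>', re.IGNORECASE) is None

# Group re-use: a group defined in the search part is recalled with \1.
swapped = re.sub(r'\((\w+)\)', r'[\1]', 'val (250)')
# Whole-match re-use: \g<0> is Python's counterpart of $0.
wrapped = re.sub(r'-+', r'<\g<0>>', '37--001')

print(dot_crosses_eol, dot_stops_at_eol, any_case, exact_case)
print(swapped, wrapped)
```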
Below are two examples that show how to build real regex S/R from these generic ones!

First, let's imagine that you want to delete any part within parentheses, but only in <descrip>............</descrip> ranges of text, each located on a single line.

- Paste the XML text below in a new tab:

<iden>123456 (START)</iden>
<name>Case_1</name>
<descrip>This is a (short) text to (easily) see the results (of the modifications)</descrip>
<param>val (250)</param>
<iden>123456</iden>
<name>Case_2</name>
<descrip>And the (obvious) changes occur only in (the) "descrip" tag</descrip>
<param>val (500)</param>
<iden>123456 (END)</iden>
<name>Case_3</name>
<descrip>All (the) other tags are (just) untouched</descrip>
<param>val (999)</param>

- As all the parts to delete are contained in a single line, we can use the simplified formulation:
  - SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)
  - REPLACE RR
- Obviously, as we want to delete, the RR regex is empty. So the Replace with field will be left empty
- Now, the FR regex represents a space char followed by the shortest text between parentheses => FR = (?:\x20\(.+?\)). We do not need any case modifier as this regex does not refer to letters!
- The BSR regex is simply the literal string <descrip>, with this exact case. So the BSR part of the regex is (?-i:<descrip>, the group being closed after the \G alternative
Finally, the functional regex S/R to use is:

- SEARCH (?-s)(?-i:<descrip>|(?!\A)\G).*?\K(?:\x20\(.+?\))
- REPLACE Leave EMPTY

Then:

- Open the Replace dialog (Ctrl + H)
- Untick all options
- Select the Regular expression search mode
- Move to the very beginning of the current file (Ctrl + Home)
- Hit the Find Next button several times to verify that the FR regex matches what you want! In this case, it matches a space followed by text between parentheses
- Again, move to the very beginning of the current file (Ctrl + Home)
- Click, once only, on the Replace All button

=> As expected, all text between parentheses, in the <descrip> tags only, has been deleted, while the parentheses present in other tags are untouched!
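To double-check the expected result of this single-line case without Notepad++, one possible approach in Python is to process the text line by line. This is not the \G / \K regex itself (Python's re module lacks both features), just a sketch that reproduces the same zone: everything after the literal `<descrip>` up to the end of that line. The helper name clean_line is hypothetical and the sample is a shortened copy of the text above.

```python
import re

sample = """\
<name>Case_1</name>
<descrip>This is a (short) text to (easily) see the results (of the modifications)</descrip>
<param>val (250)</param>"""

# FR: a space followed by the shortest text between parentheses.
FR = re.compile(r'\x20\(.+?\)')

def clean_line(line):
    # Split once at the literal, case-sensitive BSR; only the tail is a zone.
    # Lines without <descrip> come back unchanged (the tail is then empty).
    head, bsr, tail = line.partition('<descrip>')
    return head + bsr + FR.sub('', tail)

result = '\n'.join(clean_line(line) for line in sample.split('\n'))
print(result)
```

The `<param>` line keeps its parentheses because it never contains the BSR string.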
In the second example, we'll try to replace any run of consecutive dash characters with a single space char in any <text>..........</text> range, possibly split over several lines.

- Paste the following XML text in a new tab:

<val>37--001</val>
<text>This-is -a</text>
<pos>4-1234</pos>
<val>37--002</val>
<text>-small---example</text>
<pos>9-0012</pos>
<val>37--003</val>
<text>-of-text- which-</text>
<pos>1-9999</pos>
<val>37--004</val>
<text>need -to-be- modi fied</text>
<pos>0-0000</pos>

- As, this time, a <text>..........</text> zone may be spread over several lines, we'll use the first generic regex:
  - SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)
  - REPLACE RR
- Obviously, the RR regex is simply \x20
- Now, the FR regex represents a non-null number of consecutive dashes => FR is just -+, as the non-capturing group is not needed here
- The BSR regex is simply the literal string <text>, with this exact case => the BSR part of the regex is (?-si:<text>, the group being closed after the \G alternative
- The ESR regex is the literal string </text>, with this exact case. So the ESR regex, within its non-capturing group, is (?s-i:(?!</text>).)
Then, the real regex S/R to use is:

- SEARCH (?-si:<text>|(?!\A)\G)(?s-i:(?!</text>).)*?\K-+
- REPLACE \x20

Then:

- Open the Replace dialog (Ctrl + H)
- Untick all options
- Select the Regular expression search mode
- Move to the very beginning of the current file (Ctrl + Home)
- Hit the Find Next button several times to verify that the FR regex matches what you want! In this case, it matches any run of consecutive dash chars
- Again, move to the very beginning of the current file (Ctrl + Home)
- Click, once only, on the Replace All button

=> As expected, all runs of consecutive dashes, in the <text> tags only, have been replaced with a single space char, and the other dash characters, present in other tags, are kept!
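The Find Next check and the final Replace All can also be previewed in a few lines of Python. The sketch below is an approximation, not the Notepad++ regex: since Python's re module has no \G or \K, it first isolates each zone with the same (?!</text>). construct, then searches for dashes inside it. The sample is a shortened variant of the text above, and the explicit line break inside the `<text>` zone is an assumption added to show the multi-line case.

```python
import re

sample = (
    "<val>37--001</val>\n"
    "<text>This-is\n"      # assumed line break inside the zone
    "-a</text>\n"
    "<pos>4-1234</pos>\n"
)

# Zone: everything after <text> up to, but not including, </text>.
zone = re.compile(r'<text>((?s:(?!</text>).)*)')

# Preview, similar in spirit to repeated Find Next: list each dash run found.
hits = []
for z in zone.finditer(sample):
    for m in re.finditer(r'-+', z.group(1)):
        # Offsets are relative to the whole sample text.
        hits.append((z.start(1) + m.start(), m.group()))

# Replacement, similar in spirit to Replace All: rewrite the zones only.
result = zone.sub(lambda z: '<text>' + re.sub(r'-+', ' ', z.group(1)), sample)

print(hits)
print(result)
```

The dashes in the `<val>` and `<pos>` lines never appear in the preview list, which is the property to verify before replacing.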
Here are two other examples of this generic regex! In these, we'll further restrict the replacements to the part of each zone located before a # character.

Paste the text below in a new tab:

<iden>123456 (START)</iden>
<name>Case_1</name>
<descrip>This is a (short) text to (easily) see the results (of the modifications)# (12345) test (67890)</descrip>
<param>val (250)</param>
<iden>123456</iden>
<name>Case_2</name>
<descrip>And the (obvious) changes occur only in (the) "descrip" tag # Parentheses (Yeaah) OK</descrip>
<param>val (500)</param>
<iden>123456 (END)</iden>
<name>Case_3</name>
<descrip>All (the) other tags are (just) untouched #(This is) the end (of the test)</descrip>
<param>val (999)</param>

In this first example, with single-line <descrip> tags, two solutions are possible:

- Use the complete generic regex (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR), where ESR = #, which leads to the functional S/R:
  - SEARCH (?-s)(?-i:<descrip>|(?!\A)\G)((?!#).)*?\K(?:\x20\(.+?\))
  - REPLACE Leave EMPTY

=> This time, in addition to replacing only in each <descrip>..........</descrip> zone, NO replacement will occur after the # character of each <descrip> tag!

- Use the simplified solution and add an ESR condition at the end of the regex, giving this generic variant (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)(?=ESR):
  - SEARCH (?-s)(?-i:<descrip>|(?!\A)\G).*?\K(?:\x20\(.+?\))(?=.*#)
  - REPLACE Leave EMPTY

However, this other solution requires that every <descrip> tag contains a comment zone with a # char: in a <descrip> line without any #, the look-ahead (?=.*#) fails and nothing is replaced.
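The difference between the two solutions shows up on a line that has no # at all. The Python sketch below mimics both behaviours with the standard re module (which has no \G or \K, so each line is split at the literal `<descrip>` instead). The function names and the second test line are hypothetical, and the first function is only an approximation, since it never lets a match straddle the # char.

```python
import re

FR = r'\x20\(.+?\)'   # a space followed by the shortest text between parentheses

def stop_at_hash(line):
    # First solution (ESR = #): the zone ends at the first '#',
    # or at the end of the line when there is none.
    head, bsr, tail = line.partition('<descrip>')
    zone, hash_char, rest = tail.partition('#')
    return head + bsr + re.sub(FR, '', zone) + hash_char + rest

def need_hash_ahead(line):
    # Second solution: FR must be followed, later on the line, by a '#'.
    head, bsr, tail = line.partition('<descrip>')
    return head + bsr + re.sub(FR + r'(?=.*#)', '', tail)

with_hash = '<descrip>All (the) other tags are (just) untouched #(This is) the end (of the test)</descrip>'
no_hash = '<descrip>Only (one) zone</descrip>'

print(stop_at_hash(with_hash))
print(need_hash_ahead(with_hash))
print(stop_at_hash(no_hash))      # parentheses removed
print(need_hash_ahead(no_hash))   # line left unchanged
```

Both functions agree when a # is present; without it, only the first one still performs the deletion.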
Now, paste this other text below in a new tab:

<val>37--001</val>
<text>This-is -a--very---< # Dashes - - - OK/text>
<pos>4-1234</pos>
<val>37--002</val>
<text>-small----#---example</text>
<pos>9-0012</pos>
<val>37--003</val>
<text>-of-a-text- which-</text>
<pos>1-9999</pos>
<val>37--004</val>
<text>need -to-be- modi fied # but - not - there</text>
<pos>0-0000</pos>

This second example is a multi-line replacement, in each <text>.............</text> zone only, and also limited to the part before a # char, which may be present or not.

Of course, we'll have to use the complete generic regex (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) but, instead of a single (?!ESR), we'll have to use this variant:

(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR_1)(?!ESR_2).)*?\K(?-si:FR)

So, the functional regex S/R becomes:
- SEARCH (?-si:<text>|(?!\A)\G)(?s-i:(?!</text>)(?!#).)*?\K-+
- REPLACE \x20

=> ONLY IF a sequence of dashes is located in a <text>..........</text> zone AND, moreover, before a possible # char, it will be replaced with a single space character.

As you can verify, the third multi-line <text>.............</text> zone does not contain any # char. Thus, all the dash characters of that <text> tag are replaced with a single space char!
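The two stacked look-aheads can be tried on their own in Python. As before, this is a hedged approximation: the zone is captured with the same (?!</text>)(?!#). construct, then dashes are replaced inside it, because Python's re module cannot express \G or \K. The sample is a shortened variant of the text above, with an assumed line break inside the second zone.

```python
import re

sample = (
    "<text>-small----#---example</text>\n"
    "<pos>9-0012</pos>\n"
    "<text>-of-a-text-\n"     # assumed line break inside the zone
    "which-</text>\n"
)

# A zone character is any char, line breaks included, at a position
# where neither </text> nor # begins.
zone = re.compile(r'(<text>)((?s:(?!</text>)(?!#).)*)')

zones = [m.group(2) for m in zone.finditer(sample)]
result = zone.sub(lambda m: m.group(1) + re.sub(r'-+', ' ', m.group(2)), sample)

print(zones)
print(result)
```

The first zone stops right before the # char, so the dashes after it survive, while the second zone, which has no #, runs up to `</text>`.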
Reminder:

- You must use at least the v7.9.1 N++ release, so that the \A assertion is correctly handled
- Move to the very beginning of the file before any Find Next sequence or Replace All operation
- Do not click on the step-by-step Replace button