Generic Regex: Replacing in a specific zone of text

This regex S/R restricts a replacement to a specific zone of text, possibly repeated, located on one or several consecutive lines.
This is particularly useful when dealing with XML or HTML, if you need to make modifications only within a specific start and end tag range.
- Let FR (Find Regex) be the regex which defines the char, string or expression to be searched
- Let RR (Replacement Regex) be the regex which defines the char, string or expression which must replace the FR expression
- Let BSR (Begin Search-region Regex) be the regex which defines the beginning of the area where the search for FR must start
- Let ESR (End Search-region Regex) be the regex which defines the end of the area where the search for FR must stop

Then, the generic regex can be expressed as:
SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)

REPLACE RR

When the BSR and the different matches of the FR regex are all located in a single line, any line-ending char(s) will implicitly stop the \G chain of matches. The ESR part is then useless and the generic regex can be simplified into:

SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)

REPLACE RR
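For readers who want to prototype the idea outside Notepad++, here is a minimal Python sketch of a zone-restricted replacement. It is an emulation, not a port of the regex: Python's standard `re` module supports neither `\G` nor `\K`, so each zone is located first and FR is then replaced inside it. The helper name `replace_in_zones` and its parameters are hypothetical, and the sketch assumes that BSR never matches an empty string and that a zone with no ESR after it runs to the end of the text.

```python
import re


def replace_in_zones(text, bsr, esr, fr, rr):
    # Hypothetical helper: Python's `re` has no \G or \K, so each zone is
    # located first and FR is then replaced inside that zone only.
    # Assumption: BSR never matches an empty string.
    bsr_re, esr_re, fr_re = re.compile(bsr), re.compile(esr), re.compile(fr)
    pieces, pos = [], 0
    while True:
        begin = bsr_re.search(text, pos)
        if begin is None:
            break
        end = esr_re.search(text, begin.end())
        # Assumption: a zone with no ESR after it runs to the end of the text.
        stop = end.start() if end else len(text)
        pieces.append(text[pos:begin.end()])                  # copied unchanged
        pieces.append(fr_re.sub(rr, text[begin.end():stop]))  # zone content
        pos = stop
    pieces.append(text[pos:])                                 # tail, unchanged
    return "".join(pieces)


demo = "a-b <t>c-d-e</t> f-g <t>h-i</t>"
result = replace_in_zones(demo, r"<t>", r"</t>", r"-", "+")
print(result)
```

Note that RR follows Python's replacement syntax here (`\1` or `\g<1>` for groups), and that FR is confined to the zone slice, which is slightly stricter than the Notepad++ regex.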
IMPORTANT:
- You must use at least the v7.9.1 N++ release, so that the \A assertion is correctly handled
- You must move the caret to the very beginning of the current file (Ctrl + Home)
- If you perform a simple search, without any replacement, just click several times on the Find Next button to see the different zones that the future replacement will affect
- As soon as a replacement is needed, you must click on the Replace All button, exclusively. It performs a global replacement over the entire file
NOTES:
- Each non-capturing group, relative to the BSR, ESR and FR regexes, may be prefixed with the s or -s modifiers:
  - If the BSR and/or ESR and/or FR regexes may match EOL characters, use the s modifier in the appropriate non-capturing group(s)
  - If the BSR and/or ESR and/or FR regexes do not match EOL characters, use the -s modifier in the appropriate non-capturing group(s)
- Each non-capturing group, relative to the BSR, ESR and FR regexes, may be prefixed with the i or -i modifiers:
  - If the BSR and/or ESR and/or FR regexes are sensitive to case, use the -i modifier in the appropriate non-capturing group(s)
  - If the BSR and/or ESR and/or FR regexes are insensitive to case, use the i modifier in the appropriate non-capturing group(s)
- Of course, these modifiers may not be necessary (for instance, when searching for an exact string or for non-letter characters)
- Note that the generic regexes above show the case where:
  - These two generic regexes are sensitive to case => the -i modifier is present everywhere in the definitions
  - The ESR region of the first regex may span several lines => the s modifier is present in the ESR non-capturing group
- The FR regex may define a group, between parentheses, which can be re-used in the RR regex with the \# or ${#} syntaxes, where # represents an integer
- The RR regex may contain the $0 syntax, which refers to the whole match of the search regex (because of \K, this is the FR match), or re-use a group previously defined in the FR regex
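The effect of the scoped modifiers in the notes above can be checked quickly in Python. This is only an illustration under an assumption: Python's `re` engine is not the Boost engine used by Notepad++, but it accepts the same `(?s:...)`, `(?-s:...)`, `(?i:...)` and `(?-i:...)` group syntax for these simple cases. The test strings are made up for the demonstration.

```python
import re

# (?s:...) lets the dot match line-ending chars inside the group only.
assert re.search(r"a(?s:.)b", "a\nb") is not None
# (?-s:...) forbids it inside the group, even if DOTALL is set globally.
assert re.search(r"a(?-s:.)b", "a\nb", re.DOTALL) is None

# (?i:...) makes the group insensitive to case.
assert re.search(r"(?i:<descrip>)", "<DESCRIP>") is not None
# (?-i:...) keeps the group sensitive to case, even if IGNORECASE is set.
assert re.search(r"(?-i:<descrip>)", "<DESCRIP>", re.IGNORECASE) is None

# Re-using group 1 of the search regex in the replacement (Python writes \1).
swapped = re.sub(r"\((\w+)\)", r"[\1]", "val (250)")
print(swapped)
```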
Below are two examples that illustrate how to build real regex S/R from these generic ones!

First, let's imagine that you want to delete any part within parentheses, but only in the ranges of text <descrip>............</descrip>, each located in a single line.
- Paste the XML text below in a new tab:
<iden>123456 (START)</iden>
<name>Case_1</name>
<descrip>This is a (short) text to (easily) see the results (of the modifications)</descrip>
<param>val (250)</param>
<iden>123456</iden>
<name>Case_2</name>
<descrip>And the (obvious) changes occur only in (the) "descrip" tag</descrip>
<param>val (500)</param>
<iden>123456 (END)</iden>
<name>Case_3</name>
<descrip>All (the) other tags are (just) untouched</descrip>
<param>val (999)</param>
- As all the parts to delete are contained in a single line, we can use the simplified formulation:
  - SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)
  - REPLACE RR
- Obviously, as we want to delete, the RR regex is empty. So, the Replace with field will be left empty
- Now, the FR regex represents a space char followed by the shortest text between parentheses => FR = (?:\x20\(.+?\))
  We do not need any case modifier, as this regex does not refer to letters!
- The BSR regex is simply the literal string <descrip>, with this exact case. So BSR = <descrip>, placed in its non-capturing group as (?-i:<descrip>|(?!\A)\G)

Finally, the functional regex S/R to use is:
- SEARCH (?-s)(?-i:<descrip>|(?!\A)\G).*?\K(?:\x20\(.+?\))
- REPLACE Leave EMPTY
- Open the Replace dialog (Ctrl + H)
- Untick all options
- Select the Regular expression search mode
- Move to the very beginning of the current file (Ctrl + Home)
- Hit the Find Next button several times to verify that the FR regex matches what you want! In this case, it matches a space followed by text between parentheses
- Again, move to the very beginning of the current file (Ctrl + Home)
- Click, once only, on the Replace All button

=> As expected, all text between parentheses in the <descrip> tags only has been deleted, while the parentheses present in other tags are untouched!
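The expected result of this first example can be cross-checked with a small Python sketch. It does not run the Notepad++ regex itself, since Python's `re` lacks `\G` and `\K`. Instead, it treats everything after `<descrip>` up to the end of the line as the zone and applies FR inside it through a callback. The function name `clean_zone` is hypothetical, and the sample is a shortened copy of the text above.

```python
import re

sample = """\
<iden>123456 (START)</iden>
<name>Case_1</name>
<descrip>This is a (short) text to (easily) see the results (of the modifications)</descrip>
<param>val (250)</param>
"""


def clean_zone(match):
    # FR = a space followed by the shortest text between parentheses; RR is empty.
    return re.sub(r"\x20\(.+?\)", "", match.group(0))


# Zone = everything after the case-sensitive <descrip> up to the end of the line
# (by default the dot does not match line-ending chars, like with (?-s)).
result = re.sub(r"(?<=<descrip>).*", clean_zone, sample)
print(result)
```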
In the second example, we'll try to replace any number of consecutive dash characters with a single space char in any range <text>..........</text>, possibly split over several lines.
- Paste the following XML text in a new tab:
<val>37--001</val>
<text>This-is -a</text>
<pos>4-1234</pos>
<val>37--002</val>
<text>-small---example</text>
<pos>9-0012</pos>
<val>37--003</val>
<text>-of-text- which-</text>
<pos>1-9999</pos>
<val>37--004</val>
<text>need -to-be- modi fied</text>
<pos>0-0000</pos>
- As, this time, the <text>..........</text> zone may be spread over several lines, we'll use the first generic regex:
  - SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)
  - REPLACE RR
- Obviously, the RR regex is simply \x20
- Now, the FR regex represents a non-null number of consecutive dashes => FR is just -+ , as the non-capturing group is not needed at all
- The BSR regex is simply the literal string <text>, with this exact case => BSR = <text>, placed in its non-capturing group as (?-si:<text>|(?!\A)\G)
- The ESR regex is the literal string </text>, with this exact case. So the ESR regex, within its non-capturing group, is (?s-i:(?!</text>).)

Then, the real regex S/R to use is:
- SEARCH (?-si:<text>|(?!\A)\G)(?s-i:(?!</text>).)*?\K-+
- REPLACE \x20
- Open the Replace dialog (Ctrl + H)
- Untick all options
- Select the Regular expression search mode
- Move to the very beginning of the current file (Ctrl + Home)
- Hit the Find Next button several times to verify that the FR regex matches what you want! In this case, it matches any consecutive range of dash chars
- Again, move to the very beginning of the current file (Ctrl + Home)
- Click, once only, on the Replace All button

=> As expected, all ranges of consecutive dashes in the <text> tags only have been replaced with a single space char, and the other dash characters, present in other tags, are kept!
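The multi-line case can be sketched in Python in the same spirit. Again, this is an emulation with the standard `re` module, not the Notepad++ regex: the zone is the shortest text between `<text>` and `</text>`, and FR is applied inside it through a callback. The line break placed inside the first `<text>` zone of this shortened sample is an assumption made for the illustration, and the name `dashes_to_space` is hypothetical. Unlike the generic regex, this sketch needs the closing `</text>` to be present.

```python
import re

# Illustrative sample: the line break inside the first <text> zone is an assumption.
sample = (
    "<val>37--001</val>\n"
    "<text>This-is\n"
    "-a</text>\n"
    "<pos>4-1234</pos>\n"
    "<val>37--002</val>\n"
    "<text>-small---example</text>\n"
)


def dashes_to_space(match):
    # FR = -+ and RR = a single space, applied inside the zone only.
    return re.sub(r"-+", " ", match.group(0))


# Zone = shortest text between <text> and </text>; (?s:...) lets the dot cross lines.
result = re.sub(r"(?<=<text>)(?s:.*?)(?=</text>)", dashes_to_space, sample)
print(result)
```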
Two other examples regarding this generic regex! In these, we'll further restrict the replacements to the part of each concerned zone located before a # character.

Paste the text below in a new tab:
<iden>123456 (START)</iden>
<name>Case_1</name>
<descrip>This is a (short) text to (easily) see the results (of the modifications)# (12345) test (67890)</descrip>
<param>val (250)</param>
<iden>123456</iden>
<name>Case_2</name>
<descrip>And the (obvious) changes occur only in (the) "descrip" tag # Parentheses (Yeaah) OK</descrip>
<param>val (500)</param>
<iden>123456 (END)</iden>
<name>Case_3</name>
<descrip>All (the) other tags are (just) untouched #(This is) the end (of the test)</descrip>
<param>val (999)</param>
In this first example, with single-line <descrip> tags, two solutions are possible:
- Use the complete generic regex (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) where ESR = # , which leads to the functional S/R:
  - SEARCH (?-s)(?-i:<descrip>|(?!\A)\G)((?!#).)*?\K(?:\x20\(.+?\))
  - REPLACE Leave EMPTY
  => This time, in addition to replacing only in each <descrip>..........</descrip> zone, NO replacement will occur after the # character of each <descrip> tag!
- Use the simplified solution and add an ESR condition at the end of the regex, giving this generic variant (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)(?=ESR)
  - SEARCH (?-s)(?-i:<descrip>|(?!\A)\G).*?\K(?:\x20\(.+?\))(?=.*#)
  - REPLACE Leave EMPTY
  Here, the ESR condition is written .*# so that the # char may be located anywhere later on the line. However, this other solution requires that all the <descrip> tags contain a comment zone with a # char
Now, paste this other text in a new tab:
<val>37--001</val>
<text>This-is -a--very---< # Dashes - - - OK/text>
<pos>4-1234</pos>
<val>37--002</val>
<text>-small----#---example</text>
<pos>9-0012</pos>
<val>37--003</val>
<text>-of-a-text- which-</text>
<pos>1-9999</pos>
<val>37--004</val>
<text>need -to-be- modi fied # but - not - there</text>
<pos>0-0000</pos>
This second example is a multi-line replacement, in each <text>.............</text> zone only, and also limited to the part before a # char, which may or may not be present.

Of course, we'll have to use the complete generic regex (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) but, instead of a single (?!ESR), we'll have to use this variant:

(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR_1)(?!ESR_2).)*?\K(?-si:FR)

So, the functional regex S/R becomes:
- SEARCH (?-si:<text>|(?!\A)\G)(?s-i:(?!</text>)(?!#).)*?\K-+
- REPLACE \x20

=> ONLY IF a sequence of dashes is located in a <text>..........</text> zone AND, moreover, before a possible # char, it will be replaced with a single space character.

As you can verify, the third multi-line <text>.............</text> zone does not contain any # char. Thus, all sequences of dash characters in that <text> tag are replaced with a single space char!
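The two-ESR variant can also be sketched in Python, under the same caveat as before: it emulates the behaviour with the standard `re` module rather than running the Notepad++ regex. The zone starts after `<text>` and is stopped by whichever comes first, `</text>` (ESR_1) or `#` (ESR_2). The sample is shortened, the line break inside its second `<text>` zone is an assumption, and the names `zone` and `dashes_to_space` are hypothetical.

```python
import re

# Illustrative sample: the line break inside the second <text> zone is an assumption.
sample = (
    "<val>37--002</val>\n"
    "<text>-small----#---example</text>\n"
    "<val>37--003</val>\n"
    "<text>-of-a-text-\n"
    "which-</text>\n"
    "<pos>1-9999</pos>\n"
)


def dashes_to_space(match):
    # FR = -+ and RR = a single space, applied inside the zone only.
    return re.sub(r"-+", " ", match.group(0))


# Zone = text after <text>, stopped by </text> (ESR_1) or # (ESR_2), whichever comes first.
zone = re.compile(r"(?<=<text>)(?s:.*?)(?=</text>|#)")
result = zone.sub(dashes_to_space, sample)
print(result)
```

In the first zone only the dashes before the # char are replaced, while the second zone, which has no # char, is processed up to its closing tag.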
Reminder:
- You must use at least the v7.9.1 N++ release, so that the \A assertion is correctly handled
- Move to the very beginning of the file before any Find Next sequence or Replace All operation
- Do not click on the step-by-step Replace button