Proximity search
-
Hello, @iona-hine
So, @iona-hine, here are my clear instructions :-))
- To search for the string WordA, separated from an identical string WordA by the GREATEST range of characters, containing between 0 and 50 words max, use the following regex:
(?si)(?<=\W)(WordA)(?:\W+\w+){0,50}\W+\1(?=\W)
- To search for the string WordA, separated from an identical string WordA by the SHORTEST range of characters, containing between 0 and 50 words max, use the following regex:
(?si)(?<=\W)(WordA)(?:\W+\w+){0,50}?\W+\1(?=\W)
- To search for the string Word1, separated from another string Word2 (or the opposite) by the GREATEST range of characters, containing between 0 and 50 words max, use the following regex:
(?si)(?<=\W)(Word1)((?:\W+\w+){0,50}\W+)(Word2)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
- To search for the string Word1, separated from another string Word2 (or the opposite) by the SHORTEST range of characters, containing between 0 and 50 words max, use the following regex:
(?si)(?<=\W)(Word1)((?:\W+\w+){0,50}?\W+)(Word2)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
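If you want to play with the GREATEST / SHORTEST difference outside Notepad++, here is a minimal sketch using Python's standard `re` module. It assumes that Python's handling of `\W`, `\w`, a one-character look-behind, the back-reference `\1` and the lazy `{0,50}?` is close enough to Boost's for illustration; the sample text and the word `cat` (standing for WordA) are made up.

```python
import re

# Hypothetical sample: the word "cat" (our WordA) appears three times.
text = " x cat a b cat c cat y "

# GREATEST range: greedy gap of 0 to 50 words between two identical words.
greatest = r"(?si)(?<=\W)(cat)(?:\W+\w+){0,50}\W+\1(?=\W)"
# SHORTEST range: same regex, with a lazy quantifier {0,50}?
shortest = r"(?si)(?<=\W)(cat)(?:\W+\w+){0,50}?\W+\1(?=\W)"

greatest_matches = [m.group(0) for m in re.finditer(greatest, text)]
shortest_matches = [m.group(0) for m in re.finditer(shortest, text)]

print(greatest_matches)  # ['cat a b cat c cat']
print(shortest_matches)  # ['cat a b cat']
```

The greedy gap first swallows as many words as it can, then gives them back one by one until a final `cat` fits; the lazy gap starts from zero words and stops at the first `cat` it meets.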
-
Notes :
-
Of course, replace the strings WordA, Word1 and Word2 with whatever you like, and change the number 50 of “gap” words, if necessary
-
The (?si) syntax, at the beginning of these regexes, consists of modifiers which mean that:
- The dot (.) special character matches absolutely any single character (standard or EOL one)
- The search will be performed in a case-insensitive way (if you need a case-sensitive search, just use the syntax (?s-i))
-
The part (?<=\W)(Word1) OR (?<=\W)(WordA) matches the strings Word1 or WordA ONLY IF they are preceded by a non-word character
-
The (?:\W+\w+){0,50}\W+ part represents the greatest range of words, each preceded by non-word character(s), up to 50, and followed by some non-word character(s). Note that the sub-part (?:\W+\w+) is, itself, a non-capturing group!
-
If a question mark, ?, is added right after the quantifier range {0,50}, the regex engine will look, instead, for the smallest number of gap words which lets the overall regex match
-
The part (Word2)(?=\W) OR \1(?=\W) (\1 stands for the string WordA) looks for the string Word2 or WordA ONLY IF it’s followed by a non-word character
-
Finally, the part (?<=\W)(?3)(?2)(?1)(?=\W), located after the alternation symbol | in the last two regexes, tries to match the opposite form Word2........Word1.
-
We cannot use back-references here and must use the called subpattern construction (?N), where N is a group number. Indeed, when the regex engine tries the second alternative, the back-references \1, \2 and \3 would not have been defined, because the first alternative failed. Whereas:
- The called subpattern (?1) is identical to the string Word1
- The called subpattern (?2) is identical to the regex (?:\W+\w+){0,50}\W+ (or (?:\W+\w+){0,50}?\W+, in the SHORTEST version)
- The called subpattern (?3) is identical to the string Word2
-
Remember, also, that the parentheses surrounding the look-around features do not create any capturing group
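Python's standard `re` module does not support `(?N)` subpattern calls, so the sketch below (an illustration under that assumption, not guy038's method) builds the second alternative by repeating the sub-patterns as text, which is what `(?3)(?2)(?1)` amounts to. The helper name `proximity_pattern` and the sample sentence are hypothetical. It also shows why back-references can't do the job: in the second alternative, `\3`, `\2` and `\1` refer to groups that were never set, and Python's engine simply fails that branch.

```python
import re

def proximity_pattern(word1, word2, max_gap=50, shortest=False):
    """Hypothetical helper: Word1...Word2 OR Word2...Word1, at most max_gap words apart.

    The second alternative repeats the sub-patterns textually, because
    Python's re has no (?1)/(?2)/(?3) subpattern calls.
    """
    lazy = "?" if shortest else ""
    w1, w2 = re.escape(word1), re.escape(word2)
    gap = rf"(?:\W+\w+){{0,{max_gap}}}{lazy}\W+"
    return rf"(?si)(?<=\W)({w1})({gap})({w2})(?=\W)|(?<=\W){w2}{gap}{w1}(?=\W)"

sample = " the red fox saw a hen and a cow near the barn "

good = [m.group(0) for m in re.finditer(proximity_pattern("the", "a", shortest=True), sample)]

# Same idea with back-references: groups 1 to 3 are unset in the second
# branch, so that branch can never match.
bad_pattern = r"(?si)(?<=\W)(the)((?:\W+\w+){0,50}?\W+)(a)(?=\W)|(?<=\W)\3\2\1(?=\W)"
bad = [m.group(0) for m in re.finditer(bad_pattern, sample)]

print(good)  # ['the red fox saw a', 'a cow near the']
print(bad)   # ['the red fox saw a']
```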
You may test these regexes, with real examples, on the license.txt file provided with any N++ release!
For instance, in the N++ v7.3.3 license.txt file, with a maximum of 50 words between:
- To look for the shortest ranges between two occurrences of the article the, whatever its case, use:
(?si)(?<=\W)(the)(?:\W+\w+){0,50}?\W+\1(?=\W)
=> 86 matches
- To look for the greatest ranges between two occurrences of the article a, whatever its case, use:
(?si)(?<=\W)(a)(?:\W+\w+){0,50}\W+\1(?=\W)
=> 16 matches
- To look for the greatest ranges between the article the and the article a (or between a and the), whatever their case, use:
(?si)(?<=\W)(the)((?:\W+\w+){0,50}\W+)(a)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
=> 34 matches
- To look for the shortest ranges between the article the and the article a (or between a and the), whatever their case, use:
(?si)(?<=\W)(the)((?:\W+\w+){0,50}?\W+)(a)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
=> 49 matches
IMPORTANT:
I just realized that we must add a blank line at the very beginning AND another one at the very end of the current file.
Indeed, if the string Word1 or WordA is located at the very beginning of the current file AND/OR the string Word2 or WordA is located at the very end of the current file, without any additional line break, the look-arounds (?<=\W) and (?=\W) cannot be satisfied, so there is no overall match!
Best Regards,
guy038
P.S. :
- I forgot to add that, if you use, for instance, the Courier New font, a word character, which can be found with the simple regex \w, is any character from the list below:
------------------------------------------------------------------ BASIC Word characters ----------------- 0123456789 _ ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz ------------------------------------------------------------------ LATIN Letters ------------------------- ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖØÙÚÛÜÝŸÞ ĀĂĄĆĈĊČĎĐĒĔĖĘĚĜĞĠĢĤĦĨĪĬĮİIJĴĶĹĻĽĿŁŃŅŇŊŌŎŐŒŔŖŘŚŜŞŠŢŤŦŨŪŬŮŰŲŴŶŹŻŽ àáâãäåæçèéêëìíîïðñòóôõöøùúûüýÿþ āăąćĉċčďđēĕėęěĝğġģĥħĩīĭįiijĵķĺļľŀłńņňŋōŏőœŕŗřśŝşšţťŧũūŭůűųŵŷźżž ƏƠƯǍǏǑǓǕǗǙǛǺǼǾ ẀẂẄ ẠẢẤẦẨẪẬẮẰẲẴẶẸẺẼẾỀỂỄỆỈỊỌỎỐỒỔỖỘỚỜỞỠỢỤỦỨỪỬỮỰỲỴỶỸ əơưǎǐǒǔǖǘǚǜǻǽǿ ẁẃẅ ạảấầẩẫậắằẳẵặẹẻẽếềểễệỉịọỏốồổỗộớờởỡợụủứừửữựỳỵỷỹ ------------------------------------------------------------------ Unique LATIN Letters ------------------ ß ĸ ʼn ſ ƒ ℓ fi fl ------------------------------------------------------------------ MISCELLANEOUS Symbols ----------------- ¹ ² ³ ⁿ ª º µ Ω ------------------------------------------------------------------ GREEK Letters ------------------------- ΆΈΉΊΌΎΏ ΑΒΓΔΕΖΗΘΙΚΛΜΝΞΟΠΡΣΤΥΦΧΨΩΪΫ άέήίόύώ αβγδεζηθικλμνξοπρστυφχψωϊϋ ΐΰ ς ------------------------------------------------------------------ CYRILLIC Letters ---------------------- ЁЂЃЄЅІЇЈЉЊЋЌЎЏ АБВГДЕЖЗИЙКЛМНОПРСТУФХЦЧШЩЪЫЬЭЮЯ ҐҒҖҚҜҢҮҰҲҸҺӘӨ ёђѓєѕіїјљњћќўџ абвгдежзийклмнопрстуфхцчшщъыьэюя ґғҗқҝңүұҳҹһәө ------------------------------------------------------------------ HEBRAIC Letters ----------------------- אבגדהוזחטיךכלםמןנסעףפץצקרשת פֿﭏﬠשׁשׂשּׁשּׂאַאָאּבּגּדּהּוּזּטּיּךּכּלּמּנּסּףּפּצּקּרּשּתּוֹבֿכֿ ------------------------------------------------------------------ ARABIC Letters ------------------------ ءآأؤإئابةتثجحخدذرزسشصضطظعغفقكلمنهوىي ً ٌ ٍ َ ُ ِ ّ ْ ٠١٢٣٤٥٦٧٨٩ ٰ ٱٴپچژڤکگیە ۤ ۰۱۲۳۴۵۶۷۸۹ ﭐﭑﭖﭗﭘﭙﭪﭫﭬﭭﭺﭻﭼﭽﮊﮋﮎﮏﮐﮑﮒﮓﮔﮕﯼﯽﯾﯿﰈﰉﰎﰒﰱﰲﰿﱀﱁﱂﱃﱄﱎﱏﱘﱙ ﱞﱟﱠﱡﱢ ﱪﱭﱮﱯﱰﱳﱴﱵﲎﲏﲑﲔﲜﲝﲞﲟﲠﲡﲢﲣﲤﲥﲦﲨﲪﲬﲰﳉﳊﳋﳌﳍﳎﳏﳐﳑﳒﳓﳔﳕﳘﳚﳛﳜﳝﴰﴼﴽﶈﷲﺀﺁﺂﺃﺄﺅﺆﺇﺈﺉﺊﺋﺌﺍﺎﺏﺐﺑﺒﺓﺔﺕﺖﺗﺘﺙﺚﺛﺜﺝﺞﺟﺠﺡﺢﺣﺤ ﺥﺦﺧﺨﺩﺪﺫﺬﺭﺮﺯﺰﺱﺲﺳﺴﺵﺶﺷﺸﺹﺺﺻﺼﺽﺾﺿﻀﻁﻂﻃﻄﻅﻆﻇﻈﻉﻊﻋﻌﻍﻎﻏﻐﻑﻒﻓﻔﻕﻖﻗﻘﻙﻚﻛﻜﻝﻞﻟﻠﻡﻢﻣﻤﻥﻦﻧﻨﻩﻪﻫﻬﻭﻮﻯﻰﻱﻲﻳﻴﻵﻶﻷﻸﻹﻺﻻﻼ
So, a non-word character (\W) is any character of the Courier New font which does not belong to the list just above!
- Concerning the (?N) regex syntax (called subpatterns), you may also refer to the last part of the two posts below:
-
Thank you for taking the time to explain the expression to me, as well as for providing the necessary answer. I have done my best to follow the explanation. If I may ask for one clarification:
I ran the following Find in Files search:
(?si)(?<=\W)(word1)((?:\W+\w+){0,50}\W+)(word2)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
and got the following result: 1908 hits in 185 files
I then inverted the search as a check:
(?si)(?<=\W)(word2)((?:\W+\w+){0,50}\W+)(word1)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
and got 1946 hits in 181 files.
Could the difference be accounted for by the absence of initial and final blank lines?
(I’m searching a few hundred files, so I haven’t amended them. I’m using Notepad++ to investigate a much larger discrepancy, so the results are already an improvement. But it would help to be confident about where the difference lies.)
-
I tried to unravel this for myself by checking through one file manually (with the help of Ctrl + F) and comparing my findings with the search result. This has left me even more confused.
Manually, I find that (checking the space of 50 words before and 50 words after in each case, noting that punctuation had already been removed):
word2 appears 17 times in the file
There are 15 instances of word1 preceding word2
There are 8 instances of word1 following word2.
And the sum is also true: there are 23 instances of word1 around word2.
I also find that:
- 10 instances of word2 are preceded by at least one instance of word1.
- 7 instances of word2 are followed by at least one instance of word1.
- 5 instances of word2 have no proximate instances of word1.
- 12 instances of word2 have at least one proximate instance of word1
- 5 instances of word2 are both preceded and followed by at least one instance of word1.
Now, when I run the regex search (see above/below), I get a report of 11 hits in this file, regardless of the order in which I position word1 and word2. So what event is being reported?
“(?si)(?<=\W)(word1)((?:\W+\w+){0,50}\W+)(word2)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)”
“(?si)(?<=\W)(word2)((?:\W+\w+){0,50}\W+)(word1)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)”
-
In a related but different direction, I was looking to automate a find-and-replace to add the blank lines mentioned at the top and bottom of the file(s).
I came up with this regex for the Find-what box:
(?-m)^|\z
, using \r\n in the Replace with box, and that does the job when I try it in RegexBuddy, but when I try the same in Notepad++, I get different (and bad) results.
For example, if I start with this text:
abc
def
ghi
after the replace I end up with:
blank line
a
b
c
blank line
d
e
f
blank line
g
h
i
blank line
I expected to get (and I did get it in RB, with Boost 1.54-1.57):
blank line
abc
def
ghi
blank line
Can anyone fill me in on what goes wrong with this find-and-replace in N++?
-
Hello, @alan-kilborn,
Maybe I’ll give you a detailed answer, after I solve the Iona problem :-)). @iona-hine, I’m preparing my next post to you. Just wait about one hour!
In the meantime, Alan, just try this simple S/R:
SEARCH
(?s).+
REPLACE
\r\n$0\r\n
Et voilà !
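This S/R works because (?s).+ grabs the whole (non-empty) file in one single match, so the replacement runs only once. A minimal Python sketch of the same idea, assuming Windows \r\n line breaks; the function name `add_blank_lines` is hypothetical:

```python
import re

def add_blank_lines(text):
    # (?s).+ matches the whole non-empty text in one go, so the replacement
    # wraps it with exactly one line break before and one after.
    return re.sub(r"(?s).+", lambda m: "\r\n" + m.group(0) + "\r\n", text, count=1)

sample = "abc\r\ndef\r\nghi"
print(repr(add_blank_lines(sample)))  # '\r\nabc\r\ndef\r\nghi\r\n'
```

An empty file has nothing for .+ to match, so it is left unchanged.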
Cheers,
guy038
-
Hi, @iona-hine,
Ah yes! I didn’t think about switching the two words. I immediately checked with my test file, the license.txt file! And, luckily, using the Count command in the Find dialog, I always found the same results between the regexes:
(?si)(?<=\W)(Word1)((?:\W+\w+){0,50}\W+)(Word2)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
AND
(?si)(?<=\W)(Word2)((?:\W+\w+){0,50}\W+)(Word1)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
and, also, between the regexes :
(?si)(?<=\W)(Word1)((?:\W+\w+){0,50}?\W+)(Word2)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
AND
(?si)(?<=\W)(Word2)((?:\W+\w+){0,50}?\W+)(Word1)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
I tried with some pairs of words (the and a, you and it, you and the, a and you, the and it…). And, each time, the number of occurrences found was identical for both regexes.
I also repeated the test with a 750K file and the results were the same, whichever form, Word1........Word2 or Word2........Word1, was used!
I finally changed the range between the two boundaries from {0,50} to {0,10} => Results OK, in all cases.
So, Iona, I don’t know exactly why you got a difference!?
-
Could you, first, identify one of your files which does not give the same results in the two cases?
-
Then, send me an e-mail, with this file attached, at:
-
Don’t forget to tell me the real boundary strings that you’re using as Word1 and Word2! Thanks :-))
BTW, the absence of the initial and final blank lines doesn’t matter. It would be a problem ONLY IF Word1 and/or Word2 were located at the very beginning and/or at the very end of the scanned file!
Finally, at each location of the current file reached by the regex engine (one character after another!), it tests the regex in order to match either the range Word1.....Word2 OR the range Word2.....Word1. To be convinced, just copy this short part of the license.txt file, below, into a new tab:
1. You may copy and distribute verbatim copies of the Program's source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and give any other recipients of the Program a copy of this License along with the Program. You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion of it, thus forming a work based on the Program, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:
Preferably, select the option View > Word wrap, to wrap long lines
Now, using the Find dialog and the two words the and a of my previous post in, for instance, the regex:
(?si)(?<=\W)(the)((?:\W+\w+){0,50}?\W+)(a)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
OR
(?si)(?<=\W)(a)((?:\W+\w+){0,50}?\W+)(the)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
If you click successively on the Find Next button, you’ll find 5 occurrences:
-
The first three are of the type
the..........a
-
The last two are of the type
a..........the
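The same experiment can be replayed in Python with the excerpt above. This is a sketch under the same assumption as before (Python's `re` has no `(?N)` calls, so the reversed alternative is spelled out by the hypothetical helper `proximity_pattern`). Both word orders give the same 5 matches, three of type the...a followed by two of type a...the.

```python
import re

def proximity_pattern(word1, word2, max_gap=50, shortest=False):
    # Hypothetical helper: the second alternative repeats the sub-patterns
    # textually, because Python's re has no (?1)/(?2)/(?3) calls.
    lazy = "?" if shortest else ""
    w1, w2 = re.escape(word1), re.escape(word2)
    gap = rf"(?:\W+\w+){{0,{max_gap}}}{lazy}\W+"
    return rf"(?si)(?<=\W)({w1})({gap})({w2})(?=\W)|(?<=\W){w2}{gap}{w1}(?=\W)"

excerpt = (
    "1. You may copy and distribute verbatim copies of the Program's source "
    "code as you receive it, in any medium, provided that you conspicuously "
    "and appropriately publish on each copy an appropriate copyright notice "
    "and disclaimer of warranty; keep intact all the notices that refer to "
    "this License and to the absence of any warranty; and give any other "
    "recipients of the Program a copy of this License along with the "
    "Program. You may charge a fee for the physical act of transferring a "
    "copy, and you may at your option offer warranty protection in exchange "
    "for a fee. 2. You may modify your copy or copies of the Program or any "
    "portion of it, thus forming a work based on the Program, and copy and "
    "distribute such modifications or work under the terms of Section 1 "
    "above, provided that you also meet all of these conditions:"
)

def find_all(word1, word2):
    pattern = proximity_pattern(word1, word2, shortest=True)
    return [m.group(0) for m in re.finditer(pattern, excerpt)]

the_a = find_all("the", "a")
a_the = find_all("a", "the")

print(len(the_a), the_a == a_the)       # 5 True
print([s.split()[0] for s in the_a])    # ['the', 'the', 'the', 'a', 'a']
```

The very first the is skipped: the nearest a after it lies more than 50 words away.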
See you later,
Cheers,
guy038
-
@guy038 said:
Maybe I’ll give you a detailed answer, after I solve the Iona problem :-)
In the meantime, Alan, just try this simple S/R:
Sure, @guy038, the alternative regex you supplied accomplishes the goal, but the more interesting thing is why the regex I had does what it does within Notepad++…
-
Hello, @alan-kilborn,
The problem you raised, Alan, is due to the fact that the present N++ Boost regex engine does NOT handle backward assertions properly!! This is the case, for instance, for the syntaxes \A (or (?m)^), \b, \B, \< as well as some look-behind syntaxes :-((
Indeed, with your regex (?-m)^|\z and the sample text below:
abc
def
ghi
when you hit the Find Next button repeatedly, it wrongly matches the zero-length string 12 times, instead of 2 times (at the very beginning and at the very end of the file).
To get the right behaviour, as with the RegexBuddy software (apart from my work-around), the ONLY way is to use François-R Boyer’s regex code, which is a very good N++ implementation of the Boost regex library!
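For comparison, an engine that handles these assertions correctly finds only two zero-length matches. Here is a quick check with Python's `re`. Assumptions: without the MULTILINE flag, Python's ^ already matches only at the very beginning, so it plays the role of (?-m)^, and Python's \Z (end of the text only) plays the role of Boost's \z.

```python
import re

text = "abc\r\ndef\r\nghi"

positions = [m.start() for m in re.finditer(r"^|\Z", text)]
padded = re.sub(r"^|\Z", "\r\n", text)

print(positions)     # [0, 13]
print(repr(padded))  # '\r\nabc\r\ndef\r\nghi\r\n'
```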
If you install the improved François-R Boyer version of the Boost regex engine, you’ll get some strong new regex features:
-
Search is performed on 32-bit code points, so it can handle characters over the BMP (Basic Multilingual Plane). An interesting feature for Asian users!
-
It can manage NUL characters, both in search and in replacement.
-
Look-behinds are correctly handled, even when they OVERLAP the end of the previous match.
-
It can handle ALL the Universal Character Names (UCN) of the UCS Transformation Format, from \x{0} to \x{7FFFFFFF}, particularly all those with code points over \x{FFFF}, which are outside the BMP.
-
The backward regex search is not stopped when it meets a character with a Unicode code point over \x{00FF}
-
The case modifiers \u, \l, \U and \L, in replacement, do change accented letters too!!
-
Most of the time, the step-by-step replacement, often broken with the present version, works nicely with François-R Boyer’s version
Here is, below, a non-exhaustive list of issues with the CURRENT regex engine which DO NOT occur with François-R Boyer’s version:
-
Overlapping look-behinds and matched strings are NOT correctly handled. For instance, given the 20-character subject string aaaabaaababbbaabbabb and SEARCH = (?<!a)ba*, we get 6 matches but, unfortunately, 2 results are wrong. With François’s improved version, it’s all OK!
-
We can’t use the NUL character in replacement. For example, with the simple S/R SEARCH = ABC and REPLACE = DEF\x00GHI, the result is the string DEF only :-(. François’s version does insert the NUL character between the strings DEF and GHI!
-
BACKWARD assertions are NOT correctly supported. E.g.: SEARCH = \A. successively matches all the characters of the FIRST line. With François’s version, it only matches the FIRST character of the current file
-
It doesn’t search and replace characters which are outside the Basic Multilingual Plane (BMP). For instance, in a full UTF-8 file (with a BOM), if SEARCH = \x{104A5}\x{20AC} and REPLACE = \x{A3}\x{10482}, the present regex engine answers Invalid regular expression!, whereas François’s version does the replacement correctly! (Of course, your text font must handle these “over BMP” characters)
-
Now, let’s suppose, for instance, the French subject string Un évènement on a new line, and the simple SEARCH regex \w. After a click on the Find Next button, close the Replace dialog and keep on searching word characters by hitting the F3 key. When you’re about at the end of the string, just go searching backwards by hitting SHIFT + F3. You’ll notice that it CAN’T go backwards past the è character!!! François’s version works well in both directions!
-
A last example: if you try to mark the matches of the simple SEARCH regex (?<=.)., the present regex engine marks every OTHER character. With François’s version, it correctly finds all characters, except for the very first one of each line!
-
The S/R SEARCH = (.*) and REPLACE = \U\1\E does change any lower-case letter, even accented, into its associated upper-case letter!
-
François-R Boyer also created a new option, SCFIND_REGEXP_LOCALEORDER, to get ranges of characters in locale order, NOT in Unicode order. For instance, the regex range (?-i)[A-B] would match all the following characters AÀÁÂÃÄÅĀĂĄǍǺẠẢẤẦẨẪẬẮẰẲẴẶǼB, in a true UTF-8 file, with a suitable font!
-
To end with, François-R Boyer’s version can display the EXACT error messages, instead of the generic message Invalid regular expression. For instance, the regex (\d+ab would report the Unmatched marking parenthesis error message!
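The look-behind, (?<=.). and \A. symptoms above all look as if each new search restarted with the text before the search position hidden. The Python sketch below models that assumption by searching a slice of the subject, and compares it with a normal search, in which look-behinds can see the preceding characters. It is only a model of the observed symptoms, not the actual N++ or Scintilla code; both function names are hypothetical.

```python
import re

def correct_matches(pattern, subject):
    # Normal search: look-behinds and \A see the whole subject.
    return [(m.start(), m.group()) for m in re.finditer(pattern, subject)]

def restart_without_context(pattern, subject):
    # Hypothetical model of the bug: each new search runs on a slice of the
    # subject, so look-behinds and \A cannot see the preceding characters.
    regex = re.compile(pattern)
    found, pos = [], 0
    while pos <= len(subject):
        m = regex.search(subject[pos:])
        if m is None:
            break
        found.append((pos + m.start(), m.group()))
        # Step past the match (and past an empty match, to avoid looping forever).
        pos += m.end() if m.end() > m.start() else m.start() + 1
    return found

subject = "aaaabaaababbbaabbabb"  # the 20-character string above

print(correct_matches(r"(?<!a)ba*", subject))          # 4 matches
print(restart_without_context(r"(?<!a)ba*", subject))  # 6 matches, 2 of them wrong
print(restart_without_context(r"(?<=.).", "abcd"))     # [(1, 'b'), (3, 'd')]
print(restart_without_context(r"\A.", "ab\ncd"))       # [(0, 'a'), (1, 'b')]
```

The two wrong matches are the b at offsets 15 and 18: both are preceded by an a, which the restarted search can no longer see.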
VERY IMPORTANT: the Beta N++ regex code of François-R Boyer does NOT work with any version of N++ AFTER the 6.9.0 version! So, just follow the few steps below:
- First, download the .zip archive of Notepad++ v6.9.0 from the link below:
https://notepad-plus-plus.org/repository/6.x/6.9/npp.6.9.bin.zip
-
Extract all the contents of this archive into any folder, in order to get a local N++ installation :-))
-
Inside this new folder, rename the SciLexer.dll file as, for instance, SciLexer.690
-
Then, to get this Beta N++ regex code (which, BTW, has NEVER been part of ANY official N++ release):
-
Download the modified SciLexer.dll file of François-R Boyer from the link below:
http://sourceforge.net/projects/npppythonplugsq/files/Beta N%2B%2B regex code/
-
Copy this file into the installation folder, along with the Notepad++.exe and SciLexer.690 files
-
Also download the readme.txt file, for more information
-
Restart Notepad++ and enjoy it!
Remark :
Remember that this modified SciLexer.dll file, built in May 2013, is also based on the old Scintilla version v2.2.7!
Of course, I have longed, for more than 3 years, for this nice version to be fully integrated with both the latest versions of N++ and Scintilla. Unfortunately, up to now, NO ONE has felt interested in implementing the new regex code and, as my C++ skills are, unfortunately, close to 0, I’m just trying hard to be satisfied with the present buggy N++ regex code :-((
So, to conclude, in the meantime, I keep, in addition to the latest N++ version, a local installation of N++ v6.9 with François-R Boyer’s modified version of SciLexer.dll on my laptop, in order to see the correct search behaviour of most regexes and to perform, from time to time, special replacements ;-)
Best Regards,
guy038
-
-
Okay, @guy038…I guess I didn’t expect it to go THERE. :-D
I vaguely remember something about this “Boyer” thing before, probably in previous posts by you.
So…how do we bring Francois R Boyer out of retirement to get a new version of this that will work with a newer Notepad++? :-D I guess that isn’t an option or it would have been done already.
So…I have a pretty good C++ (15 years?) background. My first thought was that maybe I could do something. So last night I started looking at that code. I don’t have trouble with the C++ part, but I have problems with seeing how that code “fits in”. A usual approach for something like this is to use Beyond Compare to do some code differencing to see the changes from one version to another, then take those changes and attempt to apply them to, in this case, a codebase that has evolved (i.e., the current Notepad++ codebase). Unfortunately, in this case, it appears difficult to get a solid starting point.
Maybe others are more familiar with the background of this and are considering working on it? Or would be willing to help me figure out where it stands and how to take it forward?
It seems a pity that Notepad++'s regex engine is known to be deficient and no one cares to make it better.
-
Unfortunately, up to now, NO ONE has felt interested in implementing the new regex code
You wanna make me feel guilty, don’t you ;-)
I’m sorry, I totally forgot that I wanted to take a look and give it a try.
I can say I gave it a try but failed terribly - but now I do have an excuse - no Windows anymore :-)
Cheers
Claudia
-
So I am interested in helping out if I somehow can. I have a good deal of C++ experience, although I took a very quick look at the referenced code and sort of feel the same way about it that @Alan-Kilborn does… I guess @guy038 would have to “drive the bus” on this effort? Not saying he needs C++ experience to do this; some of my best managers ever didn’t understand code at all! Managers facilitate, and have a good understanding of the problem, and there is nobody better on “Notepad++ plus regex” than @guy038
And @Claudia-Frank , I have zero feeling that @guy038 was taking a shot at you! Just not in his nature! (I know you know this).
:)
-
Hello @scott-sumner and @claudia-frank,
Oh no, Claudia, please, don’t feel guilty at all! Scott is totally right about this: I don’t want to “put pressure” on anyone! I would just like to point out what a nice N++ improvement it would be to use a stable regex engine implementation, with new features and without some unacceptable bugs of the current version :-))
On the contrary, Claudia, it’s rather me who should feel guilty for having put aside, for a long time, the testing of your Python script RegexTesterPro :-((
So, cool! We all have plenty of things to do, all day long and, as the common proverb says, tomorrow is another day!
Cheers,
guy038
-
Hello friends, and Happy Easter!
I just read this topic and, guy038, I don’t understand 2 things about those regexes of yours, like this one:
(?si)(?<=\W)(Word1)((?:\W+\w+){0,50}\W+)(Word2)(?=\W)|(?<=\W)(?3)(?2)(?1)(?=\W)
You forgot to mention what the second part of your regex, (?<=\W)(?3)(?2)(?1)(?=\W), means?
You already know me: I ask a lot of questions, to learn more. And if I don’t understand, I ask.