Hi, @Lemmy-westin, and All,
Thinking about your problem again, I managed to build a general method and the corresponding regexes !
So, let’s suppose you have a text, split into TWO parts by a single line made of # characters.
Then, you may want to search for :
Case D1 : Lines which occur ONLY in the FIRST part of the text ( BEFORE the ###### line )
Case E1 : Lines which occur in BOTH parts of the text ( BEFORE and AFTER the ###### line )
Case D2 : Parts of lines which occur ONLY in the FIRST part of the text ( BEFORE the ###### line )
Case E2 : Parts of lines which occur in BOTH parts of the text ( BEFORE and AFTER the ###### line )
Case D3 : Single words which occur ONLY in the FIRST part of the text ( BEFORE the ###### line )
Case E3 : Single words which occur in BOTH parts of the text ( BEFORE and AFTER the ###### line )
Remark :
If you want to search for ranges which occur in the SECOND part of the text exclusively, just swap the two parts of the text and use case D1, D2 or D3 !
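If you prefer to do that swap with a script rather than by hand, here is a minimal Python sketch. It is only an illustration: the helper name swap_parts is hypothetical, and it assumes Unix ( \n ) line endings and a text containing exactly ONE line with # characters.

```python
def swap_parts(text: str) -> str:
    """Swap the two parts of a text around its single separator line.

    Assumption: the separator is the only line containing a '#' character.
    """
    # Remember a trailing newline so it can be restored after the swap.
    trailing = text.endswith("\n")
    body = text[:-1] if trailing else text

    lines = body.split("\n")
    separators = [i for i, line in enumerate(lines) if "#" in line]
    if len(separators) != 1:
        raise ValueError(
            f"expected exactly one line containing '#', found {len(separators)}"
        )

    i = separators[0]
    # Second part first, then the separator line, then the first part.
    swapped = "\n".join(lines[i + 1:] + [lines[i]] + lines[:i])
    return swapped + ("\n" if trailing else "")


print(swap_parts("first A\nfirst B\n#####\nsecond A"))
```

After the swap, the D1, D2 and D3 searches report the ranges that occur only in what was, originally, the second part.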
To define these three kinds of ranges correctly, we’ll use a start boundary and an end boundary. They are used in the look-behind and look-ahead structures, and are NEVER part of the text matched by the searched regex !
For cases D1 and E1 :
Start boundary = ^ ( Beginning of line ) OR \R ( End of line character(s) of the previous line )
End boundary = \R ( End of line character(s) = \r\n in Windows files or \n in Unix files )
Searched regex = .+ ( All the standard characters of any NON-empty line )
For cases D2 and E2 :
Start boundary = % ( A dummy character, NOT already used in the current text )
End boundary = % ( The same character as above )
Searched regex = .+ ( Any NON-null range of standard characters, between the two % excluded limits )
For cases D3 and E3 :
Start boundary = \W ( A NON-word character, so any character different from [0-9A-Za-z_] and from all accented letters. This also includes the End of line characters )
End boundary = \W ( A NON-word character, as above )
Searched regex = (\w+) ( A complete single word, of any length, between two excluded NON-word characters )
Now, here are the regexes to achieve these different searches :
Case D1 : (?i)^(.+)(?s)(?=\R.*#+(?!.*\R\1(\R|\z))) OR (?i)^(.+)(?s)(?=\R.*#+)(?!.*#+.*\R\1(\R|\z))
Case E1 : (?i)^(.+)(?s)(?=\R.*#+(?=.*\R\1(\R|\z))) OR (?i)^(.+)(?s)(?=\R.*#+.*\R\1(\R|\z))
You may test the D1 and E1 regexes with, for instance, the text, below, in a NEW tab :
When we speak of free
software, we are referring to
freedom, not price. Our General
When we speak of free
software, we are referring to
make sure that you have the
freedom to distribute copies
This is a simple test
#########################################
This IS A simple TEST
When we SPEAK of free
freedom, not price. Our General
make sure that you have the
freedom, not price. Our General
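For those who want to check the D1 and E1 searches outside Notepad++, here is a minimal Python sketch. Note that it is an adaptation, NOT the regexes above verbatim : Python's re module has no \R or \z, so the sketch assumes Unix line endings and uses \n and \Z instead, passes the case-insensitive and multi-line options as flags, and scopes the (?s) part with a (?s:...) group. The names SAMPLE, D1, E1 and find_lines are just illustrative.

```python
import re

# Test text from the post: two parts separated by one line of # characters.
SAMPLE = "\n".join([
    "When we speak of free",
    "software, we are referring to",
    "freedom, not price. Our General",
    "When we speak of free",
    "software, we are referring to",
    "make sure that you have the",
    "freedom to distribute copies",
    "This is a simple test",
    "#########################################",
    "This IS A simple TEST",
    "When we SPEAK of free",
    "freedom, not price. Our General",
    "make sure that you have the",
    "freedom, not price. Our General",
])

# Assumed Python ports of the nested D1 regex and the merged E1 regex.
D1 = r"^(.+)(?=(?s:\n.*#+(?!.*\n\1(?:\n|\Z))))"
E1 = r"^(.+)(?=(?s:\n.*#+.*\n\1(?:\n|\Z)))"


def find_lines(pattern: str, text: str) -> list[str]:
    """Return every line matched by pattern, ignoring case."""
    return [m.group(1) for m in re.finditer(pattern, text, re.I | re.M)]


print("D1:", find_lines(D1, SAMPLE))
print("E1:", find_lines(E1, SAMPLE))
```

With this sample, the D1 search should report the two "software, we are referring to" lines and the "freedom to distribute copies" line, and the E1 search the five other lines of the first part.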
Case D2 : (?i)(?<=%)(.+)(?s)(?=%.*#+(?!.*%\1%)) OR (?i)(?<=%)(.+)(?s)(?=%.*#+)(?!%.*#+.*%\1%)
Case E2 : (?i)(?<=%)(.+)(?s)(?=%.*#+(?=.*%\1%)) OR (?i)(?<=%)(.+)(?s)(?=%.*#+.*%\1%)
You may test the D2 and E2 regexes with, for instance, the text, below, in a NEW tab :
111 %When we speak of free% 111
222,%software, we are referring to%,222
333 % freedom, not price. Our General% 333
abc %When we speak of free% abc
xyz,%software, we are referring to%,xyz
%make sure that you have the%
555 %freedom to distribute copies% 555
666:%This is a simple test%:666
#####################################################################
777|||%This is A simple TEST%|||777
888----%When we SPEAK of free%----888
999% freedom, not price. Our General%999
abc %make sure that you have the% abc
000000000% freedom, not price. Our General%0000000000000000
------------- %make sure that you have the% ------------
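The same kind of check can be sketched in Python for the D2 and E2 searches. Again, this is an assumed port, not the Notepad++ regexes themselves : the (?s) part is scoped with (?s:...), the case-insensitive option is passed as a flag, and the names SAMPLE, D2, E2 and find_ranges are only illustrative.

```python
import re

# Test text from the post: the searched ranges are enclosed in % characters.
SAMPLE = "\n".join([
    "111 %When we speak of free% 111",
    "222,%software, we are referring to%,222",
    "333 % freedom, not price. Our General% 333",
    "abc %When we speak of free% abc",
    "xyz,%software, we are referring to%,xyz",
    "%make sure that you have the%",
    "555 %freedom to distribute copies% 555",
    "666:%This is a simple test%:666",
    "#####################################################################",
    "777|||%This is A simple TEST%|||777",
    "888----%When we SPEAK of free%----888",
    "999% freedom, not price. Our General%999",
    "abc %make sure that you have the% abc",
    "000000000% freedom, not price. Our General%0000000000000000",
    "------------- %make sure that you have the% ------------",
])

# Assumed Python ports of the nested D2 regex and the merged E2 regex.
D2 = r"(?<=%)(.+)(?=(?s:%.*#+(?!.*%\1%)))"
E2 = r"(?<=%)(.+)(?=(?s:%.*#+.*%\1%))"


def find_ranges(pattern: str, text: str) -> list[str]:
    """Return every %-delimited range matched by pattern, ignoring case."""
    return [m.group(1) for m in re.finditer(pattern, text, re.I)]


print("D2:", find_ranges(D2, SAMPLE))
print("E2:", find_ranges(E2, SAMPLE))
```

Note that the leading space of " freedom, not price. Our General" belongs to the range, because everything between the two % boundaries is compared.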
Case D3 : (?si)(?<=\W)(\w+)(?=\W.*#+(?!.*\W\1(\W|\z))) OR (?si)(?<=\W)(\w+)(?=\W.*#+)(?!.*#+.*\W\1(\W|\z))
Case E3 : (?si)(?<=\W)(\w+)(?=\W.*#+(?=.*\W\1(\W|\z))) OR (?si)(?<=\W)(\w+)(?=\W.*#+.*\W\1(\W|\z))
You may test the D3 and E3 regexes with, for instance, the text, below, in a NEW tab :
software
price
freedom
SOFtware
prICE
General
Public
This is a simple test to find out identical / different words inside that text
##########################################################################################
This, is A test in order to know the same / different words of the text
SoftwarE
freeDOM
genERal
FREEDOM
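Here is a similar Python sketch for the D3 and E3 searches. These two regexes port almost unchanged, with \Z in place of \z ( an adaptation for Python's re module ). One assumption to point out : the (?<=\W) look-behind needs a character BEFORE the word, so a word located at the very beginning of the text cannot match. That is why the sketch adds a line-break in front of the sample. The names SAMPLE, D3, E3 and find_words are only illustrative.

```python
import re

# Test text from the post. A leading "\n" is added (an assumption of this
# sketch) so that the very first word has a non-word character before it.
SAMPLE = "\n" + "\n".join([
    "software",
    "price",
    "freedom",
    "SOFtware",
    "prICE",
    "General",
    "Public",
    "This is a simple test to find out identical / different words inside that text",
    "#" * 90,
    "This, is A test in order to know the same / different words of the text",
    "SoftwarE",
    "freeDOM",
    "genERal",
    "FREEDOM",
])

# Assumed Python ports of the nested D3 regex and the merged E3 regex.
D3 = r"(?si)(?<=\W)(\w+)(?=\W.*#+(?!.*\W\1(?:\W|\Z)))"
E3 = r"(?si)(?<=\W)(\w+)(?=\W.*#+.*\W\1(?:\W|\Z))"


def find_words(pattern: str, text: str) -> list[str]:
    """Return every single word matched by pattern."""
    return [m.group(1) for m in re.finditer(pattern, text)]


print("D3:", find_words(D3, SAMPLE))
print("E3:", find_words(E3, SAMPLE))
```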
Notes :
The last cases, D3 and E3, are the ones discussed in my previous topic
All the regexes above are case-insensitive. If the searches must be case-sensitive, just change each (?i) modifier into (?-i) and each (?si) modifier into (?s-i)
Remember that your text must contain just ONE line with at least one # character
Regarding the equivalent D1, D2 and D3 regexes, their general templates are :
[Modifiers][Positive Look-Behind][Regex to Search][Positive Look-Ahead[Negative Look-Ahead]], with nested look-aheads
[Modifiers][Positive Look-Behind][Regex to Search][Positive Look-Ahead][Negative Look-Ahead], with juxtaposed look-aheads
Regarding the equivalent E1, E2 and E3 regexes, their general templates are :
[Modifiers][Positive Look-Behind][Regex to Search][Positive Look-Ahead[Positive Look-Ahead]], with nested look-aheads
[Modifiers][Positive Look-Behind][Regex to Search][Positive Look-Ahead], with 1 look-ahead, only
Just notice that a positive look-ahead, nested in another positive look-ahead, can be merged with it into a single look-ahead. But it’s impossible to merge, in the same way, a negative look-ahead nested in a positive look-ahead !
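To illustrate the equivalence between these templates, here is a small Python sketch comparing the two forms of the D1 and E1 searches on a tiny made-up text. As before, the patterns are assumed ports to Python's re module ( \n and \Z instead of \R and \z, (?s:...) groups instead of a mid-pattern (?s) ), and the names TEXT, run and the four pattern variables are hypothetical.

```python
import re

# Tiny made-up text: "beta" occurs in both parts, "alpha" and "gamma"
# occur only in the first part.
TEXT = "alpha\nbeta\ngamma\n#####\nBETA\ndelta"

# D1: nested negative look-ahead versus juxtaposed look-aheads.
D1_NESTED = r"^(.+)(?=(?s:\n.*#+(?!.*\n\1(?:\n|\Z))))"
D1_JUXTAPOSED = r"^(.+)(?=(?s:\n.*#+))(?!(?s:.*#+.*\n\1(?:\n|\Z)))"

# E1: nested positive look-ahead versus a single merged look-ahead.
E1_NESTED = r"^(.+)(?=(?s:\n.*#+(?=.*\n\1(?:\n|\Z))))"
E1_MERGED = r"^(.+)(?=(?s:\n.*#+.*\n\1(?:\n|\Z)))"


def run(pattern: str) -> list[str]:
    """Return the lines of TEXT matched by pattern, ignoring case."""
    return [m.group(1) for m in re.finditer(pattern, TEXT, re.I | re.M)]


# Both forms of each search should report the same lines.
print(run(D1_NESTED), run(D1_JUXTAPOSED))
print(run(E1_NESTED), run(E1_MERGED))
```

In the juxtaposed D1 form, the negative look-ahead starts right after the matched line, so it has to repeat the .*#+ part in order to look only in the second part of the text.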
Of course, as usual, you may replace, delete, mark or bookmark the different matches, for further modifications !
Cheers,
guy038