Regex: Finds words that are repeated in multiple lines
-
Hello. I have these lines with regex expressions, separated by |, of type Regex_A|Regex_B:
(?s)((^.*)(<div class="entry-excerpt">)|(<!-- //.entry -->)(.*$))
(?s)((^.*)(<ul class="smallThumb-mainList">)|(<div class="navigation">)(.*$))
(?s)((^.*)(word_2)|(<!-- //.entry -->)(.*$))
(?s)((^.*)(word_2)|(<!-- //.ambro34 -->)(.*$))
I want to find all those words/regexes that are repeated before | and those that are repeated after |
I tried a regex, but it doesn’t work very well:
(?m)(.*)^(.*)\|(.*)(?=.*\1) -
Basically, after the search and replace, I want only one instance of each of these to remain:
(?s)((^.*)(word_2) because it is repeated 2 times before | (on lines 3 and 4)
(<!-- //.entry -->)(.*$)) because it is repeated after | (on lines 1 and 3)
-
Maybe, a simple example will be much better:
Word_1 | Word_2
Word_3 | Word_2
Word_4 | Word_5
Word_4 | Word_6
In this case, Word_4 and Word_2 are repeated. So, after the search, I want only these to remain.
-
As stated before here (https://notepad-plus-plus.org/community/topic/13248/regex-datetime) I think you’ve worn out everyone’s good nature (with the possible exception of @guy038) with your infinite regex questions. @MAPJe71 pointed out some good references for you to self-learn; that advice still holds. Sorry, but that’s the way I see it.
-
Hello, @Vasile-Caraus, @alan-kilborn and @MapJe71
First of all, @alan-kilborn and @MapJe71, although I do understand your point of view and the advice that you give to @Vasile-Caraus, this present exercise seems, however, interesting. You may simply consider that it would allow you to know, in a two-column table, any text which is repeated, one or more times, in each column !
So @Vasile-Caraus, let’s go !
To begin with, some statements and hypotheses :
-
I’ll limit this topic to the general case of two parts of text, only, separated with one Vertical Line character ( Text_A|Text_B ), which, of course, matches the sub-problem of two regexes, separated by the alternative symbol ( Regex_A|Regex_B )
-
For syntaxes, as Text_A|Text_B|Text_C or more, it would be more expensive !! Well, set your mind at ease, I’m joking :-))
-
Of course, these two parts of text do NOT contain the Vertical Line character ( | ), themselves !
-
I chose the Commercial At sign ( @ ) as a temporary character. If your regexes may contain this character, just choose another symbol, which, preferably, won’t be a special regex symbol !
-
I’ll use the 12-line original text, below :
Text_0|Text_C
Text_1|Text_2
Text_4|Text_5
Text_3|Text_2
Text_4|Text_6
Text_7|Text_8
Text_9|Text_2
Text_4|Text_5
Text_7|Text_A
Text_0|Text_B
Text_2|Text_7
Text_6|Text_7
-
Of course, the different NON-null strings Text_? can have any size !
So :
-
Open a new tab
-
Copy/Paste the original text, above
-
Hit the Backspace key to suppress the possible End of Line character(s), of the last line ( Line 12 )
-
Open the Replace dialog
-
Then the first regex S/R, below :
SEARCH (?=(\|))|$
REPLACE @(?1A-:B-)@
should produce the text :
Text_0@A-@|Text_C@B-@
Text_1@A-@|Text_2@B-@
Text_4@A-@|Text_5@B-@
Text_3@A-@|Text_2@B-@
Text_4@A-@|Text_6@B-@
Text_7@A-@|Text_8@B-@
Text_9@A-@|Text_2@B-@
Text_4@A-@|Text_5@B-@
Text_7@A-@|Text_A@B-@
Text_0@A-@|Text_B@B-@
Text_2@A-@|Text_7@B-@
Text_6@A-@|Text_7@B-@
-
Now, choose Edit > Column Editor…, or hit the ALT + C shortcut
-
Select the zone Number to Insert
-
Choose 1, as Initial number
-
Choose 1, in the Increase by field
-
Select the Dec format of numbers
-
Place the caret, on the first line, between the strings @A- and @|
-
Click on the OK button
=> A list of numbers, between 1 and 12, is inserted at caret position
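For anyone who wants to check the tagging and numbering outside Notepad++, here is a minimal Python sketch of the first S/R and of what the Column Editor does, both here and in the repeat for the second tag that follows. The function names `tag_columns` and `number_tags` are hypothetical, chosen only for this illustration. Python's `re` module has no `(?1A-:B-)` conditional replacement, so a callback stands in for it, and the numbers are padded with trailing spaces to a common width, as in the listings of this post.

```python
import re

def tag_columns(text):
    # Emulates SEARCH (?=(\|))|$ : an empty match just before each "|"
    # (group 1 is set) or at each end of line (group 1 is unset).
    # The callback plays the role of the replacement @(?1A-:B-)@.
    return re.sub(
        r"(?=(\|))|$",
        lambda m: "@A-@" if m.group(1) else "@B-@",
        text,
        flags=re.MULTILINE,
    )

def number_tags(tagged_text):
    # Emulates the two Column Editor passes: the line number is written
    # inside each tag, left-aligned and padded to a common width.
    lines = tagged_text.split("\n")
    width = len(str(len(lines)))
    numbered = []
    for n, line in enumerate(lines, start=1):
        num = str(n).ljust(width)
        line = line.replace("@A-@", f"@A-{num}@")
        line = line.replace("@B-@", f"@B-{num}@")
        numbered.append(line)
    return "\n".join(numbered)

if __name__ == "__main__":
    sample = "Text_0|Text_C\nText_1|Text_2\nText_4|Text_5"
    print(number_tags(tag_columns(sample)))
```

As in the procedure itself, the input must not end with a line break, otherwise the `$` branch would add one extra tag at the very end of the text.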
Now, move the caret, on the first line, between the strings @B- and the last @
-
Re-open the Column Editor, with the ALT + C shortcut
-
Hit the Enter key
=> The same list of numbers is inserted, before the last @, of each line :
Text_0@A-1 @|Text_C@B-1 @
Text_1@A-2 @|Text_2@B-2 @
Text_4@A-3 @|Text_5@B-3 @
Text_3@A-4 @|Text_2@B-4 @
Text_4@A-5 @|Text_6@B-5 @
Text_7@A-6 @|Text_8@B-6 @
Text_9@A-7 @|Text_2@B-7 @
Text_4@A-8 @|Text_5@B-8 @
Text_7@A-9 @|Text_A@B-9 @
Text_0@A-10@|Text_B@B-10@
Text_2@A-11@|Text_7@B-11@
Text_6@A-12@|Text_7@B-12@
Then, with that second regex S/R :
SEARCH \|
REPLACE \r\n
we get the one-column list, below :
Text_0@A-1 @
Text_C@B-1 @
Text_1@A-2 @
Text_2@B-2 @
Text_4@A-3 @
Text_5@B-3 @
Text_3@A-4 @
Text_2@B-4 @
Text_4@A-5 @
Text_6@B-5 @
Text_7@A-6 @
Text_8@B-6 @
Text_9@A-7 @
Text_2@B-7 @
Text_4@A-8 @
Text_5@B-8 @
Text_7@A-9 @
Text_A@B-9 @
Text_0@A-10@
Text_B@B-10@
Text_2@A-11@
Text_7@B-11@
Text_6@A-12@
Text_7@B-12@
Now, let’s use the menu option Edit > Line Operations > Sort lines Lexicographically Ascending
We obtain the sorted text, below :
Text_0@A-1 @
Text_0@A-10@
Text_1@A-2 @
Text_2@A-11@
Text_2@B-2 @
Text_2@B-4 @
Text_2@B-7 @
Text_3@A-4 @
Text_4@A-3 @
Text_4@A-5 @
Text_4@A-8 @
Text_5@B-3 @
Text_5@B-8 @
Text_6@A-12@
Text_6@B-5 @
Text_7@A-6 @
Text_7@A-9 @
Text_7@B-11@
Text_7@B-12@
Text_8@B-6 @
Text_9@A-7 @
Text_A@B-9 @
Text_B@B-10@
Text_C@B-1 @
Then, the third regex S/R, below :
SEARCH (^.+@.).+\R(?:\1.+\R)+|.+\R
REPLACE ?1$0
should delete any text, which is unique, in its column and keep, only, the different texts, which occur several times, in their column :
Text_0@A-1 @
Text_0@A-10@
Text_2@B-2 @
Text_2@B-4 @
Text_2@B-7 @
Text_4@A-3 @
Text_4@A-5 @
Text_4@A-8 @
Text_5@B-3 @
Text_5@B-8 @
Text_7@A-6 @
Text_7@A-9 @
Text_7@B-11@
Text_7@B-12@
Finally, use the fourth and last regex S/R, below :
SEARCH (^(.+?)@B-|@A-)|\x20*@
REPLACE ?1|(?2\2)\x20\x20\x20\x20\x20
Notes :
-
You may replace any syntax \x20 with a single space character !
-
In the replacement regex, you may add some other spaces or replace the spaces by several tabulation characters
This S/R displays the different texts :
-
With the syntax Text_?| , if this text was located BEFORE the Vertical Line symbol
-
With the syntax |Text_? , if this text was located AFTER the Vertical Line symbol
-
The number, ending each line, represents, by increasing order, the number of each line, where the string Text_? occurs, in order to easily locate this string !
Text_0|     1
Text_0|     10
|Text_2     2
|Text_2     4
|Text_2     7
Text_4|     3
Text_4|     5
Text_4|     8
|Text_5     3
|Text_5     8
Text_7|     6
Text_7|     9
|Text_7     11
|Text_7     12
Best Regards,
guy038
P.S. :
If any of the four S/R, above, seems a bit tricky, just tell me about it !
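To see the logic of the sort, the third S/R and the fourth S/R in one place, here is a minimal Python sketch that starts from the tagged one-column list. It is only an approximation of the Notepad++ behaviour: Python's `re` module supports neither `\R` nor the conditional replacement syntax `?1...`, so `\n` and small callbacks stand in for them, and `sorted()` stands in for the lexicographic sort. The function name `report_repeats` is hypothetical.

```python
import re

def report_repeats(tagged_lines):
    # Sort, then make sure every line, the last one included, ends with "\n"
    text = "\n".join(sorted(tagged_lines)) + "\n"

    # Third S/R: a run of two or more consecutive lines sharing the same
    # "text + @ + column letter" prefix is kept ( ?1$0 ), any other line
    # is deleted.
    kept = re.sub(
        r"(^.+@.).+\n(?:\1.+\n)+|.+\n",
        lambda m: m.group(0) if m.group(1) else "",
        text,
        flags=re.MULTILINE,
    )

    # Fourth S/R: "Text@B-" becomes "|Text" and "@A-" becomes "|", each
    # followed by five spaces; the closing "@" and the spaces before it
    # are removed.
    pad = " " * 5
    shown = re.sub(
        r"(^(.+?)@B-|@A-)|\x20*@",
        lambda m: "|" + (m.group(2) or "") + pad if m.group(1) else "",
        kept,
        flags=re.MULTILINE,
    )
    return shown.splitlines()

if __name__ == "__main__":
    # Tagged one-column list for Text_0|Text_C, Text_1|Text_2, Text_0|Text_2
    tagged = [
        "Text_0@A-1@", "Text_C@B-1@",
        "Text_1@A-2@", "Text_2@B-2@",
        "Text_0@A-3@", "Text_2@B-3@",
    ]
    for line in report_repeats(tagged):
        print(line)
```

The callbacks test group 1 exactly as the `?1` conditional does: when group 1 took part in the match, the first branch of the replacement is used, otherwise the match is replaced by nothing.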
-
-
I tested it and it WORKS. I believe I will use macros for this long series of regexes.
Thanks, guy038. I believe you are my only friend around here. ;)