    Regex: Finds words that are repeated in multiple lines

    • Vasile Caraus

      Hello. I have these lines of regex expressions, each made of two parts separated by |, in the form Regex_A|Regex_B

      (?s)((^.*)(<div class="entry-excerpt">)|(<!-- //.entry -->)(.*$))
      (?s)((^.*)(<ul class="smallThumb-mainList">)|(<div class="navigation">)(.*$))
      (?s)((^.*)(word_2)|(<!-- //.entry -->)(.*$))
      (?s)((^.*)(word_2)|(<!-- //.ambro34 -->)(.*$))
      

      I want to find all the words/regexes that are repeated before the | and those that are repeated after the |

      I tried a regex, but it doesn’t work very well: (?m)(.*)^(.*)\|(.*)(?=.*\1)

      • Vasile Caraus

        Basically, after the search and replace, I want only one instance of each of these to remain:

        (?s)((^.*)(word_2) because it is repeated 2 times before | (on lines 3 and 4)

        (<!-- //.entry -->)(.*$)) because it is repeated after | (on lines 1 and 3)

        • Vasile Caraus

          Maybe a simple example will be much clearer:

          Word_1 | Word_2
          Word_3 | Word_2
          Word_4 | Word_5
          Word_4 | Word_6

          In this case, Word_4 and Word_2 are repeated. So, after the search, I want only these to remain.
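
For reference, the expected result for this simple example can be checked outside the editor. The following is a minimal Python sketch, not part of the Notepad++ procedure discussed in this thread; the variable names are illustrative assumptions. It counts each word separately per column and keeps those seen more than once.

```python
from collections import Counter

lines = [
    "Word_1 | Word_2",
    "Word_3 | Word_2",
    "Word_4 | Word_5",
    "Word_4 | Word_6",
]

# One counter per column: text before the "|" and text after it.
before_counts, after_counts = Counter(), Counter()
for line in lines:
    before, after = (part.strip() for part in line.split("|", 1))
    before_counts[before] += 1
    after_counts[after] += 1

# Keep only the words that occur more than once in their own column.
repeated_before = [word for word, n in before_counts.items() if n > 1]
repeated_after = [word for word, n in after_counts.items() if n > 1]

print(repeated_before)  # ['Word_4']
print(repeated_after)   # ['Word_2']
```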

          • Alan Kilborn

            @Vasile-Caraus

            As stated before here (https://notepad-plus-plus.org/community/topic/13248/regex-datetime) I think you’ve worn out everyone’s good nature (with the possible exception of @guy038) with your infinite regex questions. @MAPJe71 pointed out some good references for you to self-learn; that advice still holds. Sorry, but that’s the way I see it.

            • guy038

              Hello, @Vasile-Caraus, @alan-kilborn and @MapJe71

              First of all, @alan-kilborn and @MapJe71, although I do understand your point of view and the advice that you give to @Vasile-Caraus, this exercise seems interesting nonetheless. You may simply consider it a way to find, in a two-column table, any text that occurs more than once within its column!

              So @Vasile-Caraus, let’s go!

              To begin with, some statements and hypotheses:

              • I’ll limit this topic to the general case of exactly two parts of text separated by one Vertical Line character ( Text_A|Text_B ), which of course covers the sub-problem of two regexes separated by the alternation symbol ( Regex_A|Regex_B )

              • For syntaxes such as Text_A|Text_B|Text_C or longer, it would be more expensive!! Well, set your mind at ease, I’m joking :-))

              • Of course, these two parts of text do NOT themselves contain the Vertical Line character ( | )!

              • I chose the Commercial At sign ( @ ) as a temporary character. If your regexes may contain this character, just choose another symbol, preferably one that is not a special regex symbol!

              • I’ll use the original 12-line text below:

              Text_0|Text_C
              Text_1|Text_2
              Text_4|Text_5
              Text_3|Text_2
              Text_4|Text_6
              Text_7|Text_8
              Text_9|Text_2
              Text_4|Text_5
              Text_7|Text_A
              Text_0|Text_B
              Text_2|Text_7
              Text_6|Text_7
              
              • Of course, the various NON-empty strings Text_? can be of any length!

              So:

              • Open a new tab

              • Copy/Paste the original text above

              • Hit the Backspace key to remove any End of Line character(s) at the end of the last line ( Line 12 )

              • Open the Replace dialog

              • Then the first regex S/R below, run in Regular expression search mode:

              SEARCH (?=(\|))|$

              REPLACE @(?1A-:B-)@

              should produce the text:

              Text_0@A-@|Text_C@B-@
              Text_1@A-@|Text_2@B-@
              Text_4@A-@|Text_5@B-@
              Text_3@A-@|Text_2@B-@
              Text_4@A-@|Text_6@B-@
              Text_7@A-@|Text_8@B-@
              Text_9@A-@|Text_2@B-@
              Text_4@A-@|Text_5@B-@
              Text_7@A-@|Text_A@B-@
              Text_0@A-@|Text_B@B-@
              Text_2@A-@|Text_7@B-@
              Text_6@A-@|Text_7@B-@
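
To see what this first S/R does, here is a minimal Python sketch of the same idea on three of the sample lines. It is an approximation under stated assumptions: Python's `re` module has no conditional replacement syntax like `(?1A-:B-)`, so a hypothetical helper function `tag` makes that choice instead, and `re.MULTILINE` is used so that `$` matches at the end of every line.

```python
import re

# No trailing newline, mirroring the Backspace step in the procedure.
text = "Text_0|Text_C\nText_1|Text_2\nText_4|Text_5"

def tag(match):
    # Group 1 is set only when the empty match sits just before a "|";
    # otherwise the match is the end of a line.
    return "@A-@" if match.group(1) is not None else "@B-@"

# Same search pattern as the S/R: an empty position before "|", or a line end.
tagged = re.sub(r"(?=(\|))|$", tag, text, flags=re.MULTILINE)
print(tagged)
```

If the text ended with a newline, `$` would also match after it and add a stray `@B-@` on an empty last line, which is why the final End of Line is removed first.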
              
              • Now, choose Edit > Column Editor…, or hit the ALT + C shortcut

              • Select the Number to Insert zone

              • Choose 1 as the Initial number

              • Choose 1 in the Increase by field

              • Select the Dec number format

              • Place the caret, on the first line, between the strings @A- and @|

              • Click on the OK button

              => A list of numbers, from 1 to 12, is inserted at the caret position, one number per line

              • Now, move the caret, on the first line, between the string @B- and the last @

              • Re-open the Column Editor with the ALT + C shortcut

              • Hit the Enter key

              => The same list of numbers is inserted before the last @ of each line:

              Text_0@A-1 @|Text_C@B-1 @
              Text_1@A-2 @|Text_2@B-2 @
              Text_4@A-3 @|Text_5@B-3 @
              Text_3@A-4 @|Text_2@B-4 @
              Text_4@A-5 @|Text_6@B-5 @
              Text_7@A-6 @|Text_8@B-6 @
              Text_9@A-7 @|Text_2@B-7 @
              Text_4@A-8 @|Text_5@B-8 @
              Text_7@A-9 @|Text_A@B-9 @
              Text_0@A-10@|Text_B@B-10@
              Text_2@A-11@|Text_7@B-11@
              Text_6@A-12@|Text_7@B-12@
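
The numbering step can be mimicked outside the editor. This is a rough Python sketch, assuming (as the output above suggests) that every number is padded with trailing spaces to the width of the largest one; the compact `pairs` string is only a shorthand for the 12 sample lines.

```python
# The 12 pairs of the sample text, written compactly: "0C" stands for Text_0|Text_C.
pairs = "0C 12 45 32 46 78 92 45 7A 0B 27 67".split()
tagged = [f"Text_{a}@A-@|Text_{b}@B-@" for a, b in pairs]

# Pad every number to the width of the largest line number.
width = len(str(len(tagged)))

numbered = []
for number, line in enumerate(tagged, start=1):
    padded = str(number).ljust(width)
    # Insert the same line number into both markers of the line.
    line = line.replace("@A-@", f"@A-{padded}@")
    line = line.replace("@B-@", f"@B-{padded}@")
    numbered.append(line)

print("\n".join(numbered))
```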
              

              Then, with this second regex S/R:

              SEARCH \|

              REPLACE \r\n

              we get the one-column list below:

              Text_0@A-1 @
              Text_C@B-1 @
              Text_1@A-2 @
              Text_2@B-2 @
              Text_4@A-3 @
              Text_5@B-3 @
              Text_3@A-4 @
              Text_2@B-4 @
              Text_4@A-5 @
              Text_6@B-5 @
              Text_7@A-6 @
              Text_8@B-6 @
              Text_9@A-7 @
              Text_2@B-7 @
              Text_4@A-8 @
              Text_5@B-8 @
              Text_7@A-9 @
              Text_A@B-9 @
              Text_0@A-10@
              Text_B@B-10@
              Text_2@A-11@
              Text_7@B-11@
              Text_6@A-12@
              Text_7@B-12@
              

              Now, let’s use the menu option Edit > Line Operations > Sort lines Lexicographically Ascending

              We obtain the sorted text below:

              Text_0@A-1 @
              Text_0@A-10@
              Text_1@A-2 @
              Text_2@A-11@
              Text_2@B-2 @
              Text_2@B-4 @
              Text_2@B-7 @
              Text_3@A-4 @
              Text_4@A-3 @
              Text_4@A-5 @
              Text_4@A-8 @
              Text_5@B-3 @
              Text_5@B-8 @
              Text_6@A-12@
              Text_6@B-5 @
              Text_7@A-6 @
              Text_7@A-9 @
              Text_7@B-11@
              Text_7@B-12@
              Text_8@B-6 @
              Text_9@A-7 @
              Text_A@B-9 @
              Text_B@B-10@
              Text_C@B-1 @
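
The split and sort steps can be reproduced with a short Python sketch. It assumes that the editor's lexicographic sort behaves like plain code-point ordering on this ASCII data, which is what Python's `sorted` does for strings; the `pairs` shorthand is again just a compact way to rebuild the numbered sample.

```python
# Rebuild the numbered two-column text ("0C" stands for Text_0|Text_C).
pairs = "0C 12 45 32 46 78 92 45 7A 0B 27 67".split()
numbered = [
    f"Text_{a}@A-{number:<2}@|Text_{b}@B-{number:<2}@"
    for number, (a, b) in enumerate(pairs, start=1)
]

# Second S/R: every "|" becomes a line break, giving one item per line.
one_column = [part for line in numbered for part in line.split("|")]

# Sort step: identical texts from the same column end up on adjacent lines.
sorted_lines = sorted(one_column)
print("\n".join(sorted_lines))
```

Because the comparison is character by character, a space-padded `2 ` sorts after `11`. That has no visible effect on this sample, but with other data the line numbers of one text may not come out in numeric order.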
              

              Then, the third regex S/R below:

              SEARCH (^.+@.).+\R(?:\1.+\R)+|.+\R

              REPLACE ?1$0

              should delete any text that is unique in its column and keep only the texts that occur several times in their column:

              Text_0@A-1 @
              Text_0@A-10@
              Text_2@B-2 @
              Text_2@B-4 @
              Text_2@B-7 @
              Text_4@A-3 @
              Text_4@A-5 @
              Text_4@A-8 @
              Text_5@B-3 @
              Text_5@B-8 @
              Text_7@A-6 @
              Text_7@A-9 @
              Text_7@B-11@
              Text_7@B-12@
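
The logic of this third S/R can be tried in Python on the first eight sorted lines. This is a sketch under assumptions: Python's `re` has no `\R`, so `\n` is used and every line, including the last, is given a trailing newline; the conditional replacement `?1$0` is replaced by a hypothetical helper function `keep_runs`.

```python
import re

sorted_text = "\n".join([
    "Text_0@A-1 @",
    "Text_0@A-10@",
    "Text_1@A-2 @",
    "Text_2@A-11@",
    "Text_2@B-2 @",
    "Text_2@B-4 @",
    "Text_2@B-7 @",
    "Text_3@A-4 @",
]) + "\n"

def keep_runs(match):
    # Group 1 (text plus "@A" or "@B") is set only when at least two
    # consecutive lines share it; such a run is kept, a lone line is deleted.
    return match.group(0) if match.group(1) is not None else ""

pattern = re.compile(r"(^.+@.).+\n(?:\1.+\n)+|.+\n", re.MULTILINE)
repeated = pattern.sub(keep_runs, sorted_text)
print(repeated, end="")
```

Note that `Text_2@A-11@` is removed even though Text_2 occurs three times after the Vertical Line: the column letter is part of group 1, so each column is examined separately.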
              

              Finally, use the fourth and last regex S/R below:

              SEARCH (^(.+?)@B-|@A-)|\x20*@

              REPLACE ?1|(?2\2)\x20\x20\x20\x20\x20

              Notes:

              • You may replace each \x20 with a single space character!

              • In the replacement, you may add more spaces, or replace the spaces with one or more tab characters

              This S/R displays the various texts:

              • With the syntax Text_?|, if the text was located BEFORE the Vertical Line symbol

              • With the syntax |Text_?, if the text was located AFTER the Vertical Line symbol

              • The number ending each line is the number of the original line where the string Text_? occurs, which makes it easy to locate this string! The numbers appear in the order produced by the lexicographic sort, which is increasing order in this example

              Text_0|     1
              Text_0|     10
              |Text_2     2
              |Text_2     4
              |Text_2     7
              Text_4|     3
              Text_4|     5
              Text_4|     8
              |Text_5     3
              |Text_5     8
              Text_7|     6
              Text_7|     9
              |Text_7     11
              |Text_7     12
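
The fourth S/R can be sketched in Python as well. As before, this is only an approximation: the conditional replacement `?1|(?2\2)` is emulated by a hypothetical helper function `show_column`, and `re.MULTILINE` lets `^` match at the start of every line.

```python
import re

kept = "Text_0@A-1 @\nText_0@A-10@\nText_2@B-2 @"

def show_column(match):
    if match.group(1) is None:
        # Second alternative: optional padding spaces and the closing "@".
        return ""
    # First alternative: emit "|", then the text itself for a B entry
    # (group 2 is unset for an A entry), then five spaces.
    return "|" + (match.group(2) or "") + " " * 5

pattern = re.compile(r"(^(.+?)@B-|@A-)|\x20*@", re.MULTILINE)
report = pattern.sub(show_column, kept)
print(report)
```

For an A entry the text is left in place and only `@A-` is replaced, so the Vertical Line lands after the text. For a B entry the whole `Text_?@B-` prefix is consumed and re-emitted after the Vertical Line.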
              

              Best Regards,

              guy038

              P.S.:

              If any of the four S/R above seems a bit tricky, just tell me about it!

              • Vasile Caraus

                I tested it and it WORKS. I believe I will use macros for this long sequence of regexes.

                Thanks, guy038. I believe you are my only friend around here. ;)

                The Community of users of the Notepad++ text editor.