
    Regex: Finds words that are repeated in multiple lines

    Vasile Caraus

      Hello. I have these lines of regular expressions, each made of two regexes separated by |, of the form Regex_A|Regex_B:

      (?s)((^.*)(<div class="entry-excerpt">)|(<!-- //.entry -->)(.*$))
      (?s)((^.*)(<ul class="smallThumb-mainList">)|(<div class="navigation">)(.*$))
      (?s)((^.*)(word_2)|(<!-- //.entry -->)(.*$))
      (?s)((^.*)(word_2)|(<!-- //.ambro34 -->)(.*$))
      

      I want to find all the words/regexes that are repeated before the |, and those that are repeated after the |.

      I tried a regex, but it doesn’t work very well: (?m)(.*)^(.*)\|(.*)(?=.*\1)

        Vasile Caraus

        Basically, after the search and replace, I want only one instance of each of these to remain:

        (?s)((^.*)(word_2), because it is repeated twice before the | (on lines 3 and 4)

        (<!-- //.entry -->)(.*$)), because it is repeated after the | (on lines 1 and 3)

          Vasile Caraus

          Maybe a simple example would be better:

          Word_1 | Word_2
          Word_3 | Word_2
          Word_4 | Word_5
          Word_4 | Word_6

          In this case, Word_4 (before the |) and Word_2 (after the |) are repeated. So, after the search, I want only these to remain.
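Outside Notepad++, the check described in this example takes only a few lines of code. The following is a minimal Python sketch, not a regex solution: the helper name `repeated_per_column` is hypothetical, and it assumes each line contains exactly one `|` separator and that the spaces around it (as in `Word_1 | Word_2`) are not significant.

```python
from collections import Counter

def repeated_per_column(lines):
    """Return two lists: the texts that occur more than once before the '|',
    and those that occur more than once after it, in first-seen order."""
    # Split each line on its first '|' (assumes one separator per line)
    pairs = [line.split("|", 1) for line in lines]
    # Count each column separately, ignoring the spaces around the '|'
    left = Counter(a.strip() for a, _ in pairs)
    right = Counter(b.strip() for _, b in pairs)
    return ([text for text, n in left.items() if n > 1],
            [text for text, n in right.items() if n > 1])

example = ["Word_1 | Word_2", "Word_3 | Word_2", "Word_4 | Word_5", "Word_4 | Word_6"]
print(repeated_per_column(example))  # (['Word_4'], ['Word_2'])
```

Because each column is counted independently, a text that appears once before the `|` and once after it is not reported as a repeat.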

            Alan Kilborn

            @Vasile-Caraus

            As stated before here (https://notepad-plus-plus.org/community/topic/13248/regex-datetime), I think you’ve worn out everyone’s good nature (with the possible exception of @guy038) with your endless regex questions. @MAPJe71 pointed out some good references for you to learn from on your own; that advice still holds. Sorry, but that’s the way I see it.

              guy038

              Hello, @Vasile-Caraus, @alan-kilborn and @MapJe71

              First of all, @alan-kilborn and @MapJe71, although I do understand your point of view and the advice you give to @Vasile-Caraus, this particular exercise does seem interesting. Simply consider that it lets you find, in a two-column table, any text that is repeated, one or more times, within the same column!

              So, @Vasile-Caraus, let’s go!

              To begin with, some statements and assumptions:

              • I’ll limit this topic to the general case of exactly two parts of text, separated by a single Vertical Line character (Text_A|Text_B). This, of course, covers your sub-problem of two regexes separated by the alternation symbol (Regex_A|Regex_B)

              • For syntaxes such as Text_A|Text_B|Text_C, or longer, it would be more expensive!! Well, set your mind at ease, I’m joking :-))

              • Of course, these two parts of text must NOT themselves contain the Vertical Line character (|)!

              • I chose the Commercial At sign (@) as a temporary marker character. If your regexes may contain this character, just choose another symbol, preferably one that is not a special regex symbol!

              • I’ll use the 12-line original text below:

              Text_0|Text_C
              Text_1|Text_2
              Text_4|Text_5
              Text_3|Text_2
              Text_4|Text_6
              Text_7|Text_8
              Text_9|Text_2
              Text_4|Text_5
              Text_7|Text_A
              Text_0|Text_B
              Text_2|Text_7
              Text_6|Text_7
              
              • Of course, the different NON-empty strings Text_? can have any length!

              So:

              • Open a new tab

              • Copy and paste the original text above

              • Hit the Backspace key to delete the End of Line character(s), if any, after the last line (line 12); otherwise, the first S/R would also add a marker on an empty 13th line

              • Open the Replace dialog, with the Regular expression search mode selected

              • Then run the first regex S/R below:

              SEARCH (?=(\|))|$

              REPLACE @(?1A-:B-)@

              should produce the text:

              Text_0@A-@|Text_C@B-@
              Text_1@A-@|Text_2@B-@
              Text_4@A-@|Text_5@B-@
              Text_3@A-@|Text_2@B-@
              Text_4@A-@|Text_6@B-@
              Text_7@A-@|Text_8@B-@
              Text_9@A-@|Text_2@B-@
              Text_4@A-@|Text_5@B-@
              Text_7@A-@|Text_A@B-@
              Text_0@A-@|Text_B@B-@
              Text_2@A-@|Text_7@B-@
              Text_6@A-@|Text_7@B-@
              
              • Now, choose Edit > Column Editor…, or hit the Alt + C shortcut

              • Select the Number to Insert option

              • Enter 1 as the Initial number

              • Enter 1 in the Increase by field

              • Select the Dec number format

              • Place the caret on the first line, between the strings @A- and @|

              • Click the OK button

              => A list of numbers, from 1 to 12, is inserted at the caret position, one per line

              Now, on the first line, move the caret between the string @B- and the final @

              • Re-open the Column Editor with the Alt + C shortcut

              • Hit the Enter key

              => The same list of numbers is inserted before the final @ of each line:

              Text_0@A-1 @|Text_C@B-1 @
              Text_1@A-2 @|Text_2@B-2 @
              Text_4@A-3 @|Text_5@B-3 @
              Text_3@A-4 @|Text_2@B-4 @
              Text_4@A-5 @|Text_6@B-5 @
              Text_7@A-6 @|Text_8@B-6 @
              Text_9@A-7 @|Text_2@B-7 @
              Text_4@A-8 @|Text_5@B-8 @
              Text_7@A-9 @|Text_A@B-9 @
              Text_0@A-10@|Text_B@B-10@
              Text_2@A-11@|Text_7@B-11@
              Text_6@A-12@|Text_7@B-12@
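The two Column Editor passes simply append each line’s number to its two markers. The Python sketch below (hypothetical `number_markers` helper, with placeholder texts) reproduces that layout, assuming, as the output above suggests, that shorter numbers are left-aligned and padded with spaces to the width of the largest one:

```python
def number_markers(lines):
    """Append the 1-based line number to the @A- and @B- markers of each line."""
    # Pad on the right to the width of the largest number, so that, for 12
    # lines, line 1 gives '@A-1 @' and line 10 gives '@A-10@', as shown above.
    width = len(str(len(lines)))
    numbered = []
    for n, line in enumerate(lines, start=1):
        num = str(n).ljust(width)
        numbered.append(line.replace("@A-@", f"@A-{num}@").replace("@B-@", f"@B-{num}@"))
    return numbered

marked = [f"Text_{i}@A-@|Text_{i}@B-@" for i in range(12)]  # placeholder texts
result = number_markers(marked)
print(result[0])   # Text_0@A-1 @|Text_0@B-1 @
print(result[11])  # Text_11@A-12@|Text_11@B-12@
```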
              

              Then, with the second regex S/R below:

              SEARCH \|

              REPLACE \r\n

              we get the one-column list below:

              Text_0@A-1 @
              Text_C@B-1 @
              Text_1@A-2 @
              Text_2@B-2 @
              Text_4@A-3 @
              Text_5@B-3 @
              Text_3@A-4 @
              Text_2@B-4 @
              Text_4@A-5 @
              Text_6@B-5 @
              Text_7@A-6 @
              Text_8@B-6 @
              Text_9@A-7 @
              Text_2@B-7 @
              Text_4@A-8 @
              Text_5@B-8 @
              Text_7@A-9 @
              Text_A@B-9 @
              Text_0@A-10@
              Text_B@B-10@
              Text_2@A-11@
              Text_7@B-11@
              Text_6@A-12@
              Text_7@B-12@
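This second S/R is a plain character substitution, so outside Notepad++ no regex is needed. A minimal Python equivalent (hypothetical `split_columns` helper), assuming Windows `\r\n` line endings as in the replacement above; pass `eol="\n"` for Unix-style files:

```python
def split_columns(text, eol="\r\n"):
    # Emulates S/R #2 (SEARCH \|  REPLACE \r\n): every '|' becomes a line break,
    # so each line "left|right" turns into two lines, "left" then "right".
    return text.replace("|", eol)

numbered = "Text_0@A-1 @|Text_C@B-1 @\r\nText_1@A-2 @|Text_2@B-2 @"
print(split_columns(numbered).split("\r\n"))
# ['Text_0@A-1 @', 'Text_C@B-1 @', 'Text_1@A-2 @', 'Text_2@B-2 @']
```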
              

              Now, use the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending

              We obtain the sorted text below:

              Text_0@A-1 @
              Text_0@A-10@
              Text_1@A-2 @
              Text_2@A-11@
              Text_2@B-2 @
              Text_2@B-4 @
              Text_2@B-7 @
              Text_3@A-4 @
              Text_4@A-3 @
              Text_4@A-5 @
              Text_4@A-8 @
              Text_5@B-3 @
              Text_5@B-8 @
              Text_6@A-12@
              Text_6@B-5 @
              Text_7@A-6 @
              Text_7@A-9 @
              Text_7@B-11@
              Text_7@B-12@
              Text_8@B-6 @
              Text_9@A-7 @
              Text_A@B-9 @
              Text_B@B-10@
              Text_C@B-1 @
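Assuming the sort compares lines character by character by code point (an ordinal sort, which reproduces the order shown above), Python’s built-in `sorted()` gives the same result. The sketch also shows why the space after single-digit numbers is useful: a space (U+0020) sorts before any digit, so `1 @` stays ahead of `10@`, and identical texts remain in increasing line-number order.

```python
# An ordinal (code-point) sort, assumed to match Sort Lines Lexicographically Ascending
lines = ["Text_2@B-2 @", "Text_0@A-10@", "Text_1@A-2 @", "Text_0@A-1 @"]
print(sorted(lines))
# ['Text_0@A-1 @', 'Text_0@A-10@', 'Text_1@A-2 @', 'Text_2@B-2 @']

# Without the padding space, line 10 would sort ahead of line 1,
# because '@' (U+0040) comes after the digits '0'-'9' (U+0030 to U+0039)
print(sorted(["Text_0@A-1@", "Text_0@A-10@"]))
# ['Text_0@A-10@', 'Text_0@A-1@']
```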
              

              Then, run the third regex S/R below:

              SEARCH (^.+@.).+\R(?:\1.+\R)+|.+\R

              REPLACE ?1$0

              It should delete any text that is unique in its column, and keep only the texts that occur several times in their column:

              Text_0@A-1 @
              Text_0@A-10@
              Text_2@B-2 @
              Text_2@B-4 @
              Text_2@B-7 @
              Text_4@A-3 @
              Text_4@A-5 @
              Text_4@A-8 @
              Text_5@B-3 @
              Text_5@B-8 @
              Text_7@A-6 @
              Text_7@A-9 @
              Text_7@B-11@
              Text_7@B-12@
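In this third S/R, group 1 captures a text together with its column letter (for instance `Text_2@B`), so identical texts from different columns are never grouped together. Since the list is sorted, all lines sharing that prefix are adjacent and are matched as one block; a line matched only by the last alternative `.+\R` leaves group 1 unset, so `?1$0` writes nothing and the line is deleted. A Python sketch of the same idea (hypothetical `keep_repeated` helper); Python’s `re` has no `\R`, so it assumes `\n` line endings and a final line break, and uses the first six lines of the sorted list above:

```python
import re

# Emulates S/R #3:  SEARCH (^.+@.).+\R(?:\1.+\R)+|.+\R   REPLACE ?1$0
# Group 1 captures the text plus its column letter ("Text_2@B"), so a run
# of adjacent lines sharing that prefix is matched as one block.
REPEATS = re.compile(r"(^.+@.).+\n(?:\1.+\n)+|.+\n", re.MULTILINE)

def keep_repeated(text):
    # Keep a block of 2+ lines (group 1 matched); delete any single line.
    return REPEATS.sub(lambda m: m.group(0) if m.group(1) else "", text)

sorted_text = (
    "Text_0@A-1 @\nText_0@A-10@\nText_1@A-2 @\n"
    "Text_2@A-11@\nText_2@B-2 @\nText_2@B-4 @\n"
)
print(keep_repeated(sorted_text), end="")
# Text_0@A-1 @
# Text_0@A-10@
# Text_2@B-2 @
# Text_2@B-4 @
```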
              

              Finally, run the fourth and last regex S/R below:

              SEARCH (^(.+?)@B-|@A-)|\x20*@

              REPLACE ?1|(?2\2)\x20\x20\x20\x20\x20

              Notes:

              • You may replace each \x20 with a literal space character!

              • In the replacement, you may add more spaces, or replace the spaces with one or more tab characters

              This S/R displays the different texts:

              • As Text_?|, if the text was located BEFORE the Vertical Line symbol

              • As |Text_?, if the text was located AFTER the Vertical Line symbol

              • The number at the end of each line is the number of the original line where the string Text_? occurs, listed in increasing order, so you can easily locate that string!

              Text_0|     1
              Text_0|     10
              |Text_2     2
              |Text_2     4
              |Text_2     7
              Text_4|     3
              Text_4|     5
              Text_4|     8
              |Text_5     3
              |Text_5     8
              Text_7|     6
              Text_7|     9
              |Text_7     11
              |Text_7     12
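In this last S/R, group 1 matches either `text@B-` at the start of a line (with the text captured in group 2) or `@A-`. Both are replaced by a `|`, preceded by nothing and followed by the captured text for a B-column line, then five spaces. The other alternative, `\x20*@`, removes the padding and the closing `@`. As before, Python needs a replacement function instead of the Boost conditionals; the helper name `tidy` is hypothetical:

```python
import re

# Emulates S/R #4:  SEARCH (^(.+?)@B-|@A-)|\x20*@
#                   REPLACE ?1|(?2\2)\x20\x20\x20\x20\x20
TIDY = re.compile(r"(^(.+?)@B-|@A-)| *@", re.MULTILINE)

def tidy(text):
    def repl(m):
        if m.group(1):
            # '|', then the B-column text if any, then five spaces before the number
            return "|" + (m.group(2) or "") + " " * 5
        return ""  # ' *@' alternative: drop the padding and the closing '@'
    return TIDY.sub(repl, text)

print(tidy("Text_0@A-1 @\nText_0@A-10@\nText_2@B-2 @"))
# Text_0|     1
# Text_0|     10
# |Text_2     2
```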
              

              Best Regards,

              guy038

              P.S.:

              If any of the four S/Rs above seems a bit tricky, just tell me!

                Vasile Caraus

                Tested it, and it WORKS. I think I will use a macro for this long series of regex S/Rs.

                Thanks, guy038. I believe you are my only friend around here. ;)

                The Community of users of the Notepad++ text editor.