Regex: Finds words that are repeated in multiple lines

  • V
    Vasile Caraus
    last edited by Mar 14, 2017, 7:06 AM

    Hello. I have these lines of regex expressions, each made of two parts separated by |, of the form Regex_A|Regex_B:

    (?s)((^.*)(<div class="entry-excerpt">)|(<!-- //.entry -->)(.*$))
    (?s)((^.*)(<ul class="smallThumb-mainList">)|(<div class="navigation">)(.*$))
    (?s)((^.*)(word_2)|(<!-- //.entry -->)(.*$))
    (?s)((^.*)(word_2)|(<!-- //.ambro34 -->)(.*$))
    

    I want to find all the words/regexes that are repeated before the | and all those that are repeated after the |.

    I tried a regex, but it doesn’t work well: (?m)(.*)^(.*)\|(.*)(?=.*\1)

    • V
      Vasile Caraus
      last edited by Vasile Caraus Mar 14, 2017, 1:46 PM Mar 14, 2017, 1:45 PM

      Basically, after the search and replace, I want only one instance to remain of:

      (?s)((^.*)(word_2) because it is repeated twice before the | (on lines 3 and 4)

      (<!-- //.entry -->)(.*$)) because it is repeated after the | (on lines 1 and 3)

      • V
        Vasile Caraus
        last edited by Mar 14, 2017, 5:06 PM

        Maybe a simple example will make this clearer:

        Word_1 | Word_2
        Word_3 | Word_2
        Word_4 | Word_5
        Word_4 | Word_6

        In this case, Word_4 and Word_2 are repeated. So, after the search, I want only these ones to remain.
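
To pin down the expected result, here is a minimal Python sketch. Python is not part of the original question, and the function name `repeated_words` is made up for illustration. It assumes each line holds exactly two words separated by a single `|`, and it reports the words that occur more than once on the same side of the `|`.

```python
from collections import Counter

def repeated_words(lines):
    """Return (words repeated before '|', words repeated after '|').

    Hypothetical helper: assumes exactly one '|' per line.
    """
    before = Counter()
    after = Counter()
    for line in lines:
        left, right = (part.strip() for part in line.split("|", 1))
        before[left] += 1
        after[right] += 1
    return (
        sorted(word for word, count in before.items() if count > 1),
        sorted(word for word, count in after.items() if count > 1),
    )

sample = [
    "Word_1 | Word_2",
    "Word_3 | Word_2",
    "Word_4 | Word_5",
    "Word_4 | Word_6",
]
print(repeated_words(sample))  # (['Word_4'], ['Word_2'])
```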

        • A
          Alan Kilborn @Vasile Caraus
          last edited by Mar 14, 2017, 5:50 PM

          @Vasile-Caraus

          As stated before here (https://notepad-plus-plus.org/community/topic/13248/regex-datetime ) I think you’ve worn out everyone’s good nature (with the possible exception of @guy038) with your infinite regex questions. @MAPJe71 pointed out some good references for you to self-learn; that advice still holds. Sorry, but that’s the way I see it.

          • G
            guy038
            last edited by guy038 Mar 15, 2017, 7:56 AM Mar 14, 2017, 11:37 PM

            Hello, @Vasile-Caraus, @alan-kilborn and @MapJe71

            First of all, @alan-kilborn and @MapJe71, although I do understand your point of view and the advice that you give to @Vasile-Caraus, this exercise seems interesting, nonetheless. You may simply consider that it lets you find, in a two-column table, any text which is repeated, one or more times, within each column !


            So @Vasile-Caraus, let’s go !

            To begin with, some statements and hypotheses :

            • I’ll limit this topic to the general case of exactly two parts of text, separated by one Vertical Line character ( Text_A|Text_B ), which, of course, covers the sub-problem of two regexes, separated by the alternation symbol ( Regex_A|Regex_B )

            • For syntaxes such as Text_A|Text_B|Text_C or more, it would be more expensive !! Well, set your mind at ease, I’m joking :-))

            • Of course, these two parts of text must NOT contain the Vertical Line character ( | ) themselves !

            • I chose the Commercial At sign ( @ ) as a temporary character. If your regexes may contain this character, just choose another symbol which, preferably, is not a special regex symbol !

            • I’ll use the 12-line original text, below :

            Text_0|Text_C
            Text_1|Text_2
            Text_4|Text_5
            Text_3|Text_2
            Text_4|Text_6
            Text_7|Text_8
            Text_9|Text_2
            Text_4|Text_5
            Text_7|Text_A
            Text_0|Text_B
            Text_2|Text_7
            Text_6|Text_7
            
            • Of course, the different NON-null strings Text_? can have any size !

            So :

            • Open a new tab

            • Copy/Paste the original text, above

            • Hit the Backspace key to delete any End of Line character(s) at the end of the last line ( Line 12 )

            • Open the Replace dialog

            • Then the first regex S/R, below :

            SEARCH (?=(\|))|$

            REPLACE @(?1A-:B-)@

            should produce the text :

            Text_0@A-@|Text_C@B-@
            Text_1@A-@|Text_2@B-@
            Text_4@A-@|Text_5@B-@
            Text_3@A-@|Text_2@B-@
            Text_4@A-@|Text_6@B-@
            Text_7@A-@|Text_8@B-@
            Text_9@A-@|Text_2@B-@
            Text_4@A-@|Text_5@B-@
            Text_7@A-@|Text_A@B-@
            Text_0@A-@|Text_B@B-@
            Text_2@A-@|Text_7@B-@
            Text_6@A-@|Text_7@B-@
            
            • Now, choose the Edit > Column Editor…, or hit the ALT + C shortcut

            • Select the zone Number to Insert

            • Choose 1, as Initial number

            • Choose 1, in the Increase by field

            • Select the Dec format of numbers

            • Place the caret, on the first line, between the strings @A- and @|

            • Click on the OK button

            => A list of numbers, between 1 and 12, is inserted at caret position

            • Now, move the caret, on the first line, between the strings @B- and the last @

            • Re-open the Column Editor, with the ALT + C shortcut

            • Hit the Enter key

            => The same list of numbers is inserted, before the last @, of each line :

            Text_0@A-1 @|Text_C@B-1 @
            Text_1@A-2 @|Text_2@B-2 @
            Text_4@A-3 @|Text_5@B-3 @
            Text_3@A-4 @|Text_2@B-4 @
            Text_4@A-5 @|Text_6@B-5 @
            Text_7@A-6 @|Text_8@B-6 @
            Text_9@A-7 @|Text_2@B-7 @
            Text_4@A-8 @|Text_5@B-8 @
            Text_7@A-9 @|Text_A@B-9 @
            Text_0@A-10@|Text_B@B-10@
            Text_2@A-11@|Text_7@B-11@
            Text_6@A-12@|Text_7@B-12@
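
The two Column Editor passes above can be approximated in Python as follows. This is a sketch under assumptions: the helper name `number_markers` is invented, and the numbers are padded with trailing spaces to a common width only to mimic the output shown above.

```python
def number_markers(tagged_lines):
    """Insert the 1-based line number into the @A-@ and @B-@ markers.

    Hypothetical helper; numbers are left-aligned and space-padded so that
    all markers have the same width, as in the listing above.
    """
    width = len(str(len(tagged_lines)))
    numbered = []
    for number, line in enumerate(tagged_lines, start=1):
        label = str(number).ljust(width)
        line = line.replace("@A-@", f"@A-{label}@")
        line = line.replace("@B-@", f"@B-{label}@")
        numbered.append(line)
    return numbered

# Twelve placeholder lines, just to show the padding of one-digit numbers
demo = [f"Left_{i}@A-@|Right_{i}@B-@" for i in range(1, 13)]
result = number_markers(demo)
print(result[0])   # Left_1@A-1 @|Right_1@B-1 @
print(result[11])  # Left_12@A-12@|Right_12@B-12@
```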
            

            Then, with that second regex S/R :

            SEARCH \|

            REPLACE \r\n

            we get the one-column list, below :

            Text_0@A-1 @
            Text_C@B-1 @
            Text_1@A-2 @
            Text_2@B-2 @
            Text_4@A-3 @
            Text_5@B-3 @
            Text_3@A-4 @
            Text_2@B-4 @
            Text_4@A-5 @
            Text_6@B-5 @
            Text_7@A-6 @
            Text_8@B-6 @
            Text_9@A-7 @
            Text_2@B-7 @
            Text_4@A-8 @
            Text_5@B-8 @
            Text_7@A-9 @
            Text_A@B-9 @
            Text_0@A-10@
            Text_B@B-10@
            Text_2@A-11@
            Text_7@B-11@
            Text_6@A-12@
            Text_7@B-12@
            

            Now, let’s use the menu option Edit > Line Operations > Sort lines Lexicographically Ascending

            We obtain the sorted text, below :

            Text_0@A-1 @
            Text_0@A-10@
            Text_1@A-2 @
            Text_2@A-11@
            Text_2@B-2 @
            Text_2@B-4 @
            Text_2@B-7 @
            Text_3@A-4 @
            Text_4@A-3 @
            Text_4@A-5 @
            Text_4@A-8 @
            Text_5@B-3 @
            Text_5@B-8 @
            Text_6@A-12@
            Text_6@B-5 @
            Text_7@A-6 @
            Text_7@A-9 @
            Text_7@B-11@
            Text_7@B-12@
            Text_8@B-6 @
            Text_9@A-7 @
            Text_A@B-9 @
            Text_B@B-10@
            Text_C@B-1 @
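
The second S/R and the sort can be sketched together in Python. The helper name `split_and_sort` is hypothetical, and the sketch assumes that Python’s default string ordering (by code point) gives the same result as the Notepad++ lexicographic sort for plain ASCII text like this sample.

```python
def split_and_sort(numbered_lines):
    """Split each line at '|' into two entries, then sort all entries.

    Hypothetical helper; assumes exactly one '|' per input line.
    """
    one_column = [part for line in numbered_lines for part in line.split("|")]
    return sorted(one_column)

demo = [
    "Text_1@A-2 @|Text_2@B-2 @",
    "Text_0@A-1 @|Text_C@B-1 @",
    "Text_0@A-10@|Text_B@B-10@",
]
for entry in split_and_sort(demo):
    print(entry)
```

The space used as padding sorts before any digit, which is why `Text_0@A-1 @` comes before `Text_0@A-10@`.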
            

            Then, the third regex S/R, below :

            SEARCH (^.+@.).+\R(?:\1.+\R)+|.+\R

            REPLACE ?1$0

            should delete any text which is unique in its column, and keep only the texts which occur several times in their column :

            Text_0@A-1 @
            Text_0@A-10@
            Text_2@B-2 @
            Text_2@B-4 @
            Text_2@B-7 @
            Text_4@A-3 @
            Text_4@A-5 @
            Text_4@A-8 @
            Text_5@B-3 @
            Text_5@B-8 @
            Text_7@A-6 @
            Text_7@A-9 @
            Text_7@B-11@
            Text_7@B-12@
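
To experiment with this third S/R outside Notepad++, here is a hedged Python sketch. Two things differ from the original: Python’s `re` module does not support `\R`, so `\n` is used instead, and the conditional replacement `?1$0` is emulated with a callback. The helper name `keep_repeated` is hypothetical, and the input is assumed to end with a newline.

```python
import re

# Same idea as the Notepad++ search regex, with \R replaced by \n
PATTERN = re.compile(r"(^.+@.).+\n(?:\1.+\n)+|.+\n", re.MULTILINE)

def keep_repeated(sorted_text):
    """Keep runs of lines sharing the same 'Text@A' or 'Text@B' prefix.

    Hypothetical helper; every line, including the last, must end with \n.
    """
    def keep_or_drop(match):
        # Group 1 is set only when at least two consecutive lines share a prefix
        return match.group(0) if match.group(1) is not None else ""

    return PATTERN.sub(keep_or_drop, sorted_text)

demo = (
    "Text_0@A-1 @\n"
    "Text_0@A-10@\n"
    "Text_1@A-2 @\n"
    "Text_2@A-11@\n"
    "Text_2@B-2 @\n"
    "Text_2@B-4 @\n"
)
print(keep_repeated(demo), end="")
```

In this sample, `Text_1` and the column A occurrence of `Text_2` are dropped, because each appears only once in its column.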
            

            Finally, use the fourth and last regex S/R, below :

            SEARCH (^(.+?)@B-|@A-)|\x20*@

            REPLACE ?1|(?2\2)\x20\x20\x20\x20\x20

            Notes :

            • You may replace any syntax \x20 with a single space character !

            • In the replacement regex, you may add more spaces, or replace the spaces with several tab characters

            This S/R displays the different texts :

            • With the syntax Text_?|, if this text was located BEFORE the Vertical Line symbol

            • With the syntax |Text_?, if this text was located AFTER the Vertical Line symbol

            • The number ending each line is the number of the line, in the original text, where the string Text_? occurs, listed in increasing order, so that you can easily locate this string !

            Text_0|     1
            Text_0|     10
            |Text_2     2
            |Text_2     4
            |Text_2     7
            Text_4|     3
            Text_4|     5
            Text_4|     8
            |Text_5     3
            |Text_5     8
            Text_7|     6
            Text_7|     9
            |Text_7     11
            |Text_7     12
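
This last S/R can also be tried in Python. Again, this is only a sketch: the conditional replacement `?1|(?2\2)` is emulated with a callback, and the helper name `to_report` is hypothetical.

```python
import re

# Same search regex as above; ^ needs MULTILINE to match at each line start
PATTERN = re.compile(r"(^(.+?)@B-|@A-)|\x20*@", re.MULTILINE)

def to_report(kept_text):
    """Turn 'Text@A-n @' into 'Text|     n' and 'Text@B-n @' into '|Text     n'.

    Hypothetical helper emulating the fourth S/R.
    """
    def rewrite(match):
        if match.group(1) is None:
            # Second alternative: the padding spaces and the closing @ are removed
            return ""
        # Group 2 holds the text only for column B entries
        return "|" + (match.group(2) or "") + " " * 5

    return PATTERN.sub(rewrite, kept_text)

print(to_report("Text_0@A-1 @\nText_2@B-2 @\n"), end="")
```

For the two sample lines, this prints `Text_0|     1` and `|Text_2     2`.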
            

            Best Regards,

            guy038

            P.S. :

            If any of the four S/R, above, seems a bit tricky, just tell me about it !

            • V
              Vasile Caraus
              last edited by Mar 15, 2017, 5:50 AM

              I tested it and it WORKS. I think I will use macros for this long sequence of regexes.

              Thanks, guy038. I believe you are my only friend around here. ;)

              The Community of users of the Notepad++ text editor.