
    Delete all duplicate words in the text

    • Сергій Бородін

      Hello

      Could you please tell me how to use regular expressions to remove all duplicate tags from the text?

      Initial text:

      FileName,Keywords
      filename1.eps,tag1;tag2;tag3
      filename2.eps,tag4;tag1;tag5
      filename3.eps,tag6;tag2;tag9
      filename3.eps,tag7;tag2;tag3;tag8

      The result should be:

      filename1.eps,tag1;tag2;tag3
      filename2.eps,tag4;tag5
      filename3.eps,tag6;tag9
      filename3.eps,tag7;tag8

      • guy038

        Hello, @сергій-бородін and All,

        Assuming this initial text:

        filename1.eps,tag1;tag2;tag3
        filename2.eps,tag4;tag1;tag5
        filename3.eps,tag6;tag2;tag9
        filename3.eps,tag7;tag2;tag3;tag8
        filename4.eps,tag1;tag9;tag5
        filename5.eps,tag4;tag6;tag10;tag12
        filename5.eps,tag8;tag2;tag1;tag6;tag11
        filename6.eps,tag3;tag2;tag3;tag10;tag14
        filename7.eps,tag5;tag7;tag15
        filename8.eps,tag4;tag5;tag15;tag16
        filename8.eps,tag3;tag14;tag9;tag7
        filename8.eps,tag7;tag2;tag3;tag8
        filename9.eps,tag2;tag10;tag17
        filename10.eps,tag5;tag1;tag13
        filename10.eps,tag7;tag6;tag9;tag10
        filename11.eps,tag7;tag2;tag3;tag8;tag18
        filename11.eps,tag10;tag12;tag13;tag20
        filename12.eps,tag4;tag8;tag3;tag19
        filename13.eps,tag6;tag15;tag9;tag11
        filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4
        filename15.eps,tag0;tag9,tag20
        

        If I follow your logic, I suppose that you expect the text below:

        filename1.eps,tag1;tag2;tag3
        filename2.eps,tag4;tag5
        filename3.eps,tag6;tag9
        filename3.eps,tag7;tag8
        filename4.eps
        filename5.eps,tag10;tag12
        filename5.eps,tag11
        filename6.eps,tag14
        filename7.eps,tag15
        filename8.eps,tag16
        filename8.eps
        filename8.eps
        filename9.eps,tag17
        filename10.eps,tag13
        filename10.eps
        filename11.eps,tag18
        filename11.eps,tag20
        filename12.eps,tag19
        filename13.eps
        filename14.eps
        filename15.eps,tag0
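
As a cross-check outside Notepad++, the rule behind this expected text (a tag is kept only the first time it appears, reading from top to bottom) can be sketched in Python. This is not the regex method of this post, just a plain set-based loop. The function name `keep_first_occurrences` is made up for illustration, and the sketch assumes that the first comma on each line separates the file name from the tags.

```python
def keep_first_occurrences(lines):
    """Keep each tag only where it first appears, reading top to bottom."""
    seen = set()
    result = []
    for line in lines:
        # Assumption: the first comma separates the file name from the tags.
        name, _, tag_field = line.partition(",")
        # The sample data mixes ';' and ',' between tags, so accept both.
        tags = [t for t in tag_field.replace(",", ";").split(";") if t]
        fresh = []
        for tag in tags:
            if tag not in seen:
                seen.add(tag)
                fresh.append(tag)
        # A line whose tags were all seen before keeps only its file name.
        result.append((name + "," + ";".join(fresh)) if fresh else name)
    return result


if __name__ == "__main__":
    sample = [
        "filename1.eps,tag1;tag2;tag3",
        "filename2.eps,tag4;tag1;tag5",
        "filename3.eps,tag6;tag2;tag9",
        "filename3.eps,tag7;tag2;tag3;tag8",
    ]
    print("\n".join(keep_first_occurrences(sample)))
```

Note that this sketch always writes the surviving tags with ';' between them, which matches the layout of the expected text.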
        

        If so, here is a road map to achieve this. Let’s go:


        • Move the caret to the beginning of the first line of your list

        • Open the Column Editor ( Alt + C )

          • Select the option Number to Insert

          • Type the value 1 in all the fields

          • Tick the Leading zeros option

          • Click on the OK button

        You should get :

        01filename1.eps,tag1;tag2;tag3
        02filename2.eps,tag4;tag1;tag5
        03filename3.eps,tag6;tag2;tag9
        04filename3.eps,tag7;tag2;tag3;tag8
        05filename4.eps,tag1;tag9;tag5
        06filename5.eps,tag4;tag6;tag10;tag12
        07filename5.eps,tag8;tag2;tag1;tag6;tag11
        08filename6.eps,tag3;tag2;tag3;tag10;tag14
        09filename7.eps,tag5;tag7;tag15
        10filename8.eps,tag4;tag5;tag15;tag16
        11filename8.eps,tag3;tag14;tag9;tag7
        12filename8.eps,tag7;tag2;tag3;tag8
        13filename9.eps,tag2;tag10;tag17
        14filename10.eps,tag5;tag1;tag13
        15filename10.eps,tag7;tag6;tag9;tag10
        16filename11.eps,tag7;tag2;tag3;tag8;tag18
        17filename11.eps,tag10;tag12;tag13;tag20
        18filename12.eps,tag4;tag8;tag3;tag19
        19filename13.eps,tag6;tag15;tag9;tag11
        20filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4
        21filename15.eps,tag0;tag9,tag20
        
        • Run the menu option Edit > Line Operations > Sort Lines Lexicographically Descending (not Ascending!)

        This gives:

        21filename15.eps,tag0;tag9,tag20
        20filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4
        19filename13.eps,tag6;tag15;tag9;tag11
        18filename12.eps,tag4;tag8;tag3;tag19
        17filename11.eps,tag10;tag12;tag13;tag20
        16filename11.eps,tag7;tag2;tag3;tag8;tag18
        15filename10.eps,tag7;tag6;tag9;tag10
        14filename10.eps,tag5;tag1;tag13
        13filename9.eps,tag2;tag10;tag17
        12filename8.eps,tag7;tag2;tag3;tag8
        11filename8.eps,tag3;tag14;tag9;tag7
        10filename8.eps,tag4;tag5;tag15;tag16
        09filename7.eps,tag5;tag7;tag15
        08filename6.eps,tag3;tag2;tag3;tag10;tag14
        07filename5.eps,tag8;tag2;tag1;tag6;tag11
        06filename5.eps,tag4;tag6;tag10;tag12
        05filename4.eps,tag1;tag9;tag5
        04filename3.eps,tag7;tag2;tag3;tag8
        03filename3.eps,tag6;tag2;tag9
        02filename2.eps,tag4;tag1;tag5
        01filename1.eps,tag1;tag2;tag3
        

        This simple regex search/replace (S/R) joins the whole list into a single line:

        • Open the Replace dialog ( Ctrl + H )

          • SEARCH \R

          • REPLACE #    (any character that does not occur in the text can be chosen)

          • Select the Regular expression search mode

          • Click on the Replace All button

        We obtain the single line below:

        21filename15.eps,tag0;tag9,tag20#20filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4#19filename13.eps,tag6;tag15;tag9;tag11#18filename12.eps,tag4;tag8;tag3;tag19#17filename11.eps,tag10;tag12;tag13;tag20#16filename11.eps,tag7;tag2;tag3;tag8;tag18#15filename10.eps,tag7;tag6;tag9;tag10#14filename10.eps,tag5;tag1;tag13#13filename9.eps,tag2;tag10;tag17#12filename8.eps,tag7;tag2;tag3;tag8#11filename8.eps,tag3;tag14;tag9;tag7#10filename8.eps,tag4;tag5;tag15;tag16#09filename7.eps,tag5;tag7;tag15#08filename6.eps,tag3;tag2;tag3;tag10;tag14#07filename5.eps,tag8;tag2;tag1;tag6;tag11#06filename5.eps,tag4;tag6;tag10;tag12#05filename4.eps,tag1;tag9;tag5#04filename3.eps,tag7;tag2;tag3;tag8#03filename3.eps,tag6;tag2;tag9#02filename2.eps,tag4;tag1;tag5#01filename1.eps,tag1;tag2;tag3
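
For readers who prefer to script these preparation steps, here is a minimal Python sketch of the same idea: number the lines with leading zeros, sort them in descending order and join them with `#`. The function name `number_reverse_join` is hypothetical, and the zero-padding width is simply derived from the number of lines, which is an assumption about what the Column Editor produces.

```python
def number_reverse_join(lines, sep="#"):
    """Prefix each line with a zero-padded counter, sort descending, join."""
    # Padding width: 2 digits for 21 lines, 3 digits for 100 lines, and so on.
    width = len(str(len(lines)))
    numbered = [f"{i:0{width}d}{line}" for i, line in enumerate(lines, start=1)]
    # A plain string sort is enough because the counters are zero-padded.
    numbered.sort(reverse=True)
    # sep must be a character that does not occur anywhere in the text.
    return sep.join(numbered)


if __name__ == "__main__":
    print(number_reverse_join(["a.eps,t1", "b.eps,t2", "c.eps,t1"]))
```

The zero-padding matters: without it, a string sort would place line 9 after line 10.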
        
        • Now, here is the regex S/R that deletes every tag which occurs again further on in the line:

          • SEARCH (?-is)[,;](\w+)(?=[,;#].*?[,;]\1([,;#]|\R|\z))

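          • REPLACE Leave the field EMPTY

        Because the list was reversed, the last copy of a tag in this line is its first copy in the original list, and that is the only copy the regex keeps. Your text is shortened as below: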

        21filename15.eps,tag0#20filename14.eps#19filename13.eps#18filename12.eps;tag19#17filename11.eps;tag20#16filename11.eps;tag18#15filename10.eps#14filename10.eps;tag13#13filename9.eps;tag17#12filename8.eps#11filename8.eps#10filename8.eps;tag16#09filename7.eps;tag15#08filename6.eps;tag14#07filename5.eps;tag11#06filename5.eps;tag10;tag12#05filename4.eps#04filename3.eps,tag7;tag8#03filename3.eps,tag6;tag9#02filename2.eps,tag4;tag5#01filename1.eps,tag1;tag2;tag3
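
The behaviour of this search can be reproduced with Python's `re` module, as a rough sketch. Python's `re` has no `\R`, so the end of the text is written `$` here, and the third group is made non-capturing. Python is case-sensitive and does not let `.` match line breaks by default, so the `(?-is)` prefix is not needed. The names `DUPLICATE_TAG` and `drop_tags_seen_later` are made up for illustration.

```python
import re

# A separator plus a tag, provided the same tag occurs again further on,
# delimited by ',' or ';' before it and by ',', ';', '#' or the end after it.
DUPLICATE_TAG = re.compile(r"[,;](\w+)(?=[,;#].*?[,;]\1(?:[,;#]|$))")


def drop_tags_seen_later(joined):
    """Delete every tag that has another copy later in the joined line."""
    return DUPLICATE_TAG.sub("", joined)


if __name__ == "__main__":
    joined = (
        "4filename3.eps,tag7;tag2;tag3;tag8#3filename3.eps,tag6;tag2;tag9"
        "#2filename2.eps,tag4;tag1;tag5#1filename1.eps,tag1;tag2;tag3"
    )
    print(drop_tags_seen_later(joined))
```

The lookahead does not consume anything, so the later copy stays available to be tested (and kept or deleted) in its own right.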
        
        • Then, this other regex S/R turns the single line back into a multi-line list:

          • SEARCH #

          • REPLACE \r\n    (or \n if your file uses Unix line endings)

        Giving:

        21filename15.eps,tag0
        20filename14.eps
        19filename13.eps
        18filename12.eps;tag19
        17filename11.eps;tag20
        16filename11.eps;tag18
        15filename10.eps
        14filename10.eps;tag13
        13filename9.eps;tag17
        12filename8.eps
        11filename8.eps
        10filename8.eps;tag16
        09filename7.eps;tag15
        08filename6.eps;tag14
        07filename5.eps;tag11
        06filename5.eps;tag10;tag12
        05filename4.eps
        04filename3.eps,tag7;tag8
        03filename3.eps,tag6;tag9
        02filename2.eps,tag4;tag5
        01filename1.eps,tag1;tag2;tag3
        
        • Run the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending, which restores the original order:

        01filename1.eps,tag1;tag2;tag3
        02filename2.eps,tag4;tag5
        03filename3.eps,tag6;tag9
        04filename3.eps,tag7;tag8
        05filename4.eps
        06filename5.eps;tag10;tag12
        07filename5.eps;tag11
        08filename6.eps;tag14
        09filename7.eps;tag15
        10filename8.eps;tag16
        11filename8.eps
        12filename8.eps
        13filename9.eps;tag17
        14filename10.eps;tag13
        15filename10.eps
        16filename11.eps;tag18
        17filename11.eps;tag20
        18filename12.eps;tag19
        19filename13.eps
        20filename14.eps
        21filename15.eps,tag0
        
        • Finally, the last regex S/R, below:

          • Removes the numbering at the beginning of each line

          • Replaces any semicolon that immediately follows the string eps with a comma

        So:

          • SEARCH ^\d+|(?<=eps)(;)

          • REPLACE ?1,

        And here is your final expected text ;-))

        filename1.eps,tag1;tag2;tag3
        filename2.eps,tag4;tag5
        filename3.eps,tag6;tag9
        filename3.eps,tag7;tag8
        filename4.eps
        filename5.eps,tag10;tag12
        filename5.eps,tag11
        filename6.eps,tag14
        filename7.eps,tag15
        filename8.eps,tag16
        filename8.eps
        filename8.eps
        filename9.eps,tag17
        filename10.eps,tag13
        filename10.eps
        filename11.eps,tag18
        filename11.eps,tag20
        filename12.eps,tag19
        filename13.eps
        filename14.eps
        filename15.eps,tag0
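
The closing steps (split on `#`, sort ascending, then the final S/R) can be sketched in Python as well. The conditional replacement `?1,` has no direct equivalent in Python's `re`, so this sketch uses a replacement function instead: it returns a comma when group 1 (the semicolon) took part in the match and an empty string otherwise. The name `restore_layout` is hypothetical.

```python
import re

# Same search as in the last S/R: leading digits, or a ';' right after "eps".
CLEANUP = re.compile(r"^\d+|(?<=eps)(;)")


def restore_layout(joined, sep="#"):
    """Split the joined line, restore the original order, clean each line."""
    # The zero-padded counters make a plain ascending string sort sufficient.
    lines = sorted(joined.split(sep))
    cleaned = []
    for line in lines:
        # Group 1 is set only when the ';' alternative matched.
        cleaned.append(CLEANUP.sub(lambda m: "," if m.group(1) else "", line))
    return cleaned


if __name__ == "__main__":
    print("\n".join(restore_layout("3c.eps;t9#2b.eps#1a.eps,t1;t2")))
```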
        

        Best Regards,

        guy038

        • guy038

          Hi, @сергій-бородін and All,

          Thinking back on your problem, here is a second method. It requires fewer steps, but it keeps the last occurrence of each tag instead of the first one, so the remaining tags end up in a different layout!

          So, assuming the same initial text, below:

          filename1.eps,tag1;tag2;tag3
          filename2.eps,tag4;tag1;tag5
          filename3.eps,tag6;tag2;tag9
          filename3.eps,tag7;tag2;tag3;tag8
          filename4.eps,tag1;tag9;tag5
          filename5.eps,tag4;tag6;tag10;tag12
          filename5.eps,tag8;tag2;tag1;tag6;tag11
          filename6.eps,tag3;tag2;tag3;tag10;tag14
          filename7.eps,tag5;tag7;tag15
          filename8.eps,tag4;tag5;tag15;tag16
          filename8.eps,tag3;tag14;tag9;tag7
          filename8.eps,tag7;tag2;tag3;tag8
          filename9.eps,tag2;tag10;tag17
          filename10.eps,tag5;tag1;tag13
          filename10.eps,tag7;tag6;tag9;tag10
          filename11.eps,tag7;tag2;tag3;tag8;tag18
          filename11.eps,tag10;tag12;tag13;tag20
          filename12.eps,tag4;tag8;tag3;tag19
          filename13.eps,tag6;tag15;tag9;tag11
          filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4
          filename15.eps,tag0;tag9,tag20
          

          First, this simple regex S/R joins the whole list into a single line:

          • Open the Replace dialog ( Ctrl + H )

            • SEARCH \R

            • REPLACE #    (any character that does not occur in the text can be chosen)

            • Select the Regular expression search mode

            • Click on the Replace All button

          This gives the single line below:

          filename1.eps,tag1;tag2;tag3#filename2.eps,tag4;tag1;tag5#filename3.eps,tag6;tag2;tag9#filename3.eps,tag7;tag2;tag3;tag8#filename4.eps,tag1;tag9;tag5#filename5.eps,tag4;tag6;tag10;tag12#filename5.eps,tag8;tag2;tag1;tag6;tag11#filename6.eps,tag3;tag2;tag3;tag10;tag14#filename7.eps,tag5;tag7;tag15#filename8.eps,tag4;tag5;tag15;tag16#filename8.eps,tag3;tag14;tag9;tag7#filename8.eps,tag7;tag2;tag3;tag8#filename9.eps,tag2;tag10;tag17#filename10.eps,tag5;tag1;tag13#filename10.eps,tag7;tag6;tag9;tag10#filename11.eps,tag7;tag2;tag3;tag8;tag18#filename11.eps,tag10;tag12;tag13;tag20#filename12.eps,tag4;tag8;tag3;tag19#filename13.eps,tag6;tag15;tag9;tag11#filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4#filename15.eps,tag0;tag9,tag20
          
          • Now, here is the regex S/R that deletes any duplicated tag (the same regex as in my previous post):

            • SEARCH (?-is)[,;](\w+)(?=[,;#].*?[,;]\1([,;#]|\R|\z))

            • REPLACE Leave the field EMPTY

          Your text should be shortened as below:

          filename1.eps#filename2.eps#filename3.eps#filename3.eps#filename4.eps#filename5.eps#filename5.eps#filename6.eps#filename7.eps#filename8.eps;tag16#filename8.eps;tag14#filename8.eps#filename9.eps#filename10.eps,tag5;tag1#filename10.eps#filename11.eps;tag18#filename11.eps,tag10;tag13#filename12.eps;tag8;tag19#filename13.eps,tag6;tag15;tag11#filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4#filename15.eps,tag0;tag9,tag20
          
          • Finally, this last regex S/R:

            • Replaces any semicolon that immediately follows the string eps with a comma

            • Replaces any # symbol with a line break ( \r\n or \n )

          SEARCH eps;|(#)

          REPLACE ?1\r\n:eps,    OR    ?1\n:eps,    if you work with a Unix file

          And we obtain the final output:

          filename1.eps
          filename2.eps
          filename3.eps
          filename3.eps
          filename4.eps
          filename5.eps
          filename5.eps
          filename6.eps
          filename7.eps
          filename8.eps,tag16
          filename8.eps,tag14
          filename8.eps
          filename9.eps
          filename10.eps,tag5;tag1
          filename10.eps
          filename11.eps,tag18
          filename11.eps,tag10;tag13
          filename12.eps,tag8;tag19
          filename13.eps,tag6;tag15;tag11
          filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4
          filename15.eps,tag0;tag9,tag20
          

          As you can see, the 21 distinct tags (from tag0 to tag20) are arranged differently, and many lines at the beginning of the list are left without any tag!
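
To see why the layout differs, here is a small Python sketch of this second method. It applies the same duplicate-tag search without any numbering or reverse sort, so the copy that survives is the last one in the list. The name `dedupe_keep_last` is made up for illustration, `$` again stands in for the end of the text, and the final S/R is approximated by splitting on `#` and fixing the separator after eps on each line.

```python
import re

DUPLICATE_TAG = re.compile(r"[,;](\w+)(?=[,;#].*?[,;]\1(?:[,;#]|$))")


def dedupe_keep_last(lines):
    """Join with '#', delete tags that reappear later, split the lines again."""
    # Assumption: '#' does not occur anywhere in the input lines.
    joined = "#".join(lines)
    # No reverse sort here, so the LAST copy of each tag is the one kept.
    joined = DUPLICATE_TAG.sub("", joined)
    # A surviving tag may now follow "eps" with ';' instead of ','.
    return [re.sub(r"(?<=eps);", ",", line) for line in joined.split("#")]


if __name__ == "__main__":
    sample = [
        "filename1.eps,tag1;tag2;tag3",
        "filename2.eps,tag4;tag1;tag5",
        "filename3.eps,tag6;tag2;tag9",
        "filename3.eps,tag7;tag2;tag3;tag8",
    ]
    print("\n".join(dedupe_keep_last(sample)))
```

On the four-line sample from the original question, the first line loses all its tags because each of them appears again further down.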

          Best Regards,

          guy038
