    How to remove duplicates words?

    • Bahaa0013

      example:

      [Math+ Biology+ Chemistry+ History+ Chemistry+ Math+ Math]
      

      Output:

      [Math+ Biology+ Chemistry+ History]
      

      There are many different words, so it is hard to delete the duplicates manually.

      Also I tried this:

      (\b\S+\b)(?=.*\b\1\b)
      

      But it keeps the plus symbol “+”.
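
To see why the `+` signs stay behind, the attempt above can be reproduced outside Notepad++. The sketch below uses Python's `re` module as a stand-in for Notepad++'s regex engine (an assumption: the two engines are not identical, although they should agree on a pattern this simple). The function name and the second, tweaked pattern are illustrative additions, not something proposed in the thread.

```python
import re

SAMPLE = "[Math+ Biology+ Chemistry+ History+ Chemistry+ Math+ Math]"

# The pattern from the post: a word that occurs again later on the same line.
# The trailing \b forces \S+ to stop before "+", so the "+" is never matched.
ORIGINAL = re.compile(r"(\b\S+\b)(?=.*\b\1\b)")

# Hypothetical tweak: also consume the "+" and the spaces that follow the word.
TWEAKED = re.compile(r"(\b\S+\b)\+?\x20*(?=.*\b\1\b)")


def strip_earlier_duplicates(pattern: re.Pattern, text: str) -> str:
    """Delete every match, i.e. every word that is repeated later on the line."""
    return pattern.sub("", text)


if __name__ == "__main__":
    print(strip_earlier_duplicates(ORIGINAL, SAMPLE))
    print(strip_earlier_duplicates(TWEAKED, SAMPLE))
```

On the sample line the original pattern leaves `[+ Biology+ + History+ Chemistry+ + Math]`, while the tweaked one gives `[Biology+ History+ Chemistry+ Math]`. Both keep the last occurrence of each word, so the order differs from the output requested above.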

      • Alan Kilborn @Bahaa0013

        @Bahaa-Eddin-ツ

        It’s a bit of a tough one.
        Perhaps as a starting point, try fiddling with this:

        Find: (?-s)^(.*?\b(\w+)\b.+?) \2\+?
        Replace: ${1}
        Search mode: Regular expression

        You’d have to run it several times, until no more replacements are made.

        And I just tried it quickly, so I’m sure some holes can be shot into it. :-)
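
The "run it several times, until no more replacements are made" idea can be sketched as a loop. This is a rough Python port, not the Notepad++ behaviour itself: it assumes Python's `re` module, writes the replacement `${1}` as `\g<1>`, and matches case-sensitively (the Notepad++ search ignores case unless "Match case" is ticked). The function name and the `max_passes` cap are hypothetical.

```python
import re

# Python port of the Find expression. "(?-s)" is dropped because "." already
# stops at line ends by default in Python; re.MULTILINE makes "^" match at
# the start of every line.
FIND = re.compile(r"^(.*?\b(\w+)\b.+?) \2\+?", re.MULTILINE)


def replace_until_stable(text: str, max_passes: int = 1000) -> str:
    """Repeat the replacement until a pass changes nothing (or the cap is hit)."""
    for _ in range(max_passes):
        # subn returns the new text and the number of replacements made.
        text, count = FIND.subn(r"\g<1>", text)
        if count == 0:
            break
    return text


if __name__ == "__main__":
    print(replace_until_stable("[Math+ Biology+ Chemistry+ History+ Chemistry+ Math+ Math]"))
```

On the first sample line this settles after a few passes, but it leaves a stray `+` in front of the closing `]`, which is one of the holes mentioned above.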

        • Bahaa0013 @Alan Kilborn

          @Alan-Kilborn
          Thank you, I guess it works…
          But I guess I have to run it at least 500 times to remove all the duplicated words xD

          But no problem, I will use it; it is much easier. Thanks.

          • PeterJones @Bahaa0013

            @Bahaa-Eddin-ツ said in How to remove duplicates words?:

            I guess I have to run it at least 500 times

            Record the search/replace as a macro, then use Macro > Run a Macro Multiple Times... to run it 500 times (or however many are necessary).

            • Alan Kilborn @Bahaa0013

              @Bahaa-Eddin-ツ said in How to remove duplicates words?:

              Thank you, I guess it works…

              Don’t guess…be sure…your data is important.

              I have to run it at least 500 times to remove all the duplicated words xD

              Hold down the keyboard accelerator for Replace All until the Replace window’s status bar indicates no more replacements were made?

              • Bahaa0013 @Alan Kilborn

                @Alan-Kilborn
                I guess it doesn’t work the way I wanted…
                because I didn’t give the right example.

                This is what I want:
                Example:

                [math part1 +Bilology part1+ biology part3+ History part1+ math part1+ Biology part3+ history part1]
                

                output:

                [Bilology part1+ History part1+ math part1+ Biology part3]
                
                • Alan Kilborn @Bahaa0013

                  @Bahaa-Eddin-ツ

                  I’d say, start from my kickstart attempt, and go from there. Good luck.

                  • guy038

                    Hello, @Bahaa-Eddin-ツ, @alan-kilborn, @peterjones and All,

                    @Bahaa-Eddin-ツ, I suppose that you were already successful with the @alan-kilborn solution !

                    However, here is a solution which just needs one Replace All action !

                    • Open the Replace dialog ( Ctrl + H )

                    • Untick all option boxes

                    • SEARCH (?xi-s) (?: \[ | \+ ) \x20* ( [^+\r\n]+ ) (?= \x20* \+ .+ \1 )

                    • REPLACE Leave EMPTY

                    • Check the Wrap around option

                    • Select the Regular expression search mode

                    • Click once only on the Replace All button ( or several times on the Replace button )


                    So, for instance, from the INPUT text :

                    [math part1+ Biology part1+ biology part3+ History part1+ Test N°1+ math part1+ Biology part3+ history part1+ Biology part3+ Biology part1+ test number 2+ math part1+ History part1]
                    

                    You should get this OUTPUT text :

                    + Test N°1+ Biology part3+ Biology part1+ test number 2+ math part1+ History part1]
                    

                    Finally, just change the beginning of each section with this obvious regex S/R :

                    SEARCH (?x) ^ \+ \x20*

                    REPLACE [
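
For anyone who wants to check the two steps above outside Notepad++, here is a minimal sketch of the same pair of replacements. It assumes Python's `re` module rather than the Notepad++ engine, so the leading `(?xi-s)` is expressed as explicit flags, and the function name is hypothetical.

```python
import re

# Step 1: the SEARCH regex. Free-spacing and ignore-case are passed as flags;
# "." already excludes line breaks by default in Python, which covers "-s".
DUPLICATE = re.compile(
    r"(?: \[ | \+ ) \x20* ( [^+\r\n]+ ) (?= \x20* \+ .+ \1 )",
    re.VERBOSE | re.IGNORECASE,
)

# Step 2: turn the leftover "+" at the start of each line back into "[".
LEADING_PLUS = re.compile(r"^ \+ \x20*", re.VERBOSE | re.MULTILINE)


def dedupe_sections(text: str) -> str:
    """Drop every section that reappears later on the line, then restore '['."""
    text = DUPLICATE.sub("", text)
    return LEADING_PLUS.sub("[", text)


if __name__ == "__main__":
    line = (
        "[math part1+ Biology part1+ biology part3+ History part1+ Test N°1"
        "+ math part1+ Biology part3+ history part1+ Biology part3"
        "+ Biology part1+ test number 2+ math part1+ History part1]"
    )
    print(dedupe_sections(line))
```

Because the lookahead only tests whether the same text appears somewhere later on the line, a section such as `math part1` would also count as a duplicate of a later `math part10`; that is worth checking against real data.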

                    Best Regards

                    guy038

                    • Alan Kilborn @guy038

                      @guy038 said:

                      SEARCH (?xi-s) (?: [ | + ) \x20* ( [^+\r\n]+ ) (?= \x20* + .+ \1 )

                      It looks suspiciously like the first [ is a victim of this site losing the leading escape??

                      • guy038

                        Hello, @alan-kilborn and All,

                        Sorry for the confusion !

                        I have restored the search regex in my previous post to its initial state.

                        And here is the right syntax that should be used :

                        • SEARCH (?xi-s) (?: \[ | \+ ) \x20* ( [^+\r\n]+ ) (?= \x20* \+ .+ \1 )

                        BR

                        guy038

                        So, Alan, you can delete the EDIT part of your last post !

                        • Alan Kilborn @guy038

                          @guy038 said in How to remove duplicates words?:

                          So, Alan, you can delete the EDIT part of your last post !

                          It ALREADY never happened! :-)

                          • guy038

                            Hello, @alan-kilborn and All,

                            I’ve found out an interesting thing about posts which contain a literal [ character in search regexes :

                            \\[
                            

                            If you must edit one of these posts in order to change any other part, you’ll need to repeat the special modifications, regarding the regexes, by using, again, the syntax :

                            \\\[
                            

                            BR

                            guy038

                            The Community of users of the Notepad++ text editor.