How to remove duplicates words?

12 Posts 4 Posters 3.7k Views
  • Bahaa0013
    Feb 24, 2023, 12:46 PM

    example:

    [Math+ Biology+ Chemistry+ History+ Chemistry+ Math+ Math]
    

    Output:

    [Math+ Biology+ Chemistry+ History]
    

    There are many different words, so it’s hard to delete the duplicates manually.

    Also I tried this:

    (\b\S+\b)(?=.*\b\1\b)
    

    But it keeps the plus symbol “+”.
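For readers who want to see why the plus signs survive, here is a minimal Python sketch of the attempt above. It assumes Python's `re` module as a stand-in for Notepad++'s regex engine (the two can differ in details), and the name `strip_earlier_words` is hypothetical. The point it illustrates: `\b` cannot match between `+` and a space, so the captured group stops at the end of the word and the separator is never part of the match.

```python
import re

# The pattern from the post: a run of non-space characters delimited by word
# boundaries, followed later on the same line by the same text as a whole word.
ATTEMPT = re.compile(r"(\b\S+\b)(?=.*\b\1\b)")


def strip_earlier_words(line: str) -> str:
    """Delete every word that occurs again later on the line."""
    return ATTEMPT.sub("", line)


sample = "[Math+ Biology+ Chemistry+ History+ Chemistry+ Math+ Math]"
print(strip_earlier_words(sample))
```

Only the words themselves are deleted, so each removed word leaves its `+` behind in the result.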

    • Alan Kilborn @Bahaa0013
      Feb 24, 2023, 1:21 PM

      @Bahaa-Eddin-ツ

      It’s a bit of a tough one.
      Perhaps as a starting point, try fiddling with this:

      Find: (?-s)^(.*?\b(\w+)\b.+?) \2\+?
      Replace: ${1}
      Search mode: Regular expression

      You’d have to run it several times, until no more replacements are made.

      And I just tried it quickly, so I’m sure some holes can be shot into it. :-)
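The "run it until nothing changes" idea can be sketched outside the editor as well. This is a rough Python illustration, not a Notepad++ feature: it assumes Python's `re` behaves closely enough to the editor's engine for this pattern, it matches case-sensitively, and `replace_until_stable` and `max_passes` are hypothetical names.

```python
import re

# Pattern from the post. re.MULTILINE lets ^ anchor at every line start, and
# leaving out re.DOTALL mirrors the (?-s) flag (the dot does not cross lines).
PATTERN = re.compile(r"^(.*?\b(\w+)\b.+?) \2\+?", re.MULTILINE)


def replace_until_stable(text: str, max_passes: int = 10_000) -> tuple[str, int]:
    """Repeat the replacement until a pass changes nothing.

    Returns the final text and the number of passes that made a replacement.
    """
    passes = 0
    while passes < max_passes:
        # One pass corresponds to one click on Replace All.
        text, count = PATTERN.subn(r"\g<1>", text)
        if count == 0:
            break
        passes += 1
    return text, passes


sample = "[Math+ Biology+ Chemistry+ History+ Chemistry+ Math+ Math]"
result, passes = replace_until_stable(sample)
print(passes, result)
```

On the sample from the first post this sketch stops with `[Math+ Biology+ Chemistry+ History+]`, so a stray `+` before the closing bracket is one of the holes that still needs tidying.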

      • Bahaa0013 @Alan Kilborn
        Feb 24, 2023, 1:47 PM

        @Alan-Kilborn
        Thank you, I guess it works…
        But I guess I have to run it at least 500 times to remove all the duplicated words xD

        No problem though, I will use it; it’s much easier. Thanks!

        • PeterJones @Bahaa0013
          Feb 24, 2023, 1:49 PM

          @Bahaa-Eddin-ツ said in How to remove duplicates words?:

          I guess I have to run it at least 500 times

          Record the search/replace as a macro, then use Macros > Run a Macro Multiple Times to run it 500 (or whatever is necessary).

          • Alan Kilborn @Bahaa0013
            Feb 24, 2023, 1:50 PM (last edited by Alan Kilborn, Feb 24, 2023, 1:51 PM)

            @Bahaa-Eddin-ツ said in How to remove duplicates words?:

            Thank you, I guess it works…

            Don’t guess…be sure…your data is important.

            I have to run it at least 500 times to remove all the duplicated words xD

            Hold down the keyboard accelerator for Replace All until the Replace window’s status bar indicates no more replacements were made?

            • Bahaa0013 @Alan Kilborn
              Feb 24, 2023, 2:23 PM

              @Alan-Kilborn
              I guess it doesn’t work as I wanted…
              because I didn’t give the right example.

              This is what I want.
              Example:

              [math part1 +Bilology part1+ biology part3+ History part1+ math part1+ Biology part3+ history part1]
              

              output:

              [Bilology part1+ History part1+ math part1+ Biology part3]
              
              • Alan Kilborn @Bahaa0013
                Feb 24, 2023, 2:27 PM

                @Bahaa-Eddin-ツ

                I’d say, start from my kickstart attempt, and go from there. Good luck.

                • guy038
                  Feb 25, 2023, 10:35 AM (last edited by guy038, Feb 25, 2023, 12:44 PM)

                  Hello, @Bahaa-Eddin-ツ, @alan-kilborn, @peterjones and All,

                  @Bahaa-Eddin-ツ, I suppose that you were already successful with the @alan-kilborn solution !

                  However, here is a solution which just needs one Replace All action !

                  • Open the Replace dialog ( Ctrl + H )

                  • Untick all box options

                  • SEARCH (?xi-s) (?: \[ | \+ ) \x20* ( [^+\r\n]+ ) (?= \x20* \+ .+ \1 )

                  • REPLACE Leave EMPTY

                  • Check the Wrap around option

                  • Select the Regular expression search mode

                  • Click once only on the Replace All button ( or several times on the Replace button )


                  So, for instance, from the INPUT text :

                  [math part1+ Biology part1+ biology part3+ History part1+ Test N°1+ math part1+ Biology part3+ history part1+ Biology part3+ Biology part1+ test number 2+ math part1+ History part1]
                  

                  You should get this OUTPUT text :

                  + Test N°1+ Biology part3+ Biology part1+ test number 2+ math part1+ History part1]
                  

                  Finally, restore the opening bracket at the start of each line with this simple regex search/replace :

                  SEARCH (?x) ^ \+ \x20*

                  REPLACE [
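The two steps above can be tried outside Notepad++ with a small Python sketch. This is an approximation under stated assumptions: Python's `re` replaces the editor's regex engine, the flags are passed as arguments instead of the inline `(?xi-s)` group, and `dedupe_items` is a hypothetical name.

```python
import re

# First search: free-spacing (x) and case-insensitive (i); the dot does not
# match line breaks because re.DOTALL is not set (the -s part).
DUPLICATE_ITEM = re.compile(
    r"""
    (?: \[ | \+ )           # an opening bracket or a plus separator
    \x20*                   # optional spaces before the item
    ( [^+\r\n]+ )           # group 1: the item text, up to the next plus
    (?= \x20* \+ .+ \1 )    # only if the same text shows up again later on the line
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Second search: a plus sign (and spaces) left at the start of a line.
LEADING_PLUS = re.compile(r"^\+\x20*", re.MULTILINE)


def dedupe_items(text: str) -> str:
    """Drop every item that reappears later, then restore the opening bracket."""
    without_duplicates = DUPLICATE_ITEM.sub("", text)
    return LEADING_PLUS.sub("[", without_duplicates)


sample = (
    "[math part1+ Biology part1+ biology part3+ History part1+ Test N°1+ "
    "math part1+ Biology part3+ history part1+ Biology part3+ Biology part1+ "
    "test number 2+ math part1+ History part1]"
)
print(dedupe_items(sample))
```

Because the lookahead only checks that the item text appears somewhere later on the line, an item such as `part1` would also count as a duplicate of `part10`; that is worth checking against real data before relying on it.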

                  Best Regards

                  guy038

                  • Alan Kilborn @guy038
                    Feb 25, 2023, 12:26 PM (last edited by Alan Kilborn, Feb 25, 2023, 12:53 PM)

                    @guy038 said:

                    SEARCH (?xi-s) (?: [ | + ) \x20* ( [^+\r\n]+ ) (?= \x20* + .+ \1 )

                    It looks suspiciously like the first [ is a victim of this site losing the leading escape??

                    • guy038
                      Feb 25, 2023, 12:52 PM (last edited by guy038, Feb 25, 2023, 1:03 PM)

                      Hello, @alan-kilborn and All,

                      Sorry for the confusion !

                      I have restored my search regex to its initial state.

                      And here is the right syntax that should be used :

                      • SEARCH (?xi-s) (?: \[ | \+ ) \x20* ( [^+\r\n]+ ) (?= \x20* \+ .+ \1 )

                      BR

                      guy038

                      So, Alan, you can delete the EDIT part of your last post !

                      • Alan Kilborn @guy038
                        Feb 25, 2023, 1:06 PM

                        @guy038 said in How to remove duplicates words?:

                        So, Alan, you can delete the EDIT part of your last post !

                        It ALREADY never happened! :-)

                        • guy038
                          Feb 25, 2023, 1:15 PM (last edited by guy038, Feb 25, 2023, 1:21 PM)

                          Hello, @alan-kilborn and All,

                          I’ve found out an interesting thing about posts which contain a literal [ character in search regexes :

                          \\[
                          

                          If you must edit one of these posts in order to change any other part, you’ll need to repeat the special modifications, regarding the regexes, by using, again, the syntax :

                          \\\[
                          

                          BR

                          guy038
