How to remove duplicates words?
-
@Bahaa-Eddin-ツ
It’s a bit of a tough one.
Perhaps as a starting point, try fiddling with this:
Find: (?-s)^(.*?\b(\w+)\b.+?) \2\+?
Replace: ${1}
Search mode: Regular expression
You’d have to run it several times, until no more replacements are made.
And I just tried it quickly, so I’m sure some holes can be shot into it. :-)
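For anyone who wants to try the "run it until nothing changes" idea outside Notepad++, here is a minimal Python sketch. It assumes Python's `re` module treats this pattern the same way as Notepad++'s regex engine. The function name `remove_repeats` and the `max_passes` limit are made up for the example, `\1` stands in for `${1}`, and the leading `(?-s)` is dropped because the dot does not match newlines by default in Python.

```python
import re

# Same pattern as above, minus the leading (?-s): in Python the dot
# does not match newlines unless re.DOTALL is set.
PATTERN = re.compile(r"^(.*?\b(\w+)\b.+?) \2\+?", re.MULTILINE)


def remove_repeats(text, max_passes=500):
    """Apply the replacement repeatedly until a pass changes nothing."""
    for _ in range(max_passes):
        # subn returns the new text and the number of replacements made
        text, count = PATTERN.subn(r"\1", text)
        if count == 0:
            break
    return text


print(remove_repeats("x y x z x"))  # x y z
```

One hole worth knowing about: the pattern needs at least one character between the two copies in addition to the separating space, so an immediate repeat such as `b b` is left alone.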
-
@Alan-Kilborn
Thank you, I guess it works…
But I guess I have to run it at least 500 times to remove all the duplicated words xD
But no problem, I will use it, it’s much easier. Thanks
-
@Bahaa-Eddin-ツ said in How to remove duplicates words?:
I guess I have to run it at least 500 times
Record the search/replace as a macro, then use Macros > Run a Macro Multiple Times to run it 500 (or whatever is necessary).
-
@Bahaa-Eddin-ツ said in How to remove duplicates words?:
Thank you, I guess it works…
Don’t guess…be sure…your data is important.
I have to run it at least 500 times to remove all the duplicated words xD
Hold down the keyboard accelerator for Replace All until the Replace window’s status bar indicates no more replacements were made?
-
@Alan-Kilborn
I guess it doesn’t work as I wanted…
because I didn’t give the right example. This is what I want:
example:[math part1 +Bilology part1+ biology part3+ History part1+ math part1+ Biology part3+ history part1]
output:
[Bilology part1+ History part1+ math part1+ Biology part3]
-
@Bahaa-Eddin-ツ
I’d say, start from my kickstart attempt, and go from there. Good luck.
-
Hello, @Bahaa-Eddin-ツ, @alan-kilborn, @peterjones and All,
@Bahaa-Eddin-ツ, I suppose that you were already successful with the @alan-kilborn solution !
However, here is a solution which just needs one Replace All action !
- Open the Replace dialog ( Ctrl + H )
- Untick all box options
- SEARCH (?xi-s) (?: \[ | \+ ) \x20* ( [^+\r\n]+ ) (?= \x20* \+ .+ \1 )
- REPLACE Leave EMPTY
- Check the Wrap around option
- Select the Regular expression search mode
- Click once only on the Replace All button ( or several times on the Replace button )
So, for instance, from the INPUT text :
[math part1+ Biology part1+ biology part3+ History part1+ Test N°1+ math part1+ Biology part3+ history part1+ Biology part3+ Biology part1+ test number 2+ math part1+ History part1]
You should get this OUTPUT text :
+ Test N°1+ Biology part3+ Biology part1+ test number 2+ math part1+ History part1]
Finally, just change the beginning of each section with this obvious regex S/R :
- SEARCH (?x) ^ \+ \x20*
- REPLACE [
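To check the two S/R steps above outside Notepad++, here is a rough Python sketch. It assumes Python's `re` module handles these patterns like Notepad++'s engine does. The names `DUPLICATE_ITEM`, `LEADING_PLUS` and `remove_duplicates` are invented for the example, and the `(?xi-s)` prefix is replaced by the `re.VERBOSE | re.IGNORECASE` flags, since Python does not accept a global `-s` and already leaves dot-matches-newline off by default.

```python
import re

# First S/R: drop an item when the same item (ignoring case) occurs
# again later on the same line.
DUPLICATE_ITEM = re.compile(
    r"""
    (?: \[ | \+ )           # opening bracket or "+" separator
    \x20*                   # optional spaces before the item
    ( [^+\r\n]+ )           # the item itself, stored in group 1
    (?= \x20* \+ .+ \1 )    # only if a later copy follows a "+"
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Second S/R: turn the leftover leading "+" back into "[".
LEADING_PLUS = re.compile(r"^\+\x20*", re.MULTILINE)


def remove_duplicates(text):
    """Keep only the last occurrence of each item, then restore the '['."""
    without_duplicates = DUPLICATE_ITEM.sub("", text)
    return LEADING_PLUS.sub("[", without_duplicates)


sample = (
    "[math part1+ Biology part1+ biology part3+ History part1+ Test N°1"
    "+ math part1+ Biology part3+ history part1+ Biology part3"
    "+ Biology part1+ test number 2+ math part1+ History part1]"
)
print(remove_duplicates(sample))
```

For the INPUT text above, this prints the OUTPUT text with its leading `+ ` replaced by `[`.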
Best Regards
guy038
-
@guy038 said:
SEARCH (?xi-s) (?: [ | + ) \x20* ( [^+\r\n]+ ) (?= \x20* + .+ \1 )
It looks suspiciously like the first [ is a victim of this site losing the leading escape??
-
Hello, @alan-kilborn and All,
Sorry for the confusion !
Thus, I restored the search regex of my previous post to its initial state.
And here is the syntax that has to be typed in the post editor so that the regex displays correctly :
- SEARCH (?xi-s) (?: \\[ | \+ ) \x20* ( [^+\r\n]+ ) (?= \x20* \+ .+ \1 )
BR
guy038
So, Alan, you can delete the EDIT part of your last post !
-
@guy038 said in How to remove duplicates words?:
So, Alan, you can delete the EDIT part of your last post !
It ALREADY never happened! :-)
-
Hello, @alan-kilborn and All,
I’ve found out an interesting thing about posts which contain a literal [ character in search regexes : \\[
If you must edit one of these posts in order to change any other part, you’ll need to repeat the special modifications to the regexes by using, again, the syntax : \\\[
BR
guy038