Delete all duplicate words in the text
-
Hello
Please tell me how to use regular expressions to remove all duplicates in the text?
Initial text:
FileName,Keywords
filename1.eps,tag1;tag2;tag3
filename2.eps,tag4;tag1;tag5
filename3.eps,tag6;tag2;tag9
filename3.eps,tag7;tag2;tag3;tag8
It should turn out:
filename1.eps,tag1;tag2;tag3
filename2.eps,tag4;tag5
filename3.eps,tag6;tag9
filename3.eps,tag7;tag8
-
Hello, @сергій-бородін and All,
Assuming this initial text :
filename1.eps,tag1;tag2;tag3
filename2.eps,tag4;tag1;tag5
filename3.eps,tag6;tag2;tag9
filename3.eps,tag7;tag2;tag3;tag8
filename4.eps,tag1;tag9;tag5
filename5.eps,tag4;tag6;tag10;tag12
filename5.eps,tag8;tag2;tag1;tag6;tag11
filename6.eps,tag3;tag2;tag3;tag10;tag14
filename7.eps,tag5;tag7;tag15
filename8.eps,tag4;tag5;tag15;tag16
filename8.eps,tag3;tag14;tag9;tag7
filename8.eps,tag7;tag2;tag3;tag8
filename9.eps,tag2;tag10;tag17
filename10.eps,tag5;tag1;tag13
filename10.eps,tag7;tag6;tag9;tag10
filename11.eps,tag7;tag2;tag3;tag8;tag18
filename11.eps,tag10;tag12;tag13;tag20
filename12.eps,tag4;tag8;tag3;tag19
filename13.eps,tag6;tag15;tag9;tag11
filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4
filename15.eps,tag0;tag9,tag20
If I follow your algorithm, I suppose that you expect the text below :
filename1.eps,tag1;tag2;tag3
filename2.eps,tag4;tag5
filename3.eps,tag6;tag9
filename3.eps,tag7;tag8
filename4.eps
filename5.eps,tag10;tag12
filename5.eps,tag11
filename6.eps,tag14
filename7.eps,tag15
filename8.eps,tag16
filename8.eps
filename8.eps
filename9.eps,tag17
filename10.eps,tag13
filename10.eps
filename11.eps,tag18
filename11.eps,tag20
filename12.eps,tag19
filename13.eps
filename14.eps
filename15.eps,tag0
If so, here is a road map to achieve such a task ! Let’s go :
- Move your caret to the beginning of the first line of your list
- Open the Column Editor ( Alt + C )
  - Select the option Number to Insert
  - Type in the value 1 in all the zones
  - Tick the Leading zeros option
  - Click on the OK button
- You should get :
01filename1.eps,tag1;tag2;tag3
02filename2.eps,tag4;tag1;tag5
03filename3.eps,tag6;tag2;tag9
04filename3.eps,tag7;tag2;tag3;tag8
05filename4.eps,tag1;tag9;tag5
06filename5.eps,tag4;tag6;tag10;tag12
07filename5.eps,tag8;tag2;tag1;tag6;tag11
08filename6.eps,tag3;tag2;tag3;tag10;tag14
09filename7.eps,tag5;tag7;tag15
10filename8.eps,tag4;tag5;tag15;tag16
11filename8.eps,tag3;tag14;tag9;tag7
12filename8.eps,tag7;tag2;tag3;tag8
13filename9.eps,tag2;tag10;tag17
14filename10.eps,tag5;tag1;tag13
15filename10.eps,tag7;tag6;tag9;tag10
16filename11.eps,tag7;tag2;tag3;tag8;tag18
17filename11.eps,tag10;tag12;tag13;tag20
18filename12.eps,tag4;tag8;tag3;tag19
19filename13.eps,tag6;tag15;tag9;tag11
20filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4
21filename15.eps,tag0;tag9,tag20
- Run the menu option Edit > Line Operations > Sort Lines Lexicographically Descending ( Not ascending ! )
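For readers who would rather script these two preparation steps, here is a minimal Python sketch of what the numbering and the descending sort achieve together. The function name `number_and_reverse` is hypothetical, and the padding width is simply assumed to be the number of digits of the last line number.

```python
def number_and_reverse(lines):
    """Prefix each line with a zero-padded counter, then sort descending."""
    width = len(str(len(lines)))  # enough digits for the last line number
    numbered = [f"{i:0{width}d}{line}" for i, line in enumerate(lines, start=1)]
    # With zero-padded prefixes, a plain descending string sort
    # puts the original list in reverse order.
    return sorted(numbered, reverse=True)


sample = [
    "filename1.eps,tag1;tag2;tag3",
    "filename2.eps,tag4;tag1;tag5",
    "filename3.eps,tag6;tag2;tag9",
]
for line in number_and_reverse(sample):
    print(line)
```

The zero padding matters: without it, a line numbered 9 would sort after a line numbered 10 in a lexicographic comparison.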
So :
21filename15.eps,tag0;tag9,tag20
20filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4
19filename13.eps,tag6;tag15;tag9;tag11
18filename12.eps,tag4;tag8;tag3;tag19
17filename11.eps,tag10;tag12;tag13;tag20
16filename11.eps,tag7;tag2;tag3;tag8;tag18
15filename10.eps,tag7;tag6;tag9;tag10
14filename10.eps,tag5;tag1;tag13
13filename9.eps,tag2;tag10;tag17
12filename8.eps,tag7;tag2;tag3;tag8
11filename8.eps,tag3;tag14;tag9;tag7
10filename8.eps,tag4;tag5;tag15;tag16
09filename7.eps,tag5;tag7;tag15
08filename6.eps,tag3;tag2;tag3;tag10;tag14
07filename5.eps,tag8;tag2;tag1;tag6;tag11
06filename5.eps,tag4;tag6;tag10;tag12
05filename4.eps,tag1;tag9;tag5
04filename3.eps,tag7;tag2;tag3;tag8
03filename3.eps,tag6;tag2;tag9
02filename2.eps,tag4;tag1;tag5
01filename1.eps,tag1;tag2;tag3
With this simple regex S/R, we change the whole list into a one-line list :
- Open the Replace dialog ( Ctrl + H )
  - SEARCH \R
  - REPLACE # ( any symbol, not used yet, can be chosen )
  - Select the Regular expression search mode
  - Click on the Replace All button
- We obtain the single line, below :
21filename15.eps,tag0;tag9,tag20#20filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4#19filename13.eps,tag6;tag15;tag9;tag11#18filename12.eps,tag4;tag8;tag3;tag19#17filename11.eps,tag10;tag12;tag13;tag20#16filename11.eps,tag7;tag2;tag3;tag8;tag18#15filename10.eps,tag7;tag6;tag9;tag10#14filename10.eps,tag5;tag1;tag13#13filename9.eps,tag2;tag10;tag17#12filename8.eps,tag7;tag2;tag3;tag8#11filename8.eps,tag3;tag14;tag9;tag7#10filename8.eps,tag4;tag5;tag15;tag16#09filename7.eps,tag5;tag7;tag15#08filename6.eps,tag3;tag2;tag3;tag10;tag14#07filename5.eps,tag8;tag2;tag1;tag6;tag11#06filename5.eps,tag4;tag6;tag10;tag12#05filename4.eps,tag1;tag9;tag5#04filename3.eps,tag7;tag2;tag3;tag8#03filename3.eps,tag6;tag2;tag9#02filename2.eps,tag4;tag1;tag5#01filename1.eps,tag1;tag2;tag3
- Now, here is the regex S/R, which deletes any duplicated tags :
  - SEARCH (?-is)[,;](\w+)(?=[,;#].*?[,;]\1([,;#]|\R|\z))
  - REPLACE Leave the zone EMPTY
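As a cross-check, the same search can be tried in Python. This is only a sketch: Python's `re` module has no `\R` and does not accept a leading `(?-is)`, so the pattern below is an adapted equivalent for a single joined line, not the exact Notepad++ expression. The name `drop_earlier_duplicates` is hypothetical, and `#` is assumed to be the joining symbol.

```python
import re

# Adapted from the SEARCH pattern above: a tag preceded by ',' or ';' is
# matched only if the same whole tag appears again further to the right.
DUPLICATE_TAG = re.compile(r"[,;](\w+)(?=[,;#].*?[,;]\1(?:[,;#]|$))")


def drop_earlier_duplicates(joined_line):
    """Delete every tag that occurs again later on the joined line."""
    return DUPLICATE_TAG.sub("", joined_line)


joined = (
    "3filename3.eps,tag6;tag2;tag9"
    "#2filename2.eps,tag4;tag1;tag5"
    "#1filename1.eps,tag1;tag2;tag3"
)
print(drop_earlier_duplicates(joined))
```

Because the list was reversed beforehand, "later on the joined line" means "earlier in the original list", which is why each tag survives only at its first original occurrence. The delimiter required after `\1` keeps `tag1` from being confused with `tag10`.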
- Your text is shortened as below :
21filename15.eps,tag0#20filename14.eps#19filename13.eps#18filename12.eps;tag19#17filename11.eps;tag20#16filename11.eps;tag18#15filename10.eps#14filename10.eps;tag13#13filename9.eps;tag17#12filename8.eps#11filename8.eps#10filename8.eps;tag16#09filename7.eps;tag15#08filename6.eps;tag14#07filename5.eps;tag11#06filename5.eps;tag10;tag12#05filename4.eps#04filename3.eps,tag7;tag8#03filename3.eps,tag6;tag9#02filename2.eps,tag4;tag5#01filename1.eps,tag1;tag2;tag3
- Then, we use this other regex S/R to change this single line back into a multi-line list :
  - SEARCH #
  - REPLACE \r\n ( or \n if your file is a Unix file )
- Giving :
21filename15.eps,tag0
20filename14.eps
19filename13.eps
18filename12.eps;tag19
17filename11.eps;tag20
16filename11.eps;tag18
15filename10.eps
14filename10.eps;tag13
13filename9.eps;tag17
12filename8.eps
11filename8.eps
10filename8.eps;tag16
09filename7.eps;tag15
08filename6.eps;tag14
07filename5.eps;tag11
06filename5.eps;tag10;tag12
05filename4.eps
04filename3.eps,tag7;tag8
03filename3.eps,tag6;tag9
02filename2.eps,tag4;tag5
01filename1.eps,tag1;tag2;tag3
- Run the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending
01filename1.eps,tag1;tag2;tag3
02filename2.eps,tag4;tag5
03filename3.eps,tag6;tag9
04filename3.eps,tag7;tag8
05filename4.eps
06filename5.eps;tag10;tag12
07filename5.eps;tag11
08filename6.eps;tag14
09filename7.eps;tag15
10filename8.eps;tag16
11filename8.eps
12filename8.eps
13filename9.eps;tag17
14filename10.eps;tag13
15filename10.eps
16filename11.eps;tag18
17filename11.eps;tag20
18filename12.eps;tag19
19filename13.eps
20filename14.eps
21filename15.eps,tag0
- Finally, the last regex S/R, below :
  - will get rid of the numbering, at the beginning of lines
  - will replace any semi-colon, right after the string .eps, with a comma
So :
- SEARCH ^\d+|(?<=eps)(;)
- REPLACE ?1,
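The replacement `?1,` is a conditional: it writes a comma when group 1 (the semi-colon) took part in the match, and nothing otherwise, so the leading digits are simply deleted. Python's `re.sub` has no such replacement syntax, so this sketch emulates it with a callback. The name `strip_numbering` is hypothetical.

```python
import re

# Same alternatives as the SEARCH above; MULTILINE lets ^ match at each line start.
FINAL = re.compile(r"^\d+|(?<=eps)(;)", re.MULTILINE)


def strip_numbering(text):
    """Remove leading line numbers; turn a ';' right after 'eps' into ','."""
    # Group 1 is set only when the second alternative (the semi-colon) matched.
    return FINAL.sub(lambda m: "," if m.group(1) else "", text)


print(strip_numbering("05filename4.eps\n06filename5.eps;tag10;tag12"))
```

Semi-colons between two tags are left alone, because the lookbehind only accepts a semi-colon that directly follows `eps`.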
- And, here is your final expected text ;-))
filename1.eps,tag1;tag2;tag3
filename2.eps,tag4;tag5
filename3.eps,tag6;tag9
filename3.eps,tag7;tag8
filename4.eps
filename5.eps,tag10;tag12
filename5.eps,tag11
filename6.eps,tag14
filename7.eps,tag15
filename8.eps,tag16
filename8.eps
filename8.eps
filename9.eps,tag17
filename10.eps,tag13
filename10.eps
filename11.eps,tag18
filename11.eps,tag20
filename12.eps,tag19
filename13.eps
filename14.eps
filename15.eps,tag0
Best Regards,
guy038
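Outside Notepad++, the expected text can also be produced without the numbering and sorting round trip. The sketch below is not the regex method described in this post but a plain first-occurrence filter built on a set; `keep_first_occurrences` is a hypothetical name, and tags are assumed to follow the file name and to be separated by `;` or `,`.

```python
import re


def keep_first_occurrences(lines):
    """Keep each tag only on the first line where it appears."""
    seen = set()
    result = []
    for line in lines:
        # First field is the file name; the others are tags split on ',' or ';'.
        name, *tags = re.split(r"[,;]", line)
        fresh = []
        for tag in tags:
            if tag not in seen:
                seen.add(tag)
                fresh.append(tag)
        # Surviving tags are re-joined with ';' after a single comma.
        result.append(name + ("," + ";".join(fresh) if fresh else ""))
    return result


sample = [
    "filename1.eps,tag1;tag2;tag3",
    "filename2.eps,tag4;tag1;tag5",
    "filename3.eps,tag6;tag2;tag9",
    "filename3.eps,tag7;tag2;tag3;tag8",
]
for line in keep_first_occurrences(sample):
    print(line)
```

On these four lines from the original question, the loop gives the four lines requested there. Note that it normalizes every tag separator to `;`, which the regex method does not do.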
-
-
Hi, @сергій-бородін and All,
Thinking back on your problem, here is a second method, which requires fewer steps but distributes the non-duplicated tags differently : each tag is kept at its last occurrence in the list, instead of its first !
So, assuming the same initial text, below :
filename1.eps,tag1;tag2;tag3
filename2.eps,tag4;tag1;tag5
filename3.eps,tag6;tag2;tag9
filename3.eps,tag7;tag2;tag3;tag8
filename4.eps,tag1;tag9;tag5
filename5.eps,tag4;tag6;tag10;tag12
filename5.eps,tag8;tag2;tag1;tag6;tag11
filename6.eps,tag3;tag2;tag3;tag10;tag14
filename7.eps,tag5;tag7;tag15
filename8.eps,tag4;tag5;tag15;tag16
filename8.eps,tag3;tag14;tag9;tag7
filename8.eps,tag7;tag2;tag3;tag8
filename9.eps,tag2;tag10;tag17
filename10.eps,tag5;tag1;tag13
filename10.eps,tag7;tag6;tag9;tag10
filename11.eps,tag7;tag2;tag3;tag8;tag18
filename11.eps,tag10;tag12;tag13;tag20
filename12.eps,tag4;tag8;tag3;tag19
filename13.eps,tag6;tag15;tag9;tag11
filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4
filename15.eps,tag0;tag9,tag20
First, this simple regex S/R changes the whole list into a one-line list :
- Open the Replace dialog ( Ctrl + H )
  - SEARCH \R
  - REPLACE # ( Any symbol, not used yet, can be chosen )
  - Select the Regular expression search mode
  - Click on the Replace All button
- Which gives the single line, below :
filename1.eps,tag1;tag2;tag3#filename2.eps,tag4;tag1;tag5#filename3.eps,tag6;tag2;tag9#filename3.eps,tag7;tag2;tag3;tag8#filename4.eps,tag1;tag9;tag5#filename5.eps,tag4;tag6;tag10;tag12#filename5.eps,tag8;tag2;tag1;tag6;tag11#filename6.eps,tag3;tag2;tag3;tag10;tag14#filename7.eps,tag5;tag7;tag15#filename8.eps,tag4;tag5;tag15;tag16#filename8.eps,tag3;tag14;tag9;tag7#filename8.eps,tag7;tag2;tag3;tag8#filename9.eps,tag2;tag10;tag17#filename10.eps,tag5;tag1;tag13#filename10.eps,tag7;tag6;tag9;tag10#filename11.eps,tag7;tag2;tag3;tag8;tag18#filename11.eps,tag10;tag12;tag13;tag20#filename12.eps,tag4;tag8;tag3;tag19#filename13.eps,tag6;tag15;tag9;tag11#filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4#filename15.eps,tag0;tag9,tag20
- Now, here is the regex S/R, which deletes any duplicated tag ( The same regex, described in my previous post ) :
  - SEARCH (?-is)[,;](\w+)(?=[,;#].*?[,;]\1([,;#]|\R|\z))
  - REPLACE Leave the zone EMPTY
- Your text should be shortened as below :
filename1.eps#filename2.eps#filename3.eps#filename3.eps#filename4.eps#filename5.eps#filename5.eps#filename6.eps#filename7.eps#filename8.eps;tag16#filename8.eps;tag14#filename8.eps#filename9.eps#filename10.eps,tag5;tag1#filename10.eps#filename11.eps;tag18#filename11.eps,tag10;tag13#filename12.eps;tag8;tag19#filename13.eps,tag6;tag15;tag11#filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4#filename15.eps,tag0;tag9,tag20
- Finally, this regex S/R, below :
  - Replaces any semi-colon, right after the string eps, with a comma
  - Replaces any # symbol with a line-break ( \r\n or \n )
- SEARCH eps;|(#)
- REPLACE ?1\r\n:eps, OR ?1\n:eps, if you work with a Unix file
And we obtain the final output :
filename1.eps
filename2.eps
filename3.eps
filename3.eps
filename4.eps
filename5.eps
filename5.eps
filename6.eps
filename7.eps
filename8.eps,tag16
filename8.eps,tag14
filename8.eps
filename9.eps
filename10.eps,tag5;tag1
filename10.eps
filename11.eps,tag18
filename11.eps,tag10;tag13
filename12.eps,tag8;tag19
filename13.eps,tag6;tag15;tag11
filename14.eps,tag7;tag2;tag3;tag17;tag12;tag4
filename15.eps,tag0;tag9,tag20
As you can see, the 21 non-duplicated tags ( From tag0 to tag20 ) are arranged differently, with many lines without any tag at the beginning of the list !
Best Regards,
guy038
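The three replacements of this second method can be chained in Python as a rough sketch. The duplicate-tag pattern is adapted rather than copied, since Python's `re` has no `\R` and does not accept a leading `(?-is)`, and the conditional replacement is emulated with a callback. `dedupe_keep_last` is a hypothetical name, and `#` is assumed not to occur in the data.

```python
import re

# A tag preceded by ',' or ';' that appears again, as a whole tag, further on.
DUPLICATE_TAG = re.compile(r"[,;](\w+)(?=[,;#].*?[,;]\1(?:[,;#]|$))")
# Either 'eps;' (no group set) or the '#' separator (group 1 set).
FINAL = re.compile(r"eps;|(#)")


def dedupe_keep_last(lines):
    """Join the lines, delete tags repeated further on, then split again."""
    joined = "#".join(lines)                # step 1: line-breaks become '#'
    joined = DUPLICATE_TAG.sub("", joined)  # step 2: drop earlier duplicates
    # step 3: '#' becomes a line-break, 'eps;' becomes 'eps,'
    restored = FINAL.sub(lambda m: "\n" if m.group(1) else "eps,", joined)
    return restored.split("\n")


sample = [
    "filename1.eps,tag1;tag2;tag3",
    "filename2.eps,tag4;tag1;tag5",
    "filename3.eps,tag6;tag2;tag9",
    "filename3.eps,tag7;tag2;tag3;tag8",
]
for line in dedupe_keep_last(sample):
    print(line)
```

On this short sample, the first line loses all its tags because each of them reappears further down, which mirrors the tag-less lines at the top of the final output above.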