Hello, @カヒノビチアレクセイ
Not very difficult, indeed !
If you don’t mind your unique CJK characters ending up sorted, here is a very quick way to achieve it :-))
First of all, just back up your original list ( A safe habit to adopt, in any case ! )
Now, let’s suppose you have the following list of CJK characters. After each character, I just added a space and its Unicode code-point
丰 4E30
不 4E0D
丆 4E06
与 4E0E
不 4E0D
丰 4E30
且 4E14
世 4E16
中 4E2D
且 4E14
与 4E0E
丰 4E30
丟 4E1F
中 4E2D
与 4E0E
中 4E2D
丆 4E06
丰 4E30
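For anyone who wants to rebuild such an annotated list outside Notepad++, here is a minimal Python sketch. The `annotate` helper is a hypothetical name, not something from this post; it just appends each character's code-point in uppercase hexadecimal.

```python
def annotate(chars):
    """Return one 'character code-point' line per character, e.g. '中 4E2D'."""
    # ord() gives the Unicode code-point; 04X formats it as at least
    # four uppercase hexadecimal digits.
    return [f"{ch} {ord(ch):04X}" for ch in chars]

print("\n".join(annotate("丰不丆与")))
```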
First, perform a classical sort, with the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending. We immediately get the sorted text, below :
丆 4E06
丆 4E06
不 4E0D
不 4E0D
与 4E0E
与 4E0E
与 4E0E
且 4E14
且 4E14
世 4E16
丟 4E1F
中 4E2D
中 4E2D
中 4E2D
丰 4E30
丰 4E30
丰 4E30
丰 4E30
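The same sort can be sketched in Python, as a rough equivalent only. It assumes that ordering lines by code-point gives the same result as the Notepad++ menu option, which holds for this sample but is not guaranteed for other data. The variable names are illustrative.

```python
# The unsorted sample list from this post, one "character code-point" per line.
original = """丰 4E30
不 4E0D
丆 4E06
与 4E0E
不 4E0D
丰 4E30
且 4E14
世 4E16
中 4E2D
且 4E14
与 4E0E
丰 4E30
丟 4E1F
中 4E2D
与 4E0E
中 4E2D
丆 4E06
丰 4E30
"""

# sorted() compares strings code-point by code-point, so identical lines
# end up next to each other, which is all the later regex step needs.
sorted_lines = sorted(original.splitlines())
print("\n".join(sorted_lines))
```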
Now :
Move back to the very beginning of your file ( Ctrl + Home )
Open the Replace dialog ( Ctrl + H )
In the Find what: zone, paste or type the regex (?-s)^(.+\R)\1+
Leave the Replace with: zone EMPTY
Select the Regular expression search mode
Click on the Replace All button
=> You should get only the two lines below :
世 4E16
丟 4E1F
Et voilà !! Only the two unique characters of the original list remain :-))
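To check the Replace All step outside Notepad++, here is a small Python sketch of the same replacement on the sorted list. It is an approximation, not the Notepad++ engine: Python's `re` module has no `\R`, so `\n` stands in for the line-break, and `(?m)` is added so that `^` matches at the start of every line. The `runs` table is just a compact, illustrative way to write the sorted sample.

```python
import re

# The sorted list from this post, written compactly as (line, count) pairs.
runs = [("丆 4E06", 2), ("不 4E0D", 2), ("与 4E0E", 3), ("且 4E14", 2),
        ("世 4E16", 1), ("丟 4E1F", 1), ("中 4E2D", 3), ("丰 4E30", 4)]
sorted_text = "".join((line + "\n") * count for line, count in runs)

# \n replaces \R here; by default the dot does not match \n in Python,
# which plays the role of the (?-s) modifier.
unique_only = re.sub(r"(?m)^(.+\n)\1+", "", sorted_text)
print(unique_only, end="")
```

With this sample, only the `世 4E16` and `丟 4E1F` lines are left in `unique_only`.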
Notes :
The first part (?-s) is a modifier which means that any dot matches a single standard character, and not EOL characters
Then, the ^ symbol is a zero-length assertion, which matches the beginning of a line
Now, the part (.+\R) represents a non-empty range of consecutive standard characters, followed by the EOL character(s) of that line. As the current complete line is enclosed in parentheses, it’s stored as group 1
Finally, the part \1+ is a repeated back-reference to group 1, which looks for any non-empty range of consecutive lines, identical to the first one !
As the replacement zone is EMPTY, every run of two or more identical lines is simply deleted, first occurrence included !
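The notes above can be illustrated with a commented version of the regex in Python. This is only a sketch under a few assumptions: `\n` replaces `\R` (which Python's `re` lacks), the `MULTILINE` flag is needed for `^` to match at each line start, and the names `DUPLICATE_RUN`, `sample` and `found` are made up for the example.

```python
import re

# Verbose form of the regex; \n replaces \R, which Python's re lacks.
DUPLICATE_RUN = re.compile(r"""
    ^          # zero-length assertion: beginning of a line
    (.+\n)     # group 1: one complete non-empty line, line-break included
    \1+        # the same line again, one or more times
""", re.VERBOSE | re.MULTILINE)

sample = "与 4E0E\n与 4E0E\n与 4E0E\n世 4E16\n中 4E2D\n中 4E2D\n"

# Each match spans a whole run of identical lines, first occurrence included.
# For every run, record the repeated line and how many lines the match covers.
found = [(m.group(1).rstrip("\n"), m.group(0).count("\n"))
         for m in DUPLICATE_RUN.finditer(sample)]
print(found)

# Without a final line-break, the last line cannot match group 1 again,
# so this two-line run is left untouched.
no_final_eol = DUPLICATE_RUN.sub("", "中 4E2D\n中 4E2D")
print(repr(no_final_eol))
```

Here the single `世 4E16` line is never matched, so it survives the replacement. The last two statements show a caveat: if the file does not end with a line-break, the final line is not seen as a repeat. The same should hold for the `\R` version in Notepad++, so it is safer to make sure the last line ends with a line-break before running the replacement.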
Best Regards,
guy038