question about compare with additional special chars and wildcard
-
Hello,
I have two text documents that I would like to synchronize, but I have no idea how.
The files are called exist.txt and download.txt. exist.txt contains folder names, one per line, and I would like to match them line by line against download.txt. The conditions are that the match must use wildcards, with the ASCII character \x02 before and after the folder name.
as an example
content of exist.txt
my.folder1
my.folder2
my.folder3
content of download.txt
anything\x02my.folder1\x02whatever
dream\x02my.folder2\x02country
If a match is found, the matching line should be removed from download.txt.
I would be happy to receive ideas or tips. Thanks in advance.
-
Hello, @daniel-b-0 and All,
Not difficult with regexes ! Just follow the road map below :
-
First, rename your download.txt file as download_SVG.txt
-
Open your two files exist.txt and download_SVG.txt in Notepad++
-
Now, open a new file in Notepad++
-
Append the contents of your download_SVG.txt file in this new file
-
Then, at the very end of the new file, append a line of some equal signs
-
Finally, append the contents of your exist.txt file, right below the line of equal signs
-
Save this new file as download.txt
Thus, for example, your new download.txt file would temporarily look like below :
anythingmy.folder1whatever
dreammy.folder2country
dreammy.folder3
anythingmy.folder4whatever
anythingmy.folder5whatever
=====================================
my.folder1
my.folder3
my.folder5
-
Open the Replace dialog ( Ctrl + H )
-
SEARCH (?-si)^.+?\x02(.+)\x02.*\R(?=(?s).+?\1)|(?s)^=+.+
-
REPLACE Leave EMPTY
-
Check the Wrap around option
-
Select the Regular expression mode
-
Click on the Replace All button
=> Here you are : all the lines whose folder is present twice in the file are deleted, along with the line of equal signs and everything after it. So only the folders not downloaded yet remain :
dreammy.folder2country
anythingmy.folder4whatever
-
Re-save your final download.txt file
Maybe, when you said :
… and ascii \x02 before and after.
you meant the literal four-character string \x02 rather than the control character.
In that case, the S/R above must be changed to :
-
SEARCH (?-si)^.+?\\x02(.+)\\x02.*\R(?=(?s).+?\1)|(?s)^=+.+
-
REPLACE Leave EMPTY
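As a cross-check outside Notepad++, the same filtering can be sketched in a few lines of Python. This is only a minimal sketch, not part of the method above: it assumes the delimiter is the real STX control character, takes the zone between the first and the last STX of each line, and compares it exactly with the names of exist.txt (the regex, by contrast, accepts the folder name anywhere later in the file). The function name remove_existing and the sample data are made up for illustration.

```python
STX = "\x02"  # assumption: the delimiter is the real control character, not the text "\x02"

def remove_existing(download_lines, exist_names):
    """Keep only the download lines whose STX-delimited zone is NOT listed in exist_names."""
    existing = {name.strip() for name in exist_names if name.strip()}
    kept = []
    for line in download_lines:
        first, last = line.find(STX), line.rfind(STX)
        # A usable line looks like: prefix STX zone STX suffix
        zone = line[first + 1:last] if first != -1 and last > first + 1 else None
        if zone is None or zone not in existing:
            kept.append(line)
    return kept

# Hypothetical sample data, mirroring the example above
download = [
    "anything\x02my.folder1\x02whatever",
    "dream\x02my.folder2\x02country",
    "dream\x02my.folder3\x02",
    "anything\x02my.folder4\x02whatever",
    "anything\x02my.folder5\x02whatever",
]
exist = ["my.folder1", "my.folder3", "my.folder5"]

result = remove_existing(download, exist)
for line in result:
    print(repr(line))
```

With real files, the two lists would simply come from reading exist.txt and download.txt line by line, without their line endings.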
Best Regards
guy038
-
-
thank you very much! @guy038 i am really amazed that regex can be so versatile. it does exactly what it is supposed to do!
-
Hi, @daniel-b-0,
Just for info :
Did you mean the C0 control code \x02 ( STX ) or the literal string \x02 ?
BR
guy038
-
Hi, @guy038,
It was about the control code, and your solution works very well! Unfortunately, Notepad++ is very, very slow with more than 4000 lines.
BR
Daniel
-
Hi, @daniel-b-0 and All,
Last UPDATED on 2024/05/22 : In the first version of this post, I showed some real names of my personal photos. On reflection, I decided, for confidentiality, to change them and only show non-personal data !!
I understand that my method cannot be used safely with large files. So, I’m going to present a second method which should work in all cases !
I tested this new method with real data : a USB key of mine, containing 8,186 photos, collected over a period from 2004 to 2023 ( Don’t worry, these photos are also stored on two external hard drives. In all circumstances, we must imitate Mother Nature, which uses RNA to code proteins and, NEVER, DNA itself for this purpose !! )
The general organisation of my USB drive is :
G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \01.jpg
G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \02.jpg
G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03.jpg
G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03_ORG.jpg
G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \04.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\01.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\02.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\03.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\04.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\05.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\06.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\07.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\08.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\09.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\10.jpg
G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\01.jpg
G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\02.jpg
G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\03.jpg
G:\_PHOTOS\2005\08_22_xxxx xxxxxx\01.jpg
G:\_PHOTOS\2006\01_07_xxxxxxx xxxxxxxxxxx\01.jpg
...
...
...
G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
G:\_PHOTOS\2023\10_08xxxxx xxxxx xxxxxxxxxxxx\01.jpg
G:\_PHOTOS\2023\10_22_xxxxx_xxxxx_xxxxx\01.jpg
G:\_PHOTOS\2023\12_02_xxxx_xxxxxx_xxxxxx\01.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\01.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\02.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\03.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\04.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\05.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\06.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\07.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\08.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\09.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\10.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\11.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\12.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\13.jpg
G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\01.jpg
G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\02.jpg
G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\03.jpg
G:\_PHOTOS\2023\12_31_xxxxxx - xxxxxxxx\01.jpg
So, sorted by year, then by motif ( month_day[-day]_location_reason or, sometimes, month_day[-day]_reason_location ) and finally by photo number, with, sometimes, the initial of the person who took the photo ( -A for Annie, my sister, -X for unknown, etc. )
In order to mimic your download.txt file, I placed the \x02 delimiters right after the G:\_PHOTOS\ part and right before the \xx.jpg part, giving this format :
G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \01.jpg
G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \02.jpg
G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03.jpg
G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03_ORG.jpg
G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \04.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\01.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\02.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\03.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\04.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\05.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\06.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\07.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\08.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\09.jpg
G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\10.jpg
G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\01.jpg
G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\02.jpg
G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\03.jpg
G:\_PHOTOS\2005\08_22_xxxx xxxxxx\01.jpg
G:\_PHOTOS\2006\01_07_xxxxxxx xxxxxxxxxxx\01.jpg
...
...
...
G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
G:\_PHOTOS\2023\10_08xxxxx xxxxx xxxxxxxxxxxx\01.jpg
G:\_PHOTOS\2023\10_22_xxxxx_xxxxx_xxxxx\01.jpg
G:\_PHOTOS\2023\12_02_xxxx_xxxxxx_xxxxxx\01.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\01.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\02.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\03.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\04.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\05.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\06.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\07.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\08.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\09.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\10.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\11.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\12.jpg
G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\13.jpg
G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\01.jpg
G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\02.jpg
G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\03.jpg
G:\_PHOTOS\2023\12_31_xxxxxx - xxxxxxxx\01.jpg
In this way, we are sure that the zones between delimiters are unique, even when the same folder name occurs under two different years. For instance :
G:\_PHOTOS\2010\00_abcde_fghij\01.jpg
...
...
G:\_PHOTOS\2011\00_abcde_fghij\01.jpg
Then, I randomized this file, using the N++ option :
Edit > Line Operations > Sort Lines Randomly
So my download.txt file looks like :
G:\_PHOTOS\2014\08_01_xxxxxxxx xxxxxxxxxxxx\009_G.jpg
G:\_PHOTOS\2010\03_06_SKI_xxxxxxxxxx-xxxxxxx\14.jpg
G:\_PHOTOS\2011\01_15_SKI_xxxxxxxxx-xxxxxxx\06.jpg
G:\_PHOTOS\2014\02_21-22_xxxxxxxxxx_xxxxxxxxxx xxxxxx\07.jpg
G:\_PHOTOS\2012\08_07-22_xxxxxxxx xxxxxxxxx\034_X.jpg
G:\_PHOTOS\2010\05_29_xxxxxxxxx xxxxxxx_xxxxxxxx\14.jpg
...
...
...
G:\_PHOTOS\2014\09_13_xxxxxxxxxx_xxxxxxxxxx\023.jpg
G:\_PHOTOS\2017\08_10-28_xx xxxx\013.jpg
G:\_PHOTOS\2010\10_30-31_xxxxxx_xxxxxxxxxxxx xxxxx\076_X.jpg
G:\_PHOTOS\2022\07_13-08_27_xx_xxxx\099_A.jpg
G:\_PHOTOS\2016\03_05-07_SKI_xxxxxxxxxxxx\006.jpg
G:\_PHOTOS\2014\03_24_SKI_xxxxxxx-xxxxxxxx\44.jpg
Secondly, I created an exist.txt file, made of all the different zones between the STX delimiters. I obtained a file of 366 lines, of which I randomly deleted 45, giving a final exist.txt file with 321 lines. So, at the end of the new method, we should get a file of all the lines containing one of the 45 missing zones !
Important :
-
For a correct result, you must use the latest v8.6.5 version of Notepad++, which improves the multi-selection process !
-
In all the search/replacements listed below :
-
The Wrap around option is checked
-
The Regular expression search mode is checked
-
All the other options are unchecked
-
Let’s go :
-
First, copy your download.txt file as mark.txt
-
Open the mark.txt file in N++
-
Open the Replace dialog ( Ctrl + H )
-
SEARCH (?-s)^.*\x02(.+)\x02.*
-
REPLACE $1
-
Click on the Replace All button
=> We just keep the zones between delimiters
-
Now, use the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending
-
Re-open the Replace dialog ( Ctrl + H )
-
SEARCH (?-s)^(.+\R)\K\1+
-
REPLACE Leave EMPTY
-
Click on the Replace All button
=> The duplicate lines are deleted and your mark.txt file should have shrunk drastically ! In my case, I got a mark.txt file with only 366 different lines
-
Then, append your exist.txt at the end of the mark.txt file. In my case, the file contains 366 + 321, so 687 lines
-
Again, use the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending
-
Re-open the Replace dialog ( Ctrl + H )
-
SEARCH (?-s)^(.+\R)\1
-
REPLACE Leave EMPTY
-
Click on the Replace All button
=> The mark.txt file should have shrunk and now contains only the zones which require downloading. In my case, it contains, as expected, 45 lines / zones !
-
If the last line of the mark.txt file ends with an EOL, delete the EOL characters of this last line
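The extract / sort / de-duplicate / remove-pairs sequence above amounts to a set difference, which can be sketched in Python as follows. This is a hedged illustration, not the N++ procedure itself: the function name missing_zones and the sample lines are hypothetical, and the sketch assumes every name of exist.txt also occurs in download.txt (the N++ steps remove every pair of identical lines, which gives the same result under that assumption).

```python
STX = "\x02"  # assumption: real STX control character as delimiter

def missing_zones(download_lines, exist_names):
    """Return the sorted zones found in download_lines that are absent from exist_names."""
    zones = set()  # a set drops duplicate zones, like the sort + (.+\R)\K\1+ step
    for line in download_lines:
        first, last = line.find(STX), line.rfind(STX)
        if first != -1 and last > first + 1:
            zones.add(line[first + 1:last])
    existing = {name.strip() for name in exist_names if name.strip()}
    return sorted(zones - existing)

# Hypothetical sample data
demo_download = [
    "a\x02zone1\x02b",
    "c\x02zone2\x02d",
    "e\x02zone1\x02f",
    "g\x02zone3\x02h",
]
demo_exist = ["zone2"]

todo = missing_zones(demo_download, demo_exist)
print(todo)
```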
Note :
-
If all or some lines contain sub-folders, you’ll have to replace any \ character with the literal \\ string
-
Now, on column 1, do a zero-length COLUMN selection of all the lines ( indication N × 0 in the status bar )
-
Type in a | pipe character
-
Hit the Home key
-
Hit the Backspace key
=> The file is changed into a one-line file
-
Hit the Home key, again
-
Delete the first | character
-
Finally, save the mark.txt file, now a single-line file
Remark :
- If the entire line contains more than 2,000 characters, split this long line into parts, right before a | char, and delete any | remaining at the beginning and/or end of the lines
For example :
abc|def|.......................|uvw|xyz
01|23|.........................|67|89
Of course, in this case, you'll have to REPEAT the MARK operation, described below, for each CREATED line
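The escaping, joining with | and splitting into parts described above can be sketched in Python like this. It is only an illustration under stated assumptions: build_alternations is a hypothetical helper, the 2,000-character limit is passed as a parameter, and re.escape is used for the escaping, which escapes more than the \ character (dots and spaces too). The resulting strings are therefore meant for Python's re module, not necessarily for pasting back into the N++ dialog.

```python
import re

def build_alternations(zones, max_len=2000):
    """Join escaped zones with '|', starting a new part whenever max_len would be exceeded."""
    parts, current = [], ""
    for zone in zones:
        escaped = re.escape(zone)  # turns \ into \\ and also escapes other regex metacharacters
        candidate = escaped if not current else current + "|" + escaped
        if current and len(candidate) > max_len:
            parts.append(current)  # split right before a '|', so no part starts or ends with it
            current = escaped
        else:
            current = candidate
    if current:
        parts.append(current)
    return parts

# Hypothetical zones, with a deliberately tiny limit to show the split
demo_parts = build_alternations(["ab", "cd", "ef"], max_len=5)
print(demo_parts)
```

Each returned part plays the role of one line of the mark.txt file.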
-
Now, copy your download.txt file as to_do.txt
-
Switch to the mark.txt tab, containing, most of the time, just a single line
-
Select all the text ( Ctrl + A )
-
Open the Mark dialog ( Ctrl + M )
=> The text should be automatically inserted in the dialog
-
Check the Bookmark line and Purge for each search options ( IMPORTANT )
-
Switch back to the to_do.txt tab
-
Click on the Mark All button
=> Message of the dialog Mark: xxx matches in entire file ( 876, in my case )
-
In the Bookmark margin, select, with the right-click button, the option Remove Unmarked Lines or use the menu option Search > Bookmark > Remove Unmarked Lines
-
Click on the Clear all marks button of the Mark dialog
-
Finally, save the to_do.txt file
=> You should get all the files that require downloading. In my theoretical case, from the 45 zones to take into account, I got a list of 876 files / lines to “download” ;-))
Best Regards,
guy038
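For readers who prefer a script, the Mark All / Remove Unmarked Lines step of this second method can be approximated in Python. This is a rough sketch only: lines_to_download is a hypothetical name, the sample paths are invented, and, like the Mark operation, it keeps a line as soon as one of the missing zones occurs anywhere in it.

```python
import re

STX = "\x02"  # assumption: real STX control character as delimiter

def lines_to_download(download_lines, missing):
    """Keep the lines that contain at least one of the missing zones."""
    if not missing:
        return []  # an empty alternation would otherwise match every line
    pattern = re.compile("|".join(re.escape(zone) for zone in missing))
    return [line for line in download_lines if pattern.search(line)]

# Hypothetical sample data: prefix STX zone STX suffix
demo_lines = [
    "G:\\_PHOTOS\\" + STX + "2010\\ski" + STX + "\\01.jpg",
    "G:\\_PHOTOS\\" + STX + "2011\\ski" + STX + "\\01.jpg",
    "G:\\_PHOTOS\\" + STX + "2010\\ski" + STX + "\\02.jpg",
]

to_do = lines_to_download(demo_lines, ["2010\\ski"])
print(len(to_do))
```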
P.S. :
Here’s a tip to sum a list of numbers :
-
Do a multi-column selection of all these numbers, located anywhere in your current file
-
Paste them in a new tab
-
Do a zero-length COLUMN selection of all these numbers
-
Hit the + sign
-
Hit the Home key
-
Hit the Backspace key
-
Hit the End key
-
Insert the = sign
-
Copy all contents of this single line ( Ctrl + C )
-
Open calc.exe
-
Paste the contents of the clipboard ( Ctrl + V )
=> Here you are : the Windows calculator should show you the total of your list of numbers ;-)) No possibility of errors and quick result !
You may even sum numbers in other bases !
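The same total can be checked with a couple of lines of Python. This is a minimal sketch with hypothetical helper names: as_expression rebuilds the single line that would be pasted into the calculator, and total adds the numbers directly, with an optional base.

```python
def as_expression(numbers):
    """Build the one-line expression, e.g. '366+321=', from a list of number strings."""
    return "+".join(numbers) + "="

def total(numbers, base=10):
    """Sum number strings written in the given base."""
    return sum(int(text, base) for text in numbers)

# Sample values taken from the line counts mentioned earlier in the thread
counts = ["366", "321"]
print(as_expression(counts))
print(total(counts))
print(total(["ff", "1"], base=16))  # hexadecimal input, decimal result
```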
-