question about compare with additional special chars and wildcard
-
Hello,
I have two text files that I would like to synchronize, but I have no idea how.
The files are called exist.txt and download.txt. exist.txt contains folder names, one per line. I would like to match them, line by line, against download.txt. The conditions are that the match must allow wildcards (any text around the name) and that the name must have the ASCII character \x02 directly before and after it.
As an example:
Content of exist.txt:
my.folder1
my.folder2
my.folder3
Content of download.txt:
anything\x02my.folder1\x02whatever
dream\x02my.folder2\x02country
If a match is found, the matching line should be removed from download.txt.
I would be happy to receive ideas or tips. Thanks in advance.
-
Hello, @daniel-b-0 and All,
Not difficult with regexes ! Just follow the road map below :
-
First, rename your `download.txt` file as `download_SVG.txt`
-
Open your two files `exist.txt` and `download_SVG.txt` in Notepad++
-
Now, open a new file in Notepad++
-
Append the contents of your `download_SVG.txt` file in this new file
-
Then, at the very end of the new file, append a line of some `equal` signs
-
Finally, append the contents of your `exist.txt` file, right below the line of `equal` signs
-
Save this new file as `download.txt`
Thus, for example, your new `download.txt` file would temporarily look like below ( `\x02` stands for the STX control character ) :

    anything\x02my.folder1\x02whatever
    dream\x02my.folder2\x02country
    dream\x02my.folder3\x02
    anything\x02my.folder4\x02whatever
    anything\x02my.folder5\x02whatever
    =====================================
    my.folder1
    my.folder3
    my.folder5

-
Open the Replace dialog ( `Ctrl + H` )
-
SEARCH `(?-si)^.+?\x02(.+)\x02.*\R(?=(?s).+?\1)|(?s)^=+.+`
-
REPLACE `Leave EMPTY`
-
Check the `Wrap around` option
-
Select the `Regular expression` mode
-
Click on the `Replace All` button
=> Here you are : all the lines whose folder was present twice in the file are deleted. So only the folders not downloaded yet remain :

    dream\x02my.folder2\x02country
    anything\x02my.folder4\x02whatever

-
Re-save your final `download.txt` file
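For readers who prefer to check the logic outside Notepad++, here is a minimal Python sketch of the same single-regex idea. It is an adaptation, not the exact expression above : Python's `re` module has no `\R` and no unscoped `(?-si)` toggle, so `\n` and scoped `(?s:...)` groups are used instead. The sample text is hypothetical and simply mirrors the example above.

```python
import re

STX = "\x02"  # the delimiter, as a real control character

# Hypothetical combined file: download lines, a line of "=" signs, then exist.txt
combined = (
    f"anything{STX}my.folder1{STX}whatever\n"
    f"dream{STX}my.folder2{STX}country\n"
    f"dream{STX}my.folder3{STX}\n"
    f"anything{STX}my.folder4{STX}whatever\n"
    f"anything{STX}my.folder5{STX}whatever\n"
    "=====================================\n"
    "my.folder1\nmy.folder3\nmy.folder5\n"
)

# First alternative: a delimited line whose zone (group 1) occurs again further down.
# Second alternative: everything from the line of "=" signs to the end of the text.
pattern = re.compile(
    r"^.+?\x02(.+)\x02.*\n(?=(?s:.+?)\1)|(?s:^=+.+)",
    re.MULTILINE,
)

remaining = pattern.sub("", combined)
print(repr(remaining))
```

Like the original expression, the lookahead searches for the zone anywhere in the rest of the text, so a zone such as `my.folder1` would also be "found" inside `my.folder10`. The lookahead rescans the rest of the text for every line, which probably explains why this approach gets slow on big files.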
Maybe, when you said :
… and ascii \x02 before and after.
you spoke about the true literal expression `\x02`. In that case, the S/R above must be changed as :
-
SEARCH `(?-si)^.+?\\x02(.+)\\x02.*\R(?=(?s).+?\1)|(?s)^=+.+`
-
REPLACE `Leave EMPTY`
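The difference between the two readings of `\x02` can be illustrated with a short Python sketch (the sample lines are hypothetical). In a regex, `\x02` matches the single control character, while `\\x02` matches the four visible characters backslash, `x`, `0`, `2` :

```python
import re

control_line = "anything\x02my.folder1\x02whatever"   # real STX control characters
literal_line = r"anything\x02my.folder1\x02whatever"  # the text \x02, typed as 4 characters

control_re = re.compile(r"^.+?\x02(.+)\x02.*$")    # \x02   = the control character
literal_re = re.compile(r"^.+?\\x02(.+)\\x02.*$")  # \\x02  = a literal backslash + "x02"

print(control_re.match(control_line).group(1))  # zone found between control characters
print(literal_re.match(literal_line).group(1))  # zone found between literal "\x02" texts
print(control_re.match(literal_line))           # no match: wrong kind of delimiter
```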
Best Regards
guy038
-
-
Thank you very much, @guy038! I am really amazed that regex can be so versatile. It does exactly what it is supposed to do!
-
Hi, @daniel-b-0,
Just for info :
Did you speak about the C0 control code `\x02` or about the literal expression `\x02` ?
BR
guy038
-
Hi, @guy038,
It was about the control code, and your solution works very well! Unfortunately, Notepad++ is very, very slow with more than 4000 lines.
BR
Daniel
-
Hi, @daniel-b-0 and All,
Last UPDATED on 2024/05/22 : In the first version of this post, I exposed some real names of my personal photos. After reflection, I decided, for confidentiality, to change them and only show non-personal data !!
I understand that my method cannot be used safely with large files. So, I’m going to describe a second method which should work in all cases !
I tested this new method with real data : a USB key of mine, containing `8,186` photos, collected over a period from `2004` to `2023` ( Don’t worry, these photos are also stored on two external hard drives. In all circumstances, we must imitate Mother Nature, which uses RNA to code proteins and, NEVER, DNA itself for this purpose !! )
The general organisation of my USB drive is :
    G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \01.jpg
    G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \02.jpg
    G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03.jpg
    G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03_ORG.jpg
    G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \04.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\01.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\02.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\03.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\04.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\05.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\06.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\07.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\08.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\09.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\10.jpg
    G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\01.jpg
    G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\02.jpg
    G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\03.jpg
    G:\_PHOTOS\2005\08_22_xxxx xxxxxx\01.jpg
    G:\_PHOTOS\2006\01_07_xxxxxxx xxxxxxxxxxx\01.jpg
    ...
    ...
    ...
    G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
    G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
    G:\_PHOTOS\2023\10_08xxxxx xxxxx xxxxxxxxxxxx\01.jpg
    G:\_PHOTOS\2023\10_22_xxxxx_xxxxx_xxxxx\01.jpg
    G:\_PHOTOS\2023\12_02_xxxx_xxxxxx_xxxxxx\01.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\01.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\02.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\03.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\04.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\05.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\06.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\07.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\08.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\09.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\10.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\11.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\12.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\13.jpg
    G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\01.jpg
    G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\02.jpg
    G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\03.jpg
    G:\_PHOTOS\2023\12_31_xxxxxx - xxxxxxxx\01.jpg

So, sorted by year, then by motif ( `month_day[-day]_location_reason` or, sometimes, `month_day[-day]_reason_location` ) and finally by photo number, with, sometimes, the initial of the person who took the photo ( -A for Annie, my sister, -X for unknown, etc. )

In order to mimic your
`download.txt` file, I placed the `\x02` delimiters right after the `G:\_PHOTOS\` part and right before the `\xx.jpg` part, giving this format ( the STX control characters themselves do not show in the listing ) :

    G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \01.jpg
    G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \02.jpg
    G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03.jpg
    G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03_ORG.jpg
    G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \04.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\01.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\02.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\03.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\04.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\05.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\06.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\07.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\08.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\09.jpg
    G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\10.jpg
    G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\01.jpg
    G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\02.jpg
    G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\03.jpg
    G:\_PHOTOS\2005\08_22_xxxx xxxxxx\01.jpg
    G:\_PHOTOS\2006\01_07_xxxxxxx xxxxxxxxxxx\01.jpg
    ...
    ...
    ...
    G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
    G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
    G:\_PHOTOS\2023\10_08xxxxx xxxxx xxxxxxxxxxxx\01.jpg
    G:\_PHOTOS\2023\10_22_xxxxx_xxxxx_xxxxx\01.jpg
    G:\_PHOTOS\2023\12_02_xxxx_xxxxxx_xxxxxx\01.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\01.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\02.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\03.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\04.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\05.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\06.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\07.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\08.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\09.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\10.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\11.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\12.jpg
    G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\13.jpg
    G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\01.jpg
    G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\02.jpg
    G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\03.jpg
    G:\_PHOTOS\2023\12_31_xxxxxx - xxxxxxxx\01.jpg

In this way, we are sure that the zones, between delimiters, are unique like, for instance :
    G:\_PHOTOS\2010\00_abcde_fghij\01.jpg
    ...
    ...
    G:\_PHOTOS\2011\00_abcde_fghij\01.jpg

Then, I randomized this file, using the N++ option `Edit > Line Operations > Sort Lines Randomly`

So my `download.txt` file looks like :

    G:\_PHOTOS\2014\08_01_xxxxxxxx xxxxxxxxxxxx\009_G.jpg
    G:\_PHOTOS\2010\03_06_SKI_xxxxxxxxxx-xxxxxxx\14.jpg
    G:\_PHOTOS\2011\01_15_SKI_xxxxxxxxx-xxxxxxx\06.jpg
    G:\_PHOTOS\2014\02_21-22_xxxxxxxxxx_xxxxxxxxxx xxxxxx\07.jpg
    G:\_PHOTOS\2012\08_07-22_xxxxxxxx xxxxxxxxx\034_X.jpg
    G:\_PHOTOS\2010\05_29_xxxxxxxxx xxxxxxx_xxxxxxxx\14.jpg
    ...
    ...
    ...
    G:\_PHOTOS\2014\09_13_xxxxxxxxxx_xxxxxxxxxx\023.jpg
    G:\_PHOTOS\2017\08_10-28_xx xxxx\013.jpg
    G:\_PHOTOS\2010\10_30-31_xxxxxx_xxxxxxxxxxxx xxxxx\076_X.jpg
    G:\_PHOTOS\2022\07_13-08_27_xx_xxxx\099_A.jpg
    G:\_PHOTOS\2016\03_05-07_SKI_xxxxxxxxxxxx\006.jpg
    G:\_PHOTOS\2014\03_24_SKI_xxxxxxx-xxxxxxxx\44.jpg

Secondly, I created an `exist.txt` file, made of all the different zones, between the `STX` delimiters. I obtained a file of `366` lines, from which I randomly deleted `45`, giving a final `exist.txt` file with `321` lines. So, at the end of the new method, we should get a file of all the lines containing one of the missing `45` zones !
Important :
-
For a correct realization, you must use the latest version of Notepad++, `v8.6.5`, which improves the multi-selection process !
-
In all the search/replacements, listed below :
-
The `Wrap around` option is checked
-
The `Regular expression` search mode is checked
-
All the other options are un-checked
-
Let’s go :
-
First, re-copy your `download.txt` file as `mark.txt`
-
Open the `mark.txt` file in N++
-
Open the Replace dialog ( `Ctrl + H` )
-
SEARCH `(?-s)^.*\x02(.+)\x02.*`
-
REPLACE `$1`
-
Click on the `Replace All` button
=> We just keep the zones between delimiters
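As an illustration, the same extraction can be sketched in Python with the same expression (only `$` and the `MULTILINE` flag are added). The two sample lines are hypothetical. Note one difference : `findall` simply skips the lines without delimiters, whereas the Notepad++ replacement leaves them unchanged.

```python
import re

STX = "\x02"  # the delimiter, as a real control character

# Hypothetical download.txt content: prefix, STX, zone, STX, file name
download_text = "\n".join([
    "G:\\_PHOTOS\\" + STX + "2010\\00_abcde_fghij" + STX + "\\01.jpg",
    "G:\\_PHOTOS\\" + STX + "2011\\00_abcde_fghij" + STX + "\\01.jpg",
]) + "\n"

# Group 1 is the zone between the two delimiters of each line
zone_re = re.compile(r"^.*\x02(.+)\x02.*$", re.MULTILINE)
zones = zone_re.findall(download_text)
print(zones)
```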
-
Now, use the menu option `Edit > Line Operations > Sort Lines Lexicographically Ascending`
-
Re-open the Replace dialog ( `Ctrl + H` )
-
SEARCH `(?-s)^(.+\R)\K\1+`
-
REPLACE `Leave EMPTY`
-
Click on the `Replace All` button
=> The duplicate lines are deleted and your `mark.txt` file should have decreased drastically ! In my case, I did get a `mark.txt` file with only `366` different lines
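The sort-then-deduplicate step can be sketched in Python as follows. Python's `re` has no `\K`, so this adaptation captures the first copy and puts it back with `\1` instead of keeping it out of the match. The zone names are hypothetical, and Python's `sorted` may order some characters differently from Notepad++; only the fact that equal lines end up adjacent matters here.

```python
import re

# Hypothetical zones, one per download line, with repeats
zones = ["b", "a", "b", "c", "a", "a"]

# Sorting makes identical lines adjacent
text = "".join(zone + "\n" for zone in sorted(zones))

# A line followed by one or more identical copies is reduced to a single copy
deduped = re.sub(r"^(.+\n)\1+", r"\1", text, flags=re.MULTILINE)
print(deduped.splitlines())
```

In a script, `sorted(set(zones))` would give the same list directly; the regex form is shown only because it mirrors the Notepad++ steps.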
-
Then, append your `exist.txt` at the end of the `mark.txt` file. In my case, the file contains `366` + `321` so `687` lines
-
Again, use the menu option `Edit > Line Operations > Sort Lines Lexicographically Ascending`
-
Re-open the Replace dialog ( `Ctrl + H` )
-
SEARCH `(?-s)^(.+\R)\1`
-
REPLACE `Leave EMPTY`
-
Click on the `Replace All` button
=> The `mark.txt` file should have decreased and now contains only the zones which require downloading. In my case, it contains, as expected, `45` lines / zones !
-
If the last line of the `mark.txt` file ends with an `EOL`, delete the `EOL` characters of this last line
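Here is a small Python sketch of this "remove the pairs" step, with hypothetical zone names. After the merge and the sort, a zone present in both lists appears exactly twice in a row, so deleting every pair leaves the zones found only once. This relies on two assumptions : each list has no internal duplicates, and every zone of `exist.txt` also occurs in `download.txt` (otherwise a zone present only in `exist.txt` would survive too).

```python
import re

mark_zones = ["z1", "z2", "z3", "z4"]  # unique zones extracted from download.txt
exist_zones = ["z3", "z1"]             # zones already present (exist.txt)

# Merge and sort: zones present in both lists become adjacent pairs
merged = "".join(zone + "\n" for zone in sorted(mark_zones + exist_zones))

# Delete each pair of identical consecutive lines
missing = re.sub(r"^(.+\n)\1", "", merged, flags=re.MULTILINE)
print(missing.splitlines())
```

Under the same assumptions, the result equals the set difference `set(mark_zones) - set(exist_zones)`.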
Note :
-
If all or some lines contain sub-folders, you’ll have to replace any `\` character with the literal `\\` string
-
Now, on column `1`, do a zero-length COLUMN selection of all the lines ( indication `N × 0` in the status bar )
-
Type in a `|` pipe character
-
Hit the `Home` key
-
Hit the `Backspace` key
=> The file is changed into a one-line file
-
Hit the `Home` key, again
-
Delete the first `|` character
-
Finally, save the `mark.txt` file, now a single-line file
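What these steps build is a regex alternation : all the remaining zones joined with `|`, with the backslashes doubled. A minimal Python sketch, using hypothetical zone names, could look like this. The `re.escape` variant is not part of the steps above; it is shown as a stricter alternative, because characters such as `.` or `-` inside a zone are also special in a regex.

```python
import re

# Hypothetical zones left in mark.txt (each one contains a sub-folder separator)
missing_zones = ["2010\\00_abcde_fghij", "2014\\08_01_trip"]

# Double every backslash, as the note above prescribes, then join with "|"
alternation = "|".join(zone.replace("\\", "\\\\") for zone in missing_zones)

# Stricter alternative (an addition, not from the post): escape every special character
escaped_alternation = "|".join(re.escape(zone) for zone in missing_zones)

sample = "G:\\_PHOTOS\\\x022010\\00_abcde_fghij\x02\\01.jpg"
print(alternation)
print(bool(re.search(alternation, sample)), bool(re.search(escaped_alternation, sample)))
```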
Remark :
- If the entire line contains more than `2,000` characters, split this long line in parts, right before a `|` char and delete any `|` remaining at beginning and/or end of the lines
For example :

    abc|def|.......................|uvw|xyz
    01|23|.........................|67|89

Of course, in this case, you'll have to REPEAT the MARK operation, described below, for each CREATED line
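The splitting rule above can be sketched in Python as follows. The helper name `split_alternation` and the sample terms are hypothetical; the `2,000` limit is the one quoted in the remark. Terms are never cut in the middle, so a single term longer than the limit would still end up alone on an over-long line.

```python
def split_alternation(alternation: str, limit: int = 2000) -> list[str]:
    """Split 'a|b|c|...' into parts of at most `limit` characters, cutting only at '|'."""
    chunks: list[str] = []
    current = ""
    for term in alternation.split("|"):
        candidate = term if not current else current + "|" + term
        if current and len(candidate) > limit:
            chunks.append(current)  # close the current part, without any trailing "|"
            current = term          # start a new part, without any leading "|"
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


# Hypothetical long alternation: 500 terms of 8 characters each
long_line = "|".join(f"zone{i:04d}" for i in range(500))
parts = split_alternation(long_line)
print(len(long_line), [len(part) for part in parts])
```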
-
Now, re-copy your `download.txt` file as `to_do.txt`
-
Switch to the `mark.txt` tab, containing, most of the time, just a single line
-
Select all the text ( `Ctrl + A` )
-
Open the Mark dialog ( `Ctrl + M` )
=> The text should be automatically inserted in the dialog
-
Check the `Bookmark line` and `Purge for each search` options ( IMPORTANT )
-
Switch back to the `to_do.txt` tab
-
Click on the `Mark All` button
=> Message of the dialog `Mark: xxx matches in entire file` ( `876`, in my case )
-
In the Bookmark margin, select, with the right-click button, the option `Remove Unmarked Lines` or use the menu option `Search > Bookmark > Remove Unmarked Lines`
-
Click on the `Clear all marks` button of the Mark dialog
-
Finally, save the `to_do.txt` file
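The `Mark All` + `Remove Unmarked Lines` combination amounts to keeping only the lines matched by the alternation. A rough Python equivalent, with hypothetical data and a hypothetical helper `make_line`, might be :

```python
import re

STX = "\x02"


def make_line(zone: str, name: str) -> str:
    """Build one hypothetical download.txt line: prefix, STX, zone, STX, file name."""
    return "G:\\_PHOTOS\\" + STX + zone + STX + "\\" + name


download_lines = [
    make_line("2010\\00_abcde_fghij", "01.jpg"),
    make_line("2011\\00_abcde_fghij", "01.jpg"),
    make_line("2010\\00_abcde_fghij", "02.jpg"),
]

# One regex per line of mark.txt (several lines only if the long line had to be split)
mark_lines = ["2010\\\\00_abcde_fghij"]
patterns = [re.compile(line) for line in mark_lines]

# Keep a line as soon as one of the patterns matches somewhere in it
to_do = [line for line in download_lines if any(p.search(line) for p in patterns)]
print(len(to_do))
```

As in the Mark dialog, the match is not anchored to the delimiters, so a zone that is a substring of another zone would also select that other zone's lines.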
=> You should get all the files that require downloading. In my theoretical case, from the `45` zones to take into account, I got a list of `876` files / lines to “download” ;-))
Best Regards,
guy038
P.S. :
Here’s a tip to sum a list of numbers :
-
Do a multi-column selection of all these numbers, located anywhere in your current file
-
Paste them in a new tab
-
Do a zero-length COLUMN selection of all these numbers
-
Hit the `+` sign
-
Hit the `Home` key
-
Hit the `Backspace` key
-
Hit the `End` key
-
Insert the `=` sign
-
Copy all contents of this single line ( `Ctrl + C` )
-
Open `calc.exe`
-
Paste the contents of the clipboard ( `Ctrl + V` )
=> Here you are : the Windows calculator should show you the total of your list of numbers ;-)) No possibility of errors and quick result !
You may even sum numbers in other bases !
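For comparison, the same trick (join the numbers with `+`, then evaluate) can be sketched in a few lines of Python. The numbers are the line counts quoted earlier in this post; the hexadecimal values are hypothetical.

```python
# The three counts mentioned above, as they would appear after pasting in a new tab
numbers = ["366", "321", "45"]

# What the single line looks like once every EOL has been replaced with "+"
expression = "+".join(numbers) + "="
total = sum(int(number) for number in numbers)
print(expression, total)

# Numbers written in another base: pass the base to int()
hex_total = sum(int(number, 16) for number in ["ff", "10"])
print(hex_total)
```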
-