
    question about compare with additional special chars and wildcard

    • Daniel B. 0

      Hello,

      I have two text documents that I would like to synchronize, but I have no idea how.

      The files are called exist.txt and download.txt. exist.txt contains folder names, one per line, and I would like to match them line by line against download.txt. The conditions are that the match must allow wildcards and that the folder name has an ASCII \x02 before and after it.

      as an example

      content of exist.txt

      my.folder1
      my.folder2
      my.folder3

      content of download.txt

      anything\x02my.folder1\x02whatever
      dream\x02my.folder2\x02country

      If a match is found, the matching line should be removed from download.txt.

      I would be happy to receive ideas or tips. Thanks in advance.

      • guy038

        Hello, @daniel-b-0 and All,

        Not difficult with regexes ! Just follow the road map below :

        • First, rename your download.txt file as download_SVG.txt

        • Open your two files exist.txt and download_SVG.txt in Notepad++

        • Now, open a new file in Notepad++

        • Paste the contents of your download_SVG.txt file into this new file

        • Then, at the very end of the new file, append a line of some equal signs

        • Finally, append the contents of your exist.txt file, right below the line of equal signs

        • Save this new file as download.txt

        Thus, for example, your new download.txt file would temporarily look like this ( \x02 stands for the STX control character ) :

        anything\x02my.folder1\x02whatever
        dream\x02my.folder2\x02country
        dream\x02my.folder3\x02
        anything\x02my.folder4\x02whatever
        anything\x02my.folder5\x02whatever
        =====================================
        my.folder1
        my.folder3
        my.folder5
        
        • Open the Replace dialog ( Ctrl + H )

        • SEARCH (?-si)^.+?\x02(.+)\x02.*\R(?=(?s).+?\1)|(?s)^=+.+

        • REPLACE Leave EMPTY

        • Check the Wrap around option

        • Select the Regular expression mode

        • Click on the Replace All button

        => Here you are : every line whose folder also appears below the line of equal signs is deleted, together with the line of equal signs and the list that follows it. What remains are the folders not downloaded yet :

        dream\x02my.folder2\x02country
        anything\x02my.folder4\x02whatever
        
        • Re-save your final download.txt file
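For larger inputs, the same filtering can be sketched outside the editor. The Python below is only an illustration, not part of the Notepad++ procedure. The function name `remove_existing` is hypothetical, and the sketch assumes that each download line carries exactly one zone between two \x02 (STX) control characters and that a folder counts as present only when it equals that zone exactly. The regex above is looser, since it accepts the captured zone anywhere later in the file.

```python
STX = "\x02"  # the STX control character used as delimiter (assumption: real control char)

def remove_existing(download_lines, exist_lines):
    """Return the download lines whose STX-delimited zone is NOT listed in exist_lines."""
    existing = {line.strip() for line in exist_lines if line.strip()}
    kept = []
    for line in download_lines:
        parts = line.split(STX)
        # A well-formed line splits into prefix, zone, suffix.
        zone = parts[1] if len(parts) >= 3 else None
        if zone is None or zone not in existing:
            kept.append(line)
    return kept

download = [
    "anything\x02my.folder1\x02whatever",
    "dream\x02my.folder2\x02country",
    "dream\x02my.folder3\x02",
    "anything\x02my.folder4\x02whatever",
    "anything\x02my.folder5\x02whatever",
]
exist = ["my.folder1", "my.folder3", "my.folder5"]

for line in remove_existing(download, exist):
    print(repr(line))
```

Using a set for the existing folders keeps the work roughly proportional to the number of lines, whereas the look-ahead in the regex rescans the rest of the file for every line.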

        Maybe, when you said :

        … and ascii \x02 before and after.

        you meant the literal four-character string \x02 rather than the control character.

        In that case, the S/R above must be changed to :

        • SEARCH (?-si)^.+?\\x02(.+)\\x02.*\R(?=(?s).+?\1)|(?s)^=+.+

        • REPLACE Leave EMPTY
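The difference between the two patterns can be checked in a few lines of Python. This is a sketch for illustration only: it uses Python's `re` module rather than the regex engine of Notepad++, it keeps only the capturing part of each pattern, and the sample strings are made up.

```python
import re

control_line = "dream\x02my.folder2\x02country"   # contains two real STX characters
literal_line = r"dream\x02my.folder2\x02country"  # contains backslash, x, 0, 2 as plain text

control_re = re.compile(r"^.+?\x02(.+)\x02.*$")    # \x02 in the pattern = the STX character
literal_re = re.compile(r"^.+?\\x02(.+)\\x02.*$")  # \\x02 in the pattern = a backslash, then "x02"

print(control_re.match(control_line).group(1))  # my.folder2
print(literal_re.match(literal_line).group(1))  # my.folder2
print(control_re.match(literal_line))           # None
print(literal_re.match(control_line))           # None
```

Each pattern only matches the kind of line it was written for, which is why it matters which of the two forms the file really contains.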

        Best Regards

        guy038

        • Daniel B. 0 @guy038

          thank you very much! @guy038 i am really amazed that regex can be so versatile. it does exactly what it is supposed to do!

          • guy038

            Hi, @daniel-b-0,

            Just for info :

            Were you talking about the C0 control code \x02 ( STX ) or about the literal string \x02 ?

            BR

            guy038

            • Daniel B. 0 @guy038

              Hi, @guy038,

              It was about the control code, and your solution works very well! Unfortunately, Notepad++ is very, very slow with more than 4000 lines.

              BR

              Daniel

              • guy038

                Hi, @daniel-b-0 and All,

                Last UPDATED on 2024/05/22 : In the first version of this post, I exposed some real names of my personal photos. After reflection, I decided, for confidentiality, to change it and only show non-personal data !!

                I understand that my method cannot be used safely with large files. So, I’m going to present a second method which should work in all cases !

                I tested this new method with real data : a USB key of mine, containing 8,186 photos, collected over a period from 2004 to 2023

                ( Don’t worry, these photos are also stored on two external hard drives. In all circumstances, we must imitate Mother Nature, which uses RNA to code proteins and, NEVER, DNA itself for this purpose !! )

                The general organisation of my USB drive is :

                G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \01.jpg
                G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \02.jpg
                G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03.jpg
                G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03_ORG.jpg
                G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \04.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\01.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\02.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\03.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\04.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\05.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\06.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\07.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\08.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\09.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\10.jpg
                G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\01.jpg
                G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\02.jpg
                G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\03.jpg
                G:\_PHOTOS\2005\08_22_xxxx xxxxxx\01.jpg
                G:\_PHOTOS\2006\01_07_xxxxxxx xxxxxxxxxxx\01.jpg
                ...
                ...
                ...
                G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
                G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
                G:\_PHOTOS\2023\10_08xxxxx xxxxx xxxxxxxxxxxx\01.jpg
                G:\_PHOTOS\2023\10_22_xxxxx_xxxxx_xxxxx\01.jpg
                G:\_PHOTOS\2023\12_02_xxxx_xxxxxx_xxxxxx\01.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\01.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\02.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\03.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\04.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\05.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\06.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\07.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\08.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\09.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\10.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\11.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\12.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\13.jpg
                G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\01.jpg
                G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\02.jpg
                G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\03.jpg
                G:\_PHOTOS\2023\12_31_xxxxxx - xxxxxxxx\01.jpg
                

                So, the photos are sorted by year, then by motif ( month_day[-day]_location_reason or, sometimes, month_day[-day]_reason_location ) and finally by photo number, with, sometimes, the initial of the person who took the photo ( -A for Annie, my sister, -X for unknown, etc. )

                In order to mimic your download.txt file, I placed the \x02 delimiters right after the G:\_PHOTOS\ part and right before the \xx.jpg part, giving this format :

                G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \01.jpg
                G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \02.jpg
                G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03.jpg
                G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \03_ORG.jpg
                G:\_PHOTOS\2004\06_11-22_xxxxxxx - xxxxxxxxx - xxxxxxxxxxxxxx \04.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\01.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\02.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\03.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\04.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\05.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\06.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\07.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\08.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\09.jpg
                G:\_PHOTOS\2005\01_24-29_SKI_xxxx xxxx xxxxx\10.jpg
                G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\01.jpg
                G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\02.jpg
                G:\_PHOTOS\2005\03_22_SKI_xx xxxxxxx\03.jpg
                G:\_PHOTOS\2005\08_22_xxxx xxxxxx\01.jpg
                G:\_PHOTOS\2006\01_07_xxxxxxx xxxxxxxxxxx\01.jpg
                ...
                ...
                ...
                G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
                G:\_PHOTOS\2023\10_01_xxxxx_xxxxx.jpg
                G:\_PHOTOS\2023\10_08xxxxx xxxxx xxxxxxxxxxxx\01.jpg
                G:\_PHOTOS\2023\10_22_xxxxx_xxxxx_xxxxx\01.jpg
                G:\_PHOTOS\2023\12_02_xxxx_xxxxxx_xxxxxx\01.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\01.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\02.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\03.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\04.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\05.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\06.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\07.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\08.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\09.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\10.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\11.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\12.jpg
                G:\_PHOTOS\2023\12_15_xxxxxx xxxxxxx xxxxxxxx xxx\13.jpg
                G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\01.jpg
                G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\02.jpg
                G:\_PHOTOS\2023\12_26_xxxxx xxxxxxxxx xx xxxx xxxxxxx\03.jpg
                G:\_PHOTOS\2023\12_31_xxxxxx - xxxxxxxx\01.jpg
                

                In this way, we are sure that the zones between delimiters are unique, because each zone includes the year. For instance, these two lines belong to different zones :

                G:\_PHOTOS\2010\00_abcde_fghij\01.jpg
                ...
                ...
                G:\_PHOTOS\2011\00_abcde_fghij\01.jpg
                

                Then, I randomized this file, using the N++ option :

                Edit > Line Operations > Sort Lines Randomly

                So my download.txt file looks like :

                G:\_PHOTOS\2014\08_01_xxxxxxxx xxxxxxxxxxxx\009_G.jpg
                G:\_PHOTOS\2010\03_06_SKI_xxxxxxxxxx-xxxxxxx\14.jpg
                G:\_PHOTOS\2011\01_15_SKI_xxxxxxxxx-xxxxxxx\06.jpg
                G:\_PHOTOS\2014\02_21-22_xxxxxxxxxx_xxxxxxxxxx xxxxxx\07.jpg
                G:\_PHOTOS\2012\08_07-22_xxxxxxxx xxxxxxxxx\034_X.jpg
                G:\_PHOTOS\2010\05_29_xxxxxxxxx xxxxxxx_xxxxxxxx\14.jpg
                ...
                ...
                ...
                G:\_PHOTOS\2014\09_13_xxxxxxxxxx_xxxxxxxxxx\023.jpg
                G:\_PHOTOS\2017\08_10-28_xx xxxx\013.jpg
                G:\_PHOTOS\2010\10_30-31_xxxxxx_xxxxxxxxxxxx xxxxx\076_X.jpg
                G:\_PHOTOS\2022\07_13-08_27_xx_xxxx\099_A.jpg
                G:\_PHOTOS\2016\03_05-07_SKI_xxxxxxxxxxxx\006.jpg
                G:\_PHOTOS\2014\03_24_SKI_xxxxxxx-xxxxxxxx\44.jpg
                

                Secondly, I created an exist.txt file, made of all the different zones between the STX delimiters. I obtained a file of 366 lines, from which I randomly deleted 45, giving a final exist.txt file with 321 lines. So, at the end of the new method, we should get a file of all the lines containing one of the 45 missing zones !


                Important :

                • To follow these steps correctly, you must use Notepad++ v8.6.5, the latest version at the time of writing, which improves the multi-selection process !

                • In all the search/replacements, listed below :

                  • The Wrap around option is checked

                  • The Regular expression search mode is checked

                  • All the other options are un-checked

                Let’s go :

                • First, copy your download.txt file as mark.txt

                • Open the mark.txt file in N++

                • Open the Replace dialog ( Ctrl + H )

                • SEARCH (?-s)^.*\x02(.+)\x02.*

                • REPLACE $1

                • Click on the Replace All button

                => We just keep the zones between delimiters

                • Now, use the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending

                • Re-open the Replace dialog ( Ctrl + H )

                • SEARCH (?-s)^(.+\R)\K\1+

                • REPLACE Leave EMPTY

                • Click on the Replace All button

                => The duplicate lines are deleted and your mark.txt file should have decreased drastically ! In my case, I did get a mark.txt file with only 366 different lines

                • Then, append your exist.txt at the end of the mark.txt file. In my case, the file contains 366 + 321 so 687 lines

                • Again, use the menu option Edit > Line Operations > Sort Lines Lexicographically Ascending

                • Re-open the Replace dialog ( Ctrl + H )

                • SEARCH (?-s)^(.+\R)\1

                • REPLACE Leave EMPTY

                • Click on the Replace All button

                => The mark.txt file should have decreased and now contains only the zones which require downloading. In my case, it contains, as expected, 45 lines / zones !
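Taken together, the extract, sort, de-duplicate and delete-pairs steps above amount to a set difference, provided every line of exist.txt also occurs as a zone in download.txt ( as in my test ). A minimal Python sketch of that idea follows. It is not part of the Notepad++ procedure: the name `missing_zones` is hypothetical, and it assumes exactly two \x02 delimiters per line.

```python
STX = "\x02"  # the STX control character used as delimiter

def missing_zones(download_lines, exist_lines):
    """Return, sorted, the zones found in download_lines that are absent from exist_lines."""
    zones = set()
    for line in download_lines:
        parts = line.split(STX)
        if len(parts) >= 3:      # prefix, zone, suffix
            zones.add(parts[1])  # a set removes duplicates automatically
    existing = {line.strip() for line in exist_lines if line.strip()}
    return sorted(zones - existing)

download = [
    "a\x02zone1\x02b",
    "c\x02zone2\x02d",
    "e\x02zone1\x02f",
    "g\x02zone3\x02h",
]
print(missing_zones(download, ["zone1"]))  # ['zone2', 'zone3']
```

Note one difference: a line of exist.txt with no counterpart in download.txt would survive the delete-pairs regex, whereas the set difference simply ignores it.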

                • If the last line of the mark.txt file ends with an EOL, delete the EOL characters of this last line

                Note :

                • If all or some lines contain sub-folders, you’ll have to replace any \ character with the literal \\ string

                • Now, on column 1, do a zero-length COLUMN selection of all the lines ( indication N × 0 in the status bar )

                • Type in a | pipe character

                • Hit the Home key

                • Hit the Backspace key

                => The file is changed into a one-line file

                • Hit the Home key, again

                • Delete the first | character

                • Finally, save the mark.txt file, now a single-line file

                Remark :

                • If the entire line contains more than 2,000 characters, split this long line into several lines, each split placed right before a | char, and delete any | remaining at the beginning and/or end of the lines

                For example :

                abc|def|.......................|uvw|xyz
                01|23|.........................|67|89
                
                Of course, in this case, you'll have to REPEAT the MARK operation, described below, for each CREATED line
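Joining the zones with | and splitting the result into lines of at most 2,000 characters can be sketched as follows. This is an illustration under assumptions, not the editor procedure: `build_patterns` and `max_len` are hypothetical names, and `re.escape` targets Python's own regex syntax, so it escapes more characters than the \ replacement mentioned in the note above.

```python
import re

def build_patterns(zones, max_len=2000):
    """Join escaped zones with '|' into as few patterns as possible.

    Each pattern stays within max_len characters, unless a single escaped
    zone is already longer than max_len, in which case it stands alone.
    """
    patterns, current = [], ""
    for zone in zones:
        escaped = re.escape(zone)  # backslashes and other metacharacters become literal
        candidate = escaped if not current else current + "|" + escaped
        if current and len(candidate) > max_len:
            patterns.append(current)  # close the current line before a '|'
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns

print(build_patterns(["abc", "def", "uvw"], max_len=7))  # ['abc|def', 'uvw']
```

Because the split always happens between two zones, no pattern starts or ends with a stray | character.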
                
                • Now, copy your download.txt file as to_do.txt

                • Switch to the mark.txt tab, containing, most of the time, just a single line

                • Select all the text ( Ctrl + A )

                • Open the Mark dialog ( Ctrl + M )

                => The text should be automatically inserted in the dialog

                • Check the Bookmark line and Purge for each search options ( IMPORTANT )

                • Switch back to the to_do.txt tab

                • Click on the Mark All button

                => Message of the dialog Mark: xxx matches in entire file ( 876, in my case )

                • In the Bookmark margin, select, with the right-click button, the option Remove Unmarked Lines or use the menu option Search > Bookmark > Remove Unmarked Lines

                • Click on the Clear all marks button of the Mark dialog

                • Finally, save the to_do.txt file

                => You should get all the files that require downloading. In my test case, from the 45 zones to take into account, I got a list of 876 files / lines to “download” ;-))
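The Mark All and Remove Unmarked Lines steps keep every line in which at least one of the zones occurs. A rough Python equivalent is sketched below. The function name `keep_marked_lines` is hypothetical, the sample paths are made up, and, like the Mark dialog, the sketch accepts a zone anywhere in the line rather than only between the delimiters.

```python
import re

def keep_marked_lines(lines, zones):
    """Keep only the lines in which at least one zone occurs as literal text."""
    if not zones:
        return []  # an empty alternation would match every line
    pattern = re.compile("|".join(re.escape(zone) for zone in zones))
    return [line for line in lines if pattern.search(line)]

download = [
    "G:\\_PHOTOS\\\x022010\\aaa\x02\\01.jpg",
    "G:\\_PHOTOS\\\x022010\\aaa\x02\\02.jpg",
    "G:\\_PHOTOS\\\x022011\\bbb\x02\\01.jpg",
]
to_do = keep_marked_lines(download, ["2011\\bbb"])
print(len(to_do))  # 1
```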

                Best Regards,

                guy038

                P.S. :

                Here’s a tip to sum a list of numbers :

                • Do a multi-column selection of all these numbers, located anywhere in your current file

                • Paste them in a new tab

                • Do a zero-length COLUMN selection of all these numbers

                • Hit the + sign

                • Hit the Home key

                • Hit the Backspace key

                • Hit the End key

                • Insert the = sign

                • Copy all contents of this single line ( Ctrl + C )

                • Open calc.exe

                • Paste the contents of the clipboard ( Ctrl + V )

                => Here you are : the Windows calculator should show you the total of your list of numbers ;-)) No possibility of errors and quick result !

                You may even sum numbers in other bases !
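For comparison, the same total can be obtained with a short Python sketch. The names `sum_numbers` and `calc_expression` are hypothetical, and the numbers are made-up samples. The second function only rebuilds the single-line expression that the steps above paste into the calculator.

```python
def sum_numbers(tokens, base=10):
    """Sum a list of numbers given as strings, interpreted in the given base."""
    return sum(int(token, base) for token in tokens)

def calc_expression(tokens):
    """Build the one-line expression pasted into the calculator, e.g. '12+30+8='."""
    return "+".join(tokens) + "="

numbers = ["12", "30", "8"]
print(calc_expression(numbers))            # 12+30+8=
print(sum_numbers(numbers))                # 50
print(hex(sum_numbers(["ff", "1"], 16)))   # 0x100
```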
