Help Needed with COMPARE plugin..thanks in advance

ray Landolfi

Hey Everyone,
I suck at this so please bare with me.
I am trying to compare two notepad files. They are essentially lists of files I have.
I can successfully compare them (woohoo)…
Here is an example of a few lines for better understanding
!MrRay BKD-24787 - Maren Morris - Drunk Girls Don’t Cry.zip
!MrRay BKD-24789 - Bruno Mars - That’s What I Like.zip
!MrRay BKD-24791 - Brett Young - Sleep Without You.zip
!MrRay BKD-24793 - Chris Janson - Power Of Positive Drinkin’.zip
!MrRay BKD-24795 - Drake - Passionfruit.zip
!MrRay BKD-24797 - Drake White - Makin’ Me Look Good Again.zip
!MrRay BKD-24799 - Brothers Osborne - It Ain’t My Fault.zip
!MrRay BKD-24801 - Florida Georgia Line - Dig Your Roots.zip

in a perfect world I would love to be able to compare by columns…
I have “show only diffs” checked off but it returns the while list with differences highlighted. which requires me to mark each difference by hand
700k lines compared and thousands of results need to be C&P

in addition…
Right now it highlights some only because the spelling is different or there are a few extra characters in the titles.
I would love to have the program compare what i would call the second column
(BKD-xxxxx) and only show me the differences (ones missing) so I could C&P them into another notepad file.

Is there a way to accomplish what I am looking for? or is it beyond what notepad++ and COMPARE plug-in is capable of?

1st image attached is what I get after I compare
compare 1.jpg

2nd is what I am looking for
compare 2.jpg

Coises

I think the problem you are trying to solve is this:

Only the numbers after BKD- matter.
What you want to know is which numbers occur in the first file but not in the second file.

I can tell you how I would solve this problem. I’m sure it’s not the only way.

First, make copies of each of your files. Edit the copies (open both — you’ll have two tabs).

In each file, delete the part that comes before the BKD-, and add a marker that will help us know which file is which.

To do that in the first file:

From the main menu, choose Search | Replace.
In the dialog, enter:
Find what: ^.* (BKD-\d++)
Replace with: \1 1
Wrap around: checked
Search Mode: Regular expression
. matches newline: not checked
Click Replace All.
Close the dialog.

Do the same in the second file, but replace with \1 2 instead of \1 1.

Next, select all of the second file (Ctrl+A) and copy to the clipboard (Ctrl+C).

Switch to the first file and go to the bottom. If there is a numbered, empty line at the bottom, click on that line and paste there (Ctrl+V). If the last line is not empty, go to the end of the last line and press the Enter key to create an empty line, then paste there.

Next, make sure nothing is selected and use Edit | Line Operations | Sort Lines Lexicographically Ascending to sort the combined file so that all of a single BKD number are together, with the first file cases of each number preceeding the second file cases.

Now we can use a regular expression to delete sequences of lines that contain a single BKD number with both 1 and 2 variants:

From the main menu, choose Search | Replace.
In the dialog, enter:
Find what: ^(BKD-\d++) 1.*\R(\1 1.*\R)*(\1 2.*(\R|\Z))+
Replace with: (leave empty)
Wrap around: checked
Search Mode: Regular expression
. matches newline: not checked
Click Replace All.
Close the dialog.

You’ll be left with the BKD numbers that didn’t occur in both files.

If you need to condense that list so that each number appears only once, even if it occurred multiple times in one file (but not at all in the other), you can do another regular expression to eliminate all but the first of each BKD- number. I’ll leave it to you to see if you can figure out how to do that; if you need guidance, tell us what you tried and what went wrong.

guy038

Hello, @ray-landolfi, @coises and All,

I think that there’s a way to achieve want you want to, natively, from within the ComparePlus plugin :-))

First, in order to verify my assertion, here are, below, two texts. The first one will be saved ad File_A.txt and the second as File_B.txt


!MrRay BKD-24734 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!MrRay BKD-24734 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!MrRay BKD-24734 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!MrRay BKD-24735 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!MrRay BKD-24735 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!MrRay BKD-24736 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!MrRay BKD-24736 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6

And :


!beni-Lappy BKD-24733 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!beni-Lappy BKD-24734 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!beni-Lappy BKD-24735 - Brantley Gilbert - Aaaaaaaaaaaaaaaaa.zip  ::INFO:: 3.6
!beni-Lappy BKD-24735 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!beni-Lappy BKD-24735 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!beni-Lappy BKD-24736 - Brantley Gilbert - Bbbbbbbbbbbbbbbbb.zip  ::INFO:: 3.6
!beni-Lappy BKD-24736 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
!beni-Lappy BKD-24737 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6

Then, follow these few steps :

Select the option Plugins > ComparePlus > Ignore Regex...
In the pop-up window which appears, enter the regex ^.+(?=BKD)|\x20-\x20.+$
Click on the Enable button

=> You can verify that the Ignore Regex... option is checked

Now, (re)compare the two files File_A.txt and File_B.txt with the option Plugins > ComparePlus > Compare

=> This time, ONLY the part BKD-##### will be taken in account in the comparison process. More easy to read, isn’t it !

To improve the reading, you may even check the Plugins > ComparePlus > Show Only Diffs (Hide Matches) option
You may also select the File_B.txt file and bookmark all its differences !

Voila ;-))

Best Regards,

guy038

My Debug Info :

Notepad++ v8.7.1   (64-bit)
Build time : Oct 25 2024 - 04:07:21
Path : D:\871_x64\notepad++.exe
Command Line : 
Admin mode : OFF
Local Conf mode : ON
Cloud Config : OFF
Periodic Backup : OFF
OS Name : Windows 10 Pro (64-bit)
OS Version : 22H2
OS Build : 19045.5011
Current ANSI codepage : 1252
Plugins : 
    mimeTools (3.1)
    NppConverter (4.6)
    NppExport (0.4)
    ComparePlus (1.2)

BEFORE :

7346ddeb-c09c-4c1e-93d3-716c9d139c24-Capture d’écran 2024-10-28 095608.png

AFTER :

e41b6c67-0cc6-4602-abf1-2faafdf8d2e0-Capture d’écran 2024-10-28 095245.png

pnedev

@ray-Landolfi ,

Just follow @guy038 's instuctions and you should achieve your goal without problems.

@Coises ,
Thank you for helping users here on the forum and for providing work-around solution.

@guy038 ,
Thank you as always for your invaluable RegEx wizardy ;)

BR

ray Landolfi

THAT WORKED FANTASTICALLY…YOU SAVED ME “HOURS” OF WORK…
thank you… thank you … thank you

Using that expression got be through the"BKX-xxxxxs" in minutes

I know that expression was made for just that example I gave
How to I learn to make expressions for other track names?
there are hundreds of different naming combinations like
scxxxx-xx , zpcp2024-xx, kv-xxxxx
but the all follow the same format just have different letters and numbers
typically the standard format for all like this…
DISC #(dash)TRACK #(SpaceDashSpace)ARTIST(SpaceDashSpace)TITLE
the artist and title could be ignored if they are sorted by just the DISK & TRACK #s

I really need to learn the process of creating an expression if there is no way to make a universal one.

where does one learn how to construct a regex expression?

oh and THANK YOU AGAIN for what you already provided…

Alan Kilborn

@ray-Landolfi said in Help Needed with COMPARE plugin..thanks in advance:

where does one learn how to construct a regex expression?

Start HERE.

ray Landolfi

@Alan-Kilborn thank you…

upon reading that I have realized I am not as smart as I previously thought…
I actually feel LESS smart than before…

I will have to depend on someone coming up with a universal regex expression based on my follow up as I am no where capable of doing it myself… LOL

guy038

Hi, @ray-landolfi, @coises, @alan-kilborn and All,*

To begin, I’ll just explain my previous regex. What all this stuff means ?

The search regex ^.+(?=BKD)|\x20-\x20.+$ can first be split in two different sub-regexes, separated by the alternative regex char which is the |.

The part ^.+(?=BKD) searches from beginning of line [ ^ ] for a non-zero amount of standard characters [ .+ ] but ONLY IF it’s followed with the BKD string [ (?=BKD) ]. The last syntax is called a look-ahead structure which is NEVER part of the regex match, but MUST occur at this location in order that there’s a match.
The part \x20-\x20.+$ searches for an space char, then a dash and an other space [ \x20-\x20 ] followed, again, with a non-zero amount of standard characters [ .+ ], till the end of line [ $ ]

In other words, if, for example, I use the first line of the File_B.txt, discussed in my previous post :

!beni-Lappy BKD-24733 - Brantley Gilbert - Devil Don't Sleep.zip  ::INFO:: 3.6
<---------->         <------------------------------------------------------->                   
  ^.+(?=BKD)                            \x20-\x20.+$

As you can see, everything of each line, but BKD-#####, is matched by this search regex and, thus, will be ignored by the ComparePlus plugin during the process !

Remarks :

By default for the ComparePlus plugin, the search is sensible to case
By default, any dot regex symbol [ . ] stands for a standard char, not an EOL char
The leading in-line modifiers like (?-i) or (?s) are not taken in account
Do not insert any line-break [ as \r, \n or \R ] in the regex ( Comparisons is based on a line by line process ! )

Now, @ray-landolfi, I pleased to see that your file seems well structured, like below :

DISC ### - TRACK ### - ARTIST ......... - TITLE .........

As you say that, both, the ARTIST and the TITLE could be ignored and that you just bother for the DISC and TRACK entities, then the Ignore regex could be simply as below :

\x20-\x20(?!TRACK).+$

DISC ### - TRACK ### - ARTIST ......... - TITLE .........
                    <--- \x20-\x20(?!TRACK).+$ --------->

However, @ray-landolfi, I suppose that there are a lot of other conditions to respect in your files ! If so, in your reply, just insert your text as raw text, using the </> icon !

Best Regards,

guy038

ray Landolfi

@guy038 all of the tracks are formatted the same… OMG how confusing if they didn’t…lol
the length of the “DISK” and “TRACK” selections only vary by character quantity…such as
zoom-1234
misc-12
homemade-0004

they always contain the same separators
“dash with no spaces” separating “Disk” & “Track#”
“space dash space” between "TRACK#"and “ARTIST” and again between “ARTIST & TITLE”
example SC1000-15 - Frank Sinatra - My Way.zip

The last code you offered seems to work well…
It returned the tracks I was missing compared to the other listby highlighting them

for some reason even with “return only diffs” checked off it still only highlighted the diffs…it didn’t just list the differences …but I can live with that…
it basically decides what to show me by simply the differences in the “disc and track#s” and ignores any differences in the “Artist and Title”…so THATS GREAT

I then bookmark the ones I need manually, clear all non-bookmarked lines and I have a nice list to C&P for my next step of acquiring them from the other persons library…
you have made this process easier and THANK YOU…

PeterJones

@ray-Landolfi,

Of course, you are still being overly vague, giving only one example, when even your original screenshots prove that your data is more complicated than you are implying. !beni-lappy BKD-#### shows that - can occur even in the “DISC ####” section, so the regex has to accept even hyphens (the character is an ASCII hyphen, not a dash), and that despite using the # , which means “number sign” in ASCII text, “DISC ####” is not a number, even though for “TRACK ####”, it is a number. Being ultra-loose in your examples like that is why @guy038 was asking for more example data, input using the </> button in the Forum post editor, so that it shows up in a text box, and we know we aren’t missing important characters.

But based on a guess as to what you mean – I think you are saying that DISC can have anything, and there is a hyphen separating DISC and TRACK, and track must be a one-or-more-digit number.

Thus, instead of having @guy038’d lookahead of (?=BKD), which was making the assumption that BKD always came before the DISC-TRACK separating hyphen, you would actually want a lookahead of (?=-\d+) … and if I were you, I would also make the “match-anything” before that .+? instead of .+, so that it will stop at the first -#### if there happens to be more than one in a given line. Based on this, my recommendation for a more-generic regex would be ^.+?(?=-\d+)|\x20-\x20.+$

I actually feel LESS smart than before

Anytime you are learning something new that’s worth learning, you must start with the realization that you have more to learn. Instead of using that as an excuse to give up, you need to use it as motivation. The FAQ gave lots of resources, any one of which would be a good place to start learning. Or just learn what each of the chunks that @guy038 has already shown do, and then start playing around in data that is familiar to you, and see what you can and cannot figure out how to match, and then start looking up terms in the Notepad++ User Manual’s regex section to see if you can figure out how to use the manual to figure out what other syntax does.

But just saying “it looks too hard” and giving up is guaranteed to mean you will never learn, and you’re the one who gets hurt by that decision.

I will have to depend on someone

Well, okay, if you are going to assume that someone else will always do stuff for you when it’s “too hard” for you, any such people that you abuse will also be harmed (or at least saddened) by your decision.

ray Landolfi

Thank you to everyone who helped…

PETER: Its true should learn this and thank you for your advice…
You are correct that the “DISC” field will contain text other than “BKD” as tried to explain in the other examples I shared… and the “TRACK #” will possibly contain more then 1 or 2 digits sometimes as many as 8 or more

The first string of data is the person’s list I am getting the info from it will vary
The second string will vary in length and characters as well but always be in the same format **“DISK-TRACK”**with no-space between them

Thank you for pointing out that I may have not explained the naming format fully. I have been working with this format for years and forget it’s new to others.
The following examples will show that the data is different depending on the person the list is from…

!celtic KVD-59990 - Luther Vandross - Killing Me Softly.zip
!beni-Lappy BKD-24733 - Brantley Gilbert - Devil Don’t Sleep.zip
!MrRay SC7515-12 - Frank Sinatra - My Way.zip

We can be sure that data to be sorted immediately follows the FIRST SPACE
and ends immediately before the SECOND SPACE.
I don’t know of any other way to describe it

GUY: thank you again for your help… The second code you shared seems to be ignoring the person the list is from…that worked
However since I wasn’t specific in explaining that the disk and track could be other than always being BKD I think it is not ignoring the other text after the DISC-TRACK#

Here is a screen shot of what I get when I use a list from another person and discs/tracks other than BKD with the same regex code

I don’t want to be a bother or pain in the butt so I will use what I have and no hard feelings…
I will follow up on seeing if I can learn this stuff but it’s going to be from the very start and this could take a while…LOL

Thanks again guys

guy038

Hi, @ray-landolfi, @coises, @alan-kilborn and All,

@ray-landolfi, while seeing your picture, I note that the ComparePlus plugin tell you to add a new empty line at the very beginning of your two files in order to compare them properly. So :

First, follow this advice and add an empty line on top of each file
Secondly, change the Ignore regex and use this new one ^.+?\x20|\x20-\x20.+$
Thirdly, re-compare your two files

This time, everything should be OK !

For your information :

Copy the three lines below, provided in your last post, in a new tab

!celtic KVD-59990 - Luther Vandross - Killing Me Softly.zip  ::INFO:: 2.6Mb
!beni-Lappy BKD-24733 - Brantley Gilbert - Devil Don’t Sleep.zip
!MrRay SC7515-12 - Frank Sinatra - My Way.zip

Open the Mark dialog of N++ ( Ctrl + M )
Type in the regex ^.+?\x20|\x20-\x20.+$ in the Find what: zone
Un-check all the box options ( IMPORTANT )
Check only the Wrap around option
Click on the Mark All button

=> All text should be highlighted, except for the zones KBD-59990 , BKD-24733 and SC7515-12

So, the highlighted zones are the zones excluded of the comparison by the ComparePlus plugin. Thus, ONLY the parts KBD-59990 , BKD-24733 and SC7515-12 will be taken in account in the comparison process !

Now :

Open a real file of yours
Repeat the above MARK operation

=> You’ll can verify that everything, but the DISK-TRACK zones, is marked, as you expected to !

Best Regards,

guy038

ray Landolfi

@guy038

I havent tried to sort files in a while and finally came back to check your responses…

Yes that latest regex worked to isolate just the files that were missing solely by the track info… Thank you again…

The added info on marking all those returned files at once however didn’t work… I only resumed the process today and have been marking them by hand… I tried to select all and copy the lines but it copies every line not just the compared lines so its back to individually marking them… it’s not fun but it beats looking through tens of thousands one at a time…

Thanks again for all your help…
If you have the urge to work out the marking issue
feel free…

Ray

pnedev

@ray-Landolfi said in Help Needed with COMPARE plugin..thanks in advance:

I tried to select all and copy the lines but it copies every line not just the compared lines so its back to individually marking them…

Check this description (I quote myself from the corresponding issue in ComparePlus repository):

After a compare you switch to the file which differences you want to copy, open the ComparePlus menu and use one of the new Bookmark… commands depending on what type of diffs you want to copy (for example Bookmark All Diffs in Current View). This will add a Notepad++ bookmark on every diff line matching that criteria. Next you can use the Notepad++ functionality to copy all bookmarked lines (you will find it under Search -> Bookmark sub-menu).

BR