Search in one file DELETE in another (YASADT)



  • Hello, I need some help!

    The title is designed to be clickbait, but it is in fact exactly what I want to do.

    Let me introduce the exercise:
    Say you have two m3u8 playlists with some titles that appear in both. If you play them one after the other, you will hear the same title twice and get bored.
    To avoid this, you may want to delete from one playlist the titles that are already in the other. I hope that is clear enough.

    If you don’t know what an m3u8 playlist looks like, don’t worry, I provide examples on Pastebin.

    What I have done so far is figure out how the hell regexes work, and I found that what I am looking for is :~,.+ - .+$
    What needs to be done is to delete the matches of that regex that also appear in the other file, by deleting the line where the match is and the line after it.

    To do so I started a Python script (because I thought I knew how to code in Python), but I figured out that it is likely too much work (learning the API) for such a small job.

    Unfortunately it is not possible to include code or screenshots in this post, but I know you are smart enough to understand what I mean. Feel free to yell some insults if you don’t.

    m3u8 file example on Pastebin: BMHk9ZMM

    English isn’t my native language, sorry about that, be kind :)



  • @Gregory-Mompezat
    It seems from your post that you’d like to get in deep (Python coding), so instead of giving you the answer directly, let me point you to another post that asked a very similar question.
    In there you will find some regexes which you may like to tinker with.
    https://notepad-plus-plus.org/community/topic/16377/find-matching-word-between-two-text-file

    There are also similar posts doing very similar things dotted throughout, just try the search function at the top of the page.
    If after doing that you are still having issues, come back to the forum with what you’ve tried. We will help if needed.

    Terry



  • I tried the solution provided there, but it doesn’t help me much :/ (sad react only).

    What I want to match is ,.+ - .+$.

    What seems to do the job is (.+)(?=.*^---\R.*^\1$)

    I copied and pasted both files into one and put the --- line in between, but it does not work :/. I don’t understand exactly how (?= ... ) is supposed to work…

    So trying the original (?si)(.+)(?=.*^---\R.*^\1$) gave me:

    Not exactly what I was hoping for, but that is expected, because guy038 said it would only work with one word per line.

    So I modified it this way: (?si)(,.+ - .+$)(?=.*^---\R.^\1)
    and I got the exact same result (everything in red).

    But what I want is to match the same ,.+ - .+$ “like so”
    cmon work!
    pastebin



  • @Gregory-Mompezat
    bravo for giving it a try. You did an excellent job of editing the original to try and match your circumstances. Don’t be sad that it didn’t work first time.

    I made up a test using some of the text from your images and came up with the following regex to be used in the marking or deleting of lines. Try it and see if it matches better. As I haven’t spent a lot of time and don’t fully know the data you are working with there may still be changes required.

    So
    Find what: (?si)#EXTINF:\d+,(.+?)$(?=.*^-----\R.*\1)

    I used 5 hyphens in my delimiter line. I figured that the string you are most interested in is the actual song title, so the #EXTINF portion isn’t used when testing for a duplicate.

    So if you use this with the Mark function then you can spend a bit of time verifying the data before removing the lines. When marking you can use the ‘bookmark line’ (sorry don’t know what that is in your language). This would allow you to remove those lines at a later stage.

    You said you aren’t sure about how the (?=…) part works. This is called a lookahead, specifically a positive lookahead. I found this site to give excellent information on regexes, and indeed on the lookahead and lookbehind functions.
    http://rexegg.com/regex-disambiguation.html
    Dismantling my one, we see first the ?=, this identifies the type of lookahead/lookbehind we are using. The rest of the function works exactly like a regex would, so it’s providing the search pattern we are searching for.

    So in my case we look ahead for any number of characters, possibly none (that is the .*), then a start of line followed by 5 hyphens and a line break (^-----\R). After that come more characters (or none) and then the string we first found, \1. This is special: it is a backreference to the first group captured in this regex, which is the (.+?) portion near the start.
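To watch the lookahead and the \1 backreference at work outside Notepad++, here is a minimal Python sketch. It rests on several assumptions: the playlist lines are invented sample data, not the real files; Python's re module has no \R, so \r?\n stands in for it; and the m flag is added because in Python ^ and $ only match at line boundaries in multiline mode.

```python
import re

# Invented sample data: playlist A, a ----- delimiter line, then playlist B.
combined = "\n".join([
    "#EXTINF:210,Artist A - Song One",
    "a1.mp3",
    "#EXTINF:180,Artist B - Song Two",
    "b2.mp3",
    "-----",
    "#EXTINF:180,Artist B - Song Two",
    "b2.mp3",
    "#EXTINF:200,Artist C - Song Three",
    "c3.mp3",
])

# (.+?)$ captures the title; the lookahead only succeeds if the delimiter
# line still lies ahead AND the captured title (\1) occurs again after it.
pattern = re.compile(r"(?sim)#EXTINF:\d+,(.+?)$(?=.*^-----\r?\n.*\1)")

# findall returns the captured group for every successful match.
duplicates = pattern.findall(combined)
print(duplicates)
```

Only entries above the delimiter can match, because the lookahead needs the ----- line to come later in the text. Note that \1 is matched as a substring here, so a short title could also match inside a longer one below the delimiter.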

    I hope that gives you some help.

    Come back if you need further guidance.

    Terry



  • WOW i am so amazed !

    this is it! it works!
    look at this beauty!

    THANK YOU 💙 Terry

    To conclude, for people who want to do the same thing:

    • Copy and paste the two files A and B into a single file
    • Put a delimiter line between the two (I used ---)
    • Then design a regex of the form (?si)Pattern(unique feature)(?=.*^---\R.*\1)
      • Here the pattern is #EXTINF:\d+, and the unique feature that I wanted to find in both A and B is simply what comes next
    • In the end: (?si)#EXTINF:\d+,(.+?)$(?=.*^---\R.*\1)
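For anyone who would rather script this than do it inside Notepad++, the same idea can be sketched in Python without joining the files. This is only a sketch under assumptions: the function name remove_shared_titles and the sample lines are made up, every entry is assumed to be an #EXTINF line followed by exactly one file line, and titles are compared as whole strings (ignoring case) rather than as substrings the way \1 does in the regex.

```python
import re

# Same prefix as in the thread: "#EXTINF:<digits>," followed by the title.
EXTINF = re.compile(r"#EXTINF:\d+,(.+)$")

def remove_shared_titles(lines_a, lines_b):
    """Return lines_a without the entries whose title also appears in lines_b."""
    # Collect every title of playlist B, lowercased for comparison.
    titles_b = set()
    for line in lines_b:
        m = EXTINF.match(line.strip())
        if m:
            titles_b.add(m.group(1).strip().lower())

    kept = []
    skip_next = False
    for line in lines_a:
        if skip_next:
            # This is the file line that belongs to a removed #EXTINF line.
            skip_next = False
            continue
        m = EXTINF.match(line.strip())
        if m and m.group(1).strip().lower() in titles_b:
            skip_next = True  # drop this line and the one after it
            continue
        kept.append(line)
    return kept

# Hypothetical sample playlists.
playlist_a = [
    "#EXTM3U",
    "#EXTINF:210,Artist A - Song One",
    "a1.mp3",
    "#EXTINF:180,Artist B - Song Two",
    "b2.mp3",
]
playlist_b = [
    "#EXTM3U",
    "#EXTINF:180,Artist B - Song Two",
    "b2.mp3",
    "#EXTINF:200,Artist C - Song Three",
    "c3.mp3",
]

result = remove_shared_titles(playlist_a, playlist_b)
print("\n".join(result))
```

Whether to clean playlist A or playlist B is a free choice; swapping the two arguments removes the shared entries from the other file instead.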

    Terry wrote: “I made up a test using some of the text from your images”

    The data was available at the Pastebin link in my previous post (sneaky, yet present).



  • @Gregory-Mompezat
    Thanks for the upvotes. I’m glad it worked without issues. I’m a little sad, though, because I spent some minutes typing out some of the text from the image, copying and duplicating it, then changing some of the text, all because I hadn’t seen the “pastebin” link below it.

    Terry

