Bookmarking Lines Containing Values in Other File



  • Greetings!

    I’m trying to do something in NP++, but I don’t know that it’s even possible.

    Say that one has a file that is 10,000 records long that contains several fields (comma delimited).

    There is also another file, say 500 records long, that holds just the data found in one of the fields of the 10,000 record file.

    What I want to do is to bookmark each line in the large file that matches any record in the small file.

    To simplify it, imagine that the big file just has all kinds of fruit and the little file only has apples, bananas, & grapes. I want to bookmark each line in the big file that has any one of those three values.

    Thoughts?

    Thanks in advance for any help or insight you can give.

    Joe



  • Sounds like a wrong use of N++



  • I guess it’s possible, but probably not the ideal way of doing it, as @Kasper-Graversen said. Once you have bookmarked the lines, what do you want to do with them?



  • Hello Joe Murphy,

    I’ve got a solution :-))

    So, let’s suppose that you have :

    • A huge file, File A, with a lot of records, each containing several fields separated by commas

    • A smaller file, File B, with some records organised the same way. File B may contain either :

      • Some records identical to records in File A

      • Some fields, at position n, identical to the fields at the same position n in File A

    Just one assumption : a record must NOT begin with a comma character

    So, two main cases may happen :

    1. You need to bookmark all the records of File A that have an identical record in File B

    2. You need to bookmark all the records of File A whose Field n matches Field n of a record in File B


    Follow the steps below :

    • First, open File B in Notepad++

    • Select all the contents of File B ( CTRL + A )

    • Copy them to the clipboard ( CTRL + C )

    • Now, open File A

    • Add, at the end of File A, a new line #####, which marks the boundary between File A and File B !

    The boundary must be built from a character that is NOT present in either File A or File B. It may be, for instance, a @, %, &,… character. If you pick something other than #####, use the same string in the regexes of the later step

    • Then append the contents of the clipboard, at the end of File A, after the line ##### ( CTRL + V )

    • Go back to the very beginning of the present File A ( CTRL + Home )

    • Open the Find dialog ( CTRL + F )

    • Select the Mark tab

    • Check the Bookmark line option

    • Select the Regular expression search mode

    • Depending on which case, 1) or 2), applies, type in the regex :

      • SEARCH : (?-s)^(.+\R)(?=(?s).*#####.*?\1) for case 1)

      • SEARCH : ^([^,\r\n]+?,){n-1}\K([^,\r\n]+)(?=(?s).*#####.*?\2) for case 2)

    IMPORTANT : In the second regex, replace the n-1 formula with the actual number, that is to say, the number n of the field concerned minus 1. For instance, to compare the third field, write {2}

    • Click on the Mark All button

    • Depending on which case, 1) or 2), was chosen :

      • Case 1) => All the lines of File A with an identical line located further down, after the line ##### ( from File B ), are bookmarked

      • Case 2) => All the lines of File A with an identical Field n located further down, after the line ##### ( from File B ), are bookmarked

    • Move to line #####

    • Delete from that line ##### to the end of the file ( all the appended lines of File B )

    • Finally, save the new state of File A, with all the bookmarks

    Et voilà !
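
For case 2), the only part of the regex that changes from one job to the next is the {n-1} count (and the boundary, if ##### is not suitable). As a small illustration, here is a Python sketch that simply builds the search string to paste into the Find what field. The helper name `field_regex` is made up for this example, and it assumes the boundary contains no regex metacharacters. Note that the resulting string is meant for Notepad++ only: Python's own `re` module does not support `\K`.

```python
def field_regex(n: int, boundary: str = "#####") -> str:
    """Build the case 2) search string for the 1-based field number n.

    Hypothetical helper: it only assembles text, it does not run the regex.
    The boundary is assumed to contain no regex metacharacters.
    """
    if n < 1:
        raise ValueError("field number n must be 1 or greater")
    # {n-1} skips the n-1 fields that come before the field to compare
    return (
        r"^([^,\r\n]+?,){" + str(n - 1) + r"}"
        + r"\K([^,\r\n]+)"
        + r"(?=(?s).*" + boundary + r".*?\2)"
    )


# Third field, default boundary
print(field_regex(3))
```

For the third field, this prints the second regex with {2} in place of {n-1}.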

    Notes :

    • If some lines/fields are duplicated within File A, without at least one corresponding line/field in File B, they are NOT bookmarked, thanks to the ##### boundary !

    • If some lines/fields are duplicated within File B, it doesn’t matter : as long as there is at least one corresponding line/field in File A, that one will be correctly bookmarked
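
To check the result outside Notepad++, the logic of case 2), including the two notes above, can be approximated with a few lines of Python. This is only a sketch under some assumptions : the names `field` and `lines_to_bookmark` and the sample data are invented for the example, fields are split naively on commas ( no quoted CSV values ), and values are compared for exact equality, which is stricter than the regex, since `.*?\2` also accepts a value found inside a longer string of File B.

```python
def field(line, n):
    """Return the 1-based comma-separated field n of a record, or None if absent."""
    parts = line.rstrip("\r\n").split(",")
    return parts[n - 1] if 1 <= n <= len(parts) else None


def lines_to_bookmark(file_a_lines, file_b_lines, n_a, n_b):
    """Return the 1-based line numbers of File A whose field n_a equals
    field n_b of at least one record of File B (exact comparison)."""
    # A set makes duplicates in File B harmless
    wanted = {field(line, n_b) for line in file_b_lines}
    wanted -= {None, ""}  # ignore missing fields and empty lines
    return [
        number
        for number, line in enumerate(file_a_lines, start=1)
        if field(line, n_a) in wanted
    ]


# Sample data in the spirit of the fruit example
file_a = ["1,apple,red", "2,kiwi,green", "3,banana,yellow", "4,kiwi,green", "5,apple,green"]
file_b = ["apple", "banana", "grapes", "apple"]

print(lines_to_bookmark(file_a, file_b, n_a=2, n_b=1))  # [1, 3, 5]
```

The two kiwi lines are duplicates inside File A with no counterpart in File B, so they are not reported, and the repeated apple in File B changes nothing.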

    Best Regards,

    guy038

    P.S. :

    I forgot to tell you about the (?s) and (?-s) modifiers :

    • Usually, when the . matches newline option of the Find dialog is unchecked OR when you use the (?-s) modifier, any dot in the regex stands for a standard character, that is to say, any character of the class [^\r\n\f]

    • However, when the . matches newline option is checked OR when you use the (?s) modifier, the dot stands for absolutely any character ( standard and/or End of Line characters ). For instance, if the cursor is at the very beginning of the current file, the simple regex (?s).* matches the whole contents of the file and is equivalent to a CTRL + A command !!
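
The same on/off behaviour of the dot can be observed in other regex engines. Here is a short Python sketch, given only as an analogy : Python's `re` module is not the engine used by Notepad++, and without the flag its dot excludes only `\n`, not the whole [^\r\n\f] class. The sample text is invented for the example.

```python
import re

text = "apple,1\nbanana,2\n"

# Default mode: the dot stops at the end of the first line
first_line = re.match(r".*", text).group(0)

# DOTALL mode, the counterpart of (?s): the dot also crosses line ends
whole_text = re.match(r".*", text, flags=re.DOTALL).group(0)

# Scoped inline form, which switches the mode on for one group only
scoped = re.match(r"(?s:.*)", text).group(0)

print(repr(first_line))    # 'apple,1'
print(whole_text == text)  # True
print(scoped == text)      # True
```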

