Find only files with exact two words



  • I have a lot of different files. Notepad++ should find only the files which include both of the words I’m searching for.
    The two words are not located on the same line, just somewhere in the same file.

    How can I do this?



  • There is the Find in Files search. Does that not work?



  • Maybe this works, but I would like to do it in Notepad++. I used it in Notepad++ a few months ago, but I’ve forgotten what I have to put in the search field. I tried some expressions I found via Google, but nothing works the way I need it to.

    It also finds files which include only one of the words. But I need to find files that contain both words.



  • welcome to the notepad++ community, @Ronny-Kerk

    our regex specialists are currently offline, and i’m only at janitor level for regex, but here’s something you could try:

    open up find in files and enter:
    find what: (?=.*word1)(?=.*word2)
    directory: your desired path
    search mode: regular expression
    and hit find all



  • Thanks for your answer.
    I’ve tried it but it does not work. It finds the two words I’m looking for, but just in a big section. Not in the whole file.

    My files (ca. 1,500) are filled with many words. Most files have over 1000 lines. Now I want to give Notepad++ two or maybe more words to look for. For example: “Ronny Kerk” and “1982” are the words I’m looking for. Now Notepad++ should show me all the files in which both of these search criteria are present.



  • @Ronny-Kerk

    I’m not promoted to be a regex expert yet, but what about using something like

    (?s)(?=.*1982)(?=.*Ronny Kerk).*
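
    The role of the leading (?s) can be checked outside Notepad++. The following is a minimal sketch in Python, whose re module stands in for Notepad++'s regex engine (an assumption; the constructs used here are common to both). The sample text is made up for illustration.

```python
import re

# Hypothetical file contents: the two search terms sit on different lines.
text = "name: Ronny Kerk\ncity: Berlin\nborn: 1982\n"

# Without (?s), '.' stops at line breaks, so both lookaheads
# would have to succeed within a single line.
same_line = re.compile(r"(?=.*1982)(?=.*Ronny Kerk)")

# With (?s), '.' also matches line breaks, so each lookahead can
# scan from the start of the text to wherever its term occurs.
whole_file = re.compile(r"(?s)(?=.*1982)(?=.*Ronny Kerk).*")

same_line_hit = same_line.search(text) is not None
whole_file_match = whole_file.search(text)

print(same_line_hit)                      # no single line holds both terms
print(whole_file_match.group(0) == text)  # the match spans the entire text
```

    Note that the trailing .* makes the match cover the whole text, so the reported match is the complete file contents rather than the range between the two terms.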



  • @Ronny-Kerk

    I would suggest this:

    Find: (?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)
    Search mode: Regular expression

    The \b are there to enforce word boundaries–remove them if not desired. Also this will find word1 and word2 in either order, and without regard to the case.

    So basically this: “I’ve tried it but it does not work. It finds the two words I’m looking for, but just in a big section. Not in the whole file” doesn’t make a lot of sense. How can it not work but yet find the 2 words you want? Can you explain more about what you expect versus what happens?

    Note that Notepad++ can’t directly give you a list of files. It can only give you a list of matches, which includes the filenames but also has more information about the matches.
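
    As a quick check of the alternation regex above, here is a minimal Python sketch. Python's re module is used as a stand-in for Notepad++'s engine (an assumption), and both sample texts are invented for the example.

```python
import re

# Same expression as in the post: either order, case-insensitive,
# '.' allowed to cross line breaks, whole words only.
pattern = re.compile(r"(?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)")

# Hypothetical contents: word2 comes first, two lines before word1.
both = "intro\nWORD2 appears first\nfiller\nthen word1 follows\noutro\n"
# Hypothetical contents: 'sword2' is not the whole word 'word2'.
only_one = "word1 here\nbut sword2 is not a whole word\n"

m = pattern.search(both)
matched_span = m.group(0) if m else None
no_match = pattern.search(only_one)

print(repr(matched_span))  # the range from the first term to the second
print(no_match)            # None: only one of the two words is present
```

    Unlike the lookahead version, the match here is only the range between the two words, which is what shows up in the Find result panel.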



  • @Alan-Kilborn

    may I ask you, where do you see the advantage of using alternations versus lookaheads?



  • @Ekopalypse said:

    where do you see the advantage of using alternations versus lookaheads?

    I suppose for the current case of the OP, it doesn’t matter, but if I were doing it, I suspect I might like to see the range where my match was found, in certain instances. The lookahead approach selects as a match the entire file contents. BTW, I’m always nervous when the regex engine causes an entire file contents match. It makes me think it has failed in a big way…see here.

    If the 2 words need to occur on a single line (not the OP’s case!), I am not reluctant to use the lookahead approach, the classic example of which is here. I always remember that one by recalling it is the “jack” approach. :)



  • @Alan-Kilborn

    thank you very much. I guess I understood :-)



  • @Alan-Kilborn said:

    @Ronny-Kerk

    I would suggest this:

    Find: (?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)
    Search mode: Regular expression

    Hello Alan,
    this is the solution. It works like it should. Thanks for your help.



  • Hello, @ronny-kerk, @andrecool-68, @meta-chuh, @ekopalypse, @alan-kilborn and All,

    Here is a general method to list all files which contain word1 AND word2 AND word3 AND … wordN. The advantage of this solution is that it should be fast enough and that you do not need to worry about regex problems, such as the use of the (?s) syntax, look-arounds, and the order of the different words to match :-))

    In addition, if you wanted to look for 3 expressions simultaneously with a single regex, you would have to test all the different orderings below:

    Word3........Word1..........Word2
    Word3........Word2..........Word1
    Word1........Word3..........Word2
    Word2........Word3..........Word1
    Word1........Word2..........Word3
    Word2........Word1..........Word3

    Rather tedious, isn’t it?
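
    To show how quickly the single-regex approach grows, here is a minimal Python sketch that builds one alternation branch per ordering of the words. The helper name any_order_branches is hypothetical, and Python's re module stands in for Notepad++'s engine (an assumption).

```python
import re
from itertools import permutations

def any_order_branches(words):
    """Return one regex branch per possible ordering of `words`."""
    branches = []
    for order in permutations(words):
        # Each word is escaped and wrapped in word boundaries,
        # then the words of this ordering are joined by a lazy gap.
        parts = [r"\b" + re.escape(w) + r"\b" for w in order]
        branches.append("(?:" + ".*?".join(parts) + ")")
    return branches

branches = any_order_branches(["Word1", "Word2", "Word3"])
pattern = "(?si)" + "|".join(branches)

print(len(branches))  # 3 words give 3! orderings
hit = re.search(pattern, "word3 here\nword1 there\nand word2 last\n")
miss = re.search(pattern, "word1 and word2 only\n")
```

    The number of branches is N factorial, so 4 words already need 24 branches and 5 words need 120.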


    So, in short, the different steps, of that general method, are :

    • Search, in Normal mode, of each expression word1, word2,…,wordN and successive outputs in the Find result panel

    • Paste of all the contents of the Find result panel in a new tab

    • Use of a first regex S/R, in order to keep the absolute pathnames, only

    • Alphabetic sort of these pathnames

    • Use of a second regex S/R, to isolate the pathnames which are present N times

    • Use of a third regex S/R to delete all the other pathnames, which do not contain the N words simultaneously
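
    The idea behind these steps is simply to count how many of the N searches each pathname shows up in. A minimal Python sketch of that counting logic follows; the function name and the sample paths are made up for illustration, and this is not what Notepad++ does internally.

```python
from collections import Counter

def paths_hit_by_all(result_lists):
    """Return the paths that appear in every per-word result list."""
    n = len(result_lists)
    counts = Counter()
    for paths in result_lists:
        counts.update(set(paths))  # count each path once per search
    # A path found by all N searches has a count of exactly N.
    return sorted(p for p, c in counts.items() if c == n)

# Hypothetical results of three searches (word1, word2, word3).
hits = [
    [r"C:\data\a.txt", r"C:\data\b.txt"],
    [r"C:\data\b.txt", r"C:\data\c.txt"],
    [r"C:\data\b.txt", r"C:\data\a.txt"],
]
common = paths_hit_by_all(hits)
print(common)
```

    Sorting the pathnames and matching N identical consecutive lines, as in the regex steps, is the same count expressed with search and replace.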


    OK, let’s go :

    • Open the Find ( Ctrl + F ) or the Find in Files dialog ( Ctrl + Shift + F )

    • Search, successively, for the expressions word1, word2, …, wordN

    • Tick, if necessary, the Match whole word only and/or the Match case options

    • Tick the Wrap around option

    • Select, preferably, the Normal search mode

    • Click, either, on the Find All in All Opened Documents or the Find All button

    => After the N consecutive searches, you’ll get N searches in the Find result panel


    • In the Find result panel, select all the text ( Ctrl + A ) and copy it to the clipboard ( Ctrl + C )

    • Open a new tab ( Ctrl + N ) and paste the clipboard’s contents ( Ctrl + V )

    • Open the Replace dialog ( Ctrl + H )

    • Perform the following regex S/R, to keep, only, the different absolute pathnames

    SEARCH (?-is)^(\t|Search).+\R|\x20\(\d+\x20hits?\)$

    REPLACE Leave EMPTY

    • Tick the Wrap around option

    • Select the Regular expression search mode

    • Click on the Replace All button

    • Now, let’s sort that text, with the option Edit > Line Operations > Sort Lines Lexicographically Ascending

    • Add a manual line-break at the very end of that sorted list ( IMPORTANT )


    • Perform this second regex S/R, to mark off only the pathnames that are present N times

    SEARCH (^.+\R)\1{N-1} , where N represents the number of the searched expressions

    REPLACE \1\r\n ( or \1\n if Unix files )

    • Tick the Wrap around option

    • Click on the Replace All button

    So, for a search of any file, containing 4 expressions/words, just use the search regex (^.+\R)\1{3}


    • Finally, using the final regex S/R, below, you’ll obtain the expected list, after suppression of the unwanted pathnames, and line-breaks :

    SEARCH ^.+\R(?!\R)|\R(?=\R)

    REPLACE Leave EMPTY

    • Tick the Wrap around option

    • Click on the Replace All button

    You’ll get the list of all the absolute pathnames of files containing, at least once, all the words word1, word2, …, wordN, in any order!
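
    For comparison, the same end result can be sketched outside Notepad++. The Python snippet below is a minimal illustration, assuming UTF-8 text files and plain case-sensitive substring matching (roughly the Normal search mode with Match case ticked); the function name and the temporary sample files are hypothetical.

```python
import tempfile
from pathlib import Path

def files_containing_all(directory, words, encoding="utf-8"):
    """List files under `directory` whose text contains every entry of `words`."""
    matches = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        # Undecodable bytes are replaced so that odd files do not abort the scan.
        text = path.read_text(encoding=encoding, errors="replace")
        if all(word in text for word in words):
            matches.append(path)
    return matches

# Small self-contained demo with two made-up files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("Ronny Kerk\nborn 1982\n", encoding="utf-8")
    (root / "b.txt").write_text("Ronny Kerk only\n", encoding="utf-8")
    found = [p.name for p in files_containing_all(root, ["Ronny Kerk", "1982"])]

print(found)
```

    Each file is read once and the order of the words inside it does not matter, so adding a further word only means adding one entry to the list.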


    Of course, you may search for expressions more complicated than simple words, using the Regular expression search mode !

    Best Regards,

    guy038

