    Find only files with exact two words

    • Ronny Kerk

      I have a lot of different files. Notepad++ should find only the files which include both of the words I’m searching for.
      The two words are not located on the same line, just somewhere in the same file.

      How can I do this?

      • andrecool-68

        There is the search for words in files; does that not work?

        • Ronny Kerk

          Maybe this works, but I would like to do it in Notepad++. I used it in Notepad++ a few months ago, but I’ve forgotten what I have to put in the search field. I tried some things I found via Google, but nothing works the way I need it to.

          Those attempts also find files which include only one of the words, but I need to find files that contain both words.

          • Meta Chuh (moderator) @Ronny Kerk

            welcome to the notepad++ community, @Ronny-Kerk

            our regex specialists are currently offline, and i’m only at janitor level for regex, but here’s something you could try:

            open up find in files and enter:
            find what: (?=.*word1)(?=.*word2)
            directory: your desired path
            search mode: regular expression
            and hit find all

            • Ronny Kerk

              Thanks for your answer.
              I’ve tried it but it does not work. It finds the two words i’m looking for, but just in a big section. Not in the whole file.

              My files (ca. 1,500) are filled with many words. Most files have over 1000 lines. Now I want to give Notepad++ two or maybe more words to look for. For example: “Ronny Kerk” and “1982” are the words I’m looking for. Notepad++ should then show me all the files which contain both of these search criteria.

              • Ekopalypse @Ronny Kerk

                @Ronny-Kerk

                I’m not promoted to be a regex expert yet, but what about using something like

                (?s)(?=.*1982)(?=.*Ronny Kerk).*
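To see why the (?s) prefix matters for words on different lines, here is a small sketch. It uses Python's standard `re` module as a stand-in for the Boost regex engine in Notepad++ (an assumption: the two agree on how `.` treats line breaks, but they are not identical in general). The sample text and the helper name `has_both` are made up for illustration.

```python
import re

# Hypothetical file contents: the two search terms sit on different lines.
text = "name: Ronny Kerk\nsome other line\nyear: 1982\n"

def has_both(pattern: str, contents: str) -> bool:
    """Return True if the pattern matches anywhere in the contents."""
    return re.search(pattern, contents) is not None

# Without (?s), "." stops at a line break, so both lookaheads would have to
# succeed from a position on one single line.
same_line_only = has_both(r"(?=.*1982)(?=.*Ronny Kerk)", text)

# With (?s), "." also matches line breaks, so each lookahead can reach
# across the whole text.
whole_file = has_both(r"(?s)(?=.*1982)(?=.*Ronny Kerk).*", text)

print(same_line_only, whole_file)
```

In this sketch only the second pattern reports a match, because the sample text never has both terms on one line.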

                • Alan Kilborn @Ronny Kerk

                  @Ronny-Kerk

                  I would suggest this:

                  Find: (?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)
                  Search mode: Regular expression

                  The \b are there to enforce word boundaries; remove them if not desired. Also, this will find word1 and word2 in either order, and without regard to case.
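A quick way to check the behaviour described above is to run the same pattern through Python's `re` module, used here only as an approximation of the Notepad++ engine. The words `apple` and `banana` and the three sample texts are placeholders, not anything from the original poster's files.

```python
import re

# The pattern from the post, with word1/word2 replaced by sample words.
pattern = re.compile(r"(?si)(\bapple\b.*?\bbanana\b)|(\bbanana\b.*?\bapple\b)")

first_order = "An apple a day.\nThen a banana.\n"   # word1 ... word2
second_order = "BANANA first,\nthen Apple.\n"       # word2 ... word1, other case
only_one = "pineapple and banana\n"                 # "apple" never occurs as a whole word

# One boolean per sample: does the pattern match somewhere in it?
results = [bool(pattern.search(s)) for s in (first_order, second_order, only_one)]
print(results)
```

The third sample is there to show the effect of \b: "pineapple" contains "apple", but not at a word boundary, so that text is not reported.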

                  So basically this: “I’ve tried it but it does not work. It finds the two words i’m looking for, but just in a big section. Not in the whole file” doesn’t make a lot of sense. How can it not work and yet find the 2 words you want? Can you explain more about what you expect versus what happens?

                  Note that Notepad++ can’t directly give you a list of files. It can only give you a list of matches, which includes the filenames but also has more information about the matches.

                  • Ekopalypse @Alan Kilborn

                    @Alan-Kilborn

                    may I ask you, where do you see the advantage of using alternations versus lookaheads?

                    • Alan Kilborn @Ekopalypse

                      @Ekopalypse said:

                      where do you see the advantage of using alternations versus lookaheads?

                      I suppose for the current case of the OP, it doesn’t matter, but if I were doing it, I suspect I might like to see the range where my match was found, in certain instances. The lookahead approach selects as a match the entire file contents. BTW, I’m always nervous when the regex engine causes an entire file contents match. It makes me think it has failed in a big way…see here.
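The difference in what gets selected can be illustrated with match spans. This is a sketch in Python's `re` module, assumed here to behave like the Notepad++ engine for these two patterns; the text and the words `foo` and `bar` are invented.

```python
import re

text = "line 1\nfoo here\nline 3\nbar here\nline 5\n"

# Lookahead approach: the trailing .* consumes everything from the match start.
lookahead = re.search(r"(?s)(?=.*foo)(?=.*bar).*", text)

# Alternation approach: the match runs from the first word to the second.
alternation = re.search(r"(?si)(\bfoo\b.*?\bbar\b)|(\bbar\b.*?\bfoo\b)", text)

print(lookahead.span(), len(text))   # span covers the whole text
print(repr(alternation.group(0)))    # only the range between the two words
```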

                      If the 2 words need to occur on a single line (not the OP’s case!), I am not reluctant to use the lookahead approach, the classic example of which is here. I always remember that one by recalling it is the “jack” approach. :)

                      • Ekopalypse @Alan Kilborn

                        @Alan-Kilborn

                        thank you very much. I guess I understood :-)

                        • Ronny Kerk @Alan Kilborn

                          @Alan-Kilborn said:

                          @Ronny-Kerk

                          I would suggest this:

                          Find: (?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)
                          Search mode: Regular expression

                          Hello Alan,
                          this is the solution. It works like it should. Thanks for your help.

                          • guy038

                            Hello, @ronny-kerk, @andrecool-68, @meta-chuh, @ekopalypse, @alan-kilborn and All,

                            Here is a general method to list all files which contain word1 AND word2 AND word3 AND … wordN. The advantage of this solution is that it should be fast enough and that you do not need to worry about regex problems, such as the use of the (?s) syntax, look-arounds, and the order of the different words to match :-))

                            In addition, even if you only wanted to look for 3 expressions simultaneously with a regex, you would have to test the different ranges below:

                            Word3........Word1..........Word2
                            Word3........Word2..........Word1
                            Word1........Word3..........Word2
                            Word2........Word3..........Word1
                            Word1........Word2..........Word3
                            Word2........Word1..........Word3

                            Rather tedious, isn’t it?
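To make the growth concrete: an alternation regex needs one branch per ordering, so N words need N! branches (6 for three words, 24 for four). The sketch below builds such a pattern with Python's `itertools.permutations`; the function name `all_orders_pattern` is hypothetical and the result is meant for Python's `re`, not pasted from Notepad++.

```python
import re
from itertools import permutations

def all_orders_pattern(words):
    """Build one alternation branch per ordering of the given words."""
    branches = []
    for order in permutations(words):
        # e.g. \bWord3\b.*?\bWord1\b.*?\bWord2\b for one particular ordering
        branch = r".*?".join(r"\b" + re.escape(w) + r"\b" for w in order)
        branches.append("(" + branch + ")")
    # (?s) lets "." cross line breaks, as in the regexes earlier in the thread
    return "(?s)" + "|".join(branches)

pattern = all_orders_pattern(["Word1", "Word2", "Word3"])
branch_count = pattern.count("|") + 1
print(branch_count)
```

Like the two-word alternation, this assumes the words do not overlap each other in the text.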


                            So, in short, the different steps of this general method are:

                            • Search, in Normal mode, for each expression word1, word2, …, wordN, with successive outputs in the Find result panel

                            • Paste the whole contents of the Find result panel into a new tab

                            • Use a first regex S/R to keep the absolute pathnames only

                            • Sort these pathnames alphabetically

                            • Use a second regex S/R to isolate the pathnames which are present N times

                            • Use a third regex S/R to delete all the other pathnames, whose files do not contain all N words
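The steps listed above amount to counting, for each pathname, in how many of the N searches it shows up, and keeping the pathnames counted N times. Here is a minimal Python sketch of that idea using a `Counter` instead of the three regexes. The layout of the Find result text (a `Search` header line, pathname lines ending in `(n hits)`, and tab-indented hit lines) is an assumption based on the regexes in this post, and the sample paths and function name are invented.

```python
import re
from collections import Counter

def files_in_every_search(find_result: str, n_searches: int) -> list[str]:
    """Return the pathnames that appear once in each of the n searches."""
    counts = Counter()
    for line in find_result.splitlines():
        # Skip the per-search header lines and the tab-indented hit lines.
        if line.startswith("\t") or line.startswith("Search"):
            continue
        # Drop the trailing " (n hits)" so only the pathname remains.
        path = re.sub(r" \(\d+ hits?\)$", "", line).strip()
        if path:
            counts[path] += 1
    return sorted(p for p, c in counts.items() if c == n_searches)

# Assumed shape of the copied Find result text after two searches.
sample = "\n".join([
    'Search "word1" (2 hits in 2 files of 3 searched)',
    r"  C:\docs\a.txt (1 hit)",
    "\tLine 3: word1 here",
    r"  C:\docs\b.txt (1 hit)",
    "\tLine 7: also word1",
    'Search "word2" (1 hit in 1 file of 3 searched)',
    r"  C:\docs\a.txt (1 hit)",
    "\tLine 9: word2 here",
])
print(files_in_every_search(sample, 2))
```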


                            OK, let’s go :

                            • Open the Find ( Ctrl + F ) or the Find in Files dialog ( Ctrl + Shift + F )

                            • Search, successively, for the expressions word1, word2 … wordN

                            • Tick, if necessary, the Match whole word only and/or the Match case options

                            • Tick the Wrap around option

                            • Select, preferably, the Normal search mode

                            • Click, either, on the Find All in All Opened Documents or the Find All button

                            => After the N consecutive searches, you’ll get N searches in the Find result panel


                            • In the Find result panel, select all the text ( Ctrl + A ) and copy it to the clipboard ( Ctrl + C )

                            • Open a new tab ( Ctrl + N ) and paste the clipboard’s contents ( Ctrl + V )

                            • Open the Replace dialog ( Ctrl + H )

                            • Perform the following regex S/R to keep only the absolute pathnames

                            SEARCH (?-is)^(\t|Search).+\R|\x20\(\d+\x20hits?\)$

                            REPLACE Leave EMPTY

                            • Tick the Wrap around option

                            • Select the Regular expression search mode

                            • Click on the Replace All button

                            • Now, let’s sort that text with the option Edit > Line Operations > Sort Lines Lexicographically Ascending

                            • Add a manual line-break at the very end of that sorted list ( IMPORTANT )
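The first S/R, the sort and the final line-break can be tried outside Notepad++ as well. The following Python sketch adapts the search regex to the `re` module, which is an approximation: `\R` is written as `\n` (so the text is assumed to use `\n` line endings) and the leading `(?-is)` is dropped, since both flags are off by default in `re`. The sample text and the function name `sorted_paths` are illustrative.

```python
import re

# Adapted from: (?-is)^(\t|Search).+\R|\x20\(\d+\x20hits?\)$
KEEP_PATHS = re.compile(r"^(\t|Search).+\n|\x20\(\d+\x20hits?\)$", re.MULTILINE)

def sorted_paths(find_result: str) -> str:
    """Keep only the pathname lines, sort them, end with a line break."""
    paths_only = KEEP_PATHS.sub("", find_result)
    lines = sorted(line for line in paths_only.splitlines() if line)
    # The trailing "\n" plays the role of the manually added line-break.
    return "\n".join(lines) + "\n"

# Assumed shape of the pasted Find result text after two searches.
sample = (
    'Search "word1" (2 hits in 2 files of 3 searched)\n'
    "  C:\\docs\\b.txt (1 hit)\n"
    "\tLine 7: word1\n"
    "  C:\\docs\\a.txt (1 hit)\n"
    "\tLine 3: word1\n"
    'Search "word2" (1 hit in 1 file of 3 searched)\n'
    "  C:\\docs\\a.txt (1 hit)\n"
    "\tLine 9: word2\n"
)
print(repr(sorted_paths(sample)))
```

Pathnames found by several searches now sit on adjacent lines, which is what the next S/R relies on.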


                            • Perform this second regex S/R to mark the pathnames which are present N times

                            SEARCH (^.+\R)\1{N-1} , where N represents the number of the searched expressions

                            REPLACE \1\r\n ( or \1\n if Unix files )

                            • Tick the Wrap around option

                            • Click on the Replace All button

                            So, to find any file containing 4 expressions/words, just use the search regex (^.+\R)\1{3}
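The effect of this second S/R is easiest to see on a tiny sorted list. The sketch below is a Python approximation: `\R` is written as `\n`, the replacement is `\1\n`, and the file names and the function name are placeholders. Each run of N identical lines collapses to one line followed by an empty line, which serves as the marker.

```python
import re

def mark_paths_seen_n_times(sorted_paths: str, n: int) -> str:
    """Collapse each run of n identical lines into one line plus a blank line."""
    # (^.+\n)\1{n-1}: one line, followed by n-1 exact copies of it
    pattern = re.compile(r"(^.+\n)\1{%d}" % (n - 1), re.MULTILINE)
    return pattern.sub(r"\1\n", sorted_paths)

# Sorted pathnames after searching for 3 words; b.txt matched only one of them.
sorted_list = "a.txt\na.txt\na.txt\nb.txt\nc.txt\nc.txt\nc.txt\n"
marked = mark_paths_seen_n_times(sorted_list, 3)
print(repr(marked))
```

Without the line break at the very end of the list, the last copy of `c.txt` would not end in a line break and the final run would not be matched, which is presumably why that step is flagged as important.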


                            • Finally, the last regex S/R below deletes the unwanted pathnames and the extra line-breaks, leaving the expected list:

                            SEARCH ^.+\R(?!\R)|\R(?=\R)

                            REPLACE Leave EMPTY

                            • Tick the Wrap around option

                            • Click on the Replace All button

                            You’ll get the list of the absolute pathnames of all files containing, at least once, each of the words word1, word2 … wordN, in any order!
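As a check of the last S/R, here is the same regex tried in Python's `re` module, with `\R` written as `\n` (an adaptation, assuming `\n` line endings). The input is a made-up list in which the pathnames found by all the searches are followed by an empty line, as the second S/R leaves them.

```python
import re

# Adapted from: ^.+\R(?!\R)|\R(?=\R)
#   first branch : a line NOT followed by an empty line  -> unwanted pathname
#   second branch: the line break just before an empty line -> extra line break
FINAL = re.compile(r"^.+\n(?!\n)|\n(?=\n)", re.MULTILINE)

marked = "a.txt\n\nb.txt\nc.txt\n\n"   # a.txt and c.txt are marked, b.txt is not
result = FINAL.sub("", marked)
print(repr(result))
```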


                            Of course, you may search for expressions more complicated than simple words, using the Regular expression search mode !

                            Best Regards,

                            guy038
