    Find only files with exact two words

    12 Posts 6 Posters 11.6k Views
    • Ronny Kerk

      Maybe this works, but I would like to do it in Notepad++. I used it in Notepad++ a few months ago, but I’ve forgotten what I have to put in the search field. I tried some expressions I found via Google, but nothing works the way I need it to.

      It also finds files which include only one of the words, but I need to find files that contain both words.

      • Meta Chuh (moderator) @Ronny Kerk

        welcome to the notepad++ community, @Ronny-Kerk

        our regex specialists are currently offline, and i’m only at janitor level for regex, but here’s something you could try:

        open up find in files and enter:
        find what: (?=.*word1)(?=.*word2)
        directory: your desired path
        search mode: regular expression
        and hit find all

        • Ronny Kerk

          Thanks for your answer.
          I’ve tried it but it does not work. It finds the two words I’m looking for, but just in a big section. Not in the whole file.

          My files (ca. 1,500) are filled with many words. Most files have over 1,000 lines. Now I want to give Notepad++ two or maybe more words to look for. For example, “Ronny Kerk” and “1982” are the words I’m looking for. Notepad++ should then show me all the files which contain both of these search criteria.

          • Ekopalypse @Ronny Kerk

            @Ronny-Kerk

            I’m not promoted to be a regex expert yet, but what about using something like

            (?s)(?=.*1982)(?=.*Ronny Kerk).*
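A small sketch of why the (?s) flag matters for the lookahead approach. It uses Python's `re` module rather than the Boost engine behind Notepad++, so take it as an approximation of the behaviour, and the sample text is made up for illustration.

```python
import re

# Hypothetical file contents: the two search terms sit on different lines.
text = "name: Ronny Kerk\nborn: 1982\n"

# Without (?s), '.' does not cross line breaks, so both lookaheads
# must be satisfied from a position within a single line.
same_line = re.search(r"(?=.*1982)(?=.*Ronny Kerk)", text)

# With (?s), '.' also matches line breaks, so each lookahead can see
# the rest of the text, and the trailing .* consumes everything.
whole_text = re.search(r"(?s)(?=.*1982)(?=.*Ronny Kerk).*", text)

print(same_line)          # no match: the terms are on different lines
print(whole_text.span())  # the match covers the entire text
```

This also shows the side effect discussed later in the thread: when the (?s) variant matches, the reported match is the whole text from the first matching position onwards.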

            • Alan Kilborn @Ronny Kerk

              @Ronny-Kerk

              I would suggest this:

              Find: (?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)
              Search mode: Regular expression

              The \b are there to enforce word boundaries; remove them if not desired. Also, this will find word1 and word2 in either order, and without regard to case.

              So basically this: “I’ve tried it but it does not work. It finds the two words I’m looking for, but just in a big section. Not in the whole file” doesn’t make a lot of sense. How can it not work and yet find the two words you want? Can you explain more about what you expect versus what happens?
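For anyone who wants to try the alternation outside the editor, here is a minimal sketch with Python's `re` module (not the Notepad++ engine, so small differences are possible). The sample text is invented; word1 and word2 are the placeholders from the post.

```python
import re

# The regex from the post, unchanged: (?s) lets '.' cross line breaks,
# (?i) ignores case, and the two branches cover both orders.
pattern = re.compile(r"(?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)")

# Hypothetical text in which word2 comes first, in upper case.
text = "first line\nWORD2 appears here\nfiller\nand word1 later\nlast line\n"

m = pattern.search(text)
print(m.span())           # range from the first word to the second one
print(repr(m.group(0)))   # only that range is matched, not the whole text

# A text containing just one of the two words gives no match at all.
only_one = pattern.search("word1 only\nnothing else\n")
print(only_one)
```

The match is limited to the stretch between the two words, which is what makes the result easier to inspect than a whole-file match.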

              Note that Notepad++ can’t directly give you a list of files. It can only give you a list of matches, which includes the filenames but also has more information about the matches.

              • Ekopalypse @Alan Kilborn

                @Alan-Kilborn

                may I ask you, where do you see the advantage of using alternations versus lookaheads?

                • Alan Kilborn @Ekopalypse

                  @Ekopalypse said:

                  where do you see the advantage of using alternations versus lookaheads?

                  I suppose for the current case of the OP, it doesn’t matter, but if I were doing it, I suspect I might like to see the range where my match was found, in certain instances. The lookahead approach selects as a match the entire file contents. BTW, I’m always nervous when the regex engine causes an entire file contents match. It makes me think it has failed in a big way…see here.

                  If the 2 words need to occur on a single line (not the OP’s case!), I am not reluctant to use the lookahead approach, the classic example of which is here. I always remember that one by recalling it is the “jack” approach. :)

                  • Ekopalypse @Alan Kilborn

                    @Alan-Kilborn

                    thank you very much. I guess I understood :-)

                    • Ronny Kerk @Alan Kilborn

                      @Alan-Kilborn said:

                      @Ronny-Kerk

                      I would suggest this:

                      Find: (?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)
                      Search mode: Regular expression

                      Hello Alan,
                      this is the solution. It works like it should. Thanks for your help.

                      • guy038

                        Hello, @ronny-kerk, @andrecool-68, @meta-chuh, @ekopalypse, @alan-kilborn and All,

                        Here is a general method to list all files which contain word1 AND word2 AND word3 AND … wordN. The advantage of this solution is that it should be fast enough and that you do not need to worry about regex problems, such as the use of the (?s) syntax, look-arounds, and the order of the different words to match :-))

                        In addition, even if you only wanted to look for 3 expressions simultaneously with a regex, you would have to test all the different orderings below:

                        Word3........Word1..........Word2
                        Word3........Word2..........Word1
                        Word1........Word3..........Word2
                        Word2........Word3..........Word1
                        Word1........Word2..........Word3
                        Word2........Word1..........Word3

                        Rather tedious, isn’t it?
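To put a number on it: N words have N! possible orderings, so 6 for three words and 24 for four. A minimal Python sketch that builds one alternation branch per ordering makes the growth visible. The helper name `any_order_regex` is hypothetical, and Python's `re` stands in for the Notepad++ engine.

```python
import re
from itertools import permutations
from math import factorial

def any_order_regex(words):
    """Hypothetical helper: one alternation branch per ordering of the words."""
    branches = [
        r".*?".join(r"\b" + re.escape(w) + r"\b" for w in order)
        for order in permutations(words)
    ]
    # (?si): '.' crosses line breaks and matching ignores case.
    return re.compile("(?si)" + "|".join("(?:" + b + ")" for b in branches))

words = ["Word1", "Word2", "Word3"]
pattern = any_order_regex(words)

# The number of branches equals N! for N words.
print(factorial(len(words)))

# All three words present, in the order Word3, Word1, Word2.
print(bool(pattern.search("x word3 y\nword1 z\nword2")))
```

Generating the regex works, but the pattern length explodes quickly, which is the motivation for the regex-free counting method that follows.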


                        So, in short, the different steps of this general method are:

                        • Search, in Normal mode, for each expression word1, word2, …, wordN, accumulating the successive outputs in the Find result panel

                        • Paste the whole contents of the Find result panel into a new tab

                        • Use a first regex S/R to keep only the absolute pathnames

                        • Sort these pathnames alphabetically

                        • Use a second regex S/R to mark the pathnames which are present N times

                        • Use a third regex S/R to delete all the other pathnames, i.e. those of files which do not contain all N words


                        OK, let’s go :

                        • Open the Find ( Ctrl + F ) or the Find in Files dialog ( Ctrl + Shift + F )

                        • Search, successively, for the expressions word1, word2 … wordN

                        • Tick, if necessary, the Match whole word only and/or the Match case options

                        • Tick the Wrap around option

                        • Select, preferably, the Normal search mode

                        • Click, either, on the Find All in All Opened Documents or the Find All button

                        => After the N consecutive searches, you’ll get N searches in the Find result panel


                        • In the Find result panel, select all the text ( Ctrl + A ) and copy it to the clipboard ( Ctrl + C )

                        • Open a new tab ( Ctrl + N ) and paste the clipboard’s contents ( Ctrl + V )

                        • Open the Replace dialog ( Ctrl + H )

                        • Perform the following regex S/R, to keep, only, the different absolute pathnames

                        SEARCH (?-is)^(\t|Search).+\R|\x20\(\d+\x20hits?\)$

                        REPLACE Leave EMPTY

                        • Tick the Wrap around option

                        • Select the Regular expression search mode

                        • Click on the Replace All button
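A minimal sketch of what this first S/R does, ported to Python's `re`. Several things here are assumptions: the sample only approximates the layout of the Find result panel (which varies between Notepad++ versions), the paths are invented, `\n` stands in for `\R` (which Python's `re` does not provide), and the leading (?-is) is omitted because case-sensitive, non-dotall matching is already Python's default.

```python
import re

# Hypothetical copy of the Find result panel after two searches.
# Header lines start with "Search", hit lines start with a tab.
sample = (
    'Search "word1" (2 hits in 2 files)\n'
    '  C:\\docs\\a.txt (1 hit)\n'
    '\tLine 3: some word1 here\n'
    '  C:\\docs\\b.txt (1 hit)\n'
    '\tLine 7: word1 again\n'
    'Search "word2" (1 hit in 1 file)\n'
    '  C:\\docs\\a.txt (1 hit)\n'
    '\tLine 9: word2 here\n'
)

# First branch: delete whole header and hit lines.
# Second branch: strip the trailing " (n hits)" from pathname lines.
first = re.compile(r"^(\t|Search).+\n|\x20\(\d+\x20hits?\)$", re.MULTILINE)

paths_only = first.sub("", sample)
print(paths_only)
```

Only the pathname lines remain, one per search in which the file had a hit, so a file containing both words is listed twice.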

                        • Now, let’s sort that text, with the option Edit > Line Operations > Sort Lines Lexicographically Ascending

                        • Add a manual line-break at the very end of that sorted list ( IMPORTANT ), so that the last pathname also ends with a line-break, which the next two regexes rely on


                        • Perform this second regex S/R, to mark only the pathnames which are present N times

                        SEARCH (^.+\R)\1{N-1} , where N represents the number of the searched expressions

                        REPLACE \1\r\n ( or \1\n if Unix files )

                        • Tick the Wrap around option

                        • Click on the Replace All button

                        So, for a search of any file, containing 4 expressions/words, just use the search regex (^.+\R)\1{3}
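The back-reference trick can be sketched in Python as follows, assuming N = 2 and an invented, already sorted list. Again `\n` replaces `\R`, and the paths are hypothetical.

```python
import re

N = 2  # number of searched expressions (assumed for this example)

# Hypothetical sorted list, ending with a line break as required above.
sorted_paths = (
    "  C:\\docs\\a.txt\n"
    "  C:\\docs\\a.txt\n"
    "  C:\\docs\\b.txt\n"
)

# Port of (^.+\R)\1{N-1}: a line followed by N-1 exact copies of itself.
second = re.compile(r"(^.+\n)\1{%d}" % (N - 1), re.MULTILINE)

# Keep a single copy of the line and add an empty line after it as a marker.
marked = second.sub(r"\1\n", sorted_paths)
print(repr(marked))
```

After this step, a pathname followed by an empty line is one that appeared N times; every other pathname is followed directly by the next one.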


                        • Finally, using the final regex S/R, below, you’ll obtain the expected list, after suppression of the unwanted pathnames, and line-breaks :

                        SEARCH ^.+\R(?!\R)|\R(?=\R)

                        REPLACE Leave EMPTY

                        • Tick the Wrap around option

                        • Click on the Replace All button

                        You’ll get the list of all the absolute pathnames of files containing, at least once, all the words word1, word2 … wordN, in any order!
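A sketch of the final S/R in Python, with a cross-check that simply counts the pathnames using `collections.Counter`. The counting is not part of the method above; it is only swapped in here to confirm the regex result. N = 2, the marked text, and the paths are assumptions for illustration, and `\n` replaces `\R`.

```python
import re
from collections import Counter

N = 2  # number of searched expressions (assumed for this example)

# Hypothetical text after the second S/R: a pathname found N times
# is followed by an empty line, the others are not.
marked = "  C:\\docs\\a.txt\n\n  C:\\docs\\b.txt\n"

# First branch: delete lines NOT followed by an empty line.
# Second branch: delete the extra line break that served as marker.
third = re.compile(r"^.+\n(?!\n)|\n(?=\n)", re.MULTILINE)
kept = third.sub("", marked)
print(kept)

# Cross-check: count each pathname in the sorted list directly.
sorted_paths = ["  C:\\docs\\a.txt", "  C:\\docs\\a.txt", "  C:\\docs\\b.txt"]
expected = [p for p, c in Counter(sorted_paths).items() if c == N]
print(kept.splitlines() == expected)
```

Both routes leave only the pathname that showed up in every search, i.e. the file containing all N words.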


                        Of course, you may search for expressions more complicated than simple words, using the Regular expression search mode !

                        Best Regards,

                        guy038
