Find only files with exact two words
-
I have a lot of different files. Notepad++ should find only the files which include both of the words I’m searching for.
The two words are not located on the same line, just somewhere in the same file. How can I do this?
-
There is the Find in Files search. Does that not work?
-
Maybe this works, but I would like to do it in Notepad++. I used it in Notepad++ a few months ago, but I’ve forgotten what I have to put in the search field. I tried some expressions I found via Google, but nothing works the way I need it to.
They also find files which include only one of the words, but I need to find files that contain both words.
-
welcome to the notepad++ community, @Ronny-Kerk
our regex specialists are currently offline, and i’m only at janitor level for regex, but here’s something you could try:
open up find in files and enter:
find what: (?=.*word1)(?=.*word2)
directory: your desired path
search mode: regular expression
and hit find all
-
Thanks for your answer.
I’ve tried it, but it does not work. It finds the two words I’m looking for, but only within one big section, not across the whole file.
My files (ca. 1,500) are filled with many words, and most have over 1,000 lines. Now I want to give Notepad++ two or maybe more words to look for. For example, “Ronny Kerk” and “1982” are the words I’m looking for. Notepad++ should then show me all the files that contain both of these search criteria.
-
I’m not promoted to regex expert yet, but what about using something like
(?s)(?=.*1982)(?=.*Ronny Kerk).*
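To illustrate why the (?s) prefix matters here, the following is a minimal sketch using Python's `re` module rather than Notepad++'s own regex engine; the two engines differ in places, but in both a plain `.` stops at a line break unless told otherwise (in Notepad++ the ". matches newline" checkbox has the same effect as (?s)). The sample file contents are made up for the illustration.

```python
import re

# Hypothetical file contents: both search terms share a line in the first
# sample, but sit on different lines in the second.
same_line = "born 1982, name Ronny Kerk\nother text\n"
split_lines = "name Ronny Kerk\nother text\nborn 1982\n"

# Without (?s), "." stops at a line break, so both lookaheads have to
# succeed from a position inside one single line.
per_line = re.compile(r"(?=.*1982)(?=.*Ronny Kerk)")

# With (?s), "." also matches line breaks, so each lookahead can scan
# the rest of the file and the trailing .* consumes everything.
whole_file = re.compile(r"(?s)(?=.*1982)(?=.*Ronny Kerk).*")

found_same_line = per_line.search(same_line) is not None
found_split_without_s = per_line.search(split_lines) is not None
match_with_s = whole_file.search(split_lines)

print(found_same_line)                       # True
print(found_split_without_s)                 # False
print(match_with_s.group(0) == split_lines)  # True: the match is the whole file
```

The last line also shows the side effect discussed in this thread: with the lookahead form, the reported match is the entire file content.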
-
I would suggest this:
Find: (?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)
Search mode: Regular expression
The \b are there to enforce word boundaries; remove them if not desired. Also, this will find word1 and word2 in either order, and without regard to case.
So basically this:
I’ve tried it but it does not work. It finds the two words i’m looking for, but just in a big section. Not in the whole file
doesn’t make a lot of sense. How can it not work but yet find the 2 words you want? Can you explain more about what you expect versus what happens?
Note that Notepad++ can’t directly give you a list of files. It can only give you a list of matches, which includes the filenames but also has more information about the matches.
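As a rough illustration of how the alternation behaves, here is a small sketch in Python's `re` module (an assumption for demonstration purposes; Notepad++ uses a different regex engine, though these particular constructs behave the same way in both). The sample texts are invented.

```python
import re

# The regex from the post above, unchanged.
pattern = re.compile(r"(?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)")

# Hypothetical file contents.
both = "intro\nWORD2 appears first\nfiller\nthen word1 here\noutro\n"
only_one = "only word1 here\n"
glued = "sword1 and word2\n"  # "word1" is part of a longer word here

m = pattern.search(both)
matched_range = m.group(0)

print(repr(matched_range))        # 'WORD2 appears first\nfiller\nthen word1'
print(pattern.search(only_one))   # None: the second word is missing
print(pattern.search(glued))      # None: \b rejects "sword1"
```

The match covers only the range from the first of the two words to the other one, not the whole file, and it is found regardless of order or case.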
-
may I ask you, where do you see the advantage of using alternations versus lookaheads?
-
@Ekopalypse said:
where do you see the advantage of using alternations versus lookaheads?
I suppose for the current case of the OP, it doesn’t matter, but if I were doing it, I suspect I might like to see the range where my match was found, in certain instances. The lookahead approach selects as a match the entire file contents. BTW, I’m always nervous when the regex engine causes an entire file contents match. It makes me think it has failed in a big way…see here.
If the 2 words need to occur on a single line (not the OP’s case!), I am not reluctant to use the lookahead approach, the classic example of which is here. I always remember that one by recalling it is the “jack” approach. :)
-
thank you very much. I guess I understood :-)
-
@Alan-Kilborn said:
I would suggest this:
Find: (?si)(\bword1\b.*?\bword2\b)|(\bword2\b.*?\bword1\b)
Search mode: Regular expression
Hello Alan,
this is the solution. It works like it should. Thanks for your help.
-
Hello, @ronny-kerk, @andrecool-68, @meta-chuh, @ekopalypse, @alan-kilborn and All,
Here is a general method to list all files which contain word1 AND word2 AND word3 AND … wordN. The advantage of this solution is that it should be fast enough and that you do not need to worry about regex problems, such as the use of the (?s) syntax, look-arounds, and the order of the different words to match :-))
In addition, if you were to look for 3 expressions simultaneously with a regex, you would have to test all the different orderings below:
Word3........Word1..........Word2
Word3........Word2..........Word1
Word1........Word3..........Word2
Word2........Word3..........Word1
Word1........Word2..........Word3
Word2........Word1..........Word3
Rather tedious, isn’t it?
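To make the growth in orderings concrete, here is a small sketch that builds such an any-order regex mechanically. It is written in Python as an illustration only; the helper name `any_order_regex` is hypothetical, and the \b word boundaries and (?si) flags are borrowed from the two-word regex earlier in the thread.

```python
import re
from itertools import permutations
from math import factorial

def any_order_regex(words):
    """Build one alternation branch per possible ordering of the words."""
    branches = []
    for order in permutations(words):
        # Join the words of this ordering with a lazy "anything in between".
        branches.append(r".*?".join(r"\b" + re.escape(w) + r"\b" for w in order))
    return "(?si)" + "|".join("(" + b + ")" for b in branches)

words = ["Word1", "Word2", "Word3"]
regex = any_order_regex(words)
branch_count = len(list(permutations(words)))

print(branch_count)   # 6 orderings for 3 words
print(factorial(4))   # 24 orderings for 4 words

# Hypothetical file contents.
has_all = "x Word3 \n y word1 \n z WORD2"
has_two = "Word1 and Word2 only"
print(re.search(regex, has_all) is not None)  # True
print(re.search(regex, has_two) is not None)  # False
```

The number of branches is N!, which is why a single regex becomes impractical as the number of words grows.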
So, in short, the different steps of that general method are:
- Search, in Normal mode, for each expression word1, word2, …, wordN, with successive outputs in the Find result panel
- Paste all the contents of the Find result panel into a new tab
- Use a first regex S/R to keep only the absolute pathnames
- Sort these pathnames alphabetically
- Use a second regex S/R to isolate the pathnames which are present N times
- Use a third regex S/R to delete all the other pathnames, which do not contain the N words simultaneously
OK, let’s go:
- Open the Find (Ctrl + F) or the Find in Files dialog (Ctrl + Shift + F)
- Search, successively, for the expressions word1, word2 … wordN
- Tick, if necessary, the Match whole word only and/or the Match case options
- Tick the Wrap around option
- Select, preferably, the Normal search mode
- Click either on the Find All in All Opened Documents or the Find All button
=> After the N consecutive searches, you’ll get N searches in the Find result panel
- In the Find result panel, select all the text (Ctrl + A) and copy it to the clipboard (Ctrl + C)
- Open a new tab (Ctrl + N) and paste the clipboard’s contents (Ctrl + V)
- Open the Replace dialog (Ctrl + H)
- Perform the following regex S/R, to keep only the different absolute pathnames
SEARCH (?-is)^(\t|Search).+\R|\x20\(\d+\x20hits?\)$
REPLACE Leave EMPTY
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button
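The effect of this first S/R can be sketched in Python. Two things here are assumptions: Python's `re` module supports neither `\R` nor a leading `(?-is)`, so the sketch uses `\n` and simply omits the flag group, and the sample text is a simplified, made-up imitation of a Find result panel (a "Search" header, one line per file ending in a hit count, and tab-indented match lines), not an exact copy of what Notepad++ produces.

```python
import re

# Simplified, hypothetical Find result content for one search.
find_result = "\n".join([
    'Search "word1" (3 hits in 2 files)',
    r"C:\docs\a.txt (2 hits)",
    "\tLine 1: word1 first",
    "\tLine 7: again word1",
    r"C:\docs\b.txt (1 hit)",
    "\tLine 3: word1",
]) + "\n"

# Python rendering of the SEARCH regex above (\R replaced by \n).
# Branch 1 deletes whole lines starting with a tab or with "Search";
# branch 2 deletes the trailing " (N hits)" on the remaining file lines.
first_regex = re.compile(r"^(?:\t|Search).+\n|\x20\(\d+\x20hits?\)$", re.MULTILINE)

pathnames = first_regex.sub("", find_result)
print(pathnames)
```

Only the two pathname lines remain, one per file that contained the searched word.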
- Now, let’s sort that text with the option Edit > Line Operations > Sort Lines Lexicographically Ascending
- Add a manual line-break at the very end of that sorted list ( IMPORTANT )
- Perform this second regex S/R, to set apart the pathnames which are present N times
SEARCH (^.+\R)\1{N-1} , where N represents the number of searched expressions
REPLACE \1\r\n ( or \1\n if Unix files )
- Tick the Wrap around option
- Click on the Replace All button
So, for a search of any file containing 4 expressions/words, just use the search regex (^.+\R)\1{3}
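A short sketch of what this second S/R does, again in Python as an assumption for illustration (`\R` is written as `\n`, and the file names are invented). It also shows why the trailing line-break is marked IMPORTANT: without it, the last pathname can never be matched as a complete line.

```python
import re

n = 2  # number of searched expressions (hypothetical value)

# Sorted pathnames, one per line, as left by the previous steps.
sorted_paths = "a.txt\na.txt\nb.txt\nc.txt\nc.txt\n"

# (^.+\n)\1{N-1}: a line followed by N-1 identical copies of itself.
second_regex = re.compile(r"(^.+\n)\1{%d}" % (n - 1), re.MULTILINE)

# The N copies collapse to one copy followed by an extra blank line.
marked = second_regex.sub(r"\1\n", sorted_paths)
print(repr(marked))       # 'a.txt\n\nb.txt\nc.txt\n\n'

# Without the final line-break, the last group of duplicates is missed.
no_final_break = "a.txt\na.txt\nc.txt\nc.txt"
missed = second_regex.sub(r"\1\n", no_final_break)
print(repr(missed))       # 'a.txt\n\nc.txt\nc.txt'
```

After this step, every pathname that occurred N times is followed by an empty line, while the others are left untouched.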
- Finally, using the final regex S/R below, you’ll obtain the expected list, after removal of the unwanted pathnames and line-breaks:
SEARCH ^.+\R(?!\R)|\R(?=\R)
REPLACE Leave EMPTY
- Tick the Wrap around option
- Click on the Replace All button
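The final clean-up can be sketched the same way. This is a Python approximation under the same assumptions as before (`\n` instead of `\R`, invented file names), followed by a cross-check that simply counts occurrences with `collections.Counter`, which is a different technique from the regex but should yield the same list.

```python
import re
from collections import Counter

n = 2  # number of searched expressions (hypothetical value)

# Text after the second S/R: kept pathnames are followed by a blank line.
marked = "a.txt\n\nb.txt\nc.txt\n\n"

# Branch 1 deletes any line NOT followed by a blank line;
# branch 2 deletes the first of two consecutive line-breaks.
third_regex = re.compile(r"^.+\n(?!\n)|\n(?=\n)", re.MULTILINE)
final_list = third_regex.sub("", marked)
print(repr(final_list))   # 'a.txt\nc.txt\n'

# Cross-check by counting the sorted pathnames directly.
pathnames = ["a.txt", "a.txt", "b.txt", "c.txt", "c.txt"]
counted = sorted(p for p, k in Counter(pathnames).items() if k == n)
print(counted)            # ['a.txt', 'c.txt']
```

Both approaches keep exactly the pathnames that appeared once for each of the N searches.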
You’ll get the list of all the absolute pathnames of files containing, at least once, all the words word1, word2 … wordN, in any order!
Of course, you may search for expressions more complicated than simple words, using the Regular expression search mode!
Best Regards,
guy038
-