<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[finding files using reg-ex]]></title><description><![CDATA[<p dir="auto">Hi,<br />
I want to find files in a directory that contain two (or more) specific words. Files containing word1 OR word2 are returned using | but how can I find files that contain word1 AND word2 ? I tried (word1)*.(word2) but that didn’t work.<br />
Thanks for your help.</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/20388/finding-files-using-reg-ex</link><generator>RSS for Node</generator><lastBuildDate>Tue, 09 Jun 2026 11:51:08 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/20388.rss" rel="self" type="application/rss+xml"/><pubDate>Fri, 27 Nov 2020 13:03:12 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to finding files using reg-ex on Sat, 28 Nov 2020 00:21:50 GMT]]></title><description><![CDATA[<p dir="auto">Another variant on this general theme that is handy is finding two words, in either order, with a certain degree of proximity.  This way I can find, in my notes, words that I may not know the exact phrasing for, but that I know are going to be there, and be close to each other, when I need to look up something.</p>
<p dir="auto">So, say I want to find <code>foo</code> close to <code>bar</code>, say within 50 characters.  Maybe <code>bar</code> occurs before <code>foo</code>, but maybe not.  Here’s what I’d search for:</p>
<p dir="auto"><code>(?s)(foo)(.{0,50}?)(bar)|(?3)(?2)(?1)</code></p>
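<p dir="auto">For anyone who wants to try the proximity idea outside Notepad++, here is a minimal Python sketch. It is only an approximation: Python's <code>re</code> module has no <code>(?1)</code>-style subroutine calls, so both orders are spelled out, and the helper name <code>near</code> and its <code>max_gap</code> parameter are made up for illustration.</p>

```python
import re

def near(text, a, b, max_gap=50):
    """Return the first stretch of text where a and b occur within
    max_gap characters of each other, in either order, or None."""
    a_re, b_re = re.escape(a), re.escape(b)
    gap = ".{0,%d}?" % max_gap
    # (?s) lets the dot cross line ends; both orders are written explicitly
    # because Python's re cannot re-run a group with (?1), (?2), ...
    pattern = re.compile("(?s)" + a_re + gap + b_re + "|" + b_re + gap + a_re)
    m = pattern.search(text)
    return m.group(0) if m else None

notes = "use bar here,\nthen call foo later"
close_hit = near(notes, "foo", "bar")            # gap of 17 characters fits in 50
too_far = near(notes, "foo", "bar", max_gap=5)   # gap of 17 characters does not fit in 5
print(repr(close_hit))
print(repr(too_far))
```

<p dir="auto">The lazy quantifier keeps the reported stretch as short as possible, which matters mostly when the match is going to be displayed or replaced.</p>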
]]></description><link>https://community.notepad-plus-plus.org/post/60219</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60219</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Sat, 28 Nov 2020 00:21:50 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 22:11:09 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a></p>
<p dir="auto">Hi Guy, yes, here’s what happened when I answered the question originally:</p>
<p dir="auto">I looked in my file of notes and I saw this example:</p>
<p dir="auto"><code>(?-s)(?=.*foo)(?=.*bar).*</code></p>
<p dir="auto">So I copied that example in my response above, and blindly changed the leading <code>(?-s)</code> to <code>(?s)</code>, after some quick testing on small data.</p>
<p dir="auto">After the OP had problems with that, I looked a bit farther down in my notes file and found the note to use this one when the data is not necessarily on the same line:</p>
<p dir="auto"><code>(?s)(foo).+?(bar)|(?2).+?(?1)</code></p>
<p dir="auto">So the conclusion I draw, is that it is great to have “notes”, but it is also smart to really read them before just grabbing a snippet and changing it even slightly, to then offer it as advice.  :-(</p>
]]></description><link>https://community.notepad-plus-plus.org/post/60217</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60217</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Fri, 27 Nov 2020 22:11:09 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 21:04:47 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="/user/dieter-zweigel" aria-label="Profile: dieter-zweigel">@<bdi>dieter-zweigel</bdi></a> , <a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: alan-kilborn">@<bdi>alan-kilborn</bdi></a> and <strong>All</strong>,</p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/dieter-zweigel" aria-label="Profile: dieter-zweigel">@<bdi>dieter-zweigel</bdi></a>, you said :</p>
<blockquote>
<p dir="auto">I have to admit that I do not fully understand the expressions.</p>
</blockquote>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: alan-kilborn">@<bdi>alan-kilborn</bdi></a>’s regex <strong><code>(?s)(?=.*word1)(?=.*word2).*</code></strong> may be described as :</p>
<ul>
<li>
<p dir="auto">First, the <strong><code>(?s)</code></strong> syntax means that the regex <strong>dot</strong> symbol <strong><code>.</code></strong> represents any <strong>single</strong> character, even <strong>EOL</strong> chars !</p>
</li>
<li>
<p dir="auto">Then come two <strong>positive look-ahead</strong> structures <strong><code>(?=.....)</code></strong>, which <strong>test</strong> whether the regex after the <strong><code>=</code></strong> sign can <strong>match</strong> at the current position, without consuming any text :</p>
<ul>
<li>
<p dir="auto">From the <strong>beginning</strong> of the file, is there, further on, a string <strong><code>word1</code></strong>, after the <strong>longest</strong> possible range, possibly <strong>empty</strong>, of any characters ?</p>
</li>
<li>
<p dir="auto">After this <strong>first</strong> step, it’s important to understand that processing the first <strong>look-ahead</strong> <strong><code>(?=.*word1)</code></strong> has <strong>not</strong> changed the regex engine search <strong>position</strong> which is, still, at the <strong>very beginning</strong> of file !</p>
</li>
<li>
<p dir="auto">So, again from the <strong>beginning</strong> of the file, is there, further on, a string <strong><code>word2</code></strong>, after the <strong>longest</strong> possible range, possibly <strong>empty</strong>, of any characters ?</p>
</li>
</ul>
</li>
<li>
<p dir="auto">If the answer to these <strong>two</strong> questions is <strong><code>yes</code></strong>, then the regex engine matches, again from the <strong>very beginning</strong>, <strong>all</strong> the file contents <strong><code>.*</code></strong> . However, note that, when the <strong><code>Find Result</code></strong> panel is involved, only the <strong>first physical</strong> line of each file, globally seen as a <strong>single</strong> line, is displayed. A safe behavior in case of <strong>huge</strong> files ;-))</p>
</li>
</ul>
<p dir="auto">And to search for files containing, at <strong>least</strong>, <strong><code>1</code></strong> string <strong><code>word1</code></strong> OR <strong><code>1</code></strong> string <strong><code>word2</code></strong>, use this regex, with an <strong>alternative</strong> located inside the <strong>look-ahead</strong> :</p>
<p dir="auto"><strong><code>(?s)(?=.*(word1|word2)).*</code></strong></p>
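<p dir="auto">The AND and OR look-ahead forms above can be tried on a few in-memory strings with Python's <code>re</code> module. This is a sketch under assumptions, not what Notepad++ runs internally: the sample names and texts are invented, <code>match()</code> is used so the test is only made from the start of the text, and the trailing <code>.*</code> is dropped because only a yes/no answer is needed here.</p>

```python
import re

# AND: two look-aheads, both evaluated from the same starting position.
both = re.compile(r"(?s)(?=.*word1)(?=.*word2)")
# OR: one look-ahead with an alternation inside.
either = re.compile(r"(?s)(?=.*(?:word1|word2))")

# Hypothetical file contents, keyed by a made-up file name.
samples = {
    "a.txt": "word2 first\nthen word1",
    "b.txt": "only word1 here",
    "c.txt": "neither of them",
}

# match() anchors at position 0, like the "from beginning of file" description.
has_both = sorted(name for name, text in samples.items() if both.match(text))
has_either = sorted(name for name, text in samples.items() if either.match(text))

print(has_both)    # ['a.txt']
print(has_either)  # ['a.txt', 'b.txt']
```

<p dir="auto">Because a look-ahead does not move the search position, the order of <code>word1</code> and <code>word2</code> in the text does not matter for the AND form.</p>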
<hr />
<p dir="auto">Now, <strong>Alan</strong>, I did some tests with the <strong>simpler</strong> regex <strong><code>(?s)(?=.*AAA).*</code></strong> against the well-known <strong><code>license.txt</code></strong> file. This regex should <strong>select</strong> all file contents if the string <strong><code>AAA</code></strong>, whatever its case, <strong>exists</strong> and should <strong>beep</strong> if no string <strong><code>AAA</code></strong> is found.</p>
<p dir="auto">Unfortunately, I noticed that the search <strong>failed</strong> and selected all file contents, although this file obviously does <strong>not</strong> contain the <strong><code>AAA</code></strong> string. I then shortened this file, and the regex seems to work only up to a file size of about <strong><code>13.5 kB</code></strong>, with the <strong>expected</strong> message <strong><code>Find: Can't find the text "(?s)(?=.*AAA).*"</code></strong>. Probably my <strong>weak</strong> configuration corrupts the <strong>correct</strong> results. Just test it on various files. The problem occurs when <strong>no</strong> match can be found !</p>
<p dir="auto">It’s worth adding that this kind of regex would work <strong>correctly</strong> if we were searching for <strong><code>word1</code></strong> and <strong><code>word2</code></strong> in <strong>each</strong> line of a file and <strong>not</strong> in <strong>all</strong> file contents, with the regex <strong><code>(?-s)(?=.*word1)(?=.*word2).+</code></strong> ;-))</p>
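<p dir="auto">A rough Python equivalent of the line-by-line variant is shown below. It is a sketch with assumptions: Python's <code>re</code> has no global <code>(?-s)</code> switch, but its dot already stops at line ends by default, and the <code>^</code> anchor with <code>re.MULTILINE</code> is added here so that each line is tested from its start. The sample text is invented.</p>

```python
import re

# Default mode: the dot does not match a newline, so each look-ahead
# can only see the rest of the current line.
same_line = re.compile(r"^(?=.*word1)(?=.*word2).+", re.MULTILINE)

text = (
    "word1 alone\n"
    "word2 then word1\n"
    "word2 alone\n"
    "word1 and word2"
)

# findall() returns the whole match because the pattern has no capturing group.
hits = same_line.findall(text)
print(hits)  # ['word2 then word1', 'word1 and word2']
```

<p dir="auto">Since every attempt is limited to a single line, the amount of text each look-ahead has to scan stays small, whatever the size of the file.</p>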
<hr />
<p dir="auto">So, <a class="plugin-mentions-user plugin-mentions-a" href="/user/dieter-zweigel" aria-label="Profile: dieter-zweigel">@<bdi>dieter-zweigel</bdi></a>, I would advise you to use, preferably, the <strong>second</strong> <a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: alan-kilborn">@<bdi>alan-kilborn</bdi></a> regex syntax, which is much <strong>faster</strong> and does <strong>not</strong> report <strong>wrong</strong> matches.</p>
<p dir="auto">To end with, <a class="plugin-mentions-user plugin-mentions-a" href="/user/dieter-zweigel" aria-label="Profile: dieter-zweigel">@<bdi>dieter-zweigel</bdi></a>, note that this regex <strong><code>(?s)(word1).+?(word2)|(?2).+?(?1)</code></strong> is a <strong>shortened</strong> syntax for :</p>
<p dir="auto"><strong><code>(?s)(word1).+?(word2)|(word2).+?(word1)</code></strong>. This form is <strong>easier</strong> to understand and almost <strong>obvious</strong>. Indeed, we are looking for a text :</p>
<ul>
<li>Containing the string <strong><code>word1</code></strong> and, further on, the string <strong><code>word2</code></strong></li>
</ul>
<p dir="auto">OR  ( <strong><code>|</code></strong> )</p>
<ul>
<li>Containing the string <strong><code>word2</code></strong> and, further on, the string <strong><code>word1</code></strong></li>
</ul>
<p dir="auto">It’s important to realize that, although <strong><code>word1</code></strong> and <strong><code>word2</code></strong> are stored as <strong>groups</strong> <strong><code>1</code></strong> and <strong><code>2</code></strong> we <strong>cannot</strong> use the syntax <strong><code>(?s)(word1).+?(word2)|\2.*?\1</code></strong>, with <strong>back-references</strong> to these groups !</p>
<p dir="auto">Do you see why ? Well, when the <strong>first</strong> alternative is matched ( <strong><code>Word1.........Word2</code></strong> ), the <strong>back-references</strong> <strong><code>\1</code></strong> and <strong><code>\2</code></strong>, although <strong>not</strong> used, do contain the strings <strong><code>word1</code></strong> and <strong><code>word2</code></strong>. But, when the <strong>first</strong> alternative fails ( case  <strong><code>Word2........Word1</code></strong> ), the <strong>second</strong> alternative <strong><code>\2.*?\1</code></strong> is tried. However, as <strong>neither</strong> group has been set at that point, the back-references have nothing to refer to and this part of the regex <strong>cannot</strong> match.</p>
<p dir="auto">Conversely, with the <strong><code>(?1)</code></strong> and <strong><code>(?2)</code></strong> syntaxes, which are <strong>subroutine calls</strong> to the contents of <strong>groups</strong> <strong><code>1</code></strong> and <strong><code>2</code></strong>, the syntax <strong><code>(?s)(word1).+?(word2)|(?2).+?(?1)</code></strong> is correct and can match the <strong>two</strong> cases. Note that <strong>subroutine calls</strong> are really interesting when the groups themselves contain regexes, possibly <strong>complex</strong>, instead of simple strings !</p>
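<p dir="auto">The failure of the back-reference form can be reproduced with Python's <code>re</code> module, as a rough illustration only. Python is not the engine Notepad++ uses and it has no <code>(?1)</code> syntax, so the working variant below is the fully spelled-out regex; the sample sentence is invented.</p>

```python
import re

# word2 appears before word1, so only the second alternative could apply.
text = "word2 comes first, word1 follows"

# Second alternative uses back-references to groups that were never set.
with_backrefs = re.compile(r"(?s)(word1).+?(word2)|\2.*?\1")
# Second alternative repeats the sub-patterns themselves.
spelled_out = re.compile(r"(?s)(word1).+?(word2)|(word2).+?(word1)")

backref_match = with_backrefs.search(text)
expanded_match = spelled_out.search(text)

print(backref_match)            # None: \2 and \1 refer to unset groups
print(expanded_match.group(0))  # word2 comes first, word1
```

<p dir="auto">A back-reference replays the text a group has captured, while the spelled-out form (like a subroutine call) re-runs the pattern, which is why only the latter can handle the reversed order.</p>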
<p dir="auto">A simple example : given this text :</p>
<pre><code class="language-z">123---ABC---123
123---ABC---456
123---ABC---789

456---ABC---123
456---ABC---456
456---ABC---789

789---ABC---123
789---ABC---456
789---ABC---789
</code></pre>
<p dir="auto">See the difference between the regex <strong><code>(\d+)---ABC---\1</code></strong> and the regex <strong><code>(\d+)---ABC---(?1)</code></strong>, against that text :</p>
<ul>
<li>
<p dir="auto">In the former, the <strong>back-reference</strong> <strong><code>\1</code></strong> refers to the <strong>present</strong> value of the <strong>group</strong> <strong><code>1</code></strong></p>
</li>
<li>
<p dir="auto">In the latter, the <strong>subroutine call</strong> <strong><code>(?1)</code></strong> refers to regex contents of the <strong>group</strong> <strong><code>1</code></strong>, so <strong><code>\d+</code></strong></p>
</li>
</ul>
<p dir="auto">This means that the <strong>last</strong> regex is just <strong>identical</strong> to the regex <strong><code>\d+---ABC---\d+</code></strong>. Of course, a <strong>subroutine call</strong> can refer to a much more <strong>complex</strong> regex than <strong><code>\d+</code></strong> !</p>
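<p dir="auto">To check the difference on the sample text above, here is a small Python sketch. As Python's <code>re</code> does not support <code>(?1)</code>, the subroutine-call regex is replaced by its expanded equivalent <code>\d+---ABC---\d+</code>; the variable names are arbitrary.</p>

```python
import re

text = """123---ABC---123
123---ABC---456
123---ABC---789

456---ABC---123
456---ABC---456
456---ABC---789

789---ABC---123
789---ABC---456
789---ABC---789
"""

# Back-reference: the digits after ABC must equal the digits captured before it.
backref_hits = [m.group(0) for m in re.finditer(r"(\d+)---ABC---\1", text)]
# Expanded form of the subroutine call: any digits are accepted after ABC.
expanded_hits = [m.group(0) for m in re.finditer(r"\d+---ABC---\d+", text)]

print(len(backref_hits))   # 3
print(len(expanded_hits))  # 9
print(backref_hits)
```

<p dir="auto">Only the three lines with the same number on both sides survive the back-reference version, while the expanded version matches all nine lines.</p>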
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/60216</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60216</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Fri, 27 Nov 2020 21:04:47 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 16:58:11 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/dieter-zweigel" aria-label="Profile: Dieter-Zweigel">@<bdi>Dieter-Zweigel</bdi></a> said in <a href="/post/60202">finding files using reg-ex</a>:</p>
<blockquote>
<p dir="auto">Is the file size only a problem when using regular expressions?</p>
</blockquote>
<p dir="auto">File size isn’t the problem, per se.  The problem is that in a large file the two words could be far apart, causing the regex engine to have to do a lot of “work” and it can become “overloaded”.</p>
<p dir="auto">In large files where the words are close together it should be no problem; small files, obviously, should be okay as well.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/60204</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60204</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Fri, 27 Nov 2020 16:58:11 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 15:24:08 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: Alan-Kilborn">@<bdi>Alan-Kilborn</bdi></a> My original files are MS-Word .doc of a size between 40 kB and 70 kB. I would not consider these files being large - and apparently they are small enough for a normal search for only one word. Is the file size only a problem when using regular expressions?<br />
Your second regex delivers correct results on both the original doc and the testfiles (txt). I have to admit that I do not fully understand the expressions. However, I am very happy now, having two solutions for the problem. Thank you!</p>
]]></description><link>https://community.notepad-plus-plus.org/post/60202</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60202</guid><dc:creator><![CDATA[Dieter Zweigel]]></dc:creator><pubDate>Fri, 27 Nov 2020 15:24:08 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 14:31:26 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/dieter-zweigel" aria-label="Profile: Dieter-Zweigel">@<bdi>Dieter-Zweigel</bdi></a> said in <a href="/post/60198">finding files using reg-ex</a>:</p>
<blockquote>
<p dir="auto">I don’t know what went wrong with the first search on my original files</p>
</blockquote>
<p dir="auto">It could be that the regex I gave is causing an “overflow” with larger files.<br />
This is a known Notepad++ problem where, on a single file, all text is deemed a “hit” when really an error message should be displayed.<br />
With the info provided to this point, I can’t tell for certain if this is something you are encountering.  My testing of it was certainly done on very small files that I quickly made, so the large file phenomenon, if that’s truly what is happening for you, would not have happened to me.</p>
<p dir="auto">Here’s another possible regex to try:</p>
<p dir="auto"><code>(?s)(word1).+?(word2)|(?2).+?(?1)</code></p>
<p dir="auto">I’d be interested to know if you have a different experience with that one, on your original fileset.</p>
<p dir="auto">However, it seems like your immediate problem is solved with the other technique, and that is a “good thing”. :-)</p>
]]></description><link>https://community.notepad-plus-plus.org/post/60199</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60199</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Fri, 27 Nov 2020 14:31:26 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 14:05:35 GMT]]></title><description><![CDATA[<p dir="auto">I don’t know what went wrong with the first search on my original files. After that I created a test directory with some simple test files and the second search using your regex delivered the expected result.<br />
Thank you very much for the suggestion to use “find in these results”. This method is much easier (though less elegant) and works fine.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/60198</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60198</guid><dc:creator><![CDATA[Dieter Zweigel]]></dc:creator><pubDate>Fri, 27 Nov 2020 14:05:35 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 13:47:59 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/dieter-zweigel" aria-label="Profile: Dieter-Zweigel">@<bdi>Dieter-Zweigel</bdi></a> said in <a href="/post/60190">finding files using reg-ex</a>:</p>
<blockquote>
<p dir="auto">how can I find files that contain word1 AND word2</p>
</blockquote>
<p dir="auto">Another technique for this “and” problem:</p>
<p dir="auto">Did you know that you can base a second search on the results of a first search?</p>
<p dir="auto">Here’s how:</p>
<p dir="auto">After you do the <em>Find in Files</em> for “word1”, right-click in the “Find result” window and select <em>Find in these found results…</em><br />
You can then proceed to specify “word2” and the next search will be conducted only in the files found with your earlier search.<br />
The net result of the second search should be files that must contain both “word1” and “word2”.<br />
Caution: Be aware of your setting for <em>Search only in found lines</em> --for what you’ve specified you want to untick that.</p>
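<p dir="auto">The two-pass idea can also be sketched as a small Python script, for cases where the files are searched outside Notepad++. Everything here is an assumption made for illustration: the helper name <code>files_containing</code>, the plain substring test, and the three throwaway files created in a temporary directory.</p>

```python
import tempfile
from pathlib import Path

def files_containing(paths, word):
    """Keep only the paths whose text contains word (plain substring test)."""
    return [p for p in paths
            if word in p.read_text(encoding="utf-8", errors="ignore")]

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Throwaway sample files, invented for this sketch.
    (root / "a.txt").write_text("word1 and word2", encoding="utf-8")
    (root / "b.txt").write_text("word1 only", encoding="utf-8")
    (root / "c.txt").write_text("word2 only", encoding="utf-8")

    # First search over every file, second search only over the first results.
    first_pass = files_containing(sorted(root.glob("*.txt")), "word1")
    second_pass = files_containing(first_pass, "word2")
    result = [p.name for p in second_pass]

print(result)  # ['a.txt']
```

<p dir="auto">As with the menu command, the second pass looks at whole files from the first result set, not only at the lines that matched the first word.</p>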
]]></description><link>https://community.notepad-plus-plus.org/post/60196</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60196</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Fri, 27 Nov 2020 13:47:59 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 13:41:28 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/dieter-zweigel" aria-label="Profile: Dieter-Zweigel">@<bdi>Dieter-Zweigel</bdi></a></p>
<p dir="auto">Hmm, well I just tried it again to verify, and for me it found only the files that contained both words.<br />
Not sure what would be going wrong for you with it.<br />
Sorry. :-(</p>
]]></description><link>https://community.notepad-plus-plus.org/post/60195</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60195</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Fri, 27 Nov 2020 13:41:28 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 13:32:50 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: Alan-Kilborn">@<bdi>Alan-Kilborn</bdi></a> Unfortunately it does not work. The result contains all the files in the directory, independent of the occurrence of either word1 or word2.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/60194</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60194</guid><dc:creator><![CDATA[Dieter Zweigel]]></dc:creator><pubDate>Fri, 27 Nov 2020 13:32:50 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 13:29:46 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/dieter-zweigel" aria-label="Profile: Dieter-Zweigel">@<bdi>Dieter-Zweigel</bdi></a></p>
<p dir="auto">So then I think what I already provided should work fine for you.<br />
Does it?</p>
]]></description><link>https://community.notepad-plus-plus.org/post/60193</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60193</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Fri, 27 Nov 2020 13:29:46 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 13:20:43 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: Alan-Kilborn">@<bdi>Alan-Kilborn</bdi></a> No, word order is not important; the regex should find any order (and occurrence) of both words.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/60192</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60192</guid><dc:creator><![CDATA[Dieter Zweigel]]></dc:creator><pubDate>Fri, 27 Nov 2020 13:20:43 GMT</pubDate></item><item><title><![CDATA[Reply to finding files using reg-ex on Fri, 27 Nov 2020 13:17:02 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/dieter-zweigel" aria-label="Profile: Dieter-Zweigel">@<bdi>Dieter-Zweigel</bdi></a></p>
<p dir="auto">Is word order important?<br />
That is, must word1 appear before word2?</p>
<p dir="auto">Must the two words be on the same line?<br />
Or can they be anywhere in the file?</p>
<p dir="auto">Do you want to find all occurrences of this in a file?<br />
Or just the first is sufficient?</p>
<p dir="auto">Something that should work (until you “tighten up” your spec) is:</p>
<p dir="auto">find: <code>(?s)(?=.*word1)(?=.*word2).*</code></p>
]]></description><link>https://community.notepad-plus-plus.org/post/60191</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/60191</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Fri, 27 Nov 2020 13:17:02 GMT</pubDate></item></channel></rss>