<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[A new bug found 180304]]></title><description><![CDATA[<p dir="auto">Hello <a class="plugin-mentions-user plugin-mentions-a" href="/user/guy038" aria-label="Profile: guy038">@<bdi>guy038</bdi></a> , long time no see. I found a new bug today. Here’s the bug:<br />
Suppose we have the following text:</p>
<blockquote>
<p dir="auto">d<br />
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa<br />
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa</p>
<p dir="auto">aaaaaaaaaaaa</p>
</blockquote>
<p dir="auto">That is exactly 5 lines, each ending with <code>\r\n</code> except line 5:<br />
line 1: 1 letter <code>d</code><br />
line 2: 197 letters <code>a</code><br />
line 3: 823 letters <code>a</code><br />
line 4: <code>empty</code><br />
line 5: 12 letters <code>a</code></p>
<p dir="auto">The regex is <code>d(.*\R.*){1,5}c</code>. Of course, <strong>. matches newline</strong> is not checked.</p>
<p dir="auto">Obviously, no match should be found, because there is no letter <code>c</code> in the text. But in fact, when I press Find, NPP stays busy for a few seconds and then matches the whole text. Even if I insert another letter in front of <code>d</code>, it still matches the whole text.</p>
<p dir="auto">OK, two more details about this bug.</p>
<ol>
<li>If you replace the <code>\R</code> with <code>\r\n</code> in the regex, you’ll still get the same bug once you add more <code>a</code> letters to each line. So this bug is not caused by <code>\R</code>.</li>
<li>Why 197, 823 and 12? If you remove 1 <code>a</code> from any of lines 2, 3 or 5, the search correctly reports no match. However, the search time is never short (I understand this is not a very efficient regex).</li>
</ol>
<p dir="auto">I ran all the experiments on the newest release of NPP and on the <strong>François-R Boyer</strong> version (which is NPP 6.9.0 with some modifications) that you introduced in <a href="https://notepad-plus-plus.org/community/topic/12835/how-to-remove-duplicate-words-in-a-line-using-notepad" rel="nofollow ugc">this post</a> a year ago.<br />
I want to know what the problem could be. If it is because the text is too long, I believe there should be some documentation that states this limitation.<br />
Why does regex have so many bugs? Is it too difficult to implement? Is there a more stable platform for running regexes? I know there are some websites where we can run regexes, but then we cannot use the <strong>search in files</strong> functionality of NPP, and I am not sure they are bug-free.</p>
<p dir="auto">Maybe you are curious how I came across this bug. Here’s the story. I want to search for the whole article 2 key words, <code>d</code> and <code>c</code>, with the restriction that <code>c</code> is following <code>d</code> but it’s not farther than a few lines. So I used a regex like <code>d(.*\R.*){1,2}c</code>, increasing the <code>2</code> step by step. It was fine from 1 to 4, but when I searched with <code>d(.*\R.*){1,5}c</code>, it started to match the whole text.<br />
The text I searched had 2000 lines. I reduced it little by little, and finally got it down to 5 lines.</p>
<p dir="auto">Best Regards!</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/15364/a-new-bug-found-180304</link><generator>RSS for Node</generator><lastBuildDate>Fri, 08 May 2026 08:07:46 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/15364.rss" rel="self" type="application/rss+xml"/><pubDate>Sat, 03 Mar 2018 16:04:03 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to A new bug found 180304 on Sun, 04 Mar 2018 09:43:42 GMT]]></title><description><![CDATA[<p dir="auto">Thanks for the reply. And yes, it seems to be the <strong>Catastrophic Backtracking</strong> thing.  And it seems this stackoverflow exception is not caught.<br />
The regex should be <code>(?-s)d(.*\R){1,5}?.*c</code> because <code>c</code> is not always the first character of a line.<br />
I know I should avoid inefficient regexes, but showing a wrong result directly, instead of telling me that a limit was reached, is a bug.</p>
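<p dir="auto">A minimal Python sketch of that difference, under stated assumptions: Python’s <code>re</code> module has no <code>\R</code> and its dot also matches <code>\r</code>, so <code>NL</code> and <code>DOT</code> below are hand-written stand-ins, and the sample text and variable names are made up for illustration. This is not the regex engine NPP uses, so it only shows what each pattern accepts, not how NPP behaves.</p>

```python
import re

# Stand-ins (assumptions): Python's re has no \R, and its dot matches \r.
NL = r"(?:\r\n|\n|\r)"   # plays the role of \R
DOT = r"[^\r\n]"         # plays the role of the dot under (?-s)

# Variant where c must come right after a line break.
line_start_only = re.compile(rf"d(?:{DOT}*{NL}){{1,5}}c")
# Lazy line count, then c anywhere on the last line reached.
anywhere_in_line = re.compile(rf"d(?:{DOT}*{NL}){{1,5}}?{DOT}*c")

# Made-up sample: c sits in the middle of the third line.
sample = "d\r\nfoo\r\nbar c baz"

strict = line_start_only.search(sample)
relaxed = anywhere_in_line.search(sample)
print(strict)
print(repr(relaxed.group(0)))
```

<p dir="auto">On this sample the first pattern finds nothing, while the second one stops at the <code>c</code> in the middle of the third line.</p>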
]]></description><link>https://community.notepad-plus-plus.org/post/30718</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/30718</guid><dc:creator><![CDATA[古旮]]></dc:creator><pubDate>Sun, 04 Mar 2018 09:43:42 GMT</pubDate></item><item><title><![CDATA[Reply to A new bug found 180304 on Sun, 04 Mar 2018 15:44:31 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="/user/%E5%8F%A4%E6%97%AE" aria-label="Profile: 古旮">@<bdi>古旮</bdi></a> and <strong>all</strong></p>
<p dir="auto">First of all, just read this <strong>short</strong> reply to <a class="plugin-mentions-user plugin-mentions-a" href="/user/marc-lalonde" aria-label="Profile: marc-lalonde">@<bdi>marc-lalonde</bdi></a>, below :</p>
<p dir="auto"><a href="https://notepad-plus-plus.org/community/topic/15247/replacing-duped-words-across-a-block-block-of-text-respecting/23" rel="nofollow ugc">https://notepad-plus-plus.org/community/topic/15247/replacing-duped-words-across-a-block-block-of-text-respecting/23</a></p>
<p dir="auto">Obviously, in your example, there is no <strong>recursion</strong> feature, nor <strong>big</strong> amounts of text ! But, I think that all the <strong>trouble</strong> comes from the <strong><code>.*\R.*</code></strong> syntax</p>
<p dir="auto">I managed to <strong>simplify</strong> the problem ! Let’s start with the following text :</p>
<p dir="auto">Line 1 : <strong><code>1</code></strong> letter <strong><code>d</code></strong>, with its <strong>line-break</strong><br />
Line 2 : <strong><code>14500</code></strong> letters <strong><code>a</code></strong>, with its <strong>line-break</strong>, too</p>
<p dir="auto">And let’s discuss this <strong>similar</strong> regex :<strong><code>d(.*\R.*){1,2}c</code></strong></p>
<p dir="auto">Allow me to use the <strong><code>(?-s)</code></strong> modifier to ensure that the <strong>dot</strong> matches standard chars, only ! Then, my regex <strong><code>d(.*\R.*){1,2}c</code></strong> can be re-written :</p>
<p dir="auto"><strong><code>(?-s)d(.*\R.*)(.*\R.*)c</code></strong> ( Regex <strong>R2</strong> )</p>
<p dir="auto">And, if <strong>NO</strong> match can be found, the regex engine goes on and, then, <strong>tries</strong> the regex :</p>
<p dir="auto"><strong><code>(?-s)d(.*\R.*)c</code></strong> ( Regex <strong>R1</strong> )</p>
<p dir="auto">Finally, if a match still <strong>cannot</strong> be found, the <strong>Find</strong> dialog displays the message <strong>Find: Can’t find the text “(?-s)d(.*\R.*){1,2}c”</strong></p>
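<p dir="auto">A small Python sketch of this “try <strong>R2</strong>, then <strong>R1</strong>” reading, under stated assumptions: Python’s <code>re</code> has no <code>\R</code> and its dot also matches <code>\r</code>, so <code>NL</code> and <code>DOT</code> are hand-written stand-ins, and the three sample strings are made up. It only checks that the quantified form and the explicit alternation agree on these samples; a backtracking engine interleaves the two cases in general, so the rewrite should be read as a simplification.</p>

```python
import re

# Stand-ins (assumptions): Python's re has no \R, and its dot matches \r.
NL = r"(?:\r\n|\n|\r)"   # plays the role of \R
DOT = r"[^\r\n]"         # plays the role of the dot under (?-s)
BLOCK = rf"(?:{DOT}*{NL}{DOT}*)"   # one ".*\R.*" block

# The quantified regex, and the hand-expanded "R2 first, then R1" form.
quantified = re.compile(rf"d{BLOCK}{{1,2}}c")
r2_then_r1 = re.compile(rf"d{BLOCK}{BLOCK}c|d{BLOCK}c")

def span(pattern, text):
    """Return the (start, end) of the first match, or None."""
    found = pattern.search(text)
    return found.span() if found else None

# Made-up samples: two blocks needed, one block needed, no c at all.
samples = ["d\r\naaa\r\ncdef", "d\r\nabc", "d\r\naaa\r\n"]
for text in samples:
    print(repr(text), span(quantified, text), span(r2_then_r1, text))
```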
<hr />
<p dir="auto">Well, let’s go back to my <strong>example</strong> !  To begin with, just create a <strong>new</strong> line <strong><code>3</code></strong> with the <strong>four</strong> letters <strong><code>cdef</code></strong>, only ( IMPORTANT )</p>
<p dir="auto">Now, let’s try the regex <strong><code>(?-s)d(.*\R.*){1,2}c</code></strong> against my text : the regex engine tries the regex <strong>R2</strong>, first, which <strong>does</strong> match, immediately, everything from the letter <strong><code>d</code></strong> to the letter <strong><code>c</code></strong> included, with all the letters <strong><code>a</code></strong> in between !</p>
<pre><code class="language-diff">d                     :   Letter d

FIRST  block .*\R.*   :   Nothing   +   \R   +  14500 letters a

SECOND block .*\R.*   :   Nothing   +   \R   +  Nothing

c                     :   Letter c
</code></pre>
<p dir="auto">Now, get <strong>rid of</strong> the string <strong><code>cdef</code></strong>, in line <strong><code>3</code></strong> and re-try the regex <strong><code>(?-s)d(.*\R.*){1,2}c</code></strong>. This time, <strong>troubles</strong> begin and, as you said, after about <strong>8s</strong>, it <strong>wrongly</strong> selects all file contents !</p>
<p dir="auto">Then, <strong>reduce</strong> the number of letters a, in line <strong><code>2</code></strong>, to <strong><code>14000</code></strong> letters. This time, after <strong>8s</strong>, as expected, the <strong>Find</strong> dialog answers :</p>
<p dir="auto"><strong>Find: Can’t find the text “(?-s)d(.*\R.*){1,2}c”</strong></p>
<p dir="auto"><strong>IMPORTANT</strong> : Depending on your <strong>configuration</strong> and the amount of <strong>memory</strong> on your laptop, the <strong>limit</strong> ( <strong><code>14000</code></strong> - <strong><code>14500</code></strong> ) may be <strong>quite different</strong> from mine, but it <strong>should</strong> occur, anyway !</p>
<p dir="auto">So, how to explain this <strong>difference</strong> ? Well, at first, as the quantifiers <strong><code>*</code></strong>  are <strong>greedy</strong>, the regex engine tries the case :</p>
<pre><code class="language-diff">d                     :   Letter d

FIRST  block .*\R.*   :   Nothing   +   \R   +  14500 letters a

SECOND block .*\R.*   :   Nothing   +   \R   +  Nothing

c                     :   MISSING
</code></pre>
<p dir="auto"><strong>NO</strong> match can be found and, as the regex could be rewritten <strong><code>(?-s)d.*\R.*.*\R.*c</code></strong>, the regex engine, still keeping the regex <strong>R2</strong>, then, begins to <strong>backtrack</strong> and tries this <strong>other</strong> configuration :</p>
<pre><code class="language-diff">d                     :   Letter d

FIRST  block .*\R.*   :   Nothing      +   \R   +  14499 letters a

SECOND block .*\R.*   :   1 letter a   +   \R   +  Nothing

c                     :   MISSING
</code></pre>
<p dir="auto">Of course, as the letter <strong><code>c</code></strong> is <strong>still</strong> missing, it continues to <strong>backtrack</strong> and tests :</p>
<pre><code class="language-diff">d                     :   Letter d

FIRST  block .*\R.*   :   Nothing       +   \R   +  14498 letters a

SECOND block .*\R.*   :   2 letters a   +   \R   +  Nothing

c                     :   MISSING
</code></pre>
<p dir="auto">… and so on, testing <strong><code>14499</code></strong> cases, trying to reach the <strong>last</strong> case :</p>
<pre><code class="language-diff">d                     :   Letter d

FIRST  block .*\R.*   :   Nothing           +   \R   +  Nothing

SECOND block .*\R.*   :   14500 letters a   +   \R   +  Nothing

c                     :   MISSING
</code></pre>
<p dir="auto">Unfortunately, while testing <strong>all</strong> the combinations, a <strong>catastrophic backtracking</strong> error occurs and the regex engine <strong>wrongly</strong> matches <strong>all</strong> file contents :-((</p>
<p dir="auto">Personally, I would <strong>strongly</strong> advise you to avoid regexes like <strong><code>(.*)(.*)</code></strong>  or <strong><code>(.*)+</code></strong> or even <strong><code>(x+x+)+y</code></strong> !!</p>
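<p dir="auto">A rough Python sketch of why adjacent <code>.*</code> parts are costly, under stated assumptions: <code>NL</code> and <code>DOT</code> are hand-written stand-ins for <code>\R</code> and the <code>(?-s)</code> dot, the line lengths are arbitrary, and the helper <code>timed_search</code> is hypothetical. Python’s <code>re</code> is a different backtracking engine from the one in NPP, so it is only expected to show the slowdown, not the wrong match. It times the nested form against a variant without the trailing <code>.*</code> inside the group, on a text with no letter <code>c</code>.</p>

```python
import re
import time

# Stand-ins (assumptions): Python's re has no \R, and its dot matches \r.
NL = r"(?:\r\n|\n|\r)"   # plays the role of \R
DOT = r"[^\r\n]"         # plays the role of the dot under (?-s)

# Two adjacent star parts per block versus one star part per block.
nested = re.compile(rf"d(?:{DOT}*{NL}{DOT}*){{1,2}}c")
flat = re.compile(rf"d(?:{DOT}*{NL}){{1,2}}c")

def timed_search(pattern, text):
    """Run one search and return (match_or_None, elapsed_seconds)."""
    start = time.perf_counter()
    found = pattern.search(text)
    return found, time.perf_counter() - start

results = {}
for n in (250, 500, 1000):
    # Line 1: the letter d. Line 2: n letters a. No letter c anywhere.
    text = "d\r\n" + "a" * n + "\r\n"
    nested_match, nested_secs = timed_search(nested, text)
    flat_match, flat_secs = timed_search(flat, text)
    results[n] = (nested_match, flat_match)
    print(n, f"nested {nested_secs:.5f}s", f"flat {flat_secs:.5f}s")
```

<p dir="auto">Both patterns should report no match here; the point of the timings is only that the nested form has to try every way of sharing the letters <code>a</code> between its two star parts before it can give up.</p>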
<hr />
<p dir="auto">Finally, as you said :</p>
<blockquote>
<p dir="auto">I want to search for the whole article 2 key words, d and c, with the restriction that c is following d but it’s not farther than a few lines</p>
</blockquote>
<p dir="auto">I think, <a class="plugin-mentions-user plugin-mentions-a" href="/user/%E5%8F%A4%E6%97%AE" aria-label="Profile: 古旮">@<bdi>古旮</bdi></a>, that the <strong>right</strong> regex should be, simply, <strong><code>(?-s)d(.*\R){1,5}c</code></strong>. For instance, <strong>this</strong> regex would match the text below :</p>
<pre><code class="language-diff">d
line 1
line 2
line 3
line 4
c
</code></pre>
<p dir="auto">but would <strong>not</strong> match the following one :</p>
<pre><code class="language-diff">d
line 1
line 2
line 3
line 4
line 5
line 6
c
</code></pre>
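<p dir="auto">The two cases above can be checked with a short Python sketch, under stated assumptions: Python’s <code>re</code> has no <code>\R</code> and its dot also matches <code>\r</code>, so <code>NL</code> and <code>DOT</code> are hand-written stand-ins, and the names <code>near</code> and <code>far</code> are made up. The two texts are the ones shown above, joined with <code>\r\n</code>.</p>

```python
import re

# Stand-ins (assumptions): Python's re has no \R, and its dot matches \r.
NL = r"(?:\r\n|\n|\r)"   # plays the role of \R
DOT = r"[^\r\n]"         # plays the role of the dot under (?-s)

# Counterpart of (?-s)d(.*\R){1,5}c
within_five = re.compile(rf"d(?:{DOT}*{NL}){{1,5}}c")

# c starts the 5th line after d: five line breaks to cross.
near = "\r\n".join(["d", "line 1", "line 2", "line 3", "line 4", "c"])
# c starts the 7th line after d: seven line breaks to cross.
far = "\r\n".join(
    ["d", "line 1", "line 2", "line 3", "line 4", "line 5", "line 6", "c"]
)

near_match = within_five.search(near)
far_match = within_five.search(far)
print(near_match is not None)
print(far_match is not None)
```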
<p dir="auto">I, personally, did a test with the regex <strong><code>(?-s)d(.*\R){1,2}c</code></strong> and the following text</p>
<p dir="auto">Line 1 : <strong><code>1</code></strong> letter <strong><code>d</code></strong> + its <strong>line-break</strong><br />
Line 2 : <strong><code>100,000</code></strong> letters <strong><code>a</code></strong> + its <strong>line-break</strong></p>
<p dir="auto">It <strong>correctly</strong> displays the message <strong>Find: Can’t find “(?-s)d(.*\R){1,2}c”</strong></p>
<p dir="auto">Now, <strong>adding</strong> a line <strong><code>3</code></strong> with the <strong>string</strong> <strong><code>cdef</code></strong>, without any <strong>line-break</strong>, it selected, as expected, <strong>all</strong> text between the <strong>first</strong> char <strong><code>d</code></strong> and the <strong>single</strong> <strong><code>c</code></strong> letter, leaving the final <strong>def</strong> string <strong>unselected</strong> !</p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
<p dir="auto"><strong>P.S.</strong> :</p>
<p dir="auto">Refer, also, to this article,  about <strong>catastrophic backtracking</strong>, by <strong>Jan Goyvaerts</strong>, <strong><code>THE</code></strong> definitive regex <strong><code>GURU</code></strong> :</p>
<p dir="auto"><a href="http://www.regular-expressions.info/catastrophic.html" rel="nofollow ugc">http://www.regular-expressions.info/catastrophic.html</a></p>
]]></description><link>https://community.notepad-plus-plus.org/post/30714</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/30714</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 04 Mar 2018 15:44:31 GMT</pubDate></item></channel></rss>