<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Unexpected regex behaviour]]></title><description><![CDATA[<p dir="auto">Hi,</p>
<p dir="auto">I have a regular expression which is this</p>
<pre><code>editor\.([a-z]+)(?=[\(|\r\n])
</code></pre>
<p dir="auto">From my understanding it looks for a string which has</p>
<ul>
<li>editor. (literally) followed by</li>
<li>lower case character (one or multiple) followed by either</li>
<li>( or carriage return newline</li>
</ul>
<p dir="auto">So I don’t expect that this line</p>
<pre><code>editor.changeInsertion(int length, const char *text)
</code></pre>
<p dir="auto">is found but this line should be found</p>
<pre><code>editor.changeinsertion(int length, const char *text)
</code></pre>
<p dir="auto">What happens is that both lines are found.<br />
If I check the Match Case box then it works: only the second line is found.</p>
<p dir="auto">Is this expected behaviour or did I misunderstand the regex meaning?</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/11126/unexpected-regex-behaviour</link><generator>RSS for Node</generator><lastBuildDate>Sat, 16 May 2026 01:24:00 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/11126.rss" rel="self" type="application/rss+xml"/><pubDate>Sun, 17 Jan 2016 23:52:12 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Unexpected regex behaviour on Fri, 29 Jan 2016 20:10:00 GMT]]></title><description><![CDATA[<p dir="auto">PMJI, but I think bringing the Match case option into this problem makes its complexity explode; just see the length of the ~24 Jan post from “guy038” starting (after the greeting) with “Because of some tiring days, this week…” (sorry for this lengthy way of pointing to a post, but absolute dates and every civilized way of referencing have been removed).</p>
<p dir="auto">So I suggest just slightly changing the Ctrl+F “<strong>Find</strong>” box so that the “<em>Match case</em>” line is treated just like “<em>Match whole word only</em>”, i.e. gets GRAYED when “<em>Regular expression</em>” is selected. This would make the program much more reliable and powerful, since it would be more general and more compliant with what its labels are saying.</p>
<p dir="auto">Now I know from years of experience that the Notepad++ developers hate everything new or different and that this <a href="https://en.wikipedia.org/wiki/Not_invented_here" rel="nofollow ugc">NIH syndrome</a> of theirs will most probably get this suggestion thrown in the trash even before being read… yet I submit it anyway.</p>
<p dir="auto">Versailles, Fri 29 Jan 2016 21:10:00 +0100</p>
]]></description><link>https://community.notepad-plus-plus.org/post/13667</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/13667</guid><dc:creator><![CDATA[Michel Merlin]]></dc:creator><pubDate>Fri, 29 Jan 2016 20:10:00 GMT</pubDate></item><item><title><![CDATA[Reply to Unexpected regex behaviour on Sun, 24 Jan 2016 20:53:11 GMT]]></title><description><![CDATA[<p dir="auto">Hello <strong>Claudia</strong> and <strong>All</strong>,</p>
<p dir="auto">Because of some tiring days, this week, at work, and, also, because of a nice <strong>ski</strong>-day, at <strong>Courchevel</strong>, on Saturday ( quite tiring too, as it was the first outing, this winter ! ) I have <strong>not</strong> posted anything yet, trying, each evening, to sort out these case problems, little by little, with regexes of the form <strong><code>[Char1-Char2]</code></strong> or <strong><code>[Char1-Char2]+</code></strong>. However, it turns out that it’s even <strong>worse</strong> than I thought, at first sight :-((</p>
<p dir="auto">Globally speaking, we must distinguish <strong>TWO</strong> main cases :</p>
<p dir="auto">A) The <strong>Match case</strong> option, of the <strong>Search/Replace/Mark</strong> dialog, is <strong>checked</strong></p>
<p dir="auto">Let’s consider the simple <strong>test</strong> string, with the <strong>26</strong> letters, in <strong>both</strong> cases, below :</p>
<pre><code>ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
</code></pre>
<p dir="auto">Then, for instance, the search of the regex <strong><code>[F-q]+</code></strong> matches the string <strong>FGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopq</strong></p>
<p dir="auto">Things are quite <strong>simple</strong> : The regex <strong><code>[Char1-Char2]</code></strong> matches <strong>any</strong> character, whose code-point is <strong>&gt;=</strong> to <strong>Char1</strong>’s code-point AND <strong>&lt;=</strong> to <strong>Char2</strong>’s code-point !</p>
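<p dir="auto">As a rough cross-check outside Notepad++, the code-point rule above can be reproduced with Python’s <code>re</code> module. This is only a sketch: Python is not the Boost engine used by N++, and the helper name <code>in_range</code> is made up for illustration.</p>

```python
import re

# Test string from the post: 26 uppercase letters, then 26 lowercase letters.
TEST = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

def in_range(ch, lo, hi):
    """Code-point rule for [lo-hi] in a case-sensitive search (hypothetical helper)."""
    return ord(ch) in range(ord(lo), ord(hi) + 1)

# Characters selected by the plain code-point comparison.
by_rule = "".join(ch for ch in TEST if in_range(ch, "F", "q"))

# First match of the case-sensitive regex (no flags means case-sensitive in Python).
by_regex = re.search(r"[F-q]+", TEST).group(0)

print(by_regex)
print(by_rule == by_regex)
```

<p dir="auto">Both approaches should select the same run of letters, from <strong>F</strong> up to <strong>q</strong>.</p>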
<p dir="auto">Note two points :</p>
<ul>
<li>
<p dir="auto">This case-<strong>sensitive</strong> way of searching is the <strong>default</strong> in most regex engines</p>
</li>
<li>
<p dir="auto">The code-point of <strong>Char2</strong> MUST be <strong>&gt;=</strong> to the code-point of <strong>Char1</strong>. Otherwise, while clicking on the <strong>Find Next</strong> button, you get, logically, the error message <strong>Find:Invalid regular expression</strong>.</p>
</li>
</ul>
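<p dir="auto">Other engines reject a descending range in the same way. A minimal sketch with Python’s <code>re</code> module (not the Boost engine of N++; <code>is_valid_regex</code> is a hypothetical helper name) :</p>

```python
import re

def is_valid_regex(pattern):
    """Return True if Python's re module accepts the pattern, False otherwise."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # Raised, among other cases, for a range whose end precedes its start.
        return False

print(is_valid_regex("[D-F]"))  # ascending range
print(is_valid_regex("[F-D]"))  # descending range
```

<p dir="auto">Checking validity once, before counting or marking, avoids confusing an invalid regex with a regex that simply has no match.</p>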
<p dir="auto"><strong>Strangely</strong>, with N++, while searching for the <strong>wrong</strong> regex <strong><code>[F-D]</code></strong>, if you click on the <strong>Count</strong> button, on the <strong>Find All</strong> button of the <strong>Mark</strong> tab, or on some other buttons, you just get the message <strong>Count: 0 matches</strong> or <strong>Mark: 0 matches</strong>, instead of the <strong>error</strong> message !?</p>
<hr />
<p dir="auto">B) The <strong>Match case</strong> option, of the <strong>Search/Replace/Mark</strong> dialog, is <strong>UNCHECKED</strong></p>
<p dir="auto">Things become more complicated ! Let’s consider our previous regex <strong><code>[F-q]+</code></strong> and the same <strong>test</strong> string.</p>
<p dir="auto">Logically, as the search is considered, in an <strong>insensitive</strong> way, <strong>any</strong> letter of the <strong>previous</strong> found range <strong>FGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopq</strong> is matched, either, in <strong>lowercase</strong> or <strong>uppercase</strong>. But, as this string contains, exactly, the <strong>26 letters</strong> of the alphabet, all the <strong>test</strong> string should be matched. Indeed, that is the <strong>correct</strong> behaviour, that I verified, on the site <a href="https://regex101.com" rel="nofollow ugc">https://regex101.com</a> , with the <strong>i</strong> modifier ( <strong>insensitive</strong> )</p>
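<p dir="auto">The same check can be sketched with Python’s <code>re</code> module, where the <code>re.IGNORECASE</code> flag plays the role of the <strong>i</strong> modifier. This only shows how one other engine behaves, under the assumption that Python is an acceptable reference; it says nothing about Boost itself.</p>

```python
import re

TEST = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

# Case-sensitive: only the code points from F to q.
sensitive = re.search(r"[F-q]+", TEST).group(0)

# Case-insensitive: every letter has at least one case inside F..q.
insensitive = re.search(r"[F-q]+", TEST, re.IGNORECASE).group(0)

# For comparison, the narrower range [F-Q] in insensitive mode.
narrow = re.findall(r"[F-Q]+", TEST, re.IGNORECASE)

print(sensitive)
print(insensitive == TEST)
print(narrow)
```

<p dir="auto">In Python, the insensitive <code>[F-q]+</code> covers the whole test string, whereas the insensitive <code>[F-Q]+</code> yields two separate runs, so the two regexes are not interchangeable there.</p>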
<p dir="auto">Unfortunately, with Notepad++, the regex <strong><code>[F-q]+</code></strong> matches, successively the two strings <strong>FGHIJKLMNOPQ</strong>, then <strong>fghijklmnopq</strong>, only :-( The regex engine <strong>wrongly</strong> considers, only, letters that are <strong>both</strong>, in <strong>lower</strong> AND <strong>upper</strong> case, from the range [F…q] ! I would consider this behaviour as a bug !</p>
<p dir="auto">Moreover, the <strong>two</strong> regexes, <strong><code>[F-q]+</code></strong> and <strong><code>[F-Q]+</code></strong>, wrongly give the <strong>same</strong> result with the N++ regex engine, <strong>contrary to</strong> what the <strong>Regex101</strong> site gives !</p>
<hr />
<p dir="auto">I also, found, some other <strong>weird</strong> cases, when one <strong>limit</strong>, of the range, is <strong>not</strong> a letter ! Let’s consider the standard <strong>ASCII</strong> list string ( from character <strong><code>\x21</code></strong> to character <strong><code>\x7e</code></strong> ) , below :</p>
<pre><code>!"#$%&amp;'()*+,-./0123456789:;&lt;=&gt;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
</code></pre>
<p dir="auto">and suppose that you use the N++ <strong>Mark</strong> feature, with the two options <strong>Purge for each search</strong> and <strong>Wrap around</strong> checked</p>
<p dir="auto">Then, for instance, the <strong>two</strong> regexes <strong><code>[5-l]</code></strong> and <strong><code>[K-~]</code></strong> match, respectively, ANY character in :</p>
<pre><code>56789:;&lt;=&gt;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijkl

KLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
</code></pre>
<p dir="auto">when the <strong>Match case</strong> option is <strong>CHECKED</strong>. Luckily, it’s the <strong>expected</strong> behaviour !!</p>
<p dir="auto">And these <strong>same</strong> regexes match, respectively, ANY character in :</p>
<pre><code>56789:;&lt;=&gt;?@ABCDEFGHIJKL[\]^_`abcdefghijkl

KLMNOPQRSTUVWXYZklmnopqrstuvwxyz{|}~
</code></pre>
<p dir="auto">when the <strong>Match case</strong> option is <strong>UNCHECKED</strong>. Why, among other things, is the <strong>six</strong>-character block, below :</p>
<pre><code>[\]^_`
</code></pre>
<p dir="auto">taken into account with the <strong>first</strong> regex and NOT with the <strong>second</strong> one !!!???</p>
<hr />
<p dir="auto">So, I’m asking for people who could test these <strong>two</strong> regexes <strong><code>[5-l]</code></strong> and <strong><code>[K-~]</code></strong>, in an <strong>INSENSITIVE</strong> way, against the <strong>test</strong> string above ( range from <strong><code>\x21</code></strong> to <strong><code>\x7e</code></strong> ), with <strong>other</strong> regex tools than the <strong>Boost</strong> regex engine, in order to know which <strong>single</strong> characters are <strong>really</strong> matched ?</p>
<p dir="auto">As for me, to be <strong>logical</strong>, the correct behaviour is :</p>
<ul>
<li>
<p dir="auto">The regex <strong><code>[5-l]</code></strong> should match ANY character in :</p>
<p dir="auto">56789:;&lt;=&gt;?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz</p>
</li>
<li>
<p dir="auto">The regex <strong><code>[K-~]</code></strong> should match ANY character in :</p>
<p dir="auto">ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~</p>
</li>
</ul>
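<p dir="auto">For anyone wanting to run this test with another tool, here is a minimal sketch using Python’s <code>re</code> module. The test string is built by code point instead of being typed in, and <code>matched_chars</code> is a hypothetical helper name; the result describes Python’s engine only.</p>

```python
import re

# Printable ASCII from \x21 to \x7e, built by code point to match the test string.
ASCII = "".join(chr(c) for c in range(0x21, 0x7F))

def matched_chars(pattern, flags=0):
    """Return the characters of ASCII that the one-character pattern matches."""
    regex = re.compile(pattern, flags)
    return "".join(ch for ch in ASCII if regex.fullmatch(ch))

for pattern in (r"[5-l]", r"[K-~]"):
    print(pattern, "sensitive  :", matched_chars(pattern))
    print(pattern, "insensitive:", matched_chars(pattern, re.IGNORECASE))
```

<p dir="auto">With Python, the insensitive results agree with the two lists given just above, including the six-character block in both cases.</p>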
<hr />
<p dir="auto">Therefore, and to sum up, when using the <strong>Regular expression</strong> search mode, I advise you to :</p>
<ul>
<li>
<p dir="auto">Always click, <strong>once</strong>, on the <strong>Find Next</strong> button to ensure that your regex is a <strong>valid</strong> one</p>
</li>
<li>
<p dir="auto">Preferably, apart from <strong>simple</strong> searches, such as a <strong>single</strong> word, always check the <strong>Match case</strong> option, to get <strong>logical</strong> results from the regex engine</p>
</li>
</ul>
<hr />
<p dir="auto"><strong>Claudia</strong>, I tried to find some information about the <strong>case insensitivity</strong> feature. Look at the different <strong>links</strong> ( in no particular order ) below :</p>
<p dir="auto"><a href="http://www.rexegg.com/regex-modifiers.html#i" rel="nofollow ugc">http://www.rexegg.com/regex-modifiers.html#i</a></p>
<p dir="auto"><a href="http://www.pcre.org/current/doc/html/pcre2pattern.html#SEC9" rel="nofollow ugc">http://www.pcre.org/current/doc/html/pcre2pattern.html#SEC9</a></p>
<p dir="auto"><a href="http://userguide.icu-project.org/strings/regexp" rel="nofollow ugc">http://userguide.icu-project.org/strings/regexp</a></p>
<p dir="auto"><a href="http://unicode.org/faq/casemap_charprop.html" rel="nofollow ugc">http://unicode.org/faq/casemap_charprop.html</a></p>
<p dir="auto">and also :</p>
<p dir="auto"><a href="http://www.regular-expressions.info/modifiers.html" rel="nofollow ugc">http://www.regular-expressions.info/modifiers.html</a></p>
<p dir="auto"><a href="http://perldoc.perl.org/perlre.html#Modifiers" rel="nofollow ugc">http://perldoc.perl.org/perlre.html#Modifiers</a></p>
<p dir="auto"><a href="http://www.tutorialspoint.com/perl/perl_regular_expressions.htm" rel="nofollow ugc">http://www.tutorialspoint.com/perl/perl_regular_expressions.htm</a></p>
<p dir="auto"><a href="http://stackoverflow.com/questions/3754097/what-is-the-best-way-to-match-only-letters-in-a-regex" rel="nofollow ugc">http://stackoverflow.com/questions/3754097/what-is-the-best-way-to-match-only-letters-in-a-regex</a></p>
<hr />
<p dir="auto">BTW, <strong>Claudia</strong>, the site <a href="http://www.rexegg.com" rel="nofollow ugc">http://www.rexegg.com</a> seems a very, very interesting site, with <strong>valuable</strong> examples to study. See, for instance, these topics, below :</p>
<p dir="auto"><a href="http://www.rexegg.com/regex-uses.html" rel="nofollow ugc">http://www.rexegg.com/regex-uses.html</a></p>
<p dir="auto"><a href="http://www.rexegg.com/regex-style.html" rel="nofollow ugc">http://www.rexegg.com/regex-style.html</a></p>
<p dir="auto"><a href="http://www.rexegg.com/regex-best-trick.html" rel="nofollow ugc">http://www.rexegg.com/regex-best-trick.html</a></p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/13575</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/13575</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 24 Jan 2016 20:53:11 GMT</pubDate></item><item><title><![CDATA[Reply to Unexpected regex behaviour on Sun, 24 Jan 2016 20:33:02 GMT]]></title><description><![CDATA[<p dir="auto">Hi <strong>Claudia</strong>,</p>
<p dir="auto">Just a simple test because my <strong>reply</strong> to your <strong>previous</strong> post seems to be considered as a <strong>spam</strong> ( !? ) I get the message :</p>
<p dir="auto">Error<br />
Post content was flagged as spam by <strong><a href="http://Akismet.com" rel="nofollow ugc">Akismet.com</a></strong></p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/13574</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/13574</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 24 Jan 2016 20:33:02 GMT</pubDate></item><item><title><![CDATA[Reply to Unexpected regex behaviour on Mon, 18 Jan 2016 13:30:15 GMT]]></title><description><![CDATA[<p dir="auto">Hi guy038,</p>
<p dir="auto">thank you for testing this.<br />
When you say <em>necessarily, search for one lowercase letter, only</em> does this mean<br />
it is expected behaviour, even by regex definition? If so, why do I have to distinguish between A-Z and a-z?<br />
Don’t get me wrong, I’m absolutely fine with it if I have to use the match case check box, I just<br />
want to understand if, from a regex point of view, this is a misunderstanding on my side.</p>
<p dir="auto">Cheers<br />
Claudia<br />
Btw. I saw I got a folder on your notepad tab list - yeah ;-)</p>
]]></description><link>https://community.notepad-plus-plus.org/post/13362</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/13362</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Mon, 18 Jan 2016 13:30:15 GMT</pubDate></item><item><title><![CDATA[Reply to Unexpected regex behaviour on Mon, 18 Jan 2016 02:56:16 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <strong>Claudia</strong>,</p>
<p dir="auto">Oh ! Interesting problem, indeed !</p>
<p dir="auto">The regex syntax <strong><code>[a-z]</code></strong> doesn’t mean that you, <strong>necessarily</strong>, search for one <strong>lowercase</strong> letter, only ! In the same way, the syntax <strong><code>[A-Z]</code></strong> doesn’t try to match an <strong>uppercase</strong> letter, <strong>exclusively</strong> ! Indeed, all that depends on the <strong>current</strong> state of the <strong>Match case</strong> option :</p>
<ul>
<li>
<p dir="auto">If the <strong>Match case</strong> is checked, the regex engine DOES care about the <strong>case</strong> of the letter matched ( <strong>Upper</strong> or <strong>Lower</strong> ). Then :</p>
<ul>
<li>
<p dir="auto">The regex <strong><code>[A-Z]</code></strong> searches for an <strong>uppercase</strong> letter, exclusively, between <strong>A</strong> and <strong>Z</strong>, included</p>
</li>
<li>
<p dir="auto">The regex <strong><code>[a-z]</code></strong> searches for a <strong>lowercase</strong> letter, exclusively, between <strong>a</strong> and <strong>z</strong>, included</p>
</li>
</ul>
</li>
<li>
<p dir="auto">If the <strong>Match case</strong> is <strong>NOT</strong> checked, the regex engine does <strong>NOT</strong> care about the <strong>case</strong> of the letter matched. So :</p>
<ul>
<li>
<p dir="auto">The regex <strong><code>[A-Z]</code></strong> searches for an <strong>uppercase</strong> OR a <strong>lowercase</strong> letter ( equivalent to the regex <strong><code>[A-Za-z]</code></strong> )</p>
</li>
<li>
<p dir="auto">The regex <strong><code>[a-z]</code></strong> searches for a <strong>lowercase</strong> OR an <strong>uppercase</strong> letter ( equivalent to the regex <strong><code>[A-Za-z]</code></strong> )</p>
</li>
</ul>
</li>
</ul>
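<p dir="auto">The two states of the <strong>Match case</strong> option can be imitated with Python’s <code>re</code> module, on the regex and the two lines of the original question. This is a sketch under the assumption that <code>re.IGNORECASE</code> stands in for an unchecked <strong>Match case</strong> box; the helper <code>found</code> is hypothetical. Note, in passing, that inside the class <code>[\(|\r\n]</code> the <code>|</code> is a literal pipe character, and that the class matches a single character only.</p>

```python
import re

# Regex and sample lines taken from the original question.
PATTERN = r"editor\.([a-z]+)(?=[\(|\r\n])"
LINES = [
    "editor.changeInsertion(int length, const char *text)",
    "editor.changeinsertion(int length, const char *text)",
]

def found(line, match_case):
    """Search the line, treating match_case=False like an unchecked Match case box."""
    flags = 0 if match_case else re.IGNORECASE
    return re.search(PATTERN, line, flags) is not None

for line in LINES:
    print(found(line, match_case=False), found(line, match_case=True), line)
```

<p dir="auto">In insensitive mode both lines are found; in sensitive mode only the all-lowercase line is found, because the uppercase <strong>I</strong> stops <code>[a-z]+</code> before the look-ahead can succeed.</p>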
<p dir="auto">Moreover, if you’re using the <strong><code>(?i)</code></strong> OR <strong><code>(?-i)</code></strong> modifier, <strong>before</strong> the <strong>square bracket</strong> range, you <strong>force</strong> the regex engine to behave, in an <strong>insensitive / sensitive</strong> way, <strong>independently</strong> of the <strong>current</strong> state of the <strong>Match case</strong> option !</p>
<hr />
<p dir="auto">I <strong>sum up</strong> all the different cases, in a table, below :</p>
<pre><code>•------------•-----------------------------------------------------------------------------------------------------------------•  
|   Option   |                               REGEX Syntax for matching 1 NON ACCENTUATED letter                                |
|            •---------------•---------------•---------------•---------------•----------------•----------------•---------------•
| Match case |     [A-Z]     |     [a-z]     |   (?i)[A-Z]   |   (?i)[a-z]   |   (?-i)[A-Z]   |   (?-i)[a-z]   |   [A-Za-z]    |
•------------•===============•===============•===============•===============•================•================•===============•
|     NO     |  Upper/Lower  |  Upper/Lower  |  Upper/Lower  |  Upper/Lower  |     Upper      |     Lower      |  Upper/Lower  |
•------------•---------------•---------------•---------------•---------------•----------------•----------------•---------------•
|    YES     |     Upper     |     Lower     |  Upper/Lower  |  Upper/Lower  |     Upper      |     Lower      |  Upper/Lower  |
•------------•---------------•---------------•---------------•---------------•----------------•----------------•---------------•
</code></pre>
<p dir="auto">From that table, it’s easy to see that :</p>
<ul>
<li>
<p dir="auto">The use of the <strong><code>(?-i)</code></strong> <strong>modifier</strong> implies a search of letters, <strong>sensitive</strong> to the <strong>case</strong>, whether the <strong>Match case</strong> option is checked or <strong>NOT</strong></p>
</li>
<li>
<p dir="auto">The use of the <strong><code>(?i)</code></strong> <strong>modifier</strong> implies a search of letters, <strong>insensitive</strong> to the <strong>case</strong>, whether the <strong>Match case</strong> option is checked or <strong>NOT</strong>, as well as the use of the regex <strong><code>[A-Za-z]</code></strong> !</p>
</li>
</ul>
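<p dir="auto">The effect of the modifiers can be sketched in Python as well, with one syntax difference that is an assumption to keep in mind : Python’s <code>re</code> module ( 3.6 or later ) has no stand-alone <code>(?-i)</code>, so the scoped form <code>(?-i:...)</code> is used instead. The helper <code>matches</code> is hypothetical, and <code>re.IGNORECASE</code> stands in for an unchecked <strong>Match case</strong> box.</p>

```python
import re

def matches(pattern, ch, match_case):
    """Test one character, treating match_case=False like an unchecked Match case box."""
    flags = 0 if match_case else re.IGNORECASE
    return re.fullmatch(pattern, ch, flags) is not None

# (?i) forces insensitivity even when "Match case" is on.
# (?-i:...) forces sensitivity even when "Match case" is off.
for pattern in (r"[a-z]", r"(?i)[a-z]", r"(?-i:[a-z])"):
    for match_case in (False, True):
        print(pattern, match_case, matches(pattern, "Q", match_case))
```

<p dir="auto">Only the bare <code>[a-z]</code> changes its answer for the uppercase <strong>Q</strong> when the option changes; the two patterns with a modifier give the same answer in both states.</p>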
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/13357</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/13357</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Mon, 18 Jan 2016 02:56:16 GMT</pubDate></item></channel></rss>