<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Is it planned to switch to PCRE2?]]></title><description><![CDATA[<p dir="auto">Hello,</p>
<p dir="auto">the PCRE library has changed its API and will provide new features only by means of the new API.</p>
<p dir="auto">Is it planned to switch to the new versions with the new API, or may I create an issue?</p>
<p dir="auto">Summary: <a href="https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html" rel="nofollow ugc">https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html</a><br />
See also: <a href="http://www.pcre.org/" rel="nofollow ugc">http://www.pcre.org/</a></p>
]]></description><link>https://community.notepad-plus-plus.org/topic/9703/is-it-planned-to-switch-to-pcre2</link><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 18:07:22 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/9703.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 27 Aug 2015 17:44:13 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Thu, 11 Aug 2016 02:55:52 GMT]]></title><description><![CDATA[<p dir="auto">I like using regular expressions.</p>
<p dir="auto">For me the biggest problem is the number of different engines that each have their own expression syntax. Yes, that includes the ones that claim to be Perl compatible. IMO “compatible” in most cases means sharing some of the syntax, re-interpreting some and extending some, instead of just extending it. As a result, REs that work with one RE engine do not work with another (unless the REs are very, very basic).</p>
<p dir="auto">So at some moment you think: “I know REs, I can do this.” and then it turns out you can’t :)</p>
]]></description><link>https://community.notepad-plus-plus.org/post/17189</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/17189</guid><dc:creator><![CDATA[MAPJe71]]></dc:creator><pubDate>Thu, 11 Aug 2016 02:55:52 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Sat, 17 Oct 2015 16:46:48 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a> said:</p>
<blockquote>
<p dir="auto">PRUNE</p>
</blockquote>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a> You had me at <strong>backtracking control verbs</strong> :)</p>
<p dir="auto">I’ve been wondering why this wasn’t working with my regular expressions within the IDE.  Now I understand why.  This would be a <strong>HUGE</strong> improvement to Notepad++.  There are a lot of macros that I could finally implement that would make developing software so much simpler for me.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/11737</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/11737</guid><dc:creator><![CDATA[Erutan409]]></dc:creator><pubDate>Sat, 17 Oct 2015 16:46:48 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Wed, 07 Oct 2015 18:38:39 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a>:<br />
See <a href="https://bugs.exim.org/show_bug.cgi?id=1689" rel="nofollow ugc">https://bugs.exim.org/show_bug.cgi?id=1689</a> for evidence that PCRE2 supports the <code>(?{name}true:false)</code> syntax for substitution, now.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/11632</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/11632</guid><dc:creator><![CDATA[h-h-h-h]]></dc:creator><pubDate>Wed, 07 Oct 2015 18:38:39 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Sun, 20 Sep 2015 18:39:57 GMT]]></title><description><![CDATA[<p dir="auto">Hello <strong>h-h-h-h</strong>,</p>
<p dir="auto">Over the past few days, I have studied all the <strong>sections</strong> of the <strong>two</strong> reference <strong>chapters</strong> of the site below :</p>
<p dir="auto"><a href="http://www.regular-expressions.info/refflavors.html" rel="nofollow ugc">http://www.regular-expressions.info/refflavors.html</a></p>
<p dir="auto"><a href="http://www.regular-expressions.info/refreplace.html" rel="nofollow ugc">http://www.regular-expressions.info/refreplace.html</a></p>
<p dir="auto">and here are my conclusions about the features that seem to be <strong>missing</strong> from <strong>BOOST</strong>. Luckily, most of them are not <strong>major</strong> features. Of course, this list is <strong>NOT exhaustive</strong> at all.</p>
<p dir="auto"><strong>IMPORTANT</strong> :</p>
<p dir="auto">I won’t mention <strong>missing</strong> features <em>that can be achieved with <strong>another regex syntax</strong> or a <strong>specific</strong> regex</em>. For instance :</p>
<ul>
<li>
<p dir="auto">The special quantifier syntax <strong><code>{,m}</code></strong> may be simulated by the simple <strong>BOOST</strong> syntax <strong><code>{0,m}</code></strong></p>
</li>
<li>
<p dir="auto">The character class <strong>subtraction</strong> <strong><code>[a-z-[aeiouy&rsqb;&rsqb;</code></strong> can be replaced with the <strong>BOOST</strong> regex <strong><code>(?![aeiouy])[a-z]</code></strong></p>
</li>
<li>
<p dir="auto">The <strong>TCL</strong> modifier <strong><code>(?p)</code></strong> can be replaced by the combined <strong>BOOST</strong> modifiers <strong><code>(?ms)</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>(?P&lt;name&gt;....)</code></strong> construction, for a <strong>named capturing</strong> group, in <strong>Python</strong>, may be obtained with the <strong>BOOST</strong> syntaxes <strong><code>(?&lt;name&gt;....)</code></strong> or <strong><code>(?'name'....)</code></strong></p>
</li>
<li>
<p dir="auto">The <strong>match context</strong> form <strong><code>$_</code></strong>, standing for the <strong>whole subject</strong> string, can be replaced by the <strong>BOOST</strong> syntax below :</p>
<pre><code>$`$&amp;$'
</code></pre>
</li>
</ul>
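<p dir="auto">As a rough illustration of the class-subtraction workaround in the list above, here is a minimal sketch. It uses Python’s <code>re</code> module instead of BOOST (an assumption made only to keep the example runnable), and the sample word is made up.</p>

```python
import re

# Emulate "lowercase letters minus vowels" with a negative lookahead
# placed in front of the character class.
# Assumption: Python's re module stands in for the BOOST engine here.
consonants = re.findall(r"(?![aeiouy])[a-z]", "regex")
print(consonants)  # ['r', 'g', 'x']
```

<p dir="auto">The lookahead rejects a position when the next character is a vowel, so only the remaining letters of the class are matched.</p>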
<p dir="auto">And so on… With those cases set aside, here is what I noted :</p>
<hr />
<p dir="auto"><strong>A</strong>) In <strong>SEARCH</strong> regexes :</p>
<ul>
<li>
<p dir="auto">The specific syntaxes <strong><code>\i</code></strong>, <strong><code>\c</code></strong> and their <strong>negative</strong> forms <strong><code>\I</code></strong>, <strong><code>\C</code></strong>, used in <strong>XML Schema</strong> or <strong>XPath</strong>, that apply to <strong>XML</strong> names</p>
</li>
<li>
<p dir="auto"><strong>All</strong> the syntaxes, related to the <strong>Unicode</strong> properties of text ( as <strong><code>\p{Lu}</code></strong>, <strong><code>\pM</code></strong>, <strong><code>\P{IsCntrl}</code></strong>, … )</p>
</li>
<li>
<p dir="auto">The <strong>conditional</strong> form <strong><code>(?(+n)...|....)</code></strong> or <strong><code>(?(-n)...|....)</code></strong>, where the <strong>condition</strong> is the <strong>relative nth</strong> group after, or <strong>before</strong>, the <strong>current</strong> group</p>
</li>
<li>
<p dir="auto">The modifier <strong><code>(?n)</code></strong>, used by <strong>.NET</strong> and <strong>JGsoft</strong>, that makes all <strong>unnamed</strong> groups <strong>non-capturing</strong></p>
</li>
<li>
<p dir="auto">The modifier <strong><code>(?J)</code></strong>, used by <strong>PCRE</strong>, <strong>Delphi</strong>, <strong>PHP</strong>, which allows <strong>duplicate</strong> group names</p>
</li>
<li>
<p dir="auto">The modifier <strong><code>(?U)</code></strong>, used by <strong>PCRE</strong>, which swaps the default behaviour of quantifiers, so that they are <strong>lazy</strong> by default and become <strong>greedy</strong> when followed by <strong><code>?</code></strong></p>
</li>
<li>
<p dir="auto">The modifier <strong><code>(?X)</code></strong>, used by <strong>PCRE</strong>, that generates an <strong>error</strong> when an <strong>invalid</strong> token is escaped</p>
</li>
<li>
<p dir="auto">The modifiers <strong><code>(?b)</code></strong> and <strong><code>(?e)</code></strong>, used in <strong>TCL</strong>, which interpret the regex as a <strong>POSIX Basic RE</strong> or as a <strong>POSIX Extended RE</strong>, respectively</p>
</li>
</ul>
<hr />
<p dir="auto"><strong>B</strong>) In <strong>Replacement</strong> regexes :</p>
<ul>
<li>
<p dir="auto">The notation <strong><code>\o{####}</code></strong>, where <strong>####</strong> stands for an <strong>octal</strong> number, when it lies between <strong><code>\o{1000}</code></strong> and <strong><code>\o{7777}</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>\0</code></strong> syntax, meaning the <strong>NUL</strong> character, which <strong>CANNOT</strong> be inserted, in <strong>replacement</strong>, yet !</p>
</li>
</ul>
<hr />
<p dir="auto">So, if we switched to the <strong>PCRE2</strong> library, we could benefit from <strong>most</strong> of the <strong>currently missing</strong> features listed just <strong>above</strong> but, as I said in my <strong>previous</strong> post, we would lose some <strong>nice replacement</strong> features, too !</p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
<p dir="auto"><strong>P.S.</strong> :</p>
<p dir="auto">If you think of another <strong>missing BOOST</strong> feature, compared to the <strong>PCRE2</strong> ones, just let me know !</p>
]]></description><link>https://community.notepad-plus-plus.org/post/11389</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/11389</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 20 Sep 2015 18:39:57 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Sat, 17 Oct 2015 17:48:02 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <strong>h-h-h-h</strong>,</p>
<p dir="auto">I apologize for this <strong>late</strong> reply, but I was <strong>very</strong> busy at work this week and I preferred to <strong>rest</strong> ! I’m no longer the <strong>young</strong> man that I used to be :-((. Again, this post is quite long ! So, let’s have a <strong>second</strong> drink :-))</p>
<hr />
<p dir="auto">Some years ago ( I can’t find again this <strong>quoted</strong> text below, but I had printed it ! ) <strong>Jan Goyvaerts</strong>, the author of the site below :</p>
<p dir="auto"><a href="http://www.regular-expressions.info/" rel="nofollow ugc">http://www.regular-expressions.info/</a></p>
<p dir="auto">said, in the <strong>Replacement Text Reference</strong> section :</p>
<blockquote>
<p dir="auto">a list of <strong>replacement</strong> text flavours is NOT the same as the list of <strong>regular</strong> expression flavours. Indeed, replacements are NOT made by the regular expression engine, but by the tool or programming library, providing the search-and-replace capability. So, tools or languages, using the <strong>SAME</strong> regex engine, may behave <strong>DIFFERENTLY</strong>, when it comes to making replacements. E.g. The <strong>PCRE</strong> library does NOT provide a <strong>search-and-replace</strong> function =&gt; All tools and languages, implementing <strong>PCRE</strong>, use their <strong>OWN</strong> search-and-replace feature, which may result in differences in the <strong>replacement</strong> syntax.</p>
</blockquote>
<p dir="auto">Of course, from the link below :</p>
<p dir="auto"><a href="http://www.pcre.org/changelog.txt" rel="nofollow ugc">http://www.pcre.org/changelog.txt</a></p>
<p dir="auto">we know that, from PCRE2 version 10.00, a new <strong>pcre2_substitute()</strong> function has been implemented. However, if you read the <strong>two</strong> sections <strong>Using PCRE2</strong> and <strong>Substituting Matches</strong> of the page below,</p>
<p dir="auto"><a href="http://www.regular-expressions.info/pcre2.html" rel="nofollow ugc">http://www.regular-expressions.info/pcre2.html</a></p>
<p dir="auto">the handling of <strong>PCRE2</strong> is, seemingly, not as easy as it was with <strong>PCRE</strong> and the <strong>substitute</strong> function has rather simple features, if we compare with the present <strong>BOOST extended format string replacement</strong> tool, in Notepad++ ! Here are, below, some nice features about the <strong>present</strong> BOOST replacement tool :</p>
<hr />
<ul>
<li>With the <strong>BOOST extended</strong> format string tool, <strong>named</strong> groups can be used and any group, named or not, which doesn’t match anything, is just replaced by an <strong>empty</strong> string.</li>
</ul>
<p dir="auto">For instance, if SEARCH = <strong><code>(?&lt;letters&gt;[A-Za-z]+) *(?&lt;digits&gt;\d+)|(\d+) *([A-Za-z]+)</code></strong> and REPLACE = <strong><code>Name : $+{letters}\4  Age : $+{digits}\3</code></strong>, from the text</p>
<pre><code>Peter 35
18Marie
David52
63   Edith
</code></pre>
<p dir="auto">you get the text :</p>
<pre><code>Name : Peter  Age : 35
Name : Marie  Age : 18
Name : David  Age : 52
Name : Edith  Age : 63
</code></pre>
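<p dir="auto">The “unmatched group becomes an empty string” behaviour can be sketched outside Notepad++ as well. The snippet is a minimal illustration in Python, not the BOOST format-string engine: it assumes Python 3.5 or later (where an unmatched group expands to an empty string in a replacement template) and uses numbered groups in place of the named ones.</p>

```python
import re

text = "Peter 35\n18Marie\nDavid52\n63   Edith"

# Numbered groups stand in for the named ones of the BOOST example:
# 1 = letters, 2 = digits (first alternative),
# 3 = digits, 4 = letters (second alternative).
pattern = r"([A-Za-z]+) *(\d+)|(\d+) *([A-Za-z]+)"

# Assumption: Python 3.5+, where a group that took no part in the match
# is replaced by an empty string in the template.
result = re.sub(pattern, r"Name : \1\4  Age : \2\3", text)
print(result)
```

<p dir="auto">Only one alternative matches on each line, so exactly one of the two “letters” groups and one of the two “digits” groups is non-empty in every replacement.</p>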
<ul>
<li>
<p dir="auto">With the <strong>BOOST extended</strong> format string tool, the <strong>conditional</strong> replacements can be <strong>nested</strong>. So, if SEARCH = <strong><code>(\d)?(\d)?\d</code></strong> and REPLACE = <strong><code>a (?2three:(?1two:one)) digit(?1s) number</code></strong>, the list of numbers :</p>
<p dir="auto">3<br />
60<br />
729<br />
10</p>
</li>
</ul>
<p dir="auto">is changed into :</p>
<pre><code>a one digit number
a two digits number
a three digits number
a two digits number
</code></pre>
<p dir="auto">Note : For a <strong>TWO</strong> digits number, group <strong>1</strong> is the <strong>TEN</strong> digit, group <strong>2</strong> is <strong>EMPTY</strong> and the last <strong>\d</strong> is the <strong>UNIT</strong> digit !</p>
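<p dir="auto">For readers who want to check the logic of that nested conditional, here is a hedged sketch that reproduces it with a Python callback. Python’s <code>re</code> module has no conditional replacement syntax, so the function <code>describe</code> is a hypothetical helper that spells out the same decisions by hand.</p>

```python
import re

def describe(match):
    # Mirrors (?2three:(?1two:one)): test group 2 first, then group 1.
    if match.group(2) is not None:
        word = "three"
    elif match.group(1) is not None:
        word = "two"
    else:
        word = "one"
    # Mirrors (?1s): add the plural "s" only when group 1 took part in the match.
    plural = "s" if match.group(1) is not None else ""
    return f"a {word} digit{plural} number"

result = re.sub(r"(\d)?(\d)?\d", describe, "3\n60\n729\n10")
print(result)
```

<p dir="auto">The callback receives one match object per number, so each branch of the conditional maps to a plain <code>if</code> test on whether a group participated.</p>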
<ul>
<li>
<p dir="auto">With the <strong>BOOST extended</strong> format string tool, the <strong>context</strong> sequences, below, are supported :</p>
<p dir="auto">$MATCH                  or  ${^MATCH}                 or  $&amp;  or  $0  or  ${0}<br />
$PREMATCH               or  ${^PREMATCH}              or  $`<br />
$POSTMATCH              or  ${^POSTMATCH}             or  $'<br />
$LAST_SUBMATCH_RESULT   or  ${^LAST_SUBMATCH_RESULT}  or  $^N<br />
$LAST_PAREN_MATCH       or  ${^LAST_PAREN_MATCH}      or  $+</p>
</li>
</ul>
<p dir="auto">For instance, <strong><code>$^N</code></strong> represents the contents of the <strong>last capture</strong> group matched so far. So, given the subject string <strong>---abcdef---</strong>, SEARCH = <strong><code>(a)|b|(c)(d)e|(f)</code></strong> and REPLACE = <strong><code>&lt;$^N&gt;</code></strong>, we obtain the <strong>replacement</strong> string, below :</p>
<p dir="auto"><strong>---&lt;a&gt;&lt;&gt;&lt;d&gt;&lt;f&gt;---</strong></p>
<p dir="auto">Why ? Well, just because :</p>
<p dir="auto">When it matches <strong>(a)</strong> or <strong>(f)</strong>, the value of <strong>$^N</strong> is the group itself, <strong>a</strong> or <strong>f</strong><br />
When it matches <strong>b</strong> ( NO group ), the value of <strong>$^N</strong> is an EMPTY string<br />
When it matches <strong>(c)(d)e</strong>, the value of <strong>$^N</strong> is <strong>d</strong> ( the contents of the <strong>LAST</strong> group matched, here group 3 )</p>
<ul>
<li>With the <strong>BOOST extended</strong> format string tool, the five <strong>case conversions</strong> <strong><code>\u</code></strong>,  <strong><code>\l</code></strong>,  <strong><code>\U</code></strong>, <strong><code>\L</code></strong> and <strong><code>\E</code></strong> are possible.</li>
</ul>
<p dir="auto">For example, the <strong>Proper Case</strong> capitalization rule can be obtained with SEARCH = <strong><code>(\w)(\w*)</code></strong> and REPLACE = <strong><code>\u\1\L\2</code></strong>. So, the sentence <strong>“thIs is a tEST”</strong> will give the nicer text <strong>“This Is A Test”</strong></p>
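<p dir="auto">The same Proper Case rule can be sketched in Python, whose <code>re</code> templates have no <code>\u</code> or <code>\L</code> operators, so the case changes are done in a small callback. This is only an illustration of the idea, and <code>proper_case</code> is a hypothetical helper name.</p>

```python
import re

def proper_case(match):
    # \u\1 upper-cases the first letter; \L\2 lower-cases the rest of the word.
    return match.group(1).upper() + match.group(2).lower()

result = re.sub(r"(\w)(\w*)", proper_case, "thIs is a tEST")
print(result)  # This Is A Test
```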
<p dir="auto">So, <strong>h-h-h-h</strong>, to sum up, I’m not FOR or AGAINST the new <strong>PCRE2</strong> library. It’s just that I wouldn’t want to <strong>lose</strong> the features above, and some others, that we can <strong>already</strong> use in <strong>replacement</strong> strings !</p>
<hr />
<p dir="auto">Secondly, to my mind, we <strong>do</strong> need to improve the present S/R regex engine, by using the <strong>François-R Boyer</strong> version. Of course, between N++ version <strong>6.0</strong> and version <strong>6.4.2</strong>, some improvements were made and some bugs were fixed by both <strong>Dave BrotherStone</strong> and <strong>François-R Boyer</strong> ( such as the <strong>Zero length match</strong> call-tip message,… )</p>
<p dir="auto">However, <em>although <strong>François</strong>’s version simply relies on the <strong>BOOST</strong> library</em>, he was able to fix major <strong>issues</strong> with <strong>look-behinds</strong> and <strong>backward assertions</strong>, and succeeded in handling <strong>all UNICODE</strong> characters, as well as <strong>NUL</strong> characters, in replacement !</p>
<p dir="auto">Here is, below, a NON exhaustive list of issues with the <strong>current</strong> regex engine, <em>which DON’T occur with <strong>François-R Boyer</strong>’s version</em> :</p>
<hr />
<ul>
<li>
<p dir="auto"><strong>Overlapping</strong> lookbehinds and matched strings are <strong>NOT</strong> correctly handled. For instance, given the <strong>20-character subject</strong> string <strong>aaaabaaababbbaabbabb</strong> and SEARCH = <strong><code>(?&lt;!a)ba*</code></strong>, we get <strong>6</strong> matches but, unfortunately, <strong>2</strong> of them are wrong. With the <strong>improved</strong> version of François, it’s all OK !</p>
</li>
<li>
<p dir="auto">We can’t use the <strong>NUL</strong> character in replacement. For example, with the simple S/R : SEARCH = <strong><code>ABC</code></strong> and REPLACE = <strong><code>DEF\x00GHI</code></strong>, the result is the string <strong>DEF</strong> only :-(. <strong>François</strong>’s version does insert the <strong>NUL</strong> character between the strings <strong>DEF</strong> and <strong>GHI</strong> !</p>
</li>
<li>
<p dir="auto"><strong>BACKWARD</strong> assertions are <strong>NOT</strong> correctly supported. E.g. : SEARCH = <strong><code>\A.</code></strong> matches, successively, <strong>all</strong> the characters of the <strong>FIRST</strong> line. With <strong>François</strong>’s version, it only matches, as expected, the <strong>FIRST</strong> character of the current file</p>
</li>
<li>
<p dir="auto">It doesn’t search and replace characters which are <strong>outside</strong> the Basic Multilingual Plane ( <strong>BMP</strong> ). For instance, <em>in a full <strong>UTF-8</strong> file</em> ( with a <strong>BOM</strong> ), if SEARCH = <strong><code>\x{104A5}\x{20AC}</code></strong> and REPLACE = <strong><code>\x{A3}\x{10482}</code></strong>, the present regex engine answers <strong>Invalid regular expression</strong> ! whereas <strong>François</strong>’s version does the replacement <strong>correctly</strong> !</p>
</li>
</ul>
<p dir="auto">Note :</p>
<p dir="auto">Of course, for that specific S/R, you need a <strong>font</strong> that can display the <strong>Osmanya</strong> characters, and which is set as the <strong>default style</strong> font, in the <strong>Style Configurator…</strong> dialogue ! For that purpose, download the <strong>Andagii</strong> font at :</p>
<p dir="auto"><a href="http://www.i18nguy.com/unicode/unicode-font.html" rel="nofollow ugc">http://www.i18nguy.com/unicode/unicode-font.html</a></p>
<p dir="auto">and have a look at the <strong>Osmanya</strong> characters at :</p>
<p dir="auto"><a href="http://www.unicode.org/charts/PDF/U10480.pdf" rel="nofollow ugc">http://www.unicode.org/charts/PDF/U10480.pdf</a></p>
<ul>
<li>
<p dir="auto">Now, let’s suppose, for instance, the <strong>French</strong> subject string <strong>Un événement</strong>, on a <strong>new</strong> line, and the simple SEARCH regex <strong><code>\w</code></strong>. After a click on the <strong>Find Next</strong> button, close the Replace dialog, and keep on searching for <strong>word</strong> characters, by hitting the <strong>F3</strong> key. When you’re near the end of the string, start searching <strong>backwards</strong>, by hitting the <strong>SHIFT + F3</strong> key. You’ll notice that it CAN’T go backwards, <strong>past</strong> the <strong>é</strong> character !!! <strong>François</strong>’s version works well, in <strong>both</strong> directions !</p>
</li>
<li>
<p dir="auto">A <strong>last</strong> example : if you try to mark the matches of the simple SEARCH regex <strong><code>(?&lt;=.).</code></strong>, the present regex engine only marks <strong>EVERY OTHER</strong> character. <strong>François</strong>’s version <strong>correctly</strong> finds all characters, except for the <strong>very first</strong> of each line !</p>
</li>
<li>
<p dir="auto"><strong>François-R Boyer</strong> also created a new option <strong>SCFIND_REGEXP_LOCALEORDER</strong>, to get ranges of characters, in a <strong>locale</strong> order, NOT in <strong>Unicode</strong> order. For instance, the regex range <strong><code>[A-B]</code></strong>, <em>with the <strong>Match case</strong> option SET</em>, would match all the following characters <strong>AÀÁÂÃÄÅĀĂĄǍǺẠẢẤẦẨẪẬẮẰẲẴẶǼB</strong>, in a true <strong>UTF-8</strong> file, with a suitable font !</p>
</li>
<li>
<p dir="auto">To end with, the <strong>François-R Boyer</strong>’s version could display the <strong>EXACT error</strong> messages, instead of the generic message <strong>Invalid regular expression</strong>. For instance, the regex <strong><code>(\d+ab</code></strong> would report the <strong>Unmatched marking parenthesis</strong> error message !</p>
</li>
</ul>
<hr />
<p dir="auto">So, <strong>h-h-h-h</strong>, it wouldn’t be worth switching to the <strong>PCRE2</strong> regex engine, <em>while keeping <strong>all</strong> these <strong>issues</strong>.</em> To my mind, we should aim for the <strong>best</strong> regex engine but, also, the <strong>best</strong> replacement tool and the <strong>best</strong> integration into Notepad++ ! Just remember that <strong>François-R Boyer</strong> could produce this <strong>nice</strong> version, with the <strong>present</strong> BOOST library only !</p>
<p dir="auto">I end this post with some links to the <strong>BOOST</strong> library. I haven’t the software abilities to verify these assertions, but I think that, in N++, we currently use the BOOST <strong>v1.55</strong> library, with the <strong>PERL</strong> syntax, and <strong>without</strong> the <strong>Unicode</strong> support !</p>
<p dir="auto">The <strong>Home BOOST C++ Regex</strong> library page can be found at :</p>
<p dir="auto"><a href="http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/index.html" rel="nofollow ugc">http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/index.html</a></p>
<p dir="auto">The BOOST regex <strong>SEARCH</strong> syntax is explained at :</p>
<p dir="auto"><a href="http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html" rel="nofollow ugc">http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html</a></p>
<p dir="auto">And the BOOST-Extended <strong>REPLACEMENT</strong> format syntax can be read at :</p>
<p dir="auto"><a href="http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html" rel="nofollow ugc">http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html</a></p>
<p dir="auto">Seemingly, the <strong>latest</strong> BOOST C++ Regex library version is Boost-Regex <strong>5.0.1</strong> ( Boost-<strong>1.59.0</strong> ). So, the <strong>latest</strong> main page, on <strong>BOOST C++ Regex</strong> library, can be obtained at :</p>
<p dir="auto"><a href="http://www.boost.org/doc/libs/1_59_0/libs/regex/doc/html/index.html" rel="nofollow ugc">http://www.boost.org/doc/libs/1_59_0/libs/regex/doc/html/index.html</a></p>
<p dir="auto">And the <strong>history</strong> of the <strong>BOOST C++ Regex</strong> library is at :</p>
<p dir="auto"><a href="http://www.boost.org/doc/libs/1_59_0/libs/regex/doc/html/boost_regex/background_information/history.html" rel="nofollow ugc">http://www.boost.org/doc/libs/1_59_0/libs/regex/doc/html/boost_regex/background_information/history.html</a></p>
<p dir="auto">Best Regards</p>
<p dir="auto">guy038</p>
<p dir="auto">P.S. :</p>
<ul>
<li>
<p dir="auto">Concerning the <strong>SEARCH</strong> regex documentation, there are a few <strong>typographic</strong> and <strong>syntactic</strong> errors ( which are different for each version ! ). If you still wonder about a <strong>specific</strong> BOOST syntax, I’ll be able to point out all these errors, next time !</p>
</li>
<li>
<p dir="auto">From the <strong>two</strong> links below, I’m going to determine, shortly, ALL the syntaxes, that are <em><strong>NOT SUPPORTED</strong> yet, by the present <strong>BOOST</strong> regex engine</em>, implemented in Notepad++.</p>
</li>
</ul>
<p dir="auto"><a href="http://www.regular-expressions.info/refflavors.html" rel="nofollow ugc">http://www.regular-expressions.info/refflavors.html</a></p>
<p dir="auto"><a href="http://www.regular-expressions.info/refreplace.html" rel="nofollow ugc">http://www.regular-expressions.info/refreplace.html</a></p>
]]></description><link>https://community.notepad-plus-plus.org/post/11069</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/11069</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 17 Oct 2015 17:48:02 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Mon, 07 Sep 2015 00:51:21 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a>:<br />
The syntax <code>(apple)|(lemon)</code>, <code>(?1pear)(?2orange)</code> is a good one. Doesn’t PCRE have something similar? Is it possible with named capturing groups, too?</p>
<p dir="auto">These boost versions you speak of don’t seem to be maintained as well as PCRE2. One year you mentioned was 2013. Further, I think PCRE2 has a better-known syntax. To me this is important. <a href="http://regular-expressions.info" rel="nofollow ugc">regular-expressions.info</a> doesn’t even mention boost.</p>
<p dir="auto">You mentioned positive aspects about PCRE2. So, you aren’t against it?</p>
<p dir="auto">Where did you get the information about the boost regex syntax? I haven’t found a boost  regex documentation.</p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/14">@MAPJe71</a>:<br />
That’s strange because PCRE is also an official name of a regex library with a specific syntax.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/10738</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/10738</guid><dc:creator><![CDATA[h-h-h-h]]></dc:creator><pubDate>Mon, 07 Sep 2015 00:51:21 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Mon, 07 Sep 2015 00:46:19 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/2425">@h-h-h-h</a></p>
<p dir="auto">The term PCRE Search/Replace just states that the regex engine used is PERL compatible.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/10737</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/10737</guid><dc:creator><![CDATA[MAPJe71]]></dc:creator><pubDate>Mon, 07 Sep 2015 00:46:19 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Sun, 06 Sep 2015 17:08:10 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <strong>h-h-h-h</strong>,</p>
<p dir="auto">Concerning your example, with a list of <strong>fruits</strong>, let’s suppose the <strong>four</strong> wanted replacements, below :</p>
<pre><code>apple      -&gt; pear
lemon      -&gt; orange
strawberry -&gt; raspberry
apricot    -&gt; plum
</code></pre>
<p dir="auto">With the current <strong>BOOST</strong> regex engine, used in N++, you can use the following <strong>S/R</strong> :</p>
<p dir="auto">SEARCH = <strong><code>(apple)|(lemon)|(strawberry)|(apricot)</code></strong> and REPLACE = <strong><code>(?3raspberry)(?2orange)(?4plum)(?1pear)</code></strong>. Then, after a click on the <strong>Replace All</strong> button, the list below :</p>
<pre><code>lemon
apricot
apple
strawberry
</code></pre>
<p dir="auto">is changed into :</p>
<pre><code>orange
plum
pear
raspberry
</code></pre>
<p dir="auto">Notes :</p>
<ul>
<li>
<p dir="auto">You’ll notice that, in the <strong>replacement</strong> block, the <strong>conditional</strong> replacements don’t need to be listed in the <strong>same order</strong> as in the <strong>search</strong> block.</p>
</li>
<li>
<p dir="auto">A trick : if a <strong>replacement</strong> string, for instance, relative to the group <strong>#5</strong>, begins with a <strong>number</strong> and contains <strong>parentheses</strong>, you can use the syntax <strong><code>(?{5}123\(abc\))</code></strong></p>
</li>
</ul>
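<p dir="auto">To make the mechanism of that fruit S/R easier to follow, here is a minimal sketch of the same idea in Python. It does not use the BOOST conditional syntax (Python’s <code>re</code> module has none); instead, a hypothetical <code>swap_fruit</code> callback looks up which group matched and returns the replacement for it.</p>

```python
import re

# Group number mapped to its replacement, mirroring
# (?1pear)(?2orange)(?3raspberry)(?4plum); the order of entries is irrelevant.
replacements = {3: "raspberry", 2: "orange", 4: "plum", 1: "pear"}

def swap_fruit(match):
    # Exactly one group takes part in each match, and lastindex is its number.
    return replacements[match.lastindex]

text = "lemon\napricot\napple\nstrawberry"
result = re.sub(r"(apple)|(lemon)|(strawberry)|(apricot)", swap_fruit, text)
print(result)
```

<p dir="auto">As in the BOOST version, the replacements are keyed by group number, so they can be listed in any order.</p>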
<hr />
<ul>
<li>
<p dir="auto"><strong>Free Spacing</strong> is ALLOWED, with the BOOST regex library. For instance, the search regex <strong><code>(?x) ( S \. O \. S \. ) \  \- \  \1</code></strong>, or the regex <strong><code>(?x) ( S \. O \. S \. ) [ ] \- [ ] \1</code></strong>, both match the subject string <strong>S.O.S. - S.O.S.</strong></p>
</li>
<li>
<p dir="auto">The two <strong>BOOST</strong> syntaxes <strong><code>(?#.......)</code></strong> and <strong><code>(?x)...#.........</code></strong> define a <strong>COMMENT</strong> string. For instance, the <strong>five</strong> regexes, below, are <strong>equivalent</strong> to the simple regex <strong><code>T+CA</code></strong> :</p>
</li>
</ul>
<p dir="auto"><strong><code>T+(?# UPPER T, 1 or MORE times)CA</code></strong></p>
<p dir="auto"><strong><code>T+CA(?#UPPER T, 1 or MORE times, followed with 'CA' )</code></strong></p>
<p dir="auto"><strong><code>(?x) T+      (?# UPPER T, 1 or MORE times) CA</code></strong></p>
<p dir="auto"><strong><code>(?x) T+ CA   (?# UPPER T, 1 or MORE times, followed with 'CA' )</code></strong></p>
<p dir="auto"><strong><code>(?x) T+ C A   #  UPPER T, 1 or MORE times, followed with 'CA'</code></strong></p>
<hr />
<p dir="auto">I quite agree with your GitHub issue <strong>#565</strong>. But, presently, it would <strong>still</strong> be, like below !!</p>
<pre><code> •- Search mode -------------------------•
 | ( ) Normal                            |
 | ( ) Extended (\r, \n, \t, \x..., \0)  |
 | (•) Regular expression (BOOST 1.55.0) |
 |     [ ] . matches newline             |
 •---------------------------------------•
</code></pre>
<p dir="auto">Have you tried <strong>François-R Boyer</strong>’s version ? It’s a very <strong>powerful</strong> one !</p>
<p dir="auto">BTW, as my <strong>present</strong> knowledge of <strong>C/C++</strong> is close to <strong>zero</strong>, it would be nice if someone could merge <strong>François-R Boyer</strong>’s <strong>improved</strong> version of the N++ regex engine into the present <strong>SciLexer.dll</strong> file, based on <strong>Scintilla v3.3.4</strong> !</p>
<p dir="auto">And, generally speaking, could someone find a way to <strong>include</strong> that <strong>improved</strong> version, whatever the versions of <strong>N++</strong> and <strong>Scintilla</strong> are ?</p>
<p dir="auto">Cheers,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/10733</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/10733</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 06 Sep 2015 17:08:10 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Sun, 06 Sep 2015 15:37:13 GMT]]></title><description><![CDATA[<blockquote>
<p dir="auto">BTW, this post is quite long and not easily readable</p>
</blockquote>
<p dir="auto">Indeed. Also, I don’t have this much insight to compare boost and PCRE2. Before starting this thread, I wasn’t even aware of the usage of the boost library because the Notepad++ website states using PCRE. PCRE2 is just the future of PCRE.</p>
<p dir="auto">You can read a feature that’s surely missing in the boost library on the issue page: <a href="https://github.com/notepad-plus-plus/notepad-plus-plus/issues/816" rel="nofollow ugc">https://github.com/notepad-plus-plus/notepad-plus-plus/issues/816</a>.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/10719</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/10719</guid><dc:creator><![CDATA[h-h-h-h]]></dc:creator><pubDate>Sun, 06 Sep 2015 15:37:13 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Sun, 06 Sep 2015 15:25:42 GMT]]></title><description><![CDATA[<p dir="auto">Hello <strong>h-h-h-h</strong>, <strong>milipili</strong> and <strong>All</strong>,</p>
<p dir="auto">So, <strong>h-h-h-h</strong> and <strong>milipili</strong>, you would prefer to switch to the <strong>PCRE2 regex</strong> library. May I ask what your <strong>main reasons</strong> are?</p>
<p dir="auto">Below, I tried to examine some differences between the <strong>BOOST</strong> and <strong>PCRE</strong> regex libraries and, to my mind, <strong>we don’t lack important features</strong> by keeping our <strong>present</strong> regex engine! So, it’s up to you to tell me where I could be <strong>wrong</strong> :-)</p>
<p dir="auto">BTW, this post is quite <strong>long</strong> and not easily readable :-(( So, have a <strong>drink</strong> and begin reading this damned post !!!</p>
<p dir="auto">I end with a link to an <strong>improved</strong> version of our <strong>BOOST</strong> regex library, created by <strong>François-R Boyer</strong> in <strong>May 2013</strong>, which could be good enough for most N++ users!?</p>
<hr />
<p dir="auto">At the bottom of the <strong>Wikipedia</strong> article below,</p>
<p dir="auto"><a href="https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions" rel="nofollow ugc">https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions</a></p>
<p dir="auto">there is a description of some differences between <strong>PCRE</strong> and <strong>PERL</strong> regular expressions.</p>
<p dir="auto">Below, I’ll try to test the current <strong>Boost Regex</strong> library, <strong>v1.55</strong>, included in N++ by <strong>Dave Brotherstone</strong> since version <strong>6.0</strong>, against the given differences!</p>
<p dir="auto">Given a slight modification of the <strong>first</strong> example, the regex <strong><code>^(&lt;(?:[^&lt;!&gt;]+|(?2)|(?1))*&gt;)(!&gt;!&gt;!&gt;)$</code></strong> does match the subject string <strong>&lt;&lt;&lt;&lt;!&gt;!&gt;!&gt;&gt;&gt;&gt;&lt;&gt;&gt;!&gt;!&gt;!&gt;</strong>.</p>
<p dir="auto">The process can be split in :</p>
<pre><code>    &lt; &lt; &lt; &lt; !&gt;!&gt;!&gt; &gt; &gt; &gt;    &lt;&gt;  &gt;    !&gt;!&gt;!&gt;
4         ----------
3       --------------
1     ------------------    --
0   -----------------------------    ------
</code></pre>
<p dir="auto">Although the FIRST alternative <strong><code>[^&lt;!&gt;]</code></strong> of the NON capturing group can <strong>NEVER</strong> match in the subject string, either at level <strong>0</strong>, <strong>outside</strong> recursion, or at <strong>higher</strong> levels, in recursion, the TWO other alternatives ( the <strong>called</strong> subpattern <strong><code>(?2)</code></strong>, i.e. <strong>!&gt;!&gt;!&gt;</strong>, and the <strong>recursive</strong> subpattern <strong><code>(?1)</code></strong> ) have also been tried by the regex engine during the RECURSION process.</p>
<p dir="auto">Therefore, seemingly, with the BOOST regex library, <em>RECURSIVE matches are <strong>NON atomic</strong></em>, like in PERL and UNLIKE PCRE.</p>
<hr />
<p dir="auto">Let’s consider the Search-Replacement SEARCH = <strong><code>^(a(b|c){0,3})+$</code></strong> and REPLACE = <strong><code>&gt;\1&lt;&gt;\2&lt;</code></strong>.</p>
<p dir="auto">Against the subject string <strong>abbababbaccca</strong>, we obtain the replacement string <strong>&gt;a&lt;&gt;c&lt;</strong>.</p>
<p dir="auto">So, with the BOOST regex library, like in PCRE and unlike PERL, any quantified capture group <em>whose LOWER limit is <strong>0</strong></em> keeps the <strong>last NON NULL</strong> value matched by that group, <em><strong>EVEN IF</strong> the last iteration of the match DOESN’T include that group.</em></p>
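<p dir="auto">For comparison only, here is a minimal sketch of the same test in Python. It assumes Python’s built-in <code>re</code> module, which is neither BOOST nor PCRE, so it only illustrates the kind of behaviour described above and says nothing about the N++ engine itself:</p>

```python
import re

# Same search regex and subject string as in the post above.
pattern = re.compile(r"^(a(b|c){0,3})+$")
match = pattern.match("abbababbaccca")

# Group 1 holds the last iteration of the outer group.
# Group 2 did not take part in that last iteration; print what it still holds.
print(match.group(1), match.group(2))
```

<p dir="auto">If group 2 still prints a value here, the engine keeps the capture from an earlier iteration, as reported above for BOOST and PCRE.</p>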
<hr />
<p dir="auto">Any of the <strong>backtracking control</strong> verbs, (*FAIL), (*F), (*PRUNE), (*SKIP), (*THEN), (*COMMIT), (*ACCEPT) and (*:NAME), inside a regex pattern, triggers the <strong>invalid regular expression</strong> message in the Replace dialog.</p>
<p dir="auto">So, the <strong>backtracking control</strong> verbs are <strong>NOT</strong> allowed, with the current N++ BOOST regex library.</p>
<hr />
<p dir="auto">The regex <strong><code>(?&lt;A456&gt;\d+)\l+\g&lt;A456&gt;</code></strong> does match the following strings :</p>
<pre><code>123text123
0text0
99999text99999
</code></pre>
<p dir="auto">But, the regex <strong><code>(?&lt;456&gt;\d+)\l+\g&lt;456&gt;</code></strong> is considered as an <strong>INVALID</strong> regular expression.</p>
<p dir="auto">So, with the BOOST regex library, like in PERL and unlike PCRE, <strong>names</strong> of capture groups must NOT be <strong>TRUE</strong> numbers.</p>
<hr />
<p dir="auto">The form <strong><code>(?!.*s{3,5}).+</code></strong>, that you may test against the example text below, is a <strong>valid</strong> regex, in N++.</p>
<pre><code>aaaa
aaaaaas123
aaass123456
aaaaaaaaaaasss78
assss0000000000000
aaaaaasssss99999
</code></pre>
<p dir="auto">Then, with the BOOST regex library, <strong>NEGATIVE look-ahead</strong> can, seemingly, contain <strong>quantifiers</strong>.</p>
<hr />
<p dir="auto">On my old <strong>Win XP</strong> laptop ( with only <strong>1 GB</strong> of RAM ! ), the regex <strong><code>(.+)+X</code></strong> does match the following TWO strings</p>
<pre><code>cccccXaaaa
cccccXaaaaaaaaaaaaaaa
</code></pre>
<p dir="auto">but, <strong>wrongly</strong>, selects the <strong>WHOLE</strong> file, with the <strong>longer</strong> subject string, below :</p>
<pre><code>cccccXaaaaaaaaaaaaaaaaaaaaaaaaa
</code></pre>
<p dir="auto">This is due to the <strong>many</strong> matching attempts caused by the <strong>combination</strong> of the two <strong>PLUS</strong> quantifiers, during <strong>backtracking</strong> from the <strong>end</strong> of the subject string to the <strong>X</strong> character. Of course, the <strong>limit</strong> between these <strong>two</strong> behaviours may change, according to your technical configuration !</p>
<p dir="auto">Therefore, like PCRE and unlike PERL, seemingly, the BOOST regex library has a <strong>HARD limit</strong> in <strong>recursion depth</strong>.</p>
<p dir="auto">Just compare with the simpler regex <strong><code>(.+)X</code></strong>, which works perfectly, whatever the <strong>length</strong> of the subject string.</p>
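<p dir="auto">To get a feeling for how fast the attempts pile up, here is a rough sketch in Python. It is only a model, not the BOOST algorithm: it counts the ways a run of characters after the <strong>X</strong> can be cut into consecutive non-empty pieces, which is roughly what the two nested <strong>PLUS</strong> quantifiers allow a backtracking engine to try. The function name <code>count_splits</code> and the run lengths are made up for the illustration:</p>

```python
def count_splits(run_length):
    """Count the ways to cut a run of characters into consecutive non-empty pieces."""
    if run_length == 0:
        return 1  # nothing left to cut: exactly one (empty) way
    # The first piece takes 1..run_length characters; the rest is cut recursively.
    return sum(count_splits(run_length - first) for first in range(1, run_length + 1))

# Brute-force count for two short runs (illustrative lengths).
for run_length in (4, 15):
    print(run_length, count_splits(run_length))

# The count doubles with each extra character, so a run of 25 is computed directly.
print(25, 2 ** (25 - 1))
```

<p dir="auto">The count doubles with every extra character, which fits the observation that a few more characters are enough to go from an instant match to a failure.</p>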
<hr />
<p dir="auto">Now, from the link, below,</p>
<p dir="auto"><a href="http://www.rexegg.com/pcre-documentation.html" rel="nofollow ugc">http://www.rexegg.com/pcre-documentation.html</a></p>
<p dir="auto">if I test the BOOST regex library on all the points listed on that page, beginning with the <strong>oldest</strong>, the <strong>missing</strong> features, compared to true <strong>PCRE</strong> patterns, <em>and NOT previously discussed,</em> are the following :</p>
<ul>
<li>
<p dir="auto">The inline modifier <strong><code>(?U)</code></strong>, to turn on the <strong>ungreedy</strong> mode, is absent. Therefore we need, systematically, to add the <strong>question mark</strong> character, <strong>after</strong> a quantifier, to get an <strong>ungreedy</strong> behaviour, in regular expressions.</p>
</li>
<li>
<p dir="auto">The <strong>named</strong> groups, written <strong><code>(?P&lt;foo&gt;....)</code></strong>, are not allowed, nor are the <strong>back-references</strong> <strong><code>(?P=foo)</code></strong>. However, these forms can be changed, with BOOST, into <strong><code>(?&lt;foo&gt;....)</code></strong> and the back-references <strong><code>\g&lt;foo&gt;</code></strong> or <strong><code>\k&lt;foo&gt;</code></strong>.</p>
</li>
<li>
<p dir="auto">The <strong>callouts</strong> <strong><code>(?C#)</code></strong> and <strong><code>(?C'abc')</code></strong>, which can call an <strong>external function</strong>, are, seemingly, <strong>NOT</strong> supported by the BOOST regex library, but they would be rather <strong>useless</strong> anyway in the simple <strong>S/R</strong> dialog of Notepad++.</p>
</li>
<li>
<p dir="auto">The form <strong><code>\C</code></strong>, which matches a <strong>single</strong> byte, <strong>EVEN</strong> in <strong>UTF-8</strong> mode, doesn’t work and, with the BOOST regex library, is just an equivalent to the <strong>DOT</strong> special character. They, both, stand for <strong><code>[^\n\f\r]</code></strong></p>
</li>
</ul>
<p dir="auto">Using <strong>PCRE</strong>, a <strong>safe</strong> syntax to manage the <strong>individual UTF-8</strong> bytes of characters, could be the following regex :</p>
<p dir="auto"><strong><code>(?x) (?| (?=[\x00-\x7f])(\C) | (?=[\x80-\x{7ff}])(\C)(\C) | (?=[\x{800}-\x{ffff}])(\C)(\C)(\C) | (?=[\x{10000}-\x{1fffff}])(\C)(\C)(\C)(\C))</code></strong></p>
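<p dir="auto">The four code-point ranges of that regex correspond to the four UTF-8 encoding lengths. As a small cross-check, sketched in Python rather than in PCRE, and with sample characters that are arbitrary choices of mine:</p>

```python
# One sample character per UTF-8 length class (the samples are arbitrary choices).
samples = {
    "A": "\u0041",           # U+0041, in the range 00-7F
    "e-acute": "\u00e9",     # U+00E9, in the range 80-7FF
    "euro sign": "\u20ac",   # U+20AC, in the range 800-FFFF
    "emoji": "\U0001f600",   # U+1F600, at or above 10000
}

for name, char in samples.items():
    encoded = char.encode("utf-8")
    # Print the code point, the number of UTF-8 bytes and the bytes themselves.
    print(name, hex(ord(char)), len(encoded), encoded.hex())
```

<p dir="auto">Each sample should print one more byte than the previous one, matching the number of <code>(\C)</code> groups in the corresponding alternative.</p>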
<ul>
<li>
<p dir="auto">The <strong>[negative] Unicode categories</strong> forms, as <strong><code>\p{L}</code></strong> or <strong><code>\P{Nd}</code></strong>, and the <strong>Unicode script names</strong> forms, as <strong><code>\p{Arabic}</code></strong>, are <strong>NOT</strong> supported in the BOOST regex library, because, in N++, it has been compiled <em><strong>WITHOUT</strong> the <strong>“Unicode character property support”</strong></em>. However, note that matching characters by <strong>Unicode</strong> property isn’t very fast, because the engine has to search in a structure of <strong>over 15000</strong> characters, even in the Basic Multilingual Plane ( <strong>BMP</strong> ) only !</p>
</li>
<li>
<p dir="auto">Unlike PCRE, from <strong>v7.20</strong>, the <strong>conditional relative capture groups</strong> are NOT allowed with the BOOST regex library. For instance the BOOST regex <strong><code>(\d)?\d : (?(1)two|one) digit(?(1)s) number</code></strong> does match the two strings <strong>23 : two digits number</strong> and <strong>5 : one digit number</strong>, but the regex <strong><code>(\d)?\d : (?(-1)two|one) digit(?(-1)s) number</code></strong> is an INVALID regular expression ! Luckily, this form isn’t used very often and there are plenty of <strong>equivalent</strong> regexes. For instance, the above example could be simply rewritten : <strong><code>\d(\d : two digits| : one digit) number</code></strong> !</p>
</li>
<li>
<p dir="auto">The <strong>Line Break</strong> modifiers ( <strong><code>(*CR)</code></strong>, <strong><code>(*LF)</code></strong>, <strong><code>(*CRLF)</code></strong>, <strong><code>(*ANYCRLF)</code></strong> and <strong><code>(*ANY)</code></strong> ), as well as the <strong>BSR</strong> modifiers ( <strong><code>(*BSR_ANYCRLF)</code></strong> and <strong><code>(*BSR_UNICODE)</code></strong> ), the <strong>UTF</strong> modifiers ( <strong><code>(*UTF)</code></strong>, <strong><code>(*UTF8)</code></strong>, <strong><code>(*UTF16)</code></strong> and <strong><code>(*UTF32)</code></strong> ) and the <strong>Unicode</strong> modifier <strong><code>(*UCP)</code></strong> don’t exist in the BOOST regex library.</p>
</li>
<li>
<p dir="auto">The syntax <strong><code>\N</code></strong>, which matches, in PCRE, any character <strong>other</strong> than a <strong>line break</strong>, <strong>EVEN</strong> when the <strong><code>(?s)</code></strong> modifier begins the regex, is an INVALID form in the BOOST regex library. The same result can be obtained, in N++, with the regex <strong><code>[^\n\r]</code></strong>.</p>
</li>
<li>
<p dir="auto">Finally, all the new <strong>options</strong> and <strong>control</strong> verbs, starting with the new API, <strong>PCRE2</strong>, are <strong>NOT</strong> supported, in the BOOST regex library !</p>
</li>
</ul>
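<p dir="auto">Regarding the <strong>conditional</strong> groups mentioned in the list above, the absolute form and its simpler rewrite can be compared quickly outside N++. This is a minimal sketch with Python’s <code>re</code> module, used here only because it also accepts the <code>(?(1)...)</code> syntax; it is not the BOOST engine, and the third subject string is an extra negative case that I added:</p>

```python
import re

# The conditional regex from the list above and its simpler rewrite.
conditional = re.compile(r"(\d)?\d : (?(1)two|one) digit(?(1)s) number")
rewritten = re.compile(r"\d(\d : two digits| : one digit) number")

subjects = [
    "23 : two digits number",
    "5 : one digit number",
    "23 : one digit number",  # added negative case: wording does not fit the number
]

for subject in subjects:
    # Print whether each regex matches the whole subject string.
    print(subject, bool(conditional.fullmatch(subject)), bool(rewritten.fullmatch(subject)))
```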
<hr />
<p dir="auto">To my mind and to sum up, except for the <strong><code>\C</code></strong> syntax and, maybe, the <strong>line break</strong> and the <strong>encoding</strong> modifiers, we aren’t missing <strong>major</strong> features with the current <strong>BOOST</strong> regex library.</p>
<p dir="auto">On the contrary, if we moved to the <strong>PCRE2</strong> library, we would likely lose <strong>two</strong> main features of the BOOST regex library, used in the <strong>Replacement</strong> part :</p>
<ul>
<li>
<p dir="auto">The <strong>CONDITIONAL replacements</strong> ( <strong><code>(?#...)</code></strong> and <strong><code>(?#...:...)</code></strong>, where <strong>#</strong> stands for a group number ). Let’s suppose a list of <strong>names</strong> and <strong>ages</strong>, below :</p>
<pre><code>Peter
35
John
52
Marie
18
</code></pre>
</li>
</ul>
<p dir="auto">If the search regex is <strong><code>^(\d+)?.+$</code></strong> and the replacement <strong><code>(?1Age :Name) : $&amp;</code></strong>, at once, that list is, <strong>magically</strong>, changed into :</p>
<pre><code>Name : Peter
Age  : 35
Name : John
Age  : 52
Name : Marie
Age  : 18
</code></pre>
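<p dir="auto">In an engine without conditional replacements, the same result needs a bit of code. Here is a minimal sketch in Python, assuming its built-in <code>re</code> module and a replacement function; the helper name <code>label</code> is made up for the example:</p>

```python
import re

text = "Peter\n35\nJohn\n52\nMarie\n18"

def label(match):
    # Group 1 is set on the age lines of this sample and unset on the name lines.
    prefix = "Age " if match.group(1) is not None else "Name"
    return prefix + " : " + match.group(0)

# Same search regex as above; the function plays the role of (?1Age :Name) : $&
result = re.sub(r"^(\d+)?.+$", label, text, flags=re.MULTILINE)
print(result)
```

<p dir="auto">So the job can be done elsewhere, but not directly from a simple Replace dialog.</p>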
<ul>
<li>The <strong>case</strong> modifiers ( <strong><code>\U</code></strong>, <strong><code>\L</code></strong>, <strong><code>\u</code></strong>, <strong><code>\l</code></strong> and  <strong><code>\E</code></strong> ). For instance, given the subject string <strong>ShaKesPeare wiLLiam</strong>, the SEARCH regex <strong><code>(\w+) (\w)(\w+)</code></strong> and the REPLACEMENT syntax <strong><code>\U\1 \2\L\3</code></strong>, we obtain the string <strong>SHAKESPEARE William</strong></li>
</ul>
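<p dir="auto">Likewise, the <strong>case</strong> modifiers have to be emulated where the replacement syntax doesn’t offer them. A minimal sketch in Python, whose <code>re</code> replacement templates have no case modifiers, so a function does the work; the helper name <code>recase</code> is made up for the example:</p>

```python
import re

def recase(match):
    # \U\1 \2 : groups 1 and 2 in upper case; \L\3 : group 3 in lower case.
    return match.group(1).upper() + " " + match.group(2).upper() + match.group(3).lower()

# Same search regex and subject string as in the case-modifier example above.
result = re.sub(r"(\w+) (\w)(\w+)", recase, "ShaKesPeare wiLLiam")
print(result)
```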
<hr />
<p dir="auto">On this page, below, relative to the new <strong>PCRE2</strong> version,</p>
<p dir="auto"><a href="https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html" rel="nofollow ugc">https://lists.exim.org/lurker/message/20150105.162835.0666407a.en.html</a></p>
<p dir="auto">it is said, at point #5 :</p>
<blockquote>
<p dir="auto">Patterns, subject strings, and replacement strings may <strong>all</strong> contain <strong>binary<br />
zeros</strong> and, for this reason, are always passed as a pointer and a length.</p>
</blockquote>
<p dir="auto">Presently, the <strong>BOOST</strong> regex library can deal with <strong>NUL</strong> characters in <strong>subject</strong> strings and in <strong>search</strong> regexes, but they are <strong>NOT ALLOWED</strong> in <strong>replacement</strong> strings.</p>
<p dir="auto">Luckily, if you install the improved <strong>François-R Boyer</strong> version, of the BOOST regex engine, you’ll get some <strong>strong</strong> new features :</p>
<ul>
<li>
<p dir="auto">Search is performed in <strong>32 bits</strong> code-points, so it can handle characters <strong>beyond the BMP</strong> ( Basic Multilingual Plane ). An interesting feature for many <strong>Asian</strong> users !</p>
</li>
<li>
<p dir="auto">It can manage <strong>NUL</strong> characters, both, in <strong>search</strong> and <em>in <strong>replacement</strong></em>, too.</p>
</li>
<li>
<p dir="auto"><strong>Look-behinds</strong> are correctly handled, even in case of <strong>OVERLAPPING</strong>, with the end of the <strong>previous</strong> match.</p>
</li>
<li>
<p dir="auto">It can handle <strong>ALL</strong> the Universal Character Names ( <strong>UCN</strong>) of the <strong>UCS Transformation Format</strong> , from <strong><code>\x{0}</code></strong> to <strong><code>\x{7FFFFFFF}</code></strong>, particularly, all those of code-points over <strong><code>\x{FFFF}</code></strong>, which are outside the <strong>BMP</strong>.</p>
</li>
<li>
<p dir="auto">The <strong>backward</strong> regex search isn’t <strong>stopped</strong>, on matching a character, with <strong>Unicode</strong> code-point over <strong><code>\x{00FF}</code></strong></p>
</li>
</ul>
<hr />
<p dir="auto">To get this <strong>Beta N++ regex code</strong> ( that has <strong>NEVER</strong> been part of an <strong>official</strong> N++ release ) :</p>
<ul>
<li>
<p dir="auto">Rename your present <strong>SciLexer.dll</strong> file as, for instance, <strong>SciLexer.xxx</strong></p>
</li>
<li>
<p dir="auto">Download, from the link below, the modified <strong>SciLexer.dll</strong> file, by <strong>François-R Boyer</strong></p>
</li>
</ul>
<p dir="auto"><a href="http://sourceforge.net/projects/npppythonplugsq/files/Beta%20N%2B%2B%20regex%20code/" rel="nofollow ugc">http://sourceforge.net/projects/npppythonplugsq/files/Beta N%2B%2B regex code/</a></p>
<ul>
<li>Copy this file into the <strong>installation</strong> folder, along with the <strong>Notepad++.exe</strong> and the <strong>SciLexer.xxx</strong> files</li>
</ul>
<p dir="auto"><strong>IMPORTANT</strong> :</p>
<p dir="auto">Don’t forget that this modified <strong>SciLexer.dll</strong>, built in <strong>May 2013</strong>, <em>is based on the old <strong>Scintilla v2.2.7</strong> !</em></p>
<hr />
<p dir="auto">Thank you very much for <strong>still</strong> being here and quite <strong>awake</strong> !!!</p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/10708</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/10708</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sun, 06 Sep 2015 15:25:42 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Sat, 29 Aug 2015 11:16:03 GMT]]></title><description><![CDATA[<p dir="auto">@milipili said:</p>
<blockquote>
<p dir="auto">Actually we would like to get rid of boost::regex and to directly use pcre.</p>
</blockquote>
<p dir="auto">That’s good news!</p>
]]></description><link>https://community.notepad-plus-plus.org/post/10430</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/10430</guid><dc:creator><![CDATA[h-h-h-h]]></dc:creator><pubDate>Sat, 29 Aug 2015 11:16:03 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Fri, 28 Aug 2015 21:41:46 GMT]]></title><description><![CDATA[<p dir="auto"><em>Note:</em> Here’s the issue: <a href="https://github.com/notepad-plus-plus/notepad-plus-plus/issues/816" rel="nofollow ugc">https://github.com/notepad-plus-plus/notepad-plus-plus/issues/816</a>.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/10422</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/10422</guid><dc:creator><![CDATA[h-h-h-h]]></dc:creator><pubDate>Fri, 28 Aug 2015 21:41:46 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Fri, 28 Aug 2015 13:39:00 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/154">@gerdb42</a> said:</p>
<blockquote>
<p dir="auto">Notepad++ does not use the lib from <a href="http://www.pcre.org" rel="nofollow ugc">www.pcre.org</a>. It uses <a href="http://www.boost.org" rel="nofollow ugc">boost::regex</a>, which is completely unrelated to PCRE.</p>
</blockquote>
<p dir="auto">Why then does <a href="https://notepad-plus-plus.org/features/" rel="nofollow ugc">the features page</a> say:</p>
<blockquote>
<p dir="auto">PCRE (Perl Compatible Regular Expression) Search/Replace</p>
</blockquote>
<p dir="auto">Is it just an implementation of the exact same rules?</p>
]]></description><link>https://community.notepad-plus-plus.org/post/10416</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/10416</guid><dc:creator><![CDATA[h-h-h-h]]></dc:creator><pubDate>Fri, 28 Aug 2015 13:39:00 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Fri, 28 Aug 2015 10:47:06 GMT]]></title><description><![CDATA[<p dir="auto">Notepad++ does not use the lib from <a href="http://www.pcre.org" rel="nofollow ugc">www.pcre.org</a>. It uses <a href="http://www.boost.org" rel="nofollow ugc">boost::regex</a>, which is completely unrelated to PCRE.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/10413</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/10413</guid><dc:creator><![CDATA[gerdb42]]></dc:creator><pubDate>Fri, 28 Aug 2015 10:47:06 GMT</pubDate></item><item><title><![CDATA[Reply to Is it planned to switch to PCRE2? on Fri, 28 Aug 2015 02:46:14 GMT]]></title><description><![CDATA[<p dir="auto">Okay. I’ve done that…</p>
]]></description><link>https://community.notepad-plus-plus.org/post/10404</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/10404</guid><dc:creator><![CDATA[h-h-h-h]]></dc:creator><pubDate>Fri, 28 Aug 2015 02:46:14 GMT</pubDate></item></channel></rss>