Hi Alan and MapJe71,
Thanks, MapJe71, for the link about Word Boundaries, from the definitive site about regular expressions ! Of course, Alan, I know the differences between the three assertions : \b , \< and \>. I just preferred not to mention them at first, in order to stay focused on your problem !
In short, the \b assertion acts either as a \< assertion OR as a \> assertion, depending on the position. This explains why the regex \<WORD\> can simply be replaced by the regex \bWORD\b.
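As a quick cross-check outside Notepad++, here is a minimal Python sketch of that idea. Python's `re` module is a different engine from Boost and has no \< or \> syntax, so the two lookaround patterns below are only stand-ins for them (an assumption of this sketch), and the helper name `positions` is made up for the illustration.

```python
import re

# Stand-ins for the start-of-word and end-of-word assertions.
# Assumption: Python's `re` has no \< or \>, so lookarounds emulate them.
START_OF_WORD = r'(?<!\w)(?=\w)'   # plays the role of \<
END_OF_WORD = r'(?<=\w)(?!\w)'     # plays the role of \>

def positions(pattern, text):
    """Return the set of offsets where a zero-width pattern matches."""
    return {m.start() for m in re.finditer(pattern, text)}

text = "abc def"
boundaries = positions(r'\b', text)
starts = positions(START_OF_WORD, text)
ends = positions(END_OF_WORD, text)

print(sorted(boundaries))  # [0, 3, 4, 7]
print(sorted(starts))      # [0, 4]
print(sorted(ends))        # [3, 7]

# Every \b position is either a start-of-word or an end-of-word position.
print(boundaries == starts | ends)  # True
```

In this engine, at least, the \b positions are exactly the union of the two other sets, which is why \bWORD\b and \<WORD\> are expected to find the same text.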
BTW, in the Word Boundaries table, I noticed the POSIX word boundaries ( [[:<:]] and [[:>:]] ), which have exactly the same meaning as the GNU word boundaries ( \< and \> ). These syntaxes do work with the N++ Boost regex engine ! Unfortunately, Alan, the problem that you noticed occurs with the POSIX word boundaries, too :-((.
On top of that, the LAST row of the “Word Boundaries” table, named Word Boundaries behaviour, says that “word boundaries” are not correctly handled, in most regex engines :
Word boundaries always match at the start of the match attempt if that position is followed by a word character, regardless of the character that precedes the start of the match attempt. (Thus, word boundaries are not handled correctly for the second and following match attempts in the same string.)
And it shows an example :
\b. matches all of the letters but not the space when iterating over all matches, in the string “abc def”
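To make that quoted behaviour concrete, here is a small Python sketch. It only simulates the described flaw by restarting every match attempt on the remaining substring, so that the engine "forgets" the preceding character. This is an assumption about the mechanism, not a claim about how Boost is implemented, and both function names are hypothetical.

```python
import re

def reference_matches(pattern, text):
    """Matches as found by Python's `re`, which sees the whole string."""
    return [m.group() for m in re.finditer(pattern, text)]

def truncated_matches(pattern, text):
    """Simulate an engine that ignores the text before each match attempt."""
    regex = re.compile(pattern)
    found = []
    pos = 0
    while pos <= len(text):
        # Searching a slice hides everything located before `pos`.
        m = regex.search(text[pos:])
        if m is None:
            break
        found.append(m.group())
        # Move past the match; step one character after an empty match at 0.
        pos += m.end() if m.end() > 0 else 1
    return found

text = "abc def"
print(reference_matches(r'\b.', text))  # ['a', ' ', 'd']
print(truncated_matches(r'\b.', text))  # ['a', 'b', 'c', 'd', 'e', 'f']
```

The second result reproduces the symptom from the quote : every letter is matched, but the space is not.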
So, I did some tests ( again !! )
I copied this single sentence, below, part of the license.txt file, in a new tab
By contrast, the GNU General Public License is intended to guarantee your freedom...
In the Find dialog, I left the Match case and the . matches newline options UNCHECKED
I selected, of course, the Regular expression search mode
I tested the different regexes, below, against the example text
REMARK : In the table, below, each dash character, under the sentence, indicates a match of the corresponding regex(es) !
========================================================================================================================
| REGEXES | EXAMPLE text - MATCHES noted by a DASH character | RESULTS |
========================================================================================================================
| | | |
| | By contrast, the GNU General Public License is intended to guarantee your freedom... | INCORRECT ! |
| (^|(?<!\w)). | ------------------------------------------------------------------------------------ | |
| | | |
+-----------------+--------------------------------------------------------------------------------------+-------------+
| | | |
| \b. | | |
| \<. | | |
| [[:<:]]. | | |
| | | |
| | By contrast, the GNU General Public License is intended to guarantee your freedom... | INCORRECT ! |
| \b\w | -- -------- --- --- ------- ------ ------- -- -------- -- --------- ---- ------- | |
| \<\w | | |
| [[:<:]]\w | | |
| (^|(?<!\w))\w | | |
| | | |
+-----------------+--------------------------------------------------------------------------------------+-------------+
| | | |
| | By contrast, the GNU General Public License is intended to guarantee your freedom... | INCORRECT |
| (^|(?<=\W)). | - - - - - - - - - - - - - - | |
| | | |
+-----------------+--------------------------------------------------------------------------------------+-------------+
| | | (At last !) |
| | By contrast, the GNU General Public License is intended to guarantee your freedom... | |
| (^|(?<=\W))\w | - - - - - - - - - - - - - | CORRECT |
| | | |
==================+======================================================================================+==============
| | | |
| | By contrast, the GNU General Public License is intended to guarantee your freedom... | INCORRECT ! |
| .\b | -- - - -- -- -- -- -- -- -- -- -- -- - | |
| | | |
+-----------------+--------------------------------------------------------------------------------------+-------------+
| | | |
| | | |
| .((?=\W)|$) | By contrast, the GNU General Public License is intended to guarantee your freedom... | INCORRECT ! |
| .((?!\w)|$) | - -- - - - - - - - - - - ---- | |
| | | |
| | | |
+-----------------+--------------------------------------------------------------------------------------+-------------+
| | | |
| .\> | | |
| .[[:>:]] | | |
| | | |
| \w\b | By contrast, the GNU General Public License is intended to guarantee your freedom... | CORRECT |
| \w\> | - - - - - - - - - - - - - | |
| \w[[:>:]] | | |
| \w((?=\W)|$) | | |
| \w((?!\w)|$) | | |
| | | |
========================================================================================================================
From that table, it is obvious that the handling of these assertions, by the N++ Boost engine, is quite weird !!!
To be consistent, only two regexes, with similar syntax, should be used :
The regex (^|(?<=\W))\w, which matches the FIRST character of a word
The regex \w((?=\W)|$), which matches the LAST character of a word
=> The regex (^|(?<=\W))\w|\w((?=\W)|$) matches the first AND the last characters of a word
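For anyone who wants to check these three regexes outside Notepad++, here is a minimal Python sketch run against the same sentence. Python's `re` is not the Boost engine, so this only shows what the patterns are meant to match, and the helper `matched_text` is a name invented for the example.

```python
import re

SENTENCE = ("By contrast, the GNU General Public License is intended "
            "to guarantee your freedom...")

FIRST_CHAR = r'(^|(?<=\W))\w'   # first character of a word
LAST_CHAR = r'\w((?=\W)|$)'     # last character of a word
BOTH = FIRST_CHAR + '|' + LAST_CHAR

def matched_text(pattern, text):
    # finditer + group() gives the whole match; findall would return
    # only the (empty) capture groups of these patterns.
    return ''.join(m.group() for m in re.finditer(pattern, text))

first = matched_text(FIRST_CHAR, SENTENCE)
last = matched_text(LAST_CHAR, SENTENCE)
both = matched_text(BOTH, SENTENCE)

print(first)  # BctGGPLiitgyf
print(last)   # yteUlcesdoerm
print(both)   # ByctteGUGlPcLeisidtogeyrfm

# In Python's engine, the plain \b forms give the same characters.
print(matched_text(r'\b\w', SENTENCE) == first)  # True
print(matched_text(r'\w\b', SENTENCE) == last)   # True
```

Each of the 13 words contributes exactly one first and one last character, which agrees with the 13 dashes in the two CORRECT rows of the table.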
Best Regards,
guy038