@neil-schipper,
You wrote: “Reading the specs, \d and \R are ‘peers’.”
I disagree, although I do agree that it could be explained better in the Notepad++ Searching document. However, that document points you to the canonical Boost regex documentation, which is the official specification for the regex engine used by Notepad++. In my opinion, the Boost documentation can only be read to say that \R behaves differently from \d, \r, \n, or even \h, \v, or \s.
In that document, you will see that \r, \n, \t, \v, and others are listed under the sentence “The following escape sequences are all synonyms for single characters:”, meaning that each of those sequences matches only a single character at a time. So \v might match any of the vertical spaces (CR, LF, and the less common ones), but a single \v in a regex will only ever match one character. If you had the string AB\r\n and searched for \v, the first FIND would match just the \r.
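To make the single-character behaviour concrete, here is a small Python sketch. Python's `re` is not Boost, so this only emulates what is described above: `VSPACE` is my own hand-written stand-in for \v (LF, VT, FF, CR, NEL, and the two Unicode separators), because Python's own `\v` means only the vertical tab.

```python
import re

# Hypothetical stand-in for Boost's \v: a one-character class of vertical
# whitespace. Python's \v would match only U+000B, so the set is spelled out.
VSPACE = r"[\n\x0b\x0c\r\x85\u2028\u2029]"

text = "AB\r\n"

# The first search (the first FIND) matches only the CR.
first = re.search(VSPACE, text)
print(repr(first.group()), first.span())  # '\r' (2, 3)

# Repeated searches (Find Next) step through the line ending one char at a time.
print([m.group() for m in re.finditer(VSPACE, text)])  # ['\r', '\n']
```

Each match is exactly one character long, no matter which member of the class it hits.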
The \R is described in its own section, “Matching Line Endings”, which shows that it expands into (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}]). That is an expression wrapped in parentheses with an internal alternation |; if you search the document for (?>, you find that it is the syntax for an independent sub-expression. This is different from all the single-character escapes listed earlier. With the same string AB\r\n, searching for \R, the first FIND would match the two-character sequence \r\n. \R behaves differently from all those single-character escapes because it can match multiple characters at once.
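The same kind of Python sketch can show the difference for \R. `LINEBREAK` is my own transcription of the Boost expansion quoted above, with one stated assumption: it uses a plain (?:...) group instead of the atomic (?>...) group, since Python only supports atomic groups from 3.11 on. For a simple search like this, the first alternative still grabs the two-character CR LF pair whenever it is present.

```python
import re

# Hypothetical transcription of Boost's documented \R expansion
#   (?>\x0D\x0A?|[\x0A-\x0C\x85\x{2028}\x{2029}])
# using a non-atomic group so it runs on Python versions before 3.11.
LINEBREAK = r"(?:\r\n?|[\n\x0b\x0c\x85\u2028\u2029])"

text = "AB\r\n"

# The first search matches the whole two-character CR LF sequence.
first = re.search(LINEBREAK, text)
print(repr(first.group()), first.span())  # '\r\n' (2, 4)

# With mixed endings, each CR LF pair is one match; a lone CR or LF is one match.
mixed = "a\r\nb\nc\rd"
print([m.group() for m in re.finditer(LINEBREAK, mixed)])  # ['\r\n', '\n', '\r']
```

Unlike the \v stand-in, a single match here can be two characters long, which is exactly what makes \R a group rather than a single-character escape.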
Then, if you back up a few paragraphs to “Character sets”, you will see the rules for character sets, including the sentence “A bracket expression may contain any combination of the following:”. The sub-sections underneath it are “Single characters”, “Character ranges”, “Negation”, “Character classes”, “Collating Elements”, “Equivalence classes”, “Escaped Characters”, and “Combinations”. None of those include “independent sub-expression” or any other term that refers to a parentheses-based expression.
The bracket[]-based character sets cannot contain parentheses()-based expressions. That is why \R does not work in a bracket[]-based class.
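A Python sketch of the consequence, under the same assumptions as before (hand-written stand-ins, not Boost itself): a bracket expression is a set of single characters, so even listing CR and LF inside [...] finds CR LF as two separate hits. To combine the line-break group with something else, the group has to go into an alternation outside any brackets. In Notepad++ terms that should be written as (?:\R|\d) rather than [\R\d]. `BREAK_OR_DIGIT` is a hypothetical name for illustration.

```python
import re

# A bracket expression matches one character per hit, even if it lists CR and LF.
BRACKET = r"[\r\n]"

# Hypothetical \R stand-in (non-atomic transcription of the Boost expansion).
LINEBREAK = r"(?:\r\n?|[\n\x0b\x0c\x85\u2028\u2029])"

# The group cannot live inside [...], so combine it with \d via alternation.
BREAK_OR_DIGIT = rf"(?:{LINEBREAK}|\d)"

text = "A\r\nB7"
print([m.group() for m in re.finditer(BRACKET, text)])         # ['\r', '\n']
print([m.group() for m in re.finditer(BREAK_OR_DIGIT, text)])  # ['\r\n', '7']
```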
An updated version of the user manual's description of \R can be found at https://github.com/pryrt/npp-usermanual/blob/backslashBigR/content/docs/searching.md (that temporary URL will be changed to the permanent URL by moderator power once the changes are merged into the main usermanual repository).
First, it has been moved out of the Control Characters section into its own section.
Second, the Character Classes section has been improved to note that character classes cannot contain any parentheses-based group, including \R.
Third, the Character Escape Sequences section contains \h, \v, and \s (so people might assume that \R belongs there too), and it now clarifies that being a group causes \R to be treated differently.
Hopefully, describing this in enough places will prevent future confusion when users look up the meaning of \R and whether it can go inside a character class.