Hi sngbrdb and All,
First of all, as MAPJe71 said and as you found out yourself, the regex [^,], for instance, matches absolutely any of the 128,172 characters of the latest Unicode version ( 9.0 ), EOL characters included, except for the usual comma sign ! So, to prevent a match from extending over several lines, it’s good practice to systematically include the two characters \r and \n in a negative character class. Therefore, our example should be re-written [^,\r\n]
Just note that the shorthand syntax \R, which matches any kind of EOL character(s), cannot be used inside a character class !
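As a quick illustration of the point above, here is a minimal sketch in Python. Its re module is only a stand-in for the Boost-based engine of N++, and the sample string and variable names are my own assumptions. It shows a negative character class running across an EOL, then stopping at it once \r and \n are added to the class :

```python
import re

# Hypothetical sample: two comma-separated lines joined by a Windows EOL.
sample = "a,b\r\nc,d"

# [^,] excludes the comma only, so one match runs across the line break.
greedy_fields = re.findall(r"[^,]+", sample)

# Adding \r and \n to the class keeps every match on a single line.
line_fields = re.findall(r"[^,\r\n]+", sample)

print(greedy_fields)  # ['a', 'b\r\nc', 'd']
print(line_fields)    # ['a', 'b', 'c', 'd']
```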
Now, I would like to clarify some points, relative to :
The two modifiers (?s) and (?m) and their opposite forms (?-s) and (?-m)
The two assertions ^ and $
The dot meta-character .
As I presume that this post will be ( too ! ) long, just have a drink and… let’s go !
In the first place, it’s VERY important to realize that the two modifiers (?m) and (?s) do NOT deal with the same things :
The (?m) modifier, and its opposite form (?-m), change the meaning of the ^ and $ assertions
The (?s) modifier, and its opposite form (?-s), change the meaning of the . dot meta-character
By default, the regex engine of N++ considers any text as made of multiple lines. So :
The ^ symbol is a zero length assertion, which represents the location between an EOL character OR the very beginning of the current file and the first standard character of a line
The $ symbol is a zero length assertion, which represents the location between the last standard character of a line and an EOL character OR the very end of the current file
Although not necessary, and especially if all parts of your regex follow that behaviour, you may include, at the beginning of the regex, the (?m) modifier ( for multi-line )
For instance, the regexes ^123 or (?m)^123 would match the 123 string of any line, which begins with the string 123 and the regexes 789$ or (?m)789$ would match the 789 string of any line, which ends with the string 789
On the contrary, when your regex begins with the (?-m) modifier ( for no multi-line ), the regex engine considers all the contents of your current file as a single line. So, the meaning of the ^ and $ symbols is restricted :
The ^ symbol becomes a zero length assertion, which represents the location before the very first character of the current file
The $ symbol becomes a zero length assertion, which represents the location after the very last character of the current file
For instance, the regex (?-m)^123 would match a 123 string at the beginning of the very first line of the current file, and the regex (?-m)789$ would match a 789 string at the end of the very last line of the current file. Note that this implies that no EOL character follows the string 789. Indeed, if an EOL character did follow it, the string 789 would not really end the file !!
You’ll probably agree, as I do, that the behaviour of the regex engine, when using a (?-m) modifier, seems rather uninteresting :-(( Indeed, the two regexes, above, could be, simply, re-written as \A123 and 789\z, with the zero-length assertions \A and \z
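To make the difference between line anchors and file anchors concrete, here is a small sketch in Python, used only as a stand-in for the N++ engine. Beware of two differences, which are worth stating loudly : in Python, the multi-line behaviour is OFF by default ( the opposite of N++ ), and Python's \Z plays the role of the \z assertion mentioned above. The sample text and the variable names are my own assumptions :

```python
import re

# Hypothetical sample, WITHOUT any EOL after the final 789.
sample = "123 first\nmiddle 789\n123 last 789"

# (?m): ^ and $ act at every line boundary (the N++ default behaviour).
starts_multi = [m.start() for m in re.finditer(r"(?m)^123", sample)]
ends_multi = [m.start() for m in re.finditer(r"(?m)789$", sample)]

# No m flag (Python's default, comparable to (?-m) in N++):
# ^ only matches at the very beginning of the text.
starts_single = [m.start() for m in re.finditer(r"^123", sample)]

# \A and \Z are the explicit whole-text anchors of Python's re module.
starts_explicit = [m.start() for m in re.finditer(r"\A123", sample)]
ends_explicit = [m.start() for m in re.finditer(r"789\Z", sample)]

print(starts_multi, ends_multi)        # [0, 21] [17, 30]
print(starts_single, starts_explicit)  # [0] [0]
print(ends_explicit)                   # [30]
```

Note that the sketch uses \Z rather than a plain $ for the end of the text, because Python's $ may also match just before a final "\n", which is not the behaviour described above for (?-m).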
VERY IMPORTANT :
If your regex does NOT contain any ^ symbol, nor $ symbol, the modifiers (?m) and/or (?-m) are quite USELESS !!
By default, if the “. matches new line” option is UNCHECKED, the regex engine of N++ considers that the dot meta-character matches a standard character only, and never an EOL character !
Although not necessary, and especially if all parts of your regex follow that behaviour, you may include, at the beginning of the regex, the (?-s) modifier ( for NO single line )
Then, if we consider the simple text, below, with the two EOL characters \r\n, after digit 5
12345
67890
The regexes .+ or (?-s).+ would match, successively, the strings 12345 and 67890
On the contrary, when your regex begins with the (?s) modifier ( for single line ) AND/OR if the ". matches new line" option is CHECKED, the N++ regex engine considers that the dot meta-character can match absolutely any character ( standard and EOL ones ) !
Therefore, on the sample text above, the regex (?s).+ would match the overall string 12345\r\n67890, in one go !
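The same two behaviours of the dot can be sketched in Python, again only as a stand-in for the N++ engine. One assumption to keep in mind : the sample uses a lone \n as EOL, because Python's dot, unlike the N++ one, still matches \r when the s flag is off :

```python
import re

# Hypothetical sample; "\n" is used as EOL because Python's dot,
# unlike the N++ one, still matches "\r" when the s flag is off.
sample = "12345\n67890"

per_line = re.findall(r".+", sample)        # dot stops at the EOL
whole_text = re.findall(r"(?s).+", sample)  # dot also matches the EOL

print(per_line)    # ['12345', '67890']
print(whole_text)  # ['12345\n67890']
```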
Notes :
The in-line modifiers (?s) and (?-s) take priority over the present state of the . matches new line option of the Find/Replace dialog. So :
Even if that option is checked, the regex (?-s).+ would match any standard text, till an EOL character, excluded
Even if that option is UNchecked, the regex (?s).+ would match all the subsequent text, till the end of the current file
Keep in mind that the combined use of the (?s) and (?-s) in-line modifiers, in the same regex, may be very interesting. For instance, searching for the regex (?s)(.+\R)(?-s)(.*123.*\R) and replacing with \2\1 would move the last line containing the string 123 ( provided it is followed by an EOL ) to the very top of the file, before all the text which preceded it, by swapping the two groups 1 and 2 !
VERY IMPORTANT :
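The group swap described in this note can be approximated in Python, with some adaptations which are my own assumptions : Python's re module has no \R and, as far as I know, does not accept (?-s) in the middle of a pattern, so the sketch uses the scoped form (?s:…) around group 1 and a plain \n as EOL. The sample text is hypothetical :

```python
import re

# Hypothetical sample: the last line containing 123 is the third one.
sample = "aaa\nbbb\nx123y\nccc\n"

# (?s:...) lets the dot cross EOLs inside group 1 only; group 2 keeps the
# default dot, so it stays on the single line that contains 123.
pattern = re.compile(r"(?s:(.+\n))(.*123.*\n)")

# Swap the two groups once: the 123 line moves above everything before it.
moved = pattern.sub(r"\2\1", sample, count=1)

print(repr(moved))  # 'x123y\naaa\nbbb\nccc\n'
```

Because group 1 is greedy, it first grabs as much text as possible and only gives back lines until group 2 can match, which is why the LAST line containing 123 is the one that moves.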
As above, for the (?m) modifier, if your regex does NOT contain any . dot symbol, the modifiers (?s) and/or (?-s) are quite USELESS !!
Finally, let’s see the action of the two modifiers, m and s, used together. As a consequence of what I said just above, any regex containing these two modifiers should contain at least one dot meta-character and either a ^ or a $ symbol ! That will be the case, as we’re going to use the two regexes .{100,350}$ and ^.{100,350}, each of them preceded by one of the four modifier forms, below :
(?s-m) ( in short : ^ and $ symbols match beginning and end of file / . symbol matches any character )
(?-sm) ( in short : ^ and $ symbols match beginning and end of file / . symbol matches standard characters )
(?m-s) ( in short : ^ and $ symbols match beginning and end of line / . symbol matches standard characters )
(?sm) ( in short : ^ and $ symbols match beginning and end of line / . symbol matches any character )
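As a scaled-down preview of these four combinations, here is a sketch in Python, with its re.M and re.S flags standing in for the m and s modifiers. Everything in it is an assumption made for illustration : the tiny three-line sample, the shorter quantifier {3,8} and the helper name matches. The results are those of Python's engine on this sample, NOT the occurrence counts of N++ :

```python
import re

# Hypothetical three-line sample, with no EOL after the last line.
sample = "abcdef\nghijkl\nmn"

def matches(flags):
    """Return every match of .{3,8}$ under the given flag combination."""
    return re.findall(r".{3,8}$", sample, flags)

results = {
    "(?s-m)": matches(re.S),         # file anchors, dot matches EOL
    "(?-sm)": matches(0),            # file anchors, dot stops at EOL
    "(?m-s)": matches(re.M),         # line anchors, dot stops at EOL
    "(?sm)":  matches(re.M | re.S),  # line anchors, dot matches EOL
}
for form, found in results.items():
    print(form, found)
# (?s-m) ['hijkl\nmn']
# (?-sm) []
# (?m-s) ['abcdef', 'ghijkl']
# (?sm) ['abcdef', '\nghijkl', '\nmn']
```

The (?-sm) case finds nothing here, simply because the very last line, mn, is shorter than the minimum of 3 characters.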
To clearly notice the differences, between all these cases, let’s use the test text, below, corresponding to some parts of the license.txt file, slightly changed :
When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.
----- Test line, which contains 60 characters, ONLY ! ------
For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights.
Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.
Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary.
To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all
Once this text is pasted in a new tab, and before applying the regexes, just verify that :
The word When, beginning the first line of that text, is NOT preceded by any character
The word all, ending the last line of that text, is NOT followed by any EOL character !
And, preferably :
Select the Word wrap behaviour, with the menu option View - Word wrap
Select the Show all characters behaviour, with the menu option View - Show Symbols - Show All Characters
Finally, in the Find dialog :
UNCHECK the Wrap around option
Select the Regular expression search mode
One last piece of advice : before testing any of the regexes below, just move the cursor back to the very beginning of this sample text ( so, before the word When ), with the CTRL + Home shortcut
Thus :
The regex (?s-m).{100,350}$ matches the last 350 characters, of the current file, spread out on one or several lines ( 1 occurrence )
The regex (?-sm).{100,350}$ matches the maximum of the last characters, if between 100 and 350, of the very last line, of the current file ( 1 occurrence )
The regex (?m-s).{100,350}$ matches the maximum of the last characters, if between 100 and 350, of any single line, of the current file ( 6 occurrences )
The regex (?sm).{100,350}$ matches a maximum range of any character, if between 100 and 350, followed by an EOL character OR finishing the current file, in one or several lines, empty or not ( 5 occurrences )
and :
The regex (?s-m)^.{100,350} matches the first 350 characters, of the current file, spread out on one or several lines ( See, note below ! )
The regex (?-sm)^.{100,350} matches the maximum of the first characters, if between 100 and 350, of the very first line, of the current file ( 1 occurrence )
The regex (?m-s)^.{100,350} matches the maximum of the first characters, if between 100 and 350, of any single line, of the current file ( 6 occurrences )
The regex (?sm)^.{100,350} matches a maximum range of any character, if between 100 and 350, preceded by an EOL character OR beginning the current file, in one or several lines, empty or not ( 4 occurrences )
IMPORTANT :
Due to an incorrect handling of backward assertions, the N++ regex engine may NOT produce the right matches, in some cases ! This is precisely the case of the regex (?s-m)^.{100,350}, with the backward assertion ^. The regex engine should find one match ONLY. However, it wrongly finds 5 occurrences :-((
In fact, the regex engine seems, in that specific case, to use, instead, the regex (?s).{100,350}, which, simply, matches the longest string, till 350 characters, of any character, in one or several lines !
With the hope that this global overview could help you, in some cases !!
Best Regards,
guy038