Hello, Ryan, Claudia, AdrianHHH and All,
Ryan, see the main differences between the four simple regexes below ( I assume a case-sensitive search ) :
a.*z matches a lowercase letter a, followed by the LONGEST range of characters, even EMPTY, till a lowercase letter z
a.*?z matches a lowercase letter a, followed by the SHORTEST range of characters, even EMPTY, till a lowercase letter z
a.+z matches a lowercase letter a, followed by the LONGEST range of characters, NON empty, till a lowercase letter z
a.+?z matches a lowercase letter a, followed by the SHORTEST range of characters, NON empty, till a lowercase letter z
Just try these four regexes, with the subject text : az abcxyz az abz abxz abcxyz az ab bcxz abcx, in a new tab. The differences are quite obvious !
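If you would rather check this outside Notepad++, here is a minimal sketch using Python's re module. That choice is an assumption on my part: Notepad++ uses the Boost engine, but for these four simple patterns the greedy and lazy behaviour should be the same. The names subject, patterns and results are just illustrative.

```python
import re

# Subject text from the post, on a single line.
subject = "az abcxyz az abz abxz abcxyz az ab bcxz abcx"

# The four regexes being compared (Python's re is case-sensitive by default).
patterns = ["a.*z", "a.*?z", "a.+z", "a.+?z"]

# Collect every non-overlapping match of each pattern.
results = {p: re.findall(p, subject) for p in patterns}

for p, matches in results.items():
    print(f"{p:6} -> {len(matches)} match(es): {matches}")
```

The two greedy forms give one long match each, while the lazy forms split the line into 8 matches ( a.*?z ) and 5 matches ( a.+?z ), because a.+?z cannot match the bare string az on its own.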
AdrianHHH, you needn’t worry about choosing between the two syntaxes below, as they are strictly equivalent !
(?-s)alignment=".*?"
alignment="[^"\r\n]*"
Similarly, the two syntaxes, below, are strictly identical, too :
(?s)alignment=".*?"
alignment="[^"]*"
The reason is that, in both cases, the range of characters ends at a single, UNIQUE character ( a double quote mark " )
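A small sketch of that equivalence, again with Python's re module as a stand-in for the Boost engine ( an assumption ). The sample text is hypothetical and contains one value that spans two lines. Python has no global (?-s) switch, so the first pattern simply relies on the default, where the dot does not match \n. Note that Python's dot does match \r, so the sample deliberately avoids \r.

```python
import re

# Hypothetical sample: the third attribute value contains a line break.
sample = 'alignment="left" alignment="" alignment="top\nbottom" alignment="x"'

# Pair 1: the dot does not cross line breaks / the class excludes them.
single_line = (r'alignment=".*?"', r'alignment="[^"\r\n]*"')

# Pair 2: the dot matches everything / the class only excludes the quote.
multi_line = (r'(?s)alignment=".*?"', r'alignment="[^"]*"')

single_line_hits = [re.findall(p, sample) for p in single_line]
multi_line_hits = [re.findall(p, sample) for p in multi_line]

print(single_line_hits[0] == single_line_hits[1])  # same matches within a pair
print(multi_line_hits[0] == multi_line_hits[1])
print(len(single_line_hits[0]), len(multi_line_hits[0]))
```

On this sample, the first pair skips the two-line value and the second pair includes it, but inside each pair the two syntaxes return the same matches.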
Now, I’m speaking to everybody ! For instance, do NOT confuse these two regexes :
The regex 123.+?5, that means : A string 123 followed by the SHORTEST, NON-empty, range of characters, till a digit 5
And the regex, almost identical, 123.+?56, which
Does NOT mean : A string 123 followed by the shortest, NON-empty, range of characters, till a digit 5, then the 6 digit
But means : A string 123 followed by the SHORTEST, NON-empty, range of characters, till the string 56
So, against the subject text 012345789 0123456789 012345789 0123456789, the first regex 123.+?5 finds four occurrences, whereas the second regex 123.+?56 finds only two occurrences !
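The same comparison can be reproduced with a short sketch in Python's re module ( used here as an assumption, in place of the Notepad++ engine ). The variable names are only illustrative.

```python
import re

# Subject text from the post: the 1st and 3rd numbers lack the digit 6.
subject = "012345789 0123456789 012345789 0123456789"

# Lazy range ending at the single character 5.
ends_with_5 = re.findall(r"123.+?5", subject)

# Lazy range ending at the two-character string 56.
ends_with_56 = re.findall(r"123.+?56", subject)

print(len(ends_with_5), ends_with_5)
print(len(ends_with_56), ends_with_56)
```

Each match of the second regex starts in a number without 56 and runs on into the next number, which is why it finds only two occurrences instead of four.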
Here is a summary example :
Let’s imagine the text, below, where the string abcdlmpqrst is repeated, 10 times, with, sometimes, the lack of the letters p and/or q :
q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing
abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst
Against this text, let’s try, successively, the 20 regexes below, where the last fourteen contain the [^...] structure :
Regex A : (?-s)ab.+p
Regex B : (?-s)ab.+q
Regex C : (?-s)ab.+pq
Regex D : (?-s)ab.+?p
Regex E : (?-s)ab.+?q
Regex F : (?-s)ab.+?pq
Regex G : ab[^p\r\n]+p
Regex H : ab[^q\r\n]+q
Regex I : ab[^p\r\n]+?p
Regex J : ab[^q\r\n]+?q
Regex K : ab[^q\r\n]+p
Regex L : ab[^p\r\n]+q
Regex M : ab[^q\r\n]+?p
Regex N : ab[^p\r\n]+?q
Regex O : ab[^p\r\n]+pq
Regex P : ab[^q\r\n]+pq
Regex Q : ab[^pq\r\n]+pq
Regex R : ab[^p\r\n]+?pq
Regex S : ab[^q\r\n]+?pq
Regex T : ab[^pq\r\n]+?pq
Here are the results, where each match is indicated by a range of dashes
            q missing    q missing    pq missing                p missing    p missing    pq missing                q missing    p missing
            abcdlmprst   abcdlmprst   abcdlmrst   abcdlmpqrst   abcdlmqrst   abcdlmqrst   abcdlmrst   abcdlmpqrst   abcdlmprst   abcdlmqrst
A           ---------------------------------------------------------------------------------------------------------------
B           ----------------------------------------------------------------------------------------------------------------------------
C           --------------------------------------------------------------------------------------------------
D, G, I     -------      -------      -------------------       ---------------------------------------------       -------
E, H, J     ----------------------------------------------      -------      -------      --------------------      --------------------
F           ----------------------------------------------      ----------------------------------------------
K           ---------------------------------------------                                 -------------------       -------
L                                                               --------------------                                             -------
M           -------      -------      -------------------                                 -------------------       -------
N                                                               -------      -------                                             -------
O, R                                  --------------------      ----------------------------------------------
P, S        ----------------------------------------------                                --------------------
Q, T                                  --------------------                                --------------------
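To double-check which regexes behave alike, here is a hedged sketch that runs the 20 regexes with Python's re module and groups together those returning exactly the same match positions. Using Python instead of the Boost engine is an assumption. The (?-s) prefix is dropped because Python has no global form of it and its dot already stops at \n. The subject uses single spaces between the words, which does not change which words each match covers.

```python
import re
from collections import defaultdict

# The ten words of the subject text, on one line.
subject = ("abcdlmprst abcdlmprst abcdlmrst abcdlmpqrst abcdlmqrst "
           "abcdlmqrst abcdlmrst abcdlmpqrst abcdlmprst abcdlmqrst")

# Regexes A to T from the post ( A to F without the (?-s) prefix ).
regexes = {
    "A": r"ab.+p",           "B": r"ab.+q",           "C": r"ab.+pq",
    "D": r"ab.+?p",          "E": r"ab.+?q",          "F": r"ab.+?pq",
    "G": r"ab[^p\r\n]+p",    "H": r"ab[^q\r\n]+q",
    "I": r"ab[^p\r\n]+?p",   "J": r"ab[^q\r\n]+?q",
    "K": r"ab[^q\r\n]+p",    "L": r"ab[^p\r\n]+q",
    "M": r"ab[^q\r\n]+?p",   "N": r"ab[^p\r\n]+?q",
    "O": r"ab[^p\r\n]+pq",   "P": r"ab[^q\r\n]+pq",   "Q": r"ab[^pq\r\n]+pq",
    "R": r"ab[^p\r\n]+?pq",  "S": r"ab[^q\r\n]+?pq",  "T": r"ab[^pq\r\n]+?pq",
}

# Group regex names by their tuple of (start, end) match positions.
groups = defaultdict(list)
for name, pattern in regexes.items():
    spans = tuple(m.span() for m in re.finditer(pattern, subject))
    groups[spans].append(name)

for spans, names in groups.items():
    print(", ".join(names), "->", len(spans), "match(es)")
```

Regexes that land in the same group find identical matches on this text. It only shows equivalence for this particular subject, not a proof for every possible text.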
Just notice that, as I said, above :
The regex D, (?-s)ab.+?p, DOES have an equivalent regex G, ab[^p\r\n]+p, with the [^.....] structure
The regex E, (?-s)ab.+?q, DOES have an equivalent regex H, ab[^q\r\n]+q, with the [^.....] structure
but :
The regex F, (?-s)ab.+?pq, does NOT have an equivalent regex, containing the [^.....] structure
Note, also, that :
The regexes O, ab[^p\r\n]+pq, and R, ab[^p\r\n]+?pq are equivalent
The regexes P, ab[^q\r\n]+pq, and S, ab[^q\r\n]+?pq are equivalent
The regexes Q, ab[^pq\r\n]+pq, and T, ab[^pq\r\n]+?pq are equivalent
Why ? Just because the range of characters, after the string ab, must NOT contain a part or the totality of the string pq. In other words, these six regexes, from O to T, always look for the shortest range of characters, between the string ab and the string pq !
Best Regards,
guy038
P.S. : Ryan, for your regex “education”, just begin with that article, in N++ Wiki :
http://docs.notepad-plus-plus.org/index.php/Regular_Expressions
In addition, you’ll find good documentation, about the new Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v1.48.0 ), used by Notepad++, since its 6.0 version, at the TWO addresses below :
http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html
http://www.boost.org/doc/libs/1_48_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html
The FIRST link explains the syntax, of regular expressions, in the SEARCH part
The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part
You may, also, look for valuable information, on the sites below :
http://www.regular-expressions.info
http://www.rexegg.com
http://perldoc.perl.org/perlre.html