Regex: What is the difference between Normal Replace, Extended Replace and . Matches Newline?
-
Hi, can anyone tell me what the difference is, in regex, between Normal Replace and “. Matches Newline”? Also, when is Extended Replace used?
-
The documentation describes extended search mode and regular expression search mode.
But in brief: “Normal” search mode just searches for literal characters, and cannot look for special characters or for fancy “patterns”. “Extended” search mode can search for normal characters plus a short list of “escapes” that allow searching for special characters like newlines and tabs. “Regular Expression” mode has a much larger selection of escapes and special characters, and can also search for things like “the beginning of the line” or “zero or more copies of the previous character” or other fancy things like that.
“. Matches Newline” only affects “Regular Expression” (regex) searches. In regex mode, the “.” character normally matches any character except for the newline characters; if that checkbox is checked, then “.” also matches the newline characters.
-
Hello, @hellena-crainicu and All,
First of all, Hellena, you are asking about two different things: the Search mode on the one hand, and an option of the Regular expression search mode on the other!
- The Search mode, which can be:
  - Normal: All the characters in the Find what: zone are treated as literal characters, without any interpretation. However, this statement is not totally exact: it depends on the status of the Match case option! For example, if you’re searching for the word License:
    - It will match only the exact string License if the Match case option is checked
    - It will match any form, like LICENSE, License, license, but also strings such as liCENSe, liCENSE, LiCeNsE, … if the Match case option is unchecked
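To make the Match case behaviour concrete, here is a minimal Python sketch. It only imitates a Normal-mode search and is not how Notepad++ itself is implemented; the function name `normal_find` and the sample text are made up for the illustration.

```python
import re

def normal_find(text: str, needle: str, match_case: bool) -> list[str]:
    """Return every literal occurrence of needle, imitating a Normal-mode search."""
    # re.escape() neutralises regex metacharacters, so the needle is taken literally.
    flags = 0 if match_case else re.IGNORECASE
    return re.findall(re.escape(needle), text, flags)

sample = "License LICENSE license liCENSe"
print(normal_find(sample, "License", match_case=True))   # ['License']
print(normal_find(sample, "License", match_case=False))  # all four spellings
```

With `match_case=True` only the exact spelling is found; with `match_case=False` every upper/lower-case variant is returned.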
  - Extended: In this mode, almost all the characters are treated as literal characters, without any interpretation. However, 5 special characters can be entered with a specific syntax, in the Find what: zone and/or the Replace with: zone:
    - The Null character (\x00) with the \0 syntax
    - The Tabulation character (\x09) with the \t syntax
    - The New Line character (\x0A) with the \n syntax
    - The Carriage Return character (\x0D) with the \r syntax
    - The Backslash character (\x5C) with the \\ syntax
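The five escapes above amount to a small lookup table. The following Python sketch shows the idea; the names `SIMPLE_ESCAPES` and `expand_simple_escapes` are hypothetical, and the code only mimics the described behaviour rather than reproducing Notepad++'s own parser.

```python
import re

# The five Extended-mode escapes listed above, mapped to the characters they denote.
SIMPLE_ESCAPES = {"0": "\x00", "t": "\t", "n": "\n", "r": "\r", "\\": "\\"}

def expand_simple_escapes(pattern: str) -> str:
    r"""Replace \0, \t, \n, \r and \\ with the characters they stand for."""
    # A backslash followed by one of 0, t, n, r or another backslash is an escape;
    # everything else is left untouched and stays literal.
    return re.sub(r"\\([0tnr\\])", lambda m: SIMPLE_ESCAPES[m.group(1)], pattern)

print(repr(expand_simple_escapes(r"end of line\r\n")))  # 'end of line\r\n'
```

So typing \r\n in the Find what: zone searches for a Carriage Return followed by a New Line, while \\ searches for a single backslash.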
  - In addition, in the Extended mode, any ANSI character can be matched by its character code, in base 10, 8, 2 or 16:
    - in DECIMAL, from \d000 to \d255 (3 digits, between 0 and 9)
    - in OCTAL, from \o000 to \o377 (3 digits, between 0 and 7)
    - in BINARY, from \b00000000 to \b11111111 (8 digits, between 0 and 1)
    - in HEXADECIMAL, from \x00 to \xFF (2 hexadecimal chars, between 0 and 9 and/or between A and F)
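The four numeric notations all describe the same thing: a character code between 0 and 255 written in a different base. A rough Python sketch of the decoding follows. The helper name `expand_numeric_escapes` is invented, and leaving out-of-range codes untouched is an assumption of this sketch, not documented Notepad++ behaviour.

```python
import re

# One capture group per notation: decimal, octal, binary, hexadecimal.
NUMERIC_ESCAPE = re.compile(
    r"\\(?:d([0-9]{3})|o([0-7]{3})|b([01]{8})|x([0-9A-F]{2}))"
)
BASES = (10, 8, 2, 16)  # same order as the capture groups above

def expand_numeric_escapes(pattern: str) -> str:
    r"""Replace \dNNN, \oNNN, \bNNNNNNNN and \xNN by the character they encode."""
    def decode(match: re.Match) -> str:
        for digits, base in zip(match.groups(), BASES):
            if digits is not None:
                value = int(digits, base)
                # Assumption: codes above 255 (e.g. \d999) are left as typed.
                return chr(value) if value <= 255 else match.group(0)
        return match.group(0)
    return NUMERIC_ESCAPE.sub(decode, pattern)

# The letter A (code 65) written in all four notations.
print(expand_numeric_escapes(r"\d065\o101\b01000001\x41"))  # AAAA
```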
  - Note that the remark about the Match case option, in Normal search mode, is also valid in Extended mode and in Regular Expression mode as well!
  - Regular Expression: As you know, in this search mode, a lot of constructs have a special meaning. For people not acquainted with these notions, first consult some tutorials from this FAQ post: https://community.notepad-plus-plus.org/topic/15765/faq-desk-where-to-find-regular-expressions-regex-documentation/2
- Now, the “. matches newline” option is a functional option for the Regular expression search mode only!
  - If “. matches newline” is unchecked, the dot regex symbol (.) matches a single standard character: any character from the BMP, from \x{0000} to \x{FFFD}, EXCEPT the six chars \x{000A} (New Line), \x{000C} (Form Feed), \x{000D} (Carriage Return), \x{0085} (Next Line), \x{2028} (Line Separator) and \x{2029} (Paragraph Separator)
  - If “. matches newline” is checked, the dot regex symbol (.) matches absolutely any character from the Basic Multilingual Plane (BMP), from \x{0000} to \x{FFFD}, with no exception!
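The effect of the checkbox can be imitated in Python with the `re.DOTALL` flag. This is only an analogy: Python's `re` module is a different engine from the one in Notepad++, and without `re.DOTALL` it excludes only \n from the dot, not all six characters listed above.

```python
import re

text = "first line\nsecond line"

# Like the unchecked box: "." stops at the newline, so the match stays on one line.
unchecked = re.search(r"first.*", text).group(0)

# Like the checked box: "." also matches "\n", so the match runs to the end of the text.
checked = re.search(r"first.*", text, re.DOTALL).group(0)

print(repr(unchecked))  # 'first line'
print(repr(checked))    # 'first line\nsecond line'
```

This is why a regex such as first.* can suddenly swallow the rest of the file once the option is checked.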
Last recommendation: in Extended search mode, it’s best to uncheck the Match whole word only option, to avoid unpredictable results!
Best Regards,
guy038
P.S.:
From the above, we can deduce that the Extended search mode can be replaced, in most cases, by the Regular expression search mode!
Only two specific notations of the Extended search mode have no equivalent in the Regular expression search mode:
- The \b######## notation, where each # represents a 0 or a 1 binary digit (for instance, \b01000001 matches an A letter)
- The \d### notation, where each # represents a digit from 0 to 9 (for instance, \d090 matches a Z letter)
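In regular expressions, \b and \d conventionally mean "word boundary" and "any digit", which is why these two notations cannot be carried over as they are. If you want to move such a search to Regular expression mode, one option is to rewrite the codes in the \x{....} hexadecimal form used above. Here is a small Python sketch of that conversion; `to_regex_hex` is a made-up helper, and it assumes that the character code and the Unicode code point coincide, which is only safe for plain ASCII characters.

```python
import re

def to_regex_hex(pattern: str) -> str:
    r"""Rewrite Extended-mode \bNNNNNNNN and \dNNN codes as \x{00HH}."""
    def convert(match: re.Match) -> str:
        binary, decimal = match.groups()
        value = int(binary, 2) if binary is not None else int(decimal, 10)
        if value > 255:
            # Not a valid one-byte code (e.g. \d999): leave it as typed.
            return match.group(0)
        return "\\x{%04X}" % value
    return re.sub(r"\\(?:b([01]{8})|d([0-9]{3}))", convert, pattern)

print(to_regex_hex(r"\b01000001"))  # \x{0041}  (the letter A)
print(to_regex_hex(r"\d090"))       # \x{005A}  (the letter Z)
```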
-
-
thank you @guy038