Regex: What is the difference between Normal Replace, Extended Replace and . Matches Newline?

Hellena Crainicu

Hi, can anyone tell me what the difference is, in regex, between Normal Replace and “. Matches Newline”? Also, when is Extended Replace used?

PeterJones

@Hellena-Crainicu ,

The documentation describes extended search mode and regular expression search mode.

But in brief: “Normal” search mode just searches for literal characters, and cannot look for special characters or for fancy “patterns”. “Extended” search mode can search for normal characters plus a short list of “escapes” that stand for special characters like newlines and tabs. “Regular Expression” mode has a much larger selection of escapes and special characters, and can also search for things like “the beginning of the line” or “zero or more copies of the previous character” or other fancy things like that.

“. Matches Newline” only affects “Regular Expression” (regex) searches. In regex mode, the . character normally matches any character except for the newline characters; if that checkbox is checked, then . also matches the newline characters.
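The effect of that checkbox can be sketched outside Notepad++. The snippet below is only an analogy: it uses Python's `re` module, which is a different regex engine from the one in Notepad++, and it assumes that the `re.DOTALL` flag is a fair stand-in for the checked “. Matches Newline” box. The sample text is made up for illustration.

```python
import re

# Made-up sample text containing one line break.
text = "first line\nsecond line"

# Default behaviour: "." does not match the line break,
# so ".*" stops at the end of the first line.
default_match = re.search(r"first.*", text).group(0)

# re.DOTALL is used here as a stand-in for the checked
# ". Matches Newline" box: "." now also matches the line break.
dotall_match = re.search(r"first.*", text, flags=re.DOTALL).group(0)

print(repr(default_match))  # 'first line'
print(repr(dotall_match))   # 'first line\nsecond line'
```

In other words, with the box checked, a pattern like `.*` can run across line boundaries, which is often the cause of a match that is much longer than expected.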

guy038

Hello, @hellena-crainicu and All,

First of all, Hellena, you are asking about two different things: the Search mode, on the one hand, and an option of the Regular expression search mode, on the other!

The Search mode can be:
- Normal: All the characters in the Find what: zone are treated as literal characters, without any interpretation. However, this statement is not totally exact: it depends on the status of the Match case option! For example, if you’re searching for the word License:
  - It will match the exact string License if the Match case option is checked
  - It will match any form, like LICENSE, License, license, and also strings such as liCENSe, liCENSE, LiCeNsE,… if the Match case option is UNchecked
- Extended: In this mode almost all the characters are treated as literal characters, without any interpretation. However, 5 special characters can be found with a specific syntax, in BOTH the Find what: and the Replace with: zones:
  - The Null character ( \x00 ) with the \0 syntax
  - The Tabulation character ( \x09 ) with the \t syntax
  - The New Line character ( \x0A ) with the \n syntax
  - The Carriage Return character ( \x0D ) with the \r syntax
  - The Backslash character ( \x5C ) with the \\ syntax
- In addition, in the Extended mode, any ANSI character can be matched by its character code, in base 10, 8, 2 or 16:
  - in DECIMAL, from \d000 to \d255 ( 3 digits, between 0 and 9 )
  - in OCTAL, from \o000 to \o377 ( 3 digits, between 0 and 7 )
  - in BINARY from \b00000000 to \b11111111 ( 8 digits, between 0 and 1 )
  - in HEXADECIMAL from \x00 to \xFF ( 2 hexadecimal chars, between 0 and 9 and/or between A and F )
- Note that the remark about the Match case option, in Normal search mode, is also valid in Extended mode and in Regular Expression mode!
- Regular Expression: As you know, in this search mode, a lot of constructs have a special meaning. People not acquainted with these notions should first consult some of the tutorials listed in this FAQ post: https://community.notepad-plus-plus.org/topic/15765/faq-desk-where-to-find-regular-expressions-regex-documentation/2
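To make the Extended mode escapes listed above more concrete, here is a minimal Python sketch that expands them into the characters they stand for. It is not Notepad++ code: `expand_extended` is a hypothetical helper written only for illustration, and leaving an out-of-range code such as \o777 untouched is an assumption, not documented behaviour.

```python
import re

# Hypothetical helper, NOT part of Notepad++: it only illustrates
# what the Extended-mode escapes described above stand for.
SIMPLE = {"0": "\x00", "t": "\t", "n": "\n", "r": "\r", "\\": "\\"}
BASES = {"dec": 10, "oct": 8, "bin": 2, "hex": 16}

ESCAPE = re.compile(
    r"\\(?:"
    r"d(?P<dec>[0-9]{3})"           # \d000 to \d255
    r"|o(?P<oct>[0-7]{3})"          # \o000 to \o377
    r"|b(?P<bin>[01]{8})"           # \b00000000 to \b11111111
    r"|x(?P<hex>[0-9A-Fa-f]{2})"    # \x00 to \xFF
    r"|(?P<simple>[0tnr\\])"        # \0 \t \n \r \\
    r")"
)

def expand_extended(pattern: str) -> str:
    """Replace each Extended-mode escape with the character it denotes."""
    def replace(match: re.Match) -> str:
        kind = match.lastgroup
        if kind == "simple":
            return SIMPLE[match.group("simple")]
        code = int(match.group(kind), BASES[kind])
        # Assumption: codes above 255 are left as literal text.
        return chr(code) if code <= 255 else match.group(0)
    return ESCAPE.sub(replace, pattern)

# Four different spellings of the same letter A (code 65).
print(expand_extended(r"\d065\o101\b01000001\x41"))  # AAAA
```

The four notations in the last line all denote code 65, which is why the result is the letter A repeated four times.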

Now, the . matches newline option is a functional option for the Regular expression search mode only!
- If the . matches newline option is unchecked, the dot regex symbol ( . ) matches a single standard character, that is, any character from the BMP, from \x{0000} to \x{FFFD}, EXCEPT the six chars \x{000A} ( New Line ), \x{000C} ( Form Feed ), \x{000D} ( Carriage Return ), \x{0085} ( Next Line ), \x{2028} ( Line Separator ) and \x{2029} ( Paragraph Separator )
- If the . matches newline option is checked, the dot regex symbol ( . ) matches absolutely any character from the Basic Multilingual Plane ( BMP ), from \x{0000} to \x{FFFD}, with no exception!

Last recommendation: in Extended search mode, it’s best to uncheck the Match whole word only option, to avoid unpredictable results!

Best Regards,

guy038

P.S. :

From the above, we can deduce that the Extended search mode can be replaced, in most cases, by the Regular expression search mode!

Only two specific notations, in the Extended search mode, have no equivalent in the Regular expression search mode:

- The \b######## notation, where each # represents a 0 or a 1 binary digit ( for instance, \b01000001 matches an A letter )
- The \d### notation, where each # represents a digit from 0 to 9 ( for instance, \d090 matches a Z letter )
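If you need one of these two notations in Regular expression mode, one possible workaround is to convert the binary or decimal code to hexadecimal first and use the \x{....} form shown earlier. The small Python sketch below does that conversion; `to_regex_hex` is a hypothetical helper name, and the four-digit \x{....} output format is simply the one used in this post.

```python
def to_regex_hex(code_text: str, base: int) -> str:
    """Convert a binary (base 2) or decimal (base 10) character code,
    given as text, to the \\x{....} hexadecimal notation."""
    code = int(code_text, base)
    return "\\x{%04X}" % code

# Hypothetical usage with the two examples given above.
print(to_regex_hex("01000001", 2))  # \x{0041}  (the A letter)
print(to_regex_hex("090", 10))      # \x{005A}  (the Z letter)
```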

Hellena Crainicu

thank you @guy038