@guy038, @peterjones, and others.
It turns out the \c☒ topic is fairly messy, far too messy to document in detail in the manual. I started playing with ANSI…
\c☒ with ANSI or ASCII codes \x00 to \x7F works well and searches for the character whose code is the lower five bits of the ☒ character. Realistically, you should only do it with A-Z or a-z. Better yet, use \x## or \x{####} style expressions, as it's clearer what is being searched for.
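The lower-five-bits behavior for codes \x00 to \x7F can be sketched in Python. This only models the bit arithmetic described above; it is not Notepad++'s actual code:

```python
# Model of \c☒ for ASCII ☒ in \x00-\x7F: keep only the lower five bits.
def control_match_target(ch: str) -> str:
    """Return the character that \c<ch> would search for (ASCII range)."""
    return chr(ord(ch) & 0b11111)  # lower five bits of the character code

# \cI searches for TAB: 0x49 & 0x1F == 0x09, the classic control-char mapping.
print(hex(ord(control_match_target("I"))))  # 0x9
print(hex(ord(control_match_target("a"))))  # 0x1 (same result as \cA)
```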
A case-sensitive search for \c☒ using ANSI codes \x80 to \xFF matches ANSI codes in the \xE0 to \xFF range, with some exceptions… The logic first extracts the lower five bits of ☒ and then bitwise-ORs that with 11100000 (0xE0). For example, all of these will match ANSI character 0xEC, which is ì:
| Hex | Pattern |
|------|---------|
| \x8C | \cŒ |
| \xAC | \c¬ |
| \xCC | \cÌ |
| \xEC | \cì |
The lower five bits of each of the above hex codes \x8C, \xAC, \xCC, and \xEC are 01100 (\x0C), and bitwise-ORing that result with 11100000 (0xE0) gives \xEC, the character actually searched for.
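The arithmetic for the \x80 to \xFF range can be checked with a small Python sketch. Again, this is a model of the logic as described, not Notepad++ internals:

```python
# Model of \c☒ for ANSI ☒ in \x80-\xFF: lower five bits OR'd with 0xE0.
def high_control_match_target(code: int) -> int:
    return (code & 0b11111) | 0b11100000  # i.e. (code & 0x1F) | 0xE0

# All four codes from the table above collapse to 0xEC (ì).
for code in (0x8C, 0xAC, 0xCC, 0xEC):
    assert high_control_match_target(code) == 0xEC
print("all map to 0xEC")
```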
It turns out that, with one exception, all of the ANSI characters in the \xE0 to \xFF range are lower-case letters. A case-insensitive search for \c☒ using ANSI codes \x80 to \xFF works just like the case-sensitive version I just described, but also matches the upper-case forms of the letters in the \xE0 to \xFF range.
The one exception is ANSI character code \xF7, which is the division sign ÷. A search for \c—, \c·, \c×, or \c÷ matches ÷ only when you use a case-insensitive search.
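The same arithmetic shows why \c—, \c·, \c×, and \c÷ all land on \xF7, using the Windows-1252 code points for those four characters (0x97, 0xB7, 0xD7, 0xF7):

```python
# Model of \c☒ for ANSI ☒ in \x80-\xFF, as described earlier.
def high_control_match_target(code: int) -> int:
    return (code & 0x1F) | 0xE0

# Em dash, middle dot, multiplication sign, division sign in Windows-1252.
for ch in "—·×÷":
    code = ch.encode("cp1252")[0]
    assert high_control_match_target(code) == 0xF7  # ÷
print("all map to 0xF7")
```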
Searching for \c (\x20), \c@ (\x40), \c` (\x60), \c€ (\x80), \c (\xA0), \cÀ (\xC0), and \cà (\xE0) all match NUL (\x00) in ANSI-encoded files. With one exception, they also match NUL (\x{0000}) in UTF-8 encoded files. The exception is that searching for \c€ (\x80) matches \x{000C} (form feed), not NUL \x{0000}.
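Those seven characters share one property: their lower five bits are all zero, which is why they are the exceptions that collapse to NUL. A quick check of that property (using the Windows-1252 code points for the high-range characters):

```python
# Space, @, backtick, €, NBSP, À, à have codes 0x20, 0x40, 0x60, 0x80, 0xA0, 0xC0, 0xE0.
codes = (0x20, 0x40, 0x60, 0x80, 0xA0, 0xC0, 0xE0)
for code in codes:
    assert code & 0x1F == 0  # lower five bits are zero, so \c maps them to NUL
print("lower five bits are zero for all seven")
```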
Because searches for \c€ (\x80), \c (\xA0), \cÀ (\xC0), and \cà (\xE0) all match NUL (\x00) in ANSI files, you can't use them to match the lower-case à at ANSI character \xE0 nor its upper-case À at \xC0.
I also ran across this: while Notepad++ supports searching for \x00 or \x{0000}, both of which match a NUL in a file, using \x00 or \x{0000} in the replacement part results in the replacement string being terminated at the NUL character.
Because search strings are terminated at a NUL as well, using \c~ where the ~ is a literal NUL (\x00) returns Invalid Regular Expression, with the details being:
```
ASCII escape sequence terminated
prematurely. The error occurred
while parsing the regular expression:
'>>>HERE>>>\c'.
```
Using a search for xxx and a replacement of aaa\x00zzz or aaa\x{0000}zzz both result in xxx being replaced with just aaa, because the replacement string is terminated at the NUL. Apparently the engine first does a pass that converts the \x☒☒ and \x{☒☒☒☒} forms into the actual character values, so a \x00 or \x{0000} in a replacement simply terminates the string at that point.
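The truncation itself is just C-style string semantics: once the escape is converted to a real NUL byte, anything that reads the buffer as a NUL-terminated string stops there. A Python sketch simulating that behavior (a simulation only, not the engine's actual code):

```python
# Simulate the escape-conversion pass followed by C-style NUL termination.
replacement = "aaa" + "\x00" + "zzz"  # what aaa\x00zzz becomes after the escape pass
effective = replacement.split("\x00", 1)[0]  # a C string ends at the first NUL byte
print(effective)  # prints: aaa
```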
I suspect that bug could be used to add a comment to the replacement!
Search: Hello
Replace: World\x0 This will never happen
Windows also uses NUL as the text-string terminator in its copy/paste system.