Regex searching for NUL characters



  • @guy038 or anyone else with input:

    The FAQ contains the following statement, which seems to imply that the current Notepad++ cannot do regular expression searches for embedded NUL characters. It can be confusing on a first read, because it is describing the benefits of a non-standard N++ version:

    "Both, search and replace strings can contain embedded NUL characters and/or Escape sequences for NUL characters ( \x{0000} )"

    But…I did NOT find searching for embedded NULs to be a problem with Notepad++ 7.8.6; am I missing something?




  • Hello, @alan-kilborn and All,

    When I wrote in that FAQ that:

    • Both, search and replace strings can contain embedded NUL characters and/or Escape sequences for NUL characters ( \x{0000} )

    I was referring specifically to the Francois-R Boyer regex engine version!

    But, indeed, our present Boost regex engine does handle NUL characters, but ONLY in the search regex! Embedded NUL characters in the replacement break the replacement process :-((


    BTW, for the record, in the Find what: zone, any of the syntaxes below can be used to match a single NUL control character, of Unicode code point 0000:

    • In Regular expression search mode:

      • \0  ,  \00  ,  \000  ,  \0000     in octal

      • \x0  ,  \x00  ,  \x{00}  ,  \x{000}  ,  \x{0000}    in hexadecimal

    • In Extended search mode:

      • \0 ( special syntax )

      • \d000  ,  \o000  ,  \b00000000  ,  \x00    in decimal, octal, binary and hexadecimal
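    The same idea can be tried outside Notepad++. The Python sketch below is only an analogy: Python's re module is not Boost, so its escape syntax differs. It accepts \0, \00, \000, \x00 and \u0000 for NUL, but it has no \x{...} brace form, and \0000 would be read as \000 followed by a literal 0. The helper name nul_positions is hypothetical. The last line simply checks that the four Extended-mode escapes all spell the value 0.

```python
import re

# Python's re escapes for NUL (U+0000). Unlike Boost, there is no \x{...} form,
# and r"\0000" would mean \000 followed by a literal "0".
NUL_PATTERNS = [r"\0", r"\00", r"\000", r"\x00", r"\u0000"]


def nul_positions(pattern: str, text: str) -> list[int]:
    """Return the start index of every match of `pattern` in `text`."""
    return [m.start() for m in re.finditer(pattern, text)]


sample = "abc\x00def\x00\x00ghi"
for pattern in NUL_PATTERNS:
    print(f"{pattern:8} {nul_positions(pattern, sample)}")  # [3, 7, 8] each time

# The Extended-mode escapes \d000, \o000, \b00000000 and \x00 spell the same
# value, 0, in decimal, octal, binary and hexadecimal:
assert int("000", 10) == int("000", 8) == int("00000000", 2) == int("00", 16) == 0
```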

    Beware also that, in Extended search mode, you cannot search for any string that contains characters after a first \0 character. For instance, searches for \0 or abc\0 work properly, but searches for \0abc or even \0\0 fail!

    Best Regards,

    guy038



  • @guy038 said in Regex searching for NUL characters:

    our present Boost regex engine does handle NUL characters, but ONLY in the search regex!

    Ah, okay; thanks for the confirmation on what I was seeing in practice!
    Is it clear to you that the FAQ entry implies that even a search does not currently work, when that is truly not the case?

    Beware also that, in Extended search mode, you cannot search for any string that contains characters after a first \0 character. For instance, searches for \0 or abc\0 work properly, but searches for \0abc or even \0\0 fail!

    Indeed, it appears to be a known issue to others besides yourself; see HERE.

    Perhaps to someone with a C/C++ background, this behavior, although not good, is totally understandable!? :-)

    I’m just thankful I don’t try to edit files with NULs very often.



  • So after a bit more real work with NULs…

    I noticed that, after a Find All in Current Document, the Find result window shows only the part of a line with NULs BEFORE its first NUL.

    A bit of time later, I noticed THIS. :-(

    I do agree that NUL isn’t a typical use case for a text file, but…
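    The earlier remark about a C/C++ background hints at a plausible cause: if a code path handles a line as a NUL-terminated C string, everything after the first NUL is invisible to it. Whether Notepad++ really does this for the Find result window is an assumption, not something confirmed in this thread. The Python sketch below uses the standard ctypes module only to show the effect:

```python
import ctypes

# A line of text containing an embedded NUL, as raw bytes.
line = b"abc\x00def"

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(line)

# .raw exposes the whole buffer, embedded NUL included...
print(buf.raw)    # b'abc\x00def\x00'
# ...but .value reads it as a C string and stops at the first NUL.
print(buf.value)  # b'abc'
```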



  • Hi, @alan-kilborn and All,

    Here is a workaround for handling NUL characters in a file:

    • Choose another character that is not yet used in your file. Let’s take the \x{007F} control character DELETE.

    • First, run the regex S/R below, with the Wrap around option and the Regular expression search mode selected:

      • SEARCH \0

      • REPLACE \x7F

    • Then perform all your text manipulations in Notepad++.

    • Next, save your file and exit N++.

    • As we cannot insert any NUL character with an N++ replacement, we’ll simply use the well-known sed.exe utility.

    • Then, in a DOS console window, type in and execute this simple command (the final g flag makes sed replace every \x7F on a line, not only the first one):

      • sed.exe -i s/\x7f/\x00/g Your_File
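    If sed.exe is not at hand, the same byte swap can be sketched in Python. This is a minimal sketch, not part of Notepad++: the function names protect_nuls and restore_nuls are hypothetical, and it assumes a single-byte-compatible encoding such as UTF-8 or ANSI. In a UTF-16 file, ordinary text already contains 0x00 bytes, so neither this sketch nor the sed command applies. protect_nuls is an alternative to the first regex S/R step, and it refuses to run if 0x7F is already present, since the round trip would then corrupt the file.

```python
from pathlib import Path

# Hypothetical helpers; names are illustrative only.
PLACEHOLDER = b"\x7f"  # DELETE control character, as chosen above
NUL = b"\x00"


def protect_nuls(path: Path) -> int:
    """Replace every NUL byte with the placeholder; return how many were replaced."""
    data = path.read_bytes()
    if PLACEHOLDER in data:
        # The workaround only round-trips if the placeholder was absent.
        raise ValueError(f"{path} already contains the placeholder byte")
    path.write_bytes(data.replace(NUL, PLACEHOLDER))
    return data.count(NUL)


def restore_nuls(path: Path) -> int:
    r"""Replace every placeholder byte with NUL, like sed -i s/\x7f/\x00/g."""
    data = path.read_bytes()
    path.write_bytes(data.replace(PLACEHOLDER, NUL))
    return data.count(PLACEHOLDER)
```

    For example, restore_nuls(Path("Your_File")) performs the same final step as the sed command. Working on bytes rather than decoded text means no line-ending or encoding conversion happens along the way.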

    Best Regards,

    guy038

