    Regex searching for NUL characters

    • Alan Kilborn
      last edited by Alan Kilborn

      @guy038 or anyone else with input:

      The FAQ implies, via the statement below, that current Notepad++ has a problem with regular expression searching for embedded NUL characters. The statement might be confusing at first read, because it is describing the benefits of using a non-standard N++ version:

      “Both, search and replace strings can contain embedded NUL characters and/or Escape sequences for NUL characters ( \x{0000} )”

      But…I did NOT find searching for embedded NULs to be a problem with Notepad++ 7.8.6; am I missing something?


      • guy038
        last edited by

        Hello, @alan-kilborn and All,

        When I wrote, in that FAQ, that:

        • Both, search and replace strings can contain embedded NUL characters and/or Escape sequences for NUL characters ( \x{0000} )

        I was referring, specifically, to the Francois-R Boyer regex engine version!

        But, indeed, our present Boost regex engine does handle NUL characters, but ONLY in the search regex! Embedded NUL chars in the replacement break the replacement process :-((


        BTW, for the record, in the Find what: zone, any of the syntaxes below can be used to match a single NUL control character, of Unicode code-point 0000:

        • In Regular expression search mode :

          • \0  ,  \00  ,  \000  ,  \0000     in octal

          • \x0  ,  \x00  ,  \x{00}  ,  \x{000}  ,  \x{0000}    in hexadecimal

        • In Extended search mode :

          • \0 ( special syntax )

          • \d000  ,  \o000  ,  \b00000000  ,  \x00    in decimal, octal, binary and hexadecimal

        Beware also that, in Extended search mode, you cannot search for any string which contains characters after a first \0 character. For instance, a search for \0 or abc\0 works properly, but a search for \0abc or even \0\0 fails!
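For anyone who wants to cross-check NUL positions outside Notepad++, here is a minimal sketch using Python's re module. This is an assumption-laden illustration: Python's engine is not the Boost engine, so only part of the list above carries over (\0, \00, \000 and \x00 are accepted, the brace form \x{0000} is not), and the sample string is made up.

```python
import re

# Made-up sample text with NUL characters embedded between ordinary characters.
sample = "abc\x00def\x00\x00ghi"

# Escapes for NUL accepted by Python's re module (not the Boost engine
# that Notepad++ uses, so this is only a subset of the syntaxes listed).
patterns = [r"\0", r"\00", r"\000", r"\x00"]

# Collect the match offsets reported by each pattern.
offsets = {p: [m.start() for m in re.finditer(p, sample)] for p in patterns}

for pattern, found in offsets.items():
    print(pattern, "->", found)
```

Each pattern should report the same offsets, which gives a quick sanity check that a file really contains NULs where the editor says it does.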

        Best Regards,

        guy038

        • Alan Kilborn @guy038
          last edited by Alan Kilborn

          @guy038 said in Regex searching for NUL characters:

          our present Boost regex engine does handle NUL characters, but ONLY in the search regex!

          Ah, okay; thanks for the confirmation on what I was seeing in practice!
          Is it clear that the FAQ entry is implying that even a search does not currently work, when that is truly not the case?

          Beware also that, in Extended search mode, you cannot search for any string which contains characters after a first \0 character. For instance, a search for \0 or abc\0 works properly, but a search for \0abc or even \0\0 fails!

          Indeed, it appears to be a known issue to others besides yourself; see HERE.

          Perhaps to someone with a C/C++ background, this behavior, although not good, is totally understandable!? :-)

          I’m just thankful I don’t try to edit files with NULs very often.

          • Alan Kilborn
            last edited by

            So after a bit more real work with NULs…

            I noticed that the Find result window, after a Find All in Current Document, on a line with NULs, shows only the part of the line BEFORE the first NUL.
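A truncated result line like this is what one would expect if the text is handed around as a NUL-terminated C string somewhere along the way. That is only an assumption here, not something checked against the Notepad++ source. A minimal sketch of the general effect, using Python's ctypes to mimic C string handling on a made-up line:

```python
import ctypes

# A made-up line containing an embedded NUL, as raw bytes.
line = b"before\x00after"

# Copy it into a C-style char buffer (one extra byte is added for the
# terminating NUL). The full content is still present in the buffer...
buf = ctypes.create_string_buffer(line)
full = buf.raw[:len(line)]

# ...but reading the buffer as a NUL-terminated C string stops at the
# first NUL, so everything after it is silently dropped.
as_c_string = buf.value

print(full)
print(as_c_string)
```

Any code path that tracks the length separately keeps the whole line, while any path that relies on the terminator sees only the part before the first NUL.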

            A bit of time later, I noticed THIS. :-(

            I do agree that NUL isn’t a typical use case for a text file, but…

            • guy038
              last edited by guy038

              Hi, @alan-kilborn and All,

              Here is a work-around for handling NUL character(s) in a file:

              • Choose another character that is not yet used in your file. Let's take the control character Delete, \x{007F}

              • First, run the regex S/R below, with the Wrap around option ticked and the Regular expression search mode selected

                • SEARCH \0

                • REPLACE \x7F

              • Then perform all your text manipulations in Notepad++

              • Finally, save your file and exit N++

              • As we cannot insert a NUL character with an N++ replacement, we'll simply use the well-known utility sed.exe to put the NULs back

                • You can download its latest Windows version (v4.8, 64-bit) from https://github.com/mbuilov/sed-windows

                • Or other versions from https://github.com/mbuilov/sed-windows/tree/master/archive

              • Then, in a DOS console window, type in and execute this simple command (the trailing g converts every \x7f on a line, not just the first one):

                • sed.exe -i s/\x7f/\x00/g Your_File
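The placeholder round trip described in the steps above can also be sketched in a few lines of Python, as an illustration of the idea rather than a replacement for the sed step. The function names hide_nuls and restore_nuls and the sample bytes are hypothetical, and the sketch assumes an encoding (ANSI or UTF-8, say) in which a NUL character is a single 0x00 byte.

```python
NUL = b"\x00"
PLACEHOLDER = b"\x7f"  # DEL, the stand-in character chosen in the steps above


def hide_nuls(data: bytes) -> bytes:
    """Swap every NUL for the placeholder so the text can be edited safely."""
    # The placeholder must not already occur, or the way back is ambiguous.
    if PLACEHOLDER in data:
        raise ValueError("placeholder byte already occurs in the data")
    return data.replace(NUL, PLACEHOLDER)


def restore_nuls(data: bytes) -> bytes:
    """Swap every placeholder back to NUL once editing is finished."""
    return data.replace(PLACEHOLDER, NUL)


# Made-up sample with several NULs, two of them adjacent.
original = b"key\x00value\x00\x00end"
hidden = hide_nuls(original)
restored = restore_nuls(hidden)

print(hidden)
print(restored == original)
```

The check inside hide_nuls corresponds to the first step of the work-around: the stand-in character has to be one that is not used anywhere else in the file.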

              Best Regards,

              guy038
