    Regex searching for NUL characters

    • Alan Kilborn

      @guy038 or anyone else with input:

      In the FAQ, the statement below implies that current Notepad++ has a problem with regular expression searches for embedded NUL characters. On a first read it might be confusing, because it is describing the benefits of using a non-standard N++ version:

      “Both, search and replace strings can contain embedded NUL characters and/or Escape sequences for NUL characters ( \x{0000} )”

      But…I did NOT find searching for embedded NULs to be a problem with Notepad++ 7.8.6; am I missing something?

      ec548427-c30a-430e-a746-e18785acd0d9-image.png

      • guy038

        Hello, @alan-kilborn and All,

        When I wrote, in that FAQ, that:

        • Both, search and replace strings can contain embedded NUL characters and/or Escape sequences for NUL characters ( \x{0000} )

        I was referring, specifically, to the Francois-R Boyer version of the regex engine!

        But, indeed, our present Boost regex engine does handle the NUL characters, but ONLY in the search regex! Embedded NUL chars in the replacement break the replacement process :-((


        BTW, for the record, in the Find what: zone, any of the regex syntaxes below can be used to match a single NUL control character, of Unicode code point 0000:

        • In Regular expression search mode :

          • \0  ,  \00  ,  \000  ,  \0000     in octal

          • \x0  ,  \x00  ,  \x{00}  ,  \x{000}  ,  \x{0000}    in hexadecimal

        • In Extended search mode :

          • \0 ( special syntax )

          • \d000  ,  \o000  ,  \b00000000  ,  \x00    in decimal, octal, binary and hexadecimal
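As a rough cross-check outside Notepad++, the hexadecimal and short octal escapes can be tried in Python. This is only a sketch: Python's `re` module is not the Boost engine that Notepad++ uses, it does not accept the `\x{0000}` form, and the sample string is made up for illustration.

```python
import re

# Hypothetical sample text with three embedded NUL characters.
sample = "abc\x00def\x00\x00ghi"

# Two spellings of the same one-character pattern.
nul_hex = re.compile(r"\x00")   # hexadecimal escape
nul_octal = re.compile(r"\0")   # octal escape

hex_positions = [m.start() for m in nul_hex.finditer(sample)]
octal_positions = [m.start() for m in nul_octal.finditer(sample)]

# In Python's re, a pattern may also continue after the NUL.
after_nul = re.search(r"\x00ghi", sample)

print(hex_positions, octal_positions, after_nul.start())
```

Both spellings report the same match positions, which gives a quick way to confirm where the NULs sit in a string before working on the file in the editor.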

        Beware also that, in Extended search mode, you cannot search for any string which contains characters after a first \0 character. For instance, a search for \0 or abc\0 does work properly, but a search for \0abc or even \0\0 fails!

        Best Regards,

        guy038

        • Alan Kilborn @guy038

          @guy038 said in Regex searching for NUL characters:

          our present Boost regex engine does handle the NUL characters, but ONLY in the search regex!

          Ah, okay; thanks for the confirmation on what I was seeing in practice!
          Is it clear that the FAQ entry is implying that even a search does not currently work, when that is truly not the case?

          Beware also that, in Extended search mode, you cannot search for any string which contains characters after a first \0 character. For instance, a search for \0 or abc\0 does work properly, but a search for \0abc or even \0\0 fails!

          Indeed, it appears to be a known issue to others besides yourself; see HERE.

          Perhaps to someone with a C/C++ background, this behavior, although not good, is totally understandable!? :-)
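One plausible reading of that remark, offered as an assumption rather than a confirmed diagnosis of the Notepad++ code: C-style strings end at the first NUL byte, so any code path that treats the search text as a NUL-terminated string loses everything after \0. The sketch below uses Python's ctypes buffers to show the effect; the helper name `as_c_string` is invented for the illustration.

```python
import ctypes

def as_c_string(text: bytes) -> bytes:
    """Return what a NUL-terminated (C-style) reader would see of `text`."""
    # create_string_buffer copies the bytes and appends a terminating NUL.
    buf = ctypes.create_string_buffer(text)
    # .value stops at the first NUL byte; .raw would return the whole buffer.
    return buf.value

for needle in (b"\x00", b"abc\x00", b"\x00abc", b"\x00\x00"):
    print(needle, "->", as_c_string(needle))
```

Under this reading, abc\0 is seen as abc, while \0abc and \0\0 collapse to an empty string. It does not account for every detail of the behavior reported above, so treat it as intuition only.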

          I’m just thankful I don’t try to edit files with NULs very often.

          • Alan Kilborn

            So after a bit more real work with NULs…

            I noticed that the Find result window, after a Find All in Current Document, on a line with NULs, shows only the part of the line BEFORE the first NUL.

            A bit of time later, I noticed THIS. :-(

            I do agree that NUL isn’t a typical use case for a text file, but…

            • guy038

              Hi, @alan-kilborn and All,

              Here is a solution, as a work-around, to manage the presence of NUL characters in a file:

              • Choose another character that is not yet used in your file. Let’s take the Delete control character, \x{007F}

              • So, you first run the regex S/R below, with the Wrap around option and the Regular expression search mode

                • SEARCH \0

                • REPLACE \x7F

              • Then you perform all your text manipulations, in Notepad++

              • Finally, save your file and exit N++

              • As we cannot insert any NUL character with an N++ replacement, we’ll simply use the well-known utility sed.exe

                • You can download its latest Windows version (v4.8, 64-bit) from https://github.com/mbuilov/sed-windows

                • Or other versions, from https://github.com/mbuilov/sed-windows/tree/master/archive

              • Then, in a DOS console window, type in and execute this simple command :

                • sed.exe -i s/\x7f/\x00/g Your_File

                • The g flag makes sed replace every \x7f on a line, not only the first one
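The same round trip can be sketched in Python for readers without sed at hand. This is an illustrative alternative, not part of the procedure above: the function names are hypothetical, and the check that the placeholder is unused mirrors the first step of the work-around.

```python
NUL = b"\x00"
PLACEHOLDER = b"\x7f"  # the Delete control character chosen in the work-around

def mask_nuls(data: bytes, placeholder: bytes = PLACEHOLDER) -> bytes:
    """Replace every NUL with the placeholder, refusing if it is already in use."""
    if placeholder in data:
        raise ValueError("placeholder byte already occurs in the data")
    return data.replace(NUL, placeholder)

def restore_nuls(data: bytes, placeholder: bytes = PLACEHOLDER) -> bytes:
    """Turn every placeholder back into a NUL (the job of the sed step)."""
    return data.replace(placeholder, NUL)

# Hypothetical file content with three embedded NULs.
original = b"abc\x00def\x00\x00ghi"
masked = mask_nuls(original)
restored = restore_nuls(masked)
print(masked, restored == original)
```

For a real file, the bytes could be read and written with `pathlib.Path.read_bytes()` and `write_bytes()`. The sketch assumes an encoding in which NUL and Delete are single bytes (ANSI or UTF-8); a UTF-16 file would need different handling.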

              Best Regards,

              guy038
