Notepad++ and NUL characters
-
A change is coming soon to Notepad++ which will allow proper handling of NUL characters when searching; see https://github.com/notepad-plus-plus/notepad-plus-plus/pull/16469 for a preview.
This got me wondering how well Notepad++ is positioned to be a “file editor” rather than simply a “text editor”.
Let’s discard for a moment that it doesn’t have native hex editor capabilities, and let’s pretend that arbitrary editing of files with encoded/binary content isn’t dangerous…
I’m specifically wondering how ready Notepad++ is to not have ANY stumbling blocks over the NUL character. Historically this has been a problem because a NUL character has been used internally by Notepad++ to signify that “the content of this string variable ends here”. Ideally, a NUL character is not treated that way, and is the same as any other character.
Does anyone interested in this conversation have examples of things you’d want to do with Notepad++ that you currently can’t do, because the NUL character gets in the way?
-
@Alan-Kilborn
Most plugin messages that select text from a file (e.g., SCI_GETTEXT, SCI_GETSELTEXT) still treat the text as a NUL-terminated string.
-
@Mark-Olson said in Notepad++ and NUL characters:
Most plugin messages that select text from a file (e.g., SCI_GETTEXT, SCI_GETSELTEXT) still treat the text as a NUL-terminated string.
Indeed, but I think the ultimate limiting factor is the Win32 API, which is basically ANSI C (properly speaking, it’s the C++98 standard, which is so old as to no longer even resemble C++ as most developers understand it today).
The NUL byte will always have special significance as long as you are passing “string” data to C library functions, given that C has no dedicated “string” type, the closest thing being an array of char, and such arrays have no way of storing their length (as, for example, a Pascal string can ¹) except the presence of a NUL byte to mark the end.
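To make the point about C arrays concrete, here is a minimal sketch using Python's standard ctypes module to hold a raw C char array. The payload is made up for illustration, and nothing here is Notepad++ or Scintilla code; it only shows how a C-style read stops at the first NUL unless an explicit length is supplied.

```python
import ctypes

# Illustrative 7-byte payload with a NUL in the middle (not real file data).
payload = b"abc\x00def"

# Copy the payload into a C char array; ctypes appends one terminating NUL.
buf = ctypes.create_string_buffer(payload)

# .value reads the array the way C string functions do: up to the first NUL.
as_c_string = buf.value

# .raw returns every byte of the array, embedded and trailing NULs included.
all_bytes = buf.raw

# Supplying the length separately is the only way to read past the NUL.
with_length = ctypes.string_at(buf, len(payload))

print(as_c_string)  # b'abc'
print(all_bytes)    # b'abc\x00def\x00'
print(with_length)  # b'abc\x00def'
```

The bytes after the NUL are still in memory; they are only invisible to code that relies on the terminator instead of a separately stored length.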
¹ The old Macintosh operating system used Pascal strings everywhere. Many C programmers on other platforms used Pascal strings for speed. Excel uses Pascal strings internally which is why strings in many places in Excel are limited to 255 bytes, and it’s also one reason Excel is blazingly fast.
For a long time, if you wanted to put a Pascal string literal in your C code, you had to write:
char* str = "\006Hello!";
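As a rough sketch of the difference, the helpers below (hypothetical names, written only for this illustration) encode and decode a one-byte length-prefixed string and compare it with a NUL-terminated read. A length prefix lets a NUL travel as ordinary data, at the cost of the 255-byte limit mentioned above.

```python
def to_pascal(data: bytes) -> bytes:
    """Encode as a Pascal-style string: one length byte, then the data."""
    if len(data) > 255:
        raise ValueError("a one-byte length prefix covers at most 255 bytes")
    return bytes([len(data)]) + data


def from_pascal(buf: bytes) -> bytes:
    """Decode using the stored length; NUL bytes are ordinary data here."""
    length = buf[0]
    return buf[1:1 + length]


def from_c_string(buf: bytes) -> bytes:
    """Decode the C way: everything up to, but excluding, the first NUL."""
    end = buf.find(b"\x00")
    return buf if end == -1 else buf[:end]


print(to_pascal(b"Hello!"))                  # b'\x06Hello!'
print(from_pascal(to_pascal(b"ab\x00cd")))   # b'ab\x00cd'
print(from_c_string(b"ab\x00cd\x00"))        # b'ab'
```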
-
I have no idea what additional possibilities this would give me if it arrives; for me, Npp is and remains a pure text editor.
I’m a bit surprised that this is being implemented because, from my point of view, it signals that any file you can search can also be edited, and that’s just not the case. But it is as it is, or as it will be.
-
Microsoft Windows’ copy/paste mechanism for text uses NUL-terminated text strings. You can get around this in Notepad++ using Edit / Paste Special / ...
There are three sub-options that support NUL:
- Copy Binary Content
- Cut Binary Content
- Paste Binary Content
-
@mkupper said:
Microsoft Windows’ copy/paste mechanism for text uses NUL-terminated text strings.
Ah, okay, so here’s a current Notepad++ limitation: (Normal) Copying of some text that contains NUL character(s) won’t paste back the NUL character(s), even if one is staying within Notepad++ for both operations.
-
@Alan-Kilborn said in Notepad++ and NUL characters:
Ah, okay, so here’s a current Notepad++ limitation: (Normal) Copying of some text that contains NUL character(s) won’t paste back the NUL character(s), even if one is staying within Notepad++ for both operations.
Normal copying in Notepad++ converts NUL characters to spaces, so pasting back does not truncate the content: you get the same length that was copied, with each NUL replaced by a space.
But as @mkupper posted, there are also actions available that keep binary content as it is.
If Copy Binary Content is used and then Ctrl + V is used to paste, the pasted content is cut off at the first NUL character.
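The combinations described in this post can be modelled in a few lines. This is only a toy model of the observed behaviour, with hypothetical function names; it is not how Notepad++ implements its clipboard handling.

```python
def normal_copy(selection: bytes) -> bytes:
    """Model of a normal copy: each NUL is replaced by a space."""
    return selection.replace(b"\x00", b" ")


def binary_copy(selection: bytes) -> bytes:
    """Model of Copy Binary Content: bytes are kept unchanged."""
    return selection


def text_paste(clipboard: bytes) -> bytes:
    """Model of Ctrl + V: the data is read as a NUL-terminated string."""
    end = clipboard.find(b"\x00")
    return clipboard if end == -1 else clipboard[:end]


def binary_paste(clipboard: bytes) -> bytes:
    """Model of Paste Binary Content: bytes are kept unchanged."""
    return clipboard


selection = b"A\x00B"
print(text_paste(normal_copy(selection)))    # b'A B'    same length, NUL became a space
print(text_paste(binary_copy(selection)))    # b'A'      cut at the first NUL
print(binary_paste(binary_copy(selection)))  # b'A\x00B' round trip intact
```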
-
@Alan-Kilborn said:
here’s a current Notepad++ limitation
So, to be clear, I meant this as a limitation when considering Notepad++ being an “editor” and not just a “text editor”. In such a case, having to do special things, e.g. “Copy/Cut/Paste” Special wouldn’t be necessary…within Notepad++. Trying to get data to the outside world, e.g. via the external clipboard is a different endeavor.
-
@Alan-Kilborn said in Notepad++ and NUL characters:
Trying to get data to the outside world, e.g. via the external clipboard is a different endeavor.
This depends more on how other applications handle the clipboard and paste command. Notepad++ is able to fill the clipboard with normal or special copy as you like.
-
Back to the initial post: I think the new feature does not change the category of editor in any way. I would consider it more a consolidation toward what the user expects when searching in a file and looking at the search results.
My expectation of search results is that they display exactly the content of the file I was searching in, regardless of whether the file contains only text or binary mixed with text.
Of course, Notepad++ is primarily a text editor. But the ++ indicates that it is so much more than that.
Opening a binary file with Notepad++ is a valid use case. Otherwise Notepad++ should prevent opening such binary files.
I am very happy that I am able to open binary files with Notepad++. It is useful, and it is useful to search in those binary files as well.
A use case, for example: I have some binary build result from C++. I want to verify the build date of the binary in comparison with other build results. I open the binary in Notepad++; the build date and time are there and I can identify them quickly, or even search for them quickly. No additional tool needed. Sure, there might be more valid ways to do something like this, but why not?
Another use case: Enterprise Architect (a UML modelling tool) once exported a corrupted XML file. It had put some binary characters in, most likely because of an encoding issue, and I was not able to import this XML file into Enterprise Architect on another PC. What I did: searched the corrupted XML in Notepad++, deleted these characters, and saved. After that, the import was possible.
Using Notepad++ for this was a super easy solution. So yes, for me it is not only a text editor; it is more a text editor ++ that also includes a bit of binary handling.
-
@Alan-Kilborn said in Notepad++ and NUL characters:
So, to be clear, I meant this as a limitation when considering Notepad++ being an “editor” and not just a “text editor”.
It is possible to work around this. When an application loads stuff into the Windows copy/paste buffer the application provides the information in multiple formats. For example, if you are using a web browser and copy something from a web page then the web browser will be uploading various forms of plain text and also various forms of HTML, and usually more.
When you use “paste” in an application, the code examines the list of available formats and picks the one that seems like the best fit. Notepad++ picks one of the plain text formats. Notepad++'s Edit / Paste Special menu has options for grabbing an HTML format blob and another option for grabbing an RTF format blob and dropping the results into Notepad++'s editing area.
The workaround is that applications are allowed to define their own formats, including ones that are generated on the fly. Notepad++ can take advantage of this by creating an npp_binary_text format. If the text being copied contains a NUL, then upload the data as the binary blob format and also upload npp_binary_text with the string value true. On pasting, Notepad++ would examine the list of available formats, and if npp_binary_text is there and it’s set to true, then grab the binary data format. This would allow Notepad++ to signal to itself that we are dealing with a Notepad++ to Notepad++ copy/paste while maintaining compatibility with older versions of Notepad++ and other applications.
Microsoft Office uses formats internal to itself that allow for a richer experience when copy/pasting within Office applications. For example, there are format blobs that contain metadata about what is available in the normal well-known formats. You can copy data from an Office application and it’ll still look good if you paste into a non-Office application. If you paste the same thing into an Office application then you get extras such as the document’s original time stamps, the name(s) of the document creators and editors, etc.
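The npp_binary_text idea above could be sketched roughly as follows. The clipboard is modelled as a plain dictionary mapping format names to bytes; only npp_binary_text comes from the post, while the other format names and all function names are hypothetical stand-ins, and no real Windows clipboard calls are made.

```python
MARKER = "npp_binary_text"  # format name proposed in the post
TEXT = "plain_text"         # hypothetical stand-in for a plain text format
BINARY = "binary_blob"      # hypothetical stand-in for the binary format


def copy_to_clipboard(selection: bytes) -> dict:
    """Offer plain text always; add binary data and the marker if a NUL is present."""
    # Assumption: the text format carries NULs as spaces, as a normal copy does.
    clipboard = {TEXT: selection.replace(b"\x00", b" ")}
    if b"\x00" in selection:
        clipboard[BINARY] = selection
        clipboard[MARKER] = b"true"
    return clipboard


def paste_from_clipboard(clipboard: dict) -> bytes:
    """A marker-aware paste: prefer the binary data when the marker says so."""
    if clipboard.get(MARKER) == b"true" and BINARY in clipboard:
        return clipboard[BINARY]
    return clipboard[TEXT]


def legacy_paste(clipboard: dict) -> bytes:
    """An older version or another application: only looks at plain text."""
    return clipboard[TEXT]


clip = copy_to_clipboard(b"A\x00B")
print(paste_from_clipboard(clip))  # b'A\x00B'
print(legacy_paste(clip))          # b'A B'
```

Because the plain text format is always present, a consumer that does not know the marker keeps working as before, which is the compatibility property the post is after.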