Npp++ sometimes makes weird line terminations

Carmak Cusac

I don’t know much about this stuff, so sorry if I say something tremendously dumb. Here it goes:

For some time now, the app has been creating files with weird line terminations. I have only noticed this in batch (.bat) files, because they output errors when executed. There may be more files with this issue; I haven’t searched for them because so far I have been able to read the files without problems.

In the problem-free files, the hex values show that each line termination consists of the two bytes 0D 0A, but sometimes the app for some reason uses only 0A. That causes no problem when reading the documents, but it does when the files are scripts and I try to execute them. The previous times it happened, I worked around it by creating a new text file, pasting the text, and saving. Only a few moments ago did I decide to inspect the files to see what the hell was wrong with them, after trying everything I could think of: changing the encoding, joining and splitting the lines again, and rewriting the start of the lines.
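For anyone who wants to check a file without a hex editor, here is a minimal Python sketch that counts the three kinds of line terminations in a byte string. The function name `count_line_endings` and the sample bytes are made up for illustration, and it assumes an ASCII or UTF-8 file (as with scripts using chcp 65001), not UTF-16.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF (0D 0A), lone LF (0A), and lone CR (0D) terminations."""
    crlf = data.count(b"\r\n")
    lone_lf = data.count(b"\n") - crlf   # every CRLF also contains one LF
    lone_cr = data.count(b"\r") - crlf   # every CRLF also contains one CR
    return {"CRLF": crlf, "LF": lone_lf, "CR": lone_cr}


# Hypothetical sample: the first line ends in CRLF, the second in a lone LF.
sample = b"chcp 65001\r\n@echo off\n"
print(count_line_endings(sample))
```

For a real file, the bytes could be read with `open(path, "rb").read()`. A non-zero "LF" count in a file that is supposed to be Windows (CRLF) points to the mixed terminations described above.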

What I have noticed, though, is that this has (as far as I remember) always happened when I’ve used regex to replace whole lines or to replace one line with two, for example replacing a pattern with $1\nMoreTextHere (notice the \n), so I end up with one more line of text. It has also always happened when the file had Unicode characters (Russian and accented á, é, í, ó, ú chars).
So my guess is that when I use regex to create one more line with \n, the app only writes 0A, but I’m not sure about that because not all the lines were created with regex. Unfortunately I can’t share the files because they contain private data, but I can share part of the hex values (replacing the 0A with 0D 0A fixes the problem):

PROBLEM-FREE FILE:

PROBLEMATIC FILE:

Is this a bug?

Carmak Cusac

Also, it seems to happen only with files that start with chcp 65001. I have other scripts that start with @echo off followed by chcp 65001, and their line terminations are OK.

Ekopalypse

@Carmak-Cusac

Which end of line (eol) is currently being used can be seen in the status bar of Npp. Npp, or more precisely Scintilla, reads the first line to determine which eol is being used and uses that for the rest of the editing session.
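To illustrate the first-line heuristic described above, here is a small Python sketch. It is not Scintilla's or Notepad++'s actual code; the function name `detect_eol_from_first_line` is hypothetical, and it only shows why a file whose first line ends in a lone LF would be reported as Unix (LF) even if later lines use CRLF.

```python
def detect_eol_from_first_line(data: bytes) -> str:
    """Return 'CRLF', 'LF', 'CR', or 'none' based on the first terminator found."""
    for i, byte in enumerate(data):
        if byte == 0x0D:  # CR: check whether an LF follows immediately
            if data[i + 1 : i + 2] == b"\n":
                return "CRLF"
            return "CR"
        if byte == 0x0A:  # LF with no CR before it
            return "LF"
    return "none"  # no line terminator in the data at all


# Hypothetical sample: first line ends in a lone LF, second line in CRLF.
print(detect_eol_from_first_line(b"chcp 65001\n@echo off\r\n"))
```

Under this heuristic only the first terminator matters, so later CRLF lines do not change the result.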

Carmak Cusac

@Ekopalypse I just noticed this:

Could this be the problem? The file has Windows line terminations but for some reason Notepad++ has detected that the file has Unix terminations.

PeterJones

@Carmak-Cusac said in Npp++ sometimes makes weird line terminations:

for some reason

Yes. That reason is that the first line had just LF, because, as @Ekopalypse said, “Scintilla reads the first line to determine which eol is being used”.

The easiest way to fix it is to right-click on the Unix (LF) on the status bar and select a different option (like Windows (CRLF), which will convert all the lone LF to CRLF). If you really wanted Windows EOL, then leave it there; if you really wanted Unix EOL, then change it back to Unix (LF) through right clicking there again.
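If the same conversion is needed outside the editor (for example, for many files at once), the effect of switching to Windows (CRLF) can be approximated with a short Python sketch. The function name `to_crlf` is an assumption for illustration; it works on raw bytes and only touches LF bytes that are not already preceded by CR.

```python
import re


def to_crlf(data: bytes) -> bytes:
    """Replace every lone LF (0A) with CRLF (0D 0A); existing CRLF pairs are kept."""
    # (?<!\r) is a negative lookbehind: match LF only when no CR precedes it.
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)


# Hypothetical sample with mixed terminations.
mixed = b"chcp 65001\n@echo off\r\necho done\n"
print(to_crlf(mixed))
```

Running the function twice gives the same result as running it once, so it is safe to apply to files that are already fully CRLF.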

Carmak Cusac

@PeterJones Yes, I understand, but the weird thing is that I never changed from Windows to Unix, nor would I have been able to write a file with Unix terminations, because new files are configured to have Windows terminations. I don’t understand how this happened, since I didn’t even download the files; I created them myself.
Though one reason could be that I, as I said in my OP, used only \n instead of using \r\n when adding new lines with regex, but I’ve never in all my time using Notepad++ used \r. I just tried it before sending this comment and yes, that’s the reason.
I think Notepad++ should replace the \n with \r\n as long as the document EOL is Windows.

PeterJones

@Carmak-Cusac said in Npp++ sometimes makes weird line terminations:

Though one reason could be that I, as I said in my OP, used only \n instead of using \r\n when adding new lines with regex

That is your culprit. To the regex engine, \n means the LF character (ASCII 10); that is the definition.

The Notepad++ developers are not likely to hack a well-established regex engine to make it behave differently than it has in every other Notepad++ installation and in every use of the Boost regex engine throughout its history; sorry. It would seriously break most users’ expectations. The best suggestion I have for you is to learn that in Notepad++ regex, if you want the Windows EOL in replacements, you must use \r\n, not just \n.
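The same distinction can be reproduced outside Notepad++. The sketch below uses Python's `re` module rather than the Boost engine, so the group syntax differs (`\1` instead of `$1`), but \n in the replacement likewise means a single LF character. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical two-line document with Windows (CRLF) terminations.
text = "line one\r\nline two\r\n"

# Replacement with \n only: the inserted line break is a lone LF (0A).
lf_only = re.sub(r"(line one)", r"\1\nMoreTextHere", text)

# Replacement with \r\n: the inserted line break is CRLF (0D 0A),
# matching the rest of the document.
crlf = re.sub(r"(line one)", r"\1\r\nMoreTextHere", text)

# Show the raw bytes so the difference in terminations is visible.
print(lf_only.encode("utf-8"))
print(crlf.encode("utf-8"))
```

The first result contains a lone LF after "line one" followed by CRLF lines, which is the kind of mixed file described at the top of the thread; the second is consistently CRLF.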