Mixed Line Ending Detection. Possible?

SalviaSage

I learned that it is possible to have files with mixed line endings. I made a file with 3 different line endings (LF, CRLF, CR).
Results: LF has priority, and the file will be recognized as an LF file even if it has other endings in it (the user cannot know unless they have line-ending display turned on and go through the file line by line).
CRLF has priority over CR, and CR is last.

This order makes perfect sense and is the way it should be. And I understand that no file is supposed to have mixed line endings like this, and operating systems do not use anything mixed (Windows uses CRLF; Mac and Linux use LF).

BUT it is possible to have files with mixed line endings, so I would like to be able to detect whether a file is in such a condition (without having to turn on line-ending display and go through it line by line).

Does anyone know if this is possible programmatically?

(ps. thanks claudia and scott sumner for all the help)

Scott Sumner

@SalviaSage

So ideally you’d have some code watching to see if you ever get a “bad” line ending in your editor tab buffer. “bad” is defined as:

Your file is Windows (CR LF) (see status bar) and you get either an LF line-ending or a CR line-ending (somehow) in its editing buffer
Your file is UNIX (LF) and you get either a CR LF line-ending or a CR line-ending in its editing buffer
Your file is MAC (CR) and you get either a CR LF line-ending or an LF line-ending in its editing buffer
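To make the three definitions above concrete, here is a minimal sketch in plain Python (not PythonScript, and not anyone's actual script from this thread). The function name `find_bad_line_endings` and the sample text are made up for illustration; the expected ending is passed in by hand rather than read from the status bar.

```python
import re

# Matches one line ending of any kind. CR LF is tried first so it is
# never split into a lone CR followed by a lone LF.
EOL_RE = re.compile(r"\r\n|\r|\n")

def find_bad_line_endings(text, expected):
    """Return (offset, ending) pairs for endings that differ from `expected`.

    `expected` is "\r\n" (Windows), "\n" (UNIX) or "\r" (MAC).
    """
    return [(m.start(), m.group())
            for m in EOL_RE.finditer(text)
            if m.group() != expected]

# Hypothetical buffer: a Windows file with one LF and one CR slipped in.
sample = "one\r\ntwo\nthree\rfour\r\n"
print(find_bad_line_endings(sample, "\r\n"))  # [(8, '\n'), (14, '\r')]
```

An empty result means every line ending in the text matches the file type.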

However, one of the problems with doing this, at least with one of the scripting languages, is what’s known as the “big file problem”. As long as your files are relatively small, scanning them often (for example, whenever you make a change that could potentially put a wrong line-ending in) runs fairly quickly. With large files, though, the scan time becomes noticeable as a sluggish user interface.

So then you make compromises. You can scan a file for mismatches when it is first opened. You can scan it at save time. You can come up with some complex algorithm to scan pieces of it at different times. Your choice.
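As one illustration of the "scan it at open or save time" compromise, here is a rough sketch that counts each kind of line ending in a file-like object one chunk at a time, so the whole file never has to sit in memory at once. It is plain Python working on bytes, outside of Notepad++; the name `count_line_endings` and the chunk size are assumptions of mine, and hooking it to an open or save event is left out.

```python
import io

def count_line_endings(stream, chunk_size=65536):
    """Count CRLF, LF and CR endings in a binary stream, chunk by chunk."""
    counts = {"CRLF": 0, "LF": 0, "CR": 0}
    pending_cr = False  # True when the previous chunk ended with a CR
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        if pending_cr:
            # Re-attach the CR held back from the previous chunk so a CR LF
            # split across the chunk boundary is still seen as one ending.
            chunk = b"\r" + chunk
        pending_cr = chunk.endswith(b"\r")
        if pending_cr:
            chunk = chunk[:-1]  # hold the trailing CR back for the next round
        crlf = chunk.count(b"\r\n")
        counts["CRLF"] += crlf
        counts["LF"] += chunk.count(b"\n") - crlf
        counts["CR"] += chunk.count(b"\r") - crlf
    if pending_cr:
        counts["CR"] += 1  # a lone CR at the very end of the data
    return counts

# Hypothetical data; a real caller would pass open(path, "rb") instead.
data = io.BytesIO(b"a\r\nb\nc\rd\r\n")
counts = count_line_endings(data, chunk_size=4)
print(counts)
# The file is mixed when more than one kind of ending was seen.
print(sum(1 for n in counts.values() if n) > 1)
```

The tiny chunk size in the usage lines is only there to exercise the chunk-boundary handling; the result should not depend on it.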

That being said, normally it is a difficult task to get a wrong line-ending in your editor buffer. Scintilla normally takes care of keeping the line-endings consistent and correct. For example, if you copy lines out of a UNIX file buffer and into a Windows file buffer, Scintilla will convert the line endings from LF to CRLF during the paste. So…is it a really big issue after all?

SalviaSage

Yes, that is true, it is not a big issue, since they can easily be fixed with Notepad++’s auto-convert feature anyway, by clicking the status bar.

If it is that hard to determine that there are bad line endings, then we might as well not bother with it at all!

Scott Sumner

@SalviaSage

“If it is that hard to determine that there are bad line endings, then we might as well not bother with it at all!”

It’s not necessarily hard, there just isn’t some magic that makes it especially easy.

I actually do this line-ending mismatch detection for myself, using pretty much the same technique used in this thread. I monitor what is happening in the currently viewable area, figuring that this is the likeliest place for a mismatched line ending to get inserted. If it is detected, the code turns on visible line-endings (normally I have that turned off) to really hit me with the problem.

Perfect? No. Really easy? No. Really hard? No. Acceptable? Yes.

By the way, here is a good way to get Unix line-endings in your Windows files: Copy some multiline text from the PythonScript console window, then use the Clipboard History panel to paste it into a Windows editor tab! BOOM! LF-only line-endings in your Windows buffer. Note that this series of events bypasses Scintilla’s watchful paste protection.

SalviaSage

Oh, umm… You do know I will be asking for that end-of-line detection code… right?

xD xD

too bad, you shouldn’t have been such a nice open-source guy.

hehehehehehe…

guy038

Hi, @salviasage, @scott-sumner and All,

Once more, with a simple regex, it’s not that hard to find out if bad line-endings exist in a specific file ;-)

Open the Find/Replace dialog and select the Regular expression search mode
Just click on the Count button to get the number of occurrences of the following regexes, depending on the final status wanted for that file
Fill in the Replace with: zone as well, to get rid of all wrong line-endings, and click on the Replace All button

So, whatever the present status of the file regarding its line-endings, if you want to end up with a:

• WINDOWS file, with line-endings = CRLF  =>    SEARCH =  \r(?!\n)|(?<!\r)\n    REPLACE =  \r\n

• UNIX    file, with line-endings = LF    =>    SEARCH =  \r\n?                 REPLACE =  \n

• MAC     file, with line-ending  = CR    =>    SEARCH =  \r?\n                 REPLACE =  \r
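For anyone who wants to try the three search/replace pairs above outside of Notepad++, here is a small sketch using Python's `re` module, which accepts these particular patterns unchanged. The dictionary `FIXES` and the helpers `count_bad` and `normalize` are names invented for this example; `count_bad` plays the role of the Count button and `normalize` the role of Replace All.

```python
import re

# target file type -> (SEARCH regex, REPLACE text), as listed above
FIXES = {
    "windows": (r"\r(?!\n)|(?<!\r)\n", "\r\n"),
    "unix": (r"\r\n?", "\n"),
    "mac": (r"\r?\n", "\r"),
}

def count_bad(text, target):
    """Number of line endings that are wrong for the wanted file type."""
    pattern, _ = FIXES[target]
    return len(re.findall(pattern, text))

def normalize(text, target):
    """Rewrite every wrong line ending to the wanted one."""
    pattern, replacement = FIXES[target]
    return re.sub(pattern, replacement, text)

# Hypothetical text with one CR LF, one LF and one CR.
mixed = "a\r\nb\nc\rd"
for target in FIXES:
    fixed = normalize(mixed, target)
    print(target, count_bad(mixed, target), repr(fixed), count_bad(fixed, target))
```

After `normalize`, the count for the same target should drop to zero, which is a quick way to confirm that a pair is self-consistent.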

Best Regards,

guy038

Scott Sumner

@SalviaSage

That code is wrapped up into something “bigger” that I have–not worth it to me to extract it. In general, I share code here in a couple of cases:

1. I already have the code done to do a certain specific thing when people post asking how it can be accomplished
2. Somebody posts an idea (that I hadn’t previously thought of) that can be of use to me; thus I write the code that implements it and share it

An example of case #2 is your request for this. I hadn’t thought of that previously, but when you brought it up, I thought “I can benefit from having that”…and now that it’s done I like it even more than I thought I would.

So…yea, if you wanna work on coding it yourself, I can give you hints if you get stuck, but with the framework from that other code a lot of the “hard” part is done already.

@guy038

Hi Guy, yea, this comes up a lot it seems, but people seem to want something that runs automagically–and unfortunately running a regular-expression replacement doesn’t fit in that scheme (hmmm, maybe a regex replacement macro that is run at save time–can NppExec do that? dunno…). However, take heart in that the regexes from your posting above do appear in the code for this; here’s a sample:

editor.research(r'\r(?!\n)|(?<!\r)\n', bad_line_ending_callback, 0, start_pos, end_pos)

:-D

Claudia Frank

If it is all about knowing that a file uses mixed eols only,
then my ¢¢ (CiYrm)

Behind the curtain you must see
The dark side never the strength in number it reveal
Don’t hesitate to break out of the loop you must do
if wasting time is not good for you

Cheers
Claudia
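One possible reading of the hint above, sketched in plain Python: if the only question is whether a file is mixed, there is no need to count anything, and the loop can stop at the first line ending that differs from the first one seen. The function name `has_mixed_line_endings` is invented for this sketch, and this is an interpretation, not Claudia's own code.

```python
import re

# CR LF first, so it is treated as one ending rather than CR plus LF.
EOL_RE = re.compile(r"\r\n|\r|\n")

def has_mixed_line_endings(text):
    """Return True as soon as two different kinds of line ending are seen."""
    first_kind = None
    for match in EOL_RE.finditer(text):
        kind = match.group()
        if first_kind is None:
            first_kind = kind  # remember the first ending in the text
        elif kind != first_kind:
            return True  # break out of the loop at the first mismatch
    return False

print(has_mixed_line_endings("a\r\nb\r\nc\r\n"))  # False
print(has_mixed_line_endings("a\r\nb\nc\r\n"))    # True
```

On a consistent file this still walks every line ending, but on a mixed file it returns at the first offender instead of scanning to the end.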

SalviaSage

@Scott-Sumner

Dear Scott, thanks for your input.

Scott Sumner

@Scott-Sumner said:

normally it is a difficult task to get a wrong line-ending in your editor buffer

I just thought I’d point out that with an errant regular-expression replacement, it is actually quite easy to get wrong line-endings in your buffer. For example, if your replacement expression involves line-ending characters, make sure you get it correct for your file-type, as shown above. Scintilla does nothing to protect you from messing up your files in this case (like it does when pasting). Absolute power (regex) corrupts absolutely! :-) Or maybe I should say that it can!
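Here is a small demonstration of that pitfall, using Python's `re` module as a stand-in for the editor's Replace All (the sample text and variable names are made up). Splitting lines with a bare LF in a CR LF text leaves lone LFs behind, which the Windows search regex from earlier in the thread then finds.

```python
import re

# Finds every ending that is wrong for a Windows (CR LF) file.
BAD_FOR_WINDOWS = re.compile(r"\r(?!\n)|(?<!\r)\n")

# Hypothetical Windows text, clean to begin with.
windows_text = "alpha, beta\r\ngamma, delta\r\n"

# Errant replacement: LF alone is the UNIX ending, not the Windows one.
broken = re.sub(", ", ",\n", windows_text)
# Replacement that matches the file type.
fixed = re.sub(", ", ",\r\n", windows_text)

print(len(BAD_FOR_WINDOWS.findall(windows_text)))  # 0
print(len(BAD_FOR_WINDOWS.findall(broken)))        # 2
print(len(BAD_FOR_WINDOWS.findall(fixed)))         # 0
```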

SalviaSage

Oh, so when I paste things into Scintilla, Scintilla automatically converts whatever EOL my paste has to what the document is set to (which I can see in the status bar).

I didn’t know that but I guess that makes sense and protects me from mixed line endings.
I like that.

I still would like a script to be able to highlight my mixed EOLs though (just the same way I highlight my line final whitespace), or at the least turn on the EOL displays like how your script has. (which you aren’t pasting here = ( )

Scott Sumner

@Scott-Sumner said:

normally it is a difficult task to get a wrong line-ending in your editor buffer

Apparently I shouldn’t have made that statement, as I keep proving it wrong…

I normally work with Windows formatted files with \r\n line-endings. I just noticed that pressing Ctrl+M (which got done when my fingers got tangled up) will insert a Mac format line-ending \r in my Windows files–blech!

Mixed Line Ending Detection. Possible?

:-D

:-(