Search for inconsistent line endings with a regex? (part 2)
-
A stretch for staying on-topic, but I found a great way to set up a scenario for inconsistent line-endings, from the hand of @donho himself:
- Open a file with Unix (Linux) line-endings
- Select all (ctrl+a)
- Invoke Plugins -> Mime Tools -> Quoted-printable Encode
Boom. The result is a file with thoroughly mixed line-endings (both Unix and Windows).
(I discovered this when I needed to MIME-encode a short file. I decided I don’t like how the Mime Tools plugin does its thing, and not just this line-ending thing, so I will resort to WinZip’s mime for my future miming needs.)
-
@Ekopalypse said in Search for inconsistent line endings with a regex? (part 2):
regex_dict = {0:'\r[^\n]|[^\r]\n',
              1:'\n',
              2:'\r',}

def check_eol(match):
    notepad.messageBox('Different EOLS detected','EOL Missmatch', 0)

editor.research(regex_dict[editor.getEOLMode()], # regex
                check_eol,                       # function to call
                0,                               # re flags
                0,                               # start
                editor.getTextLength(),          # end
                1)                               # count

I know this topic is ancient, but what exactly do I do with the sample code above? Is it supposed to be an external command, a configuration file, or something else?
-
There is a plugin called PythonScript that allows you to manipulate data in Notepad++.
Here are the steps on how to create and use it.
The purpose of the script is to check whether the current document has different line endings (EOL), which can be problematic if you edit a file under Windows and then upload it to a Linux server, for example.
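For readers who want to try the idea outside Notepad++, here is a minimal sketch in plain Python (standard `re` module only, no PythonScript objects). It tallies how many CRLF, lone CR and lone LF endings a string contains. The helper names `count_eols` and `has_mixed_eols` are made up for this illustration and are not part of any plugin API.

```python
import re

def count_eols(text):
    """Tally each line-ending style found in text."""
    counts = {'CRLF': 0, 'CR': 0, 'LF': 0}
    # CRLF is listed first so a CR LF pair is counted once, not as CR plus LF.
    for m in re.finditer(r'\r\n|\r|\n', text):
        eol = m.group(0)
        if eol == '\r\n':
            counts['CRLF'] += 1
        elif eol == '\r':
            counts['CR'] += 1
        else:
            counts['LF'] += 1
    return counts

def has_mixed_eols(text):
    """True when more than one line-ending style actually occurs."""
    return sum(1 for n in count_eols(text).values() if n > 0) > 1

sample = 'one\r\ntwo\nthree\r\n'
print(count_eols(sample))
print(has_mixed_eols(sample))
```

A file that passes this check can still disagree with the EOL mode the editor reports, which is the extra thing the PythonScript version looks at.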
-
Hello @ekopalypse, @alan-kilborn and All,
@ekopalypse, I did not completely understand your script so I changed it and improved it as below :
check = True

false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows EOL )
             1:'\n',           # Should be \r as editor.getEOLMode() = 1 ( Macintosh EOL )
             2:'\r',           # Should be \n as editor.getEOLMode() = 2 ( Unix EOL )
            }

def check_eol(match):
    global check
    check = False
    notepad.messageBox('Different EOLS detected','EOL Mismatch', 0)

editor.research(false_EOL[editor.getEOLMode()],  # regex to search for
                check_eol,                       # function to call if regex match
                0,                               # re flags
                0,                               # START of file
                editor.getLength(),              # END of file
                1)                               # count ( at FIRST match )

if check == True:
    notepad.messageBox('All EOLS correct','EOL check', 0)
Remarks :
- I changed the word missmatch to mismatch, which seems to be the right spelling !
- I changed the name of the Python dictionary from regex_dict to false_EOL. Thus, it emphasizes the wrong EOLs to match, in each case
- I added a way to indicate when all the EOLs are correct
- Finally, I modified the regex used to detect false EOLs when the file is supposed to be a Windows file
So, I changed :
false_EOL = {0:'\r[^\n]|[^\r]\n', # Miss \n AFTER OR \r BEFORE as editor.getEOLMode() = 0 ( Windows EOL )
By :
false_EOL = {0:'$[^\r][^\n]', # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows EOL )
Because in case of huge files, the former syntax would lead to a RuntimeError regarding the regex. With the latter one, everything seems to work better !
Now, to be sure that your file contains normalized EOLs only, simply run, consecutively, the two commands below :
- For a Windows file :
  Edit > EOL conversion > Unix (LF)
  Edit > EOL conversion > Windows (CR LF)
- For a Unix file :
  Edit > EOL conversion > Macintosh (CR)
  Edit > EOL conversion > Unix (LF)
- For a Macintosh file :
  Edit > EOL conversion > Unix (LF)
  Edit > EOL conversion > Macintosh (CR)
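The same normalization can be sketched outside Notepad++ in plain Python. This is only an illustration, assuming the text is already loaded as a string; `normalize_eols` and the `EOLS` table are hypothetical names, with the 0/1/2 numbering borrowed from the `editor.getEOLMode()` values used in the script above.

```python
import re

# Assumption: same numbering as in the script (0 = Windows, 1 = Macintosh, 2 = Unix).
EOLS = {0: '\r\n', 1: '\r', 2: '\n'}

def normalize_eols(text, mode):
    """Rewrite every CRLF, lone CR or lone LF in text as the EOL for mode."""
    target = EOLS[mode]
    # A function is used as the replacement so the target is inserted literally.
    return re.sub(r'\r\n|\r|\n', lambda m: target, text)

mixed = 'a\r\nb\rc\nd'
print(repr(normalize_eols(mixed, 0)))
print(repr(normalize_eols(mixed, 2)))
```

Because the pattern handles all three styles in a single pass, no intermediate conversion is needed here, unlike the two-step menu recipe.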
Best regards,
guy038
-
-
@guy038 said in Search for inconsistent line endings with a regex? (part 2):
Now, to be sure that your file contains normalized EOLs only, simply run, consecutively, the two commands below
OR… have your script do it. Add these lines into your script, after the indicated existing lines:
def check_eol(match):  # <--- existing line in script
    global check  # <--- existing line in script
    check = False  # <--- existing line in script
    #notepad.messageBox('Different EOLS detected','EOL Mismatch', 0)  # <--- existing line in script, but now turned into a comment
    line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
    notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
    user_input = notepad.prompt('Convert all line-endings in file?\r\nIf so, enter 0 for CRLF, 1 for CR, 2 for LF', 'INCONSISTENT LINE-ENDINGS DETECTED!', editor.getEOLMode())
    if user_input is not None:
        desired_eol_index = int(user_input)
        if 0 <= desired_eol_index <= 2:
            eol_cmd_list = [ MENUCOMMAND.FORMAT_TODOS, MENUCOMMAND.FORMAT_TOMAC, MENUCOMMAND.FORMAT_TOUNIX, ]
            if desired_eol_index == editor.getEOLMode():
                notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to undesired line-endings
            notepad.menuCommand(eol_cmd_list[desired_eol_index])  # change to desired line-endings
Note also that I took the liberty of adding in some logic to tell you which line number has the first inconsistent line-ending.
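The `(desired_eol_index + 1) % 3` step is easy to misread, so here is a small stand-alone sketch of just that decision, in plain Python with no PythonScript calls. It assumes, as the two-step menu recipe quoted above suggests, that converting to the mode the file already reports would not fix stray line-endings, so a detour through a different mode is made first. `conversion_steps` and `EOL_NAMES` are hypothetical names used only for illustration.

```python
EOL_NAMES = ['CRLF', 'CR', 'LF']  # index matches the 0/1/2 mode numbers used in the script

def conversion_steps(current_mode, desired_mode):
    """Return the conversions to run, in order, as EOL names."""
    if not 0 <= desired_mode <= 2:
        raise ValueError('desired_mode must be 0, 1 or 2')
    steps = []
    if desired_mode == current_mode:
        # Detour through the "next" mode so the final conversion has real work to do.
        steps.append(EOL_NAMES[(desired_mode + 1) % 3])
    steps.append(EOL_NAMES[desired_mode])
    return steps

for current, desired in [(0, 0), (2, 2), (0, 2)]:
    print(current, desired, conversion_steps(current, desired))
```

The modulo simply wraps index 2 back to 0, so any of the three modes has a "different" neighbour to bounce through.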
-
Hello, @alan-kilborn,
I’ll study your last solution, on Monday 18 ( Again, I’m away on a three-day ski trip 😉 )
Best Regards,
guy038
-
Hello, @ekopalypse, @alan-kilborn, and All,
Like you proposed, @alan-kilborn, the enhanced script becomes :
check = True

false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows EOL )
             1:'\n',           # Should be \r as editor.getEOLMode() = 1 ( Macintosh EOL )
             2:'\r',           # Should be \n as editor.getEOLMode() = 2 ( Unix EOL )
            }

def check_eol(match):
    global check
    check = False
    line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
    notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
    user_input = notepad.prompt('Convert all line-endings in file?\r\nIf so, enter 0 for CRLF, 1 for CR, 2 for LF', 'INCONSISTENT LINE-ENDINGS DETECTED!', editor.getEOLMode())
    if user_input is not None:
        desired_eol_index = int(user_input)
        if 0 <= desired_eol_index <= 2:
            eol_cmd_list = [ MENUCOMMAND.FORMAT_TODOS, MENUCOMMAND.FORMAT_TOMAC, MENUCOMMAND.FORMAT_TOUNIX, ]
            if desired_eol_index == editor.getEOLMode():
                notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
            notepad.menuCommand(eol_cmd_list[desired_eol_index])  # change to DESIRED line-endings

editor.research(false_EOL[editor.getEOLMode()],  # regex to search for
                check_eol,                       # function to call if regex match
                0,                               # re flags
                0,                               # START of file
                editor.getLength(),              # END of file
                1)                               # count ( at FIRST match )

if check == True:
    notepad.messageBox('All EOLS correct','EOL check', 0)
Now, given this simple text :
This
is
a
little
test
to
try
if
OK
- With Windows (CR LF) in the status bar
- With line 4 ending with CR
- Line 6 ending with 3 spaces + LF
- And all the other lines ending with CRLF
When running the script, it said :
Different EOLS detected -- The first inconsistency is on line 6
although it should be on line 4, ending with CR !
Still searching for other oddities :-)
Best Regards,
guy038
-
-
@guy038 said :
Different EOLS detected – The first inconsistency is on line 6, although it should be on line 4 ending with CR !
Well… that seems to be because $[^\r][^\n] (when searching from top of file) misses line 4 and matches the LF at the end of line 6 and the t at the start of line 7.
The original regex of \r[^\n]|[^\r]\n seems to work better…
-
I noticed that other odd things can happen.
Example:
I created a Unix (LF) file and put some lines in it, and then changed one of the line endings to CRLF:
The status bar said:
Running the script said:
but it should have said line 3.
Moving to the PS console window and checking the EOL mode, I discovered:
So I seem to have found a case where something is out of sync: Notepad++'s status bar says LF for line-endings, but the Scintilla buffer says something different (CRLF).
EDIT: I seem to have figured out why: The editorconfig plugin seems to be interfering. I have it set for CRLF for the file in question. However, I’d have thought that this plugin only does things when I save a file, and in the above I’ve not saved the data. Oh, well, (non)problem solved.
-
This time I’ve found a real bug in the script, and it is with the code I suggested:
Buggy code:
line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
Better code:
line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
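To see why both changes matter, here is a stand-alone sketch in plain Python (no PythonScript objects). The hypothetical helper `line_number_at` mimics a 1-based line lookup for a character offset. Calling it with the end of a match instead of the start can land on the following line, because the match may already include the first character of that next line.

```python
import re

EOL = re.compile(r'\r\n|\r|\n')  # CRLF is tried first so it counts as one break

def line_number_at(text, offset):
    """1-based line number of a character offset; CRLF, CR and LF all end a line."""
    breaks = 0
    for m in EOL.finditer(text):
        if m.end() > offset:
            break  # this line break ends after the offset, so stop counting
        breaks += 1
    return breaks + 1

text = 'one\r\ntwo\rthree\r\n'
# Matches the stray CR ending line 2 together with the 't' that starts line 3.
m = re.search('\r[^\n]', text)

print(line_number_at(text, m.start()))  # line that holds the stray CR
print(line_number_at(text, m.end()))    # one line too far
```

The `+ 1` in the helper plays the same role as the `+ 1` in the better code: the count of preceding line breaks is zero-based, while people read line numbers starting from 1.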
-
Hello, @ekopalypse, @alan-kilborn and All,
Ah…, OK. I see the problem ! Now, Alan, if you try this script on files with more than 500,000 lines, the regex \r[^\n]|[^\r]\n returns an error whereas the regex $[^\r][^\n] works correctly and displays the expected message All EOLS correct.
Thus, I decided that this behaviour is of higher importance compared to knowing which is the first mismatched line found ! I, then, changed this script as below :
check = True

false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows EOL )
             1:'\n',           # Should be \r as editor.getEOLMode() = 1 ( Macintosh EOL )
             2:'\r',           # Should be \n as editor.getEOLMode() = 2 ( Unix EOL )
            }

def check_eol(match):
    global check
    check = False
    user_input = notepad.prompt('Convert ALL line-endings of CURRENT file ( 0 for CRLF, 1 for CR, 2 for LF )', 'INCONSISTENT line-endings DETECTED !', editor.getEOLMode())
    if user_input is not None:
        desired_eol_index = int(user_input)
        if 0 <= desired_eol_index <= 2:
            eol_cmd_list = [ MENUCOMMAND.FORMAT_TODOS, MENUCOMMAND.FORMAT_TOMAC, MENUCOMMAND.FORMAT_TOUNIX, ]
            if desired_eol_index == editor.getEOLMode():
                notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
            notepad.menuCommand(eol_cmd_list[desired_eol_index])  # change to DESIRED line-endings

editor.research(false_EOL[editor.getEOLMode()],  # regex to search for
                check_eol,                       # function to call if regex match
                0,                               # re flags
                0,                               # START of file
                editor.getLength(),              # END of file
                1)                               # count ( at FIRST match )

if check == True:
    notepad.messageBox('All EOLS correct','EOL check', 0)
Do note that it’s my own preference, only !
Best Regards,
guy038
P.S. :
In the meantime, I saw that you’ve done a lot of testing ! Thanks for your tests but, as you can see, I solved the problem definitively ;-))
-
@guy038 said :
whereas the regex $[^\r][^\n] works correctly
Try it on a Windows (CR LF) file and this data:
That regex doesn’t hit anything in that text.
I solved the problem definitively
Hmm. :-)
-
Hi, @ekopalypse, @alan-kilborn and All,
I deeply apologize, because my regex to find out all wrong cases, in case of a Windows file, was itself bugged ! You were right about it, Alan. The correct regex is $\n|\r^ leading to the line :
false_EOL = {0:'$\n|\r^', # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows EOL )
This time, results are coherent, even for large files !
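As a cross-check outside Notepad++, the intent of that pattern (an LF not preceded by CR, or a CR not followed by LF) can be written with lookarounds. The sketch below uses Python's standard `re` module, which is a different engine from the one behind Notepad++ searches, so `$` and `^` are deliberately avoided; `find_bad_eols` is a hypothetical helper. It also shows that the earlier `\r[^\n]|[^\r]\n` pattern, which needs a neighbouring character, cannot report a stray CR at the very end of the text.

```python
import re

# Anchor-free pattern: lone CR (no LF after it) or lone LF (no CR before it).
LOOKAROUND = re.compile(r'\r(?!\n)|(?<!\r)\n')
# The earlier pattern from this thread, which consumes a neighbouring character.
ORIGINAL = re.compile('\r[^\n]|[^\r]\n')

def find_bad_eols(pattern, text):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]

sample = 'ok\r\nbad\rnext\r\nalso bad\nend\r\n'
print(find_bad_eols(LOOKAROUND, sample))
print(find_bad_eols(ORIGINAL, sample))

# A stray CR as the last character has no neighbour for ORIGINAL to consume.
print(find_bad_eols(LOOKAROUND, 'tail\r'))
print(find_bad_eols(ORIGINAL, 'tail\r'))
```

Note that the two patterns report different offsets for a lone LF: the lookaround version points at the LF itself, while the earlier one points at the character before it.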
BR
guy038
-
Hello, @ekopalypse, @alan-kilborn and All,
I did some additional tests, with your modifications, Alan :
line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
and my own one :
false_EOL = {0:'$\n|\r^', # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows EOL )
And everything seems to work as expected !
So the final version of this script is :
check = True

false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows EOL )
             1:'\n',       # Find \n ( should be \r ) as editor.getEOLMode() = 1 ( Macintosh EOL )
             2:'\r',       # Find \r ( should be \n ) as editor.getEOLMode() = 2 ( Unix EOL )
            }

def check_eol(match):
    global check
    check = False
    line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
    notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
    user_input = notepad.prompt('Convert ALL line-endings of CURRENT file ( 0 for CRLF, 1 for CR, 2 for LF )', 'INCONSISTENT line-endings DETECTED !', editor.getEOLMode())
    if user_input is not None:
        desired_eol_index = int(user_input)
        if 0 <= desired_eol_index <= 2:
            eol_cmd_list = [ MENUCOMMAND.FORMAT_TODOS, MENUCOMMAND.FORMAT_TOMAC, MENUCOMMAND.FORMAT_TOUNIX, ]
            if desired_eol_index == editor.getEOLMode():
                notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
            notepad.menuCommand(eol_cmd_list[desired_eol_index])  # change to DESIRED line-endings

editor.research(false_EOL[editor.getEOLMode()],  # regex to search for
                check_eol,                       # function to call if regex match
                0,                               # re flags
                0,                               # START of file
                editor.getLength(),              # END of file
                1)                               # count ( at FIRST match )

if check == True:
    notepad.messageBox('All EOLS correct','EOL check', 0)
To be rigorous, note that the first EOL inconsistency is always the first line whose line-ending char(s) differ from the status bar indication !
Best Regards,
guy038