Search for inconsistent line endings with a regex? (part 2)
-
Hello, @alan-kilborn,
I’ll study your latest solution on Monday the 18th (again, I’m away on a three-day ski trip 😉)
Best Regards,
guy038
-
Hello, @ekopalypse, @alan-kilborn, and All,
As you proposed, @alan-kilborn, the enhanced script becomes :
check = True

false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows EOL )
             1:'\n',           # Should be \r as editor.getEOLMode() = 1 ( Macintosh EOL )
             2:'\r',           # Should be \n as editor.getEOLMode() = 2 ( Unix EOL )
            }

def check_eol(match):
    global check
    check = False
    line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
    notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
    user_input = notepad.prompt('Convert all line-endings in file?\r\nIf so, enter 0 for CRLF, 1 for CR, 2 for LF', 'INCONSISTENT LINE-ENDINGS DETECTED!', editor.getEOLMode())
    if user_input is not None:
        desired_eol_index = int(user_input)
        if 0 <= desired_eol_index <= 2:
            eol_cmd_list = [
                MENUCOMMAND.FORMAT_TODOS,
                MENUCOMMAND.FORMAT_TOMAC,
                MENUCOMMAND.FORMAT_TOUNIX,
            ]
            if desired_eol_index == editor.getEOLMode():
                notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
            notepad.menuCommand(eol_cmd_list[desired_eol_index])  # change to DESIRED line-endings

editor.research(false_EOL[editor.getEOLMode()],  # regex to search for
                check_eol,                       # function to call if regex match
                0,                               # re flags
                0,                               # START of file
                editor.getLength(),              # END of file
                1)                               # count ( at FIRST match )

if check == True:
    notepad.messageBox('All EOLS correct','EOL check', 0)
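For anyone who wants to experiment outside Notepad++, the conversion step that the script delegates to the FORMAT_TODOS / FORMAT_TOMAC / FORMAT_TOUNIX menu commands can be approximated in plain Python. This is only a sketch: `EOLS` and `convert_eols` are hypothetical names, the mode numbers simply mirror the 0 / 1 / 2 values used in the prompt, and it works on a string rather than on the editor buffer.

```python
import re

# Mode numbers mirror the script's prompt: 0 = CRLF, 1 = CR, 2 = LF.
EOLS = {0: '\r\n', 1: '\r', 2: '\n'}

def convert_eols(text, mode):
    """Replace every line break (CRLF, CR or LF) with the EOL of the given mode."""
    # CRLF is listed first so a pair is treated as ONE line break, not two.
    return re.sub(r'\r\n|\r|\n', lambda m: EOLS[mode], text)

mixed = "a\r\nb\rc\nd"
print(repr(convert_eols(mixed, 0)))
print(repr(convert_eols(mixed, 2)))
```

Because every kind of line break is rewritten in a single pass, this sketch has no need for the "switch to the undesired EOL first" step that the script uses with the menu commands.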
Now, given this simple text :
This is a little test to try if OK
- With Windows (CR LF) in the status bar
- With line 4 ending with CR
- With line 6 ending with 3 spaces + LF
- And all the other lines ending with CRLF
When running the script, it said :
Different EOLS detected -- The first inconsistency is on line 6
although it should be on line 4, ending with CR !
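The expected answer can be checked outside Notepad++ with a small stand-alone sketch. It assumes the sample has one word per line (the exact layout is a guess), and it uses Python's own `re` module with a pattern that flags a CR not followed by LF, or an LF not preceded by CR. `first_mismatch_line` is a hypothetical helper, not part of PythonScript.

```python
import re

# Assumed reconstruction of the sample: one word per line, CRLF everywhere
# except line 4 (bare CR) and line 6 (three spaces + bare LF).
sample = "This\r\nis\r\na\r\nlittle\rtest\r\nto   \ntry\r\nif\r\nOK\r\n"

# A CR not followed by LF, or an LF not preceded by CR.
MIXED_EOL = re.compile(r'\r[^\n]|[^\r]\n')

def first_mismatch_line(text):
    """Return the 1-based line of the first stray CR or LF, or None if there is none."""
    match = MIXED_EOL.search(text)
    if match is None:
        return None
    # Count every line break (CRLF, CR or LF) located before the match start.
    return len(re.findall(r'\r\n|\r|\n', text[:match.start()])) + 1

print(first_mismatch_line(sample))  # 4
```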
Still searching for other oddities :-)
Best Regards,
guy038
-
-
@guy038 said :
Different EOLS detected – The first inconsistency is on line 6, although it should be on line 4 ending with CR !
Well… that seems to be because $[^\r][^\n] (when searching from the top of the file) misses line 4 and matches the LF at the end of line 6 and the t at the start of line 7.
The original regex of \r[^\n]|[^\r]\n seems to work better…
-
I noticed that other odd things can happen.
Example:
I created a Unix (LF) file and put some lines in it, and then changed one line’s ending to CRLF:
The status bar said:
Running the script said:
but it should have said line 3.
Moving to the PS console window and checking the EOL mode, I discovered:
So I seem to have found a case where something is out of sync: Notepad++’s status bar says LF for line-endings, but the Scintilla buffer says something different (CRLF).
EDIT: I seem to have figured out why: The editorconfig plugin seems to be interfering. I have it set for CRLF for the file in question. However, I’d have thought that this plugin only does things when I save a file, and in the above I’ve not saved the data. Oh, well, (non)problem solved.
-
This time I’ve found a real bug in the script, and it is with the code I suggested:
Buggy code:
line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
Better code:
line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
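Here is a small stand-alone sketch of why "start of match, plus one" is the safer choice. It assumes, as the fix implies, that editor.lineFromPosition returns a zero-based line index. `line_from_position` is a hypothetical plain-Python stand-in for it, not the PythonScript API, and the test string is made up.

```python
import re

def line_from_position(text, pos):
    """Zero-based line index of pos, counting CRLF, CR and LF as line breaks."""
    head = text[:pos]
    # A position between the CR and LF of a CRLF pair is still on the same line.
    if head.endswith('\r') and text[pos:pos + 1] == '\n':
        head = head[:-1]
    return len(re.findall(r'\r\n|\r|\n', head))

text = 'a\nb\r\nc\n'            # LF file with one stray CR at the end of line 2
match = re.search(r'\r', text)  # the kind of pattern used when LF is expected

# "Buggy" variant: END of the match, zero-based index shown as-is.
end_line_raw = line_from_position(text, match.span(0)[1])
# "Better" variant: START of the match, converted to a 1-based line number.
start_line = line_from_position(text, match.span(0)[0]) + 1

print(end_line_raw, start_line)  # 1 2
```

When the match happens to end on the following line, the two mistakes in the buggy variant cancel out, which is probably why it looked right at first.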
-
Hello, @ekopalypse, @alan-kilborn and All,
Ah…, OK. I see the problem ! Now, Alan, if you try this script on files with more than 500,000 lines, the regex \r[^\n]|[^\r]\n returns an error, whereas the regex $[^\r][^\n] works correctly and displays the expected message All EOLS correct.
Thus, I decided that this behaviour matters more than knowing which is the first mismatched line ! I then changed the script as below :
check = True

false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows EOL )
             1:'\n',           # Should be \r as editor.getEOLMode() = 1 ( Macintosh EOL )
             2:'\r',           # Should be \n as editor.getEOLMode() = 2 ( Unix EOL )
            }

def check_eol(match):
    global check
    check = False
    user_input = notepad.prompt('Convert ALL line-endings of CURRENT file ( 0 for CRLF, 1 for CR, 2 for LF )', 'INCONSISTENT line-endings DETECTED !', editor.getEOLMode())
    if user_input is not None:
        desired_eol_index = int(user_input)
        if 0 <= desired_eol_index <= 2:
            eol_cmd_list = [
                MENUCOMMAND.FORMAT_TODOS,
                MENUCOMMAND.FORMAT_TOMAC,
                MENUCOMMAND.FORMAT_TOUNIX,
            ]
            if desired_eol_index == editor.getEOLMode():
                notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
            notepad.menuCommand(eol_cmd_list[desired_eol_index])  # change to DESIRED line-endings

editor.research(false_EOL[editor.getEOLMode()],  # regex to search for
                check_eol,                       # function to call if regex match
                0,                               # re flags
                0,                               # START of file
                editor.getLength(),              # END of file
                1)                               # count ( at FIRST match )

if check == True:
    notepad.messageBox('All EOLS correct','EOL check', 0)
Do note that this is only my own preference !
Best Regards,
guy038
P.S. :
In the meantime, I saw that you’ve done a lot of testing ! Thanks for your tests but, as you can see, I solved the problem definitively ;-))
-
@guy038 said :
whereas the regex $[^\r][^\n] works correctly
Try it on a Windows (CR LF) file and this data:
That regex doesn’t hit anything in that text.
I solved the problem definitively
Hmm. :-)
-
Hi, @ekopalypse, @alan-kilborn and All,
I deeply apologize, because my regex to find all the wrong cases, for a Windows file, was itself buggy !
You were right about it, Alan. The correct regex is $\n|\r^ leading to the line :
false_EOL = {0:'$\n|\r^', # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows EOL )
This time, results are coherent, even for large files !
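A note for anyone testing this outside Notepad++: Python's own `re` module treats `$` and `^` differently from the editor's regex engine, so `$\n|\r^` cannot be pasted as-is into a plain Python session. A rough equivalent, written with lookarounds instead of anchors, is sketched below. The pattern, the helper name `stray_eol_lines` and the test string are all my own assumptions, not something taken from the script.

```python
import re

# Swapped-in technique: lookarounds instead of the $ and ^ anchors.
# A CR not followed by LF, or an LF not preceded by CR.
STRAY_EOL = re.compile(r'\r(?!\n)|(?<!\r)\n')

def stray_eol_lines(text):
    """Return the 1-based line numbers that end with a bare CR or a bare LF."""
    lines = []
    for match in STRAY_EOL.finditer(text):
        # Count every line break (CRLF, CR or LF) before the stray character.
        lines.append(len(re.findall(r'\r\n|\r|\n', text[:match.start()])) + 1)
    return lines

# CRLF file with a bare CR on line 2, a bare LF on line 4 and a bare CR at the very end.
text = "one\r\ntwo\rthree\r\nfour\nfive\r"
print(stray_eol_lines(text))  # [2, 4, 5]
```

Unlike \r[^\n]|[^\r]\n, the lookaround form consumes no neighbouring character, so it also catches a stray CR that is the very last character of the text.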
BR
guy038
-
Hello, @ekopalypse, @alan-kilborn and All,
I did some additional tests, with your modifications, Alan :
line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
and my own one :
false_EOL = {0:'$\n|\r^', # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows EOL )
And everything seems to work as expected !
So the final version of this script is :
check = True

false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows EOL )
             1:'\n',       # Find \n ( should be \r ) as editor.getEOLMode() = 1 ( Macintosh EOL )
             2:'\r',       # Find \r ( should be \n ) as editor.getEOLMode() = 2 ( Unix EOL )
            }

def check_eol(match):
    global check
    check = False
    line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
    notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
    user_input = notepad.prompt('Convert ALL line-endings of CURRENT file ( 0 for CRLF, 1 for CR, 2 for LF )', 'INCONSISTENT line-endings DETECTED !', editor.getEOLMode())
    if user_input is not None:
        desired_eol_index = int(user_input)
        if 0 <= desired_eol_index <= 2:
            eol_cmd_list = [
                MENUCOMMAND.FORMAT_TODOS,
                MENUCOMMAND.FORMAT_TOMAC,
                MENUCOMMAND.FORMAT_TOUNIX,
            ]
            if desired_eol_index == editor.getEOLMode():
                notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
            notepad.menuCommand(eol_cmd_list[desired_eol_index])  # change to DESIRED line-endings

editor.research(false_EOL[editor.getEOLMode()],  # regex to search for
                check_eol,                       # function to call if regex match
                0,                               # re flags
                0,                               # START of file
                editor.getLength(),              # END of file
                1)                               # count ( at FIRST match )

if check == True:
    notepad.messageBox('All EOLS correct','EOL check', 0)
To be rigorous, note that the first EOL inconsistency is always the first line whose line-ending char(s) differ from the status bar indication !
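To illustrate that last remark, here is a stand-alone sketch in which the same text gives a different "first inconsistent line" depending on the expected EOL mode. It is plain Python, not PythonScript: `FALSE_EOL` only mirrors the idea of the false_EOL dictionary, the mode 0 pattern is a lookaround substitute for $\n|\r^ (Python's `re` handles anchors differently), and the function name and test string are made up.

```python
import re

# Per-mode "wrong EOL" patterns, keyed like editor.getEOLMode():
# 0 = CRLF expected, 1 = CR expected, 2 = LF expected.
FALSE_EOL = {
    0: r'\r(?!\n)|(?<!\r)\n',  # bare CR or bare LF (assumed stand-in for $\n|\r^)
    1: r'\n',                  # any LF is wrong when CR is expected
    2: r'\r',                  # any CR is wrong when LF is expected
}

def first_inconsistent_line(text, eol_mode):
    """Return the 1-based line of the first wrong line ending, or None."""
    match = re.search(FALSE_EOL[eol_mode], text)
    if match is None:
        return None
    head = text[:match.start()]
    # A match on the LF of a CRLF pair still belongs to the line that the CR ends.
    if head.endswith('\r') and match.group() == '\n':
        head = head[:-1]
    return len(re.findall(r'\r\n|\r|\n', head)) + 1

text = "a\r\nb\r\nc\nd\re\r\n"
results = {mode: first_inconsistent_line(text, mode) for mode in (0, 1, 2)}
print(results)  # {0: 3, 1: 1, 2: 1}
```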
Best Regards,
guy038