Search for inconsistent line endings with a regex? (part 2)

29 Posts 5 Posters 5.1k Views
  • E
    Ekopalypse @Alan Kilborn
    last edited by May 9, 2019, 2:13 PM

    @Alan-Kilborn

    slightly different approach

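    regex_dict = {0:'\r[^\n]|[^\r]\n',   # mode 0, Windows (CR LF): CR without LF after, or LF without CR before
                  1:'\n',                # mode 1, Macintosh (CR): any LF
                  2:'\r',}               # mode 2, Unix (LF): any CR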
    
    def check_eol(match):
        notepad.messageBox('Different EOLS detected','EOL Missmatch', 0)
    
    editor.research(regex_dict[editor.getEOLMode()],    # regex
                    check_eol,                          # function to call
                    0,                                  # re flags
                    0,                                  # start
                    editor.getTextLength(),             # end
                    1)                                  # count
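
To try the three patterns outside Notepad++, here is a minimal sketch using Python's standard `re` module instead of `editor.research`. The names `WRONG_EOL` and `has_mixed_eols` are hypothetical, and Python's regex engine is not the one Notepad++ uses, so treat it only as an illustration of what each pattern is looking for.

```python
import re

# Patterns from the post, keyed by EOL mode (0 = CR LF, 1 = CR, 2 = LF).
# Raw strings are an editorial choice; they match the same characters.
WRONG_EOL = {
    0: r'\r[^\n]|[^\r]\n',   # CR not followed by LF, or LF not preceded by CR
    1: r'\n',                # any LF is wrong in a CR-only file
    2: r'\r',                # any CR is wrong in an LF-only file
}

def has_mixed_eols(text, eol_mode):
    """Return True if `text` contains a line ending that does not fit `eol_mode`."""
    return re.search(WRONG_EOL[eol_mode], text) is not None
```

Note that the mode 0 pattern needs a character on both sides of the offending CR or LF, so a lone LF at the very start of the text or a lone CR at the very end goes unnoticed.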
    
    
    • A
      Alan Kilborn @Ekopalypse
      last edited by May 9, 2019, 2:16 PM

      @Ekopalypse

      I either forgot about or never knew about editor.getEOLMode(). Perhaps I could have used that knowledge YESTERDAY before I finished my design the way I did. :(

      But thank you.

      • A
        Alan Kilborn @Ekopalypse
        last edited by May 10, 2019, 7:46 PM

        @Ekopalypse

        So it appears the reason (maybe) that I never noticed editor.getEOLMode() before is that I “grew up” here on Community sample scripts where it seems that notepad.getFormatType() was used much more frequently for a very similar purpose. On an integer basis the functions even return the same number values!

        I suppose the notepad.getFormatType() function is for what Notepad++ thinks the setting is for a file upon loading, and after that it follows the current user setting for “EOL conversion”…and the editor.getEOLMode() function usually follows the notepad.getFormatType() setting, but could be set independently (via PS code call to editor.setEOLMode()).

        I did verify this, the editor.getEOLMode() value follows the notepad.getFormatType() value, and if you editor.setEOLMode() to something different than the Notepad++ EOL setting, and then switch the active tab and then come back to the original tab, editor.getEOLMode() will again be back at the notepad.getFormatType() setting. [A fair number of settings work this way: You can change them via PS editor functions, but a switch of tabs and a return will find them reset to original Notepad++ controlling values.]

        For my purposes, however, the editor function is valuable to know the setting for editor1 and editor2, without, say, having to make editor2 the active editor–when it isn’t currently–and then calling notepad.getFormatType().

        …if that all makes any kind of sense to you. :)
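
The point about reading both views without switching the active one can be sketched as below. Only `getEOLMode()` comes from the thread; `EOL_NAMES`, `describe_eol_modes` and the `FakeEditor` stand-in are hypothetical names added so the snippet runs outside Notepad++.

```python
# Integer values as used in the thread: 0 = CR LF, 1 = CR, 2 = LF.
EOL_NAMES = {0: 'Windows (CR LF)', 1: 'Macintosh (CR)', 2: 'Unix (LF)'}

def describe_eol_modes(views):
    """Map each label to the EOL name reported by that view's getEOLMode()."""
    return {label: EOL_NAMES[view.getEOLMode()] for label, view in views.items()}

class FakeEditor:
    """Hypothetical stand-in for a PythonScript editor object."""
    def __init__(self, mode):
        self._mode = mode

    def getEOLMode(self):
        return self._mode

report = describe_eol_modes({'editor1': FakeEditor(0), 'editor2': FakeEditor(2)})
```

Inside PythonScript the call would presumably be `describe_eol_modes({'editor1': editor1, 'editor2': editor2})`, with no need to activate either view first.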

        • E
          Ekopalypse @Alan Kilborn
          last edited by May 10, 2019, 8:21 PM

          @Alan-Kilborn

          You are absolutely right and this is something one needs to keep in mind.
          Whenever possible, a notepad object method should be used to stay in sync with npp. As far as I understand the code, Npp itself uses SCI_SETEOLMODE and SCI_GETEOLMODE to set/get the current EOL, and, as far as I have understood, Scintilla only checks the first line to determine the EOL mode.

          I would say, to get a value it is safe to use editor object methods but, as said, if one wants to change something then notepad object methods should be preferred.

          • M
            Meta Chuh moderator @Alan Kilborn
            last edited by Meta Chuh May 11, 2019, 4:48 PM May 11, 2019, 4:47 PM

            @Alan-Kilborn

            My guess is that it was locked by “Father Time”! (i.e., age + inactivity)

            very funny … nooot

            • no: topics don’t get locked automatically, when marked as solved.
            • yes: topics have to be locked manually.
            • no: this topic was not locked to prevent follow-up posts in order to preserve its extraordinary state for eternity, like keeping an ancient Ming vase empty.
            • no: there was no content reason to lock this topic.
            • no: the community place does not need a clean up, as the separate information exchange does not interfere with the issue tracker readability for developers.
            • maybe: this was one of those topics, that got spammed back then, and was locked to contain it a bit.
            • A
              Alan Kilborn @Ekopalypse
              last edited by May 13, 2019, 12:58 PM

              @Ekopalypse said:

              Whenever possible, a notepad object method should be used to stay in sync with npp

              if one wants to change something then notepad object methods should be preferred

              Very much agree. Usually the notepad object provides only get access, e.g. in this case notepad.getFormatType() has no corresponding notepad.setFormatType(). In order to do the set, one must call, as an example, notepad.menuCommand(MENUCOMMAND.FORMAT_TOUNIX). This is nice because it keeps the Notepad++ user interface consistent.

              Note a very similar discussion involving View -> Show Symbol -> … menu items via script control is found here: https://notepad-plus-plus.org/community/topic/14585/turn-on-off-the-line-ending-symbols-via-script

              • A
                Alan Kilborn
                last edited by May 15, 2019, 1:37 PM

                A stretch for staying on-topic, but I found a great way to set up a scenario for inconsistent line-endings, from the hand of @donho himself:

                • Open a file with Unix (Linux) line-endings
                • Select all (ctrl+a)
                • Invoke Plugins -> Mime Tools -> Quoted-printable Encode

                Boom. A very mixed line-endings file (Unix and Windows ends) now results.

                (I discovered this when I was needing to mime a short file. I decided I don’t like how the Mime Tools plugin does its thing–not just this line-ending thing–and will resort to WinZip’s mime for my future miming needs.)

                • D
                  Doug Hart @Ekopalypse
                  last edited by Mar 13, 2024, 3:00 AM

                  @Ekopalypse said in Search for inconsistent line endings with a regex? (part 2):

                  regex_dict = {0:'\r[^\n]|[^\r]\n',
                                1:'\n',
                                2:'\r',}

                  def check_eol(match):
                      notepad.messageBox('Different EOLS detected','EOL Missmatch', 0)

                  editor.research(regex_dict[editor.getEOLMode()],    # regex
                                  check_eol,                          # function to call
                                  0,                                  # re flags
                                  0,                                  # start
                                  editor.getTextLength(),             # end
                                  1)                                  # count

                  I know this topic is ancient, but what exactly do I do with the sample code above? Is it supposed to be an external command, a configuration file, ?

                  • E
                    Ekopalypse @Doug Hart
                    last edited by Mar 13, 2024, 7:41 AM

                    @Doug-Hart

                    there is a plugin called PythonScript that allows you to manipulate data in notepad++.

                    Here are the steps on how to create and use it.

                    The purpose of the script is to check whether the current document has different line endings (EOL), which can be problematic if you edit a file under Windows and then upload it to a Linux server, for example.

                    • G
                      guy038
                      last edited by guy038 Mar 14, 2024, 11:35 AM Mar 14, 2024, 11:04 AM

                      Hello @ekopalypse, @alan-kilborn and All,

                      @ekopalypse, I did not completely understand your script so I changed it and improved it as below :

                      check = True
                      
                      false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows   EOL )
                                   1:'\n',           # Should be \r                             as editor.getEOLMode() = 1 ( Macintosh EOL )
                                   2:'\r',           # Should be \n                             as editor.getEOLMode() = 2 ( Unix      EOL )
                                  }
                      
                      def check_eol(match):
                          global check
                          check = False
                          notepad.messageBox('Different EOLS detected','EOL Mismatch', 0)
                      
                      editor.research(false_EOL[editor.getEOLMode()],     # regex to search for
                                      check_eol,                          # function to call if regex match
                                      0,                                  # re flags
                                      0,                                  # START of file
                                      editor.getLength(),                 # END   of file
                                      1)                                  # count ( at FIRST match )
                      
                      if check == True:
                          notepad.messageBox('All EOLS correct','EOL check', 0)
                      

                      Remarks :

                      • I changed the word missmatch to mismatch, which seems to be the right spelling !

                      • I changed the name of the Python dictionary from regex_dict to false_EOL. Thus, it emphasizes the wrong EOLs to match, in each case

                      • I added a way to indicate when all the EOLs are correct

                      • Finally, I modified the regex used to detect false EOLs when the file is supposed to be a Windows file

                      So, I changed :

                      false_EOL = {0:'\r[^\n]|[^\r]\n',   # Miss \n AFTER OR \r BEFORE as editor.getEOLMode() = 0 ( Windows   EOL )
                      

                      By :

                      false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows   EOL )
                      

                      Because in case of huge files, the former syntax would lead to a RuntimeError regarding the regex. With the latter one, everything seems to work better !
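
If the regex engine is the limiting factor on huge files, one alternative (not something the posters used) is to skip regexes altogether and scan the text once. The sketch below assumes the document text is available as a Python string, for example from `editor.getText()` in PythonScript; the function name `first_wrong_eol` is hypothetical.

```python
def first_wrong_eol(text, eol_mode):
    """Offset of the first line ending that does not fit eol_mode, or -1.

    eol_mode follows the thread's numbering: 0 = CR LF, 1 = CR, 2 = LF.
    """
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '\r':
            is_crlf = i + 1 < n and text[i + 1] == '\n'
            found = 0 if is_crlf else 1
            if found != eol_mode:
                return i
            i += 2 if is_crlf else 1
        elif ch == '\n':
            # A CR LF pair is consumed above, so this LF stands alone.
            if eol_mode != 2:
                return i
            i += 1
        else:
            i += 1
    return -1
```

Unlike the two-character patterns, this also catches a stray LF at the very start of the text and a stray CR at the very end.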


                      Now, to be sure that your file contains normalized EOLs only, simply run, consecutively, the two commands below :

                      • For a Windows file :
                      Edit > EOL conversion > Unix (LF)
                      Edit > EOL conversion > Windows (CR LF)
                      
                      • For a Unix file :
                      Edit > EOL conversion > Macintosh (CR)
                      Edit > EOL conversion > Unix (LF)
                      
                      • For a Macintosh file :
                      Edit > EOL conversion > Unix (LF)
                      Edit > EOL conversion > Macintosh (CR)
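
For text handled outside Notepad++, the end result of those menu round-trips (every line ending rewritten to a single style) can be sketched in plain Python. This does not reproduce the menu commands themselves; `EOL_CHARS` and `normalize_eols` are hypothetical names.

```python
import re

EOL_CHARS = {'windows': '\r\n', 'mac': '\r', 'unix': '\n'}

def normalize_eols(text, target):
    """Rewrite every CR LF, lone CR, or lone LF in `text` as the target ending."""
    # CR LF is listed first so that a pair is replaced as one unit.
    return re.sub(r'\r\n|\r|\n', EOL_CHARS[target], text)
```

For example, `normalize_eols(data, 'unix')` leaves only LF line endings in `data`.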
                      

                      Best regards,

                      guy038

                      • A
                        Alan Kilborn @guy038
                        last edited by Alan Kilborn Mar 14, 2024, 2:00 PM Mar 14, 2024, 1:58 PM

                        @guy038 said in Search for inconsistent line endings with a regex? (part 2):

                        Now, to be sure that your file contains normalized EOLs only, simply run, consecutively, the two commands below

                        OR… have your script do it. Add these lines into your script, after the indicated existing lines:

                        def check_eol(match):                                                  # <--- existing line in script
                            global check                                                       # <--- existing line in script
                            check = False                                                      # <--- existing line in script
                            #notepad.messageBox('Different EOLS detected','EOL Mismatch', 0)   # <--- existing line in script, but now turned into a comment
                            line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
                            notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
                            user_input = notepad.prompt('Convert all line-endings in file?\r\nIf so, enter 0 for CRLF, 1 for CR, 2 for LF',
                                'INCONSISTENT LINE-ENDINGS DETECTED!', editor.getEOLMode())
                            if user_input is not None:
                                desired_eol_index = int(user_input)
                                if 0 <= desired_eol_index <= 2:
                                    eol_cmd_list = [
                                        MENUCOMMAND.FORMAT_TODOS,
                                        MENUCOMMAND.FORMAT_TOMAC,
                                        MENUCOMMAND.FORMAT_TOUNIX,
                                    ]
                                    if desired_eol_index == editor.getEOLMode():
                                        notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to undesired line-endings
                                    notepad.menuCommand(eol_cmd_list[desired_eol_index])  # change to desired line-endings
                        

                        Note also that I took the liberty of adding in some logic to tell you which line number has the first inconsistent line-ending.
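
The conversion logic above has two parts that can be pulled out and tested without Notepad++: deciding which EOL conversions to run, and reading the user's answer. The sketch below uses hypothetical helper names. The detour step mirrors the `(desired_eol_index + 1) % 3` line, presumably needed because Notepad++ does not act on a conversion to the format it already reports for the file. The stricter parsing is an editorial addition, since `int(user_input)` raises ValueError on non-numeric input.

```python
def conversion_steps(current_mode, desired_mode):
    """EOL modes to apply in order (0 = CR LF, 1 = CR, 2 = LF)."""
    steps = []
    if desired_mode == current_mode:
        # Detour through a different format first, as the script does.
        steps.append((desired_mode + 1) % 3)
    steps.append(desired_mode)
    return steps

def parse_choice(user_input):
    """Turn the prompt's return value into 0, 1 or 2; None for Cancel or junk."""
    if user_input is None:
        return None
    text = user_input.strip()
    return int(text) if text in ('0', '1', '2') else None
```

In the script, each value returned by `conversion_steps` would index into `eol_cmd_list` before being passed to `notepad.menuCommand`.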

                        • G
                          guy038
                          last edited by guy038 Mar 14, 2024, 3:46 PM Mar 14, 2024, 3:45 PM

                          Hello, @alan-kilborn,

                          I’ll study your last solution, on Monday 18 ( Again, I’m away on a three-day ski trip 😉 )

                          Best Regards,

                          guy038

                          • G
                            guy038
                            last edited by Mar 18, 2024, 10:39 AM

                            Hello, @ekopalypse, @alan-kilborn, and All,

                            Like you proposed, @alan-kilborn, the enhanced script becomes :

                            check = True
                            
                            false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows   EOL )
                                         1:'\n',           # Should be \r                             as editor.getEOLMode() = 1 ( Macintosh EOL )
                                         2:'\r',           # Should be \n                             as editor.getEOLMode() = 2 ( Unix      EOL )
                                        }
                            
                            def check_eol(match):
                                global check
                                check = False
                                line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
                                notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
                                user_input = notepad.prompt('Convert all line-endings in file?\r\nIf so, enter 0 for CRLF, 1 for CR, 2 for LF',
                                    'INCONSISTENT LINE-ENDINGS DETECTED!', editor.getEOLMode())
                                if user_input is not None:
                                    desired_eol_index = int(user_input)
                                    if 0 <= desired_eol_index <= 2:
                                        eol_cmd_list = [
                                            MENUCOMMAND.FORMAT_TODOS,
                                            MENUCOMMAND.FORMAT_TOMAC,
                                            MENUCOMMAND.FORMAT_TOUNIX,
                                        ]
                                        if desired_eol_index == editor.getEOLMode():
                                            notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
                                        notepad.menuCommand(eol_cmd_list[desired_eol_index])                # change to DESIRED   line-endings
                            
                            editor.research(false_EOL[editor.getEOLMode()],     # regex to search for
                                            check_eol,                          # function to call if regex match
                                            0,                                  # re flags
                                            0,                                  # START of file
                                            editor.getLength(),                 # END   of file
                                            1)                                  # count ( at FIRST match )
                            
                            if check == True:
                                notepad.messageBox('All EOLS correct','EOL check', 0)
                            

                            Now, given this simple text :

                            This
                            is
                            a
                            little
                            test
                            to   
                            try
                            if
                            OK
                            
                            • With Windows (CR LF) in the status bar

                            • With line 4 ending with CR

                            • line 6 ending with 3 spaces + LF

                            • And all the other lines ending with CRLF

                            When running the script, it said :

                            Different EOLS detected -- The first inconsistency is on line 6, although it should be on line 4 ending with CR !


                            Still searching for other oddities :-)

                            Best Regards,

                            guy038

                            • A
                              Alan Kilborn @guy038
                              last edited by Alan Kilborn Mar 18, 2024, 11:07 AM Mar 18, 2024, 11:05 AM

                              @guy038 said :

                              Different EOLS detected – The first inconsistency is on line 6, although it should be on line 4 ending with CR !

                              Well… that seems to be because $[^\r][^\n] (when searching from top of file) misses line 4 and matches the LF at the end of line 6 and the t at the start of line 7.

                              The original regex of \r[^\n]|[^\r]\n seems to work better…
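
As a rough cross-check, the original pattern can be run over guy038's sample with Python's `re` module. This is only a sketch: Python's engine is not the one inside Notepad++, and the `$`-based pattern is deliberately left out because Python's `$` only treats `\n` as a line boundary. The trailing CR LF on the last line is an assumption.

```python
import re

# guy038's sample: CR LF everywhere, except line 4 (lone CR) and
# line 6 (three spaces + LF). The final CR LF after "OK" is assumed.
sample = 'This\r\nis\r\na\r\nlittle\rtest\r\nto   \ntry\r\nif\r\nOK\r\n'

original = re.compile(r'\r[^\n]|[^\r]\n')

# Text of every offending spot, in document order.
hits = [m.group(0) for m in original.finditer(sample)]
```

Here `hits` holds the CR plus the `t` of `test` (the line 4 problem) followed by a space plus LF (the line 6 problem), so the first hit falls on line 4 as expected.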

                              • A
                                Alan Kilborn
                                last edited by Alan Kilborn Mar 18, 2024, 11:34 AM Mar 18, 2024, 11:22 AM

                                I noticed that other odd things can happen.

                                Example:

                                I created a Unix (LF) file and put some lines in it, and then changed one of the line’s endings to CRLF:

                                [screenshot not preserved]

                                The status bar said:

                                [screenshot not preserved]

                                Running the script said:

                                [screenshot not preserved]

                                but it should have said line 3.

                                Moving to the PS console window and checking the EOL mode, I discovered:

                                [screenshot not preserved]

                                So I seem to have found a case where something is out of sync: Notepad++ 's status bar says LF for line-endings, but the Scintilla buffer says something different (CRLF).

                                EDIT: I seem to have figured out why: The editorconfig plugin seems to be interfering. I have it set for CRLF for the file in question. However, I’d have thought that this plugin only does things when I save a file, and in the above I’ve not saved the data. Oh, well, (non)problem solved.

                                • A
                                  Alan Kilborn
                                  last edited by Mar 18, 2024, 11:35 AM

                                  This time I’ve found a real bug in the script, and it is with the code I suggested:

                                  Buggy code:

                                      line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
                                      notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
                                  

                                  Better code:

                                      line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
                                      notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
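
The difference between the two versions can be reproduced in plain Python. In this sketch, `line_index` is a hypothetical stand-in for `editor.lineFromPosition`, assuming that CR LF, CR and LF each end exactly one line and that line indexes are 0-based.

```python
import re

LINE_BREAK = re.compile(r'\r\n|\r|\n')

def line_index(text, pos):
    """0-based index of the line containing offset pos."""
    return sum(1 for m in LINE_BREAK.finditer(text) if m.end() <= pos)

text = 'one\ntwo\nthree\r\nfour\n'      # LF file whose third line ends in CR LF
match = re.search(r'\r', text)          # the pattern used for EOL mode 2

buggy = line_index(text, match.end())        # end offset, no +1
fixed = line_index(text, match.start()) + 1  # start offset, converted to 1-based
```

With this data `buggy` is 2 and `fixed` is 3: the stray CR sits on the third line, so only the start offset plus one gives the number a user would see in the line-number margin.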
                                  
                                  • G
                                    guy038
                                    last edited by guy038 Mar 18, 2024, 12:13 PM Mar 18, 2024, 12:12 PM

                                    Hello, @ekopalypse, @alan-kilborn and All,

                                    Ah…, OK. I see the problem ! Now, Alan, if you try this script on files with more than 500,000 lines, the regex \r[^\n]|[^\r]\n returns an error, whereas the regex $[^\r][^\n] works correctly and displays the expected message All EOLS correct


                                    Thus, I decided that this behaviour is of higher importance compared to knowing which is the first mismatched line found ! I, then, changed this script as below :

                                    check = True
                                    
                                    false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows   EOL )
                                                 1:'\n',           # Should be \r                             as editor.getEOLMode() = 1 ( Macintosh EOL )
                                                 2:'\r',           # Should be \n                             as editor.getEOLMode() = 2 ( Unix      EOL )
                                                }
                                    
                                    def check_eol(match):
                                        global check
                                        check = False
                                        user_input = notepad.prompt('Convert ALL line-endings of CURRENT file ( 0 for CRLF, 1 for CR, 2 for LF )',
                                            'INCONSISTENT line-endings DETECTED !', editor.getEOLMode())
                                        if user_input is not None:
                                            desired_eol_index = int(user_input)
                                            if 0 <= desired_eol_index <= 2:
                                                eol_cmd_list = [
                                                    MENUCOMMAND.FORMAT_TODOS,
                                                    MENUCOMMAND.FORMAT_TOMAC,
                                                    MENUCOMMAND.FORMAT_TOUNIX,
                                                ]
                                                if desired_eol_index == editor.getEOLMode():
                                                    notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
                                                notepad.menuCommand(eol_cmd_list[desired_eol_index])                # change to DESIRED   line-endings
                                    
                                    editor.research(false_EOL[editor.getEOLMode()],     # regex to search for
                                                    check_eol,                          # function to call if regex match
                                                    0,                                  # re flags
                                                    0,                                  # START of file
                                                    editor.getLength(),                 # END   of file
                                                    1)                                  # count ( at FIRST match )
                                    
                                    if check == True:
                                        notepad.messageBox('All EOLS correct','EOL check', 0)
                                    

                                    Do note that it’s my own preference, only !

                                    Best Regards,

                                    guy038

                                    P.S. :

                                    In the meantime, I saw that you've done a lot of testing ! Thanks for your tests but, as you can see, I solved the problem definitively ;-))

                                    • A
                                      Alan Kilborn @guy038
                                      last edited by Alan Kilborn Mar 18, 2024, 12:26 PM Mar 18, 2024, 12:25 PM

                                      @guy038 said :

                                      whereas the regex $[^\r][^\n] works correctly

                                      Try it on a Windows (CR LF) file and this data:

                                      [screenshot not preserved]

                                      That regex doesn’t hit anything in that text.


                                      I solved the problem definitively

                                      Hmm. :-)

                                      • G
                                        guy038
                                        last edited by guy038 Mar 18, 2024, 1:29 PM Mar 18, 2024, 1:25 PM

                                        Hi, @ekopalypse, @alan-kilborn and All,

                                        I deeply apologize, because my regex to find all the wrong cases, for a Windows file, was itself buggy !

                                        You were right about it, Alan. The correct regex is $\n|\r^ leading to the line :

                                        false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows   EOL )
                                        

                                        This time, results are coherent, even for large files !
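
Going by the comment on that line, the pattern is meant to find an LF with no CR before it, or a CR with no LF after it. Python's `^` and `$` do not behave like the anchors in Notepad++, so the closest Python sketch of that intent (an assumption, not a literal translation) uses lookarounds. The names `BARE_EOL` and `first_bare_eol` are hypothetical.

```python
import re

# Assumed Python equivalent: LF not preceded by CR, or CR not followed by LF.
BARE_EOL = re.compile(r'(?<!\r)\n|\r(?!\n)')

def first_bare_eol(text):
    """Offset of the first LF-only or CR-only line ending, or -1 if all are CR LF."""
    m = BARE_EOL.search(text)
    return m.start() if m else -1
```

Because lookarounds consume no neighbouring character, this form also reports a stray LF at the very start of the text and a stray CR at the very end.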

                                        BR

                                        guy038

                                        • G
                                          guy038
                                          last edited by guy038 Mar 21, 2024, 8:47 AM Mar 21, 2024, 8:47 AM

                                          Hello, @ekopalypse, @alan-kilborn and All,

                                          I did some additional tests, with your modifications, Alan :

                                              line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
                                              notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
                                          

                                          and my own one :

                                          false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows   EOL )
                                          

                                          And everything seems to work as expected !

                                          So the final version of this script is :

                                          check = True
                                          
                                          false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows   EOL )
                                                       1:'\n',       # Find \n ( should be \r )                                 as editor.getEOLMode() = 1 ( Macintosh EOL )
                                                       2:'\r',       # Find \r ( should be \n )                                 as editor.getEOLMode() = 2 ( Unix      EOL )
                                                      }
                                          
                                          def check_eol(match):
                                              global check
                                              check = False
                                              line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
                                              notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
                                              user_input = notepad.prompt('Convert ALL line-endings of CURRENT file ( 0 for CRLF, 1 for CR, 2 for LF )',
                                                  'INCONSISTENT line-endings DETECTED !', editor.getEOLMode())
                                              if user_input is not None and user_input.strip().isdigit():   # ignore Cancel and non-numeric input
                                                  desired_eol_index = int(user_input)
                                                  if 0 <= desired_eol_index <= 2:
                                                      eol_cmd_list = [
                                                          MENUCOMMAND.FORMAT_TODOS,
                                                          MENUCOMMAND.FORMAT_TOMAC,
                                                          MENUCOMMAND.FORMAT_TOUNIX,
                                                      ]
                                                      if desired_eol_index == editor.getEOLMode():
                                                          notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
                                                      notepad.menuCommand(eol_cmd_list[desired_eol_index])                # change to DESIRED   line-endings
                                          
                                          editor.research(false_EOL[editor.getEOLMode()],     # regex to search for
                                                          check_eol,                          # function to call if regex match
                                                          0,                                  # re flags
                                                          0,                                  # START of file
                                                          editor.getLength(),                 # END   of file
                                                          1)                                  # count ( at FIRST match )
                                          
                                          if check:
                                              notepad.messageBox('All EOLS correct','EOL check', 0)
                                          

                                          To be rigorous, note that the first EOL inconsistency is always the first line whose line-ending char(s) differ from the status bar indication !

                                          Best Regards,

                                          guy038
