    Search for inconsistent line endings with a regex? (part 2)

    • guy038

      Hello, @alan-kilborn,

      I’ll study your latest solution on Monday the 18th (again, I’m away on a three-day ski trip 😉).

      Best Regards,

      guy038

      • guy038

        Hello, @ekopalypse, @alan-kilborn, and All,

        As you proposed, @alan-kilborn, the enhanced script becomes:

        check = True
        
        false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows   EOL )
                     1:'\n',           # Should be \r                             as editor.getEOLMode() = 1 ( Macintosh EOL )
                     2:'\r',           # Should be \n                             as editor.getEOLMode() = 2 ( Unix      EOL )
                    }
        
        def check_eol(match):
            global check
            check = False
            line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
            notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
            user_input = notepad.prompt('Convert all line-endings in file?\r\nIf so, enter 0 for CRLF, 1 for CR, 2 for LF',
                'INCONSISTENT LINE-ENDINGS DETECTED!', editor.getEOLMode())
            if user_input is not None:
                desired_eol_index = int(user_input)
                if 0 <= desired_eol_index <= 2:
                    eol_cmd_list = [
                        MENUCOMMAND.FORMAT_TODOS,
                        MENUCOMMAND.FORMAT_TOMAC,
                        MENUCOMMAND.FORMAT_TOUNIX,
                    ]
                    if desired_eol_index == editor.getEOLMode():
                        notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
                    notepad.menuCommand(eol_cmd_list[desired_eol_index])                # change to DESIRED   line-endings
        
        editor.research(false_EOL[editor.getEOLMode()],     # regex to search for
                        check_eol,                          # function to call if regex match
                        0,                                  # re flags
                        0,                                  # START of file
                        editor.getLength(),                 # END   of file
                        1)                                  # count ( at FIRST match )
        
        if check == True:
            notepad.messageBox('All EOLS correct','EOL check', 0)
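
The two menuCommand calls at the end of check_eol can be read as a small plan: when the requested format equals the current one, the script first switches to a different format and then to the desired one. Presumably this is because asking Notepad++ to convert to the format that is already active does nothing, but that is an assumption on my side and is not stated in the script. A minimal stand-alone sketch of the index arithmetic follows; the command names are plain strings (not the real MENUCOMMAND constants) and conversion_plan is a hypothetical helper:

```python
# Command names as they appear in the script above. Here they are plain
# strings, not the real MENUCOMMAND constants, so the sketch runs anywhere.
EOL_COMMANDS = ["FORMAT_TODOS", "FORMAT_TOMAC", "FORMAT_TOUNIX"]


def conversion_plan(current_mode, desired_mode):
    """Return the menu commands the script would issue, in order.

    Modes follow the script's numbering: 0 = CRLF, 1 = CR, 2 = LF.
    """
    if not 0 <= desired_mode <= 2:
        return []  # the script silently ignores out-of-range answers
    plan = []
    if desired_mode == current_mode:
        # Detour through the "next" format so the last command is a real change.
        plan.append(EOL_COMMANDS[(desired_mode + 1) % 3])
    plan.append(EOL_COMMANDS[desired_mode])
    return plan


if __name__ == "__main__":
    print(conversion_plan(0, 0))  # detour first, then back to CRLF
    print(conversion_plan(0, 2))  # a single conversion to LF
```

The modulo only picks "some other" format; any of the two remaining formats would serve for the detour.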
        

        Now, given this simple text:

        This
        is
        a
        little
        test
        to   
        try
        if
        OK
        
        • Windows (CR LF) shown in the status bar

        • Line 4 ending with CR

        • Line 6 ending with 3 spaces + LF

        • All the other lines ending with CRLF

        When I ran the script, it reported:

        "Different EOLS detected -- the first inconsistency is on line 6", although it should have reported line 4, the one ending with CR!


        Still searching for other oddities :-)

        Best Regards,

        guy038

        • Alan Kilborn @guy038

          @guy038 said:

          Different EOLS detected -- The first inconsistency is on line 6, although it should be on line 4 ending with CR !

          Well… that seems to be because $[^\r][^\n] (when searching from top of file) misses line 4 and matches the LF at the end of line 6 and the t at the start of line 7.

          The original regex of \r[^\n]|[^\r]\n seems to work better…
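
To see the original regex at work outside Notepad++, here is a minimal sketch with Python's re module. It uses only \r[^\n]|[^\r]\n, which contains no ^ or $, so Python and the Boost engine in Notepad++ should agree on it. The $[^\r][^\n] variant is deliberately left out, because Python's $ does not treat \r the way Boost's does. SAMPLE is my reconstruction of the nine-line test text described above, and first_mismatch_line is a hypothetical helper:

```python
import re

# Reconstruction of the test text: a Windows file where line 4 ends with a
# lone CR, line 6 ends with three spaces + a lone LF, the rest with CRLF.
SAMPLE = "This\r\nis\r\na\r\nlittle\rtest\r\nto   \ntry\r\nif\r\nOK"

# CR not followed by LF, or LF not preceded by CR.
# Both branches need a neighbouring character to exist.
MIXED_EOL = re.compile(r"\r[^\n]|[^\r]\n")


def first_mismatch_line(text):
    """Return the 1-based line of the first match, or None if nothing matches."""
    match = MIXED_EOL.search(text)
    if match is None:
        return None
    # Count the line endings that lie before the start of the match.
    endings_before = re.findall(r"\r\n|\r|\n", text[:match.start()])
    return len(endings_before) + 1


if __name__ == "__main__":
    print(first_mismatch_line(SAMPLE))  # 4: the lone CR after "little"
```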

          • Alan Kilborn

            I noticed that other odd things can happen.

            Example:

            I created a Unix (LF) file, put some lines in it, and then changed the ending of one line to CRLF:

            [screenshot not preserved]

            The status bar said:

            [screenshot not preserved]

            Running the script said:

            [screenshot not preserved]

            but it should have said line 3.

            Moving to the PS console window and checking the EOL mode, I discovered:

            [screenshot not preserved]

            So I seem to have found a case where something is out of sync: Notepad++’s status bar says LF for line-endings, but the Scintilla buffer says something different (CRLF).

            EDIT: I seem to have figured out why: The editorconfig plugin seems to be interfering. I have it set for CRLF for the file in question. However, I’d have thought that this plugin only does things when I save a file, and in the above I’ve not saved the data. Oh, well, (non)problem solved.

            • Alan Kilborn

              This time I’ve found a real bug in the script, and it is with the code I suggested:

              Buggy code:

                  line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
                  notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
              

              Better code:

                  line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
                  notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
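
The fix above changes two things at once: it takes the start of the match instead of the end, and it adds 1 because the line index is 0-based. A minimal sketch of why the buggy variant was sometimes right by accident follows. It assumes that editor.lineFromPosition returns a 0-based index and that a position between the CR and LF of a CRLF pair still counts as the same line; line_from_position and report are hypothetical stand-ins that run outside Notepad++:

```python
import re


def line_from_position(text, pos):
    """Hypothetical stand-in for editor.lineFromPosition: 0-based line index.

    A line ending only counts once pos is at or past its end, so a position
    between the CR and LF of a CRLF pair still belongs to the current line.
    """
    return sum(1 for m in re.finditer(r"\r\n|\r|\n", text) if m.end() <= pos)


def report(text, pattern):
    """Return (line shown by the buggy code, line shown by the better code)."""
    match = re.search(pattern, text)
    buggy = line_from_position(text, match.span(0)[1])        # end of match, no +1
    better = line_from_position(text, match.span(0)[0]) + 1   # start of match, 1-based
    return buggy, better


# Unix-style text whose third line ends with CRLF: the match is the lone "\r".
UNIX_TEXT = "one\ntwo\nthree\r\nfour\n"
# Windows-style text whose second line ends with a lone CR: the match is "\rt".
WIN_TEXT = "one\r\ntwo\rthree\r\n"

if __name__ == "__main__":
    print(report(UNIX_TEXT, "\r"))                 # (2, 3): buggy code is one line short
    print(report(WIN_TEXT, r"\r[^\n]|[^\r]\n"))    # (2, 2): the two errors cancel out
```

When the match runs past the line break, its end already sits on the next line, which hides the missing + 1; when the match stays on the line, the buggy code reports one line too few.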
              
              • guy038

                Hello, @ekopalypse, @alan-kilborn and All,

                Ah… OK, I see the problem! Now, Alan, if you try this script on files with more than 500,000 lines, the regex \r[^\n]|[^\r]\n returns an error, whereas the regex $[^\r][^\n] works correctly and displays the expected message All EOLS correct.


                Thus, I decided that this behaviour matters more than knowing which line holds the first mismatch! I then changed the script as below:

                check = True
                
                false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows   EOL )
                             1:'\n',           # Should be \r                             as editor.getEOLMode() = 1 ( Macintosh EOL )
                             2:'\r',           # Should be \n                             as editor.getEOLMode() = 2 ( Unix      EOL )
                            }
                
                def check_eol(match):
                    global check
                    check = False
                    user_input = notepad.prompt('Convert ALL line-endings of CURRENT file ( 0 for CRLF, 1 for CR, 2 for LF )',
                        'INCONSISTENT line-endings DETECTED !', editor.getEOLMode())
                    if user_input is not None:
                        desired_eol_index = int(user_input)
                        if 0 <= desired_eol_index <= 2:
                            eol_cmd_list = [
                                MENUCOMMAND.FORMAT_TODOS,
                                MENUCOMMAND.FORMAT_TOMAC,
                                MENUCOMMAND.FORMAT_TOUNIX,
                            ]
                            if desired_eol_index == editor.getEOLMode():
                                notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
                            notepad.menuCommand(eol_cmd_list[desired_eol_index])                # change to DESIRED   line-endings
                
                editor.research(false_EOL[editor.getEOLMode()],     # regex to search for
                                check_eol,                          # function to call if regex match
                                0,                                  # re flags
                                0,                                  # START of file
                                editor.getLength(),                 # END   of file
                                1)                                  # count ( at FIRST match )
                
                if check == True:
                    notepad.messageBox('All EOLS correct','EOL check', 0)
                

                Do note that this is only my own preference!
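
As a side note on the large-file concern: the check does not strictly need a regex. The following is a minimal regex-free sketch, not something proposed in this thread, that classifies every line ending in one linear pass over a string. It says nothing about why the regex engine fails on big files; count_eols and is_consistent are hypothetical helpers, and how the buffer text is obtained inside Notepad++ is left open:

```python
def count_eols(text):
    """Count CRLF, lone CR and lone LF line endings in a single pass."""
    counts = {"CRLF": 0, "CR": 0, "LF": 0}
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "\r":
            if i + 1 < n and text[i + 1] == "\n":
                counts["CRLF"] += 1
                i += 1  # skip the LF that belongs to this CRLF
            else:
                counts["CR"] += 1
        elif ch == "\n":
            counts["LF"] += 1
        i += 1
    return counts


def is_consistent(text):
    """True when at most one kind of line ending occurs in the text."""
    return sum(1 for value in count_eols(text).values() if value) <= 1


if __name__ == "__main__":
    sample = "This\r\nis\r\na\r\nlittle\rtest\r\nto   \ntry\r\nif\r\nOK"
    print(count_eols(sample))
    print(is_consistent(sample))
```

The counts also tell which format dominates, which could help when choosing the conversion target.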

                Best Regards,

                guy038

                P.S. :

                In the meantime, I saw that you’ve done a lot of testing! Thanks for your tests but, as you can see, I solved the problem definitively ;-))

                • Alan Kilborn @guy038

                  @guy038 said:

                  whereas the regex $[^\r][^\n] works correctly

                  Try it on a Windows (CR LF) file and this data:

                  [screenshot not preserved]

                  That regex doesn’t hit anything in that text.


                  I solved the problem definitively

                  Hmm. :-)

                  • guy038

                    Hi, @ekopalypse, @alan-kilborn and All,

                    I deeply apologize: my regex for finding all the wrong cases in a Windows file was itself buggy!

                    You were right about it, Alan. The correct regex is $\n|\r^, which leads to the line:

                    false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows   EOL )
                    

                    This time, the results are consistent, even for large files!
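
As I read it, $\n|\r^ means "an LF that is not the second half of a CRLF, or a CR that is not the first half of one". Python's ^ and $ treat only \n as a line break, so the same pattern cannot be pasted into Python's re module. The sketch below expresses that reading with lookarounds instead; this is my interpretation, not a statement about how Boost evaluates the anchors. It also shows two edge cases (start and end of the text) that the earlier \r[^\n]|[^\r]\n pattern cannot see because each of its branches needs a neighbouring character:

```python
import re

# Lone LF (not preceded by CR) or lone CR (not followed by LF).
LONE_EOL = re.compile(r"(?<!\r)\n|\r(?!\n)")

# The earlier pattern from this thread, for comparison.
NEIGHBOUR_EOL = re.compile(r"\r[^\n]|[^\r]\n")


def has_wrong_eol_for_windows(text):
    """True if a CRLF ('Windows') text contains any lone CR or lone LF."""
    return LONE_EOL.search(text) is not None


if __name__ == "__main__":
    for text in ("a\r\nb\r\n", "a\r\nb\n", "\nabc\r\n", "abc\r"):
        print(repr(text),
              has_wrong_eol_for_windows(text),
              NEIGHBOUR_EOL.search(text) is not None)
```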

                    BR

                    guy038

                    • guy038

                      Hello, @ekopalypse, @alan-kilborn and All,

                      I did some additional tests with your modifications, Alan:

                          line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
                          notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
                      

                      and my own:

                      false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows   EOL )
                      

                      And everything seems to work as expected!

                      So the final version of this script is:

                      check = True
                      
                      false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows   EOL )
                                   1:'\n',       # Find \n ( should be \r )                                 as editor.getEOLMode() = 1 ( Macintosh EOL )
                                   2:'\r',       # Find \r ( should be \n )                                 as editor.getEOLMode() = 2 ( Unix      EOL )
                                  }
                      
                      def check_eol(match):
                          global check
                          check = False
                          line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
                          notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
                          user_input = notepad.prompt('Convert ALL line-endings of CURRENT file ( 0 for CRLF, 1 for CR, 2 for LF )',
                              'INCONSISTENT line-endings DETECTED !', editor.getEOLMode())
                          if user_input is not None:
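                              # Treat non-numeric input as out of range instead of raising ValueError
                              desired_eol_index = int(user_input) if user_input.strip().isdigit() else -1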
                              if 0 <= desired_eol_index <= 2:
                                  eol_cmd_list = [
                                      MENUCOMMAND.FORMAT_TODOS,
                                      MENUCOMMAND.FORMAT_TOMAC,
                                      MENUCOMMAND.FORMAT_TOUNIX,
                                  ]
                                  if desired_eol_index == editor.getEOLMode():
                                      notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
                                  notepad.menuCommand(eol_cmd_list[desired_eol_index])                # change to DESIRED   line-endings
                      
                      editor.research(false_EOL[editor.getEOLMode()],     # regex to search for
                                      check_eol,                          # function to call if regex match
                                      0,                                  # re flags
                                      0,                                  # START of file
                                      editor.getLength(),                 # END   of file
                                      1)                                  # count ( at FIRST match )
                      
                      if check == True:
                          notepad.messageBox('All EOLS correct','EOL check', 0)
                      

                      To be precise, note that the first EOL inconsistency reported is always the first line whose line-ending character(s) differ from the status bar indication!
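
That rule can be written down independently of any regex engine. Below is a minimal sketch, assuming the same 0/1/2 numbering as editor.getEOLMode() in the script; EXPECTED_EOL and first_inconsistent_line are hypothetical names, and the function works on a plain string, not on the Notepad++ buffer:

```python
import re

# Same numbering as in the script: 0 = Windows, 1 = Macintosh, 2 = Unix.
EXPECTED_EOL = {0: "\r\n", 1: "\r", 2: "\n"}


def first_inconsistent_line(text, eol_mode):
    """Return the 1-based number of the first line whose ending differs from
    the expected one, or None when every line ending matches."""
    expected = EXPECTED_EOL[eol_mode]
    # Alternation order matters: CRLF must be tried before a lone CR.
    for line_no, m in enumerate(re.finditer(r"\r\n|\r|\n", text), start=1):
        if m.group() != expected:
            return line_no
    return None


if __name__ == "__main__":
    windows_sample = "This\r\nis\r\na\r\nlittle\rtest\r\nto   \ntry\r\nif\r\nOK"
    print(first_inconsistent_line(windows_sample, 0))               # 4
    print(first_inconsistent_line("one\ntwo\nthree\r\nfour\n", 2))  # 3
```

The same text can therefore yield a different "first inconsistency" depending on the mode it is checked against.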

                      Best Regards,

                      guy038

                      The Community of users of the Notepad++ text editor.