
    Search for inconsistent line endings with a regex? (part 2)

    • guy038

      Hello, @alan-kilborn,

      I’ll study your latest solution on Monday the 18th (again, I’m away on a three-day ski trip 😉).

      Best Regards,

      guy038

      • guy038

        Hello, @ekopalypse, @alan-kilborn, and All,

        As you proposed, @alan-kilborn, the enhanced script becomes:

        check = True
        
        false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows   EOL )
                     1:'\n',           # Should be \r                             as editor.getEOLMode() = 1 ( Macintosh EOL )
                     2:'\r',           # Should be \n                             as editor.getEOLMode() = 2 ( Unix      EOL )
                    }
        
        def check_eol(match):
            global check
            check = False
            line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
            notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
            user_input = notepad.prompt('Convert all line-endings in file?\r\nIf so, enter 0 for CRLF, 1 for CR, 2 for LF',
                'INCONSISTENT LINE-ENDINGS DETECTED!', editor.getEOLMode())
            if user_input is not None:
                desired_eol_index = int(user_input)
                if 0 <= desired_eol_index <= 2:
                    eol_cmd_list = [
                        MENUCOMMAND.FORMAT_TODOS,
                        MENUCOMMAND.FORMAT_TOMAC,
                        MENUCOMMAND.FORMAT_TOUNIX,
                    ]
                    if desired_eol_index == editor.getEOLMode():
                        notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
                    notepad.menuCommand(eol_cmd_list[desired_eol_index])                # change to DESIRED   line-endings
        
        editor.research(false_EOL[editor.getEOLMode()],     # regex to search for
                        check_eol,                          # function to call if regex match
                        0,                                  # re flags
                        0,                                  # START of file
                        editor.getLength(),                 # END   of file
                        1)                                  # count ( at FIRST match )
        
        if check == True:
            notepad.messageBox('All EOLS correct','EOL check', 0)
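        The conversion branch above first switches to a different EOL mode when the desired mode equals the current one, presumably because Notepad++ does nothing when asked to convert a file to the mode it is already in. For illustration only, the effect of a full conversion can be sketched in plain Python, with no Notepad++ API involved (`EOLS` and `convert_eols` are hypothetical names):

```python
import re

# EOL strings indexed like editor.getEOLMode(): 0 = CRLF, 1 = CR, 2 = LF.
EOLS = {0: "\r\n", 1: "\r", 2: "\n"}


def convert_eols(text, desired_mode):
    """Replace every CRLF, lone CR and lone LF with the EOL of desired_mode."""
    # CRLF is tried first, so each pair is replaced as a single line break.
    return re.sub(r"\r\n|\r|\n", EOLS[desired_mode], text)


mixed = "a\r\nb\rc\nd"
print(repr(convert_eols(mixed, 0)))  # 'a\r\nb\r\nc\r\nd'
print(repr(convert_eols(mixed, 2)))  # 'a\nb\nc\nd'
```

        Listing `\r\n` before the single characters is what keeps a CRLF pair from being turned into two line breaks.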
        

        Now, consider this simple text:

        This
        is
        a
        little
        test
        to   
        try
        if
        OK
        
        • Windows (CR LF) shown in the status bar

        • Line 4 ending with CR

        • Line 6 ending with 3 spaces + LF

        • All the other lines ending with CRLF
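        For reference, the expected answer for this test can be checked outside Notepad++ with a small plain-Python sketch. It assumes the file ends with a CRLF after "OK" and contains no other line separators (`str.splitlines` also splits on characters such as form feed); `line_endings` is a hypothetical helper:

```python
# guy038's test data, assuming a CRLF after "OK": line 4 ends with a lone CR,
# line 6 ends with three spaces and a lone LF, all other lines with CRLF.
text = ("This\r\nis\r\na\r\nlittle\rtest\r\n"
        "to   \ntry\r\nif\r\nOK\r\n")


def line_endings(text):
    """Return the line-ending string of each line ('' if the last line has none)."""
    endings = []
    for line in text.splitlines(keepends=True):
        body = line.rstrip("\r\n")  # strips only CR/LF, keeps the trailing spaces
        endings.append(line[len(body):])
    return endings


mismatches = [n for n, eol in enumerate(line_endings(text), start=1) if eol != "\r\n"]
print(mismatches)  # [4, 6]
```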

        When I ran the script, it said:

        Different EOLS detected -- the first inconsistency is on line 6, although it should be on line 4, which ends with CR!


        Still searching for other oddities :-)

        Best Regards,

        guy038

        • Alan Kilborn @guy038

          @guy038 said:

          Different EOLS detected – The first inconsistency is on line 6, although it should be on line 4 ending with CR !

          Well… that seems to be because $[^\r][^\n] (when searching from the top of the file) misses line 4 and matches the LF at the end of line 6 plus the t at the start of line 7.

          The original regex of \r[^\n]|[^\r]\n seems to work better…
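          Python's `re` module accepts the same syntax, so the original pattern can be tried on guy038's test data in a plain-Python sketch. The data is retyped by hand (assuming a CRLF after "OK"), and `line_of` is a hypothetical helper that counts CRLF, CR and LF as one break each:

```python
import re

text = ("This\r\nis\r\na\r\nlittle\rtest\r\n"
        "to   \ntry\r\nif\r\nOK\r\n")

# Alan's original pattern: a CR followed by a non-LF, or a non-CR followed by an LF.
ORIGINAL = re.compile(r"\r[^\n]|[^\r]\n")


def line_of(text, pos):
    """1-based line containing pos; CRLF, CR and LF each count as one break."""
    return 1 + sum(1 for m in re.finditer(r"\r\n|\r|\n", text) if m.end() <= pos)


lines = []
for m in ORIGINAL.finditer(text):
    # Report the line of the CR or LF itself, not of the neighbouring
    # character that the pattern also consumes.
    pos = m.start() if m.group().startswith("\r") else m.end() - 1
    lines.append(line_of(text, pos))
print(lines)  # [4, 6]
```

          Both mismatches are found, and the first one is on line 4, as expected.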

          • Alan Kilborn

            I noticed that other odd things can happen.

            Example:

            I created a Unix (LF) file, put some lines in it, and then changed the ending of one of the lines to CRLF:

            [Screenshot: the test file, with one line ending changed to CRLF]

            The status bar said:

            [Screenshot: the status bar, showing Unix (LF)]

            Running the script said:

            [Screenshot: the script's message]

            but it should have said line 3.

            Switching to the PythonScript (PS) console window and checking the EOL mode, I discovered:

            [Screenshot: the PythonScript console, showing the EOL mode as CRLF]

            So I seem to have found a case where things are out of sync: Notepad++'s status bar says LF for line-endings, but the Scintilla buffer says CRLF.

            EDIT: I think I’ve figured out why: the EditorConfig plugin seems to be interfering. I have it set to CRLF for the file in question. I’d have thought that this plugin only acts when a file is saved, and I hadn’t saved the data in the test above. Oh well, (non)problem solved.

            • Alan Kilborn

              This time I’ve found a real bug in the script, and it is with the code I suggested:

              Buggy code:

                  line_of_first_mismatch = editor.lineFromPosition(match.span(0)[1])
                  notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch),'EOL Mismatch', 0)
              

              Better code:

                  line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
                  notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
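              The difference matters because the end of a match can land on either side of a line break. In the plain-Python sketch below (hypothetical data and helper), the Unix-mode pattern `\r` matches the CR of a CRLF pair on line 3. The position right after that CR still lies on line 3 (0-based 2), so using the end of the match without `+ 1` reports line 2, while the start of the match plus one reports line 3. `line_from_position` is an assumed stand-in for Scintilla's `lineFromPosition`, which returns 0-based line numbers:

```python
import re


def line_from_position(text, pos):
    """0-based line containing pos, mimicking lineFromPosition: a CRLF pair is
    one break, so a position between its CR and LF stays on the same line."""
    return sum(1 for m in re.finditer(r"\r\n|\r|\n", text) if m.end() <= pos)


# Hypothetical Unix (LF) text whose third line ends in CRLF.
text = "one\ntwo\nthree\r\nfour\n"
match = re.search(r"\r", text)  # the Unix-mode check looks for any CR

buggy = line_from_position(text, match.span(0)[1])      # end of match, no + 1
fixed = line_from_position(text, match.span(0)[0]) + 1  # start of match, 1-based
print(buggy, fixed)  # 2 3
```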
              
              • guy038

                Hello, @ekopalypse, @alan-kilborn and All,

                Ah… OK, I see the problem! However, Alan, if you try this script on files with more than 500,000 lines, the regex \r[^\n]|[^\r]\n returns an error, whereas the regex $[^\r][^\n] works correctly and displays the expected message "All EOLS correct".


                So I decided that this behaviour matters more than knowing which line is the first mismatch, and I changed the script as below:

                check = True
                
                false_EOL = {0:'$[^\r][^\n]',  # Miss the TWO chars \r\n at 'end of line' as editor.getEOLMode() = 0 ( Windows   EOL )
                             1:'\n',           # Should be \r                             as editor.getEOLMode() = 1 ( Macintosh EOL )
                             2:'\r',           # Should be \n                             as editor.getEOLMode() = 2 ( Unix      EOL )
                            }
                
                def check_eol(match):
                    global check
                    check = False
                    user_input = notepad.prompt('Convert ALL line-endings of CURRENT file ( 0 for CRLF, 1 for CR, 2 for LF )',
                        'INCONSISTENT line-endings DETECTED !', editor.getEOLMode())
                    if user_input is not None:
                        desired_eol_index = int(user_input)
                        if 0 <= desired_eol_index <= 2:
                            eol_cmd_list = [
                                MENUCOMMAND.FORMAT_TODOS,
                                MENUCOMMAND.FORMAT_TOMAC,
                                MENUCOMMAND.FORMAT_TOUNIX,
                            ]
                            if desired_eol_index == editor.getEOLMode():
                                notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
                            notepad.menuCommand(eol_cmd_list[desired_eol_index])                # change to DESIRED   line-endings
                
                editor.research(false_EOL[editor.getEOLMode()],     # regex to search for
                                check_eol,                          # function to call if regex match
                                0,                                  # re flags
                                0,                                  # START of file
                                editor.getLength(),                 # END   of file
                                1)                                  # count ( at FIRST match )
                
                if check == True:
                    notepad.messageBox('All EOLS correct','EOL check', 0)
                

                Note that this is only my own preference!

                Best Regards,

                guy038

                P.S.:

                In the meantime, I saw that you’ve done a lot of testing! Thanks for your tests but, as you can see, I solved the problem definitively ;-))

                • Alan Kilborn @guy038

                  @guy038 said:

                  whereas the regex $[^\r][^\n] works correctly

                  Try it on a Windows (CR LF) file and this data:

                  [Screenshot: the test data]

                  That regex doesn’t hit anything in that text.


                  I solved the problem definitively

                  Hmm. :-)

                  • guy038

                    Hi, @ekopalypse, @alan-kilborn and All,

                    I deeply apologize: my regex for finding all the wrong cases in a Windows file was itself buggy!

                    You were right, Alan. The correct regex is $\n|\r^, which gives this line:

                    false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows   EOL )
                    

                    This time, the results are consistent, even for large files!
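                    Python's `re` module does not treat CRLF as a single break for `^` and `$` (in `MULTILINE` mode they only look at LF), so a plain-Python version of `$\n|\r^` needs lookarounds instead. The sketch below assumes the intent is "any LF not preceded by CR, or any CR not followed by LF"; `LONE_EOL` and `lone_eol_positions` are hypothetical names. It also shows that the older pattern `\r[^\n]|[^\r]\n`, which needs a character on one side of the CR or LF, misses a lone LF at the very start of the text and a lone CR at the very end:

```python
import re

# Hypothetical port of $\n|\r^: an LF not preceded by CR, or a CR not followed by LF.
LONE_EOL = re.compile(r"(?<!\r)\n|\r(?!\n)")
# The older pattern, which consumes a neighbouring character.
ORIGINAL = re.compile(r"\r[^\n]|[^\r]\n")


def lone_eol_positions(pattern, text):
    """Start offsets of all matches of pattern in text."""
    return [m.start() for m in pattern.finditer(text)]


edge_case = "\nabc\r\ndef\r"  # lone LF first, lone CR last, CRLF in between
print(lone_eol_positions(LONE_EOL, edge_case))          # [0, 9]
print(lone_eol_positions(ORIGINAL, edge_case))          # []
print(lone_eol_positions(LONE_EOL, "abc\r\ndef\r\n"))   # []
```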

                    BR

                    guy038

                    • guy038

                      Hello, @ekopalypse, @alan-kilborn and All,

                      I did some additional tests with your modifications, Alan:

                          line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
                          notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
                      

                      and with my own:

                      false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows   EOL )
                      

                      Everything seems to work as expected!

                      So the final version of the script is:

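                      from Npp import editor, notepad, MENUCOMMAND  # normally already available in PythonScript scripts; listed for clarity
                      
                      check = True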
                      
                      false_EOL = {0:'$\n|\r^',  # Find \n AFTER end of line OR \r BEFORE beginning of line as editor.getEOLMode() = 0 ( Windows   EOL )
                                   1:'\n',       # Find \n ( should be \r )                                 as editor.getEOLMode() = 1 ( Macintosh EOL )
                                   2:'\r',       # Find \r ( should be \n )                                 as editor.getEOLMode() = 2 ( Unix      EOL )
                                  }
                      
                      def check_eol(match):
                          global check
                          check = False
                          line_of_first_mismatch = editor.lineFromPosition(match.span(0)[0])
                          notepad.messageBox('Different EOLS detected -- the first inconsistency is on line ' + str(line_of_first_mismatch + 1),'EOL Mismatch', 0)
                          user_input = notepad.prompt('Convert ALL line-endings of CURRENT file ( 0 for CRLF, 1 for CR, 2 for LF )',
                              'INCONSISTENT line-endings DETECTED !', editor.getEOLMode())
                          if user_input is not None:
                              desired_eol_index = int(user_input) if user_input.strip().isdigit() else -1  # ignore non-numeric input instead of raising ValueError
                              if 0 <= desired_eol_index <= 2:
                                  eol_cmd_list = [
                                      MENUCOMMAND.FORMAT_TODOS,
                                      MENUCOMMAND.FORMAT_TOMAC,
                                      MENUCOMMAND.FORMAT_TOUNIX,
                                  ]
                                  if desired_eol_index == editor.getEOLMode():
                                      notepad.menuCommand(eol_cmd_list[(desired_eol_index + 1) % 3])  # change to UNDESIRED line-endings
                                  notepad.menuCommand(eol_cmd_list[desired_eol_index])                # change to DESIRED   line-endings
                      
                      editor.research(false_EOL[editor.getEOLMode()],     # regex to search for
                                      check_eol,                          # function to call if regex match
                                      0,                                  # re flags
                                      0,                                  # START of file
                                      editor.getLength(),                 # END   of file
                                      1)                                  # count ( at FIRST match )
                      
                      if check == True:
                          notepad.messageBox('All EOLS correct','EOL check', 0)
                      

                      To be precise: the reported first EOL inconsistency is always the first line whose line-ending character(s) differ from the mode shown in the status bar!
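                      Put differently, the whole check reduces to: pick a pattern by EOL mode, find its first match, and convert the match start into a 1-based line number. A plain-Python sketch of that logic follows; lookarounds stand in for the Boost-specific `$\n|\r^`, and `FALSE_EOL`, `LINE_BREAK` and `first_inconsistent_line` are hypothetical names:

```python
import re

# Patterns indexed like editor.getEOLMode(): 0 = CRLF, 1 = CR, 2 = LF.
FALSE_EOL = {
    0: re.compile(r"(?<!\r)\n|\r(?!\n)"),  # lone LF or lone CR
    1: re.compile(r"\n"),                   # any LF, alone or in CRLF
    2: re.compile(r"\r"),                   # any CR, alone or in CRLF
}
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def first_inconsistent_line(text, eol_mode):
    """1-based number of the first line whose EOL differs from eol_mode, or None."""
    match = FALSE_EOL[eol_mode].search(text)
    if match is None:
        return None
    start = match.start()
    # Like lineFromPosition(start) + 1: count breaks that end at or before start.
    return 1 + sum(1 for m in LINE_BREAK.finditer(text) if m.end() <= start)


print(first_inconsistent_line("This\r\nis\r\na\r\nlittle\rtest\r\nto   \n", 0))  # 4
print(first_inconsistent_line("one\ntwo\nthree\r\nfour\n", 2))                  # 3
print(first_inconsistent_line("a\r\nb\r\n", 0))                                # None
```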

                      Best Regards,

                      guy038

                      The Community of users of the Notepad++ text editor.