
    Mixed Line Ending Detection. Possible?

    Help wanted · Tags: line ending, eol
    12 Posts 4 Posters 6.1k Views
    • Scott Sumner

      @SalviaSage

      So ideally you’d have some code watching to see whether a “bad” line ending ever shows up in your editor tab’s buffer. A line ending is “bad” when:

      • Your file is Windows (CR LF) (see status bar) and you get either a LF line-ending or a CR line-ending (somehow) in its editing buffer
      • Your file is UNIX (LF) and you get either a CR LF line-ending or a CR line-ending in its editing buffer
      • Your file is MAC (CR) and you get either a CR LF line-ending or a LF line-ending in its editing buffer
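The three rules above come down to one check: find every line ending in the buffer and compare it with the one implied by the file’s format. Here is a minimal pure-Python sketch of that check, assuming the buffer’s text is available as an ordinary string; `find_bad_eols` and `EOL_BY_FORMAT` are hypothetical names, not part of Notepad++ or PythonScript.

```python
import re

# One alternative per line-ending type; "\r\n" must come first so a CR LF
# pair is matched as one ending, not as a lone CR followed by a lone LF.
EOL_PATTERN = re.compile(r"\r\n|\r|\n")

# The three formats shown in the Notepad++ status bar (hypothetical keys).
EOL_BY_FORMAT = {"Windows": "\r\n", "Unix": "\n", "Mac": "\r"}


def find_bad_eols(text, file_format):
    """Return (offset, eol) for every line ending that doesn't match file_format."""
    expected = EOL_BY_FORMAT[file_format]
    return [
        (m.start(), m.group())
        for m in EOL_PATTERN.finditer(text)
        if m.group() != expected
    ]
```

For example, `find_bad_eols("a\r\nb\nc\rd", "Windows")` reports the bare LF at offset 4 and the bare CR at offset 6, while the CR LF at offset 1 is accepted.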

      However, one of the problems with doing this, at least in one of the scripting languages, is what’s known as the “big file problem”. As long as your files are relatively small, scanning them often (for example, after every change that could put a wrong line ending in) runs fairly quickly, but with large files the scan time becomes noticeable as a sluggish user interface.

      So then you make compromises. You can scan a file for mismatches when it is first opened. You can scan it at save time. You can come up with some complex algorithm to scan pieces of it at different times. Your choice.
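The “scan pieces of it at different times” compromise could look something like the generator below: each `next()` call scans one chunk, so the work can be spread over idle moments instead of freezing the UI. This is a sketch under assumptions: the chunk size is arbitrary, and how the generator gets driven (a timer, an idle callback) is left to the host script.

```python
import re

# "\r\n" first so a CR LF pair is one match, not two.
EOL_PATTERN = re.compile(r"\r\n|\r|\n")


def scan_in_chunks(text, expected, chunk_size=65536):
    """Yield one list of (offset, eol) mismatches per chunk of text.

    Pulling one chunk per idle tick keeps a large buffer from blocking
    the user interface for the whole scan.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        # Don't split a CR LF pair across chunks: if this chunk ends on a CR
        # that is followed by an LF, pull the LF into this chunk.
        if text[end - 1] == "\r" and text[end:end + 1] == "\n":
            end += 1
        # With pos/endpos, match offsets are still relative to the whole
        # string, so they need no adjustment.
        yield [
            (m.start(), m.group())
            for m in EOL_PATTERN.finditer(text, start, end)
            if m.group() != expected
        ]
        start = end
```

With a chunk size of 3, `list(scan_in_chunks("ab\r\ncd\nef", "\r\n", 3))` gives `[[], [(6, "\n")], []]`: the CR LF straddling the first chunk boundary is kept whole, and only the bare LF is reported.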

      That being said, normally it is a difficult task to get a wrong line-ending in your editor buffer. Scintilla normally takes care of keeping the line-endings consistent and correct. For example, if you copy lines out of a UNIX file buffer and into a Windows file buffer, Scintilla will convert the line endings from LF to CRLF during the paste. So…is it a really big issue after all?
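Scintilla exposes this behavior as its paste-conversion option (`SCI_SETPASTECONVERTENDINGS`, on by default), which converts the line endings of pasted text to the document’s EOL mode. The effect can be imitated in a few lines of Python; this only illustrates what the conversion does, it is not Scintilla’s implementation, and `convert_pasted_text` is a hypothetical name.

```python
import re

# Any single line ending: CR LF, lone CR or lone LF.
EOL_PATTERN = re.compile(r"\r\n|\r|\n")


def convert_pasted_text(clipboard_text, document_eol):
    """Return clipboard_text with every line ending replaced by document_eol."""
    # A function replacement inserts document_eol literally, with no
    # escape processing of the replacement string.
    return EOL_PATTERN.sub(lambda m: document_eol, clipboard_text)
```

Pasting `"one\ntwo"` copied from a UNIX tab into a Windows tab would, in effect, insert `convert_pasted_text("one\ntwo", "\r\n")`, which is `"one\r\ntwo"`.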

      • SalviaSage

        Yes, that is true; it is not a big issue, since bad line endings can easily be fixed with Notepad++’s EOL conversion anyway, by clicking the status bar.

        If it is that hard to determine that there are bad line endings, then we might as well not bother with it at all!

        • Scott Sumner

          @SalviaSage

          If it is that hard to determine that there are bad line endings, then we might as well not bother with it at all!

          It’s not necessarily hard, there just isn’t some magic that makes it especially easy.

          I actually do this line-ending mismatch detection for myself, using pretty much the same technique used in this thread. I monitor what is happening in the currently viewable area, figuring that this is the likeliest place for a mismatched line ending to get inserted. If a mismatch is detected, the code turns on visible line endings (which I normally have turned off) to really hit me with the problem.
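The visible-area approach might be sketched as below. It is not Scott’s code, which isn’t shown in the thread. It assumes PythonScript’s `editor` object, whose methods mirror Scintilla messages (`getEOLMode`, `getLineCount`, `getFirstVisibleLine`, `docLineFromVisible`, `linesOnScreen`, `positionFromLine`, `getTextRange`, `setViewEOL`), and it assumes the EOL mode converts to Scintilla’s integer codes. `check_visible_area` is a hypothetical name.

```python
import re

# "\r\n" first so a CR LF pair is one match, not two.
EOL_PATTERN = re.compile(r"\r\n|\r|\n")
# Scintilla's EOL modes: SC_EOL_CRLF = 0, SC_EOL_CR = 1, SC_EOL_LF = 2.
EOL_BY_MODE = {0: "\r\n", 1: "\r", 2: "\n"}


def check_visible_area(editor):
    """Show line endings if the on-screen lines contain a mismatched one.

    `editor` is assumed to be PythonScript's editor object, or any object
    with the same Scintilla-style methods. Returns True on a mismatch.
    """
    expected = EOL_BY_MODE[int(editor.getEOLMode())]
    line_count = editor.getLineCount()
    first_line = editor.docLineFromVisible(editor.getFirstVisibleLine())
    # The start of the first line below the screen is just past the last
    # visible line's ending; per the Scintilla docs, positionFromLine of
    # line_count is the end of the document.
    stop_line = min(first_line + editor.linesOnScreen(), line_count)
    text = editor.getTextRange(editor.positionFromLine(first_line),
                               editor.positionFromLine(stop_line))
    mismatch = any(m.group() != expected for m in EOL_PATTERN.finditer(text))
    if mismatch:
        editor.setViewEOL(True)  # make the problem impossible to miss
    return mismatch
```

To run it as the view changes, it could be registered for Scintilla’s UPDATEUI notification, e.g. `editor.callback(lambda args: check_visible_area(editor), [SCINTILLANOTIFICATION.UPDATEUI])`. That notification fires often, which is why only the on-screen lines are scanned. With word wrap on, `linesOnScreen()` counts display lines, so the sketch may scan a few extra document lines, which is harmless.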

          Perfect? No. Really easy? No. Really hard? No. Acceptable? Yes.

          By the way, here is a good way to get Unix line endings into your Windows files: copy some multiline text from the PythonScript console window, then use the Clipboard History panel to paste it into a Windows editor tab! BOOM! LF-only line endings in your Windows buffer. Note that this series of events bypasses Scintilla’s watchful paste protection.

          • SalviaSage

            Oh, umm… You do know I will be asking for that end-of-line detection code… right?

            xD xD

            Too bad, you shouldn’t have been such a nice open-source guy.

            hehehehehehe…

            • guy038

              Hi, @salviasage, @scott-sumner and All,

              Once more, with a simple regex, it’s not that hard to find out if bad line-endings exist in a specific file ;-)

              • Open the Find/Replace dialog and select the Regular expression search mode

              • Click the Count button to get the number of occurrences of the regex below that matches the final line-ending format you want for that file

              • To get rid of all wrong line endings, also fill in the Replace with: field and click the Replace All button


              So, whatever the current line endings of the file, if you want to end up with a:

              • WINDOWS file, with line-endings = CRLF  =>    SEARCH =  \r(?!\n)|(?<!\r)\n    REPLACE =  \r\n
              
              • UNIX    file, with line-endings = LF    =>    SEARCH =  \r\n?                 REPLACE =  \n
              
              • MAC     file, with line-ending  = CR    =>    SEARCH =  \r?\n                 REPLACE =  \r
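The same three search/replace pairs also work with Python’s `re` module, which can be handy for checking files outside Notepad++. A sketch, assuming Python’s regex flavor treats these patterns the same way as Notepad++’s Boost-based engine (it should for these simple lookarounds); `FIXES` and `count_and_fix` are hypothetical names. Each pattern matches only endings that are wrong for its target, so the match count is the number of wrong line endings, as with the Count button.

```python
import re

# guy038's search/replace pairs, keyed by the wanted final format.
FIXES = {
    "Windows": (r"\r(?!\n)|(?<!\r)\n", "\r\n"),  # lone CR or lone LF
    "Unix": (r"\r\n?", "\n"),                    # CR LF or lone CR
    "Mac": (r"\r?\n", "\r"),                     # CR LF or lone LF
}


def count_and_fix(text, target):
    """Return (number of wrong line endings, text converted to target)."""
    search, replace = FIXES[target]
    fixed, count = re.subn(search, replace, text)
    return count, fixed
```

For example, `count_and_fix("a\r\nb\nc\rd", "Windows")` returns `(2, "a\r\nb\r\nc\r\nd")`: the bare LF and the bare CR are fixed and the existing CR LF is left alone.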
              

              Best Regards,

              guy038

              • Scott Sumner

                @SalviaSage

                That code is wrapped up in something “bigger” that I have, and it’s not worth it to me to extract it. In general, I share code here in two cases:

                1. I already have the code done to do a certain specific thing when people post asking how it can be accomplished
                2. Somebody posts an idea (that I hadn’t previously thought of) that can be of use to me; thus I write the code that implements it and share it

                An example of case #2 is your request for this. I hadn’t thought of that previously, but when you brought it up, I thought “I can benefit from having that”…and now that it’s done I like it even more than I thought I would.

                So…yea, if you wanna work on coding it yourself, I can give you hints if you get stuck, but with the framework from that other code a lot of the “hard” part is done already.

                @guy038

                Hi Guy, yeah, this seems to come up a lot, but people seem to want something that runs automagically, and unfortunately running a regular-expression replacement doesn’t fit that scheme (hmmm, maybe a regex replacement macro that is run at save time? Can NppExec do that? Dunno…). However, take heart: the regexes from your posting above do appear in the code for this. Here’s a sample:

                editor.research(r'\r(?!\n)|(?<!\r)\n', bad_line_ending_callback, 0, start_pos, end_pos)
                

                :-D
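For readers wondering what surrounds that line: a callback-based scan with `editor.research` could be wrapped as below. Only the `editor.research` call pattern comes from the post; the wrapper and the callback body are hypothetical, and the sketch assumes PythonScript passes the callback a match object whose `span()` is a document position.

```python
# Scott's search pattern for a Windows (CR LF) file: a CR not followed
# by LF, or an LF not preceded by CR.
BAD_FOR_WINDOWS = r'\r(?!\n)|(?<!\r)\n'


def find_bad_line_endings(editor, start_pos, end_pos):
    """Return (start, end) spans of wrong line endings in [start_pos, end_pos)."""
    spans = []

    def bad_line_ending_callback(m):
        # editor.research calls this once per match; m.span() is assumed
        # to be the match's position in the document.
        spans.append(m.span())

    editor.research(BAD_FOR_WINDOWS, bad_line_ending_callback, 0, start_pos, end_pos)
    return spans
```

Passing `start_pos`/`end_pos` for just the visible lines keeps each scan cheap, which sidesteps most of the big-file problem.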

                • Claudia Frank

                  If it is all about knowing that a file uses mixed eols only,
                  then my ¢¢ (CiYrm)

                  Behind the curtain you must see
                  The dark side never the strength in number it reveal
                  Don’t hesitate to break out of the loop you must do
                  if wasting time is not good for you
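Read as a hint, the last two lines suggest that when all you need is a yes/no answer (“does this file mix EOLs?”), there is no reason to count every ending: you can stop at the first one that differs. That reading is an interpretation, and `has_mixed_eols` is a hypothetical name.

```python
import re

# "\r\n" first so a CR LF pair is one match, not two.
EOL_PATTERN = re.compile(r"\r\n|\r|\n")


def has_mixed_eols(text):
    """Return True as soon as a second kind of line ending is seen."""
    first = None
    for m in EOL_PATTERN.finditer(text):
        if first is None:
            first = m.group()  # the first ending sets the reference type
        elif m.group() != first:
            return True  # break out of the loop: no need to scan the rest
    return False
```

Because `finditer` is lazy, a file whose second line already differs is answered after two matches, however large it is.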

                  Cheers
                  Claudia

                  • SalviaSage

                    @Scott-Sumner

                    Dear Scott, thanks for your input.

                    • Scott Sumner

                      @Scott-Sumner said:

                      normally it is a difficult task to get a wrong line-ending in your editor buffer

                      I just thought I’d point out that with an errant regular-expression replacement, it is actually quite easy to get wrong line endings into your buffer. For example, if your replacement expression involves line-ending characters, make sure you get them right for your file type, as shown above. Unlike when pasting, Scintilla does nothing to protect you from messing up your files in this case. Absolute power (regex) corrupts absolutely! :-) Or maybe I should say that it can!
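One way to follow that advice is to build the inserted line break from the file’s own EOL instead of hard-coding `\n`. The replacement below (a line break after each `; `) is a made-up example, and `split_at_semicolons` is a hypothetical name.

```python
import re

EOL_BY_FORMAT = {"Windows": "\r\n", "Unix": "\n", "Mac": "\r"}


def split_at_semicolons(text, file_format):
    """Put a line break after every '; ' using the file's own line ending."""
    eol = EOL_BY_FORMAT[file_format]
    # A function replacement inserts eol literally, with no escape processing.
    return re.sub(r"; ", lambda m: ";" + eol, text)
```

Hard-coding `";\n"` as the replacement in a Windows file would instead leave bare LF endings behind, which is exactly the kind of mismatch discussed in this thread.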

                      • SalviaSage

                        Oh, so when I paste text into Scintilla, it automatically converts whatever EOLs the pasted text has to the document’s EOL setting (which I can see in the status bar).

                        I didn’t know that, but I guess it makes sense, and it protects me from mixed line endings.
                        I like that.

                        I still would like a script to be able to highlight my mixed EOLs though (just the same way I highlight my line final whitespace), or at the least turn on the EOL displays like how your script has. (which you aren’t pasting here = ( )

                        • Scott Sumner

                          @Scott-Sumner said:

                          normally it is a difficult task to get a wrong line-ending in your editor buffer

                          Apparently I shouldn’t have made that statement, as I keep proving it wrong…

                          I normally work with Windows-formatted files with \r\n line endings. I just noticed that pressing Ctrl+M (which happened when my fingers got tangled up) inserts a Mac-format line ending \r into my Windows files. Blech!

                          :-(

                          The Community of users of the Notepad++ text editor.