    Removal of Blank Lines in a large number of files

    • John Fairweather

      It is possible to remove blank lines from a single file, but I wonder whether it is possible to do this in bulk (i.e. for a large number of files) in one go.

      • AdrianHHH

        The “Find in files” has a “Replace” facility. So you could try a regular expression replacement of something like (note that there is a space before the “\t”):
        \r\n[ \t\r\n]*\r\n
        with
        \r\n
        (But adjust the “\r\n” parts to reflect the line endings in your files if they are not Windows files.)

        The regular expression looks for one newline, then as many spaces, tabs and newlines as can be found, then one more newline. The replacement is a single newline.

        WARNING
        Be very careful with the “Replace” facility of “Find in files”. If you use the wrong search or replacement you can mess up a lot of files. Try with some unimportant files (or copies) first.
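
For anyone who wants to try the same replacement on a string before touching real files, here is a minimal Python sketch. It assumes Windows (\r\n) line endings, as in the post above; the name collapse_blank_lines is made up for illustration, and Python's re engine is not the one Notepad++ uses, although this particular pattern behaves the same way in both.

```python
import re

# The pattern from the post: one CRLF, then any run of spaces, tabs,
# CRs and LFs, then one more CRLF.
BLANK_RUN = re.compile(r"\r\n[ \t\r\n]*\r\n")


def collapse_blank_lines(text: str) -> str:
    """Replace each run of blank lines with a single Windows line ending."""
    return BLANK_RUN.sub("\r\n", text)


sample = "alpha\r\n\r\n  \t\r\nbeta\r\n"
print(repr(collapse_blank_lines(sample)))  # 'alpha\r\nbeta\r\n'
```

Lines that carry text are left alone, including any leading spaces they may have, because the pattern must end on a \r\n.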

        • guy038

          Hi John,

          Here is a very simple regex, ^\R+, which will delete any purely empty line, whatever its End of Line character(s).

          But above all, AdrianHHH’s warning is quite sensible. So I would suggest copying a couple of files into a new folder to test it first!

          Then :

          • Open the Find in Files dialog ( CTRL + SHIFT + F )

          • Type the regex ^\R+ in the Find what zone

          • Leave the Replace with zone EMPTY

          • Fill in the Filters and the Directory fields

          • Set the Regular expression search mode

          • Click on the Replace in Files button

          • Confirm the Are you sure dialog

          All at once, your test files won’t contain any pure empty line. Et voilà !
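
The same bulk clean-up can be sketched outside Notepad++ as well. The following Python snippet is only an illustration under stated assumptions: the function name strip_empty_lines and the "*.txt" filter are hypothetical, Python's re module has no \R so the EOL alternatives are spelled out, and the sketch only handles \r\n and \n files (old Mac files using a lone \r would need different handling). As with the dialog, try it on copies first.

```python
import re
import tempfile
from pathlib import Path

# Stand-in for ^\R+ on raw bytes. In Python, ^ with MULTILINE matches at the
# start of the data and right after each \n, so lone-\r files are not covered.
EMPTY_LINES = re.compile(rb"^(?:\r\n|\n|\r)+", re.MULTILINE)


def strip_empty_lines(folder: Path, pattern: str = "*.txt") -> int:
    """Remove purely empty lines from every matching file; return files changed."""
    changed = 0
    for path in sorted(folder.glob(pattern)):
        original = path.read_bytes()
        cleaned = EMPTY_LINES.sub(b"", original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed += 1
    return changed


# Demo on a throwaway folder, never on real data.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "demo.txt"
    demo.write_bytes(b"line 1\r\n\r\n\r\nline 2\r\n")
    changed = strip_empty_lines(Path(tmp))
    result = demo.read_bytes()

print(changed, result)  # 1 b'line 1\r\nline 2\r\n'
```

Working on bytes keeps the original line endings and encoding untouched, which is why the files are not opened in text mode.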


          Notes :

          • The interest of the \R syntax is that it matches any kind of EOL ( \r\n in a Windows file, \n in a Unix/OSX file and \r in an old Mac file ). Strictly speaking, \R = \r\n|[\n\v\f\r\x{85}\x{2028}\x{2029}], but in practice it is, most of the time, identical to \r\n|\n|\r ( the order of the alternatives is important! )

          • If, in addition, you would like to delete lines containing ONLY blank characters, use the search regex ^(\h*\R)+. Again, the Replace with zone stays empty. The syntax \h represents any horizontal blank character, that is to say, the Space character ( \x20 ), the Tabulation character ( \x09 ) or the No-Break Space character ( \xA0 )

          • If you would like to delete any surplus purely empty lines ( in other words, keeping ONLY ONE blank line as a paragraph separator ), just change the search regex to \R\R\K\R+. However, because of the \K form inside this regex, step-by-step replacement with the Replace button in the Replace dialog won’t work. Use ONLY the Replace All button!
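
The three notes above can be tried out in Python with a few substitutions, since Python's re module supports none of \R, \h or \K. This is a sketch under those assumptions, not a reproduction of the Notepad++ engine: EOL and HBLANK are hand-written stand-ins, the helper names are hypothetical, and ^ in Python only recognises line starts after \n.

```python
import re

# Practical stand-in for \R. The (?!\n) guard stops a lone \r from matching
# the first half of a \r\n pair when the engine backtracks.
EOL = r"(?:\r\n|\n|\r(?!\n))"
# The three characters the post lists for \h: space, tab, no-break space.
HBLANK = r"[ \t\xa0]"

# Counterpart of ^(\h*\R)+ : whole lines that are empty or hold only blanks.
BLANK_LINES = re.compile(rf"^(?:{HBLANK}*{EOL})+", re.MULTILINE)

# Counterpart of \R\R\K\R+ : without \K, the two EOLs to keep are captured
# and written back by the replacement.
SURPLUS_EOLS = re.compile(rf"({EOL}{EOL}){EOL}+")


def drop_blank_lines(text: str) -> str:
    return BLANK_LINES.sub("", text)


def keep_one_blank_line(text: str) -> str:
    return SURPLUS_EOLS.sub(r"\1", text)


# Why the order of the alternatives matters: \r\n must be tried first.
print(re.findall(r"\r\n|\n|\r", "a\r\nb"))  # ['\r\n']
print(re.findall(r"\r|\n|\r\n", "a\r\nb"))  # ['\r', '\n']

print(repr(drop_blank_lines("a\r\n \t\r\n\xa0\r\n\r\nb\r\n")))  # 'a\r\nb\r\n'
print(repr(keep_one_blank_line("p1\n\n\n\n\np2\n")))            # 'p1\n\np2\n'
```

The capture group plays the role of \K here: everything matched before the surplus line breaks is put back unchanged.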

          Best Regards,

          guy038

          • John Fairweather

            Thanks for all your replies.
            However, I should have mentioned that the files with blank lines are TXT files containing output data derived from a number of EXCEL spreadsheets of astronomical data, one TXT file per spreadsheet. Each TXT file contains 6003 lines (by default), with only the first ten lines (or so) containing any data, so 5999 (or so) blank lines have to be removed. The person who wrote the code assumed that the maximum output would be 6003 lines.

            • John Fairweather

              Forgot to say, the above solution solved my problem - Thanks.

              • Nguyễn Huy Hải @guy038

                Hi!
                The answer above from guy038 seems excellent and should work, but somehow I can’t make it work with my files.

                I’m running the latest version of Notepad++ (7.5.4). I have very limited knowledge of regex, so I’d really appreciate it if someone could point out what I might have done wrong.

                Thank you!

                • Scott Sumner @Nguyễn Huy Hải

                  @Nguyễn-Huy-Hải

                  Just a guess but maybe your lines are blank but not empty, the difference being that a blank line would contain only whitespace (spaces, tabs, …) and a truly empty line would contain, well, nothing but the line-ending. Without turning on whitespace visibility it would be difficult to see what you have.

                  Maybe try turning on this option: View (menu) -> Show Symbol -> Show White Space and TAB
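
The empty-versus-blank distinction can also be checked in a few lines of Python. This is just a sketch: classify_line is a hypothetical helper, and the set of "blank" characters is assumed to be space, tab and no-break space.

```python
def classify_line(line: str) -> str:
    """Label a line as 'empty', 'blank' (whitespace only) or 'text'."""
    body = line.rstrip("\r\n")        # drop the line ending itself
    if body == "":
        return "empty"
    if body.strip(" \t\xa0") == "":   # only spaces, tabs or no-break spaces
        return "blank"
    return "text"


sample = "data\r\n\r\n \t\r\nmore\r\n"
labels = [classify_line(raw) for raw in sample.splitlines(keepends=True)]
for raw, label in zip(sample.splitlines(keepends=True), labels):
    print(f"{label:5} {raw!r}")   # repr() makes the hidden characters visible
```

Printing the repr of each line does roughly what the Show White Space and TAB option does in the editor: it makes characters visible that would otherwise look like nothing.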

                  • Nguyễn Huy Hải @Scott Sumner

                    Hi @Scott-Sumner
                    Thanks for your reply!

                    I had my doubts, so I turned on Show White Space and TAB, but nothing showed up (see the picture below):
                    https://cdn.discordapp.com/attachments/311547963883388938/406663528154529823/unknown.png

                    I used the echo command to add the last line to the text file, but that command also generates another empty line after it. That’s why I need to remove it.

                    After a bit of googling, I found that [\n\r]+$ works. I’m happy, but still curious about the differences between these regexes.
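
One possible explanation for the difference, sketched in Python under assumptions: suppose the file ends with a line of text followed by a single line ending, which an editor shows as one extra empty last line. The sample text is hypothetical, and Python's re engine differs from the one in Notepad++, so treat this as an illustration of the idea rather than a diagnosis.

```python
import re

# Hypothetical file content: one line of text plus a single trailing EOL.
text = "last line\r\n"

# Stand-in for ^\R+ : it needs a line ending at the very START of a line.
# Here the only line ending sits at the END of a non-empty line, so no match.
start_anchored = re.search(r"^(?:\r\n|\n|\r)+", text, re.MULTILINE)
print(start_anchored)  # None

# [\n\r]+$ : line-ending characters sitting right before the end of the text.
end_anchored = re.sub(r"[\n\r]+$", "", text)
print(repr(end_anchored))  # 'last line'
```

In other words, the first regex removes line endings that make up whole empty lines, while the second one trims line-ending characters at the end, including the one that terminates the last line of text.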
