Removal of Blank Lines in a large number of files



  • I know it is possible to remove blank lines from a single file, but I wonder if it is possible to do this in bulk (i.e. across a large number of files) in one go.



  • The “Find in files” dialog has a “Replace” facility, so you could try a regular expression replacement of something like this (note that there is a space before the “\t”):
    \r\n[ \t\r\n]*\r\n
    with
    \r\n
    (But adjust the “\r\n” parts to reflect the line endings in your files if they are not Windows files.)

    The regular expression looks for one newline, then as many spaces, tabs and newlines as can be found, then one more newline. The replacement is a single newline.
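    As a rough illustration outside Notepad++, the same search and replacement can be tried with Python’s standard `re` module, which accepts this particular pattern unchanged. The helper name `collapse_blank_lines` and the sample text are made up for the demonstration; the sketch assumes Windows (CRLF) line endings, as the pattern itself does.

```python
import re

# AdrianHHH's pattern: one CRLF, then any run of spaces, tabs, CRs and LFs,
# then one more CRLF. The whole run is replaced by a single CRLF.
PATTERN = r"\r\n[ \t\r\n]*\r\n"

def collapse_blank_lines(text: str) -> str:
    # Hypothetical helper: removes empty and whitespace-only lines
    # that sit between two CRLF-terminated lines.
    return re.sub(PATTERN, "\r\n", text)

sample = "first\r\n\r\n  \r\n\t\r\nsecond\r\nthird\r\n"
print(repr(collapse_blank_lines(sample)))  # 'first\r\nsecond\r\nthird\r\n'
```

    Note that one blank line at the very top of a file survives, because the pattern needs a line ending before the run it removes.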

    WARNING
    Be very careful with the “Replace” facility of “Find in files”. If you use the wrong search or replacement, you can mess up a lot of files. Try it on some unimportant files (or copies) first.



  • Hi John,

    Here is a very simple regex which will delete any pure empty line, whatever its end-of-line character(s):
    ^\R+

    But above all, AdrianHHH’s warning is very sensible. So I would suggest copying a couple of files into a new folder and testing it there first!

    Then:

    • Open the Find in Files dialog (Ctrl + Shift + F)

    • Type the regex ^\R+ in the Find what field

    • Leave the Replace with field EMPTY

    • Fill in the Filters and Directory fields

    • Set the search mode to Regular expression

    • Click the Replace in Files button

    • Confirm the Are you sure? dialog

    In one go, your test files will no longer contain any pure empty lines. Et voilà!
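    For anyone who would rather script the bulk replacement, here is a minimal Python sketch of what the Find in Files steps do. Python’s `re` has no `\R`, so it is approximated by `(?:\r\n|\n)`; under that assumption old-Mac (lone `\r`) files are not handled. The function name `remove_empty_lines`, the folder name `test_copies` and the `*.txt` filter are hypothetical stand-ins for the Directory and Filters fields. Following the advice above, it writes a `.bak` copy of each file before changing it.

```python
import re
import shutil
from pathlib import Path

# Rough stand-in for Notepad++'s ^\R+ : only CRLF and LF line endings
# are recognised here (Python's re has no \R).
EMPTY_LINES = re.compile(rb"^(?:\r\n|\n)+", re.MULTILINE)

def remove_empty_lines(folder: str, pattern: str = "*.txt") -> int:
    """Delete pure empty lines from every matching file; return files changed."""
    changed = 0
    for path in sorted(Path(folder).glob(pattern)):
        if not path.is_file():
            continue
        data = path.read_bytes()  # work on bytes so the original EOLs stay intact
        cleaned = EMPTY_LINES.sub(b"", data)
        if cleaned != data:
            # Keep a backup next to the file before overwriting it.
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            path.write_bytes(cleaned)
            changed += 1
    return changed

if __name__ == "__main__":
    print(remove_empty_lines("test_copies"))  # hypothetical folder of copies
```

    Working on bytes rather than decoded text avoids guessing the files’ encoding, but assumes an ASCII-compatible one (not UTF-16).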


    Notes:

    • The advantage of the \R syntax is that it matches any kind of EOL (\r\n in a Windows file, \n in a Unix/OS X file and \r in an old Mac file). Strictly speaking, \R = \r\n|[\n\v\f\r\x{2028}\x{2029}], but in practice it is, most of the time, identical to \r\n|\n|\r (the order of the alternatives is important!).

    • If, in addition, you would like to delete lines containing ONLY blank characters, use the search regex ^(\h*\R)+. Again, the Replace with field stays empty. The syntax \h represents any horizontal blank character, that is, the space character (\x20), the tab character (\x09) or the no-break space character (\xA0).

    • If you would like to delete only the surplus empty lines (in other words, keep ONLY ONE empty line as a paragraph separator), just change the search regex to \R\R\K\R+. However, because of the \K in this regex, step-by-step replacement with the Replace button of the Replace dialog won’t work. Use ONLY the Replace All button!
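    Python’s standard `re` module supports none of `\R`, `\h` or `\K` (Notepad++ uses the Boost regex engine), but the three notes above can be approximated by spelling them out, which also makes the point about alternative order visible. This is a sketch under stated assumptions, not Notepad++’s exact behaviour: only CRLF and LF line endings are handled, `\h` is limited to the three characters listed, and the function names are made up for illustration. For `\K`, the usual workaround is to capture what `\K` would keep and put it back in the replacement.

```python
import re

# Assumption: only CRLF and LF line endings occur (no lone CR).
EOL = r"(?:\r\n|\n)"   # practical stand-in for \R, CRLF tried first
HSPACE = r"[ \t\xa0]"  # the three \h characters listed above

# ~ ^\R+ : delete pure empty lines.
def remove_empty_lines(text: str) -> str:
    return re.sub(rf"^{EOL}+", "", text, flags=re.MULTILINE)

# ~ ^(\h*\R)+ : also delete lines holding only spaces, tabs or NBSPs.
def remove_blank_lines(text: str) -> str:
    return re.sub(rf"^(?:{HSPACE}*{EOL})+", "", text, flags=re.MULTILINE)

# ~ \R\R\K\R+ : group 1 holds the two EOLs that \K would keep.
def keep_single_empty_line(text: str) -> str:
    return re.sub(rf"({EOL}{EOL}){EOL}+", r"\1", text)

sample = "alpha\n\n  \n\n\nbeta\n"
print(repr(remove_empty_lines(sample)))               # 'alpha\n  \nbeta\n'
print(repr(remove_blank_lines(sample)))               # 'alpha\nbeta\n'
print(repr(keep_single_empty_line("a\n\n\n\nb\n")))  # 'a\n\nb\n'

# Order of alternatives: with single characters first, CRLF splits in two.
print(re.findall(r"\r\n|\n|\r", "a\r\nb"))  # ['\r\n']
print(re.findall(r"\n|\r|\r\n", "a\r\nb"))  # ['\r', '\n']
```

    Alternation takes the first branch that matches, which is why \r\n has to be listed before the single characters.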

    Best Regards,

    guy038



  • Thanks for all your replies.
    However, I should have mentioned that these files with blank lines are TXT files containing output data derived from a number of Excel spreadsheets of astronomical data, one TXT file per spreadsheet. Each TXT file contains 6003 lines (by default), with only the first ten lines (or so) containing any data, so about 5993 blank lines have to be removed. The person who wrote the code assumed that the maximum output would be 6003 lines.



  • Forgot to say: the above solution solved my problem. Thanks!



  • Hi!
    The answer above from guy038 seems excellent and should work, but somehow I can’t make it work with my files.

    I’m running the latest version of Notepad++ (7.5.4). I have very limited knowledge of regex, so I’d really appreciate it if someone could point out where I might have gone wrong.

    Thank you!



  • @Nguyễn-Huy-Hải

    Just a guess, but maybe your lines are blank but not empty: a blank line contains only whitespace (spaces, tabs, …), whereas a truly empty line contains nothing but the line ending. Without turning on whitespace visibility, it would be difficult to see which kind you have.

    Maybe try turning on this option: View (menu) -> Show Symbol -> Show White Space and TAB
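    If you would rather check from a script than with Show Symbol, a small Python sketch like the following labels each line as empty, blank (whitespace only) or text. The helper name `classify_lines` and the sample are hypothetical. It relies on `str.splitlines()`, which recognises CRLF, LF and lone CR line endings, and on `str.strip()`, which treats tabs and no-break spaces as whitespace.

```python
from __future__ import annotations

def classify_lines(text: str) -> list[tuple[int, str]]:
    """Label each line (1-based) as 'empty', 'blank' or 'text'."""
    result = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line == "":
            kind = "empty"   # nothing but the line ending
        elif line.strip() == "":
            kind = "blank"   # only whitespace before the line ending
        else:
            kind = "text"
        result.append((number, kind))
    return result

sample = "data\r\n   \r\n\r\n\tend\r\n"
print(classify_lines(sample))
# [(1, 'text'), (2, 'blank'), (3, 'empty'), (4, 'text')]
```

    Lines reported as blank are the ones ^\R+ leaves alone and ^(\h*\R)+ removes.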



  • Hi @Scott-Sumner
    Thanks for your reply!

    I had my doubts, so I turned on Show White Space and TAB, but nothing showed up (see the picture below):
    https://cdn.discordapp.com/attachments/311547963883388938/406663528154529823/unknown.png

    I used the echo command to add the last line to the text file, but that command also adds another empty line after it. That’s why I need to remove it.

    After a bit of googling, I found that [\n\r]+$ works. I’m happy, but still curious about the differences between these regexes.

