Removing Blank Lines From All Pages



  • I have a 1,400-page program-generated text document. Each page is in the same format, with the only system characters being ends of lines and page breaks. The text on each page is in the same location and is preceded by 12 blank lines. I need a way to delete four of those blank lines from the top of each page.

    Thank you in advance for your help!



  • @Gregory-Heffner

    By using a regular expression with the multiline modifier turned off, it should be possible to use the following regex:

    (?-m)^(\R{4})(?=\R{8})
    

    Using (?-m) means we do not want multiline mode active, which means that
    ^ is used as the start-of-file anchor
    \R is any EOL sequence, like \r\n or \r or \n
    {4} repeats the previous item 4 times
    (?=\R{8}) requires that 8 more EOLs follow

    The Replace with field needs to be empty.
    Try it on a single file first before running it on all 1400 files (a backup could be helpful as well).
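
    For anyone who wants to check the logic outside the editor, here is a minimal Python sketch of the same idea. It is not the Notepad++ regex itself: Python's re module has no \R, so the sketch assumes line endings have already been normalized to "\n" (which Python's text mode does when reading a file), and it uses \A as the start-of-text anchor. The function name is only illustrative.

```python
import re

# \A anchors at the very start of the text, like ^ with (?-m) in the post.
# Assumption: line endings are already "\n", so \n stands in for \R.
START_OF_FILE = re.compile(r"\A\n{4}(?=\n{8})")

def trim_leading_blank_lines(text: str) -> str:
    """Remove 4 leading line breaks, only if 8 more follow them."""
    return START_OF_FILE.sub("", text, count=1)

sample = "\n" * 12 + "First line of text\n"
result = trim_leading_blank_lines(sample)
print(len(sample) - len(result))  # 4 characters removed
```

    A text that starts with fewer than 12 line breaks is returned unchanged, because the look-ahead for 8 more line breaks fails.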

    Cheers
    Claudia



  • Hello, @gregory-heffner and @claudia-frank,

    Claudia, your regex may be simplified : You don’t need to store the four line-breaks in group 1 with parentheses, as your replacement part is empty !

    So your regex becomes :

    (?-m)^\R{4}(?=\R{8})
    

    However, there’s a bug with that regex when a second block of 12 blank lines exists further down in the current file and the Wrap around option is set. Indeed, if you manually place the caret at the beginning of that second block, the search also selects the first four lines of that second block, before going back to the first four lines at the very beginning of the file :-((

    I also tried the regex below, which uses the \A zero-length assertion, also standing for the beginning of the file, with no more success ! As said in previous posts, these backward assertions are really not very well managed by our BOOST regex engine !!

    \A\R{4}(?=\R{8})
    

    Of course, as Gregory will certainly use the Find in Files dialog, where the search starts at the very beginning of each scanned file, this normally doesn’t matter :-))


    But, in conclusion, Gregory, if you’re quite sure that these 12 consecutive blank lines occur only once in each file, simply use the following regex :

    SEARCH : ^\R{4}(?=\R{8})

    REPLACE : Leave EMPTY

    Notes :

    • The \R syntax represents any single line break ( \r\n in Windows files, \n in Unix files and \r in Mac files )

    • So this regex looks for 4 complete blank lines, including the first one, due to the ^ symbol, which usually stands for a beginning of line.

    • With the condition ( because of the look-ahead feature (?=\R{8}) ) that it is followed by a range of 8 complete blank lines !

    Remark : During tests, just check the Wrap around option !
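
    Since the original question is about the top of every page, here is a hedged Python sketch of how the same look-ahead could be applied per page. It rests on assumptions that are not confirmed in this thread: that the page break is a form feed character (\f), that the 12 blank lines come immediately after it, and that line endings are already "\n". The names TOP_OF_PAGE and trim_each_page are only illustrative.

```python
import re

# Assumption: a page starts at the start of the text (\A) or right after
# a form feed (\f), which is taken here to be the page break character.
# Assumption: line endings are already "\n", so \n stands in for \R.
TOP_OF_PAGE = re.compile(r"(?:\A|(?<=\f))\n{4}(?=\n{8})")

def trim_each_page(text: str) -> str:
    """Remove 4 of the 12 line breaks at the top of every page."""
    return TOP_OF_PAGE.sub("", text)

page = "\n" * 12 + "Some text\n"
document = page + "\f" + page
trimmed = trim_each_page(document)
print(len(document) - len(trimmed))  # 8: four characters removed per page
```

    A block of 12 line breaks that does not sit at the top of a page is left alone, because neither \A nor the look-behind for \f matches there.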

    Best regards,

    guy038

