Replace X number of lines after finding Y

Brian Krontz

I have a ton of text files that are cluttered with headers wherever there is a page break/FF symbol. The header always contains 6 lines, starting at the FF symbol. I’m not sure how to search for a variable number of lines and/or characters after the unique FF that starts the header. The header also ends with “Y.T.D.” followed by an empty line, if that helps. In a perfect world, I’d also like to remove the “(Continued)” line, aka the 7th line, but only if the word “Continued” appears (in green below).

Example (goal is to remove everything highlighted… for all of the page breaks/headers the file may contain):

Thanks to all of the folks that created and maintain this software. I really appreciate ya’ll and your community!

B

guy038

Hello, @brian-krontz, and All,

Easy with regular expressions ! The following regex S/R should work :

Open the Replace dialog ( Ctrl + H )
- SEARCH (?s-i)^\x0C.+?Y\.T\.D\.\h*\R\h*\R(?-s)(?:.+\(Continued\).*\R)?
- REPLACE Leave EMPTY
Tick the Wrap around option
Select the Regular expression search mode
Click once on the Replace All button or several times on the Replace button

Voila !

If it solves your problem, I’ll explain the regex syntax next time !
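For anyone who wants to try the same search outside the editor, here is a minimal Python sketch of the regex above. It is an approximation, not the Notepad++ engine: Python's re module has no \h or \R, so they are spelled out as [ \t] and an explicit line-break alternation, and the name strip_headers and the sample text are made up for illustration.

```python
import re

# Assumed stand-ins for the Boost shorthands used in Notepad++:
BLANKS = r"[ \t]*"            # roughly \h* (only space and tab here)
EOL = r"(?:\r\n|\n|\r)"       # roughly \R

HEADER = re.compile(
    r"^\x0C(?s:.+?)Y\.T\.D\." + BLANKS + EOL + BLANKS + EOL
    + r"(?:.+\(Continued\).*" + EOL + r")?",
    re.MULTILINE,             # lets ^ match at the start of every line
)

def strip_headers(text: str) -> str:
    """Remove each FF header and, if present, the "(Continued)" line after it."""
    return HEADER.sub("", text)

# Hypothetical sample: one data row, a 6-line header, a "(Continued)" line, one data row.
sample = (
    "row 1\n"
    "\x0cREPORT TITLE\nPage 2\nAccount Summary\n\nCol A   Y.T.D.\n\n"
    "Smith (Continued)\n"
    "row 2\n"
)
print(repr(strip_headers(sample)))
```

The search is case-sensitive by default in Python, which corresponds to the (?-i) part of the original.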

Best Regards,

guy038

Ekopalypse

@Brian-Krontz

depending on your real data this might work.
Find what: \x0C(.*\R){6}.*Y\.T\.D\r\n\r\n.*\(Continued\)|\x0C(.*\R){6}.*Y\.T\.D
Leave Replace with empty and select Regular expression as the search mode.

Terry R

@Brian-Krontz
Just in case you were worried about removing more lines than you wanted, I’d suggest using @guy038’s regex, but with the “Mark” function and the “Bookmark line” option ticked. Then you could “Mark All” and go through the document/file to verify that ONLY the lines you wanted were marked. Once confirmed, use “Remove Bookmarked Lines” from Search (main menu), Bookmark.
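The same "look before you delete" idea can be sketched in Python as a dry run that only reports which lines a match would cover. This is an illustration under assumptions: the pattern is a Python approximation of @guy038's regex ([ \t] for \h, an explicit alternation for \R), the line counting assumes LF or CRLF line endings, and preview_matches and the sample text are hypothetical names and data.

```python
import re

EOL = r"(?:\r\n|\n|\r)"
HEADER = re.compile(
    r"^\x0C(?s:.+?)Y\.T\.D\.[ \t]*" + EOL + r"[ \t]*" + EOL
    + r"(?:.+\(Continued\).*" + EOL + r")?",
    re.MULTILINE,
)

def preview_matches(text: str) -> list[tuple[int, int]]:
    """Return (first_line, last_line), 1-based, for each block that would be removed."""
    spans = []
    for m in HEADER.finditer(text):
        first = text.count("\n", 0, m.start()) + 1
        # Each match ends right after a line break, so the number of
        # newlines before m.end() is the last line the match covers.
        last = text.count("\n", 0, m.end())
        spans.append((first, last))
    return spans

# Hypothetical sample: the header occupies lines 2-7, "(Continued)" is line 8.
sample = (
    "row 1\n"
    "\x0cTITLE\nh2\nh3\n\nAmount Y.T.D.\n\n"
    "Smith (Continued)\n"
    "row 2\n"
)
for first, last in preview_matches(sample):
    print(f"would remove lines {first}-{last}")
```

Nothing is modified here; the reported ranges can be checked by eye before running the actual replacement.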

I too was working on an almost exact copy of @guy038 regex but he was faster to the keyboard.

Good luck
Terry

Brian Krontz

You guys are amazing. I checked the results in Beyond Compare and everything looks perfect. Guy, it worked, so explain away if you’d like. Eko and Terry, I think he’s got me fixed up, but appreciate the extra set of eyes. Thanks to all!!

B

Ekopalypse

@Terry-R said in Replace X number of lines after finding Y:

Just in case you were worried about removing more lines than you wanted I’d suggest using @guy038 regex

Just curious, where do you see an issue that my regex
might remove more lines than expected?

Terry R

@Ekopalypse said in Replace X number of lines after finding Y:

where do you see an issue that my regex

My concern was for the OP. The example provided would seem to be accountancy stuff. VERY important data and I was just letting him know that IF HE WAS concerned this was a way to verify the correct lines were removed. As you will see, he had another method to prove that anyways (Beyond Compare).

Cheers
Terry

Makwana Prahlad

Hello,@Brian-Krontz

Please try this information to replace X number of lines after finding Y.

In the simplest calling of sed, it has one line of text in the pattern space, ie. 1 line of \n delimited text from the input. The single line in the pattern space has no \n… That’s why your regex is not finding anything.

You can read multiple lines into the pattern-space and manipulate things surprisingly well, but with a more than normal effort… Sed has a set of commands which allow this type of thing… Here is a link to a Command Summary for sed. It is the best one I’ve found, and got me rolling.

However forget the “one-liner” idea once you start using sed’s micro-commands. It is useful to lay it out like a structured program until you get the feel of it… It is surprisingly simple, and equally unusual. You could think of it as the “assembler language” of text editing.

Summary: Use sed for simple things, and maybe a bit more, but in general, when it gets beyond working with a single line, most people prefer something else…
I’ll let someone else suggest something else… I’m really not sure what the best choice would be (I’d use sed, but that’s because I don’t know perl well enough.)

sed '/^a test$/{
       $!{ N        # append the next line when not on the last line
         s/^a test\nPlease do not$/not a test\nBe/
                    # now test for a successful substitution, otherwise
                    #+  unpaired "a test" lines would be mis-handled
         t sub-yes  # branch_on_substitute (goto label :sub-yes)
         :sub-not   # a label (not essential; here to self document)
                    # if no substitution, print only the first line
         P          # pattern_first_line_print
         D          # pattern_ltrunc(line+nl)_top/cycle
         :sub-yes   # a label (the goto target of the 't' branch)
                    # fall through to final auto-pattern_print (2 lines)
       }    
     }' alpha.txt

I hope this information will be useful for you.
Thank you.

Alan Kilborn

@Makwana-Prahlad

Why is it necessary to take this outside of Notepad++?
This is a Notepad++ forum and if a solution can be found using Notepad++ then no discussion of outside tools is needed or wanted.
Also, the question has been answered previously so there is no real need to seek further solutions.

guy038

Hi, @brian-krontz, @ekopalypse, @terry-r and All,

For better readability, if we use the free-spacing mode, (?x) modifier, my previous search regex can be rewritten as :

(?x)   (?s-i)   ^\x0C   .+?   Y\.T\.D\.   \h*\R   \h*\R   (?-s)   (?:   .+\(Continued\).*   \R   )?

Notes :

First the part (?s-i) are in-line modifiers :
- (?s) means that the dot regex symbol ( . ) represents any single character, even line-break chars like \r or \n ( single-line mode )
- (?-i) carries out the search in a case-sensitive manner ( non-insensitive mode )
Then, the part ^\x0C searches for a FF control character, at beginning of lines ( ^ )
Now, the .+? syntax looks for the shortest non-null range of any character, even line-breaks, TILL…
…the part Y\.T\.D\.\h*\R. That is to say …till the string Y.T.D., with that exact case, followed by possible horizontal blank char(s) ( Tab / Space ) and its line-break , as \R matches any kind of line-end char(s)
And the \h*\R looks for a blank line ( the 6th ), possibly preceded with horizontal blank character(s)
Then, the (?-s) modifier means that, from now on, any regex . symbol matches a single standard character only ( not EOL ones )
Now, the part .+\(Continued\).*\R searches, in current line, for any string, (Continued), with this exact case, preceded with a non-null range of standard char(s) ( .+ ) and followed with other range of standard char(s), possibly null ( .* ) and its line-break char(s) ( \R )
As the above part is embedded in the structure (?:......)?, this means that all the 7th-line contents are embedded in a non-capturing group, which may be optional, due to the ? quantifier, at the end ( a shorthand of the formal syntax {0,1} )

Remark : The dot . and parentheses () chars are regex symbols. So, in order to be interpreted as literal characters, they need to be escaped with the \ symbol
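The notes above can be mirrored, piece by piece, in Python's own free-spacing mode ( re.VERBOSE ). This is only a sketch of the same idea in a different engine: [ \t] stands in for \h, the alternation (?:\r\n|\n|\r) stands in for \R, and the scoped group (?s:...) replaces the (?s) ... (?-s) pair. The sample strings are invented for the demonstration.

```python
import re

HEADER = re.compile(
    r"""
    ^\x0C                      # a form feed at the beginning of a line
    (?s:.+?)                   # shortest non-empty run of any chars, line breaks included
    Y\.T\.D\.                  # the literal string Y.T.D. (dots escaped)
    [ \t]*(?:\r\n|\n|\r)       # optional trailing blanks, then the line break
    [ \t]*(?:\r\n|\n|\r)       # the blank 6th line
    (?:                        # non-capturing group for the 7th line
        .+\(Continued\).*      #   some text, then (Continued), then optional text
        (?:\r\n|\n|\r)         #   and its line break
    )?                         # the whole 7th line is optional
    """,
    re.MULTILINE | re.VERBOSE,
)

# The search is case-sensitive, so a lower-case "(continued)" line is kept.
lower = "\x0cT\nh2\nh3\n\nAmt Y.T.D.\n\nSmith (continued)\n"
print(repr(HEADER.sub("", lower)))

# Windows (CRLF) line endings are handled by the same pattern.
crlf = "\x0cT\r\nh2\r\nh3\r\n\r\nAmt Y.T.D.\r\n\r\nSmith (Continued)\r\nrow\r\n"
print(repr(HEADER.sub("", crlf)))
```

In verbose mode, whitespace inside a character class such as [ \t] is still significant, which is why the blank and tab survive there.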

Best Regards,

guy038

guy038

Hello, @brian-krontz, @ekopalypse, @terry-r @alan-kilborn, @makwana-prahlad and All,

To @makwana-prahlad :

Why are you using such a complicated script, with some advanced options of sed ?

To my mind, the one-line script, below, does the job nicely, too ;-))

sed -n "/^\x0C/,+5d ; /(Continued)/d ; p" Input.txt > Output.txt

Notes :

First, sed searches, in the Input.txt file, for a range of lines :
- The first line of this range must contain the FF char, at beginning of current line, so the /^\x0C/ syntax
- Till the next 5 lines, that is to say, till the 6th empty line. Thus, the ,+5 syntax
Then, the d command deletes this range of entire lines
Secondly, sed searches if the next 7th line contains the (Continued) string, with that exact case, so the syntax /(Continued)/ , and, in that case, the d command, again, deletes this current line
Finally, if none of these criteria can be verified, the p command simply rewrites the current line in the Output.txt file

Indeed, not the same philosophy as the usual N++ regexes, but rather easy to understand, too !

Cheers,

guy038

P.S. :

My sed script is not totally exact ! Indeed, if a line, located outside the header zone, contains the string (Continued), this line is wrongly deleted ! But, anyway, we’re on a Notepad++ forum, after all ;-))
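One possible way around that limitation, sketched here in Python rather than sed, is to test for (Continued) only on the line immediately after the six header lines. This is an illustration, not a replacement for the Notepad++ solution; the function name and the sample lines are hypothetical, and, unlike the sed range, a FF found inside a header restarts the count.

```python
def strip_headers_by_line(lines):
    """Drop 6 lines starting at each FF line, plus the 7th only if it has (Continued)."""
    kept = []
    to_skip = 0          # header lines still to drop after the FF line
    check_next = False   # True only for the line right after a header
    for line in lines:
        if line.startswith("\x0c"):
            to_skip = 5  # this line plus the next 5 form the header
            check_next = False
            continue
        if to_skip > 0:
            to_skip -= 1
            if to_skip == 0:
                check_next = True
            continue
        if check_next:
            check_next = False
            if "(Continued)" in line:
                continue  # the optional 7th line
        kept.append(line)
    return kept

# Hypothetical sample: the second "(Continued)" line is ordinary data and must survive.
sample = [
    "row 1",
    "\x0cTITLE", "h2", "h3", "h4", "Amount Y.T.D.", "",
    "Smith (Continued)",
    "Jones (Continued) carried forward",
    "row 2",
]
print(strip_headers_by_line(sample))
```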