Replace X number of lines after finding Y
-
I have a ton of text files that are cluttered with headers where there is a page break/FF symbol. The header always contains 6 lines starting at the FF symbol. I’m not sure how to search for a vague number of lines and/or characters after the unique FF starts off the header. The header also ends with “Y.T.D”. followed by an empty line, if that helps. In a perfect world, I’d like to remove the “(Continued)” aka 7th line only if the word “Continued” appears (in green below).
Example (goal is to remove everything highlighted… for all of the page breaks/headers the file may contain):
Thanks to all of the folks that created and maintain this software. I really appreciate ya’ll and your community!
B
-
Hello, @brian-krontz, and All,
Easy with regular expressions ! The following regex S/R should work :
-
Open the Replace dialog (
Ctrl + H
)-
SEARCH
(?s-i)^\x0C.+?Y\.T\.D\.\h*\R\h*\R(?-s)(?:.+\(Continued\).*\R)?
-
REPLACE
Leave EMPTY
-
-
Tick the
Wrap around
option -
Select the
Regular expression
search mode -
CLick once on the
Replace All
button or several times on theReplace
button
Voila !
If it solves your problem, I’ll explain the regex syntax next time !
Best Regards,
guy038
-
-
depending on your real data this might work.
Find what:\x0C(.*\R){6}.*Y\.T\.D\r\n\r\n.*\(Continued\)|\x0C(.*\R){6}.*Y\.T\.D
leave replace with empty and check regular expression in search mode -
@Brian-Krontz
Just in case you were worried about removing more lines than you wanted I’d suggest using @guy038 regex, but with the “Mark” function and tick the bookmark line. Then you could “mark all” and go through the document/file to verify it was ONLY the lines you wanted were marked. Then once confirmed use the “Remove bookmarked Lines” from under Search (main menu), Bookmark.I too was working on an almost exact copy of @guy038 regex but he was faster to the keyboard.
Good luck
Terry -
You guys are amazing. I checked the results in Beyond Compare and everything looks perfect. Guy, it worked, so explain away if you’d like. Eko and Terry, I think he’s got me fixed up, but appreciate the extra set of eyes. Thanks to all!!
B
-
@Terry-R said in Replace X number of lines after finding Y:
Just in case you were worried about removing more lines than you wanted I’d suggest using @guy038 regex
Just curious, where do you see an issue that my regex
might remove more lines then expected? -
@Ekopalypse said in Replace X number of lines after finding Y:
where do you see an issue that my regex
My concern was for the OP. The example provided would seem to be accountancy stuff. VERY important data and I was just letting him know that IF HE WAS concerned this was a way to verify the correct lines were removed. As you will see, he had another method to prove that anyways (Beyond Compare).
Cheers
Terry -
Hello,@Brian-Krontz
Please try this information,To Replace X number of lines after finding Y
In the simplest calling of sed, it has one line of text in the pattern space, ie. 1 line of \n delimited text from the input. The single line in the pattern space has no \n… That’s why your regex is not finding anything.
You can read multiple lines into the pattern-space and manipulate things surprisingly well, but with a more than normal effort… Sed has a set of commands which allow this type of thing… Here is a link to a Command Summary for sed. It is the best one I’ve found, and got me rolling.
However forget the “one-liner” idea once you start using sed’s micro-commands. It is useful to lay it out like a structured program until you get the feel of it… It is surprisingly simple, and equally unusual. You could think of it as the “assembler language” of text editing.
Summary: Use sed for simple things, and maybe a bit more, but in general, when it gets beyond working with a single line, most people prefer something else…
I’ll let someone else suggest something else… I’m really not sure what the best choice would be (I’d use sed, but that’s because I don’t know perl well enough.)sed '/^a test$/{ $!{ N # append the next line when not on the last line s/^a test\nPlease do not$/not a test\nBe/ # now test for a successful substitution, otherwise #+ unpaired "a test" lines would be mis-handled t sub-yes # branch_on_substitute (goto label :sub-yes) :sub-not # a label (not essential; here to self document) # if no substituion, print only the first line P # pattern_first_line_print D # pattern_ltrunc(line+nl)_top/cycle :sub-yes # a label (the goto target of the 't' branch) # fall through to final auto-pattern_print (2 lines) } }' alpha.txt
I hope this information will be usefull for you.
Thank you. -
Why is it necessary to take this outside of Notepad++?
This is a Notepad++ forum and if a solution can be found using Notepad++ then no discussion of outside tools is needed or wanted.
Also, the question has been answered previously so there is no real need to seek further solutions. -
Hi, @brian-krontz, @ekopalypse, @terry-r and All,
For a better readability, if we use the free-spacing mode,
(?x)
modifier, my previous search regex can be rewritten as :(?x) (?s-i) ^\x0C .+? Y\.T\.D\. \h*\R \h*\R (?-s) (?: .+\(Continued\).* \R )?
Notes :
-
First the part
(?s-i)
are in-line modifiers :-
(?s)
means that the dot regex symbol (.
) represents any single character, even like-break chars like\r
or\n
(single-line
mode ) -
(?-i)
carries out the search in a case-sensitive manner (non-insensitive
mode )
-
-
Then, the part
^\x0C
searches for aFF
control character, at beginning of lines (^
) -
Now, the
.+?
syntax looks for the shortest non-null range of any character, even line-breaks, TILL… -
…the part
Y\.T\.D\.\h*\R
. That is to say …till the stringY.T.D.
, with that exact case, followed by possible horizontal blank char(s) (Tab
/Space
) and its line-break , as\R
match any kind of line-end char(s) -
And the
\h*\R
looks for a blank line ( the6th
), possibly preceded with horizontal blank character(s) -
Then, the
(?-s)
modifier means that, from now on, any regex.
symbol matches a single standard character only ( notEOL
ones ) -
Now, the part
.+\(Continued\).*\R
searches, in current line, for any string,(Continued)
, with this exact case, preceded with a non-null range of standard char(s) (.+
) and followed with other range of standard char(s), possibly null (.*
) and its line-break char(s) (\R
) -
As the above part is embedded in the structure
(?:......)?
, this means that all the7th
-line contents are embedded in a non-capturing group, which may be optional, due to the?
quantifier, at the end ( a shorthand of the formal syntax{0,1}
)
Remark : The dot
.
and parentheses()
chars are regex symbols. So, in order to be interpreted as literal characters, they need to be escaped with the\
symbolBest Regards,
guy038
-
-
Hello, @brian-krontz, @ekopalypse, @terry-r @alan-kilborn, @makwana-prahlad and All,
To @makwana-prahlad :
Why are you using such a complicated script, with some advanced options of
sed
?To my mind, the
one
-line script, below, does the job nicely, too ;-))sed -n "/^\x0C/,+5d ; /(Continued)/d ; p" Input.txt > Output.txt
Notes :
-
First,
sed
searches, in theInput.txt
file, for a range of lines :-
The first line of this range must contain the
FF
char, at beginning of current line, so the/^\x0C/
syntax -
Till the next
5 lines
, that is to say, till the6th
empty line. Thus, the,+5
syntax
-
-
Then, the
d
command deletes this range of entire lines -
Secondly,
sed
searches if the next7th
line contains the(Continued)
string, with that exact case, so the syntax/(Continued)/
, and, in that case, thed
command, again, deletes this current line -
Finally, if none of these criteria can be verified, the
p
command simply rewrites the current line in theOutput.txt
file
Indeed, not the same philosophy at usual N++ regexes, but rather easy to understand, too !
Cheers,
guy038
P.S. :
My
sed
script is not totally exact ! Indeed, if a line, located outside theheader
zone, contains the string(Continued)
, this line is wrongly deleted ! But, anyway, we’re on a Notepad++ forum, after all ;-)) -