    Replace X number of lines after finding Y

    • Brian Krontz

      I have a ton of text files that are cluttered with headers wherever there is a page break/FF symbol. The header always contains 6 lines starting at the FF symbol. I’m not sure how to search for a vague number of lines and/or characters after the unique FF that starts off the header. The header also ends with “Y.T.D.” followed by an empty line, if that helps. In a perfect world, I’d also like to remove the “(Continued)” line (the 7th line), but only if the word “Continued” appears (in green below).

      Example (goal is to remove everything highlighted… for all of the page breaks/headers the file may contain):

      [Image: screenshot of an example header with the lines to remove highlighted]

      Thanks to all of the folks that created and maintain this software. I really appreciate ya’ll and your community!

      B

      • guy038

        Hello, @brian-krontz, and All,

        Easy with regular expressions ! The following regex S/R should work :

        • Open the Replace dialog ( Ctrl + H )

          • SEARCH (?s-i)^\x0C.+?Y\.T\.D\.\h*\R\h*\R(?-s)(?:.+\(Continued\).*\R)?

          • REPLACE Leave EMPTY

        • Tick the Wrap around option

        • Select the Regular expression search mode

        • Click once on the Replace All button or several times on the Replace button

        Voila !

        If it solves your problem, I’ll explain the regex syntax next time !
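For anyone who wants to run the same replacement outside Notepad++, a rough Python equivalent is sketched below. It is an approximation, not guy038's own method: Python's re module has no \h or \R escapes and does not accept (?-s) in the middle of a pattern, so [ \t], (?:\r\n|\n|\r) and a scoped (?s:...) group stand in for them. The strip_headers name and the sample text are made up for illustration.

```python
import re

# Stand-ins for Notepad++ escapes that Python's re module does not provide.
H = r"[ \t]"              # horizontal blank (Tab / Space)
R = r"(?:\r\n|\n|\r)"     # any kind of line break

HEADER = re.compile(
    r"^\x0C(?s:.+?)Y\.T\.D\." + H + "*" + R + H + "*" + R
    + r"(?:[^\r\n]+\(Continued\)[^\r\n]*" + R + ")?",
    re.MULTILINE,  # lets ^ match at the beginning of every line
)


def strip_headers(text: str) -> str:
    """Remove every FF header block, plus a directly following (Continued) line."""
    return HEADER.sub("", text)


# Hypothetical sample; the real report layout is only known from the screenshot.
sample = (
    "first page line\n"
    "\x0cACME CORP            PAGE 2\n"
    "GENERAL LEDGER\n"
    "PERIOD 03\n"
    "\n"
    "ACCOUNT     CURRENT     Y.T.D.\n"
    "\n"
    "  1000 Cash (Continued)\n"
    "second page line\n"
)

print(repr(strip_headers(sample)))
```

When reading real files for this, opening them with newline="" keeps the original line endings intact so the pattern sees the same bytes Notepad++ would.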

        Best Regards,

        guy038

        • Ekopalypse @Brian Krontz

          @Brian-Krontz

          Depending on your real data, this might work.
          Find what: \x0C(.*\R){6}.*Y\.T\.D\r\n\r\n.*\(Continued\)|\x0C(.*\R){6}.*Y\.T\.D
          Leave Replace with empty and select Regular expression as the search mode.

          • Terry R

            @Brian-Krontz
            Just in case you are worried about removing more lines than you want, I’d suggest using @guy038’s regex with the “Mark” function and the “Bookmark line” option ticked. Then you can “Mark All” and go through the file to verify that ONLY the lines you want removed are marked. Once confirmed, use “Remove Bookmarked Lines” under Search (main menu) > Bookmark.
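The same "look before you delete" idea can be sketched in Python as a dry run that only lists the lines a match would cover, roughly what bookmarking does in Notepad++. This is an illustration under assumptions: the pattern is a Python approximation of @guy038's regex (with [ \t] and (?:\r\n|\n|\r) in place of \h and \R), lines_to_remove is a hypothetical helper name, and line numbers are counted on \n, so bare-CR files are not handled.

```python
import re

# Python approximation of the Notepad++ search regex (assumption, see lead-in).
HEADER = re.compile(
    r"^\x0C(?s:.+?)Y\.T\.D\.[ \t]*(?:\r\n|\n|\r)[ \t]*(?:\r\n|\n|\r)"
    r"(?:[^\r\n]+\(Continued\)[^\r\n]*(?:\r\n|\n|\r))?",
    re.MULTILINE,
)


def lines_to_remove(text):
    """Return (line_number, line_text) pairs covered by a match; numbers are 1-based."""
    hits = []
    for m in HEADER.finditer(text):
        first = text.count("\n", 0, m.start()) + 1
        # split("\n") is used on purpose: str.splitlines() would also split on FF.
        lines = m.group(0).split("\n")
        if lines and lines[-1] == "":
            lines.pop()  # drop the empty piece after the final line break
        for offset, line in enumerate(lines):
            hits.append((first + offset, line.rstrip("\r")))
    return hits


# Hypothetical sample data for a quick check.
text = (
    "row 1\n"
    "\x0cREPORT\nline 2\nline 3\nline 4\nPERIOD  Y.T.D.\n\n"
    "Dept 10 (Continued)\n"
    "row 2\n"
)

for number, line in lines_to_remove(text):
    print(number, repr(line))
```

Only once the listed lines look right would the actual removal be run.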

            I too was working on an almost exact copy of @guy038’s regex, but he was faster to the keyboard.

            Good luck
            Terry

            • Brian Krontz

              You guys are amazing. I checked the results in Beyond Compare and everything looks perfect. Guy, it worked, so explain away if you’d like. Eko and Terry, I think he’s got me fixed up, but appreciate the extra set of eyes. Thanks to all!!

              B

              • Ekopalypse @Terry R

                @Terry-R said in Replace X number of lines after finding Y:

                Just in case you were worried about removing more lines than you wanted I’d suggest using @guy038 regex

                Just curious, where do you see an issue that my regex
                might remove more lines than expected?

                • Terry R

                  @Ekopalypse said in Replace X number of lines after finding Y:

                  where do you see an issue that my regex

                  My concern was for the OP. The example provided would seem to be accountancy stuff. VERY important data and I was just letting him know that IF HE WAS concerned this was a way to verify the correct lines were removed. As you will see, he had another method to prove that anyways (Beyond Compare).

                  Cheers
                  Terry

                  • Makwana Prahlad (banned)

                    Hello, @Brian-Krontz

                    Please try this information to replace X number of lines after finding Y.

                    In its simplest invocation, sed holds one line of text in the pattern space, i.e. one line of \n-delimited text from the input. The single line in the pattern space has no \n… That’s why your regex is not finding anything.

                    You can read multiple lines into the pattern-space and manipulate things surprisingly well, but with a more than normal effort… Sed has a set of commands which allow this type of thing… Here is a link to a Command Summary for sed. It is the best one I’ve found, and got me rolling.

                    However forget the “one-liner” idea once you start using sed’s micro-commands. It is useful to lay it out like a structured program until you get the feel of it… It is surprisingly simple, and equally unusual. You could think of it as the “assembler language” of text editing.

                    Summary: Use sed for simple things, and maybe a bit more, but in general, when it gets beyond working with a single line, most people prefer something else…
                    I’ll let someone else suggest something else… I’m really not sure what the best choice would be (I’d use sed, but that’s because I don’t know perl well enough.)

                    sed '/^a test$/{
                           $!{ N        # append the next line when not on the last line
                             s/^a test\nPlease do not$/not a test\nBe/
                                        # now test for a successful substitution, otherwise
                                        #+  unpaired "a test" lines would be mis-handled
                             t sub-yes  # branch_on_substitute (goto label :sub-yes)
                             :sub-not   # a label (not essential; here to self document)
                                        # if no substitution, print only the first line
                             P          # pattern_first_line_print
                             D          # pattern_ltrunc(line+nl)_top/cycle
                             :sub-yes   # a label (the goto target of the 't' branch)
                                        # fall through to final auto-pattern_print (2 lines)
                           }    
                         }' alpha.txt
                    

                    I hope this information will be useful to you.
                    Thank you.

                    • Alan Kilborn @Makwana Prahlad

                      @Makwana-Prahlad

                      Why is it necessary to take this outside of Notepad++?
                      This is a Notepad++ forum and if a solution can be found using Notepad++ then no discussion of outside tools is needed or wanted.
                      Also, the question has been answered previously so there is no real need to seek further solutions.

                      • guy038

                        Hi, @brian-krontz, @ekopalypse, @terry-r and All,

                        For better readability, if we use the free-spacing mode, (?x) modifier, my previous search regex can be rewritten as :

                        (?x)   (?s-i)   ^\x0C   .+?   Y\.T\.D\.   \h*\R   \h*\R   (?-s)   (?:   .+\(Continued\).*   \R   )?
                        

                        Notes :

                        • First, the part (?s-i) contains in-line modifiers :

                          • (?s) means that the dot regex symbol ( . ) represents any single character, even line-break chars like \r or \n ( single-line mode )

                          • (?-i) carries out the search in a case-sensitive manner ( non-insensitive mode )

                        • Then, the part ^\x0C searches for a FF control character, at beginning of lines ( ^ )

                        • Now, the .+? syntax looks for the shortest non-null range of any character, even line-breaks, TILL…

                        • …the part Y\.T\.D\.\h*\R. That is to say …till the string Y.T.D., with that exact case, followed by possible horizontal blank char(s) ( Tab / Space ) and its line-break, as \R matches any kind of line-end char(s)

                        • And the next \h*\R looks for a blank line ( the 6th ), possibly containing horizontal blank character(s) only

                        • Then, the (?-s) modifier means that, from now on, any regex . symbol matches a single standard character only ( not EOL ones )

                        • Now, the part .+\(Continued\).*\R searches, in current line, for any string, (Continued), with this exact case, preceded with a non-null range of standard char(s) ( .+ ) and followed with other range of standard char(s), possibly null ( .* ) and its line-break char(s) ( \R )

                        • As the above part is embedded in the structure (?:......)?, all the 7th-line contents are enclosed in a non-capturing group, which is optional, due to the ? quantifier at the end ( a shorthand for the formal syntax {0,1} )


                        Remark : The dot . and parentheses () chars are regex symbols. So, in order to be interpreted as literal characters, they need to be escaped with the \ symbol
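The notes above can be tried out in Python's own free-spacing mode (re.VERBOSE). This is a sketch under assumptions rather than the Notepad++ regex itself: Python lacks the horizontal-blank and any-line-break escapes and cannot switch off single-line mode mid-pattern, so [ \t], (?:\r\n|\n|\r), a scoped (?s:...) group and [^\r\n] are used as stand-ins. The header text is invented test data.

```python
import re

PATTERN = re.compile(
    r"""
    ^\x0C                       # FF control character at the beginning of a line
    (?s:.+?)                    # shortest non-empty run of any chars, line breaks included
    Y\.T\.D\.                   # literal Y.T.D. (Python is case-sensitive by default)
    [ \t]*(?:\r\n|\n|\r)        # optional blanks, then the line break of that line
    [ \t]*(?:\r\n|\n|\r)        # the blank line that closes the header
    (?:                         # non-capturing group for the 7th line ...
        [^\r\n]+\(Continued\)[^\r\n]*
        (?:\r\n|\n|\r)
    )?                          # ... made optional by the ? quantifier
    """,
    re.MULTILINE | re.VERBOSE,
)

# Invented 6-line header: FF line, three more lines, the Y.T.D. line, a blank line.
header = "\x0cREPORT\nline 2\nline 3\nline 4\nPERIOD  Y.T.D.\n\n"
with_continued = header + "Dept 10 (Continued)\n" + "data row\n"
without_continued = header + "Dept 20\n" + "data row\n"

m1 = PATTERN.search(with_continued)
m2 = PATTERN.search(without_continued)

print(repr(m1.group(0)))  # header plus the (Continued) line
print(repr(m2.group(0)))  # header only, because the optional group matched nothing
```

Comparing the two matches shows the effect of the optional group: the 7th line is consumed only when it contains (Continued).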

                        Best Regards,

                        guy038

                        • guy038

                          Hello, @brian-krontz, @ekopalypse, @terry-r @alan-kilborn, @makwana-prahlad and All,

                          To @makwana-prahlad :

                          Why are you using such a complicated script, with some advanced options of sed ?

                          To my mind, the one-line script, below, does the job nicely, too ;-))

                          sed -n "/^\x0C/,+5d ; /(Continued)/d ; p" Input.txt > Output.txt


                          Notes :

                          • First, sed searches, in the Input.txt file, for a range of lines :

                            • The first line of this range must contain the FF char, at beginning of current line, so the /^\x0C/ syntax

                            • Till the next 5 lines, that is to say, till the 6th empty line. Thus, the ,+5 syntax

                          • Then, the d command deletes this range of entire lines

                          • Secondly, sed checks whether the current line ( meant to be the 7th line ) contains the (Continued) string, with that exact case, hence the syntax /(Continued)/ and, in that case, the d command, again, deletes this current line

                          • Finally, if none of these criteria can be verified, the p command simply rewrites the current line in the Output.txt file


                          Indeed, not the same philosophy as the usual N++ regexes, but rather easy to understand, too !

                          Cheers,

                          guy038

                          P.S. :

                          My sed script is not totally exact ! Indeed, if a line, located outside the header zone, contains the string (Continued), this line is wrongly deleted ! But, anyway, we’re on a Notepad++ forum, after all ;-))
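One way around the caveat in the P.S. is to remember whether the previous line closed a header, and to drop a (Continued) line only in that position. The Python sketch below mirrors the sed logic line by line; it is an illustration, not a replacement for the regex. The function name, the header_len parameter and the sample lines are hypothetical, and a fixed 6-line header is assumed, as in the original question.

```python
def strip_headers_by_line(lines, header_len=6):
    """Drop each FF header (header_len lines) and a (Continued) line right after it."""
    out = []
    skip = 0              # header lines still to drop
    after_header = False  # True only for the line directly following a header
    for line in lines:
        if skip == 0 and line.startswith("\x0c"):
            skip = header_len          # like sed's /^\x0C/,+5 range
        if skip:
            skip -= 1
            after_header = (skip == 0)
            continue
        if after_header and "(Continued)" in line:
            after_header = False
            continue                   # the 7th line, dropped only right after a header
        after_header = False
        out.append(line)
    return out


# Hypothetical sample: the last line contains (Continued) outside any header.
sample_lines = [
    "row 1",
    "\x0cREPORT", "line 2", "line 3", "line 4", "PERIOD  Y.T.D.", "",
    "Dept 10 (Continued)",
    "row 2",
    "note (Continued) kept",
]

print(strip_headers_by_line(sample_lines))
```

Unlike the one-line sed script, a (Continued) string elsewhere in the file is left alone.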

                          The Community of users of the Notepad++ text editor.