    Find line above given text in document

    Help wanted · 27 Posts · 9 Posters · 3.2k Views
    • Krithagis C

      Hello. I have a file that is over 100,000 lines long.

      What I’m trying to do is find every line that shows ‡NO NAMES‡ and mark or display the line above it.

      So for something like this, I want to keep the lines that were shown in bold (the ones directly above ‡NO NAMES‡, here *ZFWAQJ-P and *LUOKAG-P):
      ADDRESS
      111 Main St
      LEVEL 4
      Anytown, USA
      FORM OF PAYMENT

      1. CASH TYPE
        CASH ‡

      *ZFWAQJ-P
      ‡NO NAMES‡

      *LUOKAG-P
      ‡NO NAMES‡

      I’ve tried a Linefilter plugin but it doesn’t seem to be working. Thanks so much.

      • PeterJones @Krithagis C

        @Krithagis-C

        • Search > Mark
        • Find What: (?-s)^.*$(?=\R‡NO NAMES‡)
        • ☑ Bookmark Line
        • ☑ Purge for each search
        • Search Mode = ☑ Regular Expression
        • Mark All

        [Screenshot: the result of Mark All; the lines above ‡NO NAMES‡ are marked, while ‡NO NAMES‡ itself is not]
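For readers who want the same result outside Notepad++, a rough Python translation of the Mark regex might look like the following. It is a sketch, not an exact equivalent: Python's re module has no \R, so it is approximated here with \r?\n (LF or CRLF line endings assumed), and [^\r\n]* stands in for (?-s).* so that a trailing CR never ends up in the result. The name lines_above is only illustrative.

```python
import re

# Rough translation of (?-s)^.*$(?=\R‡NO NAMES‡):
#   [^\r\n]*   one line's text, never including a line-ending character
#   \r?\n      stands in for \R (assumes LF or CRLF line endings)
#   (?=...)    lookahead: the ‡NO NAMES‡ line is checked but not captured
LINE_ABOVE = re.compile(r"^[^\r\n]*(?=\r?\n‡NO NAMES‡)", re.MULTILINE)


def lines_above(text: str) -> list[str]:
    """Return every line that is directly followed by a line starting with ‡NO NAMES‡."""
    return LINE_ABOVE.findall(text)


if __name__ == "__main__":
    sample = (
        "FORM OF PAYMENT\n\n1. CASH TYPE\n  CASH ‡\n\n"
        "*ZFWAQJ-P\n‡NO NAMES‡\n\n*LUOKAG-P\n‡NO NAMES‡\n"
    )
    print(lines_above(sample))  # ['*ZFWAQJ-P', '*LUOKAG-P']
```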

        ----

        Useful References

        • Notepad++ Online User Manual: Bookmarks vs. Marks describes how to use Marks and Bookmarks, and how to easily navigate between bookmarked lines
        • Notepad++ Online User Manual: Searching/Regex
        • FAQ: Where to find regular expressions (regex) documentation
        • Please Read Before Posting
        • Template for Search/Replace Questions
        • Alan Kilborn @Krithagis C · last edited by Alan Kilborn

          @Krithagis-C

          To show your bolded lines in the Search results window, you can do:

          • Invoke the Find window with Ctrl+f
          • Find: (?-is)^.+\R‡NO NAMES‡
          • Search mode: Regular expression
          • Action: press the Find All in Current Document button

          and you’ll get something like this:

          [Screenshot: the Search results window after Find All in Current Document]

          That seems to fit the bill for “display the line above it”. If you want the red-on-yellow text content, you can copy it out of the Search results window.
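The Find All approach can be sketched in Python as well, again with \R approximated by \r?\n (an assumption of this sketch). Because this pattern consumes the ‡NO NAMES‡ line instead of looking ahead at it, each hit spans two lines, so the sketch reports the 1-based line number where each hit starts together with its first line, loosely like the Search results window does. The name find_all is hypothetical, and the real Search results layout is not reproduced.

```python
import re

# Rough translation of (?-is)^.+\R‡NO NAMES‡ (Python's re is case-sensitive by default):
#   [^\r\n]+  a non-empty line, like (?-s).+
#   \r?\n     stands in for \R (LF or CRLF line endings assumed)
# The ‡NO NAMES‡ line is consumed, so every match covers two lines.
PAIR = re.compile(r"^[^\r\n]+\r?\n‡NO NAMES‡", re.MULTILINE)


def find_all(text: str) -> list[tuple[int, str]]:
    """Return (1-based line number, first line of the hit) for every match."""
    hits = []
    for m in PAIR.finditer(text):
        line_no = text.count("\n", 0, m.start()) + 1  # newlines before the hit, plus one
        hits.append((line_no, m.group(0).splitlines()[0]))
    return hits
```

Only the first line of each hit is kept, which corresponds to the data being asked for; the ‡NO NAMES‡ line is dropped by splitlines()[0].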

          When you say “I want to keep the bold lines”, do you really mean that you want to delete all other text in your original document?

          Which is it, displaying as I show (in Search results), marking in place as Peter shows, deleting all other text, or…something else?

          • Krithagis C @Alan Kilborn

            @Alan-Kilborn Thank you so much! Ultimately I only need to keep the line above ‡NO NAMES‡. There’s no need for bold or red as long as I have a way to grab all of those lines for further review. The NO NAMES means there’s an issue with that line that I have to review in another way.

            • Krithagis C @PeterJones

              @PeterJones Thanks this works for me. I can just copy marked text into another file and remove the ‡NO NAMES‡ text easily.

              Thanks so much!

              • Alan Kilborn @Krithagis C · last edited by PeterJones

                @Krithagis-C said in Find line above given text in document:

                Ultimately I only need to keep the line above ‡NO NAMES‡… as long as I have a way to grab all of those lines for further review

                Then you can do this from the Search results I showed before:

                • Right-click on the filename line (the line with green text that ends with (XXX hits))
                • Choose Copy Selected Line(s)
                • Create a new tab and then Ctrl+v (paste) there

                Here’s a screenshot where I demo the above (note that for maximum clarity I made a partial selection on the “new 2 (2 hits)” line before right-clicking it):

                [Screenshot: copying the hits from the Search results window into a new tab]

                There’s no need for bold or red

                This is only a visual effect anyway, not part of the data. All you’ll get with the “Copy Selected Line(s)” technique is the data you seek:

                *ZFWAQJ-P
                *LUOKAG-P
                
                • Alan Kilborn @Krithagis C

                  @Krithagis-C said in Find line above given text in document:

                  PeterJones Thanks this works for me. I can just copy marked text into another file and remove the ‡NO NAMES‡ text easily.

                  If you’re using Peter’s technique as he stated and showed it, there would be no ‡NO NAMES‡ text to remove. The Copy Marked Text button will only copy what is red-marked, and, as you can see from Peter’s screenshot, ‡NO NAMES‡ is not red-marked.
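The underlying reason is that a lookahead is zero-width: the ‡NO NAMES‡ test is checked but never becomes part of the match, and only the match is marked. A small Python illustration of the difference between Peter's lookahead form and Alan's consuming form (with \R approximated by \r?\n, an assumption of this sketch):

```python
import re

text = "*ZFWAQJ-P\n‡NO NAMES‡\n\n*LUOKAG-P\n‡NO NAMES‡\n"

# Lookahead form: ‡NO NAMES‡ is tested but is not part of the matched (marked) text.
with_lookahead = re.findall(r"^[^\r\n]*(?=\r?\n‡NO NAMES‡)", text, re.MULTILINE)

# Consuming form: the ‡NO NAMES‡ line is part of every match.
consuming = re.findall(r"^[^\r\n]+\r?\n‡NO NAMES‡", text, re.MULTILINE)

print(with_lookahead)  # ['*ZFWAQJ-P', '*LUOKAG-P']
print(consuming)       # ['*ZFWAQJ-P\n‡NO NAMES‡', '*LUOKAG-P\n‡NO NAMES‡']
```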

                    • Krithagis C @Alan Kilborn

                      @Alan-Kilborn Thanks so much to you and @PeterJones

                      It’s working now! I had the Search results window at the bottom hidden somehow, and I haven’t used this app in a while. This is going to save me SO MUCH TIME!!

                      • Benji2025 @Krithagis C

                        Hi All,

                        Sorry to bring up an old post, but I am trying to bookmark or mark the lines above the text “Access is denied” (without quotes).

                        The suggested method works fine on a small number of lines but with millions of lines it fails with an “Invalid Regular Expression” error.

                        (?-s)^.*$(?=\RAccess is denied)

                        Please help :)

                        Cheers,
                        Ben

                        [Screenshot: the “Invalid Regular Expression” error]

                        • Ekopalypse @Benji2025

                          @Benji2025

                          The regex looks fine to me, but maybe you have introduced an invisible symbol?
                          What does the tooltip say about the error? See my prepared example.

                          [Screenshot: Ekopalypse's example of the error tooltip]
                          Have you tried typing it in to make sure you don’t have an invisible symbol?

                          • Benji2025 @Ekopalypse · last edited by Benji2025

                            @Ekopalypse

                            Hi Eko, thank you for the quick response. I have managed to sort it now using Python.

                            The tooltip is in the screenshot:

                            [Screenshot: the error tooltip]

                            No invisible characters, by the way, and it works fine with a smaller number of lines.
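The Python that solved it was not posted, so the following is only a guess at one reasonable approach, not Benji2025's actual script: stream the file one line at a time and remember the previous line, so memory use stays flat regardless of file size and no regex complexity limit comes into play. The names lines_above_target and write_hits and the UTF-8 encoding are hypothetical assumptions; the startswith test mirrors the regex, whose lookahead only checks the beginning of the next line.

```python
from typing import Iterable, Iterator


def lines_above_target(lines: Iterable[str], target: str = "Access is denied") -> Iterator[str]:
    """Yield each line that is immediately followed by a line starting with target.

    Accepts any iterable of lines, such as an open text file; only the current
    and previous lines are kept, never the whole file.
    """
    previous = None
    for line in lines:
        if previous is not None and line.startswith(target):
            yield previous.rstrip("\r\n")
        previous = line


def write_hits(in_path: str, out_path: str) -> int:
    """Hypothetical wrapper: write every hit to out_path and return how many were found."""
    count = 0
    with open(in_path, encoding="utf-8", errors="replace") as src, \
            open(out_path, "w", encoding="utf-8") as dst:
        for hit in lines_above_target(src):
            dst.write(hit + "\n")
            count += 1
    return count
```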

                            • Ekopalypse @Benji2025

                              @Benji2025 said in Find line above given text in document:

                              The suggested method works fine on a small number of lines but with millions of lines

                              I tested this with 120,000,000 lines and it worked for me, but to be honest, the file was only 1.5 GB, so … your file must be correspondingly much larger. Unfortunately I don’t know enough about the internals to say at what size this becomes a problem.

                              [Screenshot: Ekopalypse's test on 120,000,000 lines]

                              I have managed to sort it now using Python

                              hehe … from my point of view that is ALWAYS the solution :-D

                              • guy038 · last edited by guy038

                                Hello, @benji2025, @ekopalypse, @alan-kilborn and All,

                                Oh, My God, I’ve been beaten by @ekopalypse :-((

                                @benji2025, I’d really like to know the average size of your files and their number of lines!

                                Indeed, I did a test with a file of 143,151,374 bytes, containing 3,151,513 lines of 47 characters each. Both your regex and mine worked fine and marked 121,212 lines!!

                                The version of your regex that I used is (?-s)^.*$(?=\RTEST)

                                and my similar syntax is (?-s)^.*\R(?=TEST$)
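Note that the two patterns are not strictly identical: the first only requires the next line to start with TEST, while the second requires it to be exactly TEST. A Python sketch for checking that they mark the same lines on a given text, and for timing them, might look like this. The helper names and the synthetic data are hypothetical; with the default every=26, make_sample(3_151_513) contains 121,212 TEST lines, so it produces the same marked-line count as the test above, but the file size differs. Timings from Python's re engine say nothing definite about Boost's behaviour inside Notepad++.

```python
import re
import time

# Python approximations of the two patterns (\R becomes \r?\n; LF/CRLF assumed).
# (?-s)^.*$(?=\RTEST): the next line must START with TEST.
LOOKAHEAD_ALL = re.compile(r"^[^\r\n]*(?=\r?\nTEST)", re.MULTILINE)
# (?-s)^.*\R(?=TEST$): the next line must BE exactly TEST (\r? tolerates CRLF).
CONSUME_EOL = re.compile(r"^[^\r\n]*\r?\n(?=TEST\r?$)", re.MULTILINE)


def make_sample(n_lines: int, every: int = 26) -> str:
    """Hypothetical test data: 47-character filler lines, with every `every`-th line being TEST."""
    return "".join("TEST\n" if i % every == 0 else "x" * 47 + "\n" for i in range(1, n_lines + 1))


def marked_line_numbers(pattern: re.Pattern, text: str) -> list[int]:
    """1-based numbers of the lines where matches start, counted in a single pass."""
    numbers, line, pos = [], 1, 0
    for m in pattern.finditer(text):
        line += text.count("\n", pos, m.start())
        pos = m.start()
        numbers.append(line)
    return numbers


def timed(pattern: re.Pattern, text: str) -> tuple[list[int], float]:
    """Run marked_line_numbers once and return (line numbers, elapsed seconds)."""
    start = time.perf_counter()
    numbers = marked_line_numbers(pattern, text)
    return numbers, time.perf_counter() - start
```

For a comparison run, call timed(LOOKAHEAD_ALL, sample) and timed(CONSUME_EOL, sample) on the same sample and compare both the line lists and the elapsed times.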


                                On my old Win XP machine, with the N++ v7.9.2 release:

                                • Your regex did the marking operation in about 24.3 seconds

                                • My regex did the marking operation in about 23.2 seconds

                                So, I suppose you should try my regex version!

                                Best Regards,

                                guy038

                                • Benji2025 @guy038

                                  @guy038 @Ekopalypse

                                  Thanks guys

                                  Just 9 million lines at 1 GB, and that is only a short run of a job I am running.

                                  Same error with guy038’s syntax.

                                  [Screenshot: the same error with guy038's regex]

                                  • Mark Olson @Ekopalypse

                                    @Ekopalypse said in Find line above given text in document:

                                    I have managed to sort it now using Python

                                    hehe … from my point of view that is ALWAYS the solution :-D

                                    Yeah, to expand on that for the benefit of others who don’t know: Python’s re library is usually at least 10x faster than Notepad++'s built-in search capability, such that it is vastly better when searching extremely large files. The Columns++ and MultiReplace plugins, while very powerful in their own right, will AFAIK never be much faster than the Notepad++ find/replace form because they also do their search-replace operations through Scintilla.
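As one illustration of using Python's re on a very large file, here is a minimal sketch assuming CPython 3. The standard mmap module maps the file into memory, and re can search the mapping directly with a bytes pattern, so the file is never decoded into one huge Python string. The function names and the “Access is denied” target are only for illustration, and the sketch itself makes no claim about how its speed compares with Notepad++.

```python
import mmap
import re

# Bytes pattern, because re searches the mapping as raw bytes.
# [^\r\n]* stands in for (?-s).*, and \r?\n for \R (LF or CRLF assumed).
PATTERN = re.compile(rb"^[^\r\n]*(?=\r?\nAccess is denied)", re.MULTILINE)


def lines_above_in_buffer(buf) -> list[bytes]:
    """Search any bytes-like object (bytes, bytearray, mmap) and return the matched lines."""
    return [m.group(0) for m in PATTERN.finditer(buf)]


def lines_above_in_file(path: str) -> list[bytes]:
    """Memory-map the file read-only and search it in place."""
    with open(path, "rb") as f:
        # Note: mmap raises ValueError if the file is empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return lines_above_in_buffer(mm)
```

The results are bytes; decode them with the file's actual encoding if text is needed.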

                                    @PeterJones
                                    At this point I’ve repeated this PSA enough times that it should probably be added to one of the FAQ’s, maybe this one?

                                    • Coises @Mark Olson

                                      @Mark-Olson said in Find line above given text in document:

                                      The Columns++ and MultiReplace plugins, while very powerful in their own right, will AFAIK never be much faster than the Notepad++ find/replace form because they also do their search-replace operations through Scintilla.

                                      A minor technical quibble: the regex search in Columns++ does not use Scintilla’s search. While it does search within Scintilla’s buffer (avoiding a copy, using a documented interface that exposes the content as addressable bytes), it uses Boost::regex directly, with its own custom iterators, rather than the Scintilla search interface.

                                      Still, this probably only makes it approximately equal in speed relative to Notepad++, since it does use the same regular expression engine in essentially the same way.

                                      • Coises @guy038

                                        @guy038

                                        I’ll have to see if I can test this with some large files, but does it not strike you as very strange that this expression:

                                        (?-s)^.*$(?=\RAccess is denied)

                                        should yield a complexity error, regardless of the data? Complexity errors are supposed to happen when the same text keeps getting rescanned; that is, not just when the operation takes a long time, but when the number of bytes examined is growing “too much” faster than the start point is being moved forward (or else when internal stacks overflow preset bounds). This expression shouldn’t cause that. It doesn’t backtrack.

                                        @Benji2025 — I know you’ve solved your problem, but if you are still reading and interested: Does the same thing happen on the same data with:

                                        (?-s)^.*+(?=\RAccess is denied)

                                        That shouldn’t matter, but maybe the regex engine isn’t as smart as I think it is.

                                        • PeterJones @Mark Olson · last edited by PeterJones

                                          @Mark-Olson said in Find line above given text in document:

                                          At this point I’ve repeated this PSA enough times that it should probably be added to one of the FAQ’s, maybe this one?

                                          The suggestion to “use Python” is not necessarily the same as the suggestion to “use PythonScript”; I am not sure whether @Benji2025 was using the standalone Python interpreter, or using the Notepad++ automation plugin to search the open file. I am also not sure whether your comment about performance is saying “python.exe’s re library is faster” or “using PythonScript and the re library is faster” or “using PythonScript and its re-like editor.research() is faster”. Those are three different things.

                                          The FAQ you pointed to is only about using the PythonScript plugin (or other plugins) to do mathy-replacements; the generalized statement you seem to be making doesn’t seem to be restricted to mathy-replacements, so I’m not sure that’s the best place, even once the context of the claim is clarified.

                                          So that we don’t clutter this specific question with workshopping, if you wanted to make an RFC post in the Blogs category to workshop a new FAQ entry, once everyone was happy with everything workshopped, I could create a new entry in the FAQ category and duplicate the final version of the post (with you as the author).

                                          • Terry R @Coises · last edited by Terry R

                                            @Coises said in Find line above given text in document:

                                            I’ll have to see if I can test this with some large files, but does it not strike you as very strange that this expression:

                                            I tried thinking a bit laterally about this issue, more specifically the error of “complexity” stopping the regular expression from completing.

                                            Since we all seemed unsure why it would generate such a message from what “seemed” to be a simple find expression, I thought I would do a small amount of testing to see whether the .* or the lookahead was more likely to blame.

                                            I created a file with the lines bla bla bla bla and Access is denied in a ratio of about 10 bla lines to 1 Access line. I got to approximately 240M lines (around 3.6 GB by my calculation), at which point the lags in updating NPP were significant. At this point I gave the original regex a try. Work called, so after I completed that job I came back to a very sorry dual-monitor Windows 11 system. One monitor had called it quits (Windows wouldn’t use it) and the smaller monitor had all windows squished on it. On the upside, the regex worked, whereas I actually wanted it to fail so I could hatch my next cunning step, a revised regex to continue testing.

                                            So, back to my idea of seeing which part might be the cause of the issue. Since the original request was to be able to mark the lines, I considered not trying to “capture” the entire line, instead just using something like .\R(?=Access is denied$). The Mark function will still mark the line even if only 1 character on that line is matched.

                                            Another similar idea would have been to (again) just select the last character on a line and also select the following line with “Access is denied”. If the reason for the “mark” was to extract them to another tab/file, then it would be much simpler to remove the “Access is denied” line at that time.

                                            There was also a third idea: remove the \R immediately before each “Access is denied” line, then use the Mark function to mark the actual text “Access is denied”. A slightly modified version of this idea is to copy “Access is denied” to the end of the line above, then mark lines that contain this text anywhere except at the start of the line.
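Terry's third idea, done as separate steps, can be sketched in Python as follows. This is a hypothetical illustration, not Terry's procedure in Notepad++: step 1 deletes the line break before each line that starts with the target text, step 2 keeps lines containing the target anywhere except at the very start, and step 3 cuts each kept line back to the original line above. It assumes the target text does not otherwise appear inside the lines above, and edge cases such as two consecutive “Access is denied” lines, or an empty line above one, behave differently from the single-regex version.

```python
import re

TARGET = "Access is denied"


def join_then_mark(text: str, target: str = TARGET) -> list[str]:
    """Hypothetical multi-step version of the join-then-mark idea; returns the lines above target."""
    # Step 1: delete the line break (LF or CRLF) in front of every line that starts with target,
    # which glues the target text onto the end of the line above.
    joined = re.sub(r"\r?\n(?=" + re.escape(target) + ")", "", text)
    # Step 2: "mark" lines that contain target anywhere except at the very start.
    marked = [line for line in joined.splitlines() if line.find(target, 1) != -1]
    # Step 3: cut each marked line at that occurrence to recover the original line above.
    return [line[: line.find(target, 1)] for line in marked]
```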

                                            As shown, there are often many answers/solutions to a problem, especially if one is willing to divide and conquer. It is nice to give a one-line solution, but often complexities (inability to solve or easily adjust, or a user unable to comprehend the solution) will make a multi-step solution more palatable.

                                            Terry

                                            The Community of users of the Notepad++ text editor.
                                            Powered by NodeBB | Contributors