Select/Export certain text that appears above a different set of certain text - Two separate find queries

  • A
    Alan Kilborn @StormyCalories
    last edited by Alan Kilborn Oct 21, 2023, 10:12 PM Oct 21, 2023, 10:10 PM

    @StormyCalories

    Yea…so point the mouse at the little speech bubble with the ... in it, and see what it says when a popup appears. The regular expression isn’t necessarily “invalid”; more likely, the engine ran out of memory, or detected that complexity was increasing such that it would…

    • S
      StormyCalories @Alan Kilborn
      last edited by Oct 21, 2023, 10:12 PM

      @Alan-Kilborn

      This is tricky to reproduce but here’s how you can see the same behavior. Create a blank text file with the following text:

      Product Name="name1" Style="name/type" Cat="name/type"
      {
      }
      

      Within the curly braces, paste the following text:

        Attribute Name="name/type"
        {
          Val { Val=20 }
          Val_Mod=T
          Def_Val=F
        }
      

      If you paste this 314 times, you’ll have a total file size of 1887 lines and the expression will work properly. However, if you paste it one more time, giving you 315 elements with a total line count of 1892, the expression will fail.

      I’ve noted it’s not necessarily the file line size or the total number of characters. If the Attribute subelement has more complicated data within it, I’ll end up with fewer than 314 elements before the expression fails.
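For anyone who wants to reproduce this without pasting by hand, here is a minimal Python sketch that builds the same text. The function name `build_test_text` and the two-space indentation of the pasted block are assumptions made for illustration; only the text content comes from the post above.

```python
# Hypothetical helper: builds the reproduction text described above.
HEADER = 'Product Name="name1" Style="name/type" Cat="name/type"\n{\n'
ATTRIBUTE = (
    '  Attribute Name="name/type"\n'
    '  {\n'
    '    Val { Val=20 }\n'
    '    Val_Mod=T\n'
    '    Def_Val=F\n'
    '  }\n'
)
FOOTER = '}\n'


def build_test_text(copies: int) -> str:
    """Return one Product element containing `copies` Attribute blocks."""
    return HEADER + ATTRIBUTE * copies + FOOTER


def write_test_file(path: str, copies: int) -> None:
    """Write the generated text to `path` with LF line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as handle:
        handle.write(build_test_text(copies))
```

With 314 copies this gives 2 header lines, 314 blocks of 6 lines, and 1 closing line, which is the 1887 lines mentioned above. Calling `write_test_file` with 314 and then 315 copies should produce the passing and failing files.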

      • S
        StormyCalories @Alan Kilborn
        last edited by Oct 21, 2023, 10:13 PM

        @Alan-Kilborn said in Select/Export certain text that appears above a different set of certain text - Two separate find queries:

        @StormyCalories

        Yea…so point the mouse at the little speech bubble with the ... in it, and see what it says when a popup appears.

        Wow, I had tried clicking on that speech bubble earlier and nothing happened. I clearly didn’t have the patience to just hover my mouse for a fraction of a second. Sorry about that!

        Here’s the full error message:

        [Screenshot of the full error message]

        • A
          Alan Kilborn @StormyCalories
          last edited by Oct 21, 2023, 10:15 PM

          @StormyCalories

          I think it all makes sense now.
          Pretty much your only choice is to work on your expression to make it so such a condition isn’t entered.
          That may or may not be possible.

          • P
            PeterJones @StormyCalories
            last edited by PeterJones Oct 21, 2023, 10:50 PM Oct 21, 2023, 10:49 PM

            @StormyCalories ,

            My suggestion: If every Product Name ... { } ends with a } by itself on a line, change the instances of (?:(?!Product Name).) in the expression into (?:(?!^}).) , which will stop each section with } at the beginning of a line, instead of stopping at the next Product Name. If there’s actually whitespace before the }, then use (?:(?!^\h*}).) . As long as the number of characters from Product Name to } isn’t ever too huge, it should be able to handle the complexity. (If you have a megabyte of data within each record, that probably won’t work; but a few dozen lines shouldn’t be a problem, and probably not even a problem on a few hundred lines)
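To illustrate the idea of stopping at a line-initial `}`, here is a minimal sketch using Python's `re` module rather than Notepad++'s Boost engine, so details may differ. The sample data, the names `alpha`/`beta`/`gamma`, and the exact shape of the overall expression are assumptions; only the `(?:(?!^}).)` token is taken from the suggestion above.

```python
import re

# Assumed overall shape: capture the product name, then look ahead for the
# trigger text without ever crossing a "}" that starts a line.
PATTERN = re.compile(
    r'''(?<=Product\ Name=")   # just after the opening quote of the name
        \w+                    # the product name itself
        (?="                   # closing quote, then ...
            (?:(?!^\}).)+?     # ... any characters, but never a line-initial brace
            \.TRIGGER':=1      # ... up to the trigger text
        )''',
    re.VERBOSE | re.MULTILINE | re.DOTALL,
)

# Hypothetical sample: the inner braces are indented, the outer ones are not.
SAMPLE = """Product Name="alpha" Style="a/b" Cat="c/d"
{
  Attribute Name="x/y"
  {
    'sig.TRIGGER':=1
  }
}
Product Name="beta" Style="a/b" Cat="c/d"
{
  Attribute Name="x/y"
  {
    Val_Mod=T
  }
}
Product Name="gamma" Style="a/b" Cat="c/d"
{
  Attribute Name="p/q"
  {
    'other.TRIGGER':=1
  }
}
"""


def triggered_products(text: str) -> list:
    """Return the names of products whose own block contains the trigger."""
    return PATTERN.findall(text)
```

Because the lookahead cannot step over the closing `}` of a record, a product without the trigger (`beta` here) cannot borrow the trigger of a later record.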

            • C
              Coises @StormyCalories
              last edited by Coises Oct 21, 2023, 11:15 PM Oct 21, 2023, 11:14 PM

              @StormyCalories said in Select/Export certain text that appears above a different set of certain text - Two separate find queries:

              Is this a problem I can resolve by using a more powerful computer or is there some sort of maximum dataset size that I’m running into?

              No promises (I’m at the edge of my knowledge), but this might work:

              (?s)(?<=Product Name=")\w++(?=.*?(\.TRIGGER':=1|Product Name=(*THEN)(*FAIL)))
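As I read it, the intent of this expression is: select a product name only if the trigger text shows up before the next `Product Name=`. For very large inputs, the same selection can be sketched outside Notepad++ as a single pass over the lines. This is an alternative technique in Python, not the regex above; the function name `products_with_trigger` is made up for illustration, and the sketch assumes the trigger never sits on the same line as `Product Name=`.

```python
import re
from typing import Iterable, Iterator

NAME = re.compile(r'Product Name="(\w+)')
TRIGGER = ".TRIGGER':=1"


def products_with_trigger(lines: Iterable[str]) -> Iterator[str]:
    """Yield each product name whose block contains TRIGGER before the next product starts."""
    current = None  # product whose block we are currently inside
    for line in lines:
        match = NAME.search(line)
        if match:
            current = match.group(1)  # a new block begins; forget the old one
        elif current is not None and TRIGGER in line:
            yield current
            current = None  # report each product at most once
```

Since an open file object is itself an iterable of lines, passing one to `products_with_trigger` keeps memory use flat regardless of file size.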

              • S
                StormyCalories @PeterJones
                last edited by StormyCalories Oct 22, 2023, 12:06 AM Oct 22, 2023, 12:04 AM

                @PeterJones said in Select/Export certain text that appears above a different set of certain text - Two separate find queries:

                As long as the number of characters from Product Name to } isn’t ever too huge, it should be able to handle the complexity. (If you have a megabyte of data within each record, that probably won’t work; but a few dozen lines shouldn’t be a problem, and probably not even a problem on a few hundred lines)

                I see. That might be my biggest problem then. Each element can have upwards of 15,000 lines and half a million characters. Do datasets this large rule out parsing the file with Notepad++ altogether?

                edit: I should add, the full file I’m working against can be about a half gig. Obviously, I have no problem splitting the file into multiple different smaller pieces, but the data within will still be very large and complex elements.

                • S
                  StormyCalories @Coises
                  last edited by StormyCalories Oct 22, 2023, 12:21 AM Oct 22, 2023, 12:10 AM

                  @Coises said in Select/Export certain text that appears above a different set of certain text - Two separate find queries:

                  No promises (I’m at the edge of my knowledge), but this might work:

                  (?s)(?<=Product Name=")\w++(?=.*?(\.TRIGGER':=1|Product Name=(*THEN)(*FAIL)))

                  This actually seemed to solve the problem of having too much data within the element. I was able to successfully run this expression even on a file that had failed on the original expression in this thread. It’s still not working on the full file that I have and I’m still trying to track down why that might be. I’ve been able to see that if there are 10’s of thousands of lines of elements between two positive elements, it seems to fail. I’m attempting to figure out how much it can parse before breaking.

                  edit: This seems to break whenever the total file line size is greater than 165k lines or so.

                  • C
                    Coises @StormyCalories
                    last edited by Oct 22, 2023, 12:31 AM

                    @StormyCalories said in Select/Export certain text that appears above a different set of certain text - Two separate find queries:

                    I’ve been able to see that if there are 10’s of thousands of lines of elements between two positive elements, it seems to fail. I’m attempting to figure out how much it can parse before breaking.

                    Perhaps you could first use something like:

                    Find what : (?-s)^.*?(\R|(\.TRIGGER':=1|Product Name=)(*THEN)(*FAIL))
                    Replace with : (leave empty)

                    on a copy of your file to remove all the lines that don’t affect the search.
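If the replace turns out to be slow on a very large file, the same pre-filtering could be done outside Notepad++. Here is a minimal Python sketch under stated assumptions: the names `relevant_lines` and `filter_file` are hypothetical, and UTF-8 encoding is a guess about the file. It uses plain substring tests instead of the regex above.

```python
from typing import Iterable, Iterator

MARKERS = ("Product Name=", ".TRIGGER':=1")


def relevant_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the lines that contain at least one of the marker strings."""
    return (line for line in lines if any(marker in line for marker in MARKERS))


def filter_file(src_path: str, dst_path: str) -> None:
    """Copy the relevant lines of src_path to dst_path, one line at a time."""
    # Assumption: the file is UTF-8; newline="" keeps the original line endings.
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.writelines(relevant_lines(src))
```

The file is streamed line by line, so it never has to fit in memory at once.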

                    • S
                      StormyCalories @Coises
                      last edited by Oct 22, 2023, 12:41 AM

                      @Coises

                      That is a really intriguing idea. So simple and I would never have considered it.

                      I shall report back. Thank you!

                      • C
                        Coises @StormyCalories
                        last edited by Coises Oct 22, 2023, 12:45 AM Oct 22, 2023, 12:44 AM

                        @StormyCalories said in Select/Export certain text that appears above a different set of certain text - Two separate find queries:

                        @Coises

                        That is a really intriguing idea. So simple and I would never have considered it.

                        Perhaps even better:

                        Find what : ^.*?((Product Name=|\.TRIGGER':=1)|\R|\z)
                        Replace with : $2
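To spell out what this replacement does to each line, here is a per-line Python sketch. It assumes `.` does not match line breaks in the search above, and `trim_to_marker` is a hypothetical name. A line with a marker loses everything in front of the first marker; a line without one disappears entirely.

```python
import re

MARKER = re.compile(r"Product Name=|\.TRIGGER':=1")


def trim_to_marker(line: str) -> str:
    """Return the line from its first marker onward, or '' if it has no marker."""
    match = MARKER.search(line)
    return line[match.start():] if match else ""


def trim_text(text: str) -> str:
    """Apply trim_to_marker to every line, keeping the original line endings."""
    return "".join(trim_to_marker(line) for line in text.splitlines(keepends=True))
```

Compared with deleting whole lines only, this also strips the leading text on the lines that are kept, which shrinks the file a little further.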

                        • S
                          StormyCalories @Coises
                          last edited by Oct 22, 2023, 1:22 AM

                          @Coises
                          Well, it took 35 minutes for my PC to chug through the search/replace, but after it finally finished, all the expressions by everyone above work against the data.

                          Thank all of you so much!

                          • G
                            guy038
                            last edited by guy038 Oct 23, 2023, 12:28 AM Oct 22, 2023, 1:30 PM

                            Hello, @stormycalories, @peterjones, @alan-kilborn, @Coises and all,

                            Yeah, interesting topic !

                            First, I would simplify @peterjones’s regex :

                            • SEARCH (?s)(?<=Product Name=")(?:(?!Product Name).)*?(?="((?:(?!Product Name).)+?)(?=.TRIGGER':=1))

                            As :

                            • SEARCH (?xs) (?<= Product[ ]Name=" ) \w+ (?= " ( (?! Product[ ]Name ). )+? \. TRIGGER':=1 .+? \} )

                            Because the overall match would then consist of word chars only, instead of any standard chars


                            Now, @coises, your use of backtracking control verbs, in the below regex, is quite clever !

                            • SEARCH (?s)(?<=Product Name=")\w++(?=.*?(\.TRIGGER':=1|Product Name=(*THEN)(*FAIL)))

                            I personally am used to following this method :

                            https://www.rexegg.com/backtracking-control-verbs.html#skipfail

                            As it says there, the generic regex <.....(*SKIP)(*FAIL)|.....> syntax just means <What we DON'T want | What we DO want> !

                            In this example, this ends up as :

                            • SEARCH (?xs) (?<= Product[ ]Name=" ) \w+ (?= .*? ( Product[ ]Name= (*SKIP) (*FAIL) | \. TRIGGER':=1 ) )

                            In the same way, @coises, I would change your regex S/R, which deletes any line that contains NEITHER the Product Name= NOR the .TRIGGER':=1 string :

                            • SEARCH (?-s)^.*?(\R|(\.TRIGGER':=1|Product Name=)(*THEN)(*FAIL))

                            • REPLACE Leave EMPTY

                            by this one :

                            • SEARCH (?x-is) ^ .* (?: Product[ ]Name= | \.TRIGGER':=1 ) (*SKIP) (*F) | ^ .* \R

                            • REPLACE Leave EMPTY

                            Which could be expressed as :

                            • IF the current line contains, anywhere after its beginning, either the Product Name= or the .TRIGGER':=1 string, we DO NOT want to delete the current line

                            • ELSE, we DO want to delete the current line
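Python's standard `re` module has no backtracking control verbs, so this exact search will not run there. The same <What we DON'T want | What we DO want> shape can be approximated with a capture group, though: match the lines to keep without capturing them, and capture only the lines to delete. The following is a sketch of that substitute technique, not of `(*SKIP)(*F)` itself. The name `drop_irrelevant_lines` is hypothetical, and every line is assumed to end with `\n`.

```python
import re

KEEP_OR_DROP = re.compile(
    r"^.*(?:Product Name=|\.TRIGGER':=1).*\n"  # what we do NOT want to delete
    r"|(^.*\n)",                               # what we DO want to delete (group 1)
    re.MULTILINE,
)


def drop_irrelevant_lines(text: str) -> str:
    """Delete every line that contains neither marker string."""

    def replace(match):
        # Group 1 is set only when the second alternative matched.
        return "" if match.group(1) is not None else match.group(0)

    return KEEP_OR_DROP.sub(replace, text)
```

A final line without a trailing `\n` matches neither alternative and is left untouched.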


                            Finally, I admit that your final try, for the same purpose :

                            • SEARCH (?-s)^.*?((Product Name=|\.TRIGGER':=1)|\R|\z)

                            • REPLACE $2

                            seems the best one !

                            I also thought of this equivalent syntax :

                            • SEARCH (?x-s) ^ (?! .* (?: Product[ ]Name= | \. TRIGGER':=1 ) ) .* \R

                            • REPLACE Leave EMPTY

                            Where the search comes from (?x-s) ^ (?! (?: .* Product[ ]Name= | .* \. TRIGGER':=1 ) ) .* \R


                            Note that the regex part :

                            
                                 (?! (?: .* Product[ ]Name= | .* \. TRIGGER':=1 ) )
                                         ------------------   -----------------
                                                  A                B
                            

                            Means, in a logical way :

                            
                                 NOT ( ( .* Product Name=   OR   .* .TRIGGER':=1 ) )
                                         ----------------        ---------------
                                                  A                     B
                            

                            which, in turn, can be expressed as :

                                 NOT ( .* Product Name= )   AND   NOT ( .* .TRIGGER':=1 )
                                       ----------------                 ---------------
                                              A                                B
                            

                            Thus, meaning :

                            Consider each line, together with its EOL chars, which contains NEITHER the Product Name= NOR the .TRIGGER':=1 string, and delete it !
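The equivalence between the single negative look-ahead and two separate ones can be checked with a small Python sketch. Two adaptations are assumptions on my side: `\R` becomes `\n` because Python's `re` has no `\R` escape, and the `m` flag is set explicitly so that `^` matches at every line start. The names `COMBINED`, `SEPARATE` and `delete_lines` are made up for illustration.

```python
import re

# NOT ( A OR B ), written as one negative look-ahead
COMBINED = re.compile(
    r"(?mx) ^ (?! .* (?: Product[ ]Name= | \. TRIGGER':=1 ) ) .* \n"
)

# NOT A  AND  NOT B, written as two consecutive negative look-aheads
SEPARATE = re.compile(
    r"(?mx) ^ (?! .* Product[ ]Name= ) (?! .* \. TRIGGER':=1 ) .* \n"
)


def delete_lines(pattern, text: str) -> str:
    """Remove every line (with its line break) matched by the pattern."""
    return pattern.sub("", text)
```

On any input where each line ends with `\n`, both patterns should delete exactly the same lines.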

                            Best Regards,

                            guy038

                            • C
                              Coises @guy038
                              last edited by Oct 22, 2023, 5:38 PM

                              @guy038 said in Select/Export certain text that appears above a different set of certain text - Two separate find queries:

                              https://www.rexegg.com/backtracking-control-verbs.html#skipfail

                              That entire page is very useful. Thanks!

                              I was reading the Boost regex documentation to which Notepad++ documentation links and experimenting, trying to figure out what those terse descriptions meant. I was sure there should be a way to reduce the matching complexity with verbs, but I didn’t really know how they work.

                              (*SKIP)(*FAIL) is clearly the better choice for this case.

                              • G
                                guy038
                                last edited by Oct 22, 2023, 10:06 PM

                                Hello, @coises and All,

                                You may also refer to this topic, which I wrote after a new version of the Boost regex library came out and became available within Notepad++ :

                                https://community.notepad-plus-plus.org/topic/19632/new-backtracking-control-verbs-feature-available-since-notepad-v7-7/1

                                BR

                                guy038
