Random line search

13 Posts 3 Posters 1.4k Views
  • R
    Richard Lohr
    last edited by Feb 8, 2023, 5:47 PM

    I am using data from OpenStreetMap, USFS, and BLM to create KML files of roads, trails, and POIs for Google Earth Pro. Google Earth tends to choke as the Myplaces.kml file gets very large, so much of what I need to do is delete or revise sections to make the files smaller.

    Listed below is a sample of my problem. In this case I was using the line <SimpleData name="surface">asphalt</SimpleData> to change the color and width of the line based on the surface type being "asphalt". The problem for me is that the "surface" designation is not always on the same line, and the number of lines between the <Placemark> statements can vary considerably. It is not unusual for me to have to write 4-8 different regexes to handle this problem. Another scenario is that I delete a lot of the Placemark groups based on a test of a data field within the section. I have only been working with Notepad++ and regexes for less than 6 months, so I would be eternally grateful if someone could provide some tips on how to handle this scenario.

    <Placemark>
    <Style><LineStyle><color>ff95fcff</color><width>2</width></LineStyle></Style>
    <ExtendedData><SchemaData schemaUrl="#Hwy_Primary__UT_">
    <SimpleData name="highway">primary</SimpleData>
    <SimpleData name="maxspeed">55 mph</SimpleData>
    <SimpleData name="surface">asphalt</SimpleData>
    <SimpleData name="ref">UT 21</SimpleData>
    </SchemaData></ExtendedData>
    <LineString><coordinates>-113.0009,38.3766 -113.00039,38.3758 -112.99967,38.37472 -112.99803,38.37212</coordinates></LineString>
    </Placemark>

    <Placemark>
    <name>South 100 East Street</name>
    <Style><LineStyle><color>ff95fcff</color><width>2</width></LineStyle></Style>
    <ExtendedData><SchemaData schemaUrl="#Hwy_Primary__UT_">
    <SimpleData name="highway">primary</SimpleData>
    <SimpleData name="tiger:name_type">St</SimpleData>
    <SimpleData name="tiger:name_base">State Route 21;100 East;100 East;100 East</SimpleData>
    <SimpleData name="tiger:county">Beaver, UT</SimpleData>
    <SimpleData name="maxspeed">55 mph</SimpleData>
    <SimpleData name="surface">asphalt</SimpleData>
    <SimpleData name="smoothness">excellent</SimpleData>
    <SimpleData name="ref">UT 21</SimpleData>
    </SchemaData></ExtendedData>
    <LineString><coordinates>-113.00662,38.38557 -113.00613,38.38482 -113.00467,38.38253</coordinates></LineString>
    </Placemark>

    • P
      PeterJones @Richard Lohr
      last edited by Feb 8, 2023, 6:02 PM

      @Richard-Lohr ,

      The Generic Regex Formulas => Replacing Text in a Specific Zone formula may help you.

      For changing color/width based on surface being asphalt, you might want to use a combo of positive and negative lookaheads that say that after the <LineStyle> section, it must contain "surface">asphalt< before it hits </Placemark>. My first attempt (not tried) would be something like (?s)<LineStyle>.*?</LineStyle>(?=((?!</Placemark).)*"surface">asphalt<) … actually, that will probably extend the first group a lot farther than expected, so might want to use (?s)<LineStyle>((?!</Placemark).)*?</LineStyle>(?=((?!</Placemark).)*"surface">asphalt<) instead
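For readers who want to try the second expression outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module treats these lookaheads the same way Notepad++'s regex engine does for this pattern. The second (gravel) placemark and the replacement style `NEW_STYLE` are made up for illustration and are not from the original data.

```python
import re

# Two trimmed placemarks: the first is asphalt, the second (gravel) is a
# hypothetical record added only to show that it is left alone.
KML = '''<Placemark>
<Style><LineStyle><color>ff95fcff</color><width>2</width></LineStyle></Style>
<ExtendedData><SchemaData schemaUrl="#Hwy_Primary__UT_">
<SimpleData name="highway">primary</SimpleData>
<SimpleData name="surface">asphalt</SimpleData>
</SchemaData></ExtendedData>
<LineString><coordinates>-113.0009,38.3766 -113.00039,38.3758</coordinates></LineString>
</Placemark>
<Placemark>
<Style><LineStyle><color>ff95fcff</color><width>2</width></LineStyle></Style>
<ExtendedData><SchemaData schemaUrl="#Hwy_Primary__UT_">
<SimpleData name="highway">primary</SimpleData>
<SimpleData name="surface">gravel</SimpleData>
</SchemaData></ExtendedData>
<LineString><coordinates>-113.00662,38.38557 -113.00613,38.38482</coordinates></LineString>
</Placemark>
'''

# The second expression from the post: match a <LineStyle> section only if
# "surface">asphalt< appears later, before the closing </Placemark>.
PATTERN = re.compile(
    r'(?s)<LineStyle>((?!</Placemark).)*?</LineStyle>'
    r'(?=((?!</Placemark).)*"surface">asphalt<)'
)

# Hypothetical replacement style (red, width 4); pick whatever you need.
NEW_STYLE = '<LineStyle><color>ff0000ff</color><width>4</width></LineStyle>'

result, count = PATTERN.subn(NEW_STYLE, KML)
print(count)
```

Only the asphalt placemark is restyled, because the lookahead for the gravel record reaches `</Placemark` without ever seeing `"surface">asphalt<`.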

      • R
        Richard Lohr @PeterJones
        last edited by Feb 8, 2023, 6:48 PM

        @PeterJones Thank You sooo Much! I will be able to use this for so much of the stuff that I am doing.

        • A
          Alan Kilborn
          last edited by Feb 8, 2023, 7:05 PM

          Just a regex tip inspired by Peter’s discussion:

          It is almost always a good idea to use the construct x*? (x could be literal x or some other regex) right out of the gate, when crafting a new regex, rather than most people’s first impulse to use the shorter-to-type x*.

          The *? version will match minimally, whereas the * version will match maximally, to satisfy its craving for a match.

          If *? isn't meeting the need, then consider a potential switch as you refine the regex.
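A quick illustration of the difference, sketched in Python (the sample string is made up; the same greedy and lazy behavior applies in Notepad++'s regex search):

```python
import re

# Made-up text containing two <width> elements.
text = '<width>2</width> ... <width>5</width>'

# Greedy: .* grabs as much as it can, so the match runs to the LAST </width>.
greedy = re.search(r'<width>.*</width>', text).group(0)

# Lazy: .*? grabs as little as it can, so the match stops at the FIRST </width>.
lazy = re.search(r'<width>.*?</width>', text).group(0)

print(greedy)
print(lazy)
```

The greedy version swallows both elements in one match, which is usually not what is wanted when the goal is to edit one record at a time.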

          • R
            Richard Lohr @Alan Kilborn
            last edited by Feb 8, 2023, 9:07 PM

            @Alan-Kilborn Thanks for the help Alan! It’s so amazing that there are people like you and Peter who are willing to share your time and expertise to help people! I spent quite a bit of time trying to find an easier way to accomplish the above, and didn’t get close.

            • A
              Alan Kilborn @Richard Lohr
              last edited by Feb 8, 2023, 9:15 PM

              @Richard-Lohr said in Random line search:

              amazing that there are people like you and Peter who are willing to share your time and expertise to help people

              Ha, well, not everyone likes us and our replies. :-)

              I have, and maybe it is the same for Peter, free cycles during the day while I do other things that don't keep me 100% utilized. Doing N++ stuff, including helping out here, is a nice diversion and time-filler. And why not use your free cycles to maybe do some good, right?

              • R
                Richard Lohr @Alan Kilborn
                last edited by Feb 11, 2023, 6:54 PM

                When I initially received the solution, I tried it, and it worked. Then I attempted to use the expression to finalize my files and found that it will not process all of the records to the EOF. Once I invoke the search, it usually processes 5-11 records and then returns an error message: "Find: Invalid regular expression". If I use the find function to find the next occurrence of "asphalt" and attempt the search and replace again, it processes one or two records and then returns that same error message.

                I did study Guy's writeup, and I have since done a lot of reading about lookaheads, but I have found nothing that points me in the right direction.

                Additional Info

                • Using N++ v8.4.8
                • File was originally 3.1 million lines; I reduced it to 9,500 lines and the symptoms are exactly the same.
                • There were a total of 941 occurrences of “asphalt” in file.
                • “ALL” of the records in the files are exactly the same format as I initially provided.
                • A
                  Alan Kilborn @Richard Lohr
                  last edited by Feb 11, 2023, 7:02 PM

                  @Richard-Lohr

                  I have no reason to doubt you, but I’ve never seen a situation where “invalid regular expression” is returned only sometimes for the same Find what data. I mean, if an expression is valid once, it should be valid for all time; it doesn’t depend upon data, AFAIK.

                  I’m talking about this specific error message, which seems to be what you’re getting as well:

                  [image: screenshot of the error message]

                  When you obtain that, can you hover over the little 3-dot speech balloon and indicate what that says; for me and my simple bad regex, I obtained:

                  [image: screenshot of the speech-balloon text]

                  • R
                    Richard Lohr @Alan Kilborn
                    last edited by Feb 11, 2023, 7:07 PM

                    @Alan-Kilborn

                    Darn, I have never hovered over that.

                    The additional info says:
                    Ran out of stack space trying to match the regular expression.

                    • A
                      Alan Kilborn @Richard Lohr
                      last edited by Alan Kilborn Feb 11, 2023, 7:10 PM Feb 11, 2023, 7:09 PM

                      @Richard-Lohr

                      OK, that error message IS data-dependent, in conjunction with the constant Find what regex. What it means is that the combination of data and expression is “too much” for the regex engine. It’s a big hint to refactor your expression so that the engine doesn’t overflow. Not an easy task for a newbie, however.
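One possible refactoring, sketched here as a Python script rather than a single Find what expression, is to split the work in two: find each Placemark block with a simple lazy pattern, then use a plain substring test to decide whether to restyle it. This is an alternative technique, not the approach from the thread. The names `restyle_asphalt`, `SAMPLE`, and `NEW_STYLE` are hypothetical, and the sketch assumes every record sits between `<Placemark>` and `</Placemark>` as in the sample.

```python
import re

# Simple lazy patterns; no lookaheads are needed with this approach.
PLACEMARK = re.compile(r'<Placemark>.*?</Placemark>', re.DOTALL)
LINESTYLE = re.compile(r'<LineStyle>.*?</LineStyle>', re.DOTALL)

def restyle_asphalt(kml, new_style):
    """Replace the first <LineStyle> of every Placemark whose surface is asphalt."""
    def fix_block(match):
        block = match.group(0)
        # Plain substring test instead of a regex lookahead.
        if '"surface">asphalt<' not in block:
            return block
        # A function replacement avoids backslash handling in new_style.
        return LINESTYLE.sub(lambda _: new_style, block, count=1)
    return PLACEMARK.sub(fix_block, kml)

# Trimmed sample: one asphalt record and one hypothetical record with no surface tag.
SAMPLE = (
    '<Placemark>\n'
    '<Style><LineStyle><color>ff95fcff</color><width>2</width></LineStyle></Style>\n'
    '<SimpleData name="surface">asphalt</SimpleData>\n'
    '</Placemark>\n'
    '<Placemark>\n'
    '<Style><LineStyle><color>ff95fcff</color><width>2</width></LineStyle></Style>\n'
    '<SimpleData name="highway">primary</SimpleData>\n'
    '</Placemark>\n'
)

# Hypothetical replacement style.
NEW_STYLE = '<LineStyle><color>ff0000ff</color><width>4</width></LineStyle>'

restyled = restyle_asphalt(SAMPLE, NEW_STYLE)
print(restyled)
```

The same idea also covers the "delete Placemark groups based on a data field" scenario: return an empty string from `fix_block` for blocks that fail the test.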

                      • R
                        Richard Lohr @Alan Kilborn
                        last edited by Feb 11, 2023, 7:12 PM

                        @Alan-Kilborn Not just a newbie, a 66-year-old who wishes for the days when he still had half a brain. ;-)

                        • A
                          Alan Kilborn @Richard Lohr
                          last edited by Feb 11, 2023, 7:18 PM

                          @Richard-Lohr

                          OK, well, you’ve got 10 years on me, but we ain’t dead yet as they say…

                          The problem with posting here with a general question is that respondents can only “take their best shot” at the solution; they don’t have your exact data situation.

                          And it is very likely that something in your specific data is causing part of the regex to "go haywire", and do some catastrophic (as far as computer memory usage) backtracking.

                          On one hand, it is nice that this can be detected, rather than just Notepad++ crashing. On the other hand, it’s nice to have things just “work”.

                          • R
                            Richard Lohr @Alan Kilborn
                            last edited by Feb 11, 2023, 7:36 PM

                            @Alan-Kilborn I just changed the “</Placemark” statement to “</SchemaData></ExtendedData”, and the expression works. Don’t know exactly why.

                            I wonder why it is some people lose their mental acuity, while others don't? I am an extremely serious hiker and have remained extremely strong with high endurance. If I had to make a choice, I think that I would rather lose the mental part; people understand it when you tell them that it's because you're old.

                            Again, Thank you!
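One plausible explanation for why the change helps (not confirmed in the thread): the lookahead `((?!</Placemark).)*` is greedy, so it first walks one character at a time all the way to `</Placemark`, which includes the whole `<coordinates>` line, before backing up to find `"surface">asphalt<`. Stopping at `</SchemaData></ExtendedData` ends that walk before the coordinates. The Python sketch below only measures how many characters the walk covers for each stop string; the 2000-point track is made up, and how the walk length translates into stack usage inside Notepad++ is an assumption.

```python
import re

# Hypothetical long track: 2000 coordinate pairs on one line.
coords = ' '.join(['-113.0009,38.3766'] * 2000)
block = (
    '<Placemark>\n'
    '<Style><LineStyle><color>ff95fcff</color><width>2</width></LineStyle></Style>\n'
    '<ExtendedData><SchemaData schemaUrl="#Hwy_Primary__UT_">\n'
    '<SimpleData name="surface">asphalt</SimpleData>\n'
    '</SchemaData></ExtendedData>\n'
    f'<LineString><coordinates>{coords}</coordinates></LineString>\n'
    '</Placemark>\n'
)

def tempered_span(stop):
    """Length of the greedy run ((?!stop).)* starting right after </LineStyle>."""
    start = block.index('</LineStyle>') + len('</LineStyle>')
    run = re.compile(r'(?s)(?:(?!' + re.escape(stop) + r').)*')
    return len(run.match(block, start).group(0))

long_run = tempered_span('</Placemark')
short_run = tempered_span('</SchemaData></ExtendedData')
print(long_run, short_run)
```

With the longer stop string the walk covers only the few `<SimpleData>` lines, while with `</Placemark` it grows with the length of the coordinate list, so records with long tracks would be the ones most likely to trigger the error.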
