    Regex: Find only one line, from 2 similar lines (html tags)

    Help wanted
    15 Posts 4 Posters 608 Views
    • Robin Cruise

      I have this regex; it finds only the meta HTML tags that contain at least 3 of these words:

      <meta name="description" content=.*(( the | that | of ).*){3,}.*>

      The problem:

      I have these 2 similar lines. Both contain the same words, except that in the second line one “the” is in a different place. So why does my regex find only the second line, and not the first one as well? How can I change the regex so that it finds both lines?

      <meta name="description" content="the mystery of the art that seeks its meaning.">

      <meta name="description" content="the mystery of art that seeks the its meaning.">
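The behaviour described above can be reproduced with a short sketch. It uses Python's `re` module as a stand-in for Notepad++'s regex engine. That substitution is an assumption, though for a simple pattern like this one the two engines should agree. The pattern is copied verbatim from the question.

```python
import re

# The pattern from the question; the spaces around each word are literal.
PATTERN = r'<meta name="description" content=.*(( the | that | of ).*){3,}.*>'

line1 = '<meta name="description" content="the mystery of the art that seeks its meaning.">'
line2 = '<meta name="description" content="the mystery of art that seeks the its meaning.">'

for label, line in (("line 1", line1), ("line 2", line2)):
    # re.search looks for the pattern anywhere in the line, like a plain Find
    print(label, bool(re.search(PATTERN, line)))
# line 1 False
# line 2 True
```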

      • Alan Kilborn @Robin Cruise

        @Robin-Cruise said in Regex: Find only one line, from 2 similar lines (html tags):

        How can I change the regex so as to find both lines?

        I would see the advice HERE for the “and” case.

        You can certainly add in <meta name="description" content= after the ^.
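The link behind “HERE” did not survive. A common way to express “and” (all of several words on one line) is a chain of lookaheads placed after `^`. The sketch below assumes that is the kind of pattern meant, with `<meta name="description" content=` added after the `^`. Python's `re` stands in for Notepad++'s engine here. Note that this checks that each of the three words appears at least once, which is not quite the same as “at least 3 occurrences of any of these words”.

```python
import re

# Assumed "and" pattern: each (?=...) lookahead checks, from the same position,
# that one word occurs somewhere later on the line. Lookaheads consume nothing,
# so one check cannot use up a character another check needs.
AND_PATTERN = (
    r'^<meta name="description" content='
    r'(?=.*\bthe\b)(?=.*\bthat\b)(?=.*\bof\b)'
    r'.*>'
)

lines = [
    '<meta name="description" content="the mystery of the art that seeks its meaning.">',
    '<meta name="description" content="the mystery of art that seeks the its meaning.">',
    '<meta name="description" content="the mystery of art.">',  # no "that"
]

for line in lines:
    print(bool(re.search(AND_PATTERN, line)))
# True
# True
# False
```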

        • Robin Cruise

          @Robin-Cruise said in Regex: Find only one line, from 2 similar lines (html tags):

          <meta name="description" content=.*(( the | that | of ).*){3,}.*>

          I don’t think that can be applied in my case

          • Alan Kilborn @Robin Cruise

            @Robin-Cruise said in Regex: Find only one line, from 2 similar lines (html tags):

            I don’t think that can be applied in my case

            I would NOT have suggested it if it couldn’t be applied to your case, as you’ve stated your case, and as I understand it.

            • astrosofista @Robin Cruise

              @Robin-Cruise said in Regex: Find only one line, from 2 similar lines (html tags):

              <meta name="description" content=.*(( the | that | of ).*){3,}.*>

              I think it’s easy to understand why the first line isn’t matched.

              <meta name="description" content="the mystery of the art that seeks its meaning.">
              

              The first “the” is not matched because there is no space before it (it follows the opening quote). Then " of " is matched (notice the spaces surrounding it), but the second “the” isn’t matched, again because it lacks a space before it: the previous match, " of ", already consumed that space. Finally, " that " is matched, but that gives only two matches, not the required three.
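The space consumption described above can be made visible with `re.findall`, which returns non-overlapping matches scanned from left to right. That is the same way the repeated group in the full pattern uses up text. A sketch with Python's `re` as a stand-in for Notepad++'s engine (an assumption):

```python
import re

line1 = '<meta name="description" content="the mystery of the art that seeks its meaning.">'
line2 = '<meta name="description" content="the mystery of art that seeks the its meaning.">'

# The alternation from the question, spaces included. Once " of " has used the
# space after "of", that space cannot also serve as the leading space of " the ".
WORD = r' (the|that|of) '

print(re.findall(WORD, line1))  # ['of', 'that']  -> only 2, so {3,} fails
print(re.findall(WORD, line2))  # ['of', 'that', 'the']  -> 3, so it matches
```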

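              One way to solve the issue is to remove the spaces and surround the group with the word-boundary assertion \b, which matches a position between a word character and a non-word character without consuming any text. See the details in the documentation.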

              Just to be clearer:

              <meta name="description" content=.*(\b(the|that|of)\b.*){3,}.*>
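The fix can be checked with a sketch that runs both patterns against the two sample lines. It again uses Python's `re` in place of Notepad++'s engine, on the assumption that they agree for these patterns.

```python
import re

ORIGINAL = r'<meta name="description" content=.*(( the | that | of ).*){3,}.*>'
# \b is zero-width: it tests a position instead of consuming a space,
# so adjacent words such as "of the" can both be counted.
FIXED = r'<meta name="description" content=.*(\b(the|that|of)\b.*){3,}.*>'

line1 = '<meta name="description" content="the mystery of the art that seeks its meaning.">'
line2 = '<meta name="description" content="the mystery of art that seeks the its meaning.">'

for name, pattern in (("original", ORIGINAL), ("fixed", FIXED)):
    print(name, [bool(re.search(pattern, l)) for l in (line1, line2)])
# original [False, True]
# fixed [True, True]
```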
              

              HTH

              • Alan Kilborn @astrosofista

                @astrofist

                I was hoping not to give too much of a “stop-all-thinking-here’s-your-solution” to the OP, a known and repetitive data manipulation “taker”. Thus my pointing to the “formula” for how to do what OP needs, with an implied “go off and try it”.

                I believe we have to continue to promote learning.
                And perhaps some day the takers actually will learn and we’ll have such noise here less and less (because they actually WILL start solving their own problems and not need to post).
                Hmmm, maybe this is wishful thinking.

                However, your info about the spaces was good.
                Spaces in a regex are literal characters that must be matched, unless the (?x) (free-spacing) modifier is used.
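Here is a small illustration of the `(?x)` modifier. Python's `re` accepts the same inline `(?x)` (verbose mode) and is used here as a stand-in for Notepad++'s engine. The assumption is that both ignore unescaped pattern whitespace in this mode.

```python
import re

text = "the mystery of the art"

# Default mode: the spaces around each word are literal characters to match.
literal = re.findall(r' (the|of) ', text)

# (?x) free-spacing mode: unescaped whitespace in the pattern is ignored,
# so " ( the | of ) " behaves like "(the|of)".
free_spacing = re.findall(r'(?x) ( the | of ) ', text)

# In (?x) mode, a space that really must be matched has to be escaped.
escaped = re.findall(r'(?x) \ (the|of)\ ', text)

print(literal)       # ['of']
print(free_spacing)  # ['the', 'of', 'the']
print(escaped)       # ['of']
```

A caution: turning on `(?x)` for the whole pattern from the question would also discard the spaces inside `<meta name="description" content=`, so those would need escaping too.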

                    • Alan Kilborn @astrosofista

                    @astrosofista

                    What happened to your final “a”? :-)

                    • Robin Cruise

                      thanks @astrosofista

                      • Alan Kilborn @Alan Kilborn

                        @Alan-Kilborn said in Regex: Find only one line, from 2 similar lines (html tags):

                        What happened to your final “a”? :-)

                        I guess it is more than the final “a” that changed. :-)
                        Or… maybe it is still changing.
                        Personally, I don’t like when people change their user name here, even slightly.
                        It just confuses what I’m used to.
                        I thought about removing the space between Alan and Kilborn and couldn’t decide conclusively if that was a good or bad idea.
                        I notice that when searching for users whose names contain a space, the user doesn’t appear in the popup suggestion list (that’s why I was considering dropping the space).

                        • Terry R

                          @Alan-Kilborn said in Regex: Find only one line, from 2 similar lines (html tags):

                          Personally, I don’t like when people change their user name here, even slightly.
                          It just confuses what I’m used to.
                          I thought about removing the space between Alan and Kilborn and couldn’t decide conclusively if that was a good or bad idea.

                          Sounds like you are in 2 minds on the matter. ;-))
                          I admit it was an issue when I first started posting: getting the right name when typing the @. I noticed just now that with your “handle” I can type @k and you come right to the top, even though the k is further along in the string. So there does seem to be some intelligence in the lookup.

                          It’s also inconsistent that names with spaces are displayed next to our icons, yet when referencing users the system insists on replacing the spaces with -.

                          Terry

                          PS keep it as it is!

                          • astrosofista @Alan Kilborn

                            @Alan-Kilborn

                            Yes, I am aware of OP’s behavior and in fact I believe this is the first time I have responded to one of his posts. However, I think my response was also educational, as I explained to him why his regular expression was failing. It was failing because of something simple to understand, but which for some reason eluded OP.

                            Since each term required a space on both sides, two consecutive terms such as “of the” would have needed two spaces between them for both to match. Since there was only one, the regex failed.

                            The lesson here, and I hope OP will learn it and apply it from here on out, is to always be aware of the position of the reading head as it moves through the string. This would prevent a lot of trouble and frustration.
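The “reading head” idea can be shown by printing each match with its span. `m.end()` is where scanning resumes for the next match. This is a sketch with Python's `re`; `match_spans` is a hypothetical helper written only for this illustration.

```python
import re

# The term pattern from the question, with its literal spaces.
WORD = re.compile(r' (the|that|of) ')

def match_spans(text):
    """Hypothetical helper: return (word, start, end) for each match.
    The next search always begins at the previous match's end."""
    return [(m.group(1), m.start(), m.end()) for m in WORD.finditer(text)]

print(match_spans(" of the "))   # [('of', 0, 4)]  -> the shared space is gone
print(match_spans(" of  the "))  # [('of', 0, 4), ('the', 4, 9)]
```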

                            As for why I posted a solution, well, the explanation is also simple: I couldn’t resist :)

                            • astrosofista @Alan Kilborn

                              @Alan-Kilborn

                              Nope, I didn’t change my nickname. I’m still astrosofista. I don’t know what could have happened.

                              • Alan Kilborn

                                I actually thought the OP was putting the extra spaces in for some sort of emphasis, even though they used this type of markup on it. I don’t know, posters do weird things sometimes. That’s why I didn’t even consider the spacing originally.

                                Something strange is going on.
                                While I was posting earlier, I saw your username being shown as “astrofist” and even “astrophista”! It was weird!
                                Now you are back where you belong as “astrofista”. BTW, is there any meaning to that name? Maybe you are an astrophysicist?

                                • astrosofista @Alan Kilborn

                                  @Alan-Kilborn

                                  My guess is that OP used the spaces as a sort of word delimiter, but who knows.

                                  astrosofista is the nick I used on Twitter for an account that was indeed about space-related topics. Since I used that account to register for this forum, I left the same nick.

                                  And although I like astronomy very much, I am not an astrophysicist. My academic studies are in philosophy. I have been teaching an introductory course in propositional logic and philosophy of science for twenty years. And now I am close to retirement, so I will have more time to play with regex, scripting and the like.

                                  The Community of users of the Notepad++ text editor.
                                  Powered by NodeBB | Contributors