Regex: Find only one line, from 2 similar lines (html tags)



  • @Robin-Cruise said in Regex: Find only one line, from 2 similar lines (html tags):

    <meta name="description" content=.(( the | that | of ).){3,}.*>

    I don’t think that can be applied in my case



  • @Robin-Cruise said in Regex: Find only one line, from 2 similar lines (html tags):

    I don’t think that can be applied in my case

    I would NOT have suggested it if it couldn’t be applied to your case, as you’ve stated your case, and as I understand it.



  • @Robin-Cruise said in Regex: Find only one line, from 2 similar lines (html tags):

    <meta name="description" content=.(( the | that | of ).){3,}.*>

    I think it’s easy to understand why the first line isn’t matched.

    <meta name="description" content="the mystery of the art that seeks its meaning.">
    

    The first “the” is not matched because it lacks a space before it. Then " of " is matched (notice the spaces surrounding it), but the second “the” isn’t matched, again because it lacks a space before it: the previous match, " of ", already consumed the required space. Finally, " that " is matched, so you get only two matches, not the required three.

    One way to solve the issue is to remove the spaces and surround the group with the word-boundary assertion \b. See the details in the documentation.

    Just to be clearer:

    <meta name="description" content=.*(\b(the|that|of)\b.*){3,}.*>
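
    For anyone who wants to check both expressions outside Notepad++, here is a minimal sketch in Python. Python's `re` module is only a stand-in (an assumption; Notepad++ uses a different regex engine), but these particular constructs should behave the same way. The variable names are made up for illustration.

```python
import re

# The OP's expression: every keyword needs a literal space on both sides.
ORIGINAL = r'<meta name="description" content=.(( the | that | of ).){3,}.*>'

# The suggested expression: word boundaries instead of literal spaces.
FIXED = r'<meta name="description" content=.*(\b(the|that|of)\b.*){3,}.*>'

line = ('<meta name="description" '
        'content="the mystery of the art that seeks its meaning.">')

# True when the expression finds the line, False otherwise.
original_hit = re.search(ORIGINAL, line) is not None
fixed_hit = re.search(FIXED, line) is not None

print("original matches:", original_hit)
print("fixed matches:", fixed_hit)
```

    Only the second expression finds the sample line, because \b matches a position and does not use up any characters.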
    

    HTH



  • @astrofist

    I was hoping not to give too much of a “stop-all-thinking-here’s-your-solution” to the OP, a known and repetitive data manipulation “taker”. Thus my pointing to the “formula” for how to do what OP needs, with an implied “go off and try it”.

    I believe we have to continue to promote learning.
    And perhaps some day the takers actually will learn and we’ll have such noise here less and less (because they actually WILL start solving their own problems and not need to post).
    Hmmm, maybe this is wishful thinking.

    However, your info about the spaces was good.
    Regex is sensitive to such extra spaces unless the (?x) directive is used.
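
    As a small illustration of that point, the sketch below compares the same alternation with and without (?x). It uses Python's `re` module, which also accepts (?x) at the start of a pattern; treating it as a stand-in for the Notepad++ engine is an assumption, and the names are hypothetical.

```python
import re

text = "the art"

# Without (?x) the spaces are literal characters that must be present.
literal_spaces = re.search(r"( the | that | of )", text)

# With (?x) unescaped whitespace in the pattern is ignored, so the
# alternation is effectively (the|that|of).
free_spacing = re.search(r"(?x)( the | that | of )", text)

print(literal_spaces)          # no match: "the" has no space before it
print(free_spacing.group(0))   # the matched keyword
```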



  • @astrosofista

    What happened to your final a ? :-)



  • thanks @astrosofista



  • @Alan-Kilborn said in Regex: Find only one line, from 2 similar lines (html tags):

    What happened to your final a ? :-)

    I guess it is more than the final a that changed. :-)
    Or…is maybe still changing.
    Personally, I don’t like when people change their user name here, even slightly.
    It just confuses what I’m used to.
    I thought about removing the space between Alan and Kilborn and couldn’t decide conclusively if that was a good or bad idea.
    I notice that when I search for a user whose name contains a space, the user doesn’t appear in the popup suggestion list (that’s why I was considering a change to drop the space).



  • @Alan-Kilborn said in Regex: Find only one line, from 2 similar lines (html tags):

    Personally, I don’t like when people change their user name here, even slightly.
    It just confuses what I’m used to.
    I thought about removing the space between Alan and Kilborn and couldn’t decide conclusively if that was a good or bad idea.

    Sounds like you are in 2 minds on the matter. ;-))
    I admit it was an issue when I first started posting trying to get the right name when typing the @. I noted just now that with your “handle” I can type @k and you come right to the top, even though the k is further down the string. So there does seem to be some intelligence with the lookup table.

    It’s also not consistent when it allows the names with spaces against our icons, yet when referencing users the system insists on replacing spaces with -.

    Terry

    PS keep it as it is!



  • @Alan-Kilborn

    Yes, I am aware of OP’s behavior and in fact I believe this is the first time I have responded to one of his posts. However, I think my response was also educational, as I explained to him why his regular expression was failing. It was failing because of something simple to understand, but which for some reason eluded OP.

    Since each term required a space, if there were two terms in a row, such as “of the”, there would have to have been two spaces between them for there to be a match. Since there were not, the regex failed.

    The lesson here, and I hope OP will learn it and apply it from here on out, is to always be aware of the position of the reading head as it moves through the string. This would prevent a lot of trouble and frustration.
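
    To make the position of the reading head visible, the following sketch lists where each keyword is found in the sample text, first with the literal spaces and then with word boundaries. It is written in Python as a stand-in for the Notepad++ engine (an assumption), and the variable names are only illustrative.

```python
import re

content = "the mystery of the art that seeks its meaning."

# Literal spaces: each match consumes the space on both sides, so the
# search resumes right after that space.
for m in re.finditer(r" the | that | of ", content):
    print(m.span(), repr(m.group(0)))

spaced = re.findall(r" the | that | of ", content)

# Word boundaries are zero-width, so no characters are used up between
# neighbouring keywords.
bounded = re.findall(r"\b(?:the|that|of)\b", content)

print(spaced)
print(bounded)
```

    With literal spaces the match " of " ends after the space that the following “the” would have needed, which is why only two keywords are found; with \b all four are found.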

    As for why I posted a solution, well, the explanation is also simple: I couldn’t resist :)



  • @Alan-Kilborn

    Nope, I didn’t change my nickname. I’m still astrosofista. I don’t know what could have happened.



  • I actually thought the OP was putting the extra spaces in for some sort of emphasis, even though they used this type of markup on it. I don’t know, posters do weird things sometimes. That’s why I didn’t even consider the spacing originally.

    Something strange is going on.
    While I was posting earlier, I saw your username being shown as “astrofist” and even “astrophista”! It was weird!
    Now you are back where you belong as “astrosofista”. BTW, is there any meaning to that name? Maybe you are an astrophysicist?



  • @Alan-Kilborn

    My guess is that OP used the spaces as a sort of word delimiter, but who knows.

    astrosofista is the nick I used on Twitter for an account that was indeed about space related topics. Since I used that account to register for this forum, I left the same nick.

    And although I like astronomy very much, I am not an astrophysicist. My academic studies are in philosophy. I have been teaching an introductory course in propositional logic and philosophy of science for twenty years. And now I am close to retirement - I will have more time to play with regex, scripting and the like.

