RegEx to find a "p" tag followed by a space and an angle bracket, but skip any capital letters and span tags



  • <p[^>]*>\s*(<)(?![A-Z])(?!span) skips the span tags, but not the words before the first opening angle bracket <



  • @Scott-Nielson For example, <p style=color: black> <head should be found, but <p style=color: black> Head< should not.



  • @Scott-Nielson

    I cannot replicate the match with your given regex on either of those two examples.

    If I remove the lookaheads, it matches the first but not the second.

    So I’m not seeing why you think it matches the wrong one. Please make sure the regex was properly shown in your message, and make sure the data displayed in your post matches what it is in your actual file.
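One possible explanation for not being able to replicate the match, offered as an assumption rather than something confirmed in the thread: Notepad++'s Find dialog searches case-insensitively unless "Match case" is ticked, and in a case-insensitive search `[A-Z]` also matches lowercase letters, so `(?![A-Z])` rejects `<head` as well. The sketch below uses Python's `re` module purely as a stand-in for Notepad++'s Boost-based engine (the constructs involved behave the same way here) and treats `re.IGNORECASE` as the equivalent of "Match case" being unticked.

```python
import re

# The two sample lines from the question: the first should match, the second should not.
wanted = "<p style=color: black> <head"
unwanted = "<p style=color: black> Head<"

with_lookaheads = r"<p[^>]*>\s*(<)(?![A-Z])(?!span)"
without_lookaheads = r"<p[^>]*>\s*(<)"

def finds(pattern, text, ignore_case):
    """Return True if the pattern matches anywhere in text."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, text, flags) is not None

# Case-sensitive: (?![A-Z]) only rejects uppercase, so "<head" is found.
print(finds(with_lookaheads, wanted, ignore_case=False))   # True
# Case-insensitive (assumed to mirror "Match case" unticked): [A-Z] now
# also matches "h", so the lookahead rejects "<head" and nothing is found.
print(finds(with_lookaheads, wanted, ignore_case=True))    # False
# Without the lookaheads, the first line matches and the second does not.
print(finds(without_lookaheads, wanted, ignore_case=True))    # True
print(finds(without_lookaheads, unwanted, ignore_case=True))  # False
```

If this is the cause, ticking "Match case" (or dropping `(?![A-Z])`, since `\s*(<)` already refuses text before the `<`) would make the behaviour consistent.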



  • @PeterJones OK, forget my RegEx. Please tell me how to find <p style=color: black> < but not <p style=color: black> Head<, <p style=color: black> Sing<, or <p style=color: blue> <span



  • @Scott-Nielson ,

    Well, the reduced one I showed earlier, <p[^>]*>\s*(<), already matches only 2 of the 4 lines, so it’s almost there.

    Then you just have to put the (?!span) back in, so that the fourth line (the final instance) no longer matches: <p[^>]*>\s*(<)(?!span)
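The two steps above can be checked against the four sample lines with a short sketch. Python's `re` module is used here only as a convenient stand-in for Notepad++'s regex engine, and each line is tested on its own.

```python
import re

samples = [
    "<p style=color: black> <",       # should be found
    "<p style=color: black> Head<",   # should be skipped
    "<p style=color: black> Sing<",   # should be skipped
    "<p style=color: blue> <span",    # should be skipped
]

reduced = r"<p[^>]*>\s*(<)"
with_span_check = r"<p[^>]*>\s*(<)(?!span)"

def matching(pattern):
    """Return the sample lines the pattern matches."""
    return [line for line in samples if re.search(pattern, line)]

print(matching(reduced))
# ['<p style=color: black> <', '<p style=color: blue> <span']
print(matching(with_span_check))
# ['<p style=color: black> <']
```

The reduced pattern already rejects the Head and Sing lines because `\s*` cannot consume letters; the `(?!span)` lookahead is only needed to drop the `<span` line.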



  • @PeterJones your RegEx skips only the span tag after the p tag, but please tell me how to find <p style=color: black> < but not <p style=color: black> Head<, <p style=color: black> Sing<, <p style=color: black> France<, or <p style=color: blue> <span



  • @PeterJones, I want one RegEx to find everything at once (and skip what I mentioned above), not 2 RegExes!



  • (deleted first attempt and rephrased)

    In the document below, which has all of your test cases,

    <p style=color: black> <
    <p style=color: black> Head<
    <p style=color: black> Sing<
    <p style=color: blue> <span
    <p style=color: black> France<
    

    … the one final regex I already gave you, <p[^>]*>\s*(<)(?!span), already matches only <p style=color: black> < and none of the other examples you have shown. My earlier screenshot showed this for every example except the new “France” one. And I’m not sure why you thought the “France” example was any different from your other two text-before-the-< examples. From the regex perspective, whether it says Head or Sing or France is completely irrelevant: if there is ANY non-whitespace text between the > that closes the p tag and the next <, it won’t match. The same has been true for every regex I have presented in this topic.
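To illustrate why the specific word does not matter, the sketch below runs the final regex over lines with different text between `>` and `<` (Python's `re` standing in for Notepad++'s engine; the word list is arbitrary). `[^>]*` cannot reach past the `>` that closes the p tag, and `\s*` can only consume whitespace, so any other character before the `<` blocks the match while spaces or tabs alone do not.

```python
import re

final = re.compile(r"<p[^>]*>\s*(<)(?!span)")

# Any non-whitespace text between the p tag's ">" and the next "<" blocks the match.
for word in ["Head", "Sing", "France", "x", "12"]:
    line = f"<p style=color: black> {word}<"
    print(word, bool(final.search(line)))   # False for every word

# Whitespace alone (spaces, a tab, or nothing) is absorbed by \s*, so these still match.
for gap in [" ", "   ", "\t", ""]:
    line = f"<p style=color: black>{gap}<"
    print(repr(gap), bool(final.search(line)))  # True for every gap
```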

    ----

    Please note: This Community Forum is not a data transformation service; you should not expect to be able to always say “I have data like X and want it to look like Y” and have us do all the work for you. If you are new to the Forum, and new to regular expressions, we will often give help on the first one or two data-transformation questions, especially if they are well-asked and you show a willingness to learn; and we will point you to the documentation where you can learn how to do the data transformations for yourself in the future. But if you repeatedly ask us to do your work for you, you will find that the patience of usually-helpful Community members wears thin. The best way to learn regular expressions is by experimenting with them yourself, and getting a feel for how they work; having us spoon-feed you the answers without you putting in the effort doesn’t help you in the long term and is uninteresting and annoying for us.



  • OK, thanks a lot @PeterJones



  • @PeterJones, if I want to find a p tag followed by a span tag, but skip an a name tag after those tags, will this RegEx work: <p[^>]*>\s*<span[^>]*>\s*(<)(?!a\s*name), or does it need to be tweaked? (I feel the \s* just before name may not be right.)
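On the `\s*` question: inside `(?!a\s*name)`, `\s*` means zero or more whitespace characters, so the lookahead rejects `<a name` but would also reject a non-HTML `<aname`. Writing `\s+` requires at least one whitespace character; on real HTML both behave the same, since `<aname` is not a tag, and `\s+` is simply stricter. The sketch below shows both variants on hypothetical sample lines, with Python's `re` standing in for Notepad++'s engine.

```python
import re

star = r"<p[^>]*>\s*<span[^>]*>\s*(<)(?!a\s*name)"
plus = r"<p[^>]*>\s*<span[^>]*>\s*(<)(?!a\s+name)"

# Hypothetical sample lines for illustration.
anchor = '<p class="x"><span> <a name="top">'
other_tag = '<p class="x"><span> <b>'
no_space = '<p class="x"><span> <aname>'

for label, pattern in [("\\s*", star), ("\\s+", plus)]:
    print(label,
          bool(re.search(pattern, anchor)),     # False: the a name tag is skipped
          bool(re.search(pattern, other_tag)),  # True: other tags are found
          bool(re.search(pattern, no_space)))   # \s* skips it, \s+ finds it
```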



  • @PeterJones I also want it to skip an a href tag. For example, it should skip <p class="MsoNormal" style="font-family: &quot;verdana&quot;; font-size: 18px; color: rgb(102, 102, 102); line-height: 18px; text-align: right;" align="right"><span style="font-family: Verdana,sans-serif;"> <a href="#" style="text-decoration: none; color: rgb(46, 150, 226);"> but it should find <p class="MsoNormal" style="font-family: &quot;verdana&quot;; font-size: 18px; color: rgb(102, 102, 102); line-height: 18px; text-align: right;" align="right"><span style="font-family: Verdana,sans-serif;"> <



  • @PeterJones This RegEx might work: <p[^>]*>\s*<span[^>]*>\s*(<)(?!a\s*name|href) but I’m waiting for your expert guidance!
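A likely reason this version does not work: `|` inside the lookahead splits it into two separate alternatives, `a\s*name` and `href`, and both are tested at the position immediately after `<`. At that position the text is `a href...`, not `href...`, so the `href` branch never fires and the a href line is still found. The sketch below demonstrates this on the two example lines from the question, with Python's `re` standing in for Notepad++'s engine.

```python
import re

# The two lines from the question: the first should be skipped, the second found.
prefix = ('<p class="MsoNormal" style="font-family: &quot;verdana&quot;; '
          'font-size: 18px; color: rgb(102, 102, 102); line-height: 18px; '
          'text-align: right;" align="right">'
          '<span style="font-family: Verdana,sans-serif;"> ')
skip_me = prefix + '<a href="#" style="text-decoration: none; color: rgb(46, 150, 226);">'
find_me = prefix + '<'

alternation = r"<p[^>]*>\s*<span[^>]*>\s*(<)(?!a\s*name|href)"

# After "<" the text is 'a href=...', so neither 'a\s*name' nor a bare 'href'
# matches there; the lookahead succeeds and the a href line is NOT skipped.
print(bool(re.search(alternation, skip_me)))  # True (unwanted match)
print(bool(re.search(alternation, find_me)))  # True
```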



  • @PeterJones The RegEx I typed just above is not working, so I will need your help. Please help! This seems better, <p[^>]*>\s*<span[^>]*>\s*(<)(?!a\s*name)(?!a\s*href), but is it right?
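Stacking two negative lookaheads at the same position means both must succeed, so the `<` is rejected if either `a\s*name` or `a\s*href` follows it. The same condition can also be written as one lookahead with the `a` factored out of a non-capturing group, `(?!a\s*(?:name|href))`. The sketch below checks both forms against the question's examples plus a hypothetical a name line, with Python's `re` standing in for Notepad++'s engine.

```python
import re

prefix = ('<p class="MsoNormal" style="font-family: &quot;verdana&quot;; '
          'font-size: 18px; color: rgb(102, 102, 102); line-height: 18px; '
          'text-align: right;" align="right">'
          '<span style="font-family: Verdana,sans-serif;"> ')
skip_href = prefix + '<a href="#" style="text-decoration: none; color: rgb(46, 150, 226);">'
skip_name = prefix + '<a name="top">'   # hypothetical extra case
find_me = prefix + '<'

two_lookaheads = r"<p[^>]*>\s*<span[^>]*>\s*(<)(?!a\s*name)(?!a\s*href)"
grouped = r"<p[^>]*>\s*<span[^>]*>\s*(<)(?!a\s*(?:name|href))"

for pattern in (two_lookaheads, grouped):
    print([bool(re.search(pattern, s)) for s in (skip_href, skip_name, find_me)])
    # [False, False, True] for both patterns
```

The grouped form is a matter of taste; it keeps the shared `a\s*` prefix in one place if more attributes need to be excluded later.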



  • @Scott-Nielson ,

    “This RegEx might work”

    Try it and see. We cannot tell you if it will actually meet your needs.

    “I’m waiting for your expert guidance!”

    That seems an inefficient way to solve your problems.

    “The RegEx I typed just above is not working”

    Good for you for trying. Sorry it didn’t work. Maybe try reading some more of the docs, or breaking the problem down into smaller pieces and try to get those smaller pieces working right before putting them together.

    “This seems better, but is it right?”

    How can we know? You give minor examples, but whether it matches all the appropriate data for your data set or not, including exceptions that you don’t show us, is not something that any of us here can tell you.

    With regex, you have to spend the effort and try to figure it out. In every one of the half-dozen or more regex questions you’ve asked in the last few months, when I or one of the other regulars here who answer regex questions have answered, we have had to experiment to get it – it’s not like we can magically say “abracadabra” and the right regex suddenly appears. Each time I’ve helped you, I take the bits of regex syntax knowledge I have, try combinations of those terms that seem like they’d do the job, and test them; if each piece does what I think, then I expand it to the next requirement, until it seems to match (in which case, job’s done, yay!) or until it stops matching parts that I think it should, at which point I back up and try a different tactic. The only way you’re going to learn regex is by figuring out what the individual pieces do, and then using those pieces to solve the problems you have. You are not going to learn if I keep on handing you the answers.

    The reasons I am in this forum are to 1) learn more about Notepad++, 2) help others learn more about Notepad++, and 3) enjoy myself while doing so. Unfortunately, repeated questions from the same person, just asking us to solve their data transformation needs, don’t tick boxes #1 or #3 for me; and after a certain number of answers, it becomes obvious that the individual doesn’t want to learn or isn’t able to learn from the way I answer, so box #2 is failing. If I fail at #2 enough, then eventually I have to leave room for someone else to succeed on #2 with that person.

    So I wish you all the best, but my teaching style doesn’t seem to be helping you, so it’s not doing either of us any good for me to continue. Maybe someone else here will be able to help you learn where I could not. Good luck.



  • @PeterJones the last RegEx I typed above worked for me. I thank you for your time, patience, help and support. I was not sure about it and thought I should ask, but since you dislike it, next time I will ask for a solution here only if I can’t figure out what to do on my own. Thanks again.

