
Regex: I want to check if the words between the tags <span class> </span> start with diacritics

    Vasile Caraus
    last edited by Vasile Caraus Aug 7, 2019, 8:09 AM Aug 7, 2019, 8:07 AM

    Hello. I want to check whether the words between <span class> and </span> start with diacritics.

    Example: <p class="text_obisnuit"><span class="text_obisnuit2">This Immediacy Is A Product</span> of the direct particularity of purpose-oriented information that the network caters to.</p>

    So, the regex should match all tags that contain words that don’t start with diacritics, such as: This immediacy is a product

    And if it finds any, it should replace the first letter of those words with diacritics.

      Vasile Caraus
      last edited by Aug 7, 2019, 9:41 AM

      please consider diacritics = capital letters = uppercase

        PeterJones
        last edited by Aug 7, 2019, 1:27 PM

        @Vasile-Caraus said:

        diacritics = capital letters = uppercase

        Thank you for the clarification. (In my mind, “diacritics” are accented characters, like à, and I was confused.)

        Also, thank you for both the want-to-match and the don’t-want-to-match examples.

        I have a solution that works for me given your example text, though I needed to run it multiple times, because I don’t know how to “back up” the search point; @guy038 will probably come up with a one-shot regex.

        If I start with the data

        <p class="text_obisnuit"><span class="text_obisnuit2">This Immediacy Is A Product</span> of the direct particularity of purpose-oriented information that the network caters to.</p>
        <p class="text_obisnuit"><span class="text_obisnuit2">This immediacy is a product</span> of the direct particularity of purpose-oriented information that the network caters to.</p>
        <p class="text_obisnuit"><span class="text_obisnuit2">This Immediacy Is A Product</span> of the direct particularity of purpose-oriented information that the network caters to.</p>
        <p class="text_obisnuit"><span class="text_obisnuit2">This immediacy is a product</span> of the direct particularity of purpose-oriented information that the network caters to.</p>
        

        Some of those lines have only capitalized words inside the span, and others have words that start with lowercase. My thought process is “inside of the span tag, possibly after other words, look for a word boundary followed by a lowercase letter, and convert that lowercase letter to uppercase”.

        • FIND = (?-i)<span class=[^>]*>.*?\K\b[a-z](?=.*?</span>)
        • REPLACE = \u$0
        • MODE = regular expression
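
For anyone who wants to try the same search outside Notepad++, here is a minimal Python sketch of a single Replace All pass. It is an approximation, not the Notepad++ expression itself: Python's `re` module has no `\K`, so a capture group stands in for it, and `(?-i)` is dropped because Python matches case-sensitively by default. The function name `capitalize_once` is made up for this illustration.

```python
import re

# Group 1 is the text that \K would have discarded in Notepad++;
# group 2 is the lowercase letter at a word boundary inside the span.
PATTERN = re.compile(r'(<span class=[^>]*>.*?)\b([a-z])(?=.*?</span>)')


def capitalize_once(text: str) -> str:
    """Emulate one Replace All pass: fix at most one word per <span>."""
    return PATTERN.sub(lambda m: m.group(1) + m.group(2).upper(), text)


line = ('<p class="text_obisnuit"><span class="text_obisnuit2">'
        'This immediacy is a product</span> of the direct particularity.</p>')
print(capitalize_once(line))
```

Each match starts at the opening `<span ...>` tag and the next search resumes after the replaced letter, so a single pass only capitalizes the first lowercase-initial word in each span.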

        Since the longest phrase inside the <span>...</span> had four words that didn’t start with a capital letter, I had to run Replace All 4 times to get all the words capitalized. But in the end, I had:

        <p class="text_obisnuit"><span class="text_obisnuit2">This Immediacy Is A Product</span> of the direct particularity of purpose-oriented information that the network caters to.</p>
        <p class="text_obisnuit"><span class="text_obisnuit2">This Immediacy Is A Product</span> of the direct particularity of purpose-oriented information that the network caters to.</p>
        <p class="text_obisnuit"><span class="text_obisnuit2">This Immediacy Is A Product</span> of the direct particularity of purpose-oriented information that the network caters to.</p>
        <p class="text_obisnuit"><span class="text_obisnuit2">This Immediacy Is A Product</span> of the direct particularity of purpose-oriented information that the network caters to.</p>
        

        which is what I believe you want.
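
The repeated Replace All runs described above can also be scripted. The following Python sketch is only an emulation under stated assumptions: it uses a capture group because Python's `re` has no `\K`, and the helper name `capitalize_until_stable` and the `max_passes` safety limit are hypothetical. It reruns the substitution until a pass changes nothing and reports how many passes did real work.

```python
import re
from typing import Tuple

# Capture-group stand-in for the Notepad++ expression that uses \K.
PATTERN = re.compile(r'(<span class=[^>]*>.*?)\b([a-z])(?=.*?</span>)')


def capitalize_until_stable(text: str, max_passes: int = 100) -> Tuple[str, int]:
    """Repeat the one-word-per-span replacement until nothing changes."""
    passes = 0
    while passes < max_passes:
        text, count = PATTERN.subn(
            lambda m: m.group(1) + m.group(2).upper(), text)
        if count == 0:
            break  # nothing left to capitalize
        passes += 1
    return text, passes


data = (
    '<p class="text_obisnuit"><span class="text_obisnuit2">This Immediacy Is A Product</span> of the direct particularity.</p>\n'
    '<p class="text_obisnuit"><span class="text_obisnuit2">This immediacy is a product</span> of the direct particularity.</p>\n'
)
result, passes = capitalize_until_stable(data)
print(passes)
print(result)
```

With this sample data the loop needs one productive pass per lowercase-initial word in the worst span, which matches the four manual runs mentioned above.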
        -----
        FYI: I often add this to my response in regex threads, unless I am sure the original poster has seen it before. Here is some helpful information for finding out more about regular expressions, and for formatting posts in this forum (especially quoting data) so that we can fully understand what you’re trying to ask:

        This forum is formatted using Markdown, with a help link buried on the little grey ? in the COMPOSE window/pane when writing your post. For more about how to use Markdown in this forum, please see @Scott-Sumner’s post in the “how to markdown code on this forum” topic, and my updates near the end. It is very important that you use these formatting tips – using single backtick marks around small snippets, and using code-quoting for pasting multiple lines from your example data files – because otherwise, the forum will change normal quotes ("") to curly “smart” quotes (“”), will change hyphens to dashes, will sometimes hide asterisks (or if your text is c:\folder\*.txt, it will show up as c:\folder*.txt, missing the backslash). If you want to clearly communicate your text data to us, you need to properly format it.

        If you have further search-and-replace (“matching”, “marking”, “bookmarking”, regular expression, “regex”) needs, study this FAQ and the documentation it points to.

        ps: thanks again for the match and don’t-match; it allowed me to cut a paragraph-and-a-half out of my boilerplate for you. :-)

          Vasile Caraus
          last edited by Aug 7, 2019, 2:12 PM

          Great answer, @PeterJones, thanks a lot!

          The Community of users of the Notepad++ text editor.