    Regex: Change the first lowercase letters of each word to capital letter on html tags

    • Hellena Crainicu
      Hellena Crainicu last edited by

      I have this html tag:

      <title>sunrise must gone for a moment</title>

      The Output must be:

      <title>Sunrise Must Gone For A Moment</title>

      So I use this regex, but it replaces only the first word, and I want to change all the words inside the <title> tag:

      Find what: (<title>)([a-z])
      Replace with: \1\u$2

      • Alan Kilborn
        Alan Kilborn @Hellena Crainicu last edited by

        @hellena-crainicu

        Using the technique from HERE, your solution becomes:

        find: (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(?-i:(\w)(\w*))
        repl: \U${1}\E${2}
        mode: Regular expression
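For readers who want to check the intended result outside Notepad++, here is a minimal Python sketch of the same transformation. It is not the regex above: Python's standard `re` module has no `\G` or `\K`, so the sketch swaps in a two-step approach (find each `<title>...</title>` span, then capitalize the words inside it). The function name `capitalize_title_words` is hypothetical.

```python
import re


def capitalize_title_words(html: str) -> str:
    """Uppercase the first character of every word inside <title>...</title>."""

    def fix_title(match: re.Match) -> str:
        inner = match.group(2)
        # Uppercase only the first character of each run of word characters;
        # the rest of the word is left as it is (the effect of \U${1}\E${2}).
        inner = re.sub(
            r"\w+",
            lambda w: w.group(0)[0].upper() + w.group(0)[1:],
            inner,
        )
        return match.group(1) + inner + match.group(3)

    # (?s) lets "." cross line breaks; the lazy ".*?" stops at the first </title>.
    return re.sub(r"(?s)(<title>)(.*?)(</title>)", fix_title, html)


print(capitalize_title_words("<title>sunrise must gone for a moment</title>"))
# <title>Sunrise Must Gone For A Moment</title>
```

One design difference worth noting: this sketch only touches a `<title>` that has a matching `</title>`, so text outside a properly closed tag is never changed.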

        • Hellena Crainicu
          Hellena Crainicu @Alan Kilborn last edited by

          @alan-kilborn It is not working: it changes all the text, not just the <title> tag.

          Try this text:

          <title>semnificațiile elocinței lui Burke<title>
          
          S-ar părea că, de cîteva decenii încoace. în istoriile literaturii engleze, semnificațiile elocinței lui Burke nu mai tind să le pună în umbră pe acelea ale gîndirii lui estetice. Au existat, neîndoielnic, motive să se stăruie asupra trăsăturilor unui portret, conturat cu evidentă grandoare, al unei personalități care a înrîurit atît de profund mentalitatea politică britanică din epoca Războiului american pentru independență și a Revoluției franceze. Evoluția ideilor sale politice este, dealtfel extrem de caracteristică pentru o întreagă aripă a liberalilor care, după victoria iacobinilor în Franța, s-a îndreptat spre o poziție net conservatoare.
          
          • Alan Kilborn
            Alan Kilborn @Hellena Crainicu last edited by

            @hellena-crainicu said in Regex: Change the first lowercase letters of each word to capital letter on html tags:

            It is not working: it changes all the text, not just the <title> tag.
            Try this text:

            That text violates your original specification:

            [Image: 6fcca1a4-cefa-4caa-a12e-cc711697495a-image.png]

            So of course it doesn’t do what you want.
            Actually, it does work: it keeps looking for the closing </title> tag and, apparently, never finds one, so every word after <title> is matched!

            • Hellena Crainicu
              Hellena Crainicu @Alan Kilborn last edited by

              @alan-kilborn You are right, IT WORKS

              thanks a lot

              • Hellena Crainicu
                Hellena Crainicu @Alan Kilborn last edited by

                @alan-kilborn By the way, sir, one more question.

                If I want to change multiple text files, converting all words from lowercase to capital letters, with Find in Files and the Replace All option, how can I do that?

                I tried this regex:

                FIND: (.*)([A-Z]+)
                REPLACE BY: \L$1$2

                It seems to work for all letters except those with diacritics (accent marks).

                • Hellena Crainicu
                  Hellena Crainicu @Hellena Crainicu last edited by

                  For this case, the following regex works, but only in Sublime Text:

                  FIND: ([A-Z])(.*)

                  REPLACE BY: \L$1$2

                  • guy038
                    guy038 last edited by guy038

                    Hi, @hellena-crainicu, @alan-kilborn and All,

                    Back to your first challenge :

                    I have this html tag:

                    <title>sunrise must gone for a moment</title>

                    The Output must be:

                    <title>Sunrise Must Gone For A Moment</title>

                    Don’t forget, @hellena-crainicu, that @alan-kilborn’s solution works correctly ONLY IF:

                    • You move the caret to the very beginning of the current file

                    OR

                    • You tick the Wrap around search option

                    before performing the replacement!


                    Now, Alan, a second, simpler version would be:

                    SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w+)

                    REPLACE \u$0

                    Indeed, the lowercase \u modifier, in the replacement, means: change just the first letter of $0 to uppercase!


                    Now, @hellena-crainicu, regarding your second challenge :

                    If I want to change multiple text files, converting all words from lowercase to capital letters, with Find in Files and the Replace All option, how can I do that?

                    I tried this regex:

                    FIND: (.*)([A-Z]+)
                    REPLACE BY: \L$1$2

                    It seems to work for all letters except those with diacritics (accent marks).

                    You seem to have misunderstood the role of the different case modifiers! Here is a summary. Note that they act ONLY in the replacement string!

                    • The \u case modifier changes the next character of the replacement string to uppercase

                    • The \l case modifier changes the next character of the replacement string to lowercase

                    • The \U case modifier changes all following characters of the replacement string to uppercase, except for a character preceded by \l, until a \L or \E case modifier occurs

                    • The \L case modifier changes all following characters of the replacement string to lowercase, except for a character preceded by \u, until a \U or \E case modifier occurs

                    • The \E case modifier ends any case change started by the \U and/or \L case modifiers
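As a rough illustration of the summary above, the following Python helpers mimic what each modifier does to a piece of replacement text. They are only an analogy written for this note (the helper names are made up), not how Notepad++ implements the modifiers.

```python
def upper_first(text: str) -> str:
    # Like "\u$0": uppercase the next character only, leave the rest alone.
    return text[:1].upper() + text[1:]


def lower_first(text: str) -> str:
    # Like "\l$0": lowercase the next character only, leave the rest alone.
    return text[:1].lower() + text[1:]


def upper_all(text: str) -> str:
    # Like "\U$0": uppercase every following character.
    return text.upper()


def lower_all(text: str) -> str:
    # Like "\L$0": lowercase every following character.
    return text.lower()


def upper_then_untouched(first: str, rest: str) -> str:
    # Like "\U$1\E$2": \E ends the uppercase run, so $2 is copied unchanged.
    return first.upper() + rest


print(upper_first("sunRise"))               # SunRise
print(lower_first("SunRise"))               # sunRise
print(upper_all("sunRise"))                 # SUNRISE
print(lower_all("sunRise"))                 # sunrise
print(upper_then_untouched("s", "unRise"))  # SunRise
```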


                    You said :

                    If I want to change multiple text files, converting all words from lowercase to capital letters, with Find in Files and the Replace All option, how can I do that?

                    Your phrasing is quite ambiguous! Do you mean:

                    • A I want to convert all letters to uppercase?

                    • B I want to convert the first letter of each word to uppercase and leave all other letters of each word untouched?

                    • C I want to convert the first letter of each word to uppercase and change all subsequent letters of each word to lowercase?

                    • D I want to convert the first letter of a sentence to uppercase and leave all other letters of the sentence untouched?

                    • E I want to convert the first letter of a sentence to uppercase and change all subsequent letters of the sentence to lowercase?

                    • …

                    So, to my mind, if we consider ordinary documents (not code files!), this leads to the following regex S/R:

                    • For case A :

                      • SEARCH \w+

                      • REPLACE \U$0

                    • For case B :

                      • SEARCH (\w)(\w*)

                      • REPLACE \u$1\E$2

                    • For case C :

                      • SEARCH (\w)(\w*)

                      • REPLACE \u$1\L$2

                    • For case D :

                      • SEARCH (?:\.\W*|\R)\K\w

                      • REPLACE \u$0

                    • For case E :

                      • SEARCH (?:\.\W*|\R)\K(\w)(.+?)(?=\.|\R)

                      • REPLACE \u$1\L$2
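To make the difference between cases A, B and C concrete, here is a small Python sketch that applies the same three searches with `re.sub` and a callback in place of the case modifiers. The function names and the sample string are made up for illustration.

```python
import re


def case_a(text: str) -> str:
    # SEARCH \w+  /  REPLACE \U$0 : every letter to uppercase.
    return re.sub(r"\w+", lambda m: m.group(0).upper(), text)


def case_b(text: str) -> str:
    # SEARCH (\w)(\w*)  /  REPLACE \u$1\E$2 : first letter up, rest untouched.
    return re.sub(r"(\w)(\w*)", lambda m: m.group(1).upper() + m.group(2), text)


def case_c(text: str) -> str:
    # SEARCH (\w)(\w*)  /  REPLACE \u$1\L$2 : first letter up, rest down.
    return re.sub(
        r"(\w)(\w*)",
        lambda m: m.group(1).upper() + m.group(2).lower(),
        text,
    )


sample = "hello wORLD"
print(case_a(sample))  # HELLO WORLD
print(case_b(sample))  # Hello WORLD
print(case_c(sample))  # Hello World
```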


                    Regarding cases D and E, the regexes may need some improvement, as they still have drawbacks with certain expressions! If I test them against the license.txt file, they match, for instance:

                    • h and h@free, in the part ©2016 Don HO don.h@free.fr, giving ©2016 Don HO don.H@free.fr after replacement

                    • The letters a, b and c, in parts like a) You must cause…, giving A) you must cause… after replacement
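The list-marker drawback of case D can be reproduced with a short Python sketch. Python's `re` module has neither `\K` nor `\R`, so this is an approximation under stated assumptions: the prefix is captured in a group and written back unchanged, and `\R` is narrowed to `\r\n`, `\n` or `\r`. The function name and the sample text are hypothetical.

```python
import re

# Stand-in for SEARCH (?:\.\W*|\R)\K\w with REPLACE \u$0.
# Group 1 plays the role of everything before \K, group 2 is the letter.
SENTENCE_START = re.compile(r"(\.\W*|\r\n|\n|\r)(\w)")


def capitalize_sentence_starts(text: str) -> str:
    # Keep the prefix as it is and uppercase the single character after it.
    return SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)


sample = "These requirements apply:\na) You must cause it\nb) You must keep it"
print(capitalize_sentence_starts(sample))
# These requirements apply:
# A) You must cause it
# B) You must keep it
```

The markers a) and b) are capitalized simply because they follow a line break, which is the unwanted match described above.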

                    Best Regards

                    guy038

                    • Hellena Crainicu
                      Hellena Crainicu @guy038 last edited by Hellena Crainicu

                      hello @guy038

                      Your regex replaces all the text, even outside the <title> tag. I need to make the replacement only inside the <title> </title> tag.

                      Please test your regex on this simple example:

                      <title>Semnificațiile Elocinței Lui Burke<title>
                      
                      S-Ar Părea CĂ, De CÎteva Decenii încoace.
                      

                      Also, check the cases you gave against the same example.

                      If someone wants to convert all capital letters (with diacritics) to lowercase, none of your cases works.

                      • Alan Kilborn
                        Alan Kilborn @guy038 last edited by

                        @guy038 said in Regex: Change the first lowercase letters of each word to capital letter on html tags:

                        Now, Alan, a second and more simple version would be…

                        OK, but since we already have a recipe for replacing inside delimiters, I really don’t see value in simplification; just apply the recipe.

                        • guy038
                          guy038 last edited by guy038

                          Hello, @hellena-crainicu, @alan-kilborn and All,

                          First, your example in inverse video is erroneous (as @alan-kilborn already told you)! You should have written this correct INPUT text, with a closing tag:

                          <title>semnificațiile elocinței lui burke</title>
                          

                          Now, I’m sorry, but both my regex and @alan-kilborn’s version work as expected! For instance, using the following sample, in a new tab:

                          
                          s-ar părea că, de cîteva decenii încoace.
                          
                          <title>semnificațiile elocinței lui burke</title>
                          
                          s-ar părea că, de cîteva decenii încoace.
                          
                          <title>semnificațiile elocinței lui burke</title>
                          
                          s-ar părea că, de cîteva decenii încoace.
                          
                          <title>semnificațiile elocinței lui burke</title>
                          
                          s-ar părea că, de cîteva decenii încoace.
                          
                          • Open the Replace dialog ( Ctrl + H )

                            • SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w)(\w*)    ( I omitted the useless -i modifier, near the end of the regex ! )

                            • REPLACE \U${1}\E${2}

                          • OR

                            • SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w+)

                            • REPLACE \u$0

                          • Tick the Wrap around option ( IMPORTANT )

                          • Select the Regular expression search mode

                          • Click, once, on the Replace All button ( Do not use the Replace button ! )

                          You’ll get the expected OUTPUT text :

                          
                          s-ar părea că, de cîteva decenii încoace.
                          
                          <title>Semnificațiile Elocinței Lui Burke</title>
                          
                          s-ar părea că, de cîteva decenii încoace.
                          
                          <title>Semnificațiile Elocinței Lui Burke</title>
                          
                          s-ar părea că, de cîteva decenii încoace.
                          
                          <title>Semnificațiile Elocinței Lui Burke</title>
                          
                          s-ar părea că, de cîteva decenii încoace.
                          

                          As you can see, only the words between the <title> and </title> tags are affected, each with its first letter in uppercase!


                          Now, in the second part of my previous post, when I spoke about cases A, B, …, I was not referring to your first challenge at all! I just gave you general regexes, which apply whether or not the text is embedded within <title>...........</title> tags!

                          BTW, I should have added case F: I want to convert all letters to lowercase, which leads to the regex S/R:

                          SEARCH \w+

                          REPLACE \L$0


                          Finally, I agree with you: accented characters are not handled by these case modifiers! Last year, I created an issue regarding this problem. See:

                          https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636

                          Unfortunately, because of the current global C locale and some other factors, there was a performance issue. So the fix for this issue was reverted by Don:

                          https://github.com/notepad-plus-plus/notepad-plus-plus/commit/6844df039d54557a93a75752d651d5b9bb49f7ed
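For comparison, here is a minimal Python sketch of the lowercase conversion (SEARCH `\w+`, REPLACE `\L$0`) applied to text with diacritics. It says nothing about Notepad++ internals; it only relies on the fact that Python 3 strings are Unicode and that `str.lower()` is Unicode-aware. The function name is hypothetical.

```python
import re


def lowercase_words(text: str) -> str:
    # Counterpart of SEARCH \w+ / REPLACE \L$0.
    # In Python 3, \w matches accented letters and str.lower() converts them.
    return re.sub(r"\w+", lambda m: m.group(0).lower(), text)


print(lowercase_words("S-Ar Părea CĂ, De CÎteva Decenii încoace."))
# s-ar părea că, de cîteva decenii încoace.
```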

                          Best Regards,

                          guy038

                          @alan-kilborn : Yes, just an alternate method. Not essential ;-))

                            • Hellena Crainicu
                              Hellena Crainicu @guy038 last edited by

                              @guy038

                              I made a change to @guy038’s regex, to do something different.

                              So, if someone wants to convert only the first letter of the first word after the <title> tag from lowercase to a capital letter:

                              Use this regex:

                              FIND: (<title>)(.\W*)
                              REPLACE BY: \1\U$2
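A quick way to check what this search/replace does is the following Python sketch, which uses the same FIND pattern with `re.sub` and a callback standing in for `\1\U$2`. The function name is made up, and Python's `.` does not match line breaks by default, which may differ from the Notepad++ setting in use.

```python
import re


def capitalize_first_title_letter(html: str) -> str:
    # FIND (<title>)(.\W*)  /  REPLACE \1\U$2
    # Group 2 is the first character after <title>, plus any non-word
    # characters that directly follow it; only that part is uppercased.
    return re.sub(
        r"(<title>)(.\W*)",
        lambda m: m.group(1) + m.group(2).upper(),
        html,
    )


print(capitalize_first_title_letter("<title>sunrise must gone for a moment</title>"))
# <title>Sunrise must gone for a moment</title>
```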
