    Regex: Change the first lowercase letters of each word to capital letter on html tags

      Alan Kilborn @Hellena Crainicu

      @hellena-crainicu

      Using the technique from HERE, your solution becomes:

      find: (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(?-i:(\w)(\w*))
      repl: \U${1}\E${2}
      mode: Regular expression
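For readers who want the same result outside Notepad++, here is a minimal Python sketch. Python's built-in `re` module supports neither `\G` nor `\K`, so this is not a translation of the regex above but a swapped-in two-step approach: first locate each `<title>…</title>` span, then capitalize the words inside it. The function name `capitalize_title_words` is made up for illustration, and unlike the regex above it only touches titles that have a closing tag.

```python
import re

def capitalize_title_words(html: str) -> str:
    """Uppercase the first letter of every word inside <title>...</title>."""

    def upper_first(word: re.Match) -> str:
        text = word.group(0)
        return text[0].upper() + text[1:]   # the rest of the word is left untouched

    def fix_title(title: re.Match) -> str:
        opening, inner, closing = title.groups()
        return opening + re.sub(r"\w+", upper_first, inner) + closing

    # DOTALL lets the title content span several lines.
    return re.sub(r"(<title>)(.*?)(</title>)", fix_title, html, flags=re.DOTALL)


sample = "<title>sunrise must gone for a moment</title>\nbody text stays as it is"
print(capitalize_title_words(sample))
```

Text outside the tags is never passed to the inner substitution, which is the property the `\G` technique provides in Notepad++.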

        Hellena Crainicu @Alan Kilborn

        @alan-kilborn It is not working; it changes all the text, not just the <title> tag.

        Try this text:

        <title>semnificațiile elocinței lui Burke<title>
        
        S-ar părea că, de cîteva decenii încoace. în istoriile literaturii engleze, semnificațiile elocinței lui Burke nu mai tind să le pună în umbră pe acelea ale gîndirii lui estetice. Au existat, neîndoielnic, motive să se stăruie asupra trăsăturilor unui portret, conturat cu evidentă grandoare, al unei personalități care a înrîurit atît de profund mentalitatea politică britanică din epoca Războiului american pentru independență și a Revoluției franceze. Evoluția ideilor sale politice este, dealtfel extrem de caracteristică pentru o întreagă aripă a liberalilor care, după victoria iacobinilor în Franța, s-a îndreptat spre o poziție net conservatoare.
        
          Alan Kilborn @Hellena Crainicu

          @hellena-crainicu said in Regex: Change the first lowercase letters of each word to capital letter on html tags:

          Is not working, it change all the text, not just the <title> tag
          Try this text:

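          That text violates your original specification, in which the title is closed by a </title> tag:

          [image: screenshot of the original specification]

          So of course it doesn’t do what you expect.
          Actually, the regex does work as written: it keeps looking for the closing </title> tag, never finds one, and so keeps matching words all the way through the rest of the text!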
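The effect of the missing closing tag can be seen in isolation with a small Python sketch. It uses only the "consume characters until `</title>` starts" part of the regex, since Python's `re` has no `\G` or `\K`; the sample strings are made up for illustration.

```python
import re

# Same building block as in the Notepad++ regex: consume characters one at a
# time for as long as the closing tag does not start at the current position.
inside_title = re.compile(r"<title>(?s:(?!</title>).)*")

closed = "<title>a b</title>\nbody text"
unclosed = "<title>a b<title>\nbody text"   # the second tag is NOT a closing tag

reach_closed = inside_title.match(closed).group(0)
reach_unclosed = inside_title.match(unclosed).group(0)

print(repr(reach_closed))     # stops right before </title>
print(repr(reach_unclosed))   # runs to the end of the text
```

With a proper `</title>` the reach ends at the tag; without one, nothing ever stops the scan, so every following word is a candidate for replacement.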

            Hellena Crainicu @Alan Kilborn

            @alan-kilborn you are right, it works.

            thanks a lot

              Hellena Crainicu @Alan Kilborn

              @alan-kilborn by the way, sir. One single question.

              If I want to change multiple text files, convert all words from lowercase letter to capital letter, with Find In Files and replace all option, how can I do that?

              I tried this regex:

              FIND: (.*)([A-Z]+)
              REPLACE BY: \L$1$2

              It seems to work for all letters except those with diacritics (accent marks).
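On the search side, one likely contributor is that the class `[A-Z]` is a plain ASCII range and never matches accented capitals. The following Python sketch illustrates that point only; it says nothing about how Notepad++ applies `\L`, and the sample string is borrowed from later in this thread.

```python
import re

sample = "S-Ar Părea CĂ, De CÎteva Decenii"

# [A-Z] is an ASCII range, so accented capitals are not matched by it.
ascii_capitals = re.findall(r"[A-Z]", sample)

# str.isupper() is Unicode-aware, so it also reports the accented capitals.
all_capitals = [ch for ch in sample if ch.isupper()]

# Capitals that a search based on [A-Z] would skip.
missed = [ch for ch in all_capitals if ch not in ascii_capitals]

print(ascii_capitals)
print(missed)
```

In a regex engine with Unicode support, a class such as `\p{Lu}` or `[[:upper:]]` is the usual way to include accented capitals; whether that is enough in a given editor depends on how its case conversion is implemented.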

                Hellena Crainicu @Hellena Crainicu

                @hellena-crainicu said in Regex: Change the first lowercase letters of each word to capital letter on html tags:

                @alan-kilborn by the way, sir. One single question.

                If I want to change multiple text files, convert all words from lowercase letter to capital letter, with Find In Files and replace all option, how can I do that?

                I try this regex:

                FIND: (.*)([A-Z]+)
                REPLACE BY: \L$1$2

                Seems to be good for all letters - except Diacritics (Accent Marks)

                For this case, the following regex works, but only in Sublime Text:

                FIND: ([A-Z])(.*)

                REPLACE BY: \L$1$2

                  guy038

                  Hi, @hellena-crainicu, @alan-kilborn and All,

                  Back to your first challenge :

                  I have this html tag:

                  <title>sunrise must gone for a moment</title>

                  The Output must be:

                  <title>Sunrise Must Gone For A Moment</title>

                  Don’t forget, @hellena-crainicu, that @alan-kilborn’s solution works correctly ONLY IF :

                  • You have moved the caret to the very beginning of the current file

                  OR

                  • You have ticked the Wrap around search option

                  before performing the replacement !


                  Now, Alan, a second, simpler version would be :

                  SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w+)

                  REPLACE \u$0

                  Indeed, the lowercase \u modifier, in the replacement, means : change only the first letter of $0 to uppercase !


                  Now, @hellena-crainicu, regarding your second challenge :

                  If I want to change multiple text files, convert all words from lowercase letter to capital letter, with Find In Files and replace all option, how can I do that?

                  I try this regex:

                  FIND: (.*)([A-Z]+)
                  REPLACE BY: \L$1$2

                  Seems to be good for all letters - except Diacritics (Accent Marks)

                  You do not seem to understand the role of the different case modifiers ! Here is a summary ; note that they act ONLY in the replacement string !

                  • The \u case modifier changes the next char of the replacement string to uppercase

                  • The \l case modifier changes the next char of the replacement string to lowercase

                  • The \U case modifier changes all following chars of the replacement string to uppercase, except for a char preceded by \l, until a \L or \E case modifier occurs

                  • The \L case modifier changes all following chars of the replacement string to lowercase, except for a char preceded by \u, until a \U or \E case modifier occurs

                  • The \E case modifier ends any case change started by a preceding \U or \L case modifier
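To make these rules concrete, here is a small Python emulation of the five modifiers. This is only a sketch of the behaviour summarized above, not the Boost or Notepad++ implementation; the helper name `expand` and the token pattern are made up for illustration, and only `$N`, `${N}` and the five case modifiers are understood.

```python
import re

# One token per step: a case modifier, a group reference, or a literal char.
_TOKEN = re.compile(r"\\([ulULE])|\$\{(\d+)\}|\$(\d)|(.)", re.DOTALL)

def expand(template: str, match: re.Match) -> str:
    """Expand a replacement template containing \\u \\l \\U \\L \\E and $N."""
    out = []
    span_mode = None   # "U" or "L": set by \U / \L, cleared by \E
    one_shot = None    # "u" or "l": applies to the next output char only
    for tok in _TOKEN.finditer(template):
        modifier, braced, bare, literal = tok.groups()
        if modifier:
            if modifier in ("u", "l"):
                one_shot = modifier
            elif modifier == "E":
                span_mode = None
            else:
                span_mode = modifier
            continue
        text = literal if literal is not None else (match.group(int(braced or bare)) or "")
        for ch in text:
            if one_shot:                      # \u and \l win over \U and \L
                ch = ch.upper() if one_shot == "u" else ch.lower()
                one_shot = None
            elif span_mode:
                ch = ch.upper() if span_mode == "U" else ch.lower()
            out.append(ch)
    return "".join(out)


def replace_all(pattern: str, template: str, text: str) -> str:
    return re.sub(pattern, lambda m: expand(template, m), text)

print(replace_all(r"(\w)(\w*)", r"\u$1\L$2", "hELLO wORLD"))
print(replace_all(r"(\w)(\w*)", r"\U${1}\E${2}", "sunrise must gone"))
```

The state is reset for every match because `expand` is called once per match, which mirrors the fact that the modifiers act on the current replacement string only.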


                  You said :

                  If I want to change multiple text files, convert all words from lowercase letter to capital letter, with Find In Files and replace all option, how can I do that?

                  Your phrasing is quite ambiguous ! Do you mean :

                  • A I want to convert all letters to their uppercase form ?

                  • B I want to convert the first letter of each word to its uppercase form and leave all other letters of each word untouched ?

                  • C I want to convert the first letter of each word to its uppercase form and change all subsequent letters of each word to their lowercase form ?

                  • D I want to convert the first letter of a sentence to its uppercase form and leave all other letters of the sentence untouched ?

                  • E I want to convert the first letter of a sentence to its uppercase form and change all subsequent letters of the sentence to their lowercase form ?

                  • …

                  So, to my mind, if we consider traditional documents ( not code files ! ), this leads to the following regex search/replace pairs :

                  • For case A :

                    • SEARCH \w+

                    • REPLACE \U$0

                  • For case B :

                    • SEARCH (\w)(\w*)

                    • REPLACE \u$1\E$2

                  • For case C :

                    • SEARCH (\w)(\w*)

                    • REPLACE \u$1\L$2

                  • For case D :

                    • SEARCH (?:\.\W*|\R)\K\w

                    • REPLACE \u$0

                  • For case E :

                    • SEARCH (?:\.\W*|\R)\K(\w)(.+?)(?=\.|\R)

                    • REPLACE \u$1\L$2
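For comparison, cases A to D can be sketched in Python as follows. Python's `re` has no `\K`, `\R` or case modifiers, so the sketch uses capture groups, an explicit list of common line breaks (an approximation of `\R`) and replacement functions instead; the function names are made up for illustration. Note that Python's `upper()` and `lower()` are Unicode-aware, so accented letters are converted too.

```python
import re

WORD = re.compile(r"(\w)(\w*)")
# Stand-in for (?:\.\W*|\R)\K\w : the prefix is captured and written back.
SENTENCE_START = re.compile(r"(\.\W*|\r\n|\n|\r)(\w)")

def case_a(text: str) -> str:
    """A: every letter to uppercase."""
    return re.sub(r"\w+", lambda m: m.group(0).upper(), text)

def case_b(text: str) -> str:
    """B: first letter of each word to uppercase, other letters untouched."""
    return WORD.sub(lambda m: m.group(1).upper() + m.group(2), text)

def case_c(text: str) -> str:
    """C: first letter of each word to uppercase, other letters to lowercase."""
    return WORD.sub(lambda m: m.group(1).upper() + m.group(2).lower(), text)

def case_d(text: str) -> str:
    """D: first letter after a full stop or a line break to uppercase."""
    return SENTENCE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)


print(case_c("hELLO wORLD"))
print(case_d("first. second one.\nthird"))
```

As with the SEARCH regex for case D, the very first sentence of the text is not touched, because nothing precedes it.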


                  Regarding cases D and E, some regex improvements may be needed, as there are still drawbacks with some specific expressions ! If I test the regexes against the license.txt file, they match, for instance :

                  • h and h@free, in the part ©2016 Don HO don.h@free.fr, giving ©2016 Don HO don.H@free.fr after replacement

                  • The letters a, b and c, in parts like a) You must cause…, giving A) you must cause… after replacement

                  Best Regards

                  guy038

                    Hellena Crainicu @guy038

                    hello @guy038

                    Your regex replaces all the text, even outside the <title> tag. I need to make the replacement only inside the <title> </title> tag.

                    Please test your regex on this simple example:

                    <title>Semnificațiile Elocinței Lui Burke<title>
                    
                    S-Ar Părea CĂ, De CÎteva Decenii încoace.
                    

                    Also, check the cases you gave on the same example.

                    If someone wants to convert all capital letters (including those with diacritics) to lowercase, none of your cases works.

                      Alan Kilborn @guy038

                      @guy038 said in Regex: Change the first lowercase letters of each word to capital letter on html tags:

                      Now, Alan, a second and more simple version would be…

                      OK, but since we already have a recipe for replacing inside delimiters, I really don’t see value in simplification; just apply the recipe.

                        guy038

                        Hello, @hellena-crainicu, @alan-kilborn and All,

                        First, your highlighted example is erroneous ( as @alan-kilborn already told you ) ! You should have written the INPUT text with a closing tag :

                        <title>semnificațiile elocinței lui burke</title>
                        

                        Now, I’m sorry, but both my regex and @alan-kilborn’s version work as expected ! For instance, use the following sample in a new tab :

                        
                        s-ar părea că, de cîteva decenii încoace.
                        
                        <title>semnificațiile elocinței lui burke</title>
                        
                        s-ar părea că, de cîteva decenii încoace.
                        
                        <title>semnificațiile elocinței lui burke</title>
                        
                        s-ar părea că, de cîteva decenii încoace.
                        
                        <title>semnificațiile elocinței lui burke</title>
                        
                        s-ar părea că, de cîteva decenii încoace.
                        
                        • Open the Replace dialog ( Ctrl + H )

                          • SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w)(\w*)    ( I omitted the useless -i modifier, near the end of the regex ! )

                          • REPLACE \U${1}\E${2}

                        • OR

                          • SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w+)

                          • REPLACE \u$0

                        • Tick the Wrap around option ( IMPORTANT )

                        • Select the Regular expression search mode

                        • Click, once, on the Replace All button ( Do not use the Replace button ! )

                        You’ll get the expected OUTPUT text :

                        
                        s-ar părea că, de cîteva decenii încoace.
                        
                        <title>Semnificațiile Elocinței Lui Burke</title>
                        
                        s-ar părea că, de cîteva decenii încoace.
                        
                        <title>Semnificațiile Elocinței Lui Burke</title>
                        
                        s-ar părea că, de cîteva decenii încoace.
                        
                        <title>Semnificațiile Elocinței Lui Burke</title>
                        
                        s-ar părea că, de cîteva decenii încoace.
                        

                        As you can see, only the words between the <title> and </title> tags are affected, each getting its first letter in uppercase !


                        Now, in the second part of my previous post, when I spoke about cases A, B, …, I was not referring to your first challenge at all ! I just gave you general regexes, which apply whether or not the text is embedded within <title>...........</title> tags !

                        BTW, I should have added a case F : I want to convert all letters to their lowercase form, leading to the regex S/R :

                        SEARCH \w+

                        REPLACE \L$0


                        Finally, I agree with you : accented characters are not handled by these case modifiers ! Last year, I created an issue regarding this problem. Refer to :

                        https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636

                        Unfortunately, because of the current global C locale and some other stuff, there was a performance issue. So, the fix for this issue was reverted by Don :

                        https://github.com/notepad-plus-plus/notepad-plus-plus/commit/6844df039d54557a93a75752d651d5b9bb49f7ed
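Until accented characters are handled by the case modifiers, one possible workaround for the "many files" case is to do the conversion outside Notepad++. The Python sketch below lowercases every letter (case F), accented ones included. The function names, the `*.txt` pattern and the UTF-8 encoding are assumptions for illustration; the files are overwritten in place, so try it on a copy of the folder first.

```python
from pathlib import Path

def lowercase_text(text: str) -> str:
    """Case F: all letters to lowercase. str.lower() is Unicode-aware."""
    return text.lower()

def lowercase_files(folder: str, pattern: str = "*.txt", encoding: str = "utf-8") -> int:
    """Lowercase every matching file in `folder`; return how many files changed."""
    changed = 0
    for path in Path(folder).glob(pattern):
        # Bytes are read and written directly so line endings stay as they are.
        original = path.read_bytes().decode(encoding)
        converted = lowercase_text(original)
        if converted != original:
            path.write_bytes(converted.encode(encoding))
            changed += 1
    return changed


print(lowercase_text("S-Ar Părea CĂ, De CÎteva Decenii încoace."))
```

A call such as `lowercase_files("my_texts")` would then process one folder, where `my_texts` is a placeholder folder name.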

                        Best Regards,

                        guy038

                        @alan-kilborn : Yes, just an alternate method. Not essential ;-))

                            Hellena Crainicu @guy038

                            @guy038

                            I made a change to @guy038’s regex, to do something different.

                            So, if someone wants to convert only the first letter of the first word inside the <title> tag from lowercase to a capital letter, use this regex:

                            FIND: (<title>)(.\W*)
                            REPLACE BY: \1\U$2
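The same idea in Python, as a minimal sketch: the search pattern is kept as it is, and the `\U` modifier is replaced by a function that uppercases group 2. The function name is made up for illustration. In Python `.` does not match a line break by default, so a title whose text starts on the next line would not be matched here.

```python
import re

def capitalize_title_start(html: str) -> str:
    """Uppercase the first character right after each <title> tag."""
    return re.sub(
        r"(<title>)(.\W*)",
        lambda m: m.group(1) + m.group(2).upper(),  # group 2: first char + any non-word chars
        html,
    )


print(capitalize_title_start("<title>semnificațiile elocinței lui burke</title>"))
```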
