    Regex: Change the first lowercase letters of each word to capital letter on html tags

    • Alan Kilborn @Hellena Crainicu

      @hellena-crainicu said in Regex: Change the first lowercase letters of each word to capital letter on html tags:

      It is not working; it changes all the text, not just the <title> tag.
      Try this text:

      That text violates your original specification:

      [image]

      So of course it doesn’t work.
      Actually it does work because it is still trying to find the </title> tag, and apparently is unsuccessful!

      • Hellena Crainicu @Alan Kilborn

        @alan-kilborn you’re right, it works.

        thanks a lot

        • Hellena Crainicu @Alan Kilborn

          @alan-kilborn by the way, sir. One single question.

          If I want to change multiple text files, convert all words from lowercase letter to capital letter, with Find In Files and replace all option, how can I do that?

          I try this regex:

          FIND: (.*)([A-Z]+)
          REPLACE BY: \L$1$2

          Seems to be good for all letters - except Diacritics (Accent Marks)

          • Hellena Crainicu @Hellena Crainicu

            For this case, the following regex works, but only in Sublime Text:

            FIND: ([A-Z])(.*)

            REPLACE BY: \L$1$2

            • guy038

              Hi, @hellena-crainicu, @alan-kilborn and All,

              Back to your first challenge :

              I have this html tag:

              <title>sunrise must gone for a moment</title>

              The Output must be:

              <title>Sunrise Must Gone For A Moment</title>

              Don’t forget, @hellena-crainicu, that @alan-kilborn’s solution works correctly ONLY IF :

              • You moved the caret to the very beginning of the current file

              OR

              • You ticked the Wrap around search option

              before performing the replacement !


              Now, Alan, a second, simpler version would be :

              SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w+)

              REPLACE \u$0

              Indeed, the lowercase \u modifier, in the replacement, means : change only the first letter of $0 to uppercase !


              Now, @hellena-crainicu, regarding your second challenge :

              If I want to change multiple text files, convert all words from lowercase letter to capital letter, with Find In Files and replace all option, how can I do that?

              I try this regex:

              FIND: (.*)([A-Z]+)
              REPLACE BY: \L$1$2

              Seems to be good for all letters - except Diacritics (Accent Marks)

              You do not seem to understand the role of the different case modifiers ! Here is a summary; they act ONLY in the replacement string !

              • The \u case modifier changes the next character of the current replacement string to uppercase

              • The \l case modifier changes the next character of the current replacement string to lowercase

              • The \U case modifier changes all following characters of the current replacement string to uppercase, except for a character preceded by \l, until a \L or \E case modifier occurs

              • The \L case modifier changes all following characters of the current replacement string to lowercase, except for a character preceded by \u, until a \U or \E case modifier occurs

              • The \E case modifier ends any case change started by the \U and/or \L case modifiers
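
To make the summary above concrete, here is a minimal Python sketch that emulates these replacement-side modifiers. It only illustrates the rules as listed; it is not how Notepad++ (Boost) implements them. The name expand_replacement is hypothetical, and only the $N and ${N} group syntaxes are handled. Note that Python's str.upper() and str.lower() are Unicode-aware, so accented letters may behave differently here than in Notepad++.

```python
import re

# One token per step: a case modifier, ${N}, $N (single digit), or any literal char.
_TOKEN = re.compile(r"\\([ulULE])|\$\{(\d+)\}|\$(\d)|(.)", re.DOTALL)


def expand_replacement(template, match):
    """Expand a replacement template for one match (hypothetical helper)."""
    out = []
    mode = None      # persistent mode set by \U or \L, cleared by \E
    one_shot = None  # pending \u or \l, applies to the next character only
    for tok in _TOKEN.finditer(template):
        modifier, braced, plain, literal = tok.groups()
        if modifier in ("U", "L"):
            mode = modifier
            continue
        if modifier == "E":
            mode = None
            continue
        if modifier in ("u", "l"):
            one_shot = modifier
            continue
        if literal is not None:
            text = literal
        else:
            text = match.group(int(braced or plain)) or ""
        for ch in text:
            if one_shot == "u":      # \u wins over a surrounding \L
                ch, one_shot = ch.upper(), None
            elif one_shot == "l":    # \l wins over a surrounding \U
                ch, one_shot = ch.lower(), None
            elif mode == "U":
                ch = ch.upper()
            elif mode == "L":
                ch = ch.lower()
            out.append(ch)
    return "".join(out)


def substitute(pattern, template, text):
    """Apply the template to every match of pattern in text."""
    return re.sub(pattern, lambda m: expand_replacement(template, m), text)


print(substitute(r"(\w)(\w*)", r"\u$1\L$2", "hELLO wORLD"))
```

The one-shot modifiers are checked before the persistent mode, which is what gives \u and \l their "except for a char preceded by" behaviour inside a \L or \U run.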


              You said :

              If I want to change multiple text files, convert all words from lowercase letter to capital letter, with Find In Files and replace all option, how can I do that?

              Your phrasing is quite ambiguous ! Do you mean :

              • A I want to convert all letters to uppercase ?

              • B I want to convert the first letter of each word to uppercase and leave all other letters of each word untouched ?

              • C I want to convert the first letter of each word to uppercase and change all subsequent letters of each word to lowercase ?

              • D I want to convert the first letter of a sentence to uppercase and leave all other letters of the sentence untouched ?

              • E I want to convert the first letter of a sentence to uppercase and change all subsequent letters of the sentence to lowercase ?

              • …

              So, to my mind, if we consider traditional documents ( not code files ! ), this leads to the following regex S/R :

              • For case A :

                • SEARCH \w+

                • REPLACE \U$0

              • For case B :

                • SEARCH (\w)(\w*)

                • REPLACE \u$1\E$2

              • For case C :

                • SEARCH (\w)(\w*)

                • REPLACE \u$1\L$2

              • For case D :

                • SEARCH (?:\.\W*|\R)\K\w

                • REPLACE \u$0

              • For case E :

                • SEARCH (?:\.\W*|\R)\K(\w)(.+?)(?=\.|\R)

                • REPLACE \u$1\L$2
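
For readers who want to try cases A to E outside Notepad++, here is a rough Python sketch of the same five transformations. It is an approximation, not a port: Python's re module has neither \K nor \R, so the leading context is captured and re-emitted instead, and \r?\n stands in for \R. The function names are hypothetical.

```python
import re


def case_a(text):
    # A: every letter to uppercase
    return re.sub(r"\w+", lambda m: m.group(0).upper(), text)


def case_b(text):
    # B: first letter of each word to uppercase, rest untouched
    return re.sub(r"(\w)(\w*)", lambda m: m.group(1).upper() + m.group(2), text)


def case_c(text):
    # C: first letter of each word to uppercase, rest to lowercase
    return re.sub(r"(\w)(\w*)",
                  lambda m: m.group(1).upper() + m.group(2).lower(), text)


def case_d(text):
    # D: first letter after a full stop or line break to uppercase
    return re.sub(r"(\.\W*|\r?\n)(\w)",
                  lambda m: m.group(1) + m.group(2).upper(), text)


def case_e(text):
    # E: as D, plus the rest of the sentence to lowercase
    return re.sub(r"(\.\W*|\r?\n)(\w)(.+?)(?=\.|\r?\n)",
                  lambda m: m.group(1) + m.group(2).upper() + m.group(3).lower(),
                  text)


sample = "hello wORLD. next one"
for fn in (case_a, case_b, case_c, case_d):
    print(fn.__name__, "->", fn(sample))
print("case_e ->", case_e("intro. hELLO WORLD. end"))
```

As with the original D and E regexes, a sentence at the very start of the text is left alone, because nothing (no full stop or line break) precedes it.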


              Regarding cases D and E, the regexes may need improvement, as they still mishandle some specific expressions ! Tested against the license.txt file, they would match, for instance :

              • h and h@free, in the part ©2016 Don HO don.h@free.fr, giving ©2016 Don HO don.H@free.fr, after replacement

              • The letters a, b and c, in parts like a) You must cause…, giving A) you must cause…, after replacement

              Best Regards

              guy038

              • Hellena Crainicu @guy038

                hello @guy038

                Your regex replaces all the text, even outside the <title> tag. I need to make the replacement only inside the <title> </title> tag.

                Please test your regex on this simple example:

                <title>Semnificațiile Elocinței Lui Burke<title>
                
                S-Ar Părea CĂ, De CÎteva Decenii încoace.
                

                Also, check the cases you gave against the same example.

                If someone wants to convert all capital letters (with diacritics) to lowercase, none of your cases works.

                • Alan Kilborn @guy038

                  @guy038 said in Regex: Change the first lowercase letters of each word to capital letter on html tags:

                  Now, Alan, a second and more simple version would be…

                  OK, but since we already have a recipe for replacing inside delimiters, I really don’t see value in simplification; just apply the recipe.

                  • guy038

                    Hello, @hellena-crainicu, @alan-kilborn and All,

                    First, your inverse video example is erroneous ( as @alan-kilborn already told you ) ! You should have written this correct INPUT text with an ending tag :

                    <title>semnificațiile elocinței lui burke</title>
                    

                    Now, I’m sorry, but my regex and @alan-kilborn’s version both work as expected ! For instance, using the following sample, in a new tab :

                    
                    s-ar părea că, de cîteva decenii încoace.
                    
                    <title>semnificațiile elocinței lui burke</title>
                    
                    s-ar părea că, de cîteva decenii încoace.
                    
                    <title>semnificațiile elocinței lui burke</title>
                    
                    s-ar părea că, de cîteva decenii încoace.
                    
                    <title>semnificațiile elocinței lui burke</title>
                    
                    s-ar părea că, de cîteva decenii încoace.
                    
                    • Open the Replace dialog ( Ctrl + H )

                      • SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w)(\w*)    ( I omitted the useless -i modifier, near the end of the regex ! )

                      • REPLACE \U${1}\E${2}

                    • OR

                      • SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w+)

                      • REPLACE \u$0

                    • Tick the Wrap around option ( IMPORTANT )

                    • Select the Regular expression search mode

                    • Click, once, on the Replace All button ( Do not use the Replace button ! )

                    You’ll get the expected OUTPUT text :

                    
                    s-ar părea că, de cîteva decenii încoace.
                    
                    <title>Semnificațiile Elocinței Lui Burke</title>
                    
                    s-ar părea că, de cîteva decenii încoace.
                    
                    <title>Semnificațiile Elocinței Lui Burke</title>
                    
                    s-ar părea că, de cîteva decenii încoace.
                    
                    <title>Semnificațiile Elocinței Lui Burke</title>
                    
                    s-ar părea că, de cîteva decenii încoace.
                    

                    As you can see, only the words between the <title> and </title> tags are affected, each getting an uppercase first letter !
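
For comparison, the same result can be sketched in Python. This is a different technique from the \G-based regex above: Python's re module does not support \G or \K, so the sketch first finds each <title>...</title> span and then capitalises the words inside it in a second pass. The names title_case_titles and capitalize_first are hypothetical, and the match on <title> is case-sensitive, like the (?-i:...) group in the Notepad++ regex.

```python
import re

# Step 1: locate each <title>...</title> span (DOTALL lets it span lines).
TITLE_RE = re.compile(r"(<title>)(.*?)(</title>)", re.DOTALL)
# Step 2: inside that span, visit every word.
WORD_RE = re.compile(r"\w+")


def capitalize_first(word_match):
    """Uppercase the first character of a word, leave the rest untouched."""
    word = word_match.group(0)
    return word[:1].upper() + word[1:]


def title_case_titles(html):
    """Capitalise each word found between <title> and </title> only."""
    def fix(m):
        return m.group(1) + WORD_RE.sub(capitalize_first, m.group(2)) + m.group(3)
    return TITLE_RE.sub(fix, html)


sample = (
    "s-ar părea că, de cîteva decenii încoace.\n"
    "<title>semnificațiile elocinței lui burke</title>\n"
    "s-ar părea că, de cîteva decenii încoace.\n"
)
print(title_case_titles(sample))
```

A <title> with no closing </title> is simply not matched by this sketch, so such input is left unchanged.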


                    Now, in the second part of my previous post, when I spoke about cases A, B, …, I was not referring to your first challenge at all ! I just gave you general regexes, which apply whether or not the text is embedded within <title>...........</title> tags !

                    BTW, I should have added a case F : I want to convert all letters to lowercase, leading to the regex S/R :

                    SEARCH \w+

                    REPLACE \L$0


                    Finally, I agree with you : accented characters are not handled by these case modifiers ! Last year, I created an issue regarding this problem. Refer to :

                    https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636

                    Unfortunately, because of the current global C locale and some other stuff, there was a performance issue. So, the change was reverted by Don :

                    https://github.com/notepad-plus-plus/notepad-plus-plus/commit/6844df039d54557a93a75752d651d5b9bb49f7ed
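
Until accented characters are handled by the case modifiers, one possible workaround for the "lowercase everything in several files" question is to do the conversion outside Notepad++. Below is a minimal Python sketch, under these assumptions: the files are UTF-8 encoded, they match the *.txt pattern in a single folder, and you work on a backup copy. The function name lowercase_text_files and its parameters are hypothetical.

```python
from pathlib import Path


def lowercase_text_files(folder, pattern="*.txt", encoding="utf-8"):
    """Lowercase the content of every matching file; return the changed names."""
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        # newline="" keeps the original line endings (CRLF or LF) untouched.
        with path.open("r", encoding=encoding, newline="") as fh:
            original = fh.read()
        lowered = original.lower()  # Unicode-aware: handles Ă, Î, Ț, ...
        if lowered != original:
            with path.open("w", encoding=encoding, newline="") as fh:
                fh.write(lowered)
            changed.append(path.name)
    return changed


# The sentence from the example above, lowercased including its diacritics.
print("S-Ar Părea CĂ, De CÎteva Decenii încoace.".lower())
```

To uppercase instead, replace .lower() with .upper(); the rest of the sketch stays the same.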

                    Best Regards,

                    guy038

                    @alan-kilborn : Yes, just an alternate method. Not essential ;-))

                      • Hellena Crainicu @guy038

                        @guy038

                        I made a change to @guy038’s regex to do something slightly different.

                        So, if someone wants to convert only the first letter of the first word after the <title> tag from lowercase to a capital letter:

                        Use this regex:

                        FIND: (<title>)(.\W*)
                        REPLACE BY: \1\U$2
