Regex: Change the first lowercase letters of each word to capital letter on html tags

Alan Kilborn

@hellena-crainicu said in Regex: Change the first lowercase letters of each word to capital letter on html tags:

Is not working, it change all the text, not just the <title> tag
Try this text:

That text violates your original specification:

So of course it doesn’t work.
Actually it does work because it is still trying to find the </title> tag, and apparently is unsuccessful!

Hellena Crainicu

@alan-kilborn you're right, IT WORKS

thanks a lot

Hellena Crainicu

@alan-kilborn by the way, sir, one more question.

If I want to change multiple text files, converting all words from lowercase letters to capital letters, with Find In Files and the Replace All option, how can I do that?

I tried this regex:

FIND: (.*)([A-Z]+)
REPLACE BY: \L$1$2

It seems to work for all letters, except those with diacritics (accent marks).

Hellena Crainicu

For this case, the following regex works, but only in Sublime Text:

FIND: ([A-Z])(.*)

REPLACE BY: \L$1$2

guy038

Hi, @hellena-crainicu, @alan-kilborn and All,

Back to your first challenge :

I have this html tag:

<title>sunrise must gone for a moment</title>

The Output must be:

<title>Sunrise Must Gone For A Moment</title>

Don’t forget, @hellena-crainicu, that @alan-kilborn’s solution works correctly ONLY IF:

You moved the caret to the very beginning of the current file

OR

You ticked the Wrap around search option

before performing the replacement!

Now, Alan, a second, simpler version would be:

SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w+)

REPLACE \u$0

Indeed, the lowercase \u modifier, in the replacement, means: change just the first letter of $0 (the whole match) to uppercase!
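For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. Python's `re` module supports neither `\K` nor `\G`, nor the `\u` replacement modifier, so this swaps in a different technique: match each whole `<title>…</title>` element, then capitalize the first letter of every word inside it with a callback. The function names are hypothetical, chosen only for illustration.

```python
import re

# Case-sensitive match of one whole title element; DOTALL lets the
# content span several lines, and .*? stops at the first </title>.
TITLE_RE = re.compile(r"(<title>)(.*?)(</title>)", re.DOTALL)
WORD_RE = re.compile(r"\w+")


def capitalize_first(match):
    """Uppercase the first character of a word, leave the rest untouched."""
    word = match.group(0)
    return word[0].upper() + word[1:]


def title_case_titles(html):
    """Capitalize each word found between <title> and </title> only."""
    def fix(m):
        return m.group(1) + WORD_RE.sub(capitalize_first, m.group(2)) + m.group(3)
    return TITLE_RE.sub(fix, html)


print(title_case_titles("<title>sunrise must gone for a moment</title>"))
# prints: <title>Sunrise Must Gone For A Moment</title>
```

Text outside the tags is never touched, because only the middle group of each match is rewritten.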

Now, @hellena-crainicu, regarding your second challenge :

If I want to change multiple text files, convert all words from lowercase letter to capital letter, with Find In Files and replace all option, how can I do that?

I try this regex:

FIND: (.*)([A-Z]+)
REPLACE BY: \L$1$2

Seems to be good for all letters - except Diacritics (Accent Marks)

You do not seem to understand the role of the different case modifiers! Here is a summary; note that they act ONLY in the replacement string!

The \u case modifier changes the next char of the current replacement string to uppercase
The \l case modifier changes the next char of the current replacement string to lowercase
The \U case modifier changes all the following chars of the current replacement string to uppercase, except for a char preceded by \l, until a \L or \E case modifier occurs
The \L case modifier changes all the following chars of the current replacement string to lowercase, except for a char preceded by \u, until a \U or \E case modifier occurs
The \E case modifier cancels any subsequent case changes induced by the \U and/or \L case modifiers
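To make the summary above concrete, here is a rough Python sketch that emulates the five modifiers on a replacement template. Python's own `re` module does not understand `\u`, `\l`, `\U`, `\L` or `\E`, so `expand` is a hypothetical helper written for illustration only; it is not how Notepad++ implements them. It supports only single-digit group references (`$0` to `$9`), and leaving a pending `\u` or `\l` untouched by `\E` is an assumption.

```python
import re

# One token is either a case modifier, a $N group reference, or a literal char.
TOKEN_RE = re.compile(r"\\([ulULE])|\$(\d)|(.)", re.DOTALL)


def expand(template, match):
    """Expand a replacement template with \\u \\l \\U \\L \\E case modifiers."""
    out = []
    mode = None      # persistent mode: "U", "L" or None
    one_shot = None  # "u" or "l": applies to the next emitted char only
    for tok in TOKEN_RE.finditer(template):
        flag, group, literal = tok.groups()
        if flag in ("u", "l"):
            one_shot = flag
            continue
        if flag in ("U", "L"):
            mode = flag
            continue
        if flag == "E":
            mode = None  # assumption: a pending one-shot modifier survives \E
            continue
        text = match.group(int(group)) if group is not None else literal
        for ch in text or "":
            if one_shot == "u":
                ch, one_shot = ch.upper(), None
            elif one_shot == "l":
                ch, one_shot = ch.lower(), None
            elif mode == "U":
                ch = ch.upper()
            elif mode == "L":
                ch = ch.lower()
            out.append(ch)
    return "".join(out)


m = re.match(r"(\w)(\w*)", "hELLO")
print(expand(r"\u$1\L$2", m))  # prints: Hello
print(expand(r"\U$0", m))      # prints: HELLO
```

The one-shot modifiers are checked before the persistent mode, which mirrors the "except for a char preceded by \l" and "\u" wording in the summary.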

You said :

If I want to change multiple text files, convert all words from lowercase letter to capital letter, with Find In Files and replace all option, how can I do that?

Your phrasing is quite ambiguous ! Do you mean :

A I want to convert all letters to their uppercase form ?
B I want to convert the first letter of each word to its uppercase form and leave all other letters of each word untouched ?
C I want to convert the first letter of each word to its uppercase form and change all subsequent letters of each word to their lowercase form ?
D I want to convert the first letter of a sentence to its uppercase form and leave all other letters of the sentence untouched ?
E I want to convert the first letter of a sentence to its uppercase form and change all subsequent letters of the sentence to their lowercase form ?
…

So, to my mind, if we consider traditional documents ( not code files ! ), this leads to these different regex S/R below :

For case A :
- SEARCH \w+
- REPLACE \U$0
For case B :
- SEARCH (\w)(\w*)
- REPLACE \u$1\E$2
For case C :
- SEARCH (\w)(\w*)
- REPLACE \u$1\L$2
For case D :
- SEARCH (?:\.\W*|\R)\K\w
- REPLACE \u$0
For case E :
- SEARCH (?:\.\W*|\R)\K(\w)(.+?)(?=\.|\R)
- REPLACE \u$1\L$2
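As a cross-check of the word-level cases A, B and C above, here is a small Python sketch. Since Python's `re` has no `\U`, `\u` or `\L` in replacement strings, each case uses a callback instead; the function names `case_a`, `case_b` and `case_c` are hypothetical labels matching the list above, not part of any library.

```python
import re

WORD_RE = re.compile(r"\w+")


def case_a(text):
    """Case A: every letter of every word to uppercase."""
    return WORD_RE.sub(lambda m: m.group(0).upper(), text)


def case_b(text):
    """Case B: first letter of each word to uppercase, rest untouched."""
    return WORD_RE.sub(lambda m: m.group(0)[0].upper() + m.group(0)[1:], text)


def case_c(text):
    """Case C: first letter to uppercase, remaining letters to lowercase."""
    return WORD_RE.sub(
        lambda m: m.group(0)[0].upper() + m.group(0)[1:].lower(), text
    )


sample = "hello wORLD"
print(case_a(sample))  # prints: HELLO WORLD
print(case_b(sample))  # prints: Hello WORLD
print(case_c(sample))  # prints: Hello World
```

The difference between B and C only shows up on words that already contain uppercase letters after the first one, such as wORLD here.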

Regarding cases D and E, some regex improvements could be needed as there are still some drawbacks with some specific expressions ! If I test the regexes against the license.txt file, it would match, for instance :

h and h@free, in the part ©2016 Don HO don.h@free.fr, giving ©2016 Don HO don.H@free.fr, after replacement
The letters a, b and c, in parts like a) You must cause…, giving A) you must cause…, after replacement

Best Regards

guy038

Hellena Crainicu

hello @guy038

Your regex replace all text, even outside <title> tag. I need to make the replacement only inside <title> </title> tag.

Please test your regex on this simple example:

<title>Semnificațiile Elocinței Lui Burke<title>

S-Ar Părea CĂ, De CÎteva Decenii încoace.

Also, check on the same example the cases you give.

If someone wants to convert all capital letters (with diacritics) into lowercase letter, doesn’t work any of your cases.

Alan Kilborn

@guy038 said in Regex: Change the first lowercase letters of each word to capital letter on html tags:

Now, Alan, a second and more simple version would be…

OK, but since we already have a recipe for replacing inside delimiters, I really don’t see value in simplification; just apply the recipe.

guy038

Hello, @hellena-crainicu, @alan-kilborn and All,

First, your highlighted example is erroneous ( as @alan-kilborn already told you ) ! You should have written this correct INPUT text, with a closing tag :

<title>semnificațiile elocinței lui burke</title>

Now, I’m sorry but my regex and the @alan-kilborn version work, both, as expected ! For instance, using the following sample, in a new tab :


s-ar părea că, de cîteva decenii încoace.

<title>semnificațiile elocinței lui burke</title>

s-ar părea că, de cîteva decenii încoace.

<title>semnificațiile elocinței lui burke</title>

s-ar părea că, de cîteva decenii încoace.

<title>semnificațiile elocinței lui burke</title>

s-ar părea că, de cîteva decenii încoace.

Open the Replace dialog ( Ctrl + H )
- SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w)(\w*) ( I omitted the useless -i modifier, near the end of the regex ! )
- REPLACE \U${1}\E${2}
OR
- SEARCH (?-i:<title>|(?!\A)\G)(?s:(?!</title>).)*?\K(\w+)
- REPLACE \u$0
Tick the Wrap around option ( IMPORTANT )
Select the Regular expression search mode
Click, once, on the Replace All button ( Do not use the Replace button ! )

You’ll get the expected OUTPUT text :


s-ar părea că, de cîteva decenii încoace.

<title>Semnificațiile Elocinței Lui Burke</title>

s-ar părea că, de cîteva decenii încoace.

<title>Semnificațiile Elocinței Lui Burke</title>

s-ar părea că, de cîteva decenii încoace.

<title>Semnificațiile Elocinței Lui Burke</title>

s-ar părea că, de cîteva decenii încoace.

As you can see, only the words between the <title> and </title> tags are affected, each getting its first letter in uppercase !

Now, in the second part of my previous post, when I spoke about cases A, B, …, I was not referring to your first challenge at all ! I just gave you general regexes, whatever text is embedded within <title>...........</title> tags or not !

BTW, I should have added the case F : I want to convert all letters in their lowercase form, leading to the regex S/R :

SEARCH \w+

REPLACE \L$0

Finally, I agree with you : accented characters are not handled by these case modifiers ! Last year, I created an issue regarding this problem. Refer to :

https://github.com/notepad-plus-plus/notepad-plus-plus/issues/9636

Unfortunately, because of the current global C locale and some other stuff, there was a performance issue. So, the fix for this issue was reverted by Don :

https://github.com/notepad-plus-plus/notepad-plus-plus/commit/6844df039d54557a93a75752d651d5b9bb49f7ed
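Until accented letters are handled inside Notepad++, one possible workaround for the multi-file lowercase conversion is a short script. The sketch below relies on Python's `str.lower()`, which follows the Unicode case mappings and so covers letters such as Ă and Î. The folder argument, the `*.txt` pattern and the UTF-8 encoding are all assumptions; adjust them to your files, and work on a backup copy since files are rewritten in place.

```python
from pathlib import Path


def lowercase_text(text):
    """Lowercase all letters, including accented ones (Unicode-aware)."""
    return text.lower()


def lowercase_files(folder, pattern="*.txt", encoding="utf-8"):
    """Lowercase every matching file in place; return the names that changed.

    Bytes are read and written directly so line endings are preserved.
    """
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        original = path.read_bytes().decode(encoding)
        converted = lowercase_text(original)
        if converted != original:
            path.write_bytes(converted.encode(encoding))
            changed.append(path.name)
    return changed


print(lowercase_text("S-Ar Părea CĂ, De CÎteva Decenii încoace."))
# prints: s-ar părea că, de cîteva decenii încoace.
```

Calling `lowercase_files` on a folder applies the same conversion to every matching file, roughly what Find in Files with Replace All would do.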

Best Regards,

guy038

@alan-kilborn : Yes, just an alternate method. Not essential ;-))

Hellena Crainicu

@guy038

I made a change to @guy038's regex, in order to do something slightly different.

So, if someone wants to convert only the first letter of the first word right after the <title> tag from lowercase to a capital letter:

Use this regex:

FIND: (<title>)(.\W*)
REPLACE BY: \1\U$2
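For comparison, here is a hedged Python sketch of the same goal: capitalize only the first letter that follows `<title>`. It is a variation rather than a translation of the regex above. It skips any leading non-word characters except `<`, so it cannot run past an empty title into the closing tag, and it uppercases the first word character it finds. The function name is hypothetical.

```python
import re

# <title>, then optional non-word chars (but never "<"), then one word char.
FIRST_LETTER_RE = re.compile(r"(<title>)([^\w<]*)(\w)")


def capitalize_title_start(html):
    """Uppercase only the first letter found right after each <title> tag."""
    return FIRST_LETTER_RE.sub(
        lambda m: m.group(1) + m.group(2) + m.group(3).upper(), html
    )


print(capitalize_title_start("<title>sunrise must gone for a moment</title>"))
# prints: <title>Sunrise must gone for a moment</title>
```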