Please help with a RegEx query

tzrtnlutz · Sep 26, 2018, 7:18 AM

I have this problem with RegEX:

Example:
Das ist richTig. ISt das ein TeXt. Ja, dAS ist in OrDnung.

If I do this:

Search: (?-is).+
Replace: \L$0

I get:
das ist richtig. ist das ein text. ja, das ist in ordnung.

The first letter of each word must be left unchanged. The correct result would be:
Das ist richtig. Ist das ein Text. Ja, das ist in Ordnung.

I hope someone can help. Many thanks!

gerdb42 · Sep 26, 2018, 8:20 AM

You may try this:
Search for: (\<.)(\w*)((?:\s|[[:punct:]])*)
Replace with: $1\L$2$3

Then hit “Replace all”

Breakdown:
(\<.) Matches the first character at the start of a word and captures it as group $1
(\w*) Matches all following word characters and captures them as group $2
((?:\s|[[:punct:]])*) Matches all following whitespace or punctuation characters and captures them as group $3

The replacement writes group $1 as is, followed by group $2 converted to lowercase, followed by group $3 as is. This is repeated for all subsequent words.
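
For readers who want to try the same idea outside Notepad++, here is a rough Python sketch of the search/replace above. It is only an approximation: Python's re module has no \L in replacement strings, no \< anchor and no [[:punct:]] class, so the callback, the \b(?=\w) start-of-word test and the string.punctuation character class are stand-ins chosen for this example. The function name is made up.

```python
import re
import string

# Stand-in for [[:punct:]]: the ASCII punctuation characters, escaped
# so they are safe inside a character class (an assumption of this sketch).
PUNCT_CLASS = "[" + re.escape(string.punctuation) + "]"

# \b(?=\w) approximates the \< (start of word) anchor used in Notepad++.
PATTERN = re.compile(r"\b(?=\w)(.)(\w*)((?:\s|" + PUNCT_CLASS + r")*)")


def lowercase_word_tails(text):
    """Keep group 1 as is, lowercase group 2, keep group 3 as is."""
    return PATTERN.sub(
        lambda m: m.group(1) + m.group(2).lower() + m.group(3), text
    )


print(lowercase_word_tails(
    "Das ist richTig. ISt das ein TeXt. Ja, dAS ist in OrDnung."
))
```

The callback plays the role of $1\L$2$3: only the second group is passed through lower().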

tzrtnlutz · Sep 26, 2018, 9:21 AM

Wonderful. That’s what I’ve done.
Big thanks again!

guy038 · Sep 26, 2018, 12:35 PM

Hello, @tzrtnlutz, @gerdb42 and All,

After some tests, I think that the regex can even be shortened:

SEARCH (\w)(\w*)

REPLACE $1\L$2

Indeed, the only ASCII character (so, below \x{0080}) that is both a word character and a punctuation character is the Low Line symbol _ (\x{005F}). But, as the regex \w* is greedy and matches as many word characters as possible, it will include any _ symbols anyway! Thus, the punctuation character that follows a word is necessarily different from _ :-))
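
The claim about the Low Line symbol can be checked quickly. This is a small Python sketch, assuming that string.punctuation (the 32 ASCII punctuation characters) is an acceptable stand-in for [[:punct:]] restricted to ASCII; the variable name is made up.

```python
import re
import string

# Collect every ASCII punctuation character that \w also matches.
word_and_punct = [c for c in string.punctuation if re.fullmatch(r"\w", c)]

print(word_and_punct)
```

Under that assumption the list contains a single entry, the underscore.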

Moreover, since the \w and \s character sets have no element in common either, the trailing part (?:\s|[[:punct:]])* is unnecessary!
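
As a quick sanity check of the shortened version, here is a minimal Python sketch. Python's re has no \L, so a callback stands in for $1\L$2; the function name is made up for this example.

```python
import re


def lowercase_tails_short(text):
    """Emulate SEARCH (\\w)(\\w*) with REPLACE $1\\L$2."""
    # Group 1 (first word character) is kept; group 2 (the rest) is lowercased.
    return re.sub(
        r"(\w)(\w*)", lambda m: m.group(1) + m.group(2).lower(), text
    )


print(lowercase_tails_short(
    "Das ist richTig. ISt das ein TeXt. Ja, dAS ist in OrDnung."
))
```

Whitespace and punctuation are never matched, so they pass through unchanged, which is why the third group is not needed.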

Cheers,

guy038

Scott Sumner · Sep 26, 2018, 12:43 PM

@guy038

As you know, I’m not much for shortening already published and working regexes here, but in this case I think it is worthwhile as it makes what is being done much clearer.