Please help with a RegEx query



  • I have this problem with a regex:

    Example:
    Das ist richTig. ISt das ein TeXt. Ja, dAS ist in OrDnung.

    If I use this:

    Search for: (?-is).+
    Replace with: \L$0

    I get:
    das ist richtig. ist das ein text. ja, das ist in ordnung.

    The first letter of each word must be left untouched. This would be the correct result:
    Das ist richtig. Ist das ein Text. Ja, das ist in Ordnung.

    I hope someone can help. Many thanks!



  • You may try this:
    Search for: (\<.)(\w*)((?:\s|[[:punct:]])*)
    Replace with: $1\L$2$3

    Then hit “Replace all”

    Breakdown:
    (\<.) matches the first character after the start of a word and captures it as group $1
    (\w*) matches all following word characters and captures them as group $2
    ((?:\s|[[:punct:]])*) matches all following whitespace or punctuation characters and captures them as group $3

    The replacement emits group $1 as is, then group $2 converted to lowercase, then group $3 as is. This repeats for every subsequent word.
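
For readers who want to try the idea outside Notepad++, the three-group replacement above can be approximated in Python. This is only a sketch under stated assumptions: Python's `re` module has no `\<` anchor, no `[[:punct:]]` class, and no `\L` in replacement strings, so `\b(?=\w)`, the ASCII-only `string.punctuation`, and a replacement function calling `str.lower()` stand in for them. The name `lowercase_after_first` is made up for illustration.

```python
import re
import string

# Assumed stand-ins for the Notepad++ syntax:
#   \b(?=\w)            approximates \< (start of a word)
#   string.punctuation  approximates [[:punct:]] (ASCII punctuation only)
PUNCT = re.escape(string.punctuation)
PATTERN = re.compile(r"(\b(?=\w).)(\w*)((?:\s|[" + PUNCT + r"])*)")

def lowercase_after_first(text: str) -> str:
    """Keep group 1 and group 3 as is, lowercase group 2."""
    return PATTERN.sub(
        lambda m: m.group(1) + m.group(2).lower() + m.group(3),
        text,
    )

sample = "Das ist richTig. ISt das ein TeXt. Ja, dAS ist in OrDnung."
result = lowercase_after_first(sample)
print(result)
```

With the sample sentence from the question, this should print the sentence with only the first letter of each word left in its original case.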



  • Wonderful. That’s what I’ve done.
    Big thanks again!



  • Hello, @tzrtnlutz, @gerdb42 and All,

    After some tests, I think the regex can even be shortened:

    SEARCH (\w)(\w*)

    REPLACE $1\L$2


    Indeed, the only ASCII character (that is, below \x{0080}) that is both a word character and a punctuation character is the Low Line symbol _ (\x{005F}). But since the regex \w* matches as many word characters as possible, it will consume any _ symbols anyway! Thus, the first punctuation character after a word is necessarily something other than _ :-))

    Moreover, since the \w and \s character sets have no element in common either, the trailing part (?:\s|[[:punct:]])* is unnecessary!

    Cheers,

    guy038
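
The shortened pattern can be checked the same way outside Notepad++. This is a minimal Python sketch, assuming a replacement function in place of `\L` (which Python's `re` module does not support); `shorten_case` is a hypothetical name. The last lines illustrate the point made about `_`: it is a word character, so `\w*` absorbs it along with the surrounding letters.

```python
import re

def shorten_case(text: str) -> str:
    # (\w)(\w*): keep the first word character, lowercase the rest of the word.
    return re.sub(
        r"(\w)(\w*)",
        lambda m: m.group(1) + m.group(2).lower(),
        text,
    )

sample = "Das ist richTig. ISt das ein TeXt. Ja, dAS ist in OrDnung."
result = shorten_case(sample)
print(result)

# The underscore counts as a word character, so \w* swallows it
# together with the letters on either side.
underscore_match = re.match(r"\w*", "foo_BAR baz").group(0)
print(underscore_match)
```

Whitespace and punctuation are never matched here, so they pass through untouched without needing a third group.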



  • @guy038

    As you know, I’m not much for shortening already published and working regexes here, but in this case I think it is worthwhile as it makes what is being done much clearer.

