    Please help for RegEx query

    • tzrtnlutz

      I have the following problem with a regex.

      Example:
      Das ist richTig. ISt das ein TeXt. Ja, dAS ist in OrDnung.

      If I use this:

      Search for: (?-is).+
      Replace with: \L$0

      the result is:
      das ist richtig. ist das ein text. ja, das ist in ordnung.

      The first letter of each word must be left untouched. The correct result would be:
      Das ist richtig. Ist das ein Text. Ja, das ist in Ordnung.

      I hope someone can help. Many thanks!
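
For readers who want to reproduce the behaviour outside Notepad++, here is a minimal Python sketch of why the attempt above lowercases everything. It is an approximation, not the Notepad++ engine: Python's `re` module has no `\L` in replacement strings and no unscoped `(?-is)` switch, so the lowercasing is done in a callback, and the default flags (case-sensitive, dot not matching newlines) stand in for `(?-is)`. The function name is hypothetical.

```python
import re

text = "Das ist richTig. ISt das ein TeXt. Ja, dAS ist in OrDnung."


def lowercase_whole_match(s: str) -> str:
    # ".+" matches the whole line in one go, so the full match (group 0)
    # also contains the first letter of every word. Lowercasing group 0
    # therefore lowercases those first letters too.
    return re.sub(r".+", lambda m: m.group(0).lower(), s)


print(lowercase_whole_match(text))
# das ist richtig. ist das ein text. ja, das ist in ordnung.
```

The match has to be split into "first letter" and "rest of the word" before only the rest can be lowercased.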

      • gerdb42

        You may try this:
        Search for: (\<.)(\w*)((?:\s|[[:punct:]])*)
        Replace with: $1\L$2$3

        Then hit “Replace all”

        Breakdown:
        (\<.) matches the first character after the start of a word and captures it as group $1
        (\w*) matches all the following word characters and captures them as group $2
        ((?:\s|[[:punct:]])*) matches all the following whitespace or punctuation characters and captures them as group $3

        The replacement writes group $1 as is, followed by group $2 converted to lowercase, followed by group $3 as is. “Replace all” repeats this for every subsequent word.
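
The three groups can be sketched in Python to check the result on the sample text. This is a translation under stated assumptions, not the Notepad++ regex itself: Python's `re` does not support `\<` or `[[:punct:]]`, so `\b(\w)` stands in for `(\<.)` and `string.punctuation` stands in for the ASCII part of `[[:punct:]]`. The names `PATTERN` and `keep_first_letter` are hypothetical.

```python
import re
import string

# Assumption: string.punctuation approximates [[:punct:]] for ASCII text only.
PUNCT = re.escape(string.punctuation)

# Group 1: first character of a word (\b followed by \w approximates \<.)
# Group 2: the remaining word characters
# Group 3: any following whitespace or punctuation
PATTERN = re.compile(r"\b(\w)(\w*)((?:\s|[" + PUNCT + r"])*)")


def keep_first_letter(s: str) -> str:
    # Equivalent of the replacement $1\L$2$3: only group 2 is lowercased.
    return PATTERN.sub(
        lambda m: m.group(1) + m.group(2).lower() + m.group(3), s
    )


sample = "Das ist richTig. ISt das ein TeXt. Ja, dAS ist in OrDnung."
print(keep_first_letter(sample))
# Das ist richtig. Ist das ein Text. Ja, das ist in Ordnung.
```

Note that the first letter is left as it was, not forced to uppercase, so "dAS" becomes "das".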

        • tzrtnlutz

          Wonderful. That’s what I’ve done.
          Big thanks again!

          • guy038

            Hello, @tzrtnlutz, @gerdb42 and All,

            After some tests, I think that the regex can even be shortened:

            SEARCH (\w)(\w*)

            REPLACE $1\L$2


            Indeed, the only ASCII character (that is, below \x{0080}) that is both a word character and a punctuation character is the low line symbol _ (\x{005F}). But as \w* is greedy and matches as many word characters as possible, it includes any _ symbols anyway! Thus, the punctuation character after a word is necessarily a character other than _ :-))

            Moreover, as the \w and \s sets of characters have no element in common either, the trailing part (?:\s|[[:punct:]])* is unnecessary! It is written back unchanged as $3, so dropping it does not change the result.
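
The two claims above (the only ASCII overlap between word characters and punctuation is `_`, and there is no overlap between `\w` and `\s`) can be checked with a short Python sketch, together with the shortened regex. As an assumption, Python's `re` and `string.punctuation` stand in for the Notepad++ engine and `[[:punct:]]`, so this only illustrates the ASCII case. The names `word_and_punct`, `word_and_space` and `shorten` are hypothetical.

```python
import re
import string

ascii_chars = [chr(i) for i in range(128)]

# ASCII characters that are both word characters and punctuation.
word_and_punct = [
    c for c in ascii_chars
    if re.fullmatch(r"\w", c) and c in string.punctuation
]

# ASCII characters that are both word characters and whitespace.
word_and_space = [
    c for c in ascii_chars
    if re.fullmatch(r"\w", c) and re.fullmatch(r"\s", c)
]


def shorten(s: str) -> str:
    # Equivalent of SEARCH (\w)(\w*) and REPLACE $1\L$2:
    # keep the first word character, lowercase the rest of the word.
    return re.sub(r"(\w)(\w*)", lambda m: m.group(1) + m.group(2).lower(), s)


print(word_and_punct)   # ['_']
print(word_and_space)   # []
print(shorten("Das ist richTig. ISt das ein TeXt. Ja, dAS ist in OrDnung."))
# Das ist richtig. Ist das ein Text. Ja, das ist in Ordnung.
```

Because `\w*` is greedy, a name such as `foo_BAR` is consumed as a single word and comes out as `foo_bar`.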

            Cheers,

            guy038

            • Scott Sumner, replying to @guy038

              @guy038

              As you know, I’m not much for shortening already published and working regexes here, but in this case I think it is worthwhile as it makes what is being done much clearer.

              The Community of users of the Notepad++ text editor.