Help needed with correcting upper/lower case (Hilfe bei Korrektur der Groß / Kleinschreibung benötigt)



  • Hi, I need to correct the capitalization of the forms of address in a script file.
    "Du, Sie, Er" and so on should be capitalized only at the beginning of a sentence.
    All other occurrences should be lowercase.
    The start of a line is not the start of the text to be translated, and the lines are also indented with spaces (no tabs)!

    Can anyone help me with this?
    Here is an example:

    "{i}Du bestellst einen Daiquiri für [b_name] und ein Bier für Dich selbst." <--- here "Dich" must be lowercase.
    p "Das hast Du und ich werde es dabei belassen! Du bist großartig, [p_name]!" <--- here the "Du" in the first sentence must be lowercase and the "Du" in the second sentence capitalized.
    

    I hope I've made myself clear.



  • @Atlan1000
    I assume you think this could be solved with regular expressions (regex). Unfortunately, that is not the case. The difficulty lies in creating rules that recognize when a sentence starts and when the word in question is a personal pronoun.
    If such patterns can be recognized in your case, you would have to show them to us here, so that someone can develop corresponding regexes.



  • So far I have been doing it like this: for example, I search for "Du " and replace everything with "du ", and then I use this as the search:

    ("du )|(! du )|(? du )|(. u )|(… du )|(( du ) <--- this covers all possible sentence starts except one. Somehow it doesn't work with the closing curly brace "}" of text formatting tags, but that's the lesser evil…

    and replace it with this:

    (?1"Du )(?2! Du )(?3? Du )(?4. Du )(?5... Du )(?6( Du )

    I then repeat this for every pronoun:

    Ich, Du, Er, Sie, Es, Wir, Ihr, Sie, Deine, Euer, Ihre, and so on…

    This is very time-consuming and annoying.
    I know that regex can handle complex replacements like this.
    A possible solution doesn't have to cover everything; I only need an approach for 2 pronouns, and I can work out the rest myself.

    Thanks in advance for the trouble I'm causing.
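    The per-pronoun routine described in this post (lower-case every occurrence, then re-capitalize after a sentence start, repeated for each pronoun) can be scripted instead of done by hand. The following is only a sketch in Python's re module, which is an assumption: the thread itself works in Notepad++'s search/replace. The PRONOUNS list is a hypothetical subset, and treating the closing } of a {...} tag as a sentence start is an added assumption, since the post says that case didn't work.

```python
import re

# Hypothetical subset of the pronoun list from the post.
PRONOUNS = ["Du", "Dich", "Er", "Sie"]

# Sentence starts as described in the post: a double quote directly before the
# pronoun, or "!", "?", "." (which also covers "...") or "(" plus one space.
# The closing "}" of a {...} tag is an added assumption.
STARTS = r'(?:"|\}|[!?.(] )'


def fix_pronoun(text, pronoun):
    lower = pronoun.lower()
    # Step 1: lower-case every whole-word occurrence ("Du" -> "du").
    text = re.sub(r"\b" + re.escape(pronoun) + r"\b", lower, text)
    # Step 2: capitalize it again where it directly follows a sentence start.
    start_then_lower = re.compile("(" + STARTS + ")" + re.escape(lower) + r"\b")
    return start_then_lower.sub(lambda m: m.group(1) + pronoun, text)


def fix_all(text, pronouns=PRONOUNS):
    # Repeat the two steps for every pronoun in the list.
    for pronoun in pronouns:
        text = fix_pronoun(text, pronoun)
    return text


print(fix_all('"{i}Du bestellst einen Daiquiri für [b_name] und ein Bier für Dich selbst."'))
print(fix_all('p "Das hast Du und ich werde es dabei belassen! Du bist großartig, [p_name]!"'))
```

    Like the manual procedure, this cannot tell formal Sie from informal sie; it only looks at the character before the word.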



  • Sorry, here is the correct expression:

    Search:
    ("du )|(! du )|(\? du )|(\. du )|(\.\.\. du )|(\( du )

    Replace:
    (?1"Du )(?2! Du )(?3\? Du )(?4\. Du )(?5\.\.\. Du )(?6\( Du )
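    Python's re module (used here only for illustration; Notepad++ uses the Boost engine) has no (?1…)(?2…) conditional replacement syntax. A sketch of the same idea passes a function as the replacement and picks the output by the number of the group that matched. It assumes the fourth alternative is meant to be \. du ; the function names are hypothetical.

```python
import re

# The search from the post: six alternatives, one capture group each.
SEARCH = re.compile(r'("du )|(! du )|(\? du )|(\. du )|(\.\.\. du )|(\( du )')

# One output per group, mirroring (?1"Du )(?2! Du )(?3\? Du )(?4\. Du )(?5\.\.\. Du )(?6\( Du )
OUTPUTS = {1: '"Du ', 2: "! Du ", 3: "? Du ", 4: ". Du ", 5: "... Du ", 6: "( Du "}


def conditional_replace(m):
    # Only one alternative takes part in a match, so m.lastindex is the
    # number of the group that matched.
    return OUTPUTS[m.lastindex]


def capitalize_du(text):
    return SEARCH.sub(conditional_replace, text)


print(capitalize_du('p "du kommst? du weißt es... du bleibst. du gehst ( du auch )"'))
```

    Since the alternatives differ only in their prefix, the same effect can also be had by capturing the prefix once and changing only the case of the pronoun.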
    


  • @Atlan1000 ,

    Please use English as the primary language in this forum. If you want to double-post with both English and German, that is fine. (Bitte verwenden Sie Englisch als Hauptsprache in diesem Forum. Wenn Sie sowohl auf Englisch als auch auf Deutsch doppelt posten möchten, ist das in Ordnung. My high-school German from decades ago was able to piece together “Bitte nur auf Englisch”, but I hope the translated sentences are more nuanced than that, and better match my original English sentences.)

    In your original post, I was under the impression that you had lots of capitalized pronouns (Du, Dich, …) and wanted to remove the capitalization when they weren't at the beginning of a sentence. Now it looks like you have lots of lower-case pronouns (du, dich, …) and want to capitalize them only when they start a sentence.

    Either way, we can make use of the replacement escapes \u or \l to uppercase or lowercase one letter. That simplifies the replacement significantly, so you don’t have to catch the different pronouns separately. You can also then capture the previous end-of-sentence without having to manually select through the options in the replacement.
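    As a sketch of what \u and \l do, assuming Python's re module (its replacement strings have no \u or \l, so small helper functions stand in for them): one capture group holds whichever sentence ender was found and a second holds the pronoun. The helper names and the short pronoun list are hypothetical.

```python
import re


def upper_first(s):
    # What \u does: upper-case the next character only.
    return s[:1].upper() + s[1:]


def lower_first(s):
    # What \l does: lower-case the next character only.
    return s[:1].lower() + s[1:]


# Group 1 captures whichever sentence ender (plus spaces) was found, so one
# replacement handles every ender and every pronoun in the list.
PATTERN = re.compile(r"([.!?]\x20+)(du|dich|sie)\b")


def capitalize(text):
    return PATTERN.sub(lambda m: m.group(1) + upper_first(m.group(2)), text)


print(capitalize("Hallo. du bist da! dich kenne ich? sie auch."))
```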

    Since you know about conditional replacement, I will assume that you are okay with grouping as I use it. If you have questions about the meaning of the expressions below, ask specific questions.

    To convert the start of a sentence to upper case, you might use something like:

    • FIND = (?x) (}\x20* | \.\.\.\x20+ | \.\x20+ | \?\x20+ | !\x20+ | \(\x20+ | "\x20+ ) (?-i: (du | dich | dein | sie) ) \b
      • I am using (?x) in the search to allow putting in extra whitespace to make it more readable. That then requires using \x20 to match a space character.
      • I put a boundary \b at the end, in case there are other words that start with the same sequence as your list of pronouns.
    • REPLACE = ${1}\u${2}
    • REPLACE multiple times, or REPLACE ALL

    converts

    {i}du
    ... end." du 
    blah. dein
    etc... sie
    Somthing else ( du hast recht. )
    are you sure? du bist.
    I don't think so! du ...
    

    into

    {i}Du
    ... end." Du 
    blah. Dein
    etc... Sie
    Somthing else ( Du hast recht. )
    are you sure? Du bist.
    I don't think so! Du ...
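    For trying the same search outside Notepad++, here is a sketch of a port to Python's re module (an assumption, not what the thread used). re.VERBOSE takes the place of (?x), and since re is case-sensitive by default, the (?-i: ...) wrapper is dropped. The function name is hypothetical.

```python
import re

# Port of  (?x) (}\x20* | \.\.\.\x20+ | \.\x20+ | \?\x20+ | !\x20+ | \(\x20+ | "\x20+ )
#               (?-i: (du | dich | dein | sie) ) \b
FIND = re.compile(r"""
    ( \}\x20*        # closing brace of a {...} tag, spaces optional
    | \.\.\.\x20+    # ellipsis
    | \.\x20+        # full stop
    | \?\x20+        # question mark
    | !\x20+         # exclamation mark
    | \(\x20+        # opening parenthesis
    | "\x20+         # double quote
    )
    (du|dich|dein|sie) \b
""", re.VERBOSE)


def capitalize_sentence_starts(text):
    # Equivalent of REPLACE = ${1}\u${2}: keep group 1, upper-case the
    # first letter of group 2.
    return FIND.sub(lambda m: m.group(1) + m.group(2)[:1].upper() + m.group(2)[1:], text)


SAMPLE = [
    "{i}du",
    '... end." du ',
    "blah. dein",
    "etc... sie",
    "Something else ( du hast recht. )",
    "are you sure? du bist.",
    "I don't think so! du ...",
]

for line in SAMPLE:
    print(capitalize_sentence_starts(line))
```

    As in the original expression, the double-quote branch requires at least one space, so a pronoun directly after a quote (p "du bist) is left unchanged.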
    

    If you want to go the other way (find capitalized pronouns that don't follow a sentence ending, and make them lower case), look for instances of the pronouns that don't have a sentence-ender before them, and use \l in the replacement to make the next word start with a lower-case letter.



  • I assume the start of sentence rule can be simplified to something like (?<=["!\?\.\}])(\s*)(\w) and replace with \1\u\2.
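    A Python sketch of this simplified rule (assuming Python's re module; the function name is hypothetical). Note that it capitalizes the first word character after any of the listed characters, whether or not that word is a pronoun, and that a closing quote counts as well:

```python
import re

# Port of (?<=["!\?\.\}])(\s*)(\w) with replacement \1\u\2.
SIMPLE_START = re.compile(r'(?<=["!?.}])(\s*)(\w)')


def capitalize_after_enders(text):
    # Keep the whitespace (group 1), upper-case the word character (group 2).
    return SIMPLE_START.sub(lambda m: m.group(1) + m.group(2).upper(), text)


print(capitalize_after_enders("{i}du und sie. er kommt"))
print(capitalize_after_enders('"ja" sagt er'))
```

    The second call shows the catch: the closing quote after "ja" is treated like a sentence end, so "sagt" becomes "Sagt".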



  • @Ekopalypse said in Hilfe bei Korrektur der Groß / Kleinschreibung benötigt:

    I assume the start of sentence rule can be simplified

    I was originally aiming for that. But with the } rule requiring 0 spaces in the original example, vs. everything else requiring at least one space in the implied rules, I had trouble nesting the parentheses the way I wanted, so I just put the spaces separately in each alternative.

    And, "of course, there are exceptions to every end-of-sentence rule," he says discouragingly.

    (By that, I mean to say that according to the rules presented by the OP, “He” would be capitalized; but it shouldn’t be. And if the OP is trying to force non-sentence-start pronouns to lower case as originally implied, I hope there aren’t any formal vs informal “Sie” pronouns in non-sentence-start locations. ;-) )
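    Regarding the nesting problem mentioned above: one way to let only the } branch allow zero spaces is to factor the shared \x20+ out of the other alternatives. Sketched with Python's re module (an assumption; the grouping has not been checked in Notepad++ here), with a hypothetical function name:

```python
import re

# Only the "}" branch allows zero spaces; "...", ".", "?", "!", "(" and '"'
# share a single \x20+ after the inner non-capturing group.
START_THEN_PRONOUN = re.compile(r"""
    ( \}\x20*
    | (?: \.\.\. | [.?!("] ) \x20+
    )
    (du|dich|dein|sie) \b
""", re.VERBOSE)


def capitalize_starts(text):
    return START_THEN_PRONOUN.sub(
        lambda m: m.group(1) + m.group(2)[:1].upper() + m.group(2)[1:], text)


for line in ["{i}du", "ja. dein", 'sagt er." du', "etc... sie", "und du"]:
    print(capitalize_starts(line))
```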



  • @PeterJones said in Hilfe bei Korrektur der Groß / Kleinschreibung benötigt:

    I hope there aren’t any formal vs informal “Sie” pronouns in non-sentence-start locations.

    Well :-), they are there, but OP wants to get rid of them :-D
    I would be more concerned about identifying the pronouns.
    I mean, what would Klaus Meine, singer of the German band Scorpions, think when he sees his name as Klaus meine (mine)? Or what would the people in Paris think when the Seine suddenly becomes seine (his)?



  • @Atlan1000 ,

    The hopefully humorous exchange between @Ekopalypse and me is meant to show some of the reasons why we both discourage using super-complicated regexes to identify the beginning or end of sentences, or parts of speech. Even applications like MS Word's spellchecker or translation software (like Google Translate or deepl.com) sometimes have trouble parsing natural language correctly. In a single regex, even identifying something like the "start of a sentence" or "this list of pronouns" soon runs into so many exceptions that the regex becomes too unwieldy to understand.

    Hopefully, our suggestions will help you in the limited case you showed… but understand that you will find edge cases, and making the regex catch them all would be difficult at best…



  • @PeterJones,

    First of all, thank you for the effort.
    The expression works so far, but it is not what I am looking for.
    Maybe I didn’t express myself correctly:
    What I need as a result is for the pronouns to be capitalized only at the beginning of a sentence and after "… ", and for all other occurrences to be lowercase.
    In German, forms of address within a sentence are increasingly written in lowercase, as in English, where usually only proper nouns are capitalized.
    Capitalizing these pronouns is really only done in letters.

    So no spelling or grammar check at all!



  • @Atlan1000 ,

    So no spelling or grammar check at all!

    No regex solution will be able to tell the difference between formal Sie and informal sie. No regex solution will be able to tell the difference between Klaus Meine and Meine Freundin: it would blindly lowercase both of them. I'm just saying that most languages have exceptions to every rule.

    but it is not what I am looking for.

    I showed you that particular regular expression because your example data suggested you wanted to go from lowercase to uppercase, and that's what the regex I showed does. I would have done the other direction first, but that's not what your example showed. And I did describe the process for converting in the other direction.

    If I were to do the full task, I would do it in two steps. First, the step I already showed, which makes sure that all the lowercase pronouns at the beginning of the sentence are properly capitalized. Second, I would make a similar regex, but I would make it look for those same pronouns that aren’t at the beginning of a sentence, and make those lowercase.

    You seem to have some regex skill, so I was hoping you would know how to negate the first part of the regular expression, change the case of the pronoun list so it matches the capitalized forms, and switch from \u to \l in the replacement. I highly recommend that you give it a try, then show us what you tried and ask for help if it doesn't work.

    I don’t have the time to debug regex right now (need to get back to focusing on what my boss pays me for), but if you can’t figure it out, maybe one of the other regex gurus will be able to chime in with help for the other direction.



  • @PeterJones,

    OK, this is how I've been doing it so far (in two steps, as you also suggested):
    First I converted all pronouns to lowercase using this regex.
    (I have a separate, similar regex for each pronoun, i.e. for "Ich, Du, Er, Sie, Es, Wir, Ihr, Sie, Meine, Deine, Ihre, Eure, Unsere".)

    Here is an example with “Du”:

    • FIND = ("Du )|(! Du )|(\? Du )|(\. Du )|(\.\.\. Du )|(\( Du )|(, Du)|(Du\.\.\.)|( Du )|(}Du )

    • REPLACE = (?1"du )(?2! du )(?3\? du )(?4\. du )(?5\.\.\. du )(?6\( du )(?7, du)(?8du\.\.\. )(?9 du )(?10}du )

    • REPLACE ALL

    Before:

    p "Du bist nicht zu sehr erschüttert über das, was er gesagt hat."
        p "Ich weiß, Du stehst auf schöne Sachen..."
        ri "Sag [p_name], Du weißt ich mag Dich und wir haben uns ein paar Mal geküsst... Du bist sehr liebenswert..."
        p "Du mußt nichts tun, worauf du keine Lust hast."
        "{i}Du und [p_name] schmiegt euch aneinander."
    

    After:

    p "du bist nicht zu sehr erschüttert über das, was er gesagt hat."
        p "Ich weiß, du stehst auf schöne Sachen..."
        ri "Sag [p_name], du weißt ich mag Dich und wir haben uns ein paar Mal geküsst... du bist sehr liebenswert..."
        p "du mußt nichts tun, worauf du keine Lust hast."
        "{i}du und [p_name] schmiegt euch aneinander."
    

    In the second step, I convert the pronouns at the start of a sentence to uppercase:

    • FIND = ("du )|(! du )|(\? du )|(\. du )|(\.\.\. du )|(\(du )|( du )|(}du )

    • REPLACE = (?1"Du )(?2! Du )(?3\? Du )(?4\. Du )(?5\.\.\. Du )(?6\(Du )(?7 du )(?8}Du )

    • REPLACE ALL

    The correct result, as I need it for all pronouns:

       p "Du bist nicht zu sehr erschüttert über das, was er gesagt hat."
       p "Ich weiß, du stehst auf schöne Sachen..."
       ri "Sag [p_name], du weißt ich mag Dich und wir haben uns ein paar Mal geküsst... Du bist sehr liebenswert..."
       p "Du mußt nichts tun, worauf du keine Lust hast."
       "{i}Du und [p_name] schmiegt euch aneinander."
    

    I’ve been trying all day to understand how your regex from yesterday works and rebuild it accordingly, but I’m just not succeeding.

    PS: If there really are proper nouns like the ones you mentioned above, I'll handle them with my own regex. I've already run into this problem, and I always create a regex containing the proper nouns first.
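    One way to do the "proper nouns first" step is to hide them behind placeholders, run the pronoun conversion, and then put them back. A sketch in Python's re module (an assumption); the pronoun and proper-noun lists, the @@n@@ placeholder format (assumed not to occur in the script), and the function names are all hypothetical. The lowercase step uses the same kind of lookbehind as the search for mid-sentence pronouns discussed in this thread.

```python
import re

# Hypothetical lists: adjust to the actual script.
PRONOUNS = ["Du", "Dich", "Meine", "Deine"]
PROPER_NOUNS = ["Klaus Meine"]

# Lower-case a capitalized pronoun after spaces whose preceding character is
# neither a sentence ender nor whitespace.
LOWER_MID = re.compile(r'(?<=[^"!?.}\s])(\x20+)(' + "|".join(PRONOUNS) + r")\b")
NAMES = re.compile(r"\b(?:" + "|".join(map(re.escape, PROPER_NOUNS)) + r")\b")
TOKEN = re.compile(r"@@(\d+)@@")


def lower_mid_sentence(text):
    return LOWER_MID.sub(lambda m: m.group(1) + m.group(2)[:1].lower() + m.group(2)[1:], text)


def convert_protecting_names(text):
    saved = []

    def mask(m):
        # Replace each proper noun with a numbered placeholder.
        saved.append(m.group(0))
        return "@@%d@@" % (len(saved) - 1)

    masked = NAMES.sub(mask, text)
    converted = lower_mid_sentence(masked)
    # Put the original proper nouns back.
    return TOKEN.sub(lambda m: saved[int(m.group(1))], converted)


print(lower_mid_sentence("Klaus Meine singt."))
print(convert_protecting_names("Klaus Meine singt, Meine Freundin lacht."))
```

    Without the masking step, lower_mid_sentence alone turns "Klaus Meine" into "Klaus meine", which is exactly the problem raised earlier in the thread.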



  • @Atlan1000 ,

    So, with your example data

    p "Du bist nicht zu sehr erschüttert über das, was er gesagt hat."
        p "Ich weiß, Du stehst auf schöne Sachen..."
        ri "Sag [p_name], Du weißt ich mag Dich und wir haben uns ein paar Mal geküsst... Du bist sehr liebenswert..."
        p "Du mußt nichts tun, worauf du keine Lust hast."
        "{i}Du und [p_name] schmiegt euch aneinander."
    

    I have followed what I said earlier, but started with @Ekopalypse’s simpler regex:

    • I converted the first part to match a non-end-of-sentence character by negating the character class.
    • I made sure there was at least one space between the preceding character and the pronoun.
    • I searched for the capitalized versions of the pronouns in the alternation list.

    The new SEARCH expression for the convert-to-lowercase half of the assignment is (?x) (?<=[^"!\?\.\}\h]) (\x20+) (?-i: ( Du | Dich | Dein ) ), which finds these matches:
    [Screenshot: the matches found in the example data]

    and if you REPLACE with ${1}\l${2}, it gives me:

    p "Du bist nicht zu sehr erschüttert über das, was er gesagt hat."
        p "Ich weiß, du stehst auf schöne Sachen..."
        ri "Sag [p_name], du weißt ich mag dich und wir haben uns ein paar Mal geküsst... Du bist sehr liebenswert..."
        p "Du mußt nichts tun, worauf du keine Lust hast."
        "{i}Du und [p_name] schmiegt euch aneinander."
    

    Since I searched for Du and Dich and Dein, my example also lower-cased the Dich (which your “correct result” didn’t do). This was to show you how to add pronouns to the list.
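    The same search and replacement, sketched as a Python port (an assumption; Python's re module is not what Notepad++ uses). Python's re does not support \h, so \s is used in the character class instead; as a side effect, a pronoun that opens an indented line is not matched. The function name is hypothetical.

```python
import re

# Port of  SEARCH  = (?x) (?<=[^"!\?\.\}\h]) (\x20+) (?-i: ( Du | Dich | Dein ) )
#          REPLACE = ${1}\l${2}
LOWER_MID = re.compile(r'''
    (?<=[^"!?.}\s])   # previous character: no sentence ender, no whitespace
    (\x20+)           # one or more spaces
    (Du|Dich|Dein)    # capitalized pronoun (re is case-sensitive by default)
''', re.VERBOSE)


def lower_mid_sentence(text):
    # Keep the spaces (group 1), lower-case the first letter of group 2.
    return LOWER_MID.sub(lambda m: m.group(1) + m.group(2)[:1].lower() + m.group(2)[1:], text)


SAMPLE = [
    'p "Du bist nicht zu sehr erschüttert über das, was er gesagt hat."',
    'p "Ich weiß, Du stehst auf schöne Sachen..."',
    'ri "Sag [p_name], Du weißt ich mag Dich und wir haben uns ein paar Mal geküsst... Du bist sehr liebenswert..."',
    'p "Du mußt nichts tun, worauf du keine Lust hast."',
    '"{i}Du und [p_name] schmiegt euch aneinander."',
]

for line in SAMPLE:
    print(lower_mid_sentence(line))
```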



  • @PeterJones

    Yes! that’s what I was looking for, thank you!

    “Since I searched for Du and Dich and Dein, my example also lower-cased the Dich (which your “correct result” didn’t do). This was to show you how to add pronouns to the list.”

    Yes, I know that "Dich" couldn't change with my expression, because until now I had to use a separate regex for EACH pronoun, and I only wanted to show my procedure using the regex for "Du" as an example.

    I’m still at the very beginning of learning what is possible with regex and am grateful for any help - especially because the scripts I have to correct are each several thousand lines long.
    Through your solution I have learned a lot, especially how a positive lookbehind works!
    I have now added the other pronouns I need to your expression and it works beautifully!

    So thanks again for your support!



  • @Atlan1000 said in Hilfe bei Korrektur der Groß / Kleinschreibung benötigt:

    Through your solution I have learned a lot, especially how a positive lookbehind works!

    Glad to help.
