How to find lines with words that are Capitalised



  • Hi All,

    I have a huge txt file open in Notepad++ which has thousands of lines. I’m looking for lines in which the words of a sentence are capitalised. For example:

    Your Tea Is Ready
    The Cat Is Out Of The Bag

    I don’t want to read the whole document; I just want to highlight/find those lines and delete them. There must be a quicker way to search than going through it all.

    Please can you help? I’m a novice.

    Thanks,
    MA



  • Let’s see if I can beat @guy038 to it:

    As a first pass, I’ll assume one sentence per line (every line is a sentence, and there is only one sentence per line). I’ll also assume we aren’t requiring upper case as the first character in the line.

    • Find What: (?-is)^.+[A-Z].*\R
    • Replace With: (leave empty)
    • Search Mode: Regular expression

    Source Document:

    Keep this line.
    Hello, my name is Inigo Montoya, prepare to delete this line.
    This is okay.
    Delete me, Yoda shall.
    starting lowercase is okay, too
    but not if there's a Capital
    

    Final:

    Keep this line.
    This is okay.
    starting lowercase is okay, too
    

    Explanation:

    • (?-is) = make the search case sensitive (-i), and don’t let . match newlines (-s), so a match can’t span multiple lines
    • ^ = start match at beginning of the line
    • .+ = allow 1 or more of any character (this means that it won’t care if the first letter is upper case, because that will be taken up by the “allow 1”)
    • [A-Z] = require one capital letter somewhere after the first character
    • .* = allow 0 or more any-character after the required capital letter
    • \R = include the newline (either CRLF or just LF) in the match
    • If all of that matches, replace it with nothing (since the newline is replaced too, the whole line is deleted)
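
    For anyone who would rather script this than use the Replace dialog, here is a minimal Python sketch of the same idea. Python’s `re` module does not support `\R`, so this version substitutes `\r?\n`; the function name `delete_capitalised_lines` is made up for illustration. As with the Notepad++ version, a final line with no trailing newline is left alone.

```python
import re

# Same idea as the Notepad++ pattern (?-is)^.+[A-Z].*\R
# Python's re is case sensitive and keeps "." from matching "\n" by default,
# so (?-is) is not needed. "\r?\n" stands in for \R (an assumption of this sketch).
PATTERN = re.compile(r"^.+[A-Z].*\r?\n", re.MULTILINE)


def delete_capitalised_lines(text: str) -> str:
    """Remove every line that has a capital letter after its first character."""
    return PATTERN.sub("", text)


sample = (
    "Keep this line.\n"
    "Hello, my name is Inigo Montoya, prepare to delete this line.\n"
    "This is okay.\n"
    "Delete me, Yoda shall.\n"
    "starting lowercase is okay, too\n"
    "but not if there's a Capital\n"
)

result = delete_capitalised_lines(sample)
print(result, end="")
```

    Run on the source document above, this should leave the same three lines as the Final listing.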

    However, this will not work when a sentence spans multiple lines, or when there are multiple sentences on one line:

    Multiline sentence won't work,
    because it will assume Previous Line is okay, even though this has capital,
    and the second line of this sentence will delete, but not the first or third.
    
    Multiple sentences won't work.  Defining end of sentence is harder.
    Some People use period-two-spaces, and some only period-space.
    And how do you want to handle Mr. John Doe?
    Or sentences that "end inside the quote."  And this becomes a Second sentence
    

    If you want more than the assumptions I made above, you will have to give a lot more details, and you may have to accept that you need a more intelligent parser than just a readable regex. (@guy038 may be able to get all of the edge cases in one regex, depending on what your rules are… but it will be a lot more complex than the one I showed…)
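
    To see the multi-line limitation concretely, here is a small Python sketch that lists which lines the first-pass pattern would delete from the multi-line sample above. The helper name `lines_that_would_be_deleted` is hypothetical, and the sample is assumed to use plain LF line endings.

```python
import re

# First-pass rule: a capital letter anywhere after the first character of the line.
# "$" with re.MULTILINE stops the match at the end of each line.
LINE_WITH_CAPITAL = re.compile(r"^.+[A-Z].*$", re.MULTILINE)


def lines_that_would_be_deleted(text: str) -> list[str]:
    """Return the lines the first-pass pattern would remove."""
    return LINE_WITH_CAPITAL.findall(text)


multiline = (
    "Multiline sentence won't work,\n"
    "because it will assume Previous Line is okay, even though this has capital,\n"
    "and the second line of this sentence will delete, but not the first or third.\n"
)

flagged = lines_that_would_be_deleted(multiline)
for line in flagged:
    print(line)
```

    Only the middle line is flagged, so the sentence would lose its second line while the first and third survive. The pattern works line by line and has no notion of where a sentence starts or ends.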

