    How to find lines with words that are Capitalised

    • alsolarisportal m

      Hi All,

      I have a huge txt file open in Notepad++ which has thousands of lines. I'm looking for lines in which the words of a sentence are capitalised. For example:

      Your Tea Is Ready
      The Cat Is Out Of The Bag

      I don't want to read the whole document; I just want to highlight/find those lines and delete them. There must be a quicker way to search than going through it all.

      Please can you help? I'm a novice.

      Thanks,
      MA

      • PeterJones

        Let’s see if I can beat @guy038 to it:

        As a first pass, I'll assume one sentence per line (every line is a sentence, and there is only one sentence per line). I'll also assume that a capital as the first character of the line doesn't count, so only a capital later in the line triggers deletion.

        • Find What: (?-is)^.+[A-Z].*\R
        • Replace With: `` (empty)
        • Search Mode: Regular expression

        Source Document:

        Keep this line.
        Hello, my name is Inigo Montoya, prepare to delete this line.
        This is okay.
        Delete me, Yoda shall.
        starting lowercase is okay, too
        but not if there's a Capital
        

        Final:

        Keep this line.
        This is okay.
        starting lowercase is okay, too
        

        Explanation:

        • (?-is) = make it case sensitive (-i) and stop . from matching newlines (-s), so a match can't span multiple lines
        • ^ = start match at beginning of the line
        • .+ = allow 1 or more of any character (this means that it won’t care if the first letter is upper case, because that will be taken up by the “allow 1”)
        • [A-Z] = require one capital letter somewhere after the first character
        • .* = allow 0 or more of any character after the required capital letter
        • \R = include the line ending (CRLF, LF, or CR) in the match
        • If all that matches, then replace with blank (since the line ending is replaced too, the whole line is deleted)
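
For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the search-and-replace above. It is an approximation, not the Notepad++ expression itself: Python's re module has no \R, so the line ending is written as \r?\n, and the function name delete_capitalised_lines is just a placeholder for this example.

```python
import re

# Python is case sensitive and keeps "." from matching "\n" by default,
# which plays the role of (?-is). re.MULTILINE makes ^ match at every
# line start. \r?\n stands in for \R, which Python's re does not support.
LINE_WITH_LATER_CAPITAL = re.compile(r"^.+[A-Z].*\r?\n", re.MULTILINE)


def delete_capitalised_lines(text: str) -> str:
    """Remove every line that has a capital letter after its first character."""
    return LINE_WITH_LATER_CAPITAL.sub("", text)


sample = (
    "Keep this line.\n"
    "Hello, my name is Inigo Montoya, prepare to delete this line.\n"
    "This is okay.\n"
    "Delete me, Yoda shall.\n"
    "starting lowercase is okay, too\n"
    "but not if there's a Capital\n"
)

print(delete_capitalised_lines(sample), end="")
```

Run on the source document above, this should print the three lines shown under "Final". As with the \R version, a last line that has no trailing line ending is not matched in this sketch, so it would stay even if it contains a capital.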

        However, this will not work when there are multiline sentences, or if you have multiple sentences in a line:

        Multiline sentence won't work,
        because it will assume Previous Line is okay, even though this has capital,
        and the second line of this sentence will delete, but not the first or third.
        
        Multiple sentences won't work.  Defining end of sentence is harder.
        Some People use period-two-spaces, and some only period-space.
        And how do you want to handle Mr. John Doe?
        Or sentences that "end inside the quote."  And this becomes a Second sentence
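
Both limitations can be reproduced with a small Python sketch. It uses an approximation of the expression (Python has no \R, so \r?\n is used instead), and the helper name surviving_lines is made up for this example.

```python
import re

# Approximation of (?-is)^.+[A-Z].*\R for Python's re module.
LINE_WITH_LATER_CAPITAL = re.compile(r"^.+[A-Z].*\r?\n", re.MULTILINE)


def surviving_lines(text: str) -> list:
    """Return the lines that are left after the replace-with-empty step."""
    return LINE_WITH_LATER_CAPITAL.sub("", text).splitlines()


# One sentence spread over three lines: only the middle line is removed,
# so the sentence is left broken rather than deleted as a whole.
multiline_sentence = (
    "Multiline sentence won't work,\n"
    "because it will assume Previous Line is okay, even though this has capital,\n"
    "and the second line of this sentence will delete, but not the first or third.\n"
)

# Two ordinary sentences on one line: the capital that starts the second
# sentence counts as a capital "after the first character", so the whole
# line is removed.
two_sentences = "Multiple sentences won't work.  Defining end of sentence is harder.\n"

print(surviving_lines(multiline_sentence))
print(surviving_lines(two_sentences))
```

The first call keeps the first and third lines of the sentence and drops the second; the second call returns an empty list, because the expression works on lines and has no notion of where a sentence starts or ends.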
        

        If you want more than the assumptions I made above, you will have to give a lot more details, and you may have to accept that you need a more intelligent parser than just a readable regex. (@guy038 may be able to get all of the edge cases in one regex, depending on what your rules are… but it will be a lot more complex than the one I showed…)

        The Community of users of the Notepad++ text editor.