
    Line break before every UPPERCASE word

    • floyddebarber

      Hey!

      I have text files of scanned tables that were OCRed into a single line. The original table had essentially 3 columns: an UPPERCASE surname, a number (rating), and a dividing dash followed by a couple of sentences of text (which I don’t even need).
      Something like this:

      MÜLLER 6 - Blahblah. SMITH 5 - Asdds. Asdsd. DI CARLO 8,5 - And. Maybe even. Multiple. Sentences here.
      

      to

      MÜLLER 6 - Blahblah. 
      SMITH 5 - Asdds. Asdsd. 
      DI CARLO 8,5 - And. Maybe even. Multiple. Sentences here.
      

      Can you help me out with an expression to break the lines before every completely UPPERCASE word, but not at every sentence?
      Also, is there an elegant way to replace the space between the name and the number without affecting the spaces in multipart names?

      Thank you!

      • PeterJones (in reply to @floyddebarber)

        @floyddebarber said in Line break before every UPPERCASE word:

        MÜLLER 6 - Blahblah. SMITH 5 - Asdds. Asdsd. DI CARLO 8,5 - And. Maybe even. Multiple. Sentences here.

        FIND = (?-i)\h+(\b\u{2}[\u\x20]+)
        REPLACE = \r\n$1
        SEARCH MODE = regular expression

        important concepts:

        • \h and \u and [...] = character classes: https://npp-user-manual.org/docs/searching/#character-classes
        • + and {2} = multiplying operators: https://npp-user-manual.org/docs/searching/#multiplying-operators
        • \b = anchors: https://npp-user-manual.org/docs/searching/#anchors
        • (?-i) = search modifiers: https://npp-user-manual.org/docs/searching/#search-modifiers
        • (...) = capture groups: https://npp-user-manual.org/docs/searching/#capture-groups-and-backreferences
        • \r\n = control characters: https://npp-user-manual.org/docs/searching/#control-characters
        • $1 = substitution escape sequences: https://npp-user-manual.org/docs/searching/#substitution-escape-sequences

        edit: the boundary \b isn’t necessary; I had that in there from an early version, but I had added the \h+ before to prevent MÜLLER from getting an extra CRLF before it, so the boundary was no longer needed.
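For anyone who wants to script the same replacement outside Notepad++, here is a minimal Python sketch of the idea. It is not a drop-in copy of the regex above: Python's `re` module has no `\u` or `\h` classes, so the sketch assumes Latin-1 surnames and approximates `\u` with the range `A-ZÀ-ÖØ-Þ` and `\h` with `[ \t]`. The names `UPPER`, `PATTERN` and `break_before_surnames` are made up for this illustration.

```python
import re

# Assumption: uppercase letters are limited to ASCII plus Latin-1 accented
# capitals. This only approximates Notepad++'s \u class.
UPPER = "A-ZÀ-ÖØ-Þ"

# Rough equivalent of (?-i)\h+(\u{2}[\u\x20]+), without the optional \b:
# horizontal whitespace, then two uppercase letters followed by a run of
# uppercase letters or spaces (captured as group 1).
PATTERN = re.compile(rf"[ \t]+([{UPPER}]{{2}}[{UPPER} ]+)")


def break_before_surnames(text: str, eol: str = "\r\n") -> str:
    """Replace the whitespace before each UPPERCASE name with a line break."""
    # A function replacement avoids escape processing of the eol string.
    return PATTERN.sub(lambda m: eol + m.group(1), text)


sample = (
    "MÜLLER 6 - Blahblah. SMITH 5 - Asdds. Asdsd. "
    "DI CARLO 8,5 - And. Maybe even. Multiple. Sentences here."
)
print(break_before_surnames(sample, eol="\n"))
```

On the sample line this puts SMITH and DI CARLO on their own lines, and MÜLLER gets no leading line break because nothing at the very start of the text is preceded by whitespace. A capitalised word such as "Asdsd" is not touched, since the pattern needs two uppercase letters in a row.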

        • floyddebarber

          Wow, many thanks for the fast and detailed reply!

          • guy038

            Hello @floyddebarber, @peterjones and All,

            An alternative solution would be :

            SEARCH (?-i)(?<=\.)\h*(?=\u\u)

            REPLACE \r\n

            So, for instance, from this INPUT text :

            MÜLLER 6 - Blahblah.         SMITH 5 - Asdds. Asdsd.DI CARLO 8,5 - And. Maybe even. Multiple. Sentences here.
            

            you would get the OUTPUT text :

            MÜLLER 6 - Blahblah.
            SMITH 5 - Asdds. Asdsd.
            DI CARLO 8,5 - And. Maybe even. Multiple. Sentences here.
            

            Notes :

            • This regex searches a range of horizontal blank chars ( such as \x20 or \x09 ), possibly empty, but ONLY IF :

              • It is preceded by a literal full stop, due to the positive look-behind (?<=\.)

              • It is followed by two upper-case letters, accented or not, due to the positive look-ahead (?=\u\u)

            • And, in replacement, this range is just replaced by a Windows line-break ( \r\n ) ( Use \n only if working on Unix files )
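The look-around variant can be sketched in Python along the same lines. As before, this is an approximation under stated assumptions, not the Notepad++ regex itself: `\u` is replaced by the Latin-1 range `A-ZÀ-ÖØ-Þ`, `\h` by `[ \t]`, and the names `UPPER`, `PATTERN` and `break_after_period` are hypothetical.

```python
import re

# Assumption: uppercase letters are limited to ASCII plus Latin-1 accented
# capitals. This only approximates Notepad++'s \u class.
UPPER = "A-ZÀ-ÖØ-Þ"

# Rough equivalent of (?-i)(?<=\.)\h*(?=\u\u): a possibly empty run of
# spaces/tabs that is preceded by a full stop and followed by two
# uppercase letters. Neither look-around consumes any text.
PATTERN = re.compile(rf"(?<=\.)[ \t]*(?=[{UPPER}]{{2}})")


def break_after_period(text: str, eol: str = "\r\n") -> str:
    """Replace the blanks between a full stop and an UPPERCASE name with eol."""
    return PATTERN.sub(lambda m: eol, text)


sample = (
    "MÜLLER 6 - Blahblah.         SMITH 5 - Asdds. Asdsd."
    "DI CARLO 8,5 - And. Maybe even. Multiple. Sentences here."
)
print(break_after_period(sample, eol="\n"))
```

Because the blank range may be empty, the break is also inserted where the OCR dropped the space entirely, as in `Asdsd.DI CARLO`. The trade-off is that this version relies on every record ending with a full stop.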

            Best Regards,

            guy038

            The Community of users of the Notepad++ text editor.