Line break before every UPPERCASE word

Reply to Line break before every UPPERCASE word on Wed, 08 Sep 2021 19:56:31 GMT

guy038 — Wed, 08 Sep 2021 19:56:31 GMT

Hello @floyddebarber, @peterjones and All,

An alternative solution would be :

SEARCH (?-i)(?<=\.)\h*(?=\u\u)

REPLACE \r\n

So, for instance, from this INPUT text :

MÜLLER 6 - Blahblah.         SMITH 5 - Asdds. Asdsd.DI CARLO 8,5 - And. Maybe even. Multiple. Sentences here.

you would get the OUTPUT text :

MÜLLER 6 - Blahblah.
SMITH 5 - Asdds. Asdsd.
DI CARLO 8,5 - And. Maybe even. Multiple. Sentences here.

Notes :

This regex searches for a range of horizontal blank chars ( \x20, \x09 or \xA0 ), possibly empty, but ONLY IF :
- It is preceded by a literal period, due to the positive look-behind (?<=\.)
- It is followed by two upper-case letters, accented or not, due to the positive look-ahead (?=\u\u)
And, in replacement, this range is simply replaced by a Windows line-break ( \r\n ) ( Use \n only if working on Unix files )
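
For anyone who wants to try the same idea outside Notepad++, here is a minimal Python sketch of the look-behind / look-ahead approach described in the notes above. It is an approximation, not the Notepad++ regex itself: Python's re module has no \h or \u classes, so the BLANK and UPPER classes below are assumptions that stand in for them (space, tab and no-break space; ASCII and Latin-1 capitals only). The names split_records, BLANK, UPPER and SPLIT_RE are made up for this example.

```python
import re

# Python's `re` has no \h or \u classes (Notepad++ gets them from Boost regex),
# so both are approximated here. These two classes are assumptions.
BLANK = r"[ \t\xa0]"                  # space, tab, no-break space
UPPER = r"[A-Z\xc0-\xd6\xd8-\xde]"    # ASCII capitals + Latin-1 accented capitals

# Look-behind for a period, optional blanks, look-ahead for two capitals.
# Python matching is case-sensitive by default, so (?-i) is not needed.
SPLIT_RE = re.compile(rf"(?<=\.){BLANK}*(?={UPPER}{UPPER})")


def split_records(text: str, eol: str = "\r\n") -> str:
    """Replace each matched run of blanks (possibly empty) with a line break."""
    return SPLIT_RE.sub(eol, text)


sample = (
    "MÜLLER 6 - Blahblah.         SMITH 5 - Asdds. Asdsd.DI CARLO 8,5 - "
    "And. Maybe even. Multiple. Sentences here."
)
print(split_records(sample, eol="\n"))
```

Because the blanks are optional ( * ), the zero-length position between "Asdsd." and "DI" also matches, which is why a line-break is inserted there even though no space separates them.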

Best Regards,

guy038

Reply to Line break before every UPPERCASE word on Tue, 07 Sep 2021 15:39:15 GMT

floyddebarber — Tue, 07 Sep 2021 15:39:15 GMT

Wow, many thanks for the fast and detailed reply!

Reply to Line break before every UPPERCASE word on Tue, 07 Sep 2021 14:05:28 GMT

PeterJones — Tue, 07 Sep 2021 14:05:28 GMT

@floyddebarber said in Line break before every UPPERCASE word:

MÜLLER 6 - Blahblah. SMITH 5 - Asdds. Asdsd. DI CARLO 8,5 - And. Maybe even. Multiple. Sentences here.

FIND = (?-i)\h+(\b\u{2}[\u\x20]+)
REPLACE = \r\n$1
SEARCH MODE = regular expression

important concepts:

\h and \u and [...] = character classes: https://npp-user-manual.org/docs/searching/#character-classes
+ and {2} = multiplying operators: https://npp-user-manual.org/docs/searching/#multiplying-operators
\b = anchors: https://npp-user-manual.org/docs/searching/#anchors
(?-i) = search modifiers: https://npp-user-manual.org/docs/searching/#search-modifiers
(...) = capture groups: https://npp-user-manual.org/docs/searching/#capture-groups-and-backreferences
\r\n = control characters: https://npp-user-manual.org/docs/searching/#control-characters
$1 = substitution escape sequences: https://npp-user-manual.org/docs/searching/#substitution-escape-sequences

edit: the boundary \b isn’t necessary. It was left over from an early version; I later added the leading \h+ to prevent MÜLLER from getting an extra CRLF before it, which made the boundary redundant.
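
As a rough illustration of the capture-group variant (FIND / REPLACE above, with the \b dropped as the edit note suggests), here is a hedged Python sketch. Again this is only an approximation: Python's re module lacks \h and \u, so the character ranges are assumptions covering space, tab, no-break space and ASCII plus Latin-1 capitals. The names break_before_names, UPPER_CHARS and FIND_RE are hypothetical.

```python
import re

# Assumed stand-in for \u: ASCII capitals plus Latin-1 accented capitals.
UPPER_CHARS = r"A-Z\xc0-\xd6\xd8-\xde"

# One or more blanks, then a captured run: two capitals followed by
# one or more capitals or spaces. Mirrors \h+(\u{2}[\u\x20]+).
FIND_RE = re.compile(rf"[ \t\xa0]+([{UPPER_CHARS}]{{2}}[{UPPER_CHARS} ]+)")


def break_before_names(text: str, eol: str = "\r\n") -> str:
    """Swap the blanks before each upper-case name for a line break ($1 kept)."""
    return FIND_RE.sub(lambda m: eol + m.group(1), text)


sample = (
    "MÜLLER 6 - Blahblah. SMITH 5 - Asdds. Asdsd. DI CARLO 8,5 - "
    "And. Maybe even. Multiple. Sentences here."
)
print(break_before_names(sample, eol="\n"))
```

Since the pattern requires at least one leading blank, the very first name (MÜLLER) is left alone, which is the behaviour the edit note describes.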