Line combination
-
@madison-assessor ,
You’ve misunderstood @Terry-R’s request that you read the FAQ. You need to provide the information as text so it can be copied from your message and used to test variations. Pictures are useless except to show a result. For text transformations that you can’t do on your own, you need to provide the data in code-enclosed formatting; that you didn’t shows you didn’t read the FAQ. Please read it “before posting”, thank you.
-
here’s your freebie, @madison-assessor . Just understand that @Terry-R really was hoping you’d read the FAQ more closely, and provide data that we could copy/paste so we could verify our solution before posting, as @Lycan-Thrope seconded. We cannot very well run a regex on an image of text.
freebie:
FIND = (?<!")\r\n
REPLACE = \x20
MODE = Regular expression
unsolicited advice: if you don’t have those people’s permission, you probably shouldn’t be posting any data about them in a public forum.
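For anyone who wants to check the FIND/REPLACE pair above outside Notepad++, here is a minimal sketch in Python. Python's `re` module is a different engine from the Boost one Notepad++ uses, but the lookbehind and the literal `\r\n` should behave the same way for this pattern. The sample records, and in particular where the line breaks fall inside each record, are assumptions made for illustration.

```python
import re

# Hypothetical input: records wrapped over several CRLF-terminated lines.
# The placement of the line breaks is an assumption, not the poster's data.
before = (
    '"FAKENAME FAKENAME X\r\n'
    '9876 W 9876 S\r\n'
    'CITYNAME ST 12345"\r\n'
    '"FAKENAME FAKENAME\r\n'
    '9876 N 9876 W\r\n'
    'CITYNAME ST 12345"\r\n'
)

# FIND = (?<!")\r\n   REPLACE = \x20
# Only a CRLF that does NOT follow a closing quote is replaced by a space.
join_lines = re.compile(r'(?<!")\r\n')
after = join_lines.sub(' ', before)

for record in after.splitlines():
    print(record)
```

Each quoted record ends up on a single line, because the only line breaks left untouched are the ones that directly follow a closing quote.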
-
This post is deleted! -
Hi, All,
I’ve found exactly the same regex S/R as @Peterjones !
But, just for fun, can you understand why the similar syntax below does not work properly ?
I mean, for people using Windows EOL … ( Use the ¶ icon in the Tool Bar )
SEARCH (?<!")\R
REPLACE \x20
BR
guy038
-
I said,
if you don’t have those people’s permission, you probably shouldn’t be posting any data about them in a public forum
As a result, the intervening posts and the original screenshot were deleted without any dummy text being used to replace them, making it harder for future readers to see what the regex applies to.
BEFORE
"FAKENAME FAKENAME X OR FAKENAME FAKENAME 9876 W 9876 S CITYNAME ST 12345" "FAKENAME FAKENAME X 9876 W 9876 S CITYNAME ST 12345" "FAKENAME FAKENAME 9876 N 9876 W CITYNAME ST 12345" "FAKENAME FAKENAME FAKENAME FAKENAME FAKENAME + Additional Owners and/or Contacts 9876 BLAH DIFFERENT CITY ST 12345"
AFTER:
"FAKENAME FAKENAME X OR FAKENAME FAKENAME 9876 W 9876 S CITYNAME ST 12345" "FAKENAME FAKENAME X 9876 W 9876 S CITYNAME ST 12345" "FAKENAME FAKENAME 9876 N 9876 W CITYNAME ST 12345" "FAKENAME FAKENAME FAKENAME FAKENAME FAKENAME + Additional Owners and/or Contacts 9876 BLAH DIFFERENT CITY ST 12345"
-
@guy038 said in Line combination:
But, just for fun, can you understand why this similar syntax below, does not work properly ?
@guy038 how does this sound?
Writing this as a sentence we have “find a position in the text such that the character immediately before the caret is NOT a quote (as specified) and the character following the caret is within the \R set of characters.”
So the last bit is most important, as \R can be a carriage return (CR), a line feed (LF), both (CRLF) or some other special characters as defined here.
Normally \R will capture both the CR and the LF, but with the addition of the “cannot have a quote before the caret” condition, the regular expression engine moves its caret to between the CR and the LF, thus making the code TRUE. So now it has captured ONLY the LF. In making the replacement we are left with a CR on the previous line with a space following. Notepad++ will show the result which appears to be correct, each record on a separate line (CR isn’t a valid line ending for Windows files), but there is also a space starting most records.
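The effect described above can be reproduced with a short Python sketch. Python's `re` module has no `\R` escape, so the alternation `\r\n|\n|\r` is used as a stand-in; that substitution and the sample text are assumptions.

```python
import re

# Stand-in for (?<!")\R, since Python's re has no \R escape (assumption).
pattern = re.compile(r'(?<!")(?:\r\n|\n|\r)')

# Hypothetical two-record sample with Windows (CRLF) line endings.
text = '"NAME\r\nSTREET\r\nCITY"\r\n"NAME2\r\nSTREET2\r\nCITY2"\r\n'

result = pattern.sub(' ', text)

# repr() makes the leftover CR and the inserted spaces visible.
print(repr(result))
```

Inside a record the whole CRLF is replaced, but after a closing quote only the LF is, so each record boundary becomes a lone CR followed by a space.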
Terry
-
Hi, @madison-assessor, @terry-r, @peterjones and All,
Well… first of all, let’s remember the definition of the \R syntax :
\R = (?>\r\n|\n|\x0b|\f|\r|\x85|\x{2028}|\x{2029})
which is an atomic group of alternatives, each representing 2 or 1 line-break character(s)
Practically, with the Boost Regex engine, we can approximate this syntax as below :
\R = \r\n|\n|\r
So, given this simple text, ending with the XYZ string followed with a double-quotes character, the \r\n EOL characters and a new line beginning with the string ABC :
FIRST  line :  ...  XYZ"CRLF
                    12345 6
SECOND line :  ABC ...
and the equivalent regex :
SEARCH (?<!")(?:\r\n|\r|\n)
- When the regex engine position is at positions 1 to 4, there’s obviously no match as the current character is not a line-break
- When the regex engine position is at position 5, the first alternative \r\n could be OK but is preceded with a " char, so no match again, as well as with the two other alternatives \r and \n !
- When the regex engine position is at position 6, only the third alternative \n does match because it’s a line-break which is not preceded with a " char ( it’s actually preceded with the \r char ! )
=> Thus, the first line-break \r remains right after the " character and the second line-break \n is replaced with a space character beginning a new line :
FIRST  line :  ...  XYZ"CR
               ...  12345
SECOND line :   ABC ...
               1234 ...
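The position-by-position walk above can be mimicked in Python by asking a compiled pattern to match at each offset in turn. This is only a sketch: Python's `re` is not the Boost engine, and Python offsets are 0-based, so position 5 in the walk is offset 4 here and position 6 is offset 5.

```python
import re

# Same regex as in the walk-through above.
pattern = re.compile(r'(?<!")(?:\r\n|\r|\n)')
text = 'XYZ"\r\nABC'

# Try to match at every offset, as the engine does while scanning.
# Pattern.match(text, pos) still lets the lookbehind see text[pos - 1].
attempts = {}
for pos in range(len(text)):
    m = pattern.match(text, pos)
    attempts[pos] = m.group() if m else None
    print(pos, repr(text[pos]), '->', repr(attempts[pos]))
```

The only successful attempt is at offset 5, where the third alternative `\n` matches on its own because the preceding character is `\r` and not a quote.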
Best Regards,
guy038
P.S. :
Of course, if we change the search regex as :
SEARCH (?<!")\r\n
This time, no ambiguity remains : we do search for the two characters \r\n, in this order, if the \r char is not preceded with a " character !
-
@Terry-R said in Line combination:
Normally \R will capture both the CR and the LF, but with the addition of the “cannot have a quote before the caret” the regular expression engine moves its caret to between the CR and the LF, thus making the code TRUE. So now it has captured ONLY the LF. In making the replacement we are left with a CR on the previous line with a space following. Notepad++ will show the result which appears to be correct, each record on a separate line (CR isn’t a valid line ending for Windows files), but there is also a space starting most records.
Terry seems to have “nailed it” with this.
Maybe the moral of the story is, be careful with \R, or perhaps just don’t even use it. Be aware of your file’s end-of-line type, and just use the corresponding \r\n or \n (or even \r) in your expressions. In practice, how often does a user switch between end-of-line types for different files in their daily work, anyway?
-
Why not just (?:\r?\n|\r), since that gets all three possible line endings?
-
Why not just … ?
In this specific instance? Because it shares the same problem as \R in this instance: if your real text is "\r\n, then (?<!")(?:\r?\n|\r) will match just the \n, starting between the \r and the \n in the file.
In general? Because in the vast majority of plaintext files, everything it matches is also matched by \R, which is so much easier to type.
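A quick way to see the difference is to list the match spans of both candidates on the same text. This is a Python sketch rather than a Notepad++ test, and the sample string is an assumption.

```python
import re

# Hypothetical sample: one CRLF right after a quote, one CRLF after plain text.
text = 'XYZ"\r\nABC\r\nDEF'

candidates = {
    'explicit CRLF': r'(?<!")\r\n',
    'optional CR': r'(?<!")(?:\r?\n|\r)',
}

# Collect the (start, end) span of every match for each candidate regex.
spans = {
    name: [m.span() for m in re.finditer(rx, text)]
    for name, rx in candidates.items()
}

for name, found in spans.items():
    print(name, found)
```

The explicit `\r\n` version skips the line break after the quote entirely, while the optional-CR version also reports a one-character match on the lone `\n` at offset 5.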