regexp help: lookahead and lookbehind with spaces
-
@alan-kilborn thanks but I would like f1fantasy matched as well
-
@patrickdrd said in regexp help: lookahead and lookbehind with spaces:
but I would like f1fantasy matched as well
Hmm, the target moves…
I think you can figure out how to change my regex to do that, given your first guess at it. :-)
-
@alan-kilborn \s* matches 0 or more whitespace chars, but I can’t figure out where to attach it; I tried every place and it doesn’t work.
I also have a couple more conditions to check for, like mls and ucl.
-
ok got it,
(?:(f1|ucl|mls)|(fantasy))\s*(?(1)(?2)|(?1))
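For readers who want to try the idea outside Notepad++, here is a minimal sketch in Python. It is an approximation, not the same regex: Python's built-in `re` module has no `(?1)` / `(?2)` subroutine calls, so the two sub-patterns are spelled out inside the conditional. The sample string is made up for illustration.

```python
import re

# Approximation of the Notepad++ (Boost) regex above. The conditional
# (?(1)yes|no) exists in Python's re, but the subroutine calls (?1) and (?2)
# do not, so their sub-patterns are written out in full here.
PAIR = re.compile(r"(?:(f1|ucl|mls)|(fantasy))\s*(?(1)fantasy|(?:f1|ucl|mls))")

# Hypothetical sample text: four valid pairs, then "f1 ucl", which is not a pair.
sample = "f1fantasy ucl fantasy fantasymls fantasy  f1 f1 ucl"

matches = [m.group(0) for m in PAIR.finditer(sample)]
print(matches)
```

If group 1 (a league name) matched first, the conditional requires `fantasy` next; otherwise `fantasy` came first and a league name must follow.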
-
Hello, @patrickdrd, @alan-kilborn and All,
Actually, two solutions are possible :
Regex A : SEARCH
(?-i:(f1|ucl|mls)|(fantasy))\s*(?(1)(?2)|(?1))
Regex B : SEARCH
(?-i:(f1|ucl|mls)|(fantasy))\x20*(?(1)(?2)|(?1))
They do not match exactly the same occurrences !
- Paste the text below in a new tab :
f1fantasy f1 fantasy f1 fantasy uclfantasy ucl fantasy ucl fantasy mlsfantasy mls fantasy mls fantasy fantasyf1 fantasy f1 fantasy f1 fantasyucl fantasy ucl fantasy ucl fantasymls fantasy mls fantasy mls
============================================================
Match with \x{000D}\x{000A} (CRLF) in between : f1 fantasy
Match with \x{000A} (LF) in between : f1 fantasy
Match with \x{000D} (CR) in between : f1 fantasy
Match with \x{0009} (TABULATION) in between : fantasy f1
Match with \x{0011} ( VERTICAL TABULATION ) in between : f1fantasy
Match with \x{0085} ( NO-BREAK SPACE ) in between : fantasy f1
with a Multi-lignes MIX of these SPECIFIC chars in between : fantasy f1
- Open the Mark dialog (Ctrl + M)
- Tick only the Purge for each search and Wrap around options
- Click on the Mark All button
As you can see, regex A, in addition to matching the usual space chars, also matches a lot of other “SPACE” characters ! => After the line of equal signs, some matches, possibly multi-line, occurred !
So regex B, more rigorous, just matches zero or more \x20 chars ;-)) => The part after the equal signs is not detected at all !
If your different words may be present in any case, simply replace the -i in-line modifier, within the non-capturing group, with the i modifier !
Best regards,
guy038
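The difference between regex A and regex B can be sketched in Python as follows. This is only an approximation under a few assumptions: Python's `re` lacks the `(?1)` / `(?2)` subroutine calls, so the sub-patterns are written out, the sample strings are invented, and case-insensitivity is shown with the `re.IGNORECASE` flag rather than the in-line `i` modifier.

```python
import re

# Shared building blocks (names are hypothetical, for illustration only).
WORDS = r"(?:(f1|ucl|mls)|(fantasy))"
TAIL = r"(?(1)fantasy|(?:f1|ucl|mls))"   # stands in for (?(1)(?2)|(?1))

regex_a = re.compile(WORDS + r"\s*" + TAIL)      # any whitespace, incl. tab, CR, LF
regex_b = re.compile(WORDS + r"\x20*" + TAIL)    # plain space characters only
regex_ci = re.compile(WORDS + r"\s*" + TAIL, re.IGNORECASE)  # any letter case

samples = ["f1fantasy", "f1   fantasy", "fantasy\tf1", "f1\r\nfantasy"]

results_a = [bool(regex_a.fullmatch(s)) for s in samples]
results_b = [bool(regex_b.fullmatch(s)) for s in samples]

for s, a, b in zip(samples, results_a, results_b):
    print(repr(s), "A:", a, "B:", b)

print(bool(regex_a.fullmatch("F1 Fantasy")), bool(regex_ci.fullmatch("F1 Fantasy")))
```

Regex A accepts all four samples, while regex B rejects the two where a tab or a line break sits between the words.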
-
@guy038 I prefer the first approach,
because I want it to match tabs as well.
Also, \n shouldn’t be a problem because I’m matching single-line items.
-
@patrickdrd and @guy038,
In much the same way that Goldilocks found Papa Bear’s bed too hard, and Mama Bear’s bed too soft, but Baby Bear’s bed just right …
in the present context, surely \s is too promiscuous, and \x20 is too brittle, while \h is just right!
-
Hi, @patrickdrd, @alan-kilborn, @neil-schipper and All,
@neil-schipper is perfectly right ! This third solution
(?-i:(f1|ucl|mls)|(fantasy))\h*(?(1)(?2)|(?1))
is just what you need, as \h* will match any combination, possibly empty, of Space, Tabulation and/or No-Break Space char(s) !
BR
guy038
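A rough Python equivalent of this third solution is sketched below. Python's `re` supports neither `\h` nor the `(?1)` / `(?2)` subroutine calls, so this sketch assumes an explicit character class holding the three characters listed above (space, tab, no-break space) and writes the sub-patterns out in full. The sample strings are invented.

```python
import re

# Assumed stand-in for \h*: space, tab and no-break space (U+00A0) only.
HSPACE = r"[ \t\xa0]*"

regex_h = re.compile(
    r"(?:(f1|ucl|mls)|(fantasy))" + HSPACE + r"(?(1)fantasy|(?:f1|ucl|mls))"
)

samples = {
    "f1 \t fantasy": None,     # space + tab + space between the words
    "fantasy\xa0mls": None,    # no-break space between the words
    "f1\nfantasy": None,       # line feed between the words
    "ucl\r\nfantasy": None,    # CRLF between the words
}

for text in samples:
    samples[text] = bool(regex_h.fullmatch(text))
    print(repr(text), samples[text])
```

Horizontal blanks are accepted, while a line break between the two words prevents a match.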
-
@neil-schipper I still tend to think that \s is slightly better, because why not match “accidental” carriage returns / line feeds?
Which is more likely: an “accidental” carriage return / line feed, or a “next line mismatch”?
I don’t know if you see what I mean, but I think the first is more likely.
-
@patrickdrd said in regexp help: lookahead and lookbehind with spaces:
why not match “accidental” carriage returns/ line feeds?
I don’t know enough about your data or the intended purpose of the pair matching to say either way, but I will say that, since you are gathering up either ab or ba pairs, a small error can make every subsequent pair a different pair from what would have matched without the error.
It’s up to you to think through whether this harms what you’re trying to accomplish, and if it does, to try to devise a strategy that could detect a malformed pair, and maybe skip it and then resync and pick up all subsequent pairs with desired grouping.
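To make the resync concern concrete, here is a small Python sketch with invented data: one stray `fantasy` on its own line, followed by two well-formed lines. As before, this is an approximation of the Notepad++ regex, since Python's `re` has no `(?1)` / `(?2)` subroutine calls, and `[ \t]*` is an assumed stand-in for horizontal-only whitespace.

```python
import re

WORDS = r"(?:(f1|ucl|mls)|(fantasy))"
TAIL = r"(?(1)fantasy|(?:f1|ucl|mls))"

loose = re.compile(WORDS + r"\s*" + TAIL)       # may pair words across line breaks
strict = re.compile(WORDS + r"[ \t]*" + TAIL)   # pairs words on the same line only

# Hypothetical data: line 1 is malformed (a lone "fantasy"), lines 2-3 are fine.
text = "fantasy\nf1 fantasy\nucl fantasy\n"

loose_pairs = [m.group(0) for m in loose.finditer(text)]
strict_pairs = [m.group(0) for m in strict.finditer(text)]

print(loose_pairs)
print(strict_pairs)
```

With `\s*`, the stray word grabs `f1` from the next line, and every later pair is then formed across a line break. With the horizontal-only version, the stray word is skipped and the two intended pairs are found.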