Regex capture date if it exists in a block but match the block anyway
-
I am using this regex

    (^(@X\d.+F\.$\R)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)?(?s:.*?)(?=@|==))

to capture two groups and capture an optional date in a test sample block:

    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    02 FEB 1842.
    line 6
    ==

This matches two groups

    **@X1@ F.**
    line 1
    line 2

and

    **@X2@ F.**
    line 3.
    line 4
    02 FEB 1842.
    line 6

but does not capture the date
whereas

    ^(@X\d.+F\.$\R)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)(?s:.*?)(?=@|==)

matches just one group (which spans both of the required groups)

    **@X1@ F.**
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    **02 FEB 1842.**
    line 6

and captures the date.
Can someone (guy038?) produce a regex that will capture the two groups AND the optional date?
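
For anyone who wants to reproduce the behaviour described above outside the editor, here is a minimal Python sketch. It is only an approximation: Python's `re` module is not the editor's regex engine, so `\R` is replaced by `\n`, `^`/`$` rely on `re.MULTILINE`, and the names `SAMPLE`, `OPTIONAL_DATE` and `REQUIRED_DATE` are made up for illustration. The likely cause it shows is that the first lazy `.*?` starts by matching nothing, the optional date group is then allowed to match nothing as well, and the second lazy `.*?` consumes the rest of the block, so the engine never has to backtrack into the date.

```python
import re

# The test sample from the post, one item per line (assumed layout).
SAMPLE = (
    "== Ma ==\n"
    "@X1@ F.\n"
    "line 1\n"
    "line 2\n"
    "@X2@ F.\n"
    "line 3.\n"
    "line 4\n"
    "02 FEB 1842.\n"
    "line 6\n"
    "==\n"
)

# Port of the first expression: the date group is optional.
OPTIONAL_DATE = re.compile(
    r"^(@X\d.+F\.$\n)(?s:.*?)(?P<date>^\d\d \D\D\D \d\d\d\d\.)?(?s:.*?)(?=@|==)",
    re.MULTILINE,
)

# Port of the second expression: the date group is mandatory.
REQUIRED_DATE = re.compile(
    r"^(@X\d.+F\.$\n)(?s:.*?)(?P<date>^\d\d \D\D\D \d\d\d\d\.)(?s:.*?)(?=@|==)",
    re.MULTILINE,
)

# (label line, captured date) for every match; an unset group is None in Python.
optional_hits = [(m.group(1).strip(), m.group("date")) for m in OPTIONAL_DATE.finditer(SAMPLE)]
required_hits = [(m.group(1).strip(), m.group("date")) for m in REQUIRED_DATE.finditer(SAMPLE)]

print(optional_hits)  # two blocks, no date captured in either
print(required_hits)  # one match spanning both blocks, date captured
```

With the optional group both blocks are matched but the date stays unset; with the mandatory group the date is captured, at the price of one match swallowing both blocks.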
-
What does “optional date” mean?
Does it mean your “before” block could look like this (exactly as you show):

    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    02 FEB 1842.
    line 6
    ==

Or it (the “before” block) could look like this:

    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    line 6
    ==

-
It means it could be

    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    03 MAR 1806.
    line 6
    ==
    == Ma ==
    @X1@ F.
    line 1
    line 2
    @X2@ F.
    line 3.
    line 4
    line 6
    ==

or

    == Ma ==
    @X1@ F.
    line 1
    04 FEB 1811.
    line 2
    @X2@ F.
    line 3.
    03 MAR 1806.
    line 6
    ==

or

    == Ma ==
    @X1@ F.
    line 1
    04 JUN 1961.
    line 2
    @X2@ F.
    line 3.
    line 6
    ==

Each block begins with the @…@ F. line and can have a variable number of lines, one of which can be the date line.
I want to match each block and capture the date where it occurs.
-
Hello, @john-slee, @alan-kilborn and All,
I think that the following regex S/R should be OK !
SEARCH

    (?-s)^@X.+\R(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?))(?=^@X|^==|\Z)

If we use the in-line modifier `(?x)` we can build the corresponding multi-line regex, with explanations in comments :

    (?x)                                #  FREE-SPACING mode
    (?-s)                               #  Forces the DOT regex symbol to match a SINGLE STANDARD character , only ( Not EOL chars )
    ^@X.+\R                             #  An ENTIRE Line BEGINNING with @X
    (?:                                 #  NON-capturing group, beginning 2 ALTERNATIVES
      (?s:[^@]*?)                       #    SHORTEST range of chars, even NULL, DIFFERENT from @, till the DATE, in a NON-capturing group
      (\d\d\x20\w\w\w\x20\d\d\d\d)\.    #    DATE, stored in CAPTURING group 1, followed with a DOT
      (?s:.*?)                          #    SHORTEST range of chars till the LOOK-AROUND, in a NON-capturing group
    |                                   #  OR
      (?s:.*?)                          #    SHORTEST range of chars, even NULL, till the LOOK-AROUND, in a NON-capturing group
    )                                   #  END of the NON-capturing group
    (?=^@X|^==|\Z)                      #  LOOK-AROUND ( if FOLLOWED with @X or ==, BEGINNING a line, or the END of file [ possibly PRECEDED with EMPTY lines ] )

You can select all that block, copy it with Ctrl + C and paste it, with Ctrl + V, in the Find what zone of the Find dialog ;-))

Notes :
- Each matched multi-line block, from a line `^@X...` to the next line `^@X`, excluded, can be used in replacement with the `$0` syntax ( The overall match )
- The group `1` stores the date, when present in the current block, and is an empty string when the date is absent from the block. You can re-use the date, in replacement, with the `\1` or `$1` syntaxes
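
To check the two notes above outside the editor, the search expression can be approximated in Python. This is a sketch under stated assumptions, not the regex as the editor runs it: `\R` is replaced by `\n`, the leading `(?-s)` is dropped because `.` already excludes line breaks by default in Python, `re.MULTILINE` supplies the line-anchored `^`, and Python's `\Z` only matches at the very end of the text. `BLOCK_RE` and `SAMPLE` are illustrative names. Note also that an unset group is `None` in Python rather than an empty string.

```python
import re

# Python approximation of the search regex (see the assumptions above).
BLOCK_RE = re.compile(
    r"^@X.+\n"                                                    # whole line beginning with @X
    r"(?:[^@]*?(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?))"  # block with a date (group 1) OR without
    r"(?=^@X|^==|\Z)",                                            # stop before the next @X line, == line, or end
    re.MULTILINE,
)

# One of the sample layouts from the thread: first block dated, second not.
SAMPLE = (
    "== Ma ==\n"
    "@X1@ F.\n"
    "line 1\n"
    "04 JUN 1961.\n"
    "line 2\n"
    "@X2@ F.\n"
    "line 3.\n"
    "line 6\n"
    "==\n"
)

# (overall match, date) per block: the equivalents of $0 and $1.
blocks = [(m.group(0), m.group(1)) for m in BLOCK_RE.finditer(SAMPLE)]

for whole, date in blocks:
    print(repr(whole), "->", date)
```

Each block is matched whether or not it contains a date, and group 1 is only set for the block that has one.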
Best Regards,
guy038
-
-
@guy038 Thank you so much, Guy. In order to achieve what I want, I adjusted the search regex slightly, so that the whole block, the label line (`^@X[\d]+@ F.\R`) and the date are all captured:

    (?-s)((^@X.+\R)(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?)))(?=^@X|^==|\Z)

Best Regards. Stay safe!
John
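
As a rough illustration of the three captures in the adjusted regex (group 1 the whole block, group 2 the label line, group 3 the date), here is a Python sketch. The same caveats apply as for any port away from the editor's engine: `\R` becomes `\n`, `(?-s)` is dropped, `re.MULTILINE` is assumed, and the names `ADJUSTED_RE`, `SAMPLE` and `rows` are hypothetical.

```python
import re

# Python approximation of the adjusted regex with three capturing groups.
ADJUSTED_RE = re.compile(
    r"((^@X.+\n)"                                                 # group 1 opens, group 2 = label line
    r"(?:[^@]*?(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?)))"  # group 3 = date, when present
    r"(?=^@X|^==|\Z)",
    re.MULTILINE,
)

# Sample layout from the thread in which both blocks carry a date.
SAMPLE = (
    "== Ma ==\n"
    "@X1@ F.\n"
    "line 1\n"
    "04 FEB 1811.\n"
    "line 2\n"
    "@X2@ F.\n"
    "line 3.\n"
    "03 MAR 1806.\n"
    "line 6\n"
    "==\n"
)

# (label line without its line break, date or None) for each block.
rows = [(m.group(2).rstrip("\n"), m.group(3)) for m in ADJUSTED_RE.finditer(SAMPLE)]
whole_blocks = [m.group(1) for m in ADJUSTED_RE.finditer(SAMPLE)]

print(rows)
```

In the editor the same three pieces would be available in the replacement as `$1`, `$2` and `$3`.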