Regex capture date if it exists in a block but match the block anyway
-
I am using this regular expression
(^(@X\d.+F\.$\R)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)?(?s:.*?)(?=@|==))
to capture two groups and capture an optional date in a test sample block:
== Ma ==
@X1@ F.
line 1
line 2
@X2@ F.
line 3.
line 4
02 FEB 1842.
line 6
==
This matches two blocks
**@X1@ F.** line 1 line 2
and
**@X2@ F.** line 3. line 4 02 FEB 1842. line 6
but does not capture the date
whereas
^(@X\d.+F\.$\R)(?s:.*?)(^\d\d \D\D\D \d\d\d\d\.)(?s:.*?)(?=@|==)
produces just one match (which spans both of the required blocks)
**@X1@ F.** line 1 line 2 @X2@ F. line 3. line 4 **02 FEB 1842.** line 6
and captures the date.
Can someone (guy038?) produce a regex that will capture the two groups AND the optional date?
-
What does “optional date” mean?
Does it mean your “before” block could look like this (exactly as you show):
== Ma ==
@X1@ F.
line 1
line 2
@X2@ F.
line 3.
line 4
02 FEB 1842.
line 6
==
Or it (the “before” block) could look like this:
== Ma ==
@X1@ F.
line 1
line 2
@X2@ F.
line 3.
line 4
line 6
==
-
it means it could be
== Ma ==
@X1@ F.
line 1
line 2
@X2@ F.
line 3.
03 MAR 1806.
line 6
==

== Ma ==
@X1@ F.
line 1
line 2
@X2@ F.
line 3.
line 4
line 6
==
or
== Ma ==
@X1@ F.
line 1
04 FEB 1811.
line 2
@X2@ F.
line 3.
03 MAR 1806.
line 6
==
or
== Ma ==
@X1@ F.
line 1
04 JUN 1961.
line 2
@X2@ F.
line 3.
line 6
==
Each block begins with the @…@ F. line and can have a variable number of lines, one of which can be the date line.
I want to match each block and capture the date where it occurs.
-
Hello, @john-slee, @alan-kilborn and All,
I think that the following regex S/R should be OK !
SEARCH
(?-s)^@X.+\R(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?))(?=^@X|^==|\Z)
If we use the in-line modifier (?x), we can build the corresponding multi-line regex, with explanations in comments:
(?x)                                 # FREE-SPACING mode
(?-s)                                # Forces the DOT regex symbol to match a SINGLE STANDARD character only ( Not EOL chars )
^@X.+\R                              # An ENTIRE line BEGINNING with @X
(?:                                  # NON-capturing group, beginning 2 ALTERNATIVES
    (?s:[^@]*?)                      # SHORTEST range of chars, even NULL, DIFFERENT from @, till the DATE, in a NON-capturing group
    (\d\d\x20\w\w\w\x20\d\d\d\d)\.   # DATE, stored in CAPTURING group 1, followed with a DOT
    (?s:.*?)                         # SHORTEST range of chars till the LOOK-AROUND, in a NON-capturing group
  |                                  # OR
    (?s:.*?)                         # SHORTEST range of chars, even NULL, till the LOOK-AROUND, in a NON-capturing group
)                                    # END of the NON-capturing group
(?=^@X|^==|\Z)                       # LOOK-AROUND ( if FOLLOWED with @X or ==, BEGINNING a line, or the END of file [ possibly PRECEDED with EMPTY lines ] )
You can copy that whole block with Ctrl+C and paste it, with Ctrl+V, in the Find what zone of the Find dialog ;-))
Notes :
- Each matched multi-line block, from a line ^@X... to the next line ^@X, excluded, can be used in replacement with the $0 syntax ( the overall match )
- Group 1 stores the date when it is present in the current block and is an empty string when the date is absent from the block. You can re-use the date, in replacement, with the \1 or $1 syntaxes
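For anyone who wants to check the behaviour outside Notepad++, here is a minimal Python sketch of the search regex above. It is an approximation, not the exact S/R: Python's `re` module has no `\R`, so `\R` is replaced with `\n` (assuming LF line endings); the leading `(?-s)` is dropped because the dot already excludes newlines by default; and `re.MULTILINE` is passed so that `^` matches at each line start. The names `BLOCK_RE`, `SAMPLE` and `find_blocks` are made up for the illustration. Note that in Python an unmatched group is `None` rather than an empty string, so the helper converts it.

```python
import re

# Python port of the search regex (assumption: LF line endings, so \R becomes \n).
BLOCK_RE = re.compile(
    r"^@X.+\n"                                               # label line starting with @X
    r"(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)"  # alternative 1: block with a date
    r"|(?s:.*?))"                                            # alternative 2: block without a date
    r"(?=^@X|^==|\Z)",                                       # stop before next label, == line, or end
    re.MULTILINE,
)

# Sample text, assuming each item of the test block sits on its own line.
SAMPLE = (
    "== Ma ==\n"
    "@X1@ F.\n"
    "line 1\n"
    "line 2\n"
    "@X2@ F.\n"
    "line 3.\n"
    "line 4\n"
    "02 FEB 1842.\n"
    "line 6\n"
    "==\n"
)


def find_blocks(text):
    """Return (block, date) pairs; date is '' when the block has no date line."""
    return [(m.group(0), m.group(1) or "") for m in BLOCK_RE.finditer(text)]


for block, date in find_blocks(SAMPLE):
    print(repr(block), "->", repr(date))
```

Here `m.group(0)` plays the role of `$0` and `m.group(1)` the role of `\1` / `$1` in the replacement field.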
Best Regards,
guy038
-
@guy038 Thank you so much, Guy. In order to achieve what I want, I adjusted the search regex slightly, so that the whole block (group 1), the label line (group 2, i.e. ^@X\d+@ F\.\R) and the date (group 3) are all captured:
(?-s)((^@X.+\R)(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)|(?s:.*?)))(?=^@X|^==|\Z)
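To illustrate how the group numbers shift in this adjusted version, here is a small Python sketch. As before it is only an approximation of the Notepad++ regex: `\R` is swapped for `\n` (LF line endings assumed), `(?-s)` is omitted because it is Python's default, and `re.MULTILINE` makes `^` match at line starts. The names `LABELLED_RE` and `parse_blocks` and the sample text are hypothetical.

```python
import re

# Group 1 = whole block, group 2 = label line, group 3 = date (None if absent).
LABELLED_RE = re.compile(
    r"((^@X.+\n)"
    r"(?:(?s:[^@]*?)(\d\d\x20\w\w\w\x20\d\d\d\d)\.(?s:.*?)"
    r"|(?s:.*?)))"
    r"(?=^@X|^==|\Z)",
    re.MULTILINE,
)


def parse_blocks(text):
    """Return one dict per block with its full text, label line and optional date."""
    records = []
    for m in LABELLED_RE.finditer(text):
        records.append(
            {
                "block": m.group(1),
                "label": m.group(2).rstrip("\n"),
                "date": m.group(3),  # None when the block has no date line
            }
        )
    return records


# Sample where the first block has a date and the second does not.
sample = (
    "== Ma ==\n"
    "@X1@ F.\n"
    "line 1\n"
    "04 JUN 1961.\n"
    "line 2\n"
    "@X2@ F.\n"
    "line 3.\n"
    "line 6\n"
    "==\n"
)

for record in parse_blocks(sample):
    print(record["label"], "->", record["date"])
```

In the Notepad++ replacement field the same pieces would be `\1`, `\2` and `\3`.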
Best Regards. Stay safe!
John