Delete all lines except
-
Too slow for another edit: if you prefix and suffix the FIND string with \R*, you can get rid of the blank lines, too:
- FIND = (?s)\R*<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>\R*
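To see what the \R* padding changes, here is a minimal Python sketch. It is only an approximation of the Notepad++ behaviour: Python's `re` module has no `\R`, so the `NL` stand-in below is an assumption, and the sample rows are made up for illustration.

```python
import re

# Stand-in for \R (any line break); Python's re module has no \R escape.
NL = r"(?:\r\n|\n|\r)"

OPEN = re.escape("<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>")
CLOSE = re.escape("</table></td></tr>")

# [^@] is a negated class, so it already matches line breaks without (?s).
bare = re.compile(OPEN + r"[^@]*?" + CLOSE)
padded = re.compile(NL + "*" + OPEN + r"[^@]*?" + CLOSE + NL + "*")

# Hypothetical sample: one unwanted block between two rows to keep.
text = (
    "<tr><td>keep 1</td></tr>\n"
    "<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>\n"
    "<table><tr><td>noise</td></tr>\n"
    "</table></td></tr>\n"
    "<tr><td>keep 2</td></tr>\n"
)

bare_result = bare.sub("", text)      # leaves an empty line where the block was
padded_result = padded.sub("", text)  # also swallows the surrounding line breaks

print(repr(bare_result))
print(repr(padded_result))
```

Note that the padded version consumes the line breaks on both sides of the block, so the two neighbouring rows end up on the same line. For HTML that should be harmless, but it is worth knowing before using the idiom on line-oriented data.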
-
@PeterJones said:
well, if you fix smartquotes, assume nothing else was markdown-ed away, and ignore the initial
Or better yet, you fix nothing, and just…move…on…
It is actually amazing the number of people here that make you want to figure out what they are trying to ask before you give them help.
-
Given that it’s probably the forum itself that clobbered the real quotes into smart quotes, and the little “?” isn’t super-obvious, being light grey and pretty small and a one-character-wide click, I’m willing to fix smartquotes for first-time posters (and since in his first thread, no one pointed out to Raymond how to properly format in these forums, I gave an extension).
But now that I’ve explained it, and pointed out multiple times to @Raymond-Lee-Fellers that he needs to apply Markdown formatting to force the blocks to be rendered unedited, I will expect any further clarifications or posts from him to be formatted better.
-
For my own curiosity (one of these days, I want this idiom to stick): starting with my FIND =
(?s)\R*<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>\R*
as the baseline, how would you fix [^@]*? to search for “any non-greedy sequence of characters that does not contain mailto:” instead of “any non-greedy sequence of characters that does not contain an @ sign”?
-
@Alan-Kilborn oops, you’re right. The 1st example is different from the second one; however it appears that all subsequent blocks do begin and end with the same code. My apologies.
-
No problem…helping you out is the most important thing.
This may be the idiom you seek?:
((?!mailto:).)*
It is not the easiest thing to remember.
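As a rough illustration of how that idiom slots into the FIND expression, here is a Python sketch. The three sample blocks and their contents are invented, the group is written as non-capturing `(?:...)`, and a lazy `*?` is used to keep the non-greedy behaviour asked for; Notepad++ uses a different regex engine, so treat this as a sketch rather than a verified equivalent.

```python
import re

OPEN = re.escape("<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>")
CLOSE = re.escape("</table></td></tr>")

# Original idea: the block may not contain any @ sign.
no_at = re.compile(OPEN + r"[^@]*?" + CLOSE)

# Idiom from the post: each character is consumed only if "mailto:" does not start there.
no_mailto = re.compile(OPEN + r"(?:(?!mailto:).)*?" + CLOSE, re.DOTALL)

HR = "<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>"
END = "</table></td></tr>"

# Hypothetical blocks: plain text, a bare @ sign, and a real mailto: link.
plain = HR + "\n<table><tr><td>nothing special</td></tr>\n" + END
handle = HR + "\n<table><tr><td>ask @someone</td></tr>\n" + END
mail = HR + "\n<table><tr><td><a href='mailto:x@example.com'>mail</a></td></tr>\n" + END
text = "\n".join([plain, handle, mail]) + "\n"

print(len(no_at.findall(text)))      # blocks without any @ sign
print(len(no_mailto.findall(text)))  # blocks without "mailto:"

cleaned = no_mailto.sub("", text)
print(cleaned)
```

With this sample, the `[^@]` version only matches the plain block, while the look-ahead version also matches the block that merely mentions an @ sign and still leaves the mailto: block alone.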
-
@PeterJones said:
(?s)<tr><td align=‘left’ colspan=3><hr style=‘border: solid 1px black;’/>[^@]*?</table></td></tr>
Works perfectly. Didn’t have to fix anything. Your code works out of the box. Thank you.
-
Hi, @raymond-lee-fellers, @peterjones, @alan-kilborn and All,
First of all, Peter, I was just about to reply when I saw your solution. My solution is quite similar, just a bit less accurate than yours, as my opening boundary is simply
<tr><td align='left'
I found the regex:
(?s)<tr><td align='left'[^@]+</table></td></tr>\R
Regarding your question, Peter, the right answer is Alan’s! So, if you’re looking for the smallest area of characters, even over several lines, between the opening boundary START and the ending boundary END, and that area should not contain, for instance, the number 123, the correct regex is:
(?s-i)START((?!123).)*?END
Indeed, this regex means that, at each position reached by the regex engine after the word START, it tests the negative look-ahead, i.e. it asks: do the next three characters form the string 123? If NOT, the negative look-ahead is TRUE and allows the regex engine to consume that character and move on to the next one. If they do, the look-ahead fails, no further character can be consumed, and since END cannot be at that position either, the match attempt from that START fails.
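The position-by-position test described above can be imitated in Python as a small sketch. The helper name `first_blocked_position` and the two sample strings are made up for illustration, and Python's `re` engine stands in for the one in Notepad++.

```python
import re

# The negative look-ahead on its own: zero-width, succeeds unless "123" starts here.
LOOKAHEAD = re.compile(r"(?!123)")
FULL = re.compile(r"START(?:(?!123).)*?END", re.DOTALL)


def first_blocked_position(text, start):
    """Return the first index >= start where (?!123) fails, or None if it never fails."""
    for pos in range(start, len(text)):
        if LOOKAHEAD.match(text, pos) is None:
            return pos
    return None


blocked = "START..123..END"
allowed = "START..1.3..END"
after_start = len("START")

print(first_blocked_position(blocked, after_start), bool(FULL.search(blocked)))
print(first_blocked_position(allowed, after_start), bool(FULL.search(allowed)))
```

In the first string the look-ahead fails at the position where 123 begins, so the full regex finds nothing; in the second it never fails and the whole START...END area matches.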
Of course, if you want to select this same area of characters when it does contain the number 123, the regex is simply:
(?s-i)START.*?123.*?END
Test it against the text below:
......START123......................END................START123......................END..........
......START.........................END................START.........................END..........
......START...........1.............END................START...........1.............END..........
......START............2............END................START............2............END..........
......START.............3...........END................START.............3...........END..........
......START...........123...........END................START...........123...........END..........
......START...........12............END................START...........12............END..........
......START............23...........END................START............23...........END..........
......START...........1.3...........END................START...........1.3...........END..........
......START......................123END................START......................123END..........
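For readers without Notepad++ at hand, the same test can be approximated in Python. This is a sketch under a few assumptions: Python's `re` does not accept `(?s-i)` as a leading global modifier, so `re.DOTALL` is passed instead (matching is case-sensitive by default), and a variant without `re.DOTALL` is added for comparison.

```python
import re

SAMPLE = """\
......START123......................END................START123......................END..........
......START.........................END................START.........................END..........
......START...........1.............END................START...........1.............END..........
......START............2............END................START............2............END..........
......START.............3...........END................START.............3...........END..........
......START...........123...........END................START...........123...........END..........
......START...........12............END................START...........12............END..........
......START............23...........END................START............23...........END..........
......START...........1.3...........END................START...........1.3...........END..........
......START......................123END................START......................123END..........
"""

# Areas that do NOT contain 123 (dot matches line breaks).
without_123 = re.compile(r"START(?:(?!123).)*?END", re.DOTALL)
# Areas that DO contain 123, with and without the dot matching line breaks.
with_123_dotall = re.compile(r"START.*?123.*?END", re.DOTALL)
with_123_single_line = re.compile(r"START.*?123.*?END")


def report(name, pattern):
    """Print how many areas a pattern selects and how many of them cross a line break."""
    matches = pattern.findall(SAMPLE)
    crossing = sum("\n" in m for m in matches)
    print(f"{name}: {len(matches)} matches, {crossing} crossing a line break")
    return matches


excluded = report("without 123", without_123)
included_dotall = report("with 123, dot matches newline", with_123_dotall)
included_single = report("with 123, single line", with_123_single_line)
```

The interesting line of the report is the middle one: once the dot is allowed to match line breaks, a START on a line without 123 can run forward to a 123 several lines further down.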
You’ve certainly noticed that, if you look for areas containing the number 123 in my sample text, it is best to use the regex
(?-is)START.*?123.*?END
which limits each match to a single line ;-))
Best regards,
guy038
P.S. :
Note that the regex
(?s-i)START((?!123).)*?END
could be rewritten, in a more complicated way, as:
(?s-i)START(((?!1)|(?=1(?!2))|(?=12(?!3))).)*?END
Just an application of Boolean algebra! Indeed, if we consider 3 consecutive chars, in order to match all cases different from the number “123”, we may use:
NOT (123)  =  NOT 1    OR    ( 1 AND NOT 2 )    OR    ( 12 AND NOT 3 )
                V                   V                        V
               ...                 1..                      12.
               .2.                 1.3
               ..3
               .23
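One way to convince yourself that the two forms agree is a brute-force check, sketched here in Python. The four-character alphabet and the length limit are arbitrary choices, the groups are written as non-capturing, and `re.DOTALL` replaces the `(?s-i)` prefix, which Python does not accept in that form.

```python
import re
from itertools import product

tempered = re.compile(r"START(?:(?!123).)*?END", re.DOTALL)
expanded = re.compile(r"START(?:(?:(?!1)|(?=1(?!2))|(?=12(?!3))).)*?END", re.DOTALL)

# Try every body of length 0 to 5 over a small alphabet and compare both regexes
# with the plain-Python statement "the body does not contain 123".
checked = 0
for length in range(6):
    for chars in product("123.", repeat=length):
        body = "".join(chars)
        candidate = "START" + body + "END"
        expected = "123" not in body
        assert bool(tempered.fullmatch(candidate)) == expected
        assert bool(expanded.fullmatch(candidate)) == expected
        checked += 1

print(f"{checked} candidate strings checked, both regexes agree")
```

This does not prove equivalence for every possible text, but it exercises every combination of the cases listed in the diagram above.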
-
@Raymond-Lee-Fellers : glad it worked
@guy038 : thanks for the details
@Alan-Kilborn : a negative lookahead as the start of the match segment. No wonder I cannot store it. I’ve bookmarked it instead.
Now back to studying Python (since two of the forum’s python experts are not actively helping here anymore, I need to up my PythonScript output)
-
since two of the forum’s python experts are not actively helping here anymore, I need to up my PythonScript output …
i really wished there was an easy way to lure both back somehow … for example with a complementary free cake for all returning members 🥧🍰🎂 :D
i’ve still got the hope that someday maybe one or both will return for either historic, good times reasons, or 'cause you lured them back with future’s most ultimate py guru knowledge 👍
(ps: no pressure, i think your py is already pretty good and way better than eg. mine)
reader’s note:
this was slightly off topic, so i give my sincere apologies to everyone in advance.