Delete all lines except
-
Thanks for taking the time to give such extensive help. Many sites that allow posting of HTML provide a way to mark code so it is recognized as such, but I did not see that option here.
I thought this would not be too difficult a problem to solve, since in this case every block of code begins and ends with exactly the same code, and the exceptions I want to keep will all have the @ symbol somewhere in that block of code if there is an email address. If there is no @ symbol, then all code between the start and end code should be deleted. How is that different from finding a single line and deleting it?
Regardless, I will follow your advice and try to structure my question better.
-
@Raymond-Lee-Fellers said:
since every block of code begins and ends with the exact same code
Well, in your example above this is NOT the case (the 2 cases don’t start out the same way, although they do end the same way). One could guess at what is missing, but you seem so sure it is all there. Unless I am just not seeing it somehow.
-
@Alan-Kilborn, well, if you fix smartquotes, assume nothing else was markdown-ed away, and ignore the initial `<table align="center" style="width:95%;" border="0">`, then both blocks start with `<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>` and end with `</table></td></tr>`.

I am just not good enough at finding long strings that don’t contain a specific subsequence to come up with one that will work. But I see @guy038 is browsing this topic now, so I assume magic regex will soon be appearing…
-
Wait, when I say it that way, it was easier than I thought. Assuming the `@` sign is sufficient to mark a block as having an email, the following should find and delete the blocks without one:

- REGULAR EXPRESSION mode
- FIND = `(?s)<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>`
- REPLACE = `` (empty)
If you need a longer string than just the `@` to determine that it’s got an email, then I defer to Guy.

edit: yes, when I run it on:
<table align="center" style="width:95%;" border="0">
<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/> <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'> <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font> <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font> <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td> <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td> </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'> </span></td></tr> </table></td></tr>
<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/> <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'> <font color='#595f75'><strong>A Life to Live Animal Shelter & Adoption Center</strong><br>Megan Gonzales<br></font> <font color='#595f75'>P.O. Box 873<br>Baytown, TX 77522</font></td> <td width='40%' align='right' valign='top'>832 821-5420<br><a href='http://www.adopttosave.org' target='_blank'><font color='#228dc1'>www.adopttosave.org</font></a></td> </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'> </span></td></tr> </table></td></tr>
<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/> <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'> <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font> <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font> <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td> <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td> </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'> </span></td></tr> </table></td></tr>
<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/> <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'> <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font> <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font> <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td> <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td> </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'> </span></td></tr> </table></td></tr>
<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/> <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'> <font color='#595f75'><strong>A Life to Live Animal Shelter & Adoption Center</strong><br>Megan Gonzales<br></font> <font color='#595f75'>P.O. Box 873<br>Baytown, TX 77522</font></td> <td width='40%' align='right' valign='top'>832 821-5420<br><a href='http://www.adopttosave.org' target='_blank'><font color='#228dc1'>www.adopttosave.org</font></a></td> </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'> </span></td></tr> </table></td></tr>
<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/> <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'> <font color='#595f75'><strong>A Life to Live Animal Shelter & Adoption Center</strong><br>Megan Gonzales<br></font> <font color='#595f75'>P.O. Box 873<br>Baytown, TX 77522</font></td> <td width='40%' align='right' valign='top'>832 821-5420<br><a href='http://www.adopttosave.org' target='_blank'><font color='#228dc1'>www.adopttosave.org</font></a></td> </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'> </span></td></tr> </table></td></tr>
</table>
I get
<table align="center" style="width:95%;" border="0">
<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/> <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'> <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font> <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font> <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td> <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td> </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'> </span></td></tr> </table></td></tr>
<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/> <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'> <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font> <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font> <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td> <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td> </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'> </span></td></tr> </table></td></tr>
<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/> <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'> <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font> <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font> <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td> <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td> </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'> </span></td></tr> </table></td></tr>
</table>
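For readers who want to reproduce this outside Notepad++, here is a minimal Python sketch of the same FIND/REPLACE using the standard `re` module. It assumes Python's engine treats this particular pattern the same way Notepad++'s Boost engine does (it only uses a negated character class and a lazy quantifier), and the records are shortened, hypothetical stand-ins for the real ones above.

```python
import re

# Peter's FIND expression, copied as-is (no \R* yet).
FIND = r"(?s)<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>"

START = "<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>"
END = "</table></td></tr>"

# Hypothetical, shortened records: one with an email link, one without.
with_email = START + " <table><tr><td><a href='mailto:someone@example.com'>Email</a></td></tr> " + END
without_email = START + " <table><tr><td>www.example.org</td></tr> " + END

text = "\n".join([
    '<table align="center" style="width:95%;" border="0">',
    with_email,
    without_email,
    with_email,
    "</table>",
])

# [^@]*? cannot step over an '@', so only records without one reach END.
# REPLACE is empty, so each matching record is simply removed.
result = re.sub(FIND, "", text)
print(result)
```

The email-free record disappears, but the line it sat on is left behind as an empty line.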
-
too slow for another edit: if you prefix and suffix the FIND string with `\R*`, you can get rid of the blank lines, too:

- FIND = `(?s)\R*<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>\R*`
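A Python sketch of the `\R*` version, under one stated assumption: Python's `re` has no `\R`, so it is approximated here with `(?:\r\n|\r|\n)`, which ignores the rarer line breaks that `\R` also accepts in Notepad++. The records are hypothetical.

```python
import re

START = "<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>"
END = "</table></td></tr>"

# Hypothetical records: the middle one has no '@' and should be deleted.
keep = START + " <a href='mailto:someone@example.com'>Email</a> " + END
drop = START + " no email here " + END
text = "<table>\n" + keep + "\n" + drop + "\n" + keep + "\n</table>"

# Assumption: approximate \R with the usual line endings.
NEWLINE = r"(?:\r\n|\r|\n)"

# Equivalent of (?s)\R*<tr>...[^@]*?</table></td></tr>\R*
both_sides = re.compile(
    "(?s)" + NEWLINE + "*" + re.escape(START) + "[^@]*?" + re.escape(END) + NEWLINE + "*"
)
# Variant that only swallows the line break(s) after the record.
trailing_only = re.compile(
    "(?s)" + re.escape(START) + "[^@]*?" + re.escape(END) + NEWLINE + "*"
)

result = both_sides.sub("", text)
result_trailing = trailing_only.sub("", text)
print(repr(result))
print(repr(result_trailing))
```

With `\R*` on both sides, a match swallows the line break before and after the deleted record, so the two neighbouring records end up joined on one line; keeping only the trailing `\R*` leaves each surviving record on its own line.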
-
@PeterJones said:
well, if you fix smartquotes, assume nothing else was markdown-ed away, and ignore the initial
Or better yet, you fix nothing, and just…move…on…
It is actually amazing how many people here make you figure out what they are trying to ask before you can give them help.
-
Given that it’s probably the forum itself that clobbered the real quotes into smart quotes, and that the little “?” isn’t super-obvious, being light grey, pretty small, and a one-character-wide click, I’m willing to fix smartquotes for first-time posters. And since no one pointed out to Raymond in his first thread how to format properly in these forums, I gave him an extension.
But now that I’ve explained it, and have pointed out multiple times to @Raymond-Lee-Fellers that he needs to apply Markdown formatting to force the blocks to be rendered unedited, I will expect any further clarifications or posts from him to be formatted better.
-
For my own curiosity (one of these days, I want this idiom to stick): starting with my FIND = `(?s)\R*<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>\R*` as the baseline, how would you change `[^@]*?` to search for “any non-greedy sequence of characters that does not contain `mailto:`” instead of “any non-greedy sequence of characters that does not contain an `@` sign”?
-
@Alan-Kilborn oops, you’re right. The 1st example is different from the second one; however, it appears that all subsequent blocks do begin and end with the same code. My apologies.
-
No problem…helping you out is the most important thing.
This may be the idiom you seek?:
((?!mailto:).)*
It is not the easiest thing to remember.
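A minimal sketch of Alan's idiom dropped into Peter's baseline: `[^@]*?` becomes `((?!mailto:).)*?`, i.e. consume one character at a time, but only at positions where `mailto:` does not begin. The lazy `*?` is kept so the match still stops at the first closing boundary. This uses Python's `re`, which supports the same negative look-ahead syntax; the records are hypothetical, and the `\R*` parts are left out to keep it short.

```python
import re

START = "<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>"
END = "</table></td></tr>"

# Hypothetical records: both contain '@', but only the first has a mailto: link.
with_mailto = START + " <a href='mailto:someone@example.com'>Email</a> " + END
at_only = START + " Follow us: @example " + END
text = with_mailto + "\n" + at_only + "\n"

# Baseline: [^@]*? cannot cross any '@', so neither record matches here.
at_sign = re.compile("(?s)" + re.escape(START) + "[^@]*?" + re.escape(END))

# Tempered version: each '.' is consumed only if 'mailto:' does not start there.
no_mailto = re.compile("(?s)" + re.escape(START) + "((?!mailto:).)*?" + re.escape(END))

result = no_mailto.sub("", text)
print(repr(result))
```

The record with only a bare `@` is now removed, while the one with a `mailto:` link survives.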
-
@PeterJones said:
(?s)<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>
Works perfectly. Didn’t have to fix anything. Your code works out of the box. Thank you.
-
Hi, @raymond-lee-fellers, @peterjones, @alan-kilborn and All,
First of all, Peter, I was just about to reply when I saw your solution. My solution is quite similar, and just a bit less accurate than yours, as my opening boundary is simply `<tr><td align='left'`!

I’ve found the regex: `(?s)<tr><td align='left'[^@]+</table></td></tr>\R`
Regarding your question, Peter, the right answer is Alan’s! So, if you’re looking for the smallest area of characters, even spanning several lines, between the opening boundary `START` and the ending boundary `END` (matched in that exact case), which must not contain, for instance, the number `123`, the correct regex is: `(?s-i)START((?!123).)*?END`
Indeed, this regex means that, at each position the regex engine reaches after the word `START`, it tests the negative look-ahead, i.e. it asks: are the next three characters the string `123`? If NOT, the negative look-ahead is TRUE, which allows the regex engine to continue the process and move on to the next character.
Of course, if you want to select the same kind of area, but one which does contain the number `123`, the regex is simply: `(?s-i)START.*?123.*?END`
Test it against the text below:

......START123......................END................START123......................END..........
......START.........................END................START.........................END..........
......START...........1.............END................START...........1.............END..........
......START............2............END................START............2............END..........
......START.............3...........END................START.............3...........END..........
......START...........123...........END................START...........123...........END..........
......START...........12............END................START...........12............END..........
......START............23...........END................START............23...........END..........
......START...........1.3...........END................START...........1.3...........END..........
......START......................123END................START......................123END..........
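One way to check the “does not contain `123`” regex against this sample is a short Python sketch like the following. The sample is rebuilt line by line from the text above. Python's `re` only allows the `-` flag syntax in scoped groups such as `(?-i:...)`, so the pattern uses a plain `(?s)`; that is equivalent here because Python matches case-sensitively by default.

```python
import re

# guy038's sample, rebuilt: each line holds two START...END areas with the same middle part.
middles = [
    "123......................",
    ".........................",
    "...........1.............",
    "............2............",
    ".............3...........",
    "...........123...........",
    "...........12............",
    "............23...........",
    "...........1.3...........",
    "......................123",
]
sample = "".join(
    "......START" + mid + "END................START" + mid + "END..........\n"
    for mid in middles
)

# Smallest START...END area, possibly over several lines, with no "123" inside.
without_123 = re.compile(r"(?s)START((?!123).)*?END")
matches = [m.group(0) for m in without_123.finditer(sample)]
print(len(matches))
```

Only the areas on the lines without `123` are matched, two per line, and none of them runs past its own `END`.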
You’ve certainly noticed that, if you look for areas containing the number `123` in my sample text, the best is to use the regex `(?-is)START.*?123.*?END`, which limits the search to a single line ;-))

Best regards,
guy038
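Why the single-line form is preferable can be seen on the same sample. In this Python sketch, `re.DOTALL` plays the role of `(?s)`, and Python's default behaviour (the dot stops at line breaks, matching is case-sensitive) corresponds to `(?-is)`; the sample is rebuilt from guy038's text.

```python
import re

middles = [
    "123......................",
    ".........................",
    "...........1.............",
    "............2............",
    ".............3...........",
    "...........123...........",
    "...........12............",
    "............23...........",
    "...........1.3...........",
    "......................123",
]
sample = "".join(
    "......START" + mid + "END................START" + mid + "END..........\n"
    for mid in middles
)

# Like (?s): '.' may cross line breaks.
multi_line = re.compile(r"START.*?123.*?END", re.DOTALL)
# Like (?-is): '.' stops at line breaks, and matching is case-sensitive.
single_line = re.compile(r"START.*?123.*?END")

multi = [m.group(0) for m in multi_line.finditer(sample)]
single = [m.group(0) for m in single_line.finditer(sample)]
for m in multi:
    print(m.count("\n"), "line break(s) in:", m[:40])
```

With `(?s)`, a `START` whose own area has no `123` lets `.*?` run on across lines to the next `123`, so some matches span several lines; in the single-line form each match stays within one area.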
P.S. :
Note that the regex `(?s-i)START((?!123).)*?END` could be rewritten, in a complicated way, as: `(?s-i)START(((?!1)|(?=1(?!2))|(?=12(?!3))).)*?END`
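It’s just Boolean algebra! Indeed, if we consider 3 consecutive chars, then, in order to match all cases different from the number “123”, we may use:

NOT (123) = NOT 1 OR ( 1 AND NOT 2 ) OR ( 12 AND NOT 3 )

where each term covers these 3-char cases (a dot standing for any other char):
- NOT 1 : `...`, `.2.`, `..3`, `.23`
- 1 AND NOT 2 : `1..`, `1.3`
- 12 AND NOT 3 : `12.`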
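A Python sketch to confirm the two forms agree: it runs both regexes on the rebuilt sample and compares the match positions, then checks the zero-width condition alone on every 3-character string over the alphabet `123.` (an alphabet chosen here just for the check). Python's `re` supports the nested look-aheads as written; only the global `(?s-i)` is reduced to `(?s)`.

```python
import re
from itertools import product

middles = [
    "123......................",
    ".........................",
    "...........1.............",
    "............2............",
    ".............3...........",
    "...........123...........",
    "...........12............",
    "............23...........",
    "...........1.3...........",
    "......................123",
]
sample = "".join(
    "......START" + mid + "END................START" + mid + "END..........\n"
    for mid in middles
)

simple = re.compile(r"(?s)START((?!123).)*?END")
# Expanded form: before consuming a char, require NOT 1, or 1 AND NOT 2, or 12 AND NOT 3.
expanded = re.compile(r"(?s)START(((?!1)|(?=1(?!2))|(?=12(?!3))).)*?END")

simple_spans = [m.span() for m in simple.finditer(sample)]
expanded_spans = [m.span() for m in expanded.finditer(sample)]
print(simple_spans == expanded_spans)

# The zero-width condition on its own: it should succeed everywhere except before "123".
condition = re.compile(r"(?!1)|(?=1(?!2))|(?=12(?!3))")
passes = {
    "".join(chars): condition.match("".join(chars)) is not None
    for chars in product("123.", repeat=3)
}
print([s for s, ok in passes.items() if not ok])
```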
-
@Raymond-Lee-Fellers : glad it worked
@guy038 : thanks for the details
@Alan-Kilborn : a negative lookahead as the start of the match segment. No wonder I cannot store it. I’ve bookmarked it instead.
Now back to studying Python (since two of the forum’s python experts are not actively helping here anymore, I need to up my PythonScript output)
-
since two of the forum’s python experts are not actively helping here anymore, I need to up my PythonScript output …
i really wish there was an easy way to lure both back somehow … for example with a complimentary free cake for all returning members 🥧🍰🎂 :D
i’ve still got the hope that someday maybe one or both will return for either historic, good times reasons, or 'cause you lured them back with future’s most ultimate py guru knowledge 👍
(ps: no pressure, i think your py is already pretty good and way better than e.g. mine)

reader’s note:
this was slightly off topic, so i give my sincere apologies to everyone in advance.