Delete all lines except



  • @Raymond-Lee-Fellers said:

    since every block of code begins and ends with the exact same code

    Well, in your example above this is NOT the case (the 2 cases don’t start out the same way, although they do end the same way). One could guess at what is missing, but you seem so sure it is all there. Unless I am just not seeing it somehow.



  • @Alan-Kilborn , well, if you fix smartquotes, assume nothing else was markdown-ed away, and ignore the initial <table align="center" style="width:95%;" border="0">, then both blocks start with <tr><td align='left' colspan=3><hr style='border: solid 1px black;'/> and end with </table></td></tr>.

    I am just not good enough with finding long strings that don’t contain a specific subsequence to be able to come up with one that will work. But I see @guy038 is browsing this topic now, so I assume magic regex will soon be appearing…



  • Wait, when I say it that way, it was easier than I thought. Assuming the @ sign is sufficient to mark it as having an email, then the following should work to find and delete the ones without an email:

    • REGULAR EXPRESSION mode
    • FIND = (?s)<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>
    • REPLACE = `` (empty)

    If you need a longer string than just the @ to determine that it’s got an email, then I defer to Guy.

    edit: yes, when I run it on:

    <table align="center" style="width:95%;" border="0"><tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>
    <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'>
    <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font>
    <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font>   <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td>
    <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td>
    </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'>
    </span></td></tr>
    </table></td></tr>
    <tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>
    <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'>
    <font color='#595f75'><strong>A Life to Live Animal Shelter & Adoption Center</strong><br>Megan Gonzales<br></font>
    <font color='#595f75'>P.O. Box 873<br>Baytown, TX 77522</font></td>
    <td width='40%' align='right' valign='top'>832 821-5420<br><a href='http://www.adopttosave.org' target='_blank'><font color='#228dc1'>www.adopttosave.org</font></a></td>
    </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'>
    </span></td></tr>
    </table></td></tr>
    <tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>
    <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'>
    <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font>
    <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font>   <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td>
    <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td>
    </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'>
    </span></td></tr>
    </table></td></tr>
    <tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>
    <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'>
    <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font>
    <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font>   <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td>
    <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td>
    </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'>
    </span></td></tr>
    </table></td></tr>
    <tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>
    <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'>
    <font color='#595f75'><strong>A Life to Live Animal Shelter & Adoption Center</strong><br>Megan Gonzales<br></font>
    <font color='#595f75'>P.O. Box 873<br>Baytown, TX 77522</font></td>
    <td width='40%' align='right' valign='top'>832 821-5420<br><a href='http://www.adopttosave.org' target='_blank'><font color='#228dc1'>www.adopttosave.org</font></a></td>
    </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'>
    </span></td></tr>
    </table></td></tr>
    <tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>
    <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'>
    <font color='#595f75'><strong>A Life to Live Animal Shelter & Adoption Center</strong><br>Megan Gonzales<br></font>
    <font color='#595f75'>P.O. Box 873<br>Baytown, TX 77522</font></td>
    <td width='40%' align='right' valign='top'>832 821-5420<br><a href='http://www.adopttosave.org' target='_blank'><font color='#228dc1'>www.adopttosave.org</font></a></td>
    </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'>
    </span></td></tr>
    </table></td></tr>
    </table>
    

    I get

     <table align="center" style="width:95%;" border="0"><tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>
    <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'>
    <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font>
    <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font>   <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td>
    <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td>
    </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'>
    </span></td></tr>
    </table></td></tr>
    
    <tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>
    <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'>
    <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font>
    <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font>   <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td>
    <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td>
    </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'>
    </span></td></tr>
    </table></td></tr>
    <tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>
    <table align='left' cellspacing='0' cellpadding='3' width='500'><tr><td align='left' width='60%' valign='top'>
    <font color='#595f75'><strong>A & L INDUSTRIAL SERVICES</strong><br>Misty Martinez<br></font>
    <font color='#595f75'>2910 East P Street<br>Deer Park, TX 77536</font>   <a href='http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536' target='_blank' style="color: ##228dc1;">Map</a></td>
    <td width='40%' align='right' valign='top'>281 470-9805<br>FAX: 281 470-9899<br><a href='http://www.anlindustrial.com' target='_blank'><font color='#228dc1'>www.anlindustrial.com</font></a><br><a href='mailto:misty.martinez@anlindustrial.com'><font color='#228dc1'>Email</font></a></td>
    </tr><tr><td align='left' colspan=3><span style='font-style: italic; font-weight: bold;'>
    </span></td></tr>
    </table></td></tr>
    
    
    </table>


  • too slow for another edit: if you prefix and suffix the FIND string with \R*, you can get rid of the blank lines, too:

    • FIND = (?s)\R*<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>\R*
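For anyone who would rather check the idea outside Notepad++, here is a minimal Python sketch of the same search-and-delete. It is only an approximation under stated assumptions: Python's `re` module has no `\R`, so `(?:\r?\n)*` stands in for `\R*`, and the two-block `sample` string is made up for illustration (it is not the real data from this thread).

```python
import re

# Block boundaries copied from the FIND expression above.
BLOCK_START = "<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>"
BLOCK_END = "</table></td></tr>"

# [^@]*? cannot cross an "@", so only blocks without one can match.
# (?:\r?\n)* is a stand-in for \R*, which Python's re module lacks.
NO_AT_BLOCK = re.compile(
    r"(?:\r?\n)*"
    + re.escape(BLOCK_START)
    + r"[^@]*?"
    + re.escape(BLOCK_END)
    + r"(?:\r?\n)*"
)


def drop_blocks_without_at(html: str) -> str:
    """Delete every START...END block that contains no '@' character."""
    return NO_AT_BLOCK.sub("", html)


# Hypothetical two-block sample: the first block has an email, the second does not.
sample = (
    BLOCK_START
    + "\n<table><tr><td><a href='mailto:a@example.com'>Email</a></td></tr>\n"
    + BLOCK_END
    + "\n"
    + BLOCK_START
    + "\n<table><tr><td>no address here</td></tr>\n"
    + BLOCK_END
    + "\n"
)

cleaned = drop_blocks_without_at(sample)
```

Only the block with the `mailto:` link should survive in `cleaned`. Note that the character class already matches line breaks, so the `(?s)` flag is not needed for this particular pattern.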


  • @PeterJones said:

    well, if you fix smartquotes, assume nothing else was markdown-ed away, and ignore the initial

    Or better yet, you fix nothing, and just…move…on…

    It is actually amazing the number of people here that make you want to figure out what they are trying to ask before you give them help.



  • Given that it’s probably the forum itself that clobbered the real quotes into smart quotes, and the little “?” isn’t super-obvious, being light grey, pretty small, and a one-character-wide click target, I’m willing to fix smartquotes for first-time posters (and since no one pointed out to Raymond in his first thread how to format properly in these forums, I gave an extension).

    But now that I’ve explained it, and pointed out multiple times to @Raymond-Lee-Fellers that he needs to apply Markdown formatting to force the blocks to be rendered unedited, I will expect any further clarifications or posts from him to be formatted better.



  • @guy038,

    For my own curiosity (one of these days, I want this idiom to stick): starting with my FIND = (?s)\R*<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>\R* as the baseline, how would you change [^@]*? to search for “any non-greedy sequence of characters that does not contain mailto:” instead of “any non-greedy sequence of characters that does not contain an @ sign”?



  • @Alan-Kilborn oops, you’re right. The 1st example is different from the second one; however it appears that all subsequent blocks do begin and end with the same code. My apologies.



  • @Raymond-Lee-Fellers

    No problem…helping you out is the most important thing.

    @PeterJones

    This may be the idiom you seek?:

    ((?!mailto:).)*

    It is not the easiest thing to remember.
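A small Python sketch of how that idiom slots into the baseline expression, offered as an illustration rather than a tested Notepad++ recipe. The `sample` text is hypothetical, and its second block deliberately contains an `@` but no `mailto:`, to show where this version behaves differently from the `[^@]*?` one.

```python
import re

BLOCK_START = "<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>"
BLOCK_END = "</table></td></tr>"

# (?:(?!mailto:).)*? consumes one character at a time, but only while the
# text at the current position does not begin with "mailto:".
NO_MAILTO_BLOCK = re.compile(
    re.escape(BLOCK_START) + r"(?:(?!mailto:).)*?" + re.escape(BLOCK_END),
    re.DOTALL,  # plays the role of (?s): the dot may also match line breaks
)

# Hypothetical sample: block 1 has a mailto: link, block 2 only a bare "@".
sample = (
    BLOCK_START + "<a href='mailto:info@example.com'>Email</a>" + BLOCK_END + "\n"
    + BLOCK_START + "Write to info@example.com by post" + BLOCK_END + "\n"
)

cleaned = NO_MAILTO_BLOCK.sub("", sample)
```

Here the second block is deleted even though it contains an `@`, because the test is now "no `mailto:`" rather than "no `@`".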



  • @PeterJones said:

    (?s)<tr><td align='left' colspan=3><hr style='border: solid 1px black;'/>[^@]*?</table></td></tr>

    Works perfectly. Didn’t have to fix anything. Your code works out of the box. Thank you.



  • Hi, @raymond-lee-fellers, @peterjones, @alan-kilborn and All,

    First of all, Peter, I was just about to reply when I saw your solution. My solution is quite similar and just a bit less accurate than yours, as my opening boundary is simply <tr><td align='left'.

    I’ve found the regex : (?s)<tr><td align='left'[^@]+</table></td></tr>\R


    Regarding your question, Peter, Alan’s answer is the right one! So, if you’re looking for the smallest range of characters, possibly spanning several lines, between the opening boundary START and the ending boundary END (both matched with that exact case), which must not contain, for instance, the number 123, the correct regex is:

    (?s-i)START((?!123).)*?END

    Indeed, this regex means that, at each position reached by the regex engine after the word START, it first tests the negative look-ahead, i.e. it asks: do the next three characters form the string 123? If NOT, the negative look-ahead succeeds and allows the regex engine to consume the next character and move on.


    Of course, if you want to match the same range of characters when it does contain the number 123, the regex is simply:

    (?s-i)START.*?123.*?END

    Test it, against the text, below :

    ......START123......................END................START123......................END..........
    ......START.........................END................START.........................END..........
    ......START...........1.............END................START...........1.............END..........
    ......START............2............END................START............2............END..........
    ......START.............3...........END................START.............3...........END..........
    ......START...........123...........END................START...........123...........END..........
    ......START...........12............END................START...........12............END..........
    ......START............23...........END................START............23...........END..........
    ......START...........1.3...........END................START...........1.3...........END..........
    ......START......................123END................START......................123END..........
    

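    You’ve certainly noticed that, if you look for ranges containing the number 123 in my sample text, it is best to use the regex (?-is)START.*?123.*?END, which limits each match to a single line ;-)) With (?s), a match that begins at a START whose line has no 123 can run past END and across line breaks until it finds a 123 further down.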
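To see the two expressions side by side, here is a short Python sketch run against four shortened lines modelled on the test text above. One assumption to flag: Python's `re` module does not accept the global `(?s-i)` form, so the inline modifiers are dropped; matching is case-sensitive by default, and these single-line samples do not need the dot to match line breaks.

```python
import re

# Shortened variants of the sample lines above (one START...END per line).
lines = [
    "......START123......................END....",
    "......START.........................END....",
    "......START...........1.3...........END....",
    "......START......................123END....",
]

# Range between START and END that must NOT contain 123.
without_123 = re.compile(r"START(?:(?!123).)*?END")

# Range between START and END that DOES contain 123.
with_123 = re.compile(r"START.*?123.*?END")

lacking = [bool(without_123.search(s)) for s in lines]
containing = [bool(with_123.search(s)) for s in lines]

print(lacking)     # [False, True, True, False]
print(containing)  # [True, False, False, True]
```

On these samples each line is matched by exactly one of the two expressions, which is what you would expect from a "does not contain" / "does contain" pair.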

    Best regards,

    guy038

    P.S. :

    Note that the regex (?s-i)START((?!123).)*?END could be rewritten, in a more complicated way, as:

    (?s-i)START(((?!1)|(?=1(?!2))|(?=12(?!3))).)*?END

    This is just Boolean algebra! Indeed, if we consider 3 consecutive characters, in order to match every case other than the number “123”, we may use:

    NOT (123) = NOT 1 OR ( 1 AND NOT 2 ) OR ( 12 AND NOT 3 ) 
                 V             V                  V
                ...           1..                12.
                .2.           1.3
                ..3
                .23
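The equivalence can also be checked by brute force. The Python sketch below compares the short and the expanded expression on every filler of up to five characters drawn from `1`, `2`, `3` and `.`. The helper name `agree_on_all_fillers` and the choice of alphabet and length are assumptions made for this illustration, and `re.DOTALL` replaces the inline `(?s-i)`, which Python's `re` module does not accept as a global flag.

```python
import re
from itertools import product

# The short form and the expanded form from the P.S. above.
simple = re.compile(r"START(?:(?!123).)*?END", re.DOTALL)
expanded = re.compile(
    r"START(?:(?:(?!1)|(?=1(?!2))|(?=12(?!3))).)*?END", re.DOTALL
)


def agree_on_all_fillers(alphabet: str = "123.", max_len: int = 5) -> bool:
    """Return True if both regexes accept/reject START<filler>END identically
    for every filler of length 0..max_len over the given alphabet."""
    for length in range(max_len + 1):
        for chars in product(alphabet, repeat=length):
            text = "START" + "".join(chars) + "END"
            if bool(simple.fullmatch(text)) != bool(expanded.fullmatch(text)):
                return False
    return True


print(agree_on_all_fillers())  # True
```

This is an exhaustive check over a small space (1365 strings), not a proof, but it agrees with the Boolean identity shown in the table.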
    


  • @Raymond-Lee-Fellers : glad it worked

    @guy038 : thanks for the details

    @Alan-Kilborn : a negative lookahead as the start of the match segment. No wonder I cannot store it. I’ve bookmarked it instead.

    Now back to studying Python (since two of the forum’s python experts are not actively helping here anymore, I need to up my PythonScript output)



  • @PeterJones

    since two of the forum’s python experts are not actively helping here anymore, I need to up my PythonScript output …

    i really wished there was an easy way to lure both back somehow … for example with a complementary free cake for all returning members 🥧🍰🎂 :D

    i’ve still got the hope that someday maybe one or both will return for either historic, good times reasons, or 'cause you lured them back with future’s most ultimate py guru knowledge 👍
    (ps: no pressure, i think your py is already pretty good and way better than eg. mine)

    reader’s note:
    this was slightly off topic, so i give my sincere apologies to everyone in advance.

