Delete all rows of a text file except company names

Raymond Lee Fellers

This is a text file in HTML containing company information. I want to delete everything except the company names.
Every row containing the company names has this code before the name. 
And this code after the company name.

I tried using a regex find and replace that supposedly would do this, but it didn’t work, so I’m asking here for suggestions.

Below is an example of one company’s HTML listing. There could be as many as 1,000 companies in each list, so automating this would be a big help.

<table align=‘left’ cellspacing=‘0’ cellpadding=‘3’ width=‘500’><tr><td align=‘left’ width=‘60%’ valign=‘top’>
A & L INDUSTRIAL SERVICES Misty Martinez 
2910 East P Street Deer Park, TX 77536 <a href=‘http://maps.google.com/?q=2910+East+P+Street%2C+Deer+Park%2C+TX+77536’ target=‘_blank’ style=“color: ##228dc1;”>Map</a></td>
<td width=‘40%’ align=‘right’ valign=‘top’>281 470-9805 Fax: 281 470-9899 <a href=‘http://www.anlindustrial.com’ target=‘_blank’>www.anlindustrial.com</a> <a href=‘mailto:misty.martinez@anlindustrial.com’>Email</a></td>
</tr><tr><td align=‘left’ colspan=3>
</td></tr>
</table></td></tr>

Thanks for any suggestions. It’s probably simple for you but not for me.

Terry R

Firstly, I can see from your example that at least one line has probably wrapped and now appears as two or more lines. Accurate examples are very important: when we create a regex, we need to know how each line REALLY appears.

Can I therefore suggest you read the FAQ, specifically the posting called
“Request for Help without sufficient information to help you”.
It explains how to post the example data so that the markdown interpreter (which renders these posts) does NOT interfere with the formatting.

Terry

Terry R

@Raymond-Lee-Fellers

Actually there is another way to delete the lines you don’t want. I’ll explain as it seems you may have some regex knowledge already.
Under the Search menu there is a “Mark” option. Take text that you know exists on the company lines (this MUST NOT occur on any lines you want to delete, only on the ones to remain) and enter it in the Mark “Find what” field. Tick the “Bookmark line” option and then click “Mark All”. This marks all the lines you want to keep. From here you can use the Search menu, Bookmark (near the bottom) and select either “remove bookmarked lines” or “remove unmarked lines”. If the first option, then open another tab in NPP and paste them there.

I hope that helps.

Terry

Terry R

@Raymond-Lee-Fellers
Sorry, slight mistake in previous post, I meant to say you could use the “cut bookmarked lines” and then paste in another tab in NPP. However the easiest option is to use “remove unmarked lines”, which will leave the lines you DO want.
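
For anyone who would rather script this than click through the menus, the “remove unmarked lines” idea can be sketched in Python roughly as follows. This is only an illustration outside Notepad++: the function name `keep_marked_lines` and the `NAME:` marker are hypothetical stand-ins, and the sketch uses a plain substring test rather than a regex.

```python
def keep_marked_lines(text, marker):
    """Return only the lines of `text` that contain `marker`.

    Mirrors "Mark All" with bookmarks followed by "remove unmarked
    lines": a line survives only if the marker text occurs in it.
    """
    return [line for line in text.splitlines() if marker in line]


# Hypothetical sample: "NAME:" stands in for whatever text occurs
# only on the company-name lines of the real file.
sample = "header\nNAME: ACME CORP\n123 Main St\nNAME: BETA LLC\n"
kept = keep_marked_lines(sample, "NAME:")
print("\n".join(kept))
```

As in the Mark dialog, this only works if the marker never appears on a line you want to delete.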

Terry

guy038

Hello, @raymond-lee-fellers, @terry-r and All,

So, Raymond, you would like to delete everything except the company names which are located :

After the string 
Before the string

No problem at all with regular expressions ;-))

Copy / Paste your html file in a new Notepad++ tab
Open the Replace dialog ( Ctrl + H )
SEARCH (?s).+?((?-s).+?)|.+
REPLACE \1\r\n ( or \1\n if you work with UNIX files )
Tick the Wrap around option
Select the Regular expression search mode
Click on the Replace All button

Et voilà !

Notes :

First, the global modifier (?s) means that, by default, the dot character will match any single char ( standard or EOL one )
Then the part .+? looks, from cursor position, for the smallest range, even on multi-lines, of any char till the literal string 
Now, the part ((?-s).+?) tries to match the smallest range of standard characters, in a single line due to the (?-s) modifier, till the literal string . That range is stored as group 1, because of the parentheses
If no further range ............ can be found, the regex tries the second alternative, after the | symbol ( .+ ), which catches all the remaining chars till the very end of the file
In the replacement, each company name is rewritten, \1, followed by the EOL chars \r\n, and the remaining chars at the end of the file are simply replaced with a single line-break because, in that second alternative, group 1 is not defined !
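
The same two-alternative idea can be tried outside Notepad++ with a small Python sketch. Two assumptions to note: `<b>` and `</b>` are placeholder delimiters chosen purely for illustration (they are not the strings from the original file), and Python's `re` module has no mid-pattern `(?-s)` switch, so the scoped form `(?-s:...)` is used instead.

```python
import re

# Placeholder delimiters, assumed for illustration only; substitute
# the real strings that surround each company name.
START = "<b>"
END = "</b>"

# Alternative 1: skip any chars (newlines included, because of (?s))
#   up to START, then capture a single-line name up to END.
# Alternative 2: swallow whatever is left once no more names exist.
PATTERN = re.compile(
    r"(?s).+?" + re.escape(START) + r"((?-s:.+?))" + re.escape(END) + r"|.+"
)


def extract_names(text):
    """Replace each match with group 1 plus a newline.

    When alternative 2 matches, group 1 is unset and is substituted
    as an empty string, so the leftover text becomes one line-break.
    """
    return PATTERN.sub(r"\1\n", text)


sample = "junk\n<b>ACME CORP</b> more\nstuff\n<b>BETA LLC</b>\ntrailing\n"
result = extract_names(sample)
print(result)
```

On this sample the result is the two names on separate lines followed by one extra blank line, which comes from the trailing text matched by the second alternative.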

Remark :

If you do not tick the Wrap around option, the regex S/R runs only from the current location till the end of the file. In that case, be sure that the cursor is at the very beginning of the current line before running the replacement !

Best Regards,

guy038

Raymond Lee Fellers

Thanks to everyone who helped with this question. Each answer contributed to the solution. Special thanks to guy038, who gave me a better understanding of how the code works and whose solution worked perfectly.

Ray Fellers