Delete all rows of a text file except company names

Reply to Delete all rows of a text file except company names on Mon, 07 Jan 2019 10:42:31 GMT

Raymond Lee Fellers — Mon, 07 Jan 2019 10:42:31 GMT

Thanks to everyone who helped with this question. Each answer contributed to the solution. Special thanks to guy038, who gave me a better understanding of how the code works and whose solution worked perfectly.

Ray Fellers

guy038 — Mon, 07 Jan 2019 10:21:35 GMT

Hello, @raymond-lee-fellers, @terry-r and All,

So, Raymond, you would like to delete everything except the company names which are located :

After the string

Before the string

No problem at all with regular expressions ;-))

Copy / Paste your html file in a new Notepad++ tab
Open the Replace dialog ( Ctrl + H )
SEARCH (?s).+?((?-s).+?)|.+

REPLACE \1\r\n ( or \1\n if you work with UNIX files )

Tick the Wrap around option

Select the Regular expression search mode

Click on the Replace All button

Et voilà !

Notes :

First, the global modifier (?s) means that, by default, the dot character will match any single char ( standard or EOL one )
Then the part .+? looks, from cursor position, for the smallest range, even on multi-lines, of any char till the literal string

Now, the part ((?-s).+?) tries to match the smallest range of standard characters, in a single line due to the (?-s) modifier, till the literal string . That range is stored as group 1, because of the parentheses

If no more range ............ can be found, the regex tries the second alternative, after the | symbol ( .+ ), which catches all the remaining chars till the very end of the file

In replacement, any company name is rewritten, \1, followed by the EOL chars \r\n, and the remaining chars at the end of the file are simply replaced with a single line-break as, in that second alternative, group 1 is not defined !
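For readers who want to try the same logic outside Notepad++, here is a minimal Python sketch of the search/replace described in these notes. The delimiters `<b>` and `</b>` and the sample text are hypothetical stand-ins, since the real strings are not shown in this post. Python's `re` module does not accept a bare `(?-s)` in the middle of a pattern, so the scoped form `(?-s:...)` is assumed here as an equivalent.

```python
import re

# Hypothetical delimiters: stand-ins for the real "after" / "before" strings.
BEFORE = "<b>"
AFTER = "</b>"

# (?s) lets the dot match line breaks; (?-s:...) switches that off for group 1 only,
# so a company name can never span more than one line.
PATTERN = re.compile(
    r"(?s).+?" + re.escape(BEFORE) + r"((?-s:.+?))" + re.escape(AFTER) + r"|.+"
)

def extract_names(text, eol="\n"):
    # Group 1 is None when the second alternative (.+) matched the leftover text,
    # so that leftover is replaced by a bare line break.
    return PATTERN.sub(lambda m: (m.group(1) or "") + eol, text)

# Invented sample data, for illustration only.
sample = (
    "<html>\n<p>junk</p>\n<b>Acme Corp</b>\n"
    "more junk\n<b>Globex</b>\n<p>footer</p>\n"
)
result = extract_names(sample)
print(result)
```

As in the Notepad++ version, the trailing text after the last name collapses to one extra line break, so the output ends with an empty line.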

Remark :

If you do not tick the Wrap around option, in order to run the regex S/R from current location till the end of file, only, be sure that cursor is at the very beginning of the current line, before replacement !

Best Regards,

guy038

Terry R — Mon, 07 Jan 2019 01:56:37 GMT

@Raymond-Lee-Fellers
Sorry, slight mistake in previous post, I meant to say you could use the “cut bookmarked lines” and then paste in another tab in NPP. However the easiest option is to use “remove unmarked lines”, which will leave the lines you DO want.

Terry

Terry R — Mon, 07 Jan 2019 01:46:20 GMT

@Raymond-Lee-Fellers

Actually there is another way to delete the lines you don’t want. I’ll explain as it seems you may have some regex knowledge already.
Under the Search menu there is a “mark” option. Take some text that you know exists on the company lines (this MUST NOT occur in any lines you want to delete, only in the ones to remain) and insert it into the Mark “find what” field. Click on the bookmark line option and then click on “mark all”. This has now marked all the lines you want to keep. From here you can use the Search menu, Bookmark (near bottom) and select either “remove bookmarked lines” or “remove unmarked lines”. If the first option, then open another tab in NPP and paste them there.
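The effect of "mark all" followed by “remove unmarked lines” can be sketched in a few lines of Python. This is only an illustration of the idea, not what Notepad++ does internally; the marker text `Company:` and the sample lines are made up, and a plain substring test is assumed instead of a regex.

```python
def keep_marked_lines(text, marker):
    # Keep only the lines that contain the marker text, line endings included.
    return "".join(
        line for line in text.splitlines(keepends=True) if marker in line
    )

# Invented sample data: the marker must not appear on lines you want to delete.
sample = "header\nCompany: Acme Corp\njunk\nCompany: Globex\n"
kept = keep_marked_lines(sample, "Company:")
print(kept)
```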

I hope that helps.

Terry

Terry R — Mon, 07 Jan 2019 01:36:38 GMT

Firstly, I can see from your example that at least 1 line has possibly wrapped and now appears as at least 2 lines. Examples are very important: when we create a regex, we need to know how each line REALLY appears.

Can I therefore suggest you read the FAQ, specifically the posting called “Request for Help without sufficient information to help you”.
In there is how to represent the data (example) so that the markdown interpreter (which runs these posts) does NOT interfere with the formatting.

Terry