Select lines

iCrack

Hello, I have a log file of 1 million+ lines and I want to select the lines that contain specific words like abc, abcd, abcde, etc. I have 100 of these words and I want to select all of them at once. Is there any way I can do it?

PeterJones

@iCrack said:

I want to select the lines that contain specific words like abc, abcd, abcde, etc. I have 100 of these words and I want to select all of them at once. Is there any way I can do it?

Yes, there is a way. Actually, there are many ways.

Here is one possibility:

Search > Mark…
Find What = ^.*(one|two|skip a few|ninety nine|one hundred).*$
- put each of your 100 words inside the parentheses, separated by the vertical bar |
- as I showed with ninety nine and one hundred, your “words” can be “phrases” that include spaces.
Enable ☑ Bookmark line
Enable ☑ Purge for each search
Mode = ☑ Regular expression
Click Mark All

When you do this, it will highlight all lines that have at least one of your words or phrases by changing the background color.

Thanks to ☑ Bookmark line, it also puts the ball icon in the left margin of each matching line. At this point, you can apply tasks from the Search > Bookmark sub-menu, and those tasks will apply to all the matched lines.

I enabled Purge for each search because, without it, if you edited one of the lines so that it no longer contained a matching word and ran the search again, all the previous matches would still be highlighted, even though they no longer match. If you want them to remain highlighted even after they stop matching, then disable the Purge option.
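With 100 words, typing the alternation by hand is tedious, so it may help to generate the Find What expression instead. Here is a minimal Python sketch, not part of the steps above: build_find_what and select_lines are hypothetical helper names, and the escaping only covers the usual Perl-style metacharacters, so check the result before pasting it into the Mark dialog.

```python
import re

# Characters treated as special in Perl-style regex syntax (assumed to be
# sufficient for the words in your list).
_META = re.compile(r"([\\.^$|?*+()\[\]{}])")


def build_find_what(words):
    """Build ^.*(w1|w2|...).*$ from a list of words or phrases."""
    escaped = [_META.sub(r"\\\1", w) for w in words]
    return "^.*(" + "|".join(escaped) + ").*$"


def select_lines(text, words):
    """Return the lines of text that contain at least one of the words."""
    pattern = re.compile(build_find_what(words))
    return [line for line in text.splitlines() if pattern.search(line)]


words = ["one", "two", "ninety nine"]
print(build_find_what(words))  # ^.*(one|two|ninety nine).*$

log = "line with one\nnothing here\nninety nine bottles"
print(select_lines(log, words))  # ['line with one', 'ninety nine bottles']
```

The printed expression can be pasted into Find What. select_lines does the same selection outside the editor, using Python's re module rather than Notepad++'s Boost engine.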


guy038

Hello, @iCrack, @PeterJones and All,

@iCrack, you said:

I want to select lines which contain specific words like abc, abcd, abcd…

Just a point of clarification:

If you are looking for words where one is the beginning of another, the longest word must be placed first in the list of alternatives of the search regex!

So, the regex abc|abcd|abcde would only ever match the abc string, even inside abcd or abcde, whereas the regex abcde|abcd|abc correctly matches the 3 strings abc, abcd and abcde ;-))

Indeed, in the Boost regex library, alternatives are tried successively, from left to right, and the first alternative that matches the text is selected, provided, of course, that the subsequent parts of the regex also match!
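The ordering effect is easy to reproduce. The sketch below uses Python's re module, which also tries alternatives from left to right. It is only an illustration of the rule, not a test of the Boost engine itself, and the sample text is made up.

```python
import re

text = "abc abcd abcde"

# Shortest alternative first: the engine settles for "abc" every time.
short_first = re.findall(r"abc|abcd|abcde", text)

# Longest alternative first: each word is matched in full.
long_first = re.findall(r"abcde|abcd|abc", text)

print(short_first)  # ['abc', 'abc', 'abc']
print(long_first)   # ['abc', 'abcd', 'abcde']

# Sorting the word list by decreasing length gives a safe ordering.
ordered = "|".join(sorted(["abc", "abcd", "abcde"], key=len, reverse=True))
print(ordered)  # abcde|abcd|abc
```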

Best Regards,

guy038

PeterJones

@guy038 said:

If you are looking for words where one is the beginning of another, the longest word must be placed first in the list of alternatives of the search regex!

So, the regex abc|abcd|abcde would only ever match the abc string, even inside abcd or abcde, whereas the regex abcde|abcd|abc correctly matches the 3 strings abc, abcd and abcde ;-))

I had thought of mentioning that, or coding around that. My thought would have been to include word boundaries, ^.*\b(one|too)\b.*$, which would correctly handle each of these lines:

This one should be found
This should be found by none of the searches, despite o-n-e being inside "none".
This too shall pass
But not this
Works at the end-of-line, too
Or even when the end of file doesn't have a newline on this one
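For anyone who wants to check the boundary version outside Notepad++, here is a small Python sketch run against the sample lines above. It assumes Python's \b behaves like Boost's for plain ASCII text, which should hold for these lines. The last check shows that with \b on both sides the order of the alternatives stops mattering, because the engine backtracks to a longer alternative when the boundary test fails.

```python
import re

pattern = re.compile(r"^.*\b(one|too)\b.*$")

lines = [
    "This one should be found",
    'This should be found by none of the searches, despite o-n-e being inside "none".',
    "This too shall pass",
    "But not this",
    "Works at the end-of-line, too",
    "Or even when the end of file doesn't have a newline on this one",
]

# 1-based numbers of the sample lines that the regex matches.
matched = [i for i, line in enumerate(lines, start=1) if pattern.search(line)]
print(matched)  # [1, 3, 5, 6]

# Shortest alternative first, but the trailing \b forces the full word.
whole_word = re.search(r"\b(abc|abcd|abcde)\b", "abcde").group(1)
print(whole_word)  # abcde
```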