Select lines
-
Hello, I have a log file of 1 million+ lines and I want to select the lines that contain specific words like abc, abcd, abcde, etc. I have 100 of these words and I want to select all of them at once. Is there any way I can do it?
-
@iCrack said:
I want to select the lines that contain specific words like abc, abcd, abcde, etc. I have 100 of these words and I want to select all of them at once. Is there any way I can do it?
Yes, there is a way. Actually, there are many ways.
Here is one possibility:
- Search > Mark…
- Find What = ^.*(one|two|skip a few|ninety nine|one hundred).*$
  - put each of your 100 words inside the parentheses, separated by the vertical bar |
  - as I showed with ninety nine and one hundred, your “words” can be “phrases” that include spaces.
- Enable ☑ Bookmark line
- Enable ☑ Purge for each search
- Mode = ☑ Regular expression
- Click Mark All
When you do this, it will highlight all lines that have at least one of your words or phrases by changing the background color.
Thanks to ☑ Bookmark line, it also puts the ball-icon on the left side of each matching line. At this point, you can apply tasks from the Search > Bookmark > sub-menu, and those tasks will apply to all the matched lines.
I used Purge for each search because, without it, if you edited one of the lines to no longer contain a matching word and ran the search again, all the previous matches would still be highlighted, even if they no longer match. If you want them to remain highlighted even after they stop matching, then disable the Purge option.
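For anyone who wants to try the same pattern outside Notepad++, here is a minimal Python sketch of the idea: join the words with the vertical bar, wrap them in ^.*( ... ).*$, and keep the lines that match. The function names build_line_pattern and select_lines are hypothetical, and the sketch assumes the words are literal text (so it escapes them), which the Mark dialog does not do for you.

```python
import re

def build_line_pattern(words):
    """Build ^.*(w1|w2|...).*$ from a list of literal words or phrases."""
    # re.escape keeps characters such as '.' or '+' inside a word
    # from being interpreted as regex syntax (an assumption: words are literal).
    alternatives = "|".join(re.escape(word) for word in words)
    return re.compile(r"^.*(" + alternatives + r").*$")

def select_lines(lines, words):
    """Return the lines that contain at least one of the words or phrases."""
    pattern = build_line_pattern(words)
    return [line for line in lines if pattern.match(line)]

sample = ["one fish", "nothing here", "it costs ninety nine cents", "three"]
selected = select_lines(sample, ["one", "two", "ninety nine"])
```

Because select_lines accepts any iterable of strings, an open file handle could be passed in place of the sample list for a large log, though that use is untested here.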
-
Hello, @icrak, @peterjones and All,
@icrak, you said :
I want to select lines which contains specific words like abc, abcd, abcd…
Just a point of clarification :
If you are looking for words where one is a subset of another, the longest word must be placed first in the list of alternatives of the search regex !
So, the regex abc|abcd|abcde would only match the abc string, whereas the regex abcde|abcd|abc correctly matches the 3 strings abc, abcd and abcde ;-))
Indeed, in the Boost regex library, alternatives are tried successively, from left to right, and the first alternative that matches the text is selected, provided, of course, that the subsequent parts of the regex also match !
Best Regards,
guy038
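The ordering effect can be reproduced with a small Python sketch. This uses Python's re module rather than Boost, on the assumption that the behaviour carries over because both engines try alternatives from left to right. The helper longest_first is a hypothetical name for one way to reorder a word list automatically.

```python
import re

text = "abc abcd abcde"

# Shortest alternative first: 'abc' succeeds at every position,
# so the longer alternatives are never reached.
short_first = re.findall(r"abc|abcd|abcde", text)

# Longest alternative first: each word is matched in full.
long_first = re.findall(r"abcde|abcd|abc", text)

def longest_first(words):
    """Sort words so that longer ones come before their shorter prefixes."""
    return sorted(words, key=len, reverse=True)

reordered = "|".join(longest_first(["abc", "abcd", "abcde"]))
```

Note that when the alternation is wrapped in ^.*( ... ).*$ the whole line is matched either way, so the order matters mainly when the matched text itself is what you care about.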
-
@guy038 said:
If you are looking for words where one is a subset of another, the longest word must be placed first in the list of alternatives of the search regex !
So, the regex abc|abcd|abcde would only match the abc string, whereas the regex abcde|abcd|abc correctly matches the 3 strings abc, abcd and abcde ;-))
I had thought of mentioning that, or coding around that. My thought would have been to include boundaries, ^.*\b(one|too)\b.*$, which would properly find the matching lines in this sample text:
This one should be found
This should be found by none of the searches, despite o-n-e being inside "none".
This too shall pass
But not this
Works at the end-of-line, too
Or even when the end of file doesn't have a newline on this one
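As a quick check of the boundary version, here is a minimal Python sketch that runs the same regex over the sample lines. It assumes Python's \b behaves like the one in Notepad++ for plain ASCII words such as these; the variable names are illustrative only.

```python
import re

SAMPLE_LINES = [
    'This one should be found',
    'This should be found by none of the searches, despite o-n-e being inside "none".',
    'This too shall pass',
    'But not this',
    'Works at the end-of-line, too',
    "Or even when the end of file doesn't have a newline on this one",
]

# \b requires a word boundary on both sides, so "one" inside "none" is rejected.
pattern = re.compile(r"^.*\b(one|too)\b.*$")

matched = [line for line in SAMPLE_LINES if pattern.match(line)]
```

With these six lines, the first, third, fifth and sixth are selected, while the "none" line and "But not this" are skipped.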