how to remove lines of text that don’t contain any of specific words in the line of text?
-
What I mean is I want to remove all lines of text that don’t contain any particular words in them (such as mazda, temperature, ice, metal) – so any lines that don’t contain any of these words in the brackets are to be removed
-
The “best” way to do this is to do a Mark operation with the Bookmark line option selected. You can mark each word independently with successive Mark all operations.
When you are done with your marking, you can right-click the bookmark margin and choose Remove Unmarked Lines. Alternatively, this same command is found on the Search menu in the Bookmark submenu.
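For anyone who would rather script this outside Notepad++, the same "mark the lines with any of the words, then remove the unmarked ones" idea can be sketched in Python. This is only an illustration, not what Notepad++ does internally: the function name `keep_lines_with_any` and the sample lines are made up, and the sketch assumes whole-word, case-insensitive matching, which the Mark dialog only does when its options are set that way.

```python
import re

def keep_lines_with_any(text, words):
    """Keep only the lines that contain at least one of the given whole words."""
    # Hypothetical helper: builds one alternation like \b(?:mazda|ice)\b
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    return "\n".join(line for line in text.splitlines() if pattern.search(line))

# Made-up sample: "metallic" is not the whole word "metal", so that line goes.
sample = "my mazda is red\nnothing here\nthe ICE melted\nmetallic paint"
result = keep_lines_with_any(sample, ["mazda", "temperature", "ice", "metal"])
print(result)
```

Dropping the two `\b` anchors would make it behave like a plain substring search instead.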
-
Thanks for that
-
Hello, @gifthubbro, @alan-kilborn and All,
Of course, the use of bookmarks, marking each line where each word occurs is quite relevant and easy to implement !
In addition to this method, that I personally use very often, I’d like to show you some generic regular expressions that could help !
In this regard, I chose the beginning of the N++ License.txt as a sample. I just moved each sentence in a new line and deleted any empty line. So, this test text :
The licenses for most software are designed to take away your freedom to share and change it.
By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.
This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it.
(Some other Free Software Foundation software is covered by the GNU Library General Public License instead.)
You can apply it to your programs, too.
When we speak of free software, we are referring to freedom, not price.
Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have.
You must make sure that they, too, receive or can get the source code.
And you must show them these terms so they know their rights.
We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software.
If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations.
Finally, any free program is threatened constantly by software patents.
We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary.
To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and modification follow.
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
Now, let’s suppose that we are looking for the three words that, or and this, whatever their case, in these 19 lines.
Then, four cases are mathematically possible :
- To remove all the lines which contain the word that OR the word or OR the word this, use the regex S/R :
SEARCH
(?i-s)(?=^.*\b(that|or|this)\b)^.+\R
REPLACE
Leave EMPTY
- To remove all the lines which contain none of the words, that is, lines without that AND without or AND without this, use the regex S/R :
SEARCH
(?i-s)(?=^.*\b(that|or|this)\b)^.+|^.+\R
REPLACE
?1$0
- To remove all the lines which contain, simultaneously, the three words that, or and this, use the regex S/R :
SEARCH
(?i-s)(?=^.*\bthat\b)(?=^.*\bor\b)(?=^.*\bthis\b)^.+\R
REPLACE
Leave EMPTY
- To remove all the lines which do not contain, simultaneously, the three words that, or and this, use the regex S/R :
SEARCH
(?i-s)(?=^.*\bthat\b)(?=^.*\bor\b)(?=^.*\bthis\b)^(.+)|^.+\R
REPLACE
?1$0
Just test it against the text above !
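If it helps to cross-check the four cases outside Notepad++, here is a minimal Python sketch of the same logic, written as line filters rather than as search/replace. It is an approximation under stated assumptions: Python's `re` module is not the Boost engine used by Notepad++, so the `(?i-s)` prefix is replaced by the `re.IGNORECASE` flag, and the function name `filter_lines` is hypothetical. The three sample lines are taken from the test text above.

```python
import re

WORDS = ["that", "or", "this"]
# One alternation for "at least one word", one look-ahead per word for "all words".
ANY = re.compile(r"\b(?:" + "|".join(WORDS) + r")\b", re.IGNORECASE)
ALL = re.compile("".join(rf"(?=.*\b{w}\b)" for w in WORDS), re.IGNORECASE)

def filter_lines(lines, case):
    """Return the lines that survive the given case (1 to 4). Hypothetical helper."""
    def has_any(s):
        return ANY.search(s) is not None
    def has_all(s):
        return ALL.match(s) is not None
    keep = {
        1: lambda s: not has_any(s),  # case 1: remove lines with at least one word
        2: has_any,                   # case 2: remove lines with none of the words
        3: lambda s: not has_all(s),  # case 3: remove lines with all three words
        4: has_all,                   # case 4: remove lines missing at least one word
    }[case]
    return [s for s in lines if keep(s)]

none_line = "You can apply it to your programs, too."
some_line = "You must make sure that they, too, receive or can get the source code."
all_line = ("To prevent this, we have made it clear that any patent must be "
            "licensed for everyone's free use or not licensed at all.")
lines = [none_line, some_line, all_line]

for case in (1, 2, 3, 4):
    print(case, len(filter_lines(lines, case)))
```

Cases 1 and 2 are complements of each other, and so are cases 3 and 4, which is why only two compiled patterns are needed.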
Notes :
- In the first case, the look-ahead verifies that at least one of the three words exists in the current line, and the complete line is deleted
- In the second case, if at least one of the three words exists in the current line, group 1 is defined and the line contents are just rewritten. If none of these words exists in the current line, the second alternative matches but, as group 1 is not defined, the complete line is deleted
- In the third case, three consecutive look-aheads test, at the beginning of the current line, the existence of each word. If each look-ahead returns true, the complete current line is then deleted
- In the fourth case, after the three consecutive look-aheads return true, the regex catches the whole current line, stored in group 1, and its contents are simply rewritten. If a line does not contain all these words, the second alternative matches but, as group 1 is not defined, the complete line is deleted
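The "rewrite the line if group 1 is defined, otherwise delete it" trick of the second case can also be imitated in Python, as a rough sketch only. Python's `re` has neither the `?1$0` conditional replacement nor `\R`, so this version assumes a callback passed to `re.sub` and `\r?\n` as the line break. The sample text is made up.

```python
import re

# First alternative: the line holds at least one word, so group 1 gets defined.
# Second alternative: any other non-empty line, together with its line break.
pattern = re.compile(
    r"^(?=.*\b(that|or|this)\b).+|^.+\r?\n",
    re.IGNORECASE | re.MULTILINE,
)

def rewrite_or_delete(match):
    # Stand-in for the ?1$0 replacement: keep the match only if group 1 is set.
    return match.group(0) if match.group(1) is not None else ""

text = "keep this line\ndrop me\nthat one stays\n"  # made-up sample
result = pattern.sub(rewrite_or_delete, text)
print(result)
```

As in the original regex, the first alternative stops before the line break, so a kept line retains its own line ending, while a deleted line takes its line break with it.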
Cheers,
guy038
-
That’s a nice companion post to THIS related one from 3.5 years ago.
BTW, in the related one, you have a note at the end:
IMPORTANT : If the Find result panel contains results, from a non-saved file ( with new # name ), the context option Find in this finder… does NOT seem to work ! I’ll add a post to don-ho, to that purpose, very soon !
I think you never followed up on that, as it seems to still (v7.9) not work?
-
Hi, @alan-kilborn and All,
My bad ! I must say that I don’t generally use the Find in these found results option of the Find result panel :-( So, one more thing to put in my TO DO list !
I must also create an issue for the wrong results, in View > Summary... and in status bar when you use an UCS-2 LE/BE BOM encoded file ! And…, since a couple of days, I’ve been trying to build up a regex to match a valid URI / URL, from the official document of the Network Working Group : a true nightmare ! I don’t even know if this regex will fit in the 2048 characters, or so, of the search field ;-))
This comes from point 2 of the change.log, below and after reading all the concerned issues :
2. Fix inaccurate URL detection by replacing a new URL parser (Fix #3912, #3353, #4643, #5029, #6155, #7791, #8634)
I’ll certainly write some comments to Uhf7 and sasumner, very soon !
BR
guy038
-
@guy038 said in how to remove lines of text that don’t contain any of specific words in the line of text?:
I must also create an issue for the wrong results, in View > Summary…
You must be talking about some of the stuff in THIS incredible post. :-)
I’ll certainly write some comments to Uhf7 and sasumner, very soon !
Not sure whether or not to feel sorry for those people. :-)
-
Hi, @alan-kilborn and All,
After some other tests, I realized that this issue happens only when the new 1 file, exclusively, is concerned. Quite weird !
So, the issue
BR
guy038
-
This post is deleted!
-
@prahladmifour said in how to remove lines of text that don’t contain any of specific words in the line of text?:
Step 4: In the Find what: text box, type the search word, use the regex search ^(?!..msn.com).$
I don’t think that solution is correct in any way… @prahladmifour Could you please give some examples of how that expression would help here?
-
@litos81 said in how to remove lines of text that don’t contain any of specific words in the line of text?:
Could you please give some examples of how that expression would help here?
It’s a bot.
So I’m guessing probably not.