Select lines with a specific number of a word in them
-
Hi, I have a text file like this:
test test \ dfg 54645 Hi fgd 54645 Hi 3 Hi
Hi 45 Hi 4454645 Hi fgd 54645 Hi 3 Hi
and I want to keep only the lines with 4 Hi in them (delete the others, or something like that).
-
General technique:
- Use the Mark (Ctrl+m) function with a Find what of 4 Hi, after ticking the Bookmark line checkbox.
- Use the Search menu’s Bookmark submenu and choose Remove Unmarked Lines.
-
-
@Alan-Kilborn thanks for answering. I mean lines that contain Hi repeated 4 times.
-
Quality of “effort in” must be >= quality of “effort out”
Anyone that abbreviates the problem statement like you did…deserves what you got as a reply. Cheers.
-
Following this advice will go a long way toward getting you better / more-complete answers.
Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.
-
Hi @dimitrov, @Alan-Kilborn, @PeterJones
If a two-step procedure is good enough for you, the following may solve your issue:
Please do as @Alan-Kilborn told you above: mark lines with this expression
Search: (?-s)(Hi.*?){4}
and then Remove unmarked lines. You end up with lines with at least 4 Hi. In order to delete those which have 5 Hi or more, mark lines again, but with this other expression
Search: (?-s)(Hi.*?){5}
This time, however, choose Remove marked lines via the Search menu.
That’s all. If this procedure doesn’t solve your question, then follow @PeterJones’s advice.
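For anyone who wants to check the logic of these two steps outside Notepad++, here is a minimal sketch in Python. It assumes Python's `re` module as a stand-in for the Notepad++ search engine, and the name `keep_exactly_four` is only illustrative. The `(?-s)` prefix is dropped because `.` does not match a line break by default in Python.

```python
import re

# Step 1 pattern: the line holds at least four "Hi" (same idea as (?-s)(Hi.*?){4}).
AT_LEAST_4 = re.compile(r"(Hi.*?){4}")
# Step 2 pattern: the line holds at least five "Hi".
AT_LEAST_5 = re.compile(r"(Hi.*?){5}")


def keep_exactly_four(lines):
    """Keep the lines matched by the first pattern but not by the second."""
    # Equivalent of marking with the first regex, then "Remove unmarked lines".
    kept = [line for line in lines if AT_LEAST_4.search(line)]
    # Equivalent of marking with the second regex, then "Remove marked lines".
    return [line for line in kept if not AT_LEAST_5.search(line)]


sample = [
    "test test \\ dfg 54645 Hi fgd 54645 Hi 3 Hi",   # 3 x Hi
    "Hi 45 Hi 4454645 Hi fgd 54645 Hi 3 Hi",         # 5 x Hi
    "---Hi----Hi-----Hi----Hi----",                  # 4 x Hi
]
print(keep_exactly_four(sample))  # prints ['---Hi----Hi-----Hi----Hi----']
```

Note that, counted this way, the two sample lines of the original question contain 3 and 5 Hi respectively, so only the third (added) line survives.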
Have fun!
-
Hello @dimitrov, @alan-kilborn, @peterjones, @astrosofista and All,
Here are four generic regexes to solve general cases :
- To match lines containing your string, whatever the case, as a whole expression, the number of times, within the range :
(?i-s)^((?:(?!(\bYour string\b)).)*?)(?:(?1)(?2)){Your range}(?1)$
- To match lines containing your string, whatever the case, the number of times, within the range :
(?i-s)^((?:(?!(Your string)).)*?)(?:(?1)(?2)){Your range}(?1)$
- To match lines containing your string, with its exact case, as a whole expression, the number of times, within the range :
(?-is)^((?:(?!(\bYour string\b)).)*?)(?:(?1)(?2)){Your range}(?1)$
- To match lines containing your string, with its exact case, the number of times, within the range :
(?-is)^((?:(?!(Your string)).)*?)(?:(?1)(?2)){Your range}(?1)$
For instance, all the regexes, below, are valid :
(?i-s)^((?:(?!(\bHi\b)).)*?)(?:(?1)(?2)){3}(?1)$
(?i-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){3,5}(?1)$
(?i-s)^((?:(?!(\bThis is a test\b)).)*?)(?:(?1)(?2)){2,4}(?1)$
(?-is)^((?:(?!(\bThis is a test\b)).)*?)(?:(?1)(?2)){2,4}(?1)$
Just test them against the text below, pasted in a new tab :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
---Hi-----
---Hi----Hi----
---Hi----Hi-----Hi----
---Hi----Hi-----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----Hi-----
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
---This is a test-----
---This is a test----This is a test----
---This is a test----This is a test-----This is a test----
---This is a test----This is a test-----This is a test----This is a test----
---This is a TEST----This is a TEST-----This is a TEST----This is a TEST----
---This is a test----This is a test-----This is a test----This is a test----This is a test----
---This is a test----This is a test-----This is a test----This is a test----This is a test----This is a test-----
Nice, isn’t it ? Have you noticed that the second regex example also finds lines containing the sentence This is a test ? Logical, as the word This does contain the string hi ;-))
Best regards
guy038
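The remark about the word This containing the string hi can be checked with a small Python sketch. It assumes Python's `re` module, whose `\b` and case-insensitive matching behave the same way for this simple case; it is not the Notepad++ engine itself.

```python
import re

line = "---This is a test----This is a test-----This is a test----"

# Case-insensitive search of the bare string: "hi" is found inside each "This".
substring_hits = re.findall(r"Hi", line, flags=re.IGNORECASE)
# With \b on both sides, "Hi" must stand alone as a whole word.
whole_word_hits = re.findall(r"\bHi\b", line, flags=re.IGNORECASE)

print(len(substring_hits), len(whole_word_hits))  # prints 3 0
```

This is why the variants with `\b` are the safer choice when the string must be counted as a whole word.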
-
Hi, All,
Here are the different steps in the genesis of the generic regexes of my previous post, principally in the free-spacing mode (?x) !
Hope it helps you to understand these tricky regexes ;-))
Part WITHOUT Hi, then (Part WITHOUT Hi + Hi) x 4, then Part WITHOUT Hi
(?x-s) ^ ( (?! Hi ).)* ( ((?!Hi).)*? Hi ){4} ((?!Hi).)* $  # The INITIAL regex
(?x-s) ^ ( (?! Hi ).)*? ( ((?!Hi).)*? Hi ){4} ((?!Hi).)*? $  # We add a LAZY quantifier at TWO locations and notice 3 IDENTICAL blocks !
(?x-s) ^ ( ( (?! Hi ).)*? ) ( ((?!Hi).)*? Hi ){4} ((?!Hi).)*? $  # We create a NEW group 1 AROUND the FIRST block
(?x-s) ^ ( ( (?! Hi ).)*? ) ( (?1) Hi ){4} (?1) $  # We replace the 2nd and 3rd BLOCKS by a SUB-ROUTINE CALL to GROUP 1 (?1)
(?x-s) ^ ( (?: (?! Hi ).)*? ) ( (?1) Hi ){4} (?1) $  # We change the PRESENT 2nd GROUP as a NON-CAPTURING group and notice 2 strings "Hi"
(?x-s) ^ ( (?: (?! (Hi) ).)*? ) ( (?1) Hi ){4} (?1) $  # We create a NEW group 2 around the FIRST string "Hi"
(?x-s) ^ ( (?: (?! (Hi) ).)*? ) ( (?1) (?2) ){4} (?1) $  # We replace THE 2nd string "Hi" by a SUB-ROUTINE CALL to GROUP 2 (?2)
(?x-s) ^ ( (?: (?! (Hi) ).)*? ) (?: (?1) (?2) ){4} (?1) $  # We change the 3rd PRESENT group as a NON-CAPTURING group
(?-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){4}(?1)$  # We suppress the FREE-SPACING mode and DELETE any SPACE character
(?i-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){4}(?1)$  # We add the CASE modifier
(?i-s)^((?:(?!(\bHi\b)).)*?)(?:(?1)(?2)){4}(?1)$  # We add the \b BOUNDARIES to get a WHOLE expression
For the 3 LAST regexes, STOP the SELECTION at the $ sign
You may test any of these regexes against the text below, pasted in a new tab : It should match only the line containing four strings Hi, only !
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
---Hi-----
---Hi----Hi----
---Hi----Hi-----Hi----
---Hi----Hi-----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----Hi-----
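To illustrate what the free-spacing mode changes, here is a small sketch in Python. It assumes the `re.VERBOSE` flag as the counterpart of `(?x)`; Python's `re` has no `(?1)` sub-routine calls, so only the INITIAL regex is used. In that mode, spaces and anything after `#` are ignored, which is why a comment may stay attached to the pattern, whereas the compact regexes must be copied without their trailing comment.

```python
import re

# re.VERBOSE plays the role of (?x): spaces and "# ..." comments are ignored.
SPACED = re.compile(
    r"^ ( (?! Hi ).)*   ( ((?!Hi).)*? Hi ){4}   ((?!Hi).)* $   # The INITIAL regex",
    re.VERBOSE,
)
# The same regex once every space and the comment have been removed.
COMPACT = re.compile(r"^((?!Hi).)*(((?!Hi).)*?Hi){4}((?!Hi).)*$")

# Six test lines holding 1 to 6 occurrences of "Hi".
lines = ["---" + "Hi----" * n for n in range(1, 7)]

for line in lines:
    # Both columns agree: True only for the line with four "Hi".
    print(line.count("Hi"), bool(SPACED.search(line)), bool(COMPACT.search(line)))
```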
Cheers,
guy038
-
Hello @dimitrov, @alan-kilborn, @peterjones, @astrosofista and All,
Two days later, I realize that my generic regex could be shortened a bit ! Reading, again, what I said in my previous post :
Part WITHOUT Hi, then (Part WITHOUT Hi + Hi) x 4, then Part WITHOUT Hi
(?x-s) ^ ( (?! Hi ).)* ( ((?!Hi).)*? Hi ){4} ((?!Hi).)* $  # The INITIAL regex
You may see, as I did, that the regex begins with “Part WITHOUT Hi”, followed by ( “Part WITHOUT Hi” and “Hi” ) repeated four times. So, there is a redundant part in this expression : the first repetition already consumes any text located before the first Hi !
The initial regex should be, simply :
^ (Part WITHOUT Hi + Hi) x 4 + Part WITHOUT Hi $
(?x-s) ^ ( ((?!Hi).)*? Hi ){4} ((?!Hi).)* $  # The INITIAL regex
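A quick way to convince yourself that the leading block really is redundant is to run both forms over the same lines. The sketch below assumes Python's `re` module in place of the Notepad++ engine and writes both regexes without spaces; the names `INITIAL` and `SHORTER` are only illustrative.

```python
import re

# The first INITIAL regex, with the leading "Part WITHOUT Hi" block.
INITIAL = re.compile(r"^((?!Hi).)*(((?!Hi).)*?Hi){4}((?!Hi).)*$")
# The shortened form, without that leading block.
SHORTER = re.compile(r"^(((?!Hi).)*?Hi){4}((?!Hi).)*$")

# Six test lines holding 1 to 6 occurrences of "Hi".
lines = ["---" + "Hi----" * n for n in range(1, 7)]

# Both regexes must accept and reject exactly the same lines.
for line in lines:
    assert bool(INITIAL.search(line)) == bool(SHORTER.search(line))

print([line.count("Hi") for line in lines if SHORTER.search(line)])  # prints [4]
```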
Thus, the four generic regexes to solve general cases are simplified :
- To match lines containing your string, whatever the case, as a whole expression, the number of times, within the range :
(?i-s)^(?:((?:(?!(\bYour string\b)).)*?)(?2)){Your range}(?1)$
- To match lines containing your string, whatever the case, the number of times, within the range :
(?i-s)^(?:((?:(?!(Your string)).)*?)(?2)){Your range}(?1)$
- To match lines containing your string, with its exact case, as a whole expression, the number of times, within the range :
(?-is)^(?:((?:(?!(\bYour string\b)).)*?)(?2)){Your range}(?1)$
- To match lines containing your string, with its exact case, the number of times, within the range :
(?-is)^(?:((?:(?!(Your string)).)*?)(?2)){Your range}(?1)$
For instance, all the regexes, below, are valid :
(?i-s)^(?:((?:(?!(\bHi\b)).)*?)(?2)){4}(?1)$
(?i-s)^(?:((?:(?!(Hi)).)*?)(?2)){3,5}(?1)$
(?i-s)^(?:((?:(?!(\bThis is a test\b)).)*?)(?2)){2,4}(?1)$
(?-is)^(?:((?:(?!(\bThis is a test\b)).)*?)(?2)){2,4}(?1)$
Just test them against the text below, pasted in a new tab :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
---Hi-----
---Hi----Hi----
---Hi----Hi-----Hi----
---Hi----Hi-----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----Hi-----
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
---This is a test-----
---This is a test----This is a test----
---This is a test----This is a test-----This is a test----
---This is a test----This is a test-----This is a test----This is a test----
---This is a TEST----This is a TEST-----This is a TEST----This is a TEST----
---This is a test----This is a test-----This is a test----This is a test----This is a test----
---This is a test----This is a test-----This is a test----This is a test----This is a test----This is a test-----
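For those who want to reproduce the four generic variants in a script, here is a hedged sketch in Python. Python's `re` module does not support the `(?1)` and `(?2)` sub-routine calls, so the two blocks are simply written out in full. The function name `build_count_regex` and its parameters are hypothetical, chosen only for this illustration.

```python
import re


def build_count_regex(text, low, high=None, ignore_case=True, whole=False):
    """Build a regex matching lines that contain `text` from `low` to `high` times.

    No sub-routine calls here: the "part WITHOUT the text" block and the
    text itself are repeated literally in the pattern.
    """
    target = re.escape(text)                 # treat the string as literal text
    if whole:
        target = r"\b" + target + r"\b"      # whole-expression variant
    gap = "(?:(?!" + target + ").)*"         # a part WITHOUT the text
    if high is None:
        count = "{" + str(low) + "}"                      # exact number of times
    else:
        count = "{" + str(low) + "," + str(high) + "}"    # a range
    pattern = "^(?:" + gap + "?" + target + ")" + count + gap + "$"
    return re.compile(pattern, re.IGNORECASE if ignore_case else 0)


# Six test lines holding 1 to 6 occurrences of "Hi".
lines = ["---" + "Hi----" * n for n in range(1, 7)]
regex = build_count_regex("Hi", 3, 5)
print([line.count("Hi") for line in lines if regex.search(line)])  # prints [3, 4, 5]
```

The `ignore_case` and `whole` switches correspond to the `(?i-s)` versus `(?-is)` prefix and to the presence of the `\b` boundaries in the regexes above.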
Below, a list of the different steps in the genesis of the generic regex which should help you to understand this tricky regex !
^ ( Part WITHOUT Hi + Hi) x 4 + Part WITHOUT Hi $
(?x-s) ^ ( ( (?! Hi ).)*? Hi ){4} ( (?!Hi).)* $  # The INITIAL regex
(?x-s) ^ (?: (?: (?! Hi ).)*? Hi ){4} (?: (?!Hi).)*? $  # We change all GROUPS as NON-CAPTURING, add a LAZY quantifier and notice 2 IDENTICAL blocks !
(?x-s) ^ (?: ( (?: (?! Hi ).)*? ) Hi ){4} (?: (?!Hi).)*? $  # We create a NEW group 1 AROUND the FIRST block
(?x-s) ^ (?: ( (?: (?! Hi ).)*? ) Hi ){4} (?1) $  # We replace the 2nd BLOCK by a SUB-ROUTINE CALL to GROUP 1 (?1)
(?x-s) ^ (?: ( (?: (?! (Hi) ).)*? ) Hi ){4} (?1) $  # We create a NEW group 2 AROUND the FIRST string "Hi"
(?x-s) ^ (?: ( (?: (?! (Hi) ).)*? ) (?2) ){4} (?1) $  # We replace THE 2nd string "Hi" by a SUB-ROUTINE CALL to GROUP 2 (?2)
(?-s)^(?:((?:(?!(Hi)).)*?)(?2)){4}(?1)$  # We suppress the FREE-SPACING mode and DELETE any SPACE character
(?i-s)^(?:((?:(?!(Hi)).)*?)(?2)){4}(?1)$  # We add the CASE modifier
(?i-s)^(?:((?:(?!(\bHi\b)).)*?)(?2)){4}(?1)$  # We add the \b BOUNDARIES to get a WHOLE expression
For the 3 LAST regexes, STOP the SELECTION at the $ sign
You may test any of the regexes above against the text below, pasted in a new tab. It matches only the line containing exactly four strings Hi !
---Hi-----
---Hi----Hi----
---Hi----Hi-----Hi----
---Hi----Hi-----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----Hi-----
Best regards,
guy038
-