Select lines containing a word a specific number of times



  • Hi, I have a text file like this:

    test test \ dfg 54645 Hi fgd 54645 Hi 3 Hi
    Hi 45 Hi 4454645 Hi fgd 54645 Hi 3 Hi

    and I want to keep only the lines with 4 Hi in them (deleting the others, or something like that).



  • @dimitrov

    General technique:

    1. Use the Mark function (Ctrl+M) with 4 Hi in the Find what box, after ticking the Bookmark line checkbox.

    2. Use the Search menu’s Bookmark submenu and choose Remove Unmarked Lines.



  • @Alan-Kilborn Thanks for answering. I mean lines in which Hi is repeated 4 times.



  • @dimitrov

    Quality of “effort in” must be >= quality of “effort out”

    Anyone that abbreviates the problem statement like you did…deserves what you got as a reply. Cheers.



  • @dimitrov ,

    Following this advice will go a long way toward getting you better / more-complete answers.

    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.



  • Hi @dimitrov, @Alan-Kilborn, @PeterJones

    If a two-step procedure is good enough for you, the following may solve your issue:

    Please do as @Alan-Kilborn told you above: mark lines with this expression

    Search: (?-s)(Hi.*?){4}
    

    and then Remove unmarked lines. You end up with the lines containing at least 4 Hi. To delete those that have 5 Hi or more, mark lines again, but with this other expression:

    Search: (?-s)(Hi.*?){5}
    

    This time, however, Remove marked lines via the Search menu.

    That’s all. If this procedure doesn’t solve your problem, then follow @PeterJones’s advice.

    Have fun!
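
As an aside for readers who want to check the two-step logic outside the editor, here is a minimal Python sketch. It assumes a case-sensitive search (in Notepad++ that depends on the Match case option) and uses Python's `re` module, whose `.` excludes line breaks by default, which mirrors `(?-s)`. The function name `keep_exactly_four` and the sample list are made up for illustration.

```python
import re

# Python translations of the two search expressions from the post.
AT_LEAST_4 = re.compile(r"(Hi.*?){4}")
AT_LEAST_5 = re.compile(r"(Hi.*?){5}")


def keep_exactly_four(lines):
    """Keep lines with at least 4 "Hi", then drop those with 5 or more."""
    # Step 1: equivalent of marking with {4} and removing unmarked lines.
    marked = [ln for ln in lines if AT_LEAST_4.search(ln)]
    # Step 2: equivalent of marking with {5} and removing marked lines.
    return [ln for ln in marked if not AT_LEAST_5.search(ln)]


sample = [
    "test test \\ dfg 54645 Hi fgd 54645 Hi 3 Hi",  # 3 occurrences
    "Hi 45 Hi 4454645 Hi fgd 54645 Hi 3 Hi",        # 5 occurrences
    "---Hi----Hi-----Hi----Hi----",                 # 4 occurrences
]
print(keep_exactly_four(sample))
```

Note that the two lines from the original question contain 3 and 5 occurrences respectively, so neither survives both steps; only the hypothetical third line does.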



  • Hello @dimitrov, @alan-kilborn, @peterjones, @astrosofista and All,

    Here are four generic regexes that cover the general cases:

    • To match lines containing your string, regardless of case, as a whole expression, a number of times within the given range:

    (?i-s)^((?:(?!(\bYour string\b)).)*?)(?:(?1)(?2)){Your range}(?1)$

    • To match lines containing your string, regardless of case, a number of times within the given range:

    (?i-s)^((?:(?!(Your string)).)*?)(?:(?1)(?2)){Your range}(?1)$

    • To match lines containing your string, with its exact case, as a whole expression, a number of times within the given range:

    (?-is)^((?:(?!(\bYour string\b)).)*?)(?:(?1)(?2)){Your range}(?1)$

    • To match lines containing your string, with its exact case, a number of times within the given range:

    (?-is)^((?:(?!(Your string)).)*?)(?:(?1)(?2)){Your range}(?1)$


    For instance, all the regexes below are valid:

    (?i-s)^((?:(?!(\bHi\b)).)*?)(?:(?1)(?2)){3}(?1)$

    (?i-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){3,5}(?1)$

    (?i-s)^((?:(?!(\bThis is a test\b)).)*?)(?:(?1)(?2)){2,4}(?1)$

    (?-is)^((?:(?!(\bThis is a test\b)).)*?)(?:(?1)(?2)){2,4}(?1)$

    Just test them against the text below:

    -------
    ---Hi-----
    ---Hi----Hi----
    ---Hi----Hi-----Hi----
    ---Hi----Hi-----Hi----Hi----
    ---Hi----Hi-----Hi----Hi----Hi----
    ---Hi----Hi-----Hi----Hi----Hi----Hi-----
    
    
    -------
    ---This is a test-----
    ---This is a test----This is a test----
    ---This is a test----This is a test-----This is a test----
    
    ---This is a test----This is a test-----This is a test----This is a test----
    ---This is a TEST----This is a TEST-----This is a TEST----This is a TEST----
    
    ---This is a test----This is a test-----This is a test----This is a test----This is a test----
    ---This is a test----This is a test-----This is a test----This is a test----This is a test----This is a test-----
    

    Nice, isn’t it? Have you noticed that the second regex example also finds the lines containing the sentence This is a test three to five times? That is logical, as the word This contains the string hi ;-))

    Best regards

    guy038
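
The remark about This containing hi can be checked with a small Python sketch. It only counts occurrences with the standard `re` module; the helper name `count_hits` and its parameters are illustrative and do not come from the thread.

```python
import re


def count_hits(line, needle, whole_word=False, ignore_case=True):
    """Count occurrences of needle in line, optionally as a whole word."""
    pattern = re.escape(needle)
    if whole_word:
        # Same role as the \b...\b boundaries in the generic regexes.
        pattern = r"\b" + pattern + r"\b"
    flags = re.IGNORECASE if ignore_case else 0
    return len(re.findall(pattern, line, flags))


line = "---This is a test----This is a test-----This is a test----"
print(count_hits(line, "Hi"))                   # substring search, any case
print(count_hits(line, "Hi", whole_word=True))  # whole-word search, any case
```

The substring search finds one hi inside each This, so a range of 3 to 5 is satisfied by this line, while the whole-word variant finds none.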



  • Hi, All,

    Here are the successive steps in the genesis of the generic regexes from my previous post, written mainly in free-spacing mode (?x)!

    Hope it helps you to understand these tricky regexes ;-))

                 Part WITHOUT Hi  (Part WITHOUT Hi + Hi) x 4  Part WITHOUT Hi
                 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯       ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                       |                  |          |             |
                       V                  V          V             V
    (?x-s) ^   (   (?!  Hi  ).)*    ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*    $   #  The INITIAL regex
    
    
    (?x-s) ^   (   (?!  Hi  ).)*?   ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*?   $   #  We add a LAZY quantifier at TWO locations and notice 3 IDENTICAL blocks !
               ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯     ¯¯¯¯¯¯¯¯¯¯¯             ¯¯¯¯¯¯¯¯¯¯¯
            Gr1
             V
    (?x-s) ^ ( (   (?!  Hi  ).)*? ) ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*?   $   #  We create a NEW group 1 AROUND the FIRST block
                                      ¯¯¯¯¯¯¯¯¯¯¯             ¯¯¯¯¯¯¯¯¯¯¯
            Gr1                          |                          |
             V                           V                          V
    (?x-s) ^ ( (   (?!  Hi  ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We replace the 2nd and 3rd BLOCKS by a SUB-ROUTINE CALL to GROUP 1 (?1)
    
            Gr1
             V
    (?x-s) ^ ( (?: (?!  Hi  ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We change the PRESENT 2nd GROUP as a NON-CAPTURING group and notice 2 strings "Hi"
                        ¯¯                          ¯¯
             Gr1       Gr2
             V         V
    (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We create a NEW group 2 around the FIRST string "Hi"
                        ¯¯                          ¯¯
             Gr1      Gr2                           |
             V         V                            v
    (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (   (?1)       (?2) ){4}       (?1)     $   #  We replace THE 2nd string "Hi" by a SUB-ROUTINE CALL to GROUP 2 (?2)
    
             Gr1      Gr2
             V         V
    (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (?: (?1)       (?2) ){4}       (?1)     $   #  We change the 3rd PRESENT group as a NON-CAPTURING group
    
    
    (?-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){4}(?1)$                                 #  We suppress the FREE-SPACING mode and DELETE any SPACE character
    
    
    (?i-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){4}(?1)$                                #  We add the CASE modifier
    
    
    (?i-s)^((?:(?!(\bHi\b)).)*?)(?:(?1)(?2)){4}(?1)$                            #  We add the \b BOUNDARIES to get a WHOLE expression
                                                   ^
                                                   | 
                                         For the 3 LAST regexes, STOP
                                         the SELECTION at the $ sign
    

    You may test any of these regexes against the text below. Each should match only the line containing exactly four Hi strings!

    -------
    ---Hi-----
    ---Hi----Hi----
    ---Hi----Hi-----Hi----
    ---Hi----Hi-----Hi----Hi----
    ---Hi----Hi-----Hi----Hi----Hi----
    ---Hi----Hi-----Hi----Hi----Hi----Hi-----
    

    Cheers,

    guy038
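
For readers without Notepad++ at hand, the starting point of this derivation can be tried in Python. This is a sketch under assumptions: Python's standard `re` module has no sub-routine calls such as `(?1)`, so the blocks are written out in full, `re.VERBOSE` stands in for `(?x)`, and all groups are non-capturing. The name `EXACTLY_FOUR` is made up for the example.

```python
import re

# Expanded form (no sub-routine calls) of the regex from the post,
# written in free-spacing mode.
EXACTLY_FOUR = re.compile(
    r"""
    ^
    (?: (?!Hi) . )*?              # part WITHOUT Hi
    (?: (?: (?!Hi) . )*? Hi ){4}  # (part WITHOUT Hi + Hi) x 4
    (?: (?!Hi) . )*?              # part WITHOUT Hi
    $
    """,
    re.VERBOSE,
)

sample = [
    "-------",
    "---Hi-----",
    "---Hi----Hi----",
    "---Hi----Hi-----Hi----",
    "---Hi----Hi-----Hi----Hi----",
    "---Hi----Hi-----Hi----Hi----Hi----",
    "---Hi----Hi-----Hi----Hi----Hi----Hi-----",
]

matches = [ln for ln in sample if EXACTLY_FOUR.search(ln)]
print(matches)
```

Because every character outside the four Hi strings must pass the `(?!Hi)` check, a fifth Hi anywhere on the line makes the match fail, which is what restricts the result to exactly four occurrences.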



  • Hello @dimitrov, @alan-kilborn, @peterjones, @astrosofista and All,

    Two days later, I realized that my generic regex could be shortened a bit! Reading again what I said in my previous post:

                 Part WITHOUT Hi  (Part WITHOUT Hi + Hi) x 4  Part WITHOUT Hi
                 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯       ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                       |                  |          |             |
                       V                  V          V             V
    (?x-s) ^   (   (?!  Hi  ).)*    ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*    $   #  The INITIAL regex
    

    You may see, as I did, that the regex begins with “Part WITHOUT Hi”, followed by ( “Part WITHOUT Hi” and “Hi” ) repeated four times. So the leading “Part WITHOUT Hi” is redundant!

    The initial regex should simply be:

            ^  (Part WITHOUT Hi + Hi) x 4 + Part WITHOUT Hi  $
                ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                        |         |              |
                        V         V              V
    (?x-s)  ^     ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*       $   #  The INITIAL regex
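
A quick way to convince yourself that the leading part is redundant is to compare both forms on a few lines. The Python sketch below is only a spot check on generated sample lines, not a proof, and it writes the blocks out in full because the standard `re` module lacks sub-routine calls. The names `NOT_HI`, `REDUNDANT`, and `SIMPLIFIED` are illustrative.

```python
import re

# "Part WITHOUT Hi": any run of characters in which no "Hi" starts.
NOT_HI = r"(?:(?!Hi).)*?"

# First form: Part WITHOUT Hi, (Part WITHOUT Hi + Hi) x 4, Part WITHOUT Hi
REDUNDANT = re.compile(rf"^{NOT_HI}(?:{NOT_HI}Hi){{4}}{NOT_HI}$")
# Shortened form: (Part WITHOUT Hi + Hi) x 4, Part WITHOUT Hi
SIMPLIFIED = re.compile(rf"^(?:{NOT_HI}Hi){{4}}{NOT_HI}$")

# Lines with 0 to 6 occurrences of "Hi", similar to the test text.
lines = ["---" + "Hi----" * n for n in range(7)]

for ln in lines:
    old = bool(REDUNDANT.search(ln))
    new = bool(SIMPLIFIED.search(ln))
    print(f"{ln!r}: redundant={old} simplified={new}")
```

On these lines both patterns agree, and each matches only the line with four occurrences.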
    

    Thus, the four generic regexes that cover the general cases can be simplified:

    • To match lines containing your string, regardless of case, as a whole expression, a number of times within the given range:

    (?i-s)^(?:((?:(?!(\bYour string\b)).)*?)(?2)){Your range}(?1)$

    • To match lines containing your string, regardless of case, a number of times within the given range:

    (?i-s)^(?:((?:(?!(Your string)).)*?)(?2)){Your range}(?1)$

    • To match lines containing your string, with its exact case, as a whole expression, a number of times within the given range:

    (?-is)^(?:((?:(?!(\bYour string\b)).)*?)(?2)){Your range}(?1)$

    • To match lines containing your string, with its exact case, a number of times within the given range:

    (?-is)^(?:((?:(?!(Your string)).)*?)(?2)){Your range}(?1)$


    For instance, all the regexes below are valid:

    (?i-s)^(?:((?:(?!(\bHi\b)).)*?)(?2)){4}(?1)$

    (?i-s)^(?:((?:(?!(Hi)).)*?)(?2)){3,5}(?1)$

    (?i-s)^(?:((?:(?!(\bThis is a test\b)).)*?)(?2)){2,4}(?1)$

    (?-is)^(?:((?:(?!(\bThis is a test\b)).)*?)(?2)){2,4}(?1)$

    Just test them against the text below:

    -------
    ---Hi-----
    ---Hi----Hi----
    ---Hi----Hi-----Hi----
    ---Hi----Hi-----Hi----Hi----
    ---Hi----Hi-----Hi----Hi----Hi----
    ---Hi----Hi-----Hi----Hi----Hi----Hi-----
    
    
    -------
    ---This is a test-----
    ---This is a test----This is a test----
    ---This is a test----This is a test-----This is a test----
    
    ---This is a test----This is a test-----This is a test----This is a test----
    ---This is a TEST----This is a TEST-----This is a TEST----This is a TEST----
    
    ---This is a test----This is a test-----This is a test----This is a test----This is a test----
    ---This is a test----This is a test-----This is a test----This is a test----This is a test----This is a test-----
    

    Below is a list of the successive steps in the genesis of the generic regex, which should help you understand this tricky regex!

           ^  (     Part WITHOUT Hi  +  Hi) x 4 + Part WITHOUT Hi  $
                    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯     ¯¯        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                           |            |                |
                           V            V                V
    (?x-s) ^  (     (   (?! Hi ).)*?    Hi  ){4}   (   (?!Hi).)*   $   #  The INITIAL regex
    
    
    (?x-s) ^  (?:   (?: (?! Hi ).)*?    Hi  ){4}   (?: (?!Hi).)*?  $   #  We change all GROUPS as NON-CAPTURING, add a LAZY quantifier and notice 2 IDENTICAL blocks !
                    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯               ¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                 Gr1
                  V
    (?x-s) ^  (?: ( (?: (?!  Hi  ).)*? )  Hi  ){4} (?: (?!Hi).)*?  $   #  We create a NEW group 1 AROUND the FIRST block
                             ¯¯           ¯¯
                 Gr1
                  V
    (?x-s) ^  (?: ( (?: (?!  Hi  ).)*? )  Hi  ){4}      (?1)       $   #  We replace the 2nd BLOCK by a SUB-ROUTINE CALL to GROUP 1 (?1)
                             ¯¯           ¯¯
                 Gr1        Gr2
                  V         V
    (?x-s) ^  (?: ( (?: (?! (Hi) ).)*? )  Hi  ){4}      (?1)       $   #  We create a NEW group 2 AROUND the FIRST string "Hi"
    
                 Gr1        Gr2
                  V         V
    (?x-s) ^  (?: ( (?: (?! (Hi) ).)*? ) (?2) ){4}      (?1)       $   #  We replace THE 2nd string "Hi" by a SUB-ROUTINE CALL to GROUP 2 (?2)
    
    
    (?-s)^(?:((?:(?!(Hi)).)*?)(?2)){4}(?1)$                            #  We suppress the FREE-SPACING mode and DELETE any SPACE character
    
    
    (?i-s)^(?:((?:(?!(Hi)).)*?)(?2)){4}(?1)$                           #  We add the CASE modifier
    
    
    (?i-s)^(?:((?:(?!(\bHi\b)).)*?)(?2)){4}(?1)$                       #  We add the \b BOUNDARIES to get a WHOLE expression
                                               ^
                                               | 
                                    For the 3 LAST regexes, STOP
                                    the SELECTION at the $ sign
    

    You may test any of these regexes against the text below. Each matches only the line containing exactly four Hi strings!


    ---Hi-----
    ---Hi----Hi----
    ---Hi----Hi-----Hi----
    ---Hi----Hi-----Hi----Hi----
    ---Hi----Hi-----Hi----Hi----Hi----
    ---Hi----Hi-----Hi----Hi----Hi----Hi-----
    

    Best regards,

    guy038
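
The four generic variants differ only in case sensitivity and word boundaries, so they can be produced by one small builder. The Python sketch below is an assumption-laden translation: the function name `build_line_regex` and its parameters are invented for illustration, and since the standard `re` module has no `(?1)` or `(?2)` sub-routine calls, the shortened regex is written in its expanded form.

```python
import re


def build_line_regex(text, lo, hi=None, ignore_case=True, whole_word=False):
    """Build a regex matching lines that contain text between lo and hi times."""
    needle = re.escape(text)
    if whole_word:
        needle = r"\b" + needle + r"\b"   # "as a whole expression"
    if hi is None:
        hi = lo                            # single count, like {4}
    gap = rf"(?:(?!{needle}).)*?"          # part WITHOUT the string
    flags = re.IGNORECASE if ignore_case else 0
    # (part WITHOUT string + string) x range, then a final part WITHOUT string
    return re.compile(rf"^(?:{gap}{needle}){{{lo},{hi}}}{gap}$", flags)


lines = ["---" + "This is a test----" * n for n in range(7)]
upper = "---" + "This is a TEST----" * 4

exact_case = build_line_regex("This is a test", 2, 4,
                              ignore_case=False, whole_word=True)
any_case = build_line_regex("This is a test", 2, 4, whole_word=True)

print([n for n, ln in enumerate(lines) if exact_case.search(ln)])
print(bool(exact_case.search(upper)), bool(any_case.search(upper)))
```

With the exact-case variant the TEST line is rejected, because it contains no occurrence of the lowercase sentence at all, while the case-insensitive variant counts four occurrences and accepts it.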

