select lines with a specific number of occurrences of a word in them

  • D
    dimitrov
    last edited by Nov 30, 2020, 4:04 PM

    Hi, I have a text file like this:

    test test \ dfg 54645 Hi fgd 54645 Hi 3 Hi
    Hi 45 Hi 4454645 Hi fgd 54645 Hi 3 Hi

    and I want to keep only the lines with 4 Hi in them (deleting the others, or something like that).

    • A
      Alan Kilborn @dimitrov
      last edited by Nov 30, 2020, 4:10 PM

      @dimitrov

      General technique:

      1. Use the Mark function (Ctrl+M) with 4 Hi in Find what, after ticking the Bookmark line checkbox.

      2. In the Search menu’s Bookmark submenu, choose Remove Unmarked Lines.
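For readers who would rather script this outside Notepad++, the mark-then-remove-unmarked technique amounts to keeping every line in which the search expression is found. The following is a minimal Python sketch of that idea only: the function name `keep_marked_lines` is hypothetical, and Python's `re` module stands in for Notepad++'s own regex engine, so behaviour may differ for more advanced expressions.

```python
import re

def keep_marked_lines(text: str, find_what: str) -> str:
    """Keep only the lines in which `find_what` (a regex) is found.

    Roughly mirrors "Mark with Bookmark line ticked" followed by
    "Remove Unmarked Lines". Hypothetical helper, not a Notepad++ API.
    """
    pattern = re.compile(find_what)
    kept = [line for line in text.splitlines() if pattern.search(line)]
    return "\n".join(kept)

sample = "a 4 Hi b\nno match here\n4 Hi again"
result = keep_marked_lines(sample, r"4 Hi")
print(result)
```

Note that, like the literal search above, this looks for the text `4 Hi` itself, not for four occurrences of `Hi`.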

      • D
        dimitrov @Alan Kilborn
        last edited by Nov 30, 2020, 4:46 PM

        @Alan-Kilborn Thanks for answering. I mean lines in which Hi is repeated 4 times.

        • A
          Alan Kilborn @dimitrov
          last edited by Nov 30, 2020, 6:25 PM

          @dimitrov

          Quality of “effort in” must be >= quality of “effort out”

          Anyone that abbreviates the problem statement like you did…deserves what you got as a reply. Cheers.

          • P
            PeterJones @dimitrov
            last edited by Nov 30, 2020, 6:38 PM

            @dimitrov ,

            Following this advice will go a long way toward getting you better / more-complete answers.

            Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

            • A
              astrosofista @dimitrov
              last edited by Nov 30, 2020, 7:40 PM

              Hi @dimitrov, @Alan-Kilborn, @PeterJones

              If a two-step procedure is good enough for you, the following may solve your issue:

              Please do as @Alan-Kilborn told you above: mark lines with this expression

              Search: (?-s)(Hi.*?){4}
              

              and then Remove Unmarked Lines. You end up with the lines that contain at least 4 Hi. To delete those that have 5 Hi or more, mark lines again, this time with this other expression:

              Search: (?-s)(Hi.*?){5}
              

              This time, however, Remove marked lines via the Search menu.

              That’s all. If this procedure doesn’t solve your question, then follow @PeterJones’s advice.
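The two-step procedure can be sketched in Python as two filtering passes over the lines. This is an illustration under stated assumptions, not what Notepad++ runs internally: `lines_with_exactly` is a hypothetical helper, the search is case-sensitive (in Notepad++ that depends on the Match case checkbox), and Python's `re` module replaces the editor's regex engine.

```python
import re

def lines_with_exactly(text: str, word: str, count: int) -> list:
    """Two passes: keep lines with at least `count` hits, then drop lines with more."""
    escaped = re.escape(word)
    at_least = re.compile("(?:" + escaped + ".*?){" + str(count) + "}")
    too_many = re.compile("(?:" + escaped + ".*?){" + str(count + 1) + "}")
    # Step 1: mark with the {count} expression, then "Remove Unmarked Lines".
    step1 = [ln for ln in text.splitlines() if at_least.search(ln)]
    # Step 2: mark with the {count + 1} expression, then "Remove Marked Lines".
    return [ln for ln in step1 if not too_many.search(ln)]

sample = "Hi Hi Hi\nHi Hi Hi Hi\nHi Hi Hi Hi Hi"
exactly_four = lines_with_exactly(sample, "Hi", 4)
print(exactly_four)
```

Because the text is split into lines before matching, the `(?-s)` modifier is not needed here: Python's `.` does not match a newline by default.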

              Have fun!

              • G
                guy038
                last edited by guy038 Nov 19, 2022, 7:48 PM Nov 30, 2020, 10:47 PM

                Hello @dimitrov, @alan-kilborn, @peterjones, @astrosofista and All,

                Here are four generic regexes to solve general cases :

                • To match lines containing your string, whatever its case, as a whole expression, a number of times within the range:

                (?i-s)^((?:(?!(\bYour string\b)).)*?)(?:(?1)(?2)){Your range}(?1)$

                • To match lines containing your string, whatever its case, a number of times within the range:

                (?i-s)^((?:(?!(Your string)).)*?)(?:(?1)(?2)){Your range}(?1)$

                • To match lines containing your string, with its exact case, as a whole expression, a number of times within the range:

                (?-is)^((?:(?!(\bYour string\b)).)*?)(?:(?1)(?2)){Your range}(?1)$

                • To match lines containing your string, with its exact case, a number of times within the range:

                (?-is)^((?:(?!(Your string)).)*?)(?:(?1)(?2)){Your range}(?1)$


                For instance, all the regexes, below, are valid :

                (?i-s)^((?:(?!(\bHi\b)).)*?)(?:(?1)(?2)){3}(?1)$

                (?i-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){3,5}(?1)$

                (?i-s)^((?:(?!(\bThis is a test\b)).)*?)(?:(?1)(?2)){2,4}(?1)$

                (?-is)^((?:(?!(\bThis is a test\b)).)*?)(?:(?1)(?2)){2,4}(?1)$

                Just test them against this text below , pasted in a new tab :

                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                ---Hi-----
                ---Hi----Hi----
                ---Hi----Hi-----Hi----
                ---Hi----Hi-----Hi----Hi----
                ---Hi----Hi-----Hi----Hi----Hi----
                ---Hi----Hi-----Hi----Hi----Hi----Hi-----
                
                
                ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                ---This is a test-----
                ---This is a test----This is a test----
                ---This is a test----This is a test-----This is a test----
                
                ---This is a test----This is a test-----This is a test----This is a test----
                ---This is a TEST----This is a TEST-----This is a TEST----This is a TEST----
                
                ---This is a test----This is a test-----This is a test----This is a test----This is a test----
                ---This is a test----This is a test-----This is a test----This is a test----This is a test----This is a test-----
                

                Nice, isn’t it? Have you noticed that the second regex example also finds lines containing the sentence This is a test? That is logical, as the word This does contain the string hi ;-))
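To try the same idea outside Notepad++, here is a hedged Python sketch. Python's standard `re` module does not support the `(?1)` / `(?2)` subroutine calls used above, so the "part without the string" block is written out in full each time; the helper name `count_regex` and its parameters are hypothetical, chosen only for this illustration.

```python
import re

def count_regex(needle: str, lo: int, hi: int,
                ignore_case: bool = True, whole: bool = False) -> re.Pattern:
    """Build a regex matching whole lines that contain `needle` lo..hi times."""
    target = re.escape(needle)
    if whole:
        target = r"\b" + target + r"\b"       # match as a whole expression
    gap = "(?:(?!" + target + ").)*?"          # a "part WITHOUT the string"
    body = ("^" + gap + "(?:" + gap + target + "){"
            + str(lo) + "," + str(hi) + "}" + gap + "$")
    flags = re.MULTILINE | (re.IGNORECASE if ignore_case else 0)
    return re.compile(body, flags)

text = "\n".join([
    "---Hi-----",
    "---Hi----Hi----",
    "---Hi----Hi-----Hi----",
    "---This is a test----This is a test-----This is a test----",
    "---This is a test----",
])

# Without \b, the "hi" inside "This" is counted too.
loose = [m.group(0) for m in count_regex("Hi", 3, 5).finditer(text)]
# With \b, only the stand-alone "Hi" is counted.
strict = [m.group(0) for m in count_regex("Hi", 3, 5, whole=True).finditer(text)]
print(loose)
print(strict)
```

The `loose` list contains both the three-`Hi` line and the line with three copies of `This is a test`, while `strict` contains only the former, which is the effect described above.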

                Best regards

                guy038

                • G
                  guy038
                  last edited by guy038 Nov 19, 2022, 8:11 PM Dec 1, 2020, 1:20 AM

                  Hi, All,

                  Here are the different steps in the genesis of the generic regexes of my previous post, principally in the free-spacing mode (?x) !

                  Hope it helps you to understand these tricky regexes ;-))

                               Part WITHOUT Hi  (Part WITHOUT Hi + Hi) x 4  Part WITHOUT Hi
                               ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯       ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                     |                  |          |             |
                                     V                  V          V             V
                  (?x-s) ^   (   (?!  Hi  ).)*    ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*    $   #  The INITIAL regex
                  
                  
                  (?x-s) ^   (   (?!  Hi  ).)*?   ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*?   $   #  We add a LAZY quantifier at TWO locations and notice 3 IDENTICAL blocks !
                             ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯     ¯¯¯¯¯¯¯¯¯¯¯             ¯¯¯¯¯¯¯¯¯¯¯
                          Gr1
                           V
                  (?x-s) ^ ( (   (?!  Hi  ).)*? ) ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*?   $   #  We create a NEW group 1 AROUND the FIRST block
                                                    ¯¯¯¯¯¯¯¯¯¯¯             ¯¯¯¯¯¯¯¯¯¯¯
                          Gr1                          |                          |
                           V                           V                          V
                  (?x-s) ^ ( (   (?!  Hi  ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We replace the 2nd and 3rd BLOCKS by a SUB-ROUTINE CALL to GROUP 1 (?1)
                  
                          Gr1
                           V
                  (?x-s) ^ ( (?: (?!  Hi  ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We change the PRESENT 2nd GROUP as a NON-CAPTURING group and notice 2 strings "Hi"
                                      ¯¯                          ¯¯
                           Gr1       Gr2
                           V         V
                  (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We create a NEW group 2 around the FIRST string "Hi"
                                      ¯¯                          ¯¯
                           Gr1      Gr2                           |
                           V         V                            v
                  (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (   (?1)       (?2) ){4}       (?1)     $   #  We replace THE 2nd string "Hi" by a SUB-ROUTINE CALL to GROUP 2 (?2)
                  
                           Gr1      Gr2
                           V         V
                  (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (?: (?1)       (?2) ){4}       (?1)     $   #  We change the 3rd PRESENT group as a NON-CAPTURING group
                  
                  
                  (?-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){4}(?1)$                                 #  We suppress the FREE-SPACING mode and DELETE any SPACE character
                  
                  
                  (?i-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){4}(?1)$                                #  We add the CASE modifier
                  
                  
                  (?i-s)^((?:(?!(\bHi\b)).)*?)(?:(?1)(?2)){4}(?1)$                            #  We add the \b BOUNDARIES to get a WHOLE expression
                                                                 ^
                                                                 | 
                                                       For the 3 LAST regexes, STOP
                                                       the SELECTION at the $ sign
                  

                  You may test any of these regexes against the text below, pasted in a new tab. Each should match only the line containing exactly four Hi strings!

                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                  ---Hi-----
                  ---Hi----Hi----
                  ---Hi----Hi-----Hi----
                  ---Hi----Hi-----Hi----Hi----
                  ---Hi----Hi-----Hi----Hi----Hi----
                  ---Hi----Hi-----Hi----Hi----Hi----Hi-----
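As a cross-check outside the editor, the INITIAL regex from the first step can be run in Python's free-spacing mode. This is only a sketch: `re.VERBOSE` plays the role of `(?x)`, `re.MULTILINE` makes `^` and `$` work per line, the groups are written as non-capturing, and the test lines are generated rather than copied, so the dash counts differ slightly from the sample text.

```python
import re

# Free-spacing form of the INITIAL regex; whitespace and # comments are ignored.
INITIAL = re.compile(r"""
    ^
    (?: (?!Hi) . )*                  # part WITHOUT Hi
    (?: (?: (?!Hi) . )*?  Hi ){4}    # (part WITHOUT Hi + Hi) x 4
    (?: (?!Hi) . )*                  # part WITHOUT Hi
    $
""", re.VERBOSE | re.MULTILINE)

# Six lines holding 1 to 6 copies of "Hi".
test_text = "\n".join("---" + "Hi----" * n for n in range(1, 7))

matches = [m.group(0) for m in INITIAL.finditer(test_text)]
# Independent check with plain substring counting.
by_count = [ln for ln in test_text.splitlines() if ln.count("Hi") == 4]
print(matches)
```

Only the line with four `Hi` strings is returned, and it agrees with the plain `str.count` check.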
                  

                  Cheers,

                  guy038

                  • G
                    guy038
                    last edited by guy038 Nov 19, 2022, 7:54 PM Dec 2, 2020, 7:35 PM

                    Hello @dimitrov, @alan-kilborn, @peterjones, @astrosofista and All,

                    Two days later, I realize that my generic regex could be shortened a bit ! Reading, again, what I said in my previous post :

                                 Part WITHOUT Hi  (Part WITHOUT Hi + Hi) x 4  Part WITHOUT Hi
                                 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯       ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                       |                  |          |             |
                                       V                  V          V             V
                    (?x-s) ^   (   (?!  Hi  ).)*    ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*    $   #  The INITIAL regex
                    

                    You may see, as I did, that the regex begins with “Part WITHOUT Hi”, followed by ( “Part WITHOUT Hi” and “Hi” ) repeated four times. So the leading “Part WITHOUT Hi” is redundant in this expression!

                    The initial regex should be, simply :

                            ^  (Part WITHOUT Hi + Hi) x 4 + Part WITHOUT Hi  $
                                ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                        |         |              |
                                        V         V              V
                    (?x-s)  ^     ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*       $   #  The INITIAL regex
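A quick way to convince oneself that the leading block really is redundant is to compare both forms on lines holding different numbers of `Hi`. The Python sketch below does that under a few assumptions: the names `GAP`, `redundant` and `simplified` are hypothetical, groups are non-capturing, and Python's `re` module is used instead of Notepad++'s engine.

```python
import re

GAP = r"(?:(?!Hi).)*?"   # a "part WITHOUT Hi"

# Earlier form: leading gap + (gap + Hi) x 4 + trailing gap.
redundant = re.compile("^" + GAP + "(?:" + GAP + "Hi){4}" + GAP + "$")
# Shortened form: (gap + Hi) x 4 + trailing gap.
simplified = re.compile("^(?:" + GAP + "Hi){4}" + GAP + "$")

# Lines holding 0 to 7 copies of "Hi".
lines = ["---" + "Hi----" * n for n in range(0, 8)]

agree = all(bool(redundant.match(ln)) == bool(simplified.match(ln)) for ln in lines)
exact = [ln for ln in lines if simplified.match(ln)]
print(agree, exact)
```

Both forms accept exactly the same lines here, namely the single line with four `Hi` strings.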
                    

                    Thus, the four generic regexes to solve general cases are simplified :

                    • To match lines containing your string, whatever its case, as a whole expression, a number of times within the range:

                    (?i-s)^(?:((?:(?!(\bYour string\b)).)*?)(?2)){Your range}(?1)$

                    • To match lines containing your string, whatever its case, a number of times within the range:

                    (?i-s)^(?:((?:(?!(Your string)).)*?)(?2)){Your range}(?1)$

                    • To match lines containing your string, with its exact case, as a whole expression, a number of times within the range:

                    (?-is)^(?:((?:(?!(\bYour string\b)).)*?)(?2)){Your range}(?1)$

                    • To match lines containing your string, with its exact case, a number of times within the range:

                    (?-is)^(?:((?:(?!(Your string)).)*?)(?2)){Your range}(?1)$


                    For instance, all the regexes, below, are valid :

                    (?i-s)^(?:((?:(?!(\bHi\b)).)*?)(?2)){4}(?1)$

                    (?i-s)^(?:((?:(?!(Hi)).)*?)(?2)){3,5}(?1)$

                    (?i-s)^(?:((?:(?!(\bThis is a test\b)).)*?)(?2)){2,4}(?1)$

                    (?-is)^(?:((?:(?!(\bThis is a test\b)).)*?)(?2)){2,4}(?1)$

                    Just test them against the text below, pasted in a new tab:

                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    ---Hi-----
                    ---Hi----Hi----
                    ---Hi----Hi-----Hi----
                    ---Hi----Hi-----Hi----Hi----
                    ---Hi----Hi-----Hi----Hi----Hi----
                    ---Hi----Hi-----Hi----Hi----Hi----Hi-----
                    
                    
                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    ---This is a test-----
                    ---This is a test----This is a test----
                    ---This is a test----This is a test-----This is a test----
                    
                    ---This is a test----This is a test-----This is a test----This is a test----
                    ---This is a TEST----This is a TEST-----This is a TEST----This is a TEST----
                    
                    ---This is a test----This is a test-----This is a test----This is a test----This is a test----
                    ---This is a test----This is a test-----This is a test----This is a test----This is a test----This is a test-----
                    

                    Below, a list of the different steps in the genesis of the generic regex which should help you to understand this tricky regex !

                           ^  (     Part WITHOUT Hi  +  Hi) x 4 + Part WITHOUT Hi  $
                                    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯     ¯¯        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                           |            |                |
                                           V            V                V
                    (?x-s) ^  (     (   (?! Hi ).)*?    Hi  ){4}   (   (?!Hi).)*   $   #  The INITIAL regex
                    
                    
                    (?x-s) ^  (?:   (?: (?! Hi ).)*?    Hi  ){4}   (?: (?!Hi).)*?  $   #  We change all GROUPS as NON-CAPTURING, add a LAZY quantifier and notice 2 IDENTICAL blocks !
                                    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯               ¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                 Gr1
                                  V
                    (?x-s) ^  (?: ( (?: (?!  Hi  ).)*? )  Hi  ){4} (?: (?!Hi).)*?  $   #  We create a NEW group 1 AROUND the FIRST block
                                             ––           ––       
                                 Gr1
                                  V
                    (?x-s) ^  (?: ( (?: (?!  Hi  ).)*? )  Hi  ){4}      (?1)       $   #  We replace the 2nd BLOCK by a SUB-ROUTINE CALL to GROUP 1 (?1)
                                             ¯¯           ¯¯
                                 Gr1        Gr2
                                  V         V
                    (?x-s) ^  (?: ( (?: (?! (Hi) ).)*? )  Hi  ){4}      (?1)       $   #  We create a NEW group 2 AROUND the FIRST string "Hi"
                    
                                 Gr1        Gr2
                                  V         V
                    (?x-s) ^  (?: ( (?: (?! (Hi) ).)*? ) (?2) ){4}      (?1)       $   #  We replace THE 2nd string "Hi" by a SUB-ROUTINE CALL to GROUP 2 (?2)
                    
                    
                    (?-s)^(?:((?:(?!(Hi)).)*?)(?2)){4}(?1)$                            #  We suppress the FREE-SPACING mode and DELETE any SPACE character
                    
                    
                    (?i-s)^(?:((?:(?!(Hi)).)*?)(?2)){4}(?1)$                           #  We add the CASE modifier
                    
                    
                    (?i-s)^(?:((?:(?!(\bHi\b)).)*?)(?2)){4}(?1)$                       #  We add the \b BOUNDARIES to get a WHOLE expression
                                                               ^
                                                               | 
                                                    For the 3 LAST regexes, STOP
                                                    the SELECTION at the $ sign
                    

                    You may test any of these regexes above, against the text below, pasted in a new tab. It matches only the line containing exactly four strings Hi !


                    ---Hi-----
                    ---Hi----Hi----
                    ---Hi----Hi-----Hi----
                    ---Hi----Hi-----Hi----Hi----
                    ---Hi----Hi-----Hi----Hi----Hi----
                    ---Hi----Hi-----Hi----Hi----Hi----Hi-----
                    

                    Best regards,

                    guy038

                    • A Alan Kilborn referenced this topic on Jan 13, 2023, 6:18 PM
                    The Community of users of the Notepad++ text editor.