    select lines with specific number of word in it

    9 Posts 5 Posters 807 Views
    • dimitrov

      Hi, I have a text file like this:

      test test \ dfg 54645 Hi fgd 54645 Hi 3 Hi
      Hi 45 Hi 4454645 Hi fgd 54645 Hi 3 Hi

      and I want to keep only the lines with 4 Hi in them (delete the others, or something like that).

      • Alan Kilborn @dimitrov

        @dimitrov

        General technique:

        1. Use the Mark function (Ctrl+M) with a Find what of 4 Hi, after ticking the Bookmark line checkbox.

        2. Use the Search menu’s Bookmark submenu and choose Remove Unmarked Lines.
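Outside Notepad++, the same mark-then-remove idea can be sketched in a few lines of Python. This is only an illustration: the function name `keep_marked_lines` and the sample text are made up here, and Python's `re` module stands in for the Notepad++ search engine.

```python
import re

def keep_marked_lines(text: str, find_what: str) -> str:
    """Keep only the lines in which `find_what` (a regex) is found."""
    pattern = re.compile(find_what)
    # "Mark" every line that contains a match, then drop the unmarked ones.
    kept = [line for line in text.splitlines() if pattern.search(line)]
    return "\n".join(kept)

sample = "first line with 4 Hi in it\nsecond line without\nthird line, 4 Hi again"
print(keep_marked_lines(sample, r"4 Hi"))
```

Note that this searches for the literal text 4 Hi, exactly as the two steps above do.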

        • dimitrov @Alan Kilborn

          @Alan-Kilborn Thanks for answering. I mean lines that have Hi repeated 4 times in them.

          • Alan Kilborn @dimitrov

            @dimitrov

            Quality of “effort in” must be >= quality of “effort out”

            Anyone that abbreviates the problem statement like you did…deserves what you got as a reply. Cheers.

            • PeterJones @dimitrov

              @dimitrov ,

              Following this advice will go a long way toward getting you better / more-complete answers.

              Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

              • astrosofista @dimitrov

                Hi @dimitrov, @Alan-Kilborn, @PeterJones

                If a two-step procedure is good enough for you, the following may solve your issue:

                Please do as @Alan-Kilborn told you above: mark lines with this expression

                Search: (?-s)(Hi.*?){4}
                

                and then Remove unmarked lines. You end up with lines with at least 4 Hi. To delete those that have 5 Hi or more, mark lines again, but with this other expression:

                Search: (?-s)(Hi.*?){5}
                

                This time, however, Remove marked lines via the Search menu.
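The two passes can be checked outside Notepad++ with a small Python sketch. It assumes Python's `re` module behaves like the Notepad++ engine for these two simple patterns; the `(?-s)` prefix is dropped because `.` already stops at line ends by default in Python, and the name `exactly_four` is only illustrative.

```python
import re

AT_LEAST_4 = re.compile(r"(Hi.*?){4}")  # first pass: lines with 4 Hi or more
AT_LEAST_5 = re.compile(r"(Hi.*?){5}")  # second pass: lines with 5 Hi or more

def exactly_four(text: str) -> list[str]:
    # Step 1: keep the marked lines (at least 4 Hi).
    step1 = [line for line in text.splitlines() if AT_LEAST_4.search(line)]
    # Step 2: remove the marked lines (5 Hi or more).
    return [line for line in step1 if not AT_LEAST_5.search(line)]

text = "\n".join([
    "---Hi-----",
    "---Hi----Hi----",
    "---Hi----Hi-----Hi----",
    "---Hi----Hi-----Hi----Hi----",
    "---Hi----Hi-----Hi----Hi----Hi----",
    "---Hi----Hi-----Hi----Hi----Hi----Hi-----",
])
print(exactly_four(text))
```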

                That’s all. If this procedure doesn’t solve your question, then follow @PeterJones’s advice.

                Have fun!

                • guy038

                  Hello @dimitrov, @alan-kilborn, @peterjones, @astrosofista and All,

                  Here are four generic regexes to solve general cases :

                  • To match lines containing your string, regardless of case, as a whole expression, a number of times within the given range :

                  (?i-s)^((?:(?!(\bYour string\b)).)*?)(?:(?1)(?2)){Your range}(?1)$

                  • To match lines containing your string, regardless of case, a number of times within the given range :

                  (?i-s)^((?:(?!(Your string)).)*?)(?:(?1)(?2)){Your range}(?1)$

                  • To match lines containing your string, with its exact case, as a whole expression, a number of times within the given range :

                  (?-is)^((?:(?!(\bYour string\b)).)*?)(?:(?1)(?2)){Your range}(?1)$

                  • To match lines containing your string, with its exact case, a number of times within the given range :

                  (?-is)^((?:(?!(Your string)).)*?)(?:(?1)(?2)){Your range}(?1)$


                  For instance, all the regexes, below, are valid :

                  (?i-s)^((?:(?!(\bHi\b)).)*?)(?:(?1)(?2)){3}(?1)$

                  (?i-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){3,5}(?1)$

                  (?i-s)^((?:(?!(\bThis is a test\b)).)*?)(?:(?1)(?2)){2,4}(?1)$

                  (?-is)^((?:(?!(\bThis is a test\b)).)*?)(?:(?1)(?2)){2,4}(?1)$

                  Just test them against the text below, pasted in a new tab :

                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                  ---Hi-----
                  ---Hi----Hi----
                  ---Hi----Hi-----Hi----
                  ---Hi----Hi-----Hi----Hi----
                  ---Hi----Hi-----Hi----Hi----Hi----
                  ---Hi----Hi-----Hi----Hi----Hi----Hi-----
                  
                  
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                  ---This is a test-----
                  ---This is a test----This is a test----
                  ---This is a test----This is a test-----This is a test----
                  
                  ---This is a test----This is a test-----This is a test----This is a test----
                  ---This is a TEST----This is a TEST-----This is a TEST----This is a TEST----
                  
                  ---This is a test----This is a test-----This is a test----This is a test----This is a test----
                  ---This is a test----This is a test-----This is a test----This is a test----This is a test----This is a test-----
                  

                  Nice, isn’t it ? Have you noticed that the second regex example also finds lines containing the sentence This is a test ? That is logical, as the word This contains the string hi ;-))
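For readers who want to try the generic form outside Notepad++, here is a hedged Python sketch. Python's `re` module has no subroutine calls such as (?1) and (?2), so the hypothetical helper `count_range_regex` simply writes the repeated parts out in full; the case and whole-expression options are passed as arguments instead of the (?i-s) prefix. The test lines use a simplified dash layout rather than the exact sample text above.

```python
import re

def count_range_regex(needle, min_n, max_n=None, ignore_case=True, whole=False):
    """Build a regex matching lines that contain `needle` min_n..max_n times."""
    lit = re.escape(needle)
    if whole:
        lit = rf"\b{lit}\b"              # whole expression only
    gap = rf"(?:(?!{lit}).)*?"           # text in which the needle never starts
    rng = f"{{{min_n}}}" if max_n is None else f"{{{min_n},{max_n}}}"
    flags = re.IGNORECASE if ignore_case else 0
    # Same shape as the generic regex: gap, (gap + needle) x range, gap.
    return re.compile(rf"^{gap}(?:{gap}{lit}){rng}{gap}$", flags)

hi_lines = ["---" + "Hi----" * n for n in range(1, 7)]
test_lines = ["---" + "This is a test----" * n for n in range(1, 7)]
all_lines = hi_lines + test_lines

any_case = count_range_regex("Hi", 3, 5)
whole_only = count_range_regex("Hi", 3, 5, whole=True)

matched_any = [line for line in all_lines if any_case.search(line)]
matched_whole = [line for line in all_lines if whole_only.search(line)]
print(len(matched_any), len(matched_whole))
```

The first pattern also catches the This is a test lines, because This contains hi; the whole-expression variant does not.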

                  Best regards

                  guy038

                  • guy038

                    Hi, All,

                    Here are the different steps in the genesis of the generic regexes of my previous post, principally in the free-spacing mode (?x) !

                    Hope it helps you to understand these tricky regexes ;-))

                                 Part WITHOUT Hi  (Part WITHOUT Hi + Hi) x 4  Part WITHOUT Hi
                                 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯       ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                       |                  |          |             |
                                       V                  V          V             V
                    (?x-s) ^   (   (?!  Hi  ).)*    ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*    $   #  The INITIAL regex
                    
                    
                    (?x-s) ^   (   (?!  Hi  ).)*?   ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*?   $   #  We add a LAZY quantifier at TWO locations and notice 3 IDENTICAL blocks !
                               ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯     ¯¯¯¯¯¯¯¯¯¯¯             ¯¯¯¯¯¯¯¯¯¯¯
                            Gr1
                             V
                    (?x-s) ^ ( (   (?!  Hi  ).)*? ) ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*?   $   #  We create a NEW group 1 AROUND the FIRST block
                                                      ¯¯¯¯¯¯¯¯¯¯¯             ¯¯¯¯¯¯¯¯¯¯¯
                            Gr1                          |                          |
                             V                           V                          V
                    (?x-s) ^ ( (   (?!  Hi  ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We replace the 2nd and 3rd BLOCKS by a SUB-ROUTINE CALL to GROUP 1 (?1)
                    
                            Gr1
                             V
                    (?x-s) ^ ( (?: (?!  Hi  ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We change the PRESENT 2nd GROUP as a NON-CAPTURING group and notice 2 strings "Hi"
                                        ¯¯                          ¯¯
                             Gr1       Gr2
                             V         V
                    (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We create a NEW group 2 around the FIRST string "Hi"
                                        ¯¯                          ¯¯
                             Gr1      Gr2                           |
                             V         V                            v
                    (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (   (?1)       (?2) ){4}       (?1)     $   #  We replace THE 2nd string "Hi" by a SUB-ROUTINE CALL to GROUP 2 (?2)
                    
                             Gr1      Gr2
                             V         V
                    (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (?: (?1)       (?2) ){4}       (?1)     $   #  We change the 3rd PRESENT group as a NON-CAPTURING group
                    
                    
                    (?-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){4}(?1)$                                 #  We suppress the FREE-SPACING mode and DELETE any SPACE character
                    
                    
                    (?i-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){4}(?1)$                                #  We add the CASE modifier
                    
                    
                    (?i-s)^((?:(?!(\bHi\b)).)*?)(?:(?1)(?2)){4}(?1)$                            #  We add the \b BOUNDARIES to get a WHOLE expression
                                                                   ^
                                                                   | 
                                                         For the 3 LAST regexes, STOP
                                                         the SELECTION at the $ sign
                    

                    You may test any of these regexes against the text below, pasted in a new tab : each should match only the line containing exactly four Hi strings !

                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    ---Hi-----
                    ---Hi----Hi----
                    ---Hi----Hi-----Hi----
                    ---Hi----Hi-----Hi----Hi----
                    ---Hi----Hi-----Hi----Hi----Hi----
                    ---Hi----Hi-----Hi----Hi----Hi----Hi-----
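Python offers a comparable free-spacing mode through the `re.VERBOSE` flag, so the INITIAL regex of the steps above can be written with its three parts on separate lines. This is a sketch only: the groups are made non-capturing, and the later steps that rely on (?1) and (?2) are not reproduced because Python's `re` module does not support subroutine calls.

```python
import re

INITIAL = re.compile(
    r"""
    ^
    (?: (?!Hi) . )*                # part WITHOUT Hi
    (?: (?: (?!Hi) . )*? Hi ){4}   # (part WITHOUT Hi + Hi) x 4
    (?: (?!Hi) . )*                # part WITHOUT Hi
    $
    """,
    re.VERBOSE,
)

lines = [
    "---Hi-----",
    "---Hi----Hi----",
    "---Hi----Hi-----Hi----",
    "---Hi----Hi-----Hi----Hi----",
    "---Hi----Hi-----Hi----Hi----Hi----",
    "---Hi----Hi-----Hi----Hi----Hi----Hi-----",
]
matched = [line for line in lines if INITIAL.search(line)]
print(matched)
```

The construct (?!Hi). consumes one character only if Hi does not start at that position, which is what keeps each "part WITHOUT Hi" from swallowing an occurrence.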
                    

                    Cheers,

                    guy038

                    • guy038

                      Hello @dimitrov, @alan-kilborn, @peterjones, @astrosofista and All,

                      Two days later, I realize that my generic regex could be shortened a bit ! Reading, again, what I said in my previous post :

                                   Part WITHOUT Hi  (Part WITHOUT Hi + Hi) x 4  Part WITHOUT Hi
                                   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯       ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                         |                  |          |             |
                                         V                  V          V             V
                      (?x-s) ^   (   (?!  Hi  ).)*    ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*    $   #  The INITIAL regex
                      

                      You may see, as I did, that the regex begins with “Part WITHOUT Hi”, followed by ( “Part WITHOUT Hi” and “Hi” ) repeated four times. Since the repeated group already starts with “Part WITHOUT Hi”, the leading part is redundant !

                      The initial regex should be, simply :

                              ^  (Part WITHOUT Hi + Hi) x 4 + Part WITHOUT Hi  $
                                  ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                          |         |              |
                                          V         V              V
                      (?x-s)  ^     ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*       $   #  The INITIAL regex
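A quick way to gain confidence in this simplification is to run both shapes over the same lines and compare the results. The Python sketch below does that for the six Hi lines; the names `GAP`, `LONG_FORM` and `SHORT_FORM` are illustrative, and agreement on these samples is a sanity check, not a proof.

```python
import re

GAP = r"(?:(?!Hi).)*?"  # part WITHOUT Hi

# Earlier shape: gap, (gap + Hi) x 4, gap
LONG_FORM = re.compile(rf"^{GAP}(?:{GAP}Hi){{4}}{GAP}$")
# Shortened shape: (gap + Hi) x 4, gap
SHORT_FORM = re.compile(rf"^(?:{GAP}Hi){{4}}{GAP}$")

lines = [
    "---Hi-----",
    "---Hi----Hi----",
    "---Hi----Hi-----Hi----",
    "---Hi----Hi-----Hi----Hi----",
    "---Hi----Hi-----Hi----Hi----Hi----",
    "---Hi----Hi-----Hi----Hi----Hi----Hi-----",
]
long_hits = [line for line in lines if LONG_FORM.search(line)]
short_hits = [line for line in lines if SHORT_FORM.search(line)]
print(long_hits == short_hits)
```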
                      

                      Thus, the four generic regexes to solve general cases are simplified :

                      • To match lines containing your string, regardless of case, as a whole expression, a number of times within the given range :

                      (?i-s)^(?:((?:(?!(\bYour string\b)).)*?)(?2)){Your range}(?1)$

                      • To match lines containing your string, regardless of case, a number of times within the given range :

                      (?i-s)^(?:((?:(?!(Your string)).)*?)(?2)){Your range}(?1)$

                      • To match lines containing your string, with its exact case, as a whole expression, a number of times within the given range :

                      (?-is)^(?:((?:(?!(\bYour string\b)).)*?)(?2)){Your range}(?1)$

                      • To match lines containing your string, with its exact case, a number of times within the given range :

                      (?-is)^(?:((?:(?!(Your string)).)*?)(?2)){Your range}(?1)$


                      For instance, all the regexes, below, are valid :

                      (?i-s)^(?:((?:(?!(\bHi\b)).)*?)(?2)){4}(?1)$

                      (?i-s)^(?:((?:(?!(Hi)).)*?)(?2)){3,5}(?1)$

                      (?i-s)^(?:((?:(?!(\bThis is a test\b)).)*?)(?2)){2,4}(?1)$

                      (?-is)^(?:((?:(?!(\bThis is a test\b)).)*?)(?2)){2,4}(?1)$

                      Just test them against the text below, pasted in a new tab :

                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                      ---Hi-----
                      ---Hi----Hi----
                      ---Hi----Hi-----Hi----
                      ---Hi----Hi-----Hi----Hi----
                      ---Hi----Hi-----Hi----Hi----Hi----
                      ---Hi----Hi-----Hi----Hi----Hi----Hi-----
                      
                      
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                      ---This is a test-----
                      ---This is a test----This is a test----
                      ---This is a test----This is a test-----This is a test----
                      
                      ---This is a test----This is a test-----This is a test----This is a test----
                      ---This is a TEST----This is a TEST-----This is a TEST----This is a TEST----
                      
                      ---This is a test----This is a test-----This is a test----This is a test----This is a test----
                      ---This is a test----This is a test-----This is a test----This is a test----This is a test----This is a test-----
                      

                      Below, a list of the different steps in the genesis of the generic regex which should help you to understand this tricky regex !

                             ^  (     Part WITHOUT Hi  +  Hi) x 4 + Part WITHOUT Hi  $
                                      ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯     ¯¯        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                             |            |                |
                                             V            V                V
                      (?x-s) ^  (     (   (?! Hi ).)*?    Hi  ){4}   (   (?!Hi).)*   $   #  The INITIAL regex
                      
                      
                      (?x-s) ^  (?:   (?: (?! Hi ).)*?    Hi  ){4}   (?: (?!Hi).)*?  $   #  We change all GROUPS as NON-CAPTURING, add a LAZY quantifier and notice 2 IDENTICAL blocks !
                                      ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯               ¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                   Gr1
                                    V
                      (?x-s) ^  (?: ( (?: (?!  Hi  ).)*? )  Hi  ){4} (?: (?!Hi).)*?  $   #  We create a NEW group 1 AROUND the FIRST block
                                               ¯¯           ¯¯
                                   Gr1
                                    V
                      (?x-s) ^  (?: ( (?: (?!  Hi  ).)*? )  Hi  ){4}      (?1)       $   #  We replace the 2nd BLOCK by a SUB-ROUTINE CALL to GROUP 1 (?1)
                                               ¯¯           ¯¯
                                   Gr1        Gr2
                                    V         V
                      (?x-s) ^  (?: ( (?: (?! (Hi) ).)*? )  Hi  ){4}      (?1)       $   #  We create a NEW group 2 AROUND the FIRST string "Hi"
                      
                                   Gr1        Gr2
                                    V         V
                      (?x-s) ^  (?: ( (?: (?! (Hi) ).)*? ) (?2) ){4}      (?1)       $   #  We replace THE 2nd string "Hi" by a SUB-ROUTINE CALL to GROUP 2 (?2)
                      
                      
                      (?-s)^(?:((?:(?!(Hi)).)*?)(?2)){4}(?1)$                            #  We suppress the FREE-SPACING mode and DELETE any SPACE character
                      
                      
                      (?i-s)^(?:((?:(?!(Hi)).)*?)(?2)){4}(?1)$                           #  We add the CASE modifier
                      
                      
                      (?i-s)^(?:((?:(?!(\bHi\b)).)*?)(?2)){4}(?1)$                       #  We add the \b BOUNDARIES to get a WHOLE expression
                                                                 ^
                                                                 | 
                                                      For the 3 LAST regexes, STOP
                                                      the SELECTION at the $ sign
                      

                      You may test any of these regexes above, against the text below, pasted in a new tab. It matches only the line containing exactly four strings Hi !


                      ---Hi-----
                      ---Hi----Hi----
                      ---Hi----Hi-----Hi----
                      ---Hi----Hi-----Hi----Hi----
                      ---Hi----Hi-----Hi----Hi----Hi----
                      ---Hi----Hi-----Hi----Hi----Hi----Hi-----
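As an independent cross-check, a line's occurrences can simply be counted instead of being described by one big regex. This is a different technique from the regexes above, shown only as a sketch: `count_hits` and `lines_with_count` are hypothetical helper names, and the count is of non-overlapping matches, which for a string like Hi should agree with the regex approach.

```python
import re

def count_hits(line, needle, ignore_case=False, whole=False):
    """Count non-overlapping occurrences of `needle` in one line."""
    lit = re.escape(needle)
    if whole:
        lit = rf"\b{lit}\b"
    flags = re.IGNORECASE if ignore_case else 0
    return len(re.findall(lit, line, flags))

def lines_with_count(text, needle, low, high, **options):
    """Return the lines whose occurrence count lies in low..high."""
    return [
        line for line in text.splitlines()
        if low <= count_hits(line, needle, **options) <= high
    ]

text = "\n".join([
    "---Hi-----",
    "---Hi----Hi----",
    "---Hi----Hi-----Hi----",
    "---Hi----Hi-----Hi----Hi----",
    "---Hi----Hi-----Hi----Hi----Hi----",
    "---Hi----Hi-----Hi----Hi----Hi----Hi-----",
])
print(lines_with_count(text, "Hi", 4, 4))
```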
                      

                      Best regards,

                      guy038

                      The Community of users of the Notepad++ text editor.