    select lines with specific number of word in it

    • dimitrov

      Hi, I have a text file like this:

      test test \ dfg 54645 Hi fgd 54645 Hi 3 Hi
      Hi 45 Hi 4454645 Hi fgd 54645 Hi 3 Hi

      and I want to keep only the lines with 4 Hi in them (delete the others, or something like that).

      • Alan Kilborn @dimitrov

        @dimitrov

        General technique:

        1. Open the Mark dialog (Ctrl+M), tick the Bookmark line checkbox, and mark all matches of 4 Hi entered in Find what.

        2. From the Search menu’s Bookmark submenu, choose Remove Unmarked Lines.
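Outside Notepad++, the same mark-then-remove-unmarked idea can be sketched in a few lines of Python. This is only an illustration: `keep_lines_containing` is a hypothetical helper name, and it treats the Find what text as a literal string, as in step 1.

```python
def keep_lines_containing(text, needle):
    """Keep only the lines that contain `needle` (the 'bookmarked' lines)."""
    kept = [line for line in text.splitlines() if needle in line]
    return "\n".join(kept)


# Made-up sample text, not taken from the thread.
sample = "a 4 Hi b\nno match here\n4 Hi again"
result = keep_lines_containing(sample, "4 Hi")
print(result)
```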

        • dimitrov @Alan Kilborn

          @Alan-Kilborn Thanks for answering. I mean lines in which Hi is repeated 4 times.

          • Alan Kilborn @dimitrov

            @dimitrov

            Quality of “effort in” must be >= quality of “effort out”

            Anyone that abbreviates the problem statement like you did…deserves what you got as a reply. Cheers.

            • PeterJones @dimitrov

              @dimitrov ,

              Following this advice will go a long way toward getting you better / more-complete answers.

              Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

              • astrosofista @dimitrov

                Hi @dimitrov, @Alan-Kilborn, @PeterJones

                If a two-step procedure is good enough for you, the following may solve your issue:

                Please do as @Alan-Kilborn told you above and mark lines with this expression:

                Search: (?-s)(Hi.*?){4}
                

                and then Remove unmarked lines. You end up with lines containing at least 4 Hi. To delete those that contain 5 Hi or more, mark lines again, this time with this other expression:

                Search: (?-s)(Hi.*?){5}
                

                This time, however, Remove marked lines via the Search menu.

                That’s all. If this procedure doesn’t solve your problem, then follow @PeterJones's advice.
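For readers who want to check the logic outside the editor, here is a minimal Python sketch of the same two passes. It is an illustration, not part of the Notepad++ procedure: the function name and the sample lines are made up, and the `(?-s)` prefix is dropped because `.` in Python's `re` does not match a newline by default.

```python
import re

AT_LEAST_4 = re.compile(r"(Hi.*?){4}")  # pass 1: lines to keep (marked)
AT_LEAST_5 = re.compile(r"(Hi.*?){5}")  # pass 2: lines to delete (marked again)


def lines_with_exactly_four(lines):
    """Keep lines with at least 4 Hi, then drop those with 5 or more."""
    kept = [ln for ln in lines if AT_LEAST_4.search(ln)]
    return [ln for ln in kept if not AT_LEAST_5.search(ln)]


# Hypothetical sample lines with 3, 4 and 5 occurrences of Hi.
sample = [
    "Hi a Hi b Hi",
    "Hi a Hi b Hi c Hi",
    "Hi a Hi b Hi c Hi d Hi",
]
exactly_four = lines_with_exactly_four(sample)
print(exactly_four)
```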

                Have fun!

                • guy038

                  Hello @dimitrov, @alan-kilborn, @peterjones, @astrosofista and All,

                  Here are four generic regexes that cover the general cases:

                  • To match lines containing your string, regardless of case, as a whole expression, a number of times within your range:

                  (?i-s)^((?:(?!(\bYour string\b)).)*?)(?:(?1)(?2)){Your range}(?1)$

                  • To match lines containing your string, regardless of case, a number of times within your range:

                  (?i-s)^((?:(?!(Your string)).)*?)(?:(?1)(?2)){Your range}(?1)$

                  • To match lines containing your string, with its exact case, as a whole expression, a number of times within your range:

                  (?-is)^((?:(?!(\bYour string\b)).)*?)(?:(?1)(?2)){Your range}(?1)$

                  • To match lines containing your string, with its exact case, a number of times within your range:

                  (?-is)^((?:(?!(Your string)).)*?)(?:(?1)(?2)){Your range}(?1)$


                  For instance, all the regexes below are valid:

                  (?i-s)^((?:(?!(\bHi\b)).)*?)(?:(?1)(?2)){3}(?1)$

                  (?i-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){3,5}(?1)$

                  (?i-s)^((?:(?!(\bThis is a test\b)).)*?)(?:(?1)(?2)){2,4}(?1)$

                  (?-is)^((?:(?!(\bThis is a test\b)).)*?)(?:(?1)(?2)){2,4}(?1)$

                  Just test them against the text below, pasted in a new tab:

                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                  ---Hi-----
                  ---Hi----Hi----
                  ---Hi----Hi-----Hi----
                  ---Hi----Hi-----Hi----Hi----
                  ---Hi----Hi-----Hi----Hi----Hi----
                  ---Hi----Hi-----Hi----Hi----Hi----Hi-----
                  
                  
                  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                  ---This is a test-----
                  ---This is a test----This is a test----
                  ---This is a test----This is a test-----This is a test----
                  
                  ---This is a test----This is a test-----This is a test----This is a test----
                  ---This is a TEST----This is a TEST-----This is a TEST----This is a TEST----
                  
                  ---This is a test----This is a test-----This is a test----This is a test----This is a test----
                  ---This is a test----This is a test-----This is a test----This is a test----This is a test----This is a test-----
                  

                  Nice, isn’t it? Have you noticed that the second example regex also finds lines containing the sentence This is a test? That is logical, as the word This contains the string hi ;-))
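The generic regexes above rely on subroutine calls such as (?1), which Python's built-in `re` module does not provide. As a rough sketch of the same idea, the pattern can be built by pasting the sub-patterns in as text. The function `line_count_regex` and its parameters are hypothetical names chosen for this illustration, and the test lines only imitate the shape of the sample text.

```python
import re


def line_count_regex(needle, lo, hi, ignore_case=True, whole=False):
    """Build a pattern matching lines with lo..hi occurrences of `needle`."""
    word = re.escape(needle)              # plays the role of group 2
    if whole:
        word = r"\b" + word + r"\b"       # whole-expression variant
    gap = rf"(?:(?!{word}).)*?"           # plays the role of group 1: text without the needle
    body = rf"^{gap}(?:{gap}{word}){{{lo},{hi}}}{gap}$"
    return re.compile(body, re.IGNORECASE if ignore_case else 0)


pattern = line_count_regex("Hi", 3, 5)
lines = ["---" + "Hi----" * n for n in range(1, 7)]
matched = [n for n, ln in enumerate(lines, start=1) if pattern.search(ln)]
print(matched)

# The word "This" contains "hi", so a case-insensitive search counts it too.
this_line = "---" + "This is a test----" * 3
print(bool(pattern.search(this_line)))
print(bool(line_count_regex("Hi", 3, 5, whole=True).search(this_line)))
```

With `whole=True` the `\b` boundaries stop "This" from counting as an occurrence of Hi, which is the difference between the first and second generic regex.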

                  Best regards

                  guy038

                  • guy038

                    Hi, All,

                    Here are the successive steps in building the generic regexes of my previous post, mostly written in free-spacing mode (?x)!

                    Hope it helps you to understand these tricky regexes ;-))

                                 Part WITHOUT Hi  (Part WITHOUT Hi + Hi) x 4  Part WITHOUT Hi
                                 ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯       ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                       |                  |          |             |
                                       V                  V          V             V
                    (?x-s) ^   (   (?!  Hi  ).)*    ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*    $   #  The INITIAL regex
                    
                    
                    (?x-s) ^   (   (?!  Hi  ).)*?   ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*?   $   #  We add a LAZY quantifier at TWO locations and notice 3 IDENTICAL blocks !
                               ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯     ¯¯¯¯¯¯¯¯¯¯¯             ¯¯¯¯¯¯¯¯¯¯¯
                            Gr1
                             V
                    (?x-s) ^ ( (   (?!  Hi  ).)*? ) ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*?   $   #  We create a NEW group 1 AROUND the FIRST block
                                                      ¯¯¯¯¯¯¯¯¯¯¯             ¯¯¯¯¯¯¯¯¯¯¯
                            Gr1                          |                          |
                             V                           V                          V
                    (?x-s) ^ ( (   (?!  Hi  ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We replace the 2nd and 3rd BLOCKS by a SUB-ROUTINE CALL to GROUP 1 (?1)
                    
                            Gr1
                             V
                    (?x-s) ^ ( (?: (?!  Hi  ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We change the PRESENT 2nd GROUP as a NON-CAPTURING group and notice 2 strings "Hi"
                                        ¯¯                          ¯¯
                             Gr1       Gr2
                             V         V
                    (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (   (?1)        Hi  ){4}       (?1)     $   #  We create a NEW group 2 around the FIRST string "Hi"
                                        ¯¯                          ¯¯
                             Gr1      Gr2                           |
                             V         V                            v
                    (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (   (?1)       (?2) ){4}       (?1)     $   #  We replace THE 2nd string "Hi" by a SUB-ROUTINE CALL to GROUP 2 (?2)
                    
                             Gr1      Gr2
                             V         V
                    (?x-s) ^ ( (?: (?! (Hi) ).)*? ) (?: (?1)       (?2) ){4}       (?1)     $   #  We change the 3rd PRESENT group as a NON-CAPTURING group
                    
                    
                    (?-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){4}(?1)$                                 #  We suppress the FREE-SPACING mode and DELETE any SPACE character
                    
                    
                    (?i-s)^((?:(?!(Hi)).)*?)(?:(?1)(?2)){4}(?1)$                                #  We add the CASE modifier
                    
                    
                    (?i-s)^((?:(?!(\bHi\b)).)*?)(?:(?1)(?2)){4}(?1)$                            #  We add the \b BOUNDARIES to get a WHOLE expression
                                                                   ^
                                                                   | 
                                                         For the 3 LAST regexes, STOP
                                                         the SELECTION at the $ sign
                    

                    You may test any of these regexes against the text below, pasted in a new tab. Each should match only the line containing exactly four Hi strings!

                    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                    ---Hi-----
                    ---Hi----Hi----
                    ---Hi----Hi-----Hi----
                    ---Hi----Hi-----Hi----Hi----
                    ---Hi----Hi-----Hi----Hi----Hi----
                    ---Hi----Hi-----Hi----Hi----Hi----Hi-----
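
The key point of the steps above is that a subroutine call such as (?1) runs the sub-pattern of group 1 again, rather than repeating the text that group 1 captured. Python's built-in `re` module has no subroutine calls, so the sketch below imitates them by inserting the sub-pattern text wherever a call would appear, and contrasts that with a back-reference. The variable names are chosen for this illustration only.

```python
import re

WORD = "Hi"                       # group 2 in the steps above
BLOCK = rf"(?:(?!{WORD}).)*?"     # group 1: a run of characters containing no "Hi"

# ^ (?1) (?: (?1) (?2) ){4} (?1) $  with every call replaced by the sub-pattern text
composed = re.compile(rf"^{BLOCK}(?:{BLOCK}{WORD}){{4}}{BLOCK}$")

# A back-reference is NOT equivalent: \1 repeats the captured text, not the pattern.
backref = re.compile(r"^((?:(?!Hi).)*?)(?:\1Hi){4}\1$")

sample = """\
---Hi-----
---Hi----Hi----
---Hi----Hi-----Hi----
---Hi----Hi-----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----Hi-----
"""
lines = sample.splitlines()
composed_hits = [ln for ln in lines if composed.search(ln)]
backref_hits = [ln for ln in lines if backref.search(ln)]
print(composed_hits)
print(backref_hits)
```

The composed pattern finds only the line with four Hi, while the back-reference version finds nothing here, because the runs of dashes between the Hi strings differ in length.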
                    

                    Cheers,

                    guy038

                    • guy038

                      Hello @dimitrov, @alan-kilborn, @peterjones, @astrosofista and All,

                      Two days later, I realized that my generic regex could be shortened a bit! Here again is what I wrote in my previous post:

                                   Part WITHOUT Hi  (Part WITHOUT Hi + Hi) x 4  Part WITHOUT Hi
                                   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯       ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                         |                  |          |             |
                                         V                  V          V             V
                      (?x-s) ^   (   (?!  Hi  ).)*    ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*    $   #  The INITIAL regex
                      

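                      You may see, as I did, that the regex begins with “Part WITHOUT Hi” followed by ( “Part WITHOUT Hi” + “Hi” ) repeated four times. The leading part is therefore redundant: the first repetition already starts with its own “Part WITHOUT Hi”, which can absorb any text before the first Hi.

                      The initial regex should simply be: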

                              ^  (Part WITHOUT Hi + Hi) x 4 + Part WITHOUT Hi  $
                                  ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯   ¯¯        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                          |         |              |
                                          V         V              V
                      (?x-s)  ^     ( ((?!Hi).)*?   Hi  ){4}  ((?!Hi).)*       $   #  The INITIAL regex
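
As a quick sanity check of this simplification, the two initial regexes can be compared in Python, where `re.VERBOSE` plays the role of (?x). This is only a sketch under that assumption, run against the same sample lines used in this thread.

```python
import re

# The earlier INITIAL regex, with the redundant leading part.
with_prefix = re.compile(r"""
    ^ ( (?!Hi). )*                 # part without Hi (redundant)
      ( ( (?!Hi). )*?  Hi ){4}     # (part without Hi + Hi) x 4
      ( (?!Hi). )*                 # part without Hi
    $""", re.VERBOSE)

# The shortened INITIAL regex.
shortened = re.compile(r"""
    ^ ( ( (?!Hi). )*?  Hi ){4}     # (part without Hi + Hi) x 4
      ( (?!Hi). )*                 # part without Hi
    $""", re.VERBOSE)

sample = """\
---Hi-----
---Hi----Hi----
---Hi----Hi-----Hi----
---Hi----Hi-----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----
---Hi----Hi-----Hi----Hi----Hi----Hi-----
"""
lines = sample.splitlines()
agree = all(
    bool(with_prefix.search(ln)) == bool(shortened.search(ln)) for ln in lines
)
hits = [ln for ln in lines if shortened.search(ln)]
print(agree, hits)
```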
                      

                      Thus, the four generic regexes for the general cases can be simplified:

                      • To match lines containing your string, regardless of case, as a whole expression, a number of times within your range:

                      (?i-s)^(?:((?:(?!(\bYour string\b)).)*?)(?2)){Your range}(?1)$

                      • To match lines containing your string, regardless of case, a number of times within your range:

                      (?i-s)^(?:((?:(?!(Your string)).)*?)(?2)){Your range}(?1)$

                      • To match lines containing your string, with its exact case, as a whole expression, a number of times within your range:

                      (?-is)^(?:((?:(?!(\bYour string\b)).)*?)(?2)){Your range}(?1)$

                      • To match lines containing your string, with its exact case, a number of times within your range:

                      (?-is)^(?:((?:(?!(Your string)).)*?)(?2)){Your range}(?1)$


                      For instance, all the regexes below are valid:

                      (?i-s)^(?:((?:(?!(\bHi\b)).)*?)(?2)){4}(?1)$

                      (?i-s)^(?:((?:(?!(Hi)).)*?)(?2)){3,5}(?1)$

                      (?i-s)^(?:((?:(?!(\bThis is a test\b)).)*?)(?2)){2,4}(?1)$

                      (?-is)^(?:((?:(?!(\bThis is a test\b)).)*?)(?2)){2,4}(?1)$

                      Just test them against the text below, pasted in a new tab:

                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                      ---Hi-----
                      ---Hi----Hi----
                      ---Hi----Hi-----Hi----
                      ---Hi----Hi-----Hi----Hi----
                      ---Hi----Hi-----Hi----Hi----Hi----
                      ---Hi----Hi-----Hi----Hi----Hi----Hi-----
                      
                      
                      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                      ---This is a test-----
                      ---This is a test----This is a test----
                      ---This is a test----This is a test-----This is a test----
                      
                      ---This is a test----This is a test-----This is a test----This is a test----
                      ---This is a TEST----This is a TEST-----This is a TEST----This is a TEST----
                      
                      ---This is a test----This is a test-----This is a test----This is a test----This is a test----
                      ---This is a test----This is a test-----This is a test----This is a test----This is a test----This is a test-----
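
One way to double-check these regexes is to compare them with a plain count of non-overlapping occurrences. The Python sketch below does that for lines shaped like the sample text. Both function names are hypothetical, and the shortened pattern is written out without subroutine calls because Python's built-in `re` module does not support them.

```python
import re


def needle_pattern(needle, whole):
    """Escape the needle and optionally wrap it in \\b boundaries."""
    word = re.escape(needle)
    return rf"\b{word}\b" if whole else word


def count_in_range(line, needle, lo, hi, ignore_case=True, whole=False):
    """True when `needle` occurs between lo and hi times in `line`."""
    flags = re.IGNORECASE if ignore_case else 0
    found = re.findall(needle_pattern(needle, whole), line, flags)
    return lo <= len(found) <= hi


def shortened_regex(needle, lo, hi, ignore_case=True, whole=False):
    """Shortened generic regex, with (?1) and (?2) expanded as text."""
    word = needle_pattern(needle, whole)
    gap = rf"(?:(?!{word}).)*?"
    flags = re.IGNORECASE if ignore_case else 0
    return re.compile(rf"^(?:{gap}{word}){{{lo},{hi}}}{gap}$", flags)


lines = ["---" + "Hi----" * n for n in range(1, 7)]
lines += ["---" + "This is a test----" * n for n in range(1, 7)]
lines.append("---" + "This is a TEST----" * 4)

configs = [
    ("Hi", 4, 4, True, True),
    ("Hi", 3, 5, True, False),
    ("This is a test", 2, 4, True, True),
    ("This is a test", 2, 4, False, True),
]
all_agree = all(
    bool(shortened_regex(*cfg).search(ln)) == count_in_range(ln, *cfg)
    for cfg in configs
    for ln in lines
)
print(all_agree)
```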
                      

                      Below are the successive steps in building the generic regex, which should help you understand this tricky regex!

                             ^  (     Part WITHOUT Hi  +  Hi) x 4 + Part WITHOUT Hi  $
                                      ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯     ¯¯        ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                             |            |                |
                                             V            V                V
                      (?x-s) ^  (     (   (?! Hi ).)*?    Hi  ){4}   (   (?!Hi).)*   $   #  The INITIAL regex
                      
                      
                      (?x-s) ^  (?:   (?: (?! Hi ).)*?    Hi  ){4}   (?: (?!Hi).)*?  $   #  We change all GROUPS as NON-CAPTURING, add a LAZY quantifier and notice 2 IDENTICAL blocks !
                                      ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯               ¯¯¯¯¯¯¯¯¯¯¯¯¯¯
                                   Gr1
                                    V
                      (?x-s) ^  (?: ( (?: (?!  Hi  ).)*? )  Hi  ){4} (?: (?!Hi).)*?  $   #  We create a NEW group 1 AROUND the FIRST block
                                               ¯¯           ¯¯
                                   Gr1
                                    V
                      (?x-s) ^  (?: ( (?: (?!  Hi  ).)*? )  Hi  ){4}      (?1)       $   #  We replace the 2nd BLOCK by a SUB-ROUTINE CALL to GROUP 1 (?1)
                                               ¯¯           ¯¯
                                   Gr1        Gr2
                                    V         V
                      (?x-s) ^  (?: ( (?: (?! (Hi) ).)*? )  Hi  ){4}      (?1)       $   #  We create a NEW group 2 AROUND the FIRST string "Hi"
                      
                                   Gr1        Gr2
                                    V         V
                      (?x-s) ^  (?: ( (?: (?! (Hi) ).)*? ) (?2) ){4}      (?1)       $   #  We replace THE 2nd string "Hi" by a SUB-ROUTINE CALL to GROUP 2 (?2)
                      
                      
                      (?-s)^(?:((?:(?!(Hi)).)*?)(?2)){4}(?1)$                            #  We suppress the FREE-SPACING mode and DELETE any SPACE character
                      
                      
                      (?i-s)^(?:((?:(?!(Hi)).)*?)(?2)){4}(?1)$                           #  We add the CASE modifier
                      
                      
                      (?i-s)^(?:((?:(?!(\bHi\b)).)*?)(?2)){4}(?1)$                       #  We add the \b BOUNDARIES to get a WHOLE expression
                                                                 ^
                                                                 | 
                                                      For the 3 LAST regexes, STOP
                                                      the SELECTION at the $ sign
                      

                      You may test any of the regexes above against the text below, pasted in a new tab. Each matches only the line containing exactly four Hi strings!


                      ---Hi-----
                      ---Hi----Hi----
                      ---Hi----Hi-----Hi----
                      ---Hi----Hi-----Hi----Hi----
                      ---Hi----Hi-----Hi----Hi----Hi----
                      ---Hi----Hi-----Hi----Hi----Hi----Hi-----
                      

                      Best regards,

                      guy038
