    Notepad++ String Search

    Help wanted · 16 posts · 3 posters · 12.2k views
    • dario33

      I don't think so, because the content of each text string in the whole text is different in every case. There are several text strings in the text that I have to get (about 4). Marking them by hand works, and that is what I do right now, but it is annoying when there is an easier way.

      • guy038

        Hello Dario33,

        So, seemingly, you would like to keep all the text between the two boundaries "ais":" and "},{"drg2", and delete every character that lies outside these boundaries, as for instance :

        Text to delete"ais":"Text 1 to keep"},{"drg2"Other text to delete"ais":"Text 2 to keep"},{"drg2"Again, a text to delete"ais":"Text 3 to keep"},{"drg2"Final text to delete

        And afterwards, I suppose that you would like to get ONLY the list of all the kept parts of text, on separate lines, as below :

        Text 1 to keep
        Text 2 to keep
        Text 3 to keep
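For readers who would rather script this than use the editor, the transformation described above can be sketched in Python. This is only an illustration: it uses Python's `re` module rather than Notepad++'s regex engine, and the variable names (`sample`, `opening`, `closing`, `kept`) are hypothetical. The sample string is the one-line example from this post.

```python
import re

# The one-line example from the post, with both boundaries as literal strings.
sample = ('Text to delete"ais":"Text 1 to keep"},{"drg2"Other text to delete'
          '"ais":"Text 2 to keep"},{"drg2"Again, a text to delete'
          '"ais":"Text 3 to keep"},{"drg2"Final text to delete')

opening = '"ais":"'
closing = '"},{"drg2"'

# re.escape() makes the braces in the closing boundary literal;
# (.*?) lazily captures the text between the two boundaries.
pattern = re.compile(re.escape(opening) + r'(.*?)' + re.escape(closing))

kept = pattern.findall(sample)
print("\n".join(kept))
```

Capturing the wanted text directly is a different route from deleting everything around it, but it gives the same three lines for this sample.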

        If so, I think that a search/replacement with regular expressions should do the job ! But I’d better verify my assertion with a real text. So, if you don’t mind and, of course, if your file, or part of it, is not confidential, you could send me a quick e-mail with some example text as an attached file, so that I’ll be able to test some regexes against your encrypted text :-)

        My e-mail address is

        See you later,

        Best Regards,

        guy038

        • dario33

          That’s it, guy038, many thanks for the explanations! I can’t describe it as well as you do! The text isn’t encrypted and isn’t secret either, but it is too large to post here, so I uploaded the text file here:

          http://www.filedropper.com/sampletext

          TIA!

          • guy038

            Hi Dario33,

            • First, I downloaded your file sampletext.txt without any problem : I get a one-line text of exactly 32376 bytes, without any End of Line character

            • Secondly, I tried to identify your two boundaries "ais":" and "},{"drg2" : I did find 12 occurrences of the boundary "ais":". Unfortunately, there is NOT a single occurrence of the boundary "},{"drg2" in your file. Nevertheless, I succeeded in identifying exactly 12 occurrences of the boundary "drg2":". So I assume that "drg2":" is the right ending boundary !!

            So, just follow the few steps, below :

            • Open your sampletext.txt, in Notepad++

            • Go back to the very beginning of your file ( IMPORTANT )

            • Open the Replace dialog ( CTRL + H )

            • Set the Regular expression search mode

            • Type in the two zones :

              Find what : "drg2":".*?"ais":"

              Replace with : \r\n

            • Click on the Replace All button

            This FIRST S/R deletes any text between an ending boundary and the closest next opening boundary, as well as the boundaries themselves, and replaces it with the two Windows End of Line characters \r\n.

            Note : If you currently use Unix files, just use the End of Line character \n !

            • Now, change the Find what and Replace with zones into :

              Find what : ^.*?"ais":"|"drg2":".*?$

              Replace with : Leave EMPTY

            • Click, again, on the Replace All button ( The cursor location shouldn’t have changed, after the first S/R )

            This SECOND S/R deletes any text between the very beginning of this long line and the remaining opening boundary "ais":" OR between the remaining ending boundary "drg2":" and the very end of this long line.

            Et voilà ! You should obtain 12 lines that represent, ONLY, the text that was present, in your original text, between the two boundaries :-) Quite logical, as there were 12 opening boundaries and 12 ending boundaries !
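The two S/R above can be emulated outside the editor. The following is a minimal Python sketch, not the Notepad++ engine itself: the short `text` string is a made-up stand-in for the real sampletext.txt, and `\n` is used instead of `\r\n` because, in Python, the dot matches `\r`, so the second regex would otherwise swallow it.

```python
import re

# Made-up one-line text with three opening and three ending boundaries.
text = ('junk"ais":"Text 1 to keep"drg2":"junk'
        '"ais":"Text 2 to keep"drg2":"junk'
        '"ais":"Text 3 to keep"drg2":"tail')

# FIRST S/R: everything from an ending boundary up to the closest next
# opening boundary (boundaries included) becomes a line break.
step1 = re.sub(r'"drg2":".*?"ais":"', "\n", text)

# SECOND S/R: remove the leading part up to the remaining opening boundary,
# and the trailing part from the remaining ending boundary to the end of line.
# re.MULTILINE makes ^ and $ work per line, as they do in Notepad++.
step2 = re.sub(r'^.*?"ais":"|"drg2":".*?$', "", step1, flags=re.MULTILINE)

print(step2)
```

With this sample, the result is three lines, one per kept text, which mirrors the 12 lines expected from the real file.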

            Notes :

            • It’s not easy to gather these two regexes into a bigger one, because of the special behaviour of the ^ assertion, which means Beginning of Line. Indeed, as long as you just perform a search, there’s no trouble : the beginning of each line does NOT change. However, when successive replacements are performed, the beginning of each line is updated each time and is NOT the same as it was before all these S/R. Thus, it’s safer to split the job into two S/R :-))

            • Dario33, I did a test, copying all your text twice, to get a two-line text. After the two consecutive S/R, I got, as expected, 24 lines :-)

            Cheers,

            guy038

            • dario33

              Hi guy038, THANK YOU SO MUCH! You’re a lifesaver :) This code is really awesome!!

              I still have one question, please: is it possible to put a fixed string, like ‘Sentence’ or similar, in front of each extracted line? TIA!

              • guy038

                Hi Dario33,

                No problem ! Just insert, in the replacement part of the first S/R, after the End of Line characters \r\n, the string "Sentence : ", as below :

                Find what      :     "drg2":".*?"ais":"
                
                Replace with   :     \r\nSentence :                 ( with a SPACE after the COLON )
                

                Cheers,

                guy038

                • dario33

                  Hi guy038, it works perfectly, thank you so much again!! As I understand it, one can extract the wanted text by delimiting it with boundaries.

                  But what about the case where I only know the first boundary, "ais", and some of the possible second boundaries, like "drg2", "aim" or "bbk" for example?
                  Can I separate like the following to get all of them: "drg2":".*?"ais":" plus "aim" plus "bbk" (that is, a command that treats aim and bbk as separators in addition to drg2)? Sorry if this question is too silly ;)

                  • guy038

                    Hello Dario33,

                    Right now, it’s about 12:30 p.m. in France and, as soon as I woke up, at about 10:30 after a good night’s sleep ( it’s the week-end anyway ! ), I realized that the regex given in my previous post was NOT exact :-((

                    So, the FIRST S/R is, as I said :

                    Find what    :       "drg2":".*?"ais":"
                    
                    Replace with :       \r\nSentence :                 ( with a SPACE after the COLON )
                    

                    but, the SECOND S/R should be :

                    Find what    :       (^.*?"ais":")|"drg2":".*?$
                    
                    Replace with :       (?1Sentence \: )               ( with a SPACE, before and after the TWO characters \: )
                    

                    Notes :

                    • The boundaries "ais":" and "drg2":" are only literal strings, to be matched

                    • The dot stands for any single character other than the End of Line characters \r and \n and the Form Feed character \f

                    • The star means any repetition, even 0, of the previous character ( the dot )

                    • To understand the role of the question mark, after the star, I think that a short example will be better than a long speech !

                    Suppose the given subject string : “This is an example for the meaning of the question mark symbol”. Then :

                    • The regex a.*r would match the string “an example for the meaning of the question mar”

                    • The regex a.*?r would match the string “an example for”

                    So, in our example :

                    • The form .* means the longest range of characters, even empty, between the letters a and r

                    • The form .*? means the shortest range of characters, even empty, between the letters a and r
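The greedy and lazy behaviour described above can be checked with a few lines of Python. This is a small sketch using Python's `re` module, which handles `.*` and `.*?` the same way for this example; the variable names are illustrative only.

```python
import re

subject = "This is an example for the meaning of the question mark symbol"

# Greedy: from the first "a", .* runs as far as possible, back to the LAST "r".
greedy = re.search(r"a.*r", subject).group()

# Lazy: from the first "a", .*? stops at the FIRST "r" it can reach.
lazy = re.search(r"a.*?r", subject).group()

print(greedy)  # an example for the meaning of the question mar
print(lazy)    # an example for
```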


                    • In the second S/R, the ^ assertion means beginning of line and the $ assertion means End of line

                    • The | symbol represents an alternation between two regexes. As it has the lowest priority level, you’ll need, sometimes, to enclose the two parts of an alternation, inside round brackets. For instance :

                      • The regex abc(123|789)xyz would match the strings abc123xyz OR abc789xyz

                      • The regex abc123|789xyz would match the strings abc123 OR 789xyz

                    • The round brackets generate a group that contains the regex ^.*?"ais":" ( i.e. all the text between the beginning of a line and the first "ais":" boundary )

                    • Finally, in the replacement part, the syntax (?1Sentence \: ) represents a conditional replacement. Its general form is (?nText if TRUE:Text if FALSE), with n as a digit, that means :

                      • If the group n EXISTS ( TRUE ), the text Text if TRUE is re-written

                      • If the group n does NOT exist ( FALSE ), the text Text if FALSE is re-written

                    So, in our example, the string Sentence : is added if the regex ^.*?"ais":" is matched and nothing is written when the regex "drg2":".*?$ is matched

                    • Note that the colon is escaped with the \ symbol, because it has a special meaning in the replacement part
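The conditional replacement can be imitated in Python, with one caveat : Python's `re` module has no `(?1...)` syntax in replacement strings, so this sketch swaps in a replacement function that tests whether group 1 took part in the match. The input `step1` is a made-up example of what the text might look like after the first S/R, with `\n` assumed as the line break.

```python
import re

# Hypothetical text after the FIRST S/R: only the first line still lacks
# the "Sentence : " prefix, and the outer boundaries remain.
step1 = ('junk"ais":"Text 1 to keep\n'
         'Sentence : Text 2 to keep\n'
         'Sentence : Text 3 to keep"drg2":"tail')

# Group 1 is the leading part; the second alternative has no group.
pattern = re.compile(r'(^.*?"ais":")|"drg2":".*?$', re.MULTILINE)

def conditional(match):
    # Equivalent of (?1Sentence \: ): write the prefix only if group 1 matched.
    return "Sentence : " if match.group(1) is not None else ""

result = pattern.sub(conditional, step1)
print(result)
```

Every line now starts with the prefix, including the first one, which the first S/R alone could not reach.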

                    As for your question, about several boundaries :

                    If I use the syntax Bn for a beginning boundary n, En for an ending boundary n, B0 for a single opening boundary and E0 for a single ending boundary, which kind of organization would you like ?

                    ......B1.......E1..................B2.........E2.........B1.........E1...........B3.......E3..........
                    

                    or perhaps :

                    ................B0.........E1...............B0............E2.........B0.................E3............
                    

                    or, maybe :

                    ........B1.............E0........B2......................E0..................B3...........E0..........
                    

                    Of course, if the following case were to happen :

                    ........B1...........B2.................E1...........E2................
                    

                    There’s an ambiguity and the regex would, probably, consider the range B1 - E1, and NOT the range B2 - E2 !!


                    So just tell me all the strings used as opening and ending boundaries, and their different locations, in the simple way shown above, and I’ll try to build the right regexes :-)
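Pending those details, one possible reading of the question is the second layout above : a single opening boundary "ais":" and several ending boundaries. A minimal Python sketch under that assumption follows. The regex with the alternation `(?:drg2|aim|bbk)` is a guess, not a tested Notepad++ answer, and the sample strings are made up. The last lines illustrate the ambiguous overlapping case with placeholder markers B1, B2, E1, E2.

```python
import re

# Assumed layout: B0 ... E1 ... B0 ... E2 ... B0 ... E3
text = ('x"ais":"first"drg2":"y'
        '"ais":"second"aim":"z'
        '"ais":"third"bbk":"w')

# One opening boundary, then the shortest text up to ANY of the ending ones.
pattern = re.compile(r'"ais":"(.*?)"(?:drg2|aim|bbk)":"')
kept = pattern.findall(text)
print(kept)

# Ambiguous case: B1 ... B2 ... E1 ... E2. The lazy match starts at B1
# and stops at the first ending marker it meets, so the range is B1 - E1.
overlap = re.findall(r'B\d(.*?)E\d', 'B1 one B2 two E1 three E2')
print(overlap)
```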

                    Cheers,

                    guy038

                    • dario33

                      Thanks so much for the fast clarification - I will try it out later. You really rock with this helpful code!!

                      • dario33

                        Hi guy038, this all works fine for me - thanks so much again! As for the several-boundaries issue, I have to think it over, because my explanation of it was wrong! I think this question is solved!
