
    Marked text manipulation

    General Discussion
    50 Posts 12 Posters 52.3k Views
    • Scott Sumner
      Scott Sumner @Suncatcher
      last edited by Scott Sumner

      @Suncatcher (& @guy038) :

      At one time I had a need to do what I think you want to do with the Copy part, and I accomplished it with a short PythonScript… I don’t know if you use PythonScript, but here’s how I did it (I don’t believe there is a way to do it natively with Notepad++):

      SCE_UNIVERSAL_FOUND_STYLE = 31  # N++ red-"mark" feature highlighting style indicator number
      accum_text = ''
      pos = 0
      while pos < editor.getTextLength():
          start = editor.indicatorStart(SCE_UNIVERSAL_FOUND_STYLE, pos)
          if start != 0:
              end = editor.indicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, pos)
              if editor.indicatorValueAt(SCE_UNIVERSAL_FOUND_STYLE, pos) != 0:
                  accum_text += editor.getTextRange(start, end) + '\r\n'  # assumes want each match on new line; assumes Windows line-endings
                  pos += (end - start) - 1
          pos += 1
      doc_length = editor.getTextLength()
      curr_pos = editor.getCurrentPos()
      editor.beginUndoAction()
      editor.insertText(curr_pos, accum_text)
      editor.setSel(curr_pos, curr_pos + editor.getTextLength() - doc_length)
      editor.endUndoAction()
      editor.copy()  # put results in clipboard
      editor.undo()
      

      I should also say that this presumes that you use Notepad++'s “Mark” feature to “red-mark” your desired matches.

      • guy038
        guy038
        last edited by guy038

        Hi, Suncatcher and Scott,

        UPDATE: a P.S. was added at the end of this post on 11/20/16

        Oh, Scott, now I see! I’ve just run your Python script, which worked fine! So, sorry, Suncatcher, for misunderstanding your problem! Anyway, my previous post could be useful to some regex “newbies”!

        Well, Scott, you know my “nature” by now, don’t you? So, as a challenge, I tried to find a regex equivalent to your script ;-) And I finally came up with the following S/R:

        SEARCH (?s)^.*?(Your regex to match)|(?s).*\z

        REPLACE (?1\1\r\n)


        IMPORTANT :

        • Replace the expression Your regex to match with any string or regex

        • If your regex contains the dot symbol and must match text on a single line only, you MUST add the modifier (?-s) BEFORE your regex, in order to cancel the (?s) modifier located at the beginning of the overall regex!


        NOTES :

        • The first part (?s)^.*? matches the shortest range of characters, possibly empty or spanning several lines, up to the text matched by YOUR regex

        • The second part (Your regex to match), enclosed in parentheses, is the text matched by YOUR regex, which is stored as group 1

        • The third and final part |(?s).*\z is a second alternative, which matches all the remaining text, possibly empty, up to the very end of the current file, once the first alternative can no longer match!

        • In the replacement, if group 1 exists (first alternative), this group is rewritten, followed by a line break (\r\n)

        • If group 1 does not exist (second alternative), nothing is written. So, as expected, the remaining text is deleted
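Outside Notepad++, the net effect of this S/R can be sketched in Python. This is an emulation under stated assumptions, not the Notepad++ (Boost) engine: Python’s `re` has no `(?1…)` conditional replacement, so the sketch simply concatenates each match followed by `\r\n`, which is what the replacement leaves behind once the text between and after the matches has been deleted. The function name `extract_matches` is hypothetical.

```python
import re


def extract_matches(text: str, pattern: str, sep: str = "\r\n") -> str:
    """Keep only the matches of `pattern`, each followed by `sep`.

    Emulates the net effect of  (?s)^.*?(pattern)|(?s).*\\z  ->  (?1\\1\\r\\n):
    the text before, between and after the matches is discarded.
    """
    return "".join(m.group(0) + sep for m in re.finditer(pattern, text))


# Python's "." does not match a newline by default, like (?-s) in Notepad++,
# so a.*j only matches within a single line (greedy: the longest such string).
sample = "Java and JSON\nnothing\nalpha jam, a jar\n"
print(repr(extract_matches(sample, r"(?i)a.*j")))
```

Because `.*` is greedy, the second line yields `alpha jam, a j` rather than `alpha j`, which is the “longest single-line strings” behaviour used in the example below.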


        EXAMPLE :

        • In the license.txt file, suppose that you would like to extract all the longest single-line strings matched by the regex (a.*j), case-insensitively (?i), and replace the entire contents of license.txt with these ranges of text only, one per line. If so:

        • Go back to the very beginning of your current file ( Ctrl + Home )

        • Open the Replace dialog ( Ctrl + H )

        • SEARCH (?si)^.*?(?-s)(a.*j)|(?s).*\z

        • REPLACE (?1\1\r\n)

        • Select the Regular expression search mode

        • Click on the Replace All button

        Et voilà !


        Remarks :

        • As said above, note that in this case the modifier (?-s) must be added in front of our regex a.*j

        • If you mark all the matches of the regex a.*j, case-insensitively, you get 6 matches

        • Using your script, Scott, or performing the regex S/R gives the same results :-))

        Best Regards,

        guy038

        P.S. :

        You must be aware of TWO points:

        • If your regex already contains one or more capturing groups, you MUST increase by ONE every group number you refer to (\1, \2, …) in your OWN search regex, because my global regex uses parentheses to define group 1!!

        For instance, suppose you’re searching for all consecutive duplicate lines in a sorted file, in order to extract them, with the usual regex (?-s)^(.+\R)\1+. Once inserted in the overall regex, the correct syntax is therefore:

        (?s)^.*?((?-s)^(.+\R)\2+)|(?s).*\z ( and NOT (?s)^.*?((?-s)^(.+\R)\1+)|(?s).*\z ! )

        • If your regex matches complete lines, with their EOL characters, you’ll probably want to change the replacement to (?1\1) ( instead of (?1\1\r\n) ), unless you prefer to separate the matches with a blank line!
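The group-renumbering rule can be checked with Python’s `re` module, used here only as a stand-in for Notepad++’s Boost engine (the two engines differ in details; Python has no `\R`, so `\n` is used instead). Wrapping the duplicate-lines regex in an outer group shifts its own group to number 2, so its back-reference must become `\2`. Python even refuses the unshifted form, because `\1` would then point at the still-open outer group; Notepad++ may behave differently there.

```python
import re

text = "a\nb\nb\nc\nd\nd\nd\n"

user_regex = r"^(.+\n)\1+"  # consecutive duplicate lines; (.+\n) is group 1
wrapped = r"(^(.+\n)\2+)"   # inside the outer group it becomes group 2

# Net effect of the replacement (?1\1): keep only the duplicate blocks.
kept = "".join(m.group(1) for m in re.finditer(wrapped, text, re.MULTILINE))
print(repr(kept))  # 'b\nb\nd\nd\nd\n'

# Wrapping WITHOUT renumbering: \1 now refers to the outer group, which is
# still open at that point, and Python's re rejects it at compile time.
try:
    re.compile("(" + user_regex + ")", re.MULTILINE)
    unshifted_rejected = False
except re.error:
    unshifted_rejected = True
print(unshifted_rejected)  # True
```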
        • Scott Sumner
          Scott Sumner @guy038
          last edited by

          @guy038

          Very nice regex solution to the problem. Apparently I was wrong when I stated “…there is (no) way to do it natively with Notepad++”. I always prefer a solution that doesn’t require anything “extra”; while PythonScript is a great thing, I do realize that not everyone is willing to take the plunge into using it.

          Sometimes I’m happy to be wrong! :-))

          • Suncatcher
            Suncatcher
            last edited by Suncatcher

            @PeterJones said:

            If you just want to copy and paste a single match

            The problem is that I want to copy not a single match but the whole set of matches. Here is a sample scenario:

            I entered a RegExp pattern which matched and highlighted multiple pieces of text. How do I copy all of them in one click?

            • guy038
              guy038
              last edited by guy038

              Hello, Suncatcher,

              Why don’t you use the Bookmark feature?

              So :

              • Go back to the very beginning of your list of IP addresses ( Ctrl + Home )

              • Open the Mark dialog ( Search > Mark… )

              • Enter your regex in the Find what: zone

              • Check the Bookmark line option

              • Click on the Mark All button

              • Select the menu option Search > Bookmark > Copy Bookmarked Lines

              • Open a new tab ( Ctrl + N )

              • Paste all the results ( Ctrl + V )

              Best Regards

              guy038

              • Scott Sumner
                Scott Sumner @Suncatcher
                last edited by

                @Suncatcher said:

                How do I copy all them in one click?

                In one click? Not possible [to the best of my knowledge…always have to add that disclaimer :-) ].

                The closest thing I can think of would be “one keypress”, which would involve tying the PythonScript I provided before to a key combination in the Shortcut Mapper, so that it gets executed when the key combination is pressed.

                • Suncatcher
                  Suncatcher @guy038
                  last edited by Suncatcher

                  @guy038 the bookmark feature copies the whole line, and I need only certain pieces of the line, as you can see from the screenshot.
                  If I used bookmarking, it would copy all the lines (because I have matches on each line), and that’s not what I want. I want just the highlighted pieces.

                  @Scott-Sumner could you provide a sample of such a script?

                  • Scott Sumner
                    Scott Sumner @Suncatcher
                    last edited by

                    @Suncatcher

                    In my immediately previous post, I said “the Pythonscript I provided before”. What I meant by that, if it was unclear, was to scroll up in this thread to a previous posting by me. It starts with “At one time I had a need to do” and contains the code I was talking about.

                    • guy038
                      guy038
                      last edited by guy038

                      Hi, Suncatcher,

                      OK, I see what you mean! But how would you like the extracted matches to be displayed, and which separator do you need between the different matches?

                      For instance, from your picture :

                      • In line 2, do you want : “134.170.110.” OR “134. 170. 110.” OR, even, the string “134 170 110”, without dot OR else ?

                      • In line 4, do you want : “157.240.” OR “157. 240.” OR some spaces, to indicate the missing second group as “157.          240.” OR else ?

                      • In line 13, do you want: “185.222.185.223.” OR, with a dash to show it’s from a range of IPv4 addresses, “185.222. - 185.223.” OR else?

                      See you soon,

                      Cheers,

                      guy038

                      • Suncatcher
                        Suncatcher @Scott Sumner
                        last edited by

                        @Scott-Sumner I beg your pardon! I am so inattentive. The script does exactly what I want. Thank you very much!

                        • Suncatcher
                          Suncatcher @guy038
                          last edited by

                          @guy038 Ideally I want the highlighted pieces to be copied exactly as they appear in the original text, with the same number of spaces as before as the separator.
                          For example, from line 2, this should be copied:

                           134.170.110.
                          

                          from line 12:

                           185.   220.
                          

                          from line 13:

                           185.   222.    185.   223.
                          
                          • guy038
                            guy038
                            last edited by guy038

                            Suncatcher,

                            From what you said, I used the following FOUR rules:

                            • If a line does NOT contain a string of THREE digits followed by a DOT, the whole line is deleted

                            • Any range of characters ENDING a line which does NOT contain a string of THREE digits followed by a DOT is deleted, too

                            • Any string of THREE digits followed by a DOT is kept UNCHANGED

                            • Any other single character is REPLACED by a single SPACE character


                            This leads to the following S/R, with the Regular expression search mode CHECKED :

                            SEARCH (?-s)^\R|(?!.*\d{3}\.).+|(\d{3}\.)|(.)

                            REPLACE (?1\1)(?2 )

                            For instance, from the example text below:

                            134.170.110.48
                            85.33.98.0 - 85.33.99.255
                            185.33.220.38
                            200.25.6.78
                            65.55.52.23
                            5.155.52.23
                            12.3.8.145
                            185.33.220.0 - 185.33.223.255
                            1.23.137.2
                            1.2.3.4
                            25.155.52.153
                            67.42.95.0 - 67.42.95.99
                            31.53.61.99 - 31.53.61.100
                            58.33.99.0 - 58.33.101.1
                            

                            We would get the following replaced text:

                            134.170.110.
                            185.   220.
                            200.
                              155.
                            185.   220.    185.   223.
                                 137.
                               155.
                                               101.
                            

                            As expected, some lines have been deleted:

                            • The lines without any three consecutive digits at all

                            • The lines with ONLY one block of three digits, at the END of the line, without a final DOT character
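The four rules can be sketched in Python as a per-line substitution that reuses the same alternation. Assumptions: the text is processed line by line, and lines that end up empty are dropped, which stands in for the `^\R` alternative (Python’s `re` does not reproduce Notepad++’s Replace All behaviour around `^`). The names `mask_line` and `mask_text` are hypothetical.

```python
import re

# One alternation, tried left to right at each position, as in the S/R:
#   1) rest of the line with no "ddd." block ahead  -> deleted
#   2) a "ddd." block (group 1)                      -> kept unchanged
#   3) any other single character (group 2)         -> one space
RULES = re.compile(r"(?!.*\d{3}\.).+|(\d{3}\.)|(.)")


def mask_line(line: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            return m.group(1)
        if m.group(2) is not None:
            return " "
        return ""
    return RULES.sub(repl, line)


def mask_text(text: str) -> list:
    # Dropping lines that end up empty plays the role of the ^\R alternative.
    return [out for out in map(mask_line, text.splitlines()) if out]


sample = """134.170.110.48
85.33.98.0 - 85.33.99.255
185.33.220.38
200.25.6.78
65.55.52.23
5.155.52.23
12.3.8.145
185.33.220.0 - 185.33.223.255
1.23.137.2
1.2.3.4
25.155.52.153
67.42.95.0 - 67.42.95.99
31.53.61.99 - 31.53.61.100
58.33.99.0 - 58.33.101.1
"""

for line in mask_text(sample):
    print(line)
```

As the thread goes on to show, this rule set is tied to the `\d{3}\.` pattern and does not carry over to other patterns unchanged.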

                            REMARK :

                            If you prefer to keep a blank line when a line contains NO block of three digits followed by a dot, just change the search regex to:

                            SEARCH (?-s)(?!.*\d{3}\.).+|(\d{3}\.)|(.)

                            This time, you would obtain :

                            134.170.110.
                            
                            185.   220.
                            200.
                            
                              155.
                            
                            185.   220.    185.   223.
                                 137.
                            
                               155.
                            
                            
                                               101.
                            

                            Best Regards,

                            guy038

                            • Suncatcher
                              Suncatcher
                              last edited by Suncatcher

                              @guy038 well, the RegExp approach is not bad, but it seems to be single-use. I tried to adapt it to another pattern, and here’s what I got.
                              I set a pattern consisting of a dot and two digits.

                              I rewrote your expression accordingly: (?-s)(?!.*\.\d{2}\.).+|(\.\d{2}\.)|(.)
                              And here’s what it output:

                              That is obviously not what is supposed to be there. To achieve our aim, the regexp would have to be rewritten every time. PythonScript seems to be a more universal and consistent approach.

                              • guy038
                                guy038
                                last edited by guy038

                                Hi, Suncatcher and Scott,

                                Ah… Of course, my previous regex was far too closely tied to your specific regex \d{3}\. . In addition, from what you said, I tried hard to keep the exact positions of the different matches!

                                So, I decided to run Scott’s script to see how it rewrites the different matches :-) BTW, Scott, very nice script for people who don’t want to worry about regex problems or details ;-))

                                So, starting from my original example text, below :

                                134.170.110.48
                                85.33.98.0 - 85.33.99.255
                                185.33.220.38
                                200.25.6.78
                                65.55.52.23
                                5.155.52.23
                                12.3.8.145
                                185.33.220.0 - 185.33.223.255
                                1.23.137.2
                                1.2.3.4
                                25.155.52.153
                                67.42.95.0 - 67.42.95.99
                                31.53.61.99 - 31.53.61.100
                                58.33.99.0 - 58.33.101.1
                                

                                Scott’s script put the following text in the clipboard:

                                .17
                                .11
                                .48
                                .33.98
                                .33.99.25
                                .33.22
                                .38
                                .25
                                .78
                                .55.52.23
                                .15
                                .52.23
                                .14
                                .33.22
                                .33.22
                                .25
                                .23.13
                                .15
                                .52.15
                                .42.95
                                .42.95.99
                                .53.61.99
                                .53.61.10
                                .33.99
                                .33.10
                                

                                From this output, we can deduce that the script follows TWO main rules:

                                • If two or more matches are adjacent, they are rewritten as a single unit, on the same line

                                • As soon as two consecutive matches are NOT adjacent, they are displayed on two consecutive lines

                                Quite different from before, isn’t it?!
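The two rules can be modelled in Python as a sketch: collect the spans of all matches, merge spans that touch, and output one merged run per line. This only emulates what Scott’s script sees (adjacent marked ranges form one continuous marked range); the function `marked_runs` is hypothetical.

```python
import re


def marked_runs(text: str, pattern: str) -> list:
    """Merge back-to-back matches into one run; a gap starts a new run."""
    runs = []  # [start, end] pairs
    for m in re.finditer(pattern, text):
        if runs and runs[-1][1] == m.start():
            runs[-1][1] = m.end()  # touches the previous match: extend the run
        else:
            runs.append([m.start(), m.end()])  # gap before it: start a new run
    return [text[start:end] for start, end in runs]


sample = "134.170.110.48\n85.33.98.0 - 85.33.99.255\n"
for run in marked_runs(sample, r"\.\d{2}"):
    print(run)
```

On these first two lines of the example it reproduces the first five lines of the clipboard output shown above.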


                                I then realized that you can start from my general regex S/R, given in my THIRD post on this topic, which is:

                                SEARCH (?s)^.*?(Your regex to match)|(?s).*\z

                                REPLACE (?1\1\r\n)

                                With a minor modification, the new general S/R below produces the same output layout as Scott’s script :-))

                                SEARCH (?s)^.*?((?:Your regex to match)+)|(?s).*\z ( GENERAL regex syntax )

                                REPLACE (?1\1\r\n)

                                So, if we insert your second regex \.\d{2} into the general regex syntax above, we obtain the practical S/R:

                                SEARCH (?s)^.*?((?:\.\d{2})+)|(?s).*\z

                                REPLACE (?1\1\r\n)

                                which, after replacement, gives the expected output text, identical to the output of Scott’s script!


                                NOTES :

                                • Compared to my THIRD post, the second part of this new general regex has changed into ((?:Your regex to match)+):

                                  • It represents any run of consecutive, adjacent matches of your regex, which is stored as group 1 and output on a single line, followed by a line break

                                  • The inner parentheses, (?:Your regex to match), form a non-capturing group containing a single match of your regex

                                • The IMPORTANT and P.S. sections of my THIRD post are still relevant
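A Python sketch of this new general S/R, again an emulation rather than Notepad++ itself: wrapping the user regex in `(?:…)+` lets each match swallow a whole run of back-to-back occurrences, and each run is then written followed by `\r\n`. The helper `extract_runs` is hypothetical.

```python
import re


def extract_runs(text: str, pattern: str, sep: str = "\r\n") -> str:
    # (?:pattern)+ consumes a whole run of adjacent matches in one go,
    # mirroring ((?:Your regex to match)+) in the general S/R.
    run = re.compile("(?:" + pattern + ")+")
    return "".join(m.group(0) + sep for m in run.finditer(text))


sample = "134.170.110.48\n85.33.98.0 - 85.33.99.255\n"
print(repr(extract_runs(sample, r"\.\d{2}")))
print(repr(extract_runs("134.170.110.48\n", r"\d\.\d")))
```

The second call matches the first three lines of the \d\.\d result given in the P.S. below.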

                                Best Regards,

                                guy038

                                P.S. :

                                I did other tests, with your first regex \d{3}\. and also with the simple regex \d\.\d, leading to the following appropriate S/R:

                                SEARCH (?s)^.*?((?:\d\.\d)+)|(?s).*\z

                                REPLACE (?1\1\r\n)

                                After replacement, the resulting text is, as expected :

                                4.1
                                0.1
                                0.4
                                5.33.98.0
                                5.33.99.2
                                5.33.2
                                0.3
                                0.25.6
                                5.55.52.2
                                5.1
                                5.52.2
                                2.3
                                8.1
                                5.33.2
                                0.0
                                5.33.2
                                3.2
                                1.23.1
                                7.2
                                1.2
                                3.4
                                5.1
                                5.52.1
                                7.42.95.0
                                7.42.95.9
                                1.53.61.9
                                1.53.61.1
                                8.33.99.0
                                8.33.1
                                1.1
                                

                                and corresponds exactly to the text generated by Scott’s script :-))

                                • Scott Sumner
                                  Scott Sumner
                                  last edited by

                                  So I’m currently trying to copy some multi-line (red-)marked text out of a large (~70 MB) file, and my PythonScript technique for doing so (see my earlier post in this thread) works but is super slow on a large file; it iterates through the file one position at a time (pos += 1). Is there a faster way to code it, given the functions at our disposal? @Claudia-Frank, ideas? :-)

                                  • dail
                                    dail @Scott Sumner
                                    last edited by dail

                                    @Scott-Sumner

                                    See next post, this is wrong!

                                    AFAIK, using indicatorStart() and indicatorEnd() is quite efficient for finding marked locations. The code you posted above doesn’t seem to use them as efficiently as it could. I have no way of testing the following code, but you should be able to do something like this:

                                    start = 0
                                    end = 0
                                    while True:
                                    	start = editor.indicatorStart(SCE_UNIVERSAL_FOUND_STYLE, end)
                                    	if start == 0:
                                    		break
                                    	end = editor.indicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, start)
                                    	accum_text += editor.getTextRange(start, end) + '\r\n'
                                    

                                    Again, this hasn’t been tested, so there may be corner cases you need to check for (such as using start + 1 when calling indicatorEnd()), but this is the gist of it.

                                    • dail
                                      dail @Scott Sumner
                                      last edited by dail

                                      @Scott-Sumner

                                      Oops, sorry about the above, it was way off. Here is a small LuaScript which works (I’m sure you can easily translate it into Python):

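                                      SCE_UNIVERSAL_FOUND_STYLE = 31  -- indicator number used by N++'s red "Mark"
                                      start = editor:IndicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, -1)
                                      while start ~= 0 and start ~= editor.Length do
                                      	-- end of the marked run ("endd" because "end" is a Lua keyword)
                                      	endd = editor:IndicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, start)
                                      	print(editor:textrange(start, endd))
                                      	-- end of the following unmarked run = start of the next marked run
                                      	start = editor:IndicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, endd)
                                      end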
                                      

                                      Note: the one major bug I know of is that the result is incorrect if the very first character of the file is marked.
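A possible Python translation, sketched under assumptions: it uses only the `editor` calls already seen in this thread (`getTextLength`, `getTextRange`, `indicatorValueAt`, `indicatorEnd`), and it handles the first-character case by checking `indicatorValueAt(…, 0)` before the loop. Since it cannot run outside Notepad++ as is, a hypothetical `FakeEditor` class models the indicator behaviour described in this thread (indicatorEnd on an unmarked position returns where the next marked run starts), so the loop can be exercised on plain strings.

```python
# Hypothetical stand-in for PythonScript's `editor`, modelling only the calls
# used below. Inside Notepad++ you would pass the real `editor` instead.
class FakeEditor:
    def __init__(self, text, marked_positions):
        self.text = text
        self.marked = set(marked_positions)

    def getTextLength(self):
        return len(self.text)

    def getTextRange(self, start, end):
        return self.text[start:end]

    def indicatorValueAt(self, indicator, pos):
        # The fake ignores the indicator number: 1 = marked, 0 = not marked.
        return 1 if pos in self.marked else 0

    def indicatorEnd(self, indicator, pos):
        # End of the run (marked or unmarked) that contains pos.
        value = self.indicatorValueAt(indicator, pos)
        end = pos
        while end < len(self.text) and self.indicatorValueAt(indicator, end) == value:
            end += 1
        return end


SCE_UNIVERSAL_FOUND_STYLE = 31  # N++ red-"mark" indicator number


def collect_marked(editor, indicator=SCE_UNIVERSAL_FOUND_STYLE):
    """Return the text of every marked run, jumping from run to run
    instead of visiting each position."""
    length = editor.getTextLength()
    pieces = []
    # If position 0 is marked, the first run starts at 0 (the case the Lua
    # version gets wrong); otherwise the end of the leading unmarked run is
    # where the first marked run begins.
    if editor.indicatorValueAt(indicator, 0):
        start = 0
    else:
        start = editor.indicatorEnd(indicator, 0)
    while start < length:
        end = editor.indicatorEnd(indicator, start)  # end of this marked run
        if end <= start:
            break  # defensive: stop if indicatorEnd does not advance
        pieces.append(editor.getTextRange(start, end))
        start = editor.indicatorEnd(indicator, end)  # skip the next unmarked run
    return pieces


print(collect_marked(FakeEditor("abcdefghij", {0, 1, 4, 5, 6, 9})))  # ['ab', 'efg', 'j']
```

Inside Notepad++ the call would be `collect_marked(editor)`; the pieces could then be joined with `'\r\n'` and put on the clipboard, for example with the insert/copy/undo trick from the first script.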

                                      • Claudia Frank
                                        Claudia Frank @dail
                                        last edited by

                                        @dail, @Scott-Sumner

                                        This is strange, isn’t it? You have to use IndicatorEnd to find the start position, but it is what it is…

                                        Cheers
                                        Claudia

                                        • dail
                                          dail @Claudia Frank
                                          last edited by

                                          @Claudia-Frank

                                          Yeah, I ran into this as well when modifying my DoxyIt plugin… The way I’ve come to think of it is that it finds the end of the range that contains pos. And technically a range that is not marked has an end… which is the start of the range you want… oh well :)

                                          • Claudia Frank
                                            Claudia Frank @dail
                                            last edited by Claudia Frank

                                            @dail

                                            yeah :-) sounds… logical… somehow… still confusing :-)
                                            And what makes it even more confusing, as you already said, is that if you do
                                            editor.indicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, -1) you will get the end position.
                                            Aahhhh :-D

                                            What I meant concerns your note:

                                            Note: The one major initial bug I know if is that it is incorrect if the very first character of the file is marked.

                                            Cheers
                                            Claudia

                                            The Community of users of the Notepad++ text editor.
                                            Powered by NodeBB | Contributors