• Login
Community
  • Login

Marked text manipulation

Scheduled Pinned Locked Moved General Discussion
regexmarkregexdelete textcopy
50 Posts 12 Posters 52.3k Views
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • G
    guy038
    last edited by guy038 Dec 10, 2016, 11:24 AM Dec 10, 2016, 11:20 AM

    Hello, Suncatcher,

    Why don’t you use the Bookmark feature ?

    So :

    • Go back to the very beginning of your list of IP addresses ( Ctrl + Origin )

    • Open the Mark dialog ( Search > Mark… )

    • Enter your regex in the Find what: zone

    • Check the Bookmark line option

    • Click of the Mark All button

    • Select the menu option Search > Bookmark > Copy Bookmarked Lines

    • Open a new tab ( Ctrl + N )

    • Paste all the results ( Ctrl + V )

    Best Regards

    guy038

    S 1 Reply Last reply Dec 10, 2016, 2:05 PM Reply Quote 0
    • S
      Scott Sumner @Suncatcher
      last edited by Dec 10, 2016, 1:25 PM

      @Suncatcher said:

      How do I copy all them in one click?

      In one click? Not possible [to the best of my knowledge…always have to add that disclaimer :-) ].

      The closest thing I can think of would be “one keypress”, which would involve tying the Pythonscript I provided before to a shortcut-mapper key-combination, so that it gets executed when the key-combo is pressed.

      1 Reply Last reply Reply Quote 0
      • S
        Suncatcher @guy038
        last edited by Suncatcher Dec 10, 2016, 2:06 PM Dec 10, 2016, 2:05 PM

        @guy038 bookmark feature copies the whole line, and I need only the certain pieces of line, as you can see from the screenshot.
        If I used bookmarking it’ll copy all lines (because I have matches on each line) and that’s not what I want. I want just highlighted pieces.

        @Scott-Sumner could you provide the sample of such script?

        S 1 Reply Last reply Dec 10, 2016, 5:37 PM Reply Quote 0
        • S
          Scott Sumner @Suncatcher
          last edited by Dec 10, 2016, 5:37 PM

          @Suncatcher

          In my immediately previous post, I said “the Pythonscript I provided before”. What I meant by that, if it was unclear, was to scroll up in this thread to a previous posting by me. It starts with “At one time I had a need to do” and contains the code I was talking about.

          S 1 Reply Last reply Dec 11, 2016, 12:54 PM Reply Quote 0
          • G
            guy038
            last edited by guy038 Dec 10, 2016, 9:19 PM Dec 10, 2016, 9:17 PM

            Hi, Suncatcher,

            OK, I see what you mean ! But, how would you like that the extracted matches were displayed and which separator do you need between these different matches ?

            For instance, from your picture :

            • In line 2, do you want : “134.170.110.” OR “134. 170. 110.” OR, even, the string “134 170 110”, without dot OR else ?

            • In line 4, do you want : “157.240.” OR “157. 240.” OR some spaces, to indicate the missing second group as “157.          240.” OR else ?

            • In line 13 do you want: “185.222.185.223.” OR with a dash to show it’s from a range of IPV4 addresses, as “185.222. - 185.223.” OR else ?

            See you soon,

            Cheers,

            guy038

            S 1 Reply Last reply Dec 11, 2016, 1:40 PM Reply Quote 0
            • S
              Suncatcher @Scott Sumner
              last edited by Dec 11, 2016, 12:54 PM

              @Scott-Sumner I beg my pardon! I am so inattentive. The script makes exactly what I want. Thank you a lot!

              1 Reply Last reply Reply Quote 0
              • S
                Suncatcher @guy038
                last edited by Dec 11, 2016, 1:40 PM

                @guy038 Ideally I want the highlighted pieces to be copied exactly like they are placed on initial page, with the exact amount of spaces as before as a separator.
                For example, from line 2 should be copied:

                 134.170.110.
                

                from line 12:

                 185.   220.
                

                from line 13:

                 185.   222.    185.   223.
                
                1 Reply Last reply Reply Quote 0
                • G
                  guy038
                  last edited by guy038 Dec 11, 2016, 4:42 PM Dec 11, 2016, 4:37 PM

                  Suncatcher,

                  From what you said, I used the FOUR following rules :

                  • If a line does NOT contain a string of THREE digits, followed by a DOT, this line is completely deleted

                  • Any range of characters, ENDING a line, which does NOT contain a string of THREE digits, followed by a DOT, is deleted, too

                  • Any string of THREE digits, followed by a DOT, is UNCHANGED

                  • Any other single character is REPLACED by a single SPACE character


                  This leads to the following S/R, with the Regular expression search mode CHECKED :

                  SEARCH (?-s)^\R|(?!.*\d{3}\.).+|(\d{3}\.)|(.)

                  REPLACE (?1\1)(?2 )

                  For instance, from this example text, below :

                  134.170.110.48
                  85.33.98.0 - 85.33.99.255
                  185.33.220.38
                  200.25.6.78
                  65.55.52.23
                  5.155.52.23
                  12.3.8.145
                  185.33.220.0 - 185.33.223.255
                  1.23.137.2
                  1.2.3.4
                  25.155.52.153
                  67.42.95.0 - 67.42.95.99
                  31.53.61.99 - 31.53.61.100
                  58.33.99.0 - 58.33.101.1
                  

                  We would get the replaced text, below :

                  134.170.110.
                  185.   220.
                  200.
                    155.
                  185.   220.    185.   223.
                       137.
                     155.
                                     101.
                  

                  As you may notice, as expected, some lines have been deleted :

                  • The lines without any three consecutive digits, at all

                  • The lines with, ONLY, one block of three digits, at the END of a line, without the final DOT character

                  REMARK :

                  If you prefer to keep a blank line, in case NO block of three consecutive digits exists, in a line, just change the search regex to :

                  SEARCH (?-s)(?!.*\d{3}\.).+|(\d{3}\.)|(.)

                  This time, you would obtain :

                  134.170.110.
                  
                  185.   220.
                  200.
                  
                    155.
                  
                  185.   220.    185.   223.
                       137.
                  
                     155.
                  
                  
                                     101.
                  

                  Best Regards,

                  guy038

                  1 Reply Last reply Reply Quote 0
                  • S
                    Suncatcher
                    last edited by Suncatcher Dec 11, 2016, 6:29 PM Dec 11, 2016, 6:27 PM

                    @guy038 well, RegExp approach is not bad but seems to be disposable. I tried to adapt it to another pattern and that’s what I’ve got.
                    I set the pattern which consists of dot and two digits.

                    And rewrote your expressions accordingly - (?-s)(?!.*\.\d{2}\.).+|(\.\d{2}\.)|(.)
                    And that’s what it output to me

                    And that is obviously not the thing supposed to be there. To achieve our aim regexp should be rewritten every time. Pythonscript seems to be more universal and consistent approach.

                    1 Reply Last reply Reply Quote 0
                    • G
                      guy038
                      last edited by guy038 Dec 13, 2016, 6:45 PM Dec 13, 2016, 12:47 PM

                      Hi, Suncatcher and Scott,

                      Ah… Of course, my previous regex was much to closed to, your specific regex \d{3}\. . In addition, I tried, hard, from what you said, to keep the exact position where the different matches were !

                      So, I decided to run the Scott script to see how this script re-writes the different matches :-) BTW, Scott, very nice script for people, who does not worry about regex problems or details ;-))

                      So, starting from my original example text, below :

                      134.170.110.48
                      85.33.98.0 - 85.33.99.255
                      185.33.220.38
                      200.25.6.78
                      65.55.52.23
                      5.155.52.23
                      12.3.8.145
                      185.33.220.0 - 185.33.223.255
                      1.23.137.2
                      1.2.3.4
                      25.155.52.153
                      67.42.95.0 - 67.42.95.99
                      31.53.61.99 - 31.53.61.100
                      58.33.99.0 - 58.33.101.1
                      

                      The Scott’s script, put the following text, in the clipboard, in that form :

                      .17
                      .11
                      .48
                      .33.98
                      .33.99.25
                      .33.22
                      .38
                      .25
                      .78
                      .55.52.23
                      .15
                      .52.23
                      .14
                      .33.22
                      .33.22
                      .25
                      .23.13
                      .15
                      .52.15
                      .42.95
                      .42.95.99
                      .53.61.99
                      .53.61.10
                      .33.99
                      .33.10
                      

                      From this modified text, we are able to deduce that the script follows TWO main rules :

                      • If two or more matches are adjacent, there are rewritten as a single unit, in a same line

                      • As soon as two consecutive matches, are NON adjacent, there are displayed in two consecutive lines

                      Quite different than before, isn’t it ?!


                      I, then, realized that you can start, with my general regex S/R , exposed in my THIRD post, on that topic, which is :

                      SEARCH (?s)^.*?(Your regex to match)|(?s).*\z

                      REPLACE (?1\1\r\n)

                      With a minor modification, this new general S/R, below, will adopt the same output displaying, than the Scott’s script :-))

                      SEARCH (?s)^.*?((?:Your regex to match)+)|(?s).*\z ( GENERAL regex syntax )

                      REPLACE (?1\1\r\n)

                      So, if we use your second regex \.\d{2}, that we insert in the general regex syntax, above, we obtain the practical S/R:

                      SEARCH (?s)^.*?((?:\.\d{2})+)|(?s).*\z

                      REPLACE (?1\1\r\n)

                      which gives, after replacement, the expected output text, identical to Scott’s script one !


                      NOTES :

                      • Compared to my THIRD post, the second part, of this new general regex, has changed into : ((?:Your regex to match)+) :

                        • It represents any consecutive and adjacent matches of your regex, which is stored as group 1 and output on a single line, followed by a line break

                        • The inner parentheses , (?:Your regex to match) , stands for a non-capturing group, containing a single match of your regex

                      • The IMPORTANT and P.S. sections, of my THIRD post, are still pertinent

                      Best Regards,

                      guy038

                      P.S. :

                      I did an other tests, with your first regex \d{3}\. and, also, with the simple regex \d\.\d, leading to the appropriate following S/R :

                      SEARCH (?s)^.*?((?:\d\.\d)+)|(?s).*\z

                      REPLACE (?1\1\r\n)

                      After replacement, the resulting text is, as expected :

                      4.1
                      0.1
                      0.4
                      5.33.98.0
                      5.33.99.2
                      5.33.2
                      0.3
                      0.25.6
                      5.55.52.2
                      5.1
                      5.52.2
                      2.3
                      8.1
                      5.33.2
                      0.0
                      5.33.2
                      3.2
                      1.23.1
                      7.2
                      1.2
                      3.4
                      5.1
                      5.52.1
                      7.42.95.0
                      7.42.95.9
                      1.53.61.9
                      1.53.61.1
                      8.33.99.0
                      8.33.1
                      1.1
                      

                      and corresponds, exactly, to the text, generated by the Scott’s script :-))

                      1 Reply Last reply Reply Quote 0
                      • S
                        Scott Sumner
                        last edited by May 31, 2017, 3:07 PM

                        So I’m currently trying to copy some multi-line (red)marked text out of a large (~70MB) file, and my Pythonscript technique for doing so (see earlier posting in this thread) works but is super-slow on a large file; it iterates through the file one position at a time (pos += 1). Is there a faster way to code it, given the functions we have at our disposal for doing this? @Claudia-Frank , ideas? :-)

                        dailD 2 Replies Last reply May 31, 2017, 5:17 PM Reply Quote 0
                        • dailD
                          dail @Scott Sumner
                          last edited by dail May 31, 2017, 5:49 PM May 31, 2017, 5:17 PM

                          @Scott-Sumner

                          See next post, this is wrong!

                          AFAIK using indicatorStart() and indicatorEnd() is quite efficient finding marked locations. The code you posted above doesn’t seem to be utilizing this as efficiently as it could. I have no way of testing this following code but you should be able to do something like this:

                          start = 0
                          end = 0
                          while True:
                          	start = editor.indicatorStart(SCE_UNIVERSAL_FOUND_STYLE, end)
                          	if start == 0:
                          		break
                          	end = editor.indicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, start)
                          	accum_text += editor.getTextRange(start, end) + '\r\n'
                          

                          Again this hasn’t been tested so there may be corner cases you need to check for…such as using start + 1 when calling indicatorEnd() but this is the gist of it.

                          1 Reply Last reply Reply Quote 0
                          • dailD
                            dail @Scott Sumner
                            last edited by dail May 31, 2017, 5:56 PM May 31, 2017, 5:47 PM

                            @Scott-Sumner

                            Woops sorry about the above, it was way off. Here is a small LuaScript which works (I’m sure you can easily translate it into Python)

                            SCE_UNIVERSAL_FOUND_STYLE = 31
                            start = editor:IndicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, -1)
                            while start ~= 0 and start ~= editor.Length do
                            	endd = editor:IndicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, start)
                            	print(editor:textrange(start, endd))
                            	start = editor:IndicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, endd)
                            end
                            

                            Note: The one major initial bug I know if is that it is incorrect if the very first character of the file is marked.

                            Claudia FrankC 1 Reply Last reply May 31, 2017, 5:53 PM Reply Quote 2
                            • Claudia FrankC
                              Claudia Frank @dail
                              last edited by May 31, 2017, 5:53 PM

                              @dail, @Scott-Sumner

                              This is strange, isn’t it? You have to use IndicatorEnd to find the start position but it is like it is…

                              Cheers
                              Claudia

                              dailD 1 Reply Last reply May 31, 2017, 5:55 PM Reply Quote 1
                              • dailD
                                dail @Claudia Frank
                                last edited by May 31, 2017, 5:55 PM

                                @Claudia-Frank

                                Yeah I ran into this as well when modifying my DoxyIt plugin…the way I came to think of it now is that it finds the end of the range you specify by pos. And technically a range that is not marked has an end…which is the start of the range you want…oh well :)

                                Claudia FrankC 1 Reply Last reply May 31, 2017, 6:01 PM Reply Quote 0
                                • Claudia FrankC
                                  Claudia Frank @dail
                                  last edited by Claudia Frank May 31, 2017, 6:03 PM May 31, 2017, 6:01 PM

                                  @dail

                                  yeah, :-) sounds … logical … some how . … still confusing :-)
                                  And what makes it confusing even more, what you already said, is, that if you do
                                  editor.indicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, -1) you will get the end position.
                                  Aahhhh :-D

                                  What I meant is about

                                  Note: The one major initial bug I know if is that it is incorrect if the very first character of the file is marked.

                                  Cheers
                                  Claudia

                                  1 Reply Last reply Reply Quote 0
                                  • S
                                    Scott Sumner
                                    last edited by May 31, 2017, 7:19 PM

                                    @dail , @Claudia-Frank :

                                    Thanks for your inputs, I used the basic ideas but came up with my own Pythonscript version that is much faster than my original PS version on large files, and seems to correctly handle the oddities of the editor.indicatorEnd() function previously mentioned.

                                    So here is RedmarkedTextToClipboard2.py:

                                    def RTTC2__main():
                                    
                                        SCE_UNIVERSAL_FOUND_STYLE = 31  # N++ red-"mark" feature highlighting style indicator number
                                        ind_end_ret_vals_list = []
                                        ierv = 0
                                        while True:
                                            ierv = editor.indicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, ierv)
                                            # editor.indicatorEnd() returns 0 if no redmarked text exists
                                            # editor.indicatorEnd() returns last pos in file if no more redmarked text beyond the 'ierv' argument value
                                            if ierv == 0 or len(ind_end_ret_vals_list) > 0 and ierv == ind_end_ret_vals_list[-1]: break
                                            ind_end_ret_vals_list.append(ierv)
                                        if len(ind_end_ret_vals_list) > 0:
                                            if editor.indicatorValueAt(SCE_UNIVERSAL_FOUND_STYLE, 0) == 1:
                                                # compensate for weirdness with editor.indicatorEnd() when a match starts at the zero position
                                                zero = 0; ind_end_ret_vals_list.insert(0, zero)  # insert at BEGINNING of list
                                            if editor.indicatorValueAt(SCE_UNIVERSAL_FOUND_STYLE, ind_end_ret_vals_list[-1]) == 0:
                                                # remove end-of-file position unless it is part of the match
                                                ind_end_ret_vals_list.pop()
                                        start_end_pos_tup_list = zip(*[iter(ind_end_ret_vals_list)]*2)  # see https://stackoverflow.com/questions/14902686/turn-flat-list-into-two-tuples
                                        accum_text = ''
                                        for (start_pos, end_pos) in start_end_pos_tup_list:
                                            accum_text += editor.getTextRange(start_pos, end_pos) + '\r\n'
                                        if len(accum_text) > 0: editor.copyText(accum_text)  # put results in clipboard
                                    
                                    RTTC2__main()
                                    
                                    Claudia FrankC 1 Reply Last reply May 31, 2017, 11:48 PM Reply Quote 1
                                    • Claudia FrankC
                                      Claudia Frank @Scott Sumner
                                      last edited by May 31, 2017, 11:48 PM

                                      @Scott-Sumner

                                      Hi Scott,
                                      a nice one - good performance improvement. :-)

                                      If you are still looking for performance increase,
                                      a general suggestion would be to use as less global objects as possible within a loop as
                                      the cost of loading global is expensive.
                                      Cache global objects at the beginning of the script.
                                      Creating the tuple list from the beginning should be faster than creating from a flat list.
                                      Meaning do your two indicatorEnd calls and create a tuple from the results which than
                                      gets added to a list.
                                      Maybe a list comprehension to create the accum_text is faster as well - but not really sure
                                      as it would need to call the global object.
                                      All in all I assume this might make it faster up to 3-5%, not sure if it is worth thinking about it.

                                      Cheers
                                      Claudia

                                      S 1 Reply Last reply Jun 1, 2017, 12:17 AM Reply Quote 0
                                      • S
                                        Scott Sumner @Claudia Frank
                                        last edited by Scott Sumner Jun 1, 2017, 12:18 AM Jun 1, 2017, 12:17 AM

                                        @Claudia-Frank

                                        not sure if it is worth thinking about it.

                                        I should have posted my before-and-after timing, but really, the “before” was “forever” on my 70MB data file! The “after” was extremely quick, certainly on par with how long it took Notepad++'s Mark feature to redmark my desired text. Therefore performance was rated “very acceptable” for the new version. And that’s really all the performance I care about, so further optimizations aren’t worth it to me. Probably some of those optimizations you suggest would make the code less readable, too, so I’m definitely not wanting to go there (although now I leave myself open to comments on how readable/unreadable the existing code is). :-D

                                        I probably would have written this better the first time around if how these “indicator” functions worked was better documented!

                                        Until Notepad++ natively allows a non-destructive (@guy038’s regex method is destructive…but there is UNDO…hmmm) copy of all regex-matched text, this little script will serve me nicely, now on all files big and small.

                                        1 Reply Last reply Reply Quote 1
                                        • Maria DollM
                                          Maria Doll
                                          last edited by dail Jun 15, 2017, 12:33 PM Jun 15, 2017, 6:19 AM

                                          I have read almost all post but i did not know exactly, what was the problem … however i am continue read this forum and know the new things …[Dissertation Proposal Writing Service](LINK REMOVED)

                                          1 Reply Last reply Reply Quote -2
                                          • First post
                                            Last post
                                          The Community of users of the Notepad++ text editor.
                                          Powered by NodeBB | Contributors