Marked text manipulation

  • So I’m currently trying to copy some multi-line (red)marked text out of a large (~70MB) file, and my Pythonscript technique for doing so (see earlier posting in this thread) works but is super-slow on a large file; it iterates through the file one position at a time (pos += 1). Is there a faster way to code it, given the functions we have at our disposal for doing this? @Claudia-Frank , ideas? :-)

  • @Scott-Sumner

    See next post, this is wrong!

    AFAIK, using indicatorStart() and indicatorEnd() is quite efficient at finding marked locations. The code you posted above doesn’t seem to use them as efficiently as it could. I have no way of testing the following code, but you should be able to do something like this:

    start = 0
    end = 0
    accum_text = ''
    while True:
    	start = editor.indicatorStart(SCE_UNIVERSAL_FOUND_STYLE, end)
    	if start == 0:
    		break
    	end = editor.indicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, start)
    	accum_text += editor.getTextRange(start, end) + '\r\n'

    Again this hasn’t been tested so there may be corner cases you need to check for…such as using start + 1 when calling indicatorEnd() but this is the gist of it.

  • @Scott-Sumner

    Whoops, sorry about the above, it was way off. Here is a small LuaScript that works (I’m sure you can easily translate it into Python):

    start = editor:IndicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, -1)
    while start ~= 0 and start ~= editor.Length do
    	endd = editor:IndicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, start)
    	print(editor:textrange(start, endd))
    	start = editor:IndicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, endd)
    end

    Note: The one major initial bug I know of is that it is incorrect if the very first character of the file is marked.

  • @dail, @Scott-Sumner

    This is strange, isn’t it? You have to use IndicatorEnd to find the start position but it is like it is…


  • @Claudia-Frank

    Yeah I ran into this as well when modifying my DoxyIt plugin…the way I came to think of it now is that it finds the end of the range you specify by pos. And technically a range that is not marked has an end…which is the start of the range you want…oh well :)
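
    The behaviour described above can be illustrated with a toy model. This is only a sketch, not Scintilla's actual implementation: run_end and the per-character marks list are hypothetical names, and the quirks mentioned in this thread (a return value of 0 when nothing is marked, the -1 argument) are not modelled.

```python
def run_end(marks, pos):
    """Return the end of the run (marked or unmarked) that contains pos.

    marks is a hypothetical list of booleans, one per character.
    """
    n = len(marks)
    pos = max(pos, 0)
    if pos >= n:
        return n
    value = marks[pos]
    while pos < n and marks[pos] == value:
        pos += 1
    return pos


# Upper-case letters stand in for marked text in this toy example.
text = "abCDEfgHIj"
marks = [ch.isupper() for ch in text]

# Walk the document by repeatedly asking for the end of the current run.
boundaries = []
pos = 0
while pos < len(marks):
    pos = run_end(marks, pos)
    boundaries.append(pos)

# Because the text starts unmarked, the boundaries alternate:
# start of a marked run, end of that run, start of the next one, ...
marked = [text[s:e] for s, e in zip(boundaries[0::2], boundaries[1::2])]
print(boundaries, marked)
```

    In this model the end of an unmarked run is the start of the next marked run, which is why an "end" function can be used to find start positions. If the very first character were marked, the alternation would be shifted by one, which matches the bug noted earlier.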

  • @dail

    Yeah :-) sounds … logical … somehow … but it is still confusing :-)
    And what makes it even more confusing, as you already said, is that if you call
    editor.indicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, -1) you get the end position.
    Aahhhh :-D

    What I meant is about

    Note: The one major initial bug I know of is that it is incorrect if the very first character of the file is marked.


  • @dail , @Claudia-Frank :

    Thanks for your inputs, I used the basic ideas but came up with my own Pythonscript version that is much faster than my original PS version on large files, and seems to correctly handle the oddities of the editor.indicatorEnd() function previously mentioned.

    So here it is:

    def RTTC2__main():
        SCE_UNIVERSAL_FOUND_STYLE = 31  # N++ red-"mark" feature highlighting style indicator number
        ind_end_ret_vals_list = []
        ierv = 0
        while True:
            ierv = editor.indicatorEnd(SCE_UNIVERSAL_FOUND_STYLE, ierv)
            # editor.indicatorEnd() returns 0 if no redmarked text exists
            # editor.indicatorEnd() returns last pos in file if no more redmarked text beyond the 'ierv' argument value
            if ierv == 0 or len(ind_end_ret_vals_list) > 0 and ierv == ind_end_ret_vals_list[-1]: break
            ind_end_ret_vals_list.append(ierv)  # remember every boundary position found
        if len(ind_end_ret_vals_list) > 0:
            if editor.indicatorValueAt(SCE_UNIVERSAL_FOUND_STYLE, 0) == 1:
                # compensate for weirdness with editor.indicatorEnd() when a match starts at the zero position
                zero = 0; ind_end_ret_vals_list.insert(0, zero)  # insert at BEGINNING of list
            if editor.indicatorValueAt(SCE_UNIVERSAL_FOUND_STYLE, ind_end_ret_vals_list[-1]) == 0:
                # remove end-of-file position unless it is part of the match
                ind_end_ret_vals_list.pop()
            start_end_pos_tup_list = zip(*[iter(ind_end_ret_vals_list)]*2)  # pair consecutive values into (start, end) tuples
            accum_text = ''
            for (start_pos, end_pos) in start_end_pos_tup_list:
                accum_text += editor.getTextRange(start_pos, end_pos) + '\r\n'
            if len(accum_text) > 0: editor.copyText(accum_text)  # put results in clipboard

    RTTC2__main()
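
    For readers unfamiliar with the zip(*[iter(...)]*2) idiom used in the script, here is a minimal standalone sketch. The boundary values are made-up sample data, not output from a real file.

```python
# Hypothetical flat list of boundary positions: start, end, start, end, ...
boundaries = [2, 5, 7, 9]

# [iter(boundaries)] * 2 is a list holding the SAME iterator twice, so zip()
# pulls two consecutive items from it for every tuple it builds.
pairs = list(zip(*[iter(boundaries)] * 2))
print(pairs)

# A trailing unpaired value is silently dropped.
odd_pairs = list(zip(*[iter([2, 5, 7])] * 2))
print(odd_pairs)
```

    The list() call is only needed on Python 3, where zip() returns an iterator; on Python 2 zip() already returns a list.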

  • @Scott-Sumner

    Hi Scott,
    a nice one - good performance improvement. :-)

    If you are still looking for more performance,
    a general suggestion would be to use as few global objects as possible within a loop, as
    looking up a global is expensive.
    Cache global objects at the beginning of the script.
    Creating the tuple list directly should be faster than creating it from a flat list,
    meaning: do your two indicatorEnd calls and create a tuple from the results, which then
    gets added to a list.
    Maybe a list comprehension to create the accum_text is faster as well, but I am not really sure,
    as it would still need to call the global object.
    All in all, I assume this might make it up to 3-5% faster, not sure if it is worth thinking about it.
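
    A rough sketch of these suggestions follows, under loud assumptions: FakeEditor is a hypothetical stand-in so the example runs outside Notepad++, its indicatorEnd is a simplified model rather than Scintilla's real behaviour, and marked_ranges and marked_text are made-up names. The 3-5% estimate is not measured here.

```python
class FakeEditor(object):
    """Hypothetical stand-in for Pythonscript's editor object, for testing only."""

    def __init__(self, text, marks):
        self.text = text
        self.marks = marks  # one boolean per character

    def getLength(self):
        return len(self.text)

    def indicatorValueAt(self, indicator, pos):
        return 1 if 0 <= pos < len(self.marks) and self.marks[pos] else 0

    def indicatorEnd(self, indicator, pos):
        # Simplified model: end of the run (marked or not) that contains pos.
        n = len(self.marks)
        pos = max(pos, 0)
        if pos >= n:
            return n
        value = self.marks[pos]
        while pos < n and self.marks[pos] == value:
            pos += 1
        return pos

    def getTextRange(self, start, end):
        return self.text[start:end]


def marked_ranges(editor, indicator=31):
    """Build (start, end) tuples directly, using lookups cached in locals."""
    indicator_end = editor.indicatorEnd   # cache bound methods once
    value_at = editor.indicatorValueAt
    length = editor.getLength()
    ranges = []
    pos = 0
    while pos < length:
        if value_at(indicator, pos):
            start = pos                   # already inside a marked run
        else:
            start = indicator_end(indicator, pos)
            if start <= pos or start >= length:
                break                     # no further marked text
        end = indicator_end(indicator, start)
        if end <= start:
            break                         # guard against a stuck position
        ranges.append((start, end))
        pos = end
    return ranges


def marked_text(editor, indicator=31):
    get_text = editor.getTextRange
    return '\r\n'.join([get_text(s, e) for s, e in marked_ranges(editor, indicator)])


# Upper-case letters stand in for marked text; the first character is marked.
text = "ABcdEFg"
fake = FakeEditor(text, [ch.isupper() for ch in text])
ranges = marked_ranges(fake)
copied = marked_text(fake)
```

    Note that joining the pieces puts '\r\n' only between matches, whereas the script above appends it after every match, including the last one.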


  • @Claudia-Frank

    not sure if it is worth thinking about it.

    I should have posted my before-and-after timing, but really, the “before” was “forever” on my 70MB data file! The “after” was extremely quick, certainly on par with how long it took Notepad++'s Mark feature to redmark my desired text. Therefore performance was rated “very acceptable” for the new version. And that’s really all the performance I care about, so further optimizations aren’t worth it to me. Probably some of those optimizations you suggest would make the code less readable, too, so I’m definitely not wanting to go there (although now I leave myself open to comments on how readable/unreadable the existing code is). :-D

    I probably would have written this better the first time around if how these “indicator” functions worked was better documented!

    Until Notepad++ natively allows a non-destructive (@guy038’s regex method is destructive…but there is UNDO…hmmm) copy of all regex-matched text, this little script will serve me nicely, now on all files big and small.

  • @Scott-Sumner

    I am in the same situation, but the regular expression method is not working for me to copy the matched text.

    I want to grab all occurrences in a configuration file where the first line starts with ‘object’ and the immediately following line starts with ‘nat’:

    object network obj_any
    nat (inside,outside) dynamic interface
    object network obj-test
    nat (DMZ1,outside) static
    object network obj-
    nat (DMZ1,outside) static
    object network obj-
    nat (DMZ1,outside) static tcp 8080 80
    object network obj-
    nat (DMZ1,outside) static tcp 1002 22
    object network obj-
    nat (DMZ1,outside) static tcp 8080 80
    object network obj-
    nat (DMZ1,outside) static dns

    I wrote the regular expression ^object.\R\snat.* to grab both lines,
    the one starting with ‘object’ and the one starting with ‘nat’, but when I replace it with
    (?1\1), it deletes the matched lines. Any idea what the correct replace string would be to keep only the two matched lines?

  • @Kashif-Rana :

    Not sure exactly what you are asking but on your data this seems to work to match it:

    Find-what zone: (?-s)^object.*\Rnat.*

    But what’s this about replacement? This thread is just talking about matching text, redmarking it, and copying it…so I’m confused about what you want to do…
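
    To see what that two-line pattern picks up, here is a small sketch using Python's re module instead of Notepad++'s search engine. The assumptions: Python's re has no \R, so \r?\n is used in its place, and the "hostname", "interface" and "unused" lines are invented filler that should not match.

```python
import re

# Sample data based on the post above, plus invented non-matching filler lines.
config = "\n".join([
    "hostname fw1",
    "object network obj_any",
    "nat (inside,outside) dynamic interface",
    "object network obj-test",
    "nat (DMZ1,outside) static",
    "object network unused",
    "interface outside",
    "object network obj-",
    "nat (DMZ1,outside) static dns",
])

# '.' does not match a newline by default, which corresponds to (?-s).
# re.MULTILINE lets '^' match at the start of every line.
pattern = re.compile(r"^object.*\r?\nnat.*", re.MULTILINE)
matches = pattern.findall(config)

for m in matches:
    print(m)
    print("---")
```

    Only an ‘object’ line that is immediately followed by a ‘nat’ line is returned, as one two-line string per match.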

  • @Scott-Sumner sorry for the confusion. What I want is this: whatever my regular expression matches is a two-line match (the first line starts with ‘object’ and the second line starts with ‘nat’). So my regular expression will catch, say, 100 instances of the two lines below in a huge file that also contains other data, and I want to copy those multi-line matches.

    object network obj-
    nat (DMZ1,outside) static dns

    ‘Mark’ marks all the lines, but ‘bookmark’ only bookmarks the first line of each match, not the second, so I cannot copy through bookmarks.

    So the question is: how do I copy all instances of a multi-line regular expression match?

  • @Kashif-Rana

    Have you actually read this thread from top to bottom? If so, have you tried setting up and using the script above? If I’m understanding your need correctly (still have my doubts) it seems as if that would solve the problem…

  • @Scott-Sumner I will try this script. But without a script, is it possible to copy multiple instances of a (multi-line) regex match in a text file?

  • @Kashif-Rana

    Ummmm, well…No…that’s why the script was developed in the first place…seems like this should be obvious from the earlier postings in this thread…