<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Little script is very slow (depends on file size)]]></title><description><![CDATA[<p dir="auto">Hello,<br />
I have a little PythonScript that is super important for my line of work, but can be very slow parsing large files. Seems that it gets worse the bigger the file is.</p>
<p dir="auto">It’s very simple: you highlight a portion of text, and it bookmarks all the lines containing that text. Then you can copy the bookmarked lines, cut them, etc.</p>
<pre><code>MARK_BOOKMARK = 20

def match_found(m):
    targetStart = m.span(0)[0]
    lineNumber = editor.lineFromPosition(targetStart)
    editor.markerAdd(lineNumber, MARK_BOOKMARK)
   
pattern = editor.getSelText()
if pattern != '':
    editor.search(pattern, match_found) 
</code></pre>
<p dir="auto">I suspect <strong>editor.lineFromPosition(targetStart)</strong> is where the problem is. As it starts bookmarking lower and lower down the file, it gets increasingly slow.</p>
<p dir="auto">Would be nice to speed this up a bit. Not quite sure what to do. I suspect the <strong>search result ‘m’ should have the line number</strong> (m.lineNumber, … that kind of thing) when it finds a pattern somewhere in the file, so there would be no need for the clumsy conversion.</p>
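<p dir="auto">One way to avoid the position-to-line conversion altogether is to scan the text line by line, so the line number comes for free from the loop. The following is only a minimal sketch in plain Python, not a measured improvement: <code>lines_containing</code> is a hypothetical helper name, and it assumes the selection does not span a line break.</p>
<pre><code class="language-py">import re

# Scintilla's default line ends: CRLF, lone CR, lone LF
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def lines_containing(text, pattern):
    """Return the zero-based numbers of all lines that contain pattern."""
    hits = []
    for line_number, line in enumerate(LINE_BREAK.split(text)):
        if pattern in line:
            hits.append(line_number)
    return hits


sample = "alpha\r\nbeta HERE\r\ngamma\r\nHERE again\r\n"
print(lines_containing(sample, "HERE"))  # prints [1, 3]

# Untested idea for PythonScript itself:
#   for n in lines_containing(editor.getText(), editor.getSelText()):
#       editor.markerAdd(n, MARK_BOOKMARK)
</code></pre>
<p dir="auto">Whether this beats <code>editor.search</code> on a very large file would have to be timed on real data, since it copies the whole document into Python first.</p>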
]]></description><link>https://community.notepad-plus-plus.org/topic/25902/little-script-is-very-slow-depends-on-file-size</link><generator>RSS for Node</generator><lastBuildDate>Wed, 10 Jun 2026 03:08:12 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/25902.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 24 Jun 2024 13:31:38 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Little script is very slow (depends on file size) on Mon, 24 Jun 2024 20:35:48 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/0bzen" aria-label="Profile: 0BZEN">@<bdi>0BZEN</bdi></a> said in <a href="/post/95563">Little script is very slow (depends on file size)</a>:</p>
<blockquote>
<p dir="auto">Not necessarily, no, it’s just very convenient, especially with shortcuts.</p>
</blockquote>
<p dir="auto">I’m just saying, select, <code>Ctrl+M</code>, visually confirm checkboxes, and click <strong>Mark All</strong> isn’t that onerous… and if your script is “very slow”, it’s sure to be faster than &gt;15sec for the script version.</p>
<blockquote>
<p dir="auto">Possibly, although that will get invalidated when the file content changes</p>
</blockquote>
<p dir="auto">Mine doesn’t cache that information from run-to-run.  It just precomputes the mapping of the positions-from-lines in a way that only requires going through the whole document once (as far as I can tell), rather than counting from the beginning every time.</p>
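<p dir="auto">For illustration only, the same precompute-once idea can be sketched with a sorted list of line-start offsets and a binary search, instead of a dictionary that is walked back one position at a time. This is plain Python on a string, with hypothetical helper names (<code>build_line_starts</code>, <code>line_from_offset</code>), and I have not timed it against the dictionary version.</p>
<pre><code class="language-py">import re
from bisect import bisect_right

# Scintilla's default line ends: CRLF, lone CR, lone LF
LINE_BREAK = re.compile(r"\r\n|\r|\n")


def build_line_starts(text):
    """Return a sorted list holding the start offset of every line."""
    starts = [0]
    for m in LINE_BREAK.finditer(text):
        starts.append(m.end())
    return starts


def line_from_offset(starts, offset):
    """Zero-based line containing offset, found by binary search."""
    # bisect_right counts the line starts at or before offset
    return bisect_right(starts, offset) - 1


sample = "one\r\ntwo HERE\r\nthree\r\n"
starts = build_line_starts(sample)
print(starts)                                          # prints [0, 5, 15, 22]
print(line_from_offset(starts, sample.index("HERE")))  # prints 1
</code></pre>
<p dir="auto">Inside PythonScript the list could presumably be filled from <code>editor1.positionFromLine(l)</code> for each line, as my script does for its dictionary, so that the offsets are the editor’s own positions rather than Python string indexes.</p>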
<blockquote>
<p dir="auto">Yeah, the search results don’t contain meta-data, like line numbers. It might be possible to count the number of ‘\r’ characters since the last hit, with only the first search result using the slow function (or counting the ‘\r’ from the beginning of the file).</p>
</blockquote>
<p dir="auto">I’m pretty sure that the extra effort of counting between matches (which isn’t implemented already, so it’d have to be manually done) would be more time-consuming than the current approach.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/95565</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95565</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Mon, 24 Jun 2024 20:35:48 GMT</pubDate></item><item><title><![CDATA[Reply to Little script is very slow (depends on file size) on Mon, 24 Jun 2024 20:23:18 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: Alan-Kilborn">@<bdi>Alan-Kilborn</bdi></a> Could it be linked to the same issue I encounter? Using some sort of file-position-to-line-number conversion, which means having to run through from the beginning of the file every time?</p>
<p dir="auto">Anyway, no big deal. Thanks for your input everyone!</p>
]]></description><link>https://community.notepad-plus-plus.org/post/95564</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95564</guid><dc:creator><![CDATA[0BZEN]]></dc:creator><pubDate>Mon, 24 Jun 2024 20:23:18 GMT</pubDate></item><item><title><![CDATA[Reply to Little script is very slow (depends on file size) on Mon, 24 Jun 2024 20:37:46 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/peterjones" aria-label="Profile: PeterJones">@<bdi>PeterJones</bdi></a> said in <a href="/post/95552">Little script is very slow (depends on file size)</a>:</p>
<blockquote>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/0bzen" aria-label="Profile: 0BZEN">@<bdi>0BZEN</bdi></a> ,</p>
<p dir="auto">Regarding the behavior: do you really need it as a script?</p>
</blockquote>
<p dir="auto">Not necessarily, no, it’s just very convenient, especially with shortcuts.</p>
<blockquote>
<p dir="auto">Regarding script optimization: I don’t know of any ways that would for-sure optimize that…</p>
</blockquote>
<p dir="auto">Yeah, the search results don’t contain meta-data, like line numbers. It might be possible to count the number of ‘\r’ characters since the last hit, with only the first search result using the slow function (or counting the ‘\r’ from the beginning of the file).</p>
<p dir="auto">Not sure if that would equate to the number of lines between search results? Maybe.</p>
<p dir="auto">something like</p>
<pre><code>int LineFromPosition(int position, int startpos=0, int startline=0)
{
    return startline + CountLineReturns(startpos, position);
}
</code></pre>
<p dir="auto">Something like that, more python-ey, with maybe a +1 / -1 extra line somewhere.</p>
<p dir="auto">if (position is &lt; startpos), we’ve gone back to the top of the file, so, will need to do something a bit more clever, but no biggie.</p>
<pre><code>int LineFromPosition(int position, int startpos=0, int startline=0)
{
    if (position &lt; startpos)
    {
        return LineFromPosition(position, 0, 0);
    }
    else
    {
        return startline + CountLineReturns(startpos, position);
    }
}
</code></pre>
<p dir="auto">Something like that anyway.</p>
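<p dir="auto">A more python-ey version of the same idea might look like the sketch below. It is only an illustration on a plain string: it assumes every line ends in <code>\n</code> (so LF or CRLF files, not CR-only ones), and instead of restarting from the top when the position lies before <code>start_pos</code>, it subtracts the line breaks in between. None of this has been timed against <code>editor.lineFromPosition</code>.</p>
<pre><code class="language-py">def line_from_position(text, position, start_pos=0, start_line=0, eol="\n"):
    """Line number of position, counted relative to a known (start_pos, start_line)."""
    lo, hi = sorted((start_pos, position))
    delta = text.count(eol, lo, hi)
    # Moving forward adds line breaks, moving backward subtracts them
    return start_line + delta if position == hi else start_line - delta


sample = "a\nb HERE\nc\nd HERE\n"
first = sample.index("HERE")
second = sample.index("HERE", first + 1)

line1 = line_from_position(sample, first)                # counted from the top
line2 = line_from_position(sample, second, first, line1) # counted from the last hit
print([line1, line2])  # prints [1, 3]
</code></pre>
<p dir="auto">Each call then only looks at the text between the previous hit and the current one, rather than everything from the start of the file.</p>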
<blockquote>
<p dir="auto">if it were my script, I might look into whether I could cache a mapping that would help simplify, to avoid making PythonScript re-count line number from every call to the method…</p>
</blockquote>
<p dir="auto">Possibly, although that will get invalidated when the file content changes.</p>
<blockquote>
<p dir="auto">Selecting <code>!HERE!</code> and using the <strong>Mark</strong> dialog took a couple seconds.  Running my script (below) took &gt;15sec.</p>
</blockquote>
<p dir="auto">Interesting. I wonder what they do in that function. Not surprised though.</p>
<blockquote>
<p dir="auto">I don’t know whether the caching really helped speed things up or not – if you’re interested, you may try to compare your algorithm to mine on your own data.</p>
</blockquote>
<p dir="auto">It’s OK, I don’t mind the script being reasonably slow. It used to be far worse. I think the olde Notepad++ would lock up, at least it’s doing the search in a background process (I think).</p>
<blockquote>
<p dir="auto">But what I am sure about is that the <strong>Mark</strong> dialog was <em>significantly</em> faster than the script equivalent for my example data.</p>
</blockquote>
<p dir="auto">Yup, I may give it a go if it gets very, very slow. I don’t want to spend much time on optimising a tool. As an exercise, though, it could be interesting to develop a method that does that bookmarking fast.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/95563</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95563</guid><dc:creator><![CDATA[0BZEN]]></dc:creator><pubDate>Mon, 24 Jun 2024 20:37:46 GMT</pubDate></item><item><title><![CDATA[Reply to Little script is very slow (depends on file size) on Mon, 24 Jun 2024 19:50:12 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: Alan-Kilborn">@<bdi>Alan-Kilborn</bdi></a> said in <a href="/post/95554">Little script is very slow (depends on file size)</a>:</p>
<blockquote>
<p dir="auto">Doesn’t this kill usage of the all-important <code>editor</code> in subsequently-run scripts??</p>
</blockquote>
<p dir="auto">So, funny story: when I first started writing/debugging the script, the script was in editor2 and the test file in editor1, and I wouldn’t always remember to click in editor1 before running the script… so when I found I had mixed some <code>editor.</code> and some <code>editor1.</code> during debug, I eventually <code>None</code>d the <code>editor</code> so that it would flag me if I made that mistake again.</p>
<p dir="auto">And then, by the time the script was fully working, I had forgotten I’d done that…</p>
<p dir="auto">So when I went to clean up before publishing, I couldn’t figure out why it wasn’t working when I tried to switch back to <code>editor.</code> … Instead of digging into it more, I just left it as-is with the <code>editor1</code> code.  And hence, a stupid bug</p>
<blockquote>
<p dir="auto">Doesn’t this kill usage of the all-important <code>editor</code> in subsequently-run scripts??</p>
</blockquote>
<p dir="auto">Only if your subsequently-run scripts assume that they come after <code>from Npp import editor</code> or equivalent in your startup; if they always have their own <code>from Npp import editor</code> line, then they will always correctly define <code>editor</code> for their own usage. :-)</p>
]]></description><link>https://community.notepad-plus-plus.org/post/95556</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95556</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Mon, 24 Jun 2024 19:50:12 GMT</pubDate></item><item><title><![CDATA[Reply to Little script is very slow (depends on file size) on Mon, 24 Jun 2024 19:08:08 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/peterjones" aria-label="Profile: PeterJones">@<bdi>PeterJones</bdi></a> said in <a href="/post/95552">Little script is very slow (depends on file size)</a>:</p>
<blockquote>
<p dir="auto">editor = None</p>
</blockquote>
<p dir="auto">Doesn’t this kill usage of the all-important <code>editor</code> in subsequently-run scripts??</p>
]]></description><link>https://community.notepad-plus-plus.org/post/95554</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95554</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Mon, 24 Jun 2024 19:08:08 GMT</pubDate></item><item><title><![CDATA[Reply to Little script is very slow (depends on file size) on Mon, 24 Jun 2024 16:23:38 GMT]]></title><description><![CDATA[<p dir="auto">Isn’t scripting with bookmarks involved historically slow for some reason?  I seem to recall this being discussed a few times here, with no real root cause determined, or solution found to speed things up.  :-(</p>
]]></description><link>https://community.notepad-plus-plus.org/post/95553</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95553</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Mon, 24 Jun 2024 16:23:38 GMT</pubDate></item><item><title><![CDATA[Reply to Little script is very slow (depends on file size) on Mon, 24 Jun 2024 19:45:21 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/0bzen" aria-label="Profile: 0BZEN">@<bdi>0BZEN</bdi></a> ,</p>
<p dir="auto">Regarding the behavior: do you really need it as a script?  Because selecting the text and hitting <code>Ctrl+M</code> will bring up the <strong>Mark</strong> dialog with the selected text in the <strong>Find what</strong> (assuming normal <strong>Settings &gt; Preferences &gt; Searching &gt; ☑ Fill Find Field with Selected Text</strong>).  Ensuring <code>☑ Bookmark Line</code> is turned on in the <strong>Mark</strong> dialog will then bookmark all those same lines, just using Notepad++'s native search, rather than having to go through the plugin.  I would think that would be faster.  (Though any searches and actions over hundreds of MB will be slow.)</p>
<p dir="auto">Regarding script optimization: I don’t know of any ways that would for-sure optimize that…  if it were my script, I might look into whether I could cache a mapping that would help simplify, to avoid making PythonScript re-count line number from every call to the method…  I might give it a little thought to flesh out the idea some, though I won’t guarantee it will actually be faster than your current behavior.</p>
<h3>Update:</h3>
<p dir="auto">I took the lines</p>
<pre><code class="language-txt">one one one one one one one one one one one one one one one one one one one one 
two two two two two two two two two two two two two two two two two two two two 
thr thr thr thr thr thr thr thr thr thr thr thr thr thr thr thr thr thr thr thr 
fou fou fou fou fou fou fou fou fou fou fou fou fou fou fou fou fou fou fou fou 
fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv fiv 
six six six six six six six six six !HERE!  six six six six six six six six six 
sev sev sev sev sev sev sev sev sev sev sev sev sev sev sev sev sev sev sev sev 
eig eig eig eig eig eig eig eig eig eig eig eig eig eig eig eig eig eig eig eig 
nin nin nin nin nin nin nin nin nin nin nin nin nin nin nin nin nin nin nin nin 
ten ten ten ten ten ten ten ten ten ten ten ten ten ten ten ten ten ten ten ten 
</code></pre>
<p dir="auto">and <code>Ctrl+A Ctrl+D</code> until there were ~1.3M lines with ~107MB.  In this file, 10% of the lines have a match for <code>!HERE!</code>.</p>
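<p dir="auto">For anyone who would rather generate comparable test data than duplicate it by hand, here is a rough sketch. The names <code>make_block</code> and <code>make_test_text</code> are made up for this example, and CRLF line endings are an assumption. Each doubling stands in for one <code>Ctrl+A Ctrl+D</code>; 17 doublings give 1,310,720 lines, which is about 107MB with CRLF endings.</p>
<pre><code class="language-py">WORDS = ["one", "two", "thr", "fou", "fiv", "six", "sev", "eig", "nin", "ten"]
MARKED_LINE = "six " * 9 + "!HERE!  " + "six " * 9


def make_block(eol="\r\n"):
    """The ten 80-character lines, one of which contains !HERE!."""
    lines = [MARKED_LINE if w == "six" else (w + " ") * 20 for w in WORDS]
    return eol.join(lines) + eol


def make_test_text(doublings, eol="\r\n"):
    """Repeat the ten-line block 2**doublings times."""
    return make_block(eol) * (2 ** doublings)


text = make_test_text(3)     # small demo: 80 lines
print(len(text))             # prints 6560
print(text.count("!HERE!"))  # prints 8
</code></pre>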
<p dir="auto">Selecting <code>!HERE!</code> and using the <strong>Mark</strong> dialog took a couple seconds.  Running my script (below) took &gt;15sec.</p>
<pre><code class="language-py"># encoding=utf-8
"""in response to https://community.notepad-plus-plus.org/topic/25902/"""
from Npp import notepad,console,editor1
# editor = None # update: comment this out

def _doit():
    #console.clear()
    #console.show()
    linemap = {}
    for l in range(editor1.getLineCount()):
        p = editor1.positionFromLine(l)
        linemap[p] = l

    # console.write(str(linemap)+"\n\n")

    def match_found(m):
        s = m.start()
        p = s
        l = None

        # console.write("match start={} p={} line={} [before p search]\n".format(s,p,l))

        # include 0 so a match at the very start of the file maps to line 0
        while p&gt;=0:
            # console.write("{} =&gt; {}\n".format(p, p in linemap))
            if p in linemap:
                l = linemap[p]
                break
            else:
                p = p - 1

        # console.write("match start={} p={} line={}\n".format(s,p,l))
        editor1.markerAdd(l,20)

    editor1.search("!HERE!", match_found)

_doit()
del(_doit)
</code></pre>
<p dir="auto">I don’t know whether the caching really helped speed things up or not – if you’re interested, you may try to compare your algorithm to mine on your own data.  But what I am sure about is that the <strong>Mark</strong> dialog was <em>significantly</em> faster than the script equivalent for my example data.</p>
<p dir="auto">(<em>note: the <code>_doit()</code> function is the algorithm itself; I wrap it in a function and then delete that function at the end to avoid cluttering my PythonScript with variables held from previous script runs, etc</em>)</p>
<p dir="auto"><em>update: comment out the line referenced by <a class="plugin-mentions-user plugin-mentions-a" href="/user/alan-kilborn" aria-label="Profile: Alan-Kilborn">@<bdi>Alan-Kilborn</bdi></a> , below</em></p>
]]></description><link>https://community.notepad-plus-plus.org/post/95552</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95552</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Mon, 24 Jun 2024 19:45:21 GMT</pubDate></item><item><title><![CDATA[Reply to Little script is very slow (depends on file size) on Mon, 24 Jun 2024 13:35:48 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="/user/0bzen" aria-label="Profile: 0BZEN">@<bdi>0BZEN</bdi></a> We’re talking potentially 10’s of megabytes, if not 100’s. It’s not as bad as it used to be, but I suspect it could be made much more efficient.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/95551</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/95551</guid><dc:creator><![CDATA[0BZEN]]></dc:creator><pubDate>Mon, 24 Jun 2024 13:35:48 GMT</pubDate></item></channel></rss>