Faster "Find in Files"?



  • Any chance of a multi-threaded “Find in Files”? Is there any multi-threaded code in N++? Are there any plans for multi-threading? I run many such searches across a couple of thousand files and would like to cut the wait time. It looks like I/O is the main bottleneck right now, so using a couple of threads should speed it up.



  • In general, multi-threading is not the ideal solution for “find in files”, since it is mostly IO bound. Any thread added to a GUI application is an invitation for trouble. Asynchronous IO on a single thread should usually give results as good as or better than a multi-threaded implementation.

    BUT that is not what makes Notepad++ slow.
    During “find in files” Notepad++ loads each file in full, as if it were opening it for viewing, which a search does not need. The benefit is that the load detects the file encoding, so you can “find in files” across multiple encodings. The price is that it is really, really slow.

    An alternative “find in files” that assumes UTF-8, or is given a specific encoding in the dialog, and scans the files with primitive buffer operations without actually loading them into Scintilla would be MUCH faster.

    Personally, I ‘grep’ things from the command line, copy-paste the results and use a tags lookup plugin to jump to file:line.
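As a rough illustration of the buffer-based scan suggested above, here is a minimal Python sketch (not Notepad++ code). It assumes the caller supplies the encoding, defaulting to UTF-8, and the name `find_in_file` is hypothetical. Splitting on the `\n` byte is only valid for UTF-8 and single-byte encodings, so UCS-2 files would need different handling.

```python
from pathlib import Path


def find_in_file(path, needle, encoding="utf-8"):
    """Return (line_number, line_text) for every line that contains needle.

    The file is read as raw bytes and searched without decoding it first.
    Assumption: the encoding is known up front and is UTF-8 or single-byte.
    """
    pattern = needle.encode(encoding)
    data = Path(path).read_bytes()
    if pattern not in data:
        return []  # cheap whole-buffer check before splitting into lines
    hits = []
    for number, raw in enumerate(data.split(b"\n"), start=1):
        if pattern in raw:
            text = raw.rstrip(b"\r").decode(encoding, errors="replace")
            hits.append((number, text))
    return hits
```

Only the matching lines are decoded, which is where most of the saving over a full editor-style load would come from in this sketch.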



  • Hello, @gstavi,

    Thanks for your excellent explanation of N++’s moderate search speed on multiple files. But now I’m simply wondering:

    Why don’t we add another field to the Find in Files dialog that indicates the encoding (ANSI, UTF-8, UCS-2 LE or UCS-2 BE) of the files to scan? Of course, if this field were NOT filled in, the classic search, with encoding detection, would occur.

    However, it would be the user’s responsibility to verify that no file in the list to scan has an encoding other than the one specified, as I suppose the results in the Search result panel would certainly not be coherent in that case!

    Just an idea…

    Best Regards,

    guy038

    P.S. :

    It would be sensible to test this option, in order to verify that the speed increase is really significant!



  • @guy038
    GUI-wise anything goes.
    But the other benefit of the current implementation is that it actually reuses base functionality within Notepad++.
    As far as I remember, it is:

    Scan directories and build file list according to wildcards // This is another slow (and memory consuming) thing I forgot to mention
    For each file in list
        Load file into Scintilla buffer // detect encoding, load entire file at once
        Find in Scintilla buffer and add to find results // Including all regular expression tricks
        Close Scintilla buffer
    

    So for a faster find in files we will have to write an entirely new algorithm.
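For comparison, a replacement loop along the lines discussed in this thread could look like the Python sketch below. It is an illustration only, not the Notepad++ implementation: the function names are hypothetical, the encoding is assumed to be supplied by the caller, and Python's `re` module stands in for whatever regex engine the editor uses. Files are streamed line by line, so multi-line regular expressions are out of scope here.

```python
import fnmatch
import os
import re


def iter_files(root, wildcard):
    """Yield matching paths lazily instead of building the full list first."""
    for folder, _dirs, names in os.walk(root):
        for name in fnmatch.filter(names, wildcard):
            yield os.path.join(folder, name)


def find_in_files(root, wildcard, regex, encoding="utf-8"):
    """Yield (path, line_number, line) for every line matching regex.

    Assumption: every file uses the given encoding; no detection is done.
    """
    pattern = re.compile(regex)
    for path in iter_files(root, wildcard):
        try:
            with open(path, encoding=encoding, errors="replace") as handle:
                for number, line in enumerate(handle, start=1):
                    if pattern.search(line):
                        yield path, number, line.rstrip("\r\n")
        except OSError:
            continue  # unreadable file: skip it and keep searching
```

Walking the directory tree with a generator avoids holding the whole file list in memory, and reading line by line avoids loading each file at once.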



  • I would just distribute the “For each file in list” loop you mentioned across all available CPU cores, with synchronization on the find results; in theory it should cut the wait time by a factor equal to the number of cores
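The idea of spreading the per-file loop over a pool of workers can be sketched in Python as follows. This is a hedged illustration rather than anything from the Notepad++ code base: `search_one` and `parallel_find` are hypothetical names, files are assumed to be UTF-8, and each worker searches a plain byte buffer instead of a Scintilla view. In Python, threads mostly help while waiting on I/O, so the speed-up will not necessarily scale with the number of cores.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def search_one(path, needle):
    """Return (path, [line numbers]) for one file, searching raw bytes."""
    needle_bytes = needle.encode("utf-8")  # assumption: files are UTF-8
    try:
        with open(path, "rb") as handle:
            data = handle.read()
    except OSError:
        return path, []  # unreadable file counts as "no hits"
    lines = [
        number
        for number, raw in enumerate(data.split(b"\n"), start=1)
        if needle_bytes in raw
    ]
    return path, lines


def parallel_find(paths, needle, workers=None):
    """Search all paths on a thread pool; results keep the input order."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: search_one(p, needle), paths)
        return [(path, lines) for path, lines in results if lines]
```

Here `pool.map` hands results back in input order, so the "synchronization on find results" reduces to collecting them on the calling thread, with no explicit lock.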



  • I am downloading VS Community 2017 and I will see what I can do



  • @Krzysztof-Chodak ,

    As @gstavi already described, distributing the work will not help with the way things are currently implemented in N++.
    The reason is that N++ uses a hidden Scintilla view instance to perform the search. So each file in the search list has to pass through this hidden Scintilla view, which effectively serializes the work. Unless you change things entirely and have a separate Scintilla view per thread, multi-threading will be pointless; but even with many Scintilla views, those will again have to pass through the window procedure of N++'s main GUI thread. As @gstavi said, to have a multi-threaded search you’ll have to bypass Scintilla, load each file into memory and search that buffer, but then you run into the encoding detection problem and proper reg-ex handling.

    BR,
    Pavel



  • Hi, @pnedev and @gstavi,

    Just another newbie question!

    Would the Search in Files be quicker if the list of scanned files contained exclusively files with a BOM (UTF-8-BOM, UCS-2 BE BOM or UCS-2 LE BOM)?

    Indeed, with a BOM, the right encoding is known quickly and without any ambiguity, which should speed up the search process!?

    Cheers,

    guy038
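To make the BOM idea concrete, the following Python sketch reads only the first three bytes of a file to decide the encoding. It is an assumption-laden illustration, not how Notepad++ detects encodings: `sniff_bom` is a hypothetical helper, only the three BOMs named in the question are checked, and UTF-32 (whose little-endian BOM starts with the same two bytes as UTF-16 LE) is deliberately ignored.

```python
import codecs

# BOM byte sequences paired with the Python codec name for the rest of the file.
BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def sniff_bom(path):
    """Return (encoding, bom_length), or (None, 0) when the file has no BOM."""
    with open(path, "rb") as handle:
        head = handle.read(3)  # the longest BOM checked here is 3 bytes
    for bom, name in BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would skip `bom_length` bytes and decode or search the remainder with the returned codec; files without a BOM would still need the slower detection path.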



  • Hi @guy038 ,

    If the file encoding is known and the multi-threaded search is implemented then yes, this will speed up the process. But again, each thread will have to load the file it searches into a memory buffer.

    BR



  • @pnedev: this is what I started to do after analyzing the code - I have created a hidden editview for each thread

