    Faster "Find in Files"?

    General Discussion
    • Krzysztof Chodak

      Any chance of multi-threaded “Find in Files” functionality? Is there any multi-threaded code in N++? Are there any plans for multi-threading? I run many such searches over a couple of thousand files and I am thinking about cutting the wait time. It looks like I/O is the main bottleneck now, so using a couple of threads would speed it up.

      • gstavi

        In general, multi-threading is not the ideal solution for “find in files”, since the task is mostly I/O bound. Any thread added to a GUI application is an invitation for trouble. Asynchronous I/O on a single thread should usually give results as good as or better than a multi-threaded implementation.

        BUT that is not the actual problem in Notepad++.
        During “find in files”, Notepad++ needlessly loads each file as if it were opening it for viewing. The benefit is that during this load it detects the file encoding, so you can “find in files” across multiple encodings. The price is that it is really, really slow.

        An alternative “find in files” that assumes UTF-8, or is given a specific encoding in the dialog, and scans the files with primitive buffer operations without actually loading them into Scintilla would be MUCH faster.
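
To illustrate the buffer-scan idea above, here is a minimal Python sketch. It is not Notepad++ code: the function name `find_in_file` and its parameters are hypothetical. It assumes the file uses an ASCII-compatible encoding (UTF-8 or a single-byte ANSI code page), because it splits lines on the raw newline byte; UCS-2 files would need different handling. The search term is encoded once and compared against raw bytes, and only the matching lines are decoded.

```python
def find_in_file(path, needle, encoding="utf-8"):
    """Return (line_number, line_text) for every line containing needle.

    Assumption: `encoding` is ASCII-compatible, so splitting the raw
    bytes on b"\n" yields real text lines.
    """
    pattern = needle.encode(encoding)  # encode the search term once
    hits = []
    with open(path, "rb") as fh:  # binary mode: no decoding, no editor buffer
        for line_number, raw_line in enumerate(fh, start=1):
            if pattern in raw_line:  # plain byte-substring test
                text = raw_line.decode(encoding, errors="replace")
                hits.append((line_number, text.rstrip("\r\n")))
    return hits
```

A call such as `find_in_file("notes.txt", "TODO")` returns a list of `(line_number, line_text)` pairs, which is roughly the shape of a grep-style result list.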

        Personally, I ‘grep’ things from the command line, copy and paste the results, and use a tags lookup plugin to jump to file:line.

        • guy038

          Hello, @gstavi,

          Thanks for your excellent explanation of N++'s moderate search speed across multiple files. But now I’m simply wondering:

          Why don’t we add another field to the Find in Files dialog that indicates the encoding (ANSI, UTF-8, UCS-2 LE or UCS-2 BE) of the files to be scanned? Of course, if this field were NOT filled in, the classic search, with encoding detection, would occur.

          However, it would be the user’s responsibility to verify that no file in the list to scan has an encoding other than the one specified, as I suppose that the results in the Search result panel would certainly not be coherent in that case!
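
The risk described above can be shown with a small Python sketch. The helper `count_matches` is hypothetical and only illustrates the principle: a byte-level search encodes the search term with the user-specified encoding, so a file in a different encoding silently yields no hits. Here `utf-16-le` stands in for UCS-2 LE.

```python
def count_matches(data: bytes, needle: str, encoding: str) -> int:
    """Count occurrences of needle in raw bytes, assuming the given encoding."""
    return data.count(needle.encode(encoding))


text = "error: disk full\nerror: retry\n"
utf16_file = text.encode("utf-16-le")  # raw bytes of a UCS-2 LE style file

print(count_matches(utf16_file, "error", "utf-16-le"))  # correct encoding: prints 2
print(count_matches(utf16_file, "error", "utf-8"))      # wrong encoding: prints 0
```

With the wrong encoding the search does not fail loudly; it simply misses every occurrence, which is why the specified encoding would have to match every file in the list.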

          Just an idea…

          Best Regards,

          guy038

          P.S. :

          It would be sensible to test this option in order to verify that the speed increase is really significant!

          • gstavi

            @guy038
            GUI-wise anything goes.
            But the other benefit of the current implementation is that it actually reuses base functionality within Notepad++.
            As far as I remember, it is:

            Scan directories and build file list according to wildcards // This is another slow (and memory consuming) thing I forgot to mention
            For each file in list
                Load file into Scintilla buffer // detect encoding, load entire file at once
                Find in Scintilla buffer and add to find results // Including all regular expression tricks
                Close Scintilla buffer
            

            So for a faster find in files we would have to write an entirely new algorithm.

            • Krzysztof Chodak

              I would just distribute the “For each file in list” loop you mentioned across all available CPU cores, with synchronization on the find results; in theory it should divide the wait time by the number of cores available.
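
The proposal above, distributing the per-file loop over worker threads and synchronizing on a shared result store, can be sketched in Python as follows. This is an illustration only and not Notepad++ code: `search_file` and `parallel_find` are hypothetical names, the files are assumed to be UTF-8 or ASCII, and each worker scans raw bytes instead of going through a Scintilla view. In CPython the threads mainly overlap file I/O, so this sketch says nothing about the speed-up a native implementation would see.

```python
import os
import threading
from concurrent.futures import ThreadPoolExecutor


def search_file(path, pattern):
    """Return the 1-based numbers of lines whose raw bytes contain pattern."""
    with open(path, "rb") as fh:
        return [n for n, line in enumerate(fh, start=1) if pattern in line]


def parallel_find(paths, needle, workers=None):
    """Search every file in paths for needle using a pool of worker threads."""
    pattern = needle.encode("utf-8")  # assumption: files are UTF-8 or ASCII
    results = {}
    results_lock = threading.Lock()  # the synchronization on find results

    def worker(path):
        lines = search_file(path, pattern)
        if lines:
            with results_lock:  # only one thread updates the results at a time
                results[path] = lines

    with ThreadPoolExecutor(max_workers=workers or os.cpu_count()) as pool:
        # list() waits for all tasks and re-raises any worker exception
        list(pool.map(worker, paths))
    return results
```

The lock is held only while a finished result is stored, so the threads spend almost all of their time reading and scanning independently.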

              • Krzysztof Chodak

                I am downloading VS Community 2017 and I will see what I can do.

                • pnedev

                  @Krzysztof-Chodak ,

                  As @gstavi already described, distributing the loop will not work the way things are currently implemented in N++.
                  The reason is that N++ uses a hidden Scintilla view instance to perform the search. So each file in the search list has to pass through this hidden Scintilla view, which is effectively serialization. Unless you change things entirely and have a separate Scintilla view per thread, multi-threading will be pointless. But even with many Scintilla views, those will again have to pass through the window procedure of N++'s main GUI thread. As @gstavi said, to have a multi-threaded search you’ll have to bypass Scintilla, load each file into memory and search that buffer, but then you face the encoding detection problem and proper reg-ex handling.

                  BR,
                  Pavel

                  • guy038

                    Hi, @pnedev and @gstavi,

                    Just another newbie question!

                    Would the search in files be quicker if the list of scanned files contained, exclusively, files with a BOM (UTF-8-BOM, UCS-2 BE BOM or UCS-2 LE BOM)?

                    Indeed, with a BOM, the right encoding is known quickly and without any ambiguity, which should speed up the search process!?
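
As a rough illustration of why a BOM makes the encoding cheap to determine, the Python sketch below reads only the first three bytes of a file and compares them against the three BOMs named above. The function name `sniff_bom` is hypothetical, `utf-16-le` and `utf-16-be` stand in for UCS-2 LE and UCS-2 BE, and other BOMs (such as UTF-32) are deliberately not handled.

```python
import codecs

# The three BOMs mentioned in the question; none is a prefix of another,
# so the order of the checks does not matter.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)


def sniff_bom(path):
    """Return (encoding, bom_length) from the file's first bytes, or (None, 0)."""
    with open(path, "rb") as fh:
        head = fh.read(3)  # the longest BOM checked here is 3 bytes
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding, len(bom)
    return None, 0  # no BOM: the encoding would have to be detected or specified
```

A caller would skip `bom_length` bytes and then search the rest of the buffer using the returned encoding; a `(None, 0)` result means falling back to detection or to a user-specified encoding.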

                    Cheers,

                    guy038

                    • pnedev

                      Hi @guy038 ,

                      If the file encoding is known and multi-threaded search is implemented, then yes, this will speed up the process. But again, each thread will have to load the file it searches into a memory buffer.

                      BR

                      • Krzysztof Chodak

                        @pnedev: this is what I started to do after analyzing the code. I have created a hidden editview for each thread.
