Faster "Find in Files"?

General Discussion
10 Posts 4 Posters 7.7k Views
  • K
    Krzysztof Chodak
    last edited by Apr 14, 2017, 5:16 PM

    Is there any chance of multi-threaded “Find in Files” functionality? Is there any multi-threaded code in N++? Are there any plans for multi-threading? I run many such searches across a couple of thousand files and would like to cut the wait time. It looks like I/O is the main bottleneck now, so using a couple of threads would speed it up.

    • G
      gstavi
      last edited by Apr 17, 2017, 9:54 PM

      In general, multi-threading is not the ideal solution for “Find in Files”, since it is mostly I/O bound. Any thread added to a GUI application is an invitation for trouble. Asynchronous I/O on a single thread should usually give results as good as or better than a multi-threaded implementation.

      BUT that is not the problem in Notepad++.
      During “Find in Files”, Notepad++ needlessly loads each file as if it were opening it for viewing. The benefit is that during this load it detects the file encoding, so you can “Find in Files” across multiple encodings. The price is that it is really, really slow.

      An alternative “Find in Files” that assumes UTF-8, or is given a specific encoding in the dialog, and scans the files with primitive buffer operations without actually loading them into Scintilla would be MUCH faster.
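To make the "primitive buffer operations" idea concrete, here is a minimal Python sketch. It is not Notepad++ code: the function names `find_in_file` and `find_in_files` are hypothetical, and it assumes every file uses one caller-supplied, ASCII-compatible byte encoding (UTF-8 or an ANSI code page), so no detection is done and no regular expressions are supported.

```python
from pathlib import Path


def find_in_file(path, needle, encoding="utf-8"):
    """Return (line_number, line_text) pairs for lines containing needle.

    Assumption: the file is in an ASCII-compatible encoding such as UTF-8
    or an ANSI code page, so splitting raw bytes on newlines is safe.
    """
    pattern = needle.encode(encoding)  # encode the needle once, compare bytes
    hits = []
    with open(path, "rb") as f:  # raw bytes, no editor buffer involved
        for line_number, raw in enumerate(f, start=1):
            if pattern in raw:
                text = raw.decode(encoding, errors="replace").rstrip("\r\n")
                hits.append((line_number, text))
    return hits


def find_in_files(root, glob, needle, encoding="utf-8"):
    """Search every file under root matching glob; map path -> hits."""
    results = {}
    for path in sorted(Path(root).rglob(glob)):
        if path.is_file():
            hits = find_in_file(path, needle, encoding)
            if hits:
                results[str(path)] = hits
    return results
```

Only matching lines are decoded, which is where such an approach would save work compared with loading every file into an editor component. UTF-16 (UCS-2) files would need different line splitting and are deliberately out of scope here.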

      Personally, I ‘grep’ things from the command line, copy and paste the results, and use a tags lookup plugin to jump to file:line.

      • G
        guy038
        last edited by guy038 Apr 18, 2017, 7:54 PM Apr 18, 2017, 7:50 PM

        Hello, @gstavi,

        Thanks for your excellent explanation of N++'s moderate search speed across multiple files. Now I’m simply wondering:

        Why don’t we add another field to the Find in Files dialog that indicates the encoding (ANSI, UTF-8, UCS-2 LE or UCS-2 BE) of the files to scan? Of course, if this field were left empty, the classical search, with encoding detection, would occur.

        However, it would be the user’s responsibility to verify that no file in the list to scan has an encoding other than the one specified, as I suppose the results in the Search result panel would certainly not be coherent in that case!

        Just an idea…

        Best Regards,

        guy038

        P.S. :

        It would be sensible to test this option in order to verify that the speed increase is really significant!

        • G
          gstavi
          last edited by gstavi Apr 19, 2017, 6:44 AM Apr 19, 2017, 6:43 AM

          @guy038
          GUI-wise anything goes.
          But the other benefit of the current implementation is that it actually reuses base functionality within Notepad++.
          As far as I remember, it is:

          Scan directories and build file list according to wildcards // This is another slow (and memory consuming) thing I forgot to mention
          For each file in list
              Load file into Scintilla buffer // detect encoding, load entire file at once
              Find in Scintilla buffer and add to find results // Including all regular expression tricks
              Close Scintilla buffer
          

          So for a faster “Find in Files” we would have to write an entirely new algorithm.

          • K
            Krzysztof Chodak
            last edited by Apr 28, 2017, 7:26 AM

            I would just distribute the “For each file in list” loop you mentioned across all available CPU cores, with synchronization on the find results; in theory it should cut the wait time by a factor equal to the number of cores.
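The proposal above can be sketched outside Notepad++ roughly as follows. This is an illustration under stated assumptions, not the editor's implementation: `search_one` and `parallel_find` are hypothetical names, files are searched as plain byte buffers rather than through Scintilla, and a lock guards the shared result list to stand in for the "synchronization on find results". The sketch shows the structure only and makes no claim about the speed-up actually achieved.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def search_one(path, pattern):
    """Load one file into memory and return (path, line_number) hits."""
    with open(path, "rb") as f:
        data = f.read()
    return [
        (path, number)
        for number, line in enumerate(data.splitlines(), start=1)
        if pattern in line
    ]


def parallel_find(paths, needle, encoding="utf-8", workers=4):
    """Distribute the per-file loop over a pool of worker threads."""
    pattern = needle.encode(encoding)
    results = []
    lock = threading.Lock()  # protects the shared result list

    def worker(path):
        hits = search_one(path, pattern)
        with lock:
            results.extend(hits)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Consuming the iterator re-raises any exception from a worker.
        list(pool.map(worker, paths))
    return sorted(results)  # completion order varies, so sort for stable output
```

Each worker owns its file buffer, so the only shared state is the result list. That is exactly the property the current Scintilla-based loop lacks.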

            • K
              Krzysztof Chodak
              last edited by Apr 28, 2017, 10:56 AM

              I am downloading VS Community 2017 and will see what I can do.

              • P
                pnedev
                last edited by Apr 28, 2017, 11:54 AM

                @Krzysztof-Chodak ,

                As @gstavi already described, distributing the work will not help the way things are currently implemented in N++.
                The reason is that N++ uses a hidden Scintilla view instance to perform the search, so each file in the search list has to pass through this hidden Scintilla view, which effectively serializes the work. Unless you change things entirely and have a separate Scintilla view per thread, multi-threading will be pointless, and even with many Scintilla views, those will again have to pass through the window procedure of N++'s main GUI thread. As @gstavi said, to have a multi-threaded search you’ll have to bypass Scintilla, load each file into memory and search that buffer, but then you face the encoding detection problem and proper regex handling.

                BR,
                Pavel

                • G
                  guy038
                  last edited by Apr 28, 2017, 8:18 PM

                  Hi, @pnedev and @gstavi,

                  Just another newbie question!

                  Would the Search in Files be quicker if the list of scanned files contained exclusively files with a BOM (UTF-8-BOM, UCS-2 BE BOM or UCS-2 LE BOM)?

                  Indeed, with a BOM the right encoding is known quickly and without any ambiguity, which should speed up the search process!?
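For reference, the BOM check itself is cheap, since it only needs the first three bytes of a file. A minimal Python sketch follows. The function name `encoding_from_bom` is hypothetical, and the sketch assumes only the three BOMs named in the question matter (UTF-32, whose little-endian BOM also starts with FF FE, is ignored).

```python
import codecs

# BOM byte sequences mapped to Python codec names.
# Assumption: only UTF-8 and UTF-16 (UCS-2) BOMs are of interest.
BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),      # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
]


def encoding_from_bom(path):
    """Return a codec name if the file starts with a known BOM, else None."""
    with open(path, "rb") as f:
        head = f.read(3)  # the longest BOM checked here is 3 bytes
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None  # no BOM: the caller must fall back to detection
```

A `None` result means the file has no BOM and still needs the slower detection path. Whether skipping detection gives a noticeable gain inside Notepad++ would have to be measured.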

                  Cheers,

                  guy038

                  • P
                    pnedev
                    last edited by May 2, 2017, 2:21 PM

                    Hi @guy038 ,

                    If the file encoding is known and multi-threaded search is implemented, then yes, this will speed up the process. But again, each thread will have to load the file it searches into a memory buffer.

                    BR

                    • K
                      Krzysztof Chodak
                      last edited by May 9, 2017, 7:06 AM

                      @pnedev: this is what I started to do after analyzing the code. I have created a hidden edit view for each thread.
