How to get the Filenames after running "Find in Files" ?



  • @Robinson-George

    You will probably have to resort to doing a regular-expression replace operation on the copied data.

    @guy038 did a nice writeup at one point on how to do what you are seeking but I can’t find it now so here is a “cheap” version:

    Paste your copied data into a fresh N++ tab.

    Do a replace operation:

    Find-what zone: (?-is)^(?:Search.+\R)|(?:\t.+\R)|(?:\s{2}(.+?)\s\(\d+\shits?\))
    Replace-with zone: \1
    Wrap-around checkbox: ticked
    Search mode: Regular expression
    Action: Press the Replace All button

    You should be left with only the pathnames that had hits during your search.



  • …and how to remove the text e.g.: “(336 hits)”, that is appended to the end of these paths ?
    That text is not a part of the path/filename and will mess up my batch file.



  • The regular expression replacement above should remove the (xx hits) part. I just retested it and it works for me, but I’m guessing you tried it and it didn’t work for you? Hmmm…



  • Hello, @robinson-George, @scott-sumner and All,

    UPDATE of the regex, on 03/19/18 22h58 (French time zone) => No leading spaces, before each pathname

    Ah, Scott, I was also unable to find that bloody post, where I tried to extract only the absolute pathnames from the Find result contents ;-))

    Anyway, thinking about it, I’ve found a simple regex to achieve that goal, and I don’t think I did any better in the past !!

    So, @robinson-George, once you performed your search and got the Full Result contents, just follow these few steps, below :

    • Right-click, on the Find result window, and choose the Select All option

    • Left-click, anywhere, in the Find result window and execute a Ctrl + C operation, to copy the whole selection to the clipboard ( DO NOT use the built-in Copy option ! )

    • Then, open a new tab, ( Ctrl + N )

    • Paste the previous selection (Ctrl + V )

    • Now, open the Replace dialog ( Ctrl +H )

    SEARCH (?-is)^[S\t].+\R|^\x20\x20|\x20\(\d+\x20hits?\)$

    REPLACE Leave EMPTY

    • Tick the Wrap around option

    • Select the Regular expression mode

    • Click on the Replace All button

    Et voilà ! You should obtain all the absolute paths of the files, which matched your original search, one per line ;-))
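
    For anyone who would rather do this cleanup in a script than in the Replace dialog, here is a minimal Python sketch of the same three-part regex. It is only an approximation: Python's re module has no \R, so the sketch assumes the pasted text uses plain "\n" line breaks, and the names CLEANUP and SAMPLE as well as the sample data are made up for illustration.

```python
import re

# Python's re module has no \R, so this sketch assumes the pasted text uses "\n" line breaks.
CLEANUP = re.compile(
    r"^[S\t].+(?:\n|\Z)"        # whole "Search ..." header lines and tab-indented hit lines
    r"|^\x20\x20"               # the two spaces in front of each pathname
    r"|\x20\(\d+\x20hits?\)$",  # the trailing " (N hits)" counter
    re.MULTILINE,
)

# Made-up Find result text, laid out like the copied window contents.
SAMPLE = "\n".join([
    'Search "the" (3 hits in 2 files)',
    r"  C:\_755\license.txt (2 hits)",
    "\tLine 1: the first line",
    "\tLine 7: the other line",
    r"  C:\Program Files\Notepad++\Tests\MyFile.txt (1 hit)",
    "\tLine 3: the song",
])

# Replace every match with nothing, as the "Leave EMPTY" replacement does.
paths = CLEANUP.sub("", SAMPLE).splitlines()
for path in paths:
    print(path)
```

    The three alternatives are applied in a single pass, as Replace All does, so one pathname ends up on each remaining line.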


    Remark :

    • To easily see all the pathnames, of the concerned files, at a glance, in the Find result window :

      • Hold down the Ctrl key and left-click, simultaneously, on the squared minus -, before the blue string Search

      • Release the Ctrl key and left-click on the squared plus +, before the blue string Search

    • To expand and get back the complete list :

      • Left-Click on the squared minus -, before the blue string Search

      • Hold down the Ctrl key and click, simultaneously, on the squared plus +, before the blue string Search

    Cheers,

    guy038



  • @guy038

    Nice simplification–I was just trying to be quick-n-dirty, figuring you would provide the link to the “bloody post”–but you simplified it too much: You should have left shits? in–not too often do you get to use cool text like that in searches! :-)



  • Yup, that regular expression did not work for me because it assumes that the file contains only text. In my case it also contains binary data.
    However, it successfully removes the “hits” at the end of the lines.



  • Since there seem to be no easy solutions to this problem, which are INDEPENDENT OF FILE CONTENTS, I’d like to ask what is the best way to make a new feature request to the authors of Notepad++ ?

    Adding a check box “Output Filenames Only” to the “Search in Files” tab should take them 15 minutes since it does not require creating a new algorithm or new data. Just a trimming down of the data currently output to the Find results window.

    If they care about the speed of the search, they could also terminate searching the current file once the first hit is made …and go on to searching the next file.
    This way, if a 100MB file contains 9999 matching strings and the 1st hit is located in the beginning of this file, then only a small portion of it would need to be searched to output the filename (i.e. the beginning of it). Statistically, that would save a lot of CPU cycles and disk I/O when searching a set of big files.
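
    To make the idea concrete, here is a rough Python sketch of such an early-exit search. It says nothing about how Notepad++ is actually implemented; the function name contains and the chunk size are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

def contains(path, needle, chunk_size=1 << 16):
    # Read the file chunk by chunk and stop at the first hit,
    # so the rest of a big file is never read.
    if not needle:
        return True
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False          # end of file reached without a hit
            window = tail + chunk
            if needle in window:
                return True           # first hit: no need to look further
            # Keep the last len(needle) - 1 bytes so that a hit
            # straddling two chunks is still found.
            tail = window[-overlap:] if overlap else b""

# Small demonstration on a throw-away file.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "sample.bin"
    sample.write_bytes(b"a" * 10 + b"needle" + b"b" * 10)
    print(contains(sample, b"needle", chunk_size=12))
    print(contains(sample, b"absent", chunk_size=12))
```

    Because the file is opened in binary mode, the check does not care whether the contents are text or binary data.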

    Alternately, fixing the Ctrl-C vs. RightClick-Copy behavior to be synonymous and to copy only file names when the results are collapsed (+) in the Results window, would be an elegant solution to this problem.
    Getting rid of the (hits xxx) text at the ends of the paths remains a problem, but it seems to be solvable with a regex workaround, etc…



  • @Robinson-George

    …assumes that the file contains only text. In my case it also contains binary data…
    …no easy solutions to this problem, which are INDEPENDENT OF FILE CONTENTS…

    Well, this is rather a grey area (but not really, to me)…Notepad++ is a text editor, and while you can do some things with it regarding non-text (binary) data, it is not really advisable. I think the regular expressions above stand as good solutions to the normal use.

    Searching is a frequently complained-about area of Notepad++; heck, I do it myself. :-) But the replies tend to be, especially for Find in Files, that Notepad++ isn’t the right tool for intensive searches, and that you should “use the right tool for the right job”. I’m not saying I agree with that, necessarily, just filling you in from what I’ve observed here in the past.

    Regarding “terminate searching the current file once the first hit”, you might be interested in this related thread.

    Your best option is to make a feature request or a change-in-functionality request here by opening a “New issue”. However, please spend a little time searching to see if your specific request already exists, in which case you can “up-vote” that Open Issue so that in theory it gets more attention. You may also add to an already-open Issue any new information you think relevant.



  • Hi, @robinson-george, @scott-sumner and All,

    @scott-sumner,

    Yeah ! At first glance at your regex ....\(\d+\shits?\)), I thought that you had just made a mistake :-))


    @robinson-george, you said :

    Adding a check box “Output Filenames Only” to the “Search in Files” tab

    I do agree, that would be a valuable option !

    to copy only file names when the results are collapsed (+) in the Results window

    Again, I agree with that possible improvement !

    fixing the Ctrl-C vs. RightClick-Copy behavior to be synonymous

    Personally, once you’re aware of that particularity, I think that it’s worth having two ways to copy Find result text :

    • All the selected text, when hitting the Ctrl + C shortcut

    • The matched lines, ONLY, when using the context menu Copy option

    If they care about the speed of the search, they could also terminate searching the current file once the first hit is made …and go on to searching the next file.

    Of course, I do agree, that this would be the obvious optimization, to insert in code ! But Ah, Ah ! Thinking about it, I found out a nice way to simulate both a rapid search and a short text report ;-))

    That way, if you simply want to know which files match a specific string, text or regex, there is no need to filter the Find result window anymore !

    So, the miracle regex is :

    (String|Text|Regex)(?s).*\K , where you replace the (String|Text|Regex) part with your own text

    For instance, I’ve got, presently, 34 opened tabs, in the two views, of N++ v7.5.5, including two copies of notepad++.exe, that I previously renamed as xxx.txt and Small text to see.txt !!

    Then I decided to search for the simple word the, in all these files. Hence, the regex the(?s).*\K. One second later, though my anti-virus was performing, simultaneously, a scan, I got the Find result window, which contained two lines per file :

    • The absolute pathname of the file

    • The last line of the file, which may be virtual, when the last physical line ends with a line break !

    Then, as in my previous post :

    • Select all the Find result window and copy its contents with Ctrl + C

    • Open a new tab, with Ctrl + N and paste the clipboard with Ctrl + V

    • Finally, using this other regex S/R, below, on the new tab contents, you’ll get the complete list of files, matching the word the, at least one time ;-))

    SEARCH (?-is)^\x20\x20(.+)\x20\(\d+\x20hits?\)\R[^\r\n]+|^Search.+\R

    REPLACE ?1\1
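
    The conditional replacement ?1\1 can look cryptic, so here is a small Python sketch of what this S/R does. It is an approximation under stated assumptions: Python's re has neither \R nor the ?1 replacement syntax, so the sketch assumes "\n" line breaks and uses a replacement function instead. The names PATH_AND_LAST_LINE, keep_group_1 and SAMPLE, and the sample data, are invented for the example.

```python
import re

# Assumes "\n" line breaks, because Python's re has no \R.
PATH_AND_LAST_LINE = re.compile(
    r"^\x20\x20(.+)\x20\(\d+\x20hits?\)\n[^\n]+"  # pathname line plus the single line after it
    r"|^Search.+\n",                               # the "Search ..." header line
    re.MULTILINE,
)

def keep_group_1(match):
    # Mirrors the ?1\1 replacement: write group 1 if it took part in the match, else nothing.
    return match.group(1) or ""

# Made-up result of a "the(?s).*\K" search: one pathname line and one last line per file.
SAMPLE = "\n".join([
    r'Search "the(?s).*\K" (2 hits in 2 files)',
    r"  C:\_755\license.txt (1 hit)",
    "\tLine 116: ",
    r"  C:\_755\xxx.txt (1 hit)",
    "\tLine 12538: <binary bytes>",
])

paths = PATH_AND_LAST_LINE.sub(keep_group_1, SAMPLE).splitlines()
for path in paths:
    print(path)
```

    When the first alternative matches, only the captured pathname is written back; when the Search header matches, group 1 is empty and the whole line disappears.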

    Cheers,

    guy038



  • @guy038

    I have verified that your workaround works even with files that are not all text, e.g. all of the *.exe files in C:\Windows directory.
    The final cleanup regex successfully cleans up even the binary garbage that is interlaced between the lines with paths.
    The (hits xxx) text is also correctly removed from the end of the paths. I did not test how it behaves if the path accidentally contains, e.g. a string like “(hits from the 1960s)”

    So, I must write “good job” overall and I hope N++ authors are reading this to see how many steps us poor users have to go through to extract the file paths of files containing the matching string.

    P.S.
    Could you explain how the (?s).*\K regex works ?
    Also, if it is not too much to ask: What would be the “cleanup regex” that outputs just the hit numbers in the first column in front of the paths? e.g.:
    243 C:\Windows\explorer.exe
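
    Outside Notepad++, that layout could be produced with a few lines of Python. This is only a sketch of the idea, not a Notepad++ regex: the names HEADER and counts_first and the sample text are made up, and it assumes every pathname line starts with two spaces and ends with the hit counter.

```python
import re

# A pathname line: two leading spaces, the path, then " (N hits)" at the very end.
HEADER = re.compile(r"^\x20\x20(?P<path>.+)\x20\((?P<hits>\d+)\x20hits?\)$")

def counts_first(find_result):
    # Keep only the pathname lines and move the hit count in front of the path.
    rows = []
    for line in find_result.splitlines():
        match = HEADER.match(line)
        if match:
            rows.append(f"{match.group('hits')} {match.group('path')}")
    return rows

# Made-up Find result text.
SAMPLE = "\n".join([
    'Search "the" (244 hits in 2 files)',
    r"  C:\Windows\explorer.exe (243 hits)",
    "\tLine 1: some matching line",
    r"  C:\MyMusic\The Best of ABBA (1970 hits)\Playlist.m3u (1 hit)",
    "\tLine 3: another matching line",
])

for row in counts_first(SAMPLE):
    print(row)
```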



  • …or a path like:
    C:\MyMusic\The Best of ABBA (1970 hits)\Playlist.m3u



  • @Robinson-George said:

    C:\MyMusic\The Best of ABBA (1970 hits)\Playlist.m3u

    This won’t cause a problem because the regular expression looks for hits) (or hit)) at the very end of a line.
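
    A quick way to convince yourself is the following Python sketch, which applies only the end-anchored part of the expression to that path. The names TRAILING_COUNTER, line and cleaned are made up for the example.

```python
import re

# Only a counter sitting at the very end of the line is matched, thanks to the $ anchor.
TRAILING_COUNTER = re.compile(r"\x20\(\d+\x20hits?\)$")

line = r"C:\MyMusic\The Best of ABBA (1970 hits)\Playlist.m3u (1 hit)"
cleaned = TRAILING_COUNTER.sub("", line)
print(cleaned)
```

    The (1970 hits) in the middle of the path is followed by a backslash rather than the end of the line, so it is left alone.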

    I think one of the B’s in ABBA is backwards…but no matter…



  • Hello, @robinson-george,

    No problem , George !

    A) :

    What does the generic (String|Text|Regex)(?s).*\K regex match ? Well, once you replace the (String|Text|Regex) part with your actual text :

    • First, the regex just matches your own text

    • Then the (?s) modifier tells the regex engine that, from now on, any dot . will match any single character ( standard or EOL one )

    • Thus, the .* syntax matches all text, after the first occurrence of your text, till the very end of each file

    • Finally, the \K syntax discards everything matched so far from the reported match. So the final overall match is, simply, the zero length string, at the very end of each file

    • So, the Find result simply displays the last line ( real or virtual ), where this empty string has been found !

    Et voilà !

    You may give it a try with the regex (?s).*\K, which should display, in the Find result window, the list of all the opened documents, loaded in the current N++ session, or all the files, involved in a Find in Files search, followed with the unique last line containing that zero-length string !
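
    The steps above can be imitated in Python, with one caveat: Python's re module does not support \K, so this sketch matches the text plus everything after it and keeps only the end offset, which is the position where \K would restart the reported match. The function name last_line_hit is hypothetical.

```python
import re

def last_line_hit(text, needle_regex):
    # Emulates needle(?s).*\K : (?s:.*) lets the dot cross line breaks,
    # so the match always runs to the very end of the text.
    match = re.search(f"(?:{needle_regex})(?s:.*)", text)
    if match is None:
        return None                    # needle absent: the file would not be listed
    end = match.end()                  # where \K would reset the start of the match
    line_number = text.count("\n", 0, end) + 1
    return line_number, text[end:end]  # the reported match is the empty string

print(last_line_hit("first line\nthe second line\nlast line", "the"))
print(last_line_hit("a line with the word\n", "the"))
print(last_line_hit("nothing relevant", "xyz"))
```

    In the second call the text ends with a line break, so the empty match lands on the virtual last line, one past the last physical line.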


    B)

    Now, suppose that all your pathnames are of the form Letter Drive:\....\.....\FileName.Extension or, possibly, new ## ( but NOT “network paths”, using the Universal Naming Convention syntax ). I then changed the regex to match some variants of paths, collected in the new tab !

    Thus, the global syntax, with possible spaces characters, in paths :

    ^[Any range of characters]Upper Letter:\... ....\...\... ... ....\File Name.ext[ (1 hit)]$
    

    is, now, supported by the new search regex. So, given the sample text, below, in a new tab, which could be the results of the the(?s).*\K search regex :

    Search "the(?s).*\K" (12 hits in 12 files)
      C:\MyMusic\The Best of ABBA (1970 hits)\Playlist.m3u (1 hit)
    	Line 596: 
             Any Char				new 3 (1 hit)
    	Line 16: 
      C:\_755\license.txt (1 hit)
    	Line 116: 
      C:\Program Files\Notepad++\Tests\NativeLang.xml (1 hit)
    	Line 596: 
      C:\MyMusic\The Best of ABBA (1970 hits)\Nice song.m3u
    	Line 345: 
    C:\Program Files\Notepad++\Tests\MySecond File.txt (1 hit)
    	Line 100: 
      C:\Program Files\Notepad++\Lettres\George_3.txt (1 hit)
    	Line 97: 
      C:\_755\xxx.txt (1 hit)
    	Line 12538: .¤B"¸?5ø{v\^ê×	ª™ý=ú}ˆÑz¤4²GÏXð™°#B±
      C:\Program Files\Notepad++\Lettres\RegexDocum.txt
    	Line 12894: 
    C:\Program Files\Notepad++\Tests\My Third File.txt
    	Line 100: 
    1234 C:\Program Files\Notepad++\Tests\MyFile.txt (1 hit)
    	Line 856: 
               1234 C:\MyMusic\The Best of ABBA (1970 hits)\Test.m3u (1 hit)
    	Line 123:
    

    The new regex S/R, below :

    SEARCH (?-is)^.*?(([A-Z]:\\|new).+?)(\x20\(1\x20hit\)$)?\R[^\r\n]+|^Search.+\R

    REPLACE ?1\1

    Would give the following 12 absolute paths, below :

    C:\MyMusic\The Best of ABBA (1970 hits)\Playlist.m3u
    new 3
    C:\_755\license.txt
    C:\Program Files\Notepad++\Tests\NativeLang.xml
    C:\MyMusic\The Best of ABBA (1970 hits)\Nice song.m3u
    C:\Program Files\Notepad++\Tests\MySecond File.txt
    C:\Program Files\Notepad++\Lettres\George_3.txt
    C:\_755\xxx.txt
    C:\Program Files\Notepad++\Lettres\RegexDocum.txt
    C:\Program Files\Notepad++\Tests\My Third File.txt
    C:\Program Files\Notepad++\Tests\MyFile.txt
    C:\MyMusic\The Best of ABBA (1970 hits)\Test.m3u
    

    IMPORTANT :

    Beware that the regex (?-is)^.*?(([A-Z]:\\|new).+?)(\x20\(1\x20hit\)$)?\R[^\r\n]+|^Search.+\R expects, both :

    • The string (1 hit), when present

    • A single line of results, after the absolute pathname line

    So, this regex must be performed ONLY on the results of the previous regex search ( (String|Text|Regex)(?s).*\K ), on multiple files !

    Cheers,

    guy038

    P.S. : As usual, things are harder to explain than to execute ;-))


    For newcomers to regular expression concepts and syntax, begin with this article, in the N++ Wiki :

    http://docs.notepad-plus-plus.org/index.php/Regular_Expressions

    In addition, you’ll find good documentation, about the Boost C++ Regex library, v1.55.0 ( similar to the PERL Regular Common Expressions, v5.8 ), used by Notepad++, since its 6.0 version, at the two addresses below :

    http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/syntax/perl_syntax.html

    http://www.boost.org/doc/libs/1_55_0/libs/regex/doc/html/boost_regex/format/boost_format_syntax.html

    • The FIRST link explains the syntax, of regular expressions, in the SEARCH part

    • The SECOND link explains the syntax, of regular expressions, in the REPLACEMENT part


    You may, also, look for valuable information, on the sites, below :

    http://www.regular-expressions.info

    http://www.rexegg.com

    http://perldoc.perl.org/perlre.html

    Be aware that, as any documentation, it may contain some errors ! Anyway, if you detected one, that’s good news : you’re improving ;-))

