Find in files, copy filenames

Dan Walter

The output pane for finding text in files is nicely organized: each filename is shown as a node, and all the lines containing the text are listed as children of that node.

Is it possible to copy the filenames? When I select more than one node in the output pane, only the lines are copied.

Thanks!

PeterJones

@Dan-Walter ,

The “search results pane” documentation was recently updated. Does that help explain it any better?

(Also, I think v7.9.6 is going to have some improvements in the copy/paste behavior from search results, based on the notes in the user manual submissions.)

Alan Kilborn

Perhaps OP is asking how to copy only the matching filenames?
If so, there’s no direct way to do it.
One can do it by selecting and copying all of the text from the search results window, and then manipulating it so that only the filenames remain, but that’s a bit of an intense operation…

guy038

Hello, @dan-walter, @peterjones, @alan-kilborn and All,

Not a big task, indeed! I tested the method below with N++ v7.9.2. So:

Perform your search with either the Find All in Current Document, Find All in All Opened Documents or Find All button
Once the Search Results panel is open, right-click anywhere in it
Choose the Select All option
Choose the Copy option ( Not the Copy Selected Line(s) )
Open a new tab ( Ctrl + N )
Paste all the clipboard contents with Ctrl + V
Open the Replace dialog ( Ctrl + H )
- SEARCH (?-s)^\t(?:[^\r\n]|.[\x{D800}-\x{DFFF}])+\R
- REPLACE Leave EMPTY
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Replace All button

Voilà!

=> You should get the list of the absolute paths of all the files concerned with the current search

Notes: every unwanted line begins with a tab character, so:

First, the regex engine searches for a tab character at the start of the current line ( ^\t )
Then it matches a non-empty run of characters, each of which is either:
- A character of the BMP, other than the EOL chars ( [^\r\n] )
- A character outside the BMP, i.e. with a Unicode code point > U+FFFF ( .[\x{D800}-\x{DFFF}] )
The final \R syntax matches the EOL characters of the current line, whatever they are
As the replacement field is empty, all the lines beginning with a tab character are deleted
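As a rough cross-check outside Notepad++, the Replace step can be sketched in Python. This is an assumed translation, not what Notepad++ runs: Python's re module has no \R (approximated here by CRLF or LF), and because Python strings hold whole code points, the surrogate branch is not needed. The sample text is made up in the layout of the Search Results panel.

```python
import re

# Assumed Python translation of the Notepad++ (Boost) search
#   (?-s)^\t(?:[^\r\n]|.[\x{D800}-\x{DFFF}])+\R
# Python strings hold whole code points, so [^\r\n] already covers characters
# above U+FFFF and no surrogate branch is needed. \R is approximated by
# (?:\r\n|\n), i.e. CRLF or LF line endings only.
HIT_LINE = re.compile(r"^\t[^\r\n]+(?:\r\n|\n)", re.MULTILINE)


def drop_hit_lines(results: str) -> str:
    """Delete every line that starts with a tab (the per-hit lines)."""
    return HIT_LINE.sub("", results)


# Made-up sample in the layout of the Search Results panel.
sample = (
    'Search "software" (2 hits in 1 file of 1 searched)\r\n'
    "  C:\\...\\npp.7.9.5.portable.x64\\license.txt (2 hits)\r\n"
    "\tLine 5: ... software ...\r\n"
    "\tLine 9: ... software ...\r\n"
)

print(drop_hit_lines(sample), end="")
```

Only the header line and the indented filename line survive, which is the leftover Alan points out in his reply.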

Best Regards

guy038

Alan Kilborn

@guy038 said in Find in files, copy filenames:

You should get the list of the absolute paths of all the files concerned with the current search

Isn’t one left with a bit more than that?

If I try it by doing a FACD (Find All in Current Document) for “software” in the license.txt file, I get:

Search "software" (23 hits in 1 file of 1 searched)
  C:\...\npp.7.9.5.portable.x64\license.txt (23 hits)

when I really expected to get only:

C:\...\npp.7.9.5.portable.x64\license.txt

guy038

Hi, @dan-walter, @peterjones, @alan-kilborn and All,

For people who are wondering:

Why doesn’t he use the usual . regex char to match a standard character of the BMP?
Why does he care about characters with a Unicode code point > U+FFFF?

Well, regarding the first question:

Some characters, like the Form Feed character ( \x0C - \f ), are not matched by the regex Dot char!
Some characters, such as FF - Form Feed \x{000C}, NEL - Next Line \x{0085}, LS - Line Separator \x{2028} and PS - Paragraph Separator \x{2029}, act in the same way as the classic EOL chars ( \r and/or \n ). So, the ^ assertion matches right after these characters and the $ assertion matches right before them!
Consequently, the simple regex [^\r\n] seems the best syntax to match all characters of the Unicode Basic Multilingual Plane except the two \r and \n chars!
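The two ideas above can be illustrated in Python, with one caveat: Python's re engine is not Boost, so this sketch cannot reproduce how the Dot behaves in Notepad++. It only shows that FF, NEL, LS and PS are commonly treated as line boundaries (here by Python's str.splitlines()), and that a [^\r\n] class, which excludes only CR and LF, treats them as ordinary characters.

```python
import re

# Python's re engine is not Boost, so this cannot reproduce the Dot behaviour
# in Notepad++; it only illustrates the two ideas above.
text = "a\x0cb\x85c\u2028d\u2029e"   # FF, NEL, LS and PS between letters

# 1. Other software also treats FF, NEL, LS and PS as line boundaries:
#    str.splitlines() breaks the string at each of them.
print(text.splitlines())

# 2. [^\r\n] excludes only CR and LF, so it sees those four characters as
#    ordinary characters and matches the whole string in a single run.
print(re.findall(r"[^\r\n]+", text))
```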

Now, regarding the second question:

When a file is UTF-8 encoded, it may contain characters with a Unicode code point > U+FFFF ( each one stored as a 4-byte sequence ). Unfortunately, the usual regex syntax . cannot find these characters.
After some tests, the BOOST regex syntax .[\x{D800}-\x{DFFF}] seems to correctly match all these characters outside the BMP.
Although I don’t have any clear explanation for this syntax, I suppose it is related to the surrogate pair mechanism used to express these characters in two-byte ( UTF-16 ) encoded files!
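To make the surrogate-pair guess concrete, here is the standard UTF-16 arithmetic in Python. It only shows how a code point above U+FFFF becomes two 16-bit units, the second one always in DC00..DFFF, which lies inside D800..DFFF. Whether Notepad++'s Boost engine really sees the text this way, with . consuming the first unit and [\x{D800}-\x{DFFF}] the second, is an assumption in line with the supposition above, not something confirmed here. U+1F600 is just an example character.

```python
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)   # lead surrogate, D800..DBFF
    low = 0xDC00 + (offset & 0x3FF)  # trail surrogate, DC00..DFFF
    return high, low


ch = "\U0001F600"
high, low = utf16_surrogates(ord(ch))
print(f"U+{ord(ch):X} -> {high:04X} {low:04X}")   # U+1F600 -> D83D DE00
print(len(ch.encode("utf-8")))                     # 4 bytes in UTF-8

# Cross-check against Python's own UTF-16 encoder.
assert ch.encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
```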

So, here is the third question, from Alan:

Why does the result contain the first line: Search "•••••" (••••• hits in ••• files of ••• searched)?

Well, to my mind, this first line is rather informative. That is the only reason why I kept it in the final result!

Now, to be rigorous, and in order to also delete that first line and the two leading space chars before each absolute path, prefer this regex S/R:

SEARCH (?-s)^\t(?:[^\r\n]|.[\x{D800}-\x{DFFF}])+\R|^\x20\x20(.)|\A.+\R
REPLACE ?1\1
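For readers who want to see the three alternatives at work, here is an assumed Python translation of this S/R. Boost's conditional replacement ?1\1 (write group 1 only if it took part in the match) is emulated with a replacement function, \R is approximated by CRLF or LF, and the sample text is made up.

```python
import re

# Assumed Python translation of the regex S/R
#   SEARCH  (?-s)^\t(?:[^\r\n]|.[\x{D800}-\x{DFFF}])+\R|^\x20\x20(.)|\A.+\R
#   REPLACE ?1\1
# Boost's conditional replacement ?1\1 ("write group 1 only if it took part
# in the match") is emulated with a function. \R is approximated by (?:\r\n|\n).
PATTERN = re.compile(
    r"^\t[^\r\n]+(?:\r\n|\n)"  # a hit line: deleted
    r"|^\x20\x20(.)"           # two leading spaces: only the next char is kept
    r"|\A.+(?:\r\n|\n)",       # the "Search ..." header line: deleted
    re.MULTILINE,
)


def tidy(results: str) -> str:
    return PATTERN.sub(lambda m: m.group(1) or "", results)


# Made-up sample in the layout of the Search Results panel.
sample = (
    'Search "software" (2 hits in 1 file of 1 searched)\r\n'
    "  C:\\...\\npp.7.9.5.portable.x64\\license.txt (2 hits)\r\n"
    "\tLine 5: ... software ...\r\n"
    "\tLine 9: ... software ...\r\n"
)

print(tidy(sample), end="")
```

Note that this S/R removes the header, the hit lines and the two leading spaces, but leaves the trailing (# hits) text after each path.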

Oh… wait a minute! Here is another, much simpler method ;-)) However, you need at least N++ release v7.9.1.

Perform your search with either the Find All in Current Document, Find All in All Opened Documents or Find All button
Once the Search Results panel is open, right-click anywhere in it
- Choose the Select All option
- Choose the Copy option ( not the Copy Selected Line(s) option )
- Press the Esc key to close the Search Results panel
Open a new tab ( Ctrl + N )
Paste all the clipboard contents with Ctrl + V
Open the Mark dialog ( Ctrl + M )
- SEARCH ^\x20\x20\K.+(?=\x20.*\d.*$)
- Untick all the checkbox options
- Tick the Wrap around option
- Select the Regular expression search mode
- Click on the Mark All button
- Click on the Copy Marked Text button
- Press the Esc key to close the Mark dialog
Open another new tab ( Ctrl + N ), or replace the contents of the current new tab
Paste all the clipboard contents with Ctrl + V

Et voilà! Done ;-))
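The Mark step can be approximated in Python as a sketch. Python's re module has no \K, so this assumed translation captures the text after the two leading spaces in a group instead, and findall() returns just that group, roughly what Mark All followed by Copy Marked Text collects. The two-file sample is made up.

```python
import re

# Assumed Python stand-in for the Mark regex ^\x20\x20\K.+(?=\x20.*\d.*$).
# Python's re module has no \K, so the text after the two spaces is captured
# in group 1 instead, and findall() returns only that group: roughly what
# Mark All followed by Copy Marked Text collects.
FILENAME = re.compile(r"^\x20\x20(.+)(?=\x20.*\d.*$)", re.MULTILINE)


def marked_paths(results: str) -> list[str]:
    return FILENAME.findall(results)


# Made-up two-file sample in the Search Results layout.
sample = (
    'Search "software" (3 hits in 2 files of 2 searched)\r\n'
    "  C:\\...\\npp.7.9.5.portable.x64\\license.txt (2 hits)\r\n"
    "\tLine 5: ... software ...\r\n"
    "\tLine 9: ... software ...\r\n"
    "  D:\\@@\\Doc N++\\Abc def_789(012).txt (1 hits)\r\n"
    "\tLine 1: ... software ...\r\n"
)

print("\n".join(marked_paths(sample)))
```

The greedy .+ combined with the lookahead makes the match stop at the last space that is still followed by a digit, which is why paths containing spaces come out whole.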

Best Regards,

guy038

P.S.:

Each filename line displayed:

Always begins with 2 space characters ( hard-coded )
Is followed by an absolute path, which may contain space chars and characters other than word chars!
Is followed by 1 space character ( hard-coded )
Ends with the default text (# hits), where # stands for a mandatory integer
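If only the default English text had to be handled, the structure just described would allow a strict pattern. The Python sketch below is hypothetical: the optional "s" in hits? is an assumption in case a singular form is ever printed, and translated texts will not match it, which is why a looser regex is needed.

```python
import re

# Hypothetical strict pattern for the default English layout described above:
# 2 spaces, absolute path, 1 space, then "(# hits)". The optional "s" is an
# assumption in case a singular "hit" is ever printed. Translated texts will
# not match it.
DEFAULT_LINE = re.compile(r"^\x20\x20(.+)\x20\((\d+) hits?\)$")


def split_default_line(line: str):
    """Return (path, count) for a default-format filename line, else None."""
    m = DEFAULT_LINE.match(line)
    return (m.group(1), int(m.group(2))) if m else None


print(split_default_line(r"  C:\...\npp.7.9.5.portable.x64\license.txt (23 hits)"))
print(split_default_line(r"  D:\@@\Doc N++\Abc def_789(012).txt 456"))
```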

However, as the text after each absolute path is now translatable, we must consider various possibilities. Here are some of them:

  D:\@@\Doc N++\Abc def_789(012).txt 1
  D:\@@\Doc N++\Abc def_789(012).txt 	1
  D:\@@\Doc N++\Abc def_789(012).123 1
  D:\@@\Doc N++\Abc def_789(012).123 	1
  D:\@@\Doc N++\Abc def_789(012) 1
  D:\@@\Doc N++\Abc def_789(012) 	1
  D:\@@\Doc N++\Abc def_789(012).txt 1 hits
  D:\@@\Doc N++\Abc def_789(012).txt 	1 hits
  D:\@@\Doc N++\Abc def_789(012).123 1 hits
  D:\@@\Doc N++\Abc def_789(012).123 	1 hits
  D:\@@\Doc N++\Abc def_789(012) 1 hits
  D:\@@\Doc N++\Abc def_789(012) 	1 hits
  D:\@@\Doc N++\Abc def_789(012).txt 456
  D:\@@\Doc N++\Abc def_789(012).txt 	456
  D:\@@\Doc N++\Abc def_789(012).123 456
  D:\@@\Doc N++\Abc def_789(012).123 	456
  D:\@@\Doc N++\Abc def_789(012) 456
  D:\@@\Doc N++\Abc def_789(012) 	456
  D:\@@\Doc N++\Abc def_789(012).txt 456 hits
  D:\@@\Doc N++\Abc def_789(012).txt 	456 hits
  D:\@@\Doc N++\Abc def_789(012).123 456 hits
  D:\@@\Doc N++\Abc def_789(012).123 	456 hits
  D:\@@\Doc N++\Abc def_789(012) 456 hits
  D:\@@\Doc N++\Abc def_789(012) 	456 hits

  D:\@@\Doc N++\Abc def_789(012).txt (1)
  D:\@@\Doc N++\Abc def_789(012).txt 	(1)
  D:\@@\Doc N++\Abc def_789(012).123 (1)
  D:\@@\Doc N++\Abc def_789(012).123 	(1)
  D:\@@\Doc N++\Abc def_789(012) (1)
  D:\@@\Doc N++\Abc def_789(012) 	(1)
  D:\@@\Doc N++\Abc def_789(012).txt (1 hits)
  D:\@@\Doc N++\Abc def_789(012).txt 	(1 hits)
  D:\@@\Doc N++\Abc def_789(012).123 (1 hits)
  D:\@@\Doc N++\Abc def_789(012).123 	(1 hits)
  D:\@@\Doc N++\Abc def_789(012) (1 hits)
  D:\@@\Doc N++\Abc def_789(012) 	(1 hits)
  D:\@@\Doc N++\Abc def_789(012).txt (456)
  D:\@@\Doc N++\Abc def_789(012).txt 	(456)
  D:\@@\Doc N++\Abc def_789(012).123 (456)
  D:\@@\Doc N++\Abc def_789(012).123 	(456)
  D:\@@\Doc N++\Abc def_789(012) (456)
  D:\@@\Doc N++\Abc def_789(012) 	(456)
  D:\@@\Doc N++\Abc def_789(012).txt (456 hits)
  D:\@@\Doc N++\Abc def_789(012).txt 	(456 hits)
  D:\@@\Doc N++\Abc def_789(012).123 (456 hits)
  D:\@@\Doc N++\Abc def_789(012).123 	(456 hits)
  D:\@@\Doc N++\Abc def_789(012) (456 hits)
  D:\@@\Doc N++\Abc def_789(012) 	(456 hits)

  D:\@@\Doc N++\Abc def_789(012).txt ( 1 )
  D:\@@\Doc N++\Abc def_789(012).txt 	( 1 )
  D:\@@\Doc N++\Abc def_789(012).123 ( 1 )
  D:\@@\Doc N++\Abc def_789(012).123 	( 1 )
  D:\@@\Doc N++\Abc def_789(012) ( 1 )
  D:\@@\Doc N++\Abc def_789(012) 	( 1 )
  D:\@@\Doc N++\Abc def_789(012).txt ( 1 hits )
  D:\@@\Doc N++\Abc def_789(012).txt 	( 1 hits )
  D:\@@\Doc N++\Abc def_789(012).123 ( 1 hits )
  D:\@@\Doc N++\Abc def_789(012).123 	( 1 hits )
  D:\@@\Doc N++\Abc def_789(012) ( 1 hits )
  D:\@@\Doc N++\Abc def_789(012) 	( 1 hits )
  D:\@@\Doc N++\Abc def_789(012).txt ( 456 )
  D:\@@\Doc N++\Abc def_789(012).txt 	( 456 )
  D:\@@\Doc N++\Abc def_789(012).123 ( 456 )
  D:\@@\Doc N++\Abc def_789(012).123 	( 456 )
  D:\@@\Doc N++\Abc def_789(012) ( 456 )
  D:\@@\Doc N++\Abc def_789(012) 	( 456 )
  D:\@@\Doc N++\Abc def_789(012).txt ( 456 hits )
  D:\@@\Doc N++\Abc def_789(012).txt 	( 456 hits )
  D:\@@\Doc N++\Abc def_789(012).123 ( 456 hits )
  D:\@@\Doc N++\Abc def_789(012).123 	( 456 hits )
  D:\@@\Doc N++\Abc def_789(012) ( 456 hits )
  D:\@@\Doc N++\Abc def_789(012) 	( 456 hits )

You can verify that the regex ^\x20\x20\K.+(?=\x20.*\d.*$) correctly matches most of these lines. Unfortunately, it’s not perfect: in the third group, the match also includes the blank chars and the opening bracket that follow the path :-((
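One way to do that verification mechanically is to rebuild the 72 test lines from their parts and count the bad matches per group. This is an assumed per-line Python emulation (group 1 stands in for the text that \K leaves marked), not Notepad++ itself.

```python
import re

# Assumed per-line emulation of ^\x20\x20\K.+(?=\x20.*\d.*$); group 1 stands
# in for the text that \K leaves marked.
MARK = re.compile(r"^\x20\x20(.+)(?=\x20.*\d.*$)")

# The pieces that the 72 test lines above are built from.
STEMS = [
    r"D:\@@\Doc N++\Abc def_789(012).txt",
    r"D:\@@\Doc N++\Abc def_789(012).123",
    r"D:\@@\Doc N++\Abc def_789(012)",
]
SEPARATORS = [" ", " \t"]            # a space, or a space then a tab
COUNTS = ["1", "1 hits", "456", "456 hits"]
WRAPPERS = ["{}", "({})", "( {} )"]   # first, second and third group


def extract(line: str):
    m = MARK.match(line)
    return m.group(1) if m else None


def count_wrong(wrapper: str) -> int:
    """Number of lines in one group whose marked text is not the bare path."""
    wrong = 0
    for stem in STEMS:
        for sep in SEPARATORS:
            for count in COUNTS:
                line = "  " + stem + sep + wrapper.format(count)
                if extract(line) != stem:
                    wrong += 1
    return wrong


for wrapper in WRAPPERS:
    print(f"{wrapper!r}: {count_wrong(wrapper)} wrong out of 24")
```

Under this emulation the first two groups come out clean, while every line of the third group keeps a trailing " (" or " \t(" after the path.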

You could say: in the range of characters .+, we should not allow an opening bracket right after a space char. But if you do so, how would you match, for instance, a file named Abc (012? In that case, only the Abc part would be matched :-(
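Here is a sketch of that suggested tweak, again as an assumed Python emulation. The hypothetical variant uses a "tempered" dot, (?:(?!\x20\().)+, so the captured path can never contain a space immediately followed by "(". This literal version only blocks a space directly before the bracket, and the D:\@@\Abc (012 path is a made-up example.

```python
import re

# Hypothetical variant of the Mark regex, following the suggestion above:
# the captured path may never contain a space immediately followed by "(".
# (?:(?!\x20\().)+ is a "tempered" dot: each character is consumed only if
# a space-bracket pair does not start at that position.
NO_SPACE_BRACKET = re.compile(r"^\x20\x20((?:(?!\x20\().)+)(?=\x20.*\d.*$)")


def extract_no_bracket(line: str):
    m = NO_SPACE_BRACKET.match(line)
    return m.group(1) if m else None


# The third-group case is now clean...
print(extract_no_bracket(r"  D:\@@\Doc N++\Abc def_789(012).txt ( 1 hits )"))
# ...but a file really named "Abc (012" is cut short.
print(extract_no_bracket(r"  D:\@@\Abc (012 (1 hits)"))
```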

In short, the present regex ^\x20\x20\K.+(?=\x20.*\d.*$) may catch some extra characters on rare occasions. But, knowing the filenames you’re dealing with, it should not be difficult to correct these few erroneous paths ;-))