Hi, @dan-walter, @peterjones, @alan-kilborn and All,
For people who are wondering :
Why doesn’t he use the usual . regex char to match a standard character of the BMP ?
Why does he care about characters with Unicode code-point > U+FFFF ?
Well, regarding the first question :
Some characters, like the Form Feed character ( \x0C - \f ), are not matched by the regex Dot char !
Some characters, such as FF - Form Feed \x0C, NEL - Next Line \x{0085}, LS - Line Separator \x{2028} and PS - Paragraph Separator \x{2029}, act in the same way as the classical EOL chars ( \r and/or \n ). So, the ^ assertion matches right after these characters and the $ assertion matches right before these characters !
Consequently, the simple regex [^\r\n] seems the best syntax to match all characters of the Unicode Basic Multilingual Plane but the two \r and \n chars !
Now, regarding the second question :
When a file is UTF-8 encoded, it may contain characters with Unicode code-point > U+FFFF ( each of them encoded as a 4-byte sequence ). Unfortunately, the usual regex syntax . cannot find these characters.
After some tests, the BOOST regex .[\x{D800}-\x{DFFF}] syntax seems to match correctly all these characters over the BMP.
Although I don’t have any clear explanation for this syntax, I suppose that it’s related to the surrogate pair mechanism, used to express these characters in UTF-16 ( two-byte ) encoded files !
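To make the surrogate pair idea a bit more concrete, here is a minimal Python sketch of the standard UTF-16 computation. The helper name utf16_surrogates is just an illustration, and this is not what Notepad++ runs internally. It only shows that any character over the BMP becomes two 16-bit units, the second one always lying in the \x{D800}-\x{DFFF} range, which would be consistent with . taking the first unit and the class [\x{D800}-\x{DFFF}] taking the second one :

```python
def utf16_surrogates(ch: str) -> tuple[int, int]:
    """Return the (high, low) UTF-16 surrogate pair of one character over the BMP."""
    cp = ord(ch)
    if cp <= 0xFFFF:
        raise ValueError("BMP characters are not encoded with a surrogate pair")
    v = cp - 0x10000                 # 20-bit value, split into two 10-bit halves
    high = 0xD800 + (v >> 10)        # high surrogate, in \x{D800}-\x{DBFF}
    low = 0xDC00 + (v & 0x3FF)       # low surrogate, in \x{DC00}-\x{DFFF}
    return high, low


ch = "\U0001F600"                    # U+1F600, a character over the BMP
high, low = utf16_surrogates(ch)
print(f"{high:04X} {low:04X}")       # D83D DE00
print(len(ch.encode("utf-8")))       # 4 ( the 4-byte UTF-8 sequence )
```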
So, here is the third question, from Alan :
Why does the result contain the first line : Search "•••••" (••••• hits in ••• files of ••• searched) ?
Well, to my mind, this first line seems rather informative. This is the only reason why I kept this line, in the final result !
Now to be rigorous, and in order to delete the leading space chars before the absolute paths, too, prefer this regex S/R :
SEARCH (?-s)^\t(?:[^\r\n]|.[\x{D800}-\x{DFFF}])+\R|^\x20\x20(.)|\A.+\R
REPLACE ?1\1
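For those who would like to check the logic of this S/R outside N++, here is a rough Python emulation. It is only a sketch under several assumptions : Python's re module knows neither \R nor the conditional ?1\1 replacement, so \R is approximated by (?:\r\n|\n|\r) and the replacement by a small function. The surrogate alternative is left out, because Python strings are sequences of full code-points. The sample text is invented for the demonstration :

```python
import re

# Three alternatives, as in the N++ regex :
#   1. a whole hit line, beginning with a TAB       -> deleted
#   2. the two leading spaces of a path line + 1 ch -> only the char ( group 1 ) is kept
#   3. the very first line of the text              -> deleted
SR_RE = re.compile(
    r"^\t[^\r\n]+(?:\r\n|\n|\r)|^\x20\x20(.)|\A[^\r\n]+(?:\r\n|\n|\r)",
    re.MULTILINE,
)


def clean_results(text: str) -> str:
    # Emulates the replacement ?1\1 : rewrite group 1 if it took part in the match
    return SR_RE.sub(lambda m: m.group(1) or "", text)


# Invented sample, shaped like a copy of the Search Results panel
sample = (
    'Search "foo" (3 hits in 2 files of 5 searched)\r\n'
    "  D:\\a b\\x.txt (2 hits)\r\n"
    "\tLine 1: foo\r\n"
    "\tLine 3: foo\r\n"
    "  D:\\a b\\y.txt (1 hit)\r\n"
    "\tLine 7: foo\r\n"
)

print(clean_results(sample))
```

With this sample, only the two path lines remain, without their two leading spaces.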
Oh… wait a minute ! Here is another method, much simpler ;-)) However, you need, at least, N++ release v7.9.1
Perform your search with one of the buttons Find All in Current Document, Find All in All Opened Documents or Find All
Once the Search Results panel is open, right-click anywhere in the Search Results panel
Choose the Select All option
Choose the Copy option ( not the Copy Selected Line(s) option )
Press the Esc key to close the Search Results panel
Open a new tab ( Ctrl + N )
Paste all the clipboard contents with Ctrl + V
Open the Mark dialog ( Ctrl + M )
SEARCH ^\x20\x20\K.+(?=\x20.*\d.*$)
Untick all the square box options
Tick the Wrap around option
Select the Regular expression search mode
Click on the Mark All button
Click on the Copy Marked Text button
Press the Esc key to close the Mark dialog
Open, again, a new tab ( Ctrl + N ) or replace the contents of the present new tab
Paste all the clipboard contents with Ctrl + V
Et voilà ! Done ;-))
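If you prefer to see what the Mark All / Copy Marked Text steps above select, the same extraction can be sketched in Python. Again, this is just an illustration, not the N++ behaviour itself : Python's re has no \K, so a capturing group plays its role, and the sample text, as well as the name extract_paths, are invented for the example :

```python
import re

# Same logic as ^\x20\x20\K.+(?=\x20.*\d.*$) : the group replaces the \K feature
MARK_RE = re.compile(r"^\x20\x20(.+)(?=\x20.*\d.*$)", re.MULTILINE)


def extract_paths(results: str) -> list[str]:
    """Return what the Mark dialog would mark, one item per path line."""
    return MARK_RE.findall(results)


# Invented sample, shaped like the pasted contents of the Search Results panel
sample = "\n".join([
    'Search "foo" (3 hits in 2 files of 5 searched)',
    r"  D:\@@\Doc N++\Abc def_789(012).txt (2 hits)",
    "\tLine 1: foo",
    "\tLine 3: foo",
    r"  D:\@@\Doc N++\notes.txt (1 hit)",
    "\tLine 7: foo",
])

for path in extract_paths(sample):
    print(path)
# D:\@@\Doc N++\Abc def_789(012).txt
# D:\@@\Doc N++\notes.txt
```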
Best Regards,
guy038
P.S. :
Each line displayed :
Always begins with 2 space characters ( hard-coded )
Is followed by an absolute path, which may contain some space chars and characters other than word chars !
Then comes 1 space character ( hard-coded )
And ends with the default text (# hits), where # stands for a mandatory integer number
However, as the text after each absolute path is now translatable, we must consider various possibilities. Here are some of them :
D:\@@\Doc N++\Abc def_789(012).txt 1
D:\@@\Doc N++\Abc def_789(012).txt 1
D:\@@\Doc N++\Abc def_789(012).123 1
D:\@@\Doc N++\Abc def_789(012).123 1
D:\@@\Doc N++\Abc def_789(012) 1
D:\@@\Doc N++\Abc def_789(012) 1
D:\@@\Doc N++\Abc def_789(012).txt 1 hits
D:\@@\Doc N++\Abc def_789(012).txt 1 hits
D:\@@\Doc N++\Abc def_789(012).123 1 hits
D:\@@\Doc N++\Abc def_789(012).123 1 hits
D:\@@\Doc N++\Abc def_789(012) 1 hits
D:\@@\Doc N++\Abc def_789(012) 1 hits
D:\@@\Doc N++\Abc def_789(012).txt 456
D:\@@\Doc N++\Abc def_789(012).txt 456
D:\@@\Doc N++\Abc def_789(012).123 456
D:\@@\Doc N++\Abc def_789(012).123 456
D:\@@\Doc N++\Abc def_789(012) 456
D:\@@\Doc N++\Abc def_789(012) 456
D:\@@\Doc N++\Abc def_789(012).txt 456 hits
D:\@@\Doc N++\Abc def_789(012).txt 456 hits
D:\@@\Doc N++\Abc def_789(012).123 456 hits
D:\@@\Doc N++\Abc def_789(012).123 456 hits
D:\@@\Doc N++\Abc def_789(012) 456 hits
D:\@@\Doc N++\Abc def_789(012) 456 hits
D:\@@\Doc N++\Abc def_789(012).txt (1)
D:\@@\Doc N++\Abc def_789(012).txt (1)
D:\@@\Doc N++\Abc def_789(012).123 (1)
D:\@@\Doc N++\Abc def_789(012).123 (1)
D:\@@\Doc N++\Abc def_789(012) (1)
D:\@@\Doc N++\Abc def_789(012) (1)
D:\@@\Doc N++\Abc def_789(012).txt (1 hits)
D:\@@\Doc N++\Abc def_789(012).txt (1 hits)
D:\@@\Doc N++\Abc def_789(012).123 (1 hits)
D:\@@\Doc N++\Abc def_789(012).123 (1 hits)
D:\@@\Doc N++\Abc def_789(012) (1 hits)
D:\@@\Doc N++\Abc def_789(012) (1 hits)
D:\@@\Doc N++\Abc def_789(012).txt (456)
D:\@@\Doc N++\Abc def_789(012).txt (456)
D:\@@\Doc N++\Abc def_789(012).123 (456)
D:\@@\Doc N++\Abc def_789(012).123 (456)
D:\@@\Doc N++\Abc def_789(012) (456)
D:\@@\Doc N++\Abc def_789(012) (456)
D:\@@\Doc N++\Abc def_789(012).txt (456 hits)
D:\@@\Doc N++\Abc def_789(012).txt (456 hits)
D:\@@\Doc N++\Abc def_789(012).123 (456 hits)
D:\@@\Doc N++\Abc def_789(012).123 (456 hits)
D:\@@\Doc N++\Abc def_789(012) (456 hits)
D:\@@\Doc N++\Abc def_789(012) (456 hits)
D:\@@\Doc N++\Abc def_789(012).txt ( 1 )
D:\@@\Doc N++\Abc def_789(012).txt ( 1 )
D:\@@\Doc N++\Abc def_789(012).123 ( 1 )
D:\@@\Doc N++\Abc def_789(012).123 ( 1 )
D:\@@\Doc N++\Abc def_789(012) ( 1 )
D:\@@\Doc N++\Abc def_789(012) ( 1 )
D:\@@\Doc N++\Abc def_789(012).txt ( 1 hits )
D:\@@\Doc N++\Abc def_789(012).txt ( 1 hits )
D:\@@\Doc N++\Abc def_789(012).123 ( 1 hits )
D:\@@\Doc N++\Abc def_789(012).123 ( 1 hits )
D:\@@\Doc N++\Abc def_789(012) ( 1 hits )
D:\@@\Doc N++\Abc def_789(012) ( 1 hits )
D:\@@\Doc N++\Abc def_789(012).txt ( 456 )
D:\@@\Doc N++\Abc def_789(012).txt ( 456 )
D:\@@\Doc N++\Abc def_789(012).123 ( 456 )
D:\@@\Doc N++\Abc def_789(012).123 ( 456 )
D:\@@\Doc N++\Abc def_789(012) ( 456 )
D:\@@\Doc N++\Abc def_789(012) ( 456 )
D:\@@\Doc N++\Abc def_789(012).txt ( 456 hits )
D:\@@\Doc N++\Abc def_789(012).txt ( 456 hits )
D:\@@\Doc N++\Abc def_789(012).123 ( 456 hits )
D:\@@\Doc N++\Abc def_789(012).123 ( 456 hits )
D:\@@\Doc N++\Abc def_789(012) ( 456 hits )
D:\@@\Doc N++\Abc def_789(012) ( 456 hits )
You can verify that the regex ^\x20\x20\K.+(?=\x20.*\d.*$) matches most of these lines. Unfortunately, it’s not perfect because, in the third part, it matches the opening bracket and the blank chars before it :-((
You could say : in the range of characters .+, we should not allow an opening bracket right after a space char. But, if you do so, how to match, for instance, a file with name Abc (012 !? In this case, only the Abc part would be matched :-(
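Here is a small Python check of these two points. It is only a sketch : a capturing group stands for \K, and the STRICT regex is my own hypothetical way to write the rule "no opening bracket right after a space char", not a syntax proposed above. The file name Abc (012 comes from the text, but the path around it is invented :

```python
import re

# Present regex, with a group instead of \K
CURRENT = re.compile(r"^\x20\x20(.+)(?=\x20.*\d.*$)")
# Hypothetical stricter variant : the marked range never contains a space followed by (
STRICT = re.compile(r"^\x20\x20((?:(?!\x20\().)+)(?=\x20.*\d.*$)")


def marked(regex: re.Pattern, line: str):
    """Return the text that would be marked in the line, or None if no match."""
    m = regex.search(line)
    return m.group(1) if m else None


path = r"D:\@@\Doc N++\Abc def_789(012).txt"

print(marked(CURRENT, "  " + path + " (456 hits)"))     # the exact path
print(marked(CURRENT, "  " + path + " ( 456 hits )"))   # the path + " (" , so 2 extra chars
print(marked(STRICT, "  " + path + " ( 456 hits )"))    # the exact path

odd = r"  D:\@@\Abc (012 (3 hits)"                      # file named : Abc (012
print(marked(CURRENT, odd))                             # D:\@@\Abc (012
print(marked(STRICT, odd))                              # D:\@@\Abc  -> name truncated
```

So the stricter variant repairs the ( 456 hits ) case but breaks the Abc (012 case, which is the trade-off described above.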
In short, the present regex ^\x20\x20\K.+(?=\x20.*\d.*$) may catch some extra characters, on rare occasions. But, knowing the filenames you’re dealing with, it should not be difficult to correct the syntax of these few erroneous paths ;-))