Search text in source code excluding comments
-
Hello.
Notepad++'s search functionality is great, and far superior to that of most IDEs I use.
I would find it extremely useful to be able to exclude commented text while searching my source code for a given text.
This issue has been mentioned by other people on the web regarding Visual Studio and other tools. However, I believe that it has not been implemented so far.
Please comment if it is there already and I have missed it.
Thanks. Regards.
-
Hello, @eduardog26,
As always, we can use regular expressions to simulate a NON-comment search! The generic regex search/replace (S/R) is:
SEARCH
(?s)<Multi-Lines START Comment>.*<Multi-Lines END Comment>|(?-s)<ONE-Line Comment>.*|(Regex)
REPLACE
?1\U\1:$0
For instance, considering the C language, with:
- Multi-Lines START Comment = /*
- Multi-Lines END Comment = */
- ONE-Line Comment = //
this generic search regex becomes:
(?s)/\*.*\*/|(?-s)//.*|(Regex)
Now, given the example text, below :
This is //A comment
Oh ! A non comment line
This is//A comment, with a comment symbol // !!
Line of code //Comment //
This is /*A comment
Still a COMMENT line
This is//A comment, with a comment symbol // !!
Bla Bla Bla */ Instructions {
A non comment line, too
Line of code//Last comment
Let’s suppose that we want to upper-case any word containing at least one letter O or o.
A possible regex for catching this kind of word is:
(?i)\b\w*o\w*\b
Therefore, the final S/R is:
SEARCH
(?s)/\*.*\*/|(?-s)//.*|(?i)(\b\w*o\w*\b)
REPLACE
?1\U\1:$0
And, after clicking on the Replace All button, we get the modified text, below :
This is //A comment
OH ! A NON COMMENT line
This is//A comment, with a comment symbol // !!
Line OF CODE //Comment //
This is /*A comment
Still a COMMENT line
This is//A comment, with a comment symbol // !!
Bla Bla Bla */ INSTRUCTIONS {
A NON COMMENT line, TOO
Line OF CODE//Last comment
As expected, all the words that contain a letter O or o and lie outside the comment areas are changed to upper-case!
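For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. It is only an illustration, not part of the original answer. Python's re module has no ?1...:... conditional replacement, so a callback function plays that role. The name upper_outside_comments is hypothetical. One deliberate difference from the regex above: the block-comment alternative uses a lazy .*? so that two separate block comments are not merged into a single match.

```python
import re

# Three alternatives, as in the thread's C example:
#   1) a block comment  /* ... */   (dot matches line-breaks, lazy quantifier)
#   2) a one-line comment  // ...   (up to the end of the line)
#   3) the "active" text, captured as group 1
# Scoped inline flags such as (?s:...) are used because Python does not
# accept global flags in the middle of a pattern.
PATTERN = re.compile(r"(?s:/\*.*?\*/)|//[^\n]*|(?i:(\b\w*o\w*\b))")


def upper_outside_comments(text: str) -> str:
    """Upper-case every word containing o/O that lies outside a comment."""

    def repl(match: re.Match) -> str:
        # Emulates the conditional replacement  ?1\U\1:$0
        if match.group(1) is not None:
            return match.group(1).upper()  # group 1 exists: real code
        return match.group(0)              # a comment: rewritten unchanged

    return PATTERN.sub(repl, text)


# A few lines borrowed from the thread's sample text.
sample = (
    "Oh ! A non comment line\n"
    "Line of code //Comment //\n"
    "This is /*A comment\n"
    "Bla Bla Bla */ Instructions {\n"
)
print(upper_outside_comments(sample))
```

Note that this sketch, like the regex it mirrors, knows nothing about string literals: a // inside a quoted string would be treated as the start of a comment.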
Notes :
- The search regex contains three alternatives, separated by two | regex symbols:
  - The first alternative, (?s)/\*.*\*/ , looks for any multi-line comment. With the (?s) modifier, the dot matches any single character, line-breaks included
  - The second alternative, (?-s)//.* , looks for any one-line comment. With the (?-s) modifier, the dot matches any single character except line-breaks
  - The third alternative, (?i)(\b\w*o\w*\b) , is our active text, stored as group 1, due to the enclosing parentheses
- It’s important to note the order of the search: first, the comment areas; then, only if no comment matches, the interesting text to search for!
- Now, the replacement regex ?1\U\1:$0 is a conditional replacement, which means:
  - IF group 1 does NOT exist, the part after the colon : is executed. Then, as $0 stands for the complete match, any comment area, which is outside group 1, is just re-written unchanged!
  - IF group 1 exists, the part between ?1 and the colon : is executed. Then, due to the \U\1 code, group 1 ( any word containing a letter o, whatever its case, due to the (?i) modifier ) is re-written in upper-case
IMPORTANT :
When I first thought about it, my regex tried to catch NON-comment text only! But, although I succeeded in building correct regexes, I met some particular cases ( for instance, a comment area containing other comment symbols! ) which did not work properly!
Finally, I found a rule that is worth remembering:
Rather than building a regex which EXCLUDES any unwanted text, it is generally better to build a regex which INCLUDES this unwanted text, and to get rid of it immediately with a NO-change replacement. Afterwards, we just have to elaborate a suitable replacement for the only text we care about :-))
Best Regards,
guy038
-
-
Hi Guy. The original poster was asking about searching, but your reply discusses replacing. I tried some things with your search regexes without the replacement part, but got results that weren’t in line with what the OP wanted (for instance, I got hits where the hit text is clearly INSIDE the comments). I must be missing the key point here; can you help clarify my misunderstanding?
-
Hi, @eduardog26 and @scott-sumner,
Oh, yes, Scott, you’re perfectly right, about the search operation, without any replacement :-((
Of course, my previous regex works, only if the conditional replacement is present and if you perform a replace operation !
Why this behaviour? Simply because the conditional replacement separates the matches into two parts:
- The search matches on comments, which must NOT be changed in any case ( so they are replaced by themselves, with :$0 )
- The search matches on real code, which MUST be changed, according to the first part ( case TRUE: group 1 exists ) of the conditional replacement ?1\U\1
Indeed! If you just want to search for text in particular zones ( inside NON-comment zones, in our case ), the logic explained in my previous post cannot be used!
I began to look into this problem but, up to now, I cannot figure out any intelligent solution! I just thought about the trivial possibility of getting rid of all comment areas which, automatically, would enable a search on code only :-)
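That trivial possibility can at least be sketched in a script, outside Notepad++. The Python example below is only an illustration under stated assumptions: the names blank_comments and find_outside_comments are hypothetical, C-style comments are assumed, and string literals are not handled. Instead of deleting the comments, it overwrites them with spaces and keeps the line-breaks, so every hit found in the masked copy has the same offset in the original text.

```python
import re

# C-style comments: a lazy block comment, or a one-line comment up to end of line.
COMMENT = re.compile(r"(?s:/\*.*?\*/)|//[^\n]*")


def blank_comments(text: str) -> str:
    """Return a copy of text where comments are overwritten with spaces.

    Line-breaks inside block comments are kept, so offsets and line
    numbers stay identical to those of the original text.
    """
    return COMMENT.sub(lambda m: re.sub(r"[^\n]", " ", m.group(0)), text)


def find_outside_comments(pattern: str, text: str, flags: int = 0):
    """List (offset, matched text) for every hit located outside comments."""
    masked = blank_comments(text)
    return [(m.start(), m.group(0)) for m in re.finditer(pattern, masked, flags)]


text = "code //no code\n/* not\nhere */ foo\n"
for offset, word in find_outside_comments(r"\b\w*o\w*\b", text, re.IGNORECASE):
    print(offset, word)
```

Because the masked copy has exactly the same length as the original, the reported offsets can be used directly to jump to the hits in the unmodified file.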
See you later !
Cheers,
guy038
-