Search text in source code excluding comments
-
Hello.
Notepad++'s search functionality is great and far superior to that of most IDEs I use.
I would find it extremely useful to be able to exclude commented text when searching my source code for a given text.
Other people on the web have raised this issue regarding Visual Studio and other tools. However, I believe it has not been implemented so far.
Please comment if it is already there and I have missed it.
Thanks. Regards.
-
Hello, @eduardog26,
As always, we could use regular expressions to simulate a NON comment search ! The generic regex S/R is :
SEARCH
(?s)<Multi-Lines START Comment>.*<Multi-Lines END Comment>|(?-s)<ONE-Line Comment>.*|(Regex)
REPLACE
?1\U\1:$0
For instance, considering the C language, with :
- Multi-Lines START Comment = /*
- Multi-Lines END Comment = */
- ONE-Line Comment = //
This generic search regex becomes :
(?s)/\*.*\*/|(?-s)//.*|(Regex)
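To adapt the generic regex to another language, the three comment delimiters only need to be escaped and dropped into the template. The small Python sketch below does just that and returns the string to paste into the Notepad++ search field. The helper name build_search is hypothetical, and it assumes that the escaping produced by Python's re.escape is also acceptable to the Notepad++ regex engine, which holds for the C delimiters used here.

```python
import re


def build_search(start: str, end: str, line: str, target: str) -> str:
    """Fill the generic template with escaped comment delimiters.

    start / end : delimiters of a multi-line comment, e.g. "/*" and "*/"
    line        : delimiter of a one-line comment, e.g. "//"
    target      : the regex for the text we care about (becomes group 1)
    """
    return "(?s){}.*{}|(?-s){}.*|({})".format(
        re.escape(start), re.escape(end), re.escape(line), target
    )


# The C case from the post: reproduces the regex shown above.
c_search = build_search("/*", "*/", "//", "Regex")
print(c_search)
```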
Now, given the example text, below :
    This is //A comment
    Oh ! A non comment line
    This is//A comment, with a comment symbol // !!
    Line of code //Comment //
    This is /*A comment
    Still a COMMENT line
    This is//A comment, with a comment symbol // !!
    Bla Bla Bla */ Instructions {
    A non comment line, too
    Line of code//Last comment
Let’s suppose that we want to upper-case any word, containing, at least, one letter O, or o
Well, a possible regex, for catching this kind of word, could be (?i)\b\w*o\w*\b. Therefore, the final S/R is :
SEARCH
(?s)/\*.*\*/|(?-s)//.*|(?i)(\b\w*o\w*\b)
REPLACE
?1\U\1:$0
And, after clicking on the Replace All button, we get the modified text, below :
    This is //A comment
    OH ! A NON COMMENT line
    This is//A comment, with a comment symbol // !!
    Line OF CODE //Comment //
    This is /*A comment
    Still a COMMENT line
    This is//A comment, with a comment symbol // !!
    Bla Bla Bla */ INSTRUCTIONS {
    A NON COMMENT line, TOO
    Line OF CODE//Last comment
As expected, all the words, containing a letter O or o, and which are outside comment areas, are changed to upper-case !!
Notes :
- The search regex contains three alternatives, separated by two | regex symbols
    - The first alternative, (?s)/\*.*\*/, looks for any multi-lines comment. The (?s) modifier => Dot = any character, line-breaks included
    - The second alternative, (?-s)//.*, looks for any one-line comment. The (?-s) modifier => Dot = any standard character, line-breaks excluded
    - The third alternative, (?i)(\b\w*o\w*\b), is our active text, stored as group 1, due to the enclosing parentheses
- It’s important to note the order of the search : First, the comment areas, secondly, if not, the interesting text to search for !
- Now, the replacement regex ?1\U\1:$0 is a conditional replacement, which means :
    - IF group 1 does NOT exist, the part after the colon : is executed. Then, as $0 stands for the complete match, any comment area, which is outside group 1, is just re-written !
    - IF group 1 exists, the part between ?1 and the colon : is executed. Then, due to the \U\1 code, the group 1 ( any word, containing a O letter, whatever its case, due to the (?i) modifier ) is re-written, in upper-case
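For anyone who wants to try the same logic outside Notepad++, the notes above can be sketched in Python. Python's re module has neither the \U case operator nor the ?1...:... conditional replacement, so a callback function stands in for both. The function name upper_outside_comments is hypothetical. The sketch also uses a lazy .*? for the multi-line comment, which is an assumption that differs from the S/R above, so that two separate /* ... */ blocks in one file are not merged into a single match.

```python
import re

# Same three alternatives, in the same order: comments first, then the word.
PATTERN = re.compile(
    r"(?s:/\*.*?\*/)"        # 1st alternative: multi-line comment (dot matches newlines)
    r"|//.*"                 # 2nd alternative: one-line comment (dot stops at end of line)
    r"|(?i:(\b\w*o\w*\b))"   # 3rd alternative: word containing o or O, stored as group 1
)


def upper_outside_comments(text: str) -> str:
    """Upper-case group 1 when it took part in the match, else rewrite the match as is."""
    def conditional(match: re.Match) -> str:
        if match.group(1) is not None:   # the "?1" branch: real code
            return match.group(1).upper()
        return match.group(0)            # the ":$0" branch: a comment, left unchanged
    return PATTERN.sub(conditional, text)


SAMPLE = "\n".join([
    "This is //A comment",
    "Oh ! A non comment line",
    "This is//A comment, with a comment symbol // !!",
    "Line of code //Comment //",
    "This is /*A comment",
    "Still a COMMENT line",
    "This is//A comment, with a comment symbol // !!",
    "Bla Bla Bla */ Instructions {",
    "A non comment line, too",
    "Line of code//Last comment",
])

print(upper_outside_comments(SAMPLE))
```

The callback makes the key point visible: every comment is matched on purpose and handed back untouched, so the word regex never gets a chance to look inside it.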
IMPORTANT :
When I first thought about it, my regex tried to catch NON-comment text, only ! But, although I managed to build correct regexes, I met some particular cases ( for instance, a comment area, containing other comment symbols ! ) which did not work properly !
Finally, I found a rule that is worth remembering :
Rather than building a regex which EXCLUDES any NON-wanted text, it’s, generally, better to build a regex which INCLUDES this NON-wanted text, and get rid of it, immediately, by a NO-change replacement. Thus, afterwards, we just have to elaborate the suitable replacement for the only text we care about :-))
Best Regards,
guy038
-
-
Hi Guy. The original poster was asking about searching, but your reply discusses replacing. I tried some things with your search regexes without the replacement part, but got results that weren’t in line with what the OP wanted (for instance, I got hits where the hit text is clearly INSIDE the comments). I must be missing the key point here; can you help clarify my misunderstanding?

-
Hi, @eduardog26 and @scott-sumner,
Oh, yes, Scott, you’re perfectly right, about the search operation, without any replacement :-((
Of course, my previous regex works, only if the conditional replacement is present and if you perform a replace operation !
Why this behaviour ? Just because the conditional replacement separates the search matches into two parts :
- The search matches, about comments, which must NOT be changed, in any case ( so, replaced by themselves, with :$0 )
- The search matches, about real code, which MUST be changed, according to the first part ( case TRUE : group 1 exists ) of the conditional replacement ?1\U\1
Indeed ! If you just want to search, for text, in particular zones ( inside NON-comment zones, in our case ), the logic, described in my previous post, cannot be used !
I started looking into this problem. But, up to now, I cannot figure out any intelligent solution ! I just thought about the trivial possibility of getting rid of all comment areas which, automatically, would enable a search on code, only :-)
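The "get rid of all comment areas" idea can be sketched outside Notepad++, for instance in Python. Instead of deleting the comments, this version overwrites them with spaces, so that every hit keeps its original line and column. The names blank_comments and search_code_only are hypothetical, and the sketch assumes C-style comments only. In particular, it does not understand string literals, so a // inside a quoted string would wrongly be treated as a comment.

```python
import re

# Multi-line comments (lazy, dot matches newlines) or one-line comments.
COMMENTS = re.compile(r"(?s:/\*.*?\*/)|//.*")


def blank_comments(text: str) -> str:
    """Replace every comment character with a space, keeping line-breaks."""
    return COMMENTS.sub(lambda m: re.sub(r"[^\n]", " ", m.group(0)), text)


def search_code_only(text: str, pattern: str, flags: int = 0):
    """Return (line, column, matched text) for each hit outside comments.

    Lines and columns are 1-based and refer to the ORIGINAL text, because
    blanking preserves the length and the line-breaks of every comment.
    """
    masked = blank_comments(text)
    hits = []
    for m in re.finditer(pattern, masked, flags):
        line = masked.count("\n", 0, m.start()) + 1
        column = m.start() - (masked.rfind("\n", 0, m.start()) + 1) + 1
        hits.append((line, column, m.group(0)))
    return hits


SAMPLE = (
    "Line of code //Comment of code\n"
    "x = 1; /* code\n"
    "more code */ code();\n"
)

for hit in search_code_only(SAMPLE, r"code"):
    print(hit)
```

Only the two occurrences of "code" that sit in real code are reported; the three inside comments are skipped.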
See you later !
Cheers,
guy038