Search text in source code excluding comments



  • Hello.
    Notepad++ search functionality is great, and far superior to that of most IDEs I use.
    I would find it extremely useful to be able to exclude commented text when searching my source code for a given text.
    Other people on the web have raised this issue regarding Visual Studio and other tools. However, I believe that it has not been implemented so far.
    Please comment if it is there already and I have missed it.
    Thanks. Regards.



  • Hello, @eduardog26,

    As always, we could use regular expressions to simulate a NON comment search ! The generic regex S/R is :

    SEARCH (?s)<Multi-Lines START Comment>.*<Multi-Lines END Comment>|(?-s)<ONE-Line Comment>.*|(Regex)

    REPLACE ?1\U\1:$0

    For instance, considering the C language, with :

    • Multi-Lines START Comment = /*

    • Multi-Lines END Comment = */

    • ONE-Line Comment = //

    This generic search regex becomes :

    (?s)/\*.*\*/|(?-s)//.*|(Regex)
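
    For readers who would rather script this, here is a minimal Python sketch of how the generic pattern could be assembled from the comment delimiters. The helper name build_search and its parameters are hypothetical. Python's re module does not accept (?s) or (?-s) in the middle of a pattern, so a scoped (?s:...) group stands in for them, and the lazy .*? is a deviation from the regex above, so that two separate block comments are not merged into one match.

```python
import re

def build_search(block_start, block_end, line_start, target):
    """Hypothetical helper: comment alternatives first, then target as group 1."""
    # (?s:...) lets the dot cross line breaks inside block comments only.
    block = f"(?s:{re.escape(block_start)}.*?{re.escape(block_end)})"
    # Outside that group the dot stops at the end of the line.
    line = f"{re.escape(line_start)}.*"
    return re.compile(f"{block}|{line}|({target})")

# C-style delimiters, with a case-insensitive "word containing o" target.
c_search = build_search("/*", "*/", "//", r"(?i:\b\w*o\w*\b)")

# Group 1 is only set when the match is NOT a comment.
hits = [m.group(1) for m in c_search.finditer("a /* foo */ boo // zoo") if m.group(1)]
print(hits)
```

    The target is passed in as a regex string, so it should use scoped flags such as (?i:...) rather than a leading (?i).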


    Now, given the example text, below :

    This is //A comment
    Oh ! A non comment line
    This is//A comment, with a comment symbol // !!
    Line of code //Comment
    
    //
    
    This is /*A comment
    Still a COMMENT line
    This is//A comment, with a comment symbol // !!
    Bla Bla Bla */ Instructions
    
    {
    A non comment line, too
    Line of code//Last comment
    

    Let’s suppose that we want to upper-case any word, containing, at least, one letter O, or o

    Well, a possible regex, for catching this kind of word, could be (?i)\b\w*o\w*\b. Therefore, the final S/R is :

    SEARCH (?s)/\*.*\*/|(?-s)//.*|(?i)(\b\w*o\w*\b)

    REPLACE ?1\U\1:$0

    And, after clicking on the Replace All button, we get the modified text, below :

    This is //A comment
    OH ! A NON COMMENT line
    This is//A comment, with a comment symbol // !!
    Line OF CODE //Comment
    
    //
    
    This is /*A comment
    Still a COMMENT line
    This is//A comment, with a comment symbol // !!
    Bla Bla Bla */ INSTRUCTIONS
    
    {
    A NON COMMENT line, TOO
    Line OF CODE//Last comment
    

    As expected, all the words, containing a letter O or o, and which are outside comment areas, are changed to upper-case !!
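
    The same replacement can be reproduced outside Notepad++. This is a sketch only, assuming Python's re module: the function name replace_outside_comments is hypothetical, scoped (?s:...) and (?i:...) groups replace the mid-pattern modifiers that Python does not accept, and a callback plays the role of the conditional replacement.

```python
import re

# Comment alternatives first, then the wanted word as group 1.
# The lazy .*? is a deviation from the regex in the post.
SEARCH = re.compile(r"(?s:/\*.*?\*/)|//.*|(?i:(\b\w*o\w*\b))")

def replace_outside_comments(text):
    """Upper-case group 1 hits and rewrite comment matches unchanged."""
    def repl(m):
        # Same logic as the conditional replacement ?1\U\1:$0
        return m.group(1).upper() if m.group(1) is not None else m.group(0)
    return SEARCH.sub(repl, text)

sample = """\
This is //A comment
Oh ! A non comment line
This is//A comment, with a comment symbol // !!
Line of code //Comment

//

This is /*A comment
Still a COMMENT line
This is//A comment, with a comment symbol // !!
Bla Bla Bla */ Instructions

{
A non comment line, too
Line of code//Last comment
"""

result = replace_outside_comments(sample)
print(result)
```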


    Notes :

    • The search regex contains three alternatives, separated by two | regex symbols

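      • The first alternative, (?s)/\*.*\*/, looks for any multi-lines comment. The (?s) modifier => Dot = any character, line breaks included. As .* is greedy, a file with several multi-lines comments needs the lazy form .*? instead, otherwise a single match runs from the first /* to the last */

      • The second alternative, (?-s)//.*, looks for any one-line comment. The (?-s) modifier => Dot = any character, except line breaks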

      • The third alternative, (?i)(\b\w*o\w*\b) is our active text, stored as group 1, due to the enclosed parentheses

    • It’s important to note the order of the search : First, the comment areas, secondly, if not, the interesting text to search for !

    • Now, the replacement regex ?1\U\1:$0 is a conditional replacement, which means :

      • IF group 1 does NOT exist, the part after the colon : is executed. Then, as $0 stands for the complete match, any comment area, which is outside group 1, is just re-written !

      • IF group 1 exists, the part between ?1 and the colon : is executed. Then, due to the \U\1 code, the group 1 ( any word, containing a O letter, whatever its case, due to the (?i) modifier ) is re-written, in upper-case
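
    A small Python sketch of when the order of the alternatives noted above makes a difference: it matters as soon as the wanted regex could itself start matching on a comment delimiter. The target \S+ below is a hypothetical choice made only to show the effect, and scoped (?s:...) groups are used because Python does not accept mid-pattern modifiers.

```python
import re

COMMENTS = r"(?s:/\*.*?\*/)|//.*"
TARGET = r"(\S+)"  # hypothetical target: any run of non-blank characters

def upper_group1(m):
    # Group 1 set: wanted text. Otherwise: a comment, rewritten as is.
    return m.group(1).upper() if m.group(1) is not None else m.group(0)

line = "x = a/b; // ratio"

# Comments tried first: the one-line comment is consumed as a whole.
comments_first = re.sub(f"{COMMENTS}|{TARGET}", upper_group1, line)

# Target tried first: \S+ grabs the // itself, so the comment text is exposed.
target_first = re.sub(f"{TARGET}|{COMMENTS}", upper_group1, line)

print(comments_first)  # X = A/B; // ratio
print(target_first)    # X = A/B; // RATIO
```

    With a target such as \b\w*o\w*\b, which can never begin on a / character, both orders give the same result, but keeping the comment alternatives first is the safe habit.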


    IMPORTANT :

    When I first thought about it, my regex tried to catch NON-comment text, only ! But, although I succeeded in building correct regexes, I met some particular cases ( for instance, a comment area, containing other comment symbols ! ) which did not work properly !

    Finally, I found out a rule that is worth remembering :

    Rather than building a regex which EXCLUDES any NON-wanted text, it’s, generally, better to build a regex which INCLUDES this NON-wanted text, and get rid of it, immediately, by a NO-change replacement. Thus, afterwards, we, just, have to elaborate the suitable replacement for the only text that we care about :-))

    Best Regards,

    guy038



  • @guy038

    Hi Guy. The original poster was asking about searching, but your reply discusses replacing. I tried some things with your search regexes without the replacement part, but got results that weren’t in line with what the OP wanted (for instance, I got hits like those shown below, where clearly the hit text is sometimes INSIDE the comments). I must be missing the key point here; can you help clarify my misunderstanding?



  • Hi, @eduardog26 and @scott-sumner,

    Oh, yes, Scott, you’re perfectly right, about the search operation, without any replacement :-((

    Of course, my previous regex works, only if the conditional replacement is present and if you perform a replace operation !

    Why this behaviour ? Simply because the conditional replacement separates the matches into two parts :

    • The search matches, about comments, which must NOT be changed, in any case ( so, replaced by themselves, with :$0 )

    • The search matches, about real code, which MUST be changed, according to the first part ( case TRUE : group 1 exists ) of the conditional replacement ?1\U\1


    Indeed ! If you just want to search, for text, in particular zones ( Inside NO-comments zones, in our case ), the logic, described in my previous post, cannot be used !

    I began to look into this problem. But, up to now, I cannot figure out any intelligent solution ! I just thought about the trivial possibility of getting rid of all comment areas which, automatically, would enable a search on code, only :-)
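
    Outside the Notepad++ Find dialog, the idea of getting rid of the comment areas first can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names blank_comments and search_code are hypothetical, the comment regex is the C-style one from this thread with a lazy .*?, and comment characters are replaced by spaces rather than deleted, so that line and column numbers of the hits still refer to the original file.

```python
import re

COMMENT = re.compile(r"(?s:/\*.*?\*/)|//.*")

def blank_comments(source):
    """Replace every comment character with a space, keeping line breaks."""
    return COMMENT.sub(lambda m: re.sub(r"[^\n]", " ", m.group(0)), source)

def search_code(source, pattern):
    """Yield (line, column, text) for each hit located outside comments."""
    blanked = blank_comments(source)
    for m in re.finditer(pattern, blanked):
        line_no = blanked.count("\n", 0, m.start()) + 1
        # Column is 1-based, counted from the previous line break.
        col = m.start() - (blanked.rfind("\n", 0, m.start()) + 1) + 1
        yield line_no, col, m.group(0)

source = "code // comment\n/* note\n more */ done\n"
found = list(search_code(source, r"(?i)\b\w*o\w*\b"))
print(found)
```

    Like the regexes above, this sketch knows nothing about string literals, so a // inside a quoted string would be treated as the start of a comment.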

    See you later !

    Cheers,

    guy038

