    Search text in source code excluding comments

    • EduardoG26

      Hello.
      Notepad++'s search functionality is great and far superior to that of most IDEs I use.
      I would find it extremely useful to be able to exclude commented text when searching my source code for a given text.
      Other people on the web have raised this issue regarding Visual Studio and other tools. However, I believe that it has not been implemented so far.
      Please comment if it is already there and I have missed it.
      Thanks. Regards.

      • guy038

        Hello, @eduardog26,

        As always, we could use regular expressions to simulate a NON comment search ! The generic regex S/R is :

        SEARCH (?s)<Multi-Lines START Comment>.*<Multi-Lines END Comment>|(?-s)<ONE-Line Comment>.*|(Regex)

        REPLACE ?1\U\1:$0

        For instance, considering the C language, with :

        • Multi-Lines START Comment = /*

        • Multi-Lines END Comment = */

        • ONE-Line Comment = //

        This generic search regex becomes :

        (?s)/\*.*\*/|(?-s)//.*|(Regex)
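For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch of how the generic pattern could be assembled from a language's comment delimiters. The function name `build_skip_comments_pattern` is hypothetical, and the sketch makes two assumptions that differ from the regex above: Python's `re` module does not accept `(?s)` or `(?-s)` in the middle of a pattern, so the scoped form `(?s:...)` is used, and the multi-line part uses a lazy `.*?` so that each comment block is matched separately.

```python
import re

def build_skip_comments_pattern(block_start, block_end, line_start, wanted_regex):
    """Assemble the three-alternative pattern from literal comment delimiters.

    The delimiters are escaped with re.escape(); wanted_regex is inserted
    as-is and wrapped in capturing group 1, so it should use scoped flags
    such as (?i:...) rather than a global (?i).
    """
    return re.compile(
        "(?s:" + re.escape(block_start) + ".*?" + re.escape(block_end) + ")"
        + "|" + re.escape(line_start) + ".*"
        + "|(" + wanted_regex + ")"
    )

# C-style delimiters, looking for words that contain the letter o
pattern = build_skip_comments_pattern("/*", "*/", "//", r"(?i:\b\w*o\w*\b)")
text = "code /* no */ foo // bar o"

# Comment matches leave group 1 unset, so only real code words are kept
hits = [m.group(1) for m in pattern.finditer(text) if m.group(1) is not None]
print(hits)
```

The lazy quantifier matters as soon as a file holds more than one multi-line comment: a greedy `.*` in dot-matches-all mode runs from the first `/*` to the last `*/`, swallowing any code in between.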


        Now, given the example text, below :

        This is //A comment
        Oh ! A non comment line
        This is//A comment, with a comment symbol // !!
        Line of code //Comment
        
        //
        
        This is /*A comment
        Still a COMMENT line
        This is//A comment, with a comment symbol // !!
        Bla Bla Bla */ Instructions
        
        {
        A non comment line, too
        Line of code//Last comment
        

        Let’s suppose that we want to upper-case any word, containing, at least, one letter O, or o

        Well, a possible regex, for catching this kind of word, could be (?i)\b\w*o\w*\b. Therefore, the final S/R is :

        SEARCH (?s)/\*.*\*/|(?-s)//.*|(?i)(\b\w*o\w*\b)

        REPLACE ?1\U\1:$0

        And, after clicking on the Replace All button, we get the modified text, below :

        This is //A comment
        OH ! A NON COMMENT line
        This is//A comment, with a comment symbol // !!
        Line OF CODE //Comment
        
        //
        
        This is /*A comment
        Still a COMMENT line
        This is//A comment, with a comment symbol // !!
        Bla Bla Bla */ INSTRUCTIONS
        
        {
        A NON COMMENT line, TOO
        Line OF CODE//Last comment
        

        As expected, all the words, containing a letter O or o, and which are outside comment areas, are changed to upper-case !!


        Notes :

        • The search regex contains three alternatives, separated by two | regex symbols

          • The first alternative, (?s)/\*.*\*/, looks for any multi-lines comment. The (?s) modifier => Dot = any character

          • The second alternative, (?-s)//.*, looks for any one-line comment. The (?-s) modifier => Dot = any single character, except line-break characters

          • The third alternative, (?i)(\b\w*o\w*\b) is our active text, stored as group 1, due to the enclosed parentheses

        • It’s important to note the order of the search : First, the comment areas, secondly, if not, the interesting text to search for !

        • Now, the replacement regex ?1\U\1:$0 is a conditional replacement, which means :

          • IF group 1 does NOT exist, the part after the colon : is executed. Then, as $0 stands for the complete match, any comment area, which is outside group 1, is just re-written !

          • IF group 1 exists, the part between ?1 and the colon : is executed. Then, due to the \U\1 code, the group 1 ( any word, containing an O letter, whatever its case, due to the (?i) modifier ) is re-written, in upper-case
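The conditional replacement described in these notes is a feature of the regex engine used by Notepad++; Python's `re` module has neither `?1...:...` nor `\U` in replacement strings. As a rough equivalent, the sketch below uses a replacement callback that makes the same decision by testing whether group 1 took part in the match. The function name `upper_outside_comments` is hypothetical, and the pattern is adapted to Python syntax (scoped flags, lazy `.*?`), so it is an approximation of the S/R above, not a copy of it.

```python
import re

# Comments first, wanted text last: the same ordering as the S/R above
PATTERN = re.compile(r"(?s:/\*.*?\*/)|//.*|(?i:(\b\w*o\w*\b))")

def upper_outside_comments(text):
    """Upper-case every word containing o/O, except inside C-style comments."""
    def repl(match):
        if match.group(1) is not None:
            # Group 1 exists: real code, so apply the change (like \U\1)
            return match.group(1).upper()
        # Group 1 is absent: a comment, rewritten unchanged (like $0)
        return match.group(0)
    return PATTERN.sub(repl, text)

sample = (
    "Line of code //Comment\n"
    "This is /*A comment\n"
    "Still a COMMENT line\n"
    "Bla Bla Bla */ Instructions\n"
)
print(upper_outside_comments(sample))
```

On these four lines taken from the example text, only `of`, `code` and `Instructions` are upper-cased, which agrees with the modified text shown earlier.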


        IMPORTANT :

        When I first thought about it, my regex tried to catch NON-comment text, only ! But, although I succeeded in building correct regexes, I met some particular cases ( for instance, a comment area, containing other comment symbols ! ) which did not work properly !

        Finally, I found out a rule that is worth remembering :

        Rather than building a regex which EXCLUDES any NON-wanted text, it’s, generally, better to build a regex which INCLUDES this NON-wanted text, and get rid of it, immediately, by a NO-change replacement. Thus, afterwards, we, just, have to elaborate the suitable replacement for the only text we care about :-))

        Best Regards,

        guy038

        • Scott Sumner

          @guy038

          Hi Guy. The original poster was asking about searching, but your reply discusses replacing. I tried some things with your search regexes without the replacement part, but got results that weren’t in line with what the OP wanted (for instance, I got hits like shown below, where clearly the hit text is sometimes INSIDE the comments). I must be missing the key point here; can you help clarify my misunderstanding?

          • guy038

            Hi, @eduardog26 and @scott-sumner,

            Oh, yes, Scott, you’re perfectly right, about the search operation, without any replacement :-((

            Of course, my previous regex works, only if the conditional replacement is present and if you perform a replace operation !

            Why this behaviour ? Simply because the conditional replacement splits the search matches into two parts :

            • The search matches, about comments, which must NOT be changed, in any case ( so, replaced by themselves, thanks to the :$0 part )

            • The search matches, about real code, which MUST be changed, according to the first part ( case TRUE : group 1 exists ) of the conditional replacement ?1\U\1


            Indeed ! If you just want to search, for text, in particular zones ( Inside NO-comments zones, in our case ), the logic, described in my previous post, cannot be used !

            I began to look into this problem. But, up to now, I cannot figure out any intelligent solution ! I just thought about the trivial possibility of getting rid of all comment areas which, automatically, would enable a search on code, only :-)
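Outside Notepad++, the "get rid of all comment areas" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a Notepad++ feature: the names `blank_comments` and `search_code_only` are hypothetical, and comments are overwritten with spaces rather than deleted, so that line numbers and columns of the hits still refer to the original text.

```python
import re

# C-style comments only: /* ... */ blocks (lazy) and // to end of line
COMMENT = re.compile(r"(?s:/\*.*?\*/)|//.*")

def blank_comments(text):
    """Overwrite each comment character with a space, keeping line breaks."""
    return COMMENT.sub(lambda m: re.sub(r"[^\r\n]", " ", m.group(0)), text)

def search_code_only(text, wanted_regex):
    """Return (line, column, matched_text) for hits outside comments (1-based)."""
    code = blank_comments(text)
    hits = []
    for m in re.finditer(wanted_regex, code):
        line = code.count("\n", 0, m.start()) + 1
        line_start = code.rfind("\n", 0, m.start()) + 1
        hits.append((line, m.start() - line_start + 1, m.group(0)))
    return hits

text = "Line of code //Comment on\nThis is /*A comment\nBla */ Done\n"
for hit in search_code_only(text, r"(?i)\b\w*o\w*\b"):
    print(hit)
```

Like the regexes in this thread, the sketch knows nothing about string literals, so a `//` or `/*` inside a quoted string would be treated as the start of a comment.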

            See you later !

            Cheers,

            guy038
