Search text in source code excluding comments

General Discussion
4 Posts 3 Posters 4.8k Views
  • E
    EduardoG26
    last edited by Jul 10, 2017, 1:53 PM

    Hello.
    Notepad++'s search functionality is great, and far superior to that of most IDEs I use.
    I would find it extremely useful to be able to exclude commented text when searching my source code for a given text.
    Other people on the web have raised this issue regarding Visual Studio and other tools. However, I believe it has not been implemented so far.
    Please comment if it is already there and I have missed it.
    Thanks. Regards.

    • G
      guy038
      last edited by guy038 Jul 16, 2017, 11:35 AM Jul 16, 2017, 11:28 AM

      Hello, @eduardog26,

      As always, we could use regular expressions to simulate a NON-comment search! The generic regex S/R (search/replace) is:

      SEARCH (?s)<Multi-Lines START Comment>.*<Multi-Lines END Comment>|(?-s)<ONE-Line Comment>.*|(Regex)

      REPLACE ?1\U\1:$0

      For instance, considering the C language, with :

      • Multi-Lines START Comment = /*

      • Multi-Lines END Comment = */

      • ONE-Line Comment = //

      This generic search regex becomes :

      (?s)/\*.*\*/|(?-s)//.*|(Regex)
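The substitution of comment delimiters into the generic regex can be sketched in Python. This is only an illustration with Python's `re` module, not something Notepad++ runs: the helper name `build_pattern` is hypothetical, the scoped form `(?s:...)` is used because Python's `re` does not accept a flag switch in the middle of a pattern, and the lazy `.*?` is a deliberate change from the greedy `.*` of the post so that two separate block comments are not merged into one match.

```python
import re

def build_pattern(block_start, block_end, line_start, target):
    """Assemble 'block comment | line comment | (wanted text)' as one regex.

    Hypothetical helper, for illustration only. The delimiters are escaped
    so that characters such as * are taken literally.
    """
    # (?s:...) lets the dot cross line-breaks inside block comments only.
    block = "(?s:" + re.escape(block_start) + ".*?" + re.escape(block_end) + ")"
    # Outside that group the dot stops at the end of the line.
    line = re.escape(line_start) + ".*"
    # The wanted text is the only capturing group, so it is group 1.
    return block + "|" + line + "|(" + target + ")"

# C-style delimiters, and "any word containing o or O" as the wanted text.
c_pattern = build_pattern("/*", "*/", "//", r"(?i:\b\w*o\w*\b)")

for m in re.finditer(c_pattern, "/* no */ go // so"):
    # group(1) is None whenever a comment alternative matched.
    print(repr(m.group(0)), "->", m.group(1))
```

Other languages only need different delimiter arguments, as long as they have one block-comment pair and one line-comment prefix.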


      Now, given the example text, below :

      This is //A comment
      Oh ! A non comment line
      This is//A comment, with a comment symbol // !!
      Line of code //Comment
      
      //
      
      This is /*A comment
      Still a COMMENT line
      This is//A comment, with a comment symbol // !!
      Bla Bla Bla */ Instructions
      
      {
      A non comment line, too
      Line of code//Last comment
      

      Let’s suppose that we want to upper-case any word containing at least one letter O or o.

      Well, a possible regex for catching this kind of word could be (?i)\b\w*o\w*\b. The final S/R is therefore:

      SEARCH (?s)/\*.*\*/|(?-s)//.*|(?i)(\b\w*o\w*\b)

      REPLACE ?1\U\1:$0

      And, after clicking on the Replace All button, we get the modified text, below :

      This is //A comment
      OH ! A NON COMMENT line
      This is//A comment, with a comment symbol // !!
      Line OF CODE //Comment
      
      //
      
      This is /*A comment
      Still a COMMENT line
      This is//A comment, with a comment symbol // !!
      Bla Bla Bla */ INSTRUCTIONS
      
      {
      A NON COMMENT line, TOO
      Line OF CODE//Last comment
      

      As expected, all the words that contain a letter O or o and lie outside comment areas are changed to upper-case!
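For readers who want to reproduce this replacement outside Notepad++, here is a minimal sketch with Python's `re` module. It is an approximation, not the Notepad++ engine: Python has no `\U` or `?1...:...` replacement syntax, so a callback plays that role, the flags are written in scoped form, and `.*?` is used instead of the greedy `.*` (an assumption made so that several block comments stay separate). The function name `upper_outside_comments` is made up for this example.

```python
import re

# Comments come first and are matched without capture; the wanted word
# is the only capturing group (group 1).
PATTERN = re.compile(r"(?s:/\*.*?\*/)|//.*|(?i:(\b\w*o\w*\b))")

def upper_outside_comments(text):
    """Upper-case words containing o/O, leaving comment areas untouched."""
    def repl(m):
        if m.group(1) is not None:
            return m.group(1).upper()   # real code: change it
        return m.group(0)               # comment: write it back unchanged
    return PATTERN.sub(repl, text)

# A few lines taken from the example text of the post.
sample = (
    "Line of code //Comment\n"
    "This is /*A comment\n"
    "Still a COMMENT line\n"
    "Bla Bla Bla */ Instructions\n"
    "A non comment line, too\n"
)
print(upper_outside_comments(sample))
```

Like the regex in the post, this sketch knows nothing about string literals, so a `//` inside a quoted string would still be treated as the start of a comment.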


      Notes :

      • The search regex contains three alternatives, separated by two | regex symbols

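        • The first alternative, (?s)/\*.*\*/, looks for any multi-line comment. The (?s) modifier means that the dot matches any character, line-breaks included. Note that .* is greedy: if a file holds several multi-line comments, this alternative matches from the first /* to the last */, so the lazy form .*? is safer in that case

        • The second alternative, (?-s)//.*, looks for any one-line comment. The (?-s) modifier means that the dot matches any character except line-break characters

        • The third alternative, (?i)(\b\w*o\w*\b), is our active text, stored as group 1 because of the enclosing parentheses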

      • It’s important to note the order of the search: first the comment areas, then, failing that, the interesting text to search for!

      • Now, the replacement regex ?1\U\1:$0 is a conditional replacement, which means:

        • IF group 1 does NOT exist, the part after the colon : is executed. Then, as $0 stands for the complete match, any comment area, which lies outside group 1, is just re-written unchanged!

        • IF group 1 exists, the part between ?1 and the colon : is executed. Then, due to the \U\1 code, group 1 (any word containing a letter O, whatever its case, due to the (?i) modifier) is re-written in upper-case
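To make the two branches of the conditional replacement visible, the sketch below lists every match on one line together with the branch it would take. It uses Python's `re` module as a stand-in for the Notepad++ engine, with scoped flags and a lazy `.*?` as assumptions of this example. The function name `trace` is hypothetical.

```python
import re

PATTERN = re.compile(r"(?s:/\*.*?\*/)|//.*|(?i:(\b\w*o\w*\b))")

def trace(text):
    """Return (matched text, branch taken, replacement) for every match."""
    rows = []
    for m in PATTERN.finditer(text):
        if m.group(1) is not None:
            # Equivalent of the part between ?1 and the colon: \U\1
            rows.append((m.group(0), "group 1 matched", m.group(1).upper()))
        else:
            # Equivalent of the part after the colon: $0
            rows.append((m.group(0), "group 1 did not match", m.group(0)))
    return rows

for row in trace("Line of code //Comment of sorts"):
    print(row)
```

The comment is consumed as a single match, which is why the word "of" inside it never reaches the third alternative.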


      IMPORTANT :

      When I first thought about it, my regex tried to catch NON-comment text only! But, although I succeeded in building correct regexes, I met some particular cases (for instance, a comment area containing other comment symbols!) which did not work properly!

      Finally, I found a rule that is worth remembering:

      Rather than building a regex which EXCLUDES any NON-wanted text, it’s generally better to build a regex which INCLUDES this NON-wanted text and get rid of it immediately with a NO-change replacement. Afterwards, we just have to elaborate the suitable replacement for the only text we care about :-))

      Best Regards,

      guy038

      • S
        Scott Sumner @guy038
        last edited by Jul 17, 2017, 11:54 AM

        @guy038

        Hi Guy. The original poster was asking about searching, but your reply discusses replacing. I tried some things with your search regexes without the replacement part, but got results that weren’t in line with what the OP wanted (for instance, I got hits like shown below, where clearly the hit text is sometimes INSIDE the comments). I must be missing the key point here; can you help clarify my misunderstanding?

        • G
          guy038
          last edited by guy038 Jul 17, 2017, 7:02 PM Jul 17, 2017, 7:01 PM

          Hi, @eduardog26 and @scott-sumner,

          Oh, yes, Scott, you’re perfectly right about the search operation without any replacement :-((

          Of course, my previous regex works only if the conditional replacement is present and you perform a replace operation!

          Why this behaviour? Simply because the conditional replacement separates the matches into two parts:

          • The search matches on comments, which must NOT be changed in any case (so they are replaced by themselves, via :$0)

          • The search matches on real code, which MUST be changed, according to the first part (case TRUE: group 1 exists) of the conditional replacement, ?1\U\1


          Indeed! If you just want to search for text in particular zones (inside NON-comment zones, in our case), the logic described in my previous post cannot be used!

          I began to look into this problem. But, up to now, I cannot figure out any intelligent solution! I just thought about the trivial possibility of getting rid of all comment areas, which would automatically enable a search on code only :-)
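Outside the Find dialog, for instance in a script, the same search regex can serve for a pure search, because a program can inspect each match and keep only those where group 1 took part. The sketch below shows that idea with Python's `re` module. It is not a Notepad++ feature, and the function name `find_outside_comments`, the scoped flags and the lazy `.*?` are assumptions of this example.

```python
import re

PATTERN = re.compile(r"(?s:/\*.*?\*/)|//.*|(?i:(\b\w*o\w*\b))")

def find_outside_comments(text):
    """Return (line, column, word) for each hit outside comments, 1-based."""
    hits = []
    for m in PATTERN.finditer(text):
        if m.group(1) is None:
            continue  # a comment area was matched: skip it entirely
        start = m.start(1)
        line_no = text.count("\n", 0, start) + 1
        # Column = offset from the character after the previous line-break.
        col = start - (text.rfind("\n", 0, start) + 1) + 1
        hits.append((line_no, col, m.group(1)))
    return hits

sample = "Line of code //Comment\nThis is /*A comment\nBla */ Instructions\n"
for hit in find_outside_comments(sample):
    print(hit)
```

The comments are still matched, as in the replace operation, but they are discarded by the script rather than by a no-change replacement.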

          See you later !

          Cheers,

          guy038

          The Community of users of the Notepad++ text editor.