    UDL comment that requires seperators

    • G A

      I have a language that uses /* and */ to open and close comments, but only when they are not “glued” to anything else, i.e., the comment markers require separators.
      Is there any way to implement this, perhaps using several delimiters? Or alternatively is there an implementation that would specifically highlight broken comments for fixing?

      • PeterJones (in reply to G A)

        @G-A ,

        Sorry, unlike the “Operators” and “Folders” categories (which have variants for “separators required” and not), the “Comments” and “Delimiters” categories (which are the two categories most often used for code-comments) do not have a separators-required variant or option.

        The EnhanceAnyLexer plugin lets you provide regular expressions (regex) that add custom foreground coloring. With it, you could write a regex that matches a valid comment and turns it green and, if you wanted, one or more separate regexes that turn invalid comments red:

        [MyUDLName]
        ; color each word, 0x66ad1 is the color used, see the description above for more information on the color coding.
        0x00FF00 = (?-s)(^|\s+)/\*\s+.+\s+\*/(\s+|$)
        0x0000FF = (?-s)(?:[^\s\A])\K/\*.*?\*/
        0x0000CC = (?-s)/\*\S.+\*/
        0xCC00CC = (?-s)/\*.+\S\*/
        0xFF00FF = (?-s)/\*.+\*/\S
        ; check in the respective styler xml if the following IDs are valid
        excluded_styles = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20,21,22,23
        

        Rather than introducing lots of alternations or getting clever to build a single “red/invalid” regex, I wrote a separate regex for each of the four borders that could be missing a space/separator, to keep it readable. I also used four different red/purple colors, so you can tell which regex flagged each of the bad comment types that I interpreted from your description.

        Here’s what it looked like for me with the above definition:
        [screenshot]
        using this “code”:

        /* should be ok at beginning of file */
        ValidCode /* Valid Comment */
        /* also ok */
        
        ValidCode /*InValid Comment */
        /*also wrong */
        ValidCode /* InValid Comment*/
        /* wrong at end*/
        ValidCode/* InValid Comment */
        /* wrong after end */BecauseNoSpaceBeforeHere
        /* also ok at EOF */
        

        -----
        update: to not highlight the B in BecauseNoSpaceBeforeHere, change the last regex to

        0xFF00FF = (?-s)/\*.+\*/(?=\S)
        

        [screenshot]
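For anyone who wants to try these rules outside Notepad++, here is a minimal Python sketch of the same five checks, applied one line at a time. It is an approximation, not the plugin's behavior: Python's `re` module has no `\K`, so a lookbehind stands in for it; the `(?-s)` prefix is dropped because `.` already stops at newlines by default in Python; and the names `RULES` and `classify` are invented for illustration.

```python
import re

# One rule per border, mirroring the five regexes in the post above.
# Assumption: (?<=\S) replaces the Boost-only "\K" construct.
RULES = {
    "valid": re.compile(r"(^|\s+)/\*\s+.+\s+\*/(\s+|$)"),
    "no space before /*": re.compile(r"(?<=\S)/\*.*?\*/"),
    "no space after /*": re.compile(r"/\*\S.+\*/"),
    "no space before */": re.compile(r"/\*.+\S\*/"),
    "no space after */": re.compile(r"/\*.+\*/(?=\S)"),
}


def classify(line):
    """Return the names of all rules that match somewhere in the line."""
    return [name for name, rx in RULES.items() if rx.search(line)]


if __name__ == "__main__":
    samples = [
        "ValidCode /* Valid Comment */",
        "ValidCode /*InValid Comment */",
        "ValidCode /* InValid Comment*/",
        "ValidCode/* InValid Comment */",
        "/* wrong after end */BecauseNoSpaceBeforeHere",
    ]
    for sample in samples:
        print(sample, "->", classify(sample))
```

Each sample line trips exactly one rule, which matches the idea of one color per missing separator.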

        If you aren’t a regex expert:

        • Notepad++ Online User Manual: Searching/Regex
        • FAQ: Where to find other regular expressions (regex) documentation
        • G A (in reply to PeterJones)

          @PeterJones Thank you for your answer and example. Seems to work pretty well.
          Not being regex expert, what would I need to do so that multi-line comments are supported too?

          • PeterJones (in reply to G A)

            @G-A said in UDL comment that requires seperators:

            Not being regex expert, what would I need to do so that multi-line comments are supported too?

            The (?-s) turns off the dot-matches-newline feature of regex; for multi-line comments to work, you’d need to turn it on instead, with (?s) at the beginning of each of the regexes shown.
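The effect of that flag is easy to see in isolation. The sketch below uses Python's `re` module as a stand-in (an assumption; Notepad++ uses the Boost engine), where `.` excludes newlines by default and `(?s)` switches dot-matches-newline on. The variable names are illustrative only.

```python
import re

comment = "/* first line\n   second line */"

# Default mode: "." does not match "\n", so the pattern never reaches
# the closing */ on the second line and the search fails.
single_line = re.search(r"/\*.+\*/", comment)

# With (?s) (equivalent to passing re.DOTALL), "." also matches "\n",
# so the whole two-line comment is found.
multi_line = re.search(r"(?s)/\*.+\*/", comment)

if __name__ == "__main__":
    print("without (?s):", single_line)
    print("with (?s):   ", repr(multi_line.group()))
```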

            edit: but if you do that, you might need to make all the . matches “non-greedy” (not tested):

            0x00FF00 = (?s)(^|\s+)/\*\s+.+?\s+\*/(\s+|$)
            0x0000FF = (?s)(?:[^\s\A])\K/\*.*?\*/
            0x0000CC = (?s)/\*\S.+?\*/
            0xCC00CC = (?s)/\*.+?\S\*/
            0xFF00FF = (?s)/\*.+?\*/\S
            

            edit 2:
            … Hmm … when I leave it greedy, it doesn’t work; but when I make it non-greedy, it doesn’t work either.

            After some experimenting, replace the .+? or .*? with ((?!\*/).)+? or ((?!\*/).)*? – that way, it matches any character unless it’s the start of the comment-ending */

            0x00FF00 = (?s)(^|\s+)/\*\s+((?!\*/).)*?\s+\*/(\s+|$)
            0x0000FF = (?s)(?:[^\s\A])\K/\*((?!\*/).)*?\*/
            0x0000CC = (?s)/\*\S((?!\*/).)*?\*/
            0xCC00CC = (?s)/\*((?!\*/).)*?\S\*/
            0xFF00FF = (?s)/\*((?!\*/).)*?\*/(?=\S)
            

            [screenshot]
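A small Python sketch of why non-greedy alone falls short, using the "no space before */" rule as the example. Python's `re` is assumed here as a stand-in for the Boost engine, and the sample text is made up. A lazy `.+?` still keeps expanding until the rest of the pattern fits, so it can run straight through a valid comment's `*/`; the `((?!\*/).)` form refuses to step over a `*/`.

```python
import re

text = "/* ok */ x /* bad*/"

# Lazy dot: the first comment ends in " */", so \S\*/ cannot match there.
# The lazy quantifier then keeps growing past that */ until it reaches "d*/".
lazy = re.search(r"(?s)/\*.+?\S\*/", text)

# Tempered dot: each character is consumed only if it does not start "*/",
# so the attempt at the first comment fails and only the bad one matches.
tempered = re.search(r"(?s)/\*((?!\*/).)*?\S\*/", text)

if __name__ == "__main__":
    print("lazy:    ", repr(lazy.group()))
    print("tempered:", repr(tempered.group()))
```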

            • Alan Kilborn (in reply to PeterJones)

              @PeterJones said in UDL comment that requires seperators:

              you might need to make all the . matches “non-greedy” (not tested)

              DEFINITELY!

              Here’s the difference (regexes are my own, for illustrative purposes; colors are also not intended to match OP desire – again, my own, to illustrate):

              Greedy (?s)/\*.+\*/ has single match (the yellow) on whole thing:

              [screenshot]

              Non-greedy (?s)/\*.+?\*/ matches (shown in alternating colors):

              [screenshot]
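The same comparison can be reproduced in text form. This is a minimal sketch with Python's `re` (assumed as a stand-in for the Boost engine) on a made-up two-comment sample, not the text from the screenshots.

```python
import re

text = "/* one */\ncode\n/* two */"

# Greedy: .+ runs to the last */ in the text, giving one big match.
greedy = re.findall(r"(?s)/\*.+\*/", text)

# Non-greedy: .+? stops at the first */ it can, giving one match per comment.
non_greedy = re.findall(r"(?s)/\*.+?\*/", text)

if __name__ == "__main__":
    print("greedy:    ", greedy)
    print("non-greedy:", non_greedy)
```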

              • PeterJones (in reply to Alan Kilborn)

                @Alan-Kilborn said,

                DEFINITELY!

                Yes, I discovered that when I started my testing after the first edit. I started testing once I realized that non-greedy alone probably wasn’t sufficient either. Hence the second edit, with the tested regex and screenshot.

                • Mark Olson

                  Since we’re still debating, here’s what I came up with:

                  [foobar]
                  ; valid comments are green
                  0x229f22 = (?<!\S)/\*(?:(?!\*/)[\s\S])*\*/(?!\S)
                  ;invalid comments (no space before or after) are orange
                  0x0088ff = \S/\*(?:(?!\*/)[\s\S])*\*/\S
                  ; invalid comments (no space before) are red
                  0x0000ff = \S/\*(?:(?!\*/)[\s\S])*\*/(?!\S)
                  ; invalid comments (no space after) are purple
                  0xff00ff = (?<!\S)/\*(?:(?!\*/)[\s\S])*\*/\S
                  ; check in the respective styler xml if the following IDs are valid
                  excluded_styles = 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,20,21,22,23
                  

                  [screenshot]

                  text is:

                  /*start of file is ok*/
                  foo /* bar */baz       /* no sep after, bad */
                  foo/* bar */ baz       /* no sep before, bad */
                  foo /* bar */ baz      /* both seps, good */
                  foo /**/ baz           /* both seps, empty, good */
                  foo /**/baz            /* no sep after, empty, bad */
                  foo/**/ baz            /* no sep before, empty, bad */
                  foo/**/baz             /* no sep before or after, bad */
                  /* you need EnhanceAnyLexer plugin
                     to see pretty colors */
                  /* */ro
                  /*eof is ok*/
                  

                  While we’re at it, find/replace (\S)?(/\*(?:(?!\*/)[\s\S])*\*/)(\S)? with (?1\1\x20)\2(?3\x20\3) will fix all errors.
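For scripting the same fix outside Notepad++, a rough Python sketch follows. Python's `re.sub` has no conditional replacement syntax like `(?1\1\x20)`, so a replacement function plays that role here. The search pattern is the one from the post; the names `COMMENT` and `pad_outside` are invented for illustration, and the behavior has only been reasoned through for the simple cases shown.

```python
import re

# Same search pattern as in the post: optional non-space before,
# the comment itself, optional non-space after.
COMMENT = re.compile(r"(\S)?(/\*(?:(?!\*/)[\s\S])*\*/)(\S)?")


def pad_outside(text):
    """Insert a space wherever a comment is glued to a neighboring character."""

    def fix(match):
        before, comment, after = match.group(1), match.group(2), match.group(3)
        left = before + " " if before else ""
        right = " " + after if after else ""
        return left + comment + right

    return COMMENT.sub(fix, text)


if __name__ == "__main__":
    for sample in ["foo /* bar */baz", "foo/* bar */ baz", "foo/**/baz"]:
        print(repr(sample), "->", repr(pad_outside(sample)))
```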

                  • PeterJones (in reply to Mark Olson)

                    @Mark-Olson said in UDL comment that requires seperators:

                    Since we’re still debating, here’s what I came up with:

                    I guess it depends on the OP’s language, whether the comment begin/end only need spaces outside the comment (like you interpreted), or whether they also need spaces inside the comment (like I interpreted).

                    Why do you use [\s\S] instead of . for “match any character”? My guess is that you want to avoid thinking about .-matches-newline … But [\s\S] is longer ((?s) plus . is only 5 characters, compared to 6 characters for [\s\S]), and (IMO) it doesn’t convey the meaning as well: when I see a . in a regex, I immediately read “any character”, whereas when I see [\s\S] I have to think to myself “match any character that’s either a space character or isn’t a space character … gee, why not use a . instead?” every time I read that idiom.
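Whichever spelling reads better, the two select the same text. A quick Python check (Python's `re` assumed as a stand-in for the Boost engine; the sample string is made up):

```python
import re

text = "x /* line one\nline two */ y"

# [\s\S] matches any character regardless of the dot-matches-newline setting.
with_class = re.findall(r"/\*(?:(?!\*/)[\s\S])*\*/", text)

# "." under (?s) also matches any character, including "\n".
with_dot = re.findall(r"(?s)/\*(?:(?!\*/).)*\*/", text)

if __name__ == "__main__":
    print(with_class == with_dot, with_class)
```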

                    • G A

                      In my specific case, the /* or */ need to be separated both inside and outside the comment from any other characters in order to function as a comment.

                      In any case, appreciate all the responses. This has been very helpful.

                      • Mark Olson

                        I finally came up with a regex-replace that adds buffer spaces internally as well as externally, if you are interested.

                        Replace (?s)(\S)?/\*((?!\*/)\S)?((?:(?!\S?\*/).)+?)?((?!\*/)\S)?\*/(\S)?
                        with (?1\1\x20)/*(?2\x20\2)(?3\3:\x20)(?4\4\x20)*/(?5\x20\5)

                        Just for the record, a lot of the complexity of that regex-replace is due to dealing with the silly corner case of replacing /**/ with /* */. You could absolutely come up with something simpler if you didn’t care about that.
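As a point of comparison, here is a simplified Python sketch of the same idea: pad the comment markers both inside and outside, including turning `/**/` into `/* */`. It is not a character-for-character translation of the regex-replace above, and edge cases may behave differently. It captures the comment body with the tempered-dot pattern and rebuilds the comment in a function; `COMMENT` and `pad_all` are hypothetical names.

```python
import re

# Optional glued character before, comment body, optional glued character after.
COMMENT = re.compile(r"(\S)?/\*((?:(?!\*/).)*)\*/(\S)?", re.DOTALL)


def pad_all(text):
    """Add missing spaces inside and outside every /* ... */ comment."""

    def fix(match):
        before, body, after = match.group(1), match.group(2), match.group(3)
        if not body:
            body = " "  # empty comment: /**/ becomes /* */
        else:
            if not body[0].isspace():
                body = " " + body
            if not body[-1].isspace():
                body = body + " "
        left = before + " " if before else ""
        right = " " + after if after else ""
        return left + "/*" + body + "*/" + right

    return COMMENT.sub(fix, text)


if __name__ == "__main__":
    for sample in ["foo/**/baz", "/*x*/", "a /*b */c"]:
        print(repr(sample), "->", repr(pad_all(sample)))
```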
