Markdown Lexer



  • I have the Markdown User Defined Language and that works OK, but looking at Scintilla docs, there seems to be a built-in Markdown lexer included at least as far back as 4.1.2 which is what I believe N++ was “recently” upgraded to.

    Using NppExec, I can set the Lexer type to “markdown” and not receive an error like I do when setting to a bogus language:

    ================ READY ================
    SCI_SENDMSG SCI_SETLEXERLANGUAGE 0 "markdown"
    ================ READY ================
    SCI_SENDMSG SCI_GETLEXERLANGUAGE 0 @""
    ================ READY ================
    ECHO $(MSG_LPARAM)
    markdown
    ================ READY ================
    SCI_SENDMSG SCI_SETLEXERLANGUAGE 0 "nonexistLang"
    ================ READY ================
    SCI_SENDMSG SCI_GETLEXERLANGUAGE 0 @""
    ================ READY ================
    ECHO $(MSG_LPARAM)
    null
    ================ READY ================
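For anyone who prefers PythonScript over NppExec, the same probe can be sketched in Python. This is only a sketch under assumptions: it relies on the PythonScript plugin's `editor.setLexerLanguage()` and `editor.getLexerLanguage()` wrappers, and `FakeEditor` is a hypothetical stand-in (with a made-up list of known lexers) so the logic can be tried outside Notepad++. Note that the probe switches the active lexer as a side effect.

```python
class FakeEditor(object):
    """Hypothetical stand-in for PythonScript's `editor` object."""
    KNOWN = {"markdown", "cpp", "python"}  # assumed sample, not Scintilla's real list

    def __init__(self):
        self._lexer = "null"

    def setLexerLanguage(self, name):
        # Mirrors the session above: an unknown name leaves the "null" lexer active.
        self._lexer = name if name in self.KNOWN else "null"

    def getLexerLanguage(self):
        return self._lexer


def lexer_exists(ed, name):
    """Return True if asking for `name` leaves that lexer active."""
    ed.setLexerLanguage(name)
    return ed.getLexerLanguage() == name
```

Inside Notepad++ one would pass the real `editor` object instead of `FakeEditor()`.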
    

    I tried adding the values to my theme but it seems there’s an N++ middle-man I’m missing - somehow I have to tell N++ to use “markdown” as the language, but N++ does not have that in its languages - hence the UDL.

    Is there a reason N++ doesn’t add “Markdown” as a valid language and use the Scintilla lexer rather than forcing a UDL solution? Just curious - maybe there’s some history I’m not aware of or missed discussions on this forum or GitHub issues?

    Cheers.



  • @Michael-Vincent said in Markdown Lexer:

    Scintilla … 4.1.2 which is what I believe N++ was “recently” upgraded to.

    I thought it was 4.20.
    Sure would be nice to have that information in the Debug Info (echo, echo, – note this is not Eko, Eko…) to remove any possible doubt.



  • @Alan-Kilborn said in Markdown Lexer:

    Sure would be nice to have that information in the Debug Info

    LOL!!!

    4.2 - you’re right:

    04c00cfa-6e73-4a8e-9fb8-6e015b598edb-image.png



  • Although…in the Debug Info it would never change for a specific version of N++, thus making it more of an “about Notepad++” kind of thing?

    BTW, Neil Hodgson needs to update his copyright date.



  • @Alan-Kilborn

    As you probably know, I’m an NppExec scripter and with the help of @dinkumoil I whipped up this little ditty:

    ::npp
    NPP_CONSOLE on
    NPP_SENDMSG NPPM_GETNPPVERSION
    SET LOCAL NPPMAJVER ~ int($(MSG_RESULT)/65536)
    SET LOCAL NPPMINVER ~ $(MSG_RESULT)%65536
    
    // Bitness from:  https://community.notepad-plus-plus.org/post/44021
    NPE_CONSOLE -- m-
    NPP_CONSOLE disable
    
    NPE_CONSOLE -- v+
    cmd.exe /c "for /f "tokens=1* delims==" %i in ('set ProgramFiles^(x86^) 2^>NUL') do @echo %j"
    SET LOCAL CALLRESULT = $(OUTPUT)
    cmd.exe /c powershell -Command "(Get-Item '$(NPP_DIRECTORY)\SciLexer.dll').VersionInfo.ProductVersion"
    SET LOCAL SCIVER = $(OUTPUT)
    NPE_CONSOLE -- v-
    
    IF "$(CALLRESULT)"=="" THEN
        SET LOCAL BITNESS = 32
    ELSE
        IF "$(CALLRESULT)"=="$(SYS.ProgramFiles)" THEN
            SET LOCAL BITNESS = 32
        ELSE
            SET LOCAL BITNESS = 64
        ENDIF
    ENDIF
    
    NPP_CONSOLE enable
    ECHO Notepad++: $(NPPMAJVER).$(NPPMINVER) / $(BITNESS)-bit
    ECHO Scintilla: $(SCIVER)
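The arithmetic and the IF/ELSE chain in the script above can be restated as a small Python sketch. The function names are made up for illustration, and the comments about what each branch means are an interpretation of the script, not something it states.

```python
def decode_npp_version(packed):
    """Split the NPPM_GETNPPVERSION result into (major, minor).

    The script above divides by 65536, i.e. the major part sits in the
    high 16 bits and the minor part in the low 16 bits.
    """
    return divmod(packed, 65536)


def guess_bitness(program_files_x86, program_files):
    """Mirror the IF/ELSE chain of the script above.

    program_files_x86: value of %ProgramFiles(x86)%, or "" if it is not set
    program_files:     value of %ProgramFiles% as seen by the process
    """
    if program_files_x86 == "":
        return 32  # variable not set (presumably 32-bit Windows)
    if program_files_x86 == program_files:
        return 32  # both point to the same folder (presumably a 32-bit process)
    return 64
```

For example, `decode_npp_version(7 * 65536 + 85)` gives `(7, 85)`.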
    

    cb2aee15-fc22-49e9-9362-e61ea348bebc-image.png

    Cheers.



  • @Michael-Vincent ,

    I tried adding the values to my theme but it seems there’s a N++ middle-man

    Apparently. I thought maybe he just hadn’t included the markdown lexer in the NPP copy of Scintilla, but it’s there
    (Then I realized if the SCI_SETLEXERLANGUAGE didn’t complain then of course it’s there.)

    So, there are two different ways to set the lexer. Er, three, really:

    1. As you showed, directly use SCI_SETLEXERLANGUAGE, but that obviously skips any Notepad++ wrappers.
    2. Send the NPPM_SETBUFFERLANGTYPE Notepad++ message,
      which is implemented in the Big Switch, which then calls scintilla’s Buffer::setLangType().
    3. Use the Language menu, which sends IDM_LANG_…, which are handled in NppCommands about here and call setLanguage, langHasBeenSetFromMenu, and _pDocMap->setSyntaxHiliting().

    The setLangType and langHasBeenSetFromMenu eventually send some Scintilla notifications: doNotify(BufferChangeLanguage | BufferChangeLexing) and doNotify(BufferChangeFilename | BufferChangeLanguage | BufferChangeTimestamp). GitHub isn’t giving all the results when I search for doNotify, but I eventually found that it calls Notepad_plus::notifyBufferChanged.

    I was hoping I would find a Scintilla and/or Notepad++ notification you could use to trigger a redraw or something. But I haven’t found a smoking gun.

    Unfortunately, I cannot tell from the notifyBufferChanged what I would need to send in order to convince Notepad++ to do its wrapper stuff without overriding the SCI_SETLEXERLANGUAGE you just used. Also, you would have to somehow define the colors from your script rather than the style configurator.

    You might want to just put in an issue to ask him to add a menu entry for the Scintilla 4.20 builtin Markdown Lexer and add it to the style configurator, and see what happens. :-) Then again, maybe the Scintilla Markdown lexer isn’t great, which is why he’s stuck with his Markdown UDL (or he just is used to the Markdown UDL, and doesn’t want to change). ;-)



  • @PeterJones said in Markdown Lexer:

    Then again, maybe the Scintilla Markdown lexer isn’t great, which is why he’s stuck with his Markdown UDL

    That’s what I was wondering about - any legends or lore from the ol’ timers (no offense) here. Given I started using N++ around version 6.something and only started actively participating in this forum a few years ago, I’m sure this conversation was had, decided and moved on from. Maybe it wasn’t documented though.

    Not a big deal, the UDL Markdown works pretty well and with my modified NppMarkdownPanel, the “live” view while editing Markdown works nicely.

    Cheers.



  • @Michael-Vincent

    seems the built-in one does things differently

    ca7cbcc3-dabe-4ec1-9617-f4cb89da6bc7-image.png

    to play with it I used this python code

    from Npp import editor  # PythonScript plugin; run this inside Notepad++

    # Foreground colour (R, G, B) for each style id of the built-in Markdown lexer
    MARKDOWN_STYLES = {
        0  : (180, 180, 180),  # SCE_MARKDOWN_DEFAULT
        1  : (255, 32 , 0),  # SCE_MARKDOWN_LINE_BEGIN
        2  : (255, 64 , 0),  # SCE_MARKDOWN_STRONG1
        3  : (255, 96 , 0),  # SCE_MARKDOWN_STRONG2
        4  : (255, 128, 0),  # SCE_MARKDOWN_EM1
        5  : (255, 160, 0),  # SCE_MARKDOWN_EM2
        6  : (255, 192, 0),  # SCE_MARKDOWN_HEADER1
        7  : (255, 224, 0),  # SCE_MARKDOWN_HEADER2
        8  : (255, 255, 0),  # SCE_MARKDOWN_HEADER3
        9  : (255, 0, 0  ),  # SCE_MARKDOWN_HEADER4
        10 : (255, 0, 32 ),  # SCE_MARKDOWN_HEADER5
        11 : (255, 0, 64 ),  # SCE_MARKDOWN_HEADER6
        12 : (255, 0, 96 ),  # SCE_MARKDOWN_PRECHAR
        13 : (255, 0, 128),  # SCE_MARKDOWN_ULIST_ITEM
        14 : (255, 0, 160),  # SCE_MARKDOWN_OLIST_ITEM
        15 : (255, 0, 192),  # SCE_MARKDOWN_BLOCKQUOTE
        16 : (255, 2, 224),  # SCE_MARKDOWN_STRIKEOUT
        17 : (255, 0, 255),  # SCE_MARKDOWN_HRULE
        18 : (255, 128, 128),  # SCE_MARKDOWN_LINK
        19 : (255, 128, 160),  # SCE_MARKDOWN_CODE
        20 : (255, 128, 192),  # SCE_MARKDOWN_CODE2
        21 : (255, 128, 224),  # SCE_MARKDOWN_CODEBK
    }
    editor.setLexer(98)  # 98 = SCLEX_MARKDOWN
    for _id, color in MARKDOWN_STYLES.items():
        editor.styleSetFore(_id, color)
    editor.colourise(0, -1)  # restyle the whole document
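If the numeric ids are awkward to work with, the colours could also be given by name. The following is a sketch, not part of the original script: the ids are copied from the comments above, while `STYLE_IDS`, `apply_named_colors` and `RecordingEditor` are hypothetical names. `RecordingEditor` only records the `styleSetFore` calls so the helper can be tried outside Notepad++.

```python
# Style ids copied from the comments in the snippet above.
STYLE_IDS = {
    "DEFAULT": 0, "LINE_BEGIN": 1, "STRONG1": 2, "STRONG2": 3,
    "EM1": 4, "EM2": 5,
    "HEADER1": 6, "HEADER2": 7, "HEADER3": 8,
    "HEADER4": 9, "HEADER5": 10, "HEADER6": 11,
    "PRECHAR": 12, "ULIST_ITEM": 13, "OLIST_ITEM": 14,
    "BLOCKQUOTE": 15, "STRIKEOUT": 16, "HRULE": 17,
    "LINK": 18, "CODE": 19, "CODE2": 20, "CODEBK": 21,
}


def apply_named_colors(ed, colors_by_name):
    """Call ed.styleSetFore(id, rgb) for each named style; return the ids set."""
    applied = []
    for name, rgb in sorted(colors_by_name.items()):
        style_id = STYLE_IDS[name]  # raises KeyError on an unknown name
        ed.styleSetFore(style_id, rgb)
        applied.append(style_id)
    return applied


class RecordingEditor(object):
    """Hypothetical stand-in for PythonScript's `editor`; records calls."""

    def __init__(self):
        self.calls = []

    def styleSetFore(self, style_id, rgb):
        self.calls.append((style_id, rgb))
```

Inside Notepad++ the real `editor` object would be passed in place of `RecordingEditor()`, followed by `editor.colourise(0, -1)` as in the script above.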
    


  • @Ekopalypse said in Markdown Lexer:

    seems the built-in one does things differently

    Nicely done! Was hoping I could modify langs.xml and my theme to do it “automatically” after sending a Scintilla message, but that works.

    However, not sure I like the “built-in” highlighting. Maybe I’m just used to seeing the UDL version? What do others think? Does anyone work with Markdown a lot - which lexer do you like better?

    Cheers.



  • @Michael-Vincent

    Was hoping I could modify langs.xml

    Not really, there are quite a few locations that I think need to be changed. Nothing complicated, but it has to be done.

    A scripting workaround could be done, but to be honest, from my point of view the UDL looks more stable.
    I’m not talking about the bold and/or italic font attributes or the fonts in general (you could do all that), but did you notice the glitches: header1 was not colored, and for the bold italics the last asterisks and underscores were not colored?

    Personally, I don’t use markdown very often.
    The update of the Npp API documentation I did was probably the biggest project with markdown language :-D



  • Personally, I don’t use markdown very often

    I don’t either, so maybe this isn’t a dumb question (I started it early and then cancelled):

    Which one above (left or right) is the built-in MD lexer and which one is the UDL MD lexer?
    Or maybe it still is a dumb question?



  • @Alan-Kilborn

    left built-in, right UDL :-)



  • Hello, @michael-vincent, @alan-kilborn, @ekopalypse, @peterjones and All,

    Here is my own Markdown test style ;-))

    • First, it may give you some hints about how to achieve some specific formatting on our NodeBB Notepad++ forum

    • Secondly, you may use it to test a Markdown lexer or UDL


    So, this raw input text, below :

    #### FONT styles :
    ¯¯¯¯¯¯¯¯¯¯¯
    
    DEFAULT text
    
    *Italic1*
    _Italic2_
    
    **STRONG1**
    __STRONG2__
    
    ***STRONG_italic1***
    ___STRONG_italic2___
    
    ~~Strikethrough~~
    
    `Monospace`
    
    ``There are LITERAL `back-ticks`  here !``
    
    
    #### HEADERS :
    ¯¯¯¯¯¯¯¯¯¯
    
    Header H1
    =========
    
    Header H2
    ---------
    
    and
    
    # Header H1
    ## Header H2
    ### Header H3 ###
    #### Header H4
    ##### Header H5
    ###### Header H6 #
    
    
    #### NON-ORDERED lists :
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    
    * First item
    * Second item
      - Third item
      - Fourth item
        + Fifth item
        + Sixth item
      - Seventh item
      - Eighth item
    * Ninth item
    * Tenth item
    
    #### ORDERED lists :
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    
    1. First item
    2. Second item
    2020\. What a year !
    3. Third item
    999. Last item
    
    
    #### Block QUOTES :
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    
    > This is the **first** level of quoting.
    >
    > > **Second** level : this is
      a **nested**
    blockquote.
    >
    >Back to
    the **first** level.
    
    > **Second**
      block quote
      test
    
    > #### **`Third`** block quote
    >
    > > 1.   This is the **first** list item.
    > > 2.   This is the **second** list item.
    > >
    > > Here's some example **code**:
    > >
    > >     return shell_exec("echo $input | $markdown_script");
    > This is the
    end of the **first** level
    
    
    #### CODE blocks :
    ¯¯¯¯¯¯¯¯¯¯¯¯
    
    With **`4` leading** spaces :
    
        static int isPalindrome(int item)
        {
            int rev = 0;
            int rem = 0;
            int num = item;
        
            while (num > 0)
            {
                rem = num % 10;
                rev = rev * 10 + rem;
                num = num / 10;
            }
        }
    
    By **default** ( identical ) :
    
    ~~~
    static int isPalindrome(int item)
    {
        int rev = 0;
        int rem = 0;
        int num = item;
    
        while (num > 0)
        {
            rem = num % 10;
            rev = rev * 10 + rem;
            num = num / 10;
        }
    }
    ~~~
    
    With **`CPP` language** specified :
    
    ~~~cpp
    static int isPalindrome(int item)
    {
        int rev = 0;
        int rem = 0;
        int num = item;
    
        while (num > 0)
        {
            rem = num % 10;
            rev = rev * 10 + rem;
            num = num / 10;
        }
    }
    ~~~
    
    **Without** any language **extension** :
    
    ~~~no
    static int isPalindrome(int item)
    {
        int rev = 0;
        int rem = 0;
        int num = item;
    
        while (num > 0)
        {
            rem = num % 10;
            rev = rev * 10 + rem;
            num = num / 10;
        }
    }
    ~~~
    
    
    #### HORIZONTAL rules ( *** or --- or ___ )
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    
    Before the **rule**
    
    ---
    
    After the **rule**
    
    
    #### LINKS :
    ¯¯¯¯¯¯
    
    **In-line**-style  links :
    
    This is the [1st link](https://example.com/) example
    
    This is the [2nd link](https://tests.com/ "Hovered Text") example
    
    **Reference**-style links :
    
    This is the [1st Reference][id1] style link
    
    [id1]: <https://tests.com/>
    [id2]: https://example.com/  "Hovered Text"
    
    This is the [2nd reference][id2] style link
    
    **Implicit**-style link :
    
    The [Daring Fireball][] site
    
    [Daring Fireball]: https://daringfireball.net/ (Daring Fireball Site)
    
    
    #### PICTURES :
    ¯¯¯¯¯¯¯¯¯¯
    
    **In-line**-style image :
    
    ![](https://imgur.com/P0GFeMF.jpg)
    
    The *same* **in-line** image with **hovered** text
    
    ![](https://imgur.com/P0GFeMF.jpg "Winter Landscape")
    
    
    
    **Reference-style** image ( **Notepad++** animation ) :
    
     ![][id3]
    
    [id3]: https://i.imgur.com/HvBd52m.gif
    [id4]: <https://i.imgur.com/HvBd52m.gif> 'RENUMBERING with RECTANGULAR selection'
    
    The *same* animation, with **hovered** text
    
    ![][id4]
    
    
    #### AUTOMATIC links :
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    
    <https://www.google.fr/>
    
    <xxx.xxx.xxx@gmail.com>
    
    
    #### ESCAPED characters :
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
    
    a LITERAL backslash **\\** character
    a LITERAL back-tick **\`** character
    a LITERAL asterisk **\*** character
    a LITERAL underscore **\_** character
    a LITERAL opening curly brace **\{** character
    a LITERAL ending  curly brace **\}** character
    a LITERAL opening square bracket **[** character
    a LITERAL ending  square bracket **]** character
    a LITERAL opening parenthesis **\(** character
    a LITERAL ending  parenthesis **\)** character
    a LITERAL hash mark **\#** character
    a LITERAL plus sign **\+** character
    a LITERAL minus sign (hyphen) **\-** character
    a LITERAL dot **\.** character
    a LITERAL exclamation mark **\!** character
    
    
    #### TABLES :
    ¯¯¯¯¯¯¯¯
    
    **`6`** **columns**, so **`7`** **delimiters** (  character **`|`**)
    
    | Name | AgeM | AgeF | Occupation | ABC |Xyz|
    |-|:-:|:-:|-|:-|-:|
    | John COOMBE | 60 | | Farmer | | Y |
    | Joseph COOMBE | 28 | | | Y | Y |
    
    | Elizabeth COOMBE | | 22 | | Y |N|
    | | | | | | |
    | George COOMBE | 16 | | |N| Y |
    | Richard COOMBE | 14 | | |N|N|
    |.|.|.|.|.|.|
    | Christopher COOMBE | 11 | | | Y ||
    | Francis COOMBE | 2 | | | ||
    
    That's **ALL** !
    

    Should produce this output :

    FONT styles :

    ¯¯¯¯¯¯¯¯¯¯¯

    DEFAULT text

    Italic1
    Italic2

    STRONG1
    STRONG2

    STRONG_italic1
    STRONG_italic2

    Strikethrough

    Monospace

    There are LITERAL `back-ticks` here !

    HEADERS :

    ¯¯¯¯¯¯¯¯¯¯

    Header H1

    Header H2

    and

    Header H1

    Header H2

    Header H3

    Header H4

    Header H5
    Header H6

    NON-ORDERED lists :

    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

    • First item
    • Second item
      • Third item
      • Fourth item
        • Fifth item
        • Sixth item
      • Seventh item
      • Eighth item
    • Ninth item
    • Tenth item

    ORDERED lists :

    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯

    1. First item
    2. Second item
      2020. What a year !
    3. Third item
    4. Last item

    Block QUOTES :

    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯

    This is the first level of quoting.

    Second level : this is
    a nested
    blockquote.

    Back to
    the first level.

    Second
    block quote
    test

    Third block quote

    1. This is the first list item.
    2. This is the second list item.

    Here’s some example code:

    return shell_exec("echo $input | $markdown_script");
    

    This is the
    end of the first level

    CODE blocks :

    ¯¯¯¯¯¯¯¯¯¯¯¯

    With 4 leading spaces :

    static int isPalindrome(int item)
    {
        int rev = 0;
        int rem = 0;
        int num = item;
    
        while (num > 0)
        {
            rem = num % 10;
            rev = rev * 10 + rem;
            num = num / 10;
        }
    }
    

    By default ( identical ) :

    static int isPalindrome(int item)
    {
        int rev = 0;
        int rem = 0;
        int num = item;
    
        while (num > 0)
        {
            rem = num % 10;
            rev = rev * 10 + rem;
            num = num / 10;
        }
    }
    

    With CPP language specified :

    static int isPalindrome(int item)
    {
        int rev = 0;
        int rem = 0;
        int num = item;
    
        while (num > 0)
        {
            rem = num % 10;
            rev = rev * 10 + rem;
            num = num / 10;
        }
    }
    

    Without any language extension :

    static int isPalindrome(int item)
    {
        int rev = 0;
        int rem = 0;
        int num = item;
    
        while (num > 0)
        {
            rem = num % 10;
            rev = rev * 10 + rem;
            num = num / 10;
        }
    }
    

    HORIZONTAL rules ( *** or --- or ___ )

    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

    Before the rule


    After the rule

    LINKS :

    ¯¯¯¯¯¯

    In-line-style links :

    This is the 1st link example

    This is the 2nd link example

    Reference-style links :

    This is the 1st Reference style link

    This is the 2nd reference style link

    Implicit-style link :

    The Daring Fireball site

    PICTURES :

    ¯¯¯¯¯¯¯¯¯¯

    In-line-style image :

    The same in-line image with hovered text

    Reference-style image ( Notepad++ animation ) :

    The same animation, with hovered text

    AUTOMATIC links :

    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

    https://www.google.fr/

    xxx.xxx.xxx@gmail.com

    ESCAPED characters :

    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

    a LITERAL backslash \ character
    a LITERAL back-tick ` character
    a LITERAL asterisk * character
    a LITERAL underscore _ character
    a LITERAL opening curly brace { character
    a LITERAL ending curly brace } character
    a LITERAL opening square bracket [ character
    a LITERAL ending square bracket ] character
    a LITERAL opening parenthesis ( character
    a LITERAL ending parenthesis ) character
    a LITERAL hash mark # character
    a LITERAL plus sign + character
    a LITERAL minus sign (hyphen) - character
    a LITERAL dot . character
    a LITERAL exclamation mark ! character

    TABLES :

    ¯¯¯¯¯¯¯¯

    6 columns, so 7 delimiters ( character |)

    • Columns 1, 4, 5 are default left justified
    • Column 6 is right justified
    • Columns 2,3 are centered

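    | Name               | AgeM | AgeF | Occupation | ABC | Xyz |
    |--------------------|:----:|:----:|------------|:----|----:|
    | John COOMBE        |  60  |      | Farmer     |     |   Y |
    | Joseph COOMBE      |  28  |      |            | Y   |   Y |
    | Elizabeth COOMBE   |      |  22  |            | Y   |   N |
    | George COOMBE      |  16  |      |            | N   |   Y |
    | Richard COOMBE     |  14  |      |            | N   |   N |
    | .                  |  .   |  .   | .          | .   |   . |
    | Christopher COOMBE |  11  |      |            | Y   |     |
    | Francis COOMBE     |  2   |      |            |     |     |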

    That’s ALL !

    BR

    guy038
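The column alignments in the TABLES part of the test text come from the delimiter row `|-|:-:|:-:|-|:-|-:|`. As a rough sketch (the function name is made up, and this is not how any particular lexer or renderer implements it), the usual pipe-table convention can be written down like this: a colon on both sides centers the column, a trailing colon alone right-aligns it, and anything else falls back to the default left alignment.

```python
def column_alignments(delimiter_row):
    """Return 'left', 'center' or 'right' for each cell of a delimiter row."""
    cells = delimiter_row.strip().strip("|").split("|")
    result = []
    for cell in cells:
        cell = cell.strip()
        starts = cell.startswith(":")
        ends = cell.endswith(":")
        if starts and ends:
            result.append("center")
        elif ends:
            result.append("right")
        else:
            result.append("left")  # '-' (default) and ':-' are both treated as left
    return result
```

Applied to the delimiter row of the test table, this yields left, center, center, left, left, right, which matches the three bullet points shown with the rendered table.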

