    UDL for rmarkdown = R + markdown/LaTex?

    • ddalthorp

      rmarkdown (extension .rmd) is a hybrid of R and markdown. An .rmd document can be rendered into a publication-quality PDF (or HTML or docx) using R and pandoc in a straightforward way, with every number, table, figure, etc. produced from R data and code that are included in the .rmd (and associated .Rdata) but invisible in the PDF.

      I’m wondering if there’s a way to write a UDL with text highlighting that will handle different sections of a document in different ways, depending on whether the section is R code or markdown/LaTeX.

      Example (```{r and ``` delimit R code, with the rest being markdown/LaTeX):

      =====================

      # Introduction

      I like integrals, and $\int_a^b f(x)dx$ is one of my favorites. I use it all the time on my little garden plot.

      ```{r}
      # graph of a lovely quadratic function
      dat <- data.frame(x = 0:10, y = (0:10) ^ 2)
      plot(dat$x, dat$y)
      ```

      Now that we’ve seen the graph of a lovely quadratic function, our lives are fulfilled, and we can live in peace for eternity.

      ====================

      $ --> a delimiter for mathematical expressions in the markdown/LaTeX section, where it is expected to come in pairs; in the R part it is an operator and comes singly.

      # --> marks a section header in the markdown section but a comment in the R section

      plot --> just a text word in the markdown but a function in R

      Is there a way to do these two things at the same time?

      1. have UDL format the delimited $<stuff>$ in the markdown/LaTeX part while ignoring that formatting in the R part, and
      2. have plot be a keyword in the R part but just normal text in the markdown/LaTeX part
      • PeterJones @ddalthorp

        @ddalthorp ,

        I’m wondering if there’s a way to write a UDL with text highlighting that will handle different sections of a document in different ways

        Sorry. UDL is a reasonably simple lexer, which can handle keywords, operators, and simple folding; it doesn’t even handle regex-based keywords or unicode keywords correctly. Handling separate sections differently is well outside its purview.

        You would have to write a custom lexer plugin to handle firepower of that magnitude. (Well, you might be able to implement it in PythonScript or one of the other scripting plugins, rather than writing and compiling a separate plugin, but it would probably be a lot slower than a standalone plugin, and would probably still require a lot of effort to get working properly.)
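
For readers wondering what "handling separate sections differently" amounts to, the first step is a pass that labels each stretch of the file as markdown or R. Below is a minimal sketch in plain Python. It is not PythonScript and uses no Notepad++ API; the name `split_rmd` is made up for illustration, and it assumes a chunk opens on a line of the form ```{r ...} and closes on a line containing only ```.

```python
import re

# Assumption: an R chunk opens on a line like ```{r} or ```{r, echo=FALSE}
# and closes on a line that contains only ```. Real rmarkdown allows more
# variations than this sketch handles.
CHUNK_OPEN = re.compile(r"^\s*```\{r[^}]*\}\s*$")
CHUNK_CLOSE = re.compile(r"^\s*```\s*$")


def split_rmd(text):
    """Return a list of (kind, lines) pairs, kind being 'markdown' or 'r'.

    The fence lines themselves are dropped from the output.
    """
    segments = []
    kind, current = "markdown", []
    for line in text.splitlines():
        if kind == "markdown" and CHUNK_OPEN.match(line):
            # Leaving markdown: store what was collected, start an R segment.
            if current:
                segments.append((kind, current))
            kind, current = "r", []
        elif kind == "r" and CHUNK_CLOSE.match(line):
            # Leaving R: store the chunk, go back to markdown.
            segments.append((kind, current))
            kind, current = "markdown", []
        else:
            current.append(line)
    if current:
        segments.append((kind, current))
    return segments
```

Each segment can then be handed to its own set of highlighting rules.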

        • ddalthorp

          Thanks, @PeterJones.

          “Sorry. UDL is a reasonably simple lexer, which can handle keywords, operators, and simple folding”… and it can also handle delimiters, by which I am able to split the .rmd into sections, calling the markdown/LaTeX part “Comment” (delimited by ```{ and ``` ), which can easily be defined to have a different format than the R code section. Then, groups of keywords in R can be explicitly excluded from highlighting in the markdown/LaTeX part, and most of the keywords in markdown/LaTeX can be excluded from the R part simply by virtue of R not having anything resembling them (e.g., \hline or \caption).

          And, voilà!

          The doc is 95% split into independently formatted section types and the whole task seems very close to being done. The sticking point is a small number of exceptions, like a delimiter in LaTeX ($) that would be nice to highlight but is an operator in R.

          So:

          1. break the doc into sections as Comment and code by delimiters [easy]
          2. exclude code syntax highlighting from the Comment parts [easy]
          3. exclude Comment syntax highlighting from the code parts when the syntax from the Comment parts does not overlap with the code syntax [easy]
          4. exclude Comment syntax from the code parts when the two sections use the same words or symbols to mean different things [not so easy?]

          That final 5% is always the hard part!
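
To make step 4 concrete: for the overlapping symbols, the style has to be looked up by the pair (section, symbol) rather than by the symbol alone. A toy sketch in plain Python follows. The style names and the function `style_for` are invented for illustration; the three symbols are the ones from the first post.

```python
# Hypothetical style names. The keys pair a section kind with a symbol, so
# the same symbol can map to different styles in different sections.
STYLES = {
    ("markdown", "$"): "math-delimiter",
    ("r", "$"): "operator",
    ("markdown", "#"): "section-header",
    ("r", "#"): "comment",
    ("markdown", "plot"): "plain-text",
    ("r", "plot"): "function",
}


def style_for(section, token):
    """Look up a style by (section, token); unknown pairs are plain text."""
    return STYLES.get((section, token), "plain-text")
```

Steps 1 to 3 only need the section to switch whole groups of rules on or off; step 4 needs this per-symbol dependence on the section.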

          • ddalthorp @ddalthorp

            @ddalthorp

            correction: delimited by ``` and ```{

            • PeterJones @ddalthorp

              @ddalthorp said in UDL for rmarkdown = R + markdown/LaTex?:

              and it can also handle delimiters, by which I am able to split the .rmd into sections

              I would not have thought of using delimiters. But now that you mention it, since you can define which keywords and other delimiters work inside a given delimiter, that’s actually a great way of doing it.

              Congratulations on thinking outside my box.

              • carypt

                so the main problem is still point 4: telling apart words/symbols that have different meanings in markdown/LaTeX and in R code. the meaning is determined by whether the symbol sits between the delimiters ( ```{r} , ``` ). so some symbols are context dependent, but the UDL lexer isn’t able to take the enclosing ```{r} and ``` into account when it meets a $, so it cannot tell the R meaning of $ from the markdown meaning.

                it would be possible to use a regex search-and-replace to safely rename the symbols that have a double meaning, but that is not wanted.
                what is wanted is a switch that activates highlighting for a different language within a section of the same file, or section-wise highlighting of one file in two languages.

                i am only thinking about this, and hope i got it right so far. my little idea is: the only similar context-dependency concept in UDL is the open-middle-close folding style on the “Folder & Default” tab, so the $ as “middle” would depend on the open ```{r} and the close ``` . but … that is not good

                • carypt

                  just thinking aloud: npp has 2 places for style configuration. one is Settings > Style Configurator, which affects the styling of the preinstalled languages and the default style; the other is the UDL settings. is there maybe a way to reach good highlighting for r-markdown?

                  • ddalthorp @carypt

                    @carypt I have everything working as I’d like except for two minor issues, which aren’t a big deal for me personally but make the whole project essentially non-sharable as a UDL because it’s so kludgy and is still missing that final 5%.

                    That is too bad. I had been avoiding using Notepad++ for years because it doesn’t do rmarkdown, which I have come to rely on for writing scientific docs because it makes it so easy and natural to tie figures and statistics in a publication to the data and calculations used to generate them (very science-friendly). I’d been using Tinn-R for years. It does OK with rmarkdown, but it doesn’t seem to play well with my new computer, so I’ve been looking for alternatives. [I’ve tried the ever-popular RStudio several times, and even tried using it exclusively for months, but I really do not like it.] A clean rmarkdown solution in Notepad++ would be a great thing.

                    The first issue with my kludge is that a snippet of R code is required at the beginning in order to get the delineators right. (There is nothing wrong with R code at the beginning, but it should not be a requirement.) The rationale is that the R part of an .rmd is the one that requires the most syntax highlighting AND it has many, many keywords that should be treated as plain text in the markdown part (e.g., if, for, plot, any, all, scale, etc.). An easy, obvious, partial solution would be to delineate the markdown part as Comment and then exclude the R formatting from that part of the .rmd. In rmarkdown, the markdown part is treated as the default (with no delineation), with the R part delineated. So, I need to invert the delineation, using ``` to open a comment (or markdown section) and ```{ to close it. This requires the doc to start with R code:

                    ================
                    ```{r}
                    # required R section at the beginning, whether it is meaningful or not
                    dataset <- data.frame(x = 1:3, y = 4:6)
                    ```

                    # Introduction
                    Greetings, gentle reader, and welcome. Look at the lovely scatter plot of $y = x + 3$.

                    ```{r}
                    plot(dataset$x, dataset$y)
                    ```

                    # Conclusion
                    I hope you enjoyed the paper.

                    =================
                    The UDL recognizes the second ``` as the opening delineator of the relatively lightly formatted comment (markdown) section, which is then closed with ```{ . This mostly works but is awkward.

                    The second issue is that the markdown part can include LaTeX for formatting and rendering the final .pdf. Ideally, all the LaTeX directives would be formatted nicely. Many of them are not difficult to handle because they would never appear in R, so nothing in the R part would ever pick up their formatting. There are some, though, that do appear in R with a different meaning, most notably $, which is a delimiter in LaTeX and an operator in R. It is very common in both R and markdown/LaTeX, and it would be great to be able to highlight it in the markdown part and ignore it in the R part.

                    • Ekopalypse @ddalthorp

                      @ddalthorp

                      I don’t think this can be solved with UDL.
                      You need SubLexers, something that for example the HTML Lexer does.
                      The most sensible thing would be a dedicated lexer. If you have C# knowledge, this could be interesting for you.
                      What could also work would be a solution like I implemented with the EnhanceAnyLexer script.
                      This is a regex based “pseudo” lexer that should be used together with a builtin or UDL lexer.
                      This can probably be solved with other scripting plugins like Lua or JavaScript as well.
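
As a rough illustration of the regex-based idea, here is a sketch in plain Python `re`. It is not the EnhanceAnyLexer script and uses no Notepad++ API. It finds $...$ math spans while skipping R chunks, by letting a single pattern match whole chunks first. The name `math_spans` is made up, and the chunk and math patterns are simplifying assumptions.

```python
import re

# Assumptions: an R chunk runs from a line starting with ```{r to the next
# line that is just ```, and inline math is $...$ on one line with no $
# inside. The chunk alternative is listed first, so any $ inside a chunk is
# consumed as part of the chunk and never tried as math.
TOKEN = re.compile(
    r"(?P<chunk>^```\{r[^\n]*\n.*?^```[ \t]*$)"
    r"|(?P<math>\$[^$\n]+\$)",
    re.MULTILINE | re.DOTALL,
)


def math_spans(text):
    """Return (start, end) offsets of $...$ spans that lie outside R chunks."""
    return [m.span("math") for m in TOKEN.finditer(text) if m.group("math")]
```

The returned offsets are what a scripting plugin would then have to color on top of the base lexer; how that is done depends on the plugin and is not shown here.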

                      • ddalthorp

                        @Ekopalypse Dang. I was looking for a quick fix. UDL does an amazing job with the limited tools it employs; I was hoping that a simple tweak would get me there. I’m seriously considering dumping Windows and moving to Linux before too long (after finishing up a couple of projects), so I’m not going to sink any more time into coming up with a good Windows solution with Npp.
