UDL for rmarkdown = R + markdown/LaTex?

ddalthorp

rmarkdown (extension .rmd) is a hybrid of R and markdown. rmd documents can be rendered into publication-quality PDFs (or HTML or docx) using R and pandoc in a straightforward way, with every number, table, figure, etc. produced from R data and code, which are included in the rmd (and associated .Rdata) but invisible in the PDF.

I’m wondering if there’s a way to write a UDL with text highlighting that will handle different sections of a document in different ways, depending on whether the section is R code or markdown/LaTex.

Example (```{r and ``` delimit R code, with the rest being markdown/LaTex):

=====================

# Introduction

I like integrals, and $\int_a^b f(x)dx$ is one of my favorites. I use it all the time on my little garden plot.

```{r}
# graph of a lovely quadratic function
dat <- data.frame(x = 0:10, y = (0:10) ^ 2)
plot(dat$x, dat$y)
```

Now that we’ve seen the graph of a lovely quadratic function, our lives are fulfilled, and we can live in peace for eternity.

====================

$ --> a delimiter for mathematical expressions in the markdown/LaTeX section, where it is expected to come in pairs; in the R part, it is an operator and comes singly.

# --> marks a section header in the markdown section but a comment in the R section

plot --> just a text word in the markdown but a function in R

Is there a way to do these two things at the same time?

1. have UDL format the delimited $<stuff>$ in the markdown/LaTeX part while ignoring that formatting in the R part, and
2. have plot be a keyword in the R part but just normal text in the markdown/LaTeX part
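
To make the context dependence concrete, here is a minimal Python sketch (not UDL, and not anything Notepad++ provides). It labels each line of an .rmd text as markdown or R by tracking the chunk fences, then reports what `#` and `$` mean on that line. The function names `label_lines`, `role_of_hash`, and `role_of_dollar` are hypothetical, and the sketch assumes chunks open with a fence followed by `{r` and close with a bare fence on its own line.

```python
FENCE = "`" * 3  # three backticks, spelled this way to keep the example fence-safe


def label_lines(text):
    """Return (section, line) pairs; section is 'markdown', 'r', or 'fence'."""
    labelled = []
    in_r = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_r and stripped.startswith(FENCE + "{r"):
            in_r = True
            labelled.append(("fence", line))
        elif in_r and stripped == FENCE:
            in_r = False
            labelled.append(("fence", line))
        else:
            labelled.append(("r" if in_r else "markdown", line))
    return labelled


def role_of_hash(section, line):
    """A leading '#' is a header in markdown but a comment in R."""
    if section == "fence" or not line.lstrip().startswith("#"):
        return None
    return "comment" if section == "r" else "header"


def role_of_dollar(section, line):
    """'$' delimits math in markdown/LaTeX but is an operator in R."""
    if section == "fence" or "$" not in line:
        return None
    return "operator" if section == "r" else "math delimiter"


sample = "\n".join([
    "# Introduction",
    "I like $\\int_a^b f(x)dx$.",
    FENCE + "{r}",
    "# graph of a lovely quadratic function",
    "plot(dat$x, dat$y)",
    FENCE,
])

for section, line in label_lines(sample):
    print(section, role_of_hash(section, line), role_of_dollar(section, line))
```

The point of the sketch is only that the same character needs a different style depending on a state (inside or outside a chunk) that has to be carried from line to line.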

PeterJones

@ddalthorp ,

I’m wondering if there’s a way to write a UDL with text highlighting that will handle different sections of a document in different ways

Sorry. UDL is a reasonably simple lexer, which can handle keywords, operators, and simple folding – it doesn’t even handle regex-based keywords or unicode keywords correctly. Handling separate sections differently is well outside its purview.

You would have to write a custom lexer plugin to handle firepower of that magnitude. (Well, you might be able to implement it in PythonScript or one of the other scripting plugins, rather than writing and compiling a separate plugin, but it would probably be a lot slower than a standalone plugin, and would probably still require a lot of effort to get working properly.)

ddalthorp

Thanks, @PeterJones.

“Sorry. UDL is a reasonably simple lexer, which can handle keywords, operators, and simple folding”…and it can also handle delimiters, by which I am able to split the .rmd into sections, calling the markdown/LaTeX part “Comment” (delimited by ```{ and ``` ), which can easily be defined to have a different format than the R code section. Then, groups of keywords in R can be explicitly excluded from highlighting in the markdown/LaTeX part, and most of the keywords in markdown/LaTeX can be excluded from the R part simply by virtue of R not having anything resembling them (e.g., \hline or \caption).

And, voilà!

The doc is 95% split into independently formatted section types and the whole task seems very close to being done. The sticking point is a small number of exceptions, like a delimiter in LaTex ($) that would be nice to highlight but is an operator in R.

So:

1. break the doc into sections as Comment and code by delimiters [easy]
2. exclude code syntax highlighting from the Comment parts [easy]
3. exclude Comment syntax highlighting from the code parts when the syntax from the Comment parts does not overlap with the code syntax [easy]
4. exclude Comment syntax from the code parts when the two sections use the same words or symbols to mean different things [not so easy?]

That final 5% is always the hard part!

ddalthorp

@ddalthorp

correction: delimited by ``` and ```{

PeterJones

@ddalthorp said in UDL for rmarkdown = R + markdown/LaTex?:

and it can also handle delimiters, by which I am able to split the .rmd into sections

I would not have thought of using delimiters. But now that you mention it, since you can define which keywords and other delimiters work inside a given delimiter, that’s actually a great way of doing it.

Congratulations on thinking outside my box.

carypt

so the main problem still is (point 4.) to distinguish between identical words/symbols that have different meanings in markdown/LaTeX and R code. the different meaning is determined by whether they are surrounded by the delimiters (```{r}, ```). so some symbols (or strings) are context dependent, but the udl lexer isn't able to ignore the characters between ```{r} and $ in order to address the different meaning of $ in R versus markdown.

it would be possible to use a regex search-and-replace transformation to safely rename the double-meaning signs, but that is not wanted.
what is wanted is a switch to activate highlighting in a different language in a section of the same file, or section-wise double-language highlighting of one file.

i am only thinking about this, and hope i got it right so far. my little idea is: the only similar context-dependency concept is the open-middle-close folding style in the udl folder & default tab, so the $ in middle would be context dependent on open ```{r} and close ```. but … that is not good

carypt

just thinking aloud: npp has 2 possibilities for style configuration. one is in settings > style configurator, which affects the styling of the preinstalled languages and also the default, and the other is the udl settings. is there maybe a way to reach good highlighting for r-markdown?

ddalthorp

@carypt I have everything working as I’d like except for two minor issues, which aren’t a big deal for me personally but make the whole project essentially non-sharable as a UDL because it’s so kludgy and is still missing that final 5%.

That is too bad. I had been avoiding using Notepad++ for years because it doesn’t do rmarkdown, which I have come to rely on for writing scientific docs because it makes it so easy and natural for tying figures and statistics in a publication to the data and calculations used to generate them — very science-friendly. I’d been using Tinn-R for years. It does OK with rmarkdown, but it doesn’t seem to play well with my new computer, so I’ve been looking for alternatives. [I’ve tried the ever-popular RStudio several times—even tried using it exclusively for months—but I really do not like it.] A clean rmarkdown solution in Notepad++ would be a great thing.

The first issue with my kludge is that a snippet of R code is required at the beginning in order to get the delineators right. (There is nothing wrong with R code at the beginning, but it should not be a requirement.) The rationale is that the R part of an .rmd is the one that requires the most syntax highlighting AND it has many, many keywords that should be treated as plain text in the markdown part (e.g., if, for, plot, any, all, scale, etc.). An easy, obvious, partial solution would be to delineate the markdown part as Comment and then exclude the R formatting from that part of the .rmd. In rmarkdown, the markdown part is treated as the default (with no delineation), with the R part delineated. So, I need to invert the delineation, using ``` to open a comment (or markdown section) and ```{ to close it. This requires the doc to start with R code:

================
```{r}
# required R section at the beginning, whether it is meaningful or not
dataset <- data.frame(x = 1:3, y = 4:6)
```

# Introduction
Greetings, gentle reader, and welcome. Look at the lovely scatter plot of $y = x + 3$.

```{r}
plot(dataset$x, dataset$y)
```

# Conclusion
I hope you enjoyed the paper.

=================
The UDL recognizes the second ``` as the opening delineator of the relatively lightly formatted comment (markdown) section, which is then closed with ```{ . This mostly works but is awkward.
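
The need for a leading R chunk can be seen in a toy Python model of the inverted scheme. This is only an illustration of the logic described above, not how Notepad++ actually implements UDL delimiters: it assumes the default style is R, that a bare fence line opens a "comment" (markdown) span, and that a fence followed by `{` closes it. The name `inverted_labels` is hypothetical.

```python
FENCE = "`" * 3  # three backticks, spelled this way to keep the example fence-safe


def inverted_labels(text):
    """Label lines under the inverted scheme: R is the default style,
    a bare fence opens a markdown span, and a fence plus '{' closes it."""
    labels = []
    in_markdown = False  # the document starts in the default (R) style
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(FENCE + "{"):
            in_markdown = False
            labels.append("delimiter")
        elif stripped == FENCE:
            in_markdown = True
            labels.append("delimiter")
        else:
            labels.append("markdown" if in_markdown else "r")
    return labels


# Document that starts with markdown: the introduction is styled as R.
without_leading_chunk = "\n".join([
    "# Introduction",
    FENCE + "{r}",
    "plot(dataset$x, dataset$y)",
    FENCE,
    "# Conclusion",
])

# Document that starts with a (possibly meaningless) R chunk: all sections line up.
with_leading_chunk = "\n".join([
    FENCE + "{r}",
    "dataset <- data.frame(x = 1:3, y = 4:6)",
    FENCE,
    "# Introduction",
    FENCE + "{r}",
    "plot(dataset$x, dataset$y)",
    FENCE,
    "# Conclusion",
])

print(inverted_labels(without_leading_chunk))
print(inverted_labels(with_leading_chunk))
```

In the first document nothing has opened a markdown span before the introduction, so it is treated as R code; only text after the first bare fence is recognized as markdown.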

The second issue is that the markdown part can include LaTeX for formatting and rendering the final .pdf. Ideally, all the LaTeX directives would be formatted nicely. Many of them are not difficult to handle because they would never appear in R, so nothing needs to be done to keep them from being highlighted in the R part. Some, though, appear in both and take on different meanings, most notably $, which is a delimiter in LaTeX and an operator in R. It is very common in both, and it would be great to be able to highlight it in the markdown part and ignore it in the R part.

Ekopalypse

@ddalthorp

I don’t think this can be solved with UDL.
You need SubLexers, something that for example the HTML Lexer does.
The most sensible thing would be a dedicated lexer. If you have C# knowledge, this could be interesting for you.
What could also work would be a solution like I implemented with the EnhanceAnyLexer script.
This is a regex based “pseudo” lexer that should be used together with a builtin or UDL lexer.
This can probably be solved with other scripting plugins like Lua or JavaScript as well.
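
As a rough idea of what a regex-based overlay could compute, here is a standalone Python sketch. It is not the EnhanceAnyLexer script and uses no Notepad++ or PythonScript API; it only finds the character ranges of `$...$` spans that lie outside R chunks, which is the information such an overlay would need in order to color them. The names `CHUNK_RE`, `MATH_RE`, and `math_spans` are hypothetical, and the patterns assume single-line inline math and chunks that open with a fence plus `{r` at the start of a line.

```python
import re

FENCE = "`" * 3  # three backticks, spelled this way to keep the example fence-safe

# An R chunk: from a line starting with the fence plus "{r" to the next bare fence line.
CHUNK_RE = re.compile(
    r"^" + FENCE + r"\{r\b.*?^" + FENCE + r"[ \t]*$",
    re.MULTILINE | re.DOTALL,
)

# Inline math: a non-empty $...$ span that stays on one line.
MATH_RE = re.compile(r"\$[^$\n]+\$")


def math_spans(text):
    """Return (start, end) offsets of $...$ spans that lie outside R chunks."""
    chunks = [m.span() for m in CHUNK_RE.finditer(text)]
    spans = []
    for m in MATH_RE.finditer(text):
        inside_chunk = any(start <= m.start() < end for start, end in chunks)
        if not inside_chunk:
            spans.append(m.span())
    return spans


sample = "\n".join([
    "Look at the lovely scatter plot of $y = x + 3$.",
    FENCE + "{r}",
    "plot(dataset$x, dataset$y)",
    FENCE,
    "I hope you enjoyed the paper.",
])

for start, end in math_spans(sample):
    print(start, end, sample[start:end])
```

The two `$` operators on the R line also match the math pattern, but that match starts inside a chunk and is discarded, so only the markdown expression is reported.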

ddalthorp

@Ekopalypse Dang. I was looking for a quick fix. UDL does an amazing job with the limited tools it employs; I was hoping that a simple tweak would get me there. I’m seriously considering dumping Windows and moving to Linux before too long (after finishing up a couple of projects), so I’m not going to sink any more time into coming up with a good Windows solution with Npp.