functionList for LaTeX: Trying to use classRange to have a hierarchical chapter > section document outline
-
I’m trying to use the npp function list to produce a document outline for me which shows something like the following:
Chapter 1
> Section 1.1
> Section 1.2
Chapter 2
> Section 2.1
> Section 2.2
I can’t work out why this doesn’t work:
<NotepadPlus>
    <functionList>
        <parser id="latex" displayName="LaTeX" commentExpr="(%.*?$)">
            <classRange mainExpr="\\chapter\*?\{(.*?)\}">
                <className>
                    <nameExpr expr=".*"/>
                </className>
                <function mainExpr="\\section\*?\{(.*?)\}">
                    <functionName>
                        <funcNameExpr expr=".*"/>
                    </functionName>
                </function>
            </classRange>
        </parser>
    </functionList>
</NotepadPlus>
As far as I can tell from the manual, that should look for \chapter{} or \chapter*{} as the start of a class, and with no openSymbole/closeSymbole it should treat everything after the first \chapter{} as a class range until the next \chapter{}.

Instead I get an empty function list.

I can get chapters and sections on one level OK using \\(chapter|section)... for the regex, but I can’t get classRange to work for the life of me.
-
> I’m trying to use the npp function list to produce a document outline for me which shows something like the following:

Could you share some example LaTeX which would resolve to that FunctionList? It would make it easier for us to help you.

> I can’t work out why this doesn’t work:
> … <classRange mainExpr="\\chapter\*?\{(.*?)\}">

\*? says “0 or 1 literal asterisk characters”. Is that really what you’re trying to match? I don’t know enough about LaTeX to make an educated guess there; it just seems odd to me. But looking at the built-in latex.xml, I guess that uses the \*? after each of its “functions”, so maybe that is reasonable.

> As far as I can tell from the manual, that should look for \chapter{} or \chapter*{} as the start of a class, and with no openSymbole/closeSymbole it should treat everything after the first \chapter{} as a class range until the next \chapter{}.

If you don’t have openSymbole/closeSymbole, then the mainExpr for the classRange needs to match the entire class, from beginning to end, and then it searches for any functions inside the results of that regex. (And it needs to be a multi-line match.)

> Instead I get an empty function list.

Unfortunately, that’s a common occurrence when trying to debug FunctionList, especially with classes.
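To make the point about the classRange mainExpr concrete, here is a minimal Python sketch of the two-stage idea: first find the class ranges, then look for sections only inside each range. It is an illustration under assumptions, not how Notepad++ is implemented: Python's `re` is not the Boost regex engine that Notepad++ uses, the sample text is invented, and the names `HEADER_ONLY`, `WHOLE_RANGE` and `sections_per_class` are hypothetical.

```python
import re

# Invented sample text for this sketch.
SAMPLE = (
    "\\chapter{One}\n"
    "\\section{1.1}\n"
    "text\n"
    "\\section{1.2}\n"
    "\\chapter{Two}\n"
    "\\section{2.1}\n"
)

# Matches only the heading, like the classRange mainExpr in the question.
HEADER_ONLY = re.compile(r"\\chapter\*?\{(.*?)\}")

# Matches from one heading up to (not including) the next heading or end of text.
WHOLE_RANGE = re.compile(
    r"^\\chapter\*?\{.*?\}.*?(?=^\\chapter\*?\{|\Z)",
    re.MULTILINE | re.DOTALL,
)

SECTION = re.compile(r"\\section\*?\{(.*?)\}")


def sections_per_class(range_regex, text):
    """Stage 1: find class ranges. Stage 2: search for sections inside each range only."""
    return [SECTION.findall(m.group(0)) for m in range_regex.finditer(text)]


print(sections_per_class(HEADER_ONLY, SAMPLE))  # [[], []]
print(sections_per_class(WHOLE_RANGE, SAMPLE))  # [['1.1', '1.2'], ['2.1']]
```

With the heading-only pattern, each "class" is just the `\chapter{...}` text, so there is nothing inside it for the section search to find. That would be consistent with the empty list described above.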
> I can get chapters and sections on one level ok using \\(chapter|section)... for the regex, but I can’t get classRange to work for the life of me.

(Did you start on your own, or did you start with the latex.xml that ships with Notepad++ since v8.7? Because that shows a bigger list of things all at the same level, which is better than nothing.)

I’ll see if I can come up with something, starting from the default latex.xml, that will get you started in the right direction.
-
Using the example text found here, which has chapters and sections, I am able to get something I think is reasonable.

<?xml version="1.0" encoding="UTF-8" ?>
<!-- ==========================================================================\
|   To learn how to make your own language parser, please check the following
|   link: https://npp-user-manual.org/docs/function-list/
\=========================================================================== -->
<NotepadPlus>
    <functionList>
        <parser
            displayName="LaTeX Syntax"
            id         ="latex_class"
            commentExpr="(?x)
                    (%.*?$)                 # Comment
                "
        >
            <function
                mainExpr="(?x)              # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?im-s)                 # ignore case, ^ and $ match start/end of line, dot doesn't match newline
                    \\(begin|
                    part\*?|
                    subsection\*?|
                    subsubsection\*?|
                    paragraph\*?|
                    subparagraph\*?)
                    {.*}"
            >
            </function>
            <classRange
                mainExpr    ="(?x)          # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?m)                    # ^ and $ match at line-breaks
                    (?'CLASS_START'
                        ^                   # NO leading white-space at start-of-line
                        \\(chapter\*?)
                    )
                    (?s:.*?)                # whatever,
                    (?=                     # ...up till
                        \s*                 # ...optional leading white-space of
                        (?:
                            (?&CLASS_START) # ...next header
                        |   \Z              # ...or end-of-text
                        )
                    )
                "
            >
                <className>
                    <nameExpr expr="(?x)    # free-spacing (see `RegEx - Pattern Modifiers`)
                        \\(chapter\*?)      # prefix INCLUDED
                        {                   # brace before name INCLUDED
                        .*?                 # name
                        }                   # brace after name INCLUDED
                    " />
                </className>
                <function
                    mainExpr="(?xm-s)       # free-spacing (see `RegEx - Pattern Modifiers`)
                        \\
                        (   section\*?
                        |subsection\*?
                        |subsubsection\*?
                        |paragraph\*?
                        |subparagraph\*?
                        )
                        {.*?}
                    "
                >
                    <functionName>
                        <funcNameExpr expr="(?xm-s) # free-spacing (see `RegEx - Pattern Modifiers`)
                            \\
                            (   section\*?
                            |subsection\*?
                            |subsubsection\*?
                            |paragraph\*?
                            |subparagraph\*?
                            )
                            {.*?}
                        " />
                    </functionName>
                </function>
            </classRange>
        </parser>
    </functionList>
</NotepadPlus>
Or, if you want to hide the \XYZ{...} wrappers around everything:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- ==========================================================================\
|   To learn how to make your own language parser, please check the following
|   link: https://npp-user-manual.org/docs/function-list/
\=========================================================================== -->
<NotepadPlus>
    <functionList>
        <parser
            displayName="LaTeX Syntax"
            id         ="latex_class"
            commentExpr="(?x)
                    (%.*?$)                 # Comment
                "
        >
            <function
                mainExpr="(?x)              # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?im-s)                 # ignore case, ^ and $ match start/end of line, dot doesn't match newline
                    \\(begin|
                    part\*?|
                    subsection\*?|
                    subsubsection\*?|
                    paragraph\*?|
                    subparagraph\*?)
                    {.*}"
            >
                <functionName>
                    <nameExpr expr="(?xm-s) # free-spacing (see `RegEx - Pattern Modifiers`)
                        (?<={)
                        .*?
                        (?=})
                    " />
                </functionName>
            </function>
            <classRange
                mainExpr    ="(?x)          # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?m)                    # ^ and $ match at line-breaks
                    (?'CLASS_START'
                        ^                   # NO leading white-space at start-of-line
                        \\(chapter\*?)
                    )
                    (?s:.*?)                # whatever,
                    (?=                     # ...up till
                        \s*                 # ...optional leading white-space of
                        (?:
                            (?&CLASS_START) # ...next header
                        |   \Z              # ...or end-of-text
                        )
                    )
                "
            >
                <className>
                    <nameExpr expr="(?x)    # free-spacing (see `RegEx - Pattern Modifiers`)
                        (?<={)              # brace before name
                        .*?                 # name
                        (?=})               # brace after name
                    " />
                </className>
                <function
                    mainExpr="(?xm-s)       # free-spacing (see `RegEx - Pattern Modifiers`)
                        \\
                        (   section\*?
                        |subsection\*?
                        |subsubsection\*?
                        |paragraph\*?
                        |subparagraph\*?
                        )
                        {.*?}
                    "
                >
                    <functionName>
                        <funcNameExpr expr="(?xm-s) # free-spacing (see `RegEx - Pattern Modifiers`)
                            (?<={)
                            .*?
                            (?=})
                        " />
                    </functionName>
                </function>
            </classRange>
        </parser>
    </functionList>
</NotepadPlus>
which will yield:
You can, of course, feel free to tweak it to match your desires.
(Also, note that any class (i.e., chapter) that doesn’t have a function (i.e., section or similar) will not be listed in the FunctionList. That’s one of the quirks of the FunctionList.)
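As a rough model of that quirk, the following Python sketch builds an outline from class ranges and simply skips any class whose range contains no function match. It only imitates the behavior described above; it is not Notepad++'s code, Python's `re` stands in for the Boost engine, and the sample text and the `outline` helper are invented for illustration.

```python
import re

# Invented sample: one chapter with a section, one without.
SAMPLE = (
    "\\chapter{With sections}\n"
    "\\section{A}\n"
    "\\chapter{No sections}\n"
    "plain text\n"
)

# A chapter range runs from its heading to the next heading or end of text.
CLASS_RANGE = re.compile(
    r"^\\chapter\*?\{.*?\}.*?(?=^\\chapter\*?\{|\Z)",
    re.MULTILINE | re.DOTALL,
)
# Same idea as the nameExpr above: the text between the first pair of braces.
CLASS_NAME = re.compile(r"(?<=\{).*?(?=\})")
FUNCTION = re.compile(r"\\section\*?\{(.*?)\}")


def outline(text):
    """Map chapter title -> section titles, dropping chapters with no sections."""
    result = {}
    for match in CLASS_RANGE.finditer(text):
        body = match.group(0)
        functions = FUNCTION.findall(body)
        if functions:  # the modelled quirk: an empty class is not listed
            result[CLASS_NAME.search(body).group(0)] = functions
    return result


print(outline(SAMPLE))  # {'With sections': ['A']}
```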
-
> could you share some example LaTeX which would resolve to that FunctionList? It would make it easier for us to help you.

Sorry, I didn’t see your replies until today. The example you found is fine. Here’s a minimal one for my purposes:
\documentclass{scrreprt}
\begin{document}
\chapter{Chapter with sections}
\section{Section 1.1}
Lorem ipsum
\section{Section 1.2}
dolor sit amet
\chapter{Chapter with no sections}
consectetur adipiscing elit.
\chapter{Chapter with unnumbered sections}
\section*{Heading with no number}
Phasellus mollis posuere ante vel tincidunt.
\section*{Second heading with no number}
Donec faucibus tellus sapien, vitae fringilla nulla bibendum eget.
\appendix
\chapter{First appendix}
\include{document}
\chapter{Appendix with sections}
\section{B.1}
Nam mauris nisl, cursus at erat in,
\section{B.2}
molestie luctus nulla.
\end{document}
> \*? says “0 or 1 literal asterisk characters”. Is that really what you’re trying to match? I don’t know enough about LaTeX to make an educated guess there; it just seems odd to me. But looking at the built-in latex.xml, I guess that uses the \*? after each of its “functions”, so maybe that is reasonable.

Yep, that’s normal LaTeX syntax. Many functions have starred and unstarred versions; see the example above for the usage on sections/chapters if you’re curious.
> If you don’t have openSymbole/closeSymbole, then the mainExpr for the classRange needs to match the entire class, from beginning to end, and then it searches for any functions inside the results of that regex. (And it needs to be a multi-line match)

That explains where I was going wrong!
> (Did you start on your own, or did you start with the latex.xml that ships with Notepad++ since v8.7? Because that shows a bigger list of things all at the same level, which is better than nothing)

I started with the default but cut it right down, as I only want it to match chapters and sections and not every instance of \begin{environment} or structure at subsection or below.

> (Also, note that any class (ie, chapter) that doesn’t have a function (ie, section or similar) will not be listed in the FunctionList. That’s one of the quirks of the FunctionList.)

Unfortunately, I do want it to pick up chapters that have no sections.
As a workaround, could I have the <function mainExpr=x> just match the first thing in the chapter if it doesn’t find a \section in it? Meaning something like “IF \\section\*?{.*?} exists THEN match every instance of \\section\*?{.*?}, ELSE match the first instance of \S+.”

Then there’d always be a “function” in every chapter and the “empty” ones would still show up.

I don’t know how (or if it’s possible) to do that with regexes though.
-
@Jason-McGee said in functionList for LaTeX: Trying to use classRange to have a hierarchical chapter > section document outline:

> could I have the <function mainExpr=x> just match the first thing in the chapter if it doesn’t find a \section in it? Meaning something like “IF \\section\*?{.*?} exists THEN match every instance of \\section\*?{.*?}, ELSE match the first instance of \S+.”

With the way that the nesting works for FunctionLists (they don’t just do one regex; they run each regex only on the results of the previous regex, and it gets confusing), I am not certain how to accomplish that.
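For reference, outside of FunctionList the IF/ELSE idea can be written as a plain alternation with a negative lookahead. The Python sketch below shows only that standalone logic, on invented strings that stand in for the text of one chapter each. Whether FunctionList's nested matching would treat `\A` and the lookahead as relative to the class range rather than the whole document is an open assumption that this sketch does not test, and the name `SECTION_OR_FALLBACK` is hypothetical.

```python
import re

# Either: any \section heading.
# Or: only at the very start of the text, and only if no \section follows
#     anywhere later, the first run of non-whitespace characters.
SECTION_OR_FALLBACK = re.compile(
    r"\\section\*?\{.*?\}"
    r"|\A(?!(?s:.*)\\section\*?\{)\S+"
)

# Invented stand-ins for the text of a single chapter range.
with_sections = "\\chapter{One}\n\\section{1.1}\ntext\n\\section*{1.2}\n"
without_sections = "\\chapter{Two}\nconsectetur adipiscing elit.\n"

print(SECTION_OR_FALLBACK.findall(with_sections))
# ['\\section{1.1}', '\\section*{1.2}']
print(SECTION_OR_FALLBACK.findall(without_sections))
# ['\\chapter{Two}']
```

Note that `\S+` stops at the first whitespace, so a chapter title containing spaces would be cut short by the fallback branch.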
I thought I could try to relax some of the rules, so that inside a \chapter, anything starting with \ would start a function (even the chapter!), and because the chapter would match, then it would have contents. And that “worked” (as long as you don’t mind having each \chapter class repeated as a function in that class, too):
But, unfortunately, as you can see, it doesn’t pick up the \appendix or the \include. If you needed it to, then this wouldn’t work for you. But if that’s an okay compromise, then here’s the XML:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- ==========================================================================\
|   To learn how to make your own language parser, please check the following
|   link: https://npp-user-manual.org/docs/function-list/
\=========================================================================== -->
<NotepadPlus>
    <functionList>
        <parser
            displayName="LaTeX Syntax"
            id         ="latex_class"
            commentExpr="(?x)
                    (%.*?$)                 # Comment
                "
        >
            <function
                mainExpr="(?x)              # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?im-s)                 # ignore case, ^ and $ match start/end of line, dot doesn't match newline
                    \\(begin|
                    part\*?|
                    subsection\*?|
                    subsubsection\*?|
                    paragraph\*?|
                    subparagraph\*?)
                    {.*}"
            >
            </function>
            <classRange
                mainExpr    ="(?x)          # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?m)                    # ^ and $ match at line-breaks
                    (?'CLASS_START'
                        ^                   # NO leading white-space at start-of-line
                        \\(chapter\*?)
                    )
                    (?s:.*?)                # whatever,
                    (?=                     # ...up till
                        \s*                 # ...optional leading white-space of
                        (?:
                            (?&CLASS_START) # ...next header
                        |   \Z              # ...or end-of-text
                        )
                    )
                "
            >
                <className>
                    <nameExpr expr="(?x)    # free-spacing (see `RegEx - Pattern Modifiers`)
                        \\(chapter\*?)      # prefix INCLUDED
                        {                   # brace before name INCLUDED
                        .*?                 # name
                        }                   # brace after name INCLUDED
                    " />
                </className>
                <function
                    mainExpr="(?xm-s)       # free-spacing (see `RegEx - Pattern Modifiers`)
                        ^
                        \\
                        \w
                        .*
                    "
                >
                    <functionName>
                        <funcNameExpr expr="(?xm-s) # free-spacing (see `RegEx - Pattern Modifiers`)
                            ^
                            \\
                            \w
                            .*
                        " />
                    </functionName>
                </function>
            </classRange>
        </parser>
    </functionList>
</NotepadPlus>
If it’s not an okay compromise, then one of the other regex+FunctionList experts is going to chime in, because I spent more than an hour experimenting with that this morning, and I’m out of ideas. (I’m specifically hoping @MAPJe71 stops by, as the resident FunctionList guru)
-
Thanks @PeterJones, I really appreciate your time! I spent a chunk of time yesterday too trying to work out the regex if-then-else functionality before I gave up.
I wouldn’t want to pick up everything that starts with \ because that will pull in a lot of commands that aren’t related to document structure (every command starts with \). For example, here’s my minimum working example with a numbered list in it, and that picks up the \begin{enumerate} and \items:

\documentclass{scrreprt}
\begin{document}
\chapter{Chapter 1 with sections}
\section{1.1}
\subsection{1.1.1}
Lorem
\subsection{1.1.2}
ipsum
\section{1.2}
\begin{enumerate}
\item dolor
\item sit
\item amet
\end{enumerate}
\chapter{Chapter 2 with no sections}
consectetur adipiscing elit.
\chapter{Chapter 3 with unnumbered sections}
\section*{Heading with no number}
Phasellus mollis posuere ante vel tincidunt.
\section*{Second heading with no number}
Donec faucibus tellus sapien, vitae fringilla nulla bibendum eget.
\appendix
    \chapter{Appendix A}
    \include{document}
    \chapter{Appendix B with sections}
    \section{B.1}
    Nam mauris nisl, cursus at erat in,
    \section{B.2}
    molestie luctus nulla.
\end{document}
… but picking up the \chapter{} as a function along with the \section{}s is a great workaround!

Here’s what I have now:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- ==========================================================================\
|   To learn how to make your own language parser, please check the following
|   link: https://npp-user-manual.org/docs/function-list/
\=========================================================================== -->
<NotepadPlus>
    <functionList>
        <parser
            displayName="LaTeX Syntax"
            id         ="latex_class"
            commentExpr="(?x)
                    (%.*?$)                 # Comment
                "
        >
            <function
                mainExpr="(?x)              # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?im-s)                 # ignore case, ^ and $ match start/end of line, dot doesn't match newline
                    \\begin{document}       # match start of document
                "
            >
            </function>
            <classRange
                mainExpr    ="(?x)          # free-spacing (see `RegEx - Pattern Modifiers`)
                    (?m)                    # ^ and $ match at line-breaks
                    (?'CLASS_START'
                        ^\s*                # optional leading white space before \chapter
                        \\(chapter\*?)
                    )
                    (?s:.*?)                # whatever,
                    (?=                     # ...up till
                        \s*                 # ...optional leading white-space of
                        (?:
                            (?&CLASS_START) # ...next header
                        |   (\\end{document}) # ...or end of document
                        )
                    )
                "
            >
                <className>
                    <nameExpr expr="(?x)    # free-spacing (see `RegEx - Pattern Modifiers`)
                        (?<={)              # brace before name
                        .*?                 # name
                        (?=})               # brace after name
                    " />
                </className>
                <function
                    mainExpr="(?xm-s)       # free-spacing (see `RegEx - Pattern Modifiers`)
                        \\(chapter|         # match chapter so that even \chapters with no \section appear
                        section|            # match \section
                        subsection|         # match \subsection
                        )\*?{.*}            # match starred and unstarred commands
                    "
                >
                    <functionName>
                        <funcNameExpr expr=".*"/>
                    </functionName>
                </function>
            </classRange>
        </parser>
    </functionList>
</NotepadPlus>
And the result on the sample file:
I modified the classRange mainExpr because I wanted to also match indented \chapter{}s (like I have for the appendices in the new sample file). After that change I found that the last \chapter{} wasn’t being matched with \Z, so I changed the alternate search to look for \end{document} instead (which will always appear), and that worked.

Thanks for your help!
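For anyone who wants to check the logic of this final setup outside Notepad++, here is a small Python approximation of it: chapter ranges may start with leading whitespace, each range ends at the next chapter or at \end{document}, and the function pattern includes \chapter itself so that chapters without sections still get an entry. This is only a sketch under assumptions: Python's `re` is not the Boost engine Notepad++ uses (so the `(?'CLASS_START' …)` subroutine is replaced by repeating the pattern), the sample is a shortened, invented variant of the file above, and `outline` is a hypothetical helper.

```python
import re

# Shortened, invented variant of the sample file, with an indented appendix chapter.
SAMPLE = (
    "\\documentclass{scrreprt}\n"
    "\\begin{document}\n"
    "\\chapter{Chapter 1 with sections}\n"
    "\\section{1.1}\n"
    "\\subsection{1.1.1}\n"
    "Lorem\n"
    "\\chapter{Chapter 2 with no sections}\n"
    "consectetur adipiscing elit.\n"
    "\\appendix\n"
    "    \\chapter{Appendix A}\n"
    "    \\section*{A.1}\n"
    "\\end{document}\n"
)

# Stand-in for CLASS_START: optional leading whitespace, then \chapter or \chapter*.
HEADING_START = r"^\s*\\chapter\*?"

# A class range runs from a chapter heading up to the next one or \end{document}.
CLASS_RANGE = re.compile(
    HEADING_START + r".*?(?=\s*(?:" + HEADING_START + r"|\\end\{document\}))",
    re.MULTILINE | re.DOTALL,
)
CLASS_NAME = re.compile(r"(?<=\{).*?(?=\})")
# \chapter is included so that every class contains at least one function.
FUNCTION = re.compile(r"\\(?:chapter|section|subsection)\*?\{.*\}")


def outline(text):
    """Map each chapter title to the headings found inside that chapter's range."""
    result = {}
    for match in CLASS_RANGE.finditer(text):
        body = match.group(0)
        result[CLASS_NAME.search(body).group(0)] = FUNCTION.findall(body)
    return result


for chapter, entries in outline(SAMPLE).items():
    print(chapter)
    for entry in entries:
        print("  >", entry)
```

In this sketch the chapter with no sections still appears, because its own \chapter{...} heading is its one entry.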