Function list regex debugging
-
Hi, a few days ago I created an issue about the function list not working correctly for Rust. I fixed the original issue rather quickly, but noticed some more issues and wanted to dive deeper. Unfortunately, I couldn’t find solid technical information about the inner workings, or any guide for writing the actual regexes, so here are my questions:
- Is there a better way to debug function list definitions than to use regex101? I’m really tired of replacing XML reserved symbols and reloading Notepad++ every time I want to try something out, not to mention that an expression debugged using regex101 is not guaranteed to work.
- What flavour of regex does Notepad++ use for the function list? It seems very close to PCRE, but I don’t think it’s exact because, as I mentioned, regexes debugged at regex101 sometimes don’t work.
- Are there perhaps any known bugs with the regex engine which could be at fault for regexes debugged using regex101 not working in Notepad++?
If you think you can fix the regexes I tried to work in Notepad++, that would be appreciated as well.
-
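The "replacing XML reserved symbols" step mentioned above can be scripted rather than done by hand. The following is a minimal sketch, assuming the regex ends up inside a double-quoted XML attribute of the function list definition file. The helper names and the sample pattern are hypothetical and only for illustration; the sketch uses nothing beyond the Python standard library.

```python
from xml.sax.saxutils import escape, unescape

# Double quotes also need escaping when the regex sits inside a
# double-quoted XML attribute value.
_TO_XML = {'"': "&quot;"}
_FROM_XML = {"&quot;": '"'}


def regex_to_xml_attr(pattern):
    """Escape &, <, > and " so the regex can be pasted into an XML attribute."""
    return escape(pattern, _TO_XML)


def xml_attr_to_regex(value):
    """Reverse the escaping, e.g. to paste an existing definition into a tester."""
    return unescape(value, _FROM_XML)


if __name__ == "__main__":
    # Hypothetical pattern, not taken from any real function list definition.
    sample = r'fn\s+\w+\s*(<[^>]*>)?'
    print(regex_to_xml_attr(sample))
```

This only removes the manual escaping step. It does not make a regex that was tested elsewhere behave the same way in Notepad++.
-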
The GitHub issue is #14746, can’t post links :/
-
First, read the online user manual at Notepad++ Online User Manual
And read the FunctionList FAQ.
Realize that Regex 101 uses PCRE, and Notepad++ uses the Boost regex library. Some of the escape characters and such differ between the two, which is why you can’t simply cut and paste the result. If you are aware of the differences between the two variants, you’ll know what you can paste over and what you’ll need to change. As a more realistic response, read the Searching/Regex section of the Manual to learn how to use Notepad++ itself to test your regex.
It’s not easy, and you’ll either have to trial and error or post code and questions here if you get stuck but this is not for the faint of heart. I did one 2 years ago, and I’m still not fluent with it.
Good luck.
-
@Lycan-Thrope Thanks for the reply.
First, read the online user manual
I obviously did, but I didn’t find this sort of “practical information” in there 😅
Realize, that Regex 101 uses PCRE, and Notepad++ uses the Boost Regex.
I see, that’s exactly the sort of information I was looking for, thanks.
As a more realistic response, read the Searching/Regex section of the Manual how to use Notepad++ itself to test your Regex.
That’s not exactly what I’m looking for. Ideally I’m looking for the sort of debugger that regex101 has, where you can step through the execution to see where exactly it goes wrong 😄
-
-
@sdasda7777 ,
Well, like I said, you can still use the Regex101 page to formulate your stuff and use their debugging tools, but you just have to remember that it won’t work as-is in Notepad++ because of the different regex engines used. I still use Regex101 to get the gist of it, but I just use the front page to view the highlighting, as I need it to show the basic capturing I need done. The playback is rather redundant and time-consuming, not to mention confusing to someone who isn’t familiar with what they are doing anyway. When I used that tool, all it told me was that it failed. Drilling down to what failed was another matter altogether, and I found the front page capture guide much more useful for my fledgling forays into regex itself.
Again, good luck.
-
@sdasda7777 ,
Incidentally, I just upvoted you, so you might be able to post GitHub links now. ::shrug::
-
@Lycan-Thrope I’m painfully aware that regexes written using regex101 may not work in Notepad++, that’s why I’m looking for an alternative where I could rely on the fact that they will.
When I reload Notepad++ to apply function list file, I only see that the regex did not work, not the exact part of the (potentially very complex) regex that fails. This sort of debugger would have saved hours of my time (so far).
Thanks for the upvote, though.
-
@sdasda7777 said in Function list regex debugging:
I didn’t find this sort of “practical information” in there 😅
Realize, that Regex 101 uses PCRE, and Notepad++ uses the Boost Regex.
I see, that’s exactly the sort of information I was looking for, thanks.
The first sentence of the Regular Expressions section of the User Manual makes it quite explicit that Notepad++ uses Boost 1.80. I’m not sure what more could be said in the User Manual to give you that piece of “practical information”. (For example, do you really expect the Notepad++ User Manual to tell you what brand of regex engine the regex101 website uses? If it did, it would have to tell you for all other such websites, which would be impractical and outside the scope of documenting the Notepad++ application.)
-
@PeterJones It’s not on the function list site, and I admit I didn’t think it would be on the regular expressions site but on the function list site. I apologize.
do you really expect the Notepad++ User Manual to tell you what brand of regex engine regex101 website uses?
Of course not. What makes you think that?
-
Hmmm … in theory it should be possible to create a Python script that replicates the behavior of NPP.
Of course not with the Python re module or the research method, but with searchInTarget, to really use Npp’s regex engine.
-
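The searchInTarget idea above could look roughly like the sketch below. It assumes the PythonScript plugin, whose `editor` object wraps the Scintilla target-search calls (`setSearchFlags`, `setTargetStart`, `setTargetEnd`, `searchInTarget`, `getTargetStart`, `getTargetEnd`, `getTextRange`). The helper name `find_all_matches` is hypothetical, and the editor is passed in as a parameter so the function itself has no plugin import. This is a sketch for listing what the engine finds, not a step-through debugger.

```python
def find_all_matches(editor, pattern, search_flags):
    """Return (start, end, text) for each match of `pattern` in the buffer.

    `editor` is assumed to behave like PythonScript's editor object;
    positions are whatever offsets that object reports.
    """
    matches = []
    start, end = 0, editor.getLength()
    editor.setSearchFlags(search_flags)
    while start < end:
        # Restrict the search to the part of the buffer not yet scanned.
        editor.setTargetStart(start)
        editor.setTargetEnd(end)
        if editor.searchInTarget(pattern) == -1:
            break  # no further match
        # On success the target is moved onto the match itself.
        m_start = editor.getTargetStart()
        m_end = editor.getTargetEnd()
        matches.append((m_start, m_end, editor.getTextRange(m_start, m_end)))
        # Step past the match; nudge forward on an empty match to avoid looping.
        start = m_end if m_end > m_start else m_end + 1
    return matches


# Inside Notepad++ (PythonScript console), usage might look like this,
# assuming the plugin's Npp module:
#   from Npp import editor, FINDOPTION
#   for m in find_all_matches(editor, r"fn\s+\w+", FINDOPTION.REGEXP):
#       print(m)
```

Because the search runs inside Notepad++ itself, the results come from the same engine Notepad++ uses for searching, which is the point of the suggestion. It still only shows what matched, not why a pattern failed.
-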
@Ekopalypse I don’t really want to replicate N++ behaviour per se, I really just want to debug it. Getting the same result is meaningless when a regex doesn’t work, I want to know why precisely it doesn’t work.
-
@sdasda7777 said in Function list regex debugging:
@PeterJones It’s not on the function list site, and I admit I didn’t think it would be on the regular expressions site but on the function list site. I apologize.
do you really expect the Notepad++ User Manual to tell you what brand of regex engine regex101 website uses?
Of course not. What makes you think that?
Of course I didn’t really think that; I was using an exaggerated interpretation to show you that you had made a mistake, because the information is in the User Manual, even though you didn’t find it.
But exaggerations aside, I will try to see if I can try to make it more obvious in the User Manual, without repeating too much information.
-
@PeterJones said in Function list regex debugging:
I will try to see if I can try to make it more obvious in the User Manual, without repeating too much information.
I am curious, when you were reading the User Manual on the Function List page, did you see this section?
Because the link there takes you to the beginning of the Regular Expressions section of the Manual, whose first sentence (shown above) does give the Boost version number.
I’ll still try to make it even more clear, but I need to understand what parts are causing confusion to be able to figure it out.
-
@PeterJones Yeah, I did, but I didn’t assume it would be what I’m looking for. In hindsight I absolutely should have looked there. Maybe this section could hint that the library and version are on the other page, but I do see this as a user error, so not sure 😅
-
@sdasda7777 said in Function list regex debugging:
I’m looking for the sort of debugger that regex101 has, where you can step through the execution to see where exactly it goes wrong
Commercial: RegexBuddy can debug Boost regexes.
-
@sdasda7777 said in Function list regex debugging:
@PeterJones Yeah, I did, but I didn’t assume it would be what I’m looking for. In hindsight I absolutely should have looked there. Maybe this section could hint the library and version is on the other page,
I had hoped that “… syntax spelled out in the docs on Searching: Regular Expressions” would be enough of a hint. Apparently not.
Possible new paragraph in the Notes on regular expressions for parsers section:
Because the Function List parser uses a subset of the same regular expression syntax that Notepad++ uses for Search > Find regular expressions, you can use Notepad++'s search dialog with Search Mode set to ☑ Regular Expression to experiment with the searches. (If you choose to use some other tool – like one of the many web-based regex explainers – to help you debug your expression for your Function List definition, please understand that there are many implementations of regular expressions, and even very similar implementations have subtle differences in behavior. You would need to find a tool that uses the exact same Boost library version that Notepad++ uses to have their results be identical. The Searching: Regular Expressions section should always list the version of the Boost library used by the most recent Notepad++, and will be updated if Notepad++ ever moves from Boost to some other regular expression library.)
Because you aren’t the first to be confused by those tools, either for Function List parsers or Notepad++ searching in general, I am also thinking of updating the first paragraph of the Searching: Regular Expressions to be:
Notepad++ regular expressions (“regex”) use the Boost regular expression library v1.80 (as of NPP v8.4.7), which is based on an old PCRE (Perl Compatible Regular Expression) syntax, only departing from it in very minor ways. Complete documentation on the precise implementation is to be found on the Boost pages for search syntax and replacement syntax. (Some users have misunderstood this paragraph to mean that they can use one of the regex-explainer websites that accepts PCRE and expect anything that works there to also work in Notepad++; this is not accurate. There are many different “PCRE” implementations, and Boost itself no longer claims to be “PCRE”, though both Boost and PCRE variants have the same origins in an early version of Perl’s regex engine. If your regex-explainer does not claim to use the same Boost engine as Notepad++ uses, there will be differences between the results from your chosen website and the results that Notepad++ gives.)
Let me know if you think that clarifies things more.
-
Commercial: RegexBuddy can debug Boost regexes.
Interesting, thanks for bringing it to attention! 😄 But it’s €40, which I assume most people aren’t willing to spend on a tool they expect to use once 😕
-
I was thinking more along the lines of visualizing what is found and in case of an invalid regex seeing the information why it failed but … that can’t replace a regex debugger, surely not.
-
@sdasda7777 brings up good questions, which I’ll rephrase as:
How do the developers and maintainers of function list definitions do it? What tools are they using?