Help with regex

Claudia Frank

good to see you did it ;-)

Have a nice day
Claudia

DaveyD

Hi Claudia, thanks
I hadn’t seen your post when I wrote mine!

Regarding association, I did it by userDefinedLang ID
(By the way… thanks for that too! I couldn’t live without it! :)
[still waiting and hoping for version 3!] )

Thanks
David

(P.S. My replies are delayed since my reputation is under 2… :( - :) )

guy038

Hi, Davey and Claudia,

I now understand why the regex ^=+\s+\K.+[^\s](?= ) is correct and why the regex =+\s+\K.+[^\s](?= ), without the ^ character, is NOT correct.

Just imagine the subject text, below :

=======      MY FIRST SECTION      ===================
=======      MY SECOND SECTION      ===================

You have to remember that the regex \s is strictly identical to the regex [\t\n\x0B\f\r\x20\xA0]

So, if you consider the wrong regex =+\s+\K.+[^\s](?= )

A first click, on the Find Next button, matches the text MY FIRST SECTION. Nice !
A second click, on the Find Next button, matches, wrongly, the string ======= MY SECOND SECTION. Why ?

Well, after the first click, the cursor is located between the last letter N and the space, in the first line

So, that wrong regex =+\s+\K.+[^\s](?= ) matches :

The =================== string, at the end of the first line, due to the regex =+
The EOL characters ( \r\n ) of the first line, which are, both, \s characters !
The \K form forgets everything matched so far, so the start of the match is reset between the \n of the first line and the first = character of the second line
Then, after backtracking, the ======= MY SECOND SECTIO string, due to the .+ part of the regex
Finally, the N character, as it’s a NON BLANK character ( [^\s] )
The space character, although not part of the final regex, must be present, after the last N character, of the second line : that’s right

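On the contrary, when we use the regex ^=+\s+\K.+[^\s](?= ), the =+ part can only begin at the start of a line. So, after the first match, the trailing = characters of the first line are skipped and the next match attempt correctly begins at the beginning of the second line !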
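
For anyone who wants to reproduce this outside Notepad++, here is a minimal Python sketch of the two behaviours. It rests on an assumption: Python's standard re module has no \K, so a capture group stands in for it (the overall match still consumes the same characters, so successive matches start at the same places). The variable names are only illustrative.

```python
import re

# Same two-line subject text as above; "\r\n" mimics Windows line endings.
text = (
    "=======      MY FIRST SECTION      ===================\r\n"
    "=======      MY SECOND SECTION      ===================\r\n"
)

# Python's re has no \K, so group 1 plays the role of "what follows \K".
# This is an emulation, not the Notepad++ (Boost) engine itself.
WRONG = r"=+\s+(.+[^\s])(?= )"    # no anchor: =+ may start anywhere
RIGHT = r"^=+\s+(.+[^\s])(?= )"   # anchored: =+ must start a line

# findall scans left to right, resuming after each match,
# much like repeated clicks on Find Next.
without_anchor = re.findall(WRONG, text)
with_anchor = re.findall(RIGHT, text, flags=re.MULTILINE)

print(without_anchor)  # second item still carries the leading "=======" run
print(with_anchor)     # ['MY FIRST SECTION', 'MY SECOND SECTION']
```

Without the anchor, the second match starts on the trailing = run of line 1, and \s+ swallows the line break, exactly as described in the steps above.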

Cheers,

guy038

Claudia Frank

Hello guy038,

thank you for the detailed and good explanation.
When creating a regex I still don’t pay enough attention to the current cursor position and the exact meaning of parts
of the regex like \K. Like you said, it is not only the reset but also the forgetting of the previous matches which is important.
But I hope this gets better when doing more and more of these. Yesterday I found a nice webpage
which helps me understand my regexes ;-)

Cheers
Claudia

DaveyD

Hi guy038
Thanks for the extra clarity that you provided - as you always do!

All the best
Davey

guy038

Hello, Claudia and DaveyD,

I had a look at the webpage that you found, Claudia. Really interesting, indeed !

However, in order to get the same matches as the Notepad++ regex engine, we should take care of the following points, when using the site :

https://regex101.com

We must use the default PCRE flavor, on the left part of the window, which seems to have the closest behaviour to the N++ regex engine
In the gmixXsuUAJ field, you should, systematically, add the two modifiers gm
- The g modifier means it will indicate all the matches of the test regex, and not only the first one ! ( just like the Find All button of the Mark tab ! )
- The m modifier means that the anchors ^ and $ represent the beginning and the end of each line, as the N++ regex engine does ( implicit modifier (?m) )
In the gmixXsuUAJ field, you will add the i modifier, if you DON’T check the Match case option OR if your N++ regex begins with the (?i) form
In the gmixXsuUAJ field, you will add the s modifier, if you have CHECKED the . matches newline option OR if your N++ regex begins with the (?s) form
In the gmixXsuUAJ field, you will add the x modifier, if your N++ regex begins with the form (?x)
In the gmixXsuUAJ field, you will NOT indicate the m modifier, if your N++ regex begins with the form (?-m)
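
The rules above can be summed up in a small Python sketch. Note that regex101_modifiers is a hypothetical helper written for this post, not part of regex101 or Notepad++, and it only covers the cases listed above (for instance, it does not handle (?-i) or (?-s)).

```python
import re

def regex101_modifiers(pattern, match_case=True, dot_matches_newline=False):
    """Hypothetical helper: suggest regex101 modifiers for a Notepad++ search.

    match_case and dot_matches_newline mirror the two N++ check boxes.
    """
    # Look for leading inline options such as (?i), (?s), (?x) or (?-m).
    lead = re.match(r"\(\?([a-z]*)(?:-([a-z]*))?\)", pattern)
    on = lead.group(1) if lead else ""
    off = (lead.group(2) or "") if lead else ""

    flags = "g"                      # always show all the matches
    if "m" not in off:               # N++ behaves as if (?m) were set
        flags += "m"
    if not match_case or "i" in on:
        flags += "i"
    if dot_matches_newline or "s" in on:
        flags += "s"
    if "x" in on:
        flags += "x"
    return flags

print(regex101_modifiers(r"^=+\s+\K.+[^\s](?= )"))        # gm
print(regex101_modifiers(r"(?-m)^abc", match_case=False))  # gi
```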

So, given our last example test string, below,

=======      MY FIRST SECTION      ===================
=======      MY SECOND SECTION      ===================

If you enter the regex =+\s+\K.+[^\s](?= ), in the Regular Expression field and gm after the slash, in the modifiers field, you should have the strings MY FIRST SECTION and ======= MY SECOND SECTION, both, highlighted in blue
Now, just add the ^ anchor, at the beginning of the regex -> This time, you should have the two strings MY FIRST SECTION and MY SECOND SECTION, both, highlighted in blue !

Cheers,

guy038

Claudia Frank

Hi guy038,

thank you very much for the info. I was looking for a boost:regex tester but couldn’t find one.
So this site, together with your explanation, makes it easier to find the misunderstandings when using
regexes.

Cheers
Claudia

DaveyD

Hi Loreia,
Just to mention, when I first started, I found a website (similar to the one you mentioned above) that was really helpful, and I liked it mostly because they had a desktop app that included the same features!
You can take a look here: http://regexr.com/
I think the download is somewhere else, at www.gskinner.com - however, this is from memory
I’m not saying that this is better than what you found, but just another nice option with a desktop app.

I think everything guy038 mentioned above applies here too, except that here, the g modifier is enabled by default

Davey

Claudia Frank

Loreia ?? Did some UDL lately??
Yeah, https://github.com/gskinner/regexr/
Worth a try

Thanks and cheers
Claudia

DaveyD

:) Yes, I did! - Sorry about that confusion!

Let me know what you think - I thought it was great and it really helped me in the beginning.

All the best
Davey