Bad Bracket Highlighter

SalviaSage

Earlier, thanks to the people on this forum, we managed to make a line-final whitespace highlighter in PythonScript.

So, I got the idea to modify this code so that “bad brackets”, that is, brackets without closers or openers, also get highlighted in a similar fashion.

Link to my post is here:
https://notepad-plus-plus.org/community/topic/15507/feature-request-show-only-line-final-whitespace/19

I will try to modify the script myself if I don’t get help, though I don’t know how much success I will have with it. But, oh well.

Thanks for the replies, cya.

Scott Sumner

@SalviaSage

Why don’t you start by showing some examples of a bad-bracket situation?

SalviaSage

It’s very simple.

There is already a built-in feature which colours the bad brackets but only if the caret cursor is on that bracket.

The idea is now, they should be highlighted without the caret cursor being on them.
So, the user would be pointed to it and fix it.

dinkumoil

@SalviaSage

Have you tried the BracketsCheck plugin? Available via PluginManager.

Scott Sumner

@SalviaSage said:

There is already a built-in feature which colours the bad brackets but only if the caret cursor is on that bracket

How does this color “bad” brackets? I see it coloring matching brackets, but it makes no decisions about a bracket’s “goodness” or “badness”. What I’m getting at is “What is your algorithm for determining a ‘bad’ bracket?”

Maybe @dinkumoil 's plugin suggestion helps you.

There is also a Pythonscript that boxes in the containing brackets when your caret is “inside”, if that helps you. But I’m sensing it doesn’t, because you already know about that script from our previous discussion in this thread…

SalviaSage

Dear Scott Sumner;

What I mean is, aside from notepad++ highlighting the brackets with their matching brackets, it can also detect if a bracket does not have a matching pair and then color that bracket differently. You can see this in the style configurator, it is called “bad brace colour”

You might have missed this, though, because by default the background for this highlighter is white. I changed it to pink to alert me, but it only does this highlighting when your cursor is on that bracket.

So, I was thinking we could highlight it always. The BracketsCheck plugin does not do that. It does inform you if there is a bad bracket, but it is not perfect. For example, it does not check the less-than and greater-than signs “<” and “>”, because the code for that is not in there.

Also, there is a bug in the location it reports: for example it says the bad bracket is at character 64, when that character is actually at character 74, etc.

SalviaSage

I would appreciate it though, if the code for EOL whitespace could be modified in some way to also allow for this feature (without breaking it).

Also, that inside bracket highlighter is a cool feature, but again it does not work together with the EOL whitespace code: it bugs the EOL whitespace feature while the inside bracket highlighting still works.

I think the codes can be merged though, so we could have the bad bracket highlighter, the EOL whitespace highlighter and the inside-parenthesis highlighter all working in unison. I am trying to merge the code, but I don’t know if I will ever be able to do that without your help.

Thanks for all your work and see you.

Scott Sumner

@SalviaSage said:

You can see this in the style configurator, it is called “bad brace colour”

Ah…I was not aware of this feature…too bad its default background color is not something other than white. I have also made mine “pink” moving forward.

for example it says the bad bracket is at character 64, when that character is actually at character 74

What is the 64 and 74 character things you are talking about? I see no indication other than a “pink” bracket… I think maybe what you are saying is that it is turning the bracket at column 64 pink when it should do the one at column 74…? Is that right? Well, I can’t see exactly what you mean, but I’m thinking that you as the human are judging right-from-wrong in a context where there is little chance that Notepad++ itself can do the same. Sure, it can tell you you don’t have all the brackets correctly matched, but exactly which ones are supposed to match…it has no real clue. And that’s probably where the whole thing falls apart (and, if you’ll notice, is subtly where I’ve been leading you from the start of this thread).

Which bracket (actually “parenthesis”) should be colored pink below, and why? Which one should Notepad++ choose to color pink?

(((    :^)    )))

BTW, the “bracket highlighter” Pythonscript also has to make a right–or wrong–choice here…note that we aren’t doing a whole-document tokenizing-check…just a simple “do I have an opening-bracket to my left and a closing-bracket to my right” check…
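To make the ambiguity concrete, here is a minimal sketch of a plain stack-based scan in Python. It is not the code used by Notepad++ or by the bracket-highlighter script, and the name `find_unmatched` is made up for illustration. On the smiley line above it blames the very last parenthesis, whereas a human reader would probably blame the one inside the smiley.

```python
def find_unmatched(text, opener="(", closer=")"):
    """Return the positions of openers/closers left without a partner.

    Hypothetical helper: a plain left-to-right stack scan, which is only
    one of several possible definitions of a "bad" bracket.
    """
    stack = []      # positions of openers still waiting for a closer
    unmatched = []  # positions of closers that had no opener available
    for pos, ch in enumerate(text):
        if ch == opener:
            stack.append(pos)
        elif ch == closer:
            if stack:
                stack.pop()
            else:
                unmatched.append(pos)
    return sorted(unmatched + stack)


# The smiley line: three openers, the smiley, three closers.
line = "(((" + " " * 4 + ":^)" + " " * 4 + ")))"
print(find_unmatched(line))  # the scan blames index 16, the last ")"
```

A different but equally defensible rule (for example, scanning from the right) would pick a different parenthesis, which is exactly the problem.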

it bugs the EOL whitespace feature while the inside bracket highlighting still works.

So if you run the EOL-whitespace thing first, and then the bracket-highlighter, you’re going to have the effect you say you see, for the simple reason that they are both using the same indicator number, because they share a very similar line of source code:

XXXX__dict['indic_to_use'] = 10  # pick a free indicator number

After you use 10 in the execution of one script, it is no longer a “free indicator number”. If you change one of the 10s to something else, e.g. 11, the functionality of the two scripts should peacefully coexist when both are run.
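The collision can be pictured with a toy model in Python. This is not the Scintilla or PythonScript API: the `IndicatorModel` class and the span values are invented for illustration, and it assumes each script wipes its own indicator before redrawing.

```python
class IndicatorModel:
    """Toy stand-in for an editor's indicator storage (NOT the real API)."""

    def __init__(self):
        self.ranges = {}  # indicator number -> list of (start, length)

    def clear(self, indicator):
        self.ranges[indicator] = []

    def fill(self, indicator, start, length):
        self.ranges.setdefault(indicator, []).append((start, length))


def run_highlighter(model, indicator, spans):
    """Assumed behaviour: a script clears its indicator, then redraws."""
    model.clear(indicator)
    for start, length in spans:
        model.fill(indicator, start, length)


EOL_SPANS = [(12, 3)]              # hypothetical trailing-whitespace span
BRACKET_SPANS = [(4, 1), (20, 1)]  # hypothetical bracket positions

shared = IndicatorModel()
run_highlighter(shared, 10, EOL_SPANS)
run_highlighter(shared, 10, BRACKET_SPANS)    # wipes the EOL highlight

separate = IndicatorModel()
run_highlighter(separate, 10, EOL_SPANS)
run_highlighter(separate, 11, BRACKET_SPANS)  # both highlights survive

print(shared.ranges)
print(separate.ranges)
```

With the shared number only the spans of the script that ran last remain, while with 10 and 11 each script keeps its own set.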

I think the codes can be merged though

Yes, nothing preventing that, and not too hard…I leave that as an exercise for the reader. :-)

SalviaSage

WoW, nice, I changed it to 11 and now I have them both working.

You see, this is what separates an expert programmer (you) from an amateur (me).

You know what does what and you can find fixes to the problems fast.
You are not the first progger I saw doing that, but I hope to get there one day.

Big inspiration, and big thanks to you.

I’ll cya lata, I always have ideas.

(also, please tell me how you do that red code box and the bigger code window also, thx)

Scott Sumner

@SalviaSage said:

tell me how you do that red code box and the bigger code window also

I don’t understand what this means.

SalviaSage

Like, how you typed in the “10” and the “11” up there in that little red box, also the code window.

Scott Sumner

@SalviaSage

The important-est part about either technique is that inside them this website won’t mess with the content…it will post your text verbatim.

Red text:

`I am red because I am surrounded by grave accents` --> I am red because I am surrounded by grave accents
See https://en.wikipedia.org/wiki/Grave_accent

Black box (code window):

I'm text where my composer put 4 spaces in front of the "I" in "I'm"
I could be lines of code composed in N++ and then indented 4 spaces before copying!

Other, related:

```z
I’m a variation on the indented black box above, but without the black and without my text needing to be indented
```

will yield:

I'm a variation on the indented black box above, but without the black and without my text needing to be indented

guy038

Hi, @salviasage, @scott-sumner and All,

Thinking ( again ! ) about the matching pair problem, it’s really a tricky problem !

1)

Consider that simple text below :

{abc
(123)
[def(ghi)]
(jkl[mno])
(789)
pqr}

Obviously, this text is well balanced. So, both the regex, given at the end of my post :

https://notepad-plus-plus.org/community/topic/14501/has-a-plugin-like-sublime-plugin-brackethighlighter/12

and the BracketHighlighter.py Python script :

https://notepad-plus-plus.org/community/topic/14501/has-a-plugin-like-sublime-plugin-brackethighlighter/7

match and color the whole of that multi-line area

Note, that you must, first, copy/paste the regex, from your browser, in a N++ document or a new tab, then re-select that regex and, finally, open the Find dialog, with Ctrl + F

=> The multi-lines regex should be, automatically, filled up in the Find what: zone

Now, let’s suppose you wrongly add an opening parenthesis, before the (123) string. Then, you make a second mistake, adding, this time, a closing parenthesis, after the (789) string, giving the text :

{abc
((123)
^
[def(ghi)]
(jkl[mno])
(789))
     ^ 
pqr}

Despite these two consecutive mistakes, both the regex and the script detect the whole multi-line block as correct !! Well, imagine that there is quite a lot of text between the strings (123) and (789). How to easily point out where the problem is ??!! There’s NO solution :-(((
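The same blind spot shows up with any checker that only pairs boundaries. The sketch below is a hypothetical, type-aware stack scan in Python (it is neither the regex nor the BracketHighlighter.py script); it reports nothing wrong for either version of the text, because the extra opening parenthesis simply pairs with the extra closing one.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def bracket_problems(text):
    """Return (position, char) for every boundary a stack scan cannot pair."""
    stack, problems = [], []
    for pos, ch in enumerate(text):
        if ch in OPENERS:
            stack.append((pos, ch))
        elif ch in PAIRS:
            if stack and stack[-1][1] == PAIRS[ch]:
                stack.pop()                 # closer matches the latest opener
            else:
                problems.append((pos, ch))  # stray or wrong-type closer
    return sorted(problems + stack)         # leftovers are unclosed openers


good = "{abc\n(123)\n[def(ghi)]\n(jkl[mno])\n(789)\npqr}"
bad = "{abc\n((123)\n[def(ghi)]\n(jkl[mno])\n(789))\npqr}"

print(bracket_problems(good))  # nothing reported
print(bracket_problems(bad))   # nothing reported either
```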

2)

Second problem : how to manage escaped boundaries as, for instance, \( or \},…

Compare the difference of behavior of, both, the regex and the script between these two lines :

A ( simple [ example of text ]    {  to test ( MATCHING pairs )    of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT

A ( simple [ example \] of text ] {  to test ( MATCHING \{ pairs ) of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT

So, to my mind, it would be judicious to consider all these escaped boundaries as standard characters
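As an illustration of that choice, here is a small Python sketch (a stack scan, not the regex discussed in this post) with a hypothetical `honor_escapes` switch. One difference from the regexes further down is labelled in the code: it skips whatever character follows a backslash, which is simpler than a look-behind.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())


def is_balanced(text, honor_escapes=True):
    """Stack scan; optionally treat backslash-escaped boundaries as plain text."""
    stack = []
    i = 0
    while i < len(text):
        ch = text[i]
        if honor_escapes and ch == "\\" and i + 1 < len(text):
            i += 2  # assumption: skip the backslash and the character it escapes
            continue
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        i += 1
    return not stack


line1 = "A ( simple [ example of text ]    {  to test ( MATCHING pairs )    of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT"
line2 = r"A ( simple [ example \] of text ] {  to test ( MATCHING \{ pairs ) of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT"

print(is_balanced(line1), is_balanced(line2))  # escapes treated as plain text
print(is_balanced(line2, honor_escapes=False))  # escapes counted as boundaries
```

When the escaped `\]` and `\{` are counted as real boundaries, the second line no longer balances; when they are treated as standard characters, both lines behave the same.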

In addition, as Scott said, I decided to avoid the < and > symbols, which are, generally, seen as arithmetic operators !

So, with the help of this other post :

https://notepad-plus-plus.org/community/topic/14090/best-way-to-find-unmatched-parentheses/5

I created a new version of the 3 generic recursive patterns, named A, B and C, involved in matching well-balanced [ multi-lines ] blocks of text !

First, some definitions :

A boundary is one of these 6 symbols : ( , ) , [ , ] , { , }
SB = Starting Boundary of a pair, escaped with the \ symbol, to be considered as literal
EB = Ending boundary of a pair, escaped with the \ symbol, to be considered as literal
AC = Allowed Character = Any single character, different from, either, the SB and the EB boundaries, possibly escaped
R# = Recursive call subroutine to capturing group #. Hence, the regex syntax (?#)

Notes :

The (?0) or (?R) syntaxes are a recursive call of the overall regex
An AC allowed character OR an ESCAPED boundary can be found, either, with the regexes :
- (?x) (?: \\ [{}] | [^{}] ) , in case of brace boundaries
- (?x) (?: \\ [][] | [^][] ) , in case of bracket boundaries
- (?x) (?: \\ [()] | [^()] ) , in case of parenthesis boundaries
- (?x) (?: \\ [][(){}] | [^][(){}] ) , if brace, bracket and parenthesis boundaries

Then :

The recursive pattern A is the regex SB(?:AC|R0)*EB , which searches the largest area, even on several lines, between a SB boundary and an EB boundary, which may contain other juxtaposed and/or nested blocks SB…EB, all correctly balanced
The recursive pattern B is the regex AC*(SB(?:AC|R1)*EB)AC* , which searches the largest area, even on several lines, between a SB boundary and an EB boundary, which may contain other juxtaposed and/or nested blocks SB…EB, all correctly balanced, possibly preceded and/or followed by any range, even null, of AC characters
The recursive pattern C is the regex (?:AC*(SB(?:AC|R1)*EB)AC*)+ , which searches for any non-null amount of consecutive areas, as defined above ( matched with the regex (B) )
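For readers more at ease with code than with recursive regexes, the idea behind pattern A can be sketched as a recursive Python function. This is only a rough equivalent written for illustration, not a faithful port of the regex: the name `match_block` is hypothetical, and Python's standard `re` module is deliberately not used because it has no recursion syntax.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
CLOSERS = set(PAIRS.values())


def match_block(text, start):
    """Return the index just past the balanced block that begins at
    text[start], or -1 if there is no balanced block there."""
    if start >= len(text) or text[start] not in PAIRS:
        return -1
    closer = PAIRS[text[start]]          # the EB expected for this SB
    i = start + 1
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == "\\" and (nxt in PAIRS or nxt in CLOSERS) and nxt:
            i += 2                       # escaped boundary = allowed character
        elif ch in PAIRS:
            end = match_block(text, i)   # nested block: the recursive call
            if end == -1:
                return -1
            i = end
        elif ch == closer:
            return i + 1                 # matching EB found
        elif ch in CLOSERS:
            return -1                    # closer of the wrong kind
        else:
            i += 1                       # ordinary allowed character
    return -1                            # ran out of text before the EB


sample = "{abc\n(123)\n[def(ghi)]\n(jkl[mno])\n(789)\npqr}"
print(match_block(sample, 0) == len(sample))  # whole text is one block
print(match_block(r"(a \) b)", 0))            # escaped ")" is skipped
print(match_block("(a[b)c]", 0))              # crossed pairs: no block
```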

Practically, here are, below, the regexes A , B and C, using the free-spacing mode, a lot of non-capturing groups and the recursive syntaxes (?#)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Regex A ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACE, BRACKET and PARENTHESE boundaries

( (?<!\\) \{ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \} ) |
( (?<!\\) \[ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \] ) |
( (?<!\\) \( (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \) )

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Regex B ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACE, BRACKET and PARENTHESE boundaries

(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \{ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \} ) (?: \\ [][(){}] | [^][(){}] )* |
(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \[ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \] ) (?: \\ [][(){}] | [^][(){}] )* |
(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \( (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \) ) (?: \\ [][(){}] | [^][(){}] )*

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Regex C ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACE, BRACKET and PARENTHESE boundaries
(?:
(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \{ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \} ) (?: \\ [][(){}] | [^][(){}] )* |
(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \[ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \] ) (?: \\ [][(){}] | [^][(){}] )* |
(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \( (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \) ) (?: \\ [][(){}] | [^][(){}] )*
)+

Now, if we test the regex A, against the text :

A ( simple [ example of text ]    {  to test ( MATCHING pairs )    of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT

A ( simple [ example \] of text ] {  to test ( MATCHING \{ pairs ) of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT

We get the same behavior, despite some escaped boundaries in the second line ;-)) Just as expected !

Now, you may prefer that these 3 patterns, above, would search for ONLY ONE type of boundary !

So, here are, for each generic pattern, the different regexes matching, respectively, brace, bracket or parenthesis boundaries :

For pattern A

~~~~~~~~~~~~~~~~ Regex A ~~~~~~~~~~~~~~~~

(?x) # BRACE boundaries { and }
(?<!\\) \{
(?: (?: \\ [{}] | [^{}] ) | (?0) )*
(?<!\\) \}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACKET boundaries [ and ]
(?<!\\) \[
(?: (?: \\ [][] | [^][] ) | (?0) )*
(?<!\\) \]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # PARENTHESE boundaries ( and )
(?<!\\) \(
(?: (?: \\ [()] | [^()] ) | (?0) )*
(?<!\\) \)

For pattern B

~~~~~~~~~~~~~~~~ Regex B ~~~~~~~~~~~~~~~~

(?x) # BRACE boundaries { and }

(?: \\ [{}] | [^{}] )*
(
(?<!\\) \{
(?: (?: \\ [{}] | [^{}] ) | (?1) )*
(?<!\\) \}
)
(?: \\ [{}] | [^{}] )*

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACKET boundaries [ and ]

(?: \\ [][] | [^][] )*
(
(?<!\\) \[
(?: (?: \\ [][] | [^][] ) | (?1) )*
(?<!\\) \]
)
(?: \\ [][] | [^][] )*

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # PARENTHESE boundaries ( and )

(?: \\ [()] | [^()] )*
(
(?<!\\) \(
(?: (?: \\ [()] | [^()] ) | (?1) )*
(?<!\\) \)
)
(?: \\ [()] | [^()] )*

For pattern C

~~~~~~~~~~~~~~~~ Regex C ~~~~~~~~~~~~~~~~

(?x) # BRACE boundaries { and }
(?:
(?: \\ [{}] | [^{}] )*
(
(?<!\\) \{
(?: (?: \\ [{}] | [^{}] ) | (?1) )*
(?<!\\) \}
)
(?: \\ [{}] | [^{}] )*
)+

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACKET boundaries [ and ]
(?:
(?: \\ [][] | [^][] )*
(
(?<!\\) \[
(?: (?: \\ [][] | [^][] ) | (?1) )*
(?<!\\) \]
)
(?: \\ [][] | [^][] )*
)+

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # PARENTHESE boundaries ( and )
(?:
(?: \\ [()] | [^()] )*
(
(?<!\\) \(
(?: (?: \\ [()] | [^()] ) | (?1) )*
(?<!\\) \)
)
(?: \\ [()] | [^()] )*
)+

Cheers,

guy038

SalviaSage

@guy038, you seem to have a good understanding of regex. But I don’t and I am afraid I don’t understand you.

I do understand that regex is important in the tasks that I am trying to accomplish here. For example, I have learned how to change line endings using a regex replace: if there are mixed line endings, I can change them all in a file to the ending I want using notepad++ replace command.
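The exact regex used for that replace is not shown in this thread, so the following Python sketch is only an assumption of what such a conversion looks like: every CRLF, lone CR or lone LF is replaced with one chosen ending. The function name `normalize_eol` is made up for illustration.

```python
import re


def normalize_eol(text, eol="\r\n"):
    """Replace every CRLF, lone CR or lone LF with the chosen ending."""
    # "\r\n" comes first in the alternation so a CRLF counts as ONE ending.
    # A function is used as the replacement so that 'eol' is inserted as is.
    return re.sub(r"\r\n|\r|\n", lambda match: eol, text)


mixed = "a\r\nb\nc\rd"
print(repr(normalize_eol(mixed)))        # everything becomes CRLF
print(repr(normalize_eol(mixed, "\n")))  # everything becomes LF
```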

I also want to be able to detect the broken parentheses in this manner and “jump” to them, or highlight them by incorporating this into the highlighter Python script that I already have here. Of course, ideally we want to do this with any opener, so that it alerts the user to missing 'end’s, missing string quotes, etc.

I just think if these things could be highlighted it would be a great benefit to coders.
I find coding so difficult and I like to have tools like these…

Also, I would like to know how to turn these regex find and replace commands into 1 click macros and things like that, so I can easily click on my mixed line ending converter or other stuff.

Thanks for the replies!

Scott Sumner

@SalviaSage said:

how to turn these regex find and replace commands into 1 click macros

Well now THAT is super-easy. Here’s how I like to do it:

Set up your search or replace operation and test it using the Find dialog. Include setting up all the fields and checkboxes as you want, of course
Use the toolbar button for starting a macro recording, or equivalently use the Macro (menu) -> Start Recording menu option
Do one of the search or replace actions…(e.g., Find Next, Replace, Replace All, etc.) – this is the only thing that you do that gets saved (in other words, you can change the Find what zone over and over but, even while macro recording is going on, these changes are not recorded on their own; they are recorded (as a group) when you do a search/replace action)
Use the toolbar button for ending a macro recording, or equivalently use the Macro (menu) -> Stop Recording menu option
(Optional) Test your macro using the toolbar button for playback, or equivalently use the Macro (menu) -> Playback menu option…if it is not working right, start over with the first step
Name and save your macro using the Macro (menu) -> Save Currently Recorded Macro menu option

guy038

Hello, @salviasage, @scott-sumner and All,

I forgot a very IMPORTANT point about the free-spacing mode, which is introduced with the (?x) modifier

A) IF your free-spacing pattern is a SINGLE-line regex, for instance :

(?x) abc \r \n def \r \n ghi

Then :

From your browser, copy the preferred regex, in the clipboard, with Ctrl + C
Then, you have the choice between two possibilities :
- 1) Open the Find dialog with Ctrl + F then paste, with Ctrl + V, the regex in the Find what: zone, which overwrites the present contents of the search zone
- 2) Paste the regex, anywhere in a document or a new tab, then re-select the regex and open the Find dialog, with Ctrl + F, which, automatically, fills up the Find what: zone, with the MONO-line selection

B) IF your free-spacing pattern is a MULTI-line regex, for instance :

(?x)
abc \r \n
def \r \n
ghi

Like above, from your browser, copy the preferred regex, in the clipboard, with Ctrl + C
This time, you cannot paste the regex, directly, in the Find dialog, with Ctrl + V ! You must use the case 2). So :
- Paste the regex, anywhere in a document or a new tab, then re-select the regex and open the Find dialog, with Ctrl + F, which, automatically, fills up the Find what zone, with the correct MULTI-line selection !

You may test, the single-line and multi-line regexes, with the sample text, below :

abc
def
ghi
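The same experiment can be reproduced outside Notepad++ as a quick sanity check. The sketch below uses Python's `re` module, which is a different engine from the one in Notepad++, so it only illustrates the general idea that the (?x) modifier makes the spaces and line breaks inside the pattern insignificant. The sample text is assumed to use Windows (CRLF) line endings.

```python
import re

# Single-line and multi-line spellings of the same free-spacing regex.
single_line = r"(?x) abc \r \n def \r \n ghi"
multi_line = r"""(?x)
abc \r \n
def \r \n
ghi"""

sample = "abc\r\ndef\r\nghi"  # the three sample lines, CRLF endings assumed

for pattern in (single_line, multi_line):
    # Both spellings match the whole sample text.
    print(re.fullmatch(pattern, sample) is not None)
```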

Remark :

The \R syntax, standing for any kind of EOL, is forbidden in free-spacing regexes. So, according to the type of your file, you’ll replace any \R pattern with one of the syntaxes \r\n , \n or \r !

Cheers,

guy038

Scott Sumner

@SalviaSage said:

if there are mixed line endings, I can change them all in a file to the ending I want using notepad++ replace command.

Why do this when you could simply use the Edit menu’s EOL Conversion submenu? I’d suggest doing it that way (or via double-clicking on the line-ending area on the status bar)…less error prone…and you don’t have to remember the regex to do the desired conversion.

SalviaSage

@Scott-Sumner

The reason why I can not trust that built in function in notepad++ is because it simply does not work… at least it does not work real-time. I tried it.

That is why I am loving my regex macro because it does work real time. You can try it yourself: press ctrl+m to insert a CR line ending, then click on the EOL conversion, and the line won’t be converted, whereas my regex macro does convert everything properly.
And yeah, now that I can save my regex as a macro (thanks for telling me how to do that) I don’t have to remember to type anything in the replace window.

But, I still can not detect the possible mixed line endings, without manually opening up the EOL display and looking through myself (eek… what are computers for!!! )

Scott Sumner

@SalviaSage

So the EOL Conversion feature only does something right at the time when you execute it. It doesn’t set up any “real-time” or “as you type” monitoring to intercept whatever badness you choose to do and correct it.

So files basically have a line-ending type as a convenience for the user. You hit Enter and the correct line-ending is inserted. You paste data from another source that has different line-endings and at the paste Scintilla sets the line-endings to match the destination’s EOL configuration. It is basically a set-it-and-forget-it kind of thing. HOWEVER, you can go around it, in a few ways (and probably more than this short list):

Pressing ctrl+m when you haven’t remapped ctrl+m to something else (and you aren’t using Mac EOLs)
Doing a regex replace where your replacement text uses \n or \r or \r\n and the sequence you use doesn’t match your file’s current EOL choice
Pasting data using the Clipboard History panel when an entry there has line-endings that don’t match your file’s current EOL choice

So I’d think most people just avoid doing these things, and they don’t ever deal with the problem of mixed line-endings in one file…?

May I ask why you are using ctrl+m in this way…or is this just something that you discovered will cause a mismatch and that is why you gave that as an example?

I am loving my regex macro because it does work real time

Rereading your most recent posting, I’m concerned that I am not understanding your “in real time” statement. For example, how does your regex macro work in real time? It is an on-demand thing…nothing happens until you run the macro…very much like the EOL conversion command.

I still can not detect the possible mixed line endings, without manually opening up the EOL display and looking through myself

I will try to think of some things that can be done about the mixed line-ending situation…
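One possible direction, sketched here in plain Python rather than as a finished Notepad++ script: count how many line endings of each kind a text contains, and report the file as mixed when more than one kind shows up. The function names are hypothetical, and hooking this up to the editor (for example through PythonScript) is left out.

```python
import re
from collections import Counter

EOL_NAMES = {"\r\n": "CRLF", "\n": "LF", "\r": "CR"}


def count_line_endings(text):
    """Count CRLF, lone LF and lone CR endings in the text."""
    # "\r\n" is tried first so that a CRLF is not counted as CR plus LF.
    found = re.finditer(r"\r\n|\r|\n", text)
    return Counter(EOL_NAMES[match.group()] for match in found)


def has_mixed_line_endings(text):
    """True when more than one kind of line ending is present."""
    return len(count_line_endings(text)) > 1


sample = "a\r\nb\nc\rd\r\n"
print(count_line_endings(sample))
print(has_mixed_line_endings(sample))
```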

Scott Sumner

BTW, pressing ctrl+j if you remove the default Notepad++ keymapping tying that to the Join Lines feature will insert a \n line-ending in the current file, regardless of the file’s EOL setting. (This can be added to the previous posting’s bullet list)