Bad Bracket Highlighter

Scott Sumner

The important-est part about either technique is that inside them this website won’t mess with the content…it will post your text verbatim.

Red text:

`I am red because I am surrounded by grave accents` --> I am red because I am surrounded by grave accents
See https://en.wikipedia.org/wiki/Grave_accent

Black box (code window):

I'm text where my composer put 4 spaces in front of the "I" in "I'm"
I could be lines of code composed in N++ and then indented 4 spaces before copying!

Other, related:

```z
I’m a variation on the indented black box above, but without the black and without my text needing to be indented
```

will yield:

I'm a variation on the indented black box above, but without the black and without my text needing to be indented

guy038

Hi, @salviasage, @scott-sumner and All,

Thinking ( again ! ) about the matching pair problem, it’s really a tricky problem !

1)

Consider that simple text below :

{abc
(123)
[def(ghi)]
(jkl[mno])
(789)
pqr}

Obviously, this text is well balanced. So both :

The regex, given at the end of my post :

https://notepad-plus-plus.org/community/topic/14501/has-a-plugin-like-sublime-plugin-brackethighlighter/12

and the BracketHighlighter.py Python script :

https://notepad-plus-plus.org/community/topic/14501/has-a-plugin-like-sublime-plugin-brackethighlighter/7

match and color the whole of that multi-line area

Note, that you must, first, copy/paste the regex, from your browser, in a N++ document or a new tab, then re-select that regex and, finally, open the Find dialog, with Ctrl + F

=> The multi-lines regex should be, automatically, filled up in the Find what: zone

Now, let’s suppose you wrongly add an opening parenthesis, before the (123) string. Then, you make a second mistake, adding, this time, a closing parenthesis, after the (789) string, giving the text :

{abc
((123)
^
[def(ghi)]
(jkl[mno])
(789))
     ^ 
pqr}

Despite these two consecutive mistakes, both the regex and the script detect the whole multi-line block as correct !! Well, imagine that there is quite a lot of text between the strings (123) and (789). How can you easily point out where the problem is ??!! There’s NO solution :-(((
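To make the point concrete, here is a minimal Python sketch of the classic stack-based check. It is not the regex or the BracketHighlighter.py script from this thread, and the name find_unbalanced is made up for illustration. It can point at a bracket that has no partner, but with the two compensating mistakes above it reports nothing, because the text really is balanced:

```python
PAIRS = {')': '(', ']': '[', '}': '{'}
OPENERS = set(PAIRS.values())

def find_unbalanced(text):
    """Return (line, column, char) for every bracket left without a partner."""
    stack = []       # openers still waiting for their closer
    problems = []    # closers that did not match the innermost opener
    line, col = 1, 0
    for ch in text:
        if ch == '\n':
            line, col = line + 1, 0
            continue
        col += 1
        if ch in OPENERS:
            stack.append((line, col, ch))
        elif ch in PAIRS:
            if stack and stack[-1][2] == PAIRS[ch]:
                stack.pop()
            else:
                problems.append((line, col, ch))
    return problems + stack   # leftover openers are problems too

TWO_MISTAKES = "{abc\n((123)\n[def(ghi)]\n(jkl[mno])\n(789))\npqr}"
print(find_unbalanced(TWO_MISTAKES))   # prints []
```

With only the first mistake present, the stray ( on line 2 does show up in the returned list, so a single error can be located; it is the pair of mistakes cancelling each other out that no checker can see.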

2)

Second problem : how to manage escaped boundaries as, for instance, \( or \},…

Compare the difference of behavior of, both, the regex and the script between these two lines :

A ( simple [ example of text ]    {  to test ( MATCHING pairs )    of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT

A ( simple [ example \] of text ] {  to test ( MATCHING \{ pairs ) of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT

So, to my mind, it would be judicious to consider all these escaped boundaries as standard characters

In addition, as Scott said, I decided to avoid the < and > symbols, which are, generally, seen as arithmetic operators !

So, with the help of this other post :

https://notepad-plus-plus.org/community/topic/14090/best-way-to-find-unmatched-parentheses/5

I created a new version of the 3 generic recursive patterns, named A, B and C, involved in matching well-balanced [ multi-lines ] blocks of text !

First, some definitions :

A boundary is one of these 6 symbols : ( , ) , [ , ] , { , }
SB = Starting Boundary of a pair, escaped with the \ symbol, to be considered as literal
EB = Ending boundary of a pair, escaped with the \ symbol, to be considered as literal
AC = Allowed Character = Any single character, different from, either, the SB and the EB boundaries, possibly escaped
R# = Recursive call subroutine to capturing group #. Hence, the regex syntax (?#)

Notes :

The (?0) or (?R) syntaxes are a recursive call of the overall regex
An AC allowed character OR an ESCAPED boundary can be found, either, with the regexes :
- (?x) (?: \\ [{}] | [^{}] ) , in case of brace boundaries
- (?x) (?: \\ [][] | [^][] ) , in case of bracket boundaries
- (?x) (?: \\ [()] | [^()] ) , in case of parenthese boundaries
- (?x) (?: \\ [][(){}] | [^][(){}] ) , if brace, bracket and parenthese boundaries

Then :

The recursive pattern A is the regex SB(?:AC|R0)*EB , which searches the largest area, even on several lines, between a SB boundary and an EB boundary, which may contain other juxtaposed and/or nested blocks SB…EB, all correctly balanced
The recursive pattern B is the regex AC*(SB(?:AC|R1)*EB)AC* , which searches the largest area, even on several lines, between a SB boundary and an EB boundary, which may contain other juxtaposed and/or nested blocks SB…EB, all correctly balanced, possibly preceded and/or followed by any range, even null, of AC characters
The recursive pattern C is the regex (?:AC*(SB(?:AC|R1)*EB)AC*)+ , which searches for any non-null amount of consecutive areas, as defined above ( matched with the regex (B) )
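For readers who find the recursion hard to picture, pattern A can be mimicked by hand. The sketch below is plain Python (whose standard re module has no (?R) recursion), the name match_block is hypothetical, and it only approximates the regex: an escaped boundary is consumed as an AC, an opening boundary triggers a recursive call, and the matching closing boundary ends the block.

```python
OPEN_TO_CLOSE = {'(': ')', '[': ']', '{': '}'}
BOUNDARIES = set('()[]{}')

def match_block(text, pos=0):
    """Return the index just past the balanced block starting at pos, or None."""
    if pos >= len(text) or text[pos] not in OPEN_TO_CLOSE:
        return None                      # no SB at this position
    closer = OPEN_TO_CLOSE[text[pos]]
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == '\\' and i + 1 < len(text) and text[i + 1] in BOUNDARIES:
            i += 2                       # escaped boundary: treated as an AC
        elif ch == closer:
            return i + 1                 # EB found: the block is complete
        elif ch in OPEN_TO_CLOSE:
            end = match_block(text, i)   # the "R0" recursive call
            if end is None:
                return None
            i = end
        elif ch in BOUNDARIES:
            return None                  # a closer of the wrong kind
        else:
            i += 1                       # ordinary AC
    return None                          # ran out of text before the EB

print(match_block("(a[b]{c})x"))   # prints 9
```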

Practically, here are, below, the regexes A , B and C, using the free-spacing mode, a lot of non-capturing groups and the recursive syntaxes (?#)

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Regex A ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACE, BRACKET and PARENTHESE boundaries

( (?<!\\) \{ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \} ) |
( (?<!\\) \[ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \] ) |
( (?<!\\) \( (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \) )

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Regex B ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACE, BRACKET and PARENTHESE boundaries

(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \{ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \} ) (?: \\ [][(){}] | [^][(){}] )* |
(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \[ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \] ) (?: \\ [][(){}] | [^][(){}] )* |
(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \( (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \) ) (?: \\ [][(){}] | [^][(){}] )*

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Regex C ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACE, BRACKET and PARENTHESE boundaries
(?:
(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \{ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \} ) (?: \\ [][(){}] | [^][(){}] )* |
(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \[ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \] ) (?: \\ [][(){}] | [^][(){}] )* |
(?: \\ [][(){}] | [^][(){}] )* ( (?<!\\) \( (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \) ) (?: \\ [][(){}] | [^][(){}] )*
)+

Now, if we test the regex A, against the text :

A ( simple [ example of text ]    {  to test ( MATCHING pairs )    of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT

A ( simple [ example \] of text ] {  to test ( MATCHING \{ pairs ) of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT

We get the same behavior, despite some escaped boundaries in the second line ;-)) Just as expected !
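The effect of treating escaped boundaries as standard characters can also be checked outside Notepad++. This is only a sketch in plain Python, with a made-up helper is_balanced; it is not how the regex engine works internally. With honor_escapes switched off, the second test line is reported as unbalanced; with it on, both lines behave the same.

```python
PAIRS = {')': '(', ']': '[', '}': '{'}

def is_balanced(text, honor_escapes=True):
    """True if every (, [ and { is closed in the right order."""
    stack = []
    i = 0
    while i < len(text):
        ch = text[i]
        if honor_escapes and ch == '\\':
            i += 2          # skip the backslash and the character it escapes
            continue
        if ch in '([{':
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        i += 1
    return not stack

LINE_1 = "A ( simple [ example of text ]    {  to test ( MATCHING pairs )    of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT"
LINE_2 = r"A ( simple [ example \] of text ] {  to test ( MATCHING \{ pairs ) of { braces }, [ brackets ] and  ( parentheses ) by } , ) DEFAULT"

print(is_balanced(LINE_1), is_balanced(LINE_2))            # prints True True
print(is_balanced(LINE_2, honor_escapes=False))            # prints False
```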

Now, you may prefer that these 3 patterns, above, would search for ONLY ONE type of boundary !

So, here are, for each generic pattern, the different regexes matching, respectively, brace, bracket or parenthese boundaries :

For pattern A

~~~~~~~~~~~~~~~~ Regex A ~~~~~~~~~~~~~~~~

(?x) # BRACE boundaries { and }
(?<!\\) \{
(?: (?: \\ [{}] | [^{}] ) | (?0) )*
(?<!\\) \}

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACKET boundaries [ and ]
(?<!\\) \[
(?: (?: \\ [][] | [^][] ) | (?0) )*
(?<!\\) \]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # PARENTHESE boundaries ( and )
(?<!\\) \(
(?: (?: \\ [()] | [^()] ) | (?0) )*
(?<!\\) \)

For pattern B

~~~~~~~~~~~~~~~~ Regex B ~~~~~~~~~~~~~~~~

(?x) # BRACE boundaries { and }

(?: \\ [{}] | [^{}] )*
(
(?<!\\) \{
(?: (?: \\ [{}] | [^{}] ) | (?1) )*
(?<!\\) \}
)
(?: \\ [{}] | [^{}] )*

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACKET boundaries [ and ]

(?: \\ [][] | [^][] )*
(
(?<!\\) \[
(?: (?: \\ [][] | [^][] ) | (?1) )*
(?<!\\) \]
)
(?: \\ [][] | [^][] )*

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # PARENTHESE boundaries ( and )

(?: \\ [()] | [^()] )*
(
(?<!\\) \(
(?: (?: \\ [()] | [^()] ) | (?1) )*
(?<!\\) \)
)
(?: \\ [()] | [^()] )*

For pattern C

~~~~~~~~~~~~~~~~ Regex C ~~~~~~~~~~~~~~~~

(?x) # BRACE boundaries { and }
(?:
(?: \\ [{}] | [^{}] )*
(
(?<!\\) \{
(?: (?: \\ [{}] | [^{}] ) | (?1) )*
(?<!\\) \}
)
(?: \\ [{}] | [^{}] )*
)+

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # BRACKET boundaries [ and ]
(?:
(?: \\ [][] | [^][] )*
(
(?<!\\) \[
(?: (?: \\ [][] | [^][] ) | (?1) )*
(?<!\\) \]
)
(?: \\ [][] | [^][] )*
)+

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(?x) # PARENTHESE boundaries ( and )
(?:
(?: \\ [()] | [^()] )*
(
(?<!\\) \(
(?: (?: \\ [()] | [^()] ) | (?1) )*
(?<!\\) \)
)
(?: \\ [()] | [^()] )*
)+

Cheers,

guy038

SalviaSage

@guy038, you seem to have a good understanding of regex. But I don’t and I am afraid I don’t understand you.

I do understand that regex is important in the tasks that I am trying to accomplish here. For example, using a regex replace I have learned how to change line endings: if there are mixed line endings, I can change them all in a file to the ending I want using notepad++ replace command.

I also want to be able to detect broken parentheses in this manner and “jump” to them or highlight them, by incorporating this into the highlighter Python script that I already have here. Ideally, of course, we want to do this with any opener, so that the user is alerted to missing 'end's, missing string quotes, etc.

I just think if these things could be highlighted it would be a great benefit to coders.
I find coding so difficult and I like to have tools like these…

Also, how to turn these regex find and replace commands into 1 click macros and things like that,
So, I can easily click on my mixed line ending converter or other stuff easily.

Thanks for the replies!

Scott Sumner

@SalviaSage said:

how to turn these regex find and replace commands into 1 click macros

Well now THAT is super-easy. Here’s how I like to do it:

Set up your search or replace operation and test it using the Find dialog. Include setting up all the fields and checkboxes as you want, of course
Use the toolbar button for starting a macro recording, or equivalently use the Macro (menu) -> Start Recording menu option
Do one of the search or replace actions…(e.g., Find Next, Replace, Replace All, etc.) – this is the only thing that you do that gets saved (in other words, you can change the Find what zone over and over but, even while macro recording is going on, these changes are not recorded–but they are recorded (as a group) when you do a search/replace action)
Use the toolbar button for ending a macro recording, or equivalently use the Macro (menu) -> Stop Recording menu option
(Optional) Test your macro using the toolbar button for playback, or equivalently use the Macro (menu) -> Playback menu option…if it is not working right, start over with the first step
Name and save your macro using the Macro (menu) -> Save Currently Recorded Macro menu option

guy038

Hello, @salviasage, @scott-sumner and All,

I forgot a very IMPORTANT point about the free-spacing mode, which is introduced with the (?x) modifier

A) IF your free-spacing pattern is a SINGLE-line regex, for instance :

(?x) abc \r \n def \r \n ghi

Then :

From your browser, copy the preferred regex, in the clipboard, with Ctrl + C
Then, you have the choice between two possibilities :
- 1) Open the Find dialog with Ctrl + F then paste, with Ctrl + V, the regex in the Find what: zone, which overwrites the present contents of the search zone
- 2) Paste the regex, anywhere in a document or a new tab, then re-select the regex and open the Find dialog, with Ctrl + F, which, automatically, fills up the Find what: zone, with the MONO-line selection

B) IF your free-spacing pattern is a MULTI-line regex, for instance :

(?x)
abc \r \n
def \r \n
ghi

Like above, from your browser, copy the preferred regex, in the clipboard, with Ctrl + C
This time, you cannot paste the regex, directly, in the Find dialog, with Ctrl + V ! You must use method 2). So :
- Paste the regex, anywhere in a document or a new tab, then re-select the regex and open the Find dialog, with Ctrl + F, which, automatically, fills up the Find what zone, with the correct MULTI-line selection !

You may test, the single-line and multi-line regexes, with the sample text, below :

abc
def
ghi

Remark :

The \R syntax, standing for any kind of EOL, is forbidden in free-spacing regexes. So, according to the type of your file, you’ll replace any \R pattern with one of the syntaxes \r\n , \n or \r !
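If you want to convince yourself that the single-line and multi-line forms are the same pattern, free-spacing mode also exists in Python's standard re module, where the same (?x) modifier works. This is only an illustration with Python's engine, not the Boost engine used by Notepad++, and it assumes a sample text with Windows ( \r\n ) line endings:

```python
import re

# The two free-spacing patterns from cases A) and B) above
SINGLE_LINE = r"(?x) abc \r \n def \r \n ghi"
MULTI_LINE = r"""(?x)
abc \r \n
def \r \n
ghi"""

# The three-line sample text, assuming Windows EOLs
SAMPLE = "abc\r\ndef\r\nghi"

def matches_sample(pattern):
    """True if the pattern matches the whole sample text."""
    return re.fullmatch(pattern, SAMPLE) is not None

print(matches_sample(SINGLE_LINE), matches_sample(MULTI_LINE))   # prints True True
```

In both forms the spaces and line breaks inside the pattern are ignored, which is why the EOL characters have to be written explicitly as \r and \n.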

Cheers,

guy038

Scott Sumner

@SalviaSage said:

if there are mixed line endings, I can change them all in a file to the ending I want using notepad++ replace command.

Why do this when you could simply use the Edit menu’s EOL Conversion submenu? I’d suggest doing it that way (or via double-clicking on the line-ending area on the status bar)…less error prone…and you don’t have to remember the regex to do the desired conversion.

SalviaSage

@Scott-Sumner

The reason why I can not trust that built in function in notepad++ is because it simply does not work… at least it does not work real-time. I tried it.

That is why, I am loving my regex macro because it does work real time, you can try it yourself by pressing ctrl+m to insert CR line, and then click on that, and the line won’t be converted.

whereas my regex macro does convert everything properly.
And yeah, now that I can save my regex as a macro (thanks for telling me how to do that) I don’t have to remember to type anything in the replace window.

But, I still can not detect the possible mixed line endings, without manually opening up the EOL display and looking through myself (eek… what are computers for!!! )

Scott Sumner

@SalviaSage

So the EOL Conversion feature only does something right at the time when you execute it. It doesn’t set up any “real-time” or “as you type” monitoring to intercept whatever badness you choose to do and correct it.

So files basically have a line-ending type as a convenience for the user. You hit Enter and the correct line-ending is inserted. You paste data from another source that has different line-endings and at the paste Scintilla sets the line-endings to match the destination’s EOL configuration. It is basically a set-it-and-forget-it kind of thing. HOWEVER, you can go around it, in a few ways (and probably more than this short list):

Pressing ctrl+m when you haven’t remapped ctrl+m to something else (and you aren’t using Mac EOLs)
Doing a regex replace where your replacement text uses \n or \r or \r\n and the sequence you use doesn’t match your file’s current EOL choice
Pasting data using the Clipboard History panel when an entry there has line-endings that don’t match your file’s current EOL choice

So I’d think most people just avoid doing these things, and they don’t ever deal with the problem of mixed line-endings in one file…?

May I ask why you are using ctrl+m in this way…or is this just something that you discovered will cause a mismatch and that is why you gave that as an example?

I am loving my regex macro because it does work real time

Rereading your most recent posting, I’m concerned that I am not understanding your “in real time” statement. For example, how does your regex macro work in real time? It is an on-demand thing…nothing happens until you run the macro…very much like the EOL conversion command.

I still can not detect the possible mixed line endings, without manually opening up the EOL display and looking through myself

I will try to think of some things that can be done about the mixed line-ending situation…

Scott Sumner

BTW, pressing ctrl+j if you remove the default Notepad++ keymapping tying that to the Join Lines feature will insert a \n line-ending in the current file, regardless of the file’s EOL setting. (This can be added to the previous posting’s bullet list)

Scott Sumner

So as a prevention from pressing ctrl+m inserting a Mac line-ending into a non-Mac encoded file (as well as possibly ctrl+j inserting a Unix line-ending into a non-Unix encoded file), see the CHARADDED Pythonscript in this thread. This script should be set to be run upon Notepad++ startup.

More comments to come on this topic…which unfortunately is off-topic for this thread (but I didn’t do that). :-)

guy038

Hello, @salviasage, @scott-sumner and All,

Oh, my God ! There is a very simple way to prevent the Ctrl+ M shortcut from inserting the CR character ( \x{000D} ) , which is displayed in reverse video !.. Like me, simply assign the Ctrl+ M shortcut to the Mark dialog ( Search > Mark… ) :-)))

Et voilà !

Cheers,

guy038

Scott Sumner

@guy038

very simple way

Yes, but ctrl+m is just a special case of ALL of the control-plus-(mostly)letter codes that one hasn’t assigned shortcut functions to. The Pythonscript in that other thread takes care of all of them at once, so fat-fingered users should not see odd black-boxed things again in their editor window (like the picture at the bottom of this posting)…unless they Undo (ctrl+z) the change made by the script. Sadly, I don’t believe there is a way to remove something from the undo buffer without purging the entire thing. :-(

It’s a more complete solution: It takes care of the original problem of ctrl+m inserting a \r – without having to assign it a function, and some other things that a user may experience and not like.

By the way, the script won’t interfere with things like ctrl+a or ctrl+c or ctrl+v, etc, because those (and your ctrl+m, @guy038) get snared as commands at a higher level and don’t pass through–like unassigned ones–to be added as “text” to the current editor window.

[Image]
(those are ctrl+w and ctrl+e, respectively, as text)

Scott Sumner

So one way to avoid having mixed line-endings in your file is to automatically run a check each time the file is saved, and if any inconsistent line-endings are found, correct them at that time. Here’s a Pythonscript that will do that; I call it LineEndingRepairAtSave.py:

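from Npp import editor, notepad, NOTIFICATION

try:
    # If this name already exists, the callback was registered earlier in this session
    LERAS__bad_eol_regex_via_good_eol_dict
except NameError:
    # For each desired EOL, a regex matching every line-ending that is NOT that EOL
    LERAS__bad_eol_regex_via_good_eol_dict = {
        '\r\n' : r'\r(?!\n)|(?<!\r)\n',
        '\n'   : r'\r\n?',
        '\r'   : r'\r?\n',
    }

    def LERAS__callback_npp_FILEBEFORESAVE(args):
        # The list is indexed by the file's format type, in the order Windows, Mac, Unix
        correct_eol_for_this_file = ['\r\n', '\r', '\n'][notepad.getFormatType()]
        editor.rereplace(LERAS__bad_eol_regex_via_good_eol_dict[correct_eol_for_this_file], correct_eol_for_this_file)

    # Run the repair just before every save
    notepad.callback(LERAS__callback_npp_FILEBEFORESAVE, [NOTIFICATION.FILEBEFORESAVE])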

The idea is that you run it once per Notepad++ session and it will stand guard against the tyranny of mixed line-endings in your saved files. Maybe it takes a noticeable amount of time to run on really large files…dunno…use at your own risk.
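For anyone who wants to try the three regexes outside Notepad++, or just detect mixed line-endings without fixing them, here is a small sketch using only Python's standard re module. The function names count_eols and fix_eols are made up for this example; the regexes in BAD_EOL_REGEX are the same ones used in the script above.

```python
import re

# Same mapping as in the script: desired EOL -> regex matching every other EOL
BAD_EOL_REGEX = {
    '\r\n': r'\r(?!\n)|(?<!\r)\n',
    '\n':   r'\r\n?',
    '\r':   r'\r?\n',
}

def count_eols(text):
    """Count each kind of line-ending found in text."""
    return {
        'CRLF': len(re.findall(r'\r\n', text)),
        'CR':   len(re.findall(r'\r(?!\n)', text)),
        'LF':   len(re.findall(r'(?<!\r)\n', text)),
    }

def has_mixed_eols(text):
    """True if more than one kind of line-ending is present."""
    return sum(1 for n in count_eols(text).values() if n) > 1

def fix_eols(text, good_eol):
    """Replace every line-ending that is not good_eol with good_eol."""
    return re.sub(BAD_EOL_REGEX[good_eol], good_eol, text)

mixed = "a\r\nb\rc\nd"
print(count_eols(mixed))                       # prints {'CRLF': 1, 'CR': 1, 'LF': 1}
print(has_mixed_eols(fix_eols(mixed, '\n')))   # prints False
```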

SalviaSage

I want to cry with joy at this moment…

This is what makes notepad++ great…

A special thanks to Scott as usual for his contribution.

Scott Sumner

@SalviaSage said:

I want to cry with joy at this moment…

LOL. …and I thought you’d complain that it isn’t an as-you-type or an as-you-paste solution! I still wonder why mixed line-endings is a real problem for you. I’ve been using Notepad++ for a long time and it rarely is a problem for me…

Scott Sumner

Slight change to the Pythonscript I posted earlier. I noticed that it does not work correctly when the Notepad++ user executes a Save All. Here’s an update to (only) the callback function part that will fix this; just replace the old LERAS__callback_npp_FILEBEFORESAVE function definition with the following:

def LERAS__callback_npp_FILEBEFORESAVE(args):
    notepad.activateBufferID(args['bufferID'])
    correct_eol_for_this_file = ['\r\n', '\r', '\n'][notepad.getFormatType()]
    editor.rereplace(LERAS__bad_eol_regex_via_good_eol_dict[correct_eol_for_this_file], correct_eol_for_this_file)

SalviaSage

@Scott-Sumner

Dear Scott.

I am afraid this script and the broken bracket highlighter is no longer working.

I confirmed that the startup script of the PythonScript plugin is working by using some other scripts.

But, these 2 which are very similar to each other in nature, are just not working.

I don’t know why, I tried fixing it and I could not. So, I have to appeal to you for help.

:(

SalviaSage

Ignore above, there were 2 syntax errors in there for some reason,
and I fixed it. I must have made a mistake myself.