Bad Bracket Highlighter
-
@SalviaSage said:
There is already a built-in feature which colours the bad brackets but only if the caret cursor is on that bracket
How does this color “bad” brackets? I see it coloring matching brackets, but it makes no decisions about a bracket’s “goodness” or “badness”. What I’m getting at is “What is your algorithm for determining a ‘bad’ bracket?”
Maybe @dinkumoil 's plugin suggestion helps you.
There is also a Pythonscript that boxes in the containing brackets when your caret is “inside”, if that helps you. But I’m sensing it doesn’t, because you already know about that script from our previous discussion in this thread…
-
Dear Scott Sumner;
What I mean is, aside from notepad++ highlighting the brackets with their matching brackets, it can also detect if a bracket does not have a matching pair and then color that bracket differently. You can see this in the style configurator, it is called “bad brace colour”
You might have missed this though, because by default the background for this highlighter is white, I changed it to pink though to alert me, but it only does this highlighting when your cursor is on that bracket.
So, I was thinking we could highlight it always, and the brackets check does not do that, but it does inform you if there is a bad bracket, but it is not perfect. For example, it can not tell apart the less than or greater than sign “>” because the code for that is not in there.
Also, there is a bug at the location of the character that it informs you, for example it says the bad bracket is at character 64, when that character is actually at character 74 etc.
-
I would appreciate it though, if the code for EOL whitespace could be modified in some way to also allow for this feature (without breaking it).
Also, that inside bracket highlighter is a cool feature, but again it does not work together with the EOL whitespace code. it bugs the EOL whitespace feature while the inside bracket highlighting still works.
I think the codes can be merged though,
so, we could have the bad bracket highlighter, and the eol whitespace highlighter and the inside parenthesis highlighter all working in unison, I am trying to kinda merge the code but I don’t know if I will ever be able to do that without your help.
Thanks for all your work and see you.
-
@SalviaSage said:
You can see this in the style configurator, it is called “bad brace colour”
Ah…I was not aware of this feature…too bad its default background color is not something other than white. I have also made mine “pink” moving forward.
for example it says the bad bracket is at character 64, when that character is actually at character 74
What is the 64 and 74 character things you are talking about? I see no indication other than a “pink” bracket… I think maybe what you are saying is that it is turning the bracket at column 64 pink when it should do the one at column 74…? Is that right? Well, I can’t see exactly what you mean, but I’m thinking that you as the human are judging right-from-wrong in a context where there is little chance that Notepad++ itself can do the same. Sure, it can tell you you don’t have all the brackets correctly matched, but exactly which ones are supposed to match…it has no real clue. And that’s probably where the whole thing falls apart (and, if you’ll notice, is subtly where I’ve been leading you from the start of this thread).
Which bracket (actually “parenthesis”) should be colored pink below, and why? Which one should Notepad++ choose to color pink?
((( :^) )))
BTW, the “bracket highlighter” Pythonscript also has to make a right–or wrong–choice here…note that we aren’t doing a whole-document tokenizing-check…just a simple “do I have an opening-bracket to my left and a closing-bracket to my right” check…
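To make the ambiguity concrete, here is a minimal sketch of a whole-text, stack-based check in plain Python. It is not the Pythonscript from this thread and not what Notepad++ does internally; the function name `find_unmatched` is made up for illustration. It pairs each closer with the nearest unclosed opener and reports whatever is left over:

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def find_unmatched(text):
    """Return sorted positions of brackets that a simple stack pass cannot pair up."""
    stack = []        # positions of openers still waiting for a closer
    unmatched = []    # positions of closers that found no suitable opener
    for pos, ch in enumerate(text):
        if ch in OPENERS:
            stack.append(pos)
        elif ch in PAIRS:
            if stack and text[stack[-1]] == PAIRS[ch]:
                stack.pop()
            else:
                unmatched.append(pos)
    # Openers left on the stack never got a closer either.
    return sorted(unmatched + stack)

print(find_unmatched("((( :^) )))"))  # [10]
```

For the smiley line this blames position 10, the very last parenthesis, although a human would probably say the stray one is the smiley's mouth at position 6. The count of bad brackets is right; which one gets the blame is only a guess.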
it bugs the EOL whitespace feature while the inside bracket highlighting still works.
So if you run the EOL-whitespace thing first, and then the bracket-highlighter, you’re going to have the effect you say you see, for the simple reason that they are both using the same indicator number, because they share a very similar line of source code:

    XXXX__dict['indic_to_use'] = 10  # pick a free indicator number

After you use `10` in the execution of one script, it is no longer a “free indicator number”. If you change one of the `10`s to something else, e.g. `11`, the functionality of the two scripts should peacefully coexist when both are run.
I think the codes can be merged though
Yes, nothing preventing that, and not too hard…I leave that as an exercise for the reader. :-)
-
Wow, nice, I changed it to 11 and now I have them both working.
You see, this is what separates an expert programmer (you) from an amateur (me).
You know what does what and you can find fixes to the problems fast.
You are not the first progger I saw doing that, but I hope to get there one day.
Big inspiration, and big thanks to you.
I’ll cya lata, I always have ideas.
(also, please tell me how you do that red code box and the bigger code window also, thx)
-
@SalviaSage said:
tell me how you do that red code box and the bigger code window also
I don’t understand what this means.
-
Like, how you typed in the “10” and the “11” up there in that little red box, also the code window.
-
The important-est part about either technique is that inside them this website won’t mess with the content…it will post your text verbatim.
Red text:
`I am red because I am surrounded by grave accents` -->
I am red because I am surrounded by grave accents
See https://en.wikipedia.org/wiki/Grave_accent
Black box (code window):

    I'm text where my composer put 4 spaces in front of the "I" in "I'm"
    I could be lines of code composed in N++ and then indented 4 spaces before copying!

Other, related:
```z
I’m a variation on the indented black box above, but without the black and without my text needing to be indented
```
will yield:
I'm a variation on the indented black box above, but without the black and without my text needing to be indented
-
Hi, @salviasage, @scott-sumner and All,
Thinking ( again ! ) about the matching pair problem, it’s really a tricky problem !
1)
Consider that simple text below :
{abc (123) [def(ghi)] (jkl[mno]) (789) pqr}
Obviously, this text is well balanced. So, both :
- The regex, given at the end of my post
- The BracketHighlighter.py Python script

match and color the whole of that multi-line area
Note that you must, first, copy/paste the regex from your browser into a N++ document or a new tab, then re-select that regex and, finally, open the Find dialog with `Ctrl + F`
=> The multi-line regex should be automatically filled in the `Find what:` zone
Now, let’s suppose you wrongly add an opening parenthesis before the `(123)` string. Then, you make a second mistake, adding, this time, a closing parenthesis after the `(789)` string, giving the text :

    {abc ((123) ^ [def(ghi)] (jkl[mno]) (789)) ^ pqr}

Despite these two consecutive mistakes, both the regex and the script detect the whole multi-line block as correct !! Well, imagine that there is quite a lot of text between the strings `(123)` and `(789)`. How to easily point out where the problem is ??!! There’s NO solution :-(((
2)
Second problem : how to manage escaped boundaries as, for instance, `\(` or `\}`, … Compare the difference in behavior of both the regex and the script between these two lines :

    A ( simple [ example of text ] { to test ( MATCHING pairs ) of { braces }, [ brackets ] and ( parentheses ) by } , ) DEFAULT
    A ( simple [ example \] of text ] { to test ( MATCHING \{ pairs ) of { braces }, [ brackets ] and ( parentheses ) by } , ) DEFAULT

So, to my mind, it would be judicious to consider all these escaped boundaries as standard characters
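As a rough illustration of what treating escaped boundaries as standard characters changes, here is a small plain-Python balance check. It is only a sketch: the function name `is_balanced` and the `honor_escapes` switch are invented for this example, and it simplifies things by skipping whatever single character follows a backslash, which is not exactly what the regexes further down do.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}
OPENERS = set(PAIRS.values())

def is_balanced(text, honor_escapes=True):
    """True if every (, [ and { is closed in the right order."""
    stack = []
    i = 0
    while i < len(text):
        ch = text[i]
        if honor_escapes and ch == "\\" and i + 1 < len(text):
            i += 2          # skip the backslash and the character it escapes
            continue
        if ch in OPENERS:
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        i += 1
    return not stack

plain = "A ( simple [ example of text ] { to test ( MATCHING pairs ) of { braces }, [ brackets ] and ( parentheses ) by } , ) DEFAULT"
escaped = r"A ( simple [ example \] of text ] { to test ( MATCHING \{ pairs ) of { braces }, [ brackets ] and ( parentheses ) by } , ) DEFAULT"

print(is_balanced(plain))                         # True
print(is_balanced(escaped))                       # True
print(is_balanced(escaped, honor_escapes=False))  # False
```

With escapes honored, both sample lines come out balanced; without that, the `\]` in the second line is taken as a real closer and the line is reported as broken.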
In addition, as Scott said, I decided to avoid the `<` and `>` symbols, which are generally seen as arithmetic operators !
So, with the help of this other post :
https://notepad-plus-plus.org/community/topic/14090/best-way-to-find-unmatched-parentheses/5
I created a new version of the `3` generic recursive patterns, named `A`, `B` and `C`, involved in matching well-balanced [ multi-line ] blocks of text !
First, some definitions :
- A boundary is one of these `6` symbols : `(`, `)`, `[`, `]`, `{`, `}`
- SB = Starting Boundary of a pair, escaped with the `\` symbol, to be considered as literal
- EB = Ending Boundary of a pair, escaped with the `\` symbol, to be considered as literal
- AC = Allowed Character = Any single character other than an SB or EB boundary, or else an escaped boundary
- R# = Recursive call ( subroutine ) to capturing group `#`. Hence, the regex syntax `(?#)`
Notes :
- The `(?0)` or `(?R)` syntaxes are a recursive call of the overall regex
- An AC allowed character OR an ESCAPED boundary can be found with one of the regexes :
    - `(?x) (?: \\ [{}] | [^{}] )`, in case of brace boundaries
    - `(?x) (?: \\ [][] | [^][] )`, in case of bracket boundaries
    - `(?x) (?: \\ [()] | [^()] )`, in case of parenthesis boundaries
    - `(?x) (?: \\ [][(){}] | [^][(){}] )`, in case of brace, bracket and parenthesis boundaries
Then :
- The recursive pattern `A` is the regex `SB(?:AC|R0)*EB`, which searches for the largest area, even on several lines, between an SB boundary and an EB boundary, which may contain other juxtaposed and/or nested blocks SB…EB, all correctly balanced
- The recursive pattern `B` is the regex `AC*(SB(?:AC|R1)*EB)AC*`, which searches for the largest area, even on several lines, between an SB boundary and an EB boundary, which may contain other juxtaposed and/or nested blocks SB…EB, all correctly balanced, possibly preceded and/or followed by any range, even empty, of AC characters
- The recursive pattern `C` is the regex `(?:AC*(SB(?:AC|R1)*EB)AC*)+`, which searches for any non-empty run of consecutive areas, as defined above ( matched with the regex `B` )
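For readers who find the shape `SB(?:AC|R0)*EB` hard to picture: Python's built-in `re` module has no recursive calls, so the sketch below mirrors pattern `A` with a hand-written recursive function instead. The name `match_block` is invented for this illustration, and it only approximates the regex (no backtracking, no lookbehind), so take it as a reading aid rather than an equivalent.

```python
PAIRS = {"(": ")", "[": "]", "{": "}"}
BOUNDARIES = set("()[]{}")

def match_block(text, pos):
    """Return the index just past the balanced block starting at text[pos], or -1."""
    if pos >= len(text) or text[pos] not in PAIRS:
        return -1                      # no SB at this position
    closer = PAIRS[text[pos]]
    i = pos + 1
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in BOUNDARIES:
            i += 2                     # escaped boundary counts as an AC
        elif ch == closer:
            return i + 1               # the EB that closes this block
        elif ch in PAIRS:
            end = match_block(text, i) # nested block: the recursive call
            if end == -1:
                return -1
            i = end
        elif ch in BOUNDARIES:
            return -1                  # a closer of the wrong type
        else:
            i += 1                     # ordinary AC
    return -1                          # ran out of text before the EB

good = "{abc (123) [def(ghi)] (jkl[mno]) (789) pqr}"
two_mistakes = "{abc ((123) [def(ghi)] (jkl[mno]) (789)) pqr}"
print(match_block(good, 0) == len(good))                  # True
print(match_block(two_mistakes, 0) == len(two_mistakes))  # True
print(match_block("(a [b) c]", 0))                        # -1
```

Note that the text with the two extra parentheses is still accepted as one balanced block, which is exactly the first problem described earlier in this post.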
Practically, here are, below, the regexes `A`, `B` and `C`, using the free-spacing mode, a lot of non-capturing groups and the recursive syntaxes `(?#)`

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  Regex A  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (?x)  # BRACE, BRACKET and PARENTHESE boundaries
    ( (?<!\\) \{ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \} )
    |
    ( (?<!\\) \[ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \] )
    |
    ( (?<!\\) \( (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \) )

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  Regex B  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (?x)  # BRACE, BRACKET and PARENTHESE boundaries
    (?: \\ [][(){}] | [^][(){}] )*
    ( (?<!\\) \{ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \} )
    (?: \\ [][(){}] | [^][(){}] )*
    |
    (?: \\ [][(){}] | [^][(){}] )*
    ( (?<!\\) \[ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \] )
    (?: \\ [][(){}] | [^][(){}] )*
    |
    (?: \\ [][(){}] | [^][(){}] )*
    ( (?<!\\) \( (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \) )
    (?: \\ [][(){}] | [^][(){}] )*

    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~  Regex C  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (?x)  # BRACE, BRACKET and PARENTHESE boundaries
    (?:
    (?: \\ [][(){}] | [^][(){}] )*
    ( (?<!\\) \{ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \} )
    (?: \\ [][(){}] | [^][(){}] )*
    |
    (?: \\ [][(){}] | [^][(){}] )*
    ( (?<!\\) \[ (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \] )
    (?: \\ [][(){}] | [^][(){}] )*
    |
    (?: \\ [][(){}] | [^][(){}] )*
    ( (?<!\\) \( (?: (?: \\ [][(){}] | [^][(){}] ) | (?1) | (?2) | (?3) )* (?<!\\) \) )
    (?: \\ [][(){}] | [^][(){}] )*
    )+

Now, if we test the regex `A` against the text :

    A ( simple [ example of text ] { to test ( MATCHING pairs ) of { braces }, [ brackets ] and ( parentheses ) by } , ) DEFAULT
    A ( simple [ example \] of text ] { to test ( MATCHING \{ pairs ) of { braces }, [ brackets ] and ( parentheses ) by } , ) DEFAULT

We get the same behavior, despite some escaped boundaries in the second line ;-)) Just as expected !
Now, you may prefer that these `3` patterns, above, search for ONLY ONE type of boundary !
So, here are, for each generic pattern, the different regexes matching, respectively, brace, bracket or parenthesis boundaries :
- For pattern `A`

    ~~~~~~~~~~~~~~~~  Regex A  ~~~~~~~~~~~~~~~~
    (?x)  # BRACE boundaries { and }
    (?<!\\) \{ (?: (?: \\ [{}] | [^{}] ) | (?0) )* (?<!\\) \}
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (?x)  # BRACKET boundaries [ and ]
    (?<!\\) \[ (?: (?: \\ [][] | [^][] ) | (?0) )* (?<!\\) \]
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (?x)  # PARENTHESE boundaries ( and )
    (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?0) )* (?<!\\) \)

- For pattern `B`

    ~~~~~~~~~~~~~~~~  Regex B  ~~~~~~~~~~~~~~~~
    (?x)  # BRACE boundaries { and }
    (?: \\ [{}] | [^{}] )* ( (?<!\\) \{ (?: (?: \\ [{}] | [^{}] ) | (?1) )* (?<!\\) \} ) (?: \\ [{}] | [^{}] )*
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (?x)  # BRACKET boundaries [ and ]
    (?: \\ [][] | [^][] )* ( (?<!\\) \[ (?: (?: \\ [][] | [^][] ) | (?1) )* (?<!\\) \] ) (?: \\ [][] | [^][] )*
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (?x)  # PARENTHESE boundaries ( and )
    (?: \\ [()] | [^()] )* ( (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?1) )* (?<!\\) \) ) (?: \\ [()] | [^()] )*

- For pattern `C`

    ~~~~~~~~~~~~~~~~  Regex C  ~~~~~~~~~~~~~~~~
    (?x)  # BRACE boundaries { and }
    (?: (?: \\ [{}] | [^{}] )* ( (?<!\\) \{ (?: (?: \\ [{}] | [^{}] ) | (?1) )* (?<!\\) \} ) (?: \\ [{}] | [^{}] )* )+
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (?x)  # BRACKET boundaries [ and ]
    (?: (?: \\ [][] | [^][] )* ( (?<!\\) \[ (?: (?: \\ [][] | [^][] ) | (?1) )* (?<!\\) \] ) (?: \\ [][] | [^][] )* )+
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    (?x)  # PARENTHESE boundaries ( and )
    (?: (?: \\ [()] | [^()] )* ( (?<!\\) \( (?: (?: \\ [()] | [^()] ) | (?1) )* (?<!\\) \) ) (?: \\ [()] | [^()] )* )+

Cheers,
guy038
-
@guy038, you seem to have a good understanding of regex. But I don’t and I am afraid I don’t understand you.
I do understand that regex is important in the tasks that I am trying to accomplish here. For example, using a regex replace I have learned how to change line endings: if there are mixed line endings, I can change them all in a file to the ending I want using notepad++ replace command.
I also want to be able to detect the broken parenthesis in this manner and “jump” to them or highlight them by incorporating them into the highlighter python script that I already have here. Of course, we want to do this ideally with any opener, this will alert the user to missing 'end’s missing string quotes etc.
I just think if these things could be highlighted it would be a great benefit to coders.
I find coding so difficult and I like to have tools like these… Also, how to turn these regex find and replace commands into 1 click macros and things like that,
so I can easily click on my mixed line ending converter or other stuff.
Thanks for the replies!
-
@SalviaSage said:
how to turn these regex find and replace commands into 1 click macros
Well now THAT is super-easy. Here’s how I like to do it:
- Set up your search or replace operation and test it using the Find dialog. Include setting up all the fields and checkboxes as you want, of course
- Use the toolbar button for starting a macro recording, or equivalently use the Macro (menu) -> Start Recording menu option
- Do one of the search or replace actions…(e.g., Find Next, Replace, Replace All, etc.) – this is the only thing you do that gets saved (in other words, you can change the Find what zone over and over but, even while macro recording is going on, these changes are not recorded individually; they are recorded (as a group) when you do a search/replace action)
- Use the toolbar button for ending a macro recording, or equivalently use the Macro (menu) -> Stop Recording menu option
- (Optional) Test your macro using the toolbar button for playback, or equivalently use the Macro (menu) -> Playback menu option…if it is not working right, start over with the first step
- Name and save your macro using the Macro (menu) -> Save Currently Recorded Macro menu option
-
Hello, @salviasage, @scott-sumner and All,
I forgot a very IMPORTANT point about the free-spacing mode, which is introduced with the `(?x)` modifier
A)
IF your free-spacing pattern is a SINGLE-line regex, for instance :

    (?x) abc \r \n def \r \n ghi

Then :
- From your browser, copy the preferred regex to the clipboard with `Ctrl + C`
- Then, you have the choice between two possibilities :
    - 1) Open the Find dialog with `Ctrl + F`, then paste the regex with `Ctrl + V` in the `Find what:` zone, which overwrites the present contents of the search zone
    - 2) Paste the regex anywhere in a document or a new tab, then re-select the regex and open the Find dialog with `Ctrl + F`, which automatically fills the `Find what:` zone with the MONO-line selection
B)
IF your free-spacing pattern is a MULTI-line regex, for instance :

    (?x)
    abc \r \n
    def \r \n
    ghi

- Like above, from your browser, copy the preferred regex to the clipboard with `Ctrl + C`
- This time, you cannot paste the regex directly in the Find dialog with `Ctrl + V` ! You must use case `2)`. So :
    - Paste the regex anywhere in a document or a new tab, then re-select the regex and open the Find dialog with `Ctrl + F`, which automatically fills the `Find what:` zone with the correct MULTI-line selection !
You may test the single-line and multi-line regexes with the sample text below :

    abc
    def
    ghi

Remark :
- The `\R` syntax, standing for any kind of EOL, is forbidden in free-spacing regexes. So, according to the type of your file, you’ll replace any `\R` pattern with one of the syntaxes `\r\n`, `\n` or `\r` !
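If you want to play with free-spacing mode outside Notepad++, here is a small sketch using Python's standard `re` module, which also understands the `(?x)` modifier. Python's engine is not the one Notepad++ uses, so this only illustrates the general idea: whitespace in the pattern is ignored, and a `#` comment runs to the end of its line, which is why a commented multi-line regex must keep its line breaks.

```python
import re

text = "abc\r\ndef\r\nghi"

single_line = r"(?x) abc \r \n def \r \n ghi"
multi_line = r"""(?x)    # free-spacing mode
abc \r \n                # first line and its CRLF
def \r \n
ghi
"""

print(bool(re.fullmatch(single_line, text)))  # True
print(bool(re.fullmatch(multi_line, text)))   # True

# Flattened onto one line, everything after '#' becomes a comment,
# so the pattern is effectively empty.
flattened = r"(?x) # free-spacing mode abc \r \n def \r \n ghi"
print(bool(re.fullmatch(flattened, text)))    # False
print(bool(re.fullmatch(flattened, "")))      # True
```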
Cheers,
guy038
-
-
@SalviaSage said:
if there are mixed line endings, I can change them all in a file to the ending I want using notepad++ replace command.
Why do this when you could simply use the Edit menu’s EOL Conversion submenu? I’d suggest doing it that way (or via double-clicking on the line-ending area on the status bar)…less error prone…and you don’t have to remember the regex to do the desired conversion.
-
The reason why I can not trust that built in function in notepad++ is because it simply does not work… at least it does not work real-time. I tried it.
That is why, I am loving my regex macro because it does work real time, you can try it yourself by pressing ctrl+m to insert CR line, and then click on that, and the line won’t be converted.
whereas my regex macro does convert everything properly.
And yeah, now that I can save my regex as a macro (thanks for telling me how to do that) I don’t have to remember to type anything in the replace window.
But, I still can not detect the possible mixed line endings, without manually opening up the EOL display and looking through myself (eek… what are computers for!!! )
-
So the EOL Conversion feature only does something right at the time when you execute it. It doesn’t set up any “real-time” or “as you type” monitoring to intercept whatever badness you choose to do and correct it.
So files basically have a line-ending type as a convenience for the user. You hit Enter and the correct line-ending is inserted. You paste data from another source that has different line-endings and at the paste Scintilla sets the line-endings to match the destination’s EOL configuration. It is basically a set-it-and-forget-it kind of thing. HOWEVER, you can go around it, in a few ways (and probably more than this short list):
- Pressing ctrl+m when you haven’t remapped ctrl+m to something else (and you aren’t using Mac EOLs)
- Doing a regex replace where your replacement text uses `\n` or `\r` or `\r\n` and the sequence you use doesn’t match your file’s current EOL choice
- Pasting data using the Clipboard History panel when an entry there has line-endings that don’t match your file’s current EOL choice
So I’d think most people just avoid doing these things, and they don’t ever deal with the problem of mixed line-endings in one file…?
May I ask why you are using ctrl+m in this way…or is this just something that you discovered will cause a mismatch and that is why you gave that as an example?
I am loving my regex macro because it does work real time
Rereading your most recent posting, I’m concerned that I am not understanding your “in real time” statement. For example, how does your regex macro work in real time? It is an on-demand thing…nothing happens until you run the macro…very much like the EOL conversion command.
I still can not detect the possible mixed line endings, without manually opening up the EOL display and looking through myself
I will try to think of some things that can be done about the mixed line-ending situation…
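In the meantime, as a rough idea of what such a detection could look like outside Notepad++, here is a plain-Python sketch that counts each kind of line ending in a piece of text. The function names are invented for this example; it is not a Notepad++ feature or an existing script from this thread.

```python
import re
from collections import Counter

EOL_NAMES = {"\r\n": "CRLF", "\n": "LF", "\r": "CR"}

def count_line_endings(text):
    """Count each kind of line ending; a CRLF pair is counted once, not as CR plus LF."""
    # The alternation tries the two-character CRLF first, then a lone CR or LF.
    return Counter(EOL_NAMES[m] for m in re.findall(r"\r\n|\r|\n", text))

def has_mixed_line_endings(text):
    return len(count_line_endings(text)) > 1

sample = "one\r\ntwo\nthree\rfour\r\n"
print(dict(count_line_endings(sample)))
print(has_mixed_line_endings(sample))      # True
print(has_mixed_line_endings("a\nb\n"))    # False
```

When trying this on a real file, open it with `open(path, newline="")` (or in binary mode), otherwise Python translates the line endings while reading and there is nothing left to count.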
-
BTW, pressing ctrl+j if you remove the default Notepad++ keymapping tying that to the Join Lines feature will insert a `\n` line-ending in the current file, regardless of the file’s EOL setting. (This can be added to the previous posting’s bullet list)
-
So as a prevention from pressing ctrl+m inserting a Mac line-ending into a non-Mac encoded file (as well as possibly ctrl+j inserting a Unix line-ending into a non-Unix encoded file), see the `CHARADDED` Pythonscript in this thread. This script should be set to be run upon Notepad++ startup.
More comments to come on this topic…which unfortunately is off-topic for this thread (but I didn’t do that). :-)
-
Hello, @salviasage, @scott-sumner and All,
Oh, my God ! There is a very simple way to prevent the `Ctrl + M` shortcut from inserting the CR character ( `\x{000D}` ), which is displayed in reverse video !.. Like me, simply assign the `Ctrl + M` shortcut to the `Mark` dialog ( Search > Mark… ) :-)))
Et voilà !
Cheers,
guy038
-
very simple way
Yes, but ctrl+m is just a special case of ALL of the control-plus-(mostly)letter codes that one hasn’t assigned shortcut functions to. The Pythonscript in that other thread takes care of all of them at once, so fat-fingered users should not see odd black-boxed things again in their editor window (like the picture at the bottom of this posting)…unless they Undo (ctrl+z) the change made by the script. Sadly, I don’t believe there is a way to remove something from the undo buffer without purging the entire thing. :-(
It’s a more complete solution: It takes care of the original problem of ctrl+m inserting a `\r` – without having to assign it a function, and some other things that a user may experience and not like.
By the way, the script won’t interfere with things like ctrl+a or ctrl+c or ctrl+v, etc, because those (and your ctrl+m, @guy038) get snared as commands at a higher level and don’t pass through–like unassigned ones–to be added as “text” to the current editor window.
(those are ctrl+w and ctrl+e, respectively, as text)
-
So one way to avoid having mixed line-endings in your file is to automatically run a check each time the file is saved, and if any inconsistent line-endings are found, correct them at that time. Here’s a Pythonscript that will do that; I call it `LineEndingRepairAtSave.py`:

    from Npp import editor, notepad, NOTIFICATION

    try:
        # Already defined means the callback was registered earlier in this session
        LERAS__bad_eol_regex_via_good_eol_dict
    except NameError:
        # key = the wanted line-ending, value = regex matching every other kind
        LERAS__bad_eol_regex_via_good_eol_dict = {
            '\r\n' : r'\r(?!\n)|(?<!\r)\n',
            '\n'   : r'\r\n?',
            '\r'   : r'\r?\n',
            }
        def LERAS__callback_npp_FILEBEFORESAVE(args):
            # getFormatType() is used as an index: 0 = Windows, 1 = Mac, 2 = Unix
            correct_eol_for_this_file = ['\r\n', '\r', '\n'][notepad.getFormatType()]
            editor.rereplace(LERAS__bad_eol_regex_via_good_eol_dict[correct_eol_for_this_file], correct_eol_for_this_file)
        notepad.callback(LERAS__callback_npp_FILEBEFORESAVE, [NOTIFICATION.FILEBEFORESAVE])

The idea is that you run it once per Notepad++ session and it will stand guard against the tyranny of mixed line-endings in your saved files. Maybe it takes a noticeable amount of time to run on really large files…dunno…use at your own risk.
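To check what the three regexes in that script do without touching a real file, here is a small stand-alone sketch using Python's standard `re` module instead of the Pythonscript `editor` object. The function name `repair_line_endings` is made up for this test, and Python's regex engine is not the one Notepad++ uses, although these three simple patterns use only syntax that both understand.

```python
import re

# Same patterns as in the script above: "bad" line endings, keyed by the wanted one.
BAD_EOL_REGEX = {
    "\r\n": r"\r(?!\n)|(?<!\r)\n",  # a lone CR or a lone LF
    "\n":   r"\r\n?",               # CRLF or a lone CR
    "\r":   r"\r?\n",               # CRLF or a lone LF
}

def repair_line_endings(text, wanted_eol):
    """Replace every line ending that is not wanted_eol with wanted_eol."""
    return re.sub(BAD_EOL_REGEX[wanted_eol], lambda m: wanted_eol, text)

mixed = "a\rb\nc\r\nd"
print(repr(repair_line_endings(mixed, "\r\n")))  # 'a\r\nb\r\nc\r\nd'
print(repr(repair_line_endings(mixed, "\n")))    # 'a\nb\nc\nd'
print(repr(repair_line_endings(mixed, "\r")))    # 'a\rb\rc\rd'
```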