Filter the data !!!
-
Hi, @alan-kilborn and All,
No, sorry, Alan! I wish I could create a Python script like that ;-)) I simply used paint.exe and added a red rectangular box around specific zones of a screenshot! Moreover, you can notice that, for 2 of the $0 occurrences which are spread over two lines, I drew two rectangles whereas, by script, there will certainly be only one zone!

I posted this request about the $0 group because I remembered the old post you mentioned in your last post. But I was a bit lazy and gave up trying to find where it was on our forum. However, I was sure it had been created by @scott-sumner or @claudia-frank! Therefore, as a first step, I preferred to omit this precious link. I just assumed you would not have any particular problem with this kind of highlighting! So, sorry for letting you do this research on your own :-(
Cheers,
guy038
-
Hello everyone. I sincerely thank everyone for supporting me. This is how I did it:
- My files are very big, but similar to what I posted, so I first shortened them with the Remove Duplicate Lines plugin
- Next, I deleted the blank lines and indented everything
- Next, I removed the first <div><div> with the regex: ^<div><div>
- Then I used the regex: <div><div>.* -> This removes <div><div> itself and the characters after it.
- Finally, I used the regex: .{90}.+(\R?\N|\n|$) -> This removes lines with more than 90 characters, such as this line: There is a grandtotal of <span id="stats_s1" style="font-weight:bold;">27,018,552,748</span> user hash requests made to this database, <span id="stats_s2" style="font-weight:bold;">180,510,988</span> are of unique hashes (about <span id="stats_s3" style="font-weight:bold;">0%</span> of grandtotal). Out of the grandtotal number of requests, <span id="stats_s4" style="font-weight:bold;">26,403,484,047</span> were successful or cracked (about <span id="stats_s5" style="font-weight:bold;">97%</span>). Regardingly only unique hashes, <span id="stats_s6" style="font-weight:bold;">144,717,104</span> were successful or cracked (about <span id="stats_s7" style="font-weight:bold;">80%</span>). </p>
Because such lines are never exactly the same, removing duplicate lines cannot eliminate them. And now I have the results I need.
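For readers who would rather script these steps than run them by hand, here is a minimal Python sketch of a similar clean-up. It is only an approximation of the manual steps above: the function name `filter_lines` is hypothetical, Python's `re` stands in for the Notepad++ regex engine, the indentation step is skipped, duplicates are removed after the other steps rather than first, and the 90-character rule is a plain length test instead of the regex from the post.

```python
import re

def filter_lines(lines, max_len=90):
    """Approximate the manual clean-up steps (hypothetical helper)."""
    seen = set()
    result = []
    for line in lines:
        line = line.rstrip("\r\n")
        # Drop a leading <div><div>
        line = re.sub(r"^<div><div>", "", line)
        # Drop any remaining <div><div> and everything after it
        line = re.sub(r"<div><div>.*", "", line)
        # Skip blank lines
        if not line.strip():
            continue
        # Skip lines longer than max_len characters
        if len(line) > max_len:
            continue
        # Keep only the first copy of each line
        if line in seen:
            continue
        seen.add(line)
        result.append(line)
    return result

if __name__ == "__main__":
    sample = ["<div><div>alpha", "", "alpha", "beta<div><div>junk", "x" * 91, "beta"]
    print(filter_lines(sample))
```

A line of exactly 90 characters is kept, which matches the regex in the post: `.{90}.+` needs at least 91 characters to match.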
-
Second version of script with desired change (mainly boxing the entire match; doing nothing with non-matching text):
# -*- coding: utf-8 -*-

# see https://community.notepad-plus-plus.org/topic/19240/filter-the-data
# see https://community.notepad-plus-plus.org/topic/14501/has-a-plugin-like-sublime-plugin-brackethighlighter

from Npp import editor, editor1, editor2, notepad, INDICATORSTYLE

class T19240a(object):

    def __init__(self):
        free_indicator_to_use = 17
        self.indicator_set_options(free_indicator_to_use, INDICATORSTYLE.STRAIGHTBOX, (238,121,159), 0, 255, True)
        indic_list = [ free_indicator_to_use, 25, 24, 23, 22, 21, 31, 29, 28 ]
        for i in indic_list: self.clear_all(i)
        regex = r'(?-s)(notepad|editor)\.(.*?)\(.*?\)'
        regex = notepad.prompt('Enter regex (just Cancel to clear colors from previous run):', '', regex)
        if regex == None or len(regex) == 0: return
        def match_fn(m):
            for grp in range(len(m.groups()) + 1):
                #print('{g} -> {s} |{text}|'.format(g=grp, s=m.span(grp), text=m.group(grp)))
                if grp < len(indic_list):  # we only have a finite number of colors but we could have more groups than that
                    if m.span(grp)[0] != m.span(grp)[1]:  # don't bother with zero-length groups; or groups not matched: (-1, -1)
                        self.fill(indic_list[grp], m.span(grp)[0], m.span(grp)[1])
        editor.research(regex, match_fn)

    def fill(self, indic, start_pos, end_pos):
        editor.setIndicatorCurrent(indic)
        editor.indicatorFillRange(start_pos, end_pos - start_pos)

    def clear_all(self, indic):
        editor.setIndicatorCurrent(indic)
        editor.indicatorClearRange(0, editor.getTextLength())

    def indicator_set_options(self, indicator_number, indicator_style, rgb_color_tup, alpha, outline_alpha, draw_under_text):
        for ed in (editor1, editor2):
            ed.indicSetStyle(indicator_number, indicator_style)  # e.g. INDICATORSTYLE.ROUNDBOX
            ed.indicSetFore(indicator_number, rgb_color_tup)
            ed.indicSetAlpha(indicator_number, alpha)  # integer
            ed.indicSetOutlineAlpha(indicator_number, outline_alpha)  # integer
            ed.indicSetUnder(indicator_number, draw_under_text)  # boolean

if __name__ == '__main__': T19240a()
@Fake-Trum Sorry for hijacking your thread a bit.
-
Hello, @alan-kilborn and All,
Many thanks for your second try ;-)) As for me, I preferred to slightly color all the group 0 zones! So I used an alpha transparency of 50 instead of 0.

Here is a regex which enables the 8 possible highlightings:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(?x)                      #  Start of Regex
                          #  Number group    Name of Style                Indicator
(\l[\l\r\n]+)\h+\d+\h+    #  Group 1         Mark Style 1                    25
(\l[\l\r\n]+)\h+\d+\h+    #  Group 2         Mark Style 2                    24
(\l[\l\r\n]+)\h+\d+\h+    #  Group 3         Mark Style 3                    23
(\l[\l\r\n]+)\h+\d+\h+    #  Group 4         Mark Style 4                    22
(\l[\l\r\n]+)\h+\d+\h+    #  Group 5         Mark Style 5                    21
(\l[\l\r\n]+)\h+\d+\h+    #  Group 6         Find Mark Style                 31
(\l[\l\r\n]+)\h+\d+\h+    #  Group 7         Smart Highlighting              29
(\l[\l\r\n]+)             #  Group 8         Incremental highlight all       28
                          #  End of Regex

Color = (240,128,160) , Alpha = 40 , Outline Alpha = 255 , StraightBox Style    17
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Tested against the text below :
abcde fghij 012345 abcde fghij 012345 abcde fghij 012345 abcde fghij 012345 abcdefg hij 012345 abc defghij 012345 abc defghij 012345 abc defghij
If you click on the ¶ button to visualize all characters, it’s great to see that the highlighting also goes over the LF and CR chars and that the straight box embeds them too, when a group contains line-break(s) ;-))

Now, we already have a default regex, with the line:

regex = r'.............'

But, Alan (and this is my last request, I promise!), could you add the automatic assignment of the current selection to the regex variable? I mean something like:

IF no current main selection THEN
    regex = notepad.prompt...
ELSE
    regex = current selection ( without any dialog )

TIA,
Cheers,
guy038
-
@guy038 said in Filter the data !!!:
Alan (and this is my last request, I promise !), could you add the automatic assignment of the current selection to the regex variable ?
Remember, you promised!
Here’s the “b” version:
# -*- coding: utf-8 -*-

# see https://community.notepad-plus-plus.org/topic/19240/filter-the-data
# see https://community.notepad-plus-plus.org/topic/14501/has-a-plugin-like-sublime-plugin-brackethighlighter

from Npp import editor, editor1, editor2, notepad, INDICATORSTYLE

class T19240b(object):

    def __init__(self):
        free_indicator_to_use = 17
        self.indicator_set_options(free_indicator_to_use, INDICATORSTYLE.STRAIGHTBOX, (240,128,160), 40, 255, True)
        indic_list = [ free_indicator_to_use, 25, 24, 23, 22, 21, 31, 29, 28 ]
        for i in indic_list: self.clear_all(i)
        if editor.getSelectionEmpty():
            regex = r'(?-s)(notepad|editor)\.(.*?)\(.*?\)'  # a regex just for demo purposes
            regex = notepad.prompt('Enter regex (just Cancel to clear colors from previous run):', '', regex)
        else:
            regex = editor.getSelText()
        if regex == None or len(regex) == 0: return
        def match_fn(m):
            for grp in range(len(m.groups()) + 1):
                #print('{g} -> {s} |{text}|'.format(g=grp, s=m.span(grp), text=m.group(grp)))
                if grp < len(indic_list):  # we only have a finite number of colors but we could have more groups than that
                    if m.span(grp)[0] != m.span(grp)[1]:  # don't bother with zero-length groups; or groups not matched: (-1, -1)
                        self.fill(indic_list[grp], m.span(grp)[0], m.span(grp)[1])
        editor.research(regex, match_fn)

    def fill(self, indic, start_pos, end_pos):
        editor.setIndicatorCurrent(indic)
        editor.indicatorFillRange(start_pos, end_pos - start_pos)

    def clear_all(self, indic):
        editor.setIndicatorCurrent(indic)
        editor.indicatorClearRange(0, editor.getTextLength())

    def indicator_set_options(self, indicator_number, indicator_style, rgb_color_tup, alpha, outline_alpha, draw_under_text):
        for ed in (editor1, editor2):
            ed.indicSetStyle(indicator_number, indicator_style)  # e.g. INDICATORSTYLE.ROUNDBOX
            ed.indicSetFore(indicator_number, rgb_color_tup)
            ed.indicSetAlpha(indicator_number, alpha)  # integer
            ed.indicSetOutlineAlpha(indicator_number, outline_alpha)  # integer
            ed.indicSetUnder(indicator_number, draw_under_text)  # boolean

if __name__ == '__main__': T19240b()
-
Hi, @Alan-kilborn and All,
Alan, this version is just perfect! Up to now, when building a complicated search regex containing some groups, I used to type the following expression in the Replace dialog, to clearly see the contents of each group:

REPLACE \r\n>$1<\r\n>$2<\r\n>$3<\r\n>$4<\r\n>$5<\r\n......>$n<\r\n

Now, with your script:

- Select the regex for which you want to see the different groups, from 1 to 8, as well as the overall match $0, for each occurrence in the current file
- Execute the last version of your Python script, which I renamed Groups_Highlighter.py, BTW ;-))

Much more elegant, isn’t it?

Best Regards

guy038

P.S.: Two more points:

- Out of curiosity, what exactly does the syntax T19240 mean?
- I tried to change the draw_under_text value from True to False. But I did not see any difference?!

Thanks again, Alan, for this valuable script :-))
-
-
I renamed Groups_Highlighter.py

I called my copy ColorizeRegex.py but to each his own!

what means, exactly, the syntax T19240 ?

It’s the topic id of this thread in the forum! :-)
This is a Peter-ism. :-)

tried to change the draw_under_text value from True to False. But I did not see any difference ?!

Not sure, it was in the code I stole from the earlier referenced thread, about bracket-highlighting.
I don’t think I fully follow the DOCS about it, either.
-
Hi, @alan-kilborn,
Yes, I also remember @claudia-frank’s regex_tester script. However, it does not behave the same way as your script!

As far as I can remember, it just used two colors for two consecutive groups + a third color for the overall match. So, if your regex contained, for instance, 3 groups => groups 1 and 3 were highlighted with the first color, group 2 with the second color and the overall regex/occurrence with the third color!

And I think that your script, with a different color for each group, is quite interesting, too!

Ah… too late, I promised! I just forgot the case when two $0 regexes are consecutive. Then the straight boxes are joined and no separation appears to show where the boundary between the two occurrences is!

I will survive this ;-))

BR

guy038
-
@guy038 said in Filter the data !!!:
Ah… too late, I promised ! I just forgot the case when two $0 regexes are consecutive. then the straight-boxes are joined and no separation appears
It’s a critical bug, not a new feature request. I will work on it.
-
@Alan-Kilborn said in Filter the data !!!:
I’m sure Peter will be scanning your tabs in your screenshot HERE looking for interesting things. :-)
Well, I’m mildly surprised. One, because I’d forgotten I’d passed my four-year anniversary in December. Two, because I had only about 8 posts (if the search for my posts, sorted by ascending date doesn’t miss any). He had replied once or twice to me in that timeframe, but I’m surprised I was “on his radar” yet – at least enough to save a tab for that long.
While looking at the early posts, I was amused to see me say, in this Jan 2016 post,
I am not a Notepad++ expert
I don’t think I can rightly claim that anymore. :-)
-
Hello, @alan-kilborn, @peterjones and All,
I first joined the Notepad++ forum, on SourceForge.net, on May 08 2013. Then, from Jun 24 2015, like others, I migrated to our NodeBB forum.

I’m used to saving each of my posts in a simple .txt file, in a specific folder, giving the OP’s name to that file. So, at any moment, I keep the tabs of my recent posts open because, sometimes, the OP does not answer immediately!

Of course, I should close some of these tabs daily, when the OP either succeeded in solving his problem or did not reply after a while! But I have to admit that I do not apply myself to this daily task, but only from time to time, which explains the numerous tabs of my session!

However, and this seems obvious, in your case, Alan and Peter, and some others, you are quite active on our forum. Therefore, I simply keep your tabs open permanently ;-))

Up to now, after a look into Users > Most Reputation, I have created 2,361 posts. The specific folder, where all my saved posts are, contains 1,265 text files. Let’s say that a few of them are my own: this means that I created about 1.87 posts per OP ;-)) ( 2,361 / 1,260 )

Best Regards,

guy038
-
Hi, @alan-kilborn,
Regarding the issue of consecutive $0 ranges of text:

Maybe by using two different styles, let’s say 17 and 18 ( if free, of course ), and switching from one style to the other for each successive $0 occurrence?

BR

guy038
-
@guy038 said in Filter the data !!!:
May be using two different styles, let’s say 17 and 18 ( if free, of course ), and swapping, successively, to each style, for each $0 occurrence ?
Exactly what I had in mind, I just have to find a bit of free time to do it. :-)
17 and 18 ( if free, of course )
I believe these are “free” on a default system, but of course, it is worth pointing out that if others happen to be using these, and still want to use this script, they should alter the numbers.
-
This version seems to delineate the start of a following match beginning right where the previous match ended:
# -*- coding: utf-8 -*-

# see https://community.notepad-plus-plus.org/topic/19240/filter-the-data
# see https://community.notepad-plus-plus.org/topic/14501/has-a-plugin-like-sublime-plugin-brackethighlighter

from Npp import editor, editor1, editor2, notepad, INDICATORSTYLE

class T19240c(object):

    def __init__(self):
        free_indic_list_for_group0 = [ 17, 18 ]
        self.indicator_set_options(free_indic_list_for_group0[0], INDICATORSTYLE.ROUNDBOX, (240,128,160), 40, 255, True)
        self.indicator_set_options(free_indic_list_for_group0[1], INDICATORSTYLE.ROUNDBOX, (240,128,160), 40, 255, True)
        indic_list = [ free_indic_list_for_group0[0], 25, 24, 23, 22, 21, 31, 29, 28, free_indic_list_for_group0[1] ]
        for i in indic_list: self.clear_all(i)
        regex = editor.getSelText()
        if len(regex) == 0:
            regex = r'(?-s)(notepad|editor)\.(.*?)\(.*?\)'  # a regex just for demo purposes; delete this line if desired
            regex = notepad.prompt('Enter regex (just Cancel to clear colors from previous run):', '', regex)
        if regex == None or len(regex) == 0: return
        def match_fn(m):
            for grp in range(len(m.groups()) + 1):
                print('{g} -> {s} |{text}|'.format(g=grp, s=m.span(grp), text=m.group(grp)))
                if m.span(grp)[0] != m.span(grp)[1]:  # don't bother with zero-length groups; or groups not matched: (-1, -1)
                    if grp < len(indic_list) - 1:  # we only have a finite number of colors but we could have more groups than that
                        self.fill(indic_list[grp], m.span(grp)[0], m.span(grp)[1])
            (indic_list[0], indic_list[-1]) = (indic_list[-1], indic_list[0])  # toggle between 2 indicators for subsequent group 0
        editor.research(regex, match_fn)

    def fill(self, indic, start_pos, end_pos):
        editor.setIndicatorCurrent(indic)
        editor.indicatorFillRange(start_pos, end_pos - start_pos)

    def clear_all(self, indic):
        editor.setIndicatorCurrent(indic)
        editor.indicatorClearRange(0, editor.getTextLength())

    def indicator_set_options(self, indicator_number, indicator_style, rgb_color_tup, alpha, outline_alpha, draw_under_text):
        for ed in (editor1, editor2):
            ed.indicSetStyle(indicator_number, indicator_style)  # e.g. INDICATORSTYLE.ROUNDBOX
            ed.indicSetFore(indicator_number, rgb_color_tup)  # (red, green, blue)
            ed.indicSetAlpha(indicator_number, alpha)  # integer
            ed.indicSetOutlineAlpha(indicator_number, outline_alpha)  # integer
            ed.indicSetUnder(indicator_number, draw_under_text)  # boolean

if __name__ == '__main__': T19240c()
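The indicator bookkeeping done in match_fn can be tried outside Notepad++. The sketch below is only a stand-in under stated assumptions: it uses Python's `re` module instead of `editor.research` (which relies on the Notepad++ regex engine, whose syntax is not identical), and the hypothetical helper `plan_fills` simply returns the `(indicator, start, end)` triples that the script would hand to `self.fill`.

```python
import re

def plan_fills(regex, text, group0_indics=(17, 18),
               group_indics=(25, 24, 23, 22, 21, 31, 29, 28)):
    """Return (indicator, start, end) triples, one per span to highlight.

    Hypothetical helper mirroring match_fn: group 0 alternates between the
    two indicators in group0_indics, groups 1..8 use group_indics in order.
    """
    fills = []
    for match_number, m in enumerate(re.finditer(regex, text)):
        # Alternate the group-0 indicator so adjacent matches stay distinct
        indic_list = [group0_indics[match_number % 2]] + list(group_indics)
        for grp in range(len(m.groups()) + 1):
            start, end = m.span(grp)
            if start == end:
                continue  # zero-length group, or unmatched group: (-1, -1)
            if grp >= len(indic_list):
                continue  # more groups than available indicators
            fills.append((indic_list[grp], start, end))
    return fills

if __name__ == "__main__":
    for fill in plan_fills(r"(a+)(b*)", "aabab"):
        print(fill)
```

With two back-to-back matches, the first overall match is assigned indicator 17 and the second indicator 18, which is what makes the boundary between them visible.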
-
Hello, @alan-kilborn, @ekopalypse, @peterjones and All,
Perfect, Alan! I chose another color for Straightbox Style 2, which has the same Saturation and Lightness as the Straightbox Style 1 color, in the HSL Color Space ( S ≈ 79 and L ≈ 72 ). So, I could choose the same Alpha transparency ( 40 ).

From their Hue ( 343 and 201 ) we deduce that they come from the pure colors [255,0,73] and [0,164,255] ( [343,100,50] and [201,100,50] values in the HSL Color Space ).

Here is, in a screenshot, a summary of the different styles, with their RGB values and their Alpha transparency, as well as an example of the color alpha blending process, used in Notepad++, relative to the mixing of the ID style 25 with the ID style 17, over the White background ( [255,255,255] ) of the Default style of the Default theme ( Stylers.xml )
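The HSL figures quoted in this post can be checked with Python's standard `colorsys` module. This is just a small verification sketch: the helper names `rgb_to_hsl` and `pure_color_of` are hypothetical, and the two RGB triples are the StraightBox colors ( [240,128,160] and [128,200,240] ) given elsewhere in this thread.

```python
import colorsys

def rgb_to_hsl(rgb):
    """Convert an (R, G, B) tuple of 0-255 ints to rounded (H deg, S %, L %)."""
    r, g, b = [c / 255.0 for c in rgb]
    h, l, s = colorsys.rgb_to_hls(r, g, b)  # note: colorsys returns H, L, S
    return (int(round(h * 360)), int(round(s * 100)), int(round(l * 100)))

def pure_color_of(rgb):
    """RGB of the fully saturated color (S = 100, L = 50) sharing rgb's hue."""
    r, g, b = [c / 255.0 for c in rgb]
    h, _, _ = colorsys.rgb_to_hls(r, g, b)
    return tuple(int(round(c * 255)) for c in colorsys.hls_to_rgb(h, 0.5, 1.0))

if __name__ == "__main__":
    for rgb in [(240, 128, 160), (128, 200, 240)]:
        print(rgb, rgb_to_hsl(rgb), pure_color_of(rgb))
```

The pure color is derived from the unrounded hue; starting from the rounded hue values 343 and 201 instead can shift a channel by a unit or two.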
Two observations:

- I noticed that you cannot use the Smart Highlighting style both natively and with your script. Thus, I preferred to place it at the end of the list, so for group 8
- When fewer than 6 groups are involved in the overall regex, after running your script, if we use the context menu Remove style > Clear all Styles option, we just see the highlighting of each $0 occurrence, alternately, in light sky-blue or carmine color
Best Regards,
guy038
-
-
@guy038 said in Filter the data !!!:
a summary of the different styles, with their RGB values and their Alpha transparency as well as an example of the color alpha blending process,
You are the ColorMaster in addition to being the RegexMaster.
cannot use the Smart Highlighting style, either, natively and with your script.
True. If you run the script and it uses that color, the next time you use Smart Highlighting it will erase any of that color the script placed in favor of its own results. And the other way around.
if we use the Context menu Remove style > Clear all Styles option
What that command actually does in relation to the script is to remove the coloring that the script does with indicators 21 through 25.
For maximum flexibility, we could make the Notepad++ coloring features and this script’s coloring features totally independent. This would eat up unallocated indicators, but that’s ok, isn’t it? :-)
Really, all of the tools for you to go ahead and do this yourself @guy038 are already in the script. Just don’t use indicators 21 through 25, and 28 through 31. IIRC, 26 and 27 are also used by Notepad++ for something, so stay away from those. If you started, say at 20 and worked your way downward… Based upon Scintilla docs, I think they may be all unused until you get down to number 8.
What do you think? Is it worth me modifying the script? Or can you do it, if it is valuable?
-
How about supplying the text from your screenshot as ACTUAL text here so that I can attempt to duplicate your coloring results without a lot of retyping? :-)
-
Hello, @alan-kilborn, @ekopalypse, @peterjones and All,
Alan, when I said:

- When less than 6 groups are involved in the overall regex, …

I do not see this behaviour as an issue. On the contrary, I think that it’s quite useful ;-)) I mean, as most regex S/R don’t need more than 5 groups, we can:

- Easily identify these groups, from 1 to 5, with your Python script
- Then, easily identify successive $0 matches, thanks to the 2 colors of styles 17 and 18, after running the context command Remove style > Clear all Styles
Ah, OK! Here is my text, being in N++ Post-it screen mode, in my previous post:

Mark Style 1                 25    Color = [  0,255,255] , Alpha = 100
Mark Style 2                 24    Color = [255,128,  0] , Alpha = 100
Mark Style 3                 23    Color = [255,255,  0] , Alpha = 100
Mark Style 4                 22    Color = [128,  0,255] , Alpha = 100
Mark Style 5                 21    Color = [  0,128,  0] , Alpha = 100
Find Mark Style              31    Color = [255,  0,  0] , Alpha = 100
Incremental highlight all    28    Color = [  0,128,255] , Alpha = 100
Smart Highlighting           29    Color = [  0,255,  0] , Alpha = 100

StraightBox Style 1          17    Color = [240,128,160] , Alpha =  40
StraightBox Style 2          18    Color = [128,200,240] , Alpha =  40

(?-i)(\l+)11111|22222(\l+)|(\l+)33333|44444(\l+)|(\l+)55555|66666(\l+)|(\l+)77777|88888(\l+)

abcde1111122222abcde12345abcde3333344444abcde12345abcde5555566666abcde12345abcde7777788888abcde

ABCDE FGHIJ KLMNO PQRST UVWXY ZABCD

- UPPERCASE strings are highlighted with "Mark" style, from 1 to 5, and the "Find Mark" style ( "Match case" option ON )
  ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
- StraightBox Style 1 ( id = 17, [240,128,160] ), of Group 0 color, highlights the string "abcde" + five ODD digits
- StraightBox Style 2 ( id = 18, [128,200,240] ), of Group 0 color, highlights five EVEN digits + the string "abcde"

For instance, the process for coloring the FIRST string "abcde", with "Mark Style 1" [0,255,255], is :

- First, "StraightBox Style 1", with Alpha = 40/255, is blended with "White" background => Color [252,235,240]
- Secondly, "Mark Style 1", with Alpha = 100/255, is blended with color [252,235,240] => Color [153,243,246]

- When "Mark Style 1", with Alpha = 100/255, is blended with "White" background ( 1st "ABCDE" ) => Color [155,255,255]

The GENERAL formula, for Red, Green and Blue, is :

    FINAL Color = CURRENT color + Alpha x ( NEW color - CURRENT Color )
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

- If Foreground Alpha opacity = 1   =>  FINAL color = NEW colour                =>  The NEW color is totally OPAQUE
- If Foreground Alpha opacity = .5  =>  FINAL color = ( NEW + CURRENT) / 2      =>  PERFECT mixing of the TWO colours
- If Foreground Alpha opacity = 0   =>  FINAL color = CURRENT colour            =>  The NEW color is totally TRANSPARENT
Best Regards,
guy038
-
Hi, @alan-kilborn and All
At the end of my description of the blending process, I made a little mistake. I should have written:

The GENERAL formula, for Red, Green and Blue, is :

    FINAL Color = CURRENT color + Alpha x ( NEW color - CURRENT Color )
    ¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯

- If NEW color Alpha opacity = 1   =>  FINAL color = NEW colour                =>  The NEW color is totally OPAQUE
- If NEW color Alpha opacity = .5  =>  FINAL color = ( NEW + CURRENT) / 2      =>  PERFECT mixing of the TWO colours
- If NEW color Alpha opacity = 0   =>  FINAL color = CURRENT colour            =>  The NEW color is totally TRANSPARENT
This is more rigorous !
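For anyone who wants to replay the numbers, here is a minimal Python sketch of that formula, applied per channel. The function name `blend` is hypothetical, alpha is passed as an integer from 0 to 255 as in the script, and rounding to the nearest integer is an assumption: how Notepad++ actually rounds is not stated in this thread, so results may differ by one unit from the values quoted earlier.

```python
def blend(current, new, alpha):
    """Blend color `new` over `current`; alpha is the opacity in 0..255.

    Per channel: FINAL = CURRENT + (alpha / 255) * (NEW - CURRENT)
    """
    a = alpha / 255.0
    return tuple(int(round(c + a * (n - c))) for c, n in zip(current, new))

WHITE = (255, 255, 255)
STRAIGHTBOX_1 = (240, 128, 160)   # indicator 17, alpha 40
MARK_STYLE_1 = (0, 255, 255)      # indicator 25, alpha 100

if __name__ == "__main__":
    under = blend(WHITE, STRAIGHTBOX_1, 40)
    print(under)                           # StraightBox Style 1 over white
    print(blend(under, MARK_STYLE_1, 100)) # Mark Style 1 over that result
    print(blend(WHITE, MARK_STYLE_1, 100)) # Mark Style 1 directly over white
```

An alpha of 0 returns the current color unchanged and an alpha of 255 returns the new color, which corresponds to the TRANSPARENT and OPAQUE cases listed above.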
Cheers,
guy038