    Filter the data !!!

    • EkopalypseE
      Ekopalypse @Alan Kilborn
      last edited by Ekopalypse

      @Alan-Kilborn

      Since it runs a search at every keystroke, performance problem on huge files?

      … yes but I would argue … don’t do it, use the find dialog instead :-)

      my caret immediately jump from where I was concentrating

      as this feature doesn’t exist yet it might be that it doesn’t do what you think it will do :-)
      But I get your point, that would be, at least, confusing. :-D

      • Alan KilbornA
        Alan Kilborn @Ekopalypse
        last edited by Alan Kilborn

        @Ekopalypse said in Filter the data !!!:

        as this feature doesn’t exist yet it might be that it doesn’t do what you think it will do

        So current implementation sets active selection to text matching incremental search data.
        If you return to the editor, your caret is left at the end of the selected text (the end closer to end-of-file).
        Default expectation is it would work same way if there was a regex mode.
        Thus, my guess is that .* would leave one’s caret at end-of-file with everything above selected.
        I suppose it would have to be (?s).* to be entirely correct.
        But, yes, I guess Notepad++ devs could change how it logically works (i.e., leave caret at start of selection, closer to original caret pos)?
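As a rough illustration of the `.*` versus `(?s).*` point, here is a minimal sketch using Python's standard `re` module as a stand-in for Notepad++'s own regex engine. That substitution is an assumption: the two engines differ in details, and Notepad++ also has a ". matches newline" checkbox. The sample text is made up. The end of each match span is where the caret would land if the selection behaviour described above carried over to a regex mode.

```python
import re

# Hypothetical three-line buffer; not taken from the thread.
text = "first line\nsecond line\nthird line"

# Without the (?s) flag, '.' stops at the first line break.
plain = re.search(r'.*', text)

# With the inline (?s) flag, '.' also matches line breaks,
# so the match runs to the end of the buffer.
dotall = re.search(r'(?s).*', text)

print("plain  match span:", plain.span())
print("dotall match span:", dotall.span(), "of", len(text), "characters")
```

Only the `(?s)` variant reaches the end of the text, which is the "caret at end-of-file" case.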

        • Alan KilbornA
          Alan Kilborn @astrosofista
          last edited by

          @astrosofista said in Filter the data !!!:

          implementation of a colored syntax to highlight groups and alternations at a glance - by the way, maybe I am not aware and this is currently feasible

          I’m sure this is not quite what is being asked for, but here’s a curious little PythonScript.

          It takes a regex as its input and then highlights the current file: the sections that don’t match are colored yellow, the overall match is left “uncolored”, and the capturing groups in the regex get their own colors (group #1 = cyan, group #2 = orange, group #3 = purple, group #4 = dark-green, group #5 = red). I didn’t bother with groups above #5.

          The reason I left the overall match (group #0) uncolored is that we’d have had overlapping colors that way, and I thought that would have made things less clear.

          So if we take the text of the script itself:

          # -*- coding: utf-8 -*-
          
          # see https://community.notepad-plus-plus.org/topic/19240/filter-the-data
          
          from Npp import editor, notepad
          
          class T19240(object):
              def __init__(self):
                  indic_list = [ 23, 25, 24, 22, 21, 31 ]
                  for i in indic_list: editor.setIndicatorCurrent(i); editor.indicatorClearRange(0, editor.getTextLength())
                  regex = r'(?-s)(notepad|editor)\.(.*?)\(.*?\)'
                  regex = notepad.prompt('Enter regex (just Cancel to clear colors from previous run):', '', regex)
                  if regex is None or len(regex) == 0: return
                  def fill(indic, start_pos, end_pos): editor.setIndicatorCurrent(indic); editor.indicatorFillRange(start_pos, end_pos - start_pos)
                  self.remember = 0
                  def match_fn(m):
                      fill(indic_list[0], self.remember, m.span(0)[0])
                      self.remember = m.span(0)[1]
                      for grp in range(len(m.groups()) + 1):
                          #print(grp, '->', m.span(grp), m.group(grp))
                          if 0 < grp <= 5: fill(indic_list[grp], m.span(grp)[0], m.span(grp)[1])
                  editor.research(regex, match_fn)
                  fill(indic_list[0], self.remember, editor.getTextLength())
          
          if __name__ == '__main__': T19240()
          

          and we run the script on that, and accept the suggested regex, we get:

          a16b2690-8a5d-46c6-9b44-22efdbdbec96-image.png
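The span bookkeeping in the script above can be tried outside Notepad++. The following is only a sketch under stated assumptions: it uses `re.finditer` from the standard library in place of `editor.research`, returns the spans instead of filling indicators, and the name `classify_spans` is made up for illustration. The `(?-s)` prefix of the suggested regex is omitted, since Python's `re` does not let `.` match newlines by default anyway.

```python
import re

def classify_spans(regex, text, max_groups=5):
    """Return (gaps, groups).

    gaps   -- spans of text lying between overall matches (the part the
              script colors with its first indicator)
    groups -- dict mapping group number -> list of non-empty spans
    """
    gaps = []
    groups = {}
    remember = 0  # end of the previous overall match
    for m in re.finditer(regex, text):
        start, end = m.span(0)
        if start > remember:
            gaps.append((remember, start))
        remember = end
        for grp in range(1, min(m.re.groups, max_groups) + 1):
            s, e = m.span(grp)
            if s != e:  # skip empty or unmatched groups
                groups.setdefault(grp, []).append((s, e))
    if remember < len(text):
        gaps.append((remember, len(text)))
    return gaps, groups

# Hypothetical one-line sample in the style of the script's own text.
sample = "x = editor.getTextLength() + notepad.prompt('a', 'b')"
gaps, groups = classify_spans(r'(notepad|editor)\.(.*?)\(.*?\)', sample)

print("non-matching:", [sample[s:e] for s, e in gaps])
for grp, spans in sorted(groups.items()):
    print("group", grp, [sample[s:e] for s, e in spans])
```

Working with plain spans keeps the logic testable; inside Notepad++ each span would be handed to `editor.indicatorFillRange` as in the script.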

          • guy038G
            guy038
            last edited by guy038

            Hello, @alan-kilborn and All,

            I tested your Python script : Works nice :-)

            I noticed that the ids of styles 1 to 5 are in reverse order relative to their names !

            So :

            Mark   Style 1  = 25
            Mark   Style 2  = 24
            Mark   Style 3  = 23
            Mark   Style 4  = 22
            Mark   Style 5  = 21
            
            Find Mark Style = 31
            

            I also noted that the first indicator of the indic_list is the color which highlights the parts of text that do not match the user regex

            Personally, I would prefer this specific color to be the Find Mark style, which allows me to wipe out the color of all non-matched parts, using the Clear all marks button of the Mark dialog !

            And to clear the different highlighting groups, I just use the Remove style > Clear all Styles option, of the Context menu !

            Now, Alan, would it be possible to show the $0 group, with the kind of highlighting, in the picture below :

            42f9f2fb-63a6-4a65-af0a-34bff5ca34ab-image.png

            Just a suggestion, of course ! Only if interested and if you get some spare time !

            Best Regards,

            guy038

            P.S. :

            I know, I abuse, but would it also be possible to easily modify the border color of that $0 group ?

            • Alan KilbornA
              Alan Kilborn @guy038
              last edited by

              @guy038 said in Filter the data !!!:

              with the kind of highlighting, in the picture below

              Yes! That’s a better idea.
              Of course, since you’ve already shown what it looks like, I wonder how you did that; maybe you already wrote the code!? :-)

              • Alan KilbornA
                Alan Kilborn @Alan Kilborn
                last edited by

                It took me a bit to figure out how to do the boxing, but thanks to this OLD THREAD I see how to get it going. Update to be posted soon!

                • guy038G
                  guy038
                  last edited by

                  Hi, @alan-kilborn and All,

                  No, sorry, Alan ! I wish I could create such a Python script like that ;-)) I simply used paint.exe and added a red rectangular box around specific zones of a screenshot picture ! Moreover you can notice that, for 2 of the $0 occurrences which are distributed on two lines, I drew two rectangles whereas, by script, there will be certainly only one zone!

                  I posted this request about the $0 group because I remembered the old post you mentioned in your last post. But I was a bit lazy and gave up trying to find where it was on our forum. However, I was sure it had been created by @scott-sumner or @claudia-frank !

                  Therefore, as a first step, I preferred to omit this precious link. I just assumed you would not have any particular problem with this kind of highlighting ! So, sorry for letting you do this research on your own :-(

                  Cheers,

                  guy038

                  • Fake TrumF
                    Fake Trum
                    last edited by

                    Hello everyone. I sincerely thank everyone for supporting me. Here is how I did it:

                    • My files are very big but similar to what I posted, so I first shortened them with the Remove Duplicate Lines plugin
                    • Next, I delete the blank lines and indent everything
                    • Next, remove the <div><div> at the start of each line with the command: ^<div><div>
                    • Then use the command <div><div>.* to remove any remaining <div><div> together with everything after it on the line.
                    • And finally use the command: .{90}.+(\R?\N|\n|$) -> Remove lines with more than 90 characters: Such as this line: There is a grandtotal of <span id=“stats_s1” style=“font-weight:bold;”>27,018,552,748</span> user hash requests made to this database, <span id=“stats_s2” style=“font-weight:bold;”>180,510,988</span> are of unique hashes (about <span id=“stats_s3” style=“font-weight:bold;”>0%</span> of grandtotal). Out of the grandtotal number of requests, <span id=“stats_s4” style=“font-weight:bold;”>26,403,484,047</span> were successful or cracked (about <span id=“stats_s5” style=“font-weight:bold;”>97%</span>). Regardingly only unique hashes, <span id=“stats_s6” style=“font-weight:bold;”>144,717,104</span> were successful or cracked (about <span id=“stats_s7” style=“font-weight:bold;”>80%</span>). </p>
                      Because it is not the same, it is impossible to eliminate duplicate lines. And I have the results I need.
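For readers who prefer a script over manual steps, the clean-up described in the list above might be approximated as follows. This is a sketch under assumptions, not the poster's actual procedure: it uses Python's `re` instead of Notepad++'s search dialog, removes duplicates at the end rather than with a plugin at the start, treats "indent all" as trimming surrounding whitespace, and the function name `filter_lines` and the sample input are made up.

```python
import re

def filter_lines(text, max_len=90):
    """Apply the described clean-up steps and return the surviving lines."""
    seen = set()
    kept = []
    for line in text.splitlines():
        line = line.strip()                       # normalise indentation
        if not line:
            continue                              # drop blank lines
        line = re.sub(r'^<div><div>', '', line)   # leading <div><div>
        line = re.sub(r'<div><div>.*', '', line)  # later <div><div> and the rest
        if len(line) > max_len:
            continue                              # drop lines over 90 characters
        if not line or line in seen:
            continue                              # drop empties and duplicates
        seen.add(line)
        kept.append(line)
    return kept

# Hypothetical input, only to show the effect of each step.
raw = "<div><div>abc\n\nabc\nfoo<div><div>tail\n" + "x" * 91 + "\nfoo\n"
print(filter_lines(raw))
```

The 90-character threshold is kept as a parameter because it was chosen for one particular file.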
                    • Alan KilbornA
                      Alan Kilborn @guy038
                      last edited by

                      @guy038

                      Second version of script with desired change (mainly boxing the entire match; doing nothing with non-matching text):

                      # -*- coding: utf-8 -*-
                      
                      # see https://community.notepad-plus-plus.org/topic/19240/filter-the-data
                      # see https://community.notepad-plus-plus.org/topic/14501/has-a-plugin-like-sublime-plugin-brackethighlighter
                      
                      from Npp import editor, editor1, editor2, notepad, INDICATORSTYLE
                      
                      class T19240a(object):
                      
                          def __init__(self):
                              free_indicator_to_use = 17
                              self.indicator_set_options(free_indicator_to_use, INDICATORSTYLE.STRAIGHTBOX, (238,121,159), 0, 255, True)
                              indic_list = [ free_indicator_to_use, 25, 24, 23, 22, 21, 31, 29, 28 ]
                              for i in indic_list: self.clear_all(i)
                              regex = r'(?-s)(notepad|editor)\.(.*?)\(.*?\)'
                              regex = notepad.prompt('Enter regex (just Cancel to clear colors from previous run):', '', regex)
                              if regex is None or len(regex) == 0: return
                              def match_fn(m):
                                  for grp in range(len(m.groups()) + 1):
                                      #print('{g} -> {s} |{text}|'.format(g=grp, s=m.span(grp), text=m.group(grp)))
                                      if grp < len(indic_list):  # we only have a finite number of colors but we could have more groups than that
                                          if m.span(grp)[0] != m.span(grp)[1]:  # don't bother with zero-length groups; or groups not matched: (-1, -1)
                                              self.fill(indic_list[grp], m.span(grp)[0], m.span(grp)[1])
                              editor.research(regex, match_fn)
                      
                          def fill(self, indic, start_pos, end_pos):
                              editor.setIndicatorCurrent(indic)
                              editor.indicatorFillRange(start_pos, end_pos - start_pos)
                      
                          def clear_all(self, indic):
                              editor.setIndicatorCurrent(indic)
                              editor.indicatorClearRange(0, editor.getTextLength())
                      
                          def indicator_set_options(self, indicator_number, indicator_style, rgb_color_tup, alpha, outline_alpha, draw_under_text):
                              for ed in (editor1, editor2):
                                  ed.indicSetStyle(indicator_number, indicator_style)       # e.g. INDICATORSTYLE.ROUNDBOX
                                  ed.indicSetFore(indicator_number, rgb_color_tup)
                                  ed.indicSetAlpha(indicator_number, alpha)                 # integer
                                  ed.indicSetOutlineAlpha(indicator_number, outline_alpha)  # integer
                                  ed.indicSetUnder(indicator_number, draw_under_text)       # boolean
                      
                      if __name__ == '__main__': T19240a()
                      

                      @Fake-Trum Sorry for hijacking your thread a bit.
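The check for zero-length and unmatched groups in `match_fn` can be seen in isolation with Python's standard `re` module, whose match objects report `(-1, -1)` for a group that did not take part in the match. The assumption here is that the match object passed by `editor.research` behaves the same way, as the comment in the script suggests. The helper name `colorable_groups` is made up.

```python
import re

def colorable_groups(m):
    """Return [(group_number, start, end)] for groups with a non-empty span."""
    result = []
    for grp in range(m.re.groups + 1):
        start, end = m.span(grp)
        if start != end:  # skips (-1, -1) as well as empty spans like (0, 0)
            result.append((grp, start, end))
    return result

# Group 1 does not participate in this match, so its span is (-1, -1).
unmatched = re.search(r'(a)|(b)', 'b')
print(unmatched.span(1), colorable_groups(unmatched))

# Group 1 participates but matches the empty string, so its span is (0, 0).
empty = re.search(r'(x*)(y)', 'y')
print(empty.span(1), colorable_groups(empty))
```

Comparing start and end covers both cases with one test, which is why the script needs no separate check for `-1`.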

                      • guy038G
                        guy038
                        last edited by guy038

                        Hello, @alan-kilborn and All,

                        Many thanks for your second try ;-)) As for me, I preferred to slightly color all the group0 zones ! So I used an alpha transparency of 50 instead of 0

                        Here is a regex which enables the 8 possible highlightings :

                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                        (?x)   # Start of Regex   #              Number group         Name of Style         Indicator
                        (\l[\l\r\n]+)\h+\d+\h+    #                 Group 1        Mark Style 1                 25
                        (\l[\l\r\n]+)\h+\d+\h+    #                 Group 2        Mark Style 2                 24
                        (\l[\l\r\n]+)\h+\d+\h+    #                 Group 3        Mark Style 3                 23
                        (\l[\l\r\n]+)\h+\d+\h+    #                 Group 4        Mark Style 4                 22
                        (\l[\l\r\n]+)\h+\d+\h+    #                 Group 5        Mark Style 5                 21
                        (\l[\l\r\n]+)\h+\d+\h+    #                 Group 6        Find Mark Style              31
                        (\l[\l\r\n]+)\h+\d+\h+    #                 Group 7        Smart Highlighting           29
                        (\l[\l\r\n]+)             #                 Group 8        Incremental highlight all    28
                                                  # End of Regex
                        Color = (240,128,160) , Alpha = 40 , Outline Alpha = 255 , StraightBox Style            17
                        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                        

                        Tested against the text below :

                        abcde
                        fghij   012345  abcde
                        fghij   012345  abcde
                        fghij   012345  abcde
                        fghij   012345  abcdefg
                        hij   012345  abc
                        defghij   012345  abc
                        defghij   012345  abc
                        defghij
                        

                        If you click on the ¶ button to visualize all characters, it’s great to see that the highlighting also covers the LF and CR chars, and that the straight box encloses them too when a group contains line-break(s) ;-))

                        45f00723-b829-4fd5-89e9-37289cb07546-image.png
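The eight-group test can be reproduced roughly in Python. This is a sketch with assumptions: `\l` and `\h` belong to Notepad++'s regex syntax, so they are swapped here for `[a-z]` and `[ \t]`, the pattern is built by joining pieces rather than written in `(?x)` form, and the constant names are invented. The indicator numbers come from the table above.

```python
import re

WORD = r'([a-z][a-z\r\n]+)'   # stand-in for (\l[\l\r\n]+)
SEP = r'[ \t]+\d+[ \t]+'      # stand-in for \h+\d+\h+
PATTERN = re.compile(SEP.join([WORD] * 8))  # 8 groups, 7 separators

# Indicator per group, in the order listed in the table above.
INDICATORS = [25, 24, 23, 22, 21, 31, 29, 28]

TEXT = "\n".join([
    "abcde",
    "fghij   012345  abcde",
    "fghij   012345  abcde",
    "fghij   012345  abcde",
    "fghij   012345  abcdefg",
    "hij   012345  abc",
    "defghij   012345  abc",
    "defghij   012345  abc",
    "defghij",
])

match = PATTERN.search(TEXT)
for grp, indic in enumerate(INDICATORS, start=1):
    print(grp, indic, match.span(grp), repr(match.group(grp)))
```

Every group spans a line break, matching the observation that the highlighting covers the end-of-line characters.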


                        Now, we already have a default regex, with the line regex = r'.............' But, Alan (and this is my last request, I promise !), could you add the automatic assignment of the current selection to the regex variable ?

                        I mean something like :

                        If no current main selection THEN regex = notepad.prompt... ELSE regex = current selection ( without any dialog )

                        TIA,

                        Cheers,

                        guy038

                        • Alan KilbornA
                          Alan Kilborn @guy038
                          last edited by Alan Kilborn

                          @guy038 said in Filter the data !!!:

                          Alan (and this is my last request, I promise !), could you add the automatic assignment of the current selection to the regex variable ?

                          Remember, you promised!

                          Here’s the “b” version:

                          # -*- coding: utf-8 -*-
                          
                          # see https://community.notepad-plus-plus.org/topic/19240/filter-the-data
                          # see https://community.notepad-plus-plus.org/topic/14501/has-a-plugin-like-sublime-plugin-brackethighlighter
                          
                          from Npp import editor, editor1, editor2, notepad, INDICATORSTYLE
                          
                          class T19240b(object):
                          
                              def __init__(self):
                                  free_indicator_to_use = 17
                                  self.indicator_set_options(free_indicator_to_use, INDICATORSTYLE.STRAIGHTBOX, (240,128,160), 40, 255, True)
                                  indic_list = [ free_indicator_to_use, 25, 24, 23, 22, 21, 31, 29, 28 ]
                                  for i in indic_list: self.clear_all(i)
                                  if editor.getSelectionEmpty():
                                      regex = r'(?-s)(notepad|editor)\.(.*?)\(.*?\)'  # a regex just for demo purposes
                                      regex = notepad.prompt('Enter regex (just Cancel to clear colors from previous run):', '', regex)
                                  else:
                                      regex = editor.getSelText()
                                  if regex is None or len(regex) == 0: return
                                  def match_fn(m):
                                      for grp in range(len(m.groups()) + 1):
                                          #print('{g} -> {s} |{text}|'.format(g=grp, s=m.span(grp), text=m.group(grp)))
                                          if grp < len(indic_list):  # we only have a finite number of colors but we could have more groups than that
                                              if m.span(grp)[0] != m.span(grp)[1]:  # don't bother with zero-length groups; or groups not matched: (-1, -1)
                                                  self.fill(indic_list[grp], m.span(grp)[0], m.span(grp)[1])
                                  editor.research(regex, match_fn)
                          
                              def fill(self, indic, start_pos, end_pos):
                                  editor.setIndicatorCurrent(indic)
                                  editor.indicatorFillRange(start_pos, end_pos - start_pos)
                          
                              def clear_all(self, indic):
                                  editor.setIndicatorCurrent(indic)
                                  editor.indicatorClearRange(0, editor.getTextLength())
                          
                              def indicator_set_options(self, indicator_number, indicator_style, rgb_color_tup, alpha, outline_alpha, draw_under_text):
                                  for ed in (editor1, editor2):
                                      ed.indicSetStyle(indicator_number, indicator_style)       # e.g. INDICATORSTYLE.ROUNDBOX
                                      ed.indicSetFore(indicator_number, rgb_color_tup)
                                      ed.indicSetAlpha(indicator_number, alpha)                 # integer
                                      ed.indicSetOutlineAlpha(indicator_number, outline_alpha)  # integer
                                      ed.indicSetUnder(indicator_number, draw_under_text)       # boolean
                          
                          if __name__ == '__main__': T19240b()
                          
                          • guy038G
                            guy038
                            last edited by guy038

                            Hi, @Alan-kilborn and All,

                            Alan, this version is just perfect ! Up to now, when building a complicated search regex containing some groups, I used to type this replacement into the Replace dialog, to clearly see the contents of each group :

                            REPLACE \r\n>$1<\r\n>$2<\r\n>$3<\r\n>$4<\r\n>$5<\r\n......>$n<\r\n
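The replacement trick above can be sketched in Python for readers who want to try it outside the Replace dialog. Assumptions: Python's `re.sub` stands in for Notepad++'s replace, `\n` is used instead of `\r\n`, and the function name `show_groups` and the sample text are made up.

```python
import re

def show_groups(regex, text):
    """Replace each match with its groups, one per line, wrapped in > and <."""
    pattern = re.compile(regex)
    # Build "\n>\g<1><\n>\g<2><...\n" for however many groups the regex has.
    template = ''.join('\n>\\g<%d><' % n
                       for n in range(1, pattern.groups + 1)) + '\n'
    return pattern.sub(template, text)

# Hypothetical two-group example.
print(show_groups(r'(\w+)@(\w+)', 'alice@example'))
```

The `>` and `<` delimiters make empty groups and leading or trailing spaces visible, which is the point of the original replacement.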

                            Now, with your script :

                            • Select the regex whose groups, from 1 to 8, you want to see, along with the overall match $0 , for each occurrence in the current file

                            • Execute the last version of your Python script, that I renamed Groups_Highlighter.py, BTW ;-))

                            Much more elegant, isn’t it ?

                            Best Regards

                            guy038

                            P.S. : Two more points :

                            • Out of curiosity, what means, exactly, the syntax T19240 ?

                            • I tried to change the draw_under_text value from True to False. But I did not see any difference ?!

                            Thanks, again, Alan, for this valuable script :-))

                            • Alan KilbornA
                              Alan Kilborn @guy038
                              last edited by Alan Kilborn

                              @guy038

                              I renamed Groups_Highlighter.py

                              I called my copy ColorizeRegex.py but to each his own!

                              what means, exactly, the syntax T19240 ?

                              It’s the topic id of this thread in the forum! :-)
                              This is a Peter-ism. :-)

                              tried to change the draw_under_text value from True to False. But I did not see any difference ?!

                              Not sure, it was in the code I stole from the earlier referenced thread, about bracket-highlighting.
                              I don’t think I fully follow the DOCS about it, either.

                              • Alan KilbornA
                                Alan Kilborn @Alan Kilborn
                                last edited by

                                @guy038

                                Of course, with this new script, aren’t we somewhat reinventing a four-year-old WHEEL ??

                                I’m sure Peter will be scanning your tabs in your screenshot HERE looking for interesting things. :-)

                                • guy038G
                                  guy038
                                  last edited by

                                  Hi, @alan-kilborn,

                                  Yes, I also remember @claudia-frank’s regex_tester script. However, it does not behave the same way as your script !

                                  As far as I can remember, it just used two colors for two consecutive groups + a third color for the overall match. So, if your regex contained, for instance, 3 groups => groups 1 and 3 were highlighted with the first color, the group 2 with the second color and the overall regex/occurrence with the third color !

                                  And I think that your script, with a different color for each group, is quite interesting, too !


                                  Ah… too late, I promised ! I just forgot the case when two $0 matches are consecutive: then the straight-boxes are joined and no separation appears to show where the boundary between the two occurrences is !

                                  I will survive this ;-))

                                  BR

                                  guy038

                                  • Alan KilbornA
                                    Alan Kilborn @guy038
                                    last edited by

                                    @guy038 said in Filter the data !!!:

                                    Ah… too late, I promised ! I just forgot the case when two $0 regexes are consecutive. then the straight-boxes are joined and no separation appears

                                    It’s a critical bug, not a new feature request. I will work on it.

                                    • PeterJonesP
                                      PeterJones @Alan Kilborn
                                      last edited by

                                      @Alan-Kilborn said in Filter the data !!!:

                                      I’m sure Peter will be scanning your tabs in your screenshot HERE looking for interesting things. :-)

                                      Well, I’m mildly surprised. One, because I’d forgotten I’d passed my four-year anniversary in December. Two, because I had only about 8 posts (if the search for my posts, sorted by ascending date doesn’t miss any). He had replied once or twice to me in that timeframe, but I’m surprised I was “on his radar” yet – at least enough to save a tab for that long.

                                      While looking at the early posts, I was amused to see me say, in this Jan 2016 post,

                                      I am not a Notepad++ expert

                                      I don’t think I can rightly claim that anymore. :-)

                                      • guy038G
                                        guy038
                                        last edited by guy038

                                        Hello, @alan-kilborn, @peterjones and All,

                                        I first joined the Notepad++ forum, on SourceForge.net, on May 08 2013. Then, from Jun 24 2015, as others, I migrated to our NodeBB forum

                                        I usually save each of my posts in a simple .txt file, in a specific folder, naming the file after the OP. So, at any moment, I keep tabs open for my recent posts because, sometimes, the OP does not answer immediately !

                                        Of course, I should close some of these tabs daily, when the OP has either solved the problem or not replied after a while ! But I have to admit that I do not apply myself to this task daily, only from time to time, which explains the numerous tabs in my session !

                                        However, and this seems obvious, in your case, Alan and Peter, and some others, you are quite active on our forum. Therefore, I simply keep your tabs open permanently ;-))

                                        Up to now, according to Users > Most Reputation, I have created 2,361 posts. The specific folder holding all my saved posts contains 1,265 text files. Let’s say that a couple of them are my own : this means that I created about 1.87 posts per OP ;-)) ( 2,361 / 1,260 )

                                        Best Regards,

                                        guy038

                                        • guy038G
                                          guy038
                                          last edited by

                                          Hi, @alan-kilborn,

                                          Regarding the issue of consecutive $0 ranges of text :

                                          Maybe by using two different styles, let’s say 17 and 18 ( if free, of course ), and alternating between them for each successive $0 occurrence ?

                                          BR

                                          guy038

                                          • Alan KilbornA
                                            Alan Kilborn @guy038
                                            last edited by

                                            @guy038 said in Filter the data !!!:

                                            Maybe by using two different styles, let’s say 17 and 18 ( if free, of course ), and alternating between them for each successive $0 occurrence ?

                                            Exactly what I had in mind, I just have to find a bit of free time to do it. :-)

                                            17 and 18 ( if free, of course )

                                            I believe these are “free” on a default system, but of course, it is worth pointing out that if others happen to be using these, and still want to use this script, they should alter the numbers.
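The alternating-style idea for consecutive $0 matches could look roughly like this. It is a sketch only, not the promised update to the script: it uses `re.finditer` instead of `editor.research`, returns the assignments instead of filling indicators, and the function name `assign_box_indicators` is made up. Indicators 17 and 18 are the numbers suggested above and are assumed to be free.

```python
import re

def assign_box_indicators(regex, text, indicators=(17, 18)):
    """Return [(indicator, start, end)] with indicators alternating per match."""
    boxes = []
    for m in re.finditer(regex, text):
        if m.start() == m.end():
            continue  # nothing to box for an empty match
        # Alternate on the count of boxes actually emitted, so that two
        # neighbouring boxes never share an indicator.
        indic = indicators[len(boxes) % len(indicators)]
        boxes.append((indic, m.start(), m.end()))
    return boxes

# '12' and '34' touch each other, so they must not share an indicator.
for box in assign_box_indicators(r'\d\d', '1234 56'):
    print(box)
```

In the script, each indicator would still need its own `indicator_set_options` call, possibly with a different colour so the boundary between touching boxes is visible.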
