
    Testing regular expressions by RegexTester.py

    • Claudia FrankC
      Claudia Frank
      last edited by Claudia Frank

      Hello,

      this is a little Python script which can be used to TEST regular expressions,
      hence the name RegexTester, by providing immediate visual results. It is best used when
      the document to be scanned has just a few lines - enough to test the regex.

      To make this work, the following needs to be done:

      1. The Python Script plugin needs to be installed, of course.

      2. Once it is installed, run

        • Plugins->Python Script->New Script
        • give it a meaningful name like RegexTester.py
        • copy the script listed here
        • save it.
      3. Open a file which should be searched by using regular expressions

      4. Open a new empty document and move it to the other view (View->Move…->Move to other view).
        You can use the views side by side; I prefer to have them as top and bottom.

      Alt Screenshot1

      5. Run Plugins->Python Script->Scripts->NAME_OF_YOUR_SCRIPT

      How it should be used:

      You’ll see that the second view shows a status message in line 3.
      It is important that you don’t delete or modify this line, otherwise you lose the
      information about whether the regex tester is active and in which document it has been
      activated. Only one place may be modified: the [i] can be changed to [I] to toggle case sensitivity.

      Alt Screenshot2

      Line 1 of the second view is where you write your regular expression.
      Line 2 (not yet used - for future use)
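As a rough illustration of this layout, the sketch below reads the pattern from the first line and the case flag from the third line of a control text. It runs with plain Python `re` outside Notepad++; `control_text` and `read_control` are hypothetical names, and the flag position (character index 22 of the status line) is taken from the first version of the script in this post.

```python
import re

# Hypothetical stand-in for the text of the second view (editor2):
# line 1 = regex, line 2 = unused, line 3 = status message.
control_text = "a+b\r\n\r\nRegexTester isActive [I] i=sensitive, I=insensitive"


def read_control(text):
    """Return (pattern, flags) from a three-line control text."""
    lines = text.split("\r\n")
    pattern = lines[0].rstrip()          # the regex, without line endings
    flag_char = lines[2][22:23]          # the letter between the brackets
    flags = re.IGNORECASE if flag_char == "I" else 0
    return pattern, flags


pattern, flags = read_control(control_text)
matches = [m.group(0) for m in re.finditer(pattern, "AAB aab ab", flags)]
print(matches)
```

Any letter other than a capital I in that position leaves the search case sensitive, which is why the status line must not be shifted or deleted.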

      While you type your regex, the document in editor1 is continuously updated to show the matches.

      If there are sub matches, the “main” match gets colored without outlining, while the sub matches are outlined.
      To tell consecutive “main” matches apart, they get alternating colors.

      Example:

      Alt Screenshot3

      Alt Screenshot4
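The highlighting rule can be sketched outside Notepad++ as a function that lists the ranges the first version of the script would fill. This is only an approximation under stated assumptions: it uses Python's `re` instead of `editor1.research`, `indicator_ranges` is a hypothetical helper, and the numbers 8, 9 and 10 are the indicator ids used in the script below. Unlike the script, it skips groups that did not take part in the match.

```python
import re


def indicator_ranges(pattern, text):
    """Return (indicator, start, length) triples for every match in text."""
    ranges = []
    is_odd = False  # alternates the color of consecutive whole matches
    for m in re.finditer(pattern, text):
        if m.lastindex:  # the pattern has groups: whole match plus sub matches
            ranges.append((9 if is_odd else 10, m.start(0), m.end(0) - m.start(0)))
            is_odd = not is_odd
            for i in range(1, m.lastindex + 1):
                if m.start(i) != -1:  # skip groups that did not participate
                    ranges.append((8, m.start(i), m.end(i) - m.start(i)))
        else:  # no groups: the whole match uses the sub match indicator
            ranges.append((8, m.start(0), m.end(0) - m.start(0)))
    return ranges


for entry in indicator_ranges(r"(\d+)-(\d+)", "10-20 and 3-4"):
    print(entry)
```

Each triple corresponds to one `setIndicatorCurrent` / `indicatorFillRange` pair in the script.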

      The script was inspired by the regex101.com site and the old regex tester plugin.

      Normally, the script should use the same regex syntax as npp does, but if you find
      something which can be done using npp but not using the script, please
      let me know. The code itself is commented as well.

      One word of warning - I wouldn’t use it to scan big files, it might take
      some time before the document gets colored.

      Enough said, I guess - so, have fun ;-)

      import re                                                                                   # import regular expression module
      
      editor1.indicSetStyle(10,INDICATORSTYLE.CONTAINER)                                          # used to color whole match - odd lines
      editor1.indicSetFore(10,(95,215,184))                                                       # the color
      editor1.indicSetAlpha(10,55)                                                                # alpha settings
      editor1.indicSetOutlineAlpha(10,255)                                                        # outlining
      editor1.indicSetUnder(10,True)                                                              # draw under the text
      
      editor1.indicSetStyle(9,INDICATORSTYLE.CONTAINER)                                           # used to color whole match - even lines
      editor1.indicSetFore(9,(195,215,184))
      editor1.indicSetAlpha(9,55)
      editor1.indicSetOutlineAlpha(9,255)
      editor1.indicSetUnder(9,True) 
      
      editor1.indicSetStyle(8,INDICATORSTYLE.ROUNDBOX)                                            # used for sub matches
      editor1.indicSetFore(8,(100,215,100))
      editor1.indicSetAlpha(8,55)
      editor1.indicSetOutlineAlpha(8,255)
      editor1.indicSetUnder(8,True) 
      
      isOdd = False                                                                               # used as even/odd line identifier
      
      def match_found(m):
      
          global isOdd                                                                            # global, because we modify it
          if m.lastindex > 0:                                                                     # >0 = how many submatches do we have
              for i in range(0, m.lastindex + 1):                                                 # loop over it
                  if i == 0:                                                                      # match 0 is always the whole match
                      editor1.setIndicatorCurrent(9 if isOdd else 10)                             # set indicator for whole match
                      editor1.indicatorFillRange(m.span(0)[0], m.span(0)[1] - m.span(0)[0])       # draw indicator
                      isOdd = False if isOdd else True                                            # set even/odd identifier - next whole match gets coloured different
                  else:
                      editor1.setIndicatorCurrent(8)                                              # set indicator for sub matches
                      editor1.indicatorFillRange(m.span(i)[0], m.span(i)[1] - m.span(i)[0])       # draw it
                      
          else:                                                                                   # no sub matches    
              editor1.setIndicatorCurrent(8)                                                      # set the same indicator as normally used in sub matches
              editor1.indicatorFillRange(m.span(0)[0], m.span(0)[1] - m.span(0)[0])               # guess what :-) yes, draw it
      
      def clear_indicator():                                                                      # clear all indicators by
          length = editor1.getTextLength()                                                        # calculating length of document
          for i in range(8,11):                                                                   # and looping over
              editor1.setIndicatorCurrent(i)                                                      # each indicator to
              editor1.indicatorClearRange(0,length)                                               # clear the range
          
      def regex():                                                                                # here the regex starts
          
          clear_indicator()                                                                       # first have a clear view ;-)
          
          pattern = editor2.getLine(0).rstrip()                                                   # next, get the pattern for the second view and cut of line endings
      
          try:                                                                                    # try it
              if editor2.getLine(2)[22:23] == 'I':                                                # is it a case insensitive search?
                  editor1.research(pattern, match_found, re.IGNORECASE)                           # then call research with the ignore case flag
              else:                                                                               # otherwise
                  editor1.research(pattern, match_found)                                          # call without flag
          except:
              pass                                                                                # is needed to catch incorrect regular expressions
              
      def RegexTester_CHARADDED(args):                                                            # callback which gets called each time when char is added in editor
          regex()                                                                                 # calls itself regex function
      
      def RegexTester_UPDATEUI(args):                                                             # callback gets called and emulates a CHARDELETE notification
          if args['updated'] == 3:                                                                # is a bit of a hack but
              regex()                                                                             # seems to work
              
      if editor2.getProperty('RegexTester_running') != '1':                                       # if the script isn't currently running
          editor.callback(RegexTester_CHARADDED, [SCINTILLANOTIFICATION.CHARADDED])               # register the callbacks charadd
          editor.callback(RegexTester_UPDATEUI, [SCINTILLANOTIFICATION.UPDATEUI])                 # and emulated chardelete
          if editor2.getProperty('RegexTester_running') == '0':                                   # this checks if script was already running, stopped and restarted again
              editor2.replace('RegexTester inActive', 'RegexTester isActive')                     # add the status info to second view
          else:                                                                                   # no, this is the first time we run the script so
              editor2.appendText('\r\n\r\nRegexTester isActive [i] i=sensitive, I=insensitive')   # add the status info to second view
          editor2.setProperty('RegexTester_running', '1')                                         # and set the running identifier
          editor2.setFocus(True)                                                                  # give the second view the focus
          editor2.gotoLine(0)                                                                     # and jump to line 1
      
      else:                                                                                       # the script runs already so this call is used to
          editor.clearCallbacks([SCINTILLANOTIFICATION.CHARADDED])                                # clear the callback charadded and
          editor.clearCallbacks([SCINTILLANOTIFICATION.UPDATEUI])                                 # emulated chardeleted
          editor2.setProperty('RegexTester_running', '0')                                         # set info that script isn't running
          editor2.replace('RegexTester isActive', 'RegexTester inActive')                         # add the status info to second view
          clear_indicator()                                                                       # clear all indicators
          editor1.setFocus(True)                                                                  # and give first view the focus. Have fun 
      

      Cheers
      Claudia

      • guy038G
        guy038
        last edited by guy038

        Hello Claudia,

        I had overlooked your awesome Python script Regex Tester.py!! After some tests, here are some observations:

        • First of all, how do I stop your script?! Maybe I'm missing something obvious!

        • Once, after an N++ restart, I ran Regex Tester.py, but I had forgotten to first open a new document and move it to the secondary view. When I did that afterwards, I was surprised to get two tabs named New 1 in the secondary view!

        • When using the end-of-line syntax (\R, \r or \n) in a regex, the line endings are not highlighted, even if the Show All Characters button is set. However, the regex .\R. does highlight the last character of a line and the first character of the next line :-)

        • Let’s consider the subject string aabaaababbbaabbab. In the picture below, just under each regex, I indicated, first, the way your script highlights and outlines the matches, then, just below, a proposed new highlighting and outlining. What do you think, Claudia? I don’t even know if it’s technically possible and not too hard to code!

        Image

        But please, Claudia, take all the time you need, as it seems that you do a hundred things at the same time!!

        Cheers,

        guy038

        • Claudia FrankC
          Claudia Frank
          last edited by Claudia Frank

          Hi Guy,

          thank you for your kind words.

          First of all, how do I stop your script?! Maybe I'm missing something obvious!

          Obviously I missed that :-(
          Make sure that editor2, with the tab showing RegexTester isActive…, has the focus and then run the script a second time.
          The text will change to RegexTester inActive…
          If you accidentally run it in another tab a second time, you need to call it in that tab again.
          This isn’t really user friendly and I’m thinking about a solution where the first run of the script activates and the second run deactivates, regardless
          of where you execute it.

          Once, after an N++ restart, I ran Regex Tester.py, but I had forgotten to first open a new document and move it to the secondary view.
          When I did that afterwards, I was surprised to get two tabs named New 1 in the secondary view!

          Ooopss - don’t think that my script is responsible but will take a look

          When using the end-of-line syntax (\R, \r or \n) in a regex, the line endings are not highlighted, even if the Show All Characters button is set.
          However, the regex .\R. does highlight the last character of a line and the first character of the next line :-)

          Yes, I discovered this as well. It looks like scintilla/npp doesn’t allow me to color/access it.

          Let’s consider the subject string aabaaababbbaabbab. In the picture below, just under each regex, I indicated, first, the way your script highlights and outlines the matches, then, just below, a proposed new highlighting and outlining. What do you think, Claudia? I don’t even know if it’s technically possible and not too hard to code!

          Obviously I cannot divide the letters but I see what you mean, I’ll think about it, should be possible.

          I saw your reply on the regex /v topic - damn, I had missed that /v and [/v] have different meanings. Thx for clarifying it - AGAIN. ;-)

          Cheers
          Claudia

          • guy038G
            guy038
            last edited by

            Hello Claudia,

            OK, after giving focus to the new 1 tab again and choosing Plugins - Python Script - Run Previous Script (Regex Tester), I got, as expected, the text RegexTester inActive… and all the highlighting was removed! But why is the option Plugins - Python Script - Stop script greyed out?


            To reproduce the issue with the new tabs:

            • Run the Regex Tester python script first (menu Plugins - Python Script - Scripts - Regex Tester)

            • Open a new tab (menu File - New, or the CTRL + N shortcut)

            • Right-click on that new tab and choose, in the context menu, the option Move to Other view

            => Two tabs named new 1 are displayed!


            BTW, the message :

            RegexTester isActive [i] i=sensitive, I=insensitive
            

            could be changed into :

            RegexTester isActive [S] S=Sensitive, I=Insensitive
            

            without changing any code :-))

            Cheers,

            guy038

            • Claudia FrankC
              Claudia Frank
              last edited by

              Hi Guy,

              in regards to the two new1 documents, npp does this in the background.
              When you start npp with only one view open, npp already has opened a new1 document in second view.
              If you don’t access it, it gets deleted/replaced by the one which you move to the second view.

              E.g.
              A fresh start of npp with one view results in one visible document named new1.
              Another new1 document exists in the second view but is currently invisible.
              If you open another new document, new2 appears, but if you move it to the second view,
              the hidden new1 in the second view gets replaced by new2, provided it hasn’t been touched in the meantime.
              Not sure if this is expected, but it does no harm anyway.

              In regards to the sensitive switch, yes, you could use any letter, as I’m checking for I (capital i) only ;-)

              Currently preparing a v2 of RegexTester, with your comments ;-)

              Cheers
              Claudia

              • Claudia FrankC
                Claudia Frank
                last edited by Claudia Frank

                Hi Guy and all,

                Version2 of the RegexTester.
                Improvements based on Guy’s comments.

                1. If the second view isn’t active, a message pops up complaining about this and the start is aborted.

                2. To stop the script, you can execute it a second time, regardless of which view is active.

                3. Coloring/grouping changed - based on an even/odd differentiation.

                4. Minor changes.

                  import re  # import regular expression module

                  editor1.indicSetStyle(10,INDICATORSTYLE.ROUNDBOX)  # alternating color 1 - sub matches, or whole matches without groups
                  editor1.indicSetFore(10,(95,215,184))  # the color
                  editor1.indicSetAlpha(10,55)  # alpha settings
                  editor1.indicSetOutlineAlpha(10,255)  # outlining
                  editor1.indicSetUnder(10,True)  # draw under the text

                  editor1.indicSetStyle(9,INDICATORSTYLE.ROUNDBOX)  # alternating color 2 - sub matches, or whole matches without groups
                  editor1.indicSetFore(9,(195,215,184))
                  editor1.indicSetAlpha(9,55)
                  editor1.indicSetOutlineAlpha(9,255)
                  editor1.indicSetUnder(9,True)

                  editor1.indicSetStyle(8,INDICATORSTYLE.CONTAINER)  # used for the whole match if it has sub matches
                  editor1.indicSetFore(8,(100,215,100))
                  editor1.indicSetAlpha(8,55)
                  editor1.indicSetOutlineAlpha(8,255)
                  editor1.indicSetUnder(8,True)

                  isOdd = False  # used as even/odd identifier

                  def match_found(m):

                      global isOdd  # global, because we modify it

                      if m.lastindex > 0:  # >0 = how many submatches do we have
                          for i in range(0, m.lastindex + 1):  # loop over it
                              if i == 0:  # match 0 is always the whole match
                                  editor1.setIndicatorCurrent(8)  # set indicator for whole match
                                  editor1.indicatorFillRange(m.span(i)[0], m.span(i)[1] - m.span(i)[0])  # draw it
                              else:
                                  editor1.setIndicatorCurrent(9 if isOdd else 10)  # set indicator for sub matches
                                  editor1.indicatorFillRange(m.span(i)[0], m.span(i)[1] - m.span(i)[0])  # draw indicator
                                  isOdd = not isOdd  # toggle even/odd identifier - next sub match gets coloured different

                      else:  # no sub matches
                          editor1.setIndicatorCurrent(9 if isOdd else 10)  # set indicator for matches
                          editor1.indicatorFillRange(m.span(0)[0], m.span(0)[1] - m.span(0)[0])  # draw indicator
                          isOdd = not isOdd  # toggle even/odd identifier


                  def clear_indicator():  # clear all indicators by
                      length = editor1.getTextLength()  # calculating length of document
                      for i in range(8,11):  # and looping over
                          editor1.setIndicatorCurrent(i)  # each indicator to
                          editor1.indicatorClearRange(0,length)  # clear the range


                  def regex():  # here the regex starts

                      clear_indicator()  # first have a clear view ;-)

                      pattern = editor2.getLine(0).rstrip()  # next, get the pattern from the second view and cut off line endings

                      try:  # try it
                          if editor2.getLine(2)[22:23] == 'i':  # is it a case insensitive search?
                              editor1.research(pattern, match_found, re.IGNORECASE)  # then call research with the ignore case flag
                          else:  # otherwise
                              editor1.research(pattern, match_found)  # call without flag
                      except:
                          pass  # is needed to catch incorrect regular expressions


                  def RegexTester_CHARADDED(args):  # callback which gets called each time a char is added in editor
                      regex()  # calls the regex function


                  def RegexTester_UPDATEUI(args):  # callback gets called and emulates a CHARDELETE notification
                      if args['updated'] == 3:  # is a bit of a hack but
                          regex()  # seems to work


                  def checkIfSecondViewActive():  # self-explanatory
                      if notepad.getCurrentView() == 0:
                          notepad.messageBox('It is needed to have the second view active!!','RegexTester',0)
                          return False
                      else:
                          return True


                  def startRegexTester():  # start procedures
                      editor.callback(RegexTester_CHARADDED, [SCINTILLANOTIFICATION.CHARADDED])  # register the callbacks charadd
                      editor.callback(RegexTester_UPDATEUI, [SCINTILLANOTIFICATION.UPDATEUI])  # and emulated chardelete

                      wasAlreadyRunning = 1 if console.editor.getProperty('RegexTester_running') == '0' else 0  # was the script run before and stopped again?
                      inputTab = console.editor.getProperty('RegexTester_inputTab')  # do we have the bufferid of the previous run saved?

                      if inputTab == '':
                          newTabActive = 0
                          inputTab = notepad.getCurrentBufferID()  # get bufferid from active tab to
                          console.editor.setProperty('RegexTester_inputTab', str(inputTab))  # remember where the input is done (properties are strings)
                      else:
                          notepad.activateBufferID(int(inputTab))
                          newTabActive = 0 if notepad.getCurrentBufferID() == int(inputTab) else 1  # has old tab been closed?

                      if wasAlreadyRunning == 1 and newTabActive == 0:  # restarted in the same input tab
                          editor2.replace('RegexTester inActive', 'RegexTester isActive')  # update the status info in second view
                      else:  # no, this is the first time we run the script so
                          editor2.appendText('\r\n\r\nRegexTester isActive [s] s=sensitive, i=insensitive')  # add the status info to second view

                      console.editor.setProperty('RegexTester_running', '1')  # and set the running identifier
                      editor2.setFocus(True)  # give the second view the focus
                      editor2.gotoLine(0)  # and jump to line 1


                  def stopRegexTester():  # stop procedures

                      editor.clearCallbacks([SCINTILLANOTIFICATION.CHARADDED])  # clear the callback charadded and
                      editor.clearCallbacks([SCINTILLANOTIFICATION.UPDATEUI])  # emulated chardeleted
                      console.editor.setProperty('RegexTester_running', '0')  # set info that script isn't running

                      inputTab = console.editor.getProperty('RegexTester_inputTab')  # get the bufferid of the input tab

                      if notepad.getCurrentBufferID() != int(inputTab):  # is it currently active? if not,
                          notepad.activateBufferID(int(inputTab))  # activate it

                      editor2.replace('RegexTester isActive', 'RegexTester inActive')  # update the status info in second view
                      clear_indicator()  # clear all indicators
                      editor1.setFocus(True)  # and give first view the focus. Have fun


                  if console.editor.getProperty('RegexTester_running') == '1':  # if the script is currently running
                      stopRegexTester()  # stop RegexTester
                  else:  # else
                      if checkIfSecondViewActive():  # check if second view is the active one
                          startRegexTester()  # start RegexTester

                Cheers
                Claudia

                • Claudia FrankC
                  Claudia Frank @guy038
                  last edited by

                  Hi @guy038

                  just saw that I didn’t answer this question

                  But why is the option Plugins - Python Script - Stop script greyed out?

                  Technically the script starts, registers its callbacks and ends. Done.
                  The reason why you still see changes while typing the regexes is
                  that the callbacks get executed by the Python Script plugin.
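A minimal sketch of this behaviour, runnable outside Notepad++: the "script" only registers a callback and returns, yet the callback keeps firing until it is cleared. `FakeEditor`, its `fire` method and the string notification names are hypothetical stand-ins for the plugin's editor object and its `SCINTILLANOTIFICATION` values; only the method names `callback` and `clearCallbacks` are taken from the script above.

```python
class FakeEditor:
    """Hypothetical stand-in for the plugin's editor object."""

    def __init__(self):
        self._callbacks = {}

    def callback(self, func, notifications):
        # Remember func for each notification type.
        for n in notifications:
            self._callbacks.setdefault(n, []).append(func)

    def clearCallbacks(self, notifications):
        # Forget every callback registered for these notification types.
        for n in notifications:
            self._callbacks.pop(n, None)

    def fire(self, notification, args):
        # In this sketch, the plugin (not the script) would call this.
        for func in list(self._callbacks.get(notification, [])):
            func(args)


calls = []


def on_char_added(args):
    calls.append(args["ch"])


def run_script(editor):
    # The whole "script": register and return immediately.
    editor.callback(on_char_added, ["CHARADDED"])


fake = FakeEditor()
run_script(fake)                       # script has ended here
fake.fire("CHARADDED", {"ch": "a"})    # callback still runs
fake.fire("CHARADDED", {"ch": "b"})
fake.clearCallbacks(["CHARADDED"])     # what the second run of the script does
fake.fire("CHARADDED", {"ch": "c"})    # nothing happens any more
print(calls)
```

Since no script is running between notifications, there is nothing for Stop script to stop.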

                  Cheers
                  Claudia

                  • guy038G
                    guy038
                    last edited by

                    Hi, Claudia and All,

                    I had plenty to do at work this week and couldn’t find enough time and motivation to be on the forums! Just yesterday, I was rebuilding a server until 9:30 pm; its hard disk was definitely dead, with no possibility of restoring it! (So please don’t forget to back up your important files from time to time! One never knows!)

                    Let’s get back, Claudia, to the second version of your Regex Tester Python script. Awesome, you did it! Not only does it work perfectly well, but I suppose that Python’s re module doesn’t have the issues that our present Boost regex version has :-)) In the end, your script behaves exactly as the unofficial François-R Boyer regex engine does :-)


                    For instance, if we consider the subject string aabaaababbbaabbab from my previous post and the regex (?<!a)ba*, the correct results are:

                    • 1st match b, at position 10
                    • 2nd match baa, at positions 11, 12 and 13
                    • 3rd match ba, at positions 15 and 16

                    Those are exactly the matches found with your script, as shown below:

                    Image1

                    With the classical regex search, we get 5 matches. But two of them are wrong: the b at position 14 and the b at position 17, which are both preceded by an a.
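The three expected matches can be checked with a few lines of plain Python. This sketch uses Python's own `re` module, which is an assumption here: it may not be the engine that `editor1.research` uses inside the plugin. Positions are converted to the 1-based numbering used in the list above.

```python
import re

subject = "aabaaababbbaabbab"

# A b that is not preceded by an a, followed by any number of a's.
matches = [(m.start() + 1, m.group(0))
           for m in re.finditer(r"(?<!a)ba*", subject)]

for position, text in matches:
    print(position, text)
```

The b characters at positions 14 and 17 are skipped because the look-behind sees the a at positions 13 and 16.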


                    A second example. At the link below, you’ll see the 40 characters of the Osmanya alphabet, in the range [\x{10480}-\x{104AF}], which is, obviously, outside the Unicode Basic Multilingual Plane (BMP).

                    http://www.unicode.org/charts/PDF/U10480.pdf

                    With an appropriate font (Andagii) set as the Default Style of Global Styles, you’ll see in the picture below that the regex [\x{10485}-\x{104A3}] does find the correct consecutive characters with the Regex Tester script, UNLIKE the classical regex search, which leads to the error message Invalid regular expression :-((

                    Image2
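For comparison, here is how a character class outside the BMP behaves in Python's own `re` module. This is a sketch under assumptions: it needs Python 3, and Python's `re` does not accept the `\x{10485}` notation, so the code points are written as `\U00010485` string escapes instead. The test string simply contains every code point of the block U+10480 to U+104AF, assigned or not.

```python
import re

# Every code point of the Osmanya block, U+10480 .. U+104AF.
block = "".join(chr(cp) for cp in range(0x10480, 0x104B0))

# Character class from U+10485 to U+104A3, written with Python escapes.
found = re.findall("[\U00010485-\U000104A3]", block)

print(len(found), hex(ord(found[0])), hex(ord(found[-1])))
```

Each element of `found` is a single code point, so the range is treated as whole characters rather than as surrogate halves.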

                    Cheers,

                    guy038

                    • Claudia FrankC
                      Claudia Frank
                      last edited by Claudia Frank

                      Hi Guy,

                      many, many thanks for all your tests and effort. Very much appreciated.
                      Unfortunately, your tests mean that my script has failed at its purpose, as I was expecting to use
                      it to test functionlist more easily. :-(
                      (I had already discovered that functionlist regexes behave strangely sometimes, but now… - YOUUUU broke it ;-))) kidding.

                      In regards to the François-R Boyer regex implementation, I think I’m on the right track.
                      Currently, boost regex supports two ways of implementing Unicode awareness:
                      relying on wchar_t, which is how Don implemented it, and using Unicode-aware regular expression types, as François did.

                      At the moment, I don’t see how I could reasonably merge both code bases, which is why I started using François’s code to replace Don’s implementation.

                      Cheers
                      Claudia

                      • guy038G
                        guy038
                        last edited by guy038

                        Hello Claudia,

                        I did the tests from my previous post with version 6.8.8, where I had previously installed the Python plugin. I decided to check whether search/replace (S/R) is faster with the François-R Boyer version on the latest N++ version, 6.9.1. And there’s bad news, indeed!

                        The François-R Boyer regex engine, included in his SciLexer.dll version, does NOT work with the latest 6.9.1 version of N++ :-((

                        I verified that it’s OK with the 6.9 version and the previous versions of Notepad++.

                        Why ?!

                        Cheers,

                        guy038

                        • Claudia FrankC
                          Claudia Frank
                          last edited by

                          Hi Guy,

                          sorry for answering so late - I had a day off - mostly bicycling and enjoying the nice weather.
                          I thought this lib had stopped working when scintilla was upgraded!!??
                          Because of that I didn’t test it - I will give the original code a try and see what it is complaining about.

                          Will come back to this.

                          Cheers
                          Claudia
