Regex for function list in user defined language



  • Hi all,
    I would like to customise the function list for a user-defined language, but I know nothing about regex. Can someone give me the regex to find the functions in my language? It is quite simple:

    define method .mymethod(arg1 as string, arg2 as string) as string

    It always starts with “define method”, and the name of the method starts with a “.”. It may or may not have arguments, and it may or may not return a type.

    What would be the regex for the mainExpr? And the regex to retrieve the name of the method?

    Thanks for your help



  • Hello Julien,

    If I understood you correctly, the four main constructions below should all be valid syntax:

    define method .mymethod(arg1 as string, arg2 as string) as string  (A)
    
    define method .mymethod(arg1 as string, arg2 as string)            (B)
    
    define method .mymethod as string                                  (C)
    
    define method .mymethod                                            (D)
    

    Now, starting from form (A), are the following variants, which contain extra spaces, still acceptable?

    (Note: each line differs from form (A) by only ONE modification!)

         define method .mymethod(arg1 as string, arg2 as string) as string    (E) 
    
    define    method .mymethod(arg1 as string, arg2 as string) as string      (F)
    
    define method .mymethod    (arg1 as string, arg2 as string) as string     (G)
    
    define method .mymethod(   arg1 as string, arg2 as string) as string      (H)
    
    define method .mymethod(arg1 as string, arg2 as string    ) as string     (I)
    
    define method .mymethod(arg1 as string, arg2 as string)as string          (J)
    
    define method .mymethod(arg1 as string       , arg2 as string) as string  (K)
    
    define method .mymethod(arg1 as string,arg2 as string) as string          (L)
    

    Some other questions:

    • May these syntaxes contain tab characters instead of spaces? (M)

    • May these syntaxes contain uppercase letters? (N)

    • May these syntaxes span several lines? (O)

    • What are the possible syntaxes of an argument? I mean, word characters (letters, digits and underscore) or only letters? May they contain other signs?


    For all these questions except the last one, just tell me the capital letters, in round brackets, that correspond to correct or acceptable syntax!

    Of course, I’ve already found general regexes that match both your mainExpr and the name of the method. However, with this information, it will be easier to build the right regex for your exact needs :-)

    Many thanks for your reply,

    See you soon,

    Best regards,

    guy038



  • Hi Guy, thanks a lot for your time.
    I have tested all your propositions (E) to (L), and they are all acceptable.
    Note that (C) and (D) are not valid, because “()” is required even when there are no arguments.
    (M) Yes, any space or tab is acceptable.
    (N) Any keyword can be written in upper- or lowercase; it is not case sensitive (e.g. “DeFiNe MeThOd … aS sTrIng” is valid).
    (O) It may be written on several lines with a special instruction, but I have never seen it, so let’s say it is only one line.
    (P) Syntax of arguments: they are always prefixed with one ! character.
    Thanks



  • Hi, Julien,

    Thanks for your quick reply !

    OK. So, finally, the four main constructions below should be valid syntax:

    define method .mymethod(!arg1 as string, !arg2 as string) as string  (A)
    
    define method .mymethod(!arg1 as string, !arg2 as string)            (B)
    
    define method .mymethod () as string                                 (C)
    
    define method .mymethod ()                                           (D)
    

    By the way, I suppose that the expression ‘as string’ can be taken literally in the future regex!

    Is this the only possible type? I mean, could it be as integer or something else?

    Is the syntax below, which I forgot in my previous post, also acceptable:

    define method     .mymethod(!arg1 as string, !arg2 as string) as string (R)
    

    And, generally speaking, may the name of the method .mymethod, and the names of the arguments after the ! character, contain only letters, digits and/or underscores?

    If possible, just provide some real examples of your code, especially of the define method instruction :-)

    Cheers,

    guy038



  • Wow, I hadn’t realised how much detail is needed to build this kind of regex! It seems very complicated. I think NPP should make this more user-friendly; you don’t have to do all that in UltraEdit.
    Anyway, thanks for your help. Here is the additional info:
    ‘as string’ is not the only possibility; the type can be anything, since we can use any class. And, by the way, my mistake: we don’t use ‘as’ but ‘is’.
    Method and argument names can contain digits, but must not start with one. Underscores are valid in names.

    Some real examples:

    define method .errorLogAppend(!severity is STRING, !error is STRING, !operation is STRING, !command is STRING, !line is STRING)
    define method .comPropData(!element is DBREF, !editFormTitle is STRING, !queryFormTitle is STRING)
    define method .getPlotFile() is STRING
    define method .displayDimension(!value is REAL, !show is BOOLEAN)
    define method .deleteAllDims()
    define method .currentValues(!type is STRING, !keys is ARRAY, !values is ARRAY)
    define method .getVValuesString(!data is DBREF) is ARRAY



  • Hello Julien,

    First of all, don’t worry about regular expressions. With the same regex engine, they are no more difficult in one text editor than in another! It’s just that I’m a bit of a perfectionist, and I approached your problem the wrong way!

    Indeed, I first tried to imagine a regex that could both match all the acceptable cases (such as extra blank characters) and reject all the forbidden ones (such as arguments not beginning with an exclamation mark). That’s not the right approach, because:

    • Being familiar with your language, you probably avoid the major mistakes (I hope so, anyway!)

    • When compiling or interpreting your code, you are certainly made aware of all these syntax errors

    • It’s quite possible to detect some errors with very simple regexes (for instance, all method or argument names beginning with a digit can be detected with the regex [.!]\d)
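
    To illustrate that last point, here is a minimal sketch in Python, using its built-in re module rather than the Boost engine Notepad++ uses; [.!]\d means the same thing in both. The sample lines and the helper name bad_lines are hypothetical. Such a simple pattern would also flag a decimal literal like 3.14, so its hits are candidates to review rather than certain errors.

```python
import re

# A method name (after ".") or an argument name (after "!") starting with a digit.
BAD_NAME = re.compile(r"[.!]\d")

# Hypothetical sample lines: the first is valid, the other two are not.
lines = [
    "define method .getPlotFile() is STRING",
    "define method .2ndTry(!value is REAL)",
    "define method .display(!1st is BOOLEAN)",
]


def bad_lines(src):
    """Return the 1-based numbers of the lines containing a name that starts with a digit."""
    return [n for n, line in enumerate(src, start=1) if BAD_NAME.search(line)]


print(bad_lines(lines))  # [2, 3]
```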


    So, given your four syntaxes below:

    define method .mymethod(!arg1 is STRING, !arg2 is STRING) is STRING
    
    define method .mymethod(!arg1 is STRING, !arg2 is STRING)
    
    define method .mymethod() is STRING
    
    define method .mymethod()
    

    the common part is, obviously, the string define method .mymethod(, that is to say, the string define method, followed by a dot, the name of the method and an opening round bracket.

    Allowing for possible blank characters between the words, and given that \h is equivalent to [ \t\xa0], we could use, for mainExpr, the simple regex ^\h*define\h+method\h+\.\w+\h*\(

    To make the different parts of this regex easier to see, we may use the free-spacing mode modifier (?x) instead, as below:

    (?x) ^ \h* define \h+ method \h+ \. \w+ \h* \(
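
    For testing outside Notepad++, a rough Python equivalent of this mainExpr is sketched below. It assumes Python’s re module, which has no \h, so the set [ \t\xa0] is written out; re.IGNORECASE reflects answer (N). The sample lines and the helper name method_lines are hypothetical, and this only approximates how Notepad++ applies mainExpr to a file.

```python
import re

H = r"[ \t\xa0]"  # Python's re has no \h, so the blank set is spelled out

# Translation of the mainExpr ^\h*define\h+method\h+\.\w+\h*\(
# re.IGNORECASE: keywords are not case sensitive (answer N).
# re.MULTILINE: ^ matches at the start of every line of the text.
MAIN_EXPR = re.compile(rf"^{H}*define{H}+method{H}+\.\w+{H}*\(", re.IGNORECASE | re.MULTILINE)


def method_lines(text):
    """Return what the mainExpr translation matches, one entry per method definition."""
    return [m.group() for m in MAIN_EXPR.finditer(text)]


sample = """\
define method .getPlotFile() is STRING
  DEFINE\tMETHOD .deleteAllDims ()
define method .noBrackets is STRING
-- define method .commented()
"""

# Only the first two lines match: the third has no "(" after the name,
# and the fourth does not start (after blanks) with define.
print(method_lines(sample))
```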


    Now, to get ONLY the name of the method, we’ll use, in expr, the regex define\h+method\h+\.\K\w+(?=\h*\()

    Or, with the mode modifier (?x), the regex (?x) define \h+ method \h+ \. \K \w+ (?= \h* \( )

    Note: Because of the \K syntax, once the first part define\h+method\h+\. is matched, it is discarded from the final match, which ONLY contains the part matched by \w+(?=\h*\(), that is, a word, provided it is followed by optional blank characters and an opening round bracket.
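
    Python’s re module supports neither \K nor variable-length lookbehinds, so this sketch (an approximation, not what Notepad++ runs) replaces \K with a capturing group: the prefix is still matched, but only group 1, the method name, is returned. The lookahead is kept as in the original expr; the helper name method_name is hypothetical.

```python
import re

H = r"[ \t\xa0]"  # Python's re has no \h, so the blank set is spelled out

# define\h+method\h+\.\K\w+(?=\h*\() with \K swapped for a capturing group.
# re.IGNORECASE: keywords are not case sensitive.
NAME_EXPR = re.compile(rf"define{H}+method{H}+\.(\w+)(?={H}*\()", re.IGNORECASE)


def method_name(line):
    """Return the method name from a 'define method' line, or None if there is none."""
    m = NAME_EXPR.search(line)
    return m.group(1) if m else None


print(method_name("define method .getVValuesString(!data is DBREF) is ARRAY"))  # getVValuesString
```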


    Therefore, in your functionList.xml file, you should insert the text below:

    <function
        mainExpr="^\h*define\h+method\h+\.\w+\h*\("
        displayMode="$functionName">
        <functionName>
            <nameExpr expr="define\h+method\h+\.\K\w+(?=\h*\()"/>
        </functionName>
    </function>
    

    If we want to build a STRICTER mainExpr regex, tailored to your syntax, we’ll consider:

    • The method syntax .name_Method(…)

    • The optional forms !argument is TYPE, separated by commas, inside the round brackets

    • At the end, the optional form is TYPE, for the method itself

    These hypotheses lead to the regex:

    ^\h*define\h+method\h+\.\w+\h*\(\h*(!\w+(\h+is\h+\w+)\h*)?(?:,\h*(?1))*\)(?2)?

    Note: This regex contains 3 groups:

    • The first two are re-used later, as the subroutine calls (?1) and (?2)

    • The third one is a non-capturing group (?:…), followed by the * quantifier, which is NOT re-used anywhere else in the regex

    With the mode modifier (?x), we obtain the regex:

    (?x) ^ \h* define \h+ method \h+ \. \w+ \h* \( \h* ( ! \w+ ( \h+ is \h+ \w+ ) \h* )? (?: , \h* (?1) )* \) (?2)?

    To better understand this regex, you may split it into four main parts:

    (?x)  ^  \h*  define  \h+  method
          \h+  \.  \w+  \h*
          \(  \h*  (  !  \w+  (  \h+  is  \h+  \w+  )  \h*  )?  (?:  ,  \h*  (?1)  )*  \)
          (?2)?
    
    • The first part matches the string define method, possibly preceded by blank characters at the beginning of the line

    • The second part matches the string .mymethod, preceded by one or more blank characters and followed by zero or more blanks

    • The third part matches the optional string !arg1 is TYPE, !arg2 is TYPE, …, !argN is TYPE, enclosed in round brackets

    • The fourth part matches the optional string is TYPE
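
    Python’s re module supports neither \h nor subroutine calls like (?1), so the sketch below rebuilds the same regex by pasting the bodies of groups 1 and 2 wherever (?1) and (?2) appear, which is what those calls amount to here. It is a test harness under those assumptions, not the Boost engine Notepad++ uses; re.IGNORECASE is added on the assumption that is, like the other keywords, is case-insensitive, and the helper name strict_match is hypothetical.

```python
import re

H = r"[ \t\xa0]"              # Python's re has no \h, so spell it out
TYPE = rf"{H}+is{H}+\w+"      # body of group 2: \h+is\h+\w+
ARG = rf"!\w+(?:{TYPE}){H}*"  # body of group 1: !\w+(\h+is\h+\w+)\h*

# Same structure as the strict Boost regex, with (?1) and (?2)
# replaced by copies of the group bodies.
STRICT = re.compile(
    rf"^{H}*define{H}+method{H}+\.\w+{H}*\({H}*(?:{ARG})?(?:,{H}*(?:{ARG}))*\)(?:{TYPE})?",
    re.IGNORECASE,  # assumption: "is" is case-insensitive like the other keywords
)


def strict_match(line):
    """Return the text matched by the strict mainExpr sketch, or None."""
    m = STRICT.match(line)
    return m.group() if m else None


print(strict_match("define method .getPlotFile() is STRING"))  # the whole line
print(strict_match("define method .bad(arg1 is STRING)"))       # None: no "!"
```

    Applied to the seven real examples, each line is matched in full, while an argument written with as instead of is makes the whole match fail.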


    Given this new regex, you should, this time, insert in your functionList.xml file:

    <function
        mainExpr="^\h*define\h+method\h+\.\w+\h*\(\h*(!\w+(\h+is\h+\w+)\h*)?(?:,\h*(?1))*\)(?2)?"
        displayMode="$functionName">
        <functionName>
            <nameExpr expr="define\h+method\h+\.\K\w+(?=\h*\()"/>
        </functionName>
    </function>
    

    I did a quick test, changing the mainExpr and the expr of the INI parser (I could have chosen another parser!) in my functionList.xml file.

    Then, after copying your 7 examples into a Test.ini file, I was able to see the names of these 7 methods in the Function List window!

    Cheers,

    guy038

    P.S.:

    Make sure you understand the difference between a subroutine call (?1) to group 1 and a backreference \1 to that same group. For instance:

    (\d+)ABC\1 matches the strings 1234ABC1234 or 99ABC99 but NOT the strings 45678ABC11 or 73ABC9999

    (\d+)ABC(?1) matches the FOUR strings 1234ABC1234, 99ABC99, 45678ABC11 and 73ABC9999 and, in general, any string ABC surrounded by numbers

    Indeed, writing (\d+)ABC(?1) is just like writing the regex \d+ABC\d+
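
    Python’s re module has backreferences but no (?1) syntax, so this minimal sketch compares \1 with the expanded form \d+ABC\d+ described just above, on the same four strings. fullmatch is used so that each string must be matched in its entirety.

```python
import re

BACKREF = re.compile(r"(\d+)ABC\1")         # \1: the SAME digits must appear again
SUBROUTINE_LIKE = re.compile(r"\d+ABC\d+")  # what (\d+)ABC(?1) amounts to: ANY digits again

for s in ["1234ABC1234", "99ABC99", "45678ABC11", "73ABC9999"]:
    # fullmatch requires the whole string to be matched
    print(s, bool(BACKREF.fullmatch(s)), bool(SUBROUTINE_LIKE.fullmatch(s)))

# 1234ABC1234 True True
# 99ABC99 True True
# 45678ABC11 False True
# 73ABC9999 False True
```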



  • Hello Guy,
    Thank you so much for your time. It works very well. And thanks for all this information. I admire generous people like you.
    Cheers



  • I have a very similar situation and am having a rough time getting the proper regular expressions in my functionList.xml for my user-defined language. Here are some details and examples:

    Functions are defined with the (always case-insensitive) word “def”, which may or may not be preceded by a 5-digit number. All function names must start with “fn”.
    If a function has parameters, they are enclosed in parentheses after the function name and separated by commas. However, one semicolon (instead of a comma) may separate the required parameters from all the optional ones.
    Examples:

    def fnwhat3ver
    00100 def fnAnotherExample
    DEF FNwithParameters(arg_number,arg_string$)
    00200 Def FnWithOptional(par_12;par_option_3)

    I really appreciate any help on this matter. I am a long-time user and lover of Notepad++, and I hope to implement this feature for myself and to win over some new N++ users.

    -John
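
    For the rules John describes, a minimal sketch of one possible pattern follows. It was written for Python’s re module and checked only against his four examples, not in Notepad++. It assumes the line number, if present, is exactly five digits followed by blanks, and that function names contain only letters, digits and underscores. Written with Notepad++’s \h, the same idea would presumably read (?i)^\h*(?:\d{5}\h+)?def\h+fn\w* as a mainExpr, but that has not been tested.

```python
import re

# Optional 5-digit line number, the keyword "def", then a name starting with "fn".
# re.IGNORECASE covers DEF/Def and FN/Fn; group 1 is the function name.
# [ \t]+ after "def" keeps words such as "default" from matching.
FUNC_DEF = re.compile(r"^[ \t]*(?:\d{5}[ \t]+)?def[ \t]+(fn\w*)", re.IGNORECASE | re.MULTILINE)

examples_text = """\
def fnwhat3ver
00100 def fnAnotherExample
DEF FNwithParameters(arg_number,arg_string$)
00200 Def FnWithOptional(par_12;par_option_3)
"""

print(FUNC_DEF.findall(examples_text))
# ['fnwhat3ver', 'fnAnotherExample', 'FNwithParameters', 'FnWithOptional']
```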

