Functionlist - Different results with different line endings



  • Hello everybody.

    I am trying to write a Function List parser for UniVerse BASIC files with UNIX line endings; it should list the labels. Some labels are found, some are not. The only difference between the labels is that some are followed directly by a line ending and the others are followed by a comment. The labels followed by a comment are listed; the others are not. After changing the line endings from Unix (LF) to Windows (CR + LF), all labels are listed. Here is an example:

    label1:\n
    

    Label not listed.

    label1:     ;* comment\n
    

    Label listed.

    Now change the line ending to Windows:

    label1:\r\n
    

    Label listed.

    label1:     ;* comment\r\n
    

    Label listed.

    Can somebody confirm this behavior? Is there a way to list labels that are directly followed by a UNIX line ending?

    Sincerely Bernd



  • Yes, there is a way. It depends on how you implemented the parser, so it would help if you posted it.



  • I copied the parser for BAT files and changed it:

    <parser
        id         ="universe-basic"
        displayName="UniVerse Basic"
        commentExpr="(^\*.*?$)"
    >
        <function
            mainExpr="^[\w.]+\:"
        >
        <!--
            <functionName>
                <nameExpr expr=".*:" />
            </functionName>
        -->
        </function>
    </parser>
    

    I only use mainExpr to list the labels. Maybe this is also a reason why some labels are not listed.
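
As a quick sanity check of the expression on its own, outside Notepad++, the sketch below runs the same mainExpr against the four sample lines with Python's `re` module. Python's `re` is not the regex engine Function List uses, and the commentExpr is not modeled, so this does not reproduce the problem; it only shows how the pattern text behaves in a different engine. The names `MAIN_EXPR`, `SAMPLES`, and `find_labels` are made up for illustration.

```python
import re

# The mainExpr from the parser above, compiled in multi-line mode so that
# ^ matches at the start of every line.
# Assumption: Python's re is used purely as a stand-in engine for illustration.
MAIN_EXPR = re.compile(r"^[\w.]+\:", re.MULTILINE)

# The four cases from the first post.
SAMPLES = {
    "LF, bare label": "label1:\n",
    "LF, label + comment": "label1:     ;* comment\n",
    "CRLF, bare label": "label1:\r\n",
    "CRLF, label + comment": "label1:     ;* comment\r\n",
}


def find_labels(text):
    """Return every substring matched by MAIN_EXPR in text."""
    return MAIN_EXPR.findall(text)


for name, text in SAMPLES.items():
    print(f"{name}: {find_labels(text)}")
```

In this engine all four samples yield `['label1:']`, so the expression by itself does not distinguish between LF and CR + LF.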



  • Escaping the colon (\:) is not wrong, but it is not required either. I can’t find anything else wrong.

    My take:

    			<parser
    				displayName="UniVerse BASIC"
    				id         ="universe_basic"
    				commentExpr="(?x)                                               # Utilize inline comments (see `RegEx - Pattern Modifiers`)
    						(?m-s:
    							(?:^|;)                                             # at start-of-line or after end-of-statement
    							\h*                                                 # optional leading whitespace
    							(?-i:REM\b|\x24\x2A|[\x21\x2A])                     # Single Line Comment 1..4
    							.*$                                                 # whatever, until end-of-line
    						)
    					|	(?:\x22[^\x22\r\n]*\x22)                                # String Literal - Double Quoted
    					|	(?:\x27[^\x27\r\n]*\x27)                                # String Literal - Single Quoted
    					|	(?:\x5C[^\x5C\r\n]*\x5C)                                # String Literal - Backslash Quoted
    					"
    			>
    				<function
    					mainExpr="(?x)                                              # Utilize inline comments (see `RegEx - Pattern Modifiers`)
    							(?m-i)^                                             # case-sensitive, NO leading whitespace
    							(?:
    								\d+\b(?=:?)                                     # completely numeric label, colon optional + discarded
    							|	[A-Za-z_][\w.$%]*(?=:)                          # alphanumeric label, colon required + discarded
    							)
    						"
    				/>
    			</parser>
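
To experiment with the label expression outside Notepad++, a minimal sketch in Python might look like the following. It is an adaptation, not the parser itself: the inline `(?x)` and `(?m-i)` modifiers are replaced by `re.VERBOSE` and `re.MULTILINE` (Python's `re` is case-sensitive by default), the commentExpr is left out entirely, and Python's `re` is a different engine from the one Function List uses. `LABEL_EXPR`, `list_labels`, and the sample source text are hypothetical.

```python
import re

# Adaptation of the mainExpr above for Python's re module.
# Assumption: flags replace the inline (?x) and (?m-i) modifiers.
LABEL_EXPR = re.compile(
    r"""
    ^                               # start of line, NO leading whitespace
    (?:
        \d+\b(?=:?)                 # completely numeric label, colon optional
      | [A-Za-z_][\w.$%]*(?=:)      # alphanumeric label, colon required
    )
    """,
    re.VERBOSE | re.MULTILINE,
)


def list_labels(source):
    """Return the label names found in source, without the trailing colon."""
    return LABEL_EXPR.findall(source)


# Made-up sample: numeric and alphanumeric labels, an indented line,
# and an ordinary statement that must not be reported as a label.
SOURCE_LF = (
    "100\n"
    "label1:\n"
    "label2:     ;* comment\n"
    "  indented:\n"
    "X = 1\n"
    "200: PRINT X\n"
)
SOURCE_CRLF = SOURCE_LF.replace("\n", "\r\n")

print(list_labels(SOURCE_LF))
print(list_labels(SOURCE_CRLF))
```

Both calls print `['100', 'label1', 'label2', '200']`: the lookaheads keep the colon out of the match, and the indented line and the plain assignment are skipped.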
    


  • Many thanks for your solution. Your parser works fine.

    It also gives me a lot to study. I had never heard of pattern modifiers before. I searched for them and found a tutorial on regular-expressions.info. I’ll try to work through it and learn more about regular expressions.

    Sincerely Bernd

