User Defined Languages: RegEx-based automatic language detection feature



  • Concepts
    At work, I’m constantly working with a huge library of text files in a variety of languages that contain configuration data for various devices. Sometimes these are Cisco ASA config files, sometimes they’re DAT files that use internal proprietary data formats, and sometimes they’re really just INI-style configuration data for apps that we use internally. I’ve downloaded or created user-defined language (UDL) files for all of them. The problem is that these files often share common extensions, sometimes with files in other languages, so I can’t rely on a DAT file always opening with the right UDL.

    The feature request is an idea to help Notepad++ detect a document’s language automatically. It would be great if we could define one or more keywords, especially keywords powered by regular expressions, that trigger the loading of the language file if they are found. This would be an optional feature for each language. For example, in a Cisco ASA configuration file, you might find either or both of the commands below. The presence of these commands is a flag that the file being loaded is an ASA configuration file. The italicized words would be the keyword triggers.

    • banner asdm Welcome to the Cisco ASA Firewall!

    If you have an ASDM banner, it’s a Cisco ASA configuration file, and it doesn’t actually matter what the rest of that line says. The problem is that not every ASA uses the ASDM interface or its banner, so detection can’t rely on that command alone.

    • boot system disk0:/asa904-k8.bin

    The second example, however, is a required command on any ASA device; it specifies the boot ROM the firewall operates from. Every ASA config has this. The problem is that each ASA can be running a different ROM version, so the filename in the command varies from one config to the next.

    The Suggestion
    This is where the RegEx comes in. If we could apply a RegEx to part or all of the language-identifying command (the filename, in this case), we could always detect the ASA boot command. I’m looking for a pattern that (forgive my lack of RegEx experience) behaves something like this:

    • boot system disk*:/asa*.bin

    It doesn’t matter what disk the command loads from (0, 1, 2, etc.), nor does it matter what version of the ASA software is being used. We’re seeing a valid ASA boot system command, and so this is probably an ASA config file.
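    As a rough illustration, the wildcard line above could be written as a real regular expression along the lines of the sketch below. It is in Python purely for demonstration; the names ASA_BOOT and looks_like_asa are hypothetical, and nothing here is an existing Notepad++ setting. It assumes disk numbers are digits and that image names consist of letters, digits, dots, underscores, and hyphens.

```python
import re

# Hypothetical trigger pattern (not an existing Notepad++ option).
# "disk*" becomes disk\d+ and "asa*.bin" becomes asa[\w.-]*\.bin.
ASA_BOOT = re.compile(
    r"^\s*boot system disk\d+:/asa[\w.-]*\.bin\s*$",
    re.MULTILINE,  # let ^ and $ match at every line of the file
)

def looks_like_asa(text: str) -> bool:
    """Return True if any line is an ASA-style 'boot system' command."""
    return ASA_BOOT.search(text) is not None
```

    With this pattern, the disk number and the software version in the filename can both vary, while a boot line pointing at some other kind of image would not match.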

    The key is to find specific syntactical inputs that are unique to a given language: commands, key phrases, and so on. This feature would ideally support multiple detection patterns per language, since obviously not every language is going to have a must-have command like the boot system option.
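    Supporting several detection patterns for one language might look something like the following sketch, where a file counts as a match if any one trigger is found. This is Python for illustration only; ASA_TRIGGERS and matches_any are invented names, and the patterns are simply the two ASA examples from this post.

```python
import re

# Hypothetical list of triggers for a single language (Cisco ASA here).
ASA_TRIGGERS = [
    re.compile(r"^\s*banner asdm\b", re.MULTILINE),
    re.compile(r"^\s*boot system disk\d+:/asa[\w.-]*\.bin", re.MULTILINE),
]

def matches_any(text, triggers):
    """Return True as soon as one trigger pattern is found in the text."""
    return any(pattern.search(text) for pattern in triggers)
```

    A config without an ASDM banner would still be detected through the boot system line, and vice versa.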

    Additionally, you may have a scenario where two or more languages match when opening a file; for example, two different UDLs match the phrase banner login because both Cisco IOS and Cisco ASA support it. Obviously this makes banner login a bad detection command, but it’s also very possible for two completely different languages to share a detection command by pure chance. In such a case, the user could be offered a listbox or drop-down box that allows them to pick from the matched languages.
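    The tie-breaking behaviour could be sketched roughly as below: collect every language whose triggers match, and only ask the user when there is more than one candidate. Again this is illustrative Python, not how Notepad++ works today; TRIGGERS, candidate_languages, choose_language, and the ask_user callback (standing in for the listbox or drop-down) are all hypothetical.

```python
import re

# Hypothetical registry: language name -> list of trigger patterns.
TRIGGERS = {
    "Cisco ASA": [
        re.compile(r"^\s*banner login\b", re.MULTILINE),
        re.compile(r"^\s*boot system disk\d+:/asa[\w.-]*\.bin", re.MULTILINE),
    ],
    "Cisco IOS": [
        re.compile(r"^\s*banner login\b", re.MULTILINE),
    ],
}

def candidate_languages(text):
    """Return every language that has at least one matching trigger."""
    return [
        name
        for name, patterns in TRIGGERS.items()
        if any(p.search(text) for p in patterns)
    ]

def choose_language(text, ask_user):
    """Pick a language automatically, or defer to the user on a tie.

    ask_user is a callback that receives the list of matched language
    names and returns the one the user picked.
    """
    candidates = candidate_languages(text)
    if not candidates:
        return None           # no trigger matched; leave the default language
    if len(candidates) == 1:
        return candidates[0]  # unambiguous match
    return ask_user(candidates)
```

    A file containing only a banner login line would produce both candidates and trigger the prompt, while a file with an ASA boot system line and no banner would resolve to the ASA language without asking.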

