    Need clarification about "built-in" language lexers

      pbarney

      I sometimes code in Python, and Notepad++ dutifully does its language highlighting. But I just noticed that I don’t have the language file for Python installed. I must have deselected it on install. It’s not listed under Languages → P.

      And yet… my code is still properly highlighted.

      So I asked ChatGPT why, and it says that Scintilla’s underlying lexer library SciLexer.dll has its own built-in lexers that are available to NPP even if you don’t explicitly install them in NPP.

      In Lexilla 5.4.4 (included with NPP v8.8.1):

      • AutoIt (AU3)
      • Abaqus
      • Ada
      • Asciidoc
      • Assembler / ASM
      • Bash / Shell
      • Basic / VB-style BASIC
      • Batch / Windows batch files
      • CIL / .NET IL
      • COBOL
      • C / C++
      • CSS
      • Caml / OCaml
      • CMake
      • CoffeeScript
      • D (programming language)
      • Dart
      • Diff / Patch
      • Erlang
      • Forth
      • Fortran
      • GDScript
      • HTML
      • Haskell
      • Julia
      • LaTeX / TeX
      • Lisp
      • Lua
      • Makefile
      • Markdown
      • MATLAB / Octave
      • Nim
      • NSIS
      • Null (no-op / fallback)
      • PO (gettext)
      • Pascal / Delphi
      • Perl
      • PowerShell
      • Properties files
      • Python
      • R
      • Raku
      • Ruby
      • Rust
      • SQL
      • Smalltalk
      • Tcl
      • txt2tags
      • Verilog
      • VHDL
      • Visual Prolog
      • YAML
      • Zig

      But this doesn’t strike me as a true statement, because if it were so, then why would NPP even include language files for these languages?

      And yet, it’s still lexing Python for me even though I don’t have NPP’s Python language file loaded.

      So I’m clearly not understanding something. What am I misunderstanding?

        PeterJones @pbarney

        @pbarney said in Need clarification about "built-in" language lexers:

        But I just noticed that I don’t have the language file for Python installed.

        If you think it’s an individual file for each language, which “language file” are you talking about? I’ll come back to this point.

        I must have deselected it on install. It’s not listed under Languages → P.

        Then, at some point, you probably went to Preferences > Languages and moved Python to Disabled Items. Note that when it’s in the Disabled Items list, it doesn’t actually disable that language from doing syntax highlighting, it just removes it from the visible Languages > … menu, to declutter your menu from languages you don’t use; if you open a file that has the right extension (like .py for Python files), then it will recognize it and automatically choose Python, even though you don’t see Python in the menu. You will notice, even in your current state, that Settings > Style Configurator’s Language pulldown still has Python available, and when you choose it, there are still colors defined for Python’s various styles.
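To make the distinction concrete, here is a minimal sketch of the idea. It is a toy model only, not Notepad++'s actual code or data structures; the names `LANGUAGE_EXTENSIONS`, `HIDDEN_FROM_MENU`, `menu_entries` and `language_for` are hypothetical, and the extension lists are illustrative. It shows menu visibility and extension-based language selection as two independent things:

```python
from pathlib import Path

# Hypothetical, simplified model; not Notepad++'s real data structures.
LANGUAGE_EXTENSIONS = {
    "python": {"py", "pyw"},
    "perl": {"pl", "pm"},
}

# Languages moved to "Disabled Items": hidden from the menu only.
HIDDEN_FROM_MENU = {"python"}


def menu_entries():
    """Languages that would be listed in the Languages menu."""
    return sorted(name for name in LANGUAGE_EXTENSIONS if name not in HIDDEN_FROM_MENU)


def language_for(filename):
    """Pick a language from the file extension, ignoring menu visibility."""
    ext = Path(filename).suffix.lstrip(".").lower()
    for name, extensions in LANGUAGE_EXTENSIONS.items():
        if ext in extensions:
            return name
    return None


print(menu_entries())             # Python is absent from the menu list
print(language_for("script.py"))  # but a .py file is still recognized
```

In this model, hiding a language only changes what `menu_entries` returns; `language_for` never consults the hidden set, which mirrors the behavior described above.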

        So I asked ChatGPT why,

        Why would you believe that atrocity?

        Scintilla’s underlying lexer library SciLexer.dll has its own built-in lexers that are available to NPP even if you don’t explicitly install them in NPP.

        Yes. And no. That random text generator only listed 53, but as is obvious from the user manual, there are around 90. So it’s underreporting by almost a factor of 2.

        then why would NPP even include language files for these languages?

        You have a fundamental misunderstanding of how Notepad++ and Scintilla work together.

        • Scintilla – or, more accurately, Lexilla – provides the code (the logic) that does the lexing.
        • Notepad++ decides which of the Lexilla lexers it enables from the library. Lexilla has many lexers which Notepad++ doesn’t enable; and Notepad++ can actually use the same lexer for many languages (if the lexer is designed that way; for example, XML and HTML and some others are all done by the same lexer)
        • The langs.xml is used to define the default extensions for a language (the ones that show up in the Style Configurator’s Default ext.: box).
        • The langs.xml is also used to define the default lists of Keywords for some of the styles (like the KEYWORDS style in Python).
        • The stylers.xml or themes\<ThemeName>.xml is used to store which colors are assigned to each style for a given language
        • The functionList\<languageName>.xml is used to determine what things show up if you have View > Function List panel visible
        • The autoCompletion\<languageName>.xml is used to determine which keywords are available for easy auto-completion, and which function parameters are known for function-parameter auto-completion
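As a rough illustration of the langs.xml role described in the list above, the sketch below reads the default extensions and a keyword list out of a small XML fragment. The fragment is an assumption: it is written in what I believe is the general shape of langs.xml (a `Language` element with `name` and `ext` attributes and nested `Keywords` elements), the keyword list is shortened, and the helper name `describe_language` is hypothetical. Check your own langs.xml for the real content.

```python
import xml.etree.ElementTree as ET

# Simplified fragment assumed to resemble langs.xml; the keyword list is
# shortened for illustration and is not the real file's content.
LANGS_XML = """
<NotepadPlus>
  <Languages>
    <Language name="python" ext="py pyw" commentLine="#">
      <Keywords name="instre1">and as assert break class def else for if import return while</Keywords>
    </Language>
  </Languages>
</NotepadPlus>
"""


def describe_language(xml_text, language_name):
    """Return (default extensions, {keyword list name: [words]}) or None."""
    root = ET.fromstring(xml_text.strip())
    for lang in root.iter("Language"):
        if lang.get("name") == language_name:
            extensions = lang.get("ext", "").split()
            keywords = {
                kw.get("name"): (kw.text or "").split()
                for kw in lang.findall("Keywords")
            }
            return extensions, keywords
    return None


extensions, keywords = describe_language(LANGS_XML, "python")
print(extensions)                # default extensions for the language
print(keywords["instre1"][:3])   # first few words of one keyword list
```

Note that nothing here does any lexing: the file only supplies extensions and word lists, while the lexing logic itself lives in Lexilla, as described above.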

        So, to sum up:

        • there is no one “file” for a given built-in language
        • you purposefully removed Python from the menu at some point (and presumably forgot about it), but that doesn’t disable the Python lexer, it just removes it from the menu
        • you mistakenly believed a random number generator that “predicts” the next word in its response based on statistics on the words that came before could actually provide you with facts or truth. If you’re lucky, an LLM like ChatGPT might point you in the right direction – you just weren’t lucky
          pbarney @PeterJones

          at some point, you probably went to Preferences > Languages and moved Python to Disabled Items.

          You’re right, I can’t imagine why I would have done that, but there it is.

          then why would NPP even include language files for these languages?

          You have a fundamental misunderstanding of how Notepad++ and Scintilla work together.

          After digging into it more, I realize that I also had a fundamental misunderstanding of what Lexilla was actually providing. I thought that it determined the classification of tokens, and the stylers were just for coloring, but I see that it’s a bit more involved than that.

          you mistakenly believed a random number generator that “predicts” the next word in its response based on statistics on the words that came before could actually provide you with facts or truth. If you’re lucky, an LLM like ChatGPT might point you in the right direction – you just weren’t lucky

          Well, the problem really was my misinterpretation of what I read, given my deeper misunderstanding of how Lexilla works, and I just jumped the gun on that.

          Thanks for taking time to write all that! The info you provided was incredibly clarifying. Thank you.

            pbarney @PeterJones

            @PeterJones said in Need clarification about "built-in" language lexers:

            So I asked ChatGPT why,
            

            Why would you believe that atrocity?

            It was probably an off-the-cuff question, but I figured I’d take it seriously. I know that this is going off-topic, so feel free to cull this response if you like.

            To tweak the old Russian maxim, it’s very much a case of “distrust until verified” (which is why I posted my question instead of just swallowing what the thing spit out).

            I’m not particularly a fan of them, and I honestly believe that in time, we (as in humanity) may come to regret their invention and our likely inevitable overdependence on them.

            But I’m also not an ignorant neophyte. I’m actually very well aware of the limitations and problems with LLMs, probably more than most people, and despite that, I’ve found them to be useful in some contexts.

            First, you’re not wrong to call them “random text generators,” but that really is an oversimplification. It’s not just a flat index of word frequencies. They’re trained with billions (or even trillions) of parameters that encode patterns across syntax, semantics and reasoning heuristics. From a purely mathematical point of view, it’s actually pretty interesting. But saying it’s “just statistics” is a bit like saying the human brain is “just firing neurons.” Yeah, it’s technically true, but it misses the interesting part.

            So yes, because they are probabilistic sequence models, they are perfectly capable of fabricating “facts” (i.e., hallucinations, especially with multi-dimensional requests or as the context window gets filled up), making overgeneralizations such as missing edge cases, or having issues with compression bias, shallow chain-of-reasoning (although this one is getting a little better), ambiguity drift, context inference biases, fidelity drift when repeatedly iterating through details, context window size limitations for long conversations, etc. I have some experience dealing with each of these limitations to some extent.

            So I know all that going in, and since I do, I know not to rely on them as primary sources. I also know how to account for many of those problems, along with a number of strategies to limit and mitigate them (e.g., authoritative source anchoring, chunking, forcing tabular output, explicitly prompting for blanks instead of letting it make guesses, etc.). If I’m doing anything serious, I’ll use all the tools at my disposal, but I still know that if the output isn’t testable, it’s not trustworthy, and I know not to rely on it for expertise; it’s just a tool I use to speed up my info gathering. I think of it as supplementary rather than authoritative.

            So it’s an occasionally useful tool that’s saved me some time by giving me a starting point to quickly gather ideas and point me to things I might not have thought of before I check with reliable sources (like you) that can actually confirm or invalidate them.

            I don’t expect to change any minds about it, and in truth, I don’t really even want to, but you always take the time to thoroughly answer people’s questions, and I wanted to respect that in turn.
