Advice on developing a plugin that should process large files



  • I’m working on a plugin to handle CSV files, and I’m running into an issue with large files.

    When I try to parse a file of around 10 MB or 100 MB of text, the Notepad++ plugin seems to freeze. Obviously reading a large file will take some time, but Notepad++ stops responding, and even after waiting a long while the process never seems to finish.

    It is a C# plugin and to read and process the file I’m using something like this:

    var data = ScintillaStreams.StreamAllText();
    string line;
    
    // Assign inside the condition so the loop ends when ReadLine returns null
    while ((line = data.ReadLine()) != null)
    {
      //etc.
    }
    

    Should this work, and is it a good way to read the currently opened file in NPP into the plugin?
    Could this cause the plugin to crash, or is the problem maybe caused by something else?

    Does anyone know a better way to handle this, or have any advice?



  • Disclaimer - I don’t know enough about C# to understand if this is
    the code to go with.

    Q1: Is this C# code using the latest Scintilla version? A recent Npp release updated Scintilla, and that changed the notification structure.

    Q2: Are you doing your validation from an npp or scintilla callback?
    If so, make sure your code is as fast as possible, or use a thread for the validation part.

    I’ve done a quick test with Python and a CSV downloaded from here.
    Validating these 1,500,000 lines took ~20 seconds on my
    old 2nd-gen i5, and I have to say that I used a very naive approach, e.g. GetText instead of GetCharacterPointer.
    I saw npp's memory usage increase from ~400 MB to 1.5 GB during validation, so make sure enough memory is available.
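
    On the C# side, one way to keep memory flat is to validate line by line from a reader instead of pulling the whole text into a string first. The following is only a minimal sketch: CsvCheck and CountBadLines are hypothetical names, the plain Split call ignores quoted separators, and any TextReader can be passed in (a StringReader is used in the usage note purely for illustration).

    ```csharp
    using System;
    using System.IO;

    public static class CsvCheck
    {
        // Hypothetical helper: counts lines whose column count differs
        // from that of the first line. Only one line is held at a time.
        // Naive on purpose: Split does not understand quoted separators.
        public static int CountBadLines(TextReader reader, char separator = ',')
        {
            int expected = -1;
            int bad = 0;
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                int columns = line.Split(separator).Length;
                if (expected < 0)
                    expected = columns;      // first line defines the width
                else if (columns != expected)
                    bad++;
            }
            return bad;
        }
    }
    ```

    For example, CsvCheck.CountBadLines(new StringReader("a,b,c\n1,2,3\n4,5\n")) returns 1, because only the last line has a different number of columns.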



  • Since ScintillaStreams is something I wrote I can give you some tips.

    Reading 100 MB from N++ is going to take a few seconds, no matter how you do it. To avoid locking up the N++ interface, you have to do the processing on a different thread, e.g. something like

             Task.Factory.StartNew(() =>
                {
                    var data = ScintillaStreams.StreamAllText();
                    string line;
    
                    // Assign inside the condition so the loop ends when ReadLine returns null
                    while ((line = data.ReadLine()) != null)
                    {
                        //etc.
                    }
    
                    // Only interact with N++ on the main thread
                    this.Invoke((Action)(() => { MessageBox.Show("The stuff is finished", "MyPlugin"); }));
                });
    

    ScintillaStreams works by getting a pointer to the N++ text buffer and reading it as “raw” as possible. This is likely going to crash horribly if the user modifies the text while it’s reading, so giving control back to the user is kind of a double-edged sword. I still feel it’s worth it though, since the user experience is much better. Also consider showing your progress somehow, so the user feels something is happening.
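
    As a rough illustration of the progress idea, here is a sketch that reads lines on a thread-pool thread, reports a running count through IProgress<int>, and can be stopped through a CancellationToken. BackgroundReader and CountLinesAsync are hypothetical names, not part of ScintillaStreams, and the reader parameter stands in for whatever StreamAllText() returns.

    ```csharp
    using System;
    using System.IO;
    using System.Threading;
    using System.Threading.Tasks;

    public static class BackgroundReader
    {
        // Hypothetical helper: counts lines off the calling thread and
        // reports the running total every `reportEvery` lines.
        public static Task<int> CountLinesAsync(
            TextReader reader,
            IProgress<int> progress,
            CancellationToken token,
            int reportEvery = 10000)
        {
            return Task.Run(() =>
            {
                int count = 0;
                string line;
                while ((line = reader.ReadLine()) != null)
                {
                    // Gives the caller a way to stop the read early
                    token.ThrowIfCancellationRequested();
                    count++;
                    if (progress != null && count % reportEvery == 0)
                        progress.Report(count);
                }
                return count;
            }, token);
        }
    }
    ```

    If a Progress<int> is constructed on the UI thread, its callback is posted back to that thread's SynchronizationContext, so it can update a label or progress bar without an explicit Invoke. Whether that fits your plugin depends on how its forms are set up, so treat this as a starting point only.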

    I don’t know why “the process never seems to finish”. A 10 MB file should take a second or so to read.

