Community
    • Login

    [New Plugin] CSV Lint

    Scheduled Pinned Locked Moved Notepad++ & Plugin Development
    81 Posts 25 Posters 71.9k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Bas de ReuverB
      Bas de Reuver
      last edited by

      This plug-in was already released, but I just wanted to announce it here because it’s now also available in the Plugins > Plugins Admin menu.

      CSV Lint is a plug-in that adds syntax highlighting to csv or fixed width data files, it will automatically detect columns and datatypes and you can validate the csv datasets. It is meant as a quality control tool, to quickly examine, verify or polish up a datasets before further processing.

      csv_lint_example.png

      There are options to change datetime formats, decimal separators, split columns, count unique values etc. It’s a utility for data cleaning and working with messy data files. The features as of v0.4.1:

      • Scan for metadata, i.e. detect columns and datatypes
      • Create schema.ini based on metadata
      • Validate data against schema.ini
      • Convert datetime/decimal values to different formats
      • Convert between comma, semicolon, tab separated, fixed width formats
      • Analyse data report
      • Split valid/invalid values into two separate columns
      • Other split column options (see screenshot)
      • Count unique values of one or more columns
      • Convert csv data to SQL insert script

      Let me know what you think. Any ideas for additional features?

      1 Reply Last reply Reply Quote 6
      • Bas de ReuverB
        Bas de Reuver
        last edited by Bas de Reuver

        CSV Lint v0.4.3 was updated yesterday. Bit of a false start with the previous version, sorry about that, it gave an error message CSVLint.xml is missing when installing, this is now fixed. Thanks to everyone who gave feedback.

        You’ll need Notepad++ v8.1.9.1 (latest version) to install the plugin through the Plugins > Plugins Admin... menu item. For older versions you can install the new plugin by copying the .DLL manually.

        Also, in this update

        • Fluent UI and dark mode icons
        • 4 color presets, accessible in Settings
        • All user dialogs settings are saved now
        • Reformat, new Vertically align option
        • New SQL export setting, mySQL or MS-SQL
        • Documentation & help (link in About dialog)
        • Various bugfixes
        1 Reply Last reply Reply Quote 4
        • Bas de ReuverB
          Bas de Reuver
          last edited by

          CSV Lint v0.4.4 was updated today, it now supports large integer and decimal values. This means values with many digits, like 12 or more, can be auto-detected and validated as integer. For the rest it’s mostly minor improvements and bugfixes.

          1 Reply Last reply Reply Quote 3
          • Bas de ReuverB
            Bas de Reuver
            last edited by

            I noticed that some people use the CSV Lint only for the syntax highlighting colors, and they don’t seem to be aware of the other features.

            That’s why I’ve created a video to show how you can use this plug-in to validate data, reformat datetime values, split column functions etc.

            CSV data in Notepad++

            _ C64 CUSTOMS __ Alan KilbornA Lycan ThropeL 3 Replies Last reply Reply Quote 6
            • _ C64 CUSTOMS __
              _ C64 CUSTOMS _ @Bas de Reuver
              last edited by

              @bas-de-reuver How do you format this fixed column width like shown in the Youtube thumbnail?

              Bas de ReuverB 2 Replies Last reply Reply Quote 0
              • Bas de ReuverB
                Bas de Reuver @_ C64 CUSTOMS _
                last edited by Bas de Reuver

                @_-c64-customs-_ I realise now I didn’t show that in the video 😅 Open your csv file and then go to the menu Plugins > CSV Lint > CSV Lint window, then click the Reformat button and in the dialog check the Align vertically checkbox and press OK.

                Btw make sure that the metadata (in the lower left textbox) is correct before you do this, else you get a weird result. If there is just one column Col1=Textfile Text Width 9999 then maybe press the “Refresh from data” button first.

                _ C64 CUSTOMS __ 1 Reply Last reply Reply Quote 1
                • Bas de ReuverB
                  Bas de Reuver @_ C64 CUSTOMS _
                  last edited by

                  @_-c64-customs-_ Also, there is a small bug at the moment which was already mentioned, it doesn’t vertically align the column headers (the first line) correctly. This will be fixed in the next update.

                  _ C64 CUSTOMS __ 1 Reply Last reply Reply Quote 1
                  • _ C64 CUSTOMS __
                    _ C64 CUSTOMS _ @Bas de Reuver
                    last edited by

                    @bas-de-reuver Great, thank you very much for the info and thank you very much for the plugin at all, forgot to say this in my first post! Great work!

                    1 Reply Last reply Reply Quote 2
                    • _ C64 CUSTOMS __
                      _ C64 CUSTOMS _ @Bas de Reuver
                      last edited by

                      @bas-de-reuver It looks like there is another small bug, not all lines are aligned equally:
                      c1ca78e9-941b-48b1-9e6b-a6b742762453-image.png

                      PeterJonesP 1 Reply Last reply Reply Quote 1
                      • PeterJonesP
                        PeterJones @_ C64 CUSTOMS _
                        last edited by PeterJones

                        @_-c64-customs-_

                        My guess that the problem isn’t with the plugin not aligning things. The problem is that you don’t have your encoding correct, so Notepad++ is displaying the n-byte sequence for unicode characters as multiple characters instead of properly displaying the single character represented by those multiple bytes. You might try Settings > Preferences > New Document > Encoding > UTF-8: Apply to opened ANSI files

                        860006b9-dfe7-4738-b6bb-cd29ba5c5150-image.png

                        … which should make Notepad++ use UTF-8 instead of mis-reading that as an Win-1252 file even though it should be UTF-8.

                        –
                        Addendum: specifically, ä are the Win-1252 interpretation of the two bytes at 0xC3 (decimal 195) and 0xA4 (decimal 164). The sequence 0xC3 0xA4 is the two-byte UTF-8 sequence for ä. Once you load a file and it’s misinterpreted the UTF-8 as Win-1252, the easiest way to fix things is to change from Encoding > ANSI to Encoding > UTF-8 (use the top half of the menu, not the bottom “convert” half, because you don’t want to convert the file to a different byte sequence, you just want to change how Notepad++ interprets the bytes it’s already read)
                        47417bab-6594-410f-9e1e-5fd31d46c76a-image.png

                        _ C64 CUSTOMS __ Bas de ReuverB 2 Replies Last reply Reply Quote 3
                        • _ C64 CUSTOMS __
                          _ C64 CUSTOMS _ @PeterJones
                          last edited by

                          @peterjones Ah, great, thanks a lot. I have set the preferences as shown above by you but it somehow hasn’t worked but your hint about changing the encoding afterwards has worked very well!

                          1 Reply Last reply Reply Quote 2
                          • Bas de ReuverB
                            Bas de Reuver @PeterJones
                            last edited by

                            @peterjones That is indeed the cause of the problem, thanks for the thorough answer👍

                            It has to do with ANSI<->UTF8 conversion, and the letters with diacritics (à é ü etc.) are incorrectly split into two characters, “Bünder Voll” becomes “Bünder Voll”.

                            When the CSVLint plug-in reformats the csv file, I use a StringBuilder to create the new reformatted file content:

                            StringBuilder datanew = new StringBuilder();
                            
                            // .. do the reformat
                            
                            // update text in editor
                            scintillaGateway.SetText(datanew.ToString());
                            

                            Right up until the line SetText(datanew.ToString()) the StringBuilder contains the correct “Bünder Voll” etc. values. So I suspect the problem is in scintillaGateway.SetText(?).

                            It’s good that there is a work-around, but I don’t know how to fix this properly.

                            PeterJonesP rdipardoR 2 Replies Last reply Reply Quote 0
                            • PeterJonesP
                              PeterJones @Bas de Reuver
                              last edited by

                              @bas-de-reuver ,

                              I cannot give you the answer, because I don’t know the StringBuilder or scintillaGateway requirements. But my guess is that .ToString() is outputting the data in a different encoding than .SetText() wants coming in, hence the corruption of the encoding. Maybe look into the APIs of both, and see if either come with an option to change the encoding. Or if they don’t have that, then maybe you need some translator in between: scintillaGateway.SetText(translate(datanew.ToString()))

                              Sorry I cannot be more specific than that.

                              1 Reply Last reply Reply Quote 2
                              • rdipardoR
                                rdipardo @Bas de Reuver
                                last edited by

                                @bas-de-reuver,

                                Right up until the line SetText(datanew.ToString()) the StringBuilder contains the correct “Bünder Voll” etc. values.

                                That’s because the API detects the file encoding for you:

                                  /// <summary>
                                  /// Reads the whole document as a text stream, trying to use the right encoding
                                  /// </summary>
                                  public static StreamReader StreamAllText()
                                  {
                                    var doc = PluginBase.CurrentScintillaGateway;
                                    var codepage = doc.GetCodePage();
                                    var encoding = codepage == (int)SciMsg.SC_CP_UTF8 ? Encoding.UTF8 : Encoding.Default;
                                    return new StreamReader(StreamAllRawText(), encoding);
                                  }
                                

                                Problem is, a StringBuilder is just a simple utility with no encoding property that you can set, so the text returned by ToString() will be encoded in the system’s default (usually single-byte) code page.

                                Creating a StreamWriter with the StreamWriter(Stream, Encoding) overload would be more useful. The second parameter could be set by calling scintillaGateway.GetCodePage() and choosing an appropriate System.Text.Encoding based on the return value (as in the API method shown above). Scintilla doesn’t declare unique constants for every possible encoding; SC_CP_UTF8 really stands for “Unicode,” i.e., any multi-byte encoding.

                                If want to keep the simplicity of StringBuilder, you could always reduce the reformatted text to bytes, encode each one, then recompose them into a string, like this:

                                              StringBuilder datanew = new StringBuilder();
                                
                                              // ... do the reformat
                                
                                             /// try to match the file encoding of the open buffer
                                             /// <seealso cref="CsvQuery.PluginInfrastructure.ScintillaStreams.StreamAllText"/>
                                             Encoding docEncoding =
                                                 scintillaGateway.GetCodePage() == (int)SciMsg.SC_CP_UTF8
                                                 ? Encoding.UTF8
                                                 : Encoding.Default;
                                
                                             // update text in editor
                                             var byteBuf = new char[datanew.Length];
                                             datanew.CopyTo(0, byteBuf, 0, datanew.Length);
                                             var dataBytes = docEncoding.GetBytes(byteBuf);
                                             scintillaGateway.SetText(docEncoding.GetString(dataBytes));
                                

                                Note The fallback choice of System.Text.Encoding.Default is just for illustration. It’s not recommended in practice on .NET Framework. Besides, every character in the ASCII code page fits inside the CLR Char type (which is always UTF-16).

                                1 Reply Last reply Reply Quote 5
                                • Alan KilbornA
                                  Alan Kilborn @Bas de Reuver
                                  last edited by

                                  @bas-de-reuver said in [New Plugin] CSV Lint:

                                  That’s why I’ve created a video to show how you can use this plug-in to validate data, reformat datetime values, split column functions

                                  I got around to watching the video. Very nice intro to the plugin!

                                  1 Reply Last reply Reply Quote 3
                                  • Lycan ThropeL
                                    Lycan Thrope @Bas de Reuver
                                    last edited by Lycan Thrope

                                    @bas-de-reuver ,

                                    I had watched your video earlier, but being involved elsewhere with my developing UDL and associated files needing to be done, I didn’t get to really appreciate what it was offering. However, now that the language is mostly done, for now, I started going back to a project of mine that has been “slow-rolling” and started working on it. One of the things that I was trying to do was break down what were fledgling attempts at a quick database that was huge. The data was all needed, I just didn’t take the time to break them into smaller usable entities while I was making a quick app for data entry, viewing, searching, etc.

                                    I needed to clean up, and I was able to separate in dBASE some of the table information, in this case, customers (actually shippers and receivers but am combining their information under just customers) and I needed to clean up and split a field. I could have probably done it in my environment, but decided to take the time to see if I could use your plug-in to do some of the work and simplify the cleanup. It worked beautifully, and although I could accomplish it by not converting it to CSV, it was so much simpler just to convert the data and split and clean it up via your plugin.

                                    I just wanted to thank you for developing this plugin, and making the video, that, although I didn’t understand all of the capabilities you were mentioning about it at the time, I figured it couldn’t hurt to play with it a little, and I’m very happy I did. Thanks for doing the plugin and video. Keep up the good work. :)

                                    Bas de ReuverB 1 Reply Last reply Reply Quote 6
                                    • Bas de ReuverB
                                      Bas de Reuver @Lycan Thrope
                                      last edited by

                                      @lycan-thrope said in [New Plugin] CSV Lint:

                                      It worked beautifully

                                      Cool, that is also the goal for this plugin; save time by making the inspecting and cleaning of data easier. So thanks 😀 that’s nice to hear you found it useful.

                                      1 Reply Last reply Reply Quote 2
                                      • TanquenT
                                        Tanquen
                                        last edited by

                                        I’m not able to get this plugin to work. I need to make sure it’s not adding any text/data. I just want to make the CSV data easier to read.
                                        My CSV has a large number of columns and different headers every few rows. It defaults to FixedLength but after changing to CSVDelimited it just adds the text “XML” at the top and nothing changes.

                                        Format=FixedLength
                                        ColNameHeader=False
                                        Col1=XML Text Width 9999

                                        Format=CSVDelimited
                                        ColNameHeader=False
                                        Col1=XML Text Width 9999

                                        Bas de ReuverB 1 Reply Last reply Reply Quote 1
                                        • T SwitzerT
                                          T Switzer
                                          last edited by

                                          any update planned to update CSV Lint to work with current version of notepad ++

                                          Lycan ThropeL Bas de ReuverB 2 Replies Last reply Reply Quote 1
                                          • Lycan ThropeL
                                            Lycan Thrope @T Switzer
                                            last edited by

                                            @t-switzer ,
                                            There already is, but since you haven’t posted which version of NPP you’re using, the assumption is that it is the latest version, and yes, there is an update for it. At present, you’ll need to delete the current version in the plugin folders, or it won’t allow the new NPP to start. Then after you get it started, you can install the newest plugin via the Plugin manager, or go to this site and download it yourself for a self install: CSVList Github page

                                            1 Reply Last reply Reply Quote 2
                                            • First post
                                              Last post
                                            The Community of users of the Notepad++ text editor.
                                            Powered by NodeBB | Contributors