[New Plugin] CSV Lint
-
CSV Lint v0.4.4 was updated today; it now supports large integer and decimal values. This means values with many digits, like 12 or more, can be auto-detected and validated as integers. For the rest it's mostly minor improvements and bugfixes.
-
-
@bas-de-reuver How do you format this fixed column width as shown in the YouTube thumbnail?
-
@_-c64-customs-_ I realise now I didn't show that in the video. Open your csv file and then go to the menu Plugins > CSV Lint > CSV Lint window, then click the Reformat button and in the dialog check the Align vertically checkbox and press OK.
Btw make sure that the metadata (in the lower left textbox) is correct before you do this, else you get a weird result. If there is just one column
Col1=Textfile Text Width 9999
then maybe press the "Refresh from data" button first.
-
@_-c64-customs-_ Also, there is a small bug at the moment which was already mentioned: it doesn't vertically align the column headers (the first line) correctly. This will be fixed in the next update.
-
@bas-de-reuver Great, thank you very much for the info and thank you very much for the plugin at all, forgot to say this in my first post! Great work!
-
@bas-de-reuver It looks like there is another small bug, not all lines are aligned equally:
-
My guess is that the problem isn't with the plugin not aligning things. The problem is that you don't have your encoding correct, so Notepad++ is displaying the n-byte sequence for Unicode characters as multiple characters instead of properly displaying the single character represented by those multiple bytes. You might try Settings > Preferences > New Document > Encoding > UTF-8: Apply to opened ANSI files
… which should make Notepad++ use UTF-8 instead of misreading that as a Win-1252 file even though it should be UTF-8.
Addendum: specifically, Ã¤ is the Win-1252 interpretation of the two bytes 0xC3 (decimal 195) and 0xA4 (decimal 164). The sequence 0xC3 0xA4 is the two-byte UTF-8 sequence for ä. Once you load a file and it's misinterpreted the UTF-8 as Win-1252, the easiest way to fix things is to change from Encoding > ANSI to Encoding > UTF-8 (use the top half of the menu, not the bottom "convert" half, because you don't want to convert the file to a different byte sequence, you just want to change how Notepad++ interprets the bytes it's already read).
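The byte-level explanation above can be checked with a few lines of C#. This is only a sketch: the class and method names are made up for illustration, and ISO-8859-1 is used as a stand-in for Win-1252 because it is available on every .NET runtime without extra setup (the two code pages agree on the bytes 0xC3 and 0xA4).

```csharp
using System;
using System.Text;

// Hypothetical demo class, not part of Notepad++ or CSV Lint.
public static class MojibakeDemo
{
    // Interpret the bytes as UTF-8 (the correct reading for this file).
    public static string DecodeUtf8(byte[] bytes)
    {
        return Encoding.UTF8.GetString(bytes);
    }

    // Interpret each byte as one character. ISO-8859-1 stands in for
    // Win-1252 here; both map 0xC3 to U+00C3 and 0xA4 to U+00A4.
    public static string DecodeSingleByte(byte[] bytes)
    {
        return Encoding.GetEncoding("iso-8859-1").GetString(bytes);
    }

    // Render a string as a list of code points, to avoid console encoding issues.
    public static string CodePoints(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
        {
            sb.Append("U+").Append(((int)c).ToString("X4")).Append(' ');
        }
        return sb.ToString().TrimEnd();
    }

    public static void Main()
    {
        byte[] raw = { 0xC3, 0xA4 };
        Console.WriteLine(CodePoints(DecodeUtf8(raw)));       // U+00E4 (one character)
        Console.WriteLine(CodePoints(DecodeSingleByte(raw))); // U+00C3 U+00A4 (two characters)
    }
}
```

The bytes on disk are identical in both cases; only the interpretation differs, which is why switching the Encoding menu entry (without converting) is enough to repair the display.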
-
@peterjones Ah, great, thanks a lot. I have set the preferences as shown above by you but it somehow hasn't worked, but your hint about changing the encoding afterwards has worked very well!
-
@peterjones That is indeed the cause of the problem, thanks for the thorough answer.
It has to do with ANSI<->UTF8 conversion, and the letters with diacritics (à é ü etc.) are incorrectly split into two characters, "Bünder Voll" becomes "BÃ¼nder Voll".
When the CSVLint plug-in reformats the csv file, I use a StringBuilder to create the new reformatted file content:
    StringBuilder datanew = new StringBuilder();
    // .. do the reformat
    // update text in editor
    scintillaGateway.SetText(datanew.ToString());
Right up until the line SetText(datanew.ToString()) the StringBuilder contains the correct "Bünder Voll" etc. values. So I suspect the problem is in scintillaGateway.SetText (?).
It's good that there is a work-around, but I don't know how to fix this properly.
-
I cannot give you the answer, because I don't know the StringBuilder or scintillaGateway requirements. But my guess is that .ToString() is outputting the data in a different encoding than .SetText() wants coming in, hence the corruption of the encoding. Maybe look into the APIs of both, and see if either comes with an option to change the encoding. Or if they don't have that, then maybe you need some translator in between:
    scintillaGateway.SetText(translate(datanew.ToString()))
Sorry I cannot be more specific than that.
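To make the "different encoding going out than coming in" idea concrete, here is a minimal sketch of how the same .NET string turns into different bytes depending on the encoding chosen. Everything named here (EncodingChoiceDemo, PickEncoding, ToDocumentBytes, the documentIsUtf8 flag) is hypothetical, and the sketch makes no assumption about what scintillaGateway.SetText actually accepts. ISO-8859-1 stands in for a single-byte ANSI code page.

```csharp
using System;
using System.Text;

// Hypothetical demo class; it does not call any Notepad++ or Scintilla API.
public static class EncodingChoiceDemo
{
    // Pick an encoding based on a flag the caller would derive from the document.
    public static Encoding PickEncoding(bool documentIsUtf8)
    {
        return documentIsUtf8 ? Encoding.UTF8 : Encoding.GetEncoding("iso-8859-1");
    }

    // Turn the in-memory string into the bytes the document is expected to hold.
    public static byte[] ToDocumentBytes(string text, bool documentIsUtf8)
    {
        return PickEncoding(documentIsUtf8).GetBytes(text);
    }

    public static void Main()
    {
        string text = "B\u00FCnder Voll"; // "Bünder Voll"

        // UTF-8: the u-umlaut becomes two bytes (C3 BC), 12 bytes in total.
        Console.WriteLine(BitConverter.ToString(ToDocumentBytes(text, true)));

        // Single-byte code page: the u-umlaut is one byte (FC), 11 bytes in total.
        Console.WriteLine(BitConverter.ToString(ToDocumentBytes(text, false)));
    }
}
```

If the bytes are produced with one encoding and the editor reads them with the other, the two-byte sequence C3 BC shows up as two separate characters, which matches the corruption described above.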
-
Right up until the line SetText(datanew.ToString()) the StringBuilder contains the correct "Bünder Voll" etc. values.
That's because the API detects the file encoding for you:
    /// <summary>
    /// Reads the whole document as a text stream, trying to use the right encoding
    /// </summary>
    public static StreamReader StreamAllText()
    {
        var doc = PluginBase.CurrentScintillaGateway;
        var codepage = doc.GetCodePage();
        var encoding = codepage == (int)SciMsg.SC_CP_UTF8 ? Encoding.UTF8 : Encoding.Default;
        return new StreamReader(StreamAllRawText(), encoding);
    }
Problem is, a StringBuilder is just a simple utility with no encoding property that you can set, so the text returned by ToString() will be encoded in the system's default (usually single-byte) code page.
Creating a StreamWriter with the StreamWriter(Stream, Encoding) overload would be more useful. The second parameter could be set by calling scintillaGateway.GetCodePage() and choosing an appropriate System.Text.Encoding based on the return value (as in the API method shown above). Scintilla doesn't declare unique constants for every possible encoding; SC_CP_UTF8 really stands for "Unicode," i.e., any multi-byte encoding.
If you want to keep the simplicity of StringBuilder, you could always reduce the reformatted text to bytes, encode each one, then recompose them into a string, like this:
    StringBuilder datanew = new StringBuilder();
    // ... do the reformat

    /// try to match the file encoding of the open buffer
    /// <seealso cref="CsvQuery.PluginInfrastructure.ScintillaStreams.StreamAllText"/>
    Encoding docEncoding = scintillaGateway.GetCodePage() == (int)SciMsg.SC_CP_UTF8 ? Encoding.UTF8 : Encoding.Default;

    // update text in editor
    var byteBuf = new char[datanew.Length];
    datanew.CopyTo(0, byteBuf, 0, datanew.Length);
    var dataBytes = docEncoding.GetBytes(byteBuf);
    scintillaGateway.SetText(docEncoding.GetString(dataBytes));
Note: The fallback choice of System.Text.Encoding.Default is just for illustration. It's not recommended in practice on .NET Framework. Besides, every character in the ASCII code page fits inside the CLR Char type (which is always UTF-16).
-
@bas-de-reuver said in [New Plugin] CSV Lint:
That's why I've created a video to show how you can use this plug-in to validate data, reformat datetime values, split column functions
I got around to watching the video. Very nice intro to the plugin!
-
I had watched your video earlier, but being involved elsewhere with my developing UDL and associated files needing to be done, I didn't get to really appreciate what it was offering. However, now that the language is mostly done, for now, I started going back to a project of mine that has been "slow-rolling" and started working on it. One of the things that I was trying to do was break down what were fledgling attempts at a quick database that was huge. The data was all needed, I just didn't take the time to break them into smaller usable entities while I was making a quick app for data entry, viewing, searching, etc.
I needed to clean up, and I was able to separate in dBASE some of the table information, in this case, customers (actually shippers and receivers but am combining their information under just customers) and I needed to clean up and split a field. I could have probably done it in my environment, but decided to take the time to see if I could use your plug-in to do some of the work and simplify the cleanup. It worked beautifully, and although I could accomplish it by not converting it to CSV, it was so much simpler just to convert the data and split and clean it up via your plugin.
I just wanted to thank you for developing this plugin, and making the video, that, although I didn't understand all of the capabilities you were mentioning about it at the time, I figured it couldn't hurt to play with it a little, and I'm very happy I did. Thanks for doing the plugin and video. Keep up the good work. :)
-
@lycan-thrope said in [New Plugin] CSV Lint:
It worked beautifully
Cool, that is also the goal for this plugin: save time by making the inspecting and cleaning of data easier. So thanks, that's nice to hear you found it useful.
-
I'm not able to get this plugin to work. I need to make sure it's not adding any text/data. I just want to make the CSV data easier to read.
My CSV has a large number of columns and different headers every few rows. It defaults to FixedLength but after changing to CSVDelimited it just adds the text "XML" at the top and nothing changes.
Format=FixedLength
ColNameHeader=False
Col1=XML Text Width 9999

Format=CSVDelimited
ColNameHeader=False
Col1=XML Text Width 9999
-
any update planned to update CSV Lint to work with current version of notepad ++
-
@t-switzer ,
There already is, but since you haven't posted which version of NPP you're using, the assumption is that it is the latest version, and yes, there is an update for it. At present, you'll need to delete the current version in the plugin folders, or it won't allow the new NPP to start. Then after you get it started, you can install the newest plugin via the Plugin manager, or go to this site and download it yourself for a self install: CSV Lint GitHub page
-
@Tanquen said in [New Plugin] CSV Lint:
My CSV has a large number of columns and different headers every few rows. It defaults to FixedLength but after changing to CSVDelimited it just adds the text "XML" at the top and nothing changes.
Thanks for mentioning your issue. It sounds like the plug-in can't recognise this specific data file. I suspect the file includes many < or > characters as well as many , or ; characters or something like that. This can "confuse" the autodetect function so to speak, meaning it can't determine which is the correct separator character, so it doesn't interpret the data and columns correctly.
Is it possible to send the data file to my e-mail address (see About dialog)? If it contains privacy sensitive data or is too large, then maybe edit the file and just include a few lines of data to reproduce this issue?
Btw someone mentioned a similar issue so in a future update I want to add an option where you can (optionally) manually specify the separator character.
In the meantime you can manually construct the metadata, like below.
Format=CSVDelimited
ColNameHeader=False
Col1=Field1 Text Width 50
Col2=Field2 Text Width 50
Col3=Field3 Text Width 50
Col4=Field4 Text Width 50
etc.
Or alternatively, first try to delete the rows (if possible) that are causing trouble, so keep only a few rows with representative data, and then click Refresh from data, and then apply that resulting metadata to the complete file with all the rows.
-
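For a file with many columns, typing the metadata lines by hand gets tedious. As a rough sketch, the layout shown in the post above could be generated like this. The class and method names are hypothetical, the keys (Format, ColNameHeader, ColN) are copied from the example metadata, and the Field names and the width of 50 are the same placeholders used there; this is not code from the plug-in.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of CSV Lint.
public static class MetadataSketch
{
    // Build metadata text for a delimited file without a header row.
    // Every column is declared as Text with the same placeholder width.
    public static string Build(int columnCount, int width = 50)
    {
        var sb = new StringBuilder();
        sb.Append("Format=CSVDelimited\n");
        sb.Append("ColNameHeader=False\n");
        for (int i = 1; i <= columnCount; i++)
        {
            sb.AppendFormat("Col{0}=Field{0} Text Width {1}\n", i, width);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        // Prints the same six lines as the four-column example above.
        Console.Write(Build(4));
    }
}
```

The output can be pasted into the metadata textbox in the CSV Lint window and adjusted per column afterwards.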
@T-Switzer said in [New Plugin] CSV Lint:
any update planned to update CSV Lint to work with current version of notepad ++
Like @Lycan-Thrope mentioned, there is a new CSV Lint v0.4.5.2 which you can manually download from the GitHub page. That version will be included automatically in the Plugin Admin in the upcoming Notepad++ v8.4.3.
It looks like the compatibility issues with the new Lexer v5 are solved now, and I want to wait and see before continuing and adding too many other features to the plug-in.
-