Community
    • Login

    Can a CSV file be displayed and edited in a table fashion. Excel mangles it.

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    10 Posts 6 Posters 17.8k Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • MyNameIsTooShortM
      MyNameIsTooShort
      last edited by

      I have to deal with long strings of CSV data. If I use Excel to do so it “helps” by stripping doublequotes from the file rendering it useless.
      The issue is that I need to see it in a table format to examine the data in aligned columns like Excel does. Can Notepad++ do this ?

      Below is an example of 2 lines of this data.

      “DI”,“A-DI”,“”,“Digital Placeholder for Dynamo Creation”,“SIM”,“”,“1:0”,“AUTO”,“ON”,“1”,“NO”,“OPEN”,“CLOSE”,“DISABLE”,“NONE”,“LOW”,“NONE”,“DISABLE”,“NONE”,“NONE”,“NONE”,“YES”,“Other”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“NONE”,“YES”,“NO”,“REJECT”,“NO”,“1,000”,“300,000”,“0”,“NO”,“NO”,“0”,“”,“NO”,“5,000.0”,“0.0”,“Milliseconds”,“ENABLE”,“0.0000”,“Absolute”,“0”,“ENABLE”,“CHOICES”
      “DI”,“AAHH_195”,“”,“Ozone Gen Room Hi Amb Ozone ALARM”,“IDS”,“”,“OzGen.33.ALM_AMBIENT_OZONE_HIGH”,“AUTO”,“ON”,“1”,“NO”,“OK”,“ALARM”,“ENABLE”,“NONE”,“CRITICAL”,“CLOSE”,“DISABLE”,“NONE”,“NONE”,“NONE”,“NO”,“Ozone”,“SMS_CRIT”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“”,“Concentration over .3 ppm - Ventilation Started - Generators Shut Down”,“”,“NONE”,“YES”,“NO”,“REJECT”,“NO”,“1,000”,“300,000”,“0”,“NO”,“NO”,“0”,“”,“NO”,“5,000.0”,“0.0”,“Milliseconds”,“DISABLE”,“0.0000”,“Absolute”,“0”,“ENABLE”,“CHOICES”

      CoisesC 1 Reply Last reply Reply Quote 0
      • CoisesC
        Coises @MyNameIsTooShort
        last edited by

        @MyNameIsTooShort said in Can a CSV file be displayed and edited in a table fashion. Excel mangles it.:

        I have to deal with long strings of CSV data. If I use Excel to do so it “helps” by stripping doublequotes from the file rendering it useless.
        The issue is that I need to see it in a table format to examine the data in aligned columns like Excel does. Can Notepad++ do this ?

        Notepad++ cannot do this without adding a plugin, but a plugin I’m developing, called Columns++, can help you.

        In my experience this plugin is stable; however, it’s not yet in Plugins Admin and it has not yet had extensive real-world testing. Be sure you have a safety copy of your data, just in case. The link above goes to the latest release on GitHub; there you will find a link to a “quick installer” which you can use if you have Notepad++ installed in the default location.

        The steps would be to use Convert separated values to tabs…, enable Elastic tabstops, view and edit as needed and then use Convert tabs to separated values… to restore the original format.

        You will want to be sure Preserve quotes, escapes and blanks when converting to tabbed is checked when you convert.

        MyNameIsTooShortM 1 Reply Last reply Reply Quote 2
        • MyNameIsTooShortM
          MyNameIsTooShort @Coises
          last edited by

          That’s a great plugin. It worked so far just as advertised and will be a great help with locating data.
          It doesn’t however treat the data as columns. i.e. I can’t select a column of data to find/replace like you can in Excel - But this is not at all unexpected and probably too much to hope for in a text editor. But worth mentioning in case you had that trick in the queue.
          Because your plugin still put these two in columns while still missing the doublequotes… I’m not sure yet if this is an artifact of your plugin or the entire reason I am having trouble with this file… but I expect its the latter and I will follow up. Thanks again.
          Well I tried to post a pic to explain above, but I have no reputation yet so I can’t.

          Michael VincentM CoisesC Thomas KnoefelT 4 Replies Last reply Reply Quote 1
          • Michael VincentM
            Michael Vincent @MyNameIsTooShort
            last edited by

            @MyNameIsTooShort said in Can a CSV file be displayed and edited in a table fashion. Excel mangles it.:

            It doesn’t however treat the data as columns. i.e. I can’t select a column of data to find/replace like you can in Excel

            If you want that and understand SQL, have a look at CSVQuery.

            Cheers.

            1 Reply Last reply Reply Quote 1
            • CoisesC
              Coises @MyNameIsTooShort
              last edited by

              @MyNameIsTooShort said in Can a CSV file be displayed and edited in a table fashion. Excel mangles it.:

              I can’t select a column of data to find/replace like you can in Excel

              Notepad++ can make rectangular selections. Click at one corner of the rectangle you want to select; then Alt+Shift+click at the diagonally opposite corner. See the help on Rectangular selections for a bit more information. (One tip is that if you make an ordinary selection in a single line covering the column or columns you want, when a rectangular selection is needed, Columns++ will offer to extend your selection vertically. That makes it easy to select a column, and to start on the line after the headers if that’s want you want.)

              The native search in Notepad++ can’t find or replace in rectangular selections, but the Search… function in Columns++ can. (Strictly speaking, it creates a search region from a rectangular selection — a complexity I didn’t like, but found necessary to make everything consistent and usable.)

              1 Reply Last reply Reply Quote 2
              • CoisesC
                Coises @MyNameIsTooShort
                last edited by

                @MyNameIsTooShort said in Can a CSV file be displayed and edited in a table fashion. Excel mangles it.:

                Because your plugin still put these two in columns while still missing the doublequotes… I’m not sure yet if this is an artifact of your plugin or the entire reason I am having trouble with this file… but I expect its the latter and I will follow up.

                I’m not yet sure what you mean, but if it is stripping the double quotes when converting from comma-separated values to tabs and you don’t want that, be sure the box Preserve quotes, escapes and blanks when converting to tabbed is checked.

                If you do not have that box checked, it will remove the quotes in your example when converting to tabbed. It won’t put them back when you convert back to comma-separated values because they aren’t required in a standards-compliant CSV file. I don’t have an option to quote all entries (though you could do that with regular expressions) — unless it interferes with your editing, it’s best to leave the quotes in place by checking the box if you need them.

                If it’s removing quotes when you checked that box, there is a bug.

                1 Reply Last reply Reply Quote 3
                • Mark OlsonM
                  Mark Olson
                  last edited by Mark Olson

                  CSVLint is also good. It can handle some pretty pathological files.

                  Lastly, here’s a regular expression for matching a single column in any delimiter-separated variables file (replace all NEWLINE with your newline (\r, \n, or \r\n), QUOTE_CHAR with your quote character, and DELIMITER with your delimiter.
                  (?'column'(?:(?!QUOTE_CHAR)(?:(?!NEWLINE)[^DELIMITER])*(?<!QUOTE_CHAR)|QUOTE_CHAR(?:[^QUOTE_CHAR]|\\QUOTE_CHAR)*QUOTE_CHAR))
                  Putting it into practice, here’s that regex used to match all seven columns in separate capture groups for a 7-column CSV file with " as quote character and \r\n newline:
                  ^(?'column'(?:(?!")(?:(?!\r\n)[^,])*(?<!")|"(?:[^"]|\\")*")),((?&column)),((?&column)),((?&column)),((?&column)),((?&column)),((?&column))$

                  You can verify that a regex works for capturing 7 columns of a CSV file by testing if find/replacing your regex with (?1a),(?2b),(?3c),(?4d),(?5e),(?6f),(?7g) replaces every row of the file with a,b,c,d,e,f,g

                  1 Reply Last reply Reply Quote 1
                  • guy038G
                    guy038
                    last edited by guy038

                    Hello, @mark-olson and All,

                    I think I succeeded to simplify your generic regex. To be convinced of this fact, the magic regex, when QUOTE_CHAR = " and DELIMITER_CHAR = , is :

                    • MARK (?:(?<=^")|(?<=","))[^\r\n]*?(?=","|"$)

                    Now, test this regex against the text below, with 63 examples, pasted in a new tab

                    # With a COMMA or DOUBLE QUOTES char in MIDDLE of column :
                    
                    "abc,def","789"
                    "abc"def","789"
                    "123","abc,def","789"
                    "123","abc"def","789"
                    "123","abc,def"
                    "123","abc"def"
                    
                    # With a COMMA or DOUBLE QUOTES char at BEGINNING of column :
                    ",def","789"
                    ""def","789"
                    "123",",def","789"
                    "123",""def","789"
                    "123",",def"
                    "123",""def"
                    
                    # With a COMMA or DOUBLE QUOTES char at END of column :
                    "abc,","789"
                    "abc"","789"
                    "123","abc,","789"
                    "123","abc"","789"
                    "123","abc,"
                    "123","abc""
                    
                    # With a COMMA or DOUBLE QUOTES char as a COMPLETE column :
                    ",","789"
                    ""","789"
                    "123",",","789"
                    "123",""","789"
                    "123",","
                    "123","""
                    
                    # With an EMPTY column :
                    "","789"
                    "123","","789"
                    "123",""
                    
                    # SINGLE column cases :
                    ""
                    "abc"
                    ",def"
                    "abc,def"
                    "abc,"
                    ","
                    ""def"
                    "abc"def"
                    "abc""
                    """
                    
                    # MICELLANEOUS SINGLE columns :
                    "ab""cd"
                    "ab",cd"
                    "ab,"cd"
                    "ab,"",cd"
                    "ab",,"cd"
                    "ab,""",cd"
                    "ab",,,"cd"
                    "ab"",,cd"
                    "ab,,""cd"
                    
                    """abcd"
                    "",abcd"
                    ","abcd"
                    ","",abcd"
                    "",,"abcd"
                    ",""",abcd"
                    "",,,"abcd"
                    """,,abcd"
                    ",,""abcd"
                    
                    "abcd"""
                    "abcd","     # KO
                    "abcd,""
                    "abcd,"","   # KO
                    "abcd",,""
                    "abcd,""","  # KO
                    "abcd",,,""
                    "abcd"",,"
                    "abcd,,"""
                    
                    • As you can see, almost all the cases are correctly detected

                    • Empty strings are also matched with the zero length match call-tip

                    • Three columns ONLY resist ( as it ends with the "," string ). However, we should live with these very special cases !

                    • The regex always get the total content of each column, minus the 3 cases, mentioned above !

                    • NO need to escape a possible QUOTE_CHAR or DELIMITER_CHAR ( as \" or \, ) in the column contents, in order to get the all its content ;-))


                    So, for a seven columns CSV table exactly, we need this regex ( I shortened the column name to col ) :

                    • SEARCH "(?'col'(?:(?:(?<=^")|(?<=","))[^\r\n]*?(?=","|"$)))","((?&col))","((?&col))","((?&col))","((?&col))","((?&col))","((?&col))"

                    Test it against the following text :

                    "abcd","efgh","ijkl","mnop","qrst","uvwx","yz01"
                    ",abcd","efgh","ij,kl","mnop","qrst,","uvwx",","
                    ""abcd","efgh","ij"kl","mnop","qrst"","uvwx","""
                    "","","","","","",""
                    

                    Thus, the corresponding generic regex would be :

                    "(?'col'(?:(?:(?<=^QUOTE_CHAR)|(?<=QUOTE_CHARDELIMITER_CHARQUOTE_CHAR))[^NEW_LINE_CHAR(s)]*?(?=QUOTE_CHARDELIMITER_CHARQUOTE_CHAR|QUOTE_CHAR$)))","((?&col))","((?&col))","((?&col))","((?&col))","((?&col))","((?&col))"

                    Best Regards,

                    guy038

                    Mark OlsonM 1 Reply Last reply Reply Quote 0
                    • Thomas KnoefelT
                      Thomas Knoefel @MyNameIsTooShort
                      last edited by Thomas Knoefel

                      @MyNameIsTooShort said in Can a CSV file be displayed and edited in a table fashion. Excel mangles it.:

                      It doesn’t however treat the data as columns. i.e. I can’t select a column of data to find/replace like you can in Excel - But this is not at all unexpected and probably too much to hope for in a text editor. But worth mentioning in case you had that trick in the queue.

                      You can also give the MultiReplace Plugin a try, ? It allows you to filter find and replace operations by a specific CSV column scope, offering precise editing within your CSV files.

                      1 Reply Last reply Reply Quote 1
                      • Mark OlsonM
                        Mark Olson @guy038
                        last edited by Mark Olson

                        @guy038
                        Your regex is better if everything is quoted, but that’s not really what I meant.

                        In most CSV files, a column is only wrapped in quotes if it contains the separator or a newline.

                        3223309f-ecd6-4839-adbd-c3ff8c9fe1db-image.png

                        In the above example (pretty colors courtesy of CSVLint), most columns aren’t wrapped in quotes except column 3 of row 5 and column 2 of row 6. The other part of my regex is the (?:[^"]|\\")* alternation inside the quotes, which is necessary to allow an escaped quote character to be present inside quotes.

                        Using my column regex (?'column'(?:(?!")(?:(?!\r\n)[^,])*(?<!")|"(?:[^"]|\\")*")) converts

                        nums,names,cities,date,zone,subzone,contaminated
                        nan,Bluds,BUS,,1,b,FALSE
                        0.5,dfsd,FUDG,12/13/2020 0:00,2,c,TRUE
                        1.2,qere,GOLAR,,3,f,TRUE
                        3.4,flodt,"comma,in \"column\"",,4,q,FALSE
                        4.6,"newline
                        in column",QUZ,10/17/2014 0:00,5,"",TRUE
                        7,Unyir,MOKJI,5/11/2017 0:00,6,i,TRUE
                        

                        to seven lines of a,b,c,d,e,f,g as expected.

                        1 Reply Last reply Reply Quote 3
                        • First post
                          Last post
                        The Community of users of the Notepad++ text editor.
                        Powered by NodeBB | Contributors