Community
    • Login

    Delete ";" in a description column

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    5 Posts 2 Posters 485 Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Jose Emilio OsorioJ
      Jose Emilio Osorio
      last edited by

      Hi,
      I need to delete the characters “;” that are in a column named “Description”, but the problem is that I have more than 3 columns separated by “;” as well (Code, Quantity, Price).
      Example:
      Code;Description;Quantity;Price
      0001;Chocol;te;12.00;1.50

      I need as follow:
      Code;Description;Quantity;Price
      0001;Chocol te;12.00;1.50

      The code always have 4 characters, the problem is that the description has no fixed number of characters.

      The file has more than a million rows.

      Thanks for your help.

      PeterJonesP 1 Reply Last reply Reply Quote 0
      • PeterJonesP
        PeterJones @Jose Emilio Osorio
        last edited by PeterJones

        @Jose-Emilio-Osorio ,

        In general, it’s better to fix poorly formatted CSV (where “c” means “character”, not “comma”, so “semicolon-separated values” is also a form of CSV) at the source, rather than after the fact.

        If you cannot, and assuming that the rule that I am about to state will work for your data (it won’t work for every data; if there’s ever quotes or escapes that change the rules, my regex will not work), then here’s a go.

        Rule: first semicolon is always valid; last two semicolons are always valid (the penultimate and ultimate); any semicolon between the first semicolon and the penultimate semicolon is assumed invalid, and will be converted into a space character.

        • FIND = ^[^;]*;[^;]*\K;(?=.*;[^;]*;[^;]*$)
        • REPLACE = space character
        • SEARCH MODE = Regular Expression
        • hit Replace All until they are all gone

        This finds the beginning of the line, any number of non-semicolons, then the first semicolon, then any number of non-semicolons, then resets the match. This effectively moves the cursor to just at the second semicolon in your line.

        Then it matches the literal second semicolon in your line.

        The stuff in the parentheses is a lookahead saying, effectively, if there are at least two semicolons later in this line, then the semicolon we just found is not the first, penultimate, or ultimate, so we want to replace it with a space.

        With this version, you have to hit replace all multiple times in some circumstances.

        0001;Chocol;te;12.00;1.50
        0001;C;o;o;a;e;12.00;1.50
        0001;;;;;;;;;;;12.00;1.50
        0001;Chocolate;12.00;1.50
        

        becomes

        0001;Chocol te;12.00;1.50
        0001;C o o a e;12.00;1.50
        0001;         ;12.00;1.50
        0001;Chocolate;12.00;1.50
        

        after 9 Replace Alls

        If there are N+3 semicolons on the “worst” line, it will only require N hits of Replace All

        Ooh, an idea. Using the \G to mean “start from the previous match”, and an alternation,

        • FIND = (^[^;]*;|\G)[^;]*\K;(?=.*;[^;]*;[^;]*$)
        • other parameters the same

        it will now match “after the first semicolon, or after the previous match”, as long as there are still at least two semicolons after the current semicolon. This means that the same data will only require one Replace All to work.

        There are probably edge cases that are not handled in the expressions given. Oh, like what if there are lines that don’t have at least three semicolons… In certain circumstances, you might need to change [^;]* (which says “0 or more non-semicolons”) into [^;\r\n]* (which says “0 or more characters that are neither semicolons nor CR nor LF (EOL sequences)”) – there might be some way that the simpler form would match across multiple lines; with that correction, it wouldn’t. But my existing expressions work with my test data.

        If that’s not sufficient, study the information below, try modifying it yourself, and if you’re having difficulty understanding where it’s going wrong, ask with more details (including what you tried; make sure to include that you tried my exact expression(s) with my sample dummy data set)

        ----

        Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure you examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

        Jose Emilio OsorioJ 1 Reply Last reply Reply Quote 0
        • Jose Emilio OsorioJ
          Jose Emilio Osorio
          last edited by

          Thank you very much. It helps me a lot.

          Now, I just have received another file with a Column added at the beginning and need to replace de “;” in the Description column. (As previous problem)

          Example:
          New line:
          STORE;CODE;DESCRIPTION;QUANTITY;PRICE
          0036;359116;CHOCOLA;E;0.000;4.993028
          Must be:
          STORE;CODE;DESCRIPTION;QUANTITY;PRICE
          0036;359116;CHOCOLA E;0.000;4.993028

          What do I have to modify on the regex that you sent before ?

          Thanks for your help.

          Jose Emilio OsorioJ 1 Reply Last reply Reply Quote 0
          • Jose Emilio OsorioJ
            Jose Emilio Osorio @PeterJones
            last edited by

            This post is deleted!
            1 Reply Last reply Reply Quote 0
            • Jose Emilio OsorioJ
              Jose Emilio Osorio @Jose Emilio Osorio
              last edited by

              @PeterJones
              Hi Peter,
              After a lot of tries, it works with the following expression:

              (^[^;];[^;];|\G)[^;]\K;(?=.;[^;];[^;]$)

              Thank you very much.

              1 Reply Last reply Reply Quote 2
              • First post
                Last post
              The Community of users of the Notepad++ text editor.
              Powered by NodeBB | Contributors