Community
    • Login

    Advanced replace including right trim

    Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
    8 Posts 5 Posters 121 Views
    Loading More Posts
    • Oldest to Newest
    • Newest to Oldest
    • Most Votes
    Reply
    • Reply as topic
    Log in to reply
    This topic has been deleted. Only users with topic management privileges can see it.
    • Mike AlbersM
      Mike Albers
      last edited by

      Hi,
      I have read many posts on the subject but cannot find the desired solution.
      issue is as follows:
      I have a CSV-file that i want to use as an external table in Oracle.
      It is formatted like this:
      <values Field01>;<values Field02>;<values Field03>; <values Field04>
      In fact there are more fields, but my issue is with the last field.
      Field 1 to 3 are defined as Char (255) fields in the oracle external table.
      The last field, Field04, is a memo field that should be max. 4000 characters in the interface CSV-file, but probably due to codeset the last field is way over 4000 Characters. (according to Excel)

      I need to find the pattern like "n-characters field01"Semicolon"n-characters field02"Semicolon"n-characters field03"Semicolon
      and trim the rest after the last semicolon to 4000 characters to let it fit in the Oracle table.
      so field 1 to 3 with variable length + 4000 additional characters for field04.

      Hope that there’s a Notepad++Wizard here that can solve this.

      Thanks in advance!
      Mike

      Mark OlsonM 1 Reply Last reply Reply Quote 0
      • Mark OlsonM
        Mark Olson @Mike Albers
        last edited by

        @Mike-Albers
        Pretty sure using the find/replace form to replace ([^\r\n;]{4000})[^\r\n;]+$ with $1 would solve your issue.

        Mike AlbersM 2 Replies Last reply Reply Quote 2
        • Mike AlbersM
          Mike Albers @Mark Olson
          last edited by

          @Mark-Olson

          Thank you i will try it out but i am trying to understand the solution too. Can you please explain it a bit?
          As far as i understand you do a search with negative on carriage return linefeed and then a semicolon. But how will that find the last semicolon? And how will it find the following string with a maximum of 4000 characters even when it can be far less or bigger then 4000. Or does a negative search mean that it goes backwards from the end.
          Want to learn from your solution. Not copy/paste it like a monkey. :-)

          1 Reply Last reply Reply Quote 1
          • Mike AlbersM
            Mike Albers @Mark Olson
            last edited by Mike Albers

            @Mark-Olson

            Hi Mark,

            I tested the solution but it is not working. Zero replacements.

            But i forgot to mention that in the last field before \r\n there can be several \n characters in the tekst string.
            In that memo field people entered some tekst or uploaded e-mails etc… So the format is unpredictable. The record should be seen as one line that ends with carriage return linefeed.
            That omission on my part might be why it is not working…

            I also tried to put all the records on their own line by doing the following search/replace first;
            \n := \t\n wich resulted in \t\n in the memo field ending with \r\t\n.
            then
            \r\t\n := \r\n
            then
            \t\n := \t
            After which all the fields were in their own single line.

            Then applied your solution again, but also zero replacements.

            Since i cannot fully comprehend what you gave me, i don’t get it.

            My workaround could be to truncate every line at the 4000th character, but i really try to keep all (up to 4000) characters in that last memo field.

            Hope it can be solved.
            Thanks,
            Mike

            CoisesC Mark OlsonM 2 Replies Last reply Reply Quote 0
            • guy038G
              guy038
              last edited by

              Hello, @mike-albers, @mark-olson and All,

              First, @mike-albers, I advice you to backup your file BEFORE trying my S/R :

              If we assume that your last field does not contain any ; nor \r, nor \n within the first 4,000 characters,

              I would simplify the problem, using this regex S/R :

              • FIND ([^;\r\n]{4000})(?s).+$

              • REPLACE $1

              So, for each record :

              • Any character placed in the last field, under 4,001, will be kept

              • Any character placed in the last field, over 4,000, will be deleted !

              Best Regards,

              guy038

              1 Reply Last reply Reply Quote 0
              • CoisesC
                Coises @Mike Albers
                last edited by

                @Mike-Albers said in Advanced replace including right trim:

                I tested the solution but it is not working. Zero replacements.

                But i forgot to mention that in the last field before \r\n there can be several \n characters in the tekst string.
                In that memo field people entered some tekst or uploaded e-mails etc… So the format is unpredictable. The record should be seen as one line that ends with carriage return linefeed.

                I would suggest looking at @Bas-de-Reuver ’s CSVLint plugin for working with CSV files. I don’t know enough about it to say whether it can solve your problem, but I can say that working in CSV files with regular expressions is error-prone.

                My own Columns++ plugin can convert CSV to tab-separated values and back again; tab-separated values are less problematic to search and manipulate than delimiter-separated values. However, I suggest CSVLint first, if it can do what you need.

                1 Reply Last reply Reply Quote 1
                • Mark OlsonM
                  Mark Olson @Mike Albers
                  last edited by Mark Olson

                  @Mike-Albers
                  If you need to match \n but not \r\n, try replacing ((?:[^\r\n;]|(?<!\r)\n){4000})(?:[^\r\n;]|(?<!\r)\n)+(?=\r\n) with $1.

                  I’m writing my response on my phone, so I prefer brevity. That said, my new regex uses lookahead and lookbehind. Google those terms for some guidance.

                  1 Reply Last reply Reply Quote 0
                  • PeterJonesP PeterJones locked this topic on
                  • PeterJonesP
                    PeterJones
                    last edited by PeterJones

                    @Mike-Albers deleted the first post, and started a new conversation Topic here.

                    In the future, please do not delete posts that have replies. And if you want to clarify, provide more details, etc, you can just do that in a reply, so that the conversation keeps the original context.

                    To preserve the history of this discussion, the first post has been un-deleted. But since the conversation has moved forward in the new Topic, this old version has been locked.

                    1 Reply Last reply Reply Quote 4
                    • First post
                      Last post
                    The Community of users of the Notepad++ text editor.
                    Powered by NodeBB | Contributors