• Login
Community
  • Login

Advanced replace including right trim

Scheduled Pinned Locked Moved Help wanted · · · – – – · · ·
8 Posts 5 Posters 671 Views
Loading More Posts
  • Oldest to Newest
  • Newest to Oldest
  • Most Votes
Reply
  • Reply as topic
Log in to reply
This topic has been deleted. Only users with topic management privileges can see it.
  • M
    Mike Albers
    last edited by Apr 8, 2025, 3:24 PM

    Hi,
    I have read many posts on the subject but cannot find the desired solution.
    issue is as follows:
    I have a CSV-file that i want to use as an external table in Oracle.
    It is formatted like this:
    <values Field01>;<values Field02>;<values Field03>; <values Field04>
    In fact there are more fields, but my issue is with the last field.
    Field 1 to 3 are defined as Char (255) fields in the oracle external table.
    The last field, Field04, is a memo field that should be max. 4000 characters in the interface CSV-file, but probably due to codeset the last field is way over 4000 Characters. (according to Excel)

    I need to find the pattern like "n-characters field01"Semicolon"n-characters field02"Semicolon"n-characters field03"Semicolon
    and trim the rest after the last semicolon to 4000 characters to let it fit in the Oracle table.
    so field 1 to 3 with variable length + 4000 additional characters for field04.

    Hope that there’s a Notepad++Wizard here that can solve this.

    Thanks in advance!
    Mike

    M 1 Reply Last reply Apr 8, 2025, 4:38 PM Reply Quote 0
    • M
      Mark Olson @Mike Albers
      last edited by Apr 8, 2025, 4:38 PM

      @Mike-Albers
      Pretty sure using the find/replace form to replace ([^\r\n;]{4000})[^\r\n;]+$ with $1 would solve your issue.

      M 2 Replies Last reply Apr 8, 2025, 4:51 PM Reply Quote 2
      • M
        Mike Albers @Mark Olson
        last edited by Apr 8, 2025, 4:51 PM

        @Mark-Olson

        Thank you i will try it out but i am trying to understand the solution too. Can you please explain it a bit?
        As far as i understand you do a search with negative on carriage return linefeed and then a semicolon. But how will that find the last semicolon? And how will it find the following string with a maximum of 4000 characters even when it can be far less or bigger then 4000. Or does a negative search mean that it goes backwards from the end.
        Want to learn from your solution. Not copy/paste it like a monkey. :-)

        1 Reply Last reply Reply Quote 1
        • M
          Mike Albers @Mark Olson
          last edited by Mike Albers Apr 8, 2025, 6:05 PM Apr 8, 2025, 5:25 PM

          @Mark-Olson

          Hi Mark,

          I tested the solution but it is not working. Zero replacements.

          But i forgot to mention that in the last field before \r\n there can be several \n characters in the tekst string.
          In that memo field people entered some tekst or uploaded e-mails etc… So the format is unpredictable. The record should be seen as one line that ends with carriage return linefeed.
          That omission on my part might be why it is not working…

          I also tried to put all the records on their own line by doing the following search/replace first;
          \n := \t\n wich resulted in \t\n in the memo field ending with \r\t\n.
          then
          \r\t\n := \r\n
          then
          \t\n := \t
          After which all the fields were in their own single line.

          Then applied your solution again, but also zero replacements.

          Since i cannot fully comprehend what you gave me, i don’t get it.

          My workaround could be to truncate every line at the 4000th character, but i really try to keep all (up to 4000) characters in that last memo field.

          Hope it can be solved.
          Thanks,
          Mike

          C M 2 Replies Last reply Apr 8, 2025, 6:17 PM Reply Quote 0
          • G
            guy038
            last edited by Apr 8, 2025, 6:10 PM

            Hello, @mike-albers, @mark-olson and All,

            First, @mike-albers, I advice you to backup your file BEFORE trying my S/R :

            If we assume that your last field does not contain any ; nor \r, nor \n within the first 4,000 characters,

            I would simplify the problem, using this regex S/R :

            • FIND ([^;\r\n]{4000})(?s).+$

            • REPLACE $1

            So, for each record :

            • Any character placed in the last field, under 4,001, will be kept

            • Any character placed in the last field, over 4,000, will be deleted !

            Best Regards,

            guy038

            1 Reply Last reply Reply Quote 0
            • C
              Coises @Mike Albers
              last edited by Apr 8, 2025, 6:17 PM

              @Mike-Albers said in Advanced replace including right trim:

              I tested the solution but it is not working. Zero replacements.

              But i forgot to mention that in the last field before \r\n there can be several \n characters in the tekst string.
              In that memo field people entered some tekst or uploaded e-mails etc… So the format is unpredictable. The record should be seen as one line that ends with carriage return linefeed.

              I would suggest looking at @Bas-de-Reuver ’s CSVLint plugin for working with CSV files. I don’t know enough about it to say whether it can solve your problem, but I can say that working in CSV files with regular expressions is error-prone.

              My own Columns++ plugin can convert CSV to tab-separated values and back again; tab-separated values are less problematic to search and manipulate than delimiter-separated values. However, I suggest CSVLint first, if it can do what you need.

              1 Reply Last reply Reply Quote 1
              • M
                Mark Olson @Mike Albers
                last edited by Mark Olson Apr 8, 2025, 6:27 PM Apr 8, 2025, 6:25 PM

                @Mike-Albers
                If you need to match \n but not \r\n, try replacing ((?:[^\r\n;]|(?<!\r)\n){4000})(?:[^\r\n;]|(?<!\r)\n)+(?=\r\n) with $1.

                I’m writing my response on my phone, so I prefer brevity. That said, my new regex uses lookahead and lookbehind. Google those terms for some guidance.

                1 Reply Last reply Reply Quote 0
                • P PeterJones locked this topic on Apr 9, 2025, 1:34 PM
                • P
                  PeterJones
                  last edited by PeterJones Apr 9, 2025, 1:39 PM Apr 9, 2025, 1:39 PM

                  @Mike-Albers deleted the first post, and started a new conversation Topic here.

                  In the future, please do not delete posts that have replies. And if you want to clarify, provide more details, etc, you can just do that in a reply, so that the conversation keeps the original context.

                  To preserve the history of this discussion, the first post has been un-deleted. But since the conversation has moved forward in the new Topic, this old version has been locked.

                  1 Reply Last reply Reply Quote 4
                  2 out of 8
                  • First post
                    2/8
                    Last post
                  The Community of users of the Notepad++ text editor.
                  Powered by NodeBB | Contributors