    Advance Replace including right trim (repost with example)

      Mike Albers

      Hi,
      I have read many posts on the subject but cannot find the desired solution. I posted this question before but am reposting it now with an example and something I forgot to mention (I couldn’t edit the original post).
      For those who replied to my original deleted post, thanks so far, but the solutions didn’t work.
      The issue is as follows:
      I have a CSV file that I want to use as an external table in Oracle.
      It is formatted like this:
      <values Field01>;<values Field02>;<values Field03>; <values Field04>
      In fact there are more fields, but my issue is with the last field.
      Fields 1 to 3 are defined as CHAR(255) fields in the Oracle external table.
      The last field, Field04, is a memo field that should be max. 4000 characters in the interface CSV-file, but probably due to codeset the last field is way over 4000 Characters. (according to Excel)
      In this last field LineFeed characters might be present.
      End of record/line is with CarriageReturn + LineFeed.

      I need to find the pattern like "n-characters field01"Semicolon"n-characters field02"Semicolon"n-characters field03"Semicolon (or last delimiter found)
      and trim the rest after the last semicolon to 4000 characters to let it fit in the Oracle table.
      so field 1 to 3 with variable length + 4000 additional characters for field04.

      For my example, instead of max. 4000 characters, let’s say I want max. 10 characters in the last field before the EOL.
      I want this original file:
      xxxxxxxxxxxx;yyyy;zzzzzzzzzz;12\n34\n5678901234567890\r\n
      xxx;yyyyyyyy;zzzzzz;12345\r\n
      xxxxxx;yyy;zzzzzzzzzzzzzzzzzzz;1\n2\n3456789012345678901234567890\r\n
      xxxxxx;yyyyy;zzzzzzzzzzzzzzzzzzzzz;1234567890\r\n

      changed/replaced to:
      xxxxxxxxxxxx;yyyy;zzzzzzzzzz;12\n34\n5678\r\n
      xxx;yyyyyyyy;zzzzzz;12345\r\n
      xxxxxx;yyy;zzzzzzzzzzzzzzzzzzz;1\n2\n345678\r\n
      xxxxxx;yyyyy;zzzzzzzzzzzzzzzzzzzzz;1234567890\r\n

      I hope that there’s a Notepad++ wizard here who can solve this.

      Thanks in advance!
      Mike

        Terry R @Mike Albers

        @Mike-Albers

        My immediate thought was to “undelete” your deleted post. I am a (recent new) moderator and I consider you deleting that post after others have posted replies to it to be “bad form”. You probably had the best of intentions but anyone coming along some time in the future will attempt to read that thread and it won’t make sense.

        That was your first mistake, the second was to start a new thread. It would have been preferable to just append this post to the original thread. We often have first time posters adding additional relevant information after prodding by the other members.

        Since this is just my opinion I won’t “undelete” nor “move” this to the original thread. I will leave that to a more senior moderator to consider.

        While I’m telling you what you should have done, here’s another one. When showing examples, please include them in a code box. That helps to prevent the posting engine from mangling the data.

        All of this information is in the FAQ and also pinned at the start of each category. Unfortunately you, like many new posters, don’t read them. Yet you post here in the hope to get answers, so I guess my question is why didn’t you read the FAQ and pinned posts?

        Terry

          Coises @Mike Albers

          @Mike-Albers:

          I think the expression you want is this:

          ^((?:(?:[^";\r\n]*+|\h*+"(?:[^"]|"")*+"\h*+);){3})(?:([^";\r\n]{0,4000}+)[^";\r\n]*+|(\h*+"(?:[^"]|""){0,4000}+)(?:[^"]|"")*+("\h*+))$

          with replacement:

          $1$2$3$4

          In your example, you have values containing new line characters that are not quoted; that’s normally invalid in a CSV. If you really have that in your file, change the [^";\r\n] sequences to [^";\r].
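To illustrate the point about unquoted line breaks, here is a minimal Python sketch using the standard `csv` module (not Notepad++). The sample strings are invented for the illustration. It only shows how a conventional CSV parser treats a line feed inside a quoted field versus an unquoted one.

```python
import csv
import io

def parse_semicolon_csv(text: str) -> list:
    """Parse ';'-delimited CSV text with Python's standard csv reader."""
    # newline="" leaves line endings untouched so the csv module sees them as-is.
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=";"))

# Line feed inside a QUOTED last field: it stays part of the field.
quoted_rows = parse_semicolon_csv('a;b;c;"1\n2"\r\n')

# Same data WITHOUT quotes: the line feed ends the record early.
unquoted_rows = parse_semicolon_csv('a;b;c;1\n2\r\n')
```

With quotes the parser returns one record of four fields. Without quotes it returns two records, which is why a file like the one in the example needs the `[^";\r]` variant rather than a standard CSV pattern.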

            guy038

            Hi, @mike-albers, @mark-olson, @terry-r, @coises and All,

            Ah, I now understand that \n may occur in the first 4,000 characters of the last field !

            A completely different goal to reach !

            If I still assume that no line-break occurs in the first 3 fields

            And given the INPUT file :

            xxxxxxxxxxxx;yyyy;zzzzzzzzzz;12
            34
            5678901234567890
            xxx;yyyyyyyy;zzzzzz;12345
            xxx;yyyyyyyy;zzzzzz;
            xxxxxx;yyy;zzzzzzzzzzzzzzzzzzz;1
            2
            3456789012345678901234567890
            xxxxxx;yyyyy;zzzzzzzzzzzzzzzzzzzzz;1234567890
            

            IMPORTANT : After pasting the INPUT text above into a new tab, you must replace the \r\n line-break at the end of lines 1, 2, 6 and 7 with \n, as shown below, BEFORE you apply the regex S/R !

            1a741910-920e-4c99-bcd8-846df0f4d8b1-image.png

            The following regex S/R should work :

            • FIND (?s)^(?:[^\r\n;]+;){3}.{0,10}\r\n|^((?:[^\r\n;]+;){3}.{10}).+?\r\n

            • REPLACE ?1$1\r\n:$0

            And produce this OUTPUT text :

            xxxxxxxxxxxx;yyyy;zzzzzzzzzz;12
            34
            5678
            xxx;yyyyyyyy;zzzzzz;12345
            xxx;yyyyyyyy;zzzzzz;
            xxxxxx;yyy;zzzzzzzzzzzzzzzzzzz;1
            2
            345678
            xxxxxx;yyyyy;zzzzzzzzzzzzzzzzzzzzz;1234567890
            

            325fd192-722f-449e-bc09-33dd4f8240e3-image.png

            As you can see, after the last ; of each record :

            • The string 12\n34\n5678, in lines 1, 2 and 3, correctly contains 10 characters and the final \r\n

            • The line 4 contains the string 12345 and the final \r\n

            • The line 5 contains an empty string and the final \r\n

            • The string 1\n2\n345678, in lines 6, 7 and 8, correctly contains 10 characters and the final \r\n

            • The line 9 contains the string 1234567890 and the final \r\n


            Now, as you said :

            In fact there are more fields, but my issue is with the last field.

            I suppose that you must replace the number 3, in both places where it occurs in the regex, with the exact number of fields before the last one, whose size is over 4,000 characters

            Thus, the general regex S/R is :

            • FIND (?s)^(?:[^\r\n;]+;){N}.{0,4000}\r\n|^((?:[^\r\n;]+;){N}.{4000}).+?\r\n

            • REPLACE ?1$1\r\n:$0

            Where N is the number of fields before the last one !
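For readers who want to check the general S/R outside Notepad++, here is a minimal Python sketch of the same idea. It is an emulation under stated assumptions, not the Notepad++ behaviour itself: Python's `re` module has no conditional replacement, so a callback plays the role of `?1$1\r\n:$0`, and `re.MULTILINE` is added so that `^` matches at line starts. The function name `truncate_last_field` is made up for the illustration.

```python
import re

def truncate_last_field(text: str, n_fields: int, limit: int) -> str:
    """Trim the last field of each record to at most `limit` characters.

    Records end with CR+LF; the last field may contain bare LF characters.
    """
    head = r"(?:[^\r\n;]+;){%d}" % n_fields  # the N fields before the last one
    pattern = re.compile(
        # Alternative 1: last field already fits, nothing is captured.
        # Alternative 2: group 1 keeps the head plus the first `limit` characters.
        r"^%s.{0,%d}\r\n|^(%s.{%d}).+?\r\n" % (head, limit, head, limit),
        re.DOTALL | re.MULTILINE,
    )

    def repl(match):
        # Emulates the replacement ?1$1\r\n:$0
        if match.group(1) is not None:
            return match.group(1) + "\r\n"
        return match.group(0)

    return pattern.sub(repl, text)

sample = (
    "xxxxxxxxxxxx;yyyy;zzzzzzzzzz;12\n34\n5678901234567890\r\n"
    "xxx;yyyyyyyy;zzzzzz;12345\r\n"
    "xxxxxx;yyy;zzzzzzzzzzzzzzzzzzz;1\n2\n3456789012345678901234567890\r\n"
    "xxxxxx;yyyyy;zzzzzzzzzzzzzzzzzzzzz;1234567890\r\n"
)
result = truncate_last_field(sample, n_fields=3, limit=10)
```

With `n_fields=3` and `limit=10` this reproduces the wanted output from the first post. For the real file you would pass the real field count and 4000.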

            If, in addition, the number of fields is variable, you could replace the two {3} quantifiers with the {x,y} syntax, where x and y are integers

            Best regards,

            guy038

              Mike Albers @Terry R

              @Terry-R Sorry Terry, I did not read all of it. I was confused because it wasn’t possible to edit my original post more than 4 hours after first posting. Since there were just a few respondents, it seemed better to have the complete issue at the top of a new post; otherwise new replies would probably be based on the old information. I guess that people will not go through the whole discussion first.
              Sorry for the inconvenience.
              It would be nice if, after the 4 hours, a timestamped addendum to the original post were allowed instead of a reply.

              Will not make the same mistake again.

              Keep up the good work!

                Mike Albers @guy038

                @guy038 Hi Guy, I think your solution is working after all.
                In the online tool something strange happens, but in Notepad++ it seems to work properly.
                I tried it out on the test file with the \n characters in the 4th field.
                Now I will try it on my real-life CSV file to see what happens there.

                So far so good.

                Thanks!

                  Alan Kilborn @Mike Albers

                  @Mike-Albers said:

                  I tried out your solution with the online regex tool at regex101 site but it is not working.

                  Some of these regexes are quite “involved”. The more involved they are, the less likely they are to work in both regex101 and Notepad++; the reason for this is that they use different regular expression engines and all engines have nuanced processing when the regexes are not simple. It may not be the case here, but you should try all advice provided in Notepad++'s replace before coming to a conclusion.
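A small Python sketch of how engine differences show up. Notepad++ uses the Boost regex engine, whose replacement syntax includes `$1`, `$0` and the `?1…:…` conditional. Python's `re` module is used here only as an example of a different engine; the sample text and pattern are invented.

```python
import re

text = "abcdef"

# Boost-style replacement as used in Notepad++: "if group 1 matched, insert it,
# else insert the whole match". Python's `re` does not know this syntax.
boost_style = r"?1$1:$0"
python_result = re.sub(r"(abc)def", boost_style, text)

# The Python spelling of "insert group 1" is \1 (or \g<1>).
python_native = re.sub(r"(abc)def", r"\1", text)
```

Here `python_result` is the literal string `?1$1:$0`, because the template is inserted verbatim, while `python_native` is `abc`. A replacement written for Notepad++ can therefore look "broken" in another engine even when it is correct.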

                    Alan Kilborn @guy038

                    @guy038

                    This post has 7 revisions

                    As I typed my reply, I kept seeing screen flashes, so I investigated.
                    It appears that Guy is uber-editing his earlier response.
                    Hopefully, he’s not changing history, and is always making harmless edits.
                    Otherwise, how is @Mike-Albers to “keep up” with the advice being provided?


                    EDIT: Now:

                    This post has 8 revisions

                      guy038

                      Hello, @alan-kilborn and All,

                      I agree that I edited my previous post a lot of times.

                      But it’s just because if you just paste the INPUT text in a new tab, you get all the sentences with a final line-break = \r\n

                      And, of course, the regex S/R would not work in this case :-((

                      BR

                      guy038

                        Mike Albers @Alan Kilborn

                        @Alan-Kilborn you are right. I jumped to conclusions.
                        I tried it in Notepad++ and the solution from Guy seems to work after all. Changed the reply asap. :-)

                          PeterJones @Mike Albers

                          @Alan-Kilborn said,

                          It appears that Guy is uber-editing his earlier response

                          @Mike-Albers said,

                          Changed the reply asap

                          In general, my preference is that posts not get edited after there’s a reply, because that breaks the flow of the conversation. In extreme circumstances, if there is an edit after a reply, I highly encourage marking it like “edit: xyz” or similar, or, if there’s a bunch of information that turns out to be wrong, using the ~~~ to strikethrough, like, “old incorrect information [edited: see my reply below]” . This allows people to be able to see what was being responded to in the immediate replies, but informs them that something has been updated.

                          It should be noted that even changing a post before there are any replies is dangerous, because someone may have read your original and may even be replying while quoting your original text. Having your text now be different makes it look like that person is misquoting you, which they are not actually doing.

                          (This discussion has a case in point: Alan quoted the regex101 line, and now it’s been edited away.)

                          As said in another forum where I spend a lot of time, “It is uncool to update a [post] in a way that renders replies confusing or meaningless”.

                            Alan Kilborn @PeterJones

                            @PeterJones

                            Amen. Don’t change posts after posting them, unless you are 100% sure you aren’t changing any meaning. That is, only change an obvious typo (but NOT one in an “expression”). Otherwise, follow Peter’s excellent advice.

                              Mike Albers @guy038

                              @guy038 Hi Guy, I studied your solution and regex itself, and it is starting to dawn on me.
                              I changed my test file and also tried out how to handle empty fields. For that I changed your search string a tiny bit and added an extra OR clause.
                              It seems to work properly now.

                              My latest testfile was like this:
                              TESTFILE_02.JPG

                              My search pattern is now:
                              (?s)^(?:[^\r\n;]*;){3}.{0,24}\r\n|^((?:[^\r\n;]*;){3}.{0,24}).*?\r\n|^(?:[^\r\n;]*?)\r\n

                              The replace statement is still yours:
                              ?1$1\r\n:$0

                              Result was:
                              File_after_replace.JPG

                              I tried to figure out the replace string, but I don’t get it.
                              (I tried to study it bit by bit with the regex101 tool, but since that is not 100% compatible I couldn’t figure it out myself.) So it really is not laziness on my part when I ask my question.
                              I hope you can explain it step by step for me.

                              Thanks!
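On the replace string: in Notepad++ (Boost) replacement syntax, `?1$1\r\n:$0` is a conditional. If capture group 1 took part in the match, the text before the colon is inserted (`$1` followed by a line break); otherwise the text after the colon is inserted (`$0`, the whole match, so nothing changes). The following Python sketch emulates that logic with a callback on a deliberately tiny, made-up pattern. It is an illustration of the idea, not the Notepad++ engine.

```python
import re

def conditional_replacement(match):
    """Emulate the Notepad++ replacement ?1$1\\r\\n:$0 for one match."""
    if match.group(1) is not None:
        # Second alternative matched: keep only the captured part, re-add CR+LF.
        return match.group(1) + "\r\n"
    # First alternative matched: group 1 is unset, keep the match unchanged ($0).
    return match.group(0)

# Toy pattern: lines of up to 3 letters are left alone (no capture group);
# longer lines have their first 3 letters captured in group 1.
toy_pattern = re.compile(r"^[a-z]{0,3}\r\n|^([a-z]{3})[a-z]+\r\n", re.MULTILINE)

demo = toy_pattern.sub(conditional_replacement, "ab\r\nabcdef\r\n")
```

The short line `ab` passes through unchanged and the long line `abcdef` is cut to `abc`, which is the same mechanism that trims the last field in the S/R above.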

                                Mike Albers @Mike Albers
                                last edited by PeterJones

                                @Mike-Albers
                                Addition for people that will use this final solution in an Oracle database:
                                I had memo-fields in my export CSV-file that were restricted to 4000 CHAR positions. (default is max 4000 in bytes)
                                The regex-search was:

                                (?s)^(?:[^\r\n;]*;){N}.{0,3996}\r\n|^((?:[^\r\n;]*;){N}.{0,3996}).*?\r\n|^(?:[^\r\n;]*?)\r\n
                                

                                N normal fields + 1 memo-field that contains linefeeds in the text.

                                In Oracle I created an external table based on the CSV file with the truncated memo field.
                                It proved impossible to read 4000 characters because the \r\n is taken into the field value.
                                Once in Oracle the linefeeds are counted as single characters, but I think that depends on the codeset settings. When counting the length of the Oracle CHAR(4000 CHAR) field I found a max length of 3970 instead of 3996, my truncate position.
                                Keep this in mind, otherwise you will get problems with reject limits.
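The difference between a limit in characters and a limit in bytes can be sketched in Python. This assumes a UTF-8 encoding, which the post does not state, and the helper name `truncate_to_bytes` is invented for the illustration. It is a way to check lengths before loading, not a description of what Oracle does internally.

```python
def truncate_to_bytes(value: str, max_bytes: int, encoding: str = "utf-8") -> str:
    """Cut `value` so that its encoded form is at most `max_bytes` long."""
    encoded = value.encode(encoding)
    if len(encoded) <= max_bytes:
        return value
    # errors="ignore" drops a multi-byte character that was cut in half at the end.
    return encoded[:max_bytes].decode(encoding, errors="ignore")

memo = "é" * 3                           # 3 characters
char_count = len(memo)                   # counted in characters
byte_count = len(memo.encode("utf-8"))   # counted in bytes
shortened = truncate_to_bytes(memo, 5)   # byte budget smaller than the field
```

Three accented characters take six bytes in UTF-8, so a field that fits a character limit can still exceed a byte limit of the same number.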

                                —

                                moderator added code markdown around text; please don’t forget to use the </> button to mark example text as “code” so that characters don’t get changed by the forum

                                  Alan Kilborn @Mike Albers

                                  @Mike-Albers said:

                                  (?s)^(?:[^\r\n;];){N}.{0,3996}\r\n|^((?:[^\r\n;];){N}.{0,3996}).?\r\n|^(?:[^\r\n;]?)\r\n

                                  Unfortunately, your regular expression was corrupted because you didn’t post it correctly.
                                  Probably a moderator will come along and examine your “raw” original post and correct it.
