Advanced replace including right trim
-
Hi,
I have read many posts on the subject but cannot find the desired solution.
issue is as follows:
I have a CSV-file that i want to use as an external table in Oracle.
It is formatted like this:
<values Field01>;<values Field02>;<values Field03>; <values Field04>
In fact there are more fields, but my issue is with the last field.
Field 1 to 3 are defined as Char (255) fields in the oracle external table.
The last field, Field04, is a memo field that should be max. 4000 characters in the interface CSV-file, but probably due to codeset the last field is way over 4000 Characters. (according to Excel)I need to find the pattern like "n-characters field01"Semicolon"n-characters field02"Semicolon"n-characters field03"Semicolon
and trim the rest after the last semicolon to 4000 characters to let it fit in the Oracle table.
so field 1 to 3 with variable length + 4000 additional characters for field04.Hope that there’s a Notepad++Wizard here that can solve this.
Thanks in advance!
Mike -
@Mike-Albers
Pretty sure using the find/replace form to replace([^\r\n;]{4000})[^\r\n;]+$
with$1
would solve your issue. -
Thank you i will try it out but i am trying to understand the solution too. Can you please explain it a bit?
As far as i understand you do a search with negative on carriage return linefeed and then a semicolon. But how will that find the last semicolon? And how will it find the following string with a maximum of 4000 characters even when it can be far less or bigger then 4000. Or does a negative search mean that it goes backwards from the end.
Want to learn from your solution. Not copy/paste it like a monkey. :-) -
Hi Mark,
I tested the solution but it is not working. Zero replacements.
But i forgot to mention that in the last field before \r\n there can be several \n characters in the tekst string.
In that memo field people entered some tekst or uploaded e-mails etc… So the format is unpredictable. The record should be seen as one line that ends with carriage return linefeed.
That omission on my part might be why it is not working…I also tried to put all the records on their own line by doing the following search/replace first;
\n := \t\n wich resulted in \t\n in the memo field ending with \r\t\n.
then
\r\t\n := \r\n
then
\t\n := \t
After which all the fields were in their own single line.Then applied your solution again, but also zero replacements.
Since i cannot fully comprehend what you gave me, i don’t get it.
My workaround could be to truncate every line at the 4000th character, but i really try to keep all (up to 4000) characters in that last memo field.
Hope it can be solved.
Thanks,
Mike -
Hello, @mike-albers, @mark-olson and All,
First, @mike-albers, I advice you to backup your file BEFORE trying my S/R :
If we assume that your last field does not contain any
;
nor\r
, nor\n
within the first4,000
characters,I would simplify the problem, using this regex S/R :
-
FIND
([^;\r\n]{4000})(?s).+$
-
REPLACE
$1
So, for each record :
-
Any character placed in the last field, under
4,001
, will be kept -
Any character placed in the last field, over
4,000
, will be deleted !
Best Regards,
guy038
-
-
@Mike-Albers said in Advanced replace including right trim:
I tested the solution but it is not working. Zero replacements.
But i forgot to mention that in the last field before \r\n there can be several \n characters in the tekst string.
In that memo field people entered some tekst or uploaded e-mails etc… So the format is unpredictable. The record should be seen as one line that ends with carriage return linefeed.I would suggest looking at @Bas-de-Reuver ’s CSVLint plugin for working with CSV files. I don’t know enough about it to say whether it can solve your problem, but I can say that working in CSV files with regular expressions is error-prone.
My own Columns++ plugin can convert CSV to tab-separated values and back again; tab-separated values are less problematic to search and manipulate than delimiter-separated values. However, I suggest CSVLint first, if it can do what you need.
-
@Mike-Albers
If you need to match\n
but not\r\n
, try replacing((?:[^\r\n;]|(?<!\r)\n){4000})(?:[^\r\n;]|(?<!\r)\n)+(?=\r\n)
with$1
.I’m writing my response on my phone, so I prefer brevity. That said, my new regex uses lookahead and lookbehind. Google those terms for some guidance.
-
-
@Mike-Albers deleted the first post, and started a new conversation Topic here.
In the future, please do not delete posts that have replies. And if you want to clarify, provide more details, etc, you can just do that in a reply, so that the conversation keeps the original context.
To preserve the history of this discussion, the first post has been un-deleted. But since the conversation has moved forward in the new Topic, this old version has been locked.