Advance Replace including right trim (repost with example)
-
Hi, @mike-albers, @mark-olson, @terry-r, @coises and All,
Ah, I now understand that \n may occur in the first 4,000 characters of the last field! A completely different goal to reach!
If I still assume that no line-break occurs in the first 3 fields, and given the INPUT file:
xxxxxxxxxxxx;yyyy;zzzzzzzzzz;12
34
5678901234567890
xxx;yyyyyyyy;zzzzzz;12345
xxx;yyyyyyyy;zzzzzz;
xxxxxx;yyy;zzzzzzzzzzzzzzzzzzz;1
2
3456789012345678901234567890
xxxxxx;yyyyy;zzzzzzzzzzzzzzzzzzzzz;1234567890
IMPORTANT: After pasting the INPUT text above in a new tab, you must change the current \r\n line-break at the end of lines 1, 2, 6 and 7 to \n BEFORE you apply the regex S/R!
The following regex S/R should work:
- FIND (?s)^(?:[^\r\n;]+;){3}.{0,10}\r\n|^((?:[^\r\n;]+;){3}.{10}).+?\r\n
- REPLACE ?1$1\r\n:$0
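For readers who want to check the logic outside Notepad++, here is a minimal Python sketch of the same S/R. It is only an approximation: Python's re module is a different engine from the one in Notepad++, it needs the (?m) flag so that ^ matches at line starts, and it has no ?1...:... conditional replacement, so a small callback plays that role. The names FIND, trim_last_field and INPUT are made up for this illustration.

```python
import re

# Python translation of the FIND expression above.
# (?m) is added because Python's ^ only matches at line starts in MULTILINE mode.
FIND = re.compile(
    r"(?sm)^(?:[^\r\n;]+;){3}.{0,10}\r\n"   # last field already has 10 chars or fewer
    r"|^((?:[^\r\n;]+;){3}.{10}).+?\r\n"    # group 1 = 3 fields + first 10 chars
)


def trim_last_field(text: str) -> str:
    """Emulate the REPLACE part: '?1$1\\r\\n:$0'."""
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:       # second alternative matched
            return m.group(1) + "\r\n"   # keep the truncated record
        return m.group(0)                # first alternative: leave the record as is
    return FIND.sub(repl, text)


# The INPUT text, with \n after lines 1, 2, 6 and 7 and \r\n elsewhere.
INPUT = (
    "xxxxxxxxxxxx;yyyy;zzzzzzzzzz;12\n"
    "34\n"
    "5678901234567890\r\n"
    "xxx;yyyyyyyy;zzzzzz;12345\r\n"
    "xxx;yyyyyyyy;zzzzzz;\r\n"
    "xxxxxx;yyy;zzzzzzzzzzzzzzzzzzz;1\n"
    "2\n"
    "3456789012345678901234567890\r\n"
    "xxxxxx;yyyyy;zzzzzzzzzzzzzzzzzzzzz;1234567890\r\n"
)

if __name__ == "__main__":
    # repr() makes the \n and \r\n line-breaks visible.
    print(repr(trim_last_field(INPUT)))
```

The \n inside the last field counts as an ordinary character here, which is why (?s) is needed: it lets the dot match line-break characters too.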
And produce this OUTPUT text:
xxxxxxxxxxxx;yyyy;zzzzzzzzzz;12
34
5678
xxx;yyyyyyyy;zzzzzz;12345
xxx;yyyyyyyy;zzzzzz;
xxxxxx;yyy;zzzzzzzzzzzzzzzzzzz;1
2
345678
xxxxxx;yyyyy;zzzzzzzzzzzzzzzzzzzzz;1234567890
As you can see, after the last ; of each record:
- The string 12\n34\n5678, in lines 1, 2 and 3, correctly contains 10 characters and the final \r\n
- Line 4 contains the string 12345 and the final \r\n
- Line 5 contains an empty string and the final \r\n
- The string 1\n2\n345678, in lines 6, 7 and 8, correctly contains 10 characters and the final \r\n
- Line 9 contains the string 1234567890 and the final \r\n
Now, as you said:
“In fact there are more fields, but my issue is with the last field.”
I suppose that you must replace each number 3 in the regex with the exact number of fields before the last one, whose size is over 4,000 characters.
Thus, the general regex S/R is:
- FIND (?s)^(?:[^\r\n;]+;){N}.{0,4000}\r\n|^((?:[^\r\n;]+;){N}.{4000}).+?\r\n
- REPLACE ?1$1\r\n:$0
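As a rough sketch of how the general form can be parameterised, the Python helper below builds the FIND expression for a given number of leading fields and a given maximum length. The function names build_find and truncate_last_field, and their parameters, are hypothetical and exist only for this illustration. Python's re engine is not the one used by Notepad++, so treat this as a way to test the idea, not as a drop-in equivalent.

```python
import re


def build_find(n_fields: int, max_len: int) -> re.Pattern:
    """Build the general FIND regex for n_fields leading fields and a
    last field limited to max_len characters (line-breaks included)."""
    fields = r"(?:[^\r\n;]+;){%d}" % n_fields
    short = r"^%s.{0,%d}\r\n" % (fields, max_len)       # nothing to cut
    long_ = r"^(%s.{%d}).+?\r\n" % (fields, max_len)    # group 1 = part to keep
    # (?m) is a Python-only addition so that ^ matches at every line start.
    return re.compile("(?sm)" + short + "|" + long_)


def truncate_last_field(text: str, n_fields: int, max_len: int) -> str:
    """Apply the S/R: keep group 1 plus \\r\\n when it matched, else the whole match."""
    find = build_find(n_fields, max_len)
    return find.sub(
        lambda m: m.group(1) + "\r\n" if m.group(1) is not None else m.group(0),
        text,
    )


if __name__ == "__main__":
    sample = "a;b;123456789\r\na;b;12\r\n"
    # Two leading fields, last field cut to 4 characters.
    print(repr(truncate_last_field(sample, 2, 4)))
```

With n_fields=3 and max_len=10 the helper produces the same expression as the worked example, apart from the extra (?m) flag.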
Where N is the number of fields before the last one!
If, in addition, the number of fields is variable, you could replace the two {3} quantifiers with the {x,y} syntax, where x and y represent integers.
Best regards,
guy038
-
-
@Terry-R Sorry Terry, I did not read all of it. I was confused because it wasn’t possible to edit my original post more than 4 hours after first posting. Since there were just a few respondents, it seemed better to have the complete issue at the top of the post. Otherwise new replies would probably be based on the old information provided. I guess that people will not go through all of the discussion first.
Sorry for the inconvenience.
It would be nice if, after the 4 hours, a direct timestamped addendum to the original post were allowed instead of a reply.
Will not make the same mistake again.
Keep up the good work!
-
@guy038 Hi Guy, I think your solution is working after all.
In the tool something strange happens, but in Notepad++ it seems to work properly.
I tried it out on the test file with the \n characters in the 4th field.
Now I will try it on my real-life CSV file to see what happens there.
So far so good.
Thanks!
-
@Mike-Albers said:
I tried out your solution with the online regex tool at regex101 site but it is not working.
Some of these regexes are quite “involved”. The more involved they are, the less likely they are to work in both regex101 and Notepad++; the reason for this is that they use different regular expression engines and all engines have nuanced processing when the regexes are not simple. It may not be the case here, but you should try all advice provided in Notepad++'s replace before coming to a conclusion.
-
This post has 7 revisions
As I typed my reply, I kept seeing screen flashes, so I investigated.
It appears that Guy is uber-editing his earlier response.
Hopefully, he’s not changing history, and is always making harmless edits.
Otherwise, how is @Mike-Albers to “keep up” with the advice being provided?
EDIT: Now:
This post has 8 revisions
-
Hello, @alan-kilborn and All,
I agree that I edited my previous post a lot of times.
But it’s just because, if you just paste the INPUT text in a new tab, you get all the lines with a final line-break = \r\n
And, of course, the regex S/R would not work in this case :-((
BR
guy038
-
@Alan-Kilborn You are right. I jumped to conclusions.
I tried it in Notepad++ and the solution from Guy seems to work after all. Changed the reply asap. :-)
-
@Alan-Kilborn said,
It appears that Guy is uber-editing his earlier response
@Mike-Albers said,
Changed the reply asap
In general, my preference is that posts not get edited after there’s a reply, because that breaks the flow of the conversation. In extreme circumstances, if there is an edit after a reply, I highly encourage marking it like “edit: xyz” or similar, or, if there’s a bunch of information that turns out to be wrong, using the ~~~ to strikethrough, like, “old incorrect information [edited: see my reply below]”. This allows people to see what was being responded to in the immediate replies, but informs them that something has been updated.
It should be noted that even changing a post before there are any replies is dangerous, because someone may have read your original, and may even be replying while quoting your original text; having your text now be different makes it look like the person is misquoting you, which they are not actually doing.
(This discussion has a case in point: Alan quoted the regex101 line, and now it’s been edited away.)
As said in another forum where I spend a lot of time, “It is uncool to update a [post] in a way that renders replies confusing or meaningless”.
-
Amen. Don’t change posts after posting them, unless you are 100% sure you aren’t changing any meaning. That is, only change an obvious typo (but NOT one in an “expression”). Otherwise, follow Peter’s excellent advice.
-
@guy038 Hi Guy, I studied your solution and regex itself, and it is starting to dawn on me.
I changed my test file and also tried to handle empty fields. For that I changed your search string a tiny bit and also added an extra OR clause.
It seems to work properly now.
My latest test file was like this:
My search pattern is now:
(?s)^(?:[^\r\n;];){3}.{0,24}\r\n|^((?:[^\r\n;];){3}.{0,24}).?\r\n|^(?:[^\r\n;]?)\r\n
The replace statement is still yours:
?1$1\r\n:$0
Result was:
I tried to figure out the replace string, but I don’t get it.
(I tried self-study on it with the regex101 tool bit by bit, but since it is not 100% compatible I couldn’t figure it out myself.) Really no laziness on my part here when I ask my question.
So I hope you can explain it step by step for me.
Thanks!
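One way to read the replace string ?1$1\r\n:$0 is as a conditional: ?1 asks whether group 1 took part in the match; if it did, the text before the colon ($1 followed by \r\n) is written, otherwise the text after the colon ($0, the whole match) is written. The Python sketch below spells that decision out for each match of guy038's original expression with 3 fields and 10 characters. Python has no such conditional replacement syntax, so the branch is written by hand, and the names FIND and explain are invented for this illustration.

```python
import re

# guy038's FIND expression; (?m) is added for Python so that ^ matches at line starts.
FIND = re.compile(
    r"(?sm)^(?:[^\r\n;]+;){3}.{0,10}\r\n|^((?:[^\r\n;]+;){3}.{10}).+?\r\n"
)


def explain(text: str) -> list:
    """Return, for each match, which branch of '?1$1\\r\\n:$0' applies
    and the text that branch would write."""
    steps = []
    for m in FIND.finditer(text):
        if m.group(1) is not None:
            # '?1' is true: the second alternative matched, so group 1 holds
            # the three fields plus the first 10 characters of the last field.
            steps.append(("group 1 set: write $1 + \\r\\n", m.group(1) + "\r\n"))
        else:
            # '?1' is false: the first alternative matched (short last field),
            # so the whole match $0 is written back unchanged.
            steps.append(("group 1 not set: write $0", m.group(0)))
    return steps


if __name__ == "__main__":
    sample = "a;b;c;12345\r\na;b;c;123456789012345\r\n"
    for branch, written in explain(sample):
        print(branch, "->", repr(written))
```

In other words, the replacement never edits a short record; it only rewrites the records where the second alternative had to swallow extra characters after the first 10.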