Remove special characters in certain location in csv file
I have occurrences of a string that includes a special character that needs to be removed.
In this example I need to remove the ^ character.
Note: the string may include other special characters, in different positions.
Claudia Frank last edited by
Excuse my ignorance, but why not simply use the find/replace dialog?
Not ignorance; I guess I did not explain myself well.
I have about 300 occurrences of strings like the one in the example. They contain different special characters, which may be located in different parts of the string, so I need to remove all the special characters that appear after the beginning of the "file://" string and before the closing quote (") of the csv section where "file:// is located.
The following solution would need multiple Replace All runs, as it finds and replaces only one occurrence per section at a time.
Find what: ("file://.*?)([\^~\[])(.*?")
Replace with: \1\3
The second capture group is the interesting part, as it defines the characters you do not want, inside a character class (which starts with [ and ends with ]).
Currently 3 characters are defined: ^, ~ and [.
^ needs to be escaped because it is a special character within a regex (at the start of a class it would negate it), as does [.
So if you also want to remove, let's say, a semicolon, you would use ([\^~\[;]) as the second group,
or, if only ^ and the semicolon should be removed, ([\^;]).
Make sure you’ve backed up the data in case anything goes wrong.
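To illustrate why this capture-group approach needs several Replace All runs, here is a minimal Python sketch. It assumes Python's `re` module behaves like the Notepad++ engine for this pattern; the helper name `strip_by_repeated_passes` and the sample line are made up for illustration.

```python
import re

# The pattern from the post above: group 2 is the class of unwanted characters.
PATTERN = re.compile(r'("file://.*?)([\^~\[])(.*?")')

def strip_by_repeated_passes(line):
    """Re-apply the group-1 + group-3 replacement until nothing changes."""
    passes = 0
    while True:
        # Each pass consumes the whole quoted section, so it removes
        # at most one unwanted character per "file://..." section.
        line, hits = PATTERN.subn(r"\1\3", line)
        if hits == 0:
            return line, passes
        passes += 1

sample = 'x "file://ab^cd~ef[gh.jpg" y'   # hypothetical sample line
print(strip_by_repeated_passes(sample))
# ('x "file://abcdefgh.jpg" y', 3)
```

Three unwanted characters take three passes, which matches the remark that one Replace All is not enough.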
When searching with ("file://.*?)(.*?") I find the exact entry that starts with file:// and ends with (").
When I insert the search for special characters between ("file://.*?) and (.*?"), the search marks the line from the first occurrence of file:// to the end of the line.
What am I doing wrong?
guy038 last edited by guy038
Am I understanding you correctly?
You would like to delete any special character which lies, exclusively, in a one-line range of characters, between an initial "file://" string and an ending " character, wouldn't you?
I suppose that letters, digits and the underscore character (_) are considered regular characters, which should NOT be deleted. If I include the colon (:), the slash (/) and the dot (.) symbols as regular characters, this means that any single special character could be found with the negative character class [^\w:/.], where \w stands for any word character.
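Therefore, a correct regex S/R could be a search for that negative character class, guarded by look-aheads so that it only matches inside the range, with the replacement field left EMPTY!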
For instance, in the text below, all the special characters located between "file:// and the ending " would be deleted after a click on the Replace All button: ^ ; # @ ^ in the first line and + ? < > ~ & in the second.
Just note that special characters located before the "file: string OR after the ending " are correctly left untouched!
This regex performs a one-line search, in a case-sensitive way.
A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
A sim^ple tes#t "file://98C+308F75D251D485?22405CE2<1918E93D6>475FC22Epimgpsh_thumb~nail_win_dis&tr.jpg" <A> si;mple Te@st
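As a rough check outside Notepad++, the look-ahead regex quoted later in this thread can be tried on these two lines in Python. This is only a sketch: Python's `re` module rejects the leading `(?-is)`, so it is dropped here (case-sensitive matching and a dot that does not match newlines are Python's defaults), and the function name `strip_specials` is hypothetical.

```python
import re

# Look-ahead regex from this thread, without the (?-is) prefix (see note above).
SPECIAL_IN_RANGE = re.compile(r'(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.]')

def strip_specials(line):
    """Delete special characters only between "file:// and the closing quote."""
    return SPECIAL_IN_RANGE.sub("", line)

lines = [
    'A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E'
    '^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st',
    'A sim^ple tes#t "file://98C+308F75D251D485?22405CE2<1918E93D6>475FC22E'
    'pimgpsh_thumb~nail_win_dis&tr.jpg" <A> si;mple Te@st',
]

for line in lines:
    print(strip_specials(line))
```

The text before the opening quote and after the closing quote should come out unchanged, including its ^, #, ; and @ characters.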
If the ^, \, ] or - character must be considered as a regular character, change the ending part of the search regex (the negative character class) so that it also lists that character.
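The exact replacement classes from the original post are not preserved here, so the following Python sketch only shows one possible way to add such characters to the negative class. The helper `special_char_class` is hypothetical, and it relies on `re.escape` to produce a safe escaped form; in Notepad++ you would type the escaped character into the class by hand.

```python
import re

def special_char_class(extra_regular=""):
    """Build [^\\w:/.] plus any extra characters that should be kept."""
    # re.escape backslash-escapes ^ \ ] and - so they are read literally
    # inside the character class.
    extras = "".join(re.escape(ch) for ch in extra_regular)
    return r"[^\w:/." + extras + "]"

for ch in "^\\]-":
    cls = special_char_class(ch)
    # The extra character survives, while '#' is still removed.
    print(cls, re.sub(cls, "", "a" + ch + "b#c"))
```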
You are not doing anything wrong.
My regex builds 3 capture groups, which are
internally reflected by the variables \1, \2 and \3.
\1 contains what is matched by ("file://.*?),
\2 contains what is matched by ([\^~\[]),
and \3 what is matched by (.*?").
Because the replacement contains only \1 and \3, the special characters,
which are held by \2, aren't used.
But I would recommend you use the solution provided by @guy038, as
his way replaces the special characters directly in one go (nice job, Guy!!).
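The three groups can be inspected directly in Python. This is a small sketch that assumes the second group is the class ([\^~\[]) described earlier in the thread; the sample line is invented.

```python
import re

PATTERN = re.compile(r'("file://.*?)([\^~\[])(.*?")')

sample = 'x "file://ab^cd.jpg" y'   # hypothetical sample line
match = PATTERN.search(sample)

print(match.group(1))          # "file://ab
print(match.group(2))          # ^
print(match.group(3))          # cd.jpg"
# Rebuilding the text from groups 1 and 3 drops whatever group 2 held.
print(match.expand(r"\1\3"))   # "file://abcd.jpg"
```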
Your combination of a negative and a positive lookahead revealed that I had misunderstood
their meaning, as I was under the impression that they must be used within the "context" (?) of the text.
Meaning, if we have a text like
text_for_lookahead_match followed_by_text_of_interest followed_by_another_lookahead_text
I thought I needed something like
but your solution
does it - GREAT - Learned something new :-)
guy038 last edited by guy038
In other words, considering the general case, we have to search for some text C, between two limits A and B.
But how do we define text which is between these two limits? Simply because, at ANY location reached:
A must not be found, further on, in the same line
B must be found, further on, in the same line
This implies two conditions to respect:
The negative look-ahead (?!.*?A)
The positive look-ahead (?=.+?B)
In our particular case:
A is the string "file:\x2F\x2F (\x2F represents the normal slash character, /)
B is the simple ending " double quote
And, of course, C is the regex to get special characters, [^\w:/.]
Just notice that we could swap the two look-arounds without any problem! Remember that, at any location reached by the regex engine, the two conditions resulting from the look-arounds must both hold for a match to be possible!
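The general A / B / C recipe, and the claim that the two look-arounds can be swapped, can be sketched in Python as follows. The function `between_limits` and its `swapped` flag are hypothetical names for illustration; A, B and C are passed as regex fragments, and the sample line is invented.

```python
import re

def between_limits(a, b, c, swapped=False):
    """Match c only where a is no longer ahead and b still is, on the same line."""
    negative = rf"(?!.*?{a})"   # A must not be found further on
    positive = rf"(?=.+?{b})"   # B must be found further on
    guards = positive + negative if swapped else negative + positive
    return re.compile(guards + c)

A = r'"file:\x2F\x2F'
B = r'"'
C = r"[^\w:/.]"

line = 'a^b "file://x^1;2.jpg" c;d'   # hypothetical sample line
first = between_limits(A, B, C).findall(line)
second = between_limits(A, B, C, swapped=True).findall(line)
print(first, second, first == second)
# ['^', ';'] ['^', ';'] True
```

Both orders find the same two characters because each look-around is zero-width and is tested at the same location.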
Thus, the complete search regex
(?-is)(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.] correctly finds the same special characters as in my previous post!
Now, using the example text, below :
A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
As long as the current regex engine location is before the string "file:…, the negative look-ahead (?!.*?"file:\x2F\x2F) is not true, so no overall match is possible, whatever the text.
As soon as the current regex engine location is at the ending " double quote, or further on, the positive look-ahead (?=.+?") is false, so no overall match is possible either, whatever the text.
But when the current regex engine location is BOTH right after the "file:… string AND before the ending " double quote, the two conditions are simultaneously TRUE. So an overall match may be found, provided it also matches the C text, that is to say, the regex [^\w:/.].
Note that when the current regex engine location is right after the starting double quote, the negative look-ahead (?!.*?"file:\x2F\x2F) is, this time, true. So we need to include the colon and the slash as regular characters. Otherwise, they would be found and deleted, as well as the dot character!
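To see where both conditions hold, the two look-aheads can be tested position by position in Python. This is a sketch under the assumption that Python's `re` evaluates these look-aheads the same way as the Notepad++ engine; the names `NEGATIVE`, `POSITIVE` and `both_true` and the sample line are made up.

```python
import re

NEGATIVE = re.compile(r'(?!.*?"file:\x2F\x2F)')   # start marker no longer ahead
POSITIVE = re.compile(r'(?=.+?")')                # closing quote still ahead

def both_true(line, i):
    """True when both look-aheads succeed at index i of line."""
    return (NEGATIVE.match(line, i) is not None
            and POSITIVE.match(line, i) is not None)

line = 'a^b "file://x^1.jpg" c;d'   # hypothetical sample line
inside = [i for i in range(len(line)) if both_true(line, i)]

# The qualifying positions form one block, from just after the opening
# quote up to the character before the closing quote.
print(line[inside[0]:inside[-1] + 1])   # file://x^1.jpg
```

The block includes the : and / of file://, which is why the character class has to treat them as regular characters. It also shows why the positive look-ahead uses .+? rather than .*?: at the closing quote itself, .+? needs one more character before a quote, so that position is excluded.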