Remove special characters in certain location in csv file
-
Hi,
I have occurrences of a string that include a special character that needs to be removed.
Example:
“file://^598C372F08F75D25BA1D48522405CE21918E93A4D6475FC22E^pimgpsh_thumbnail_win_distr.jpg”
In this example I need to remove the ^ character.
Please advise
Thanks -
Note: the string may include other special characters, in different positions
Thanks -
-
Not ignorance, I guess I did not explain myself well.
I have about 300 occurrences of strings like the one in the example. They contain different special characters, which may be located in different parts of the string. So I need to remove all of the special characters that appear after the beginning of the “file://” string and before the closing quote (") of the csv field where "file:// is located.
Thanks
-
The following solution needs several Replace All passes, as each pass finds and replaces only one occurrence per string.
Find what: ("file://.*?)([\^~\[])(.*?")
Replace with: \1\3
The second capture group is the interesting part, as it defines the chars you do not want to have, within a character class (starts with [ and ends with ]).
Currently 3 chars are defined: ^, ~ and [.
^ needs to be escaped as it is a special char within regex, as well as [.
So if you do not want to have, let’s say, a semicolon in addition, you would use
("file://.*?)([\^~\[;])(.*?")
or, if only ^ and semicolon should be replaced,
("file://.*?)([\^;])(.*?")
Make sure you’ve backed up the data in case anything goes wrong.
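For anyone who wants to try the "repeat until nothing changes" idea outside of Notepad++, here is a minimal Python sketch. It assumes the csv uses straight double quotes and that Python's `re` module treats this pattern the same way as the Notepad++ engine; the helper name `remove_until_stable` is made up for illustration.

```python
import re

# The pattern from the post: group 1 = text before the unwanted char,
# group 2 = one unwanted char (^, ~ or [), group 3 = the rest up to a quote.
PATTERN = re.compile(r'("file://.*?)([\^~\[])(.*?")')

def remove_until_stable(text):
    """Repeat the replacement until no match is left.

    Returns the cleaned text and the number of passes that changed something.
    """
    passes = 0
    while True:
        # Keeping only \1 and \3 drops the single char captured by group 2.
        text, count = PATTERN.subn(r"\1\3", text)
        if count == 0:
            return text, passes
        passes += 1

# Sample from the question, written with straight quotes (an assumption).
sample = '"file://^598C372F08F75D25BA1D48522405CE21918E93A4D6475FC22E^pimgpsh_thumbnail_win_distr.jpg"'
cleaned, passes = remove_until_stable(sample)
print(cleaned)
print(passes)
```

Each pass removes one character per "file://" string, which is why the sample with two ^ characters needs two passes.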
Cheers
Claudia -
@Claudia-Frank
Hi,
When searching with ("file://.*?)(.*?") I find the exact entry that starts with file:// and ends with (")
When I insert the search for special characters in between ("file://.*?) and (.*?"), the search marks the line from the first occurrence of file:// to the end of the line.
What am I doing wrong?
Thanks
-
Hello, @itamar-ben-sinai,
Am I understanding you correctly ?
You would like to delete any special character, which lies, exclusively, in a one-line range of characters, between an initial "file://" string and an ending " character, wouldn’t you ?
I suppose that letters, digits and the underscore character (_) are considered as regular characters, which should NOT be deleted. If I include the colon (:), the slash (/) and the dot (.) symbols, as regular characters, this means that any single special character could be found with the negative character class [^\w:/.], where \w stands for any Word character
Therefore, a correct regex S/R could be :
SEARCH (?-is)(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]
REPLACE Leave EMPTY !
For instance, in the text, below, all the underlined characters would be deleted, after a click on the Replace All button
Just note that special characters, located before the "file: string OR after the ending " are correctly untouched !
This regex performs a one-line search, in a case-sensitive way
A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
                        ¯        ¯          ¯            ¯             ¯
A sim^ple tes#t "file://98C+308F75D251D485?22405CE2<1918E93D6>475FC22Epimgpsh_thumb~nail_win_dis&tr.jpg" <A> si;mple Te@st
                           ¯              ¯        ¯         ¯                     ¯            ¯
Remarks :
- If the ^ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.^]
- If the \ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.\\]
- If the ] character must be considered as a regular character, change the ending part of the search regex to [^]\w:/.]
- If the - character must be considered as a regular character, change the ending part of the search regex to [^\w:/.-]
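To check the search regex on the two sample lines outside of Notepad++, a small Python sketch could look like this. Two assumptions: the leading (?-is) is dropped, because Python's `re` module does not accept switching flags off globally and is case-sensitive with a one-line dot by default anyway, and the helper name `strip_specials` is made up for this example.

```python
import re

# The search regex from the post, without the leading (?-is).
SPECIAL_IN_FILE_URL = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]')

def strip_specials(line):
    """Delete special characters located after "file:// and before the closing quote."""
    return SPECIAL_IN_FILE_URL.sub("", line)

samples = [
    'A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st',
    'A sim^ple tes#t "file://98C+308F75D251D485?22405CE2<1918E93D6>475FC22Epimgpsh_thumb~nail_win_dis&tr.jpg" <A> si;mple Te@st',
]

for line in samples:
    # Text outside the quoted "file://..." part is printed unchanged.
    print(strip_specials(line))
```

Like the regex itself, this sketch expects a single quoted "file://..." field per line, with no other double quote after it.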
Best Regards,
guy038
-
you don’t do anything woring.
My regex builds 3 capture groups which are
internally reflected by the variables \1, \2 and \3.
\1 contains what is discovered by ("file://.*?)
\2 contains what is matched by ([\^~\[])
and \3 what matched against (.*?")
Because the replace with contains only \1 and \3, the special chars,
which are held by \2, aren’t used.
But I would recommend you use the solution provided by @guy038 as
his way replaces the special chars directly in one go (Nice job Guy!!).
@guy038
Your combination of a negative and positive lookahead revealed that I misunderstood
the meaning as I was under the impression that it must be used within the “context” (?) of the text.
Meaning, if we have a text like
text_for_lookahead_match followed_by_text_of_interest followed_by_another_lookahead_text
I thought I need something like
(?!whatever)(text_of_interest)(?=another_lookahead)
but your solution
(?!whatever)(?=another_lookahead)(text_of_interest)
does it - GREAT - Learned something new :-)
Thank you!!!
Cheers
Claudia -
@Itamar-Ben-Sinai
Of course this
you don’t do anything woring.
must be
you don’t do anything wrong.
Cheers
Claudia -
Hi, @claudia-frank,
In other words, considering the general case, we have to search for C text, between two limits A and B
But, how to define text, which is between these two limits ? Well, simply, because, at ANY location reached :
- A limit A must not be found, further on, in the same line
- A limit B must be found, further on, in the same line
This implies the two conditions to respect :
- The negative look-ahead (?!.*?A)
- The positive look-ahead (?=.+?B)
In our particular case :
- Limit A is the string "file:\x2F\x2F (\x2F represents the normal slash character, /)
- Limit B is the simple ending " character
- And, of course, C is the regex to get special characters [^\w:/.]
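The general A / B / C recipe above can be sketched in Python as a small pattern builder. The function name `between_limits` and the bracket example are made up for illustration, and all three arguments are assumed to be regex fragments that are already escaped where needed.

```python
import re

def between_limits(limit_a, limit_b, target):
    """Build a regex that matches `target` only after limit A and before limit B.

    (?!.*?A) : limit A must NOT be found further on, in the same line
    (?=.+?B) : limit B must be found further on, in the same line
    """
    return re.compile(rf"(?!.*?{limit_a})(?={'.+?'}{limit_b}){target}")

# The particular case of this thread.
file_url_specials = between_limits(r'"file:\x2F\x2F', r'"', r"[^\w:/.]")
print(file_url_specials.sub("", 'a^b "file://c^d.jpg" e^f'))

# A made-up second case: delete digits located between [ and ].
bracket_digits = between_limits(r"\[", r"\]", r"\d")
print(bracket_digits.sub("", "a1 [b2 c3] d4"))
```

As with the original regex, this only behaves as described when limit A occurs at most once per line.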
Just notice that we could swap the two lookarounds, without any problem ! Remember that, at any location, reached by the regex engine, the two conditions, resulting from the look-arounds, must, necessarily, both, be satisfied !
Thus, the complete search regex
(?-is)(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.]
correctly finds the same special characters, as in my previous post !
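A quick way to convince yourself that the order of the two look-aheads does not matter is to compare the match positions of both variants. This is a Python sketch, again without the leading (?-is), which Python's `re` module does not accept as a global setting.

```python
import re

# Same look-aheads, written in the two possible orders.
original = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]')
swapped = re.compile(r'(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.]')

line = 'A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st'

# Collect (start, end) positions of every match for each variant.
spans_original = [m.span() for m in original.finditer(line)]
spans_swapped = [m.span() for m in swapped.finditer(line)]

print(spans_original == spans_swapped)
print([line[start:end] for start, end in spans_original])
```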
Now, using the example text, below :
A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
- As long as the current regex engine location is before the string "file:…, the negative look-ahead (?!.*?"file:\x2F\x2F) is not true, so no overall match is possible, whatever the text C searched
- As soon as the current regex engine location is at the ending " double quote, or further on, the positive look-ahead (?=.+?") is false, so no overall match is possible, too, whatever the text C searched
- But, when the current regex engine location is, BOTH, right after the "file:… string AND before the ending " double quote, the two conditions are, simultaneously, TRUE. So, an overall match may be found, providing it, also, matches the C text. That is to say, the regex [^\w:/.]
Cheers,
guy038
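The three cases described in this post can be traced position by position. The following Python sketch evaluates each of the three parts separately on a short, made-up line; the names `NO_A_AHEAD`, `B_AHEAD`, `C` and `trace` are only for illustration.

```python
import re

NO_A_AHEAD = re.compile(r'(?!.*?"file:\x2F\x2F)')  # negative look-ahead for limit A
B_AHEAD = re.compile(r'(?=.+?")')                   # positive look-ahead for limit B
C = re.compile(r"[^\w:/.]")                         # the special characters

def trace(line):
    """For every position, report whether each of the three conditions holds."""
    rows = []
    for pos, char in enumerate(line):
        a_ok = NO_A_AHEAD.match(line, pos) is not None
        b_ok = B_AHEAD.match(line, pos) is not None
        c_ok = C.match(line, pos) is not None
        rows.append((pos, char, a_ok, b_ok, c_ok))
    return rows

line = 'x^ "file://a^b.jpg" y^'
for pos, char, a_ok, b_ok, c_ok in trace(line):
    if c_ok:  # only show characters that C alone would match
        print(pos, repr(char), "no A ahead:", a_ok, "B ahead:", b_ok, "match:", a_ok and b_ok)
```

Only the ^ inside the quoted part satisfies both look-aheads. The ^ before the quote fails the first condition and the ^ after the closing quote fails the second one.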
P.S. :
Note that when the current regex engine location is right after the starting double quote, the negative look-ahead (?!.*?"file:\x2F\x2F), this time, is true. So, we need to include the colon and the slash, as regular characters. Otherwise, they would be found and deleted, as well as the dot character !
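To see what this P.S. means in practice, here is a small Python comparison between the character class with the three exceptions and a plain [^\w] class. The sample line is made up for this example.

```python
import re

# Character class with colon, slash and dot kept as regular characters.
with_exceptions = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]')
# Same look-aheads, but every non-word character counts as special.
without_exceptions = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w]')

line = 'x "file://a^b.jpg" y'

print(with_exceptions.sub("", line))     # only the ^ is deleted
print(without_exceptions.sub("", line))  # :, //, ^ and . are all deleted
```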