Remove special characters in certain location in csv file
-
Hi,
I have occurrences of a string that includes a special character that needs to be removed.
Example:
"file://^598C372F08F75D25BA1D48522405CE21918E93A4D6475FC22E^pimgpsh_thumbnail_win_distr.jpg"
In this example I need to remove the ^ character.
Please advise
Thanks -
Note: the string may include other special characters, in different positions
Thanks -
-
Not ignorance, I guess I did not explain myself well.
I have about 300 occurrences of strings like the one in the example. They contain different special characters, which may be located in different parts of the string, so I need to remove all the special characters that appear after the beginning of the "file:// string and before the closing quote (") of the csv field where "file:// is located.
Thanks
-
The following solution needs multiple Replace All passes, as each pass finds and replaces only one occurrence per string.
Find what: ("file://.*?)([\^~\[])(.*?")
Replace with: \1\3
The second capture group is the interesting part, as it lists the chars you do not want to have inside a character class (starts with [ and ends with ]).
Currently 3 chars are defined, ^ ~ and [.
^ needs to be escaped as it is a special char within regex, as well as [
So if you do not want to have, let’s say, a semicolon in addition, you would use
("file://.*?)([\^~\[;])(.*?")
or if only ^ and semicolon should be replaced
("file://.*?)([\^;])(.*?")
Make sure you’ve backed up the data in case anything goes wrong.
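For anyone who wants to try the "several Replace All passes" idea outside the editor, here is a minimal Python sketch. It assumes Python's re module behaves like the editor's regex engine for this particular pattern, and the helper name strip_specials is made up for illustration. It simply repeats the \1\3 replacement until nothing matches any more.

```python
import re

# Same pattern as above: group 2 is the class of unwanted characters.
PATTERN = re.compile(r'("file://.*?)([\^~\[])(.*?")')

def strip_specials(line: str) -> str:
    # Hypothetical helper: each pass removes one special character per
    # quoted "file://..." string, so loop until a pass changes nothing.
    while True:
        line, count = PATTERN.subn(r'\1\3', line)
        if count == 0:
            return line

sample = '"file://^598C372F08F75D25BA1D48522405CE21918E93A4D6475FC22E^pimgpsh_thumbnail_win_distr.jpg"'

first = PATTERN.search(sample)
print(first.group(1))          # text up to the first special character
print(first.group(2))          # the special character itself
print(strip_specials(sample))  # both ^ characters removed
```

A single pass only removes the first ^, because the match consumes the whole quoted string; the loop plays the role of clicking Replace All again.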
Cheers
Claudia -
@Claudia-Frank
Hi,
When searching with ("file://.*?)(.*?") I find the exact entry that starts with file:// and ends with (").
When I insert the search for special characters between ("file://.*?) and (.*?"), the search marks the line from the first occurrence of file:// to the end of the line.
What am I doing wrong?
Thanks
-
Hello, @itamar-ben-sinai,
Am I understanding you correctly ?
You would like to delete any special character which lies, exclusively, in a one-line range of characters, between an initial "file://" string and an ending " character, wouldn’t you ?
I suppose that letters, digits and the underscore character ( _ ) are considered as regular characters, which should NOT be deleted. If I also include the colon ( : ), the slash ( / ) and the dot ( . ) symbols as regular characters, this means that any single special character could be found with the negative character class [^\w:/.] , where \w stands for any Word character
Therefore, a correct regex S/R could be :
SEARCH
(?-is)(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]
REPLACE
Leave EMPTY !
For instance, in the text, below, all the underlined characters would be deleted, after a click on the Replace All button
Just note that special characters, located before the "file: string OR after the ending " are correctly untouched !
This regex performs a one-line search, in a case-sensitive way
A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
                        ¯        ¯          ¯            ¯             ¯
A sim^ple tes#t "file://98C+308F75D251D485?22405CE2<1918E93D6>475FC22Epimgpsh_thumb~nail_win_dis&tr.jpg" <A> si;mple Te@st
                           ¯              ¯        ¯         ¯                     ¯            ¯
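To check the search regex against the two test lines without opening the editor, a small Python sketch follows. It is only an approximation of the editor's behaviour: the leading (?-is) is dropped because Python's re does not accept a global flag switch-off, and its defaults (case-sensitive, dot not matching line breaks) already match what (?-is) asks for. The name clean is made up for this example.

```python
import re

# The search regex from this post, minus (?-is) (see note above).
SPECIAL_IN_RANGE = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]')

def clean(line: str) -> str:
    # Replace with EMPTY: delete every match.
    return SPECIAL_IN_RANGE.sub('', line)

line1 = ('A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^'
         'pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st')
line2 = ('A sim^ple tes#t "file://98C+308F75D251D485?22405CE2<1918E93D6>475FC22E'
         'pimgpsh_thumb~nail_win_dis&tr.jpg" <A> si;mple Te@st')

for line in (line1, line2):
    print(clean(line))
```

Only the characters inside the quoted "file://..." range disappear; the ^, #, ; and @ in the surrounding words stay where they are.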
Remarks :
- If the ^ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.^]
- If the \ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.\\]
- If the ] character must be considered as a regular character, change the ending part of the search regex to [^]\w:/.]
- If the - character must be considered as a regular character, change the ending part of the search regex to [^\w:/.-]
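The four variants above can be tried quickly with the Python sketch below. It assumes Python's re module parses these character classes the same way as the editor's engine (a literal ] right after the ^, a literal - at the very end), which holds for these four cases as far as I know. The test strings and the results dictionary are invented for the demonstration.

```python
import re

# Common look-ahead part of the search regex ((?-is) omitted for Python).
PREFIX = r'(?!.*?"file:\x2F\x2F)(?=.+?")'

# Character kept as regular -> ending part of the regex.
variants = {
    '^': r'[^\w:/.^]',
    '\\': r'[^\w:/.\\]',
    ']': r'[^]\w:/.]',
    '-': r'[^\w:/.-]',
}

results = {}
for kept, char_class in variants.items():
    # Invented test string: the kept character plus a # that must still go.
    line = f'"file://a{kept}b#c.jpg"'
    results[kept] = re.sub(PREFIX + char_class, '', line)
    print(kept, results[kept])
```

In each case the # is deleted while the character added to the class survives.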
Best Regards,
guy038
-
-
you don’t do anything woring.
My regex builds 3 capture groups, which are internally reflected by the variables \1, \2 and \3.
\1 contains what is matched by ("file://.*?)
\2 contains what is matched by ([\^~\[])
and \3 contains what is matched by (.*?")
Because the replace with contains only \1 and \3, the special chars, which are held by \2, aren’t used.
But I would recommend you use the solution provided by @guy038, as his way replaces the special chars directly in one go (Nice job Guy!!).
@guy038
Your combination of a negative and a positive lookahead revealed that I had misunderstood their meaning, as I was under the impression that they must be used within the “context” (?) of the text.
Meaning, if we have a text like
text_for_lookahead_match followed_by_text_of_interest followed_by_another_lookahead_text
I thought I needed something like
(?!whatever)(text_of_interest)(?=another_lookahead)
but your solution
(?!whatever)(?=another_lookahead)(text_of_interest)
does it - GREAT - Learned something new :-)
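The difference between the two layouts can be seen on a toy string. This is a Python sketch with invented limits START and END and a # as the text of interest; it only illustrates that look-aheads consume nothing, so both can sit in front of the text of interest and look as far ahead as needed.

```python
import re

text = 'START a#b END c#d'

# Layout I had in mind: END must follow the # immediately.
adjacent = re.compile(r'(?!START)#(?=END)')

# guy038's layout: both look-aheads are tested at the position of the #
# and scan forward with .*? / .+? before the # itself is matched.
ahead = re.compile(r'(?!.*?START)(?=.+?END)#')

adjacent_hits = [m.start() for m in adjacent.finditer(text)]
ahead_hits = [m.start() for m in ahead.finditer(text)]

print(adjacent_hits)  # nothing: "b END" follows the first #, not "END"
print(ahead_hits)     # only the # between START and END
```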
Thank you!!!
Cheers
Claudia -
@Itamar-Ben-Sinai
Of course this
you don’t do anything woring.
must be
you don’t do anything wrong.
Cheers
Claudia -
Hi, @claudia-frank,
In other words, considering the general case, we have to search for C text, between two limits A and B
But, how to define text, which is between these two limits ? Well, simply, because, at ANY location reached :
- A limit A must not be found, further on, in the same line
- A limit B must be found, further on, in the same line
This implies the two conditions to respect :
- The negative look-ahead (?!.*?A)
- The positive look-ahead (?=.+?B)
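As a sketch of this general recipe, the two look-aheads can be assembled from A, B and C by a small Python helper. The function name between_limits and the short test string are invented here; A, B and C are passed as ready-made regex fragments and are inserted as they are, so any escaping is up to the caller.

```python
import re

def between_limits(limit_a: str, limit_b: str, target: str) -> re.Pattern:
    # Hypothetical helper: "no A further on" + "a B further on" + C.
    return re.compile(rf'(?!.*?{limit_a})(?=.+?{limit_b}){target}')

# Limits and target taken from this thread.
pattern = between_limits(r'"file://', r'"', r'[^\w:/.]')

result = pattern.sub('', 'a#b "file://x^y;z.jpg" c@d')
print(result)
```

The # and @ outside the limits are kept, the ^ and ; between them are removed.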
In our particular case :
- Limit A is the string "file:\x2F\x2F ( \x2F represents the normal slash character, / )
- Limit B is the simple ending " character
- And, of course, C is the regex to get special characters [^\w:/.]
Just notice that we could swap the two lookarounds, without any problem ! Remember that, at any location, reached by the regex engine, the two conditions, resulting of the look-arounds, are, necessarily, both, evaluated !
Thus, the complete search regex
(?-is)(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.]
correctly finds the same special characters as in my previous post !
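That the order of the two look-arounds does not matter can be verified with a short Python sketch. As an assumption for the port, (?-is) is left out, because Python's re does not accept that global form and its defaults are already case-sensitive with a dot that stops at line breaks.

```python
import re

line = ('A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^'
        'pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st')

negative_first = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]')
positive_first = re.compile(r'(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.]')

hits_a = [(m.start(), m.group()) for m in negative_first.finditer(line)]
hits_b = [(m.start(), m.group()) for m in positive_first.finditer(line)]

print(hits_a == hits_b)           # same positions, same characters
print([ch for _, ch in hits_a])   # the characters that would be deleted
```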
Now, using the example text, below :
A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
- As long as the current regex engine location is before the string "file:…, the negative look-ahead (?!.*?"file:\x2F\x2F) is not true, so no overall match is possible, whatever the text C searched
- As soon as the current regex engine location is at the ending " double quote, or further on, the positive look-ahead (?=.+?") is false, so no overall match is possible, too, whatever the text C searched
- But, when the current regex engine location is, BOTH, right after the "file:… string AND before the ending " double quote, the two conditions are, simultaneously, TRUE. So, an overall match may be found, providing it, also, matches the C text. That is to say, the regex [^\w:/.]
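To see the three cases location by location, the two look-aheads can be tested separately at every position of a line. This is a Python sketch on a shortened, made-up line; it relies on the fact that a compiled pattern's match(string, pos) tries the pattern exactly at pos.

```python
import re

# Shortened, invented test line with the same structure as the example.
line = 'tes#t "file://^abc;d.jpg" Te@st'

no_start_ahead = re.compile(r'(?!.*?"file:\x2F\x2F)')  # negative look-ahead
end_ahead = re.compile(r'(?=.+?")')                    # positive look-ahead

# Positions where BOTH conditions hold at the same time.
window = [pos for pos in range(len(line))
          if no_start_ahead.match(line, pos) and end_ahead.match(line, pos)]

print(window[0], window[-1])
print(line[window[0]:window[-1] + 1])  # the only stretch where C can match
```

The printed stretch is the part of the line where a character matching C is allowed to be found.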
Cheers,
guy038
P.S. :
Note that when the current regex engine location is right after the starting double quote, the negative look-ahead (?!.*?"file:\x2F\x2F) , this time, is true. So, we need to include the colon and the slash, as regular characters. Otherwise, they would be found and deleted, as well as the dot character !
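A quick way to see why the class must list these characters is to compare it with a class that only excludes \w. This is a Python sketch on a short invented line; the names too_broad and safe are just labels for the comparison.

```python
import re

line = 'x "file://a^b.jpg" y'

LOOKAHEADS = r'(?!.*?"file:\x2F\x2F)(?=.+?")'
too_broad = re.compile(LOOKAHEADS + r'[^\w]')      # only word chars are regular
safe = re.compile(LOOKAHEADS + r'[^\w:/.]')        # : / . are regular too

broad_result = too_broad.sub('', line)
safe_result = safe.sub('', line)

print(broad_result)  # the :// and the dot are lost as well
print(safe_result)   # only the ^ is removed
```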