Remove special characters in certain location in csv file

Itamar Ben-Sinai

Hi,
I have occurrences of a string that includes a special character that needs to be removed.
Example:
“file://^598C372F08F75D25BA1D48522405CE21918E93A4D6475FC22E^pimgpsh_thumbnail_win_distr.jpg”
In this example I need to remove the ^ character.

Please advise
Thanks

Itamar Ben-Sinai

Note: the string may include other special characters, in different positions
Thanks

Claudia Frank

@Itamar-Ben-Sinai

Forgive my ignorance, but why not simply use the find/replace dialog?

Cheers
Claudia

Itamar Ben-Sinai

Not ignorance; I guess I did not explain myself well.
I have about 300 occurrences of strings like the one in the example. They contain different special characters, which may be located in different parts of the string, so I need to remove all special characters that appear after the beginning of the "file:// string and before the closing quote (") of the CSV field where "file:// is located.

Thanks

Claudia Frank

@Itamar-Ben-Sinai

The following solution needs several Replace All runs, as each run finds and replaces only one occurrence per string.

find what:("file://.*?)([\^~\[])(.*?")
replace with:\1\3

The second capture group is the interesting part, as it defines the chars you do not want to have, listed within a character class (it starts with [ and ends with ]).
Currently 3 chars are defined: ^, ~ and [.
^ needs to be escaped, as it is a special char within regex, and so does [.
So if you also want to remove, let’s say, a semicolon, you would use

("file://.*?)([\^~\[;])(.*?")

or if only ^ and semicolon should be replaced

("file://.*?)([\^;])(.*?")

Make sure you’ve backed up the data in case anything goes wrong.
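
As a rough illustration of why several Replace All runs are needed, here is a minimal Python sketch that applies the same search and replacement in a loop until nothing changes. It assumes Python's re module treats this pattern the same way as the editor's regex engine; the function name and the sample line are made up for the example.

```python
import re

# Same pattern as above: prefix, ONE unwanted character, rest up to a quote.
PATTERN = re.compile(r'("file://.*?)([\^~\[])(.*?")')

def strip_one_per_pass(text):
    """Repeat the replacement until no unwanted character is left."""
    passes = 0
    while True:
        # Each pass removes at most one character per "file://..." string.
        text, count = PATTERN.subn(r"\1\3", text)
        if count == 0:
            return text, passes
        passes += 1

line = '1,"file://^598C^pimgpsh~thumb.jpg",ok'  # hypothetical CSV line
cleaned, passes = strip_one_per_pass(line)
print(cleaned)
print(passes)
```

The sample string holds three unwanted characters, so the loop needs three replacing passes before a pass finds nothing.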

Cheers
Claudia

Itamar Ben-Sinai

@Claudia-Frank
Hi,
When searching with ("file://.*?)(.*?") I find the exact entry that starts with file:// and ends with (")
When I insert the search for special characters in between ("file://.*?) and (.*?"), the search marks the line from the first occurrence of file:// to the end of the line.
What am I doing wrong?

Thanks

guy038

Hello, @itamar-ben-sinai,

Am I understanding you correctly ?

You would like to delete any special character, which lies, exclusively, in a one-line range of characters, between an initial "file://" string and an ending " character, wouldn’t you ?

I suppose that letters, digits and the underscore character ( _ ) are considered as regular characters, which should NOT be deleted. If I also include the colon ( : ), the slash ( / ) and the dot ( . ) symbols as regular characters, this means that any single special character can be found with the negated character class [^\w:/.], where \w stands for any word character

Therefore, a correct regex S/R could be :

SEARCH (?-is)(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]

REPLACE Leave EMPTY !

For instance, in the text, below, all the underlined characters would be deleted, after a click on the Replace All button

Just note that special characters, located before the "file: string OR after the ending " are correctly untouched !

This regex performs a one-line search, in a case-sensitive way

A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
                        ¯        ¯          ¯            ¯             ¯
						
A sim^ple tes#t "file://98C+308F75D251D485?22405CE2<1918E93D6>475FC22Epimgpsh_thumb~nail_win_dis&tr.jpg" <A> si;mple Te@st
                           ¯              ¯        ¯         ¯                     ¯            ¯
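
For readers who want to try the same search outside the editor, here is a minimal Python sketch of it. It assumes Python's re module as a stand-in for the editor's regex engine; the (?-is) prefix is dropped because Python is already case-sensitive and keeps the dot on one line by default. The function name and the shortened sample line are made up for the example.

```python
import re

# Two look-aheads, then the "special character" class.
SPECIAL = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]')

def clean(text):
    """Delete special characters found between "file:// and the closing quote."""
    return SPECIAL.sub("", text)

sample = 'A sim^ple "file://^59;8C#22@E^pimg_thumb.jpg" A si;mple Te@st'
print(clean(sample))
```

Only the characters between "file:// and the closing quote are removed; the ^ , ; and @ in the surrounding text stay as they are.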

Remarks :

If the ^ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.^]
If the \ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.\\]
If the ] character must be considered as a regular character, change the ending part of the search regex to [^]\w:/.]
If the - character must be considered as a regular character, change the ending part of the search regex to [^\w:/.-]
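
The remarks above can be sketched in Python as a small helper that builds the search regex from a list of extra characters to keep. This is only an illustration: the helper name and its keep parameter are hypothetical, and it lets re.escape add the backslashes instead of placing ] first or - last by hand as the remarks do.

```python
import re

def special_char_regex(keep=""):
    """Build the search regex, treating every character in `keep` as regular."""
    # re.escape makes ^, \, ] and - safe to place inside the character class.
    char_class = r"[^\w:/." + re.escape(keep) + "]"
    return re.compile(r'(?!.*?"file://)(?=.+?")' + char_class)

text = '"file://^a-b;c"'
print(special_char_regex().sub("", text))      # nothing extra is kept
print(special_char_regex("^-").sub("", text))  # ^ and - are kept
```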

Best Regards,

guy038

Claudia Frank

@Itamar-Ben-Sinai

you don’t do anything woring.
My regex builds 3 capture groups which are
internally reflected by the variables \1, \2 and \3.
\1 contains what is matched by ("file://.*?)
\2 contains what is matched by ([\^~\[])
and \3 what is matched by (.*?")

Because the replace with contains only \1 and \3, the special chars,
which are held by \2, aren’t used.
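
To make the three groups visible, here is a minimal Python sketch, assuming Python's re module handles this pattern like the editor does. The sample string is made up for the example.

```python
import re

pattern = re.compile(r'("file://.*?)([\^~\[])(.*?")')
match = pattern.search('"file://^598C.jpg"')

# The three capture groups, in order: \1, \2 and \3.
print(match.groups())

# Rebuilding the text from \1 and \3 only drops the character held by \2.
print(match.expand(r"\1\3"))
```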

But I would recommend you use the solution provided by @guy038 as
his way replaces the special chars directly in one go (Nice job Guy!!).

@guy038
Your combination of a negative and positive lookahead revealed that I misunderstood
the meaning as I was under the impression that it must be used within the “context” (?) of the text.
Meaning, if we have a text like

text_for_lookahead_match  followed_by_text_of_interest followed_by_another_lookahead_text

I thought I needed something like

(?!whatever)(text_of_interest)(?=another_lookahead)

but your solution

(?!whatever)(?=another_lookahead)(text_of_interest)

does it - GREAT - Learned something new :-)

Thank you!!!

Cheers
Claudia

Claudia Frank

@Itamar-Ben-Sinai
Of course this

you don’t do anything woring.

must be

you don’t do anything wrong.

Cheers
Claudia

guy038

Hi, @claudia-frank,

In other words, considering the general case, we have to search for C text, between two limits A and B

But, how to define text, which is between these two limits ? Well, simply, because, at ANY location reached :

A limit A must not be found, further on, in the same line
A limit B must be found, further on, in the same line

This implies the two conditions to respect :

The negative look-ahead (?!.*?A)
The positive look-ahead (?=.+?B)

In our particular case :

Limit A is the string "file:\x2F\x2F ( \x2F represents the normal slash character, / )
Limit B is the simple ending " character
And, of course, C is the regex to get special characters [^\w:/.]
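
The general A, B, C recipe can be sketched as a small Python helper. This is only an illustration under the assumption that A and B are literal strings and C is a regex fragment; the helper name between is hypothetical, and Python's re module stands in for the editor's regex engine.

```python
import re

def between(a, b, c):
    """Regex for C, matched only after limit A and before limit B on a line.

    a and b are literal strings; c is a regex fragment.
    """
    a, b = re.escape(a), re.escape(b)
    # No A further on in the line, a B further on in the line, then C itself.
    return re.compile(rf"(?!.*?{a})(?=.+?{b}){c}")

special = between('"file://', '"', r"[^\w:/.]")
print(special.sub("", 'x^ "file://a^b;c" y^'))

# The same recipe with other limits: delete digits between < and >.
digits = between("<", ">", r"\d")
print(digits.sub("", "x1 <a2b3> 4"))
```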

Just notice that we could swap the two lookarounds, without any problem ! Remember that, at any location, reached by the regex engine, the two conditions, resulting of the look-arounds, are, necessarily, both, evaluated !

Thus, the complete search regex (?-is)(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.], correctly, finds the same special characters, as in my previous post !
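
A quick way to check that the order of the two look-aheads does not matter is to compare the matches of both variants. The sketch below assumes Python's re module as a stand-in for the editor's regex engine (so (?-is) is left out), and the sample text is made up for the example.

```python
import re

text = 'A sim^ple "file://^59;8C#22" A si;mple'

first = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]')
swapped = re.compile(r'(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.]')

# Record position and character of every match for both variants.
hits_first = [(m.start(), m.group()) for m in first.finditer(text)]
hits_swapped = [(m.start(), m.group()) for m in swapped.finditer(text)]

print(hits_first)
print(hits_first == hits_swapped)
```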

Now, using the example text, below :

A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st

As long as the current regex engine location is before the string "file:…, the negative look-ahead (?!.*?"file:\x2F\x2F) is not true, so no overall match is possible, whatever the text C searched
As soon as the current regex engine location is at the ending " double quote, or further on, the positive look-ahead (?=.+?") is false, so no overall match is possible, too, whatever the text C searched
But, when the current regex engine location is, BOTH, right after the "file:… string AND before the ending " double quote, the two conditions are, simultaneously, TRUE. So, an overall match may be found, providing it, also, matches the C text. That is to say, the regex [^\w:/.]
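
The three cases above can be observed by testing each look-ahead on its own at every position of a line. This is a minimal Python sketch, assuming Python's re module behaves like the editor's regex engine here; the names and the short sample line are made up for the example.

```python
import re

NOT_BEFORE_A = re.compile(r'(?!.*?"file://)')  # no limit A further on
BEFORE_B = re.compile(r'(?=.+?")')             # a limit B further on

def positions_where_both_hold(line):
    """Return the positions at which both look-aheads succeed."""
    return [p for p in range(len(line))
            if NOT_BEFORE_A.match(line, p) and BEFORE_B.match(line, p)]

line = 'x^ "file://a^b" y^'
inside = positions_where_both_hold(line)
# Slice of the line covered by those positions.
print(line[inside[0]:inside[-1] + 1])
```

Both conditions hold only from the f of file up to the last character before the closing quote, which is the zone where the class [^\w:/.] is allowed to match.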

Cheers,

guy038

P.S. :

Note that when the current regex engine location is right after the starting double quote, the negative look-ahead (?!.*?"file:\x2F\x2F), this time, is true. So, we need to include the colon and the slash, as regular characters. Otherwise, they would be found and deleted, as well as the dot character !