Remove special characters in certain location in csv file



  • Hi,
    I have occurrences of a string that includes a special character that needs to be removed.
    Example:
    "file://^598C372F08F75D25BA1D48522405CE21918E93A4D6475FC22E^pimgpsh_thumbnail_win_distr.jpg"
    In this example I need to remove the ^ character.

    Please advise
    Thanks



  • Note: the string may include other special characters, in different positions
    Thanks



  • @Itamar-Ben-Sinai

    Forgive my ignorance, but why not simply use the Find/Replace dialog?

    Cheers
    Claudia



    Not ignorance, I guess I did not explain myself well.
    I have about 300 occurrences of strings like the one in the example. They contain different special characters, which may be located in different parts of the string. So I need to remove all special characters that appear after the beginning of the "file:// string and before the closing quote (") of the CSV field where "file:// is located.

    Thanks



  • @Itamar-Ben-Sinai

    The following solution needs several Replace All runs, as each run removes only one special character per "file://" string.

    find what:("file://.*?)([\^~[])(.*?")
    replace with:\1\3
    

    The second capture group is the interesting part: it is a character class (it starts with [ and ends with ]) listing the chars you do not want to keep.
    Currently 3 chars are defined: ^, ~ and [.
    ^ needs to be escaped as it is a special char within regex; [ is special as well, so escape it too (\[) if the search complains.
    So if you also want to remove, let’s say, a semicolon, you would use

    ("file://.*?)([\^~[;])(.*?")
    

    or if only ^ and semicolon should be replaced

    ("file://.*?)([\^;])(.*?")
    

    Make sure you’ve backed up the data in case anything goes wrong.
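For anyone who prefers to script the repeated Replace All, here is a minimal Python sketch of the same three-group idea. It is an illustration, not the Notepad++ procedure itself: the function name `strip_listed_chars` and the sample string are made up, and `.*?` has been swapped for `[^"\n]*?` (an assumption on my side) so that a match cannot run past the closing quote.

```python
import re

# Three capture groups, as in the post above, adapted for Python's re module.
# Assumption: [^"\n]*? replaces .*? so a match stays inside one quoted field.
PATTERN = re.compile(r'("file://[^"\n]*?)([\^~\[])([^"\n]*")')


def strip_listed_chars(text):
    """Apply the replacement repeatedly until a pass changes nothing."""
    while True:
        new_text, count = PATTERN.subn(r'\1\3', text)
        if count == 0:
            return new_text
        text = new_text


# Hypothetical sample: ^ and ~ occur inside and outside the quoted field.
sample = 'a^b "file://^598C^pimgpsh~thumb.jpg" c^d'
print(strip_listed_chars(sample))
```

Each pass drops one listed character per quoted field, which is why the loop is needed. Characters outside the quotes are never touched.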

    Cheers
    Claudia



  • @Claudia-Frank
    Hi,
    When searching with ("file://.*?)(.*?") I find the exact entry that starts with file:// and ends with (")
    When I insert the search for special characters between ("file://.*?) and (.*?"), the search marks the line from the first occurrence of file:// to the end of the line.
    What am I doing wrong?

    Thanks



  • Hello, @itamar-ben-sinai,

    Am I understanding you correctly ?

    You would like to delete any special character, which lies, exclusively, in a one-line range of characters, between an initial "file://" string and an ending " character, wouldn’t you ?

    I suppose that letters, digits and the underscore character ( _ ) are considered as regular characters, which should NOT be deleted. If I include the colon ( : ), the slash ( / ) and the dot ( . ) symbols as regular characters, this means that any single special character can be found with the negated character class [^\w:/.], where \w stands for any word character


    Therefore, a correct regex S/R could be :

    SEARCH (?-is)(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]

    REPLACE Leave EMPTY !

    For instance, in the text, below, all the underlined characters would be deleted, after a click on the Replace All button

    Just note that special characters, located before the "file: string OR after the ending " are correctly untouched !

    This regex performs a one-line search, in a case-sensitive way

    A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
                            ¯        ¯          ¯            ¯             ¯
    						
    A sim^ple tes#t "file://98C+308F75D251D485?22405CE2<1918E93D6>475FC22Epimgpsh_thumb~nail_win_dis&tr.jpg" <A> si;mple Te@st
                               ¯              ¯        ¯         ¯                     ¯            ¯
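The same search can be tried outside Notepad++. Below is a rough Python sketch, assuming Python's `re` module behaves like the Notepad++ engine for this pattern. The leading `(?-is)` is dropped because `re` does not accept it in that form and is already case-sensitive, with `.` not matching line breaks by default.

```python
import re

# The search regex from above without (?-is); see the note before this block.
SPECIAL = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]')

# First sample line from the post, split only to keep the source line short.
line = ('A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E'
        '^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st')

# Replacing with an empty string deletes every match.
cleaned = SPECIAL.sub('', line)
print(cleaned)
```

One thing to keep in mind: if a line holds several "file:// strings, the negative look-ahead fails inside all but the last one, so only the last one on that line gets cleaned.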
    

    Remarks :

    • If the ^ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.^]

    • If the \ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.\\]

    • If the ] character must be considered as a regular character, change the ending part of the search regex to [^]\w:/.]

    • If the - character must be considered as a regular character, change the ending part of the search regex to [^\w:/.-]
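The remarks above can be sketched in Python as a small helper that adds extra "regular" characters to the class. The helper name `build_search` and the test line are hypothetical. `re.escape` is used here instead of the careful positioning shown in the remarks, so that ^, \, ] and - are always taken literally.

```python
import re


def build_search(extra_regular=''):
    """Build the search regex, treating extra_regular chars as regular ones."""
    # re.escape puts a backslash before ^, \, ] and -, so their position
    # inside the brackets no longer matters.
    keep = r'\w:/.' + re.escape(extra_regular)
    return re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^' + keep + ']')


# Hypothetical line containing -, ^, \ and ] inside the quoted part.
line = 'x-y "file://a-b^c\\d]e.jpg" z'

print(build_search().sub('', line))      # all four are deleted
print(build_search('-^').sub('', line))  # - and ^ are kept
```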

    Best Regards,

    guy038



  • @Itamar-Ben-Sinai

    you don’t do anything woring.
    My regex builds 3 capture groups which are
    internally reflected by the variables \1, \2 and \3.
    \1 contains what is discovered by ("file://.*?)
    \2 contains ([\^~[])
    and \3 what matched against (.*?")

    Because the replace with contains only \1 and \3, the special chars,
    which are held by \2, aren’t used.

    But I would recommend you use the solution provided by @guy038 as
    his way replaces the special chars directly in one go (Nice job Guy!!).

    @guy038
    Your combination of a negative and a positive lookahead revealed that I had misunderstood
    how they work, as I was under the impression that each one must be placed according to the “context” (?) of the text.
    Meaning, if we have a text like

    text_for_lookahead_match  followed_by_text_of_interest followed_by_another_lookahead_text
    

    I thought I need something like

    (?!whatever)(text_of_interest)(?=another_lookahead)
    

    but your solution

    (?!whatever)(?=another_lookahead)(text_of_interest)
    

    does it - GREAT - Learned something new :-)

    Thank you!!!

    Cheers
    Claudia



  • @Itamar-Ben-Sinai
    Of course this

    you don’t do anything woring.

    must be

    you don’t do anything wrong.

    Cheers
    Claudia



  • Hi, @claudia-frank,

    In other words, considering the general case, we have to search for C text, between two limits A and B

    But, how to define text, which is between these two limits ? Well, simply, because, at ANY location reached :

    • A limit A must not be found, further on, in the same line

    • A limit B must be found, further on, in the same line

    This implies the two conditions to respect :

    • The negative look-ahead (?!.*?A)

    • The positive look-ahead (?=.+?B)

    In our particular case :

    • Limit A is the string "file:\x2F\x2F ( \x2F represents the normal slash character, / )

    • Limit B is the simple ending " character

    • And, of course, C is the regex to get special characters [^\w:/.]

    Just notice that we could swap the two look-arounds, without any problem ! Remember that, at any location reached by the regex engine, the two conditions resulting from the look-arounds are, necessarily, both evaluated !

    Thus, the complete search regex (?-is)(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.], correctly, finds the same special characters, as in my previous post !
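The general A / B / C recipe and the claim about swapping can be checked with a short Python sketch. The helper `between`, its `swap` flag and the shortened test line are assumptions made for this illustration. Python's `re` is assumed to treat these look-aheads like the Notepad++ engine does.

```python
import re


def between(limit_a, limit_b, target, swap=False):
    """Match target only after the last limit_a and before a limit_b."""
    neg = '(?!.*?' + limit_a + ')'   # limit A must NOT be found further on
    pos = '(?=.+?' + limit_b + ')'   # limit B MUST be found further on
    lookarounds = pos + neg if swap else neg + pos
    return re.compile(lookarounds + target)


# Hypothetical, shortened version of the example line.
line = 'A sim^ple "file://^59;8C#22.jpg" A si;mple'
args = (r'"file:\x2F\x2F', '"', r'[^\w:/.]')

first = [m.span() for m in between(*args).finditer(line)]
second = [m.span() for m in between(*args, swap=True).finditer(line)]

print([line[start:end] for start, end in first])
print(first == second)
```

Every span is exactly one character long: the look-arounds only test the text ahead and consume nothing, which is why their order does not matter.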


    Now, using the example text, below :

    A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
    
    • As long as the current regex engine location is before the string "file:…, the negative look-ahead (?!.*?"file:\x2F\x2F) is not true, so no overall match is possible, whatever the text C searched

    • As soon as the current regex engine location is at the ending " double quote, or further on, the positive look-ahead (?=.+?") is false, so no overall match is possible, too, whatever the text C searched

    • But, when the current regex engine location is, BOTH, right after the "file:… string AND before the ending " double quote, the two conditions are, simultaneously, TRUE. So, an overall match may be found, providing it, also, matches the C text. That is to say, the regex [^\w:/.]
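To see the three cases above position by position, here is a small Python trace. It is only a sketch: the short test line is made up, and each part of the regex is compiled separately so that its result can be shown on its own.

```python
import re

# Hypothetical short line: specials before, inside and after the quoted part.
line = 'a# "file://^x;y.jpg" b@'

NEG = re.compile(r'(?!.*?"file:\x2F\x2F)')  # no "file:// further on
POS = re.compile(r'(?=.+?")')               # a closing " further on
C = re.compile(r'[^\w:/.]')                 # the special character itself

deleted = []
for i, ch in enumerate(line):
    # pattern.match(string, pos) tests the pattern exactly at position pos.
    neg_ok = NEG.match(line, i) is not None
    pos_ok = POS.match(line, i) is not None
    is_c = C.match(line, i) is not None
    if neg_ok and pos_ok and is_c:
        deleted.append(i)
    print(i, repr(ch), neg_ok, pos_ok, is_c)

print([line[i] for i in deleted])
```

Only the positions where all three columns are True end up in `deleted`.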

    Cheers,

    guy038

    P.S. :

    Note that when the current regex engine location is right after the starting double quote, the negative look-ahead (?!.*?"file:\x2F\x2F), this time, is true. So, we need to include the colon and the slash, as regular characters. Otherwise, they would be found and deleted, as well as the dot character !

