    Remove special characters in certain location in csv file

    • Itamar Ben-Sinai

      Hi,
      I have occurrences of a string that include a special character that needs to be removed.
      Example:
      "file://^598C372F08F75D25BA1D48522405CE21918E93A4D6475FC22E^pimgpsh_thumbnail_win_distr.jpg"
      In this example I need to remove the ^ character.

      Please advise
      Thanks

      • Itamar Ben-Sinai

        Note: the string may include other special characters, in different positions
        Thanks

        • Claudia Frank @Itamar Ben-Sinai

          @Itamar-Ben-Sinai

          Forgive my ignorance, but why not simply use the Find/Replace dialog?

          Cheers
          Claudia

          • Itamar Ben-Sinai

            Not ignorance, I guess I did not explain myself well.
            I have about 300 occurrences of strings like the one in the example. They contain different special characters, which may be located in different parts of the string. I need to remove all the special characters that appear after the beginning of the "file:// string and before the closing quote (") of the CSV field in which "file:// is located.

            Thanks

            • Claudia Frank @Itamar Ben-Sinai

              @Itamar-Ben-Sinai

              The following solution would need several Replace All runs, as each run removes only one special character per matched string.

              find what: ("file://.*?)([\^~\[])(.*?")
              replace with: \1\3
              

              The second capture group is the interesting part: it is a character class (it starts with [ and ends with ]) that lists the chars you do not want to have.
              Currently 3 chars are defined: ^, ~ and [.
              ^ needs to be escaped as it is a special char within a regex, and so does [.
              So if you also want to remove, let’s say, a semicolon, you would use

              ("file://.*?)([\^~\[;])(.*?")
              

              or if only ^ and semicolon should be replaced

              ("file://.*?)([\^;])(.*?")
              

              Make sure you’ve backed up the data in case anything goes wrong.
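
For anyone who wants to try the "several passes" idea outside the editor, here is a minimal sketch in Python. It assumes Python's `re` module behaves like the Notepad++ engine for this particular pattern; the helper name `strip_listed_chars` and the sample line are made up for illustration.

```python
import re

# Group 1: text up to the unwanted char, group 2: one unwanted char (^, ~ or [),
# group 3: the rest of the field up to a closing quote.
PATTERN = re.compile(r'("file://.*?)([\^~\[])(.*?")')

def strip_listed_chars(text):
    """Repeat the replacement until a pass changes nothing (hypothetical helper)."""
    passes = 0
    while True:
        text, count = PATTERN.subn(r'\1\3', text)
        if count == 0:
            return text, passes
        passes += 1

# Made-up sample line: four unwanted chars inside the field, two outside it.
line = 'x^y,"file://^598C~372F[08F^pimg.jpg",z^'
cleaned, passes = strip_listed_chars(line)
print(cleaned)
print(passes)
```

Each pass drops exactly one character per matched field, so a field with four unwanted characters needs four passes. One caveat of this sketch: `.*?` can run past the closing quote when another `"` appears later on the same line, so it is safest on lines that contain a single quoted field.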

              Cheers
              Claudia

              • Itamar Ben-Sinai

                @Claudia-Frank
                Hi,
                When searching with ("file://.*?)(.*?") I find the exact entry that starts with file:// and ends with (")
                When I insert the search for special characters between ("file://.*?) and (.*?"), the search marks the line from the first occurrence of file:// to the end of the line.
                What am I doing wrong?

                Thanks

                • guy038

                  Hello, @itamar-ben-sinai,

                  Am I understanding you correctly ?

                  You would like to delete any special character, which lies, exclusively, in a one-line range of characters, between an initial "file://" string and an ending " character, wouldn’t you ?

                  I suppose that letters, digits and the underscore character ( _ ) are considered as regular characters, which should NOT be deleted. If I also include the colon ( : ), the slash ( / ) and the dot ( . ) symbols as regular characters, this means that any single special character can be found with the negated character class [^\w:/.], where \w stands for any Word character


                  Therefore, a correct regex S/R could be :

                  SEARCH (?-is)(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]

                  REPLACE Leave EMPTY !

                  For instance, in the text, below, all the underlined characters would be deleted, after a click on the Replace All button

                  Just note that special characters located before the "file: string OR after the ending " are correctly left untouched !

                  This regex performs a one-line search, in a case-sensitive way

                  A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
                                          ¯        ¯          ¯            ¯             ¯
                  						
                  A sim^ple tes#t "file://98C+308F75D251D485?22405CE2<1918E93D6>475FC22Epimgpsh_thumb~nail_win_dis&tr.jpg" <A> si;mple Te@st
                                             ¯              ¯        ¯         ¯                     ¯            ¯
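
The same search can be tried outside Notepad++ with a small Python sketch. This is a port under assumptions, not the editor's own engine: Python's `re` does not accept a global `(?-is)`, so it is dropped here, which should be harmless because `re` is case-sensitive and `.` stops at line breaks by default.

```python
import re

# Python port of the SEARCH regex above, without the (?-is) prefix.
SPECIAL = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]')

sample = ('A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E'
          '^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st')

# Replace every match with nothing, like Replace All with an empty REPLACE field.
result = SPECIAL.sub('', sample)
print(result)
print(len(sample) - len(result), 'characters removed')
```

Only the five marked characters inside the quoted field are removed; the text before the opening quote and after the closing quote is unchanged. Like the examples above, the sketch assumes a single quoted field per line.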
                  

                  Remarks :

                  • If the ^ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.^]

                  • If the \ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.\\]

                  • If the ] character must be considered as a regular character, change the ending part of the search regex to [^]\w:/.]

                  • If the - character must be considered as a regular character, change the ending part of the search regex to [^\w:/.-]
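
The four variants in the remarks can be compared side by side with a short Python sketch. The helper `strip_special` and the test line are hypothetical, and Python's `re` is assumed to parse these character classes the same way as Notepad++ does.

```python
import re

# The two look-aheads shared by every variant.
PREFIX = r'(?!.*?"file:\x2F\x2F)(?=.+?")'

def strip_special(line, char_class):
    """Delete every character matched by char_class inside the "file://..." field."""
    return re.sub(PREFIX + char_class, '', line)

# Made-up field containing ^ \ ] - and # between regular characters.
line = r'"file://a^b\c]d-e#f.jpg"'

base = strip_special(line, r'[^\w:/.]')            # removes ^ \ ] - #
keep_caret = strip_special(line, r'[^\w:/.^]')     # ^ is now regular
keep_backslash = strip_special(line, r'[^\w:/.\\]')
keep_bracket = strip_special(line, r'[^]\w:/.]')   # ] placed first in the class
keep_hyphen = strip_special(line, r'[^\w:/.-]')    # - placed last in the class

for name, value in [('base', base), ('^', keep_caret), ('\\', keep_backslash),
                    (']', keep_bracket), ('-', keep_hyphen)]:
    print(name, value)
```

The position of `]` (first) and `-` (last) inside the class is what makes them literal without a backslash.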

                  Best Regards,

                  guy038

                  • Claudia Frank @Itamar Ben-Sinai

                    @Itamar-Ben-Sinai

                    you don’t do anything woring.
                    My regex builds 3 capture groups which are
                    internally reflected by the variables \1, \2 and \3.
                    \1 contains what is matched by ("file://.*?)
                    \2 contains what is matched by ([\^~\[])
                    and \3 what is matched by (.*?")

                    Because the replace with contains only \1 and \3, the special chars,
                    which are held by \2, aren’t used.
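
To see the three groups for one match, here is a small sketch in Python (an assumption: Python's `re` groups this pattern the same way as the editor; the sample field is shortened and made up).

```python
import re

pattern = re.compile(r'("file://.*?)([\^~\[])(.*?")')
m = pattern.search('"file://^598C^pimg.jpg"')

print(m.group(1))  # text before the first unwanted char
print(m.group(2))  # the unwanted char itself
print(m.group(3))  # everything after it, up to the closing quote

# Rebuilding the field from \1 and \3 only drops what group 2 captured.
rebuilt = m.expand(r'\1\3')
print(rebuilt)
```

The second `^` is still present in `rebuilt`, which is why this approach needs one pass per unwanted character.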

                    But I would recommend you use the solution provided by @guy038 as
                    his way replaces the special chars directly in one go (Nice job Guy!!).

                    @guy038
                    Your combination of a negative and a positive lookahead revealed that I had misunderstood
                    how they work, as I was under the impression that each one must be placed at its own position within the “context” (?) of the text.
                    Meaning, if we have a text like

                    text_for_lookahead_match followed_by_text_of_interest followed_by_another_lookahead_text
                    

                    I thought I need something like

                    (?!whatever)(text_of_interest)(?=another_lookahead)
                    

                    but your solution

                    (?!whatever)(?=another_lookahead)(text_of_interest)
                    

                    does it - GREAT - Learned something new :-)
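
A tiny Python sketch of why stacking works: look-aheads consume no characters, so several of them can be tested at the same position before the text of interest. The text and the names A, B and C below are placeholders, and Python's `re` is assumed to treat look-aheads like the editor does.

```python
import re

# Placeholder text: match "C<digit>" only if no "A" follows later and a "B" does.
text = 'C1 A C2 B C3'

stacked = re.findall(r'(?!.*A)(?=.*B)C\d', text)
swapped = re.findall(r'(?=.*B)(?!.*A)C\d', text)

print(stacked)
print(swapped)
```

C1 is rejected because an A still follows it, C3 because no B follows it, and the order of the two look-aheads makes no difference.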

                    Thank you!!!

                    Cheers
                    Claudia

                    • Claudia Frank

                      @Itamar-Ben-Sinai
                      Of course this

                      you don’t do anything woring.

                      must be

                      you don’t do anything wrong.

                      Cheers
                      Claudia

                      • guy038

                        Hi, @claudia-frank,

                        In other words, considering the general case, we have to search for C text, between two limits A and B

                        But how do we define text which lies between these two limits ? Well, simply by requiring that, at ANY location reached :

                        • A limit A must not be found, further on, in the same line

                        • A limit B must be found, further on, in the same line

                        This implies the two conditions to respect :

                        • The negative look-ahead (?!.*?A)

                        • The positive look-ahead (?=.+?B)

                        In our particular case :

                        • Limit A is the string "file:\x2F\x2F ( \x2F represents the normal slash character, / )

                        • Limit B is the simple ending " character

                        • And, of course, C is the regex to get special characters [^\w:/.]

                        Just notice that we could swap the two look-aheads, without any problem ! Remember that, at any location reached by the regex engine, the two conditions resulting from the look-aheads must both hold !

                        Thus, the complete search regex (?-is)(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.] correctly finds the same special characters as in my previous post !
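
The general A / B / C recipe can be wrapped in a small helper. This is only a sketch in Python: the function name `between` is made up, the limits are treated as literal strings, and C is passed as a regex fragment.

```python
import re

def between(limit_a, limit_b, target):
    """Build a regex matching `target` only where no limit_a follows on the
    line and a limit_b does (hypothetical helper, limits taken literally)."""
    return re.compile(
        rf'(?!.*?{re.escape(limit_a)})(?=.+?{re.escape(limit_b)}){target}'
    )

# The case discussed in this thread: A = "file://, B = ", C = [^\w:/.]
file_rx = between('"file://', '"', r'[^\w:/.]')
file_result = file_rx.sub('', 'a#b "file://x^y;z.jpg" c@d')
print(file_result)

# A made-up second case: delete digits between < and >.
digit_rx = between('<', '>', r'\d')
digit_result = digit_rx.sub('', 'a1 <b2c3> d4')
print(digit_result)
```

As in the posts above, this relies on each line holding one A ... B range and no further B after it.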


                        Now, using the example text, below :

                        A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
                        
                        • As long as the current regex engine location is before the string "file:…, the negative look-ahead (?!.*?"file:\x2F\x2F) is false, so no overall match is possible, whatever the text C searched

                        • As soon as the current regex engine location is at the ending " double quote, or further on, the positive look-ahead (?=.+?") is false, so no overall match is possible either, whatever the text C searched

                        • But when the current regex engine location is BOTH right after the "file:… string AND before the ending " double quote, the two conditions are simultaneously TRUE. So an overall match may be found, provided that the text at this location also matches C, that is to say, the regex [^\w:/.]
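
The three cases can be checked position by position with a short Python sketch. The sample line is a shortened, made-up one, and the function `trace` is hypothetical; each look-ahead is tested on its own at every index.

```python
import re

NEG = re.compile(r'(?!.*?"file:\x2F\x2F)')  # limit A must NOT be found further on
POS = re.compile(r'(?=.+?")')               # limit B must be found further on
TARGET = re.compile(r'[^\w:/.]')            # the text C

def trace(line):
    """Return (index, char, neg_ok, pos_ok, is_special) for every position."""
    rows = []
    for i, ch in enumerate(line):
        neg_ok = NEG.match(line, i) is not None
        pos_ok = POS.match(line, i) is not None
        rows.append((i, ch, neg_ok, pos_ok, TARGET.match(ch) is not None))
    return rows

sample = 'a^ "file://b^c" d^'  # shortened, hypothetical sample line
rows = trace(sample)
for row in rows:
    print(row)

# Positions where both look-aheads hold, and those that would be deleted.
both_true = [i for i, _, neg_ok, pos_ok, _ in rows if neg_ok and pos_ok]
deleted = [i for i, _, neg_ok, pos_ok, special in rows
           if neg_ok and pos_ok and special]
print(both_true, deleted)
```

Both conditions hold only from the `f` of `file` up to the character before the closing quote, and within that range only the `^` matches C.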

                        Cheers,

                        guy038

                        P.S. :

                        Note that when the current regex engine location is right after the starting double quote, the negative look-ahead (?!.*?"file:\x2F\x2F) is, this time, true. So, we need to include the colon and the slash as regular characters. Otherwise, they would be found and deleted, as well as the dot character !
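
A quick way to see the point of this P.S. is to compare the class [^\w:/.] with a plain \W in a Python sketch (made-up sample line, Python's `re` assumed to behave like the editor here).

```python
import re

PREFIX = r'(?!.*?"file:\x2F\x2F)(?=.+?")'
line = 'x "file://a^b.jpg" y'  # made-up sample line

# With : / . listed as regular characters, only ^ is deleted.
careful = re.sub(PREFIX + r'[^\w:/.]', '', line)

# With a plain \W, the colon, both slashes and the dot are deleted as well.
too_greedy = re.sub(PREFIX + r'\W', '', line)

print(careful)
print(too_greedy)
```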

                        The Community of users of the Notepad++ text editor.