    Remove special characters in certain location in csv file

      Itamar Ben-Sinai

      Hi,
      I have occurrences of a string that contains a special character which needs to be removed.
      Example:
      “file://^598C372F08F75D25BA1D48522405CE21918E93A4D6475FC22E^pimgpsh_thumbnail_win_distr.jpg”
      In this example I need to remove the ^ character.

      Please advise
      Thanks

        Itamar Ben-Sinai

        Note: the string may include other special characters, in different positions
        Thanks

          Claudia Frank @Itamar Ben-Sinai

          @Itamar-Ben-Sinai

          Forgive my ignorance, but why not simply use the Find/Replace dialog?

          Cheers
          Claudia

            Itamar Ben-Sinai

            Not ignorance; I guess I did not explain myself well.
            I have about 300 occurrences of strings like the one in the example. They contain different special characters, which may be located in different parts of the string. I need to remove all of these special characters, but only those that appear after the opening "file:// and before the closing double quote (") of the CSV field that contains it.

            Thanks

              Claudia Frank @Itamar Ben-Sinai

              @Itamar-Ben-Sinai

              The following solution needs several Replace All runs, because each run removes only one character per matching string.

              find what: ("file://.*?)([\^~\[])(.*?")
              replace with: \1\3
              

              The second capture group is the interesting part: it is a character class (it starts with [ and ends with ]) that lists the characters you do not want to keep.
              Currently three characters are listed: ^, ~ and [.
              ^ is escaped because it is a special character in regex (unescaped and placed first, it would negate the class), and so is [.
              So if you also want to remove, let’s say, a semicolon, you would use

              ("file://.*?)([\^~\[;])(.*?")
              

              or, if only ^ and the semicolon should be removed:

              ("file://.*?)([\^;])(.*?")
              

              Make sure you’ve backed up the data in case anything goes wrong.
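
For anyone who would rather script the repeated Replace All, here is a minimal sketch in Python. It rests on assumptions: Python's re module stands in for the regex engine of Notepad++, and the helper name and the sample line are made up for illustration. It simply re-runs the replacement until a pass changes nothing.

```python
import re

# The three-group pattern from the post above, unchanged; the characters
# to drop are listed in the second group.
PATTERN = re.compile(r'("file://.*?)([\^~\[])(.*?")')

def remove_listed_chars(line: str) -> str:
    """Hypothetical helper: repeat the replacement until nothing changes."""
    while True:
        # Each pass drops one listed character per matching string.
        line, count = PATTERN.subn(r"\1\3", line)
        if count == 0:
            return line

sample = 'x "file://^ab~c[d.jpg" y'   # made-up sample line
cleaned = remove_listed_chars(sample)
print(cleaned)
```

The loop always ends, because every pass that matches removes at least one character.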

              Cheers
              Claudia

                Itamar Ben-Sinai

                @Claudia-Frank
                Hi,
                When searching with ("file://.*?)(.*?") I find the exact entry that starts with file:// and ends with (").
                When I insert the search for special characters between ("file://.*?) and (.*?"), the search marks the line from the first occurrence of file:// to the end of the line.
                What am I doing wrong?

                Thanks

                  guy038

                  Hello, @itamar-ben-sinai,

                  Am I understanding you correctly ?

                  You would like to delete every special character that lies, within a single line, between an opening "file:// string and the closing " character, wouldn’t you ?

                  I suppose that letters, digits and the underscore character ( _ ) are regular characters, which should NOT be deleted. If I also treat the colon ( : ), the slash ( / ) and the dot ( . ) as regular characters, then any single special character can be found with the negated character class [^\w:/.], where \w stands for any word character


                  Therefore, a correct regex S/R could be :

                  SEARCH (?-is)(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]

                  REPLACE Leave EMPTY !

                  For instance, in the text below, all the underlined characters would be deleted after a click on the Replace All button

                  Just note that special characters located before the "file: string OR after the ending " are, correctly, left untouched !

                  This regex performs a one-line search, in a case-sensitive way

                  A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
                                          ¯        ¯          ¯            ¯             ¯
                  						
                  A sim^ple tes#t "file://98C+308F75D251D485?22405CE2<1918E93D6>475FC22Epimgpsh_thumb~nail_win_dis&tr.jpg" <A> si;mple Te@st
                                             ¯              ¯        ¯         ¯                     ¯            ¯
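
As a rough cross-check outside Notepad++, the same search can be tried in Python. This is only a sketch under assumptions: Python's re module is not the engine Notepad++ uses, the leading (?-is) is left out because Python's re is already case-sensitive and its dot does not cross line breaks by default, and the helper name and the shortened sample line are made up.

```python
import re

# The search regex from this post, minus the leading (?-is).
SPECIAL_IN_FILE_FIELD = re.compile(r'(?!.*?"file:\x2F\x2F)(?=.+?")[^\w:/.]')

def clean_line(line: str) -> str:
    """Hypothetical helper: the equivalent of Replace All with an empty replacement."""
    return SPECIAL_IN_FILE_FIELD.sub("", line)

# Shortened, made-up variant of the first sample line.
line = 'A sim^ple tes#t "file://^598C308F;75D2#22^pimgpsh_thumbnail.jpg" A si;mple Te@st'
print(clean_line(line))
```

One thing this sketch makes easy to see: the positive look-ahead only asks for some double quote further on in the line. If another quoted field follows the "file:// field on the same line, the characters between the two fields qualify as well, so it seems safest to test on a copy of the data first.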
                  

                  Remarks :

                  • If the ^ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.^]

                  • If the \ character must be considered as a regular character, change the ending part of the search regex to [^\w:/.\\]

                  • If the ] character must be considered as a regular character, change the ending part of the search regex to [^]\w:/.]

                  • If the - character must be considered as a regular character, change the ending part of the search regex to [^\w:/.-]
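
The remarks above can also be tried out in Python. The following is a sketch under assumptions: build_search is a hypothetical helper, Python's re module stands in for the Notepad++ engine, and re.escape is used instead of hand-placing ^, \, ] and - inside the brackets, which is a different technique from the careful positioning shown in the remarks.

```python
import re

def build_search(extra_regular: str = "") -> re.Pattern:
    """Hypothetical helper: treat the characters in extra_regular as regular too."""
    # re.escape backslash-escapes ^, \, ] and -, so their position
    # inside the brackets no longer matters.
    char_class = r"[^\w:/." + re.escape(extra_regular) + "]"
    return re.compile(r'(?!.*?"file://)(?=.+?")' + char_class)

line = '"file://a^b-c#d.jpg"'                      # made-up sample
all_special = build_search().sub("", line)         # ^, - and # are deleted
keep_caret_dash = build_search("^-").sub("", line) # only # is deleted
print(all_special)
print(keep_caret_dash)
```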

                  Best Regards,

                  guy038

                    Claudia Frank @Itamar Ben-Sinai

                    @Itamar-Ben-Sinai

                    you don’t do anything woring.
                    My regex builds 3 capture groups which are
                    internally reflected by the variables \1, \2 and \3.
                    \1 contains what is matched by ("file://.*?)
                    \2 contains what is matched by ([\^~\[])
                    and \3 what is matched by (.*?")

                    Because the replace with field contains only \1 and \3, the special chars,
                    which are held by \2, are dropped.
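
To look at the three groups directly, here is a small Python sketch. It assumes Python's re module as a stand-in for the Notepad++ engine, and both sample strings are made up.

```python
import re

PATTERN = re.compile(r'("file://.*?)([\^~\[])(.*?")')

m = PATTERN.search('"file://ab^cd.jpg"')   # made-up sample
print(m.group(1))          # text up to the first listed character
print(m.group(2))          # the listed character itself
print(m.group(3))          # the rest, up to the next double quote
print(m.expand(r"\1\3"))   # what "replace with: \1\3" writes back

# If the file field holds none of the listed characters, the lazy .*?
# keeps going past the closing quote until it finds one later in the line.
spill = PATTERN.search('"file://ab.jpg","x^y"')
print(spill.group(0))
```

The second search shows that a match can run past the end of the file field. That is possibly related to the behaviour described in the question above, though that is only a guess.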

                    But I would recommend you use the solution provided by @guy038, as
                    his way removes the special chars directly in one go (Nice job Guy!!).

                    @guy038
                    Your combination of a negative and positive lookahead revealed that I misunderstood
                    the meaning as I was under the impression that it must be used within the “context” (?) of the text.
                    Meaning, if we have a text like

                    text_for_lookahead_match  followed_by_text_of_interest followed_by_another_lookahead_text
                    

                    I thought I needed something like

                    (?!whatever)(text_of_interest)(?=another_lookahead)
                    

                    but your solution

                    (?!whatever)(?=another_lookahead)(text_of_interest)
                    

                    does it - GREAT - Learned something new :-)

                    Thank you!!!

                    Cheers
                    Claudia

                      Claudia Frank

                      @Itamar-Ben-Sinai
                      Of course this

                      you don’t do anything woring.

                      must be

                      you don’t do anything wrong.

                      Cheers
                      Claudia

                        guy038

                        Hi, @claudia-frank,

                        In other words, considering the general case, we have to search for a text C, located between two limits A and B

                        But how do we define text that lies between these two limits ? Simply by noting that, at ANY location reached :

                        • A limit A must not be found, further on, in the same line

                        • A limit B must be found, further on, in the same line

                        This implies the two conditions to respect :

                        • The negative look-ahead (?!.*?A)

                        • The positive look-ahead (?=.+?B)

                        In our particular case :

                        • Limit A is the string "file:\x2F\x2F ( \x2F represents the normal slash character, / )

                        • Limit B is the simple ending " character

                        • And, of course, C is the regex to get special characters [^\w:/.]

                        Just notice that we could swap the two look-arounds without any problem ! Remember that, at any location reached by the regex engine, both conditions expressed by the look-arounds must hold for a match, whatever their order !

                        Thus, the complete search regex (?-is)(?=.+?")(?!.*?"file:\x2F\x2F)[^\w:/.] correctly finds the same special characters as in my previous post !
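
A quick way to convince oneself that the order does not matter is to compare the matches of both variants. The sketch below assumes Python's re module (so the (?-is) prefix is dropped) and uses a made-up sample line.

```python
import re

NEG = r'(?!.*?"file:\x2F\x2F)'   # limit A must not lie ahead
POS = r'(?=.+?")'                # limit B must lie ahead
CLASS = r'[^\w:/.]'              # C, a single special character

neg_first = re.compile(NEG + POS + CLASS)
pos_first = re.compile(POS + NEG + CLASS)

line = 'A sim^ple "file://^59;8C#22@E^p.jpg" A si;mple'   # made-up sample
spans_a = [m.span() for m in neg_first.finditer(line)]
spans_b = [m.span() for m in pos_first.finditer(line)]
print(spans_a == spans_b)
```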


                        Now, using the example text, below :

                        A sim^ple tes#t "file://^598C308F;75D2A1D485#22405CE21918@E94D6475FC22E^pimgpsh_thumbnail_win_distr.jpg" A si;mple Te@st
                        
                        • As long as the current regex engine location is before the string "file:…, the negative look-ahead (?!.*?"file:\x2F\x2F) is not true, so no overall match is possible, whatever the text C searched

                        • As soon as the current regex engine location is at the ending " double quote, or further on, the positive look-ahead (?=.+?") is false, so no overall match is possible either, whatever the text C searched

                        • But when the current regex engine location is BOTH right after the "file:… string AND before the ending " double quote, the two conditions are simultaneously TRUE. So an overall match may be found, provided that the text at that location also matches C, that is to say, the regex [^\w:/.]
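
The three cases above can be made visible by testing the two look-aheads on their own at every position of a line. This is a sketch under assumptions: Python's re module stands in for the Notepad++ engine, candidate_positions is a hypothetical helper, and the sample line is made up.

```python
import re

NO_A_AHEAD = re.compile(r'(?!.*?"file://)')   # negative look-ahead on limit A
B_AHEAD = re.compile(r'(?=.+?")')             # positive look-ahead on limit B

def candidate_positions(line: str) -> list:
    """Hypothetical helper: indices where both look-aheads hold, before C is tried."""
    return [i for i in range(len(line))
            if NO_A_AHEAD.match(line, i) and B_AHEAD.match(line, i)]

line = 'ab "file://x^y" cd'   # made-up sample
hits = candidate_positions(line)
print(line)
# Mark every position where C would be allowed to match.
print("".join("^" if i in hits else " " for i in range(len(line))))
```

In this sample the marked range starts at the f of file and ends just before the closing double quote, so the characters of file:// themselves lie inside it.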

                        Cheers,

                        guy038

                        P.S. :

                        Note that when the current regex engine location is right after the starting double quote, the negative look-ahead (?!.*?"file:\x2F\x2F), this time, is true. So, we need to include the colon and the slash, as regular characters. Otherwise, they would be found and deleted, as well as the dot character !
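
The effect described in this P.S. is easy to reproduce. The sketch below assumes Python's re module in place of the Notepad++ engine and a made-up sample; it compares the class with and without the three exceptions.

```python
import re

LOOKS = r'(?!.*?"file://)(?=.+?")'
with_exceptions = re.compile(LOOKS + r'[^\w:/.]')   # colon, slash and dot are regular
without_exceptions = re.compile(LOOKS + r'[^\w]')   # only word characters are regular

line = '"file://a^b.jpg"'   # made-up sample
kept = with_exceptions.sub("", line)
stripped = without_exceptions.sub("", line)
print(kept)
print(stripped)
```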

                        The Community of users of the Notepad++ text editor.