HELP! Replace spaces in text ONLY between quotes?



  • HELP! I have a script that needs editing. It contains calls to over 200 filenames.

    I discovered too late that the filenames cannot contain spaces. Renaming the files themselves was easy with a Bulk Rename Utility, but now I have several hundred references to those files in my scripts, and it would take me DAYS to replace the spaces in all those filename references with underscore “_” characters.

    A global Search/Replace of EVERY space, with an underscore obviously is not an option.

    Might there be a way to replace only spaces in lines of text that appear between quotes? Even a “wildcard” for the characters surrounding a space would be helpful.

    Big TIA.



  • Based on the vague specs you gave, a vague answer might be:

    Given the data

    blah blah blah
    blah "file with space" blah
    blah
    "space file"
    blah
    not a file
    
    • FIND = (?-s)("\S+)\h(?=.*?")
    • REPLACE = ${1}_
    • mode = regular expression
    • then hit REPLACE or REPLACE ALL multiple times until everything is replaced.
      • If you use Replace, you will need to hit it once for every space in a filename in your script
      • If you use Replace All, you should only need to hit it once per space in the filename with the most spaces

    Note that this assumes that you only have one quoted filename per line. If it’s more than that, the logic becomes exceedingly convoluted.
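For anyone who would rather script this than hit Replace All repeatedly, here is a rough Python sketch of the same idea. Assumptions: Python's `re` module has no `\h`, so `[ \t]` stands in for it; `.` already stops at line ends by default, which plays the role of `(?-s)`; and `underscore_quoted_spaces` is just a made-up name. Like the regex above, it only handles one quoted filename per line.

```python
import re

# Python stand-in for  (?-s)("\S+)\h(?=.*?")
PATTERN = re.compile(r'("\S+)[ \t](?=.*?")')

def underscore_quoted_spaces(text):
    """Repeat the replacement until a pass changes nothing (the 'Replace All' loop)."""
    while True:
        text, count = PATTERN.subn(r'\1_', text)
        if count == 0:
            return text

if __name__ == "__main__":
    print(underscore_quoted_spaces('blah "file with space" blah'))
```

Each pass fixes one more space per filename, so the loop runs as many times as the filename with the most spaces needs, plus one final pass that finds nothing.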

    If you were able to give example data (if there is sensitive info, just use dummy words), which gave a good indication of the structure – whether there are any other rules that could help us narrow down the regex so it will have fewer false matches – that would be great.

    ----

    Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.



  • @PeterJones said in HELP! Replace spaces in text ONLY between quotes?:

    […]

    Thanks for the reply. I don’t understand the syntax for using that in a search (a simple copy/paste of the text in pink didn’t work.)

    Here is an example code block:

    codeblock_a {
    -----.nodename “file name with spaces”;
    -----.filepath "folderA/folderB/“file name with spaces”;
    }

    Thx. (hyphens are spaces.)



  • @PeterJones (Oops, never mind. I forgot to check the “Regular Expression” box.)

    Worked great. HUGE thanks! You saved me days of tedious work!



  • @Mugsys-RapSheet ,

    Glad it worked.

    As I said in my italics, using the </> button on the editing toolbar would have formatted your text as text, so that the quotes wouldn’t turn into smart quotes and spaces would be preserved:

    codeblock_a {
         .nodename "file_name_with_spaces";
         .filepath "folderA/folderB/"file_name_with_spaces";
    }
    

    If there is ever the possibility of more than one filename on a line, the regex won’t work. If that’s true for you, feel free to ask for more clarification.



  • @PeterJones Quick clarification for anyone reading this in the future:

    "folderA/folderB/“file name with spaces”;

    Should be:

    "folderA/folderB/file name with spaces”;

    (The extra quote was a typing mistake.)



  • Hello, @mugsys-rapSheet, @Peterjones and All,

    I’ve been thinking about the general problem of doing a replacement ONLY inside zones delimited with double-quotes ( "......" )

    The difficulty comes from the fact that the 2 delimiters are identical. So, as a first step, we temporarily change the ending " delimiter into another character that is not yet used in the current file. I chose the # character, but any other character, even a control char, would be fine if not already present in the current file.

    So, assuming the test sample, below :

        This is a "     C:\aaa zzz \file with spaces" test to verify    if the regex "C:\ aaa zzz\file with spaces.txt    "   works correctly
    "     C:\aaa zzz \file with spaces" test to verify    if the regex "C:\ aaa zzz\file with spaces.txt    "   works correctly
    This is a "     C:\aaa zzz \file with spaces" test to verify    if the regex "C:\ aaa zzz\file with spaces.txt    "
    "     C:\aaa zzz \file with spaces" test to verify    if the regex "C:\ aaa zzz\file with spaces.txt    "
    
    This is a "     file with spaces" test to verify    if the regex "file with spaces.txt    "   works correctly        
    "     file with spaces" test to verify    if the regex "file with spaces.txt    "   works correctly
    This is a "     file with spaces" test to verify    if the regex "file with spaces.txt    "
    "     file with spaces" test to verify    if the regex "file with spaces.txt    "
    
    This is a "     C:\aaa zzz \file with spaces" test to verify    if the regex "C:\ aaa zzz\file with spaces.txt    "   works correctly
    

    The first regex S/R is, obviously :

    SEARCH "(.+?)"

    REPLACE "\1#

    And we would end with :

        This is a "     C:\aaa zzz \file with spaces# test to verify    if the regex "C:\ aaa zzz\file with spaces.txt    #   works correctly
    "     C:\aaa zzz \file with spaces# test to verify    if the regex "C:\ aaa zzz\file with spaces.txt    #   works correctly
    This is a "     C:\aaa zzz \file with spaces# test to verify    if the regex "C:\ aaa zzz\file with spaces.txt    #
    "     C:\aaa zzz \file with spaces# test to verify    if the regex "C:\ aaa zzz\file with spaces.txt    #
    
    This is a "     file with spaces# test to verify    if the regex "file with spaces.txt    #   works correctly        
    "     file with spaces# test to verify    if the regex "file with spaces.txt    #   works correctly
    This is a "     file with spaces# test to verify    if the regex "file with spaces.txt    #
    "     file with spaces# test to verify    if the regex "file with spaces.txt    #
    
    This is a "     C:\aaa zzz \file with spaces# test to verify    if the regex "C:\ aaa zzz\file with spaces.txt      works correctly
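The marking step can also be tried outside Notepad++. A minimal Python sketch, assuming the marker character is absent from the text (the function refuses to run otherwise); `mark_closing_quotes` is a hypothetical name, and Python's `.` never crosses a line end by default, so each zone is assumed to sit on one line.

```python
import re

def mark_closing_quotes(text, marker="#"):
    # Turn every paired "...." zone into "....<marker>; an unpaired quote is left alone.
    if marker in text:
        raise ValueError("marker already present in text; pick another one")
    return re.sub(r'"(.+?)"', lambda m: '"' + m.group(1) + marker, text)

if __name__ == "__main__":
    print(mark_closing_quotes('This is a "     file with spaces" test'))
```

The lazy `.+?` is what pairs each opening quote with the nearest following quote instead of the last one on the line.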
    

    Now, here is the regex S/R, which changes any space char into an underscore character ( _ ) in zones ".......# and which changes, back, the temporary # character into the usual double-quote ( " ) :

    SEARCH (?-s)([^"#\r\n]*")\x20*|\x20*(#)|(\x20)+(?=.+#)

    REPLACE (?1\1)(?2")(?3_)

    We get our expected text :

        This is a "C:\aaa_zzz_\file_with_spaces" test to verify    if the regex "C:\_aaa_zzz\file_with_spaces.txt"   works correctly
    "C:\aaa_zzz_\file_with_spaces" test to verify    if the regex "C:\_aaa_zzz\file_with_spaces.txt"   works correctly
    This is a "C:\aaa_zzz_\file_with_spaces" test to verify    if the regex "C:\_aaa_zzz\file_with_spaces.txt"
    "C:\aaa_zzz_\file_with_spaces" test to verify    if the regex "C:\_aaa_zzz\file_with_spaces.txt"
    
    This is a "file_with_spaces" test to verify    if the regex "file_with_spaces.txt"   works correctly        
    "file_with_spaces" test to verify    if the regex "file_with_spaces.txt"   works correctly
    This is a "file_with_spaces" test to verify    if the regex "file_with_spaces.txt"
    "file_with_spaces" test to verify    if the regex "file_with_spaces.txt"
    
    This is a "C:\aaa_zzz_\file_with_spaces" test to verify    if the regex "C:\ aaa zzz\file with spaces.txt      works correctly
    

    Note that, in the last test line, the last # ending delimiter is missing. So, all the space chars after the last, unpaired, " character are obviously not modified !
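Python's `re` has no conditional replacement syntax like `(?1\1)(?2")(?3_)`, but a replacement function can pick the output by checking which group took part in the match. A rough sketch, assuming the text already uses `#` as the temporary closing delimiter; `fix_marked_zones` and `_pick` are hypothetical names.

```python
import re

# Same three alternatives as the search regex; "." stops at line ends by default in Python.
ZONE_FIX = re.compile(r'([^"#\r\n]*")\x20*|\x20*(#)|(\x20)+(?=.+#)')

def _pick(m):
    if m.group(1) is not None:   # text up to an opening quote; spaces after it are dropped
        return m.group(1)
    if m.group(2) is not None:   # temporary marker, plus any spaces just before it
        return '"'
    return "_"                   # a run of spaces inside a zone

def fix_marked_zones(text):
    return ZONE_FIX.sub(_pick, text)

if __name__ == "__main__":
    print(fix_marked_zones('This is a "     file with spaces# test'))
```

Spaces between zones survive because the first alternative swallows everything up to the next opening quote before the third alternative gets a chance to see them.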

    Best Regards

    guy038

    P.S. :

    Since, when a file is renamed, any leading space char, and any trailing space char after the extension, is omitted from the final name, the regex changes the text :

    "    C:\xxx\name with spaces.txt    "
    

    as :

    "C:\xxx\name_with_spaces.txt"
    


  • @guy038 said:

    I’ve been thinking about the general problem of doing a replacement, ONLY between zones delimited with…

    So this is great.
    I started archiving it in my special “notes” file as a general-purpose solution.

    But…
    Then I realized it “goes too far” and perhaps its general-purposeness is destroyed.
    I’m talking about the P.S. part above.
    I’d much rather have seen "____C:\xxx\name_with_spaces.txt____" as the result (keeps the general-purposeness).

    But…
    I’m also curious as to why you would solve a problem that the OP didn’t seem to have.
    OP said nothing about leading/trailing spaces needing to be removed.
    Of course, the OP was really light on sample data, but…

    Or am I missing something?



  • Hi, @mugsys-rapSheet, @Peterjones, @alan-kilborn and All,

    Well Alan, I just built up the regex that way, because it seems to be the default Windows rename behavior ;-) Indeed, on my Win XP laptop, if I try to rename, for instance, the Test.txt file, hitting the F2 key, into " Test.txt ", Windows does not allow this name change !

    Now, I realized that, in a DOS console window, the command :

    dir > "   Test.txt   "
    

    does create the file :

       Test.txt
    

    with 3 spaces before the word Test, but without any space char after the extension .txt !

    So, could you verify on your newer Windows OS ? And, of course, an alternate regex is always possible ;-))

    BR

    guy038



  • @guy038 ,

    I see the same behavior on Win10 command (with quotes around the spaces, the leading spaces are preserved, but trailing spaces are eliminated)



  • @guy038

    How about “Replace text1 with text2 only between delimiter1 and delimiter2”, where delimiter1 and delimiter2 could be the same.

    No other assumptions. Meaning, for one thing, that the data doesn’t have to all be on one line.

    Maybe that is too much for regex?
    (Oooh, a challenge for regex…)



  • Hi, @mugsys-rapSheet, @alan-kilborn and All,

    First, I’m just adapting my previous regex S/R to @mugsys-rapSheet’s needs, which concern absolute / relative paths to files :

    • Now, each single space char is replaced with a single _ char ( instead of a run of several spaces being collapsed into 1 underscore )

    • Leading spaces are now kept ( as underscores ) before a bare filename, as a filename can contain leading spaces, but are still removed before an absolute path, which, obviously, cannot begin with space characters

    This leads to this regex S/R :

    SEARCH (?-s)([^"#\r\n]*")(?:\x20*(?=\u:))?|\x20*(#)|(\x20)(?=.+#)

    REPLACE (?1\1)(?2")(?3_)

    So from an initial text :

    A test to "    C:\aaa      zzz\     file with    spaces  .txt     #  verify the regex   "    C:\aaa      zzz\     file with    spaces  .txt     #    behavior
    
    A test to "     file with    spaces  .txt     #  verify the regex   "     file with    spaces  .txt     #    behavior
    

    we would end with :

    A test to "C:\aaa______zzz\_____file_with____spaces__.txt"  verify the regex   "C:\aaa______zzz\_____file_with____spaces__.txt"    behavior
    
    A test to "_____file_with____spaces__.txt"  verify the regex   "_____file_with____spaces__.txt"    behavior
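The adapted S/R can be approximated in Python as well. This is only a sketch under assumptions: Python's `re` does not know `\u` as an uppercase-letter class, so `[A-Z]` stands in for the drive letter, and `fix_path_zones` is a hypothetical name. As before, the input must already use `#` as the temporary closing delimiter.

```python
import re

# Leading spaces are dropped only when an uppercase drive letter plus colon follows.
PATH_FIX = re.compile(r'([^"#\r\n]*")(?:\x20*(?=[A-Z]:))?|\x20*(#)|(\x20)(?=.+#)')

def fix_path_zones(text):
    def pick(m):
        if m.group(1) is not None:   # text up to and including an opening quote
            return m.group(1)
        if m.group(2) is not None:   # temporary marker and the spaces before it
            return '"'
        return "_"                   # exactly one space inside a zone
    return PATH_FIX.sub(pick, text)

if __name__ == "__main__":
    print(fix_path_zones(r'A test to "    C:\aaa  zzz\ file.txt   #  end'))
```

Because the third alternative now matches a single space instead of a run, every space inside a zone becomes its own underscore.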
    

    Alan, do you find this replacement more acceptable ?


    Now, regarding the regex challenge, I don’t think that multi-lines search, with the (?s) modifier, would be a problem !

    But, of course, we would assume that :

    • Any range with identical opening and ending delimiter ( mostly the ' and " characters ) would have been previously changed into an oriented range, like, for instance, ".........# and '.........# !

    • The search would not simultaneously search for two or more ranges ( for instance, ranges ".........# and {..........} ) Too tricky ;-))

    Let me get to the bottom of this ;-)) I just hope it won’t take long !

    Best regards,

    guy038



  • @guy038 said in HELP! Replace spaces in text ONLY between quotes?:

    But, of course, we would assume that…

    Yes, those two assumption points are fine!

    One thing I forgot to mention earlier is that perhaps text1, text2, delimiter1 and delimiter2 could all be multiple-characters in length (maybe not so obvious for the delimiters?)



  • @guy038 said in HELP! Replace spaces in text ONLY between quotes?:

    Alan, do you find this replacement more acceptable ?

    I didn’t really mean to “downgrade” the original solution.
    It probably helped the OP.
    I was just disappointed that it “went too far” so that the general problem solution (that we are now discussing) may have been obscured in the special-purpose result.
    But…we’re on the right track now. :-)



  • Hi, @alan-kilborn and All,

    Well, here it is ! I found a regex structure and some variants, which do the trick very nicely ;-))

    Let’s define these 4 variables :

    • Srch_Expr = Char|String|Regex

    • Repl_Text = Char|String|Regex

    • Opn_Del = Char

    • End_Del = Char

    Then the Search_Replacement_A, which matches any Srch_Expr, ONLY WHEN located between the Opn_Del and the End_Del delimiter(s), is :

    • SEARCH Srch_Expr(?=[^Opn_Del]*?End_Del)

    • REPLACE Repl_Text


    Rules and Remarks :

    • The Opn_Del and End_Del delimiters represent, both, a single char, which may be :

      • Characters already present in the current file

      • Characters presently absent in current file and manually added by the user ( Verify with the Count feature they are new chars ! )


    • In the case of added delimiters, preferably use the Search_Replacement_B, below, which deletes these temporary delimiters at the same time :

      • SEARCH Opn_Del|(Srch_Expr)(?=[^Opn_Del]*?End_Del)|End_Del

      • REPLACE ?1Repl_Text


    • The Opn_Del and End_Del delimiters must not be identical. If they are ( usually, cases "...." or '....' ), perform the following regex S/R, first :

      • SEARCH Delim(.+?)Delim

      • REPLACE Opn_Del\1End_Del

      • Then, execute the Search_Replacement_C, which rewrites, as well, the initial double delimiters Delim :

        • SEARCH Opn_Del|(Srch_Expr)(?=[^Opn_Del]*?End_Del)|End_Del

        • REPLACE ?1Repl_Text:Delim


    • If the last zone Opn_Del......End_Del ends at the very end of the current file, the End_Del delimiter of that last zone can be omitted and you’ll use the Search_Replacement_D, below :

      • SEARCH Srch_Expr(?=[^Opn_Del]*?(End_Del|\z))

      • REPLACE Repl_Text


    • The different zones, between delimiters, must be complete, with their matched different delimiters ( Opn_Del......End_Del )

    • The different zones, between delimiters, may be juxtaposed but NOT nested

    • A zone can lie in a single line or split over several lines

    • Srch_Expr must not match the Opn_Del delimiter, or part of it
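The Search_Replacement_A and Search_Replacement_D patterns above can be assembled mechanically from the four variables. A minimal Python sketch, assuming two different single-character delimiters; `replace_in_zones` and its `open_ended` flag are hypothetical names, Python's `\Z` stands in for `\z`, and the whole match is written `\g<0>` in a Python replacement string rather than `$0`.

```python
import re

def replace_in_zones(text, srch_expr, repl, opn_del, end_del, open_ended=False):
    """Replace srch_expr only where the next delimiter ahead is end_del.

    open_ended=True mimics Search_Replacement_D: the very end of the text
    also closes the last zone.
    """
    if len(opn_del) != 1 or len(end_del) != 1 or opn_del == end_del:
        raise ValueError("delimiters must be two different single characters")
    stop = re.escape(end_del)
    if open_ended:
        stop = "(?:" + stop + r"|\Z)"
    pattern = "(?:" + srch_expr + ")(?=[^" + re.escape(opn_del) + "]*?" + stop + ")"
    return re.sub(pattern, repl, text)

if __name__ == "__main__":
    print(replace_in_zones("foo &bar baz# qux", r"\w+", r"[--\g<0>--]", "&", "#"))
```

The look-ahead is the whole trick: from a candidate match, the scan forward may not cross an Opn_Del, so it can only succeed when the match sits inside a zone. With `open_ended=True`, any text after the last complete zone would also qualify, which is why that variant only suits files whose last zone really runs to the end.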


    As an example, let’s use the license.txt file and define :

    • Opn_Del &

    • End_Del #

    • Srch_Expr \w+

    I previously verified that these two symbols do not exist in the license.txt file

    • Place one or several zones &......#, anywhere in this file. The zones may be split over several lines, with the Opn_Del & on one line and the End_Del # on a later line

    So, the appropriate search regex_A is :

    SEARCH \w+(?=[^&]*?#)

    And imagine that we want to surround any word with strings [-- and --], in zones &............#, ONLY. Hence, the replace Regex_A is :

    REPLACE [--$0--]


    With the same delimiters, let’s suppose that we want, this time, to find any single standard char, so the regex (?-s). and to replace it with an underscore _ char

    However, according to the rules above, this regex should not match the Opn_Del delimiter. This can be achieved with the regex (?-s)(?!&)., giving the following Search_Replacement_A :

    SEARCH (?-s)(?!&).(?=[^&]*?#)

    REPLACE _
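To see why the `(?!&)` guard matters, here is a small Python comparison on a made-up one-line sample. In Python, `.` already excludes line ends by default, so `(?-s)` is not needed.

```python
import re

text = "x &ab c# y &d#"

# With the guard: the opening delimiter & is never replaced.
guarded = re.sub(r"(?!&).(?=[^&]*?#)", "_", text)

# Without the guard: & itself matches ".", and its look-ahead succeeds.
unguarded = re.sub(r".(?=[^&]*?#)", "_", text)

print(guarded)    # x &____# y &_#
print(unguarded)  # x _____# y __#
```

Without the guard, the opening delimiters are eaten along with the zone contents, so a later pass could no longer tell where the zones began.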


    Now, Alan, I also found a regex which works when delimiters are simple words, as for instance STT and END, and not single characters :

    • Regex_E = Srch_Expr(?=(?s-i:(?:(?!Opn_Del).)*?End_Del))

    Unfortunately, due to the negative look-ahead syntax (?:(?!Opn_Del).), this regex fails, even with the light license.txt file :-(( You know : the final wrong “all contents” match ! But it works nicely when only a few lines are involved !

    The safe solution, in that case, is to replace word delimiters with char delimiters, first ( for instance STT -> & and END -> # )
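For small inputs, the word-delimiter form of Regex_E can be tried in Python. A sketch only, with assumptions: Srch_Expr is a single space, the scoped flag is written `(?s:...)` because Python is case-sensitive by default, and `underscore_between_words` is a hypothetical name. Python's engine is not Notepad++'s, so how it copes with big files may well differ from what is described above.

```python
import re

def underscore_between_words(text, opn="STT", end="END"):
    # Step forward one char at a time, never starting a new opn, until end is found.
    pattern = r"\x20(?=(?s:(?:(?!%s).)*?%s))" % (re.escape(opn), re.escape(end))
    return re.sub(pattern, "_", text)

if __name__ == "__main__":
    print(underscore_between_words("a b STT c d END e f"))
```

The `(?:(?!STT).)*?` part re-checks the look-ahead at every character, which is exactly the piece that gets expensive on long texts.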


    Finally, a possible @mugsys-rapSheet’s solution, much shorter, using the generic Search_Replacement_C, could be :

    SEARCH &|(\x20)(?=[^&]*?#)|#

    REPLACE ?1_:"
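Put together with the preliminary quote-marking step, the whole procedure fits in a few lines of Python. A sketch under assumptions: `&` and `#` do not occur in the text (checked up front), each quoted zone sits on a single line, and `underscore_spaces_in_quotes` is a hypothetical name. The conditional replacement `?1_:"` becomes a small function.

```python
import re

def underscore_spaces_in_quotes(text, opn="&", end="#"):
    if opn in text or end in text:
        raise ValueError("temporary delimiters already occur in the text")
    # Step 1: turn each "...." zone into an oriented &....# zone.
    marked = re.sub(r'"(.+?)"', lambda m: opn + m.group(1) + end, text)
    # Step 2: space inside a zone -> _ ; either temporary delimiter -> "
    o, e = re.escape(opn), re.escape(end)
    pattern = o + r"|(\x20)(?=[^" + o + r"]*?" + e + r")|" + e
    return re.sub(pattern, lambda m: "_" if m.group(1) else '"', marked)

if __name__ == "__main__":
    print(underscore_spaces_in_quotes('.nodename "file name with spaces";'))
```

An unpaired quote is never marked in step 1, so the text after it is left untouched.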

    Of course, assuming this text :

    &    C:\xxx\name with spaces.txt    #
    

    Then, the leading and trailing spaces, around the absolute paths, would not be deleted anymore but just replaced with _ characters. Thus, the final text would be :

    "____C:\xxx\name_with_spaces.txt____"
    

    Best regards,

    guy038

