Help replacing spaces between wildcards between quotes
Fellow Notepad++ Users,
Could you please help me with the following search-and-replace problem I am having?
I have a large dataset of names I’m trying to organize, and the middle step is the most problematic. I’m trying to replace all the spaces between wildcard texts within quotes with something else, so that I can later replace them again. But there’s a large variety of combinations of spaces, dashes, and numbers of words between quotes.
Here is the data I currently have (“before” data):
"text text" "text text" "text-text" "text text" "text-text" "text text-text" "text-text text" "text text-text text-text" "text-text text text-text" "text text-text text-text text" ... etc
Here is how I would like that data to look (“after” data):
"text=text" "text=text" "text-text" "text=text" "text-text" "text=text-text" "text-text text" "text=text-text=text-text" "text-text=text=text-text" "text=text-text=text-text text"
To accomplish this, I have tried using the following Find/Replace expressions and settings
Find What = `"(\w+)\h(\w+)"`
Replace With = `"(\1)=(\2)"`
Search Mode = REGULAR EXPRESSION
Dot Matches Newline = CHECKED
This does work, except I then have to search and replace every possible combination to get them all.
Find What = `"(\w+)-(\w+)\h(\w+)"` Replace With = `"(\1)-(\2)=(\3)"`
Find What = `"(\w+)\h(\w+)-(\w+)"` Replace With = `"(\1)=(\2)-(\3)"`
Find What = `"(\w+)\h(\w+)\h(\w+)-(\w+)"` Replace With = `"(\1)=(\2)=(\3)-(\4)"`
Find What = `"(\w+)\h(\w+)-(\w+)-(\w+)"` Replace With = `"(\1)=(\2)-(\3)-(\4)"`
Find What = `"(\w+)-(\w+)\h(\w+)-(\w+)"` Replace With = `"(\1)-(\2)=(\3)-(\4)"`
... etc
Unfortunately, I couldn’t figure out how to get the logic to work with a regular expression: search and replace any number of spaces between any number of texts, but only when they are between quotes, and without replacing dashes…
Any help would be immensely appreciated.
Thanks for showing what you already tried, and showing before/after data. That’s helpful.
Same replacement as you used (or I phrase it as
You may have to do Replace All multiple times to get them all, but you don’t have to change the expression.
In the first “term”, I search for word characters (alpha, numeric, underscore), the equals-sign (to allow it to do the multiple joins like you showed in the later examples), or the minus-sign (to allow `text-text`). In the second term, I allow only word characters, hyphens, or horizontal spaces (that way, it handles the 3-or-more-word quoted terms correctly).
This also assumes that the last line of your “after” data was actually intended to have all of its spaces replaced as well, because your problem description didn’t give a rule that would leave any spaces behind (it sounded like you wanted all spaces between the quotes to become equals-signs).
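For readers who want to try the repeat-until-done idea outside Notepad++, here is a minimal Python sketch. The pattern is an assumption reconstructed from the prose description above, not necessarily the exact expression from the post. Python’s `re` has no `\h`, so `[ \t]` stands in for it, and the function name `join_quoted_words` is made up for illustration.

```python
import re

# Assumed pattern, rebuilt from the description (not the original expression):
#   group 1: word characters, "=" or "-" (no spaces yet)
#   one horizontal space (Python has no \h, so [ \t] is used)
#   group 2: word characters, "-" or further spaces/tabs
ONE_SPACE = re.compile(r'"([\w=-]+)[ \t]([\w \t-]+)"')

def join_quoted_words(text: str) -> str:
    """Replace one space per quoted string per pass; repeat until no match."""
    while True:
        text, count = ONE_SPACE.subn(r'"\1=\2"', text)
        if count == 0:
            return text

before = '"text text" "text-text" "text text-text text-text"'
print(join_quoted_words(before))
# "text=text" "text-text" "text=text-text=text-text"
```

Each pass turns the first remaining space inside every quoted string into `=`, which is why a string with three words needs two passes; the loop plays the role of pressing Replace All again.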
You could also use the generic regex for “Replacing in a specific zone of text”, which is linked in our generic regex FAQ and could accomplish the same thing in a single Replace All. The beginning and ending expressions (BSR and ESR) would just be simple quote marks, the find-expression (FR) would be a space or the `\h` equivalent, and the replacement (RR) would be the equals-sign. (I would have suggested that first, but since you’d already put in the work for your custom expression, I thought I’d give you the tailored version.)
@peterjones you know… I’ve never used notepad++'s regex before, and i felt like i just delved into the deep end totally by accident. Spent a good few days trying to learn coding from scratch, and still couldn’t figure it out. Thanks a bunch man, this works, and helps me a lot with my work. Cheers.
(I will keep at learning it, so hopefully i can help someone else next time)
I’m wondering if the typical replace-but-only-between-delimiters technique can be adapted for this case, i.e., when the beginning and ending delimiter is the same?
A simple trial of the referenced technique:
Yields this result on the OP’s original text:
"text=text" "text=text"="text-text"="text=text"="text-text" ="text=text-text"=="text-text=text"= "text=text-text=text-text" "text-text=text=text-text" "text=text-text=text-text=text"
Which is not what was wanted: spaces between one closing quote and the next opening quote were replaced too. Compare the desired (left side) and actual result:
Interesting. Yeah, the generic regex doesn’t currently work when the BSR and ESR are the same string. Something would need to be done to the expression to make sure it’s always between balanced pairs, rather than between any two instances of the single wrapping character.
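As an illustration of what “between balanced pairs” means, here is a small Python sketch. It is not the generic regex from the FAQ and does not use Notepad++ syntax; it is a different mechanism (a match callback) that happens to show the pairing idea. The names `QUOTED` and `replace_in_quotes` are hypothetical, and it assumes quoted strings never span lines or contain escaped quotes.

```python
import re

# One balanced pair on a single line: opening quote, non-quote text, closing quote.
# Assumption: quotes are never escaped or nested.
QUOTED = re.compile(r'"[^"\n]*"')

def replace_in_quotes(text: str, find: str = r'[ \t]', repl: str = '=') -> str:
    """Apply the find/replace only inside each balanced pair of quotes."""
    return QUOTED.sub(lambda m: re.sub(find, repl, m.group(0)), text)

print(replace_in_quotes('"a b" "c-d" "e f-g h"'))
# "a=b" "c-d" "e=f-g=h"
```

Because each match consumes its closing quote, the next search starts after it, so the gap between one pair and the next is never mistaken for a zone. That consumption step is the part that is hard to express in a single find expression.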
Changing to single-line matching for the ESR (`…(?-si:(?!").)*?…`) will fix the equals-sign at the beginning of line 3… but the other differences on lines 2-3 will not be fixed. (The difference in line 6 is, I assume, a mistake on the part of the OP, as I said in my earlier reply, because the rules defined say to replace all the spaces between the pairs of quotes, and line 6 in the example missed one.)
For this specific instance, I would be tempted to do a three-step regex sequence: first, change pairs of `"..."` to `“…”`; then use `“` as BSR and `”` as ESR; then convert `“…”` back to `"..."`. That wouldn’t work in the general case, because of course the BSR==ESR might not always be ASCII quotes. But for this specific instance, my tailored regex is conceptually easier for me to understand, so that’s what I’d actually use.
I think @guy038’s homework should be to come up with the equivalent generic syntax for when BSR==ESR. (I cannot easily think of a way to “consume” the ESR after all the FR have been replaced between the previous BSR and that ESR). Once it’s been vetted, he can add it as a follow on to the official page ;-)
Well, yes, of course that is the ideal thing to happen next. :-)
But perhaps it is not reasonable to try to wedge a problem like this into the BSR/ESR solution. It has come up before, I just found THIS and that solution also uses what Peter suggests (change identical delimiter such that both delimiters are no longer identical).
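The delimiter-swapping idea can be sketched in Python as three substitutions. This is only an illustration under stated assumptions: the input contains no curly quotes of its own, quoted strings stay on one line, and the function name `swap_replace_restore` is invented here. The middle step uses a lookahead instead of the FAQ’s BSR/ESR expression, since Python’s `re` lacks `\G` and `\K`.

```python
import re

def swap_replace_restore(text: str) -> str:
    """Three steps: make the delimiters distinct, replace inside, restore."""
    # Step 1: turn each "..." pair into curly quotes so open and close differ.
    # Assumption: the text has no curly quotes of its own.
    step1 = re.sub(r'"([^"\n]*)"', r'“\1”', text)
    # Step 2: replace a space only if the next curly quote on the line is a
    # closing one, which means the space sits inside a pair.
    step2 = re.sub(r'[ \t](?=[^“”\n]*”)', '=', step1)
    # Step 3: convert the curly quotes back to ASCII quotes.
    return step2.replace('“', '"').replace('”', '"')

print(swap_replace_restore('x y "a b" "c-d e" z'))
# x y "a=b" "c-d=e" z
```

Once the opening and closing delimiters differ, telling inside from outside becomes a simple local check, which is why the swap makes the problem tractable.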