Regex help: Find/Replace only on lines that include specific words
-
First, I’m not a native English speaker, so some of my sentences might be wrong. Feel free to ask me if you don’t understand my question.
Ok…
I have text files with tons of lines, like these:
.
.
~~list([“Apple”, “Banana”, “Orange”])
~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
~~screen(“fruit_image”, _choice[1], )
~~action Return([“category”, “fruits”])
…and tens of thousands of other lines.
.
.
What I want to do is replace only the “words” inside choice([]) with (“words”).
For example: choice([“Red”, “Blue”, “Orange”, … ,“Purple”]) --> choice([(“Red”), (“Blue”), (“Orange”),…,(“Purple”)])
I need to change only the “words” inside choice([]), not the ones inside list([]), screen([]), etc.
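The sketch below shows the requested transformation outside Notepad++ for comparison. It is a Python sketch that uses a nested substitution and is not one of the regexes discussed in this thread. It rests on several assumptions. The files use plain ASCII double quotes, as guy038's P.S. further down supposes. Each choice([...]) call sits on one line. No item contains a double quote. `wrap_choice_items`, `CHOICE_CALL` and `ITEM` are hypothetical names.

```python
import re

# One choice([...]) call on a single line; group 1 is the text between [ and ].
CHOICE_CALL = re.compile(r'choice\(\[(.*?)\]\)')
# Either an item that is already wrapped, ("..."), or a bare item, "...".
# The wrapped form is tried first, so its inner quotes are never re-matched.
ITEM = re.compile(r'\("[^"\n]*"\)|"[^"\n]*"')


def _wrap_item(m: re.Match) -> str:
    item = m.group(0)
    return item if item.startswith('(') else f'({item})'


def wrap_choice_items(text: str) -> str:
    """Turn "word" into ("word") inside choice([...]) only."""
    def fix_call(m: re.Match) -> str:
        return 'choice([' + ITEM.sub(_wrap_item, m.group(1)) + '])'
    return CHOICE_CALL.sub(fix_call, text)


sample = '\n'.join([
    '~~list(["Apple", "Banana", "Orange"])',
    '~~choice(["Red", "Blue", "Orange", … ,"Purple"])',
    '~~screen("fruit_image", _choice[1], )',
])
print(wrap_choice_items(sample))
```

Only the ~~choice line changes. The screen line is left alone because `_choice[1]` is not followed by `([`. Running the function a second time changes nothing, because already-wrapped items are returned unchanged.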
So here are my questions:
1. What regular expression should I use to do that?
I’ve tried some regexes, but I couldn’t find just the parts I wanted… The biggest problem is that I still understand only a little of regular expressions. I’ve been studying them recently, but it’s still so hard…😭😭
2. Is there a way to find/replace only within bookmarked lines?
If that is possible, I can solve the problem by bookmarking only the lines containing ‘choice’ and replacing (".*?") or (?=.*?\]\))(".*?") with ($1).
3. Or is there a way to select all search results, or to find/replace within the search results?
Then I can solve the problem by using the ‘find in selection’ function…
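Question 2 (replace only on chosen lines) can be imitated outside the editor. The Python sketch below stands in for "bookmark the lines, then replace inside them". It touches a line only if the line contains a keyword, and it leaves already-wrapped items alone. `replace_on_lines` is a hypothetical helper, and plain ASCII quotes are assumed. The choice of keyword matters: the bare word choice also occurs in `_choice[1]` on the screen line.

```python
import re

# A quoted item, or one already wrapped in parentheses (which is left alone).
ITEM = re.compile(r'\("[^"\n]*"\)|"[^"\n]*"')


def _wrap(m: re.Match) -> str:
    item = m.group(0)
    return item if item.startswith('(') else f'({item})'


def replace_on_lines(text: str, keyword: str) -> str:
    """Apply the replacement only on lines that contain `keyword`,
    similar to bookmarking those lines and replacing inside them."""
    out = []
    for line in text.splitlines(keepends=True):
        if keyword in line:                 # the "bookmarked" lines
            line = ITEM.sub(_wrap, line)
        out.append(line)
    return ''.join(out)


text = '~~choice(["Red", "Blue"])\n~~screen("fruit_image", _choice[1], )\n'
print(replace_on_lines(text, '~~choice('), end='')   # screen line untouched
print(replace_on_lines(text, 'choice'), end='')      # screen line is hit too
```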
-
Thank you @Alan-Kilborn😀
I made a regex after looking at the linked post, and it ‘almost’ seems to work well…
Find:
(?i-s)(?:choice\(\[|\G).*?\K(?=.*?\]\))(".*?")
Replace with:
\($1\)
I think it found every “word” in choice([]), so my problem was solved.
But there was one exception: it also found the code below.
screen dropdown_menu(pos=(0, 0), name="", spacing=0, items_offset=(0, 0), background="#00000080", style="empty", iconset=["▾", "▴"]):
Can you tell me why that code was found as well?
I want to know what the problem with my regex is.
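The following is a hedged explanation, since the rest of the file isn't shown. As far as I understand Notepad++'s (Boost) engine, \G holds right after the previous match and also at the position where the search starts, for example the very beginning of the file. Suppose the dropdown_menu line is where the search began. The \G branch then lets the regex start there without any choice([. The lookahead (?=.*?\]\)) is also satisfied by ["▾", "▴"]) at the end of that line. guy038's later regexes guard this branch as (?!\A)\G. Python's re has no \G or \K, so the sketch below emulates them by hand. `emulate_replace_all` is a hypothetical helper, and the (?i) part is not reproduced.

```python
import re


def emulate_replace_all(text: str, start: str, target: str, guard_start: bool) -> str:
    r"""Hand-made emulation of a Notepad++ 'Replace All' with
        (?-s)(?:START|\G).*?\K(TARGET)   ->   \($1\)
    With guard_start=True the \G branch is written (?!\A)\G instead."""
    head = re.compile(start + r'.*?(' + target + r')')   # START branch
    cont = re.compile(r'.*?(' + target + r')')           # \G branch
    out, pos = [], 0
    while True:
        m = None
        # \G holds at `pos`: the start of the search, or right after the last match.
        if not (guard_start and pos == 0):
            m = head.match(text, pos) or cont.match(text, pos)
        if m is None:
            # Anywhere else only the START branch can begin a match.
            m = head.search(text, pos)
        if m is None:
            break
        # \K: keep everything before group 1, wrap group 1 in parentheses.
        out.append(text[pos:m.start(1)] + '(' + m.group(1) + ')')
        pos = m.end()
    out.append(text[pos:])
    return ''.join(out)


menu = ('screen dropdown_menu(pos=(0, 0), name="", spacing=0, items_offset=(0, 0), '
        'background="#00000080", style="empty", iconset=["▾", "▴"]):')
text = menu + '\n~~choice(["Red", "Blue"])\n'
start, target = r'choice\(\[', r'(?=.*?\]\))".*?"'   # pieces of the regex above
print(emulate_replace_all(text, start, target, guard_start=False))  # menu line is hit
print(emulate_replace_all(text, start, target, guard_start=True))   # only the choice line
```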
@비공개 I have an incomplete solution based on this strategy:
For each hit we want to match:
"choice(" followed by...
any stuff, until you bump into...
a space or comma or left square bracket, followed immediately by...
NOT "(" (and issue a match reset), followed by...
a word inside fancy quotation marks (this last text going into capture group 1).
The search string is:
(?<=choice\().*?[ ,\[](?<!\()\K(\“\w+\”)
(I hope it appears that after the comma there’s ONE backslash before the left square bracket.)
I found this matches what (I think) you want – well, I think it matches because the editor highlights it.
(Also, the backslashes escaping the fancy quotation marks appear to be optional.)
The replace text I used:
\(\1\)
Here’s the big problem: the matched text doesn’t get replaced. I need a guru to explain to me why this is so.
My guess is that the problem is related to the fact that the fancy quotation marks are each three bytes long: E2809C and E2809D.
Another weakness of my solution is that (if replace worked) it only processes the first text meeting the criteria on a line, so you’d need to run “replace all” a bunch of times. (I think there are ways of overcoming this, resetting and backtracking, but I haven’t looked closely into that.)
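The "run Replace All a bunch of times" workaround can be sketched as a loop that repeats the replacement until the text stops changing. This is a Python approximation, not the Notepad++ regex itself. Python's re has no \K, so the text before the item is captured in group 1 and written back unchanged. The (?<!\() lookbehind is left out because it looks at the character just matched by [ ,\[], which can never be "(". The fancy quotes “ ” from the thread are assumed, and `wrap_until_stable` is a hypothetical name.

```python
import re

# "choice(" + shortest run of anything + a space, comma or "[" (group 1),
# then a word in fancy quotes (group 2). Group 1 consumes from "choice("
# onward, so each pass can change at most one item per "choice(".
PATTERN = re.compile(r'(choice\(.*?[ ,\[])(“\w+”)')


def wrap_until_stable(text: str):
    """Repeat the replacement until nothing changes; return (text, passes)."""
    passes = 0
    while True:
        passes += 1
        new = PATTERN.sub(r'\1(\2)', text)
        if new == text:
            return text, passes
        text = new


result, passes = wrap_until_stable('~~choice([“Red”, “Blue”, “Orange”])')
print(result, passes)
```

Three passes wrap the three items, and a fourth pass confirms nothing is left. Already-wrapped items are skipped because the character before their opening quote is "(", which is not in [ ,\[].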
-
@Neil-Schipper My suggestion that the failure to replace the captured text is related to the fancy quotation marks is probably wrong, because I could easily match and capture
(“Blue”)
and replace it with
\(\1\)
which gave the expected results. So maybe the problem is related to my use of \K.
-
Hello, @비공개, @alan-kilborn, @Neil-schipper and All,
Thanks for trying to get the solution by yourself !
I’ve already found out a suitable regex S/R for your case ! Try this version and tell me if it avoids the mentioned side-effects !
SEARCH
(?-is)(?:~~choice|(?!\A)\G).+?\K"\w+"
REPLACE
\($0\)
If OK, I could give you some regex explanations next time !
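The sketch below approximates this first version in Python, assuming plain ASCII quotes as the P.S. says. Python's re has no \G or \K. Because . cannot cross a line break, the \G chain that starts at ~~choice keeps matching "\w+" items until the end of that same line. The sketch therefore just rewrites the part of each line after the first ~~choice. `first_version` is a hypothetical name. The sketch also shows that an item already in parentheses gets a second pair.

```python
import re

ITEM = re.compile(r'"\w+"')


def first_version(text: str) -> str:
    r"""Rough sketch of  (?-is)(?:~~choice|(?!\A)\G).+?\K"\w+"  ->  \($0\)"""
    lines = text.split('\n')
    for i, line in enumerate(lines):
        k = line.find('~~choice')
        if k == -1:
            continue                      # no ~~choice: the line is never touched
        cut = k + len('~~choice') + 1     # .+? needs at least one character
        lines[i] = line[:cut] + ITEM.sub(r'(\g<0>)', line[cut:])
    return '\n'.join(lines)


print(first_version('~~choice(["Red", "Blue"])'))     # ~~choice([("Red"), ("Blue")])
print(first_version('~~choice(["Red", ("Blue")])'))   # ~~choice([("Red"), (("Blue"))])
```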
Best Regards,
guy038
P.S. :
I assumed that your file contains only regular double quotes " and not the “ and ” characters, of Unicode values \x{201C} and \x{201D}, which our forum automatically displays in their place !
-
@guy038 It didn’t work for me. I ran it on this test text:
~~list([“Apple”, “Banana”, “Orange”])
~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
~~choice([“Red”, (“Blue”), (“Orange”),…,(“Purple”)])
~~choice([(“Red”), “Blue”, (“Orange”),…,(“Purple”)])
~~choice([(“Red”), (“Blue”), “Orange”,…,(“Purple”)])
~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
~~screen(“fruit_image”, _choice[1], )
~~action Return([“category”, “fruits”])
choice([(“Red”), (“Blue”), (“Orange”),…,(“Purple”)])
and after seeing your P.S. I converted the fancy qm’s to standard ASCII:
~~list(["Apple", "Banana", "Orange"])
~~choice(["Red", "Blue", "Orange", … ,"Purple"])
~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])
~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])
~~choice([("Red"), ("Blue"), "Orange",…,("Purple")])
~~choice(["Red", "Blue", "Orange", … ,"Purple"])
~~screen("fruit_image", _choice[1], )
~~action Return(["category", "fruits"])
choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
but I still get no matches.
I didn’t analyze your search string, and I have no doubt it’s based on sound principles.
I am amazed to learn about the quotation marks getting altered. That’s another “gotcha” that warrants documentation in an easily found location! (Maybe it’s a feature that can be disabled.)
Also the codes for the qm’s you state are different from mine. I got mine (lazily) by running a conversion using the Converter plug-in, which I have not vetted for byte-level correctness against standard character tables. Yet another trap for the unwary?
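The two sets of codes appear to describe the same characters in two notations. \x{201C} and \x{201D} are Unicode code points, the form the Notepad++ regexes in this thread use. E2809C and E2809D are the UTF-8 byte sequences that encode those code points. A small Python check follows. It also includes a hypothetical `straighten` helper that does the same kind of fancy-to-ASCII conversion as the Converter plug-in step.

```python
LEFT, RIGHT = '\u201c', '\u201d'   # “ and ”

for ch in (LEFT, RIGHT):
    # The code point, then the UTF-8 bytes that encode it.
    print(f'{ch}  U+{ord(ch):04X}  {ch.encode("utf-8").hex().upper()}')

# Map both curly quotes to the plain ASCII double quote.
TO_ASCII = str.maketrans({LEFT: '"', RIGHT: '"'})


def straighten(text: str) -> str:
    return text.translate(TO_ASCII)


print(straighten('~~choice([“Red”, “Blue”])'))   # ~~choice(["Red", "Blue"])
```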
-
Hello, @비공개, @alan-kilborn, @Neil-schipper and All,
Ah…OK, Neil. So I improved my regex S/R so that it does not process a quoted item that is already preceded and followed by parentheses !
Here is the new version :
SEARCH
(?-is)(?:~~choice|(?!\A)\G).+?\K(?<!\()"\w+"(?!\))
REPLACE
\($0\)
Taking your INPUT text into account :
~~list(["Apple", "Banana", "Orange"])
~~choice(["Red", "Blue", "Orange", … ,"Purple"])
~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])
~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])
~~choice([("Red"), ("Blue"), "Orange",…,("Purple")])
~~choice(["Red", "Blue", "Orange", … ,"Purple"])
~~screen("fruit_image", _choice[1], )
~~action Return(["category", "fruits"])
choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
It correctly changes it as below :
~~list(["Apple", "Banana", "Orange"])
~~choice([("Red"), ("Blue"), ("Orange"), … ,("Purple")])
~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
~~choice([("Red"), ("Blue"), ("Orange"), … ,("Purple")])
~~screen("fruit_image", _choice[1], )
~~action Return(["category", "fruits"])
choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
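Here is a rough Python check of this improved version, assuming plain ASCII quotes. Python's re has no \G or \K. Because . cannot cross a line break, the \G chain started by ~~choice runs to the end of that line. The sketch therefore wraps only the matches that start after the first ~~choice on each line. It matches against the whole line so that the (?<!\() lookbehind sees the real preceding character. `improved_version` is a hypothetical name. The checks compare against the output shown above and confirm that a second run changes nothing.

```python
import re

# "word" that is NOT preceded by "(" and NOT followed by ")".
ITEM = re.compile(r'(?<!\()"\w+"(?!\))')


def improved_version(text: str) -> str:
    r"""Rough sketch of (?-is)(?:~~choice|(?!\A)\G).+?\K(?<!\()"\w+"(?!\)) -> \($0\)"""
    lines = text.split('\n')
    for i, line in enumerate(lines):
        k = line.find('~~choice')
        if k == -1:
            continue
        cut = k + len('~~choice') + 1      # .+? needs at least one character
        lines[i] = ITEM.sub(
            lambda m: f'({m.group(0)})' if m.start() >= cut else m.group(0), line)
    return '\n'.join(lines)


before = '\n'.join([
    '~~list(["Apple", "Banana", "Orange"])',
    '~~choice(["Red", "Blue", "Orange", … ,"Purple"])',
    '~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])',
    '~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])',
    '~~screen("fruit_image", _choice[1], )',
])
after = improved_version(before)
print(after)
print(improved_version(after) == after)   # True: a second run changes nothing
```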
Notes :
- You must use only the Replace All button ( Do NOT click on the Replace button for successive replacements : it won’t work due to the \K syntax ! )
- If you don’t tick the Wrap around option, preferably move the caret to the very beginning of the current file
- This new version avoids creating forms such as ((((("text"))))) if you execute this regex S/R several times !
BR
guy038
-
@guy038 Following your instructions, this works exactly as you say.
Furthermore, I see now that your earlier search string also works with Replace All (which I hadn’t tried), but adds the unwanted extra sets of ().
I also see now that my original search string (my first post in this thread) works with Replace All too (which I also hadn’t tried), but with the need for successive runs to get the whole job done, as I had stated.
It appears there’s something about \K that I don’t understand, unless it’s something not fully described in the docs but discovered by trial and error. I do consider it a weakness (of both of our search strings) that single replaces don’t work.
-
Many answers arrived while I was sleeping. Thank you so much @guy038, @Neil-Schipper, @Alan-Kilborn and all!
OK, so @guy038, as soon as I woke up, I tried your method and it solved my problem perfectly!😀😀😀
The “words” I’m looking for aren’t made only of “word characters”, so I just changed \w to [A-Za-z \-\.\!\?']. The example I gave was not appropriate, I’m sorry. And please! I need your explanation.
In particular, I’m not sure what (?:choice|(?!\A)\G) and .+?\K" mean.
-
Hi, @비공개, @alan-kilborn, @Neil-schipper and All,
OK, @비공개, I’m going to give some pieces of information but, as always :
-
You have to know how to make cement before you can put two bricks together
-
You must know how to put two bricks together before building a wall
-
You must know how to build a wall before building a room
-
You must know how to build a room before building a house
and so on !
In other words, check this FAQ which gives you the main links to learn regular expressions, from top to bottom ;-))
Now, let’s go :
----------------------------------------------------------------------------------------------------------------------------------------------------------
Regarding MODIFIERS, generally met at the BEGINNING of the regex, but which may occur at ANY location within the overall regex :

(?-i)   From this point, the search cares about letter CASE
(?i)    From this point, the search does NOT care about letter CASE

(?-s)   From this point, any regex DOT symbol represents a SINGLE STANDARD character. So the . is equivalent to the NEGATIVE character class
        [^\r\n\f\x{0085}\x{2028}\x{2029}] for a Unicode encoded file, and to [^\r\n\f] for an ANSI encoded file
(?s)    From this point, any regex DOT symbol represents ABSOLUTELY ANY character, including all the LINE-ENDING chars

(?-x)   From this point, any LITERAL SPACE character is SIGNIFICANT and is part of the overall regex ( IMPLICIT in a N++ regex )
(?x)    From this point, any LITERAL SPACE character is IGNORED and just helps READABILITY of the overall regex.
        This mode is called FREE-SPACING mode and the regex can be SPLIT over SEVERAL lines. In this mode :
          - Any SPACE char must be written [ ] or \x20 or escaped with a \ character
          - Any text after a # symbol is considered as COMMENTS
          - Any literal # symbol must be written [#] or \x23 or escaped as \#

(?-m)   From this point :
          - The regex symbol ^ represents only the VERY BEGINNING of the file, so it is equivalent to the regex \A
          - The regex symbol $ represents only the VERY END of the file, so it is equivalent to the regex \z
(?m)    From this point, the assertions ^ and $ have their USUAL meaning of START and END of line locations ( IMPLICIT in a N++ regex )
----------------------------------------------------------------------------------------------------------------------------------------------------------
Regarding GROUPS :

(•••••)     It defines a CAPTURING group which allows both :
              - The regex engine to STORE the ENCLOSED part for FURTHER use, in the SEARCH and/or the REPLACE part
              - The ENCLOSED part to be possibly REPEATED with a QUANTIFIER, located right after

(?:•••••)   It defines a NON-CAPTURING group which only allows the ENCLOSED part to be REPEATED and which is NOT stored by the regex engine

Note that the MODIFIERS, described above, may be INCLUDED within the parentheses :
  - In a CAPTURING group as, for instance, ((?i)•••••), so that the INSENSITIVE search is RESTRICTED to the contents of this group only
  - In a NON-CAPTURING group, TWO syntaxes are possible, for instance : (?:(?i)•••••) or the shorthand (?i:•••••)

CAPTURING groups can be RE-USED with the syntax :
  - \1 to \9 in the SEARCH and/or REPLACE regexes, to refer to groups 1 to 9
  - $1 to $99 in the REPLACE regex ONLY, to refer to groups 1 to 99
  - ${1} to ${99} in the REPLACE regex ONLY, to refer to groups 1 to 99
    For instance, the ${1}5 syntax means the contents of GROUP 1, followed by the digit 5, whereas the $15 syntax would mean the contents of GROUP 15
  - $0 or ${0} in the REPLACE regex ONLY, to refer to the OVERALL match of the SEARCH regex
----------------------------------------------------------------------------------------------------------------------------------------------------------
Regarding QUANTIFIERS, 6 syntaxes are possible : {n}, {n,}, {n,m}, ?, + and *. Note that :
  - {n}     EXACTLY n times the character or group PRECEDING the quantifier
  - {n,}    n or MORE times the character or group PRECEDING the quantifier
  - {n,m}   BETWEEN n and m times the character or group PRECEDING the quantifier
  - ?       is equivalent to {0,1}
  - +       is equivalent to {1,}
  - *       is equivalent to {0,}

They are considered as GREEDY quantifiers because they match as MANY characters as possible.
If these 6 syntaxes are followed with a QUESTION MARK ?, they are called LAZY quantifiers because they match as FEW characters as possible.

For instance, given the following sentence :

    The licenses for most software are designed to take away your freedom to share and change it

  - Regex (?-s)e.+?ar, with the LAZY quantifier +?, matches :  e licenses for most softwar
  - Regex (?-s)e.+ar, with the GREEDY quantifier +, matches :  e licenses for most software are designed to take away your freedom to shar

If these 6 syntaxes are followed with a PLUS sign +, they are called POSSESSIVE ( or ATOMIC ) quantifiers :
  - They are quite similar to their GREEDY forms, except that, in case of failure, they don't backtrack to attempt further possible match(es)
  - Note that this ADVANCED option should be studied once you are rather ACQUAINTED with regexes !
----------------------------------------------------------------------------------------------------------------------------------------------------------
BTW, a quick tip to SIMULATE a NORMAL search when the REGULAR EXPRESSION mode is selected : START the search zone with the \Q syntax.
For instance, the regex \Q/* This is a C-comment */ will find the LITERAL string /* This is a C-comment */
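Python's re module follows the same greedy/lazy rules, so the sentence example can be checked there. This sketch also shows rough Python counterparts of a few notions above, with the caveat that the syntax differs. Python's dot likewise excludes line breaks unless (?s) is given. re.escape() plays the role of the \Q tip. \g<1>5 in a replacement plays the role of ${1}5.

```python
import re

SENTENCE = ('The licenses for most software are designed to take away '
            'your freedom to share and change it')

# Lazy +? stops at the FIRST "ar" it can reach, greedy + at the LAST one.
lazy = re.search(r'e.+?ar', SENTENCE).group()
greedy = re.search(r'e.+ar', SENTENCE).group()
print(lazy)     # e licenses for most softwar
print(greedy)   # e licenses for most software are designed to take away your freedom to shar

# The dot does not cross a line break unless (?s) / re.DOTALL is used.
print(re.search(r'a.b', 'a\nb'))                  # None
print(re.search(r'(?s)a.b', 'a\nb') is not None)  # True

# No \Q in Python: re.escape() builds the literal pattern instead.
literal = re.escape('/* This is a C-comment */')
print(re.search(literal, 'x = 1; /* This is a C-comment */').group())

# ${1}5 versus $15: Python delimits the group number with \g<1>.
print(re.sub(r'(a)', r'\g<1>5', 'cat'))           # ca5t
```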
Now, I will rewrite my last regex, with your improvement, in the Free-Spacing mode :
----------------------------------------------------------------------------------------------------------------------------------------------------------
(?x-is)          # FREE-SPACING mode, search SENSITIVE to CASE and DOT regex symbol represents a SINGLE STANDARD char
(?:              # BEGINNING of a NON-CAPTURING group
  ~~choice       #   Matches the string ~~choice, with this EXACT case
  |              #   OR ( ALTERNATION symbol )
  (?!\A)\G       #   Matches from RIGHT AFTER the location of the LAST match, IF NOT at the VERY BEGINNING of the file
)                # END of the NON-CAPTURING group
.+?              # The SMALLEST NON-NULL range of STANDARD characters till...
\K               # CURRENT match is DISCARDED and working location is RESET to this POINT
(?<!\()          # ONLY if it's NOT PRECEDED with a STARTING parenthesis symbol
"[!'.?\w-]+"     # ... a NON-NULL range of WORD chars or the characters !, ', ., ? and -, between double quotes
(?!\))           # ONLY if it's NOT FOLLOWED with an ENDING parenthesis symbol

NOTES :
  - This syntax is totally FUNCTIONAL. To be convinced, do a NORMAL selection from (?x-is) to the ENDING parenthesis symbol and hit the Ctrl + F shortcut => this MULTI-line regex is AUTOMATICALLY inserted in the 'Search what' zone
  - The \G assertion means that the NEXT search must start, necessarily, RIGHT AFTER the LAST match !
  - I rewrote your regex part [A-Za-z \-\.\!\?'] as [!'.?\w-] because most punctuation signs do NOT need to be ESCAPED within a character class
  - However, note that the DASH - must be found at the VERY BEGINNING or the VERY END of the character class, when NOT escaped
  - I prefer the \w syntax to [A-Za-z] because \w also INCLUDES all the ACCENTED characters of foreign languages
  - You must use ONLY the REPLACE ALL button ( Do NOT click on the REPLACE button for SUCCESSIVE replacements : it won't work due to the \K syntax ! )
  - If you don't tick the WRAP AROUND option, preferably move the CARET to the VERY BEGINNING of the current file
  - From the BEGINNING of the file, as the regex engine must SKIP some LINE-ENDING characters to get a match, the \G assertion is NOT verified, and the regex engine must necessarily look, FIRST, for a string ~~choice
  - Then, from RIGHT AFTER the word ~~choice, it grasps the SMALLEST NON-NULL range of STANDARD chars .+? till a "•••••" structure, but ONLY IF that structure is NOT itself embedded between PARENTHESES !
  - And, due to the \K syntax, ONLY the part "•••••" is the FINAL match desired
  - This FINAL part is changed by the REPLACE regex \($0\), which just rewrites the string "•••••" between PARENTHESES. The parenthesis symbols must be ESCAPED as they have a SPECIAL meaning in the REPLACEMENT
  - Then, from RIGHT AFTER the closing " char, as the regex CANNOT find any other ~~choice string, the (?!\A)\G.+? part, again, selects the SMALLEST NON-NULL range of STANDARD characters till ANOTHER block "•••••", executes the REPLACEMENT, and so on...
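One difference between the two character classes may be worth checking. [A-Za-z \-\.\!\?'] contains a literal space, while [!'.?\w-] does not, so an item made of several words would no longer match. In the other direction, \w also accepts digits, _ and accented letters, and Python's re treats \w as Unicode-aware for str patterns too. The Python sketch compares the classes on a few made-up items. [!'.?\w\x20-] is only a hypothetical variant, not something tested in Notepad++ here. The last lines lay out the part after \K in Python's VERBOSE (free-spacing) mode.

```python
import re

USER_CLASS = re.compile(r"[A-Za-z \-\.\!\?']+")   # class from the question (has a space)
GUY_CLASS = re.compile(r"[!'.?\w-]+")             # guy038's rewrite (no space)
WITH_SPACE = re.compile(r"[!'.?\w\x20-]+")        # hypothetical variant with \x20 added

for item in ["Don't stop!", "Ice cream", "Café", "R2-D2"]:
    print(f'{item:12}', [bool(p.fullmatch(item)) for p in (USER_CLASS, GUY_CLASS, WITH_SPACE)])

# The part of the regex after \K, laid out in free-spacing (VERBOSE) mode.
ITEM = re.compile(r'''
    (?<!\()          # NOT preceded by an opening parenthesis
    "[!'.?\w-]+"     # a double-quoted run of word chars or ! ' . ? -
    (?!\))           # NOT followed by a closing parenthesis
''', re.VERBOSE)

print(ITEM.findall('''~~choice(["Don't", ("Red"), "R2-D2"])'''))
```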
In the example below, in each second line ( Regex types ) :
- The dot . represents any char found by the dummy regex part .+?
- The bullet • represents any char found by the useful regex part [!'.?\w-]
- The character " and the string ~~choice stand for themselves
Text processed           ~~choice(["Red", ("Blue"), ("Orange"), … ,"Purple"])
Regex types              ~~choice.."•••"..........................."••••••"
Match number BEFORE \K   1111111111     222222222222222222222222222
Match number AFTER \K              11111                           22222222

Text processed           ~~choice([("Red"), "Blue", "Orange", … ,("Purple")])
Regex types              ~~choice..........."••••".."••••••"
Match number BEFORE \K   1111111111111111111      22
Match number AFTER \K                       111111  22222222

Text processed           ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
Regex types              ~~choice.."•••".."••••".."••••••"....."••••••"
Match number BEFORE \K   1111111111     22      33        44444
Match number AFTER \K              11111  222222  33333333     44444444

I hope that you’ll find this article useful in some way !
However, let me add that the \G and \K assertions, as well as atomic groups, recursive regexes and backtracking verbs ( not discussed ), are difficult notions, and I can assure you that there are a LOT of regex things that you need to know before starting to use them !
Best Regards,
guy038
-