Regex help: Find/Replace only on lines that include specific words



  • First, I’m not a native English speaker, so my sentences might be wrong; feel free to ask me if you don’t understand my question.

    Ok…
    I have text files with tons of sentences.
    Like
    .
    .
    ~~list([“Apple”, “Banana”, “Orange”])
    ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
    ~~screen(“fruit_image”, _choice[1], )
    ~~action Return([“category”, “fruits”])
    And tens of thousands of other sentences…
    .
    .

    What I want to do is replace each “word” inside choice([]) with (“word”).
    Ex) choice([“Red”, “Blue”, “Orange”, … ,“Purple”]) --> choice([(“Red”), (“Blue”), (“Orange”),…,(“Purple”)])

    I need to change only “words” inside the choice([]) and not inside the list([]), screen([]), etc.

    So this is the question I want to ask.

    1. What regular expression should I use to do that?
    I’ve tried some regexes, but I couldn’t match just the parts I wanted… The biggest problem is that I still understand only a little about regular expressions. I’ve been studying recently, but it’s still so hard…😭😭

    2. Is there a way to find/replace only within the bookmarked line?
    If that is possible, I can solve the problem by bookmarking only the lines containing ‘choice’ and replacing (".?") or (?=.?]))(".*?") with ($1).

    3. Or is there a way to select all search results or to find/replace in the search results?
    Then, I can solve the problem by using the ‘find in selection’ function…



  • @비공개

    The answers to your questions 2 and 3 are “no” and “no”.

    For question number 1, there is a technique presented HERE for solving that. Have a read and see if you can apply that technique.



  • Thank you @Alan-Kilborn😀
    I made a regex after looking at the linked post, and it ‘almost’ seems to work well…

    Find: (?i-s)(?:choice\([|\G).*?\K(?=.*?]\))(".*?")
    Replaced by: \($1\)

    I think it found every “word” in choice([]), so my problem was solved.
    But there was one exception: it also matched the code below.

    screen dropdown_menu(pos=(0, 0), name="", spacing=0, items_offset=(0, 0), background="#00000080", style="empty", iconset=["▾", "▴"]):

    Can you tell me why that code was found as well?
    I want to know what is wrong with the regex I made.



  • @비공개 I have an incomplete solution based on this strategy:

    For each hit we want to match:

    "choice(" followed by..
    any stuff, until you bump into..
    space or comma or left square bracket, followed immediately by..
    NOT "("
    (and issue a match reset) 
    followed by..
    a word inside fancy quotation marks
    (and this last text going into capture group 1)
    

    The search string is: (?<=choice\().*?[ ,\[](?<!\()\K(\“\w+\”)

    (I hope it appears that after the comma there’s ONE backslash before the left square bracket).

    I found this matches what (I think) you want – well, I think it matches because the editor highlights it.

    (Also, the backslashes escaping the fancy quotation marks appear to be optional.)

    The replace text I used: \(\1\)

    Here’s the big problem: the matched text doesn’t get replaced. I need a guru to explain to me why this is so.

    My guess is that the problem is related to the fact that the fancy quotation marks are each three bytes long: E2809C and E2809D.

    Another weakness of my solution is that (if replace worked) it only processes the first text meeting the criteria on a line, so you’d need to run “replace all” a bunch of times. (I think there are ways of overcoming this, resetting and backtracking, but I haven’t looked closely into that.)



  • @Neil-Schipper My suggestion that the failure to replace the captured text is related to the fancy quotation marks is probably wrong because I could easily match-and-capture (“Blue”) and replace it with \(\1\) which gave the expected results.

    So maybe the problem is related to my use of \K.



  • Hello, @비공개, @alan-kilborn, @Neil-schipper and All,

    Thanks for trying to get the solution by yourself !

    I’ve already found out a suitable regex S/R for your case ! Try this version and tell me if it avoids the mentioned side-effects !

    SEARCH (?-is)(?:~~choice|(?!\A)\G).+?\K"\w+"

    REPLACE \($0\)

    If OK, I could give you some regex explanations next time !

    Best Regards,

    guy038

    P.S. :

    I supposed that your file contains only regular double quotes " and not the “ and ” characters, of Unicode value \x{201C} and \x{201D}, which are automatically displayed in our forum !



  • @guy038 It didn’t work for me. I ran it on this test text:

    ~~list([“Apple”, “Banana”, “Orange”])
    ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
    ~~choice([“Red”, (“Blue”), (“Orange”),…,(“Purple”)])
    ~~choice([(“Red”), “Blue”, (“Orange”),…,(“Purple”)])
    ~~choice([(“Red”), (“Blue”), “Orange”,…,(“Purple”)])
    ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
    ~~screen(“fruit_image”, _choice[1], )
    ~~action Return([“category”, “fruits”])
    choice([(“Red”), (“Blue”), (“Orange”),…,(“Purple”)])
    

    and after seeing your P.S. I converted the fancy qm’s to standard ascii:

    ~~list(["Apple", "Banana", "Orange"])
    ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
    ~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])
    ~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])
    ~~choice([("Red"), ("Blue"), "Orange",…,("Purple")])
    ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
    ~~screen("fruit_image", _choice[1], )
    ~~action Return(["category", "fruits"])
    choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
    

    but I still get no matches.

    I didn’t analyze your search string, and I have no doubt it’s based on sound principles.

    I am amazed to learn about the quotation marks getting altered. That’s another “gotcha” that warrants documentation in an easily found location! (Maybe it’s a feature that can be disabled.)

    Also the codes for the qm’s you state are different from mine. I got mine (lazily) by running a conversion using the Converter plug-in, which I have not vetted for byte-level correctness against standard character tables. Yet another trap for the unwary?
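The two sets of codes quoted in this thread can be reconciled: \x{201C} and \x{201D} are Unicode code points, while E2809C and E2809D are the UTF-8 byte sequences that encode those same code points. The short Python check below is only an illustration run outside Notepad++; the helper name `describe` is made up for this example.

```python
# Compare the code point of each curly quote with its UTF-8 byte sequence.
def describe(ch: str) -> tuple[str, str]:
    """Return (code point as U+XXXX, UTF-8 bytes as upper-case hex)."""
    return f"U+{ord(ch):04X}", ch.encode("utf-8").hex().upper()


if __name__ == "__main__":
    for quote in ("\u201c", "\u201d"):  # left and right curly double quotes
        print(describe(quote))
```

So the Converter plug-in output and the code points in the P.S. describe the same two characters, just at different levels (bytes versus code points).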



  • Hello, @비공개, @alan-kilborn, @Neil-schipper and All,

    Ah…OK, Neil. So I improved my regex S/R so that it skips any quoted word that is already preceded by an opening parenthesis or followed by a closing parenthesis !

    Here is the new version :

    SEARCH (?-is)(?:~~choice|(?!\A)\G).+?\K(?<!\()"\w+"(?!\))

    REPLACE \($0\)

    Taking your INPUT text into account :

    ~~list(["Apple", "Banana", "Orange"])
    ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
    ~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])
    ~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])
    ~~choice([("Red"), ("Blue"), "Orange",…,("Purple")])
    ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
    ~~screen("fruit_image", _choice[1], )
    ~~action Return(["category", "fruits"])
    choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
    

    It correctly changes it as below :

    ~~list(["Apple", "Banana", "Orange"])
    ~~choice([("Red"), ("Blue"), ("Orange"), … ,("Purple")])
    ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
    ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
    ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
    ~~choice([("Red"), ("Blue"), ("Orange"), … ,("Purple")])
    ~~screen("fruit_image", _choice[1], )
    ~~action Return(["category", "fruits"])
    choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
    

    Notes :

    • You must use only the Replace All button ( Do NOT click on the Replace button for successive replacements : it won’t work due to the \K syntax ! )

    • If you don’t tick the Wrap around option, preferably move the caret to the very beginning of the current file

    • This new version avoids producing forms such as ((((("text"))))), if you execute this regex S/R several times !
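For readers who want to reproduce the same result outside Notepad++, here is a minimal Python sketch. It does not use the thread's \G and \K technique, because the standard `re` module does not support those two syntaxes; instead it swaps in a simpler two-step approach: pick the lines containing `~~choice`, then wrap only the unparenthesized quoted words that follow the marker. The function name `wrap_choice_words` is hypothetical, and the sketch assumes plain ASCII double quotes and one `~~choice(...)` call per line.

```python
import re

# A quoted word that is NOT already preceded by "(" or followed by ")",
# i.e. the same lookarounds as in the SEARCH regex above.
UNWRAPPED = re.compile(r'(?<!\()"\w+"(?!\))')


def wrap_choice_words(text: str) -> str:
    """Wrap quoted words in parentheses, but only after a ~~choice marker."""
    result = []
    for line in text.splitlines():
        head, marker, tail = line.partition("~~choice")
        if marker:
            # Leave everything up to the marker alone; edit only what follows.
            tail = UNWRAPPED.sub(lambda m: f"({m.group(0)})", tail)
        result.append(head + marker + tail)
    return "\n".join(result)


if __name__ == "__main__":
    sample = '~~list(["Apple"])\n~~choice(["Red", ("Blue"), "Orange"])'
    print(wrap_choice_words(sample))
```

Because of the two lookarounds, running the function twice on the same text gives the same result as running it once, which mirrors the note above about avoiding nested parentheses.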

    BR

    guy038



  • @guy038 Following your instructions, this works exactly as you say.

    Furthermore, I see now that your earlier search string also works with Replace All (which I hadn’t tried) but adds the unwanted extra sets of ().

    Furthermore, I also see now that my original search string (my first post in this thread) also works with Replace All (which I also hadn’t tried) but with the requirement for successive runs to get the whole job done as I had stated.

    It appears there’s something about \K that I don’t understand, unless it’s something not fully described in the docs but that was discovered by trial and error.

    I do consider it a weakness (of both of our search strings) that single replaces don’t work.



  • So many answers arrived while I was sleeping. Thank you so much @guy038, @Neil-Schipper, @Alan-Kilborn and all!

    OK, so @guy038, as soon as I woke up, I tried your method and it solved my problem perfectly!😀😀😀

    The “words” I’m looking for aren’t made up only of “word characters”, so I just changed \w to [A-Za-z \-\.\!\?']. The example I gave was not representative. I’m sorry.

    And please! I’d like an explanation.
    In particular, I’m not sure what (?:choice|(?!\A)\G) and .+?\K" mean.



  • Hi, @비공개, @alan-kilborn, @Neil-schipper and All,

    OK, @비공개, I’m going to give some pieces of information but, as always :

    • You have to know how to make cement before you can put two bricks together

    • You must know how to put two bricks together before building a wall

    • You must know how to build a wall before building a room

    • You must know how to build a room before building a house

    and so on !

    In other words, check this FAQ which gives you the main links to learn regular expressions, from top to bottom ;-))

    Now, let’s go :

    ----------------------------------------------------------------------------------------------------------------------------------------------------------
    
    Regarding MODIFIERS, generally met at BEGINNING of the regex, but which may occur at ANY location within the overall regex :
    
    (?-i)   From this point, search      care      about letter's CASE
    
    (?i)    From this point, search  does NOT care about letter's CASE
    
    	      
    (?-s)   From this point, any regex dot symbol represents a SINGLE STANDARD character. So the . is equivalent to the NEGATED character class
                  [^\r\n\f\x{0085}\x{2028}\x{2029}] for a Unicode encoded file and equivalent to [^\r\n\f] for an ANSI encoded file
    
    (?s)    From this point, any regex DOT symbol represents ABSOLUTELY ANY character, including all the LINE-ENDING chars
    	      
    
    (?-x)   From this point, any LITERAL SPACE character is SIGNIFICANT and is part of the overall regex ( IMPLICIT in a N++ regex )
    
    (?x)    From this point, any LITERAL SPACE character is IGNORED  and just helps READABILITY of the overall regex.
                 This mode is called FREE-SPACING mode and lets the regex be SPLIT over SEVERAL lines. In this mode :
    
                 - Any SPACE char must be written [ ] or \x20  or escaped with a \ character
                 - Any text, after a # symbol, will be considered as COMMENTS
                 - Any literal # symbol must be written [#] or \x23 or escaped as \#
    
    	      
    (?-m)   From this point :
                 - The regex symbol ^ represents only  the VERY BEGINNING of the file, so equivalent to the regex \A
                 - The regex symbol $ represents only  the VERY END       of the file, so equivalent to the regex \z
    	      
    (?m)    From this point, the assertions ^ and $ represent their USUAL signification of START and END of line locations ( IMPLICIT in a N++ regex )
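
The modifiers listed above can be tried outside Notepad++ as well. The sketch below uses Python's standard `re` module purely as an illustration; it is a different engine from the one in Notepad++, so details differ. In particular, in Python the `^` and `$` anchors are NOT per-line by default (the opposite of the implicit N++ behaviour), and the sketch only uses the positive forms `(?i)`, `(?s)`, `(?m)` and `(?x)` placed at the very start of the pattern. The names `TEXT` and `demo_flags` are invented for this example.

```python
import re

TEXT = "Red\nred\nRED"


def demo_flags(text: str = TEXT) -> dict[str, list[str]]:
    """Run the same tiny searches with and without each inline modifier."""
    return {
        # Case handling: default is case-sensitive, (?i) ignores case.
        "case_sensitive": re.findall(r"red", text),
        "case_insensitive": re.findall(r"(?i)red", text),
        # Dot handling: default dot stops at a line break, (?s) does not.
        "dot_stops_at_newline": re.findall(r"R.+", text),
        "dot_matches_newline": re.findall(r"(?s)R.+", text),
        # Anchors: Python default is whole-text, (?m) makes them per-line.
        "anchors_whole_text": re.findall(r"^\w+$", text),
        "anchors_per_line": re.findall(r"(?m)^\w+$", text),
        # Free-spacing: literal spaces and # comments in the pattern are ignored.
        "free_spacing": re.findall(r"""(?x)
            r e d    # matches the three letters r, e, d in a row
            """, text),
    }


if __name__ == "__main__":
    for name, hits in demo_flags().items():
        print(f"{name:22} {hits!r}")
```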
    
    ----------------------------------------------------------------------------------------------------------------------------------------------------------
    
    Regarding GROUPS :
    
    (•••••)    It defines a CAPTURING group which allows, both :
    
                   - The regex engine to STORE the regex ENCLOSED part for FURTHER use, either in the SEARCH and/or the REPLACE part
    
                   - The regex ENCLOSED part to be possibly REPEATED with a  QUANTIFIER, located right after
    
    (?:•••••)  It defines a NON-CAPTURING group which only allows the regex ENCLOSED part to be REPEATED and which is **not** stored by the regex engine
    
    Note that the MODIFIERS, described above, may be INCLUDED within the parentheses :
    
                 - In a CAPTURING group as, for instance, ((?i)•••••) so that the INSENSITIVE search is RESTRICTED to the contents of this group, only
    
                 - In a NON-CAPTURING group, TWO syntaxes are possible : for instance : (?:(?i)•••••) or the shorthand (?i:•••••)
    
    
    CAPTURING groups can be RE-USED with the syntax :
    
        - \1   to \9     in the SEARCH and/or REPLACE regexes   for reference to group 1 to  9
        - $1   to $99    in the REPLACE regex ONLY              for reference to group 1 to 99
        - ${1} to ${99}  in the REPLACE regex ONLY              for reference to group 1 to 99
    
            For instance, the ${1}5 syntax means contents of GROUP 1 , followed with digit 5 whereas the $15 syntax would have meant contents of GROUP 15
    
        - $0 or ${0}     in the REPLACE regex ONLY              for reference to the OVERALL match of the SEARCH regex
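
As a rough illustration of group re-use, here is a small Python sketch. Python's `re` module spells the replacement references differently from Notepad++: `\g<0>` plays the role of `$0`, and `\g<1>5` plays the role of `${1}5`. Unlike in the N++ replacement field, parentheses in a Python replacement string are literal and need no escaping. All three function names are made up for this example.

```python
import re


def doubled_letters(word: str) -> list[str]:
    # \1 inside the SEARCH pattern refers back to whatever group 1 captured,
    # so (\w)\1 finds a word character immediately repeated.
    return re.findall(r"(\w)\1", word)


def wrap_whole_match(text: str) -> str:
    # \g<0> is the overall match, like $0 in the replacement discussed above.
    return re.sub(r'"\w+"', r"(\g<0>)", text)


def group_then_digit(text: str) -> str:
    # \g<1>5 means "group 1, then a literal 5", like ${1}5;
    # a bare \15 would be read as a reference to group 15.
    return re.sub(r"(\d)x", r"\g<1>5", text)


if __name__ == "__main__":
    print(doubled_letters("balloon"))
    print(wrap_whole_match('["Red", "Blue"]'))
    print(group_then_digit("a7x"))
```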
    
    ----------------------------------------------------------------------------------------------------------------------------------------------------------
    
    Regarding QUANTIFIERS, 6 syntaxes are possible {n} , {n,}, {n,m}, ?, + and *. Note that :
    
        - {n}   EXACTLY n       times the character or group, PRECEDING the quantifier
        - {n,}  n or MORE       times the character or group, PRECEDING the quantifier
        - {n,m} BETWEEN n and m times the character or group, PRECEDING the quantifier
    
        - ? is equivalent to {0,1}
        - + is equivalent to {1,}
        - * is equivalent to {0,}
    
    They are considered as GREEDY quantifiers because they match as MANY characters as possible
    
    If these 6 syntaxes are followed with a QUESTION MARK ?, they are called LAZY quantifiers because they match as FEW characters as possible
        
    For instance, given the following sentence :
                                                                The licenses for most software are designed to take away your freedom to share and change it
    - Regex (?-s)e.+?ar, with the LAZY   quantifier +?, matches   ---------------------------
    - Regex (?-s)e.+ar , with the GREEDY quantifier +,  matches   ---------------------------------------------------------------------------
    
    
    If these 6 syntaxes are followed with an ADDITION sign +, they are called POSSESSIVE quantifiers ( they behave like ATOMIC groups ).
    
       - They are quite similar to their GREEDY forms, except that, in case of failure, they don't backtrack to attempt further possible match(es)
    
       - Note that this ADVANCED option should be studied when you'll be rather ACQUAINTED with regexes !
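
The LAZY versus GREEDY example above can be checked with a few lines of Python. This is only a sketch with the standard `re` module, where the dot does not match a line break by default, so no `(?-s)` prefix is needed; `SENTENCE` and `lazy_and_greedy` are names chosen for this illustration.

```python
import re

SENTENCE = (
    "The licenses for most software are designed to take away "
    "your freedom to share and change it"
)


def lazy_and_greedy(text: str = SENTENCE) -> tuple[str, str]:
    """Return the lazy match and the greedy match, in that order."""
    lazy = re.search(r"e.+?ar", text).group(0)    # stops at the FIRST "ar"
    greedy = re.search(r"e.+ar", text).group(0)   # runs on to the LAST "ar"
    return lazy, greedy


if __name__ == "__main__":
    for label, found in zip(("lazy", "greedy"), lazy_and_greedy()):
        print(f"{label:6} {len(found):2} chars: {found}")
```

Both matches start at the same place (the "e" of "The"); only the point where they stop differs.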
    
    ----------------------------------------------------------------------------------------------------------------------------------------------------------
    
    BTW, a quick tip to SIMULATE a NORMAL search when the REGULAR EXPRESSION mode is selected : START the search zone with the \Q syntax :
    
        For instance, the regex \Q/* This is a C-comment */ will find the LITERAL string  /* This is a C-comment */
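
Python's standard `re` module has no \Q syntax, but `re.escape` serves a similar purpose: it escapes every special character so that the text is searched literally. The sketch below is an illustration only, and `find_literal` is a hypothetical helper name.

```python
import re
from typing import Optional


def find_literal(needle: str, haystack: str) -> Optional[str]:
    """Search for needle as plain text, even if it contains regex symbols."""
    match = re.search(re.escape(needle), haystack)
    return match.group(0) if match else None


if __name__ == "__main__":
    code = "int x; /* This is a C-comment */ int y;"
    print(find_literal("/* This is a C-comment */", code))
```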
    
    

    Now, I will rewrite my last regex, with your improvement, in the Free-Spacing mode :

    ----------------------------------------------------------------------------------------------------------------------------------------------------------
    
    (?x-is)                #  FREE-SPACING mode, search SENSITIVE to CASE and DOT regex symbol represents a SINGLE STANDARD char
    (?:                    #  BEGINNING of a NON-CAPTURING group
    ~~choice               #       Matches the string ~~choice, with this EXACT case
    |                      #    OR ( ALTERNATION symbol )
    (?!\A)\G               #       Matches from RIGHT AFTER the location of the LAST match, IF NOT at the VERY BEGINNING of the file
    )                      #  END of the NON-CAPTURING group
    .+?                    #  The SMALLEST NON-NULL range of STANDARD characters till...
    \K                     #  CURRENT match is DISCARDED and working location is RESET to this POINT
    (?<!\()                #  ONLY if it's NOT PRECEDED with a STARTING parenthesis symbol
    "[!'.?\w-]+"           #  ... a NON-NULL range of WORD chars or the characters !, ', . ? and -
    (?!\))                 #  ONLY if it's NOT FOLLOWED with an ENDING parenthesis symbol
    
    
    NOTES :
    
    - This syntax is totally FUNCTIONAL. To be convinced do a NORMAL selection from (?x-is) to ENDING parenthesis symbol and hit the Ctrl + F shortcut
         => This MULTI- lines regex is AUTOMATICALLY inserted in the 'Search what' zone 
    
    - The \G assertion means that the NEXT search must start, necessarily, RIGHT AFTER the LAST match !
    
    - I rewrote your regex part [A-Za-z \-\.\!\?'] as [!'.?\w-] because most of the punctuation signs do NOT need to be ESCAPED, within a character CLASS.
    
        - However note that the DASH - must be found at the VERY BEGINNING or the VERY END of the character class, when NON escaped
        - I prefer the \w syntax to [A-Za-z] because \w also INCLUDES all the ACCENTED characters of foreign languages
    
    - You must use ONLY the REPLACE ALL button ( Do NOT click on the REPLACE button for SUCCESSIVE replacements : it won't work due to the \K syntax ! )
    
    - If you don't tick the WRAP AROUND option, move preferably the CARET at the VERY BEGINNING of current file
    
    - From BEGINNING of file, as the regex engine must SKIP some LINE-ENDING characters to get a match, the \G assertion is NOT verified
    
        and the regex engine must necessarily look, FIRST, for a string ~~choice
    
    - Then, from RIGHT AFTER the word choice, it grasps the SMALLEST NON-NULL range of STANDARD chars .+? till a "•••••" structure, but ONLY IF NOT embedded
          between PARENTHESES itself !
    
    - And, due to the \K syntax, ONLY the part "•••••" is the FINAL match desired
    
    - This FINAL part is changed with the REPLACE regex \($0\) which just rewrites the string "•••••" between PARENTHESES.
        The parenthesis symbols must be ESCAPED as they have a SPECIAL meaning in REPLACEMENT
    
    - Then, from RIGHT AFTER the closing " char, as the regex CANNOT find any other ~~choice string, the (?!\A)\G.+? part, again, selects the SMALLEST NON-NULL
        range of STANDARD characters till ANOTHER block "•••••", executes the REPLACEMENT and so on...
    
    

    In the example, below, in each second line ( Regex types ) :

    • The dot . represents any char, found by the regex dummy part .+?

    • The bullet represents any char, found by the regex useful part [!'.?\w-]

    • The character " and the string ~~choice stand for themselves

    Text  processed          ~~choice(["Red", ("Blue"), ("Orange"), … ,"Purple"])
    Regex types              ~~choice.."•••"..........................."••••••"
    Match number BEFORE \K   1111111111     222222222222222222222222222
    Match number AFTER  \K             11111                           22222222
    
    
    Text  processed          ~~choice([("Red"), "Blue", "Orange", … ,("Purple")])
    Regex types              ~~choice..........."••••".."••••••"
    Match number BEFORE \K   1111111111111111111      22        
    Match number AFTER  \K                      111111  22222222
    
    
    Text  processed          ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
    Regex types              ~~choice.."•••".."••••".."••••••"....."••••••"
    Match number BEFORE \K   1111111111     22      33        44444        
    Match number AFTER  \K             11111  222222  33333333     44444444
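
The diagrams above can be approximated with a short Python sketch. Python's standard `re` module does not support \G or \K, so this is a hand-written simulation of the walk-through rather than the regex itself: for one line it returns, for each hit, the part that \K would discard and the part that is kept as the final match. The names `ITEM`, `MARKER` and `trace_line` are invented for this illustration, and the sketch assumes one `~~choice` marker per line.

```python
import re

# The "useful" part of the regex: a quoted item not already in parentheses.
ITEM = re.compile(r'''(?<!\()"[!'.?\w-]+"(?!\))''')
MARKER = "~~choice"


def trace_line(line: str) -> list[tuple[str, str]]:
    """Return (discarded_before_K, kept_after_K) pairs for one line."""
    start = line.find(MARKER)
    if start < 0:
        return []                      # no ~~choice: nothing may match here
    pairs = []
    anchor = start                     # where the current overall match begins
    min_start = start + len(MARKER) + 1  # .+? must consume at least one char
    for m in ITEM.finditer(line):
        if m.start() < min_start:
            continue
        pairs.append((line[anchor:m.start()], m.group(0)))
        anchor = m.end()               # the next match resumes here, like \G
        min_start = m.end() + 1
    return pairs


if __name__ == "__main__":
    sample = '~~choice(["Red", ("Blue"), ("Orange"), … ,"Purple"])'
    for discarded, kept in trace_line(sample):
        print(f"discarded {len(discarded):2} chars, kept {kept}")
```

For the first sample line of the diagrams, the discarded parts are 10 and 27 characters long, which corresponds to the runs of 1s and 2s in the "BEFORE \K" row.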
    

    I hope that you’ll find this article useful, in any way !

    However, let me add that the \G and \K assertions, as well as atomic groups and recursive regexes ( not discussed ), are difficult notions and I can assure you that there are a LOT of regex things that you need to know before starting to use them !

    Best Regards,

    guy038

