    Regex help: Find/Replace only on lines that include specific words

    • Alan Kilborn @비공개

      @비공개

      The answers to your questions 2 and 3 are “no” and “no”.

      For question number 1, there is a technique presented HERE for solving that. Have a read and see if you can apply that technique.

      • 비공개

        Thank you @Alan-Kilborn😀
        I made a regex after looking at the linked post, and it ‘almost’ seems to work well…

        Find: (?i-s)(?:choice\(\[|\G).*?\K(?=.*?\]\))(".*?")
        Replace with: \($1\)

        I think it found every “word” in choice([]), so my problem was solved.
        But there was one exception: it also found the code below.

        screen dropdown_menu(pos=(0, 0), name="", spacing=0, items_offset=(0, 0), background="#00000080", style="empty", iconset=["▾", "▴"]):

        Can you tell me why that code was found as well?
        I want to know what is wrong with the regex I made.

        • Neil Schipper @비공개

          @비공개 I have an incomplete solution based on this strategy:

          For each hit we want to match:

          "choice(" followed by..
          any stuff, until you bump into..
          space or comma or left square bracket, followed immediately by..
          NOT "("
          (and issue a match reset)
          followed by..
          a word inside fancy quotation marks
          (and this last text going into capture group 1)

          The search string is: (?<=choice\().*?[ ,\[](?<!\()\K(\“\w+\”)

          (I hope it appears that after the comma there’s ONE backslash before the left square bracket).

          I found this matches what (I think) you want – well, I think it matches because the editor highlights it.

          (Also, the backslashes escaping the fancy quotation marks appear to be optional.)

          The replace text I used: \(\1\)

          Here’s the big problem: the matched text doesn’t get replaced. I need a guru to explain to me why this is so.

          My guess is that the problem is related to the fact that the fancy quotation marks are each three bytes long: E2809C and E2809D.

          Another weakness of my solution is that (if replace worked) it only processes the first text meeting the criteria on a line, so you’d need to run “replace all” a bunch of times. (I think there are ways of overcoming this, resetting and backtracking, but I haven’t looked closely into that.)

          • Neil Schipper @Neil Schipper

            @Neil-Schipper My suggestion that the failure to replace the captured text is related to the fancy quotation marks is probably wrong because I could easily match-and-capture (“Blue”) and replace it with \(\1\) which gave the expected results.

            So maybe the problem is related to my use of \K.

            • guy038

              Hello, @비공개, @alan-kilborn, @Neil-schipper and All,

              Thanks for trying to get the solution by yourself !

              I’ve already found out a suitable regex S/R for your case ! Try this version and tell me if it avoids the mentioned side-effects !

              SEARCH (?-is)(?:~~choice|(?!\A)\G).+?\K"\w+"

              REPLACE \($0\)

              If OK, I could give you some regex explanations next time !

              Best Regards,

              guy038

              P.S. :

              I supposed that your file contains only regular double quotes " and not the “ and ” characters, of Unicode value \x{201C} and \x{201D}, which are automatically displayed in our forum !

              • Neil Schipper @guy038

                @guy038 It didn’t work for me. I ran it on this test text:

                ~~list([“Apple”, “Banana”, “Orange”])
                ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
                ~~choice([“Red”, (“Blue”), (“Orange”),…,(“Purple”)])
                ~~choice([(“Red”), “Blue”, (“Orange”),…,(“Purple”)])
                ~~choice([(“Red”), (“Blue”), “Orange”,…,(“Purple”)])
                ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
                ~~screen(“fruit_image”, _choice[1], )
                ~~action Return([“category”, “fruits”])
                choice([(“Red”), (“Blue”), (“Orange”),…,(“Purple”)])
                

                and after seeing your P.S. I converted the fancy qm’s to standard ascii:

                ~~list(["Apple", "Banana", "Orange"])
                ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                ~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])
                ~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])
                ~~choice([("Red"), ("Blue"), "Orange",…,("Purple")])
                ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                ~~screen("fruit_image", _choice[1], )
                ~~action Return(["category", "fruits"])
                choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                

                but I still get no matches.

                I didn’t analyze your search string, and I have no doubt it’s based on sound principles.

                I am amazed to learn about the quotation marks getting altered. That’s another “gotcha” that warrants documentation in an easily found location! (Maybe it’s a feature that can be disabled.)

                Also the codes for the qm’s you state are different from mine. I got mine (lazily) by running a conversion using the Converter plug-in, which I have not vetted for byte-level correctness against standard character tables. Yet another trap for the unwary?
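The two sets of codes quoted in this thread do not actually conflict: E2809C and E2809D are the UTF-8 byte sequences of the code points U+201C and U+201D. A minimal Python sketch to check this (Python is only used here as a neutral calculator; the helper name `describe` is made up for the illustration):

```python
# Curly quotation marks as rendered by the forum.
LEFT, RIGHT = "\u201c", "\u201d"

def describe(ch):
    """Return (code point in hex, UTF-8 bytes in hex) for one character."""
    return f"{ord(ch):04X}", ch.encode("utf-8").hex().upper()

for ch in (LEFT, RIGHT):
    code_point, utf8_bytes = describe(ch)
    print(f"U+{code_point} -> UTF-8 bytes {utf8_bytes}")
```

So a converter that works on the bytes of a UTF-8 file reports three bytes per quotation mark, while a regex such as \x{201C} refers to the single code point.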

                • guy038

                  Hello, @비공개, @alan-kilborn, @Neil-schipper and All,

                  Ah…OK, Neil. So I improved my regex S/R so that it does not process a string whose double quotes are already preceded or followed by a parenthesis !

                  Here is the new version :

                  SEARCH (?-is)(?:~~choice|(?!\A)\G).+?\K(?<!\()"\w+"(?!\))

                  REPLACE \($0\)

                  Taking your INPUT text into account :

                  ~~list(["Apple", "Banana", "Orange"])
                  ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                  ~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])
                  ~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])
                  ~~choice([("Red"), ("Blue"), "Orange",…,("Purple")])
                  ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                  ~~screen("fruit_image", _choice[1], )
                  ~~action Return(["category", "fruits"])
                  choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                  

                  It is correctly changed as below :

                  ~~list(["Apple", "Banana", "Orange"])
                  ~~choice([("Red"), ("Blue"), ("Orange"), … ,("Purple")])
                  ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                  ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                  ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                  ~~choice([("Red"), ("Blue"), ("Orange"), … ,("Purple")])
                  ~~screen("fruit_image", _choice[1], )
                  ~~action Return(["category", "fruits"])
                  choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                  

                  Notes :

                  • You must use only the Replace All button ( Do NOT click on the Replace button for successive replacements : it won’t work due to the \K syntax ! )

                  • If you don’t tick the Wrap around option, preferably move the caret to the very beginning of the current file

                  • This new version avoids producing forms such as ((((("text"))))) if you execute this regex S/R several times !
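For readers who want to run the same replacement outside Notepad++, here is a minimal Python sketch. It is not a port of the regex above: Python's stdlib `re` module has neither \G nor \K, so the "only on lines that contain ~~choice" condition is handled by splitting each line at the keyword, and only the two lookarounds are reused. The function name `wrap_after_keyword` and the sample text are assumptions made for the illustration.

```python
import re

# A quoted word that is neither preceded by "(" nor followed by ")".
UNWRAPPED = re.compile(r'(?<!\()"\w+"(?!\))')

def wrap_after_keyword(text, keyword="~~choice"):
    """Wrap bare "word" strings in parentheses, but only on lines that
    contain `keyword`, and only in the part of the line that follows it."""
    result = []
    for line in text.split("\n"):
        head, found, tail = line.partition(keyword)
        if found:
            tail = UNWRAPPED.sub(lambda m: "(" + m.group(0) + ")", tail)
        result.append(head + found + tail)
    return "\n".join(result)

sample = "\n".join([
    '~~list(["Apple", "Banana", "Orange"])',
    '~~choice(["Red", ("Blue"), "Orange"])',
    '~~screen("fruit_image", _choice[1], )',
])
print(wrap_after_keyword(sample))
```

Running the function a second time leaves the text unchanged, which mirrors the note about avoiding nested parentheses. Edge cases may differ slightly from the Notepad++ regex (for example two quoted strings with nothing between them), so treat this as an approximation.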

                  BR

                  guy038

                  • Neil Schipper @guy038

                    @guy038 Following your instructions, this works exactly as you say.

                    Furthermore, I see now that your earlier search string also works with Replace All (which I hadn’t tried) but adds the unwanted extra sets of ().

                    Furthermore, I also see now that my original search string (my first post in this thread) also works with Replace All (which I also hadn’t tried) but with the requirement for successive runs to get the whole job done as I had stated.

                    It appears there’s something about \K that I don’t understand, unless it’s something not fully described in the docs but that was discovered by trial and error.

                    I do consider it a weakness (of both of our search strings) that single replaces don’t work.

                    • 비공개

                      There were many answers while I was sleeping. Thank you so much @guy038 ,@Neil-Schipper, @Alan-Kilborn and all!

                      OK, so @guy038, as soon as I woke up, I tried your method and it solved my problem perfectly!😀😀😀

                      The “words” I’m looking for aren’t just written with “word characters”, so I just changed \w to [A-Za-z \-\.\!\?']. The example I gave was not appropriate. I’m sorry.

                      And please, I would like an explanation.
                      In particular, I’m not sure what (?:choice|(?!\A)\G) and .+?\K" mean.

                      • guy038

                        Hi, @비공개, @alan-kilborn, @Neil-schipper and All,

                        OK, @비공개, I’m going to give some pieces of information but, as always :

                        • You have to know how to make cement before you can put two bricks together

                        • You must know how to put two bricks together before building a wall

                        • You must know how to build a wall before building a room

                        • You must know how to build a room before building a house

                        and so on !

                        In other words, check this FAQ which gives you the main links to learn regular expressions, from top to bottom ;-))

                        Now, let’s go :

                        ----------------------------------------------------------------------------------------------------------------------------------------------------------
                        
                        Regarding MODIFIERS, generally met at BEGINNING of the regex, but which may occur at ANY location within the overall regex :
                        
                        (?-i)   From this point, search      care      about letter's CASE
                        
                        (?i)    From this point, search  does NOT care about letter's CASE
                        
                        	      
                        (?-s)   From this point, any regex dot symbol represents a SINGLE STANDARD character. So the . is equivalent to the NEGATIVE character class
                                      [^\r\n\f\x{0085}\x{2028}\x{2029}] for a Unicode encoded file and equivalent to [^\r\n\f] for an ANSI encoded file

                        (?s)    From this point, any regex DOT symbol represents ABSOLUTELY ANY character, including all the LINE-ENDING chars
                        	      
                        
                        (?-x)   From this point, any LITERAL SPACE character is SIGNIFICANT and is part of the overall regex ( IMPLICIT in a N++ regex )
                        
                        (?x)    From this point, any LITERAL SPACE character is IGNORED  and just helps READABILITY of the overall regex.
                                     This mode is called FREE-SPACING mode and the regex can be SPLIT over SEVERAL lines. In this  mode :

                                     - Any SPACE char must be written [ ] or \x20  or escaped with a \ character
                                     - Any text, after a # symbol, will be considered as COMMENTS
                                     - Any literal # symbol must be written [#] or \x23 or escaped as \#
                        
                        	      
                        (?-m)   From this point :
                                     - The regex symbol ^ represents only  the VERY BEGINNING of the file, so equivalent to the regex \A
                                     - The regex symbol $ represents only  the VERY END       of the file, so equivalent to the regex \z
                        	      
                        (?m)    From this point, the assertions ^ and $ represent their USUAL signification of START and END of line locations ( IMPLICIT in a N++ regex )
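As a side illustration, the same four modifiers exist in Python's stdlib `re` module, so their effect can be tried in a few lines. This is a sketch under the assumption that Python behaves closely enough for teaching purposes; the defaults are NOT the same as in Notepad++ (in Python the multi-line mode is off unless (?m) is given, and the dot without (?s) excludes only the \n character). The sample text is made up.

```python
import re

text = "Choice A\nchoice B"

# (?i:...) : the search ignores case, but only inside the group.
insensitive = re.findall(r"(?i:choice) [A-Z]", text)

# No modifier : the search cares about case.
sensitive = re.findall(r"choice [A-Z]", text)

# (?s) lets the dot cross the line ending; without it the dot stops there.
dot_single_line = re.search(r"(?s)Choice.+B", text)
dot_standard = re.search(r"Choice.+B", text)

# (?m) makes ^ match at the start of every line (combined here with i).
line_starts = re.findall(r"(?mi)^choice", text)

# (?x) free-spacing mode : spaces and # comments in the pattern are ignored,
# so the literal space is written [ ] here.
free_spacing = re.search(r"""(?x)
    choice   # the literal word
    [ ]      # one literal space
    B        # the letter B
""", text)

print(insensitive, sensitive, line_starts, free_spacing.group(0))
```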
                        
                        ----------------------------------------------------------------------------------------------------------------------------------------------------------
                        
                        Regarding GROUPS :
                        
                        (•••••)    It defines a CAPTURING group which allows, both :
                        
                                       - The regex engine to STORE the regex ENCLOSED part for FURTHER use, either in the SEARCH and/or the REPLACE part
                        
                                       - The regex ENCLOSED part to be possibly REPEATED with a  QUANTIFIER, located right after
                        
                        (?:•••••)  It defines a NON-CAPTURING group which only allows the regex ENCLOSED part to be REPEATED and which is **not** stored by the regex engine
                        
                        Note that the MODIFIERS, described above, may be INCLUDED within the parentheses :
                        
                                     - In a CAPTURING group as, for instance, ((?i)•••••) so that the INSENSITIVE search is RESTRICTED to the contents of this group, only
                        
                                     - In a NON-CAPTURING group, TWO syntaxes are possible : for instance : (?:(?i)•••••) or the shorthand (?i:•••••)
                        
                        
                        CAPTURING groups can be RE-USED with the syntax :
                        
                            - \1   to \9     in the SEARCH and/or REPLACE regexes   for reference to group 1 to  9
                            - $1   to $99    in the REPLACE regex ONLY              for reference to group 1 to 99
                            - ${1} to ${99}  in the REPLACE regex ONLY              for reference to group 1 to 99
                        
                            For instance, the ${1}5 syntax means contents of GROUP 1 , followed with digit 5 whereas the $15 syntax would have meant contents of GROUP 15

                            - $0 or ${0}     in the REPLACE regex ONLY              for reference to the OVERALL match of the SEARCH regex
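The group references can be tried quickly in Python, with one caveat: the replacement syntax of Python's `re` module differs from the Notepad++ one. The sketch below assumes the following rough correspondence: \g<1> plays the role of ${1} and \g<0> the role of $0. The sample line is made up.

```python
import re

line = 'item = "Red"'

# \1 inside the SEARCH pattern : the closing quote must be the same
# character as the opening quote stored in group 1.
same_quote = re.search(r"""(["'])\w+\1""", line)

# Group 1 in the REPLACE part, followed by a literal digit 5.
# \g<1>5 is unambiguous, whereas \15 would be read as group 15.
numbered = re.sub(r'"(\w+)"', r'"\g<1>5"', line)

# \g<0> stands for the overall match.
wrapped = re.sub(r'"\w+"', r'(\g<0>)', line)

print(same_quote.group(0), numbered, wrapped, sep="\n")
```

Note that parentheses need no escaping in a Python replacement string, unlike in the Notepad++ replace field.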
                        
                        ----------------------------------------------------------------------------------------------------------------------------------------------------------
                        
                        Regarding QUANTIFIERS, 6 syntaxes are possible {n} , {n,}, {n,m}, ?, + and *. Note that :
                        
                            - {n}   EXACTLY n       times the character or group, PRECEDING the quantifier
                            - {n,}  n or MORE       times the character or group, PRECEDING the quantifier
                            - {n,m} BETWEEN n and m times the character or group, PRECEDING the quantifier
                        
                            - ? is equivalent to {0,1}
                            - + is equivalent to {1,}
                            - * is equivalent to {0,}
                        
                        They are considered as GREEDY quantifiers because they match as MANY characters as possible
                        
                        If these 6 syntaxes are followed with a QUESTION MARK ?, they are called LAZY quantifiers because they match as FEW characters as possible
                            
                        For instance, given the following sentence :
                                                                                    The licenses for most software are designed to take away your freedom to share and change it
                        - Regex (?-s)e.+?ar, with the LAZY   quantifier +?, matches   ---------------------------
                        - Regex (?-s)e.+ar , with the GREEDY quantifier +,  matches   ---------------------------------------------------------------------------
                        
                        
                        If these 6 syntaxes are followed with an ADDITION sign +, they are called POSSESSIVE quantifiers ( the quantifier counterpart of ATOMIC groups ).

                           - They are quite similar to their GREEDY forms, except that, in case of failure, they don't backtrack to attempt further possible match(es)

                           - Note that this ADVANCED option should be studied once you are well ACQUAINTED with regexes !
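The LAZY versus GREEDY example above can be reproduced with a short Python sketch. Python's `re` module treats +? and + the same way for this case, and its dot does not cross line endings by default, so no (?-s) modifier is needed here. The third form, with a trailing +, is left out of the sketch.

```python
import re

sentence = ("The licenses for most software are designed "
            "to take away your freedom to share and change it")

# Lazy : stops at the FIRST "ar" found after the first "e".
lazy = re.search(r"e.+?ar", sentence).group(0)

# Greedy : runs up to the LAST "ar" of the sentence.
greedy = re.search(r"e.+ar", sentence).group(0)

print(lazy)
print(greedy)
```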
                        
                        ----------------------------------------------------------------------------------------------------------------------------------------------------------
                        
                        BTW, a quick tip to SIMULATE a NORMAL search when the REGULAR EXPRESSION mode is selected : START the search zone with the \Q syntax :
                        
                            For instance, the regex \Q/* This is a C-comment */ will find the LITERAL string  /* This is a C-comment */
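Python's `re` module has no \Q syntax, but `re.escape` gives a comparable result: it escapes every special character of a string so that the string is searched literally. A minimal sketch, with a made-up line of C code as the text to search:

```python
import re

code = "int x = 0; /* This is a C-comment */ x++;"
literal = "/* This is a C-comment */"

# re.escape neutralises the *, / and spaces so they lose any regex meaning.
pattern = re.compile(re.escape(literal))
found = pattern.search(code)

print(found.group(0))
```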
                        
                        

                        Now, I will rewrite my last regex, with your improvement, in the Free-Spacing mode :

                        ----------------------------------------------------------------------------------------------------------------------------------------------------------
                        
                        (?x-is)                #  FREE-SPACING mode, search SENSITIVE to CASE and DOT regex symbol represents a SINGLE STANDARD char
                        (?:                    #  BEGINNING of a NON-CAPTURING group
                        ~~choice               #       Matches the string ~~choice, with this EXACT case
                        |                      #    OR ( ALTERNATION symbol )
                        (?!\A)\G               #       Matches from RIGHT AFTER the location of the LAST match, IF NOT at the VERY BEGINNING of the file
                        )                      #  END of the NON-CAPTURING group
                        .+?                    #  The SMALLEST NON-NULL range of STANDARD characters till...
                        \K                     #  CURRENT match is DISCARDED and working location is RESET to this POINT
                        (?<!\()                #  ONLY if it's NOT PRECEDED with a STARTING parenthesis symbol
                        "[!'.?\w-]+"           #  ... a NON-NULL range of WORD chars or the characters !, ', ., ? and -
                        (?!\))                 #  ONLY if it's NOT FOLLOWED with an ENDING parenthesis symbol
                        
                        
                        NOTES :
                        
                        - This syntax is totally FUNCTIONAL. To be convinced, do a NORMAL selection from (?x-is) to the ENDING parenthesis symbol and hit the Ctrl + F shortcut
                             => This MULTI-LINE regex is AUTOMATICALLY inserted in the 'Search what' zone
                        
                        - The \G assertion means that the NEXT search must start, necessarily, RIGHT AFTER the LAST match !
                        
                        - I rewrote your regex part [A-Za-z \-\.\!\?'] as [!'.?\w-] because most of the punctuation signs do NOT need to be ESCAPED, within a character CLASS.

                            - However note that the DASH - must be found at the VERY BEGINNING or the VERY END of the character class, when NON escaped
                            - I prefer the \w syntax to [A-Za-z] because \w also INCLUDES all the ACCENTED characters of foreign languages ( as well as the DIGITS and the UNDERSCORE )
                            - Note that, unlike your class, [!'.?\w-] does NOT contain the SPACE char : add \x20 to the class if your strings may contain SPACES
                        
                        - You must use ONLY the REPLACE ALL button ( Do NOT click on the REPLACE button for SUCCESSIVE replacements : it won't work due to the \K syntax ! )
                        
                        - If you don't tick the WRAP AROUND option, preferably move the CARET to the VERY BEGINNING of the current file
                        
                        - From BEGINNING of file, as the regex engine must SKIP some LINE-ENDING characters to get a match, the \G assertion is NOT verified
                        
                            and the regex engine must necessarily look, FIRST, for a string ~~choice
                        
                        - Then, from RIGHT AFTER the word choice, it grasps the SMALLEST NON-NULL range of STANDARD chars .+? till a "•••••" structure, but ONLY IF NOT embedded
                              between PARENTHESES itself !
                        
                        - And, due to the \K syntax, ONLY the part "•••••" is the FINAL match desired
                        
                        - This FINAL part is changed with the REPLACE regex \($0\) which just rewrites the string "•••••" between PARENTHESES.
                            The parenthesis symbols must be ESCAPED as they have a SPECIAL meaning in REPLACEMENT

                        - Then, from RIGHT AFTER the closing " char, as the regex CANNOT find any other ~~choice string, the (?!\A)\G.+? part, again, selects the SMALLEST NON-NULL
                            range of STANDARD characters till ANOTHER block "•••••", executes the REPLACEMENT and so on...
                        
                        

                        In the example, below, in each second line ( Regex types ) :

                        • The dot . represents any char, found by the regex dummy part .+?

                        • The bullet • represents any char, found by the regex useful part [!'.?\w-]

                        • The character " and the string ~~choice stand for themselves

                        Text  processed          ~~choice(["Red", ("Blue"), ("Orange"), … ,"Purple"])
                        Regex types              ~~choice.."•••"..........................."••••••"
                        Match number BEFORE \K   1111111111     222222222222222222222222222
                        Match number AFTER  \K             11111                           22222222
                        
                        
                        Text  processed          ~~choice([("Red"), "Blue", "Orange", … ,("Purple")])
                        Regex types              ~~choice..........."••••".."••••••"
                        Match number BEFORE \K   1111111111111111111      22        
                        Match number AFTER  \K                      111111  22222222
                        
                        
                        Text  processed          ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                        Regex types              ~~choice.."•••".."••••".."••••••"....."••••••"
                        Match number BEFORE \K   1111111111     22      33        44444        
                        Match number AFTER  \K             11111  222222  33333333     44444444
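The walk shown in these diagrams can be imitated step by step in Python. The stdlib `re` module supports neither \G nor \K, so this sketch swaps in two stand-ins and should be read as an emulation under those assumptions: `Pattern.match(line, pos)` anchors the attempt at `pos` (the role of \G), and capture group 1 holds what would remain after \K. The function name `trace_matches` is made up for the illustration.

```python
import re

START = re.compile(r"~~choice")

# .+? then, in group 1, a quoted string that is neither preceded by "("
# nor followed by ")". Group 1 plays the role of the text kept after \K.
STEP = re.compile(r'''.+?((?<!\()"[!'.?\w-]+"(?!\)))''')

def trace_matches(line):
    """Return [(start offset, matched text), ...] for one line."""
    hits = []
    first = START.search(line)
    if first is None:
        return hits          # no ~~choice on this line : nothing to do
    pos = first.end()
    while True:
        step = STEP.match(line, pos)   # anchored at pos, like \G
        if step is None:
            break
        hits.append((step.start(1), step.group(1)))
        pos = step.end()               # next attempt starts right after
    return hits

line = '~~choice(["Red", ("Blue"), ("Orange"), ... ,"Purple"])'
for start, text in trace_matches(line):
    print(start, text)
```

For this line the sketch reports "Red" at offset 10 and then "Purple", skipping the two strings that are already between parentheses, which agrees with the first diagram.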
                        

                        I hope that you’ll find this article useful, in any way !

                        However, let me add that the \G and \K assertions, as well as atomic groups and recursive regexes or backtracking verbs ( not discussed ), are difficult notions and I can assure you that there are a LOT of regex things that you need to know before starting to use them !

                        Best Regards,

                        guy038

                        The Community of users of the Notepad++ text editor.