    Regex help: Find/Replace only on lines that include specific words

    • 비공개비 Offline
      비공개
      last edited by

      First, I’m not a native English speaker and the sentences I write might be wrong, so feel free to ask me if you don’t understand my question.

      Ok…
      I have text files with tons of sentences.
      Like
      .
      .
      ~~list([“Apple”, “Banana”, “Orange”])
      ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
      ~~screen(“fruit_image”, _choice[1], )
      ~~action Return([“category”, “fruits”])
      And tens of thousands of other sentences…
      .
      .

      What I want to do is wrap only the “words” inside choice([]) in parentheses, so that each one becomes (“words”)
      Ex) choice([“Red”, “Blue”, “Orange”, … ,“Purple”]) --> choice([(“Red”), (“Blue”), (“Orange”),…,(“Purple”)])

      I need to change only “words” inside the choice([]) and not inside the list([]), screen([]), etc.

      So this is the question I want to ask.

      1. What regular expression should I use to do that?
      I’ve tried some regexes, but I couldn’t match just the parts I wanted… The biggest problem is that I still understand only a little about regular expressions. I’ve been studying recently, but it’s still so hard…😭😭

      2. Is there a way to find/replace only within the bookmarked line?
      If that is possible, I can solve the problem by bookmarking only the lines containing ‘choice’ and replacing (".*?") or (?=.*?\]\))(".*?") with ($1).

      3. Or is there a way to select all search results or to find/replace in the search results?
      Then, I can solve the problem by using the ‘find in selection’ function…

      • Alan KilbornA Offline
        Alan Kilborn @비공개
        last edited by

        @비공개

        The answers to your questions 2 and 3 are “no” and “no”.

        For question number 1, there is a technique presented HERE for solving that. Have a read and see if you can apply that technique.

        • 비공개비 Offline
          비공개
          last edited by 비공개

          Thank you @Alan-Kilborn😀
          I made regex after looking at the linked post, and it ‘almost’ seems to work well…

          Find: (?i-s)(?:choice\(\[|\G).*?\K(?=.*?\]\))(".*?")
          Replaced by: \($1\)

          I think it found every “word” in choice([]), so my problem was solved.
          But there was one exception: it also found the code below.

          screen dropdown_menu(pos=(0, 0), name="", spacing=0, items_offset=(0, 0), background="#00000080", style="empty", iconset=["▾", "▴"]):


          Can you tell me why that code was found as well?
          I want to know what the problem with my regex is.

          • Neil SchipperN Offline
            Neil Schipper @비공개
            last edited by

            @비공개 I have an incomplete solution based on this strategy:

            For each hit we want to match:

            "choice(" followed by..
            any stuff, until you bump into..
            space or comma or left square bracket, followed immediately by..
            NOT "("
            (and issue a match reset) 
            followed by..
            a word inside fancy quotation marks
            (and this last text going into capture group 1)
            

            The search string is: (?<=choice\().*?[ ,\[](?<!\()\K(\“\w+\”)

            (I hope it appears that after the comma there’s ONE backslash before the left square bracket).

            I found this matches what (I think) you want – well, I think it matches because the editor highlights it.

            (Also, the backslashes escaping the fancy quotation marks appear to be optional.)

            The replace text I used: \(\1\)

            Here’s the big problem: the matched text doesn’t get replaced. I need a guru to explain to me why this is so.

            My guess is that the problem is related to the fact that the fancy quotation marks are each three bytes long: E2809C and E2809D.

            Another weakness of my solution is that (if replace worked) it only processes the first text meeting the criteria on a line, so you’d need to run “replace all” a bunch of times. (I think there are ways of overcoming this, resetting and backtracking, but I haven’t looked closely into that.)

            • Neil SchipperN Offline
              Neil Schipper @Neil Schipper
              last edited by

              @Neil-Schipper My suggestion that the failure to replace the captured text is related to the fancy quotation marks is probably wrong because I could easily match-and-capture (“Blue”) and replace it with \(\1\) which gave the expected results.

              So maybe the problem is related to my use of \K.

              • guy038G Offline
                guy038
                last edited by guy038

                Hello, @비공개, @alan-kilborn, @Neil-schipper and All,

                Thanks for trying to get the solution by yourself !

                I’ve already found out a suitable regex S/R for your case ! Try this version and tell me if it avoids the mentioned side-effects !

                SEARCH (?-is)(?:~~choice|(?!\A)\G).+?\K"\w+"

                REPLACE \($0\)

                If OK, I could give you some regex explanations next time !

                Best Regards,

                guy038

                P.S. :

                I supposed that your file contains only regular double quotes " and not the “ and ” characters, of Unicode value \x{201C} and \x{201D}, which are automatically displayed in our forum !

                • Neil SchipperN Offline
                  Neil Schipper @guy038
                  last edited by

                  @guy038 It didn’t work for me. I ran it on this test text:

                  ~~list([“Apple”, “Banana”, “Orange”])
                  ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
                  ~~choice([“Red”, (“Blue”), (“Orange”),…,(“Purple”)])
                  ~~choice([(“Red”), “Blue”, (“Orange”),…,(“Purple”)])
                  ~~choice([(“Red”), (“Blue”), “Orange”,…,(“Purple”)])
                  ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
                  ~~screen(“fruit_image”, _choice[1], )
                  ~~action Return([“category”, “fruits”])
                  choice([(“Red”), (“Blue”), (“Orange”),…,(“Purple”)])
                  

                  and after seeing your P.S. I converted the fancy qm’s to standard ascii:

                  ~~list(["Apple", "Banana", "Orange"])
                  ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                  ~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])
                  ~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])
                  ~~choice([("Red"), ("Blue"), "Orange",…,("Purple")])
                  ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                  ~~screen("fruit_image", _choice[1], )
                  ~~action Return(["category", "fruits"])
                  choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                  

                  but I still get no matches.

                  I didn’t analyze your search string, and I have no doubt it’s based on sound principles.

                  I am amazed to learn about the quotation marks getting altered. That’s another “gotcha” that warrants documentation in an easily found location! (Maybe it’s a feature that can be disabled.)

                  Also the codes for the qm’s you state are different from mine. I got mine (lazily) by running a conversion using the Converter plug-in, which I have not vetted for byte-level correctness against standard character tables. Yet another trap for the unwary?
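The two sets of codes describe the same characters: \x{201C} and \x{201D} are the Unicode code points, while E2809C and E2809D are the UTF-8 byte sequences of those code points. A minimal Python sketch to check this, and to convert curly quotes to straight ones, follows. It is independent of Notepad++ and of the Converter plug-in, and the helper name `straighten_quotes` is made up for illustration.

```python
# Code points of the curly double quotes discussed above
LEFT, RIGHT = "\u201c", "\u201d"

# Encoded as UTF-8, each one takes three bytes
left_bytes = LEFT.encode("utf-8")
right_bytes = RIGHT.encode("utf-8")

def straighten_quotes(text):
    """Replace curly double quotes with the ASCII double quote (hypothetical helper)."""
    return text.translate({0x201C: '"', 0x201D: '"'})

print(left_bytes.hex().upper(), right_bytes.hex().upper())
print(straighten_quotes("~~choice([\u201cRed\u201d, \u201cBlue\u201d])"))
```

Running something like this on a copy of the text before testing a regex avoids the mismatch between what the forum displays and what the file contains.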

                  • guy038G Offline
                    guy038
                    last edited by

                    Hello, @비공개, @alan-kilborn, @Neil-schipper and All,

                    Ah…OK, Neil. So I improved my regex S/R in order that it will not process anything if the double quotes are already preceded and followed with parentheses !

                    Here is the new version :

                    SEARCH (?-is)(?:~~choice|(?!\A)\G).+?\K(?<!\()"\w+"(?!\))

                    REPLACE \($0\)

                    Taking your INPUT text into account :

                    ~~list(["Apple", "Banana", "Orange"])
                    ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                    ~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])
                    ~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])
                    ~~choice([("Red"), ("Blue"), "Orange",…,("Purple")])
                    ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                    ~~screen("fruit_image", _choice[1], )
                    ~~action Return(["category", "fruits"])
                    choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                    

                    It correctly changes it as below :

                    ~~list(["Apple", "Banana", "Orange"])
                    ~~choice([("Red"), ("Blue"), ("Orange"), … ,("Purple")])
                    ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                    ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                    ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                    ~~choice([("Red"), ("Blue"), ("Orange"), … ,("Purple")])
                    ~~screen("fruit_image", _choice[1], )
                    ~~action Return(["category", "fruits"])
                    choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                    

                    Notes :

                    • You must use only the Replace All button ( Do NOT click on the Replace button for successive replacements : it won’t work due to the \K syntax ! )

                    • If you don’t tick the Wrap around option, move preferably the caret at the very beginning of current file

                    • This new version avoids the formation of forms such as ((((("text"))))), if you’re trying to execute this regex S/R several times !
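For anyone who wants to check the expected result outside Notepad++, here is a rough Python sketch of the same transformation. Python's standard `re` module has no `\G` or `\K`, so this is a swapped-in two-step approach rather than the regex above: first pick the lines containing `~~choice`, then wrap the bare quoted words found after that marker. The function name `wrap_choice_words` is an assumption for illustration, and edge cases (for instance a quote placed immediately after `~~choice`) may behave slightly differently from the Notepad++ search.

```python
import re

# A quoted word that is not already wrapped in parentheses
QUOTED = re.compile(r'(?<!\()"\w+"(?!\))')
# Any whole line that contains the ~~choice marker
CHOICE_LINE = re.compile(r"^.*~~choice.*$", re.MULTILINE)

def wrap_choice_words(text):
    """Wrap bare "word" items in parentheses, only after ~~choice on a line."""
    def fix_line(match):
        head, marker, tail = match.group(0).partition("~~choice")
        wrapped = QUOTED.sub(lambda q: "(" + q.group(0) + ")", tail)
        return head + marker + wrapped
    return CHOICE_LINE.sub(fix_line, text)

sample = "\n".join([
    '~~list(["Apple", "Banana", "Orange"])',
    '~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])',
    '~~screen("fruit_image", _choice[1], )',
])
print(wrap_choice_words(sample))
```

Because the lookarounds skip items that already have parentheses, running the function twice gives the same text as running it once.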

                    BR

                    guy038

                    • Neil SchipperN Offline
                      Neil Schipper @guy038
                      last edited by

                      @guy038 Following your instructions, this works exactly as you say.

                      Furthermore, I see now that your earlier search string also works with Replace All (which I hadn’t tried) but adds the unwanted extra sets of ().

                      Furthermore, I also see now that my original search string (my first post in this thread) also works with Replace All (which I also hadn’t tried) but with the requirement for successive runs to get the whole job done as I had stated.

                      It appears there’s something about \K that I don’t understand, unless it’s something not fully described in the docs but that was discovered by trial and error.

                      I do consider it a weakness (of both of our search strings) that single replaces don’t work.

                      • 비공개비 Offline
                        비공개
                        last edited by

                        So many answers arrived while I was sleeping. Thank you so much @guy038 ,@Neil-Schipper, @Alan-Kilborn and all!

                        OK, so @guy038, as soon as I woke up, I tried your method and it solved my problem perfectly!😀😀😀

                        The “words” I’m looking for aren’t made up only of “word characters”, so I just changed \w to [A-Za-z \-\.\!\?']. The example I gave was not appropriate. I’m sorry.

                        And please! I need your explanation.
                        Especially, I’m not sure what (?:choice|(?!\A)\G) and .+?\K" mean.

                        • guy038G Offline
                          guy038
                          last edited by guy038

                          Hi, @비공개, @alan-kilborn, @Neil-schipper and All,

                          OK, @비공개, I’m going to give some pieces of information but, as always :

                          • You have to know how to make cement before you can put two bricks together

                          • You must know how to put two bricks together before building a wall

                          • You must know how to build a wall before building a room

                          • You must know how to build a room before building a house

                          and so on !

                          In other words, check this FAQ which gives you the main links to learn regular expressions, from top to bottom ;-))

                          Now, let’s go :

                          ----------------------------------------------------------------------------------------------------------------------------------------------------------
                          
                          Regarding MODIFIERS, generally met at BEGINNING of the regex, but which may occur at ANY location within the overall regex :
                          
                          (?-i)   From this point, search      care      about letter's CASE
                          
                          (?i)    From this point, search  does NOT care about letter's CASE
                          
                          	      
                          (?-s)   From this point, any regex dot symbol represents a SINGLE STANDARD character. So the . is UNICODE equivalent to the NEGATIVE class character
                                        [^\r\n\f\x{0085}\x{2028}\x{2029}] for an Unicode encoded file and equivalent to [^\r\n\f] for an ANSI encoded file
                          	      
                          (?s)    From this point, any regex DOT symbol represents ABSOLUTELY ANY character, included all the LINE-ENDING chars
                          	      
                          
                          (?-x)   From this point, any LITERAL SPACE character is SIGNIFICANT and is part of the overall regex ( IMPLICIT in a N++ regex )
                          
                          (?x)    From this point, any LITERAL SPACE character is IGNORED  and just helps READABILITY of the overall regex.
                                       This mode is called FREE-SPACING mode and can SPLIT in SEVERAL lines. In this  mode :
                          
                                       - Any SPACE char must be written [ ] or \x20  or escaped with a \ character
                                       - Any text, after a # symbol, will be considered as COMMENTS
                                       - Any literal # symbol must be written [#] or \x23 or escaped as \#
                          
                          	      
                          (?-m)   From this point :
                                       - The regex symbol ^ represents only  the VERY BEGINNING of the file, so equivalent to the regex \A
                                       - The regex symbol $ represents only  the VERY END       of the file, so equivalent to the regex \z
                          	      
                          (?m)    From this point, the assertions ^ and $ represent their USUAL signification of START and END of line locations ( IMPLICIT in a N++ regex )
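The same modifiers can be tried quickly in Python, as a rough sketch only. Python's `re` module is a different engine from the one in Notepad++, so the defaults differ: in particular, `^` and `$` are NOT line-aware unless `(?m)` is given, and recent Python versions only accept these global modifiers at the very start of the pattern. The sample text is invented for illustration.

```python
import re

text = "Choice one\nchoice two"

# Default is case-sensitive; (?i) ignores letter case
case_sensitive = re.findall(r"choice", text)
case_insensitive = re.findall(r"(?i)choice", text)

# By default the dot stops at a line break; (?s) lets it cross one
dot_single_line = re.search(r"one.choice", text)
dot_all = re.search(r"(?s)one.choice", text)

# Unlike Notepad++, Python needs (?m) for ^ to match at each line start
line_starts = re.findall(r"(?mi)^choice", text)

# (?x): layout spaces and # comments in the pattern are ignored
verbose = re.search(r"""(?x)
    choice   # the literal word
    [ ]      # a real space has to be written explicitly
    two
""", text)

print(case_sensitive, case_insensitive, line_starts, verbose.group(0))
```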
                          
                          ----------------------------------------------------------------------------------------------------------------------------------------------------------
                          
                          Regarding GROUPS :
                          
                          (•••••)    It defines a CAPTURING group which allows, both :
                          
                                         - The regex engine to STORE the regex ENCLOSED part for FURTHER use, either in the SEARCH and/or the REPLACE part
                          
                                         - The regex ENCLOSED part to be possibly REPEATED with a  QUANTIFIER, located right after
                          
                          (?:•••••)  It defines a NON-CAPTURING group which only allows the regex ENCLOSED part to be REPEATED and which is **not** stored by the regex engine
                          
                          Note that the MODIFIERS, described above, may be INCLUDED within the parentheses :
                          
                                       - In a CAPTURING group as, for instance, ((?i)•••••) so that the INSENSITIVE search is RESTRICTED to the contents of this group, only
                          
                                       - In a NON-CAPTURING group, TWO syntaxes are possible : for instance : (?:(?i)•••••) or the shorthand (?i:•••••)
                          
                          
                          CAPTURING groups can be RE-USED with the syntax :
                          
                              - \1   to \9     in the SEARCH and/or REPLACE regexes   for reference to group 1 to  9
                              - $1   to $99    in the REPLACE regex ONLY              for reference to group 1 to 99
                              - ${1} to ${99}  in the REPLACE regex ONLY              for reference to group 1 to 99
                          
                                  For instance, the ${1}5 syntax means contents of GROUP 1, followed with digit 5, whereas the $15 syntax would have meant contents of GROUP 15

                              - $0 or ${0}     in the REPLACE regex ONLY              for reference to the OVERALL match of the SEARCH regex
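As a hedged illustration of the group ideas above, here is a small Python sketch. Note that Python's replacement syntax is not the same as Notepad++'s: it uses `\1` and `\g<1>` where Notepad++ accepts `$1` and `${1}`, so `\g<1>5` below plays the role of `${1}5`. The sample strings are invented.

```python
import re

# \1 inside the SEARCH pattern refers back to what group 1 captured
doubled = re.findall(r"(\w)\1", "committee")

# Group 1 reused in the replacement: wrap a quoted word in parentheses
wrapped = re.sub(r'"(\w+)"', r'("\1")', 'x = "Red"')

# \g<1>5 means "group 1, then the digit 5" (Python's analogue of ${1}5)
with_digit = re.sub(r"(ab)", r"\g<1>5", "ab")

# A modifier scoped to one non-capturing group: only "choice" ignores case
scoped_hit = re.search(r"(?i:choice)\(", "CHOICE(")
scoped_miss = re.search(r"(?i:choice)X", "CHOICEx")

print(doubled, wrapped, with_digit)
```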
                          
                          ----------------------------------------------------------------------------------------------------------------------------------------------------------
                          
                          Regarding QUANTIFIERS, 6 syntaxes are possible {n} , {n,}, {n,m}, ?, + and *. Note that :
                          
                              - {n}   EXACTLY n       times the character or group, PRECEDING the quantifier
                              - {n,}  n or MORE       times the character or group, PRECEDING the quantifier
                              - {n,m} BETWEEN n and m times the character or group, PRECEDING the quantifier
                          
                              - ? is equivalent to {0,1}
                              - + is equivalent to {1,}
                              - * is equivalent to {0,}
                          
                          They are considered as GREEDY quantifiers because they match as MANY characters as possible
                          
                          If these 6 syntaxes are followed with a QUESTION MARK ?, they are called LAZY quantifiers because they match as FEW characters as possible
                              
                          For instance, given the following sentence :
                                                                                      The licenses for most software are designed to take away your freedom to share and change it
                          - Regex (?-s)e.+?ar, with the LAZY   quantifier +?, matches   ---------------------------
                          - Regex (?-s)e.+ar , with the GREEDY quantifier +,  matches   ---------------------------------------------------------------------------
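The same lazy versus greedy comparison can be reproduced with a few lines of Python, shown here only as a sketch for experimenting (the dot does not match a line break by default in Python, so no modifier is needed):

```python
import re

sentence = ("The licenses for most software are designed to take away "
            "your freedom to share and change it")

# Lazy: stop at the FIRST "ar" that can complete the match
lazy = re.search(r"e.+?ar", sentence).group(0)
# Greedy: run on to the LAST "ar" in the line
greedy = re.search(r"e.+ar", sentence).group(0)

print(lazy)
print(greedy)
```

Both matches start at the e of "The"; the lazy one ends inside "software" and the greedy one ends inside "share".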
                          
                          
                          If these 6 syntaxes are followed with an ADDITION sign +, they are called POSSESSIVE ( or ATOMIC ) quantifiers.

                             - They are quite similar to their GREEDY forms, except that, in case of failure, they don't backtrack to attempt further possible match(es)
                          
                             - Note that this ADVANCED option should be studied when you'll be rather ACQUAINTED with regexes !
                          
                          ----------------------------------------------------------------------------------------------------------------------------------------------------------
                          
                          BTW, a quick tip to SIMULATE a NORMAL search when the REGULAR EXPRESSION mode is selected : START the search zone with the \Q syntax :
                          
                              For instance, the regex \Q/* This is a C-comment */ will find the LITERAL string  /* This is a C-comment */
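Python's `re` module has no `\Q` syntax, but `re.escape` serves a similar purpose when building a pattern from literal text. A small sketch, with an invented sample string:

```python
import re

literal = "/* This is a C-comment */"
source = "int x; /* This is a C-comment */ int y;"

# Used as-is, the * characters act as quantifiers and the search fails
unescaped = re.search(literal, source)

# re.escape backslash-escapes the special characters so the text is literal
found = re.search(re.escape(literal), source)

print(unescaped)
print(found.group(0))
```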
                          
                          

                          Now, I will rewrite my last regex, with your improvement, in the Free-Spacing mode :

                          ----------------------------------------------------------------------------------------------------------------------------------------------------------
                          
                          (?x-is)                #  FREE-SPACING mode, search SENSITIVE to CASE and DOT regex symbol represents a SINGLE STANDARD char
                          (?:                    #  BEGINNING of a NON-CAPTURING group
                          ~~choice               #       Matches the string ~~choice, with this EXACT case
                          |                      #    OR ( ALTERNATION symbol )
                          (?!\A)\G               #       Matches from RIGHT AFTER the location of the LAST match, IF NOT at the VERY BEGINNING of the file
                          )                      #  END of the NON-CAPTURING group
                          .+?                    #  The SMALLEST NON-NULL range of STANDARD characters till...
                          \K                     #  CURRENT match is DISCARDED and working location is RESET to this POINT
                          (?<!\()                #  ONLY if it's NOT PRECEDED with a STARTING parenthesis symbol
                          "[!'.?\w-]+"           #  ... a NON-NULL range of WORD chars or the characters !, ', ., ? and -
                          (?!\))                 #  ONLY if it's NOT FOLLOWED with an ENDING parenthesis symbol
                          
                          
                          NOTES :
                          
                          - This syntax is totally FUNCTIONAL. To be convinced do a NORMAL selection from (?x-is) to ENDING parenthesis symbol and hit the Ctrl + F shortcut
                               => This MULTI- lines regex is AUTOMATICALLY inserted in the 'Search what' zone 
                          
                          - The \G assertion means that the NEXT search must start, necessarily, RIGHT AFTER the LAST match !
                          
                          - I rewrote your regex part [A-Za-z \-\.\!\?'] as [!'.?\w-] because most of the punctuation signs do NOT need to be ESCAPED, within a CHARACTER class.

                              - However note that the DASH - must be found at the VERY BEGINNING or the VERY END of the character class, when NON escaped
                              - I prefer the \w syntax to [A-Za-z] because \w also INCLUDES all the ACCENTED characters of foreign languages ( as well as DIGITS and the UNDERSCORE )
                          
                          - You must use ONLY the REPLACE ALL button ( Do NOT click on the REPLACE button for SUCCESSIVE replacements : it won't work due to the \K syntax ! )
                          
                          - If you don't tick the WRAP AROUND option, move preferably the CARET at the VERY BEGINNING of current file
                          
                          - From BEGINNING of file, as the regex engine must SKIP some LINE-ENDING characters to get a match, the \G assertion is NOT verified
                          
                              and the regex engine must necessarily look, FIRST, for a string ~~choice
                          
                          - Then, from RIGHT AFTER the word choice, it grasps the SMALLEST NON-NULL range of STANDARD chars .+? till a "•••••" structure, but ONLY IF NOT embedded
                                between PARENTHESES itself !
                          
                          - And, due to the \K syntax, ONLY the part "•••••" is the FINAL match desired
                          
                          - This FINAL part is changed with the REPLACE regex \($0\) which just rewrites the string "•••••" between PARENTHESES.
                              The parenthesis symbols must be ESCAPED as they have a SPECIAL meaning in REPLACEMENT
                          
                          - Then, from RIGHT AFTER the closing " char, as the regex CANNOT find any other ~~choice string, the (?!\A)\G.+? part, again, selects the SMALLEST NON-NULL
                              range of STANDARD characters till an OTHER block "•••••", execute the REPLACEMENT and so on...
                          
                          

                          In the example, below, in each second line ( Regex types ) :

                          • The dot . represents any char, found by the regex dummy part .+?

                          • The bullet • represents any char, found by the regex useful part [!'.?\w-]

                          • The character " and the string ~~choice stand for themselves

                          Text  processed          ~~choice(["Red", ("Blue"), ("Orange"), … ,"Purple"])
                          Regex types              ~~choice.."•••"..........................."••••••"
                          Match number BEFORE \K   1111111111     222222222222222222222222222
                          Match number AFTER  \K             11111                           22222222
                          
                          
                          Text  processed          ~~choice([("Red"), "Blue", "Orange", … ,("Purple")])
                          Regex types              ~~choice..........."••••".."••••••"
                          Match number BEFORE \K   1111111111111111111      22        
                          Match number AFTER  \K                      111111  22222222
                          
                          
                          Text  processed          ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                          Regex types              ~~choice.."•••".."••••".."••••••"....."••••••"
                          Match number BEFORE \K   1111111111     22      33        44444        
                          Match number AFTER  \K             11111  222222  33333333     44444444
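The match positions in these diagrams can be double-checked with a short Python sketch. Since Python's standard `re` module supports neither `\G` nor `\K`, this does not run the regex above: it emulates it by searching only the text after `~~choice` on a line, using the same lookarounds and character class written in verbose mode. The function name `choice_matches` is hypothetical.

```python
import re

# Same final part as the free-spacing regex, without \G and \K
QUOTED = re.compile(r"""
    (?<!\()          # not preceded by an opening parenthesis
    "[!'.?\w-]+"     # quoted run of word chars or ! ' . ? -
    (?!\))           # not followed by a closing parenthesis
""", re.VERBOSE)

def choice_matches(line):
    """Return (start offset, matched text) pairs found after ~~choice."""
    head, marker, tail = line.partition("~~choice")
    if not marker:
        return []  # no ~~choice on this line, so nothing may match
    offset = len(head) + len(marker)
    return [(m.start() + offset, m.group(0)) for m in QUOTED.finditer(tail)]

line1 = '~~choice(["Red", ("Blue"), ("Orange"), … ,"Purple"])'
line2 = '~~choice([("Red"), "Blue", "Orange", … ,("Purple")])'
line3 = '~~choice(["Red", "Blue", "Orange", … ,"Purple"])'
for line in (line1, line2, line3):
    print(choice_matches(line))
```

The start offsets correspond to the columns where the "AFTER \K" rows begin in the diagrams.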
                          

                          I hope that you’ll find this article useful, in any way !

                          However, let me add that the \G and \K assertions, as well as atomic groups and recursive regexes or backtracking verbs ( not discussed ), are difficult notions and I can assure you that there are a LOT of regex things that you need to know before starting to use them !

                          Best Regards,

                          guy038

                          The Community of users of the Notepad++ text editor.