    Regex help: Find/Replace only on lines that include specific words

    • 비공개

      First, I’m not a native English speaker, so my sentences may contain mistakes; feel free to ask if anything in my question is unclear.

      Ok…
      I have text files with tons of sentences.
      Like
      .
      .
      ~~list([“Apple”, “Banana”, “Orange”])
      ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
      ~~screen(“fruit_image”, _choice[1], )
      ~~action Return([“category”, “fruits”])
      And tens of thousands of other sentences…
      .
      .

      What I want to do is wrap each quoted “word” inside choice([]) in parentheses, so that “word” becomes (“word”).
      Ex) choice([“Red”, “Blue”, “Orange”, … ,“Purple”]) --> choice([(“Red”), (“Blue”), (“Orange”),…,(“Purple”)])

      I need to change only the “words” inside choice([]), not those inside list([]), screen([]), etc.

      So here are my questions.

      1. What regular expression should I use to do that?
      I’ve tried a few regexes, but none of them matched only the parts I wanted… The biggest problem is that I still understand only a little about regular expressions. I’ve been studying them recently, but they are still so hard…😭😭

      2. Is there a way to find/replace only within bookmarked lines?
      If that is possible, I can solve the problem by bookmarking only the lines containing ‘choice’ and replacing (".*?") or (?=.*?\]\))(".*?") with \($1\).

      3. Or is there a way to select all search results, or to find/replace within the search results?
      Then I could solve the problem by using the ‘find in selection’ function…

      • Alan Kilborn @비공개

        @비공개

        The answers to your questions 2 and 3 are “no” and “no”.

        For question number 1, there is a technique presented HERE for solving that. Have a read and see if you can apply that technique.

        • 비공개

          Thank you @Alan-Kilborn😀
          I wrote a regex after reading the linked post, and it ‘almost’ works…

          Find: (?i-s)(?:choice\(\[|\G).*?\K(?=.*?\]\))(".*?")
          Replaced by: \($1\)

          I think it found every quoted “word” in choice([]), so my problem was solved.
          But there was one exception: it also matched the line below.

          screen dropdown_menu(pos=(0, 0), name="", spacing=0, items_offset=(0, 0), background="#00000080", style="empty", iconset=["▾", "▴"]):

          Can you tell me why that line was matched as well?
          I’d like to know what is wrong with the regex I made.

          • Neil Schipper @비공개

            @비공개 I have an incomplete solution based on this strategy:

            For each hit we want to match:

            "choice(" followed by..
            any stuff, until you bump into..
            space or comma or left square bracket, followed immediately by..
            NOT "("
            (and issue a match reset) 
            followed by..
            a word inside fancy quotation marks
            (and this last text going into capture group 1)
            

            The search string is: (?<=choice\().*?[ ,\[](?<!\()\K(\“\w+\”)

            (I hope it displays with ONE backslash before the left square bracket that follows the comma.)

            I found this matches what (I think) you want – well, I think it matches because the editor highlights it.

            (Also, the backslashes escaping the fancy quotation marks appear to be optional.)

            The replace text I used: \(\1\)

            Here’s the big problem: the matched text doesn’t get replaced. I need a guru to explain to me why this is so.

            My guess is that the problem is related to the fact that the fancy quotation marks are each three bytes long: E2809C and E2809D.

            Another weakness of my solution is that (if replace worked) it only processes the first text meeting the criteria on a line, so you’d need to run “replace all” a bunch of times. (I think there are ways of overcoming this, resetting and backtracking, but I haven’t looked closely into that.)

            • Neil Schipper @Neil Schipper

              @Neil-Schipper My suggestion that the failure to replace the captured text is related to the fancy quotation marks is probably wrong because I could easily match-and-capture (“Blue”) and replace it with \(\1\) which gave the expected results.

              So maybe the problem is related to my use of \K.

              • guy038

                Hello, @비공개, @alan-kilborn, @Neil-schipper and All,

                Thanks for trying to get the solution by yourself !

                I’ve already found out a suitable regex S/R for your case ! Try this version and tell me if it avoids the mentioned side-effects !

                SEARCH (?-is)(?:~~choice|(?!\A)\G).+?\K"\w+"

                REPLACE \($0\)

                If OK, I can give you some regex explanations next time !

                Best Regards,

                guy038

                P.S. :

                I assumed that your file contains only regular double quotes " and not the “ and ” characters, of Unicode value \x{201C} and \x{201D}, which our forum automatically displays in their place !

                • Neil Schipper @guy038

                  @guy038 It didn’t work for me. I ran it on this test text:

                  ~~list([“Apple”, “Banana”, “Orange”])
                  ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
                  ~~choice([“Red”, (“Blue”), (“Orange”),…,(“Purple”)])
                  ~~choice([(“Red”), “Blue”, (“Orange”),…,(“Purple”)])
                  ~~choice([(“Red”), (“Blue”), “Orange”,…,(“Purple”)])
                  ~~choice([“Red”, “Blue”, “Orange”, … ,“Purple”])
                  ~~screen(“fruit_image”, _choice[1], )
                  ~~action Return([“category”, “fruits”])
                  choice([(“Red”), (“Blue”), (“Orange”),…,(“Purple”)])
                  

                  and after seeing your P.S. I converted the fancy qm’s to standard ascii:

                  ~~list(["Apple", "Banana", "Orange"])
                  ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                  ~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])
                  ~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])
                  ~~choice([("Red"), ("Blue"), "Orange",…,("Purple")])
                  ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                  ~~screen("fruit_image", _choice[1], )
                  ~~action Return(["category", "fruits"])
                  choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                  

                  but I still get no matches.

                  I didn’t analyze your search string, and I have no doubt it’s based on sound principles.

                  I am amazed to learn about the quotation marks getting altered. That’s another “gotcha” that warrants documentation in an easily found location! (Maybe it’s a feature that can be disabled.)

                  Also the codes for the qm’s you state are different from mine. I got mine (lazily) by running a conversion using the Converter plug-in, which I have not vetted for byte-level correctness against standard character tables. Yet another trap for the unwary?

                  • guy038

                    Hello, @비공개, @alan-kilborn, @Neil-schipper and All,

                    Ah… OK, Neil. So I improved my regex S/R so that it skips any double-quoted word which is already enclosed in parentheses !

                    Here is the new version :

                    SEARCH (?-is)(?:~~choice|(?!\A)\G).+?\K(?<!\()"\w+"(?!\))

                    REPLACE \($0\)

                    Given your INPUT text :

                    ~~list(["Apple", "Banana", "Orange"])
                    ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                    ~~choice(["Red", ("Blue"), ("Orange"),…,("Purple")])
                    ~~choice([("Red"), "Blue", ("Orange"),…,("Purple")])
                    ~~choice([("Red"), ("Blue"), "Orange",…,("Purple")])
                    ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                    ~~screen("fruit_image", _choice[1], )
                    ~~action Return(["category", "fruits"])
                    choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                    

                    It correctly changes it as below :

                    ~~list(["Apple", "Banana", "Orange"])
                    ~~choice([("Red"), ("Blue"), ("Orange"), … ,("Purple")])
                    ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                    ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                    ~~choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                    ~~choice([("Red"), ("Blue"), ("Orange"), … ,("Purple")])
                    ~~screen("fruit_image", _choice[1], )
                    ~~action Return(["category", "fruits"])
                    choice([("Red"), ("Blue"), ("Orange"),…,("Purple")])
                    

                    Notes :

                    • You must use only the Replace All button ( Do NOT click on the Replace button for successive replacements : it won’t work due to the \K syntax ! )

                    • If you don’t tick the Wrap around option, preferably move the caret to the very beginning of the current file

                    • This new version avoids producing forms such as ((((("text"))))), if you run this regex S/R several times !
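
For readers who want to try the same replacement outside Notepad++, here is a minimal Python sketch. Python's standard re module supports neither \G nor \K, so this is not a port of the regex above. It assumes that the intended target is "every bare double-quoted word that follows ~~choice on the same line" and reaches it with a nested substitution instead. The names CHOICE_TAIL, BARE_WORD and wrap_choice_words are made up for this illustration.

```python
import re

# Assumption: everything from "~~choice" to the end of the line is in scope.
# The dot does not match a newline by default, so this stays on one line.
CHOICE_TAIL = re.compile(r"~~choice.*")

# A double-quoted word not preceded by "(" and not followed by ")".
BARE_WORD = re.compile(r'(?<!\()"\w+"(?!\))')


def wrap_choice_words(text: str) -> str:
    """Wrap bare quoted words in parentheses, but only after ~~choice."""

    def fix_tail(match):
        # Second pass, restricted to the text matched by CHOICE_TAIL.
        return BARE_WORD.sub(lambda w: "(" + w.group(0) + ")", match.group(0))

    return CHOICE_TAIL.sub(fix_tail, text)


if __name__ == "__main__":
    print(wrap_choice_words('~~choice(["Red", ("Blue"), "Orange"])'))
```

Unlike the .+? part of the Notepad++ regex, this sketch does not require at least one character between ~~choice and the first quote, so the two can differ on unusual input. Running it a second time on its own output should change nothing, since the lookarounds skip words that are already wrapped.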

                    BR

                    guy038

                    • Neil Schipper @guy038

                      @guy038 Following your instructions, this works exactly as you say.

                      Furthermore, I see now that your earlier search string also works with Replace All (which I hadn’t tried) but adds the unwanted extra sets of ().

                      Furthermore, I also see now that my original search string (my first post in this thread) also works with Replace All (which I also hadn’t tried) but with the requirement for successive runs to get the whole job done as I had stated.

                      It appears there’s something about \K that I don’t understand, unless it’s something not fully described in the docs but that was discovered by trial and error.

                      I do consider it a weakness (of both of our search strings) that single replaces don’t work.

                      • 비공개

                        So many answers arrived while I was sleeping. Thank you so much @guy038, @Neil-Schipper, @Alan-Kilborn and all!

                        OK, so @guy038, as soon as I woke up, I tried your method and it solved my problem perfectly!😀😀😀

                        The “words” I’m looking for aren’t made up only of “word characters”, so I just changed \w to [A-Za-z \-\.\!\?']. The example I gave was not representative. I’m sorry.

                        And please, I would appreciate an explanation!
                        Especially, I’m not sure what (?:choice|(?!\A)\G) and .+?\K" mean.

                        • guy038

                          Hi, @비공개, @alan-kilborn, @Neil-schipper and All,

                          OK, @비공개, I’m going to give some pieces of information but, as always :

                          • You have to know how to make cement before you can put two bricks together

                          • You must know how to put two bricks together before building a wall

                          • You must know how to build a wall before building a room

                          • You must know how to build a room before building a house

                          and so on !

                          In other words, check this FAQ which gives you the main links to learn regular expressions, from top to bottom ;-))

                          Now, let’s go :

                          ----------------------------------------------------------------------------------------------------------------------------------------------------------
                          
                          Regarding MODIFIERS, generally found at the BEGINNING of the regex, but which may occur at ANY location within the overall regex :
                          
                          (?-i)   From this point, search   DOES   care about letter CASE
                          
                          (?i)    From this point, search does NOT care about letter CASE
                          
                          	      
                          (?-s)   From this point, any regex DOT symbol represents a SINGLE STANDARD character. So the . is equivalent to the NEGATIVE character class
                                        [^\r\n\f\x{0085}\x{2028}\x{2029}] for a Unicode encoded file and to [^\r\n\f] for an ANSI encoded file
                          
                          (?s)    From this point, any regex DOT symbol represents ABSOLUTELY ANY character, INCLUDING all the LINE-ENDING chars
                          	      
                          
                          (?-x)   From this point, any LITERAL SPACE character is SIGNIFICANT and is part of the overall regex ( IMPLICIT in a N++ regex )
                          
                          (?x)    From this point, any LITERAL SPACE character is IGNORED  and just helps READABILITY of the overall regex.
                                       This mode is called FREE-SPACING mode and the regex can be SPLIT over SEVERAL lines. In this mode :
                          
                                       - Any SPACE char must be written [ ] or \x20  or escaped with a \ character
                                       - Any text, after a # symbol, will be considered as COMMENTS
                                       - Any literal # symbol must be written [#] or \x23 or escaped as \#
                          
                          	      
                          (?-m)   From this point :
                                       - The regex symbol ^ represents only  the VERY BEGINNING of the file, so equivalent to the regex \A
                                       - The regex symbol $ represents only  the VERY END       of the file, so equivalent to the regex \z
                          	      
                          (?m)    From this point, the assertions ^ and $ take their USUAL meaning of START and END of line locations ( IMPLICIT in a N++ regex )
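
To experiment with these modifiers outside Notepad++, the sketch below uses Python's standard re module, which accepts similar inline flags. It is only an illustration under that assumption: Python's defaults are not the same as Notepad++'s (in particular, ^ and $ are NOT line-based by default in Python), and modifier_demo is a hypothetical helper name.

```python
import re

TEXT = "Red\nBLUE\n"


def modifier_demo() -> dict:
    """Collect the matches obtained with and without each inline modifier."""
    return {
        # Case: sensitive by default, insensitive inside (?i:...)
        "case_sensitive": re.findall(r"blue", TEXT),
        "case_insensitive": re.findall(r"(?i:blue)", TEXT),
        # Dot: does not match a line ending unless the s modifier is on
        "dot_standard": re.findall(r"Red.", TEXT),
        "dot_any_char": re.findall(r"(?s:Red.)", TEXT),
        # Anchors: whole-text by default in Python, per-line with (?m)
        "anchors_whole_text": re.findall(r"^\w+$", TEXT),
        "anchors_per_line": re.findall(r"(?m)^\w+$", TEXT),
        # Free-spacing: spaces and # comments inside the regex are ignored
        "free_spacing": re.findall(r'''(?x)
            "      # opening double quote
            \w+    # one or more word characters
            "      # closing double quote
        ''', 'choice(["Red", "Blue"])'),
    }


if __name__ == "__main__":
    for name, matches in modifier_demo().items():
        print(name, matches)
```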
                          
                          ----------------------------------------------------------------------------------------------------------------------------------------------------------
                          
                          Regarding GROUPS :
                          
                          (•••••)    It defines a CAPTURING group which allows, both :
                          
                                         - The regex engine to STORE the regex ENCLOSED part for FURTHER use, either in the SEARCH and/or the REPLACE part
                          
                                         - The regex ENCLOSED part to be possibly REPEATED with a  QUANTIFIER, located right after
                          
                          (?:•••••)  It defines a NON-CAPTURING group which only allows the regex ENCLOSED part to be REPEATED and which is NOT stored by the regex engine
                          
                          Note that the MODIFIERS, described above, may be INCLUDED within the parentheses :
                          
                                       - In a CAPTURING group as, for instance, ((?i)•••••) so that the INSENSITIVE search is RESTRICTED to the contents of this group, only
                          
                                       - In a NON-CAPTURING group, TWO syntaxes are possible : for instance : (?:(?i)•••••) or the shorthand (?i:•••••)
                          
                          
                          CAPTURING groups can be RE-USED with the syntax :
                          
                              - \1   to \9     in the SEARCH and/or REPLACE regexes   for reference to group 1 to  9
                              - $1   to $99    in the REPLACE regex ONLY              for reference to group 1 to 99
                              - ${1} to ${99}  in the REPLACE regex ONLY              for reference to group 1 to 99
                          
                                  For instance, the ${1}5 syntax means contents of GROUP 1, followed with digit 5, whereas the $15 syntax would have meant contents of GROUP 15
                          
                              - $0 or ${0}     in the REPLACE regex ONLY              for reference to the OVERALL match of the SEARCH regex
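
The same ideas can be tried in Python, with the caveat that its replacement syntax differs from the one described above: Python's re module writes \g<0> where Notepad++ writes $0, and \g<1> where Notepad++ writes ${1}. The sketch below is an illustration only, and backreference_demo is a made-up name.

```python
import re


def backreference_demo() -> dict:
    """Show group re-use in the search part and in the replacement part."""
    results = {}

    # \1 inside the SEARCH regex: a word char followed by the same char again
    results["doubled_letters"] = re.findall(r"(\w)\1", "book keeper")

    # Overall match in the replacement: \g<0> plays the role of $0
    results["whole_match"] = re.sub(r'"\w+"', r"(\g<0>)", 'choice(["Red"])')

    # \g<1>5 means group 1 followed by the digit 5, like ${1}5
    results["group_then_digit"] = re.sub(r"(ab)", r"\g<1>5", "ab")

    # \15 is read as group 15, which this pattern does not define
    try:
        re.sub(r"(ab)", r"\15", "ab")
        results["ambiguous_reference"] = "no error"
    except re.error:
        results["ambiguous_reference"] = "re.error"

    return results


if __name__ == "__main__":
    for name, value in backreference_demo().items():
        print(name, value)
```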
                          
                          ----------------------------------------------------------------------------------------------------------------------------------------------------------
                          
                          Regarding QUANTIFIERS, 6 syntaxes are possible {n} , {n,}, {n,m}, ?, + and *. Note that :
                          
                              - {n}   EXACTLY n       times the character or group, PRECEDING the quantifier
                              - {n,}  n or MORE       times the character or group, PRECEDING the quantifier
                              - {n,m} BETWEEN n and m times the character or group, PRECEDING the quantifier
                          
                              - ? is equivalent to {0,1}
                              - + is equivalent to {1,}
                              - * is equivalent to {0,}
                          
                          They are considered as GREEDY quantifiers because they match as MANY characters as possible
                          
                          If these 6 syntaxes are followed with a QUESTION MARK ?, they are called LAZY quantifiers because they match as FEW characters as possible
                              
                          For instance, given the following sentence :
                                                                                      The licenses for most software are designed to take away your freedom to share and change it
                          - Regex (?-s)e.+?ar, with the LAZY   quantifier +?, matches   ---------------------------
                          - Regex (?-s)e.+ar , with the GREEDY quantifier +,  matches   ---------------------------------------------------------------------------
                          
                          
                          If these 6 syntaxes are followed with an ADDITION sign +, they are called POSSESSIVE ( or ATOMIC ) quantifiers.
                          
                             - They are quite similar to their GREEDY forms, except that, in case of failure, they don't backtrack to attempt further possible match(es)
                          
                             - Note that this ADVANCED option should be studied once you are well ACQUAINTED with regexes !
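
The LAZY versus GREEDY example above can be checked with a few lines of Python. This is a small sketch using the standard re module, whose dot also stops at line endings by default, so the (?-s) prefix is left out. The possessive form is not shown because Python's re only accepts it from version 3.11 onwards.

```python
import re

SENTENCE = ("The licenses for most software are designed to take away "
            "your freedom to share and change it")

# Lazy: stop at the FIRST "ar" that can be reached after an "e"
lazy = re.search(r"e.+?ar", SENTENCE).group(0)

# Greedy: extend up to the LAST "ar" of the sentence
greedy = re.search(r"e.+ar", SENTENCE).group(0)

print(repr(lazy))    # 'e licenses for most softwar'
print(repr(greedy))  # 'e licenses for most software are designed to take away your freedom to shar'
```

Both matches start at the same "e" (the last letter of "The"); only the end point differs.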
                          
                          ----------------------------------------------------------------------------------------------------------------------------------------------------------
                          
                          BTW, a quick tip to SIMULATE a NORMAL search when the REGULAR EXPRESSION mode is selected : START the search zone with the \Q syntax :
                          
                              For instance, the regex \Q/* This is a C-comment */ will find the LITERAL string  /* This is a C-comment */
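
Python's re module has no \Q syntax, so the sketch below uses re.escape to get the same effect: every character of the search string is taken literally. It is an illustration only; find_literal, LITERAL and SOURCE are made-up names.

```python
import re

LITERAL = "/* This is a C-comment */"
SOURCE = "int x = 0; /* This is a C-comment */"


def find_literal(needle: str, haystack: str):
    """Search for needle as plain text, even though regex mode is in use."""
    return re.search(re.escape(needle), haystack)


if __name__ == "__main__":
    print(find_literal(LITERAL, SOURCE).group(0))
    # Without escaping, the * symbols act as quantifiers and nothing is found
    print(re.search(LITERAL, SOURCE))
```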
                          
                          

                          Now, I will rewrite my last regex, with your improvement, in the Free-Spacing mode :

                          ----------------------------------------------------------------------------------------------------------------------------------------------------------
                          
                          (?x-is)                #  FREE-SPACING mode, search SENSITIVE to CASE and DOT regex symbol represents a SINGLE STANDARD char
                          (?:                    #  BEGINNING of a NON-CAPTURING group
                          ~~choice               #       Matches the string ~~choice, with this EXACT case
                          |                      #    OR ( ALTERNATION symbol )
                          (?!\A)\G               #       Matches from RIGHT AFTER the location of the LAST match, IF NOT at the VERY BEGINNING of the file
                          )                      #  END of the NON-CAPTURING group
                          .+?                    #  The SMALLEST NON-NULL range of STANDARD characters till...
                          \K                     #  CURRENT match is DISCARDED and working location is RESET to this POINT
                          (?<!\()                #  ONLY if it's NOT PRECEDED with a STARTING parenthesis symbol
                          "[!'.?\w-]+"           #  ... a NON-NULL range of WORD chars or the characters !, ', ., ? and -
                          (?!\))                 #  ONLY if it's NOT FOLLOWED with an ENDING parenthesis symbol
                          
                          
                          NOTES :
                          
                          - This syntax is totally FUNCTIONAL. To be convinced do a NORMAL selection from (?x-is) to ENDING parenthesis symbol and hit the Ctrl + F shortcut
                               => This MULTI- lines regex is AUTOMATICALLY inserted in the 'Search what' zone 
                          
                          - The \G assertion means that the NEXT search must start, necessarily, RIGHT AFTER the LAST match !
                          
                          - I rewrote your regex part [A-Za-z \-\.\!\?'] as [!'.?\w-] because most punctuation signs do NOT need to be ESCAPED within a character CLASS.
                          
                              - However, note that the DASH - must be placed at the VERY BEGINNING or the VERY END of the character class, when NOT escaped
                              - I prefer the \w syntax to [A-Za-z] because \w also INCLUDES all the ACCENTED characters of foreign languages
                              - Note that the SPACE char of your original class is NOT part of this rewritten class : add \x20 to the class if your words may contain spaces
                          
                          - You must use ONLY the REPLACE ALL button ( Do NOT click on the REPLACE button for SUCCESSIVE replacements : it won't work due to the \K syntax ! )
                          
                          - If you don't tick the WRAP AROUND option, preferably move the CARET to the VERY BEGINNING of the current file
                          
                          - From BEGINNING of file, as the regex engine must SKIP some LINE-ENDING characters to get a match, the \G assertion is NOT verified
                          
                              and the regex engine must necessarily look, FIRST, for a string ~~choice
                          
                          - Then, from RIGHT AFTER the word choice, it grasps the SMALLEST NON-NULL range of STANDARD chars .+? till a "•••••" structure, but ONLY IF NOT embedded
                                between PARENTHESES itself !
                          
                          - And, due to the \K syntax, ONLY the part "•••••" is the FINAL match desired
                          
                          - This FINAL part is changed with the REPLACE regex \($0\) which just rewrites the string "•••••" between PARENTHESES.
                              The parenthesis symbols must be ESCAPED as they have a SPECIAL meaning in the REPLACEMENT
                          
                          - Then, from RIGHT AFTER the closing " char, as the regex CANNOT find any other ~~choice string, the (?!\A)\G.+? part, again, selects the SMALLEST NON-NULL
                              range of STANDARD characters till ANOTHER block "•••••", executes the REPLACEMENT and so on...
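
The remarks above about the character class can be tried in Python, whose \w is also Unicode-aware for text strings. This is only a sketch: the two class variants and the helper fully_matches are chosen for illustration, and the accented letters are written as escapes to avoid any encoding ambiguity.

```python
import re

# Letters limited to A-Z and a-z, with the dash placed last so it is literal
NARROW = re.compile(r"[A-Za-z!'.?-]+")

# Same punctuation, but \w also accepts accented letters, digits and _
BROAD = re.compile(r"[!'.?\w-]+")


def fully_matches(pattern, word: str) -> bool:
    """True if the whole word is made only of characters from the class."""
    return pattern.fullmatch(word) is not None


if __name__ == "__main__":
    accented = "D\u00e9j\u00e0-vu!"
    print(fully_matches(NARROW, accented))   # the accented letters are rejected
    print(fully_matches(BROAD, accented))    # \w accepts them
    print(fully_matches(BROAD, "Dark Red"))  # a space is in neither class
```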
                          
                          

                          In the example, below, in each second line ( Regex types ) :

                          • The dot . represents any char, found by the regex dummy part .+?

                          • The bullet • represents any char, found by the regex useful part [!'.?\w-]

                          • The character " and the string ~~choice stand for themselves

                          Text  processed          ~~choice(["Red", ("Blue"), ("Orange"), … ,"Purple"])
                          Regex types              ~~choice.."•••"..........................."••••••"
                          Match number BEFORE \K   1111111111     222222222222222222222222222
                          Match number AFTER  \K             11111                           22222222
                          
                          
                          Text  processed          ~~choice([("Red"), "Blue", "Orange", … ,("Purple")])
                          Regex types              ~~choice..........."••••".."••••••"
                          Match number BEFORE \K   1111111111111111111      22        
                          Match number AFTER  \K                      111111  22222222
                          
                          
                          Text  processed          ~~choice(["Red", "Blue", "Orange", … ,"Purple"])
                          Regex types              ~~choice.."•••".."••••".."••••••"....."••••••"
                          Match number BEFORE \K   1111111111     22      33        44444        
                          Match number AFTER  \K             11111  222222  33333333     44444444
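
The three diagrams above can be reproduced with a short Python sketch. Python's re module has no \G and no \K, so both are emulated here, which is an assumption of this illustration and not how Notepad++ works internally: calling match() at a given position stands in for \G, and a capturing group around the quoted word stands in for the text kept after \K. The names ANCHOR, STEP and matches_after_k are hypothetical.

```python
import re

ANCHOR = re.compile(r"~~choice")

# .+? skips the dummy part; group 1 is what would remain after \K
STEP = re.compile(r'''.+?(?<!\()("[!'.?\w-]+")(?!\))''')


def matches_after_k(line: str) -> list:
    """Return the quoted words that the regex would match on one line."""
    found = []
    start = ANCHOR.search(line)
    if start is None:
        return found
    pos = start.end()
    while True:
        # match() is anchored at pos, like \G right after the last match
        step = STEP.match(line, pos)
        if step is None:
            break
        found.append(step.group(1))
        pos = step.end()
    return found


if __name__ == "__main__":
    print(matches_after_k('~~choice(["Red", ("Blue"), ("Orange"), … ,"Purple"])'))
    print(matches_after_k('~~choice([("Red"), "Blue", "Orange", … ,("Purple")])'))
    print(matches_after_k('~~list(["Apple", "Banana", "Orange"])'))
```

Each element of the returned list corresponds to one numbered match on the "AFTER \K" row of the diagrams.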
                          

                          I hope that you’ll find this article useful, in any way !

                          However, let me add that the \G and \K assertions, as well as atomic groups and recursive regexes or backtracking verbs ( not discussed ), are difficult notions and I can assure you that there are a LOT of regex things that you need to know before starting to use them !

                          Best Regards,

                          guy038

                          The Community of users of the Notepad++ text editor.