    Generic Regex: Replacing in a specific zone of text

      guy038

      This regex S/R (search/replace) restricts a replacement to a specific zone of text, possibly repeated, spanning one or several consecutive lines.

      This is particularly useful when dealing with XML or HTML, if you need to make modifications only between a specific start tag and end tag.


      • Let FR (Find Regex) be the regex which defines the char, string or expression to search for

      • Let RR (Replacement Regex) be the regex which defines the char, string or expression which must replace the FR match

      • Let BSR (Begin Search-region Regex) be the regex which defines the beginning of the area where the search for FR must start

      • Let ESR (End Search-region Regex) be the regex which defines the end of the area where the search for FR must stop

      Then, the generic regex can be expressed as :

      SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)

      REPLACE RR
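
For readers who want to try the same idea outside Notepad++, here is a minimal Python sketch. It is not a translation of the regex above: Python's standard `re` module has neither `\G` nor `\K`, so the sketch swaps in a two-pass technique (an outer pattern isolates each zone, then an ordinary replacement runs inside it). The function name `replace_in_zone` and its parameters are hypothetical, and a match of FR that would start inside the zone and end beyond the ESR is not handled the way the Notepad++ regex handles it.

```python
import re

def replace_in_zone(text, bsr, esr, fr, rr):
    """Replace each match of `fr` by `rr`, but only between `bsr` and the next `esr`."""
    # head = the BSR match; body = the longest run of characters in which no ESR match starts.
    # (?s:...) lets "." cross line endings inside the zone, like (?s-i:(?!ESR).) above.
    zone = re.compile(r"(?P<head>%s)(?P<body>(?s:(?!%s).)*)" % (bsr, esr))
    # Run the ordinary FR -> RR replacement on the body only.
    return zone.sub(lambda m: m.group("head") + re.sub(fr, rr, m.group("body")), text)

demo = replace_in_zone("<a>x-y\nz-t</a> <b>x-y</b>", "<a>", "</a>", "-+", " ")
print(demo)
```

Only the dashes between `<a>` and `</a>` are replaced; the `<b>` element is left alone.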


      When the BSR and the different matches of the FR regex are all located in a single line, any line-ending char(s) will implicitly break the \G chain, because, with the -s modifier, the dot cannot cross a line ending. The ESR part is then unnecessary and the generic regex can be simplified into :

      SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)

      REPLACE RR
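
The single-line form can be sketched in Python in the same hedged way (again a two-pass stand-in using the standard `re` module, not the `\G` / `\K` regex itself; `replace_in_line_zone` is a hypothetical name). The sketch also shows a consequence of dropping the ESR part: the zone runs from the BSR to the end of the line, so text located after a closing tag on the same line is affected too.

```python
import re

def replace_in_line_zone(text, bsr, fr, rr):
    """Replace each match of `fr` by `rr` from `bsr` up to the end of that line."""
    # "." does not match "\n" by default, so the body stops at the line ending.
    zone = re.compile(r"(?P<head>%s)(?P<body>.*)" % bsr)
    return zone.sub(lambda m: m.group("head") + re.sub(fr, rr, m.group("body")), text)

demo = replace_in_line_zone("<d>a-b</d> c-d\n<e>a-b</e>", "<d>", "-", "+")
print(demo)
```

Here `c-d` is changed as well, because it sits on the same line as `<d>`, while the second line is untouched.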


      IMPORTANT :

      • You must use, at least, the v7.9.1 N++ release, so that the \A assertion is correctly handled

      • You must move the caret to the very beginning of the current file ( Ctrl + Home )

      • If you perform a simple search, without any replacement, just click several times on the Find Next button to see the different zones that the future replacement will affect

      • As soon as a replacement is needed, you must use the Replace All button, exclusively. It performs a global replacement on the entire file

      NOTES :

      • Each non-capturing group, relative to the BSR, ESR and FR regexes, may be prefixed with the s or -s modifiers :

        • If the BSR and/or ESR and/or FR regexes may match EOL characters, use the s modifier in the appropriate non-capturing group(s)

        • If the BSR and/or ESR and/or FR regexes do not match EOL characters, use the -s modifier in the appropriate non-capturing group(s)

      • Each non-capturing group, relative to the BSR, ESR and FR regexes, may be prefixed with the i or -i modifiers :

        • If the BSR and/or ESR and/or FR regexes are sensitive to case, use the -i modifier in the appropriate non-capturing group(s)

        • If the BSR and/or ESR and/or FR regexes are insensitive to case, use the i modifier in the appropriate non-capturing group(s)

      • Of course, these modifiers may not be necessary ( for instance in case of search of an exact string or search of non-letter characters )

      • Note that the generic regexes, above, show the case when :

        • These two generic regexes are sensitive to case => The -i modifier is present everywhere in the definitions

        • The search region of the first regex may span several lines => The s modifier is present in the ESR non-capturing group

      • The FR regex may define a group, between parentheses, which will be re-used in the RR regex with the \# or ${#} syntaxes, where # represents an integer

      • The RR regex may contain the $0 syntax, which refers to the whole match ( because of \K, this is only the text matched by FR ), or re-use a group previously defined in the FR regex
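
As an illustration of group re-use in the replacement, here is a small Python sketch (two-pass stand-in with the standard `re` module; the sample line and the `[...]` output format are made up for the demonstration). In Python's replacement syntax the first group is written `\1` and the whole match `\g<0>`, which plays the role of `$0`.

```python
import re

# Hypothetical one-line sample: only the numbers inside <descrip>...</descrip> should change.
line = "<descrip>val (250) and (500)</descrip> (999)"

# Outer pattern: the BSR, then every character up to (not including) the ESR "</descrip>".
zone = re.compile(r"(?P<head><descrip>)(?P<body>(?:(?!</descrip>).)*)")

# FR defines one capturing group (the digits); RR re-uses it as \1.
result = zone.sub(
    lambda m: m.group("head") + re.sub(r"\((\d+)\)", r"[\1]", m.group("body")),
    line,
)
print(result)
```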

      Below are two examples which illustrate how to build real regex S/R from these generic ones !


      First, let’s imagine that you want to delete any part within parentheses, but only in the ranges of text <descrip>............</descrip>, each located in a single line

      • Paste the XML text, below, in a new tab :
      <iden>123456 (START)</iden>
      <name>Case_1</name>
      <descrip>This is a (short) text to (easily) see the results (of the modifications)</descrip>
      <param>val (250)</param>
      
      <iden>123456</iden>
      <name>Case_2</name>
      <descrip>And the (obvious) changes occur only in (the) "descrip" tag</descrip>
      <param>val (500)</param>
      
      <iden>123456 (END)</iden>
      <name>Case_3</name>
      <descrip>All (the) other tags are (just) untouched</descrip>
      <param>val (999)</param>
      
      • As all the parts to delete are contained in a single line, we can use the simplified formulation :

        • SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)

        • REPLACE RR

      • Obviously, as we want to delete, the RR regex is empty. So, the Replace with field will be left empty

      • Now, the FR regex represents a space char followed by the shortest text between parentheses => FR = (?:\x20\(.+?\)). We do not need any case modifier as this regex does not refer to letters !

      • The BSR regex is simply the literal string <descrip>, with this exact case. So BSR = <descrip>, placed in the (?-i:...) group

      Finally, the functional regex S/R to use is :

      • SEARCH (?-s)(?-i:<descrip>|(?!\A)\G).*?\K(?:\x20\(.+?\))

      • REPLACE Leave EMPTY

      • Open the Replace dialog ( Ctrl + H )

      • Untick all options

      • Select the Regular expression search mode

      • Move to the very beginning of the current file ( Ctrl + Home )

      • Click several times on the Find Next button to verify that the FR regex does match what you want ! In the present case, it matches a space followed by text between parentheses

      • Again, move to the very beginning of the current file ( Ctrl + Home )

      • Click, once only, on the Replace All button

      => As expected, all text between parentheses, of the <descrip> tag only, has been deleted, but the other parentheses, present in other tags, are untouched !
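
The expected result can be cross-checked outside Notepad++ with a short Python sketch. It uses the two-pass stand-in (standard `re` module, no `\G` or `\K`), so it only reproduces the outcome, not the regex itself; `delete_parens_in_descrip` is a hypothetical name and the sample is the first record of the text above.

```python
import re

SAMPLE = (
    "<iden>123456 (START)</iden>\n"
    "<name>Case_1</name>\n"
    "<descrip>This is a (short) text to (easily) see the results (of the modifications)</descrip>\n"
    "<param>val (250)</param>\n"
)

def delete_parens_in_descrip(text):
    # Outer pattern: "<descrip>" plus the rest of that line ("." stops at the line ending).
    zone = re.compile(r"<descrip>.*")
    # Inner pattern (FR): a space followed by the shortest text between parentheses.
    return zone.sub(lambda m: re.sub(r"\x20\(.+?\)", "", m.group(0)), text)

cleaned = delete_parens_in_descrip(SAMPLE)
print(cleaned)
```

The `<iden>` and `<param>` lines keep their parentheses, as in the Notepad++ result.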


      In the second example, we’ll try to replace any run of consecutive dash characters with a single space char in any range <text>..........</text>, possibly split across several lines

      • Paste the following XML text in a new tab
      <val>37--001</val>
      <text>This-is
      -a</text>
      <pos>4-1234</pos>
      
      <val>37--002</val>
      <text>-small---example</text>
      <pos>9-0012</pos>
      
      
      <val>37--003</val>
      <text>-of-text-
      which-</text>
      <pos>1-9999</pos>
      
      
      <val>37--004</val>
      
      <text>need
      -to-be-
      modi
      fied</text>
      
      <pos>0-0000</pos>
      
      • As, this time, the <text>..........</text> zone may be spread over several lines, we’ll use the first generic regex :

        • SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)

        • REPLACE RR

      • Obviously, the RR regex is simply \x20

      • Now, the FR regex represents one or more consecutive dashes => FR is just -+, as the non-capturing group is not needed here

      • The BSR regex is simply the literal string <text>, with this exact case => BSR = <text>, placed in the (?-si:...) group

      • The ESR regex is the literal string </text>, with this exact case. So the ESR regex, within its non-capturing group, is (?s-i:(?!</text>).)

      Then, the real regex S/R to use is :

      • SEARCH (?-si:<text>|(?!\A)\G)(?s-i:(?!</text>).)*?\K-+

      • REPLACE \x20

      • Open the Replace dialog ( Ctrl + H )

      • Untick all options

      • Select the Regular expression search mode

      • Move to the very beginning of the current file ( Ctrl + Home )

      • Click several times on the Find Next button to verify that the FR regex does match what you want ! In the present case, it matches any run of consecutive dash chars

      • Again, move to the very beginning of the current file ( Ctrl + Home )

      • Click, once only, on the Replace All button

      => As expected, all runs of consecutive dashes, in the <text> tags only, have been replaced with a single space char and the other dash characters, present in other tags, are kept !
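
The multi-line case can be checked the same way with a Python sketch (two-pass stand-in with the standard `re` module, reproducing the outcome only; `dashes_to_space_in_text` is a hypothetical name and the sample is the first two records of the text above).

```python
import re

SAMPLE = (
    "<val>37--001</val>\n"
    "<text>This-is\n"
    "-a</text>\n"
    "<pos>4-1234</pos>\n"
    "\n"
    "<val>37--002</val>\n"
    "<text>-small---example</text>\n"
    "<pos>9-0012</pos>\n"
)

def dashes_to_space_in_text(text):
    # Outer pattern: "<text>" then every character, line endings included,
    # as long as "</text>" does not start at that position.
    zone = re.compile(r"<text>(?s:(?!</text>).)*")
    # Inner pattern (FR): one or more consecutive dashes, replaced with one space.
    return zone.sub(lambda m: re.sub(r"-+", " ", m.group(0)), text)

converted = dashes_to_space_in_text(SAMPLE)
print(converted)
```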

        guy038

        Here are two other examples of this generic regex ! In these ones, we’ll further restrict the replacements to the part of each zone located before a # character !

        Paste the text below in a new tab :

        <iden>123456 (START)</iden>
        <name>Case_1</name>
        <descrip>This is a (short) text to (easily) see the results (of the modifications)# (12345) test (67890)</descrip>
        <param>val (250)</param>
        
        <iden>123456</iden>
        <name>Case_2</name>
        <descrip>And the (obvious) changes occur only in (the) "descrip" tag # Parentheses (Yeaah) OK</descrip>
        <param>val (500)</param>
        
        <iden>123456 (END)</iden>
        <name>Case_3</name>
        <descrip>All (the) other tags are (just) untouched #(This is) the end (of the test)</descrip>
        <param>val (999)</param>
        

        In this first example, with single-line <descrip> tags, two solutions are possible :

        • Use the complete generic regex (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) where ESR = # which leads to the functional S/R :

          • SEARCH (?-s)(?-i:<descrip>|(?!\A)\G)((?!#).)*?\K(?:\x20\(.+?\))

          • REPLACE Leave EMPTY

        => This time, in addition to replacing only in each <descrip>..........</descrip> zone, NO replacement will occur after the # character of each <descrip> tag !

        • Use the simplified solution and add an ESR condition at the end of the regex, giving this generic variant (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)(?=.*ESR)

          • SEARCH (?-s)(?-i:<descrip>|(?!\A)\G).*?\K(?:\x20\(.+?\))(?=.*#)

          • REPLACE Leave EMPTY

        However, this second solution requires that every <descrip> tag contains a comment zone with a # char : in a line without any # char, the look-ahead fails and nothing is replaced
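
The difference between the two solutions can be seen in a small Python sketch (two-pass stand-in with the standard `re` module, not the `\G` / `\K` regexes themselves). The names `solution_1` and `solution_2` are hypothetical, and the line without a # char is an invented test line that is not part of the sample text.

```python
import re

FR = r"\x20\(.+?\)"  # a space followed by the shortest text between parentheses

def solution_1(text):
    # The zone stops just before the first "#" (or at the end of the line).
    zone = re.compile(r"<descrip>(?:(?!#).)*")
    return zone.sub(lambda m: re.sub(FR, "", m.group(0)), text)

def solution_2(text):
    # The zone is the rest of the line; each match must have a "#" somewhere after it.
    zone = re.compile(r"<descrip>.*")
    return zone.sub(lambda m: re.sub(FR + r"(?=.*#)", "", m.group(0)), text)

with_hash = "<descrip>All (the) other tags are (just) untouched #(This is) the end (of the test)</descrip>"
without_hash = "<descrip>No (hash) here</descrip>"  # invented line, for the comparison only

print(solution_1(with_hash))
print(solution_2(with_hash))
print(solution_1(without_hash))
print(solution_2(without_hash))
```

Both solutions agree on the line containing a # char. On the line without one, only the first solution deletes the parenthesised part.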


        Now, paste this other text below in a new tab :

        <val>37--001</val>
        <text>This-is
        -a--very---< # Dashes - - - OK/text>
        <pos>4-1234</pos>
        
        <val>37--002</val>
        <text>-small----#---example</text>
        <pos>9-0012</pos>
        
        
        <val>37--003</val>
        <text>-of-a-text-
        which-</text>
        <pos>1-9999</pos>
        
        
        <val>37--004</val>
        
        <text>need
        -to-be-
        modi
        fied # but - not - there</text>
        
        <pos>0-0000</pos>
        

        This second example is a multi-line replacement, in each <text>.............</text> zone only, and also limited to the part before a # char, which may be present or not

        Of course, we’ll have to use the complete generic regex (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) but, instead of a single (?!ESR), we’ll have to use this variant :

        (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR_1)(?!ESR_2).)*?\K(?-si:FR)

        So, the functional regex S/R becomes :

        • SEARCH (?-si:<text>|(?!\A)\G)(?s-i:(?!</text>)(?!#).)*?\K-+

        • REPLACE \x20

        => ONLY IF a sequence of dashes is located in a <text>..........</text> zone AND, moreover, before a possible # char, it will be replaced with a single space character

        As you can verify, the third multi-line <text>.............</text> zone does not contain any # char. Thus, all dash characters of that <text> tag are replaced with a single space char !
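
A Python sketch of this last case, with the same caveats as before (two-pass stand-in with the standard `re` module, outcome only; `dashes_to_space_before_hash` is a hypothetical name). The sample is the second and third records of the text above, so that one zone contains a # char and the other does not.

```python
import re

SAMPLE = (
    "<val>37--002</val>\n"
    "<text>-small----#---example</text>\n"
    "<pos>9-0012</pos>\n"
    "\n"
    "<val>37--003</val>\n"
    "<text>-of-a-text-\n"
    "which-</text>\n"
    "<pos>1-9999</pos>\n"
)

def dashes_to_space_before_hash(text):
    # The zone ends at whichever comes first: "</text>" (ESR_1) or "#" (ESR_2).
    zone = re.compile(r"<text>(?s:(?!</text>)(?!#).)*")
    return zone.sub(lambda m: re.sub(r"-+", " ", m.group(0)), text)

converted = dashes_to_space_before_hash(SAMPLE)
print(converted)
```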


        Reminder :

        • You must use, at least, the v7.9.1 N++ release, so that the \A assertion is correctly handled

        • Move to the very beginning of file, before any Find Next sequence or Replace All operation

        • Do not click on the step-by-step Replace button
