Generic Regex: Replacing in a specific zone of text

    guy038 wrote on Mar 9, 2022, 12:18 AM (last edited by guy038 on Oct 18, 2022, 10:29 PM)

    This regex S/R lets you restrict a replacement to a specific zone of text, possibly repeated, on one or several consecutive lines.

    This is particularly useful when dealing with XML or HTML, if you need to make modifications only within a specific start and end tag range.


    • Let FR (Find Regex ) be the regex which defines the char, string or expression to be searched

    • Let RR (Replacement Regex ) be the regex which defines the char, string or expression which must replace the FR expression

    • Let BSR ( Begin Search-region Regex ) be the regex which defines the beginning of the area where the search for FR must start

    • Let ESR ( End Search-region Regex) be the regex which defines the end of the area where the search for FR must stop

    Then, the generic regex can be expressed :

    SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)

    REPLACE RR


    When the BSR and all the matches of the FR regex are located on a single line, the ESR part is useless: with the -s modifier, the dot does not match line-ending chars, so any line break implicitly ends the \G chain of matches. The generic regex can then be simplified into :

    SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)

    REPLACE RR
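    The two generic forms above can be assembled mechanically from their parts. Here is a minimal Python sketch that does so; the helper name build_search_regex and its parameters are hypothetical, it only concatenates strings, and the result is meant to be pasted into the Find what field of Notepad++ (Python's own re module supports neither \G nor \K, so the generated expression is not meant to be run in Python).

```python
from typing import Optional


def build_search_regex(bsr: str, fr: str, esr: Optional[str] = None) -> str:
    """Assemble the SEARCH expression from its BSR, FR and optional ESR parts.

    Hypothetical helper: the sub-expressions are inserted as-is, without
    any validation or escaping.
    """
    if esr is None:
        # Simplified form, for zones that lie on a single line.
        return rf"(?-s)(?-i:{bsr}|(?!\A)\G).*?\K(?-i:{fr})"
    # Complete form, for zones that may span several lines.
    return rf"(?-si:{bsr}|(?!\A)\G)(?s-i:(?!{esr}).)*?\K(?-si:{fr})"


if __name__ == "__main__":
    print(build_search_regex("<descrip>", r"\x20\(.+?\)"))
    print(build_search_regex("<text>", "-+", esr="</text>"))
```

    Both forms produced here are case-sensitive, as in the generic regexes; adapt the s and i modifiers by hand when a part needs different settings.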


    IMPORTANT :

    • You must use, at least, the v7.9.1 N++ release, so that the \A assertion is correctly handled

    • You must move the caret to the very beginning of the current file ( Ctrl + Home )

    • If you perform a simple search, without any replacement, just click several times on the Find Next button to notice the different zones affected by the future replacement

    • As soon as a replacement is needed, you must click on the Replace All button, exclusively. It performs a global replacement over the entire file

    NOTES :

    • Each non-capturing group, relative to the BSR, ESR and FR regexes, may be prefixed with the s or -s modifiers :

      • If the BSR and/or ESR and/or FR regexes may match EOL characters, use the s modifier in the appropriate non-capturing group(s)

      • If the BSR and/or ESR and/or FR regexes do not match EOL characters, use the -s modifier in the appropriate non-capturing group(s)

    • Each non-capturing group, relative to the BSR, ESR and FR regexes, may be prefixed with the i or -i modifiers :

      • If the BSR and/or ESR and/or FR regexes are sensitive to case, use the -i modifier in the appropriate non-capturing group(s)

      • If the BSR and/or ESR and/or FR regexes are insensitive to case, use the i modifier in the appropriate non-capturing group(s)

    • Of course, these modifiers may not be necessary ( for instance in case of search of an exact string or search of non-letter characters )

    • Note that the generic regexes, above, show the case when :

      • These two generic regexes are sensitive to case => The -i modifier is present everywhere in the definitions

      • The zone searched by the first regex may span several lines => The s modifier is present in the non-capturing group which contains the ESR

    • The FR regex may define a group, between parentheses, which will be re-used in the RR regex with the \# or ${#} syntaxes, where # represents an integer

    • The RR regex may contain the $0 syntax, which refers to the whole FR match ( the text matched after \K ), or re-use a group previously defined in the FR regex
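    To illustrate the re-use of an FR group in the replacement, here is a small Python sketch. It is an emulation, not the regex above: Python's standard re module has no \G or \K, so the sketch first matches each zone and then runs the FR replacement inside it. The function name is hypothetical, and Python writes a group reference as \1 (or \g<1>) and the whole match as \g<0>, where the replacement field of Notepad++ uses \1, ${1} and $0.

```python
import re


def bracket_parentheses_in_descrip(text: str) -> str:
    """Turn "(x)" into "[x]", but only from <descrip> to the end of its line."""
    # Zone: the literal <descrip> up to the end of the line
    # ("." does not match a line break unless re.DOTALL is set).
    zone = re.compile(r"<descrip>.*")
    # FR with one capturing group; the replacement re-uses it as \1.
    fr = re.compile(r"\((.+?)\)")
    return zone.sub(lambda m: fr.sub(r"[\1]", m.group(0)), text)
```

    Applied to the two lines <param>val (250)</param> and <descrip>a (b) c</descrip>, only the second one changes, to <descrip>a [b] c</descrip>.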

    Here are two examples to illustrate how to build real regex S/R from these generic ones !


    First, let’s imagine that you want to delete any part within parentheses, but only in the ranges of text <descrip>............</descrip>, each located on a single line

    • Paste the XML text, below, in a new tab :
    <iden>123456 (START)</iden>
    <name>Case_1</name>
    <descrip>This is a (short) text to (easily) see the results (of the modifications)</descrip>
    <param>val (250)</param>
    
    <iden>123456</iden>
    <name>Case_2</name>
    <descrip>And the (obvious) changes occur only in (the) "descrip" tag</descrip>
    <param>val (500)</param>
    
    <iden>123456 (END)</iden>
    <name>Case_3</name>
    <descrip>All (the) other tags are (just) untouched</descrip>
    <param>val (999)</param>
    
    • As all the parts to delete are contained in a single line, we can use the simplified formulation :

      • SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)

      • REPLACE RR

    • Obviously, as we want to delete, the RR regex is empty. So, the Replace with field will be left empty

    • Now, the FR regex represents a space char followed by the shortest text between parentheses => FR = (?:\x20\(.+?\)). We do not need any case modifier, as this regex does not refer to letters !

    • The BSR regex is simply the literal string <descrip>, with this exact case. So, within its non-capturing group, BSR = (?-i:<descrip>)

    Finally, the functional regex S/R to use is :

    • SEARCH (?-s)(?-i:<descrip>|(?!\A)\G).*?\K(?:\x20\(.+?\))

    • REPLACE Leave EMPTY

    • Open the Replace dialog Ctrl + H

    • Untick all options

    • Select the Regular expression search mode

    • Move to the very beginning of current file ( Ctrl + Home )

    • Click several times on the Find Next button to verify that the regex matches what you want ! In the present case, it matches a space followed by text between parentheses

    • Again, move to the very beginning of current file ( Ctrl + Home )

    • Click, once only, on the Replace All button

    => As expected, all text between parentheses, of the <descrip> tag only, has been deleted, but the other parentheses, present in other tags, are untouched !
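    The expected result of this first example can be cross-checked outside Notepad++ with a short Python sketch. This is an emulation under stated assumptions, not the \G regex itself: Python's standard re module supports neither \G nor \K, so the sketch matches each zone ( <descrip> up to the end of its line ) and then deletes the FR matches inside it. The function name is hypothetical.

```python
import re

SAMPLE = """\
<iden>123456 (START)</iden>
<name>Case_1</name>
<descrip>This is a (short) text to (easily) see the results (of the modifications)</descrip>
<param>val (250)</param>
"""


def delete_parenthesised_in_descrip(text: str) -> str:
    """Delete " (...)" parts, but only from <descrip> to the end of its line."""
    # Zone: same role as BSR plus the single-line restriction of (?-s).
    zone = re.compile(r"<descrip>.*")
    # FR: a space followed by the shortest text between parentheses.
    fr = re.compile(r"\x20\(.+?\)")
    return zone.sub(lambda m: fr.sub("", m.group(0)), text)


if __name__ == "__main__":
    print(delete_parenthesised_in_descrip(SAMPLE))
```

    On this sample, the <descrip> line becomes <descrip>This is a text to see the results</descrip>, while the <iden> and <param> lines keep their parentheses.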


    In the second example, we’ll replace any run of consecutive dash characters with a single space char in any range <text>..........</text>, possibly split over several lines

    • Paste the following XML text in a new tab
    <val>37--001</val>
    <text>This-is
    -a</text>
    <pos>4-1234</pos>
    
    <val>37--002</val>
    <text>-small---example</text>
    <pos>9-0012</pos>
    
    
    <val>37--003</val>
    <text>-of-text-
    which-</text>
    <pos>1-9999</pos>
    
    
    <val>37--004</val>
    
    <text>need
    -to-be-
    modi
    fied</text>
    
    <pos>0-0000</pos>
    
    • As, this time, the <text>..........</text> may be spread over several lines, we’ll use the first generic regex :

      • SEARCH (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)

      • REPLACE RR

    • Obviously, the RR regex is simply \x20

    • Now, the FR regex represents a non-null number of consecutive dashes => FR is just -+, as the non-capturing group is not needed here

    • The BSR regex is simply the literal string <text>, with this exact case => BSR = (?-si:<text>)

    • The ESR regex is the literal string </text>, with this exact case. So the ESR regex, within its non-capturing group, is (?s-i:(?!</text>).)

    Then, the real regex S/R to use is :

    • SEARCH (?-si:<text>|(?!\A)\G)(?s-i:(?!</text>).)*?\K-+

    • REPLACE \x20

    • Open the Replace dialog Ctrl + H

    • Untick all options

    • Select the Regular expression search mode

    • Move to the very beginning of current file ( Ctrl + Home )

    • Click several times on the Find Next button to verify that the regex matches what you want ! In the present case, it matches any range of consecutive dash chars

    • Again, move to the very beginning of current file ( Ctrl + Home )

    • Click, once only, on the Replace All button

    => As expected, all ranges of consecutive dashes, in the <text> tags only, have been replaced with a single space char and the other dash characters, present in other tags, are kept !
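    The multi-line case can be emulated in Python as well. Again, this is only a sketch of the same effect, since Python's standard re module has no \G or \K: each zone runs from <text> up to, but not including, the next </text> ( or the end of the input if the tag is never closed ), and the dashes are replaced inside that zone only. The function name is hypothetical.

```python
import re


def dashes_to_space_in_text(text: str) -> str:
    """Replace each run of dashes with one space, inside <text> zones only."""
    # Zone: re.DOTALL lets "." cross line breaks, like the s modifier does.
    zone = re.compile(r"<text>.*?(?=</text>|\Z)", re.DOTALL)
    # FR is -+ and RR is a single space.
    return zone.sub(lambda m: re.sub(r"-+", " ", m.group(0)), text)


if __name__ == "__main__":
    sample = "<val>37--001</val>\n<text>This-is\n-a</text>\n<pos>4-1234</pos>\n"
    print(dashes_to_space_in_text(sample))
```

    The dashes of the <val> and <pos> lines are left alone because they are outside any zone.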

      guy038 wrote on Mar 9, 2022, 2:53 AM (last edited by guy038 on Dec 23, 2022, 3:43 PM)

      Two other examples regarding this generic regex ! In these, we’ll further restrict the replacements to the part of each zone located before a # character !

      Paste the text below in a new tab :

      <iden>123456 (START)</iden>
      <name>Case_1</name>
      <descrip>This is a (short) text to (easily) see the results (of the modifications)# (12345) test (67890)</descrip>
      <param>val (250)</param>
      
      <iden>123456</iden>
      <name>Case_2</name>
      <descrip>And the (obvious) changes occur only in (the) "descrip" tag # Parentheses (Yeaah) OK</descrip>
      <param>val (500)</param>
      
      <iden>123456 (END)</iden>
      <name>Case_3</name>
      <descrip>All (the) other tags are (just) untouched #(This is) the end (of the test)</descrip>
      <param>val (999)</param>
      

      In this first example, of single-line <descrip> tags, two solutions are possible :

      • Use the complete generic regex (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) where ESR = #. As each zone lies on a single line, the s modifier is not needed, which leads to the functional S/R :

        • SEARCH (?-s)(?-i:<descrip>|(?!\A)\G)((?!#).)*?\K(?:\x20\(.+?\))

        • REPLACE Leave EMPTY

      => This time, in addition to replacing only in each <descrip>..........</descrip> zone, NO replacement will occur after the # character of each <descrip> tag !

      • Use the simplified solution and add an ESR condition at the end of the regex, giving this generic variant (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)(?=.*ESR)

        • SEARCH (?-s)(?-i:<descrip>|(?!\A)\G).*?\K(?:\x20\(.+?\))(?=.*#)

        • REPLACE Leave EMPTY

      However, this second solution requires every <descrip> tag to contain a comment zone with a # char
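      The difference between the two solutions shows up on a line without any # char. The Python sketch below emulates both ( Python's standard re module has no \G or \K, so each zone is matched first and the FR replacement is run inside it ). The function names are hypothetical, and the first emulation assumes that no parenthesised part itself contains a # char.

```python
import re

FR = r"\x20\(.+?\)"  # a space followed by the shortest text between parentheses


def solution_1(text: str) -> str:
    """Zone ends at the first # of the line, or at the end of the line."""
    zone = re.compile(r"<descrip>(?:(?!#).)*")
    return zone.sub(lambda m: re.sub(FR, "", m.group(0)), text)


def solution_2(text: str) -> str:
    """Zone is the rest of the line; each match must be followed by a #."""
    zone = re.compile(r"<descrip>.*")
    return zone.sub(lambda m: re.sub(FR + r"(?=.*#)", "", m.group(0)), text)


if __name__ == "__main__":
    with_hash = "<descrip>a (x) b # c (y)</descrip>"
    without_hash = "<descrip>p (q) r</descrip>"
    for line in (with_hash, without_hash):
        print(solution_1(line), "|", solution_2(line))
```

      Both functions turn the first line into <descrip>a b # c (y)</descrip>. On the second line, which has no # char, solution_1 still deletes (q) whereas solution_2 leaves the line unchanged.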


      Now, paste this other text below in a new tab :

      <val>37--001</val>
      <text>This-is
      -a--very--- # Dashes - - - OK</text>
      <pos>4-1234</pos>
      
      <val>37--002</val>
      <text>-small----#---example</text>
      <pos>9-0012</pos>
      
      
      <val>37--003</val>
      <text>-of-a-text-
      which-</text>
      <pos>1-9999</pos>
      
      
      <val>37--004</val>
      
      <text>need
      -to-be-
      modi
      fied # but - not - there</text>
      
      <pos>0-0000</pos>
      

      This second example is a multi-line replacement, in each <text>.............</text> zone only, also limited to the part before a # char, which may be present or not

      Of course, we’ll have to use the complete generic regex (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR) but, instead of a single (?!ESR), we’ll have to use this variant :

      (?-si:BSR|(?!\A)\G)(?s-i:(?!ESR_1)(?!ESR_2).)*?\K(?-si:FR)

      So, the functional regex S/R becomes :

      • SEARCH (?-si:<text>|(?!\A)\G)(?s-i:(?!</text>)(?!#).)*?\K-+

      • REPLACE \x20

      => ONLY IF a sequence of dashes is located in a <text>..........</text> zone AND, moreover, before a possible # char, it will be replaced with a single space character

      As you can verify, the third multi-line <text>.............</text> zone does not contain any # char. Thus, all the dash characters of that <text> tag are replaced with a single space char !
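      The (?!ESR_1)(?!ESR_2) construction carries over directly to Python, which supports negative look-aheads. The sketch below is an emulation of the S/R above, not the same regex ( Python's standard re module has no \G or \K ): the zone is matched with the two look-aheads, then the dashes are replaced inside it. The function name is hypothetical.

```python
import re


def dashes_to_space_before_hash(text: str) -> str:
    """Replace runs of dashes with one space in <text> zones, before any #."""
    # Zone: starts at <text> and stops before </text> or #, whichever is first.
    zone = re.compile(r"<text>(?:(?!</text>)(?!#).)*", re.DOTALL)
    return zone.sub(lambda m: re.sub(r"-+", " ", m.group(0)), text)


if __name__ == "__main__":
    sample = (
        "<val>37--002</val>\n"
        "<text>-small----#---example</text>\n"
        "<text>-of-a-text-\nwhich-</text>\n"
        "<pos>1-9999</pos>\n"
    )
    print(dashes_to_space_before_hash(sample))
```

      In the first <text> zone of this sample, only the dashes before the # char are replaced; in the second one, which has no # char, all of them are.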


      Reminder :

      • You must use, at least, the v7.9.1 N++ release, so that the \A assertion is correctly handled

      • Move to the very beginning of file, before any Find Next sequence or Replace All operation

      • Do not click on the step-by-step Replace button
