
    Regex Macro: Creating Macro to Replace variable text with text determined by text on next line

    • John SleeJ
      John Slee
      last edited by

      Can anyone help me?

      I want to create a macro using regular-expression Find and Replace that finds five consecutive variable characters in the format \D\d\d\d\d and replaces them throughout the text with four numeric characters \d\d\d\d followed by the text “Census”.
      When I tried to achieve this by recording a macro, the macro saved the literal text X382 and 1841Census instead of treating the text as variables.

      e.g. In the following text I want to Replace all occurrences of X382 by 1841Census and all occurrences of X391 by 1861Census

      === Census===
      : 06 JUN 1841.
      '''Ridge Town, Bondleigh, Devon, England'''.
      <ref name="ref_3">
      Source: [[#S424]] Class: HO107; Piece: 250; Book: 10; Civil Parish: Bondleigh; County: Devon; Enumeration District: 12; Folio: 8; Page: 11; Line: 1; GSU roll: 241323.
      </ref>
      Note: [[#X382]].
      : 07 APR 1861.
      '''51, South Street, Tormoham, Devon, England'''.
      <ref name="ref_4">
      Source: [[#S421]] Class: RG 9; Piece: 1412; Folio: 59; Page: 57; GSU roll: 542809; Enumeration District: 13.
      </ref>
      Note: [[#X391]].
      .
      .

      .
      ===<span id='X382'>X382</span>===
      1841 England - Census transcript - John COOMBE - Household
      Class: HO107; Piece: 250; Book: 10; Civil Parish: Bondleigh; County: Devon; Enumeration District: 12; Folio: 8; Page: 11; Line: 1; GSU roll: 241323.
      Ridge Town, Bondleigh, Devon, England

      Name AgeM AgeF Occupation BiC SIF
      John COOMBE 60 Farmer Y
      Joseph COOMBE 28 Y
      Elizabeth COOMBE 22 Y
      George COOMBE 16 Y
      Richard COOMBE 14 Y
      Christopher COOMBE 11 Y
      Francis COOMBE 2 Y

      ===<span id='X391'>X391</span>===
      1861 England - Census transcript - Joseph COOMBES - Household
      Class: RG 9; Piece: 1412; Folio: 59; Page: 57; GSU roll: 542809; Enumeration District: 13.

      • John SleeJ
        John Slee
        last edited by

        I would use the macro twice - once for 1841Census, then again for 1861Census (unless you can create a macro that will replace all occurrences iteratively!)

        • guy038G
          guy038
          last edited by

          Hello, @john-slee and All,

          You said :

          e.g. In the following text I want to Replace all occurrences of X382 by 1841Census and all occurrences of X391 by 1861Census

          If so, the correct regex syntax which will process all values in one go, is :

          SEARCH (?-i)\bX3((82)|91)\b

          REPLACE 18(?{2}4:6)1Census

          Notes :

          • The part (?-i) forces a case-sensitive search

          • The part X3 looks for the string X3, with that exact case

          • The part ((82)|91) means that the string X3 must be followed with, either, the number 82 or the number 91

          • The inner parentheses form group 2. So, if the regex matches the string X391, group 2 is not defined

          • The two \b assertions require a word boundary on each side, so the string X… must not be adjacent to other word characters in order to match

          • In replacement :

            • It first writes the string 18

            • According to group 2, the part (?{2}4:6) rewrites digit 4 or digit 6

            • It finally writes the string 1Census
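
As a rough illustration outside Notepad++, the same conditional logic can be sketched in Python. This is a port under stated assumptions, not the S/R itself: Python's re module has no (?{2}4:6) conditional in replacement strings, so a small function branches on group 2 instead. The names PATTERN, to_census and replace_ids are made up for this sketch, and (?-i) is left out because Python searches are case-sensitive by default.

```python
import re

# Same search as the SEARCH field above, minus (?-i), which Python
# does not need because matching is case-sensitive by default.
PATTERN = re.compile(r"\bX3((82)|91)\b")


def to_census(match):
    # Group 2 is set only when "82" matched, i.e. for X382.
    decade = "4" if match.group(2) else "6"
    return "18" + decade + "1Census"


def replace_ids(text):
    # Hypothetical helper: apply the conditional replacement everywhere.
    return PATTERN.sub(to_census, text)


sample = "Note: [[#X382]].\nNote: [[#X391]].\nNote: [[#x382]]."
print(replace_ids(sample))
```

The lower-case x382 in the sample is left alone, which mirrors the case-sensitive behaviour that (?-i) asks for.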


          Of course, if you need to run this regex S/R very often, it would be sensible to record this S/R in a macro, and use it with a shortcut ;-))

          Best Regards,

          guy038

          • John SleeJ
            John Slee @guy038
            last edited by

            @guy038 Thanks for your suggestion. However, I should have made it clearer that the text to be replaced can vary and is not predictable. I therefore need to find any text between ===<span id=' and '> and replace it with the relevant text of the form nnnnCensus, with nnnn being taken from the first four digits of the following line.

            • guy038G
              guy038
              last edited by

              Hello, @john-slee and All,

              Ah… I’m sorry ! I should have examined your text more carefully. No trouble, there’s a solution, anyway !

              If you use the Replace All button exclusively, here is the right regex S/R, which may also be recorded in a macro :

              SEARCH (?-si)===<span id='\K.+?(?='.+\R(\d+))

              REPLACE \1Census

              Notes :

              • The (?-si) in-line modifiers force the regex engine :

                • To treat any dot . symbol as matching a single standard char, and not an EOL character

                • To process the S/R in a case-sensitive way

              • Then, the part ===<span id=' matches the identical literal string ===<span id='

              • The special \K syntax discards everything matched so far from the final match, so only the text matched after it is replaced

              • Therefore, the part .+? matches the shortest range of standard characters… ( our string X### )

              • With the condition, due to the look-ahead structure (?=.........), that it must be followed with :

                • A single quote and some standard characters '.+

                • Followed with the line-break \R of the current line

                • Followed with some digit characters, stored as group 1, due to the embedded parentheses (\d+)

              • In the replacement \1Census, the match ( string X### ) is replaced with the number located at the beginning of the next line, followed with the string Census
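
For readers who want to try the idea outside Notepad++, here is a minimal Python sketch of the same search. It rests on two substitutions that are assumptions of this sketch rather than part of the S/R above: Python's re module supports neither \K nor \R, so a fixed-width look-behind stands in for \K and \r?\n stands in for \R. The names HEADING_ID and rename_heading_ids are hypothetical.

```python
import re

# (?<=...) plays the role of \K: the heading prefix must be present
# but is not part of the match. The dot excludes \n by default in
# Python, which corresponds to (?-s).
HEADING_ID = re.compile(r"(?<====<span id=').+?(?='.+\r?\n(\d+))")


def rename_heading_ids(text):
    # Only the id between the quotes is matched; group 1 is the number
    # that the look-ahead captures at the start of the next line.
    return HEADING_ID.sub(r"\1Census", text)


sample = (
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
)
print(rename_heading_ids(sample))
```

As with the S/R above, only the id inside the quotes changes; the visible X382 between the span tags is left as it was.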


              Now, if you want to see the result of each replacement, step by step, use this alternate syntax :

              SEARCH (?-si)(===<span id=').+?(?='.+\R(\d+))

              REPLACE \1\2Census

              Notes :

              • The \K syntax is not present. So, the literal string ===<span id=' is itself embedded in parentheses, as group 1 (===<span id='), and is re-used in the replacement. And the (\d+) represents group 2

              • In the replacement, it first rewrites the beginning of the current line \1, followed with the number \2 taken from the beginning of the next line, and then the string Census

              Cheers,

              guy038

              • John SleeJ
                John Slee @guy038
                last edited by

                @guy038 Thanks again. However (again - I hope I’m not pushing my luck by asking once more!) this only replaces the first instance of the string. I need it to replace every occurrence of each string in the document.
                i.e. In this document, every time X382 occurs it should be replaced by 1841Census and every X391 should be replaced by 1861Census.
                Is this possible?

                • Alan KilbornA
                  Alan Kilborn @John Slee
                  last edited by

                  @John-Slee

                  You have more “logic” in your problem statement than a regular expression can handle, I’m afraid.

                  In such cases you should probably turn to a scripting plugin, e.g. PythonScript, that can work with regular expression data but can also incorporate more logic.

                  • Alan KilbornA
                    Alan Kilborn
                    last edited by

                    @John-Slee

                    Maybe something like this:

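                    from Npp import editor  # provided by the PythonScript plugin
                    # Pass 1: collect (old id, year) pairs from each heading and the line after it
                    search_repl_pairs_list = []
                    editor.research(r"(?-s)===<span id='(\D\d\d\d).+\R(\d{4})", lambda m: search_repl_pairs_list.append((m.group(1), m.group(2))))
                    # Pass 2: replace every occurrence of each old id throughout the document
                    for tup in search_repl_pairs_list: editor.replace(tup[0], tup[1] + 'Census')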
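
The same two-pass idea can be tried without Notepad++ using only Python's standard re module. This is a sketch under assumptions, not a drop-in replacement for the PythonScript version: it works on a plain string rather than the editor buffer, \R is written as \r?\n because Python's re does not support it, and the names DEFINITION and rename_all are made up for illustration.

```python
import re

# Heading line followed by a line that starts with a 4-digit year.
# The dot excludes \n by default in Python, which corresponds to (?-s).
DEFINITION = re.compile(r"===<span id='(\D\d\d\d).+\r?\n(\d{4})")


def rename_all(text):
    # Pass 1: collect (old id, year) pairs from every heading.
    pairs = DEFINITION.findall(text)
    # Pass 2: replace each old id everywhere, including references
    # that appear before the heading that defines them.
    for old_id, year in pairs:
        text = text.replace(old_id, year + "Census")
    return text


sample = (
    "Note: [[#X382]].\n"
    "Note: [[#X391]].\n"
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
    "===<span id='X391'>X391</span>===\n"
    "1861 England - Census transcript - Joseph COOMBES - Household\n"
)
print(rename_all(sample))
```

Collecting the pairs before replacing anything is what lets the earlier [[#X382]] references pick up the new name. The second pass is a plain literal replace, so an id that happens to be a substring of a longer token would be rewritten too.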
                    
                    • guy038G
                      guy038
                      last edited by guy038

                      Hi, @john-slee and All,

                      No problem. Let’s give it a new try !

                      From the initial lines :

                      ===<span id='X382'>X382</span>===
                      1841 England - Census transcript - John COOMBE - Household
                      

                      The regex S/R, described in my previous post, changed it as below :

                      ===<span id='1841Census'>X382</span>===
                      1841 England - Census transcript - John COOMBE - Household
                      

                      But maybe you would expect the following result, where the string X382 is changed both inside the single quotes and outside :

                      ===<span id='1841Census'>1841Census</span>===
                      1841 England - Census transcript - John COOMBE - Household
                      

                      If so, use this new regex S/R, below :

                      SEARCH (?-si)(===<span id=).+?(?=</span>.+\R(\d+))

                      REPLACE \1'\2Census'>\2Census

                      Which can be used with either the Replace or the Replace All button !
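
A rough Python port of this last S/R is sketched below, mainly to make its scope visible. It assumes \r?\n as a stand-in for \R, since Python's re does not support \R, and the names HEADING and rename_headings are hypothetical. The sample includes a Note: line so that the effect on references elsewhere in the text can be checked.

```python
import re

# Group 1 keeps the heading prefix; group 2 is the number found at
# the start of the next line by the look-ahead.
HEADING = re.compile(r"(===<span id=).+?(?=</span>.+\r?\n(\d+))")


def rename_headings(text):
    # Rewrites both the quoted id and the visible text of the span.
    return HEADING.sub(r"\1'\2Census'>\2Census", text)


sample = (
    "Note: [[#X382]].\n"
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
)
print(rename_headings(sample))
```

In this sketch the heading line is fully renamed, while the [[#X382]] reference on the first line is untouched, because the pattern only ever matches heading lines.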

                      Cheers,

                      guy038

                      P.S. :

                      Of course, Alan’s Python script is more powerful !

                      • Alan KilbornA
                        Alan Kilborn @guy038
                        last edited by

                        @guy038 said in Regex Macro: Creating Macro to Replace variable text with text determined by text on next line:

                        Alan’s python script is more powerful !

                        Indeed, especially when one notices in the data that some of the references come before the definitions!

                        • John SleeJ
                          John Slee @Alan Kilborn
                          last edited by John Slee

                          @Alan-Kilborn Thank you so much, Alan. It’s years since I did any proper programming, though that was a previous occupation. Guess I’m going to have to teach myself to use Python!
