
Regex Macro: Creating Macro to Replace variable text with text determined by text on next line

regexmacro
11 Posts 3 Posters 2.2k Views
  • John Slee
    last edited by Mar 16, 2020, 11:58 PM

    Can anyone help me?

    I want to create a macro using Regular Expression Find and Replace to find four consecutive variable characters in the format \D\d\d\d and replace them throughout the text with four digits (\d\d\d\d) followed by the text “Census”.
    When I tried to do this by recording a macro, the macro saved the literal text X382 and 1841Census instead of treating them as variables.

    e.g. in the following text I want to replace all occurrences of X382 by 1841Census and all occurrences of X391 by 1861Census

    === Census===
    : 06 JUN 1841.
    '''Ridge Town, Bondleigh, Devon, England'''.
    <ref name="ref_3">
    Source: [[#S424]] Class: HO107; Piece: 250; Book: 10; Civil Parish: Bondleigh; County: Devon; Enumeration District: 12; Folio: 8; Page: 11; Line: 1; GSU roll: 241323.
    </ref>
    Note: [[#X382]].
    : 07 APR 1861.
    '''51, South Street, Tormoham, Devon, England'''.
    <ref name="ref_4">
    Source: [[#S421]] Class: RG 9; Piece: 1412; Folio: 59; Page: 57; GSU roll: 542809; Enumeration District: 13.
    </ref>
    Note: [[#X391]].
    .
    .

    .
    ===<span id='X382'>X382</span>===
    1841 England - Census transcript - John COOMBE - Household
    Class: HO107; Piece: 250; Book: 10; Civil Parish: Bondleigh; County: Devon; Enumeration District: 12; Folio: 8; Page: 11; Line: 1; GSU roll: 241323.
    Ridge Town, Bondleigh, Devon, England

    Name AgeM AgeF Occupation BiC SIF
    John COOMBE 60 Farmer Y
    Joseph COOMBE 28 Y
    Elizabeth COOMBE 22 Y
    George COOMBE 16 Y
    Richard COOMBE 14 Y
    Christopher COOMBE 11 Y
    Francis COOMBE 2 Y

    ===<span id='X391'>X391</span>===
    1861 England - Census transcript - Joseph COOMBES - Household
    Class: RG 9; Piece: 1412; Folio: 59; Page: 57; GSU roll: 542809; Enumeration District: 13.

    • John Slee
      last edited by Mar 17, 2020, 12:03 AM

      I would use the macro twice - once for 1841Census, then again for 1861Census (unless you can create a macro that will replace all occurrences iteratively!)

      • guy038
        last edited by Mar 17, 2020, 4:08 AM

        Hello, @john-slee and All,

        You said :

        e.g. in the following text I want to replace all occurrences of X382 by 1841Census and all occurrences of X391 by 1861Census

        If so, the following regex S/R processes all values in one go :

        SEARCH (?-i)\bX3((82)|91)\b

        REPLACE 18(?{2}4:6)1Census

        Notes :

        • The part (?-i) forces a case-sensitive search

        • The part X3 looks for the string X3, with that exact case

        • The part ((82)|91) means that the string X3 must be followed by either the number 82 or the number 91

        • The inner parentheses define group 2. So, if the match is X391, group 2 is not defined

        • The two \b assertions require the string X… to be surrounded by non-word characters (or by the start or end of the text)

        • In replacement :

          • It first writes the string 18

          • Depending on group 2, the conditional (?{2}4:6) writes the digit 4 (group 2 defined) or the digit 6 (group 2 undefined)

          • Finally, it writes the string 1Census
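To check the logic of this S/R outside Notepad++, here is a minimal Python sketch using the standard re module. It is an approximation, not Notepad++'s Boost engine: Python's re has no conditional replacement syntax like (?{2}4:6), so a replacement function is swapped in that writes 4 when group 2 took part in the match and 6 otherwise. The (?-i) modifier is dropped because re is case-sensitive by default. The function name census_ids and the sample text are illustrative only.

```python
import re

# Same pattern as the SEARCH field, minus (?-i): re is case-sensitive by default.
PATTERN = re.compile(r"\bX3((82)|91)\b")


def census_ids(text):
    """Replace X382 with 1841Census and X391 with 1861Census (illustrative helper)."""
    # m.group(2) is None when the 91 alternative matched, mirroring (?{2}4:6).
    return PATTERN.sub(lambda m: "18" + ("4" if m.group(2) else "6") + "1Census", text)


sample = "Note: [[#X382]].\nNote: [[#X391]].\nX3821 is left alone."
print(census_ids(sample))
```

The \b assertions are what keep a longer string such as X3821 from being partly rewritten.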


        Of course, if you need to run this regex S/R very often, it would be sensible to record this S/R in a macro, and use it with a shortcut ;-))

        Best Regards,

        guy038

        • John Slee @guy038
          last edited by Mar 17, 2020, 10:19 AM

          @guy038 Thanks for your suggestion. However, I should have made it clearer that the text to be replaced varies and is not predictable. I therefore need to replace any text between ===<span id=' and '> with the relevant text of the form nnnnCensus, where nnnn is taken from the first four digits of the following line.

          • guy038
            last edited by Mar 17, 2020, 1:31 PM

            Hello, @john-slee and All,

            Ah… I’m sorry ! I should have examined your text more carefully. No trouble, there’s a solution, anyway !

            If you use the Replace All button exclusively, here is the right regex S/R, which may also be recorded in a macro :

            SEARCH (?-si)===<span id='\K.+?(?='.+\R(\d+))

            REPLACE \1Census

            Notes :

            • The (?-si) in-line modifiers force the regex engine :

              • To consider the dot . as matching a single standard character, and not an EOL character

              • To process the S/R in a case-sensitive way

            • Then, ===<span id=' matches the identical literal string ===<span id='

            • The special \K syntax discards everything matched so far, so the reported match (the text to be replaced) starts at the current position

            • Therefore, the part .+? matches the shortest range of standard characters (our string X###)

            • With the condition, due to the look-ahead structure (?=.........), that it must be followed by :

              • A single quote and some standard characters '.+

              • Followed by the line-break \R of the current line

              • Followed by some digits, stored as group 1 by the embedded parentheses (\d+)

            • In the replacement \1Census, the match (string X###) is replaced with the number located at the beginning of the next line, followed by the string Census
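For experimenting outside Notepad++, the Replace All behaviour of this S/R can be approximated with Python's re module. This is a sketch under stated assumptions, not the Notepad++ engine: Python's re supports neither \K nor \R, so it swaps in a fixed-width look-behind for ===<span id=' and uses \r?\n for the line break. (?-si) is not needed because those are re's defaults. The sample text is abridged from the data in the first post.

```python
import re

# Look-behind stands in for "===<span id='\K" (re has no \K);
# \r?\n stands in for \R; group 1 is captured inside the look-ahead.
PATTERN = re.compile(r"(?<====<span id=').+?(?='.+\r?\n(\d+))")

sample = (
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
    "===<span id='X391'>X391</span>===\n"
    "1861 England - Census transcript - Joseph COOMBES - Household\n"
)

# Roughly equivalent to clicking Replace All once: only the id inside the quotes changes.
result = PATTERN.sub(r"\1Census", sample)
print(result)
```

As in the Notepad++ version, the X### text between > and </span> is left untouched.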


            Now, if you want to see the result of each replacement, step by step, use this alternate syntax :

            SEARCH (?-si)(===<span id=').+?(?='.+\R(\d+))

            REPLACE \1\2Census

            Notes :

            • The \K syntax is not present. So, the literal string ===<span id=' is itself enclosed in parentheses, as group 1 (===<span id='), and is re-used in the replacement. The (\d+) now represents group 2

            • In the replacement, \1 first rewrites the beginning of the current line, followed by \2, the number at the beginning of the next line, and then the string Census
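The capture-group form can also be sketched in Python to see each replacement as it happens. This is an approximation, not the Notepad++ engine: \R is replaced by \r?\n, and the hypothetical helper replace_step_by_step mimics repeated clicks on the Replace button by searching from a moving position. Resuming after the replaced text matters, because the new id (for example 1841Census) would otherwise match the pattern again.

```python
import re

# Group 1 keeps the literal prefix; group 2 is the number at the start of the next line.
PATTERN = re.compile(r"(===<span id=').+?(?='.+\r?\n(\d+))")


def replace_step_by_step(text):
    """Hypothetical helper: one replacement per 'click', always moving forward."""
    pos = 0
    while (m := PATTERN.search(text, pos)) is not None:
        new = m.expand(r"\1\2Census")  # same template as the REPLACE field
        text = text[:m.start()] + new + text[m.end():]
        print("step:", new)  # show each replacement as it happens
        pos = m.start() + len(new)  # resume after the replaced text, like the caret
    return text


sample = (
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
    "===<span id='X391'>X391</span>===\n"
    "1861 England - Census transcript - Joseph COOMBES - Household\n"
)
print(replace_step_by_step(sample))
```

The final text is the same as with the \K version; only the way the matches are reported differs.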

            Cheers,

            guy038

            • John Slee @guy038
              last edited by Mar 18, 2020, 4:01 PM

              @guy038 Thanks again. However (again - I hope I’m not pushing my luck by asking once more!) this only replaces the first instance of each string. I need it to replace every occurrence of each string in the document.
              i.e. in this document, every occurrence of X382 should be replaced by 1841Census and every occurrence of X391 by 1861Census.
              Is this possible?

              • Alan Kilborn @John Slee
                last edited by Mar 18, 2020, 5:22 PM

                @John-Slee

                You have more “logic” in your problem statement than a regular expression can handle, I’m afraid.

                In such cases you should probably turn to a scripting plugin, e.g. PythonScript, which can work with regular expressions but can also incorporate more logic around them.

                • Alan Kilborn
                  last edited by Mar 18, 2020, 5:49 PM

                  @John-Slee

                  Maybe something like this:

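                  from Npp import editor  # PythonScript's editor object
                  search_repl_pairs_list = []  # (old id, year) pairs, e.g. ('X382', '1841')
                  editor.research(r"(?-s)===<span id='(\D\d\d\d).+\R(\d{4})", lambda m: search_repl_pairs_list.append((m.group(1), m.group(2))))
                  for tup in search_repl_pairs_list: editor.replace(tup[0], tup[1] + 'Census')  # plain-text replace of every occurrence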
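The script above relies on the PythonScript plugin's editor object. As a rough stand-alone sketch (plain Python 3, no Notepad++), the same two-pass idea looks like this: first collect every id with the year on the line after its definition, then replace all occurrences. Because the whole mapping is built before any replacement, references that appear before their definitions are handled too. The function name replace_census_ids is hypothetical, and the \b word boundaries are an addition not in the script above, assuming ids are word characters like X382, so that X382 cannot clobber part of a longer string such as X3821.

```python
import re

# Sketch only: plain Python stand-in, not the PythonScript API.
# Pass 1 pattern: the id inside ===<span id='...'> and the year starting the next line.
DEFINITION = re.compile(r"===<span id='(\D\d\d\d)'.*\r?\n(\d{4})")


def replace_census_ids(text):
    """Hypothetical helper: replace every occurrence of each defined id with <year>Census."""
    mapping = {m.group(1): m.group(2) + "Census" for m in DEFINITION.finditer(text)}
    if not mapping:
        return text
    # Pass 2: one alternation of all ids, bounded by \b (assumes ids are word characters).
    ids = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return ids.sub(lambda m: mapping[m.group(1)], text)


sample = (
    "Note: [[#X382]].\n"
    "Note: [[#X391]].\n"
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
    "===<span id='X391'>X391</span>===\n"
    "1861 England - Census transcript - Joseph COOMBES - Household\n"
)
print(replace_census_ids(sample))
```

Doing all replacements in a single sub call with a lookup table also avoids one replacement's output being rewritten by a later one.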
                  
                  • guy038
                    last edited by guy038 Mar 18, 2020, 6:05 PM Mar 18, 2020, 6:02 PM

                    Hi, @john-slee and All,

                    No problem. Let’s give it a new try !

                    From the initial lines :

                    ===<span id='X382'>X382</span>===
                    1841 England - Census transcript - John COOMBE - Household
                    

                    The regex S/R, described in my previous post, changed it as below :

                    ===<span id='1841Census'>X382</span>===
                    1841 England - Census transcript - John COOMBE - Household
                    

                    But maybe you expected the following result, where the string X382 is changed both inside the single quotes and outside them :

                    ===<span id='1841Census'>1841Census</span>===
                    1841 England - Census transcript - John COOMBE - Household
                    

                    If so, use this new regex S/R, below :

                    SEARCH (?-si)(===<span id=).+?(?=</span>.+\R(\d+))

                    REPLACE \1'\2Census'>\2Census

                    This one can be used with either the Replace or the Replace All button !
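A quick way to check what this S/R produces is the Python sketch below. It is an approximation using the standard re module rather than Notepad++'s Boost engine: \R becomes \r?\n, and (?-si) is dropped because it matches re's defaults. It should reproduce the before/after lines shown earlier in this post.

```python
import re

# Group 1: "===<span id=" ; group 2: the number at the start of the next line.
PATTERN = re.compile(r"(===<span id=).+?(?=</span>.+\r?\n(\d+))")

before = (
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
)

# Same template as the REPLACE field: rebuilds both the quoted id and the visible text.
after = PATTERN.sub(r"\1'\2Census'>\2Census", before)
print(after)
```

Because the match runs up to </span>, both copies of X382 on the heading line are rewritten in a single replacement.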

                    Cheers,

                    guy038

                    P.S. :

                    Of course, Alan’s Python script is more powerful !

                    • Alan Kilborn @guy038
                      last edited by Mar 18, 2020, 6:17 PM

                      @guy038 said in Regex Macro: Creating Macro to Replace variable text with text determined by text on next line:

                      Alan’s python script is more powerful !

                      Indeed, especially when one notices in the data that some of the references come before the definitions!

                      • John Slee @Alan Kilborn
                        last edited by John Slee Mar 18, 2020, 10:38 PM Mar 18, 2020, 10:37 PM

                        @Alan-Kilborn Thank you so much, Alan. It’s years since I did any proper programming, though that was a previous occupation. Guess I’m going to have to teach myself to use Python!

                        The Community of users of the Notepad++ text editor.