Regex Macro: Creating Macro to Replace variable text with text determined by text on next line



  • Can anyone help me?

    I want to create a macro, using a regular-expression Find and Replace, that finds four consecutive variable characters of the form \D\d\d\d (e.g. X382) and replaces them throughout the text with four numeric characters \d\d\d\d followed by the text “Census”.
    When I tried to achieve this by recording a Macro, the Macro saved the actual text X382 and 1841Census instead of selecting the text as variables.

    e.g. In the following text I want to Replace all occurrences of X382 by 1841Census and all occurrences of X391 by 1861Census

    === Census===
    : 06 JUN 1841.
    '''Ridge Town, Bondleigh, Devon, England'''.
    <ref name="ref_3">
    Source: [[#S424]] Class: HO107; Piece: 250; Book: 10; Civil Parish: Bondleigh; County: Devon; Enumeration District: 12; Folio: 8; Page: 11; Line: 1; GSU roll: 241323.
    </ref>
    Note: [[#X382]].
    : 07 APR 1861.
    '''51, South Street, Tormoham, Devon, England'''.
    <ref name="ref_4">
    Source: [[#S421]] Class: RG 9; Piece: 1412; Folio: 59; Page: 57; GSU roll: 542809; Enumeration District: 13.
    </ref>
    Note: [[#X391]].
    .
    .

    .
    ===<span id='X382'>X382</span>===
    1841 England - Census transcript - John COOMBE - Household
    Class: HO107; Piece: 250; Book: 10; Civil Parish: Bondleigh; County: Devon; Enumeration District: 12; Folio: 8; Page: 11; Line: 1; GSU roll: 241323.
    Ridge Town, Bondleigh, Devon, England

    Name AgeM AgeF Occupation BiC SIF
    John COOMBE 60 Farmer Y
    Joseph COOMBE 28 Y
    Elizabeth COOMBE 22 Y
    George COOMBE 16 Y
    Richard COOMBE 14 Y
    Christopher COOMBE 11 Y
    Francis COOMBE 2 Y

    ===<span id='X391'>X391</span>===
    1861 England - Census transcript - Joseph COOMBES - Household
    Class: RG 9; Piece: 1412; Folio: 59; Page: 57; GSU roll: 542809; Enumeration District: 13.



  • I would use the macro twice - once for 1841Census, then again for 1861Census (unless you can create a macro that will replace all occurrences iteratively!)



  • Hello, @john-slee and All,

    You said :

    e.g. In the following text I want to Replace all occurrences of X382 by 1841Census and all occurrences of X391 by 1861Census

    If so, the correct regex syntax which will process all values in one go, is :

    SEARCH (?-i)\bX3((82)|91)\b

    REPLACE 18(?{2}4:6)1Census

    Notes :

    • The part (?-i) forces a case-sensitive search

    • The part X3 looks for the string X3, with that exact case

    • The part ((82)|91) means that the string X3 must be followed by either the number 82 or the number 91

    • The inner parentheses form group 2. So, if the regex matches the string X391, group 2 is not defined

    • The two \b assertions force the string X… to be surrounded by non-word chars in order to match

    • In replacement :

      • It first writes the string 18

      • Depending on whether group 2 is defined, the part (?{2}4:6) writes the digit 4 or the digit 6

      • It finally writes the string 1Census
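For readers who want to try the same logic outside Notepad++, here is a minimal sketch in plain Python. It assumes the standard `re` module, which has no equivalent of the Boost conditional replacement `(?{2}4:6)`, so a replacement function (the hypothetical `to_census`) makes the choice instead. The sample text is made up from the lines quoted in the question.

```python
import re

# Python's re has no Boost-style conditional replacement (?{2}4:6),
# so a replacement function makes the same choice explicitly.
PATTERN = re.compile(r"\bX3((82)|91)\b")  # case-sensitive by default, like (?-i)

def to_census(match):
    # Group 2 is set only when the matched text was X382.
    decade = "4" if match.group(2) is not None else "6"
    return "18" + decade + "1Census"

sample = "Note: [[#X382]].\nNote: [[#X391]].\nSource: [[#S424]]"
result = PATTERN.sub(to_census, sample)
print(result)
```

The source reference S424 is left alone because the pattern only accepts X382 or X391.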


    Of course, if you need to run this regex S/R very often, it would be sensible to record this S/R in a macro, and use it with a shortcut ;-))

    Best Regards,

    guy038



  • @guy038 Thanks for your suggestion. However, I should have made it clearer that the text to be replaced can vary and is not predictable. I therefore need to replace any text between ===<span id=' and '> with the relevant text of the form nnnnCensus, with nnnn being taken from the first four digits of the following line.



  • Hello, @john-slee and All,

    Ah… I’m sorry ! I should have examined your text more carefully. No trouble, there’s a solution, anyway !

    If you use the Replace All button exclusively, here is the right regex S/R, which may also be used in a macro :

    SEARCH (?-si)===<span id='\K.+?(?='.+\R(\d+))

    REPLACE \1Census

    Notes :

    • The (?-si) in-line modifiers force the regex engine :

      • To treat any dot . symbol as matching a single standard char, and never an EOL character

      • To process the S/R in a case-sensitive way

    • Then, the ===<span id=' matches the identical string ===<span id='

    • The special \K syntax discards everything matched so far from the final match, so only the text matched after it is replaced

    • Therefore, the part .+? matches the shortest range of standard characters… ( our string X### )

    • With the condition, due to the look-ahead structure (?=.........), that it must be followed by :

      • A single quote and some standard characters '.+

      • Followed by the line-break \R of the current line

      • Followed by some digit characters, stored as group 1, due to the embedded parentheses (\d+)

    • In the replacement \1Census, the match ( string X### ) is replaced with the number located at the beginning of the next line, followed by the string Census
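The notes above can be tried in plain Python with a small sketch. It rests on two assumptions, because the standard `re` module lacks both `\K` and `\R`: a fixed-width look-behind stands in for `\K`, and `\r?\n` stands in for `\R`. The names `PREFIX` and `PATTERN` are only illustrative.

```python
import re

# Python's re lacks \K and \R, so this sketch swaps in a fixed-width
# look-behind for the prefix and \r?\n for the line break.
PREFIX = "===<span id='"
PATTERN = re.compile(
    "(?<=" + re.escape(PREFIX) + ")"   # must be preceded by ===<span id='
    r".+?"                             # the id itself, e.g. X382
    r"(?='.+\r?\n(\d+))"               # quote, rest of line, then the next line's digits
)

sample = (
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
)
result = PATTERN.sub(r"\1Census", sample)
print(result)
```

Only the id inside the single quotes changes. The X382 between `>` and `</span>` is not preceded by the prefix, so it stays as it is.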


    Now, if you want to see the result of each replacement immediately, step by step, use this alternate syntax :

    SEARCH (?-si)(===<span id=').+?(?='.+\R(\d+))

    REPLACE \1\2Census

    Notes :

    • The \K syntax is not present. So, the literal string ===<span id=' is itself enclosed in parentheses, as group 1 (===<span id='), and is re-used in the replacement. And the (\d+) is now group 2

    • The replacement first rewrites the beginning of the current line \1, followed by the number \2 found at the beginning of the 2nd line
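As a rough illustration of the difference between a single Replace and Replace All with this grouped syntax, the sketch below uses Python's `re` module, with `\r?\n` assumed as a stand-in for `\R`. The `count=1` argument is only an approximation of pressing Replace once; it is not how Notepad++ works internally.

```python
import re

# Assumption: \r?\n stands in for Boost's \R, which Python's re lacks.
PATTERN = re.compile(r"(===<span id=').+?(?='.+\r?\n(\d+))")

sample = (
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript\n"
    "===<span id='X391'>X391</span>===\n"
    "1861 England - Census transcript\n"
)

# count=1 mimics a single press of Replace: only the first heading changes.
first_step = PATTERN.sub(r"\1\2Census", sample, count=1)
# Without count, every heading is rewritten, like Replace All.
all_steps = PATTERN.sub(r"\1\2Census", sample)
print(first_step)
print(all_steps)
```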

    Cheers,

    guy038



  • @guy038 Thanks again. However (again - I hope I’m not pushing my luck by asking once more!) this only replaces the first instance of the string. I need it to replace every occurrence of each string in the document.
    i.e. In this document, every time X382 occurs it should be replaced by 1841Census and every X391 should be replaced by 1861Census.
    Is this possible?



  • @John-Slee

    You have more “logic” in your problem statement than a regular expression can handle, I’m afraid.

    In such cases you should probably turn to a scripting plugin, e.g. PythonScript, that can work with regular expression data, but can also incorporate more logic into it.



  • @John-Slee

    Maybe something like this:

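    from Npp import editor  # provided by the PythonScript plugin
    search_repl_pairs_list = []
    # Collect (id, year) pairs, e.g. ('X382', '1841'), from each heading and the line after it
    editor.research(r"(?-s)===<span id='(\D\d\d\d).+\R(\d{4})", lambda m: search_repl_pairs_list.append((m.group(1), m.group(2))))
    # Replace every occurrence of each id throughout the document
    for tup in search_repl_pairs_list: editor.replace(tup[0], tup[1] + 'Census')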
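The same two-pass idea can be tested outside Notepad++ with a minimal sketch in plain Python. It assumes the standard `re` module in place of PythonScript's `editor` object, and `\r?\n` in place of `\R`. The function name `rename_census_ids` is hypothetical, and the sample is cut down from the text in the question.

```python
import re

# Pass 1 pattern: a heading id such as X382, then the first four digits of the next line.
DEFINITION = re.compile(r"===<span id='(\D\d\d\d).+\r?\n(\d{4})")

def rename_census_ids(text):
    """Replace every occurrence of each heading id with '<year>Census'."""
    pairs = DEFINITION.findall(text)  # list of (id, year) tuples
    for old_id, year in pairs:
        # Pass 2: plain-text replacement across the whole document,
        # so references that appear before the heading are updated too.
        text = text.replace(old_id, year + "Census")
    return text

sample = (
    "Note: [[#X382]].\n"
    "Note: [[#X391]].\n"
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
    "===<span id='X391'>X391</span>===\n"
    "1861 England - Census transcript - Joseph COOMBES - Household\n"
)
print(rename_census_ids(sample))
```

Collecting all the pairs first and replacing afterwards is what lets the Note lines change even though they come before the headings that define them.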
    


  • Hi, @john-slee and All,

    No problem. Let’s give it a new try !

    From the initial lines :

    ===<span id='X382'>X382</span>===
    1841 England - Census transcript - John COOMBE - Household
    

    The regex S/R, described in my previous post, changed it as below :

    ===<span id='1841Census'>X382</span>===
    1841 England - Census transcript - John COOMBE - Household
    

    But maybe you would expect the following result, where the string X382 is changed both inside the single quotes and outside :

    ===<span id='1841Census'>1841Census</span>===
    1841 England - Census transcript - John COOMBE - Household
    

    If so, use this new regex S/R, below :

    SEARCH (?-si)(===<span id=).+?(?=</span>.+\R(\d+))

    REPLACE \1'\2Census'>\2Census

    Which can be used with either the Replace or the Replace All button !
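A quick way to check this last S/R away from the editor is the sketch below, again in plain Python. It assumes `\r?\n` in place of `\R`, and it drops the `(?-si)` modifiers because Python's `re` is already case-sensitive and does not let `.` match a newline by default.

```python
import re

# Assumption: \r?\n replaces Boost's \R; (?-si) is dropped because those
# are already the default behaviours of Python's re.
PATTERN = re.compile(r"(===<span id=).+?(?=</span>.+\r?\n(\d+))")

sample = (
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
)
result = PATTERN.sub(r"\1'\2Census'>\2Census", sample)
print(result)
```

Here the whole span from the opening quote up to `</span>` is matched and rebuilt, which is why both copies of X382 are rewritten in one step.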

    Cheers,

    guy038

    P.S. :

    Of course, Alan’s Python script is more powerful !



  • @guy038 said in Regex Macro: Creating Macro to Replace variable text with text determined by text on next line:

    Alan’s python script is more powerful !

    Indeed, especially when one notices in the data that some of the references come before the definitions!



  • @Alan-Kilborn Thank you so much, Alan. It’s years since I did any proper programming, though that was a previous occupation. Guess I’m going to have to teach myself to use Python!

