Regex Macro: Creating Macro to Replace variable text with text determined by text on next line

John Slee

Can anyone help me?

I want to create a macro using regular-expression Find and Replace to find four consecutive variable characters in the format \D\d\d\d and replace them throughout the text with four numeric characters \d\d\d\d with the text “Census” appended to them.
When I tried to achieve this by recording a Macro, the Macro saved the actual text X382 and 1841Census instead of selecting the text as variables.

e.g. In the following text I want to Replace all occurrences of X382 by 1841Census and all occurrences of X391 by 1861Census

=== Census===
: 06 JUN 1841.
'''Ridge Town, Bondleigh, Devon, England'''.
<ref name="ref_3">
Source: [[#S424]] Class: HO107; Piece: 250; Book: 10; Civil Parish: Bondleigh; County: Devon; Enumeration District: 12; Folio: 8; Page: 11; Line: 1; GSU roll: 241323.
</ref>
Note: [[#X382]].
: 07 APR 1861.
'''51, South Street, Tormoham, Devon, England'''.
<ref name="ref_4">
Source: [[#S421]] Class: RG 9; Piece: 1412; Folio: 59; Page: 57; GSU roll: 542809; Enumeration District: 13.
</ref>
Note: [[#X391]].
.
.

.
===X382===
1841 England - Census transcript - John COOMBE - Household
Class: HO107; Piece: 250; Book: 10; Civil Parish: Bondleigh; County: Devon; Enumeration District: 12; Folio: 8; Page: 11; Line: 1; GSU roll: 241323.
Ridge Town, Bondleigh, Devon, England

Name	AgeM	AgeF	Occupation	BiC
John COOMBE	60		Farmer	Y
Joseph COOMBE	28			Y
Elizabeth COOMBE		22		Y
George COOMBE	16			Y
Richard COOMBE	14			Y
Christopher COOMBE	11			Y
Francis COOMBE	2			Y

===X391===
1861 England - Census transcript - Joseph COOMBES - Household
Class: RG 9; Piece: 1412; Folio: 59; Page: 57; GSU roll: 542809; Enumeration District: 13.

John Slee

I would use the macro twice - once for 1841Census, then again for 1861Census (unless you can create a macro that will replace all occurrences iteratively!)

guy038

Hello, @john-slee and All,

You said :

e.g. In the following text I want to Replace all occurrences of X382 by 1841Census and all occurrences of X391 by 1861Census

If so, the correct regex syntax which will process all values in one go, is :

SEARCH (?-i)\bX3((82)|91)\b

REPLACE 18(?{2}4:6)1Census

Notes :

The part (?-i) forces a case-sensitive search
The part X3 looks for the string X3, with that exact case
The part ((82)|91) means that the string X3 must be followed by either the number 82 or the number 91
The inner parentheses form group 2. So, if the match is the string X391, group 2 is not defined
The two \b assertions force the string X… to be surrounded by non-word chars (or a line boundary) for matching
In replacement :
- It first writes the string 18
- Depending on group 2, the part (?{2}4:6) writes digit 4 if group 2 is defined, otherwise digit 6
- It finally writes the string 1Census
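
For anyone who wants to try the same logic outside Notepad++, here is a minimal Python sketch. Python's re module has no (?{2}4:6) conditional replacement, so a callback function makes the same choice from group 2. The names PATTERN, to_census and sample are made up for this illustration.

```python
import re

# Same search as the Notepad++ regex; Python's re is case-sensitive by
# default, so the (?-i) modifier is not needed here.
PATTERN = re.compile(r"\bX3((82)|91)\b")

def to_census(match):
    # Group 2 is set only when "82" matched, i.e. for X382.
    decade = "4" if match.group(2) else "6"
    return "18" + decade + "1Census"

# Hypothetical sample built from the lines quoted in the question.
sample = "Note: [[#X382]].\nNote: [[#X391]].\n===X382===\n===X391==="
result = PATTERN.sub(to_census, sample)
print(result)
```

This only works because the two identifiers and their years are known in advance; it does not read the year from the text.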

Of course, if you need to run this regex S/R very often, it would be sensible to record this S/R in a macro, and use it with a shortcut ;-))

Best Regards,

guy038

John Slee

@guy038 Thanks for your suggestion. However, I should have made it clearer that the text to be replaced can vary and is not predictable. I therefore need to take whatever text sits between the === markers and replace it with text of the form nnnnCensus, with nnnn being taken from the first four digits of the following line.

guy038

Hello, @john-slee and All,

Ah… I’m sorry ! I should have examined your text more carefully. No trouble, there’s a solution, anyway !

If you use the Replace All button exclusively, here is the right regex S/R, which may also be recorded in a macro :

SEARCH (?-si)===<span id='\K.+?(?='.+\R(\d+))

REPLACE \1Census

Notes :

The (?-si) in-line modifiers force the regex engine :
- To treat any dot . symbol as matching a single standard char, and never an EOL character
- To process the S/R in a case-sensitive way
Then, the part ===<span id=' matches that literal string
The special \K syntax discards everything matched so far, so the reported match starts at the current position
Therefore, the part .+? matches the shortest range of standard characters… ( our string X### )
With the condition, due to the look-ahead structure (?=.........), that it must be followed with :
- A single quote and some standard characters '.+
- Followed with the line-break \R of the current line
- Followed with some digits characters, stored as group 1, due to the embedded parentheses (\d+)
In the replacement \1Census, the match ( string X### ) is replaced with the number, located at beginning of next line, followed with the string Census
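
The same idea can be approximated in plain Python, as a rough sketch only. Python's re module supports neither \K nor \R, so this version swaps in a fixed-width look-behind and \r?\n instead. It assumes headings of the form ===<span id='X382'>X382</span>=== with the year at the start of the next line; ID_IN_HEADING and heading are illustrative names.

```python
import re

# The look-behind (?<=...) stands in for \K: the prefix must be present
# but is not part of the match. Group 1 is captured inside the look-ahead.
ID_IN_HEADING = re.compile(r"(?<====<span id=').+?(?='.+\r?\n(\d+))")

heading = "===<span id='X382'>X382</span>===\n1841 England - Census transcript"
replaced = ID_IN_HEADING.sub(r"\1Census", heading)
print(replaced)
```

Only the id inside the single quotes changes; the visible X382 between the span tags is left as it was.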

Now, if you want to see, at once, the result of each step by step replacement, use this alternate syntax :

SEARCH (?-si)(===<span id=').+?(?='.+\R(\d+))

REPLACE \1\2Census

Notes :

The \K syntax is not present. So, the literal string ===<span id=' is itself enclosed in parentheses, as group 1 (===<span id='), and is re-used in the replacement. And (\d+) becomes group 2
In replacement, it first rewrites the beginning of the current line \1, followed with the number \2 found at the beginning of the next line

Cheers,

guy038

John Slee

@guy038 Thanks again. However (again - I hope I’m not pushing my luck by asking once more!) this only replaces the first instance of the string. I need it to replace every occurrence of each string in the document.
i.e. In this document, every time X382 occurs it should be replaced by 1841Census and every X391 should be replaced by 1861Census.
Is this possible?

Alan Kilborn

@John-Slee

You have more “logic” in your problem statement than a regular expression can handle, I’m afraid.

In such cases you should probably turn to a scripting plugin, e.g. PythonScript, which can work with regular expression data but can also layer more logic on top of it.

Alan Kilborn

@John-Slee

Maybe something like this:

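from Npp import editor
# Collect (old_id, year) pairs from each heading line and the line that follows it.
search_repl_pairs_list = []
editor.research(r"(?-s)===<span id='(\D\d\d\d).+\R(\d{4})", lambda m: search_repl_pairs_list.append((m.group(1), m.group(2))))
# Replace every occurrence of each old id with the matching nnnnCensus text.
for old_id, year in search_repl_pairs_list: editor.replace(old_id, year + 'Census')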
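
To try the two-pass idea without Notepad++, something like the following standalone sketch may help. It is not the PythonScript API: it uses only Python's standard re module on a string, and rename_census_ids, sample and renamed are hypothetical names. It also assumes headings look like ===<span id='X382'>X382</span>=== and adds \b word boundaries, so that X382 inside a longer token is left alone.

```python
import re

def rename_census_ids(text):
    """Replace each heading id (e.g. X382) everywhere with <year>Census."""
    # Pass 1: collect old id -> new text from each heading and the next line.
    heading = re.compile(r"===<span id='(\D\d\d\d)'.*\r?\n(\d{4})")
    mapping = {m.group(1): m.group(2) + "Census" for m in heading.finditer(text)}
    if not mapping:
        return text
    # Pass 2: replace whole-word occurrences of any collected id.
    ids = re.compile(r"\b(?:" + "|".join(map(re.escape, mapping)) + r")\b")
    return ids.sub(lambda m: mapping[m.group(0)], text)

# Hypothetical sample: the references come before the definitions.
sample = (
    "Note: [[#X382]].\n"
    "Note: [[#X391]].\n"
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript - John COOMBE - Household\n"
    "===<span id='X391'>X391</span>===\n"
    "1861 England - Census transcript - Joseph COOMBES - Household\n"
)
renamed = rename_census_ids(sample)
print(renamed)
```

Collecting all the pairs first is what lets the references be rewritten even when they appear above the heading that defines them.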

guy038

Hi, @john-slee and All,

No problem. Let’s give it a new try !

From the initial lines :

===<span id='X382'>X382</span>===
1841 England - Census transcript - John COOMBE - Household

The regex S/R, described in my previous post, changed it as below :

===<span id='1841Census'>X382</span>===
1841 England - Census transcript - John COOMBE - Household

But maybe you expected the following result, where the string X382 is changed both inside the single quotes and outside :

===<span id='1841Census'>1841Census</span>===
1841 England - Census transcript - John COOMBE - Household

If so, use this new regex S/R, below :

SEARCH (?-si)(===<span id=).+?(?=</span>===\R(\d+))

REPLACE \1'\2Census'>\2Census

This works equally well with the Replace or the Replace All button !
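
As a quick check of what a heading-only S/R of this kind does and does not touch, here is a small Python approximation. It is a sketch under assumptions: the heading is taken to have the form ===<span id='X382'>X382</span>===, \r?\n replaces \R, and HEADING, doc and out are illustrative names.

```python
import re

# Group 1 keeps the start of the heading; group 2 (inside the look-ahead)
# is the number at the start of the next line.
HEADING = re.compile(r"(===<span id=).+?(?=</span>===\r?\n(\d+))")

doc = (
    "Note: [[#X382]].\n"
    "===<span id='X382'>X382</span>===\n"
    "1841 England - Census transcript\n"
)
out = HEADING.sub(r"\1'\2Census'>\2Census", doc)
print(out)
```

The heading is rewritten in both places, but the earlier [[#X382]] reference is unchanged, because the regex only ever looks at the heading line and the line after it.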

Cheers,

guy038

P.S. :

Of course, Alan’s Python script is more powerful !

Alan Kilborn

@guy038 said in Regex Macro: Creating Macro to Replace variable text with text determined by text on next line:

Alan’s python script is more powerful !

Indeed, especially when one notices in the data that some of the references come before the definitions!

John Slee

@Alan-Kilborn Thank you so much, Alan. It’s years since I did any proper programming, though that was a previous occupation. Guess I’m going to have to teach myself to use Python!