Regexp multiline text replace when care about line delimiters
Fellow Notepad++ Users,
Could you please help me with the following search-and-replace problem I am having?
I have a list of names from a church graveyard. Each grave has a grave number and a line for each person commemorated there, but the subsequent lines for a stone give no grave number. So I want to match a line that starts with a number, e.g. 24, and then apply that number to the start of each subsequent line for that grave marker (each of the subsequent lines has a space in the first of the tab-separated fields). Basically, I want to capture the digits at the start of the first line and replace the space at the start of the next line with that number, ideally for subsequent rows too, if possible. So it’s a multiline searching problem, but I care about the line starts/ends.
Here is the data I currently have (anonymised “before” data; I’ve replaced all tabs with \t, line endings with \r\n, and spaces with .):
No\tName\tRelationship\tDate.of.Birth\tDate.of.Death\tAge\tOther.information\r\n
1\tFred.BLOGGS\t.\t.\tMay.13.1800\t93.yrs\tBrotherton\r\n
.\tLiz\twife\t.\tApr.39.1840\t64.yrs\t.\r\n
.\tAnn.DIRGEWALL\t(husband.:.Jay)\t.\tJny.2.1955\t61.yrs\tGable.St\r\n
2\tUnmarked.grave\t.\t.\t.\t.\t.\r\n
3\tUnmarked.grave\t.\t.\t.\t.\t.\r\n
4\tJack.GARDNER\t.\t.\tDec.5.1967\t75.yrs\tGrove.Rd.\r\n
.\tJane\twife\t.\tSep.2.1969\t70.yrs\tBlackpool\r\n
.\tMary.JONES\twife.of.Adam\t.\tJly.4.1930\t.\t.\r\n
5\tHenry.ALBERT\t.\t.\tJny.4.1900\t68.yrs\tAbbeyrange\r\n
.\tLola\twife\t.\tDec.28.1909\t76.yrs\t.\r\n
.\tJack.HARBOR\tson.in.law\t.\tJan.29.1976\t49.yrs\t.\r\n
.\tJulie\twife\t.\tMay.29.1999\t72.yrs\t.\r\n
Here is how I would like that data to look (“after” data):
No\tName\tRelationship\tDate.of.Birth\tDate.of.Death\tAge\tOther.information\r\n
1\tFred.BLOGGS\t.\t.\tMay.13.1800\t93.yrs\tBrotherton\r\n
1\tLiz\twife\t.\tApr.39.1840\t64.yrs\t.\r\n
1\tAnn.DIRGEWALL\t(husband.:.Jay)\t.\tJny.2.1955\t61.yrs\tGable.St\r\n
2\tUnmarked.grave\t.\t.\t.\t.\t.\r\n
3\tUnmarked.grave\t.\t.\t.\t.\t.\r\n
4\tJack.GARDNER\t.\t.\tDec.5.1967\t75.yrs\tGrove.Rd.\r\n
4\tJane\twife\t.\tSep.2.1969\t70.yrs\tBlackpool\r\n
4\tMary.JONES\twife.of.Adam\t.\tJly.4.1930\t.\t.\r\n
5\tHenry.ALBERT\t.\t.\tJny.4.1900\t68.yrs\tAbbeyrange\r\n
5\tLola\twife\t.\tDec.28.1909\t76.yrs\t.\r\n
5\tJack.HARBOR\tson.in.law\t.\tJan.29.1976\t49.yrs\t.\r\n
5\tJulie\twife\t.\tMay.29.1999\t72.yrs\t.\r\n
To accomplish this, I have tried various regexp matches, including
^(\d+?)(\t.*). \t
for matching, but anything using “. matches newline” is greedy and matches the whole lot, no matter what I put on the end (using
\1\2.\1\t
for the replace expression).
Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?
—
moderator added code markdown around text; please don’t forget to use the </> button to mark example text as “code” so that characters don’t get changed by the forum
@I-Hay said in Regexp multiline text replace when care about line delimiters:
To accomplish this, I have tried various regexp matches, including ^(\d+?)(\t.*). \t for matching but anything using “. matches newline” is greedy and matches the whole lot, no matter what I put on the end. (using \1\2.\1\t for the replace expression),
Unfortunately, this did not produce the output I desired, and I’m not sure why. Could you please help me understand what went wrong and help me find the solution?
You were close. Try:
Find what:
(?-s)^(\d+)(\t.*\R) \t
Replace with:
\1\2\1\t
and Replace All repeatedly, until there are no more matches.
The problem with your expression was that it doesn’t do anything to identify the end of a line. If . matches newline is checked, the match goes right through the new line and eats the entire file; if it’s not checked, nothing in your expression can match the new line.
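To see both failure modes outside the editor, here is a minimal sketch that uses Python's `re` module as a stand-in for Notepad++'s regex engine. That substitution is an assumption: the two engines differ in details, so treat this as an illustration rather than an exact reproduction. The sample text is a shortened, hypothetical version of the data above.

```python
import re

# Shortened sample: a real space before the tab on continuation lines.
text = (
    "1\tFred BLOGGS\r\n"
    " \tLiz\r\n"
    " \tAnn DIRGEWALL\r\n"
    "2\tUnmarked grave\r\n"
    "4\tJack GARDNER\r\n"
    " \tJane\r\n"
)

original = r"^(\d+?)(\t.*). \t"

# With ". matches newline" (re.DOTALL), the greedy .* runs to the end of
# the text and backtracks only as far as the LAST place ". \t" can fit.
greedy = re.search(original, text, re.MULTILINE | re.DOTALL)
lines_spanned = greedy.group(0).count("\n")

# Without it, "." cannot step over the line feed, so ". \t" never fits.
no_dotall = re.search(original, text, re.MULTILINE)

print(lines_spanned)  # line breaks swallowed by the single greedy match
print(no_dotall)      # None
```

With dot-matches-newline on, the one match starts at grave 1 and ends on the last continuation line; with it off, there is no match at all.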
The (?-s) in my expression has the same effect as unchecking . matches newline. The \R matches any line ending (CR, LF or CRLF).
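The "Replace All repeatedly" loop can be sketched in Python as follows. Python's `re` module has no `\R`, so this sketch assumes `\r?\n` line endings in its place, and `fill_down` is a hypothetical helper name, not anything provided by Notepad++.

```python
import re

# Same idea as the Find expression, with \r?\n standing in for \R.
# Without re.DOTALL, "." stops at the line feed, like (?-s).
STEP = re.compile(r"^(\d+)(\t.*\r?\n) \t", re.MULTILINE)


def fill_down(text: str) -> tuple[str, int]:
    """Repeat the replacement until nothing matches.

    Returns the filled text and the number of passes that changed it.
    """
    passes = 0
    while True:
        # \g<1> = grave number, \g<2> = rest of the numbered line
        text, count = STEP.subn(r"\g<1>\g<2>\g<1>\t", text)
        if count == 0:
            return text, passes
        passes += 1


sample = "No\tName\r\n1\tFred\r\n \tLiz\r\n \tAnn\r\n2\tUnmarked\r\n"
filled, passes = fill_down(sample)
```

Each pass can only push a number down one line per grave, because the line that has just been numbered was consumed by the match. That is why the number of passes needed equals the longest run of continuation lines.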
Hello, @i-hay, @coises and All,
I found an easy solution to your problem! For that purpose, I used the Edit > Line Operations > Reverse Line Order menu option, which is quite powerful when multi-line regexes are involved!
So, let’s consider your INPUT text :
No Name Relationship Date of Birth Date of Death Age Other information
1 Fred BLOGGS May 13 1800 93 yrs Brotherton
 Liz wife Apr 39 1840 64 yrs
 Ann DIRGEWALL (husband : Jay) Jny 2 1955 61 yrs Gable St
2 Unmarked grave
3 Unmarked grave
4 Jack GARDNER Dec 5 1967 75 yrs Grove Rd
 Jane wife Sep 2 1969 70 yrs Blackpool
 Mary JONES wife of Adam Jly 4 1930
5 Henry ALBERT Jny 4 1900 68 yrs Abbeyrange
 Lola wife Dec 28 1909 76 yrs
 Jack HARBOR son in law Jan 29 1976 49 yrs
 Julie wife May 29 1999 72 yrs
- First, select all the text which needs renumbering
- Use the Edit > Line Operations > Reverse Line Order menu option
=> You should get this temporary OUTPUT text :
 Julie wife May 29 1999 72 yrs
 Jack HARBOR son in law Jan 29 1976 49 yrs
 Lola wife Dec 28 1909 76 yrs
5 Henry ALBERT Jny 4 1900 68 yrs Abbeyrange
 Mary JONES wife of Adam Jly 4 1930
 Jane wife Sep 2 1969 70 yrs Blackpool
4 Jack GARDNER Dec 5 1967 75 yrs Grove Rd
3 Unmarked grave
2 Unmarked grave
 Ann DIRGEWALL (husband : Jay) Jny 2 1955 61 yrs Gable St
 Liz wife Apr 39 1840 64 yrs
1 Fred BLOGGS May 13 1800 93 yrs Brotherton
No Name Relationship Date of Birth Date of Death Age Other information
- Now, move back to the beginning of the reversed text
- Open the Replace dialog (Ctrl + H)
- Uncheck all the box options
- FIND (?s)^\x20(?=(?:.+?)^(\d+))
- REPLACE \1
- Select the Regular expression search mode
- Click, once only, on the Replace All button
=> Your temporary text is then changed into :
5 Julie wife May 29 1999 72 yrs
5 Jack HARBOR son in law Jan 29 1976 49 yrs
5 Lola wife Dec 28 1909 76 yrs
5 Henry ALBERT Jny 4 1900 68 yrs Abbeyrange
4 Mary JONES wife of Adam Jly 4 1930
4 Jane wife Sep 2 1969 70 yrs Blackpool
4 Jack GARDNER Dec 5 1967 75 yrs Grove Rd
3 Unmarked grave
2 Unmarked grave
1 Ann DIRGEWALL (husband : Jay) Jny 2 1955 61 yrs Gable St
1 Liz wife Apr 39 1840 64 yrs
1 Fred BLOGGS May 13 1800 93 yrs Brotherton
No Name Relationship Date of Birth Date of Death Age Other information
- Finally, run the Edit > Line Operations > Reverse Line Order menu option again
Here we are ! We get your expected OUTPUT text, below :
No Name Relationship Date of Birth Date of Death Age Other information
1 Fred BLOGGS May 13 1800 93 yrs Brotherton
1 Liz wife Apr 39 1840 64 yrs
1 Ann DIRGEWALL (husband : Jay) Jny 2 1955 61 yrs Gable St
2 Unmarked grave
3 Unmarked grave
4 Jack GARDNER Dec 5 1967 75 yrs Grove Rd
4 Jane wife Sep 2 1969 70 yrs Blackpool
4 Mary JONES wife of Adam Jly 4 1930
5 Henry ALBERT Jny 4 1900 68 yrs Abbeyrange
5 Lola wife Dec 28 1909 76 yrs
5 Jack HARBOR son in law Jan 29 1976 49 yrs
5 Julie wife May 29 1999 72 yrs
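For anyone who wants to try the reverse, replace, reverse idea outside Notepad++, the steps above can be sketched in Python. This is only an approximation under stated assumptions: Python's `re` module replaces the editor's regex engine, `(?m)` is added because `^` does not match at every line start by default in Python, and `fill_grave_numbers` is a hypothetical helper name.

```python
import re

# Same look-ahead as the FIND expression; re.MULTILINE is needed in
# Python so that ^ matches at the start of every line.
FILL = re.compile(r"^\x20(?=(?:.+?)^(\d+))", re.DOTALL | re.MULTILINE)


def fill_grave_numbers(text: str) -> str:
    """Reverse the lines, fill each leading space, reverse back."""
    lines = text.splitlines()
    reversed_text = "\n".join(reversed(lines))   # Reverse Line Order
    filled = FILL.sub(r"\1", reversed_text)      # single Replace All
    return "\n".join(reversed(filled.splitlines()))  # reverse again


sample = (
    "No\tName\n"
    "1\tFred BLOGGS\n"
    " \tLiz\n"
    " \tAnn DIRGEWALL\n"
    "2\tUnmarked grave\n"
    "3\tUnmarked grave\n"
    "4\tJack GARDNER\n"
    " \tJane\n"
)
result = fill_grave_numbers(sample)
```

Because the look-ahead consumes nothing, every continuation line finds its grave number in the same pass, which is why a single Replace All is enough here. Note that this sketch normalises line endings to `\n` and drops the trailing line break.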
Some details about the search regex :
- First, the (?s) syntax is an in-line modifier which has the same effect as checking the ". matches newline" option of the Replace dialog
- Then, the regex just looks for a space character, \x20, at the beginning of a line (^), ONLY IF it is followed by the text described in the look-ahead structure (?=..........) coming next
- This structure looks, itself, for any characters, even new-line chars, in a non-capturing group (?:....), up to the nearest line (.+?) beginning with a number (^\d+), which is stored as group 1 because of the embedded parentheses (\d+)
- In the replacement, the space character (\x20) is then replaced with the contents of group 1, which is our desired number !
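The same breakdown can be written as a commented pattern. The sketch below uses Python's verbose regex mode purely for annotation; the flags are passed as arguments instead of the in-line `(?s)`, and `re.MULTILINE` is an addition that Python needs for `^` to match at each line start. The short reversed sample is hypothetical.

```python
import re

pattern = re.compile(
    r"""
    ^ \x20          # a single space at the very start of a line
    (?=             # look-ahead: checked, but nothing is consumed
        (?: .+? )   # lazily skip ahead, across line breaks,
        ^ (\d+)     # to the nearest line starting with digits (group 1)
    )
    """,
    re.VERBOSE | re.DOTALL | re.MULTILINE,
)

# Already-reversed sample: continuation lines come before their grave.
reversed_text = " \tJulie\n \tLola\n5\tHenry ALBERT\n \tJane\n4\tJack GARDNER"

m = pattern.search(reversed_text)
first = (m.start(), m.end(), m.group(1))
all_numbers = [hit.group(1) for hit in pattern.finditer(reversed_text)]
```

Each match is exactly one character wide (the leading space), yet group 1 already holds the number found further down, so the replacement only ever touches that space.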
Best Regards,
guy038