Append third row of text to a find replace?

Matthew Suhajda

I have 1k+ files that I need to find foo in and make it foo&3rdrow. Least complex way to make this happen without hand massaging each file? Thanks!

guy038

Hello @Matthew-suhajda,

I’m trying to guess what you want; a literal example would be welcome !

For instance, from the initial text, below :

Line foo 1
Line 2
Line 3
Line 4
foo Line 5 
Line 6
Line 7 foo

are you expecting the following text ?

Line foo&Line3 1
Line 2
Line 3
Line 4
foo&Line3 Line 5 
Line 6
Line 7 foo&Line3

If so, I will be able to post a solution, next time, which does the job, with two consecutive regex search/replacements !

See you later,

Best Regards,

guy038

Matthew Suhajda

All 1k files have a date and time stamp at line 3. I need to append that stamp to several places in each file before proceeding to clean up the rest of the data and combine all the files. So more like

Line 1
Line 2
Line 3 1/1/2008 14:53:36
Line 4 foo
Line 5 dsafoj
Line 6 adsaf foo
Line 7
Line 8 12341234 foo sdfsd

I think your example has it correct, but hopefully the above illustrates the case a tad better.

guy038

Hi, @Matthew-suhajda,

OK for the time stamp, in line 3 of your files, but you didn’t say what text you expect after replacement !

I mean : should the generic foo expression, located, in your example, in lines 4, 6 and 8, be replaced with :

A : foo&1/1/2008 14:53:36 , with a literal & between ?
B : foo1/1/2008 14:53:36 , simply attached ?
C : foo 1/1/2008 14:53:36, with a space separator ?
D : Other Case ?

Cheers,

guy038

Matthew Suhajda

A space between would be perfect.

Scott Sumner

Note: Pasted as an image because of the stupid spam filter!! :-D

Imgur

guy038

Hello, @Matthew-suhajda, and All,

Well ! My idea consists of three steps :

Firstly, copy the time stamp 3rd line ( which I suppose to be different for each file ! ) at the very end of each file scanned, after a pure blank line. This new line won’t be followed by any line-break
Secondly, search for any occurrence of the foo expression, with a look-ahead feature ( always true ) which stores the very last line ( the time stamp line ), added during the previous S/R step, as group 1
Thirdly, delete, in each file scanned, the very last line, temporarily added

The first point is realized with a first regex S/R. The second and third ones are done, all together, by a second regex S/R
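
For readers who prefer a script, the three steps above can be sketched in plain Python. This is only an illustration of the idea, not the Notepad++ method itself: the function name `three_step_stamp` and the `word` parameter are made up for this example, Windows line breaks are assumed, and, unlike the regex replacement, it keeps the matched spelling of the word.

```python
import re


def three_step_stamp(text: str, word: str = "foo") -> str:
    """Mimic the three steps on one file's contents (hypothetical helper)."""
    # Step 1: copy the 3rd line to the very end, after a Windows line break.
    third_line = text.splitlines()[2]
    text = text + "\r\n" + third_line

    # Step 2: the stamp is now the very last line; append it after each match.
    stamp = text.splitlines()[-1]
    body, _, _ = text.rpartition("\r\n")  # everything before the temporary line
    body = re.sub(
        re.escape(word),
        lambda m: m.group(0) + " " + stamp,  # keeps FOO / Foo as written
        body,
        flags=re.IGNORECASE,
    )

    # Step 3: the temporary last line is dropped by returning only the body.
    return body
```

Called on the contents of one file, it returns the stamped text and leaves the original 3rd line untouched.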

Note that it’s necessary to copy the time stamp line at the very end, because, once the regex engine position is after Line 3, looking for some foo occurrences, it cannot remember that specific line, as that line is no longer part of the later matches !

So, let’s imagine the text below, with the time stamp in line 3 and the foo word, in lines 2, 6, 8 and 10 :

This is a small
example foo of
1/1/2008 14:53:36
text for testing
the Matthew's goal !
foo It doesn't
mean anything
and foo it's created
to test the
search/replacement foo
That's the end.

Then, the first regex S/R :

SEARCH (?-s)^(?:.*\R){2}(.+)(?s).+

REPLACE $0\r\n\1

would give the following text, with a copy of the 3rd line added at the very end :

This is a small
example foo of
1/1/2008 14:53:36
text for testing
the Matthew's goal !
foo It doesn't
mean anything
and foo it's created
to test the
search/replacement foo
That's the end.

1/1/2008 14:53:36
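
To try this first S/R outside Notepad++, here is a rough Python equivalent. It is a sketch under assumptions: Python's `re` module has no `\R`, so a hypothetical `EOL` group stands in for it, and `[^\r\n]` replaces the dot because Python's dot would also match `\r`. The function name `append_third_line` is invented for the example.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # stand-in for \R, which Python's re does not support

# Two complete lines, then the 3rd line as group 1, then all the remaining text.
FIRST = re.compile(r"\A(?:[^\r\n]*" + EOL + r"){2}([^\r\n]+)(?s:.+)")


def append_third_line(text: str) -> str:
    """Rewrite the whole match ($0), then add \\r\\n and group 1 at the end."""
    return FIRST.sub(lambda m: m.group(0) + "\r\n" + m.group(1), text, count=1)
```

As with the original regex, a text with nothing after its 3rd line does not match and comes back unchanged.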

Now, the second regex S/R :

SEARCH (?i)foo(?s)(?=.*\R(.+)\z)|(?-s)\R.+\z

REPLACE ?1foo\x20\1

gives the expected text, below :

This is a small
example foo 1/1/2008 14:53:36 of
1/1/2008 14:53:36
text for testing
the Matthew's goal !
foo 1/1/2008 14:53:36 It doesn't
mean anything
and foo 1/1/2008 14:53:36 it's created
to test the
search/replacement foo 1/1/2008 14:53:36
That's the end.
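
The second S/R can be approximated in Python as well. Again, this is a sketch with assumptions: `EOL` stands in for `\R`, Python's `\Z` plays the role of `\z`, and the conditional replacement `?1...` is emulated with a small function, because Python has no such replacement syntax. The names `SECOND` and `second_pass` are hypothetical.

```python
import re

EOL = r"(?:\r\n|\n|\r)"  # stand-in for \R

# Alternative 1: foo in any case, with the very last line captured as group 1.
FOO_WITH_STAMP = r"(?i:foo)(?=(?s:.*)" + EOL + r"([^\r\n]+)\Z)"
# Alternative 2: the temporary last line, together with its leading line break.
TEMP_LAST_LINE = EOL + r"[^\r\n]+\Z"

SECOND = re.compile(FOO_WITH_STAMP + "|" + TEMP_LAST_LINE)


def second_pass(text: str) -> str:
    def repl(m: re.Match) -> str:
        if m.group(1) is not None:
            # Group 1 exists: literal foo, a space, then the time stamp.
            return "foo " + m.group(1)
        # Group 1 is absent: the temporary last line is simply deleted.
        return ""

    return SECOND.sub(repl, text)
```

Because the replacement writes the literal string foo, a match such as FOO comes back in lower case, which mirrors the behaviour of the `?1foo\x20\1` replacement.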

I supposed that the search is insensitive to the case, so words FOO, Foo,… would match. If you prefer a case-sensitive search, just replace the leading (?i) part of the regex with the (?-i) syntax

Practically, Matthew, follow these few steps :

First, BACKUP all the files, concerned with these S/R ( IMPORTANT )
Start Notepad++ and open the Find in Files dialog
Type in (?-s)^(?:.*\R){2}(.+)(?s).+ , in the Find what: zone
Type in $0\r\n\1 , in the Replace with: zone
Enter the right extension of your files ( for instance *.txt, *.html, … ), in the Filters : zone
Add the full pathname of the folder, containing all your files, in the Directory : zone
Select the Regular expression search mode ( IMPORTANT )
Click on the Replace in Files button
Click on the OK button, of the confirmation dialog

At that time, all the files scanned should have a new line, at their end, identical to their 3rd line ! Now :

Change the Find what: zone with the regex (?i)foo(?s)(?=.*\R(.+)\z)|(?-s)\R.+\z
Change the Replace with: zone with the regex ?1foo\x20\1
Click, again, on the Replace in Files button
Click on the OK button, of the confirmation dialog

Et voilà ! any occurrence of foo, in each scanned file, should be followed, after a space separator, with the appropriate time stamp of each file ;-))
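
If the same job ever has to be scripted instead of run from the Find in Files dialog, something like the following might do. It is a minimal sketch, not a tested tool: the function names, the `*.txt` filter, the `.bak` backup suffix and the UTF-8 encoding are all assumptions. Since a script can read line 3 before replacing anything, it does not need the temporary last line.

```python
import re
import shutil
from pathlib import Path


def stamp_text(text: str, word: str = "foo") -> str:
    """Append the 3rd line after every occurrence of word (any case)."""
    lines = text.splitlines()
    if len(lines) < 3:
        return text  # no 3rd line, nothing to append
    stamp = lines[2]
    return re.sub(
        re.escape(word),
        lambda m: m.group(0) + " " + stamp,
        text,
        flags=re.IGNORECASE,
    )


def stamp_folder(folder: str, pattern: str = "*.txt", word: str = "foo") -> list:
    """Process every matching file in folder; return the names of changed files."""
    changed = []
    for path in sorted(Path(folder).glob(pattern)):
        # newline="" keeps the original line breaks (\r\n) untouched.
        with open(path, newline="", encoding="utf-8") as fh:
            original = fh.read()
        updated = stamp_text(original, word)
        if updated != original:
            # Keep a backup copy next to the file before overwriting it.
            shutil.copy2(path, path.with_name(path.name + ".bak"))
            with open(path, "w", newline="", encoding="utf-8") as fh:
                fh.write(updated)
            changed.append(path.name)
    return changed
```

A call such as `stamp_folder(r"C:\data")`, with a folder path of your own, would process each `*.txt` file there. Running it twice would stamp the files twice and overwrite the backups, so it is meant to be run once on a copy of the data.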

Best Regards,

guy038

P.S. :

If you want to, I’ll give you, next time, some explanations about these regexes !!

Matthew Suhajda

@guy038 said:

?1foo\x20\1

Exquisite. Worked perfectly. Hopefully this is the only time I’ll need to do something like this, but I will totally ask for more direction in the future if it comes up again. There’s always a new puzzle when dealing with shitty data! lol

Thank you so very much.

guy038

Hello, @Matthew-suhajda, and All,

Pleased to hear that it worked fine ! Just for information :

Regarding the first S/R :

SEARCH (?-s)^(?:.*\R){2}(.+)(?s).+

REPLACE $0\r\n\1

The modifier (?-s) means that further dots will match any single standard character only, and not the End of Line characters
Then, the part ^(?:.*\R){2} looks for the first two lines, with their EOL chars, in a non capturing group
Now, the part (.+) stores, as group 1, the next 3rd line, without its End of Line characters
Finally, the (?s).+ syntax matches all the remaining text, starting at the End of Line characters of line 3, as the dot now matches line-break characters too
In replacement, due to the $0 syntax, it re-writes, first, the entire matched text ( = file contents ), followed with a Windows line break ( \r\n ) and, finally, with the group 1 ( = The 3rd line = time stamp )

Regarding the second S/R :

SEARCH (?i)foo(?s)(?=.*\R(.+)\z)|(?-s)\R.+\z

REPLACE ?1foo\x20\1

The searched regex is made of two alternatives, separated by the alternation special character | :
- (?i)foo(?s)(?=.*\R(.+)\z)
- (?-s)\R.+\z
In the first alternative, the part (?i)foo tries, first, to match the foo word, in any case
Then, the (?s)(?=.*\R(.+)\z) syntax represents an always true look-ahead, (?=......), which matches all text after the foo word, till the second to the last line ( .*\R ), and the last ( or 3rd ) line, without any line-break ( (.+)\z ), which is stored as group 1
Near the end of each file, the second alternative, (?-s)\R.+\z, looks for the very last ( or 3rd ) line contents, till the very end of each file ( \z )
In replacement, the ?1foo\x20\1 syntax means :
- If group 1 exists, it writes the literal string foo ( so a match such as FOO comes back in lower case ), followed with a space character and the time stamp ( last ) line ( \1 )
- If group 1 does not exist ( case of the second alternative ), the very last line, temporarily added, is then, simply, deleted, as no ELSE part is present in the conditional replacement ?1..... !

Best Regards,

guy038