Find (+n) and replace

alexarda

Hi all

I’m trying to find the best way to insert a character 14 characters after the end of a given string with Notepad++ and PythonScript. My data looks like this:

<p><a href="http://www.webaddress1">Website 1</a>, Bob's Brilliant Blog 1 Jun 2020</p>
<p><a href="https://www.webaddress2">Website 2</a>, Rachel's Raucous Readings 30 May 2020</p>
<p><a href="https://www.webaddress3">Website 3</a>, Alex's Awful Arias 29 May 2020</p>
<p><a href="http://www.webaddress4">Website 4</a>, Bob's Brilliant Blog 28 May 2020</p>

The date will always be changing, but will only ever use three-letter month abbreviations (Apr, Jul, Sep, etc.).

The end goal is to find all examples where “Bob’s Brilliant Blog” appears and add a single character (“1”) 14 characters after the end of “Blog”, but before </p>.

<p><a href="http://www.webaddress1">Website 1</a>, Bob's Brilliant Blog 1 Jun 2020  1</p>
<p><a href="https://www.webaddress2">Website 2</a>, Rachel's Raucous Readings 30 May 2020</p>
<p><a href="https://www.webaddress3">Website 3</a>, Alex's Awful Arias 29 May 2020</p>
<p><a href="http://www.webaddress4">Website 4</a>, Bob's Brilliant Blog 28 May 2020 1</p>

Can anyone point me in the right direction?

Ekopalypse

@alexarda

If I understand your question correctly, then I think this can do it

search_string = r"Bob's Brilliant Blog"    
re_search_for = r"(?<={0}).*(?=</p>)".format(search_string)
editor.rereplace(re_search_for, lambda m: '{0:<13}1'.format(m.group()))

Let me know if you need a description of the code.

alexarda

@Ekopalypse

Really, really appreciate this. It works! Thank you!

For my own education do you mind stepping through what’s happening with the code?

Ekopalypse

@alexarda

search_string = r"Bob's Brilliant Blog"
re_search_for = r"(?<={0}).*(?=</p>)".format(search_string)
editor.rereplace(re_search_for, lambda m: '{0:<13}1'.format(m.group()))

search_string is only there for easier editing, in case you also want to search for other strings that follow the same pattern.

re_search_for is then the actual search pattern, a regex string assembled from 3 parts:

(?<={0}) the placeholder {0} is filled in via the format function;
the result is (?<=Bob's Brilliant Blog), a lookbehind, which means a match can only start directly after this text.
.* I’m sure you know what that means: match zero or more characters, greedily.
(?=</p>) is a lookahead, which means that the preceding match only occurs if </p> follows.

All in all that means: match everything that comes after Bob’s Brilliant Blog
and before </p>.
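
To see what that pattern actually captures, here is a small sketch you can run outside Notepad++. It assumes Python's standard re module as a stand-in for the regex engine behind editor.rereplace, which should behave the same for this pattern. The sample lines are copied from the question.

```python
import re

search_string = r"Bob's Brilliant Blog"
re_search_for = r"(?<={0}).*(?=</p>)".format(search_string)

bob_line = '<p><a href="http://www.webaddress1">Website 1</a>, Bob\'s Brilliant Blog 1 Jun 2020</p>'
rachel_line = '<p><a href="https://www.webaddress2">Website 2</a>, Rachel\'s Raucous Readings 30 May 2020</p>'

# The lookbehind and lookahead are not part of the match itself,
# so only the text between "Blog" and "</p>" is captured.
matched_text = re.search(re_search_for, bob_line).group()

# Lines without the search string do not match at all.
no_match = re.search(re_search_for, rachel_line)

print(repr(matched_text))
print(no_match)
```

Note that the captured text includes the leading blank after "Blog", which is why the padding width later on is 13 and not 14.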

editor.rereplace now searches via the regex pattern and forwards every match to
a function that expects a match object as a parameter and is defined here with lambda.
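
The callback mechanism can be tried outside Notepad++ as well. This is only a rough sketch: it assumes re.sub from Python's standard library as a stand-in for editor.rereplace (both accept a function that receives the match object), add_marker is a hypothetical named version of the lambda, and the two sample lines are shortened versions of the data in the question.

```python
import re

pattern = r"(?<=Bob's Brilliant Blog).*(?=</p>)"

def add_marker(m):
    # m is the match object; m.group() is the text that was matched.
    # Whatever this function returns replaces that text.
    return '{0:<13}1'.format(m.group())

text = (
    "<p>Bob's Brilliant Blog 1 Jun 2020</p>\n"
    "<p>Bob's Brilliant Blog 28 May 2020</p>"
)

# Equivalent to passing: lambda m: '{0:<13}1'.format(m.group())
result = re.sub(pattern, add_marker, text)
print(result)
```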

'{0:<13}1'.format(m.group()) finally means:
0 refers to the first argument, i.e. what is contained in m.group(), and :<13 means that this string
is left-aligned and filled up with trailing blanks to a width of 13 characters if necessary.
The 1 after it is simply appended.
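
The padding step on its own can be checked in any Python console. A minimal sketch, using the two date strings from the question (including their leading blank) as hypothetical stand-ins for m.group():

```python
short_date = ' 1 Jun 2020'    # 11 characters
long_date = ' 28 May 2020'    # 12 characters

# :<13 left-aligns the value and pads it with blanks to 13 characters,
# then the literal 1 is appended as the 14th character.
padded_short = '{0:<13}1'.format(short_date)
padded_long = '{0:<13}1'.format(long_date)

# A value longer than 13 characters is not truncated, only left unpadded.
too_long = '{0:<13}1'.format(' 28 May 2020 (updated)')

print(repr(padded_short))
print(repr(padded_long))
print(repr(too_long))
```

So the 1 always lands in the same column as long as the matched text is 13 characters or shorter.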

That’s it. :-)

Now that I have written it out, I think a non-greedy search would be safer.

replace this

r"(?<={0}).*(?=</p>)"

with that

r"(?<={0}).+?(?=</p>)"

The change means there must be at least one character, and the regex matches as few characters as possible to meet the requirement.
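
The difference only shows up in edge cases, so here is a small sketch of two of them. It again assumes Python's re module as a stand-in for the engine behind editor.rereplace, and both sample lines are made up for illustration: one with two </p> tags on the same line, and one with nothing between "Blog" and </p>.

```python
import re

greedy = r"(?<=Bob's Brilliant Blog).*(?=</p>)"
lazy = r"(?<=Bob's Brilliant Blog).+?(?=</p>)"

# Case 1: two closing tags on one line.
two_on_one_line = "<p>Bob's Brilliant Blog 1 Jun 2020</p><p>Other text</p>"
greedy_match = re.search(greedy, two_on_one_line).group()  # runs to the LAST </p>
lazy_match = re.search(lazy, two_on_one_line).group()      # stops at the FIRST </p>

# Case 2: nothing between "Blog" and the closing tag.
nothing_between = "<p>Bob's Brilliant Blog</p>"
greedy_empty = re.search(greedy, nothing_between)  # matches the empty string
lazy_empty = re.search(lazy, nothing_between)      # needs at least one character

print(repr(greedy_match))
print(repr(lazy_match))
print(repr(greedy_empty.group()), lazy_empty)
```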