Regex: Put a comma on REPLACE html tags

Vasile Caraus

@PeterJones said in Regex: Put a comma on REPLACE html tags:

@Vasile-Caraus ,

I am not sure I understand you (a couple things didn’t translate well, sorry). But I think I have an idea.

I would do it in multiple parts: first, start with the regex you currently use to populate the <meta...> tag.
Then use

FIND WHAT = (content=")(.*?(?<!,))\x20\|?\x20*(.*)

REPLACE WITH = $1\l$2,\x20\l$3

Hit Replace multiple times until it finishes.

Basically, I break the line up into three parts: 1: content=", 2: the text before I want the comma, and 3: the text after the comma. I match a space, then an optional | and 0 or more spaces as the separator between tokens (thus removing the | like you show in your example). I use the \l in the replacement to make the first character of groups 2 and 3 lowercase.

If you knew that the title would always contain the same number of words, you could probably populate, split, and add commas in one operation… but this might be better than doing it manually.
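For anyone who wants to try the repeated Replace outside Notepad++, here is a minimal Python sketch of the same idea. It assumes Python's standard `re` module, which has no `\l` case operator, so a small helper (`lower_first`, a name made up for this sketch) lowercases the first character of groups 2 and 3 instead. The loop stands in for pressing Replace until nothing more matches.

```python
import re

# Same FIND WHAT pattern as above; Python's re accepts \x20 and the lookbehind.
PATTERN = re.compile(r'(content=")(.*?(?<!,))\x20\|?\x20*(.*)')


def lower_first(s):
    # Stand-in for the \l operator: lowercase only the first character.
    return s[:1].lower() + s[1:]


def add_commas(line):
    # Apply one replacement at a time until the pattern no longer matches,
    # like hitting Replace repeatedly.
    while True:
        line, count = PATTERN.subn(
            lambda m: m.group(1) + lower_first(m.group(2)) + ", " + lower_first(m.group(3)),
            line,
            count=1,
        )
        if count == 0:
            return line


print(add_commas('<meta name="keywords" content="My name is Prince | Always"/>'))
# <meta name="keywords" content="my, name, is, prince, always"/>
```

Each pass turns one "space not preceded by a comma" into a comma plus space, so the loop stops once every separator has been handled.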

Thanks, nice. Yes, I have to hit “Replace” multiple times.

Alan Kilborn

@Vasile-Caraus said in Regex: Put a comma on REPLACE html tags:

I have to hit “Replace” multiple times.

It may be worth mentioning that if there are a lot of replacements, holding the Alt key and then holding the r key until they are all replaced is easier than clicking on the Replace button once for each individual replacement.

Vasile Caraus

yes, nice solution.

Maybe @guy038 has a much easier solution, a method that does the job in a single pass…

Alan Kilborn

@Vasile-Caraus

You don’t get to choose who answers your questions here.

guy038

Hi, @vasile-caraus, @peterjones, @alan-kilborn, and All,

Well, as Peter said, I think that it is best to treat this as two successive tasks :

First, copy the contents of the <title>....</title> tag into the content attribute of the nearest following <meta...../> tag
Second, rewrite the value of the content attribute of those meta tags, putting a comma and a space between the words

So, assuming the input text below :

<title>My name is Prince | Always</title>
bla
bla
blah
<meta name="keywords" content="laptop, home, yellow, diamond"/>

<title>American singer-songwriter. musician ; record producer  dancer | actor filmmaker</title>

Dummy
text

<meta name="keywords" content="desk, work, red, heart"/>

<title>multi-instrumentalist & guitar virtuoso</title>
This is
the end
of the test
<meta name="keywords" content="knife, house, green, spade"/>

The following S/R :

SEARCH (?s)<title>(.*?)<\/title>.*?<meta\x20name="keywords"\x20content="\K.*?(?="\/>)

REPLACE \1

would produce :

<title>My name is Prince | Always</title>
bla
bla
blah
<meta name="keywords" content="My name is Prince | Always"/>

<title>American singer-songwriter. musician ; record producer  dancer | actor filmmaker</title>

Dummy
text

<meta name="keywords" content="American singer-songwriter. musician ; record producer  dancer | actor filmmaker"/>

<title>multi-instrumentalist & guitar virtuoso</title>
This is
the end
of the test
<meta name="keywords" content="multi-instrumentalist & guitar virtuoso"/>
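As a rough cross-check outside Notepad++, the first S/R can be approximated in Python. This is only a sketch under one assumption: Python's standard `re` module has no `\K`, so the text that `\K` would discard is captured as group 1 and written back unchanged. The names `COPY_TITLE` and `copy_title_to_keywords` are invented for the example.

```python
import re

# Group 1: everything from <title> up to and including content="
# Group 2: the title text that should become the new attribute value.
COPY_TITLE = re.compile(
    r'(?s)(<title>(.*?)</title>.*?<meta\x20name="keywords"\x20content=").*?(?="/>)'
)


def copy_title_to_keywords(html):
    # Keep group 1 as it is and replace the old attribute value with the title.
    return COPY_TITLE.sub(lambda m: m.group(1) + m.group(2), html)


SAMPLE = '''<title>My name is Prince | Always</title>
bla
bla
blah
<meta name="keywords" content="laptop, home, yellow, diamond"/>
'''

print(copy_title_to_keywords(SAMPLE))
```

A function is used as the replacement rather than a `\1\2` string so that any backslashes in a title are copied literally.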

Then, the final regex S/R :

SEARCH (?s)<title>.*?<\/title>.*?<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

REPLACE ?1\l\1:,\x20\l\2

would change, in one go, the text between content=" and "/> of each <meta> tag :

<title>My name is Prince | Always</title>
bla
bla
blah
<meta name="keywords" content="my, name, is, prince, always"/>

<title>American singer-songwriter. musician ; record producer  dancer | actor filmmaker</title>

Dummy
text

<meta name="keywords" content="american, singer, songwriter, musician, record, producer, dancer, actor, filmmaker"/>

<title>multi-instrumentalist & guitar virtuoso</title>
This is
the end
of the test
<meta name="keywords" content="multi, instrumentalist, guitar, virtuoso"/>
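The same transformation can be sketched in Python, with the caveat that the standard `re` module supports neither `\K` nor `\G`. The sketch below therefore does not reproduce the regex above: it locates the attribute value after a <title> tag and rebuilds it from its `\w+` words in a callback. All function and constant names are made up for illustration.

```python
import re

# Group 1: from <title> up to and including content="
# Group 2: the current attribute value, up to the closing "/>
KEYWORDS_AFTER_TITLE = re.compile(
    r'(?s)(<title>.*?</title>.*?<meta\x20name="keywords"\x20content=")(.*?)(?="/>)'
)


def to_keyword_list(value):
    # Keep only runs of word characters, lowercase the first letter of each
    # (the effect of \l), and join them with a comma and a space.
    words = re.findall(r"\w+", value)
    return ", ".join(w[:1].lower() + w[1:] for w in words)


def rewrite_keywords(html):
    return KEYWORDS_AFTER_TITLE.sub(
        lambda m: m.group(1) + to_keyword_list(m.group(2)), html
    )


print(to_keyword_list("multi-instrumentalist & guitar virtuoso"))
# multi, instrumentalist, guitar, virtuoso
```

Splitting on `\w+` is what turns singer-songwriter into two keywords, as in the expected output above.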

Best Regards,

guy038

Vasile Caraus

wonderful solution, thanks a lot @guy038 !!

guy038

Hi, @vasile-caraus, @peterjones, @alan-kilborn, and All,

Sorry ! I forgot to mention two things :

Vasile, in your example, the text My name is Prince | Always ends up as my, name, is, prince, always. So I assumed that you wanted all the words in lower-case ! That’s why I added the \l modifier, which lower-cases the first letter of groups 1 and 2, containing the words. If that is not the case, omit these modifiers !
Now, in the second S/R, I could have shortened the search regex as below :

SEARCH (?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

But I preferred to use the longer regex because you may have other <meta ..../> tags that do not follow a <title>....</title> tag and so should not be affected by the replacement !
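The difference in scope between the two forms can be illustrated with a small Python sketch. It is an approximation only: Python's standard `re` has no `\K` or `\G`, so both patterns below capture the attribute value and rewrite it in a callback, and the names `SHORT`, `LONG`, `commas` and `rewrite` are hypothetical.

```python
import re


def commas(value):
    # Words separated by ", ", first letter lowercased.
    words = re.findall(r"\w+", value)
    return ", ".join(w[:1].lower() + w[1:] for w in words)


# Short form: any keywords <meta> tag.
SHORT = re.compile(r'(<meta\x20name="keywords"\x20content=")(.*?)(?="/>)')
# Long form: only a keywords <meta> tag that follows a <title> tag.
LONG = re.compile(
    r'(?s)(<title>.*?</title>.*?<meta\x20name="keywords"\x20content=")(.*?)(?="/>)'
)


def rewrite(pattern, text):
    return pattern.sub(lambda m: m.group(1) + commas(m.group(2)), text)


SAMPLE = (
    '<meta name="keywords" content="Stand Alone"/>\n'
    '<title>My name</title>\n'
    '<meta name="keywords" content="My name"/>\n'
)

print(rewrite(SHORT, SAMPLE))  # both <meta> tags are rewritten
print(rewrite(LONG, SAMPLE))   # the first <meta> tag is left alone
```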

Cheers,

guy038

Vasile Caraus

@guy038 said in Regex: Put a comma on REPLACE html tags:

(?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

This one is not working. It just deletes my whole <title> line. Maybe you should write out the entire regex.

PeterJones

@Vasile-Caraus said in Regex: Put a comma on REPLACE html tags:

This one is not working. It just deletes my whole <title> line.

If the edited one is not working but the previous solution was a “wonderful solution” (and thus presumably working), I recommend you happily use the one that you already called “wonderful”.

----

Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as plain text using the </> toolbar button or manual Markdown syntax. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get… Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

Vasile Caraus

@guy038 said in Regex: Put a comma on REPLACE html tags:

(?s)<title>.*?<\/title>.*?

Looking at guy038’s latest update, I now see that he wrote only the second part of the regex.

He should also have included (?s)<title>.*?<\/title>.*? at the beginning.

So it works now that I have tested it:

SEARCH: (?s)<title>.*?<\/title>.*?(?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

REPLACE BY: ?1\l\1:,\x20\l\2

guy038

Hi, @vasile-caraus and All,

No, my simplified search regex, for the second S/R, works fine, too ! But I also forgot to add that, due to the \G feature, the caret, before searching ( and replacing ), must NOT be located at the beginning of a <meta...../> tag . Otherwise, the regex engine wrongly selects the second alternative !

EDIT : I’m wrong again -(( Just forget everything in this post ! I’ll try, later, to post a correct answer !

Cheers,

guy038

guy038

Hello, @vasile-caraus and All,

In fact, I’m presently on holidays ! Thus, my concentration is not at top level ;-))

So, here is the true story ! When using, for the second regex S/R, the simplified form :

SEARCH (?s)<meta\x20name="keywords"\x20content="\K(\w+)|\G[^\w\r\n]+(\w+)

The first alternative <meta\x20name="keywords"\x20content="\K(\w+) should be used first, in order to detect the first word after the string content="

However, if, before running the S/R, the caret is located right before some non-word characters, themselves followed by some word characters, then the regex engine wrongly selects the second alternative, due to the \G syntax

So, this simplified syntax can be used if the initial location of the caret is on a completely empty line. Indeed, since the class [^\w\r\n] cannot match \r or \n, the second alternative \G[^\w\r\n]+...... cannot match at the initial \G location. This means that, necessarily, the first match will come from the first alternative of the regex, which is the correct behaviour !
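The caret dependence described above can be sketched in Python. Its standard `re` module has no `\G`, so this example assumes that `Pattern.match(text, pos)`, which only succeeds when the match starts exactly at `pos`, is an acceptable stand-in for `\G` at the caret position on the very first search. The helper name `second_alternative_at` is invented for the sketch.

```python
import re

# Second alternative of the search regex, without the leading \G.
SECOND_ALT = re.compile(r'[^\w\r\n]+(\w+)')


def second_alternative_at(text, caret):
    # Return the word the second alternative would capture if the search
    # started at the caret, or None if that alternative cannot match there.
    m = SECOND_ALT.match(text, caret)
    return m.group(1) if m else None


# Caret right before "<title>": "<" is a non-word char followed by word chars,
# so the second alternative matches straight away.
print(second_alternative_at('<title>My name is Prince | Always</title>', 0))      # title

# Caret on an empty line: the next char is \r, which the class excludes.
print(second_alternative_at('\r\n<title>My name is Prince | Always</title>', 0))  # None
```

In the first case the word captured is title, from the tag itself, which would explain the <title> line being damaged when the replacement is run from that caret position.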

But, of course, this simplified regex is then no longer tied to a preceding <title>......</title> tag !

BR

guy038