How to add a blank space and a word before every comma in some meta tags in every html file of a folder?

Ramanand Jhingade

For example,

<META name=keywords content="blah blah blah, blah blah, blah blah blah blah, blah blah blah, blah " />

should become

<META name=keywords content="blah blah blah word, blah blah word, blah blah blah blah word, blah blah blah word, blah " />

and

<META name=keywords content="blah blah blah, blah blah, blah blah blah blah, blah blah blah, blah">

should become

<META name=keywords content="blah blah blah word, blah blah word, blah blah blah blah word, blah blah blah word, blah">

If a RegEx is suggested, please let me know if the “matches newline” should be checked/ticked.
I don’t mind using a Python script plugged in to Notepad++ (I have understood how to add plug-ins).

Ramanand Jhingade

@guy038 For the same meta tags mentioned at the top, how can I add a blank space followed by the word “for” after every occurrence of these words (in the meta tags)?

Homeopathic treatment
Homeopathic doctor
Homeopathic clinic
Homeopathic specialist
Homeopathy
Thanks in advance for the help!

guy038

Hello, @ramanand-jhingade and All,

Concerning your first request, we can use the generic regex S/R from this post, below :

SEARCH (?-i:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-i:FR)

REPLACE RR

with :

BSR ( Begin Search-region Regex ) = <META\x20
ESR ( End Search-region Regex ) can be ignored ( See explanations, below ! )
FR ( Find Regex ) = ,
RR ( Replace Regex ) = \x20WORD,

This leads to the effective regex S/R :

SEARCH (?-s)(?-i:<META\x20|(?!\A)\G).*?\K,
REPLACE \x20WORD,
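Because the question mentions that a Python script would also be acceptable, here is a rough stand-alone Python sketch of what this S/R does. It is not a Notepad++ PythonScript plugin script, and it assumes the files are UTF-8 and sit directly inside one folder; the names add_word_before_commas and process_folder are made up for illustration. Python's built-in re module has no \G or \K, so the sketch locates the first <META (plus a space) on each line with plain string methods and then changes every comma after it on that line, which is the net effect of the \G chain.

```python
from pathlib import Path


def add_word_before_commas(text: str, word: str = "WORD") -> str:
    r"""Insert ' ' + word before every comma that follows '<META ' on the same line.

    Rough equivalent of the Notepad++ S/R
        SEARCH  (?-s)(?-i:<META\x20|(?!\A)\G).*?\K,
        REPLACE \x20WORD,
    written without \G and \K, which Python's re module does not support.
    """
    out = []
    for line in text.splitlines(keepends=True):
        start = line.find("<META ")  # case-sensitive, like (?-i:<META\x20)
        if start == -1:
            out.append(line)  # no META tag on this line: leave it untouched
            continue
        head, tail = line[:start], line[start:]
        # every comma after '<META ', up to the end of this line, gets the word
        out.append(head + tail.replace(",", " " + word + ","))
    return "".join(out)


def process_folder(folder: str, word: str = "WORD") -> None:
    """Rewrite every *.html file directly inside `folder` (assumed to be UTF-8)."""
    for path in Path(folder).glob("*.html"):
        # newline="" keeps the existing CR LF or LF line endings as they are
        with open(path, encoding="utf-8", newline="") as f:
            text = f.read()
        with open(path, "w", encoding="utf-8", newline="") as f:
            f.write(add_word_before_commas(text, word))
```

process_folder rewrites the files in place, so try it on a copy of the folder first.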

Notes :

As usual, tick the Wrap around option ( IMPORTANT ) and choose the Regular expression search mode. The status of all the other options does not matter !
As said above, the ESR is implicit ! After each line containing a <META ••••• /> tag, the search must first be reset to look for the string <META followed by a space char. So, we can simply consider that the ESR is the end of the current line.
To do so, we place the modifier (?-s) at the beginning of the regex, which forces the regex . to match a single standard char only, never a line-break char. Thus, when the working location is right before the line-break chars, the regex engine must skip them and then, as expected, the \G assertion is not true anymore.
This implies that, necessarily, the next search will look for the string <META and NOT for a , char ! Therefore, the initial regex syntax for the ESR region, (?s-i:(?!ESR).)*?, can just be simplified to .*?, which matches the shortest range, possibly empty, of chars in the current line between the string <META, OR the end of the previous match, AND a , character.
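To see the effect of (?-s) described above, the short Python check below compares the same lazy search with the s flag off and on. It uses Python's scoped (?-s:...) and (?s:...) syntax, which behaves like the Notepad++ modifier for this purpose. One caveat for this sketch: Python's . without the s flag excludes only \n, so on Windows CR LF text it can still match the \r.

```python
import re

text = '<META name=keywords content="a b" />\nc, d'

# (?-s:...) : '.' may not match a line break, so '.*?' cannot run past the
# end of the META line to reach the comma on the next line -> no match
same_line_only = re.search(r"(?-s:<META .*?,)", text)

# (?s:...) : '.' also matches '\n', so the search crosses into the next line
across_lines = re.search(r"(?s:<META .*?,)", text)

print(same_line_only)        # None
print(across_lines.group())  # prints the META line, then 'c,' on the next line
```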

Now, regarding your second request, as our search is also a “mono-line” one, we may use the same technique as before, starting with the MODIFIED generic regex :

SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)

REPLACE RR

where :

BSR ( Begin Search-region Regex ) = <META\x20
FR ( Find Regex ) = Homeopathic (treatment|doctor|clinic|specialist)|Homeopathy
RR ( Replace Regex ) = $0\x20for

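And we get the functional regex S/R, below :

SEARCH (?-s)(?-i:<META\x20|(?!\A)\G).*?\K(?-i:Homeopathic (treatment|doctor|clinic|specialist)|Homeopathy)

REPLACE $0\x20for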
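A rough Python counterpart of this second S/R is sketched below, assuming at most one <META tag per line and case-sensitive terms; add_for_after_terms is a hypothetical name. Apart from working without \G and \K, the main notational difference is that Python writes the whole-match reference $0 as \g<0> in the replacement string.

```python
import re

# FR from the post; the group is made non-capturing, which does not change what matches
FR = re.compile(r"Homeopathic (?:treatment|doctor|clinic|specialist)|Homeopathy")


def add_for_after_terms(text: str) -> str:
    """Append ' for' to each FR match found after '<META ' on the same line."""
    out = []
    for line in text.splitlines(keepends=True):
        start = line.find("<META ")
        if start == -1:
            out.append(line)  # outside META tags nothing is changed
            continue
        # r"\g<0> for" is Python's spelling of the replacement $0\x20for
        out.append(line[:start] + FR.sub(r"\g<0> for", line[start:]))
    return "".join(out)
```

As with the Notepad++ version, lowercase homeopathy is left alone because the match is case-sensitive.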

Best Regards,

guy038

Ramanand Jhingade

@guy038 you are a terrific genius - the RegEx Guru. Thank you very much! I was wasting so much time trying to do that.

PeterJones

@Ramanand-Jhingade ,

This is now your sixth search-and-replace question in about a month. We’ve given you answers to all of these. However, you don’t seem to be putting in the effort yourself; you are only briefly describing your data (occasionally giving a one-line example), and expecting us to do your work for you.

That’s not how this forum works. This is a discussion forum, for the entirety of Notepad++, not a “develop my search-and-replace regex for me” forum where we just hand you answers. For people who are new to Notepad++, or new to regex, we’ll give a couple of freebies. But we expect you to read the documentation that we link you to – I’ve given you my search-and-replace boilerplate at least twice (and will give it a third time here), which links to the official Notepad++ User Manual section on search-and-replace and links to this forum’s FAQ.

The only way to learn regex is to read the documentation, and start trying on your own, and running experiments to see how the results change when you slightly tweak things. When someone shows you a syntax you’ve never seen before, look it up in the docs, and figure out what it’s doing.

If you don’t start giving evidence that you have read the advice and documentation, and if you don’t start showing the regexes that you’ve tried and why you thought they would work (proving that you try things on your own before giving it to us to solve), there are many regulars here who will get tired of you mooching off of our regex expertise, and you’ll either get fewer replies, or you will start getting replies that more forcefully request that you put in the effort, or you might notice downvotes start appearing on your requests – and if you’re downvoted enough, it will be much harder for you to post here. We’re not here to do your homework / job / hobby for you, and we get tired of being expected to do so; we’re glad to help you learn, but only if you put in the effort to learn.

I hope you take this advice to heart, and that you put in the effort to learn Notepad++'s regex syntax yourself. Good luck.

---

Do you want regex search/replace help? Then please be patient and polite, show some effort, and be willing to learn; answer questions and requests for clarification that are made of you. All example text should be marked as literal text using the </> toolbar button or manual Markdown syntax. To make regexes appear in red (and so they keep their special characters like *), use backticks, like `^.*?blah.*?\z`. Screenshots can be pasted from the clipboard to your post using Ctrl+V to show graphical items, but any text should be included as literal text in your post so we can easily copy/paste your data. Show the data you have and the text you want to get from that data; include examples of things that should match and be transformed, and things that don’t match and should be left alone; show edge cases and make sure your examples are as varied as your real data. Show the regex you already tried, and why you thought it should work; tell us what’s wrong with what you do get. Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ. If you follow these guidelines, you’re much more likely to get helpful replies that solve your problem in the shortest number of tries.

Ramanand Jhingade

@guy038 I observed that, in both RegExes, META, Homeopathic and Homeopathy are case-sensitive.
After you gave me the 2nd RegEx, I tried to add a for to the word cure with a space in between, with this RegEx in the “Find All” field: (?-s)(?-i:<META\x20|(?!\A)\G).*?\K(?-i:cure)) and this RegEx in the “Replace in Files” field: $0\x20for but it adds the for to the cure only in the first line and all the subsequent lines remain unchanged. How can I add the for to all the lines which begin with cure? Of course, nothing should be replaced or added after the "> or " /> which is the end of the meta tag.

Ramanand Jhingade

@PeterJones I am trying - please see what I posted just above. Thanks for the advice!

Ramanand Jhingade

@guy038 Will (?-s)(?-i:<META\x20|(?!\A)\G).*?\K(?-i:Cure|cure) help find all instances of Cure and cure till the beginning of the next meta tag?

Ramanand Jhingade

@guy038 I have understood this much: $0 inserts all text matched in the regex (automatic unnamed capture) and \x20 adds a space, g searches for a string repeatedly. The rest is still “Latin” and “Greek” to me. I, therefore, request you to help find the word cure wherever it appears in the META tag (not just in the first line) so that I can replace it.

PeterJones

@Ramanand-Jhingade said in How to add a blank space and a word before every comma in some meta tags in every html file of a folder?:

The rest is still “Latin” and “Greek” to me. I, therefore, request you to help

Thank you for trying to understand. But like learning Latin and Greek, you cannot expect to be an expert in regex in one afternoon. But just because you’re not a Latin/Greek expert doesn’t give you permission to go into a forum on ancient texts and ask them to translate a manuscript you just found, for free, on your terms and on your timeline. When they point you to courses or websites on learning ancient Latin and Greek, you are expected to then go avail yourself of that documentation and give it a shot yourself.

Guy gave you the formula for “change X into Y, but only when it’s between START and END” (using “FR”, “RR”, “BSR”, “ESR”). And he even gave you the first version of what those markers should be, based on your data. Try applying what you learned. If you get stuck, show us what you tried and why you thought it would work, an example of the data input, and how the output differs from what you expected.

Instead, you give a one-sentence “I tried something, and it didn’t work, so just do it for me”, which is an attitude not appreciated here.

Guy will likely come back and hand you yet another answer, and that’s his prerogative. But until you start trying to learn this, it’s going to remain foreign to you, and you’re going to continue to struggle.

Also understand: a lot of what people think are going to be “easy” search-and-replace requests are actually very involved. The “change X into Y, but only when it’s between START and END” sounds easy, but the answer is complicated; that’s why Guy went to the trouble of figuring out the generic formula for that request, which is about as plug-and-play as a regex gets. But sometimes, you are going to have to try various things on your own until you figure it out.

This forum is not a do-my-data-manipulation-for-me-for-free location. We point people to initial answers about regex to inform them that Notepad++'s regex can do a lot, and that if they invest the time in learning regex, they will be able to eventually figure out how to do their own data manipulation for themselves.

guy038

Hi, @ramanand-jhingade, @peterjones and All,

To @peterjones :

I fully support what you said to @ramanand-jhingade. However, I considered that my previous post was worthwhile, as it gave two different solutions for two different goals, both built from the same generic regex S/R adapted to mono-line searches, described below :

SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)

REPLACE RR

where :

BSR is the Begin Search-region Regex expression to search BEFORE any FR string located in current line
FR is the Find Regex expression, which may be present once or several times in current line
RR is the Replace Regex expression, which replaces any FR expression found in current line
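The generic formula can also be written as a small Python helper for readers who want to reuse it outside Notepad++. This is a sketch, not an exact port: replace_after_bsr is a hypothetical name, BSR and FR are Python regexes matched case-sensitively (as with (?-i:...)), and RR uses Python replacement syntax (\g<0> instead of $0, \1 for group 1). Like the (?-s) version, it works line by line: FR is replaced only in the part of a line that comes after the first BSR match.

```python
import re


def replace_after_bsr(text: str, bsr: str, fr: str, rr: str) -> str:
    r"""Per line: after the first match of `bsr`, replace every match of `fr` with `rr`.

    Rough Python counterpart of  (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)  ->  RR.
    `rr` uses Python replacement syntax: \g<0> instead of $0.
    """
    bsr_rx, fr_rx = re.compile(bsr), re.compile(fr)
    out = []
    for line in text.splitlines(keepends=True):
        m = bsr_rx.search(line)
        if m is None:
            out.append(line)  # no BSR on this line: nothing to replace
        else:
            # keep everything up to the end of BSR, replace FR in the rest of the line
            out.append(line[: m.end()] + fr_rx.sub(rr, line[m.end():]))
    return "".join(out)
```

For example, replace_after_bsr(text, r"<META ", r",", r" WORD,") mirrors the first S/R of this thread.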

To @ramanand-jhingade :

The (?-i:This is a Text) syntax is a non-capturing group which searches for the exact string This is a Text
The (?i:This is a Text) syntax is a non-capturing group which searches for the string This is a Text whatever its case. For instance, this regex may find any of these expressions :
- This is a Text
- this is a text
- This Is A Text
- tHIs iS a teXT
- THIS IS A TEXT
Similarly, the (?-i)(This is a Text) syntax is a case-sensitive modifier, which applies from that point on, followed by a capturing group which searches for the exact string This is a Text
The (?i)(This is a Text) syntax is a case-insensitive modifier followed by a capturing group which searches for the string This is a Text whatever its case. For instance, this regex may find any of these expressions :
- this is A TEXT
- THIS IS a text
- ThIs Is A TeXt
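Python's re module accepts the same scoped modifiers, so the examples above can be tried directly, as in this sketch. Two differences from Notepad++ are worth knowing: a global flag such as (?i) must appear at the very start of a Python pattern (Python 3.11 and later reject it elsewhere), and case-sensitive matching is already Python's default, so no (?-i) prefix is needed there.

```python
import re

sample = "This is a Text | this is a text | tHIs iS a teXT | THIS IS A TEXT"

# (?-i:...) : non-capturing group, case-sensitive -> only the exact string
exact = re.findall(r"(?-i:This is a Text)", sample)

# (?i:...) : non-capturing group, case-insensitive -> any capitalisation
any_case = re.findall(r"(?i:This is a Text)", sample)

# (?i) at the start applies to the whole pattern; (...) is a capturing group,
# so group(1) holds the text exactly as it was found
m = re.search(r"(?i)(This is a Text)", "ThIs Is A TeXt")

print(exact)          # ['This is a Text']
print(len(any_case))  # 4
print(m.group(1))     # ThIs Is A TeXt
```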

You also said :

g searches for a string repeatedly

Not exactly ! We’re speaking about the \G assertion ( i.e. a condition ) which is TRUE only if the present match IMMEDIATELY follows the previous match ! For instance, let’s imagine the regex \Ga against the text below, in a new tab :

aaaaaabaaaaaaa
aaaaaabaaaaaaa

Move the cursor to the very beginning ( Ctrl + Home )
Search \Ga
Hit the Find Next button repeatedly

=> As you can see, it matches each a letter and stops when meeting the b letter. Logical, as the next a char, so the future match, does not immediately follow the previous a letter !

Now, move the caret right after the b letter. Again, each a letter is matched, but note that it stops after the last char of the first line. Again, this behaviour is logical as, between the last a and the next a, there are the two Windows line-break chars ( CR and LF ). So, the two matches are not consecutive !
And, if you move the caret to the beginning of the second line, again, the regex engine will match each a letter till the b letter !

I must admit that the \G assertion, even when explained, is not easy to handle ! It is certainly much less intuitive than the usual ^ and $ assertions, which match the zero-length locations Beginning and End of current line !
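Python's re module has no \G, but the experiment above can be imitated by anchoring every attempt at the end of the previous match with Pattern.match(text, pos), which only succeeds if the match starts exactly at pos. The sketch below (chained_matches is a hypothetical helper) reproduces the three runs: from the start of the text, from right after the first b, and from the start of the second line.

```python
import re


def chained_matches(pattern: str, text: str, start: int = 0) -> list[tuple[int, int]]:
    r"""Spans found by repeated Find Next with \G<pattern>, starting at `start`.

    Python's re has no \G, so each attempt is anchored with Pattern.match(text, pos),
    which only succeeds if the match begins exactly at pos (the previous match's end).
    """
    rx = re.compile(pattern)
    spans = []
    pos = start
    # stop at the first failure, or on an empty match to avoid looping forever
    while (m := rx.match(text, pos)) is not None and m.end() > m.start():
        spans.append(m.span())
        pos = m.end()
    return spans


line = "aaaaaabaaaaaaa"
text = line + "\r\n" + line  # two lines with Windows CR LF line breaks
```

With this text, chained_matches("a", text) returns six spans, chained_matches("a", text, 7) returns seven spans ending just before the CR, and starting at text.index("\n") + 1 gives six spans again.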

But modifiers, non-capturing and capturing groups are well explained in most regex tutorials ! So, whatever happens, you need to "read this fucking manual" !!

You will probably be disappointed by your first attempts but you will progress ! No doubt about it !

Best Regards,

guy038

Ramanand Jhingade

@guy038 You are talking of what is available here: Notepad++ User Manual, right?

PeterJones

@Ramanand-Jhingade said in How to add a blank space and a word before every comma in some meta tags in every html file of a folder?:

You are talking of what is available here: Notepad++ User Manual, right?

Yes. Specifically, as I posted earlier:

Read the official NPP Searching / Regex docs and the forum’s Regular Expression FAQ.

guy038

Hi, @ramanand-jhingade

Yes and more precisely :

https://npp-user-manual.org/docs/searching/ about the general concept of searching
https://npp-user-manual.org/docs/searching/#regular-expressions about regular expressions

Another valuable site (and the reference !) is the one below, although it is not specifically devoted to the Boost regex library used within N++ :

https://www.regular-expressions.info/

And begin with https://www.regular-expressions.info/quickstart.html

Of course, you need two weeks, minimum, to feel at ease with simple goals and, let’s say, three months to become fluent enough in regex syntax to modify about 90 % of texts as you want to. For the remaining 10 %, you’re welcome here and we’ll probably find a suitable solution !

Note that, once you know the basic regex syntax, the provided solutions will be much more enlightening than they are now, without the necessary background !

BR

guy038

Ramanand Jhingade

@guy038 I was able to do what I wanted by using this Regex in the “Find All” field: (?-s)(?-i:<META\x20|(?!\A)\G).*?\Kcure and this cure\x20for in the “Replace in Files” field. Thanks for the idea. Now, a lot of people will find this page through search engines, so just for their convenience and information, please do let us know how to search for more than one string of characters and add a word with a space to the end of that string, where the search is limited to the Meta tags (it should not search the rest of the file).

Ramanand Jhingade

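SEARCH (?-s)(?-i:<META\x20|(?!\A)\G).*?\K(?-i:Homeopathic (treatment|doctor|clinic|specialist)|Homeopathy)

REPLACE $0\x20for

did not work for me!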

Reza Saputra

How about this bro?

Ramanand Jhingade

@Reza-Saputra I already searched for each word individually and replaced each with the code I posted for searching for the word “cure”, which was originally given by @guy038, that is (?-s)(?-i:<META\x20|(?!\A)\G).*?\Kcure and this cure\x20for
However, I can tell you that \h finds one horizontal whitespace character: tab or Unicode space separator, that is it!

guy038

Hello, @ramanand-jhingade,

You said :

please do let us know how to search for more than one string of characters and add a word with a space to the end of that string but where the search is limited to the Meta tags (it should not search the rest of the file)

Well, from the two parts of this previous post and from what I specifically wrote to @peterjones, here, you should have guessed how to do it !

Indeed, from the generic regex :

SEARCH (?-s)(?-i:BSR|(?!\A)\G).*?\K(?-i:FR)

REPLACE RR

where :

BSR is the Begin Search-region Regex expression to search BEFORE any FR string located in current line
FR is the Find Regex expression, which may be present once or several times in current line
RR is the Replace Regex expression, which replaces any FR expression found in current line

we can build up the suitable regex in order to look for, let’s say, 4 different words Word_1, Word_2, Word_3 and Word_4, in meta tags only, written in a single line and add, after each of them, the word WORD !

BSR ( Begin Search-region Regex ) = <META\x20
FR ( Find Regex ) = Word_1|Word_2|Word_3|Word_4
RR ( Replace Regex ) = $0\x20WORD

leading to the right regex S/R :

SEARCH (?-s)(?-i:<META\x20|(?!\A)\G).*?\K(?-i:Word_1|Word_2|Word_3|Word_4)

REPLACE $0\x20WORD
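When the list of words gets longer, the FR alternation can also be built programmatically. The Python sketch below (build_fr and add_word_after are hypothetical helpers) escapes each word with re.escape and sorts the words longest first. The ordering matters because alternation is tried from left to right, in Notepad++ as in Python: with cure|cures, the word cures is matched as cure, so the new word would land in the middle of it.

```python
import re


def build_fr(words: list[str]) -> str:
    """Join literal words into one alternation, longest first, metacharacters escaped."""
    return "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))


def add_word_after(text: str, words: list[str], word: str = "WORD") -> str:
    """After '<META ' on each line, append ' ' + word to every listed word found."""
    fr = re.compile(build_fr(words))
    out = []
    for line in text.splitlines(keepends=True):
        start = line.find("<META ")
        if start == -1:
            out.append(line)  # lines without a META tag are left alone
            continue
        # a function replacement inserts `word` literally, even if it contains '\'
        tail = fr.sub(lambda m: m.group(0) + " " + word, line[start:])
        out.append(line[:start] + tail)
    return "".join(out)
```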

Now, you said :

SEARCH (?-s)(?-i:<META\x20|(?!\A)\G).*?\K(?-i:Homeopathic (treatment|doctor|clinic|specialist)|Homeopathy)

REPLACE $0\x20for

did not work for me!

Well, maybe try this one :

REPLACE $0\x20for

If it does not work either, just send me your file by e-mail. Refer to this post, to get my temporary e-mail address !

BR

guy038

Ramanand Jhingade

@guy038 I already searched for each word individually and replaced each with the code you posted here first to search for the META tag and comma. I used it to search for the word, “Homeopathy”, with this RegEx: (?-s)(?-i:<META\x20|(?!\A)\G).*?\KHomeopathy and replaced it with this: Homeopathy\x20for so I will not bother you again.
I am glad you replied - I think you realised that doing this was beyond my present abilities and comprehension!