Regex: Delete only one instance of a string between two html tags (double quotes)
-
Hello. I have some HTML tags, for example:
<meta name="description" content="......"/>
As you can see, there are 2 double quotes, " + ": one at the start of the tag's content and one at the end. But in the example below, I have one (or I can have multiple) extra double quotes, apart from the two basic ones. How can I delete those extra double quotes?
<meta name="description" content="Kiel vi rilatigas vian juĝvaloron "al la kredoj esprimitaj de aliaj se vi ne pretas elporti la kostojn de misinterpretado de la " cirkonstancoj en kiuj okazas evento?"/>
I tried to use an old generic regex that @guy038 made:
(?-si:BSR|(?!\A)\G)(?s-i:(?!ESR).)*?\K(?-si:FR)
In my case, it becomes:
FIND:
(?-si:<meta name="description" content="|(?!\A)\G)(?s-i:(?!"/>).)*?\K(?-si:")
REPLACE BY:
(leave empty)
The problem is that this solution deletes all the double quotes except the first one, including the last one, which should not have been deleted.
-
Hard to understand.
Try: "This regex produced the output output1, but the output I want is output2."
-
@neil-schipper only those 2 double quotes (around the dots) are important:
<meta name="description" content="......"/>
-
Hi @robin-Cruise, @neil-schipper and All,
Given this text :
<meta name="description" content="Kiel vi rilatigas vian juĝvaloron "al la kredoj esprimitaj de aliaj se vi ne pretas elporti la kostojn de misinterpretado de la " cirkonstancoj en kiuj okazas evento?"/>
You used this regex :
FIND:
(?-si:<meta name="description" content="|(?!\A)\G)(?s-i:(?!"/>).)*?\K(?-si:")
So, after finding and deleting the two unwanted " characters, then, due to the \G feature, it first selects the remaining range cirkonstancoj en kiuj okazas evento?
When reading the last char ? of that range, the (?!"/>) condition is still verified. So, due to the \K syntax, it wrongly selects the last " char!
This case is special because the string to find is part of the ESR region too. The rule should be:
In single lines containing the <meta name="description" content=" string, delete any subsequent double quote that is not ending the tag. This gives this simple regex:
SEARCH
(?-si:<meta name="description" content="|(?!\A)\G).*?\K"(?!/>)
REPLACE
Leave EMPTY
Note that I keep only the No Single Line and Not Insensitive modifiers, (?-si), at the beginning of the regex and did not use any modifier afterwards, whereas you used the syntax .....(?s-i:(?!"/>).)*?.....
Then, each time, the .*? part represents the range of text to skip before catching the " char, and the ESR region becomes the final negative look-ahead (?!/>)
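To make the difference between the two approaches concrete, here is a small Python sketch. It is only an illustration under assumptions: Python's standard re module has neither \G nor \K, so just the part of each regex that decides whether the closing quote gets matched is reproduced, and the variable names are mine.

```python
import re

# Tail of the sample line that is left once the two stray quotes are gone.
tail = 'cirkonstancoj en kiuj okazas evento?"/>'

# Original approach: a tempered dot that refuses to consume the start of "/>,
# followed by the quote to find. The look-ahead is tested before each
# character the dot consumes, never at the quote itself.
tempered = re.compile(r'(?s)(?:(?!"/>).)*?"')

# Revised approach: a plain lazy dot, then a quote that must not be
# followed by />.
guarded = re.compile(r'.*?"(?!/>)')

print(tempered.match(tail) is not None)  # True: the closing quote is matched
print(guarded.match(tail) is not None)   # False: the closing quote is rejected
```

The first pattern reaches the closing quote because nothing forbids matching it; the second one rejects it because the look-ahead sits right after the quote.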
Now, the above regex S/R works only for lines containing <meta name="description" content="
Below is a regex which will find any double quote located between the usual " boundaries, in an HTML or XML file:
SEARCH
(?<!=\x20)(?<!=)"(?!>|/>|\x20>|\x20/>|\?>|\x20\?>|\x20\w+=)
Normally, this case should occur only in comments !
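This second pattern uses only fixed-width look-behinds and a look-ahead, so it should also run unchanged in Python's standard re module. The sketch below tries it on a made-up sample line (the sample and the variable names are assumptions, not taken from a real file):

```python
import re

# The second SEARCH pattern, copied unchanged.
STRAY_QUOTE = re.compile(
    r'(?<!=\x20)(?<!=)"(?!>|/>|\x20>|\x20/>|\?>|\x20\?>|\x20\w+=)'
)

# Hypothetical sample line with two stray quotes inside the content value.
sample = '<meta name="description" content="one "two" three"/>'

# Offsets of the quotes that the pattern considers stray.
stray_positions = [m.start() for m in STRAY_QUOTE.finditer(sample)]

# Deleting them leaves the attribute delimiters untouched.
cleaned = STRAY_QUOTE.sub('', sample)
print(cleaned)
```

The four delimiter quotes are skipped because each one is either preceded by = or followed by /> or by a space, a word and =. Only the two quotes around two are reported.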
Best Regards,
guy038
-
Thank you, @guy038
(?-si:<meta name="description" content="|(?!\A)\G).*?\K"(?!/>)
So, I extracted a new generic regex from your regex above.
This is the generic regex for search and replace:
(?-si:BSR|(?!\A)\G).*?\KFR(?!ESR)
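For anyone who wants the same behaviour outside Notepad++, here is a rough Python sketch of what this generic regex does. It is an approximation under assumptions: Python's re has no \G or \K, so a replacement callback is used instead, BSR, FR and ESR are treated as literal strings, and the function name and sample line are hypothetical.

```python
import re

def delete_fr_between(text, bsr, fr, esr):
    """Delete every FR found after BSR on the same line, unless ESR follows it.

    bsr, fr and esr are literal strings (escaped below); the parameter
    names simply mirror the placeholders of the generic regex.
    """
    # From BSR to the end of the line ('.' does not cross line breaks).
    line_tail = re.compile(re.escape(bsr) + r'.*')
    # FR not immediately followed by ESR, as in FR(?!ESR).
    stray = re.compile(re.escape(fr) + r'(?!' + re.escape(esr) + r')')

    def clean(match):
        # Keep BSR itself untouched, clean only what comes after it.
        head = match.group(0)[:len(bsr)]
        rest = match.group(0)[len(bsr):]
        return head + stray.sub('', rest)

    return line_tail.sub(clean, text)

# Hypothetical sample line with two extra quotes in the content value.
line = '<meta name="description" content="one "two three " four?"/>'
print(delete_fr_between(line, '<meta name="description" content="', '"', '/>'))
```

Lines that do not contain BSR are returned unchanged, and the quote just before /> is kept.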
For the second regex you made, I also tried to extract a generic version, but I can't figure it out…
-
Hi, @robin-cruise,
- Regarding the first regex, your equivalent generic regex is correct.
- However, we cannot find any generic regex related to my second regex! Indeed, it just finds any double-quote character when:
  - Some characters, before the " char, do not occur: (?<!=\x20)(?<!=)
  AND
  - Some characters, after the " char, do not occur: (?!>|/>|\x20>|\x20/>|\?>|\x20\?>|\x20\w+=)
BR
guy038
-
-
I tried to find a generic regex myself, from your regex. It works well, except that it doesn't work for " (double quotes), because that character is repeated in the tag construction. I replaced those extra quotes in the content of the tags with a word, like "BOOM", and it finds and replaces it well between the start and end tags.
These are the generic regexes for your second solution. They are almost the same, in short and long versions, and do the same thing: find and replace between the start and end tags.
(?<!=\x20)(?<!=)FR(?!>|ESR|\x20>|\x20/>|\?>|\x20\?>|BSR)
OR
(?<!=\x20)(?<!=)FR(?!>|ESR|\x20>|\x20/>|\?>|\x20\?>|\x20BSR)
OR
(?<!=\x20)(?<!=)FR(?!>|ESR|\x20\?>|\x20BSR)