REGEX: Combine the two lines on a single line
-
I'm trying to combine the two lines into a single line, but in such a way as to keep the tags.
<meta property="og:description" content="Комедия по пьесе А.Н.Островского.
«На бойком месте» не соскучишься. Тут тебе и кутёж, и грабёж, и коварство, и л"/>
My regex works, but I need a shorter version for the same tag:
FIND:
<meta property="og:description" content="(.*?)\s*(\n)(.*?)"/>
REPLACE BY: <meta property="og:description" content="\1\3"/>
-
@Hellena-Crainicu said in REGEX: Combine the two lines on a single line:
<meta property="og:description" content="Комедия по пьесе А.Н.Островского.
«На бойком месте» не соскучишься. Тут тебе и кутёж, и грабёж, и коварство, и л"/>
Why not do:
Search: (\h+content=")([^\R"]*)\R*([^\R"]*)
Replace: \1\2\3
Try it and think.
-
@Hellena-Crainicu said in REGEX: Combine the two lines on a single line:
I'm trying to combine the two lines into a single line, but in such a way as to keep the tags.
Another idea is:
Find What: <((?!/>)[^\r\n])+\K\R
Replace With: (leave this field empty)
So it will look for the start of a tag (<); if it reaches the end of the line without a closing tag (/>), that would mean a continuation line. Thus it removes the CR and/or LF. If the next line needs a space between it and the preceding line, add that space character to the Replace With field.
(?!/>)[^\r\n] means the next 2 characters cannot be a /> (closing tag); if they are not, then capture another character (so long as it isn't an end-of-line character). The + following will repeat this process. That way, if a / or a > occurs by itself in the text, it will not stop the regex.
As it uses the \K, you will need to click on “Replace All” for the regex to work.
Terry
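For anyone who wants to try the idea outside Notepad++, here is a minimal Python sketch of the same "unclosed tag on this line" test. It is a translation, not the Notepad++ expression itself: Python's built-in re module has no \K or \R, so the part to keep is captured in group 1 and \R is approximated as \r\n, \n, or \r. The function name and the sample text are made up for illustration.

```python
import re

# Approximation of <((?!/>)[^\r\n])+\K\R for Python's re module:
# - group 1 holds what \K would have kept in place
# - (?:\r\n|\n|\r) stands in for \R
CONTINUATION = re.compile(r'(<(?:(?!/>)[^\r\n])+)(?:\r\n|\n|\r)')

def join_continuation(text: str) -> str:
    """Drop the line break after a line that opens a tag but never reaches '/>'."""
    return CONTINUATION.sub(r'\1', text)

# Hypothetical sample: the first tag is split over two lines and contains
# a lone '/' and a lone '>', which must not stop the match.
sample = (
    '<meta content="1/2 > 0.\n'
    'second line"/>\n'
    '<meta name="x" content="closed on one line"/>\n'
)

result = join_continuation(sample)
print(result)
```

The tag that is already closed on its own line keeps its line break, because the lookahead stops the repetition at /> before any end-of-line character is reached.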
-
@mkupper said in REGEX: Combine the two lines on a single line:
Why not do:
Search: (\h+content=")([^\R"]*)\R*([^\R"]*)
Replace: \1\2\3
Try it and think.
The replace is not working; it doesn’t do anything…
-
@Terry-R said in REGEX: Combine the two lines on a single line:
<((?!/>)[^\r\n])+\K\R
Your regex is almost good, but the replacement only deletes the line between lines 1 and 3. It doesn’t combine the content of both lines.
-
@Hellena-Crainicu
Assuming your goal is to delete all newlines inside of the content attribute of a meta tag, I believe I have a solution for you:
Replace (?s-i)(?:<meta[^<>]*content\s*=\s*"|(?!\A)\G)[^"]*?\K\R with nothing.
This is based on guy038’s now-classic “replace-all-instances-of-X-between-Y-and-Z” regex, and works as follows:
- Start off by looking for the content attribute of a meta tag: <meta[^<>]*content\s*=\s*"
- Begin our match there OR wherever the last match ended, UNLESS we wrapped around to the beginning of the file: (?!\A)\G
- Consume as few characters as possible without crossing the close quote character of the content attribute, and then forget everything we matched: [^"]*?\K
- Find a newline character: \R
- And replace the newline character you found with nothing.
It will convert
<meta property="og:description" content="foo
bar
baz
more foo" />
<notmeta content="ignore this tag"/>
<meta property="og:description" content="also foo
bar
bar"/>
<meta content="keep fooing forever!
foo
foo" property="og:description" />
into the following:
<meta property="og:description" content="foobarbazmore foo" />
<notmeta content="ignore this tag"/>
<meta property="og:description" content="also foobarbar"/>
<meta content="keep fooing forever!foofoo" property="og:description" />
Note that all the newlines that aren’t in the content attribute of a meta tag have been preserved.
Just for fun, I tried this on 8 thousand lines of that repeated, and it was reasonably performant.
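If you ever need the same cleanup in a script rather than in Notepad++, here is a rough Python sketch. Note that it swaps in a different technique: Python's built-in re module has no \G or \K, so instead of chaining matches it captures the whole content value and strips the line breaks in a replacement callback. The function name and sample text are only illustrative assumptions.

```python
import re

# Group 1: a <meta ...> tag up to and including content="
# Group 2: the attribute value, up to (not including) the closing quote
META_CONTENT = re.compile(r'(<meta[^<>]*content\s*=\s*")([^"]*)')

# Stand-in for \R, which Python's re module does not provide
NEWLINE = re.compile(r'\r\n|\n|\r')

def strip_newlines_in_meta_content(html: str) -> str:
    """Remove line breaks inside the content attribute of meta tags only."""
    def _join(match: re.Match) -> str:
        return match.group(1) + NEWLINE.sub('', match.group(2))
    return META_CONTENT.sub(_join, html)

# Hypothetical sample: the <notmeta> tag must be left untouched.
sample = (
    '<meta property="og:description" content="foo\nbar\nbaz\nmore foo" />\n'
    '<notmeta content="ignore\nthis tag"/>\n'
    '<meta content="keep fooing forever!\nfoo\nfoo" property="og:description" />\n'
)

cleaned = strip_newlines_in_meta_content(sample)
print(cleaned)
```

Like the Notepad++ version, this assumes the content value is double-quoted and contains no " character of its own.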
-
@Hellena-Crainicu said in REGEX: Combine the two lines on a single line:
But doesn’t combine the content of both lines.
I see what the problem was. I was copying your example, and because your title was about combining the 2 lines, I was absentmindedly removing the (very important) empty line between. Then my testing was with just 2 lines.
So my regex just needs an additional \R at the end. So it becomes
<((?!/>)[^\r\n])+\K\R\R
Terry
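To see why the empty line matters, here is a small Python sketch comparing the one-break and two-break versions on text that has a blank line between the two halves. It is only an approximation of the Notepad++ expressions: Python's re module lacks \K and \R, so group 1 plays the role of \K and (?:\r\n|\n|\r) plays the role of \R. The sample text is made up.

```python
import re

EOL = r'(?:\r\n|\n|\r)'               # stand-in for \R
UNCLOSED = r'(<(?:(?!/>)[^\r\n])+)'   # group 1 stands in for the text before \K

one_break = re.compile(UNCLOSED + EOL)         # like ...\K\R
two_breaks = re.compile(UNCLOSED + EOL + EOL)  # like ...\K\R\R

# Hypothetical sample with an empty line between the two halves of the tag
sample = '<meta content="first line.\n\nsecond line"/>\n'

after_one = one_break.sub(r'\1', sample)
after_two = two_breaks.sub(r'\1', sample)

print(repr(after_one))
print(repr(after_two))
```

With a single break removed, the empty line disappears but the two halves still sit on separate lines; removing both breaks joins them.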
-
@Terry-R said in REGEX: Combine the two lines on a single line:
FIND:
<((?!/>)[^\r\n])+\K\R\R
REPLACE: (leave empty)
Thanks, works beautifully.
-
@Mark-Olson said in REGEX: Combine the two lines on a single line:
FIND:
(?s-i)(?:<meta[^<>]*content\s*=\s*"|(?!\A)\G)[^"]*?\K\R
REPLACE BY: (leave empty)
Super, thanks.