Regex: Delete group of many \\ on the same line
-
I need a regex to delete all text of the kind shown below.
My regex does not work well and I don't know why. Can anyone suggest another solution?
FIND:
\\\\u[0-9a-fA-F]{4}.*?\n{2,}
Replace by: (leave empty)
Text:
\\u5361\\u8def\\u91cc\\uff08kcal\\uff09\\n\\n\\n\\n\\n<p>\\u8102\\u80aa\\uff08g\\uff09\\n\\ n\\n\\n\\n<p>\\u78b3\\u6c34\\u5316\\u5408\\u7269\\uff08g\\uff09\\n\\n\\n\\n\\n< p>\\u86cb\\u767d\\u8d28\\uff08g\\uff09\\n\\n\\n\\n\\n<p>\\u81b3\\u98df\\u7ea4\\u7ef4\\uff08g \\uff09\\n\\n\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u82b1\\u751f\\ n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">567\\n\\n\\n\\n\\n<p> 49.2\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">16.1\\n\\n\\n\\n\\n <p style=\\\"text-align: center\\\">25,8\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\ „>8,5\\n\\n\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u674f\\u4ec1\\ n\\n\\n\\n\\n<p>579\\n\\n\\n\\n\\n<p>49,9\\n\\n\\n\\n\\ n<p>21,6\\n\\n\\n\\n\\n<p>21,2\\n\\n\\n\\n\\n<p>12,5\\n\\n\ \n\\n\\n\\n\\n<p>\\u8170\\u679c\\n\\n\\n\\n\\n<p>553\\n\\n\\ n\\n\\n<p>43,9\\n\\n\\n\\n\\n<p>30,2\\n\\n\\n\\n\\n<p>18,2\ \n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">3.3\\n\\n\\n\\n\\n\\ n\\n<p style=\\\"text-align: center\\\">\\u699b\\u5b50\\n\\n\\n\\n\\n<p>628\\n \\n\\n\\n\\n<p>60,8\\n\\n\\n\\n\\n<p>16,7\\n\\n\\n\\n\\n <p>14,9\\n\\n\\n\\n\\n<p>9,7\\n\\n\\n\\n\\n\\n\\n<p stil=\\ \"text-align: center\\\">\\u590f\\u5a01\\u5937\\u679c\\n\\n\\n\\n\\n<p>718\\n\\n\ \n\\n\\n<p>75,8\\n\\n\\n\\n\\n<p>13,8\\n\\n\\n\\n\\n<p>7,9 \\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">8.6\\n\\n\\n\\n\\n\ \n\\n<p style=\\\"text-align: center\\\">\\u78a7\\u6839\\u679c\\n\\n\\n\\n\\n<p> 691\\n\\n\\n\\n\\n<p>71,9\\n\\n\\n\\n\\n<p>13,9\\n\\n\\n\\ n\\n<p>9.2\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">9.6\\n\\n\\ n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u677e\\u5b50\\u4ec1\\n\\n\\n\ \n\\n<p>673\\n\\n\\n\\n\\n<p>68,4\\n\\n\\n\\n\\n<p>13,1\\n \\n\\n\\n\\n<p>13,7\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">3,7 \\n\\n\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u5f00\\u5fc3\\u679c\\ n\\n\\n\\n\\n<p>560\\n\\n\\n\\n\\n<p>45,3\\n\\n\\n\\n\\ n<p>27,2\\n\\n\\n\\n\\n<p>20,2\\n\\n\\n\\n\\n<p 
style=\\\"text-align :
-
FIND:
\\\\.*$
Replace by: (leave empty)
-
@Hellena-Crainicu I suspect the reason your first expression did not work is that your file may have Windows-style end-of-line marks, \r\n, rather than the Unix-style \n that your expression expects. If you use \R instead, it will match any style of end of line: Windows, Unix, and the old Macintosh style, which uses \r. Replacing the \n with \R gives:
\\\\u[0-9a-fA-F]{4}.*?\R{2,}
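The effect of the line-ending style can be checked outside the editor. The following is only a rough sketch in Python, not the editor's own engine: Python's built-in re module has no \R, so a hypothetical EOL alternation stands in for it, and the sample text and all names (ORIGINAL, PORTABLE, strip_block) are made up for illustration.

```python
import re

# Python's built-in re module has no \R, so this alternation stands in for it
# (an assumption for illustration; \R in the editor also covers a few rarer breaks).
EOL = r"(?:\r\n|\r|\n)"

ORIGINAL = r"\\\\u[0-9a-fA-F]{4}.*?\n{2,}"           # expects Unix line endings
PORTABLE = r"\\\\u[0-9a-fA-F]{4}.*?" + EOL + "{2,}"  # accepts any line-ending style

# Made-up sample: escaped text, one blank line, then a line to keep.
body = r"\\u5361\\u8def<p>"
windows_text = body + "\r\n\r\n" + "keep this\r\n"
unix_text = body + "\n\n" + "keep this\n"


def strip_block(pattern: str, text: str) -> str:
    """Delete every match of pattern from text."""
    return re.sub(pattern, "", text)


print(repr(strip_block(ORIGINAL, unix_text)))      # 'keep this\n'
print(repr(strip_block(ORIGINAL, windows_text)))   # unchanged: \n{2,} never sees two \n in a row
print(repr(strip_block(PORTABLE, windows_text)))   # 'keep this\r\n'
```

With CRLF line endings every \n is preceded by \r, so two consecutive \n never occur and the original pattern finds nothing to delete.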
The \R version of the expression matches from the first \\uxxxx to the end of the line, followed by two or more end-of-line marks. In other words, it matches your example data only if the line is followed by at least one blank line.
In your follow-up you used
\\\\.*$
which implies that you did not intend the pattern to include the end-of-line marks or to require the blank lines that follow. In that case this will work:
\\\\u[0-9a-fA-F]{4}.*$
As you probably noticed, though, the data also includes other \\ sequences, such as \\n (an escaped newline), plus some unusual ones such as \\ n, with a space between the \\ and the n. The underlying HTML that this is intended to decode to is badly formatted, so it is likely not worthwhile to parse the data fully. Your
\\\\.*$
just gets rid of all of it, which is probably good.
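To see how the two end-of-line-free expressions differ, here is a small Python sketch. It assumes Unix line endings and uses re.MULTILINE so that $ matches at the end of each line, roughly as it does in the editor; the three sample lines and the names NARROW and BROAD are hypothetical.

```python
import re

# re.MULTILINE lets $ match at the end of every line, not only at the end of the text.
NARROW = re.compile(r"\\\\u[0-9a-fA-F]{4}.*$", re.MULTILINE)  # starts at a \\uXXXX escape
BROAD = re.compile(r"\\\\.*$", re.MULTILINE)                   # starts at any \\ pair

# Made-up sample with Unix line endings.
text = "\n".join([
    "heading",
    r"\\u82b1\\u751f\\n\\n<p>567",  # begins with a \\uXXXX escape
    r"<p>49.2\\n\\ n<p>",           # only \\n and the odd "\\ n" sequence
])

narrow_result = NARROW.sub("", text)
broad_result = BROAD.sub("", text)

print(repr(narrow_result))  # the third line is left untouched
print(repr(broad_result))   # the third line is cut back to '<p>49.2'
```

The narrower pattern leaves lines that contain only \\n style sequences alone, while the broader one truncates every line at its first \\ pair.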