Regex: Delete group of many \\ on the same line

  • H
    Hellena Crainicu
    last edited by Dec 5, 2023, 10:46 AM

    I need a regex that deletes all text of the kind shown below.

    My regex does not work well, and I don't know why. Can anyone suggest another solution?

    FIND: \\\\u[0-9a-fA-F]{4}.*?\n{2,}
    Replace by: (Leave empty)

    Text:

    \\u5361\\u8def\\u91cc\\uff08kcal\\uff09\\n\\n\\n\\n\\n<p>\\u8102\\u80aa\\uff08g\\uff09\\n\\ n\\n\\n\\n<p>\\u78b3\\u6c34\\u5316\\u5408\\u7269\\uff08g\\uff09\\n\\n\\n\\n\\n< p>\\u86cb\\u767d\\u8d28\\uff08g\\uff09\\n\\n\\n\\n\\n<p>\\u81b3\\u98df\\u7ea4\\u7ef4\\uff08g \\uff09\\n\\n\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u82b1\\u751f\\ n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">567\\n\\n\\n\\n\\n<p> 49.2\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">16.1\\n\\n\\n\\n\\n <p style=\\\"text-align: center\\\">25,8\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\ „>8,5\\n\\n\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u674f\\u4ec1\\ n\\n\\n\\n\\n<p>579\\n\\n\\n\\n\\n<p>49,9\\n\\n\\n\\n\\ n<p>21,6\\n\\n\\n\\n\\n<p>21,2\\n\\n\\n\\n\\n<p>12,5\\n\\n\ \n\\n\\n\\n\\n<p>\\u8170\\u679c\\n\\n\\n\\n\\n<p>553\\n\\n\\ n\\n\\n<p>43,9\\n\\n\\n\\n\\n<p>30,2\\n\\n\\n\\n\\n<p>18,2\ \n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">3.3\\n\\n\\n\\n\\n\\ n\\n<p style=\\\"text-align: center\\\">\\u699b\\u5b50\\n\\n\\n\\n\\n<p>628\\n \\n\\n\\n\\n<p>60,8\\n\\n\\n\\n\\n<p>16,7\\n\\n\\n\\n\\n <p>14,9\\n\\n\\n\\n\\n<p>9,7\\n\\n\\n\\n\\n\\n\\n<p stil=\\ \"text-align: center\\\">\\u590f\\u5a01\\u5937\\u679c\\n\\n\\n\\n\\n<p>718\\n\\n\ \n\\n\\n<p>75,8\\n\\n\\n\\n\\n<p>13,8\\n\\n\\n\\n\\n<p>7,9 \\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">8.6\\n\\n\\n\\n\\n\ \n\\n<p style=\\\"text-align: center\\\">\\u78a7\\u6839\\u679c\\n\\n\\n\\n\\n<p> 691\\n\\n\\n\\n\\n<p>71,9\\n\\n\\n\\n\\n<p>13,9\\n\\n\\n\\ n\\n<p>9.2\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">9.6\\n\\n\\ n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u677e\\u5b50\\u4ec1\\n\\n\\n\ \n\\n<p>673\\n\\n\\n\\n\\n<p>68,4\\n\\n\\n\\n\\n<p>13,1\\n \\n\\n\\n\\n<p>13,7\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">3,7 \\n\\n\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u5f00\\u5fc3\\u679c\\ n\\n\\n\\n\\n<p>560\\n\\n\\n\\n\\n<p>45,3\\n\\n\\n\\n\\ n<p>27,2\\n\\n\\n\\n\\n<p>20,2\\n\\n\\n\\n\\n<p 
style=\\\"text-align :

    • H
      Hellena Crainicu @Hellena Crainicu
      last edited by Dec 5, 2023, 10:51 AM

      @Hellena-Crainicu

      FIND: \\\\.*$
      Replace by: (leave empty)

      • M
        mkupper @Hellena Crainicu
        last edited by Dec 5, 2023, 9:39 PM

        @Hellena-Crainicu I suspect your first expression did not work because your file has Windows end-of-line marks (\r\n) rather than the Unix-style \n that your expression expects. If you use \R instead, it will match any style of end of line: Windows (\r\n), Unix (\n), and old Macintosh (\r).

        If you replace the \n with \R, you get \\\\u[0-9a-fA-F]{4}.*?\R{2,}

        That will match from the first \\uxxxx to the end of the line, plus the two or more end-of-line marks that follow. In other words, it matches your example data only when the line is followed by at least one blank line.
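The \n versus \R point can be checked outside Notepad++ with a small Python sketch. Two things here are assumptions rather than part of the thread: Python's re module has no \R, so the group (?:\r\n|\r|\n) stands in for it, and the sample text is a made-up, shortened version of the posted data. Python's . also matches \r, which Notepad++'s does not by default, so this only illustrates the line-ending issue and is not an exact emulation of Notepad++.

```python
import re

# The expression from the first post, expecting Unix-style \n line endings.
ORIGINAL = re.compile(r"\\\\u[0-9a-fA-F]{4}.*?\n{2,}")

# Python has no \R, so an explicit "any line ending" group stands in for it.
ANY_EOL = re.compile(r"\\\\u[0-9a-fA-F]{4}.*?(?:\r\n|\r|\n){2,}")

# Hypothetical sample: one escaped line followed by one blank line.
unix_text = "keep\n" + r"\\u5361\\u8def<p>" + "\n\nafter\n"
windows_text = unix_text.replace("\n", "\r\n")

unix_original = ORIGINAL.sub("", unix_text)        # \n{2,} finds its two \n
windows_original = ORIGINAL.sub("", windows_text)  # never sees two \n in a row
windows_any_eol = ANY_EOL.sub("", windows_text)    # matches \r\n\r\n
```

With Windows line endings the original expression leaves the text untouched, while the variant that accepts any line ending removes the escaped line together with the blank line after it.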

        In your follow-up you used \\\\.*$, which implies you did not intend to include the end-of-line marks in the pattern, nor the followed-by-blank-line requirement.

        This will work: \\\\u[0-9a-fA-F]{4}.*$ though, as you probably noticed, the data also includes other \\ sequences, such as \\n (an escaped newline), plus some unusual things such as \\ n with a space between the \\ and the n. The underlying HTML that this is meant to decode to is badly formatted, so it is probably not worthwhile to parse the data fully.

        Your \\\\.*$ simply deletes everything from the first \\ to the end of the line, which is probably good.
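The difference between the two end-of-line expressions can be seen in a short Python sketch. The sample lines are hypothetical, loosely modelled on the posted data, and re.MULTILINE is needed in Python so that $ matches at the end of every line, which Notepad++ does by default.

```python
import re

# Broad: from the first double backslash to the end of the line.
STRIP_FROM_ESCAPE = re.compile(r"\\\\.*$", re.MULTILINE)

# Narrow: only starts at a \\uXXXX escape.
STRIP_FROM_UNICODE = re.compile(r"\\\\u[0-9a-fA-F]{4}.*$", re.MULTILINE)

# Hypothetical sample lines; the second has no \\uXXXX escape at all.
sample = "\n".join([
    r"title \\u82b1\\u751f\\n\\n<p>567",
    r"note \\ n\\n<p>49.2",
    "plain line",
])

broad = STRIP_FROM_ESCAPE.sub("", sample)    # trims both escaped lines
narrow = STRIP_FROM_UNICODE.sub("", sample)  # leaves the "note" line alone
```

The narrow expression skips any line whose first escape is something like \\ n or \\n, which is why the broad one is the simpler way to clear this kind of data.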
