    Regex: Delete group of many \\ on the same line

      Hellena Crainicu

      I need a regex that deletes all text of the kind shown below.

      My regex does not work well, and I don't know why. Can anyone suggest another solution?

      FIND: \\\\u[0-9a-fA-F]{4}.*?\n{2,}
      Replace by: (Leave empty)

      Text:

      \\u5361\\u8def\\u91cc\\uff08kcal\\uff09\\n\\n\\n\\n\\n<p>\\u8102\\u80aa\\uff08g\\uff09\\n\\ n\\n\\n\\n<p>\\u78b3\\u6c34\\u5316\\u5408\\u7269\\uff08g\\uff09\\n\\n\\n\\n\\n< p>\\u86cb\\u767d\\u8d28\\uff08g\\uff09\\n\\n\\n\\n\\n<p>\\u81b3\\u98df\\u7ea4\\u7ef4\\uff08g \\uff09\\n\\n\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u82b1\\u751f\\ n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">567\\n\\n\\n\\n\\n<p> 49.2\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">16.1\\n\\n\\n\\n\\n <p style=\\\"text-align: center\\\">25,8\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\ „>8,5\\n\\n\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u674f\\u4ec1\\ n\\n\\n\\n\\n<p>579\\n\\n\\n\\n\\n<p>49,9\\n\\n\\n\\n\\ n<p>21,6\\n\\n\\n\\n\\n<p>21,2\\n\\n\\n\\n\\n<p>12,5\\n\\n\ \n\\n\\n\\n\\n<p>\\u8170\\u679c\\n\\n\\n\\n\\n<p>553\\n\\n\\ n\\n\\n<p>43,9\\n\\n\\n\\n\\n<p>30,2\\n\\n\\n\\n\\n<p>18,2\ \n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">3.3\\n\\n\\n\\n\\n\\ n\\n<p style=\\\"text-align: center\\\">\\u699b\\u5b50\\n\\n\\n\\n\\n<p>628\\n \\n\\n\\n\\n<p>60,8\\n\\n\\n\\n\\n<p>16,7\\n\\n\\n\\n\\n <p>14,9\\n\\n\\n\\n\\n<p>9,7\\n\\n\\n\\n\\n\\n\\n<p stil=\\ \"text-align: center\\\">\\u590f\\u5a01\\u5937\\u679c\\n\\n\\n\\n\\n<p>718\\n\\n\ \n\\n\\n<p>75,8\\n\\n\\n\\n\\n<p>13,8\\n\\n\\n\\n\\n<p>7,9 \\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">8.6\\n\\n\\n\\n\\n\ \n\\n<p style=\\\"text-align: center\\\">\\u78a7\\u6839\\u679c\\n\\n\\n\\n\\n<p> 691\\n\\n\\n\\n\\n<p>71,9\\n\\n\\n\\n\\n<p>13,9\\n\\n\\n\\ n\\n<p>9.2\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">9.6\\n\\n\\ n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u677e\\u5b50\\u4ec1\\n\\n\\n\ \n\\n<p>673\\n\\n\\n\\n\\n<p>68,4\\n\\n\\n\\n\\n<p>13,1\\n \\n\\n\\n\\n<p>13,7\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">3,7 \\n\\n\\n\\n\\n\\n\\n<p style=\\\"text-align: center\\\">\\u5f00\\u5fc3\\u679c\\ n\\n\\n\\n\\n<p>560\\n\\n\\n\\n\\n<p>45,3\\n\\n\\n\\n\\ n<p>27,2\\n\\n\\n\\n\\n<p>20,2\\n\\n\\n\\n\\n<p 
style=\\\"text-align :

        Hellena Crainicu @Hellena Crainicu

        @Hellena-Crainicu

        FIND: \\\\.*$
        Replace by: (leave empty)

          mkupper @Hellena Crainicu

          @Hellena-Crainicu I suspect your first expression did not work because your file has Windows end-of-line marks (\r\n) rather than the Unix-style \n that your expression expects. If you use \R instead, it will match any style of end-of-line: Windows (\r\n), Unix (\n), and old Macintosh (\r).

          If you replace the \n with \R, the expression becomes \\\\u[0-9a-fA-F]{4}.*?\R{2,}

          That matches from the first \\uxxxx to the end of the line, followed by two or more end-of-line marks. In other words, the expression matches your example data only if it is followed by at least one blank line.
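The effect of the line-ending style can be checked outside Notepad++ with a small Python sketch. This is only an illustration under stated assumptions: Python's `re` module has no `\R`, so it is spelled out here as `(?:\r\n|\n|\r)`, and the sample strings and the helper name `strip_block` are made up for the demonstration.

```python
import re

# The expression from the first post, which expects Unix-style \n only.
ORIGINAL = r"\\\\u[0-9a-fA-F]{4}.*?\n{2,}"

# Same expression with \n swapped for an "any end-of-line" group.
# Python's re has no \R, so this group is a stand-in (an assumption of this sketch).
ANY_EOL = r"(?:\r\n|\n|\r)"
WITH_ANY_EOL = r"\\\\u[0-9a-fA-F]{4}.*?" + ANY_EOL + r"{2,}"


def strip_block(text: str, pattern: str) -> str:
    """Delete every match of pattern from text (hypothetical helper)."""
    return re.sub(pattern, "", text)


# Hypothetical sample data: one escaped line followed by two blank lines.
escaped = r"\\u5361\\u8def<p>"
crlf_text = "keep\r\n" + escaped + "\r\n\r\n\r\n" + "next"  # Windows line endings
lf_text = "keep\n" + escaped + "\n\n\n" + "next"            # Unix line endings

print(repr(strip_block(lf_text, ORIGINAL)))        # original works on Unix endings
print(repr(strip_block(crlf_text, ORIGINAL)))      # original leaves CRLF text untouched
print(repr(strip_block(crlf_text, WITH_ANY_EOL)))  # the any-EOL variant handles CRLF
```

With Windows line endings the `\n` characters are never adjacent (each is preceded by `\r`), which is why `\n{2,}` cannot match there.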

          In your follow-up you used \\\\.*$, which implies you did not intend the pattern to include the end-of-line marks or to require a following blank line.

          This will work: \\\\u[0-9a-fA-F]{4}.*$. As you probably noticed, though, the data also includes other \\ sequences, such as \\n (an escaped newline), plus some unusual things such as \\ n with a space between the \\ and the n. The underlying HTML that this is meant to decode to is badly formatted, so it is probably not worthwhile to parse the data fully.

          Your \\\\.*$ just gets rid of all of it, which is probably good.
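The difference between the two end-of-line expressions can be sketched in Python as well. The assumptions here: `re.MULTILINE` is used so that `$` matches at the end of each line, the sample text uses Unix line endings, and the sample lines and the names `BROAD`, `NARROW`, and `text` are invented for illustration.

```python
import re

# Broad: delete from the first \\ on a line to the end of that line.
BROAD = re.compile(r"\\\\.*$", re.MULTILINE)

# Narrow: only start deleting at a \\uXXXX escape.
NARROW = re.compile(r"\\\\u[0-9a-fA-F]{4}.*$", re.MULTILINE)

# Hypothetical sample: one line with a \\u escape, one line with only \\n
# sequences, and one plain line.
text = (
    "title " + r"\\u5361\\n<p>" + "\n"
    + r"<p>49,9\\n\\n" + "\n"
    + "plain\n"
)

broad_result = BROAD.sub("", text)
narrow_result = NARROW.sub("", text)

print(repr(broad_result))   # both escaped tails are removed
print(repr(narrow_result))  # the line with only \\n sequences is left alone
```

The narrow form leaves lines that contain only `\\n` style sequences untouched, which is the case where the broad form is more thorough.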

          The Community of users of the Notepad++ text editor.