
    How to delete ANSI characters such as Ââ in multiple UTF-8 files?

    • Neculai I. Fantanaru

      Hello, I have a lot of ANSI characters such as Â and â in multiple UTF-8 files. How can I delete them?

      I tried a regex find-and-replace in Find in Files, but I did not succeed, because the files are in UTF-8.

      In UTF-8, Â and â look like this, and I cannot copy these symbols to use them in a replacement:

      eb704964-ae9f-4108-9294-8a7e75e5b217-image.png

      • Neculai I. Fantanaru @Neculai I. Fantanaru

        I found this Python code HERE, but it is a little tricky. I must have plain ANSI text pages, not UTF-8.

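        import re

        # Sample text containing ANSI escape sequences (terminal color codes)
        text = "\t\u001b[0;35mgoogle.com\u001b[0m \u001b[0;36m216.58.218.206\u001b[0m"
        print("Original Text: ", text)
        # Match ESC followed by everything up to and including the next "m"
        reaesc = re.compile(r'\x1b[^m]*m')
        new_text = reaesc.sub('', text)
        print("New Text: ", new_text)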
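Note that the pattern above targets ANSI escape sequences (terminal color codes such as `\x1b[0;35m`), which are a different thing from the letters Â and â. A minimal sketch of the same `re.sub` approach applied to those two characters follows. It assumes the text has already been decoded into a Python string and that the characters are stored in precomposed form; the name `strip_stray_chars` is hypothetical.

```python
import re

# Character class matching the two stray letters from the question.
STRAY = re.compile("[Ââ]")


def strip_stray_chars(text):
    """Return text with every Â and â removed."""
    return STRAY.sub("", text)


print(strip_stray_chars("Â© 2020 sample â text"))
```

This deletes the characters outright; pass `" "` instead of `""` to `sub` if a space should be left in their place.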
        
        • Neculai I. Fantanaru

          Another solution I found on the internet is to use uni2ascii.

          This is the command:

          uni2ascii -B input.txt >output.txt
          

          After converting the file to ANSI, you can do the replacement with PowerShell:

          $a = Get-Content -Path "c:\Folder3\test.txt"
          $a.Replace("Â"," ").Replace("â"," ") | Out-File -FilePath "c:\Folder3\new.txt" -Encoding utf8
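The same replacement can also be sketched in Python directly on the UTF-8 files, without converting them to ANSI first and for a whole folder at once. This is only a sketch under assumptions: the function name `clean_folder`, the `*.txt` pattern, and the default character set are hypothetical choices, and the files are rewritten in place, so it is safer to try it on a copy of the folder first.

```python
from pathlib import Path


def clean_folder(folder, pattern="*.txt", chars="Ââ"):
    """Delete the given characters from every matching UTF-8 file.

    Returns the list of paths that were actually changed.
    """
    table = str.maketrans("", "", chars)  # translation table that deletes chars
    changed = []
    for path in sorted(Path(folder).rglob(pattern)):
        if not path.is_file():
            continue
        # newline="" keeps the original line endings untouched.
        with open(path, encoding="utf-8", newline="") as src:
            original = src.read()
        cleaned = original.translate(table)
        if cleaned != original:
            with open(path, "w", encoding="utf-8", newline="") as dst:
                dst.write(cleaned)
            changed.append(path)
    return changed
```

For example, `clean_folder(r"c:\Folder3")` would process every `.txt` file under that folder and its subfolders. Unlike the PowerShell line above, this removes the characters instead of replacing them with a space.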
          
          • Alan Kilborn @Neculai I. Fantanaru

            @Neculai-I-Fantanaru

            "Another solution I found on the internet is to use uni2ascii"

            Off-topic. Please refrain from posting off-topic information here.

            The Community of users of the Notepad++ text editor.