
    How to delete ANSI characters such as Â and â in multiple UTF-8 files?

      Neculai I. Fantanaru

      Hello, I have a lot of ANSI characters such as Â and â in multiple UTF-8 files. How can I delete them?

      I tried a regex find-and-replace with Find in Files, but it did not work because the files are in UTF-8.

      In UTF-8, Â and â look like this, and I cannot copy these symbols to use them in a replacement:

      [Screenshot: how Â and â appear in the UTF-8 files]
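Since Â is U+00C2 and â is U+00E2, they can be matched by code point instead of being copied into the search box. A minimal Python sketch of that idea (the helper name `remove_a_circumflex` is hypothetical) deletes both characters from a string. It removes every Â and â, including legitimate ones such as the â in the French word "pâte", so it assumes the files contain no such words.

```python
import re

# Regex character class written with code points, so the symbols never
# have to be copied or typed: \u00C2 is Â and \u00E2 is â.
STRAY_CHARS = re.compile(r"[\u00C2\u00E2]")


def remove_a_circumflex(text: str) -> str:
    """Delete every Â and â from text (hypothetical helper name)."""
    return STRAY_CHARS.sub("", text)


print(remove_a_circumflex("Hello\u00c2 world\u00e2"))  # prints: Hello world
```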

        Neculai I. Fantanaru @Neculai I. Fantanaru

        I found this Python code here, but it is a little tricky: it seems to work only on ANSI text pages, not UTF-8.

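        import re
        # Sample string containing terminal color codes of the form ESC [ ... m
        text = "\t\u001b[0;35mgoogle.com\u001b[0m \u001b[0;36m216.58.218.206\u001b[0m"
        print("Original Text: ", text)
        # Match ESC (\x1b), then any characters other than 'm', up to and including the final 'm'
        reaesc = re.compile(r'\x1b[^m]*m')
        # Delete every matched escape sequence, leaving only the visible text
        new_text = reaesc.sub('', text)
        print("New Text: ", new_text)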
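The regex in this snippet removes ANSI terminal escape sequences (ESC followed by `[...m`, used for colored console output). These are unrelated to the Windows "ANSI" code page, so the pattern will not touch Â or â. Sequences such as "Â " or "â€™" usually appear when UTF-8 bytes are decoded as Windows-1252 (mojibake). If that is the cause, a different technique, re-encoding the text as cp1252 and decoding it as UTF-8, can restore the original characters instead of deleting them. A minimal sketch, assuming the damage came from a single cp1252 misdecode; `fix_mojibake` is a hypothetical name:

```python
import re

# Same pattern as the snippet above: ESC, then anything up to the next 'm'
ANSI_ESCAPE = re.compile(r"\x1b[^m]*m")


def fix_mojibake(text: str) -> str:
    """Undo one UTF-8 -> Windows-1252 misdecode (hypothetical helper).

    Returns the text unchanged when it cannot be re-encoded as cp1252
    or the resulting bytes are not valid UTF-8.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except UnicodeError:
        return text


broken = "It\u00e2\u20ac\u2122s 10\u00c2\u00a0km"  # displays as "Itâ€™s 10Â km"
print(ANSI_ESCAPE.sub("", broken) == broken)  # True: the ANSI regex changes nothing
print(fix_mojibake(broken) == "It\u2019s 10\u00a0km")  # True: ’ and the no-break space restored
```

Text that is already correct (for example "café") normally fails the UTF-8 decode step and is returned unchanged, but this is a heuristic, so checking a few files by hand first is a good idea.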
        
          Neculai I. Fantanaru

          Another solution I found on the internet is to use uni2ascii.

          This is the command:

          uni2ascii -B input.txt >output.txt
          

          After converting the file to ANSI, you can do the replacement with PowerShell:

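          # Get-Content returns one string per line; .Replace is applied to each line,
          # turning Â and â into spaces, and the result is saved as UTF-8
          $a = Get-Content -Path "c:\Folder3\test.txt"
          $a.Replace("Â", " ").Replace("â", " ") | Out-File -FilePath "c:\Folder3\new.txt" -Encoding utf8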
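The PowerShell commands above handle one file at a time. To process many UTF-8 files at once, a minimal Python sketch could look like the one below. The `*.txt` pattern and the helper names are hypothetical placeholders; like the PowerShell version, it replaces Â and â with a space, but it assumes every matching file is valid UTF-8 and it overwrites the files in place, so keep a backup.

```python
from pathlib import Path

# Replace each stray character with a space, mirroring the PowerShell
# .Replace("Â"," ").Replace("â"," ") calls. \u00c2 is Â, \u00e2 is â.
REPLACEMENTS = str.maketrans({"\u00c2": " ", "\u00e2": " "})


def clean_text(text: str) -> str:
    """Return text with every Â and â replaced by a space."""
    return text.translate(REPLACEMENTS)


def clean_folder(folder: str, pattern: str = "*.txt") -> int:
    """Rewrite every file under folder matching pattern; return how many changed."""
    changed = 0
    for path in Path(folder).rglob(pattern):
        if not path.is_file():
            continue
        # Work on raw bytes so line endings and any UTF-8 BOM are preserved
        original = path.read_bytes().decode("utf-8")
        cleaned = clean_text(original)
        if cleaned != original:
            path.write_bytes(cleaned.encode("utf-8"))  # overwrites in place
            changed += 1
    return changed
```

For example, `clean_folder(r"c:\Folder3")` would process every `.txt` file in that folder and its subfolders and return how many were modified. A file that is not valid UTF-8 raises `UnicodeDecodeError` and stops the run.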
          
            Alan Kilborn @Neculai I. Fantanaru

            @Neculai-I-Fantanaru

            Another solution I found on the internet is to use uni2ascii.

            Off-topic. Please refrain from posting off-topic information here.

            The Community of users of the Notepad++ text editor.