How to delete ANSI characters such as Ââ in multiple UTF-8 files?

Neculai I. Fantanaru

Hello, I have a lot of ANSI characters such as Â and â in multiple UTF-8 files. How can I delete them?

I tried a regex find and replace in Find in Files, but it did not work because the files are in UTF-8.

In UTF-8, Â and â are displayed differently, and I cannot copy these symbols to use them in a replacement.

Neculai I. Fantanaru

I found this Python code HERE, but it is a little bit tricky. I need plain ANSI text files, not UTF-8.

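import re

# Sample text containing ANSI escape sequences (terminal color codes)
text = "\t\u001b[0;35mgoogle.com\u001b[0m \u001b[0;36m216.58.218.206\u001b[0m"
print("Original Text: ", text)
# Match ESC followed by everything up to and including the next 'm'
reaesc = re.compile(r'\x1b[^m]*m')
new_text = reaesc.sub('', text)
print("New Text: ", new_text)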
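Note that the regex above targets ANSI escape sequences (terminal color codes), not letters such as Â and â. A minimal sketch of removing those two characters directly might look like the following, assuming the text has already been decoded from UTF-8 into a Python string. The names STRAY_CHARS and strip_stray_chars are made up for illustration.

```python
import re

# Character class matching the literal characters Â (U+00C2) and â (U+00E2).
STRAY_CHARS = re.compile("[\u00c2\u00e2]")

def strip_stray_chars(text: str) -> str:
    """Return text with every Â and â removed."""
    return STRAY_CHARS.sub("", text)

# In UTF-8 each of these characters is stored as two bytes,
# which is why a byte-oriented ANSI search does not find them.
print("Â".encode("utf-8"), "â".encode("utf-8"))

sample = "Â Hello â world"
print(repr(strip_stray_chars(sample)))
```

Only the two characters themselves are deleted, so any spaces around them stay in place. If they should become spaces instead, pass " " as the replacement.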

Neculai I. Fantanaru

Another solution I found on the internet is to use uni2ascii.

This is the command:

uni2ascii -B input.txt >output.txt

After converting the file to ANSI, you can do the replacement with PowerShell:

$a = Get-Content -Path "c:\Folder3\test.txt"
$a.Replace("Â"," ").Replace("â"," ") | Out-File -FilePath "c:\Folder3\new.txt" -Encoding utf8
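For several files at once, the same read, replace and write steps could be sketched in Python without converting to ANSI first. This is only a sketch under assumptions: the files are valid UTF-8, they match a *.txt pattern, and overwriting them in place is acceptable. The function names clean_file and clean_folder are hypothetical, and unlike the PowerShell line above, this version deletes the characters rather than replacing them with spaces.

```python
from pathlib import Path

def clean_file(path: Path, chars: str = "\u00c2\u00e2") -> bool:
    """Remove the given characters from a UTF-8 file in place.

    Returns True if the file was changed, False otherwise.
    """
    original = path.read_text(encoding="utf-8")
    # Mapping a code point to None in str.translate deletes that character.
    cleaned = original.translate({ord(c): None for c in chars})
    if cleaned != original:
        path.write_text(cleaned, encoding="utf-8")
        return True
    return False

def clean_folder(folder: str, pattern: str = "*.txt") -> list:
    """Clean every matching file in folder; return the paths that changed."""
    return [p for p in sorted(Path(folder).glob(pattern)) if clean_file(p)]

# Example call (the folder path is a placeholder, as in the PowerShell snippet):
# changed = clean_folder(r"c:\Folder3")
```

It is worth trying this on a copy of the folder first, since the files are overwritten.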

Alan Kilborn

@Neculai-I-Fantanaru

Another solution I found on the internet is to use uni2ascii.

Off-topic. Please refrain from posting off-topic information here.