How to delete ANSI characters such as Ââ in multiple UTF-8 files?
-
Hello, I have a lot of ANSI characters such as Â|â in multiple UTF-8 files. How can I delete them? I tried a regex find and replace in Find in Files, but I did not succeed, because the files are in UTF-8. In UTF-8, Â and â look different, and I cannot copy these symbols to make a replacement.
I found this Python code HERE, but it is a little bit tricky. I must have only ANSI text pages, not UTF-8.
import re

text = "\t\u001b[0;35mgoogle.com\u001b[0m \u001b[0;36m216.58.218.206\u001b[0m"
print("Original Text: ", text)

# Match ANSI escape sequences: ESC followed by everything up to the next "m"
reaesc = re.compile(r'\x1b[^m]*m')
new_text = reaesc.sub('', text)
print("New Text: ", new_text)
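A short sketch of why the snippet above may not help with this problem: its pattern targets ANSI escape sequences (ESC ... m), while Â and â are ordinary letters that UTF-8 stores as two bytes each. The sample string here is made up for illustration, and the characters are written as \u escapes so the code does not depend on how this page is encoded.

```python
import re

# Same pattern as in the snippet above: ESC, then everything up to the next "m".
ansi_escape = re.compile(r'\x1b[^m]*m')

# Hypothetical sample text containing Â (U+00C2) and â (U+00E2).
sample = "price\u00c2 10\u00e2"

# The escape-sequence pattern finds nothing to remove in this text.
unchanged = ansi_escape.sub('', sample)

# In UTF-8 each of these letters is stored as two bytes.
bytes_upper = "\u00c2".encode("utf-8")  # Â
bytes_lower = "\u00e2".encode("utf-8")  # â

# A character class applied to the decoded text removes both letters directly.
cleaned = re.sub("[\u00c2\u00e2]", "", sample)

print(repr(unchanged))
print(bytes_upper, bytes_lower)
print(repr(cleaned))
```

So once the file is read as UTF-8, the two characters can be matched like any other letter, without converting the file to ANSI first.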
-
Another solution I found on the internet is to use uni2ascii.
This is the command:
uni2ascii -B input.txt >output.txt
After converting the file to ANSI, you can make the replacement with PowerShell:
$a = Get-Content -Path "c:\Folder3\test.txt"
$a.Replace("Â"," ").Replace("â"," ") | Out-File -FilePath "c:\Folder3\new.txt" -Encoding utf8
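If Python is available, the same kind of replacement can be sketched for a whole folder of files, reading them as UTF-8 directly. This is only a minimal sketch under some assumptions: `clean_text` and `clean_folder` are names made up for illustration, the files are assumed to be valid UTF-8, the characters are deleted rather than replaced with a space, and the files are overwritten in place, so it is safer to try it on a copy first.

```python
from pathlib import Path

# Characters to delete: Â (U+00C2) and â (U+00E2), written as escapes.
UNWANTED = str.maketrans("", "", "\u00c2\u00e2")


def clean_text(text: str) -> str:
    """Return text with every Â and â removed."""
    return text.translate(UNWANTED)


def clean_folder(folder: str, pattern: str = "*.txt") -> int:
    """Rewrite matching UTF-8 files in place; return how many files changed."""
    changed = 0
    for path in Path(folder).glob(pattern):
        if not path.is_file():
            continue
        original = path.read_text(encoding="utf-8")
        cleaned = clean_text(original)
        # Only touch files that actually contain the unwanted characters.
        if cleaned != original:
            path.write_text(cleaned, encoding="utf-8")
            changed += 1
    return changed
```

For example, `clean_folder(r"c:\Folder3")` would process every .txt file in the folder from the PowerShell example, and a different glob pattern such as "*.html" can be passed as the second argument.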
-
Another solution I found on the internet is to use uni2ascii
Off-topic. Please refrain from posting off-topic information here.