Haha! Thank you for the detailed answers. I was hoping for a simpler solution than regex, as I understand very little about it. However, it went surprisingly well…
It’s a very basic solution, not 100% accurate, but close enough. I edited by hand the dozen that were left next to an HTML tag (<a>, <li>, <i>).
I’ll leave it here in case somebody else has the same need in the future:
Step 0: Make backup!
Step 1:
(\s)" replaced by \1“ (catches all opening quotes after a space, so no HTML “code” should be hit, and replaces them; the space is captured and put back with \1, since \s only works on the search side)
Step 2:
“([^"]*)" replaced by “\1” (replaces closing quotes: the first straight quote left after each opening quote)
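For anyone who would rather script this than run it in an editor, here is a rough Python sketch of the two steps above. The function name curl_double_quotes is made up for the example. It assumes the whitespace is captured in a group (Python's re.sub does not expand \s in a replacement) and that step 2 stops at the first straight quote after an opening quote.

```python
import re

def curl_double_quotes(text: str) -> str:
    """Hypothetical helper: curl straight double quotes in two passes."""
    # Step 1: a straight quote right after whitespace becomes an opening quote.
    # The whitespace is captured and re-inserted with \1.
    text = re.sub(r'(\s)"', r'\1“', text)
    # Step 2: the first straight quote after an opening quote becomes the closing quote.
    text = re.sub(r'“([^"]*)"', r'“\1”', text)
    return text

sample = 'He said "hello" to <a href="x.html">me</a>.'
print(curl_double_quotes(sample))
# He said “hello” to <a href="x.html">me</a>.
```

The attribute quotes in href="x.html" survive because neither of them follows a space. A quote at the very start of the text, or an attribute written with spaces around the equals sign, would not behave as nicely, so the backup from step 0 still applies.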
I use the French version:
Step 1:
(\s)" replaced by \1«
Step 2:
«([^"]*)" replaced by «\1 »
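The French variant only changes the two marks, so the same idea can be written once with the marks as parameters. Again this is a sketch, not a tested tool: replace_quotes and its arguments are invented for the example, and the closing mark is passed with its leading space exactly as in the replacement above.

```python
import re

def replace_quotes(text: str, opening: str, closing: str) -> str:
    """Hypothetical helper: swap straight double quotes for the given marks."""
    # Step 1: straight quote after whitespace -> opening mark (whitespace kept).
    text = re.sub(r'(\s)"', lambda m: m.group(1) + opening, text)
    # Step 2: first straight quote after an opening mark -> closing mark.
    pattern = re.escape(opening) + r'([^"]*)"'
    text = re.sub(pattern, lambda m: opening + m.group(1) + closing, text)
    return text

print(replace_quotes('Il a dit "bonjour" hier.', '«', ' »'))
# Il a dit «bonjour » hier.
print(replace_quotes('He said "hello" today.', '“', '”'))
# He said “hello” today.
```

Passing the marks in keeps the two regexes in one place, so any spacing preference (for instance a non-breaking space before ») becomes a change to one argument.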
And for the apostrophes:
([a-zA-Z]|[à-ü]|[À-Ü])'([a-zA-Z]|[à-ü]|[À-Ü]) replaced by \1’\2 (every apostrophe between two letters, to avoid hitting things in JavaScript; accented letters are included for French)
Then by hand, I replaced the few that were left (no regex):
'<i> --> ’<i>
'<a --> ’<a
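The apostrophe step and the two manual fixes can be sketched in Python as well. The names LETTER, curl_apostrophes and fix_before_tags are made up here. The character class is the same set of ranges as above, merged into one class, and the tag fix is a blunt literal replacement that assumes '<i> and '<a never occur inside a script.

```python
import re

# Same ranges as in the search pattern above, merged into one character class.
LETTER = r"[a-zA-Zà-üÀ-Ü]"

def curl_apostrophes(text: str) -> str:
    """Hypothetical helper: curl an apostrophe that sits between two letters."""
    return re.sub(rf"({LETTER})'({LETTER})", r"\1’\2", text)

def fix_before_tags(text: str) -> str:
    """Hypothetical helper: the manual fixes, done as plain (non-regex) replacements."""
    return text.replace("'<i>", "’<i>").replace("'<a", "’<a")

sample = "L'été, j'ai dit: alert('x'); l'<i>ami</i>"
print(fix_before_tags(curl_apostrophes(sample)))
# L’été, j’ai dit: alert('x'); l’<i>ami</i>
```

The quotes in alert('x') are left alone because each of them touches a bracket rather than a letter on one side.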
@ Mark Olson
No idea what a parser is, nor XML… Anyway my typography is saved now.