@PeterJones said in Search accented and non-accented characters alike with one simple setting?:
the near-impossible task that @Coises hinted at
In a C++ plugin, not near-impossible, just tedious. In anything other than C++, maybe near-impossible.
it might not handle all cases, but it’d definitely handle the single-codepoint accented characters, even if it doesn’t handle all the cases with combining-accent characters. My test with [[=a=]] shows that it matches just the a when my doc is a followed by U+0301 Combining Acute (á), but it obviously matches the single-character U+00E1 á. My guess is that most of the people who have been asking for accent-insensitive searching are just using simple single-character accented characters rather than the combining versions, but that is just a guess.
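It could be that (?=[[=a=]])\X would catch most if not all of the combining cases and not add false positives: the lookahead checks that the base character is equivalent to a, and \X then consumes that character together with any combining marks that follow it. Matching the full character is important because you’d want to string characters together, and otherwise the intervening combining marks would make the match fail.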
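To make the "intervening combining marks" problem concrete, here is a small sketch in plain Python using only the standard `unicodedata` module. It does not use Boost regex, so it illustrates the data rather than the search engine; the helper name `strip_marks` is made up for this example.

```python
import unicodedata

def strip_marks(text):
    """Decompose text (NFD) and drop combining marks, keeping base characters."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# "résumé" written with single-codepoint U+00E9 characters
precomposed = "r\u00e9sum\u00e9"
# The same word written as base letters followed by U+0301 combining acute
decomposed = unicodedata.normalize("NFD", precomposed)

# The combining marks sit between the base letters, so a naive
# letter-by-letter match for "resume" cannot run straight through.
found_naively = "resume" in decomposed

# Once the marks are skipped, the base letters line up again.
found_after_stripping = "resume" in strip_marks(decomposed)
```

Here `found_naively` is false and `found_after_stripping` is true, which is the reason each base character needs to be matched together with its trailing marks.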
And once @Coises added the simple version, I am sure he would be inundated with requests to make it handle the combining cases, and might not like that.
If I get into this, I will almost certainly go the iterator route. The modify-the-search-string route is plausible, though, for someone who might want to tackle this in Python Script, or probably anything other than a C++ plugin calling Boost::regex directly.
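For anyone who wants to try the modify-the-search-string route, a minimal sketch of the rewriting step might look like the following. It is untested against Boost::regex or Python Script: the function name `accent_insensitive_pattern` and the metacharacter list are my own assumptions, and it only builds the pattern string, it does not run the search.

```python
# Characters assumed to need a backslash when used literally in the regex.
REGEX_META = set("\\^$.|?*+()[]{}")

def accent_insensitive_pattern(search_text):
    """Rewrite plain search text so each letter becomes (?=[[=x=]])\\X.

    Letters are wrapped in an equivalence-class lookahead followed by \\X;
    everything else is kept literal, escaped if it is a metacharacter.
    """
    parts = []
    for ch in search_text:
        if ch.isalpha():
            # Lookahead tests the base character, \X consumes it plus marks.
            parts.append("(?=[[=" + ch + "=]])\\X")
        elif ch in REGEX_META:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)

pattern = accent_insensitive_pattern("a.b")
```

For the input `a.b` this builds `(?=[[=a=]])\X\.(?=[[=b=]])\X`. Whether that expression behaves well on real documents (case sensitivity, letters with no equivalence class, performance on long search strings) would still need checking in the actual search engine.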