Syntax highlighting certain Unicode characters

  • Hi. I’m proofreading some OCR’ed text in NPP and it contains some Cyrillic (among other things). Since several Cyrillic letters have exactly the same shape as certain Latin letters but, of course, different code points, I would like to syntax-highlight them to be sure what I am seeing. Is that possible, please?

  • NB. For the time being, I quasi-solved it by creating a delimiter style that highlights everything between { and }, and then running a regex replacement of [complete Cyrillic alphabet here]+ with {$0}, thus enclosing every Cyrillic character or word in { } braces. But this isn’t very nice, of course.
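
    For illustration only, the same brace-wrapping idea can be sketched in plain Python. This is not NPP's own regex engine; the function name wrap_cyrillic is made up for the example, and it assumes that every letter of interest lies in the basic Cyrillic block U+0400 to U+04FF, so a character range can stand in for the full alphabet.

```python
import re

# Assumption: all relevant letters are in the basic Cyrillic block (U+0400-U+04FF).
CYRILLIC_RUN = re.compile(r"[\u0400-\u04FF]+")


def wrap_cyrillic(text: str) -> str:
    """Enclose every run of Cyrillic characters in { } braces."""
    # \g<0> is the whole match, the Python counterpart of $0 in the replacement above.
    return CYRILLIC_RUN.sub(r"{\g<0>}", text)


if __name__ == "__main__":
    # The second letter of the word below is Cyrillic U+0430, not Latin "a".
    print(wrap_cyrillic("p\u0430per"))
```

    A range like this is shorter than listing the alphabet, but it would miss Cyrillic letters outside that block.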

  • This cannot be solved via UDL, afaik.
    Two possible solutions might be to mark the regex matches instead of replacing them (4th tab of the Find/Replace dialog)
    or to use a scripting plugin like PythonScript, LuaScript … to write your own quasi-lexer.
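
    As a rough idea of what the core of such a quasi-lexer would have to compute, here is a minimal sketch in plain Python. It deliberately uses no plugin API: cyrillic_spans is a hypothetical helper, and the U+0400 to U+04FF range is an assumption about which characters count as Cyrillic. It only reports where the Cyrillic runs are; applying a style at those positions would be the job of whatever the scripting plugin offers.

```python
import re
from typing import List, Tuple

# Assumption: "Cyrillic" means the basic Cyrillic block (U+0400-U+04FF).
CYRILLIC_RUN = re.compile(r"[\u0400-\u04FF]+")


def cyrillic_spans(text: str) -> List[Tuple[int, int]]:
    """Return (start, length) for every run of Cyrillic characters.

    Offsets are counted in characters, not bytes.
    """
    return [(m.start(), m.end() - m.start()) for m in CYRILLIC_RUN.finditer(text)]


if __name__ == "__main__":
    sample = "ab\u0432\u0433 c\u0434"
    for start, length in cyrillic_spans(sample):
        print(start, length, sample[start:start + length])
```

    Note that an editor may address text by byte position rather than character position, so these offsets would probably need converting before they could be used for styling.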

  • Sorry to hear that, @Ekopalypse, but actually, I know nothing about either Python or Lua, or about programming NPP. So, this won’t be solved. Thank you for letting me know.

  • @Láng-Attila-D

    I hope we haven’t misunderstood each other, but

    mark the regex matches instead of replacing them (4th tab of the Find/Replace dialog)

    doesn’t involve any programming and doesn’t manipulate your text either.

  • Oh, I see. I wasn’t aware of that function. Well, it’s nice, too. The only problem is that the marking doesn’t persist after reopening the document. Well, better than nothing. Thank you.

  • @Láng-Attila-D

    Strictly speaking, UDL highlighting doesn’t persist either; it just gets reapplied automatically.
    One further thing you can do is record a macro, save it, and assign it
    a shortcut; then you just have to press that shortcut and the regex marks get reapplied as well.