Unicode Combining Characters



  • Hello,

    When I enter a Unicode combining character like the overline U+0305 in Notepad++, I get the overline and the digit separately, looking like this:
    3 ̅
    Every other editor I tried, including this form field I’m typing in, shows them stacked on top of each other:

    (even if not well aligned).
    Even the stupid Windows stock editor is able to show it!
    How can I get Notepad++ to show it right, or if it is a bug, can it be fixed soon?



  • Notepad++ text rendering is based on Scintilla.
    The author of Scintilla also maintains the SciTE editor as a showcase for Scintilla.
    Post a feature request there. Once it works with SciTE it will also work in Notepad++.



  • Hello @gauner-berlin and All,

    Sorry for my late answer, but I was away from our site all September!

    The character Combining Overline, Unicode code point U+0305, is a mark added to the preceding character (generally a letter) to form a new glyph.

    This character belongs to the Unicode block named Combining Diacritical Marks, which contains 112 marks with code points from \x{0300} to \x{036F}. Each mark changes the glyph of the character located right before it. Refer to the link below to see what all these marks look like!

    http://www.unicode.org/charts/PDF/U0300.pdf
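
    For readers who prefer to inspect the block programmatically rather than through the PDF chart, here is a minimal Python sketch using only the standard `unicodedata` module. The helper name `combining_marks` is made up for this illustration, and the names returned depend on the Unicode version bundled with your Python build.

```python
import unicodedata

# Combining Diacritical Marks block: U+0300 through U+036F inclusive.
BLOCK_START, BLOCK_END = 0x0300, 0x036F


def combining_marks():
    """Return (code point, name, canonical combining class) for each mark."""
    rows = []
    for cp in range(BLOCK_START, BLOCK_END + 1):
        ch = chr(cp)
        rows.append((cp, unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch)))
    return rows


marks = combining_marks()
print(len(marks))  # 112

for cp, name, ccc in marks[:6]:
    # U+25CC (dotted circle) is the conventional placeholder base for a mark.
    print(f"U+{cp:04X}  \u25CC{chr(cp)}  {name}  (class {ccc})")
```

    Whether the marks land on top of the dotted circle in the printed output depends, again, on the font and renderer of whatever displays it.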

    However, whether a diacritical mark is correctly placed over or under an uppercase or lowercase letter depends, for the most part, on the font currently in use!


    So, after some tests, I could establish a few facts:

    • The diacritical marks from U+0300 to U+0304 and from U+0306 to U+030A, as well as the five marks U+030C, U+030F, U+0311, U+0323 and U+0325, are correctly attached to their associated letter in Notepad++, whatever the font used!

    • If you use a Unicode-oriented font such as Lucida Sans Unicode or Arial Unicode MS, then, overall, all the diacritical marks from U+0300 to U+0345 correctly modify the glyph of the preceding letter

    • I found a family of monospaced fonts (the Iosevka fonts) that is able to display all the diacritical marks, except for the 7 characters U+034F, U+0356, U+0359, U+035B, U+035C, U+035F and U+0362 :-))


    You can download all these monospaced fonts from the link below:

    https://github.com/be5invis/Iosevka/releases/tag/v1.11.1

    Personally, I chose packs 01 (sans serif) and 04 (serif). But you may also be interested in pack 07 or 09. Packs 01, 07 and 09 use different shapes for the lowercase letter i and for the digit 1

    Beware that the three packs 01, 07 and 09 are mutually exclusive and cannot be installed on the same system at the same time

    Finally, of all the variants I tested, I only extracted the Bold and Medium weights. The Bold font looks better when characters are small (I mean, after 3 Zoom Out actions from the default zoom!)

    The Iosevka fonts can reproduce 3318 glyphs. Below are all the Unicode blocks of the BMP (Unicode Basic Multilingual Plane) that these fonts cover, partially or totally

                                   Iosevka font / Iosevka-Slab font
    
        •--------•--------•------------------------------------------•---------------•
        |  Start |   End  |            Unicode Block Name            |  Num / Total  |
        •--------•--------•------------------------------------------•---------------•
        |  0000  |  007F  |  Basic Latin                             |   95  /  128  |
        |  0080  |  00FF  |  Latin-1 Supplement                      |   96  /  128  |
        |  0100  |  017F  |  Latin Extended-A                        |  128  /  128  |
        |  0180  |  024F  |  Latin Extended-B                        |  208  /  208  |
        |  0250  |  02AF  |  IPA Extensions                          |   96  /   96  |
        |  02B0  |  02FF  |  Spacing Modifier Letters                |   56  /   80  |
        |  0300  |  036F  |  Combining Diacritical Marks             |  108  /  112  |
        |  0370  |  03FF  |  Greek and Coptic                        |   89  /  135  |
        |  0400  |  04FF  |  Cyrillic                                |  178  /  256  |
        |  0500  |  052F  |  Cyrillic Supplement                     |    4  /   48  |
        |  1AB0  |  1AFF  |  Combining Diacritical Marks Extended    |    1  /   15  |
        |  1D00  |  1D7F  |  Phonetic Extensions                     |  106  /  128  |
        |  1D80  |  1DBF  |  Phonetic Extensions Supplement          |   40  /   64  |
        |  1DC0  |  1DFF  |  Combining Diacritical Marks Supplement  |   34  /   58  |
        |  1E00  |  1EFF  |  Latin Extended Additional               |  250  /  256  |
        |  1F00  |  1FFF  |  Greek Extended                          |  233  /  233  |
        |  2000  |  206F  |  General Punctuation                     |   45  /  111  |
        |  2070  |  209F  |  Superscripts and Subscripts             |   42  /   42  |
        |  20A0  |  20CF  |  Currency Symbols                        |    6  /   31  |
        |  2100  |  214F  |  Letterlike Symbols                      |   14  /   80  |
        |  2150  |  218F  |  Number Forms                            |   16  /   60  |
        |  2190  |  21FF  |  Arrows                                  |   16  /  112  |
        |  2200  |  22FF  |  Mathematical Operators                  |   67  /  256  |
        |  2300  |  23FF  |  Miscellaneous Technical                 |   76  /  251  |
        |  2460  |  24FF  |  Enclosed Alphanumerics                  |   97  /  160  |
        |  2500  |  257F  |  Box Drawing                             |  128  /  128  |
        |  2580  |  259F  |  Block Elements                          |   20  /   32  |
        |  25A0  |  25FF  |  Geometric Shapes                        |   33  /   96  |
        |  2600  |  26FF  |  Miscellaneous Symbols                   |   12  /  256  |
        |  2700  |  27BF  |  Dingbats                                |    7  /  192  |
        |  27C0  |  27EF  |  Miscellaneous Mathematical Symbols-A    |    4  /   48  |
        |  2800  |  28FF  |  Braille Patterns                        |  256  /  256  |
        |  2900  |  297F  |  Supplemental Arrows-B                   |    2  /  128  |
        |  2A00  |  2AFF  |  Supplemental Mathematical Operators     |    2  /  256  |
        |  2B00  |  2BFF  |  Miscellaneous Symbols and Arrows        |    7  /  206  |
        |  2C60  |  2C7F  |  Latin Extended-C                        |   15  /   32  |
        |  3000  |  303F  |  CJK Symbols and Punctuation             |    6  /   64  |
        |  A720  |  A7FF  |  Latin Extended-D                        |   10  /  159  |
        |  AB30  |  AB6F  |  Latin Extended-E                        |    5  /   54  |
        |  E000  |  F8FF  |  Private Use Area                        |   14  / 6400  |
        |  FB00  |  FB4F  |  Alphabetic Presentation Forms           |    2  /   58  |
        |  FF00  |  FFEF  |  Halfwidth and Fullwidth Forms           |  101  /  225  |
        •--------•--------•------------------------------------------•---------------•
    

    Best Regards,

    guy038



  • I don’t think it is only a font issue.
    The editor needs logic to detect that multiple Unicode code points form a single visible symbol.
    Can the overline symbol be deleted (with Backspace) without the letter, or are they fused together immediately after input?
    How should an overline be handled at the beginning of a line?
    With Linux gedit you can add multiple overline symbols to a single letter, e.g. 3 lines over a letter. I guess this rendering effort goes beyond just the font.
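
    To make the kind of logic meant here concrete, below is a deliberately simplified Python sketch that groups each base character with the combining marks that follow it. It is not how Scintilla or gedit work, and it is not the full grapheme cluster algorithm of Unicode Standard Annex #29; the function name `simple_clusters` and the rule for marks at the start of a line are assumptions made for this example.

```python
import unicodedata


def simple_clusters(text):
    """Group each base character with the combining marks that follow it.

    Simplification: any character whose general category is Mn, Mc or Me
    counts as a mark. A mark with no base before it (start of text or
    right after a line break) is kept as a cluster of its own.
    """
    clusters = []
    for ch in text:
        is_mark = unicodedata.category(ch) in ("Mn", "Mc", "Me")
        if is_mark and clusters and not clusters[-1].endswith(("\n", "\r")):
            clusters[-1] += ch          # attach the mark to the previous cluster
        else:
            clusters.append(ch)         # base character, or an orphan mark
    return clusters


# "3" with one overline, then "x" with three stacked overlines.
sample = "3\u0305x\u0305\u0305\u0305"
print([len(c) for c in simple_clusters(sample)])   # [2, 4]

# An overline at the very beginning has nothing to attach to.
print(len(simple_clusters("\u0305abc")))           # 4
```

    An editor that moved the caret or deleted by such clusters would treat a letter and its marks as one unit, whereas working per code point lets Backspace remove the overline alone. Which behaviour is preferable is exactly the design question raised above.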

    My belief is that the vast majority of the Notepad++ user base are developers who care mostly about English letters in monospaced fonts.
    It is nice that NPP has support for Unicode and some support for right-to-left text, but I think that such advanced features should have a very low priority and be evaluated very carefully prior to implementation because of the possible negative impact on ‘standard’ NPP usage.



  • Hi, @gstavi,

    Of course, I quite agree with you! N++ users don’t care, most of the time, about the good appearance of exotic diacritical marks.

    And yes, I noticed, for instance, that with Microsoft Word 2002 SP3 some common diacritical marks, such as U+0307, U+030F, U+0311 and U+0325, are not rendered with my Times New Roman font, although they are displayed well in Notepad++ with the same font!

    Note that when you write, in a true UTF-8 BOM encoded file within N++, the string ab with a diacritical mark on the letter a, that is to say the string âb, it is really a sequence of three independent characters, and when you select it, you do get a three-character selection!

    The letter a ( U+0061 ) + the combining circumflex accent ( U+0302 ) + the letter b ( U+0062 )

    You may search for any of them with the syntaxes \x{0061}, \x{0302} and \x{0062}, and when you hit the Right Arrow key repeatedly, you’ll see that there really are 3 characters!

    In Microsoft Word, correctly displayed letters with their associated diacritical mark are, logically, treated as a single character!

    Compare this with the simple â letter of the C1 Controls and Latin-1 Supplement Unicode block, which is a single character with Unicode code point U+00E2
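
    The difference between the three-character sequence and the single precomposed letter can also be checked outside Notepad++. The following is a small Python sketch using the standard `unicodedata` module; the variable names are chosen just for this illustration.

```python
import unicodedata

decomposed = "a\u0302b"    # a + COMBINING CIRCUMFLEX ACCENT + b
precomposed = "\u00E2b"    # LATIN SMALL LETTER A WITH CIRCUMFLEX + b

# Same appearance, different code point sequences.
print(len(decomposed), len(precomposed))    # 3 2
print(decomposed == precomposed)            # False

# Normalization converts between the two forms.
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True

# In UTF-8 the combining mark takes 2 bytes, so the sequence is 4 bytes long.
print(len(decomposed.encode("utf-8")), len(precomposed.encode("utf-8")))  # 4 3

# A digit with an overline has no precomposed form, so NFC leaves it alone.
overlined_three = "3\u0305"
print(len(unicodedata.normalize("NFC", overlined_three)))        # 2
```

    So normalizing to NFC can sidestep the rendering problem for letters that have a precomposed equivalent, but not for combinations such as the overlined 3 from the first post.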

    Cheers,

    guy038

