    Unicode Combining Characters

    • Gauner Berlin

      Hello,

      When I enter a Unicode combining character such as the overline (U+0305) in Notepad++, the overline and the digit are displayed separately, like this:
      3 ̅
      Every other editor I tried, including this form field I’m typing in, shows them stacked on top of each other:
      3̅
      (even if not well aligned).
      Even the stupid Windows stock editor is able to show it!
      How can I get Notepad++ to show it right, or if it is a bug, can it be fixed soon?

      • gstavi

        Notepad++ text rendering is based on Scintilla.
        The author of Scintilla also maintains the SciTE editor as a showcase for Scintilla.
        Post a feature request there. Once it works with SciTE it will also work in Notepad++.

        • guy038

          Hello @gauner-berlin and All,

          Sorry for the late answer, but I was away from the site for all of September!

          The character COMBINING OVERLINE (U+0305) is a mark that attaches to the preceding character (generally a letter) to form a new glyph.

          It belongs to the Unicode block named Combining Diacritical Marks, which contains 112 marks with code points from \x{0300} to \x{036F}. Each of them changes the glyph of the character located right before it. Refer to the link below to see what all these marks look like!

          http://www.unicode.org/charts/PDF/U0300.pdf
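
          As a quick cross-check of the block size and of the name of U+0305, here is a minimal Python sketch that uses only the standard unicodedata module. It is just an illustration and has nothing to do with how Notepad++ itself works; the variable names are mine.

```python
import unicodedata

# The Combining Diacritical Marks block spans U+0300..U+036F inclusive.
BLOCK_START, BLOCK_END = 0x0300, 0x036F

block = [chr(cp) for cp in range(BLOCK_START, BLOCK_END + 1)]
print(len(block))  # number of code points in the block

# Official Unicode name of the mark discussed in this thread.
overline_name = unicodedata.name("\u0305")
print(overline_name)

# A digit followed by the combining overline is still two code points,
# even when a renderer draws them as one glyph.
three_overlined = "3" + "\u0305"
print(len(three_overlined))
```

          The last line is the key point: the string stays two code points long, and it is up to the font and the rendering engine to draw the overline on top of the digit.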

          However, whether a diacritical mark is correctly placed over or under an uppercase or lowercase letter depends, for the most part, on the font currently used!


          So, after some tests, I could establish a few facts:

          • The diacritical marks from U+0300 to U+0304 and from U+0306 to U+030A, as well as the five marks U+030C, U+030F, U+0311, U+0323 and U+0325, are correctly combined with their base letter in Notepad++, whatever the font used!

          • With a Unicode-oriented font such as Lucida Sans Unicode or Arial Unicode MS, nearly all the diacritical marks from U+0300 to U+0345 correctly modify the glyph of the preceding letter.

          • I found a family of monospaced fonts (the Iosevka fonts) that can display all the diacritical marks of the block, except for the 7 characters U+034F, U+0356, U+0359, U+035B, U+035C, U+035F and U+0362 :-))


          You can download all these monospaced fonts from the link below:

          https://github.com/be5invis/Iosevka/releases/tag/v1.11.1

          Personally, I chose packs 01 (sans serif) and 04 (serif), but you may also be interested in pack 07 or 09. The three packs 01, 07 and 09 use different shapes for the lowercase letter i and for the digit 1.

          Beware that the three packs 01, 07 and 09 are mutually exclusive and cannot be installed on a system at the same time.

          Finally, of all the variants I tested, I kept only the Bold and Medium weights. The Bold font looks better when characters are small (I mean, after 3 Zoom Out actions from the default zoom!).

          The Iosevka fonts provide 3318 glyphs. The table below lists all the Unicode blocks of the BMP (Basic Multilingual Plane) that these fonts cover, partially or totally:

                                         Iosevka font / Iosevka-Slab font
          
              •--------•--------•------------------------------------------•---------------•
              |  Start |   End  |            Unicode Block Name            |  Num / Total  |
              •--------•--------•------------------------------------------•---------------•
              |  0000  |  007F  |  Basic Latin                             |   95  /  128  |
              |  0080  |  00FF  |  Latin-1 Supplement                      |   96  /  128  |
              |  0100  |  017F  |  Latin Extended-A                        |  128  /  128  |
              |  0180  |  024F  |  Latin Extended-B                        |  208  /  208  |
              |  0250  |  02AF  |  IPA Extensions                          |   96  /   96  |
              |  02B0  |  02FF  |  Spacing Modifier Letters                |   56  /   80  |
              |  0300  |  036F  |  Combining Diacritical Marks             |  108  /  112  |
              |  0370  |  03FF  |  Greek and Coptic                        |   89  /  135  |
              |  0400  |  04FF  |  Cyrillic                                |  178  /  256  |
              |  0500  |  052F  |  Cyrillic Supplement                     |    4  /   48  |
              |  1AB0  |  1AFF  |  Combining Diacritical Marks Extended    |    1  /   15  |
              |  1D00  |  1D7F  |  Phonetic Extensions                     |  106  /  128  |
              |  1D80  |  1DBF  |  Phonetic Extensions Supplement          |   40  /   64  |
              |  1DC0  |  1DFF  |  Combining Diacritical Marks Supplement  |   34  /   58  |
              |  1E00  |  1EFF  |  Latin Extended Additional               |  250  /  256  |
              |  1F00  |  1FFF  |  Greek Extended                          |  233  /  233  |
              |  2000  |  206F  |  General Punctuation                     |   45  /  111  |
              |  2070  |  209F  |  Superscripts and Subscripts             |   42  /   42  |
              |  20A0  |  20CF  |  Currency Symbols                        |    6  /   31  |
              |  2100  |  214F  |  Letterlike Symbols                      |   14  /   80  |
              |  2150  |  218F  |  Number Forms                            |   16  /   60  |
              |  2190  |  21FF  |  Arrows                                  |   16  /  112  |
              |  2200  |  22FF  |  Mathematical Operators                  |   67  /  256  |
              |  2300  |  23FF  |  Miscellaneous Technical                 |   76  /  251  |
              |  2460  |  24FF  |  Enclosed Alphanumerics                  |   97  /  160  |
              |  2500  |  257F  |  Box Drawing                             |  128  /  128  |
              |  2580  |  259F  |  Block Elements                          |   20  /   32  |
              |  25A0  |  25FF  |  Geometric Shapes                        |   33  /   96  |
              |  2600  |  26FF  |  Miscellaneous Symbols                   |   12  /  256  |
              |  2700  |  27BF  |  Dingbats                                |    7  /  192  |
              |  27C0  |  27EF  |  Miscellaneous Mathematical Symbols-A    |    4  /   48  |
              |  2800  |  28FF  |  Braille Patterns                        |  256  /  256  |
              |  2900  |  297F  |  Supplemental Arrows-B                   |    2  /  128  |
              |  2A00  |  2AFF  |  Supplemental Mathematical Operators     |    2  /  256  |
              |  2B00  |  2BFF  |  Miscellaneous Symbols and Arrows        |    7  /  206  |
              |  2C60  |  2C7F  |  Latin Extended-C                        |   15  /   32  |
              |  3000  |  303F  |  CJK Symbols and Punctuation             |    6  /   64  |
              |  A720  |  A7FF  |  Latin Extended-D                        |   10  /  159  |
              |  AB30  |  AB6F  |  Latin Extended-E                        |    5  /   54  |
              |  E000  |  F8FF  |  Private Use Area                        |   14  / 6400  |
              |  FB00  |  FB4F  |  Alphabetic Presentation Forms           |    2  /   58  |
              |  FF00  |  FFEF  |  Halfwidth and Fullwidth Forms           |  101  /  225  |
              •--------•--------•------------------------------------------•---------------•
          

          Best Regards,

          guy038

          • gstavi

            I don’t think it is only a font issue.
            The editor needs logic to detect that multiple Unicode code points form a single visible symbol.
            Can the overline be deleted (with Backspace) without the letter, or are the two fused together immediately after input?
            How should an overline be handled at the beginning of a line?
            With gedit on Linux you can add multiple overlines to a single letter, e.g. 3 lines over one letter. I guess this rendering effort goes beyond just a font.
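
            To make the kind of logic meant here concrete, below is a rough Python sketch that groups a base character with the marks following it. It is a deliberate simplification and not what Scintilla or gedit actually do: the helper names is_mark and simple_clusters are made up for this post, "mark" is approximated by the Unicode general categories Mn/Mc/Me, and a real editor would follow the full grapheme cluster rules of Unicode Standard Annex #29.

```python
import unicodedata

def is_mark(ch):
    # Simplification: treat every character in a Unicode "Mark" category
    # (Mn, Mc, Me) as something that attaches to the preceding character.
    return unicodedata.category(ch).startswith("M")

def simple_clusters(text):
    """Split text into display units: a base character plus trailing marks."""
    clusters = []
    for ch in text:
        # A mark attaches only if there is a preceding character on the
        # same line; at the start of a line it becomes its own unit.
        attach = (
            is_mark(ch)
            and bool(clusters)
            and clusters[-1][-1] not in "\r\n"
        )
        if attach:
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters

# One overline, three stacked overlines, and an overline at line start.
print(simple_clusters("3\u0305"))
print(simple_clusters("a\u0305\u0305\u0305b"))
print(simple_clusters("x\n\u0305y"))
```

            In this sketch the letter and its marks are never fused: they stay separate code points, so a Backspace that removes one code point ("3\u0305"[:-1] in Python) leaves the bare "3". Whether Backspace should instead remove the whole unit is a design decision for the editor.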

            My belief is that the vast majority of the Notepad++ user base are developers who mostly care about English letters in monospaced fonts.
            It is nice that NPP supports Unicode and has some support for right-to-left text, but I think such advanced features should have a very low priority and be evaluated very carefully before implementation, because of the possible negative impact on ‘standard’ NPP usage.

            • guy038

              Hi, @gstavi,

              Of course, I quite agree with you! N++ users don’t care, most of the time, about the good appearance of exotic diacritical marks.

              And yes, I noticed, for instance, that with Microsoft Word 2002 SP3, some common diacritical marks such as U+0307, U+030F, U+0311 and U+0325 are not rendered with my Times New Roman font, although they are displayed well in Notepad++ with the same font!

              Note that when you write, within N++, in a true UTF-8 BOM encoded file, the string ab with a diacritical mark on the letter a, that is to say the string âb, it is really a sequence of three independent characters, and when you select it, you do get a three-character selection!

              The letter a ( U+0061 ) + the combining circumflex accent ( U+0302 ) + the letter b ( U+0062 )

              You may search for any of them with the regular-expression syntaxes \x{0061}, \x{0302} and \x{0062}, and when you press the Right arrow key repeatedly, you’ll see that there really are 3 characters!

              In Microsoft Word, the correctly displayed letters, with their associated diacritical mark, are logically treated as a single character!

              Compare this with the simple â letter, from the C1 Controls and Latin-1 Supplement Unicode block, which is a single character, with code point U+00E2.
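
              The difference between the two forms can be checked outside of any editor. Here is a small Python sketch using the standard unicodedata module; it is only an illustration, the variable names are mine, and it says nothing about what Notepad++ or Word do internally.

```python
import unicodedata

decomposed = "a\u0302b"   # a + COMBINING CIRCUMFLEX ACCENT + b
precomposed = "\u00e2b"   # LATIN SMALL LETTER A WITH CIRCUMFLEX + b

# List the code points of each string.
print([f"U+{ord(c):04X}" for c in decomposed])
print([f"U+{ord(c):04X}" for c in precomposed])

# The two strings look alike on screen but are not equal as stored.
print(decomposed == precomposed)

# Unicode normalization converts between the two forms:
# NFC composes where a precomposed character exists, NFD decomposes.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", precomposed)
print(nfc == precomposed, nfd == decomposed)

# Size of each form once encoded as UTF-8.
print(len(decomposed.encode("utf-8")), len(precomposed.encode("utf-8")))
```

              So a search for U+00E2 will not find the three-character form unless the text (or the search string) is normalized first.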

              Cheers,

              guy038
