
    New N++ feature to show/hide Non-Printing characters

      guy038

      Hi All,

      Do you remember the "Invisible characters unwanted" discussion and my last post there, linked below, about the main invisible characters which need a visual representation ?

      https://community.notepad-plus-plus.org/post/62169

      Following that post, and given the new N++ feature in the v8.5 release which shows the non-printing characters, I think that it would be interesting to take a new look at this topic !


      Firstly, in the last N++ release, the invisible characters located outside the BMP ( Basic Multilingual Plane ) are not taken into account. I think that this position is acceptable because :

      • The two format Kaithi characters are related to the historical Kaithi script, which has been rather ignored since the 1970s

      • The nine format Egyptian characters refer to the ancient Egyptian hieroglyphs

      • The four format Shorthand characters cannot be considered as true characters, as they may encode a lot of European languages simultaneously

      • The 233 characters of the Musical Symbols Unicode block cannot be strictly considered as characters and rather represent a modern musical notation system

      • To end with, all the format characters of the Tag block are strongly discouraged by the Unicode Consortium
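      As a rough illustration of the BMP / non-BMP split discussed above, here is a small Python sketch. It is not related to how N++ itself is implemented. The two sample code points ( U+110BD from the Kaithi block and U+E0001 from the Tags block ) are my own picks for the example, and the helper names `is_bmp` and `is_format_char` are hypothetical.

```python
import unicodedata

BMP_LAST = 0xFFFF  # last code point of the Basic Multilingual Plane


def is_bmp(code_point: int) -> bool:
    """Return True if the code point lies inside the BMP (U+0000..U+FFFF)."""
    return 0 <= code_point <= BMP_LAST


def is_format_char(code_point: int) -> bool:
    """Return True if the General Category of the code point is Cf (format)."""
    return unicodedata.category(chr(code_point)) == "Cf"


# Sample format characters chosen for this example only (an assumption,
# not a list taken from the post).
samples = {
    0x00AD: "SOFT HYPHEN",
    0x110BD: "KAITHI NUMBER SIGN",
    0xE0001: "LANGUAGE TAG",
}

for cp, label in samples.items():
    print(f"U+{cp:04X} {label}: BMP={is_bmp(cp)}, Cf={is_format_char(cp)}")
```

      The exact categories reported depend on the Unicode database shipped with the Python version in use.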


      Now, if we consider all the non-printing characters displayed when the View > Show Symbol > Show Non-Printing characters option is set, we get a list of 42 lines :

          •-------•---------•---------------------------------------------•----•------------------•-------
          | Code  | Abbrev. |           Character Name                    | Cg |    N++  Regex    | Char
          •-------•---------•---------------------------------------------•----•------------------•-------
          |  00A0 |  NBSP   | NO-BREAK SPACE                              | Zs |     \x{00A0}     |   
          |       |         |                                             |    |                  |
          |  061C |  ALM    | ARABIC LETTER MARK                          | Cf |     \x{061C}     |  ؜
          |       |         |                                             |    |                  |
          |  1680 |  OSPM   | OGHAM SPACE MARK                            | Zs |     \x{1680}     |   
          |       |         |                                             |    |                  |
          |  180E |  MVS    | MONGOLIAN VOWEL SEPARATOR                   | Cf |     \x{180E}     |  ᠎
          |       |         |                                             |    |                  |
          |  2000 |  NQSP   | EN QUAD                                     | Zs |     \x{2000}     |   
          |  2001 |  MQSP   | EM QUAD                                     | Zs |     \x{2001}     |   
          |  2002 |  ENSP   | EN SPACE                                    | Zs |     \x{2002}     |   
          |  2003 |  EMSP   | EM SPACE                                    | Zs |     \x{2003}     |   
          |  2004 |  3/MSP  | THREE-PER-EM SPACE                          | Zs |     \x{2004}     |   
          |  2005 |  4/MSP  | FOUR-PER-EM SPACE                           | Zs |     \x{2005}     |   
          |  2006 |  6/MSP  | SIX-PER-EM SPACE                            | Zs |     \x{2006}     |   
          |  2007 |  FSP    | FIGURE SPACE                                | Zs |     \x{2007}     |   
          |  2008 |  PSP    | PUNCTUATION SPACE                           | Zs |     \x{2008}     |   
          |  2009 |  THSP   | THIN SPACE                                  | Zs |     \x{2009}     |   
          |  200A |  HSP    | HAIR SPACE                                  | Zs |     \x{200A}     |   
          |       |         |                                             |    |                  |
          |  200B |  ZWSP   | ZERO WIDTH SPACE                            | Cf |     \x{200B}     |  ​
          |  200C |  ZWNJ   | ZERO WIDTH NON-JOINER                       | Cf |     \x{200C}     |  ‌
          |  200D |  ZWJ    | ZERO WIDTH JOINER                           | Cf |     \x{200D}     |  ‍
          |  200E |  LRM    | LEFT-TO-RIGHT MARK                          | Cf |     \x{200E}     |  ‎
          |  200F |  RLM    | RIGHT-TO-LEFT MARK                          | Cf |     \x{200F}     |  ‏
          |       |         |                                             |    |                  |
          |  202A |  LRE    | LEFT-TO-RIGHT EMBEDDING                     | Cf |     \x{202A}     |  ‪
          |  202B |  RLE    | RIGHT-TO-LEFT EMBEDDING                     | Cf |     \x{202B}     |  ‫
          |  202C |  PDF    | POP DIRECTIONAL FORMATTING                  | Cf |     \x{202C}     |  ‬
          |  202D |  LRO    | LEFT-TO-RIGHT OVERRIDE                      | Cf |     \x{202D}     |  ‭
          |  202E |  RLO    | RIGHT-TO-LEFT OVERRIDE                      | Cf |     \x{202E}     |  ‮
          |       |         |                                             |    |                  |
          |  2028 |  LS     | LINE SEPARATOR                              | Zl |     \x{2028}     |  

          |  2029 |  PS     | PARAGRAPH SEPARATOR                         | Zp |     \x{2029}     |  

          |       |         |                                             |    |                  |
          |  202F |  NNBSP  | NARROW NO-BREAK SPACE                       | Zs |     \x{202F}     |   
          |       |         |                                             |    |                  |
          |  205F |  MMSP   | MEDIUM MATHEMATICAL SPACE                   | Zs |     \x{205F}     |   
          |       |         |                                             |    |                  |
          |  2060 |  WJ     | WORD JOINER                                 | Cf |     \x{2060}     |  ⁠
          |       |         |                                             |    |                  |
          |  2066 |  LRI    | LEFT-TO-RIGHT ISOLATE                       | Cf |     \x{2066}     |  ⁦
          |  2067 |  RLI    | RIGHT-TO-LEFT ISOLATE                       | Cf |     \x{2067}     |  ⁧
          |  2068 |  FSI    | FIRST STRONG ISOLATE                        | Cf |     \x{2068}     |  ⁨
          |  2069 |  PDI    | POP DIRECTIONAL ISOLATE                     | Cf |     \x{2069}     |  ⁩
          |  206A |  ISS    | INHIBIT SYMMETRIC SWAPPING                  | Cf |     \x{206A}     |  
          |  206B |  ASS    | ACTIVATE SYMMETRIC SWAPPING                 | Cf |     \x{206B}     |  
          |  206C |  IAFS   | INHIBIT ARABIC FORM SHAPING                 | Cf |     \x{206C}     |  
          |  206D |  AAFS   | ACTIVATE ARABIC FORM SHAPING                | Cf |     \x{206D}     |  
          |  206E |  NADS   | NATIONAL DIGIT SHAPES                       | Cf |     \x{206E}     |  
          |  206F |  NODS   | NOMINAL DIGIT SHAPES                        | Cf |     \x{206F}     |  ⁯
          |       |         |                                             |    |                  |
          |  3000 |  IDSP   | IDEOGRAPHIC SPACE                           | Zs |     \x{3000}     |   
          |       |         |                                             |    |                  |
          |  FEFF |  ZWNBSP | ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK | Cf |     \x{FEFF}     |  
          •-------•---------•---------------------------------------------•----•------------------•------
      

      I would like to make three points about this list:

      • I don’t see the purpose of showing the Line Separator \x{2028} and the Paragraph Separator \x{2029} characters. Indeed, they are not format characters and, moreover, they already have a “black on white” representation, LS and PS. Simply check the Show Non-Printing characters option, then uncheck it, and observe the changes regarding these two chars in the above table !

      Conversely, I think that we are missing two individual characters and two sets of invisible characters, which do belong to the Unicode BMP :

      • The Soft Hyphen character of code \x{00AD}

      • The Syriac Abbreviation Mark of code \x{070F}

      • The four invisible operators of the General Punctuation block, between \x{2061} and \x{2064}. Refer to :

        • https://www.w3.org/TR/2010/REC-MathML3-20101021/chapter3.html#presm.invisibleops for further information
      • The three Interlinear annotation characters of the Specials block, between \x{FFF9} and \x{FFFB}. Refer to :

        • https://www.w3.org/TR/unicode-xml/#Interlinear

        • https://en.wikipedia.org/wiki/Ruby_character
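      The categories claimed above can be cross-checked outside N++. The Python sketch below uses the standard `unicodedata` module to print the General Category of the two separators and of the nine proposed additions. The list names `separators` and `proposed` and the helper `describe` are hypothetical, introduced only for this example.

```python
import unicodedata

# Characters the post suggests dropping from the feature (Zl / Zp)
separators = [0x2028, 0x2029]

# Characters the post suggests adding: SHY, SAM, the four invisible
# operators U+2061..U+2064 and the three interlinear annotation characters
proposed = [0x00AD, 0x070F, *range(0x2061, 0x2065), *range(0xFFF9, 0xFFFC)]


def describe(code_point: int) -> str:
    """Format one code point as 'U+XXXX <category> <name>'."""
    ch = chr(code_point)
    name = unicodedata.name(ch, "<unnamed>")
    return f"U+{code_point:04X} {unicodedata.category(ch)} {name}"


for cp in separators + proposed:
    print(describe(cp))
```

      With a reasonably recent Python, the two separators should be reported as Zl and Zp, and all nine proposed characters as Cf.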


      So, if we take the above remarks into account, regarding the items to add to / remove from the list of the existing non-printing chars, we get the updated list below :

          •-------•---------•---------------------------------------------•----•------------------•-------
          | Code  | Abbrev. |           Character Name                    | Cg |    N++  Regex    | Char
          •-------•---------•---------------------------------------------•----•------------------•-------
          |  00A0 |  NBSP   | NO-BREAK SPACE                              | Zs |     \x{00A0}     |   
          |       |         |                                             |    |                  |
          |  00AD |  SHY    | SOFT HYPHEN                                 | Cf |     \x{00AD}     |  ­
          |       |         |                                             |    |                  |
          |  061C |  ALM    | ARABIC LETTER MARK                          | Cf |     \x{061C}     |  ؜
          |       |         |                                             |    |                  |
          |  070F |  SAM    | SYRIAC ABBREVIATION MARK                    | Cf |     \x{070F}     |  ܏
          |       |         |                                             |    |                  |
          |  1680 |  OSPM   | OGHAM SPACE MARK                            | Zs |     \x{1680}     |   
          |       |         |                                             |    |                  |
          |  180E |  MVS    | MONGOLIAN VOWEL SEPARATOR                   | Cf |     \x{180E}     |  ᠎
          |       |         |                                             |    |                  |
          |  2000 |  NQSP   | EN QUAD                                     | Zs |     \x{2000}     |   
          |  2001 |  MQSP   | EM QUAD                                     | Zs |     \x{2001}     |   
          |  2002 |  ENSP   | EN SPACE                                    | Zs |     \x{2002}     |   
          |  2003 |  EMSP   | EM SPACE                                    | Zs |     \x{2003}     |   
          |  2004 |  3/MSP  | THREE-PER-EM SPACE                          | Zs |     \x{2004}     |   
          |  2005 |  4/MSP  | FOUR-PER-EM SPACE                           | Zs |     \x{2005}     |   
          |  2006 |  6/MSP  | SIX-PER-EM SPACE                            | Zs |     \x{2006}     |   
          |  2007 |  FSP    | FIGURE SPACE                                | Zs |     \x{2007}     |   
          |  2008 |  PSP    | PUNCTUATION SPACE                           | Zs |     \x{2008}     |   
          |  2009 |  THSP   | THIN SPACE                                  | Zs |     \x{2009}     |   
          |  200A |  HSP    | HAIR SPACE                                  | Zs |     \x{200A}     |   
          |       |         |                                             |    |                  |
          |  200B |  ZWSP   | ZERO WIDTH SPACE                            | Cf |     \x{200B}     |  ​
          |  200C |  ZWNJ   | ZERO WIDTH NON-JOINER                       | Cf |     \x{200C}     |  ‌
          |  200D |  ZWJ    | ZERO WIDTH JOINER                           | Cf |     \x{200D}     |  ‍
          |  200E |  LRM    | LEFT-TO-RIGHT MARK                          | Cf |     \x{200E}     |  ‎
          |  200F |  RLM    | RIGHT-TO-LEFT MARK                          | Cf |     \x{200F}     |  ‏
          |       |         |                                             |    |                  |
          |  202A |  LRE    | LEFT-TO-RIGHT EMBEDDING                     | Cf |     \x{202A}     |  ‪
          |  202B |  RLE    | RIGHT-TO-LEFT EMBEDDING                     | Cf |     \x{202B}     |  ‫
          |  202C |  PDF    | POP DIRECTIONAL FORMATTING                  | Cf |     \x{202C}     |  ‬
          |  202D |  LRO    | LEFT-TO-RIGHT OVERRIDE                      | Cf |     \x{202D}     |  ‭
          |  202E |  RLO    | RIGHT-TO-LEFT OVERRIDE                      | Cf |     \x{202E}     |  ‮
          |       |         |                                             |    |                  |
          |  202F |  NNBSP  | NARROW NO-BREAK SPACE                       | Zs |     \x{202F}     |   
          |       |         |                                             |    |                  |
          |  205F |  MMSP   | MEDIUM MATHEMATICAL SPACE                   | Zs |     \x{205F}     |   
          |       |         |                                             |    |                  |
          |  2060 |  WJ     | WORD JOINER                                 | Cf |     \x{2060}     |  ⁠
          |       |         |                                             |    |                  |
          |  2061 | (FA)    | FUNCTION APPLICATION                        | Cf |     \x{2061}     |  ⁡
          |  2062 | (IT)    | INVISIBLE TIMES                             | Cf |     \x{2062}     |  ⁢
          |  2063 | (IS)    | INVISIBLE SEPARATOR                         | Cf |     \x{2063}     |  ⁣
          |  2064 | (IP)    | INVISIBLE PLUS                              | Cf |     \x{2064}     |  ⁤
          |       |         |                                             |    |                  |
          |  2066 |  LRI    | LEFT-TO-RIGHT ISOLATE                       | Cf |     \x{2066}     |  ⁦
          |  2067 |  RLI    | RIGHT-TO-LEFT ISOLATE                       | Cf |     \x{2067}     |  ⁧
          |  2068 |  FSI    | FIRST STRONG ISOLATE                        | Cf |     \x{2068}     |  ⁨
          |  2069 |  PDI    | POP DIRECTIONAL ISOLATE                     | Cf |     \x{2069}     |  ⁩
          |  206A |  ISS    | INHIBIT SYMMETRIC SWAPPING                  | Cf |     \x{206A}     |  
          |  206B |  ASS    | ACTIVATE SYMMETRIC SWAPPING                 | Cf |     \x{206B}     |  
          |  206C |  IAFS   | INHIBIT ARABIC FORM SHAPING                 | Cf |     \x{206C}     |  
          |  206D |  AAFS   | ACTIVATE ARABIC FORM SHAPING                | Cf |     \x{206D}     |  
          |  206E |  NADS   | NATIONAL DIGIT SHAPES                       | Cf |     \x{206E}     |  
          |  206F |  NODS   | NOMINAL DIGIT SHAPES                        | Cf |     \x{206F}     |  ⁯
          |       |         |                                             |    |                  |
          |  3000 |  IDSP   | IDEOGRAPHIC SPACE                           | Zs |     \x{3000}     |   
          |       |         |                                             |    |                  |
          |  FEFF |  ZWNBSP | ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK | Cf |     \x{FEFF}     |  
          |       |         |                                             |    |                  |
          |  FFF9 |  IAA    | INTERLINEAR ANNOTATION ANCHOR               | Cf |     \x{FFF9}     |  
          |  FFFA |  IAS    | INTERLINEAR ANNOTATION SEPARATOR            | Cf |     \x{FFFA}     |  
          |  FFFB |  IAT    | INTERLINEAR ANNOTATION TERMINATOR           | Cf |     \x{FFFB}     |  
          •-------•---------•---------------------------------------------•----•------------------•------
      

      Of course, if my remarks seem pertinent enough to most people, I’ll create a GitHub issue !


      Notes :

      • Regarding the wording to use for naming this new N++ feature, I propose this one :

      View > Show Symbol > Show BMP Format chars / Non Regular Spaces. Do you like it ?

      • Regarding the last table, the equivalent regex to mark these 49 special characters becomes :

      MARK [\x{00A0}\x{00AD}\x{061C}\x{070F}\x{1680}\x{180E}\x{2000}-\x{200A}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{202F}\x{205F}-\x{206F}\x{3000}\x{FEFF}\x{FFF9}\x{FFFA}\x{FFFB}]

      Depending on the character marked, you should get either a red/orange space for a space character OR a thin red line for all the other characters !
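      To double-check what the character class above covers, here is a Python sketch that enumerates every BMP code point it matches. Python's `re` module does not understand the `\x{XXXX}` escape used by N++, so the sketch first rewrites each escape as `\uXXXX`. That translation step and the names `NPP_CLASS`, `py_class` and `matched` are assumptions made for this example, not part of N++.

```python
import re

# The character class from the MARK regex, in N++ syntax
NPP_CLASS = (
    r"[\x{00A0}\x{00AD}\x{061C}\x{070F}\x{1680}\x{180E}"
    r"\x{2000}-\x{200A}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{202F}"
    r"\x{205F}-\x{206F}\x{3000}\x{FEFF}\x{FFF9}\x{FFFA}\x{FFFB}]"
)

# Rewrite each \x{XXXX} escape as the \uXXXX form that Python's re accepts
py_class = re.sub(r"\\x\{([0-9A-Fa-f]{4})\}", r"\\u\1", NPP_CLASS)
pattern = re.compile(py_class)

# Collect every BMP code point matched by the class
matched = [cp for cp in range(0x10000) if pattern.fullmatch(chr(cp))]

print(len(matched), "code points matched")
print("U+2065 matched:", 0x2065 in matched)
print("U+2028 matched:", 0x2028 in matched)
```

      Note that the range `\x{205F}-\x{206F}` also spans U+2065, which is not one of the 49 rows of the table ( as far as I know, it is an unassigned code point ). So the class matches one more code point than the table lists, which should be harmless in practice.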

      Best Regards,

      guy038

      P.S. :

      To get an overview of all these strange Unicode characters, read this exhaustive article :

      https://en.wikipedia.org/wiki/Universal_Character_Set_characters

        guy038

        Hi, All,

        Although my post did not get any positive reviews, I still created an issue on GitHub !

        https://github.com/notepad-plus-plus/notepad-plus-plus/issues/13408

        Best Regards

        guy038

          cs96and

          @guy038 I’m wondering why IDSP (“Ideographic Space”) has been included in this list. It is not a zero width character, so it is a printable (albeit whitespace) character.
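          For reference, the properties behind this remark can be looked up with Python's standard `unicodedata` module. This is only a sketch: the East Asian Width property is used here as a rough hint that the character occupies a full cell, which is my own assumption about how to illustrate "not zero width".

```python
import unicodedata

IDSP = "\u3000"  # IDEOGRAPHIC SPACE
ZWSP = "\u200B"  # ZERO WIDTH SPACE, shown for comparison

for ch in (IDSP, ZWSP):
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        "category:", unicodedata.category(ch),
        "east_asian_width:", unicodedata.east_asian_width(ch),
    )
```

          IDSP is a space separator ( Zs ) with a fullwidth ( F ) East Asian Width, whereas ZWSP is a format character ( Cf ).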
