    New N++ feature to show/hide Non-Printing characters

    • guy038
      guy038 last edited by guy038

      Hi All,

      Do you remember the Invisible characters unwanted discussion and my last post there, linked below, about the main invisible characters which need a visual representation ?

      https://community.notepad-plus-plus.org/post/62169

      Starting from this post, and given the new N++ feature of the v8.5 release, which shows the non-printing characters, I think that it would be interesting to take a new look at this topic !


      Firstly, in the last N++ release, the invisible characters located outside the BMP ( Basic Multilingual Plane ) are not taken into account. I think that this position is acceptable because :

      • The two Kaithi format characters belong to the historical Kaithi script, which has been largely ignored since the 1970s

      • The nine format Egyptian characters refer to the ancient Egyptian hieroglyphs

      • The four Shorthand format characters cannot be considered as true characters, as the shorthand may encode many European languages simultaneously

      • The 233 characters of the Musical Symbols Unicode block cannot strictly be considered as characters; rather, they represent a modern musical notation system

      • To end with, all the format characters of the Tag block are strongly discouraged by the Unicode Consortium


      Now, if we consider all the non-printing characters shown when the View > Show Symbol > Show Non-Printing characters option is set, we get a list of 42 characters :

          •-------•---------•---------------------------------------------•----•------------------•-------
          | Code  | Abbrev. |           Character Name                    | Cg |    N++  Regex    | Char
          •-------•---------•---------------------------------------------•----•------------------•-------
          |  00A0 |  NBSP   | NO-BREAK SPACE                              | Zs |     \x{00A0}     |   
          |       |         |                                             |    |                  |
          |  061C |  ALM    | ARABIC LETTER MARK                          | Cf |     \x{061C}     |  ؜
          |       |         |                                             |    |                  |
          |  1680 |  OSPM   | OGHAM SPACE MARK                            | Zs |     \x{1680}     |   
          |       |         |                                             |    |                  |
          |  180E |  MVS    | MONGOLIAN VOWEL SEPARATOR                   | Cf |     \x{180E}     |  ᠎
          |       |         |                                             |    |                  |
          |  2000 |  NQSP   | EN QUAD                                     | Zs |     \x{2000}     |   
          |  2001 |  MQSP   | EM QUAD                                     | Zs |     \x{2001}     |   
          |  2002 |  ENSP   | EN SPACE                                    | Zs |     \x{2002}     |   
          |  2003 |  EMSP   | EM SPACE                                    | Zs |     \x{2003}     |   
          |  2004 |  3/MSP  | THREE-PER-EM SPACE                          | Zs |     \x{2004}     |   
          |  2005 |  4/MSP  | FOUR-PER-EM SPACE                           | Zs |     \x{2005}     |   
          |  2006 |  6/MSP  | SIX-PER-EM SPACE                            | Zs |     \x{2006}     |   
          |  2007 |  FSP    | FIGURE SPACE                                | Zs |     \x{2007}     |   
          |  2008 |  PSP    | PUNCTUATION SPACE                           | Zs |     \x{2008}     |   
          |  2009 |  THSP   | THIN SPACE                                  | Zs |     \x{2009}     |   
          |  200A |  HSP    | HAIR SPACE                                  | Zs |     \x{200A}     |   
          |       |         |                                             |    |                  |
          |  200B |  ZWSP   | ZERO WIDTH SPACE                            | Cf |     \x{200B}     |  ​
          |  200C |  ZWNJ   | ZERO WIDTH NON-JOINER                       | Cf |     \x{200C}     |  ‌
          |  200D |  ZWJ    | ZERO WIDTH JOINER                           | Cf |     \x{200D}     |  ‍
          |  200E |  LRM    | LEFT-TO-RIGHT MARK                          | Cf |     \x{200E}     |  ‎
          |  200F |  RLM    | RIGHT-TO-LEFT MARK                          | Cf |     \x{200F}     |  ‏
          |       |         |                                             |    |                  |
          |  202A |  LRE    | LEFT-TO-RIGHT EMBEDDING                     | Cf |     \x{202A}     |  ‪
          |  202B |  RLE    | RIGHT-TO-LEFT EMBEDDING                     | Cf |     \x{202B}     |  ‫
          |  202C |  PDF    | POP DIRECTIONAL FORMATTING                  | Cf |     \x{202C}     |  ‬
          |  202D |  LRO    | LEFT-TO-RIGHT OVERRIDE                      | Cf |     \x{202D}     |  ‭
          |  202E |  RLO    | RIGHT-TO-LEFT OVERRIDE                      | Cf |     \x{202E}     |  ‮
          |       |         |                                             |    |                  |
          |  2028 |  LS     | LINE SEPARATOR                              | Zl |     \x{2028}     |  
          |  2029 |  PS     | PARAGRAPH SEPARATOR                         | Zp |     \x{2029}     |  
          |       |         |                                             |    |                  |
          |  202F |  NNBSP  | NARROW NO-BREAK SPACE                       | Zs |     \x{202F}     |   
          |       |         |                                             |    |                  |
          |  205F |  MMSP   | MEDIUM MATHEMATICAL SPACE                   | Zs |     \x{205F}     |   
          |       |         |                                             |    |                  |
          |  2060 |  WJ     | WORD JOINER                                 | Cf |     \x{2060}     |  ⁠
          |       |         |                                             |    |                  |
          |  2066 |  LRI    | LEFT-TO-RIGHT ISOLATE                       | Cf |     \x{2066}     |  ⁦
          |  2067 |  RLI    | RIGHT-TO-LEFT ISOLATE                       | Cf |     \x{2067}     |  ⁧
          |  2068 |  FSI    | FIRST STRONG ISOLATE                        | Cf |     \x{2068}     |  ⁨
          |  2069 |  PDI    | POP DIRECTIONAL ISOLATE                     | Cf |     \x{2069}     |  ⁩
          |  206A |  ISS    | INHIBIT SYMMETRIC SWAPPING                  | Cf |     \x{206A}     |  
          |  206B |  ASS    | ACTIVATE SYMMETRIC SWAPPING                 | Cf |     \x{206B}     |  
          |  206C |  IAFS   | INHIBIT ARABIC FORM SHAPING                 | Cf |     \x{206C}     |  
          |  206D |  AAFS   | ACTIVATE ARABIC FORM SHAPING                | Cf |     \x{206D}     |  
          |  206E |  NADS   | NATIONAL DIGIT SHAPES                       | Cf |     \x{206E}     |  
          |  206F |  NOSP   | NOMINAL DIGIT SHAPES                        | Cf |     \x{206F}     |  
          |       |         |                                             |    |                  |
          |  3000 |  IDSP   | IDEOGRAPHIC SPACE                           | Zs |     \x{3000}     |   
          |       |         |                                             |    |                  |
          |  FEFF |  ZWNBSP | ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK | Cf |     \x{FEFF}     |  
          •-------•---------•---------------------------------------------•----•------------------•------
      

      I would like to make three points about this list:

      • I don’t see the purpose of showing the Line Separator \x{2028} and Paragraph Separator \x{2029} characters. Indeed, they are not format characters and, moreover, they already have a “black on white” representation, LS and PS. Simply check the Show Non-Printing characters option, then uncheck it, and observe the changes regarding these two chars in the above table !

      Conversely, I think that two individual characters and two sets of invisible characters, all of which belong to the Unicode BMP, are missing :

      • The Soft Hyphen character of code \x{00AD}

      • The Syriac Abbreviation Mark of code \x{070F}

      • The four invisible operators of the General Punctuation block, between \x{2061} and \x{2064}. Refer to :

        • https://www.w3.org/TR/2010/REC-MathML3-20101021/chapter3.html#presm.invisibleops for further information
      • The three Interlinear annotation characters of the Specials block, between \x{FFF9} and \x{FFFB}. Refer to :

        • https://www.w3.org/TR/unicode-xml/#Interlinear

        • https://en.wikipedia.org/wiki/Ruby_character
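
      The effect of these remarks on the size of the list can be checked with a little set arithmetic. This is only a minimal sketch, assuming Python 3; the names `current`, `removed`, `added` and `proposed` are illustrative and have nothing to do with the N++ source code. It starts from the 42 code points of the table above, drops LS and PS, and adds the characters suggested in this list:

```python
# Code points currently shown by "Show Non-Printing characters" (first table)
current = {0x00A0, 0x061C, 0x1680, 0x180E, 0x2028, 0x2029,
           0x202F, 0x205F, 0x2060, 0x3000, 0xFEFF}
current |= set(range(0x2000, 0x200F + 1))   # EN QUAD .. RIGHT-TO-LEFT MARK
current |= set(range(0x202A, 0x202E + 1))   # LRE .. RLO
current |= set(range(0x2066, 0x206F + 1))   # LRI .. NOMINAL DIGIT SHAPES

removed = {0x2028, 0x2029}                  # LINE / PARAGRAPH SEPARATOR
added = {0x00AD, 0x070F}                    # SOFT HYPHEN, SYRIAC ABBREVIATION MARK
added |= set(range(0x2061, 0x2064 + 1))     # four invisible operators
added |= set(range(0xFFF9, 0xFFFB + 1))     # three interlinear annotation chars

proposed = (current - removed) | added
print(len(current), len(proposed))          # sizes before and after the change
```

      The two sizes printed are 42 and 49, that is, 42 - 2 + 2 + 4 + 3.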


      So, if we take into account the above remarks about the items to add to / remove from the list of the existing non-printing chars, we get the updated list below :

          •-------•---------•---------------------------------------------•----•------------------•-------
          | Code  | Abbrev. |           Character Name                    | Cg |    N++  Regex    | Char
          •-------•---------•---------------------------------------------•----•------------------•-------
          |  00A0 |  NBSP   | NO-BREAK SPACE                              | Zs |     \x{00A0}     |   
          |       |         |                                             |    |                  |
          |  00AD |  SHY    | SOFT HYPHEN                                 | Cf |     \x{00AD}     |  ­
          |       |         |                                             |    |                  |
          |  061C |  ALM    | ARABIC LETTER MARK                          | Cf |     \x{061C}     |  ؜
          |       |         |                                             |    |                  |
          |  070F |  SAM    | SYRIAC ABBREVIATION MARK                    | Cf |     \x{070F}     |  ܏
          |       |         |                                             |    |                  |
          |  1680 |  OSPM   | OGHAM SPACE MARK                            | Zs |     \x{1680}     |   
          |       |         |                                             |    |                  |
          |  180E |  MVS    | MONGOLIAN VOWEL SEPARATOR                   | Cf |     \x{180E}     |  ᠎
          |       |         |                                             |    |                  |
          |  2000 |  NQSP   | EN QUAD                                     | Zs |     \x{2000}     |   
          |  2001 |  MQSP   | EM QUAD                                     | Zs |     \x{2001}     |   
          |  2002 |  ENSP   | EN SPACE                                    | Zs |     \x{2002}     |   
          |  2003 |  EMSP   | EM SPACE                                    | Zs |     \x{2003}     |   
          |  2004 |  3/MSP  | THREE-PER-EM SPACE                          | Zs |     \x{2004}     |   
          |  2005 |  4/MSP  | FOUR-PER-EM SPACE                           | Zs |     \x{2005}     |   
          |  2006 |  6/MSP  | SIX-PER-EM SPACE                            | Zs |     \x{2006}     |   
          |  2007 |  FSP    | FIGURE SPACE                                | Zs |     \x{2007}     |   
          |  2008 |  PSP    | PUNCTUATION SPACE                           | Zs |     \x{2008}     |   
          |  2009 |  THSP   | THIN SPACE                                  | Zs |     \x{2009}     |   
          |  200A |  HSP    | HAIR SPACE                                  | Zs |     \x{200A}     |   
          |       |         |                                             |    |                  |
          |  200B |  ZWSP   | ZERO WIDTH SPACE                            | Cf |     \x{200B}     |  ​
          |  200C |  ZWNJ   | ZERO WIDTH NON-JOINER                       | Cf |     \x{200C}     |  ‌
          |  200D |  ZWJ    | ZERO WIDTH JOINER                           | Cf |     \x{200D}     |  ‍
          |  200E |  LRM    | LEFT-TO-RIGHT MARK                          | Cf |     \x{200E}     |  ‎
          |  200F |  RLM    | RIGHT-TO-LEFT MARK                          | Cf |     \x{200F}     |  ‏
          |       |         |                                             |    |                  |
          |  202A |  LRE    | LEFT-TO-RIGHT EMBEDDING                     | Cf |     \x{202A}     |  ‪
          |  202B |  RLE    | RIGHT-TO-LEFT EMBEDDING                     | Cf |     \x{202B}     |  ‫
          |  202C |  PDF    | POP DIRECTIONAL FORMATTING                  | Cf |     \x{202C}     |  ‬
          |  202D |  LRO    | LEFT-TO-RIGHT OVERRIDE                      | Cf |     \x{202D}     |  ‭
          |  202E |  RLO    | RIGHT-TO-LEFT OVERRIDE                      | Cf |     \x{202E}     |  ‮
          |       |         |                                             |    |                  |
          |  202F |  NNBSP  | NARROW NO-BREAK SPACE                       | Zs |     \x{202F}     |   
          |       |         |                                             |    |                  |
          |  205F |  MMSP   | MEDIUM MATHEMATICAL SPACE                   | Zs |     \x{205F}     |   
          |       |         |                                             |    |                  |
          |  2060 |  WJ     | WORD JOINER                                 | Cf |     \x{2060}     |  ⁠
          |       |         |                                             |    |                  |
          |  2061 | (FA)    | FUNCTION APPLICATION                        | Cf |     \x{2061}     |  ⁡
          |  2062 | (IT)    | INVISIBLE TIMES                             | Cf |     \x{2062}     |  ⁢
          |  2063 | (IS)    | INVISIBLE SEPARATOR                         | Cf |     \x{2063}     |  ⁣
          |  2064 | (IP)    | INVISIBLE PLUS                              | Cf |     \x{2064}     |  ⁤
          |       |         |                                             |    |                  |
          |  2066 |  LRI    | LEFT-TO-RIGHT ISOLATE                       | Cf |     \x{2066}     |  ⁦
          |  2067 |  RLI    | RIGHT-TO-LEFT ISOLATE                       | Cf |     \x{2067}     |  ⁧
          |  2068 |  FSI    | FIRST STRONG ISOLATE                        | Cf |     \x{2068}     |  ⁨
          |  2069 |  PDI    | POP DIRECTIONAL ISOLATE                     | Cf |     \x{2069}     |  ⁩
          |  206A |  ISS    | INHIBIT SYMMETRIC SWAPPING                  | Cf |     \x{206A}     |  
          |  206B |  ASS    | ACTIVATE SYMMETRIC SWAPPING                 | Cf |     \x{206B}     |  
          |  206C |  IAFS   | INHIBIT ARABIC FORM SHAPING                 | Cf |     \x{206C}     |  
          |  206D |  AAFS   | ACTIVATE ARABIC FORM SHAPING                | Cf |     \x{206D}     |  
          |  206E |  NADS   | NATIONAL DIGIT SHAPES                       | Cf |     \x{206E}     |  
          |  206F |  NOSP   | NOMINAL DIGIT SHAPES                        | Cf |     \x{206F}     |  
          |       |         |                                             |    |                  |
          |  3000 |  IDSP   | IDEOGRAPHIC SPACE                           | Zs |     \x{3000}     |   
          |       |         |                                             |    |                  |
          |  FEFF |  ZWNBSP | ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK | Cf |     \x{FEFF}     |  
          |       |         |                                             |    |                  |
          |  FFF9 |  IAA    | INTERLINEAR ANNOTATION ANCHOR               | Cf |     \x{FFF9}     |  
          |  FFFA |  IAS    | INTERLINEAR ANNOTATION SEPARATOR            | Cf |     \x{FFFA}     |  
          |  FFFB |  IAT    | INTERLINEAR ANNOTATION TERMINATOR           | Cf |     \x{FFFB}     |  
          •-------•---------•---------------------------------------------•----•------------------•------
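
      The Cg column of this table can be cross-checked against a Unicode database. The sketch below is one way to do it, assuming Python 3 and its standard unicodedata module. The `ranges` list is simply a hand transcription of the Code column above, and the tallies depend on the Unicode version bundled with the interpreter:

```python
import unicodedata
from collections import Counter

# Inclusive (first, last) code point ranges transcribed from the table above
ranges = [(0x00A0, 0x00A0), (0x00AD, 0x00AD), (0x061C, 0x061C), (0x070F, 0x070F),
          (0x1680, 0x1680), (0x180E, 0x180E), (0x2000, 0x200F), (0x202A, 0x202F),
          (0x205F, 0x2064), (0x2066, 0x206F), (0x3000, 0x3000), (0xFEFF, 0xFEFF),
          (0xFFF9, 0xFFFB)]

code_points = [cp for first, last in ranges for cp in range(first, last + 1)]

# Tally the Unicode general category of every listed character
categories = Counter(unicodedata.category(chr(cp)) for cp in code_points)

for cp in code_points:
    # One line per character: code, general category, official name
    print(f"{cp:04X}  {unicodedata.category(chr(cp))}  {unicodedata.name(chr(cp), '?')}")
print(len(code_points), dict(categories))
```

      With a recent Python, this should report 49 characters, 16 of category Zs and 33 of category Cf, in line with the table.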
      

      Of course, if my remarks seem pertinent enough to most people, I’ll create a GitHub issue !


      Notes :

      • Regarding the wording to use for naming this new N++ feature, I propose this one :

      View > Show Symbol > Show BMP Format chars / Non Regular Spaces. Do you like it ?

      • Regarding the last table, the equivalent regex to mark these 49 special characters becomes :

      MARK [\x{00A0}\x{00AD}\x{061C}\x{070F}\x{1680}\x{180E}\x{2000}-\x{200A}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{202F}\x{205F}-\x{206F}\x{3000}\x{FEFF}\x{FFF9}\x{FFFA}\x{FFFB}]

      Depending on the character marked, you should get either a red/orange space for a space character or a thin red line for all the other characters !
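
      The character class above can also be tried outside of N++. Here is a minimal sketch, assuming Python 3: its re module does not accept the \x{....} notation, so the escapes are first rewritten as \uXXXX (the variable names are illustrative only). It also shows that the \x{205F}-\x{206F} range includes \x{2065}, a code point which is not in the table and which appears to be unassigned at the time of writing, so the class covers one more code point than the 49 listed:

```python
import re

# The N++ character class, copied from the MARK regex above
npp_class = (r"[\x{00A0}\x{00AD}\x{061C}\x{070F}\x{1680}\x{180E}"
             r"\x{2000}-\x{200A}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{202F}"
             r"\x{205F}-\x{206F}\x{3000}\x{FEFF}\x{FFF9}\x{FFFA}\x{FFFB}]")

# Python's `re` has no \x{....} escape: rewrite each one as \uXXXX
py_class = re.sub(r"\\x\{([0-9A-Fa-f]{4})\}", r"\\u\1", npp_class)
pattern = re.compile(py_class)

# Every BMP code point matched by the class
matched = [cp for cp in range(0x10000) if pattern.fullmatch(chr(cp))]

print(len(matched))        # number of BMP code points covered by the class
print(0x2065 in matched)   # is the code point absent from the table matched too ?
```

      If the extra match is unwanted, the range could be split into \x{205F}-\x{2064} and \x{2066}-\x{206F}, although matching an unassigned code point should be harmless in practice.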

      Best Regards,

      guy038

      P.S. :

      To get an overview of all these strange Unicode characters, read this exhaustive article :

      https://en.wikipedia.org/wiki/Universal_Character_Set_characters
