New N++ feature to show/hide Non-Printing characters
-
Hi All,
Do you remember the "Invisible characters unwanted" discussion, and my last post there (linked below) about the main invisible characters that need a visual representation?
https://community.notepad-plus-plus.org/post/62169
Given that post, and the new feature in the N++ v8.5 release that shows non-printing characters, I think it would be interesting to take a fresh look at this topic!
First, in the latest N++ release, invisible characters located outside the BMP (Basic Multilingual Plane) are not taken into account. I think this position is acceptable because:
- The two format Kaithi characters are related to the historical Kaithi script, which has been largely unused since the 1970s.
- The nine format Egyptian characters refer to the ancient Egyptian hieroglyphs.
- The four format Shorthand characters cannot be considered true characters, as they may encode many European languages simultaneously.
- The 233 characters of the Musical Symbols Unicode block cannot strictly be considered characters; they rather represent a modern musical notation system.
- Finally, all the format characters of the Tags block are strongly discouraged by the Unicode Consortium.
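For the curious, a short Python sketch can list the format (Cf) characters that lie outside the BMP, i.e. above U+FFFF, using the Unicode database bundled with Python's standard `unicodedata` module. This is only an illustration, not something N++ does: the exact output depends on the Unicode version of your Python build, and since it keeps category Cf only, most of the Musical Symbols block (mainly ordinary symbols) will not show up.

```python
import sys
import unicodedata

# The BMP covers code points U+0000 to U+FFFF; anything above lies in a supplementary plane.
BMP_MAX = 0xFFFF

def format_chars_outside_bmp():
    """Yield (code point, name) for every Cf (format) character above U+FFFF."""
    for cp in range(BMP_MAX + 1, sys.maxunicode + 1):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cf":
            yield cp, unicodedata.name(ch, "<unnamed>")

if __name__ == "__main__":
    for cp, name in format_chars_outside_bmp():
        print(f"U+{cp:05X}  {name}")
```

You should see the Kaithi, Egyptian, Shorthand, musical formatting and Tag characters grouped by block.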
Now, if we consider all the non-printing characters displayed when the View > Show Symbol > Show Non-Printing characters option is set, we get a list of 42 characters:
•-------•---------•---------------------------------------------•----•------------------•-------
| Code  | Abbrev. | Character Name                              | Cg | N++ Regex        | Char
•-------•---------•---------------------------------------------•----•------------------•-------
| 00A0  | NBSP    | NO-BREAK SPACE                              | Zs | \x{00A0}         |
|       |         |                                             |    |                  |
| 061C  | ALM     | ARABIC LETTER MARK                          | Cf | \x{061C}         |
|       |         |                                             |    |                  |
| 1680  | OSPM    | OGHAM SPACE MARK                            | Zs | \x{1680}         |
|       |         |                                             |    |                  |
| 180E  | MVS     | MONGOLIAN VOWEL SEPARATOR                   | Cf | \x{180E}         |
|       |         |                                             |    |                  |
| 2000  | NQSP    | EN QUAD                                     | Zs | \x{2000}         |
| 2001  | MQSP    | EM QUAD                                     | Zs | \x{2001}         |
| 2002  | ENSP    | EN SPACE                                    | Zs | \x{2002}         |
| 2003  | EMSP    | EM SPACE                                    | Zs | \x{2003}         |
| 2004  | 3/MSP   | THREE-PER-EM SPACE                          | Zs | \x{2004}         |
| 2005  | 4/MSP   | FOUR-PER-EM SPACE                           | Zs | \x{2005}         |
| 2006  | 6/MSP   | SIX-PER-EM SPACE                            | Zs | \x{2006}         |
| 2007  | FSP     | FIGURE SPACE                                | Zs | \x{2007}         |
| 2008  | PSP     | PUNCTUATION SPACE                           | Zs | \x{2008}         |
| 2009  | THSP    | THIN SPACE                                  | Zs | \x{2009}         |
| 200A  | HSP     | HAIR SPACE                                  | Zs | \x{200A}         |
|       |         |                                             |    |                  |
| 200B  | ZWSP    | ZERO WIDTH SPACE                            | Cf | \x{200B}         |
| 200C  | ZWNJ    | ZERO WIDTH NON-JOINER                       | Cf | \x{200C}         |
| 200D  | ZWJ     | ZERO WIDTH JOINER                           | Cf | \x{200D}         |
| 200E  | LRM     | LEFT-TO-RIGHT MARK                          | Cf | \x{200E}         |
| 200F  | RLM     | RIGHT-TO-LEFT MARK                          | Cf | \x{200F}         |
|       |         |                                             |    |                  |
| 202A  | LRE     | LEFT-TO-RIGHT EMBEDDING                     | Cf | \x{202A}         |
| 202B  | RLE     | RIGHT-TO-LEFT EMBEDDING                     | Cf | \x{202B}         |
| 202C  | PDF     | POP DIRECTIONAL FORMATTING                  | Cf | \x{202C}         |
| 202D  | LRO     | LEFT-TO-RIGHT OVERRIDE                      | Cf | \x{202D}         |
| 202E  | RLO     | RIGHT-TO-LEFT OVERRIDE                      | Cf | \x{202E}         |
|       |         |                                             |    |                  |
| 2028  | LS      | LINE SEPARATOR                              | Zl | \x{2028}         |
| 2029  | PS      | PARAGRAPH SEPARATOR                         | Zp | \x{2029}         |
|       |         |                                             |    |                  |
| 202F  | NNBSP   | NARROW NO-BREAK SPACE                       | Zs | \x{202F}         |
|       |         |                                             |    |                  |
| 205F  | MMSP    | MEDIUM MATHEMATICAL SPACE                   | Zs | \x{205F}         |
|       |         |                                             |    |                  |
| 2060  | WJ      | WORD JOINER                                 | Cf | \x{2060}         |
|       |         |                                             |    |                  |
| 2066  | LRI     | LEFT-TO-RIGHT ISOLATE                       | Cf | \x{2066}         |
| 2067  | RLI     | RIGHT-TO-LEFT ISOLATE                       | Cf | \x{2067}         |
| 2068  | FSI     | FIRST STRONG ISOLATE                        | Cf | \x{2068}         |
| 2069  | PDI     | POP DIRECTIONAL ISOLATE                     | Cf | \x{2069}         |
| 206A  | ISS     | INHIBIT SYMMETRIC SWAPPING                  | Cf | \x{206A}         |
| 206B  | ASS     | ACTIVATE SYMMETRIC SWAPPING                 | Cf | \x{206B}         |
| 206C  | IAFS    | INHIBIT ARABIC FORM SHAPING                 | Cf | \x{206C}         |
| 206D  | AAFS    | ACTIVATE ARABIC FORM SHAPING                | Cf | \x{206D}         |
| 206E  | NADS    | NATIONAL DIGIT SHAPES                       | Cf | \x{206E}         |
| 206F  | NOSP    | NOMINAL DIGIT SHAPES                        | Cf | \x{206F}         |
|       |         |                                             |    |                  |
| 3000  | IDSP    | IDEOGRAPHIC SPACE                           | Zs | \x{3000}         |
|       |         |                                             |    |                  |
| FEFF  | ZWNBSP  | ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK | Cf | \x{FEFF}         |
•-------•---------•---------------------------------------------•----•------------------•-------
I would like to make a few remarks about this list:
- I don't see the purpose of showing the Line Separator (\x{2028}) and Paragraph Separator (\x{2029}) characters. Indeed, they are not format characters and, moreover, they already have a “black on white” representation, LS and PS. Simply check the Show Non-Printing characters option, then uncheck it, and observe the changes for these two characters in the table above!
- Conversely, I think we are missing two individual invisible characters and two sets of invisible characters, all of which do belong to the Unicode BMP:
  - The Soft Hyphen character, \x{00AD}
  - The Syriac Abbreviation Mark, \x{070F}
  - The four invisible operators of the General Punctuation block, from \x{2061} to \x{2064}. Refer to https://www.w3.org/TR/2010/REC-MathML3-20101021/chapter3.html#presm.invisibleops for further information
  - The three Interlinear Annotation characters of the Specials block, from \x{FFF9} to \x{FFFB}. Refer to:
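The Unicode general categories behind these remarks can be checked with Python's standard `unicodedata` module. This sketch only prints the category and name of each code point mentioned above; it does not interact with Notepad++ in any way, and the list names are illustrative.

```python
import unicodedata

# LS and PS, which I propose to drop from the list
TO_REMOVE = [0x2028, 0x2029]
# SHY, SAM, the four invisible operators and the three interlinear annotation characters
TO_ADD = [0x00AD, 0x070F, *range(0x2061, 0x2065), *range(0xFFF9, 0xFFFC)]

def describe(cp: int) -> str:
    """Return 'U+XXXX  <general category>  <character name>' for a code point."""
    ch = chr(cp)
    return f"U+{cp:04X}  {unicodedata.category(ch)}  {unicodedata.name(ch, '<unnamed>')}"

for cp in TO_REMOVE + TO_ADD:
    print(describe(cp))
# LS and PS report Zl / Zp, while every added character reports Cf (format)
```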
So, taking into account the above remarks about which items to include in, or exclude from, the list of existing non-printing characters, we get the updated list below:
•-------•---------•---------------------------------------------•----•------------------•-------
| Code  | Abbrev. | Character Name                              | Cg | N++ Regex        | Char
•-------•---------•---------------------------------------------•----•------------------•-------
| 00A0  | NBSP    | NO-BREAK SPACE                              | Zs | \x{00A0}         |
|       |         |                                             |    |                  |
| 00AD  | SHY     | SOFT HYPHEN                                 | Cf | \x{00AD}         |
|       |         |                                             |    |                  |
| 061C  | ALM     | ARABIC LETTER MARK                          | Cf | \x{061C}         |
|       |         |                                             |    |                  |
| 070F  | SAM     | SYRIAC ABBREVIATION MARK                    | Cf | \x{070F}         |
|       |         |                                             |    |                  |
| 1680  | OSPM    | OGHAM SPACE MARK                            | Zs | \x{1680}         |
|       |         |                                             |    |                  |
| 180E  | MVS     | MONGOLIAN VOWEL SEPARATOR                   | Cf | \x{180E}         |
|       |         |                                             |    |                  |
| 2000  | NQSP    | EN QUAD                                     | Zs | \x{2000}         |
| 2001  | MQSP    | EM QUAD                                     | Zs | \x{2001}         |
| 2002  | ENSP    | EN SPACE                                    | Zs | \x{2002}         |
| 2003  | EMSP    | EM SPACE                                    | Zs | \x{2003}         |
| 2004  | 3/MSP   | THREE-PER-EM SPACE                          | Zs | \x{2004}         |
| 2005  | 4/MSP   | FOUR-PER-EM SPACE                           | Zs | \x{2005}         |
| 2006  | 6/MSP   | SIX-PER-EM SPACE                            | Zs | \x{2006}         |
| 2007  | FSP     | FIGURE SPACE                                | Zs | \x{2007}         |
| 2008  | PSP     | PUNCTUATION SPACE                           | Zs | \x{2008}         |
| 2009  | THSP    | THIN SPACE                                  | Zs | \x{2009}         |
| 200A  | HSP     | HAIR SPACE                                  | Zs | \x{200A}         |
|       |         |                                             |    |                  |
| 200B  | ZWSP    | ZERO WIDTH SPACE                            | Cf | \x{200B}         |
| 200C  | ZWNJ    | ZERO WIDTH NON-JOINER                       | Cf | \x{200C}         |
| 200D  | ZWJ     | ZERO WIDTH JOINER                           | Cf | \x{200D}         |
| 200E  | LRM     | LEFT-TO-RIGHT MARK                          | Cf | \x{200E}         |
| 200F  | RLM     | RIGHT-TO-LEFT MARK                          | Cf | \x{200F}         |
|       |         |                                             |    |                  |
| 202A  | LRE     | LEFT-TO-RIGHT EMBEDDING                     | Cf | \x{202A}         |
| 202B  | RLE     | RIGHT-TO-LEFT EMBEDDING                     | Cf | \x{202B}         |
| 202C  | PDF     | POP DIRECTIONAL FORMATTING                  | Cf | \x{202C}         |
| 202D  | LRO     | LEFT-TO-RIGHT OVERRIDE                      | Cf | \x{202D}         |
| 202E  | RLO     | RIGHT-TO-LEFT OVERRIDE                      | Cf | \x{202E}         |
|       |         |                                             |    |                  |
| 202F  | NNBSP   | NARROW NO-BREAK SPACE                       | Zs | \x{202F}         |
|       |         |                                             |    |                  |
| 205F  | MMSP    | MEDIUM MATHEMATICAL SPACE                   | Zs | \x{205F}         |
|       |         |                                             |    |                  |
| 2060  | WJ      | WORD JOINER                                 | Cf | \x{2060}         |
|       |         |                                             |    |                  |
| 2061  | (FA)    | FUNCTION APPLICATION                        | Cf | \x{2061}         |
| 2062  | (IT)    | INVISIBLE TIMES                             | Cf | \x{2062}         |
| 2063  | (IS)    | INVISIBLE SEPARATOR                         | Cf | \x{2063}         |
| 2064  | (IP)    | INVISIBLE PLUS                              | Cf | \x{2064}         |
|       |         |                                             |    |                  |
| 2066  | LRI     | LEFT-TO-RIGHT ISOLATE                       | Cf | \x{2066}         |
| 2067  | RLI     | RIGHT-TO-LEFT ISOLATE                       | Cf | \x{2067}         |
| 2068  | FSI     | FIRST STRONG ISOLATE                        | Cf | \x{2068}         |
| 2069  | PDI     | POP DIRECTIONAL ISOLATE                     | Cf | \x{2069}         |
| 206A  | ISS     | INHIBIT SYMMETRIC SWAPPING                  | Cf | \x{206A}         |
| 206B  | ASS     | ACTIVATE SYMMETRIC SWAPPING                 | Cf | \x{206B}         |
| 206C  | IAFS    | INHIBIT ARABIC FORM SHAPING                 | Cf | \x{206C}         |
| 206D  | AAFS    | ACTIVATE ARABIC FORM SHAPING                | Cf | \x{206D}         |
| 206E  | NADS    | NATIONAL DIGIT SHAPES                       | Cf | \x{206E}         |
| 206F  | NOSP    | NOMINAL DIGIT SHAPES                        | Cf | \x{206F}         |
|       |         |                                             |    |                  |
| 3000  | IDSP    | IDEOGRAPHIC SPACE                           | Zs | \x{3000}         |
|       |         |                                             |    |                  |
| FEFF  | ZWNBSP  | ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK | Cf | \x{FEFF}         |
|       |         |                                             |    |                  |
| FFF9  | IAA     | INTERLINEAR ANNOTATION ANCHOR               | Cf | \x{FFF9}         |
| FFFA  | IAS     | INTERLINEAR ANNOTATION SEPARATOR            | Cf | \x{FFFA}         |
| FFFB  | IAT     | INTERLINEAR ANNOTATION TERMINATOR           | Cf | \x{FFFB}         |
•-------•---------•---------------------------------------------•----•------------------•-------
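As a quick sanity check of the counts (42 characters before, 49 after), here is a small Python sketch that rebuilds both lists from the code points in the two tables and applies the proposed removals and additions. The names `CURRENT`, `REMOVED`, `ADDED` and `UPDATED` are just illustrative.

```python
# Code points of the 42 characters shown by "Show Non-Printing characters" (first table)
CURRENT = (
    [0x00A0, 0x061C, 0x1680, 0x180E]
    + list(range(0x2000, 0x200B))  # EN QUAD .. HAIR SPACE (11 chars)
    + list(range(0x200B, 0x2010))  # ZERO WIDTH SPACE .. RIGHT-TO-LEFT MARK (5 chars)
    + list(range(0x202A, 0x202F))  # LRE .. RLO (5 chars)
    + [0x2028, 0x2029, 0x202F, 0x205F, 0x2060]
    + list(range(0x2066, 0x2070))  # LRI .. NOSP (10 chars)
    + [0x3000, 0xFEFF]
)

# Proposed changes: drop LS / PS, add SHY, SAM, the invisible operators
# and the interlinear annotation characters
REMOVED = {0x2028, 0x2029}
ADDED = {0x00AD, 0x070F} | set(range(0x2061, 0x2065)) | set(range(0xFFF9, 0xFFFC))

UPDATED = sorted((set(CURRENT) - REMOVED) | ADDED)

print(len(CURRENT), "->", len(UPDATED))  # 42 -> 49
print(" ".join(f"{cp:04X}" for cp in UPDATED))
```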
Of course, if most people find my remarks pertinent, I'll create a GitHub issue!
Notes:
- Regarding the name of this new N++ feature, I propose the following: View > Show Symbol > Show BMP Format chars / Non Regular Spaces. Do you like it?
- Regarding the last table, the equivalent regex to mark these 49 special characters becomes:
  MARK
  [\x{00A0}\x{00AD}\x{061C}\x{070F}\x{1680}\x{180E}\x{2000}-\x{200A}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{202F}\x{205F}-\x{2064}\x{2066}-\x{206F}\x{3000}\x{FEFF}\x{FFF9}\x{FFFA}\x{FFFB}]
  (the range is split around \x{2065}, an unassigned code point which is not part of the table). Depending on the character marked, you should get either a red/orange space, for a space character, OR a thin red line, for all the other characters!
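If you want to detect these characters outside Notepad++, for instance in a script, here is a minimal Python sketch of the same character class. Python's `re` module does not accept the Boost-style `\x{....}` syntax, so the class uses `\uXXXX` escapes instead, merges adjacent ranges, and covers exactly the 49 code points of the last table (U+2065, which is unassigned, is left out). The helpers `find_invisible()` and `reveal()` are hypothetical names, only for illustration.

```python
import re
import unicodedata

# Same set as the N++ MARK regex, using Python's \uXXXX escapes.
# Adjacent ranges are merged; U+2065 (unassigned) is skipped, hence the two ranges around it.
INVISIBLE = re.compile(
    r"[\u00A0\u00AD\u061C\u070F\u1680\u180E\u2000-\u200F\u202A-\u202F"
    r"\u205F-\u2064\u2066-\u206F\u3000\uFEFF\uFFF9-\uFFFB]"
)

def find_invisible(text):
    """Return (index, 'U+XXXX', name) for each invisible character found in text."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}", unicodedata.name(m.group(), "<unnamed>"))
        for m in INVISIBLE.finditer(text)
    ]

def reveal(text):
    """Replace each invisible character with a visible <U+XXXX> placeholder."""
    return INVISIBLE.sub(lambda m: f"<U+{ord(m.group()):04X}>", text)

sample = "price:\u00A0100\u200Beuros\uFEFF"
print(reveal(sample))  # price:<U+00A0>100<U+200B>euros<U+FEFF>
print(find_invisible(sample))
```

`reveal()` is closer in spirit to the new N++ option, since it makes each character visible in place, whereas `find_invisible()` is handier for producing a report.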
Best Regards,
guy038
P.S.:
For an overview of all these strange Unicode characters, read this exhaustive article:
https://en.wikipedia.org/wiki/Universal_Character_Set_characters
-
-
Hi All,
Although my post did not get any positive feedback, I still created an issue on GitHub!
https://github.com/notepad-plus-plus/notepad-plus-plus/issues/13408
Best Regards
guy038