New N++ feature to show/hide Non-Printing characters
-
Hi All,
Do you remember the "Invisible characters unwanted" discussion and my last post there, linked below, about the main invisible characters which need a visual representation?
https://community.notepad-plus-plus.org/post/62169
Following that post, and given the new N++ feature in the v8.5 release to show non-printing characters, I think that it would be interesting to take a fresh look at this topic!
Firstly, in the latest N++ release, the invisible characters located outside the BMP (Basic Multilingual Plane) are not taken into account. I think that this position is acceptable because:
- The two format Kaithi characters relate to the historical Kaithi script, which has been largely ignored since the 1970s
- The nine format Egyptian characters refer to the ancient Egyptian hieroglyphs
- The four format Shorthand characters cannot be considered true characters, as they may encode a lot of European languages simultaneously
- The 233 characters of the Musical symbols Unicode block cannot strictly be considered characters; they rather represent a modern musical notation system
- Finally, all the format characters of the Tag block are strongly discouraged by the Unicode Consortium
Now, if we consider all the non-printing characters shown when View > Show Symbol > Show Non-Printing characters is set, we get a list of 42 lines:

•-------•---------•---------------------------------------------•----•------------------•-------
| Code  | Abbrev. | Character Name                              | Cg | N++ Regex        | Char
•-------•---------•---------------------------------------------•----•------------------•-------
| 00A0  | NBSP    | NO-BREAK SPACE                              | Zs | \x{00A0}         |
|       |         |                                             |    |                  |
| 061C  | ALM     | ARABIC LETTER MARK                          | Cf | \x{061C}         |
|       |         |                                             |    |                  |
| 1680  | OSPM    | OGHAM SPACE MARK                            | Zs | \x{1680}         |
|       |         |                                             |    |                  |
| 180E  | MVS     | MONGOLIAN VOWEL SEPARATOR                   | Cf | \x{180E}         |
|       |         |                                             |    |                  |
| 2000  | NQSP    | EN QUAD                                     | Zs | \x{2000}         |
| 2001  | MQSP    | EM QUAD                                     | Zs | \x{2001}         |
| 2002  | ENSP    | EN SPACE                                    | Zs | \x{2002}         |
| 2003  | EMSP    | EM SPACE                                    | Zs | \x{2003}         |
| 2004  | 3/MSP   | THREE-PER-EM SPACE                          | Zs | \x{2004}         |
| 2005  | 4/MSP   | FOUR-PER-EM SPACE                           | Zs | \x{2005}         |
| 2006  | 6/MSP   | SIX-PER-EM SPACE                            | Zs | \x{2006}         |
| 2007  | FSP     | FIGURE SPACE                                | Zs | \x{2007}         |
| 2008  | PSP     | PUNCTUATION SPACE                           | Zs | \x{2008}         |
| 2009  | THSP    | THIN SPACE                                  | Zs | \x{2009}         |
| 200A  | HSP     | HAIR SPACE                                  | Zs | \x{200A}         |
|       |         |                                             |    |                  |
| 200B  | ZWSP    | ZERO WIDTH SPACE                            | Cf | \x{200B}         |
| 200C  | ZWNJ    | ZERO WIDTH NON-JOINER                       | Cf | \x{200C}         |
| 200D  | ZWJ     | ZERO WIDTH JOINER                           | Cf | \x{200D}         |
| 200E  | LRM     | LEFT-TO-RIGHT MARK                          | Cf | \x{200E}         |
| 200F  | RLM     | RIGHT-TO-LEFT MARK                          | Cf | \x{200F}         |
|       |         |                                             |    |                  |
| 202A  | LRE     | LEFT-TO-RIGHT EMBEDDING                     | Cf | \x{202A}         |
| 202B  | RLE     | RIGHT-TO-LEFT EMBEDDING                     | Cf | \x{202B}         |
| 202C  | PDF     | POP DIRECTIONAL FORMATTING                  | Cf | \x{202C}         |
| 202D  | LRO     | LEFT-TO-RIGHT OVERRIDE                      | Cf | \x{202D}         |
| 202E  | RLO     | RIGHT-TO-LEFT OVERRIDE                      | Cf | \x{202E}         |
|       |         |                                             |    |                  |
| 2028  | LS      | LINE SEPARATOR                              | Zl | \x{2028}         |
| 2029  | PS      | PARAGRAPH SEPARATOR                         | Zp | \x{2029}         |
|       |         |                                             |    |                  |
| 202F  | NNBSP   | NARROW NO-BREAK SPACE                       | Zs | \x{202F}         |
|       |         |                                             |    |                  |
| 205F  | MMSP    | MEDIUM MATHEMATICAL SPACE                   | Zs | \x{205F}         |
|       |         |                                             |    |                  |
| 2060  | WJ      | WORD JOINER                                 | Cf | \x{2060}         |
|       |         |                                             |    |                  |
| 2066  | LRI     | LEFT-TO-RIGHT ISOLATE                       | Cf | \x{2066}         |
| 2067  | RLI     | RIGHT-TO-LEFT ISOLATE                       | Cf | \x{2067}         |
| 2068  | FSI     | FIRST STRONG ISOLATE                        | Cf | \x{2068}         |
| 2069  | PDI     | POP DIRECTIONAL ISOLATE                     | Cf | \x{2069}         |
| 206A  | ISS     | INHIBIT SYMMETRIC SWAPPING                  | Cf | \x{206A}         |
| 206B  | ASS     | ACTIVATE SYMMETRIC SWAPPING                 | Cf | \x{206B}         |
| 206C  | IAFS    | INHIBIT ARABIC FORM SHAPING                 | Cf | \x{206C}         |
| 206D  | AAFS    | ACTIVATE ARABIC FORM SHAPING                | Cf | \x{206D}         |
| 206E  | NADS    | NATIONAL DIGIT SHAPES                       | Cf | \x{206E}         |
| 206F  | NOSP    | NOMINAL DIGIT SHAPES                        | Cf | \x{206F}         |
|       |         |                                             |    |                  |
| 3000  | IDSP    | IDEOGRAPHIC SPACE                           | Zs | \x{3000}         |
|       |         |                                             |    |                  |
| FEFF  | ZWNBSP  | ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK | Cf | \x{FEFF}         |
•-------•---------•---------------------------------------------•----•------------------•-------

I would like to make three points about this list:
- I don't see the purpose of showing the Line Separator character, \x{2028}, and the Paragraph Separator character, \x{2029}. Indeed, they are not format characters and, moreover, they already have a "black on white" representation, LS and PS. Simply check the Show Non-Printing characters option, then uncheck it, and observe the changes regarding these two chars in the above table!
- Conversely, I think that we are missing two individual characters and two sets of invisible characters, which do belong to the Unicode BMP:
  - The Soft Hyphen character, \x{00AD}
  - The Syriac Abbreviation Mark, \x{070F}
  - The four invisible operators of the General Punctuation block, between \x{2061} and \x{2064}. Refer to https://www.w3.org/TR/2010/REC-MathML3-20101021/chapter3.html#presm.invisibleops for further information
  - The three Interlinear annotation characters of the Specials block, between \x{FFF9} and \x{FFFB}. Refer to :
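
As a quick cross-check of the General Category claims above, here is a minimal Python sketch using the standard unicodedata module. The helper name categories is only illustrative, and the results depend on the Unicode database bundled with your Python version, so treat it as a sanity check rather than a reference.

```python
import unicodedata

def categories(code_points):
    """Map each code point (as 4-digit hex) to its Unicode General Category."""
    return {f"{cp:04X}": unicodedata.category(chr(cp)) for cp in code_points}

# Currently shown, but suggested for removal: LS and PS
questioned = [0x2028, 0x2029]

# Suggested additions: SHY, SAM, the invisible operators and the
# interlinear annotation characters
proposed = [0x00AD, 0x070F, *range(0x2061, 0x2065), *range(0xFFF9, 0xFFFC)]

print(categories(questioned))
print(categories(proposed))
```

With a recent Python, the two separators should come out as Zl and Zp, whereas the nine suggested additions should all be Cf, like most of the characters already in the list.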
So, if we take into account the above remarks about the items to include in, or exclude from, the list of existing non-printing chars, we get the updated list below:
•-------•---------•---------------------------------------------•----•------------------•-------
| Code  | Abbrev. | Character Name                              | Cg | N++ Regex        | Char
•-------•---------•---------------------------------------------•----•------------------•-------
| 00A0  | NBSP    | NO-BREAK SPACE                              | Zs | \x{00A0}         |
|       |         |                                             |    |                  |
| 00AD  | SHY     | SOFT HYPHEN                                 | Cf | \x{00AD}         |
|       |         |                                             |    |                  |
| 061C  | ALM     | ARABIC LETTER MARK                          | Cf | \x{061C}         |
|       |         |                                             |    |                  |
| 070F  | SAM     | SYRIAC ABBREVIATION MARK                    | Cf | \x{070F}         |
|       |         |                                             |    |                  |
| 1680  | OSPM    | OGHAM SPACE MARK                            | Zs | \x{1680}         |
|       |         |                                             |    |                  |
| 180E  | MVS     | MONGOLIAN VOWEL SEPARATOR                   | Cf | \x{180E}         |
|       |         |                                             |    |                  |
| 2000  | NQSP    | EN QUAD                                     | Zs | \x{2000}         |
| 2001  | MQSP    | EM QUAD                                     | Zs | \x{2001}         |
| 2002  | ENSP    | EN SPACE                                    | Zs | \x{2002}         |
| 2003  | EMSP    | EM SPACE                                    | Zs | \x{2003}         |
| 2004  | 3/MSP   | THREE-PER-EM SPACE                          | Zs | \x{2004}         |
| 2005  | 4/MSP   | FOUR-PER-EM SPACE                           | Zs | \x{2005}         |
| 2006  | 6/MSP   | SIX-PER-EM SPACE                            | Zs | \x{2006}         |
| 2007  | FSP     | FIGURE SPACE                                | Zs | \x{2007}         |
| 2008  | PSP     | PUNCTUATION SPACE                           | Zs | \x{2008}         |
| 2009  | THSP    | THIN SPACE                                  | Zs | \x{2009}         |
| 200A  | HSP     | HAIR SPACE                                  | Zs | \x{200A}         |
|       |         |                                             |    |                  |
| 200B  | ZWSP    | ZERO WIDTH SPACE                            | Cf | \x{200B}         |
| 200C  | ZWNJ    | ZERO WIDTH NON-JOINER                       | Cf | \x{200C}         |
| 200D  | ZWJ     | ZERO WIDTH JOINER                           | Cf | \x{200D}         |
| 200E  | LRM     | LEFT-TO-RIGHT MARK                          | Cf | \x{200E}         |
| 200F  | RLM     | RIGHT-TO-LEFT MARK                          | Cf | \x{200F}         |
|       |         |                                             |    |                  |
| 202A  | LRE     | LEFT-TO-RIGHT EMBEDDING                     | Cf | \x{202A}         |
| 202B  | RLE     | RIGHT-TO-LEFT EMBEDDING                     | Cf | \x{202B}         |
| 202C  | PDF     | POP DIRECTIONAL FORMATTING                  | Cf | \x{202C}         |
| 202D  | LRO     | LEFT-TO-RIGHT OVERRIDE                      | Cf | \x{202D}         |
| 202E  | RLO     | RIGHT-TO-LEFT OVERRIDE                      | Cf | \x{202E}         |
|       |         |                                             |    |                  |
| 202F  | NNBSP   | NARROW NO-BREAK SPACE                       | Zs | \x{202F}         |
|       |         |                                             |    |                  |
| 205F  | MMSP    | MEDIUM MATHEMATICAL SPACE                   | Zs | \x{205F}         |
|       |         |                                             |    |                  |
| 2060  | WJ      | WORD JOINER                                 | Cf | \x{2060}         |
|       |         |                                             |    |                  |
| 2061  | (FA)    | FUNCTION APPLICATION                        | Cf | \x{2061}         |
| 2062  | (IT)    | INVISIBLE TIMES                             | Cf | \x{2062}         |
| 2063  | (IS)    | INVISIBLE SEPARATOR                         | Cf | \x{2063}         |
| 2064  | (IP)    | INVISIBLE PLUS                              | Cf | \x{2064}         |
|       |         |                                             |    |                  |
| 2066  | LRI     | LEFT-TO-RIGHT ISOLATE                       | Cf | \x{2066}         |
| 2067  | RLI     | RIGHT-TO-LEFT ISOLATE                       | Cf | \x{2067}         |
| 2068  | FSI     | FIRST STRONG ISOLATE                        | Cf | \x{2068}         |
| 2069  | PDI     | POP DIRECTIONAL ISOLATE                     | Cf | \x{2069}         |
| 206A  | ISS     | INHIBIT SYMMETRIC SWAPPING                  | Cf | \x{206A}         |
| 206B  | ASS     | ACTIVATE SYMMETRIC SWAPPING                 | Cf | \x{206B}         |
| 206C  | IAFS    | INHIBIT ARABIC FORM SHAPING                 | Cf | \x{206C}         |
| 206D  | AAFS    | ACTIVATE ARABIC FORM SHAPING                | Cf | \x{206D}         |
| 206E  | NADS    | NATIONAL DIGIT SHAPES                       | Cf | \x{206E}         |
| 206F  | NOSP    | NOMINAL DIGIT SHAPES                        | Cf | \x{206F}         |
|       |         |                                             |    |                  |
| 3000  | IDSP    | IDEOGRAPHIC SPACE                           | Zs | \x{3000}         |
|       |         |                                             |    |                  |
| FEFF  | ZWNBSP  | ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK | Cf | \x{FEFF}         |
|       |         |                                             |    |                  |
| FFF9  | IAA     | INTERLINEAR ANNOTATION ANCHOR               | Cf | \x{FFF9}         |
| FFFA  | IAS     | INTERLINEAR ANNOTATION SEPARATOR            | Cf | \x{FFFA}         |
| FFFB  | IAT     | INTERLINEAR ANNOTATION TERMINATOR           | Cf | \x{FFFB}         |
•-------•---------•---------------------------------------------•----•------------------•-------

Of course, if my remarks seem pertinent enough to most people, I'll create a GitHub issue!
Notes:
- Regarding the name of this new N++ feature, I propose this one: View > Show Symbol > Show BMP Format chars / Non Regular Spaces. Do you like it?
- Regarding the last table, the equivalent regex to mark these 49 special characters becomes:

MARK
[\x{00A0}\x{00AD}\x{061C}\x{070F}\x{1680}\x{180E}\x{2000}-\x{200A}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{202F}\x{205F}-\x{206F}\x{3000}\x{FEFF}\x{FFF9}\x{FFFA}\x{FFFB}]

Depending on the character marked, you should get either a red/orange space for a space character or a thin red line for all the other characters!
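
For anyone who wants to test the same character class outside N++, here is a minimal Python sketch. It rebuilds the class from the ranges of the regex above, using Python's \uXXXX escapes instead of the \x{XXXX} syntax; the names RANGES, build_class and find_special are made up for this example. Note that the range \x{205F}-\x{206F} also spans U+2065, which is unassigned as far as I know, so the class covers 50 code points for the 49 characters of the table.

```python
import re

# Inclusive code point ranges, copied from the character class above.
RANGES = [
    (0x00A0, 0x00A0), (0x00AD, 0x00AD), (0x061C, 0x061C), (0x070F, 0x070F),
    (0x1680, 0x1680), (0x180E, 0x180E), (0x2000, 0x200A), (0x200B, 0x200F),
    (0x202A, 0x202E), (0x202F, 0x202F), (0x205F, 0x206F), (0x3000, 0x3000),
    (0xFEFF, 0xFEFF), (0xFFF9, 0xFFFB),
]

def build_class(ranges):
    """Build a regex character class such as [\\u00A0\\u2000-\\u200A...]."""
    parts = []
    for lo, hi in ranges:
        parts.append(f"\\u{lo:04X}" if lo == hi else f"\\u{lo:04X}-\\u{hi:04X}")
    return "[" + "".join(parts) + "]"

PATTERN = re.compile(build_class(RANGES))

def find_special(text):
    """Return (offset, 'U+XXXX') for every character of the class found in text."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in PATTERN.finditer(text)]

# NBSP and ZWSP are in the class; LS (U+2028) is deliberately left out of it.
sample = "a\u00A0b\u200Bc\u2028d"
print(find_special(sample))  # [(1, 'U+00A0'), (3, 'U+200B')]
```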
Best Regards,
guy038
P.S. :
To get an overview of all these strange Unicode characters, read this exhaustive article:
https://en.wikipedia.org/wiki/Universal_Character_Set_characters
-
Hi, All,
Although my post did not get any positive reviews, I still created an issue on GitHub!
https://github.com/notepad-plus-plus/notepad-plus-plus/issues/13408
Best Regards
guy038
-
@guy038 I’m wondering why IDSP (“Ideographic Space”) has been included in this list. It is not a zero width character, so it is a printable (albeit whitespace) character.
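
For what it's worth, the distinction can be checked with Python's standard unicodedata module. This is only a minimal sketch (the helper name describe is made up for illustration), and the values come from the Unicode database shipped with your Python version.

```python
import unicodedata

def describe(ch):
    """Return (name, General Category, East Asian Width) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.east_asian_width(ch),
    )

print(describe("\u3000"))  # IDEOGRAPHIC SPACE
print(describe("\u200B"))  # ZERO WIDTH SPACE, for comparison
```

IDSP should be reported as a space separator (Zs) with a fullwidth East Asian Width, whereas ZWSP is a format character (Cf).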