<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Invisible characters unwanted]]></title><description><![CDATA[<p dir="auto">Hello everyone,<br />
Really strange problem with Notepad++ here…<br />
Invisible characters appear between the other letters at certain points in my HTML code.<br />
So my code doesn’t work.<br />
Where do they come from? Is there a solution, such as a “search and replace”?<br />
Check the file below: look just before the letter F of FE-000-2. If I try to delete the " character, I have to press Backspace twice!<br />
<a href="https://drive.google.com/open?id=0B-tMAt7OX-3OSFNjTFJVNkk3VEU" rel="nofollow ugc">https://drive.google.com/open?id=0B-tMAt7OX-3OSFNjTFJVNkk3VEU</a></p>
<p dir="auto">Thx for help</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/14045/invisible-characters-unwanted</link><generator>RSS for Node</generator><lastBuildDate>Tue, 21 Apr 2026 00:52:26 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/14045.rss" rel="self" type="application/rss+xml"/><pubDate>Wed, 28 Jun 2017 13:18:33 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Invisible characters unwanted on Wed, 15 Feb 2023 02:00:10 GMT]]></title><description><![CDATA[<p dir="auto">Someone else found the solution for me:</p>
<ul>
<li>set the control character’s representation to the “zero-width space” character (so it has no visual component – but it will have a small “inverse video” artifact on-screen)</li>
<li>set the representation appearance to plain text (to avoid the inverse video)</li>
</ul>
<p dir="auto">Scripting this (as I tend to do) is difficult, though, because the <a href="https://www.scintilla.org/ScintillaDoc.html#SCI_SETREPRESENTATIONAPPEARANCE" rel="nofollow ugc">set-representation-appearance</a> Scintilla command is not yet available as an <code>editor</code> command in the PythonScript versions that are current as of this writing.</p>
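<p dir="auto">A minimal sketch of the first step only, assuming it is run with an object that offers <code>setRepresentation</code> (in PythonScript, the real <code>editor</code>). The helper name <code>hide_chars</code> and the <code>FakeEditor</code> stand-in are hypothetical and exist only so the logic can be tried outside Notepad++. The second step (plain-text appearance) is left out because that command is not exposed yet:</p>
<pre><code class="language-py"># -*- coding: utf-8 -*-

ZWSP = u'\u200b'  # ZERO WIDTH SPACE, used as an "empty-looking" representation


def hide_chars(ed, chars):
    """Ask the editor to draw each character in chars as a zero-width space."""
    for ch in chars:
        ed.setRepresentation(ch, ZWSP)
    return len(chars)


class FakeEditor(object):
    """Hypothetical stand-in that records calls instead of talking to Scintilla."""

    def __init__(self):
        self.representations = {}

    def setRepresentation(self, encoded_character, representation):
        self.representations[encoded_character] = representation


fake = FakeEditor()
hidden = hide_chars(fake, [u'\u0002', u'\u0003'])  # STX and ETX
</code></pre>
<p dir="auto">Inside Notepad++ the equivalent call would be <code>hide_chars(editor, [u'\u0002', u'\u0003'])</code> after <code>from Npp import editor</code>.</p>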
]]></description><link>https://community.notepad-plus-plus.org/post/84167</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/84167</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Wed, 15 Feb 2023 02:00:10 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Tue, 14 Feb 2023 12:31:37 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/14479">@Ekopalypse</a> said in <a href="/post/84130">Invisible characters unwanted</a>:</p>
<blockquote>
<p dir="auto">Adhoc, the only thing I can think of is to give the control chars a style and set the visible property to false.<br />
Practically like the error list lexer does with the ANSI sequences.</p>
</blockquote>
<p dir="auto">That sounds like a lot of work, both coding wise and runtime wise.  I was hoping that I was misunderstanding some simple thing about how Scintilla does “representation”, but from lack of replies, and direction of the one reply, it appears I am not missing that “simple thing”.</p>
<p dir="auto">I think it is odd that Scintilla basically forces me to always see “control characters”, but never shows me certain UTF-8 characters.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/84132</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/84132</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Tue, 14 Feb 2023 12:31:37 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Tue, 14 Feb 2023 07:28:07 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@Alan-Kilborn</a></p>
<p dir="auto">Adhoc, the only thing I can think of is to give the control chars a style and set the <a href="https://www.scintilla.org/ScintillaDoc.html#SCI_STYLESETVISIBLE" rel="nofollow ugc">visible property</a> to false.<br />
Practically like the error list lexer does with the ANSI sequences.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/84130</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/84130</guid><dc:creator><![CDATA[Ekopalypse]]></dc:creator><pubDate>Tue, 14 Feb 2023 07:28:07 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Mon, 13 Feb 2023 14:19:35 GMT]]></title><description><![CDATA[<p dir="auto">This thread largely talks about making “invisible” characters visible.</p>
<p dir="auto">Recently I had a need/desire to make characters with a visible component invisible, i.e. “go the other way”.</p>
<p dir="auto">Example:</p>
<p dir="auto"><img src="/assets/uploads/files/1676291265861-6fe5497f-98c5-4fb6-ba79-6bde61335e33-image.png" alt="6fe5497f-98c5-4fb6-ba79-6bde61335e33-image.png" class=" img-fluid img-markdown" /></p>
<p dir="auto">So for the above, I’d like to “turn off” the control characters and focus only on the “meat” of the data:</p>
<p dir="auto"><img src="/assets/uploads/files/1676292850483-74ab872e-f290-4a56-ab82-26ea1c9609f7-image.png" alt="74ab872e-f290-4a56-ab82-26ea1c9609f7-image.png" class=" img-fluid img-markdown" /></p>
<p dir="auto">This is only for visualization purposes; I’m not editing this data.</p>
<p dir="auto">Long story short is I ran into trouble doing this.  I found I can change the representation of the characters, using PythonScript, e.g., executing <code>editor.setRepresentation(u'\u0002', 'startxmit')</code> will then show:</p>
<p dir="auto"><img src="/assets/uploads/files/1676292415062-365b965d-ba6e-45ef-bef8-23d31d741b6c-image.png" alt="365b965d-ba6e-45ef-bef8-23d31d741b6c-image.png" class=" img-fluid img-markdown" /></p>
<p dir="auto">But my guess at what was needed to hide the U+0002 character entirely didn’t work:</p>
<p dir="auto"><code>editor.setRepresentation(u'\u0002', '')</code></p>
<p dir="auto">produces:</p>
<p dir="auto"><img src="/assets/uploads/files/1676292520820-b7be4360-4ac2-4a82-83d1-915ef01aba54-image.png" alt="b7be4360-4ac2-4a82-83d1-915ef01aba54-image.png" class=" img-fluid img-markdown" /></p>
<p dir="auto">which is “better” but still has a visual component.</p>
<p dir="auto"><code>editor.clearRepresentation(u'\u0002')</code> only brings back the default visualization of <code>STX</code>.</p>
<p dir="auto">Any ideas on how to solve this, i.e., eliminate the visual component of a control character?</p>
]]></description><link>https://community.notepad-plus-plus.org/post/84104</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/84104</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Mon, 13 Feb 2023 14:19:35 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Sat, 04 Mar 2023 22:18:19 GMT]]></title><description><![CDATA[<p dir="auto">Hi <strong>All</strong>,</p>
<p dir="auto">Continuation …</p>
<p dir="auto">From this <strong>last</strong> table, we can reasonably <strong>ignore</strong> :</p>
<ul>
<li>
<p dir="auto">The <strong>space</strong> and <strong>soft hyphen</strong> characters</p>
</li>
<li>
<p dir="auto">Some <strong>musical</strong> characters</p>
</li>
<li>
<p dir="auto">All the characters, <strong>specific</strong> to a language, modern or archaic</p>
</li>
<li>
<p dir="auto">The <strong>tag</strong> characters, whose usage is strongly <strong>discouraged</strong> by the <strong><code>Unicode consortium</code></strong>.</p>
</li>
</ul>
<p dir="auto">In other words, all characters with the <strong>No</strong> indication, in the <strong><code>To search</code></strong> column, of the <strong>previous</strong> table !</p>
<p dir="auto">As a result, we should <strong>only</strong> take care of this <strong>restricted</strong> list of <strong><code>50</code></strong> characters :</p>
<pre><code class="language-z">•-------•---------•---------------------------------------------•----•------------------•-----•
| Code  | Abbrev. |           Character Name                    | Cg |    N++  Regex    | Chr |
•-------•---------•---------------------------------------------•----•------------------•-----•
|  00A0 |  NBSP   | NO-BREAK SPACE                              | Zs |     \x{00A0}     |     |
|       |         |                                             |    |                  |     |
|  2000 |  NQSP   | EN QUAD                                     | Zs |     \x{2000}     |     |
|  2001 |  MQSP   | EM QUAD                                     | Zs |     \x{2001}     |     |
|  2002 |  ENSP   | EN SPACE                                    | Zs |     \x{2002}     |     |
|  2003 |  EMSP   | EM SPACE                                    | Zs |     \x{2003}     |     |
|  2004 |  3/MSP  | THREE-PER-EM SPACE                          | Zs |     \x{2004}     |     |
|  2005 |  4/MSP  | FOUR-PER-EM SPACE                           | Zs |     \x{2005}     |     |
|  2006 |  6/MSP  | SIX-PER-EM SPACE                            | Zs |     \x{2006}     |     |
|  2007 |  FSP    | FIGURE SPACE                                | Zs |     \x{2007}     |     |
|  2008 |  PSP    | PUNCTUATION SPACE                           | Zs |     \x{2008}     |     |
|  2009 |  THSP   | THIN SPACE                                  | Zs |     \x{2009}     |     |
|  200A |  HSP    | HAIR SPACE                                  | Zs |     \x{200A}     |     |
|       |         |                                             |    |                  |     |
|  200B |  ZWSP   | ZERO WIDTH SPACE                            | Cf |     \x{200B}     |  ​  |
|  200C |  ZWNJ   | ZERO WIDTH NON-JOINER                       | Cf |     \x{200C}     |  ‌  |
|  200D |  ZWJ    | ZERO WIDTH JOINER                           | Cf |     \x{200D}     |  ‍  |
|  200E |  LRM    | LEFT-TO-RIGHT MARK                          | Cf |     \x{200E}     |  ‎  |
|  200F |  RLM    | RIGHT-TO-LEFT MARK                          | Cf |     \x{200F}     |  ‏  |
|  202A |  LRE    | LEFT-TO-RIGHT EMBEDDING                     | Cf |     \x{202A}     |  ‪  |
|  202B |  RLE    | RIGHT-TO-LEFT EMBEDDING                     | Cf |     \x{202B}     |  ‫  |
|  202C |  PDF    | POP DIRECTIONAL FORMATTING                  | Cf |     \x{202C}     |  ‬  |
|  202D |  LRO    | LEFT-TO-RIGHT OVERRIDE                      | Cf |     \x{202D}     |  ‭  |
|  202E |  RLO    | RIGHT-TO-LEFT OVERRIDE                      | Cf |     \x{202E}     |  ‮  |
|       |         |                                             |    |                  |     |
|  202F |  NNBSP  | NARROW NO-BREAK SPACE                       | Zs |     \x{202F}     |     |
|       |         |                                             |    |                  |     |
|  205F |  MMSP   | MEDIUM MATHEMATICAL SPACE                   | Zs |     \x{205F}     |     |
|       |         |                                             |    |                  |     |
|  2060 |  WJ     | WORD JOINER                                 | Cf |     \x{2060}     |  ⁠  |
|       |         |                                             |    |                  |     |
|  2061 | (FA)    | FUNCTION APPLICATION                        | Cf |     \x{2061}     |  ⁡  |
|  2062 | (IT)    | INVISIBLE TIMES                             | Cf |     \x{2062}     |  ⁢  |
|  2063 | (IS)    | INVISIBLE SEPARATOR                         | Cf |     \x{2063}     |  ⁣  |
|  2064 | (IP)    | INVISIBLE PLUS                              | Cf |     \x{2064}     |  ⁤  |
|       |         |                                             |    |                  |     |
|  2066 |  LRI    | LEFT-TO-RIGHT ISOLATE                       | Cf |     \x{2066}     |  ⁦  |
|  2067 |  RLI    | RIGHT-TO-LEFT ISOLATE                       | Cf |     \x{2067}     |  ⁧  |
|  2068 |  FSI    | FIRST STRONG ISOLATE                        | Cf |     \x{2068}     |  ⁨  |
|  2069 |  PDI    | POP DIRECTIONAL ISOLATE                     | Cf |     \x{2069}     |  ⁩  |
|  206A |  ISS    | INHIBIT SYMMETRIC SWAPPING                  | Cf |     \x{206A}     |  ⁪  |
|  206B |  ASS    | ACTIVATE SYMMETRIC SWAPPING                 | Cf |     \x{206B}     |  ⁫  |
|  206C |  IAFS   | INHIBIT ARABIC FORM SHAPING                 | Cf |     \x{206C}     |  ⁬  |
|  206D |  AAFS   | ACTIVATE ARABIC FORM SHAPING                | Cf |     \x{206D}     |  ⁭  |
|  206E |  NADS   | NATIONAL DIGIT SHAPES                       | Cf |     \x{206E}     |  ⁮  |
|  206F |  NODS   | NOMINAL DIGIT SHAPES                        | Cf |     \x{206F}     |  ⁯  |
|       |         |                                             |    |                  |     |
|  3000 |  IDSP   | IDEOGRAPHIC SPACE                           | Zs |     \x{3000}     |  　  |
|       |         |                                             |    |                  |     |
|  FEFF |  ZWNBSP | ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK | Cf |     \x{FEFF}     |  ﻿  |
|       |         |                                             |    |                  |     |
|  FFF9 |  IAA    | INTERLINEAR ANNOTATION ANCHOR               | Cf |     \x{FFF9}     |  ￹  |
|  FFFA |  IAS    | INTERLINEAR ANNOTATION SEPARATOR            | Cf |     \x{FFFA}     |  ￺  |
|  FFFB |  IAT    | INTERLINEAR ANNOTATION TERMINATOR           | Cf |     \x{FFFB}     |  ￻  |
|       |         |                                             |    |                  |     |
|  FFFC |  OBJ    | OBJECT REPLACEMENT CHARACTER                | So |     \x{FFFC}     |  ￼  |
|  FFFD |  ?      | REPLACEMENT CHARACTER                       | So |     \x{FFFD}     |  �  |
|       |         |                                             |    |                  |     |
| 1BCA0 | (SFLO)  | SHORTHAND FORMAT LETTER OVERLAP             | Cf | \x{D82F}\x{DCA0} |  𛲠  |
| 1BCA1 | (SFCO)  | SHORTHAND FORMAT CONTINUING OVERLAP         | Cf | \x{D82F}\x{DCA1} |  𛲡  |
| 1BCA2 | (SFDS)  | SHORTHAND FORMAT DOWN STEP                  | Cf | \x{D82F}\x{DCA2} |  𛲢  |
| 1BCA3 | (SFUS)  | SHORTHAND FORMAT UP STEP                    | Cf | \x{D82F}\x{DCA3} |  𛲣  |
•-------•---------•---------------------------------------------•----•------------------•-----•
</code></pre>
<hr />
<p dir="auto">Note that I <strong>added</strong>, to that list, the <strong>two</strong> characters <strong>Object Replacement Character</strong> <strong><code>\x{FFFC}</code></strong> and <strong>Replacement Character</strong> <strong><code>\x{FFFD}</code></strong>, which often show up in case of <strong>encoding</strong> problems !</p>
<hr />
<p dir="auto">Then the <strong>updated</strong> <strong><code>Mark</code></strong> regex would be :</p>
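<p dir="auto">MARK <strong><code>[\x{00A0}\x{2000}-\x{200A}\x{200B}-\x{200F}\x{202A}-\x{202E}\x{202F}\x{205F}-\x{206F}\x{3000}\x{FEFF}\x{FFF9}-\x{FFFD}]|\x{D82F}[\x{DCA0}-\x{DCA3}]</code></strong> ( the <strong>surrogate pairs</strong> are kept <strong>outside</strong> the main character class, where each <code>\x{....}</code> would be matched as a <strong>separate</strong> code unit )</p>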
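<p dir="auto">For checking text outside Notepad++, the same set can be sketched with the standard <code>re</code> module. This assumes plain Python 3 (not PythonScript), where strings are sequences of code points, so the four characters above the BMP need no surrogates. The names <code>INVISIBLE</code> and <code>find_invisible</code> are hypothetical:</p>
<pre><code class="language-py">import re

# Same code points as the Mark regex; U+1BCA0..U+1BCA3 can sit directly
# inside the class because Python 3 matches whole code points.
INVISIBLE = re.compile(
    u'[\u00A0\u2000-\u200F\u202A-\u202F\u205F-\u206F\u3000\uFEFF'
    u'\uFFF9-\uFFFD\U0001BCA0-\U0001BCA3]'
)


def find_invisible(text):
    """Return (offset, 'U+XXXX') pairs for every listed character in text."""
    return [(m.start(), 'U+%04X' % ord(m.group()))
            for m in INVISIBLE.finditer(text)]


# A zero width space after "FE" and a no-break space after "000"
sample = u'FE\u200B-000\u00A0-2'
hits = find_invisible(sample)
</code></pre>
<p dir="auto">Each hit gives the offset and the code point, which is enough to decide whether the character should be deleted or replaced with a regular space.</p>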
<p dir="auto">And the <strong>updated</strong> <strong><code>Python</code></strong> script is :</p>
<pre><code class="language-py"># -*- coding: utf-8 -*-

from Npp import editor, notepad, NOTIFICATION, MENUCOMMAND

class SRFSC(object):

    def __init__(self):
        notepad.callback(self.callback_npp_BUFFERACTIVATED, [NOTIFICATION.BUFFERACTIVATED])
        self.callback_npp_BUFFERACTIVATED(None)

    def callback_npp_BUFFERACTIVATED(self, args):

        # SPACE chars ( Zs )

        editor.setRepresentation(u'\u00A0', "NBSP")    # no-break space

        editor.setRepresentation(u'\u2000', "NQSP")    # EN quad
        editor.setRepresentation(u'\u2001', "MQSP")    # EM quad
        editor.setRepresentation(u'\u2002', "ENSP")    # EN space
        editor.setRepresentation(u'\u2003', "EMSP")    # EM space

        editor.setRepresentation(u'\u2004', "3/MSP")   # three-per-EM space
        editor.setRepresentation(u'\u2005', "4/MSP")   # four-per-EM space
        editor.setRepresentation(u'\u2006', "6/MSP")   # six-per-EM space

        editor.setRepresentation(u'\u2007', "FSP")     # figure space
        editor.setRepresentation(u'\u2008', "PSP")     # punctuation space
        editor.setRepresentation(u'\u2009', "THSP")    # thin space
        editor.setRepresentation(u'\u200A', "HSP")     # hair space

        # FORMAT chars ( Cf )

        editor.setRepresentation(u'\u200B', "ZWSP")    # zero width space

        editor.setRepresentation(u'\u200C', "ZWNJ")    # zero width non-joiner
        editor.setRepresentation(u'\u200D', "ZWJ")     # zero width joiner

        editor.setRepresentation(u'\u200E', "LRM")     # left-to-right mark
        editor.setRepresentation(u'\u200F', "RLM")     # right-to-left mark

        editor.setRepresentation(u'\u202A', "LRE")     # left-to-right embedding
        editor.setRepresentation(u'\u202B', "RLE")     # right-to-left embedding

        editor.setRepresentation(u'\u202C', "PDF")     # pop directional formatting

        editor.setRepresentation(u'\u202D', "LRO")     # left-to-right override
        editor.setRepresentation(u'\u202E', "RLO")     # right-to-left override

        # SPACE chars ( Zs )

        editor.setRepresentation(u'\u202F', "NNBSP")   # narrow no-break space

        editor.setRepresentation(u'\u205F', "MMSP")    # medium mathematical space


        # FORMAT chars ( Cf )

        editor.setRepresentation(u'\u2060', "WJ")      # word joiner ( zero width no-break space )

        editor.setRepresentation(u'\u2061', "FA")      # function application

        editor.setRepresentation(u'\u2062', "IT")      # invisible times
        editor.setRepresentation(u'\u2063', "IS")      # invisible separator
        editor.setRepresentation(u'\u2064', "IP")      # invisible plus

        editor.setRepresentation(u'\u2066', "LRI")     # left-to-right isolate
        editor.setRepresentation(u'\u2067', "RLI")     # right-to-left isolate

        editor.setRepresentation(u'\u2068', "FSI")     # first strong isolate
        editor.setRepresentation(u'\u2069', "PDI")     # pop directional isolate

        # FORMAT chars ( Cf ) DEPRECATED

        editor.setRepresentation(u'\u206A', "ISS")     # inhibit symmetric swapping
        editor.setRepresentation(u'\u206B', "ASS")     # activate symmetric swapping

        editor.setRepresentation(u'\u206C', "IAFS")    # inhibit arabic form shaping
        editor.setRepresentation(u'\u206D', "AAFS")    # activate arabic form shaping

        editor.setRepresentation(u'\u206E', "NADS")    # national digit shapes
        editor.setRepresentation(u'\u206F', "NODS")    # nominal digit shapes

        # SPACE chars ( Zs )

        editor.setRepresentation(u'\u3000', "IDSP")    # ideographic space

        # FORMAT chars ( Cf ) SPECIALS

        editor.setRepresentation(u'\uFEFF', "ZWNBSP")  # zero width no-break space : deprecated ( see U+2060 ) / byte order mark

        editor.setRepresentation(u'\uFFF9', "IAA")     # interlinear annotation anchor
        editor.setRepresentation(u'\uFFFA', "IAS")     # interlinear annotation separator
        editor.setRepresentation(u'\uFFFB', "IAT")     # interlinear annotation terminator

        # OTHER symbols ( So )

        editor.setRepresentation(u'\uFFFC', "OBJ")     # object replacement character
        editor.setRepresentation(u'\uFFFD', "&lt;?&gt;")     # replacement character

        # FORMAT chars ( Cf )

        # For characters OVER the BMP, with code &gt; FFFF, we can use, EITHER, the syntaxes :

        #    - editor.setRepresentation(u'\U0001BCA0', "SFLO")    TRUE "32-bits" representation
        #    - editor.setRepresentation(u'\uD82F\uDCA0', "SFLO")  The   16-bits "SURROGATES PAIR"

        editor.setRepresentation(u'\uD82F\uDCA0', "SFLO")   # shorthand format letter overlap
        editor.setRepresentation(u'\uD82F\uDCA1', "SFCO")   # shorthand format continuing overlap
        editor.setRepresentation(u'\uD82F\uDCA2', "SFDS")   # shorthand format down step
        editor.setRepresentation(u'\uD82F\uDCA3', "SFUS")   # shorthand format up step

        # Activate the character representation ( the view option is toggled twice, so it ends in its original state )

        notepad.menuCommand(MENUCOMMAND.VIEW_ALL_CHARACTERS)
        notepad.menuCommand(MENUCOMMAND.VIEW_ALL_CHARACTERS)

SRFSC()
</code></pre>
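<p dir="auto">With this many near-identical calls, a table-driven variant may be easier to keep consistent. The following is only a sketch under assumptions: <code>LABELS</code>, <code>duplicate_labels</code>, <code>apply_labels</code> and <code>RecordingEditor</code> are hypothetical names, only a few rows of the table are repeated, and the stand-in class replaces the real <code>editor</code> so the loop can run anywhere:</p>
<pre><code class="language-py"># -*- coding: utf-8 -*-

# (character, label) pairs; only a few rows of the table are repeated here
LABELS = [
    (u'\u00A0', 'NBSP'),      # no-break space
    (u'\u200B', 'ZWSP'),      # zero width space
    (u'\u2060', 'WJ'),        # word joiner
    (u'\uFEFF', 'ZWNBSP'),    # zero width no-break space / byte order mark
    (u'\U0001BCA0', 'SFLO'),  # shorthand format letter overlap
]


def duplicate_labels(labels):
    """Return the sorted labels that are assigned to more than one character."""
    names = [label for _, label in labels]
    return sorted(set(n for n in names if names.count(n) != 1))


def apply_labels(ed, labels):
    """Register every label with the editor and return how many were set."""
    for char, label in labels:
        ed.setRepresentation(char, label)
    return len(labels)


class RecordingEditor(object):
    """Hypothetical stand-in for Npp.editor that only records the calls."""

    def __init__(self):
        self.calls = []

    def setRepresentation(self, encoded_character, representation):
        self.calls.append((encoded_character, representation))


recorder = RecordingEditor()
dupes = duplicate_labels(LABELS)
count = apply_labels(recorder, LABELS)
</code></pre>
<p dir="auto">Running <code>duplicate_labels</code> before <code>apply_labels</code> is a cheap way to notice a label that was pasted twice by accident.</p>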
<hr />
<p dir="auto">I did <strong>not</strong> look into the <strong>S/R</strong>, because of the number of chars to handle ( <strong><code>50</code></strong> ) and because I’m just feeling… lazy for such a task !</p>
<p dir="auto">However, with the <strong><code>Mark</code></strong> operation, which helps you to locate <strong>exactly</strong> where these <strong>special</strong> characters are, and the <strong><code>Python</code></strong> script, which clearly <strong>identifies</strong> them, you should be <strong>safe</strong> with your file’s contents  ;-))</p>
<hr />
<p dir="auto"><img src="/assets/uploads/files/1611533519137-788ae4c4-e37b-4e50-83a5-285c9b12975d-image.png" alt="788ae4c4-e37b-4e50-83a5-285c9b12975d-image.png" class=" img-fluid img-markdown" /></p>
<p dir="auto">Best Regards</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/62170</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/62170</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 04 Mar 2023 22:18:19 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Sat, 04 Mar 2023 22:17:03 GMT]]></title><description><![CDATA[<p dir="auto">Hi, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a>, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@peterjones</a> and <strong>All</strong>,</p>
<p dir="auto">From the <strong><code>UnicodeData.txt</code></strong> file    <a href="https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt" rel="nofollow ugc">https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt</a></p>
<p dir="auto">I just <strong>extracted</strong> the lines for <strong>any</strong> character whose <strong>General Category</strong> property is <strong><code>Cf</code></strong> or <strong><code>Zs</code></strong> ( <strong><code>3rd</code></strong> <strong>field</strong> of each line ), i.e. the <strong><code>Format</code></strong> and <strong><code>Space_Separator</code></strong> characters. Indeed, depending on the <strong>current</strong> font used, these characters may be :</p>
<ul>
<li>
<p dir="auto"><strong>Not</strong> displayed at all ( Invisible )</p>
</li>
<li>
<p dir="auto">Displayed as a <strong>question mark</strong>, inside a <strong>square</strong> or <strong>lozenge</strong></p>
</li>
<li>
<p dir="auto">Displayed as a <strong>white square</strong> box</p>
</li>
<li>
<p dir="auto">Displayed with a <strong>wrong</strong> width ( case of a <strong>space</strong> char )</p>
</li>
</ul>
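<p dir="auto">The extraction itself can be sketched in a few lines of Python. This is an illustration, not the method actually used for the tables here: the function name <code>extract_by_category</code> is hypothetical, and the four sample lines only imitate the semicolon-separated layout of <code>UnicodeData.txt</code> so that the block runs without downloading the file:</p>
<pre><code class="language-py">def extract_by_category(lines, wanted=('Cf', 'Zs')):
    """Return (code, name, category) for the lines whose 3rd field is wanted."""
    rows = []
    for line in lines:
        fields = line.strip().split(';')
        # Pad short or empty lines so the unpacking below never fails
        code, name, category = (fields + ['', '', ''])[:3]
        if category in wanted:
            rows.append((code, name, category))
    return rows


# Sample lines in the UnicodeData.txt layout (code;name;category;...)
SAMPLE = [
    '0020;SPACE;Zs;0;WS;;;;;N;;;;;',
    '0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;',
    '00AD;SOFT HYPHEN;Cf;0;BN;;;;;N;;;;;',
    '200B;ZERO WIDTH SPACE;Cf;0;BN;;;;;N;;;;;',
]

rows = extract_by_category(SAMPLE)
</code></pre>
<p dir="auto">With the real file, the same function could be fed the open file object, for example <code>extract_by_category(open('UnicodeData.txt'))</code>, assuming the file has been saved locally under that name.</p>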
<hr />
<p dir="auto">As a <strong>reminder</strong>, below, here is a table, giving <strong>all</strong> values of the Unicode <strong><code>General Category</code></strong> property :</p>
<pre><code class="language-z">GENERAL_CATEGORY Values :

•----•-----------------------•------------------------------------------------------
| Abv|         Value         |                     Description
•----•-----------------------•------------------------------------------------------
| Lu | Uppercase_Letter      | an UPPERCASE letter
| Ll | Lowercase_Letter      | a LOWERCASE letter
| Lt | Titlecase_Letter      | A DI-GRAPHIC character, with first part UPPERCASE
|    |                       |
| LC | Cased_Letter          | Lu | Ll | Lt
|    |                       |
| Lm | Modifier_Letter       | A MODIFIER letter
| Lo | Other_Letter          | Other letters, including SYLLABLES and IDEOGRAPHS
|    |                       |
| L  | Letter                | Lu | Ll | Lt | Lm | Lo
|    |                       |
| Mn | Nonspacing_Mark       | A NON-SPACING COMBINING mark (zero advance width)
| Mc | Spacing_Mark          | A SPACING COMBINING mark (positive advance width)
| Me | Enclosing_Mark        | An ENCLOSING COMBINING mark
|    |                       |
| M  | Mark                  | Mn | Mc | Me
|    |                       |
| Nd | Decimal_Number        | A DECIMAL digit
| Nl | Letter_Number         | A LETTERLIKE numeric character
| No | Other_Number          | A NUMERIC character of other type
|    |                       |
| N  | Number                | Nd | Nl | No
|    |                       |
| Pc | Connector_Punctuation | A CONNECTING PUNCTUATION mark, like a tie
| Pd | Dash_Punctuation      | A DASH or HYPHEN punctuation mark
| Ps | Open_Punctuation      | An OPENING PUNCTUATION mark (of a pair)
| Pe | Close_Punctuation     | A CLOSING PUNCTUATION mark (of a pair)
| Pi | Initial_Punctuation   | An INITIAL QUOTATION mark
| Pf | Final_Punctuation     | A FINAL QUOTATION mark
| Po | Other_Punctuation     | A PUNCTUATION mark of other type
|    |                       |
| P  | Punctuation           | Pc | Pd | Ps | Pe | Pi | Pf | Po
|    |                       |
| Sm | Math_Symbol           | A symbol of MATHEMATICAL use
| Sc | Currency_Symbol       | A CURRENCY sign
| Sk | Modifier_Symbol       | A NON-LETTERLIKE MODIFIER symbol
| So | Other_Symbol          | A SYMBOL of other type
|    |                       |
| S  | Symbol                | Sm | Sc | Sk | So
|    |                       |
| Zs | Space_Separator       | A SPACE character ( of various NON-ZERO width )
| Zl | Line_Separator        | U+2028 LINE SEPARATOR only
| Zp | Paragraph_Separator   | U+2029 PARAGRAPH SEPARATOR only
|    |                       |
| Z  | Separator             | Zs | Zl | Zp
|    |                       |
| Cc | Control               | A C0 or C1 CONTROL code
| Cf | Format                | A FORMAT CONTROL character
| Cs | Surrogate             | A SURROGATE code point
| Co | Private_Use           | A PRIVATE-USE character
| Cn | Unassigned            | A reserved UNASSIGNED code point or a NON-CHARACTER
|    |                       |
| C  | Other                 | Cc | Cf | Cs | Co | Cn
•----•-----------------------•------------------------------------------------------
</code></pre>
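<p dir="auto">To look up the abbreviation for a single character, Python’s standard <code>unicodedata</code> module can be used. A small sketch, where <code>category_of</code> and <code>examples</code> are hypothetical names and the result depends on the Unicode version shipped with the Python interpreter:</p>
<pre><code class="language-py">import unicodedata


def category_of(char):
    """Return the two-letter General Category abbreviation of one character."""
    return unicodedata.category(char)


examples = {
    'STX': category_of(u'\u0002'),   # a C0 control code
    'NBSP': category_of(u'\u00A0'),  # no-break space
    'ZWSP': category_of(u'\u200B'),  # zero width space
    'OBJ': category_of(u'\uFFFC'),   # object replacement character
}
</code></pre>
<p dir="auto">The returned values are the abbreviations from the first column of the table above.</p>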
<hr />
<p dir="auto">And, here is the table of these <strong><code>178</code></strong> <strong>extracted</strong> characters :</p>
<pre><code class="language-z">•-------•---------•---------------------------------------------•----•-----------•------------------•-------
| Code  | Abbrev. |           Character Name                    | Cg | To search |    N++  Regex    | Char
•-------•---------•---------------------------------------------•----•-----------•------------------•-------
|  0020 |  SP     | SPACE                                       | Zs |    No     |     \x{0020}     |
|       |         |                                             |    |           |                  |
|  00A0 |  NBSP   | NO-BREAK SPACE                              | Zs |    Yes    |     \x{00A0}     |   
|       |         |                                             |    |           |                  |
|  00AD |  SHY    | SOFT HYPHEN                                 | Cf |    No     |     \x{00AD}     |  ­
|       |         |                                             |    |           |                  |
|  0600 |         | ARABIC NUMBER SIGN                          | Cf |    No     |     \x{0600}     |  ؀
|  0601 |         | ARABIC SIGN SANAH                           | Cf |    No     |     \x{0601}     |  ؁
|  0602 |         | ARABIC FOOTNOTE MARKER                      | Cf |    No     |     \x{0602}     |  ؂
|  0603 |         | ARABIC SIGN SAFHA                           | Cf |    No     |     \x{0603}     |  ؃
|  0604 |         | ARABIC SIGN SAMVAT                          | Cf |    No     |     \x{0604}     |  ؄
|  0605 |         | ARABIC NUMBER MARK ABOVE                    | Cf |    No     |     \x{0605}     |  ؅
|  061C |  ALM    | ARABIC LETTER MARK                          | Cf |    No     |     \x{061C}     |  ؜
|  06DD |         | ARABIC END OF AYAH                          | Cf |    No     |     \x{06DD}     |  ۝
|       |         |                                             |    |           |                  |
|  070F |  SAM    | SYRIAC ABBREVIATION MARK                    | Cf |    No     |     \x{070F}     |  ܏
|       |         |                                             |    |           |                  |
|  08E2 |         | ARABIC DISPUTED END OF AYAH                 | Cf |    No     |     \x{08E2}     |  ࣢
|       |         |                                             |    |           |                  |
|  1680 |         | OGHAM SPACE MARK                            | Zs |    No     |     \x{1680}     |   
|       |         |                                             |    |           |                  |
|  180E |  MVS    | MONGOLIAN VOWEL SEPARATOR                   | Cf |    No     |     \x{180E}     |  ᠎
|       |         |                                             |    |           |                  |
|  2000 |  NQSP   | EN QUAD                                     | Zs |    Yes    |     \x{2000}     |   
|  2001 |  MQSP   | EM QUAD                                     | Zs |    Yes    |     \x{2001}     |   
|  2002 |  ENSP   | EN SPACE                                    | Zs |    Yes    |     \x{2002}     |   
|  2003 |  EMSP   | EM SPACE                                    | Zs |    Yes    |     \x{2003}     |   
|  2004 |  3/MSP  | THREE-PER-EM SPACE                          | Zs |    Yes    |     \x{2004}     |   
|  2005 |  4/MSP  | FOUR-PER-EM SPACE                           | Zs |    Yes    |     \x{2005}     |   
|  2006 |  6/MSP  | SIX-PER-EM SPACE                            | Zs |    Yes    |     \x{2006}     |   
|  2007 |  FSP    | FIGURE SPACE                                | Zs |    Yes    |     \x{2007}     |   
|  2008 |  PSP    | PUNCTUATION SPACE                           | Zs |    Yes    |     \x{2008}     |   
|  2009 |  THSP   | THIN SPACE                                  | Zs |    Yes    |     \x{2009}     |   
|  200A |  HSP    | HAIR SPACE                                  | Zs |    Yes    |     \x{200A}     |   
|       |         |                                             |    |           |                  |
|  200B |  ZWSP   | ZERO WIDTH SPACE                            | Cf |    Yes    |     \x{200B}     |  ​
|  200C |  ZWNJ   | ZERO WIDTH NON-JOINER                       | Cf |    Yes    |     \x{200C}     |  ‌
|  200D |  ZWJ    | ZERO WIDTH JOINER                           | Cf |    Yes    |     \x{200D}     |  ‍
|  200E |  LRM    | LEFT-TO-RIGHT MARK                          | Cf |    Yes    |     \x{200E}     |  ‎
|  200F |  RLM    | RIGHT-TO-LEFT MARK                          | Cf |    Yes    |     \x{200F}     |  ‏
|  202A |  LRE    | LEFT-TO-RIGHT EMBEDDING                     | Cf |    Yes    |     \x{202A}     |  ‪
|  202B |  RLE    | RIGHT-TO-LEFT EMBEDDING                     | Cf |    Yes    |     \x{202B}     |  ‫
|  202C |  PDF    | POP DIRECTIONAL FORMATTING                  | Cf |    Yes    |     \x{202C}     |  ‬
|  202D |  LRO    | LEFT-TO-RIGHT OVERRIDE                      | Cf |    Yes    |     \x{202D}     |  ‭
|  202E |  RLO    | RIGHT-TO-LEFT OVERRIDE                      | Cf |    Yes    |     \x{202E}     |  ‮
|       |         |                                             |    |           |                  |
|  202F |  NNBSP  | NARROW NO-BREAK SPACE                       | Zs |    Yes    |     \x{202F}     |   
|       |         |                                             |    |           |                  |
|  205F |  MMSP   | MEDIUM MATHEMATICAL SPACE                   | Zs |    Yes    |     \x{205F}     |   
|       |         |                                             |    |           |                  |
|  2060 |  WJ     | WORD JOINER                                 | Cf |    Yes    |     \x{2060}     |  ⁠
|       |         |                                             |    |           |                  |
|  2061 | (FA)    | FUNCTION APPLICATION                        | Cf |    Yes    |     \x{2061}     |  ⁡
|  2062 | (IT)    | INVISIBLE TIMES                             | Cf |    Yes    |     \x{2062}     |  ⁢
|  2063 | (IS)    | INVISIBLE SEPARATOR                         | Cf |    Yes    |     \x{2063}     |  ⁣
|  2064 | (IP)    | INVISIBLE PLUS                              | Cf |    Yes    |     \x{2064}     |  ⁤
|       |         |                                             |    |           |                  |
|  2066 |  LRI    | LEFT-TO-RIGHT ISOLATE                       | Cf |    Yes    |     \x{2066}     |  ⁦
|  2067 |  RLI    | RIGHT-TO-LEFT ISOLATE                       | Cf |    Yes    |     \x{2067}     |  ⁧
|  2068 |  FSI    | FIRST STRONG ISOLATE                        | Cf |    Yes    |     \x{2068}     |  ⁨
|  2069 |  PDI    | POP DIRECTIONAL ISOLATE                     | Cf |    Yes    |     \x{2069}     |  ⁩
|  206A |  ISS    | INHIBIT SYMMETRIC SWAPPING                  | Cf |    Yes    |     \x{206A}     |  ⁪
|  206B |  ASS    | ACTIVATE SYMMETRIC SWAPPING                 | Cf |    Yes    |     \x{206B}     |  ⁫
|  206C |  IAFS   | INHIBIT ARABIC FORM SHAPING                 | Cf |    Yes    |     \x{206C}     |  ⁬
|  206D |  AAFS   | ACTIVATE ARABIC FORM SHAPING                | Cf |    Yes    |     \x{206D}     |  ⁭
|  206E |  NADS   | NATIONAL DIGIT SHAPES                       | Cf |    Yes    |     \x{206E}     |  ⁮
|  206F |  NODS   | NOMINAL DIGIT SHAPES                        | Cf |    Yes    |     \x{206F}     |  ⁯
|       |         |                                             |    |           |                  |
|  3000 |  IDSP   | IDEOGRAPHIC SPACE                           | Zs |    Yes    |     \x{3000}     |  　
|       |         |                                             |    |           |                  |
|  FEFF |  ZWNBSP | ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK | Cf |    Yes    |     \x{FEFF}     |  ﻿
|       |         |                                             |    |           |                  |
|  FFF9 |  IAA    | INTERLINEAR ANNOTATION ANCHOR               | Cf |    Yes    |     \x{FFF9}     |  ￹
|  FFFA |  IAS    | INTERLINEAR ANNOTATION SEPARATOR            | Cf |    Yes    |     \x{FFFA}     |  ￺
|  FFFB |  IAT    | INTERLINEAR ANNOTATION TERMINATOR           | Cf |    Yes    |     \x{FFFB}     |  ￻
|       |         |                                             |    |           |                  |
| 110BD |  (KNS)  | KAITHI NUMBER SIGN                          | Cf |    No     | \x{D804}\x{DCBD} |  𑂽
| 110CD | (KNSA)  | KAITHI NUMBER SIGN ABOVE                    | Cf |    No     | \x{D804}\x{DCCD} |  𑃍
|       |         |                                             |    |           |                  |
| 13430 | (EHVJ)  | EGYPTIAN HIEROGLYPH VERTICAL JOINER         | Cf |    No     | \x{D80D}\x{DC30} |  𓐰
| 13431 | (EHHJ)  | EGYPTIAN HIEROGLYPH HORIZONTAL JOINER       | Cf |    No     | \x{D80D}\x{DC31} |  𓐱
| 13432 | (EHITS) | EGYPTIAN HIEROGLYPH INSERT AT TOP START     | Cf |    No     | \x{D80D}\x{DC32} |  𓐲
| 13433 | (EHIBS) | EGYPTIAN HIEROGLYPH INSERT AT BOTTOM START  | Cf |    No     | \x{D80D}\x{DC33} |  𓐳
| 13434 | (EHITE) | EGYPTIAN HIEROGLYPH INSERT AT TOP END       | Cf |    No     | \x{D80D}\x{DC34} |  𓐴
| 13435 | (EHIBE) | EGYPTIAN HIEROGLYPH INSERT AT BOTTOM END    | Cf |    No     | \x{D80D}\x{DC35} |  𓐵
| 13436 | (EHOM)  | EGYPTIAN HIEROGLYPH OVERLAY MIDDLE          | Cf |    No     | \x{D80D}\x{DC36} |  𓐶
| 13437 | (EHBS)  | EGYPTIAN HIEROGLYPH BEGIN SEGMENT           | Cf |    No     | \x{D80D}\x{DC37} |  𓐷
| 13438 | (EHES)  | EGYPTIAN HIEROGLYPH END SEGMENT             | Cf |    No     | \x{D80D}\x{DC38} |  𓐸
|       |         |                                             |    |           |                  |
| 1BCA0 | (SFLO)  | SHORTHAND FORMAT LETTER OVERLAP             | Cf |    Yes    | \x{D82F}\x{DCA0} |  𛲠
| 1BCA1 | (SFCO)  | SHORTHAND FORMAT CONTINUING OVERLAP         | Cf |    Yes    | \x{D82F}\x{DCA1} |  𛲡
| 1BCA2 | (SFDS)  | SHORTHAND FORMAT DOWN STEP                  | Cf |    Yes    | \x{D82F}\x{DCA2} |  𛲢
| 1BCA3 | (SFUS)  | SHORTHAND FORMAT UP STEP                    | Cf |    Yes    | \x{D82F}\x{DCA3} |  𛲣
|       |         |                                             |    |           |                  |
| 1D173 | (MSBB)  | MUSICAL SYMBOL BEGIN BEAM                   | Cf |    No     | \x{D834}\x{DD73} |  𝅳
| 1D174 | (MSEB)  | MUSICAL SYMBOL END BEAM                     | Cf |    No     | \x{D834}\x{DD74} |  𝅴
| 1D175 | (MSBT)  | MUSICAL SYMBOL BEGIN TIE                    | Cf |    No     | \x{D834}\x{DD75} |  𝅵
| 1D176 | (MSET)  | MUSICAL SYMBOL END TIE                      | Cf |    No     | \x{D834}\x{DD76} |  𝅶
| 1D177 | (MSBS)  | MUSICAL SYMBOL BEGIN SLUR                   | Cf |    No     | \x{D834}\x{DD77} |  𝅷
| 1D178 | (MSES)  | MUSICAL SYMBOL END SLUR                     | Cf |    No     | \x{D834}\x{DD78} |  𝅸
| 1D179 | (MSBP)  | MUSICAL SYMBOL BEGIN PHRASE                 | Cf |    No     | \x{D834}\x{DD79} |  𝅹
| 1D17A | (MSEP)  | MUSICAL SYMBOL END PHRASE                   | Cf |    No     | \x{D834}\x{DD7A} |  𝅺
|       |         |                                             |    |           |                  |
| E0001 |  BEGIN  | LANGUAGE TAG                                | Cf |    No     | \x{DB40}\x{DC01} |  󠀁
| E0020 |  SP     | TAG SPACE                                   | Cf |    No     | \x{DB40}\x{DC20} |  󠀠
| ..... |         | ........................................... | .. |    No     | ................ |  .
| ..... |         | ........................................... | .. |    No     | ................ |  .
| ..... |         | ........................................... | .. |    No     | ................ |  .
| E007E |  ~      | TAG TILDE                                   | Cf |    No     | \x{DB40}\x{DC7E} |  󠁾
| E007F |  END    | CANCEL TAG                                  | Cf |    No     | \x{DB40}\x{DC7F} |  󠁿
•-------•---------•---------------------------------------------•----•-----------•------------------•------
</code></pre>
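<p dir="auto">For the code points above <code>FFFF</code>, the regex column of the table writes each character as a pair of UTF-16 surrogates. The arithmetic behind those pairs can be sketched in a few lines of Python 3. The helper names <code>surrogate_pair</code> and <code>as_regex</code> are invented for this illustration and are not part of N++ or PythonScript :</p>
<pre><code class="language-py">def surrogate_pair(code_point):
    """Return the (high, low) UTF-16 surrogates of a code point above U+FFFF."""
    if code_point not in range(0x10000, 0x110000):
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + offset // 0x400   # top 10 bits of the offset
    low = 0xDC00 + offset % 0x400     # bottom 10 bits of the offset
    return high, low


def as_regex(code_point):
    """Format the pair in the regex syntax used in the table."""
    return r"\x{%04X}\x{%04X}" % surrogate_pair(code_point)


print(as_regex(0x110BD))   # KAITHI NUMBER SIGN
</code></pre>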
<p dir="auto">Continuation on <strong>next</strong> post !</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/62169</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/62169</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 04 Mar 2023 22:17:03 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Sat, 23 Jan 2021 12:42:40 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/195">@guy038</a></p>
<p dir="auto">Is it your intent with your last posting to say that, with the PDFs from <a href="http://unicode.org" rel="nofollow ugc">unicode.org</a>, we now have a “complete” list of invisible characters, and a script can be made that covers them all, using correct abbreviations in their N++ representations?</p>
<p dir="auto">At first look, it seems that anything in those docs that is shown inside a “dashed box”, e.g.:</p>
<p dir="auto"><img src="/assets/uploads/files/1611405488990-007dbf53-1422-4f86-9d1c-6023f2137ee3-image.png" alt="007dbf53-1422-4f86-9d1c-6023f2137ee3-image.png" class=" img-fluid img-markdown" /></p>
<p dir="auto">is a good candidate for a new representation being assigned in a N++ script like the ones above?</p>
<p dir="auto">If this is the case, I’m surprised that in the script you presented, not all of the seemingly invisible characters from the documents are in the script.</p>
<p dir="auto"><strong>EDIT:</strong>  Hmm, not sure now about the “dashed box” as I just noticed some dashed boxes in the doc containing things like <code>,</code> and <code>+</code> , so probably the dashed box does not truly identify something as an “invisible character”.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/62120</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/62120</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Sat, 23 Jan 2021 12:42:40 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Fri, 22 Jan 2021 18:15:08 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@peterjones</a>, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a> and <strong>All</strong>,</p>
<p dir="auto">From these <strong>two</strong> links :</p>
<p dir="auto"><a href="https://www.unicode.org/charts/PDF/U2000.pdf" rel="nofollow ugc">https://www.unicode.org/charts/PDF/U2000.pdf</a></p>
<p dir="auto"><a href="https://www.unicode.org/charts/PDF/UFE70.pdf" rel="nofollow ugc">https://www.unicode.org/charts/PDF/UFE70.pdf</a></p>
<p dir="auto">I just rewrote these <strong><code>26</code></strong> <strong>special</strong> characters :</p>
<ul>
<li>
<p dir="auto">By <strong>increasing</strong> Unicode <strong>code-point</strong> order</p>
</li>
<li>
<p dir="auto">With their <strong>exact</strong> code-points ( some <strong>typos</strong> corrected )</p>
</li>
<li>
<p dir="auto">With their <strong>normalized</strong> Unicode character <strong>representation</strong></p>
</li>
</ul>
<hr />
<p dir="auto">So, here is a <strong>new</strong> version of the <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@alan-kilborn</a>’s <strong><code>SetRepresentationForSpecialCharacters.py</code></strong> file, with the <strong>merged</strong> lines from the <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@peterjones</a>’s script, <strong>without</strong> using the <strong><code>startup.py</code></strong> file :</p>
<pre><code class="language-py"># -*- coding: utf-8 -*-

from Npp import editor, notepad, NOTIFICATION, MENUCOMMAND

class SRFSC(object):

    def __init__(self):
        notepad.callback(self.callback_npp_BUFFERACTIVATED, [NOTIFICATION.BUFFERACTIVATED])
        self.callback_npp_BUFFERACTIVATED(None)

    def callback_npp_BUFFERACTIVATED(self, args):

        # FORMAT chars

        editor.setRepresentation(u'\u200B', "ZWSP")    # zero width space

        editor.setRepresentation(u'\u200C', "ZWNJ")    # zero width non-joiner
        editor.setRepresentation(u'\u200D', "ZWJ")     # zero width joiner

        editor.setRepresentation(u'\u200E', "LRM")     # left-to-right mark
        editor.setRepresentation(u'\u200F', "RLM")     # right-to-left mark

        editor.setRepresentation(u'\u202A', "LRE")     # left-to-right embedding
        editor.setRepresentation(u'\u202B', "RLE")     # right-to-left embedding

        editor.setRepresentation(u'\u202C', "PDF")     # pop directional formatting

        editor.setRepresentation(u'\u202D', "LRO")     # left-to-right override
        editor.setRepresentation(u'\u202E', "RLO")     # right-to-left override

        editor.setRepresentation(u'\u2060', "WJ")      # word joiner ( zero width no-break space )

        # INVISIBLE chars

        editor.setRepresentation(u'\u2061', "FA")      # function application

        editor.setRepresentation(u'\u2062', "IT")      # invisible times
        editor.setRepresentation(u'\u2063', "IS")      # invisible separator
        editor.setRepresentation(u'\u2064', "IP")      # invisible plus

        # FORMAT chars

        editor.setRepresentation(u'\u2066', "LRI")     # left-to-right isolate
        editor.setRepresentation(u'\u2067', "RLI")     # right-to-left isolate

        editor.setRepresentation(u'\u2068', "FSI")     # first strong isolate
        editor.setRepresentation(u'\u2069', "PDI")     # pop directional isolate

        # DEPRECATED chars

        editor.setRepresentation(u'\u206A', "ISS")     # inhibit symmetric swapping
        editor.setRepresentation(u'\u206B', "ASS")     # activate symmetric swapping

        editor.setRepresentation(u'\u206C', "IAFS")    # inhibit arabic form shaping
        editor.setRepresentation(u'\u206D', "AAFS")    # activate arabic form shaping

        editor.setRepresentation(u'\u206E', "NADS")    # national digit shapes
        editor.setRepresentation(u'\u206F', "NODS")    # nominal digit shapes

        # SPECIAL char

        editor.setRepresentation(u'\uFEFF', "BOM")     # byte order mark ( zero width no-break space : deprecated, see U+2060 )

        notepad.menuCommand(MENUCOMMAND.VIEW_ALL_CHARACTERS)
        notepad.menuCommand(MENUCOMMAND.VIEW_ALL_CHARACTERS)

SRFSC()
</code></pre>
<hr />
<ul>
<li>
<p dir="auto">On the other hand, you may quickly <strong>verify</strong> whether some <strong>special</strong> characters exist in the <strong>current</strong> file, using the <strong><code>Mark</code></strong> dialog :</p>
<ul>
<li>MARK <strong><code>[\x{200B}-\x{200F}\x{202A}-\x{202E}\x{2060}-\x{2064}\x{2066}-\x{206F}\x{FEFF}]</code></strong></li>
</ul>
</li>
</ul>
<p dir="auto">=&gt; You should see some <strong>thin <code>red</code></strong> marks and, since the <strong><code>v7.9.2</code></strong> <strong>N++</strong> version, you can <strong>copy</strong> all these chars into a <strong>new</strong> tab, for further examination, with the <strong><code>Copy Marked Text</code></strong> button !</p>
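<p dir="auto">Outside of N++, the same character class can be tried with Python’s <code>re</code> module. This is only a rough sketch: the function name <code>find_special_chars</code> and the sample string are made up for illustration, and Python’s <code>re</code> writes the code points as <code>\uXXXX</code> instead of <code>\x{XXXX}</code> :</p>
<pre><code class="language-py"># -*- coding: utf-8 -*-
import re

# Same ranges as the Mark expression, in Python's \uXXXX notation
SPECIAL_CHARS = re.compile(
    u"[\u200B-\u200F\u202A-\u202E\u2060-\u2064\u2066-\u206F\uFEFF]"
)


def find_special_chars(text):
    """Return a list of (offset, 'U+XXXX') for each special character found."""
    return [(m.start(), "U+%04X" % ord(m.group()))
            for m in SPECIAL_CHARS.finditer(text)]


sample = u'id="\u200BFE-000-2"'   # hypothetical line with a hidden ZWSP
print(find_special_chars(sample))
</code></pre>
<p dir="auto">Each tuple gives the offset of the character in the string and its code point, which is roughly what the <strong>red</strong> marks show visually.</p>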
<ul>
<li>
<p dir="auto">A <strong>third</strong> solution could be to perform a <strong>regex</strong> S/R ( which can be <strong>recorded</strong> as a <strong>macro</strong>) to replace any of these <strong>special</strong> characters with their <strong>Unicode representation</strong> :</p>
<ul>
<li>
<p dir="auto">SEARCH <strong><code>(\x{200B})|(\x{200C})|(\x{200D})|(\x{200E})|(\x{200F})|(\x{202A})|(\x{202B})|(\x{202C})|(\x{202D})|(\x{202E})|(\x{2060})|(\x{2061})|(\x{2062})|(\x{2063})|(\x{2064})|(\x{2066})|(\x{2067})|(\x{2068})|(\x{2069})|(\x{206A})|(\x{206B})|(\x{206C})|(\x{206D})|(\x{206E})|(\x{206F})|(\x{FEFF})</code></strong></p>
</li>
<li>
<p dir="auto">REPLACE <strong><code>(?1[ZWSP])(?2[ZWNJ])(?3[ZWJ])(?4[LRM])(?5[RLM])(?6[LRE])(?7[RLE])(?8[PDF])(?9[LRO])(?10[RLO])(?11[WJ])(?12[FA])(?13[IT])(?14[IS])(?15[IP])(?16[LRI])(?17[RLI])(?18[FSI])(?19[PDI])(?20[ISS])(?21[ASS])(?22[IAFS])(?23[AAFS])(?24[NADS])(?25[NODS])(?26[BOM])</code></strong></p>
</li>
<li>
<p dir="auto">Once the characters have been <strong>noted</strong> and/or the lines <strong>bookmarked</strong>, for further analysis, then just <strong>undo</strong> the replacements with <strong><code>Ctrl + Z</code></strong></p>
</li>
</ul>
</li>
</ul>
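<p dir="auto">The same replacement idea can be sketched in plain Python, with a dictionary lookup taking the place of the <strong>26</strong> numbered groups. This is an illustration under assumptions, not part of the script above: <code>ABBREVIATIONS</code> and <code>reveal</code> are names chosen here, and the abbreviations are simply copied from the list above :</p>
<pre><code class="language-py"># -*- coding: utf-8 -*-
import re

ABBREVIATIONS = {
    u"\u200B": "ZWSP", u"\u200C": "ZWNJ", u"\u200D": "ZWJ",
    u"\u200E": "LRM",  u"\u200F": "RLM",
    u"\u202A": "LRE",  u"\u202B": "RLE",  u"\u202C": "PDF",
    u"\u202D": "LRO",  u"\u202E": "RLO",
    u"\u2060": "WJ",   u"\u2061": "FA",   u"\u2062": "IT",
    u"\u2063": "IS",   u"\u2064": "IP",
    u"\u2066": "LRI",  u"\u2067": "RLI",  u"\u2068": "FSI",  u"\u2069": "PDI",
    u"\u206A": "ISS",  u"\u206B": "ASS",  u"\u206C": "IAFS", u"\u206D": "AAFS",
    u"\u206E": "NADS", u"\u206F": "NODS",
    u"\uFEFF": "BOM",
}

# One character class built from the dictionary keys
PATTERN = re.compile(u"[" + u"".join(sorted(ABBREVIATIONS)) + u"]")


def reveal(text):
    """Replace each special character with its bracketed abbreviation."""
    return PATTERN.sub(lambda m: "[" + ABBREVIATIONS[m.group()] + "]", text)


print(reveal(u"FE\u200B-000\uFEFF"))
</code></pre>
<p dir="auto">Unlike the S/R in N++, this works on a copy of the text, so there is nothing to undo afterwards.</p>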
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
]]></description><link>https://community.notepad-plus-plus.org/post/62107</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/62107</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Fri, 22 Jan 2021 18:15:08 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Fri, 07 Aug 2020 18:32:51 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@Alan-Kilborn</a> said in <a href="/post/56581">Invisible characters unwanted</a>:</p>
<blockquote>
<pre><code>editor.setRepresentation(u'\u200E', "LTR")  # left-to-right mark
</code></pre>
</blockquote>
<p dir="auto">Apparently this LTR issue is really annoying to you. :-)</p>
<p dir="auto">Looking at <a href="http://www.fileformat.info/info/unicode/block/general_punctuation/images.htm" rel="nofollow ugc">http://www.fileformat.info/info/unicode/block/general_punctuation/images.htm</a>, there are other control characters in that block, so if your data is more varied, I might expand that to:</p>
<pre><code>    # zero width in name
    editor.setRepresentation(u'\u200B', "ZWS")
    editor.setRepresentation(u'\u200C', "ZWNJ")
    editor.setRepresentation(u'\u200D', "ZWJ")
    editor.setRepresentation(u'\uFEFF', "ZWNBSP")
    # also zero width
    editor.setRepresentation(u'\u2060', "WJ")       # word joiner (separate from ZWJ, but still claims zero width)
    # directional controls and other toggles
    editor.setRepresentation(u'\u200E', "LTR")  # left-to-right mark
    editor.setRepresentation(u'\u200F', "RTL")  # right-to-left mark
    editor.setRepresentation(u'\u202A', "EMBL")  # left-to-right embedding
    editor.setRepresentation(u'\u202B', "EMBR")  # right-to-left embedding
    editor.setRepresentation(u'\u202C', "EMBP")  # pop directional formatting
    editor.setRepresentation(u'\u202D', "OVRL")  # left-to-right override
    editor.setRepresentation(u'\u202E', "OVRR")  # right-to-left override
    editor.setRepresentation(u'\u2066', "ISOL")  # left-to-right isolate
    editor.setRepresentation(u'\u2067', "ISOR")  # right-to-left isolate
    editor.setRepresentation(u'\u2068', "ISO1")  # first strong isolate
    editor.setRepresentation(u'\u2069', "ISOP")  # pop directional isolate
    editor.setRepresentation(u'\u206A', "SYMI")  # inhibit symmetric swapping
    editor.setRepresentation(u'\u206B', "SYMA")  # activate symmetric swapping
    editor.setRepresentation(u'\u206C', "ARAI")  # inhibit arabic form shaping
    editor.setRepresentation(u'\u206D', "ARAA")  # activate arabic form shaping
    editor.setRepresentation(u'\u206E', "SHNA")  # national digit shapes
    editor.setRepresentation(u'\u206F', "SHNO")  # nominal digit shapes
</code></pre>
<p dir="auto">But, most important is to include the characters that <em>you</em>, as the user of the script, might run across.</p>
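<p dir="auto">One way to make that per-user tailoring easier is to keep the list in a dictionary and loop over it. The following is only a sketch: <code>MY_REPRESENTATIONS</code> and <code>apply_representations</code> are hypothetical names, and the function merely assumes it is handed an object with the <code>setRepresentation</code> and <code>clearRepresentation</code> methods used above (in PythonScript, that object would be <code>editor</code>).</p>
<pre><code class="language-py"># -*- coding: utf-8 -*-

# Edit this table to match the characters you actually run across
MY_REPRESENTATIONS = {
    u'\u200B': "ZWS",
    u'\u200E': "LTR",     # left-to-right mark
    u'\u200F': "RTL",     # right-to-left mark
    u'\uFEFF': "ZWNBSP",
}


def apply_representations(ed, table=MY_REPRESENTATIONS, show=True):
    """Set (show=True) or clear (show=False) every representation in the table."""
    for char in sorted(table):
        if show:
            ed.setRepresentation(char, table[char])
        else:
            ed.clearRepresentation(char)
</code></pre>
<p dir="auto">Inside PythonScript this would presumably be called as <code>apply_representations(editor)</code> after <code>from Npp import editor</code>, and as <code>apply_representations(editor, show=False)</code> to go back to the font’s own glyphs.</p>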
]]></description><link>https://community.notepad-plus-plus.org/post/56583</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/56583</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Fri, 07 Aug 2020 18:32:51 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Fri, 07 Aug 2020 18:00:44 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/7377">@Alan-Kilborn</a> said in <a href="/post/56578">Invisible characters unwanted</a>:</p>
<blockquote>
<p dir="auto">I’ve noticed that after doing an editor.setRepresentation() it shows the new character representation in the currently active tab, but if I switch tabs to one which also has characters that should be shown by this, they aren’t shown.  Switching back to the tab I started in, the representation I set has also disappeared.</p>
</blockquote>
<p dir="auto">Here’s a little script to avoid that problem, I call it <code>SetRepresentationForSpecialCharacters.py</code> :</p>
<pre><code># -*- coding: utf-8 -*-

from Npp import editor, notepad, NOTIFICATION

class SRFSC(object):

    def __init__(self):
        notepad.callback(self.callback_npp_BUFFERACTIVATED, [NOTIFICATION.BUFFERACTIVATED])
        self.callback_npp_BUFFERACTIVATED(None)

    def callback_npp_BUFFERACTIVATED(self, args):
        editor.setRepresentation(u'\u200B', "ZWS")
        editor.setRepresentation(u'\u200C', "ZWNJ")
        editor.setRepresentation(u'\u200D', "ZWJ")
        editor.setRepresentation(u'\u200E', "LTR")  # left-to-right mark
        editor.setRepresentation(u'\uFEFF', "ZWNBSP")
</code></pre>
<p dir="auto">I run it from my <code>startup.py</code> with this segment of code:</p>
<pre><code>import SetRepresentationForSpecialCharacters
SetRepresentationForSpecialCharacters.SRFSC()
</code></pre>
]]></description><link>https://community.notepad-plus-plus.org/post/56581</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/56581</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Fri, 07 Aug 2020 18:00:44 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Fri, 07 Aug 2020 17:45:45 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a></p>
<p dir="auto">I’ve noticed that after doing an <code>editor.setRepresentation()</code> it shows the new character representation in the currently active tab, but if I switch tabs to one which also has characters that should be shown by this, they aren’t shown.  Switching back to the tab I started in, the representation I set has also disappeared.</p>
<p dir="auto">I think I know why this is (well, kinda, :-) ), but I’m surprised it wasn’t mentioned before in this thread.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/56578</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/56578</guid><dc:creator><![CDATA[Alan Kilborn]]></dc:creator><pubDate>Fri, 07 Aug 2020 17:45:45 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Wed, 12 Jul 2017 14:08:32 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a></p>
<p dir="auto">thx,<br />
there are editor.getViewEOL() and editor.getViewWS() functions available<br />
to retrieve current state. But instead of using setView… I would recommend using<br />
notepad.menuCommand(MENUCOMMAND…) to be in sync with notepad++ itself.</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/25547</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25547</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Wed, 12 Jul 2017 14:08:32 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Wed, 12 Jul 2017 13:33:38 GMT]]></title><description><![CDATA[<p dir="auto">Also, if you want one command to do the normal <code>Show All Characters</code> plus showing these four Zero Width characters,</p>
<pre><code>    # script = Show All Characters (including ZeroWidth)
    editor.setRepresentation(u'\u200B', "ZWS")
    editor.setRepresentation(u'\u200C', "ZWNJ")
    editor.setRepresentation(u'\u200D', "ZWJ")
    editor.setRepresentation(u'\uFEFF', "ZWNBSP")
    # if you want to _also_ show all characters with this script,
    #   first pick a different View &gt; Show Symbols option,
    #   then pick this one (each is a toggle, so don't want to accidentally hide all characters if show-all was already selected)
    notepad.menuCommand(MENUCOMMAND.VIEW_EOL)
    notepad.menuCommand(MENUCOMMAND.VIEW_ALL_CHARACTERS)
</code></pre>
<p dir="auto">And similarly to un-set <code>Show All Characters</code> as well as clearing the four Zero Width representations:</p>
<pre><code>    # script = Don't Show All Characters (including ZeroWidth)
    editor.clearRepresentation(u'\u200B')
    editor.clearRepresentation(u'\u200C')
    editor.clearRepresentation(u'\u200D')
    editor.clearRepresentation(u'\uFEFF')
    # if you want to _also_ hide all characters with this script,
    #   first pick a different View &gt; Show Symbols option,
    #   then pick this one twice (each is a toggle, so don't want to accidentally show all characters if show-all was already cleared)
    notepad.menuCommand(MENUCOMMAND.VIEW_EOL)
    notepad.menuCommand(MENUCOMMAND.VIEW_ALL_CHARACTERS)
    notepad.menuCommand(MENUCOMMAND.VIEW_ALL_CHARACTERS)
</code></pre>
<p dir="auto">As explained in my comments, I use the VIEW_EOL to change out of VIEW_ALL_CHARACTERS, no matter what the state of the VIEW_ALL_CHARACTERS toggle is; then I use VIEW_ALL_CHARACTERS once to set it or twice to clear it.  If, instead, you’d like your DontShowAllCharacters to revert to “Show EOL” or “Show Whitespace and Tab”, then instead of VIEW_EOL/VIEW_ALL_CHARACTERS/VIEW_ALL_CHARACTERS sequence of three, you could just use VIEW_EOL (a sequence of one to show EOL) or VIEW_TAB_SPACE.  (Though, to be safe, you might want a two-sequence of VIEW_ALL_CHARACTERS/VIEW_EOL or VIEW_ALL_CHARACTERS/VIEW_TAB_SPACE.  It would be easier if there were a notepad.getMenuCommandState() or similar command that reads back the current state of a toggled menu command.)</p>
]]></description><link>https://community.notepad-plus-plus.org/post/25544</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25544</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Wed, 12 Jul 2017 13:33:38 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Tue, 11 Jul 2017 22:39:03 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@PeterJones</a></p>
<p dir="auto">Peter, thank you very much for your insight.<br />
You could be right, and I am already starting to think you are, about the glyph and the font I use.<br />
I still feel it should be the other way around, as I don’t like having an invisible char in my code<br />
and then wondering why the code doesn’t do what it is supposed to do; but then, on the other hand, it doesn’t make sense to have a zero-width char. Hmmm.<br />
I guess I’m good, as I can use my font or the setRepresentation function to see any “invisible” chars :-)<br />
Your explanation makes sense - absolutely.</p>
<p dir="auto">Thank you very much.<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/25536</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25536</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Tue, 11 Jul 2017 22:39:03 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Tue, 11 Jul 2017 22:12:52 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3662">@Claudia-Frank</a> ,</p>
<p dir="auto">I think what you’re missing is that your font doesn’t have a glyph for the character, so shows the <code>?</code> in a box.</p>
<p dir="auto">My font, DejaVu Sans Mono, has a glyph for that character, which is a zero-width glyph, so you cannot see it (because it’s there, but zero-width).  But I can highlight it (see the little green highlight on the first line, and the “Sel: 1|1” on the status bar).</p>
<p dir="auto"><img alt="" class=" img-fluid img-markdown" /></p>
<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/9205">@Nathan-Harvey</a> , I think Notepad++ and my font are doing the right thing: there is a character (Zero-Width Space), and it is being shown, as zero-width.  It’s not a control-character, so it doesn’t have a default <code>CR</code> <code>LF</code>-style box-glyph from show-all-characters.</p>
<p dir="auto">However, using the PythonScript plugin (that Claudia’s screenshot implied), you can run <code>editor.setRepresentation(u'\u200B', "ZWS")</code> to get it to replace the normal zero-width space with <code>ZWS</code> in a black box (similar to the <code>CR</code> and <code>LF</code> boxes).  To clear that alternate representation, <code>editor.clearRepresentation(u'\u200B')</code>.  (There is similar notation for the NppExec plugin as well, but I do not know how to represent a unicode string in its syntax.)</p>
<p dir="auto"><img alt="" class=" img-fluid img-markdown" /></p>
<p dir="auto">By saving that to a script, and using the PythonScript Configuration menu to add that script to the <code>Plugins &gt; PythonScript</code> menu, you can actually then assign a keyboard shortcut using <code>Settings &gt; Shortcut Mapper &gt; Plugin Commands</code>.  If you make two scripts</p>
<pre><code># script = Show ZeroWidth Characters (give them a non-zero-width representation)
editor.setRepresentation(u'\u200B', "ZWS")
editor.setRepresentation(u'\u200C', "ZWNJ")
editor.setRepresentation(u'\u200D', "ZWJ")
editor.setRepresentation(u'\uFEFF', "ZWNBSP")

# script = Default ZeroWidth Characters (return them to their zero-width glyph from the selected font)
editor.clearRepresentation(u'\u200B')
editor.clearRepresentation(u'\u200C')
editor.clearRepresentation(u'\u200D')
editor.clearRepresentation(u'\uFEFF')
</code></pre>
<p dir="auto">you can get all the <a href="http://www.fileformat.info/info/unicode/char/search.htm?q=zero+width&amp;preview=entity" rel="nofollow ugc">“zero width” unicode characters that I can find</a> to toggle visibility.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/25535</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25535</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Tue, 11 Jul 2017 22:12:52 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Tue, 11 Jul 2017 19:16:59 GMT]]></title><description><![CDATA[<p dir="auto">and it is shown</p>
<p dir="auto"><img src="https://camo.nodebb.org/7279af299116c19799593db4a75b05d08d3e1db4?url=http%3A%2F%2Fi.imgur.com%2FPj0bv6q.png" alt="" class=" img-fluid img-markdown" /></p>
<p dir="auto">What am I missing here?</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/25534</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25534</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Tue, 11 Jul 2017 19:16:59 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Tue, 11 Jul 2017 18:59:44 GMT]]></title><description><![CDATA[<p dir="auto"><a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/9205">@Nathan-Harvey</a></p>
<p dir="auto">I feel the same: as long as the underlying font is able to represent it, it should be displayed,<br />
even when not using show all symbols.</p>
<p dir="auto">Cheers<br />
Claudia</p>
]]></description><link>https://community.notepad-plus-plus.org/post/25533</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25533</guid><dc:creator><![CDATA[Claudia Frank]]></dc:creator><pubDate>Tue, 11 Jul 2017 18:59:44 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Tue, 11 Jul 2017 18:30:51 GMT]]></title><description><![CDATA[<p dir="auto">But shouldn’t we be able to see all the characters (including <a href="https://graphemica.com/200B" rel="nofollow ugc">Zero-width space</a>) if we use the menu option View↘Show Symbol↘Show All Characters ?</p>
<p dir="auto">This seems like a menu function that doesn’t work as described (ALL characters should include weird control characters and “noop” characters like zero-width space).</p>
]]></description><link>https://community.notepad-plus-plus.org/post/25532</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25532</guid><dc:creator><![CDATA[Nathan Harvey]]></dc:creator><pubDate>Tue, 11 Jul 2017 18:30:51 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Sat, 01 Jul 2017 11:20:04 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/9066">@benoît-lechat</a>, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/2829">@adrianHHH</a>, <a class="plugin-mentions-user plugin-mentions-a" href="https://community.notepad-plus-plus.org/uid/3841">@peterjones</a>, and <strong>All</strong>,</p>
<p dir="auto">For <strong><code>UTF-8</code></strong> files, I very often use this simple and <strong>useful</strong> tool :</p>
<p dir="auto"><a href="http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi" rel="nofollow ugc">http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi</a>?</p>
<hr />
<p dir="auto">For instance, given the line of <strong>Benoît</strong>’s example, from the link :</p>
<p dir="auto"><a href="https://drive.google.com/file/d/0B-tMAt7OX-3OSFNjTFJVNkk3VEU/view" rel="nofollow ugc">https://drive.google.com/file/d/0B-tMAt7OX-3OSFNjTFJVNkk3VEU/view</a></p>
<ul>
<li>
<p dir="auto"><strong>Select</strong> that whole line and <strong>paste</strong> it into a <strong>new</strong> N++ tab</p>
</li>
<li>
<p dir="auto">Now, place the <strong>cursor</strong>, right <strong>before</strong> the <strong>opening double-quote</strong> of the string <strong>“FE-000-2”</strong></p>
</li>
<li>
<p dir="auto">Hit the <strong><code>Right Arrow</code></strong>, to be at the location, right <strong>after</strong> the <strong><code>"</code></strong> character</p>
</li>
<li>
<p dir="auto">Then hit the <strong><code>Shift + Right Arrow</code></strong> shortcut to <strong>select</strong> this <strong>unknown</strong> character</p>
</li>
<li>
<p dir="auto"><strong>Copy</strong> it, with the <strong><code>Ctrl + C</code></strong> shortcut</p>
</li>
<li>
<p dir="auto">Open the following <strong>UTF-8</strong> tool : <a href="http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi" rel="nofollow ugc">http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi</a></p>
</li>
<li>
<p dir="auto"><strong>Paste</strong> it, with the <strong><code>Ctrl + V</code></strong> shortcut, in the box, at the <strong>top</strong> of the page</p>
</li>
<li>
<p dir="auto">Choose the <strong>Interpret as <code>character</code></strong> option</p>
</li>
<li>
<p dir="auto">Finally, click on the <strong><code>Go</code></strong> button</p>
</li>
</ul>
<p dir="auto">=&gt;  You’ll get the <strong>Zero Width space</strong> character :-))</p>
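<p dir="auto">For anyone who prefers a script to the web tool, the same identification can be sketched in Python. This is only a sketch: the sample line is retyped from the example discussed in this thread, and <code>find_format_chars</code> is a hypothetical helper name, not part of Notepad++ or any library.</p>

```python
import unicodedata

# Sample line retyped from the thread's example (an assumption, not the real file).
line = '<div class="modal fade product_view" id="\u200bFE-000-2">'

def find_format_chars(text):
    """Return (index, 'U+XXXX', name) for each Unicode format (Cf) character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]

found = find_format_chars(line)
print(found)  # [(41, 'U+200B', 'ZERO WIDTH SPACE')]

# One possible "search and replace": drop every zero-width space.
cleaned = line.replace("\u200b", "")
```

<p dir="auto">Filtering on the <code>Cf</code> category also catches other invisible format characters, which may or may not be what you want in a given file.</p>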
<hr />
<p dir="auto">Alternatively, you may :</p>
<ul>
<li>
<p dir="auto">Enter the text <strong><code>E2 80 8B</code></strong>, with a <strong>space</strong> character between <strong>bytes</strong></p>
</li>
<li>
<p dir="auto">Choose the <strong>Interpret as <code>Hex UTF-8 Bytes</code></strong> option</p>
</li>
<li>
<p dir="auto">Click on the <strong><code>Go</code></strong> button</p>
</li>
</ul>
<p dir="auto">You could also :</p>
<ul>
<li>
<p dir="auto">Enter the text <strong><code>200B</code></strong>, <strong>without</strong> any <strong>space</strong> character ( <strong>Hexadecimal Unicode code-point</strong> number )</p>
</li>
<li>
<p dir="auto">Choose the <strong>Interpret as <code>Hex code-point</code></strong> option</p>
</li>
<li>
<p dir="auto">Click on the <strong><code>Go</code></strong> button</p>
</li>
</ul>
<p dir="auto">Finally, you can :</p>
<ul>
<li>
<p dir="auto">Enter the text <strong><code>8203</code></strong> ( <strong>Decimal Unicode code-point</strong> number )</p>
</li>
<li>
<p dir="auto">Choose the <strong>Interpret as <code>Decimal code-point</code></strong> option</p>
</li>
<li>
<p dir="auto">Click on the <strong><code>Go</code></strong> button</p>
</li>
</ul>
<p dir="auto">Each time, you’ll get, again,  the <strong>Zero Width space</strong> character :-))</p>
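<p dir="auto">The three inputs above (hex UTF-8 bytes, hex code-point, decimal code-point) can also be cross-checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are mine.</p>

```python
import unicodedata

from_bytes = bytes.fromhex("E2 80 8B").decode("utf-8")  # hex UTF-8 bytes
from_hex = chr(int("200B", 16))                         # hex code-point
from_decimal = chr(8203)                                # decimal code-point

# All three spellings name the same single character.
print(from_bytes == from_hex == from_decimal)  # True
print(unicodedata.name(from_hex))              # ZERO WIDTH SPACE
```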
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
<p dir="auto"><strong>P.S.</strong> :</p>
<p dir="auto">You can get additional and useful information on the <strong>zero-width space (ZWSP)</strong> character, from the link, below :</p>
<p dir="auto"><a href="https://en.wikipedia.org/wiki/Zero-width_space" rel="nofollow ugc">https://en.wikipedia.org/wiki/Zero-width_space</a></p>
]]></description><link>https://community.notepad-plus-plus.org/post/25301</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25301</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Sat, 01 Jul 2017 11:20:04 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Fri, 30 Jun 2017 13:03:15 GMT]]></title><description><![CDATA[<p dir="auto">Oops.  Actually, we were both wrong.  It <em>was</em> <code>E2 80 8B</code>, but that decodes into…</p>
<pre><code>1110_xxxx 10yy_yyyy 10zz_zzzz
E2        80        8B
1110 0010 1000 0000 1000 1011
     0010   00 0000   00 1011
0010_00_0000_00_1011
0010_0000_0000_1011
U+200B
</code></pre>
<p dir="auto"><code>U+200B</code> is the <a href="http://www.fileformat.info/info/unicode/char/200b/index.htm" rel="nofollow ugc">Zero Width Space</a>.  And suddenly the lack of a glyph (or, rather, not seeing anything) in Notepad++ makes perfect sense!  Of course you won’t see a zero-width space. ☺</p>
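<p dir="auto">The bit-stripping above can be sketched in Python. <code>decode_three_byte</code> is a hypothetical helper written for this post; it only handles the three-byte pattern and deliberately skips the overlong and surrogate checks that a real decoder performs.</p>

```python
def decode_three_byte(b0, b1, b2):
    """Decode one three-byte UTF-8 pattern by stripping the prefix bits."""
    # Lead byte must look like 1110xxxx, continuation bytes like 10yyyyyy.
    if (b0 & 0xF0) != 0xE0 or (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("not a three-byte UTF-8 pattern")
    # Keep 4 + 6 + 6 payload bits and join them into one number.
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)

code_point = decode_three_byte(0xE2, 0x80, 0x8B)
print(f"U+{code_point:04X}")  # U+200B

# Cross-check against Python's built-in decoder.
print(chr(code_point) == bytes([0xE2, 0x80, 0x8B]).decode("utf-8"))  # True
```

<p dir="auto">Fed the mistranscribed <code>E0 82 8B</code>, the same helper returns <code>0x8B</code>, which matches the earlier (wrong) result.</p>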
<p dir="auto">Thanks.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/25289</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25289</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Fri, 30 Jun 2017 13:03:15 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Fri, 30 Jun 2017 10:45:44 GMT]]></title><description><![CDATA[<p dir="auto">Peter, great explanation of how to convert, but I think you transcribed the hex values incorrectly. You converted <code>E0 82 8B</code>, but I believe the values in the file are <code>E2 80 8B</code>. When I convert these using your method I get <a href="http://www.fileformat.info/info/unicode/char/400b/index.htm" rel="nofollow ugc">U+400B</a> and it is described as “Unicode Han Character ‘(same as U+9E7D 鹽) salt’ (U+400B)”.</p>
]]></description><link>https://community.notepad-plus-plus.org/post/25287</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25287</guid><dc:creator><![CDATA[AdrianHHH]]></dc:creator><pubDate>Fri, 30 Jun 2017 10:45:44 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Thu, 29 Jun 2017 14:17:58 GMT]]></title><description><![CDATA[<p dir="auto">I originally assumed it was Unicode encoded in <a href="https://en.wikipedia.org/wiki/UTF-8" rel="nofollow ugc">UTF-8</a>.  In UTF-8, bytes starting with a most-significant-bit of 1 (0x80 - 0xFF) are part of an encoded Unicode codepoint beyond U+007F, and when the first nibble in the encoded codepoint (the first hex character) is an <code>E</code>, it indicates that the codepoint is spread across three bytes:</p>
<pre><code>1110_xxxx   10yy_yyyy   10zz_zzzz       the three bytes of the codepoint, showing the "prefixes" and the payload bits x, y, z
E0          82          8B              the three values from your document
1110_0000   1000_0010   1000_1011       ... converted to binary
     0000     00 0010     00 1011       get rid of the prefixes
0000_00_0010_00_1011                    remove the spaces
0000_0000_1000_1011                     regroup to nibbles
U+008B                                  convert to hex unicode
</code></pre>
<p dir="auto"><a href="http://www.fileformat.info/info/unicode/char/008b/index.htm" rel="nofollow ugc">U+008B</a> is a control character.</p>
<p dir="auto">Oddly, <code>U+008B</code> should have been represented with only two bytes, not three: <code>C2 8B</code>, so I’m not sure why your browser gave you those three bytes when you copied, unless the browser wasn’t really presenting UTF-8.  In Windows-1252 (Latin-1), those bytes would be <code>à‚‹</code>.  In CP850/OEM850 (“Multilingual”, which is also called Latin I in <a href="https://ss64.com/nt/chcp.html" rel="nofollow ugc">some documents</a>), it is <code>Óéï</code>.  In DOS <a href="https://en.wikipedia.org/wiki/Code_page_437" rel="nofollow ugc">CodePage 437</a> (which I cannot find in the Notepad++ Encoding list, but it’s the one that had the old box-drawing characters), it would have been <code>αéï</code>.  None of those strings looks likely; maybe the browser misinterpreted or misencoded something when you copied from the browser, or over the years, whatever encoding was originally there had been corrupted into those three bytes.</p>
<p dir="auto">I tried an experiment: I took your example file and appended the hex bytes <code>20 20 C2 8B 20 20</code> (two spaces, the proper UTF-8 encoding of U+008B, and two more spaces).  Notepad++ doesn’t display the <code>E0 82 8B</code> as anything, but does display the <code>C2 8B</code> as <code>[PLD]</code>.  So it looks like Notepad++ just doesn’t display anything for an invalid UTF-8 sequence, but does display something for UTF-8 control codes.</p>
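<p dir="auto">As a rough cross-check of the reasoning above, here is a small Python sketch. It assumes Python’s strict UTF-8 decoder and its <code>cp1252</code>, <code>cp850</code> and <code>cp437</code> codecs as stand-ins for the encodings mentioned; it says nothing about how Notepad++ itself treats these bytes.</p>

```python
overlong = bytes.fromhex("E0828B")  # the three bytes as transcribed in this post
proper = bytes.fromhex("C28B")      # the two-byte UTF-8 form of U+008B

# A strict decoder rejects the three-byte (overlong) spelling of U+008B.
try:
    overlong.decode("utf-8")
    overlong_is_valid = True
except UnicodeDecodeError:
    overlong_is_valid = False

print(overlong_is_valid)                       # False
print(f"U+{ord(proper.decode('utf-8')):04X}")  # U+008B

# The same three bytes read through the legacy code pages discussed above.
legacy = {codec: overlong.decode(codec) for codec in ("cp1252", "cp850", "cp437")}
for codec, text in legacy.items():
    print(codec, [f"U+{ord(ch):04X}" for ch in text])
```

<p dir="auto">The code points are printed instead of the glyphs so the output does not depend on the console’s own encoding.</p>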
]]></description><link>https://community.notepad-plus-plus.org/post/25268</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25268</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Thu, 29 Jun 2017 14:17:58 GMT</pubDate></item><item><title><![CDATA[Reply to Invisible characters unwanted on Thu, 29 Jun 2017 08:11:32 GMT]]></title><description><![CDATA[<p dir="auto">A hex dump of the downloaded file is shown below. There appear to be the three bytes <code>E2 80 8B</code> between the double quote and the ‘F’. I do not know what they represent or why Notepad++ does not show anything for them.</p>
<pre><code> 000000   3C646976 20636C61   |&lt;div cla|
 000008   73733D22 6D6F6461   |ss="moda|
 000010   6C206661 64652070   |l fade p|
 000018   726F6475 63745F76   |roduct_v|
 000020   69657722 2069643D   |iew" id=|
 000028   22E2808B 46452D30   |"...FE-0|
 000030   30302D32 223E       |00-2"&gt;  |
</code></pre>
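<p dir="auto">A dump in roughly this layout can be reproduced with a short Python sketch. The byte string is retyped from the dump above (so it is an assumption that it matches the real file exactly), and <code>hex_dump</code> is a hypothetical helper, not the tool I actually used.</p>

```python
# Bytes retyped from the dump above; assumed to be the whole line.
data = b'<div class="modal fade product_view" id="\xe2\x80\x8bFE-000-2">'

def hex_dump(buf, width=8):
    """Return lines of offset, hex bytes, and printable ASCII ('.' otherwise)."""
    pad = 2 * width
    lines = []
    for offset in range(0, len(buf), width):
        chunk = buf[offset:offset + width]
        hex_part = chunk.hex().upper()
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:06X}   {hex_part:<{pad}}   |{text}|")
    return lines

for row in hex_dump(data):
    print(row)

# Offset of the first byte outside plain ASCII: the E2 right after the quote.
first_non_ascii = next(i for i, b in enumerate(data) if b >= 0x80)
print(hex(first_non_ascii))  # 0x29
```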
]]></description><link>https://community.notepad-plus-plus.org/post/25262</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/25262</guid><dc:creator><![CDATA[AdrianHHH]]></dc:creator><pubDate>Thu, 29 Jun 2017 08:11:32 GMT</pubDate></item></channel></rss>