<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Visualization for zero-width characters]]></title><description><![CDATA[<p dir="auto">Hello Community,<br />
I just ran into this post:<br />
<a href="https://medium.com/@umpox/be-careful-what-you-copy-invisibly-inserting-usernames-into-text-with-zero-width-characters-18b4e6f17b66" rel="nofollow ugc">https://medium.com/@umpox/be-careful-what-you-copy-invisibly-inserting-usernames-into-text-with-zero-width-characters-18b4e6f17b66</a></p>
<p dir="auto">In summary, you can have zero-width (invisible) characters that most applications don’t register.<br />
I’ve tested with Notepad++, and it seems to register that there are characters when I navigate the text (I need an extra press of the arrow key when I’m at a location with a zero-width char).<br />
Even with the “Show All Characters” option enabled, I don’t see any characters there.</p>
<p dir="auto">Please consider adding visualization for zero-width characters; it would be very helpful for security-conscious people.</p>
<p dir="auto">PS. Thanks to everyone who is involved in the existence of this great software.</p>
<p dir="auto">All the best,<br />
Petyo</p>
]]></description><link>https://community.notepad-plus-plus.org/topic/15649/visualization-for-zero-width-characters</link><generator>RSS for Node</generator><lastBuildDate>Thu, 11 Jun 2026 16:51:38 GMT</lastBuildDate><atom:link href="https://community.notepad-plus-plus.org/topic/15649.rss" rel="self" type="application/rss+xml"/><pubDate>Thu, 19 Apr 2018 08:50:48 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Reply to Visualization for zero-width characters on Mon, 01 Aug 2022 11:01:29 GMT]]></title><description><![CDATA[<p dir="auto">Hello, <a class="plugin-mentions-user plugin-mentions-a" href="/user/petyo-vodenicharov" aria-label="Profile: petyo-vodenicharov">@<bdi>petyo-vodenicharov</bdi></a>, <a class="plugin-mentions-user plugin-mentions-a" href="/user/peterjones" aria-label="Profile: peterjones">@<bdi>peterjones</bdi></a> and <strong>All</strong>,</p>
<p dir="auto">In addition to <strong>Peter</strong>’s <strong>two</strong> valuable posts, above, here is my contribution on these <strong>strange</strong> characters ;-))</p>
<p dir="auto">Below is a list of <strong>all</strong> the <strong>Unicode</strong> characters with the <strong>General Category</strong> property = <strong><code>Cf</code></strong> ( <strong>Format Character</strong> ) which <strong>both</strong> have a code value &lt; <strong><code>FFFF</code></strong> and do <strong>NOT</strong> strictly depend on a <strong>specific</strong> language !</p>
<pre><code class="language-diff">        •--------•--------•-------------------------------------------•------•---------•
        |  Code  |  Abbr. |              Complete Name                | Cat. |  &gt;Car&lt;  |
        •--------•--------•-------------------------------------------•------•---------•
        |  00AD  |  SHY   |  SOFT HYPHEN                              |  Cf  |    &gt;­&lt;  |
        •--------•--------•-------------------------------------------•------•---------•
        |  200B  |  ZWSP  |  ZERO WIDTH SPACE                         |  Cf  |    &gt;​&lt;   |
        |  200C  |  ZWNJ  |  ZERO WIDTH NON-JOINER                    |  Cf  |    &gt;‌&lt;   |
        |  200D  |  ZWJ   |  ZERO WIDTH JOINER                        |  Cf  |    &gt;‍&lt;   |
        |  200E  |  LRM   |  LEFT-TO-RIGHT MARK                       |  Cf  |    &gt;‎&lt;   |
        |  200F  |  RLM   |  RIGHT-TO-LEFT MARK                       |  Cf  |    &gt;‏&lt;   |
        •--------•--------•-------------------------------------------•------•---------•
        |  202A  |  LRE   |  LEFT-TO-RIGHT EMBEDDING                  |  Cf  |    &gt;‪&lt;   |
        |  202B  |  RLE   |  RIGHT-TO-LEFT EMBEDDING                  |  Cf  |    &gt;‫&lt;   |
        |  202C  |  PDF   |  POP DIRECTIONAL FORMATTING               |  Cf  |    &gt;‬&lt;   |
        |  202D  |  LRO   |  LEFT-TO-RIGHT OVERRIDE                   |  Cf  |    &gt;‭&lt;   |
        |  202E  |  RLO   |  RIGHT-TO-LEFT OVERRIDE                   |  Cf  |    &gt;‮&lt;   |
        •--------•--------•-------------------------------------------•------•---------•
        |  2060  |  WJ    |  WORD JOINER                              |  Cf  |    &gt;⁠&lt;  |
        |  2061  |  ƒ()   |  FUNCTION APPLICATION                     |  Cf  |    &gt;⁡&lt;  |
        |  2062  |  ×     |  INVISIBLE TIMES                          |  Cf  |    &gt;⁢&lt;  |
        |  2063  |  ,     |  INVISIBLE SEPARATOR                      |  Cf  |    &gt;⁣&lt;  |
        |  2064  |  +     |  INVISIBLE PLUS                           |  Cf  |    &gt;⁤&lt;  |
        |  2066  |  LRI   |  LEFT-TO-RIGHT ISOLATE                    |  Cf  |    &gt;⁦&lt;  |
        |  2067  |  RLI   |  RIGHT-TO-LEFT ISOLATE                    |  Cf  |    &gt;⁧&lt;  |
        |  2068  |  FSI   |  FIRST STRONG ISOLATE                     |  Cf  |    &gt;⁨&lt;  |
        |  2069  |  PDI   |  POP DIRECTIONAL ISOLATE                  |  Cf  |    &gt;⁩&lt;  |
        |  206A  |  ISS   |  INHIBIT SYMMETRIC SWAPPING               |  Cf  |    &gt;⁪&lt;   |
        |  206B  |  ASS   |  ACTIVATE SYMMETRIC SWAPPING              |  Cf  |    &gt;⁫&lt;   |
        |  206C  |  IAFS  |  INHIBIT ARABIC FORM SHAPING              |  Cf  |    &gt;⁬&lt;   |
        |  206D  |  AAFS  |  ACTIVATE ARABIC FORM SHAPING             |  Cf  |    &gt;⁭&lt;   |
        |  206E  |  NADS  |  NATIONAL DIGIT SHAPES                    |  Cf  |    &gt;⁮&lt;   |
        |  206F  |  NODS  |  NOMINAL DIGIT SHAPES                     |  Cf  |    &gt;⁯&lt;   |
        •--------•--------•-------------------------------------------•------•---------•
        |  FEFF  | ZWNBSP |  ZERO WIDTH NO-BREAK SPACE                |  Cf  |    &gt;﻿&lt;   |
        •--------•--------•-------------------------------------------•------•---------•
        |  FFF9  |  IAA   |  INTERLINEAR ANNOTATION ANCHOR            |  Cf  |    &gt;￹&lt;   |
        |  FFFA  |  IAS   |  INTERLINEAR ANNOTATION SEPARATOR         |  Cf  |    &gt;￺&lt;   |
        |  FFFB  |  IAT   |  INTERLINEAR ANNOTATION TERMINATOR        |  Cf  |    &gt;￻&lt;   |
        •--------•--------•-------------------------------------------•------•---------•
</code></pre>
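<p dir="auto">As a cross-check of the table above, here is a minimal <strong>Python</strong> sketch, not tied to <strong>N++</strong>, which lists every character below <strong><code>10000</code></strong> that the interpreter's bundled Unicode database reports as <strong><code>Cf</code></strong>. The function name <code>bmp_format_characters</code> is just my own choice. Note that the output also contains the <strong>language-specific</strong> format characters which the table leaves out, and that it may vary slightly with the Unicode version of your Python :</p>
<pre><code class="language-python">import unicodedata


def bmp_format_characters():
    """Return (code point, name) pairs for each BMP character whose
    General Category is Cf in this Python's Unicode database."""
    found = []
    for cp in range(0x10000):
        ch = chr(cp)
        if unicodedata.category(ch) == "Cf":
            found.append((cp, unicodedata.name(ch, "(unnamed)")))
    return found


# One line per format character: hexadecimal code, then official name.
for cp, name in bmp_format_characters():
    print(f"{cp:04X}  {name}")
</code></pre>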
<p dir="auto">Now, depending on the <strong>current</strong> font used in <strong>N++</strong>, the <strong>glyph</strong> of these characters may :</p>
<ul>
<li>
<p dir="auto">Be <strong>invisible</strong> ( A true <strong><code>Zero Width</code></strong> character )</p>
</li>
<li>
<p dir="auto">Display a <strong>square</strong> or a <strong>thin rectangular</strong> box ( Character <strong>not</strong> handled by the <strong>current</strong> font )</p>
</li>
<li>
<p dir="auto">Display a <strong>specific</strong> character ( the case of the <strong>Soft Hyphen</strong> )</p>
</li>
</ul>
<hr />
<p dir="auto">Of course, with the simple <strong>regular</strong> expression <strong><code>\x{####}</code></strong>, you can match the character of <strong>Unicode</strong> value = <strong><code>####</code></strong>. But it would be <strong>better</strong> to find a regex that matches <strong>any</strong> of these <strong>format</strong> characters !</p>
<p dir="auto">I noticed that the <strong>POSIX</strong> character class <strong><code>&lsqb;&lsqb;:cntrl:&rsqb;&rsqb;</code></strong> matches <strong>most</strong> of these characters :</p>
<ul>
<li>
<p dir="auto">The <strong><code>4</code></strong> characters, from <strong><code>\x{200C}</code></strong> to <strong><code>\x{200F}</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>5</code></strong> characters, from <strong><code>\x{202A}</code></strong> to <strong><code>\x{202E}</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>6</code></strong> characters, from <strong><code>\x{206A}</code></strong> to <strong><code>\x{206F}</code></strong></p>
</li>
<li>
<p dir="auto">The character <strong><code>\x{FEFF}</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>3</code></strong> characters, from <strong><code>\x{FFF9}</code></strong> to <strong><code>\x{FFFB}</code></strong></p>
</li>
</ul>
<hr />
<p dir="auto">Unfortunately, the <strong><code>&lsqb;&lsqb;:cntrl:&rsqb;&rsqb;</code></strong> regex also matches the <strong>Control</strong> characters :</p>
<ul>
<li>
<p dir="auto">The <strong><code>32</code></strong> <strong>C0</strong> characters, from <strong><code>\x{0000}</code></strong> to <strong><code>\x{001F}</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>32</code></strong> <strong>C1</strong> characters, from <strong><code>\x{0080}</code></strong> to <strong><code>\x{009F}</code></strong></p>
</li>
</ul>
<p dir="auto">Moreover, the <strong><code>&lsqb;&lsqb;:cntrl:&rsqb;&rsqb;</code></strong> regex <strong>misses</strong> some characters :</p>
<ul>
<li>
<p dir="auto">The <strong>Soft Hyphen</strong> <strong><code>\x{00AD}</code></strong></p>
</li>
<li>
<p dir="auto">The <strong>Zero Width Space</strong> <strong><code>\x{200B}</code></strong></p>
</li>
<li>
<p dir="auto">The <strong><code>9</code></strong> assigned characters, from <strong><code>\x{2060}</code></strong> to <strong><code>\x{2069}</code></strong> ( <strong><code>\x{2065}</code></strong> is <strong>unassigned</strong> )</p>
</li>
</ul>
<hr />
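<p dir="auto">So, a correct regex to match <strong>all</strong> the <strong>format</strong> characters above, in a <strong>Unicode</strong>-encoded file, could be the one below. The look-ahead <strong><code>(?=&lsqb;&lsqb;:unicode:&rsqb;&rsqb;)</code></strong> only accepts characters with a code above <strong><code>\x{00FF}</code></strong>, which rules out the <strong>C0</strong> and <strong>C1</strong> control characters, and the <strong>Soft Hyphen</strong> is added as a separate alternative :</p>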
<ul>
<li><strong><code>(?=&lsqb;&lsqb;:unicode:&rsqb;&rsqb;)&lsqb;&lsqb;:cntrl:]\x{200B}\x{2060}-\x{2069}]|\xAD</code></strong></li>
</ul>
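<p dir="auto">Outside <strong>N++</strong>, the same set can be matched with a plain character class. The sketch below is a <strong>Python</strong> approximation, not the <strong>N++</strong> regex itself : Python's <code>re</code> module has no <code>&lsqb;&lsqb;:cntrl:&rsqb;&rsqb;</code> or <code>&lsqb;&lsqb;:unicode:&rsqb;&rsqb;</code> class, so the ranges of the table are simply spelled out. The names <code>FORMAT_CHARS</code> and <code>find_format_chars</code> are my own :</p>
<pre><code class="language-python">import re

# Explicit ranges taken from the table of format characters.
# This replaces the [[:cntrl:]] / [[:unicode:]] classes, which
# Python's re module does not provide.
FORMAT_CHARS = re.compile(
    r"[\u00AD\u200B-\u200F\u202A-\u202E\u2060-\u2064"
    r"\u2066-\u206F\uFEFF\uFFF9-\uFFFB]"
)


def find_format_chars(text):
    """Return a list of (offset, 'U+XXXX') for each format character."""
    return [
        (m.start(), f"U+{ord(m.group()):04X}")
        for m in FORMAT_CHARS.finditer(text)
    ]


sample = "zero\u200Bwidth soft\u00ADhyphen"
print(find_format_chars(sample))  # [(4, 'U+200B'), (15, 'U+00AD')]
</code></pre>
<p dir="auto">Ordinary control characters, such as <strong>Tab</strong> or <strong>Line Feed</strong>, are <strong>not</strong> reported, as they are outside the listed ranges.</p>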
<p dir="auto">Now, <strong>how</strong> to visualize a <strong>zero-width</strong> character ? If you just hit the <strong>Find Next</strong> button, you see that a <strong>specific</strong> line is reached but you do <strong>not</strong> know the <strong>exact</strong> location of this/these <strong>zero-width</strong> char(s) :-((</p>
<p dir="auto"><strong>Two</strong> solutions are possible :</p>
<ul>
<li><strong><code>(?-s).((?=&lsqb;&lsqb;:unicode:&rsqb;&rsqb;)&lsqb;&lsqb;:cntrl:]\x{200B}\x{2060}-\x{2069}]|\xAD)+.</code></strong></li>
</ul>
<p dir="auto">Which matches <strong>two standard</strong> characters separated by <strong>one</strong> or <strong>several</strong> consecutive <strong>format</strong> character(s)</p>
<ul>
<li><strong><code>((?=&lsqb;&lsqb;:unicode:&rsqb;&rsqb;)&lsqb;&lsqb;:cntrl:]\x{200B}\x{2060}-\x{2069}]|\xAD)+</code></strong></li>
</ul>
<p dir="auto">Which <strong>marks</strong> all these <strong>format</strong> chars when you click on the <strong>Mark All</strong> button ( the <strong>best</strong> solution, to my mind ! )</p>
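<p dir="auto">If the text has to be checked <strong>outside</strong> the editor, a comparable effect to <strong>Mark All</strong> can be obtained by replacing each <strong>format</strong> character with a <strong>visible</strong> tag. This is only a small <strong>Python</strong> sketch, assuming the explicit ranges of the table; the function name <code>reveal</code> and the <code>[U+XXXX]</code> notation are my own choices :</p>
<pre><code class="language-python">import re

# Same explicit ranges as in the table of format characters.
FORMAT_CHARS = re.compile(
    r"[\u00AD\u200B-\u200F\u202A-\u202E\u2060-\u2064"
    r"\u2066-\u206F\uFEFF\uFFF9-\uFFFB]"
)


def reveal(text):
    """Replace each format character with a visible [U+XXXX] tag."""
    return FORMAT_CHARS.sub(
        lambda m: f"[U+{ord(m.group()):04X}]", text
    )


print(reveal("tel\u200B\u200Bl me"))  # tel[U+200B][U+200B]l me
</code></pre>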
<hr />
<p dir="auto">So, trying the simple <strong>regex</strong> <strong><code>\x{200B}</code></strong> against the sentence below, with the <strong>Mark</strong> option, will convince you that this sentence <strong>does</strong> contain some <strong>Zero Width Space</strong> characters !</p>
<pre><code class="language-diff">F​or exam​ple, I’ve ins​erted 10 ze​ro-width spa​ces in​to thi​s sentence, c​an you tel​​l me where ?
</code></pre>
<p dir="auto">Note that, between the <strong>two</strong> letters <strong><code>l</code></strong> of the verb <strong>tell</strong>, there are <strong>two</strong> consecutive chars <strong><code>\x{200B}</code></strong> !</p>
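<p dir="auto">The count can also be verified with a few lines of <strong>Python</strong>. In this sketch, the sentence is rebuilt with explicit <code>\u200B</code> escapes, so that the <strong>hidden</strong> characters stay visible in the source; the variable names are, of course, arbitrary :</p>
<pre><code class="language-python">ZWSP = "\u200B"  # ZERO WIDTH SPACE

# The sentence above, with each zero-width space written as an escape.
sentence = (
    "F\u200Bor exam\u200Bple, I\u2019ve ins\u200Berted 10 ze\u200Bro-width "
    "spa\u200Bces in\u200Bto thi\u200Bs sentence, c\u200Ban you "
    "tel\u200B\u200Bl me where ?"
)

# Offsets of every zero-width space, then the cleaned-up sentence.
offsets = [i for i, ch in enumerate(sentence) if ch == ZWSP]
cleaned = sentence.replace(ZWSP, "")

print(len(offsets))  # 10
print(cleaned)
</code></pre>
<p dir="auto">The <strong>last two</strong> offsets are consecutive, which corresponds to the <strong>two</strong> chars between the letters <strong><code>l</code></strong> of <strong>tell</strong>.</p>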
<hr />
<p dir="auto">You can find a <strong>description</strong> of these <strong>format</strong> characters at the following links :</p>
<p dir="auto"><a href="http://www.unicode.org/charts/PDF/U2000.pdf" rel="nofollow ugc">http://www.unicode.org/charts/PDF/U2000.pdf</a></p>
<p dir="auto"><a href="http://www.unicode.org/charts/PDF/UFE70.pdf" rel="nofollow ugc">http://www.unicode.org/charts/PDF/UFE70.pdf</a></p>
<p dir="auto"><a href="http://www.unicode.org/charts/PDF/UFFF0.pdf" rel="nofollow ugc">http://www.unicode.org/charts/PDF/UFFF0.pdf</a></p>
<p dir="auto">See, also, this post :</p>
<p dir="auto"><a href="https://notepad-plus-plus.org/community/topic/14812/how-to-search-for-unknown-3-digit-characters-with-black-background/2" rel="nofollow ugc">https://notepad-plus-plus.org/community/topic/14812/how-to-search-for-unknown-3-digit-characters-with-black-background/2</a></p>
<p dir="auto">Best Regards,</p>
<p dir="auto">guy038</p>
<p dir="auto"><strong>P.S.</strong> :</p>
<p dir="auto">Simply copy/paste the <strong>list</strong> and the <strong>sentence</strong> above, in <strong>inverse</strong> video, into a <strong>new</strong> tab and enjoy !</p>
]]></description><link>https://community.notepad-plus-plus.org/post/31761</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/31761</guid><dc:creator><![CDATA[guy038]]></dc:creator><pubDate>Mon, 01 Aug 2022 11:01:29 GMT</pubDate></item><item><title><![CDATA[Reply to Visualization for zero-width characters on Thu, 19 Apr 2018 13:17:25 GMT]]></title><description><![CDATA[<p dir="auto">See my <a href="https://notepad-plus-plus.org/community/topic/14754/entering-zwnj-zero-width-non-joiner" rel="nofollow ugc">previous post</a> on the subject of zero-width characters, and an <a href="https://notepad-plus-plus.org/community/topic/14045/invisible-characters-unwanted/15" rel="nofollow ugc">earlier post</a> where I even shared a pair of PythonScript scripts to give a fuller <code>Show All Characters</code> and <code>Don't Show All Characters</code> functionality</p>
]]></description><link>https://community.notepad-plus-plus.org/post/31759</link><guid isPermaLink="true">https://community.notepad-plus-plus.org/post/31759</guid><dc:creator><![CDATA[PeterJones]]></dc:creator><pubDate>Thu, 19 Apr 2018 13:17:25 GMT</pubDate></item></channel></rss>