Community
    • Coises

      Columns++ version 1.3: All Unicode, all the time

      Notepad++ & Plugin Development
      5 Votes
      18 Posts
      1k Views
      guy038

      Hi, @coises and All,

      From the list at https://www.unicode.org/Public/UCD/latest/ucd/Blocks.txt, I built a table of all Unicode blocks and checked the number of word characters in each block with each of the following:

      Columns++

      Notepad++

      MultiReplace

      Just download the text file Words_in_Blocks.txt from my Google Drive account:

      https://drive.google.com/file/d/1hFXLBhrKghjoMTvDk46QSk4BjlzOAPKP/view?usp=sharing

      Its columns are, from left to right:

      Column 1: regex used to count the word characters of the block

      Column 2: name of the Unicode block

      Column 3: total number of code points in the block

      Column 4: number of code points assigned in the block, so far

      Column 5: number of word characters found by Columns++

      Column 6: number of word characters found by the N++ search and MultiReplace
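The counting idea behind Column 1 can be sketched outside any editor. The snippet below is only an illustration in Python: its `re` module does not accept the `\x{....}` syntax used above, so the block range is passed as two integers, and the helper name `count_word_chars` is made up for this example. Python has its own definition of `\w` and its own Unicode data version, so its counts need not match either Columns++ or Notepad++.

```python
import re

# Python's notion of a word character; it may differ from Boost's and from Columns++.
WORD = re.compile(r"\w")

def count_word_chars(first, last):
    # Count the code points in [first, last] that match \w on their own.
    return sum(1 for cp in range(first, last + 1) if WORD.fullmatch(chr(cp)))

# Basic Latin: letters, digits and the underscore.
print(count_word_chars(0x0000, 0x007F))
# Samaritan block: the result depends on the Unicode data bundled with Python.
print(count_word_chars(0x0800, 0x083F))
```

Comparing such counts against Columns 5 and 6 only shows where implementations disagree, not which one is right.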

      At this point, we can draw some major conclusions:

      First, for any character outside the BMP, the N++ search and MultiReplace always return 0, whereas Columns++, which is implemented in UTF-32, gives the correct results! So, from now on, I’ll discuss results for the BMP ONLY!
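One possible reason for this difference, offered here as an assumption and not something verified in this post, is that an engine working on 16-bit code units sees every character outside the BMP as two units (a surrogate pair), while a UTF-32 engine sees one unit per code point. A small Python sketch of that difference, with a made-up helper name:

```python
def utf16_code_units(ch):
    # Number of 16-bit code units needed to encode one character in UTF-16.
    return len(ch.encode("utf-16-le")) // 2

for ch in ("A", "\u0800", "\U00010000"):
    print("U+%04X needs %d UTF-16 code unit(s)" % (ord(ch), utf16_code_units(ch)))
```

A character class such as `[\x{10000}-\x{1007F}]` can only be matched as a single item if the engine treats the whole code point as one unit.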

      Secondly, in the table below, I listed all the blocks where the N++ search and MultiReplace return 0 word characters. I added a column showing the Unicode release in which each block was created, so it is easy to see that no block created in Unicode 5.2 or later is known to our Boost regex engine!

      •---------------------------•------------------------------------------------•----------•----------•-----------•------------•---------•
      | Block range               | Block name                                     | Total    | Assigned | Columns++ | N++ / MRep | Unicode |
      |                           |                                                | Code-Pts | Code-Pts | Word Chrs | Word Chrs  | Version |
      •---------------------------•------------------------------------------------•----------•----------•-----------•------------•---------•
      | (?=\w)[\x{0800}-\x{083F}] | Samaritan                                      |       64 |       61 |        25 |          0 |     5.2 |
      | (?=\w)[\x{18B0}-\x{18FF}] | Unified Canadian Aboriginal Syllabics Extended |       80 |       70 |        70 |          0 |     5.2 |
      | (?=\w)[\x{1A20}-\x{1AAF}] | Tai Tham                                       |      144 |      127 |        74 |          0 |     5.2 |
      | (?=\w)[\x{1CD0}-\x{1CFF}] | Vedic Extensions                               |       48 |       43 |        13 |          0 |     5.2 |
      | (?=\w)[\x{A4D0}-\x{A4FF}] | Lisu                                           |       48 |       48 |        46 |          0 |     5.2 |
      | (?=\w)[\x{A6A0}-\x{A6FF}] | Bamum                                          |       96 |       88 |        70 |          0 |     5.2 |
      | (?=\w)[\x{A8E0}-\x{A8FF}] | Devanagari Extended                            |       32 |       32 |         9 |          0 |     5.2 |
      | (?=\w)[\x{A960}-\x{A97F}] | Hangul Jamo Extended-A                         |       32 |       29 |        29 |          0 |     5.2 |
      | (?=\w)[\x{A980}-\x{A9DF}] | Javanese                                       |       96 |       91 |        58 |          0 |     5.2 |
      | (?=\w)[\x{AA60}-\x{AA7F}] | Myanmar Extended-A                             |       32 |       32 |        26 |          0 |     5.2 |
      | (?=\w)[\x{AA80}-\x{AADF}] | Tai Viet                                       |       96 |       72 |        61 |          0 |     5.2 |
      | (?=\w)[\x{ABC0}-\x{ABFF}] | Meetei Mayek                                   |       64 |       56 |        45 |          0 |     5.2 |
      | (?=\w)[\x{D7B0}-\x{D7FF}] | Hangul Jamo Extended-B                         |       80 |       72 |        72 |          0 |     5.2 |
      | (?=\w)[\x{0840}-\x{085F}] | Mandaic                                        |       32 |       29 |        25 |          0 |     6.0 |
      | (?=\w)[\x{1BC0}-\x{1BFF}] | Batak                                          |       64 |       56 |        38 |          0 |     6.0 |
      | (?=\w)[\x{AB00}-\x{AB2F}] | Ethiopic Extended-A                            |       48 |       32 |        32 |          0 |     6.0 |
      | (?=\w)[\x{08A0}-\x{08FF}] | Arabic Extended-A                              |       96 |       96 |        42 |          0 |     6.1 |
      | (?=\w)[\x{AAE0}-\x{AAFF}] | Meetei Mayek Extensions                        |       32 |       23 |        14 |          0 |     6.1 |
      | (?=\w)[\x{A9E0}-\x{A9FF}] | Myanmar Extended-B                             |       32 |       31 |        30 |          0 |     7.0 |
      | (?=\w)[\x{AB30}-\x{AB6F}] | Latin Extended-E                               |       64 |       60 |        57 |          0 |     7.0 |
      | (?=\w)[\x{AB70}-\x{ABBF}] | Cherokee Supplement                            |       80 |       80 |        80 |          0 |     8.0 |
      | (?=\w)[\x{1C80}-\x{1C8F}] | Cyrillic Extended-C                            |       16 |       11 |        11 |          0 |     9.0 |
      | (?=\w)[\x{0860}-\x{086F}] | Syriac Supplement                              |       16 |       11 |        11 |          0 |    10.0 |
      | (?=\w)[\x{1C90}-\x{1CBF}] | Georgian Extended                              |       48 |       46 |        46 |          0 |    11.0 |
      | (?=\w)[\x{0870}-\x{089F}] | Arabic Extended-B                              |       48 |       43 |        31 |          0 |    14.0 |
      •---------------------------•------------------------------------------------•----------•----------•-----------•------------•---------•

      I did a quick test with N++ v8.9.1, which says:

      Update to Boost 1.90.0.

      But the results do not change at all. So, if I understand correctly, the Unicode data used by the Boost regex engine predates Unicode 5.2 and has not been updated since? Very surprising!

      Thirdly, in the table below, I listed all the blocks where the N++ search and MultiReplace return fewer word characters than Columns++:

      •---------------------------•------------------------------------------------•----------•----------•-----------•------------•---------•
      | Block range               | Block name                                     | Total    | Assigned | Columns++ | N++ / MRep | Unicode |
      |                           |                                                | Code-Pts | Code-Pts | Word Chrs | Word Chrs  | Version |
      •---------------------------•------------------------------------------------•----------•----------•-----------•------------•---------•
      | (?=\w)[\x{02B0}-\x{02FF}] | Spacing Modifier Letters                       |       80 |       80 |        37 |         24 |     1.0 |
      | (?=\w)[\x{0370}-\x{03FF}] | Greek and Coptic                               |      144 |      135 |       129 |        127 |     1.0 |
      | (?=\w)[\x{0530}-\x{058F}] | Armenian                                       |       96 |       91 |        80 |         78 |     1.0 |
      | (?=\w)[\x{0590}-\x{05FF}] | Hebrew                                         |      112 |       88 |        31 |         30 |     1.0 |
      | (?=\w)[\x{0600}-\x{06FF}] | Arabic                                         |      256 |      256 |       173 |        172 |     1.0 |
      | (?=\w)[\x{0900}-\x{097F}] | Devanagari                                     |      128 |      128 |        91 |         83 |     1.0 |
      | (?=\w)[\x{0980}-\x{09FF}] | Bengali                                        |      128 |       96 |        65 |         63 |     1.0 |
      | (?=\w)[\x{0A80}-\x{0AFF}] | Gujarati                                       |      128 |       91 |        63 |         62 |     1.0 |
      | (?=\w)[\x{0C00}-\x{0C7F}] | Telugu                                         |      128 |      101 |        68 |         64 |     1.0 |
      | (?=\w)[\x{0C80}-\x{0CFF}] | Kannada                                        |      128 |       92 |        68 |         63 |     1.0 |
      | (?=\w)[\x{0D00}-\x{0D7F}] | Malayalam                                      |      128 |      118 |        77 |         69 |     1.0 |
      | (?=\w)[\x{0D80}-\x{0DFF}] | Sinhala                                        |      128 |       91 |        69 |         59 |     1.0 |
      | (?=\w)[\x{0E80}-\x{0EFF}] | Lao                                            |      128 |       83 |        66 |         50 |     1.0 |
      | (?=\w)[\x{0F00}-\x{0FFF}] | Tibetan                                        |      256 |      211 |        60 |         59 |     1.0 |
      | (?=\w)[\x{10A0}-\x{10FF}] | Georgian                                       |       96 |       88 |        87 |         82 |     1.0 |
      | (?=\w)[\x{2070}-\x{209F}] | Superscripts and Subscripts                    |       48 |       42 |        15 |          7 |     1.0 |
      | (?=\w)[\x{3100}-\x{312F}] | Bopomofo                                       |       48 |       43 |        43 |         41 |     1.0 |
      | (?=\w)[\x{4E00}-\x{9FFF}] | CJK Unified Ideographs                         |    20992 |    20992 |     20992 |      20932 |   1.0.1 |
      | (?=\w)[\x{F900}-\x{FAFF}] | CJK Compatibility Ideographs                   |      512 |      472 |       472 |        467 |   1.0.1 |
      | (?=\w)[\x{16A0}-\x{16FF}] | Runic                                          |       96 |       89 |        83 |         78 |     3.0 |
      | (?=\w)[\x{13A0}-\x{13FF}] | Cherokee                                       |       96 |       92 |        92 |         85 |     3.0 |
      | (?=\w)[\x{1400}-\x{167F}] | Unified Canadian Aboriginal Syllabics          |      640 |      640 |       637 |        628 |     3.0 |
      | (?=\w)[\x{3400}-\x{4DBF}] | CJK Unified Ideographs Extension A             |     6592 |     6592 |      6592 |       6582 |     3.0 |
      | (?=\w)[\x{31A0}-\x{31BF}] | Bopomofo Extended                              |       32 |       32 |        32 |         24 |     3.0 |
      | (?=\w)[\x{1100}-\x{11FF}] | Hangul Jamo                                    |      256 |      256 |       256 |        240 |     3.1 |
      | (?=\w)[\x{1700}-\x{171F}] | Tagalog                                        |       32 |       23 |        19 |         17 |     3.2 |
      | (?=\w)[\x{0500}-\x{052F}] | Cyrillic Supplement                            |       48 |       48 |        48 |         36 |     3.2 |
      | (?=\w)[\x{1900}-\x{194F}] | Limbu                                          |       80 |       68 |        41 |         39 |     4.0 |
      | (?=\w)[\x{2C00}-\x{2C5F}] | Glagolitic                                     |       96 |       96 |        96 |         94 |     4.1 |
      | (?=\w)[\x{2C80}-\x{2CFF}] | Coptic                                         |      128 |      123 |       107 |        101 |     4.1 |
      | (?=\w)[\x{2D00}-\x{2D2F}] | Georgian Supplement                            |       48 |       40 |        40 |         38 |     4.1 |
      | (?=\w)[\x{2E00}-\x{2E7F}] | Supplemental Punctuation                       |      128 |       94 |         1 |          0 |     4.1 |
      | (?=\w)[\x{1980}-\x{19DF}] | New Tai Lue                                    |       96 |       83 |        80 |         59 |     4.1 |
      | (?=\w)[\x{2D30}-\x{2D7F}] | Tifinagh                                       |       80 |       59 |        57 |         55 |     4.1 |
      | (?=\w)[\x{A700}-\x{A71F}] | Modifier Tone Letters                          |       32 |       32 |         9 |          0 |     4.1 |
      | (?=\w)[\x{2C60}-\x{2C7F}] | Latin Extended-C                               |       32 |       32 |        32 |         29 |     5.0 |
      | (?=\w)[\x{1B00}-\x{1B7F}] | Balinese                                       |      128 |      127 |        65 |         64 |     5.0 |
      | (?=\w)[\x{A720}-\x{A7FF}] | Latin Extended-D                               |      224 |      204 |       200 |        109 |     5.0 |
      | (?=\w)[\x{1B80}-\x{1BBF}] | Sundanese                                      |       64 |       64 |        48 |         42 |     5.1 |
      | (?=\w)[\x{A640}-\x{A69F}] | Cyrillic Extended-B                            |       96 |       96 |        78 |         69 |     5.1 |
      •---------------------------•------------------------------------------------•----------•----------•-----------•------------•---------•

      This time, we can see that the Unicode releases listed in this table are all earlier than Unicode 5.2. So far, I haven’t identified the exact problem for these blocks!

      Fourthly, in the table below, I listed all the blocks where the N++ search and MultiReplace return more word characters than Columns++:

      •---------------------------•------------------------------------------------•----------•----------•-----------•------------•---------•
      | Block range               | Block name                                     | Total    | Assigned | Columns++ | N++ / MRep | Unicode |
      |                           |                                                | Code-Pts | Code-Pts | Word Chrs | Word Chrs  | Version |
      •---------------------------•------------------------------------------------•----------•----------•-----------•------------•---------•
      | (?=\w)[\x{0080}-\x{00FF}] | Latin-1 Supplement                             |      128 |      128 |        65 |         68 |     1.0 |
      | (?=\w)[\x{0E00}-\x{0E7F}] | Thai                                           |      128 |       87 |        67 |         83 |     1.0 |
      | (?=\w)[\x{2150}-\x{218F}] | Number Forms                                   |       64 |       60 |         2 |         41 |     1.0 |
      | (?=\w)[\x{3000}-\x{303F}] | CJK Symbols and Punctuation                    |       64 |       64 |         9 |         22 |     1.0 |
      | (?=\w)[\x{1800}-\x{18AF}] | Mongolian                                      |      176 |      158 |       139 |        140 |     3.0 |
      •---------------------------•------------------------------------------------•----------•----------•-----------•------------•---------•

      Again, I don’t clearly understand these differences between the last two columns!
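For anyone who wants to redo this sorting on the full Words_in_Blocks.txt file, here is a minimal Python sketch that parses one table row and compares the last two counts. It assumes each row has the seven pipe-separated cells shown in the tables above; the function names are made up for this example.

```python
def parse_row(row):
    # Split a "| a | b | ... |" table row into its seven cells.
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    regex, name, total, assigned, columns_pp, npp, version = cells
    return {
        "regex": regex,
        "name": name,
        "total": int(total),
        "assigned": int(assigned),
        "columns_pp": int(columns_pp),
        "npp": int(npp),
        "version": version,
    }

def compare_counts(row):
    # Report whether N++ / MultiReplace found fewer, more, or as many word chars as Columns++.
    data = parse_row(row)
    if data["npp"] < data["columns_pp"]:
        return "fewer"
    if data["npp"] > data["columns_pp"]:
        return "more"
    return "equal"

print(compare_counts(r"| (?=\w)[\x{0800}-\x{083F}] | Samaritan | 64 | 61 | 25 | 0 | 5.2 |"))
print(compare_counts(r"| (?=\w)[\x{0E00}-\x{0E7F}] | Thai | 128 | 87 | 67 | 83 | 1.0 |"))
```

Rows where the N++ count is 0 are simply a special case of "fewer" in this sketch; they could be split out with an extra test on `data["npp"] == 0`.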

      Best Regards,

      guy038

    • donho

      Notepad++ v8.9.2 Release Candidate

      Announcements
      1 Votes
      1 Posts
      15 Views
      No one has replied
    • Charles Buege

      Adding a shortcut to a language....

      Help wanted · · · – – – · · ·
      1 Votes
      6 Posts
      4k Views
      PeterJones

      I had started to reply to a recent post here, but that post was deleted while I was working on my answer.

      But there was part of my response that I think might be useful to future readers of this topic, as it’s not as commonly understood:

      When you use notepad.runMenuCommand(...) in the PythonScript plugin, it matches the actual text of the menu names, so it is affected by Settings > Preferences > General > Localization. Using the XML example above: in the default English localization, it would be notepad.runMenuCommand("Language", "XML"). But in Pig Latin (for example), it would have to be notepad.runMenuCommand("Anguagelay", "XML"), and in Greek, it would have to be notepad.runMenuCommand("Γλώσσα", "XML").

      (Aside: there’s actually a quirk, though: if you have switched from, for example, Pig Latin to Greek, then until you restart Notepad++, you can use either the Pig Latin or the Greek menu-name-strings in your runMenuCommand calls. Once you restart Notepad++, you can only use the active Localization for those strings.)

      (I don’t know if Localization was actually the problem in the now-deleted post, but it might have been, and it could easily be a problem for other users as well. It came up recently and has bearing on this discussion, so I thought I would share the main idea here, too: if you are using a Localization other than English, you will have to adapt any runMenuCommand calls in your PythonScript scripts to use the actual names of the menus in your GUI, not the English names.)
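A minimal sketch of one way to keep the localized menu name in a single place, so a script only needs one edit when the Localization changes. The dictionary keys and the helper name are made up for this example; the three menu names are the ones quoted above. The `notepad` object is passed in as a parameter, so the snippet can also be read and run outside Notepad++.

```python
# -*- coding: utf-8 -*-

# Hypothetical lookup table: Localization label -> text of the Language menu.
LANGUAGE_MENU_NAMES = {
    "english": "Language",
    "piglatin": "Anguagelay",
    "greek": "Γλώσσα",
}

def run_language_command(notepad, localization, option):
    # Look up the menu text for the active Localization, then run the menu entry.
    menu_name = LANGUAGE_MENU_NAMES[localization]
    return notepad.runMenuCommand(menu_name, option)

# Inside PythonScript, assuming the Greek Localization is active:
#   from Npp import notepad
#   run_language_command(notepad, "greek", "XML")
```

The script still has to be told which Localization is active; nothing here detects it automatically.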

    • Rob Pinto

      Alternative for Notepad++ on Mac

      Help wanted · · · – – – · · ·
      0 Votes
      37 Posts
      1m Views
      Chris Richardson

      Being someone who found this post when trying to find something for the Mac, I wanted to mention the project I am doing at the moment.

      I decided to write something myself, and it will be on the App Store shortly.

      https://madmunky.github.io/MacPadPlusPlus-public/

      There is still lots of work to do. I implemented the features I use most first, so feel free to make feature requests. I love Notepad++, but nothing did what I wanted when I had to switch from Windows to Mac, so the above was created.

      This will be 99p; the price is there to cover the Apple developer license needed to publish in the App Store.

    • Mr X.

      notepad++ loading takes a long time

      Boycott Notepad++
      0 Votes
      1 Posts
      7 Views
      No one has replied