Community
    • Coises

      Columns++ version 1.3: All Unicode, all the time

      Notepad++ & Plugin Development
      5 Votes
      16 Posts
      995 Views
      Coises

      @guy038 said in Columns++ version 1.3: All Unicode, all the time:

      Note that the \p{Hex_Digit} regex is erroneous! The right one is \p{xdigit}, at least within Columns++.

      What’s going on there is that I followed the structure of Boost::regex character classes:

      Character Classes that are Always Supported

      Character classes that are supported by Unicode Regular Expressions

      which are mainly the POSIX character classes plus Unicode General Categories interpreted as character classes. Also, note that in Boost::regex, character classes and character properties are the same thing. I didn’t make any attempt to change that. I believe this is different both from Unicode regular expressions and from PCRE.

      (I did add a couple of new character classes unique to Columns++: [:defined:] and [:invalid:], plus the aliases \i, \o and \y for [:invalid:], [:ASCII:] and [:defined:] respectively. Also, Columns++ does not support [:Cs:]/[:Surrogate:], since Unicode in Scintilla can only be UTF-8, which cannot contain surrogates. It can contain invalid byte sequences that appear to encode surrogates, as in WTF-8; Scintilla treats these as invalid UTF-8 bytes, and so does Columns++.)
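
      To illustrate the surrogate point: the bytes ED A0 80 are what a WTF-8 style encoder would produce for the lone surrogate U+D800, and a strict UTF-8 decoder rejects them. A minimal Python sketch follows; Python's codec only stands in for the validation done by Scintilla and Columns++, whose code is not shown in this thread, and the helper name is_valid_utf8 is hypothetical.

```python
# Bytes that would encode the lone surrogate U+D800 if UTF-8 allowed surrogates.
wtf8_surrogate = b"\xed\xa0\x80"


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(wtf8_surrogate))              # strict UTF-8 rejects these bytes
print(is_valid_utf8("\u00e9".encode("utf-8")))    # an ordinary two-byte sequence is accepted

# Python's 'surrogatepass' error handler reveals what the bytes appear to encode.
lone = wtf8_surrogate.decode("utf-8", errors="surrogatepass")
print(hex(ord(lone)))
```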

      Hex_Digit isn’t one of the Boost::regex character classes, and I never defined it. Defining it to be equivalent to xdigit would be trivial; re-defining xdigit to include non-ASCII characters is a bit more complicated:

      I’ve found a small anomaly concerning hexadecimal characters:

      If I use the native Notepad++ search to match any hexadecimal character, with the regex [[:xdigit:]], against my Total_Chars.txt file, it returns 44 matches

      If I use the Columns++ search to match any hexadecimal character, with the regex [[:xdigit:]], against my Total_Chars.txt file, it returns 22 matches

      I suppose that the N++ answer is the right one. Indeed, the https://www.unicode.org/reports/tr18/#Compatibility_Properties article (Annex C of Unicode Regular Expressions) says:

      Hex_Digit contains 0-9 A-F fullwidth and halfwidth, upper and lowercase

      Yes, it would seem the standard is to include those non-ASCII characters as hex digits. Further, the comments at your link under lower and upper are troublesome, as Columns++ treats them as aliases for Ll and Lu. Word and word boundaries are probably faulty as well.
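
      The 44 versus 22 split is consistent with the size of the Hex_Digit property itself. A small Python sketch that enumerates the code point ranges listed for Hex_Digit in the Unicode PropList.txt file and counts the ASCII subset (this assumes Total_Chars.txt contains every one of these characters, which the thread does not state):

```python
# Code point ranges that carry the Unicode Hex_Digit property (per PropList.txt).
HEX_DIGIT_RANGES = [
    (0x0030, 0x0039),  # 0-9
    (0x0041, 0x0046),  # A-F
    (0x0061, 0x0066),  # a-f
    (0xFF10, 0xFF19),  # fullwidth 0-9
    (0xFF21, 0xFF26),  # fullwidth A-F
    (0xFF41, 0xFF46),  # fullwidth a-f
]

# Expand the inclusive ranges into individual characters.
hex_digits = [chr(cp) for lo, hi in HEX_DIGIT_RANGES for cp in range(lo, hi + 1)]
ascii_hex_digits = [ch for ch in hex_digits if ord(ch) < 0x80]

print(len(hex_digits))        # 44: every Hex_Digit character
print(len(ascii_hex_digits))  # 22: the ASCII-only subset
```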

      I followed the Boost::regex principle that to extend the traditional POSIX mappings, the only Unicode property that is used to determine membership in a character class is the General Category.

      I hard-coded (that is, they are written explicitly rather than being derived from Unicode tables) the POSIX mappings for ASCII characters, since that’s the only place they are really well-defined; plus there is a hard-coded exception for the non-ASCII character U+0085, the Next Line control character, because it should be part of \v, which is implemented in Boost::regex as [[:v:]]. I don’t see any reason [[:xdigit:]] can’t be extended with similar hard-coded logic; I just didn’t know until now that I should do it.
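
      The layering described above (hard-coded answers for ASCII, a hard-coded exception for U+0085, General Category for everything else) might be sketched in Python as follows. This is an illustration only, not Columns++ code: the function names are hypothetical, and treating the Zl and Zp categories as vertical space is an assumption.

```python
import unicodedata

# Hard-coded ASCII members of the vertical-space class: LF, VT, FF, CR.
ASCII_VERTICAL = {"\n", "\x0b", "\x0c", "\r"}


def is_vertical(ch: str) -> bool:
    """Hypothetical [[:v:]] test: ASCII table, then U+0085, then General Category."""
    cp = ord(ch)
    if cp < 0x80:
        return ch in ASCII_VERTICAL          # hard-coded POSIX-style mapping
    if cp == 0x85:
        return True                          # hard-coded exception: Next Line (category Cc)
    # Assumption: line and paragraph separators count as vertical space.
    return unicodedata.category(ch) in ("Zl", "Zp")


def is_lower(ch: str) -> bool:
    """Hypothetical [[:lower:]] test that consults only the General Category."""
    if ord(ch) < 0x80:
        return "a" <= ch <= "z"              # hard-coded ASCII mapping
    return unicodedata.category(ch) == "Ll"  # non-ASCII: alias for Ll


print(is_vertical("\x85"), is_vertical("\u2028"), is_vertical(" "))
print(is_lower("\u00e9"), is_lower("\u00aa"))
```

      Extending [[:xdigit:]] in the same style would mean adding one more hard-coded branch for the fullwidth hex digits.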

      The other parts, though: whatever they are saying is supposed to be included in [:lower:] and [:upper:] besides letters, and whatever they are talking about in regard to word characters and boundaries… that might be problematic. I have a condensed set of tables built from a few Unicode files, instead of trying to import the ghastly large and complex ICU. Those tables include the General Category, but if that is not enough to determine membership in a character class… reorganizing them to include whatever additional information I need (it’s not yet clear to me what that will be) is not likely to be simple.
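
      As an illustration of why General Category alone may fall short for [:lower:]: some characters carry the Unicode Lowercase property without being in category Ll. The Python sketch below assumes that CPython's str.islower() follows the Lowercase property; the three sample characters are chosen for illustration and do not come from the thread.

```python
import unicodedata

# Feminine ordinal indicator, small roman numeral one, circled small a.
samples = ["\u00aa", "\u2170", "\u24d0"]


def lowercase_outside_ll(ch: str) -> bool:
    """True if ch reports as lowercase but its General Category is not Ll."""
    return ch.islower() and unicodedata.category(ch) != "Ll"


for ch in samples:
    # Print the code point, its General Category, and whether it is lowercase outside Ll.
    print(f"U+{ord(ch):04X}", unicodedata.category(ch), lowercase_outside_ll(ch))
```

      A class defined as an alias for Ll would miss all three, so covering them would need a table of the Lowercase property (or at least its Other_Lowercase part) in addition to the General Category.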

      Thank you for your observation. Indeed, there are flaws. It is not yet clear to me if and how it will be practical to address them, though I can probably fix the [:xdigit:] behavior without much difficulty.

    • conky77

      show the current zoom

      Notepad++ & Plugin Development
      1 Votes
      30 Posts
      20k Views
      Alan Kilborn

      @Javier-Utrilla said in show the current zoom:

      something like U | Zoom +10

      It’s fine…but why not make the abbreviated encoding MORE descriptive, and the zoom info LESS descriptive? E.g. (just my thinking): U8B |Z+10

    • Pierre le Lidgeu

      Cursor becomes black square at line selection

      Help wanted · · · – – – · · ·
      0 Votes
      8 Posts
      1k Views
      Andrzej Jaworski

      @mpheath. Thank you so much. You found a simple and very effective solution. It works 100 percent. Now I see the desired cursor instead of the black square. Thanks again.

    • Jan Larsen

      Screen goes blank when switching to Notepad++

      Help wanted · · · – – – · · ·
      0 Votes
      1 Posts
      10 Views
      No one has replied
    • dz15mlru

      [Question, performance analysis] Which documents eat more resources?

      General Discussion session resources unsaved performance analysis
      0 Votes
      1 Posts
      26 Views
      No one has replied
    • Chec Pufos

      Need help with finding and replacing

      Help wanted · · · – – – · · ·
      0 Votes
      2 Posts
      62 Views
      PeterJones

      @Chec-Pufos,

      Your specs are more vague than you think they are.

      Assuming you want to replace any positive integer immediately after that specific I:max= with 987 and any positive integer immediately after that specific I:min= with 654, then it could be done with something like:
      FIND = (?-is)(Size\s+{\s*.*\R\s*I:max=)\d+(\s*.*\R\s*I:min=)\d+
      REPLACE = ${1}987${2}654
      SEARCH MODE = Regular Expression
      REPLACE IN FILES

      Hopefully, you can figure out what to change in order to replace with values other than 987 or 654.

      Highly recommended: always back up your data before trying a regex that someone hands you, and always try a new regex on a single file to make sure it behaves as you expect before running it on the full 30 files.
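
      In the spirit of testing before touching real files, the replacement can be dry-run on a small sample. The Python sketch below is a translation, not the Notepad++ regex itself: Python's re module has no \R, so the line break is spelled out; the leading (?-is) is dropped because case-sensitive matching and dot-not-matching-newline are already Python's defaults; ${1} becomes \g<1>. The sample text is invented for illustration, since the original data is not shown in this thread.

```python
import re

# Python's re has no \R, so a line break is written out explicitly.
EOL = r"(?:\r\n|\n|\r)"

FIND = re.compile(
    r"(Size\s+\{\s*.*" + EOL + r"\s*I:max=)\d+"
    + r"(\s*.*" + EOL + r"\s*I:min=)\d+"
)
REPLACE = r"\g<1>987\g<2>654"

# Hypothetical sample: one Size block that should change, one block that should not.
sample = (
    "Size {\n"
    "    I:max=100\n"
    "    I:min=10\n"
    "}\n"
    "Other {\n"
    "    I:max=5\n"
    "    I:min=1\n"
    "}\n"
)

result = FIND.sub(REPLACE, sample)
print(result)
```

      Only the values inside the Size block change; the I:max= and I:min= lines under Other are left alone because the match is anchored on the literal Size that precedes them.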

      ----

      Useful References: Please Read Before Posting; Template for Search/Replace Questions; Formatting Forum Posts; Notepad++ Online User Manual: Searching/Regex; FAQ: Where to find other regular expressions (regex) documentation