    word character list - special characters █►◄ not selected as expected

    • guy038

      Hello, @mohammad-hussain,

      I’m elaborating an answer. Just be patient for a while ! Thanks !

      See you later,

      Best Regards,

      guy038

      • guy038

        Hi, @mohammad-hussain and All,

        Your special characters are :

        • The FULL BLOCK character ( █ ), from the Unicode Block Elements block, with code-point \x{2588}

        Refer    http://www.unicode.org/charts/PDF/U2580.pdf

        • The BLACK RIGHT-POINTING POINTER character ( ► ), from the Unicode Geometric Shapes block, with code-point \x{25BA}

        • The BLACK LEFT-POINTING POINTER character ( ◄ ), from the Unicode Geometric Shapes block, with code-point \x{25C4}

        Refer    http://www.unicode.org/charts/PDF/U25A0.pdf
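
For anyone who wants to double-check these code-points outside Notepad++, here is a minimal Python sketch. It only relies on the standard `unicodedata` module; the helper name `describe` is made up for this illustration.

```python
import unicodedata

# The three marker characters discussed in this thread.
MARKERS = "█►◄"


def describe(ch: str) -> str:
    """Return 'U+XXXX NAME' for a single character (hypothetical helper)."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in MARKERS:
    print(ch, describe(ch))
```

The printed hexadecimal values are the same ones used in the \x{....} notation of the Notepad++ regular expressions.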

        I didn’t check previous/old versions of Notepad++ to verify this, but I’m afraid that you cannot set characters with a code-point of \x{0080} or above as word characters when using a multi-byte encoding ( so all Unicode encodings : UTF-8, UTF-8-BOM, UCS-2 BE BOM and UCS-2 LE BOM )

        Refer to    https://www.scintilla.org/ScintillaDoc.html#SCI_GETWORDCHARS

        It is said :

        For multi-byte encodings, this API will not return meaningful values for 0x80 and above.

        So the Scintilla message SCI_SETWORDCHARS, which changes the set of word characters, can handle only ASCII characters if you use, for instance, the default UTF-8 encoding
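
A rough way to see why the encoding matters is to count how many bytes each character needs. The Python sketch below is only an illustration and assumes that "ANSI" means Windows-1252; it says nothing about how Scintilla is implemented internally. The helper names `utf8_len` and `ansi_len` are invented for the example.

```python
# Characters from this thread, plus a plain ASCII letter for comparison.
SAMPLES = {
    "FULL BLOCK": "█",
    "RIGHT POINTER": "►",
    "LEFT POINTER": "◄",
    "MACRON": "¯",
    "LETTER A": "A",
}


def utf8_len(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def ansi_len(ch: str):
    """Number of bytes in Windows-1252, or None if it cannot be encoded."""
    try:
        return len(ch.encode("cp1252"))
    except UnicodeEncodeError:
        return None


for name, ch in SAMPLES.items():
    print(f"{name:14} UTF-8: {utf8_len(ch)} byte(s)  ANSI: {ansi_len(ch)}")
```

Only the ASCII letter stays a single byte in UTF-8. The macron is a single byte in Windows-1252 but two bytes in UTF-8, and the three block/pointer characters do not exist in Windows-1252 at all.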


        So, instead of using exotic Unicode characters, I was thinking about using the MACRON symbol of Unicode code-point \x{00AF} ( Don’t laugh ! No relation with the President of the French Republic…as I’m French ! )

        Refer to    http://www.unicode.org/charts/PDF/U0080.pdf

        What follows is, of course, a work-around, but you may like it and it may even give you other ideas !

        Of course, as its code-point is higher than \x{007F}, you will not be able to select it along with some word chars. However :

        • To write it use, either, the shortcut ALT + 0175 ( Unicode, ANSI or Win-1252 encodings ) or ALT + 238 ( OEM-850 encoding )

        • It can help to isolate your words, easily enough, among other normal text. For instance : This is ¯¯¯Domain¯¯¯ a quick test !

        • You could highlight any occurrence of that specific character, using, for instance, Search > Mark All > Using 1st Style OR the context menu Style token > Using 1st Style, after selecting it

        • But above all, you may move from one highlighting to another, with the shortcuts Ctrl + 1 ( forward ) and Ctrl + Shift + 1 ( backward ). Note that you must use the 1 key of the main keyboard !

        • On the other hand, you could also use the ¯+.+?¯+ regular expression , in the Find or Mark dialogs, to match anything embedded between ¯ characters ! And delete these matches leaving the replacement zone empty
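
The ¯+.+?¯+ expression from the last bullet can be tried outside Notepad++ as well. This is a minimal Python sketch; Python's `re` module is not the regex engine used by Notepad++, so it is only an approximation of what the Find and Replace dialogs do.

```python
import re

# Same pattern as above: one or more macrons, a non-greedy run of
# characters, then one or more macrons.
MACRON_ZONE = re.compile("¯+.+?¯+")

sample = "This is ¯¯¯Domain¯¯¯ a quick test !"

# Everything embedded between macron characters.
zones = [m.group(0) for m in MACRON_ZONE.finditer(sample)]

# Equivalent of Replace All with an empty replacement zone.
cleaned = MACRON_ZONE.sub("", sample)

print(zones)
print(cleaned)
```

Note that deleting the match leaves the two surrounding spaces in place, exactly as an empty replacement would in the editor.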

        BTW, I verified that if I include the ¯ character as a word character, in Preferences... > Delimiter > Word character list, you can select, for instance, all the string ¯¯¯Domain¯¯¯ with a double-click, if typed in an ANSI encoded file

        Best Regards,

        guy038

        • Mohammad Hussain

          Sorry for the late reply (I spent some time looking at ranges and testing different characters).

          Also, thank you very much for your incredibly detailed reply. I can’t believe how much time you’ve spent trying to help. Truly, truly appreciated!!

          Unfortunately, none of this will work well for what I’m doing. Here’s a clearer example of a line in one of the files I distribute to my colleagues, and sometimes clients:
          Generate GUID:
          https://█►Domain◄█/d2l/guids/d2l.guid.2.asmx/GenerateExpiringGuid?guidType=SSO&orgId=█►MainOrgID◄█&installCode=█►InstallationCode◄█&TTL=60&data=█►Username◄█&key=█►LocalPrivateKey◄█

          As you can see, not only the characters I chose are very visible, they also clearly indicate which part to modify, with the arrows helping with that.

          I checked all the characters below \x{0080}, and none of them work for my purpose. The only arrow-like characters are used in html/xml files, so using them will be very confusing if someone is trying to edit html/xml.

          As for the Macron character (very funny btw!), it’s not obvious enough, although it’s clearly more obvious than most other options. The other issue with it is that it isn’t below \x{0080} either, which means (as you mentioned) it only works with ANSI encoding, but not Unicode. All of my files are in Unicode.

          I don’t fully understand what Scintilla is, but it does sound like a library/dependency beyond the control of Notepad++ code. If that’s the case, I guess I’ll just keep the same characters I was using (for visibility/ease of use), and everyone should remove them manually, and hopefully, they will remove the right amount of characters without introducing errors. It’s unfortunate though. Using them before was very convenient…

          Thank you again @guy038. I truly appreciate your help :)

          • PeterJones

            @Mohammad-Hussain said in word character list - special characters █►◄ not selected as expected:

            everyone should remove them manually, and hopefully, they will remove the right amount of characters

            There is an alternative to fully-manual removal. Instructions:

            1. Double-click and overtype as they previously did, which will change █►DOMAIN◄█ into █►blah.url◄█ (for example)
            2. After they have finished all the necessary replacements, Search > Replace
              • FIND = [\x{2588}\x{25ba}\x{25c4}]
              • REPLACE = (leave box empty)
              • Search Mode = regular expression
              • REPLACE ALL

            No hoping required.
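
For readers who prefer to see the effect of that Replace All outside the editor, here is a minimal Python sketch of the same character-class removal. Python writes the code-points as \u2588 rather than \x{2588}, and the function name `strip_markers` is invented for the example.

```python
import re

# Same character class as the FIND field above: any one of █ ► ◄.
MARKER_CHARS = re.compile("[\u2588\u25ba\u25c4]")


def strip_markers(text: str) -> str:
    """Delete every marker character, keeping whatever was typed between them."""
    return MARKER_CHARS.sub("", text)


# A line after the user has overtyped DOMAIN with blah.url.
edited = "https://█►blah.url◄█/d2l/guids"
print(strip_markers(edited))
```

Because the class matches single characters, it does not matter whether a colleague accidentally deleted one of the four markers while editing; whatever is left is removed.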

            Alternate: Don’t have them manually double-click.

            1. Use Search > Find from the beginning
              • FIND = \x{2588}\x{25ba}.*?\x{25c4}\x{2588}
              • Search Mode = regular expression
              • FIND NEXT
            2. click on the tab bar; if they lost the selection (by clicking in the text instead of in the tab bar), hit F3 to re-highlight the next instance
            3. type over the selected text, which replaces the whole █►DOMAIN◄█ with blah.url and so gets rid of the fancy characters
            4. hit F3 and repeat typeover for all the █►...◄█ instances

            If, in your encoding, the Unicode \x{....} escapes don’t match, you’d have to tell us what encoding you’re actually using (or possibly just paste in the actual characters, rather than the \x{....} notation).
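
The second approach (find each whole █►...◄█ zone, then type over it) can also be sketched as a script. The Python below is only an illustration under a few assumptions: a capturing group is added around the .*? part so the placeholder name can be read, the template line is a shortened version of the URL posted earlier in the thread, and the replacement values and the helper names `list_zones` and `fill_zones` are hypothetical.

```python
import re

# Same zone pattern as in the FIND field, with a capturing group added
# around the placeholder name (an addition made for this sketch).
ZONE = re.compile("\u2588\u25ba(.*?)\u25c4\u2588")


def list_zones(text: str) -> list:
    """Return the placeholder names found between the markers, in order."""
    return [m.group(1) for m in ZONE.finditer(text)]


def fill_zones(text: str, values: dict) -> str:
    """Replace each zone whose name is in `values`; leave the others untouched."""
    def replace(match):
        return values.get(match.group(1), match.group(0))
    return ZONE.sub(replace, text)


template = "https://█►Domain◄█/d2l?orgId=█►MainOrgID◄█&data=█►Username◄█"
values = {"Domain": "blah.url", "MainOrgID": "1234"}  # hypothetical values

print(list_zones(template))
print(fill_zones(template, values))
```

Zones without a supplied value keep their markers, so they remain easy to spot on a later pass.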

            • Alan Kilborn @PeterJones

              @PeterJones

              Two very good solutions; nicely done.

              Additionally, maybe recording some macros helps, and/or the Mark function. After marking, you can jump between marks by using Search > Jump down (or up) > Find Style

              • Mohammad Hussain

                Thank you gentlemen!

                Very elegant solutions indeed :)

                I’ll probably use these myself (probably the macro one. Automation saves time). Most of my colleagues however don’t even know what regular expressions are, not to mention clients! lol! I guess they’ll either have to do this manually, or just use simple search to remove these characters.

                Thanks again:)

                Have a great day everyone! Stay safe :)

                • Alan Kilborn

                  @Mohammad-Hussain said in word character list - special characters █►◄ not selected as expected:

                  Most of my colleagues however don’t even know what regular expressions are

                  Even better for a macro-based solution; just bind a regular expression operation to a keycombo for them, and they don’t need to know much to use it.

                  • guy038

                    Hi, @mohammad-hussain, @alan-kilborn, @peterjones and All,

                    @mohammad-hussain, in the second part of this post, I will describe a solution, using macros, for the search of each zone █►...........◄█, in each direction ( forward and backward )

                    However, I would first like to discuss, with Alan and Peter, a regex search bug that I had already noticed but which did not worry me too much. Presently, though, it is very annoying for macros that involve searches !

                    Luckily, @mohammad-hussain, I’ve found out a work-around which will enable you to create two macros and use them to search forward / backward for your █►...........◄█ zones ;-))


                    So, first, let me explain the bug :

                    • Open a new tab

                    • Insert the sample text START é12345 é ABCDEZéGHIùJKZé é67890 é TUVWùXYZé END Zé, containing the very common French letter é and two letters ù

                    • Place the caret at beginning of word START

                    • Open the Find dialog

                    • SEARCH é

                    • Tick the Wrap around option ( IMPORTANT )

                    • Select the Regular expression mode

                    • Click on the Find Next button

                    => The first é of the string é12345 is selected

                    • Close the Find dialog

                    • Go on, hitting the F3 key

                    => You get the successive occurrences of the é letter

                    Now, hit the Shift + F3 for a backward search => nothing happens :-(( Backward search is impossible to perform

                    Notes :

                    • After tests, this bug occurs when the search ends with a character with code-point > \x7F ( so NON pure ASCII char )

                      • Search of regexes .é, \ué or Zé did not work in backward direction, even if you choose the Backward direction option

                      • Search of the regex .[\x{0080}-\x{FFFF}] did not work, either, in backward search

                    • But :

                      • Search of regexes é., é\x20, é\w, .é., .é\x20 or \ué. does search in backward direction

                      • Search of the regex .[\x{0000}-\x{007F}] or é[\x{0000}-\x{007F}] does work, as well, in backward search

                    • This bug only occurs with a Unicode encoding ( UTF-8, UTF-8-BOM, UCS-2 BE BOM and UCS-2 LE BOM ). With an ANSI encoded file, no bug at all !

                    • This bug does not happen, either, if you use the Normal or Extended (\n, \r, \t, \0, \x...) search mode

                    So, do you confirm, guys, that it’s a real bug ? If so, I’ll create an issue, soon


                    Mates, you may think : he’s going to give up ? No, I’m a little stubborn, even quite a lot ! So, do you see a possible work-around to that problem ?

                    Ah, ah ! Well, the magical regex is (?=(?s).). ( Almost ) obviously, this look-ahead assertion is always TRUE, isn’t it ? This expression misleads the regular expression engine, by making it believe that there is some additional character to be taken into account !

                    So, in the meanwhile, here is a new regex rule :

                    • When you cannot perform a backward search, in regular expression mode, simply add the (?=(?s).) syntax at the end of your present search regex ;-))

                    Now, @mohammad-hussain, with this work-around, here are, below, the two macros to be appended at the end of the <Macros>.........</Macros> node of your active shortcuts.xml configuration file :

                            <Macro name="Search Zones to Modify (Fwd)" Ctrl="yes" Alt="no" Shift="no" Key="123">                                <!-- Ctrl + F12 shortcut      -->
                                <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />                                              <!-- Search Initialisation    -->
                                <Action type="3" message="1601" wParam="0" lParam="0" sParam="\x{2588}\x{25ba}.*?\x{25c4}\x{2588}(?=(?s).)" />  <!-- Search of |>........<|   -->
                                <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />                                              <!-- Regular Expression mode  -->
                                <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />                                            <!-- Search Forward and Wrap  -->
                                <Action type="3" message="1701" wParam="0" lParam="1" sParam="" />                                              <!-- Find Next match          -->
                            </Macro>
                            <Macro name="Search Zones to Modify (Bwd)" Ctrl="yes" Alt="no" Shift="yes" Key="123">                               <!-- Ctrl + Shift + F12       -->
                                <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />                                              <!-- Search Initialisation    -->
                                <Action type="3" message="1601" wParam="0" lParam="0" sParam="\x{2588}\x{25ba}.*?\x{25c4}\x{2588}(?=(?s).)" />  <!-- Search of |>........<|   -->
                                <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />                                              <!-- Regular Expression mode  -->
                                <Action type="3" message="1702" wParam="0" lParam="256" sParam="" />                                            <!-- Search Backward and Wrap -->
                                <Action type="3" message="1701" wParam="0" lParam="1" sParam="" />                                              <!-- Find Previous match      -->
                            </Macro>
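
The only difference between the two macros, apart from the shortcut, is the lParam of the 1702 action. The Python sketch below decodes it; the two flag values are inferred from the macro comments above ( 768 is described as forward + wrap and 256 as backward + wrap, so 512 is assumed to be the "forward" bit ), and the names WRAP, FORWARD and `describe_direction` are invented for the illustration.

```python
import xml.etree.ElementTree as ET

# A reduced copy of the forward macro: only the 1702 action is kept.
MACRO_XML = """
<Macro name="Search Zones to Modify (Fwd)" Ctrl="yes" Alt="no" Shift="no" Key="123">
    <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />
</Macro>
""".strip()

# Flag values inferred from the macro comments (assumption, see lead-in).
WRAP = 256
FORWARD = 512


def describe_direction(lparam: int) -> str:
    """Translate the lParam of a 1702 action into readable search options."""
    direction = "forward" if lparam & FORWARD else "backward"
    wrap = "wrap" if lparam & WRAP else "no wrap"
    return f"{direction}, {wrap}"


macro = ET.fromstring(MACRO_XML)
for action in macro.iter("Action"):
    if action.get("message") == "1702":
        print(macro.get("name"), "->", describe_direction(int(action.get("lParam"))))
```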
                    

                    Remark :

                    Depending if you have a local N++ install or not, your shortcuts.xml file can be found :

                    • Along with the notepad++.exe file, for a local configuration, in any folder different from C:\Program files[(x86)]

                    • In the path %AppData%\Notepad++, in case of use of the installer to install N++


                    I just tried it, with the last v7.8.6 version and everything went OK ! So, in summary :

                    • To get the next █►...........◄█ zone, hit the Ctrl + F12 shortcut, which runs the Search Zones to Modify (Fwd) macro

                    • To get the previous █►...........◄█ zone, hit the Ctrl + Shift + F12 shortcut, which runs the Search Zones to Modify (Bwd) macro

                    • Bonus, if you hit the F12 key, you swap between the Post-It screen mode and the Normal screen mode ;-))

                    • On the other hand, you can also run a completely independent search with the F3 and Shift + F3 shortcuts

                    Best Regards,

                    guy038

                    P.S. :

                    To be rigorous, the look-ahead syntax (?=(?s).) matches at any position within the file, except at the very end of the file !

                    So, in case of a █►...........◄█ zone, at the very end of file, simply add a final line-break, after that zone
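
This end-of-file limitation is easy to reproduce in a small Python sketch. Two caveats: Python's `re` module is a different engine from the one in Notepad++, so this only illustrates the look-ahead logic and not the backward-search bug itself, and Python requires the scoped form (?s:.) instead of (?s). inside the look-ahead. The helper name `has_zone` is invented for the example.

```python
import re

# Zone pattern from the macros, followed by the "any character" look-ahead.
# (?s:.) is the scoped spelling of (?s). that Python accepts mid-pattern.
ZONE_WITH_LOOKAHEAD = re.compile("\u2588\u25ba.*?\u25c4\u2588(?=(?s:.))")


def has_zone(text: str) -> bool:
    """True if a zone followed by at least one more character is found."""
    return ZONE_WITH_LOOKAHEAD.search(text) is not None


at_end = "edit █►Domain◄█"        # the zone is the very last thing in the text
with_newline = at_end + "\n"       # same text with a final line-break added

print(has_zone(at_end))
print(has_zone(with_newline))
```

The first call finds nothing because no character follows the zone; adding the final line-break gives the look-ahead something to match.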

                    • Alan Kilborn @guy038

                      @guy038 said in word character list - special characters █►◄ not selected as expected:

                      So, do you confirm, guys, that it’s a real bug ? If so, I’ll create an issue, soon

                      I confirm the findings.
                      But I already thought that backwards search in Regular Expression Search mode was problematic in Notepad++.
                      So, it seems it is nothing truly new, except another example of the problems.

                      • astrosofista @Alan Kilborn

                        @Alan-Kilborn said in word character list - special characters █►◄ not selected as expected:

                        I confirm the findings.

                        Hi @Alan-Kilborn, @guy038 and All:

                        Me too. Ran only the first tests, not those under the Notes.

                        By the way, @guy038, your magical regex (?=(?s).) is a nice catch. Thank you, I saved it.

                        One useless but curious thing I found while playing with regex: an expression that, when you repeatedly press Find Next, confines the caret to the first word of the document, making it move in circles from the beginning to the end of the word: \A(?=\b).

                        Have fun!

                        • Alan Kilborn @astrosofista

                          @astrosofista said

                          caret…move in circles from the beginning to the end of the word: \A(?=\b).

                          You must mean with Wrap around ticked.
                          I’m not surprised by the behavior of this regex.

                          It makes sense how it is working.
                          Well, within the confines of Notepad++ anyway. :-)
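
As a side note, the \A(?=\b) expression can be looked at in isolation with a minimal Python sketch. It does not reproduce the caret movement seen in Notepad++ ( Python's `re` is a different engine ); it only shows that the pattern can match one thing, an empty string at the very start of the text, and only when the text begins with a word character. The helper name `match_spans` is invented for the example.

```python
import re

# Start of text, immediately followed by a word boundary.
PATTERN = re.compile(r"\A(?=\b)")


def match_spans(text: str) -> list:
    """Return the (start, end) span of every match in the text."""
    return [m.span() for m in PATTERN.finditer(text)]


print(match_spans("Hello world"))   # text starting with a word character
print(match_spans(" Hello world"))  # text starting with a space
```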

                          The Community of users of the Notepad++ text editor.