word character list - special characters █►◄ not selected as expected
-
Hi, @mohammad-hussain and All,
Your special characters are :
- The FULL BLOCK character ( █ ), from the Unicode Block Elements block, with code-point \x{2588}
Refer http://www.unicode.org/charts/PDF/U2580.pdf
- The BLACK RIGHT-POINTING POINTER character ( ► ), from the Unicode Geometric Shapes block, with code-point \x{25BA}
- The BLACK LEFT-POINTING POINTER character ( ◄ ), from the Unicode Geometric Shapes block, with code-point \x{25C4}
Refer http://www.unicode.org/charts/PDF/U25A0.pdf
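A quick way to double-check these names and code-points outside Notepad++ is a small Python sketch like the one below. It is only an illustration: the helper name `describe` is made up for this example, and the names come from Python's bundled `unicodedata` tables.

```python
import unicodedata

# The three marker characters from the original question.
MARKERS = "█►◄"


def describe(ch):
    """Return (code point in \\x{....} notation, official Unicode name)."""
    return (f"\\x{{{ord(ch):04X}}}", unicodedata.name(ch))


for ch in MARKERS:
    print(ch, *describe(ch))
```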
I didn’t check previous/old versions of Notepad++ to verify this, but I’m afraid that you cannot set, as word characters, characters with a code of \x{0080} and above when using a multi-byte encoding ( so all the Unicode encodings : UTF-8, UTF-8-BOM, UCS-2 BE BOM and UCS-2 LE BOM )
Refer to https://www.scintilla.org/ScintillaDoc.html#SCI_GETWORDCHARS
It is said :
For multi-byte encodings, this API will not return meaningful values for 0x80 and above.
So the Scintilla message SCI_SETWORDCHARS, to change the set of word characters, can handle only ASCII characters if you use, for instance, the default UTF-8 encoding
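One way to picture the limitation, assuming (as the quoted Scintilla note suggests) that the word-character set is looked up by single byte values: in UTF-8, every character above \x{007F} takes several bytes, so no single byte value can stand for it, whereas in an ANSI code page such as Windows-1252 the macron is the single byte 0xAF. The sketch below only counts bytes; the helper name `byte_length` is made up for this illustration.

```python
def byte_length(ch, encoding):
    """Number of bytes needed for ch in the given encoding,
    or None if the encoding cannot represent the character at all."""
    try:
        return len(ch.encode(encoding))
    except UnicodeEncodeError:
        return None


# Columns: code point, bytes in Windows-1252 (ANSI), bytes in UTF-8
for ch in "a¯█►◄":
    print(f"U+{ord(ch):04X}", byte_length(ch, "cp1252"), byte_length(ch, "utf-8"))
```

The macron fits in one byte in Windows-1252 but needs two in UTF-8, and the block and pointer characters need three bytes in UTF-8 and do not exist in Windows-1252 at all.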
So, instead of using exotic Unicode characters, I was thinking about using the MACRON symbol of Unicode code-point \x{00AF} ( Don’t laugh ! No relation with the President of the French Republic…as I’m French ! )
Refer to http://www.unicode.org/charts/PDF/U0080.pdf
All that’s next is, of course, a work-around but you may like it and even give yourself other ideas !
Of course, as its code-point is higher than \x{007F}, you will not be able to select it, along with some word chars. However :
- To write it, use either the shortcut ALT + 0175 ( Unicode, ANSI or Win-1252 encodings ) or ALT + 238 ( OEM-850 encoding )
- It can help to isolate your words, easily enough, among other normal text. For instance : This is ¯¯¯Domain¯¯¯ a quick test !
- You could highlight any occurrence of that specific character, using, for instance, Search > Mark All > Using 1st Style OR the context menu Style token > Using 1st Style, after selecting it
- But above all, you may move from one highlighting to another, with the shortcuts Ctrl + 1 ( forward ) and Ctrl + Shift + 1 ( backward ). Note that you must use the 1 key of the main keyboard !
- On the other hand, you could also use the ¯+.+?¯+ regular expression, in the Find or Mark dialogs, to match anything embedded between ¯ characters ! And delete these matches, leaving the replacement zone empty
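To see what the ¯+.+?¯+ expression matches, here is a minimal Python sketch run on the sample sentence from the list above. Python's `re` module is not the Boost engine used by Notepad++, so treat this as an approximation of the behaviour rather than a guarantee.

```python
import re

# One or more macrons, the shortest possible run of text, one or more macrons.
MACRON_ZONE = re.compile(r"¯+.+?¯+")

text = "This is ¯¯¯Domain¯¯¯ a quick test !"

matches = MACRON_ZONE.findall(text)   # the zones that would be selected
cleaned = MACRON_ZONE.sub("", text)   # same as replacing with an empty string

print(matches)
print(cleaned)
```

Note that replacing with nothing removes the whole zone, word included, so the two surrounding spaces end up next to each other.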
BTW, I verified that if I include the ¯ character as a word character, in Preferences... > Delimiter > Word character list, you can select, for instance, the whole string ¯¯¯Domain¯¯¯ with a double-click, if typed in an ANSI encoded file
Best Regards,
guy038
-
Sorry for the late reply (I spent some time looking at ranges and testing different characters).
Also, thank you very much for your incredibly detailed reply. I can’t believe how much time you’ve spent trying to help. Truly, truly appreciated!!
Unfortunately, none of this will work well for what I’m doing. Here’s a clearer example of a line in one of the files I distribute to my colleagues, and sometimes clients:
Generate GUID:
https://█►Domain◄█/d2l/guids/d2l.guid.2.asmx/GenerateExpiringGuid?guidType=SSO&orgId=█►MainOrgID◄█&installCode=█►InstallationCode◄█&TTL=60&data=█►Username◄█&key=█►LocalPrivateKey◄█
As you can see, not only are the characters I chose very visible, they also clearly indicate which part to modify, with the arrows helping with that.
I checked all the characters within the 0080 range, and none of them work for my purpose. The only arrow-like characters are used in html/xml files, so using them will be very confusing if someone is trying to edit html/xml.
As for the Macron character (very funny btw!), it’s not obvious enough, although it’s clearly more obvious than most other options. The other issue with it is it doesn’t belong to the 0080 range either, which means (as you mentioned), it only works with ANSI encoding, but not Unicode. All of my files are in Unicode.
I don’t fully understand what Scintilla is, but it does sound like a library/dependency beyond the control of Notepad++ code. If that’s the case, I guess I’ll just keep the same characters I was using (for visibility/ease of use), and everyone should remove them manually, and hopefully, they will remove the right amount of characters without introducing errors. It’s unfortunate though. Using them before was very convenient…
Thank you again @guy038. I truly appreciate your help :)
-
@Mohammad-Hussain said in word character list - special characters █►◄ not selected as expected:
everyone should remove them manually, and hopefully, they will remove the right amount of characters
There is an alternative to fully-manual removal. Instructions:
- Double-click and overtype as they previously did, which will change █►DOMAIN◄█ into █►blah.url◄█ (for example)
- After they finished all the replacements necessary, Search > Replace
  - FIND = [\x{2588}\x{25ba}\x{25c4}]
  - REPLACE = (leave box empty)
  - Search Mode = regular expression
  - REPLACE ALL
No hoping required.
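The same clean-up step can be sketched outside Notepad++, for example if the files are post-processed by a script. This is only an illustration in Python; the function name `strip_markers` is made up, and the character class is the one from the FIND field above.

```python
import re

# Same character class as the FIND field: FULL BLOCK and the two pointers.
MARKERS = re.compile("[\u2588\u25ba\u25c4]")


def strip_markers(text):
    """Remove every marker character, leaving the typed-in values in place."""
    return MARKERS.sub("", text)


print(strip_markers("https://█►blah.url◄█/d2l/guids?orgId=█►6606◄█"))
```

The value 6606 is an arbitrary placeholder for the demonstration, not something taken from the thread.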
Alternate: Don’t have them manually double-click.
- Use Search > Find from the beginning
  - FIND = \x{2588}\x{25ba}.*?\x{25c4}\x{2588}
  - Search Mode = regular expression
  - FIND NEXT
- click on the tab bar; if they lost the selection (by clicking in the text instead of in the tab bar), hit F3 to re-highlight the next instance
- type over the selected text, which will turn █►DOMAIN◄█ into blah.url, so getting rid of the fancy characters
- hit F3 and repeat typeover for all the █►...◄█ instances
If, in your encoding, the unicode \x{....} characters don’t match, you’d have to tell us what encoding you’re actually using (or possibly just paste in the actual characters, rather than the \x{....} notation).
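For anyone who would rather script the whole type-over pass, the zone regex above can drive a lookup-and-replace. The following Python sketch is an assumption-laden illustration, not part of the Notepad++ workflow: the function name `fill_zones` and the sample values are hypothetical, and zones whose name is not in the lookup table are left untouched so they stay visible.

```python
import re

# Same pattern as the FIND field, with a group capturing the placeholder name.
ZONE = re.compile("\u2588\u25ba(.*?)\u25c4\u2588")


def fill_zones(text, values):
    """Replace each █►Name◄█ zone by values[Name]; keep unknown zones as-is."""
    def replace_zone(match):
        name = match.group(1)
        return values.get(name, match.group(0))
    return ZONE.sub(replace_zone, text)


line = "https://█►Domain◄█/d2l/guids?orgId=█►MainOrgID◄█"
print(fill_zones(line, {"Domain": "blah.url"}))
```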
notation). - Double-click and overtype as they previously did, which will change
-
Two very good solutions; nicely done.
Additionally, maybe recording some macros helps, and/or the Mark function. After marking, you can jump between marks by using Search > Jump down (or up) > Find Style
-
Thank you gentlemen!
Very elegant solutions indeed :)
I’ll probably use these myself (probably the macro one. Automation saves time). Most of my colleagues however don’t even know what regular expressions are, not to mention clients! lol! I guess they’ll either have to do this manually, or just use simple search to remove these characters.
Thanks again:)
Have a great day everyone! Stay safe :)
-
@Mohammad-Hussain said in word character list - special characters █►◄ not selected as expected:
Most of my colleagues however don’t even know what regular expressions are
Even better for a macro-based solution; just bind a regular expression operation to a keycombo for them, and they don’t need to know much to use it.
-
Hi, @mohammad-hussain, @alan-kilborn, @peterjones and All,
@mohammad-hussain, in the second part of this post, I will describe a solution, using macros, for the search of each zone █►...........◄█, in each direction ( forward and backward )
However, I would like, first, to discuss with Alan and Peter a regex search bug that I had already noticed but which did not worry me too much. Presently, however, it is very annoying, regarding macro behaviour involving searches !
Luckily, @mohammad-hussain, I’ve found a work-around which will enable you to create two macros and use them to search forward / backward for your █►...........◄█ zones ;-))
So, first, let me explain the bug :
- Open a new tab
- Insert the sample text START é12345 é ABCDEZéGHIùJKZé é67890 é TUVWùXYZé END Zé , containing the very common French letter é and two letters ù
- Place the caret at the beginning of the word START
- Open the Find dialog
- SEARCH é
- Tick the Wrap around option ( IMPORTANT )
- Select the Regular expression mode
- Click on the Find Next button
=> The first é of the string é12345 is selected
- Close the Find dialog
- Go on, hitting the F3 key
=> You get the successive occurrences of the é letter
Now, hit Shift + F3 for a backward search => nothing happens :-(( Backward search is impossible to perform
Notes :
- After tests, this bug occurs when the search ends with a character with code-point > \x7F ( so NON pure ASCII char )
  - Search of regexes .é , \ué or Zé did not work in backward direction, even if you choose the Backward direction option
  - Search of the regex .[\x{0080}-\x{FFFF}] did not work, either, in backward search
- But :
  - Search of regexes é. , é\x20 , é\w , .é. , .é\x20 or \ué. does search in backward direction
  - Search of the regex .[\x{0000}-\x{007F}] or é[\x{0000}-\x{007F}] does work, as well, in backward search
- This bug only occurs with a Unicode encoding ( UTF-8, UTF-8-BOM, UCS-2 BE BOM and UCS-2 LE BOM ). With an ANSI encoded file, no bug at all !
- This bug does not happen, either, if you use the Normal or Extended (\n, \r, \t, \0, \x...) search mode
So, do you confirm, guys, that it’s a real bug ? If so, I’ll create an issue, soon
Mates, you may think : he’s going to give up ? No, I’m a little stubborn, even quite a lot ! So, do you see a possible work-around to that problem ?
Ah, ah ! Well, the magical regex is (?=(?s).) . ( Almost ) obviously, this look-ahead assertion is always TRUE, isn’t it ? This expression misleads the regular expression engine, by making it believe that there is some additional character to be taken into account !
So, in the meanwhile, here is a new regex rule :
- When you cannot perform a backward search, in regular expression mode, simply add the (?=(?s).) syntax at the end of your present search regex ;-))
Now, @mohammad-hussain, with this work-around, here are, below, the two macros to be appended at the end of the <Macros>.........</Macros> node of your active shortcuts.xml configuration file :
<Macro name="Search Zones to Modify (Fwd)" Ctrl="yes" Alt="no" Shift="no" Key="123">  <!-- Ctrl + F12 shortcut -->
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />  <!-- Search Initialisation -->
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="\x{2588}\x{25ba}.*?\x{25c4}\x{2588}(?=(?s).)" />  <!-- Search of |>........<| -->
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />  <!-- Regular Expression mode -->
    <Action type="3" message="1702" wParam="0" lParam="768" sParam="" />  <!-- Search Forward and Wrap -->
    <Action type="3" message="1701" wParam="0" lParam="1" sParam="" />  <!-- Find Next match -->
</Macro>
<Macro name="Search Zones to Modify (Bwd)" Ctrl="yes" Alt="no" Shift="yes" Key="123">  <!-- Ctrl + Shift + F12 -->
    <Action type="3" message="1700" wParam="0" lParam="0" sParam="" />  <!-- Search Initialisation -->
    <Action type="3" message="1601" wParam="0" lParam="0" sParam="\x{2588}\x{25ba}.*?\x{25c4}\x{2588}(?=(?s).)" />  <!-- Search of |>........<| -->
    <Action type="3" message="1625" wParam="0" lParam="2" sParam="" />  <!-- Regular Expression mode -->
    <Action type="3" message="1702" wParam="0" lParam="256" sParam="" />  <!-- Search Backward and Wrap -->
    <Action type="3" message="1701" wParam="0" lParam="1" sParam="" />  <!-- Find Previous match -->
</Macro>
Remark :
Depending on whether you have a local N++ install or not, your shortcuts.xml file can be found :
- Along with the notepad++.exe file, for a local configuration, in any folder different from C:\Program files[(x86)]
- In the path %AppData%\Notepad++ , if you used the installer to install N++
I just tried it, with the last v7.8.6 version and everything went OK ! So, in summary :
- To get the next █►...........◄█ zone, hit the Ctrl + F12 shortcut, which runs the Search Zones to Modify (Fwd) macro
- To get the previous █►...........◄█ zone, hit the Ctrl + Shift + F12 shortcut, which runs the Search Zones to Modify (Bwd) macro
- Bonus, if you hit the F12 key, you swap between the Post-It screen mode and the Normal screen mode ;-))
- On the other hand, you can also run a completely independent search with the F3 and Shift + F3 shortcuts
Best Regards,
guy038
P.S. :
To be rigorous, the look-ahead syntax (?=(?s).) matches at any position within the file, except at the very end of file !
So, in case of a █►...........◄█ zone at the very end of file, simply add a final line-break after that zone
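The end-of-file caveat can be reproduced with a small Python sketch. Two assumptions to keep in mind: Python's `re` module is not the Boost engine of Notepad++, and recent Python versions reject an inline `(?s)` in the middle of a pattern, so the equivalent scoped form `(?s:.)` is used instead. The function name `zones` is made up for this example.

```python
import re

# Zone pattern followed by "there must be at least one more character".
# (?s:.) is the scoped equivalent of (?s). used in the macros above.
ZONE_WITH_LOOKAHEAD = re.compile("\u2588\u25ba.*?\u25c4\u2588(?=(?s:.))")


def zones(text):
    """Return every zone matched by the pattern with the trailing look-ahead."""
    return ZONE_WITH_LOOKAHEAD.findall(text)


no_final_newline = "a █►One◄█ b █►Two◄█"
print(zones(no_final_newline))         # the zone at the very end is missed
print(zones(no_final_newline + "\n"))  # a final line-break makes it match
```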
-
@guy038 said in word character list - special characters █►◄ not selected as expected:
So, do you confirm, guys, that it’s a real bug ? If so, I’ll create an issue, soon
I confirm the findings.
But I already thought that backwards search in Regular Expression Search mode was problematic in Notepad++.
So, it seems it is nothing truly new, except another example of the problems.
@Alan-Kilborn said in word character list - special characters █►◄ not selected as expected:
I confirm the findings.
Hi @Alan-Kilborn, @guy038 and All:
Me too. Ran only the first tests, not those under the Notes.
By the way, @guy038, your magical regex (?=(?s).) is a nice catch. Thank you, I saved it.
You may want to know one useless but curious thing I found while playing with regex: an expression that, by repeatedly pressing Find Next, confines the caret to the first word of the document, making it move in circles from the beginning to the end of the word: \A(?=\b) .
Have fun!
-
@astrosofista said
caret…move in circles from the beginning to the end of the word: \A(?=\b).
You must mean with Wrap around ticked.
I’m not surprised by the behavior of this regex. It makes sense how it is working.
Well, within the confines of Notepad++ anyway. :-)