Minor typo in the manual for regex control character \c☒
-
The manual section for regular expressions / control characters has a minor typo in this entry:
\c☒ ⇒ The control character obtained from character ☒ by stripping all but its 6 lowest order bits.
That should be the 5 lowest bits, not 6.
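The reason it has to be 5 bits shows up as soon as you do the arithmetic on `\ci`, which is meant to mean Ctrl+I (tab). This is a small Python sketch of the bit stripping only; the helper name `control_char` is made up for illustration:

```python
def control_char(ch: str, bits: int) -> str:
    """Keep only the `bits` lowest-order bits of ch's code point."""
    mask = (1 << bits) - 1
    return chr(ord(ch) & mask)


for ch in "Ii":
    six = control_char(ch, 6)
    five = control_char(ch, 5)
    print(f"{ch!r}: 6 bits -> {six!r}, 5 bits -> {five!r}")
```

With 6 bits, `i` (0x69) would give `)` (0x29) instead of a tab; only the 5-bit mask makes `\cI` and `\ci` equivalent.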
\c☒
turns out to work with Unicode characters U+0000 to U+FFFF for ☒. For example, to find a tab which is \x09 you can use \c followed by a tab character itself or any of these: \c) \cI \ci \c \c© \cÉ \cé \cĉ \cĩ \cʼn \cũ \cƉ \cƩ \clj \cǩ \cȉ \cȩ \cɉ \cɩ \cʉ \cʩ \cˉ \c˩ \c̉ \c̩ \c͉ \cͩ \cΉ \cΩ \cω \cϩ \cЉ \cЩ \cщ \cѩ \c҉ \cҩ \cӉ \cө \cԉ \cԩ \cՉ \cթ \c։ \c֩ \cש \c؉ \cة \cى \c٩ \cډ \cک \cۉ \c۩ \c܉ \cܩ \c݉ \cݩ \cމ \cީ \c߉ \cߩ \cࠉ \cࠩ \cࡉ \cࡩ \cࢉ \cࢩ \cࣉ \cࣩ \cउ \cऩ \cॉ \c३ \cউ \c৩ \cਉ \c੩ \cઉ \cૉ \c૩ \cଉ \c୩ \cஉ \cன \c௩ \cఉ \c౩ \cಉ \c೩ \cഉ \cഩ \c൩ \cඉ \cඩ \c෩ \cฉ \cษ \c้ \cຉ \cຩ \c້ \c༉ \c༩ \cཉ \cཀྵ \cྉ \cྩ \c࿉ \cဉ \cဩ \c၉ \cၩ \cႉ \cႩ \cჩ \cᄉ \cᄩ \cᅉ \cᅩ \cᆉ \cᆩ \cᇉ \cᇩ \cሉ \cሩ \cቩ \cኩ \cዉ \cዩ \cጉ \cጩ \cፉ \c፩ \cᎉ \cᎩ \cᏉ \cᏩ \cᐉ \cᐩ \cᑉ \cᑩ \cᒉ \cᒩ \cᓉ \cᓩ \cᔉ \cᔩ \cᕉ \cᕩ \cᖉ \cᖩ \cᗉ \cᗩ \cᘉ \cᘩ \cᙉ \cᙩ \cᚉ \cᚩ \cᛉ \cᛩ \cᜉ \cᜩ \cᝉ \cᝩ \cញ \cឩ \c៉ \c៩ \c᠉ \cᠩ \cᡉ \cᡩ \cᢉ \cᢩ \cᣉ \cᣩ \cᤉ \cᤩ \c᥉ \cᥩ \cᦉ \cᦩ \cᧉ \c᧩ \cᨉ \cᨩ \cᩉ \cᩩ \c᪉ \c᪩ \c᫉ \cᬉ \cᬩ \cᭉ \c᭩ \cᮉ \cᮩ \cᯉ \cᯩ \cᰉ \cᰩ \c᱉ \cᱩ \cᲩ \cᳩ \cᴉ \cᴩ \cᵉ \cᵩ \cᶉ \cᶩ \c᷉ \cᷩ \cḉ \cḩ \cṉ \cṩ \cẉ \cẩ \cỉ \cứ \cἉ \cἩ \cὉ \cὩ \cᾉ \cᾩ \cΈ \cῩ \c \c \c⁉ \c \c₉ \c₩ \c⃩ \c℉ \c℩ \cⅉ \cⅩ \c↉ \c↩ \c⇉ \c⇩ \c∉ \c∩ \c≉ \c≩ \c⊉ \c⊩ \c⋉ \c⋩ \c⌉ \c〈 \c⍉ \c⍩ \c⎉ \c⎩ \c⏉ \c⏩ \c␉ \c⑉ \c⑩ \c⒉ \c⒩ \cⓉ \cⓩ \c┉ \c┩ \c╉ \c╩ \c▉ \c▩ \c◉ \c◩ \c☉ \c☩ \c♉ \c♩ \c⚉ \c⚩ \c⛉ \c⛩ \c✉ \c✩ \c❉ \c❩ \c➉ \c➩ \c⟉ \c⟩ \c⠉ \c⠩ \c⡉ \c⡩ \c⢉ \c⢩ \c⣉ \c⣩ \c⤉ \c⤩ \c⥉ \c⥩ \c⦉ \c⦩ \c⧉ \c⧩ \c⨉ \c⨩ \c⩉ \c⩩ \c⪉ \c⪩ \c⫉ \c⫩ \c⬉ \c⬩ \c⭉ \c⭩ \c⮉ \c⮩ \c⯉ \c⯩ \cⰉ \cⰩ \cⱉ \cⱩ \cⲉ \cⲩ \cⳉ \c⳩ \cⴉ \cⵉ \cⶉ \cⶩ \cⷉ \cⷩ \c⸉ \c⸩ \c⹉ \c⺉ \c⺩ \c⻉ \c⻩ \c⼉ \c⼩ \c⽉ \c⽩ \c⾉ \c⾩ \c⿉ \c〉 \c〩 \cぉ \cど \cら \cォ \cド \cラ \cㄉ \cㄩ \cㅉ \cㅩ \cㆉ \cㆩ \c㇉ \c㈉ \c㈩ \c㉉ \c㉩ \c㊉ \c㊩ \c㋉ \c㋩ \c㌉ \c㌩ \c㍉ \c㍩ \c㎉ \c㎩ \c㏉ \c㏩ \c㐉 \c㐩 \c㑉 \c㑩 \c㒉 \c㒩 \c㓉 \c㓩 \c㔉 \c㔩 \c㕉 \c㕩 \c㖉 \c㖩 \c㗉 \c㗩 \c㘉 \c㘩 \c㙉 \c㙩 \c㚉 \c㚩 \c㛉 \c㛩 \c㜉 \c㜩 \c㝉 \c㝩 \c㞉 \c㞩 \c㟉 \c㟩 \c㠉 \c㠩 \c㡉 \c㡩 \c㢉 \c㢩 \c㣉 \c㣩 \c㤉 \c㤩 \c㥉 \c㥩 \c㦉 \c㦩 \c㧉 \c㧩 \c㨉 \c㨩 \c㩉 \c㩩 \c㪉 \c㪩 \c㫉 \c㫩 \c㬉 \c㬩 \c㭉 \c㭩 \c㮉 \c㮩 \c㯉 \c㯩 \c㰉 \c㰩 \c㱉 \c㱩 \c㲉 \c㲩 \c㳉 \c㳩 
\c㴉 \c㴩 \c㵉 \c㵩 \c㶉 \c㶩 \c㷉 \c㷩 \c㸉 \c㸩 \c㹉 \c㹩 \c㺉 \c㺩 \c㻉 \c㻩 \c㼉 \c㼩 \c㽉 \c㽩 \c㾉 \c㾩 \c㿉 \c㿩 \c䀉 \c䀩 \c䁉 \c䁩 \c䂉 \c䂩 \c䃉 \c䃩 \c䄉 \c䄩 \c䅉 \c䅩 \c䆉 \c䆩 \c䇉 \c䇩 \c䈉 \c䈩 \c䉉 \c䉩 \c䊉 \c䊩 \c䋉 \c䋩 \c䌉 \c䌩 \c䍉 \c䍩 \c䎉 \c䎩 \c䏉 \c䏩 \c䐉 \c䐩 \c䑉 \c䑩 \c䒉 \c䒩 \c䓉 \c䓩 \c䔉 \c䔩 \c䕉 \c䕩 \c䖉 \c䖩 \c䗉 \c䗩 \c䘉 \c䘩 \c䙉 \c䙩 \c䚉 \c䚩 \c䛉 \c䛩 \c䜉 \c䜩 \c䝉 \c䝩 \c䞉 \c䞩 \c䟉 \c䟩 \c䠉 \c䠩 \c䡉 \c䡩 \c䢉 \c䢩 \c䣉 \c䣩 \c䤉 \c䤩 \c䥉 \c䥩 \c䦉 \c䦩 \c䧉 \c䧩 \c䨉 \c䨩 \c䩉 \c䩩 \c䪉 \c䪩 \c䫉 \c䫩 \c䬉 \c䬩 \c䭉 \c䭩 \c䮉 \c䮩 \c䯉 \c䯩 \c䰉 \c䰩 \c䱉 \c䱩 \c䲉 \c䲩 \c䳉 \c䳩 \c䴉 \c䴩 \c䵉 \c䵩 \c䶉 \c䶩 \c䷉ \c䷩ \c三 \c丩 \c义 \c乩 \c争 \c亩 \c仉 \c仩 \c伉 \c伩 \c佉 \c佩 \c侉 \c侩 \c俉 \c俩 \c倉 \c倩 \c偉 \c偩 \c傉 \c傩 \c僉 \c僩 \c儉 \c儩 \c光 \c兩 \c冉 \c冩 \c凉 \c凩 \c刉 \c利 \c剉 \c剩 \c劉 \c助 \c勉 \c勩 \c匉 \c匩 \c卉 \c卩 \c厉 \c厩 \c叉 \c叩 \c吉 \c吩 \c呉 \c呩 \c咉 \c咩 \c哉 \c哩 \c唉 \c唩 \c啉 \c啩 \c喉 \c喩 \c嗉 \c嗩 \c嘉 \c嘩 \c噉 \c噩 \c嚉 \c嚩 \c囉 \c囩 \c圉 \c圩 \c坉 \c坩 \c垉 \c垩 \c埉 \c埩 \c堉 \c堩 \c塉 \c塩 \c墉 \c墩 \c壉 \c壩 \c変 \c天 \c奉 \c奩 \c妉 \c妩 \c姉 \c姩 \c娉 \c娩 \c婉 \c婩 \c媉 \c媩 \c嫉 \c嫩 \c嬉 \c嬩 \c孉 \c孩 \c安 \c宩 \c寉 \c審 \c尉 \c尩 \c屉 \c屩 \c岉 \c岩 \c峉 \c峩 \c崉 \c崩 \c嵉 \c嵩 \c嶉 \c嶩 \c巉 \c巩 \c帉 \c帩 \c幉 \c幩 \c庉 \c庩 \c廉 \c廩 \c弉 \c弩 \c彉 \c彩 \c徉 \c復 \c忉 \c忩 \c怉 \c怩 \c恉 \c恩 \c悉 \c悩 \c惉 \c惩 \c愉 \c愩 \c慉 \c慩 \c憉 \c憩 \c應 \c懩 \c戉 \c戩 \c扉 \c扩 \c抉 \c抩 \c拉 \c择 \c按 \c挩 \c捉 \c捩 \c掉 \c掩 \c揉 \c揩 \c搉 \c搩 \c摉 \c摩 \c撉 \c撩 \c擉 \c擩 \c攉 \c攩 \c敉 \c敩 \c斉 \c斩 \c旉 \c早 \c昉 \c昩 \c晉 \c晩 \c暉 \c暩 \c曉 \c曩 \c有 \c朩 \c杉 \c杩 \c枉 \c枩 \c柉 \c柩 \c栉 \c栩 \c桉 \c桩 \c梉 \c梩 \c棉 \c棩 \c椉 \c椩 \c楉 \c楩 \c榉 \c榩 \c槉 \c槩 \c樉 \c権 \c橉 \c橩 \c檉 \c檩 \c櫉 \c櫩 \c欉 \c欩 \c歉 \c歩 \c殉 \c殩 \c毉 \c毩 \c氉 \c氩 \c汉 \c汩 \c沉 \c沩 \c泉 \c泩 \c洉 \c洩 \c浉 \c浩 \c涉 \c涩 \c淉 \c淩 \c渉 \c温 \c湉 \c湩 \c溉 \c溩 \c滉 \c滩 \c漉 \c漩 \c潉 \c潩 \c澉 \c澩 \c濉 \c濩 \c瀉 \c瀩 \c灉 \c灩 \c炉 \c炩 \c烉 \c烩 \c焉 \c焩 \c煉 \c煩 \c熉 \c熩 \c燉 \c燩 \c爉 \c爩 \c牉 \c物 \c犉 \c犩 \c狉 \c狩 \c猉 \c猩 \c獉 \c獩 \c玉 \c玩 \c珉 \c珩 \c琉 \c琩 \c瑉 \c瑩 \c璉 \c璩 \c瓉 \c瓩 \c甉 \c甩 \c畉 \c畩 \c疉 \c疩 \c痉 \c痩 \c瘉 \c瘩 \c癉 \c癩 \c皉 \c皩 \c盉 \c盩 \c眉 \c眩 \c睉 \c睩 \c瞉 \c瞩 \c矉 \c矩 \c砉 \c砩 \c硉 \c硩 \c碉 \c碩 \c磉 \c磩 \c礉 \c礩 \c祉 \c祩 \c禉 \c禩 \c秉 \c秩 \c稉 \c稩 \c穉 \c穩 \c窉 \c窩 \c竉 \c竩 \c笉 \c笩 \c等 \c筩 
\c箉 \c箩 \c築 \c篩 \c簉 \c簩 \c籉 \c籩 \c粉 \c粩 \c糉 \c糩 \c紉 \c紩 \c絉 \c絩 \c綉 \c綩 \c緉 \c緩 \c縉 \c縩 \c繉 \c繩 \c纉 \c纩 \c绉 \c绩 \c缉 \c缩 \c罉 \c罩 \c羉 \c義 \c翉 \c翩 \c耉 \c耩 \c聉 \c聩 \c肉 \c肩 \c胉 \c胩 \c脉 \c脩 \c腉 \c腩 \c膉 \c膩 \c臉 \c臩 \c舉 \c舩 \c艉 \c艩 \c芉 \c芩 \c苉 \c苩 \c茉 \c茩 \c草 \c荩 \c莉 \c莩 \c菉 \c菩 \c萉 \c萩 \c葉 \c葩 \c蒉 \c蒩 \c蓉 \c蓩 \c蔉 \c蔩 \c蕉 \c蕩 \c薉 \c薩 \c藉 \c藩 \c蘉 \c蘩 \c虉 \c虩 \c蚉 \c蚩 \c蛉 \c蛩 \c蜉 \c蜩 \c蝉 \c蝩 \c螉 \c螩 \c蟉 \c蟩 \c蠉 \c蠩 \c衉 \c衩 \c袉 \c袩 \c裉 \c裩 \c褉 \c褩 \c襉 \c襩 \c覉 \c覩 \c觉 \c觩 \c訉 \c訩 \c詉 \c詩 \c誉 \c誩 \c諉 \c諩 \c謉 \c謩 \c證 \c譩 \c讉 \c让 \c诉 \c诩 \c谉 \c谩 \c豉 \c豩 \c貉 \c販 \c賉 \c賩 \c贉 \c贩 \c赉 \c赩 \c趉 \c趩 \c跉 \c跩 \c踉 \c踩 \c蹉 \c蹩 \c躉 \c躩 \c軉 \c軩 \c載 \c輩 \c轉 \c轩 \c辉 \c辩 \c迉 \c迩 \c选 \c逩 \c遉 \c適 \c邉 \c邩 \c郉 \c郩 \c鄉 \c鄩 \c酉 \c酩 \c醉 \c醩 \c釉 \c釩 \c鈉 \c鈩 \c鉉 \c鉩 \c銉 \c銩 \c鋉 \c鋩 \c錉 \c錩 \c鍉 \c鍩 \c鎉 \c鎩 \c鏉 \c鏩 \c鐉 \c鐩 \c鑉 \c鑩 \c钉 \c钩 \c铉 \c铩 \c锉 \c锩 \c镉 \c镩 \c閉 \c閩 \c闉 \c闩 \c阉 \c阩 \c陉 \c险 \c隉 \c隩 \c雉 \c雩 \c霉 \c霩 \c靉 \c革 \c鞉 \c鞩 \c韉 \c韩 \c頉 \c頩 \c顉 \c顩 \c颉 \c颩 \c飉 \c飩 \c餉 \c餩 \c饉 \c饩 \c馉 \c馩 \c駉 \c駩 \c騉 \c騩 \c驉 \c驩 \c骉 \c骩 \c髉 \c髩 \c鬉 \c鬩 \c魉 \c魩 \c鮉 \c鮩 \c鯉 \c鯩 \c鰉 \c鰩 \c鱉 \c鱩 \c鲉 \c鲩 \c鳉 \c鳩 \c鴉 \c鴩 \c鵉 \c鵩 \c鶉 \c鶩 \c鷉 \c鷩 \c鸉 \c鸩 \c鹉 \c鹩 \c麉 \c麩 \c黉 \c黩 \c鼉 \c鼩 \c齉 \c齩 \c龉 \c龩 \c鿉 \c鿩 \cꀉ \cꀩ \cꁉ \cꁩ \cꂉ \cꂩ \cꃉ \cꃩ \cꄉ \cꄩ \cꅉ \cꅩ \cꆉ \cꆩ \cꇉ \cꇩ \cꈉ \cꈩ \cꉉ \cꉩ \cꊉ \cꊩ \cꋉ \cꋩ \cꌉ \cꌩ \cꍉ \cꍩ \cꎉ \cꎩ \cꏉ \cꏩ \cꐉ \cꐩ \cꑉ \cꑩ \cꒉ \c꒩ \cꓩ \cꔉ \cꔩ \cꕉ \cꕩ \cꖉ \cꖩ \cꗉ \cꗩ \cꘉ \c꘩ \cꙉ \cꙩ \cꚉ \cꚩ \cꛉ \cꛩ \c꜉ \cꜩ \cꝉ \cꝩ \c꞉ \cꞩ \cꟉ \cꠉ \c꠩ \cꡉ \cꡩ \cꢉ \cꢩ \c꣩ \c꤉ \cꤩ \cꥉ \cꥩ \cꦉ \cꦩ \c꧉ \cꧩ \cꨉ \cꨩ \cꩉ \cꩩ \cꪉ \cꪩ \cꫩ \cꬉ \cꬩ \cꭉ \cꭩ \cꮉ \cꮩ \cꯉ \cꯩ \cퟩ \c契 \c朗 \c雷 \c數 \c黎 \c囹 \c柳 \c里 \c降 \c﨩 \c爫 \c響 \c憎 \c睊 \c韛 \c﬩ \cשּ \cﭩ \cﮉ \cﮩ \cﯩ \cﰉ \cﰩ \cﱉ \cﱩ \cﲉ \cﲩ \cﳉ \cﳩ \cﴉ \cﴩ \c﵉ \cﵩ \cﶉ \cﶩ \c︉ \c︩ \c﹉ \c﹩ \cﺉ \cﺩ \cﻉ \cﻩ \c) \cI \ci \cゥ \cノ \cᄅ \c←All of those are Unicode characters where the lower five bits are
01001. The intent behind \c☒ was that it would be used with \ci or \cI, as Ctrl+I is a tab.
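One way to sanity-check that claim is to test a handful of the characters from the list and to count how many BMP code points share those low five bits. This Python sketch only checks the bits; it says nothing about which code points are assigned or printable, which is presumably why the list above is shorter than the count:

```python
# A few of the characters from the list above, plus the tab itself.
samples = ["\t", ")", "I", "i", "©", "É", "é", "ĉ", "ĩ", "Ω", "ω", "щ", "三", "光"]

for ch in samples:
    low5 = ord(ch) & 0b11111  # keep only the 5 lowest-order bits
    print(f"U+{ord(ch):04X} {ch!r}: low five bits = {low5:05b}")

# Every code point in U+0000..U+FFFF whose low five bits are 01001.
tab_aliases = [cp for cp in range(0x10000) if cp & 0b11111 == 0b01001]
print(len(tab_aliases))  # one code point in every 32, so 65536 / 32 = 2048
```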
-
Hello, @mkupper, @peterjones and All,
I had a look at the part of the N++ documentation regarding the way to find the C0 Control chars, mentioned by @mkupper! Remember that the Unicode C0 Control characters range is the range [\x00-\x1F], ONLY!
Regarding the \c☒ notation, the Boost regex engine follows the rules of the equivalence table below:

             0020    0040    0060    0080    00A0    00C0    00E0    0100    0120    ...     FF80
             003F    005F    007F    009F    00BF    00DF    00FF    011F    013F    ...     FF9F
\x00  NUL  = \cSP    \c@     \c`     \cPAD   \cNBSP  \cÀ     \cà     \cĀ     \cĠ     ...     \cﾀ     NULL
\x01  SOH  = \c!     \cA     \ca     \cHOP   \c¡     \cÁ     \cá     \cā     \cġ     ...     \cﾁ     START OF HEADING
\x02  STX  = \c"     \cB     \cb     \cBPH   \c¢     \cÂ     \câ     \cĂ     \cĢ     ...     \cﾂ     START OF TEXT
\x03  ETX  = \c#     \cC     \cc     \cNBH   \c£     \cÃ     \cã     \că     \cģ     ...     \cﾃ     END OF TEXT
\x04  EOT  = \c$     \cD     \cd     \cIND   \c¤     \cÄ     \cä     \cĄ     \cĤ     ...     \cﾄ     END OF TRANSMISSION
\x05  ENQ  = \c%     \cE     \ce     \cNEL   \c¥     \cÅ     \cå     \cą     \cĥ     ...     \cﾅ     ENQUIRY
\x06  ACK  = \c&     \cF     \cf     \cSSA   \c¦     \cÆ     \cæ     \cĆ     \cĦ     ...     \cﾆ     ACKNOWLEDGE
\x07  BEL  = \c'     \cG     \cg     \cESA   \c§     \cÇ     \cç     \cć     \cħ     ...     \cﾇ     BELL
\x08  BS   = \c(     \cH     \ch     \cHTS   \c¨     \cÈ     \cè     \cĈ     \cĨ     ...     \cﾈ     BACKSPACE
\x09  TAB  = \c)     \cI     \ci     \cHTJ   \c©     \cÉ     \cé     \cĉ     \cĩ     ...     \cﾉ     HORIZONTAL TABULATION
\x0A  LF   = \c*     \cJ     \cj     \cVTS   \cª     \cÊ     \cê     \cĊ     \cĪ     ...     \cﾊ     LINE FEED
\x0B  VT   = \c+     \cK     \ck     \cPLD   \c«     \cË     \cë     \cċ     \cī     ...     \cﾋ     VERTICAL TABULATION
\x0C  FF   = \c,     \cL     \cl     \cPLU   \c¬     \cÌ     \cì     \cČ     \cĬ     ...     \cﾌ     FORM FEED
\x0D  CR   = \c-     \cM     \cm     \cRI    \cSHY   \cÍ     \cí     \cč     \cĭ     ...     \cﾍ     CARRIAGE RETURN
\x0E  SO   = \c.     \cN     \cn     \cSS2   \c®     \cÎ     \cî     \cĎ     \cĮ     ...     \cﾎ     SHIFT OUT
\x0F  SI   = \c/     \cO     \co     \cSS3   \c¯     \cÏ     \cï     \cď     \cį     ...     \cﾏ     SHIFT IN
\x10  DLE  = \c0     \cP     \cp     \cDCS   \c°     \cÐ     \cð     \cĐ     \cİ     ...     \cﾐ     DATA LINK ESCAPE
\x11  DC1  = \c1     \cQ     \cq     \cPU1   \c±     \cÑ     \cñ     \cđ     \cı     ...     \cﾑ     DEVICE CONTROL 1
\x12  DC2  = \c2     \cR     \cr     \cPU2   \c²     \cÒ     \cò     \cĒ     \cĲ     ...     \cﾒ     DEVICE CONTROL 2
\x13  DC3  = \c3     \cS     \cs     \cSTS   \c³     \cÓ     \có     \cē     \cĳ     ...     \cﾓ     DEVICE CONTROL 3
\x14  DC4  = \c4     \cT     \ct     \cCCH   \c´     \cÔ     \cô     \cĔ     \cĴ     ...     \cﾔ     DEVICE CONTROL 4
\x15  NAK  = \c5     \cU     \cu     \cMW    \cµ     \cÕ     \cõ     \cĕ     \cĵ     ...     \cﾕ     NEGATIVE ACKNOWLEDGE
\x16  SYN  = \c6     \cV     \cv     \cSPA   \c¶     \cÖ     \cö     \cĖ     \cĶ     ...     \cﾖ     SYNCHRONOUS IDLE
\x17  ETB  = \c7     \cW     \cw     \cEPA   \c·     \c×     \c÷     \cė     \cķ     ...     \cﾗ     END OF TRANSMISSION BLOCK
\x18  CAN  = \c8     \cX     \cx     \cSOS   \c¸     \cØ     \cø     \cĘ     \cĸ     ...     \cﾘ     CANCEL
\x19  EM   = \c9     \cY     \cy     \cSGCI  \c¹     \cÙ     \cù     \cę     \cĹ     ...     \cﾙ     END OF MEDIUM
\x1A  SUB  = \c:     \cZ     \cz     \cSCI   \cº     \cÚ     \cú     \cĚ     \cĺ     ...     \cﾚ     SUBSTITUTE
\x1B  ESC  = \c;     \c[     \c{     \cCSI   \c»     \cÛ     \cû     \cě     \cĻ     ...     \cﾛ     ESCAPE
\x1C  FS   = \c<     \c\     \c|     \cST    \c¼     \cÜ     \cü     \cĜ     \cļ     ...     \cﾜ     FILE SEPARATOR
\x1D  GS   = \c=     \c]     \c}     \cOSC   \c½     \cÝ     \cý     \cĝ     \cĽ     ...     \cﾝ     GROUP SEPARATOR
\x1E  RS   = \c>     \c^     \c~     \cPM    \c¾     \cÞ     \cþ     \cĞ     \cľ     ...     \cﾞ     RECORD SEPARATOR
\x1F  US   = \c?     \c_     \cDEL   \cAPC   \c¿     \cß     \cÿ     \cğ     \cĿ     ...     \cﾟ     UNIT SEPARATOR

In the table, SP, NBSP, SHY and DEL stand for the invisible characters U+0020, U+00A0, U+00AD and U+007F.
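Each column of that table is a block of 32 consecutive code points, so any row can be regenerated by adding the C0 value to the start of each block. A Python sketch using the block starts from the table header (the names `BLOCK_STARTS`, `equivalents` and `show` are made up for illustration):

```python
# First code point of each 32-character block named in the table header
# (the blocks from 0x0140 up to 0xFF7F are skipped, like the "..." column).
BLOCK_STARTS = [0x20, 0x40, 0x60, 0x80, 0xA0, 0xC0, 0xE0, 0x100, 0x120, 0xFF80]


def equivalents(c0: int) -> list[str]:
    """Characters X whose \\cX has the same low five bits as the C0 char c0."""
    if not 0x00 <= c0 <= 0x1F:
        raise ValueError("C0 control chars are in the range 0x00-0x1F only")
    return [chr(start + c0) for start in BLOCK_STARTS]


def show(ch: str) -> str:
    """Printable characters as-is; invisible ones (C1, NBSP, SHY, ...) as <U+XXXX>."""
    return ch if ch.isprintable() and not ch.isspace() else f"<U+{ord(ch):04X}>"


for c0 in (0x00, 0x09, 0x0E, 0x11):
    row = " ".join("\\c" + show(ch) for ch in equivalents(c0))
    print(f"\\x{c0:02X} = {row}")
```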
-
Note that the values under the 0080 - 009F column represent the string \c followed by the true C1 Control char (shown by its abbreviation), in the range [\x80-\x9F].
-
So, paradoxically, these C1 Control values may also be used to identify the C0 Control characters!
Thus, for example, if you want to search for any SHIFT OUT control char (SO), you can use any of these regexes:
\x0E, \x{0E} or \x{000E}
\c.
\cN
\cn
\cSS2 (that is, \c followed by the C1 control char U+008E)
\c®
\cÎ
\cî
\cĎ
\cĮ
...
...
...
\cﾎ
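Python’s `re` module has no `\c` escape at all (it rejects `\c` as a bad escape), so to try these equivalences outside Notepad++ you can first rewrite each `\c☒` into the `\x##` form. The helper `expand_control_escapes` below is hypothetical: it models the five-bit rule described in this thread for BMP characters, and refuses characters outside the BMP as guy038 reports below. It is not Boost’s actual parser.

```python
import re


def expand_control_escapes(pattern: str) -> str:
    r"""Hypothetical helper: rewrite each \cX as \xNN, NN being the low five bits of X."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and pattern[i + 1 : i + 2] == "c" and i + 2 < len(pattern):
            target = pattern[i + 2]
            if ord(target) > 0xFFFF:
                # Reported in the thread: \c + a non-BMP char never reaches a C0 char.
                raise ValueError(f"\\c followed by U+{ord(target):X} is outside the BMP")
            out.append(f"\\x{ord(target) & 0x1F:02x}")
            i += 3
        elif ch == "\\" and i + 1 < len(pattern):
            out.append(pattern[i : i + 2])  # copy other escapes (\\, \x0E, ...) unchanged
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


text = "before\x0eafter\x11"
for src in [r"\x0E", r"\c.", r"\cN", r"\cn", "\\c\u00ae", "\\c\uff8e", r"\c1"]:
    regex = expand_control_escapes(src)
    found = re.search(regex, text)
    print(src, "->", regex, "->", repr(found.group()) if found else None)
```

All of the SHIFT OUT forms end up as `\x0e`, while `\c1` becomes `\x11` (DC1), not `\x01` (SOH).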
So, Peter, when you say that the search \c1 matches the SOH char (\x01), it’s not exact. The \c1 search does match the DC1 char (\x11)!
And I confirm that any \c string followed by a char outside the BMP (so over \x{FFFF}) cannot be used to reach a C0 control char!
Best Regards,
guy038
-
@guy038 said in Minor typo in the manual for regex control character \c☒:
So, Peter when you say that the search \c1 matches
I didn’t say that. Most of the Regex documentation was a direct copy/paste from the original Wiki version that the Manual was derived from, including that original phrasing. (It had been edited over time, but the original version still described it essentially the same way.)
I will fix it, but it wasn’t my mistake originally. (Given that 1 and ! are on the same key on US keyboards, whoever typed that line in the original Wiki probably just didn’t hold down the Shift key while trying to type the correct \c! for the SOH.) I will update the manual so it doesn’t use that example at all, and instead just keep the \ca and \cA versions, since those are the ones that are mnemonically helpful.
-
Hi, @mkupper, @peterjones, and All,
Yes, @peterjones, you’re right about it: the \cA and \ca syntaxes seem to be the only pertinent ones, in addition to the \x## notation!
BR
guy038
-
@guy038, @peterjones, and others.
It turns out the \c☒ topic gets fairly messy, far too messy to document the details in the manual. I started playing with ANSI…

\c☒ with ANSI or ASCII codes \x00 to \x7F works well and searches for the lower five bits of the ☒ character. Realistically, you should only do it with A-Z or a-z. Better yet is to use \x## or \x{####} style expressions, as it’s clearer what is being searched for.

A case-sensitive search for \c☒ using ANSI codes \x80 to \xFF matches ANSI codes in the \xE0 to \xFF range, with some exceptions… The logic first extracts the lower five bits of ☒ and then bitwise-ORs that with 11100000 or 0xE0. For example, all of these will match ANSI character 0xEC, which is ì:

Hex     Pattern
\x8C    \cŒ
\xAC    \c¬
\xCC    \cÌ
\xEC    \cì

The lower five bits of the above hex codes \x8C, \xAC, \xCC and \xEC are 01100 or \x0C, and we bitwise-OR that result with 11100000 or 0xE0 to search for \xEC.

It turns out that, with one exception, all of the ANSI characters in the \xE0 to \xFF range are lower case letters. A case-insensitive search for \c☒ using ANSI codes \x80 to \xFF works just like the case-sensitive version I just described, but also matches the upper case forms of the letters in the \xE0 to \xFF range.

The one exception is ANSI character code \xF7, which is the division sign ÷. A search for \c—, \c·, \c×, or \c÷ only matches ÷ when you use a case-insensitive search.

Searching for \c followed by a space (\x20), \c@ (\x40), \c` (\x60), \c€ (\x80), \c followed by a no-break space (\xA0), \cÀ (\xC0), and \cà (\xE0) all match NUL (\x00) in ANSI encoded files. With one exception, they also match NUL (\x{0000}) in UTF-8 encoded files. The exception is that searching for \c€ (\x80) matches \x{000C} (form feed) and not NUL \x{0000}: in a UTF-8 file, € is U+20AC, and the lower five bits of 0x20AC are 01100, i.e. \x0C.

Because searches for \c€ (\x80), \c followed by a no-break space (\xA0), \cÀ (\xC0), and \cà (\xE0) all match NUL (\x00) in ANSI files, you can’t use them to match the lower case à at ANSI character \xE0, nor its upper case À at \xC0.
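The ANSI observations above can be written down as a small model, and the UTF-8 oddity with € falls out of the same five-bit rule once the Unicode code point is used instead of the ANSI byte. This Python sketch models the reported behaviour only (it is not Notepad++’s or Boost’s code), assumes "ANSI" means Windows-1252 (`cp1252`), and the helper name `ansi_case_sensitive_target` is made up:

```python
def ansi_case_sensitive_target(byte: int) -> int:
    """Model of the reported case-sensitive behaviour for \\c + an ANSI byte 0x80-0xFF:
    keep the low five bits and OR them with 0xE0, except that a zero result matched NUL."""
    low5 = byte & 0x1F
    return 0x00 if low5 == 0 else 0xE0 | low5


# The four patterns from the table all target 0xEC, which cp1252 decodes as "ì".
for byte in (0x8C, 0xAC, 0xCC, 0xEC):
    target = ansi_case_sensitive_target(byte)
    print(f"\\x{byte:02X} -> \\x{target:02X} {bytes([target]).decode('cp1252')!r}")

# Assumption: in a UTF-8 file the Unicode code point is masked instead of the ANSI byte.
for ch in [" ", "@", "`", "€", "\u00a0", "À", "à"]:
    ansi = ch.encode("cp1252")[0]
    cp = ord(ch)
    print(f"{ch!r}: ANSI 0x{ansi:02X} -> \\x{ansi & 0x1F:02X}, U+{cp:04X} -> \\x{cp & 0x1F:02X}")
```

Every character in the second loop masks to \x00 on both sides, except € whose code point U+20AC masks to \x0C, the form feed seen in UTF-8 files.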
I also ran across this: while Notepad++ supports searching for \x00 or \x{0000}, both of which match a NUL (\x00 or \x{0000}) in a file, using \x00 or \x{0000} in the replacement part results in the replacement string getting terminated at the NUL character.

As strings are terminated at the NUL, searching for \c~ where the ~ is a NUL (\x00) returns Invalid Regular Expression, with the details being: ASCII escape sequence terminated prematurely. The error occurred while parsing the regular expression: '>>>HERE>>>\c'.
Using a search for xxx and a replacement of aaa\x00zzz or aaa\x{0000}zzz both result in xxx being replaced with aaa, as the replacement string was terminated at the NUL. Apparently the engine first does a pass where it converts the \x☒☒ and \x{☒☒☒☒} forms of characters into the actual character values, meaning \x00 or \x{0000} in a replacement simply terminates the string at that point.

I suspect that bug could be used to add a comment to the replacement!
Search: Hello
Replace: World\x0 This will never happen
Windows also uses NUL as the text string terminator in its copy/paste system.
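mkupper’s explanation (the `\x00` is expanded into a real NUL first, and the string is then treated as NUL-terminated) can be illustrated with Python’s `ctypes`, whose `char` buffers are read back only up to the first NUL. This is a sketch of that guessed mechanism with a made-up helper `as_c_string`; it is not Notepad++’s code:

```python
import ctypes


def as_c_string(text: str) -> str:
    """Copy text into a C char buffer and read it back the way C code would.

    ctypes' .value stops at the first NUL byte, so anything after it is lost.
    Illustration only: this is not how Notepad++ or Boost store strings.
    """
    buffer = ctypes.create_string_buffer(text.encode("utf-8"))
    return buffer.value.decode("utf-8")


# Replacement text after \x00 / \x{0000} has been expanded into a real NUL:
print(repr(as_c_string("aaa\x00zzz")))                         # 'aaa'
print(repr(as_c_string("World\x00 This will never happen")))  # 'World'

# The same cut applied to a search for \c<NUL> leaves a dangling "\c":
print(repr(as_c_string("\\c\x00")))                            # '\\c'
```

The `World\x0` example is written with `\x00` here because a Python `\x` escape needs exactly two hex digits.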