Hello, @m-fessler, @mathlete2, @alan-kilborn, @coises and All,
@m-fessler, below is a list of all the special Unicode characters which belong to any of the following :
The Z separator category ( Zs, Zl and Zp categories ), except for the regular SPACE character \x{0020}
The Cc Control character category ( except for the TAB, LF and CR ones )
The Cf Format character category
Two So Other Symbol characters ( \x{FFFC} and \x{FFFD} )
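The selection rules above can be sketched as a small membership test. This is only an illustration in Python, not something taken from any editor: the function name `is_special` and the three constant sets are hypothetical, and the result for recently added characters depends on the Unicode version bundled with your Python build.

```python
import unicodedata

# Categories taken as a whole: space, line and paragraph separators, plus format characters.
SEPARATOR_AND_FORMAT = {"Zs", "Zl", "Zp", "Cf"}

# Characters deliberately left out: TAB, LF and CR, plus the ordinary space,
# which is Zs but is treated here as a normal character (assumption).
EXCLUDED = {"\t", "\n", "\r", " "}

# The two "So" symbols that are added to the selection by hand.
EXTRA_SYMBOLS = {"\ufffc", "\ufffd"}


def is_special(ch):
    """Return True if the single character ch falls under the rules above."""
    if ch in EXCLUDED:
        return False
    if ch in EXTRA_SYMBOLS:
        return True
    category = unicodedata.category(ch)
    return category == "Cc" or category in SEPARATOR_AND_FORMAT
```

For instance, `is_special("\u00a0")` is true for the NO-BREAK SPACE, while `is_special("\t")` and `is_special("A")` are both false.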
The table below contains 121 characters :
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  Code   |  Regex             |  Character                                 |  Abbre.  |  GC  |  Chr.  |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  0000   |  \x{0000}          |  NULL                                      |  NUL     |  Cc  |        |
|  0001   |  \x{0001}          |  START OF HEADING                          |  SOH     |  Cc  |        |
|  0002   |  \x{0002}          |  START OF TEXT                             |  STX     |  Cc  |        |
|  0003   |  \x{0003}          |  END OF TEXT                               |  ETX     |  Cc  |        |
|  0004   |  \x{0004}          |  END OF TRANSMISSION                       |  EOT     |  Cc  |        |
|  0005   |  \x{0005}          |  ENQUIRY                                   |  ENQ     |  Cc  |        |
|  0006   |  \x{0006}          |  ACKNOWLEDGE                               |  ACK     |  Cc  |        |
|  0007   |  \x{0007}          |  BELL                                      |  BEL     |  Cc  |        |
|  0008   |  \x{0008}          |  BACKSPACE                                 |  BS      |  Cc  |        |
|  000B   |  \x{000B}          |  VERTICAL TABULATION                       |  VT      |  Cc  |        |
|  000C   |  \x{000C}          |  FORM FEED                                 |  FF      |  Cc  |        |
|  000E   |  \x{000E}          |  SHIFT OUT                                 |  SO      |  Cc  |        |
|  000F   |  \x{000F}          |  SHIFT IN                                  |  SI      |  Cc  |        |
|  0010   |  \x{0010}          |  DATA LINK ESCAPE                          |  DLE     |  Cc  |        |
|  0011   |  \x{0011}          |  DEVICE CONTROL ONE                        |  DC1     |  Cc  |        |
|  0012   |  \x{0012}          |  DEVICE CONTROL TWO                        |  DC2     |  Cc  |        |
|  0013   |  \x{0013}          |  DEVICE CONTROL THREE                      |  DC3     |  Cc  |        |
|  0014   |  \x{0014}          |  DEVICE CONTROL FOUR                       |  DC4     |  Cc  |        |
|  0015   |  \x{0015}          |  NEGATIVE ACKNOWLEDGE                      |  NAK     |  Cc  |        |
|  0016   |  \x{0016}          |  SYNCHRONOUS IDLE                          |  SYN     |  Cc  |        |
|  0017   |  \x{0017}          |  END OF TRANSMISSION BLOCK                 |  ETB     |  Cc  |        |
|  0018   |  \x{0018}          |  CANCEL                                    |  CAN     |  Cc  |        |
|  0019   |  \x{0019}          |  END OF MEDIUM                             |  EM      |  Cc  |        |
|  001A   |  \x{001A}          |  SUBSTITUTE                                |  SUB     |  Cc  |        |
|  001B   |  \x{001B}          |  ESCAPE                                    |  ESC     |  Cc  |        |
|  001C   |  \x{001C}          |  FILE SEPARATOR                            |  FS      |  Cc  |        |
|  001D   |  \x{001D}          |  GROUP SEPARATOR                           |  GS      |  Cc  |        |
|  001E   |  \x{001E}          |  RECORD SEPARATOR                          |  RS      |  Cc  |        |
|  001F   |  \x{001F}          |  UNIT SEPARATOR                            |  US      |  Cc  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  007F   |  \x{007F}          |  DELETE                                    |  DEL     |  Cc  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  0080   |  \x{0080}          |  PADDING CHARACTER                         |  PAD     |  Cc  |        |
|  0081   |  \x{0081}          |  HIGH OCTET PRESET                         |  HOP     |  Cc  |        |
|  0082   |  \x{0082}          |  BREAK PERMITTED HERE                      |  BPH     |  Cc  |        |
|  0083   |  \x{0083}          |  NO BREAK HERE                             |  NBH     |  Cc  |        |
|  0084   |  \x{0084}          |  INDEX                                     |  IND     |  Cc  |        |
|  0085   |  \x{0085}          |  NEXT LINE                                 |  NEL     |  Cc  |        |
|  0086   |  \x{0086}          |  START OF SELECTED AREA                    |  SSA     |  Cc  |        |
|  0087   |  \x{0087}          |  END OF SELECTED AREA                      |  ESA     |  Cc  |        |
|  0088   |  \x{0088}          |  HORIZONTAL TABULATION SET                 |  HTS     |  Cc  |        |
|  0089   |  \x{0089}          |  HORIZONTAL TABULATION WITH JUSTIFICATION  |  HTJ     |  Cc  |        |
|  008A   |  \x{008A}          |  VERTICAL TABULATION SET                   |  VTS     |  Cc  |        |
|  008B   |  \x{008B}          |  PARTIAL LINE DOWN                         |  PLD     |  Cc  |        |
|  008C   |  \x{008C}          |  PARTIAL LINE UP                           |  PLU     |  Cc  |        |
|  008D   |  \x{008D}          |  REVERSE INDEX                             |  RI      |  Cc  |        |
|  008E   |  \x{008E}          |  SINGLE-SHIFT 2                            |  SS2     |  Cc  |        |
|  008F   |  \x{008F}          |  SINGLE-SHIFT 3                            |  SS3     |  Cc  |        |
|  0090   |  \x{0090}          |  DEVICE CONTROL STRING                     |  DCS     |  Cc  |        |
|  0091   |  \x{0091}          |  PRIVATE USE 1                             |  PU1     |  Cc  |        |
|  0092   |  \x{0092}          |  PRIVATE USE 2                             |  PU2     |  Cc  |        |
|  0093   |  \x{0093}          |  SET TRANSMIT STATE                        |  STS     |  Cc  |        |
|  0094   |  \x{0094}          |  CANCEL CHARACTER                          |  CCH     |  Cc  |        |
|  0095   |  \x{0095}          |  MESSAGE WAITING                           |  MW      |  Cc  |        |
|  0096   |  \x{0096}          |  START OF PROTECTED AREA                   |  SPA     |  Cc  |        |
|  0097   |  \x{0097}          |  END OF PROTECTED AREA                     |  EPA     |  Cc  |        |
|  0098   |  \x{0098}          |  START OF STRING                           |  SOS     |  Cc  |        |
|  0099   |  \x{0099}          |  SINGLE GRAPHIC CHARACTER INTRODUCER       |  SGCI    |  Cc  |        |
|  009A   |  \x{009A}          |  SINGLE CHARACTER INTRODUCER               |  SCI     |  Cc  |        |
|  009B   |  \x{009B}          |  CONTROL SEQUENCE INTRODUCER               |  CSI     |  Cc  |        |
|  009C   |  \x{009C}          |  STRING TERMINATOR                         |  ST      |  Cc  |        |
|  009D   |  \x{009D}          |  OPERATING SYSTEM COMMAND                  |  OSC     |  Cc  |        |
|  009E   |  \x{009E}          |  PRIVACY MESSAGE                           |  PM      |  Cc  |        |
|  009F   |  \x{009F}          |  APPLICATION PROGRAM COMMAND               |  APC     |  Cc  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  00A0   |  \x{00A0}          |  NO-BREAK SPACE                            |  NBSP    |  Zs  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  00AD   |  \x{00AD}          |  SOFT HYPHEN                               |  SHY     |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  061C   |  \x{061C}          |  ARABIC LETTER MARK                        |  ALM     |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  070F   |  \x{070F}          |  SYRIAC ABBREVIATION MARK                  |  SAM     |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  0890   |  \x{0890}          |  ARABIC POUND MARK ABOVE                   |          |  Cf  |        |
|  0891   |  \x{0891}          |  ARABIC PIASTRE MARK ABOVE                 |          |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  1680   |  \x{1680}          |  OGHAM SPACE MARK                          |  OSPM    |  Zs  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  180E   |  \x{180E}          |  MONGOLIAN VOWEL SEPARATOR                 |  MVS     |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  2000   |  \x{2000}          |  EN QUAD                                   |  NQSP    |  Zs  |        |
|  2001   |  \x{2001}          |  EM QUAD                                   |  MQSP    |  Zs  |        |
|  2002   |  \x{2002}          |  EN SPACE                                  |  ENSP    |  Zs  |        |
|  2003   |  \x{2003}          |  EM SPACE                                  |  EMSP    |  Zs  |        |
|  2004   |  \x{2004}          |  THREE-PER-EM SPACE                        |  3/MSP   |  Zs  |        |
|  2005   |  \x{2005}          |  FOUR-PER-EM SPACE                         |  4/MSP   |  Zs  |        |
|  2006   |  \x{2006}          |  SIX-PER-EM SPACE                          |  6/MSP   |  Zs  |        |
|  2007   |  \x{2007}          |  FIGURE SPACE                              |  FSP     |  Zs  |        |
|  2008   |  \x{2008}          |  PUNCTUATION SPACE                         |  PSP     |  Zs  |        |
|  2009   |  \x{2009}          |  THIN SPACE                                |  THSP    |  Zs  |        |
|  200A   |  \x{200A}          |  HAIR SPACE                                |  HSP     |  Zs  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  200B   |  \x{200B}          |  ZERO WIDTH SPACE                          |  ZWSP    |  Cf  |        |
|  200C   |  \x{200C}          |  ZERO WIDTH NON-JOINER                     |  ZWNJ    |  Cf  |        |
|  200D   |  \x{200D}          |  ZERO WIDTH JOINER                         |  ZWJ     |  Cf  |        |
|  200E   |  \x{200E}          |  LEFT-TO-RIGHT MARK                        |  LRM     |  Cf  |        |
|  200F   |  \x{200F}          |  RIGHT-TO-LEFT MARK                        |  RLM     |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  2028   |  \x{2028}          |  LINE SEPARATOR                            |  LS      |  Zl  |        |
|  2029   |  \x{2029}          |  PARAGRAPH SEPARATOR                       |  PS      |  Zp  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  202A   |  \x{202A}          |  LEFT-TO-RIGHT EMBEDDING                   |  LRE     |  Cf  |        |
|  202B   |  \x{202B}          |  RIGHT-TO-LEFT EMBEDDING                   |  RLE     |  Cf  |        |
|  202C   |  \x{202C}          |  POP DIRECTIONAL FORMATTING                |  PDF     |  Cf  |        |
|  202D   |  \x{202D}          |  LEFT-TO-RIGHT OVERRIDE                    |  LRO     |  Cf  |        |
|  202E   |  \x{202E}          |  RIGHT-TO-LEFT OVERRIDE                    |  RLO     |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  202F   |  \x{202F}          |  NARROW NO-BREAK SPACE                     |  NNBSP   |  Zs  |        |
|  205F   |  \x{205F}          |  MEDIUM MATHEMATICAL SPACE                 |  MMSP    |  Zs  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  2060   |  \x{2060}          |  WORD JOINER                               |  WJ      |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  2061   |  \x{2061}          |  FUNCTION APPLICATION                      |  (FA)    |  Cf  |        |
|  2062   |  \x{2062}          |  INVISIBLE TIMES                           |  (IT)    |  Cf  |        |
|  2063   |  \x{2063}          |  INVISIBLE SEPARATOR                       |  (IS)    |  Cf  |        |
|  2064   |  \x{2064}          |  INVISIBLE PLUS                            |  (IP)    |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  2066   |  \x{2066}          |  LEFT-TO-RIGHT ISOLATE                     |  LRI     |  Cf  |        |
|  2067   |  \x{2067}          |  RIGHT-TO-LEFT ISOLATE                     |  RLI     |  Cf  |        |
|  2068   |  \x{2068}          |  FIRST STRONG ISOLATE                      |  FSI     |  Cf  |        |
|  2069   |  \x{2069}          |  POP DIRECTIONAL ISOLATE                   |  PDI     |  Cf  |        |
|  206A   |  \x{206A}          |  INHIBIT SYMMETRIC SWAPPING                |  ISS     |  Cf  |        |
|  206B   |  \x{206B}          |  ACTIVATE SYMMETRIC SWAPPING               |  ASS     |  Cf  |        |
|  206C   |  \x{206C}          |  INHIBIT ARABIC FORM SHAPING               |  IAFS    |  Cf  |        |
|  206D   |  \x{206D}          |  ACTIVATE ARABIC FORM SHAPING              |  AAFS    |  Cf  |        |
|  206E   |  \x{206E}          |  NATIONAL DIGIT SHAPES                     |  NADS    |  Cf  |        |
|  206F   |  \x{206F}          |  NOMINAL DIGIT SHAPES                      |  NODS    |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  3000   |  \x{3000}          |  IDEOGRAPHIC SPACE                         |  IDSP    |  Zs  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  FEFF   |  \x{FEFF}          |  ZERO WIDTH NO-BREAK SPACE                 |  ZWNBSP  |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  FFF9   |  \x{FFF9}          |  INTERLINEAR ANNOTATION ANCHOR             |  IAA     |  Cf  |        |
|  FFFA   |  \x{FFFA}          |  INTERLINEAR ANNOTATION SEPARATOR          |  IAS     |  Cf  |        |
|  FFFB   |  \x{FFFB}          |  INTERLINEAR ANNOTATION TERMINATOR         |  IAT     |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  FFFC   |  \x{FFFC}          |  OBJECT REPLACEMENT CHARACTER              |  OBJ     |  So  |        |
|  FFFD   |  \x{FFFD}          |  REPLACEMENT CHARACTER                     |  ?       |  So  |  �     |
•---------•--------------------•--------------------------------------------•----------•------•--------•
|  1BCA0  |  \x{D82F}\x{DCA0}  |  SHORTHAND FORMAT LETTER OVERLAP           |  SFLO    |  Cf  |        |
|  1BCA1  |  \x{D82F}\x{DCA1}  |  SHORTHAND FORMAT CONTINUING OVERLAP       |  SFCO    |  Cf  |        |
|  1BCA2  |  \x{D82F}\x{DCA2}  |  SHORTHAND FORMAT DOWN STEP                |  SFDS    |  Cf  |        |
|  1BCA3  |  \x{D82F}\x{DCA3}  |  SHORTHAND FORMAT UP STEP                  |  SFUS    |  Cf  |        |
•---------•--------------------•--------------------------------------------•----------•------•--------•

From this list, @m-fessler, which characters do you want to Search / Mark / Replace ?
Moreover, do you want to ignore all the characters outside the BMP ( i.e. above \x{FFFF} ), or should they be treated like any other character ?
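For reference, the reason why the characters above the BMP need two `\x{....}` escapes in the table is that they are written as a UTF-16 surrogate pair. Here is a minimal Python sketch of that arithmetic; the helper names `to_surrogates` and `regex_escape` are hypothetical and only serve the illustration.

```python
def to_surrogates(code_point):
    """Split a code point above the BMP into its UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not above the BMP")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


def regex_escape(code_point):
    """Write a code point in the \\x{....} notation used in the Regex column."""
    if code_point <= 0xFFFF:
        return f"\\x{{{code_point:04X}}}"
    high, low = to_surrogates(code_point)
    return f"\\x{{{high:04X}}}\\x{{{low:04X}}}"


print(regex_escape(0x1BCA0))  # \x{D82F}\x{DCA0}
print(regex_escape(0x00A0))   # \x{00A0}
```

So, a search for such a character is really a search for two consecutive 16-bit units, which is why it cannot be handled exactly like a BMP character.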
Once you know which characters you want to consider, it will be easy to build the appropriate regex search !
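As a rough idea of what such a search could look like, the Python sketch below turns a chosen set of BMP code points into a single character class, with consecutive values collapsed into ranges. The function name `char_class` is hypothetical, and the sketch assumes that your regex engine accepts the `\x{XXXX}` notation inside a `[...]` class. Characters above the BMP are rejected here, as a surrogate pair is two units and cannot sit inside a class.

```python
def char_class(code_points):
    """Build a regex character class such as [\\x{00A0}\\x{2000}-\\x{200A}]."""
    values = sorted(set(code_points))
    if not values:
        raise ValueError("at least one code point is required")
    if values[-1] > 0xFFFF:
        raise ValueError("characters above the BMP cannot go inside a class")

    parts = []
    i = 0
    while i < len(values):
        # Extend j to the end of the run of consecutive code points starting at i.
        j = i
        while j + 1 < len(values) and values[j + 1] == values[j] + 1:
            j += 1
        if j == i:
            parts.append(f"\\x{{{values[i]:04X}}}")
        else:
            parts.append(f"\\x{{{values[i]:04X}}}-\\x{{{values[j]:04X}}}")
        i = j + 1
    return "[" + "".join(parts) + "]"


# Example: NBSP, the EN QUAD to HAIR SPACE block, and NARROW NO-BREAK SPACE.
print(char_class([0x00A0, *range(0x2000, 0x200B), 0x202F]))
# [\x{00A0}\x{2000}-\x{200A}\x{202F}]
```

Any character above the BMP would then be added next to the class as an alternative, for example `[...]|\x{D82F}\x{DCA0}`.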
Best Regards,
guy038