Hello, @jaack-mcmahon, and All,
Here is, below, a NON-exhaustive table of some Unicode characters, with code-points above 007Fh, taken from the following Unicode blocks:
Latin-1 Supplement
General Punctuation
Mathematical Operators
Miscellaneous Symbols
Specials
which can be replaced by one or more similar standard ASCII characters, with code-points below 0080h:
+--------------------------------------------------------------+---------------------------------------------+
| NON-ASCII Character with Code > \x{007F} | Similar Character(s) with Code < \x{0080} |
+--------------------------------------------------------------+---------------------------------------------+
| Code | Char | Character Name | Code | Char | Character Name |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
| 00A0 | | NO-BREAK SPACE | 0020 | | SPACE |
| 00A6 | ¦ | BROKEN BAR | 007C | | | VERTICAL LINE |
| 00AB | « | LEFT-POINTING DOUBLE ANGLE QUOTATION MARK | 0022 | " | QUOTATION MARK |
| 00AD | | SOFT HYPHEN | 002D | - | HYPHEN-MINUS |
| 00B4 | ´ | ACUTE ACCENT | 0027 | ' | APOSTROPHE |
| 00B7 | · | MIDDLE DOT | 002E | . | FULL STOP |
| 00BB | » | RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK | 0022 | " | QUOTATION MARK |
| 00BC | ¼ | VULGAR FRACTION ONE QUARTER | | 1/4 | |
| 00BD | ½ | VULGAR FRACTION ONE HALF | | 1/2 | |
| 00BE | ¾ | VULGAR FRACTION THREE QUARTERS | | 3/4 | |
| 00D7 | × | MULTIPLICATION SIGN | 0078 | x | LATIN SMALL LETTER X |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
| 2000 | | EN QUAD | | \x20{2} | |
| 2001 | | EM QUAD | | \x20{4} | |
| 2002 | | EN SPACE | | \x20{2} | |
| 2003 | | EM SPACE | | \x20{4} | |
| 2004 | | THREE-PER-EM SPACE | 0020 | | SPACE |
| 2005 | | FOUR-PER-EM SPACE | 0020 | | SPACE |
| 2007 | | FIGURE SPACE | | \x20{2} | |
| 2008 | | PUNCTUATION SPACE | 0020 | | SPACE |
| 2010 | ‐ | HYPHEN | 002D | - | HYPHEN-MINUS |
| 2011 | ‑ | NON-BREAKING HYPHEN | 002D | - | HYPHEN-MINUS |
| 2012 | ‒ | FIGURE DASH | | -- | |
| 2013 | – | EN DASH | 002D | - | HYPHEN-MINUS |
| 2014 | — | EM DASH | 002D | - | HYPHEN-MINUS |
| 2015 | ― | HORIZONTAL BAR | 002D | - | HYPHEN-MINUS |
| 2016 | ‖ | DOUBLE VERTICAL LINE | | || | |
| 2018 | ‘ | LEFT SINGLE QUOTATION MARK | 0027 | ' | APOSTROPHE |
| 2019 | ’ | RIGHT SINGLE QUOTATION MARK | 0027 | ' | APOSTROPHE |
| 201A | ‚ | SINGLE LOW-9 QUOTATION MARK | 002C | , | COMMA |
| 201B | ‛ | SINGLE HIGH-REVERSED-9 QUOTATION MARK | 0060 | ` | GRAVE ACCENT |
| 201C | “ | LEFT DOUBLE QUOTATION MARK | 0022 | " | QUOTATION MARK |
| 201D | ” | RIGHT DOUBLE QUOTATION MARK | 0022 | " | QUOTATION MARK |
| 201E | „ | DOUBLE LOW-9 QUOTATION MARK | | ,, | |
| 201F | ‟ | DOUBLE HIGH-REVERSED-9 QUOTATION MARK | 0022 | " | QUOTATION MARK |
| 2022 | • | BULLET | 002E | . | FULL STOP |
| 2024 | ․ | ONE DOT LEADER | 002E | . | FULL STOP |
| 2025 | ‥ | TWO DOT LEADER | | .. | |
| 2026 | … | HORIZONTAL ELLIPSIS | | ... | |
| 2032 | ′ | PRIME | 0027 | ' | APOSTROPHE |
| 2033 | ″ | DOUBLE PRIME | | '' | |
| 2034 | ‴ | TRIPLE PRIME | | ''' | |
| 2035 | ‵ | REVERSED PRIME | 0060 | ` | GRAVE ACCENT |
| 2036 | ‶ | REVERSED DOUBLE PRIME | | `` | |
| 2037 | ‷ | REVERSED TRIPLE PRIME | | ``` | |
| 2039 | ‹ | SINGLE LEFT-POINTING ANGLE QUOTATION MARK | 003C | < | LESS-THAN SIGN |
| 203A | › | SINGLE RIGHT-POINTING ANGLE QUOTATION MARK | 003E | > | GREATER-THAN SIGN |
| 203D | ‽ | INTERROBANG | | !? | |
| 2044 | ⁄ | FRACTION SLASH | 002F | / | SOLIDUS |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
| 2212 | − | MINUS SIGN | 002D | - | HYPHEN-MINUS |
| 2215 | ∕ | DIVISION SLASH | 002F | / | SOLIDUS |
| 2216 | ∖ | SET MINUS | 005C | \ | REVERSE SOLIDUS |
| 2217 | ∗ | ASTERISK OPERATOR | 002A | * | ASTERISK |
| 2223 | ∣ | DIVIDES | 007C | | | VERTICAL LINE |
| 2225 | ∥ | PARALLEL TO | | || | |
| 2227 | ∧ | LOGICAL AND | 005E | ^ | CIRCUMFLEX ACCENT |
| 2228 | ∨ | LOGICAL OR | 0056 | V | LATIN CAPITAL LETTER V |
| 222A | ∪ | UNION | 0055 | U | LATIN CAPITAL LETTER U |
| 2236 | ∶ | RATIO | 003A | : | COLON |
| 2237 | ∷ | PROPORTION | | :: | |
| 2239 | ∹ | EXCESS | | -: | |
| 223C | ∼ | TILDE OPERATOR | 007E | ~ | TILDE |
| 2254 | ≔ | COLON EQUALS | | := | |
| 2255 | ≕ | EQUALS COLON | | =: | |
| 2264 | ≤ | LESS-THAN OR EQUAL TO | | <= | |
| 2265 | ≥ | GREATER-THAN OR EQUAL TO | | >= | |
| 226A | ≪ | MUCH LESS-THAN | | << | |
| 226B | ≫ | MUCH GREATER-THAN | | >> | |
| 2276 | ≶ | LESS-THAN OR GREATER-THAN | | <|> | |
| 2277 | ≷ | GREATER-THAN OR LESS-THAN | | >|< | |
| 22C0 | ⋀ | N-ARY LOGICAL AND | 005E | ^ | CIRCUMFLEX ACCENT |
| 22C1 | ⋁ | N-ARY LOGICAL OR | 0056 | V | LATIN CAPITAL LETTER V |
| 22C3 | ⋃ | N-ARY UNION | 0055 | U | LATIN CAPITAL LETTER U |
| 22C5 | ⋅ | DOT OPERATOR | 002E | . | FULL STOP |
| 22C6 | ⋆ | STAR OPERATOR | 002A | * | ASTERISK |
| 22D8 | ⋘ | VERY MUCH LESS-THAN | | <<< | |
| 22D9 | ⋙ | VERY MUCH GREATER-THAN | | >>> | |
| 22EF | ⋯ | MIDLINE HORIZONTAL ELLIPSIS | | ... | |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
| 2639 | ☹ | WHITE FROWNING FACE | | :-( | |
| 263A | ☺ | WHITE SMILING FACE | | :-) | |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
| FFFD | � | REPLACEMENT CHARACTER | 003F | ? | QUESTION MARK |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
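To double-check rows of the table outside Notepad++, a short Python 3 sketch can compare each character name against the Unicode database (`unicodedata.name`), check that the code point lies in one of the listed blocks, and confirm that the replacement is pure ASCII. The `ROWS` dict holds only a sample of the table, and the block ranges are typed in by hand; both are illustrative assumptions, not part of the N++ recipe.

```python
import unicodedata

# A sample of rows copied from the table: code point -> (name, ASCII replacement).
# Illustrative subset only; the remaining rows can be added the same way.
ROWS = {
    0x00A0: ("NO-BREAK SPACE", " "),
    0x00BD: ("VULGAR FRACTION ONE HALF", "1/2"),
    0x2001: ("EM QUAD", "    "),
    0x2014: ("EM DASH", "-"),
    0x2026: ("HORIZONTAL ELLIPSIS", "..."),
    0x2264: ("LESS-THAN OR EQUAL TO", "<="),
    0x263A: ("WHITE SMILING FACE", ":-)"),
    0xFFFD: ("REPLACEMENT CHARACTER", "?"),
}

# The Unicode blocks named in the post, as inclusive code-point ranges.
BLOCKS = {
    "Latin-1 Supplement": (0x0080, 0x00FF),
    "General Punctuation": (0x2000, 0x206F),
    "Mathematical Operators": (0x2200, 0x22FF),
    "Miscellaneous Symbols": (0x2600, 0x26FF),
    "Specials": (0xFFF0, 0xFFFF),
}

def block_of(cp):
    """Return the name of the listed block containing cp, or None."""
    for name, (lo, hi) in BLOCKS.items():
        if lo <= cp <= hi:
            return name
    return None

def check_rows(rows):
    """Return a list of problems; an empty list means every row checks out."""
    problems = []
    for cp, (name, repl) in rows.items():
        if unicodedata.name(chr(cp)) != name:
            problems.append(f"U+{cp:04X}: name mismatch")
        if block_of(cp) is None:
            problems.append(f"U+{cp:04X}: not in a listed block")
        if not all(ord(c) < 0x80 for c in repl):
            problems.append(f"U+{cp:04X}: replacement is not pure ASCII")
    return problems

print(check_rows(ROWS))  # prints [] when every sampled row is consistent
```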
Now, let's suppose that, from the table above, you would like to replace the following 14 Unicode characters, on the left, with their similar standard characters, on the right:
| 00A6 | ¦ | BROKEN BAR | 007C | | | VERTICAL LINE |
| 00BD | ½ | VULGAR FRACTION ONE HALF | | 1/2 | |
| 2000 | | EN QUAD | | \x20{2} | |
| 2001 | | EM QUAD | | \x20{4} | |
| 2018 | ‘ | LEFT SINGLE QUOTATION MARK | 0027 | ' | APOSTROPHE |
| 2019 | ’ | RIGHT SINGLE QUOTATION MARK | 0027 | ' | APOSTROPHE |
| 201C | “ | LEFT DOUBLE QUOTATION MARK | 0022 | " | QUOTATION MARK |
| 201D | ” | RIGHT DOUBLE QUOTATION MARK | 0022 | " | QUOTATION MARK |
| 203D | ‽ | INTERROBANG | | !? | |
| 2264 | ≤ | LESS-THAN OR EQUAL TO | | <= | |
| 2265 | ≥ | GREATER-THAN OR EQUAL TO | | >= | |
| 2639 | ☹ | WHITE FROWNING FACE | | :-( | |
| 263A | ☺ | WHITE SMILING FACE | | :-) | |
| FFFD | � | REPLACEMENT CHARACTER | 003F | ? | QUESTION MARK |
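For comparison, if the text can be processed outside Notepad++, the same 14 replacements fit in a Python 3 `str.translate()` table, which handles one-to-many replacements (such as ½ to `1/2`) in a single pass. This is a sketch with a hypothetical `to_ascii` helper, not part of the N++ method; the invisible spaces are written as `\u` escapes so they cannot be lost when copying.

```python
# The 14 replacements chosen above, as a str.translate() table.
# str.maketrans turns the one-character keys into code points.
REPLACEMENTS = str.maketrans({
    "\u00A6": "|",      # BROKEN BAR
    "\u00BD": "1/2",    # VULGAR FRACTION ONE HALF
    "\u2000": "  ",     # EN QUAD -> 2 spaces
    "\u2001": "    ",   # EM QUAD -> 4 spaces
    "\u2018": "'",      # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",      # RIGHT SINGLE QUOTATION MARK
    "\u201C": '"',      # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',      # RIGHT DOUBLE QUOTATION MARK
    "\u203D": "!?",     # INTERROBANG
    "\u2264": "<=",     # LESS-THAN OR EQUAL TO
    "\u2265": ">=",     # GREATER-THAN OR EQUAL TO
    "\u2639": ":-(",    # WHITE FROWNING FACE
    "\u263A": ":-)",    # WHITE SMILING FACE
    "\uFFFD": "?",      # REPLACEMENT CHARACTER
})

def to_ascii(text):
    """Replace each of the 14 characters; all other characters are kept."""
    return text.translate(REPLACEMENTS)

print(to_ascii("\u201CHi\u201D \u2264 \u00BD \u263A"))  # prints "Hi" <= 1/2 :-)
```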
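Then:
Open the Replace dialog in N++ (Ctrl + H)
Type in the regex (¦)|(½)|(\x{2000})|(\x{2001})|(‘)|(’)|(“)|(”)|(‽)|(≤)|(≥)|(☹)|(☺)|(�), in the Find what: zone (the \x{2000} and \x{2001} escapes stand for the invisible EN QUAD and EM QUAD characters)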
Type in the regex (?1|)(?{2}1/2)(?3\x20\x20)(?4\x20\x20\x20\x20)(?5')(?6')(?7")(?8")(?9!?)(?{10}<=)(?{11}>=)(?{12}\:-\()(?{13}\:-\))(?{14}?), in the Replace with: zone
Tick the Wrap around option
Select the Regular expression search mode
Click once on the Replace All button, or several times on the Replace button
Et voilà!
Notes:
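The same idea, one capture group per character and a replacement chosen according to which group matched, can be sketched with Python's `re` module. Here a callback plays the role of the (?N...) conditionals: since exactly one group participates in each match, `m.lastindex` gives its number. This is a Python analogue for illustration (the `replace_all` and `BY_GROUP` names are hypothetical), not how Notepad++ works internally.

```python
import re

# Same alternation as the Find what: regex, one capture group per character.
# \u escapes are used so the invisible EN QUAD / EM QUAD are not lost.
PATTERN = re.compile(
    "(\u00A6)|(\u00BD)|(\u2000)|(\u2001)|(\u2018)|(\u2019)|(\u201C)"
    "|(\u201D)|(\u203D)|(\u2264)|(\u2265)|(\u2639)|(\u263A)|(\uFFFD)"
)

# Replacement text for groups 1 to 14, in the same order as the
# (?1...)(?{2}...)...(?{14}...) conditionals of the Replace with: zone.
BY_GROUP = ["|", "1/2", "  ", "    ", "'", "'", '"', '"',
            "!?", "<=", ">=", ":-(", ":-)", "?"]

def replace_all(text):
    # Only one group matches at a time, so m.lastindex is that group's number.
    return PATTERN.sub(lambda m: BY_GROUP[m.lastindex - 1], text)

print(replace_all("\u2018a\u2019\u2000b \u203D"))  # prints 'a'  b !?
```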
In the search regex, we simply put each character to be replaced between round parentheses, so that it is stored as group 1, 2, and so on…
In the replacement, we use a special conditional syntax, (?#xxxx:yyyy) or (?{#...#}xxxx:yyyy), where:
# or #...# represents a group number (#...# standing for a number of one or more digits, written between braces)
The part xxxx is written to the output if group # (or #...#) has matched
The part yyyy is written to the output if group # (or #...#) has not matched
In our case, no conditional replacement has an ELSE part (:yyyy), because nothing must be written when a group has not matched
If a part xxxx or yyyy contains one of the characters :, ( or ), that character must be escaped, i.e. preceded by a \ symbol, as in (?{12}\:-\()
For the second conditional replacement, I used the syntax (?{2}1/2) on purpose! Indeed, had I used the (?21/2) syntax, the regex engine would have wrongly read 21 as the group number and written the /2 string only when group 21 matched. As there is no group 21, each ½ character would simply have been deleted!!
Finally, note that quantifiers, such as {#}, do not work in the replacement. So we need to change, for instance, the \x20{2} syntax (2 space characters) into the plain \x20\x20 one!
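To see why the braces matter, the parsing rule from the notes can be mimicked in a few lines of Python. `parse_conditional` and `expand` are hypothetical helpers written for illustration only: they model just the plain (?#xxxx:yyyy) and (?{#...#}xxxx:yyyy) forms, ignore backslash escapes, and assume (as an approximation of the Boost engine used by N++) that an unbraced group number is read greedily, up to two digits.

```python
import re

# (?{digits}...) or (?d...)/(?dd...), then the if-part, then an optional :else-part.
# Assumption: without braces, up to two digits are read greedily as the group number.
CONDITIONAL = re.compile(r"\(\?(?:\{(\d+)\}|(\d{1,2}))([^:]*)(?::(.*))?\)")

def parse_conditional(expr):
    """Split a conditional into (group_number, if_part, else_part).

    Hypothetical helper; backslash-escaped characters are not handled.
    """
    m = CONDITIONAL.fullmatch(expr)
    if m is None:
        raise ValueError(f"not a conditional: {expr!r}")
    number = int(m.group(1) or m.group(2))
    return number, m.group(3), m.group(4) or ""

def expand(expr, matched_groups):
    """Return the if-part when the group matched, otherwise the else-part."""
    number, if_part, else_part = parse_conditional(expr)
    return if_part if number in matched_groups else else_part

# Suppose the ½ character was found, so only group 2 matched:
print(parse_conditional("(?{2}1/2)"))  # (2, '1/2', '')
print(parse_conditional("(?21/2)"))    # (21, '/2', '')
print(expand("(?{2}1/2)", {2}))        # 1/2
print(expand("(?21/2)", {2}))          # empty line: group 21 never matched
```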
Best Regards,
guy038