Hello, @jaack-mcmahon, and All,
Here is, below, a NON-exhaustive table of some Unicode characters, with code-point above 007Fh, taken from the following Unicode blocks : Latin-1 Supplement, General Punctuation, Mathematical Operators, Miscellaneous Symbols and Specials.
Each of them can be replaced by a similar standard ASCII character, or a short string of ASCII characters, with code-point < 0080h :
+--------------------------------------------------------------+---------------------------------------------+
|           NON-ASCII Character with Code > \x{007F}           |  Similar Character(s) with Code < \x{0080}  |
+--------------------------------------------------------------+---------------------------------------------+
|  Code  | Char |                Character Name                |  Code  |  Char   |      Character Name      |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
|  00A0  |      |  NO-BREAK SPACE                              |  0020  |         |  SPACE                   |
|  00A6  |  ¦   |  BROKEN BAR                                  |  007C  |    |    |  VERTICAL LINE           |
|  00AB  |  «   |  LEFT-POINTING DOUBLE ANGLE QUOTATION MARK   |  0022  |    "    |  QUOTATION MARK          |
|  00AD  |      |  SOFT HYPHEN                                 |  002D  |    -    |  HYPHEN-MINUS            |
|  00B4  |  ´   |  ACUTE ACCENT                                |  0027  |    '    |  APOSTROPHE              |
|  00B7  |  ·   |  MIDDLE DOT                                  |  002E  |    .    |  FULL STOP               |
|  00BB  |  »   |  RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK  |  0022  |    "    |  QUOTATION MARK          |
|  00BC  |  ¼   |  VULGAR FRACTION ONE QUARTER                 |        |   1/4   |                          |
|  00BD  |  ½   |  VULGAR FRACTION ONE HALF                    |        |   1/2   |                          |
|  00BE  |  ¾   |  VULGAR FRACTION THREE QUARTERS              |        |   3/4   |                          |
|  00D7  |  ×   |  MULTIPLICATION SIGN                         |  0078  |    x    |  LATIN SMALL LETTER X    |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
|  2000  |      |  EN QUAD                                     |        | \x20{2} |                          |
|  2001  |      |  EM QUAD                                     |        | \x20{4} |                          |
|  2002  |      |  EN SPACE                                    |        | \x20{2} |                          |
|  2003  |      |  EM SPACE                                    |        | \x20{4} |                          |
|  2004  |      |  THREE-PER-EM SPACE                          |  0020  |         |  SPACE                   |
|  2005  |      |  FOUR-PER-EM SPACE                           |  0020  |         |  SPACE                   |
|  2007  |      |  FIGURE SPACE                                |        | \x20{2} |                          |
|  2008  |      |  PUNCTUATION SPACE                           |  0020  |         |  SPACE                   |
|  2010  |  ‐   |  HYPHEN                                      |  002D  |    -    |  HYPHEN-MINUS            |
|  2011  |  ‑   |  NON-BREAKING HYPHEN                         |  002D  |    -    |  HYPHEN-MINUS            |
|  2012  |  ‒   |  FIGURE DASH                                 |        |   --    |                          |
|  2013  |  –   |  EN DASH                                     |  002D  |    -    |  HYPHEN-MINUS            |
|  2014  |  —   |  EM DASH                                     |  002D  |    -    |  HYPHEN-MINUS            |
|  2015  |  ―   |  HORIZONTAL BAR                              |  002D  |    -    |  HYPHEN-MINUS            |
|  2016  |  ‖   |  DOUBLE VERTICAL LINE                        |        |   ||    |                          |
|  2018  |  ‘   |  LEFT SINGLE QUOTATION MARK                  |  0027  |    '    |  APOSTROPHE              |
|  2019  |  ’   |  RIGHT SINGLE QUOTATION MARK                 |  0027  |    '    |  APOSTROPHE              |
|  201A  |  ‚   |  SINGLE LOW-9 QUOTATION MARK                 |  002C  |    ,    |  COMMA                   |
|  201B  |  ‛   |  SINGLE HIGH-REVERSED-9 QUOTATION MARK       |  0060  |    `    |  GRAVE ACCENT            |
|  201C  |  “   |  LEFT DOUBLE QUOTATION MARK                  |  0022  |    "    |  QUOTATION MARK          |
|  201D  |  ”   |  RIGHT DOUBLE QUOTATION MARK                 |  0022  |    "    |  QUOTATION MARK          |
|  201E  |  „   |  DOUBLE LOW-9 QUOTATION MARK                 |        |   ,,    |                          |
|  201F  |  ‟   |  DOUBLE HIGH-REVERSED-9 QUOTATION MARK       |  0022  |    "    |  QUOTATION MARK          |
|  2022  |  •   |  BULLET                                      |  002E  |    .    |  FULL STOP               |
|  2024  |  ․   |  ONE DOT LEADER                              |  002E  |    .    |  FULL STOP               |
|  2025  |  ‥   |  TWO DOT LEADER                              |        |   ..    |                          |
|  2026  |  …   |  HORIZONTAL ELLIPSIS                         |        |   ...   |                          |
|  2032  |  ′   |  PRIME                                       |  0027  |    '    |  APOSTROPHE              |
|  2033  |  ″   |  DOUBLE PRIME                                |        |   ''    |                          |
|  2034  |  ‴   |  TRIPLE PRIME                                |        |   '''   |                          |
|  2035  |  ‵   |  REVERSED PRIME                              |  0060  |    `    |  GRAVE ACCENT            |
|  2036  |  ‶   |  REVERSED DOUBLE PRIME                       |        |   ``    |                          |
|  2037  |  ‷   |  REVERSED TRIPLE PRIME                       |        |   ```   |                          |
|  2039  |  ‹   |  SINGLE LEFT-POINTING ANGLE QUOTATION MARK   |  003C  |    <    |  LESS-THAN SIGN          |
|  203A  |  ›   |  SINGLE RIGHT-POINTING ANGLE QUOTATION MARK  |  003E  |    >    |  GREATER-THAN SIGN       |
|  203D  |  ‽   |  INTERROBANG                                 |        |   !?    |                          |
|  2044  |  ⁄   |  FRACTION SLASH                              |  002F  |    /    |  SOLIDUS                 |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
|  2212  |  −   |  MINUS SIGN                                  |  002D  |    -    |  HYPHEN-MINUS            |
|  2215  |  ∕   |  DIVISION SLASH                              |  002F  |    /    |  SOLIDUS                 |
|  2216  |  ∖   |  SET MINUS                                   |  005C  |    \    |  REVERSE SOLIDUS         |
|  2217  |  ∗   |  ASTERISK OPERATOR                           |  002A  |    *    |  ASTERISK                |
|  2223  |  ∣   |  DIVIDES                                     |  007C  |    |    |  VERTICAL LINE           |
|  2225  |  ∥   |  PARALLEL TO                                 |        |   ||    |                          |
|  2227  |  ∧   |  LOGICAL AND                                 |  005E  |    ^    |  CIRCUMFLEX ACCENT       |
|  2228  |  ∨   |  LOGICAL OR                                  |  0056  |    V    |  LATIN CAPITAL LETTER V  |
|  222A  |  ∪   |  UNION                                       |  0055  |    U    |  LATIN CAPITAL LETTER U  |
|  2236  |  ∶   |  RATIO                                       |  003A  |    :    |  COLON                   |
|  2237  |  ∷   |  PROPORTION                                  |        |   ::    |                          |
|  2239  |  ∹   |  EXCESS                                      |        |   -:    |                          |
|  223C  |  ∼   |  TILDE OPERATOR                              |  007E  |    ~    |  TILDE                   |
|  2254  |  ≔   |  COLON EQUALS                                |        |   :=    |                          |
|  2255  |  ≕   |  EQUALS COLON                                |        |   =:    |                          |
|  2264  |  ≤   |  LESS-THAN OR EQUAL TO                       |        |   <=    |                          |
|  2265  |  ≥   |  GREATER-THAN OR EQUAL TO                    |        |   >=    |                          |
|  226A  |  ≪   |  MUCH LESS-THAN                              |        |   <<    |                          |
|  226B  |  ≫   |  MUCH GREATER-THAN                           |        |   >>    |                          |
|  2276  |  ≶   |  LESS-THAN OR GREATER-THAN                   |        |   <|>   |                          |
|  2277  |  ≷   |  GREATER-THAN OR LESS-THAN                   |        |   >|<   |                          |
|  22C0  |  ⋀   |  N-ARY LOGICAL AND                           |  005E  |    ^    |  CIRCUMFLEX ACCENT       |
|  22C1  |  ⋁   |  N-ARY LOGICAL OR                            |  0056  |    V    |  LATIN CAPITAL LETTER V  |
|  22C3  |  ⋃   |  N-ARY UNION                                 |  0055  |    U    |  LATIN CAPITAL LETTER U  |
|  22C5  |  ⋅   |  DOT OPERATOR                                |  002E  |    .    |  FULL STOP               |
|  22C6  |  ⋆   |  STAR OPERATOR                               |  002A  |    *    |  ASTERISK                |
|  22D8  |  ⋘   |  VERY MUCH LESS-THAN                         |        |   <<<   |                          |
|  22D9  |  ⋙   |  VERY MUCH GREATER-THAN                      |        |   >>>   |                          |
|  22EF  |  ⋯   |  MIDLINE HORIZONTAL ELLIPSIS                 |        |   ...   |                          |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
|  2639  |  ☹   |  WHITE FROWNING FACE                         |        |   :-(   |                          |
|  263A  |  ☺   |  WHITE SMILING FACE                          |        |   :-)   |                          |
+--------+------+----------------------------------------------+--------+---------+--------------------------+
|  FFFD  |  �   |  REPLACEMENT CHARACTER                       |  003F  |    ?    |  QUESTION MARK           |
+--------+------+----------------------------------------------+--------+---------+--------------------------+

Now, let’s suppose that, from the list, below, you would like to replace these 14 Unicode characters, on the left, with their similar standard character, on the right :
|  00A6  |  ¦   |  BROKEN BAR                                  |  007C  |    |    |  VERTICAL LINE           |
|  00BD  |  ½   |  VULGAR FRACTION ONE HALF                    |        |   1/2   |                          |
|  2000  |      |  EN QUAD                                     |        | \x20{2} |                          |
|  2001  |      |  EM QUAD                                     |        | \x20{4} |                          |
|  2018  |  ‘   |  LEFT SINGLE QUOTATION MARK                  |  0027  |    '    |  APOSTROPHE              |
|  2019  |  ’   |  RIGHT SINGLE QUOTATION MARK                 |  0027  |    '    |  APOSTROPHE              |
|  201C  |  “   |  LEFT DOUBLE QUOTATION MARK                  |  0022  |    "    |  QUOTATION MARK          |
|  201D  |  ”   |  RIGHT DOUBLE QUOTATION MARK                 |  0022  |    "    |  QUOTATION MARK          |
|  203D  |  ‽   |  INTERROBANG                                 |        |   !?    |                          |
|  2264  |  ≤   |  LESS-THAN OR EQUAL TO                       |        |   <=    |                          |
|  2265  |  ≥   |  GREATER-THAN OR EQUAL TO                    |        |   >=    |                          |
|  2639  |  ☹   |  WHITE FROWNING FACE                         |        |   :-(   |                          |
|  263A  |  ☺   |  WHITE SMILING FACE                          |        |   :-)   |                          |
|  FFFD  |  �   |  REPLACEMENT CHARACTER                       |  003F  |    ?    |  QUESTION MARK           |

Then :
Open the Replace dialog, in N++ ( Ctrl + H )
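Type in the regex (¦)|(½)|(\x{2000})|(\x{2001})|(‘)|(’)|(“)|(”)|(‽)|(≤)|(≥)|(☹)|(☺)|(�), in the Find what: zone ( groups 3 and 4 are written as \x{2000} and \x{2001}, because the EN QUAD and EM QUAD characters are invisible and easily turned into plain spaces when copied )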
Type in the regex (?1|)(?{2}1/2)(?3\x20\x20)(?4\x20\x20\x20\x20)(?5')(?6')(?7")(?8")(?9!?)(?{10}<=)(?{11}>=)(?{12}\:-\()(?{13}\:-\))(?{14}?), in the Replace with: zone
Tick the Wrap around option
Select the Regular expression search mode
Click once on the Replace All button, or several times on the Replace button
Et voilà !
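For anyone who prefers to do the same clean-up outside N++, here is a minimal Python sketch of the 14 replacements listed above. It is only an illustration, not part of the N++ procedure : the names REPLACEMENTS, TABLE and to_ascii are made up for this example, and it relies on the standard str.translate method instead of a regex.

```python
# Mapping copied from the 14-row table above. Keys are written as \u escapes so
# that the invisible characters (EN QUAD, EM QUAD) cannot be lost when copying.
REPLACEMENTS = {
    "\u00A6": "|",     # BROKEN BAR
    "\u00BD": "1/2",   # VULGAR FRACTION ONE HALF
    "\u2000": "  ",    # EN QUAD -> 2 spaces
    "\u2001": "    ",  # EM QUAD -> 4 spaces
    "\u2018": "'",     # LEFT SINGLE QUOTATION MARK
    "\u2019": "'",     # RIGHT SINGLE QUOTATION MARK
    "\u201C": '"',     # LEFT DOUBLE QUOTATION MARK
    "\u201D": '"',     # RIGHT DOUBLE QUOTATION MARK
    "\u203D": "!?",    # INTERROBANG
    "\u2264": "<=",    # LESS-THAN OR EQUAL TO
    "\u2265": ">=",    # GREATER-THAN OR EQUAL TO
    "\u2639": ":-(",   # WHITE FROWNING FACE
    "\u263A": ":-)",   # WHITE SMILING FACE
    "\uFFFD": "?",     # REPLACEMENT CHARACTER
}

# str.maketrans accepts single-character keys and replacement strings of any length.
TABLE = str.maketrans(REPLACEMENTS)


def to_ascii(text: str) -> str:
    """Replace each mapped character with its ASCII look-alike; leave the rest alone."""
    return text.translate(TABLE)


if __name__ == "__main__":
    print(to_ascii("\u201CHalf\u201D is \u00BD \u2264 1 \u263A"))
```

Characters that are not in the mapping are left untouched, so the dictionary can be extended with any other row of the big table.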
Notes :
In the search regex, we simply put each character to be replaced between round parentheses, so that it is stored as group 1, 2 and so on…
In replacement, we use a special conditional syntax (?#xxxx:yyyy) or (?{#..#}xxxx:yyyy), where :
# or #...# represents a group number
The part xxxx is rewritten, if group # or #...# exists
The part yyyy is rewritten, if group # or #...# does not exist
In our case, the ELSE part, in each conditional replacement, is not present
If a part xxxx or yyyy contains the character :, ( or ), it must be escaped ( preceded ) with a \ symbol
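To make the idea "one group per character, one replacement per group number" concrete, here is a small Python sketch. Python's re module has no (?#xxxx:yyyy) conditional replacement, so this is an emulation under that assumption : a callback looks at which group matched and returns the corresponding text. The names PATTERN, BY_GROUP, pick and replace_all are hypothetical, and only four of the fourteen alternatives are shown.

```python
import re

# One capturing group per character, as in the Find what: regex
# (shortened to four alternatives for the example).
PATTERN = re.compile("(\u00A6)|(\u00BD)|(\u2264)|(\u263A)")

# Replacement text indexed by group number. This plays the role of
# (?1|)(?{2}1/2)... in the Replace with: zone.
BY_GROUP = {1: "|", 2: "1/2", 3: "<=", 4: ":-)"}


def pick(match: re.Match) -> str:
    # Exactly one alternative matches, so lastindex is the number of
    # the only group that exists for this match.
    return BY_GROUP[match.lastindex]


def replace_all(text: str) -> str:
    return PATTERN.sub(pick, text)


if __name__ == "__main__":
    print(replace_all("\u00A6 \u00BD \u2264 \u263A"))
```

As in the N++ replacement, there is no ELSE part : a group that did not match simply contributes nothing.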
For the second conditional replacement, I used the syntax (?{2}1/2) on purpose ! Indeed, if I had used the (?21/2) syntax, the regex engine would have wrongly read the group number as 21, and tried to replace a non-existent group 21 with the /2 string !!
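The same kind of ambiguity, a group number immediately followed by a digit, exists in other regex flavours. As an analogy only ( this is Python's re module, not the N++ engine ), the sketch below shows that \g<2>1 means "group 2, then the digit 1", whereas \21 is read as a reference to group 21 and is rejected because the pattern has only two groups. The variable names are made up for the example.

```python
import re

pattern = re.compile(r"(a)(b)")

# Delimited form: group 2, followed by a literal "1".
unambiguous = pattern.sub(r"\g<2>1", "ab")

# Undelimited form: parsed as group 21, which does not exist in this pattern.
try:
    pattern.sub(r"\21", "ab")
    ambiguous_error = None
except re.error as exc:
    ambiguous_error = str(exc)

if __name__ == "__main__":
    print(unambiguous)
    print(ambiguous_error)
```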
To end with, note that quantifiers, such as {#}, do not work in replacement. So we need to replace, for instance, the \x20{2} syntax ( 2 space characters ) with the simple \x20\x20 one !
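If many rows of the table use the \x20{N} shorthand, the expansion can be done mechanically. Here is a minimal Python sketch, assuming the shorthand always has the form \xHH{N} ; the helper name expand_counted is hypothetical and is not an N++ feature.

```python
import re

# Matches a two-digit hex escape followed by a {N} count, e.g. \x20{4}.
COUNTED = re.compile(r"(\\x[0-9A-Fa-f]{2})\{(\d+)\}")


def expand_counted(shorthand: str) -> str:
    """Turn \\x20{2} into \\x20\\x20, the form usable in the Replace with: zone."""
    return COUNTED.sub(lambda m: m.group(1) * int(m.group(2)), shorthand)


if __name__ == "__main__":
    print(expand_counted(r"\x20{4}"))
```

Text without a {N} count is returned unchanged, so the helper can be applied to every replacement string of the table.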
Best Regards,
guy038