Regexp fails to match UTF-8 characters
- 
 @PeterJones said in Regexp fails to match UTF-8 characters:
 "Unfortunately, in character classes like you mentioned, that means that the characters outside the BMP (at U+10000 and above), while they can be found by ^.+, cannot be found by something that seems equivalent, like ^[\s\S]+"
 Is there a chance that N++ will support it in the near future?
 "Would it work for you to search for “anything (lookahead for followed by a space or EOF)” instead, using something like ^.+?(?=\s|\Z): this matches the first character on all three of the lines from your example data: this regex finds the first 1 or more characters at the beginning of a line that are followed by a space or EOF, which I think is what you intended"
 Thank you for the suggestion, but it was a one-off transformation I needed to do on a text file, and I achieved it by using a different editor.
- 
 You’ll find all 3 examples. 
 (🤣 |😊 |☺ ☺\x{FE0F} )\d
- 
 Hello, @alexolog, @peterjones, @olivier-thomas and All,
 Sorry for my late answer!
 As @PeterJones said, problems arise when searching for Unicode characters which are over the Basic Multilingual Plane (BMP), that is to say with a code-point between \x{10000} and \x{10FFFF} (so over \x{FFFF}).
 For instance, as the code-point of the emoticon 🤣 is over \x{FFFF}:
- 
It cannot be represented with its real regex syntax \x{1F923}, due to a bug of the present Boost regex engine, which does not handle all characters in true 32-bit encoding, but only in UTF-16 encoding :-(( So, searching for \x{1F4A6} results in the error message "Find: Invalid regular expression"
- 
Moreover, the simple regex dot symbol (?-s). cannot match, by itself, a character with Unicode code-point > \x{FFFF} either, as it only consumes a single 16-bit unit!
- 
Of course, if you paste your character directly in the Find what: zone, it does find all occurrences of the ROLLING ON THE FLOOR LAUGHING character!
 BTW, your two emoticons can be found in the lists below:
 https://www.unicode.org/charts/PDF/U1F600.pdf
 https://www.unicode.org/charts/PDF/U1F900.pdf
 Luckily, the UTF-16 coding of characters in our Boost regex engine allows us to code all characters with code-point over \x{FFFF}, thanks to the surrogates mechanism. Refer to the generalities below:
 https://en.wikipedia.org/wiki/UTF-16
 In short, the surrogate pair of a character with Unicode code-point in the range from \x{10000} to \x{10FFFF} can be described by the regex \x{hhhh}\x{iiii}, where D800 <= hhhh <= DBFF and DC00 <= iiii <= DFFF.
 So, if a regex involves the surrogate pair (two 16-bit units) of a character which is over the BMP, our regex engine is able to match it. For instance, as the surrogate pair of the character ROLLING ON THE FLOOR LAUGHING is D83E DD23, the regex \x{D83E}\x{DD23} does find all occurrences of your emoticon character!
 - For a full explanation about the two 16-bit code units, called a surrogate pair, refer to:
 https://en.wikipedia.org/wiki/UTF-16#Code_points_from_U+010000_to_U+10FFFF
 - For the calculation of the surrogate pair of a specific character with code over \x{FFFF}, refer to either of:
 http://www.russellcottrell.com/greek/utilities/SurrogatePairCalculator.htm
 http://www.cogsci.ed.ac.uk/~richard/utf-8.cgi?
 - On our site, get additional information here:
 https://community.notepad-plus-plus.org/post/51068
 https://community.notepad-plus-plus.org/post/43037
 and recently I proposed a Notepad++ macro which replaces any selection of the \xhhhhh syntaxes with their surrogate pair equivalents \x{Dhhh}\x{Diii}! See below:
 https://community.notepad-plus-plus.org/post/57528
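 For readers who prefer to compute the pair themselves rather than use an online calculator, here is a minimal Python sketch of the standard UTF-16 arithmetic. The function names surrogate_pair and npp_escape are made up for this illustration; they are not part of Notepad++ or of the macro mentioned above.

```python
from typing import Tuple


def surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Return the (high, low) UTF-16 surrogates of a code point above U+FFFF."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points U+10000..U+10FFFF have a surrogate pair")
    offset = code_point - 0x10000          # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def npp_escape(code_point: int) -> str:
    """Hypothetical helper: write one code point in the \\x{....} form used in this thread."""
    if code_point <= 0xFFFF:
        return "\\x{%04X}" % code_point
    high, low = surrogate_pair(code_point)
    return "\\x{%04X}\\x{%04X}" % (high, low)


if __name__ == "__main__":
    print(npp_escape(0x1F923))  # ROLLING ON THE FLOOR LAUGHING
    print(npp_escape(0x263A))   # WHITE SMILING FACE, inside the BMP
```

 Run on U+1F923, this gives the same D83E DD23 pair as quoted above.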
 In summary, because of the use of UTF-16, instead of UTF-32, by the present implementation of the Boost Regex library within N++:
- 
Use the simple regex (?-s). to match any standard character, from \x{0000} to \x{FFFF} (so not including the EOL chars nor the Form Feed char \x0c)
- 
IMPORTANT: From the surrogates mechanism explained above, one may think that the regex [\x{D800}-\x{DBFF}][\x{DC00}-\x{DFFF}] should find all the characters with Unicode code-point over \x{FFFF}. Unfortunately, this syntax does not work!? So, we need to use these derived regexes:
- 
(?-s).[\x{DC00}-\x{DFFF}] to match any standard character from \x{10000} to \x{10FFFF}
- 
(?-s).[\x{DC00}-\x{DFFF}]? to match all standard characters, from \x{0000} to \x{10FFFF}
 And:
- 
To match a specific character of the BMP, from \x{0000} to \x{FFFF}, use the regex syntax \x{hhhh}, with four hexadecimal digits
- 
To match a specific character over the BMP, from \x{10000} to \x{10FFFF}, use the equivalent high and low surrogate pair, with the regex syntax \x{<high>}\x{<low>}, replacing <high> and <low> with their exact hexadecimal values, each written with four hexadecimal digits
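 To get a feel for why the dot-plus-low-surrogate regexes work, the behaviour can be imitated outside Notepad++. The Python sketch below is only an emulation under stated assumptions: it turns a text into a string of 16-bit code units, so that Python's own re module (which is not the Boost engine) sees one "character" per UTF-16 unit. The helper names to_code_units and from_code_units are invented for this example.

```python
import re


def to_code_units(text: str) -> str:
    """Return a string whose items are the UTF-16 code units of `text`."""
    raw = text.encode("utf-16-le")
    return "".join(
        chr(int.from_bytes(raw[i:i + 2], "little")) for i in range(0, len(raw), 2)
    )


def from_code_units(units: str) -> str:
    """Inverse of to_code_units: join surrogate pairs back into real characters."""
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


# Same shape as the regexes of the summary, written for Python's re module
ABOVE_BMP = re.compile(r".[\uDC00-\uDFFF]")   # one unit followed by a low surrogate
ANY_CHAR = re.compile(r".[\uDC00-\uDFFF]?")   # any character, BMP or not

if __name__ == "__main__":
    # "a", ROLLING ON THE FLOOR LAUGHING (U+1F923), WHITE SMILING FACE (U+263A)
    units = to_code_units("a\U0001F923\u263A")
    print(len(units))  # number of 16-bit units, not of characters
    print([hex(ord(from_code_units(m))) for m in ABOVE_BMP.findall(units)])
    print([hex(ord(from_code_units(m))) for m in ANY_CHAR.findall(units)])
```

 In this emulation the lone dot only ever consumes one 16-bit unit, which is why the optional low-surrogate class is needed to swallow the second half of a character above \x{FFFF}.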
 
 Now, let’s go back to your example:
 🤣 1
 ☺ ☺️ 2
 😊 3
- 
The first line contains the \x{1F923} character, a space char and the 1 digit
- 
The second line contains the \x{263A} character, a space char, another \x{263A} char, the invisible \x{FE0F} char ( VARIATION SELECTOR-16 ), a space char and, finally, the 2 digit
- 
The third line contains the \x{1F60A} character, a space char and the 3 digit
 So, in order to find the contents of:
- 
The first line, the \x{1F923}\x20\x31 regex must be replaced by the regex \x{D83E}\x{DD23}\x20\x31
- 
The second line, simply use the syntax \x{263A}\x20\x{263A}\x{FE0F}\x20\x32
- 
The third line, the \x{1F60A}\x20\x33 regex must be replaced by the regex \x{D83D}\x{DE0A}\x20\x33
 
 And, in order to find an equivalent to the pseudo-wrong syntaxes ^[^ ]+ and ^[\S]+, use:
 (?-s)^((?!\x20).[\x{DC00}-\x{DFFF}]?)+
 Notes:
- 
As usual, the in-line modifier (?-s) means that any dot will match a single standard char and not EOL chars!
- 
The ^ assertion looks for a beginning of line
- 
As said above, the (.[\x{DC00}-\x{DFFF}]?)+ will find any range of chars, from \x{0000} to \x{10FFFF}
- 
But as we must omit the space char, we place the negative look-ahead (?!\x20) right before the . symbol, standing for any char under \x{10000}
 Best Regards,
 guy038
 A last example, containing your three consecutive emoticons, with a space char and the digit 4:
 🤣☺😊 4
 Then, the exact regex \x{1F923}\x{263A}\x{1F60A}\x20\x34 must be changed to \x{D83E}\x{DD23}\x{263A}\x{D83D}\x{DE0A}\x20\x34 !
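 The hand rewrites of this post can also be automated. The following Python sketch is an illustration only (it is not the Notepad++ macro mentioned in the thread, and the name to_surrogate_escapes is hypothetical): it rewrites every \x{hhhhh} escape above \x{FFFF} into its surrogate pair and leaves everything else alone. It assumes that no backslash in the pattern is itself escaped.

```python
import re

# \x{...} escapes with 5 or 6 hex digits; shorter ones are always inside the BMP
_LONG_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]{5,6})\}")


def to_surrogate_escapes(pattern: str) -> str:
    """Rewrite \\x{hhhhh} escapes above U+FFFF as \\x{high}\\x{low} pairs."""

    def replace(match: "re.Match[str]") -> str:
        code_point = int(match.group(1), 16)
        if not 0x10000 <= code_point <= 0x10FFFF:
            return match.group(0)  # leave BMP or out-of-range values as they are
        offset = code_point - 0x10000
        high = 0xD800 + (offset >> 10)
        low = 0xDC00 + (offset & 0x3FF)
        return "\\x{%04X}\\x{%04X}" % (high, low)

    return _LONG_ESCAPE.sub(replace, pattern)


if __name__ == "__main__":
    for regex in (r"\x{1F923}\x20\x31",
                  r"\x{263A}\x20\x{263A}\x{FE0F}\x20\x32",
                  r"\x{1F923}\x{263A}\x{1F60A}\x20\x34"):
        print(to_surrogate_escapes(regex))
```

 The output can then be pasted in the Find what: zone; the second pattern comes back unchanged, since it only contains BMP characters.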
- 
- 
 @PeterJones said in Regexp fails to match UTF-8 characters:
 Expanding on your data with the U+#### unicode codepoints for the characters
 🤣 1 U+1F923
 ☺ ☺️ 2 U+263A
 😊 3 U+1F60A
 You will notice that Notepad++ can find the U+263A, but not the U+1F923 or U+1F60A.
 @guy038 just recently posted in “Functionlist Help” about how Notepad++ cannot search for those in normal circumstances, and instead has to use the surrogate pairs.
 Unfortunately, in character classes like you mentioned, that means that the characters outside the BMP (at U+10000 and above), while they can be found by ^.+, cannot be found by something that seems equivalent, like ^[\s\S]+
 Would it work for you to search for “anything (lookahead for followed by a space or EOF)” instead, using something like ^.+?(?=\s|\Z): this matches the first character on all three of the lines from your example data: this regex finds the first 1 or more characters at the beginning of a line that are followed by a space or EOF, which I think is what you intended
- 
 @guy038 , Thank you for the detailed explanation! Is there a way to match a string of UTF-8 characters that are not ASCII? Aside: For some reason email notifications of replies do not seem to work. 
- 
 Hello, @alexolog, @peterjones, @olivier-thomas and All,
 First, in my previous post, I said:
 - Use the regex (?-s).[\x{DC00}-\x{DFFF}]? to match all standard characters, from \x{0000} to \x{10FFFF} (so all Unicode chars!)
 You may prefer this simpler syntax (?-s).[\x{DC00}-\x{DFFF}]|. with two alternatives (the former relating to all the characters over the BMP and the latter relating to all the characters within the BMP)
 Now, your question is a bit ambiguous! Do you speak of:
- 
Characters with Unicode code-point over \x{00FF}?
- 
Characters which cannot exist in an ANSI encoded file?
 
 Indeed, an ANSI encoded file may contain characters whose code-point is over \x{00FF}!
 Probably, you’re using the Win-1252 ANSI encoding. To verify this assertion, open the Edit > Character panel. It should be identical to the one shown in this Wikipedia article:
 https://en.wikipedia.org/wiki/Windows-1252#Character_set
 which can be shortened as:
 •---------------•-------•--------•----------•
 |   Win-1252    |       | Unicode|  Code >  |
 |  Dec  |  Hex  | Char. |  C.P.  | \x{00FF} |
 •---------------•-------•--------•----------•
 | 0000  |  00   | <NUL> |  0000  |          |
 | ....  |  ..   | ..... |  ....  |          |
 | ....  |  ..   | ..... |  ....  |          |
 | 0127  |  7F   | <DEL> |  007F  |          |
 •---------------•-------•--------•----------•
 | 0128  |  80   |   €   |  20AC  |   Yes    |
 | 0129  |  81   | <HOP> |  0081  |          |
 | 0130  |  82   |   ‚   |  201A  |   Yes    |
 | 0131  |  83   |   ƒ   |  0192  |   Yes    |
 | 0132  |  84   |   „   |  201E  |   Yes    |
 | 0133  |  85   |   …   |  2026  |   Yes    |
 | 0134  |  86   |   †   |  2020  |   Yes    |
 | 0135  |  87   |   ‡   |  2021  |   Yes    |
 | 0136  |  88   |   ˆ   |  02C6  |   Yes    |
 | 0137  |  89   |   ‰   |  2030  |   Yes    |
 | 0138  |  8A   |   Š   |  0160  |   Yes    |
 | 0139  |  8B   |   ‹   |  2039  |   Yes    |
 | 0140  |  8C   |   Œ   |  0152  |   Yes    |
 | 0141  |  8D   | <RI>  |  008D  |          |
 | 0142  |  8E   |   Ž   |  017D  |   Yes    |
 | 0143  |  8F   | <SS3> |  008F  |          |
 | 0144  |  90   | <DCS> |  0090  |          |
 | 0145  |  91   |   ‘   |  2018  |   Yes    |
 | 0146  |  92   |   ’   |  2019  |   Yes    |
 | 0147  |  93   |   “   |  201C  |   Yes    |
 | 0148  |  94   |   ”   |  201D  |   Yes    |
 | 0149  |  95   |   •   |  2022  |   Yes    |
 | 0150  |  96   |   –   |  2013  |   Yes    |
 | 0151  |  97   |   —   |  2014  |   Yes    |
 | 0152  |  98   |   ˜   |  02DC  |   Yes    |
 | 0153  |  99   |   ™   |  2122  |   Yes    |
 | 0154  |  9A   |   š   |  0161  |   Yes    |
 | 0155  |  9B   |   ›   |  203A  |   Yes    |
 | 0156  |  9C   |   œ   |  0153  |   Yes    |
 | 0157  |  9D   | <OSC> |  009D  |          |
 | 0158  |  9E   |   ž   |  017E  |   Yes    |
 | 0159  |  9F   |   Ÿ   |  0178  |   Yes    |
 •---------------•-------•--------•----------•
 | 0160  |  A0   | <NBSP>|  00A0  |          |
 | ....  |  ..   | ..... |  ....  |          |
 | ....  |  ..   | ..... |  ....  |          |
 | 0255  |  FF   |   ÿ   |  00FF  |          |
 •---------------•-------•--------•----------•
 If we sort this table by Unicode code-point ascending, we get:
 •---------------•-------•--------•----------•
 |   Win-1252    |       | Unicode|  Code >  |
 |  Dec  |  Hex  | Char. |  C.P.  | \x{00FF} |
 •---------------•-------•--------•----------•
 | 0000  |  00   | <NUL> |  0000  |          |
 | ....  |  ..   | ..... |  ....  |          |
 | ....  |  ..   | ..... |  ....  |          |
 | 0127  |  7F   | <DEL> |  007F  |          |
 •---------------•-------•--------•----------•
 | 0129  |  81   | <HOP> |  0081  |          |
 | 0141  |  8D   | <RI>  |  008D  |          |
 | 0143  |  8F   | <SS3> |  008F  |          |
 | 0144  |  90   | <DCS> |  0090  |          |
 | 0157  |  9D   | <OSC> |  009D  |          |
 •---------------•-------•--------•----------•
 | 0160  |  A0   | <NBSP>|  00A0  |          |
 | ....  |  ..   | ..... |  ....  |          |
 | ....  |  ..   | ..... |  ....  |          |
 | 0255  |  FF   |   ÿ   |  00FF  |          |
 •---------------•-------•--------•----------•
 | 0140  |  8C   |   Œ   |  0152  |   Yes    |
 | 0156  |  9C   |   œ   |  0153  |   Yes    |
 | 0138  |  8A   |   Š   |  0160  |   Yes    |
 | 0154  |  9A   |   š   |  0161  |   Yes    |
 | 0159  |  9F   |   Ÿ   |  0178  |   Yes    |
 | 0142  |  8E   |   Ž   |  017D  |   Yes    |
 | 0158  |  9E   |   ž   |  017E  |   Yes    |
 | 0131  |  83   |   ƒ   |  0192  |   Yes    |
 | 0136  |  88   |   ˆ   |  02C6  |   Yes    |
 | 0152  |  98   |   ˜   |  02DC  |   Yes    |
 | 0150  |  96   |   –   |  2013  |   Yes    |
 | 0151  |  97   |   —   |  2014  |   Yes    |
 | 0145  |  91   |   ‘   |  2018  |   Yes    |
 | 0146  |  92   |   ’   |  2019  |   Yes    |
 | 0130  |  82   |   ‚   |  201A  |   Yes    |
 | 0147  |  93   |   “   |  201C  |   Yes    |
 | 0148  |  94   |   ”   |  201D  |   Yes    |
 | 0132  |  84   |   „   |  201E  |   Yes    |
 | 0134  |  86   |   †   |  2020  |   Yes    |
 | 0135  |  87   |   ‡   |  2021  |   Yes    |
 | 0149  |  95   |   •   |  2022  |   Yes    |
 | 0133  |  85   |   …   |  2026  |   Yes    |
 | 0137  |  89   |   ‰   |  2030  |   Yes    |
 | 0139  |  8B   |   ‹   |  2039  |   Yes    |
 | 0155  |  9B   |   ›   |  203A  |   Yes    |
 | 0128  |  80   |   €   |  20AC  |   Yes    |
 | 0153  |  99   |   ™   |  2122  |   Yes    |
 •---------------•-------•--------•----------•
 So, if you want to detect all strings:
 - Containing characters with code-point over \x{00FF} only, use the regex:
 (?-s)(.[\x{DC00}-\x{DFFF}]|[[:unicode:]])+
 (Note the Posix character class [[:unicode:]])
 - Containing characters not involved in the Win-1252 ANSI encoding at all, use the regex:
 (?-s)(.[\x{DC00}-\x{DFFF}]|[^\x{0000}-\x{007F}\x{0081}\x{008D}\x{008F}\x{0090}\x{009D}\x{00A0}-\x{00FF}ŒœŠšŸŽžƒˆ˜–—‘’‚“”„†‡•…‰‹›€™])+
 Beware: It’s important to point out that this second regex skips, for instance, classical letters, digits, space, tabulation and usual symbols as well!! In other words, it will find any character not present in the Character column of the ASCII Codes Insertion Panel (Edit > Character Panel)
 Best Regards,
 Cheers,
 guy038
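 Outside Notepad++, the same "not in Win-1252" test can be approximated by simply asking a codec. The Python sketch below is an illustration under that assumption; the function names are invented here. One difference from the table above, as far as I know: Python's cp1252 codec treats the five unassigned bytes 81, 8D, 8F, 90 and 9D as undefined, so it reports the matching control characters as not encodable.

```python
from typing import List


def in_cp1252(char: str) -> bool:
    """True if the single character can be stored in a Windows-1252 file."""
    try:
        char.encode("cp1252")
        return True
    except UnicodeEncodeError:
        return False


def runs_outside_cp1252(text: str) -> List[str]:
    """Collect the maximal runs of characters that Windows-1252 cannot represent."""
    runs: List[str] = []
    current = ""
    for char in text:
        if in_cp1252(char):
            if current:
                runs.append(current)
                current = ""
        else:
            current += char
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    # U+0152 and U+20AC exist in Win-1252; U+1F923 and U+03A9 do not
    sample = "\u0152uvre \U0001F923\u03A9 \u20AC"
    print([ascii(run) for run in runs_outside_cp1252(sample)])
```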
- 
 Hi, @alexolog, @peterjones, @olivier-thomas and All,
 Ouuuups, sorry! I read your post too quickly and I thought that you were asking the question:
 Is there a way to match a string of UTF-8 characters that are not ANSI?
 So, if you mean a way to detect strings of characters which are not pure ASCII (so over \x{007F}), use the regex:
 (?-s)(.[\x{DC00}-\x{DFFF}]|[^\x{0000}-\x{007F}])+
 Again, this regex will not match classical letters, digits, space, tabulation and usual symbols. Only characters with Unicode code-point over \x{007F}!
 In other words, it will not match any character of that list:
 https://en.wikipedia.org/wiki/ASCII#Character_set
 BR
 guy038
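 For comparison, in a regex engine that works on whole code points, the surrogate alternative is not needed at all. A minimal Python sketch, assuming Python's standard re module rather than Notepad++ (the names NON_ASCII_RUN and non_ascii_runs are made up for the example):

```python
import re
from typing import List

# One or more consecutive characters above U+007F
NON_ASCII_RUN = re.compile(r"[^\x00-\x7F]+")


def non_ascii_runs(text: str) -> List[str]:
    """Return every maximal run of non-ASCII characters found in `text`."""
    return NON_ASCII_RUN.findall(text)


if __name__ == "__main__":
    # U+1F923 and U+263A are adjacent, U+00E9 stands alone
    sample = "abc \U0001F923\u263A d\u00E9f"
    print([ascii(run) for run in non_ascii_runs(sample)])
```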
- 
 Thank you! 
- 
 Since you seem to have a good grasp on this topic… I was reading this with some interest: 
 https://github.com/notepad-plus-plus/notepad-plus-plus/issues/5558
 It is true indeed that:
 The emoji sequence “👩❤️💋👩” does not seem to be rendered as a sequence on notepad++
 It is rendered as 4 characters: 👩❤️💋👩
 Do you have any idea on why this is?
- 
 Hello, @alan-kilborn and All, Ah ah ! Alan You made me discover something I didn’t know existed : the creation of a new Emoji character from a small Emoji characters set ! I found out all that story, but we need to describe some technical data, first ! 
 In this article, below, it is said:
 https://en.wikipedia.org/wiki/Zero-width_joiner
 The zero-width joiner (ZWJ) is a non-printing character used in the computerized typesetting of some complex scripts such as the Arabic script or any Indic script or, sometimes, the Roman script. When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.
 When a ZWJ char (\x{200D}) is placed between two emoji characters (or interspersed between multiple), it can result in a single glyph being shown, such as the family emoji, made up of two adult emoji and one or two child emoji.
 Similarly, in this article, below, it is said:
 https://en.wikipedia.org/wiki/Zero-width_non-joiner
 The zero-width non-joiner (ZWNJ) is a non-printing character used in the computerization of writing systems that make use of ligatures. When a ZWNJ char (\x{200C}) is placed between two characters that would otherwise be connected into a ligature, it causes them to be printed in their final and initial forms, respectively.
 On the other hand, in this article, it is said:
 https://www.unicode.org/charts/PDF/UFE00.pdf
 The variation selector-16 (VS-16) is an invisible code-point which specifies that the preceding character should be displayed with the emoji presentation. Only required if the preceding character defaults to text presentation.
 For instance, as the ❤ heart character ( \x{2764} ) pre-dates the emoji characters, it needs this variation selector after it, to tell systems to use the ❤️ emoji version ( \x{2764}\x{FE0F} ), not the ❤︎ text version!
 Similarly, the variation selector-15 (VS-15) is an invisible code-point which specifies that the preceding character should be displayed with the text presentation. Only required if the preceding character defaults to emoji presentation.
 Now, in this page, we can read:
 https://emojipedia.org/emoji-zwj-sequence/
 An Emoji ZWJ Sequence is a combination of multiple emojis which display as a single emoji on supported platforms. These sequences are joined with a Zero Width Joiner character.
 To learn how this feature works, refer to:
 https://blog.emojipedia.org/emoji-zwj-sequences-three-letters-many-possibilities/
 To be exhaustive, the different special characters involved with the Emoji characters are:
 - The 2 Format Characters, U+200C and U+200D, in the Unicode block General Punctuation ( U+2000 - U+206F )
 https://www.unicode.org/charts/PDF/U2000.pdf
 - The 26 Regional Indicator Symbols, U+1F1E6 - U+1F1FF, in the Unicode block Enclosed Alphanumeric Supplement ( U+1F100 - U+1F1FF )
 https://www.unicode.org/charts/PDF/U1F100.pdf
 - The 5 Emoji Modifiers, U+1F3FB - U+1F3FF, in the Unicode block Miscellaneous Symbols and Pictographs ( U+1F300 - U+1F5FF )
 https://www.unicode.org/charts/PDF/U1F300.pdf
 - The 4 Emoji Components, U+1F9B0 - U+1F9B3, in the Unicode block Supplemental Symbols and Pictographs ( U+1F900 - U+1F9FF )
 https://www.unicode.org/charts/PDF/U1F900.pdf
 - The 2 Emoji Variation Selectors, U+FE0E and U+FE0F, in the Unicode block Variation Selectors ( U+FE00 - U+FE0F )
 https://www.unicode.org/charts/PDF/UFE00.pdf
 Now that we have the technical background, let’s come back to your example!
 In fact, the emoji 👩❤️💋👩 character, of juliodcs, is the combination of:
 👩 emoji + ZWJ char + ❤️ dingbat + VS-16 char + ZWJ char + 💋 emoji + ZWJ char + 👩 emoji
 and can be found with the following regex (where I use the free-spacing mode for readability):
 (?x)
 \x{D83D}\x{DC69}   # Woman Emoji                      U+1F469
 \x{200D}           # ZWJ character                    U+200D
 \x{2764}           # Heavy Black Heart dingbat        U+2764
 \x{FE0F}           # Variation Selector-16 character  U+FE0F
 \x{200D}           # ZWJ character                    U+200D
 \x{D83D}\x{DC8B}   # Kiss Mark Emoji                  U+1F48B
 \x{200D}           # ZWJ character                    U+200D
 \x{D83D}\x{DC69}   # Woman Emoji                      U+1F469
 IMPORTANT: Don’t forget that, in order to search for characters with code-point over U+FFFF, our regex engine needs to use the Surrogate Pairs mechanism, explained in this post:
 https://community.notepad-plus-plus.org/post/57591
 Note also that the Variation Selector-16 character does not seem necessary to the sequence. So we end with this sequence described in this page:
 https://emojipedia.org/kiss-woman-woman/
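 To check what a composite emoji is really made of, one can list its code points. Here is a small Python sketch using the standard unicodedata module; the sequence is written with explicit escapes so that nothing is lost by copy/paste, and the helper names describe and utf16_length are assumptions of this example.

```python
import unicodedata
from typing import List, Tuple

# Woman + ZWJ + Heavy Black Heart + VS-16 + ZWJ + Kiss Mark + ZWJ + Woman
KISS = "\U0001F469\u200D\u2764\uFE0F\u200D\U0001F48B\u200D\U0001F469"


def describe(text: str) -> List[Tuple[str, str]]:
    """Return (U+XXXX, Unicode name) for every code point of `text`."""
    return [("U+%04X" % ord(ch), unicodedata.name(ch, "<unnamed>")) for ch in text]


def utf16_length(text: str) -> int:
    """Number of 16-bit units, i.e. of \\x{....} items a UTF-16 based regex needs."""
    return len(text.encode("utf-16-le")) // 2


if __name__ == "__main__":
    for code_point, name in describe(KISS):
        print(code_point, name)
    print(len(KISS), "code points,", utf16_length(KISS), "UTF-16 units")
```

 The three characters above U+FFFF each count for two units, which is why the regex above has more \x{....} items than the sequence has code points.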
 Another example: from these four emoji characters, below:
- U+1F468 = \x{D83D}\x{DC68} => 👨 Man
- U+1F469 = \x{D83D}\x{DC69} => 👩 Woman
- U+1F466 = \x{D83D}\x{DC66} => 👦 Boy
- U+1F467 = \x{D83D}\x{DC67} => 👧 Girl
 We can build this composite emoji 👨👩👧👦 ( ZWJ sequence ) = 👨 emoji + ZWJ char + 👩 emoji + ZWJ char + 👧 emoji + ZWJ char + 👦 emoji
 which can be searched with the following regex, in free-spacing mode:
 (?x)
 \x{D83D}\x{DC68}   # Man
 \x{200D}           # ZWJ ( Zero Width Joiner )
 \x{D83D}\x{DC69}   # Woman
 \x{200D}           # ZWJ ( Zero Width Joiner )
 \x{D83D}\x{DC67}   # Girl
 \x{200D}           # ZWJ ( Zero Width Joiner )
 \x{D83D}\x{DC66}   # Boy
 Remark: It’s important to point out that the ZWJ sequence of emojis 👨 emoji + ZWJ char + 👩 emoji + ZWJ char + 👦 emoji + ZWJ char + 👧 emoji does not give the expected result!
 Indeed, it is just output as the ZWJ sequence 👨 emoji + ZWJ char + 👩 emoji + ZWJ char + 👦 emoji, followed by the single 👧 emoji! That is to say, the emoji sequence 👨👩👦👧
 A third example, using the Emoji modifiers. From these chars:
- U+1F3FB = \x{D83C}\x{DFFB} => 🏻 EMOJI MODIFIER FITZPATRICK TYPE-1-2
- U+1F3FC = \x{D83C}\x{DFFC} => 🏼 EMOJI MODIFIER FITZPATRICK TYPE-3
- U+1F3FD = \x{D83C}\x{DFFD} => 🏽 EMOJI MODIFIER FITZPATRICK TYPE-4
- U+1F3FE = \x{D83C}\x{DFFE} => 🏾 EMOJI MODIFIER FITZPATRICK TYPE-5
- U+1F3FF = \x{D83C}\x{DFFF} => 🏿 EMOJI MODIFIER FITZPATRICK TYPE-6
 We can build the following emoji characters of a girl ( 👧 emoji ) with different skin tones:
 \x{D83D}\x{DC67}\x{200D}\x{D83C}\x{DFFB} => 👧🏻 Girl with a light skin tone
 \x{D83D}\x{DC67}\x{200D}\x{D83C}\x{DFFC} => 👧🏼 Girl with a medium-light skin tone
 \x{D83D}\x{DC67}\x{200D}\x{D83C}\x{DFFD} => 👧🏽 Girl with a medium skin tone
 \x{D83D}\x{DC67}\x{200D}\x{D83C}\x{DFFE} => 👧🏾 Girl with a medium-dark skin tone
 \x{D83D}\x{DC67}\x{200D}\x{D83C}\x{DFFF} => 👧🏿 Girl with a dark skin tone
 Note the function of the ZWJ and ZWNJ format characters:
- 
With the ZWJ char, the emoji sequence 👧 emoji + ZWJ char + 🏽 emoji modifier is displayed as the composite 👧🏽 emoji
- 
With the ZWNJ char, the emoji sequence 👧 emoji + ZWNJ char + 🏽 emoji modifier is displayed as the two single emojis 👧🏽
- 
However, I noticed that, without these format chars, the sequence 👧 emoji + 🏽 emoji modifier is also output as the composite emoji 👧🏽 !
 
 A fourth example, using the Regional Indicator symbols. From these chars, below:
- U+1F1E7 = \x{D83C}\x{DDE7} => 🇧 Regional Indicator Symbol Letter B
- U+1F1EB = \x{D83C}\x{DDEB} => 🇫 Regional Indicator Symbol Letter F
- U+1F1EC = \x{D83C}\x{DDEC} => 🇬 Regional Indicator Symbol Letter G
- U+1F1F4 = \x{D83C}\x{DDF4} => 🇴 Regional Indicator Symbol Letter O
- U+1F1F7 = \x{D83C}\x{DDF7} => 🇷 Regional Indicator Symbol Letter R
- U+1F1F8 = \x{D83C}\x{DDF8} => 🇸 Regional Indicator Symbol Letter S
- U+1F1FA = \x{D83C}\x{DDFA} => 🇺 Regional Indicator Symbol Letter U
 We can build, for instance, these flags : - 
The French flag 🇫🇷 from 🇫 and 🇷 Regional indicators ( \x{D83C}\x{DDEB}\x{200D}\x{D83C}\x{DDF7})
- 
The United States flag 🇺🇸 from 🇺 and 🇸 Regional indicators ( \x{D83C}\x{DDFA}\x{200D}\x{D83C}\x{DDF8})
- 
The United Kingdom flag 🇬🇧 from 🇬 and 🇧 Regional indicators ( \x{D83C}\x{DDEC}\x{200D}\x{D83C}\x{DDE7})
- 
The Faroe Islands flag 🇫🇴 from 🇫 and 🇴 Regional indicators ( \x{D83C}\x{DDEB}\x{200D}\x{D83C}\x{DDF4})
- 
The Brazil flag 🇧🇷 from 🇧 and 🇷 Regional indicators ( \x{D83C}\x{DDE7}\x{200D}\x{D83C}\x{DDF7})
- 
The Uganda flag 🇺🇬 from 🇺 and 🇬 Regional indicators ( \x{D83C}\x{DDFA}\x{200D}\x{D83C}\x{DDEC})
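 Remark: You may omit the ZWJ character between the two regional indicator characters! Indeed, the flag sequences described in the Unicode Technical Standard #51 consist of the two regional indicators alone, with no joiner between them.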
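 As the regional indicators simply follow the alphabet from U+1F1E6 ( letter A ), a flag and its search regex can be derived from a two-letter country code. The Python sketch below is only an illustration: the function names are invented, and it assumes the plain form of a flag, that is to say the two regional indicators in a row without any ZWJ char.

```python
def regional_indicators(country_code: str) -> str:
    """Map a two-letter code such as 'FR' to its two regional indicator chars."""
    code = country_code.upper()
    if len(code) != 2 or not all("A" <= letter <= "Z" for letter in code):
        raise ValueError("expected two letters from A to Z")
    # U+1F1E6 is REGIONAL INDICATOR SYMBOL LETTER A
    return "".join(chr(0x1F1E6 + ord(letter) - ord("A")) for letter in code)


def npp_regex(text: str) -> str:
    """Hypothetical helper: spell `text` with \\x{....} escapes, using surrogate pairs."""
    parts = []
    for char in text:
        code_point = ord(char)
        if code_point <= 0xFFFF:
            parts.append("\\x{%04X}" % code_point)
        else:
            offset = code_point - 0x10000
            parts.append("\\x{%04X}\\x{%04X}"
                         % (0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)))
    return "".join(parts)


if __name__ == "__main__":
    for code in ("FR", "US", "GB", "FO", "BR", "UG"):
        print(code, npp_regex(regional_indicators(code)))
```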
 Note the function of the VS-15 and VS-16 Variation Selector characters. For instance, as emoji sequences can be represented as black and white text or as coloured emojis:
- 
With the VS-15 ( \x{FE0E} ) char, the text representation is selected => the sequence ( ℹ Information Source char + VS-15 char + 👨 emoji ) \x{2139}\x{FE0E}\x{D83D}\x{DC68} returns the ℹ︎👨 sequence
- 
With the VS-16 ( \x{FE0F} ) char, the emoji representation is selected => the sequence ( ℹ Information Source char + VS-16 char + 👨 emoji ) \x{2139}\x{FE0F}\x{D83D}\x{DC68} returns the ℹ️👨 sequence
 
 To end, you’ll find a list of all Emoji characters, either individual or composite, below : And, for paranoid people, refer to the Unicode Technical Standard #51 : https://www.unicode.org/reports/tr51/ Best Regards, guy038 
- 
 Wow. 
 More to it than I’d have thought.
 Thanks for the insight.

