Lexicographically or Alphabetically
-
Should "Lexicographically" perhaps be "Alphabetically" in these strings?
<Item id="42059" name="Sort Lines Lexicographically Ascending"/>
<Item id="42060" name="Sort Lines Lexicographically Descending"/>
I don't know the difference. I would just think "Alphabetically" is the better-known word, but maybe it only applies to A-Z and not to other characters.
-
Hello, @scootergrisen, and All,
Strictly speaking, the N++ sort feature sorts by the Unicode code points of the characters!
Take, for instance, these few characters, picked at random from my Courier New font (v2.90), which I sorted alphabetically by their official Unicode names:
ﯿ U+FBFF ARABIC LETTER FARSI YEH MEDIAL FORM
ج U+062C ARABIC LETTER JEEM
ق U+0642 ARABIC LETTER QAF
♫ U+266B BEAMED EIGHTH NOTES
♣ U+2663 BLACK CLUB SUIT
┼ U+253C BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
U+0009 CHARACTER TABULATION
̀ U+0300 COMBINING GRAVE ACCENT
© U+00A9 COPYRIGHT SIGN
Ќ U+040C CYRILLIC CAPITAL LETTER KJE
к U+043A CYRILLIC SMALL LETTER KA
2 U+0032 DIGIT TWO
= U+003D EQUALS SIGN
€ U+20AC EURO SIGN
≥ U+2265 GREATER-THAN OR EQUAL TO
Δ U+0394 GREEK CAPITAL LETTER DELTA
δ U+03B4 GREEK SMALL LETTER DELTA
א U+05D0 HEBREW LETTER ALEF
אָ U+FB2F HEBREW LETTER ALEF WITH QAMATS
ר U+05E8 HEBREW LETTER RESH
Ă U+0102 LATIN CAPITAL LETTER A WITH BREVE
Ầ U+1EA6 LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE
É U+00C9 LATIN CAPITAL LETTER E WITH ACUTE
N U+004E LATIN CAPITAL LETTER N
ă U+0103 LATIN SMALL LETTER A WITH BREVE
ầ U+1EA7 LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE
é U+00E9 LATIN SMALL LETTER E WITH ACUTE
n U+006E LATIN SMALL LETTER N
ø U+00F8 LATIN SMALL LETTER O WITH STROKE
“ U+201C LEFT DOUBLE QUOTATION MARK
_ U+005F LOW LINE
ˉ U+02C9 MODIFIER LETTER MACRON
× U+00D7 MULTIPLICATION SIGN
￼ U+FFFC OBJECT REPLACEMENT CHARACTER
% U+0025 PERCENT SIGN
” U+201D RIGHT DOUBLE QUOTATION MARK
→ U+2192 RIGHTWARDS ARROW
U+0020 SPACE
√ U+221A SQUARE ROOT
™ U+2122 TRADE MARK SIGN
| U+007C VERTICAL LINE
½ U+00BD VULGAR FRACTION ONE HALF
☺ U+263A WHITE SMILING FACE
U+FEFF ZERO WIDTH NO-BREAK SPACE
If you use the option Edit > Line Operations > Sort Lines Lexicographically Ascending, this list becomes:
U+0009 CHARACTER TABULATION
U+0020 SPACE
% U+0025 PERCENT SIGN
2 U+0032 DIGIT TWO
= U+003D EQUALS SIGN
N U+004E LATIN CAPITAL LETTER N
_ U+005F LOW LINE
n U+006E LATIN SMALL LETTER N
| U+007C VERTICAL LINE
© U+00A9 COPYRIGHT SIGN
½ U+00BD VULGAR FRACTION ONE HALF
É U+00C9 LATIN CAPITAL LETTER E WITH ACUTE
× U+00D7 MULTIPLICATION SIGN
é U+00E9 LATIN SMALL LETTER E WITH ACUTE
ø U+00F8 LATIN SMALL LETTER O WITH STROKE
Ă U+0102 LATIN CAPITAL LETTER A WITH BREVE
ă U+0103 LATIN SMALL LETTER A WITH BREVE
ˉ U+02C9 MODIFIER LETTER MACRON
̀ U+0300 COMBINING GRAVE ACCENT
Δ U+0394 GREEK CAPITAL LETTER DELTA
δ U+03B4 GREEK SMALL LETTER DELTA
Ќ U+040C CYRILLIC CAPITAL LETTER KJE
к U+043A CYRILLIC SMALL LETTER KA
א U+05D0 HEBREW LETTER ALEF
ר U+05E8 HEBREW LETTER RESH
ج U+062C ARABIC LETTER JEEM
ق U+0642 ARABIC LETTER QAF
Ầ U+1EA6 LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE
ầ U+1EA7 LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE
“ U+201C LEFT DOUBLE QUOTATION MARK
” U+201D RIGHT DOUBLE QUOTATION MARK
€ U+20AC EURO SIGN
™ U+2122 TRADE MARK SIGN
→ U+2192 RIGHTWARDS ARROW
√ U+221A SQUARE ROOT
≥ U+2265 GREATER-THAN OR EQUAL TO
┼ U+253C BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
☺ U+263A WHITE SMILING FACE
♣ U+2663 BLACK CLUB SUIT
♫ U+266B BEAMED EIGHTH NOTES
אָ U+FB2F HEBREW LETTER ALEF WITH QAMATS
ﯿ U+FBFF ARABIC LETTER FARSI YEH MEDIAL FORM
U+FEFF ZERO WIDTH NO-BREAK SPACE
￼ U+FFFC OBJECT REPLACEMENT CHARACTER
And it's obvious that the list is now sorted according to the U+#### value of each character! So, the correct wording should be "Sort Lines by Unicode values Ascending / Descending" :-D
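To make the point concrete, here is a small Python sketch. It relies on the fact that Python compares strings code point by code point, so a plain `sorted()` reproduces the ordering described above. It is only an illustration, not Notepad++'s actual implementation, and the six sample lines are a subset I picked from the list (non-ASCII characters are written as escapes so they survive copy and paste).

```python
# Sample lines taken from the list above; each starts with the character itself.
lines = [
    "\u20ac EURO SIGN",
    "n LATIN SMALL LETTER N",
    "\u0394 GREEK CAPITAL LETTER DELTA",
    "2 DIGIT TWO",
    "\u00e9 LATIN SMALL LETTER E WITH ACUTE",
    "N LATIN CAPITAL LETTER N",
]

# Python's default string ordering compares code points, one by one.
by_code_point = sorted(lines)

# Show the code point of the first character of each sorted line.
code_points = [f"U+{ord(line[0]):04X}" for line in by_code_point]

for cp, line in zip(code_points, by_code_point):
    print(cp, line)
```

The printed code points come out in increasing numeric order, with the upper-case N before the lower-case n and the accented é after both.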
BTW, it would be nice if we could sort according to our local language. For instance, this original list of some French words:
ère école bateau euro colis eau ferme élu à emploi lit embarras émoi zoo elle errer avion été sceau ébène
is currently sorted as:
avion bateau colis eau elle embarras emploi errer euro ferme lit sceau zoo à ère ébène école élu émoi été
However, the correct order, as found in a French dictionary, is:
à avion bateau colis eau ébène école elle élu embarras émoi emploi ère errer été euro ferme lit sceau zoo
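As a rough illustration of the difference, the Python sketch below sorts the same words with a key that ignores accents first and falls back to the original spelling only to break ties. This is an approximation I am assuming for the example, not real French collation (a full solution would use the Unicode Collation Algorithm, for instance through ICU), and the helper names `strip_accents` and `dictionary_key` are hypothetical.

```python
import unicodedata

def strip_accents(word):
    """Decompose the word (NFD) and drop the combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def dictionary_key(word):
    """Compare the unaccented, case-folded word first, then the original."""
    return (strip_accents(word).casefold(), word)

words = ("ère école bateau euro colis eau ferme élu à emploi lit "
         "embarras émoi zoo elle errer avion été sceau ébène").split()

code_point_order = sorted(words)                      # what the editor does today
dictionary_order = sorted(words, key=dictionary_key)  # approximate dictionary order

print(" ".join(code_point_order))
print(" ".join(dictionary_order))
```

For this particular word list, the second line matches the dictionary order quoted above, because no two words differ only by their accents.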
In the same way, the regex
(?-i)[e-f]
should match, for instance, the lower-case letters e and f and all accented Latin forms of the letter e. In other words, it should be equivalent to the regex
(?-i)[eèéêëēĕėęěẹẻẽếềểễệ℮f]
if I consider only the Courier New font!
Best Regards,
guy038
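For readers who want to try the [e-f] idea, here is a hedged Python sketch that builds such a character class automatically. It assumes that "accented form of e" means "any character whose canonical decomposition (NFD) starts with e", which is my own criterion and not something the regex engine of N++ does. Under that assumption the estimated symbol ℮ (U+212E) is not included, since it does not decompose to e. The helper name `accented_forms` is hypothetical, and Python's `re` module has no global `(?-i)` switch, so the pattern is simply compiled without the ignore-case flag.

```python
import re
import unicodedata

def accented_forms(base, limit=0x2000):
    """Return base plus every character below `limit` whose NFD form starts with base."""
    return "".join(
        ch for ch in map(chr, range(limit))
        if unicodedata.normalize("NFD", ch)[0] == base
    )

# All lower-case e variants found below U+2000, starting with plain "e".
e_forms = accented_forms("e")

# Case-sensitive class: every e variant, plus the letter f.
pattern = re.compile("[" + re.escape(e_forms) + "f]")

print(e_forms)
print(bool(pattern.fullmatch("\u00e9")))  # é
print(bool(pattern.fullmatch("E")))       # upper-case E is not in the class
```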
-
@guy038 :
english_customizable.xml:
<Item id="42059" name="Sort Lines Lexicographically Ascending"/>
<Item id="42060" name="Sort Lines Lexicographically Descending"/>
->
<Item id="42059" name="Sort Lines by Unicode values Ascending"/>
<Item id="42060" name="Sort Lines by Unicode values Descending"/>
=
:-)