    Lexicographically or Alphabetically

    Translation
    3 Posts 3 Posters 5.1k Views
      scootergrisen

      Should "Lexicographically" perhaps be "Alphabetically" in these strings?
      <Item id="42059" name="Sort Lines Lexicographically Ascending"/>
      <Item id="42060" name="Sort Lines Lexicographically Descending"/>

      I don't know the difference. I would just think "Alphabetically" is the better-known word, but maybe that only applies to A-Z and not to other characters.

        guy038

        Hello, @scootergrisen, and All,

        Strictly speaking, the N++ sort feature sorts lines by the Unicode code points of their characters!

        Take, for instance, these few characters, picked at random from my Courier New font, v2.90, which I sorted alphabetically by their official Unicode names:

           ﯿ   U+FBFF   ARABIC LETTER FARSI YEH MEDIAL FORM
           ج   U+062C   ARABIC LETTER JEEM
           ق   U+0642   ARABIC LETTER QAF
           ♫   U+266B   BEAMED EIGHTH NOTES
           ♣   U+2663   BLACK CLUB SUIT
           ┼   U+253C   BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
           	   U+0009   CHARACTER TABULATION 
           ̀   U+0300   COMBINING GRAVE ACCENT
           ©   U+00A9   COPYRIGHT SIGN
           Ќ   U+040C   CYRILLIC CAPITAL LETTER KJE
           к   U+043A   CYRILLIC SMALL LETTER KA
           2   U+0032   DIGIT TWO
           =   U+003D   EQUALS SIGN
           €   U+20AC   EURO SIGN
           ≥   U+2265   GREATER-THAN OR EQUAL TO
           Δ   U+0394   GREEK CAPITAL LETTER DELTA
           δ   U+03B4   GREEK SMALL LETTER DELTA
           א   U+05D0   HEBREW LETTER ALEF
           אָ   U+FB2F   HEBREW LETTER ALEF WITH QAMATS
           ר   U+05E8   HEBREW LETTER RESH
           Ă   U+0102   LATIN CAPITAL LETTER A WITH BREVE
           Ầ   U+1EA6   LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE
           É   U+00C9   LATIN CAPITAL LETTER E WITH ACUTE
           N   U+004E   LATIN CAPITAL LETTER N
           ă   U+0103   LATIN SMALL LETTER A WITH BREVE
           ầ   U+1EA7   LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE
           é   U+00E9   LATIN SMALL LETTER E WITH ACUTE
           n   U+006E   LATIN SMALL LETTER N
           ø   U+00F8   LATIN SMALL LETTER O WITH STROKE
           “   U+201C   LEFT DOUBLE QUOTATION MARK
           _   U+005F   LOW LINE
           ˉ   U+02C9   MODIFIER LETTER MACRON
           ×   U+00D7   MULTIPLICATION SIGN
              U+FFFC   OBJECT REPLACEMENT CHARACTER
           %   U+0025   PERCENT SIGN
           ”   U+201D   RIGHT DOUBLE QUOTATION MARK
           →   U+2192   RIGHTWARDS ARROW
               U+0020   SPACE
           √   U+221A   SQUARE ROOT
           ™   U+2122   TRADE MARK SIGN
           |   U+007C   VERTICAL LINE
           ½   U+00BD   VULGAR FRACTION ONE HALF
           ☺   U+263A   WHITE SMILING FACE
               U+FEFF   ZERO WIDTH NO-BREAK SPACE
        

        If you use the option Edit > Line Operations > Sort Lines Lexicographically Ascending, this list becomes:

           	   U+0009   CHARACTER TABULATION 
               U+0020   SPACE
           %   U+0025   PERCENT SIGN
           2   U+0032   DIGIT TWO
           =   U+003D   EQUALS SIGN
           N   U+004E   LATIN CAPITAL LETTER N
           _   U+005F   LOW LINE
           n   U+006E   LATIN SMALL LETTER N
           |   U+007C   VERTICAL LINE
           ©   U+00A9   COPYRIGHT SIGN
           ½   U+00BD   VULGAR FRACTION ONE HALF
           É   U+00C9   LATIN CAPITAL LETTER E WITH ACUTE
           ×   U+00D7   MULTIPLICATION SIGN
           é   U+00E9   LATIN SMALL LETTER E WITH ACUTE
           ø   U+00F8   LATIN SMALL LETTER O WITH STROKE
           Ă   U+0102   LATIN CAPITAL LETTER A WITH BREVE
           ă   U+0103   LATIN SMALL LETTER A WITH BREVE
           ˉ   U+02C9   MODIFIER LETTER MACRON
           ̀   U+0300   COMBINING GRAVE ACCENT
           Δ   U+0394   GREEK CAPITAL LETTER DELTA
           δ   U+03B4   GREEK SMALL LETTER DELTA
           Ќ   U+040C   CYRILLIC CAPITAL LETTER KJE
           к   U+043A   CYRILLIC SMALL LETTER KA
           א   U+05D0   HEBREW LETTER ALEF
           ר   U+05E8   HEBREW LETTER RESH
           ج   U+062C   ARABIC LETTER JEEM
           ق   U+0642   ARABIC LETTER QAF
           Ầ   U+1EA6   LATIN CAPITAL LETTER A WITH CIRCUMFLEX AND GRAVE
           ầ   U+1EA7   LATIN SMALL LETTER A WITH CIRCUMFLEX AND GRAVE
           “   U+201C   LEFT DOUBLE QUOTATION MARK
           ”   U+201D   RIGHT DOUBLE QUOTATION MARK
           €   U+20AC   EURO SIGN
           ™   U+2122   TRADE MARK SIGN
           →   U+2192   RIGHTWARDS ARROW
           √   U+221A   SQUARE ROOT
           ≥   U+2265   GREATER-THAN OR EQUAL TO
           ┼   U+253C   BOX DRAWINGS LIGHT VERTICAL AND HORIZONTAL
           ☺   U+263A   WHITE SMILING FACE
           ♣   U+2663   BLACK CLUB SUIT
           ♫   U+266B   BEAMED EIGHTH NOTES
           אָ   U+FB2F   HEBREW LETTER ALEF WITH QAMATS
           ﯿ   U+FBFF   ARABIC LETTER FARSI YEH MEDIAL FORM
               U+FEFF   ZERO WIDTH NO-BREAK SPACE
              U+FFFC   OBJECT REPLACEMENT CHARACTER
        

        It is obvious that the list is now sorted by the U+#### value of each character! So the correct wording would be Sort Lines by Unicode values Ascending / Descending :-D
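To make the point concrete, here is a minimal Python sketch. It is not Notepad++'s code, only an illustration: Python's built-in `sorted` compares strings code point by code point, so it reproduces the same kind of ordering on a handful of characters taken from the list above. The variable names are made up for this example.

```python
# A few characters from the list above, in arbitrary order.
samples = ["é", "N", "€", "n", "Δ", "2"]

# sorted() on str compares Unicode code points, not dictionary order.
by_code_point = sorted(samples)

# Print each character with its code point in the U+XXXX notation.
for ch in by_code_point:
    print(f"{ch}   U+{ord(ch):04X}")
```

The digit comes first, then the ASCII letters, then é, Δ and €, exactly as their U+#### values dictate.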


        BTW, it would be nice if we could sort according to our local language. For instance, this original list of French words:

        ère
        école
        bateau
        euro
        colis
        eau
        ferme
        élu
        à
        emploi
        lit
        embarras
        émoi
        zoo
        elle
        errer
        avion
        été
        sceau
        ébène
        

        is currently sorted as:

        avion
        bateau
        colis
        eau
        elle
        embarras
        emploi
        errer
        euro
        ferme
        lit
        sceau
        zoo
        à
        ère
        ébène
        école
        élu
        émoi
        été
        

        However, the correct order in a French dictionary is:

        à
        avion
        bateau
        colis
        eau
        ébène
        école
        elle
        élu
        embarras
        émoi
        emploi
        ère
        errer
        été
        euro
        ferme
        lit
        sceau
        zoo
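A rough Python sketch of how such an order could be obtained for this particular list is shown below. It is an approximation, not real French collation: it compares words by their unaccented form first and only uses the accented spelling to break ties. The helper names `strip_accents` and `dictionary_key` are invented for this example; a real implementation would more likely rely on a locale-aware collation library.

```python
import unicodedata

def strip_accents(text):
    """Return text with combining marks removed (é -> e, à -> a)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def dictionary_key(word):
    """Sort key: unaccented form first, original spelling as tie-breaker."""
    return (strip_accents(word).casefold(), word)

words = [
    "ère", "école", "bateau", "euro", "colis", "eau", "ferme", "élu", "à",
    "emploi", "lit", "embarras", "émoi", "zoo", "elle", "errer", "avion",
    "été", "sceau", "ébène",
]

# Plain sorted() gives the code-point order; the key gives dictionary order.
code_point_order = sorted(words)
french_order = sorted(words, key=dictionary_key)
print(french_order)
```

With this key, à lands before avion and ébène between eau and école, which matches the dictionary order listed above.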
        

        In the same way, the regex (?-i)[e-f] should match, for instance, the lower-case letters e and f and all accented Latin forms of the letter e. In other words, it should be equivalent to the regex (?-i)[eèéêëēĕėęěẹẻẽếềểễệ℮f], if I simply consider the Courier New font!
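The idea of a range that also covers accented forms can be sketched in Python as follows. This is not a regex engine feature and not how N++ works; it is a small illustration that tests a character's base letter after canonical decomposition (NFD). The function names `base_letter` and `in_base_range` are hypothetical.

```python
import unicodedata

def base_letter(ch):
    """Return the first code point of the NFD form (ệ -> e, é -> e)."""
    return unicodedata.normalize("NFD", ch)[0]

def in_base_range(ch, low, high):
    """True if the base letter of ch lies in the range low..high."""
    return low <= base_letter(ch) <= high

accented = "eèéêëēĕėęěẹẻẽếềểễệf"
matches = [ch for ch in accented if in_base_range(ch, "e", "f")]
print(len(matches) == len(accented))
```

Characters without a canonical decomposition, such as ø or the ℮ symbol from the class above, are not folded to a base letter by this approach and would still have to be listed explicitly.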

        Best Regards,

        guy038

          Scott Sumner

          @guy038 :

          english_customizable.xml:

          <Item id="42059" name="Sort Lines Lexicographically Ascending"/>
          <Item id="42060" name="Sort Lines Lexicographically Descending"/>
          

          ->

          <Item id="42059" name="Sort Lines by Unicode values Ascending"/>
          <Item id="42060" name="Sort Lines by Unicode values Descending"/>
          

          =

          Imgur

          :-)
