
    2 txt files are different in Notepad, but similar in Notepad++

    • amir dar

      Hi,
      I have 2 txt files that a process in our system builds.
      Each consists of a list of directories.

      When I inspect them in Notepad, one file (A) shows each directory as a string with no spaces, while the other one (B) shows a “space” after each character (which is the way we want it).

      b38c81fc-36a3-4dd4-9fe2-46189480655b-image.png

      However, when I open both files in Notepad++, neither of them shows the spaces.

      615b55d3-cf39-41c9-b00c-f3a9a0fb2221-image.png

      In Show Symbol, “Show All Characters” is turned on.
      When I inspect both files in hex, they are the same (well, at least the prefix, which is the same directory).

      Any ideas why Notepad++ is not showing the spaces?

      • PeterJones @amir dar

        @amir-dar ,

        Notice you put “space” in quotes. I am betting that is because it’s not really a space; it’s really the 0 byte of the two-byte UTF-16 LE encoding that Windows uses for Unicode files. Note also that Windows Notepad shows your “oneline” file as “UTF-16 LE” (lower right corner). That file will have the 0-bytes between the characters in its raw form (you don’t show the hex output; you really should have).

        Notepad++ then properly sees that the files are encoded as UTF-16 LE, and interprets each two-byte sequence as a single character, because that’s what it is. It is doing the right thing by not showing the 0-bytes (which you called “spaces” but which are actually NUL bytes when each byte is interpreted as a single-byte character). If you were to look at the lower-right of Notepad++, you would see that it shows UTF-16 LE BOM or similar, as it should: 2078f62b-0cbf-4dfc-9bb6-5d3c173088b0-image.png

        The only bug in the above examples is that MS Notepad is interpreting the “multiline_protected.txt” as UTF-8 instead of the correct UTF-16 LE, and wrongly showing you the 0-bytes as separate characters instead of as part of the character like in “oneline_protected.txt”.
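To see the effect described above outside of any editor, here is a minimal Python sketch. The path `C:\data` is a made-up sample (the real directory names are not shown in this thread), and decoding as UTF-8 only stands in for what MS Notepad appears to be doing; it is not a claim about Notepad's actual code.

```python
# Hypothetical sample path; the real directory names are not known.
text = "C:\\data"

# UTF-16 LE stores each of these characters as two bytes, low byte first,
# so every ASCII character is followed by a 0x00 byte.
raw = text.encode("utf-16-le")
print(" ".join(f"{b:02x}" for b in raw))

# Correct interpretation: each two-byte unit is one character.
as_utf16 = raw.decode("utf-16-le")

# Wrong interpretation: each byte is read as its own character,
# so every 0x00 byte becomes a separate NUL character.
as_utf8 = raw.decode("utf-8")

print(repr(as_utf16))
print(repr(as_utf8))
```

The second `repr` shows a `\x00` after every visible character, which is the kind of thing an editor might draw as a blank.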

        –
        edit: removed all the incorrect mentions of UCS2-LE (I misremembered which was which when first writing the post).

        • Alan Kilborn @amir dar

          @amir-dar

          I think some studying and understanding of Unicode encoding concepts is in order before you go too much further along in your current task.

          • amir dar @PeterJones

            @PeterJones Thanks for the detailed answer!
            One thing still confuses me: why doesn’t MS Notepad interpret the “oneline” txt as UTF-8? Why does it see only this file as UTF-16, but not other files?

            • PeterJones @amir dar

              @amir-dar

              Why doesn’t MS Notepad…

              That would technically be a question for an MS forum. But I will give you my insight anyway.

              Why doesn’t MS Notepad interpret the “oneline” txt as UTF-8?

              You’ve actually got the question backwards: the oneline text file is the one that MS Notepad properly interpreted as UTF-16, because that’s what it is. The multiline text file is the one that should be read as UTF-16, but for some reason MS Notepad reads it as UTF-8 instead, and so shows all the null characters as blanks between characters.

              Why does it see only this file as UTF-16, but not other files?

              I am not an expert on Microsoft’s decision-making algorithm.

              My guess is that the file where MS Notepad reads UTF-16 as UTF-8 has at least one character that isn’t properly encoded as 2-byte UTF-16 (so maybe it has an odd number of bytes in the file, which is technically impossible in a UTF-16 encoded text file, or some garbage character(s) that aren’t recognized are in there).
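If you want to test that guess on the actual files, a rough Python sketch along these lines could help. The function name `inspect_utf16_candidate` and the sample bytes are made up for illustration, and this is only a sanity check of the byte stream; it is not Microsoft's detection algorithm, which is not described anywhere in this thread.

```python
import codecs


def inspect_utf16_candidate(data: bytes) -> dict:
    """Report simple facts about bytes that are supposed to be UTF-16 LE.

    Hypothetical helper for illustration only.
    """
    # A UTF-16 LE file often starts with the byte order mark FF FE.
    has_bom = data.startswith(codecs.BOM_UTF16_LE)

    # Valid UTF-16 uses 2-byte units, so an odd byte count means
    # at least one stray byte is in there somewhere.
    odd_length = len(data) % 2 == 1

    # A strict decode fails on truncated units or unpaired surrogates.
    try:
        data.decode("utf-16-le")
        decodes_cleanly = True
    except UnicodeDecodeError:
        decodes_cleanly = False

    return {
        "has_bom": has_bom,
        "odd_length": odd_length,
        "decodes_cleanly": decodes_cleanly,
    }


# Made-up sample data: a well-formed file, and the same bytes
# with one stray single-byte newline appended.
good = codecs.BOM_UTF16_LE + "C:\\data".encode("utf-16-le")
bad = good + b"\n"

print(inspect_utf16_candidate(good))
print(inspect_utf16_candidate(bad))
```

For a real file you would read the bytes with `open(path, "rb").read()` and pass them in. If the multiline file comes back with `odd_length` set or fails the strict decode, that would be consistent with the guess above, though it still would not prove why MS Notepad chose UTF-8.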

              • amir dar @PeterJones

                @PeterJones said in 2 txt files are different in notepad , but similar in notepad++:

                (s) that aren’t recognized are in the

                Thanks a lot, dude. This really was helpful and insightful!

                The Community of users of the Notepad++ text editor.