
    Python - string encoding

    6 Posts 2 Posters 5.2k Views
    • DaveyD

      Hi
      I have a simple Python script that takes the selected text, reverses it, and replaces the selection with the reversed string.
      With standard English letters it works perfectly, but when I have Hebrew letters selected, some characters don’t get entered correctly (I get some hex byte codes, e.g. xA3, in a blue box).

      The document is encoded with OEM 862 encoding.
      Here is the script that I use:

      text = editor.getSelText()
      editor.replaceSel(text[::-1])
      

      If anyone can help with this I would appreciate it
      Thanks,
      Davey
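
A possible explanation for the blue boxes, offered as a sketch rather than a confirmed diagnosis: it assumes that `editor.getSelText()` hands the script a string of UTF-8 bytes even though the file on disk is OEM 862 (the thread does not state this, but the xA3 symptom is consistent with it, since ף is the byte pair D7 A3 in UTF-8). Under that assumption, `text[::-1]` reverses bytes, not letters, and splits every two-byte Hebrew letter. The sample word and the `is_valid_utf8` helper below are illustrative only.

```python
# "שלום" written with escapes so the snippet does not depend on this file's encoding.
word = u"\u05e9\u05dc\u05d5\u05dd"

# Assumption: this is what the editor returns for the selection (UTF-8 bytes).
utf8_bytes = word.encode("utf-8")

# This is what text[::-1] does when text is a byte string: it reverses bytes.
reversed_bytes = utf8_bytes[::-1]


def is_valid_utf8(data):
    """Hypothetical helper: report whether data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(len(word))                      # 4 letters
print(len(utf8_bytes))                # 8 bytes, two per Hebrew letter
print(is_valid_utf8(utf8_bytes))      # True
print(is_valid_utf8(reversed_bytes))  # False: each letter's bytes are now swapped
```

Plain English letters are one byte each in UTF-8, which would explain why the same script works for them.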

      • Claudia Frank @DaveyD

        Hello Davey,

        did you read this?

        Cheers
        Claudia

        • DaveyD

          Hi Claudia - Thanks for the link!
          I did read about that (but not there, rather on stackoverflow.com), and I’ve tried and tried but haven’t been able to figure out what to do!
          Now that you pointed to it again, I put more time into it… and finally I figured out a method to get it to work!
          However, I have no clue how it works!
          This is what I have:

          • A text file encoded in OEM 862 (or in python ‘cp862’)
          • A script with default encoding (ANSI)

          So, I would think that I have to do some type of encoding with cp862 and utf-8 (since python works in utf-8).
          I tried:

          text.decode('cp862')
          text.encode('cp862')
          
          text.decode('cp862')
          text.encode('utf-8')
          
          text.decode('utf-8')
          text.encode('cp862') 
          

          None of the above worked!
          Now, I tried the exact example given in the help file you pointed to above:

          text.decode('utf-8')
          text.encode('utf-8')
          

          And this worked perfectly!
          This is great! My script is now working!
          But… Why?! why does this make sense??
          I am so confused with this encoding business…! :)

          Thanks,
          Davey
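
One way to make sense of this, again assuming (not confirmed in the thread) that the editor buffer holds UTF-8 bytes regardless of the file's on-disk OEM 862 encoding: decoding as UTF-8 turns the bytes into real characters, the reversal then works letter by letter, and encoding as UTF-8 puts the result back in the form the buffer expects. Encoding to cp862 instead yields one byte per Hebrew letter, which is not valid UTF-8, so it would show up as hex boxes. The name `reverse_utf8` is made up for this sketch.

```python
def reverse_utf8(data):
    """Reverse text held as UTF-8 bytes, one code point at a time."""
    return data.decode("utf-8")[::-1].encode("utf-8")


word = u"\u05e9\u05dc\u05d5\u05dd"       # "שלום"
buffer_bytes = word.encode("utf-8")       # assumed content of the selection

result = reverse_utf8(buffer_bytes)
print(result.decode("utf-8") == word[::-1])   # True: letters reversed intact

# The decode('utf-8') / encode('cp862') attempt produces cp862 bytes instead.
cp862_bytes = word.encode("cp862")
print(len(cp862_bytes))                       # 4: one byte per Hebrew letter

try:
    cp862_bytes.decode("utf-8")
    cp862_is_valid_utf8 = True
except UnicodeDecodeError:
    cp862_is_valid_utf8 = False
print(cp862_is_valid_utf8)                    # False: a UTF-8 buffer cannot display it
```

In the PythonScript build used in this thread, where `editor.getSelText()` appears to return a byte string, the call would presumably be `editor.replaceSel(reverse_utf8(editor.getSelText()))`.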

          • Claudia Frank @DaveyD

            Hello Davey,

            what should I say, it’s like the regexes.
            Whenever you think you finally understand its behaviour
            something happens which you didn’t expect.
            Tbh, I can’t give you any advice on this as I’m still in the same situation as you, wondering why it does what it does. ;-)

            But, c’mon, be happy it works. There will be a tomorrow when it fails again,
            so take the chance to celebrate: hip hip hooray ;-)

            Cheers
            Claudia

            • DaveyD

              :) :) :)
              Thanks Claudia - you made me laugh!
              I just hope it doesn’t happen any time soon…! :)

              Cheers
              Davey

              • Claudia Frank

                you made me laugh!

                Well, then my work is done - time to party

                Cheers
                Claudia

                The Community of users of the Notepad++ text editor.