    Python - string encoding

    • DaveyD
      DaveyD last edited by

      Hi,
      I have a simple Python script that takes the selected text, reverses it, and replaces the selection with the reversed string.
      With standard English letters it works perfectly, but when Hebrew letters are selected, some characters are not entered correctly (I get raw hex byte codes, e.g. xA3, shown in a blue box).

      The document is encoded as OEM 862.
      Here is the script that I use:

      text = editor.getSelText()
      editor.replaceSel(text[::-1])
      

      If anyone can help with this, I would appreciate it.
      Thanks,
      Davey
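
A minimal sketch of what may be going wrong here, under one assumption that the post does not confirm: that `editor.getSelText()` hands the script the selection as UTF-8 encoded bytes, whatever encoding the file has on disk. In that case `text[::-1]` reverses bytes, not letters. Each Hebrew letter takes two bytes in UTF-8, so the reversal leaves continuation bytes without their lead byte. The xA3 in the blue box would be consistent with that, since 0xA3 is the second byte of U+05E3 in UTF-8. The names `selection` and `is_valid_utf8` are made up for illustration.

```python
# Assumption: the selection arrives as UTF-8 encoded bytes.
# Hebrew "shalom" written with escapes; each letter becomes 2 bytes.
selection = u"\u05e9\u05dc\u05d5\u05dd".encode("utf-8")

# This is what text[::-1] does to a byte string: it reverses single bytes.
reversed_bytes = selection[::-1]


def is_valid_utf8(data):
    """Return True if data can be decoded as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(selection))       # the original selection
print(is_valid_utf8(reversed_bytes))  # the byte-reversed selection

# ASCII letters are one byte each, so byte reversal happens to be safe for them.
ascii_selection = b"hello"
print(ascii_selection[::-1])
```

This would also explain why English text reverses cleanly: for single-byte characters, reversing bytes and reversing letters are the same thing.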

      • Claudia Frank
        Claudia Frank @DaveyD last edited by Claudia Frank

        Hello Davey,

        did you read this?

        Cheers
        Claudia

        • DaveyD
          DaveyD last edited by

          Hi Claudia - Thanks for the link!
          I did read about that (though on stackoverflow.com rather than there), and I tried and tried but couldn’t figure out what to do!
          Now that you pointed to it again, I put more time into it… and finally found a method that works!
          However, I have no clue how it works!
          This is what I have:

          • A text file encoded in OEM 862 (‘cp862’ in Python)
          • A script with the default encoding (ANSI)

          So I would think that I have to do some kind of conversion between cp862 and utf-8 (since Python works in utf-8).
          I tried:

          text.decode('cp862')
          text.encode('cp862')
          
          text.decode('cp862')
          text.encode('utf-8')
          
          text.decode('utf-8')
          text.encode('cp862') 
          

          None of the above worked!
          Then I tried the exact example given in the help file you pointed to above:

          text.decode('utf-8')
          text.encode('utf-8')
          

          And this worked perfectly!
          This is great! My script is now working!
          But… why?! Why does this make sense?
          I am so confused by this encoding business…! :)

          Thanks,
          Davey
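
One possible explanation, offered as a sketch and not as confirmed editor behaviour: if the editor keeps the text in memory as UTF-8 and only converts to OEM 862 when reading or writing the file, then the bytes the script sees are UTF-8. Decoding them as UTF-8 yields real characters, which can be reversed safely and encoded back. Decoding the same bytes as cp862 would misread each two-byte letter as two unrelated characters. The helper name `reverse_utf8` is hypothetical.

```python
def reverse_utf8(raw):
    """Reverse text held as UTF-8 bytes, one character at a time."""
    text = raw.decode("utf-8")          # bytes -> Unicode characters
    return text[::-1].encode("utf-8")   # reverse characters, back to bytes


# Hebrew alef, bet, gimel as UTF-8 bytes (assumed form of the selection).
sample = u"\u05d0\u05d1\u05d2".encode("utf-8")
result = reverse_utf8(sample)

# The result decodes cleanly to gimel, bet, alef.
print(result.decode("utf-8") == u"\u05d2\u05d1\u05d0")

# Inside the editor the call would presumably look like this
# (editor is supplied by the scripting plugin, so it is left commented out):
# editor.replaceSel(reverse_utf8(editor.getSelText()))
```

Note that this reverses code points, so text with combining marks (for example Hebrew vowel points) may still come out with marks attached to the wrong letters.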

          • Claudia Frank
            Claudia Frank @DaveyD last edited by Claudia Frank

            Hello Davey,

            What should I say, it’s like regexes.
            Whenever you think you finally understand their behaviour,
            something happens which you didn’t expect.
            Tbh, I can’t give you any advice on this as I’m still in the same situation as you - wondering why it does what it does. ;-)

            But, c’mon - be happy it works. There will be a tomorrow when it fails again,
            so take the chance to celebrate - hip hip hooray ;-)

            Cheers
            Claudia

            • DaveyD
              DaveyD last edited by

              :) :) :)
              Thanks Claudia - you made me laugh!
              I just hope it doesn’t happen any time soon…! :)

              Cheers
              Davey

              • Claudia Frank
                Claudia Frank last edited by

                you made me laugh!

                Well, then my work is done - time to party

                Cheers
                Claudia

                Copyright © 2014 NodeBB Forums | Contributors