Python 3: How to convert a bytearray to an ASCII string

I have the following bytearray
bytearray(b'S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00')
It should spell out StandardFirmata.ino; however, I can't figure out how to decode it.
Here is what I have tried:
print(str(board.sysex_list)) #Appears to just return a string that looks identical
print(board.sysex_list.decode()) # Returns just S
Is there a simple way to do this?

Wrong encoding.
>>> bytearray(b'S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00').decode('utf-16le')
'StandardFirmata.ino'
But that's not ASCII.

The issue was that I was not specifying a decoding. All I had to do was change decode to decode('utf-16-le')
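For reference, here is a minimal sketch of the fix using the bytes from the question. The variable names are only illustrative. The final step is an assumption about what "ASCII string" means here: if ASCII bytes are really needed, the decoded text can be encoded again.

```python
# The sample bytes from the question: every character is followed by a NUL byte,
# which is what ASCII text looks like when stored as little-endian UTF-16.
raw = bytearray(b'S\x00t\x00a\x00n\x00d\x00a\x00r\x00d\x00F\x00i\x00r\x00m\x00a\x00t\x00a\x00.\x00i\x00n\x00o\x00')

# Decode with the encoding the data is actually in.
name = raw.decode('utf-16-le')
print(name)  # StandardFirmata.ino

# Optional: turn the str into ASCII bytes (raises UnicodeEncodeError
# if the text contains anything outside ASCII).
ascii_bytes = name.encode('ascii')
```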

Related

Python encoding problem when reading but not when typing

I'm reading some strings from a text file.
Some of these strings have some "strange" characters, e.g. "\xc3\xa9comiam".
If I copy that string and paste it into a variable, I can convert it to readable characters:
string = "\xc3\xa9comiam"
print(string.encode("raw_unicode_escape").decode('utf-8'))
écomiam
but if I read it from the file, it doesn't work:
with open(fn) as f:
    for string in f.readlines():
        print(string.encode("raw_unicode_escape").decode('utf-8'))
\xc3\xa9comiam
It seems the solution must be pretty easy, but I can't find it.
What can I do?
Thanks!
Those are not raw_unicode_escape sequences. As the name suggests, that codec handles Unicode escapes like \u00e9 but not \xe9.
What you have is a UTF-8 encoded sequence. The way to decode that is to get it into a bytes sequence, which can then be decoded to a Unicode string.
# Let's not shadow the string library
s = "\xc3\xa9comiam"
print(bytes(s, 'latin-1').decode('utf-8'))
The 'latin-1' trick is a dirty secret which simply converts every byte to a character with the same character code.
For your file, you could open it in binary mode so you don't have to explicitly convert it to bytes, or you could simply apply the same conversion to the strings you read.
Thanks everyone for your help,
I think I've found a solution (not very elegant, but it does the trick).
print(bytes(tm.strip(), "utf-8").decode("unicode_escape").encode("raw_unicode_escape").decode('utf-8'))
Thanks!
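The two situations discussed above can be sketched side by side. This assumes, as the output in the question suggests, that the file holds a literal backslash followed by x, c, 3 and so on rather than the raw bytes; the variable names are illustrative only.

```python
# Case 1: the str already contains the UTF-8 bytes as code points
# (what you get when the escapes are typed into a Python string literal).
s = "\xc3\xa9comiam"
fixed = s.encode("latin-1").decode("utf-8")
print(fixed)  # écomiam

# Case 2: the text contains the escape sequences literally, as read from the file.
# The raw string below stands in for one line of that file.
line = r"\xc3\xa9comiam"

# First interpret the \xNN escapes (this assumes the line is otherwise plain ASCII),
# then apply the same latin-1 round trip as in case 1.
as_code_points = line.encode("ascii").decode("unicode_escape")
fixed_line = as_code_points.encode("latin-1").decode("utf-8")
print(fixed_line)  # écomiam
```

If the file stores the real UTF-8 bytes instead, opening it with open(fn, encoding="utf-8") should be enough and neither step is needed.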

Python 3 Decoding Strings

I understand that this is likely a repeat question, but I'm having trouble finding a solution.
In short I have a string I'd like to decode:
raw = "\x94my quote\x94"
string = decode(raw)
expected from string
'"my quote"'
Last point of note is that I'm working with Python 3 so raw is unicode, and thus is already decoded. Given that, what exactly do I need to do to "decode" the "\x94" characters?
string = "\x22my quote\x22"
print(string)
You don't need to decode; Python 3 does that for you. You do need the correct character code for the double quote ", which is \x22.
If, however, the data is in a different character set (here it appears to be Windows-1252), then you need to decode the byte string from that character set:
str(b"\x94my quote\x94", "windows-1252")
If your string isn't a byte string you have to encode it first, I found the latin-1 encoding to work:
string = "\x94my quote\x94"
str(string.encode("latin-1"), "windows-1252")
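A short sketch of the round trip above with the result spelled out. Note that byte 0x94 in Windows-1252 is the right curly quote (U+201D), not the straight quote; the last step, which swaps it for a straight quote to match the expected output in the question, is an optional addition.

```python
s = "\x94my quote\x94"

# latin-1 maps code points 0-255 one-to-one onto byte values,
# so this recovers the original bytes from the str.
raw_bytes = s.encode("latin-1")

# Decode those bytes with the character set they were really written in.
text = raw_bytes.decode("windows-1252")
print(text)  # ”my quote”

# Optional: normalise the curly quote to a plain ASCII double quote.
plain = text.replace("\u201d", '"')
print(plain)  # "my quote"
```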
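I don't know if this is what you mean, but this works once you name the encoding (a bare decode() assumes UTF-8 in Python 3 and fails on byte 0x94):
some_binary = b"\x94my quote\x94"
result = some_binary.decode("windows-1252")
And you have the result.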
If you don't know which encoding to choose, you can use chardet.detect:
import chardet
chardet.detect(some_binary)
Did you try it like this? I think you need to call decode as a method of the bytes class and pass utf-8 as the argument. Add b in front of the string too.
string = b"\x94my quote\x94"
decoded_str = string.decode('utf-8', 'ignore')
print(decoded_str)

Transform ascii to unicode

I am trying to convert the following to unicode:
'Schutzt\xc3\xbcren'.encode("utf-8")
but cannot; I get the error
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)
I would like to get
'Schutztüren'
as a result.
Your string is already in utf-8. You need to decode it to Unicode in order to use it inside Python:
print 'Schutzt\xc3\xbcren'.decode("utf-8")
But you have a bigger problem: You are clearly using Python 2. Switch to Python 3 immediately; there is no reason to drive yourself crazy trying to understand the Python 2 approach to handling character encodings. Switch to Python 3 and you will not have to bang your head against your desk several times a day. (Note that although you were calling the encode() method, you got a UnicodeDecodeError.)
A simple explanation:
In Python, unicode and utf-8 are different things. A str in Python 2 might be in the "utf-8" encoding, unicode objects have no encoding.
If you try to use a str for something that requires unicode (e.g., to encode() it), or vice versa, Python 2 will try to implicitly convert it first. Except it doesn't know the encoding of your strings, so it guesses (ascii, in your case). Oops.
Python2 has a lot of implicit conversions.
But really the reason is simple: You are not using Python 3.
Edit: Since Python 3 is not an option, here is some practical advice:
Unicode sandwich: Convert all text to Unicode as soon as it's read in, work with unicode strings and encode back to a utf8 str only to write it out again.
Pandas should still support the encoding argument to to_csv(), even on Python 2. Use it to write your files in utf8.
For reading a file directly, use codecs.open() instead of plain open() to read files. It accepts the encoding= argument and will give you unicode strings.
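A minimal sketch of the "Unicode sandwich" described above. It uses io.open, which accepts encoding= on both Python 2 and Python 3; the temporary file and its name are made up for the example.

```python
import io
import os
import tempfile

# Hypothetical file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "doors.txt")

# Bottom slice: encode to UTF-8 only at the moment of writing.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Schutzt\u00fcren\n")

# Top slice: decode to Unicode as soon as the data is read in.
with io.open(path, "r", encoding="utf-8") as f:
    text = f.read().strip()

# The filling: work only with Unicode strings in between.
upper = text.upper()
print(upper)
```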
You need to decode the UTF-8 encoded string to unicode instead:
'Schutzt\xc3\xbcren'.decode("utf-8")
In Python 3 you'd need to decode the bytes that are your encoded string:
b'Schutzt\xc3\xbcren'.decode("utf-8")
In Python 2 the b is not necessary (there the distinction between bytes and strings is less strict).
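As a Python 3 sketch of the same idea: bytes decode directly, while a str that already contains the UTF-8 bytes as separate characters needs a latin-1 round trip first. The second case is an assumption about how such a string might have ended up in a Python 3 program.

```python
# Bytes that hold UTF-8 encoded text decode directly.
data = b'Schutzt\xc3\xbcren'
text = data.decode('utf-8')
print(text)  # Schutztüren

# A str that was decoded with the wrong codec holds two characters
# (U+00C3, U+00BC) instead of one. Turn it back into bytes, then decode properly.
broken = 'Schutzt\xc3\xbcren'
repaired = broken.encode('latin-1').decode('utf-8')
print(repaired)  # Schutztüren
```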

Python decode from ascii hex

I'm looking for a function to convert "\x61\x62\x63\x64\x65\x66" to "abcdef" in Python, without having to print it.
Also, is "ASCII hex" the proper term for this type of encoding?
>>> "\x61\x62\x63\x64\x65\x66".decode('ascii')
u'abcdef'
You can convert "\x61\x62\x63\x64\x65\x66" to a unicode string with the unicode() built-in in Python 2:
unicode("\x61\x62\x63\x64\x65\x66")
Output: u'abcdef'
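The answers above are for Python 2. A rough Python 3 equivalent, for comparison: the \x61 escapes are resolved when the literal is parsed, so the string already equals "abcdef". The hex-digit variant at the end is an assumption about a related input format, not something the question shows.

```python
# In a string literal the escapes are resolved by the parser itself.
s = "\x61\x62\x63\x64\x65\x66"
print(s == "abcdef")  # True

# If the data arrives as bytes, decode it.
from_bytes = b"\x61\x62\x63\x64\x65\x66".decode("ascii")

# If the data arrives as plain hex digits, bytes.fromhex converts them first.
from_hex = bytes.fromhex("616263646566").decode("ascii")
```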

Which encoding?

Does anybody know how the string 'Krummh%C3%B6rn' is encoded?
Plain text is "Krummhörn".
I need to decode strings like this one in Python and tried urllib.unquote('Krummh%C3%B6rn')
The result: 'Krummh\xc3\xb6rn'
That's UTF-8 in URL encoding.
print(urllib.unquote('Krummh%C3%B6rn').decode('utf-8'))
prints the string as you'd expect it to look.
You're halfway there. Take that result and decode it as UTF-8.
Looks like URL encoding
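The snippets above use Python 2's urllib.unquote. In Python 3 the function lives in urllib.parse and decodes the percent-escapes as UTF-8 by default, so a sketch of the same operation looks like this:

```python
from urllib.parse import unquote, unquote_to_bytes

# unquote() resolves the %XX escapes and decodes the bytes as UTF-8 by default.
town = unquote('Krummh%C3%B6rn')
print(town)  # Krummhörn

# unquote_to_bytes() stops after the first step and returns the raw bytes,
# which matches the intermediate result shown in the question.
raw = unquote_to_bytes('Krummh%C3%B6rn')
```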
