I need to send a string via tcp. One of the first sections of the string is the length of the command variable
Example:
command = STATUS?UPDATE
I need to send the following string below
sendCommand = '\x00\x00\x00'+STRINGLENGTH+'\x02'+command+'\x0D\x0A'
My string length is 11 so I need STRINGLENGTH to be the hex equivalent of 11, which is 0xB, except that I need it to output as \x0B
Padding it with the leading 0 is easy, but I cannot get it to output as \x instead of 0x, and if I do a string replace it is treated as text and not as hex, so it doesn't work.
My final hex string should be:
\x00\x00\x00\x0B\x02\x53\x54\x41\x54\x55\x53\x3f\x55\x53\x45\x52\x0D\x0A
I am instead getting:
\x00\x00\x000x0B\x02\x53\x54\x41\x54\x55\x53\x3f\x55\x53\x45\x52\x0D\x0A
Any ideas on how to format it correctly?
So, this is a bit of a round-about fashion, but use a bytes object:
>>> STRINGLENGTH = bytes([11]).decode()
>>> endCommand = '\x00\x00\x00'+STRINGLENGTH+'\x02'
>>> endCommand
'\x00\x00\x00\x0b\x02'
Almost certainly, you are going to want to change your str object back to a bytes object, but the above should get you going.
I suspect what you were doing was using the hex function:
>>> STRINGLENGTH = hex(11)
>>> endCommand = '\x00\x00\x00'+STRINGLENGTH+'\x02'
>>> endCommand
'\x00\x00\x000xb\x02'
The fundamental thing you need to understand is that you aren't working with "hex", you are working with bytes. Hex is just how bytes are traditionally represented. The hex helper function returns a hexadecimal representation, as a string of an integer. But that isn't what you want. You want the byte corresponding to the value 11.
Note, for the ascii-range, chr(i) might works as well, so
>>> STRINGLENGTH = chr(11)
>>> endCommand = '\x00\x00\x00'+STRINGLENGTH+'\x02'
>>> endCommand
'\x00\x00\x00\x0b\x02'
But be careful, say you wanted the number 129, you have to care about the encoding...
>>> chr(129)
'\x81'
But in bytes, in UTF-8, that's actually represented by two different bytes
>>> chr(129).encode()
b'\xc2\x81'
>>> list(chr(129).encode())
[194, 129]
Which of course, depends on the encoding:
>>> chr(129).encode('latin')
b'\x81'
>>> list(chr(129).encode('latin'))
[129]
>>>
For that reason, I think it is safer to stick with the slightly wordier:
>>> bytes([129])
b'\x81'
Related
I want to encode string to bytes.
To convert to byes, I used byte.fromhex()
>>> byte.fromhex('7403073845')
b't\x03\x078E'
But it displayed some characters.
How can it be displayed as hex like following?
b't\x03\x078E' => '\x74\x03\x07\x38\x45'
I want to encode string to bytes.
bytes.fromhex() already transforms your hex string into bytes. Don't confuse an object and its text representation -- REPL uses sys.displayhook that uses repr() to display bytes in ascii printable range as the corresponding characters but it doesn't affect the value in any way:
>>> b't' == b'\x74'
True
Print bytes to hex
To convert bytes back into a hex string, you could use bytes.hex method since Python 3.5:
>>> b't\x03\x078E'.hex()
'7403073845'
On older Python version you could use binascii.hexlify():
>>> import binascii
>>> binascii.hexlify(b't\x03\x078E').decode('ascii')
'7403073845'
How can it be displayed as hex like following? b't\x03\x078E' => '\x74\x03\x07\x38\x45'
>>> print(''.join(['\\x%02x' % b for b in b't\x03\x078E']))
\x74\x03\x07\x38\x45
The Python repr can't be changed. If you want to do something like this, you'd need to do it yourself; bytes objects are trying to minimize spew, not format output for you.
If you want to print it like that, you can do:
from itertools import repeat
hexstring = '7403073845'
# Makes the individual \x## strings using iter reuse trick to pair up
# hex characters, and prefixing with \x as it goes
escapecodes = map(''.join, zip(repeat(r'\x'), *[iter(hexstring)]*2))
# Print them all with quotes around them (or omit the quotes, your choice)
print("'", *escapecodes, "'", sep='')
Output is exactly as you requested:
'\x74\x03\x07\x38\x45'
I'm reading a wav audio file in Python using wave module. The readframe() function in this library returns frames as hex string. I want to remove \x of this string, but translate() function doesn't work as I want:
>>> input = wave.open(r"G:\Workspace\wav\1.wav",'r')
>>> input.readframes (1)
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\\x')
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\x')
ValueError: invalid \x escape
>>> '\xff\x1f\x00\xe8'.translate(None,r'\x')
'\xff\x1f\x00\xe8'
>>>
Any way I want divide the result values by 2 and then add \x again and generate a new wav file containing these new values. Does any one have any better idea?
What's wrong?
Indeed, you don't have backslashes in your string. So, that's why you can't remove them.
If you try to play with each hex character from this string (using ord() and len() functions - you'll see their real values. Besides, the length of your string is just 4, not 16.
You can play with several solutions to achieve your result:
'hex' encode:
'\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
Or use repr() function:
repr('\xff\x1f\x00\xe8').translate(None,r'\\x')
One way to do what you want is:
>>> s = '\xff\x1f\x00\xe8'
>>> ''.join('%02x' % ord(c) for c in s)
'ff1f00e8'
The reason why translate is not working is that what you are seeing is not the string itself, but its representation. In other words, \x is not contained in the string:
>>> '\\x' in '\xff\x1f\x00\xe8'
False
\xff, \x1f, \x00 and \xe8 are the hexadecimal representation of for characters (in fact, len(s) == 4, not 24).
Use the encode method:
>>> s = '\xff\x1f\x00\xe8'
>>> print s.encode("hex")
'ff1f00e8'
As this is a hexadecimal representation, encode with hex
>>> '\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
If I have a binary string, let say str = "010100011010101001001101100101100110101" which is an encoded by base64 version of some other string how can I decode this string?
It would have been great if your example string is actually something meaningful rather than something made up which makes this question rather unclear, but I will try my best here to figure out what you might have meant in the most verbose manner possible.
Assuming your actual input is a str that looks like this:
s = '101100101010011010000100011000001011010010110000100111000110000'
You can get the hexadecimal form of this by casting it to int using the base keyword argument
>>> i = int(s, base=2) # 6436561067884170800
Then turn it back into a string by formatting it like so:
>>> h = '%x' % i # '595342305a584e30'
Then use the binascii.a2b_hex function on the hexadecimal string to get the raw bytes:
>>> b64 = binascii.a2b_hex(h) # b'YSB0ZXN0'
If it is some valid base 64 encoded stream of bytes, you may then use base64.b64decode on that to get the actual bytes
>>> r = base64.b64decode(b64) # b'a test'
To turn that into a string, apply the correct codec to it (i.e. use bytes.encode).
Finally, if you cared to know how I generated that input, this is all the above, reversed into a single one-line function:
>>> '{0:b}'.format(int(binascii.b2a_hex(base64.b64encode(b'a test')), base=16))
'101100101010011010000100011000001011010010110000100111000110000'
I know this may sounds like a duplicate question, but that's because I don't know how to describe this question properly.
For some reason I got a bunch of unicode string like this:
a = u'\xcb\xea'
As you can see, it's actually bytes representation of a Chinese character, encoding in gbk
>>> print(b'\xcb\xea'.decode('gbk'))
岁
u'岁' is what I need, but I don't know how to convert u'\xcb\xea' to b'\xcb\xea'.
Any suggestions?
It's not really a bytes representation, it's still unicode codepoints. They are the wrong codepoints, because it was decoded from bytes as if it was encoded to Latin-1.
Encode to Latin 1 (whose codepoints map one-on-one to bytes), then decode as GBK:
a.encode('latin1').decode('gbk')
Demo:
>>> a = u'\xcb\xea'
>>> a.encode('latin1').decode('gbk')
u'\u5c81'
>>> print a.encode('latin1').decode('gbk')
岁
The simpliest way for python2 is to use the repr():
>>> key_unicode = u'uuuu\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1'
>>> key_ascii = 'uuuu\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1'
>>> print(key_ascii)
uuuu��_��9ԣ�
>>> print(key_unicode)
uuuuö_¡ë9Ô£Ñ
>>>
>>> # here is the save method for both string types:
>>> print(repr(key_ascii).lstrip('u')[1:-1])
uuuu\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1
>>> print(repr(key_unicode).lstrip('u')[1:-1])
uuuu\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1
>>> # ____________WARNING!______________
>>> # if you will use jsut `str.strip('u\'\"')`, you will lose
>>> # the "uuuu" (and quotes, if such are present) on sides of the string:
>>> print(repr(key_unicode).strip('u\'\"'))
\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1
For python3 use str.encode() to get the bytes type.
>>> key = 'l\xf6\x9f_\xa1\x05\xeb9\xd4\xa3\xd1q\xf5L\xa9\xdd0\x90\x8b\xf5ht\x86za\x0e\x1b\xed\xb6(\xaa+'
>>> key
'lö\x9f_¡\x05ë9Ô£ÑqõL©Ý0\x90\x8bõht\x86za\x0e\x1bí¶(ª+'
>>> print(key)
lö_¡ë9Ô£ÑqõL©Ý0õhtzaí¶(ª+
>>> print(repr(key.encode()).lstrip('b')[1:-1])
l\xc3\xb6\xc2\x9f_\xc2\xa1\x05\xc3\xab9\xc3\x94\xc2\xa3\xc3\x91
Given a character code as integer number in one encoding, how can you get the character code in, say, utf-8 and again as integer?
UTF-8 is a variable-length encoding, so I'll assume you really meant "Unicode code point". Use chr() to convert the character code to a character, decode it, and use ord() to get the code point.
>>> ord(chr(145).decode('koi8-r'))
9618
You can only map an "integer number" from one encoding to another if they are both single-byte encodings.
Here's an example using "iso-8859-15" and "cp1252" (aka "ANSI"):
>>> s = u'€'
>>> s.encode('iso-8859-15')
'\xa4'
>>> s.encode('cp1252')
'\x80'
>>> ord(s.encode('cp1252'))
128
>>> ord(s.encode('iso-8859-15'))
164
Note that ord is here being used to get the ordinal number of the encoded byte. Using ord on the original unicode string would give its unicode code point:
>>> ord(s)
8364
The reverse operation to ord can be done using either chr (for codes in the range 0 to 127) or unichr (for codes in the range 0 to sys.maxunicode):
>>> print chr(65)
A
>>> print unichr(8364)
€
For multi-byte encodings, a simple "integer number" mapping is usually not possible.
Here's the same example as above, but using "iso-8859-15" and "utf-8":
>>> s = u'€'
>>> s.encode('iso-8859-15')
'\xa4'
>>> s.encode('utf-8')
'\xe2\x82\xac'
>>> [ord(c) for c in s.encode('iso-8859-15')]
[164]
>>> [ord(c) for c in s.encode('utf-8')]
[226, 130, 172]
The "utf-8" encoding uses three bytes to encode the same character, so a one-to-one mapping is not possible. Having said that, many encodings (including "utf-8") are designed to be ASCII-compatible, so a mapping is usually possible for codes in the range 0-127 (but only trivially so, because the code will always be the same).
Here's an example of how the encode/decode dance works:
>>> s = b'd\x06' # perhaps start with bytes encoded in utf-16
>>> map(ord, s) # show those bytes as integers
[100, 6]
>>> u = s.decode('utf-16') # turn the bytes into unicode
>>> print u # show what the character looks like
٤
>>> print ord(u) # show the unicode code point as an integer
1636
>>> t = u.encode('utf-8') # turn the unicode into bytes with a different encoding
>>> map(ord, t) # show that encoding as integers
[217, 164]
Hope this helps :-)
If you need to construct the unicode directly from an integer, use unichr:
>>> u = unichr(1636)
>>> print u
٤