How to remove '\x' from a hex string in Python? - python

I'm reading a wav audio file in Python using the wave module. The readframes() function in this library returns frames as a hex string. I want to remove the \x from this string, but the translate() function doesn't work the way I want:
>>> input = wave.open(r"G:\Workspace\wav\1.wav",'r')
>>> input.readframes (1)
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\\x')
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\x')
ValueError: invalid \x escape
>>> '\xff\x1f\x00\xe8'.translate(None,r'\x')
'\xff\x1f\x00\xe8'
>>>
Anyway, I want to divide the resulting values by 2, then add \x again and generate a new wav file containing these new values. Does anyone have a better idea?
What's wrong?

Indeed, you don't have backslashes in your string, which is why you can't remove them.
If you inspect each character of this string (using the ord() and len() functions), you'll see their real values. Besides, the length of your string is just 4, not 16.
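To make that concrete, here is a minimal sketch in Python 3 (the thread itself uses Python 2 strings, and the variable name data is only for illustration) that inspects the same four bytes:

```python
# The same four bytes as in the question, written as a Python 3 bytes literal.
data = b'\xff\x1f\x00\xe8'

# Four bytes, even though each one is displayed as a four-character \xNN escape.
print(len(data))       # 4
# Iterating over bytes yields integers in range(256).
print(list(data))      # [255, 31, 0, 232]
# No backslash (byte 0x5c) is actually stored in the data.
print(b'\\' in data)   # False
```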
You can play with several solutions to achieve your result:
'hex' encode:
'\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
Or use repr() function:
repr('\xff\x1f\x00\xe8').translate(None,r'\\x')

One way to do what you want is:
>>> s = '\xff\x1f\x00\xe8'
>>> ''.join('%02x' % ord(c) for c in s)
'ff1f00e8'
The reason why translate is not working is that what you are seeing is not the string itself, but its representation. In other words, \x is not contained in the string:
>>> '\\x' in '\xff\x1f\x00\xe8'
False
\xff, \x1f, \x00 and \xe8 are the hexadecimal representations of four characters (in fact, len(s) == 4, not 16).
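As for the underlying goal of halving the sample values, the following is a rough Python 3 sketch rather than a definitive solution. It assumes 16-bit signed little-endian samples (that is, getsampwidth() returns 2), which the question does not confirm, and the helper name halve_samples is made up for this example.

```python
import struct

def halve_samples(frames):
    """Halve every sample in a block of 16-bit signed little-endian PCM data.

    Assumption: the sample width is 2 bytes; other widths need a different
    struct format. halve_samples is a hypothetical helper, not part of wave.
    """
    count = len(frames) // 2
    fmt = '<%dh' % count                      # e.g. '<2h' for two samples
    samples = struct.unpack(fmt, frames[:count * 2])
    # Floor division keeps the results integral (odd negatives round down).
    return struct.pack(fmt, *(s // 2 for s in samples))

print(halve_samples(b'\xff\x1f\x00\xe8'))     # b'\xff\x0f\x00\xf4'
```

The returned bytes could then be passed to writeframes() on a wave object opened for writing with the same parameters as the input file.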

Use the encode method:
>>> s = '\xff\x1f\x00\xe8'
>>> print s.encode("hex")
ff1f00e8

As this is a hexadecimal representation, encode with the hex codec:
>>> '\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
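Note that the 'hex' codec on str.encode() is a Python 2 feature. In Python 3 the data returned by readframes() is a bytes object, and a sketch of the equivalent conversion (assuming Python 3.5 or later) might look like this:

```python
data = b'\xff\x1f\x00\xe8'

hex_text = data.hex()                    # bytes -> hex digits
print(hex_text)                          # ff1f00e8

# bytes.fromhex() reverses the conversion.
print(bytes.fromhex(hex_text) == data)   # True
```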

Related

Python format hex number

I need to send a string via tcp. One of the first sections of the string is the length of the command variable
Example:
command = STATUS?UPDATE
I need to send the following string below
sendCommand = '\x00\x00\x00'+STRINGLENGTH+'\x02'+command+'\x0D\x0A'
My string length is 11 so I need STRINGLENGTH to be the hex equivalent of 11, which is 0xB, except that I need it to output as \x0B
Padding it with the leading 0 is easy, but I cannot get it to output as \x instead of 0x, and if I do a string replace it is treated as text and not as hex, so it doesn't work.
My final hex string should be:
\x00\x00\x00\x0B\x02\x53\x54\x41\x54\x55\x53\x3f\x55\x53\x45\x52\x0D\x0A
I am instead getting:
\x00\x00\x000x0B\x02\x53\x54\x41\x54\x55\x53\x3f\x55\x53\x45\x52\x0D\x0A
Any ideas on how to format it correctly?
So, this is a bit of a round-about fashion, but use a bytes object:
>>> STRINGLENGTH = bytes([11]).decode()
>>> endCommand = '\x00\x00\x00'+STRINGLENGTH+'\x02'
>>> endCommand
'\x00\x00\x00\x0b\x02'
Almost certainly, you are going to want to change your str object back to a bytes object, but the above should get you going.
I suspect what you were doing was using the hex function:
>>> STRINGLENGTH = hex(11)
>>> endCommand = '\x00\x00\x00'+STRINGLENGTH+'\x02'
>>> endCommand
'\x00\x00\x000xb\x02'
The fundamental thing you need to understand is that you aren't working with "hex", you are working with bytes. Hex is just how bytes are traditionally represented. The hex helper function returns the hexadecimal representation of an integer as a string. But that isn't what you want. You want the byte corresponding to the value 11.
Note, for the ASCII range, chr(i) works as well, so:
>>> STRINGLENGTH = chr(11)
>>> endCommand = '\x00\x00\x00'+STRINGLENGTH+'\x02'
>>> endCommand
'\x00\x00\x00\x0b\x02'
But be careful, say you wanted the number 129, you have to care about the encoding...
>>> chr(129)
'\x81'
But in bytes, in UTF-8, that's actually represented by two different bytes
>>> chr(129).encode()
b'\xc2\x81'
>>> list(chr(129).encode())
[194, 129]
Which of course, depends on the encoding:
>>> chr(129).encode('latin')
b'\x81'
>>> list(chr(129).encode('latin'))
[129]
>>>
For that reason, I think it is safer to stick with the slightly wordier:
>>> bytes([129])
b'\x81'
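Putting this together, a sketch of building the whole message as bytes could look like the following. The frame layout is copied from the question; the helper name build_command and the 255-byte check are assumptions added for illustration.

```python
def build_command(command):
    """Frame a command as: 3 zero bytes, 1 length byte, 0x02, payload, CR LF.

    The layout is copied from the question; build_command is a hypothetical
    helper name. A single length byte limits the payload to 255 bytes.
    """
    payload = command.encode('ascii')
    if len(payload) > 255:
        raise ValueError('payload does not fit in one length byte')
    return b'\x00\x00\x00' + bytes([len(payload)]) + b'\x02' + payload + b'\r\n'

packet = build_command('STATUS?USER')
print(packet)   # b'\x00\x00\x00\x0b\x02STATUS?USER\r\n'
```

Because the result is already a bytes object, it can be handed to socket.sendall() without any further encoding step.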

Importing unicode characters from YAML to Python [duplicate]

I'm trying to write out to a flat file some Chinese, Russian or various other non-English character sets for testing purposes. I'm getting stuck on how to output a Unicode hexadecimal or decimal value as its corresponding character.
For example in Python, if you had a hard-coded set of characters like абвгдежзийкл you would assign value = u"абвгдежзийкл" and no problem.
If however you had a single decimal or hexadecimal value like 1081 / 0439 stored in a variable and you wanted to print that out as its corresponding actual character (and not just output 0x439), how would this be done? The Unicode decimal/hex value above refers to й.
Python 2: Use unichr():
>>> print(unichr(1081))
й
Python 3: Use chr():
>>> print(chr(1081))
й
So the answer to the question is:
convert the hexadecimal value to decimal with int(hex_value, 16),
then get the corresponding string with chr().
To sum up:
>>> print(chr(int('0x897F', 16)))
西
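The two steps can be wrapped in a small function. This is only a sketch for Python 3, and hex_to_char is a name invented for this example:

```python
def hex_to_char(hex_value):
    """Return the character for a hexadecimal code point such as '0439'."""
    # int(..., 16) accepts the digits with or without a leading '0x'.
    return chr(int(hex_value, 16))

print(hex_to_char('0439'))      # й
print(hex_to_char('0x897F'))    # 西
print(chr(1081))                # й (decimal input needs no conversion)
```

When writing such characters to a file, opening it with an explicit encoding, for example open(path, 'w', encoding='utf-8'), avoids depending on the platform default.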
While working on a project that included parsing some JSONs, I encountered a similar problem. I had a lot of strings that had all non-ASCII characters escaped like this:
>>> print(content)
\u0412\u044B j\u0435\u0441\u0442\u0435 \u0438\u0437 \u0420\u043E\u0441\u0441\u0438\u0438?
...
>>> print(content)
\u010Cemu jesi na\u010Dinal izu\u010Dati med\u017Euslovjansky jezyk?
Converting such mixes symbol-by-symbol with unichr() would be tedious. The solution I eventually decided on:
content.encode("utf8").decode("unicode-escape")
The first operation (encoding) produces bytestrings like this:
b'\\u0412\\u044B j\\u0435\\u0441\\u0442\\u0435 \\u0438\\u0437 \\u0420\\u043E\\u0441\\u0441\\u0438\\u0438?'
b'\\u010Cemu jesi na\\u010Dinal izu\\u010Dati med\\u017Euslovjansky jezyk?'
and the second operation (decoding) transforms the byte string into a Unicode string with \\ replaced by \, which "unpacks" the characters, giving a result like this:
Вы jесте из России?
Čemu jesi načinal izučati medžuslovjansky jezyk?
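One caveat with this trick: the unicode-escape codec decodes the remaining bytes as Latin-1, so text that already contains non-ASCII characters comes out garbled. A possible alternative for Python 3, sketched here with a hypothetical helper unescape_u, is to replace only the \uXXXX sequences with a regular expression. It does not combine surrogate pairs.

```python
import re

_U_ESCAPE = re.compile(r'\\u([0-9a-fA-F]{4})')

def unescape_u(text):
    """Replace literal \\uXXXX sequences with the characters they name.

    Hypothetical helper; it leaves all other characters untouched and does
    not combine surrogate pairs.
    """
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

print(unescape_u(r'\u010Cemu jesi na\u010Dinal'))   # Čemu jesi načinal
print(unescape_u(r'й and \u0439'))                  # й and й
```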
If you run into the error:
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
while trying to convert your hex value using unichr(), you can get around that error by doing something like:
>>> n = int('0001f600', 16)
>>> s = '\\U{:0>8X}'.format(n)
>>> s
'\\U0001F600'
>>> binary = s.decode('unicode-escape')
>>> print(binary)
😀
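That error is specific to narrow Python 2 builds. On Python 3.3 and later there is no narrow/wide distinction, so a sketch of the same conversion needs no workaround:

```python
n = int('0001f600', 16)
face = chr(n)        # chr() accepts code points above U+FFFF
print(face)          # 😀
print(len(face))     # 1
```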

Print bytes to hex

I want to encode a string to bytes.
To convert to bytes, I used bytes.fromhex():
>>> bytes.fromhex('7403073845')
b't\x03\x078E'
But it displayed some characters.
How can it be displayed as hex like the following?
b't\x03\x078E' => '\x74\x03\x07\x38\x45'
On "I want to encode string to bytes": bytes.fromhex() already transforms your hex string into bytes. Don't confuse an object with its text representation. The REPL uses sys.displayhook, which calls repr(), and the repr of a bytes object shows values in the printable ASCII range as the corresponding characters, but that doesn't affect the value in any way:
>>> b't' == b'\x74'
True
On "Print bytes to hex": to convert bytes back into a hex string, you can use the bytes.hex() method, available since Python 3.5:
>>> b't\x03\x078E'.hex()
'7403073845'
On older Python versions you could use binascii.hexlify():
>>> import binascii
>>> binascii.hexlify(b't\x03\x078E').decode('ascii')
'7403073845'
On "How can it be displayed as hex like following? b't\x03\x078E' => '\x74\x03\x07\x38\x45'":
>>> print(''.join(['\\x%02x' % b for b in b't\x03\x078E']))
\x74\x03\x07\x38\x45
The Python repr can't be changed. If you want to do something like this, you'd need to do it yourself; bytes objects are trying to minimize spew, not format output for you.
If you want to print it like that, you can do:
from itertools import repeat
hexstring = '7403073845'
# Makes the individual \x## strings using iter reuse trick to pair up
# hex characters, and prefixing with \x as it goes
escapecodes = map(''.join, zip(repeat(r'\x'), *[iter(hexstring)]*2))
# Print them all with quotes around them (or omit the quotes, your choice)
print("'", *escapecodes, "'", sep='')
Output is exactly as you requested:
'\x74\x03\x07\x38\x45'
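If the escaped form is needed in more than one place, the formatting can be wrapped in a function. The following is a Python 3 sketch; to_escaped_hex is a hypothetical name, and the result is display text only, not a new kind of bytes object.

```python
def to_escaped_hex(data):
    """Format every byte as a literal \\xNN escape (hypothetical helper)."""
    # Iterating over bytes yields integers, so each one can be formatted as hex.
    return ''.join('\\x{:02x}'.format(byte) for byte in data)

raw = bytes.fromhex('7403073845')
print(to_escaped_hex(raw))    # \x74\x03\x07\x38\x45
print(raw == b't\x03\x078E')  # True: the value itself never changed
```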

Printing Unicode elements in a loop

Consider this:
print u'\u2599'
I get
▙
which is what I need.
But when I try to run it in a loop like this:
for i in range(2500,2600):
    str1 = """u\'\\u""" + str(i) + '\''
    print str1
I just get an output like:
u'\u2500'
u'\u2501'
u'\u2502'
u'\u2503'
u'\u2504'
u'\u2505'
u'\u2506'
u'\u2507'
u'\u2508'
u'\u2509'
u'\u2510'
u'\u2511'
u'\u2512'
u'\u2513'
u'\u2514'
How do I get the code to print the Unicode values correctly in a loop?
I tried capturing the print output from the cmd prompt but it displays an error:
Unable to initialize device PRN
(which I researched and is probably because of the print command).
You are confusing literal syntax and the value it produces. You cannot produce a value and expect it to be treated as a literal, the same way that producing a string with '1' + '0' does not make the integer 10.
Use the unichr() function to convert an integer to a Unicode character, or use the unicode_escape codec to decode a bytestring containing Python literal syntax to a Unicode string:
>>> unichr(0x2599)
u'\u2599'
>>> print unichr(0x2599)
▙
>>> print '\\u2599'
\u2599
>>> print '\\u2599'.decode('unicode_escape')
▙
You are also missing the crucial detail that the \uhhhh syntax uses hexadecimal numbers. 2500 decimal is 9C4 in hexadecimal, and 2500 in hexadecimal is 9472 in decimal.
To produce your range of values then, you want to use the 0xhhhh Python literal notation to produce a sequence between 0x2500 hex and 0x2600 hex:
for codepoint in range(0x2500, 0x2600):
    print unichr(codepoint)
as that's easier to read and understand when using Unicode codepoints.
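The code in this thread is Python 2. In Python 3, unichr() no longer exists and chr() covers all code points, so a sketch of the same loop (the list name box_chars is only for illustration) might be:

```python
# Code points are hexadecimal, so the range uses 0x literals.
box_chars = [chr(codepoint) for codepoint in range(0x2500, 0x2600)]

print(len(box_chars))      # 256
print(box_chars[0])        # ─ (U+2500)
print(box_chars[0x99])     # ▙ (U+2599)
```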
for i in range(0x2500, 0x2600):
    print unichr(i)
Why on earth are you doing it like that?
If you're trying to print the code-points in that range you should do this:
for i in range(0x2500,0x2600):
    print unichr(i)
All you're doing in your code above is constructing a string with literal "\u" in it and a number ...
In [9]: for i in range(2500,2503):
   ...:     a="\\u"+str(i)
   ...:     print a.decode('unicode-escape')
   ...:
─
━
│

Convert hex-string to string using binascii

By hex-string, I mean a regular string in which every two characters represent a byte, which is mapped to an ASCII character.
So for example the string
abc
Would be represented as
979899
I am looking at the binascii module but don't really know how to take the hex-string and turn it back into the ascii string.
Which method can I use?
Note: I am starting with 979899 and want to convert it back to abc
You can use ord() to get the integer value of each character:
>>> map(ord, 'abc')
[97, 98, 99]
>>> ''.join(map(lambda c: str(ord(c)), 'abc'))
'979899'
>>> ''.join((str(ord(c)) for c in 'abc'))
'979899'
You don't need binascii to get the integer representation of a character in a string; all you need is the built-in function ord().
s = 'abc'
print(''.join(map(lambda x:str(ord(x)),s))) # outputs "979899"
To get the string back from its hexadecimal form (616263 for abc), you can use:
s = str(616263)
print "".join([chr(int(s[x:x+2], 16)) for x in range(0, len(s), 2)])  # outputs abc
See http://ideone.com/dupgs
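Since the question asks about binascii specifically, here is a Python 3 sketch of the round trip with that module. Note that 979899 is a concatenation of decimal codes, which is ambiguous once codes have different widths (97 versus 100); the fixed-width hex form of abc is 616263, and that is what binascii works with.

```python
import binascii

hex_text = binascii.hexlify(b'abc').decode('ascii')
print(hex_text)        # 616263

# unhexlify() turns pairs of hex digits back into bytes.
restored = binascii.unhexlify(hex_text).decode('ascii')
print(restored)        # abc
```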
