python string.format and invalid \x escape - python

I'm trying to format a string with some \x inside in python.
When I use:
print '\x00\x00\xFF\x00'
it works nicely and prints �. But when I try to format the string:
print '\x{}\x{}\x{}\x{}'.format('00','00','FF','00')
I get this error:
ValueError: invalid \x escape
The problem is that when I escape the backslash like this:
print '\\x{}\\x{}\\x{}\\x{}'.format('00','00','FF','00')
It prints:
\x00\x00\xFF\x00
And not the little � like the non-formatted string.
chr and bytearray seem interesting for example:
print chr(0x00),chr(0x00),chr(0xFF),chr(0x00) or print bytearray([0x00, 0x00, 0xFF, 0x00])
prints �, but when I try to format them, I get a SyntaxError.
I found some interesting posts like:
Why can't Python's string.format pad with "\x00"?
Converting int to bytes in Python 3
But I'm still stuck...
How to print a formatted string with \x inside?
(I'm using Python 2.7, but I can use another version.)
Thank you

The objective is to create a format string that will print characters, given string representations of hex values that correspond to unicode code points, so that something like this
for var1 in 'FF','00','38':
    print '\x{}\x{}\x{}\x{}'.format(var1,'00','FF','00')
will output
��
�
8�
The trick is to convert the hex values to integers, using the int builtin function, then use the c string format code to convert the integer value to the corresponding unicode character.
for v in ('ff', '00', '38'):
    print '{:c}{:c}{:c}{:c}'.format(*[int(x, 16) for x in [v, '00', 'ff', '00']])
��
�
8�
From the docs:
c: Character. Converts the integer to the corresponding unicode character before printing.
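For readers on Python 3, where str and bytes are separate types, the same idea can be sketched by building the raw bytes directly. This is a minimal illustration, not part of the original answer; the helper name hex_strings_to_bytes is hypothetical.

```python
def hex_strings_to_bytes(hex_values):
    """Turn hex-digit strings such as 'FF' into the raw bytes they name.

    Hypothetical helper for illustration; each value must be in 0..255.
    """
    return bytes(int(h, 16) for h in hex_values)


for var1 in ("FF", "00", "38"):
    payload = hex_strings_to_bytes([var1, "00", "FF", "00"])
    # print() shows the b'...' representation, which makes the bytes visible.
    print(payload)
```

To emit the bytes themselves rather than their representation, sys.stdout.buffer.write(payload) can be used in Python 3.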

Related

Importing unicode characters from YAML to Python [duplicate]

I'm trying to write out to a flat file some Chinese, Russian, or various other non-English character sets for testing purposes. I'm getting stuck on how to output a Unicode hexadecimal or decimal value as its corresponding character.
For example in Python, if you had a hard-coded set of characters like абвгдежзийкл you would assign value = u"абвгдежзийкл" and no problem.
If however you had a single decimal or hexadecimal value like 1081 / 0439 stored in a variable and you wanted to print that out as its corresponding actual character (and not just output 0x439), how would this be done? The Unicode decimal/hex value above refers to й.
Python 2: Use unichr():
>>> print(unichr(1081))
й
Python 3: Use chr():
>>> print(chr(1081))
й
So the answer to the question is:
convert the hexadecimal value to an integer with int(hex_value, 16)
then get the corresponding string with chr().
To sum up:
>>> print(chr(int('0x897F', 16)))
西
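The two steps can be wrapped in a small helper. The sketch below assumes Python 3; the function name codepoint_to_char and the choice to accept a "U+" prefix are illustrative assumptions, not something from the answers above.

```python
def codepoint_to_char(value):
    """Return the character for a code point given as an int or a hex string.

    Hypothetical helper: accepts 1081, '0439', '0x0439' or 'U+0439'.
    """
    if isinstance(value, int):
        return chr(value)
    text = value.strip()
    if text.upper().startswith("U+"):
        text = text[2:]
    # int(..., 16) already accepts an optional '0x' prefix.
    return chr(int(text, 16))


print(codepoint_to_char(1081))
print(codepoint_to_char("0x897F"))
```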
While working on a project that included parsing some JSONs, I encountered a similar problem. I had a lot of strings that had all non-ASCII characters escaped like this:
>>> print(content)
\u0412\u044B j\u0435\u0441\u0442\u0435 \u0438\u0437 \u0420\u043E\u0441\u0441\u0438\u0438?
...
>>> print(content)
\u010Cemu jesi na\u010Dinal izu\u010Dati med\u017Euslovjansky jezyk?
Converting such mixes symbol-by-symbol with unichr() would be tedious. The solution I eventually decided on:
content.encode("utf8").decode("unicode-escape")
The first operation (encoding) produces bytestrings like this:
b'\\u0412\\u044B j\\u0435\\u0441\\u0442\\u0435 \\u0438\\u0437 \\u0420\\u043E\\u0441\\u0441\\u0438\\u0438?'
b'\\u010Cemu jesi na\\u010Dinal izu\\u010Dati med\\u017Euslovjansky jezyk?'
and the second operation (decoding) transforms the byte string into a Unicode string with \\ replaced by \, which "unpacks" the characters and gives a result like this:
Вы jесте из России?
Čemu jesi načinal izučati medžuslovjansky jezyk?
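One caveat with the encode/decode route is that the unicode-escape codec decodes the remaining bytes as Latin-1, so any genuine non-ASCII characters already present in the string would be mangled. A possible alternative, sketched here for Python 3 under the assumption that only four-digit \uXXXX sequences need unpacking, is a regular-expression substitution. The name unescape_u is hypothetical, and surrogate pairs are not recombined.

```python
import re

# Matches a literal backslash, 'u', and exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_u(text):
    """Replace each literal \\uXXXX sequence with the character it names."""
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


content = r"\u010Cemu jesi na\u010Dinal izu\u010Dati med\u017Euslovjansky jezyk?"
print(unescape_u(content))
```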
If you run into this error while trying to convert your hex value using unichr:
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
you can get around it by doing something like:
>>> n = int('0001f600', 16)
>>> s = '\\U{:0>8X}'.format(n)
>>> s
'\\U0001F600'
>>> binary = s.decode('unicode-escape')
>>> print(binary)
😀
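For context, a narrow Python 2 build stores code points above U+FFFF as a UTF-16 surrogate pair, which is why unichr() refuses them there. The following Python 3 sketch shows the arithmetic behind such a pair; the function name to_surrogate_pair is an illustrative assumption. In Python 3.3 and later, chr() accepts the full range directly.

```python
def to_surrogate_pair(codepoint):
    """Split a code point above U+FFFF into its UTF-16 high/low surrogates."""
    if not 0x10000 <= codepoint <= 0x10FFFF:
        raise ValueError("code point is not outside the Basic Multilingual Plane")
    offset = codepoint - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits
    return high, low


n = int("0001f600", 16)
print([hex(part) for part in to_surrogate_pair(n)])
print(chr(n))  # Python 3 handles the whole range without workarounds
```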

Convert from ASCII to Hex in Python

I'm trying to convert a string with special characters from ASCII to Hex using python, but it doesn't seem that I'm getting the correct value, noting that it works just fine whenever I try to convert a string that has no special characters. So basically here is what I'm doing:
import binascii
s = "D`Cزف³›"
s_bytes = str.encode(s)
hex_value = str(binascii.hexlify(s_bytes),'ascii')
print (hex_value)
Output
446043d8b2d981c2b316e280ba
Where the output should be (using online converter https://www.rapidtables.com/convert/number/ascii-to-hex.html):
446043632641b3203a
str.encode(s) defaults to utf8 encoding, which doesn't give you the byte values needed to get the desired output. The values you want are simply Unicode ordinals as hexadecimal values, so get the ordinal, convert to hex and join them all together:
s = 'D`Cزف³›'
h = ''.join([f'{ord(c):x}' for c in s])
print(h)
446043632641b3203a
Just realize that Unicode ordinals can be 1-6 hexadecimal digits long, so there is no easy way to reverse the process since you have no spacing of the numbers.
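If the conversion needs to be reversible, one option is to keep a separator between the ordinals. This is a minimal sketch under that assumption (Python 3); the function names and the space separator are hypothetical choices, not part of the answer above.

```python
def to_hex_ordinals(text, sep=" "):
    """Join the hexadecimal Unicode ordinals of text with a separator."""
    return sep.join(f"{ord(c):x}" for c in text)


def from_hex_ordinals(hex_text, sep=" "):
    """Reverse to_hex_ordinals; relies on the separator to split the digits."""
    return "".join(chr(int(part, 16)) for part in hex_text.split(sep) if part)


sample = "D`C\u0632\u0641\u00b3\u203a"
encoded = to_hex_ordinals(sample)
print(encoded)
print(from_hex_ordinals(encoded) == sample)
```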

How to remove '\x' from a hex string in Python?

I'm reading a wav audio file in Python using the wave module. The readframes() function in this library returns frames as a hex string. I want to remove the \x from this string, but the translate() function doesn't work as I want:
>>> input = wave.open(r"G:\Workspace\wav\1.wav",'r')
>>> input.readframes (1)
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\\x')
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\x')
ValueError: invalid \x escape
>>> '\xff\x1f\x00\xe8'.translate(None,r'\x')
'\xff\x1f\x00\xe8'
>>>
Anyway, I want to divide the resulting values by 2, then add \x again and generate a new wav file containing these new values. Does anyone have a better idea?
What's wrong?
Indeed, you don't have backslashes in your string. So, that's why you can't remove them.
If you inspect each character of this string (using the ord() and len() functions), you'll see their real values. Besides, the length of your string is just 4, not 16.
You can play with several solutions to achieve your result:
'hex' encode:
'\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
Or use repr() function:
repr('\xff\x1f\x00\xe8').translate(None,r'\\x')
One way to do what you want is:
>>> s = '\xff\x1f\x00\xe8'
>>> ''.join('%02x' % ord(c) for c in s)
'ff1f00e8'
The reason why translate is not working is that what you are seeing is not the string itself, but its representation. In other words, \x is not contained in the string:
>>> '\\x' in '\xff\x1f\x00\xe8'
False
\xff, \x1f, \x00 and \xe8 are the hexadecimal representations of four characters (in fact, len(s) == 4, not 16).
Use the encode method:
>>> s = '\xff\x1f\x00\xe8'
>>> print s.encode("hex")
ff1f00e8
As this is a hexadecimal representation, encode with hex:
>>> '\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
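Since the underlying goal was to halve the sample values, working on the numbers may be simpler than editing hex text. The sketch below targets Python 3 and assumes 16-bit signed little-endian samples, which should be checked with getsampwidth() on the real file. The function name halve_samples is hypothetical.

```python
import struct


def halve_samples(frames):
    """Halve every sample in a bytes object of 16-bit signed LE samples.

    Assumption: sample width is 2 bytes; len(frames) must be even.
    """
    count = len(frames) // 2
    fmt = "<%dh" % count                 # e.g. '<2h' for two samples
    samples = struct.unpack(fmt, frames)
    halved = [s // 2 for s in samples]   # floor division keeps values in range
    return struct.pack(fmt, *halved)


frames = b"\xff\x1f\x00\xe8"
print(frames.hex())                  # Python 3 equivalent of encode('hex')
print(halve_samples(frames).hex())
```

The resulting bytes can be passed to writeframes() on a wave object opened for writing.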

Printing Unicode elements in a loop

Consider this:
print u'\u2599'
I get
▙
something like this, which is what I need
But when I try to run it in a loop like this :
for i in range(2500,2600):
    str1 = """u\'\\u""" + str(i) + '\''
    print str1
I just get an output like:
u'\u2500'
u'\u2501'
u'\u2502'
u'\u2503'
u'\u2504'
u'\u2505'
u'\u2506'
u'\u2507'
u'\u2508'
u'\u2509'
u'\u2510'
u'\u2511'
u'\u2512'
u'\u2513'
u'\u2514'
How do I get the code to print the Unicode values correctly in a loop?
I tried capturing the print output from the cmd prompt but it displays an error:
Unable to initialize device PRN
(which I researched and is probably because of the print command).
You are confusing literal syntax and the value it produces. You cannot produce a value and expect it to be treated as a literal, the same way that producing a string with '1' + '0' does not make the integer 10.
Use the unichr() function to convert an integer to a Unicode character, or use the unicode_escape codec to decode a bytestring containing Python literal syntax to a Unicode string:
>>> unichr(0x2599)
u'\u2599'
>>> print unichr(0x2599)
▙
>>> print '\\u2599'
\u2599
>>> print '\\u2599'.decode('unicode_escape')
▙
You are also missing the crucial detail that the \uhhhh syntax uses hexadecimal numbers. 2500 decimal is 9C4 in hexadecimal, and 2500 in hexadecimal is 9472 in decimal.
To produce your range of values then, you want to use the 0xhhhh Python literal notation to produce a sequence between 0x2500 hex and 0x2600 hex:
for codepoint in range(0x2500, 0x2600):
    print unichr(codepoint)
as that's easier to read and understand when using Unicode codepoints.
for i in range(0x2500, 0x2600):
    print unichr(i)
Why on earth are you doing it like that?
If you're trying to print the code-points in that range you should do this:
for i in range(0x2500,0x2600):
    print unichr(i)
All you're doing in your code above is constructing a string with literal "\u" in it and a number ...
In [9]: for i in range(2500,2503):
   ...:     a="\\u"+str(i)
   ...:     print a.decode('unicode-escape')
   ...:
─
━
│
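The decimal-versus-hexadecimal point can be made explicit by parsing the range bounds as base 16 before looping. A minimal Python 3 sketch follows, where chr() plays the role of unichr(); the function name chars_in_range is hypothetical.

```python
def chars_in_range(start_hex, stop_hex):
    """Return the characters for code points start..stop-1, bounds given in hex."""
    start = int(start_hex, 16)  # '2500' is read as 0x2500, not decimal 2500
    stop = int(stop_hex, 16)
    return [chr(cp) for cp in range(start, stop)]


# Print the block 16 characters per row.
chars = chars_in_range("2500", "2600")
for row_start in range(0, len(chars), 16):
    print("".join(chars[row_start:row_start + 16]))
```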

Format hex digits for character code \x

I'm new to Python's \x00 string formatting and I was wondering if there is a nice pythonic way of doing something like below? I would like to dynamically insert the \x formatting into my python strings.
# Is there a way get a similar effect to this?
s = "commandstring \x{}\x{}".format('00', '00')
s = "commandstring \x%s\x%s" % ('00', '00')
Some of my strings will be regular text and numbers, but I also need to insert the Hex values
\x00 represents a single byte. Produce those single bytes directly:
>>> "commandstring {}{}".format('\x00', '\x00')
'commandstring \x00\x00'
or use the chr() function to produce the byte given an integer:
>>> "commandstring {}{}".format(chr(0), chr(0))
'commandstring \x00\x00'
The \xhh notation is syntax that can only be used in a string literal. You could construct the syntax and then have Python explicitly interpret it with eval(), ast.literal_eval() or the string_escape codec, but that is usually not what you need in the first place.
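To contrast the two routes, here is a small Python 3 sketch. Both function names are hypothetical, the literal-based variant assumes two-digit hex values and a prefix without quotes or backslashes, and it is shown only to illustrate why producing the characters directly is the simpler option.

```python
import ast


def build_command(prefix, hex_values):
    """Direct route: turn each hex string into its character and append it."""
    return prefix + "".join(chr(int(h, 16)) for h in hex_values)


def build_command_via_literal(prefix, hex_values):
    """Indirect route: assemble literal syntax, then let Python interpret it."""
    escaped = "".join("\\x" + h for h in hex_values)
    return ast.literal_eval("'" + prefix + escaped + "'")


direct = build_command("commandstring ", ["00", "00"])
indirect = build_command_via_literal("commandstring ", ["00", "00"])
print(repr(direct))
print(direct == indirect)
```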
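I think what you want here is the %x placeholder, which formats an integer as hexadecimal digits. The backslashes have to be doubled, because a bare \x followed by % is itself an invalid \x escape:
s = "commandstring \\x%x\\x%x" % (50, 95)
It will give you the text
commandstring \x32\x5f
You need to pass integers for it to work, and the result contains the four literal characters \x32 rather than the single byte 0x32.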
