How to print unicode from a generator expression in python?

How to print unicode from a generator expression in python? - python

Create a list from generator expression:
V = [('\\u26' + str(x)) for x in range(63,70)]
First issue: if you try to use just "\u" + str(...) it gives a decoder error right away. Seems like it tries to decode immediately upon seeing the \u instead of when a full chunk is ready. I am trying to work around that with double backslash.
Second, that creates something promising but still cannot actually print them as unicode to console:
>>> print([v[0:] for v in V])
['\\u2663', '\\u2664', '\\u2665', .....]
>>> print(V[0])
\u2663
What I would expect to see is a list of symbols that look identical to when using commands like '\u0123' such as:
>>> print('\u2663')
♣
Any way to do that from a generated list? Or is there a better way to print them instead of the '\u0123' format?
This is NOT what I want. I want to see the actual symbols drawn, not the Unicode values:
>>> print(['{}'.format(v[0:]) for v in V])
['\\u2663', '\\u2664', '\\u2665', '\\u2666', '\\u2667', '\\u2668', '\\u2669']

Unicode is a character to bytes encoding, not escape sequences. Python 3 strings are Unicode. To return the character that corresponds to a Unicode code point use chr :
chr(i)
Return the string representing a character whose Unicode code point is the integer i. For example, chr(97) returns the string 'a', while chr(8364) returns the string '€'. This is the inverse of ord().
The valid range for the argument is from 0 through 1,114,111 (0x10FFFF in base 16). ValueError will be raised if i is outside that range.
To generate the characters between 2663 and 2670:
>>> [chr(x) for x in range(2663,2670)]
['੧', '੨', '੩', '੪', '੫', '੬', '੭']
Escape sequences use hexadecimal notation though. 0x2663 is 9827 in decimal, and 0x2670 becomes 9840.
>>> [chr(x) for x in range(9827,9840)]
['♣', '♤', '♥', '♦', '♧', '♨', '♩', '♪', '♫', '♬', '♭', '♮', '♯']
You can use also use hex numeric literals:
>>> [chr(x) for x in range(0x2663,0x2670)]
['♣', '♤', '♥', '♦', '♧', '♨', '♩', '♪', '♫', '♬', '♭', '♮', '♯']
or, to use exactly the same logic as the question
>>> [chr(0x2600 + x) for x in range(0x63,0x70)]
['♣', '♤', '♥', '♦', '♧', '♨', '♩', '♪', '♫', '♬', '♭', '♮', '♯']
The reason the original code doesn't work is that escape sequences are used to represent a single character in a string when we can't or don't want to type the character itself. The interpreter or compiler replaces them with the corresponding character immediatelly. The string \\u26 is an escaped \ followed by u, 2 and 6:
>>> len('\\u26')
4

Related

Converting a hex values to ASCII

What is the most 'Pythonic' way of translating
'\xff\xab\x12'
into
'ffab12'
I looked for functions that can do it, but they all want to translate to ASCII (so '\x40' to 'a'). I want to have the hexadecimal digits in ASCII.

There's a module called binascii that contains functions for just this:
>>> import binascii
>>> binascii.hexlify('\xff\xab\x12')
'ffab12'
>>> binascii.unhexlify('ffab12')
'\xff\xab\x12'

original = '\xff\xab\x12'
result = original.replace('\\x', '')
print result
It's \x because it's escaped. a.replace(b,c) just replaces all occurances of b with c in a.
What you want is not ascii, because ascii translates 0x41 to 'A'. You just want it in hexadecimal base without the \x (or 0x, in some cases)
Edit!!
Sorry, I thought the \x is escaped. So, \x followed by 2 hex digits represents a single char, not 4..
print "\x41"
Will print
A
So what we have to do is to convert each char to hex, then print it like that:
res = ""
for i in original:
res += hex(ord(i))[2:].zfill(2)
print res
Now let's go over this line:
hex(ord(i))[2:]
ord(c) - returns the numerical value of the char c
hex(i) - returns the hex string value of the int i (e.g if i=65 it will return 0x41.
[2:] - cutting the "0x" prefix out of the hex string.
.zfill(2) - padding with zeroes
So, making that with a list comprehension will be much shorter:
result = "".join([hex(ord(c))[2:].zfill(2) for c in original])
print result

How to convert byte string with non-printable chars to hexadecimal in python? [duplicate]

This question already has answers here:
What's the correct way to convert bytes to a hex string in Python 3?
(9 answers)
Closed 7 years ago.
I have an ANSI string Ď–ór˙rXüď\ő‡íQl7 and I need to convert it to hexadecimal like this:
06cf96f30a7258fcef5cf587ed51156c37 (converted with XVI32).
The problem is that Python cannot encode all characters correctly (some of them are incorrectly displayed even here, on Stack Overflow) so I have to deal with them with a byte string.
So the above string is in bytes this: b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
And that's what I need to convert to hexadecimal.
So far I tried binascii with no success, I've tried this:
h = ""
for i in b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7':
h += hex(i)
print(h)
It prints:
0x60xcf0x960xf30xa0x720x830xff0x720x580xfc0xef0x5c0xf50x870xed0x510x150x6c0x37
Okay. It looks like I'm getting somewhere... but what's up with the 0x thing?
When I remove 0x from the string like this:
h.replace("0x", "")
I get 6cf96f3a7283ff7258fcef5cf587ed51156c37 which looks like it's correct.
But sometimes the byte string has a 0 next to a x and it gets removed from the string resulting in a incorrect hexadecimal string. (the string above is missing the 0 at the beginning).
Any ideas?

If you're running python 3.5+, bytes type has an new bytes.hex() method that returns string representation.
>>> h = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> h.hex()
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
Otherwise you can use binascii.hexlify() to do the same thing
>>> import binascii
>>> binascii.hexlify(h).decode('utf8')
'06cf96f30a7283ff7258fcef5cf587ed51156c37'

As per the documentation, hex() converts “an integer number to a lowercase hexadecimal string prefixed with ‘0x’.” So when using hex() you always get a 0x prefix. You will always have to remove that if you want to concatenate multiple hex representations.
But sometimes the byte string has a 0 next to a x and it gets removed from the string resulting in a incorrect hexadecimal string. (the string above is missing the 0 at the beginning).
That does not make any sense. x is not a valid hexadecimal character, so in your solution it can only be generated by the hex() call. And that, as said above, will always create a 0x. So the sequence 0x can never appear in a different way in your resulting string, so replacing 0x by nothing should work just fine.
The actual problem in your solution is that hex() does not enforce a two-digit result, as simply shown by this example:
>>> hex(10)
'0xa'
>>> hex(2)
'0x2'
So in your case, since the string starts with b\x06 which represents the number 6, hex(6) only returns 0x6, so you only get a single digit here which is the real cause of your problem.
What you can do is use format strings to perform the conversion to hexadecimal. That way you can both leave out the prefix and enforce a length of two digits. You can then use str.join to combine it all into a single hexadecimal string:
>>> value = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> ''.join(['{:02x}'.format(x) for x in value])
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
This solution does not only work with a bytes string but with really anything that can be formatted as a hexadecimal string (e.g. an integer list):
>>> value = [1, 2, 3, 4]
>>> ''.join(['{:02x}'.format(x) for x in value])
'01020304'

How to remove '\x' from a hex string in Python?

I'm reading a wav audio file in Python using wave module. The readframe() function in this library returns frames as hex string. I want to remove \x of this string, but translate() function doesn't work as I want:
>>> input = wave.open(r"G:\Workspace\wav\1.wav",'r')
>>> input.readframes (1)
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\\x')
'\xff\x1f\x00\xe8'
>>> '\xff\x1f\x00\xe8'.translate(None,'\x')
ValueError: invalid \x escape
>>> '\xff\x1f\x00\xe8'.translate(None,r'\x')
'\xff\x1f\x00\xe8'
>>>
Any way I want divide the result values by 2 and then add \x again and generate a new wav file containing these new values. Does any one have any better idea?
What's wrong?

Indeed, you don't have backslashes in your string. So, that's why you can't remove them.
If you try to play with each hex character from this string (using ord() and len() functions - you'll see their real values. Besides, the length of your string is just 4, not 16.
You can play with several solutions to achieve your result:
'hex' encode:
'\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'
Or use repr() function:
repr('\xff\x1f\x00\xe8').translate(None,r'\\x')

One way to do what you want is:
>>> s = '\xff\x1f\x00\xe8'
>>> ''.join('%02x' % ord(c) for c in s)
'ff1f00e8'
The reason why translate is not working is that what you are seeing is not the string itself, but its representation. In other words, \x is not contained in the string:
>>> '\\x' in '\xff\x1f\x00\xe8'
False
\xff, \x1f, \x00 and \xe8 are the hexadecimal representation of for characters (in fact, len(s) == 4, not 24).

Use the encode method:
>>> s = '\xff\x1f\x00\xe8'
>>> print s.encode("hex")
'ff1f00e8'

As this is a hexadecimal representation, encode with hex
>>> '\xff\x1f\x00\xe8'.encode('hex')
'ff1f00e8'

Printing Unicode elements in a loop

Consider this:
print u'\u2599'
I get
▙
something like this, which is what I need
But when I try to run it in a loop like this :
for i in range(2500,2600):
str1 = """u\'\\u""" + str(i) + '\''
print str1
I just get an output like:
u'\u2500'
u'\u2501'
u'\u2502'
u'\u2503'
u'\u2504'
u'\u2505'
u'\u2506'
u'\u2507'
u'\u2508'
u'\u2509'
u'\u2510'
u'\u2511'
u'\u2512'
u'\u2513'
u'\u2514'
How do I get the code to print the Unicode values correctly in a loop?
I tried capturing the print output from the cmd prompt but it displays an error:
Unable to initialize device PRN
(which I researched and is probably because of the print command).

You are confusing literal syntax and the value it produces. You cannot produce a value and expect it to be treated as a literal, the same way that producing a string with '1' + '0' does not make the integer 10.
Use the unichr() function to convert an integer to a Unicode character, or use the unicode_escape codec to decode a bytestring containing Python literal syntax to a Unicode string:
>>> unichr(0x2599)
u'\u2599'
>>> print unichr(0x2599)
▙
>>> print '\\u2599'
\u2599
>>> print '\\u2599'.decode('unicode_escape')
▙
You are also missing the crucial detail that the \uhhhh syntax uses hexadecimal numbers. 2500 decimal is 9C4 in hexadecimal, and 2500 in hexadecimal is 9472 in decimal.
To produce your range of values then, you want to use the 0xhhhh Python literal notation to produce a sequence between 0x2500 hex and 0x2600 hex:
for codepoint in range(0x2500, 0x2600):
print unichr(codepoint)
as that's easier to read and understand when using Unicode codepoints.

for i in range(0x2500, 0x2600):
print unichr(i)

Why on earth are you doing it like that?
If you're trying to print the code-points in that range you should do this:
for i in range(0x2500,0x2600):
print unichr(i)
All you're doing in your code above is constructing a string with literal "\u" in it and a number ...

In [9]: for i in range(2500,2503):
a="\\u"+str(i)
print a.decode('unicode-escape')
...:
─
━
│

as I hold the string of hexadecimal format without encoding the string?

The general problem is that I need the hexadecimal string stays in that format to assign it to a variable and not to save the coding?
no good:
>>> '\x61\x74'
'at'
>>> a = '\x61\x74'
>>> a
'at'
works well, but is not as:
>>> '\x61\x74'
'\x61\x74' ????????
>>> a = '\x61\x74'
>>> a
'\x61\x74' ????????

Use r prefix (explained on SO)
a = r'\x61\x74'
b = '\x61\x74'
print (a) #prints \x61\x74
print (b) # prints at

It is the same data. Python lets you specify a literal string using different methods, one of which is to use escape codes to represent bytes.
As such, '\x61' is the same character value as 'a'. Python just chooses to show printable ASCII characters as printable ASCII characters instead of the escape code, just because that makes working with bytestrings that much easier.
If you need the literal slash, x character and the two digit 6 and 1 characters (so a string of length 4), you need to double the slash or use raw strings.
To illustrate:
>>> '\x61' == 'a' # two notations for the same value
True
>>> len('\x61') # it's just 1 character
1
>>> '\\x61' # escape the escape
'\\x61'
>>> r'\x61' # or use a raw literal instead
'\\x61'
>>> len('\\x61') # which produces 4 characters
4

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to print unicode from a generator expression in python? - python

Related

Converting a hex values to ASCII

How to convert byte string with non-printable chars to hexadecimal in python? [duplicate]

How to remove '\x' from a hex string in Python?

Printing Unicode elements in a loop

as I hold the string of hexadecimal format without encoding the string?

Categories

Resources