What does %xP mean in Python?

I came across this line of code and I'm having a tough time figuring out what %xP is doing.
result = "0x%xP"%address
Isn't %address a modulus or is this performing some kind of formatting?

It's a format string. It's a literal 0x followed by the address as a hexadecimal number (%x) followed by a literal P.
Some examples:
>>> '0x%xP' % 1
'0x1P'
>>> '0x%xP' % 10
'0xaP'
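For comparison, here is a minimal sketch of the same formatting written three ways. The name address and its value are made up for illustration. It also shows that % only acts as the modulo operator when the left operand is a number rather than a string.

```python
# Hypothetical value, chosen only for illustration.
address = 0xBEEF

# printf-style formatting: %x renders the integer as lowercase hex.
old_style = "0x%xP" % address

# The same result with str.format() and with an f-string.
new_style = "0x{:x}P".format(address)
f_string = f"0x{address:x}P"

print(old_style)   # 0xbeefP
print(17 % 5)      # 2 (with numbers on both sides, % is modulo)
```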

Related

How to print unicode character from a string variable?

I am new to the programming world, and I am a bit confused.
I expected both print calls to produce the same graphical Unicode exclamation mark symbol:
My experiment:
number = 10071
byteStr = number.to_bytes(4, byteorder='big')
hexStr = hex(number)
uniChar = byteStr.decode('utf-32be')
uniStr = '\\u' + hexStr[2:6]
print(f'{number} - {hexStr[2:6]} - {byteStr} - {uniChar}')
print(f'{uniStr}') # Not working
print(f'\u2757') # Working
Output:
10071 - 2757 - b"\x00\x00'W" - ❗
\u2757
❗
What is the difference between the last two lines?
Please, help me to understand it!
My environment is JupyterHub and v3.9 python.
An escape code is evaluated by the Python parser when it constructs a string from a literal. For example, the literals '马' and '\u9a6c' are evaluated by the parser as the same string of length 1.
You can (and did) build a string containing the 6 characters \u2757 by using an escape code for the backslash (\\), which prevents the parser from evaluating those 6 characters as an escape code. That is why it prints as the 6-character \u2757.
If you build a byte string with those 6 characters, you can decode it with .decode('unicode-escape') to get the character:
>>> b'\\u2757'.decode('unicode_escape')
'❗'
But it is easier to use the chr() function on the number itself:
>>> chr(0x2757)
'❗'
>>> chr(10071)
'❗'
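To make the difference concrete, here is a small sketch (Python 3) that compares the literal with the string built at run time, reusing the number from the question. The variable names are only illustrative.

```python
number = 10071

literal = '\u2757'                # the parser turns the escape into one character
built = '\\u' + hex(number)[2:]   # six separate characters assembled at run time

print(len(literal), len(built))   # 1 6

# Going from the number to the character needs no escape syntax at all.
from_number = chr(number)

# If only the escaped text is available, decode it explicitly.
from_text = built.encode('ascii').decode('unicode_escape')

print(literal == from_number == from_text)   # True
```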

Importing unicode characters from YAML to Python [duplicate]

I'm trying to write some Chinese, Russian, or other non-English character sets out to a flat file for testing purposes. I'm getting stuck on how to output a Unicode hexadecimal or decimal value as its corresponding character.
For example, in Python, if you had a hard-coded set of characters like абвгдежзийкл, you would assign value = u"абвгдежзийкл" and have no problem.
If, however, you had a single decimal or hexadecimal value like 1081 / 0439 stored in a variable and you wanted to print it out as its corresponding actual character (and not just output 0x439), how would this be done? The Unicode decimal/hex value above refers to й.
Python 2: Use unichr():
>>> print(unichr(1081))
й
Python 3: Use chr():
>>> print(chr(1081))
й
So the answer to the question is:
convert the hexadecimal value to an integer with int(hex_value, 16)
then get the corresponding string with chr().
To sum up:
>>> print(chr(int('0x897F', 16)))
西
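Since the question is about writing such characters to a flat file, here is a minimal Python 3 sketch of that step. The file name sample.txt and the temporary directory are assumptions made for the example; the one point it is meant to show is passing an explicit encoding to open().

```python
import os
import tempfile

# Code point from the question, given as a decimal int and as a hex string.
decimal_value = 1081
hex_value = '0439'

# Both spellings name the same character, so this is 'йй'.
chars = chr(decimal_value) + chr(int(hex_value, 16))

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')

# An explicit encoding avoids depending on the platform default.
with open(path, 'w', encoding='utf-8') as f:
    f.write(chars)

with open(path, encoding='utf-8') as f:
    round_tripped = f.read()

print(round_tripped == chars)   # True
```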
While working on a project that included parsing some JSONs, I encountered a similar problem. I had a lot of strings that had all non-ASCII characters escaped like this:
>>> print(content)
\u0412\u044B j\u0435\u0441\u0442\u0435 \u0438\u0437 \u0420\u043E\u0441\u0441\u0438\u0438?
...
>>> print(content)
\u010Cemu jesi na\u010Dinal izu\u010Dati med\u017Euslovjansky jezyk?
Converting such mixes symbol-by-symbol with unichr() would be tedious. The solution I eventually decided on:
content.encode("utf8").decode("unicode-escape")
The first operation (encoding) produces bytestrings like this:
b'\\u0412\\u044B j\\u0435\\u0441\\u0442\\u0435 \\u0438\\u0437 \\u0420\\u043E\\u0441\\u0441\\u0438\\u0438?'
b'\\u010Cemu jesi na\\u010Dinal izu\\u010Dati med\\u017Euslovjansky jezyk?'
and the second operation (decoding) transforms the byte string into Unicode string but with \\ replaced by \, which "unpacks" the characters, giving the result like this:
Вы jесте из России?
Čemu jesi načinal izučati medžuslovjansky jezyk?
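One caveat, sketched below in Python 3 with made-up sample strings: the unicode-escape codec reads non-escaped bytes as Latin-1, so any real non-ASCII characters already present in the text get mangled by the UTF-8 round trip. Encoding to ASCII with the backslashreplace error handler is one possible way around that. It is not the only one.

```python
escaped = '\\u0412\\u044b j\\u0435\\u0441\\u0442\\u0435'

# The approach from the answer works when the text is pure ASCII.
plain = escaped.encode('utf8').decode('unicode-escape')
print(plain)   # Вы jесте

# A string mixing a real non-ASCII character with an escaped one.
mixed = 'é \\u0412'

# The two UTF-8 bytes of 'é' are read back as two Latin-1 characters.
broken = mixed.encode('utf8').decode('unicode-escape')

# backslashreplace first turns 'é' into the ASCII escape \xe9,
# so the decode step restores it instead of mangling it.
fixed = mixed.encode('ascii', 'backslashreplace').decode('unicode-escape')

print(repr(broken))   # 'Ã© В'
print(repr(fixed))    # 'é В'
```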
If you run into the error:
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
while trying to convert your hex value using unichr() in Python 2, you can get around that error by doing something like:
>>> n = int('0001f600', 16)
>>> s = '\\U{:0>8X}'.format(n)
>>> s
'\\U0001F600'
>>> binary = s.decode('unicode-escape')
>>> print(binary)
😀
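For reference, a Python 3 sketch of the same conversion. There is no narrow/wide build distinction in Python 3, so chr() handles the code point directly. The escape-based route still works if the string is encoded to bytes first, since str has no .decode() method.

```python
n = int('0001f600', 16)

# Python 3 strings cover the full Unicode range, so chr() is enough.
direct = chr(n)

# The escape-based route, adapted to Python 3: encode to bytes, then decode.
s = '\\U{:0>8X}'.format(n)
via_escape = s.encode('ascii').decode('unicode-escape')

print(direct, via_escape)   # 😀 😀
```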

regex python pattern error

I am using the following regular expression pattern for searching in a text file :
1. Hexadecimal numbers (to find: 1a2bc3d4e5 or 2369.235.26.158963 or Aaa4)
2. Only the letter "a" or spaces. There may be "a", spaces, or a mixture of the two, but nothing else.
Below my regex for hexadecimal numbers :
matches = re.compile(' 0[xX][0-9a-fA-F]+ ')
Below my regex for second pattern :
matches = re.compile(r'^[a| ]*$')
Unfortunately, it does not work.
Thanks in advance for your help
Honestly, sometimes I think it's best when asking questions to include some of the actual input (or something close to it) and the desired output. For your hex numbers I'm wondering if you want to capture the 0x which precedes the value or avoid it; secondly variable length hex with your regex prototype (slightly corrected) would capture things like 'def', 'bad', etc. Anyway, having input and desired output helps with understanding the problem. The same can be said for people who answer.
With that said, for your first regex (because I couldn't understand what you wanted for the second), I tend to prefer using "findall" because it's more direct and yields group matching. So, with the following input (note that I'm creating a string in place of using the file.read() method, and making my regex capture strings of 4 or more characters):
Code
import re

# Sample text standing in for the contents of file.read()
text = '''This is a hex number 0xAF67E49
This is NOT a hex number tgey736zde
This hex number 0xb34df49a appears in the middle of a sentence
This could be a hex number but has no letters 3689320'''

# Any run of 4 or more hex digits, with or without a 0x prefix
matches1 = re.findall(r'([0-9a-fA-F]{4,})', text)
# Hex digits preceded by 0x, capturing only the digits
matches2 = re.findall(r'0x([0-9a-fA-F]{4,})', text)
# Hex digits preceded by 0x, capturing the prefix as well
matches3 = re.findall(r'(0x[0-9a-fA-F]{4,})', text)

print('matches1: %s' % (str(matches1)))
print('matches2: %s' % (str(matches2)))
print('matches3: %s' % (str(matches3)))
Output
matches1: ['AF67E49', 'b34df49a', '3689320']
matches2: ['AF67E49', 'b34df49a']
matches3: ['0xAF67E49', '0xb34df49a']
Explanation
matches1 indiscriminately matches anything that is 4 or more characters and falls within the hex range. Experiment with this by changing "tgey736zde" in the input to "tgey736de"
matches2 effectively says capture any hex string of 4 or more characters preceded by 0x, ignoring the 0x
matches3 effectively says capture any hex string of 4 or more characters preceded by 0x, but include the 0x
Extra Information
To make this more effective, you might want to research how to use lookaheads as well
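As a rough illustration of the lookahead idea, the sketch below uses a made-up sample string and a negative lookahead to skip values that run straight into a non-hex character. The last part assumes the second requirement in the question means "the whole string consists only of the letter a and spaces". That reading is a guess.

```python
import re

# Made-up sample: two clean hex values and one with a stray 'z' in it.
text = 'ok 0xAF67E49 bad 0xb34dfz9 ok 0x12'

# Without a lookahead, the match simply stops at the first non-hex character.
loose = re.findall(r'0x[0-9a-fA-F]+', text)

# (?!\w) rejects a candidate that is immediately followed by another
# word character, so the malformed value is skipped entirely.
strict = re.findall(r'\b0x[0-9a-fA-F]+(?!\w)', text)

print(loose)    # ['0xAF67E49', '0xb34df', '0x12']
print(strict)   # ['0xAF67E49', '0x12']

# Assumed reading of the second requirement: only "a" and spaces.
# Inside [...] a '|' is a literal character, not alternation,
# so it is left out of the character class here.
only_a_or_space = re.compile(r'[a ]*')

print(bool(only_a_or_space.fullmatch('a aa  a')))   # True
print(bool(only_a_or_space.fullmatch('a|a')))       # False
```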

How to convert byte string with non-printable chars to hexadecimal in python? [duplicate]

I have an ANSI string Ď–ór˙rXüď\ő‡íQl7 and I need to convert it to hexadecimal like this:
06cf96f30a7258fcef5cf587ed51156c37 (converted with XVI32).
The problem is that Python cannot encode all characters correctly (some of them are incorrectly displayed even here, on Stack Overflow) so I have to deal with them with a byte string.
So the above string is in bytes this: b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
And that's what I need to convert to hexadecimal.
So far I tried binascii with no success, I've tried this:
h = ""
for i in b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7':
    h += hex(i)
print(h)
It prints:
0x60xcf0x960xf30xa0x720x830xff0x720x580xfc0xef0x5c0xf50x870xed0x510x150x6c0x37
Okay. It looks like I'm getting somewhere... but what's up with the 0x thing?
When I remove 0x from the string like this:
h.replace("0x", "")
I get 6cf96f3a7283ff7258fcef5cf587ed51156c37 which looks like it's correct.
But sometimes the byte string has a 0 next to an x and it gets removed from the string, resulting in an incorrect hexadecimal string. (The string above is missing the 0 at the beginning.)
Any ideas?
If you're running Python 3.5+, the bytes type has a bytes.hex() method that returns the hexadecimal string representation:
>>> h = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> h.hex()
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
Otherwise, you can use binascii.hexlify() to do the same thing:
>>> import binascii
>>> binascii.hexlify(h).decode('utf8')
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
As per the documentation, hex() converts “an integer number to a lowercase hexadecimal string prefixed with ‘0x’.” So when using hex() you always get a 0x prefix. You will always have to remove that if you want to concatenate multiple hex representations.
But sometimes the byte string has a 0 next to an x and it gets removed from the string, resulting in an incorrect hexadecimal string. (The string above is missing the 0 at the beginning.)
That does not make any sense. x is not a valid hexadecimal character, so in your solution it can only be generated by the hex() call. And that, as said above, will always create a 0x. So the sequence 0x can never appear in a different way in your resulting string, so replacing 0x by nothing should work just fine.
The actual problem in your solution is that hex() does not enforce a two-digit result, as simply shown by this example:
>>> hex(10)
'0xa'
>>> hex(2)
'0x2'
So in your case, since the string starts with b'\x06', which represents the number 6, hex(6) only returns 0x6. You get a single digit there, which is the real cause of your problem.
What you can do is use format strings to perform the conversion to hexadecimal. That way you can both leave out the prefix and enforce a length of two digits. You can then use str.join to combine it all into a single hexadecimal string:
>>> value = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'
>>> ''.join(['{:02x}'.format(x) for x in value])
'06cf96f30a7283ff7258fcef5cf587ed51156c37'
This solution works not only with a bytes string but with any iterable of integers that can be formatted as hexadecimal (e.g. a list of integers):
>>> value = [1, 2, 3, 4]
>>> ''.join(['{:02x}'.format(x) for x in value])
'01020304'
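A quick way to check any of these conversions is to round-trip the result through bytes.fromhex(). The sketch below (Python 3.5+) also shows how many characters the hex()-and-strip approach from the question loses. The variable names are only illustrative.

```python
value = b'\x06\xcf\x96\xf3\nr\x83\xffrX\xfc\xef\\\xf5\x87\xedQ\x15l7'

as_hex = value.hex()                                 # Python 3.5+
joined = ''.join('{:02x}'.format(b) for b in value)  # format-string route
naive = ''.join(hex(b)[2:] for b in value)           # drops leading zeros

# Two hex digits per byte means the text converts back losslessly.
print(bytes.fromhex(as_hex) == value)   # True
print(as_hex == joined)                 # True
print(len(as_hex), len(naive))          # 40 38
```

The naive string is two characters short because the bytes 0x06 and 0x0a each lose their leading zero.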

Printing Unicode elements in a loop

Consider this:
print u'\u2599'
I get
▙
which is what I need.
But when I try to run it in a loop like this:
for i in range(2500,2600):
    str1 = """u\'\\u""" + str(i) + '\''
    print str1
I just get an output like:
u'\u2500'
u'\u2501'
u'\u2502'
u'\u2503'
u'\u2504'
u'\u2505'
u'\u2506'
u'\u2507'
u'\u2508'
u'\u2509'
u'\u2510'
u'\u2511'
u'\u2512'
u'\u2513'
u'\u2514'
How do I get the code to print the Unicode values correctly in a loop?
I tried capturing the print output from the cmd prompt but it displays an error:
Unable to initialize device PRN
(which I researched and is probably because of the print command).
You are confusing literal syntax and the value it produces. You cannot produce a value and expect it to be treated as a literal, the same way that producing a string with '1' + '0' does not make the integer 10.
Use the unichr() function to convert an integer to a Unicode character, or use the unicode_escape codec to decode a bytestring containing Python literal syntax to a Unicode string:
>>> unichr(0x2599)
u'\u2599'
>>> print unichr(0x2599)
▙
>>> print '\\u2599'
\u2599
>>> print '\\u2599'.decode('unicode_escape')
▙
You are also missing the crucial detail that the \uhhhh syntax uses hexadecimal numbers. 2500 decimal is 9C4 in hexadecimal, and 2500 in hexadecimal is 9472 in decimal.
To produce your range of values then, you want to use the 0xhhhh Python literal notation to produce a sequence between 0x2500 hex and 0x2600 hex:
for codepoint in range(0x2500, 0x2600):
    print unichr(codepoint)
as that's easier to read and understand when using Unicode codepoints.
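The snippets in this thread are Python 2. A Python 3 sketch of the same loop is below, limited to the first 16 code points to keep the output short. The U+XXXX label is just an illustrative addition.

```python
# Python 3: chr() replaces unichr(), and print is a function.
for codepoint in range(0x2500, 0x2510):
    print('U+{:04X} {}'.format(codepoint, chr(codepoint)))

# Decimal and hexadecimal readings of the same digits differ.
as_decimal = int('2500', 16)
as_hex_text = hex(2500)
print(as_decimal)    # 9472
print(as_hex_text)   # 0x9c4
```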
for i in range(0x2500, 0x2600):
    print unichr(i)
Why on earth are you doing it like that?
If you're trying to print the code-points in that range you should do this:
for i in range(0x2500,0x2600):
    print unichr(i)
All you're doing in your code above is constructing a string with literal "\u" in it and a number ...
In [9]: for i in range(2500,2503):
   ...:     a="\\u"+str(i)
   ...:     print a.decode('unicode-escape')
   ...:
─
━
│
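Note that decoding strings built from decimal numbers only appears to work because the decimal digits get reinterpreted as hexadecimal. A small Python 3 sketch of what that costs over the full range from the question:

```python
# Treat each decimal number's digits as hex, as the escape-decoding loop does.
reached = {int(str(i), 16) for i in range(2500, 2600)}
wanted = set(range(0x2500, 0x2600))

print(len(reached), len(wanted))              # 100 256
print(0x250A in reached, 0x2599 in reached)   # False True
```

Every code point whose hexadecimal form contains a letter (A to F) is skipped, which is why iterating over range(0x2500, 0x2600) with unichr() or chr() is the more reliable approach.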
