Converting repr to Hex - python

I received the following string.How can it be converted to hex value='(\xd2M\x00\x18\x00\x18\x80\x00\x80\x00\x00\x00\x00\x00\x00\xe0\xd2\xe0\xd2.\xd2\x00\x00\x00\x00\x00\x00\n\x00\x18\x00&\x00\x00\x00\x00\x00\x00\x00\x0f0\xfe/\x010\xff/\x000\xff/\x000\xff/\xff/\xff/\xff/\xff/\x000\xff/\xff/\xff/\x000\x000\xff/\x000\x000\x000\xff/\xff/\x000\x000\xff/\x000\xad\xff\x0c\x00\xdd\xff\xc2\xff\xd3\xff\xde\xff\xe9\xff\xca\xff\xd8\xff\xe6\xff\xb5\xff\xb2\xff\xe6\xff\x92\xff\xd0\xff\xa0\xff\xbd\xff\xb4\xff\x82\xff\x90\xfff\xff\xe1\xff\x9f\xff\x94\xff\xd4\xff\xa4\xff\xbb\xff\xe8\xff\x00\x00\x02\x00\xff\x7f\xff\x7f\x97\xff\xd0\xff\xb7\xff~\xffG\xff\xa1\xff\xa1\xff\xcd\xab\x00\x00A\n\x00\x00'

That's not a hex string. You are confusing the Python repr() output for a bytestring, which aims to make debugging easier, with the contents.
Each \xhh is a standard Python string literal escape sequence, and displaying the string like this makes it trivial to copy and paste into another Python session to reproduce the exact same value.
You don't need to hex decode this at all.
An actual hex string consists only of the digits 0 through to 9, and the letters a through to f (upper or lowercase). Your value, converted to hex, looks like this:
>>> value='(\xd2M\x00\x18\x00\x18\x80\x00\x80\x00\x00\x00\x00\x00\x00\xe0\xd2\xe0\xd2.\xd2\x00\x00\x00\x00\x00\x00\n\x00\x18\x00&\x00\x00\x00\x00\x00\x00\x00\x0f0\xfe/\x010\xff/\x000\xff/\x000\xff/\xff/\xff/\xff/\xff/\x000\xff/\xff/\xff/\x000\x000\xff/\x000\x000\x000\xff/\xff/\x000\x000\xff/\x000\xad\xff\x0c\x00\xdd\xff\xc2\xff\xd3\xff\xde\xff\xe9\xff\xca\xff\xd8\xff\xe6\xff\xb5\xff\xb2\xff\xe6\xff\x92\xff\xd0\xff\xa0\xff\xbd\xff\xb4\xff\x82\xff\x90\xfff\xff\xe1\xff\x9f\xff\x94\xff\xd4\xff\xa4\xff\xbb\xff\xe8\xff\x00\x00\x02\x00\xff\x7f\xff\x7f\x97\xff\xd0\xff\xb7\xff~\xffG\xff\xa1\xff\xa1\xff\xcd\xab\x00\x00A\n\x00\x00'
>>> import binascii
>>> binascii.hexlify(value)
'28d24d00180018800080000000000000e0d2e0d22ed20000000000000a00180026000000000000000f30fe2f0130ff2f0030ff2f0030ff2fff2fff2fff2fff2f0030ff2fff2fff2f00300030ff2f003000300030ff2fff2f00300030ff2f0030adff0c00ddffc2ffd3ffdeffe9ffcaffd8ffe6ffb5ffb2ffe6ff92ffd0ffa0ffbdffb4ff82ff90ff66ffe1ff9fff94ffd4ffa4ffbbffe8ff00000200ff7fff7f97ffd0ffb7ff7eff47ffa1ffa1ffcdab0000410a0000'

Related

What encoding is used by json.dumps?

When I use json.dumps in Python 3.8 for special characters they are being "escaped", like:
>>> import json
>>> json.dumps({'Crêpes': 5})
'{"Cr\\u00eapes": 5}'
What kind of encoding is this? Is this an "escape encoding"? And why is this kind of encoding not part of the encodings module? (Also see codecs, I think I tried all of them.)
To put it another way, how can I convert the string 'Crêpes' to the string 'Cr\\u00eapes' using Python encodings, escaping, etc.?
You are probably confused by the fact that this is a JSON string, not directly a Python string.
Python would encode this string as "Cr\u00eapes", where \u00ea represents a single Unicode character using its hexadecimal code point. In other words, in Python, len("\u00ea") == 1
JSON requires the same sort of encoding, but embedding the JSON-encoded value in a Python string requires you to double the backslash; so in Python's representation, this becomes "Cr\\u00eapes" where you have a literal backslash (which has to be escaped by another backslash), two literal zeros, a literal e character, and a literal a character. Thus, len("\\u00ea") == 6
If you have JSON in a file, the absolutely simplest way to load it into Python is to use json.loads() to read and decode it into a native Python data structure.
If you need to decode the hexadecimal sequence separately, the unicode-escape function does that on a byte value:
>>> b"Cr\\u00eapes".decode('unicode-escape')
'Crêpes'
This is sort of coincidental, and works simply because the JSON representation happens to be identical to the Python unicode-escape representation. You still need a b'...' aka bytes input for that. ("Crêpes".encode('unicode-escape') produces a slightly different representation. "Cr\\u00eapes".encode('us-ascii') produces a bytes string with the Unicode representation b"Cr\\u00eapes".)
It is not a Python encoding. It is the way JSON encodes Unicode non-ASCII characters. It is independent of Python and is used exactly the same for example in Java or with a C or C++ library.
The rule is that a non-ASCII character in the Basic Multilingual Plane (i.e. with a maximum 16 bits code) is encoded as \uxxxx where xxxx is the unicode code value.
Which explains why the ê is written as \u00ea, because its unicode code point is U+00EA

concatenate a string with \x in python 3

Say I have someString = "00". basically I want to convert someString to \x00
I tried multiple ways to achieve my goal, but couldn't find a successful one.
tried:
HexString = '\x'+someString
This method throws this error:
ValueError: invalid \x escape
Unless I do HexString = r'\x'+someString, but then HexString value is set to \\x00 which is not the same as I want.
I also tried using hex() function, which had few issues. But the big issue I had with it was that it returns 0x0. It expects int and etc...
Can anyone help me with converting a string("11") to \x11?
If I am understanding you correctly, the actual goal is to take a string that contains a bunch of pairs of hex digits, and translate each pair of hex digits into the corresponding byte and have a result of type bytes.
In 3.x, this is built directly into the bytes type itself:
>>> bytes.fromhex('11abcdef')
b'\x11\xab\xcd\xef'
You can also instead use the standard library:
>>> import binascii
>>> binascii.unhexlify('11abcdef')
b'\x11\xab\xcd\xef'
You will not necessarily see a \x escape sequence for every byte value. This is normal and expected; it has to do with how the bytes object is represented as text for display purposes.
'\x'+someString
No approach of this general form can work, because it fundamentally misunderstands the problem. The output that you want is not a string, and a string literal like '\x00' does not have a backslash in it, nor a lowercase x - again, what you are seeing is how the string is represented as text, because not every character is printable.
int lets you set the base. For base 16
>>> someString = "00"
>>> int(someString, 16)
0
Of course, 0 is kinda boring because it works for all bases.
If you wanted a byte in a bytes object, you could
>>> import struct
>>> struct.pack("b", int(someString, 16))
b'\x00'
If you want a string (and I'm switching to 0x41 here) you could
>>> chr(int("41", 16))
'A'
You can get ord of the character by using int, then convert it to a character. Then you can encode it to bytes object without any import.
>>> chr(int("11", 16)) # a character
'\x11'
>>> chr(int("11", 16)).encode() # bytes object
b'\x11'

having an issue to convert decimal to hex [duplicate]

I'm trying to convert a binary I have in python (a gzipped protocol buffer object) to an hexadecimal string in a string escape fashion (eg. \xFA\x1C ..).
I have tried both
repr(<mygzipfileobj>.getvalue())
as well as
<mygzipfileobj>.getvalue().encode('string-escape')
In both cases I end up with a string which is not made of HEX chars only.
\x86\xe3$T]\x0fPE\x1c\xaa\x1c8d\xb7\x9e\x127\xcd\x1a.\x88v ...
How can I achieve a consistent hexadecimal conversion where every single byte is actually translated to a \xHH format ? (where H represents a valid hex char 0-9A-F)
The \xhh format you often see is a debugging aid, the output of the repr() applied to a string with non-ASCII codepoints. Any ASCII codepoints are left a in-place to leave what readable information is there.
If you must have a string with all characters replaced by \xhh escapes, you need to do so manually:
''.join(r'\x{0:02x}'.format(ord(c)) for c in value)
If you need quotes around that, you'd need to add those manually too:
"'{0}'".format(''.join(r'\x{:02x}'.format(ord(c)) for c in value))

Converting unicode string to hexadecimal representation

I want to convert unicode string to its hexadecimal representation.
For example, u'\u041a\u0418\u0421\u0410' should be converted to "\xD0\x9A\xD0\x98\xD0\xA1\xD0\x90". I tried the code below (python 2.7):
unicode_username.encode("utf-8").encode("hex")
However, I get a string:
'd09ad098d0a1d090'
Any suggestions how I can get \xD0\x9A\xD0\x98\xD0\xA1\xD0\x90?
When you do string.encode('utf-8'), it changes to hex notation.
But if you print it, you will get original unicode string.
If you want the hex notation you can get it like this with repr() function:
>>> print u'\u041a\u0418\u0421\u0410'.encode('utf-8')
КИСА
>>> print repr(u'\u041a\u0418\u0421\u0410'.encode('utf-8'))
'\xd0\x9a\xd0\x98\xd0\xa1\xd0\x90'
you can also try:
print "hex_signature : ",'\\X'.join(x.encode("hex") for x in signature)
The join function is used with the separator '\X' so that for each byte to hex conversion the \X is inserted. The join function is done for each byte of the variable signature in a loop. Everything is joined/concatenated and printed.

Python unicode string with UTF-8?

I'm getting back from a library what looks to be an incorrect unicode string:
>>> title
u'Sopet\xc3\xb3n'
Now, those two hex escapes there are the UTF-8 encoding for U+00F3 LATIN SMALL LETTER O WITH ACUTE. So far as I understand, a unicode string in Python should have the actual character, not the the UTF-8 encoding for the character, so I think this is incorrect and presumably a bug either in the library or in my input, right?
The question is, how do I (a) recognize that I have UTF-8 encoded text in my unicode string, and (b) convert this to a proper unicode string?
I'm stumped on (a), as there's nothing wrong, encoding-wise, about that original string (i.e, both are valid characters in their own right, u'\xc3\xb3' == ó, but they're not what's supposed to be there)
It looks like I can achieve (b) by eval()ing that repr() output minus the "u" in front to get a str and then decoding the str with UTF-8:
>>> eval(repr(title)[1:]).decode("utf-8")
u'Sopet\xf3n'
>>> print eval(repr(title)[1:]).decode("utf-8")
Sopetón
But that seems a bit kludgy. Is there an officially-sanctioned way to get the raw data out of a unicode string and treat that as a regular string?
a) Try to put it through the method below.
b)
>>> u'Sopet\xc3\xb3n'.encode('latin-1').decode('utf-8')
u'Sopet\xf3n'
You should use:
>>> title.encode('raw_unicode_escape')
Python2:
print(u'\xd0\xbf\xd1\x80\xd0\xb8'.encode('raw_unicode_escape'))
Python3:
print(u'\xd0\xbf\xd1\x80\xd0\xb8'.encode('raw_unicode_escape').decode('utf8'))

Categories

Resources