I'm trying to use Python to convert the following string (which seems to be hex) into its decoded output:
I want to convert this:
To this:
How can I do this?
This is the string:
0x00000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000080000000000000000000000000000000000000000000000000000000006331b7e000000000000000000000000000000000000000000000000000000000000000c0000000000000000000000000000000000000000000000000000000000000000474657374000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007566963746f727900000000000000000000000000000000000000000000000000
First, convert the hex into a bytearray. The 0x prefix has to be stripped, because fromhex accepts only hex digits and whitespace, and the result should stay as bytes (calling .decode() on the whole thing fails here, because the numeric fields are not valid UTF-8 text):
hex_str = "0x00000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000080000000000000000000000000000000000000000000000000000000006331b7e000000000000000000000000000000000000000000000000000000000000000c0000000000000000000000000000000000000000000000000000000000000000474657374000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007566963746f727900000000000000000000000000000000000000000000000000"
b = bytearray.fromhex(hex_str[2:])  # skip the "0x" prefix and keep the raw bytes
Then you will need to determine the layout of the bytes. For example, a uint256 is 32 bytes (64 hex digits), so the first value is the first 32 bytes of b:
a = b[:32]
print(int.from_bytes(a, "big"))
Here I assume the bytes are in big-endian. If they are instead little-endian, you can use "little" instead of "big". You will need to learn about so-called "endianness" to understand this better.
You can get the other uint256 in a similar way.
As for the strings, I don't know what their length is. You will have to research the format for Ethereum blockchain data. Once you determine the length, you can use a similar technique to get the bytes for each string and then decode it into characters.
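As a sketch of the string part, assuming the data is standard Ethereum ABI encoding of a tuple (uint256, string, uint256, string) (an assumption inferred from the values, not something the question confirms): each string slot in the head holds a byte offset, and at that offset there is a 32-byte big-endian length followed by the string bytes, zero-padded to a multiple of 32. The helper names are hypothetical.

```python
def read_word(data: bytes, offset: int) -> int:
    """Read the 32-byte big-endian unsigned integer starting at `offset`."""
    return int.from_bytes(data[offset:offset + 32], "big")


def read_abi_string(data: bytes, offset: int) -> str:
    """Read an ABI dynamic string: a 32-byte length word, then the UTF-8 bytes."""
    length = read_word(data, offset)
    start = offset + 32
    return data[start:start + length].decode("utf-8")


def decode_uint_string_uint_string(hex_str: str):
    """Decode hex assumed to encode (uint256, string, uint256, string)."""
    if hex_str.startswith("0x"):
        hex_str = hex_str[2:]  # bytes.fromhex does not accept the prefix
    data = bytes.fromhex(hex_str)
    first = read_word(data, 0)
    text1 = read_abi_string(data, read_word(data, 32))  # head slot 2 holds an offset
    second = read_word(data, 64)
    text2 = read_abi_string(data, read_word(data, 96))  # head slot 4 holds an offset
    return first, text1, second, text2
```

If the layout assumption holds, the two text fields in the question's string should come out as "test" (hex 74657374) and "Victory" (hex 566963746f7279); if it doesn't hold, the offsets will point at the wrong bytes, which is why the real layout has to be confirmed first.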
In Python 2 you can use the built-in hex codec (strip the 0x prefix first, since x is not a hex digit):
s = "0x00000000000000000000000000000000000000000000000000000000000000040000000000000000000000000000000000000000000000000000000000000080000000000000000000000000000000000000000000000000000000006331b7e000000000000000000000000000000000000000000000000000000000000000c0000000000000000000000000000000000000000000000000000000000000000474657374000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000007566963746f727900000000000000000000000000000000000000000000000000" #YOUR HEX
s[2:].decode("hex")  # Python 2 only: in Python 3, str has no .decode method
Alternatively, in Python 3, use fromhex:
bytearray.fromhex(s[2:])
Note that calling .decode() on the whole result raises UnicodeDecodeError for this particular string, because the numeric fields contain bytes that are not valid UTF-8; decode only the slices that hold text.
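For reference, a small Python 3 sketch of the hex-to-bytes step: bytes.fromhex, binascii.unhexlify and codecs.decode(..., "hex") all return raw bytes, and only a slice known to hold text should then be decoded. The value used here is the hex for "test", one of the text fields visible in the question's data.

```python
import binascii
import codecs

hex_text = "74657374"  # the hex digits for "test", taken from the question's data

# Three standard-library ways to turn hex digits into raw bytes (Python 3).
via_fromhex = bytes.fromhex(hex_text)
via_unhexlify = binascii.unhexlify(hex_text)
via_codecs = codecs.decode(hex_text, "hex")

# Only bytes that are known to be text get decoded to str.
text = via_fromhex.decode("utf-8")
print(text)  # test
```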
My problem is as follows:
I'm reading a .csv generated by some software, and to read it I'm using Pandas. Pandas reads the .csv properly, but one of the columns stores byte sequences representing vectors, and Pandas stores them as strings.
So I have data (a string) and I want to use np.frombuffer() to get the proper vector. The problem is that data is a string, so it's already encoded, and when I use .encode() to turn it into bytes, the sequence is not the original one.
Example: The .csv contains \x00\x00 representing the vector [0,0] with dtype=np.uint8. Pandas stores it as a string and when I try to process it something like this happens:
>>> data = df.data[x] # With x any row.
>>> type(data)
<class 'str'>
>>> print(data)
\x00\x00
>>> e_data = data.encode("latin1")
>>> print(e_data)
b'\\x00\\x00'
>>> v = np.frombuffer(e_data, np.uint8)
>>> print(v)
array([ 92 120 48 48 92 120 48 48], dtype=uint8)
I just want to get b'\x00\x00' from data instead of b'\\x00\\x00' which I understand is a little encoding mess I have not been able to fix yet.
Any way to do this?
Thanks!
Issue: you (apparently) have a string that contains literal backslash escape sequences, such as:
>>> x = r'\x00' # note the use of a raw string literal
>>> x # Python's representation of the string escapes the backslash
'\\x00'
>>> print(x) # but it looks right when printing
\x00
From this, you wish to create a corresponding bytes object, wherein the backslash-escape sequences are translated into the corresponding byte.
Handling these kinds of escape sequences is done using the unicode-escape string encoding. As you may be aware, string encodings convert between bytes and str objects, specifying the rules for which byte sequences correspond to what Unicode code points.
However, the unicode-escape codec assumes that the escape sequences are on the bytes side of the equation and that the str side will have the corresponding Unicode characters:
>>> rb'\x00'.decode('unicode-escape') # create a string with a NUL char
'\x00'
Applying .encode to the string will reverse that process; so if you start with the backslash-escape sequence, it will re-escape the backslash:
>>> r'\x00'.encode('unicode-escape') # the result contains two backslashes, represented as four
b'\\\\x00'
>>> list(r'\x00'.encode('unicode-escape')) # let's look at the numeric values of the bytes
[92, 92, 120, 48, 48]
As you can see, that is clearly not what we want.
We want to convert from bytes to str to do the backslash-escaping. But we have a str to start, so we need to change that to bytes; and we want bytes at the end, so we need to change the str that we get from the backslash-escaping. In both cases, we need each Unicode code point from 0 to 255 inclusive to correspond to a single byte with the same value.
The encoding we need for that task is called latin-1, also known as iso-8859-1.
For example:
>>> r'\x00'.encode('latin-1')
b'\\x00'
Thus, we can reason out the overall conversion:
>>> r'\x00'.encode('latin-1').decode('unicode-escape').encode('latin-1')
b'\x00'
As desired: our str with a literal backslash, lowercase x and two zeros, is converted to a bytes object containing a single zero byte.
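Wrapped up as a helper for the CSV column in the question, a minimal sketch; it assumes each cell holds only characters in the Latin-1 range plus Python-style backslash escapes (the function name is hypothetical):

```python
def escaped_str_to_bytes(s: str) -> bytes:
    # latin-1 maps code points 0-255 one-to-one onto byte values, so this
    # step turns the text (backslashes included) into the same raw bytes.
    raw = s.encode("latin-1")
    # unicode-escape interprets sequences such as \x00 found in those bytes.
    unescaped = raw.decode("unicode-escape")
    # latin-1 again maps each resulting code point back to a single byte.
    return unescaped.encode("latin-1")


print(escaped_str_to_bytes(r"\x00\x00"))  # b'\x00\x00'
```

Applied to the DataFrame, something like df["data"].map(escaped_str_to_bytes) should yield bytes objects that can go straight into np.frombuffer(..., np.uint8). A cell containing a character above U+00FF would raise UnicodeEncodeError in the first step.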
Alternatively, we can request that backslash escapes be processed while decoding, by using escape_decode from the codecs standard library module. However, this isn't documented and isn't really meant to be used that way: it's internal machinery used to implement the unicode-escape codec and possibly some other things.
If you want to expose yourself to the risk of that breaking in the future, it looks like:
>>> import codecs
>>> codecs.escape_decode(r'\x00\x00')
(b'\x00\x00', 8)
We get a 2-tuple, with the desired bytes and the length of the input that was consumed (here 8, the length of the string). From my testing, it appears that it can only use UTF-8 encoding for the non-backslash sequences (but this could be specific to how Python is configured), and you can't change this: unlike a normal decode method, there is no parameter to specify the encoding. As I said, it's not meant for general use.
Yes, all of that is as awkward as it seems. The reason you don't get easy support for this kind of thing is that it isn't really how you're intended to design your system. Fundamentally, all data is bytes; text is an abstraction that is encoded by that byte data. Using a single byte (with value 0) to represent four characters of text (the symbols \, x, 0 and 0) is not a normal encoding, and not a reversible one (how do I know whether to decode the byte as those four characters, or as a single NUL character?). Instead, you should strongly consider using some other friendly string representation of your data (perhaps a plain hex dump) and a non-text-encoding-related way to parse it. For example:
>>> data = '41 42' # a string in a simple hex dump format
>>> bytes.fromhex(data) # support is built-in, and works simply
b'AB'
>>> list(bytes.fromhex(data))
[65, 66]
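A sketch of the hex-dump alternative, assuming the software that writes the .csv can be changed (the helper names are hypothetical; bytes.hex with a separator argument needs Python 3.8 or later):

```python
def vector_to_hex(values) -> str:
    """Write a sequence of 0-255 integers as a space-separated hex dump."""
    return bytes(values).hex(" ")


def hex_to_vector(text: str) -> list:
    """Parse the hex dump back; bytes.fromhex skips the spaces between bytes."""
    return list(bytes.fromhex(text))


dump = vector_to_hex([0, 0, 255])
print(dump)                 # 00 00 ff
print(hex_to_vector(dump))  # [0, 0, 255]
```

With NumPy the same idea would be arr.tobytes().hex(" ") when writing and np.frombuffer(bytes.fromhex(text), np.uint8) when reading.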
I have code that encrypts data (using the blowfish module) and then embeds it into an image.
When I tested the encryption and decryption code on its own, it worked fine.
The problem is that when I embed the data into the image and extract it again,
I get back a str containing the printed form of the bytes object:
b'\x98\xac\xc3ymQ_\x80\xcb\xec\x9c\x04\xc3#\x88\x93`j\x05\x96\x9d\xcb\x0ec\xb2\x9b(\xd9#\x9fI\x00\xc7h\xe3\x83\xbd0\r\xad}*t'
So if I try to convert it to a bytearray again, it gets re-encoded: an extra backslash is added in front of each backslash that is already there (and the b'...' wrapper becomes part of the data), so the new bytearray is not the original one and the data is corrupted:
bytearray(b"b\'\\x98\\xac\\xc3ymQ_\\x80\\xcb\\xec\\x9c\\x04\\xc3#\\x88\\x93`j\\x05\\x96\\x9d\\xcb\\x0ec\\xb2\\x9b(\\xd9#\\x9fI\\x00\\xc7h\\xe3\\x83\\xbd0\\r\\xad}*t\'")
So my question is: how do I convert the str back to a bytearray without changing the data?
If I understand your requirement correctly, the following code snippet could help. It uses the ast module (Abstract Syntax Trees):
import ast
bys=r"b'\x98\xac\xc3ymQ_\x80\xcb\xec\x9c\x04\xc3#\x88\x93`j\x05\x96\x9d\xcb\x0ec\xb2\x9b(\xd9#\x9fI\x00\xc7h\xe3\x83\xbd0\r\xad}*t'"
print( '↓↓↓', type(bys))
print( bys)
byb = ast.literal_eval(bys)
print( byb)
print( '↑↑↑', type(byb))
Result: .\SO\68139330.py
↓↓↓ <class 'str'>
b'\x98\xac\xc3ymQ_\x80\xcb\xec\x9c\x04\xc3#\x88\x93`j\x05\x96\x9d\xcb\x0ec\xb2\x9b(\xd9#\x9fI\x00\xc7h\xe3\x83\xbd0\r\xad}*t'
b'\x98\xac\xc3ymQ_\x80\xcb\xec\x9c\x04\xc3#\x88\x93`j\x05\x96\x9d\xcb\x0ec\xb2\x9b(\xd9#\x9fI\x00\xc7h\xe3\x83\xbd0\r\xad}*t'
↑↑↑ <class 'bytes'>
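The same idea as a reusable function, a sketch that also checks the literal really was a bytes object (ast.literal_eval only evaluates literals, so unlike eval it will not run arbitrary code). The function name is hypothetical:

```python
import ast


def repr_to_bytes(text: str) -> bytes:
    """Convert the printed form of a bytes object back into bytes."""
    value = ast.literal_eval(text)
    if not isinstance(value, bytes):
        raise ValueError(f"expected a bytes literal, got {type(value).__name__}")
    return value


original = b"\x98\xac\xc3ymQ_\x80\x00\r"
text = str(original)  # the printed form, as it ends up embedded in the image
print(repr_to_bytes(text) == original)  # True
```

Longer term, embedding data.hex() or base64.b64encode(data) instead of str(data) would avoid having to parse a Python literal at all.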
Okay, for anyone else trying to achieve this: the other answers are correct, provided you are working with a small byte string.
But I wasn't, and I needed a theoretically unlimited byte string size
(I wanted at least 1000 lines of 25 words, each 4 letters long on average).
So if you want to do the same,
convert the byte string to its equivalent binary string with the code below:
def bytes_to_bitstring(b):
    # zfill keeps leading zero bytes, which bin() would otherwise drop
    return bin(int.from_bytes(b, byteorder="big"))[2:].zfill(len(b) * 8)
Then reconstruct the byte string with:
def bitstring_to_bytes(s):
    return int(s, 2).to_bytes((len(s) + 7) // 8, byteorder='big')
This overcomes the limitation with eval and can go much further than that.
I referred to these Stack Overflow posts:
byte string to binary: How can I convert bytes object to decimal or binary representation in python?
binary to byte string: Convert binary string to bytearray in Python 3
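For comparison, a sketch of the size cost of each text-safe representation: a bitstring spends 8 characters per byte, hex 2, and base64 about 4 per 3 bytes, and all three round-trip exactly. The 40-byte payload is made up for the example.

```python
import base64

payload = bytes(range(40))  # stand-in for the encrypted data

bitstring = "".join(format(byte, "08b") for byte in payload)
hex_text = payload.hex()
b64_text = base64.b64encode(payload).decode("ascii")

for name, text in [("bits", bitstring), ("hex", hex_text), ("base64", b64_text)]:
    print(name, len(text))  # bits 320, hex 80, base64 56

# Each representation converts back to the identical bytes.
assert int(bitstring, 2).to_bytes(len(bitstring) // 8, "big") == payload
assert bytes.fromhex(hex_text) == payload
assert base64.b64decode(b64_text) == payload
```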
I've received bytes similar to this:
}Pl\xA1u#\x1EW\x02\x00\x01\x00\x00\x00\x00\x00\x00\x00\x85\xA9\xF4>\x08\x00\x00\x00\xBF\xE8\xA3B\xC30\xECA\xFA~
How can this be decoded into normal values?
Try data.decode("utf-16"), where data is your bytes object.
If the bytes are a binary-encoded sequence of floats, you first need to know the byte order that was used when they were serialized.
In Python, you can then use struct.unpack() with the format character f to unpack the bytes into floats.
See https://docs.python.org/2/library/struct.html for reference.
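A minimal sketch of that approach, assuming the bytes are a packed sequence of 32-bit IEEE floats; the byte order ("<", little-endian, by default here) is an assumption that has to match whatever produced the data, and the function name is hypothetical:

```python
import struct


def unpack_float32s(data: bytes, byte_order: str = "<") -> tuple:
    """Unpack consecutive 4-byte floats; byte_order is '<' (little) or '>' (big)."""
    if len(data) % 4:
        raise ValueError("length is not a multiple of 4, so these are not plain float32s")
    count = len(data) // 4
    return struct.unpack(f"{byte_order}{count}f", data)


sample = struct.pack("<3f", 1.5, -2.0, 0.25)
print(unpack_float32s(sample))  # (1.5, -2.0, 0.25)
```

Applied to the bytes in the question, this only gives meaningful numbers if they really are float32 values in that byte order; the question doesn't say what the format is.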
I would like to represent four floats e.g, 123.545, 56.234, -4534.234, 544.64 using the set of characters [a..z, A..Z, 0..9] in the shortest way possible so I can encode the four floats and store them in a filename. What is the most efficient way to do this?
I've looked at base64 encoding which doesn't actually compress the result. I also looked at a polyline encoding algorithm which uses characters like ) and { and I can't have that.
You could use the struct module to store them as binary 32-bit floats, and encode the result into base64. In Python 2:
>>> import struct, base64
>>> base64.urlsafe_b64encode(struct.pack("ffff", 123.545,56.234,-4534.234,544.64))
'Chf3Qp7vYELfsY3F9igIRA=='
The == padding can be removed and re-added for decoding such that the length of the base64 string is a multiple of 4. You will also want to use URL-safe base64 to avoid the / character.
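Note that URL-safe base64 still uses - and _ (in place of + and /), and neither is in [a..z, A..Z, 0..9]. A sketch that stays strictly inside that set swaps in base62 (not what the answer above uses): pack the four floats as 16 bytes of little-endian float32, treat them as one big integer, and write it in base 62. Because 62**22 > 2**128, 22 characters always suffice, the same length as unpadded base64. The names are hypothetical, and float32 keeps only about 7 significant digits.

```python
import string
import struct

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase  # 62 symbols
WIDTH = 22  # 62**22 > 2**128, so 22 digits always hold the 16 packed bytes


def encode_floats(values) -> str:
    """Pack four floats as float32 and write the 128-bit result in base 62."""
    n = int.from_bytes(struct.pack("<4f", *values), "big")
    digits = []
    for _ in range(WIDTH):  # fixed width keeps leading zero digits
        n, r = divmod(n, 62)
        digits.append(ALPHABET[r])
    return "".join(reversed(digits))


def decode_floats(text: str) -> tuple:
    """Invert encode_floats: base 62 -> 16 bytes -> four float32 values."""
    n = 0
    for ch in text:
        n = n * 62 + ALPHABET.index(ch)
    return struct.unpack("<4f", n.to_bytes(16, "big"))


name = encode_floats([123.545, 56.234, -4534.234, 544.64])
print(name, len(name))      # 22 alphanumeric characters
print(decode_floats(name))  # the four values, rounded to float32 precision
```

One caveat for filenames: on a case-insensitive file system (the default on Windows and macOS), a and A are the same name, so base62 names could collide; base36 avoids that at the cost of a few more characters.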
I get the following bytes from a network service: \x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01 These are 8-bit numbers. I want to change the representation to my system's representation (32 bits) to be able to work on the bytes. How would I do this with Python? Is there a special 'reverse' function for this?
best regards
If you have 8-bit numbers, the byte order is irrelevant, as there is only one byte in each of them. If you want to convert every character to an integer, you can write:
struct.unpack("11B", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
or
struct.unpack("!11B", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
or
map(ord, "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
They're all equivalent.
If string contains 16-bit or 32-bit integers, you can write things like:
struct.unpack("!IIHB", "\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01")
which would be decoded as two 4-byte, one 2-byte and one 1-byte unsigned integers. The ! (which is equivalent to big-endian >) means that the string is in network byte order, so all integers larger than one byte are converted correctly to your native byte order.
EDIT: If what you want is to get eleven numbers and process them in reverse order, you should use one of the above methods and call reversed, for example: reversed(map(ord, data)); but this reverses the order regardless of your native byte order. You didn't say what the data really is, though, and I'm not convinced endianness matters here.
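The snippets above are Python 2 (str literals, and a map that returns a list). A Python 3 sketch of the same ideas, where the data has to be a bytes object and iterating over it already yields ints:

```python
import struct

data = b"\x83\x08\x04\x04\x60\x02\x00\x81\x15\x01\x01"  # the 11 bytes from the question

# Every byte as an unsigned 8-bit integer; byte order is irrelevant here.
as_ints = list(data)
same = struct.unpack("!11B", data)

# Reinterpret as network-order (big-endian) fields: two uint32, one uint16, one uint8.
fields = struct.unpack("!IIHB", data)

# Process the bytes in reverse order (this ignores the native byte order entirely).
backwards = list(reversed(data))

print(as_ints)  # [131, 8, 4, 4, 96, 2, 0, 129, 21, 1, 1]
print(fields == (0x83080404, 0x60020081, 0x1501, 0x01))  # True
```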
Determine which byte order the bytes are in, and supply the correct byte order character to struct.unpack.
If you want to reverse all of the bytes in a string, you can do this:
'example string'[::-1]
I would recommend the struct module for unpacking network or otherwise binary data, as you otherwise don't have a good way to tell where exactly the reversing needs to happen. It allows you to specify the byte order.
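To see what the byte-order character changes, a small sketch reading the same four bytes both ways; for a single field, reversing its bytes with [::-1] and flipping the byte-order character give the same result:

```python
import struct

raw = b"\x00\x00\x01\x02"

big = struct.unpack(">I", raw)[0]     # network / big-endian: 0x00000102
little = struct.unpack("<I", raw)[0]  # little-endian: 0x02010000

print(big, little)  # 258 33619968
# Reversing one field's bytes is equivalent to switching its byte order.
print(struct.unpack("<I", raw[::-1])[0] == big)  # True
print(int.from_bytes(raw, "big") == big)         # True
```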
I'm not sure what you mean by 8308040460020081150101, but the struct package should have everything you need.
Have you looked at the core struct library? It has methods for converting byte orders.