I have encoded a string to integer in the following way in python:
b = bytearray()
b.extend(input_number_or_text.encode('ascii'))
input_number_or_text = int.from_bytes(b,byteorder='big', signed=False)
I am encrypting this integer to get a new value and subsequently decrypting to get back the original integer.
Now how do I get back the string from the integer
I have tried the following method for decryption:
decrypted_data.to_bytes(1,byteorder='big').decode('ascii')
but I get int too big to convert error.
How to fix this problem?
You told it the int should be convertable to a length 1 byte string. If it's longer, that won't work. You can remember the length, or you can guess at it:
num_bytes = (decrypted_data.bit_length() + 7) // 8
decrypted_data.to_bytes(num_bytes, byteorder='big').decode('ascii')
Adding 7 and floor-dividing by 8 ensures enough bytes for the data. -(-decrypted_data.bit_length() // 8) also works (and is trivially faster on Python), but is a bit more magical looking.
The byte representation of an integer is different than a string.
For example - 1 , '1', 1.0 all look different when looking at the byte representation.
From the code you supply -
b.extend(input_number_or_text.encode('ascii'))
and int.from_bytes(b,byteorder='big', signed=False)
Seems like your encoding a string of a number, and trying to decode it as a int.
See the next example:
In [3]: b = bytearray()
In [4]: a = '1'
In [5]: b.extend(a.encode('ascii'))
In [6]: int.from_bytes(b,byteorder='big',signed=False)
Out[6]: 49
If you are encoding a string, you should first decode a string, and then convert to int.
In [1]: b = bytearray()
In [2]: a = '123'
In [3]: b.extend(a.encode('ascii'))
In [4]: decoded = int(b.decode('ascii'))
In [5]: decoded
Out[5]: 123
Related
I am using the int.to_bytes method within Python to convert integer values into bytes.
With certain values, this seems to fail. Attached is the output from the Python console:
value = 2050
value.to_bytes(2, 'big')
>>> b'\x08\x02'
value = 2082
value.to_bytes(2, 'big')
>>> b'\x08"'
With a value of 2050, the conversion seems to be correct. But when the value is 2082, for some reason only the upper byte seems to be extracted. Any reason as to why this is happening?
It extracts all bytes. Try
value = 2082
x = value.to_bytes(2, 'big')
print(x[0]) # Output: 8
print(x[1]) # Output: 34
When you convert to string, byte 34 translates to ASCII ", which is what you see.
If I take the letter 'à' and encode it in UTF-8 I obtain the following result:
'à'.encode('utf-8')
>> b'\xc3\xa0'
Now from a bytearray I would like to convert 'à' into a binary string and turn it back into 'à'. To do so I execute the following code:
byte = bytearray('à','utf-8')
for x in byte:
print(bin(x))
I get 0b11000011and0b10100000, which is 195 and 160. Then, I fuse them together and take the 0b part out. Now I execute this code:
s = '1100001110100000'
value1 = s[0:8].encode('utf-8')
value2 = s[9:16].encode('utf-8')
value = value1 + value2
print(chr(int(value, 2)))
>> 憠
No matter how I develop the later part I get symbols and never seem to be able to get back my 'à'. I would like to know why is that? And how can I get an 'à'.
>>> bytes(int(s[i:i+8], 2) for i in range(0, len(s), 8)).decode('utf-8')
'à'
There are multiple parts to this. The bytes constructor creates a byte string from a sequence of integers. The integers are formed from strings using int with a base of 2. The range combined with the slicing peels off 8 characters at a time. Finally decode converts those bytes back into Unicode characters.
you need your second bits to be s[8:16] (or just s[8:]) otherwise you get 0100000
you also need to convert you "bit string" back to an integer before thinking of it as a byte with int("0010101",2)
s = '1100001110100000'
value1 = bytearray([int(s[:8],2), # bits 0..7 (8 total)
int(s[8:],2)] # bits 8..15 (8 total)
)
print(value1.decode("utf8"))
Convert the base-2 value back to an integer with int(s,2), convert that integer to a number of bytes (int.to_bytes) based on the original length divided by 8 and big-endian conversion to keep the bytes in the right order, then .decode() it (default in Python 3 is utf8):
>>> s = '1100001110100000'
>>> int(s,2)
50080
>>> int(s,2).to_bytes(len(s)//8,'big')
b'\xc3\xa0'
>>> int(s,2).to_bytes(len(s)//8,'big').decode()
'à'
I want to read bytes 1,2 and 3 from a file. I know it corresponds to a string (in this case it's ELF of a Linux binary header)
Following examples I could find on the net I came up with this:
with open('hello', 'rb') as f:
f.seek(1)
bytes = f.read(3)
string = struct.unpack('s', bytes)
print st
Looking at the official documentation of struct it seems that passing s as argument should allow me to read a string.
I get the error:
st = struct.unpack('s', bytes)
struct.error: unpack requires a string argument of length 1
EDIT: Using Python 2.7
In your special case, it is enough to just check
if bytes == 'ELF':
to test all three bytes in one step to be the three characters E, L and F.
But also if you want to check the numerical values, you do not need to unpack anything here. Just use ord(bytes[i]) (with i in 0, 1, 2) to get the byte values of the three bytes.
Alternatively you can use
byte_values = struct.unpack('bbb', bytes)
to get a tuple of the three bytes. You can also unpack that tuple on the fly in case the bytes have nameable semantics like this:
width, height, depth = struct.unpack('bbb', bytes)
Use 'BBB' instead of 'bbb' in case your byte values shall be unsigned.
In Python 2, read returns a string; in the sense "string of bytes". To get a single byte, use bytes[i], it will return another string but with a single byte. If you need the numeric value of a byte, use ord: ord(bytes[i]). Finally, to get numeric values for all bytes use map(ord, bytes).
In [4]: s = "foo"
In [5]: s[0]
Out[5]: 'f'
In [6]: ord(s[0])
Out[6]: 102
In [7]: map(ord, s)
Out[7]: [102, 111, 111]
The shortest ways I have found are:
n = 5
# Python 2.
s = str(n)
i = int(s)
# Python 3.
s = bytes(str(n), "ascii")
i = int(s)
I am particularly concerned with two factors: readability and portability. The second method, for Python 3, is ugly. However, I think it may be backwards compatible.
Is there a shorter, cleaner way that I have missed? I currently make a lambda expression to fix it with a new function, but maybe that's unnecessary.
Answer 1:
To convert a string to a sequence of bytes in either Python 2 or Python 3, you use the string's encode method. If you don't supply an encoding parameter 'ascii' is used, which will always be good enough for numeric digits.
s = str(n).encode()
Python 2: http://ideone.com/Y05zVY
Python 3: http://ideone.com/XqFyOj
In Python 2 str(n) already produces bytes; the encode will do a double conversion as this string is implicitly converted to Unicode and back again to bytes. It's unnecessary work, but it's harmless and is completely compatible with Python 3.
Answer 2:
Above is the answer to the question that was actually asked, which was to produce a string of ASCII bytes in human-readable form. But since people keep coming here trying to get the answer to a different question, I'll answer that question too. If you want to convert 10 to b'10' use the answer above, but if you want to convert 10 to b'\x0a\x00\x00\x00' then keep reading.
The struct module was specifically provided for converting between various types and their binary representation as a sequence of bytes. The conversion from a type to bytes is done with struct.pack. There's a format parameter fmt that determines which conversion it should perform. For a 4-byte integer, that would be i for signed numbers or I for unsigned numbers. For more possibilities see the format character table, and see the byte order, size, and alignment table for options when the output is more than a single byte.
import struct
s = struct.pack('<i', 5) # b'\x05\x00\x00\x00'
You can use the struct's pack:
In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'
The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:
In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'
In [13]: struct.pack("B", 1)
Out[13]: '\x01'
This works the same on both python 2 and python 3.
Note: the inverse operation (bytes to int) can be done with unpack.
I have found the only reliable, portable method to be
bytes(bytearray([n]))
Just bytes([n]) does not work in python 2. Taking the scenic route through bytearray seems like the only reasonable solution.
Converting an int to a byte in Python 3:
n = 5
bytes( [n] )
>>> b'\x05'
;) guess that'll be better than messing around with strings
source: http://docs.python.org/3/library/stdtypes.html#binaryseq
In Python 3.x, you can convert an integer value (including large ones, which the other answers don't allow for) into a series of bytes like this:
import math
x = 0x1234
number_of_bytes = int(math.ceil(x.bit_length() / 8))
x_bytes = x.to_bytes(number_of_bytes, byteorder='big')
x_int = int.from_bytes(x_bytes, byteorder='big')
x == x_int
from int to byte:
bytes_string = int_v.to_bytes( lenth, endian )
where the lenth is 1/2/3/4...., and endian could be 'big' or 'little'
form bytes to int:
data_list = list( bytes );
When converting from old code from python 2 you often have "%s" % number this can be converted to b"%d" % number (b"%s" % number does not work) for python 3.
The format b"%d" % number is in addition another clean way to convert int to a binary string.
b"%d" % number
I have a long Hex string that represents a series of values of different types. I need to convert this Hex String into bytes or bytearray so that I can extract each value from the raw data. How can I do this?
For example, the string "ab" should convert to the bytes b"\xab" or equivalent byte array. Longer example:
>>> # what to use in place of `convert` here?
>>> convert("8e71c61de6a2321336184f813379ec6bf4a3fb79e63cd12b")
b'\x8eq\xc6\x1d\xe6\xa22\x136\x18O\x813y\xeck\xf4\xa3\xfby\xe6<\xd1+'
Suppose your hex string is something like
>>> hex_string = "deadbeef"
Convert it to a bytearray (Python 3 and 2.7):
>>> bytearray.fromhex(hex_string)
bytearray(b'\xde\xad\xbe\xef')
Convert it to a bytes object (Python 3):
>>> bytes.fromhex(hex_string)
b'\xde\xad\xbe\xef'
Note that bytes is an immutable version of bytearray.
Convert it to a string (Python ≤ 2.7):
>>> hex_data = hex_string.decode("hex")
>>> hex_data
"\xde\xad\xbe\xef"
There is a built-in function in bytearray that does what you intend.
bytearray.fromhex("de ad be ef 00")
It returns a bytearray and it reads hex strings with or without space separator.
provided I understood correctly, you should look for binascii.unhexlify
import binascii
a='45222e'
s=binascii.unhexlify(a)
b=[ord(x) for x in s]
Assuming you have a byte string like so
"\x12\x45\x00\xAB"
and you know the amount of bytes and their type you can also use this approach
import struct
bytes = '\x12\x45\x00\xAB'
val = struct.unpack('<BBH', bytes)
#val = (18, 69, 43776)
As I specified little endian (using the '<' char) at the start of the format string the function returned the decimal equivalent.
0x12 = 18
0x45 = 69
0xAB00 = 43776
B is equal to one byte (8 bit) unsigned
H is equal to two bytes (16 bit) unsigned
More available characters and byte sizes can be found here
The advantages are..
You can specify more than one byte and the endian of the values
Disadvantages..
You really need to know the type and length of data your dealing with
You can use the Codecs module in the Python Standard Library, i.e.
import codecs
codecs.decode(hexstring, 'hex_codec')
You should be able to build a string holding the binary data using something like:
data = "fef0babe"
bits = ""
for x in xrange(0, len(data), 2)
bits += chr(int(data[x:x+2], 16))
This is probably not the fastest way (many string appends), but quite simple using only core Python.
A good one liner is:
byte_list = map(ord, hex_string)
This will iterate over each char in the string and run it through the ord() function. Only tested on python 2.6, not too sure about 3.0+.
-Josh