I'm in a special circumstance where I would like to convert an integer into a bytes object of the smallest length possible. I currently use the following method to convert to bytes:
number = 9847
bytes = number.to_bytes(4, 'little')
However, I would like to scale the number of bytes used (the 4) down to the smallest possible size. How can I achieve this?
I figured it out on my own! I use the following function to do the conversion to bytes for me now:
def int_to_bytes(integer_in: int) -> bytes:
    """Convert a non-negative integer to the shortest little-endian bytes object."""
    # Smallest number of bytes the integer fits into (at least 1, so that 0 -> b'\x00')
    length = max(1, (integer_in.bit_length() + 7) // 8)
    return integer_in.to_bytes(length, 'little')
This works because an integer needs e bytes exactly when 256^(e-1) <= integer_in < 256^e. With exponents, a = b^e is equivalent to e = log(a)/log(b), so e is roughly log(integer_in)/log(256) rounded up.
Computing it literally as math.ceil(math.log(integer_in)/math.log(256)) comes out one byte short for exact powers of 256 (1, 256, 65536, ...) and raises an error for 0, so the function counts bits with int.bit_length() and rounds up to whole bytes instead.
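A round trip through int.from_bytes at the byte boundaries is a quick way to sanity-check any minimal-length conversion. The sketch below is only an illustration: the helper name min_bytes is hypothetical, and it assumes non-negative integers and counts bits with int.bit_length() rather than logarithms.

```python
def min_bytes(value: int, byteorder: str = "little") -> bytes:
    """Encode a non-negative int in as few bytes as possible (hypothetical helper)."""
    if value < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    # Round the bit count up to whole bytes; 0 still needs one byte
    length = max(1, (value.bit_length() + 7) // 8)
    return value.to_bytes(length, byteorder)

# Values on both sides of each byte boundary must survive the round trip
for value in (0, 1, 255, 256, 9847, 65535, 65536):
    encoded = min_bytes(value)
    assert int.from_bytes(encoded, "little") == value
```

Checking 255 next to 256 (and 65535 next to 65536) is the point of the loop: those are the inputs where an off-by-one in the length shows up.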
I'm working with Python 3, trying to get an integer out of a digest. I'm only interested in the first n bits of the digest though.
What I have right now is this:
n = 3
int(hashlib.sha1(b'test').digest()[0:n])
This however results in a ValueError: invalid literal for int() with base 10: b'\xa9J' error.
Thanks.
The Py3 solution is to use int.from_bytes to convert bytes to int, then shift off the part you don't care about:
def bitsof(bt, nbits):
    # Directly convert enough bytes to an int to ensure you have at least as many bits
    # as needed, but no more
    neededbytes = (nbits+7)//8
    if neededbytes > len(bt):
        raise ValueError("Require {} bytes, received {}".format(neededbytes, len(bt)))
    i = int.from_bytes(bt[:neededbytes], 'big')
    # If there were a non-byte aligned number of bits requested,
    # shift off the excess from the right (which came from the last byte processed)
    if nbits % 8:
        i >>= 8 - nbits % 8
    return i
Example use:
>>> bitsof(hashlib.sha1(b'test').digest(), 3)
5 # The leftmost three bits of the nibble (0xa) that begins the hash
On Python 2, the function can be used almost as is, aside from adding a binascii import, and changing the conversion from bytes to int to the slightly less efficient two step conversion (from str to hex representation, then using int with base of 16 to parse it):
i = int(binascii.hexlify(bt[:neededbytes]), 16)
Everything else works as is (even the // operator works as expected; Python 2's / operator is different from Py 3's, but // works the same on both).
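For Python 3, the same result can also be reached by converting the whole input and shifting once, which may be easier to reason about for short inputs such as digests. This is only a sketch; top_bits is a hypothetical name, not part of the answer above, and it converts every byte even when only a few bits are wanted.

```python
import hashlib

def top_bits(data: bytes, nbits: int) -> int:
    """Return the first nbits of data as an int (hypothetical helper)."""
    total_bits = len(data) * 8
    if nbits > total_bits:
        raise ValueError("not enough data for the requested number of bits")
    # Interpret everything as one big-endian number, then drop the low bits
    return int.from_bytes(data, "big") >> (total_bits - nbits)

digest = hashlib.sha1(b"test").digest()
first_three = top_bits(digest, 3)
```

For a 20-byte digest the extra conversion cost is negligible; for very large inputs, slicing to the needed bytes first (as bitsof does) avoids building a huge int.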
I'm trying to write my own Python version of an STL binary file reader. According to Wikipedia, a binary STL file contains:
an 80-character (byte) header, which is generally ignored.
a 4-byte unsigned integer indicating the number of triangular facets in the file.
Each triangle is described by twelve 32-bit floating-point numbers: three for the normal and then three for the X/Y/Z coordinate of each vertex – just as with the ASCII version of STL. After these follows a 2-byte ("short") unsigned integer that is the "attribute byte count" – in the standard format, this should be zero because most software does not understand anything else. --Floating-point numbers are represented as IEEE floating-point numbers and are assumed to be little-endian--
Here is my code :
#! /usr/bin/env python3
with open("stlbinaryfile.stl", "rb") as fichier:
    head = fichier.read(80)
    nbtriangles = fichier.read(4)
    print(nbtriangles)
The output is :
b'\x90\x08\x00\x00'
It represents an unsigned integer, and I need to convert it without using any package (struct, stl, ...). Are there any (basic) rules for doing this? I don't know what \x means. How does \x90 represent one byte?
Most of the answers on Google mention "C structs", but I don't know anything about C.
Thank you for your time.
Since you're using Python 3, you can use int.from_bytes. I'm guessing the value is stored little-endian, so you'd just do:
nbtriangles = int.from_bytes(fichier.read(4), 'little')
Change the second argument to 'big' if it's supposed to be big-endian.
Mind you, the normal way to parse a fixed width type is the struct module, but apparently you've ruled that out.
For the confusion over the repr, bytes objects will display ASCII printable characters (e.g. a) or standard ASCII escapes (e.g. \t) if the byte value corresponds to one of them. If it doesn't, it uses \x##, where ## is the hexadecimal representation of the byte value, so \x90 represents the byte with value 0x90, or 144. You need to combine the byte values at offsets to reconstruct the int, but int.from_bytes does this for you faster than any hand-rolled solution could.
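The repr rules are easy to see on a small hand-made bytes object. The values below are chosen purely for illustration: one printable character, one tab, and three bytes with no printable form.

```python
data = bytes([97, 9, 144, 8, 0])

# Printable ASCII is shown as-is, tab as \t, everything else as \xNN
shown = repr(data)

# Indexing or iterating a bytes object yields plain ints in range(256)
values = list(data)
third = data[2]  # the byte written as \x90
```

Here third is the int 144, which is 0x90 in hexadecimal; the \x90 in the printed form is just notation for that single byte.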
Update: Since apparently int.from_bytes isn't "basic" enough, here are a couple of more complex solutions that only use top-level built-ins (not alternate constructors). For little-endian, you can do this:
def int_from_bytes(inbytes):
    res = 0
    for i, b in enumerate(inbytes):
        res |= b << (i * 8) # Adjust each byte individually by 8 times position
    return res
You can use the same solution for big-endian by adding reversed to the loop, making it enumerate(reversed(inbytes)), or you can use this alternative solution that handles the offset adjustment a different way:
def int_from_bytes(inbytes):
    res = 0
    for b in inbytes:
        res <<= 8 # Adjust bytes seen so far to make room for new byte
        res |= b # Mask in new byte
    return res
Again, this big-endian solution can trivially work for little-endian by looping over reversed(inbytes) instead of inbytes. In both cases inbytes[::-1] is an alternative to reversed(inbytes) (the former makes a new bytes in reversed order and iterates that, the latter iterates the existing bytes object in reverse, but unless it's a huge bytes object, enough to strain RAM if you copy it, the difference is pretty minimal).
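To convince yourself that the two loops really are mirror images, they can be checked against int.from_bytes on the four bytes from the question. The names from_little and from_big are hypothetical, used here only so both variants can coexist in one snippet.

```python
def from_little(inbytes):
    res = 0
    for i, b in enumerate(inbytes):
        res |= b << (i * 8)  # byte i contributes b * 256**i
    return res

def from_big(inbytes):
    res = 0
    for b in inbytes:
        res = (res << 8) | b  # shift what we have, append the new byte
    return res

sample = b'\x90\x08\x00\x00'
assert from_little(sample) == int.from_bytes(sample, 'little') == 2192
assert from_big(sample) == int.from_bytes(sample, 'big')
# Reversing the input turns one byte order into the other
assert from_little(sample) == from_big(reversed(sample))
```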
The typical way to interpret an integer is to use struct.unpack, like so:
import struct
with open("stlbinaryfile.stl", "rb") as fichier:
    head = fichier.read(80)
    nbtriangles = fichier.read(4)
    print(nbtriangles)
    # struct.unpack always returns a tuple, so take its only element
    nbtriangles = struct.unpack("<I", nbtriangles)[0]
    print(nbtriangles)
If you are allergic to import struct, then you can also compute it by hand:
def unsigned_int(s):
    result = 0
    for ch in s[::-1]:
        result *= 256
        result += ch
    return result
...
nbtriangles = unsigned_int(nbtriangles)
As to what you are seeing when you print b'\x90\x08\x00\x00': you are printing a bytes object, which is an array of integers in the range [0-255]. The first integer has the value 144 (decimal) or 90 (hexadecimal). When printing a bytes object, that value is represented by the string \x90. The 2nd has the value eight, represented by \x08. The 3rd and 4th integers are both zero. They are represented by \x00.
If you would like to see a more familiar representation of the integers, try:
print(list(nbtriangles))
[144, 8, 0, 0]
To compute the 32-bit integer represented by these four 8-bit integers, you can use this formula:
total = byte0 + (byte1*256) + (byte2*256*256) + (byte3*256*256*256)
Or, in hex:
total = byte0 + (byte1*0x100) + (byte2*0x10000) + (byte3*0x1000000)
Which results in:
0x00000890
Perhaps you can see the similarities to decimal, where the string "1234" represents the number:
4 + 3*10 + 2*100 + 1*1000
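The positional formula can be written once for any number of bytes. The snippet below is a small illustration using the four bytes from the question, with the decimal analogy spelled out the same way.

```python
raw = b'\x90\x08\x00\x00'

# Little-endian: the byte at position i is worth byte * 256**i
total = sum(byte * 256 ** position for position, byte in enumerate(raw))

# Same idea in base 10: the rightmost digit is position 0
digits = "1234"
decimal = sum(int(d) * 10 ** position
              for position, d in enumerate(reversed(digits)))
```

Here total comes out as 144 + 8*256 = 2192, which is 0x890, and decimal as 1234.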
I am trying to convert a number stored as a list of ints to a float type. I got the number via a serial console and want to reassemble it back together into a float.
The way I would do it in C is something like this:
bit_data = ((int16_t)byte_array[0] << 8) | byte_array[1];
result = (float)bit_data;
What I tried to use in Python is a much simpler conversion:
result = int_list[0]*256.0 + int_list[1]
However, this does not preserve the sign of the result, as the C code does.
What is the right way to do this in python?
UPDATE:
Python version is 2.7.3.
My byte array has a length of 2.
In the Python code, byte_array is a list of ints. I've renamed it to avoid misunderstanding. I cannot just use the float() function because it will not preserve the sign of the number.
I'm a bit confused by what data you have, and how it is represented in Python. As I understand it, you have received two unsigned bytes over a serial connection, which are now represented by a list of two python ints. This data represents a big endian 16-bit signed integer, which you want to extract and turn into a float. eg. [0xFF, 0xFE] -> -2 -> -2.0
import array, struct
two_unsigned_bytes = [255, 254] # represented by ints
byte_array = array.array("B", two_unsigned_bytes)
# change above to "b" if the ints represent signed bytes ie. in range -128 to 127
signed_16_bit_int, = struct.unpack(">h", byte_array)
float_result = float(signed_16_bit_int)
I think what you want is the struct module.
Here's a round trip snippet:
import struct
sampleValue = 42.13
somebytes = struct.pack('=f', sampleValue)
print(somebytes)
result = struct.unpack('=f', somebytes)
print(result)
result may be surprising to you. unpack returns a tuple. So to get to the value you can do
result[0]
or modify the result setting line to be
result = struct.unpack('=f', somebytes)[0]
I personally hate that, so use the following instead
result, = struct.unpack('=f', somebytes) # tuple unpacking on assignment
The second thing you'll notice is that the value has extra digits of noise. That's because python's native floating point representation is double.
(This is python3 btw, adjust for using old versions of python as appropriate)
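The "extra digits" effect can be isolated by packing the same value once as a 32-bit float and once as a 64-bit double. This is a small sketch that reuses the sample value from above; the variable names are only for illustration.

```python
import struct

sample_value = 42.13

# Through 4 bytes: the value is rounded to the nearest float32
as_float32, = struct.unpack('=f', struct.pack('=f', sample_value))

# Through 8 bytes: a Python float is already a double, so nothing is lost
as_float64, = struct.unpack('=d', struct.pack('=d', sample_value))

print(as_float32 == sample_value)  # False: float32 cannot hold it exactly
print(as_float64 == sample_value)  # True
```

So the noise is not added by unpack; it is the float32 rounding made visible when the result is widened back to a double.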
I am not sure I really understand what you are doing, but I think you got 4 bytes from a stream and know them to represent a float32 value. The way you are handling this suggests big-endian byte order.
Python has the struct package (https://docs.python.org/2/library/struct.html) to handle bytestreams.
import struct
stream = struct.pack(">f", 2/3.)
len(stream) # 4
reconstructed_float, = struct.unpack(">f", stream) # unpack returns a tuple
Okay, so I think int_list isn't really just a list of ints. The ints are constrained to 0-255 and represent bytes that can be built into a signed integer. You then want to turn that into a float. The trick is to set the sign of the first byte properly (subtract 256 when it is above 127) and then proceed much like you did.
float((byte_array[0]-256 if byte_array[0]>127 else byte_array[0])*256 + byte_array[1])
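The sign can also be applied to the combined 16-bit value rather than to the first byte, which mirrors the C cast more directly. The sketch below uses plain integer arithmetic so it should behave the same on Python 2.7 and Python 3; int16_from_bytes is a hypothetical helper name, and it assumes both inputs are already in the range 0-255.

```python
def int16_from_bytes(high, low):
    """Combine two byte values (0-255) into a signed big-endian 16-bit int."""
    value = (high << 8) | low
    # Two's complement: anything with the top bit set is negative
    if value >= 0x8000:
        value -= 0x10000
    return value

result = float(int16_from_bytes(0xFF, 0xFE))
```

With the example bytes [0xFF, 0xFE] this gives -2.0, matching what the C snippet in the question produces.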
In Python I need to convert a bunch of floats into hexadecimal. It needs to be zero padded (for instance, 0x00000010 instead of 0x10). Just like http://gregstoll.dyndns.org/~gregstoll/floattohex/ does. (sadly i can't use external libs on my platform so i can't use the one provided on that website)
What is the most efficient way of doing this?
This is a bit tricky in Python, because you aren't looking to convert the floating-point value to a (hex) integer. Instead, you're trying to interpret the IEEE 754 binary representation of the floating-point value as hex.
We'll use the pack and unpack functions from the built-in struct library.
A float is 32 bits. We'll first pack it into a binary string (see note 1), and then unpack it as an int.
import struct
def float_to_hex(f):
    return hex(struct.unpack('<I', struct.pack('<f', f))[0])
float_to_hex(17.5) # Output: '0x418c0000'
We can do the same for double, knowing that it is 64 bits:
def double_to_hex(f):
    return hex(struct.unpack('<Q', struct.pack('<d', f))[0])
double_to_hex(17.5) # Output: '0x4031800000000000L' (the trailing L appears on Python 2 only)
1 - Meaning a string of raw bytes; not a string of ones and zeroes.
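The question also asks for zero padding, which hex() does not provide, since it drops leading zeros. One possible variant, sketched here with a hypothetical name, formats the 32-bit pattern with a fixed width of 8 hex digits instead.

```python
import struct

def float_to_hex_padded(f):
    """Return the float32 bit pattern of f as '0x' plus exactly 8 hex digits."""
    bits, = struct.unpack('<I', struct.pack('<f', f))
    return '0x%08x' % bits

print(float_to_hex_padded(17.5))  # 0x418c0000
print(float_to_hex_padded(0.0))   # 0x00000000
```

For a double, the same pattern would use '<Q' with '<d' and a width of 16 digits.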
In Python float is always double-precision.
If you require your answer to be output in the form of a hexadecimal integer, the question was already answered:
import struct
# define double_to_hex as in the other answer
double_to_hex(17.5) # Output: '0x4031800000000000'
double_to_hex(-17.5) # Output: '0xc031800000000000'
However you might instead consider using the builtin function:
(17.5).hex() # Output: '0x1.1800000000000p+4'
(-17.5).hex() # Output: '-0x1.1800000000000p+4'
# 0x1.18p+4 == (1 + 1./0x10 + 8./0x100) * 2**4 == 1.09375 * 16 == 17.5
This is the same answer as before, just in a more structured and human-readable format.
The lower 52 bits are the mantissa. The upper 12 bits consist of a sign bit and an 11-bit exponent; the exponent bias is 1023 == 0x3FF, so 0x403 means '4'. See Wikipedia article on IEEE floating point.
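The field layout described above can be checked by pulling the three parts out of the 64-bit pattern with shifts and masks. This is a sketch for ordinary (normal) numbers only; zeros, subnormals, infinities and NaN use special exponent values and are not handled here.

```python
import struct

bits, = struct.unpack('>Q', struct.pack('>d', 17.5))

sign = bits >> 63                          # top bit
exponent = ((bits >> 52) & 0x7FF) - 1023   # 11 bits, bias removed
mantissa = bits & ((1 << 52) - 1)          # low 52 bits

# Rebuild the value: implicit leading 1, then the fraction, scaled by 2**exponent
value = (-1) ** sign * (1 + mantissa / 2.0 ** 52) * 2.0 ** exponent
```

For 17.5 this yields sign 0, exponent 4 and mantissa 0x1800000000000, and value comes back as 17.5, in line with 0x1.18p+4 above.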
Further to Jonathon Reinhart's very helpful answer. I needed this to send a floating point number as bytes over UDP
import struct
# define double_to_hex (or float_to_hex)
def double_to_hex(f):
    return hex(struct.unpack('<Q', struct.pack('<d', f))[0])
# On the UDP transmission side
doubleAsHex = double_to_hex(17.5)
# Drop the '0x' prefix (and Python 2's 'L' suffix), then pad to 16 hex digits (8 bytes)
doubleAsBytes = bytearray.fromhex(doubleAsHex[2:].rstrip('L').zfill(16))
# On the UDP receiving side
doubleFromBytes = struct.unpack('>d', doubleAsBytes)[0] # or '>f' for float_to_hex
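If the hex string itself is not needed on the wire, the detour through hex can be skipped entirely, because struct.pack already produces the 8 bytes to send. A minimal sketch, assuming both ends agree on network (big-endian) byte order via the '!' prefix; the socket code is left out.

```python
import struct

# Sending side: 8 bytes, most significant byte first
payload = struct.pack('!d', 17.5)

# Receiving side: unpack returns a tuple, so take its only element
received, = struct.unpack('!d', payload)
```

The payload here is the same 8 bytes that the hex-based route produces for 17.5, just without the intermediate string handling.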
If you are on MicroPython (which the question does not mention, but I had trouble finding an answer for it), you can use this:
import struct
import binascii
def float_to_hex(f):
    # '>f' packs the most significant byte first, so the hex digits read in the usual order
    return binascii.hexlify(struct.pack('>f', f))
float_to_hex(17.5) # b'418c0000'
You can use these minimal Python snippets to write and read 32-bit and 16-bit floating point numbers as packed bytes (called hex_str below).
import numpy as np
import struct
# encoding
a = np.float32(0.012533333)
hex_str=struct.pack('<f', a)
# check how many byte it has. In this case it is 4.
print(len(hex_str))
# decoding
bhat=struct.unpack('<f',hex_str)[0]
ahat=np.float32(bhat)
But for float16 the situation is a little different: first you need to find the corresponding integer representation, then write/read it to the file as follows:
# encoding
a = np.float16(0.012533333)
b= a.view(np.int16)
hex_str=struct.pack('<h', b)
# check how many byte it has. In this case it is 2
print(len(hex_str))
# decoding
bhat=struct.unpack('<h',hex_str)[0]
ahat=np.int16(bhat).view(np.float16)
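On Python 3.6 and later, the struct module also has an 'e' format character for IEEE 754 half precision, so a float16 round trip is possible without going through an integer view. This is a sketch using only the standard library; it is an alternative to the NumPy route above, not a replacement for it, and 0.5 is chosen because it is exactly representable in 16 bits.

```python
import struct

# encoding: 2 bytes, little-endian half-precision float
packed = struct.pack('<e', 0.5)

# decoding: the result is an ordinary Python float
restored, = struct.unpack('<e', packed)
```

Values that are not exactly representable in half precision, such as 0.012533333, come back rounded to the nearest float16.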
The shortest ways I have found are:
n = 5
# Python 2.
s = str(n)
i = int(s)
# Python 3.
s = bytes(str(n), "ascii")
i = int(s)
I am particularly concerned with two factors: readability and portability. The second method, for Python 3, is ugly. However, I think it may be backwards compatible.
Is there a shorter, cleaner way that I have missed? I currently make a lambda expression to fix it with a new function, but maybe that's unnecessary.
Answer 1:
To convert a string to a sequence of bytes in either Python 2 or Python 3, you use the string's encode method. If you don't supply an encoding parameter, the default is used ('ascii' on Python 2, 'utf-8' on Python 3), which will always be good enough for numeric digits.
s = str(n).encode()
Python 2: http://ideone.com/Y05zVY
Python 3: http://ideone.com/XqFyOj
In Python 2 str(n) already produces bytes; the encode will do a double conversion as this string is implicitly converted to Unicode and back again to bytes. It's unnecessary work, but it's harmless and is completely compatible with Python 3.
Answer 2:
Above is the answer to the question that was actually asked, which was to produce a string of ASCII bytes in human-readable form. But since people keep coming here trying to get the answer to a different question, I'll answer that question too. If you want to convert 10 to b'10' use the answer above, but if you want to convert 10 to b'\x0a\x00\x00\x00' then keep reading.
The struct module was specifically provided for converting between various types and their binary representation as a sequence of bytes. The conversion from a type to bytes is done with struct.pack. There's a format parameter fmt that determines which conversion it should perform. For a 4-byte integer, that would be i for signed numbers or I for unsigned numbers. For more possibilities see the format character table, and see the byte order, size, and alignment table for options when the output is more than a single byte.
import struct
s = struct.pack('<i', 5) # b'\x05\x00\x00\x00'
You can use the struct module's pack:
In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'
The ">" is the byte-order (big-endian) and the "I" is the format character. So you can be specific if you want to do something else:
In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'
In [13]: struct.pack("B", 1)
Out[13]: '\x01'
This works the same on both python 2 and python 3.
Note: the inverse operation (bytes to int) can be done with unpack.
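As a quick illustration of that inverse, the round trip below packs a value and unpacks it again, and also shows what happens when the byte order is read the wrong way round. The variable names are only for this sketch.

```python
import struct

packed = struct.pack(">I", 1)           # 4 bytes, big-endian

# unpack returns a tuple; the trailing comma takes its single element
value, = struct.unpack(">I", packed)

# Reading the same bytes as little-endian gives a very different number
swapped, = struct.unpack("<I", packed)
```

Here value is 1 again, while swapped is 16777216 (0x01000000), which is why both sides need to agree on the byte-order character.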
I have found the only reliable, portable method to be
bytes(bytearray([n]))
Just bytes([n]) does not work in python 2. Taking the scenic route through bytearray seems like the only reasonable solution.
Converting an int to a byte in Python 3:
n = 5
bytes([n]) # b'\x05'
;) guess that'll be better than messing around with strings
source: http://docs.python.org/3/library/stdtypes.html#binaryseq
In Python 3.x, you can convert an integer value (including large ones, which the other answers don't allow for) into a series of bytes like this:
import math
x = 0x1234
number_of_bytes = int(math.ceil(x.bit_length() / 8))
x_bytes = x.to_bytes(number_of_bytes, byteorder='big')
x_int = int.from_bytes(x_bytes, byteorder='big')
x == x_int
From int to bytes:
bytes_string = int_v.to_bytes(length, endian)
where length is the number of bytes (1, 2, 3, 4, ...) and endian is 'big' or 'little'.
From bytes to int:
int_v = int.from_bytes(bytes_string, endian) # list(bytes_string) gives the individual byte values instead
When converting old Python 2 code you often have "%s" % number. For Python 3 (3.5 or later) this can be converted to b"%d" % number (b"%s" % number does not work).
The format b"%d" % number is also another clean way to convert an int to a byte string of its decimal digits.
b"%d" % number