How to convert bytes to bitstring - python

I have a bitstring that I encoded with the first function below. When I try to decode it with my second function, it doesn't work. What can I do?
def bitstring_to_bytes(self,s):
    return int(s, 2).to_bytes(len(s) // 8, byteorder='big')
def bytes_to_string(self,xbytes):
    return xbytes.from_bytes(xbytes, 'big')

(More or less) analogous to how the int type knows how to convert an integer instance to bytes, it is int again who knows how to convert bytes back to an integer.
Your xbytes variable is a bytes object, so it doesn't know how to convert itself to an integer.
Instead you do it like this:
intermediate_result = int.from_bytes(xbytes, byteorder='big')
(And afterwards you will want to convert it into a string of 0s and 1s.)
The reason the two directions are not completely symmetric (converting forward you use value.method(), converting back you use type.method(value)) is that in the forward case your value is already of the type that knows how to convert itself, but in the backward case your value is of the other type, which doesn't know how to convert back.
You can also think about it in English terms, if you like:
value.to_bytes(byteorder='big')
would be Convert this integer "value" to a bytes object.
int.from_bytes(value, byteorder='big')
would be Create a fresh integer object from that bytes "value".
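Putting the two directions together, a round trip might look like the sketch below. It assumes the bitstring should be zero-padded to whole bytes when decoding, and the name bytes_to_bitstring is only illustrative, not something from the question.

```python
def bitstring_to_bytes(s: str) -> bytes:
    """Pack a string of '0'/'1' characters into bytes (big-endian)."""
    # Round the length up so a bitstring that is not a multiple of 8 still fits.
    return int(s, 2).to_bytes((len(s) + 7) // 8, byteorder='big')


def bytes_to_bitstring(xbytes: bytes) -> str:
    """Inverse of bitstring_to_bytes for whole-byte bitstrings (hypothetical helper)."""
    value = int.from_bytes(xbytes, byteorder='big')
    # Zero-pad to 8 bits per byte so leading zeros survive the round trip.
    return format(value, '0{}b'.format(len(xbytes) * 8))


bits = '0100000101000010'
packed = bitstring_to_bytes(bits)      # the bytes for 'A' and 'B'
restored = bytes_to_bitstring(packed)  # back to the original bitstring
```

The padding matters because int() discards leading zeros; without it, a bitstring starting with 0 would come back shorter than it went in.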

Related

Python - Integer to bytes object smallest possible size

I'm in a special circumstance where I would like to convert an integer into a bytes object of the smallest length possible. I currently use the following method to convert to bytes:
number = 9847
bytes = number.to_bytes(4, 'little')
However, I would like to scale the number of bytes used (the 4) down to the smallest possible size. How can I achieve this?
I figured it out on my own! I use the following function to do the conversion to bytes for me now:
import math
def int_to_bytes(self, integer_in: int) -> bytes:
    """Convert an integer to bytes"""
    # Calculates the least amount of bytes the integer can be fit into
    length = math.ceil(math.log(integer_in)/math.log(256))
    return integer_in.to_bytes(length, 'little')
This works because with exponents a = b^e is equivalent to e = log(a)/log(b)
In this case our problem is integer_in = 256^e, and we want to solve for e. This can be solved by rephrasing it to e = log(integer_in)/log(256). Lastly, we use math.ceil() to round up the answer to an integer.
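One caveat with the logarithm: for an exact power of 256 the quotient is a whole number, so math.ceil() does not round up and the length comes out one byte short (256 would get length 1), and 0 cannot be passed to math.log() at all. A possible alternative, sketched here with the hypothetical name int_to_min_bytes and assuming non-negative input, is to count bits with int.bit_length() instead:

```python
def int_to_min_bytes(integer_in: int, byteorder: str = 'little') -> bytes:
    """Convert a non-negative integer to the shortest bytes object that holds it."""
    if integer_in < 0:
        raise ValueError("only non-negative integers are handled in this sketch")
    # bit_length() is exact integer arithmetic; round up to whole bytes.
    # max(1, ...) makes 0 encode as a single zero byte rather than b''.
    length = max(1, (integer_in.bit_length() + 7) // 8)
    return integer_in.to_bytes(length, byteorder)


small = int_to_min_bytes(9847)   # two bytes: 0x77, 0x26
edge = int_to_min_bytes(256)     # two bytes: 0x00, 0x01
```

Whether 0 should map to one zero byte or to an empty bytes object is a design choice; this sketch picks one byte.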

str.join TypeError when decoding binary file using struct.unpack [duplicate]

I'm trying to get the first char of a byte-string in python 3.4, but when I index it, I get an int:
>>> my_bytes = b'just a byte string'
>>> my_bytes
b'just a byte string'
>>> my_bytes[0]
106
>>> type(my_bytes[0])
<class 'int'>
This seems unintuitive to me, as I was expecting to get b'j'.
I have discovered that I can get the value I expect, but it feels like a hack to me.
>>> my_bytes[0:1]
b'j'
Can someone please explain why this happens?
The bytes type is a Binary Sequence type, and is explicitly documented as containing a sequence of integers in the range 0 to 255.
From the documentation:
Bytes objects are immutable sequences of single bytes.
[...]
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that 0 <= x < 256[.]
[...]
Since bytes objects are sequences of integers (akin to a tuple), for a bytes object b, b[0] will be an integer, while b[0:1] will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1).
Note that indexing a string is a bit of an exception among the sequence types; 'abc'[0] gives you a str object of length one; str is the only sequence type that contains elements of its own type, always.
This echoes how other languages treat string data; in C the unsigned char type is also effectively an integer in the range 0-255. Whether an unqualified char is signed or unsigned is implementation-defined in C, and text is modelled as a char[] array.
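The difference between indexing and slicing can be checked directly. This is just a small illustration in Python 3; the variable names are made up for the example.

```python
my_bytes = b'just a byte string'

first_as_int = my_bytes[0]         # indexing yields an int
first_as_bytes = my_bytes[0:1]     # slicing yields a bytes object of length 1
rebuilt = bytes([first_as_int])    # wrap the int in a list to get bytes back
first_four = list(my_bytes[:4])    # iterating also yields ints
```

Note that bytes([106]) builds a one-byte object, whereas bytes(106) would create 106 zero bytes, so the list wrapper is needed.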

Converting integer to a pair of bytes produces unexpected format?

I am using python 3.8.5, and trying to convert from an integer in the range (0,65535) to a pair of bytes. I am currently using the following code:
import struct
input_integer = 2111
bytes_val = input_integer.to_bytes(2,'little')
output_data = struct.pack('bb',bytes_val[1],bytes_val[0])
print(output_data)
This produces the following output:
b'\x08?'
This \x08 is 8 in hex, the most significant byte, and ? is 63 in ascii. So together, the numbers add up to 2111 (8*256+63=2111). What I can't figure out is why the least significant byte is coming out in ascii instead of hex? It's very strange to me that it's in a different format than the MSB right next to it. I want it in hex for the output data, and am trying to figure out how to achieve that.
I have also tried modifying the format string in the last line to the following:
output_data = struct.pack('cc',bytes_val[1],bytes_val[0])
which produces the following error:
struct.error: char format requires a bytes object of length 1
I checked the types at each step, and it looks like bytes_val is a bytes object of length 2, but when I take one of the individual elements, say bytes_val[1], it is an integer rather than a bytes object.
Any ideas?
All your observations can be verified from the docs for the bytes class:
While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers
In the repr of a Python string, letters and punctuation are shown as themselves, while control codes (0-31 and 127) are shown as escapes, mostly by their hexadecimal value. You can see this by evaluating ''.join(map(chr, range(128))) in the interactive interpreter. Bytes literals follow the same convention, except that individual elements, e.g. output_data[0], are integers.
If you want to represent everything as hex:
>>> output_data.hex()
'083f'
>>> bytes.fromhex('083f') # to recover
b'\x08?'
As of version 3.8 bytes.hex() now supports optional sep and bytes_per_sep parameters to insert separators between bytes in the hex output.
>>> b'abcdef'.hex(' ', 2)
'6162 6364 6566'
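For the original goal, the intermediate to_bytes() step may not be needed: struct can pack the integer as a big-endian unsigned 16-bit value directly. The sketch below assumes that is the intended layout (most significant byte first) and builds an escaped string purely for display; the variable names are illustrative.

```python
import struct

input_integer = 2111

# '>H' = big-endian unsigned short (2 bytes, range 0..65535)
output_data = struct.pack('>H', input_integer)

# The bytes are the same whichever way they are displayed.
hex_text = output_data.hex()
escaped = ''.join('\\x{:02x}'.format(b) for b in output_data)
```

Here output_data still prints as b'\x08?', because that is only the repr; the two bytes stored are 0x08 and 0x3f either way.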

Convert a list of ints to a float

I am trying to convert a number stored as a list of ints to a float type. I got the number via a serial console and want to reassemble it back together into a float.
The way I would do it in C is something like this:
bit_data = ((int16_t)byte_array[0] << 8) | byte_array[1];
result = (float)bit_data;
What I tried to use in python is a much more simple conversion:
result = int_list[0]*256.0 + int_list[1]
However, this does not preserve the sign of the result, as the C code does.
What is the right way to do this in python?
UPDATE:
Python version is 2.7.3.
My byte array has a length of 2.
In the Python code, byte_array is a list of ints. I've renamed it to avoid misunderstanding. I cannot just use the float() function because it will not preserve the sign of the number.
I'm a bit confused by what data you have, and how it is represented in Python. As I understand it, you have received two unsigned bytes over a serial connection, which are now represented by a list of two python ints. This data represents a big endian 16-bit signed integer, which you want to extract and turn into a float. eg. [0xFF, 0xFE] -> -2 -> -2.0
import array, struct
two_unsigned_bytes = [255, 254] # represented by ints
byte_array = array.array("B", two_unsigned_bytes)
# change above to "b" if the ints represent signed bytes ie. in range -128 to 127
signed_16_bit_int, = struct.unpack(">h", byte_array)
float_result = float(signed_16_bit_int)
I think what you want is the struct module.
Here's a round trip snippet:
import struct
sampleValue = 42.13
somebytes = struct.pack('=f', sampleValue)
print(somebytes)
result = struct.unpack('=f', somebytes)
print(result)
result may be surprising to you. unpack returns a tuple. So to get to the value you can do
result[0]
or modify the result setting line to be
result = struct.unpack('=f', somebytes)[0]
I personally hate that, so use the following instead
result , = struct.unpack('=f', somebytes) # tuple unpacking on assignment
The second thing you'll notice is that the value has extra digits of noise. That's because python's native floating point representation is double.
(This is python3 btw, adjust for using old versions of python as appropriate)
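The extra digits can be seen by comparing a 32-bit and a 64-bit round trip. This is a small sketch in Python 3; the variable names are illustrative.

```python
import struct

sample_value = 42.13

# Round trip through a 32-bit float: precision is lost on packing.
as_float32, = struct.unpack('=f', struct.pack('=f', sample_value))

# Round trip through a 64-bit double: matches Python's own float exactly.
as_float64, = struct.unpack('=d', struct.pack('=d', sample_value))

difference = abs(as_float32 - sample_value)
```

The 32-bit result is the nearest single-precision value to 42.13, which Python then displays with double precision, hence the apparent noise.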
I am not sure I really understand what you are doing, but I think you got 4 bytes from a stream and know them to represent a float32 value. The way you are handling this suggests big-endian byte order.
Python has the struct package (https://docs.python.org/2/library/struct.html) to handle bytestreams.
import struct
stream = struct.pack(">f", 2/3.)
len(stream) # 4
reconstructed_float = struct.unpack(">f", stream)
Okay, so I think int_list isn't really just a list of ints. The ints are constrained to 0-255 and represent bytes that can be built into a signed integer. You then want to turn that into a float. The trick is to set the sign of the first byte properly (subtract 256 when it is above 127, as in two's complement) and then proceed much like you did.
float((byte_array[0]-256 if byte_array[0]>127 else byte_array[0])*256 + byte_array[1])
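As a cross-check, here are two ways to do the same sign handling, sketched with hypothetical helper names. The first uses plain arithmetic and should work on Python 2 and 3; the second relies on int.from_bytes, which assumes Python 3 rather than the 2.7.3 mentioned in the question.

```python
def signed16_from_bytes(high, low):
    """Combine two unsigned bytes (0..255) into a signed 16-bit integer."""
    value = (high << 8) | low
    # Values with the top bit set are negative in two's complement.
    if value >= 0x8000:
        value -= 0x10000
    return value


def two_bytes_to_float(int_list):
    """Python 3 only: let int.from_bytes do the sign handling."""
    value = int.from_bytes(bytes(int_list), byteorder='big', signed=True)
    return float(value)


manual = float(signed16_from_bytes(0xFF, 0xFE))
builtin = two_bytes_to_float([0xFF, 0xFE])
```

Both routes give -2.0 for [0xFF, 0xFE], matching the example given earlier in this thread.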

Convert decimal int to little endian string ('\x##\x##...')

I want to convert an integer value to a string of hex values, in little endian. For example, 5707435436569584000 would become '\x4a\xe2\x34\x4f\x4a\xe2\x34\x4f'.
All my googlefu is finding for me is hex(..) which gives me '0x4f34e24a4f34e180' which is not what I want.
I could probably manually split up that string and build the one I want, but I'm hoping someone can point me to a better option.
You need to use the struct module:
>>> import struct
>>> struct.pack('<Q', 5707435436569584000)
'\x80\xe14OJ\xe24O'
>>> struct.pack('<Q', 5707435436569584202)
'J\xe24OJ\xe24O'
Here < indicates little-endian, and Q that we want to pack an unsigned long long (8 bytes).
Note that Python will use ASCII characters for any byte that falls within the printable ASCII range to represent the resulting bytestring, hence the 4OJ, 4O and J parts of the above result (the 1 and 2 just before them belong to the \xe1 and \xe2 escapes):
>>> struct.pack('<Q', 5707435436569584202).encode('hex')
'4ae2344f4ae2344f'
>>> '\x4a\xe2\x34\x4f\x4a\xe2\x34\x4f'
'J\xe24OJ\xe24O'
I know it is an old thread, but it is still useful. Here are my two cents using python3:
hex_string = hex(5707435436569584202) # '0x4f34e24a4f34e24a'
little_endian = bytearray.fromhex(hex_string[2:])
little_endian.reverse() # reverses in place and returns None
So, the key is convert it to a bytearray and reverse it.
In one line:
bytearray.fromhex(hex(5707435436569584202)[2:])[::-1] # bytearray(b'J\xe24OJ\xe24O')
PS: You can treat "bytearray" data like "bytes" and even mix them with b'raw bytes'
Update:
As Will points out in the comments, you can also handle negative integers:
To make this work with negative integers you need to mask your input with your preferred int type output length. For example, -16 as a little endian uint32_t would be bytearray.fromhex(hex(-16 & (2**32-1))[2:])[::-1], which evaluates to bytearray(b'\xf0\xff\xff\xff')
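In Python 3, int.to_bytes covers both the unsigned and the negative case without going through a hex string. The sketch below assumes a fixed output width chosen by the caller; the helper name to_little_endian is hypothetical.

```python
import struct


def to_little_endian(value: int, length: int = 8, signed: bool = False) -> bytes:
    """Encode an integer as `length` little-endian bytes."""
    return value.to_bytes(length, byteorder='little', signed=signed)


unsigned_64 = to_little_endian(5707435436569584202)
negative_32 = to_little_endian(-16, length=4, signed=True)

# For display only: spell every byte as a \x## escape.
escaped = ''.join('\\x{:02x}'.format(b) for b in unsigned_64)
```

Unlike the hex-string route, this also works when the hex representation has an odd number of digits, which bytearray.fromhex would reject.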
