Byte array in Python only accepts unsigned integers - python

I'm working on audio codecs in Python (yikes) using byte arrays to edit individual byte data from an audio file.
I have a certain encryption in mind that requires me to perform bitwise operations on single bytes stored in the byte array.
One of those operations is the ~ operator (bitwise NOT), which inverts every bit (0b0001 becomes 0b1110).
The problem is when you reference a single element of a byte array, it returns an int (does Python by default consider untyped 8 bit data integers?). Integers in Python are by default signed (I don't think unsigned integers even exist in Python).
When you try to perform bit-wise NOT on a byte in the byte-array, you get the following error:
>>> array[0] = ~array[0]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: byte must be in range(0, 256)
This is because it expects an unsigned int between 0-255. How do I convert a signed int into an unsigned int such that the bits used to represent both values remain the same?
Cheers

Use a different operation for bit flipping.
E.g. [1]:
array[i] = 255 - array[i]
or also:
array[i] = 255 ^ array[i]
Either one will flip all (i.e. 8) bits.
[1] The math behind this can be worked out from the Wikipedia page on two's complement.

The solution is actually remarkably simple after playing around with a binary calculator a little bit.
For a negative SIGNED int, just subtract its magnitude from 256 to get the value of the UNSIGNED int with the same binary representation.
So,
-23 signed would be 233 unsigned.
Hope this helps anyone else looking for a solution :)
EDIT: For those saying answer is 255 - array[0]. In this case I'm looking for a way to go from post NOT'd int to its unsigned counter part. So I've already performed the bitwise NOT on the integer, now I'm just getting it back to a form that can be inputted into the byte-array.
So in the end it would look something like this:
tmp = ~array[0]
array[0] = 256 + tmp
or
array[0] = 256 - abs(tmp)
This gets me the correct answer :)
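Another way to get from the result of ~ back to an unsigned byte is to mask it with 0xFF, which keeps only the low 8 bits. The following is a minimal sketch of that idea; the helper name invert_byte and the sample data are made up for illustration.

```python
def invert_byte(value):
    """Return the 8-bit complement of value (expected range 0..255)."""
    # ~value is negative in Python; "& 0xFF" keeps only the low 8 bits,
    # which is the unsigned value with the same bit pattern.
    return ~value & 0xFF


data = bytearray(b"\x00\x17\xff")  # sample bytes, chosen arbitrarily
for i in range(len(data)):
    data[i] = invert_byte(data[i])

print(data.hex())   # ffe800
print(-23 & 0xFF)   # 233, the same value as 256 - 23
```

For every value from 0 to 255 the masked result is equal to 256 + ~value, so it should agree with the approach above.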

Related

Saving and loading bits/bytes in Python

I've been studying compression algorithms recently, and I'm trying to understand how I can store integers as bits in Python to save space.
So first I save '1' and '0' as strings in Python.
import os
import numpy as np
array= np.random.randint(0, 2, size = 200)
string = [str(i) for i in array]
with open('testing_int.txt', 'w') as f:
    for i in string:
        f.write(i)
print(os.path.getsize('testing_int.txt'))
I get back 200 bytes, which makes sense, since each char is represented by one byte in ASCII (and in UTF-8 as well, if the characters are plain ASCII?).
Now if I try to save these ones and zeroes as bits, I should only take up around 25 bytes, right?
200 bits/8 = 25 bytes.
However, when I try the following code below, I get 105 bytes.
Am I doing something wrong?
Using the same 'array variable' as above I tried this:
bytes_string = [bytes(i) for i in array]
with open('testing_bytes.txt', 'wb') as f:
    for i in bytes_string:
        f.write(i)
Then I tried this:
bin_string = [bin(i) for i in array]
with open('testing_bin.txt', 'wb') as f:
    for i in bytes_string:
        f.write(i)
This also takes up around 105 bytes.
So I tried looking at the text files, and I noticed that
both the 'bytes.txt' and 'bin.txt' are blank.
So I tried to read the 'bytes.txt' file via this code:
with open(r"C:\Users\Moondra\Desktop\testing_bytes\testing_bytes.txt", 'rb') as f:
    x =f.read()
Now I get this back:
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
So I tried these commands:
>>> int.from_bytes(x, byteorder='big')
0
>>> int.from_bytes(x, byteorder='little')
0
>>>
So apparently I'm doing multiple things incorrectly.
I can't figure out:
1) Why I am not getting a text file that is 25 bytes
2) Why I can't read back the bytes file correctly.
Thank you.
bytes_string = [bytes(i) for i in array]
It looks like you expect bytes(x) to give you a one-byte bytes object with the value of x. Follow the documentation, and you'll see that bytes() is initialized like bytearray(), and bytearray() says this about its argument:
If it is an integer, the array will have that size and will be initialized with null bytes.
So bytes(0) gives you an empty bytes object, and bytes(1) gives you a single byte with the ordinal zero. That's why bytes_string is about half the size of array and is made up completely of zero bytes.
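A quick sketch of the difference between passing an integer and passing a list of integers to bytes() may make this clearer. The values are arbitrary examples.

```python
# An integer argument is a length: the result is that many zero bytes.
print(bytes(0))          # b''
print(bytes(1))          # b'\x00'
print(bytes(3))          # b'\x00\x00\x00'

# An iterable of integers gives one byte per value instead.
print(bytes([1]))        # b'\x01'
print(bytes([0, 1, 1]))  # b'\x00\x01\x01'
```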
As for why the bin() example didn't work, it looks like a simple case of copy-pasting and forgetting to change bytes_string to bin_string in the for loop.
This all still doesn't accomplish your goal of treating 0 or 1 value integers as bits. Python doesn't really have that sort of functionality built in. There are third-party modules that allow you to work at the bit level, but I can't speak to any of them specifically. Personally I would probably just roll my own specific to the application.
It looks like you're trying to bit shift all the values into a single byte. For example, you expect the integer values [0,1,0,1,0,1,0,1] to be packed into a byte that looks like the following binary number: 0b01010101. To do this, you need to use the bitwise shift operator and bitwise OR operator along with the struct module to pack the values into an unsigned char which represents the sequence of int values you have.
The code below takes the array of random integers in range [0,1] and shifts them together to make a binary number that can be packed into a single byte. I used 256 ints for convenience. The expected number of bytes for the file to be is then 32 (256/8). You will see that when it is run this is indeed what you get.
import struct
import numpy as np
import os
a = np.random.randint(0, 2, size = 256)
bool_data = []
bin_vals = []
for i in range(0, len(a), 8):
    bin_val = (a[i] << 0) | (a[i+1] << 1) | \
              (a[i+2] << 2) | (a[i+3] << 3) | \
              (a[i+4] << 4) | (a[i+5] << 5) | \
              (a[i+6] << 6) | (a[i+7] << 7)
    bin_vals.append(struct.pack('B', bin_val))
with open("output.txt", 'wb') as f:
    for val in bin_vals:
        f.write(val)
print(os.path.getsize('output.txt'))
Please note, however, that this will only work for values of integers in the range [0,1] since if they are bigger it will shift more non-zeros and wreck the structure of the generated byte. The binary number may also exceed 1 byte in size in this case.
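The same packing idea can be sketched without numpy or struct, along with the reverse step for reading the bits back. This is only an illustration: the function names pack_bits and unpack_bits are made up here, the bit order (first value in the least significant position) follows the shifts used above, and each value is masked with 1 so that larger integers cannot spill into neighbouring bits.

```python
import random


def pack_bits(bits):
    """Pack 0/1 values into bytes, eight per byte, first value in bit 0."""
    bits = list(bits)
    if len(bits) % 8:
        raise ValueError("number of bits must be a multiple of 8")
    out = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for offset, bit in enumerate(bits[start:start + 8]):
            byte |= (bit & 1) << offset  # mask so only one bit is ever set
        out.append(byte)
    return bytes(out)


def unpack_bits(data):
    """Reverse of pack_bits: return the list of 0/1 values."""
    return [(byte >> offset) & 1 for byte in data for offset in range(8)]


bits = [random.randint(0, 1) for _ in range(200)]
packed = pack_bits(bits)
print(len(packed))                  # 25
print(unpack_bits(packed) == bits)  # True
```

Writing packed to a file opened in 'wb' mode should then give a 25-byte file for 200 bits.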
It seems like you're just using Python in an attempt to generate an array of bits for demonstration purposes, and by that token I would say that Python probably isn't best suited for this. I would recommend using a lower-level language such as C/C++, which has more direct access to data types than Python does.

How to turn the first n bits of a digest into an integer?

I'm working with Python 3, trying to get an integer out of a digest. I'm only interested in the first n bits of the digest though.
What I have right now is this:
n = 3
int(hashlib.sha1(b'test').digest()[0:n])
This however results in a ValueError: invalid literal for int() with base 10: b'\xa9J' error.
Thanks.
The Py3 solution is to use int.from_bytes to convert bytes to int, then shift off the part you don't care about:
def bitsof(bt, nbits):
    # Directly convert enough bytes to an int to ensure you have at least as many bits
    # as needed, but no more
    neededbytes = (nbits+7)//8
    if neededbytes > len(bt):
        raise ValueError("Require {} bytes, received {}".format(neededbytes, len(bt)))
    i = int.from_bytes(bt[:neededbytes], 'big')
    # If there were a non-byte aligned number of bits requested,
    # shift off the excess from the right (which came from the last byte processed)
    if nbits % 8:
        i >>= 8 - nbits % 8
    return i
Example use:
>>> bitsof(hashlib.sha1(b'test').digest(), 3)
5 # The leftmost three bits of the nibble that begins the hash
On Python 2, the function can be used almost as is, aside from adding a binascii import, and changing the conversion from bytes to int to the slightly less efficient two step conversion (from str to hex representation, then using int with base of 16 to parse it):
i = int(binascii.hexlify(bt[:neededbytes]), 16)
Everything else works as is (even the // operator works as expected; Python 2's / operator is different from Py 3's, but // works the same on both).
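For comparison, a simpler (though less economical) Python 3 variant converts the whole digest and shifts away everything except the top n bits. This is a sketch under that assumption; the name top_bits is made up, and it does more work than bitsof because it converts bytes that are then discarded.

```python
import hashlib


def top_bits(data, nbits):
    """Return the first nbits of data (a bytes object) as an int."""
    total_bits = len(data) * 8
    if not 0 < nbits <= total_bits:
        raise ValueError("nbits must be between 1 and {}".format(total_bits))
    # Interpret all bytes as one big-endian number, then drop the low bits.
    return int.from_bytes(data, "big") >> (total_bits - nbits)


digest = hashlib.sha1(b"test").digest()
print(top_bits(digest, 3))  # 5
print(top_bits(digest, 8))  # 169, i.e. the first byte 0xa9
```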

STL binary file reader with Python

I'm trying to write my "personal" Python version of an STL binary file reader. According to Wikipedia, a binary STL file contains:
an 80-character (byte) header, which is generally ignored.
a 4-byte unsigned integer indicating the number of triangular facets in the file.
Each triangle is described by twelve 32-bit floating-point numbers: three for the normal and then three for the X/Y/Z coordinate of each vertex – just as with the ASCII version of STL. After these follows a 2-byte ("short") unsigned integer that is the "attribute byte count" – in the standard format, this should be zero because most software does not understand anything else. --Floating-point numbers are represented as IEEE floating-point numbers and are assumed to be little-endian--
Here is my code :
#! /usr/bin/env python3
with open("stlbinaryfile.stl","rb") as fichier :
    head=fichier.read(80)
    nbtriangles=fichier.read(4)
    print(nbtriangles)
The output is :
b'\x90\x08\x00\x00'
It represents an unsigned integer, and I need to convert it without using any package (struct, stl...). Are there any (basic) rules for doing this? I don't know what \x means. How does \x90 represent one byte?
Most of the answers on Google mention "C structs", but I don't know anything about C.
Thank you for your time.
Since you're using Python 3, you can use int.from_bytes. I'm guessing the value is stored little-endian, so you'd just do:
nbtriangles = int.from_bytes(fichier.read(4), 'little')
Change the second argument to 'big' if it's supposed to be big-endian.
Mind you, the normal way to parse a fixed width type is the struct module, but apparently you've ruled that out.
For the confusion over the repr, bytes objects will display ASCII printable characters (e.g. a) or standard ASCII escapes (e.g. \t) if the byte value corresponds to one of them. If it doesn't, it uses \x##, where ## is the hexadecimal representation of the byte value, so \x90 represents the byte with value 0x90, or 144. You need to combine the byte values at offsets to reconstruct the int, but int.from_bytes does this for you faster than any hand-rolled solution could.
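A small sketch of how the repr chooses between those three forms, using arbitrary byte values picked for illustration:

```python
raw = bytes([97, 9, 144])  # 'a', a tab, and a non-ASCII byte value

print(raw)             # b'a\t\x90'
print(list(raw))       # [97, 9, 144]
print(raw[2] == 0x90)  # True: \x90 is just the byte with value 144
```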
Update: Since apparently int.from_bytes isn't "basic" enough, here are a couple of more complex solutions that use only top-level built-ins (not alternate constructors). For little-endian, you can do this:
def int_from_bytes(inbytes):
    res = 0
    for i, b in enumerate(inbytes):
        res |= b << (i * 8) # Adjust each byte individually by 8 times position
    return res
You can use the same solution for big-endian by adding reversed to the loop, making it enumerate(reversed(inbytes)), or you can use this alternative solution that handles the offset adjustment a different way:
def int_from_bytes(inbytes):
    res = 0
    for b in inbytes:
        res <<= 8 # Adjust bytes seen so far to make room for new byte
        res |= b # Mask in new byte
    return res
Again, this big-endian solution can trivially work for little-endian by looping over reversed(inbytes) instead of inbytes. In both cases inbytes[::-1] is an alternative to reversed(inbytes) (the former makes a new bytes in reversed order and iterates that, the latter iterates the existing bytes object in reverse, but unless it's a huge bytes object, enough to strain RAM if you copy it, the difference is pretty minimal).
The typical way to interpret an integer is to use struct.unpack, like so:
import struct
with open("stlbinaryfile.stl","rb") as fichier :
    head=fichier.read(80)
    nbtriangles=fichier.read(4)
    print(nbtriangles)
    # struct.unpack always returns a tuple, so take its first element
    nbtriangles=struct.unpack("<I", nbtriangles)[0]
    print(nbtriangles)
If you are allergic to import struct, then you can also compute it by hand:
def unsigned_int(s):
    result = 0
    for ch in s[::-1]:
        result *= 256
        result += ch
    return result
...
nbtriangles = unsigned_int(nbtriangles)
As to what you are seeing when you print b'\x90\x08\x00\x00': you are printing a bytes object, which is an array of integers in the range [0-255]. The first integer has the value 144 (decimal) or 90 (hexadecimal). When printing a bytes object, that value is represented by the string \x90. The 2nd has the value eight, represented by \x08. The 3rd and 4th (final) integers are both zero. They are represented by \x00.
If you would like to see a more familiar representation of the integers, try:
print(list(nbtriangles))
[144, 8, 0, 0]
To compute the 32-bit integers represented by these four 8-bit integers, you can use this formula:
total = byte0 + (byte1*256) + (byte2*256*256) + (byte3*256*256*256)
Or, in hex:
total = byte0 + (byte1*0x100) + (byte2*0x10000) + (byte3*0x1000000)
Which results in:
0x00000890
Perhaps you can see the similarities to decimal, where the string "1234" represents the number:
4 + 3*10 + 2*100 + 1*1000
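The formula can be checked directly against the four bytes from the question. This is a short sketch; the variable names follow the formula above, and int.from_bytes is used only to confirm the hand computation.

```python
raw = b"\x90\x08\x00\x00"
byte0, byte1, byte2, byte3 = raw  # iterating bytes yields ints in Python 3

total = byte0 + (byte1 * 256) + (byte2 * 256 * 256) + (byte3 * 256 * 256 * 256)

print(total)                                   # 2192
print(hex(total))                              # 0x890
print(total == int.from_bytes(raw, "little"))  # True
```

So the file in the question declares 2192 triangles.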

Convert a list of ints to a float

I am trying to convert a number stored as a list of ints to a float type. I got the number via a serial console and want to reassemble it back together into a float.
The way I would do it in C is something like this:
bit_data = ((int16_t)byte_array[0] << 8) | byte_array[1];
result = (float)bit_data;
What I tried to use in python is a much more simple conversion:
result = int_list[0]*256.0 + int_list[1]
However, this does not preserve the sign of the result, as the C code does.
What is the right way to do this in python?
UPDATE:
Python version is 2.7.3.
My byte array has a length of 2.
In the Python code, byte_array is a list of ints. I've renamed it to avoid misunderstanding. I cannot just use the float() function because it will not preserve the sign of the number.
I'm a bit confused by what data you have, and how it is represented in Python. As I understand it, you have received two unsigned bytes over a serial connection, which are now represented by a list of two python ints. This data represents a big endian 16-bit signed integer, which you want to extract and turn into a float. eg. [0xFF, 0xFE] -> -2 -> -2.0
import array, struct
two_unsigned_bytes = [255, 254] # represented by ints
byte_array = array.array("B", two_unsigned_bytes)
# change above to "b" if the ints represent signed bytes ie. in range -128 to 127
signed_16_bit_int, = struct.unpack(">h", byte_array)
float_result = float(signed_16_bit_int)
I think what you want is the struct module.
Here's a round trip snippet:
import struct
sampleValue = 42.13
somebytes = struct.pack('=f', sampleValue)
print(somebytes)
result = struct.unpack('=f', somebytes)
print(result)
result may be surprising to you. unpack returns a tuple. So to get to the value you can do
result[0]
or modify the result setting line to be
result = struct.unpack('=f', somebytes)[0]
I personally hate that, so use the following instead
result, = struct.unpack('=f', somebytes) # tuple unpacking on assignment
The second thing you'll notice is that the value has extra digits of noise. That's because 42.13 cannot be stored exactly in a 32-bit float, and Python's native float is a 64-bit double, so converting back exposes the rounding error.
(This is python3 btw, adjust for using old versions of python as appropriate)
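To see where the noise comes from, it may help to compare a round trip through a 32-bit float with one through a 64-bit double. This is a small sketch using the '=f' and '=d' format codes of the struct module; the variable names are made up for illustration.

```python
import struct

original = 42.13

# Round trip through 4 bytes (float32): some precision is lost.
as_float32, = struct.unpack("=f", struct.pack("=f", original))

# Round trip through 8 bytes (float64): Python's own float format.
as_float64, = struct.unpack("=d", struct.pack("=d", original))

print(as_float32)              # close to 42.13, with extra trailing digits
print(as_float64 == original)  # True
```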
I am not sure I really understand what you are doing, but I think you got 4 bytes from a stream and know them to represent a float32 value. The way you are handling this suggests big-endian byte order.
Python has the struct package (https://docs.python.org/2/library/struct.html) to handle bytestreams.
import struct
stream = struct.pack(">f", 2/3.)
len(stream) # 4
reconstructed_float, = struct.unpack(">f", stream) # unpack returns a 1-tuple
Okay, so I think int_list isn't really just a list of ints. The ints are constrained to 0-255 and represent bytes that can be built into a signed integer. You then want to turn that into a float. The trick is to set the sign of the first byte properly (subtract 256 when it is above 127, as two's complement requires) and then proceed much like you did.
float((byte_array[0]-256 if byte_array[0]>127 else byte_array[0])*256 + byte_array[1])
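The sign handling can also be done after combining the two bytes, which mirrors the C cast to int16_t from the question. This is a sketch assuming a big-endian, two's-complement 16-bit value; the function name int16_from_bytes is made up, and it uses only arithmetic, so it should behave the same on Python 2.7 and Python 3.

```python
def int16_from_bytes(high, low):
    """Combine two unsigned bytes (0..255) into a signed 16-bit integer."""
    value = (high << 8) | low
    # Values with the top bit set are negative in two's complement.
    if value >= 0x8000:
        value -= 0x10000
    return value


print(float(int16_from_bytes(0xFF, 0xFE)))  # -2.0
print(float(int16_from_bytes(0x00, 0x2A)))  # 42.0
```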

How to convert a binary file into a Long integer?

In python, long integers have an unlimited range. Is there a simple way to convert a binary file (e.g., a photo) into a single long integer?
Using the bitstring module it's just:
bitstring.BitString(filename='your_file').uint
If you prefer you can get a signed integer using the int property.
Internally this is using struct.unpack to convert chunks of bytes, which is more efficient than doing it per byte.
Here's one way to do it.
def file_to_number(f):
    number = 0
    for line in f:
        for char in line:
            number = ord(char) | (number << 8)
    return number
You might get a MemoryError eventually.
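On Python 3, where iterating a binary file yields ints rather than characters, the same result can be sketched with int.from_bytes. The name file_to_int is made up, the file is assumed to be opened in binary mode, and the whole file is read into memory at once, so the MemoryError caveat still applies. An in-memory io.BytesIO object stands in for a real file here.

```python
import io


def file_to_int(f):
    """Read a binary file object and return its contents as one big-endian int."""
    return int.from_bytes(f.read(), "big")


sample = io.BytesIO(b"\x01\x00")  # stand-in for open('your_file', 'rb')
print(file_to_int(sample))        # 256
```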
