Using 32-bit ints and operands - python

Is it possible to somehow override or overload the standard implementation of ints/numbers in Python so that they act like 32-bit ints? For example:
a: int
a = 4076863488
>>> a
-218103808
Or is it possible to somehow define a variable that can't change type, by doing something like x: int?
I want to do this because it's annoying to write ctypes.c_int32(n) on every bit operation and assignment, especially since Python's bitwise operators don't work on fixed 32-bit operands.
I know I'm basically trying to change the nature of the language, so maybe what I'm really asking is: what would you do if you had to do 32-bit work in Python?

Some options:
Use Cython. You can declare a native 32-bit int type there, and you even get the advantage that pure numerical code gets compiled to (very) fast C code.
Use a numpy array of a single element: np.zeros((1,), dtype=np.int32). Provided you only ever use in-place operations (+=, *=, etc.), this will work like a 32-bit int type. Do be aware that if you ever use a regular binary operator (e.g. myint + 3), you might be subject to type promotion or conversion, and the result will no longer be int32 (demonstrated in the NumPy section below).
Use ctypes.c_int32. This is built into Python, but supports no mathematical operations, so you have to wrap and unwrap values yourself (e.g. newval = c_int32(v1.value + v2.value)).
Use a library like fixedint (shameless plug), which provides fixed-integer classes that remain fixed size through operations rather than decaying to int. fixedint was specifically designed with fixed-width bitwise math in mind. In this case you would use fixedint.Int32.
Some less desirable options:
struct: Throws errors if your input is out of range. You can work around this with unpack('i', pack('I', val & 0xffffffff))[0], but that's really unwieldy (a runnable sketch follows after this list).
array: Throws errors if you try to store a value out of range. Harder to work around than struct.
Manual bitmashing. With an unsigned 32-bit int, this is just a matter of adding & 0xffffffff a lot, which is not too bad. But, Python doesn't have any built-in way to wrap a value to a signed 32-bit int, so you'll have to write your own int32 conversion function and wrap all your operations with it:
def to_int32(val):
    val &= (1 << 32) - 1
    if val & (1 << 31):
        val -= 1 << 32
    return val
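For completeness, here is a runnable sketch of the struct workaround mentioned in the list above (the helper name is my own):
from struct import pack, unpack

def to_int32_struct(val):
    # mask to 32 bits, pack as unsigned, reinterpret as signed
    return unpack('i', pack('I', val & 0xffffffff))[0]

print(to_int32_struct(4076863488))  # -218103808, as in the question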
Demonstrations of your options:
Cython
cpdef int munge(int val):
    cdef int x
    x = val * 32
    x += 0x7fffffff
    return x
Save as int_test.pyx and compile with cythonize -a -i int_test.pyx.
>>> import int_test
>>> int_test.munge(3)
-2147483553
NumPy
import numpy as np

def munge(val):
    x = val.copy()
    x *= 32
    x += 0x7fffffff
    return x

def to_int32(val):
    return np.array((val,), dtype=np.int32)

print(munge(to_int32(3)))
# prints [-2147483553]
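To see the promotion caveat mentioned in the options list, compare an in-place operation with a regular binary operation (a minimal sketch; exact promotion rules vary across numpy versions):
x = to_int32(3)
x += 3            # in-place: dtype stays int32
y = x + 3.0       # regular binary op: promotes to float64
print(x.dtype, y.dtype)
# prints int32 float64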
ctypes
from ctypes import c_int32

def munge(val):
    x = c_int32(val.value * 32)
    x = c_int32(x.value + 0x7fffffff)
    return x

print(munge(c_int32(3)))
# prints c_int(-2147483553)
fixedint
import fixedint

def munge(val):
    x = val * 32
    x += 0x7fffffff
    return x

print(munge(fixedint.Int32(3)))
# prints -2147483553

Related

How to work around the limitation of `len` function in Python?

The len builtin function in Python is limited to the system's integer size. So, in my case, it is limited to sys.maxsize, which is 2147483647. However, in light of Python 3's unlimited-precision integers, this limitation is frustrating. Are there any workarounds to overcome it? For example, I would like to get the length of this:
range(3, 100000000000000000000, 3)
But this:
len(range(3, 100000000000000000000, 3))
returns this error:
OverflowError: Python int too large to convert to C ssize_t
Unless you plan to have a plethora of lazily-iterable types with massive capacities, you could special-case range and do the math yourself:
def robustish_len(c):
    try:
        return len(c)
    except OverflowError:
        return (c.stop - c.start + c.step - 1) // c.step
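With the range from the question, the fallback math gives:
>>> robustish_len(range(3, 100000000000000000000, 3))
33333333333333333333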
Or, alternatively:
def robust_len(c):
    try:
        return len(c)
    except OverflowError:
        return float('inf')  # close enough :)
This seems like a bug in Python. At least for user-defined classes, you could replace len(c) with c.__len__(), which bypasses the size check:
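A small illustration of the difference, assuming a 64-bit CPython (note this does not help with range itself, whose __len__ is the same size-limited C slot):
import sys

class Huge:
    def __len__(self):
        return sys.maxsize + 1  # too big for len()

len(Huge())        # OverflowError: cannot fit 'int' into an index-sized integer
Huge().__len__()   # 9223372036854775808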

How do I pack bits from one byte array to another efficiently in python3?

I have a fairly large byte array in Python. In the simplest situation the byte array only contains 0 or 1 values (0x00, 0x01), and the array is always a multiple of 8 in length. How can I pack these "bits" into another byte array (it doesn't need to be mutable) so that source index zero goes to the MSB of the first output byte, etc.?
For example if src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
Desired output would be b'\x89\xe2\xff'.
I could do it with a for loop and bit shifting and or-ing and concatenation, but there surely is a faster/better built-in way to do this.
As a follow-up, I also might want to have the source byte array contain values from the set 0-3 and pack these 4 at a time into the output array. Is there a way of doing that?
In general is there a way of interpreting elements of a list as true or false and packing them 8 at a time into a byte array?
As ridiculous as it may sound, the fastest solution using builtins may be to build a string and pass it to int, much as the fastest way to count 1-bits in an int is bin(n).count('1'). And it's dead simple, too:
def unbitify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 2)
    return n.to_bytes(len(src)//8, 'big')
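With the example src from the question:
>>> src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
>>> unbitify_byte(src)
b'\x89\xe2\xff'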
Equivalent (but marginally more complex) code using gmpy2 instead of native Python int is a bit faster.
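For reference, a sketch of what that gmpy2 variant might look like (gmpy2 is assumed installed; the final conversion goes back through int, since that's the straightforward route to big-endian bytes):
import gmpy2

def unbitify_byte_gmpy(src):
    s = ''.join(map(str, src))
    n = gmpy2.mpz(s, 2)  # parse the bit string with GMP
    return int(n).to_bytes(len(src)//8, 'big')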
And you can extend it to 2-bit values pretty easily:
def unhalfnybblify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 4)
    return n.to_bytes(len(src)//4, 'big')
If you want something more flexible, but possibly slower, here's a simple solution using ctypes.
If you know C, you can probably see a struct of 8 single-bit bit-fields would come in handy here. And you can write the equivalent struct type in Python like this:
import ctypes

class Bits(ctypes.Structure):
    _fields_ = [(f'bit{8-i}', ctypes.c_uint, 1) for i in range(8)]
And you can construct one of them from 8 ints that are all 0 or 1:
bits = Bits(*src[:8])
And you can convert that to a single int by using an ugly cast or a simple union:
class UBits(ctypes.Union):
    _fields_ = [('bits', Bits), ('i', ctypes.c_uint8)]
i = UBits(Bits(*src[:8])).i
So now it's just a matter of chunking src into groups of 8 in big-endian order:
chunks = (src[i:i+8][::-1] for i in range(0, len(src), 8))
dst = bytearray(UBits(Bits(*chunk)).i for chunk in chunks)
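With the sample src from the question, dst comes out as expected on a typical little-endian build (bit-field layout is compiler- and platform-dependent, so treat this as a sketch):
print(dst)
# bytearray(b'\x89\xe2\xff')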
And it should be pretty obvious how to extend this to four 2-bit fields, or two 4-bit fields, or even two 3-bit fields and a 2-bit field, per byte.
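As a hedged sketch of the four-2-bit-field variant (the names are my own invention):
class Crumbs(ctypes.Structure):
    # four 2-bit fields per byte; the first-declared field lands in the
    # low-order bits on typical little-endian compilers
    _fields_ = [(f'crumb{4-i}', ctypes.c_uint, 2) for i in range(4)]

class UCrumbs(ctypes.Union):
    _fields_ = [('crumbs', Crumbs), ('i', ctypes.c_uint8)]

# pack values 0-3, four at a time, reversing each chunk as before
chunks = (src[i:i+4][::-1] for i in range(0, len(src), 4))
dst = bytearray(UCrumbs(Crumbs(*chunk)).i for chunk in chunks)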
However, despite looking like low-level C code, it's probably slower. Still, it might be worth testing to see if it's fast enough for your uses.
A custom C extension can probably do better. And there are a number of bit-array-type modules on PyPI to try out. But if you want to go down that road, numpy is the obvious answer. You can't get any simpler than this:
np.packbits(src)
(A bytearray works just fine as an "array-like".)
It's also hard to beat for speed.
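Applied to the sample input from the question:
>>> import numpy as np
>>> src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
>>> np.packbits(src).tobytes()
b'\x89\xe2\xff'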
For comparison, here's some measurements:
60ns/byte + 0.3µs: np.packbits on an array instead of a bytearray
60ns/byte + 1.9µs: np.packbits
440ns/byte + 3.2µs: for and bit-twiddling in PyPy instead of CPython
570ns/byte + 3.8µs: int(…, 2).to_bytes(…) in PyPy instead of CPython
610ns/byte + 9.1µs: bitarray
800ns/byte + 2.9µs: gmpy.mpz(…)…
1.0µs/byte + 2.8µs: int(…, 2).to_bytes(…)
2.9µs/byte + 0.2µs: (UBits(Bits(*chunk)) …)
16µs/byte + 0.9µs: for and bit-twiddling
Using numpy, with test code and comments:
#!/usr/bin/env python3
import numpy as np

def pack_bits(a):
    # big-endian - use '<u8' if you want little-endian
    # 0000000A0000000B0000000C0000000D0000000E0000000F0000000G0000000H
    b = np.copy(a.view('>u8'))
    # 0000000A000000AB000000BC000000CD000000DE000000EF000000FG000000GH
    b |= b >> 7
    # 0000000A000000AB00000ABC0000ABCD0000BCDE0000CDEF0000DEFG0000EFGH
    b |= b >> 14
    # 0000000A000000AB00000ABC0000ABCD000ABCDE00ABCDEF0ABCDEFGABCDEFGH
    b |= b >> 28
    return np.array(b, dtype='u1')

def main():
    a = []
    for i in range(256):
        # build 8-bit lists without numpy, then convert
        a.append(np.array([int(b) for b in bin(256 + i)[2+1:]], dtype='u1'))
    a = np.array(a)
    print(a)
    b = pack_bits(a)
    print(b)

if __name__ == '__main__':
    main()
Similar code exists for other kinds of deinterleaving, but since the number of bits between inputs is less than the number of bytes in a word, we can avoid the masking here (note that the 0ABCDEFG does not overlap the ABCDEFGH).

Find maximum integer type at runtime in numpy

I would like to know if there is a way to find out, at runtime, the largest integer type (or unsigned integer, or float, or complex; any "fixed size" type, to have something specific) supported by numpy. That is, let's assume that I know (from documentation) that the largest unsigned integer type in the current version of numpy is np.uint64, and I have a line of code such as:
y = np.uint64(x)
I would like my code to use whatever is the largest, let's say, unsigned integer type available in the version of numpy that my code uses. That is, I would be interested in replacing the above hardcoded type with something like this:
y = np.largest_uint_type(x)
Is there such a method?
You can use np.sctypes:
>>> def largest_of_kind(kind):
...     return max(np.sctypes[kind], key=lambda x: np.dtype(x).itemsize)
...
>>> largest_of_kind('int')
<class 'numpy.int64'>
>>> largest_of_kind('uint')
<class 'numpy.uint64'>
>>> largest_of_kind('float')
<class 'numpy.float128'>
>>> largest_of_kind('complex')
<class 'numpy.complex256'>
While I do like @PaulPanzer's solution, I also found that numpy defines a function maximum_sctype(), not documented in numpy's standard docs. This function fundamentally does the same thing as @PaulPanzer's solution (plus some edge-case analysis). From the code it is clear that sctype types are sorted in increasing size order. Using this function, what I need can be done as follows:
y = np.maximum_sctype(np.float)(x) # currently np.float128 on OSX
y = np.maximum_sctype(np.uint8)(x) # currently np.uint64
etc.
Not so elegant, but using the prior knowledge that np.uint sizes are always powers of 2, you can do something like this:
for i in range(4, 100):
    try:
        eval('np.uint' + str(2**i) + '(0)')
    except AttributeError:
        c = i - 1
        break
answer = 'np.uint' + str(2**c)
>>> answer
'np.uint64'
and you can use it as
y = eval(answer + '(' + str(x) + ')')
or, alternatively, without the power-of-2 assumption and with no eval (checking all sizes up to some bound, here 1000):
for i in range(1000):
    if hasattr(np, 'uint' + str(i)):
        x = 'uint' + str(i)
>>> x
'uint64'
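The found name can then be turned into a type without eval via getattr (a small sketch):
y = getattr(np, x)(12345)  # same as np.uint64(12345) here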

Using large index in python (numpy or lists)

I frequently need to enter large integers for indexing and creating numpy arrays, such as 3500000 or 250000. Normally I'd enter these using scientific notation, 3.5e6 or .25e6 or such. This is quicker, and much less likely to have errors.
Unfortunately, python expects integer datatypes for indexing. The obvious solution is to convert datatypes. So [5e5:1e6] becomes [int(5e5):int(1e6)], but this decreases readability and is somewhat longer to type. Not to mention, it's easy to forget what datatype an index is until an indexing operation fails on a list or numpy.ndarray.
Is there a way to have numpy or python interpret large floats as integers, or is there an easy way to create large integers in python?
In a comment you considered having e5 = 10**5 for use as in 35*e5, lamenting it doesn't support 3.5*e6. Here's a hack that does:
class E:
    def __init__(self, e):
        self.val = 10**e
    def __rmul__(self, x):
        return int(x * self.val)
Demo:
>>> e6 = E(6)
>>> 3.5*e6
3500000
Though due to floats being lossy, this can lead to slight inaccuracies, for example:
>>> 0.1251*e6
125099
Here's a better hack, building the literal '0.1251e6' and evaluating that:
class E:
    def __init__(self, e):
        self.e = e
    def __rmul__(self, x):
        return int(float('%se%d' % (x, self.e)))
Demo:
>>> e6 = E(6)
>>> 0.1251*e6
125100
If you're worried about mistakes in the number of zeros, try underscores.
>>> 3_500_000
3500000
My cheap solution is to create a helper function in the proper scope:
def e(coeff, exponent):
    return int(coeff * 10 ** exponent)

np_array[e(3.5, 6)]  # use like this
But this cheap answer may also suffer from round-off error. Creating an alias for int in the proper scope is a simpler, cleaner solution:
e = int  # in the proper scope
I can propose the notation [5*10**5:1*10**6], but it's not as clear as 5e5 and 1e6, and it's even worse in a case like 3.5e6 = 35*10**5.
You can add a shorter name for int(), such as I:
I = int
x = I(3.5e6)
print(x)
# 3500000
This still allows use of int() normally.
This should fix the problems with indexing lists and arrays with floats:
slice_orig = slice
def slice(*args):
    return slice_orig(*[int(i) for i in args])
slice.__doc__ = slice_orig.__doc__ + """
WARNING: overridden to convert (start, stop, step) to integers"""
It doesn't allow using large numbers with other numpy functions requiring an int type.
EDIT: This has to be used explicitly, such as list[slice(1e5)], so it's not as useful as I expected.

long double returns and ctypes

I have a C function which returns a long double. I'd like to call this function from Python using ctypes, and it mostly works. Setting so.func.restype = c_longdouble does the trick, except that Python's float type is a C double, so if the returned value is larger than a double (but well within the bounds of a long double), Python still gets inf as the return value. I'm on a 64-bit processor and sizeof(long double) is 16.
Any ideas on getting around this (e.g. using the decimal class or numpy) without modifying the C code?
I'm not sure you can do it without modifying the C code. ctypes seems to have really bad support for long doubles - you can't manipulate them like numbers at all, all you can do is convert them back and forth between the native float Python type.
You can't even use a byte array as the return value instead of a c_longdouble, because of the ABI - floating-point values aren't returned in the %eax register or on the stack like normal return values, they're passed through the hardware-specific floating-point registers.
If you have a function return a subclass of c_longdouble, it will return the ctypes wrapped field object rather than converting to a python float. You can then extract the bytes from this (with memcpy into a c_char array, for example) or pass the object to another C function for further processing. The snprintf function can format it into a string for printing or conversion into a high-precision python numeric type.
import ctypes
libc = ctypes.cdll['libc.so.6']
libm = ctypes.cdll['libm.so.6']

class my_longdouble(ctypes.c_longdouble):
    def __str__(self):
        size = 100
        buf = (ctypes.c_char * size)()
        # bytes format string for Python 3; %Le formats a long double
        libc.snprintf(buf, size, b'%.35Le', self)
        return buf.value.decode()

powl = libm.powl
powl.restype = my_longdouble
powl.argtypes = [ctypes.c_longdouble, ctypes.c_longdouble]

for i in range(1020, 1030):
    res = powl(2, i)
    print('2**' + str(i), '=', str(res))
Output:
2**1020 = 1.12355820928894744233081574424314046e+307
2**1021 = 2.24711641857789488466163148848628092e+307
2**1022 = 4.49423283715578976932326297697256183e+307
2**1023 = 8.98846567431157953864652595394512367e+307
2**1024 = 1.79769313486231590772930519078902473e+308
2**1025 = 3.59538626972463181545861038157804947e+308
2**1026 = 7.19077253944926363091722076315609893e+308
2**1027 = 1.43815450788985272618344415263121979e+309
2**1028 = 2.87630901577970545236688830526243957e+309
2**1029 = 5.75261803155941090473377661052487915e+309
(Note that my estimate of 35 digits of precision turned out to be excessively optimistic for long double calculations on Intel processors, which only have 64 bits of mantissa. You should use %a rather than %e/f/g if you intend to convert to a format that is not based on decimal representation.)
If you need high-precision floating point, have a look at GMPY.
GMPY is a C-coded Python extension module that wraps the GMP library to provide to Python code fast multiprecision arithmetic (integer, rational, and float), random number generation, advanced number-theoretical functions, and more.
GMP contains high-level floating-point arithmetic functions (mpf). This is the GMP function category to use if the C type `double' doesn't give enough precision for an application. There are about 65 functions in this category.
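As a hedged sketch using the modern gmpy2 package (the successor to GMPY), reproducing one of the values from above at 64 bits of precision:
import gmpy2

gmpy2.get_context().precision = 64  # bits of mantissa, like an x87 long double
print(gmpy2.mpfr(2) ** 1029)
# ~5.752618031559410905e+309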
