Using large index in python (numpy or lists) - python

I frequently need to enter large integers for indexing and creating numpy arrays, such as 3500000 or 250000. Normally I'd enter these using scientific notation, 3.5e6 or .25e6 or such. This is quicker, and much less likely to have errors.
Unfortunately, python expects integer datatypes for indexing. The obvious solution is to convert datatypes. So [5e5:1e6] becomes [int(5e5):int(1e6)], but this decreases readability and is somewhat longer to type. Not to mention, it's easy to forget what datatype an index is until an indexing operation fails on a list or numpy.ndarray.
Is there a way to have numpy or python interpret large floats as integers, or is there an easy way to create large integers in python?

In a comment you considered having e5 = 10**5 for use as in 35*e5, lamenting it doesn't support 3.5*e6. Here's a hack that does:
class E:
    def __init__(self, e):
        self.val = 10**e
    def __rmul__(self, x):
        return int(x * self.val)
Demo:
>>> e6 = E(6)
>>> 3.5*e6
3500000
Because floats are lossy, though, this can lead to slight inaccuracies, for example:
>>> 0.1251*e6
125099
Here's a better hack, building the literal '0.1251e6' and evaluating that:
class E:
    def __init__(self, e):
        self.e = e
    def __rmul__(self, x):
        return int(float('%se%d' % (x, self.e)))
Demo:
>>> e6 = E(6)
>>> 0.1251*e6
125100
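If typing the number as a string is acceptable, the float round-trip can be avoided altogether. Here is a minimal sketch using the standard-library decimal module; the helper name sci is made up for illustration and is not part of the answer above:

```python
from decimal import Decimal

def sci(text):
    """Convert a scientific-notation string such as '3.5e6' to an exact int.

    Hypothetical helper: rejects values that are not whole numbers.
    """
    value = Decimal(text)
    if value != value.to_integral_value():
        raise ValueError(f"{text!r} is not a whole number")
    return int(value)

print(sci('3.5e6'))     # 3500000
print(sci('0.1251e6'))  # 125100
```

The trade-off is the extra quotes, so it is arguably no shorter than int(3.5e6); the gain is only exactness.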

If you're worried about mistakes in the number of zeros, try underscores.
>>> 3_500_000
3500000
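Underscore literals are plain ints (Python 3.6 and later), so they can go directly wherever an index is expected, and int() accepts the same grouping in strings. A small sketch, with a range object standing in for a large array:

```python
data = range(4_000_000)            # stand-in for a large list or array

chunk = data[500_000:1_000_000]    # no int() conversion needed
print(len(chunk))                  # 500000
print(data[3_500_000])             # 3500000
print(int('3_500_000'))            # 3500000
print(35 * 10**5 == 3_500_000)     # True
```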

My cheap solution is to create a helper function in the proper scope.
def e(coeff, exponent):
    return int(coeff * 10 ** exponent)
np_array[e(3.5, 6)]  # use like this
But this cheap answer may cause round-off error.
Creating an alias for int in the proper scope is a simple and clean solution.
e = int  # in the proper scope

I can propose a notation like [5*10**5:1*10**6], but it's not as clear as 5e5 and 1e6. It's even worse for 3.5e6, which becomes 35*10**5.

You can add a shorter name to int() such as I
I = int
x = I(3.5e6)
print(x)
# 3500000
This still allows use of int() normally

This should fix the problems with indexing lists and arrays with floats:
slice_orig = slice
def slice(*args):
    # leave None (an omitted bound) untouched; int(None) would raise TypeError
    return slice_orig(*[None if i is None else int(i) for i in args])
slice.__doc__ = slice_orig.__doc__ + """
WARNING: overridden to convert (start, stop, step) to integers"""
It doesn't allow using large numbers with other numpy functions requiring an int type.
EDIT: This has to be used explicitly, such as list[slice(1e5)], so it's not as useful as I expected.
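Since the built-in [a:b] syntax never goes through the slice name, another option is to wrap the sequence and convert inside __getitem__. The following is only a sketch; FloatIndexView is a hypothetical class, and it deliberately rejects floats that are not whole numbers instead of truncating them:

```python
class FloatIndexView:
    """Hypothetical wrapper that accepts whole-number floats such as 5e5 as indices."""

    def __init__(self, seq):
        self._seq = seq

    @staticmethod
    def _coerce(value):
        # None means "bound omitted"; ints pass through unchanged
        if value is None or isinstance(value, int):
            return value
        as_int = int(value)
        if as_int != value:
            raise TypeError(f"index {value!r} is not a whole number")
        return as_int

    def __getitem__(self, key):
        if isinstance(key, slice):
            key = slice(self._coerce(key.start),
                        self._coerce(key.stop),
                        self._coerce(key.step))
        else:
            key = self._coerce(key)
        return self._seq[key]

data = range(2_000_000)          # stand-in for a large list or array
view = FloatIndexView(data)
chunk = view[5e5:1e6]
print(len(chunk), chunk[0])      # 500000 500000
```

Like the slice override, this only helps with indexing; other functions that require an int still need an explicit conversion.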

Related

Using 32-bit ints and operands

Is it possible to somehow override or overload the standard implementation of ints/numbers in Python so that it acts like a 32-bit int?
a: int
a = 4076863488
>>> -218103808
Or is it possible to somehow define a variable that can't change type? Doing something like: x: int?
I want to do this because it's annoying to write ctypes.c_int32(n) on every bit operation and assignment, especially since Python's bitwise operators do not work on 32-bit operands.
I know I'm basically trying to change the nature of the language. So maybe I'm asking what you would do if you had to do 32-bit stuff in python.
Some options:
Use Cython. You can declare a native 32-bit int type there, and you even get the advantage that pure numerical code gets compiled to (very) fast C code.
Use a numpy array of a single element: np.zeros((1,), dtype=np.int32). Provided you only ever use in-place operations (+=, *=, etc.), this will work like a 32-bit int type. Do be aware that if you ever use a regular binary operator (e.g. myint + 3), you might be subjected to type promotion or conversion, and the result will no longer be int32.
Use ctypes.c_int32. This comes built-in to Python, but supports no mathematical operations so you have to wrap and unwrap yourself (e.g. newval = c_int32(v1.value + v2.value)).
Use a library like fixedint (shameless plug), which provides fixed-integer classes that remain fixed size through operations rather than decaying to int. fixedint was specifically designed with fixed-width bitwise math in mind. In this case you would use fixedint.Int32.
Some less desirable options:
struct: Throws errors if your input is out of range. You can work around this with unpack('i', pack('I', val & 0xffffffff))[0], but that's really unwieldy.
array: Throws errors if you try to store a value out of range. Harder to work around than struct.
Manual bitmashing. With an unsigned 32-bit int, this is just a matter of adding & 0xffffffff a lot, which is not too bad. But, Python doesn't have any built-in way to wrap a value to a signed 32-bit int, so you'll have to write your own int32 conversion function and wrap all your operations with it:
def to_int32(val):
    val &= ((1<<32)-1)
    if val & (1<<31): val -= (1<<32)
    return val
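To make the manual approach concrete, here is a small self-contained sketch that checks the masking function against the struct workaround mentioned above. The name to_int32_struct is an illustrative addition, and the explicit '<' prefix is an assumption made here to force standard 4-byte sizes:

```python
import struct

def to_int32(val):
    """Wrap an arbitrary Python int to the signed 32-bit range."""
    val &= 0xFFFFFFFF
    return val - (1 << 32) if val & (1 << 31) else val

def to_int32_struct(val):
    """Same result via struct: pack as unsigned, reinterpret as signed."""
    return struct.unpack('<i', struct.pack('<I', val & 0xFFFFFFFF))[0]

print(to_int32(4076863488))           # -218103808
print(to_int32(3 * 32 + 0x7fffffff))  # -2147483553
print(to_int32(-1), to_int32(1 << 31))  # -1 -2147483648
```

Every arithmetic or bitwise result has to be passed through to_int32 again, which is the tedium the other options try to remove.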
Demonstrations of your options:
Cython
cpdef int munge(int val):
    cdef int x
    x = val * 32
    x += 0x7fffffff
    return x
Save as int_test.pyx and compile with cythonize -a -i int_test.pyx.
>>> import int_test
>>> int_test.munge(3)
-2147483553
NumPy
import numpy as np
def munge(val):
    x = val.copy()
    x *= 32
    x += 0x7fffffff
    return x
def to_int32(val):
    return np.array((val,), dtype=np.int32)
print(munge(to_int32(3)))
# prints [-2147483553]
ctypes
from ctypes import c_int32
def munge(val):
    x = c_int32(val.value * 32)
    x = c_int32(x.value + 0x7fffffff)
    return x
print(munge(c_int32(3)))
# prints c_int(-2147483553)
fixedint
import fixedint
def munge(val):
    x = val * 32
    x += 0x7fffffff
    return x
print(munge(fixedint.Int32(3)))
# prints -2147483553

How do I pack bits from one byte array to another efficiently in python3?

I have a fairly large byte array in python. In the simplest situation the byte array only contains 0 or 1 values (0x00, 0x01), also the array is always a multiple of 8 in length. How can I pack these "bits" into another byte array (it doesn't need to be mutable) so the source index zero goes to the MSB of the first output byte etc.
For example if src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
Desired output would be b'\x89\xe2\xff'.
I could do it with a for loop and bit shifting and or-ing and concatenation, but there surely is a faster/better built-in way to do this.
In a follow up question, I also might want to have the source byte array contain values from the set 0-3 and pack these 4 at a time into the output array. Is there a way of doing that?
In general is there a way of interpreting elements of a list as true or false and packing them 8 at a time into a byte array?
As ridiculous as it may sound, the fastest solution using builtins may be to build a string and pass it to int, much as the fastest way to count 1-bits in an int is bin(n).count('1'). And it's dead simple, too:
def unbitify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 2)
    return n.to_bytes(len(src)//8, 'big')
Equivalent (but marginally more complex) code using gmpy2 instead of native Python int is a bit faster.
And you can extend it to 2-bit values pretty easily:
def unhalfnybblify_byte(src):
    s = ''.join(map(str, src))
    n = int(s, 4)
    return n.to_bytes(len(src)//4, 'big')
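The same string trick also covers the last part of the question, treating arbitrary list elements as true or false. This is a sketch only: pack_truthy is a made-up name, and padding the final byte with zero bits on the right is a choice made here, not something the question specifies:

```python
def pack_truthy(items):
    """Pack the truthiness of each item into bytes, first item in the MSB."""
    bits = ''.join('1' if item else '0' for item in items)
    if not bits:
        return b''
    # pad on the right so the bit count is a whole number of bytes
    bits = bits.ljust(-(-len(bits) // 8) * 8, '0')
    return int(bits, 2).to_bytes(len(bits) // 8, 'big')

print(pack_truthy([True, None, '', 'x', 1, 0, 0, 1]))  # b'\x99'
print(pack_truthy([1, 1, 1]))                          # b'\xe0'
```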
If you want something more flexible, but possibly slower, here's a simple solution using ctypes.
If you know C, you can probably see a struct of 8 single-bit bit-fields would come in handy here. And you can write the equivalent struct type in Python like this:
import ctypes
class Bits(ctypes.Structure):
    _fields_ = [(f'bit{8-i}', ctypes.c_uint, 1) for i in range(8)]
And you can construct one of them from 8 ints that are all 0 or 1:
bits = Bits(*src[:8])
And you can convert that to a single int by using an ugly cast or a simple union:
class UBits(ctypes.Union):
    _fields_ = [('bits', Bits), ('i', ctypes.c_uint8)]
i = UBits(Bits(*src[:8])).i
So now it's just a matter of chunking src into groups of 8 in big-endian order:
chunks = (src[i:i+8][::-1] for i in range(0, len(src), 8))
dst = bytearray(UBits(Bits(*chunk)).i for chunk in chunks)
And it should be pretty obvious how to extend this to four 2-bit fields, or two 4-bit fields, or even two 3-bit fields and a 2-bit field, per byte.
However, despite looking like low-level C code, it's probably slower. Still, it might be worth testing to see if it's fast enough for your uses.
A custom C extension can probably do better. And there are a number of bit-array-type modules on PyPI to try out. But if you want to go down that road, numpy is the obvious answer. You can't get any simpler than this:
np.packbits(src)
(A bytearray works just fine as an "array-like".)
It's also hard to beat for speed.
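np.packbits only handles single-bit values, so the 2-bit follow-up needs a little more work. One possible numpy sketch follows; pack_2bit is an illustrative name, and it assumes the input length is a multiple of 4 with every value in the range 0-3:

```python
import numpy as np

def pack_2bit(src):
    """Pack values 0-3 four per byte, first value in the top two bits."""
    vals = np.frombuffer(bytes(src), dtype=np.uint8).reshape(-1, 4)
    shifts = np.array([6, 4, 2, 0], dtype=np.uint8)
    # shift each column into place, then OR the four columns together
    return np.bitwise_or.reduce(vals << shifts, axis=1).astype(np.uint8).tobytes()

src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
print(np.packbits(np.frombuffer(bytes(src), dtype=np.uint8)).tobytes())  # b'\x89\xe2\xff'
print(pack_2bit(bytearray([3, 0, 1, 2, 0, 0, 0, 1])))                    # b'\xc6\x01'
```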
For comparison, here are some measurements:
60ns/byte + 0.3µs: np.packbits on an array instead of a bytearray
60ns/byte + 1.9µs: np.packbits
440ns/byte + 3.2µs: for and bit-twiddling in PyPy instead of CPython
570ns/byte + 3.8µs: int(…, 2).to_bytes(…) in PyPy instead of CPython
610ns/byte + 9.1µs: bitarray
800ns/byte + 2.9µs: gmpy.mpz(…)…
1.0µs/byte + 2.8µs: int(…, 2).to_bytes(…)
2.9µs/byte + 0.2µs: (UBits(Bits(*chunk)) …)
16.µs/byte + 0.9µs: for and bit-twiddling
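The "for and bit-twiddling" entries refer to a plain loop that the answer does not show. One plausible form is sketched below; this is an assumption about what such a loop looks like, not the code that was actually timed:

```python
def pack_bits_loop(src):
    """Pack 0/1 bytes eight per output byte, source index 0 in the MSB."""
    out = bytearray(len(src) // 8)
    for i, bit in enumerate(src):
        if bit:
            # i >> 3 selects the output byte, i & 7 the bit within it
            out[i >> 3] |= 0x80 >> (i & 7)
    return bytes(out)

src = bytearray([1,0,0,0,1,0,0,1, 1,1,1,0,0,0,1,0, 1,1,1,1,1,1,1,1])
print(pack_bits_loop(src))  # b'\x89\xe2\xff'
```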
Using numpy, with test code and comments:
#!/usr/bin/env python3
import numpy as np
def pack_bits(a):
    # big-endian - use '<u8' if you want little-endian
    #0000000A0000000B0000000C0000000D0000000E0000000F0000000G0000000H
    b = np.copy(a.view('>u8'))
    #0000000A000000AB000000BC000000CD000000DE000000EF000000FG000000GH
    b |= b >> 7
    #0000000A000000AB00000ABC0000ABCD0000BCDE0000CDEF0000DEFG0000EFGH
    b |= b >> 14
    #0000000A000000AB00000ABC0000ABCD000ABCDE00ABCDEF0ABCDEFGABCDEFGH
    b |= b >> 28
    return np.array(b, dtype='u1')
def main():
    a = []
    for i in range(256):
        # build 8-bit lists without numpy, then convert
        a.append(np.array([int(b) for b in bin(256 + i)[2+1:]], dtype='u1'))
    a = np.array(a)
    print(a)
    b = pack_bits(a)
    print(b)
if __name__ == '__main__':
    main()
Similar code exists for other deinterleaving, but since the number of bits between inputs is less than the number of bytes in a word, we can avoid the masking here (note that the 0ABCDEFG does not overlap the ABCDEFGH).
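The shift-and-OR steps can be followed on a single 64-bit word with plain Python ints, without numpy. This is just an illustration of the idea (pack_word is a made-up name), checked against all 256 byte values:

```python
def pack_word(word):
    """Collapse eight 0/1 bytes of a big-endian 64-bit word into one byte."""
    word |= word >> 7    # neighbouring bits pair up: ...FG, ...GH
    word |= word >> 14   # pairs merge into groups of four
    word |= word >> 28   # the two groups of four meet in the low byte
    return word & 0xFF   # keep only the fully assembled byte

word = int.from_bytes(bytes([1, 0, 0, 0, 1, 0, 0, 1]), 'big')
print(hex(pack_word(word)))  # 0x89
```

Here the final mask plays the role of the dtype='u1' conversion in the numpy version, which keeps only the low byte of each word.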

Fast way of converting parts of a bit numpy array to base 10 [duplicate]

Say I am given a numpy integer array of the form
a=np.array([0,0,0,1,1,0,1,0,1,0,1])
Now suppose I want to extract part of that array from positions i1:i2 and convert it to a base 10 representation. For instance, take i1=4 and i2=8. Then:
base_2=a[i1:i2] # base_2 = np.array([1,0,1,0])
And I would expect the function to return 10=2+8.
My question is the following: what is a fast way to achieve this in Python?
Consider a function with the following signature:
def base_2_to_10_array(my_array,i1,i2):
    return """your awesome answer here"""
One way (don't know if it is the fastest)
>>> a=np.array([0,0,0,1,1,0,1,0,1,0,1])
>>> int(''.join(map(str, a[4:8])), 2)
10
Another way, which I believe to be faster (benchmark), is:
def base_2_to_10_array(arr, i1, i2):
    res = 0
    # walk the bits most significant first, so [1,0,1,0] gives 10 as the question expects
    for bit in arr[i1:i2]:
        res = (res << 1) ^ bit
    return res
This is probably faster because it is entirely binary operations (<< and ^), which are both fast (^ is faster because one of the operands is small, being 0 or 1).
percusse's answer is probably slower because of either mapping with str, or casting to int (might not be as optimized for binary).
type_none's is probably slower due to repeated calling of a lambda, multiplication instead of shifts and adding instead of oring.
Example benchmark results:
Size: 10
percusse: 0.016117284998472314
type_none: 0.004335935998824425
pycoder_3rd_fastest: 0.0028656079957727343
pycoder_2nd_fastest: 0.0033370210003340617
pycoder_fastest: 0.0031539250048808753
Size: 100
percusse: 0.13562769599957392
type_none: 0.04904397700011032
pycoder_3rd_fastest: 0.016703221001080237
pycoder_2nd_fastest: 0.021887271002924535
pycoder_fastest: 0.019885091001924593
Size: 1000
percusse: 1.358273936995829
type_none: 0.7615448830038076
pycoder_3rd_fastest: 0.18778558799385792
pycoder_2nd_fastest: 0.20695334099582396
pycoder_fastest: 0.18905453699699137
Size: 10000
percusse: 14.638380388998485
type_none: 7.554422806002549
pycoder_3rd_fastest: 5.3742733830004
pycoder_2nd_fastest: 2.2020759820006788
pycoder_fastest: 1.9534191700004158
Other attempts, one faster on shorter inputs, can be found in the benchmark link.
I used reduce here. Unlike other answers this doesn't need conversion to string.
from functools import reduce # no need for import in python2
import numpy as np
def arrayToInt(l):
    return reduce(lambda x,y: (x<<1) + y, l)
a=np.array([0,0,0,1,1,0,1,0,1,0,1])
number = arrayToInt(a[4:8])
print(number)
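One caveat with reduce when no initial value is given: it raises TypeError on an empty slice. Here is a sketch that passes 0 as the initial value, with bits_to_int as an illustrative name and a plain list standing in for the array:

```python
from functools import reduce

def bits_to_int(bits):
    """Interpret a sequence of 0/1 values as a binary number, MSB first."""
    return reduce(lambda acc, bit: (acc << 1) | int(bit), bits, 0)

a = [0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1]
print(bits_to_int(a[4:8]))   # 10
print(bits_to_int(a[4:4]))   # 0 (empty slice)
```

The int(bit) call keeps the accumulator a Python int even when the elements are numpy integers, so long inputs cannot overflow a fixed-width type.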

Slick way to reverse the (binary) digits of a number in Python?

I am looking for a slick function that reverses the digits of the binary representation of a number.
If f were such a function I would have
int(reversed(s),2) == f(int(s,2)) whenever s is a string of zeros and ones starting with 1.
Right now I am using lambda x: int(''.join(reversed(bin(x)[2:])),2)
which is ok as far as conciseness is concerned, but it seems like a pretty roundabout way of doing this.
I was wondering if there was a nicer (perhaps faster) way with bitwise operators and what not.
How about
int('{0:b}'.format(n)[::-1], 2)
or
int(bin(n)[:1:-1], 2)
The second method seems to be the faster of the two, however both are much faster than your current method:
import timeit
print(timeit.timeit("int('{0:b}'.format(n)[::-1], 2)", 'n = 123456'))
print(timeit.timeit("int(bin(n)[:1:-1], 2)", 'n = 123456'))
print(timeit.timeit("int(''.join(reversed(bin(n)[2:])),2)", 'n = 123456'))
1.13251614571
0.710681915283
2.23476600647
You could do it with shift operators like this:
def revbits(x):
    rev = 0
    while x:
        rev <<= 1
        rev += x & 1
        x >>= 1
    return rev
It doesn't seem any faster than your method, though (in fact, slightly slower for me).
Here is my suggestion:
In [83]: int(''.join(bin(x)[:1:-1]), 2)
Out[83]: 9987
Same method, slightly simplified.
I would argue your current method is perfectly fine, but you can lose the list() call, as str.join() will accept any iterable:
def binary_reverse(num):
    return int(''.join(reversed(bin(num)[2:])), 2)
I would also advise against using lambda for anything but the simplest of functions, where it will only be used once and makes the surrounding code clearer by being inlined.
The reason I feel this is fine is that it describes what you want to do: take the binary representation of a number, reverse it, then get a number again. That makes this code very readable, and that should be a priority.
There is an entire half chapter of Hacker's Delight devoted to this issue (Section 7-1: Reversing Bits and Bytes) using binary operations, bit shifts, and other goodies. Seems like these are all possible in Python and it should be much quicker than the binary-to-string-and-reverse methods.
The book isn't available publicly but I found this blog post that discusses some of it. The method shown in the blog post follows the following quote from the book:
Bit reversal can be done quite efficiently by interchanging adjacent
single bits, then interchanging adjacent 2-bit fields, and so on, as
shown below. These five assignment statements can be executed in any
order.
http://blog.sacaluta.com/2011/02/hackers-delight-reversing-bits.html
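For reference, the swap-adjacent-fields scheme described in the quote can be written in Python roughly as follows. This is a sketch for a fixed 32-bit width, not a transcription of the book or the blog post; the function names are illustrative, and the variable-width wrapper is an addition to match what the question asks for:

```python
def reverse_bits32(x):
    """Reverse the bits of x within a fixed 32-bit width."""
    x &= 0xFFFFFFFF
    x = ((x & 0x55555555) << 1) | ((x >> 1) & 0x55555555)   # swap single bits
    x = ((x & 0x33333333) << 2) | ((x >> 2) & 0x33333333)   # swap 2-bit fields
    x = ((x & 0x0F0F0F0F) << 4) | ((x >> 4) & 0x0F0F0F0F)   # swap nibbles
    x = ((x & 0x00FF00FF) << 8) | ((x >> 8) & 0x00FF00FF)   # swap bytes
    x = ((x & 0x0000FFFF) << 16) | (x >> 16)                # swap halves
    return x

def reverse_significant_bits(n):
    """Reverse only the significant bits of n (assumes 0 < n < 2**32)."""
    return reverse_bits32(n) >> (32 - n.bit_length())

print(hex(reverse_bits32(1)))        # 0x80000000
print(reverse_significant_bits(10))  # 5
```

Whether this beats the string-based one-liners in CPython is not something this sketch establishes; it would need timing.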
>>> def bit_rev(n):
...     return int(bin(n)[:1:-1], 2)
...
>>> bit_rev(2)
1
>>> bit_rev(10)
5
What if you wanted to reverse the binary value within a specific number of bits, e.g. 1 = 2b'00000001? In this case the reversed value would be 2b'10000000, which is 128 in decimal or 0x80 in hex.
def binary_reverse(num, bit_length):
    # Convert to binary and pad with 0s on the left
    bin_val = bin(num)[2:].zfill(bit_length)
    return int(''.join(reversed(bin_val)), 2)
    # Or, alternatively:
    # return int(bin_val[::-1], 2)

Parameter in Python

Let's say there is a parameter n. Can n be any number? For example, take a question like this: Given a non-negative number num, return True if num is within 2 of a multiple of 10. This is what I am thinking:
def near_ten(num):
    n = int  # So I assume n can be any integer
    if abs(num - n*10) <= 2:
        return True
    return False
However, there are two problems. First, in n*10, * is an unsupported operand type, even though I thought I could use Python as a calculator. Second, I cannot simply say n = int and have n behave like a variable standing for any number (or integer), as in a math function. If there were a way to use n like that, life would be so much easier.
Finally I figured it out in another way, which doesn't include "n" as a parameter:
def near_ten(num):
    if num%10<=2:
        return True
    if (num+2)%10<=2:
        return True
    return False
However, I'm still curious about "n" as a parameter as mentioned before. Since I'm just a beginner, this is really confusing.
In Python, int is a type. Types are first-class objects in Python, and can be bound to names. Of course, trying to multiply a type by a number is usually meaningless, so the operation is not defined by default. It can be referred to by the new name though.
n = int
print(n(3.4))
print(n('10') == 10)
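Python has no way to say "n is some integer" symbolically; the code has to try concrete candidates for n. Here is a minimal sketch of that idea, where near_ten_search is a hypothetical name and only the two multiples of 10 nearest to num are tested:

```python
def near_ten_search(num):
    """True if num is within 2 of some multiple of 10."""
    lower = num // 10                 # n for the multiple at or below num
    candidates = (lower, lower + 1)   # ... and the next multiple above
    return any(abs(num - n * 10) <= 2 for n in candidates)

print([x for x in range(25) if near_ten_search(x)])
# [0, 1, 2, 8, 9, 10, 11, 12, 18, 19, 20, 21, 22]
```

Here n takes actual integer values in turn, which is the closest code gets to the mathematical "there exists an n".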
Here is a much simpler solution:
def near_mult_ten(num):
    return abs(num - (num+5) // 10 * 10) <= 2
Edit: Fixed.
Try this:
def near_ten(a):
    d = a % 10
    if d <= 2 or d >= 8:
        return True
    return False
I am new to coding as well so forgive any mistakes.
I am not sure whether you have run your code, but Python is a high-level interpreted language. It tracks the type of each value at runtime, so you do not need to declare variable types explicitly; your function header is valid.
You can also mix integers and floats in arithmetic without casting; Python handles the conversion for you.
Your function will raise an error, though: n = int binds n to the type int itself, not to an integer value, so n*10 fails with a TypeError.
