I am calculating the most frequent number in a vector of int8s. Numba complains when I set up a counter array of ints:
import numpy as np
from numba import jit

@jit(nopython=True)
def freq_int8(y):
    """Find most frequent number in array"""
    count = np.zeros(256, dtype=int)
    for val in y:
        count[val] += 1
    return ((np.argmax(count)+128) % 256) - 128
Calling it I get the following error:
TypingError: Invalid usage of Function(<built-in function zeros>) with parameters (int64, Function(<class 'int'>))
If I delete dtype=int it works and I get a decent speedup. I am however puzzled as to why declaring an array of ints isn't working. Is there a known workaround, and would there be any efficiency gain worth having here?
Background: I am trying to shave microseconds off some numpy-heavy code. I am especially being hurt by numpy.median, and have been looking into Numba, but am struggling to improve on median. Finding the most frequent number is an acceptable alternative to median, and here I've been able to gain some performance. The above numba code is also faster than numpy.bincount.
Update: After input in the accepted answer, here's an implementation of median for int8 vectors. It is roughly an order of magnitude faster than numpy.median:
@jit(nopython=True)
def median_int8(y):
    N2 = len(y)//2
    count = np.zeros(256, dtype=np.int32)
    for val in y:
        count[val] += 1
    cs = 0
    for i in range(-128, 128):
        cs += count[i]
        if cs > N2:
            return float(i)
        elif cs == N2:
            j = i+1
            while count[j] == 0:
                j += 1
            return (i + j)/2
Surprisingly, the performance difference is even greater for short vectors, apparently due to per-call overhead in numpy:
>>> a = np.random.randint(-128, 128, 10)
>>> %timeit np.median(a)
The slowest run took 7.03 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 20.8 µs per loop
>>> %timeit median_int8(a)
The slowest run took 11.67 times longer than the fastest. This could mean that an intermediate result is being cached.
1000000 loops, best of 3: 593 ns per loop
This overhead is so large, I'm wondering if something is wrong.
Just a quick note: finding the most frequent number is normally called the mode, and it is about as similar to the median as it is to the mean... in which case np.mean would be considerably faster. Unless you have some constraints or particularities in your data, there is no guarantee that the mode approximates the median.
If you still want to calculate the mode of a list of integer numbers, np.bincount, as you mention, should be enough (if numba is faster, it shouldn't be by much):
count = np.bincount(y, minlength=256)
result = ((np.argmax(count)+128) % 256) - 128
Note I've added the minlength parameter to np.bincount just so it returns the same 256-length array that you have in your code. It is completely unnecessary in practice: since you only want the argmax, np.bincount (without minlength) will return an array whose length is the maximum number in y plus one.
As for the numba error, replacing dtype=int with dtype=np.int32 should solve the problem. int is the Python built-in type (a Python object), and you are specifying nopython in the numba decorator. If you remove nopython, then either dtype=int or dtype='i' will also work (having the same effect).
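For reference, here is the function with that fix applied; this is just a sketch of the suggestion above (the name freq_int8_fixed and the explicit imports are mine):

import numpy as np
from numba import jit

@jit(nopython=True)
def freq_int8_fixed(y):
    """Most frequent value (mode) of an int8 array."""
    count = np.zeros(256, dtype=np.int32)  # np.int32 instead of the Python built-in int
    for val in y:
        count[val] += 1  # negative int8 values wrap around to indices 128..255
    return ((np.argmax(count) + 128) % 256) - 128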
I'm currently in an algorithms class and was interested to see which of two methods of multiplying a list of large numbers gives the faster runtime. What I found was that the recursive multiply performs about 10x faster. For the code below, I got t_sim=53.05s and t_rec=4.73s. I did some other tests and they all seemed to be around the 10x range.
Additionally, you could put the values from the recursive multiply into a tree and reuse them to even more quickly compute multiplications of subsets of the list.
I did a theoretical runtime analysis, and both are n^2 using standard multiplication, but when the karatsuba algorithm is used, that factor goes down to n^log_2(3).
Every multiply in simple_multiply should have runtime i * 1, since the accumulator has grown to roughly i units while the next factor is 1 unit. Summing over i = 1...n gives an arithmetic series, and Gauss's formula yields n*(n+1)/2 = O(n^2).
For the second one, the cost of a single multiplication at recursion depth d is (2^d)^2, since the operands have grown to 2^d units, but only n*2^-d values remain to be multiplied at that level. The levels therefore form a geometric series whose runtime at each level is n*2^d, with a final depth of log_2(n). The sum of the geometric series is n * (1-2^log_2(n))/(1-2) = n*(n-1) = O(n^2). If using the karatsuba algorithm, you can get O(n^log_2(3)) by the same method.
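Spelled out in LaTeX notation, the level costs above sum to the geometric series

\sum_{d=0}^{\log_2 n - 1} n \cdot 2^{d} \;=\; n \cdot \frac{1 - 2^{\log_2 n}}{1 - 2} \;=\; n(n-1) \;=\; O(n^2).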
If the code were using the karatsuba algorithm, then the speedup would make sense, but what doesn't seem to make sense is the linear relationship between the two runtimes, making it seem like python is using standard multiplication, which according to wikipedia is faster when using under 500ish bits. (I'm using 2^23 bits in the code below. Each number is literally a megabyte long)
import random
import time

def simple_multiply(values):
    a = 1
    for val in values:
        a *= val
    return a

def recursive_multiply(values):
    if len(values) == 1:
        return values[0]
    temp = []
    i = 0
    while i + 1 < len(values):
        temp.append(values[i] * values[i+1])
        i += 2
    if len(values) % 2 == 1:
        temp.append(values[-1])
    return recursive_multiply(temp)

def test(func, values):
    t1 = time.time()
    func(values)
    print(time.time() - t1)

def main():
    n = 2**11
    scale = 2**12
    values = [random.getrandbits(scale) for i in range(n)]
    test(simple_multiply, values)
    test(recursive_multiply, values)

if __name__ == '__main__':
    main()
Both versions of the code have the same number of multiplications, but in the simple version each multiplication is ~2000 bits long on average.
In the second version n/2 multiplications are 24 bits long, n/4 are 48 bits long, n/8 are 96 bits long, etc... The average length is only 48 bits.
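A quick way to check this empirically is to record the operand sizes seen by each multiplication in both versions; this is just a sketch (the helper names and the smaller test size are mine):

import random

def operand_bits_simple(values):
    """Bit length of the larger operand at each multiply in the simple loop."""
    sizes = []
    acc = 1
    for v in values:
        sizes.append(max(acc.bit_length(), v.bit_length()))
        acc *= v
    return sizes

def operand_bits_recursive(values):
    """Bit length of the larger operand at each multiply in the pairwise reduction."""
    sizes = []
    while len(values) > 1:
        nxt = []
        for i in range(0, len(values) - 1, 2):
            sizes.append(max(values[i].bit_length(), values[i + 1].bit_length()))
            nxt.append(values[i] * values[i + 1])
        if len(values) % 2 == 1:
            nxt.append(values[-1])
        values = nxt
    return sizes

vals = [random.getrandbits(2**12) for _ in range(2**8)]  # smaller n so this runs in a blink
simple = operand_bits_simple(vals)
pairwise = operand_bits_recursive(list(vals))
print(sum(simple) / len(simple), max(simple))        # average and largest operand, simple loop
print(sum(pairwise) / len(pairwise), max(pairwise))  # same for the pairwise version

The simple loop's operands quickly dwarf those of the pairwise version, even though both perform the same number of multiplications.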
There is something wrong in your assumption. You assume that multiplications between operands of different sizes take about the same time as between equal-sized operands, for instance that len(24)*len(72) approx len(48)*len(48). But that's not true, as the following snippets show:
%%timeit
random.getrandbits(2**14)*random.getrandbits(2**14)*random.getrandbits(2**14)*random.getrandbits(2**14)
>>>1000 loops, best of 3: 1.48 ms per loop
%%timeit
(random.getrandbits(2**14)*random.getrandbits(2**14))*(random.getrandbits(2**14)*random.getrandbits(2**14))
>>>1000 loops, best of 3: 1.23 ms per loop
The difference is consistent even on such a small scale
Suppose I have a bunch of arrays, including x and y, and I want to check if they're equal. Generally, I can just use np.all(x == y) (barring some dumb corner cases which I'm ignoring now).
However this evaluates the entire array of (x == y), which is usually not needed. My arrays are really large, and I have a lot of them, and the probability of two arrays being equal is small, so in all likelihood, I really only need to evaluate a very small portion of (x == y) before the all function could return False, so this is not an optimal solution for me.
I've tried using the built-in all function, in combination with itertools.izip: all(val1==val2 for val1,val2 in itertools.izip(x, y))
However, that is so much slower in the case that the two arrays are equal that, overall, it's still not worth using over np.all. I presume that's because of the built-in all's general-purposeness. And np.all doesn't work on generators.
Is there a way to do what I want in a more speedy manner?
I know this question is similar to previously asked questions (e.g. Comparing two numpy arrays for equality, element-wise) but they specifically don't cover the case of early termination.
Until this is implemented in numpy natively you can write your own function and jit-compile it with numba:
import numpy as np
import numba as nb
@nb.jit(nopython=True)
def arrays_equal(a, b):
    if a.shape != b.shape:
        return False
    for ai, bi in zip(a.flat, b.flat):
        if ai != bi:
            return False
    return True
a = np.random.rand(10, 20, 30)
b = np.random.rand(10, 20, 30)
%timeit np.all(a==b) # 100000 loops, best of 3: 9.82 µs per loop
%timeit arrays_equal(a, a) # 100000 loops, best of 3: 9.89 µs per loop
%timeit arrays_equal(a, b) # 100000 loops, best of 3: 691 ns per loop
Worst case performance (arrays equal) is equivalent to np.all and in case of early stopping the compiled function has the potential to outperform np.all a lot.
Adding short-circuit logic to array comparisons is apparently being discussed on the numpy page on github, and will thus presumably be available in a future version of numpy.
Probably someone who understands the underlying data structure could optimize this or explain whether it's reliable/safe/good practice, but it seems to work.
np.all(a==b)
Out[]: True
memoryview(a.data)==memoryview(b.data)
Out[]: True
%timeit np.all(a==b)
The slowest run took 10.82 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 6.2 µs per loop
%timeit memoryview(a.data)==memoryview(b.data)
The slowest run took 8.55 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 1.85 µs per loop
If I understand this correctly, ndarray.data creates a pointer to the data buffer and memoryview creates a native python type that can be short-circuited out of the buffer.
I think.
EDIT: further testing shows it may not be as big a time improvement as shown. Previously a = b = np.eye(5):
a=np.random.randint(0,10,(100,100))
b=a.copy()
%timeit np.all(a==b)
The slowest run took 6.70 times longer than the fastest. This could mean that an intermediate result is being cached.
10000 loops, best of 3: 17.7 µs per loop
%timeit memoryview(a.data)==memoryview(b.data)
10000 loops, best of 3: 30.1 µs per loop
np.all(a==b)
Out[]: True
memoryview(a.data)==memoryview(b.data)
Out[]: True
Hmmm, I know it is a poor answer, but it seems there is no easy way to do this. The NumPy developers should fix it. I suggest:
def compare(a, b):
    if len(a) > 0 and not np.array_equal(a[0], b[0]):
        return False
    if len(a) > 15 and not np.array_equal(a[:15], b[:15]):
        return False
    if len(a) > 200 and not np.array_equal(a[:200], b[:200]):
        return False
    return np.array_equal(a, b)
:)
Well, not really an answer as I haven't checked whether it short-circuits, but:
assert_array_equal.
From the documentation:
Raises an AssertionError if two array_like objects are not equal.
Wrap it in try/except if it's not on a performance-sensitive code path.
Or follow the underlying source code, maybe it's efficient.
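For example, a small wrapper along these lines (the function name is mine; whether assert_array_equal short-circuits internally is exactly what I haven't checked):

import numpy as np
from numpy.testing import assert_array_equal

def equal_via_assert(x, y):
    """Turn the raised AssertionError into a plain bool."""
    try:
        assert_array_equal(x, y)
        return True
    except AssertionError:
        return False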
You could iterate all elements of the arrays and check if they are equal.
If the arrays are most likely not equal it will return much faster than the .all function.
Something like this:
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 3, 4])

areEqual = True
for x in range(a.size):
    if a[x] != b[x]:
        areEqual = False
        break
    else:
        print("a[x] is equal to b[x]")

if areEqual:
    print("The tables are equal")
else:
    print("The tables are not equal")
As Thomas Kühn wrote in a comment to your post, array_equal is a function which should solve the problem. It is described in Numpy's API reference.
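For completeness, a minimal usage example (as far as I know array_equal does not short-circuit either, so it mainly helps with corner cases such as mismatched shapes):

import numpy as np

x = np.array([1, 2, 3])
y = np.array([1, 2, 4])
print(np.array_equal(x, x))  # True
print(np.array_equal(x, y))  # False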
Breaking down the original problem to three parts: "(1) My arrays are really large, and (2) I have a lot of them, and (3) the probability of two arrays being equal is small"
All the solutions (to date) are focused on part (1) - optimizing the performance of each equality check, and some improve this performance by a factor of 10. Points (2) and (3) are ignored. Comparing each pair has O(n^2) complexity, which can become huge for a lot of arrays, yet is needless when the probability of duplicates is very small.
The check can become much faster with the following general algorithm -
fast hash of each array O(n)
check equality only for arrays with the same hash
A good hash is almost unique, so the number of keys can easily be a very large fraction of n. On average, the number of arrays with the same hash will be very small, and almost 1 in some cases. Duplicate arrays will have the same hash, while having the same hash doesn't guarantee they are duplicates. In that sense, the algorithm will catch all the duplicates. Comparing only arrays with the same hash significantly reduces the number of comparisons, which becomes almost O(n).
For my problem, I had to check for duplicates within ~1 million integer arrays, each with 10k elements. Optimizing only the array equality check (with @MB-F's solution) gave an estimated run time of 5 days. With hashing first it finished in minutes. (I used the array sum as the hash; that suited my arrays' characteristics.)
Some pseudo-Python code:
import numpy as np
from collections import defaultdict

def fast_hash(arr) -> int:
    # cheap summary of the array; the plain sum worked for my data
    return int(arr.sum())

def arrays_equal(arr1, arr2) -> bool:
    # exact comparison, e.g. np.array_equal or the jit-compiled version above
    return np.array_equal(arr1, arr2)

def make_hash_dict(array_stack, hash_fn=np.sum):
    # map each hash value to the indices of the arrays that produced it
    hash_dict = defaultdict(list)
    hashes = np.squeeze(np.apply_over_axes(hash_fn, array_stack, range(1, array_stack.ndim)))
    for idx, hash_val in enumerate(hashes):
        hash_dict[hash_val].append(idx)
    return hash_dict

def get_duplicate_sets(hash_dict, array_stack):
    # compare only arrays that share a hash; collect groups of exact duplicates
    duplicate_sets = []
    for hash_key, ind_list in hash_dict.items():
        if len(ind_list) == 1:
            continue
        all_duplicates = []
        for idx1 in range(len(ind_list)):
            v1 = ind_list[idx1]
            if v1 in all_duplicates:
                continue
            arr1 = array_stack[v1]
            curr_duplicates = []
            for idx2 in range(idx1+1, len(ind_list)):
                v2 = ind_list[idx2]
                arr2 = array_stack[v2]
                if arrays_equal(arr1, arr2):
                    if len(curr_duplicates) == 0:
                        curr_duplicates.append(v1)
                    curr_duplicates.append(v2)
            if len(curr_duplicates) > 0:
                all_duplicates.extend(curr_duplicates)
                duplicate_sets.append(curr_duplicates)
    return duplicate_sets
The variable duplicate_sets is a list of lists, each internal list contains indices of all the same duplicates.
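A hedged usage sketch, assuming the functions above and a stack whose first axis indexes the individual arrays (the toy data and the planted duplicate are mine):

import numpy as np

array_stack = np.random.randint(0, 5, size=(1000, 100))  # toy stack of 1000 arrays
array_stack[10] = array_stack[3]                         # plant one duplicate pair
hash_dict = make_hash_dict(array_stack)                  # np.sum as the hash, as above
print(get_duplicate_sets(hash_dict, array_stack))        # e.g. [[3, 10]]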
I have two sets of points on a sphere, labelled 'obj' and 'ps' in the code example below. I would like to identify all 'obj' points that are closer than a certain angular distance from a 'ps' point.
My take on this is to represent each point by a 3D unit vector, and to compare their dot products to cos(maximum separation). This can be done easily with numpy broadcasting, but in my application I have n_obj ~ 500,000 and n_ps ~ 50,000, so the memory requirements of broadcasting are too large. Below I have pasted my current take using numba. Can this be optimized further?
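For context, the broadcast version I'm trying to avoid would be roughly the following sketch; it builds the full n_obj x n_ps matrix of dot products, which is what blows up the memory at my real sizes (my numba attempt is pasted right after):

dots = np.dot(vec_obj, vec_ps.T)                          # shape (n_obj, n_ps)
closeobj_all = np.unique(np.nonzero(dots > cos_maxsep)[0])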
from numba import jit
import numpy as np
from sklearn.preprocessing import normalize
def gen_points(n):
    """
    generate random 3D unit vectors (not uniform, but irrelevant here)
    """
    vec = 2*np.random.rand(n,3)-1.
    vec_norm = normalize(vec)
    return vec_norm

#@jit(nopython=True)
@jit
def angdist_threshold_numba(vec_obj,vec_ps,cos_maxsep):
    """
    finds obj that are closer than maxsep to a ps
    """
    nps = len(vec_ps)
    nobj = len(vec_obj)
    #closeobj_all = []
    closeobj_all = np.empty(0)
    dotprod = np.empty(nobj)
    a = np.arange(nobj)
    for ps in range(nps):
        np.sum(vec_obj*vec_ps[ps],axis=1,out=dotprod)
        #closeobj_all.extend(a[dotprod > cos_maxsep])
        closeobj_all = np.append(closeobj_all, a[dotprod > cos_maxsep])
    return closeobj_all

vec_obj = gen_points(50000) #in reality ~500,000
vec_ps = gen_points(5000) #in reality ~50,000
cos_maxsep = np.cos(0.003)
closeobj_all = np.unique(angdist_threshold_numba(vec_obj,vec_ps,cos_maxsep))
This is the performance using the test case given in the code:
%timeit np.unique(angdist_threshold_numba(vec_obj,vec_ps,cos_maxsep))
1 loops, best of 3: 4.53 s per loop
I have tried to speed it up using
@jit(nopython=True)
but this fails with
NotImplementedError: Failed at nopython (nopython frontend)
(<class 'numba.ir.Expr'>, build_list(items=[]))
Edit: After a numba update to 0.26 the creation of the empty list fails even in the python mode. This can be fixed by replacing it with np.empty(0), and the .extend() with np.append(), see above. This almost doesn't change the performance.
According to https://github.com/numba/numba/issues/858 np.empty() is now supported in nopython mode, but I still can't run this with @jit(nopython = True):
TypingError: Internal error at <numba.typeinfer.CallConstraint object at 0x7ff3114a9310>
Unlike list.append you should never call numpy.append in a loop! This is because even for appending a single element the whole array needs to be copied. Because you're only interested in the unique obj you could use a Boolean array to flag the matches found so far.
As for Numba, it works best if you write out all the loops. So for example:
@jit(nopython=True)
def numba2(vec_obj, vec_ps, cos_maxsep):
    nps = vec_ps.shape[0]
    nobj = vec_obj.shape[0]
    dim = vec_obj.shape[1]
    found = np.zeros(nobj, np.bool_)
    for i in range(nobj):
        for j in range(nps):
            cos = 0.0
            for k in range(dim):
                cos += vec_obj[i,k] * vec_ps[j,k]
            if cos > cos_maxsep:
                found[i] = True
                break
    return found.nonzero()
The added benefit is that we can break out of the loop over the ps array as soon as we find a match to the current obj.
You can gain some more speed by specializing the function for 3 dimensional spaces. Also, for some reason, passing all arrays and relevant dimensions into a helper function results in another speedup:
def numba3(vec_obj, vec_ps, cos_maxsep):
    nps = len(vec_ps)
    nobj = len(vec_obj)
    out = np.zeros(nobj, bool)
    numba3_helper(vec_obj, vec_ps, cos_maxsep, out, nps, nobj)
    return np.flatnonzero(out)

@jit(nopython=True)
def numba3_helper(vec_obj, vec_ps, cos_maxsep, out, nps, nobj):
    for i in range(nobj):
        for j in range(nps):
            cos = (vec_obj[i,0]*vec_ps[j,0] +
                   vec_obj[i,1]*vec_ps[j,1] +
                   vec_obj[i,2]*vec_ps[j,2])
            if cos > cos_maxsep:
                out[i] = True
                break
    return out
Timings I get for 20,000 obj and 2,000 ps:
%timeit angdist_threshold_numba(vec_obj,vec_ps,cos_maxsep)
1 loop, best of 3: 2.99 s per loop
%timeit numba2(vec_obj, vec_ps, cos_maxsep)
1 loop, best of 3: 444 ms per loop
%timeit numba3(vec_obj, vec_ps, cos_maxsep)
10 loops, best of 3: 134 ms per loop
I'm looking for the most memory-efficient way to compute the absolute squared value of a complex numpy ndarray
arr = np.empty((250000, 150), dtype='complex128') # common size
I haven't found a ufunc that would do exactly np.abs()**2.
As an array of that size and type takes up around half a GB, I'm looking for a primarily memory-efficient way.
I would also like it to be portable, so ideally some combination of ufuncs.
So far my understanding is that this should be about the best
result = np.abs(arr)
result **= 2
It will needlessly compute (**0.5)**2, but should compute **2 in-place. Altogether the peak memory requirement is only the original array size + result array size, which should be 1.5 * original array size as the result is real.
If I wanted to get rid of the useless **2 call I'd have to do something like this
result = arr.real**2
result += arr.imag**2
but if I'm not mistaken, this means I'll have to allocate memory for both the real and the imaginary part calculations, so the peak memory usage would be 2.0 * original array size. The arr.real and arr.imag properties also return non-contiguous arrays (but that is of lesser concern).
Is there anything I'm missing? Are there any better ways to do this?
EDIT 1:
I'm sorry for not making it clear, I don't want to overwrite arr, so I can't use it as out.
Thanks to numba.vectorize in recent versions of numba, creating a numpy universal function for the task is very easy:
import numpy as np
import numba

@numba.vectorize([numba.float64(numba.complex128), numba.float32(numba.complex64)])
def abs2(x):
    return x.real**2 + x.imag**2
On my machine, I find a threefold speedup compared to a pure-numpy version that creates intermediate arrays:
>>> x = np.random.randn(10000).view('c16')
>>> y = abs2(x)
>>> np.all(y == x.real**2 + x.imag**2) # exactly equal, being the same operation
True
>>> %timeit np.abs(x)**2
10000 loops, best of 3: 81.4 µs per loop
>>> %timeit x.real**2 + x.imag**2
100000 loops, best of 3: 12.7 µs per loop
>>> %timeit abs2(x)
100000 loops, best of 3: 4.6 µs per loop
EDIT: this solution has twice the minimum memory requirement, and is just marginally faster. The discussion in the comments is good for reference however.
Here's a faster solution, with the result stored in res:
import numpy as np
res = arr.conjugate()
np.multiply(arr,res,out=res)
where we exploited a property of the absolute value of a complex number, i.e. abs(z) = sqrt(z*z.conjugate()), so that abs(z)**2 = z*z.conjugate()
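Note that res keeps a complex dtype (its imaginary part is numerically zero), which is where the doubled memory mentioned in the edit comes from; if a real-valued array is needed afterwards, a sketch of extracting it would be:

result = res.real  # a view, no copy; np.ascontiguousarray(result) if a contiguous real array is required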
If your primary goal is to conserve memory, NumPy's ufuncs take an optional out parameter that lets you direct the output to an array of your choosing. It can be useful when you want to perform operations in place.
If you make this minor modification to your first method, then you can perform the operation on arr completely in place:
np.abs(arr, out=arr)
arr **= 2
One convoluted way that only uses a little extra memory could be to modify arr in place, compute the new array of real values and then restore arr.
This means storing information about the signs (unless you know that your complex numbers all have positive real and imaginary parts). Only a single bit is needed for the sign of each real or imaginary value, so this uses 1/16 + 1/16 == 1/8 the memory of arr (in addition to the new array of floats you create).
>>> signs_real = np.signbit(arr.real) # store information about the signs
>>> signs_imag = np.signbit(arr.imag)
>>> arr.real **= 2 # square the real and imaginary values
>>> arr.imag **= 2
>>> result = arr.real + arr.imag
>>> arr.real **= 0.5 # positive square roots of real and imaginary values
>>> arr.imag **= 0.5
>>> arr.real[signs_real] *= -1 # restore the signs of the real and imaginary values
>>> arr.imag[signs_imag] *= -1
At the expense of storing signbits, arr is unchanged and result holds the values we want.
arr.real and arr.imag are only views into the complex array. So no additional memory is allocated.
If you don't want the sqrt (which should be much heavier than a multiply), then avoid abs.
If you don't want double memory, then avoid real**2 + imag**2.
Then you might try this (a view/reshape trick):
import numpy as np

N0 = 23
np0 = (np.random.randn(N0) + 1j*np.random.randn(N0)).astype(np.complex128)
ret_ = np.abs(np0)**2
tmp0 = np0.view(np.float64)
ret0 = np.matmul(tmp0.reshape(N0,1,2), tmp0.reshape(N0,2,1)).reshape(N0)
assert np.abs(ret_-ret0).max() < 1e-7
Anyway, I prefer the numba solution
I have a function
import numpy as np
import matplotlib.mlab as mlab

def getSamples():
    p = lambda x : mlab.normpdf(x,3,2) + mlab.normpdf(x,-5,1)
    q = lambda x : mlab.normpdf(x,5,14)
    k = 30
    goodSamples = []
    rightCount = 0
    totalCount = 0
    while(rightCount < 100000):
        z0 = np.random.normal(5, 14)
        u0 = np.random.uniform(0,k*q(z0))
        if(p(z0) > u0):
            goodSamples.append(z0)
            rightCount += 1
        totalCount += 1
    return np.array(goodSamples)
My implementation to generate 100000 samples is taking too long. How can I make it faster with itertools or something similar?
I would say that the secret to making this code faster does not lie in changing the loop syntax. Here are a few points:
np.random.normal has an additional parameter size that lets you get many values at once. I would suggest using an array of say 1E09 elements and then checking your condition on that for how many are good; you can then estimate how likely that is (see the sketch after these points).
To create your uniform samples, why not use sympy for symbolic evaluation of the pdf? (I don't know if this is faster but it could be since you already know the mean and variance.)
Again, for p could you use a symbolic function?
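A rough sketch of the first point, with p, q and k=30 as defined in the question and an arbitrary batch size:

import numpy as np

batch = 10**6
z = np.random.normal(5, 14, size=batch)         # many proposals in one call
u = np.random.uniform(size=batch) * 30 * q(z)   # uniform heights under k*q(z)
accepted = z[p(z) > u]                          # keep the proposals that pass
print(accepted.size / float(batch))             # estimated acceptance rate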
In general, performance problems are caused by doing things the "wrong way". Numpy can be very fast when used as it is designed to be used, that is by exploiting its vector processing, where vectorized operations are handed off to compiled code. Two bad practices that come from other programming languages/approaches are
Loops: Whenever you think you need a loop stop and think. Most of the time you do not and in fact do not even want one. It is much faster both to write and run code without loops.
Memory allocation: Whenever you know the size of an object, preallocate space for it. Growing memory, particularly in Python lists, is very slow compared to the alternatives.
In this case it is easy to get (approximately) two orders of magnitude speedup; the tradeoff is more memory usage.
Below is some representative code, it is not meant to be blindly used. I have not even verified it produces the correct results. It is more or less a direct translation of your routine. It appears you are drawing random numbers from a probability distribution using the rejection method. There may be more efficient algorithms to do this for your probability distribution.
import numpy as np
import matplotlib.mlab as mlab

def getSamples2() :
    p = lambda x : mlab.normpdf(x,3,2) + mlab.normpdf(x,-5,1)
    q = lambda x : mlab.normpdf(x,5,14)
    k = 30
    N = 100000                 # Total number of samples we want
    Ngood = 0                  # Current number of good samples
    goodSamples = np.zeros(N)  # Storage for the good samples
    while Ngood < N :          # Unfortunately a loop, ....
        z0 = np.random.normal(5, 14, size=N)
        u0 = np.random.uniform(size=N)*k*q(z0)
        ind, = np.where(p(z0) > u0)
        n = min(len(ind), N-Ngood)
        goodSamples[Ngood:Ngood+n] = z0[ind[:n]]
        Ngood += n
    return goodSamples
This generates random numbers in chunks and saves the good ones. I have not tried to optimize the chunk size (here I just use N, the total number we want, in principle this could/should be different and could even be adjusted based on the number we have left to generate). This still uses a loop, unfortunately, but now this will be run "tens" of times instead of 100,000 times. This also uses the where function and array slicing; these are good general tools to be comfortable with.
In one test with %timeit on my machine I found
In [27]: %timeit getSamples() # Original routine
1 loops, best of 3: 49.3 s per loop
In [28]: %timeit getSamples2()
1 loops, best of 3: 505 ms per loop
Here is kinda itertools "magic", but I'm not sure it can help. It's probably much better for performance to prepare a numpy array (using zeros) and fill it without creating a Python auto-growing list. Below are both the itertools approach and the zero-preallocation. (Excuse me in advance for untested code.)
from itertools import count, takewhile
import operator
import numpy as np
import matplotlib.mlab as mlab

def getSamples():
    p = lambda x : mlab.normpdf(x, 3, 2) + mlab.normpdf(x, -5, 1)
    q = lambda x : mlab.normpdf(x, 5, 14)
    k = 30
    n = 100000
    samples_iter = map(
        operator.itemgetter(1),
        takewhile(
            lambda pair: pair[0] < n,
            enumerate(
                filter(lambda z: p(z) > np.random.uniform(0, k*q(z)),
                       (np.random.normal(5, 14) for _ in count()))
            )))
    goodSamples = np.zeros(n)
    # set values from iterator, probably there is a better way for that
    for i, sample in enumerate(samples_iter):
        goodSamples[i] = sample
    return goodSamples