Improving sum() calculations on evenly spaced list - python

I have code that needs 10 million evenly spaced numbers between 0 and 1, and a logic function that picks a random index and returns the sum of the numbers from that index to the end of the list.
Thus the code looks like below,
import random
import numpy as np

ten_million = np.linspace(0.0, 1.0, 10000000)

def deep_dive_logic():
    # this pick is derived from good logic, however, let's just use random here for demonstration
    pick = random.randint(0, 10000000)
    return sum(ten_million[pick:])

for _ in range(2500):
    r = deep_dive_logic()
    print(r)
    # more logic ahead...
The problem is that calling sum() on a list of this size takes approx. 1.3 s per result.
Is there an efficient way to reduce the 1.3 s wait per call? I also tried building a kind of cache dictionary, but deep_dive_logic() runs in a multi-process environment, so the dictionary would have to be shared between processes; neither redis nor json.dump is an option, because the dictionary amounts to around 236 MB and adds overhead to inter-process communication if not cached.
sums_dict = {0: sum(ten_million)}
even_difference = ten_million[1] - ten_million[0]
for i in range(len(ten_million) - 1):
    # element i equals i * even_difference, so drop it from the running suffix sum
    sums_dict[i + 1] = sums_dict[i] - even_difference * i
I need help with either caching the 10-million-entry dictionary, or an alternate formula that returns the result without sum(), or any other out-of-the-box solution.
https://repl.it/repls/HoneydewGoldenShockwave

np.sum(ten_million) does it in about 0.005 seconds, whereas sum(ten_million) is about 1.5 seconds on my machine
As for a solution without using any out-of-the-box functions: as MrT suggested in the comments to your question, you can use the property of arithmetic progressions that the sum of a progression equals n(a1+an)/2, where n is the number of elements (10000000), a1 is your first element (0), and an is your last element (1). In your example, this is 10000000(0+1)/2 = 5000000.
so, for your deep_dive_logic function, just return that:
def deep_dive_logic():
    # upper bound must be len - 1, otherwise ten_million[pick] can go out of range
    pick = random.randint(0, len(ten_million) - 1)
    return (len(ten_million) - pick) * (ten_million[pick] + ten_million[-1]) / 2
Also does the job extremely fast, in fact, much faster than np.sum: on average, the arithmetic progression calculation took 1.223e-06 seconds, whereas np.sum took 0.00577 seconds on my machine. Makes sense, seeing how it's just one addition, one multiplication, and one division...
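As a quick sanity check (illustrative; the pick is fixed so the check is reproducible), the closed form agrees with np.sum to floating-point tolerance:

pick = 1234567  # fixed pick, purely for the check
closed_form = (len(ten_million) - pick) * (ten_million[pick] + ten_million[-1]) / 2
assert np.isclose(closed_form, np.sum(ten_million[pick:]))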

Do it analytically:
def cumm_sum(start, finish, steps, k):
    step = (finish - start) / steps
    pop = (finish - k) / step
    return (pop + 1) * 0.5 * (k + finish)
and the call would look like:
pick = ten_million[random.randint(0, 10000000 - 1)]
result = cumm_sum(0.0, 1.0, 10000000, pick)
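A quick check (illustrative) shows the result agrees with np.sum to roughly 1e-7 relative error; it is not exact because steps=10000000 slightly overstates linspace's true step of 1/(10000000 - 1):

pick_index = 1234567
approx = cumm_sum(0.0, 1.0, 10000000, ten_million[pick_index])
exact = np.sum(ten_million[pick_index:])
assert np.isclose(approx, exact)  # default rtol=1e-5 comfortably covers the ~1e-7 error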

Use math to reduce the problem complexity:
The sum of the arithmetic progression n, n+1, ..., m is given by
(m+n)*(m-n+1)*0.5
Use np.vectorize to speed up the array operation:
ten_m = 10000000

def sum10m_py(n):
    return (1 + n) * (ten_m - n * ten_m + 1) * 0.5

sum_np = np.vectorize(sum10m_py)
Pick the elements you want, then apply the vectorized function to them:
mask = np.random.randint(0, ten_m, 2500)
sums = sum_np(ten_million[mask])
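Since the formula is plain elementwise arithmetic, it also works on a NumPy array directly, skipping the Python-level loop that np.vectorize runs under the hood:

picked = ten_million[mask]
sums = (1 + picked) * (ten_m - picked * ten_m + 1) * 0.5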


Long For Loop Execution Time

I am attempting to compare the numerical output of a MATLAB script to that of numpy.
The task, from a homework assignment, is to add the value of 10^-N to itself 10^N times.
The goal is to demonstrate that computers have limitations that will compound in such a calculation. The result should always be 1, but as N increases, the error will also increase. For this problem, N should be 0, 1, 2, ..., 8.
My MATLAB script runs quickly with no issues:
solns = zeros(1, 9);
for N = 0:8
    for ii = 1:(10^N)
        solns(N+1) = solns(N+1) + 10^-N;
    end
end
However, using Python 3.8.12 on an Anaconda installation with numpy, this code would not terminate (see the edit below: the code does eventually execute):
import numpy as np

solns = np.zeros((9, 1))
for N in range(len(solns)):
    for _ in range(10**N):
        solns[N] += (10**-N)
Is there an error in the Python code, or is there some significant difference between the languages that creates this result?
EDIT:
I let the Python code run for a while and it eventually terminated after 230 seconds. The whole array printed out as just [1., 1., 1., etc.].
For reference, using MATLAB's tic toc commands, the program executed in 0.155760 seconds.
You're rather abusing NumPy there. Especially the += with an array element seems to take almost all of the time.
This is how I'd do it, takes about 0.5 seconds:
from itertools import repeat

for N in range(9):
    print(sum(repeat(10**-N, 10**N)))
Output (Try it online!):
1
0.9999999999999999
1.0000000000000007
1.0000000000000007
0.9999999999999062
0.9999999999980838
1.000000000007918
0.99999999975017
1.0000000022898672
As @AndrasDeak commented, it's fast due to several optimizations. Let's try fewer optimizations as well.
First, only get rid of NumPy and use simple Python floats. (Your original, by the way, takes about 230 seconds for me as well.) This takes only about 29 seconds:
for N in range(9):
    total = 0
    for _ in range(10**N):
        total += (10**-N)
    print(total)
Next, don't recompute the added value over and over again. Takes about 11 seconds:
for N in range(9):
    total = 0
    add = 10**-N
    for _ in range(10**N):
        total += add
    print(total)
Next, get rid of range producing lots of int objects for no good reason. Let's use itertools.repeat, it's the fastest iterable I know. Takes about 9 seconds:
from itertools import repeat

for N in range(9):
    total = 0
    add = 10**-N
    for _ in repeat(None, 10**N):
        total += add
    print(total)
Or alternatively, about the same speed:
from itertools import repeat

for N in range(9):
    total = 0
    for add in repeat(10**-N, 10**N):
        total += add
    print(total)
Simply put it into a function to benefit from local variables being faster than globals. Takes about 3.3 seconds then:
from itertools import repeat

def run():
    for N in range(9):
        total = 0
        for add in repeat(10**-N, 10**N):
            total += add
        print(total)

run()
That might be the fastest I can do with an ordinary clean for loop (loop unrolling would help some more, but ugh...).
My sum(repeat(...)) version lets C code do all the work (and sum even optimizes the summation of floats), making it still quite a bit faster.
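For completeness, a sketch of what an idiomatic NumPy version might look like. Note that np.sum uses pairwise summation, so its rounding error will generally be smaller than the sequential loop's, which may defeat the point of the exercise:

import numpy as np

for N in range(9):
    # build all 10**N addends at once and sum in compiled code
    # (needs roughly 800 MB of memory at N=8)
    print(np.full(10**N, 10.0**-N).sum())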

When does python start using a different algorithm for big multiplication?

I'm currently in an algorithms class and was interested to see which of two methods of multiplying a list of large numbers gives the faster runtime. What I found was that the recursive multiply performs about 10x faster. For the code below, I got t_sim=53.05s and t_rec=4.73s. I did some other tests and they all seemed to be around the 10x range.
Additionally, you could put the values from the recursive multiply into a tree and reuse them to even more quickly compute multiplications of subsets of the list.
I did a theoretical runtime analysis, and both are n^2 using standard multiplication, but when the karatsuba algorithm is used, that factor goes down to n^log_2(3).
Every multiply in simple_multiply should have runtime proportional to i. Summing over i = 1...n gives an arithmetic series, and Gauss's formula yields n*(n+1)/2 = O(n^2).
For the second one, we can see that the time to multiply for a given level of recursion is (2^d)^2, where d is the depth, but only needs to multiply n*2^-d values. The levels turn out to form a geometric series where the runtime at each level is n*2^d with a final depth of log_2(n). The solution to the geometric series is n * (1-2^log_2(n))/(1-2) = n*(n-1) = O(n^2). If using the karatsuba algorithm, you can get O(n^log_2(3)) by doing the same method
If the code were using the karatsuba algorithm, then the speedup would make sense, but what doesn't seem to make sense is the linear relationship between the two runtimes, making it seem like python is using standard multiplication, which according to wikipedia is faster when using under 500ish bits. (I'm using 2^23 bits in the code below. Each number is literally a megabyte long)
import random
import time

def simple_multiply(values):
    a = 1
    for val in values:
        a *= val
    return a

def recursive_multiply(values):
    if len(values) == 1:
        return values[0]
    temp = []
    i = 0
    while i + 1 < len(values):
        temp.append(values[i] * values[i+1])
        i += 2
    if len(values) % 2 == 1:
        temp.append(values[-1])
    return recursive_multiply(temp)

def test(func, values):
    t1 = time.time()
    func(values)
    print(time.time() - t1)

def main():
    n = 2**11
    scale = 2**12
    values = [random.getrandbits(scale) for i in range(n)]
    test(simple_multiply, values)
    test(recursive_multiply, values)

if __name__ == '__main__':
    main()
Both versions of the code have the same number of multiplications, but in the simple version each multiplication is ~2000 bits long on average.
In the second version n/2 multiplications are 24 bits long, n/4 are 48 bits long, n/8 are 96 bits long, etc... The average length is only 48 bits.
There is something wrong with your assumption. Your assumption is that each multiplication between the different ranks should take the same time, for instance len(24)*len(72) approx len(48)*len(48). But that's not true, as is evident from the following snippets:
%%timeit
random.getrandbits(2**14)*random.getrandbits(2**14)*random.getrandbits(2**14)*random.getrandbits(2**14)
>>>1000 loops, best of 3: 1.48 ms per loop
%%timeit
(random.getrandbits(2**14)*random.getrandbits(2**14))*(random.getrandbits(2**14)*random.getrandbits(2**14))
>>>1000 loops, best of 3: 1.23 ms per loop
The difference is consistent even at such a small scale.
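One rough way to probe where CPython changes multiplication regime (a sketch; the trial counts and bit sizes are arbitrary choices): time a single multiply at size b and at size 2b. Schoolbook multiplication should show a ratio near 4x, Karatsuba closer to 3x:

import random
import time

def time_mul(bits, trials=200):
    a = random.getrandbits(bits)
    b = random.getrandbits(bits)
    t0 = time.perf_counter()
    for _ in range(trials):
        a * b
    return (time.perf_counter() - t0) / trials

for bits in (2**10, 2**12, 2**14, 2**16, 2**18):
    print(bits, round(time_mul(2 * bits) / time_mul(bits), 2))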

Efficient random generator for very large range (in python)

I am trying to create a generator that returns numbers in a given range that pass a particular test given by a function foo. However I would like the numbers to be tested in a random order. The following code will achieve this:
from random import shuffle

def MyGenerator(foo, num):
    order = list(range(num))
    shuffle(order)
    for i in order:
        if foo(i):
            yield i
The Problem
The problem with this solution is that sometimes the range will be quite large (num might be of the order 10**8 and upwards). This function can become slow, having such a large list in memory. I have tried to avoid this problem, with the following code:
from random import randint

def MyGenerator(foo, num):
    tried = set()
    while len(tried) <= num - 1:
        i = randint(0, num-1)
        if i in tried:
            continue
        tried.add(i)
        if foo(i):
            yield i
This works well most of the time, since in most cases num will be quite large, foo will pass a reasonable number of numbers, and the total number of times the __next__ method is called will be relatively small (say, a maximum of 200, often much smaller). Therefore it's reasonably likely that we stumble upon a value that passes the foo test, and the size of tried never gets large. (Even if it only passes 10% of the time, we wouldn't expect tried to get larger than about 2000, roughly.)
However, when num is small (close to the number of times that the __next__ method is called), or when foo fails most of the time, the above solution becomes very inefficient: it randomly guesses numbers until it guesses one that isn't in tried.
My attempted solution...
I was hoping to use some kind of function that maps the numbers 0, 1, 2, ..., n onto themselves in a roughly random way. (This isn't being used for any security purposes, so it doesn't matter if it isn't the most 'random' function in the world.) The function here (Create a random bijective function which has same domain and range) maps signed 32-bit integers onto themselves, but I am not sure how to adapt the mapping to a smaller range. Given num, I don't even need a bijection on 0, 1, ..., num; just a bijection on some n larger than and 'close' to num (using whatever definition of close you see fit). Then I can do the following:
def mix_function_factory(num):
    # something here???
    def foo(index):
        # something else here??
        ...
    return foo

def MyGenerator(foo, num):
    mix_function = mix_function_factory(num)
    for i in range(num):
        index = mix_function(i)
        if index <= num:
            if foo(index):
                yield index
(so long as the bijection isn't on a set of numbers massively larger than num the number of times index <= num isn't True will be small).
My Question
Can you think of one of the following:
A potential solution for mix_function_factory or even a few other potential functions for mix_function that I could attempt to generalise for different values of num?
A better way of solving the original problem?
Many thanks in advance....
The problem is basically generating a random permutation of the integers in the range 0..n-1.
Luckily for us, these numbers have a very useful property: they all have a distinct value modulo n. If we can apply some mathematical operations to these numbers while taking care to keep each number distinct modulo n, it's easy to generate a permutation that appears random. And the best part is that we don't need any memory to keep track of numbers we've already generated, because each number is calculated with a simple formula.
Examples of operations we can perform on every number x in the range include:
Addition: We can add any integer c to x.
Multiplication: We can multiply x with any number m that shares no prime factors with n.
Applying just these two operations on the range 0..n-1 already gives quite satisfactory results:
>>> n = 7
>>> c = 1
>>> m = 3
>>> [((x+c) * m) % n for x in range(n)]
[3, 6, 2, 5, 1, 4, 0]
Looks random, doesn't it?
If we generate c and m from a random number, it'll actually be random, too. But keep in mind that there is no guarantee that this algorithm will generate all possible permutations, or that each permutation has the same probability of being generated.
Implementation
The difficult part about the implementation is really just generating a suitable random m. I used the prime factorization code from this answer to do so.
import random

# credit for prime factorization code goes
# to https://stackoverflow.com/a/17000452/1222951
def prime_factors(n):
    gaps = [1,2,2,4,2,4,2,4,6,2,6]
    length, cycle = 11, 3
    f, fs, next_ = 2, [], 0
    while f * f <= n:
        while n % f == 0:
            fs.append(f)
            n //= f  # integer division keeps the factors as ints
        f += gaps[next_]
        next_ += 1
        if next_ == length:
            next_ = cycle
    if n > 1: fs.append(n)
    return fs
def generate_c_and_m(n, seed=None):
    # we need to know n's prime factors to find a suitable multiplier m
    p_factors = set(prime_factors(n))

    def is_valid_multiplier(m):
        # m must not share any prime factors with n
        factors = prime_factors(m)
        return not p_factors.intersection(factors)

    # if no seed was given, generate random values for c and m
    if seed is None:
        c = random.randint(0, n)  # randint needs both bounds
        m = random.randint(1, 2*n)
    else:
        c = seed
        m = seed
    # make sure m is valid
    while not is_valid_multiplier(m):
        m += 1
    return c, m
Now that we can generate suitable values for c and m, creating the permutation is trivial:
def random_range(n, seed=None):
    c, m = generate_c_and_m(n, seed)
    for x in range(n):
        yield ((x + c) * m) % n
And your generator function can be implemented as
def MyGenerator(foo, num):
    for x in random_range(num):
        if foo(x):
            yield x
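For example (with a hypothetical test function that keeps multiples of 7, used purely for illustration), the generator yields qualifying numbers in scrambled order without storing any of them:

def is_multiple_of_7(x):  # hypothetical test function
    return x % 7 == 0

for x in MyGenerator(is_multiple_of_7, 50):
    print(x)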
That may be a case where the best algorithm depends on the value of num, so why not use two selectable algorithms wrapped in one generator?
You could mix your shuffle and set solutions with a threshold on the value of num. That's basically your first two solutions assembled into one generator:
from random import shuffle, randint

def MyGenerator(foo, num):
    if num < 100000:  # threshold has to be adjusted by experiment
        order = list(range(num))
        shuffle(order)
        for i in order:
            if foo(i):
                yield i
    else:  # big values, few collisions with random generator
        tried = set()
        while len(tried) < num:
            i = randint(0, num-1)
            if i in tried:
                continue
            tried.add(i)
            if foo(i):
                yield i
The randint solution (for big values of num) works well because there aren't so many repeats in the random generator.
Getting the best performance in Python is much trickier than in lower-level languages. For example, in C, you can often save a little bit in hot inner loops by replacing a multiplication with a shift. The overhead of Python's bytecode orientation erases this. Of course, this changes again when you consider which variant of "python" you're targeting (pypy? numpy? cython?); you really have to write your code based on which one you're using.
But even more important is arranging operations to avoid serialized dependencies, since all CPUs are superscalar these days. Of course, real compilers know about this, but it still matters when choosing an algorithm.
One of the easiest ways to gain a little over the existing answers is to generate numbers in chunks using numpy.arange() and apply ((x + c) * m) % n to the numpy ndarray directly. Every Python-level loop that can be avoided helps.
If the function can be applied directly to numpy ndarrays, that might be even better. Of course, a sufficiently small function in Python will be dominated by function-call overhead anyway.
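A minimal sketch of that chunked idea (assuming c, m and n come from generate_c_and_m above; the chunk size is an arbitrary choice, and int64 arithmetic is safe here as long as (n + c) * m stays below 2**63):

import numpy as np

def random_range_chunked(n, c, m, chunk=2**16):
    # same ((x + c) * m) % n permutation as above, applied a chunk at a time
    for start in range(0, n, chunk):
        x = np.arange(start, min(start + chunk, n), dtype=np.int64)
        for value in ((x + c) * m % n).tolist():
            yield value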
The best fast random-number-generator today is PCG. I wrote a pure-python port here but concentrated on flexibility and ease-of-understanding rather than speed.
Xoroshiro128+ is second-best-quality and faster, but less informative to study.
Python's (and many others') default choice of Mersenne Twister is among the worst.
(there's also something called splitmix64 which I don't know enough about to place - some people say it's better than xoroshiro128+, but it has a period problem - of course, you might want that here)
Both default-PCG and xoroshiro128+ use a 2N-bit state to generate N-bit numbers. This is generally desirable, but means numbers will be repeated. PCG has alternate modes that avoid this, however.
Of course, much of this depends on whether num is (close to) a power of 2. In theory, PCG variants can be created for any bit width, but currently only various word sizes are implemented since you'd need explicit masking. I'm not sure exactly how to generate the parameters for new bit sizes (perhaps it's in the paper?), but they can be tested simply by doing a period/2 jump and verifying that the value is different.
Of course, if you're only making 200 calls to the RNG, you probably don't actually need to avoid duplicates on the math side.
Alternatively, you could use an LFSR, which does exist for every bit size (though note that it never generates the all-zeros value (or equivalently, the all-ones value)). LFSRs are serial and (AFAIK) not jumpable, and thus can't be easily split across multiple tasks. Edit: I figured out that this is untrue; simply represent the advance step as a matrix and exponentiate it to jump.
Note that LFSRs do have the same obvious biases as simply generating numbers in sequential order based on a random start point - for example, if rng_outputs[a:b] all fail your foo function, then rng_outputs[b] will be much more likely as a first output regardless of starting point. PCG's "stream" parameter avoids this by not generating numbers in the same order.
Edit2: I have completed what I thought was a "brief project" implementing LFSRs in python, including jumping, fully tested.
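To make the matrix-jump idea concrete, here is a minimal sketch (not the linked project): a 16-bit Galois LFSR with the well-known 0xB400 tap mask, whose one-step update is linear over GF(2), so jumping ahead k steps is just exponentiating the transition matrix (columns stored as integers) by square-and-multiply:

WIDTH = 16
TAPS = 0xB400  # standard maximal-length taps for a 16-bit Galois LFSR

def step(state):
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= TAPS
    return state

# one-step transition matrix over GF(2): column j is step() applied to basis vector e_j
M = [step(1 << j) for j in range(WIDTH)]

def mat_vec(mat, v):
    # XOR together the columns selected by the set bits of v
    r = 0
    for j in range(WIDTH):
        if (v >> j) & 1:
            r ^= mat[j]
    return r

def mat_mul(a, b):
    # column j of the product is matrix a applied to column j of b
    return [mat_vec(a, b[j]) for j in range(WIDTH)]

def jump(state, k):
    # square-and-multiply: compute M**k, then apply it to the state
    result = [1 << j for j in range(WIDTH)]  # identity matrix
    power = M
    while k:
        if k & 1:
            result = mat_mul(power, result)
        power = mat_mul(power, power)
        k >>= 1
    return mat_vec(result, state)

s = 0xACE1
assert jump(s, 3) == step(step(step(s)))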

Time Complexity - Codility - Ladder - Python

The question is available here. My Python code is
def solution(A, B):
    if len(A) == 1:
        return [1]
    ways = [0] * (len(A) + 1)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = ways[i-1] + ways[i-2]
    result = [1] * len(A)
    for i in xrange(len(A)):
        result[i] = ways[A[i]] & ((1<<B[i]) - 1)
    return result
The time complexity detected by the system is O(L^2), and I can't see why. Thank you in advance.
First, let's show that the runtime genuinely is O(L^2). I copied a section of your code, and ran it with increasing values of L:
import time
import matplotlib.pyplot as plt

def solution(L):
    if L == 0:
        return
    ways = [0] * (L+5)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = ways[i-1] + ways[i-2]

points = []
for L in xrange(0, 100001, 10000):
    start = time.time()
    solution(L)
    points.append(time.time() - start)

plt.plot(points)
plt.show()
The resulting plot shows the runtime growing quadratically with L.
To understand why this is O(L^2) when the obvious "time complexity" calculation suggests O(L), note that "time complexity" is not a well-defined concept on its own, since it depends on which basic operations you're counting. Normally the basic operations are taken for granted, but in some cases you need to be more careful. Here, if you count additions as a basic operation, then the code is O(N). However, if you count bit (or byte) operations, then the code is O(N^2). Here's the reason:
You're building an array of the first L Fibonacci numbers. The length (in digits) of the i'th Fibonacci number is Theta(i). So ways[i] = ways[i-1] + ways[i-2] adds two numbers with approximately i digits, which takes O(i) time if you count bit or byte operations.
This observation gives you an O(L^2) bit operation count for this loop:
for i in xrange(3, len(ways)):
    ways[i] = ways[i-1] + ways[i-2]
In the case of this program, it's quite reasonable to count bit operations: your numbers are unboundedly huge as L increases and addition of huge numbers is linear in clock time rather than O(1).
You can fix the complexity of your code by computing the Fibonacci numbers mod 2^32 -- since 2^32 is a multiple of 2^B[i]. That will keep a finite bound on the numbers you're dealing with:
for i in xrange(3, len(ways)):
    ways[i] = (ways[i-1] + ways[i-2]) & ((1<<32) - 1)
There are some other issues with the code, but this will fix the slowness.
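For concreteness, here is how the masked loop might slot into the original function (a sketch only, keeping the question's Python 2 xrange):

def solution(A, B):
    if len(A) == 1:
        return [1]
    mask32 = (1 << 32) - 1  # 2**32 is a multiple of every 2**B[i], so 32 low bits suffice
    ways = [0] * (len(A) + 1)
    ways[1], ways[2] = 1, 2
    for i in xrange(3, len(ways)):
        ways[i] = (ways[i-1] + ways[i-2]) & mask32
    return [ways[A[i]] & ((1 << B[i]) - 1) for i in xrange(len(A))]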
I've taken the relevant parts of the function:

def solution(A, B):
    for i in xrange(3, len(A) + 1):  # replaced ways for clarity
        # ...
    for i in xrange(len(A)):
        # ...
    return result
Observations:
A is an iterable object (e.g. a list)
You're iterating over the elements of A in sequence
The behavior of your function depends on the number of elements in A, making it O(A)
You're iterating over A twice, meaning 2 O(A) -> O(A)
On point 4, since 2 is a constant factor, 2 O(A) is still in O(A).
I think the page is not correct in its measurement. Had the loops been nested, then it would've been O(A²), but the loops are not nested.
This short sample is O(N²):
def process_list(my_list):
    for i in range(0, len(my_list)):
        for j in range(0, len(my_list)):
            # do something with my_list[i] and my_list[j]
            pass
I've not seen the code the page is using to 'detect' the time complexity of the code, but my guess is that the page is counting the number of loops you're using without understanding much of the actual structure of the code.
EDIT1:
Note that, based on this answer, the time complexity of the len function is actually O(1), not O(N), so the page is not incorrectly trying to count its use for the time-complexity. If it were doing that, it would've incorrectly claimed a larger order of growth because it's used 4 separate times.
EDIT2:
As @PaulHankin notes, asymptotic analysis also depends on what's considered a "basic operation". In my analysis, I counted additions and assignments as "basic operations" using the uniform cost method, not the logarithmic cost method, which I did not mention at first.
Most of the time simple arithmetic operations are always treated as basic operations. This is what I see most commonly being done, unless the algorithm being analysed is for a basic operation itself (e.g. time complexity of a multiplication function), which is not the case here.
The only reason why we have different results appears to be this distinction. I think we're both correct.
EDIT3:
While an algorithm in O(N) is also in O(N²), I think it's reasonable to state that the code is still in O(N) because, at the level of abstraction we're using, the computational steps that seem most relevant (i.e. most influential) are in the loop as a function of the size of the input iterable A, not the number of bits used to represent each value.
Consider the following algorithm to compute a**n:
def function(a, n):
    r = 1
    for i in range(0, n):
        r *= a
    return r
Under the uniform cost method, this is in O(N) because the loop executes n times, but under the logarithmic cost method, the algorithm above turns out to be in O(N²) instead, due to the time complexity of the multiplication at the line r *= a being in O(N), since the number of bits needed to represent each number depends on the size of the number itself.
The Codility Ladder challenge is super tricky.
We first compute the Fibonacci sequence for the first L+2 numbers. The first two numbers are used only as fillers, so we have to index the sequence as A[idx]+1 instead of A[idx]-1. The second step is to replace the modulo operation by keeping only the n lowest bits.

Faster looping with itertools

I have a function
import numpy as np
from matplotlib import mlab  # note: normpdf exists only in older matplotlib versions

def getSamples():
    p = lambda x: mlab.normpdf(x, 3, 2) + mlab.normpdf(x, -5, 1)
    q = lambda x: mlab.normpdf(x, 5, 14)
    k = 30
    goodSamples = []
    rightCount = 0
    totalCount = 0
    while rightCount < 100000:
        z0 = np.random.normal(5, 14)
        u0 = np.random.uniform(0, k * q(z0))
        if p(z0) > u0:
            goodSamples.append(z0)
            rightCount += 1
        totalCount += 1
    return np.array(goodSamples)
My implementation takes much too long to generate 100,000 samples. How can I make it fast with itertools or something similar?
I would say that the secret to making this code faster does not lie in changing the loop syntax. Here are a few points:
np.random.normal has an additional parameter size that lets you get many values at once. I would suggest using an array of say 1E09 elements and then checking your condition on that for how many are good. You can then estimate how likely that is.
To create your uniform samples, why not use sympy for symbolic evaluation of the pdf? (I don't know if this is faster but it could be since you already know the mean and variance.)
Again, for p could you use a symbolic function?
In general, performance problems are caused by doing things the "wrong way". NumPy can be very fast when used as it is designed to be used, that is, by exploiting its vectorized operations, which are handed off to compiled code. Two bad practices that come from other programming languages/approaches are:
Loops: Whenever you think you need a loop stop and think. Most of the time you do not and in fact do not even want one. It is much faster both to write and run code without loops.
Memory allocation: Whenever you know the size of an object, preallocate space for it. Growing memory, particularly in Python lists, is very slow compared to the alternatives.
In this case it is easy to get (approximately) two orders of magnitude speedup; the tradeoff is more memory usage.
Below is some representative code, it is not meant to be blindly used. I have not even verified it produces the correct results. It is more or less a direct translation of your routine. It appears you are drawing random numbers from a probability distribution using the rejection method. There may be more efficient algorithms to do this for your probability distribution.
def getSamples2():
    p = lambda x: mlab.normpdf(x, 3, 2) + mlab.normpdf(x, -5, 1)
    q = lambda x: mlab.normpdf(x, 5, 14)
    k = 30
    N = 100000                 # Total number of samples we want
    Ngood = 0                  # Current number of good samples
    goodSamples = np.zeros(N)  # Storage for the good samples
    while Ngood < N:           # Unfortunately a loop, ....
        z0 = np.random.normal(5, 14, size=N)
        u0 = np.random.uniform(size=N) * k * q(z0)
        ind, = np.where(p(z0) > u0)
        n = min(len(ind), N - Ngood)
        goodSamples[Ngood:Ngood+n] = z0[ind[:n]]
        Ngood += n
    return goodSamples
This generates random numbers in chunks and saves the good ones. I have not tried to optimize the chunk size (here I just use N, the total number we want, in principle this could/should be different and could even be adjusted based on the number we have left to generate). This still uses a loop, unfortunately, but now this will be run "tens" of times instead of 100,000 times. This also uses the where function and array slicing; these are good general tools to be comfortable with.
In one test with %timeit on my machine I found
In [27]: %timeit getSamples() # Original routine
1 loops, best of 3: 49.3 s per loop
In [28]: %timeit getSamples2()
1 loops, best of 3: 505 ms per loop
Here is some itertools "magic", but I'm not sure it can help. It's probably much better for performance to prepare a numpy array (using zeros) and fill it without growing a Python list. Below are both the itertools version and the zeros preparation. (Excuse me in advance for untested code.)
from itertools import count, ifilter, imap, takewhile
import operator

def getSamples():
    p = lambda x: mlab.normpdf(x, 3, 2) + mlab.normpdf(x, -5, 1)
    q = lambda x: mlab.normpdf(x, 5, 14)
    k = 30
    n = 100000
    samples_iter = imap(
        operator.itemgetter(1),
        takewhile(
            # takewhile's predicate receives one (index, sample) tuple, not two arguments
            lambda pair: pair[0] < n,
            enumerate(
                ifilter(lambda z: p(z) > np.random.uniform(0, k * q(z)),
                        (np.random.normal(5, 14) for _ in count()))
            )))
    goodSamples = np.zeros(n)
    # set values from the iterator; probably there is a better way for that
    for i, sample in enumerate(samples_iter):
        goodSamples[i] = sample
    return goodSamples
