I have a function
def getSamples():
    p = lambda x : mlab.normpdf(x,3,2) + mlab.normpdf(x,-5,1)
    q = lambda x : mlab.normpdf(x,5,14)
    k = 30
    goodSamples = []
    rightCount = 0
    totalCount = 0
    while(rightCount < 100000):
        z0 = np.random.normal(5, 14)
        u0 = np.random.uniform(0, k*q(z0))
        if(p(z0) > u0):
            goodSamples.append(z0)
            rightCount += 1
        totalCount += 1
    return np.array(goodSamples)
My implementation to generate 100,000 samples is taking much too long. How can I make it faster with itertools or something similar?
I would say that the secret to making this code faster does not lie in changing the loop syntax. Here are a few points:
np.random.normal has an additional parameter size that lets you get many values at once. I would suggest using an array of, say, 1e9 elements, checking your condition on it to see how many are good, and then estimating how likely acceptance is (see the sketch after these points).
To create your uniform samples, why not use sympy for symbolic evaluation of the pdf? (I don't know if this is faster but it could be since you already know the mean and variance.)
Again, for p could you use a symbolic function?
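As a minimal sketch of the vectorized idea (untested; it uses scipy.stats.norm.pdf in place of mlab.normpdf and an arbitrary chunk size, both of which are my assumptions rather than part of the suggestion above):

import numpy as np
from scipy.stats import norm

def estimate_acceptance(n=1000000, k=30):
    # draw a whole batch of proposals at once instead of one at a time
    z = np.random.normal(5, 14, size=n)
    u = np.random.uniform(size=n) * k * norm.pdf(z, 5, 14)
    p = norm.pdf(z, 3, 2) + norm.pdf(z, -5, 1)   # target density
    good = z[p > u]                              # accepted samples
    return good, good.size / n                   # samples and estimated acceptance rate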
In general, performance problems are caused by doing things the "wrong way". Numpy can be very fast when used as it is designed to be used, that is, by exploiting its vector processing, where vectorized operations are handed off to compiled code. Two bad practices that come from other programming languages/approaches are:
Loops: Whenever you think you need a loop, stop and think. Most of the time you do not, and in fact do not even want one. It is much faster both to write and to run code without loops.
Memory allocation: Whenever you know the size of an object, preallocate space for it. Growing memory, particularly in Python lists, is very slow compared to the alternatives.
In this case it is easy to get (approximately) two orders of magnitude speedup; the tradeoff is more memory usage.
Below is some representative code; it is not meant to be used blindly. I have not even verified that it produces the correct results. It is more or less a direct translation of your routine. It appears you are drawing random numbers from a probability distribution using the rejection method; there may be more efficient algorithms to do this for your probability distribution.
def getSamples2():
    p = lambda x : mlab.normpdf(x,3,2) + mlab.normpdf(x,-5,1)
    q = lambda x : mlab.normpdf(x,5,14)
    k = 30
    N = 100000                  # Total number of samples we want
    Ngood = 0                   # Current number of good samples
    goodSamples = np.zeros(N)   # Storage for the good samples
    while Ngood < N:            # Unfortunately a loop, ....
        z0 = np.random.normal(5, 14, size=N)
        u0 = np.random.uniform(size=N)*k*q(z0)
        ind, = np.where(p(z0) > u0)
        n = min(len(ind), N-Ngood)
        goodSamples[Ngood:Ngood+n] = z0[ind[:n]]
        Ngood += n
    return goodSamples
This generates random numbers in chunks and saves the good ones. I have not tried to optimize the chunk size (here I just use N, the total number we want; in principle it could/should be different and could even be adjusted based on the number we still have left to generate). This still uses a loop, unfortunately, but now it runs "tens" of times instead of 100,000 times. It also uses the where function and array slicing; these are good general tools to be comfortable with.
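As a rough illustration of that adjustable-chunk idea (an untested sketch; draw and accept are hypothetical callables standing in for the proposal sampler and the acceptance test used above, e.g. draw = lambda m: np.random.normal(5, 14, size=m)):

import numpy as np

def sample_adaptive(n_wanted, draw, accept, first_chunk=10000):
    out = np.empty(0)
    drawn = 0
    chunk = first_chunk
    while out.size < n_wanted:
        z = draw(chunk)                                # propose a chunk of candidates
        out = np.concatenate([out, z[accept(z)]])      # keep only the accepted ones
        drawn += chunk
        rate = max(out.size / drawn, 1e-3)             # running acceptance-rate estimate
        chunk = int((n_wanted - out.size) / rate) + 1  # size the next chunk accordingly
    return out[:n_wanted]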
In one test with %timeit on my machine I found
In [27]: %timeit getSamples() # Original routine
1 loops, best of 3: 49.3 s per loop
In [28]: %timeit getSamples2()
1 loops, best of 3: 505 ms per loop
Here is some itertools "magic", but I'm not sure it can help. It is probably much better for performance to preallocate a numpy array (using zeros) and fill it rather than growing a Python list. Below are both the itertools version and the zeros preallocation. (Excuse me in advance for untested code.)
from itertools import count, ifilter, imap, takewhile
import operator

def getSamples():
    p = lambda x : mlab.normpdf(x, 3, 2) + mlab.normpdf(x, -5, 1)
    q = lambda x : mlab.normpdf(x, 5, 14)
    k = 30
    n = 100000
    samples_iter = imap(
        operator.itemgetter(1),
        takewhile(
            lambda pair: pair[0] < n,   # takewhile passes the (index, sample) tuple as a single argument
            enumerate(
                ifilter(lambda z: p(z) > np.random.uniform(0, k*q(z)),
                        (np.random.normal(5, 14) for _ in count()))
            )))
    goodSamples = np.zeros(n)
    # set values from iterator, probably there is a better way for that
    for i, sample in enumerate(samples_iter):
        goodSamples[i] = sample
    return goodSamples
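One possibly better way to fill the array from the iterator (my suggestion, not part of the original answer) is np.fromiter, which avoids the explicit Python loop:

goodSamples = np.fromiter(samples_iter, dtype=float, count=n)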
Related
I have code where I need 10 million evenly spaced numbers between 0 and 1, and a logic function that picks a random index and returns the sum of the numbers from that index to the end of the list.
The code looks like this:
import random
import numpy as np

ten_million = np.linspace(0.0, 1.0, 10000000)

def deep_dive_logic():
    # this pick is derived from good logic, however, let's just use random here for demonstration
    pick = random.randint(0, 10000000)
    return sum(ten_million[pick:])

for _ in range(2500):
    r = deep_dive_logic()
    print(r)
    # more logic ahead...
The problem is that calling sum() on a slice of this size takes approximately 1.3 s per result.
Is there an efficient way to reduce the 1.3 s wait per call? I also tried building a kind of cache dictionary, but deep_dive_logic() runs in a multi-process environment, so the dictionary would need to be shared; neither redis nor json.dump is an option because the dictionary grows to around 236 MB and adds overhead to inter-process communication if not cached.
sums_dict = {0: sum(ten_million)}
even_difference = (ten_million[1] - ten_million[0])
for i in range(len(ten_million) - 1):
    sums_dict[i+1] = sums_dict[i] - (even_difference * (i+1))
I need help with either caching the 10-million-entry dictionary, an alternative formula that returns the result without using sum(), or any out-of-the-box solution.
https://repl.it/repls/HoneydewGoldenShockwave
np.sum(ten_million) does it in about 0.005 seconds, whereas sum(ten_million) is about 1.5 seconds on my machine
As for a solution without using any out of the box functions, as suggested in the comments to your question by MrT, you can use the property of arithmetic progressions, which says that the sum of a progression is equal to n(a1+an) / 2, where n is the number of elements (10000000), a1 is your first element (0), and an is your last element (1). In your example, this is 10000000(0+1) / 2 = 5000000
so, for your deep_dive_logic function, just return that:
def deep_dive_logic():
    pick = random.randint(0, 10000000)
    return (len(ten_million)-pick)*(ten_million[pick]+ten_million[-1]) / 2
Also does the job extremely fast, in fact, much faster than np.sum: on average, the arithmetic progression calculation took 1.223e-06 seconds, whereas np.sum took 0.00577 seconds on my machine. Makes sense, seeing how it's just one addition, one multiplication, and one division...
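For instance, a worked example of the formula (with an illustrative pick of 4,000,000): there are 10,000,000 - 4,000,000 = 6,000,000 remaining elements, the first of them is about 0.4 and the last is 1.0, so the result is roughly 6,000,000 * (0.4 + 1.0) / 2 = 4,200,000.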
Do it analytically:
def cumm_sum(start, finish, steps, k):
    step = (finish - start) / steps
    pop = (finish - k) / step
    return (pop + 1) * 0.5 * (k + finish)
and the call would be like:
pick = ten_million[random.randint(0, 10000000)]
result = cumm_sum(0.0, 1.0, 10000000, pick)
use math to reduce the problem complexity:
The sum of an arithmetic progression is given by
(m+n)*(m-n+1)*0.5
use np.vectorize to speed up the array operation:
ten_m = 10000000

def sum10m_py(n):
    return (1+n)*(ten_m-n*ten_m+1)*0.5

sum_np = np.vectorize(sum10m_py)
Pick the elements you want, then apply the vectorized function to them:
mask = np.random.randint(0,ten_m,2500)
sums = sum_np(ten_million[mask])
I would like to speed up this code:
import numpy as np
import pandas as pd
a = pd.read_csv(path)
closep = a['Clsprc']
delta = np.array(closep.diff())
upgain = np.where(delta >= 0, delta, 0)
downloss = np.where(delta <= 0, -delta, 0)
up = sum(upgain[0:14]) / 14
down = sum(downloss[0:14]) / 14
u = []
d = []
for x in np.nditer(upgain[14:]):
    u1 = 13 * up + x
    u.append(u1)
    up = u1

for y in np.nditer(downloss[14:]):
    d1 = 13 * down + y
    d.append(d1)
    down = d1
The data below:
0 49.00
1 48.76
2 48.52
3 48.28
...
36785758 13.88
36785759 14.65
36785760 13.19
Name: Clsprc, Length: 36785759, dtype: float64
The for loop is too slow, what can I do to speed up this code? Can I vectorize the entire operation?
It looks like you're trying to calculate an exponential moving average (rolling mean), but forgot the division. If that's the case then you may want to see this SO question. Meanwhile, here's a fast simple moving average using the cumsum() function, taken from the referenced link.
def moving_average(a, n=14):
    ret = np.cumsum(a, dtype=float)
    ret[n:] = ret[n:] - ret[:-n]
    return ret[n - 1:] / n
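For example, applied to the question's arrays (hypothetical usage, assuming upgain and downloss as defined in the question):

up_avg = moving_average(upgain, n=14)      # 14-period simple moving average of gains
down_avg = moving_average(downloss, n=14)  # 14-period simple moving average of losses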
If this is not the case, and you really want the function described, you can increase the iteration speed by using the external_loop flag in your iteration. From the numpy documentation:
The nditer will try to provide chunks that are as large as possible to
the inner loop. By forcing ‘C’ and ‘F’ order, we get different
external loop sizes. This mode is enabled by specifying an iterator
flag.
Observe that with the default of keeping native memory order, the
iterator is able to provide a single one-dimensional chunk, whereas
when forcing Fortran order, it has to provide three chunks of two
elements each.
for x in np.nditer(upgain[14:], flags=['external_loop'], order='F'):
    # x now has x[0], x[1], x[2], x[3], x[4], x[5] elements.
In simplified terms, I think this is what the loops are doing:
upgain = np.array([.1,.2,.3,.4])
u = []
up = 1
for x in upgain:
    u1 = 10*up + x
    u.append(u1)
    up = u1
producing:
[10.1, 101.2, 1012.3, 10123.4]
np.cumprod([10,10,10,10]) is there, plus a modified cumsum for the [.1,.2,.3,.4] terms. But I can't offhand think of a way of combining these with compiled numpy functions. We could write a custom ufunc and use its accumulate. Or we could write it in cython (or another C interface).
https://stackoverflow.com/a/27912352 suggests that frompyfunc is a way of writing a generalized accumulate. I don't expect big time savings, maybe 2x.
To use frompyfunc, define:
def foo(x,y):return 10*x+y
The loop application (above) would be
def loopfoo(upgain, u, u1):
    for x in upgain:
        u1 = foo(u1, x)
        u.append(u1)
    return u
The 'vectorized' version would be:
vfoo=np.frompyfunc(foo,2,1) # 2 in arg, 1 out
vfoo.accumulate(upgain,dtype=object).astype(float)
The dtype=object requirement was noted in the prior SO answer and in https://github.com/numpy/numpy/issues/4155
In [1195]: loopfoo([1,.1,.2,.3,.4],[],0)
Out[1195]: [1, 10.1, 101.2, 1012.3, 10123.4]
In [1196]: vfoo.accumulate([1,.1,.2,.3,.4],dtype=object)
Out[1196]: array([1.0, 10.1, 101.2, 1012.3, 10123.4], dtype=object)
For this small list, loopfoo is faster (3 µs vs 21 µs).
For a 100 element array, e.g. biggain=np.linspace(.1,1,100), the vfoo.accumulate is faster:
In [1199]: timeit loopfoo(biggain,[],0)
1000 loops, best of 3: 281 µs per loop
In [1200]: timeit vfoo.accumulate(biggain,dtype=object)
10000 loops, best of 3: 57.4 µs per loop
For an even larger biggain=np.linspace(.001,.01,1000) (smaller number to avoid overflow), the 5x speed ratio remains.
Say I have a range r=numpy.array(range(1, 6)) and I am calculating the cumulative sum using numpy.cumsum(r). But instead of returning [1, 3, 6, 10, 15] I would like it to return [1, 3, 6] because of the condition that the cumulative result must be less than 10.
If the array is very large, I would like the cumulative sum to break out before it starts calculating values that are redundant and will be thrown away later. Of course, I am trivializing everything here for the sake of the question.
Is it possible to break out of cumsum or cumprod early based on a condition?
I don't think this is possible with any function in numpy, since in most cases these are meant for vectorized computations on fixed-length arrays. One obvious way to do what you want is to break out of a standard for-loop in Python (as I assume you know):
def limited_cumsum(x, limit):
    y = []
    sm = 0
    for item in x:
        sm += item
        if sm > limit:
            return y
        y.append(sm)
    return y
But this would obviously be an order of magnitude slower than numpy's cumsum.
Since you probably need some very specialized function, the chances are low that you will find the exact function you need in numpy. You should probably have a look at Cython, which allows you to implement custom functions that are as flexible as a Python function (using a syntax that is almost Python), with a speed close to that of C.
Depending on the size of the array you are computing the cumulative sum for, and how quickly you expect the target value to be reached, it may be quicker to calculate the cumulative sum in steps.
import numpy as np
size = 1000000
target = size
def stepped_cumsum():
    arr = np.arange(size)
    out = np.empty(len(arr), dtype=int)
    step = 1000
    last_value = 0
    for i in range(0, len(arr), step):
        np.cumsum(arr[i:i+step], out=out[i:i+step])
        out[i:i+step] += last_value
        last_value = out[i+step-1]
        if last_value >= target:
            break
    else:
        return out
    greater_than_target_index = i + (out[i:i+step] >= target).argmax()
    # .copy() required so rest of backing array can be freed
    return out[:greater_than_target_index].copy()

def normal_cumsum():
    arr = np.arange(size)
    out = np.cumsum(arr)
    return out
stepped_result = stepped_cumsum()
normal_result = normal_cumsum()
assert (stepped_result < target).all()
assert (stepped_result == normal_result[:len(stepped_result)]).all()
Results:
In [60]: %timeit cumsum.stepped_cumsum()
1000 loops, best of 3: 1.22 ms per loop
In [61]: %timeit cumsum.normal_cumsum()
100 loops, best of 3: 3.69 ms per loop
Can anyone help me? I'm trying to come up with a way to compute
>>> sum_widths = sum(col.width for col in cols if not col.hide)
and also count the number of items in this sum, without having to make two passes over cols.
It seems unbelievable, but after scanning the std-lib (built-in functions, itertools, functools, etc.), I couldn't even find a function which would count the number of members in an iterable. I found the function itertools.count, which sounds like what I want, but it's really just a deceptively named range function.
After a little thought I came up with the following (which is so simple that the lack of a library function may be excusable, except for its obtuseness):
>>> visable_col_count = sum(col is col for col in cols if not col.hide)
However, using these two functions requires two passes of the iterable, which just rubs me the wrong way.
As an alternative, the following function does what I want:
>>> def count_and_sum(iter):
>>>     count = sum = 0
>>>     for item in iter:
>>>         count += 1
>>>         sum += item
>>>     return count, sum
The problem with this is that it takes 100 times as long (according to timeit) as the sum of a generator expression form.
If anybody can come up with a simple one-liner which does what I want, please let me know (using Python 3.3).
Edit 1
Lots of great ideas here, guys. Thanks to all who replied. It will take me a while to digest all these answers, but I will and I will try to pick one to check.
Edit 2
I repeated the timings on my two humble suggestions (count_and_sum function and 2 separate sum functions) and discovered that my original timing was way off, probably due to an auto-scheduled backup process running in the background.
I also timed most of the excellent suggestions given as answers here, all with the same model. Analysing these answers has been quite an education for me: new uses for deque, enumerate and reduce and first time for count and accumulate. Thanks to all!
Here are the results (from my slow netbook) using the software I'm developing for display:
┌───────────────────────────────────────────────────────┐
│ Count and Sum Timing │
├──────────────────────────┬───────────┬────────────────┤
│ Method │Time (usec)│Time (% of base)│
├──────────────────────────┼───────────┼────────────────┤
│count_and_sum (base) │ 7.2│ 100%│
│Two sums │ 7.5│ 104%│
│deque enumerate accumulate│ 7.3│ 101%│
│max enumerate accumulate │ 7.3│ 101%│
│reduce │ 7.4│ 103%│
│count sum │ 7.3│ 101%│
└──────────────────────────┴───────────┴────────────────┘
(I didn't time the complex and fold methods as being just too obscure, but thanks anyway.)
Since there's very little difference in timing among all these methods I decided to use the count_and_sum function (with an explicit for loop) as being the most readable, explicit and simple (Python Zen) and it also happens to be the fastest!
I wish I could accept one of these amazing answers as correct but they are all equally good though more or less obscure, so I'm just up-voting everybody and accepting my own answer as correct (count_and_sum function) since that's what I'm using.
What was that about "There should be one-- and preferably only one --obvious way to do it."?
Using complex numbers
z = [1, 2, 4, 5, 6]
y = sum(x + 1j for x in z)
sum_z, count_z = y.real, int(y.imag)
print sum_z, count_z
18.0 5
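Adapted to the question's columns it might look like this (a hypothetical adaptation; cols is the sequence of column objects from the question):

y = sum(col.width + 1j for col in cols if not col.hide)
sum_widths, visible_count = y.real, int(y.imag)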
I don't know about speed, but this is kind of pretty:
>>> from itertools import accumulate
>>> it = range(10)
>>> max(enumerate(accumulate(it), 1))
(10, 45)
Adaptation of DSM's answer, using deque(..., maxlen=1) to save memory.
import itertools
from collections import deque
deque(enumerate(itertools.accumulate(x), 1), maxlen=1)
Timing code in IPython:
import itertools, random
from collections import deque

def count_and_sum(iter):
    count = sum = 0
    for item in iter:
        count += 1
        sum += item
    return count, sum
X = [random.randint(0, 10) for _ in range(10**7)]
%timeit count_and_sum(X)
%timeit deque(enumerate(itertools.accumulate(X), 1), maxlen=1)
%timeit (max(enumerate(itertools.accumulate(X), 1)))
Results: the deque version is now faster than the OP's method.
1 loops, best of 3: 1.08 s per loop
1 loops, best of 3: 659 ms per loop
1 loops, best of 3: 1.19 s per loop
Here's some timing data that might be of interest:
import timeit

setup = '''
import random, functools, itertools, collections
x = [random.randint(0, 10) for _ in range(10**5)]

def count_and_sum(it):
    c, s = 0, 0
    for i in it:
        c += 1
        s += i
    return c, s

def two_pass(it):
    return sum(i for i in it), sum(True for i in it)

def functional(it):
    return functools.reduce(lambda pair, x: (pair[0]+1, pair[1]+x), it, [0, 0])

def accumulator(it):
    return max(enumerate(itertools.accumulate(it), 1))

def complex(it):
    cpx = sum(x + 1j for x in it)
    return cpx.real, int(cpx.imag)

def dequed(it):
    return collections.deque(enumerate(itertools.accumulate(it), 1), maxlen=1)
'''

number = 100
for stmt in ['count_and_sum(x)',
             'two_pass(x)',
             'functional(x)',
             'accumulator(x)',
             'complex(x)',
             'dequed(x)']:
    print('{:.4}'.format(timeit.timeit(stmt=stmt, setup=setup, number=number)))
Result:
3.404 # OP's one-pass method
3.833 # OP's two-pass method
8.405 # Timothy Shields's fold method
3.892 # DSM's accumulate-based method
4.946 # 1_CR's complex-number method
2.002 # M4rtini's deque-based modification of DSM's method
Given these results, I'm not really sure how the OP is seeing a 100x slowdown with the one-pass method. Even if the data looks radically different from a list of random integers, that just shouldn't happen.
Also, M4rtini's solution looks like the clear winner.
To clarify, these results are in CPython 3.2.3. For a comparison to PyPy3, see James_pic's answer, which shows some serious gains from JIT compilation for some methods (also mentioned in a comment by M4rtini).
As a follow-up to senshin's answer, it's worth noting that the performance differences are largely due to quirks in CPython's implementation that make some methods slower than others (for example, for loops are relatively slow in CPython). I thought it would be interesting to try the exact same test in PyPy (using PyPy3 2.1 beta), which has different performance characteristics. In PyPy the results are:
0.6227 # OP's one-pass method
0.8714 # OP's two-pass method
1.033 # Timothy Shields's fold method
6.354 # DSM's accumulate-based method
1.287 # 1_CR's complex-number method
3.857 # M4rtini's deque-based modification of DSM's method
In this case, the OP's one-pass method is fastest. This makes sense, as it's arguably the simplest (at least from a compiler's point of view) and PyPy can eliminate many of the overheads by inlining method calls, which CPython can't.
For comparison, CPython 3.3.2 on my machine give the following:
1.651 # OP's one-pass method
1.825 # OP's two-pass method
3.258 # Timothy Shields's fold method
1.684 # DSM's accumulate-based method
3.072 # 1_CR's complex-number method
1.191 # M4rtini's deque-based modification of DSM's method
You can keep count inside the sum with tricks similar to this:
>>> from itertools import count
>>> cnt = count()
>>> sum((next(cnt), x)[1] for x in range(10) if x%2)
25
>>> next(cnt)
5
But it's probably going to be more readable to just use a for loop
You can use this:
from itertools import count
lst = range(10)
c = count(1)
tot = sum(next(c) and x for x in lst if x % 2)
n = next(c)-1
print(n, tot)
# 5 25
It's kind of a hack, but it works perfectly well.
I don't know what the Python syntax is offhand, but you could potentially use a fold. Something like this:
(count, total) = fold((0, 0), lambda pair, x: (pair[0] + 1, pair[1] + x))
The idea is to use a seed of (0,0) and then at each step add 1 to the first component and the current number to the second component.
For comparison, you could implement sum as a fold as follows:
total = fold(0, lambda t, x: t + x)
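In Python the fold is functools.reduce; here is a minimal, runnable sketch of the same idea (the Col namedtuple and the sample data are stand-ins for the question's column objects):

from functools import reduce
from collections import namedtuple

Col = namedtuple("Col", "width hide")            # stand-in for the question's columns
cols = [Col(10, False), Col(20, True), Col(30, False)]

count, total = reduce(
    lambda pair, x: (pair[0] + 1, pair[1] + x),  # add 1 to the count and x to the sum
    (col.width for col in cols if not col.hide),
    (0, 0),                                      # seed: (count, total)
)
print(count, total)                              # 2 40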
1_CR's complex number solution is cute but overly hacky. The reason it works is that a complex number behaves like a 2-tuple that sums elementwise. The same is true of numpy arrays, and I think it's slightly cleaner to use those:
import numpy as np
z = [1, 2, 4, 5, 6]
y = sum(np.array([x, 1]) for x in z)
sum_z, count_z = y[0], y[1]
print sum_z, count_z
18 5
Something else to consider: If it is possible to determine a minimum possible count, we can let the efficient built-in sum do part of the work:
from itertools import islice

def count_and_sum(iterable):
    ...  # insert favorite implementation here

def count_and_sum_with_min_count(iterable, min_count):
    iterator = iter(iterable)
    slice_sum = sum(islice(iterator, None, min_count))
    rest_count, rest_sum = count_and_sum(iterator)
    return min_count + rest_count, slice_sum + rest_sum
For example, using the deque method with a sequence of 10000000 items and min_count of 5000000, timing results are:
count_and_sum: 1.03
count_and_sum_with_min_count: 0.63
Thanks for all the great answers, but I decided to use my original count_and_sum function, called as follows:
>>> cc, cs = count_and_sum(c.width for c in cols if not c.hide)
As explained in the edits to my original question this turned out to be the fastest and most readable solution.
How about this?
It seems to work.
from functools import reduce

class Column:
    def __init__(self, width, hide):
        self.width = width
        self.hide = hide

lst = [Column(10, False), Column(100, False), Column(1000, True), Column(10000, False)]

print(reduce(lambda acc, col: Column(col.width + acc.width, False) if not col.hide else acc, lst, Column(0, False)).width)
You might only need the sum & count today, but who knows what you'll need tomorrow!
Here's an easily extensible solution:
def fold_parallel(itr, **fs):
    res = {
        k: zero for k, (zero, f) in fs.items()
    }
    for x in itr:
        for k, (_, f) in fs.items():
            res[k] = f(res[k], x)
    return res

from operator import add

print(fold_parallel([1, 2, 3],
    count = (0, lambda a, b: a + 1),
    sum = (0, add),
))
# {'count': 3, 'sum': 6}
I am currently writing an app in Python that needs to generate a large amount of random numbers, FAST. Currently I have a scheme that uses numpy to generate all of the numbers in a giant batch (about 500,000 at a time). While this seems to be faster than Python's built-in implementation, I still need it to go faster. Any ideas? I'm open to writing it in C and embedding it in the program, or doing whatever it takes.
Constraints on the random numbers:
A Set of 7 numbers that can all have different bounds:
eg: [0-X1, 0-X2, 0-X3, 0-X4, 0-X5, 0-X6, 0-X7]
Currently I am generating a list of 7 numbers with random values from [0-1) then multiplying by [X1..X7]
A Set of 13 numbers that all add up to 1
Currently just generating 13 numbers then dividing by their sum
Any ideas? Would pre-calculating these numbers and storing them in a file make this faster?
Thanks!
You can speed things up a bit from what mtrw posted above just by doing what you initially described (generating a bunch of random numbers and multiplying and dividing accordingly)...
Also, you probably already know this, but be sure to do the operations in-place (*=, /=, +=, etc) when working with large-ish numpy arrays. It makes a huge difference in memory usage with large arrays, and will give a considerable speed increase, too.
In [53]: def rand_row_doubles(row_limits, num):
   ....:     ncols = len(row_limits)
   ....:     x = np.random.random((num, ncols))
   ....:     x *= row_limits
   ....:     return x
   ....:
In [59]: %timeit rand_row_doubles(np.arange(7) + 1, 1000000)
10 loops, best of 3: 187 ms per loop
As compared to:
In [66]: %timeit ManyRandDoubles(np.arange(7) + 1, 1000000)
1 loops, best of 3: 222 ms per loop
It's not a huge difference, but if you're really worried about speed, it's something.
Just to show that it's correct:
In [68]: x.max(0)
Out[68]:
array([ 0.99999991, 1.99999971, 2.99999737, 3.99999569, 4.99999836,
5.99999114, 6.99999738])
In [69]: x.min(0)
Out[69]:
array([ 4.02099599e-07, 4.41729377e-07, 4.33480302e-08,
7.43497138e-06, 1.28446819e-05, 4.27614385e-07,
1.34106753e-05])
Likewise, for your "rows sum to one" part...
In [70]: def rand_rows_sum_to_one(nrows, ncols):
   ....:     x = np.random.random((ncols, nrows))
   ....:     y = x.sum(axis=0)
   ....:     x /= y
   ....:     return x.T
   ....:
In [71]: %timeit rand_rows_sum_to_one(1000000, 13)
1 loops, best of 3: 455 ms per loop
In [72]: x = rand_rows_sum_to_one(1000000, 13)
In [73]: x.sum(axis=1)
Out[73]: array([ 1., 1., 1., ..., 1., 1., 1.])
Honestly, even if you re-implement things in C, I'm not sure you'll be able to beat numpy by much on this one... I could be very wrong, though!
EDIT: Created functions that return the full set of numbers, not just one row at a time.
EDIT 2: Made the functions more pythonic (and faster) and added a solution for the second question.
For the first set of numbers, you might consider numpy.random.randint or numpy.random.uniform, which take low and high parameters. Generating an array of 7 x 1,000,000 numbers in a specified range seems to take < 0.7 second on my 2 GHz machine:
def LimitedRandInts(XLim, N):
    rowlen = (1, N)
    return [np.random.randint(low=0, high=lim, size=rowlen) for lim in XLim]

def LimitedRandDoubles(XLim, N):
    rowlen = (1, N)
    return [np.random.uniform(low=0, high=lim, size=rowlen) for lim in XLim]
>>> import numpy as np
>>> N = 1000000 #number of randoms in each range
>>> xLim = [x*500 for x in range(1,8)] #convenient limit generation
>>> fLim = [x/7.0 for x in range(1,8)]
>>> aa = LimitedRandInts(xLim, N)
>>> ff = LimitedRandDoubles(fLim, N)
This returns integers in [0,xLim-1] or floats in [0,fLim). The integer version took ~0.3 seconds, the double ~0.66, on my 2 GHz single-core machine.
For the second set, I used @Joe Kingston's suggestion.
def SumToOneRands(NumToSum, N):
    aa = np.random.uniform(low=0, high=1.0, size=(NumToSum, N))  # 13 rows by 1000000 columns, for instance
    s = np.reciprocal(aa.sum(0))
    aa *= s
    return aa.T  # get back to column major order, so aa[k] is the kth set of 13 numbers
>>> ll = SumToOneRands(13, N)
This takes ~1.6 seconds.
In all cases, result[k] gives you the kth set of data.
Try r = 1664525*r + 1013904223
from "an even quicker generator"
in "Numerical Recipes in C" 2nd edition, Press et al., isbn 0521431085, p. 284.
np.random is certainly "more random"; see Linear congruential generator.
In python, use np.uint32 like this:
python -mtimeit -s '
import numpy as np
r = 1
r = np.array([r], np.uint32)[0] # 316 py -> 16 us np
# python longs can be arbitrarily long, so slow
' '
r = r*1664525 + 1013904223 # NR2 p. 284
'
To generate big blocks at a time:
# initialize --
np.random.seed( ... )
R = np.random.randint( 0, np.iinfo( np.uint32 ).max, size, dtype=np.uint32 )
...
R *= 1664525
R += 1013904223
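To turn those 32-bit integers into uniform floats in [0, 1) for the question's use case (a follow-up step that is my assumption, not part of the original recipe):

floats = R.astype(np.float64) / 2**32   # uniform floats in [0, 1)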
Making your code run in parallel certainly couldn't hurt. Try adapting it for SMP with Parallel Python.
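As a rough sketch of that idea using the standard library's multiprocessing module instead of Parallel Python (the limits 1..7, the worker count, and the chunk split are arbitrary assumptions):

import numpy as np
from multiprocessing import Pool

LIMITS = np.arange(1, 8)  # hypothetical upper bounds X1..X7

def draw_chunk(args):
    seed, n = args
    rng = np.random.RandomState(seed)          # independent stream per worker
    return rng.random_sample((n, 7)) * LIMITS  # scale [0, 1) draws up to [0, Xi)

if __name__ == "__main__":
    with Pool(4) as pool:
        chunks = pool.map(draw_chunk, [(s, 250000) for s in range(4)])
    samples = np.vstack(chunks)                # 1,000,000 x 7 array
    print(samples.shape)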
As others have already pointed out, numpy is a very good start, fast and easy to use.
If you need random numbers on a massive scale, consider AES-ECB or RC4. Both can be parallelised; you should reach performance of several GB/s.
achievable numbers posted here
If you have access to multiple cores, the computations can be done in parallel with dask.array:
import dask.array as da
x = da.random.random(size=(rows, cols)).compute()
# .compute is not necessary here, because calculations
# can continue in a lazy form and .compute is used
# on the final result
import random
for i in range(1000000):
    print(random.randint(1, 1000000))
Here's some code in Python that you can use to generate one million random numbers, one per line!
Just a quick example of numpy in action:
data = numpy.random.rand(1000000)
No need for a loop; you can pass in how many numbers you want to generate.