What's the best (fastest) way of extracting a random value from a list, a large number (>1M) of times?
I am currently in a situation where I have a graph represented as an adjacency list, whose inner lists can have vastly different length (in the range [2, potentially 100k]).
I need to iterate over this list to generate random walks, so my current solution is along the lines of:
1. Get random node
2. Pick a random index from that node's adjacency list
3. Move to the new node
4. Go to 2
5. Repeat until the random walk is as long as needed
6. Go to 1
This works well enough when the graph isn't too big, however now I am working with a graph that contains >440k nodes, with a very large variance in the number of edges each node has.
The function I am using at the moment to extract the random index is
node_neighbors[int(random.random() * number_neighbors_of_node)]
This sped up the computation over my previous implementation, but it's still unacceptably slow for my purposes.
The number of neighbors of a node ranges from 2 to tens of thousands, I cannot remove small nodes, and I have to generate tens of thousands of random walks in this environment.
From profiling the code, most of the generation time is spent looking for these indexes, so I'm looking for a method that can reduce the time taken to do so. However, if it's possible to sidestep it entirely by modifying the algorithm, that would also be great.
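For reference, a stripped-down sketch of what the generation loop looks like (toy data and illustrative names, not my actual code):

import random

# toy adjacency list standing in for the real graph (>440k nodes, 2 to ~100k neighbors)
adjacency = [list(range(random.randint(2, 100))) for _ in range(1000)]

def random_walk(walk_length):
    node = int(random.random() * len(adjacency))                  # 1. get a random start node
    walk = [node]
    for _ in range(walk_length - 1):
        neighbors = adjacency[node]
        node = neighbors[int(random.random() * len(neighbors))]  # 2.-3. jump to a random neighbor
        walk.append(node)
    return walk

# repeat for as many walks as needed
walks = [random_walk(80) for _ in range(1000)]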
Thanks!
EDIT: out of curiosity, I tested four variants of the same code using timeit and these are the results:
setup='''
import numpy as np
import random
# generate a random adjacency list, nodes have a number of neighbors between 2 and 10000
l = [list(range(random.randint(2, 10000))) for _ in range(10000)]
'''
for _ in range(1000):
    v = l[random.randint(0, 10000-1)] # Get a random node adj list
    vv = v[random.randint(0, len(v)-1)] # Find a random neighbor in v
0.29709450000001425

for _ in range(1000):
    v = l[random.randint(0, 10000-1)]
    vv = v[np.random.choice(v)]
26.760767499999986

for _ in range(1000):
    v = l[random.randint(0, 10000-1)]
    vv = v[int(random.random()*(len(v)))]
0.19086300000000733

for _ in range(1000):
    v = l[random.randint(0, 10000-1)]
    vv = v[int(random.choice(v))]
0.24351880000000392
@Alain T. has already answered the specific question about graphs.
My answer will address the generic question, "What's the fastest way of finding a random index in a Python list, a large number of times?", i.e. generating a random subsample of N elements from a given list or NumPy array.
We will evaluate the following methods:
## Returns Python list
random_subset = random.sample(data, N)
## Returns numpy.ndarray
random_selection = np.random.choice(data, N)
## Obtain a single random idx using random.randrange, N times, used to index data
random_selection = [data[idx] for idx in (random.randrange(0, len(data)) for _ in range(N))]
## Obtain a single random idx using np.random.randint, N times, used to index data
random_selection = [data[idx] for idx in (np.random.randint(0, len(data)) for _ in range(N))]
## Create a NumPy array of `N` random indexes using np.random.randint, used to index data
random_selection = [data[idx] for idx in np.random.randint(0, len(data), N)]
Assume we have 1MM data values (the exact values do not matter).
data_nparray = np.random.random(int(1e6))
data_list = list(data_nparray)
As it turns out, when evaluating the fastest way to generate a series of random elements, the answer changes depending on whether the data is stored as a Python list or as a NumPy array.
Random selection from a Python list or tuple
The plot above shows the increase in time (microseconds) when randomly selecting elements from a Python list.
- For <50 elements, random.sample is the fastest (i.e. random.sample(data, N))
- For >= 50 elements, it is fastest to create an array of N random indexes using np.random.randint (i.e. [data[idx] for idx in np.random.randint(0, len(data), N)])
- np.random.choice is extremely slow here, as it converts the data into a NumPy array each time. However, for >100,000 selections, it becomes the fastest option.
You can probably get away with using random.sample throughout.
The same results were found to hold for Python tuples as well.
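A rule of thumb based on these measurements might look like the following sketch (the ~50-element crossover is approximate and machine-dependent; note that random.sample draws without replacement, while the index-based approach draws with replacement, so they are only interchangeable when duplicates don't matter):

import random
import numpy as np

def random_subset_from_list(data, N):
    # approximate crossover taken from the measurements above
    if N < 50:
        return random.sample(data, N)             # fastest for small N, no duplicates
    idx = np.random.randint(0, len(data), N)      # one batch of N random indexes
    return [data[i] for i in idx]

data_list = list(np.random.random(int(1e6)))
small = random_subset_from_list(data_list, 10)
large = random_subset_from_list(data_list, 500)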
Random selection from a NumPy array
The plot above shows the increase in time (microseconds) when randomly selecting elements from a NumPy array.
random.sample does not work with NumPy arrays, only lists. Converting 1MM elements from a NumPy array to a list itself takes ~23,000 microseconds on average, so we will skip random.sample.
- For <10 elements, it is fastest to select N random indexes one-by-one using random.randrange (i.e. [data[idx] for idx in (random.randrange(0, len(data)) for _ in range(N))])
- For 10 to 50 elements, it is fastest to create an array of N random indexes using np.random.randint (i.e. [data[idx] for idx in np.random.randint(0, len(data), N)])
- For >50 elements, it is fastest to use np.random.choice (i.e. np.random.choice(data, N)), which also returns a NumPy array, not a list.
You can probably get away with using random.randrange for <25 elements, then switching to np.random.choice.
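A similar helper for NumPy arrays, again only a sketch with an approximate crossover:

import random
import numpy as np

def random_subset_from_array(data, N):
    # approximate crossover taken from the measurements above;
    # np.random.choice draws with replacement by default
    if N < 25:
        return [data[random.randrange(0, len(data))] for _ in range(N)]
    return np.random.choice(data, N)

data_nparray = np.random.random(int(1e6))
small = random_subset_from_array(data_nparray, 5)
large = random_subset_from_array(data_nparray, 10_000)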
Code to reproduce
import time
import random
import numpy as np
import pandas as pd
from IPython.display import display
def time_fn(fn, num_runs=7, num_loops=1_000):
    run_times = []
    for num_run_i in range(num_runs):
        start = time.time()
        for _ in range(num_loops):
            fn()
        end = time.time()
        run_times.append((end-start)/num_loops)
    return run_times
def create_times_df_row(data, N, fn, fn_name, num_runs=7, num_loops=1_000):
    try:
        run_times = time_fn(fn, num_runs=num_runs, num_loops=num_loops)
        return [
            {
                'N': N,
                'function': fn_name,
                'run_time_avg': np.mean(run_times),
                'run_time_std': np.std(run_times),
                'num_runs': num_runs,
                'num_loops': num_loops,
                'input_type': str(type(data)),
                'return_type': str(type(fn())),
            }
        ]
    except Exception as e:
        print(e)
        return []
data_nparray = np.random.random(int(1e4))
data_list = list(data_nparray)
all_times_df = []
for N in list(range(1, 10)) + list(range(10, 25, 5)) + list(range(25, 100, 25)) + list(range(100, 1000+1, 100)):
    times_df = []
    for data in [data_nparray, data_list]:
        num_loops = 1_000
        if N >= 100:
            num_loops = 100
        if N >= 1_000:
            num_loops = 10
        times_df.extend(create_times_df_row(
            data=data,
            N=N,
            fn=lambda: np.random.choice(data, N),
            fn_name='np.random.choice',
            num_loops=num_loops if isinstance(data, np.ndarray) else 10,
        ))
        times_df.extend(create_times_df_row(
            data=data,
            N=N,
            fn=lambda: random.sample(data, N),
            fn_name='random.sample',
            num_loops=num_loops,
        ))
        times_df.extend(create_times_df_row(
            data=data,
            N=N,
            fn=lambda: [data[idx] for idx in (random.randrange(0, len(data)) for _ in range(N))],
            fn_name='random.randrange',
            num_loops=num_loops,
        ))
        times_df.extend(create_times_df_row(
            data=data,
            N=N,
            fn=lambda: [data[idx] for idx in (np.random.randint(0, len(data)) for _ in range(N))],
            fn_name='np.random.randint (single idx)',
            num_loops=num_loops,
        ))
        times_df.extend(create_times_df_row(
            data=data,
            N=N,
            fn=lambda: [data[idx] for idx in np.random.randint(0, len(data), N)],
            fn_name='np.random.randint (idx array)',
            num_loops=num_loops,
        ))
    print(N, end=' ')
    times_df = pd.DataFrame(times_df)
    display(times_df)
    all_times_df.append(times_df)
all_times_df = pd.concat(all_times_df).reset_index(drop=True)
from __future__ import division
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import time
from typing import *
import pandas as pd
import seaborn as sns
# Assignment Constants
RANDOM_STATE = 10
FIGSIZE = (12.0, 9.0)
#### Use the following line before plt.plot(....) to increase the plot size ####
plt.figure(figsize=FIGSIZE)
### Adjust some plot display options:
sns.set(rc={
    'figure.figsize': FIGSIZE,
    'axes.facecolor': 'white',
    'figure.facecolor': 'white',
    'grid.color': 'gainsboro',
})
sns.set_context("talk")
def time_to_ms(row):
    row['run_time_avg'] = 1000*row['run_time_avg']
    row['run_time_std'] = 1000*row['run_time_std']
    return row

def time_to_us(row):
    row['run_time_avg'] = 1e6*row['run_time_avg']
    row['run_time_std'] = 1e6*row['run_time_std']
    return row
cmap = dict(zip(all_times_df['function'].unique(), sns.color_palette('tab10')))
xticks = [1, 2, 3, 5, 7, 10, 15, 25, 50, 100, 200, 500, 1000,]
## Plot for Python list
line_plot = sns.lineplot(
    data=all_times_df.query(f'''input_type == "<class 'list'>"''').apply(time_to_us, axis=1),
    x='N', y='run_time_avg', hue='function', linewidth=3, palette=cmap,
)
scatter_plot = sns.scatterplot(
    data=all_times_df.query(f'''input_type == "<class 'list'>"''').apply(time_to_us, axis=1),
    x='N', y='run_time_avg', hue='function', linewidth=3, palette=cmap,
    legend=False,
)
line_plot.set_xscale("log")
line_plot.set_yscale("log")
plt.xticks(xticks)
line_plot.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
plt.xlabel("Size")
plt.ylabel("Time (µs)")
line_plot.set_title("Random selection from a Python list")
plt.legend(bbox_to_anchor=(0.5,0.5))
plt.ylim(7e-1, 2e5)
display(line_plot)
## Plot for Numpy array
line_plot = sns.lineplot(
    data=all_times_df.query(f'''input_type == "<class 'numpy.ndarray'>"''').apply(time_to_us, axis=1),
    x='N', y='run_time_avg', hue='function', linewidth=3, palette=cmap,
    # legend=False,
)
sns.scatterplot(
    data=all_times_df.query(f'''input_type == "<class 'numpy.ndarray'>"''').apply(time_to_us, axis=1),
    x='N', y='run_time_avg', hue='function', linewidth=3, palette=cmap,
    legend=False,
)
line_plot.set_xscale("log")
line_plot.set_yscale("log")
plt.xticks(xticks)
line_plot.get_xaxis().set_major_formatter(matplotlib.ticker.ScalarFormatter())
plt.xlabel("Size")
plt.ylabel("Time (µs)")
line_plot.set_title("Random selection from a NumPy array")
plt.legend(bbox_to_anchor=(1, 1))
plt.ylim(7e-1, 2e5)
display(line_plot)
Your solution (sol3) is already the fastest by a larger margin than your tests suggest. I adapted the performance measurements to eliminate the arbitrary selection of nodes in favour of a path traversal that will be much closer to your stated goal.
Here are the improved performance tests and results. I added sol5() to see if pre-computing a list of random values would make a difference (I was hoping numpy would vectorize that, but it didn't go any faster).
Setup
import numpy as np
import random
# generate a random adjacency list, nodes have a number of neighbors between 2 and 10000
nodes = [list(range(random.randint(2, 10000))) for _ in range(10000)]
pathLen = 1000
Solutions
def sol1():
    node = nodes[0]
    for _ in range(pathLen):
        node = nodes[random.randint(0, len(node)-1)] # move to a random neighbor

def sol2():
    node = nodes[0]
    for _ in range(pathLen):
        node = nodes[np.random.choice(node)]

def sol3():
    node = nodes[0]
    for _ in range(pathLen):
        node = nodes[int(random.random()*(len(node)))]

def sol4():
    node = nodes[0]
    for _ in range(pathLen):
        node = nodes[int(random.choice(node))]

def sol5():
    node = nodes[0]
    for rng in np.random.random_sample(pathLen):
        node = nodes[int(rng*len(node))]
Measurements
from timeit import timeit
count = 100
print("sol1",timeit(sol1,number=count))
print("sol2",timeit(sol2,number=count))
print("sol3",timeit(sol3,number=count))
print("sol4",timeit(sol4,number=count))
print("sol5",timeit(sol5,number=count))
sol1 0.12516996199999975
sol2 30.445685411
sol3 0.03886452900000137
sol4 0.1244026900000037
sol5 0.05330073100000021
numpy is not very good at manipulating matrices that have variable dimensions (like your neighbor lists) but perhaps a way to accelerate the process is to vectorize the next node selection. By assigning a random float to each node in a numpy array, you can use that to navigate between nodes until your path comes back to an already visited node. Only then would you need to generate a new random value for that node. Presumably, and depending on the path length, there would be a relatively small number of these "collisions".
Using the same idea, and leveraging numpy's vectorizing, you can make several traversals in parallel by creating a matrix of node identifiers (columns) with each row being a parallel traversal.
To illustrate this, here's a function that advances multiple "ants" on their individual random paths through the nodes:
import numpy as np
import random
nodes = [list(range(random.randint(2, 10000))) for _ in range(10000)]
nbLinks = np.array(list(map(len,nodes)),dtype=np.int64) # number of neighbors per node
npNodes = np.array([nb+[-1]*(10000-len(nb)) for nb in nodes]) # fixed sized rows for numpy

def moveAnts(antCount=12,stepCount=8,antPos=None,allPaths=False):
    if antPos is None:
        antPos = np.random.choice(len(nodes),antCount)
    paths = antPos[:,None]
    for _ in range(stepCount):
        nextIndex = np.random.random_sample(size=(antCount,))*nbLinks[antPos]
        antPos = npNodes[antPos,nextIndex.astype(np.int64)]
        if allPaths:
            paths = np.append(paths,antPos[:,None],axis=1)
    return paths if allPaths else antPos
Example: 12 ants advancing randomly by 8 steps from random starting locations
print(moveAnts(12,8,allPaths=True))
"""
[[8840 1302 3438 4159 2983 2269 1284 5031 1760]
[4390 5710 4981 3251 3235 2533 2771 6294 2940]
[3610 2059 1118 4630 2333 552 1375 4656 6212]
[9238 1295 7053 542 6914 2348 2481 718 949]
[5308 2826 2622 17 78 976 13 1640 561]
[5763 6079 1867 7748 7098 4884 2061 432 1827]
[3196 3057 27 440 6545 3629 243 6319 427]
[7694 1260 1621 956 1491 2258 676 3902 582]
[1590 4720 772 1366 2112 3498 1279 5474 3474]
[2587 872 333 1984 7263 168 3782 823 9]
[8525 193 449 982 4521 449 3811 2891 3353]
[6824 9221 964 389 4454 720 1898 806 58]]
"""
Performance is not better on individual ants, but in parallel the time per ant is much better:
from timeit import timeit
count = 100
antCount = 100
stepCount = 1000
ap = np.random.choice(len(nodes),antCount)
t = timeit(lambda:moveAnts(antCount,stepCount,ap),number=count)
print(t) # 0.9538277329999989 / 100 --> 0.009538277329999989 per ant
[EDIT] I thought of a better array model for the variable sized rows and came up with an approach that will not waste memory in a (mostly empty) matrix of fixed dimension. The approach is to use a 1D array to hold the links of all nodes consecutively and two additional arrays to hold the starting position and number of neighbours. This data structure turns out to run even faster than the fixed sized 2D matrix.
import numpy as np
import random
nodes = [list(range(random.randint(2, 10000))) for _ in range(10000)]
links = np.array(list(n for neighbors in nodes for n in neighbors))
linkCount = np.array(list(map(len,nodes)),dtype=np.int64) # number of neighbors for each node
firstLink = np.insert(np.add.accumulate(linkCount),0,0) # index of first link for each node

def moveAnts(antCount=12,stepCount=8,antPos=None,allPaths=False):
    if antPos is None:
        antPos = np.random.choice(len(nodes),antCount)
    paths = antPos[:,None]
    for _ in range(stepCount):
        nextIndex = np.random.random_sample(size=(antCount,))*linkCount[antPos]
        antPos = links[firstLink[antPos]+nextIndex.astype(np.int64)]
        if allPaths:
            paths = np.append(paths,antPos[:,None],axis=1)
    return paths if allPaths else antPos
from timeit import timeit
count = 100
antCount = 100
stepCount = 1000
ap = np.random.choice(len(nodes),antCount)
t = timeit(lambda:moveAnts(antCount,stepCount,ap),number=count)
print(t) # 0.7157810379999994 / 100 --> 0.007157810379999994 per ant
The performance "per ant" improves as you add more of them, up to a point (roughly 10x faster than sol3):
antCount = 1000
stepCount = 1000
ap = np.random.choice(len(nodes),antCount)
t = timeit(lambda:moveAnts(antCount,stepCount,ap),number=count)
print(t,t/antCount) #3.9749405650000007, 0.0039749405650000005 per ant
antCount = 10000
stepCount = 1000
ap = np.random.choice(len(nodes),antCount)
t = timeit(lambda:moveAnts(antCount,stepCount,ap),number=count)
print(t,t/antCount) #32.688697579, 0.0032688697579 per ant
I need to find where a smaller 2D array, array1, most closely matches a location inside another 2D array, array2.
array1 will have a size (grid_size) of 46x46 up to 96x96.
array2 will be larger (184x184).
I only have access to numpy.
I am currently trying to use the Tversky formula but am not tied to it.
Efficiency is the most important part, as this will run many times. My current solution, shown below, is very slow.
for i in range(array2.shape[0] - grid_size):
    for j in range(array2.shape[1] - grid_size):
        r[i, j] = np.sum(array2[i:i+grid_size, j:j+grid_size] == array1) / (np.sum(array2[i:i+grid_size, j:j+grid_size] != array1) + np.sum(Si[i:i+grid_size, j:j+grid_size] == array1))
Edit:
The goal is to find the location where a smaller image matches another image.
Here is an FFT/convolution based approach that minimizes Euclidean distance:
import numpy as np
from numpy import fft
N = 184
n = 46
pad = 192
def best_offs(A,a):
    A,a = A.astype(float),a.astype(float)
    Ap,ap = (np.zeros((pad,pad)) for _ in "Aa")
    Ap[:N,:N] = A
    ap[:n,:n] = a
    sim = fft.irfft2(fft.rfft2(ap).conj()*fft.rfft2(Ap))[:N-n+1,:N-n+1]
    Ap[:N,:N] = A*A
    ap[:n,:n] = 1
    ref = fft.irfft2(fft.rfft2(ap).conj()*fft.rfft2(Ap))[:N-n+1,:N-n+1]
    return np.unravel_index((ref-2*sim).argmin(),sim.shape)
# example
# random picture
A = np.random.randint(0,256,(N,N),dtype=np.uint8)
# random offset
offy,offx = np.random.randint(0,N-n+1,2)
# sub pic at random offset
# randomly flip half of the least significant 75% of all bits
a = A[offy:offy+n,offx:offx+n] ^ np.random.randint(0,64,(n,n))
# reconstruct offset
oyrec,oxrec = best_offs(A,a)
assert offy==oyrec and offx==oxrec
# speed?
from timeit import timeit
print(timeit(lambda:best_offs(A,a),number=100)*10,"ms")
# example with zero a
a[...] = 0
# make A smaller in a matching subsquare
A[offy:offy+n,offx:offx+n]>>=1
# reconstruct offset
oyrec,oxrec = best_offs(A,a)
assert offy==oyrec and offx==oxrec
Sample run:
3.458537160186097 ms
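Why this works: for each candidate offset, the squared Euclidean distance expands as sum(W^2) - 2*sum(W*a) + sum(a^2), where W is the window of A at that offset. sum(a^2) is the same for every offset, ref holds the per-window sum(W^2), and sim holds the cross-correlation sum(W*a), so minimizing ref - 2*sim minimizes the distance. A brute-force cross-check (slow, only useful as a sanity test; assumes N, n and best_offs from above):

import numpy as np

# fresh random example
A = np.random.randint(0, 256, (N, N), dtype=np.uint8)
offy, offx = np.random.randint(0, N - n + 1, 2)
a = A[offy:offy+n, offx:offx+n] ^ np.random.randint(0, 64, (n, n))

def brute_force(A, a):
    A, a = A.astype(float), a.astype(float)
    # squared Euclidean distance for every possible offset
    dist = np.array([[((A[i:i+n, j:j+n] - a)**2).sum()
                      for j in range(N - n + 1)]
                     for i in range(N - n + 1)])
    return np.unravel_index(dist.argmin(), dist.shape)

assert brute_force(A, a) == best_offs(A, a)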
Given a numpy array M, I want to calculate the matrix product M[i] @ M[j].T of every 2-combination of matrices of this array. After having applied some operations to that matrix (the product), I want to store the results in another matrix at position [i,j]. Is there a way to compute this fast without iterating over two nested loops?
I.e. what I want to avoid (because it takes literally hours) is:
import numpy as np
M = np.random.rand(7000,3,3)
r = np.zeros((len(M), len(M)))
for i in range(len(r)):
    for j in range(len(r[0])):
        n = M[i] @ M[j].T
        r[i,j] = np.linalg.norm(n)
Assuming you wanted r's shape to be (M.shape[0], M.shape[0]).
M = np.random.rand(700,3,3)
t = M.shape[0]
r = np.zeros((t, t))
This is equivalent to the first statement of your inner loop:
q = M[:,None,...] @ M.swapaxes(1,2)
And this completes the inner/outer loops:
p = np.linalg.norm(q, axis=(2,3))
for i in range(len(r)):
    for j in range(len(r[0])):
        n = M[i] @ M[j].T
        r[i,j] = np.linalg.norm(n)
>>> np.all(np.isclose(p,r))
True
With M.shape -> (70,3,3) it is about 42 times faster than the for loop.
With M.shape -> (700,3,3) it is about 36 times faster than the for loop.
My poor computer cannot handle M.shape --> (7000,3,3) ... MemoryError.
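If memory is the limiting factor, one possible workaround (a sketch, not tested at the full 7000x3x3 size) is to keep the vectorized product but process the first axis in chunks of rows, so only a (chunk, t, 3, 3) intermediate exists at any time:

import numpy as np

M = np.random.rand(7000, 3, 3)
t = M.shape[0]
r = np.empty((t, t))
MT = M.swapaxes(1, 2)          # pre-transpose once
chunk = 500                    # tune to available memory

for start in range(0, t, chunk):
    stop = min(start + chunk, t)
    q = M[start:stop, None, ...] @ MT            # shape (stop-start, t, 3, 3)
    r[start:stop] = np.linalg.norm(q, axis=(2, 3))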
In a UxU periodic domain, I simulate the dynamics of a 2D array, with entries denoting x-y coordinates. At each time step, the "parent" entries are replaced by new coordinates selected from their normally distributed "offsprings", keeping the array size the same. To illustrate:
import numpy as np
import random
np.random.seed(13)
def main(time_step=10):
    def dispersal(self, litter_size_):
        return np.random.multivariate_normal([self[0], self[1]], [[sigma**2*1, 0], [0, 1*sigma**2]], litter_size_) % U
    U = 10
    sigma = 2.
    parent = np.random.random(size=(4,2))*U
    for t in range(time_step):
        offspring = []
        for parent_id in range(len(parent)):
            litter_size = np.random.randint(1,4) # 1-3 offsprings reproduced per parent
            offspring.append(dispersal(parent[parent_id], litter_size))
        offspring = np.vstack(offspring)
        indices = np.arange(len(offspring))
        parent = offspring[np.random.choice(indices, 4, replace=False)] # only 4 survives to parenthood
    return parent
However, the function is inefficient to run, as indicated by:
from timeit import timeit
timeit(main, number=10000)
which returns 40.13353896141052 seconds.
A quick check with cProfile seems to identify Numpy's multivariate_normal function as a bottleneck.
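For reference, that profiling check can be reproduced with something like the following (a sketch; assumes main as defined above, run in the same session):

import cProfile
import pstats

# profile one call and list the most expensive functions by cumulative time
cProfile.run('main(time_step=10)', 'dispersal.prof')
pstats.Stats('dispersal.prof').sort_stats('cumulative').print_stats(10)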
Is there a more efficient way to implement this operation?
Yeah, many functions in NumPy are relatively expensive when used on single numbers, as multivariate_normal shows in this case. Because the number of offspring lies in the narrow range [1, 3], it's worthwhile to pre-compute the random samples. We can draw samples around mean (0, 0) and add the actual parent coordinates during the iteration.
The inner loop can also be vectorized, resulting in:
def main_2(time_step=10, n_parent=4, max_offspring=3):
    U = 10
    sigma = 2.
    cov = [[sigma**2, 0], [0, sigma**2]]
    size = n_parent * max_offspring * time_step
    samples = np.random.multivariate_normal(np.zeros(2), cov, size)
    parents = np.random.rand(n_parent, 2) * U
    for _ in range(time_step):
        litter_size = np.random.randint(1, max_offspring+1, n_parent)
        n_offspring = litter_size.sum()
        parents = np.repeat(parents, litter_size, axis=0)
        offspring = (parents + samples[:n_offspring]) % U
        samples = samples[n_offspring:]
        parents = np.random.permutation(offspring)[:n_parent]
    return parents
The timings I get are:
In [153]: timeit(main, number=1000)
Out[153]: 9.255848071099535
In [154]: timeit(main_2, number=1000)
Out[154]: 0.870663221881841