How to improve the runtime of this matrix generation loop in python3? - python

I have code that is simulating interactions between lots of particles. Using profiling, I've worked out that the function that's causing the most slowdown is a loop which iterates over all my particles and works out the time for the collision between each of them. This generates a symmetric matrix, which I then take the minimum value out of.
def find_next_collision(self, print_matrix = False):
    """
    Sets up a matrix of collision times
    Returns the indices of the balls in self.list_of_balls that are due to
    collide next and the time to the next collision
    """
    self.coll_time_matrix = np.zeros((np.size(self.list_of_balls), np.size(self.list_of_balls)))
    for i in range(np.size(self.list_of_balls)):
        for j in range(i+1):
            if (j==i):
                self.coll_time_matrix[i][j] = np.inf
            else:
                self.coll_time_matrix[i][j] = self.list_of_balls[i].time_to_collision(self.list_of_balls[j])
    matrix = self.coll_time_matrix + self.coll_time_matrix.T
    self.coll_time_matrix = matrix
    ind = np.unravel_index(np.argmin(self.coll_time_matrix, axis = None), self.coll_time_matrix.shape)
    dt = self.coll_time_matrix[ind]
    if (print_matrix):
        print(self.coll_time_matrix)
    return dt, ind
This code is a method inside a class which defines the positions of all of the particles. Each of these particles is an object saved in self.list_of_balls (which is a list). As you can see, I'm already only iterating over half of this matrix, but it's still quite a slow function. I've tried using numba, but this is a section of quite a large code, and I don't want to have to optimize every function with numba when this is the slow one.
Does anyone have any ideas for a more efficient way to write this function?
Thank you in advance!

As Raubsauger mentioned in their answer, evaluating ifs is slow:
for j in range(i+1):
    if (j==i):
You can get rid of this if by simply doing for j in range(i). That way j goes from 0 to i-1.
You should also try to avoid loops when possible. You can do this by expressing your problem in a vectorized way, and using numpy or scipy functions that leverage SIMD operations to speed up calculations. Here's a simplified example assuming the time_to_collision simply divides Euclidean distance by speed. If you store the coordinates and speeds of the balls in a numpy array instead of storing ball objects in a list, you could do:
from scipy.spatial.distance import pdist
rel_distances = pdist(ball_coordinates)
rel_speeds = pdist(ball_speeds)
time = rel_distances / rel_speeds
pdist documentation
Of course, this isn't going to work verbatim if your time_to_collision function is more complicated, but it should point you in the right direction.
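For the concrete case of hard spheres, time_to_collision usually solves a quadratic in the relative position and velocity. Here is a rough sketch of a fully vectorized version under that assumption (equal radii, positions and velocities stored as (N, dim) numpy arrays; the function and variable names are mine, not from the original code):
import numpy as np

def pairwise_collision_times(pos, vel, radius):
    # pos, vel: (N, dim) arrays of ball positions and velocities; radius: common ball radius
    n = pos.shape[0]
    i, j = np.triu_indices(n, k=1)
    dr = pos[i] - pos[j]
    dv = vel[i] - vel[j]
    # collision when |dr + t*dv| = 2*radius:
    # (dv.dv) t^2 + 2 (dr.dv) t + (dr.dr - 4 R^2) = 0
    a = np.einsum('ij,ij->i', dv, dv)
    b = np.einsum('ij,ij->i', dr, dv)
    c = np.einsum('ij,ij->i', dr, dr) - (2.0 * radius) ** 2
    disc = b ** 2 - a * c
    t = np.full(i.shape, np.inf)
    hit = (b < 0) & (disc > 0)                    # pair is approaching and has a real root
    t[hit] = (-b[hit] - np.sqrt(disc[hit])) / a[hit]
    times = np.full((n, n), np.inf)               # diagonal stays at inf, as in the original
    times[i, j] = t
    times[j, i] = t
    return times
np.argmin and np.unravel_index can then be applied to the returned matrix exactly as in the original find_next_collision.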

First Question: How many particles do you have?
If you have a lot of particles: one improvement would be
for i in range(np.size(self.list_of_balls)):
    for j in range(i):
        self.coll_time_matrix[i][j] = self.list_of_balls[i].time_to_collision(self.list_of_balls[j])
    self.coll_time_matrix[i][i] = np.inf
Frequently executed ifs slow everything down; avoid them in inner loops.
2nd question: is it necessary to compute this every time? Wouldn't it be faster to compute the collision times once and only refresh those rows and columns which were involved in a collision?
Edit:
The idea here is to initially calculate either the time left or (better solution) the timestamp for each collision, as you already do, as well as the order. But instead of throwing the calculated results away, you only update the values when needed. This way you need to calculate only 2*n instead of n^2/2 values per collision.
Sketch:
# init step, done once at the beginning, might need its own function
matrix = ...  # calculate the matrix like before; I assume you use timestamps instead of time left
min_times = np.zeros(np.size(self.list_of_balls))
for i in range(np.size(self.list_of_balls)):
    min_times[i] = min(self.coll_time_matrix[i])
order_coll = np.argsort(min_times)
ind = order_coll[0]
dt = min_times[ind]
return dt, ind

# function step: if a collision happened, order_coll[0] and order_coll[1] hit each other,
# so only their rows and columns need to be recomputed
for balls in order_coll[0:2]:
    for i in range(np.size(self.list_of_balls)):
        self.coll_time_matrix[balls][i] = self.list_of_balls[balls].time_to_collision(self.list_of_balls[i])
        self.coll_time_matrix[i][balls] = self.coll_time_matrix[balls][i]
    self.coll_time_matrix[balls][balls] = np.inf
for i in range(np.size(self.list_of_balls)):
    min_times[i] = min(self.coll_time_matrix[i])
order_coll = np.argsort(min_times)
ind = order_coll[0]
dt = min_times[ind]
return dt, ind
If you store time left in the matrix instead of timestamps, you have to subtract the elapsed time from the matrix after each step, as sketched below. You also need to store the matrix and (optionally) min_times and order_coll between calls.
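A minimal sketch of that "time left" bookkeeping, assuming dt is the time you just advanced the simulation by (attribute names as above):
# after moving every ball forward by dt, all stored collision times shrink by dt
self.coll_time_matrix -= dt
min_times -= dt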

Related

Why is this numba code to store and process pointcloud data slower than the pure Python version?

I need to store some data structure like that:
{'x1,y1,z1': [[p11_x,p11_y,p11_z], [p12_x,p12_y,p12_z], ..., [p1n_x,p1n_y,p1n_z]],
'x2,y2,z2': [[p21_x,p21_y,p21_z], [p22_x,p22_y,p22_z], ..., [p2n_x,p2n_y,p2n_z]],
...
'xn,yn,zn': [[pn1_x,pn1_y,pn1_z], [pn2_x,pn2_y,pn2_z], ..., [pnm_x,pnm_y,pnm_z]]}
Every key is a grid cell index and the value is a list of classified points. The list can be variable length but I can set it static, for example 1000 elements.
For now I tried something like this:
np.zeros(shape=(100,100,100,50,3))
But if I use numba.jit with that function, the execution time is a few times worse than with pure Python.
Simple Python example of what I want to do:
def split_into_grid_py(points: np.array):
    grid = {}
    for point in points:
        r_x = round(point[0])
        r_y = round(point[1])
        r_z = round(point[2])
        try:
            grid[(r_x, r_y, r_z)].append(point)
        except KeyError:
            grid[(r_x, r_y, r_z)] = [point]
    return grid
Is there any efficient way of doing that with numba?
Times per 10 executions in a loop are:
numba: 7.050494909286499
pure Python: 1.0014197826385498
With the same data set, so it's crap optimization.
My numba code:
@numba.jit(nopython=True)
def split_into_grid(points: np.array):
    grid = np.zeros(shape=(100,100,100,50,3))
    for point in points:
        r_x = round(point[0])
        r_y = round(point[1])
        r_z = round(point[2])
        i = 0
        for cell in grid[r_x][r_y][r_z]:
            if not np.sum(cell):
                grid[r_x][r_y][r_z][i] = point
                break
            i += 1
    return grid
The pure Python version appends items in O(1) time thanks to the dictionary container, while the Numba version uses an O(n) array search (bounded by 50). Moreover, np.zeros(shape=(100,100,100,50,3)) allocates an array of about 1 GiB, which results in many cache misses since the computation is done in RAM. Meanwhile, the pure Python version may fit in the CPU caches. There are two strategies to solve that.
The first strategy is to use 3 containers. An array keyGrid maps each grid cell to an offset in a second array valueGrid, or -1 if there is no point associated with this cell. valueGrid contains all the points for a given grid cell. Finally, countingGrid counts the number of points per grid cell. Here is an untested example:
@numba.jit(nopython=True)
def split_into_grid(points: np.array):
    # Note: a signed type is needed so that -1 can mark empty cells;
    # np.int32 is enough unless there are far more filled cells than that
    keyGrid = np.full((100,100,100), -1, dtype=np.int32)
    i = 0
    for point in points:
        r_x = round(point[0])
        r_y = round(point[1])
        r_z = round(point[2])
        if keyGrid[r_x,r_y,r_z] < 0:
            keyGrid[r_x,r_y,r_z] = i
            i += 1
    uniqueCloudPointCount = i
    # Note: the number of points per grid cell is also bounded by the type (uint8 -> at most 255)
    countingGrid = np.zeros(uniqueCloudPointCount, dtype=np.uint8)
    # adapt the dtype here to the point data (e.g. np.float64 for float coordinates)
    valueGrid = np.full((uniqueCloudPointCount, 50, 3), -1, dtype=np.int32)
    for point in points:
        r_x = round(point[0])
        r_y = round(point[1])
        r_z = round(point[2])
        key = keyGrid[r_x,r_y,r_z]
        addingPos = countingGrid[key]
        valueGrid[key, addingPos] = point
        countingGrid[key] += 1
    return (keyGrid, valueGrid, countingGrid)
Note that the arrays are quite small as long as not all grid cells contain points, resulting in fewer cache misses. Moreover, the mapping of each point is done in (small) constant time, resulting in much faster code.
The second strategy is to use the same method as in the pure Python implementation, but with Numba types. Indeed, Numba experimentally supports dictionaries. You can replace the exception handling with a dictionary check ((r_x, r_y, r_z) in grid), which will cause fewer compilation issues and likely speed up the resulting code. Note that Numba dicts are often only about as fast as CPython ones (if not slower), so the resulting code may not be much faster.
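A rough sketch of that second strategy with numba.typed.Dict (untested; the key/value type declarations and the .copy() call are my assumptions, chosen so that the element types unify):
import numpy as np
import numba
from numba import types
from numba.typed import Dict, List

key_type = types.UniTuple(types.int64, 3)
value_type = types.ListType(types.float64[::1])

@numba.jit(nopython=True)
def split_into_grid_dict(points):
    grid = Dict.empty(key_type=key_type, value_type=value_type)
    for k in range(points.shape[0]):
        point = points[k, :].copy()               # copy so every stored array is C-contiguous
        key = (int(round(point[0])), int(round(point[1])), int(round(point[2])))
        if key in grid:                           # dict check instead of try/except KeyError
            grid[key].append(point)
        else:
            lst = List.empty_list(types.float64[::1])
            lst.append(point)
            grid[key] = lst
    return grid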

Programming and optimizing code for a Basin Hopping method

So I'm programming a Basin Hopping method from scratch to find the potential minima of a system of interacting particles, and I have obtained good results so far, but as I increase the number of particles of the system, the code takes more time to run. I am using scipy's conjugate gradient for finding local minima. I'm not getting any error messages and the program seems to work fine, but I was wondering how I can optimize the code so the computing time is reduced.
I defined the Basin Hopping as the following function:
def bhmet(n, itmax, pot, derpot, T):
    i = 1
    phi = np.random.rand(n)*2*np.pi
    theta = np.random.rand(n)*np.pi
    x = 3*np.sin(theta)*np.cos(phi)
    y = 3*np.sin(theta)*np.sin(phi)
    z = 3*np.cos(theta)
    xyzat = np.hstack((x,y,z))
    vmintot = 0
    while i <= itmax:
        print(i)
        plmin = optimize.fmin_cg(pot, xyzat, derpot, gtol=1e-5)  # positions for the local minimum
        if pot(plmin) < vmintot or np.exp((1/T)*(vmintot-pot(plmin))) > np.random.rand():
            vmintot = pot(plmin)
        xyzat = plmin + 2*0.3*(np.random.rand(len(plmin))-0.5)
        i = i + 1
    return plmin, vmintot
I tried defining the initial condition (the first xyzat) as a matrix, but scipy.optimize.fmin_cg expects a flat array (and that's why, inside the functions, I reshape the array into a matrix).
The function I'm searching for the global minimum of is:
def ljpot(posiciones):
    r = np.array([])
    matpos = np.zeros((int((1/3)*len(posiciones)), 3))
    matpos[:,0] = posiciones[0:int((1/3)*len(posiciones))]
    matpos[:,1] = posiciones[int((1/3)*len(posiciones)):int((2/3)*len(posiciones))]
    matpos[:,2] = posiciones[int((2/3)*len(posiciones)):]
    for j in range(0, np.shape(matpos)[0]):
        for k in range(j+1, np.shape(matpos)[0]):
            ri = np.sqrt(sum((matpos[k,:]-matpos[j,:])**2))
            r = np.append(r, ri)
    V = 4*((1/r)**12-(1/r)**6)
    vt = sum(V)
    return vt
And its gradient would be:
def gradpot(posiciones):
    gradv = np.array([])
    matposg = np.zeros((int((1/3)*len(posiciones)), 3))
    matposg[:,0] = posiciones[:int((1/3)*len(posiciones))]
    matposg[:,1] = posiciones[int((1/3)*len(posiciones)):int((2/3)*len(posiciones))]
    matposg[:,2] = posiciones[int((2/3)*len(posiciones)):]
    for w in range(0, np.shape(matposg)[1]):  # index that runs over the columns (coordinates)
        for k in range(0, np.shape(matposg)[0]):  # index that runs over the rows (particles)
            rkj = np.array([])
            xkj = np.array([])
            for j in range(0, np.shape(matposg)[0]):  # this one also runs over the rows
                if j != k:
                    r = np.sqrt(sum((matposg[j,:]-matposg[k,:])**2))
                    rkj = np.append(rkj, r)
                    xkj = np.append(xkj, matposg[j,w])
            dEdxj = sum(4*(6*(1/rkj)**8 - 12*(1/rkj)**14)*(matposg[k,w]-xkj))
            gradv = np.append(gradv, dEdxj)
    return gradv
The reason I transform the array input into a matrix is that for each particle there are three coordinates x, y, z, so the columns of the matrix are the x's, y's and z's for every particle. I tried to do this using np.reshape(), but it seemed to give me wrong results for systems for which the program had already gotten the correct ones.
The code seems to work fine, but as I increase the number of particles the running time increases rapidly. I know global optimization can take a long time, but maybe I'm messing something up in the code. I don't know if the way to reduce the running time is obvious; I'm kind of new to optimizing code, so sorry if that's the case. Of course, any advice is welcome. Thank you all very much!
Two things I noticed after a quick look where you can definitely save some time. Both require some more thought up front, but afterwards you will be rewarded with optimised and cleaner code.
1. Try to avoid using append. append is highly inefficient. You start with an empty array, and afterwards need to allocate more memory each time. This leads to inefficient memory handling, as you copy your array each time you append a number. The longer the array gets, the more inefficient append becomes.
Alternative: Use np.zeros((m,n)) to initialise the array, with m and n being the size it will end up with. Then you need a counter that puts the new values in the corresponding places. If the size of the array is not known before the calculation, you can initialise it with a generous size and trim it afterwards.
2. Try to avoid using for loops. They are generally very slow, especially when iterating through large matrices, as you need to index each entry individually.
Alternative: Matrix operations are generally much faster. For example, instead of r = np.sqrt(sum((matposg[j,:]-matposg[k,:])**2)) inside two for loops, you could first define two matrices A and B that correspond to matposg[j,:] and matposg[k,:] (should be possible without the use of loops!) and then simply use r = np.linalg.norm(A-B, axis=1). A sketch illustrating both points is given below.
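As an illustration of both points, here is a sketch of ljpot without append and without explicit loops, using scipy's pdist to get all pairwise distances at once (my choice of helper; it assumes the same flat [x..., y..., z...] layout as the original):
import numpy as np
from scipy.spatial.distance import pdist

def ljpot_vec(posiciones):
    n = len(posiciones) // 3
    matpos = posiciones.reshape(3, n).T        # rows are (x, y, z), matching the original layout
    r = pdist(matpos)                          # all pairwise distances, no loops, no append
    V = 4 * ((1.0 / r) ** 12 - (1.0 / r) ** 6)
    return V.sum()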

python find the intersection point of two numpy array

I have two numpy arrays that each describe a spatial curve; the curves intersect at one point. I want to find the nearest value in both arrays for that intersection point. I have this code, which works fine, but it's too slow for a large number of points.
from scipy import spatial
def nearest(arr0, arr1):
    ptos = []
    j = 0
    for i in arr0:
        distance, index = spatial.KDTree(arr1).query(i)
        ptos.append([distance, index, j])
        j += 1
    ptos.sort()
    return (arr1[ptos[0][1]].tolist(), ptos[0][1], ptos[0][2])
the result will be (<point coordinates>,<position in arr1>,<position in arr0>)
Your code is doing a lot of things you don't need. First, you're rebuilding the KDTree on every iteration, and that's a waste. Also, query takes an array of points, so there is no need to write your own loop. ptos is an odd data structure, and you don't need it (and don't need to sort it). Try something like this:
from scipy import spatial
def nearest(arr0, arr1):
    tree = spatial.KDTree(arr1)
    distance, arr1_index = tree.query(arr0)
    best_arr0 = distance.argmin()
    best_arr1 = arr1_index[best_arr0]
    two_closest_points = (arr0[best_arr0], arr1[best_arr1])
    return two_closest_points, best_arr1, best_arr0
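A quick usage sketch of the nearest function above (the arrays and sizes here are made up):
import numpy as np
arr0 = np.random.rand(1000, 3)                 # hypothetical curve A
arr1 = np.random.rand(2000, 3)                 # hypothetical curve B
closest_pair, idx_in_arr1, idx_in_arr0 = nearest(arr0, arr1)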
If that still isn't fast enough, you'll need to describe your problem in more detail and figure out if another search algorithm will work better for your problem.

Speeding up distance between all possible pairs in an array

I have an array of x,y,z coordinates of several (~10^10) points (only 5 shown here)
a= [[ 34.45 14.13 2.17]
[ 32.38 24.43 23.12]
[ 33.19 3.28 39.02]
[ 36.34 27.17 31.61]
[ 37.81 29.17 29.94]]
I want to make a new array with only those points which are at least some distance d away from all other points in the list. I wrote code using while loops:
import numpy as np
from scipy.spatial import distance
d = 0.1  # or some distance
i = 0
selected_points = []
while i < len(a):
    interdist = []
    j = i+1
    while j < len(a):
        interdist.append(distance.euclidean(a[i], a[j]))
        j += 1
    if all(dis >= d for dis in interdist):
        selected_points.append(a[i])
    i += 1
This works, but it is taking really long to perform this calculation. I read somewhere that while loops are very slow.
I was wondering if anyone has any suggestions on how to speed up this calculation.
EDIT: While my objective of finding the particles which are at least some distance away from all the others stays the same, I just realized that there is a serious flaw in my code. Say I have 3 particles: for the first iteration of i, the code calculates the distances 1->2 and 1->3. Say 1->2 is less than the threshold distance d, so the code throws away particle 1. For the next iteration of i it only does 2->3, and say it finds that this is greater than d, so it keeps particle 2. But this is wrong! Particle 2 should also be discarded along with particle 1. The solution by @svohara is the correct one!
For big data sets and low-dimensional points (such as your 3-dimensional data), sometimes there is a big benefit to using a spatial indexing method. One popular choice for low-dimensional data is the k-d tree.
The strategy is to index the data set. Then query the index using the same data set, to return the 2-nearest neighbors for each point. The first nearest neighbor is always the point itself (with dist=0), so we really want to know how far away the next closest point is (2nd nearest neighbor). For those points where the 2-NN is > threshold, you have the result.
from scipy.spatial import cKDTree as KDTree
import numpy as np
#a is the big data as numpy array N rows by 3 cols
a = np.random.randn(10**8, 3).astype('float32')
# This will create the index, prepare to wait...
# NOTE: took 7 minutes on my mac laptop with 10^8 rand 3-d numbers
# there are some parameters that could be tweaked for faster indexing,
# and there are implementations (not in scipy) that can construct
# the kd-tree using parallel computing strategies (GPUs, e.g.)
k = KDTree(a)
#ask for the 2-nearest neighbors by querying the index with the
# same points
(dists, idxs) = k.query(a, 2)
# (dists, idxs) = k.query(a, 2, n_jobs=4) # to use more CPUs on query...
#Note: 9 minutes for query on my laptop, 2 minutes with n_jobs=6
# So less than 10 minutes total for 10^8 points.
# If the second NN is > thresh distance, then there is no other point
# in the data set closer.
thresh_d = 0.1 #some threshold, equiv to 'd' in O.P.'s code
d_slice = dists[:, 1] #distances to second NN for each point
res = np.flatnonzero( d_slice >= thresh_d )
Here's a vectorized approach using distance.pdist -
# Store number of pts (number of rows in a)
m = a.shape[0]
# Get the first of pairwise indices formed with the pairs of rows from a
# Simpler version, but a bit slow : idx1,_ = np.triu_indices(m,1)
shifts_arr = np.zeros(m*(m-1)//2,dtype=int)
shifts_arr[np.arange(m-1,1,-1).cumsum()] = 1
idx1 = shifts_arr.cumsum()
# Get the IDs of pairs of rows that are more than "d" apart and thus select
# the rest of the rows using a boolean mask created with np.in1d for the
# entire range of number of rows in a. Index into a to get the selected points.
selected_pts = a[~np.in1d(np.arange(m),idx1[distance.pdist(a) < d])]
For a huge dataset like 10^10 points, we might have to perform the operations in chunks based on the available system memory.
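A sketch of what such chunking could look like, checking one block of rows at a time against the full array with cdist (function and parameter names are mine):
from scipy.spatial.distance import cdist
import numpy as np

def select_isolated_chunked(a, d, chunk=10000):
    keep = []
    for start in range(0, len(a), chunk):
        block = a[start:start + chunk]
        dists = cdist(block, a)                   # (chunk, N) distance matrix
        close = (dists < d).sum(axis=1)           # counts include the point itself (distance 0)
        keep.append(np.flatnonzero(close == 1) + start)
    return a[np.concatenate(keep)]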
Your algorithm is quadratic (about 10^20 operations). Here is a (nearly) linear approach, assuming the distribution is roughly random; a sketch is given below.
Split your space into cubic boxes with side d/sqrt(3), so that the box diagonal is d. Put each point in its box.
Then for each box:
if it contains just one point, you only have to calculate distances to the points in a small neighbourhood of boxes;
if it contains more than one point, those points are already closer than d to each other, so there is nothing more to do (they are all rejected).
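A rough sketch of that box idea (all names are mine; note that with a box side of d/sqrt(3), a neighbour closer than d can sit up to two cell indices away along each axis, so the "small neighbourhood" here spans offsets of -2..2):
import numpy as np
from collections import defaultdict
from itertools import product

def select_isolated_points(a, d):
    # keep only points of a (N x 3 array) whose nearest neighbour is at least d away
    s = d / np.sqrt(3)                            # box side chosen so the box diagonal equals d
    cells = defaultdict(list)
    for idx, p in enumerate(a):
        cells[tuple(np.floor(p / s).astype(int))].append(idx)

    offsets = [o for o in product(range(-2, 3), repeat=3) if o != (0, 0, 0)]
    keep = []
    for cell, members in cells.items():
        if len(members) > 1:
            continue                              # points sharing a box are closer than d apart
        idx = members[0]
        neighbours = [a[j] for o in offsets
                      for j in cells.get((cell[0] + o[0], cell[1] + o[1], cell[2] + o[2]), [])]
        if not neighbours or np.min(np.linalg.norm(np.array(neighbours) - a[idx], axis=1)) >= d:
            keep.append(idx)
    return a[keep]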
Drop the append, it must be really slow. You can use a static vector of distances and use [] to put each number in the right position.
Use min instead of all. You only need to check whether the minimum distance is bigger than d.
Actually, you can break out of the inner loop the moment you find a distance smaller than your limit, and then you can drop both points. That way you don't even have to store any distances (unless you need them later).
Since d(a,b) = d(b,a) you can run the inner loop only over the following points and forget about the distances you already calculated. If you need them later you can look them up in the array, which is faster.
From your comment, I believe this would do, if you have no repeated points.
selected_points = []
for p1 in a:
    save_point = True
    for p2 in a:
        if not np.array_equal(p1, p2) and distance.euclidean(p1, p2) < d:
            save_point = False
            break
    if save_point:
        selected_points.append(p1)
return selected_points
In the end I check both a,b and b,a because you should not modify a list while processing it, but you can be smarter using some additional variables.

can I do fast set difference with floats using numpy if elements are equal up to some tolerance

I have two lists of float numbers, and I want to calculate the set difference between them.
With numpy I originally wrote the following code:
aprows = allpoints.view([('',allpoints.dtype)]*allpoints.shape[1])
rprows = toberemovedpoints.view([('',toberemovedpoints.dtype)]*toberemovedpoints.shape[1])
diff = setdiff1d(aprows, rprows).view(allpoints.dtype).reshape(-1, 2)
This works well for things like integers. In case of 2d points with float coordinates that are the result of some geometrical calculations, there's a problem of finite precision and rounding errors causing the set difference to miss some equalities. For now I resorted to the much, much slower:
diff = []
for a in allpoints:
    remove = False
    for p in toberemovedpoints:
        if norm(p-a) < 0.1:
            remove = True
    if not remove:
        diff.append(a)
return array(diff)
But is there a way to write this with numpy and gain back the speed?
Note that I want the remaining points to still have their full precision, so first rounding the numbers and then do a set difference probably is not the way forward (or is it? :) )
Edited to add a solution based on scipy.spatial.KDTree that seems to work:
def remove_points_fast(allpoints, toberemovedpoints):
    diff = []
    removed = 0
    # prepare a KDTree
    from scipy.spatial import KDTree
    tree = KDTree(toberemovedpoints, leafsize=allpoints.shape[0]+1)
    for p in allpoints:
        distance, ndx = tree.query([p], k=1)
        if distance < 0.1:
            removed += 1
        else:
            diff.append(p)
    return array(diff), removed
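A batched variant of that edit (a sketch; KDTree.query accepts the whole array at once, so the Python loop disappears, and cKDTree is usually faster):
import numpy as np
from scipy.spatial import cKDTree

def remove_points_fast_batched(allpoints, toberemovedpoints):
    tree = cKDTree(toberemovedpoints)
    distance, _ = tree.query(allpoints, k=1)      # nearest "to be removed" point for every point
    mask = distance >= 0.1
    return allpoints[mask], int((~mask).sum())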
If you want to do this in matrix form, the memory consumption is high for larger arrays. If that does not matter, then you get the difference array by:
diff_array = allpoints[:,None] - toberemovedpoints[None,:]
The resulting array has one entry per pair of points from allpoints and toberemovedpoints (shape (N, M, 2) for 2-D points). Then you can manipulate this any way you want (e.g. take the absolute value and compare against the tolerance), which gives you a boolean array. To find which allpoints entries have any hit (absolute difference < .1), use numpy.any over the last two axes:
hits = numpy.any(numpy.abs(diff_array) < .1, axis=(1, 2))
Now you have a vector which has the same number of items as there were rows in the difference array. You can use that vector to index all points (negation because we wanted the non-matching points):
return allpoints[~hits]
This is a numpyish way of doing this. But, as I said above, it takes a lot of memory.
If you have larger data, then you are better off doing it point by point. Something like this:
return allpoints[~numpy.array([numpy.any(numpy.abs(a - toberemovedpoints) < .1) for a in allpoints])]
This should perform well in most cases, and the memory use is much lower than with the matrix solution. (For stylistic reasons you may want to use numpy.all instead of numpy.any and turn the comparison around to get rid of the negation.)
(Beware, there may be printing mistakes in the code.)
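For completeness, the numpy.all variant mentioned above would look roughly like this (a sketch, equivalent by De Morgan's law):
keep = numpy.array([numpy.all(numpy.abs(a - toberemovedpoints) >= .1) for a in allpoints])
return allpoints[keep]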
