So I'm programing a Basin Hoping from zero to find the potential minima of a system of interacting particles and I have obtained good results so far, but since since I increase the number of particles of the system, the code takes more time to run. I am using scipy conjugate gradient for finding local minima. I'm not getting any error messages and the program seems to work out fine, but I was wondering how I can optimize the code so the computing time is reduced.
I defined the Basin Hopping as a function, given:
def bhmet(n,itmax, pot,derpot, T):
i = 1
phi = np.random.rand(n)*2*np.pi
theta = np.random.rand(n)*np.pi
x = 3*np.sin(theta)*np.cos(phi)
y = 3*np.sin(theta)*np.sin(phi)
z = 3*np.cos(theta)
xyzat = np.hstack((x,y,z))
vmintot = 0
while i <= itmax:
print(i)
plmin = optimize.fmin_cg(pot,xyzat,derpot,gtol = 1e-5,) #posiciones para el mínimo local.4
if pot(plmin) < vmintot or np.exp((1/T)*(vmintot-pot(plmin))) > np.random.rand():
vmintot = pot(plmin)
xyzat = plmin + 2*0.3*(np.random.rand(len(plmin))-0.5)
i = i + 1
return plmin,vmintot
I tried defining the initial condition (the first 'xyzat') as a matrix, but the parameter of scipy.optimize.fmin_cg is requested as an array (and thats why in the functions I will be reshaping the array as a matrix).
The function I'm searching the global minima for is:
def ljpot(posiciones,):
r = np.array([])
matpos = np.zeros((int((1/3)*len(posiciones)),3))
matpos[:,0] = posiciones[0:int((1/3)*len(posiciones))]
matpos[:,1] = posiciones[int((1/3)*len(posiciones)):int((2/3)*len(posiciones))]
matpos[:,2] = posiciones[int((2/3)*len(posiciones)):]
for j in range(0,np.shape(matpos)[0]):
for k in range(j+1,np.shape(matpos)[0]):
ri = np.sqrt(sum((matpos[k,:]-matpos[j,:])**2))
r = np.append(r,ri)
V = 4*((1/r)**12-(1/r)**6)
vt = sum(V)
return vt
And its gradient would be:
def gradpot(posiciones,):
gradv = np.array([])
matposg = np.zeros((int((1/3)*len(posiciones)),3))
matposg[:,0] = posiciones[:int((1/3)*len(posiciones))]
matposg[:,1] = posiciones[int((1/3)*len(posiciones)):int((2/3)*len(posiciones))]
matposg[:,2] = posiciones[int((2/3)*len(posiciones)):]
for w in range(0,np.shape(matposg)[1]): #índice que cambia de columna.
for k in range(0,np.shape(matposg)[0]): #índice que cambia de fila.
rkj = np.array([])
xkj = np.array([])
for j in range(0,np.shape(matposg)[0]): #también este cambia de fila.
if j != k:
r = np.sqrt(sum((matposg[j,:]-matposg[k,:])**2))
rkj = np.append(rkj,r)
xkj = np.append(xkj,matposg[j,w])
dEdxj = sum(4*(6*(1/rkj)**8- 12*(1/rkj)**14)*(matposg[k,w]-xkj))
gradv = np.append(gradv,dEdxj)
return gradv
The reason I transform the array input in a matrix is that for each particle there are three coordinates x,y,z so the columns of the matrix are x's, y's, z's for every particle. I tried to do this using np.reshape(), but it seemed to give me wrong results for systems for which the program has already gotten the correct ones.
The code seems to work out fine, but as I increase the number of particles the running time increases exponentially. I know global optimization can take long time, but maybe I'm messing a little with the code. I don´t know if the way to reduce the time of running is pretty obvious,I'm kinda new to optimizing the code, so sorry if it's the case. Of course, any advices are welcome. Thank you all very much!
Two things I noticed after a quick look where you can definitely save some time. Both require some more thinking before, but afterwards you will be rewarded with an optimised and cleaner code.
1. Try to avoid using append. append is highly inefficient. You start with an empty array, and afterwards need to allocate more memory each time. This leads to inefficient memory handling, as you copy your array each time you append a number. The longer the array gets, the more inefficient append becomes.
Alternative: Use np.zeros((m,n)) to initialise the array, with m and n being the size it will end up with. Then you need a counter that puts the new values in the corresponding places. If the size of the array is not defined before your calculation, you can initialise it as a big number, and afterwards cut it.
2. Try to avoid using for loops. They are generally very slow, especially when iterating through large matrices, as you need to index each entry individually.
Alternative: Matrix operations are generally much faster. For example, instead of r = np.sqrt(sum((matposg[j,:]-matposg[k,:])**2)) inside two for loops, you could first define two matrices A and B that correspond to matposg[j,:] and matposg[k,:] (should be possible without the use of loops!) and then simply use r = np.linalg.norm(A-B)
Related
The quantity to be computed is log(k!), where k could be 4000 or even higher, but of course the log will compensate. I tried computing sum(log(k)) which is the same.
So, I am given an large array with integers and I want to efficiently compute sum(log(k)). This was my attempt:
integers = np.asarray([435, 535, 242,])
score = np.sum(np.log(np.arange(1,integers+1)))
This would work, except that np.arange would generate an array of different size for each integer, so when I run that, it gives me an error (as it should).
The problem could be easily solved with a for loop as follows:
scores = []
for i in range(integers.shape[0]):
score = np.sum(np.log(np.arange(1,integer[i]+1)))
scores.append(score)
but that's too slow. My actual integers has millions of value to be computed.
Is there an efficient implementation for this that basically that doesn't need a for loop? I was thinking of a lambda function or something like that, but I am not really sure how to apply it. Any help is appreciated!
How about math.lgamma? Gamma function is factorial, and lgamma is log of gamma.
You don't need to compute factorial and then log.
There is also gammaln in the SciPy
Code, Python 3.9 x64 Win 10
import numpy as np
from scipy.special import gammaln
startf = 1 # start of factorial sequence
stopf = 400 # end of of factorial sequence
q = gammaln(range(startf+1, stopf+1)) # n! = G(n+1)
print(q)
looks reasonable to me
You can vectorize with something like this:
mi = integers.max()
ls = np.log(np.arange(2, mi + 1))
Two optimizations so far: you only need the range up to the maximum, since the other numbers are covered by that, and you don't need log(1).
Now you take the cumulative sum:
cs = np.cumsum(ls)
The desired elements can be indexed directly:
result = cs[integers - 2]
If this is something you need to do many times, and you know the upper bound, this solution will be much faster than using math.lgmamma or scipy.special.gammaln once you precompute cs to the upper bound.
If this is a one-time call, here is the obligatory one-liner:
np.cumsum(np.log(np.arange(2, np.max(integers))))[integers - 2]
You can do most of the operations in-place if memory is a concern (I think it also makes them faster):
mi = integers.max()
cs = np.arange(2, mi + 1)
np.cumsum(np.log(cs, out=cs), out=cs)
I have code that is simulating interactions between lots of particles. Using profiling, I've worked out that the function that's causing the most slowdown is a loop which iterates over all my particles and works out the time for the collision between each of them. This generates a symmetric matrix, which I then take the minimum value out of.
def find_next_collision(self, print_matrix = False):
"""
Sets up a matrix of collision times
Returns the indices of the balls in self.list_of_balls that are due to
collide next and the time to the next collision
"""
self.coll_time_matrix = np.zeros((np.size(self.list_of_balls), np.size(self.list_of_balls)))
for i in range(np.size(self.list_of_balls)):
for j in range(i+1):
if (j==i):
self.coll_time_matrix[i][j] = np.inf
else:
self.coll_time_matrix[i][j] = self.list_of_balls[i].time_to_collision(self.list_of_balls[j])
matrix = self.coll_time_matrix + self.coll_time_matrix.T
self.coll_time_matrix = matrix
ind = np.unravel_index(np.argmin(self.coll_time_matrix, axis = None), self.coll_time_matrix.shape)
dt = self.coll_time_matrix[ind]
if (print_matrix):
print(self.coll_time_matrix)
return dt, ind
This code is a method inside a class which defines the positions of all of the particles. Each of these particles is an object saved in self.list_of_balls (which is a list). As you can see, I'm already only iterating over half of this matrix, but it's still quite a slow function. I've tried using numba, but this is a section of quite a large code, and I don't want to have to optimize every function with numba when this is the slow one.
Does anyone have any ideas for a more efficient way to write this function?
Thank you in advance!
Like Raubsauger mentioned in their answer, evaluating ifs is slow
for j in range(i+1):
if (j==i):
You can get rid of this if by simply doing for j in range(i). That way j goes from 0 to i-1
You should also try to avoid loops when possible. You can do this by expressing your problem in a vectorized way, and using numpy or scipy functions that leverage SIMD operations to speed up calculations. Here's a simplified example assuming the time_to_collision simply divides Euclidean distance by speed. If you store the coordinates and speeds of the balls in a numpy array instead of storing ball objects in a list, you could do:
from scipy.spatial.distance import pdist
rel_distances = pdist(ball_coordinates)
rel_speeds = pdist(ball_speeds)
time = rel_distances / rel_speeds
pdist documentation
Of course, this isn't going to work verbatim if your time_to_collision function is more complicated, but it should point you in the right direction.
First Question: How many particles do you have?
If you have a lot of particles: one improvement would be
for i in range(np.size(self.list_of_balls)):
for j in range(i):
self.coll_time_matrix[i][j] = self.list_of_balls[i].time_to_collision(self.list_of_balls[j])
self.coll_time_matrix[i][i] = np.inf
often executed ifs slow everything down. Avoid them in innerloops
2nd question: is it necessary to compute this every time? Wouldn't it be faster to compute points in time and only refresh those lines and columns which were involved in a collision?
Edit:
Idea here is to initially calculate either the time left or (better solution) the timestamp for the collision as you already as well as the order. But instead to throw the calculated results away, you only update the values when needed. This way you need to calculate only 2*n instead of n^2/2 values.
Sketch:
# init step, done once at the beginning, might need an own function
matrix ... # calculate matrix like before; I asume that you use timestamps instead of time left
min_times = np.zeros(np.size(self.list_of_balls))
for i in range(np.size(self.list_of_balls)):
min_times[i] = min(self.coll_time_matrix[i])
order_coll = np.argsort(min_times)
ind = order_coll[0]
dt = self.coll_time_matrix[ind]
return dt, ind
# function step: if a collision happened, order_coll[0] and order_coll[1] hit each other
for balls in order_coll[0:2]:
for i in range(np.size(self.list_of_balls)):
self.coll_time_matrix[balls][i] = self.list_of_balls[balls].time_to_collision(self.list_of_balls[i])
self.coll_time_matrix[i][balls] = self.coll_time_matrix[balls][i]
self.coll_time_matrix[balls][balls] = np.inf
for i in range(np.size(self.list_of_balls)):
min_times[i] = min(self.coll_time_matrix[i])
order_coll = np.argsort(min_times)
ind = order_coll[0]
dt = self.coll_time_matrix[ind]
return dt, ind
In case you calculate time left in the matrix you have to subtract the time passed from the matrix. Also you somehow need to store the matrix and (optionally) min_times and order_coll.
I have a function written in python which does two procedures:
Preprocessing: read in data from an array and compute some values that I will later need to prevent repeated computation
Iterate and compute a 'summary' of the data at every stage and use this to solve an optimisation problem.
The code is as follows:
import numpy as np
def iterative_hessian(data, targets,
sketch_method, sketch_size, num_iters):
'''
Original problem is min 0.5*||Ax-b||_2^2
iterative_hessian asks us to minimise 0.5*||S_Ax||_2^2 - <A^Tb, x>
for a summary of the data S_A
'''
A = data
y = targets
n,d = A.shape
x0 = np.zeros(shape=(d,))
m = int(sketch_size) # sketching dimension
ATy = A.T#y
covariance_mat = A.T.dot(A)
for n_iter in range(int(num_iters)):
S_A = m**(-0.5)*np.random.normal(size=(m, n))
B = S_A.T.dot(S_A)
z = ATy - covariance_mat#x0 + np.dot(S_A.T, np.dot(S_A,x0)) #
x_new = np.linalg.solve(B,z)
x0 = x_new
return np.ravel(x0)
In practise I do not use the S_A = m**(-0.5)*np.random.normal(size=(m, n)) line but use a different random transform which is faster to apply but in principle it is sufficient for the question. This code works well for what I need but I was wondering if there is a reasonable way to do the following:
Instead of repeating the line S_A = m**(-0.5)*np.random.normal(size=(m, n)) for every iteration, is there a way to specify the number of independent random copies (num_iters - which can be thought of as between 10 and 30) of S_A that are needed prior to the iteration and scan through the input only once to generate all such copies? I think this would store the S_A variables in some kind of multi-dimensional array but I'm not sure how best to do this, or whether it is even practical. I have tried a basic example doing this in parallel but it is slower than repeatedly passing through the matrix.
Suppose that I want to endow this function with more properties, for instance I want to return the average time taken on line x_new = np.linalg.solve(B,z). Doing this is straightforward - import a time module and put the code in the function, however, this will always time the function and perhaps I only want to do this when testing. An easy way around this is to create a parameter in the function definition time_updates = False and then have if time_updates == False: proceed as above else: copy the exact same code but with some timing functionality added. Is there a better way to do this which can perhaps use classes in Python?
My intention is to use this iteration on blocks of data read in from a file which doesn't fit into memory. Whilst it might be possible to store a block in memory, it would be convenient if the function only passed over that block once rather than num_iters times. Passing over the quantities computed , S_A, covariance_matrix etc, is fine however.
I have a very large multiply and sum operation that I need to implement as efficiently as possible. The best method I've found so far is bsxfun in MATLAB, where I formulate the problem as:
L = 10000;
x = rand(4,1,L+1);
A_k = rand(4,4,L);
tic
for k = 2:L
i = 2:k;
x(:,1,k+1) = x(:,1,k+1)+sum(sum(bsxfun(#times,A_k(:,:,2:k),x(:,1,k+1-i)),2),3);
end
toc
Note that L will be larger in practice. Is there a faster method? It's strange that I need to first add the singleton dimension to x and then sum over it, but I can't get it to work otherwise.
It's still much faster than any other method I've tried, but not enough for our application. I've heard rumors that the Python function numpy.einsum may be more efficient, but I wanted to ask here first before I consider porting my code.
I'm using MATLAB R2017b.
I believe both of your summations can be removed, but I only removed the easier one for the time being. The summation over the second dimension is trivial, since it only affects the A_k array:
B_k = sum(A_k,2);
for k = 2:L
i = 2:k;
x(:,1,k+1) = x(:,1,k+1) + sum(bsxfun(#times,B_k(:,1,2:k),x(:,1,k+1-i)),3);
end
With this single change the runtime is reduced from ~8 seconds to ~2.5 seconds on my laptop.
The second summation could also be removed, by transforming times+sum into a matrix-vector product. It needs some singleton fiddling to get the dimensions right, but if you define an auxiliary array that is B_k with the second dimension reversed, you can generate the remaining sum as ~x*C_k with this auxiliary array C_k, give or take a few calls to reshape.
So after a closer look I realized that my original assessment was overly optimistic: you have multiplications in both dimensions in your remaining term, so it's not a simple matrix product. Anyway, we can rewrite that term to be the diagonal of a matrix product. This implies that we're computing a bunch of unnecessary matrix elements, but this still seems to be slightly faster than the bsxfun approach, and we can get rid of your pesky singleton dimension too:
L = 10000;
x = rand(4,L+1);
A_k = rand(4,4,L);
B_k = squeeze(sum(A_k,2)).';
tic
for k = 2:L
ii = 1:k-1;
x(:,k+1) = x(:,k+1) + diag(x(:,ii)*B_k(k+1-ii,:));
end
toc
This runs in ~2.2 seconds on my laptop, somewhat faster than the ~2.5 seconds obtained previously.
Since you're using an new version of Matlab you might try broadcasting / implicit expansion instead of bsxfun:
x(:,1,k+1) = x(:,1,k+1)+sum(sum(A_k(:,:,2:k).*x(:,1,k-1:-1:1),3),2);
I also changed the order of summation and removed the i variable for further improvement. On my machine, and with Matlab R2017b, this was about 25% faster for L = 10000.
This is more of a Matlab programming question than it is a math question.
I'd like to run gradient descent multiple on different learning rates. I have a set of learning rates
alpha = [0.3, 0.1, 0.03, 0.01, 0.003, 0.001];
and each time I run gradient descent, I get a vector J_vals as output. However, I don't know Matlab well enough to know how to implement this besides doing something like:
[theta, J_vals] = gradientDescent(...., alpha(1),...);
J1 = J_vals;
[theta, J_vals] = gradientDescent(...., alpha(2),...);
J2 = J_vals;
and so on.
I thought about using a for loop, but then I don't know how I would deal with the J_vals's (not sure how to apply the for loop to J1, J2, and so on). Perhaps it would look something like this:
for i = len(alpha)
[theta, J_vals] = gradientDescent(..., alpha(i),...);
J(i) = J_vals;
end
Then I would have a vector of vectors.
In Python, I would just run a for loop and append each new result to the end of a list. How do I implement something like this in Matlab? Or is there a more efficient way?
If you know how many loops you are going have and the size of the J_vals (or at least a reasonable upper bound) I would suggest pre-allocating the size of the container array
J = zeros(n,1);
then on each loop insert the new values
J(start:start+n) = J_vals
That way you don't reallocate memory. If you don't know, you can append the values to the array. For example,
J = []; % initialize
for i = len(alpha)
[theta, J_vals] = gradientDescent(..., alpha(i),...);
J = [J; J_vals]; % Append column row
end
but this is re-allocating the size of the array every loop. If it's not too many loops then it should be ok.
Matlab's "cell arrays" are kind of like lists in Python. They are similar in that you can put variable datatypes into them. Nobody seems to be too sure, but most likely the cell array is implemented as an array of object pointers. That means that it is still somewhat expensive to append to it (cell_array{length(cell_array) + 1} = new_data), but at least you are only appending a pointer instead of the entire column. You would still have to convert the cell array to a normal matrix afterward using cell2mat.
The most idiomatic Matlab solution is to pre-allocate (as #dpmcmlxxvi suggested).
I think what you are describing is a really common use case, and it's unfortunate that Matlab requires such a verbose idiom for this. Also it's frustrating that the documentation is opaque on how cell arrays are implemented and whether it is expensive to append to a cell array.
Your solution works just fine as long as you add a : for the row subscript (assuming J_vals is a column vector):
for i = len(alpha)
[theta, J_vals] = gradientDescent(..., alpha(i),...);
J(:, i) = J_vals;
%// ^... all rows, column 'i'
end
You could even put that as the return value:
for i = len(alpha)
[theta, J(:, i)] = gradientDescent(..., alpha(i),...);
%// ^... add returned value directly to our list
end
Both of these methods allow you to preallocate your matrix for a potential speed gain.
If you want to build your list as you go, you can use the method in #dpmcmlxxvi's answer, or you can use the special subscript end. Neither of these methods are compatible with preallocation, though.
for i = len(alpha)
[theta, J(:, end+1)] = gradientDescent(..., alpha(i),...);
%// ^... add new vector after the current end of list
end
I would also like to suggest you not use i as a variable name in Matlab. I know it's natural for other languages, but in Matlab it overwrites the built-in imaginary constant i.
See: https://stackoverflow.com/a/14790765/1377097