I'm hoping to get some help parallelising my Python code. I've been struggling with it for a while and I hit errors whichever way I try. Currently the code takes about 2-3 hours to complete. The code is given below:
import numpy as np
from scipy.constants import Boltzmann as kb, elementary_charge as e
import multiprocessing
from functools import partial

Tc = 9.2
x = []
g = []

def Delta(T):
    '''
    Delta(T) takes a temperature as an input and calculates a
    temperature dependent variable based on Tc which is defined as a
    global parameter
    '''
    d0 = (np.pi/1.78)*kb*Tc
    D0 = d0*(np.sqrt(1-(T**2/Tc**2)))
    return D0

def element_in_sum(T, n, phi):
    D = Delta(T)
    matsubara_frequency = (np.pi * kb * T) * (2*n + 1)
    factor_d = np.sqrt((D**2 * np.cos(phi/2)**2) + matsubara_frequency**2)
    element = ((2 * D * np.cos(phi/2))/ factor_d) * np.arctan((D * np.sin(phi/2))/factor_d)
    return element

def sum_elements(T, M, phi):
    '''
    sum_elements(T,M,phi) is the most computationally heavy part
    of the calculations, the larger the M value the more accurate the
    results are.
    T: temperature
    M: number of steps for matrix calculation the larger the more accurate the calculation
    phi: The phase of the system can be between 0- pi
    '''
    X = list(np.arange(0,M,1))
    Y = [element_in_sum(T, n, phi) for n in X]
    return sum(Y)

def KO_1(M, T, phi):
    Iko1Rn = (2 * np.pi * kb * T /e) * sum_elements(T, M, phi)
    return Iko1Rn

def main():
    for j in range(1, 92):
        T = 0.1*j
        for i in range(1, 314):
            phi = 0.01*i
            pool = multiprocessing.Pool()
            result = pool.apply_async(KO_1,args=(26000, T, phi,))
            g.append(result)
            pool.close()
            pool.join()
        A = max(g);
        x.append(A)
        del g[:]
My approach was to send the KO_1 function into a multiprocessing pool, but I get either a pickling error or a "too many open files" error. Any help is greatly appreciated, and if multiprocessing is the wrong approach I would love any guidance.
I haven't tested your code, but you can do several things to improve it.
First of all, don't create arrays unnecessarily. sum_elements creates three array-like objects when it could use just one generator. First, np.arange creates a numpy array, then the list function creates a list object, and then the list comprehension creates another list. The function does 4 times the work it should.
The correct way to implement it (in python3) would be:
def sum_elements(T, M, phi):
    return sum(element_in_sum(T, n, phi) for n in range(0, M, 1))
If you use python2, replace range with xrange.
This tip will probably help you in any python script you'll write.
Also, try to utilize multiprocessing better. It seems what you need to do is to create a multiprocessing.Pool object once, and use the pool.map function.
The main function should look like this:
def job(args):
    i, j = args
    T = 0.1*j
    phi = 0.01*i
    return KO_1(26000, T, phi)

def main():
    pool = multiprocessing.Pool(processes=4) # You can change this number
    x = [max(pool.imap(job, ((i, j) for i in range(1, 314)))) for j in range(1, 92)]
Notice that I used a tuple in order to pass multiple arguments to job.
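As a rough sketch of the "create the pool once and map over argument tuples" idea, the snippet below flattens the whole (T, phi) grid into one task list and takes the maximum per temperature afterwards. The names ko_1, max_per_temperature and map_func are made up for this illustration, the physical constants are written out by hand instead of imported from scipy, and passing the map function as an argument is only there so the logic can be tried serially with the built-in map before handing it pool.map.

```python
import math
import multiprocessing

KB = 1.380649e-23      # Boltzmann constant in J/K (written out by hand)
E = 1.602176634e-19    # elementary charge in C (written out by hand)
TC = 9.2


def ko_1(M, T, phi):
    """Same sum as KO_1 in the question, using only the math module."""
    D = (math.pi / 1.78) * KB * TC * math.sqrt(1 - T**2 / TC**2)
    c, s = math.cos(phi / 2), math.sin(phi / 2)
    total = 0.0
    for n in range(M):
        w = math.pi * KB * T * (2 * n + 1)   # Matsubara frequency
        f = math.sqrt(D**2 * c**2 + w**2)
        total += (2 * D * c / f) * math.atan(D * s / f)
    return (2 * math.pi * KB * T / E) * total


def job(args):
    # One task = one (j, i) grid point; M travels with the task.
    j, i, M = args
    return ko_1(M, 0.1 * j, 0.01 * i)


def max_per_temperature(map_func, js, is_, M):
    """Evaluate the whole grid with map_func, then reduce over phi per T."""
    js, is_ = list(js), list(is_)
    tasks = [(j, i, M) for j in js for i in is_]
    values = list(map_func(job, tasks))
    n = len(is_)
    return [max(values[k * n:(k + 1) * n]) for k in range(len(js))]


def main():
    # The pool is created once and reused for every task.
    with multiprocessing.Pool(processes=4) as pool:
        return max_per_temperature(pool.map, range(1, 92), range(1, 314), 26000)
```

Calling main() should happen under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. The `with` form of Pool needs Python 3.3 or later.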
This is not an answer to the question, but if I may, I would propose how to speed up the code using simple numpy array operations. Have a look at the following code:
import numpy as np
from scipy.constants import Boltzmann as kb, elementary_charge as e
import time

Tc = 9.2
RAM = 4*1024**2 # per-chunk memory budget in bytes (4 MiB)

def Delta(T):
    '''
    Delta(T) takes a temperature as an input and calculates a
    temperature dependent variable based on Tc which is defined as a
    global parameter
    '''
    d0 = (np.pi/1.78)*kb*Tc
    D0 = d0*(np.sqrt(1-(T**2/Tc**2)))
    return D0

def element_in_sum(T, n, phi):
    D = Delta(T)
    matsubara_frequency = (np.pi * kb * T) * (2*n + 1)
    factor_d = np.sqrt((D**2 * np.cos(phi/2)**2) + matsubara_frequency**2)
    element = ((2 * D * np.cos(phi/2))/ factor_d) * np.arctan((D * np.sin(phi/2))/factor_d)
    return element

def KO_1(M, T, phi):
    X = np.arange(M)[:,np.newaxis,np.newaxis]
    sizeX = int((float(RAM) / sum(T.shape))/sum(phi.shape)/8) #8byte
    i0 = 0
    Iko1Rn = 0. * T * phi
    while i0 < M: # i0 < M so that the final, shorter chunk is included
        print("X = %i" % i0)
        indices = slice(i0, i0+sizeX)
        Iko1Rn += (2 * np.pi * kb * T /e) * element_in_sum(T, X[indices], phi).sum(0)
        i0 += sizeX
    return Iko1Rn

def main():
    T = np.arange(0.1,9.2,0.1)[:,np.newaxis]
    phi = np.linspace(0,np.pi, 361)
    M = 26000
    result = KO_1(M, T, phi)
    return result, result.max()

T0 = time.time()
r, rmax = main()
print(time.time() - T0)
It runs in a bit more than 20 s on my PC. One has to be careful not to use too much memory, which is why there is still a loop, with a somewhat complicated construction, that processes only pieces of X at a time. If enough memory is available, the loop is not necessary.
One should also note that this is just the first step of speeding up. Further improvement is possible using, e.g., just-in-time compilation or Cython.
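The chunking idea described above can also be written with a plain range loop over start indices. The following is only a sketch under a few assumptions: the function name ko_1_chunked and the chunk argument are invented here, the constants are written out by hand rather than imported from scipy, and T and phi are passed as 1-D sequences that the function reshapes itself.

```python
import numpy as np

KB = 1.380649e-23      # Boltzmann constant in J/K (written out by hand)
E = 1.602176634e-19    # elementary charge in C (written out by hand)
TC = 9.2


def element_in_sum(T, n, phi):
    # Works on scalars or on arrays that broadcast against each other.
    D = (np.pi / 1.78) * KB * TC * np.sqrt(1 - T**2 / TC**2)
    w = np.pi * KB * T * (2 * n + 1)
    f = np.sqrt(D**2 * np.cos(phi / 2)**2 + w**2)
    return (2 * D * np.cos(phi / 2) / f) * np.arctan(D * np.sin(phi / 2) / f)


def ko_1_chunked(M, T, phi, chunk=1000):
    """Sum over n = 0..M-1 in chunks; returns an array of shape (len(T), len(phi))."""
    T = np.asarray(T, dtype=float)[:, np.newaxis]        # shape (nT, 1)
    phi = np.asarray(phi, dtype=float)[np.newaxis, :]    # shape (1, nphi)
    total = np.zeros((T.shape[0], phi.shape[1]))
    for start in range(0, M, chunk):
        stop = min(start + chunk, M)                     # last chunk may be shorter
        n = np.arange(start, stop)[:, np.newaxis, np.newaxis]
        # Temporary array has shape (stop - start, nT, nphi).
        total += element_in_sum(T, n, phi).sum(axis=0)
    return (2 * np.pi * KB * T / E) * total
```

The chunk size bounds the temporary array at chunk * len(T) * len(phi) * 8 bytes, so it can be picked to fit whatever memory is available.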
I am developing a code that uses a method called Platen to solve stochastic differential equations. Then I must solve that stochastic differential equation many times (on the order of 10,000 times) to average all the results. My code is:
import numpy as np
import random
import numba

@numba.jit(nopython=True)
def integrador2(y,t,h): #this is the integrator of the function that solves the SDE
    m = 6.6551079E-26 #parameters
    gamma=0.05
    T = 5E-3
    k_b = 1.3806488E-23
    b=np.sqrt(2*m*gamma*T*k_b)
    c=np.sqrt(h)
    for i in range(len(t)):
        dW=c*random.gauss(0,1)
        A=np.array([y[i,-1]/m,-gamma*y[i,-1]]) #this is the platen method that is applied at
        B_dW=np.array([0,b*dW]) #each time step
        z=y[i]+A*h+B_dW
        Az=np.array([z[-1]/m,-gamma*z[-1]])
        y[i+1]=y[i]+1/2*(Az+A)*h+B_dW
    return y

def media(args): #args is a tuple with the parameters
    y = args[0]
    t = args[1]
    k = args[2]
    x=0
    p=0
    for n in range(k): #k=number of trajectories
        y=integrador2(y,t,h)
        x=(1./(n+1))*(n*x+y[:,0]) #I do the average like this so as not to have to save all the
        p=(1./(n+1))*(n*p+y[:,1]) #solutions in memory
    return x,p
The variables y, t and h are:
y0 = np.array([initial position, initial moment]) #initial conditions
t = np.linspace(initial time, final time, number of time intervals) #time array
y = np.zeros((len(t)+1,len(y0))) #array of positions and moments
y[0,:]=np.array(y0) #I keep the initial condition
h = (final time-initial time)/(number of time intervals) #time increment
I need to be able to run the program for a number of time intervals of 10 ** 7 and solve it 10 ** 4 times (k = 10 ** 4).
I feel that I have already reached a dead end because I already accelerate the function that calculates the result with Numba and then (although I do not put it here) I parallelize the "media" function to work with the four cores that my computer has. Even doing all this, my program takes an hour and a half to execute for 10 ** 6 time intervals and k = 10 ** 4, I have not had the courage to execute it for 10 ** 7 time intervals because my intuition tells me that it would take more than 10 hours.
I would really appreciate if someone could advise me to make some parts of the code faster.
Finally, I apologize if I have not expressed myself completely correctly in any part of the question, I am a physicist, not a computer scientist and my English is far from perfect.
I can save about 75% of compute time by simplifying the math in the loop:
def integrador2(y,t,h): #this is the integrator of the function that solves the SDE
    m = 6.6551079E-26 #parameters
    gamma=0.05
    T = 5E-3
    k_b = 1.3806488E-23
    b=np.sqrt(2*m*gamma*T*k_b)
    c=np.sqrt(h)
    h = h * 1.
    coeff0 = h/m - gamma*h**2/(2.*m)
    coeff1 = (1. - gamma*h + gamma**2*h**2/2.)
    coeffd = c*b*(1. - gamma*h/2.)
    coeffx = c*b*h/(2.*m) # noise term that the corrector step feeds into the position
    for i in range(len(t)):
        dW=np.random.normal()
        # Method 2
        y[i+1] = np.array([y[i][0] + y[i][1]*coeff0 + dW*coeffx, y[i][1]*coeff1 + dW*coeffd])
    return y
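One way to convince yourself that collapsing the predictor and corrector into fixed coefficients is legitimate is to expand one step by hand and compare it numerically with the two-stage update from the question. The sketch below does that for a single step. It assumes that 1/2 means 0.5 (true division), the function names are invented for the illustration, and dW here already includes the sqrt(h) factor. Expanding the algebra gives a momentum update p*(1 - gamma*h + gamma^2*h^2/2) + b*dW*(1 - gamma*h/2) and a position update x + p*(h/m - gamma*h^2/(2m)) + b*dW*h/(2m).

```python
import numpy as np


def platen_step(y, dW, h, m, gamma, b):
    """One two-stage step as written in the question; y = [position, momentum]."""
    A = np.array([y[1] / m, -gamma * y[1]])
    B_dW = np.array([0.0, b * dW])
    z = y + A * h + B_dW                       # predictor
    Az = np.array([z[1] / m, -gamma * z[1]])
    return y + 0.5 * (Az + A) * h + B_dW       # corrector


def closed_form_step(y, dW, h, m, gamma, b):
    """The same step after expanding predictor and corrector by hand."""
    coeff0 = h / m - gamma * h**2 / (2.0 * m)
    coeff1 = 1.0 - gamma * h + gamma**2 * h**2 / 2.0
    x_new = y[0] + y[1] * coeff0 + b * dW * h / (2.0 * m)
    p_new = y[1] * coeff1 + b * dW * (1.0 - gamma * h / 2.0)
    return np.array([x_new, p_new])
```

Because all coefficients depend only on h, m, gamma and b, they can be computed once outside the time loop, which is where the saving comes from.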
Here's a method using filters with scipy, which I don't think is compatible with Numba, but is slightly faster than the solution above:
from scipy import signal

# @numba.jit(nopython=True)
def integrador2(y,t,h): #this is the integrator of the function that solves the SDE
    m = 6.6551079E-26 #parameters
    gamma=0.05
    T = 5E-3
    k_b = 1.3806488E-23
    b=np.sqrt(2*m*gamma*T*k_b)
    c=np.sqrt(h)
    h = h * 1.
    coeff0a = 1.
    coeff0b = h/m - gamma*h**2/(2.*m)
    coeff1 = (1. - gamma*h + gamma**2*h**2/2.)
    coeffd = c*b*(1. - gamma*h/2.)
    noise = np.zeros(y.shape[0])
    noise[1:] = np.random.normal(0.,coeffd*1.,y.shape[0]-1)
    noise[0] = y[0,1]
    a = [1, -coeff1]
    b = [1]
    y[1:,1] = signal.lfilter(b,a,noise)[1:]
    a = [1, -coeff0a]
    b = [coeff0b]
    y[1:,0] = signal.lfilter(b,a,y[:,1])[1:]
    return y
I am trying to solve the dynamics of a network composed of N=400 neurons.
That means I have 400 coupled equations that obey the following rules:
i = 0,1,2...399
J(i,j) = some function of i and j (where j is a dummy variable)
I(i) = some function of i
dr(i,t)/dt = -r(i,t) + sum over j from 0 to 399[J(i,j)*r(j)] + I(i)
How do I solve?
I know how to do this for a system of 3 ODEs: I define the 3 ODEs and the initial conditions and then apply odeint. Is there a better way to proceed in this case?
So far I tried the following code (it isn't good since it enters an infinite loop):
N=400
t=np.linspace(0,20,1000)
J0=0.5
J1=2.5
I0=0.5
I1=0.001
i=np.arange(0,400,1)
theta=(2*np.pi)*i/N
I=I0+I1*cos(theta)
r=np.zeros(400)
x0 = [np.random.rand() for ii in i]

def threshold(y):
    if y>0:
        return y
    else:
        return 0

def vectors(x,t):
    for ii in i:
        r[ii]=x[ii]
    for ii in i:
        drdt[ii] = -r[ii] + threshold(I[ii]+sum(r[iii]*(J0+J1*cos(theta[ii]-theta[iii]))/N for iii in i))
    return drdt

x=odeint(vectors,x0,t)
After making what I think are the obvious corrections and additions to your code, I was able to run it. It was not actually in an infinite loop, it was just very slow. You can greatly improve the performance by "vectorizing" your calculations as much as possible. This allows the loops to be computed in C code rather than Python. A hint that there is room for a lot of improvement is in the expression sum over j from 0 to 399[J(i,j)*r(j)]. That is another way of expressing the product of a matrix J and a vector r. So we should really have something like J @ r in the code, and not all those explicit Python loops.
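To make the "sum over j is a matrix-vector product" point concrete, here is a small sketch that computes the coupling term both ways on a toy network of 8 units. The function names are invented for this illustration, and the @ operator assumes Python 3.5 or later (np.dot(J, r) is the equivalent on older versions).

```python
import numpy as np


def coupling_loop(J, r):
    """Explicit double loop: out[i] = sum over j of J[i, j] * r[j]."""
    N = len(r)
    out = np.zeros(N)
    for i in range(N):
        for j in range(N):
            out[i] += J[i, j] * r[j]
    return out


def coupling_matmul(J, r):
    """The same quantity as a single matrix-vector product."""
    return J @ r


# Toy network with the same form of J as in the question, but N = 8.
N = 8
theta = 2 * np.pi * np.arange(N) / N
J = (0.5 + 2.5 * np.cos(theta[:, None] - theta[None, :])) / N
r = np.linspace(0.0, 1.0, N)
```

For N = 400 the double loop runs 160,000 Python-level multiplications on every call of the right-hand side, while the matrix product hands the whole thing to compiled code.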
After some more tweaking, here's a modified version of your code. It is significantly faster than the original. I also reorganized a bit, and added a plot.
import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

def computeIJ(N):
    i = np.arange(N)
    theta = (2*np.pi)*i/N
    I0 = 0.5
    I1 = 0.001
    I = I0 + I1*np.cos(theta)
    J0 = 0.5
    J1 = 2.5
    delta_theta = np.subtract.outer(theta, theta)
    J = J0 + J1*np.cos(delta_theta)
    return I, J / N

def vectors2(r, t, I, J):
    s = J @ r
    drdt = -r + np.maximum(I + s, 0)
    return drdt

N = 400
I, J = computeIJ(N)
np.random.seed(123)
r0 = np.random.rand(N)
t = np.linspace(0, 20, 1000)
r = odeint(vectors2, r0, t, args=(I, J))

for i in [0, 100, 200, 300, 399]:
    plt.plot(t, r[:, i], label='i = %d' % i)
plt.xlabel('t')
plt.legend(shadow=True)
plt.grid()
plt.show()
Here's the plot generated by the script:
def testing(min_quadReq, stepsize, max_quadReq, S):
    y = np.arange(min_quadReq, max_quadReq, stepsize)
    print("Y", y)
    I_avg = np.zeros(len(y))
    Q_avg = np.zeros(len(y))
    x = np.arange(0, (len(S)))
    debugger = 0
    for i in range(0, len(y)):
        I = np.array(S * (np.cos(2 * np.pi * y[i] * x)))
        Q = np.array(S * (np.sin(2 * np.pi * y[i] * x)))
        I_avg[i] = np.sum(I, 0)
        Q_avg[i] = np.sum(Q, 0)
        debugger += 1
    D = [I_avg**2 + Q_avg**2]
    maxIndex = np.argmax(D)
    #maxValue = D.max()
    # in python is arctan2(b,a) compared to matlab's atan2(a,b)
    phaseOut = np.arctan2(Q_avg[maxIndex], I_avg[maxIndex])
    # returns the out value and the phase
    out = min_quadReq + ((maxIndex + 1) - 1) * stepsize
    return out, phaseOut
I'm working on a project that uses DSP to process a signal and extract the relevant data. The code above is the inner function of a quadrature modulation. From what I have seen, this is the part of the code with the biggest potential to be optimized. For example, the two sum functions are called about 92k times each and the quadrature function itself 2696 times. I'm not that familiar with Python, so any suggestions for a more efficient way of writing it, or pointers to good documentation, would be lovely.
The signal S is the input source and it's an array of [481][251]. The outer shell of the quadrature is called by quadReq(cavSig[j, :]); this is just some extra information to show how it's called and how many times.
def randomnumber():
    s = np.random.random_sample((1, 251))
    print(s)
    return s

randomnumber()
Edit: Added some more information
Your loop produces one I_avg value for each element of y. For compactness I could write it as a list comprehension.
In [61]: x=np.arange(4)
In [62]: y=np.arange(0,1,.2)
In [63]: [np.cos(2*np.pi*y[i]*x).sum() for i in range(y.shape[0])]
Out[63]:
[4.0,
-0.30901699437494745,
0.80901699437494767,
0.80901699437494834,
-0.30901699437494756]
But that y[i]*x part is just an outer product of y and x, which can be written with np.outer, or just as easily with broadcasting, y[:,None]*x.
In [64]: np.cos(2*np.pi*y[:,None]*x).sum(axis=1)
Out[64]: array([ 4. , -0.30901699, 0.80901699, 0.80901699, -0.30901699])
Get on to an interactive Python session, and play around with expressions like this. The best way to learn is by doing and seeing immediate results. Keep the test expressions and arrays small so you can see immediately what is happening.
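Applied to the loop in the question, the broadcasting trick might look like the sketch below. It assumes S is a 1-D array (one row of the signal, as in quadReq(cavSig[j, :])), and the two function names are invented here. The loop version is included only so the two can be compared.

```python
import numpy as np


def quadrature_sums_loop(S, y):
    """I_avg and Q_avg computed one frequency at a time, as in the question."""
    x = np.arange(len(S))
    I_avg = np.zeros(len(y))
    Q_avg = np.zeros(len(y))
    for i in range(len(y)):
        I_avg[i] = np.sum(S * np.cos(2 * np.pi * y[i] * x))
        Q_avg[i] = np.sum(S * np.sin(2 * np.pi * y[i] * x))
    return I_avg, Q_avg


def quadrature_sums_vectorized(S, y):
    """The same sums for all frequencies at once via broadcasting."""
    x = np.arange(len(S))
    phase = 2 * np.pi * y[:, None] * x      # shape (len(y), len(S))
    I_avg = np.cos(phase) @ S               # weighted row sums
    Q_avg = np.sin(phase) @ S
    return I_avg, Q_avg
```

The intermediate phase array holds len(y) * len(S) floats, which is small for a 251-sample row but worth keeping in mind for long signals or fine frequency grids.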
I'm trying to improve the time efficiency of part of my script but I have run out of ideas. I ran the following script in both Matlab and Python, but the Matlab implementation is four times quicker than the Python one. Any idea how to improve it?
Python:
import time
import numpy as np

def ComputeGradient(X, y, theta, alpha):
    m = len(y)
    factor = alpha / m
    h = np.dot(X, theta)
    theta = [theta[i] - factor * sum((h-y) * X[:,i]) for i in [0,1]]
    #Also tried this but with worse performances
    #diff = np.tile((h-y)[:, np.newaxis],2)
    #theta = theta - factor * sum(diff * X)
    return theta

if __name__ == '__main__':
    data = np.loadtxt("data_LinReg.txt", delimiter=',')
    theta = [0, 0]
    alpha = 0.01
    X = data[:,0]
    y = data[:,1]
    X = np.column_stack((np.ones(len(y)), X))
    start_time = time.time()
    for i in range(0, 1500, 1):
        theta = ComputeGradient(X, y, theta, alpha)
    stop_time = time.time()
    print("--- %s seconds ---" % (stop_time - start_time))
--> 0.048s
Matlab:
data = load('data_LinReg.txt');
X = data(:, 1); y = data(:, 2);
m = length(y);
X = [ones(m, 1), data(:,1)]; % Add a column of ones to x
theta = zeros(2, 1);
iterations = 1500;
alpha = 0.01;
tic
for i = 1:1500
theta = gradientDescent(X, y, theta, alpha);
end
toc
function theta = gradientDescent(X, y, theta, alpha)
m = length(y); % number of training examples
h = X * theta;
t1 = theta(1) - alpha * sum(X(:,1).*(h-y)) / m;
t2 = theta(2) - alpha * sum(X(:,2).*(h-y)) / m;
theta = [t1; t2];
end
--> 0.01s
[EDIT] : solution avenue
One possible avenue is to use numpy vectorized functions instead of Python built-in functions. In the proposed code, replacing sum by np.sum improves the time efficiency so that it is closer to Matlab (0.019s instead of 0.048s).
Furthermore, I separately tested the functions on vectors: np.dot, np.sum, * (product), and all of these seem to be faster (much faster in some cases) than the Matlab equivalents. I wonder, then, why the overall script is still slower in Python.
This solution presents an optimized MATLAB implementation that does -
Function-inlining of the gradient-descent implementation.
Pre-computation of certain values that are repeatedly used inside the loop.
Code -
data = load('data_LinReg.txt');
iterations = 1500;
alpha = 0.01;
m = size(data,1);
M = alpha/m; %// scaling factor
%// Pre-compute certain values that are repeatedly used inside the loop
sum_a = M*sum(data(:,1));
sum_p = M*sum(data(:,2));
sum_ap = M*sum(data(:,1).*data(:,2));
sum_sqa = M*sum(data(:,1).^2);
one_minus_alpha = 1 - alpha;
one_minus_sum_sqa = 1 - sum_sqa;
%// Start processing
t1n0 = 0;
t2n0 = 0;
for i = 1:iterations
temp = t1n0*one_minus_alpha - t2n0*sum_a + sum_p;
t2n0 = t2n0*one_minus_sum_sqa - t1n0*sum_a + sum_ap;
t1n0 = temp;
end
theta = [t1n0;t2n0];
Quick tests show that this presents an appreciable speedup over the MATLAB code posted in the question.
Now, I am not too familiar with python, but I would assume that this MATLAB code could be easily ported to python.
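For what it's worth, a Python port of the precomputed-sums loop could look like the sketch below. The function name and argument names are invented here: a stands for the first data column and p for the second. The tuple assignment replaces the temp variable, so both updates use the old values of t1 and t2.

```python
import numpy as np


def gradient_descent_precomputed(a, p, alpha=0.01, iterations=1500):
    """Gradient descent for p ~ t1 + t2*a using sums computed once up front."""
    a = np.asarray(a, dtype=float)
    p = np.asarray(p, dtype=float)
    M = alpha / len(p)                 # scaling factor
    sum_a = M * np.sum(a)
    sum_p = M * np.sum(p)
    sum_ap = M * np.sum(a * p)
    sum_sqa = M * np.sum(a**2)
    one_minus_alpha = 1.0 - alpha
    one_minus_sum_sqa = 1.0 - sum_sqa
    t1, t2 = 0.0, 0.0
    for _ in range(iterations):
        # The right-hand side is evaluated with the old t1 and t2.
        t1, t2 = (t1 * one_minus_alpha - t2 * sum_a + sum_p,
                  t2 * one_minus_sum_sqa - t1 * sum_a + sum_ap)
    return np.array([t1, t2])
```

Inside the loop only scalar arithmetic remains, so the per-iteration cost no longer depends on the number of data points.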
I don't know how much of a difference it will make, but you can simplify your function with something like:
s = alpha / size(X,1);
gradientDescent = @(theta)( theta - s * X' * (X*theta - y) );
Since you need theta_{i} in order to find theta_{i+1}, I don't see any way to avoid the loop.
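The same one-line update carries over to NumPy, where X' becomes X.T. The sketch below shows it next to the per-component form used in the question. The function names are invented for the illustration, and the @ operator assumes Python 3.5 or later.

```python
import numpy as np


def gradient_step(X, y, theta, alpha):
    """One gradient-descent update written as a single matrix expression."""
    m = len(y)
    return theta - (alpha / m) * (X.T @ (X @ theta - y))


def gradient_step_per_component(X, y, theta, alpha):
    """The per-component form from the question, with np.sum instead of sum."""
    m = len(y)
    h = X @ theta
    return np.array([theta[i] - (alpha / m) * np.sum((h - y) * X[:, i])
                     for i in range(X.shape[1])])
```

The outer loop over iterations stays, as noted above, but each iteration is then two matrix-vector products with no Python-level loop over the columns.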
I'm implementing Bayesian Changepoint Detection in Python/NumPy (if you are interested have a look at the paper). I need to compute likelihoods for data in ranges [a, b], where a and b can have all values from 1 to n. However I can prune the computation at some points, so that I don't have to compute every likelihood. On the other hand some likelihoods are used more than once, so that I can save time by saving the values in a matrix P[a, b]. Right now I check whether the value is already computed, whenever I use it, but I find that a bit of a hassle. It looks like this:
# ...
P = np.ones((n, n)) * np.inf # a likelihood can't get inf, so I use it
                             # as pseudo value
for a in range(n):
    for b in range(a, n):
        # The following two lines get annoying and error prone if you
        # use P more than once
        if P[a, b] == np.inf:
            P[a, b] = likelihood(data, a, b)
        Q[a] += P[a, b] * g[a] * Q[a - 1] # some computation using P[a, b]
# ...
I wonder, whether there is a more intuitive and pythonic way to achieve this, without having the if ... statement before every use of a P[a, b]. Something like an automagical function call if some condition is not met. I could of course make the likelihood function aware of the fact that it could save values, but then it needs some kind of state (e.g. becomes an object). I want to avoid that.
The likelihood function
Since it was asked for in a comment, I add the likelihood function. It actually computes the conjugate prior and then the likelihood. And all in log representation... So it is quite complicated.
from scipy.special import gammaln

def gaussian_obs_log_likelihood(data, t, s):
    n = s - t
    mean = data[t:s].sum() / n
    muT = (n * mean) / (1 + n)
    nuT = 1 + n
    alphaT = 1 + n / 2
    betaT = 1 + 0.5 * ((data[t:s] - mean) ** 2).sum() + ((n)/(1 + n)) * (mean**2 / 2)
    scale = (betaT*(nuT + 1))/(alphaT * nuT)
    # splitting the PDF of the student distribution up is /much/ faster. (~ factor 20)
    prob = 1
    for yi in data[t:s]:
        prob += np.log(1 + (yi - muT)**2/(nuT * scale))
    lgA = gammaln((nuT + 1) / 2) - np.log(np.sqrt(np.pi * nuT * scale)) - gammaln(nuT/2)
    return n * lgA - (nuT + 1)/2 * prob
Although I work with Python 2.7, both answers for 2.7 and 3.x are appreciated.
I would use a sibling of defaultdict for this (you can't use defaultdict directly since it won't tell you the key that is missing):
class Cache(object):
    def __init__(self):
        self.cache = {}

    def get(self, a, b):
        key = (a,b)
        result = self.cache.get(key, None)
        if result is None:
            result = likelihood(data, a, b)
            self.cache[key] = result
        return result
Another approach would be using a cache decorator on likelihood as described here.
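As one possible version of the decorator approach on Python 3, functools.lru_cache can memoize a small wrapper that closes over the data, so only the hashable indices (a, b) become the cache key. This is a sketch, not necessarily what the linked description does: make_cached_likelihood and the calls counter are invented for this illustration, and likelihood stands for any function with the signature likelihood(data, a, b).

```python
from functools import lru_cache


def make_cached_likelihood(likelihood, data):
    """Return a memoized function cached(a, b) plus a counter of real calls.

    data is captured by the closure rather than passed as an argument,
    because arrays and lists are not hashable and cannot be cache keys.
    """
    calls = {"count": 0}

    @lru_cache(maxsize=None)
    def cached(a, b):
        calls["count"] += 1          # only incremented on a cache miss
        return likelihood(data, a, b)

    return cached, calls
```

In the loop from the question, P[a, b] would then be replaced by a call such as cached(a, b), with no explicit check for whether the value has been computed yet. lru_cache is not available on Python 2.7, where the hand-written Cache class above does the same job.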