How do I optimize array-heavy code in Python?

Sorry, this is probably a very noob question, but I'm converting some code I've been modeling with from MATLAB to Python both to help me learn Python and to see if it could run it any faster. In MATLAB, this code takes about 1 second to run, but in Python, it takes about 1 minute. Is there some way to speed it up, or is this not a good application of Python?
import numpy as np
import matplotlib.pyplot as plt
N = 7e5 #number of time steps
dt = 1e-6 #Time step, in seconds
tf = dt*N #Final time, seconds.
trange = np.linspace(0,tf,int(N+1)) #time range
dx = L/M #spatial step size in thermoelectric, meters
#Define dimensionless fourier number in thermoelectric
Fo = dt*(k/c)/(dx**2)
#temperature profile in thermoelectric as a function of space and time
T = np.zeros((M+1,2))
#Allocate initial condition
T[:,0] = Ti
#Set boundary condition at x=L
T[M,:] = T2
#temperature v time profile of coldside of thermoelectric
coldTemp = np.zeros(len(trange))
#initial coldside temp
coldTemp[0] = Ti
#setting current to optimum DC value
I = Issmax
#iterate over timesteps
for p in range(int(N)):
    #Use central difference forward time method to find temperature within
    #thermoelectric material.
    for n in range(M-1):
        #calculate temp. change at next time step
        T[n+1,1] = T[n+1,0] + Fo*(T[n+2,0]-2*T[n+1,0]+T[n,0]) + dt*((I)**2*rho/(c*d**2*w**2))
    #Apply energy balance to the metal (assumed isothermal) and use the
    #fact that the metal temp is equal to the thermoelectric temp
    T[0,1] = T[0,0] + dt*((I)**2*rhom/(cm*lm**2*wm**2)) - (k*dt/(cm*dx*lm))*(T[0,0]-T[1,0]) - (dt*(I)*S*T[0,0]/(cm*d*w*lm))
    #Saving coldside temp
    coldTemp[p+1] = T[0,1]
    #Setting current temperature profile to be calculated one
    T[:,0] = T[:,1]
#Plotting coldside temp vs time
plt.plot(trange, coldTemp)

The suggestions in the comments above are good, but before anything else, you are violating perhaps the #1 rule of making loops faster: don't do things inside a loop that can be done outside it. You are recomputing the same values on every iteration, and they are "expensive", with division and exponentiation. Consider something like this (check the math, I wasn't too careful, and perhaps there is more you can do):
# calculate the constants...
c1 = dt*((I)**2*rho/(c*d**2*w**2))
c2 = dt*((I)**2*rhom/(cm*lm**2*wm**2))
c3 = (k*dt/(cm*dx*lm))
c4 = (dt*(I)*S/(cm*d*w*lm))
#iterate over timesteps
for p in range(int(N)):
    #Use central difference forward time method to find temperature within
    #thermoelectric material.
    for n in range(M-1):
        #calculate temp. change at next time step
        T[n+1,1] = T[n+1,0] + Fo*(T[n+2,0]-2*T[n+1,0]+T[n,0]) + c1
    #Apply energy balance to the metal (assumed isothermal) and use the
    #fact that the metal temp is equal to the thermoelectric temp
    T[0,1] = T[0,0] + c2 - c3*(T[0,0]-T[1,0]) - c4*T[0,0]
    #Saving coldside temp
    coldTemp[p+1] = T[0,1]
    #Setting current temperature profile to be calculated one
    T[:,0] = T[:,1]
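Beyond hoisting the constants, the inner loop over n can usually be replaced by NumPy slicing, which tends to matter more than anything else in this kind of code. Here is a minimal sketch, assuming the profile at one time level is held in a 1-D array; the function names `step_loop` and `step_vectorized` and the demo values are mine, not from the question, and the metal node and bookkeeping are left out.

```python
import numpy as np

def step_loop(T_old, Fo, c1):
    """Reference: explicit update of the interior nodes with a Python loop."""
    T_new = T_old.copy()
    for n in range(len(T_old) - 2):
        T_new[n + 1] = T_old[n + 1] + Fo * (T_old[n + 2] - 2 * T_old[n + 1] + T_old[n]) + c1
    return T_new

def step_vectorized(T_old, Fo, c1):
    """Same update written with array slices; the two boundary nodes are left untouched."""
    T_new = T_old.copy()
    T_new[1:-1] = T_old[1:-1] + Fo * (T_old[2:] - 2 * T_old[1:-1] + T_old[:-2]) + c1
    return T_new

# Hypothetical values, chosen only to exercise the two functions.
rng = np.random.default_rng(0)
T_demo = rng.uniform(280.0, 300.0, size=51)
loop_result = step_loop(T_demo, Fo=0.25, c1=1e-3)
vec_result = step_vectorized(T_demo, Fo=0.25, c1=1e-3)
```

Both functions read only the old time level, so the slice version computes the same numbers as the loop; the time-step loop itself still has to stay sequential.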


While loop for Python odeint solver?

I have a mathematical model of differential equations that begins as linear and then uses correctional coefficients after reaching a certain value (1).
Currently, I solve the linear function independently, find out where the array goes from less than 1 to greater than 1, and then use that value from the array as the new initial condition. I also correct the time scale.
def vttmodel_linear(m,t,tm,tv,M_max):
    n = 1/(7*tm)
    dMdt = n
    return dMdt
M_0 = 0
M_max = 1 + 7*((RH_crit-RH)/(RH_crit-100)) - 2*np.square((RH_crit-RH)/(RH_crit-100))
# tm = days
# M = weeks so 7*tm
t = np.arange(0,104+1)
tm = np.exp(-0.68*np.log(T) - 13.9*np.log(RH) + 0.14*W - 0.33*SQ + 66.02)
tv = np.exp(-0.74*np.log(T) - 12.72*np.log(RH) + 0.06*W + 61.50)
m = odeint(vttmodel_linear, M_0, t, args=(tm,tv,M_max))
M_0 = m[(np.where(m>1)[0][0])-1]
t = np.where(m>1)[0]
Then I use the new initial condition, M_0 and the updated time scale to solve the non-linear portion of the model.
def vttmodel(M,t,tm,tv,M_max):
    n = 1/(7*tm)
    k1 = 2/((tv/tm)-1)
    k2 = np.max([1-np.exp(2.3*(M-M_max)), 0])
    dMdt = n*k1*k2
    return dMdt
M = odeint(vttmodel, M_0, t, args=(tm,tv,M_max))
I then splice the arrays m and M at the location I found earlier and graph the result.
I would like to find a simpler way to do this. I have tried using if statements within the odeint function and also a while loop when calling the two functions, but have not had any luck interrupting the odeint function. Suggestions would be helpful. Thank you.
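One way to avoid interrupting the solver at all is to put the switch inside the derivative function, so a single odeint call covers both regimes. The following is only a sketch under assumptions: the name `vttmodel_combined` and the parameter values are placeholders of mine (the question's T, RH, W and SQ are not given), and a right-hand side that changes form at M = 1 can cost an adaptive solver some accuracy near the switch.

```python
import numpy as np
from scipy.integrate import odeint

def vttmodel_combined(M, t, tm, tv, M_max):
    """Linear growth while M < 1, corrected growth afterwards."""
    n = 1.0 / (7.0 * tm)
    if M[0] < 1.0:
        return [n]
    k1 = 2.0 / ((tv / tm) - 1.0)
    k2 = max(1.0 - np.exp(2.3 * (M[0] - M_max)), 0.0)
    return [n * k1 * k2]

# Placeholder parameters (assumptions for illustration, not values from the question).
tm_demo, tv_demo, M_max_demo = 2.0, 6.0, 4.0
t_demo = np.arange(0, 104 + 1)
M_demo = odeint(vttmodel_combined, 0.0, t_demo,
                args=(tm_demo, tv_demo, M_max_demo))[:, 0]
```

With this form there is no need to locate the crossing index, rebuild the time array, or splice two result arrays afterwards.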

How can I optimize this code in python? For solving stochastic differential equations

I am developing a code that uses a method called Platen to solve stochastic differential equations. Then I must solve that stochastic differential equation many times (on the order of 10,000 times) to average all the results. My code is:
import numpy as np
import random
import numba
def integrador2(y,t,h): #this is the integrator of the function that solves the SDE
    m = 6.6551079E-26 #parameters
    T = 5E-3
    k_b = 1.3806488E-23
    for i in range(len(t)):
        A=np.array([y[i,-1]/m,-gamma*y[i,-1]]) #this is the platen method that is applied at
        B_dW=np.array([0,b*dW]) #each time step
    return y
def media(args): #args is a tuple with the parameters
    y = args[0]
    t = args[1]
    k = args[2]
    for n in range(k): #k=number of trajectories
        x=(1./(n+1))*(n*x+y[:,0]) #I do the average like this so as not to have to save all the
        p=(1./(n+1))*(n*p+y[:,1]) #solutions in memory
    return x,p
The variables y, t and h are:
y0 = np.array([initial position, initial moment]) #initial conditions
t = np.linspace(initial time, final time, number of time intervals) #time array
y = np.zeros((len(t)+1,len(y0))) #array of positions and moments
y[0,:]=np.array(y0) #I keep the initial condition
h = (final time-initial time)/(number of time intervals) #time increment
I need to be able to run the program for a number of time intervals of 10 ** 7 and solve it 10 ** 4 times (k = 10 ** 4).
I feel that I have already reached a dead end because I already accelerate the function that calculates the result with Numba and then (although I do not put it here) I parallelize the "media" function to work with the four cores that my computer has. Even doing all this, my program takes an hour and a half to execute for 10 ** 6 time intervals and k = 10 ** 4, I have not had the courage to execute it for 10 ** 7 time intervals because my intuition tells me that it would take more than 10 hours.
I would really appreciate if someone could advise me to make some parts of the code faster.
Finally, I apologize if I have not expressed myself completely correctly in any part of the question, I am a physicist, not a computer scientist and my English is far from perfect.
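For reference, the incremental average used in media, x = (n*x + y)/(n+1), can be checked against an ordinary mean on small data. This is a minimal sketch; the helper name `running_mean` and the random test data are assumptions of mine, standing in for the real trajectories.

```python
import numpy as np

def running_mean(samples):
    """Average an iterable of equally shaped arrays without storing them all."""
    mean = None
    for n, sample in enumerate(samples):
        if mean is None:
            # First sample: the mean so far is the sample itself.
            mean = np.array(sample, dtype=float)
        else:
            # Same update as in media(): weight the old mean by n, add the new sample.
            mean = (n * mean + sample) / (n + 1)
    return mean

# Stand-in data: 200 hypothetical trajectories of 50 points each.
rng = np.random.default_rng(1)
trajectories = rng.normal(size=(200, 50))
incremental = running_mean(trajectories)
```

The update only ever holds one trajectory and the current mean in memory, which is the point of writing the average this way.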
I can save about 75% of compute time by simplifying the math in the loop:
def integrador2(y,t,h): #this is the integrator of the function that solves the SDE
    m = 6.6551079E-26 #parameters
    T = 5E-3
    k_b = 1.3806488E-23
    h = h * 1.
    coeff0 = h/m - gamma*h**2/(2.*m)
    coeff1 = (1. - gamma*h + gamma**2*h**2/2.)
    coeffd = c*b*(1. - gamma*h/2.)
    for i in range(len(t)):
        # Method 2
        y[i+1] = np.array([y[i][0] + y[i][1]*coeff0, y[i][1]*coeff1 + dW*coeffd])
    return y
Here's a method using filters with scipy, which I don't think is compatible with Numba, but is slightly faster than the solution above:
from scipy import signal
# @numba.jit(nopython=True)
def integrador2(y,t,h): #this is the integrator of the function that solves the SDE
    m = 6.6551079E-26 #parameters
    T = 5E-3
    k_b = 1.3806488E-23
    h = h * 1.
    coeff0a = 1.
    coeff0b = h/m - gamma*h**2/(2.*m)
    coeff1 = (1. - gamma*h + gamma**2*h**2/2.)
    coeffd = c*b*(1. - gamma*h/2.)
    noise = np.zeros(y.shape[0])
    noise[1:] = np.random.normal(0.,coeffd*1.,y.shape[0]-1)
    noise[0] = y[0,1]
    a = [1, -coeff1]
    b = [1]
    y[1:,1] = signal.lfilter(b,a,noise)[1:]
    a = [1, -coeff0a]
    b = [coeff0b]
    y[1:,0] = signal.lfilter(b,a,y[:,1])[1:]
    return y
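The filter trick works because scipy.signal.lfilter with b = [1] and a = [1, -coeff] evaluates the recursion out[n] = coeff*out[n-1] + x[n], which is the shape of the momentum update above. A small sketch to check that equivalence; the function names and the test values are mine and are not part of the integrator.

```python
import numpy as np
from scipy import signal

def ar1_loop(coeff, x):
    """Explicit recursion out[n] = coeff*out[n-1] + x[n], with out[0] = x[0]."""
    out = np.empty_like(x)
    out[0] = x[0]
    for n in range(1, len(x)):
        out[n] = coeff * out[n - 1] + x[n]
    return out

def ar1_lfilter(coeff, x):
    """Same recursion expressed as an IIR filter (zero initial state)."""
    return signal.lfilter([1.0], [1.0, -coeff], x)

# Arbitrary test input, not the physical noise term from the question.
rng = np.random.default_rng(2)
x_demo = rng.normal(size=1000)
loop_out = ar1_loop(0.95, x_demo)
filt_out = ar1_lfilter(0.95, x_demo)
```

Putting the initial condition into the first input sample, as the answer does with noise[0] = y[0,1], is what makes the filter start from the right state.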

Different initial weights in linear regression converge to different minimized cost values

I have implemented a univariate linear regression in python. The code is given below:
import numpy as np
import matplotlib.pyplot as plt
x = np.array([1,2,4,3,5,7,9,11])
y = np.array([3,5,9,7,11,15,19,23])
def hypothesis(w0,w1,x):
    return w0 + w1*x
def cost_cal(y,w0,w1,x,m):
    diff = hypothesis(w0,w1,x)-y
    diff_sqr = np.square(diff)
    total_cost = np.sum(diff)
    total_cost_sqr = (1/(2*m)) * np.sum(diff_sqr)
    return total_cost, total_cost_sqr
def gradient_descent(w0,w1,alpha,x,m,y):
    cost, cost_sqr = cost_cal(y,w0,w1,x,m)
    temp0 = (alpha/m) * cost
    temp1 = (alpha/m) * np.sum(cost*x)
    w0 = w0 - temp0
    w1 = w1 - temp1
    return w0,w1
These are my hypothesis, cost, and gradient_descent functions implemented in Python. When I use the initial weights w0 = 0 and w1 = 0, my minimized cost is 0.12589726000013188. But if I initialize w0 = -1 and w1 = -2, the minimized cost is 0.5035890400005265. What is the reason for the different minimum costs with different initial weight values? Since the MSE error function is convex, shouldn't it reach the global minimum? Am I doing something wrong?
alpha =0.0001
m = 8
z = 5000
c = np.zeros(z)
cs = np.zeros(z)
index = np.zeros(z)
i = 0
while (i<z):
    index[i] = i
    c[i],cs[i] = cost_cal(y,w0,w1,x,m)
    #print(i, c[i], cs[i])
    w0, w1 = gradient_descent(w0,w1,alpha,x,m,y)
    w0_arr[i],w1_arr[i] = w0,w1
    i += 1 # advance the counter so the loop terminates
inc = np.argmin(cs)
The answer may vary with the initial vector you choose in weight space. When a cost surface has more than one critical point, whether you end up in a local or the global minimum depends entirely on the initial point, that is, on the initial weights.
image link
As per the image in the given link, if you start from an initial point at the left corner you end up in the global minimum; if you start from the right end you end up in a local minimum. The cost can differ a lot, but in most cases the difference between a local and the global minimum is not very large, so if the cost varies by a big margin you should cross-check. Picking initial weights randomly is good practice; they should not be set manually.
In the gradient_descent function, temp0 is assigned an array instead of a value; the sum of that array must be taken before adding.
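Reading the posted gradient_descent closely: cost is already the scalar np.sum(diff), so np.sum(cost*x) equals sum(diff)*sum(x) rather than sum(diff*x). Both updates then vanish as soon as the residuals sum to zero, which holds along a whole line of (w0, w1) pairs, so different starting points can stall at different costs. Below is a sketch with the usual MSE gradients; the function names, alpha = 0.01 and the step count are my assumptions and differ from the values in the question.

```python
import numpy as np

x = np.array([1, 2, 4, 3, 5, 7, 9, 11], dtype=float)
y = np.array([3, 5, 9, 7, 11, 15, 19, 23], dtype=float)

def cost(w0, w1):
    """Mean squared error with the 1/(2m) convention used in the question."""
    diff = w0 + w1 * x - y
    return np.sum(diff ** 2) / (2 * len(x))

def fit(w0, w1, alpha=0.01, steps=20000):
    """Batch gradient descent; alpha and steps are illustrative choices."""
    m = len(x)
    for _ in range(steps):
        diff = w0 + w1 * x - y
        grad0 = np.sum(diff) / m        # d(cost)/d(w0)
        grad1 = np.sum(diff * x) / m    # d(cost)/d(w1): multiply by x BEFORE summing
        w0, w1 = w0 - alpha * grad0, w1 - alpha * grad1
    return w0, w1

fit_a = fit(0.0, 0.0)
fit_b = fit(-1.0, -2.0)
```

The data satisfy y = 2x + 1 exactly, so with these gradients both starting points should head toward w0 = 1, w1 = 2 and a cost near zero.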

Parallelize loops using OpenCL in Python

I have a given dataset in the matrix y and I want to train different SOMs with it. The SOM is one-dimensional (a line), and its number of neurons varies. I train a SOM of size N=2 at first, and N=NMax at last, giving a total of NMax-2+1 SOMs. For each SOM, I want to store the weights once the training is over before moving on to the next SOM.
The whole point of using PyOpenCL here is that each one of the outer loops is independent of the others. Namely, for each value of N, the script doesn't care about what happens when N takes other values. One could have the same result running the script NMax-2+1 times changing the value of N manually.
With this in mind, I was hoping to be able to perform each one of these independent iterations at the same time using the GPU, so that the time spent reduces significantly. The increase in speed will be less than 1/(NMax-2+1) though, because each iteration is more expensive than the previous ones, as for larger values of N, more calculations are made.
Is there a way to 'translate' this code to run on the GPU? I've never used OpenCL before, so let me know if this is too broad or silly so I can ask a more specific question. The code is self-contained, so feel free to try it out. The four constants declared at the beginning can be changed to whatever you like (given that NMax > 1 and all the others are strictly positive).
import numpy as np
import time
m = 3 # Dimension of datapoints
num_points = 2000 # Number of datapoints
iterMax = 150 # Maximum number of iterations
NMax = 3 # Maximum number of neurons
y = np.random.rand(num_points,m) # Generate always the same dataset
sigma_0 = 5 # Initial value of width of the neighborhood function
eta_0 = 1 # Initial value of learning rate
w = list(range(NMax - 1))
wClusters = np.zeros((np.size(y,axis = 0),NMax - 1)) # Clusters for each N
t_begin = time.perf_counter() # Start time (time.clock() was removed in Python 3.8)
for N in range(NMax-1): # Number of neurons for this iteration
    w[N] = np.random.uniform(0,1,(N+2,np.size(y,axis=1))) - 0.5 # Initialize weights
    iterCount = 1
    while iterCount < iterMax:
        # Mix up the input patterns
        mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
        # Sigma reduction
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
        s2 = 2*sigma**2
        # Learning rate reduction
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
        for selectedInput in mixInputs: # Pick up one pattern
            # Search winning neuron
            aux = np.sum((selectedInput - w[N])**2, axis = -1)
            ii = np.argmin(aux) # Neuron 'ii' is the winner
            jjs = abs(ii - list(range(N+2)))
            dists = np.min(np.vstack([jjs , abs(jjs-(N+2))]), axis = 0)
            # Update weights
            w[N] = w[N] + eta * np.exp((-dists**2)/s2).T[:,np.newaxis] * (selectedInput - w[N])
        iterCount += 1
    # Assign each datapoint to its nearest neuron
    for kk in range(np.size(y,axis = 0)):
        aux = np.sum((y[kk,] - w[N])**2,axis=-1)
        ii = np.argmin(aux) # Neuron 'ii' is the winner
        wClusters[kk,N] = ii + 1
t_end = time.perf_counter() # End time
print(t_end - t_begin)
I'm trying to give a somewhat complete answer.
First of all:
Can this code be adapted to be run on the GPU using (py)OpenCL?
Most probably yes.
Can this be done automatically?
No (afaik).
Most of the questions I get about OpenCL are along the lines of: "Is it worth porting this piece of code to OpenCL for a speedup gain?" You state that your outer loop is independent of the results of other runs, which makes the code basically parallelizable. In a straightforward implementation, each OpenCL work-item would execute the same code with slightly different input parameters. Disregarding the overhead of data transfer between host and device, the running time of this approach would equal the running time of the slowest iteration. Depending on the number of iterations in your outer loop, this could be a massive speed gain. As long as the numbers stay relatively small, you could try the multiprocessing module in Python to parallelize these iterations on the CPU instead of the GPU.
Porting to the GPU usually only makes sense if a huge number of processes are to be run in parallel (about 1000 or more). So in your case, if you really want an enormous speed boost, see if you can parallelize all calculations inside the loop. For example, you have 150 iterations and 2000 data points. If you could somehow parallelize these 2000 data points, this could offer a much bigger speed gain, which could justify the work of porting the whole code to OpenCL.
Try parallelizing on CPU first. If you find the need to run more than several 100s of processes at the same time, move to GPU.
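As a CPU-side illustration of working on all data points at once: the final step that assigns each datapoint to its nearest neuron has no dependency between points, so it can be written with NumPy broadcasting. This is a sketch with function names and demo arrays of my own; the training updates themselves are sequential within one SOM and are not covered here.

```python
import numpy as np

def assign_loop(y, w):
    """Nearest-neuron label (1-based) per datapoint, one point at a time."""
    labels = np.zeros(len(y), dtype=int)
    for kk in range(len(y)):
        aux = np.sum((y[kk] - w) ** 2, axis=-1)
        labels[kk] = np.argmin(aux) + 1
    return labels

def assign_vectorized(y, w):
    """Same labels computed from a (num_points, num_neurons) distance table."""
    d2 = np.sum((y[:, np.newaxis, :] - w[np.newaxis, :, :]) ** 2, axis=-1)
    return np.argmin(d2, axis=1) + 1

# Demo data shaped like the question's: 2000 points in 3-D, 5 hypothetical neurons.
rng = np.random.default_rng(3)
y_demo = rng.random((2000, 3))
w_demo = rng.uniform(0, 1, (5, 3)) - 0.5
labels_loop = assign_loop(y_demo, w_demo)
labels_vec = assign_vectorized(y_demo, w_demo)
```

The same per-point independence is what a GPU kernel would exploit, with one work-item per datapoint.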
Update: Simple code for parallelizing on CPU using multiprocessing (without callback)
import numpy as np
import time
import multiprocessing as mp
m = 3 # Dimension of datapoints
num_points = 2000 # Number of datapoints
iterMax = 150 # Maximum number of iterations
NMax = 10 # Maximum number of neurons
y = np.random.rand(num_points,m) # Generate always the same dataset
sigma_0 = 5 # Initial value of width of the neighborhood function
eta_0 = 1 # Initial value of learning rate
w = list(range(NMax - 1))
wClusters = np.zeros((np.size(y,axis = 0),NMax - 1)) # Clusters for each N
def neuron_run(N):
    w[N] = np.random.uniform(0,1,(N+2,np.size(y,axis=1))) - 0.5 # Initialize weights
    iterCount = 1
    while iterCount < iterMax:
        # Mix up the input patterns
        mixInputs = y[np.random.permutation(np.size(y,axis = 0)),:]
        # Sigma reduction
        sigma = sigma_0 - (sigma_0/(iterMax + 1)) * iterCount
        s2 = 2*sigma**2
        # Learning rate reduction
        eta = eta_0 - (eta_0/(iterMax + 1)) * iterCount
        for selectedInput in mixInputs: # Pick up one pattern
            # Search winning neuron
            aux = np.sum((selectedInput - w[N])**2, axis = -1)
            ii = np.argmin(aux) # Neuron 'ii' is the winner
            jjs = abs(ii - list(range(N+2)))
            dists = np.min(np.vstack([jjs , abs(jjs-(N+2))]), axis = 0)
            # Update weights
            w[N] = w[N] + eta * np.exp((-dists**2)/s2).T[:,np.newaxis] * (selectedInput - w[N])
        iterCount += 1
    # Assign each datapoint to its nearest neuron
    for kk in range(np.size(y,axis = 0)):
        aux = np.sum((y[kk,] - w[N])**2,axis=-1)
        ii = np.argmin(aux) # Neuron 'ii' is the winner
        wClusters[kk,N] = ii + 1
def apply_async():
    pool = mp.Pool(processes=NMax)
    for N in range(NMax-1):
        pool.apply_async(neuron_run, args = (N,))
    pool.close() # No more tasks will be submitted
    pool.join() # Wait for all workers to finish
    print("Multiprocessing done!")
if __name__ == '__main__':
    t_begin = time.perf_counter() # Start time
    apply_async()
    t_end = time.perf_counter() # End time
    print(t_end - t_begin)

Previous value in while loop in Python

I tried to make a simple Monte Carlo simulation for stock investments where you start with some investment value, an investment period (in years), and the mean and std of a stock mutual fund. I also wanted to implement an easy way to model a stock market crash: whenever the newly calculated value is more than 40% higher than the previous one, the new value should fall by 90%, like some kind of crash. I managed to get it running and here is the code, but I think that it is not working right. The problem is probably hidden where I call the previous value. Could you try to make it work?
import matplotlib
import matplotlib.pyplot as plt
import random
import numpy as np
mean = 7.0 #mean for stock mutual fund
std = 19.0 #std for stock mutual fund
def investment_return(): #random normal distribution of returns
    investment_return = (np.random.normal(mean,std))/100
    return investment_return
def investor(A, B):
    investment_value = A
    investment_period = B
    wX = []
    vY = []
    x = 1
    while x <= investment_period:
        value = A + A*investment_return()
        if value > value * 1.4: #if new value is 1.4x bigger than previous
            A = value * 0.1 #then make -90 percent adjustment
        else:
            A = value #else use new value
        x += 1
i = 0
while i < 10: #number of investors
    investor(100,20) #starting value and investment period
    i += 1
Well, I tried best I could to interpret what you were after. It helped that you provided a solid basis to work with :).
Ok so, here we go: obviously, Kevin's remark that value > value * 1.4 will never evaluate to True is a solid one. I did rename some variables (for example, normally we compare stocks as indices, so I renamed A to index). Time is generally referred to as t, not x. The while loops were a little quirky, so I got rid of those.
import matplotlib.pyplot as plt
import numpy as np
mean = 7.0
std = 19.0
def investment_return():
    return (np.random.normal(mean, std)) / 100
def investor(index, period):
    wT = []
    vY = []
    for t in range(1, period + 1):
        new_index = index + index * investment_return()
        if new_index > index * 1.4:
            index = new_index * 0.1
        else:
            index = new_index
        wT.append(t) # record the time step
        vY.append(index) # record the index value after this step
    return wT, vY
for i in range(0, 10):
    wT, vY = investor(100, 20)
    # do something with your data
    plt.plot(wT, vY)
This occasionally does have a stock crash, as can clearly be seen (do keep in mind that this requires you to sample >40 from an N(7,19) distribution: that should not happen in a little over 95% of all cases).
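The "a little over 95%" figure can be checked directly from the normal distribution, since a crash needs a yearly return above 40 drawn from N(7, 19). A minimal sketch using only the standard library; the function name is mine, and the 20-year figure assumes the yearly draws are independent, as they are in the simulation.

```python
import math

def prob_return_above(threshold, mean, std):
    """P(X > threshold) for X ~ Normal(mean, std), via the complementary error function."""
    z = (threshold - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Chance of a crash-triggering return in a single year.
p_single = prob_return_above(40.0, 7.0, 19.0)

# Chance of at least one such year over a 20-year period (independent draws).
p_any_in_20 = 1.0 - (1.0 - p_single) ** 20
```

Here p_single comes out at roughly 4%, which matches the statement above that the trigger does not fire in a little over 95% of single draws.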

