Vectorize python code for improved performance - python

I am writing a scientific code in python to calculate the energy of a system.
Here is my function : cte1, cte2, cte3, cte4 are constants previously computed; pii is np.pi (calculated beforehand, since it slows the loop otherwise). I calculate the 3 components of the total energy, then sum them up.
def calc_energy(diam):
Energy1 = cte2*((pii*diam**2/4)*t)
Energy2 = cte4*(pii*diam)*t
d=diam/t
u=np.sqrt((d)**2/(1+d**2))
cc= u**2
E = sp.special.ellipe(cc)
K = sp.special.ellipk(cc)
Id=cte3*d*(d**2+(1-d**2)*E/u-K/u)
Energy3 = cte*t**3*Id
total_energy = Energy1+Energy2+Energy3
return (total_energy,Energy1)
My first idea was to simply loop over all values of the diameter :
start_diam, stop_diam, step_diam = 1e-10, 500e-6, 1e-9 #Diametre
diametres = np.arange(start_diam,stop_diam,step_diam)
for d in diametres:
res1,res2 = calc_energy(d)
totalEnergy.append(res1)
Energy1.append(res2)
In an attempt to speed up calculations, I decided to use numpy to vectorize, as shown below :
diams = diametres.reshape(-1,1) #If not reshaped, calculations won't run
r1 = np.apply_along_axis(calc_energy,1,diams)
However, the "vectorized" solution does not properly work. When timing I get 5 seconds for the first solution and 18 seconds for the second one.
I guess I'm doing something the wrong way but can't figure out what.

With your current approach, you're applying a Python function to each element of your array, which carries additional overhead. Instead, you can pass the whole array to your function and get an array of answers back. Your existing function appears to work fine without any modification.
import numpy as np
from scipy import special
cte = 2
cte1 = 2
cte2 = 2
cte3 = 2
cte4 = 2
pii = np.pi
t = 2
def calc_energy(diam):
Energy1 = cte2*((pii*diam**2/4)*t)
Energy2 = cte4*(pii*diam)*t
d=diam/t
u=np.sqrt((d)**2/(1+d**2))
cc= u**2
E = special.ellipe(cc)
K = special.ellipk(cc)
Id=cte3*d*(d**2+(1-d**2)*E/u-K/u)
Energy3 = cte*t**3*Id
total_energy = Energy1+Energy2+Energy3
return (total_energy,Energy1)
start_diam, stop_diam, step_diam = 1e-10, 500e-6, 1e-9 #Diametre
diametres = np.arange(start_diam,stop_diam,step_diam)
a = calc_energy(diametres) # Pass the whole array

Related

Vectorize maximum drawdown stop loss logic

I have the following code that works as intended in a for loop:
import pandas as pd
import numpy as np
returns = pd.DataFrame(np.random.normal(scale=0.01,size=[1000,7]))
signal = pd.DataFrame(np.random.choice([1,np.nan],p=(0.1,0.9),size=[1000,7]))
window=20
breach = -0.03
positions = signal.copy(); positions[:] = np.nan
pf_returns = pd.Series(index=positions.index)
max_dd = pf_returns.copy()
for i in range(len(positions)):
#Positions are just the number of signals divided by the total active ones
positions.iloc[i] = signal.ffill().shift().div(signal.ffill().shift().sum(axis=1),axis=0).iloc[i]
pf_returns.iloc[i] = (positions.iloc[i] * returns.iloc[i]).sum()
equity_line = (pf_returns.iloc[max(i-window,0):i+1]+1).iloc[:i+1].cumprod()
max_dd.iloc[i] = (equity_line/equity_line.cummax()-1).rolling(window, min_periods=1).min().iloc[-1]
if max_dd.iloc[i] <= breach and i != len(positions)-1:
signal.iloc[i+1] = 0
Is it somehow possible to vectorize it? I thought in computing the equity_line at once (definitely possible without all these ilocs), however it then changes the upcoming values. I also thought somehow to a loop that runs in chunks (until next max_dd is found basically) but I am not sure if there is any possibility to vectorize or make this more efficient.
My desired outcome is basically to reproduce what the loops does (in terms of positions, if you wish, and consequently pf_returns) in a more efficient way.

How to run Standard Normal Homogeneity Test for a time series data

I have a timeseries data (sample data) for a variable wind for nearly 40 stations and 36 years (details in sample screenshot).
I need to run the Standard Normal Homogeneity Test and Pettitt's Test for this data as per recommendations. Are they available in python?
I couldn't find any code for the mentioned tests in python documentations and packages.
I need some help here to know if there is any package holding these tests.
There are codes in R as follows:
snht(data, period, robust = F, time = NULL, scaled = TRUE, rmSeasonalPeriod = Inf, ...)
However, no results so far... only errors.
Regarding the Pettitt test, I found this python implementation.
I believe there is a small typo: the t+1 on line 19 should actually just be t.
I also have developed a faster, vectorised implementation::
import numpy as np
def pettitt_test(X):
"""
Pettitt test calculated following Pettitt (1979): https://www.jstor.org/stable/2346729?seq=4#metadata_info_tab_contents
"""
T = len(X)
U = []
for t in range(T): # t is used to split X into two subseries
X_stack = np.zeros((t, len(X[t:]) + 1), dtype=int)
X_stack[:,0] = X[:t] # first column is each element of the first subseries
X_stack[:,1:] = X[t:] # all rows after the first element are the second subseries
U.append(np.sign(X_stack[:,0] - X_stack[:,1:].transpose()).sum()) # sign test between each element of the first subseries and all elements of the second subseries, summed.
tau = np.argmax(np.abs(U)) # location of change (first data point of second sub-series)
K = np.max(np.abs(U))
p = 2 * np.exp(-6 * K**2 / (T**3 + T**2))
return (tau, p)

How do I vectorize the following loop in Numpy?

"""Some simulations to predict the future portfolio value based on past distribution. x is
a numpy array that contains past returns.The interpolated_returns are the returns
generated from the cdf of the past returns to simulate future returns. The portfolio
starts with a value of 100. portfolio_value is filled up progressively as
the program goes through every loop. The value is multiplied by the returns in that
period and a dollar is removed."""
portfolio_final = []
for i in range(10000):
portfolio_value = [100]
rand_values = np.random.rand(600)
interpolated_returns = np.interp(rand_values,cdf_values,x)
interpolated_returns = np.add(interpolated_returns,1)
for j in range(1,len(interpolated_returns)+1):
portfolio_value.append(interpolated_returns[j-1]*portfolio_value[j-1])
portfolio_value[j] = portfolio_value[j]-1
portfolio_final.append(portfolio_value[-1])
print (np.mean(portfolio_final))
I couldn't find a way to write this code using numpy. I was having a look at iterations using nditer but I was unable to move ahead with that.
I guess the easiest way to figure out how you can vectorize your stuff would be to look at the equations that govern your evolution and see how your portfolio actually iterates, finding patterns that could be vectorized instead of trying to vectorize the code you already have. You would have noticed that the cumprod actually appears quite often in your iterations.
Nevertheless you can find the semi-vectorized code below. I included your code as well such that you can compare the results. I also included a simple loop version of your code which is much easier to read and translatable into mathematical equations. So if you share this code with somebody else I would definitely use the simple loop option. If you want some fancy-pants vectorizing you can use the vector version. In case you need to keep track of your single steps you can also add an array to the simple loop option and append the pv at every step.
Hope that helps.
Edit: I have not tested anything for speed. That's something you can easily do yourself with timeit.
import numpy as np
from scipy.special import erf
# Prepare simple return model - Normal distributed with mu &sigma = 0.01
x = np.linspace(-10,10,100)
cdf_values = 0.5*(1+erf((x-0.01)/(0.01*np.sqrt(2))))
# Prepare setup such that every code snippet uses the same number of steps
# and the same random numbers
nSteps = 600
nIterations = 1
rnd = np.random.rand(nSteps)
# Your code - Gives the (supposedly) correct results
portfolio_final = []
for i in range(nIterations):
portfolio_value = [100]
rand_values = rnd
interpolated_returns = np.interp(rand_values,cdf_values,x)
interpolated_returns = np.add(interpolated_returns,1)
for j in range(1,len(interpolated_returns)+1):
portfolio_value.append(interpolated_returns[j-1]*portfolio_value[j-1])
portfolio_value[j] = portfolio_value[j]-1
portfolio_final.append(portfolio_value[-1])
print (np.mean(portfolio_final))
# Using vectors
portfolio_final = []
for i in range(nIterations):
portfolio_values = np.ones(nSteps)*100.0
rcp = np.cumprod(np.interp(rnd,cdf_values,x) + 1)
portfolio_values = rcp * (portfolio_values - np.cumsum(1.0/rcp))
portfolio_final.append(portfolio_values[-1])
print (np.mean(portfolio_final))
# Simple loop
portfolio_final = []
for i in range(nIterations):
pv = 100
rets = np.interp(rnd,cdf_values,x) + 1
for i in range(nSteps):
pv = pv * rets[i] - 1
portfolio_final.append(pv)
print (np.mean(portfolio_final))
Forget about np.nditer. It does not improve the speed of iterations. Only use if you intend to go one and use the C version (via cython).
I'm puzzled about that inner loop. What is it supposed to be doing special? Why the loop?
In tests with simulated values these 2 blocks of code produce the same thing:
interpolated_returns = np.add(interpolated_returns,1)
for j in range(1,len(interpolated_returns)+1):
portfolio_value.append(interpolated_returns[j-1]*portfolio[j-1])
portfolio_value[j] = portfolio_value[j]-1
interpolated_returns = (interpolated_returns+1)*portfolio - 1
portfolio_value = portfolio_value + interpolated_returns.tolist()
I assuming that interpolated_returns and portfolio are 1d arrays of the same length.

Blitz code produces different output

I want to use weave.blitz to improve the performance of the following numpy code:
def fastIteration(self):
g = self.grid
nx,ny = g.ux.shape
uxold = g.old_ux
ux = g.ux
ux[0:,1:-1] = uxold[0:,1:-1] + ReI* (uxold[0:,2:] - 2*uxold[0:,1:-1] + uxold[0:,0:-2])
g.setBC()
g.old_ux = ux.copy()
In this code g is the computational grid. Which consist of two different fields ux and uxold. The old is simply used for temporary storage of the variables. In the complete Code around 95% of the runtime is spend in the fastIteration method, therefore even a simple performance gain would reduce the amount of hours spend executing this code significantly.
The output of the numpy method looks as if:
As this code is my bottleneck I want to improve the speed by using weave blitz. This method looks like:
def blitzIteration(self):
### does not work correct so far
g = self.grid
nx,ny = g.ux.shape
uxold = g.old_ux
ux = g.ux
expr = "ux[0:,1:-1] = uxold[0:,1:-1] + ReI* (uxold[0:,2:] - 2*uxold[0:,1:-1] + uxold[0:,0:-2])"
weave.blitz(expr, check_size=0)
g.setBC()
g.old_ux = ux.copy()
However this does not produce the correct output:
It looks like a bug in weave.blitz (reproduced, filed and fixed. There's more information about the actual bug there).
I thought it was odd to write 0: instead of the shorter : to get a full slice so I replaced all those slices and voilĂ , it worked.
I don't really know where the bug lies, but the expr_code generated by weave.blitz is slightly different:
When using 0:
ipdb> expr_code
'ux_blitz_buggy(blitz::Range(0,_end),blitz::Range(1,Nux_blitz_buggy(1)-1-1))=uxold(blitz::Range(0,_end),blitz::Range(1,Nuxold(1)-1-1))+ReI*(uxold(blitz::Range(0,_end),blitz::Range(2,_end))-2*uxold(blitz::Range(0,_end),blitz::Range(1,Nuxold(1)-1-1))+uxold(blitz::Range(0,_end),blitz::Range(0,Nuxold(1)-2-1)));\n'
When using :
ipdb> expr_code
'ux_blitz_not_buggy(_all,blitz::Range(1,Nux_blitz_not_buggy(1)-1-1))=uxold(_all,blitz::Range(1,Nuxold(1)-1-1))+ReI*(uxold(_all,blitz::Range(2,_end))-2*uxold(_all,blitz::Range(1,Nuxold(1)-1-1))+uxold(_all,blitz::Range(0,Nuxold(1)-2-1)));\n'
So, blitz::Range(0,_end) becomes _all and they behave in a different way.
For convenience, here is a complete script that reproduces the problem and will only succeed while the problem exists.
import numpy as np
from scipy.weave import blitz
def test_blitz_bug(N=4):
ReI = 1.2
ux_blitz_buggy, ux_blitz_not_buggy, ux_np = np.zeros((N, N)), np.zeros((N, N)), np.zeros((N, N))
uxold = np.random.randn(N, N)
ux_np[0:,1:-1] = uxold[0:,1:-1] + ReI* (uxold[0:,2:] - 2*uxold[0:,1:-1] + uxold[0:,0:-2])
expr_buggy = 'ux_blitz_buggy[0:,1:-1] = uxold[0:,1:-1] + ReI* (uxold[0:,2:] - 2*uxold[0:,1:-1] + uxold[0:,0:-2])'
expr_not_buggy = 'ux_blitz_not_buggy[:,1:-1] = uxold[:,1:-1] + ReI* (uxold[:,2:] - 2*uxold[:,1:-1] + uxold[:,0:-2])'
blitz(expr_buggy)
blitz(expr_not_buggy)
assert not np.allclose(ux_blitz_buggy, ux_np)
assert np.allclose(ux_blitz_not_buggy, ux_np)
if __name__ == '__main__':
test_blitz_bug()

Is vectorizing this triple for loop in Python / Numpy possible?

I am trying to speed up my code which currently takes a little over an hour to run in Python / Numpy. The majority of computation time occurs in the function pasted below.
I'm trying to vectorize Z, but I'm finding it rather difficult for a triple for loop. Could I possible implement the numpy.diff function somewhere? Take a look:
def MyFESolver(KK,D,r,Z):
global tdim
global xdim
global q1
global q2
for k in range(1,tdim):
for i in range(1,xdim-1):
for j in range (1,xdim-1):
Z[k,i,j]=Z[k-1,i,j]+r*q1*Z[k-1,i,j]*(KK-Z[k-1,i,j])+D*q2*(Z[k-1,i-1,j]-4*Z[k-1,i,j]+Z[k-1,i+1,j]+Z[k-1,i,j-1]+Z[k-1,i,j+1])
return Z
tdim = 75 xdim = 25
I agree, it's tricky because the BCs on all four sides, ruin the simple structure of the Stiffness matrix. You can get rid of the space loops as such:
from pylab import *
from scipy.sparse.lil import lil_matrix
tdim = 3; xdim = 4; r = 1.0; q1, q2 = .05, .05; KK= 1.0; D = .5 #random values
Z = ones((tdim, xdim, xdim))
#Iterate in time
for k in range(1,tdim):
Z_prev = Z[k-1,:,:] #may need to flatten
Z_up = Z_prev[1:-1,2:]
Z_down = Z_prev[1:-1,:-2]
Z_left = Z_prev[:-2,1:-1]
Z_right = Z_prev[2:,1:-1]
centre_term = (q1*r*(Z_prev[1:-1,1:-1] + KK) - 4*D*q2)* Z_prev[1:-1,1:-1]
Z[k,1:-1,1:-1]= Z_prev[1:-1,1:-1]+ centre_term + q2*(Z_up+Z_left+Z_right+Z_down)
But I don't think you can get rid of the time loop...
I think the expression:
Z_up = Z_prev[1:-1,2:]
makes a copy in numpy, whereas what you want is a view - if you can figure out how to do this - it should be even faster (how much?)
Finally, I agree with the rest of the answerers - from experience, this kind of loops are better done in C and then wrapped into numpy. But the above should be faster than the original...
This looks like an ideal case for Cython. I'd suggest writing that function in Cython, it'll probably be hundreds of times faster.

Categories

Resources