How to create a time matrix efficiently - Python

I have the following question. I have a function get_time that returns the travel time between two coordinates, and I would like to create a time matrix.
Here is my code:
def time_matrix(coordinates):
    times = np.zeros((len(coordinates), len(coordinates)), dtype=float)
    for i in range(len(coordinates)):
        for j in range(len(coordinates)):
            time = get_time(
                coordinates[i][0], coordinates[i][1], coordinates[j][0], coordinates[j][1]
            ) / 60
            times[i][j] = time
    return times.tolist()
My function works, but it is very inefficient. times is symmetric, so it would be better to compute each time once and use it twice. In other words, I don't want to compute the full result row by row. Can you tell me how I can modify my function, please?

If you just want to reuse each evaluated time, you can achieve that simply by changing the assignment line and limiting the second loop:
def time_matrix(coordinates):
    times = np.zeros((len(coordinates), len(coordinates)), dtype=float)
    for i in range(len(coordinates)):
        for j in range(i, len(coordinates)):
            time = get_time(
                coordinates[i][0], coordinates[i][1], coordinates[j][0], coordinates[j][1]
            ) / 60
            times[i][j] = times[j][i] = time
    return times.tolist()
This should work, right?
But a vectorized get_time() would be better, of course.
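For reference, here is a minimal sketch of what a fully vectorized variant could look like. It assumes get_time can be re-expressed with NumPy operations on whole arrays; the name get_time_vectorized is a hypothetical stand-in for such a function, not something from the original question.
import numpy as np

def time_matrix_vectorized(coordinates, get_time_vectorized):
    # coordinates as an (n, 2) array of (lat, lon) pairs
    coords = np.asarray(coordinates, dtype=float)
    lat, lon = coords[:, 0], coords[:, 1]
    # broadcast every point against every other point in one call,
    # then convert to minutes as in the original code
    times = get_time_vectorized(lat[:, None], lon[:, None], lat[None, :], lon[None, :]) / 60
    return times.tolist()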

Related

How to speed up a python 2 program with multiple nested for-loops

This code has multiple for-loops and the lists I read in have 999 points each. I want to iterate this up to 10,000 times. However, even iterating it only 2 times takes nearly 10 minutes.
Even though I'm posting this specific code, I think an answer to my question can help others run their codes with a lot of data more quickly.
Any of your advice is appreciated. Thanks a lot.
What this code does: Basically, I'm reading in arrays from a text file as lists. Each list (e.g. x1, y1, z1, etc.) has 999 elements. I operate on each element in the list based on the other elements (the two inner loops). The end result is a totally new list which I've called x2. The code is then supposed to repeat the operations "n # of times" (the outer loop).
My issue is that I can only repeat this for a few iterations before it just takes too long to execute.
import matplotlib.pyplot as plt
from astropy.table import Table
from astropy.io import ascii
import numpy as np
import argparse
import time

#for 200
start_time = time.time()

npoints=999

n1, mass1, x1, y1,z1,vx1,vy1,vz1,fx_list,fy_list,fz_list= [],[],[],[],[],[],[],[],[],[],[]
AngL_list=[]
Etot0_list=[]

G=1
dt=.01

with open('homo_sph_N1000_R3_v1.dat') as f:
    for row in f.readlines():
        if not row.startswith("#"):
            spaces=row.split(' ')
            n1.append(float(spaces[0]))
            mass1.append(float(spaces[1]))
            x1.append(float(spaces[2]))
            y1.append(float(spaces[3]))
            z1.append(float(spaces[4]))
            vx1.append(float(spaces[5]))
            vy1.append(float(spaces[6]))
            vz1.append(float(spaces[7]))

for n in range(2):
    #changes the particle on which the forces are acting
    for xn in range(0,npoints):
        #changes the forces from other particles acting on the particle
        for step in range(0,npoints):
            #Here we find the accelearation for every particle
            fx=((G*mass1[xn]*mass1[step+1]*((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.)))/ ( abs((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.))**2.+(.2)**2 )**(3./2.))
            fy=((G*mass1[xn]*mass1[step+1]*((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.)))/ ( abs((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.))**2+(.2)**2 )**(3./2.))
            fz=((G*mass1[xn]*mass1[step+1]*((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.)))/ ( abs((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.))**2+(.2)**2 )**(3./2.))
            #Then put store it in an array
            fx_list.append(fx)
            fy_list.append(fy)
            fz_list.append(fz)

    #Now, I need to split that array up by npoints, each particle has npoints forces acting on it.
    fxx= np.array_split(fx_list,npoints)
    fyy= np.array_split(fy_list,npoints)
    fzz= np.array_split(fz_list,npoints)

    #since the force on a particle is the sum of all forces acting on it, I'm summing each variable in each array together. e.g. [1,2,3]=[6]
    fxxx_list=[]
    fyyy_list=[]
    fzzz_list=[]
    for xn in range(0,npoints):
        fxxx= np.sum(fxx[xn])
        fyyy= np.sum(fyy[xn])
        fzzz= np.sum(fzz[xn])
        #and save that in array. Now I have the accelearation on each particle.
        fxxx_list.append(fxxx)
        fyyy_list.append(fyyy)
        fzzz_list.append(fzzz)

    #This is where i begin the integration
    vx2=[]
    vy2=[]
    vz2=[]
    for xn in range(0,npoints):
        vx11=vx1[xn]+.5*(fxxx_list[xn]+fxxx_list[xn])*dt
        vy11=vy1[xn]+.5*(fyyy_list[xn]+fyyy_list[xn])*dt
        vz11=vz1[xn]+.5*(fzzz_list[xn]+fyyy_list[xn])*dt
        vx2.append(vx11)
        vy2.append(vy11)
        vz2.append(vz11)

    x2=[]
    y2=[]
    z2=[]
    for xn in range(0,npoints):
        x11=(x1[xn]+vx2[xn]*dt)+(.5*fxxx_list[xn]*(dt**2))
        y11=(y1[xn]+vy2[xn]*dt)+(.5*fyyy_list[xn]*(dt**2))
        z11=(z1[xn]+vz2[xn]*dt)+(.5*fzzz_list[xn]*(dt**2))
        x2.append(x11)
        y2.append(y11)
        z2.append(z11)

    x1,y1,z1,vx1,vy1,vz1 = x2,y2,z2,vx2,vy2,vz2

print x2,y2
plt.scatter(x2,y2)

print("--- %s seconds ---" % (time.time() - start_time))
plt.show()
It's only a small speed-up, but the code seems to be doing a lot of x**2 (x squared).
In Python 3, x**2 is generally slower to execute than x*x. Consider a simple test program:
import time

iteration_count=99999999

# Do a lot of squaring with the ** operator
start1 = time.time()
sum = 0
for i in range( iteration_count ):
    sum += i ** 2
end1 = time.time()

# Do a lot of squaring with i * i
start2 = time.time()
sum = 0
for i in range( iteration_count ):
    sum += i * i
end2 = time.time()

print("x**2 => %f seconds" % (end1-start1))
print("x*x => %f seconds" % (end2-start2))
Which gives me the results:
$ python3 ./squared.py
x**2 => 21.347830 seconds
x*x => 8.983334 seconds
I ran it a bunch of times; the results don't vary much.
The code in the question does a lot of work to compute fx, fy and fz (these appear to be identical for each; is that correct?). Wherever these computations share sub-expressions, the intermediate results should be pulled out and calculated only once.
For example, instead of:
fx=((G*mass1[xn]*mass1[step+1]*((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.) ...
fy=((G*mass1[xn]*mass1[step+1]*((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.) ...
fz=((G*mass1[xn]*mass1[step+1]*((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.) ...
The first part should be computed only once:
g_mass = G*mass1[xn]*mass1[step+1]
fx=((g_mass * ((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.) ...
fy=((g_mass * ((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.) ...
fz=((g_mass * ((x1[step+1]**2.+y1[step+1]**2.+z1[step+1]**2.) ...
And similarly for any parts of these formulae with common components.
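As a rough illustration of that idea, here is a sketch of the inner-loop body with the shared pieces hoisted out (the names r2_step, r2_xn, dr2 and denom are made up for this sketch):
g_mass = G * mass1[xn] * mass1[step+1]
r2_step = x1[step+1]**2. + y1[step+1]**2. + z1[step+1]**2.
r2_xn = x1[xn]**2. + y1[xn]**2. + z1[xn]**2.
dr2 = r2_step - r2_xn                       # appears in numerator and denominator
denom = (abs(dr2)**2. + .2**2) ** (3./2.)   # softened denominator from the question
fx = g_mass * dr2 / denom                   # fy and fz are built from the same pieces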
I think you should be able to get a big speed-up (maybe of order a factor 1000) if you convert your inputs to numpy arrays, do the operations on those arrays, and preallocate numpy arrays to their required size, rather than using lists and appending to them as you go.
For example, just taking the start of your code, you could do something like the following (I can't guarantee it does exactly what you want; it is intended as guidance):
with open('homo_sph_N1000_R3_v1.dat') as f:
    for row in f.readlines():
        if not row.startswith("#"):
            spaces=row.split(' ')
            n1.append(float(spaces[0]))
            mass1.append(float(spaces[1]))
            x1.append(float(spaces[2]))
            y1.append(float(spaces[3]))
            z1.append(float(spaces[4]))
            vx1.append(float(spaces[5]))
            vy1.append(float(spaces[6]))
            vz1.append(float(spaces[7]))

# convert to numpy arrays
n1 = np.array(n1)
mass1 = np.array(mass1)
# KEEP DOING THIS FOR THE OTHER INPUTS

for n in range(2):
    # PREALLOCATE
    fx = np.zeros((npoints, npoints-1))
    fy = np.zeros((npoints, npoints-1))
    fz = np.zeros((npoints, npoints-1))

    #changes the particle on which the forces are acting
    for xn in range(0,npoints):
        #changes the forces from other particles acting on the particle
        # REMOVE THE INNER FOR LOOP AND JUST USE THE ARRAYS
        #for step in range(0,npoints):
        #Here we find the accelearation for every particle
        fx[xn] = ((G*mass1[xn]*mass1[1:]*((x1[1:]**2.+y1[1:]**2.+z1[1:]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.)))/ ( abs((x1[1:]**2.+y1[1:]**2.+z1[1:]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.))**2.+(.2)**2 )**(3./2.))
        fy[xn] = ((G*mass1[xn]*mass1[1:]*((x1[1:]**2.+y1[1:]**2.+z1[1:]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.)))/ ( abs((x1[1:]**2.+y1[1:]**2.+z1[1:]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.))**2+(.2)**2 )**(3./2.))
        fz[xn] = ((G*mass1[xn]*mass1[1:]*((x1[1:]**2.+y1[1:]**2.+z1[1:]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.)))/ ( abs((x1[1:]**2.+y1[1:]**2.+z1[1:]**2.)-(x1[xn]**2.+y1[xn]**2.+z1[xn]**2.))**2+(.2)**2 )**(3./2.))

    #Now, I need to split that array up by npoints, each particle has npoints forces acting on it.
    fxx= np.array_split(fx,npoints)
    fyy= np.array_split(fy,npoints)
    fzz= np.array_split(fz,npoints)

    #since the force on a particle is the sum of all forces acting on it, I'm summing each variable in each array together. e.g. [1,2,3]=[6]
    fxxx= np.sum(fxx[xn], axis=1)
    fyyy= np.sum(fyy[xn], axis=1)
    fzzz= np.sum(fzz[xn], axis=1)
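As a follow-up to that sketch: once fx, fy and fz are plain 2-D arrays, the array_split step is no longer needed at all, because the per-particle totals are just row sums (again only a sketch, under the same assumptions as the code above):
# each row of fx holds the x-components of the forces from all other
# particles on one particle, so summing along axis 1 gives the totals
fxxx_list = fx.sum(axis=1)
fyyy_list = fy.sum(axis=1)
fzzz_list = fz.sum(axis=1)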

Vectorize python code for improved performance

I am writing a scientific code in python to calculate the energy of a system.
Here is my function: cte1, cte2, cte3, cte4 are constants computed beforehand; pii is np.pi (also stored beforehand, since recomputing it inside the loop slows things down). I calculate the 3 components of the total energy, then sum them up.
def calc_energy(diam):
    Energy1 = cte2*((pii*diam**2/4)*t)
    Energy2 = cte4*(pii*diam)*t
    d=diam/t
    u=np.sqrt((d)**2/(1+d**2))
    cc= u**2
    E = sp.special.ellipe(cc)
    K = sp.special.ellipk(cc)
    Id=cte3*d*(d**2+(1-d**2)*E/u-K/u)
    Energy3 = cte*t**3*Id
    total_energy = Energy1+Energy2+Energy3
    return (total_energy,Energy1)
My first idea was to simply loop over all values of the diameter:
start_diam, stop_diam, step_diam = 1e-10, 500e-6, 1e-9  #Diametre
diametres = np.arange(start_diam,stop_diam,step_diam)

for d in diametres:
    res1,res2 = calc_energy(d)
    totalEnergy.append(res1)
    Energy1.append(res2)
In an attempt to speed up the calculations, I decided to use numpy to vectorize, as shown below:
diams = diametres.reshape(-1,1) #If not reshaped, calculations won't run
r1 = np.apply_along_axis(calc_energy,1,diams)
However, the "vectorized" solution does not work properly: when I time it, I get 5 seconds for the first solution and 18 seconds for the second one.
I guess I'm doing something the wrong way but can't figure out what.
With your current approach, you're applying a Python function to each element of your array, which carries additional overhead. Instead, you can pass the whole array to your function and get an array of answers back. Your existing function appears to work fine without any modification.
import numpy as np
from scipy import special

cte = 2
cte1 = 2
cte2 = 2
cte3 = 2
cte4 = 2
pii = np.pi
t = 2

def calc_energy(diam):
    Energy1 = cte2*((pii*diam**2/4)*t)
    Energy2 = cte4*(pii*diam)*t
    d=diam/t
    u=np.sqrt((d)**2/(1+d**2))
    cc= u**2
    E = special.ellipe(cc)
    K = special.ellipk(cc)
    Id=cte3*d*(d**2+(1-d**2)*E/u-K/u)
    Energy3 = cte*t**3*Id
    total_energy = Energy1+Energy2+Energy3
    return (total_energy,Energy1)

start_diam, stop_diam, step_diam = 1e-10, 500e-6, 1e-9  #Diametre
diametres = np.arange(start_diam,stop_diam,step_diam)

a = calc_energy(diametres)  # Pass the whole array
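A quick way to compare the two approaches is timeit; something along these lines (a sketch that reuses the definitions above; the exact numbers will of course depend on the machine):
import timeit

def loop_version():
    total, e1 = [], []
    for d in diametres:
        res1, res2 = calc_energy(d)
        total.append(res1)
        e1.append(res2)
    return total, e1

def array_version():
    return calc_energy(diametres)   # one call on the whole array

print(timeit.timeit(loop_version, number=1))
print(timeit.timeit(array_version, number=1))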

How do I vectorize the following loop in Numpy?

"""Some simulations to predict the future portfolio value based on past distribution. x is
a numpy array that contains past returns.The interpolated_returns are the returns
generated from the cdf of the past returns to simulate future returns. The portfolio
starts with a value of 100. portfolio_value is filled up progressively as
the program goes through every loop. The value is multiplied by the returns in that
period and a dollar is removed."""
portfolio_final = []
for i in range(10000):
portfolio_value = [100]
rand_values = np.random.rand(600)
interpolated_returns = np.interp(rand_values,cdf_values,x)
interpolated_returns = np.add(interpolated_returns,1)
for j in range(1,len(interpolated_returns)+1):
portfolio_value.append(interpolated_returns[j-1]*portfolio_value[j-1])
portfolio_value[j] = portfolio_value[j]-1
portfolio_final.append(portfolio_value[-1])
print (np.mean(portfolio_final))
I couldn't find a way to write this code using numpy. I had a look at iterating with nditer but I was unable to get anywhere with that.
I guess the easiest way to figure out how to vectorize this is to look at the equations that govern your evolution and see how your portfolio actually iterates, finding patterns that can be vectorized, rather than trying to vectorize the code you already have. You will notice that a cumulative product (cumprod) appears naturally in the iteration.
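To make that concrete: the recursion is pv_n = pv_{n-1} * r_n - 1. Dividing by the cumulative product R_n = r_1 * ... * r_n and summing the resulting telescoping terms gives pv_n = R_n * (pv_0 - sum_{k<=n} 1/R_k), which is exactly the cumprod/cumsum form used in the vectorized code below. A tiny check of that identity (a sketch with made-up gross returns):
import numpy as np

rets = np.array([1.02, 0.97, 1.05, 1.01])   # made-up gross returns r_n
pv = 100.0
for r in rets:                               # plain recursion pv_n = pv_{n-1}*r_n - 1
    pv = pv * r - 1

rcp = np.cumprod(rets)                       # R_n
pv_closed = rcp[-1] * (100.0 - np.sum(1.0 / rcp))

print(pv, pv_closed)                         # both print the same value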
Nevertheless, you can find the semi-vectorized code below. I included your code as well so that you can compare the results, plus a simple loop version of your code which is much easier to read and translates directly into mathematical equations. So if you share this code with somebody else I would definitely use the simple loop option. If you want some fancy-pants vectorizing you can use the vector version. In case you need to keep track of the single steps, you can also add an array to the simple loop option and append the pv at every step.
Hope that helps.
Edit: I have not tested anything for speed. That's something you can easily do yourself with timeit.
import numpy as np
from scipy.special import erf

# Prepare simple return model - normally distributed with mu & sigma = 0.01
x = np.linspace(-10,10,100)
cdf_values = 0.5*(1+erf((x-0.01)/(0.01*np.sqrt(2))))

# Prepare setup such that every code snippet uses the same number of steps
# and the same random numbers
nSteps = 600
nIterations = 1
rnd = np.random.rand(nSteps)

# Your code - Gives the (supposedly) correct results
portfolio_final = []
for i in range(nIterations):
    portfolio_value = [100]
    rand_values = rnd
    interpolated_returns = np.interp(rand_values,cdf_values,x)
    interpolated_returns = np.add(interpolated_returns,1)
    for j in range(1,len(interpolated_returns)+1):
        portfolio_value.append(interpolated_returns[j-1]*portfolio_value[j-1])
        portfolio_value[j] = portfolio_value[j]-1
    portfolio_final.append(portfolio_value[-1])
print (np.mean(portfolio_final))

# Using vectors
portfolio_final = []
for i in range(nIterations):
    portfolio_values = np.ones(nSteps)*100.0
    rcp = np.cumprod(np.interp(rnd,cdf_values,x) + 1)
    portfolio_values = rcp * (portfolio_values - np.cumsum(1.0/rcp))
    portfolio_final.append(portfolio_values[-1])
print (np.mean(portfolio_final))

# Simple loop
portfolio_final = []
for i in range(nIterations):
    pv = 100
    rets = np.interp(rnd,cdf_values,x) + 1
    for i in range(nSteps):
        pv = pv * rets[i] - 1
    portfolio_final.append(pv)
print (np.mean(portfolio_final))
Forget about np.nditer. It does not improve the speed of iteration. Only use it if you intend to go on and use the C version (via Cython).
I'm puzzled about that inner loop. What is it supposed to be doing that's special? Why the loop?
In tests with simulated values these 2 blocks of code produce the same thing:
interpolated_returns = np.add(interpolated_returns,1)
for j in range(1,len(interpolated_returns)+1):
    portfolio_value.append(interpolated_returns[j-1]*portfolio[j-1])
    portfolio_value[j] = portfolio_value[j]-1

interpolated_returns = (interpolated_returns+1)*portfolio - 1
portfolio_value = portfolio_value + interpolated_returns.tolist()
I'm assuming that interpolated_returns and portfolio are 1d arrays of the same length.
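A quick sanity check of that equivalence, as a sketch with small made-up arrays (portfolio_value is assumed to start as a plain list, as in the question):
import numpy as np

interpolated_returns = np.array([0.02, -0.01, 0.03])
portfolio = np.array([100.0, 101.0, 99.0])

# loop version
portfolio_value = [100]
rets = np.add(interpolated_returns, 1)
for j in range(1, len(rets) + 1):
    portfolio_value.append(rets[j-1] * portfolio[j-1])
    portfolio_value[j] = portfolio_value[j] - 1

# vectorized version
vec = (interpolated_returns + 1) * portfolio - 1
portfolio_value_vec = [100] + vec.tolist()

print(portfolio_value)        # same values...
print(portfolio_value_vec)    # ...as here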

Optimizing Algorithm of large dataset calculations

Once again I find myself stumped with pandas and how best to perform a 'vector operation'. My code works, but it takes a very long time to iterate through everything.
What the code is trying to do is loop through shapes.csv and determine which shape_pt_sequence is a stop_id, then assign the stop_lat and stop_lon to shape_pt_lat and shape_pt_lon, while also marking that shape_pt_sequence as is_stop.
GISTS
stop_times.csv LINK
trips.csv LINK
shapes.csv LINK
Here is my code:
import pandas as pd
from haversine import *

'''
iterate through shapes and match stops along a shape_pt_sequence within
x amount of distance. for shape_pt_sequence that is closest, replace the stop
lat/lon to the shape_pt_lat/shape_pt_lon, and mark is_stop column with 1.
'''

# readability assignments for shapes.csv
shapes = pd.read_csv('csv/shapes.csv')
shapes_index = list(set(shapes['shape_id']))
shapes_index.sort(key=int)
shapes.set_index(['shape_id', 'shape_pt_sequence'], inplace=True)

# readability assignments for trips.csv
trips = pd.read_csv('csv/trips.csv')
trips_index = list(set(trips['trip_id']))
trips.set_index(['trip_id'], inplace=True)

# readability assignments for stops_times.csv
stop_times = pd.read_csv('csv/stop_times.csv')
stop_times.set_index(['trip_id','stop_sequence'], inplace=True)
print(len(stop_times.loc[1423492]))

# readability assignments for stops.csv
stops = pd.read_csv('csv/stops.csv')
stops.set_index(['stop_id'], inplace=True)

# for each trip_id
for i in trips_index:
    print('******NEW TRIP_ID******')
    print(i)
    i = i.astype(int)
    # for each stop_sequence in stop_times
    for x in range(len(stop_times.loc[i])):
        stop_lat = stop_times.loc[i,['stop_lat','stop_lon']].iloc[x,[0,1]][0]
        stop_lon = stop_times.loc[i,['stop_lat','stop_lon']].iloc[x,[0,1]][1]
        stop_coordinate = (stop_lat, stop_lon)
        print(stop_coordinate)
        # shape_id that matches trip_id
        print('**SHAPE_ID**')
        trips_shape_id = trips.loc[i,['shape_id']].iloc[0]
        trips_shape_id = int(trips_shape_id)
        print(trips_shape_id)
        smallest = 0
        for y in range(len(shapes.loc[trips_shape_id])):
            shape_lat = shapes.loc[trips_shape_id].iloc[y,[0,1]][0]
            shape_lon = shapes.loc[trips_shape_id].iloc[y,[0,1]][1]
            shape_coordinate = (shape_lat, shape_lon)
            haversined = haversine_mi(stop_coordinate, shape_coordinate)
            if smallest == 0 or haversined < smallest:
                smallest = haversined
                smallest_shape_pt_indexer = y
            else:
                pass
            print(haversined)
            print('{0:.20f}'.format(smallest))
        print('{0:.20f}'.format(smallest))
        print(smallest_shape_pt_indexer)
        # mark is_stop as 1
        shapes.iloc[smallest_shape_pt_indexer,[2]] = 1
        # replace coordinate value
        shapes.loc[trips_shape_id].iloc[y,[0,1]][0] = stop_lat
        shapes.loc[trips_shape_id].iloc[y,[0,1]][1] = stop_lon
shapes.to_csv('csv/shapes.csv', index=False)
What you could do to optimize this code is to use several worker processes instead of those plain for loops.
I recommend using a Pool of workers, as it is very simple to use.
In:
for i in trips_index:
You could use something like:
from multiprocessing import Pool

pool = Pool(processes=4)
results = pool.map(func, trips_index)
And then the method func would look like:
def func(i):
    # code here
You can simply move the body of the for loop into this method.
With 4 worker processes, as in this example, that should give a nice improvement.
One thing to consider is that a collection of trips will often share the same sequence of stops and the same shape data (the only difference between trips is the timing). So it might make sense to cache the find-closest-point-on-shape operation per (stop_id, shape_id) pair; see the sketch below. I bet that would reduce your runtime by an order of magnitude.
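A minimal sketch of that idea, assuming the closest-point search is factored out into a helper (closest_shape_point and shape_points are hypothetical names, not from the original code):
# cache the result per (stop_id, shape_id) pair, so trips that share a stop
# and a shape reuse the earlier computation instead of redoing the search
closest_point_cache = {}

def closest_shape_point(stop_id, shape_id, stop_coordinate, shape_points):
    key = (stop_id, shape_id)
    if key not in closest_point_cache:
        # shape_points is assumed to be a list of (lat, lon) tuples for shape_id
        distances = [haversine_mi(stop_coordinate, p) for p in shape_points]
        closest_point_cache[key] = distances.index(min(distances))
    return closest_point_cache[key]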

Is vectorizing this triple for loop in Python / Numpy possible?

I am trying to speed up my code which currently takes a little over an hour to run in Python / Numpy. The majority of computation time occurs in the function pasted below.
I'm trying to vectorize Z, but I'm finding it rather difficult for a triple for loop. Could I possibly use the numpy.diff function somewhere? Take a look:
def MyFESolver(KK,D,r,Z):
    global tdim
    global xdim
    global q1
    global q2
    for k in range(1,tdim):
        for i in range(1,xdim-1):
            for j in range(1,xdim-1):
                Z[k,i,j]=Z[k-1,i,j]+r*q1*Z[k-1,i,j]*(KK-Z[k-1,i,j])+D*q2*(Z[k-1,i-1,j]-4*Z[k-1,i,j]+Z[k-1,i+1,j]+Z[k-1,i,j-1]+Z[k-1,i,j+1])
    return Z

tdim = 75
xdim = 25
I agree, it's tricky because the BCs on all four sides ruin the simple structure of the stiffness matrix. You can get rid of the spatial loops like this:
from pylab import *
from scipy.sparse.lil import lil_matrix

tdim = 3; xdim = 4; r = 1.0; q1, q2 = .05, .05; KK = 1.0; D = .5  # random values
Z = ones((tdim, xdim, xdim))

# Iterate in time
for k in range(1,tdim):
    Z_prev = Z[k-1,:,:]  # may need to flatten
    Z_up = Z_prev[1:-1,2:]
    Z_down = Z_prev[1:-1,:-2]
    Z_left = Z_prev[:-2,1:-1]
    Z_right = Z_prev[2:,1:-1]
    # centre and neighbour terms grouped to match the stencil in the question
    centre_term = (q1*r*(KK - Z_prev[1:-1,1:-1]) - 4*D*q2) * Z_prev[1:-1,1:-1]
    Z[k,1:-1,1:-1] = Z_prev[1:-1,1:-1] + centre_term + D*q2*(Z_up+Z_left+Z_right+Z_down)
But I don't think you can get rid of the time loop...
One more note: an expression like
Z_up = Z_prev[1:-1,2:]
uses basic slicing, which in numpy already returns a view rather than a copy, so those assignments do not add any copying overhead.
Finally, I agree with the other answerers: from experience, this kind of loop is better written in C and then wrapped for numpy. But the above should be faster than the original...
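For what it's worth, here is a small check that the sliced update reproduces the triple loop on a tiny grid (a sketch that re-implements both versions with the stencil from the question):
import numpy as np

tdim, xdim = 5, 6
r, q1, q2, KK, D = 1.0, .05, .05, 1.0, .5

rng = np.random.default_rng(0)
Z0 = rng.random((tdim, xdim, xdim))

# triple-loop reference
Z_loop = Z0.copy()
for k in range(1, tdim):
    for i in range(1, xdim-1):
        for j in range(1, xdim-1):
            Z_loop[k,i,j] = (Z_loop[k-1,i,j]
                             + r*q1*Z_loop[k-1,i,j]*(KK - Z_loop[k-1,i,j])
                             + D*q2*(Z_loop[k-1,i-1,j] - 4*Z_loop[k-1,i,j]
                                     + Z_loop[k-1,i+1,j] + Z_loop[k-1,i,j-1]
                                     + Z_loop[k-1,i,j+1]))

# sliced version: same stencil, no spatial loops
Z_vec = Z0.copy()
for k in range(1, tdim):
    Zc = Z_vec[k-1, 1:-1, 1:-1]
    neigh = (Z_vec[k-1, :-2, 1:-1] + Z_vec[k-1, 2:, 1:-1]
             + Z_vec[k-1, 1:-1, :-2] + Z_vec[k-1, 1:-1, 2:])
    Z_vec[k, 1:-1, 1:-1] = Zc + r*q1*Zc*(KK - Zc) + D*q2*(neigh - 4*Zc)

print(np.allclose(Z_loop, Z_vec))   # expected: True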
This looks like an ideal case for Cython. I'd suggest writing that function in Cython; it'll probably be hundreds of times faster.
