Python calculate lots of distances quickly

Python calculate lots of distances quickly - python

I have an input of 36,742 points which means if I wanted to calculate the lower triangle of a distance matrix (using the vincenty approximation) I would need to generate 36,742*36,741*0.5 = 1,349,974,563 distances.
I want to keep the pair combinations which are within 50km of each other. My current set-up is as follows
shops= [[id,lat,lon]...]
def lower_triangle_mat(points):
for i in range(len(shops)-1):
for j in range(i+1,len(shops)):
yield [shops[i],shops[j]]
def return_stores_cutoff(points,cutoff_km=0):
below_cut = []
counter = 0
for x in lower_triangle_mat(points):
dist_km = vincenty(x[0][1:3],x[1][1:3]).km
counter += 1
if counter % 1000000 == 0:
print("%d out of %d" % (counter,(len(shops)*len(shops)-1*0.5)))
if dist_km <= cutoff_km:
below_cut.append([x[0][0],x[1][0],dist_km])
return below_cut
start = time.clock()
stores = return_stores_cutoff(points=shops,cutoff_km=50)
print(time.clock() - start)
This will obviously take hours and hours. Some possibilities I was thinking of:
Use numpy to vectorise these calculations rather than looping through
Use some kind of hashing to get a quick rough-cut off (all stores within 100km) and then only calculate accurate distances between those stores
Instead of storing the points in a list use something like a quad-tree but I think that only helps with the ranking of close points rather than actual distance -> so I guess some kind of geodatabase
I can obviously try the haversine or project and use euclidean distances, however I am interested in using the most accurate measure possible
Make use of parallel processing (however I was having a bit of difficulty coming up how to cut the list to still get all the relevant pairs).
Edit: I think geohashing is definitely needed here - an example from:
from geoindex import GeoGridIndex, GeoPoint
geo_index = GeoGridIndex()
for _ in range(10000):
lat = random.random()*180 - 90
lng = random.random()*360 - 180
index.add_point(GeoPoint(lat, lng))
center_point = GeoPoint(37.7772448, -122.3955118)
for distance, point in index.get_nearest_points(center_point, 10, 'km'):
print("We found {0} in {1} km".format(point, distance))
However, I would also like to vectorise (instead of loop) the distance calculations for the stores returned by the geo-hash.
Edit2: Pouria Hadjibagheri - I tried using lambda and map:
# [B]: Mapping approach
lwr_tr_mat = ((shops[i],shops[j]) for i in range(len(shops)-1) for j in range(i+1,len(shops)))
func = lambda x: (x[0][0],x[1][0],vincenty(x[0],x[1]).km)
# Trying to see if conditional statements slow this down
func_cond = lambda x: (x[0][0],x[1][0],vincenty(x[0],x[1]).km) if vincenty(x[0],x[1]).km <= 50 else None
start = time.clock()
out_dist = list(map(func,lwr_tr_mat))
print(time.clock() - start)
start = time.clock()
out_dist = list(map(func_cond,lwr_tr_mat))
print(time.clock() - start)
And they were all around 61 seconds (I restricted number of stores to 2000 from 32,000). Perhaps I used map incorrectly?

This sounds like a classic use case for k-D trees.
If you first transform your points into Euclidean space then you can use the query_pairs method of scipy.spatial.cKDTree:
from scipy.spatial import cKDTree
tree = cKDTree(data)
# where data is (nshops, ndim) containing the Euclidean coordinates of each shop
# in units of km
pairs = tree.query_pairs(50, p=2) # 50km radius, L2 (Euclidean) norm
pairs will be a set of (i, j) tuples corresponding to the row indices of pairs of shops that are ≤50km from each other.
The output of tree.sparse_distance_matrix is a scipy.sparse.dok_matrix. Since the matrix will be symmetric and you're only interested in unique row/column pairs, you could use scipy.sparse.tril to zero out the upper triangle, giving you a scipy.sparse.coo_matrix. From there you can access the nonzero row and column indices and their corresponding distance values via the .row, .col and .data attributes:
from scipy import sparse
tree_dist = tree.sparse_distance_matrix(tree, max_distance=10000, p=2)
udist = sparse.tril(tree_dist, k=-1) # zero the main diagonal
ridx = udist.row # row indices
cidx = udist.col # column indices
dist = udist.data # distance values

Have you tried mapping entire arrays and functions instead of iterating through them? An example would be as follows:
from numpy.random import rand
my_array = rand(int(5e7), 1) # An array of 50,000,000 random numbers in double.
Now what is normally done is:
squared_list_iter = [value**2 for value in my_array]
Which of course works, but is optimally invalid.
The alternative would be to map the array with a function. This is done as follows:
func = lambda x: x**2 # Here is what I want to do on my array.
squared_list_map = map(func, test) # Here I am doing it!
Now, one might ask, how is this any different, or even better for that matter? Since now we have added a call to a function, too! Here is your answer:
For the former solution (via iteration):
1 loop: 1.11 minutes.
Compared to the latter solution (mapping):
500 loop, on average 560 ns.
Simultaneous conversion of a map() to list by list(map(my_list)) would increase the time by a factor of 10 to approximately 500 ms.
You choose!

Thanks everyone's help. I think I have solved this by incorporating all the suggestions.
I use numpy to import the geographic co-ordinates and then project them using "France Lambert - 93". This lets me fill scipy.spatial.cKDTree with the points and then calculate a sparse_distance_matrix by specifying a cut-off of 50km (my projected points are in metres). I then extract extract the lower-triangle to a CSV.
import numpy as np
import csv
import time
from pyproj import Proj, transform
#http://epsg.io/2154 (accuracy: 1.0m)
fr = '+proj=lcc +lat_1=49 +lat_2=44 +lat_0=46.5 +lon_0=3 \
+x_0=700000 +y_0=6600000 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 \
+units=m +no_defs'
#http://epsg.io/27700-5339 (accuracy: 1.0m)
uk = '+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 \
+x_0=400000 +y_0=-100000 +ellps=airy \
+towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs'
path_to_csv = '.../raw_in.csv'
out_csv = '.../out.csv'
def proj_arr(points):
inproj = Proj(init='epsg:4326')
outproj = Proj(uk)
# origin|destination|lon|lat
func = lambda x: transform(inproj,outproj,x[2],x[1])
return np.array(list(map(func, points)))
tstart = time.time()
# Import points as geographic coordinates
# ID|lat|lon
#Sample to try and replicate
#points = np.array([
# [39007,46.585012,5.5857829],
# [88086,48.192370,6.7296289],
# [62627,50.309155,3.0218611],
# [14020,49.133972,-0.15851507],
# [1091, 42.981765,2.0104902]])
#
points = np.genfromtxt(path_to_csv,
delimiter=',',
skip_header=1)
print("Total points: %d" % len(points))
print("Triangular matrix contains: %d" % (len(points)*((len(points))-1)*0.5))
# Get projected co-ordinates
proj_pnts = proj_arr(points)
# Fill quad-tree
from scipy.spatial import cKDTree
tree = cKDTree(proj_pnts)
cut_off_metres = 1600
tree_dist = tree.sparse_distance_matrix(tree,
max_distance=cut_off_metres,
p=2)
# Extract triangle
from scipy import sparse
udist = sparse.tril(tree_dist, k=-1) # zero the main diagonal
print("Distances after quad-tree cut-off: %d " % len(udist.data))
# Export CSV
import csv
f = open(out_csv, 'w', newline='')
w = csv.writer(f, delimiter=",", )
w.writerow(['id_a','lat_a','lon_a','id_b','lat_b','lon_b','metres'])
w.writerows(np.column_stack((points[udist.row ],
points[udist.col],
udist.data)))
f.close()
"""
Get ID labels
"""
id_to_csv = '...id.csv'
id_labels = np.genfromtxt(id_to_csv,
delimiter=',',
skip_header=1,
dtype='U')
"""
Try vincenty on the un-projected co-ordinates
"""
from geopy.distance import vincenty
vout_csv = '.../out_vin.csv'
test_vin = np.column_stack((points[udist.row].T[1:3].T,
points[udist.col].T[1:3].T))
func = lambda x: vincenty(x[0:2],x[2:4]).m
output = list(map(func,test_vin))
# Export CSV
f = open(vout_csv, 'w', newline='')
w = csv.writer(f, delimiter=",", )
w.writerow(['id_a','id_a2', 'lat_a','lon_a',
'id_b','id_b2', 'lat_b','lon_b',
'proj_metres','vincenty_metres'])
w.writerows(np.column_stack((list(id_labels[udist.row]),
points[udist.row ],
list(id_labels[udist.col]),
points[udist.col],
udist.data,
output,
)))
f.close()
print("Finished in %.0f seconds" % (time.time()-tstart)
This approach took 164 seconds to generate (for 5,306,434 distances) - compared to 9 - and also around 90 seconds to save to disk.
I then compared the difference in the vincenty distance and the hypotenuse distance (on the projected co-ordinates).
The mean difference in metres was 2.7 and the mean difference/metres was 0.0073% - which looks great.

"Use some kind of hashing to get a quick rough-cut off (all stores within 100km) and then only calculate accurate distances between those stores"
I think this might be better called gridding. So first make a dict, with a set of coords as the key and put each shop in a 50km bucket near that point. then when you are calculating distances, you only look in nearby buckets, rather than iterate through each shop in the whole universe

You can use vectorization with the haversine formula discussed in this thread Haversine Formula in Python (Bearing and Distance between two GPS points)
lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
dlon = lon2 - lon1
dlat = lat2 - lat1
a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
c = 2 * np.arcsin(np.sqrt(a))
km = 6371 * c
Here you have the %%timeit for 7 451 653 distances
642 ms ± 20.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Related

Fourier transform and Full Width Half Maximum

I'm trying to calculate the Fourier transform of three muon polarization signals, which are simply cosine functions multiplied by an exponential decay.
So, doing the Fourier transform, we are going to see broadened peaks centered at the corresponding frequency.
The problem is that I have already tried to do the Fourier transform, but I do not know if it's correct; furthermore, I'm trying to calculate the FWHM using the scipy.stats.moment function, using the 2-nd moment: is it correct?
Can you tell me if the code is correct?
I put here the three signals in .npy file and the code used for the Fourier analysis.
The signals are signal[0], signal[1] and signal[2], arrays of 10 dimension.
Each signal[k] contains 10 polarization functions (1 for each applied magnetic field), which are signals of 400 points.
The corresponding files are signal_100, signal_110, signal_111, provided here:
https://github.com/JonathanFrassineti/UNDI-examples.
Ah, the frequencies range from 0 Hz to 40 MHz.
Thank you!
N = 400 # Number of signal points.
N1 = 40000000
T = 1./800. # Sampling spacing.
xf = np.fft.rfftfreq(N1, T)
yf1 = FWHM1 = sigma1 = delta1 = bhar1 = np.zeros(fields, dtype = object)
yf2 = FWHM2 = sigma2 = delta2 = bhar2 = np.zeros(fields, dtype = object)
yf3 = FWHM3 = sigma3 = delta3 = bhar3 = np.zeros(fields, dtype = object)
for j in range(fields):
# Fourier transform.
yf1[j] = np.fft.rfft(signal[0][j])
yf2[j] = np.fft.rfft(signal[1][j])
yf3[j] = np.fft.rfft(signal[2][j])
FWHM1[j] = moment(yf1[j], moment=2)
FWHM2[j] = moment(yf2[j], moment=2)
FWHM3[j] = moment(yf3[j], moment=2)
sigma1[j] = np.sqrt(np.abs(FWHM3[j]))/2.355
sigma2[j] = np.sqrt(np.abs(FWHM2[j]))/2.355
sigma3[j] = np.sqrt(np.abs(FWHM3[j]))/2.355
delta1[j] = sigma1[j]/gamma_Cu
delta2[j] = sigma2[j]/gamma_Cu
delta3[j] = sigma3[j]/gamma_Cu
bhar1[j] = (((a*angtom)**3)/(1e-7*gamma_Cu*hbar))*delta1[j]
bhar2[j] = (((a*angtom)**3)/(1e-7*gamma_Cu*hbar))*delta2[j]
bhar3[j] = (((a*angtom)**3)/(1e-7*gamma_Cu*hbar))*delta3[j]

Currently i work in a python project with same object. I've a set of data of magnetic field B(x,y,z), i think ideal would be to organize your data periodically at event and deduce Fe (sampling_rate).
f(A, t)=A*( cos(2*pi*fe*t) - sin(2*pi*fe*t)
B=[ 50, 50, 10, 3 ] # where each data is |B| normal at second
res=[ f(a, time) for time, a in enumerate(B) ]
fourrier_transform=np.fft.fft( res )
frequency= fftfreq([ time for time in range(len(B)) ]) # U can use fftfreq provide by scipy
Please star this project, research ressource to contribute
RFSignalToolkit github project

Efficient sum of Gaussians in 3D with NumPy using large arrays

I have an M x 3 array of 3D coordinates, coords (M ~1000-10000), and I would like to compute the sum of Gaussians centered at these coordinates over a mesh grid 3D array. The mesh grid 3D array is typically something like 64 x 64 x 64, but sometimes upwards of 256 x 256 x 256, and can go even larger. I’ve followed this question to get started, by converting my meshgrid array into an array of N x 3 coordinates, xyz, where N is 64^3 or 256^3, etc. However, for large array sizes it takes too much memory to vectorize the entire calculation (understandable since it could approach 1e11 elements and consume a terabyte of RAM) so I’ve broken it up into a loop over M coordinates. However, this is too slow.
I’m wondering if there is any way to speed this up at all without overloading memory. By converting the meshgrid to xyz, I feel like I’ve lost any advantage of the grid being equally spaced, and that somehow, maybe with scipy.ndimage, I should be able to take advantage of the even spacing to speed things up.
Here’s my initial start:
import numpy as np
from scipy import spatial
#create meshgrid
side = 100.
n = 64 #could be 256 or larger
x_ = np.linspace(-side/2,side/2,n)
x,y,z = np.meshgrid(x_,x_,x_,indexing='ij')
#convert meshgrid to list of coordinates
xyz = np.column_stack((x.ravel(),y.ravel(),z.ravel()))
#create some coordinates
coords = np.random.random(size=(1000,3))*side - side/2
def sumofgauss(coords,xyz,sigma):
"""Simple isotropic gaussian sum at coordinate locations."""
n = int(round(xyz.shape[0]**(1/3.))) #get n samples for reshaping to 3D later
#this version overloads memory
#dist = spatial.distance.cdist(coords, xyz)
#dist *= dist
#values = 1./np.sqrt(2*np.pi*sigma**2) * np.exp(-dist/(2*sigma**2))
#values = np.sum(values,axis=0)
#run cdist in a loop over coords to avoid overloading memory
values = np.zeros((xyz.shape[0]))
for i in range(coords.shape[0]):
dist = spatial.distance.cdist(coords[None,i], xyz)
dist *= dist
values += 1./np.sqrt(2*np.pi*sigma**2) * np.exp(-dist[0]/(2*sigma**2))
return values.reshape(n,n,n)
image = sumofgauss(coords,xyz,1.0)
import matplotlib.pyplot as plt
plt.imshow(image[n/2]) #show a slice
plt.show()
M = 1000, N = 64 (~5 seconds):
M = 1000, N = 256 (~10 minutes):

Considering that many of your distance calculations will give zero weight after the exponential, you can probably drop a lot of your distances. Doing big chunks of distance calculations while dropping distances which are greater than a threshhold is usually faster with KDTree:
import numpy as np
from scipy.spatial import cKDTree # so we can get a `coo_matrix` output
def gaussgrid(coords, sigma = 1, n = 64, side = 100, eps = None):
x_ = np.linspace(-side/2,side/2,n)
x,y,z = np.meshgrid(x_,x_,x_,indexing='ij')
xyz = np.column_stack((x.ravel(),y.ravel(),z.ravel()))
if eps is None:
eps = np.finfo('float64').eps
thr = -np.log(eps) * 2 * sigma**2
data_tree = cKDTree(coords)
discr = 1000 # you can tweak this to get best results on your system
values = np.empty(n**3)
for i in range(n**3//discr + 1):
slc = slice(i * discr, i * discr + discr)
grid_tree = cKDTree(xyz[slc])
dists = grid_tree.sparse_distance_matrix(data_tree, thr, output_type = 'coo_matrix')
dists.data = 1./np.sqrt(2*np.pi*sigma**2) * np.exp(-dists.data/(2*sigma**2))
values[slc] = dists.sum(1).squeeze()
return values.reshape(n,n,n)
Now, even if you keep eps = None it'll be a bit faster as you're still returning about 10% your distances, but with eps = 1e-6 or so, you should get a big speedup. On my system:
%timeit out = sumofgauss(coords, xyz, 1.0)
1 loop, best of 3: 23.7 s per loop
%timeit out = gaussgrid(coords)
1 loop, best of 3: 2.12 s per loop
%timeit out = gaussgrid(coords, eps = 1e-6)
1 loop, best of 3: 382 ms per loop

Splitting integrated probability density into two spatial regions

I have some probability density function:
T = 10000
tmin = 0
tmax = 10**20
t = np.linspace(tmin, tmax, T)
time = np.asarray(t) #this line may be redundant
for j in range(T):
timedep_PD[j]= probdensity_func(x,time[j],initial_state)
I want to integrate it over two distinct regions of x. I tried the following to split the timedep_PD array into two spatial regions and then proceeded to integrate:
step = abs(xmin - xmax) / T
l1 = int(np.floor((abs(ab - xmin)* T ) / abs(xmin - xmax)))
l2 = int(np.floor((abs(bd - ab)* T ) / abs(xmin - xmax)))
#For spatial region 1
R1 = np.empty([l1])
R1 = x[:l1]
for i in range(T):
Pd1[i] = Pd[i][:l1]
#For spatial region 2
Pd2 = np.empty([T,l2])
R2 = np.empty([l2])
R2 = x[l1:l1+l2]
for i in range(T):
Pd2[i] = Pd[i][l1:l1+l2]
#Integrating over each spatial region
for i in range(T):
P[0][i] = np.trapz(Pd1[i],R1)
P[1][i] = np.trapz(Pd2[i],R2)
Is there an easier/more clear way to go about splitting up a probability density function into two spatial regions and then integrating within each spatial region at each time-step?

The loops can be eliminated by using vectorized operations instead. It's not clear whether Pd is a 2D NumPy array; it it's something else (e.g., a list of lists), it should be converted to a 2D NumPy array with np.array(...). After that you can do this:
Pd1 = Pd[:, :l1]
Pd2 = Pd[:, l1:l1+l2]
No need to loop over the time index; the slicing happens for all times at once (having : in place of an index means "all valid indices").
Similarly, np.trapz can integrate all time slices at once:
P1 = np.trapz(Pd1, R1, axis=1)
P2 = np.trapz(Pd2, R2, axis=1)
Each P1 and P2 is now a time series of integrals. The axis parameter determines along which axis Pd1 gets integrated - it's the second axis, i.e., space.

Vectorizing Haversine distance calculation in Python

I am trying to calculate a distance matrix for a long list of locations identified by Latitude & Longitude using the Haversine formula that takes two tuples of coordinate pairs to produce the distance:
def haversine(point1, point2, miles=False):
""" Calculate the great-circle distance bewteen two points on the Earth surface.
:input: two 2-tuples, containing the latitude and longitude of each point
in decimal degrees.
Example: haversine((45.7597, 4.8422), (48.8567, 2.3508))
:output: Returns the distance bewteen the two points.
The default unit is kilometers. Miles can be returned
if the ``miles`` parameter is set to True.
"""
I can calculate the distance between all points using a nested for loop as follows:
data.head()
id coordinates
0 1 (16.3457688674, 6.30354512503)
1 2 (12.494749307, 28.6263955635)
2 3 (27.794615136, 60.0324947881)
3 4 (44.4269923769, 110.114216113)
4 5 (-69.8540884125, 87.9468778773)
using a simple function:
distance = {}
def haver_loop(df):
for i, point1 in df.iterrows():
distance[i] = []
for j, point2 in df.iterrows():
distance[i].append(haversine(point1.coordinates, point2.coordinates))
return pd.DataFrame.from_dict(distance, orient='index')
But this takes quite a while given the time complexity, running at around 20s for 500 points and I have a much longer list. This has me looking at vectorization, and I've come across numpy.vectorize ((docs), but can't figure out how to apply it in this context.

From haversine's function definition, it looked pretty parallelizable. So, using one of the best tools for vectorization with NumPy aka broadcasting and replacing the math funcs with the NumPy equivalents ufuncs, here's one vectorized solution -
# Get data as a Nx2 shaped NumPy array
data = np.array(df['coordinates'].tolist())
# Convert to radians
data = np.deg2rad(data)
# Extract col-1 and 2 as latitudes and longitudes
lat = data[:,0]
lng = data[:,1]
# Elementwise differentiations for lattitudes & longitudes
diff_lat = lat[:,None] - lat
diff_lng = lng[:,None] - lng
# Finally Calculate haversine
d = np.sin(diff_lat/2)**2 + np.cos(lat[:,None])*np.cos(lat) * np.sin(diff_lng/2)**2
return 2 * 6371 * np.arcsin(np.sqrt(d))
Runtime tests -
The other np.vectorize based solution has shown some positive promise on performance improvement over the original code, so this section would compare the posted broadcasting based approach against that one.
Function definitions -
def vectotized_based(df):
haver_vec = np.vectorize(haversine, otypes=[np.int16])
return df.groupby('id').apply(lambda x: pd.Series(haver_vec(df.coordinates, x.coordinates)))
def broadcasting_based(df):
data = np.array(df['coordinates'].tolist())
data = np.deg2rad(data)
lat = data[:,0]
lng = data[:,1]
diff_lat = lat[:,None] - lat
diff_lng = lng[:,None] - lng
d = np.sin(diff_lat/2)**2 + np.cos(lat[:,None])*np.cos(lat) * np.sin(diff_lng/2)**2
return 2 * 6371 * np.arcsin(np.sqrt(d))
Timings -
In [123]: # Input
...: length = 500
...: d1 = np.random.uniform(-90, 90, length)
...: d2 = np.random.uniform(-180, 180, length)
...: coords = tuple(zip(d1, d2))
...: df = pd.DataFrame({'id':np.arange(length), 'coordinates':coords})
...:
In [124]: %timeit vectotized_based(df)
1 loops, best of 3: 1.12 s per loop
In [125]: %timeit broadcasting_based(df)
10 loops, best of 3: 68.7 ms per loop

You would provide your function as an argument to np.vectorize(), and could then use it as an argument to pandas.groupby.apply as illustrated below:
haver_vec = np.vectorize(haversine, otypes=[np.int16])
distance = df.groupby('id').apply(lambda x: pd.Series(haver_vec(df.coordinates, x.coordinates)))
For instance, with sample data as follows:
length = 500
df = pd.DataFrame({'id':np.arange(length), 'coordinates':tuple(zip(np.random.uniform(-90, 90, length), np.random.uniform(-180, 180, length)))})
compare for 500 points:
def haver_vect(data):
distance = data.groupby('id').apply(lambda x: pd.Series(haver_vec(data.coordinates, x.coordinates)))
return distance
%timeit haver_loop(df): 1 loops, best of 3: 35.5 s per loop
%timeit haver_vect(df): 1 loops, best of 3: 593 ms per loop

start by getting all combinations using itertools.product
results= [(p1,p2,haversine(p1,p2))for p1,p2 in itertools.product(points,repeat=2)]
that said Im not sure how fast it will be this looks like it might be a duplicate of Python: speeding up geographic comparison

Converting MATLAB's interp1 to Python interp1d

I'm converting a MATLAB code into a Python code.
The code uses the function interp1 in MATLAB. I found that the scipy function interp1d should be what I'm after, but I'm not sure. Could you tell me if the code, I implemented is correct?
My Python version is 3.4.1, MATLAB version is R2013a. However, the code has been implemented around 2010].
MATLAB:
S_T = [0.0, 2.181716948, 4.363766232, 6.546480392, 8.730192373, ...
10.91523573, 13.10194482, 15.29065504, 17.48170299, 19.67542671, ...
21.87216588, 24.07226205, 26.27605882, 28.48390208; ...
1.0, 1.000382662968538, 1.0020234819906781, 1.0040560245904753, ...
1.0055690037530718, 1.0046180687475195, 1.000824223678225, ...
0.9954866694014762, 0.9891408937764872, 0.9822543350571298, ...
0.97480163751874, 0.9666158376141503, 0.9571711322843011, ...
0.9460998105962408; ...
1.0, 0.9992731388936672, 0.9995093132493109, 0.9997021748479805, ...
0.9982835412406582, 0.9926319477117723, 0.9833685776596993, ...
0.9730725288209638, 0.9626092685176822, 0.9525234896714959, ...
0.9426698515488858, 0.9326788630704709, 0.9218100196936996, ...
0.9095717918978693];
S = transpose(S_T);
dist = 0.00137;
old = 15.61;
ll = 125;
ref = 250;
start = 225;
high = 7500;
low = 2;
U = zeros(low,low,high);
for ii=1:high
g0= start-ref*dist*ii;
g1= g0+ll;
if(g0 <=0.0 && g1 >= 0.0)
temp= old/2*(1-cos(2*pi*g0/ll));
for jj=1:low
U(jj,jj,ii)= temp;
end
end
end
for ii=1:low
S_mod(ii,1,:)=interp1(S(:,1),S(:,ii+1),U(ii,ii,:),'linear');
end
Python:
import numpy
import os
from scipy import interpolate
S = [[0.0, 2.181716948, 4.363766232, 6.546480392, 8.730192373, 10.91523573, 13.10194482, 15.29065504, \
17.48170299, 19.67542671, 21.87216588, 24.07226205, 26.27605882, 28.48390208], \
[1.0, 1.000382662968538, 1.0020234819906781, 1.0040560245904753, 1.0055690037530718, 1.0046180687475195, \
1.000824223678225, 0.9954866694014762, 0.9891408937764872, 0.9822543350571298, 0.97480163751874, \
0.9666158376141503, 0.9571711322843011, 0.9460998105962408], \
[1.0, 0.9992731388936672, 0.9995093132493109, 0.9997021748479805, 0.9982835412406582, 0.9926319477117723, \
0.9833685776596993, 0.9730725288209638, 0.9626092685176822, 0.9525234896714959, 0.9426698515488858, \
0.9326788630704709, 0.9218100196936996, 0.9095717918978693]]
dist = 0.00137
old = 15.61
ll = 125
ref = 250
start = 225
high = 7500
low = 2
U = [numpy.zeros( [low, low] ) for _ in range(high)]
for ii in range(high):
g0 = start - ref * dist * (ii+1)
g1 = g0 + ll
if g0 <=0.0 and g1 >= 0.0:
for jj in range(low):
U[ii][jj,jj] = old / 2 * (1 - numpy.cos( 2 * numpy.pi * g0 / ll) )
S_mod = []
for jj in range(high):
temp = []
for ii in range(low):
temp.append(interpolate.interp1d( S[0], S[ii+1], U[jj][ii,ii]))
S_mod.append(temp)

Ok so I've solved my own problem (thanks to the explanation on the MATLAB interp1 from Alex!).
The python interp1d doesn't have query points in itself, but instead creates a function which you then use to get your new data points. Thus, it should be:
f = interpolate.interp1d( S[0], S[ii+1])
temp.append(f(U[jj][ii,ii]))

There is a python library that let's you use MATLAB functions through wrappers: mlabwrap. If you don't need to change the code of the functions itself this could save you some time.

I don't know scipy, but I can tell you what the interp1 call in MATLAB is doing:
http://www.mathworks.com/help/matlab/ref/interp1.html
You are using the syntax:
vq = interp1(x,v,xq,method)
"Vector x contains the sample points, and v contains the corresponding values, v(x). Vector xq contains the coordinates of the query points."
So, in your code, S(:,1) contains the sample points where your grid is defined, S(:,ii+1) contains your sampled values for your 1-D function, and U(ii,ii,:) contains the query points where you want to interpolate to find new functional values between known values in your grid. You are using linear interpolation.
1-D interpolation is an extremely well defined operation, and interp1 is a relatively straightforward interface for this operation. What exactly do you not understand? Are you clear what interpolation is?
Essentially, you have a discretely defined function f[x], the first argument to interp1 is x, the second argument is f[x], and the third argument are arbitrarily defined query points Xq at which you want to find new function values f[Xq]. Since these values are not known, you have to use an interpolation method for how you will approximate f[Xq]. 'linear' means you will use a linear weighted average of the two known sampled neighbors (left and right neighbors) nearest to Xq.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python calculate lots of distances quickly - python

Related

Fourier transform and Full Width Half Maximum

Efficient sum of Gaussians in 3D with NumPy using large arrays

Splitting integrated probability density into two spatial regions

Vectorizing Haversine distance calculation in Python

Converting MATLAB's interp1 to Python interp1d

Categories

Resources