I have multiple lines in 3D space. Each line is described by two points: a start point and a stop point. I'm trying to find the line that best fits these lines (i.e. that minimizes the distance between the new line and the other lines). Ideally, I'd also like to know the least-squares distance of the new line to the other lines (or some other statistic for deciding whether it is a good fit or not).
Below is the code I've written so far. It does what I want, but I suspect there is a faster, analytical way to calculate this, and speed is extremely important for my application.
import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize

def min_line(FINAL_LINE, LINES):
    sqr = 0
    x0, x1, y0, y1, z0, z1 = FINAL_LINE
    # for each original line, calculate the distance from it to the new line
    for l in LINES:
        p1, p2, dist = closestDistanceBetweenLines(l[0], l[1],
                                                   np.array([x0, y0, z0]),
                                                   np.array([x1, y1, z1]))
        # add that squared distance to the total
        sqr += np.square(dist)
    return sqr
def closestDistanceBetweenLines(a0, a1, b0, b1):
    '''
    https://stackoverflow.com/questions/2824478/shortest-distance-between-two-line-segments
    Given two lines defined by the numpy.array pairs (a0,a1) and (b0,b1),
    return the closest points on each line and their distance.
    '''
    # Calculate denominator
    A = a1 - a0
    B = b1 - b0
    magA = np.linalg.norm(A)
    magB = np.linalg.norm(B)
    _A = A / magA
    _B = B / magB
    cross = np.cross(_A, _B)
    denom = np.linalg.norm(cross)**2

    # If lines are parallel (denom=0) test if lines overlap.
    # If they don't overlap then there is a closest point solution.
    # If they do overlap, there are infinite closest positions, but there is a closest distance.
    if not denom:
        d0 = np.dot(_A, (b0 - a0))
        # Segments overlap, return distance between parallel segments
        return [], [], np.linalg.norm(((d0 * _A) + a0) - b0)

    # Lines criss-cross: calculate the projected closest points
    t = (b0 - a0)
    detA = np.linalg.det([t, _B, cross])
    detB = np.linalg.det([t, _A, cross])
    t0 = detA / denom
    t1 = detB / denom
    pA = a0 + (_A * t0)  # projected closest point on line A
    pB = b0 + (_B * t1)  # projected closest point on line B
    return pA, pB, np.linalg.norm(pA - pB)
if __name__ == '__main__':
    # set up the 3d figure for plotting
    fig = plt.figure()
    ax = plt.axes(projection='3d')

    # 4 sets of start/stop points forming 4 lines
    a_start = np.array([0, 0, 0])
    a_end = np.array([0.1, 1, 1])
    b_start = np.array([0, 2, 0])
    b_end = np.array([-0.1, 1, 1])
    c_start = np.array([1, 0, 0])
    c_end = np.array([0.9, 1, 1])
    d_start = np.array([1, 2, 0])
    d_end = np.array([1.1, 1, 1])
    e_start = np.array([1, 0, 0])
    e_end = np.array([0.9, 1, 2])
    f_start = np.array([1, 2, 0])
    f_end = np.array([1.1, 1, 2])

    # turn them into a single NumPy array
    a = [a_start, a_end]
    b = [b_start, b_end]
    c = [c_start, c_end]
    d = [d_start, d_end]
    e = [e_start, e_end]
    f = [f_start, f_end]
    all_points = np.array([a, b, c, d])

    if len(all_points) > 2:
        # pass the fixed line data through args as a tuple
        rets = minimize(min_line, [0, 0, 0, 1, 1, 1], args=(all_points,))
        print(rets)

    # plot the original lines green
    for p in all_points:
        ax.plot3D([p[0][0], p[1][0]], [p[0][1], p[1][1]], [p[0][2], p[1][2]], 'g')
    # plot the new, fitted line black
    ax.plot3D([rets.x[0], rets.x[1]], [rets.x[2], rets.x[3]], [rets.x[4], rets.x[5]], 'k')
    plt.show()
Each line's starting point will always remain the same. The starting points will never be the same as each other (they are all coplanar, though). There will always be at least 3 starting lines.
(Figure: the current code fitted to lines A, B, C and D.)
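For reference, when the segments are treated as infinite lines, the line-to-line distance has a closed form: for lines through points p and s with directions u and v, the distance is |(s − p) · (u × v)| / |u × v|. Below is a minimal vectorized sketch of the same sum-of-squares objective under that assumption; the function name and the (start point, end point) parameterization are mine, and the parallel case is ignored.

import numpy as np

def sum_sq_line_distances(final_line, starts, ends):
    """Hypothetical vectorized objective (name and parameterization are mine).
    final_line = (px, py, pz, qx, qy, qz): two points on the candidate line.
    starts, ends: (N, 3) arrays with the other lines' endpoints,
    e.g. all_points[:, 0] and all_points[:, 1].
    All segments are treated as infinite, non-parallel lines."""
    p = np.asarray(final_line[:3], dtype=float)
    u = np.asarray(final_line[3:], dtype=float) - p  # candidate line's direction
    v = ends - starts                                # other lines' directions, (N, 3)
    w = starts - p                                   # offsets between base points
    n = np.cross(u, v)                               # common normals, (N, 3)
    # distance between two skew lines: |w . n| / |n|
    d = np.abs(np.einsum('ij,ij->i', w, n)) / np.linalg.norm(n, axis=1)
    return np.sum(d ** 2)

This drops the Python-level loop and could be handed to minimize in place of min_line; a robust version would still need a branch for near-parallel lines, where |u × v| approaches zero.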
I'm trying to (roughly) equally space the points of a line to a predefined distance.
It's OK to have some tolerance between the distances, but as close as possible would be desirable.
I know I could manually iterate through each point in my line, check the distance from p1 to p2, and add more points if needed.
But I wondered if anyone knows of a way to achieve this with shapely, as I already have the coords in a LineString.
One way to do that is to use the interpolate method, which returns points at specified distances along the line. You just have to generate the list of distances first. Taking the input line example from Roy2012's answer:
import numpy as np
from shapely.geometry import LineString
from shapely.ops import unary_union
line = LineString(([0, 0], [2, 1], [3, 2], [3.5, 1], [5, 2]))
Splitting at a specified distance:
distance_delta = 0.9
distances = np.arange(0, line.length, distance_delta)
# or alternatively without NumPy:
# points_count = int(line.length // distance_delta) + 1
# distances = (distance_delta * i for i in range(points_count))
points = [line.interpolate(distance) for distance in distances] + [line.boundary[1]]
multipoint = unary_union(points) # or new_line = LineString(points)
Note that since the distance is fixed, you can have problems at the end of the line, as shown in the image. Depending on what you want, you can include or exclude the [line.boundary[1]] part, which adds the line's endpoint, or use distances = np.arange(0, line.length, distance_delta)[:-1] to exclude the penultimate point.
Also, note that the unary_union I'm using should be more efficient than calling object.union(other) inside a loop, as shown in another answer.
Splitting to a fixed number of points:
n = 7
# or to get the distances closest to the desired one:
# n = round(line.length / desired_distance_delta)
distances = np.linspace(0, line.length, n)
# or alternatively without NumPy:
# distances = (line.length * i / (n - 1) for i in range(n))
points = [line.interpolate(distance) for distance in distances]
multipoint = unary_union(points) # or new_line = LineString(points)
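If you want to sanity-check the spacing, the arc-length position of each interpolated point can be recovered with project (this check is my addition, not part of the original approach):

arc_positions = [line.project(point) for point in points]
print(arc_positions)  # evenly spaced for the fixed-n case above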
You can use the shapely substring operation:

import numpy as np
from shapely.geometry import LineString, MultiPoint
from shapely.ops import substring

line = LineString(([0, 0], [2, 1], [3, 2], [3.5, 1], [5, 2]))
mp = MultiPoint()
for i in np.arange(0, line.length, 0.2):
    s = substring(line, i, i + 0.2)
    mp = mp.union(s.boundary)
The result for this data is given below. Each circle is a point.
Consider the following two sets of points. I would like to find the optimal 2D translation and rotation that aligns the largest number of points between the blue dataset and the orange dataset, where a point is considered aligned if the distance to its nearest neighbor in the other dataset is smaller than a threshold.
I understand that this is related to "Iterative Closest Point" algorithms, but here the situation is harder because not all points from one dataset are present in the other, and because some points may turn out to be "false positives" (noise).
Is there an efficient way of doing this?
I came across the same problem when comparing CCD star-observation figures; the basic idea is to find the best match between the triangles formed from the two sets of points.
I then use the astroalign package to calculate the transformation matrix and align all the points. Thank the Lord, it works pretty well.
import itertools
import numpy as np
import matplotlib.pyplot as plt
import astroalign as aa
def getTriangles(set_X, X_combs):
    """
    Inefficient way of obtaining the lengths of each triangle's side.
    Normalized so that the minimum length is 1.
    """
    triang = []
    for p0, p1, p2 in X_combs:
        d1 = np.sqrt((set_X[p0][0] - set_X[p1][0]) ** 2 +
                     (set_X[p0][1] - set_X[p1][1]) ** 2)
        d2 = np.sqrt((set_X[p0][0] - set_X[p2][0]) ** 2 +
                     (set_X[p0][1] - set_X[p2][1]) ** 2)
        d3 = np.sqrt((set_X[p1][0] - set_X[p2][0]) ** 2 +
                     (set_X[p1][1] - set_X[p2][1]) ** 2)
        d_min = min(d1, d2, d3)
        d_unsort = [d1 / d_min, d2 / d_min, d3 / d_min]
        triang.append(sorted(d_unsort))
    return triang
def sumTriangles(ref_triang, in_triang):
    """
    For each normalized triangle in ref, compare with each normalized triangle
    in the input set. Find the differences between their sides, sum their
    absolute values, and select the two triangles with the smallest sum of
    absolute differences.
    """
    tr_sum, tr_idx = [], []
    for i, ref_tr in enumerate(ref_triang):
        for j, in_tr in enumerate(in_triang):
            # absolute value of the length differences
            tr_diff = abs(np.array(ref_tr) - np.array(in_tr))
            # sum the differences
            tr_sum.append(sum(tr_diff))
            tr_idx.append([i, j])
    # index of the triangles in ref and in with the smallest sum of absolute
    # length differences
    tr_idx_min = tr_idx[tr_sum.index(min(tr_sum))]
    ref_idx, in_idx = tr_idx_min[0], tr_idx_min[1]
    print("Smallest difference: {}".format(min(tr_sum)))
    return ref_idx, in_idx
set_ref = np.array([[2511.268821,44.864124],
[2374.085032,201.922566],
[1619.282942,216.089335],
[1655.866502,221.127787],
[ 804.171659,2133.549517], ])
set_in = np.array([[1992.438563,63.727282],
[2285.793346,255.402548],
[1568.915358, 279.144544],
[1509.720134, 289.434629],
[1914.255205, 349.477788],
[2370.786382, 496.026836],
[ 482.702882, 508.685952],
[2089.691026, 523.18825 ],
[ 216.827439, 561.807396],
[ 614.874621, 2007.304727],
[1286.639124, 2155.264827],
[ 729.566116, 2190.982364]])
# All possible triangles.
ref_combs = list(itertools.combinations(range(len(set_ref)), 3))
in_combs = list(itertools.combinations(range(len(set_in)), 3))
# Obtain normalized triangles.
ref_triang, in_triang = getTriangles(set_ref, ref_combs), getTriangles(set_in, in_combs)
# Index of the ref and in triangles with the smallest difference.
ref_idx, in_idx = sumTriangles(ref_triang, in_triang)
# Indexes of points in ref and in of the best match triangles.
ref_idx_pts, in_idx_pts = ref_combs[ref_idx], in_combs[in_idx]
print('triangle ref %s matches triangle in %s' % (ref_idx_pts, in_idx_pts))
print("ref:", [set_ref[_] for _ in ref_idx_pts])
print("input:", [set_in[_] for _ in in_idx_pts])
ref_pts = np.array([set_ref[_] for _ in ref_idx_pts])
in_pts = np.array([set_in[_] for _ in in_idx_pts])
transf, (in_list,ref_list) = aa.find_transform(in_pts, ref_pts)
transf_in = transf(set_in)
print(f'transformation matrix: {transf}')
plt.scatter(set_ref[:,0],set_ref[:,1], s=100,marker='.', c='r',label='Reference')
plt.scatter(set_in[:,0],set_in[:,1], s=100,marker='.', c='b',label='Input')
plt.scatter(transf_in[:,0],transf_in[:,1], s=100,marker='+', c='b',label='Input Aligned')
plt.plot(ref_pts[:,0],ref_pts[:,1], c='r')
plt.plot(in_pts[:,0],in_pts[:,1], c='b')
plt.legend()
plt.tight_layout()
plt.savefig( 'align_coordinates.png', format = 'png')
plt.show()
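To tie this back to the question's criterion, one could then count how many aligned input points fall within a threshold of their nearest reference neighbor; a small sketch (the threshold value here is an arbitrary placeholder):

from scipy.spatial import cKDTree

tree = cKDTree(set_ref)
dists, _ = tree.query(transf_in)  # nearest reference neighbor for each aligned point
threshold = 20.0  # hypothetical value; choose to match your noise level
print((dists < threshold).sum(), "points aligned within threshold")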
I have P_0 points spread randomly in a 2d box. Then I divide them in two groups S and I. If some points of S come too close to I, they are deleted from the S group and added to the I group. The problem I am facing is that sometimes they are not correctly deleted from S, but they are properly added to I. Hence, the total number of points keeps erroneously growing.
Here is the code:
from scipy.spatial import cKDTree
import numpy as np
import matplotlib.pyplot as plt
P_0 = 100 # initial susceptible population
# dimensions of box
Lx = 5.0
Ly = 5.0
# generate P_0 random points inside box
X = np.random.uniform(0, Lx, P_0)
Y = np.random.uniform(0, Ly, P_0)
pts = np.column_stack((X, Y)) # array of 2d points
S = np.arange(10, P_0) # indices of the susceptible
I = np.arange(10) # indices of the infected
# Divide points into infected and susceptible groups
r_I = pts[I]
r_S = pts[S]
tree = cKDTree(r_S)
# idx represents the indices to points in r_S which are closer than r to
# points in r_I
idx = tree.query_ball_point(r_I, r=0.4)
idx = np.hstack(idx) # flatten the lists into one numpy array
idx = idx.astype(int) # Make sure idx indices have int type
print(idx)

# plot points
plt.figure()
plt.plot(r_S[:, 0], r_S[:, 1], 'bo')      # plot all r_S points
plt.plot(r_S[idx, 0], r_S[idx, 1], 'ko')  # color those points nearest to r_I
plt.plot(r_I[:, 0], r_I[:, 1], 'ro')      # identify the r_I points

print(len(S), len(I), len(S) + len(I))

I = np.append(I, S[idx])  # add the closest points to I
S = np.delete(S, idx)     # delete the closest points from S
# idx represents the indices to points in r_S which are closer than r to
# points in r_I
idx = tree.query_ball_point(r_I, r=0.4)
idx = np.hstack(idx)   # flatten the lists into one numpy array
idx = idx.astype(int)  # make sure idx indices have int type
print(idx)
# plot points
plt.figure()
plt.plot(r_S[:, 0], r_S[:, 1], 'bo')      # plot all r_S points
plt.plot(r_S[idx, 0], r_S[idx, 1], 'ko')  # color those points nearest to r_I
plt.plot(r_I[:, 0], r_I[:, 1], 'ro')      # identify the r_I points

print(len(S), len(I), len(S) + len(I))

I = np.append(I, S[idx])  # add the closest points to I
S = np.delete(S, idx)     # delete the closest points from S

plt.figure('S group')
plt.plot(pts[S, 0], pts[S, 1], 'bo')  # plot the updated S points
plt.figure('I group')
plt.plot(pts[I, 0], pts[I, 1], 'ro')  # plot the updated I points

print(len(S), len(I), len(S) + len(I), len(idx))

plt.show()
So I don't know why all the points in r_S closer than r are sometimes not deleted from S.
One might have to run the code a few times for the error to appear, or just increase P_0 to 1000, for example, or increase the value of r. It might be a problem with idx and the way I am using numpy delete.
You could double-check your assumption by swapping the actions (performing the deletion first and the addition second, with the addition done only on a successful deletion) and storing the result of the deletion in a separate variable.
Compare the resulting size after deletion with the original size of the group. If the sizes match, no deletion occurred (for whatever reason), which is a signal not to add anything on the other side.
Then you could print the groups in the failing case and see which indexes participate, to shed some light into the tunnel.
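A minimal sketch of that check, using the question's variables (size_before and S_new are my names):

size_before = len(S)
S_new = np.delete(S, idx)
# np.delete removes each listed index once, so if idx contains duplicates,
# fewer points disappear than len(idx) suggests
if len(S_new) == size_before - len(idx):
    I = np.append(I, S[idx])  # add only after the expected deletion succeeded
else:
    print("deletion mismatch:", size_before, len(S_new), sorted(idx))
S = S_new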
As I just commented, I simply had to eliminate the duplicates in idx. I added the line
idx = np.unique(idx)
just below idx = idx.astype(int).
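In context, the index cleanup then becomes:

idx = np.hstack(idx)   # flatten the lists into one numpy array
idx = idx.astype(int)  # make sure idx indices have int type
idx = np.unique(idx)   # drop duplicates: query_ball_point can return the same
                       # r_S index for several r_I points, so np.delete would
                       # otherwise remove fewer points than np.append adds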
I would like to plot parallel lines with different colors. E.g. rather than a single red line of thickness 6, I would like to have two parallel lines of thickness 3, with one red and one blue.
Any thoughts would be appreciated.
Thanks
Even with the smart offsetting (see below), there is still an issue in a view that has sharp angles between consecutive points.
Zoomed view of smart offsetting:
Overlaying lines of varying thickness:
Plotting parallel lines is not an easy task. Using a simple uniform offset will of course not show the desired result. This is shown in the left picture below.
Such a simple offset can be produced in matplotlib as shown in the transformation tutorial.
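For completeness, such a uniform offset might look like this (a minimal sketch using matplotlib's offset_copy; the 3-point shift is an arbitrary choice):

import matplotlib.pyplot as plt
from matplotlib.transforms import offset_copy

fig, ax = plt.subplots()
x = [0, 1, 2, 3]
y = [0, 1, 0, 1]
ax.plot(x, y, lw=3, color="red")
# shift the second line by a fixed number of points in display space
trans = offset_copy(ax.transData, fig=fig, x=0, y=-3, units="points")
ax.plot(x, y, lw=3, color="blue", transform=trans)
plt.show()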
Method1
A better solution may be to use the idea sketched on the right side. To calculate the offset of the nth point, we can use the normal vector to the line between the (n-1)st and the (n+1)st point, and move the desired distance along this normal vector to obtain the offset point.
The advantage of this method is that the offset line has the same number of points as the original line. The disadvantage is that it is not completely accurate, as can be seen in the picture.
This method is implemented in the function offset in the code below.
In order to make this useful for a matplotlib plot, we need to consider that the linewidth should be independent of the data units. Linewidth is usually given in units of points, and the offset would best be given in the same unit, such that e.g. the requirement from the question ("two parallel lines of width 3") can be met.
The idea is therefore to transform the coordinates from data to display coordinates, using ax.transData.transform. Also the offset in points o can be transformed to the same units: Using the dpi and the standard of ppi=72, the offset in display coordinates is o*dpi/ppi. After the offset in display coordinates has been applied, the inverse transform (ax.transData.inverted().transform) allows a backtransformation.
Now there is another dimension of the problem: How to assure that the offset remains the same independent of the zoom and size of the figure?
This last point can be addressed by recalculating the offset each time a zooming or resizing event takes place.
Here is what a rainbow curve produced by this method looks like.
And here is the code to produce the image.
import numpy as np
import matplotlib.pyplot as plt

dpi = 100

def offset(x, y, o):
    """ Offset coordinates given by arrays x, y by o """
    X = np.c_[x, y].T
    m = np.array([[0, -1], [1, 0]])
    R = np.zeros_like(X)
    S = X[:, 2:] - X[:, :-2]
    R[:, 1:-1] = np.dot(m, S)
    R[:, 0] = np.dot(m, X[:, 1] - X[:, 0])
    R[:, -1] = np.dot(m, X[:, -1] - X[:, -2])
    On = R / np.sqrt(R[0, :]**2 + R[1, :]**2) * o
    Out = On + X
    return Out[0, :], Out[1, :]

def offset_curve(ax, x, y, o):
    """ Offset arrays x, y in data coordinates by o in points """
    trans = ax.transData.transform
    inv = ax.transData.inverted().transform
    X = np.c_[x, y]
    Xt = trans(X)
    xto, yto = offset(Xt[:, 0], Xt[:, 1], o * dpi / 72.)
    Xto = np.c_[xto, yto]
    Xo = inv(Xto)
    return Xo[:, 0], Xo[:, 1]

# some single points
y = np.array([1, 2, 2, 3, 3, 0])
x = np.arange(len(y))
# or try a sinus
x = np.linspace(0, 9)
y = np.sin(x) * x / 3.

fig, ax = plt.subplots(figsize=(4, 2.5), dpi=dpi)
cols = ["#fff40b", "#00e103", "#ff9921", "#3a00ef", "#ff2121", "#af00e7"]
lw = 2.
lines = []
for i in range(len(cols)):
    l, = plt.plot(x, y, lw=lw, color=cols[i])
    lines.append(l)

def plot_rainbow(event=None):
    # lists to hold the offset curves (a bare range() is not assignable)
    xr = [None] * 6
    yr = [None] * 6
    xr[0], yr[0] = offset_curve(ax, x, y, lw / 2.)
    xr[1], yr[1] = offset_curve(ax, x, y, -lw / 2.)
    xr[2], yr[2] = offset_curve(ax, xr[0], yr[0], lw)
    xr[3], yr[3] = offset_curve(ax, xr[1], yr[1], -lw)
    xr[4], yr[4] = offset_curve(ax, xr[2], yr[2], lw)
    xr[5], yr[5] = offset_curve(ax, xr[3], yr[3], -lw)
    for i in range(6):
        lines[i].set_data(xr[i], yr[i])

plot_rainbow()
fig.canvas.mpl_connect("resize_event", plot_rainbow)
fig.canvas.mpl_connect("button_release_event", plot_rainbow)
plt.savefig(__file__ + ".png", dpi=dpi)
plt.show()
Method2
To avoid overlapping lines, one has to use a more complicated solution.
One could first offset every point normal to the two line segments it is part of (green points in the picture below). Then calculate the line through those offset points and find their intersection.
A particular case is when the slopes of two subsequent line segments are equal. This has to be taken care of (eps in the code below).
import numpy as np
import matplotlib.pyplot as plt

dpi = 100

def intersect(p1, p2, q1, q2, eps=1.e-10):
    """ Given two lines, the first through points p1, p2 and the second
    through q1, q2, find the intersection """
    x1, y1 = p1[0], p1[1]
    x2, y2 = p2[0], p2[1]
    x3, y3 = q1[0], q1[1]
    x4, y4 = q2[0], q2[1]
    nomX = (x1*y2 - y1*x2)*(x3 - x4) - (x1 - x2)*(x3*y4 - y3*x4)
    nomY = (x1*y2 - y1*x2)*(y3 - y4) - (y1 - y2)*(x3*y4 - y3*x4)
    denom = float((x1 - x2)*(y3 - y4) - (y1 - y2)*(x3 - x4))
    if np.abs(denom) < eps:
        # intersection undefined (parallel segments); fall back to p1
        return np.array(p1)
    else:
        return np.array([nomX / denom, nomY / denom])

def offset(x, y, o, eps=1.e-10):
    """ Offset coordinates given by arrays x, y by o """
    X = np.c_[x, y].T
    m = np.array([[0, -1], [1, 0]])
    S = X[:, 1:] - X[:, :-1]
    R = np.dot(m, S)
    norm = np.sqrt(R[0, :]**2 + R[1, :]**2) / o
    On = R / norm
    Outa = On + X[:, 1:]
    Outb = On + X[:, :-1]
    G = np.zeros_like(X)
    for i in range(0, len(X[0, :]) - 2):
        p = intersect(Outa[:, i], Outb[:, i], Outa[:, i+1], Outb[:, i+1], eps=eps)
        G[:, i+1] = p
    G[:, 0] = Outb[:, 0]
    G[:, -1] = Outa[:, -1]
    return G[0, :], G[1, :]

def offset_curve(ax, x, y, o, eps=1.e-10):
    """ Offset arrays x, y in data coordinates by o in points """
    trans = ax.transData.transform
    inv = ax.transData.inverted().transform
    X = np.c_[x, y]
    Xt = trans(X)
    xto, yto = offset(Xt[:, 0], Xt[:, 1], o * dpi / 72., eps=eps)
    Xto = np.c_[xto, yto]
    Xo = inv(Xto)
    return Xo[:, 0], Xo[:, 1]

# some single points
y = np.array([1, 1, 2, 0, 3, 2, 1., 4, 3]) * 1.e9
x = np.arange(len(y))
x[3] = x[4]
# or try a sinus
# x = np.linspace(0, 9)
# y = np.sin(x) * x / 3.

fig, ax = plt.subplots(figsize=(4, 2.5), dpi=dpi)
cols = ["r", "b"]
lw = 11.
lines = []
for i in range(len(cols)):
    l, = plt.plot(x, y, lw=lw, color=cols[i], solid_joinstyle="miter")
    lines.append(l)

def plot_rainbow(event=None):
    xr = [None] * 2
    yr = [None] * 2
    xr[0], yr[0] = offset_curve(ax, x, y, lw / 2.)
    xr[1], yr[1] = offset_curve(ax, x, y, -lw / 2.)
    for i in range(2):
        lines[i].set_data(xr[i], yr[i])

plot_rainbow()
fig.canvas.mpl_connect("resize_event", plot_rainbow)
fig.canvas.mpl_connect("button_release_event", plot_rainbow)
plt.show()
Note that this method should work well as long as the offset between the lines is smaller than the distance between subsequent points on the line. Otherwise, method 1 may be better suited.
The best that I can think of is to take your data, generate a series of small offsets, and use fill_between to make bands of whatever color you like.
I wrote a function to do this. I don't know what shape you're trying to plot, so this may or may not work for you. I tested it on a parabola and got decent results. You can also play around with the list of colors.
import numpy as np
import matplotlib.pyplot as plt

def rainbow_plot(x, y, spacing=0.1):
    fig, ax = plt.subplots()
    colors = ['red', 'yellow', 'green', 'cyan', 'blue']
    top = max(y)
    lines = []
    for i in range(len(colors) + 1):
        newline_data = y - top * spacing * i
        lines.append(newline_data)
    for i, c in enumerate(colors):
        ax.fill_between(x, lines[i], lines[i + 1], facecolor=c)
    return fig, ax

x = np.linspace(0, 1, 51)
y = 1 - (x - 0.5)**2
rainbow_plot(x, y)
plt.show()
I want to bin the values of polygons to a fine regular grid.
For instance, I have the following coordinates:
data = 2.353
data_lats = np.array([57.81000137, 58.15999985, 58.13000107, 57.77999878])
data_lons = np.array([148.67999268, 148.69999695, 148.47999573, 148.92999268])
My regular grid looks like this:
delta = 0.25
grid_lons = np.arange(-180, 180, delta)
grid_lats = np.arange(90, -90, -delta)
llx, lly = np.meshgrid( grid_lons, grid_lats )
rows = lly.shape[0]
cols = llx.shape[1]
grid = np.zeros((rows,cols))
Now I can find the grid pixel that corresponds to the center of my polygon very easily:
centerx, centery = np.mean(data_lons), np.mean(data_lats)
row = int(np.floor( centery/delta ) + (grid.shape[0]/2))
col = int(np.floor( centerx/delta ) + (grid.shape[1]/2))
grid[row,col] = data
However, there are probably a couple of grid pixels that still intersect with the polygon. Hence, I would like to generate a bunch of coordinates inside my polygon (data_lons, data_lats) and find their corresponding grid pixels as before. Do you have a suggestion for generating the coordinates randomly or systematically? I have failed so far, but am still trying.
Note: one data set contains around ~80000 polygons, so it has to be really fast (a couple of seconds). That is also why I chose this approach, because it does not account for the area of overlap... (like my earlier question Data binning: irregular polygons to regular mesh, which is VERY slow)
I worked on a quick and dirty solution by simply calculating the coordinates between corner pixels. Take a look:
dlats = np.zeros((data_lats.shape[0], 4)) + np.nan
dlons = np.zeros((data_lons.shape[0], 4)) + np.nan
idx = [0, 1, 3, 2, 0]  # rearrange the corner pixels

for cc in range(4):
    dlats[:, cc] = np.mean((data_lats[:, idx[cc]], data_lats[:, idx[cc+1]]), axis=0)
    dlons[:, cc] = np.mean((data_lons[:, idx[cc]], data_lons[:, idx[cc+1]]), axis=0)

data_lats = np.column_stack((data_lats, dlats))
data_lons = np.column_stack((data_lons, dlons))
Thus, the red dots represent the original corners, and the blue ones the intermediate pixels between them.
I can do this one more time and include the center pixel (geo[:,[4,9]])
dlats = np.zeros((data.shape[0], 8))
dlons = np.zeros((data.shape[0], 8))

for cc in range(8):
    dlats[:, cc] = np.mean((data_lats[:, cc], geo[:, 4]), axis=0)
    dlons[:, cc] = np.mean((data_lons[:, cc], geo[:, 9]), axis=0)

data_lats = np.column_stack((data_lats, dlats, geo[:, 4]))
data_lons = np.column_stack((data_lons, dlons, geo[:, 9]))
This works really nice, and I can assign each point directly to its corresponding grid pixel like this:
row = np.floor( data_lats/delta ) + (llx.shape[0]/2)
col = np.floor( data_lons/delta ) + (llx.shape[1]/2)
However, the final binning now takes ~7 sec! How can I speed this code up:
for ii in np.arange(len(data)):
    for cc in np.arange(data_lats.shape[1]):
        final_grid[row[ii, cc], col[ii, cc]] += data[ii]
        final_grid_counts[row[ii, cc], col[ii, cc]] += 1
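One way to remove the Python-level double loop (my suggestion, assuming row, col and data are laid out as above) is np.add.at, which accumulates correctly even when the same grid cell appears several times, unlike a plain fancy-indexed +=:

rows_flat = row.astype(int).ravel()
cols_flat = col.astype(int).ravel()
vals = np.repeat(data, data_lats.shape[1])  # one copy of each datum per corner point

np.add.at(final_grid, (rows_flat, cols_flat), vals)
np.add.at(final_grid_counts, (rows_flat, cols_flat), 1)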
You'll need to test the following approach to see if it is fast enough. First, convert all your lats and lons into (possibly fractional) indices into your grid:

idx_lats = (data_lats - lat_grid_start) / lat_grid_step
idx_lons = (data_lons - lon_grid_start) / lon_grid_step
Next, we want to split your polygons into triangles. For any convex polygon, you could take the center of the polygon as one vertex of all triangles, and then take the vertices of the polygon in consecutive pairs. But if your polygons are all quadrilaterals, it is going to be faster to divide them into only 2 triangles, using vertices 0, 1, 2 for the first, and 0, 2, 3 for the second.
To know if a certain point is inside a triangle, I am going to use the barycentric coordinates approach described here. This first function checks whether a bunch of points are inside a triangle:
def check_in_triangle(x, y, x_tri, y_tri):
    A = np.vstack((x_tri[0], y_tri[0]))
    lhs = np.vstack((x_tri[1:], y_tri[1:])) - A
    rhs = np.vstack((x, y)) - A
    uv = np.linalg.solve(lhs, rhs)
    # Equivalent to (uv[0] >= 0) & (uv[1] >= 0) & (uv[0] + uv[1] <= 1)
    return np.logical_and.reduce(uv >= 0, axis=0) & (np.sum(uv, axis=0) <= 1)
Given a triangle by its vertices, you can get the lattice points inside it, by running the above function on the lattice points in the bounding box of the triangle:
def lattice_points_in_triangle(x_tri, y_tri):
    x_grid = np.arange(np.ceil(np.min(x_tri)), np.floor(np.max(x_tri)) + 1)
    y_grid = np.arange(np.ceil(np.min(y_tri)), np.floor(np.max(y_tri)) + 1)
    x, y = np.meshgrid(x_grid, y_grid)
    x, y = x.reshape(-1), y.reshape(-1)
    idx = check_in_triangle(x, y, x_tri, y_tri)
    return x[idx], y[idx]
And for a quadrilateral, you simply call this last function twice:

def lattice_points_in_quadrilateral(x_quad, y_quad):
    return list(map(np.concatenate,
                    zip(lattice_points_in_triangle(x_quad[:3], y_quad[:3]),
                        lattice_points_in_triangle(x_quad[[0, 2, 3]],
                                                   y_quad[[0, 2, 3]]))))
If you run this code on your example data, you will get two empty arrays returned: that's because the order of the quadrilateral points is a surprising one: indices 0 and 1 define one diagonal, 2 and 3 the other. My function above was expecting the vertices to be ordered around the polygon. If you really are doing things this other way, you need to change the second call to lattice_points_in_triangle inside lattice_points_in_quadrilateral so that the indices used are [0, 1, 3] instead of [0, 2, 3].
And now, with that change:
>>> idx_lats = (data_lats - (-180) ) / 0.25
>>> idx_lons = (data_lons - (-90) ) / 0.25
>>> lattice_points_in_quadrilateral(idx_lats, idx_lons)
[array([952]), array([955])]
If you change the resolution of your grid to 0.1:
>>> idx_lats = (data_lats - (-180) ) / 0.1
>>> idx_lons = (data_lons - (-90) ) / 0.1
>>> lattice_points_in_quadrilateral(idx_lats, idx_lons)
[array([2381, 2380, 2381, 2379, 2380, 2381, 2378, 2379, 2378]),
array([2385, 2386, 2386, 2387, 2387, 2387, 2388, 2388, 2389])]
Timing-wise, this approach is going to be, on my system, about 10x too slow for your needs:
In [8]: %timeit lattice_points_in_quadrilateral(idx_lats, idx_lons)
1000 loops, best of 3: 269 us per loop
So you are looking at over 20 sec. to process your 80,000 polygons.