TL;DR: Is there any way I can get rid of my second for-loop?
I have a time series of points on a 2D-grid. To get rid of fast fluctuations of their position, I average the coordinates over a window of frames. Now in my case, it's possible for the points to cover a larger distance than usual. I don't want to include frames for a specific point, if it travels farther than the cut_off value.
In the first for-loop, I go over all frames and define the moving window. I then calculate the distances between the current frame and each frame in the moving window. Afterwards, I keep only those frame/point combinations in which neither the x nor the y component travelled farther than cut_off. Now I want to calculate the mean position of every point from these selected frames of the moving window (note: the number of selected frames can be smaller than n_window). This leads me to the second for-loop. Here I iterate over all points and actually grab the positions from the frames in which the current point did not travel farther than cut_off. From these selected frames I calculate the mean value of the coordinates and use it as the new value for the current frame.
This very last for-loop slows down the whole processing. I can't come up with a better way to accomplish this calculation. Any suggestions?
MWE
Put in comments for clarification.
import numpy as np
# Generate a timeseries with 1000 frames, each
# containing 50 individual points defined by their
# x and y coordinates
n_frames = 1000
n_points = 50
n_coordinates = 2
timeseries = np.random.randint(-100, 100, [n_frames, n_points, n_coordinates])
# Set window size to 20 frames
n_window = 20
# Distance cut off
cut_off = 60
# Set up empty array to hold results
avg_data_store = np.zeros([n_frames, timeseries.shape[1], 2])
# Iterate over all frames
for frame in np.arange(0, n_frames):
    # Set the frame according to the window size that we're looking at
    t_before = int(frame - (n_window / 2))
    t_after = int(frame + (n_window / 2))
    # If we're trying to access frames below 0, set the lowest one to 0
    if t_before < 0:
        t_before = 0
    # Trying to access frames that are not in the trajectory, set to last frame
    if t_after > n_frames - 1:
        t_after = n_frames - 1
    # Grab x and y coordinates for all points in the corresponding window
    pos_before = timeseries[t_before:frame]
    pos_after = timeseries[frame + 1:t_after + 1]
    pos_now = timeseries[frame]
    # Calculate the distance between the current frame and the windows before/after
    d_before = np.abs(pos_before - pos_now)
    d_after = np.abs(pos_after - pos_now)
    # Grab indices of frames+points, that are below the cut off
    arg_before = np.argwhere(np.all(d_before < cut_off, axis=2))
    arg_after = np.argwhere(np.all(d_after < cut_off, axis=2))
    # Iterate over all points
    for i in range(0, timeseries.shape[1]):
        # Create temp array
        temp_stack = pos_now[i]
        # Grab all frames in which the current point did _not_
        # travel farther than `cut_off`
        all_before = arg_before[arg_before[:, 1] == i][:, 0]
        all_after = arg_after[arg_after[:, 1] == i][:, 0]
        # Grab the corresponding positions for this points in these frames
        all_pos_before = pos_before[all_before, i]
        all_pos_after = pos_after[all_after, i]
        # If we have any frames for that point before / after
        # stack them into the temp array
        if all_pos_before.size > 0:
            temp_stack = np.vstack([all_pos_before, temp_stack])
        if all_pos_after.size > 0:
            temp_stack = np.vstack([temp_stack, all_pos_after])
        # Calculate the moving window average for the selection of frames
        avg_data_store[frame, i] = temp_stack.mean(axis=0)
If you are fine with calculating the cutoff distance in x and y separately, you can use scipy.ndimage.generic_filter.
import numpy as np
from scipy.ndimage import generic_filter
def _mean(x, cutoff):
    # Compare every sample in the window with the central sample;
    # integer division is needed because the result is used as an index
    is_too_different = np.abs(x - x[len(x) // 2]) > cutoff
    return np.mean(x[~is_too_different])
def _smooth(x, window_length=5, cutoff=1.):
    return generic_filter(x, _mean, size=window_length, mode='nearest', extra_keywords=dict(cutoff=cutoff))
def smooth(arr, window_length=5, cutoff=1., axis=-1):
    return np.apply_along_axis(_smooth, axis, arr, window_length=window_length, cutoff=cutoff)
# --------------------------------------------------------------------------------
def _simulate_movement_2d(T, fraction_is_jump=0.01):
    # generate random velocities with a few "jumps"
    velocity = np.random.randn(T, 2)
    is_jump = np.random.rand(T) < fraction_is_jump
    jump = 10 * np.random.randn(T, 2)
    jump[~is_jump] = 0.
    # pre-allocate position and momentum arrays
    position = np.zeros((T,2))
    momentum = np.zeros((T,2))
    # initialise the first position
    position[0] = np.random.randn(2)
    # update position using velocity vector:
    # smooth movement by not applying the velocity directly
    # but rather by keeping track of the momentum
    # (start at 1 so that position[1] continues from position[0])
    for ii in range(1,T):
        momentum[ii] = 0.9 * momentum[ii-1] + 0.1 * velocity[ii-1]
        position[ii] = position[ii-1] + momentum[ii] + jump[ii]
    # add some measurement noise
    noise = np.random.randn(T,2)
    position += noise
    return position
def demo(nframes=1000, npoints=3):
    # create data, shape (npoints, nframes, 2)
    positions = np.array([_simulate_movement_2d(nframes) for ii in range(npoints)])
    # format to (nframes, npoints, 2)
    position = positions.transpose([1, 0, 2])
    # smooth along the frame axis of the untransposed array
    smoothed = smooth(positions, window_length=11, cutoff=5., axis=1)
    # plot
    x, y = positions.T
    xs, ys = smoothed.T
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(1,1)
    ax.plot(x, y, 'o')
    ax.plot(xs, ys, 'k-', alpha=0.3, lw=2)
    plt.show()
demo()
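If the joint x/y cutoff from the question has to be kept (the generic_filter approach treats each axis separately), one option is to replace the per-point loop with a boolean mask and a masked mean. The following is only a minimal sketch and has not been benchmarked: the name masked_window_mean is made up for illustration, and it assumes an even n_window and cut_off > 0, so that the current frame always passes the test.

```python
import numpy as np

def masked_window_mean(timeseries, n_window, cut_off):
    """Windowed mean per point, skipping frames that moved too far.

    timeseries has shape (n_frames, n_points, 2). Hypothetical helper,
    assuming an even n_window and cut_off > 0.
    """
    n_frames = timeseries.shape[0]
    half = n_window // 2
    out = np.empty(timeseries.shape, dtype=float)
    for frame in range(n_frames):
        # Clip the window to the valid frame range
        lo = max(frame - half, 0)
        hi = min(frame + half, n_frames - 1)
        window = timeseries[lo:hi + 1]                 # (n_sel, n_points, 2)
        # True where both x and y stayed within cut_off of the current frame
        keep = np.all(np.abs(window - timeseries[frame]) < cut_off, axis=2)
        # Sum only the kept positions, then divide by how many were kept
        total = (window * keep[:, :, None]).sum(axis=0)
        out[frame] = total / keep.sum(axis=0)[:, None]
    return out
```

Calling masked_window_mean(timeseries, n_window, cut_off) with the arrays from the question should give the same values as avg_data_store; the loop over frames remains, but all points of a frame are handled in one vectorised step.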
Related
I have an image with width: 1920 and height: 1080.
Ultimately, I want to place various shapes within the image, but at random locations and in such a way that they do not overlap. The 0,0 coordinates of the image are in the center.
Before rendering the shapes into the image (I don't need help with this), I need to write an algorithm to generate the XY points/locations. I want to be able to specify the minimum distance any given point is allowed to get to any other points.
How can I do this?
All I have been able to do, is to generate points at equally spaced locations and then add a bit of randomness to each point. But this is not ideal, because it means points just vary within some 'cell' within a grid, and if the randomness value is too high, they will appear outside of the rectangle. Here is my code:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
from random import randrange
def is_square(integer):
    root = np.sqrt(integer)
    return integer == int(root + 0.5) ** 2
def perfect_sqr(n):
    nextN = np.floor(np.sqrt(n)) + 1
    return int(nextN * nextN)
def generate_cells(width = 1920, height = 1080, n = 9, show_plot=False):
    # If the number is not a perfect square, we need to find the next number which is
    # so that we can get the root N, which will be used to determine the number of rows/columns
    if not is_square(n):
        n = perfect_sqr(n)
    N = np.sqrt(n)
    # generate x and y lists, where each represents an array of points evenly spaced between 0 and the width/height
    x = np.array(list(range(0, width, int(width/N))))
    y = np.array(list(range(0, height, int(height/N))))
    # center the points within each 'cell'
    x_centered = x+int(width/N)/2
    y_centered = y+int(height/N)/2
    x_centered = [a+randrange(50) for a in x_centered]
    y_centered = [a+randrange(50) for a in y_centered]
    # generate a grid with the points
    xv, yv = np.meshgrid(x_centered, y_centered)
    if(show_plot):
        plt.scatter(xv,yv)
        plt.gca().add_patch(Rectangle((0,0),width, height,edgecolor='red', facecolor='none', lw=1))
        plt.show()
    # convert the arrays to 1D
    xx = xv.flatten()
    yy = yv.flatten()
    # Merge them side-by-side
    zips = zip(xx, yy)
    # convert to set of points/tuples and return
    return set(zips)
coords = generate_cells(width=1920, height=1080, n=15, show_plot=True)
print(coords)
Assuming you simply want to randomly define non-overlapping coordinates within the confines of a maximum image size subject to not having images overlap, this might be a good solution.
import numpy as np
def locateImages(field_height: int, field_width: int, min_sep: int, points: int)-> np.array:
    h_range = np.array(range(min_sep//2, field_height - (min_sep//2), min_sep))
    w_range = np.array(range(min_sep//2, field_width-(min_sep//2), min_sep))
    mx_len = max(len(h_range), len(w_range))
    if len(h_range) < mx_len:
        xtra = np.random.choice(h_range, mx_len - len(h_range))
        h_range = np.append(h_range, xtra)
    if len(w_range) < mx_len:
        xtra = np.random.choice(w_range, mx_len - len(w_range))
        w_range = np.append(w_range, xtra)
    h_points = np.random.choice(h_range, points, replace=False)
    w_points = np.random.choice(w_range, points, replace=False)
    return np.concatenate((np.vstack(h_points), np.vstack(w_points)), axis= 1)
Then given:
field_height = the maximum vertical coordinate of the Image space
field_width = the maximum horizontal coordinate of the Image space
min_sep = the minimum spacing between images
points = number of coordinates to be selected
Then:
locateImages(15, 8, 2, 5) will yield, for example (the selection is random):
array([[13, 1],
[ 7, 3],
[ 1, 5],
[ 5, 5],
[11, 5]])
Render the output:
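import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle
points = locateImages(1080, 1920, 100, 15)
# column 0 holds the vertical coordinate, column 1 the horizontal one
y, x = zip(*points)
plt.scatter(x, y)
plt.gca().add_patch(Rectangle((0,0),1920, 1080,edgecolor='red', facecolor='none', lw=1))
plt.show()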
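If the points should not be tied to grid rows and columns at all, a different technique from the one above is plain rejection sampling ("dart throwing"): draw a uniformly random candidate and keep it only if it is at least the minimum distance away from every point accepted so far. The sketch below is an illustration under assumptions, not part of the answer above: the function name sample_min_distance and the max_tries parameter are made up, and the origin is placed at the image centre as described in the question.

```python
import numpy as np

def sample_min_distance(width, height, min_dist, n_points, max_tries=10000, seed=None):
    """Rejection-sample up to n_points points with pairwise distance >= min_dist.

    Coordinates are centred on (0, 0). Hypothetical helper; may return fewer
    than n_points if max_tries candidates are not enough.
    """
    rng = np.random.default_rng(seed)
    low = np.array([-width / 2, -height / 2])
    high = np.array([width / 2, height / 2])
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n_points:
            break
        candidate = rng.uniform(low, high)
        # Accept the candidate only if it is far enough from all earlier points
        if not accepted or np.linalg.norm(np.array(accepted) - candidate, axis=1).min() >= min_dist:
            accepted.append(candidate)
    return np.array(accepted).reshape(-1, 2)

pts = sample_min_distance(1920, 1080, min_dist=100, n_points=15, seed=0)
```

This is quadratic in the number of points and slows down as the area fills up, so it is mainly suited to a modest number of points relative to the available space.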
I have a random distribution of points in 2D space like so:
from sklearn import datasets
import pandas as pd
import numpy as np
arr, labels = datasets.make_moons()
arr, labels = datasets.make_blobs(n_samples=1000, centers=3)
pd.DataFrame(arr, columns=['x', 'y']).plot.scatter('x', 'y', s=1)
I'm trying to assign each of these points to the nearest unoccupied slot on an imaginary hex grid to ensure the points don't overlap. The code I'm using to accomplish this goal produces the plot below. The general idea is to create the hex bins, then iterate over each point and find the minimal radius that allows the algorithm to assign that point to an unoccupied hex bin:
from scipy.spatial.distance import euclidean
def get_bounds(arr):
    '''Given a 2D array return the y_min, y_max, x_min, x_max'''
    return [
        np.min(arr[:,1]),
        np.max(arr[:,1]),
        np.min(arr[:,0]),
        np.max(arr[:,0]),
    ]
def create_mesh(arr, h=100, w=100):
    '''arr is a 2d array; h=number of vertical divisions; w=number of horizontal divisions'''
    print(' * creating mesh with size', h, w)
    bounds = get_bounds(arr)
    # create array of valid positions
    y_vals = np.arange(bounds[0], bounds[1], (bounds[1]-bounds[0])/h)
    x_vals = np.arange(bounds[2], bounds[3], (bounds[3]-bounds[2])/w)
    # create the dense mesh
    data = np.tile(
        [[0, 1], [1, 0]],
        np.array([
            int(np.ceil(len(y_vals) / 2)),
            int(np.ceil(len(x_vals) / 2)),
        ]))
    # ensure each axis has an even number of slots
    if len(y_vals) % 2 != 0 or len(x_vals) % 2 != 0:
        data = data[0:len(y_vals), 0:len(x_vals)]
    return pd.DataFrame(data, index=y_vals, columns=x_vals)
def align_points_to_grid(arr, h=100, w=100, verbose=False):
    '''arr is a 2d array; h=number of vertical divisions; w=number of horizontal divisions'''
    h = w = len(arr)/10
    grid = create_mesh(arr, h=h, w=w)
    # fill the mesh
    print(' * filling mesh')
    df = pd.DataFrame(arr, columns=['x', 'y'])
    bounds = get_bounds(arr)
    # store the number of points slotted
    c = 0
    for site, point in df[['x', 'y']].iterrows():
        # skip points not in original points domain
        if point.y < bounds[0] or point.y > bounds[1] or \
           point.x < bounds[2] or point.x > bounds[3]:
            raise Exception('Input point is out of bounds', point.x, point.y, bounds)
        # assign this point to the nearest open slot
        r_y = (bounds[1]-bounds[0])/h
        r_x = (bounds[3]-bounds[2])/w
        slotted = False
        while not slotted:
            bottom = grid.index.searchsorted(point.y - r_y)
            top = grid.index.searchsorted(point.y + r_y, side='right')
            left = grid.columns.searchsorted(point.x - r_x)
            right = grid.columns.searchsorted(point.x + r_x, side='right')
            close_grid_points = grid.iloc[bottom:top, left:right]
            # store the position in this point's radius that minimizes distortion
            best_dist = np.inf
            grid_loc = [np.nan, np.nan]
            for x, col in close_grid_points.iterrows():
                for y, val in col.items():
                    if val != 1: continue
                    dist = euclidean(point, (x,y))
                    if dist < best_dist:
                        best_dist = dist
                        grid_loc = [x,y]
            # no open slots were found so increase the search radius
            if np.isnan(grid_loc[0]):
                r_y *= 2
                r_x *= 2
            # success! report the slotted position to the user
            else:
                # assign a value other than 1 to mark this slot as filled
                grid.loc[grid_loc[0], grid_loc[1]] = 2
                df.loc[site, ['x', 'y']] = grid_loc
                slotted = True
                c += 1
                if verbose:
                    print(' * completed', c, 'of', len(arr), 'assignments')
    return df
# plot sample data
df = align_points_to_grid(arr, verbose=False)
df.plot.scatter('x', 'y', s=1)
I'm satisfied with the result of this algorithm, but not with the performance.
Is there a faster solution to this kind of hexbin assignment problem in Python? I feel others with more exposure to the Linear Assignment Problem or the Hungarian Algorithm might have valuable insight into this problem. Any suggestions would be hugely helpful!
It turned out assigning each point to the first available grid spot within its current radius was sufficiently performant.
For others who end up here, my lab wrapped this functionality into a little Python package for convenience. You can pip install pointgrid and then:
from pointgrid import align_points_to_grid
from sklearn import datasets
# create fake data
arr, labels = datasets.make_blobs(n_samples=1000, centers=5)
# get updated point positions
updated = align_points_to_grid(arr)
updated will be a numpy array with the same shape as the input array arr.
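For readers who want to see the general idea without installing anything, here is a rough sketch of greedy "nearest free slot" assignment. It is not the pointgrid implementation; it swaps in a scipy.spatial.KDTree lookup with a growing number of neighbours instead of a growing radius, and the function name assign_to_free_slots and the slots array (an N x 2 array of candidate grid positions) are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial import KDTree

def assign_to_free_slots(points, slots):
    """Greedily give each point the index of its nearest unoccupied slot.

    Hypothetical helper: points is (n, 2), slots is (m, 2) with m >= n.
    """
    points = np.asarray(points, dtype=float)
    slots = np.asarray(slots, dtype=float)
    tree = KDTree(slots)
    taken = np.zeros(len(slots), dtype=bool)
    assignment = np.empty(len(points), dtype=np.intp)
    for i, p in enumerate(points):
        k = 1
        while True:
            # Neighbours come back sorted by distance, nearest first
            _, idx = tree.query(p, k=k)
            idx = np.atleast_1d(idx)
            free = idx[~taken[idx]]
            if free.size:
                assignment[i] = free[0]
                taken[free[0]] = True
                break
            if k >= len(slots):
                raise ValueError("more points than available slots")
            # No free slot among the k nearest: widen the search
            k = min(2 * k, len(slots))
    return assignment
```

The new coordinates are then slots[assign_to_free_slots(points, slots)]. Because the assignment is greedy, the result depends on the order of the points and is not a minimum-cost matching in the Hungarian-algorithm sense.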
I have an oriented cylinder generated with vtkCylinderSource, with some transformations applied to it to get the orientation that I want. Here is the code for creating this oriented cylinder:
import vtk
# NOTE: USER_MATRIX is a module-level boolean that is not shown in this snippet
def cylinder_object(startPoint, endPoint, radius, my_color="DarkRed", opacity=1):
    colors = vtk.vtkNamedColors()
    # Create a cylinder.
    # Cylinder height vector is (0,1,0).
    # Cylinder center is in the middle of the cylinder
    cylinderSource = vtk.vtkCylinderSource()
    cylinderSource.SetRadius(radius)
    cylinderSource.SetResolution(50)
    # Generate a random start and end point
    # startPoint = [0] * 3
    # endPoint = [0] * 3
    rng = vtk.vtkMinimalStandardRandomSequence()
    rng.SetSeed(8775070) # For testing.8775070
    # Compute a basis
    normalizedX = [0] * 3
    normalizedY = [0] * 3
    normalizedZ = [0] * 3
    # The X axis is a vector from start to end
    vtk.vtkMath.Subtract(endPoint, startPoint, normalizedX)
    length = vtk.vtkMath.Norm(normalizedX)
    vtk.vtkMath.Normalize(normalizedX)
    # The Z axis is an arbitrary vector cross X
    arbitrary = [0] * 3
    for i in range(0, 3):
        rng.Next()
        arbitrary[i] = rng.GetRangeValue(-10, 10)
    vtk.vtkMath.Cross(normalizedX, arbitrary, normalizedZ)
    vtk.vtkMath.Normalize(normalizedZ)
    # The Y axis is Z cross X
    vtk.vtkMath.Cross(normalizedZ, normalizedX, normalizedY)
    matrix = vtk.vtkMatrix4x4()
    # Create the direction cosine matrix
    matrix.Identity()
    for i in range(0, 3):
        matrix.SetElement(i, 0, normalizedX[i])
        matrix.SetElement(i, 1, normalizedY[i])
        matrix.SetElement(i, 2, normalizedZ[i])
    # Apply the transforms
    transform = vtk.vtkTransform()
    transform.Translate(startPoint) # translate to starting point
    transform.Concatenate(matrix) # apply direction cosines
    transform.RotateZ(-90.0) # align cylinder to x axis
    transform.Scale(1.0, length, 1.0) # scale along the height vector
    transform.Translate(0, .5, 0) # translate to start of cylinder
    # Transform the polydata
    transformPD = vtk.vtkTransformPolyDataFilter()
    transformPD.SetTransform(transform)
    transformPD.SetInputConnection(cylinderSource.GetOutputPort())
    cylinderSource.Update()
    # Create a mapper and actor for the cylinder
    mapper = vtk.vtkPolyDataMapper()
    actor = vtk.vtkActor()
    if USER_MATRIX:
        mapper.SetInputConnection(cylinderSource.GetOutputPort())
        actor.SetUserMatrix(transform.GetMatrix())
    else:
        mapper.SetInputConnection(transformPD.GetOutputPort())
    actor.SetMapper(mapper)
    actor.GetProperty().SetColor(colors.GetColor3d(my_color))
    actor.GetProperty().SetOpacity(opacity)
    return actor, transformPD
Now I want to ray cast a line against this oriented cylinder. Unfortunately, using the vtkCylinderSource as the dataset for vtkOBBTree produces the wrong points as the result. How can I use ray casting with a PolyDataFilter?
I came up with a solution where I export my oriented cylinder to an .stl file and then read it again to implement the ray-casting algorithm using IntersectWithLine. The problem is that I have thousands of these oriented cylinders, and this method (exporting and reading) makes my code extremely slow.
def ray_cast(filename, p_source, p_target):
    '''
    :param filename: STL file to perform ray casting on.
    :param p_source: first point
    :param p_target: second point
    :return: code --> 0 : No intersection.
    :return: code --> +1 : p_source lies OUTSIDE the closed surface.
    :return: code --> -1 : p_source lies INSIDE closed surface
    '''
    reader = vtk.vtkSTLReader()
    reader.SetFileName(filename)
    reader.Update()
    mesh = reader.GetOutput()
    obbtree = vtk.vtkOBBTree()
    obbtree.SetDataSet(mesh)
    obbtree.BuildLocator()
    pointsVTKIntersection = vtk.vtkPoints()
    code = obbtree.IntersectWithLine(p_source, p_target, pointsVTKIntersection, None)
    # Extracting data
    pointsVTKIntersectionData = pointsVTKIntersection.GetData()
    noPointsVTKIntersection = pointsVTKIntersectionData.GetNumberOfTuples()
    pointsIntersection = []
    for idx in range(noPointsVTKIntersection):
        _tup = pointsVTKIntersectionData.GetTuple3(idx)
        pointsIntersection.append(_tup)
    return code, pointsIntersection, noPointsVTKIntersection
Below image shows the desired result using export-stl method. (the green spheres are intersection points)
I would appreciate any suggestion and help..
With vedo:
from vedo import *
cyl = Cylinder() # vtkActor
cyl.alpha(0.5).pos(3,3,3).orientation([2,1,1])
p1, p2 = (0,0,0), (4,4,5)
ipts_coords = cyl.intersectWithLine(p1, p2)
print('hit coords are', ipts_coords)
pts = Points(ipts_coords, r=10).color("yellow")
# print(pts.polydata()) # is the vtkPolyData object
origin = Point()
ln = Line(p1,p2)
show(origin, cyl, ln, pts, axes=True)
I have worked out a method to bin scattered point data into a structured 2-D array (similar to a rasterize function), and I am hoping there is a better way to achieve this.
My work
1. Intro
Point data: 1000 points, each with three properties (lon, lat, emission), representing a factory located at (x, y) that emits a certain amount of CO2 into the atmosphere
Grid network: a predefined 2-D array of shape 20x20
http://i4.tietuku.com/02fbaf32d2f09fff.png
The code reproduced here:
#### define the map area
xc1,xc2,yc1,yc2 = 113.49805889531724,115.5030664238035,37.39995194888143,38.789235929357105
map = Basemap(llcrnrlon=xc1,llcrnrlat=yc1,urcrnrlon=xc2,urcrnrlat=yc2)
#### reading the point data and scatter plot by their position
df = pd.read_csv("xxxxx.csv")
px,py = map(df.lon, df.lat)
map.scatter(px, py, color = "red", s= 5,zorder =3)
#### predefine the grid networks
lon_grid,lat_grid = np.linspace(xc1,xc2,21), np.linspace(yc1,yc2,21)
lon_x,lat_y = np.meshgrid(lon_grid,lat_grid)
grids = np.zeros(20*20).reshape(20,20)
plt.pcolormesh(lon_x,lat_y,grids,cmap = 'gray', facecolor = 'none',edgecolor = 'k',zorder=3)
2. My target
Find the nearest grid point for each factory
Add the factory's emission data to that grid point
3. Algorithm realization
3.1 Raster grid
note: 20x20 grid points are distributed in this area represented by blue dot.
http://i4.tietuku.com/8548554587b0cb3a.png
3.2 KD-tree
Find the nearest blue dot of each red point
sh = (20*20,2)
grids = np.zeros(20*20*2).reshape(*sh)
sh_emission = (20*20)
grids_em = np.zeros(20*20).reshape(sh_emission)
k = 0
for j in range(0,yy.shape[0],1):
    for i in range(0,xx.shape[0],1):
        grids[k] = np.array([lon_grid[i],lat_grid[j]])
        k+=1
T = KDTree(grids)
x_delta = (lon_grid[2] - lon_grid[1])
y_delta = (lat_grid[2] - lat_grid[1])
R = np.sqrt(x_delta**2 + y_delta**2)
for i in range(0,len(df.lon),1):
    idx = T.query_ball_point([df.lon.iloc[i],df.lat.iloc[i]], r=R)
    # Sometimes more than one blue dot is found,
    # so I calculate the distances between the factory (red point)
    # and all blue dots that are listed
    if len(idx) > 1:
        distance = []
        for k in range(0,len(idx),1):
            # idx[k] is the row of the k-th candidate grid point in `grids`
            distance.append(np.sqrt((df.lon.iloc[i] - grids[idx[k]][0])**2 + (df.lat.iloc[i] - grids[idx[k]][1])**2))
        pos_index = distance.index(min(distance))
        pos = idx[pos_index]
    # Only find 1 point
    else:
        pos = idx
    grids_em[pos] += df.so2[i]
4. Result
co2 = grids_em.reshape(20,20)
plt.pcolormesh(lon_x,lat_y,co2,cmap =plt.cm.Spectral_r,zorder=3)
http://i4.tietuku.com/6ded65c4ac301294.png
5. My question
Can someone point out any drawbacks or errors in this method?
Are there algorithms better suited to my target?
Thanks a lot!
Your code contains many for-loops, which is not the NumPy way.
Make some sample data first:
import numpy as np
import pandas as pd
from scipy.spatial import KDTree
import pylab as pl
xc1, xc2, yc1, yc2 = 113.49805889531724, 115.5030664238035, 37.39995194888143, 38.789235929357105
N = 1000
GSIZE = 20
x, y = np.random.multivariate_normal([(xc1 + xc2)*0.5, (yc1 + yc2)*0.5], [[0.1, 0.02], [0.02, 0.1]], size=N).T
value = np.ones(N)
df_points = pd.DataFrame({"x":x, "y":y, "v":value})
For equal space grids you can use hist2d():
pl.hist2d(df_points.x, df_points.y, weights=df_points.v, bins=20, cmap="viridis");
Here is the output:
Here is the code to use KDTree:
X, Y = np.mgrid[x.min():x.max():GSIZE*1j, y.min():y.max():GSIZE*1j]
grid = np.c_[X.ravel(), Y.ravel()]
points = np.c_[df_points.x, df_points.y]
tree = KDTree(grid)
dist, indices = tree.query(points)
grid_values = df_points.groupby(indices).v.sum()
df_grid = pd.DataFrame(grid, columns=["x", "y"])
df_grid["v"] = grid_values
fig, ax = pl.subplots(figsize=(10, 8))
ax.plot(df_points.x, df_points.y, "kx", alpha=0.2)
mapper = ax.scatter(df_grid.x, df_grid.y, c=df_grid.v,
cmap="viridis",
linewidths=0,
s=100, marker="o")
pl.colorbar(mapper, ax=ax);
the output is:
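For the equally spaced case mentioned earlier in this answer, the summed grid can also be obtained as an array rather than a plot. The sketch below uses np.histogram2d with the weights argument; the wrapper name grid_sum is made up for illustration, and the edge arrays are assumed to be the cell boundaries (for example the 21 values per axis that the question builds with np.linspace).

```python
import numpy as np

def grid_sum(x, y, values, x_edges, y_edges):
    """Sum `values` per grid cell; returns an array of shape (nx, ny).

    Hypothetical wrapper around np.histogram2d. Points outside the
    outermost edges are ignored.
    """
    total, _, _ = np.histogram2d(x, y, bins=[x_edges, y_edges], weights=values)
    return total

# Three points: two fall in cell (0, 0), one in cell (1, 0)
total = grid_sum([0.5, 0.5, 1.5], [0.5, 0.5, 0.5], [1.0, 2.0, 4.0],
                 x_edges=[0, 1, 2], y_edges=[0, 1, 2])
```

Note that this assigns each point to the cell that contains it, which is not quite the same as snapping to the nearest grid node as the KDTree version does. The first axis of the result follows x, so it usually needs a transpose before being passed to pcolormesh.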
I want to bin the values of polygons to a fine regular grid.
For instance, I have the following coordinates:
data = 2.353
data_lats = np.array([57.81000137, 58.15999985, 58.13000107, 57.77999878])
data_lons = np.array([148.67999268, 148.69999695, 148.47999573, 148.92999268])
My regular grid looks like this:
delta = 0.25
grid_lons = np.arange(-180, 180, delta)
grid_lats = np.arange(90, -90, -delta)
llx, lly = np.meshgrid( grid_lons, grid_lats )
rows = lly.shape[0]
cols = llx.shape[1]
grid = np.zeros((rows,cols))
Now I can find the grid pixel that corresponds to the center of my polygon very easily:
centerx, centery = np.mean(data_lons), np.mean(data_lats)
row = int(np.floor( centery/delta ) + (grid.shape[0]/2))
col = int(np.floor( centerx/delta ) + (grid.shape[1]/2))
grid[row,col] = data
However, there are probably a couple of grid pixels that still intersect with the polygon. Hence, I would like to generate a bunch of coordinates inside my polygon (data_lons, data_lats) and find their corresponding grid pixel as before. Do you have a suggestion for generating the coordinates randomly or systematically? I have failed so far, but am still trying.
Note: One data set contains around 80000 polygons, so it has to be really fast (a couple of seconds). That is also why I chose this approach, even though it does not account for the area of overlap... (unlike my earlier question Data binning: irregular polygons to regular mesh, which is VERY slow)
I worked on a quick and dirty solution by simply calculating the coordinates between corner pixels. Take a look:
dlats = np.zeros((data_lats.shape[0],4))+np.nan
dlons = np.zeros((data_lons.shape[0],4))+np.nan
idx = [0,1,3,2,0] #rearrange the corner pixels
for cc in range(4):
    dlats[:,cc] = np.mean((data_lats[:,idx[cc]],data_lats[:,idx[cc+1]]), axis=0)
    dlons[:,cc] = np.mean((data_lons[:,idx[cc]],data_lons[:,idx[cc+1]]), axis=0)
data_lats = np.column_stack(( data_lats, dlats ))
data_lons = np.column_stack(( data_lons, dlons ))
Thus, the red dots represent the original corners - the blue ones the intermediate pixels between them.
I can do this one more time and include the center pixel (geo[:,[4,9]])
dlats = np.zeros((data.shape[0],8))
dlons = np.zeros((data.shape[0],8))
for cc in range(8):
    dlats[:,cc] = np.mean((data_lats[:,cc], geo[:,4]), axis=0)
    dlons[:,cc] = np.mean((data_lons[:,cc], geo[:,9]), axis=0)
data_lats = np.column_stack(( data_lats, dlats, geo[:,4] ))
data_lons = np.column_stack(( data_lons, dlons, geo[:,9] ))
This works really nice, and I can assign each point directly to its corresponding grid pixel like this:
row = np.floor( data_lats/delta ) + (llx.shape[0]/2)
col = np.floor( data_lons/delta ) + (llx.shape[1]/2)
However the final binning now takes ~7sec!!! How can I speed this code up:
for ii in np.arange(len(data)):
    for cc in np.arange(data_lats.shape[1]):
        final_grid[row[ii,cc],col[ii,cc]] += data[ii]
        final_grid_counts[row[ii,cc],col[ii,cc]] += 1
You'll need to test the following approach to see if it is fast enough. First, you should convert all your lats and lons into (possibly fractional) indices into your grid:
idx_lats = (data_lats - lat_grid_start) / lat_grid_step
idx_lons = (data_lons - lon_grid_start) / lon_grid_step
Next, we want to split your polygons into triangles. For any convex polygon, you could take the center of the polygon as one vertex of all triangles, and then the vertices of the polygon in consecutive pairs. But if your polygons are all quadrilaterals, it is going to be faster to divide them into only 2 triangles, using vertices 0, 1, 2 for the first, and 0, 2, 3 for the second.
To know if a certain point is inside a triangle, I am going to use the barycentric coordinates approach described here. This first function checks whether a bunch of points are inside a triangle:
def check_in_triangle(x, y, x_tri, y_tri) :
    # Vertex 0 is the origin of the barycentric frame
    A = np.vstack((x_tri[0], y_tri[0]))
    # Columns of lhs are the two edge vectors leaving vertex 0
    lhs = np.vstack((x_tri[1:], y_tri[1:])) - A
    rhs = np.vstack((x, y)) - A
    uv = np.linalg.solve(lhs, rhs)
    # Equivalent to (uv[0] >= 0) & (uv[1] >= 0) & (uv[0] + uv[1] <= 1)
    return np.all(uv >= 0, axis=0) & (np.sum(uv, axis=0) <= 1)
Given a triangle by its vertices, you can get the lattice points inside it, by running the above function on the lattice points in the bounding box of the triangle:
def lattice_points_in_triangle(x_tri, y_tri) :
    x_grid = np.arange(np.ceil(np.min(x_tri)), np.floor(np.max(x_tri)) + 1)
    y_grid = np.arange(np.ceil(np.min(y_tri)), np.floor(np.max(y_tri)) + 1)
    x, y = np.meshgrid(x_grid, y_grid)
    x, y = x.reshape(-1), y.reshape(-1)
    idx = check_in_triangle(x, y, x_tri, y_tri)
    return x[idx], y[idx]
And for a quadrilateral, you simply call this last function twice :
def lattice_points_in_quadrilateral(x_quad, y_quad) :
    # list() is needed on Python 3, where map() returns a lazy iterator
    return list(map(np.concatenate,
                    zip(lattice_points_in_triangle(x_quad[:3], y_quad[:3]),
                        lattice_points_in_triangle(x_quad[[0, 2, 3]],
                                                   y_quad[[0, 2, 3]]))))
If you run this code on your example data, you will get two empty arrays returned: that's because the order of the quadrilateral points is a surprising one: indices 0 and 1 define one diagonal, 2 and 3 the other. My function above was expecting the vertices to be ordered around the polygon. If you really are doing things this other way, you need to change the second call to lattice_points_in_triangle inside lattice_points_in_quadrilateral so that the indices used are [0, 1, 3] instead of [0, 2, 3].
And now, with that change :
>>> idx_lats = (data_lats - (-180) ) / 0.25
>>> idx_lons = (data_lons - (-90) ) / 0.25
>>> lattice_points_in_quadrilateral(idx_lats, idx_lons)
[array([952]), array([955])]
If you change the resolution of your grid to 0.1:
>>> idx_lats = (data_lats - (-180) ) / 0.1
>>> idx_lons = (data_lons - (-90) ) / 0.1
>>> lattice_points_in_quadrilateral(idx_lats, idx_lons)
[array([2381, 2380, 2381, 2379, 2380, 2381, 2378, 2379, 2378]),
array([2385, 2386, 2386, 2387, 2387, 2387, 2388, 2388, 2389])]
Timing-wise, this approach is, on my system, about 10x too slow for your needs:
In [8]: %timeit lattice_points_in_quadrilateral(idx_lats, idx_lons)
1000 loops, best of 3: 269 us per loop
So you are looking at over 20 sec. to process your 80,000 polygons.
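Independently of how the sample points are generated, the final accumulation loop from the question (the nested loop over ii and cc) can be written without Python-level loops using np.add.at, which handles repeated indices correctly. This is a sketch under assumptions rather than a measured fix: the function name bin_points is made up, row and col are the (n_polygons, n_samples) index arrays from the question cast to integers, and data holds one value per polygon.

```python
import numpy as np

def bin_points(row, col, data, grid_shape):
    """Accumulate per-polygon values and hit counts on a regular grid.

    Hypothetical helper: row and col have shape (n_polygons, n_samples),
    data has shape (n_polygons,).
    """
    # np.floor returns floats, so cast to integer indices first
    row = np.asarray(row).astype(np.intp)
    col = np.asarray(col).astype(np.intp)
    # Repeat each polygon's value once per sample point
    weights = np.broadcast_to(np.asarray(data, dtype=float)[:, None], row.shape)
    total = np.zeros(grid_shape)
    counts = np.zeros(grid_shape, dtype=np.intp)
    # Unbuffered in-place add: duplicate (row, col) pairs are all counted
    np.add.at(total, (row, col), weights)
    np.add.at(counts, (row, col), 1)
    return total, counts
```

A plain total[row, col] += weights would silently drop duplicates, which is why np.add.at is used here. If it turns out to be too slow for a given data size, np.bincount on flattened indices (via np.ravel_multi_index) is a common alternative worth timing.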