Find indices of lat/lon point on a grid using Python

I am new to Python, and I can't figure out how to find the indices of the grid point closest to a given lat/lon point (which is not taken from the grid, but selected by me).
Basically, I am reading in a netCDF file that contains 2D coordinates:
from netCDF4 import Dataset
from scipy.spatial import distance
import numpy as np
coords = 'coords.nc'
fh = Dataset(coords, mode='r')
lats = fh.variables['latitudes'][:,:]
lons = fh.variables['longitudes'][:,:]
fh.close()
>>> lons.shape
(94, 83)
>>> lats.shape
(94, 83)
I want to find the indices in the above grid for the nearest lat lon to the below values:
sel_lat=71.60556
sel_lon=-161.458611
I tried to build lat/lon pairs in order to use scipy.spatial.distance.cdist, but I am still having problems: my input arrays are not in the format it expects, and I don't understand how to set them up:
latLon_pairsGrid = np.vstack(([lats.T],[lons.T])).T
>>> latLon_pairsGrid.shape
(94, 83, 2)
distance.cdist([sel_lat,sel_lon],latLon_pairsGrid,'euclidean')
Any help or hints would be appreciated

Check out the pyresample package. It provides spatial nearest-neighbour search using a fast kd-tree approach:
import pyresample
import numpy as np
# Define lat-lon grid
lon = np.linspace(30, 40, 100)
lat = np.linspace(10, 20, 100)
lon_grid, lat_grid = np.meshgrid(lon, lat)
grid = pyresample.geometry.GridDefinition(lats=lat_grid, lons=lon_grid)
# Generate some random data on the grid
data_grid = np.random.rand(lon_grid.shape[0], lon_grid.shape[1])
# Define some sample points
my_lons = np.array([34.5, 36.5, 38.5])
my_lats = np.array([12.0, 14.0, 16.0])
swath = pyresample.geometry.SwathDefinition(lons=my_lons, lats=my_lats)
# Determine nearest (w.r.t. great circle distance) neighbour in the grid.
_, _, index_array, distance_array = pyresample.kd_tree.get_neighbour_info(
source_geo_def=grid, target_geo_def=swath, radius_of_influence=50000,
neighbours=1)
# get_neighbour_info() returns indices in the flattened lat/lon grid. Compute
# the 2D grid indices:
index_array_2d = np.unravel_index(index_array, grid.shape)
print("Indices of nearest neighbours:", index_array_2d)
print("Longitude of nearest neighbours:", lon_grid[index_array_2d])
print("Latitude of nearest neighbours:", lat_grid[index_array_2d])
print("Great Circle Distance:", distance_array)
There is also a shorthand method for directly obtaining the data values at the nearest grid points:
data_swath = pyresample.kd_tree.resample_nearest(
source_geo_def=grid, target_geo_def=swath, data=data_grid,
radius_of_influence=50000)
print("Data at nearest grid points:", data_swath)

I think I found an answer, but it is a workaround that avoids calculating distance between the chosen lat/lon and the lat/lons on the grid. This doesn't seem completely accurate because I am never calculating distances, just the closest difference between lat/lon values themselves.
I used the answer to the question "find (i,j) location of closest (long,lat) values in a 2D array":
a = abs(lats-sel_lat)+abs(lons-sel_lon)
i,j = np.unravel_index(a.argmin(),a.shape)
Using those returned indices i,j, I can then find on the grid the coordinates that correspond most closely to my selected lat, lon value:
>>> lats[i,j]
71.490295
>>> lons[i,j]
-161.65045
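The concern about never computing a real distance can be addressed with plain NumPy. The following is a minimal sketch, assuming a spherical Earth and 2D lat/lon arrays in degrees; the function name nearest_grid_index and the stand-in grid are hypothetical and not part of the question's data. It replaces the sum of absolute differences with the haversine great-circle angle, which matters at 71°N where a degree of longitude is much shorter than a degree of latitude.

```python
import numpy as np

def nearest_grid_index(lats, lons, lat0, lon0):
    """Return (i, j) of the grid point closest to (lat0, lon0) by great-circle distance."""
    lat1, lon1 = np.radians(lats), np.radians(lons)
    lat2, lon2 = np.radians(lat0), np.radians(lon0)
    # Haversine formula on a unit sphere (multiply by the Earth radius for metres)
    a = (np.sin((lat1 - lat2) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon1 - lon2) / 2) ** 2)
    central_angle = 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    return np.unravel_index(np.argmin(central_angle), lats.shape)

# Hypothetical stand-in grid with the same shape as in the question
lon_axis = np.linspace(-170, -150, 83)
lat_axis = np.linspace(65, 75, 94)
lons, lats = np.meshgrid(lon_axis, lat_axis)  # both have shape (94, 83)

i, j = nearest_grid_index(lats, lons, 71.60556, -161.458611)
print(i, j, lats[i, j], lons[i, j])
```

Because the whole grid is evaluated in one vectorized pass, this is fine for a 94 x 83 grid; for many query points a tree-based search is the better fit.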

Related

Sort 4 3D coordinates in a winding order in any given direction

I need to sort a selection of 3D coordinates in a winding order as seen in the image below. The bottom-right vertex should be the first element of the array and the bottom-left vertex should be the last element of the array. This needs to work given any direction that the camera is facing the points and at any orientation of those points. Since "top-left","bottom-right", etc is relative, I assume I can use the camera as a reference point? We can also assume all 4 points will be coplanar.
I am using the Blender API (writing a Blender plugin) and have access to the camera's view matrix if that is even necessary. Mathematically speaking is this even possible if so how? Maybe I am overcomplicating things?
Since the Blender API is in Python I tagged this as Python, but I am fine with pseudo-code or no code at all. I'm mainly concerned with how to approach this mathematically as I have no idea where to start.
Since you assume the four points are coplanar, all you need to do is find the centroid, calculate the vector from the centroid to each point, and sort the points by the angle of the vector.
import numpy as np
def sort_points(pts):
    centroid = np.sum(pts, axis=0) / pts.shape[0]
    vector_from_centroid = pts - centroid
    vector_angle = np.arctan2(vector_from_centroid[:, 1], vector_from_centroid[:, 0])
    sort_order = np.argsort(vector_angle) # Find the indices that give a sorted vector_angle array
    # Apply sort_order to original pts array.
    # Also returning centroid and angles so I can plot it for illustration.
    return (pts[sort_order, :], centroid, vector_angle[sort_order])
This function calculates the angle assuming that the points are two-dimensional, but if you have coplanar points then it should be easy enough to find the coordinates in the common plane and eliminate the third coordinate.
Let's write a quick plot function to plot our points:
from matplotlib import pyplot as plt
def plot_points(pts, centroid=None, angles=None, fignum=None):
    fig = plt.figure(fignum)
    plt.plot(pts[:, 0], pts[:, 1], 'or')
    if centroid is not None:
        plt.plot(centroid[0], centroid[1], 'ok')
    for i in range(pts.shape[0]):
        lstr = f"pt{i}"
        if angles is not None:
            lstr += f" ang: {angles[i]:.3f}"
        plt.text(pts[i, 0], pts[i, 1], lstr)
    return fig
And now let's test this:
With random points:
pts = np.random.random((4, 2))
spts, centroid, angles = sort_points(pts)
plot_points(spts, centroid, angles)
With points in a rectangle:
pts = np.array([[0, 0], # pt0
[10, 5], # pt2
[10, 0], # pt1
[0, 5]]) # pt3
spts, centroid, angles = sort_points(pts)
plot_points(spts, centroid, angles)
It's easy enough to find the normal vector of the plane containing our points (here pts holds the 3D coordinates): it is simply the (normalized) cross product of two vectors that each join a pair of points:
plane_normal = np.cross(pts[1, :] - pts[0, :], pts[2, :] - pts[0, :])
plane_normal = plane_normal / np.linalg.norm(plane_normal)
Now, to find the projections of all points in this plane, we need to know the "origin" and basis of the new coordinate system in this plane. Let's assume that the first point is the origin, the x axis joins the first point to the second, and since we know the z axis (plane normal) and x axis, we can calculate the y axis.
new_origin = pts[0, :]
new_x = pts[1, :] - pts[0, :]
new_x = new_x / np.linalg.norm(new_x)
new_y = np.cross(plane_normal, new_x)
Now, the projections of the points onto the new plane are given by this answer:
proj_x = np.dot(pts - new_origin, new_x)
proj_y = np.dot(pts - new_origin, new_y)
Now you have two-dimensional points. Run the code above to sort them.
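Putting the projection and the angle sort together, a minimal sketch could look like the following. It assumes the points are coplanar and that the first three are not collinear; the function name winding_order_3d is hypothetical. It returns the sort indices rather than the sorted points, so they can be applied to the original 3D array.

```python
import numpy as np

def winding_order_3d(pts):
    """Return indices that sort coplanar 3D points by angle around their centroid."""
    pts = np.asarray(pts, dtype=float)
    # In-plane basis built from the first three points (assumed not collinear)
    x_axis = pts[1] - pts[0]
    x_axis = x_axis / np.linalg.norm(x_axis)
    normal = np.cross(x_axis, pts[2] - pts[0])
    normal = normal / np.linalg.norm(normal)
    y_axis = np.cross(normal, x_axis)
    # 2D coordinates of every point relative to the centroid
    rel = pts - pts.mean(axis=0)
    u = rel @ x_axis
    v = rel @ y_axis
    # Increasing angle is counter-clockwise when looking against the normal
    return np.argsort(np.arctan2(v, u))

# A unit square lying in the tilted plane z = x, given in scrambled order
square = np.array([[0, 0, 0], [1, 1, 1], [1, 0, 1], [0, 1, 0]])
order = winding_order_3d(square)
print(order, square[order])
```

Which direction counts as clockwise depends on the sign of the normal, so for a camera-relative order the normal would still have to be flipped to face the camera.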
After many hours, I finally found a solution. @Pranav Hosangadi's solution worked for the 2D side of things. However, I was having trouble projecting the 3D coordinates to 2D coordinates using the second part of his solution. I also tried projecting the coordinates as described in this answer, but it did not work as intended. I then discovered an API function called location_3d_to_region_2d() (see docs) which, as the name implies, gets the 2D screen coordinates in pixels of a given 3D coordinate. I didn't need to "project" anything into 2D in the first place: getting the screen coordinates worked perfectly fine and is much simpler. From that point, I could sort the coordinates using Pranav's function with some slight adjustments: I negated the angles to get the order illustrated in the screenshot of my first post, and I return the sort order as a list instead of a NumPy array.
import bpy
from bpy_extras.view3d_utils import location_3d_to_region_2d
import numpy
def sort_points(pts):
    """Sort 4 points in a winding order"""
    pts = numpy.array(pts)
    centroid = numpy.sum(pts, axis=0) / pts.shape[0]
    vector_from_centroid = pts - centroid
    vector_angle = numpy.arctan2(
        vector_from_centroid[:, 1], vector_from_centroid[:, 0])
    # Find the indices that sort vector_angle in descending order
    sort_order = numpy.argsort(-vector_angle)
    # Return the sort order (indices into pts) as a list
    return list(sort_order)
# Get 2D screen coords of selected vertices
region = bpy.context.region
region_3d = bpy.context.space_data.region_3d
corners2d = []
for corner in selected_verts:
    corners2d.append(location_3d_to_region_2d(
        region, region_3d, corner))
# Sort the 2d points in a winding order
sort_order = sort_points(corners2d)
sorted_corners = [selected_verts[i] for i in sort_order]
Thanks, Pranav for your time and patience in helping me solve this problem!
There is a simpler and faster solution for the Blender case:
1.) The following code sorts 4 planar points in 2D (vertices of the plane object in Blender) very efficiently:
import numpy as np
def sort_clockwise(pts):
    rect = np.zeros((4, 2), dtype="float32")
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect
2.) Blender keeps vertex-related transforms, such as translation, rotation and scale, in the world matrix. If you query vertices.co(ordinates) only, you just get the original coordinates, without translation, rotation and scaling. But that does not affect the order of the vertices. That simplifies the problem, because what you get is actually 2D mesh data (with all z's = 0). If you sort that 2D data (excluding the z's), you get the sort indices for the 3D data as well. You can modify the code above to return the indices from that 2D array. For the plane object of Blender, for some reason the order is always [0,1,3,2], not [0,1,2,3]. The following modified code gives the sorted indices for the vertex data in 2D.
def sorted_ix_clockwise(pts):
    # Same logic as sort_clockwise, but storing indices instead of points
    ix = np.array([0, 0, 0, 0])
    s = pts.sum(axis=1)
    ix[0] = np.argmin(s)
    ix[2] = np.argmax(s)
    dif = np.diff(pts, axis=1)
    ix[1] = np.argmin(dif)
    ix[3] = np.argmax(dif)
    return ix
You can use these indices to get the actual 3D sorted data, which you can obtain by multiplying vertices coordinates with the world matrix to include any translation, rotation and scaling.

Efficiently find all points within sorted 2D Numpy Array

How do I efficiently find the set of points within a circle of a given radius and centre from a sorted numpy array of equally spaced points?
For example, this is my code and how I currently extract those points within the radius.
import numpy as np
n_points = 10000
x_lim = [0, 100]
y_lim = [0, 100]
x, y = np.meshgrid(np.linspace(*x_lim, n_points), np.linspace(*y_lim, n_points))
xy = np.vstack((x.flatten(), y.flatten())).T 
# Current approach
radius = 5
point = np.array([50, 35], dtype=float) 
# Indexes of those points within a circle of radius centered at point
idxs = np.linalg.norm(point - xy, axis=-1) < radius
points_within_circle = xy[idxs]
How do I calculate these indexes more efficiently? I imagine that, because the array is structured and has a set distance between each point, I should be able to exploit this to eliminate most of the checks.
One of the most important tricks that people forget is that it is a lot faster to compare distance**2 with radius**2 than to compute the distance itself (which needs a square root) and compare it with radius. So, with the centre at (cx, cy), calculate (x - cx)**2 + (y - cy)**2 and compare it to radius**2 = 25.
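The regular spacing can be exploited as well. The sketch below assumes the grid is described by two sorted 1-D axes (as in the question's np.linspace calls); the function name points_in_circle is hypothetical. It first cuts out the bounding box of the circle with np.searchsorted and only then applies the squared-distance test, so most points are never examined.

```python
import numpy as np

def points_in_circle(x_axis, y_axis, center, radius):
    """Return the grid points strictly inside the circle, as an (N, 2) array."""
    cx, cy = center
    # Index range of the circle's bounding box along each sorted axis
    ix0, ix1 = np.searchsorted(x_axis, [cx - radius, cx + radius])
    iy0, iy1 = np.searchsorted(y_axis, [cy - radius, cy + radius])
    # Squared-distance test on the small sub-grid only
    xs, ys = np.meshgrid(x_axis[ix0:ix1], y_axis[iy0:iy1])
    mask = (xs - cx) ** 2 + (ys - cy) ** 2 < radius ** 2
    return np.column_stack((xs[mask], ys[mask]))

n_points = 10000
x_axis = np.linspace(0, 100, n_points)
y_axis = np.linspace(0, 100, n_points)
inside = points_in_circle(x_axis, y_axis, (50.0, 35.0), 5.0)
print(inside.shape)
```

This also avoids materialising the full 10000 x 10000 coordinate array, which is where most of the memory in the original approach goes.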

K-nearest points from two dataframes with GeoPandas

GeoPandas uses shapely under the hood. To get the nearest neighbor I saw the use of nearest_points from shapely. However, this approach does not include k-nearest points.
I needed to compute distances to the nearest points between two GeoDataFrames and insert the distance into the GeoDataFrame containing the "from this point" data.
This is my approach using GeoSeries.distance() without using another package or library. Note that when k == 1 the returned value is simply the distance to the nearest point. There is also a GeoPandas-only solution for the nearest point by @cd98, which inspired my approach.
This works well for my data, but I wonder if there is a better or faster approach or another benefit to use shapely or sklearn.neighbors?
import pandas as pd
import geopandas as gp
# gdf1: GeoDataFrame with point type geometry column - distance from this point
# gdf2: GeoDataFrame with point type geometry column - distance to this point
def knearest(from_points, to_points, k):
    distlist = to_points.distance(from_points)
    distlist.sort_values(ascending=True, inplace=True) # To have the closest ones first
    return distlist[:k].mean()
# looping through a list of nearest points
for Ks in [1, 2, 3, 4, 5, 10]:
    name = 'dist_to_closest_' + str(Ks) # to set column name
    gdf1[name] = gdf1.geometry.apply(knearest, args=(gdf2, Ks))
Yes, there is. First, I must credit the University of Helsinki's Automating GIS-processes course; here's the source code. Here's how.
First, read the data, for example to find the nearest bus stop for each building:
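import geopandas as gpd
# Filepaths (read_gdf_from_zip is a helper from the course material, not part of GeoPandas)
stops = gpd.read_file('data/pt_stops_helsinki.gpkg')
buildings = read_gdf_from_zip('data/building_points_helsinki.zip')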
Define the functions; here you can adjust k_neighbors:
from sklearn.neighbors import BallTree
import numpy as np
def get_nearest(src_points, candidates, k_neighbors=1):
    """Find nearest neighbors for all source points from a set of candidate points"""
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='haversine')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]
    # Return indices and distances
    return (closest, closest_dist)
def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
    """
    For each point in left_gdf, find closest point in right GeoDataFrame and return them.
    NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
    """
    left_geom_col = left_gdf.geometry.name
    right_geom_col = right_gdf.geometry.name
    # Ensure that index in right gdf is formed of sequential numbers
    right = right_gdf.copy().reset_index(drop=True)
    # Parse coordinates from points and insert them into a numpy array as RADIANS
    left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    # Find the nearest points
    # -----------------------
    # closest ==> index in right_gdf that corresponds to the closest point
    # dist ==> distance between the nearest neighbors (in radians; converted to meters below)
    closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)
    # Return points from right GeoDataFrame that are closest to points in left GeoDataFrame
    closest_points = right.loc[closest]
    # Ensure that the index corresponds the one in left_gdf
    closest_points = closest_points.reset_index(drop=True)
    # Add distance if requested
    if return_dist:
        # Convert to meters from radians
        earth_radius = 6371000 # meters
        closest_points['distance'] = dist * earth_radius
    return closest_points
Do the nearest-neighbour analysis:
# Find closest public transport stop for each building and get also the distance based on haversine distance
# Note: haversine distance which is implemented here is a bit slower than using e.g. 'euclidean' metric
# but useful as we get the distance between points in meters
closest_stops = nearest_neighbor(buildings, stops, return_dist=True)
Now join the "from" and "to" data frames:
# Rename the geometry of closest stops gdf so that we can easily identify it
closest_stops = closest_stops.rename(columns={'geometry': 'closest_stop_geom'})
# Merge the datasets by index (for this, it is good to use '.join()' -function)
buildings = buildings.join(closest_stops)
The answer above using Automating GIS-processes is really nice, but there is an error when converting the points to a NumPy array in radians: the latitude and longitude are reversed.
left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.y * np.pi / 180, geom.x * np.pi / 180)).to_list())
Indeed, the haversine metric expects points as (lat, lon), but longitude corresponds to the x-axis (geom.x) and latitude to the y-axis (geom.y). The same change applies to right_radians.
If your data are in grid coordinates, then the approach is a bit leaner, but with one key gotcha.
Building on sutan's answer and streamlining the block from the Uni Helsinki...
To get multiple neighbors, you edit the k_neighbors argument, and you must ALSO hard-code variables within the body of the function (see my additions below 'closest' and 'closest_dist') AND add them to the return statement.
Thus, if you want the 2 closest points, it looks like:
from sklearn.neighbors import BallTree
import numpy as np
def get_nearest(src_points, candidates, k_neighbors=2):
    """
    Find nearest neighbors for all source points from a set of candidate points
    modified from: https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html
    """
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='euclidean')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]
    closest_second = indices[1] # *manually add per comment above*
    closest_second_dist = distances[1] # *manually add per comment above*
    # Return indices and distances
    return (closest, closest_dist, closest_second, closest_second_dist)
The inputs are lists of (x,y) tuples. Thus, since (by question title) your data is in a GeoDataframe:
# easier to read
in_pts = [(row.geometry.x, row.geometry.y) for idx, row in gdf1.iterrows()]
qry_pts = [(row.geometry.x, row.geometry.y) for idx, row in gdf2.iterrows()]
# faster (by about 7X)
in_pts = [(x,y) for x,y in zip(gdf1.geometry.x , gdf1.geometry.y)]
qry_pts = [(x,y) for x,y in zip(gdf2.geometry.x , gdf2.geometry.y)]
I'm not interested in distances, so instead of commenting them out of the function, I run:
idx_nearest, _, idx_2ndnearest, _ = get_nearest(in_pts, qry_pts)
and get two arrays of the same length as in_pts that, respectively, contain the index values of the closest and second-closest points from the original GeoDataFrame for qry_pts.
Great solution! If you are using Automating GIS-processes solution, make sure to reset the index of buildings geoDataFrame before join (only if you are using a subset of left_gdf).
buildings.insert(0, 'Number', range(0,len(buildings)))
buildings.set_index('Number' , inplace = True)
Based on the previous answers, I have an all-in-one solution for you, which takes two geopandas.GeoDataFrames as input and searches for the k nearest neighbors.
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree
def get_nearest_neighbors(gdf1, gdf2, k_neighbors=2):
    '''
    Find k nearest neighbors for all source points from a set of candidate points
    modified from: https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html
    Parameters
    ----------
    gdf1 : geopandas.GeoDataFrame
        Geometries to search from.
    gdf2 : geopandas.GeoDataFrame
        Geometries to be searched.
    k_neighbors : int, optional
        Number of nearest neighbors. The default is 2.
    Returns
    -------
    gdf_final : geopandas.GeoDataFrame
        gdf1 with distance, index and all other columns from gdf2.
    '''
    src_points = [(x,y) for x,y in zip(gdf1.geometry.x , gdf1.geometry.y)]
    candidates = [(x,y) for x,y in zip(gdf2.geometry.x , gdf2.geometry.y)]
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='euclidean')
    # Find closest points and distances
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    closest_gdfs = []
    for k in np.arange(k_neighbors):
        gdf_new = gdf2.iloc[indices[k]].reset_index()
        gdf_new['distance'] = distances[k]
        gdf_new = gdf_new.add_suffix(f'_{k+1}')
        closest_gdfs.append(gdf_new)
    closest_gdfs.insert(0,gdf1)
    gdf_final = pd.concat(closest_gdfs,axis=1)
    return gdf_final
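If only the mean distance to the k nearest points is needed, as in the original question, a tree query on plain coordinate arrays may be enough. This is a minimal sketch using scipy.spatial.cKDTree instead of BallTree; it assumes projected (planar) coordinates, and the function name mean_knn_distance is hypothetical. For GeoDataFrames, the coordinate arrays could be built with np.column_stack((gdf.geometry.x, gdf.geometry.y)).

```python
import numpy as np
from scipy.spatial import cKDTree

def mean_knn_distance(from_xy, to_xy, k):
    """Mean Euclidean distance from each point in from_xy to its k nearest points in to_xy."""
    from_xy = np.asarray(from_xy, dtype=float)
    tree = cKDTree(np.asarray(to_xy, dtype=float))
    dist, _ = tree.query(from_xy, k=k)
    # query() returns shape (n,) for k == 1 and (n, k) otherwise
    dist = np.asarray(dist).reshape(len(from_xy), -1)
    return dist.mean(axis=1)

# Small made-up example: one source point, three candidate points
from_xy = [[0.0, 0.0]]
to_xy = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
for k in (1, 2, 3):
    print(k, mean_knn_distance(from_xy, to_xy, k))
```

k should not exceed the number of candidate points, since the tree pads missing neighbours with infinite distances.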

Finding the closest ground pixel on an irregular grid for given coordinates

I work with satellite data organised on an irregular two-dimensional grid whose dimensions are scanline (along track dimension) and ground pixel (across track dimension). Latitude and longitude information for each ground pixel are stored in auxiliary coordinate variables.
Given a (lat, lon) point, I would like to identify the closest ground pixel on my set of data.
Let's build a 10x10 toy data set:
import numpy as np
import xarray as xr
import cartopy.crs as ccrs
import matplotlib.pyplot as plt
%matplotlib inline
lon, lat = np.meshgrid(np.linspace(-20, 20, 10),
np.linspace(30, 60, 10))
lon += lat/10
lat += lon/10
da = xr.DataArray(data = np.random.normal(0,1,100).reshape(10,10),
dims=['scanline', 'ground_pixel'],
coords = {'lat': (('scanline', 'ground_pixel'), lat),
'lon': (('scanline', 'ground_pixel'), lon)})
ax = plt.subplot(projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, cmap=plt.cm.get_cmap('Blues'),
infer_intervals=True);
ax.scatter(lon, lat, transform=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines(draw_labels=True)
plt.tight_layout()
Note that the lat/lon coordinates identify the centre pixel and the pixel boundaries are automatically inferred by xarray.
Now, say I want to identify the closest ground pixel to Rome.
The best way I have come up with so far is to use SciPy's KDTree on a stacked, flattened lat/lon array:
from scipy import spatial
pixel_center_points = np.stack((da.lat.values.flatten(), da.lon.values.flatten()), axis=-1)
tree = spatial.KDTree(pixel_center_points)
rome = (41.9028, 12.4964)
distance, index = tree.query(rome)
print(index)
# 36
I then have to apply unravel_index to get my scanline/ground_pixel indexes:
pixel_coords = np.unravel_index(index, da.shape)
print(pixel_coords)
# (3, 6)
Which gives me the scanline/ground_pixel coordinates of the (supposedly) closest ground pixel to Rome:
ax = plt.subplot(projection=ccrs.PlateCarree())
da.plot.pcolormesh('lon', 'lat', ax=ax, cmap=plt.cm.get_cmap('Blues'),
infer_intervals=True);
ax.scatter(da.lon[pixel_coords], da.lat[pixel_coords],
marker='x', color='r', transform=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines(draw_labels=True)
plt.tight_layout()
I'm convinced there must be a much more elegant way to approach this problem. In particular, I would like to get rid of the flattening/unraveling steps (all my attempts to build a KD-tree on a two-dimensional array failed miserably), and make use of my xarray dataset's variables as much as possible (adding a new centre_pixel dimension, for example, and using it as input to KDTree).
I am going to answer my own question as I believe I came up with a decent solution, which is discussed at much greater length on my blog post on this subject.
Geographical distance
First of all, defining the distance between two points on the earth's surface as simply the Euclidean distance between the two lat/lon pairs could lead to inaccurate results, depending on the distance between the two points. It is thus better to transform the coordinates to ECEF coordinates first and build a KD-Tree on the transformed coordinates. Assuming points on the surface of the planet (h=0), the coordinate transformation is done as follows:
def transform_coordinates(coords):
    """ Transform coordinates from geodetic to cartesian
    Keyword arguments:
    coords - a set of lat/lon coordinates (e.g. a tuple or
             an array of tuples)
    """
    # WGS 84 reference coordinate system parameters
    A = 6378.137 # semi-major axis [km]
    E2 = 6.69437999014e-3 # eccentricity squared
    coords = np.asarray(coords).astype(float)
    # is coords a tuple? Convert it to a one-element array of tuples
    if coords.ndim == 1:
        coords = np.array([coords])
    # convert to radians
    lat_rad = np.radians(coords[:,0])
    lon_rad = np.radians(coords[:,1])
    # convert to cartesian coordinates
    r_n = A / (np.sqrt(1 - E2 * (np.sin(lat_rad) ** 2)))
    x = r_n * np.cos(lat_rad) * np.cos(lon_rad)
    y = r_n * np.cos(lat_rad) * np.sin(lon_rad)
    z = r_n * (1 - E2) * np.sin(lat_rad)
    return np.column_stack((x, y, z))
Building the KD-Tree
We could then build the KD-Tree by transforming our dataset coordinates, taking care to flatten the 2D grid into a one-dimensional sequence of lat/lon tuples. This is because the KD-Tree input data needs to have shape (N, K), where N is the number of points and K is the dimensionality (the lat/lon input has two columns, as we assume no height component; the transformed ECEF points have K=3).
# reshape and stack coordinates
coords = np.column_stack((da.lat.values.ravel(),
da.lon.values.ravel()))
# construct KD-tree
ground_pixel_tree = spatial.cKDTree(transform_coordinates(coords))
Querying the tree and indexing the xarray dataset
Querying the tree is now as simple as transforming our point's lat/lon coordinates to ECEF and passing those to the tree's query method:
rome = (41.9028, 12.4964)
distance, index = ground_pixel_tree.query(transform_coordinates(rome))
In doing so though, we need to unravel our index on the original dataset's shape, to get the scanline/ground_pixel indexes:
index = np.unravel_index(index, da.shape)
We could now use the two components to index our original xarray dataset, but we could also build two indexers to use with xarray pointwise indexing feature:
index = xr.DataArray(index[0], dims='pixel'), \
xr.DataArray(index[1], dims='pixel')
Getting the closest pixel is now easy and elegant at the same time:
da[index]
Note that we could also query more than one point at once, and by building the indexers as above, we could still index the dataset with a single call:
da[index]
Which would then return a subset of the dataset containing the closest ground pixels to our query points.
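To make the multi-point case concrete, here is a self-contained sketch without xarray. As a simplifying assumption it maps lat/lon onto a unit sphere rather than the WGS 84 ellipsoid used above; the helper name to_unit_sphere and the second query point are made up for illustration. Chord length on the sphere grows monotonically with great-circle distance, so the nearest neighbour is the same for both.

```python
import numpy as np
from scipy.spatial import cKDTree

def to_unit_sphere(lat_deg, lon_deg):
    """Map lat/lon in degrees to 3D points on a unit sphere, one row per point."""
    lat = np.radians(np.asarray(lat_deg, dtype=float)).ravel()
    lon = np.radians(np.asarray(lon_deg, dtype=float)).ravel()
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))

# Toy irregular grid, built like the one in the question
lon, lat = np.meshgrid(np.linspace(-20, 20, 10), np.linspace(30, 60, 10))
lon += lat / 10
lat += lon / 10

tree = cKDTree(to_unit_sphere(lat, lon))

# Several (lat, lon) query points in a single call
queries = np.array([[41.9028, 12.4964],   # Rome
                    [48.8566, 2.3522]])   # Paris
_, flat_index = tree.query(to_unit_sphere(queries[:, 0], queries[:, 1]))
scanline, ground_pixel = np.unravel_index(flat_index, lat.shape)
print(scanline, ground_pixel)
```

The two resulting index arrays have one entry per query point and can be wrapped in xr.DataArray objects sharing a dimension name to index the dataset pointwise, as described above.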
Further readings
Using the Euclidean norm on the lat/lon tuples could be accurate enough for small distances (think of it as approximating the earth as flat, which works on a small scale). More details on geographic distances here.
Using a KD-Tree to find the nearest neighbour is not the only way to address this problem, see this very comprehensive article.
An implementation of KD-Tree directly into xarray is in the pipeline.
My blog post on the subject.

How to vectorize a python code that needs interpolation for specific data points

I have a problem where I use a computer program called MCNP to calculate the energy deposition in a square geometry from a particle flux. The square geometry is broken down into a mesh grid with 50 cubic meshes in length, width and height. The data is placed into a text file displaying the centroid position of each mesh in cartesian coordinates (x, y and z position) and the energy deposition at that x, y, z coordinate. The data is then extracted with a Python script. I have a script that allows me to take a slice in the z plane and plot a heat map of energy deposition on that plane. The script works, but I don't think it is very efficient, and I am looking for solutions to vectorize the process.
The code reads in the X, Y and Z coordinates as three separate 1-D numpy arrays and also reads in the energy deposition at each coordinate as a 1-D numpy array. For the sake of this description, let's assume I want to take a slice at the Z coordinate of zero, but none of the mesh centroids are at the z-coordinate of 0. Then I have to (and do) cycle through all of the data points in the Z-coordinate array until I find one that is greater than zero (array index i) with a preceding array index (i-1) that is less than zero. The code then needs to use those two points in Z-space, along with the slice location (in this case 0) and the energy deposition at those array indices, and interpolate to find the energy deposition at the z-location of the slice. Since the X and Y arrays are unaffected, I now have the X, Y coordinates and can plot a heat map of the energy deposition at the slice location. The code also needs to determine if the slice location is already in the data set, in which case no interpolation is needed. The code I have works, but I could not see how to use the built-in scipy interpolation schemes and instead wrote a function to do the interpolation. In this scenario I had to use a for loop to iterate until I found the positions where the z-position was above and below the slice location (z=0 in this instance). I am attaching my example code in this post and am asking for help to better vectorize this code snippet (if it can be better vectorized) and hopefully learn something in the process.
import numpy as np
# - This transforms the read in data from a list to a numpy array
#   where Magnitude represents the energy deposition
XArray = np.array(XArray); YArray = np.array(YArray)
ZArray = np.array(ZArray); Magnitude = np.array(Magnitude)
#==============================================================
# - This section creates planar data for a 2-D plot
# Interpolation function for determining 2-D slice of 3-D data
def Interpolate(X1,X2,Y1,Y2,X3):
    Slope = (Y2-Y1)/(X2-X1)
    Y3 = (X3-X1)*Slope
    Y3 = Y3 + Y1
    return Y3
# This represents the location on the Z-axis where a slice is taken
Slice_Location = 0.0
XVal = []; YVal = []; ZVal = []
Tally = []; Error = []
counter = 1  # must be an integer, since it is used as an array index
length = len(XArray)-1
for numbers in range(length):
    # - If data falls on the selected plane location then use existing data
    if ZArray[counter] == Slice_Location:
        XVal.append(XArray[counter])
        YVal.append(YArray[counter])
        ZVal.append(ZArray[counter])
        Tally.append(float(Magnitude[counter]))
    # - If existing data does not exist on selected plane then interpolate
    if ZArray[counter-1] < Slice_Location and ZArray[counter] > Slice_Location:
        XVal.append(XArray[counter])
        YVal.append(YArray[counter])
        ZVal.append(Slice_Location)
        Value = Interpolate(ZArray[counter-1],ZArray[counter],Magnitude[counter-1],
                            Magnitude[counter],Slice_Location)
        Tally.append(float(Value))
    counter = counter + 1
XVal = np.array(XVal); YVal = np.array(YVal); ZVal = np.array(ZVal)
Tally = np.array(Tally)
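One possible way to vectorize the loop above is with boolean masks over consecutive array entries. The sketch below assumes, as the loop does, that the flattened arrays are ordered so that Z varies fastest (consecutive entries belong to the same X, Y column); the function name slice_at_z is hypothetical. Unlike the loop, it also checks the very first entry for an exact match.

```python
import numpy as np

def slice_at_z(x, y, z, magnitude, z_slice):
    """Return (x, y, value) on the plane z = z_slice using linear interpolation."""
    x, y, z, magnitude = (np.asarray(a, dtype=float) for a in (x, y, z, magnitude))
    # Entries that already lie on the slice plane
    exact = np.flatnonzero(z == z_slice)
    # Consecutive pairs (i-1, i) that bracket the slice plane
    hi = np.flatnonzero((z[:-1] < z_slice) & (z[1:] > z_slice)) + 1
    lo = hi - 1
    weight = (z_slice - z[lo]) / (z[hi] - z[lo])
    interpolated = magnitude[lo] + weight * (magnitude[hi] - magnitude[lo])
    # Merge both cases and restore the original ordering
    idx = np.concatenate((exact, hi))
    values = np.concatenate((magnitude[exact], interpolated))
    order = np.argsort(idx, kind="stable")
    return x[idx][order], y[idx][order], values[order]

# Made-up test data: 3 x 2 columns with 4 z-levels each, z varying fastest
X, Y, Z = np.meshgrid([0.0, 1.0, 2.0], [0.0, 1.0], [-1.5, -0.5, 0.5, 1.5], indexing="ij")
x, y, z = X.ravel(), Y.ravel(), Z.ravel()
magnitude = 2.0 * z + x + 10.0 * y
sx, sy, sv = slice_at_z(x, y, z, magnitude, 0.0)
print(np.column_stack((sx, sy, sv)))
```

If the data really is a full 50 x 50 x 50 grid, reshaping Magnitude to three dimensions and interpolating along the Z axis would be an alternative, but that depends on the file's ordering.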
