scikit-learn BallTree method gives me incorrect "nearest neighbor"


I'm using code from the source given below to get the nearest "site".
Source: https://automating-gis-processes.github.io/site/notebooks/L3/nearest-neighbor-faster.html
My Code:
import numpy as np
import pandas as pd
import geopandas
from sklearn.neighbors import BallTree
# Read data from a DB (sql and conn are defined elsewhere; site_df is also loaded elsewhere, not shown)
test_df = pd.read_sql_query(sql, conn)
# Calculates distance between 2 points on a map using lat and long
# (Source: https://towardsdatascience.com/heres-how-to-calculate-distance-between-2-geolocations-in-python-93ecab5bbba4)
def haversine_distance(lat1, lon1, lat2, lon2):
    r = 6371
    phi1 = np.radians(float(lat1))
    phi2 = np.radians(float(lat2))
    delta_phi = np.radians(lat2 - lat1)
    delta_lambda = np.radians(lon2 - lon1)
    a = np.sin(delta_phi / 2)**2 + np.cos(phi1) * np.cos(phi2) * np.sin(delta_lambda / 2)**2
    res = r * (2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a)))
    return np.round(res, 2)
test_df["actualDistance (km)"] = test_df.apply(lambda row: haversine_distance(row['ClientLat'], row['ClientLong'], row['actual_SLa'], row['actual_SLo']), axis=1)
test_gdf = geopandas.GeoDataFrame(test_df, geometry=geopandas.points_from_xy(test_df.ClientLong, test_df.ClientLat))
site_gdf = geopandas.GeoDataFrame(site_df, geometry=geopandas.points_from_xy(site_df.SiteLong, site_df.SiteLat))
#-------Set up the functions as shown in the tutorial-------
def get_nearest(src_points, candidates, k_neighbors=1):
    """Find nearest neighbors for all source points from a set of candidate points"""
    # Create tree from the candidate points
    tree = BallTree(candidates, leaf_size=15, metric='haversine')
    # Find closest points and distances (haversine distances are returned in radians)
    distances, indices = tree.query(src_points, k=k_neighbors)
    # Transpose to get distances and indices into arrays
    distances = distances.transpose()
    indices = indices.transpose()
    # Get closest indices and distances (i.e. array at index 0)
    # note: for the second closest points, you would take index 1, etc.
    closest = indices[0]
    closest_dist = distances[0]
    # Return indices and distances
    return (closest, closest_dist)
def nearest_neighbor(left_gdf, right_gdf, return_dist=False):
    """
    For each point in left_gdf, find closest point in right GeoDataFrame and return them.
    NOTICE: Assumes that the input Points are in WGS84 projection (lat/lon).
    """
    left_geom_col = left_gdf.geometry.name
    right_geom_col = right_gdf.geometry.name
    # Ensure that index in right gdf is formed of sequential numbers
    right = right_gdf.copy().reset_index(drop=True)
    # Parse coordinates from points and insert them into a numpy array as RADIANS
    left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.x * np.pi / 180, geom.y * np.pi / 180)).to_list())
    # Find the nearest points
    # -----------------------
    # closest ==> index in right_gdf that corresponds to the closest point
    # dist ==> distance between the nearest neighbors (in radians; converted to meters below)
    closest, dist = get_nearest(src_points=left_radians, candidates=right_radians)
    # Return points from right GeoDataFrame that are closest to points in left GeoDataFrame
    closest_points = right.loc[closest]
    # Ensure that the index corresponds to the one in left_gdf
    closest_points = closest_points.reset_index(drop=True)
    # Add distance if requested
    if return_dist:
        # Convert to meters from radians
        earth_radius = 6371000  # meters
        closest_points['distance'] = dist * earth_radius
    return closest_points
closest_sites = nearest_neighbor(test_gdf, site_gdf, return_dist=True)
# Rename the geometry of closest sites gdf so that we can easily identify it
closest_sites = closest_sites.rename(columns={'geometry': 'closest_site_geom'})
# Merge the datasets by index (for this, it is good to use '.join()' -function)
test_gdf = test_gdf.join(closest_sites)
#Extracted closest site latitude and longitude for data analysis
test_gdf['CS_lo'] = test_gdf.closest_site_geom.apply(lambda p: p.x)
test_gdf['CS_la'] = test_gdf.closest_site_geom.apply(lambda p: p.y)
The code is a replica of the tutorial I linked, and based on its explanation it should have worked.
To verify the results, I got some summary statistics using .describe(), and they showed that the tutorial's method did indeed give a mean distance much shorter than the distance in the actual data (792 m vs. 1.80 km).
(Screenshots not included: .describe() output for the closest distance generated using the BallTree method, and for the actual distance in the data.)
However, when I plotted them on a map using Plotly, I noticed that the sites returned by the BallTree method weren't actually closer than the "actual" sites.
This is generally what the plotted data looks like (map image not included; blue: predetermined site, red: site predicted using the BallTree method).
Could someone help me track down the discrepancy?

I'm not sure why, but this works. Instead of following the tutorial, I wrote the code based on the docs:
import numpy as np
from sklearn.neighbors import BallTree
from sklearn.metrics import DistanceMetric  # sklearn.neighbors.DistanceMetric in older releases
# Build BallTree with haversine distance metric, which expects (lat, lon) in radians and returns distances in radians
dist = DistanceMetric.get_metric('haversine')
tree = BallTree(np.radians(site_df[['SiteLat', 'SiteLong']]), metric=dist)
test_coords = np.radians(test_df[['ClientLat', 'ClientLong']])
dists, ilocs = tree.query(test_coords)
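For reference, here is a self-contained sketch of the same docs-based approach on a few made-up points. The two DataFrames are hypothetical stand-ins for site_df and test_df, and the nearest_site and distance_km column names are made up for illustration. It passes metric="haversine" as a string (BallTree accepts that in place of a DistanceMetric object) and scales the radian distances by a mean Earth radius of 6371 km, the value the question's haversine_distance uses.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, same value as the question's haversine_distance

# Hypothetical stand-ins for the question's site_df and test_df
site_df = pd.DataFrame({"SiteLat": [40.0, 41.0], "SiteLong": [-74.0, -73.0]})
test_df = pd.DataFrame({"ClientLat": [40.1, 40.9], "ClientLong": [-74.1, -73.1]})

# BallTree's haversine metric expects rows of [lat, lon] in radians
tree = BallTree(np.radians(site_df[["SiteLat", "SiteLong"]].to_numpy()), metric="haversine")
dists, ilocs = tree.query(np.radians(test_df[["ClientLat", "ClientLong"]].to_numpy()), k=1)

# Distances come back as angles (radians); multiply by the radius to get km
test_df["nearest_site"] = ilocs[:, 0]
test_df["distance_km"] = dists[:, 0] * EARTH_RADIUS_KM

# Attach the matched site's coordinates by position (ilocs are row positions in site_df)
matched = site_df.iloc[ilocs[:, 0]].reset_index(drop=True)
test_df["CS_la"] = matched["SiteLat"].to_numpy()
test_df["CS_lo"] = matched["SiteLong"].to_numpy()
print(test_df)
```

Because the tree is built from the rows of site_df in order, the returned indices can be used directly with .iloc, which avoids the index bookkeeping in the tutorial version.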

The problem is that the tutorial code passes coordinates in (longitude, latitude) order, whereas BallTree's haversine metric expects (latitude, longitude). So you're measuring distances between points whose coordinates have been swapped.
If you swap the order of geom.x and geom.y in the coordinate-parsing code, you will get correct measurements:
# Parse coordinates from points and insert them into a numpy array as RADIANS
left_radians = np.array(left_gdf[left_geom_col].apply(lambda geom: (geom.y * np.pi / 180, geom.x * np.pi / 180)).to_list())
right_radians = np.array(right[right_geom_col].apply(lambda geom: (geom.y * np.pi / 180, geom.x * np.pi / 180)).to_list())
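A small made-up example shows the effect of the swap. The client and sites below are hypothetical and the helper nearest() is only for illustration. Site 0 lies 8 degrees of longitude east of the client at latitude 60 (about 444 km away), while site 1 lies 5 degrees south (about 556 km away). Querying with (lat, lon) picks site 0; querying with (lon, lat), as the tutorial code does, effectively moves everything to the equator and picks site 1 instead.

```python
import numpy as np
from sklearn.neighbors import BallTree

EARTH_RADIUS_KM = 6371.0

# Made-up data in degrees, stored as (lat, lon) rows
client = np.array([[60.0, 0.0]])
sites = np.array([[60.0, 8.0],   # site 0: same latitude, 8 degrees further east
                  [55.0, 0.0]])  # site 1: same longitude, 5 degrees further south


def nearest(query_deg, candidates_deg):
    """Return (index, distance in km) of the nearest candidate, treating column 0 as latitude."""
    tree = BallTree(np.radians(candidates_deg), metric="haversine")
    dist, idx = tree.query(np.radians(query_deg), k=1)
    return int(idx[0, 0]), float(dist[0, 0]) * EARTH_RADIUS_KM


correct = nearest(client, sites)                    # (lat, lon), as BallTree expects
swapped = nearest(client[:, ::-1], sites[:, ::-1])  # (lon, lat), as the tutorial code passes them
print(correct, swapped)
```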

Related

Create NxM array from 2 arrays/lists of N and M size

I have longitude and latitude arrays at a fixed resolution of 0.1°. This gives me 1800 lats and 3600 lons. I want to create an 1800 x 3600 matrix that stores the area of each grid cell, based on the formula from the NOAA link in the docstring below, i.e.:
A = 2piR^2 |sin(lat1)-sin(lat2)| |lon1-lon2|/360
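As a quick sanity check of this formula (a sketch, not the asker's code; the function name cell_area is hypothetical): a 0.1° cell on the equator should be almost exactly a square of side R times 0.1° in radians, and the cells of a global grid should add up to the sphere's area 4πR². Note that 2πR²/360 equals (π/180)R², so a prefactor of (π/180)R² already contains the division by 360.

```python
import numpy as np


def cell_area(lat1, lon1, lat2, lon2, radius=6365000.0):
    """A = 2*pi*R^2 * |sin(lat1) - sin(lat2)| * |lon1 - lon2| / 360, with angles in degrees.

    Works on scalars or on broadcastable NumPy arrays.
    """
    return (2 * np.pi * radius**2
            * np.abs(np.sin(np.radians(lat1)) - np.sin(np.radians(lat2)))
            * np.abs(lon1 - lon2) / 360)


R = 6365000.0

# Check 1: a 0.1 x 0.1 degree cell centred on the equator is nearly a flat square
equator_cell = cell_area(0.05, -0.05, -0.05, 0.05, radius=R)
square = (R * np.radians(0.1)) ** 2

# Check 2: a global 1-degree grid (edges, not centres) should sum to 4*pi*R^2
lat_edges = np.arange(-90.0, 91.0, 1.0)    # 181 edges -> 180 latitude bands
lon_edges = np.arange(-180.0, 181.0, 1.0)  # 361 edges -> 360 longitude columns
areas = cell_area(lat_edges[1:, None], lon_edges[None, :-1],
                  lat_edges[:-1, None], lon_edges[None, 1:], radius=R)

print(equator_cell / square, areas.shape, areas.sum() / (4 * np.pi * R**2))
```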
I already have the lons and lats in arrays, representing the centres of the grid cells.
Currently I use a function that calculates the area of a given rectangular box:
def grid_area(lat1, lon1, lat2, lon2, radius=6365000):
    """
    Calculate grid area based on lat-long points of rectangle/square grid size by degrees.
    Calculations are without any projection system.
    radius in meters is used to make it generic. Defaults to Earth
    Formula from: https://www.pmel.noaa.gov/maillists/tmap/ferret_users/fu_2004/msg00023.html
    """
    import numpy as np
    area = (np.pi/180)*(radius**2) * np.abs(np.sin(np.radians(lat1)) - np.sin(np.radians(lat2))) * np.abs(lon1 - lon2)/360
    return area
I call this in a double loop over every lat/lon combination to fill the area grid:
grid_areas = np.zeros((len(lats), len(longs)))
# Loop over every cell (range(len(...) - 1) would leave the last row and column at zero)
for ll in range(len(longs)):
    for lt in range(len(lats)):
        lt1 = np.round(lats[lt] + .05, 2)
        ll1 = np.round(longs[ll] - .05, 2)
        lt2 = np.round(lats[lt] - .05, 2)
        ll2 = np.round(longs[ll] + .05, 2)
        grid_areas[lt, ll] = grid_area(lt1, ll1, lt2, ll2)
As expected, this is slow, and I am not sure which approach would make it efficient.
I searched the forum for ways to create NxM matrices, but couldn't find a solution to this problem.
While writing this question, I came across a Stack Overflow thread suggesting itertools.chain. I will try changing my code accordingly and update with my findings.
In the meantime, any pointers in the right direction would be appreciated.
UPDATE:
I changed my code to use itertools.product:
lat_longs = np.array(list(itertools.product(*[lats.tolist(), longs.tolist()])))
and updated the function to accept centroids:
def grid_area(lat=None, lon=None, grid_size=.1, radius=6365000):
    """
    Calculate grid area based on lat-long points of rectangle/square grid size by degrees.
    Calculations are without any projection system.
    radius in meters is used to make it generic. Defaults to Earth
    Formula from: https://www.pmel.noaa.gov/maillists/tmap/ferret_users/fu_2004/msg00023.html
    """
    import numpy as np
    grid_delta = grid_size/2
    lat1 = lat + grid_delta
    lat2 = lat - grid_delta
    lon1 = lon - grid_delta
    lon2 = lon + grid_delta
    # NOTE: (np.pi/180) already equals 2*pi/360, so the trailing /360 applies the 1/360 factor a second time
    area = (np.pi/180)*(radius**2) * np.abs(np.sin(np.radians(lat1)) - np.sin(np.radians(lat2))) * np.abs(lon1 - lon2)/360
    return area
I then reshape the returned area array using:
areas_mat = areas.reshape((lats.shape[0], longs.shape[0]))
Now the slowest part of the code is the itertools.product call: it takes about 4.5 seconds, while the area calculation takes only about 350 ms.
Is there a faster way to generate those combinations?
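One common way to build the same pairs without a Python-level loop is np.meshgrid with indexing="ij", flattened in C order. The 0.1° centre arrays below are hypothetical (built by index so there are exactly 1800 and 3600 values). With broadcasting, as in the answer below, the explicit list of pairs is not needed at all, but this shows the drop-in replacement for the itertools.product line.

```python
import itertools

import numpy as np

# Hypothetical 0.1-degree cell centres (1800 latitudes, 3600 longitudes)
lats = 89.95 - 0.1 * np.arange(1800)
lons = -179.95 + 0.1 * np.arange(3600)

# indexing="ij" gives two (len(lats), len(lons)) arrays; flattening them row by row
# lists the pairs in the same order as itertools.product(lats, lons)
lat_grid, lon_grid = np.meshgrid(lats, lons, indexing="ij")
lat_longs = np.column_stack([lat_grid.ravel(), lon_grid.ravel()])

# Compare the ordering with itertools.product on a small slice
small = np.array(list(itertools.product(lats[:3], lons[:4])))
lg, og = np.meshgrid(lats[:3], lons[:4], indexing="ij")
assert np.array_equal(small, np.column_stack([lg.ravel(), og.ravel()]))

print(lat_longs.shape)  # (6480000, 2)
```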
Update 2: Final code
Once I tried it, I found that the area was not correct, even though the code seemed to follow the formula in the link, so I used the second source for the final version. The final code is:
def grid_area_vec(lat=None, lon=None, grid_size=.1, radius=6365000):
    """
    Calculate grid area based on lat-long points of rectangle/square grid size by degrees.
    Calculations are without any projection system.
    radius in meters is used to make it generic. Defaults to Earth
    Orig Formula from: https://www.pmel.noaa.gov/maillists/tmap/ferret_users/fu_2004/msg00023.html
    Another source for formula, finally used
    https://gis.stackexchange.com/questions/413349/calculating-area-of-lat-lon-polygons-without-transformation-using-geopandas
    """
    import numpy as np
    grid_delta = 0.5 * grid_size
    # dlon: (3600,)
    dlon = np.full(lon.shape, np.deg2rad(grid_size))
    # dlat: (1800, 1)
    dlat = np.abs(np.sin(np.deg2rad(lat + grid_delta)) -
                  np.sin(np.deg2rad(lat - grid_delta)))[:, None]
    # area: (1800, 3600)
    # area = np.deg2rad(radius**2 * dlat * dlon)
    area = radius**2 * (dlat * dlon)
    return area

You can trivially vectorize this operation across all your arrays. Given an array lats with shape (1800,) and an array lons with shape (3600,), you can add a trailing axis to the latitude term so that broadcasting it against the longitude term yields an array of the correct shape, (1800, 3600).
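import numpy as np
grid_size = 0.1  # cell size in degrees
radius = 6365000  # sphere radius in metres, as in the question
grid_delta = 0.5 * grid_size
# dlon: (3600,), cell width in radians
dlon = np.full(lons.shape, np.deg2rad(grid_size))
# dlat: (1800, 1)
dlat = np.abs(np.sin(np.deg2rad(lats + grid_delta)) -
              np.sin(np.deg2rad(lats - grid_delta)))[:, None]
# area: (1800, 3600), in square metres
area = radius**2 * dlat * dlon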
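Wrapped as a function and run on a full 0.1° grid, the broadcast version can be checked against the sphere's total area: the cells tile the whole sphere, so they should sum to 4πR². This is a sketch; the function name and the centre arrays are hypothetical.

```python
import numpy as np


def grid_area_vec(lats, lons, grid_size=0.1, radius=6365000.0):
    """Cell areas (m^2) for a regular lat/lon grid, given cell-centre arrays in degrees.

    Returns an array of shape (len(lats), len(lons)).
    """
    half = 0.5 * grid_size
    # (len(lats), 1): |sin(top edge) - sin(bottom edge)| for each latitude band
    dsin = np.abs(np.sin(np.deg2rad(lats + half)) - np.sin(np.deg2rad(lats - half)))[:, None]
    # (len(lons),): cell width in radians, identical for every column
    dlon = np.full(lons.shape, np.deg2rad(grid_size))
    return radius**2 * dsin * dlon


lats = 89.95 - 0.1 * np.arange(1800)
lons = -179.95 + 0.1 * np.arange(3600)
areas = grid_area_vec(lats, lons)
print(areas.shape, areas.sum() / (4 * np.pi * 6365000.0**2))
```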

pint: convert geographic CRS degrees to nautical miles

I would like to use pint to convert degrees (distance in a geographic CRS) into nautical miles.
GeoDataFrame.sjoin_nearest (https://geopandas.org/docs/reference/api/geopandas.GeoDataFrame.sjoin_nearest.html) outputs distances in degrees for EPSG:4326.
Given that the distance (in nm) covered by one degree varies from the equator to the poles, I'm not sure this is possible.
I could use a rule of thumb of 1 deg ~= 111 km ~= 60 nm.
Perhaps it can be calculated using the starting point and distance using something like: https://github.com/anitagraser/movingpandas/blob/master/movingpandas/geometry_utils.py#L38
This code is also useful: https://geopy.readthedocs.io/en/stable/#module-geopy.distance
Here's some code to test:
import pandas as pd
import geopandas as gpd
import pint
df = pd.DataFrame({"lon": [0], "lat": [0]})
gdf_pt = gpd.GeoDataFrame(df, geometry=gpd.points_from_xy(df["lon"], df["lat"]), crs="epsg:4326")
df2 = pd.DataFrame({"lon": [1, 2], "lat": [0, 0]})
gdf_pts = gpd.GeoDataFrame(df2, geometry=gpd.points_from_xy(df2["lon"], df2["lat"]), crs="epsg:4326")
value = gdf_pt.sjoin_nearest(gdf_pts, distance_col="distances")["distances"].values[0]
ureg = pint.UnitRegistry()
l = value * ureg.arcdegree

It's probably best to reproject to Mercator and use that, if you can:
import pint_pandas  # registers the "pint[...]" pandas dtype
gdf = gdf_pt.to_crs("EPSG:3395").sjoin_nearest(gdf_pts.to_crs("EPSG:3395"), distance_col="distances")
gdf["distances"] = gdf["distances"].astype("pint[meter]").pint.to("nautical_mile")

This function, which I've pulled from existing code, computes the distance in meters between two lat/long points. "rlat" and "rlong" are expressed in radians, so you'll have to convert from degrees first. To get nautical miles instead of meters, just set R to 3440 (the Earth's radius in nm).
from math import atan2, cos, sin, sqrt
# Radius of the earth, in meters.
R = 6371000
# Return distance between two lat/longs.
def distance(pt1, pt2):
    rlat1 = pt1.rlat
    rlat2 = pt2.rlat
    dlat = pt2.rlat - pt1.rlat
    dlong = pt2.rlong - pt1.rlong
    a = sin(dlat/2) * sin(dlat/2) + cos(rlat1) * cos(rlat2) * sin(dlong/2) * sin(dlong/2)
    c = 2 * atan2(sqrt(a), sqrt(1-a))
    return R * c
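A usage sketch of that haversine function with the radius expressed in nautical miles. The LatLong namedtuple and the from_degrees helper are hypothetical, just to provide the rlat/rlong attributes the function reads; 3440.065 nm is the 6371 km mean radius divided by 1852 m.

```python
import math
from collections import namedtuple

# Hypothetical point type exposing rlat/rlong in radians, as the function above expects
LatLong = namedtuple("LatLong", ["rlat", "rlong"])

EARTH_RADIUS_NM = 3440.065  # 6371 km / 1.852 km per nautical mile


def distance_nm(pt1, pt2, radius=EARTH_RADIUS_NM):
    """Haversine distance between two points whose coordinates are in radians."""
    dlat = pt2.rlat - pt1.rlat
    dlong = pt2.rlong - pt1.rlong
    a = math.sin(dlat / 2) ** 2 + math.cos(pt1.rlat) * math.cos(pt2.rlat) * math.sin(dlong / 2) ** 2
    return radius * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))


def from_degrees(lat_deg, lon_deg):
    return LatLong(math.radians(lat_deg), math.radians(lon_deg))


print(distance_nm(from_degrees(0, 0), from_degrees(0, 1)))  # about 60.04 nm
```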

Converting GPS to Cartesian coordinates

I am attempting to convert a GPS location into a Cartesian x, y, z location relative to a second GPS location in Python. I am using standard trigonometry based on information I found on other forums, and I expected x, y, z to come out in meters.
#Get cartesian coordinates relative to a center
import math
centerLat = 0.7127  # radians (about 40.83 degrees)
centerLon = -1.2906  # radians (about -73.95 degrees)
centerAlt = -32.406
pointLat = math.radians(float(input("Enter latitude in degrees for the new point: ")))
pointLon = math.radians(float(input("Enter longitude in degrees for the new point: ")))
pointAlt = float(input("Enter altitude in meters for the new point: "))
r = centerAlt + 6378137
xCenter = r * math.cos(centerLat) * math.cos(centerLon)
yCenter = r * math.cos(centerLat) * math.sin(centerLon)
zCenter = r * math.sin(centerLat)
r = pointAlt + 6378137
xPoint = r * math.cos(pointLat) * math.cos(pointLon)
yPoint = r * math.cos(pointLat) * math.sin(pointLon)
zPoint = r * math.sin(pointLat)
x = xPoint - xCenter
y = yPoint - yCenter
z = zPoint - zCenter
For some reason, after conversion these two points are farther apart than they should be. Any advice on what I'm doing wrong would be great.
Edit:
Here is the point I am trying to convert.
Lat: 40.767870
Lon: -73.885160
Alt: 48.463201
I am using a landscape in Unreal Engine for reference, which I imported from GIS data using this tutorial. Since posting, I realized that the landscapes were not centered where I thought they were. I have adjusted them and the result is much closer now, but still not lining up. I wonder if it might be an issue of scale or rotation.
All formulas and constants were taken from the link below:
https://stackoverflow.com/questions/8981943/lat-long-to-x-y-z-position-in-js-not-working
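The code above models the Earth as a sphere with the WGS84 equatorial radius and returns the offset along the global Earth-centred (ECEF) axes, which are not aligned with east, north and up at the centre point; that misalignment could show up as the rotation the edit mentions. A common remedy, sketched below as a standard technique rather than a tested fix for this landscape, is to convert both points to ECEF on the WGS84 ellipsoid and rotate the difference into a local East-North-Up (ENU) frame. The function names are hypothetical; mapping ENU onto the engine's axes (and into centimetres, Unreal's default unit) is left as an assumption to check.

```python
import math

# WGS84 ellipsoid parameters
WGS84_A = 6378137.0                 # semi-major (equatorial) axis, metres
WGS84_F = 1 / 298.257223563         # flattening
WGS84_E2 = WGS84_F * (2 - WGS84_F)  # first eccentricity squared


def geodetic_to_ecef(lat_deg, lon_deg, h_m):
    """Geodetic latitude/longitude (degrees) and ellipsoidal height (m) -> ECEF x, y, z (m)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    # Prime vertical radius of curvature at this latitude
    n = WGS84_A / math.sqrt(1 - WGS84_E2 * math.sin(lat) ** 2)
    x = (n + h_m) * math.cos(lat) * math.cos(lon)
    y = (n + h_m) * math.cos(lat) * math.sin(lon)
    z = (n * (1 - WGS84_E2) + h_m) * math.sin(lat)
    return x, y, z


def geodetic_to_enu(lat_deg, lon_deg, h_m, lat0_deg, lon0_deg, h0_m):
    """Offset (m) of a point in a local East-North-Up frame centred on (lat0, lon0, h0)."""
    x, y, z = geodetic_to_ecef(lat_deg, lon_deg, h_m)
    x0, y0, z0 = geodetic_to_ecef(lat0_deg, lon0_deg, h0_m)
    dx, dy, dz = x - x0, y - y0, z - z0
    lat0, lon0 = math.radians(lat0_deg), math.radians(lon0_deg)
    # Rotate the ECEF offset so the axes point east, north and up at the centre
    east = -math.sin(lon0) * dx + math.cos(lon0) * dy
    north = (-math.sin(lat0) * math.cos(lon0) * dx
             - math.sin(lat0) * math.sin(lon0) * dy
             + math.cos(lat0) * dz)
    up = (math.cos(lat0) * math.cos(lon0) * dx
          + math.cos(lat0) * math.sin(lon0) * dy
          + math.sin(lat0) * dz)
    return east, north, up


# Centre from the question (given there in radians) and the point from the question's edit
center = (math.degrees(0.7127), math.degrees(-1.2906), -32.406)
print(geodetic_to_enu(40.767870, -73.885160, 48.463201, *center))
```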

Calculating geographic distance between a list of coordinates (lat, lng)

I'm writing a Flask application that uses data extracted from a GPS sensor. I can draw the route on a map, and I want to calculate the distance the GPS sensor traveled. One option would be to use just the start and end coordinates, but because of the way the sensor travels this is quite inaccurate. Instead, I take every 50th sensor sample: if the sensor recorded 1000 samples, I now have 20.
Now I want to pass my list of samples through a function that calculates the distance. So far I've been using the geopy package, but with large GPS sample sets I get "too many requests" errors, not to mention the extra processing time for the requests, which is not what I want.
Is there a better approach to calculating the cumulative distance of a list of latitude and longitude coordinates?
positions = [(lat_1, lng_1), (lat_2, lng_2), ..., (lat_n, lng_n)]
I found many ways of calculating the distance between two coordinates (lat1, lng1 and lat2, lng2), but none that support a list of coordinates.
Here's my current code using geopy:
from geopy.distance import geodesic  # vincenty was removed in geopy 2.0; geodesic replaces it
def calculate_distances(trips):
    for trip in trips:
        positions = trip['positions']
        distance = 0
        for i in range(1, len(positions)):
            # Add the length of each leg, in km
            distance += geodesic(positions[i-1], positions[i]).meters / 1000
        # Store the total once the whole trip has been summed
        trip.update({'distance': distance})
trips is a list of dictionaries, each holding key-value information about a trip (duration, distance, start and stop coordinates and so forth); the positions entry inside each trip is a list of coordinate tuples as shown above.
trips = [{data_1}, {data_2}, ..., {data_n}]

Here's the solution I ended up using. It's called the Haversine (distance) function if you want to look up what it does for yourself.
I changed my approach a little as well. My input (positions) is a list of tuple coordinates:
import math
def calculate_distance(positions):
    results = []
    for i in range(1, len(positions)):
        loc1 = positions[i - 1]
        loc2 = positions[i]
        lat1 = loc1[0]
        lng1 = loc1[1]
        lat2 = loc2[0]
        lng2 = loc2[1]
        degreesToRadians = (math.pi / 180)
        latrad1 = lat1 * degreesToRadians
        latrad2 = lat2 * degreesToRadians
        dlat = (lat2 - lat1) * degreesToRadians
        dlng = (lng2 - lng1) * degreesToRadians
        a = math.sin(dlat / 2) * math.sin(dlat / 2) + math.cos(latrad1) * \
            math.cos(latrad2) * math.sin(dlng / 2) * math.sin(dlng / 2)
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
        r = 6371000
        results.append(r * c)
    return (sum(results) / 1000)  # Converting from m to km
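The same haversine sum can be vectorised with NumPy so that all legs of a path are computed at once. This is a sketch (the function name path_length_km is hypothetical) using the same 6371 km radius; it makes no network requests, and geopy's distance functions are computed locally as well.

```python
import numpy as np


def path_length_km(positions, radius_km=6371.0):
    """Total haversine length of a path given as [(lat, lng), ...] in degrees."""
    pts = np.radians(np.asarray(positions, dtype=float))
    if len(pts) < 2:
        return 0.0
    lat1, lng1 = pts[:-1, 0], pts[:-1, 1]  # start of each leg
    lat2, lng2 = pts[1:, 0], pts[1:, 1]    # end of each leg
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lng2 - lng1) / 2) ** 2)
    # Clip guards against tiny floating-point overshoots above 1
    legs = 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    return float(legs.sum())


print(path_length_km([(0, 0), (0, 1), (0, 2)]))  # about 222.39 km
```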

I'd recommend transforming your (x, y) coordinates into complex numbers, as that makes the distances computationally much easier to calculate. The following function should work:
def calculate_distances(trips):
    for trip in trips:
        positions = trip['positions']
        c_pos = [complex(c[0], c[1]) for c in positions]
        distance = 0
        for i in range(1, len(c_pos)):
            distance += abs(c_pos[i] - c_pos[i-1])
        trip.update({'distance': distance})
What I'm doing is converting every (lat_1, lng_1) tuple into a single complex number c1 = lat_1 + j*lng_1, and building a list [c1, c2, ..., cn].
A complex number is, all in all, a 2-dimensional number, so you can do this whenever you have 2D coordinates, which is perfect for geolocalization, but it wouldn't be possible for 3D space coordinates, for instance.
Once you have this, you can easily compute the distance between two complex numbers c1 and c2 as dist12 = abs(c2 - c1). Doing this for each consecutive pair and summing gives the total distance, expressed in the same units as the coordinates (degrees, for raw lat/lng values).
Hope this helped!
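A tiny check of the complex-number idea on made-up planar points: abs() of the difference of two complex numbers is the ordinary Euclidean distance, so it matches math.dist (Python 3.8+). Since the result is in the input units, raw lat/lng gives planar degrees rather than metres; the haversine answer above covers metres.

```python
import math

positions = [(0.0, 0.0), (3.0, 4.0), (3.0, 10.0)]  # hypothetical planar (x, y) points

# Complex-number version from the answer
c_pos = [complex(x, y) for x, y in positions]
complex_total = sum(abs(b - a) for a, b in zip(c_pos, c_pos[1:]))

# Same total with math.dist
dist_total = sum(math.dist(a, b) for a, b in zip(positions, positions[1:]))
print(complex_total, dist_total)  # 11.0 11.0
```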

Calculating distance between two points using latitude longitude and altitude (elevation)

I'm trying to calculate the distance between two points using latitude, longitude and altitude (elevation).
I was using the Euclidean formula to get my distance:
D=√((Long1-Long2)²+(Lat1-Lat2)²+(Alt1-Alt2)²)
My points are geographical coordinates, and of course altitude is my height above sea level.
I only have lat and lng; I'm using the Google Elevation API to get my altitude.
I'm developing an application that calculates the distance I travel (on my skis). Every application I have used, such as Endomondo or Garmin, includes altitude in the distance traveled. I cannot compute my distance in 2D space, because the true distances will differ from the ones I return.
Which formula would be best to calculate my distance, with altitude included of course?
I'm writing my app in Python, with PostGIS.

You can calculate the horizontal distance between the coordinates in, say, meters by using the geopy package or Vincenty's formula, passing the coordinates directly. Suppose the result is d meters. Then the total distance traveled is sqrt(d**2 + h**2), where h is the change in elevation in meters.
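A dependency-free sketch of this approach, using the haversine formula for d on a spherical Earth (the 6371 km mean radius is an assumption) and math.hypot for sqrt(d**2 + h**2). The function names and coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius; a spherical-Earth assumption


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def distance_3d_m(p1, p2):
    """p = (lat_deg, lon_deg, alt_m); combines horizontal distance d with elevation change h."""
    d = haversine_m(p1[0], p1[1], p2[0], p2[1])
    h = p2[2] - p1[2]
    return math.hypot(d, h)  # sqrt(d**2 + h**2)


# Hypothetical ski segment: about 111 m north and 50 m lower
print(distance_3d_m((46.0, 7.0, 2000.0), (46.001, 7.0, 1950.0)))
```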

EDIT 2019: Since this answer, I composed a Q&A style example to answer similar questions (including this one as an example): How to calculate 3D distance (including altitude) between two points in GeoDjango.
In short:
We need to calculate the 2D great-circle distance between the 2 points using either the Haversine formula or the Vincenty formula, and then combine it with the difference (delta) in altitude between the 2 points to calculate the Euclidean distance between them as follows:
dist = sqrt(great_circle((lat_1, lon_1), (lat_2, lon_2)).m**2 + (alt_1 - alt_2)**2)
The solution assumes that the altitude is in meters and thus converts the great_circle result into meters as well.
You can get the correct calculation by translating your coordinates from Polar (long, lat, alt) to Cartesian (x, y, z):
Let:
polar_point_1 = (long_1, lat_1, alt_1)
and polar_point_2 = (long_2, lat_2, alt_2)
Translate each point to its Cartesian equivalent using this formula, where alt is the distance from the Earth's centre (the Earth's radius plus the elevation):
x = alt * cos(lat) * sin(long)
y = alt * sin(lat)
z = alt * cos(lat) * cos(long)
and you will have p_1 = (x_1, y_1, z_1) and p_2 = (x_2, y_2, z_2) points respectively.
Finally use the Euclidean formula:
dist = sqrt((x_2-x_1)**2 + (y_2-y_1)**2 + (z_2-z_1)**2)
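A sketch of the Cartesian approach with the radius measured from the Earth's centre (R + altitude), compared against the great-circle-plus-height approach from the answers above. It assumes a spherical Earth with a 6371 km radius and uses the dummy points from the C# answer below; for points this close the two results should agree to within centimetres.

```python
import math

R = 6371000.0  # mean Earth radius in metres (spherical-Earth assumption)


def to_cartesian(lat_deg, lon_deg, alt_m):
    """Spherical -> Cartesian, with the radius measured from the Earth's centre (R + altitude)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    r = R + alt_m
    return (r * math.cos(lat) * math.cos(lon),
            r * math.cos(lat) * math.sin(lon),
            r * math.sin(lat))


def cartesian_distance(p1, p2):
    return math.dist(to_cartesian(*p1), to_cartesian(*p2))


def arc_plus_height(p1, p2):
    """Great-circle distance on the sphere combined with the altitude change."""
    phi1, phi2 = math.radians(p1[0]), math.radians(p2[0])
    dlmb = math.radians(p2[1] - p1[1])
    a = math.sin((phi2 - phi1) / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    d = 2 * R * math.asin(math.sqrt(a))
    return math.hypot(d, p2[2] - p1[2])


# Dummy points from the C# answer (lat, lon in degrees; altitude in metres)
p1 = (18.457793, 73.3951930277778, 270.146)
p2 = (18.4581253333333, 73.3963755277778, 317.473)
print(cartesian_distance(p1, p2), arc_plus_height(p1, p2))
```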

I used the solution provided by John Moutafis but didn't get the right answer. The formula needs some corrections. You will find the conversion of coordinates from polar to Cartesian (x, y, z) at http://electron9.phys.utk.edu/vectors/3dcoordinates.htm.
Use the formulas on that page to convert spherical (polar) coordinates to Cartesian and calculate the Euclidean distance.
I used the following C# in a console app.
Consider the following dummy lat/long values:
double lat_1 = 18.457793 * (Math.PI / 180);
double lon_1 = 73.3951930277778 * (Math.PI / 180);
double alt_1 = 270.146;
double lat_2 = 18.4581253333333 * (Math.PI / 180);
double lon_2 = 73.3963755277778 * (Math.PI / 180);
double alt_2 = 317.473;
const double R = 6376.5 * 1000; // Radius of Earth in metres
// Distance from the Earth's centre = radius + altitude
double r_1 = R + alt_1;
double r_2 = R + alt_2;
// Spherical -> Cartesian with latitude measured from the equator (the page uses
// the polar angle 90° - lat, which swaps sin and cos) and longitude as the azimuth
double x_1 = r_1 * Math.Cos(lat_1) * Math.Cos(lon_1);
double y_1 = r_1 * Math.Cos(lat_1) * Math.Sin(lon_1);
double z_1 = r_1 * Math.Sin(lat_1);
double x_2 = r_2 * Math.Cos(lat_2) * Math.Cos(lon_2);
double y_2 = r_2 * Math.Cos(lat_2) * Math.Sin(lon_2);
double z_2 = r_2 * Math.Sin(lat_2);
double dist = Math.Sqrt((x_2 - x_1) * (x_2 - x_1) + (y_2 - y_1) * (y_2 - y_1) + (z_2 - z_1) * (z_2 - z_1));

Develop Reference
