I have a pandas dataframe of lat/lng points created from a gps device.
My question is how to generate a distance column for the distance between each point in the gps track line.
Some googling turned up the haversine method below, which works on single values selected with iloc, but I'm struggling to iterate over the dataframe to supply the method's inputs.
I had thought I could run a for loop, with something along the lines of
for i in len(df):
    df['dist'] = haversine(df['lng'].iloc[i], df['lat'].iloc[i], df['lng'].iloc[i+1], df['lat'].iloc[i+1])
but I get the error TypeError: 'int' object is not iterable. I was also thinking about df.apply, but I'm not sure how to pass it the appropriate inputs. Any help or hints on how to do this would be appreciated.
Sample DF
        lat        lng
0  -7.11873  113.72512
1  -7.11873  113.72500
2  -7.11870  113.72476
3  -7.11870  113.72457
4  -7.11874  113.72444
Method
import math

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a))
    km = 6367 * c
    return km
Are you looking for a result like this?
        lat        lon  dist2next
0  -7.11873  113.72512   0.013232
1  -7.11873  113.72500   0.026464
2  -7.11873  113.72476   0.020951
3  -7.11873  113.72457   0.014335
4  -7.11873  113.72444        NaN
There's probably a clever way to use pandas.rolling_apply... but for a quick solution, I'd do something like this.
import math
import numpy as np

def haversine(loc1, loc2):
    # convert decimal degrees to radians
    lon1, lat1 = map(math.radians, loc1)
    lon2, lat2 = map(math.radians, loc2)
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a))
    km = 6367 * c
    return km

df['dist2next'] = np.nan
# df.ix has been removed from pandas; df.loc does the same label-based lookup here
# (this assumes the default integer index, so that i+1 is the next row's label)
for i in df.index[:-1]:
    loc1 = df.loc[i, ['lon', 'lat']]
    loc2 = df.loc[i+1, ['lon', 'lat']]
    df.loc[i, 'dist2next'] = haversine(loc1, loc2)
Alternatively, if you don't want to modify your haversine function like that, you can just pick off lats and lons one at a time using df.loc[i, 'lon'], df.loc[i, 'lat'], df.loc[i+1, 'lon'], etc.
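For what it's worth, the TypeError in the question comes from `for i in len(df)`: len(df) is a single int, and a for loop needs something iterable such as range(...). Below is a minimal sketch of the original loop idea, assuming the question's four-argument haversine and its lat/lng column names. It collects the values in a list, because assigning df['dist'] inside the loop would overwrite the whole column on every pass.

```python
import math
import pandas as pd

def haversine(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 6367 * 2 * math.asin(math.sqrt(a))

# sample data copied from the question
df = pd.DataFrame({
    "lat": [-7.11873, -7.11873, -7.11870, -7.11870, -7.11874],
    "lng": [113.72512, 113.72500, 113.72476, 113.72457, 113.72444],
})

dists = []
# stop one row early so that i + 1 never runs past the last row
for i in range(len(df) - 1):
    dists.append(haversine(df["lng"].iloc[i], df["lat"].iloc[i],
                           df["lng"].iloc[i + 1], df["lat"].iloc[i + 1]))
dists.append(float("nan"))  # the last point has no following point
df["dist"] = dists
```

The last row is left as NaN, matching the dist2next layout shown above.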
I would recommend a quicker way of looping through a df, such as:
# add the previous row's values as lag_lat / lag_lng columns
df = df.join(df.shift(1).add_prefix("lag_"))
log = []
for rows in df.itertuples():
    log.append(haversine(rows.lng, rows.lat, rows.lag_lng, rows.lag_lat))
pd.DataFrame(log)
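The loops above can probably be avoided altogether by letting NumPy work on whole columns at once. This is only a sketch: haversine_np is a hypothetical helper written for this example (not part of pandas or NumPy), the lat/lng column names come from the question, and a mean Earth radius of 6371 km is assumed.

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2, radius_km=6371.0):
    """Element-wise great-circle distance in km; inputs are array-likes in decimal degrees."""
    lon1, lat1, lon2, lat2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lon1, lat1, lon2, lat2)
    )
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return radius_km * 2 * np.arcsin(np.sqrt(a))

# sample data copied from the question
df = pd.DataFrame({
    "lat": [-7.11873, -7.11873, -7.11870, -7.11870, -7.11874],
    "lng": [113.72512, 113.72500, 113.72476, 113.72457, 113.72444],
})

# shift(-1) lines each row up with the next row, so the last row becomes NaN
df["dist2next"] = haversine_np(df["lng"], df["lat"],
                               df["lng"].shift(-1), df["lat"].shift(-1))
```

Using shift(1) instead would give the distance from the previous point, with the NaN in the first row.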
I have a total of 32 variables in a dataframe:
X1 to X16 - latitude values, and
Y1 to Y16 - longitude values, for 16 different positions.
I want to perform the following steps on these values using Python:
Calculate the distance between each position and every other position, then average those distances.
E.g., calculate the distance between (X1,Y1) & (X2,Y2), (X1,Y1) & (X3,Y3), (X1,Y1) & (X4,Y4), etc., then average them (A1).
Calculate the distance between (X2,Y2) & (X1,Y1), (X2,Y2) & (X3,Y3), etc., then average them (A2), and so on.
Finally, I want to take the mean of A1, A2, ..., A16 and insert it in a new column for the corresponding row.
I want to compare this final column (the mean of the A's) with the dependent variable.
I know there is code like the following for working with latitude and longitude, but I don't know how to use it in my case.
vectorized haversine function
import numpy as np

def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    slightly modified version of http://stackoverflow.com/a/29546836/2901002
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)
    All (lat, lon) coordinates must have numeric dtypes and be of equal length.
    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

# distance from the previous row to the current row (NaN for the first row)
df['dist'] = haversine(df.LAT.shift(), df.LONG.shift(), df.LAT, df.LONG)
The below should help you to find the distance between two coordinates:
# Python 3 program to calculate Distance Between Two Points on Earth
from math import radians, cos, sin, asin, sqrt

def distance(lat1, lat2, lon1, lon2):
    # The math module contains a function named
    # radians which converts from degrees to radians.
    lon1 = radians(lon1)
    lon2 = radians(lon2)
    lat1 = radians(lat1)
    lat2 = radians(lat2)
    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kilometers. Use 3956 for miles
    r = 6371
    # calculate the result
    return c * r

# driver code
lat1 = 53.32055555555556
lat2 = 53.31861111111111
lon1 = -1.7297222222222221
lon2 = -1.6997222222222223
print(distance(lat1, lat2, lon1, lon2), "K.M")
To do the same for all the positions, a 'for' loop should help. The results can then be stored in a new column and the mean calculated from it.
Edited:
I am sure the code below will help you. I created a sample dataset as per your requirement and worked on it. Since you are new to Python, I wrote the whole code for you. Let me know if this is what you need. The sample dataset, code, and output are attached.
Sample input/dataset: [image: sample dataset that I created as per your requirement]
Sample output: [image: sample output]
import pandas as pd
from math import radians, cos, sin, asin, sqrt

df = pd.read_excel(r'sample.xlsx', engine='openpyxl')

# function to calculate the distance
def distance(lat1, lat2, lon1, lon2):
    # The math module contains a function named
    # radians which converts from degrees to radians.
    lon1 = radians(lon1)
    lon2 = radians(lon2)
    lat1 = radians(lat1)
    lat2 = radians(lat2)
    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kilometers. Use 3956 for miles
    r = 6371
    # calculate the result
    return c * r

# driver code
# find the number of rows in df
df_len = df.shape[0]
# outer loop: one new column per reference point (row i)
for i in range(df_len):
    dist_list = []
    # use float, not int, so the fractional part of the coordinates is kept
    lat1 = float(df['x'].iloc[i])
    lon1 = float(df['y'].iloc[i])
    # inner loop: distance from point i to every point j
    for j in range(df_len):
        lat2 = float(df['x'].iloc[j])
        lon2 = float(df['y'].iloc[j])
        # calculate the distance between (x1, y1) and (x2, y2), and so on
        dist_btwn = distance(lat1, lat2, lon1, lon2)
        # append the distance to dist_list
        dist_list.append(dist_btwn)
    col_name = "dist between ({}, {}) and every other points".format(lat1, lon1)
    df[col_name] = dist_list
# now print the dataframe
print(df)
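For the layout the question actually describes (one row holding X1..X16 and Y1..Y16), the per-row "mean of A1..A16" could be sketched with NumPy broadcasting as below. This is an illustration under assumptions: mean_of_average_distances is a hypothetical helper, the example uses three made-up positions instead of 16 to stay short, and the Earth radius is taken as 6371 km.

```python
import numpy as np
import pandas as pd

def mean_of_average_distances(lat_deg, lon_deg, radius_km=6371.0):
    """Mean of A1..An, where Ai is the average distance (km) from point i to the other points."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # n x n matrices of pairwise differences via broadcasting
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    dist = radius_km * 2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
    n = len(lat)
    # the diagonal (distance to itself) is 0, so dividing by n - 1 averages over the others
    per_point_avg = dist.sum(axis=1) / (n - 1)
    return per_point_avg.mean()

n_positions = 3  # would be 16 for the data described in the question
# made-up rows: three points on the equator, then three identical points
df = pd.DataFrame({
    "X1": [0.0, 10.0], "Y1": [0.0, 20.0],
    "X2": [0.0, 10.0], "Y2": [1.0, 20.0],
    "X3": [0.0, 10.0], "Y3": [2.0, 20.0],
})
lat_cols = [f"X{k}" for k in range(1, n_positions + 1)]
lon_cols = [f"Y{k}" for k in range(1, n_positions + 1)]

df["mean_dist"] = [
    mean_of_average_distances(lat, lon)
    for lat, lon in zip(df[lat_cols].to_numpy(), df[lon_cols].to_numpy())
]
```

Because the distance matrix is symmetric, every pair is counted twice, which matches the A1, A2, ... definition in the question where (X1,Y1)-(X2,Y2) and (X2,Y2)-(X1,Y1) both appear.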
I am new to Python. I am trying to calculate the haversine distance on pandas DataFrames. I have 2 dataframes, like this:
[image: first 3 rows of the first dataframe]
Second one: [image: first 3 rows of the second dataframe]
Here is my haversine function.
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371  # Radius of earth in kilometers (3956 would give miles).
    return c * r
I took the longitude and latitude values in the first dataframe as centers and drew circles on the map, with a radius of 1000 m. First, I want to pass every lon/lat pair in the second dataframe to the haversine function together with the lon/lat of the first row of the first dataframe, then repeat this for the other rows of the first dataframe. That way I can find out whether the coordinates in the second dataframe fall inside the circles centered on the coordinates in the first dataframe. It works when I call it like this:
a = haversine(29.023165, 40.992752, 28.844604, 41.113586)
radius = 1.00  # in kilometers
if a <= radius:
    print('Inside the area')
else:
    print('Outside the area')
In the code I wrote, I could not get the order I wanted. I tried passing all the lon and lat values of both dataframes at once, but logically this is wrong (or at least unnecessary work). I tried the code below (based on the question Haversine Distance Calc using Pandas Data Frame "cannot convert the series to <class 'float'>"), but it gives an error: ('LONGITUDE', 'occurred at index 0').
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371  # Radius of earth in kilometers (3956 would give miles).
    return c * r

iskeleler.loc['density'] = iskeleler.apply(lambda row: haversine(iskeleler['lon'], iskeleler['lat'], row['LONGITUDE'], row['LATITUDE']), axis=1)
Can you help me with how I can do this? Thanks in advance.
The code you are using to calculate haversine distance receives one float in each argument, so indeed you need to pass floats for each argument. In this case iskeleler['lon'] and iskeleler['lat'] are Series.
This should work to calculate the distance between coordinates in the same row:
# assign to a column with iskeleler['density']; .loc['density'] would address a row label
iskeleler['density'] = iskeleler.apply(lambda row: haversine(
    row['lon'], row['lat'],
    row['LONGITUDE'], row['LATITUDE']
), axis=1)
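But you are looking for pairwise distances, which would need a nested for loop, and that is not efficient. Try sklearn.metrics.pairwise.haversine_distances. It expects [latitude, longitude] pairs in radians and returns distances in radians, so convert the inputs and multiply the result by the Earth's radius (about 6371 km):
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

distance_matrix = 6371 * haversine_distances(
    np.radians(iskeleler[['lat', 'lon']].to_numpy()),
    np.radians(iskeleler[['LATITUDE', 'LONGITUDE']].to_numpy())
)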
If you prefer the table structure, then:
distance_table = pd.DataFrame(
    distance_matrix,
    index=pd.MultiIndex.from_frames(iskeleler[['lat', 'lon']]),
    columns=pd.MultiIndex.from_frames(iskeleler[['LATITUDE', 'LONGITUDE']]),
).stack([0, 1]).reset_index(name='distance')
This is an example, there are many ways to create the dataframe from the matrix.
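If scikit-learn is not available, the "is this point inside any 1 km circle" check from the question can also be sketched with plain NumPy broadcasting. Everything here is illustrative: the two small frames `centers` and `points` are hypothetical stand-ins for the two dataframes (whose real contents are not shown), the second point is made up to sit close to the center, and haversine_matrix is a helper defined only for this example.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0

def haversine_matrix(lat1, lon1, lat2, lon2):
    """Distances in km between every point in set 1 (rows) and every point in set 2 (columns)."""
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return EARTH_RADIUS_KM * 2 * np.arcsin(np.sqrt(a))

# hypothetical stand-ins for the first (circle centers) and second dataframe
centers = pd.DataFrame({"lon": [29.023165], "lat": [40.992752]})
points = pd.DataFrame({
    "LONGITUDE": [28.844604, 29.023500],
    "LATITUDE": [41.113586, 40.993000],
})

dist_km = haversine_matrix(centers["lat"], centers["lon"],
                           points["LATITUDE"], points["LONGITUDE"])
inside = dist_km <= 1.0                            # radius of 1 km, as in the question
points["inside_any_circle"] = inside.any(axis=0)   # one flag per point
```

Each row of `inside` corresponds to one center and each column to one point, so `inside.sum(axis=1)` would give the number of points per circle.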
I am trying to find the minimum distance between each customer to the store. Currently, there are ~1500 stores and ~670K customers in my data. I have to calculate the geo distance for 670K customers x 1500 stores and find the minimum distance for each customer.
I have created the haversine function below:
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    miles = 6367 * c/1.609
    return miles
and my data set looks like below, 1 data frame for the customer (cst_geo) and 1 data frame for the store (store_geo). The numbers below are made up as I can't share the snippet of the real data:
Customer ID  Latitude  Longitude
A123           39.342    -40.800
B456           38.978    -41.759
C789           36.237    -77.348

Store ID  Latitude  Longitude
S1          59.342    -60.800
S2          28.978    -71.759
S3          56.237    -87.348
I wrote a for loop below to attempt this calculation but it took >8 hours to run. I have tried to use deco but wasn't able to optimize it any further.
mindist = []
for i in cst_geo.index:
    dist = []
    for j in store_geo.index:
        dist.append(haversine_np(cst_geo.longitude[i], cst_geo.latitude[i],
                                 store_geo.longitude[j], store_geo.latitude[j]))
    mindist.append(min(dist))
This can be done with geopy:
from geopy.distance import geodesic

customers = [
    (39.342, -40.800),
    (38.978, -41.759),
    (36.237, -77.348),
]
stores = [
    (59.342, -60.800),
    (28.978, -71.759),
    (56.237, -87.348),
]
# build each row separately; [[None] * n] * m would make every row the same list object
matrix = [[None] * len(customers) for _ in stores]
for index, i in enumerate(customers):
    for j_index, j in enumerate(stores):
        matrix[j_index][index] = geodesic(i, j).meters
The result, matrix, has one row per store and one column per customer, and each entry is the geodesic distance in meters.
You can also get the distance in other units, such as kilometers, miles, or feet.
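For 670K customers against 1500 stores, a pure Python double loop is likely to stay slow whichever distance function is used. One option is to broadcast the haversine formula over a block of customers and all stores at once, then take the row-wise minimum. The sketch below rests on assumptions: nearest_store_miles and chunk_size are names invented here, the lowercase latitude/longitude columns follow the loop in the question, customer_id/store_id are hypothetical column names, and the 6367 km radius with the 1.609 km-per-mile factor is copied from the question's haversine_np.

```python
import numpy as np
import pandas as pd

def nearest_store_miles(cst_geo, store_geo, chunk_size=10_000):
    """For each customer, return (distance in miles to the nearest store, store row position)."""
    s_lat = np.radians(store_geo["latitude"].to_numpy(dtype=float))[None, :]
    s_lon = np.radians(store_geo["longitude"].to_numpy(dtype=float))[None, :]
    c_lat_all = np.radians(cst_geo["latitude"].to_numpy(dtype=float))
    c_lon_all = np.radians(cst_geo["longitude"].to_numpy(dtype=float))

    min_dist = np.empty(len(cst_geo))
    nearest = np.empty(len(cst_geo), dtype=int)
    # process customers in chunks so the (chunk x stores) matrix stays small in memory
    for start in range(0, len(cst_geo), chunk_size):
        stop = start + chunk_size
        c_lat = c_lat_all[start:stop, None]
        c_lon = c_lon_all[start:stop, None]
        a = (np.sin((s_lat - c_lat) / 2.0) ** 2
             + np.cos(c_lat) * np.cos(s_lat) * np.sin((s_lon - c_lon) / 2.0) ** 2)
        d = 6367 * 2 * np.arcsin(np.sqrt(a)) / 1.609
        nearest[start:stop] = d.argmin(axis=1)
        min_dist[start:stop] = d.min(axis=1)
    return min_dist, nearest

# made-up sample data from the question
cst_geo = pd.DataFrame({
    "customer_id": ["A123", "B456", "C789"],
    "latitude": [39.342, 38.978, 36.237],
    "longitude": [-40.800, -41.759, -77.348],
})
store_geo = pd.DataFrame({
    "store_id": ["S1", "S2", "S3"],
    "latitude": [59.342, 28.978, 56.237],
    "longitude": [-60.800, -71.759, -87.348],
})

min_dist, nearest = nearest_store_miles(cst_geo, store_geo)
cst_geo["min_dist_miles"] = min_dist
cst_geo["nearest_store"] = store_geo["store_id"].to_numpy()[nearest]
```

Note that this uses the spherical haversine formula rather than geopy's ellipsoidal geodesic, so the numbers will differ slightly from geodesic distances.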
I want to find the lat/long combination with the minimum distance. x_lat and x_long are constant. I want to form the combinations of y_latitude and y_longitude, calculate the distance for each, find the minimum distance, and return the corresponding y_latitude and y_longitude.
Here is what I am trying:
x_lat = 33.50194395
x_long = -112.048885
y_latitude = ['56.16', '33.211045400000003', '37.36']
y_longitude = ['-117.3700631', '-118.244']
I have a distance function which would return the distance,
from math import radians, cos, sin, asin, sqrt

def distance(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c
    return km
So I tried something like the following:
import itertools

dist = []
for i in itertools.product(y_latitude, y_longitude):
    print(i)
    dist.append(distance(float(i[1]), float(i[0]), float(x_long), float(x_lat)))
print(dist.index(min(dist)))
So this creates all possible combinations of y_latitude and y_longitude and calculates distance and returns the index of minimum distance. I am not able to make it return the corresponding y_latitude and y_longitude.
Here the index of minimum distance is 2 and output is 2. The required output is ('33.211045400000003', '-117.3700631'), which I am not able to make it return.
Can anybody help me in solving the last piece?
Thanks
Try this:
dist = []
for i in itertools.product(y_latitude, y_longitude):
    dist.append([distance(float(i[1]), float(i[0]), float(x_long), float(x_lat)), i])
min_lat, min_lng = min(dist, key=lambda x: x[0])[1]
This appends the lat/long pair along with each distance, then takes the minimum by the first element (the distance).
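A slightly shorter variant, offered as a sketch, skips the intermediate list and lets min() compare the combinations directly through its key function. It reuses the question's distance function and sample values; `best` is just a name chosen for this example.

```python
import itertools
from math import radians, cos, sin, asin, sqrt

def distance(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))

# values copied from the question
x_lat = 33.50194395
x_long = -112.048885
y_latitude = ['56.16', '33.211045400000003', '37.36']
y_longitude = ['-117.3700631', '-118.244']

# min() returns the (lat, lon) pair itself; the key decides which pair is smallest
best = min(
    itertools.product(y_latitude, y_longitude),
    key=lambda pair: distance(float(pair[1]), float(pair[0]), x_long, x_lat),
)
min_lat, min_lng = best
print(best)
```

The pair comes back as the original strings, so convert with float() if numbers are needed afterwards.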
I have the method below (haversine) that returns the distance between two gps points. Table below is my dataframe.
When I apply the function on the dataframe using the line below, I get the error "cannot convert the series to <class 'float'>". I'm not sure whether I am missing something. Any help would be appreciated.
distdf1['distance'] = distdf1.apply(lambda x: haversine(distdf1['SLongitude'], distdf1['SLatitude'], distdf1['ClosestLong'], distdf1['ClosestLat']), axis=1)
Dataframe:
   SLongitude  SLatitude  ClosestLong  ClosestLat
0 -100.248093  25.756313   -98.220240   26.189491
1  -77.441536  38.991512   -77.481600   38.748722
2  -72.376370  40.898690   -73.662870   41.025640
Method:
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c
    return km
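Try using the row that apply passes to the lambda (x) instead of the whole columns:
distdf1['distance'] = distdf1.apply(lambda x: haversine(x['SLongitude'], x['SLatitude'], x['ClosestLong'], x['ClosestLat']), axis=1)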
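If the frame is large, a NumPy version of the same formula can take whole columns at once and avoid the row-by-row apply. The following is a sketch only: haversine_np is a hypothetical helper written for this example, and the data is the three-row table from the question.

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2):
    """Element-wise great-circle distance in km for array-like inputs in decimal degrees."""
    lon1, lat1, lon2, lat2 = (
        np.radians(np.asarray(v, dtype=float)) for v in (lon1, lat1, lon2, lat2)
    )
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 6367 * 2 * np.arcsin(np.sqrt(a))

# table copied from the question
distdf1 = pd.DataFrame({
    "SLongitude": [-100.248093, -77.441536, -72.376370],
    "SLatitude": [25.756313, 38.991512, 40.898690],
    "ClosestLong": [-98.220240, -77.481600, -73.662870],
    "ClosestLat": [26.189491, 38.748722, 41.025640],
})

# whole columns go in, one distance per row comes out
distdf1["distance"] = haversine_np(
    distdf1["SLongitude"], distdf1["SLatitude"],
    distdf1["ClosestLong"], distdf1["ClosestLat"],
)
```

This works because the NumPy functions accept arrays, whereas math.radians and friends accept only single floats, which is what triggered the original error.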