Haversine Function using Pandas Data Frame

I am new to Python. I am trying to calculate haversine distances on a pandas DataFrame. I have two dataframes. The first looks like this: (image: first 3 rows of the first dataframe)
The second one: (image: first 3 rows of the second dataframe)
Here is my haversine function:
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371  # Radius of Earth in kilometers (use 3956 for miles).
    return c * r
I took the longitude and latitude values in the first dataframe as centers and drew circles on the map (with a radius of 1000 m). First, I want to pass all the lon and lat values in the second dataframe to the haversine function together with the lon and lat values from the first row of the first dataframe. Then I'll do the same for the other rows of the first dataframe. That way I can find out whether the coordinates in the second dataframe fall inside the circles centered on the coordinates in the first dataframe. It works when I use it like this:
a = haversine(29.023165, 40.992752, 28.844604, 41.113586)
radius = 1.00  # in kilometers
if a <= radius:
    print('Inside the area')
else:
    print('Outside the area')
I could not get the code I wrote to follow this order. I tried passing all the lon and lat values of both dataframes at once, but logically this is wrong (or an unnecessary operation). I tried the code below (based on the question Haversine Distance Calc using Pandas Data Frame "cannot convert the series to <class 'float'>"), but it gives an error: ('LONGITUDE', 'occurred at index 0').
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    r = 6371  # Radius of Earth in kilometers (use 3956 for miles).
    return c * r

iskeleler.loc['density'] = iskeleler.apply(lambda row: haversine(iskeleler['lon'], iskeleler['lat'], row['LONGITUDE'], row['LATITUDE']), axis=1)
Can you help me with how I can do this? Thanks in advance.

Your haversine function uses the math module, which accepts a single float for each argument, so you need to pass floats. In your call, iskeleler['lon'] and iskeleler['lat'] are Series.
This should work to calculate the distance between coordinates in the same row (note the column assignment iskeleler['density'] rather than iskeleler.loc['density'], which would address a row label):
iskeleler['density'] = iskeleler.apply(
    lambda row: haversine(
        row['lon'], row['lat'],
        row['LONGITUDE'], row['LATITUDE']
    ),
    axis=1
)
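But you are looking for pairwise distances, which would require a for loop, and that is not efficient. Try sklearn.metrics.pairwise.haversine_distances. It expects [latitude, longitude] pairs in radians and returns distances on the unit sphere, so convert the inputs and scale the result by the Earth's radius:
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

distance_matrix = 6371 * haversine_distances(  # 6371 km: result in kilometers
    np.radians(iskeleler[['lat', 'lon']]),
    np.radians(iskeleler[['LATITUDE', 'LONGITUDE']])
)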
If you prefer the table structure, then:
import pandas as pd

distance_table = pd.DataFrame(
    distance_matrix,
    index=pd.MultiIndex.from_frames(iskeleler[['lat', 'lon']]),
    columns=pd.MultiIndex.from_frames(iskeleler[['LATITUDE', 'LONGITUDE']]),
).stack([0, 1]).reset_index(name='distance')
This is just an example; there are many ways to create the dataframe from the matrix.
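For the original goal (checking which points of the second dataframe fall within 1 km of any center in the first), a broadcast NumPy version avoids both the explicit loop and the scikit-learn dependency. The following is a minimal sketch rather than the answerer's code: the frames `centers` and `points`, their values, and the helper `haversine_matrix` are made up for illustration, with column names borrowed from the question.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_matrix(lat1, lon1, lat2, lon2):
    """Distances in km between every (lat1, lon1) and every (lat2, lon2)."""
    # Shape the first set as a column and the second as a row so that
    # broadcasting yields a len(lat1) x len(lat2) matrix.
    lat1 = np.radians(np.asarray(lat1, dtype=float))[:, None]
    lon1 = np.radians(np.asarray(lon1, dtype=float))[:, None]
    lat2 = np.radians(np.asarray(lat2, dtype=float))[None, :]
    lon2 = np.radians(np.asarray(lon2, dtype=float))[None, :]
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

# Hypothetical data: circle centers and points to test.
centers = pd.DataFrame({"lat": [40.992752, 41.0], "lon": [29.023165, 29.0]})
points = pd.DataFrame({"LATITUDE": [41.113586, 40.9930, 41.0005],
                       "LONGITUDE": [28.844604, 29.0235, 29.0005]})

dist_km = haversine_matrix(centers["lat"], centers["lon"],
                           points["LATITUDE"], points["LONGITUDE"])
inside = dist_km <= 1.0  # True where a point lies within 1 km of a center
points["in_any_circle"] = inside.any(axis=0)
print(points)
```

Each row of `dist_km` corresponds to one center and each column to one point, so `inside.any(axis=0)` flags points that fall in at least one circle.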

Related

Pandas dataframe : working with Latitude and longitude features

My dataframe has 32 variables in total:
X1 to X16 - latitude values and
Y1 to Y16 - longitude values for 16 different positions.
I want to perform the following steps on these values using Python:
Calculate the distance between each position (X1, Y1) and every other position. Do this for all positions and then average the distances.
E.g., calculate the distance between (X1, Y1) & (X2, Y2), (X1, Y1) & (X3, Y3), (X1, Y1) & (X4, Y4), etc., then average the distances (A1).
Calculate the distance between (X2, Y2) & (X1, Y1), (X2, Y2) & (X3, Y3), etc., then average the distances (A2), and so on.
Finally, I want to take the mean of A1, A2, ..., A16 and insert it in a column for the corresponding rows.
I want to compare this final column (the mean of the A's) with the dependent variable.
I know there is code like the following for working with latitude and longitude, but I don't know how to use it in my case.
import numpy as np

# vectorized haversine function
def haversine(lat1, lon1, lat2, lon2, to_radians=True, earth_radius=6371):
    """
    slightly modified version: of http://stackoverflow.com/a/29546836/2901002
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees or in radians)
    All (lat, lon) coordinates must have numeric dtypes and be of equal length.
    """
    if to_radians:
        lat1, lon1, lat2, lon2 = np.radians([lat1, lon1, lat2, lon2])
    a = np.sin((lat2-lat1)/2.0)**2 + \
        np.cos(lat1) * np.cos(lat2) * np.sin((lon2-lon1)/2.0)**2
    return earth_radius * 2 * np.arcsin(np.sqrt(a))

df['dist'] = haversine(df.LAT.shift(), df.LONG.shift(), df.loc[1:, 'LAT'], df.loc[1:, 'LONG'])

The code below should help you find the distance between two coordinates:
# Python 3 program to calculate Distance Between Two Points on Earth
from math import radians, cos, sin, asin, sqrt

def distance(lat1, lat2, lon1, lon2):
    # The math module contains a function named
    # radians which converts from degrees to radians.
    lon1 = radians(lon1)
    lon2 = radians(lon2)
    lat1 = radians(lat1)
    lat2 = radians(lat2)
    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kilometers. Use 3956 for miles
    r = 6371
    # calculate the result
    return c * r

# driver code
lat1 = 53.32055555555556
lat2 = 53.31861111111111
lon1 = -1.7297222222222221
lon2 = -1.6997222222222223
print(distance(lat1, lat2, lon1, lon2), "K.M")
To do the same for all the positions, a 'for' loop should help. The distances can then be stored in a new column and the mean calculated.
Edited:
The code below should help. I have created a sample dataset as per your requirement and worked on it. Since you are new to Python, I wrote the whole code for you. Let me know if this meets your requirement. The sample dataset, code, and output are attached.
Sample input/dataset: (image: sample dataset that I created as per your requirement)
Sample output: (image: sample output)
import pandas as pd
from math import radians, cos, sin, asin, sqrt

df = pd.read_excel(r'sample.xlsx', engine='openpyxl')

# function to calculate the distance
def distance(lat1, lat2, lon1, lon2):
    # The math module contains a function named
    # radians which converts from degrees to radians.
    lon1 = radians(lon1)
    lon2 = radians(lon2)
    lat1 = radians(lat1)
    lat2 = radians(lat2)
    # Haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * asin(sqrt(a))
    # Radius of earth in kilometers. Use 3956 for miles
    r = 6371
    # calculate the result
    return c * r

# driver code
# number of rows in df
df_len = df.shape[0]

# 'for' loop that iterates through every row of the dataframe
for i in range(df_len):
    dist_list = []
    # float() keeps the decimal part of each coordinate (int() would truncate it)
    lat1 = float(df['x'].iloc[i])
    lon1 = float(df['y'].iloc[i])
    for j in range(df_len):
        lat2 = float(df['x'].iloc[j])
        lon2 = float(df['y'].iloc[j])
        # calculate the distance between (x1, y1) and (x2, y2), and so on
        dist_btwn = distance(lat1, lat2, lon1, lon2)
        # append the distance to dist_list
        dist_list.append(dist_btwn)
    col_name = "dist between ({}, {}) and every other points".format(lat1, lon1)
    df[col_name] = dist_list

# let's now print the dataframe
print(df)
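For the layout described in the question (one row per observation, with latitude columns X1..X16 and longitude columns Y1..Y16), the mean of the per-position averages A1..A16 equals the mean over all ordered pairs of distinct positions, so it can be computed per row without a Python loop. The sketch below is an illustration under assumptions: the helper `mean_pairwise_km` is hypothetical, and the data uses only three made-up positions to stay short.

```python
import numpy as np
import pandas as pd

def mean_pairwise_km(lats, lons, earth_radius=6371.0):
    """Mean haversine distance over all pairs of distinct positions, per row.

    lats, lons: array-likes of shape (n_rows, n_positions), decimal degrees.
    """
    lat = np.radians(np.asarray(lats, dtype=float))
    lon = np.radians(np.asarray(lons, dtype=float))
    # Compare every position with every other position in the same row.
    dlat = lat[:, :, None] - lat[:, None, :]
    dlon = lon[:, :, None] - lon[:, None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, :, None]) * np.cos(lat[:, None, :]) * np.sin(dlon / 2) ** 2)
    dist = 2 * earth_radius * np.arcsin(np.sqrt(a))
    n = lat.shape[1]
    # The diagonal (a position against itself) is zero, so dividing the
    # total by n * (n - 1) averages over the off-diagonal pairs only.
    return dist.sum(axis=(1, 2)) / (n * (n - 1))

# Hypothetical data with 3 positions per row instead of 16.
df = pd.DataFrame({"X1": [0.0, 10.0], "Y1": [0.0, 20.0],
                   "X2": [0.0, 10.0], "Y2": [1.0, 20.0],
                   "X3": [1.0, 10.0], "Y3": [0.0, 20.0]})
n_positions = 3
lat_cols = [f"X{i}" for i in range(1, n_positions + 1)]
lon_cols = [f"Y{i}" for i in range(1, n_positions + 1)]
df["mean_dist"] = mean_pairwise_km(df[lat_cols], df[lon_cols])
print(df)
```

With the real data, setting `n_positions = 16` should be the only change needed, assuming the columns follow the X1..X16 / Y1..Y16 naming.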

I cannot create new column in my data series in python for an assignment on the haversine formula

I am trying to create a new column to manipulate the data set:
df['longitude'] = df['longitude'].astype(float)
df['latitude'] = df['latitude'].astype(float)
Then I ran the haversine function:
from math import radians, cos, sin, asin, sqrt

def haversine(lon1,lat1,lat2,lon2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c
    return km
But when I run this code:
df['d_centre'] = haversine(lon1,
                           lat1,
                           df.longitude.astype(float),
                           df.latitude.astype(float))
to create a new column in my df, I get this error:
Error: cannot convert the series to <class 'float'>
I tried this as well:
df['d_centre'] = haversine(lon1, lat1, lat2, lon2)
The haversine function itself works, but when I try to create the new column in my df, I get this error. I have tried converting to a list as well, but I get the same result.

I figured out the answer: you have to use numpy for all the math, and then write the code for the new column with the df.
import numpy as np

def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
    c = 2 * np.arcsin(np.sqrt(a))
    km = 6367 * c
    return km
Create a new column:
df2['d_centre'] = haversine_np(df2['lon1'], df2['lat1'], df2['lon2'], df2['lat2'])
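The reason the switch to numpy helps can be shown in isolation: functions in the math module convert their argument to a single float, which a multi-element Series refuses, while NumPy functions operate element-wise. The snippet below is a small demonstration with made-up values, not part of the original answer.

```python
import math
import numpy as np
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0])  # hypothetical longitudes in degrees

try:
    math.radians(s)  # math functions accept a single number only
    math_ok = True
except TypeError as exc:
    math_ok = False
    print("math.radians failed:", exc)

rad = np.radians(s)  # NumPy ufuncs work element-wise on a Series
print(rad)
```

Because `np.radians`, `np.sin`, and the other NumPy functions return a Series of the same length, the result of `haversine_np` can be assigned directly to a new column.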

Issues with data types in pandas functions

I am trying to design a function in pandas that takes the lon/lat of the origin and the lon/lat of the destination, outputs the haversine distance between the two coordinates (in miles), and then appends this distance in a column next to my original values in the dataframe. My original dataset has columns for lat1, lon1, lat2 and lon2, with 5-6 coordinates in each. I used the code below, and it worked at first, until I got the error message TypeError: input must be an array, list, tuple or scalar. It appears that this occurs because the numbers I am feeding into my function are pd.Series objects, and I have tried using np.array to convert them into arrays. The error still comes up, though.
Does anyone have any ideas about how to fix this issue?
Thanks so much,
MG
Here is my code:
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    All args must be of equal length.
    """
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = np.sin(dlat/2.0)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2.0)**2
    c = 2 * np.arcsin(np.sqrt(a))
    r = 3956  # Radius of Earth in miles. Use 6371 for kilometers
    return c * r

# reads from csv created externally (raw string so the backslashes are kept literally)
df2 = pd.read_csv(r'\Users\grossmanmb\Desktop\PythonGPS\locations.csv')
print(df2)
# creates dataframe with haversine formula from origin to destination
'''
# calculates distance between location in lat1/lon1 and lat2/lon2 for every row in dataframe
# this dataframe is setup so that the origin in every case is "home"
# IOW, the distance that is calculated depends on what lat1/lon1 and lat2/lon2 are for each row
'''
df3 = {'distance(m)': haversine_np(df2['longitude1'], df2['latitude1'], df2['longitude2'], df2['latitude2'])}
print(df3)
# appends haversine distance column to original dataframe
df2['distance(m)'] = df3['distance(m)']
print(df2)
Sample data:
df6 = pd.DataFrame({'origin':['home','home','home'], 'destination':['Bethesda','Rockville', 'Washington'], 'latitude1' :[37.99, 37.99, 37.99], 'longitude1' :[-77.17, -77.17, -77.17], 'latitude2' :[38.98, 39.08, 38.89], 'longitude2' :[-77.09, -77.15, -77.02]})
In [56]: df6
Out[56]:
  destination  latitude1  latitude2  longitude1  longitude2 origin
0    Bethesda      37.99      38.98      -77.17      -77.09   home
1   Rockville      37.99      39.08      -77.17      -77.15   home
2  Washington      37.99      38.89      -77.17      -77.02   home
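The thread gives no confirmed cause for the TypeError. One plausible cause, stated here purely as an assumption because the CSV is not shown, is that some coordinate columns were read as strings (object dtype). The sketch below uses the sample data from the question, coerces the coordinate columns to numbers with `pd.to_numeric`, and then applies the vectorized formula; the function name `haversine_miles` and the column name `distance_miles` are made up for illustration.

```python
import numpy as np
import pandas as pd

def haversine_miles(lon1, lat1, lon2, lat2, radius=3956.0):
    """Great-circle distance in miles; inputs in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2 * radius * np.arcsin(np.sqrt(a))

df6 = pd.DataFrame({
    "origin": ["home", "home", "home"],
    "destination": ["Bethesda", "Rockville", "Washington"],
    "latitude1": [37.99, 37.99, 37.99],
    "longitude1": [-77.17, -77.17, -77.17],
    "latitude2": [38.98, 39.08, 38.89],
    "longitude2": [-77.09, -77.15, -77.02],
})

# Assumption: coerce coordinates to floats in case the CSV delivered strings.
coord_cols = ["latitude1", "longitude1", "latitude2", "longitude2"]
df6[coord_cols] = df6[coord_cols].apply(pd.to_numeric, errors="coerce")

df6["distance_miles"] = haversine_miles(
    df6["longitude1"], df6["latitude1"], df6["longitude2"], df6["latitude2"]
)
print(df6[["origin", "destination", "distance_miles"]])
```

Checking `df2.dtypes` on the real data would show whether the columns are numeric in the first place; values that cannot be parsed become NaN with `errors="coerce"`.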

Returning the lat long with minimum distance in python

I want to find the lat/long combination with the minimum distance. x_lat and x_long are constant. I want to take the combinations of y_latitude and y_longitude, calculate the distance for each, find the minimum distance, and return the corresponding y_latitude and y_longitude.
This is what I am trying:
x_lat = 33.50194395
x_long = -112.048885
y_latitude = ['56.16', '33.211045400000003', '37.36']
y_longitude = ['-117.3700631', '-118.244']
I have a distance function which returns the distance:
from math import radians, cos, sin, asin, sqrt

def distance(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    km = 6367 * c
    return km
So I tried something like the following:
import itertools

dist = []
for i in itertools.product(y_latitude, y_longitude):
    print(i)
    dist.append(distance(float(i[1]), float(i[0]), float(x_long), float(x_lat)))
print(dist.index(min(dist)))
This creates all possible combinations of y_latitude and y_longitude, calculates the distance for each, and returns the index of the minimum distance. I am not able to make it return the corresponding y_latitude and y_longitude.
Here the index of the minimum distance is 2, and the output is 2. The required output is ('33.211045400000003', '-117.3700631'), which I cannot get it to return.
Can anybody help me solve the last piece?
Thanks

Try this:
dist = []
for i in itertools.product(y_latitude, y_longitude):
    dist.append([distance(float(i[1]), float(i[0]), float(x_long), float(x_lat)), i])
min_lat, min_lng = min(dist, key=lambda x: x[0])[1]
Append the lat and long along with the distance, then take the minimum by the first element.
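The same idea can be written without the intermediate list by passing the distance as the `key` of `min` directly. This is a self-contained Python 3 sketch that reuses the data and the `distance` function from the question; the variable name `best` is made up.

```python
import itertools
from math import radians, cos, sin, asin, sqrt

def distance(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 6367 * 2 * asin(sqrt(a))

x_lat = 33.50194395
x_long = -112.048885
y_latitude = ['56.16', '33.211045400000003', '37.36']
y_longitude = ['-117.3700631', '-118.244']

# min() returns the (lat, lon) pair itself; key= supplies the value to compare.
best = min(
    itertools.product(y_latitude, y_longitude),
    key=lambda pair: distance(float(pair[1]), float(pair[0]), x_long, x_lat),
)
print(best)  # ('33.211045400000003', '-117.3700631')
```

The pair is returned as the original strings, so it matches the required output from the question exactly.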

Iterate over Pandas index pairs [0,1],[1,2][2,3]

I have a pandas dataframe of lat/lng points created from a GPS device.
My question is how to generate a distance column for the distance between each pair of consecutive points in the GPS track line.
Some googling has given me the haversine method below, which works with single values selected using iloc, but I'm struggling with how to iterate over the dataframe for the method inputs.
I had thought I could run a for loop, with something along the lines of
for i in len(df):
    df['dist'] = haversine(df['lng'].iloc[i], df['lat'].iloc[i], df['lng'].iloc[i+1], df['lat'].iloc[i+1])
but I get the error TypeError: 'int' object is not iterable. I was also thinking about df.apply, but I'm not sure how to get the appropriate inputs. Any help or hints on how to do this would be appreciated.
Sample DF
        lat        lng
0  -7.11873  113.72512
1  -7.11873  113.72500
2  -7.11870  113.72476
3  -7.11870  113.72457
4  -7.11874  113.72444
Method
import math

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(math.radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a))
    km = 6367 * c
    return km

Are you looking for a result like this?
        lat        lon  dist2next
0  -7.11873  113.72512   0.013232
1  -7.11873  113.72500   0.026464
2  -7.11873  113.72476   0.020951
3  -7.11873  113.72457   0.014335
4  -7.11873  113.72444        NaN
There's probably a clever way to use pandas.rolling_apply... but for a quick solution, I'd do something like this.
import math
import numpy as np

def haversine(loc1, loc2):
    # convert decimal degrees to radians
    lon1, lat1 = map(math.radians, loc1)
    lon2, lat2 = map(math.radians, loc2)
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat/2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon/2)**2
    c = 2 * math.asin(math.sqrt(a))
    km = 6367 * c
    return km

df['dist2next'] = np.nan
# df.ix has been removed from pandas; df.loc does the same label-based lookup here
# (this assumes the default integer index, so that i+1 is the next row's label)
for i in df.index[:-1]:
    loc1 = df.loc[i, ['lon', 'lat']]
    loc2 = df.loc[i+1, ['lon', 'lat']]
    df.loc[i, 'dist2next'] = haversine(loc1, loc2)
Alternatively, if you don't want to modify your haversine function like that, you can just pick off lats and lons one at a time using df.loc[i, 'lon'], df.loc[i, 'lat'], df.loc[i+1, 'lon'], etc.

I would recommend a quicker way of looping through a df, such as:
# prefix the shifted columns so they become lag_lat and lag_lng
df_shift = df.shift(1).add_prefix("lag_")
df = df.join(df_shift)
log = []
for rows in df.itertuples():
    log.append(haversine(rows.lng, rows.lat, rows.lag_lng, rows.lag_lat))
pd.DataFrame(log)
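The loop can also be dropped entirely by combining a NumPy-based haversine with `shift(-1)`, which lines each point up with the next one. This is a sketch using the sample data from the question; the function name `haversine_np` and the `radius_km` parameter are choices made here for illustration, with the 6367 km radius taken from the question's code.

```python
import numpy as np
import pandas as pd

def haversine_np(lon1, lat1, lon2, lat2, radius_km=6367.0):
    """Element-wise great-circle distance in km; inputs in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

df = pd.DataFrame({
    "lat": [-7.11873, -7.11873, -7.11870, -7.11870, -7.11874],
    "lng": [113.72512, 113.72500, 113.72476, 113.72457, 113.72444],
})

# shift(-1) moves the next row's coordinates onto the current row;
# the last row has no successor, so its distance is NaN.
df["dist2next"] = haversine_np(
    df["lng"], df["lat"], df["lng"].shift(-1), df["lat"].shift(-1)
)
print(df)
```

Using `shift(1)` instead would give the distance from the previous point, with the NaN in the first row.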
