Pandas Google Distance Matrix API - Pass coordinates into URL - python

I am working with the Google Distance Matrix API, where I want to feed coordinates from a dataframe into the API and return the duration and distance between the two points.
Here is my dataframe:
import pandas as pd
import simplejson
import urllib
import numpy as np
Record  orig_lat    orig_lng     dest_lat    dest_lng
1       40.7484405  -74.0073127  40.7115242  -74.0145492
2       40.7421218  -73.9878531  40.7727216  -73.9863531
First, I need to combine orig_lat & orig_lng and dest_lat & dest_lng into strings, which are then passed into the URL. So I've tried creating the variables orig_coord & dest_coord, then passing them into the URL and returning values:
orig_coord = df[['orig_lat','orig_lng']].apply(lambda x: '{},{}'.format(x[0],x[1]), axis=1)
dest_coord = df[['dest_lat','dest_lng']].apply(lambda x: '{},{}'.format(x[0],x[1]), axis=1)
for row in df.itertuples():
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    result = simplejson.load(urllib.urlopen(url))
    df['driving_time_text'] = result['rows'][0]['elements'][0]['duration']['text']
But I get the following error: "TypeError: <lambda>() got an unexpected keyword argument 'axis'"
So my question is: how do I concatenate values from two columns into a string, then pass that string into a URL and output the result?
Thank you in advance!

Hmm, I am not sure how you constructed your data frame. Maybe post those details? But if you can live with referencing tuple elements positionally, this worked for me:
import pandas as pd

data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549},
        {'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353}]
df = pd.DataFrame(data)

for row in df.itertuples():
    orig_coord = '{},{}'.format(row[1], row[2])
    dest_coord = '{},{}'.format(row[3], row[4])
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    print url
produces
http://maps.googleapis.com/maps/api/distancematrix/json?origins=40.748441,-74.007313&destinations=40.711524,-74.014549&units=imperial&MYGOOGLEAPIKEY
http://maps.googleapis.com/maps/api/distancematrix/json?origins=40.742122,-73.987853&destinations=40.772722,-73.986353&units=imperial&MYGOOGLEAPIKEY
To update the data frame with the result, since row is a tuple and not writeable, you might want to keep track of the current index as you iterate. Maybe something like this:
data = [{'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549, 'result': -1},
        {'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353, 'result': -1}]
df = pd.DataFrame(data)

i_row = 0
for row in df.itertuples():
    orig_coord = '{},{}'.format(row[1], row[2])
    dest_coord = '{},{}'.format(row[3], row[4])
    url = "http://maps.googleapis.com/maps/api/distancematrix/json?origins={0}&destinations={1}&units=imperial&MYGOOGLEAPIKEY".format(orig_coord, dest_coord)
    # Do stuff to get your result
    df.loc[i_row, 'result'] = result
    i_row += 1
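For reference, here is a minimal Python 3 sketch of the whole round trip that writes the results back into the frame. The key= parameter name and the YOUR_API_KEY placeholder are assumptions about your credentials; the response path rows[0].elements[0] follows the question's own parsing:
import json
import urllib.request
import pandas as pd

df = pd.DataFrame([
    {'orig_lat': 40.748441, 'orig_lng': -74.007313, 'dest_lat': 40.711524, 'dest_lng': -74.014549},
    {'orig_lat': 40.742122, 'orig_lng': -73.987853, 'dest_lat': 40.772722, 'dest_lng': -73.986353},
])

for row in df.itertuples():
    orig = '{},{}'.format(row.orig_lat, row.orig_lng)
    dest = '{},{}'.format(row.dest_lat, row.dest_lng)
    url = ('https://maps.googleapis.com/maps/api/distancematrix/json'
           '?origins={}&destinations={}&units=imperial&key=YOUR_API_KEY').format(orig, dest)
    result = json.load(urllib.request.urlopen(url))
    element = result['rows'][0]['elements'][0]
    # row.Index is the frame's index label, so .loc writes back to the right row
    df.loc[row.Index, 'driving_time_text'] = element['duration']['text']
    df.loc[row.Index, 'driving_distance_text'] = element['distance']['text']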

Related

Python columns function

I would like a function which will create columns through the modalities of the columns of a database.
These created columns will calculate the percent of the modalities for each observation
First you have to load the data as JSON or an array.
import pandas as pd

hold_data = pd.read_csv(r'C:\Users\abdulalim\Desktop\Test\Products.csv')
hold_data.to_json(r'C:\Users\abdulalim\Desktop\Test\New_Products.json')
output: {"Product":{"0":"Desktop Computer","1":"Tablet","2":"Printer","3":"Laptop"},"Price":{"0":700,"1":250,"2":120,"3":1200}}
If the data arrives as an array:
def get_data(request):
    hold_data = request.data
    for key in hold_data:
        print(key[0])
        print(key[1])
If it arrives as JSON:
def get_data(request):
    hold_data = request.data
    hold_data['data1']
    hold_data['data2']
Once you have the data, do whatever you want with it.
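As for the original question about modality percentages, here is a minimal sketch of one way to read it, using pd.get_dummies on a hypothetical 'color' column (the column name and values are made up for illustration):
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green']})

# One indicator column per modality of 'color'
dummies = pd.get_dummies(df['color'], prefix='color')
df = df.join(dummies)

# Percent share of each modality across all observations
print(dummies.mean() * 100)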

Joining time series by common date in Python (dataframe & series/list question)

Noob here. PLEASE FORGIVE ABYSMAL FORMATTING as I am still learning. I am trying to create a time series (a dataframe, I think?) that consists of three columns. One is a date column, the next is an inventory column, and the last is a price column.
I have pulled two separate series (date & inventory; date & price) and I want to meld the two series so that I can see three columns instead of two sets of two. This is my code.
import json
import numpy as np
import pandas as pd
from urllib.error import URLError, HTTPError
from urllib.request import urlopen

class EIAgov(object):
    def __init__(self, token, series):
        '''
        Purpose:
        Initialise the EIAgov class by requesting:
        - EIA token
        - id code(s) of the series to be downloaded
        Parameters:
        - token: string
        - series: string or list of strings
        '''
        self.token = token
        self.series = series

    def __repr__(self):
        return str(self.series)

    def Raw(self, ser):
        # Construct url
        url = 'http://api.eia.gov/series/?api_key=' + self.token + '&series_id=' + ser.upper()
        try:
            # URL request, URL opener, read content
            response = urlopen(url)
            raw_byte = response.read()
            raw_string = str(raw_byte, 'utf-8-sig')
            jso = json.loads(raw_string)
            return jso
        except HTTPError as e:
            print('HTTP error type.')
            print('Error code: ', e.code)
        except URLError as e:
            print('URL type error.')
            print('Reason: ', e.reason)

    def GetData(self):
        # Deal with the date series
        date_ = self.Raw(self.series[0])
        date_series = date_['series'][0]['data']
        endi = len(date_series)  # or len(date_['series'][0]['data'])
        date = []
        for i in range(endi):
            date.append(date_series[i][0])
        # Create dataframe
        df = pd.DataFrame(data=date)
        df.columns = ['Date']
        # Deal with data
        lenj = len(self.series)
        for j in range(lenj):
            data_ = self.Raw(self.series[j])
            data_series = data_['series'][0]['data']
            data = []
            endk = len(data_series)  # length of this series, not of the date series
            for k in range(endk):
                data.append(data_series[k][1])
            df[self.series[j]] = data
        return df

if __name__ == '__main__':
    tok = 'mytoken'
    # Natural Gas - Weekly Storage
    ngstor = ['NG.NW2_EPG0_SWO_R48_BCF.W']  # w/ several series at a time: ['ELEC.REV.AL-ALL.M', 'ELEC.REV.AK-ALL.M', 'ELEC.REV.CA-ALL.M']
    stordata = EIAgov(tok, ngstor)
    print(stordata.GetData())
    # Natural Gas - Weekly Prices
    ngpx = ['NG.RNGC1.W']  # w/ several series at a time: ['ELEC.REV.AL-ALL.M', 'ELEC.REV.AK-ALL.M', 'ELEC.REV.CA-ALL.M']
    pxdata = EIAgov(tok, ngpx)
    print(pxdata.GetData())
Note that 'mytoken' needs to be replaced by an eia.gov API key. I can get this to successfully create an output of two lists...but then to get the lists merged I tried to add this at the end:
joined_frame = pd.concat([ngstor, ngpx], axis = 1, sort=False)
print(joined_frame.GetData())
But I get an error
("TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid")
because apparently I don't know the difference between a list and a series.
How do I merge these lists by date column? Thanks very much for any help. (Also feel free to advise why I am terrible at formatting code correctly in this post.)
If you want to manipulate them as DataFrames in the rest of your code, you can transform ngstor and ngpx into DataFrames as follows:
import pandas as pd

# I create two lists that look like yours
ngstor = [[1, 2], ["2020-04-03", "2020-05-07"]]
ngpx = [[3, 4], ["2020-04-03", "2020-05-07"]]

# I transform them to DataFrames
ngstor = pd.DataFrame({"value1": ngstor[0],
                       "date_col": ngstor[1]})
ngpx = pd.DataFrame({"value2": ngpx[0],
                     "date_col": ngpx[1]})
Then you can either use pandas.merge or pandas.concat:
# merge option
joined_frame = pd.merge(ngstor, ngpx, on="date_col", how="outer")

# concat option
ngstor = ngstor.set_index("date_col")
ngpx = ngpx.set_index("date_col")
joined_frame = pd.concat([ngstor, ngpx], axis=1, join="outer").reset_index()
The result will be:
     date_col  value1  value2
0  2020-04-03       1       3
1  2020-05-07       2       4
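Applied back to the question's objects, the same idea looks like this (a sketch; it assumes GetData() returns frames with a 'Date' column, as in the class above):
stor_df = EIAgov(tok, ngstor).GetData()  # columns: ['Date', 'NG.NW2_EPG0_SWO_R48_BCF.W']
px_df = EIAgov(tok, ngpx).GetData()      # columns: ['Date', 'NG.RNGC1.W']

joined_frame = pd.merge(stor_df, px_df, on='Date', how='outer')
print(joined_frame)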

Extracting string with the help of function

I have a clickstream data frame with about 4 million rows. Among its many columns are URL and Domain. I have a dictionary and want to use it as a condition: for example, if the domain equals amazon.de and the URL contains the keyword pillow, then the new column's value will be pillow. And so on.
dictionary_keywords = {"amazon.de": "pillow", "rewe.com": "apple"}
ID  Domain     URL
1   amazon.de  www.amazon.de/ssssssss/exapmle/pillow
2   rewe.de    www.rewe.de/apple
The expected output should be the new column:
ID  Domain     URL                                    New_Col
1   amazon.de  www.amazon.de/ssssssss/exapmle/pillow  pillow
2   rewe.de    www.rewe.de/apple                      apple
I can manually use the .str.contains method, but I need to define a function which takes the dictionary key and value as a condition.
Something like this:
df[(df['Domain'] == 'amazon.de') & (df['URL'].str.contains('pillow'))]
But I am not sure; I am new to this.
The way I prefer to solve this kind of problem is by using df.apply() by row (axis=1) with a custom function to deal with the logic.
import pandas as pd

dictionary_keywords = {"amazon.de": "Pillow", "rewe.de": "Apple"}

df = pd.DataFrame({
    'Domain': ['amazon.de', 'rewe.de'],
    'URL': ['www.amazon.de/ssssssss/exapmle/pillow', 'www.rewe.de/apple']
})

def f(row):
    try:
        url = row['URL'].lower()
        # Note: str.strip removes the *characters* 'w' and '.', not the literal
        # prefix 'www.'; it happens to work for these domains.
        domain = url.split('/')[0].strip('www.')
        if dictionary_keywords[domain].lower() in url:
            return dictionary_keywords[domain]
    except Exception as e:
        print(row.name, e)
    return None  # or False, or np.nan

df['New_Col'] = df.apply(f, axis=1)
Output:
print(df)
      Domain                                    URL New_Col
0  amazon.de  www.amazon.de/ssssssss/exapmle/pillow  Pillow
1    rewe.de                      www.rewe.de/apple   Apple
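Since the question mentions about 4 million rows, a vectorized variant may be worth sketching: build one boolean mask per dictionary entry and combine them with numpy.select (same df and dictionary_keywords as above; the None default for unmatched rows is an assumption):
import numpy as np

# One vectorized condition per (domain, keyword) pair
conditions = [
    df['Domain'].eq(domain) & df['URL'].str.contains(keyword, case=False, regex=False)
    for domain, keyword in dictionary_keywords.items()
]
choices = list(dictionary_keywords.values())

# Rows that match no dictionary entry get None
df['New_Col'] = np.select(conditions, choices, default=None)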

mask function doesn't get rid of unwanted data

I'm working on a data frame taken from Adafruit IO, and sadly some of my data is from a time when my project malfunctioned, so some of the values are just equal to NaN.
I tried to remove them with these lines:
onlyValidData = temp_data.mask(temp_data['value'] == 'NaN')
onlyValidData
This is data retrieved from an Adafruit IO feed and analyzed with pandas; I tried using the 'where' function too, but it didn't work.
My entire code is:
import pandas as pd
temp_data = pd.read_json('https://io.adafruit.com/api/(...)')
light_data = pd.read_json('https://io.adafruit.com/api/(...)')
temp_data['created_at'] = pd.to_datetime(temp_data['created_at'], infer_datetime_format=True)
temp_data = temp_data.set_index('created_at')
light_data['created_at'] = pd.to_datetime(light_data['created_at'], infer_datetime_format=True)
light_data = light_data.set_index('created_at')
tempVals = pd.Series(temp_data['value'])
lightVals = pd.Series(light_data['value'])
onlyValidData = temp_data.mask(temp_data['value'] == 'NaN')
onlyValidData
The output is all of my data for some reason, but it should be only the valid values.
Hey, I think the issue here is that you're looking for values equal to the string 'NaN', while actual NaN values aren't strings at all; they are the missing-value float numpy.nan, which never compares equal to anything.
Try using:
onlyValidData = temp_data.mask(temp_data['value'].isnull())
Edit: to remove rows rather than marking all values in that row as NaN:
onlyValidData = temp_data.dropna()
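As a tiny illustration of the difference (toy series with made-up values):
import numpy as np
import pandas as pd

s = pd.Series([21.5, np.nan, 22.0])

print((s == 'NaN').any())  # False: no value equals the string 'NaN'
print(s.isnull())          # True exactly where a value is missing
print(s.dropna())          # keeps only the valid rows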

Create Proper Dataframe from SDMX Response, Python 3.6

I want to prepare a dataset from the data available at http://stat.data.abs.gov.au/Index.aspx?DataSetCode=ATSI_BIRTHS_SUMM
Data API:
http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetData/ATSI_BIRTHS_SUMM/1+4+5+7+8+9+10+13+14+15+18+19+20.IM+IB.0+1+2+3+4+5+6+7.A/all
from pandasdmx import Request
Agency_Code = 'ABS'
Dataset_Id = 'ATSI_BIRTHS_SUMM'
ABS = Request(Agency_Code)
data_response = ABS.data(resource_id='ATSI_BIRTHS_SUMM')
print(data_response.url)
DF = data_response.write(data_response.data.obs(with_values=True, with_attributes=True), parse_time=False)
The above gives the error: ValueError: Type names and field names cannot be a keyword: 'None'
DF = data_response.write(data_response.data.series, parse_time=False) works, but the dimension items come in column-wise.
Support Links:
http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetDataStructure/all
http://stat.data.abs.gov.au/restsdmx/sdmx.ashx/GetDataStructure/ATSI_BIRTHS_SUMM
http://stat.data.abs.gov.au/Index.aspx?DataSetCode=ATSI_BIRTHS_SUMM
Please suggest a better way to retrieve the data.
Your example
DF = data_response.write(data_response.data.series, parse_time=False)
produces a stacked DataFrame; unstack().reset_index() will give you a "flat" DataFrame:
DF.unstack().reset_index()
  MEASURE INDIGENOUS_STATUS ASGS_2011 FREQUENCY TIME_PERIOD       0
0       1                IM         0         A        2001  8334.0
Is this what you are looking for?
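Independently of pandasdmx, here is a toy sketch of what unstack().reset_index() does to a stacked (MultiIndex) series; all names and numbers below are made up:
import pandas as pd

# A series stacked on two index levels, like an SDMX observation set
idx = pd.MultiIndex.from_product([['2001', '2002'], ['IM', 'IB']],
                                 names=['TIME_PERIOD', 'INDIGENOUS_STATUS'])
stacked = pd.Series([8334.0, 4001.0, 8100.0, 3900.0], index=idx)

# unstack() pivots the innermost level into columns; reset_index() flattens the rest
flat = stacked.unstack().reset_index()
print(flat)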
