Why does geolocate not give me the right addresses? - python

So I was analyzing a data set with addresses in Philadelphia, PA. Now, in order to make use of these, I wanted to get the exact longitude and latitude to later show them on a map.
I have gotten the unique entries of the column as a list and have implemented a loop to get me the longitude and latitude, though it's giving me the same coordinates for every city and sometimes even ones that are outside of Philadelphia.
Here's what I did so far:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="my_user_agent")
geocode = lambda query: geolocator.geocode("%s, Philadelphia PA" % query)
cities = list(philly["station_name"].unique())
for city in cities:
    address = city
    location = geolocator.geocode(address)
    if(location != None):
        philly["longitude"] = location.longitude
        philly["latitude"] = location.latitude
philly["coordinates"] = list(zip(philly["latitude"], philly["longitude"]))

If "philly" is a list of dictionary objects then you can iterate over the list and add the location properties to each record.
from geopy.geocoders import Nominatim
philly = [{'station_name': '30th Street Station'}]
geolocator = Nominatim(user_agent="my_user_agent")
for row in philly:
    address = row["station_name"]
    location = geolocator.geocode(f"{address}, Philadelphia, PA", country_codes="us")
    if location:
        print(address)
        print(">>", location.longitude, location.latitude)
        row["longitude"] = location.longitude
        row["latitude"] = location.latitude
        row["coordinates"] = (location.longitude, location.latitude)
print(philly)
Output:
30th Street Station
>> -75.1821442 39.9552836
[{'station_name': '30th Street Station', 'longitude': -75.1821442, 'latitude': 39.9552836, 'coordinates': (-75.1821442, 39.9552836)}]
If you are working with a pandas DataFrame, you can iterate over its rows and set the latitude, longitude and coordinates fields, for example:
from geopy.geocoders import Nominatim
import pandas as pd
geolocator = Nominatim(user_agent="my_user_agent")
philly = [{'station_name': '30th Street Station'}]
df = pd.DataFrame(philly)
# add empty location columns to data frame
df["latitude"] = ""
df["longitude"] = ""
df["coordinates"] = ""
for idx, row in df.iterrows():
    address = row.station_name
    location = geolocator.geocode(f"{address}, Philadelphia, PA", country_codes="us")
    if location:
        # write through df.at: a row yielded by iterrows() may be a copy,
        # so assigning to it is not guaranteed to update the data frame
        df.at[idx, "latitude"] = location.latitude
        df.at[idx, "longitude"] = location.longitude
        df.at[idx, "coordinates"] = (location.longitude, location.latitude)
print(df)
Output:
station_name latitude longitude coordinates
0 30th Street Station 39.955284 -75.182144 (-75.1821442, 39.9552836)
If you have a list with duplicate station names then you should cache the results so you don't make duplicate geolocation requests.
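A minimal sketch of that caching idea, assuming the geocoder is passed in as a plain function. The names geocode_with_cache and fake_lookup are made up for illustration; fake_lookup stands in for a real geopy call so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude)


def geocode_with_cache(
    names: Iterable[str],
    lookup: Callable[[str], Optional[Coordinates]],
) -> Dict[str, Optional[Coordinates]]:
    """Call `lookup` once per distinct name and reuse the stored result."""
    cache: Dict[str, Optional[Coordinates]] = {}
    for name in names:
        if name not in cache:  # only hit the geocoder for unseen names
            cache[name] = lookup(name)
    return cache


def fake_lookup(name: str) -> Optional[Coordinates]:
    """Hypothetical stand-in for a geocoder call; None means 'not found'."""
    known = {"30th Street Station": (39.9552836, -75.1821442)}
    return known.get(name)


stations = ["30th Street Station", "30th Street Station", "Nowhere Station"]
coords = geocode_with_cache(stations, fake_lookup)
print(coords["30th Street Station"])
```

With geopy, lookup could be a small wrapper around geolocator.geocode(f"{name}, Philadelphia, PA") that returns (location.latitude, location.longitude), or None when nothing is found. Failed lookups are cached as well, so a name that cannot be resolved is not requested again.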

Related

How to get the continent given the coordinates (latitude and longitude) in Python?

Is there a method that allows to get the continent where it is in place given its coordinates (without an API key)?
I'm using:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent='...')
location = geolocator.reverse('51.0456448, 3.7273618')
print(location.address)
print((location.latitude, location.longitude))
print(location.raw)
But it does not return the continent. Passing a place name to geolocator.geocode() doesn't return it either. Querying by name like this:
import urllib.error, urllib.request, urllib.parse
import json
target = 'http://py4e-data.dr-chuck.net/json?'
local = 'Paris'
url = target + urllib.parse.urlencode({'address': local, 'key' : 42})
data = urllib.request.urlopen(url).read()
js = json.loads(data)
print(json.dumps(js, indent=4))
Doesn't work either.
A bit late, but for future reference and those who could need it, like me recently, here is one way to do it with Wikipedia and the use of Pandas, requests and geopy:
import pandas as pd
import requests
from geopy.geocoders import Nominatim
URLS = {
    "Africa": "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Africa",
    "Asia": "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Asia",
    "Europe": "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Europe",
    "North America": "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_North_America",
    "Oceania": "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_Oceania",
    "South America": "https://en.wikipedia.org/wiki/List_of_sovereign_states_and_dependent_territories_in_South_America",
}
def get_continents_and_countries() -> dict[str, str]:
    """Helper function to get countries and corresponding continents.
    Returns:
        Dictionary where keys are countries and values are continents.
    """
    df_ = pd.concat(
        [
            pd.DataFrame(
                pd.read_html(
                    requests.get(url).text.replace("<br />", ";"),
                    match="Flag",
                )[0]
                .pipe(
                    lambda df_: df_.rename(
                        columns={col: i for i, col in enumerate(df_.columns)}
                    )
                )[2]
                .str.split(";;")
                .apply(lambda x: x[0])
            )
            .assign(continent=continent)
            .rename(columns={2: "country"})
            for continent, url in URLS.items()
        ]
    ).reset_index(drop=True)
    df_["country"] = (
        df_["country"]
        .str.replace("*", "", regex=False)
        .str.split("[")
        .apply(lambda x: x[0])
    ).str.replace("\xa0", "")
    return dict(df_.to_dict(orient="split")["data"])

def get_location_of(coo: str, data: dict[str, str]) -> tuple[str, str, str]:
    """Function to get the country of given coordinates.
    Args:
        coo: coordinates as string ("lat, lon").
        data: input dictionary of countries and continents.
    Returns:
        Tuple of coordinates, country and continent (or Unknown if country not found).
    """
    geolocator = Nominatim(user_agent="stackoverflow", timeout=25)
    country: str = (
        geolocator.reverse(coo, language="en-US").raw["display_name"].split(", ")[-1]
    )
    return (coo, country, data.get(country, "Unknown"))
Finally:
continents_and_countries = get_continents_and_countries()
print(get_location_of("51.0456448, 3.7273618", continents_and_countries))
# Output
('51.0456448, 3.7273618', 'Belgium', 'Europe')

How to handle multiple missing keys in a dict?

I'm using an API to get basic information about shops in my area, name of shop, address, postcode, phone number etc… The API returns back a long list about each shop, but I only want some of the data from each shop.
I created a for loop that just takes the information that I want for every shop that the API has returned. This all works fine.
Problem is not all shops have a phone number or a website, so I get a KeyError because the key website does not exist in every return of a shop. I tried to use try and except which works but only if I only handle one thing, but a shop might not have a phone number and a website, which leads to a second KeyError.
What can I do to check for every key in my for loop and if a key is found missing to just add the value "none"?
My code:
import requests
import geocoder
import pprint
g = geocoder.ip('me')
print(g.latlng)
latitude, longitude = g.latlng
URL = "https://discover.search.hereapi.com/v1/discover"
latitude = xxxx
longitude = xxxx
api_key = 'xxxxx' # Acquire from developer.here.com
query = 'food'
limit = 12
PARAMS = {
'apikey':api_key,
'q':query,
'limit': limit,
'at':'{},{}'.format(latitude,longitude)
}
# sending get request and saving the response as response object
r = requests.get(url = URL, params = PARAMS)
data = r.json()
#print(data)
for x in data['items']:
    title = x['title']
    address = x['address']['label']
    street = x['address']['street']
    postalCode = x['address']['postalCode']
    position = x['position']
    access = x['access']
    typeOfBusiness = x['categories'][0]['name']
    contacts = x['contacts'][0]['phone'][0]['value']
    try:
        website = x['contacts'][0]['www'][0]['value']
    except KeyError:
        website = "none"
    resultList = {
        'BUSINESS NAME:':title,
        'ADDRESS:':address,
        'STREET NAME:':street,
        'POSTCODE:':postalCode,
        'POSITION:':position,
        'POSITSION2:':access,
        'TYPE:':typeOfBusiness,
        'PHONE:':contacts,
        'WEBSITE:':website
    }
    print("--"*80)
    pprint.pprint( resultList)
I think a good way to handle it would be to use operator.itemgetter() to create a callable that will attempt to retrieve all the keys at once; if any of them aren't found, it will raise a KeyError.
A short demonstration of what I mean:
from operator import itemgetter
test_dict = dict(name="The Shop", phone='123-45-6789', zipcode=90210)
keys = itemgetter('name', 'phone', 'zipcode')(test_dict)
print(keys) # -> ('The Shop', '123-45-6789', 90210)
keys = itemgetter('name', 'address', 'phone', 'zipcode')(test_dict)
# -> KeyError: 'address'
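The itemgetter demonstration reports a missing key but does not substitute a default. As a complementary sketch (a different technique from the answer above, not part of it), a small helper can walk a path of keys and indexes and fall back to "none" when any step is missing. The helper name get_nested and the sample record are hypothetical; the record only imitates the shape of one entry of data['items'] from the question.

```python
from typing import Any, Sequence, Union

Key = Union[str, int]


def get_nested(obj: Any, path: Sequence[Key], default: Any = "none") -> Any:
    """Walk `path` through nested dicts/lists; return `default` if a step is missing."""
    current = obj
    for key in path:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current


# Hypothetical record with a phone number but no 'www' entry under 'contacts'
shop = {
    "title": "Corner Cafe",
    "address": {"label": "1 Main St", "postalCode": "AB1 2CD"},
    "contacts": [{"phone": [{"value": "+44 1234 567890"}]}],
}

fields = {
    "BUSINESS NAME:": ["title"],
    "POSTCODE:": ["address", "postalCode"],
    "PHONE:": ["contacts", 0, "phone", 0, "value"],
    "WEBSITE:": ["contacts", 0, "www", 0, "value"],
}
result = {label: get_nested(shop, path) for label, path in fields.items()}
print(result)
```

Because every field goes through the same helper, a shop that lacks both a phone number and a website gets "none" for both without a separate try/except per key.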

How can I loop through a dataframe to create separate Folium maps?

I'm trying to examine the sushi venues within 5 different cities, using Foursquare.
I can get the data and filter it correctly. Code below.
city = {'City':['Brunswick','Auckland','Wellington','Christchurch','Hamilton','Ponsonby'],
        'Latitude':[-37.7670,-36.848461,-41.28664,-43.55533,-37.78333,-36.8488],
        'Longitude':[144.9621,174.763336,174.77557,172.63333,175.28333,174.7381]}
df_location= pd.DataFrame(city, columns = ['City','Latitude','Longitude'])
def getNearbyVenues(names, latitudes, longitudes, radius=2000, LIMIT=100):
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT,
            "4bf58dd8d48988d1d2941735")
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        venues_list.append([(
            name,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng']) for v in results])
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
        'City',
        'Venue',
        'Venue Latitude',
        'Venue Longitude',]
    return(nearby_venues)
sushi_venues = getNearbyVenues(names = df_location['City'],
                               latitudes = df_location['Latitude'],
                               longitudes = df_location['Longitude'])
cities = df_location["City"]
latitude = df_location["Latitude"]
longitude = df_location["Longitude"]
I'm getting stuck on creating the maps and I'm not sure how I should iterate through the cities to create a map for each.
Here's the code I have.
maps = {}
for city in cities:
    maps[city] = folium.Map(location = [latitude, longitude],zoom_start=10)
    for lat, lng, neighborhood in zip(sushi_venues['Venue Latitude'], sushi_venues['Venue Longitude'], sushi_venues['Venue']):
        label = '{}'.format(neighborhood)
        label = folium.Popup(label, parse_html = True)
        folium.CircleMarker(
            [lat, lng],
            radius = 5,
            popup = label,
            color = 'blue',
            fill = True,
            fill_color = '#3186cc',
            fill_opacity = 0.7,
            parse_html = False).add_to(maps[city])
maps[cities[0]]
For this code, 'maps[cities[0]]' brings up a blank Folium map.
If I change the code to reference the row of the city in df_location, e.g.
maps = {}
for city in cities:
    maps[city] = folium.Map(location = [latitude[0], longitude[0]],zoom_start=10)
Then 'maps[cities[0]]' brings up a correctly labeled Folium map of Brunswick with the corresponding venues marked.
So my question is, how can I correctly iterate through all 5 cities, so that I can pull a new map for each without changing the location each time? I'm unable to zip the locations because it needs to be a single lat/long to initialize the Folium map.
Thanks so much for your help!
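One possible approach, sketched here in plain Python so it runs without Folium or the Foursquare API: zip the city, latitude and longitude columns together, so that each iteration yields a single centre point, and keep only the venues whose city matches. The function name plan_city_maps and the venue rows are made up for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Venue = Tuple[str, str, float, float]  # (city, venue name, lat, lng)


def plan_city_maps(
    cities: Sequence[str],
    latitudes: Sequence[float],
    longitudes: Sequence[float],
    venues: Sequence[Venue],
) -> Dict[str, dict]:
    """Pair each city with its own centre and only the venues found for it."""
    plans: Dict[str, dict] = {}
    # zip yields one (city, lat, lng) triple per iteration, so every map
    # gets a single centre point rather than a whole column
    for city, lat, lng in zip(cities, latitudes, longitudes):
        markers: List[Tuple[str, float, float]] = [
            (name, v_lat, v_lng)
            for v_city, name, v_lat, v_lng in venues
            if v_city == city
        ]
        plans[city] = {"center": [lat, lng], "markers": markers}
    return plans


# Made-up venue rows, for illustration only
venues = [
    ("Brunswick", "Sushi A", -37.766, 144.961),
    ("Auckland", "Sushi B", -36.849, 174.764),
    ("Auckland", "Sushi C", -36.850, 174.760),
]
plans = plan_city_maps(
    ["Brunswick", "Auckland"], [-37.7670, -36.848461], [144.9621, 174.763336], venues
)
print(plans["Auckland"]["center"])
print(len(plans["Auckland"]["markers"]))
```

In the question's setting, each plans[city]["center"] would be passed as the location argument of folium.Map, and each marker added with folium.CircleMarker(...).add_to(maps[city]). With the DataFrame from the question, the venues of one city could be selected with sushi_venues[sushi_venues['City'] == city] before looping over them.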

OSMnx: How to retrieve info on a bus-stop node that is part of a highway?

I am also trying to show info for the OpenStreetMap bus-stop node 439460636 (https://www.openstreetmap.org/node/439460636), which is part of a highway.
I am using Python 3 and OSMnx.
All other POIs show up perfectly, just not the ones that are not mapped as an 'amenity'. (There are more examples.)
I am using jupyter notebook for my analysis:
import osmnx as ox
# Retrieve POI shelters
place_name = 'Santa Clara, Santa Clara County, California, USA'
shelter = ox.pois_from_place(place_name, amenities=['shelter'])
cols = ['amenity', 'name', 'element_type', 'shelter_type',
'building', 'network'
]
shelter[cols]
cols = ['amenity', 'name','element_type', 'shelter_type',
'building', 'network'
]
shelter[cols].loc[(shelter['shelter_type'] == 'public_transport') ]
# Look bus-stop in highway
graph = ox.graph_from_place(place_name)
nodes, edges = ox.graph_to_gdfs(graph)
nodes.loc[(nodes['highway'] == 'bus_stop') ]
Overpass:
[out:json][timeout:25];
// gather results
(
area[name="Santa Clara, Santa Clara County, California, USA"];
node(area)["highway"="bus_stop"]({{bbox}});
);
// print results
out body;
>;
out skel qt;
The POI Kino (439460636) is not listed. The shelter right next to the POI is listed. The POI is in the middle of my area, so I do not understand how I can retrieve the node info. Can you help?
Manually update Osmnx with the file linked in this post from chesterharvey. https://github.com/gboeing/osmnx/issues/116#issuecomment-439577495
Final testing of feature still incomplete!
import osmnx as ox
# Specify the name that is used to search for the data
place_name = "Santa Clara, Santa Clara County, California, USA"
tags = {
    'amenity':True,
    'leisure':True,
    'landuse':['retail','commercial'],
    'highway':'bus_stop',
}
all_pois = ox.pois_from_place(place=place_name, tags=tags)
all_pois.loc[(all_pois['highway'] == 'bus_stop')]
This functionality has been added to OSMnx as of v0.13.0. It generalizes the POIs module to query using a tags dict instead of an amenities list. It removes the amenities parameter from all POI functions. The tags dict accepts key:value pairs of the form:
'tag' : True (use bool to retrieve all items with tag)
'tag' : 'value' (use string to retrieve all items with tag = value)
'tag' : ['value1', 'value2', etc] (use list to retrieve all items with tag equal to either value1 or value2 etc.)
Usage examples of the new POI querying functionality:
import osmnx as ox
ox.config(use_cache=True, log_console=True)
tags = {'amenity' : True,
'landuse' : ['retail', 'commercial'],
'highway' : 'bus_stop'}
gdf = ox.pois_from_place(place='Piedmont, California, USA', tags=tags)
You can do that easily with footprints I think:
# points of interest around an area
import networkx as nx
import osmnx as ox
import requests
# returns polygon or coordinates of poi
# point = (59.912390, 10.750584)
# amn = ["bus_station", 'waste_transfer_station']
# points of interest/amenities we can use: https://wiki.openstreetmap.org/wiki/Key:amenity
def get_interest_points(lat, long, dist, amn):
    point = (lat, long)  # OSMnx expects (latitude, longitude)
    gdf_points = ox.pois_from_point(point, distance=dist, amenities=amn)
    return gdf_points[["amenity", "geometry"]]
# Get bus buildings, distance in meter 400 is minimum
# returns polygon of building
def get_buildings(lat, long, dist):
    point = (lat, long)
    gdf = ox.footprints.footprints_from_point(point=point, distance=dist,footprint_type='buildings')
    return gdf["geometry"]
# Get bus, tram or subway
# type = "bus" or "tram" or "subway"
# distance in meter 400 is minimum
# returns polygon of stop
# (renamed from get_buildings so it does not overwrite the function above)
def get_stops(lat, long, dist, type):
    point = (lat, long)
    gdf = ox.footprints.footprints_from_point(point=point, distance=dist,footprint_type=type)
    return gdf["geometry"]

In Python, trying to convert geocoded tsv file into geojson format

I'm trying to convert a geocoded TSV file into JSON format, but I'm having trouble with it. Here's the code:
import geojson
import csv
def create_map(datafile):
    geo_map = {"type":"FeatureCollection"}
    item_list = []
    datablock = list(csv.reader(datafile))
    for i, line in enumerate(datablock):
        data = {}
        data['type'] = 'Feature'
        data['id'] = i
        data['properties']={'title': line['Movie Title'],
                            'description': line['Amenities'],
                            'date': line['Date']}
        data['name'] = {line['Location']}
        data['geometry'] = {'type':'Point',
                            'coordinates':(line['Lat'], line['Lng'])}
        item_list.append(data)
    for point in item_list:
        geo_map.setdefault('features', []).append(point)
    with open("thedamngeojson.geojson", 'w') as f:
        f.write(geojson.dumps(geo_map))
create_map('MovieParksGeocode2.tsv')
I'm getting a TypeError:list indices must be integers, not str on the data['properties'] line but I don't understand, isn't that how I set values to the geoJSON fields?
The file I'm reading from has values under these keys: Location Movie Title Date Amenities Lat Lng
The file is viewable here: https://github.com/yongcho822/Movies-in-the-park/blob/master/MovieParksGeocodeTest.tsv
Thanks guys, much appreciated as always.
You have a couple things going on here that need to get fixed.
1. Your TSV contains newlines inside double-quoted fields. I don't think this is intended, and it will cause some problems.
Location Movie Title Date Amenities Formatted_Address Lat Lng
"
Edgebrook Park, Chicago " A League of Their Own 7-Jun "
Family friendly activities and games. Also: crying is allowed." Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA 41.9998876 -87.7627672
"
2. You don't need the geojson module to dump out JSON, which is all GeoJSON is. Just import json instead.
3. You are trying to read a TSV, but you don't pass the delimiter='\t' option that is needed for that.
4. You are trying to read keys off the rows, but you aren't using DictReader, which does that for you. Hence the TypeError about indices you mention above.
Check out my revised code block below. You still need to fix your TSV to be a valid TSV.
import csv
import json
def create_map(datafile):
    geo_map = {"type":"FeatureCollection"}
    item_list = []
    with open(datafile,'r') as tsvfile:
        reader = csv.DictReader(tsvfile,delimiter='\t')
        for i, line in enumerate(reader):
            print(line)
            data = {}
            data['type'] = 'Feature'
            data['id'] = i
            data['properties']={'title': line['Movie Title'],
                                'description': line['Amenities'],
                                'date': line['Date']}
            # plain string: a set literal such as {value} is not JSON serializable
            data['name'] = line['Location']
            # GeoJSON positions are numbers in (longitude, latitude) order
            data['geometry'] = {'type':'Point',
                                'coordinates':(float(line['Lng']), float(line['Lat']))}
            item_list.append(data)
    for point in item_list:
        geo_map.setdefault('features', []).append(point)
    with open("thedamngeojson.geojson", 'w') as f:
        f.write(json.dumps(geo_map))
create_map('MovieParksGeocode2.tsv')
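Python's csv module can read quoted fields that span several lines, so one alternative to editing the file by hand is to strip the stray whitespace after parsing. The sketch below assumes that approach; it works on an in-memory string so it runs without the original file. The function name tsv_to_feature_collection and the sample text are made up, and the sample only imitates the column layout described in the question.

```python
import csv
import io
import json

# Made-up sample in the question's column layout; the Location field
# deliberately contains a stray newline inside its quotes.
SAMPLE_TSV = (
    "Location\tMovie Title\tDate\tAmenities\tLat\tLng\n"
    '"\nEdgebrook Park, Chicago "\tA League of Their Own\t7-Jun\t'
    "Family friendly activities\t41.9998876\t-87.7627672\n"
)


def tsv_to_feature_collection(tsv_text: str) -> dict:
    """Build a GeoJSON-style FeatureCollection dict from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text, newline=""), delimiter="\t")
    features = []
    for i, row in enumerate(reader):
        # strip surrounding whitespace, including embedded newlines
        row = {key: value.strip() for key, value in row.items()}
        features.append({
            "type": "Feature",
            "id": i,
            "name": row["Location"],
            "properties": {
                "title": row["Movie Title"],
                "description": row["Amenities"],
                "date": row["Date"],
            },
            "geometry": {
                "type": "Point",
                # numbers in (longitude, latitude) order
                "coordinates": [float(row["Lng"]), float(row["Lat"])],
            },
        })
    return {"type": "FeatureCollection", "features": features}


collection = tsv_to_feature_collection(SAMPLE_TSV)
print(json.dumps(collection, indent=2))
```

This sketch assumes every row has a value in every column. A row with missing trailing fields would need extra handling, because DictReader fills those with None.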
