Retrieve the country from the geographical locations in Python

I am trying to get the country name from the latitude and longitude points from my pandas dataframe.
Currently I have used geolocator.reverse(latitude, longitude) to get the full address of the geographic location, but there is no obvious way to pull just the country name out of the address it returns.
Method used:
def get_country(row):
    pos = str(row['StartLat']) + ', ' + str(row['StartLong'])
    locations = geolocator.reverse(pos)
    return locations
Call to get_country, applied to each row of the dataframe:
df4['country'] = df4.apply(lambda row: get_country(row), axis = 1)
Current output:
StartLat StartLong Address
52.509669 13.376294 Potsdamer Platz, Mitte, Berlin, Deutschland, Europe
Just wondering whether there is some Python library to retrieve the country when we pass the geographic points.
Any help would be appreciated.

In your get_country function, your return value location will have an attribute raw, which is a dict that looks like this:
{
    'address': {
        'attraction': 'Potsdamer Platz',
        'city': 'Berlin',
        'city_district': 'Mitte',
        'country': 'Deutschland',
        'country_code': 'de',
        'postcode': '10117',
        'road': 'Potsdamer Platz',
        'state': 'Berlin'
    },
    'boundingbox': ['52.5093982', '52.5095982', '13.3764983', '13.3766983'],
    'display_name': 'Potsdamer Platz, Mitte, Berlin, 10117, Deutschland',
    ... and so on ...
}
so location.raw['address']['country'] gives 'Deutschland'
If I read your question correctly, a possible solution could be:
def get_country(row):
    pos = str(row['StartLat']) + ', ' + str(row['StartLong'])
    location = geolocator.reverse(pos)
    return location.raw['address']['country']
EDIT: The format of the location.raw object will differ depending on which geolocator service you are using. My example uses geopy.geocoders.Nominatim, from the example on geopy's documentation site, so your results might differ.
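If you want to guard against a missing address block or country key, here is a minimal defensive sketch (assuming the Nominatim-style raw dict shown above; other services nest things differently):

def get_country(row):
    pos = str(row['StartLat']) + ', ' + str(row['StartLong'])
    location = geolocator.reverse(pos)
    if location is None:
        return None
    # .get() avoids a KeyError when this service's raw dict has no 'address' block
    return location.raw.get('address', {}).get('country')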

My code, hopefully it helps:
from geopy.geocoders import Nominatim

nm = Nominatim(user_agent="country_lookup_example")
place, (lat, lng) = nm.geocode("3995 23rd st, San Francisco, CA 94114")
print('Country' + ": " + place.split()[-1])
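If the country name has more than one word (e.g. 'United States of America'), splitting on the comma instead of whitespace keeps the whole name; a small variant of the same idea:

country = place.split(',')[-1].strip()
print('Country: ' + country)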

I'm not sure what service you're using with geopy, but as a small plug (I'm probably biased), I think this could be a simpler solution for you.
https://github.com/Ziptastic/ziptastic-python
from ziptastic import Ziptastic
# Set API key.
api = Ziptastic('<your api key>')
result = api.get_from_coordinates('42.9934', '-84.1595')
Which will return a list of dictionaries like so:
[
    {
        "city": "Owosso",
        "geohash": "dpshsfsytw8k",
        "country": "US",
        "county": "Shiawassee",
        "state": "Michigan",
        "state_short": "MI",
        "postal_code": "48867",
        "latitude": 42.9934,
        "longitude": -84.1595,
        "timezone": "America/Detroit"
    }
]
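Since the result is a list of dictionaries, pulling out the country of the first match is just an index and a key lookup (a usage sketch based on the response shape shown above):

if result:
    # 'country' holds the two-letter country code of the first match
    print(result[0]['country'])  # 'US'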

Related

Python JSON assign data from API

I have a .py file that reads data from the WordPress API and passes the values to fields of another API. When the values are simple, I have no problem, but I don't know how to handle this case:
When I read one field from the API, the state value comes as a code instead of the text value. For example, when the text value in WordPress is Barcelona, the API returns B, and I need the returned value to be Barcelona.
An example of code with simple field values:
oClienteT["Direcciones"] = []
oClienteT["Telefono"] = oClienteW["billing"]["phone"]
oClienteT["NombreFiscal"] = oClienteW["first_name"] + " " + oClienteW["last_name"]
oClienteT["Direcciones"].append({
    "Codigo": oClienteW["id"],
    "Nombre": oClienteW["billing"]["first_name"],
    "Apellidos": oClienteW["billing"]["last_name"],
    "Direccion": oClienteW["billing"]["address_1"],
    "Direccion2": oClienteW["billing"]["address_2"],
    "Poblacion": oClienteW["billing"]["state"],
    "Provincia": oClienteW["billing"]["city"]
})
When the billing city is Madrid and the billing state is Madrid, WordPress returns Madrid and M.
I need it to return Madrid for the state as well, and so on.
Make sure to convert to a JSON object before accessing fields (data = json.loads(json_str))
response = { "billing": { "address_1": "C/GUSTAVO ADOLFO BECQUER, 4", "city": "SEVILLA", "state": "SE"}}
print(response["billing"].get("address_1", None))
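Since the state field is a fixed code, the usual approach is a lookup table from code to full name; a minimal sketch with a few illustrative entries (a real table needs every province):

# Map WordPress state codes for Spain to full province names (excerpt, not complete).
PROVINCIAS = {
    'B': 'Barcelona',
    'M': 'Madrid',
    'SE': 'Sevilla',
}

def provincia_from_code(code):
    # Fall back to the raw code if it is not in the table.
    return PROVINCIAS.get(code, code)

print(provincia_from_code('M'))  # Madrid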
I've just solved it:
def initProvincias(self):
    self.aProvincias = {}
    self.aProvincias['C'] = 'A Coruña'
    self.aProvincias['VI'] = 'Álava'
    # ... one entry per province code ...

def getProvincia(self, sCod):
    if sCod not in self.aProvincias:
        _logger.info("PROVINCIA NO ENCONTRADA " + str(sCod))
        return ""
    return self.aProvincias[sCod]

"Provincia": self.getProvincia(oClienteW["shipping"]["state"]),

Retrieving full address and geocoding based on place/store name and city stored in csv

I have a csv file with 2 fields, store_name and city. There can be multiple stores in a city.
I want an output csv with 5 fields, store_name, city, address, latitude, longitude.
For example, if one entry of the csv is Starbucks, Chicago, I want the output csv to contain all the information in the 5 fields (mentioned above) as:
Starbucks, Chicago, "200 S Michigan Ave, Chicago, IL 60604, USA", 41.8164613, -87.8127855,
Starbucks, Chicago, "8 N Michigan Ave, Chicago, IL 60602, USA", 41.8164613, -87.8127855
and so on for the rest of the results.
I was trying to work this through GeoPy using Nominatim before making it work with the Google Maps API, though I don't know the best way to approach this. Note that there are a million such entries in the source csv, but buying an API key is not an issue once it works.
I did try geocoding with Nominatim using pandas, but this only creates one result in the output csv for each entry. I want to grab every result, as explained in the example above, and I'm not sure how to implement it.
from geopy.geocoders import Nominatim
import csv, sys
import pandas as pd
import keys

in_file = str(sys.argv[1])
out_file = str('gc_' + in_file)
timeout = int(sys.argv[2])

nominatim = Nominatim(user_agent="your_app_name_here", timeout=timeout)

def gc(address):
    name = str(address['store_name'])
    city = str(address['city'])
    add_concat = name + ", " + city
    location = nominatim.geocode(add_concat)
    if location is not None:
        print(f'geocoded record {address.name}: {city}')
        located = pd.Series({
            'lat': location.latitude,
            'lng': location.longitude,
        })
    else:
        print(f'failed to geolocate record {address.name}: {city}')
        located = pd.Series({
            'lat': 'null',
            'lng': 'null',
        })
    return located

print('opening input.')
reader = pd.read_csv(in_file, header=0)
print('geocoding addresses.')
reader = reader.merge(reader.apply(lambda add: gc(add), axis=1), left_index=True, right_index=True)
print(f'writing to {out_file}.')
reader.to_csv(out_file, encoding='utf-8', index=False)
print('done.')
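As a side note on the Nominatim attempt above: geopy can return several candidate matches per query when you pass exactly_one=False, which is one way to get more than one row per store. A minimal sketch of that idea (the column names follow the question; the user agent string is a placeholder):

from geopy.geocoders import Nominatim

nominatim = Nominatim(user_agent="store_geocoder_example", timeout=10)

def gc_all(address):
    # Return one dict per candidate match instead of only the first result.
    query = str(address['store_name']) + ", " + str(address['city'])
    locations = nominatim.geocode(query, exactly_one=False) or []
    return [
        {
            'store_name': address['store_name'],
            'city': address['city'],
            'address': loc.address,
            'latitude': loc.latitude,
            'longitude': loc.longitude,
        }
        for loc in locations
    ]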
You can use the Google Geocoding API for that purpose. As per the official documentation here, it converts an address or place name into geographic coordinates and itemized address components.
I used the below function in one of my projects and it's still working. You can probably modify it as per your requirements.
import requests

GCODE_URL = 'https://maps.googleapis.com/maps/api/geocode/json?'
GCODE_KEY = 'YOUR API KEY'

def reverse_gcode(location):
    location = str(location).replace(' ', '+')
    nav_req = 'address={}&key={}'.format(location, GCODE_KEY)
    request = GCODE_URL + nav_req
    result = requests.get(request)
    data = result.json()
    status = data['status']
    geo_location = {}
    if str(status) == "OK":
        sizeofjson = len(data['results'][0]['address_components'])
        for i in range(sizeofjson):
            sizeoftype = len(data['results'][0]['address_components'][i]['types'])
            if sizeoftype == 3:
                geo_location[data['results'][0]['address_components'][i]['types'][2]] = data['results'][0]['address_components'][i]['long_name']
            else:
                if data['results'][0]['address_components'][i]['types'][0] == 'administrative_area_level_1':
                    geo_location['state'] = data['results'][0]['address_components'][i]['long_name']
                elif data['results'][0]['address_components'][i]['types'][0] == 'administrative_area_level_2':
                    geo_location['city'] = data['results'][0]['address_components'][i]['long_name']
                    geo_location['town'] = geo_location['city']
                else:
                    geo_location[data['results'][0]['address_components'][i]['types'][0]] = data['results'][0]['address_components'][i]['long_name']
        formatted_address = data['results'][0]['formatted_address']
        geo_location['lat'] = data['results'][0]['geometry']['location']['lat']
        geo_location['lang'] = data['results'][0]['geometry']['location']['lng']
        geo_location['formatted_address'] = formatted_address
    return geo_location

print(reverse_gcode("Starbucks, Chicago"))
The output is a dict and looks something like this:
{'street_number': '8', 'town': 'Cook County', 'locality': 'Chicago', 'city': 'Cook County', 'lat': 41.882413, 'neighborhood': 'Chicago Loop', 'route': 'North Michigan Avenue', 'lang': -87.62468799999999, 'postal_code': '60602', 'country': 'United States', 'formatted_address': '8 N Michigan Ave, Chicago, IL 60602, USA', 'state': 'Illinois'}
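The fields you need for the output csv are then plain key lookups on that dictionary (note that the function above stores the longitude under 'lang'):

geo = reverse_gcode("Starbucks, Chicago")
row = (geo.get('formatted_address'), geo.get('lat'), geo.get('lang'))
print(row)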

Getting the population of a city given its name

What is a good Python API I can use to get the population of a city? I have tried using geocoder, but it is not working and I am not sure why.
geocoder.population('San Francisco, California')
returns
'module' object has no attribute 'population'
Why is this happening, and how can I fix it?
Alternatively, is there a different python api I can use for this?
Certainly you can get the population of a city using geocoder and Google,
but it requires an API key.
Here are two quite different alternative solutions:
OpenDataSoft
The first solution uses the OpenDataSoft API and basic Python 3.
The country needs to be specified via a two-letter country code, see examples below.
import requests
import json

def get_city_opendata(city, country):
    tmp = 'https://public.opendatasoft.com/api/records/1.0/search/?dataset=worldcitiespop&q=%s&sort=population&facet=country&refine.country=%s'
    cmd = tmp % (city, country)
    res = requests.get(cmd)
    dct = json.loads(res.content)
    out = dct['records'][0]['fields']
    return out
get_city_opendata('Berlin', 'de')
#{'city': 'berlin',
# 'country': 'de',
# 'region': '16',
# 'geopoint': [52.516667, 13.4],
# 'longitude': 13.4,
# 'latitude': 52.516667,
# 'accentcity': 'Berlin',
# 'population': 3398362}
get_city_opendata('San Francisco', 'us')
#{'city': 'san francisco',
# 'country': 'us',
# 'region': 'CA',
# 'geopoint': [37.775, -122.4183333],
# 'longitude': -122.4183333,
# 'latitude': 37.775,
# 'accentcity': 'San Francisco',
# 'population': 732072}
WikiData
The second solution uses the WikiData API and the qwikidata package.
Here, the country is given by its English name (or a part of it), see examples below.
I'm sure the SPARQL command can be written much more efficiently and elegantly (feel free to edit), but it does the job.
import qwikidata
import qwikidata.sparql

def get_city_wikidata(city, country):
    query = """
    SELECT ?city ?cityLabel ?country ?countryLabel ?population
    WHERE
    {
      ?city rdfs:label '%s'@en.
      ?city wdt:P1082 ?population.
      ?city wdt:P17 ?country.
      ?city rdfs:label ?cityLabel.
      ?country rdfs:label ?countryLabel.
      FILTER(LANG(?cityLabel) = "en").
      FILTER(LANG(?countryLabel) = "en").
      FILTER(CONTAINS(?countryLabel, "%s")).
    }
    """ % (city, country)
    res = qwikidata.sparql.return_sparql_query_results(query)
    out = res['results']['bindings'][0]
    return out
get_city_wikidata('Berlin', 'Germany')
#{'city': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q64'},
# 'population': {'datatype': 'http://www.w3.org/2001/XMLSchema#decimal',
# 'type': 'literal',
# 'value': '3613495'},
# 'country': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q183'},
# 'cityLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Berlin'},
# 'countryLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'Germany'}}
get_city_wikidata('San Francisco', 'America')
#{'city': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q62'},
# 'population': {'datatype': 'http://www.w3.org/2001/XMLSchema#decimal',
# 'type': 'literal',
# 'value': '805235'},
# 'country': {'type': 'uri', 'value': 'http://www.wikidata.org/entity/Q30'},
# 'cityLabel': {'xml:lang': 'en', 'type': 'literal', 'value': 'San Francisco'},
# 'countryLabel': {'xml:lang': 'en',
# 'type': 'literal',
# 'value': 'United States of America'}}
Both approaches return dictionaries from which you can extract the infos you need using basic Python.
Hope that helps!
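For example, the population figure sits directly under 'population' in the OpenDataSoft record and nested under ['population']['value'] in the WikiData binding:

berlin_opendata = get_city_opendata('Berlin', 'de')
berlin_wikidata = get_city_wikidata('Berlin', 'Germany')

print(berlin_opendata['population'])                # 3398362
print(int(berlin_wikidata['population']['value']))  # 3613495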
from urllib.request import urlopen
import json
import pycountry
import requests
from geopy.geocoders import Nominatim

def get_city_opendata(city, country):
    tmp = 'https://public.opendatasoft.com/api/records/1.0/search/?dataset=worldcitiespop&q=%s&sort=population&facet=country&refine.country=%s'
    cmd = tmp % (city, country)
    res = requests.get(cmd)
    dct = json.loads(res.content)
    out = dct['records'][0]['fields']
    return out

def getcode(cc):
    # Map a full country name to its ISO alpha-2 code.
    countries = {}
    for country in pycountry.countries:
        countries[country.name] = country.alpha_2
    codes = countries.get(cc)
    return codes

def getplace(lat, lon):
    # Reverse geocode with the Google Geocoding API to get town and country names.
    key = "PUT YOUR OWN GOOGLE API KEY HERE"
    url = "https://maps.googleapis.com/maps/api/geocode/json?"
    url += "latlng=%s,%s&sensor=false&key=%s" % (lat, lon, key)
    v = urlopen(url).read()
    j = json.loads(v)
    components = j['results'][0]['address_components']
    country = town = None
    for c in components:
        if "country" in c['types']:
            country = c['long_name']
        if "postal_town" in c['types']:
            town = c['long_name']
    return town, country

address = input('Input an address or town name\t')
geolocator = Nominatim(user_agent="Your_Name")
location = geolocator.geocode(address)
locationLat = location.latitude
locationLon = location.longitude

towncountry = getplace(location.latitude, location.longitude)
mycity = towncountry[0]
mycountry = towncountry[1]
print(towncountry)
print(mycountry)
print(mycity)

mycccode = getcode(mycountry)
mycccode = mycccode.lower()
print(mycccode)

populationdict = get_city_opendata(address, mycccode)
population = populationdict.get('population')
print('population', population)
print(location.address)
print((location.latitude, location.longitude))
I am very grateful for the previous answers. I had to solve this issue too. My code above follows on from David's answer, where he recommends the OpenDataSoft API; apparently the Google API does not currently provide population results.
The code above can get the population of a city, although OpenDataSoft doesn't always return town populations.
It combines code from a few answers to different questions that I found on Stack Overflow.
You will need to get a Google Maps developer API key and install the relevant packages with pip.
First the code gets the longitude and latitude coordinates of the place name entered by the user. Then it uses those coordinates to get the country name from Google Maps. Then it uses the country name to look up the abbreviated two-letter country code. Finally it sends the place name and the two-letter code to OpenDataSoft to get the population.

Passing variables onto a MongoDB Query

My collection has the following documents:
{
    cust_id: "0044234",
    Address: "1234 Dunn Hill",
    city: "Pittsburg",
    comments: "4"
},
{
    cust_id: "0097314",
    Address: "5678 Dunn Hill",
    city: "San Diego",
    comments: "99"
},
{
    cust_id: "012345",
    Address: "2929 Dunn Hill",
    city: "Pittsburg",
    comments: "41"
}
I want to write a block of code that extracts and stores all cust_ids from the same city. I am able to get the answer by running the query below in the MongoDB shell:
db.custData.find({"city": 'Pittsburg'}, {cust_id: 1})
However, I am unable to do the same using Python. Below is what I have tried:
ctgrp = [{"$group": {"_id": "$city", "number of cust": {"$sum": 1}}}]
myDict = {}
for line in collection.aggregate(ctgrp):  # for grouping all the cities in the dataset
    myDict[line['_id']] = line['number of cust']
for key in myDict:
    k = db.collection.find({"city": 'key'}, {'cust_id:1'})
    print k
client.close()
Also, I am unable to figure out how to store this. The only thing that comes to mind is a dictionary with a list of values corresponding to a particular key, but I could not come up with an implementation. I was looking for an output like this:
For Pittsburg, the values would be 0044234 and 012345.
You can use the .distinct method which is the best way to do this.
import pymongo
client = pymongo.MongoClient()
db = client.test
collection = db.collection
then:
collection.distinct('cust_id', {'city': 'Pittsburg'})
Yields:
['0044234', '012345']
or do this client side which is not efficient:
>>> cust_ids = set()
>>> for element in collection.find({'city': 'Pittsburg'}):
... cust_ids.add(element['cust_id'])
...
>>> cust_ids
{'0044234', '012345'}
Now if you want all "cust_id" values for a given city, here it is:
>>> list(collection.aggregate([{'$match': {'city': 'Pittsburg'} }, {'$group': {'_id': None, 'cust_ids': {'$push': '$cust_id'}}}]))[0]['cust_ids']
['0044234', '012345']
Now if what you want is to group your documents by city and find the distinct "cust_id" values per group, here it is:
>>> from pprint import pprint
>>> pipeline = [{'$group': {'_id': '$city', 'cust_ids': {'$addToSet': '$cust_id'}, 'count': {'$sum': 1}}}]
>>> pprint(list(collection.aggregate(pipeline)))
[{'_id': 'San Diego', 'count': 1, 'cust_ids': ['0097314']},
{'_id': 'Pittsburg', 'count': 2, 'cust_ids': ['012345', '0044234']}]
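And if you want the dictionary you described in the question (a list of cust_id values per city), the same $addToSet pipeline can be collected into one directly; a small sketch built on the pipeline above:

pipeline = [{'$group': {'_id': '$city', 'cust_ids': {'$addToSet': '$cust_id'}}}]
by_city = {doc['_id']: doc['cust_ids'] for doc in collection.aggregate(pipeline)}
print(by_city)
# {'Pittsburg': ['012345', '0044234'], 'San Diego': ['0097314']}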

Return individual address components (city, state, etc.) from GeoPy geocoder

I'm using GeoPy to geocode addresses to lat,lng. I would also like to extract the itemized address components (street, city, state, zip) for each address.
GeoPy returns a string with the address -- but I can't find a reliable way to separate each component. For example:
123 Main Street, Los Angeles, CA 90034, USA =>
{street: '123 Main Street', city: 'Los Angeles', state: 'CA', zip: 90034, country: 'USA'}
The Google geocoding API does return these individual components... is there a way to get these from GeoPy? (or a different geocoding tool?)
You can also get the individual address components from the Nominatim() geocoder (which is the standard open source geocoder from geopy).
from geopy.geocoders import Nominatim

geolocator = Nominatim(user_agent="your_app_name_here")

# address is a string, e.g. 'Berlin, Germany'
# addressdetails=True does the magic and also gives you the details
location = geolocator.geocode(address, addressdetails=True)
print(location.raw)
gives
{
    'type': 'house',
    'class': 'place',
    'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright',
    'display_name': '2, Stralauer Allee, Fhain, Friedrichshain-Kreuzberg, Berlin, 10245, Deutschland',
    'place_id': '35120946',
    'osm_id': '2825035484',
    'lon': '13.4489063',
    'osm_type': 'node',
    'address': {
        'country_code': 'de',
        'road': 'Stralauer Allee',
        'postcode': '10245',
        'house_number': '2',
        'state': 'Berlin',
        'country': 'Deutschland',
        'suburb': 'Fhain',
        'city_district': 'Friedrichshain-Kreuzberg'
    },
    'lat': '52.5018003',
    'importance': 0.421,
    'boundingbox': ['52.5017503', '52.5018503', '13.4488563', '13.4489563']
}
with
location.raw['address']
you get the dictionary with the components only.
Take a look at the geopy documentation for more parameters, or at Nominatim for all the address components.
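If you want exactly the shape from the question ({street, city, state, zip, country}), you can remap the Nominatim keys yourself; a minimal sketch (the raw key names vary by location, hence the .get fallbacks):

addr = location.raw['address']
components = {
    'street': ' '.join(filter(None, [addr.get('house_number'), addr.get('road')])),
    'city': addr.get('city') or addr.get('town') or addr.get('village'),
    'state': addr.get('state'),
    'zip': addr.get('postcode'),
    'country': addr.get('country'),
}
print(components)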
Use usaddress by DataMade. Here's the GitHub repo.
It works like this: usaddress.parse('123 Main St. Suite 100 Chicago, IL'), and it returns this list:
[('123', 'AddressNumber'),
('Main', 'StreetName'),
('St.', 'StreetNamePostType'),
('Suite', 'OccupancyType'),
('100', 'OccupancyIdentifier'),
('Chicago,', 'PlaceName'),
('IL', 'StateName')]
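usaddress also has a tag() function that groups consecutive tokens into an ordered dict keyed by component, which is often easier to consume than the token list (see the repo for details):

import usaddress

tagged, address_type = usaddress.tag('123 Main St. Suite 100 Chicago, IL')
print(address_type)  # e.g. 'Street Address'
print(tagged)        # OrderedDict of component name -> value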
This is how I implemented such a split, as I wanted the resulting address always in the same format. You could just skip the concatenation and return each value, or put them in a list. Up to you.
# requires: from geopy.geocoders import Nominatim
def getaddress(self, lat, lng, language="en"):
    try:
        geolocator = Nominatim(user_agent="your_app_name_here")
        string = str(lat) + ', ' + str(lng)
        location = geolocator.reverse(string, language=language)
        data = location.raw['address']
        street = district = postalCode = state = country = countryCode = ""
        district = str(data['city_district'])
        postalCode = str(data['postcode'])
        state = str(data['state'])
        country = str(data['country'])
        countryCode = str(data['country_code']).upper()
        address = street + ' ' + district + ' ' + postalCode + ' ' + state + ' ' + country + ' ' + countryCode
    except Exception:
        address = "Error"
    # address is already a str in Python 3, so no decode() is needed
    return address
I helped write one not long ago called LiveAddress; it was just upgraded to support single-line (freeform) addresses and implements geocoding features.
GeoPy is a geocoding utility, not an address parser/standardizer. LiveAddress API is, however, and can also verify the validity of the address for you, filling out the missing information. You'll find that services such as Google and Yahoo approximate the address, while a CASS-Certified service like LiveAddress actually verify it and won't return results unless the address is real.
After doing a lot of research and development with implementing LiveAddress, I wrote a summary in this Stack Overflow post. It documents some of the crazy-yet-complete formats that addresses can come in and ultimately lead to a solution for the parsing problem (for US addresses).
To parse a single-line address into components using Python, simply put the entire address into the "street" field:
import json
import pprint
from urllib.parse import urlencode
from urllib.request import urlopen

LOCATION = 'https://api.qualifiedaddress.com/street-address/'
QUERY_STRING = urlencode({  # the entire query string must be URL-encoded
    'auth-token': r'YOUR_API_KEY_HERE',
    'street': '1 infinite loop cupertino ca 95014'
})
URL = LOCATION + '?' + QUERY_STRING

response = urlopen(URL).read()
structure = json.loads(response)
pprint.pprint(structure)
The resulting JSON object will contain a components object which will look something like this:
"components": {
    "primary_number": "1",
    "street_name": "Infinite",
    "street_suffix": "Loop",
    "city_name": "Cupertino",
    "state_abbreviation": "CA",
    "zipcode": "95014",
    "plus4_code": "2083",
    "delivery_point": "01",
    "delivery_point_check_digit": "7"
}
The response will also include the combined first_line and delivery_line_2 so you don't have to manually concatenate those if you need them. Latitude/longitude and other information is also available about the address.
