I have an object that I'm retrieving via request that I need to map over and normalize to an expected response:
address_list = list(map(normalize_shipping_address, address_list))
I was hoping to be able to set up some sort of dict to handle the mapping such as:
def normalize_shipping_address(address):
normalized_address = {}
valueMapping = {
'firstname': 'firstName',
'lastname': 'lastName',
'city': 'city',
'postcode': 'zip',
'countryId': 'country',
'defaultShipping': 'isDefault',
}
for source_key, destination_key in valueMapping.iteritems():
try:
key = address[source_key]
except KeyError:
key = None
normalized_address[destination_key] = key
return normalized_address
but realized there is not a good way for me to get nested dictionary keys or list objects such as: address.some.nested.key or address.some.list[0].key.
I ended up handling these special "cases" by doing things such as:
try:
state = address['region']['regionCode']
except KeyError:
state = None
normalized_address['state'] = state
and
try:
address_name = address['customAttributes'][0]['value']
except KeyError:
address_name = None
normalized_address['addressName'] = address_name
This seems a bit verbose and clunky. Is there a more elegant way to parse/normalize dictionaries like this?
Related
I'm using an API to get basic information about shops in my area, name of shop, address, postcode, phone number etc… The API returns back a long list about each shop, but I only want some of the data from each shop.
I created a for loop that just takes the information that I want for every shop that the API has returned. This all works fine.
Problem is not all shops have a phone number or a website, so I get a KeyError because the key website does not exist in every return of a shop. I tried to use try and except which works but only if I only handle one thing, but a shop might not have a phone number and a website, which leads to a second KeyError.
What can I do to check for every key in my for loop and if a key is found missing to just add the value "none"?
My code:
import requests
import geocoder
import pprint
g = geocoder.ip('me')
print(g.latlng)
latitude, longitude = g.latlng
URL = "https://discover.search.hereapi.com/v1/discover"
latitude = xxxx
longitude = xxxx
api_key = 'xxxxx' # Acquire from developer.here.com
query = 'food'
limit = 12
PARAMS = {
'apikey':api_key,
'q':query,
'limit': limit,
'at':'{},{}'.format(latitude,longitude)
}
# sending get request and saving the response as response object
r = requests.get(url = URL, params = PARAMS)
data = r.json()
#print(data)
for x in data['items']:
title = x['title']
address = x['address']['label']
street = x['address']['street']
postalCode = x['address']['postalCode']
position = x['position']
access = x['access']
typeOfBusiness = x['categories'][0]['name']
contacts = x['contacts'][0]['phone'][0]['value']
try:
website = x['contacts'][0]['www'][0]['value']
except KeyError:
website = "none"
resultList = {
'BUSINESS NAME:':title,
'ADDRESS:':address,
'STREET NAME:':street,
'POSTCODE:':postalCode,
'POSITION:':position,
'POSITSION2:':access,
'TYPE:':typeOfBusiness,
'PHONE:':contacts,
'WEBSITE:':website
}
print("--"*80)
pprint.pprint( resultList)
I think a good way to handle it would be to use the operator.itemgetter() to create a callable the will attempt to retrieve all the keys at once, and if any aren't found, it will generate a KeyError.
A short demonstration of what I mean:
from operator import itemgetter
test_dict = dict(name="The Shop", phone='123-45-6789', zipcode=90210)
keys = itemgetter('name', 'phone', 'zipcode')(test_dict)
print(keys) # -> ('The Shop', '123-45-6789', 90210)
keys = itemgetter('name', 'address', 'phone', 'zipcode')(test_dict)
# -> KeyError: 'address'
I have the below list from which I have to retrieve the port number I want the value 50051 but what I get is port=50051 I know I can retrieve this by iterating the list and using string operations but wanted to see if there is some direct way to access this.
r = requests.get(url_service)
data = {}
data = r.json()
#Below is the json after printing
[{'ServerTag': [ 'abc-service=true',
'port=50051',
'protocol=http']
}]
print(data[0]["ServiceTags"][1]) // prints port=50051
You can do something like this perhaps:
received_dic = {
'ServerTag': [ 'abc-service=true',
'port=50051',
'protocol=http']
}
ServerTag = received_dic.get("ServerTag", None)
if ServerTag:
port = list(filter(lambda x: "port" in x, ServerTag))[0].split("=")[1]
print(port)
# 50051
Considering you have the following JSON:
[
{
"ServerTag": ["abc-service=true", "port=50051", "protocol=http"]
}
]
You can extract your value like this:
from functools import partial
# ...
def extract_value_from_tag(tags, name, default=None):
tags = map(partial(str.split, sep='='), tags)
try:
return next(value for key, value in tags if key == name)
except StopIteration:
# Tag was not found
return default
And then you just:
# Providing data is the deserialized JSON as a Python list
# Also assuming that data is not empty and ServerTag is present on the first object
tags = data[0].get('ServerTag', [])
port_number = extract_value_from_tag(tags, 'port', default='8080')
I'm reading data from a SELECT statement of SQLite. Date comes in the following form:
ID|Phone|Email|Status|Role
Multiple rows may be returned for the same ID, Phone, or Email. And for a given row, either Phone or Email can be empty/NULL. However, for the same ID, it's always the same value for Status and the same for Role. for example:
1|1234567892|a#email.com| active |typeA
2|3434567893|b#email.com| active |typeB
2|3434567893|c#email.com| active |typeB
3|5664567891|d#email.com|inactive|typeC
3|7942367891|d#email.com|inactive|typeC
4|5342234233| NULL | active |typeD
5| NULL |e#email.com| active |typeD
These data are returned as a list by Sqlite3, let's call it results. I need to go through them and reorganize the data to construct another list structure in Python. The final list basically consolidates the data for each ID, such that:
Each item of the final list is a dict, one for each unique ID in results. In other words, multiple rows for the same ID will be merged.
Each dict contains these keys: 'id', 'phones', 'emails', 'types', 'role', 'status'.
'phones' and 'emails' are lists, and contains zero or more items, but no duplicates.
'types' is also a list, and contains either 'phone' or 'email' or both, but no duplicates.
The order of dicts in the final list does not matter.
So far I have come up this:
processed = {}
for r in results:
if r['ID'] in processed:
p_data = processed[r['ID']]
if r['Phone']:
p_data['phones'].add(r['Phone'])
p_data['types'].add('phone')
if r['Email']:
p_data['emails'].add(r['Email'])
p_data['types'].add('email')
else:
p_data = {'id': r['ID'], 'status': r['Status'], 'role': r['Role']}
if r['Phone']:
p_data['phones'] = set([r['Phone']])
p_data.setdefault('types', set).add('phone')
if r['Email']:
p_data['emails'] = set([r['Email']])
p_data.setdefault('types', set).add('email')
processed[r['ID']] = p_data
consolidated = list(processed.values())
I wonder if there is a faster and/or more concise way to do this.
EDIT:
A final detail: I would prefer to have 'phones', 'emails', and 'types' in each dict as list instead of set. The reason is that I need to dump consolidated into JSON, and JSON does not allow set.
When faced with something like this I usually use:
processed = collections.defaultdict(lambda:{'phone':set(),'email':set(),'status':None,'type':set()})
and then something like:
for r in results:
for field in ['Phone','Email']:
if r[field]:
processed[r['ID']][field.lower()].add(r[field])
processed[r['ID']]['type'].add(field.lower())
Finally, you can dump it into a dictionary or a list:
a_list = processed.items()
a_dict = dict(a_list)
Regarding the JSON problem with sets, you can either convert the sets to lists right before serializing or write a custom encoder (very useful!). Here is an example of one I have for dates extended to handle sets:
class JSONDateTimeEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, datetime.datetime):
return int(time.mktime(obj.timetuple()))
elif isinstance(ojb, set):
return list(obj)
try:
return json.JSONEncoder.default(self, obj)
except:
return str(obj)
and to use it:
json.dumps(a_list,sort_keys=True, indent=2, cls =JSONDateTimeEncoder)
I assume results is a 2d list:
print results
#[['1', '1234567892', 'a#email.com', ' active ', 'typeA'],
#['2', '3434567893', 'b#email.com', ' active ', 'typeB'],
#['2', '3434567893', 'c#email.com', ' active ', 'typeB'],
#['3', '5664567891', 'd#email.com', 'inactive', 'typeC'],
#['3', '7942367891', 'd#email.com', 'inactive', 'typeC'],
#['4', '5342234233', ' NULL ', ' active ', 'typeD'],
#['5', ' NULL ', 'e#email.com', ' active ', 'typeD']]
Now we group this list by id:
from itertools import groupby
data_grouped = [ (k,list(v)) for k,v in groupby( sorted(results, key=lambda x:x[0]) , lambda x : x[0] )]
# make list of column names (should correspond to results). These will be dict keys
names = [ 'id', 'phone','email', 'status', 'roll' ]
ID_info = { g[0]: {names[i]: list(list( map( set, zip(*g[1] )))[i]) for i in range( len(names))} for g in data_grouped }
Now for the types:
for k in ID_info:
email = [ i for i in ID_info[k]['email'] if i.strip() != 'NULL' and i != '']
phone = [ i for i in ID_info[k]['phone'] if i.strip() != 'NULL' and i != '']
if email and phone:
ID_info[k]['types'] = [ 'phone', 'email' ]
elif email and not phone:
ID_info[k]['types'] = ['email']
elif phone and not email:
ID_info[k]['types'] = ['phone']
else:
ID_info[k]['types'] = []
# project
ID_info[k]['id'] = ID_info[k]['id'][0]
ID_info[k]['roll'] = ID_info[k]['roll'][0]
ID_info[k]['status'] = ID_info[k]['status'][0]
And what you asked for (a list of dicts) is returned by ID_info.values()
How might one store the path to a value in a dict of dicts? For instance, we can easily store the path to the name value in a variable name_field:
person = {}
person['name'] = 'Jeff Atwood'
person['address'] = {}
person['address']['street'] = 'Main Street'
person['address']['zip'] = '12345'
person['address']['city'] = 'Miami'
# Get name
name_field = 'name'
print( person[name_field] )
How might the path to the city value be stored?
# Get city
city_field = ['address', 'city']
print( person[city_field] ) // Obviously won't work!
You can do:
path = ('address', 'city')
lookup = person
for key in path:
lookup = lookup[key]
print lookup
# gives: Miami
This will raise a KeyError if a part of the path does not exist.
It will also work if path consists of one value, such as ('name',).
You can use reduce function to do this
print reduce(lambda x, y: x[y], city_field, person)
Output
Miami
An alternative that uses Simeon's (and Unubtu's deleted answer) is to create your own dict class that defines an extra method:
class mydict(dict):
def lookup(self, *args):
tmp = self
for field in args:
tmp = tmp[field]
return tmp
person = mydict()
person['name'] = 'Jeff Atwood'
person['address'] = {}
person['address']['street'] = 'Main Street'
person['address']['zip'] = '12345'
person['address']['city'] = 'Miami'
print(person.lookup('address', 'city'))
print(person.lookup('name'))
print(person.lookup('city'))
which results in:
Miami
Jeff Atwood
Traceback (most recent call last):
File "look.py", line 17, in <module>
print(person.lookup('city'))
File "look.py", line 5, in lookup
tmp = tmp[field]
KeyError: 'city'
You can shorten the loop per the suggestion of thefourtheye. If you want to be really fancy, you can probably override private methods like __get__ to allow for a case like person['address', 'city'], but then things may become tricky.
Here's another way - behaves exactly the same as Simeon Visser's
from operator import itemgetter
pget = lambda map, path: reduce(lambda x,p: itemgetter(p)(x), path, map)
With your example data:
person = {
'name': 'Jeff Atwood',
'address': {
'street': 'Main Street',
'zip': '12345',
'city': 'Miami',
},
}
pget(person, ('address', 'zip')) # Prints '12345'
pget(person, ('name',)) # Prints 'Jeff Atwood'
pget(person, ('nope',)) # Raises KeyError
How do I look up the 'id' associated with the a person's 'name' when the 2 are in a dictionary?
user = 'PersonA'
id = ? #How do I retrieve the 'id' from the user_stream json variable?
json, stored in a variable named "user_stream"
[
{
'name': 'PersonA',
'id': '135963'
},
{
'name': 'PersonB',
'id': '152265'
},
]
You'll have to decode the JSON structure and loop through all the dictionaries until you find a match:
for person in json.loads(user_stream):
if person['name'] == user:
id = person['id']
break
else:
# The else branch is only ever reached if no match was found
raise ValueError('No such person')
If you need to make multiple lookups, you probably want to transform this structure to a dict to ease lookups:
name_to_id = {p['name']: p['id'] for p in json.loads(user_stream)}
then look up the id directly:
id = name_to_id.get(name) # if name is not found, id will be None
The above example assumes that names are unique, if they are not, use:
from collections import defaultdict
name_to_id = defaultdict(list)
for person in json.loads(user_stream):
name_to_id[person['name']).append(person['id'])
# lookup
ids = name_to_id.get(name, []) # list of ids, defaults to empty
This is as always a trade-off, you trade memory for speed.
Martijn Pieters's solution is correct, but if you intend to make many such look-ups it's better to load the json and iterate over it just once, and not for every look-up.
name_id = {}
for person in json.loads(user_stream):
name = person['name']
id = person['id']
name_id[name] = id
user = 'PersonA'
print name_id[user]
persons = json.loads(...)
results = filter(lambda p:p['name'] == 'avi',persons)
if results:
id = results[0]["id"]
results can be more than 1 of course..