Python Decision Tree: Creating Relationship using Dictionary from Row data - python

I have a hierarchical data(more than 10 generation) which tells who a person's parent/children are. i would want to represent this as dict of dict. is there any way to achieve this.
sample input - List of Dict/Dataframe
[{'Name': 'Oli Bob', 'Location': 'United Kingdom', 'Parent': nan}, {'Name': 'Mary May', 'Location': 'Germany', 'Parent': 'Oli Bob'}, {'Name': 'Christine Lobowski', 'Location': 'France', 'Parent': 'Oli Bob'}, {'Name': 'Brendon Philips', 'Location': 'USA', 'Parent': 'Oli Bob'}, {'Name': 'Margret Marmajuke', 'Location': 'Canada', 'Parent': 'Brendon Philips'}, {'Name': 'Frank Harbours', 'Location': 'Russia', 'Parent': 'Brendon Philips'}, {'Name': 'Todd Philips', 'Location': 'United Kingdom', 'Parent': 'Frank Harbours'}, {'Name': 'Jamie Newhart', 'Location': 'India', 'Parent': nan}, {'Name': 'Gemma Jane', 'Location': 'China', 'Parent': nan}, {'Name': 'Emily Sykes', 'Location': 'South Korea', 'Parent': 'Emily Sykes'}, {'Name': 'James Newman', 'Location': 'Japan', 'Parent': nan}]
same data in table form
Desired Output
[
{name:"Oli Bob", location:"United Kingdom", _children:[
{name:"Mary May", location:"Germany"},
{name:"Christine Lobowski", location:"France"},
{name:"Brendon Philips", location:"USA",_children:[
{name:"Margret Marmajuke", location:"Canada"},
{name:"Frank Harbours", location:"Russia",_children:[{name:"Todd Philips", location:"United Kingdom"}]},
]},
]},
{name:"Jamie Newhart", location:"India"},
{name:"Gemma Jane", location:"China", _children:[
{name:"Emily Sykes", location:"South Korea"},
]},
{name:"James Newman", location:"Japan"},
];

Related

Need help translating a nested dictionary into a pandas dataframe

Looking into translating the following nested dictionary which is an API pull from Yelp into a pandas dataframe to run visualization on:
Top 50 Pizzerias in Chicago
{'businesses': [{'alias': 'pequods-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'}],
'coordinates': {'latitude': 41.92187, 'longitude': -87.664486},
'display_phone': '(773) 327-1512',
'distance': 2158.7084581522413,
'id': 'DXwSYgiXqIVNdO9dazel6w',
'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/8QJUNblfCI0EDhOjuIWJ4A/o.jpg',
'is_closed': False,
'location': {'address1': '2207 N Clybourn Ave',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['2207 N Clybourn Ave',
'Chicago, IL 60614'],
'state': 'IL',
'zip_code': '60614'},
'name': "Pequod's Pizzeria",
'phone': '+17733271512',
'price': '$$',
'rating': 4.0,
'review_count': 6586,
'transactions': ['restaurant_reservation', 'delivery'],
'url': 'https://www.yelp.com/biz/pequods-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
{'alias': 'lou-malnatis-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'},
{'alias': 'italian', 'title': 'Italian'},
{'alias': 'sandwiches', 'title': 'Sandwiches'}],
'coordinates': {'latitude': 41.890357,
'longitude': -87.633704},
'display_phone': '(312) 828-9800',
'distance': 4000.9990531720227,
'id': '8vFJH_paXsMocmEO_KAa3w',
'image_url': 'https://s3-media3.fl.yelpcdn.com/bphoto/9FiL-9Pbytyg6usOE02lYg/o.jpg',
'is_closed': False,
'location': {'address1': '439 N Wells St',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['439 N Wells St',
'Chicago, IL 60654'],
'state': 'IL',
'zip_code': '60654'},
'name': "Lou Malnati's Pizzeria",
'phone': '+13128289800',
'price': '$$',
'rating': 4.0,
'review_count': 6368,
'transactions': ['pickup', 'delivery'],
'url': 'https://www.yelp.com/biz/lou-malnatis-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
....]
I've tried the below and iterations of it but haven't had any luck.
df = pd.DataFrame.from_dict(topresponse)
Im really new to coding so any advice would be helpful
response["businesses"] is a list of records, so:
df = pd.DataFrame.from_records(response["businesses"])

Json to Pandas, include "Parents"

With a list of 150+ Neighborhoods , I am using Foursquare API to retrieve nearby venues at 500m radius of a given Neighbourhood. Each Neighbourhood is expected to return 10-20 nearby venues.
Refer to snippet of json result as returned by Foursquare.
With results['response']['groups'][0]['items'], I able to retrieve the nearby venues information and make it a Table as below. However results['response']['groups'][0]['items'] does not have the Neighbourhood ( under headerFullLocation in json) of associated venues.
Q: How can I link the Neighbourhood(headerFullLocation) to its associated nearby venue and add it as a column to table below? Thanks for the advice.
{'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'headerLocation': 'Alexandra Park',
'headerFullLocation': 'Alexandra Park, Toronto',**
'headerLocationGranularity': 'neighborhood',
'totalResults': 138,
'suggestedBounds': {'ne': {'lat': 43.6545000045, 'lng': -79.39379244047241},
'sw': {'lat': 43.645499995499996, 'lng': -79.4062075595276}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '5644dbaa498e7f7534154326',
'**name': 'Maker Pizza',**
'contact': {},
'location': {'address': '59 Cameron St',
'lat': 43.6504011331197,
'lng': -79.39804047841302,
'labeledLatLngs': [{'label': 'display',
'lat': 43.6504011331197,
'lng': -79.39804047841302}],
'distance': 164,
'postalCode': 'M5T 2H1',
'cc': 'CA',
'city': 'Toronto',
'state': 'ON',
'country': 'Canada',
'formattedAddress': ['59 Cameron St', 'Toronto ON M5T 2H1', 'Canada']},
'categories': [{'id': '4bf58dd8d48988d1ca941735',
'name': 'Pizza Place',
'pluralName': 'Pizza Places',
'shortName': 'Pizza',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/pizza_',
'suffix': '.png'},
'primary': True}],
'verified': False,
'stats': {'tipCount': 0,
'usersCount': 0,
'checkinsCount': 0,
'visitsCount': 0},
'beenHere': {'count': 0,
'lastCheckinExpiredAt': 0,
'marked': False,
'unconfirmedCount': 0},
'photos': {'count': 0, 'groups': []},
'hereNow': {'count': 0, 'summary': 'Nobody here', 'groups': []}},
Why don't you just do venues['Neighbourhood'] = response['headerFullLocation']. I am assuming, you send a separate request for each neigbhourhood and plan to concatenate multiple venue dataframes in the end.

Count duplicates in dictionary by specific keys

I have a list of dictionaries and I need to count duplicates by specific keys.
For example:
[
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
If will specify 'age' and 'country' it should return
[
{
'age': 10,
'country': 'USA',
'count': 2
},
{
'age': 10,
'country': 'Canada',
'count': 2
},
{
'age': 15,
'country': 'Canada',
'count': 1
}
]
Or if I will specify 'name' and 'height':
[
{
'name': 'John',
'height': 185,
'count': 2
},
{
'name': 'Mark',
'height': 180,
'count': 2
},
{
'name': 'Doe',
'heigth': 185,
'count': 1
}
]
Maybe there is a way to implement this by Counter?
You can use itertools.groupby with sorted list:
>>> data = [
{'name': 'John', 'age': 10, 'country': 'USA', 'height': 185},
{'name': 'John', 'age': 10, 'country': 'Canada', 'height': 185},
{'name': 'Mark', 'age': 10, 'country': 'USA', 'height': 180},
{'name': 'Mark', 'age': 10, 'country': 'Canada', 'height': 180},
{'name': 'Doe', 'age': 15, 'country': 'Canada', 'height': 185}
]
>>> from itertools import groupby
>>> key = 'age', 'country'
>>> list_sorter = lambda x: tuple(x[k] for k in key)
>>> grouper = lambda x: tuple(x[k] for k in key)
>>> result = [
{**dict(zip(key, k)), 'count': len([*g])}
for k, g in
groupby(sorted(data, key=list_sorter), grouper)
]
>>> result
[{'age': 10, 'country': 'Canada', 'count': 2},
{'age': 10, 'country': 'USA', 'count': 2},
{'age': 15, 'country': 'Canada', 'count': 1}]
>>> key = 'name', 'height'
>>> result = [
{**dict(zip(key, k)), 'count': len([*g])}
for k, g in
groupby(sorted(data, key=list_sorter), grouper)
]
>>> result
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]
If you use pandas then you can use, pandas.DataFrame.groupby, pandas.groupby.size, pandas.Series.to_frame, pandas.DataFrame.reset_index and finally pandas.DataFrame.to_dict with orient='records':
>>> import pandas as pd
>>> df = pd.DataFrame(data)
>>> df.groupby(list(key)).size().to_frame('count').reset_index().to_dict('records')
[{'name': 'Doe', 'height': 185, 'count': 1},
{'name': 'John', 'height': 185, 'count': 2},
{'name': 'Mark', 'height': 180, 'count': 2}]

How to create new rows in pandas by dictionary

I have a problem with pandas DataFrame - I don't understand how I can create new rows and merge them with a dictionary.
For instanse, I have this dataframe:
shops = [{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'Rexona', 'Value': 10},
{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'AXE', 'Value': 20},
{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'Old Spice', 'Value': 30},
{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'Camel', 'Value': 40},
{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'Dove', 'Value': 50},
{'Chain': 'SeQu', 'Shop': 'Rum', 'Location': 'USA', 'Brand': 'Rexona', 'Value': 10},
{'Chain': 'SeQu', 'Shop': 'Rum', 'Location': 'USA', 'Brand': 'CIF', 'Value': 20},
{'Chain': 'SeQu', 'Shop': 'Rum', 'Location': 'USA', 'Brand': 'Old Spice', 'Value': 30},
{'Chain': 'SeQu', 'Shop': 'Rum', 'Location': 'USA', 'Brand': 'Camel', 'Value': 40}]
At the same time, I have a dictionary dataframe with Chain-Brand connection:
chain_brands = [{'Chain': 'SeQu', 'Brand': 'Rexona'},
{'Chain': 'SeQu', 'Brand': 'Axe'},
{'Chain': 'SeQu', 'Brand': 'Old Spice'},
{'Chain': 'SeQu', 'Brand': 'Camel'},
{'Chain': 'SeQu', 'Brand': 'Dove'},
{'Chain': 'SeQu', 'Brand': 'CIF'}]
So, I need to create new rows and fill them with 0, if brand in Null. It should look like this:
output = [{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'Rexona', 'Value': 10},
{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'AXE', 'Value': 20},
{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'Old Spice', 'Value': 30},
{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'Camel', 'Value': 40},
{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'Dove', 'Value': 50},
{'Chain': 'SeQu', 'Shop': 'Rimme', 'Location': 'UK', 'Brand': 'CIF', 'Value': 0},
{'Chain': 'SeQu', 'Shop': 'Rum', 'Location': 'USA', 'Brand': 'Rexona', 'Value': 10},
{'Chain': 'SeQu', 'Shop': 'Rum', 'Location': 'USA', 'Brand': 'CIF', 'Value': 20},
{'Chain': 'SeQu', 'Shop': 'Rum', 'Location': 'USA', 'Brand': 'Old Spice', 'Value': 30},
{'Chain': 'SeQu', 'Shop': 'Rum', 'Location': 'USA', 'Brand': 'Axe', 'Value': 0},
{'Chain': 'SeQu', 'Shop': 'Rum', 'Location': 'USA', 'Brand': 'Camel', 'Value': 40},
{'Chain': 'SeQu', 'Shop': 'Rum', 'Location': 'USA', 'Brand': 'Dove', 'Value': 0}]
Thanks!
You can create a multi-index from the chain_brands dataframe and then use groupby together with reindex to solve this:
mi = pd.MultiIndex.from_arrays(chain_brands.values.T, names=['Chain', 'Brand'])
s = shops.set_index(['Chain', 'Brand']).\
groupby(['Location', 'Shop']).\
apply(lambda x: x.reindex(mi, fill_value=0)).\
drop(columns=['Location', 'Shop']).\
reset_index()
Result:
Location Shop Chain Brand Value
0 UK Rimme SeQu Rexona 10
1 UK Rimme SeQu Axe 0
2 UK Rimme SeQu Old Spice 30
3 UK Rimme SeQu Camel 40
4 UK Rimme SeQu Dove 50
5 UK Rimme SeQu CIF 0
6 USA Rum SeQu Rexona 10
7 USA Rum SeQu Axe 0
8 USA Rum SeQu Old Spice 30
9 USA Rum SeQu Camel 40
10 USA Rum SeQu Dove 0
11 USA Rum SeQu CIF 20

How to filter meta-tags in json response with Python Requests?

I have the following code with the Matchbook betting API.
r17 = s.get('https://matchbook.com/bpapi/rest/events/?sport-ids=15&?after=1486157894&?before=14862442917&')
data1 = r17.json()
for event in data1['events']:
print(event['name'])
print(event['id'])
print(event['sport-id'])
print(event['start'])
print(event['meta-tags'])
which gives the following json output
Bayern Munich vs Schalke
368063
15
2017-02-04T14:35:00.000Z
[{'id': 1, 'url-name': 'sport', 'name': 'Sport', 'type': 'UNKNOWN'}, {'id': 402, 'url-name': 'live-betting', 'name': 'Live Betting', 'type': 'COMPETITION'}, {'id': 4, 'url-name': 'soccer', 'name': 'Soccer', 'type': 'SPORT'}, {'id': 56, 'url-name': 'germany', 'name': 'Germany', 'type': 'COUNTRY'}, {'id': 57, 'url-name': 'bundesliga', 'name': 'Bundesliga', 'type': 'COMPETITION'}, {'id': 4105, 'url-name': 'february-4th-2017', 'name': 'February 4th 2017', 'type': 'DATE'}]
The meta-tags are contained between the [] brackets. How do I filter by these meta-tags?
import pprint
r17 = requests.get('https://matchbook.com/bpapi/rest/events/?sport-ids=15&?after=1486157894&?before=14862442917&')
data = r17.json()
for event in data['events']:
print(event['name'])
pprint.pprint(event['meta-tags'], indent=4)
print('sorted:')
# change k['id'] to k['name'] if you need to sort dict's by name
pprint.pprint(sorted(event['meta-tags'], key=lambda k: k['id']), indent=4)

Categories

Resources