Multi-nested JSON to a pandas DataFrame in Python

I have a JSON response with product details for multiple products, in this structure:
all_content=[{'productcode': '0502570SRE',
'brand': {'code': 'MJ', 'name': 'Marie Jo'},
'series': {'code': '0257', 'name': 'DANAE'},
'family': {'code': '0257SRE', 'name': 'DANAE Red'},
'style': {'code': '0502570'},
'introSeason': {'code': '226', 'name': 'WINTER 2022'},
'seasons': [{'code': '226', 'name': 'WINTER 2022'}],
'otherColors': [{'code': '0502570ZWA'}, {'code': '0502570PIR'}],
'synonyms': [{'code': '0502571SRE'}],
'stayerB2B': False,
'name': [{'language': 'de', 'text': 'DANAE Rot Rioslip'},
{'language': 'sv', 'text': 'DANAE Red '},
{'language': 'en', 'text': 'DANAE Red rio briefs'},
{'language': 'it', 'text': 'DANAE Rouge slip brasiliano'},
{'language': 'fr', 'text': 'DANAE Rouge slip brésilien'},
{'language': 'da', 'text': 'DANAE Red rio briefs'},
{'language': 'nl', 'text': 'DANAE Rood rioslip'},
.......]
What I need is a DataFrame with one row per productcode, containing only the values of specific keys in the sub-dictionaries. For example:
productcode  synonyms_code  name_language_nl
0502570SRE   0502571SRE     rioslip
I've tried a nested for loop, which gives me all the values of one specific key in a sub-dictionary:
for item in all_content:
    synonyms_details = item['synonyms']
    for i in synonyms_details:
        print(i['code'])
How do I get from here to a DataFrame like this?
productcode  synonyms_code  name_language_nl
0502570SRE   0502571SRE     rioslip
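One way to get from the nested loop to a DataFrame: instead of printing, collect one dict per product and build the DataFrame from that list at the end. This is a minimal sketch, assuming each product has `productcode`, `synonyms` and `name` keys as in the sample above. The second record is hypothetical and exists only to show a product without a synonym. Joining multiple synonyms into one comma-separated string is a design choice, not a requirement.

```python
import pandas as pd

# Trimmed sample in the structure shown above; the second record is hypothetical
all_content = [
    {'productcode': '0502570SRE',
     'synonyms': [{'code': '0502571SRE'}],
     'name': [{'language': 'en', 'text': 'DANAE Red rio briefs'},
              {'language': 'nl', 'text': 'DANAE Rood rioslip'}]},
    {'productcode': 'HYPOTHETICAL1',  # hypothetical product with no synonyms
     'synonyms': [],
     'name': [{'language': 'en', 'text': 'example name'}]},
]

rows = []
for item in all_content:
    # Join all synonym codes; None when the product has no synonyms
    syn_codes = ', '.join(s['code'] for s in item.get('synonyms', [])) or None
    # Text of the first Dutch ('nl') name entry, or None if there is none
    nl_name = next((n['text'] for n in item.get('name', [])
                    if n['language'] == 'nl'), None)
    rows.append({'productcode': item['productcode'],
                 'synonyms_code': syn_codes,
                 'name_language_nl': nl_name})

df = pd.DataFrame(rows)
print(df)
```

Building a plain list of dicts and calling `pd.DataFrame` once is usually simpler and faster than growing a DataFrame inside the loop.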

I took another route, using json_normalize to build a flattened df that keeps the original key in a column:
import pandas as pd

# test: json_normalize to extract the list of dicts nested in each product's 'synonyms'
# meta_prefix is needed to avoid a name clash, because 'code' is used as a key at both levels
df_syn = pd.json_normalize(all_content, record_path=['synonyms'], meta='code', meta_prefix='org_')
Result:
code org_code
0 0162934FRO 0162935FRO
1 0241472TFE 0241473TF
I renamed the columns so they can be merged with the original df:
df_syn = df_syn.rename(columns={"code": "syn_code", "org_code":"code"})
Result:
syn_code code
0 0162934FRO 0162935FRO
1 0241472TFE 0241473T
Then I merged the flattened df into the original with a left merge on the shared 'code' key:
df = pd.merge(left=df, right=df_syn, how='left', on='code')
Result (see the last column; it is NaN because not every product has a synonym):
code brand series family style introSeason seasons otherColors synonyms stayerB2B ... composition recycleComposition media assortmentIds preOrderPeriod firstDeliveryDates productGroup language text syn_code
0 0502570SRE {'code': 'MJ', 'name': 'Marie Jo'} {'code': '0257', 'name': 'DANAE'} {'code': '0257SRE', 'name': 'DANAE Red'} {'code': '0502570'} {'code': '226', 'name': 'WINTER 2022'} [{'code': '226', 'name': 'WINTER 2022'}] [{'code': '0502570ZWA'}, {'code': '0502570PIR'}] [] False ... [{'material': [{'language': 'de', 'text': 'Pol... [{'origin': [{'language': 'de', 'text': 'Nicht... [{'type': 'IMAGE', 'imageType': 'No body pictu... [BO_MJ] {'startDate': '2022-01-01', 'endDate': '2022-0... {'common': '2022-09-05', 'deviations': []} {'code': '0SRI1'} nl DANAE Rood rioslip NaN
The next step is to go one level deeper and get the values of a list of lists of lists.
Any suggestions for a more straightforward approach? Extracting every nested value this way is quite time-consuming.
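A more systematic route, sketched under the assumption that the real data follows the sample structure: normalize each nested list separately with `pd.json_normalize(..., record_path=..., meta='productcode')`, reshape it, and left-merge everything onto one row per product. `pivot` turns the language list into `name_language_<lang>` columns. It raises an error if a product has two entries for the same language; in that case `pivot_table` with an aggregation function would be needed. Deeper lists can be reached by passing a list of keys as `record_path`. The second record is hypothetical.

```python
import pandas as pd

# Trimmed sample in the structure shown above; the second record is hypothetical
all_content = [
    {'productcode': '0502570SRE',
     'synonyms': [{'code': '0502571SRE'}],
     'name': [{'language': 'de', 'text': 'DANAE Rot Rioslip'},
              {'language': 'en', 'text': 'DANAE Red rio briefs'},
              {'language': 'nl', 'text': 'DANAE Rood rioslip'}]},
    {'productcode': 'HYPOTHETICAL1',  # hypothetical product with no synonyms
     'synonyms': [],
     'name': [{'language': 'en', 'text': 'example name'}]},
]

# Start from one row per product
base = pd.DataFrame({'productcode': [p['productcode'] for p in all_content]})

# Long table: one row per (product, name entry); pivot languages into columns
names = pd.json_normalize(all_content, record_path='name', meta='productcode')
names_wide = (names.pivot(index='productcode', columns='language', values='text')
                   .add_prefix('name_language_')
                   .reset_index())

# One row per (product, synonym); collapse multiple synonyms into one string
syns = pd.json_normalize(all_content, record_path='synonyms', meta='productcode',
                         record_prefix='synonyms_')
syns = syns.groupby('productcode', as_index=False)['synonyms_code'].agg(', '.join)

# Left merges keep products without synonyms (those cells become NaN)
df = (base.merge(syns, on='productcode', how='left')
          .merge(names_wide, on='productcode', how='left'))
print(df)
```

Each nested list gets its own small, predictable transformation, so adding another field means adding one more normalize-and-merge step rather than another level of loops.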

Related

How to check if value inside of json and get the object that contains the keys with that value?

I have this JSON object
{'result':
{'chk_in': '2022-05-28',
'chk_out': '2022-05-30',
'currency': 'USD',
'rates': [
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0},
{'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0},
{'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0},
],
}}
How do I check whether the 'code' key of any entry in 'rates' has the value 'Expedia', and get that entry?
This is what I tried, but it did not work:
if ('code', 'Expedia') in json_object['result']['rates'].items():
    print(json_object['result']['rates'])
else:
    print('no price')
json_object['result']['rates'] is a list, not a dict, so it doesn't have an items() method. What you want is something like:
[rate for rate in json_object['result']['rates'] if rate['code'] == 'Expedia']
This will give you a list of all the dictionaries in rates matching the criteria you're looking for; there might be none, one, or more.
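If you only need a yes/no answer, or the first match with a fallback like the 'no price' branch in the question, `any()` and `next()` with a default avoid building a whole list. A minimal sketch using the data from the question:

```python
# Data from the question
json_object = {'result': {
    'chk_in': '2022-05-28',
    'chk_out': '2022-05-30',
    'currency': 'USD',
    'rates': [
        {'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0},
        {'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0},
        {'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0},
    ],
}}

rates = json_object['result']['rates']

# True if at least one entry has code 'Expedia'; stops at the first match
has_expedia = any(rate['code'] == 'Expedia' for rate in rates)

# First matching entry, or None when nothing matches
expedia = next((rate for rate in rates if rate['code'] == 'Expedia'), None)

if expedia is not None:
    print(expedia)
else:
    print('no price')
```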
The issue is that the data inside the "rates" key is a list. Take a look at the following:
>>> data = {'result':
{'chk_in': '2022-05-28',
'chk_out': '2022-05-30',
'currency': 'USD',
'rates': [
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0},
{'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0},
{'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0},
],
}}
>>> type(data["result"]["rates"])
<class 'list'>
I'm not sure I understand exactly what you're trying to do, but maybe you're looking for something like this:
>>> data["result"]["rates"]
[{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0}, {'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0}, {'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0}]
>>> for rate in data["result"]["rates"]:
...     code = rate["code"]
...     if code == "Expedia":
...         print(rate)
...
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0}
>>>
You want to look through the list of rates and select the ones with el['code'] == 'Expedia':
d = {'result':
{'chk_in': '2022-05-28',
'chk_out': '2022-05-30',
'currency': 'USD',
'rates': [
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0},
{'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0},
{'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0},
],
}
}
print([el for el in d['result']['rates'] if el['code']=='Expedia'])
# Output: [{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0}]

Need help translating a nested dictionary into a pandas dataframe

I'm looking to translate the following nested dictionary, pulled from the Yelp API, into a pandas DataFrame so I can run visualizations on it:
Top 50 Pizzerias in Chicago
{'businesses': [{'alias': 'pequods-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'}],
'coordinates': {'latitude': 41.92187, 'longitude': -87.664486},
'display_phone': '(773) 327-1512',
'distance': 2158.7084581522413,
'id': 'DXwSYgiXqIVNdO9dazel6w',
'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/8QJUNblfCI0EDhOjuIWJ4A/o.jpg',
'is_closed': False,
'location': {'address1': '2207 N Clybourn Ave',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['2207 N Clybourn Ave',
'Chicago, IL 60614'],
'state': 'IL',
'zip_code': '60614'},
'name': "Pequod's Pizzeria",
'phone': '+17733271512',
'price': '$$',
'rating': 4.0,
'review_count': 6586,
'transactions': ['restaurant_reservation', 'delivery'],
'url': 'https://www.yelp.com/biz/pequods-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
{'alias': 'lou-malnatis-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'},
{'alias': 'italian', 'title': 'Italian'},
{'alias': 'sandwiches', 'title': 'Sandwiches'}],
'coordinates': {'latitude': 41.890357,
'longitude': -87.633704},
'display_phone': '(312) 828-9800',
'distance': 4000.9990531720227,
'id': '8vFJH_paXsMocmEO_KAa3w',
'image_url': 'https://s3-media3.fl.yelpcdn.com/bphoto/9FiL-9Pbytyg6usOE02lYg/o.jpg',
'is_closed': False,
'location': {'address1': '439 N Wells St',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['439 N Wells St',
'Chicago, IL 60654'],
'state': 'IL',
'zip_code': '60654'},
'name': "Lou Malnati's Pizzeria",
'phone': '+13128289800',
'price': '$$',
'rating': 4.0,
'review_count': 6368,
'transactions': ['pickup', 'delivery'],
'url': 'https://www.yelp.com/biz/lou-malnatis-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
....]
I've tried the following, and variations of it, without any luck:
df = pd.DataFrame.from_dict(topresponse)
I'm really new to coding, so any advice would be helpful.
topresponse["businesses"] is a list of records (one dict per business), so:
df = pd.DataFrame.from_records(topresponse["businesses"])
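`from_records` leaves nested fields such as `coordinates` and `location` as dicts inside single cells. For plotting it is usually more convenient to flatten them into their own columns, which `pd.json_normalize` does using dotted column names. A sketch on a trimmed copy of the two businesses above (only some fields kept); turning `categories` into a comma-separated string is just one illustrative choice:

```python
import pandas as pd

# Trimmed version of the Yelp response shown above (only a few fields kept)
topresponse = {'businesses': [
    {'alias': 'pequods-pizzeria-chicago',
     'categories': [{'alias': 'pizza', 'title': 'Pizza'}],
     'coordinates': {'latitude': 41.92187, 'longitude': -87.664486},
     'location': {'city': 'Chicago', 'zip_code': '60614'},
     'name': "Pequod's Pizzeria",
     'rating': 4.0,
     'review_count': 6586},
    {'alias': 'lou-malnatis-pizzeria-chicago',
     'categories': [{'alias': 'pizza', 'title': 'Pizza'},
                    {'alias': 'italian', 'title': 'Italian'},
                    {'alias': 'sandwiches', 'title': 'Sandwiches'}],
     'coordinates': {'latitude': 41.890357, 'longitude': -87.633704},
     'location': {'city': 'Chicago', 'zip_code': '60654'},
     'name': "Lou Malnati's Pizzeria",
     'rating': 4.0,
     'review_count': 6368},
]}

# Nested dicts become dotted columns, e.g. 'coordinates.latitude', 'location.city'
df = pd.json_normalize(topresponse['businesses'])

# Lists of dicts stay as Python lists; turn categories into a readable string
df['category_titles'] = df['categories'].apply(
    lambda cats: ', '.join(c['title'] for c in cats))

print(df[['name', 'coordinates.latitude', 'location.city', 'category_titles']])
```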

I'm trying to normalize the documents column within the dataframe

[{
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': '89a63243-74c7-e611-8197-06b69393ae39',
'name': '40',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_40_Global_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_40_global_technical_data_sheet/en',
'language': 'English',
'region': 'Global',
'revision': '20210812',
'id': '2bc4102f-8df7-e611-819b-06b69393ae39'
}]
}, {
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': 'dddd0468-79c7-e611-8197-06b69393ae39',
'name': '460B',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_460B_MEA_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_460b_mea_technical_data_sheet/en',
'language': 'English',
'region': 'MEA',
'revision': '20210812',
'id': '0e63bc98-c343-e811-80fd-005056857ef3'
}]
}, {
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': 'cd695035-76c7-e611-8197-06b69393ae39',
'name': '60',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_60_MEA_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_60_mea_technical_data_sheet/en',
'language': 'English',
'region': 'MEA',
'revision': '20210812',
'id': '733946d8-c343-e811-80fd-005056857ef3'
}]
}, {
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': 'c99a8cc9-79c7-e611-8197-06b69393ae39',
'name': 'B500B',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_B500B_MEA_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_b500b_mea_technical_data_sheet/en',
'language': 'English',
'region': 'MEA',
'revision': '20210812',
'id': 'bc25a637-c443-e811-80fd-005056857ef3'
}]
}]
This is the code I wrote to convert this JSON to a DataFrame while normalizing 'documents':
gr2 = pd.json_normalize(result, ['documents'], meta = ['regions', 'name', 'description', 'grade_id', 'processingTechniques','summary', 'applications'])
gr2['product_id'] = prod_id
gr2.head()
result is the JSON shown above.
After running the above code, I'm getting this error
Can anyone help me with this? I just want 'documents' to be normalized along with the other columns.
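The error text isn't shown, so this is not a diagnosis. One alternative that avoids passing the list-valued fields (`regions`, `applications`, `processingTechniques`) through `meta` at all: normalize `documents` with only the scalar key `grade_id` as meta, flatten the grade-level fields separately, and merge on `grade_id`. A sketch on two of the records above, trimmed; the `doc_` prefix is an arbitrary choice to keep document columns distinguishable, and `prod_id` from the question is left out because its definition isn't shown.

```python
import pandas as pd

# Two of the records shown above, trimmed to fewer fields
apps = ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS',
        'CUT AND BEND', 'EPOXY COATING ']
result = [
    {'processingTechniques': ['Hot rolling'], 'applications': apps,
     'regions': ['MEA'], 'grade_id': '89a63243-74c7-e611-8197-06b69393ae39',
     'name': '40',
     'documents': [{'documentType': 'TDS',
                    'title': 'Rebar in Coil_40_Global_Technical_Data_Sheet',
                    'region': 'Global',
                    'id': '2bc4102f-8df7-e611-819b-06b69393ae39'}]},
    {'processingTechniques': ['Hot rolling'], 'applications': apps,
     'regions': ['MEA'], 'grade_id': 'dddd0468-79c7-e611-8197-06b69393ae39',
     'name': '460B',
     'documents': [{'documentType': 'TDS',
                    'title': 'Rebar in Coil_460B_MEA_Technical_Data_Sheet',
                    'region': 'MEA',
                    'id': '0e63bc98-c343-e811-80fd-005056857ef3'}]},
]

# One row per document; only the scalar key grade_id is carried along as meta
docs = pd.json_normalize(result, record_path='documents', meta=['grade_id'],
                         record_prefix='doc_')

# One row per grade with all other fields; drop the raw 'documents' lists
grades = pd.json_normalize(result).drop(columns='documents')

# Attach the grade-level columns (list-valued ones included) to each document row
gr2 = docs.merge(grades, on='grade_id', how='left')
print(gr2[['grade_id', 'name', 'doc_documentType', 'doc_title']])
```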

JSON to pandas, include "Parents"

With a list of 150+ neighbourhoods, I am using the Foursquare API to retrieve venues within a 500 m radius of each neighbourhood. Each neighbourhood is expected to return 10-20 nearby venues.
Below is a snippet of the JSON result as returned by Foursquare.
Using results['response']['groups'][0]['items'], I am able to retrieve the nearby venues' information and turn it into a table as below. However, results['response']['groups'][0]['items'] does not include the neighbourhood (stored under headerFullLocation in the JSON) of the associated venues.
Q: How can I link the neighbourhood (headerFullLocation) to its associated nearby venues and add it as a column to the table below? Thanks for the advice.
{'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'headerLocation': 'Alexandra Park',
'headerFullLocation': 'Alexandra Park, Toronto',
'headerLocationGranularity': 'neighborhood',
'totalResults': 138,
'suggestedBounds': {'ne': {'lat': 43.6545000045, 'lng': -79.39379244047241},
'sw': {'lat': 43.645499995499996, 'lng': -79.4062075595276}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '5644dbaa498e7f7534154326',
'name': 'Maker Pizza',
'contact': {},
'location': {'address': '59 Cameron St',
'lat': 43.6504011331197,
'lng': -79.39804047841302,
'labeledLatLngs': [{'label': 'display',
'lat': 43.6504011331197,
'lng': -79.39804047841302}],
'distance': 164,
'postalCode': 'M5T 2H1',
'cc': 'CA',
'city': 'Toronto',
'state': 'ON',
'country': 'Canada',
'formattedAddress': ['59 Cameron St', 'Toronto ON M5T 2H1', 'Canada']},
'categories': [{'id': '4bf58dd8d48988d1ca941735',
'name': 'Pizza Place',
'pluralName': 'Pizza Places',
'shortName': 'Pizza',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/pizza_',
'suffix': '.png'},
'primary': True}],
'verified': False,
'stats': {'tipCount': 0,
'usersCount': 0,
'checkinsCount': 0,
'visitsCount': 0},
'beenHere': {'count': 0,
'lastCheckinExpiredAt': 0,
'marked': False,
'unconfirmedCount': 0},
'photos': {'count': 0, 'groups': []},
'hereNow': {'count': 0, 'summary': 'Nobody here', 'groups': []}},
Why not just do venues['Neighbourhood'] = results['response']['headerFullLocation']? I'm assuming you send a separate request for each neighbourhood and plan to concatenate the venue DataFrames at the end.
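The per-request approach from the answer, sketched end to end. The `responses` list is hypothetical: each dict stands in for one `results['response']`, heavily trimmed; the second neighbourhood and its venues are made up for illustration. The neighbourhood sits one level above the venue items, so it is copied onto every venue row before the tables are stacked.

```python
import pandas as pd

# Hypothetical, trimmed responses for two neighbourhoods; each dict mirrors
# results['response'] from the question
responses = [
    {'headerFullLocation': 'Alexandra Park, Toronto',
     'groups': [{'items': [
         {'venue': {'name': 'Maker Pizza',
                    'location': {'lat': 43.6504011331197,
                                 'lng': -79.39804047841302}}}]}]},
    {'headerFullLocation': 'Example Neighbourhood, Toronto',  # hypothetical
     'groups': [{'items': [
         {'venue': {'name': 'Example Venue A', 'location': {'lat': 43.65, 'lng': -79.40}}},
         {'venue': {'name': 'Example Venue B', 'location': {'lat': 43.66, 'lng': -79.41}}}]}]},
]

frames = []
for response in responses:
    # One row per venue; nested keys become columns such as 'venue.name'
    venues = pd.json_normalize(response['groups'][0]['items'])
    # Copy the parent-level neighbourhood onto every venue row
    venues['Neighbourhood'] = response['headerFullLocation']
    frames.append(venues)

# Stack the per-neighbourhood tables into a single DataFrame
all_venues = pd.concat(frames, ignore_index=True)
print(all_venues[['Neighbourhood', 'venue.name', 'venue.location.lat']])
```

If all responses are already collected in a list, `pd.json_normalize(responses, record_path=['groups', 'items'], meta=['headerFullLocation'])` should produce the same rows in a single call, with the parent field carried as a `headerFullLocation` column.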

How to filter meta-tags in json response with Python Requests?

I have the following code, using the Matchbook betting API:
r17 = s.get('https://matchbook.com/bpapi/rest/events/?sport-ids=15&?after=1486157894&?before=14862442917&')
data1 = r17.json()
for event in data1['events']:
    print(event['name'])
    print(event['id'])
    print(event['sport-id'])
    print(event['start'])
    print(event['meta-tags'])
which gives the following output:
Bayern Munich vs Schalke
368063
15
2017-02-04T14:35:00.000Z
[{'id': 1, 'url-name': 'sport', 'name': 'Sport', 'type': 'UNKNOWN'}, {'id': 402, 'url-name': 'live-betting', 'name': 'Live Betting', 'type': 'COMPETITION'}, {'id': 4, 'url-name': 'soccer', 'name': 'Soccer', 'type': 'SPORT'}, {'id': 56, 'url-name': 'germany', 'name': 'Germany', 'type': 'COUNTRY'}, {'id': 57, 'url-name': 'bundesliga', 'name': 'Bundesliga', 'type': 'COMPETITION'}, {'id': 4105, 'url-name': 'february-4th-2017', 'name': 'February 4th 2017', 'type': 'DATE'}]
The meta-tags are the list of dicts between the [] brackets. How do I filter events by these meta-tags?
import pprint
import requests

r17 = requests.get('https://matchbook.com/bpapi/rest/events/?sport-ids=15&?after=1486157894&?before=14862442917&')
data = r17.json()
for event in data['events']:
    print(event['name'])
    pprint.pprint(event['meta-tags'], indent=4)
    print('sorted:')
    # change k['id'] to k['name'] to sort the tag dicts by name instead
    pprint.pprint(sorted(event['meta-tags'], key=lambda k: k['id']), indent=4)
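The answer above prints and sorts the tags but does not filter them. Each tag is a plain dict, so filtering is ordinary list and dict work. A sketch using the event from the question's output (other event fields omitted); `has_tag` is a hypothetical helper that matches on the tag's `type` and `url-name`:

```python
# The event from the output above, rebuilt as a dict
data = {'events': [
    {'name': 'Bayern Munich vs Schalke', 'id': 368063, 'sport-id': 15,
     'start': '2017-02-04T14:35:00.000Z',
     'meta-tags': [
         {'id': 1, 'url-name': 'sport', 'name': 'Sport', 'type': 'UNKNOWN'},
         {'id': 402, 'url-name': 'live-betting', 'name': 'Live Betting', 'type': 'COMPETITION'},
         {'id': 4, 'url-name': 'soccer', 'name': 'Soccer', 'type': 'SPORT'},
         {'id': 56, 'url-name': 'germany', 'name': 'Germany', 'type': 'COUNTRY'},
         {'id': 57, 'url-name': 'bundesliga', 'name': 'Bundesliga', 'type': 'COMPETITION'},
         {'id': 4105, 'url-name': 'february-4th-2017', 'name': 'February 4th 2017', 'type': 'DATE'}]},
]}


def has_tag(event, tag_type, url_name):
    """Hypothetical helper: True if any meta-tag has this type and url-name."""
    return any(tag['type'] == tag_type and tag['url-name'] == url_name
               for tag in event.get('meta-tags', []))


# Keep only events tagged with the Bundesliga competition
bundesliga = [e for e in data['events'] if has_tag(e, 'COMPETITION', 'bundesliga')]

# Map each event name to the names of its COMPETITION tags
competitions = {e['name']: [t['name'] for t in e['meta-tags'] if t['type'] == 'COMPETITION']
                for e in data['events']}

print([e['name'] for e in bundesliga])
print(competitions)
```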
