I have a JSON response with the product details of multiple products in this structure:
all_content=[{'productcode': '0502570SRE',
'brand': {'code': 'MJ', 'name': 'Marie Jo'},
'series': {'code': '0257', 'name': 'DANAE'},
'family': {'code': '0257SRE', 'name': 'DANAE Red'},
'style': {'code': '0502570'},
'introSeason': {'code': '226', 'name': 'WINTER 2022'},
'seasons': [{'code': '226', 'name': 'WINTER 2022'}],
'otherColors': [{'code': '0502570ZWA'}, {'code': '0502570PIR'}],
'synonyms': [{'code': '0502571SRE'}],
'stayerB2B': False,
'name': [{'language': 'de', 'text': 'DANAE Rot Rioslip'},
{'language': 'sv', 'text': 'DANAE Red '},
{'language': 'en', 'text': 'DANAE Red rio briefs'},
{'language': 'it', 'text': 'DANAE Rouge slip brasiliano'},
{'language': 'fr', 'text': 'DANAE Rouge slip brésilien'},
{'language': 'da', 'text': 'DANAE Red rio briefs'},
{'language': 'nl', 'text': 'DANAE Rood rioslip'},
.......]
What I need is a dataframe that has, for each productcode, only the values of specific keys in the subdictionaries. For example:
productcode synonyms_code name_language_nl
0522570SRE 0522571SRE rioslip
I've tried a nested for loop which gives me a list of all values of one specific key - value pair in a subdict.
for item in all_content:
    synonyms_details = item['synonyms']
    for i in synonyms_details:
        print(i['code'])
How do I get from here to a DataFrame like this?
productcode synonyms_code name_language_nl
0522570SRE 0522571SRE rioslip
I took another route with json_normalize to build a flattened df that keeps the original key in a column:
# test with json_normalize to extract a list of dicts nested in a list of dicts
# meta_prefix is needed to avoid a column name conflict, because 'code' is used as a key at more than one level
df_syn = pd.json_normalize(all_content, record_path=['synonyms'], meta='code', meta_prefix='org_')
result
code org_code
0 0162934FRO 0162935FRO
1 0241472TFE 0241473TF
Changed the column names for the merge with the original df:
df_syn = df_syn.rename(columns={"code": "syn_code", "org_code":"code"})
result
syn_code code
0 0162934FRO 0162935FRO
1 0241472TFE 0241473T
Merged the flattened df into the original df with a left merge on the shared 'code' key:
df = pd.merge(left=df, right=df_syn, how='left', left_on='code', right_on='code')
Result: see the last column. It is NaN where a product has no synonym.
code brand series family style introSeason seasons otherColors synonyms stayerB2B ... composition recycleComposition media assortmentIds preOrderPeriod firstDeliveryDates productGroup language text syn_code
0 0502570SRE {'code': 'MJ', 'name': 'Marie Jo'} {'code': '0257', 'name': 'DANAE'} {'code': '0257SRE', 'name': 'DANAE Red'} {'code': '0502570'} {'code': '226', 'name': 'WINTER 2022'} [{'code': '226', 'name': 'WINTER 2022'}] [{'code': '0502570ZWA'}, {'code': '0502570PIR'}] [] False ... [{'material': [{'language': 'de', 'text': 'Pol... [{'origin': [{'language': 'de', 'text': 'Nicht... [{'type': 'IMAGE', 'imageType': 'No body pictu... [BO_MJ] {'startDate': '2022-01-01', 'endDate': '2022-0... {'common': '2022-09-05', 'deviations': []} {'code': '0SRI1'} nl DANAE Rood rioslip NaN
The next step is to go one level deeper and get the values of a list of lists of lists.
Any suggestions for a more straightforward way? Extracting data like this for every nested value is quite time-consuming.
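One more direct route, offered only as a sketch: build one flat row per product in plain Python and hand the rows to pandas at the end. The sample below is a trimmed copy of the structure at the top of the question, and its second product is invented to show the missing-value case. The helper name `flatten_product` and the choice to join several synonym codes with a comma are assumptions made for illustration. It uses `productcode` as the top-level key, as in the sample; the later output suggests the real data may call it `code`.

```python
# Trimmed sample in the shape shown in the question.
# The second product is made up to show what happens without a synonym.
all_content = [
    {
        "productcode": "0502570SRE",
        "synonyms": [{"code": "0502571SRE"}],
        "name": [
            {"language": "en", "text": "DANAE Red rio briefs"},
            {"language": "nl", "text": "DANAE Rood rioslip"},
        ],
    },
    {
        "productcode": "0502570ZWA",
        "synonyms": [],
        "name": [{"language": "en", "text": "hypothetical English name"}],
    },
]


def flatten_product(item, language="nl"):
    """Return one flat dict holding only the nested values of interest."""
    synonym_codes = [s["code"] for s in item.get("synonyms", [])]
    # First name entry in the requested language, or None if there is none.
    name = next(
        (n["text"] for n in item.get("name", []) if n["language"] == language),
        None,
    )
    return {
        "productcode": item["productcode"],
        # Several synonyms are joined into one cell; None when there are none.
        "synonyms_code": ", ".join(synonym_codes) or None,
        f"name_language_{language}": name,
    }


rows = [flatten_product(item) for item in all_content]
for row in rows:
    print(row)
```

Passing `rows` to `pd.DataFrame(rows)` then gives one row per product, with missing values where a product has no synonym or no name in the requested language. Deeper levels can be handled the same way by adding one more comprehension inside the helper.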
Related
I have this JSON object
{'result':
{'chk_in': '2022-05-28',
'chk_out': '2022-05-30',
'currency': 'USD',
'rates': [
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0},
{'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0},
{'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0},
],
}}
How do I check whether any of the 'code' keys inside 'rates' has the value 'Expedia', and get the matching entry?
This is what I tried but did not succeed.
if ('code','Expedia') in json_object['result']['rates'].items():
    print(json_object['result']['rates'])
else:
    print('no price')
json_object['result']['rates'] is a list, not a dict, so it doesn't have an items() method. What you want is something like:
[rate for rate in json_object['result']['rates'] if rate['code'] == 'Expedia']
This will give you a list of all the dictionaries in rates matching the criteria you're looking for; there might be none, one, or more.
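To keep the "print the match, otherwise print 'no price'" logic from the question, one possible sketch uses `next()` with a default value. The name `expedia_rate` is only for illustration, and this version takes the first match rather than all of them.

```python
json_object = {
    "result": {
        "chk_in": "2022-05-28",
        "chk_out": "2022-05-30",
        "currency": "USD",
        "rates": [
            {"code": "Expedia", "name": "Expedia", "rate": 299.0, "tax": 70.0},
            {"code": "BookingCom", "name": "Booking.com", "rate": 299.0, "tax": 73.0},
            {"code": "CtripTA", "name": "Trip.com", "rate": 297.0, "tax": 66.0},
        ],
    }
}

rates = json_object["result"]["rates"]

# First dictionary whose 'code' is 'Expedia', or None if there is none.
expedia_rate = next((r for r in rates if r.get("code") == "Expedia"), None)

if expedia_rate is not None:
    print(expedia_rate)
else:
    print("no price")
```

If only a yes/no answer is needed, `any(r.get("code") == "Expedia" for r in rates)` avoids building a list at all.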
The issue is that the data inside the "rates" key is a list. Take a look at the following:
>>> data = {'result':
{'chk_in': '2022-05-28',
'chk_out': '2022-05-30',
'currency': 'USD',
'rates': [
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0},
{'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0},
{'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0},
],
}}
>>> type(data["result"]["rates"])
<class 'list'>
I'm not sure if I understand what you're trying to do exactly but maybe you are looking for something like this:
>>> data["result"]["rates"]
[{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0}, {'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0}, {'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0}]
>>> for rate in data["result"]["rates"]:
...     code = rate["code"]
...     if code == "Expedia":
...         print(rate)
...
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0}
>>>
You want to look through the list of rates and search for the ones with el['code'] == 'Expedia':
d = {'result':
{'chk_in': '2022-05-28',
'chk_out': '2022-05-30',
'currency': 'USD',
'rates': [
{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0},
{'code': 'BookingCom', 'name': 'Booking.com', 'rate': 299.0, 'tax': 73.0},
{'code': 'CtripTA', 'name': 'Trip.com', 'rate': 297.0, 'tax': 66.0},
],
}
}
print([el for el in d['result']['rates'] if el['code']=='Expedia'])
>>> [{'code': 'Expedia', 'name': 'Expedia', 'rate': 299.0, 'tax': 70.0}]
I'm looking to convert the following nested dictionary, which is an API pull from Yelp, into a pandas dataframe to run visualizations on:
Top 50 Pizzerias in Chicago
{'businesses': [{'alias': 'pequods-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'}],
'coordinates': {'latitude': 41.92187, 'longitude': -87.664486},
'display_phone': '(773) 327-1512',
'distance': 2158.7084581522413,
'id': 'DXwSYgiXqIVNdO9dazel6w',
'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/8QJUNblfCI0EDhOjuIWJ4A/o.jpg',
'is_closed': False,
'location': {'address1': '2207 N Clybourn Ave',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['2207 N Clybourn Ave',
'Chicago, IL 60614'],
'state': 'IL',
'zip_code': '60614'},
'name': "Pequod's Pizzeria",
'phone': '+17733271512',
'price': '$$',
'rating': 4.0,
'review_count': 6586,
'transactions': ['restaurant_reservation', 'delivery'],
'url': 'https://www.yelp.com/biz/pequods-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
{'alias': 'lou-malnatis-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'},
{'alias': 'italian', 'title': 'Italian'},
{'alias': 'sandwiches', 'title': 'Sandwiches'}],
'coordinates': {'latitude': 41.890357,
'longitude': -87.633704},
'display_phone': '(312) 828-9800',
'distance': 4000.9990531720227,
'id': '8vFJH_paXsMocmEO_KAa3w',
'image_url': 'https://s3-media3.fl.yelpcdn.com/bphoto/9FiL-9Pbytyg6usOE02lYg/o.jpg',
'is_closed': False,
'location': {'address1': '439 N Wells St',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['439 N Wells St',
'Chicago, IL 60654'],
'state': 'IL',
'zip_code': '60654'},
'name': "Lou Malnati's Pizzeria",
'phone': '+13128289800',
'price': '$$',
'rating': 4.0,
'review_count': 6368,
'transactions': ['pickup', 'delivery'],
'url': 'https://www.yelp.com/biz/lou-malnatis-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
....]
I've tried the below and iterations of it but haven't had any luck.
df = pd.DataFrame.from_dict(topresponse)
I'm really new to coding, so any advice would be helpful.
response["businesses"] is a list of records, so:
df = pd.DataFrame.from_records(response["businesses"])
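With from_records, nested dictionaries such as 'coordinates' and 'location' stay as dictionaries inside single columns. If separate columns are wanted, one option is to flatten each record first. The sketch below does that in plain Python; `flatten` is a hypothetical helper, the dotted column names are a choice made here, and the sample keeps only a few of the fields from the question.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts into one dict with dotted keys; lists are kept as-is."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dictionaries, carrying the key prefix along.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


# Cut-down version of the Yelp response from the question.
response = {
    "businesses": [
        {
            "name": "Pequod's Pizzeria",
            "rating": 4.0,
            "review_count": 6586,
            "coordinates": {"latitude": 41.92187, "longitude": -87.664486},
            "location": {"city": "Chicago", "zip_code": "60614"},
        },
        {
            "name": "Lou Malnati's Pizzeria",
            "rating": 4.0,
            "review_count": 6368,
            "coordinates": {"latitude": 41.890357, "longitude": -87.633704},
            "location": {"city": "Chicago", "zip_code": "60654"},
        },
    ]
}

rows = [flatten(business) for business in response["businesses"]]
print(rows[0])
```

`pd.DataFrame(rows)` then has columns such as `coordinates.latitude` and `location.zip_code`. `pd.json_normalize(response["businesses"])` produces similarly dotted columns in a single call.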
[{
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': '89a63243-74c7-e611-8197-06b69393ae39',
'name': '40',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_40_Global_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_40_global_technical_data_sheet/en',
'language': 'English',
'region': 'Global',
'revision': '20210812',
'id': '2bc4102f-8df7-e611-819b-06b69393ae39'
}]
}, {
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': 'dddd0468-79c7-e611-8197-06b69393ae39',
'name': '460B',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_460B_MEA_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_460b_mea_technical_data_sheet/en',
'language': 'English',
'region': 'MEA',
'revision': '20210812',
'id': '0e63bc98-c343-e811-80fd-005056857ef3'
}]
}, {
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': 'cd695035-76c7-e611-8197-06b69393ae39',
'name': '60',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_60_MEA_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_60_mea_technical_data_sheet/en',
'language': 'English',
'region': 'MEA',
'revision': '20210812',
'id': '733946d8-c343-e811-80fd-005056857ef3'
}]
}, {
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': 'c99a8cc9-79c7-e611-8197-06b69393ae39',
'name': 'B500B',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_B500B_MEA_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_b500b_mea_technical_data_sheet/en',
'language': 'English',
'region': 'MEA',
'revision': '20210812',
'id': 'bc25a637-c443-e811-80fd-005056857ef3'
}]
}]
The code I wrote to convert this JSON to a dataframe, normalizing the documents, is this:
gr2 = pd.json_normalize(result, ['documents'], meta = ['regions', 'name', 'description', 'grade_id', 'processingTechniques','summary', 'applications'])
gr2['product_id'] = prod_id
gr2.head()
result is the JSON shown above.
After running the above code, I'm getting an error.
Can anyone help me with this? I just want the documents to be normalised along with the other columns.
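Since the error message is not shown, the following is only a sketch of an alternative that avoids passing list-valued fields as meta: build one row per document by hand and copy the parent fields onto it. The sample is cut down to two grades and a few keys from the data above, and prefixing the document keys with `document_` is an assumption made here to keep them apart from the parent keys.

```python
# Cut-down copy of the data from the question (two grades, fewer keys).
result = [
    {
        "regions": ["MEA"],
        "name": "40",
        "grade_id": "89a63243-74c7-e611-8197-06b69393ae39",
        "documents": [
            {"documentType": "TDS", "region": "Global", "revision": "20210812"}
        ],
    },
    {
        "regions": ["MEA"],
        "name": "460B",
        "grade_id": "dddd0468-79c7-e611-8197-06b69393ae39",
        "documents": [
            {"documentType": "TDS", "region": "MEA", "revision": "20210812"}
        ],
    },
]

meta_keys = ["regions", "name", "grade_id"]

rows = []
for grade in result:
    # Parent fields are repeated on every document row of this grade.
    meta = {key: grade.get(key) for key in meta_keys}
    for document in grade.get("documents") or []:
        row = {f"document_{key}": value for key, value in document.items()}
        row.update(meta)
        rows.append(row)

for row in rows:
    print(row)
```

`pd.DataFrame(rows)` gives one row per document, with list-valued fields such as `regions` kept as lists in their cells.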
I have a list of 150+ neighbourhoods and am using the Foursquare API to retrieve nearby venues within a 500 m radius of each one. Each neighbourhood is expected to return 10-20 nearby venues.
Refer to the snippet of the JSON result returned by Foursquare.
With results['response']['groups'][0]['items'] I am able to retrieve the nearby venue information and turn it into a table. However, results['response']['groups'][0]['items'] does not contain the neighbourhood (headerFullLocation in the JSON) that the venues belong to.
Q: How can I link the neighbourhood (headerFullLocation) to its associated nearby venues and add it as a column to the table? Thanks for the advice.
{'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'headerLocation': 'Alexandra Park',
'headerFullLocation': 'Alexandra Park, Toronto',
'headerLocationGranularity': 'neighborhood',
'totalResults': 138,
'suggestedBounds': {'ne': {'lat': 43.6545000045, 'lng': -79.39379244047241},
'sw': {'lat': 43.645499995499996, 'lng': -79.4062075595276}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '5644dbaa498e7f7534154326',
'name': 'Maker Pizza',
'contact': {},
'location': {'address': '59 Cameron St',
'lat': 43.6504011331197,
'lng': -79.39804047841302,
'labeledLatLngs': [{'label': 'display',
'lat': 43.6504011331197,
'lng': -79.39804047841302}],
'distance': 164,
'postalCode': 'M5T 2H1',
'cc': 'CA',
'city': 'Toronto',
'state': 'ON',
'country': 'Canada',
'formattedAddress': ['59 Cameron St', 'Toronto ON M5T 2H1', 'Canada']},
'categories': [{'id': '4bf58dd8d48988d1ca941735',
'name': 'Pizza Place',
'pluralName': 'Pizza Places',
'shortName': 'Pizza',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/pizza_',
'suffix': '.png'},
'primary': True}],
'verified': False,
'stats': {'tipCount': 0,
'usersCount': 0,
'checkinsCount': 0,
'visitsCount': 0},
'beenHere': {'count': 0,
'lastCheckinExpiredAt': 0,
'marked': False,
'unconfirmedCount': 0},
'photos': {'count': 0, 'groups': []},
'hereNow': {'count': 0, 'summary': 'Nobody here', 'groups': []}},
Why don't you just do venues['Neighbourhood'] = response['headerFullLocation']? I am assuming you send a separate request for each neighbourhood and plan to concatenate the venue dataframes at the end.
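A sketch of that idea, assuming one response per neighbourhood: the neighbourhood is read from headerFullLocation and attached to every venue row built from that response. Here `response` stands for results['response']; the function name `venues_with_neighbourhood`, the selected venue fields, and the second (empty) response are made up for illustration.

```python
def venues_with_neighbourhood(response):
    """Return one row per venue, tagged with the response's neighbourhood."""
    neighbourhood = response["headerFullLocation"]
    rows = []
    for item in response["groups"][0]["items"]:
        venue = item["venue"]
        rows.append(
            {
                "Neighbourhood": neighbourhood,
                "name": venue["name"],
                "lat": venue["location"]["lat"],
                "lng": venue["location"]["lng"],
            }
        )
    return rows


# Cut-down responses: the first follows the snippet in the question,
# the second is invented to show how results are concatenated.
responses = [
    {
        "headerFullLocation": "Alexandra Park, Toronto",
        "groups": [
            {
                "items": [
                    {
                        "venue": {
                            "name": "Maker Pizza",
                            "location": {
                                "lat": 43.6504011331197,
                                "lng": -79.39804047841302,
                            },
                        }
                    }
                ]
            }
        ],
    },
    {
        "headerFullLocation": "Example Neighbourhood, Toronto",
        "groups": [{"items": []}],
    },
]

all_rows = []
for response in responses:
    all_rows.extend(venues_with_neighbourhood(response))
print(all_rows)
```

`pd.DataFrame(all_rows)` then yields a single table in which each venue carries its neighbourhood.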
I have the following code with the Matchbook betting API.
r17 = s.get('https://matchbook.com/bpapi/rest/events/?sport-ids=15&?after=1486157894&?before=14862442917&')
data1 = r17.json()
for event in data1['events']:
    print(event['name'])
    print(event['id'])
    print(event['sport-id'])
    print(event['start'])
    print(event['meta-tags'])
which gives the following json output
Bayern Munich vs Schalke
368063
15
2017-02-04T14:35:00.000Z
[{'id': 1, 'url-name': 'sport', 'name': 'Sport', 'type': 'UNKNOWN'}, {'id': 402, 'url-name': 'live-betting', 'name': 'Live Betting', 'type': 'COMPETITION'}, {'id': 4, 'url-name': 'soccer', 'name': 'Soccer', 'type': 'SPORT'}, {'id': 56, 'url-name': 'germany', 'name': 'Germany', 'type': 'COUNTRY'}, {'id': 57, 'url-name': 'bundesliga', 'name': 'Bundesliga', 'type': 'COMPETITION'}, {'id': 4105, 'url-name': 'february-4th-2017', 'name': 'February 4th 2017', 'type': 'DATE'}]
The meta-tags are contained between the [] brackets. How do I filter by these meta-tags?
import pprint
import requests
r17 = requests.get('https://matchbook.com/bpapi/rest/events/?sport-ids=15&?after=1486157894&?before=14862442917&')
data = r17.json()
for event in data['events']:
    print(event['name'])
    pprint.pprint(event['meta-tags'], indent=4)
    print('sorted:')
    # change k['id'] to k['name'] if you need to sort the dicts by name
    pprint.pprint(sorted(event['meta-tags'], key=lambda k: k['id']), indent=4)
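The snippet above sorts the tags; to filter on them instead, one possible sketch is shown below. It works on a hard-coded event copied from the output in the question rather than calling the API, and `has_tag` is a hypothetical helper.

```python
# One event with a subset of the meta-tags printed in the question.
data = {
    "events": [
        {
            "name": "Bayern Munich vs Schalke",
            "id": 368063,
            "meta-tags": [
                {"id": 4, "url-name": "soccer", "name": "Soccer", "type": "SPORT"},
                {"id": 56, "url-name": "germany", "name": "Germany", "type": "COUNTRY"},
                {"id": 57, "url-name": "bundesliga", "name": "Bundesliga", "type": "COMPETITION"},
            ],
        }
    ]
}


def has_tag(event, tag_type, tag_name):
    """True if the event carries a meta-tag with this type and name."""
    return any(
        tag["type"] == tag_type and tag["name"] == tag_name
        for tag in event["meta-tags"]
    )


# Keep only the events that carry a given tag.
bundesliga_events = [
    e for e in data["events"] if has_tag(e, "COMPETITION", "Bundesliga")
]

# Keep only the tags of one event that have a given type.
first_event = data["events"][0]
country_tags = [t for t in first_event["meta-tags"] if t["type"] == "COUNTRY"]

print([e["name"] for e in bundesliga_events])
print(country_tags)
```

The same pattern works for any other key in the tag dictionaries, for example matching on 'url-name' instead of 'name'.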