I'm trying to normalize the documents column within the dataframe - python

[{
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': '89a63243-74c7-e611-8197-06b69393ae39',
'name': '40',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_40_Global_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_40_global_technical_data_sheet/en',
'language': 'English',
'region': 'Global',
'revision': '20210812',
'id': '2bc4102f-8df7-e611-819b-06b69393ae39'
}]
}, {
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': 'dddd0468-79c7-e611-8197-06b69393ae39',
'name': '460B',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_460B_MEA_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_460b_mea_technical_data_sheet/en',
'language': 'English',
'region': 'MEA',
'revision': '20210812',
'id': '0e63bc98-c343-e811-80fd-005056857ef3'
}]
}, {
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': 'cd695035-76c7-e611-8197-06b69393ae39',
'name': '60',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_60_MEA_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_60_mea_technical_data_sheet/en',
'language': 'English',
'region': 'MEA',
'revision': '20210812',
'id': '733946d8-c343-e811-80fd-005056857ef3'
}]
}, {
'processingTechniques': ['Hot rolling'],
'summary': 'Metals Long Products Rebar in Coil',
'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS', 'CUT AND BEND', 'EPOXY COATING '],
'regions': ['MEA'],
'description': 'Metals Long Products Rebar in Coil',
'industrySegments': None,
'grade_id': 'c99a8cc9-79c7-e611-8197-06b69393ae39',
'name': 'B500B',
'documents': [{
'documentType': 'TDS',
'title': 'Rebar in Coil_B500B_MEA_Technical_Data_Sheet',
'url': '/en/products/documents/rebar-in-coil_b500b_mea_technical_data_sheet/en',
'language': 'English',
'region': 'MEA',
'revision': '20210812',
'id': 'bc25a637-c443-e811-80fd-005056857ef3'
}]
}]
This is the code I wrote to convert this JSON to a DataFrame while normalizing the documents:
gr2 = pd.json_normalize(result, ['documents'], meta = ['regions', 'name', 'description', 'grade_id', 'processingTechniques','summary', 'applications'])
gr2['product_id'] = prod_id
gr2.head()
result is the JSON list shown above.
After running the code above, I get this error:
Can anyone help me with this? I just want the documents to be normalised along with the other columns.
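One possible workaround, sketched here under the assumption that result is the list shown above (trimmed to two grades and a few keys), is to explode documents so each document gets its own row, flatten the document dicts separately, and join the two frames side by side. The doc_ prefix is an arbitrary choice that keeps document keys such as id apart from the grade columns, and explode(..., ignore_index=True) needs pandas 1.1 or later. This route avoids passing list-valued fields such as regions and applications through json_normalize's meta argument, which some older pandas releases did not handle well; whether that is the cause here cannot be confirmed, since the error text is not shown.

```python
import pandas as pd

# Trimmed sample in the same shape as `result` above (two grades, a subset of keys).
result = [
    {'regions': ['MEA'], 'name': '40',
     'grade_id': '89a63243-74c7-e611-8197-06b69393ae39',
     'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS',
                      'CUT AND BEND', 'EPOXY COATING '],
     'documents': [{'documentType': 'TDS', 'region': 'Global',
                    'id': '2bc4102f-8df7-e611-819b-06b69393ae39'}]},
    {'regions': ['MEA'], 'name': '460B',
     'grade_id': 'dddd0468-79c7-e611-8197-06b69393ae39',
     'applications': ['CONCRETE REINFORCEMENT', 'METAL DOWNSTREAM INDUSTRIALS',
                      'CUT AND BEND', 'EPOXY COATING '],
     'documents': [{'documentType': 'TDS', 'region': 'MEA',
                    'id': '0e63bc98-c343-e811-80fd-005056857ef3'}]},
]

# One row per document; the grade-level columns are repeated for each document.
grades = pd.DataFrame(result).explode('documents', ignore_index=True)

# A grade with an empty documents list explodes to NaN; use an empty dict
# instead so json_normalize still produces one (all-NaN) row for it.
doc_dicts = [d if isinstance(d, dict) else {} for d in grades['documents']]
docs = pd.json_normalize(doc_dicts).add_prefix('doc_')

# Both frames share the same 0..n-1 index, so they line up row by row.
gr2 = pd.concat([grades.drop(columns='documents'), docs], axis=1)
print(gr2[['name', 'doc_documentType', 'doc_region', 'doc_id']])
```

List-valued columns such as applications stay as Python lists in each row; the product_id column from the question can be added afterwards in the same way.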

Related

multinested json python to df

I have a JSON response with product details for multiple products, in this structure:
all_content=[{'productcode': '0502570SRE',
'brand': {'code': 'MJ', 'name': 'Marie Jo'},
'series': {'code': '0257', 'name': 'DANAE'},
'family': {'code': '0257SRE', 'name': 'DANAE Red'},
'style': {'code': '0502570'},
'introSeason': {'code': '226', 'name': 'WINTER 2022'},
'seasons': [{'code': '226', 'name': 'WINTER 2022'}],
'otherColors': [{'code': '0502570ZWA'}, {'code': '0502570PIR'}],
'synonyms': [{'code': '0502571SRE'}],
'stayerB2B': False,
'name': [{'language': 'de', 'text': 'DANAE Rot Rioslip'},
{'language': 'sv', 'text': 'DANAE Red '},
{'language': 'en', 'text': 'DANAE Red rio briefs'},
{'language': 'it', 'text': 'DANAE Rouge slip brasiliano'},
{'language': 'fr', 'text': 'DANAE Rouge slip brésilien'},
{'language': 'da', 'text': 'DANAE Red rio briefs'},
{'language': 'nl', 'text': 'DANAE Rood rioslip'},
.......]
What I need is a DataFrame with, for each productcode, only the values of specific keys in the sub-dictionaries. For example:
productcode synonyms_code name_language_nl
0522570SRE 0522571SRE rioslip
I've tried a nested for loop, which gives me all the values of one specific key in a sub-dict:
for item in all_content:
    synonyms_details = item['synonyms']
    for i in synonyms_details:
        print(i['code'])
How do I get from here to a DataFrame like this?
productcode synonyms_code name_language_nl
0522570SRE 0522571SRE rioslip
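The step from that loop to the table above can be sketched by building one dict per product and handing the list of dicts to pandas. This is a minimal sketch assuming every entry has a productcode; missing synonyms or name keys are treated as empty, several synonym codes are joined into one string, and text_for_language is a hypothetical helper name. It returns the full Dutch name (e.g. 'DANAE Rood rioslip'), not just the last word.

```python
import pandas as pd

# Trimmed sample in the shape of all_content above.
all_content = [
    {'productcode': '0502570SRE',
     'synonyms': [{'code': '0502571SRE'}],
     'name': [{'language': 'de', 'text': 'DANAE Rot Rioslip'},
              {'language': 'en', 'text': 'DANAE Red rio briefs'},
              {'language': 'nl', 'text': 'DANAE Rood rioslip'}]},
]


def text_for_language(entries, language):
    """Return the 'text' of the first entry with the given language, else None."""
    return next((e.get('text') for e in entries or [] if e.get('language') == language), None)


rows = []
for item in all_content:
    synonym_codes = [s['code'] for s in item.get('synonyms') or []]
    rows.append({
        'productcode': item['productcode'],
        # Several synonyms are joined into one string; no synonym gives None.
        'synonyms_code': ', '.join(synonym_codes) or None,
        'name_language_nl': text_for_language(item.get('name'), 'nl'),
    })

df = pd.DataFrame(rows)
print(df)
```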
I took another route, using json_normalize to make a flattened df with the original key in a column:
# test with json_normalize to extract a list of dicts nested inside a list of dicts
# meta_prefix is needed to avoid a name conflict, since 'code' is used as a key at both levels
df_syn = pd.json_normalize(all_content, record_path=['synonyms'], meta='code', meta_prefix='org_')
Result:
code org_code
0 0162934FRO 0162935FRO
1 0241472TFE 0241473TF
Renamed the columns so the result can be merged with the original df:
df_syn = df_syn.rename(columns={"code": "syn_code", "org_code": "code"})
Result:
syn_code code
0 0162934FRO 0162935FRO
1 0241472TFE 0241473T
Merged the flattened df into the original df with a left merge on the shared code key:
df = df.merge(df_syn, how='left', on='code')
Result (see the last column; it is NaN because not every product has a synonym):
code brand series family style introSeason seasons otherColors synonyms stayerB2B ... composition recycleComposition media assortmentIds preOrderPeriod firstDeliveryDates productGroup language text syn_code
0 0502570SRE {'code': 'MJ', 'name': 'Marie Jo'} {'code': '0257', 'name': 'DANAE'} {'code': '0257SRE', 'name': 'DANAE Red'} {'code': '0502570'} {'code': '226', 'name': 'WINTER 2022'} [{'code': '226', 'name': 'WINTER 2022'}] [{'code': '0502570ZWA'}, {'code': '0502570PIR'}] [] False ... [{'material': [{'language': 'de', 'text': 'Pol... [{'origin': [{'language': 'de', 'text': 'Nicht... [{'type': 'IMAGE', 'imageType': 'No body pictu... [BO_MJ] {'startDate': '2022-01-01', 'endDate': '2022-0... {'common': '2022-09-05', 'deviations': []} {'code': '0SRI1'} nl DANAE Rood rioslip NaN
The next step is to go one level deeper and get the values of a list of lists of lists.
Is there a more straightforward way? Extracting data like this for every nested value is quite time-consuming.
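A possibly more direct route is to let json_normalize produce a long table for each nested list and then reshape it: record_path pulls out the list, meta carries the productcode along, and pivot_table turns one row per language into one column per language. This is a sketch that assumes every product has both a name and a synonyms key (json_normalize raises a KeyError otherwise) and that at least one product has a Dutch name; record_prefix takes the place of the meta_prefix/rename step above.

```python
import pandas as pd

# Trimmed sample in the shape of all_content above.
all_content = [
    {'productcode': '0502570SRE',
     'synonyms': [{'code': '0502571SRE'}],
     'name': [{'language': 'de', 'text': 'DANAE Rot Rioslip'},
              {'language': 'en', 'text': 'DANAE Red rio briefs'},
              {'language': 'nl', 'text': 'DANAE Rood rioslip'}]},
]

# Long format: one row per (productcode, language) pair.
names = pd.json_normalize(all_content, record_path=['name'], meta=['productcode'])

# Wide format: one row per product, one column per language (name_de, name_en, ...).
# aggfunc='first' keeps the first text if a language appears twice for a product.
names_wide = (
    names.pivot_table(index='productcode', columns='language',
                      values='text', aggfunc='first')
    .add_prefix('name_')
    .reset_index()
)

# One row per synonym; the record column 'code' becomes 'synonyms_code'.
synonyms = pd.json_normalize(all_content, record_path=['synonyms'],
                             meta=['productcode'], record_prefix='synonyms_')

# Left merge keeps products without synonyms (their synonyms_code is NaN).
df = names_wide[['productcode', 'name_nl']].merge(synonyms, on='productcode', how='left')
print(df)
```

The same record_path/meta pattern works for other nested lists such as seasons or otherColors; deeper levels can be reached by passing a longer record_path list.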

ccxt OKEx placing orders

I placed a DEMO order on OKEx with an amount of 246 and a price of 0.46. When I looked on the site, the order amount was more than 11k.
I fetched the order info:
{'info': {'accFillSz': '0', 'avgPx': '', 'cTime': '1652262833825', 'category': 'normal', 'ccy': '', 'clOrdId': 'e847386590ce4dBCc812b22b16d7807c', 'fee': '0', 'feeCcy': 'USDT', 'fillPx': '', 'fillSz': '0', 'fillTime': '', 'instId': 'XRP-USDT-SWAP', 'instType': 'SWAP', 'lever': '1', 'ordId': '444557778278035458', 'ordType': 'limit', 'pnl': '0', 'posSide': 'long', 'px': '0.45693', 'rebate': '0', 'rebateCcy': 'USDT', 'side': 'buy', 'slOrdPx': '-1', 'slTriggerPx': '0.44779', 'slTriggerPxType': 'mark', 'source': '', 'state': 'live', 'sz': '246', 'tag': '', 'tdMode': 'isolated', 'tgtCcy': '', 'tpOrdPx': '-1', 'tpTriggerPx': '0.46606', 'tpTriggerPxType': 'mark', 'tradeId': '', 'uTime': '1652262833825'}, 'id': '444557778278035458', 'clientOrderId': 'e847386590ce4dBCc812b22b16d7807c', 'timestamp': 1652262833825, 'datetime': '2022-05-11T09:53:53.825Z', 'lastTradeTimestamp': None, 'symbol': 'XRP/USDT:USDT', 'type': 'limit', 'timeInForce': None, 'postOnly': None, 'side': 'buy', 'price': 0.45693, 'stopPrice': 0.44779, 'average': None, 'cost': 0.0, 'amount': 246.0, 'filled': 0.0, 'remaining': 246.0, 'status': 'open', 'fee': {'cost': 0.0, 'currency': 'USDT'}, 'trades': [], 'fees': [{'cost': 0.0, 'currency': 'USDT'}]}
and the amount there is 246.
Here is my code:
import ccxt

# API_KEY, API_SECRET, API_PASSPHRASE, PAIR, ORDER_TYPE, summa, price,
# take_profit and stop_loss are defined elsewhere in my script.
exchange = ccxt.okx({
    'apiKey': API_KEY,
    'secret': API_SECRET,
    'password': API_PASSPHRASE,
    'options': {
        'defaultType': 'swap'
    },
    'headers': {
        'x-simulated-trading': '1'  # OKX demo trading
    }
})
exchange.load_markets()
market = exchange.market(PAIR)
params = {
    'tdMode': 'isolated',
    'posSide': 'long',
    'instId': market['id'],
    'side': 'buy',
    'sz': 246,
    'tpOrdPx': '-1',
    'slOrdPx': '-1',
    'tpTriggerPx': str(take_profit),
    'slTriggerPx': str(stop_loss),
    'tpTriggerPxType': 'mark',
    'slTriggerPxType': 'mark',
}
order = exchange.create_order(
    PAIR, ORDER_TYPE, 'buy', summa, price, params=params)
info = exchange.fetch_order(order['id'], PAIR)
print(info)
What am I doing wrong?
For starters, you can only buy XRP in multiples of 100, as you can see in the screenshot below, so you can buy 200 or 300 but not 246.
Secondly, the API appears to apply a multiplier of 100, where 1 = 100 XRP. I deduced this by entering 24,600 XRP (246 × 100), which at a price of about 0.457 comes to roughly $11.2k, the amount you mentioned.
In your case, to buy 200 or 300 XRP you would need to enter 2 or 3 as the amount in the API request.
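Following the answer's reading that one contract equals 100 XRP, a small helper can convert a desired XRP quantity into the contract count to pass as the amount. This is a sketch assuming a contract size of 100; in ccxt the unified market structure usually exposes this value as market['contractSize'], which is safer to read than a hard-coded number. xrp_to_contracts is a hypothetical helper name, and rounding down rather than to nearest is a design choice so the order never exceeds the intended size.

```python
import math


def xrp_to_contracts(xrp_amount: float, contract_size: float = 100) -> int:
    """Convert an XRP quantity into whole swap contracts, rounding down.

    contract_size=100 is the assumption from the answer above (1 contract = 100 XRP).
    """
    contracts = math.floor(xrp_amount / contract_size)
    if contracts < 1:
        raise ValueError(f"{xrp_amount} XRP is less than one contract of {contract_size} XRP")
    return contracts


print(xrp_to_contracts(246))  # 2 contracts = 200 XRP
print(xrp_to_contracts(300))  # 3 contracts = 300 XRP
```

The result would then be passed as the amount argument of create_order instead of the raw XRP quantity.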

Need help translating a nested dictionary into a pandas dataframe

I'm looking to translate the following nested dictionary, pulled from the Yelp API, into a pandas DataFrame so I can run visualizations on it:
Top 50 Pizzerias in Chicago
{'businesses': [{'alias': 'pequods-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'}],
'coordinates': {'latitude': 41.92187, 'longitude': -87.664486},
'display_phone': '(773) 327-1512',
'distance': 2158.7084581522413,
'id': 'DXwSYgiXqIVNdO9dazel6w',
'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/8QJUNblfCI0EDhOjuIWJ4A/o.jpg',
'is_closed': False,
'location': {'address1': '2207 N Clybourn Ave',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['2207 N Clybourn Ave',
'Chicago, IL 60614'],
'state': 'IL',
'zip_code': '60614'},
'name': "Pequod's Pizzeria",
'phone': '+17733271512',
'price': '$$',
'rating': 4.0,
'review_count': 6586,
'transactions': ['restaurant_reservation', 'delivery'],
'url': 'https://www.yelp.com/biz/pequods-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
{'alias': 'lou-malnatis-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'},
{'alias': 'italian', 'title': 'Italian'},
{'alias': 'sandwiches', 'title': 'Sandwiches'}],
'coordinates': {'latitude': 41.890357,
'longitude': -87.633704},
'display_phone': '(312) 828-9800',
'distance': 4000.9990531720227,
'id': '8vFJH_paXsMocmEO_KAa3w',
'image_url': 'https://s3-media3.fl.yelpcdn.com/bphoto/9FiL-9Pbytyg6usOE02lYg/o.jpg',
'is_closed': False,
'location': {'address1': '439 N Wells St',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['439 N Wells St',
'Chicago, IL 60654'],
'state': 'IL',
'zip_code': '60654'},
'name': "Lou Malnati's Pizzeria",
'phone': '+13128289800',
'price': '$$',
'rating': 4.0,
'review_count': 6368,
'transactions': ['pickup', 'delivery'],
'url': 'https://www.yelp.com/biz/lou-malnatis-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
....]
I've tried the following, and variations of it, but haven't had any luck:
df = pd.DataFrame.from_dict(topresponse)
I'm really new to coding, so any advice would be helpful.
response["businesses"] is a list of records, so:
df = pd.DataFrame.from_records(response["businesses"])
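Building on that answer: from_records keeps nested fields such as coordinates and location as whole dicts inside single columns, which is awkward to plot. pd.json_normalize flattens nested dicts into dotted column names instead. This is a sketch on two trimmed businesses from the response above; turning categories into a comma-separated string is just one convenient choice for grouping or labelling.

```python
import pandas as pd

# Trimmed copy of two businesses from the Yelp response above.
response = {'businesses': [
    {'alias': 'pequods-pizzeria-chicago',
     'categories': [{'alias': 'pizza', 'title': 'Pizza'}],
     'coordinates': {'latitude': 41.92187, 'longitude': -87.664486},
     'location': {'city': 'Chicago', 'zip_code': '60614'},
     'name': "Pequod's Pizzeria", 'price': '$$', 'rating': 4.0, 'review_count': 6586},
    {'alias': 'lou-malnatis-pizzeria-chicago',
     'categories': [{'alias': 'pizza', 'title': 'Pizza'},
                    {'alias': 'italian', 'title': 'Italian'},
                    {'alias': 'sandwiches', 'title': 'Sandwiches'}],
     'coordinates': {'latitude': 41.890357, 'longitude': -87.633704},
     'location': {'city': 'Chicago', 'zip_code': '60654'},
     'name': "Lou Malnati's Pizzeria", 'price': '$$', 'rating': 4.0, 'review_count': 6368},
]}

# Nested dicts become dotted columns such as 'coordinates.latitude' and
# 'location.city'; lists (e.g. 'categories') are left as they are.
df = pd.json_normalize(response['businesses'])

# Collapse the list of category dicts into a readable string.
df['categories'] = df['categories'].apply(lambda cats: ', '.join(c['title'] for c in cats))

print(df[['name', 'rating', 'review_count', 'coordinates.latitude', 'categories']])
```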

Json to Pandas, include "Parents"

With a list of 150+ neighbourhoods, I am using the Foursquare API to retrieve nearby venues within a 500 m radius of each neighbourhood. Each neighbourhood is expected to return 10-20 nearby venues.
See the snippet of the JSON result returned by Foursquare below.
Using results['response']['groups'][0]['items'], I am able to retrieve the nearby venue information and turn it into the table below. However, results['response']['groups'][0]['items'] does not include the neighbourhood (headerFullLocation in the JSON) of the associated venues.
Q: How can I link the neighbourhood (headerFullLocation) to its associated nearby venues and add it as a column to the table below? Thanks for any advice.
{'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'headerLocation': 'Alexandra Park',
'headerFullLocation': 'Alexandra Park, Toronto',
'headerLocationGranularity': 'neighborhood',
'totalResults': 138,
'suggestedBounds': {'ne': {'lat': 43.6545000045, 'lng': -79.39379244047241},
'sw': {'lat': 43.645499995499996, 'lng': -79.4062075595276}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '5644dbaa498e7f7534154326',
'name': 'Maker Pizza',
'contact': {},
'location': {'address': '59 Cameron St',
'lat': 43.6504011331197,
'lng': -79.39804047841302,
'labeledLatLngs': [{'label': 'display',
'lat': 43.6504011331197,
'lng': -79.39804047841302}],
'distance': 164,
'postalCode': 'M5T 2H1',
'cc': 'CA',
'city': 'Toronto',
'state': 'ON',
'country': 'Canada',
'formattedAddress': ['59 Cameron St', 'Toronto ON M5T 2H1', 'Canada']},
'categories': [{'id': '4bf58dd8d48988d1ca941735',
'name': 'Pizza Place',
'pluralName': 'Pizza Places',
'shortName': 'Pizza',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/pizza_',
'suffix': '.png'},
'primary': True}],
'verified': False,
'stats': {'tipCount': 0,
'usersCount': 0,
'checkinsCount': 0,
'visitsCount': 0},
'beenHere': {'count': 0,
'lastCheckinExpiredAt': 0,
'marked': False,
'unconfirmedCount': 0},
'photos': {'count': 0, 'groups': []},
'hereNow': {'count': 0, 'summary': 'Nobody here', 'groups': []}},
Why don't you just do venues['Neighbourhood'] = response['headerFullLocation']? I'm assuming you send a separate request for each neighbourhood and plan to concatenate the venue DataFrames at the end.
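A sketch of that approach, assuming one API call per neighbourhood and that each call's results['response'] looks like the snippet above (trimmed here). venues_frame is a hypothetical helper: it flattens the items with pd.json_normalize, which yields columns such as venue.name and venue.location.lat, tags every row with headerFullLocation, and the per-neighbourhood frames are stacked with pd.concat at the end.

```python
import pandas as pd


def venues_frame(response: dict) -> pd.DataFrame:
    """Flatten the venues of one Foursquare response and tag them with its neighbourhood."""
    items = response['groups'][0]['items']
    venues = pd.json_normalize(items)  # columns like 'venue.name', 'venue.location.lat'
    venues['Neighbourhood'] = response['headerFullLocation']
    return venues


# Trimmed version of the response shown above (results['response']).
sample_response = {
    'headerFullLocation': 'Alexandra Park, Toronto',
    'groups': [{'type': 'Recommended Places', 'name': 'recommended',
                'items': [{'venue': {'id': '5644dbaa498e7f7534154326',
                                     'name': 'Maker Pizza',
                                     'location': {'lat': 43.6504011331197,
                                                  'lng': -79.39804047841302}}}]}],
}

# Hypothetical list holding one results['response'] per neighbourhood request.
responses = [sample_response]
all_venues = pd.concat([venues_frame(r) for r in responses], ignore_index=True)
print(all_venues[['venue.name', 'venue.location.lat', 'Neighbourhood']])
```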

How to filter meta-tags in json response with Python Requests?

I have the following code using the Matchbook betting API:
import requests

s = requests.Session()  # in my script this session is already logged in
r17 = s.get('https://matchbook.com/bpapi/rest/events/?sport-ids=15&after=1486157894&before=14862442917')
data1 = r17.json()
for event in data1['events']:
    print(event['name'])
    print(event['id'])
    print(event['sport-id'])
    print(event['start'])
    print(event['meta-tags'])
which gives the following output:
Bayern Munich vs Schalke
368063
15
2017-02-04T14:35:00.000Z
[{'id': 1, 'url-name': 'sport', 'name': 'Sport', 'type': 'UNKNOWN'}, {'id': 402, 'url-name': 'live-betting', 'name': 'Live Betting', 'type': 'COMPETITION'}, {'id': 4, 'url-name': 'soccer', 'name': 'Soccer', 'type': 'SPORT'}, {'id': 56, 'url-name': 'germany', 'name': 'Germany', 'type': 'COUNTRY'}, {'id': 57, 'url-name': 'bundesliga', 'name': 'Bundesliga', 'type': 'COMPETITION'}, {'id': 4105, 'url-name': 'february-4th-2017', 'name': 'February 4th 2017', 'type': 'DATE'}]
The meta-tags are the list of dicts between the [] brackets. How do I filter by these meta-tags?
import pprint

import requests

r17 = requests.get('https://matchbook.com/bpapi/rest/events/?sport-ids=15&after=1486157894&before=14862442917')
data = r17.json()
for event in data['events']:
    print(event['name'])
    pprint.pprint(event['meta-tags'], indent=4)
    print('sorted:')
    # change k['id'] to k['name'] if you need to sort the dicts by name
    pprint.pprint(sorted(event['meta-tags'], key=lambda k: k['id']), indent=4)
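The answer above sorts the tags; to actually filter, one option is to test each event's meta-tags list with any(). This is a minimal sketch using the event printed in the question; has_tag and tags_of_type are hypothetical helper names, and matching on type and url-name is only one choice (id or name work the same way).

```python
# Events as found in data['events'], trimmed to the one event printed above.
events = [{
    'name': 'Bayern Munich vs Schalke', 'id': 368063, 'sport-id': 15,
    'start': '2017-02-04T14:35:00.000Z',
    'meta-tags': [
        {'id': 1, 'url-name': 'sport', 'name': 'Sport', 'type': 'UNKNOWN'},
        {'id': 402, 'url-name': 'live-betting', 'name': 'Live Betting', 'type': 'COMPETITION'},
        {'id': 4, 'url-name': 'soccer', 'name': 'Soccer', 'type': 'SPORT'},
        {'id': 56, 'url-name': 'germany', 'name': 'Germany', 'type': 'COUNTRY'},
        {'id': 57, 'url-name': 'bundesliga', 'name': 'Bundesliga', 'type': 'COMPETITION'},
        {'id': 4105, 'url-name': 'february-4th-2017', 'name': 'February 4th 2017', 'type': 'DATE'},
    ],
}]


def has_tag(event, tag_type=None, url_name=None):
    """True if any meta-tag matches every criterion that is not None."""
    return any(
        (tag_type is None or tag.get('type') == tag_type)
        and (url_name is None or tag.get('url-name') == url_name)
        for tag in event.get('meta-tags', [])
    )


def tags_of_type(event, tag_type):
    """Names of the event's meta-tags with the given type."""
    return [tag['name'] for tag in event.get('meta-tags', []) if tag.get('type') == tag_type]


# Keep only Bundesliga events, then show each one's country tag(s).
bundesliga_events = [e for e in events if has_tag(e, tag_type='COMPETITION', url_name='bundesliga')]
for event in bundesliga_events:
    print(event['name'], tags_of_type(event, 'COUNTRY'))
```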
