I have a column that was extracted using Pandas. Each cell in the column may contain a list of one or more dictionaries, or NaN.
Column B
[{'url': 'mailto:Kim_Do#dmx.com', 'type': 0, 'id': 1021857, 'name': 'KIM Do', 'entryListId': -1}, {'url': 'mailto:Angel_Kong#dmx.com', 'type': 0, 'id': 1023306, 'name': 'Angel Kong', 'entryListId': -1}, {'url': 'mailto:Alex_Do#dmx.com', 'type': 0, 'id': 1023289, 'name': 'Alex Do', 'entryListId': -1}]
[{'url': 'mailto:Ray_Chan#dmx.com', 'type': 0, 'id': 1021857, 'name': 'Ray Chan', 'entryListId': -1}, {'url': 'mailto:Paul_Jones#dmx.com', 'type': 0, 'id': 1023306, 'name': 'Paul Jones', 'entryListId': -1}]
nan
nan
[{'url': 'mailto:Ray_Chaudhry#dmx.com', 'type': 0, 'id': 1021857, 'name': 'Ray Chaudhry', 'entryListId': -1}]
What I want back is just the names from the dictionaries, so the output should be as follows:
Column B
KIM Do, Angel Kong, Alex Do
Ray Chan, Paul Jones
nan
nan
Ray Chaudhry
How can I achieve this? Thank you!
You can use:
df['New'] = df['Column B'].explode().str['name'].dropna().groupby(level=0).agg(', '.join)
Output (New column only):
0 KIM Do, Angel Kong, Alex Do
1 Ray Chan, Paul Jones
2 NaN
3 NaN
4 Ray Chaudhry
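As a quick sanity check, the one-liner can be reproduced on a small stand-in frame (column name assumed to be 'Column B', dicts trimmed to just the 'name' key):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Column B': [
    [{'name': 'KIM Do'}, {'name': 'Angel Kong'}],
    np.nan,
    [{'name': 'Ray Chaudhry'}],
]})

# explode the lists into one dict per row, pull out 'name',
# drop the NaN rows, then re-join per original row index
df['New'] = (df['Column B'].explode().str['name']
             .dropna().groupby(level=0).agg(', '.join))
print(df['New'].tolist())  # ['KIM Do, Angel Kong', nan, 'Ray Chaudhry']
```

Rows that were NaN in the original column stay NaN, because the assignment aligns on the index.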
Use this function:
def extract_names(list_data):
    row_names = []
    for n in range(len(list_data)):
        row_names.append(list_data[n]['name'])
    return row_names
store_names = []
col = 'Column B'  # column name
for idx, row in df.iterrows():
    # print(idx)
    # print(row[col])
    store_names.append(extract_names(row[col]))
    # print('--')
Now you can store the list as parameter of your choice:
df['Names'] = store_names
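Putting the pieces above together on a tiny frame (column name assumed to be 'Column B'; a sketch, not the asker's actual data):

```python
import pandas as pd

def extract_names(list_data):
    # pull the 'name' value out of each dict in the list
    return [d['name'] for d in list_data]

df = pd.DataFrame({'Column B': [
    [{'name': 'KIM Do'}, {'name': 'Angel Kong'}],
    [{'name': 'Ray Chaudhry'}],
]})

store_names = []
for idx, row in df.iterrows():
    store_names.append(extract_names(row['Column B']))

df['Names'] = store_names
print(df['Names'].tolist())  # [['KIM Do', 'Angel Kong'], ['Ray Chaudhry']]
```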
We have the following dataframe:
import pandas as pd
our_df = pd.DataFrame(data = {'rank': {0: 1, 1: 2}, 'title_name': {0: "And It's Still Alright", 1: 'Black Madonna'}, 'title_id': {0: '120034150', 1: '106938609'}, 'artist_id': {0: '222521', 1: '200160'}, 'artist_name': {0: 'Nathaniel Rateliff', 1: 'Cage The Elephant'}, 'label': {0: 'CNCO', 1: 'RCA'}, 'metrics': {0: [{'name': 'Rank', 'value': 1}, {'name': 'Song', 'value': "And It's Still Alright"}, {'name': 'Artist', 'value': 'Nathaniel Rateliff'}, {'name': 'TP Spins', 'value': 933}, {'name': '+/- Chg. Spins', 'value': -32}, {'name': 'LP Spins', 'value': 965}, {'name': 'Stations', 'value': '44/46'}, {'name': 'Adds', 'value': 0}, {'name': 'TP Audience', 'value': 1260000}, {'name': '+/- Chg. Audience', 'value': -40600}, {'name': 'LP Audience', 'value': 1300600}, {'name': 'TP Stream', 'value': 413101}], 1: [{'name': 'Rank', 'value': 2}, {'name': 'Song', 'value': 'Black Madonna'}, {'name': 'Artist', 'value': 'Cage The Elephant'}, {'name': 'TP Spins', 'value': 814}, {'name': '+/- Chg. Spins', 'value': 38}, {'name': 'LP Spins', 'value': 776}, {'name': 'Stations', 'value': '38/46'}, {'name': 'Adds', 'value': 0}, {'name': 'TP Audience', 'value': 1283400}, {'name': '+/- Chg. Audience', 'value': -21600}, {'name': 'LP Audience', 'value': 1305000}, {'name': 'TP Stream', 'value': 362366}]}})
and we are looking to convert the metrics column into 12 new columns in our dataframe, using each metric's name field as the column name and its value field as the cell value. Something like this:
rank title_name title_id artist_id artist_name label Rank Song ...
1 'And It's Still Alright' 120034150 222521 'Nathaniel Rateliff' 'CNCO' 1 "And It's Still Alright"
Here's what the value in the metrics column looks like for row 1:
our_df['metrics'][0]
[{'name': 'Rank', 'value': 1},
{'name': 'Song', 'value': "And It's Still Alright"},
{'name': 'Artist', 'value': 'Nathaniel Rateliff'},
{'name': 'TP Spins', 'value': 933},
{'name': '+/- Chg. Spins', 'value': -32},
{'name': 'LP Spins', 'value': 965},
{'name': 'Stations', 'value': '44/46'},
{'name': 'Adds', 'value': 0},
{'name': 'TP Audience', 'value': 1260000},
{'name': '+/- Chg. Audience', 'value': -40600},
{'name': 'LP Audience', 'value': 1300600},
{'name': 'TP Stream', 'value': 413101}]
The +/- in the column names may be problematic though, along with the . in Chg. This dataframe would be best if all the column names were snake_case, if the +/- was replaced with plus_minus, and if the . in Chg. was simply dropped.
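For concreteness, those renaming rules can be sketched as a small helper (the name to_snake is hypothetical, not part of any library):

```python
def to_snake(name):
    # lowercase, spaces -> '_', drop '.', '+/-' -> 'plus_minus'
    return (name.lower()
                .replace(' ', '_')
                .replace('.', '')
                .replace('+/-', 'plus_minus'))

print(to_snake('+/- Chg. Spins'))  # plus_minus_chg_spins
```

Note the replacement order matters: spaces are converted before '.' is dropped, so '+/- Chg. Spins' becomes 'plus_minus_chg_spins' rather than leaving a stray underscore.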
Edit: we can assume that the metric names will be the same in every row in the dataframe. However, there may be other dataframes with different metric names, so it would be preferable if the names 'Rank', 'Song', 'Artist', etc. were not hardcoded. Here is the original list before it was converted into a pandas dataframe:
raw_data = [{'rank': 1,
'title_name': 'BUTTER',
'title_id': '',
'artist_id': '',
'artist_name': 'BTS',
'label': '',
'peak_position': 1,
'last_week_rank': 7,
'last_2week_rank': 8,
'metrics': [{'name': 'Rank', 'value': 1},
{'name': 'Song', 'value': 'BUTTER'},
{'name': 'Artist', 'value': 'BTS'},
{'name': 'Label Description', 'value': None},
{'name': 'Label', 'value': ' '},
{'name': 'Last Week Rank', 'value': 7},
{'name': 'Last 2 Week Rank', 'value': 8},
{'name': 'Weeks On Chart', 'value': 15}]},
{'rank': 2,
'title_name': 'STAY',
'title_id': '',
'artist_id': '',
'artist_name': 'THE KID LAROI & JUS',
'label': '',
'peak_position': 1,
'last_week_rank': 1,
'last_2week_rank': 1,
'metrics': [{'name': 'Rank', 'value': 2},
{'name': 'Song', 'value': 'STAY'},
{'name': 'Artist', 'value': 'THE KID LAROI & JUS'},
{'name': 'Label Description', 'value': None},
{'name': 'Label', 'value': ' '},
{'name': 'Last Week Rank', 'value': 1},
{'name': 'Last 2 Week Rank', 'value': 1},
{'name': 'Weeks On Chart', 'value': 8}]}]
Most likely, the fastest way is to process raw_data as a list of dictionaries and only then construct a DataFrame from it.
records = []
for rec in raw_data:
    for metric in rec['metrics']:
        # process name: snake_case > drop '.' > '+/-' to 'plus_minus'
        name = metric['name'].lower().replace(' ', '_').replace('.', '').replace('+/-', 'plus_minus')
        rec[name] = metric['value']
    rec.pop('metrics')  # drop metric records
    records.append(rec)
df = pd.DataFrame(records)
Output

Resulting df:

|   | rank | title_name | title_id | artist_id | artist_name | label | peak_position | last_week_rank | last_2week_rank | song | artist | label_description | last_2_week_rank | weeks_on_chart |
|---|------|------------|----------|-----------|-------------|-------|---------------|----------------|-----------------|------|--------|-------------------|------------------|----------------|
| 0 | 1 | BUTTER | | | BTS | | 1 | 7 | 8 | BUTTER | BTS | None | 8 | 15 |
| 1 | 2 | STAY | | | THE KID LAROI & JUS | | 1 | 1 | 1 | STAY | THE KID LAROI & JUS | None | 1 | 8 |
Setup

raw_data as given in the question above.
Using the example's our_df (defined in the question above) as raw_data, i.e.
raw_data = our_df.to_dict(orient='records')
Output

Resulting df from the solution above:

|   | rank | title_name | title_id | artist_id | artist_name | label | song | artist | tp_spins | plus_minus_chg_spins | lp_spins | stations | adds | tp_audience | plus_minus_chg_audience | lp_audience | tp_stream |
|---|------|------------|----------|-----------|-------------|-------|------|--------|----------|----------------------|----------|----------|------|-------------|-------------------------|-------------|-----------|
| 0 | 1 | And It's Still Alright | 120034150 | 222521 | Nathaniel Rateliff | CNCO | And It's Still Alright | Nathaniel Rateliff | 933 | -32 | 965 | 44/46 | 0 | 1260000 | -40600 | 1300600 | 413101 |
| 1 | 2 | Black Madonna | 106938609 | 200160 | Cage The Elephant | RCA | Black Madonna | Cage The Elephant | 814 | 38 | 776 | 38/46 | 0 | 1283400 | -21600 | 1305000 | 362366 |
Let's start decomposing your issue. After defining our_df we can generate a new dataframe based on the column metrics with:
pd.concat([pd.DataFrame({x['name']: x['value'] for x in y}, index=[0]) for y in our_df['metrics']])
Which outputs:
Rank Song ... LP Audience TP Stream
0 1 And It's Still Alright ... 1300600 413101
0 2 Black Madonna ... 1305000 362366
Next it's just a question of joining them together with pd.concat() or merge. I assume the common key is the column Rank therefore I'll use merge:
our_df.drop(columns=['metrics']).merge(pd.concat([pd.DataFrame({x['name']:x['value'] for x in y},index=[0]) for y in our_df['metrics']]),left_on='rank',right_on='Rank')
Outputting the full dataframe
rank title_name ... LP Audience TP Stream
0 1 And It's Still Alright ... 1300600 413101
1 2 Black Madonna ... 1305000 362366
Alternative that might be robust against missing names
metric_df = our_df.apply(
lambda r:
pd.Series(
index=list(map(lambda d: d['name'], r['metrics']))+['rank'],
data=list(map(lambda d: d['value'], r['metrics']))+[r['rank']],
),
axis=1,
)
our_df.merge(metric_df, on='rank')
box = pd.concat({index : pd.DataFrame(ent)
for index, ent in
zip( our_df.index, our_df.metrics)})
( our_df
  .drop(columns = 'metrics')
  .join(box.droplevel(-1))
  .pivot(index = ['rank', 'title_name', 'title_id', 'artist_id', 'artist_name', 'label'],
         columns = 'name',
         values = 'value')
  .reset_index()
)
name rank title_name title_id artist_id artist_name label +/- Chg. Audience +/- Chg. Spins Adds Artist LP Audience LP Spins Rank Song Stations TP Audience TP Spins TP Stream
0 1 And It's Still Alright 120034150 222521 Nathaniel Rateliff CNCO -40600 -32 0 Nathaniel Rateliff 1300600 965 1 And It's Still Alright 44/46 1260000 933 413101
1 2 Black Madonna 106938609 200160 Cage The Elephant RCA -21600 38 0 Cage The Elephant 1305000 776 2 Black Madonna 38/46 1283400 814 362366
Working on the raw_data:
from itertools import chain, product
metrics = [ent['metrics'] for ent in raw_data]
non_metrics = [{key : value
for key, value
in ent.items()
if key != 'metrics'}
for ent in raw_data]
combo = zip(metrics, non_metrics)
combo = (product(metrics, [non_metrics])
for metrics, non_metrics in combo)
combo = chain.from_iterable(combo)
combo = [{**left, **right} for left, right in combo]
pd.DataFrame(combo)
name value rank title_name title_id artist_id artist_name label peak_position last_week_rank last_2week_rank
0 Rank 1 1 BUTTER BTS 1 7 8
1 Song BUTTER 1 BUTTER BTS 1 7 8
2 Artist BTS 1 BUTTER BTS 1 7 8
3 Label Description None 1 BUTTER BTS 1 7 8
4 Label 1 BUTTER BTS 1 7 8
5 Last Week Rank 7 1 BUTTER BTS 1 7 8
6 Last 2 Week Rank 8 1 BUTTER BTS 1 7 8
7 Weeks On Chart 15 1 BUTTER BTS 1 7 8
8 Rank 2 2 STAY THE KID LAROI & JUS 1 1 1
9 Song STAY 2 STAY THE KID LAROI & JUS 1 1 1
10 Artist THE KID LAROI & JUS 2 STAY THE KID LAROI & JUS 1 1 1
11 Label Description None 2 STAY THE KID LAROI & JUS 1 1 1
12 Label 2 STAY THE KID LAROI & JUS 1 1 1
13 Last Week Rank 1 2 STAY THE KID LAROI & JUS 1 1 1
14 Last 2 Week Rank 1 2 STAY THE KID LAROI & JUS 1 1 1
15 Weeks On Chart 8 2 STAY THE KID LAROI & JUS 1 1 1
You can then reshape/transform into whatever you desire.
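For instance, the long name/value frame can be pivoted back to one row per rank (a minimal sketch on hand-made data, not the full output above):

```python
import pandas as pd

# long format: one row per (rank, metric) pair
long_df = pd.DataFrame({
    'rank': [1, 1, 2, 2],
    'name': ['Song', 'Artist', 'Song', 'Artist'],
    'value': ['BUTTER', 'BTS', 'STAY', 'THE KID LAROI & JUS'],
})

# wide format: metric names become columns
wide = long_df.pivot(index='rank', columns='name', values='value').reset_index()
print(wide)
```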
I'm looking to translate the following nested dictionary, which is an API pull from Yelp, into a pandas dataframe to run visualizations on:
Top 50 Pizzerias in Chicago
{'businesses': [{'alias': 'pequods-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'}],
'coordinates': {'latitude': 41.92187, 'longitude': -87.664486},
'display_phone': '(773) 327-1512',
'distance': 2158.7084581522413,
'id': 'DXwSYgiXqIVNdO9dazel6w',
'image_url': 'https://s3-media1.fl.yelpcdn.com/bphoto/8QJUNblfCI0EDhOjuIWJ4A/o.jpg',
'is_closed': False,
'location': {'address1': '2207 N Clybourn Ave',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['2207 N Clybourn Ave',
'Chicago, IL 60614'],
'state': 'IL',
'zip_code': '60614'},
'name': "Pequod's Pizzeria",
'phone': '+17733271512',
'price': '$$',
'rating': 4.0,
'review_count': 6586,
'transactions': ['restaurant_reservation', 'delivery'],
'url': 'https://www.yelp.com/biz/pequods-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
{'alias': 'lou-malnatis-pizzeria-chicago',
'categories': [{'alias': 'pizza', 'title': 'Pizza'},
{'alias': 'italian', 'title': 'Italian'},
{'alias': 'sandwiches', 'title': 'Sandwiches'}],
'coordinates': {'latitude': 41.890357,
'longitude': -87.633704},
'display_phone': '(312) 828-9800',
'distance': 4000.9990531720227,
'id': '8vFJH_paXsMocmEO_KAa3w',
'image_url': 'https://s3-media3.fl.yelpcdn.com/bphoto/9FiL-9Pbytyg6usOE02lYg/o.jpg',
'is_closed': False,
'location': {'address1': '439 N Wells St',
'address2': '',
'address3': '',
'city': 'Chicago',
'country': 'US',
'display_address': ['439 N Wells St',
'Chicago, IL 60654'],
'state': 'IL',
'zip_code': '60654'},
'name': "Lou Malnati's Pizzeria",
'phone': '+13128289800',
'price': '$$',
'rating': 4.0,
'review_count': 6368,
'transactions': ['pickup', 'delivery'],
'url': 'https://www.yelp.com/biz/lou-malnatis-pizzeria-chicago?adjust_creative=wt2WY5Ii_urZB8YeHggW2g&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=wt2WY5Ii_urZB8YeHggW2g'},
...]}
I've tried the below and iterations of it, but haven't had any luck.
df = pd.DataFrame.from_dict(topresponse)
I'm really new to coding, so any advice would be helpful.
response["businesses"] is a list of records, so:
df = pd.DataFrame.from_records(response["businesses"])
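If you also want the nested coordinates and location dicts flattened into their own columns, pd.json_normalize may help; a sketch on a trimmed-down copy of the payload above:

```python
import pandas as pd

# trimmed-down copy of the Yelp response shown above
response = {'businesses': [
    {'alias': 'pequods-pizzeria-chicago',
     'coordinates': {'latitude': 41.92187, 'longitude': -87.664486},
     'location': {'city': 'Chicago', 'state': 'IL'},
     'name': "Pequod's Pizzeria",
     'rating': 4.0},
]}

# sep='_' turns nested keys into e.g. coordinates_latitude
df = pd.json_normalize(response['businesses'], sep='_')
print(df.columns.tolist())
```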
I have a dataframe that I created from a Data Dictionary format in the following way:
df = pd.DataFrame( info_closed, columns = [ 'type', 'origQty', 'executedQty' ] )
The result is as follows:
type origQty executedQty
0 LIMIT 0.00362000 0.00362000
1 MARKET 0.00200000 0.00200000
2 MARKET 0.00150000 0.00150000
3 MARKET 0.00150000 0.00150000
4 LIMIT 0.00150000 0.00150000
5 LIMIT 0.00150000 0.00150000
6 MARKET 0.00199500 0.00199500
7 LIMIT 0.00150000 0.00150000
8 MARKET 0.00149800 0.00149800
9 LIMIT 0.00150000 0.00150000
10 LIMIT 0.00149900 0.00149900
11 LIMIT 0.00150000 0.00150000
12 MARKET 0.00149800 0.00149800
[... snip ...]
I am trying to create a result in the following manner:
type origQty executedQty Count
0 LIMIT 13.03 15.01 23
1 MARKET 122.01 40.00 54
[.. snip ...]
Basically, this would be a groupby('type') with a sum(origQty) and sum(executedQty) within each 'type', plus a count of the records used to calculate those sums.
I tried:
g = df.groupby(['type'])['origQty', 'executedQty'].sum().reset_index()
but the results come out as follows:
type origQty executedQty
0 LIMIT 0.003620000.001500000.001500000.001500000.0015... 0.003620000.001500000.001500000.001500000.0015...
1 LIMIT_MAKER 0.001499000.001500000.001500000.001500000.0014... 0.001499000.001500000.001500000.001500000.0014...
2 MARKET 0.002000000.001500000.001500000.001995000.0014... 0.002000000.001500000.001500000.001995000.0014...
3 STOP_LOSS_LIMIT 0.00150000 0.00150000
Question: what am I doing wrong?
TIA
ETA:
Thanks all for the provided solutions!
I ran some but I was still getting this type of output:
             origQty                                             executedQty
type
LIMIT_MAKER  0.001499000.001500000.001500000.001500000.0014...  0.001499000.001500000.001500000.001500000.0014...
The original data was like this (it is a combination of data from the Binance exchange and the ccxt wrapper code). I was attempting to isolate the Binance data only, which is associated with ['info']:
[{'info': {'symbol': 'BTCUSDT', 'orderId': 2538903025, 'orderListId':
-1, 'clientOrderId': 'ENDsgXoqtv2ct5jizrfeQe', 'price': '9638.00000000', 'origQty': '0.00150000', 'executedQty': '0.00150000',
'cummulativeQuoteQty': '14.45700000', 'status': 'FILLED',
'timeInForce': 'GTC', 'type': 'LIMIT_MAKER', 'side': 'BUY',
'stopPrice': '0.00000000', 'icebergQty': '0.00000000', 'time':
1592879158045, 'updateTime': 1592879162299, 'isWorking': True,
'origQuoteOrderQty': '0.00000000'}, 'id': '2538903025',
'clientOrderId': 'ENDsgXoqtv2ct5jizrfeQe', 'timestamp': 1592879158045,
'datetime': '2020-06-23T02:25:58.045Z', 'lastTradeTimestamp': None,
'symbol': 'BTC/USDT', 'type': 'limit', 'side': 'buy', 'price': 9638.0,
'amount': 0.0015, 'cost': 14.457, 'average': 9638.0, 'filled': 0.0015,
'remaining': 0.0, 'status': 'closed', 'fee': None, 'trades': None},
{'info': {'symbol': 'BTCUSDT', 'orderId': 2539250884, 'orderListId':
-1, 'clientOrderId': '5UFBYwDF6b9qJ1UWNsvOYU', 'price': '9653.00000000', 'origQty': '0.00299700', 'executedQty': '0.00299700',
'cummulativeQuoteQty': '28.93004100', 'status': 'FILLED',
'timeInForce': 'GTC', 'type': 'LIMIT_MAKER', 'side': 'SELL',
'stopPrice': '0.00000000', 'icebergQty': '0.00000000', 'time':
1592883883927, 'updateTime': 1592884056113, 'isWorking': True,
'origQuoteOrderQty': '0.00000000'}, 'id': '2539250884',
'clientOrderId': '5UFBYwDF6b9qJ1UWNsvOYU', 'timestamp': 1592883883927,
'datetime': '2020-06-23T03:44:43.927Z', 'lastTradeTimestamp': None,
'symbol': 'BTC/USDT', 'type': 'limit', 'side': 'sell', 'price':
9653.0, 'amount': 0.002997, 'cost': 28.930041, 'average': 9653.0, 'filled': 0.002997, 'remaining': 0.0, 'status': 'closed', 'fee': None,
'trades': None}, {'info': {'symbol': 'BTCUSDT', 'orderId': 2539601261,
'orderListId': -1, 'clientOrderId': 'testme-15928890617592764',
'price': '9633.00000000', 'origQty': '0.00150000', 'executedQty':
'0.00150000', 'cummulativeQuoteQty': '14.44950000', 'status':
'FILLED', 'timeInForce': 'GTC', 'type': 'LIMIT_MAKER', 'side': 'BUY',
'stopPrice': '0.00000000', 'icebergQty': '0.00000000', 'time':
1592889061852, 'updateTime': 1592889136305, 'isWorking': True,
'origQuoteOrderQty': '0.00000000'}, 'id': '2539601261',
'clientOrderId': 'testme-15928890617592764', 'timestamp':
1592889061852, 'datetime': '2020-06-23T05:11:01.852Z',
'lastTradeTimestamp': None, 'symbol': 'BTC/USDT', 'type': 'limit',
'side': 'buy', 'price': 9633.0, 'amount': 0.0015, 'cost': 14.4495,
'average': 9633.0, 'filled': 0.0015, 'remaining': 0.0, 'status':
'closed', 'fee': None, 'trades': None}]
I pared it back by executing the following:
info_closed = []
for index, item in enumerate(orders_closed):
    info_closed.append(item['info'])
The results of what I had is listed above in the first post.
I then ran:
df = pd.DataFrame( info_closed, columns = [ 'type', 'origQty', 'executedQty' ] )
I am starting to wonder if there is something amiss with the dataframe ... will start looking at this area ...
Try this: before the groupby, cast the values to float.
df[['origQty', 'executedQty']] = df[['origQty', 'executedQty']].astype(float)
(
df.groupby(['type'])
.agg({"origQty": sum, "executedQty": sum, "type": len})
.rename(columns={'type': 'count'})
.reset_index()
)
I am 99% sure you get the result you want by just doing this:
df.groupby(['type'])[['origQty', 'executedQty']].sum()
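The float cast still matters here, since the raw quantities are strings. A self-contained sketch (hand-made stand-in data) combining the cast, the sums, and the record count via named aggregation:

```python
import pandas as pd

# small stand-in frame; quantities are strings, as in the question
df = pd.DataFrame({
    'type': ['LIMIT', 'MARKET', 'LIMIT'],
    'origQty': ['0.00362000', '0.00200000', '0.00150000'],
    'executedQty': ['0.00362000', '0.00200000', '0.00150000'],
})
df[['origQty', 'executedQty']] = df[['origQty', 'executedQty']].astype(float)

out = (df.groupby('type')
         .agg(origQty=('origQty', 'sum'),
              executedQty=('executedQty', 'sum'),
              Count=('origQty', 'size'))  # records per group
         .reset_index())
print(out)
```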
I need assistance handling the dict file type that is returned from the google maps api.
Currently, the results are handing me a dict of the resulting data (starting addresses, ending addresses, travel times, distances etc) which I cannot process. I can extract the start and end addresses simply, but the bulk data is proving difficult to extract, and I think it is because of its structure.
The sample of the code I have is as follows:
import googlemaps
import csv
import pandas as pd

gmaps = googlemaps.Client(key='YOUR_API_KEY')  # client initialisation was missing from the snippet
postcodes = pd.read_csv("SW.csv", sep=',', usecols=['postcode'], squeeze=True)
infile1 = open('SW.csv', 'r')
reader1 = csv.reader(infile1)
Location1 = postcodes[0:10]
Location2 = 'SW1A 2HQ'
my_distance = gmaps.distance_matrix(Location1, Location2, mode='bicycling', language=None, avoid=None, units='metric',
departure_time='2475925955', arrival_time=None,
transit_routing_preference=None)
print(my_distance)
Which generates the following output:
{'origin_addresses': ['Cossar Mews, Brixton, London SW2 2TR, UK',
'Bushnell Rd, London SW17 8QP, UK', 'Maltings Pl, Fulham, London SW6
2BX, UK', 'Knightsbridge, London SW7 1BJ, UK', 'Chelsea, London SW3
3EE, UK', 'Hester Rd, London SW11 4AJ, UK', 'Brixton, London SW2 1HZ,
UK', 'Randall Cl, London SW11 3TG, UK', 'Sloane St, London SW1X 9SF,
UK', 'Binfield Rd, London SW4 6TA, UK'], 'rows': [{'elements':
[{'duration': {'text': '28 mins', 'value': 1657}, 'status': 'OK',
'distance': {'text': '7.5 km', 'value': 7507}}]}, {'elements':
[{'duration': {'text': '31 mins', 'value': 1850}, 'status': 'OK',
'distance': {'text': '9.2 km', 'value': 9176}}]}, {'elements':
[{'duration': {'text': '27 mins', 'value': 1620}, 'status': 'OK',
'distance': {'text': '7.0 km', 'value': 7038}}]}, {'elements':
[{'duration': {'text': '16 mins', 'value': 953}, 'status': 'OK',
'distance': {'text': '4.0 km', 'value': 4038}}]}, {'elements':
[{'duration': {'text': '15 mins', 'value': 899}, 'status': 'OK',
'distance': {'text': '3.4 km', 'value': 3366}}]}, {'elements':
[{'duration': {'text': '21 mins', 'value': 1260}, 'status': 'OK',
'distance': {'text': '5.3 km', 'value': 5265}}]}, {'elements':
[{'duration': {'text': '28 mins', 'value': 1682}, 'status': 'OK',
'distance': {'text': '7.5 km', 'value': 7502}}]}, {'elements':
[{'duration': {'text': '23 mins', 'value': 1368}, 'status': 'OK',
'distance': {'text': '5.9 km', 'value': 5876}}]}, {'elements':
[{'duration': {'text': '14 mins', 'value': 839}, 'status': 'OK',
'distance': {'text': '3.3 km', 'value': 3341}}]}, {'elements':
[{'duration': {'text': '16 mins', 'value': 982}, 'status': 'OK',
'distance': {'text': '4.3 km', 'value': 4294}}]}],
'destination_addresses': ['Horse Guards Rd, London SW1A 2HQ, UK'],
'status': 'OK'}
I am then using the following code to extract it:
origin = my_distance['origin_addresses']
dest = my_distance['destination_addresses']
dist = my_distance['rows']
I have tried the df_from_list and many others to try and process the dist data. The end goal is to have a matrix with the origin addresses on each row, the destination addresses forming columns, with distance and time as data variables within these columns.
Something similar to this
| DEST 1 | DEST 2 |
| TIME | DIST | TIME | DIST |
START 1 | X | Y | Z | T |
START 2 | A | B | C | T |
Please can someone help me process the my_distance output into a structure similar to the one shown above.
Thanks!
This basically creates a dictionary with the starts and the destination addresses. The destination addresses have a list of tuples as values; the first element in each tuple is the duration and the second the distance,
e.g. (45, 7.0) # 45 = 45 min and 7.0 = 7 km. Then I create the dataframe with pandas.DataFrame.from_dict().
import pandas as pd
dct = {d_address: [] for d_address in data['destination_addresses']}
dct['starts'] = []
for i in range(len(data['origin_addresses'])):
    duration = int(data['rows'][i]['elements'][0]['duration']['text'].split(' ')[0])
    distance = float(data['rows'][i]['elements'][0]['distance']['text'].split(' ')[0])
    for key in dct:
        if key != 'starts':
            dct[key].append((duration, distance))
    dct['starts'].append(data['origin_addresses'][i])

df = pd.DataFrame.from_dict(dct)
df.set_index('starts', inplace=True)
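For the two-level TIME/DIST header sketched in the question, a MultiIndex on the columns gets close; a minimal sketch on a hand-made stand-in for the my_distance dict (addresses shortened to 'Start 1' etc.):

```python
import pandas as pd

# hand-made stand-in for the my_distance dict shown in the question
data = {
    'origin_addresses': ['Start 1', 'Start 2'],
    'destination_addresses': ['Dest 1'],
    'rows': [
        {'elements': [{'duration': {'value': 1657}, 'distance': {'value': 7507}}]},
        {'elements': [{'duration': {'value': 1850}, 'distance': {'value': 9176}}]},
    ],
}

# one record per origin: [time, dist, time, dist, ...] across destinations
records = []
for row in data['rows']:
    rec = []
    for el in row['elements']:
        rec += [el['duration']['value'], el['distance']['value']]
    records.append(rec)

# two-level header: destination on top, TIME/DIST below
cols = pd.MultiIndex.from_product([data['destination_addresses'], ['TIME', 'DIST']])
df = pd.DataFrame(records, index=data['origin_addresses'], columns=cols)
print(df)
```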
I create an empty dataframe before running gmaps.distance_matrix, and place the dictionary keys into the dataframe. Similar to the above solution:
records = []
for origin in origins:
    for destination in destinations:
        records.append({'time': '00:00', 'origins': origin, 'destinations': destination})
        if origin != destination:
            # Get travel distance and time for a matrix of origins and destinations
            traffic_result = gmaps.distance_matrix(origin, destination,
                mode="driving", language=None, avoid=None, units="metric",
                departure_time=None,  # pass a datetime/timestamp here; a bare 00:00 is not valid Python
                arrival_time=None, transit_mode=None,
                transit_routing_preference=None, traffic_model=None, region=None)
            for key, value in traffic_result.items():
                print(key, value)
                records.append({key: value})

traffic = pd.DataFrame(records, columns=['time', 'origins', 'destinations',
                                         'destination_addresses', 'origin_addresses',
                                         'rows', 'status'])