I need to convert data from an API call into a DataFrame. The call returns the following JSON object:
{'responseID': 149882407,
'surveyID': 9711255,
'surveyName': 'NPS xx yy',
'ipAddress': '170.231.171.253',
'timestamp': '20 Aug, 2022 01:37:29 PM PET',
'location': {'country': None,
'region': '',
'latitude': -12.0439,
'longitude': -77.0281,
'radius': 0.0,
'countryCode': 'PE'},
'duplicate': False,
'timeTaken': 6,
'responseStatus': 'Completed',
'externalReference': '',
'customVariables': {'custom1': None,
'custom2': None,
'custom3': None,
'custom4': None,
'custom5': None},
'language': 'Spanish (Latin America)',
'currentInset': '',
'operatingSystem': 'ANDROID1',
'osDeviceType': 'MOBILE',
'browser': 'CHROME10',
'responseSet': [{'questionID': 106457509,
'questionDescription': '',
'questionCode': 'Q3',
'questionText': '¿El vendedor te recomendó algún producto adicional? ',
'imageUrl': None,
'answerValues': [{'answerID': 571204020,
'answerText': 'SI',
'value': {'scale': '1',
'other': '',
'dynamicExplodeText': '',
'text': '',
'result': '',
'fileLink': '',
'weight': 0.0}}]},
{'questionID': 106457510,
'questionDescription': '{detractor:Nada probable,promoter:Altamente probable}',
'questionCode': 'Q8',
'questionText': '¿Cuán probable es que recomiendes las tiendas Samsung a un familiar o amigo?',
'imageUrl': None,
'answerValues': [{'answerID': 571204032,
'answerText': '10',
'value': {'scale': '11',
'other': '',
'dynamicExplodeText': '',
'text': '',
'result': '',
'fileLink': '',
'weight': 0.0}}]},
{'questionID': 106457511,
'questionDescription': '',
'questionCode': 'Q6',
'questionText': '¿En qué fallamos?',
'imageUrl': None,
'answerValues': []},
{'questionID': 106457512,
'questionDescription': '',
'questionCode': 'Q4',
'questionText': '¿En qué debemos mejorar?',
'imageUrl': None,
'answerValues': []},
{'questionID': 106457513,
'questionDescription': '',
'questionCode': 'Q5',
'questionText': '¿Por qué nos felicitas?',
'imageUrl': None,
'answerValues': [{'answerID': 571204035,
'answerText': '',
'value': {'scale': '',
'other': '',
'dynamicExplodeText': '',
'text': '',
'result': '',
'fileLink': '',
'weight': 0.0}}]}],
'utctimestamp': 29}
The JSON file is a list of dictionaries, and each dictionary (like the one above) represents one client's response to a five-question survey.
The information I want to scrape is: responseID, surveyID, ipAddress, timestamp, latitude, longitude, questionText (inside responseSet), and scale and text (inside answerValues). Each JSON object like the one above would become one row of the DataFrame.
First I tried pd.json_normalize(), which correctly scrapes responseID, surveyID, ipAddress, timestamp, latitude, longitude, and timeTaken, but since responseSet is a list, it just remains a list inside the DataFrame.
I tried to use to_list() to expand this column of lists into multiple columns, but this quickly got out of hand, since there are dicts within dicts within lists within dicts. In other words, the data is very heavily nested, and I need to extract the answers to five questions, where each answer may be in text or scale. So I figured this wasn't the most Pythonic way to do it.
Lastly, I used json_normalize with answerValues as the record path, which gave me a DataFrame in which every row was an individual answer. So I had up to 5 rows per client (sometimes fewer, since unanswered questions produce no rows). Next I used pivot to get a DataFrame closer to what I wanted, and finally merged it with my previous DataFrame:
def transform_json(data):
    flatten_json = pd.json_normalize(data)
    answers_long = pd.json_normalize(data,
                                     record_path=["responseSet", "answerValues"],
                                     meta=["responseID",
                                           ["responseSet", "questionText"]])
    answers_long["value"] = answers_long["value.text"] + answers_long["value.scale"]
    answers = answers_long.pivot(index="responseID",
                                 columns="responseSet.questionText",
                                 values="value").reset_index()
    df = flatten_json.merge(answers,
                            how="left",
                            on="responseID")
    return df
I wonder what the best way to achieve this would be, since I don't think my approach was ideal; maybe there is a way to completely flatten the JSON file, including the nested lists and dictionaries.
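One alternative worth considering (a sketch, not necessarily the best way): since you only need a handful of fields, you can build one flat dict per response in plain Python and hand the list to pd.DataFrame, skipping the pivot/merge entirely. The field names below follow the JSON shown above; unanswered questions become None, and the answer is taken from text when present, falling back to scale:

```python
def flatten_response(resp):
    """Build one flat row (dict) from a single survey response dict."""
    row = {
        "responseID": resp["responseID"],
        "surveyID": resp["surveyID"],
        "ipAddress": resp["ipAddress"],
        "timestamp": resp["timestamp"],
        "latitude": resp["location"]["latitude"],
        "longitude": resp["location"]["longitude"],
    }
    for question in resp["responseSet"]:
        answers = question["answerValues"]
        if answers:  # unanswered questions have an empty answerValues list
            value = answers[0]["value"]
            # prefer free text; fall back to the scale value
            row[question["questionText"]] = value["text"] or value["scale"]
        else:
            row[question["questionText"]] = None
    return row

# rows = [flatten_response(resp) for resp in data]
# df = pd.DataFrame(rows)
```

This trades json_normalize's generality for explicit control over exactly which nested fields land in each row.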
Building off my former post here: How to print data from API call into a CSV file
The API call returns this:
[Order({ 'asset_class': 'us_equity',
'asset_id': '8a9-43b6-9b36-662f01e8fadd',
'canceled_at': None,
'client_order_id': 'e38a-b51c-349314bc6e9e',
'created_at': '2020-06-05T16:16:53.307491Z',
'expired_at': None,
'extended_hours': False,
'failed_at': None,
'filled_at': '2020-06-05T16:16:53.329Z',
'filled_avg_price': '7.8701',
'filled_qty': '45',
'id': '8-4888-9c7c-97bf8c2a3a16',
'legs': None,
'limit_price': '7.87',
'order_class': '',
'order_type': 'limit',
'qty': '45',
'replaced_at': None,
'replaced_by': None,
'replaces': None,
'side': 'sell',
'status': 'filled',
'stop_price': None,
'submitted_at': '2020-06-05T16:16:53.293859Z',
'symbol': 'CARS',
'time_in_force': 'day',
'type': 'limit',
'updated_at': '2020-06-08T11:21:51.411547Z'}), Order({ 'asset_class': 'us_equity',
'asset_id': '1aef-42f4-9975-750dbcb3e67d',
'canceled_at': None,
'client_order_id': '2bde-4572-a5d0-bfc32c2bf31a',
'created_at': '2020-06-05T16:16:37.508176Z',
'expired_at': None,
'extended_hours': False,
'failed_at': None,
'filled_at': '2020-06-05T16:16:37.531Z',
'filled_avg_price': '10.8501',
'filled_qty': '26',
'id': '4256-472c-a5de-6ca9d6a21422',
'legs': None,
'limit_price': '10.85',
'order_class': '',
'order_type': 'limit',
'qty': '26',
'replaced_at': None,
'replaced_by': None,
'replaces': None,
'side': 'sell',
'status': 'filled',
'stop_price': None,
'submitted_at': '2020-06-05T16:16:37.494389Z',
'symbol': 'IGT',
'time_in_force': 'day',
'type': 'limit',
'updated_at': '2020-06-08T11:21:51.424963Z'})]
I'd like to repeat the exercise of writing to a CSV as linked in my other post, but this time write only a subset of the columns from the API to the CSV. My first attempt was to specify the fieldnames as a list instead of using the keys from the raw dictionary, but now I'm having trouble accessing only the keys in each dict entry that match the list of fieldnames I'm passing in.
with open('historical_orders.csv', 'w', newline='') as csvfile:
    fieldnames = ['id', 'created_at', 'filled_at', 'canceled_at', 'replaced_at',
                  'symbol', 'asset_class', 'qty', 'filled_qty', 'filled_avg_price',
                  'order_class', 'order_type', 'type', 'side', 'time_in_force',
                  'limit_price', 'stop_price', 'status', 'extended_hours', 'legs']
    writer = csv.DictWriter(csvfile, fieldnames)
    writer.writeheader()
    for order in closed_orders:
        writer.writerow(order.__dict__['_raw'].fieldnames)
I get AttributeError: 'dict' object has no attribute 'fieldnames'.
Additionally, I'd like to add one more column that strips the funky "created_at" value down to a date and a time. So instead of created_at = '2020-06-05T16:16:53.307491Z', I'd like to create a date column '2020-06-05' and a time column '16:16:53'. I was thinking I could do this by looping over the fields inside each writerow call, but wasn't sure if there was a better way.
Can someone help me with these 2 issues?
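One hedged sketch covering both issues: pass each order's raw dict straight to writerow and let DictWriter drop the unwanted keys via extrasaction='ignore', and derive the extra date/time columns by splitting the ISO timestamp string. This assumes each order exposes its data via order.__dict__['_raw'], as in the code above:

```python
import csv

FIELDNAMES = ['id', 'created_at', 'filled_at', 'canceled_at', 'replaced_at',
              'symbol', 'asset_class', 'qty', 'filled_qty', 'filled_avg_price',
              'order_class', 'order_type', 'type', 'side', 'time_in_force',
              'limit_price', 'stop_price', 'status', 'extended_hours', 'legs',
              'date', 'time']

def order_row(raw):
    """Copy one raw order dict and add split-out date/time columns."""
    row = dict(raw)
    # '2020-06-05T16:16:53.307491Z' -> date '2020-06-05', time '16:16:53'
    date_part, _, time_part = raw['created_at'].partition('T')
    row['date'] = date_part
    row['time'] = time_part.split('.')[0]
    return row

def write_orders(orders, csvfile):
    # extrasaction='ignore' makes DictWriter silently drop any raw keys
    # that are not listed in FIELDNAMES
    writer = csv.DictWriter(csvfile, FIELDNAMES, extrasaction='ignore')
    writer.writeheader()
    for order in orders:
        writer.writerow(order_row(order.__dict__['_raw']))

# with open('historical_orders.csv', 'w', newline='') as csvfile:
#     write_orders(closed_orders, csvfile)
```

Missing keys are filled with DictWriter's restval (empty string by default), so orders that lack a field still produce a well-formed row.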
I'm using the Alpaca trading API and want to export data from a function call into a CSV file.
When I run a call like this:
closed_orders = api.list_orders(
    status='closed',
    limit=2,
    nested=True  # show nested multi-leg orders
)
print(closed_orders)
I get back:
[Order({ 'asset_class': 'us_equity',
'asset_id': '8a9-43b6-9b36-662f01e8fadd',
'canceled_at': None,
'client_order_id': 'e38a-b51c-349314bc6e9e',
'created_at': '2020-06-05T16:16:53.307491Z',
'expired_at': None,
'extended_hours': False,
'failed_at': None,
'filled_at': '2020-06-05T16:16:53.329Z',
'filled_avg_price': '7.8701',
'filled_qty': '45',
'id': '8-4888-9c7c-97bf8c2a3a16',
'legs': None,
'limit_price': '7.87',
'order_class': '',
'order_type': 'limit',
'qty': '45',
'replaced_at': None,
'replaced_by': None,
'replaces': None,
'side': 'sell',
'status': 'filled',
'stop_price': None,
'submitted_at': '2020-06-05T16:16:53.293859Z',
'symbol': 'CARS',
'time_in_force': 'day',
'type': 'limit',
'updated_at': '2020-06-08T11:21:51.411547Z'}), Order({ 'asset_class': 'us_equity',
'asset_id': '1aef-42f4-9975-750dbcb3e67d',
'canceled_at': None,
'client_order_id': '2bde-4572-a5d0-bfc32c2bf31a',
'created_at': '2020-06-05T16:16:37.508176Z',
'expired_at': None,
'extended_hours': False,
'failed_at': None,
'filled_at': '2020-06-05T16:16:37.531Z',
'filled_avg_price': '10.8501',
'filled_qty': '26',
'id': '4256-472c-a5de-6ca9d6a21422',
'legs': None,
'limit_price': '10.85',
'order_class': '',
'order_type': 'limit',
'qty': '26',
'replaced_at': None,
'replaced_by': None,
'replaces': None,
'side': 'sell',
'status': 'filled',
'stop_price': None,
'submitted_at': '2020-06-05T16:16:37.494389Z',
'symbol': 'IGT',
'time_in_force': 'day',
'type': 'limit',
'updated_at': '2020-06-08T11:21:51.424963Z'})]
How do I grab this data and write it to a CSV? I tried something like the following, but I get the error "Order" object has no keys. I assumed I'd be able to loop through the API response and write according to the headers; how do I break down/flatten the API response accordingly?
with open('historical_orders.csv', 'w', newline='') as csvfile:
    fieldnames = [
        'asset_id',
        'canceled_at',
        'client_order_id',
        'created_at',
        'expired_at',
        'extended_hours',
        'failed_at',
        'filled_at',
        'filled_avg_price',
        'filled_qty',
        'id',
        'legs',
        'limit_price',
        'order_class',
        'order_type',
        'qty',
        'replaced_at',
        'replaced_by',
        'replaces',
        'side',
        'status',
        'stop_price',
        'submitted_at',
        'symbol',
        'time_in_force',
        'type',
        'updated_at'
    ]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for c in closed_orders:
        writer.writerow(c)
You can access the __dict__ attribute of each Order object to get the headers and rows for the CSV file:
with open('historical_orders.csv', 'w', newline='') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=closed_orders[0].__dict__['_raw'].keys())
    writer.writeheader()
    for order in closed_orders:
        writer.writerow(order.__dict__['_raw'])
I am currently using glom to parse through a JSON API response, which returns, among other things, a list of dictionaries, with a list of dictionaries inside it. The problem I'm having is getting glom to access the correct dictionary entry.
Example JSON:
{'answeredAt': '2019-08-23T21:11:04Z',
'direction': 'Inbound',
'disposition': 'Answered',
'duration': 110867,
'endedAt': '2019-08-23T21:12:55Z',
'from': {'connectedAt': '2019-08-23T21:11:04Z',
'departmentName': None,
'deviceType': None,
'disconnectedAt': '2019-08-23T21:12:55Z',
'name': 'blah',
'number': '1234567890',
'number_e164': '1234567890',
'serviceId': None,
'userId': None},
'initialQueueName': 'blah',
'joinedLinkedIds': [],
'legs': [{'departmentName': 'default',
'deviceType': 'Unknown',
'legType': 'Dial',
'menuName': None,
'menuOption': None,
'menuPrompt': None,
'number': '1234567890',
'optionAction': None,
'optionArg': None,
'queueName': None,
'serviceId': 327727,
'timestamp': '2019-08-23T21:11:04Z',
'userId': None},
{'departmentName': 'default',
'deviceType': 'Unknown',
'legType': 'Answer',
'menuName': None,
'menuOption': None,
'menuPrompt': None,
'number': '1234567890',
'optionAction': None,
'optionArg': None,
'queueName': None,
'serviceId': 327727,
'timestamp': '2019-08-23T21:11:04Z',
'userId': None},
{'departmentName': None,
'deviceType': None,
'legType': 'EnterIVR',
'menuName': 'blah',
'menuOption': None,
'menuPrompt': None,
'number': None,
'optionAction': None,
'optionArg': None,
'queueName': None,
'serviceId': None,
'timestamp': '2019-08-23T21:11:05Z',
'userId': None},
{'departmentName': None,
'deviceType': None,
'legType': 'IVRSchedule',
'menuName': 'Day',
'menuOption': None,
'menuPrompt': None,
'number': None,
'optionAction': None,
'optionArg': None,
'queueName': None,
'serviceId': None,
'timestamp': '2019-08-23T21:11:06Z',
'userId': None},
{'departmentName': None,
'deviceType': None,
'legType': 'EnterQueue',
'menuName': None,
'menuOption': None,
'menuPrompt': None,
'number': None,
'optionAction': None,
'optionArg': None,
'queueName': 'blah',
'serviceId': None,
'timestamp': '2019-08-23T21:11:15Z',
'userId': None},
{'departmentName': None,
'deviceType': None,
'legType': 'Hangup',
'menuName': None,
'menuOption': None,
'menuPrompt': None,
'number': 'blah',
'optionAction': None,
'optionArg': None,
'queueName': None,
'serviceId': None,
'timestamp': '2019-08-23T21:12:55Z',
'userId': None}],
'linkedId': 'some unique key',
'startedAt': '2019-08-23T21:11:04Z',
'to': {'connectedAt': '2019-08-23T21:11:04Z',
'departmentName': 'default',
'deviceType': 'Unknown',
'disconnectedAt': '2019-08-23T21:12:55Z',
'name': None,
'number': '1234567890',
'number_e164': '1234567890',
'serviceId': 327727,
'userId': None},
'version': {'label': None, 'major': 4, 'minor': 2, 'point': 1}},
The information I'm trying to get at is in 'legs', where 'legType' == 'Dial' or 'EnterIVR'. I need 'number' from the 'Dial' leg, and 'menuName' from the 'EnterIVR' leg. I can get it, for instance, to list back all the different legTypes, but not the data specifically from those.
This is where I'm at currently:
with open('callstest.csv', mode='w') as calls:
    data_writer = csv.writer(calls, delimiter=',')
    data_writer.writerow(['LinkedID', 'Number', 'Queue', 'Client'])
    target = response_json['calls']
    glomtemp = {}
    for item in target:
        spec = {
            'Linked ID': 'linkedId',
            # this returns the number I need only in certain cases,
            # so I need 'number' from the 'Dial' legType
            'Number': ('to', 'number'),
            'Queue': 'initialQueueName',
            'Client':  # need help here, should be 'menuName' from
                       # the 'EnterIVR' legType
        }
        glomtemp = glom(item, spec)
        # print(glomtemp)
        data_writer.writerow([glomtemp['Linked ID'], glomtemp['Number'], glomtemp['Queue']])
Right now I can get them to fall back with Coalesce to "None", but that's not what I'm looking for.
Any suggestions on how I should spec this to get the info out of those 2 legs for 'Number' and 'Client'?
If I understand correctly, you want to filter out certain entries that don't fit the supported legType. You're definitely onto something with the Coalesce, and I think the key here is glom's Check specifier type, combined with the SKIP singleton. I had to tweak your current spec a bit to match the example data, but this runs:
from pprint import pprint
from glom import glom, Check, Coalesce, SKIP

LEG_SPEC = {'Client': Coalesce('menuName', default=''),
            'Number': Coalesce('to.number', default=''),
            'Linked ID': 'serviceId',
            'Queue': 'queueName'}

entries_spec = ('legs',
                [Check('legType', one_of=('Dial', 'EnterIVR'), default=SKIP)],
                [LEG_SPEC])

pprint(glom(target, entries_spec))
# prints:
# [{'Client': None, 'Linked ID': 327727, 'Number': '', 'Queue': None},
#  {'Client': 'blah', 'Linked ID': None, 'Number': '', 'Queue': None}]
Not sure if that was exactly what you were hoping to see, but the pattern is there. I think you want Nones (or '') for those other fields because the csv you're writing is going to want to put something in those columns.
There are other ways of doing filtered iteration using glom, too. The snippets page has a short section, complete with examples.
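If you'd rather not lean on glom at all for this part, a plain-Python sketch of the same extraction could look like the following (field names follow the example JSON above; the first matching leg wins, and missing legs default to None):

```python
def extract_call(call):
    """Pull the Dial leg's number and the EnterIVR leg's menuName
    out of one call record, defaulting to None when a leg is absent."""
    number = None
    menu_name = None
    for leg in call.get('legs', []):
        if leg['legType'] == 'Dial' and number is None:
            number = leg['number']
        elif leg['legType'] == 'EnterIVR' and menu_name is None:
            menu_name = leg['menuName']
    return {'Linked ID': call.get('linkedId'),
            'Number': number,
            'Queue': call.get('initialQueueName'),
            'Client': menu_name}
```

Each call then yields exactly one row dict, which maps directly onto the csv.writer row in the question.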
I have a JSON list like this (it was a JSON response; the below is after I did json.loads):
[{'status': 'ok', 'slot': None, 'name': 'blah', 'index': 0, 'identify': 'off',
  'details': None, 'speed': None, 'temperature': None},
 {'status': 'ok', 'slot': None, 'name': 'blah0', 'index': 0, 'identify': 'off',
  'details': None, 'speed': None, 'temperature': None},
 {'status': 'ok', 'slot': None, 'name': 'blah1', 'index': 1, 'identify': 'off',
  'details': None, 'speed': None, 'temperature': None},
 {'status': 'ok', 'slot': None, 'name': 'blah2', 'index': 2, 'identify': 'off',
  'details': None, 'speed': None, 'temperature': None},
 {'status': 'ok', 'slot': None, 'name': 'blah3', 'index': 3, 'identify': 'off',
  'details': None, 'speed': None, 'temperature': None}]
I want to get both the name and the status from the list when name is 'blah', 'blah0', 'blah1', 'blah2', or 'blah3'.
Essentially, for all the matches I want to store each name and status in separate variables to use elsewhere (dynamically created variables or statically assigned ones would both work for me).
I tried this, but it doesn't seem to work the way I want:
for value in data:
    if value['name'] in ['blah', 'blah0', 'blah1', 'blah2', 'blah3']:
        print(value['name'], value['status'])
This prints the name and status as strings, one line below the other. But I want each name and status assigned to a variable so I can use them later. Any help is much appreciated!
Try something like:
new_data = []

# Extract the matching entries, keeping name and status
for value in data:
    name = value.get("name")
    status = value.get("status")
    if name in ['blah', 'blah0', 'blah1', 'blah2', 'blah3']:
        new_data.append(dict(name=name, status=status))
Option 1
# loop through the new data
for data in new_data:
    print(data)
# OUTPUT:
{'name': 'blah', 'status': 'ok'}
{'name': 'blah0', 'status': 'ok'}
{'name': 'blah1', 'status': 'ok'}
{'name': 'blah2', 'status': 'ok'}
{'name': 'blah3', 'status': 'ok'}
Option 2
for data in new_data:
    for key, value in data.items():
        print(key, value)
#OUTPUT:
name blah
status ok
name blah0
status ok
name blah1
status ok
name blah2
status ok
name blah3
status ok
Option 3
for data in new_data:
    print(data['name'], data['status'])
#OUTPUT
blah ok
blah0 ok
blah1 ok
blah2 ok
blah3 ok
You don't really want dynamic variables; you can use a list comprehension instead. You should also take advantage of constant-time set membership tests:
keep = {'blah', 'blah0', 'blah1', 'blah2', 'blah3'}
result = [(value['name'], value['status']) for value in data if value['name'] in keep]
print(result)
Output:
[('blah', 'ok'),
('blah0', 'ok'),
('blah1', 'ok'),
('blah2', 'ok'),
('blah3', 'ok')]
If you want a dictionary:
keep = {'blah', 'blah0', 'blah1', 'blah2', 'blah3'}
result = {value['name']: value['status'] for value in data if value['name'] in keep}
print(result)
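A quick sanity check of the dictionary form, using a two-entry sample shaped like the question's data (only one name matches the keep set):

```python
data = [
    {'status': 'ok', 'slot': None, 'name': 'blah', 'index': 0,
     'identify': 'off', 'details': None, 'speed': None, 'temperature': None},
    {'status': 'ok', 'slot': None, 'name': 'other', 'index': 1,
     'identify': 'off', 'details': None, 'speed': None, 'temperature': None},
]
keep = {'blah', 'blah0', 'blah1', 'blah2', 'blah3'}
result = {value['name']: value['status'] for value in data if value['name'] in keep}

print(result)               # {'blah': 'ok'}
print(result.get('other'))  # None: 'other' was filtered out
```

Later lookups are then a plain result['blah'] or result.get(name), with no dynamic variable names needed.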