I am extracting data from ebay using ebay-sdkpython. The end goal is to have a XLS dump of items with specific fields for each item. The output that I want to manipulate comes as a complex nested list/dict as below:
dict = {'ack': 'Success',
'timestamp': '2015-11-20T03:49:08.302Z',
'version': '1.13.0',
'searchResult':
{'item': [
{'itemId': '111827613927',
'topRatedListing': 'false',
'sellingStatus':
{'currentPrice':
{'_currencyId': 'AUD',
'value': '290.0'},
'convertedCurrentPrice': {'_currencyId': 'AUD', 'value': '290.0'},
'sellingState': 'EndedWithSales'},
'listingInfo':
{'listingType': 'FixedPrice',
'endTime': '2015-11-20T03:35:07.000Z'}
},
'_count': '100'},
'paginationOutput':
{'totalPages': '114',
'entriesPerPage': '100',
'pageNumber': '1',
'totalEntries': '11364'}}
I know that I can pull a specific key out using the following format:
print dict['searchResult']['item'][0]['itemId']
But I want to pull all of them, with the varying levels of nesting. I also need error handling as some fields are missing, these should return a ''. My code so far is:
x=1
for item in response.dict()['searchResult']['item']:
print x,
for field in ['itemId','topRatedListing']:
try:
print item[field]
except KeyError:
print ''
x=x+1
print '\n'
How would I modify this for loop to also include, for example, ['sellingstatus']['currentprice']['value'] ?
Related
I call an API and this my JSON response:
{'data': [{
'id': 'd-1225959',
'startTime': '2022-12-30T00:00:00.000Z',
'endTime': '2022-12-30T23:59:00.000Z',
'checkedInAt': None,
'checkedOutAt': None,
'status': 'PENDING',
'space': {
'id': 'd-4063963',
'name': '082',
'type': 'DESK',
'createdAt': '2021-07-06T11:48:57.000Z',
'updatedAt': '2021-07-06T11:48:57.000Z',
'isAvailable': False,
'assignedTo': None,
'locationId': '133778',
'floorId': '41681',
'floorName': 'Car Park',
'neighborhoodId': '92267',
'neighborhoodName': 'NEI1'}}
I'm struggling to get the 'space' 'id' and 'name' extracted out if I do a nested python loop like so it only returns the headers like 'id' and 'name' not the values held within.
for order in response['data']:
print(order['id'])
print(order['startTime'])
print(order['endTime'])
print(order['checkedInAt'])
print(order['checkedOutAt'])
print(order['status'])
print(order['space'])
for doc in response['space']:
print(doc['id'], doc['name'])
Any help with this would be much appreciated!
for doc in response['space']: will iterate over the keys in response['space'] dict, i.e. doc will be str.
You want to do doc = response['space'] instead and then print(doc['id']). or directly print(response['space']['id']).
Note, you may want to use dict.get() method to avoid KeyError.
# if response dict has no 'space' key, return empty dict.
# if no 'id' key - return None
space_id = response.get('space', {}).get('id')
My json looks like below:
json_obj = [{'extracted_value': {'other': 'Not found', 'sound': 'false', 'longterm': 'false', 'physician': 'false'}, 'page_num': '33', 'score': '0.75', 'number': 12223611, 'misc':'true'}]
df=pd.DataFrame(json_obj)[['extracted_value', 'page_num','conf_score','number']]
I am extracting only the above info. But now i wanted to ignore 'other': 'Not found' in the extracted_value column and extract like above values.
you can try df['extracted_value'].apply(remove_other) i.e apply a function on column extracted_value.
complete code will be:
json_obj = [{'extracted_value': {'other': 'Not found', 'sound': 'false', 'longterm': 'false', 'physician': 'false'}, 'page_num': '33', 'score': '0.75', 'number': 12223611, 'misc':'true'}]
df=pd.DataFrame(json_obj)[['extracted_value', 'page_num','number']]
def remove_other(my_dict):
return {e:my_dict[e] for e in my_dict if e != 'other' and my_dict[e] != 'Not Found' } # condition to remove other and not found pair
df['extracted_value']=df['extracted_value'].apply(remove_other)
and the result will be:
extracted_value page_num number
0 {'sound': 'false', 'longterm': 'false', 'physi... 33 12223611
additional response:
df['extracted_value'].apply(remove_other) implies that column value will be passed as a parameter to the function. you can put print statement print(my_dict) in the remove_other to visualize it better.
code can be changed to remove dictionary value from and condition.
def remove_other(my_dict):
return {e:my_dict[e] for e in my_dict if e != 'other' }#and my_dict[e] != 'Not Found' } # remove'other' key item
i would suggest getting familiarized with JSON. in this case , need to go to [0]['coord'][0] . so function will be like :
# Section_Page_start and Section_End_Page
def get_start_and_end(var1):
my_dict=var1[0]['coord'][0]
return {ek:my_dict[ek] for ek in my_dict if ek in ['Section_Page_start','Section_End_Page']}
I know that somewhat related questions have been asked here: Accessing key, value in a nested dictionary and here: python accessing elements in a dictionary inside dictionary among other places but I can't quite seem to apply the answers' methodology to my issue.
I'm getting a KeyError trying to access the keys within response_dict, which I know is due to it being nested/paginated and me going about this the wrong way. Can anybody help and/or point me in the right direction?
import requests
import json
URL = "https://api.constantcontact.com/v2/contacts?status=ALL&limit=1&api_key=<redacted>&access_token=<redacted>"
#make my request, store it in the requests object 'r'
r = requests.get(url = URL)
#status code to prove things are working
print (r.status_code)
#print what was retrieved from the API
print (r.text)
#visual aid
print ('---------------------------')
#decode json data to a dict
response_dict = json.loads(r.text)
#show how the API response looks now
print(response_dict)
#just for confirmation
print (type(response_dict))
print('-------------------------')
# HERE LIES THE ISSUE
print(response_dict['first_name'])
And my output:
200
{"meta":{"pagination":{}},"results":[{"id":"1329683950","status":"ACTIVE","fax":"","addresses":[{"id":"4e19e250-b5d9-11e8-9849-d4ae5275509e","line1":"222 Fake St.","line2":"","line3":"","city":"Kansas City","address_type":"BUSINESS","state_code":"","state":"OK","country_code":"ve","postal_code":"19512","sub_postal_code":""}],"notes":[],"confirmed":false,"lists":[{"id":"1733488365","status":"ACTIVE"}],"source":"Site Owner","email_addresses":[{"id":"1fe198a0-b5d5-11e8-92c1-d4ae526edd6c","status":"ACTIVE","confirm_status":"NO_CONFIRMATION_REQUIRED","opt_in_source":"ACTION_BY_OWNER","opt_in_date":"2018-09-11T18:18:20.000Z","email_address":"rsmith#fake.com"}],"prefix_name":"","first_name":"Robert","middle_name":"","last_name":"Smith","job_title":"I.T.","company_name":"FBI","home_phone":"","work_phone":"5555555555","cell_phone":"","custom_fields":[],"created_date":"2018-09-11T15:12:40.000Z","modified_date":"2018-09-11T18:18:20.000Z","source_details":""}]}
---------------------------
{'meta': {'pagination': {}}, 'results': [{'id': '1329683950', 'status': 'ACTIVE', 'fax': '', 'addresses': [{'id': '4e19e250-b5d9-11e8-9849-d4ae5275509e', 'line1': '222 Fake St.', 'line2': '', 'line3': '', 'city': 'Kansas City', 'address_type': 'BUSINESS', 'state_code': '', 'state': 'OK', 'country_code': 've', 'postal_code': '19512', 'sub_postal_code': ''}], 'notes': [], 'confirmed': False, 'lists': [{'id': '1733488365', 'status': 'ACTIVE'}], 'source': 'Site Owner', 'email_addresses': [{'id': '1fe198a0-b5d5-11e8-92c1-d4ae526edd6c', 'status': 'ACTIVE', 'confirm_status': 'NO_CONFIRMATION_REQUIRED', 'opt_in_source': 'ACTION_BY_OWNER', 'opt_in_date': '2018-09-11T18:18:20.000Z', 'email_address': 'rsmith#fake.com'}], 'prefix_name': '', 'first_name': 'Robert', 'middle_name': '', 'last_name': 'Smith', 'job_title': 'I.T.', 'company_name': 'FBI', 'home_phone': '', 'work_phone': '5555555555', 'cell_phone': '', 'custom_fields': [], 'created_date': '2018-09-11T15:12:40.000Z', 'modified_date': '2018-09-11T18:18:20.000Z', 'source_details': ''}]}
<class 'dict'>
-------------------------
Traceback (most recent call last):
File "C:\Users\rkiek\Desktop\Python WIP\Chris2.py", line 20, in <module>
print(response_dict['first_name'])
KeyError: 'first_name'
first_name = response_dict["results"][0]["first_name"]
Even though I think this question would be better answered by yourself by reading some documentation, I will explain what is going on here. You see the dict-object of the man named "Robert" is within a list which is a value under the key "results". So, at first you need to access the value within results which is a python-list.
Then you can use a loop to iterate through each of the elements within the list, and treat each individual element as a regular dictionary object.
results = response_dict["results"]
results = response_dict.get("results", None)
# use any one of the two above, the first one will throw a KeyError if there is no key=="results" the other will return NULL
# this results is now a list according to the data you mentioned.
for item in results:
print(item.get("first_name", None)
# here you can loop through the list of dictionaries and treat each item as a normal dictionary
I need to prepare test which will be comparing content of .json file with expected result (we want to check if values in .json are correctly generated by our dev tool).
For test I will use robot framework or unittests but I don't know yet how to parse correctly json file.
Json example:
{
"Customer": [{
"Information": [{
"Country": "",
"Form": ""
}
],
"Id": "110",
"Res": "",
"Role": "Test",
"Limit": ["100"]
}]
}
So after I execute this:
with open('test_json.json') as f:
hd = json.load(f)
I get dict 'hd' where key is:
dict_keys(['Customer'])
and values:
dict_values([[{'Information': [{'Form': '', 'Country': ''}], 'Role': 'Test', 'Id': '110', 'Res': '', 'Limit': ['100']}]])
My problem is that I don't know how to get to only one value from Dict(e.g: Role: Test), because I can get only extract whole value. I can prepare a long string to compare with but it is not best solution for tests.
Any ideas how I can get to only one row from .json file?
Your JSON has single key 'Customer' and it has a value of list type. So when you ppass dict_keys(['Customer']) you are getting list value.
>>> hd['Customer']
[{'Id': '110', 'Role': 'Test', 'Res': '', 'Information': [{'Form': '', 'Country': ''}], 'Limit': ['100']}]
First element in list:
>>> hd['Customer'][0]
{'Id': '110', 'Role': 'Test', 'Res': '', 'Information': [{'Form': '', 'Country': ''}], 'Limit': ['100']}
Now access inside dict structure using:
>>> hd['Customer'][0]['Role']
'Test'
You can compare the dict that you loaded (say hd) to the expected results dict (say expected_dict) by running
hd.items() == expected_dict.items()
I have a YAML file that parses into an object, e.g.:
{'name': [{'proj_directory': '/directory/'},
{'categories': [{'quick': [{'directory': 'quick'},
{'description': None},
{'table_name': 'quick'}]},
{'intermediate': [{'directory': 'intermediate'},
{'description': None},
{'table_name': 'intermediate'}]},
{'research': [{'directory': 'research'},
{'description': None},
{'table_name': 'research'}]}]},
{'nomenclature': [{'extension': 'nc'}
{'handler': 'script'},
{'filename': [{'id': [{'type': 'VARCHAR'}]},
{'date': [{'type': 'DATE'}]},
{'v': [{'type': 'INT'}]}]},
{'data': [{'time': [{'variable_name': 'time'},
{'units': 'minutes since 1-1-1980 00:00 UTC'},
{'latitude': [{'variable_n...
I'm having trouble accessing the data in python and regularly see the error TypeError: list indices must be integers, not str
I want to be able to access all elements corresponding to 'name' so to retrieve each data field I imagine it would look something like:
import yaml
settings_stream = open('file.yaml', 'r')
settingsMap = yaml.safe_load(settings_stream)
yaml_stream = True
print 'loaded settings for: ',
for project in settingsMap:
print project + ', ' + settingsMap[project]['project_directory']
and I would expect each element would be accessible via something like ['name']['categories']['quick']['directory']
and something a little deeper would just be:
['name']['nomenclature']['data']['latitude']['variable_name']
or am I completely wrong here?
The brackets, [], indicate that you have lists of dicts, not just a dict.
For example, settingsMap['name'] is a list of dicts.
Therefore, you need to select the correct dict in the list using an integer index, before you can select the key in the dict.
So, giving your current data structure, you'd need to use:
settingsMap['name'][1]['categories'][0]['quick'][0]['directory']
Or, revise the underlying YAML data structure.
For example, if the data structure looked like this:
settingsMap = {
'name':
{'proj_directory': '/directory/',
'categories': {'quick': {'directory': 'quick',
'description': None,
'table_name': 'quick'}},
'intermediate': {'directory': 'intermediate',
'description': None,
'table_name': 'intermediate'},
'research': {'directory': 'research',
'description': None,
'table_name': 'research'},
'nomenclature': {'extension': 'nc',
'handler': 'script',
'filename': {'id': {'type': 'VARCHAR'},
'date': {'type': 'DATE'},
'v': {'type': 'INT'}},
'data': {'time': {'variable_name': 'time',
'units': 'minutes since 1-1-1980 00:00 UTC'}}}}}
then you could access the same value as above with
settingsMap['name']['categories']['quick']['directory']
# quick