API call checking if certain data is retrieved - python

I am making 100 API calls for 100 different cities, and the response usually comes in this form:
{'data': {'aqi': 13,
          'attributions': [{'name': 'Air Quality Ontario - the Ontario '
                                    'Ministry of the Environment and Climate '
                                    'Change',
                            'url': 'http://www.airqualityontario.com/'},
                           {'name': 'World Air Quality Index Project',
                            'url': 'https://waqi.info/'}],
          'city': {'geo': [43.653226, -79.3831843],
                   'name': 'Toronto',
                   'url': 'https://aqicn.org/city/toronto'},
          'debug': {'sync': '2019-06-04T15:37:48+09:00'},
          'dominentpol': 'pm25',
          'iaqi': {'co': {'v': 1.7},
                   'no2': {'v': 15.2},
                   'o3': {'v': 8.8},
                   'p': {'v': 1018.3},
                   'pm25': {'v': 13},
                   'so2': {'v': 0.2},
                   't': {'v': 11.6},
                   'w': {'v': 0.2}},
          'idx': 5914},
 'status': 'ok'}
However, ['data']['iaqi'] sometimes lacks one of co, no2, o3, etc. While looping through the 100 cities and performing the API calls, I want to check whether each pollutant is present and append "na" if it is not.
I am doing it with try and except like this:
cities = []
aqi = []
# 5 pollutants used to calculate AQI
CO = []
NO2 = []
SO2 = []
pm25 = []

for city in canadian_cities:
    city_name = city
    url = f'https://api.waqi.info/feed/{city}/?token={api_key}'
    response = requests.get(url).json()
    if response["status"] == "ok":
        # sometimes aqi might not be a number, exclude those
        print("yes")
        if isinstance(response["data"]["aqi"], int):
            # append aqi and city name to the appropriate lists
            aqi.append(response["data"]["aqi"])
            cities.append(city)
            # append pollutants individually
            try:
                CO.append(response["data"]["iaqi"]["co"]["v"])
            except KeyError:
                CO.append("na")
            try:
                NO2.append(response["data"]["iaqi"]["no2"]["v"])
            except KeyError:
                NO2.append("na")
            try:
                SO2.append(response["data"]["iaqi"]["so2"]["v"])
            except KeyError:
                SO2.append("na")
            pm25.append(response["data"]["iaqi"]["pm25"]["v"])
This works fine, but it does not seem efficient; I am wondering if there is a cleaner way to do this. Thanks!

Instead of keeping your pollutants as separate lists, keep a dictionary like this:
pollutants = {"co": [], "no2": [], "so2": [], "pm25": []}
If you make sure your keys match what you would expect from the API, you can now do this:
for item in pollutants:
    if item in response["data"]["iaqi"]:
        pollutants[item].append(response["data"]["iaqi"][item]["v"])
    else:
        pollutants[item].append('na')
but your way is perfectly fine too, to be honest.

You can use the get method instead, e.g.:
<your_variable>.append(data["iaqi"].get("co", {}).get("v", "na"))
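A minimal sketch of that idea applied to the loop from the question, assuming response is the parsed JSON shown above and the same CO/NO2/SO2/pm25 lists:
iaqi = response["data"]["iaqi"]
# .get() returns {} when a pollutant key is missing, so the chained
# .get("v", "na") falls back to "na" instead of raising a KeyError
CO.append(iaqi.get("co", {}).get("v", "na"))
NO2.append(iaqi.get("no2", {}).get("v", "na"))
SO2.append(iaqi.get("so2", {}).get("v", "na"))
pm25.append(iaqi.get("pm25", {}).get("v", "na"))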

Choosing better data structures can make it way better:
from collections import defaultdict

cities = defaultdict(dict)
data_points = ["co", "no2", "o3", "pm25"]

for city in canadian_cities:
    url = f'https://api.waqi.info/feed/{city}/?token={api_key}'
    response = requests.get(url).json()
    if response["status"] == "ok":
        # sometimes aqi might not be a number, exclude those
        print("yes")
        if isinstance(response["data"]["aqi"], int):
            # store the aqi and each pollutant under the city name
            cities[city]['aqi'] = response["data"]["aqi"]
            for data_point in data_points:
                cities[city][data_point] = response["data"]["iaqi"].get(data_point, dict()).get("v", "na")
print(cities)
I would even go a bit further and refactor it like this:
from collections import defaultdict, namedtuple

cities = defaultdict(dict)
data_points = ['aqi', "co", "no2", "o3", "pm25"]
defaults = ['na', 'na', 'na', 'na', 'na']
AQI = namedtuple('AQI', field_names=data_points, defaults=defaults)

def get_data(city):
    url = f'https://api.waqi.info/feed/{city}/?token={api_key}'
    response = requests.get(url).json()
    if response["status"] == "ok":
        return response

def parse_json(data):
    raw_data = {key: value['v'] for key, value in data["iaqi"].items() if key in data_points[1:]}
    raw_data['aqi'] = data["aqi"]
    return AQI(**raw_data)

for city in canadian_cities:
    data = get_data(city)
    if data and isinstance(data["data"]["aqi"], int):
        cities[city] = parse_json(data=data["data"])
print(cities)
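A small usage note on the result (the 'toronto' key is hypothetical, assuming the loop above ran and that city returned a valid response): each value is an AQI namedtuple, so missing pollutants fall back to the 'na' defaults and fields are attribute-accessible.
toronto = cities['toronto']
print(toronto.aqi, toronto.pm25)   # attribute access on the namedtuple
print(toronto._asdict())           # convert back to a plain dict if needed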

Related

How to extract a couple of fields nested in response using python

I'm a Python beginner. I would like to ask for help with retrieving the response data. Here's my script:
import pandas as pd
import re
import time
import requests
import json

response = requests.get(url, headers=headers, auth=auth)
data = response.json()
Here's part of the JSON response:
{'result': [{'display': '',
             'closure_code': '',
             'service_offer': 'Integration Platforms',
             'updated_on': '2022-04-23 09:05:53',
             'urgency': '2',
             'business_service': 'Operations',
             'updated_by': 'serviceaccount45',
             'description': 'ALERT returned 400 but expected 200',
             'sys_created_on': '2022-04-23 09:05:53',
             'sys_created_by': 'serviceaccount45',
             'subcategory': 'Integration',
             'contact_type': 'Email',
             'problem_type': 'Design: Availability',
             'caller_id': '',
             'action': 'create',
             'company': 'aaaa',
             'priority': '3',
             'status': '1',
             'opened': 'smith.j',
             'assigned_to': 'doe.j',
             'number': '123456',
             'group': 'blabla',
             'impact': '2',
             'category': 'Business Application & Databases',
             'caused_by_change': '',
             'location': 'All Locations',
             'configuration_item': 'Monitor',
             },
I would like to extract the data only for one group = 'blabla'. Then I would like to extract fields such as:
number = data['number']
group = data['group']
service_offer = data['service_offer']
updated = data['updated_on']
urgency = data['urgency']
username = data['created_by']
short_desc = data['description']
How should this be done?
I know that to check the first value I should use:
service_offer = data['result'][0]['service_offer']
I've tried to create a dictionary, but I'm getting an error:
data_result = response.json()['result']
payload = {
    'number': data_result['number'],
    'group': data_result['group'],
    'service_offer': data_result['service_offer'],
    'updated': data_result['updated_on'],
    'urgency': data_result['urgency'],
    'username': data_result['created_by'],
    'short_desc': data_result['description']
}
TypeError: list indices must be integers or slices, not str:
So I've started to create something like the code below, but I'm stuck:
get_data = []
if len(data) > 0:
    for item in range(len(data)):
        get_data.append(data[item])
May I ask for help?
If data is your decoded JSON response from the question, then you can do:
# find group `blabla` in result:
g = next(d for d in data["result"] if d["group"] == "blabla")
# get data from the `blabla` group:
number = g["number"]
group = g["group"]
service_offer = g["service_offer"]
updated = g["updated_on"]
urgency = g["urgency"]
username = g["sys_created_by"]
short_desc = g["description"]
print(number, group, service_offer, updated, urgency, username, short_desc)
Prints:
123456 blabla Integration Platforms 2022-04-23 09:05:53 2 serviceaccount45 ALERT returned 400 but expected 200
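If there's a chance the group isn't present at all, the same next() call can take a default so it returns None instead of raising StopIteration (a small optional variation, not something the original question required):
g = next((d for d in data["result"] if d["group"] == "blabla"), None)
if g is None:
    print("group 'blabla' not found")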

How to check each key separately from a list in a loop, without creating multiple loops, when some keys may raise a KeyError

I wrote code that takes 9 keys from an API.
The authors, isbn_one, isbn_two, thumbinail and page_count fields may not always be retrievable, and if any of them are missing I would like the value to be None. Unfortunately, plain if statements, even nested ones, don't work well here, because they lead to a lot of loops. I also tried try and except KeyError, etc., but each key raises a different error and it is not known which one to assign None to. Here is an example of the logic for when a photo is missing:
th = result['volumeInfo'].get('imageLinks')
if th is not None:
    book_exists_thumbinail = {
        'thumbinail': result['volumeInfo']['imageLinks']['thumbnail']
    }
    dnew = {**book_data, **book_exists_thumbinail}
    book_import.append(dnew)
else:
    book_exists_thumbinail_n = {
        'thumbinail': None
    }
    dnew_none = {**book_data, **book_exists_thumbinail_n}
    book_import.append(dnew_none)
With this logic, once one condition is met (e.g. for thumbinail), the rest is not even checked. When I use try and except, it's similar. There's also an ISBN among the keys, but there the dictionary contains a list, and I need to use something like this:
isbn_zer = result['volumeInfo']['industryIdentifiers']
dic = collections.defaultdict(list)
for d in isbn_zer:
    for k, v in d.items():
        dic[k].append(v)
Output data: [{'type': 'ISBN_10', 'identifier': '8320717507'}, {'type': 'ISBN_13', 'identifier': '9788320717501'}]
I don't know what else to use to check each key separately and, when it is absent or one ISBN (identifier) is missing, assign the value None. I have already tried many ideas.
The rest of the code:
book_import = []
if request.method == 'POST':
    filter_ch = BookFilterForm(request.POST)
    if filter_ch.is_valid():
        cd = filter_ch.cleaned_data
        filter_choice = cd['choose_v']
        filter_search = cd['search']
        search_url = "https://www.googleapis.com/books/v1/volumes?"
        params = {
            'q': '{}{}'.format(filter_choice, filter_search),
            'key': settings.BOOK_DATA_API_KEY,
            'maxResults': 2,
            'printType': 'books'
        }
        r = requests.get(search_url, params=params)
        results = r.json()['items']
        for result in results:
            book_data = {
                'title': result['volumeInfo']['title'],
                'authors': result['volumeInfo']['authors'][0],
                'publish_date': result['volumeInfo']['publishedDate'],
                'isbn_one': result['volumeInfo']['industryIdentifiers'][0]['identifier'],
                'isbn_two': result['volumeInfo']['industryIdentifiers'][1]['identifier'],
                'page_count': result['volumeInfo']['pageCount'],
                'thumbnail': result['volumeInfo']['imageLinks']['thumbnail'],
                'country': result['saleInfo']['country']
            }
            book_import.append(book_data)
else:
    filter_ch = BookFilterForm()
return render(request, "BookApp/book_import.html", {'book_import': book_import,
                                                    'filter_ch': filter_ch})
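Not from the original post, but as a rough sketch of one way to avoid the KeyErrors with chained dict.get calls (assuming the Google Books response shape used above; the names mirror the snippet, nothing else is guaranteed):
volume = result.get('volumeInfo', {})
identifiers = volume.get('industryIdentifiers', [])

book_data = {
    'title': volume.get('title'),
    # authors may be missing entirely, so fall back to a one-element list of None
    'authors': (volume.get('authors') or [None])[0],
    'publish_date': volume.get('publishedDate'),
    'isbn_one': identifiers[0]['identifier'] if len(identifiers) > 0 else None,
    'isbn_two': identifiers[1]['identifier'] if len(identifiers) > 1 else None,
    'page_count': volume.get('pageCount'),
    'thumbnail': volume.get('imageLinks', {}).get('thumbnail'),
    'country': result.get('saleInfo', {}).get('country'),
}
book_import.append(book_data)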

Using regex to search for text that follows a specific word

I am searching a string of text which contains dictionaries that look like so:
soup_string = """{"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:00+01:00/13:00+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"full","slotDiscountedByDP":"false","slotId":"1hr-12-13-20190613","time":"12:00pm - 1:00pm","rawSlotPrice":"","slotDiscounted":"false"},
{"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:30+01:00/13:30+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"available","slotDiscountedByDP":"false","slotId":"1hr-12:30-13:30-20190613","time":"12:30pm - 1:30pm","rawSlotPrice":"","slotDiscounted":"false"}"""
I am looking to return the string which follows each key in the 'dictionaries'.
I have decided that an appropriate method is to use regular expressions. I can return the costs and times using:
Costs = re.findall(r"\£[0-9]\.[0-9][0-9]", soup_string)
times = re.findall(r'\"(time)\"\:\"(.{14,16})\"\,', soup_string)
Essentially I would like to be able to look for each key in the dictionary, search for a specific string, and return the value.
The end goal is to create a dictionary with the 'Cost', 'Availability' and 'time'.
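As a rough illustration of that per-key idea (assuming the same soup_string as above; the helper name is made up for this sketch), a non-greedy group after the key name captures each value:
import re

# hypothetical helper: pull every value for a given key out of the raw string
def values_for(key, text):
    return re.findall(r'"{}":"(.*?)"'.format(key), text)

costs = values_for("cost", soup_string)                  # ['£2.00', '£2.00']
availability = values_for("availability", soup_string)   # ['full', 'available']
times = values_for("time", soup_string)                  # ['12:00pm - 1:00pm', '12:30pm - 1:30pm']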
Full code:
import requests
from bs4 import BeautifulSoup
import json

postcode = "L4 0TH"
ASDA_url = "https://groceries.asda.com/api/user/checkpostcode?postcode=" + postcode + "&requestorigin=gi"
ASDA_url2 = "https://groceries.asda.com/api/slot/view?startdate=12%2F06%2F2019&deliveryoption=homedelivery&requestorigin=gi&_="

client = requests.Session()
r = client.get(ASDA_url)
r2 = client.get(ASDA_url2)

soup = BeautifulSoup(r2.text, 'html.parser')
soup_string = str(soup)

soup_dicts = json.loads('[' + soup_string + ']')
keep_keys = ('cost', 'availability', 'time')
filtered = [{k: soup_dict[k] for k in keep_keys} for soup_dict in soup_dicts]
Given that you have multiple dictionaries, I'm not exactly sure what you're trying to obtain, but from my understanding this should help:
import json
soup_string = ''' ... ''' # As it is in the question
soup_dicts = json.loads('[' + soup_string + ']')
keep_keys = ('cost', 'availability', 'time')
filtered = [{k:soup_dict[k] for k in keep_keys} for soup_dict in soup_dicts]
It treats your string of dictionaries as a list of JSON dictionaries, and uses the json module to parse it. Then it filters out everything except the key/value pairs you need. The result is a list of the filtered dictionaries.
Output (i.e. value of filtered):
[
{'cost': '£2.00', 'availability': 'full', 'time': '12:00pm - 1:00pm'},
{'cost': '£2.00', 'availability': 'available', 'time': '12:30pm - 1:30pm'}
]
EDIT:
In response to you providing your code, I can see that you're calling str on the results from BeautifulSoup. Rather than doing that, you can just process the client.get() results directly:
import json
import requests
postcode = "L4 0TH"
ASDA_url = "https://groceries.asda.com/api/user/checkpostcode?postcode="+ postcode + "&requestorigin=gi"
ASDA_url2 = "https://groceries.asda.com/api/slot/view?startdate=12%2F06%2F2019&deliveryoption=homedelivery&requestorigin=gi&_="
client = requests.Session()
r = client.get(ASDA_url)
r2 = client.get(ASDA_url2)
dicts = r2.json()['slotHeader'][0]['slots']
keep_keys = ('cost', 'availability', 'time')
filtered = [{k:d[k] for k in keep_keys} for d in dicts]
First you need to put your data into a list and wrap it in a dictionary under a "data" key (see my example below). Then use json to parse it into a dictionary of dictionaries, and extract cost, availability and time for each dictionary in a loop.
import json
soup_string = """{"data": [{"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:00+01:00/13:00+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"full","slotDiscountedByDP":"false","slotId":"1hr-12-13-20190613","time":"12:00pm - 1:00pm","rawSlotPrice":"","slotDiscounted":"false"}, {"loadType":"","shiftId":"ROVR-DUMMY-SHIFTID","carbonFriendly":"no","cost":"£2.00","initialSlotPrice":"","timeSlotISO":"2019-06-13T12:30+01:00/13:30+01:00","isSameDayPremium":"false","stopId":"10446315588190612134701380","availability":"available","slotDiscountedByDP":"false","slotId":"1hr-12:30-13:30-20190613","time":"12:30pm - 1:30pm","rawSlotPrice":"","slotDiscounted":"false"}]}"""
d = json.loads(soup_string)
result = []
for data in d['data']:
    tmp = {}
    tmp['Cost'] = data['cost']
    tmp['Availability'] = data['availability']
    tmp['Time'] = data['time']
    result.append(tmp)
result
Output:
[{'Cost': '£2.00', 'Availability': 'full', 'Time': '12:00pm - 1:00pm'},
{'Cost': '£2.00', 'Availability': 'available', 'Time': '12:30pm - 1:30pm'}]

Converting deeply nested JSON response from an API call to pandas dataframe

I am currently having trouble parsing a deeply nested JSON response from an HTTP API call.
My JSON response looks like this:
{'took': 476,
 '_revision': 'r08badf3',
 'response': {'accounts': {'hits': [{'name': '4002238760',
                                     'display_name': 'Googleglass-4002238760',
                                     'selected_fields': ['Googleglass',
                                                         'DDMonkey',
                                                         'Papu New Guinea',
                                                         'Jonathan Vardharajan',
                                                         '4002238760',
                                                         'DDMadarchod-INSTE',
                                                         None,
                                                         'Googleglass',
                                                         '0001012556',
                                                         'CC',
                                                         'Setu Non Standard',
                                                         '40022387',
                                                         320142,
                                                         4651321321333,
                                                         1324650651651]},
                                    {'name': '4003893720',
                                     'display_name': 'Swift-4003893720',
                                     'selected_fields': ['Swift',
                                                         'DDMonkey',
                                                         'Papu New Guinea',
                                                         'Jonathan Vardharajan',
                                                         '4003893720',
                                                         'DDMadarchod-UPTM-RemotexNBD',
                                                         None,
                                                         'S.W.I.F.T. SCRL',
                                                         '0001000110',
                                                         'SE',
                                                         'Setu Non Standard',
                                                         '40038937',
                                                         189508,
                                                         1464739200000,
                                                         1559260800000]},
After I receive the response I store it and run it through json_normalize:
data = response.json()
data = data['response']['accounts']['hits']
data = json_normalize(data)
However, after normalizing, the resulting dataframe is still not flattened the way I want.
My curl statement looks like this:
curl --data 'query= {"terms":[{"type":"string_attribute","attribute":"Account Type","query_term_id":"account_type","in_list":["Contract"]},{"type":"string","term":"status_group","in_list":["paying"]},{"type":"string_attribute","attribute":"Region","in_list":["DDEU"]},{"type":"string_attribute","attribute":"Country","in_list":["Belgium"]},{"type":"string_attribute","attribute":"CSM Tag","in_list":["EU CSM"]},{"type":"date_attribute","attribute":"Contract Renewal Date","gte":1554057000000,"lte":1561833000000}],"count":1000,"offset":0,"fields":[{"type":"string_attribute","attribute":"DomainName","field_display_name":"Client Name"},{"type":"string_attribute","attribute":"Region","field_display_name":"Region"},{"type":"string_attribute","attribute":"Country","field_display_name":"Country"},{"type":"string_attribute","attribute":"Success Manager","field_display_name":"Client Success Manager"},{"type":"string","term":"identifier","field_display_name":"Account id"},{"type":"string_attribute","attribute":"DeviceSLA","field_display_name":"[FIN] Material Part Number"},{"type":"string_attribute","attribute":"SFDCAccountId","field_display_name":"SFDCAccountId"},{"type":"string_attribute","attribute":"Client","field_display_name":"[FIN] Client Sold-To Name"},{"type":"string_attribute","attribute":"Sold To Code","field_display_name":"[FIN] Client Sold To Code"},{"type":"string_attribute","attribute":"BU","field_display_name":"[FIN] Active BUs"},{"type":"string_attribute","attribute":"Service Type","field_display_name":"[FIN] Service Type"},{"type":"string_attribute","attribute":"Contract Header ID","field_display_name":"[FIN] SAP Contract Header ID"},{"type":"number_attribute","attribute":"Contract Value","field_display_name":"[FIN] ACV - Annual Contract Value","desc":true},{"type":"date_attribute","attribute":"Contract Start Date","field_display_name":"[FIN] Contract Start Date"},{"type":"date_attribute","attribute":"Contract Renewal Date","field_display_name":"[FIN] Contract Renewal Date"}],"scope":"all"}' --header 'app-token:YOUR-TOKEN-HERE' 'https://app.totango.com/api/v1/search/accounts'
So ultimately I want to store the response in a dataframe along with the field names.
I've had to do this sort of thing a few times in the past (flatten out a nested JSON). I'll explain my process, and you can see if it works, or at least rework the code a bit to fit your needs.
1) Take the data response and completely flatten it out using a function. This blog was very helpful when I first had to do this.
2) Then iterate through the flat dictionary that was created, to find where each row and column needs to be created, using the numbering embedded in the new key names from the nested parts. There are also keys that are unique/distinct, so they don't have a number to identify them as a "new" row; I account for those in what I called special_cols.
3) As it iterates through those, it pulls the specified row number (embedded in those flat keys) and then constructs the dataframe that way.
It sounds complicated, but if you debug and run it line by line, you'll see how it works. Nonetheless, I believe it should get you what you need.
data = {'took': 476,
        '_revision': 'r08badf3',
        'response': {'accounts': {'hits': [{'name': '4002238760',
                                            'display_name': 'Googleglass-4002238760',
                                            'selected_fields': ['Googleglass',
                                                                'DDMonkey',
                                                                'Papu New Guinea',
                                                                'Jonathan Vardharajan',
                                                                '4002238760',
                                                                'DDMadarchod-INSTE',
                                                                None,
                                                                'Googleglass',
                                                                '0001012556',
                                                                'CC',
                                                                'Setu Non Standard',
                                                                '40022387',
                                                                320142,
                                                                4651321321333,
                                                                1324650651651]},
                                           {'name': '4003893720',
                                            'display_name': 'Swift-4003893720',
                                            'selected_fields': ['Swift',
                                                                'DDMonkey',
                                                                'Papu New Guinea',
                                                                'Jonathan Vardharajan',
                                                                '4003893720',
                                                                'DDMadarchod-UPTM-RemotexNBD',
                                                                None,
                                                                'S.W.I.F.T. SCRL',
                                                                '0001000110',
                                                                'SE',
                                                                'Setu Non Standard',
                                                                '40038937',
                                                                189508,
                                                                1464739200000,
                                                                1559260800000]}]}}}
import pandas as pd
import re

def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
flat = flatten_json(data)
# flat keys look like 'response_accounts_hits_0_name' or
# 'response_accounts_hits_0_selected_fields_3': the first number is the row,
# the rest (with underscores removed) becomes the column name. Keys without a
# number, such as 'took' and '_revision', are collected as special_cols and
# broadcast to every row.

results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())

for item in columns_list:
    try:
        row_idx = re.findall(r'\_(\d+)\_', item)[0]
    except:
        special_cols.append(item)
        continue
    column = re.findall(r'\_\d+\_(.*)', item)[0]
    column = column.replace('_', '')
    row_idx = int(row_idx)
    value = flat[item]
    results.loc[row_idx, column] = value

for item in special_cols:
    results[item] = flat[item]
Output:
print (results.to_string())
name displayname selectedfields0 selectedfields1 selectedfields2 selectedfields3 selectedfields4 selectedfields5 selectedfields6 selectedfields7 selectedfields8 selectedfields9 selectedfields10 selectedfields11 selectedfields12 selectedfields13 selectedfields14 took _revision
0 4002238760 Googleglass-4002238760 Googleglass DDMonkey Papu New Guinea Jonathan Vardharajan 4002238760 DDMadarchod-INSTE NaN Googleglass 0001012556 CC Setu Non Standard 40022387 320142.0 4.651321e+12 1.324651e+12 476 r08badf3
1 4003893720 Swift-4003893720 Swift DDMonkey Papu New Guinea Jonathan Vardharajan 4003893720 DDMadarchod-UPTM-RemotexNBD NaN S.W.I.F.T. SCRL 0001000110 SE Setu Non Standard 40038937 189508.0 1.464739e+12 1.559261e+12 476 r08badf3
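For comparison, a shorter sketch using pandas.json_normalize plus a list-expansion step can get a similar shape (assuming the same data dict as above and a reasonably recent pandas; this is not the answer's method, just an alternative):
import pandas as pd

hits = data['response']['accounts']['hits']
df = pd.json_normalize(hits)   # columns: name, display_name, selected_fields (a list per row)
fields = pd.DataFrame(df['selected_fields'].tolist()).add_prefix('selected_fields_')
flat_df = pd.concat([df.drop(columns='selected_fields'), fields], axis=1)
print(flat_df)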

How to add a key-value pair in dictionary for dumping data in JSON format in python?

I want to add a key-value pair to an already existing array of key-value pairs, and then dump this information in JSON format.
I tried the following code:
import json

student_data = [{'stu_name': 'name', 'id no': 7}]

if result is 1:
    student_data['result'] = 'pass'
else:
    student_data['result'] = 'fail'

if school is 1:
    student_data['school'] = 'secondary school'
else:
    student_data['school'] = 'primary school'

with open(file.json, "w") as f:
    json.dump(student_data, f)
But this code gives me an error on the line student_data['result'] = 'pass'.
I tried removing the [] from student_data = [{'stu_name':'name','id no':7}],
but then only the keys get printed in the file, without the values.
How can I correct this?
You have a list with a dictionary. Either use indexing:
student_data[0]['result'] = 'pass'
or add the list later, when writing:
student_data = {'stu_name': 'name', 'id no': 7}
# ...
with open(file.json, "w") as f:
    json.dump([student_data], f)
Note: Do not use identity tests for integers when you should be testing for equality instead. Just because CPython happens to intern small integers, doesn't make using is 1 a good idea. Use == 1 instead:
student_data = {'stu_name': 'name', 'id no': 7}
student_data['result'] = 'pass' if result == 1 else 'fail'
student_data['school'] = 'secondary school' if school == 1 else 'primary school'

with open(file.json, "w") as f:
    json.dump([student_data], f)
In the above example I used conditional expressions to set the result and school keys; you can use those directly in the dictionary literal too:
student_data = [{'stu_name': 'name', 'id no': 7,
                 'result': 'pass' if result == 1 else 'fail',
                 'school': 'secondary school' if school == 1 else 'primary school',
                 }]

with open(file.json, "w") as f:
    json.dump(student_data, f)
If you are treating student_data as a dictionary, then you can try something like this to update it. Removing the [] from student_data turns it into a dict object:
>>> student_data = {'stu_name':'name','id no':7}
>>> student_data.update({'result':'pass'})
>>> student_data
{'stu_name': 'name', 'id no': 7, 'result': 'pass'}
>>>
Or You can just assign it:
>>> student_data = {'stu_name':'name','id no':7}
>>> student_data['result'] = 'pass'
>>> student_data
{'stu_name': 'name', 'id no': 7, 'result': 'pass'}
>>>
