How can I filter API GET Request on multiple variables? - python

I am really struggling with this one. I'm new to Python and I'm trying to extract data from an API.
I have managed to run the script below, but I need to amend it to filter on multiple values for one column, let's say England and Scotland. Is there an equivalent to the SQL IN operator, e.g. Area_Name IN ('England','Scotland')?
from requests import get
from json import dumps
ENDPOINT = "https://api.coronavirus.data.gov.uk/v1/data"
AREA_TYPE = "nation"
AREA_NAME = "england"
filters = [
    f"areaType={ AREA_TYPE }",
    f"areaName={ AREA_NAME }"
]
structure = {
    "date": "date",
    "name": "areaName",
    "code": "areaCode",
    "dailyCases": "newCasesByPublishDate",
}
api_params = {
    "filters": str.join(";", filters),
    "structure": dumps(structure, separators=(",", ":")),
    "latestBy": "cumCasesByPublishDate"
}
formats = [
    "json",
    "xml",
    "csv"
]
for fmt in formats:
    api_params["format"] = fmt
    response = get(ENDPOINT, params=api_params, timeout=10)
    assert response.status_code == 200, f"Failed request for {fmt}: {response.text}"
    print(f"{fmt} data:")
    print(response.content.decode())
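For reference, the filters value this script sends is just the key=value pairs joined with semicolons, which can be verified without any network call:

```python
AREA_TYPE = "nation"
AREA_NAME = "england"

# Same construction as the script above; str.join(";", filters) and
# ";".join(filters) are equivalent.
filters = [f"areaType={AREA_TYPE}", f"areaName={AREA_NAME}"]
filter_string = ";".join(filters)
print(filter_string)  # areaType=nation;areaName=england
```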

I have tried the script, and a dict is the easiest type to work with in this case.
Given your json data output
data = {"length":1,"maxPageLimit":1,"data":[{"date":"2020-09-17","name":"England","code":"E92000001","dailyCases":2788}],"pagination":{"current":"/v1/data?filters=areaType%3Dnation%3BareaName%3Dengland&structure=%7B%22date%22%3A%22date%22%2C%22name%22%3A%22areaName%22%2C%22code%22%3A%22areaCode%22%2C%22dailyCases%22%3A%22newCasesByPublishDate%22%7D&latestBy=cumCasesByPublishDate&format=json&page=1","next":null,"previous":null,"first":"/v1/data?filters=areaType%3Dnation%3BareaName%3Dengland&structure=%7B%22date%22%3A%22date%22%2C%22name%22%3A%22areaName%22%2C%22code%22%3A%22areaCode%22%2C%22dailyCases%22%3A%22newCasesByPublishDate%22%7D&latestBy=cumCasesByPublishDate&format=json&page=1","last":"/v1/data?filters=areaType%3Dnation%3BareaName%3Dengland&structure=%7B%22date%22%3A%22date%22%2C%22name%22%3A%22areaName%22%2C%22code%22%3A%22areaCode%22%2C%22dailyCases%22%3A%22newCasesByPublishDate%22%7D&latestBy=cumCasesByPublishDate&format=json&page=1"}}
You can try something like this:
countries = ['England', 'France', 'Whatever']
filtered = [entry for entry in data['data'] if entry['name'] in countries]
(Note it is if, not where, and the iteration has to go over the inner data list, not the outer dict.)
I presume the data list is the only interesting key in the data dict since all others do not have any meaningful values.
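As far as I know the filters parameter has no IN-style operator, so client-side filtering is the practical route. A runnable sketch of just the filtering step (the network call is omitted, and the Scotland row is invented for illustration):

```python
# Payload in the shape the API returns (the England record matches the
# sample output above; the Scotland record is made up).
payload = {
    "length": 2,
    "data": [
        {"date": "2020-09-17", "name": "England", "code": "E92000001", "dailyCases": 2788},
        {"date": "2020-09-17", "name": "Scotland", "code": "S92000003", "dailyCases": 290},
    ],
}

wanted = {"England", "Scotland"}  # SQL: Area_Name IN ('England', 'Scotland')
rows = [row for row in payload["data"] if row["name"] in wanted]
print([r["name"] for r in rows])
```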

Use of parameters dictionary with Python requests GET method

Trying to retrieve data via the EIA data API (v2): https://www.eia.gov/opendata/documentation.php.
I'm able to use the API dashboard to return data:
https://www.eia.gov/opendata/browser/electricity/retail-sales?frequency=monthly&data=price;revenue;sales;&start=2013-01
But when I attempt to retrieve the data within Python using the attached documentation, I don't appear to get any values back when using the same parameters.
import requests

url = 'https://api.eia.gov/v2/electricity/retail-sales/data/?api_key=' + API_KEY
params = {
    "frequency": "monthly",
    "data": [
        "revenue",
        "sales",
        "price"
    ],
    "start": "2013-01"
}
x = requests.get(url, params=params)  # send the GET request
if x.status_code == 200:
    print('Success')
else:
    print('Failed')
res = x.json()['response']
data = res['data']
If I print the url created by the GET method and compare it to the API url included in the dashboard, the issue appears to be in the way the GET method passes items from the data parameter:
Works
https://api.eia.gov/v2/electricity/retail-sales/data/?frequency=monthly&data[0]=price&data[1]=revenue&data[2]=sales&start=2013-01&sort[0][column]=period&sort[0][direction]=desc&offset=0&length=5000
Doesn't work (returned by GET method):
https://api.eia.gov/v2/electricity/retail-sales/data/?api_key=MY_API&frequency=monthly&data=revenue&data=sales&data=price&start=2013-01
Can anyone provide guidance on how to coerce the GET method to pass my data parameters the same way the API dashboard appears to?
Your data in params is not formatted correctly in the url. Try this if you want the url to be formed as in your working version:
url = 'https://api.eia.gov/v2/electricity/retail-sales/data/?api_key=' + API_KEY
data = [
    "revenue",
    "sales",
    "price"
]
params = {
    "frequency": "monthly",
    "start": "2013-01"
}
# build the indexed keys: data[0], data[1], data[2], ...
for index, value in enumerate(data):
    params[f"data[{index}]"] = value
response = requests.get(url, params=params)
But if the server accepts PHP-style array parameters, square brackets in the name of the data[] parameter are enough:
url = 'https://api.eia.gov/v2/electricity/retail-sales/data/?api_key=' + API_KEY
params = {
    "frequency": "monthly",
    "data[]": [
        "revenue",
        "sales",
        "price"
    ],
    "start": "2013-01"
}
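The difference between the two encodings can be seen with the standard library alone, no network needed; requests encodes a list value the same way urlencode with doseq=True does, repeating the key:

```python
from urllib.parse import urlencode

data = ["price", "revenue", "sales"]

# What requests produces for a plain list value: the key is repeated.
repeated = urlencode({"data": data}, doseq=True)
print(repeated)  # data=price&data=revenue&data=sales

# Indexed keys, as the EIA dashboard builds them (brackets get percent-encoded).
indexed = urlencode({f"data[{i}]": v for i, v in enumerate(data)})
print(indexed)  # data%5B0%5D=price&data%5B1%5D=revenue&data%5B2%5D=sales
```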

How to extract and print a list of values from my JSON data?

I'm working in Python (3.8) and I've successfully called an API and gotten it to print the JSON in the command line after running the Python file. Now I want to be able to print a particular list of information (like all of the names from the JSON), and later on save that list as its own set of data, but I'm hitting a block.
Example JSON I'm working with:
{
    "data": {
        "employees": [
            {
                "fields": {
                    "name": "Buddy",
                    "superheroName": "Syndrome",
                    "workEmail": "syndrome#example.com"
                }
            },
            {
                "fields": {
                    "name": "Helen Parr",
                    "superheroName": "Elastigirl",
                    "workEmail": "elastigirl#example.com"
                }
            }
        ]
    }
}
I've tried the following so far, and I was able to get "data" to print, but any time I try to print another "layer" and get to, say, "employees" or even "fields", I hit a wall.
import requests
import json

url = "my API url"
response = requests.get(url)
if response.status_code != 200:
    print('Error with status code {}'.format(response.status_code))
    exit()
jsonResponse = response.json()
jsonPretty = json.dumps(jsonResponse, indent=4, sort_keys=True)
jsonDictionary = json.loads(jsonPretty)
keys = jsonDictionary.keys()
for key in jsonDictionary.keys():
    print(key)
Ideally, could someone share insight into how I can access the 'name' JSON values and get Python to print them as a list like the following, for example:
Buddy
Helen Parr
JSON files are basically nested dictionaries. jsonDictionary contains only one key, data, and one entry under it: another dictionary holding the rest of your result.
If you wanted to access the name fields specifically:
employeesDict = jsonDictionary['data']
fieldsDictList = employeesDict['employees']
firstFieldsDict = fieldsDictList[0]
secondFieldsDict = fieldsDictList[1]
firstName = firstFieldsDict['name']
secondName = secondFieldsDict['name']
You can access it like this (make sure it's already a dictionary):
for i in h['data']['employees']:
    print(i['fields']['name'])
This way you can access the names with i['fields']['name']
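Putting it together with the sample JSON above (the API call is replaced by a hardcoded string so the sketch runs on its own):

```python
import json

# A trimmed copy of the sample payload from the question.
raw = """
{"data": {"employees": [
    {"fields": {"name": "Buddy", "superheroName": "Syndrome"}},
    {"fields": {"name": "Helen Parr", "superheroName": "Elastigirl"}}
]}}
"""

jsonDictionary = json.loads(raw)
names = [e["fields"]["name"] for e in jsonDictionary["data"]["employees"]]
for name in names:
    print(name)
```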

Generating specific fields from JSON objects

I have a rather massive JSON object and I'm trying to generate a specific JSON using only certain elements from it.
My current code looks like this:
data = get.req("/v2/users")
data = json.loads(data)
print (type(data)) # This returns <class 'list'>
print (data) # Below for what this returns
all_data = []
for d in data:
    login_value = d['login']
    if login_value.startswith('fe'):
        continue
    # Sending another request with each login from the first request
    s = get.req("/v2/users/" + str(login_value))
    all_data.append(s)
print(all_data)  # Below for what this looks like
Before json.loads, data is a str; after parsing, print(data) returns data like this:
[
    {
        "id": 68663,
        "login": "test1",
        "url": "https://x.com/test"
    },
    {
        "id": 67344,
        "login": "test2",
        "url": "https://x.com/test"
    },
    {
        "id": 66095,
        "login": "hi",
        "url": "https://x.com/test"
    }
]
print(all_data) returns a result similar to this for every user that a request was sent for:
[b'{"id":68663,"email":"x#gmail.com","login":"xg","phone":"hidden","fullname":"xg gx","image_url":"https://imgur.com/random.png","mod":false,"points":5,"activity":0,"groups":[{"skill":"archery"}]}']
And this repeats for every user.
What I'm attempting to do is filtering by a few fields from all those results I received, so the final JSON I have will look something like this
[
    {
        "email": "x#gmail.com",
        "fullname": "xg gf",
        "points": 5,
        "image_url": "https://imgur.com/random.png"
    },
    {
        ... similar JSON for the next user, and so on
    }
]
I feel as if the way I'm iterating over the data might be inefficient, so if you could guide me to a better way it would be wonderful.
To fetch the login values you have to iterate over data at least once, and fetching the details for every user takes one call each, which is exactly what you have done.
After you receive the user details, instead of appending the whole object to the all_data list, just take the fields you need, construct a dict from them, and then append that to all_data.
So your code has time complexity O(n), which is the best possible as far as I understand.
Edit :
For each user you are receiving a byte response like below.
byte_response = [ b'{"id":68663,"email":"x#gmail.com","login":"xg","phone":"hidden","fullname":"xg gx","image_url":"https://imgur.com/random.png","mod":false,"points":5,"activity":0,"groups":[]}']
I'm not sure why you would get the response wrapped in a list [], but if it is, take byte_response[0] so that we have the actual byte data like below.
byte_response = b'{"id":68663,"email":"x#gmail.com","login":"xg","phone":"hidden","fullname":"xg gx","image_url":"https://imgur.com/random.png","mod":false,"points":5,"activity":0,"groups":[]}'
import json

response_decoded = byte_response.decode("utf-8")  # decode the bytes into a str
json_object_in_dict_form = json.loads(response_decoded)  # parse it into a dictionary
and then...
json_object_in_dict_form['take the field u want']
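The decode-then-parse step, runnable end to end with a trimmed version of the byte string above:

```python
import json

byte_response = b'{"id":68663,"email":"x#gmail.com","login":"xg","points":5,"groups":[]}'

# Decode the bytes to a str, then parse the JSON into a dict.
json_object_in_dict_form = json.loads(byte_response.decode("utf-8"))
print(json_object_in_dict_form["email"])
```

As a side note, json.loads also accepts bytes directly on Python 3.6+, so the explicit decode can be skipped there.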
you can write:
data = get.req("/v2/users")
data = json.loads(data)
all_data = []
for d in data:
    ...
    s = get.req("/v2/users/" + str(login_value))
    new_data = {
        'email': s['email'],
        'fullname': s['fullname'],
        'points': s['points'],
        'image_url': s['image_url']
    }
    all_data.append(new_data)
print(all_data)
or you can make it fancy using an array with the fields you need:
data = get.req("/v2/users")
data = json.loads(data)
all_data = []
fields = ['email', 'fullname', 'points', 'image_url']
for d in data:
    ...
    s = get.req("/v2/users/" + str(login_value))
    new_data = dict()
    for field in fields:
        new_data[field] = s[field]
    all_data.append(new_data)
print(all_data)
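The per-user projection can also be written as a dict comprehension. A self-contained sketch with the network call stubbed out by a sample record (field names follow the output shown above):

```python
fields = ["email", "fullname", "points", "image_url"]

# Stand-in for one parsed user-details response.
user = {
    "id": 68663, "email": "x#gmail.com", "login": "xg",
    "fullname": "xg gx", "points": 5,
    "image_url": "https://imgur.com/random.png", "mod": False,
}

# Keep only the wanted fields.
new_data = {field: user[field] for field in fields}
print(new_data)
```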

Handling url parts in a api returned json format

*** I have updated the code at the bottom.
I have a JSON object I'm working with, coming from Azure analytics for an application we built. I'm trying to figure out how to parse the URL that comes back so that just the limit and location keys' data end up in separate columns. The code I'm using is listed here (keys are taken out, as well as the url, because of API keys and tokens):
import requests
import pandas as pd
from urllib.parse import urlparse
from furl import furl
import json

d1 = '<This is where I have the rule for the API>'
querystring = {"timespan": "P7D"}  # gets the last 7 days
headers = { Stuff in here for headers }
response = requests.request("GET", d1, headers=headers, params=querystring)
data = json.loads(response.text)
# then I clean up the stuff in the dataframe
for stuff in data['value']:
    del stuff['count']  # ...(just a list of all the non-needed fields in the JSON)
newstuff = json.dumps(data, indent=2, sort_keys=True)
data2 = json.loads(newstuff)
OK, now here is the part I am having problems with: I want to pull out 3 columns of data from each row, ['request']['url'], ['timestamp'], and ['user']['id'].
I'm pretty sure I need to do a for loop, so I'm doing the following to get the pieces out:
for x in data2['value']:
    time = x['timestamp']
    user = x['user']['id']
    url = furl(x['request']['url'])
    limit = url.args['limit']
    location = url.args['location']
What's happening is that when I try this I'm told limit does not exist for every url. I think I have to do an if/else statement, but I'm not sure how to formulate it. I need to get everything into a dataframe so I can parse it out into a cursor.execute statement, which I know how to do.
What's needed:
1. Get the information in the for loop into a dataframe.
2. Take the url; if it does not have a limit or a location, store None, otherwise put limit in its own column and do the same for location.
Dataframe would look like this
timestamp user limit location
2018-01-01 bob#home.com null
2018-01-01 bill#home.com null
2018-01-01 same#home.com null null
2018-01-02 bob#home.com
The furl documentation has the details on how it parses query arguments.
here is some sample json to test with:
{
    "value": [
        {
            "request": {
                "url": "https://website/testing"
            },
            "timestamp": "2018-09-23T18:32:58.153z",
            "user": {
                "id": ""
            }
        },
        {
            "request": {
                "url": "https://website/testing/limit?location=31737863-c431-e6611-9420-90b11c44c42f"
            },
            "timestamp": "2018-09-23T18:32:58.153z",
            "user": {
                "id": "steve#home.com"
            }
        },
        {
            "request": {
                "url": "https://website/testing/dealanalyzer?limit=57bd5872-3f45-42cf-bc32-72ec21c3b989&location=31737863-c431-e611-9420-90b11c44c42f"
            },
            "timestamp": "2018-09-23T18:32:58.153z",
            "user": {
                "id": "tom#home.com"
            }
        }
    ]
}
import requests
import pandas as pd
from urllib.parse import urlparse
import json
from pandas.io.json import json_normalize
d1 = "https://nowebsite/v1/apps/11111111-2222-2222-2222-33333333333333/events/requests"
querystring = {"timespan":"P7D"}
headers = {
'x-api-key': "xxxxxxxxxxxxxxxxxxxxxxxx",
'Cache-Control': "no-cache",
'Postman-Token': "xxxxxxxxxxxxxxxxxxxx"
}
response = requests.request("GET", d1, headers=headers, params=querystring)
data = json.loads(response.text)
# delete crap out of API GET Request
for stuff in data['value']:
    del stuff['count']
    del stuff['customDimensions']
    del stuff['operation']
    del stuff['session']
    del stuff['cloud']
    del stuff['ai']
    del stuff['application']
    del stuff['client']
    del stuff['id']
    del stuff['type']
    del stuff['customMeasurements']
    del stuff['user']['authenticatedId']
    del stuff['user']['accountId']
    del stuff['request']['name']
    del stuff['request']['success']
    del stuff['request']['duration']
    del stuff['request']['performanceBucket']
    del stuff['request']['resultCode']
    del stuff['request']['source']
    del stuff['request']['id']
newstuff = json.dumps(data, indent=2, sort_keys=True)
#print(newstuff)
# Now it's in a cleaner format to work with
data2 = json.loads(newstuff)
json_normalize(data2['value'])
From here the data is in a pandas dataframe and looks like I want it to.
I just need to know how to use furl to pull the limit and location out of the url on a per-row basis and create new columns called limit and location, as mentioned above.
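furl isn't strictly needed here; the standard library's urllib.parse can pull the query arguments out with a None default when a key is absent. A sketch against the sample URLs above (swap in furl's url.args.get(...) if you prefer furl):

```python
from urllib.parse import urlparse, parse_qs

def extract_args(url):
    # parse_qs returns each value as a list; .get with a default handles
    # URLs where the key is missing entirely.
    qs = parse_qs(urlparse(url).query)
    return {
        "limit": qs.get("limit", [None])[0],
        "location": qs.get("location", [None])[0],
    }

urls = [
    "https://website/testing",
    "https://website/testing/limit?location=31737863-c431-e611-9420-90b11c44c42f",
    "https://website/testing/dealanalyzer?limit=57bd5872-3f45-42cf-bc32-72ec21c3b989&location=31737863-c431-e611-9420-90b11c44c42f",
]
for url in urls:
    print(extract_args(url))
```

The resulting dicts can be collected into a list alongside timestamp and user id and passed straight to pd.DataFrame, giving None (NaN) in the limit/location columns where the url had no query string.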

How to pass nested JSON from dataframe while adding another field?

I have a dataframe that I need to pass as a nested JSON string to an email service provider's API.
My dataframe looks like this:
email_address first_name last_name
a#a.com adam apple
b#b.com bobby banana
The contacts in the dataframe are what I need to pass into the email service provider's API, and this needs to be a nested JSON string like so:
{
"import_data": [{
"email_addresses": ["hector#hector.com"],
"first_name": "Hector",
"last_name": "Smith"
}, {
"email_addresses": ["Jane#Doe.com"],
"first_name": "Jane",
"last_name": "Doe"
}, {
"email_addresses": ["Pradeep#Patel.com"],
"first_name": "Pradeep",
"last_name": "Patel"
}],
"lists": ["1234567890"]
}
I am not sure how I would create a nested JSON string via pandas' to_json command and at the same time insert the key "import_data" into the JSON string as above. I know I can hard-code a column in the dataframe for "lists" and pass that in as well. The list ID will always be static.
Here is code for my API response:
headers = {
'Authorization': '',
'X-Originating-Ip': '',
'Content-Type': '',
}
update_contact = '{"import_data": [{"email_addresses": ["test#test.com"],"first_name": "test","last_name": "test"},{"email_addresses": ["Jane#Doe.com"],"first_name": "Jane","last_name": "Doe"}, {"email_addresses": ["Pradeep#Patel.com"],"first_name": "Pradeep","last_name": "Patel"}],"lists": ["1234567890"]}'
r = requests.post('url', headers=headers ,data = update_contact)
print(r.text)
I believe the API asks for application/json; if that is really the case, you should send it like this:
headers = {}
update_contact = {"import_data": [{"email_addresses": ["test#test.com"],"first_name": "test","last_name": "test"},{"email_addresses": ["Jane#Doe.com"],"first_name": "Jane","last_name": "Doe"}, {"email_addresses": ["Pradeep#Patel.com"],"first_name": "Pradeep","last_name": "Patel"}],"lists": ["1234567890"]}
r = requests.post('url', headers=headers ,json= update_contact)
print(r.text)
Format the data using to_dict(orient='records'), then just dump the dictionary to JSON:
# converted emails to lists, may not be necessary...
df.email_address = df.email_address.apply(lambda x: [x])

import json
update_contact = json.dumps({'import_data': df.to_dict(orient='records'),
                             'lists': ["1234567890"]})
You can use json_variable.append(NewValues) to append any of your data blocks to a JSON list.
Refer to the skeleton below and rebuild it according to your inputs:
import json

Json_List = []
for i in Excel_data:
    Newdata = {
        'Name': Read_name_From_Excel(i),
        'Age': Read_age_From_Excel(i),
        'Email': Read_Email_From_Excel(i),
        'Amount': Read_Amount_From_Excel(i)
    }
    Json_List.append(Newdata)
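The same payload can be built without pandas at all. A minimal sketch from plain row tuples (the field names match the API payload shown above; the rows themselves are made up):

```python
import json

# Rows in the shape of the question's dataframe: (email, first, last).
rows = [
    ("a#a.com", "adam", "apple"),
    ("b#b.com", "bobby", "banana"),
]

payload = {
    "import_data": [
        {"email_addresses": [email], "first_name": first, "last_name": last}
        for email, first, last in rows
    ],
    "lists": ["1234567890"],  # static list ID, as in the question
}
update_contact = json.dumps(payload)
print(update_contact)
```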
