Call an API for each element in a list - Python

I have a list with over 1000 IDs and I want to call an API with different endpoints for every element of the list.
Example:
customerlist = [803818, 803808, 803803, 803738, 803730]
I tried the following:
import json
import requests
import pandas as pd

API_BASEURL = "https://exampleurl.com/"
API_TOKEN = "abc"
HEADERS = {'content-type': 'application/json',
           'Authorization': API_TOKEN}

def get_data(endpoint):
    for i in customerlist:
        api_endpoint = endpoint
        params = {'customerid': i}
        response = requests.get(f"{API_BASEURL}/{api_endpoint}",
                                params=params,
                                headers=HEADERS)
        if response.status_code == 200:
            res = json.loads(response.text)
        else:
            raise Exception(f'API error with status code {response.status_code}')
        res = pd.DataFrame([res])
        return res

get_data(endpointexample)
This works, but it only returns the values for the first element of the list (803818). I want the function to return the values for every ID from customerlist for the endpoint I defined in the function argument.
I found this possibly related question, but I couldn't figure out my problem from it.
There is probably an easy solution for this which I am not seeing, as I am just starting with Python. Thanks.

The moment a function hits a return statement, it immediately finishes. Since your return statement is inside the loop, the remaining iterations never run.
To fix this, create a list outside the loop, append to it on every iteration, and then return a DataFrame built from that list:
def get_data(endpoint):
    responses = []
    for i in customerlist:
        api_endpoint = endpoint
        params = {'customerid': i}
        response = requests.get(f"{API_BASEURL}/{api_endpoint}",
                                params=params,
                                headers=HEADERS)
        if response.status_code == 200:
            res = json.loads(response.text)
        else:
            raise Exception(f'API error with status code {response.status_code}')
        responses.append(res)
    return pd.DataFrame(responses)
A cleaner solution is to make the function handle a single ID and build the DataFrame with a list comprehension:
def get_data(endpoint, i):
    api_endpoint = endpoint
    params = {'customerid': i}
    response = requests.get(f"{API_BASEURL}/{api_endpoint}",
                            params=params,
                            headers=HEADERS)
    if response.status_code == 200:
        res = json.loads(response.text)
    else:
        raise Exception(f'API error with status code {response.status_code}')
    return res

responses = pd.DataFrame([get_data(endpoint, i) for i in customerlist])
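Since you have over 1000 IDs, reusing a single connection with requests.Session() can also speed things up considerably. A minimal sketch, assuming the same API_BASEURL, HEADERS, and customerlist as above:

import requests
import pandas as pd

def get_data(endpoint):
    responses = []
    # One Session reuses the underlying TCP connection for all requests
    # instead of opening a new one per customer ID.
    with requests.Session() as session:
        session.headers.update(HEADERS)
        for i in customerlist:
            response = session.get(f"{API_BASEURL}/{endpoint}",
                                   params={'customerid': i})
            response.raise_for_status()  # raises on any 4xx/5xx response
            responses.append(response.json())
    return pd.DataFrame(responses)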

Related

How to properly wait while `GET` returns the completion status equal to 0

I need to use an API that performs a task in two steps:
POST request: submit a task and get back the results URL
GET request: check the status of the results URL and fetch the result once the status is "completed"
Below I provide my implementation. However, I don't know how to keep waiting while the GET returns a completion status of 0.
import requests

url = "https://xxx"
headers = {"Content-Type": "application/json", "Ocp-Apim-Subscription-Key": "xxx"}
body = {...}

response = requests.post(url, headers=headers, json=body)
status_code = response.status_code
url_result = response.headers['url-result']

# Step 2
s2_result = requests.get(url_result, headers=headers)
s2_result = s2_result.json()
s2_result_status = s2_result['completed']
if s2_result_status == 1:
    pass  # ...
else:
    pass  # wait 1 second and repeat
You need to repeat the GET in a loop. I suggest you use Session() to prevent connection errors:
import time

with requests.Session() as s:
    c = 0
    while c < 10:  # add a limit
        s2_result = s.get(url_result, headers=headers)
        s2_result = s2_result.json()
        s2_result_status = s2_result['completed']
        if s2_result_status == 1:
            break
        else:
            time.sleep(1)
            c += 1
The idea of waiting for 1 second and repeating can definitely solve your problem. You can use time.sleep(1). For example,
import time

# Step 2
while True:
    s2_result = requests.get(url_result, headers=headers)
    s2_result = s2_result.json()
    s2_result_status = s2_result['completed']
    if s2_result_status == 1:
        # ...
        break
    else:
        # wait 1 second and repeat
        time.sleep(1)
Reminders: do not forget to check the HTTP status code of the response before you call s2_result.json(). Also, using s2_result.get('completed') rather than s2_result['completed'] makes your program more robust.
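If the task can take a while, polling every second forever can be wasteful. One common refinement (a sketch, not part of either answer above; poll_result and its parameters are illustrative names) is exponential backoff with an attempt limit:

import time
import requests

def poll_result(url_result, headers, max_attempts=10):
    # Poll until 'completed' == 1, doubling the wait after each attempt.
    delay = 1
    for _ in range(max_attempts):
        result = requests.get(url_result, headers=headers)
        result.raise_for_status()  # check the HTTP status before parsing
        body = result.json()
        if body.get('completed') == 1:
            return body
        time.sleep(delay)
        delay = min(delay * 2, 30)  # cap the wait at 30 seconds
    raise TimeoutError('task did not complete in time')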

Using values from a list in a sub-function

I am running a function that produces a list like this:
rft_id_list = []
for i in payload_df_rft:
    payload_rft = json.dumps(i)
    url = 'https://domain/api/link/rft'
    print(url)
    response = requests.request('POST', url, headers=headers, data=payload_rft)
    rft_script_output = response.json()
    # print(rft_script_output)
    rft_id = rft_script_output['id']
    # print(rft_id)
    rft_id_list.append(rft_id)
print(rft_id_list)
print('~~~Script Finished ~~~')
The script above gives me the values below:
['1234abc', '22345bcde', '33456cdef']
The next sub-function has a URL, and I want to iterate through the values above, appending each one to the URL while doing a PUT. My attempt:
url = 'https://domain/api/link/' + rft_id_list
What's the best way I can do this?
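One straightforward approach (a sketch; it assumes each PUT needs only the ID in the URL, so adjust the headers and payload to your API):

import requests

for rft_id in rft_id_list:
    url = f'https://domain/api/link/{rft_id}'
    response = requests.put(url, headers=headers)
    response.raise_for_status()  # stop if any PUT fails
    print(f'PUT {url} -> {response.status_code}')

Note that 'https://domain/api/link/' + rft_id_list fails because you cannot concatenate a string and a list; you have to build one URL per ID.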

Applying the code to smaller batches of the data set sequentially

I have a data set of tweets retrieved via the Twitter streaming API. I regularly want to be updated on how their public metrics change, so I wrote this code to request those metrics:
def create_url():
    tweet_fields = "tweet.fields=public_metrics"
    tweets_data_path = 'dataset.txt'
    tweets_data = []
    tweets_file = open(tweets_data_path, "r")
    for line in tweets_file:
        try:
            tweet = json.loads(line)
            tweets_data.append(tweet)
        except:
            continue
    df = pd.DataFrame.from_dict(pd.json_normalize(tweets_data), orient='columns')
    df_id = (str(str((df['id'].tolist()))[1:-1])).replace(" ", "")
    ids = "ids=" + df_id
    url = "https://api.twitter.com/2/tweets?{}&{}".format(ids, tweet_fields)
    return url

def bearer_oauth(r):
    r.headers["Authorization"] = f"Bearer {'AAAAAAAAAAAAAAAAAAAAAN%2B7QwEAAAAAEG%2BzRZkmZ4HGizsKCG3MkwlaRzY%3DOwuZeaeHbeMM1JDIafd5riA1QdkDabPiELFsguR4Zba9ywzzOQ'}"
    r.headers["User-Agent"] = "v2TweetLookupPython"
    return r

def connect_to_endpoint(url):
    response = requests.request("GET", url, auth=bearer_oauth)
    print(response.status_code)
    if response.status_code != 200:
        raise Exception(
            "Request returned an error: {} {}".format(
                response.status_code, response.text
            )
        )
    return response.json()

def main():
    url = create_url()
    json_response = connect_to_endpoint(url)
    print(json.dumps(json_response, indent=3, sort_keys=True))

if __name__ == "__main__":
    main()
Unfortunately, my data set has more than 100 IDs in it, and I want to retrieve the metrics for all of them. Since I can only request 100 IDs at a time, can you help me with how to do that?
I would also like to make the request daily at midnight and then store the result in a txt file; maybe you can help me with that as well.
You can chunk your data and send it in batches using itertools.islice.
test.py:
import reprlib
from itertools import islice

import pandas as pd

BASE_URL = "https://api.twitter.com/2/tweets"
CHUNK = 100

def req(ids):
    tmp = reprlib.repr(ids)  # Used here just to shorten the output
    print(f"{BASE_URL}?ids={tmp}")

def main():
    df = pd.DataFrame({"id": range(1000)})
    it = iter(df["id"])
    while chunk := tuple(islice(it, CHUNK)):
        ids = ",".join(map(str, chunk))
        req(ids)

if __name__ == "__main__":
    main()
Test:
$ python test.py
https://api.twitter.com/2/tweets?ids='0,1,2,3,4,5,...5,96,97,98,99'
https://api.twitter.com/2/tweets?ids='100,101,102,...6,197,198,199'
https://api.twitter.com/2/tweets?ids='200,201,202,...6,297,298,299'
https://api.twitter.com/2/tweets?ids='300,301,302,...6,397,398,399'
https://api.twitter.com/2/tweets?ids='400,401,402,...6,497,498,499'
https://api.twitter.com/2/tweets?ids='500,501,502,...6,597,598,599'
https://api.twitter.com/2/tweets?ids='600,601,602,...6,697,698,699'
https://api.twitter.com/2/tweets?ids='700,701,702,...6,797,798,799'
https://api.twitter.com/2/tweets?ids='800,801,802,...6,897,898,899'
https://api.twitter.com/2/tweets?ids='900,901,902,...6,997,998,999'
Note: You'll make multiple requests with this approach so keep in mind any rate limits.
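As for running the request daily at midnight and saving the output, here is a minimal sketch using only the standard library (the file name metrics.txt and the job callable are placeholders for your own code):

import datetime
import json
import time

def run_daily_at_midnight(job):
    # Sleep until the next midnight, run the job, then repeat.
    while True:
        now = datetime.datetime.now()
        next_midnight = datetime.datetime.combine(
            now.date() + datetime.timedelta(days=1), datetime.time.min)
        time.sleep((next_midnight - now).total_seconds())
        result = job()
        # Append each day's response to a text file, one JSON document per line.
        with open('metrics.txt', 'a') as f:
            f.write(json.dumps(result) + '\n')

For anything more robust, a system scheduler such as cron is usually the better tool.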

Handling final page in Python paginated API request

I'm querying Microsoft's Graph API, using the following function to request multiple pages. I'm trying to request all pages, merge the JSON responses, and finally write them to a pandas DataFrame.
v = "v1.0"
r = "/users?$filter=userType eq 'Member'&$select=displayName,givenName,jobTitle,mail,department&$top=200"

def query(v, r):
    all_records = []
    url = uri.format(v=v, r=r)
    while True:
        if not url:
            break
        result = requests.get(url, headers=headers)
        if result.status_code == 200:
            json_data = json.loads(result.text)
            all_records = all_records + json_data["value"]
            url = json_data["#odata.nextLink"]
    return all_records
The while-loop goes through all the pages, but when I run the function I get an error:
KeyError: '#odata.nextLink'
I assume this is because the loop reaches the final page, and thus the '#odata.nextLink' cannot be found. But how can I handle this?
You are doing
url = json_data["#odata.nextLink"]
which suggests json_data is a dict, so you should be able to use the .get method, which returns a default value when the key is not found (None by default). Please try the following and check whether it works as expected:
url = json_data.get("#odata.nextLink")
if url is None:
    print("nextLink not found")
else:
    print("nextLink found")

The result of the requests module is different between IDLE and code

Why doesn't the Wayback Machine return an answer with this code?
What I tried: (1) Running it in the Python IDLE returned a normal answer. (2) The status_code is 200, yet the function returns None.
import requests

def wayback_search(url):
    res = requests.get("https://web.archive.org/cdx/search/cdx?url=%s&showDupeCount=true&output=json" % url,
                       headers={'User-agent': 'Mozilla/5.0'})
    # search in requests_module
    urllist = res.url.split('&')
    request_url = urllist[0][:-1] + '&' + urllist[1] + '&' + urllist[2]
    print('timestamps_url:', request_url)
    res = requests.get(request_url)
    if res.raise_for_status():
        cdx = res.json()
        print(res.url)
        print('cdx', cdx)
    res = requests.get("http://archive.org/wayback/available?url=%s" % url,
                       headers={'User-agent': 'Mozilla/5.0'})
    if res.raise_for_status():
        cdx = res.json()
        print(res.url)
        print('cdx', cdx)
Perhaps the Wayback Machine isn't working at all.
I do not see where the function wayback_search is called. Also, there is no return statement in the function; in Python, a function with no return statement returns None, so return what you want.
Additionally, the code inside the if res.raise_for_status(): blocks should never run: raise_for_status() raises an exception on HTTP errors and returns None (which is falsy) on success.
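A sketch of the function with both problems fixed, returning the parsed JSON instead of relying on the misused raise_for_status() conditions:

import requests

def wayback_search(url):
    # CDX API: list of captures for the URL
    res = requests.get(
        "https://web.archive.org/cdx/search/cdx?url=%s&showDupeCount=true&output=json" % url,
        headers={'User-agent': 'Mozilla/5.0'})
    res.raise_for_status()  # raises on HTTP errors, returns None on success
    cdx = res.json()

    # Availability API: closest archived snapshot
    res = requests.get("http://archive.org/wayback/available?url=%s" % url,
                       headers={'User-agent': 'Mozilla/5.0'})
    res.raise_for_status()
    available = res.json()

    return cdx, available  # return the results instead of only printing them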
