Error: Time out while performing API call in Python - python

I have a list (lst) which is a list of lists. There are 19 elements in this list and each element contains ~2500 strings.
lst
[['A', 'B', 'C', ...], ['E', 'F', 'G', ...], [...]]
I am using these strings (A, B, ...) to call an API endpoint ('q': element). However, after ~1800 strings I get a timeout.
I am running the following code.
import requests
from requests.exceptions import Timeout

def get_val(element):
    url = 'https://www.xxxx/yyy/api/search'
    headers = {'Content-Type': 'application/json'}
    param = {'q': element, 'page': 500}
    try:
        response = requests.get(url, headers=headers, params=param, timeout=(3.05, 27))
        docs = response.json()['response']['docs']
        for result in docs:
            # 'file' is an output file opened earlier
            file.write("%s\t%s\n" % (element, result['short_form']))
    except Timeout:
        print('Timeout has been raised.')

# loop through elements of list
for i in lst:
    for element in i:
        get_val(element)
How can I modify my code to avoid this time out?

One reason for this timeout could be protection against mass requests, i.e. too many requests arriving in a short time.
To work around this, a short pause could be added after, for example, every 100 requests (see the sketch below). This is a trial-and-error approach, but it can work. The worst case would be adding a delay after every single request.
import time
time.sleep(0.5)
The argument is in seconds, so 0.5 means half a second.
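A minimal sketch of that periodic pause applied to the existing loop; the interval of 100 requests and the 5-second pause are arbitrary starting values to experiment with:
import time

requests_made = 0
for i in lst:
    for element in i:
        get_val(element)
        requests_made += 1
        # pause briefly after every 100 requests to avoid tripping rate limits
        if requests_made % 100 == 0:
            time.sleep(5)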

Related

Returning all values from an API GET request that limits page size

I'm currently working with an API and I'm having some issues returning all values of something. The API allows page sizes of up to 500 records at a time; by default it uses 25 records per "page." You can also move between pages, and you can change the page size by adding ?page[size]={number between 1-500} to the endpoint.
My issue is that I'm storing the values returned from the API in a dictionary. When there are more than 500 records in the full data set I get no errors, but when there are fewer than 500 I get a KeyError, since the code expects more data than is available. I don't want to have to guess the exact page size for each and every request.
Is there an easier way to get all available data without having to request the exact amount of data for a page size? Ideally I'd want to just get the maximum amount from whatever request, always going to the upper bound of what the request can return.
Thanks!
Here's an example of some code from the script:
import operator

import numpy as np
import requests

# 'session' is an authenticated requests.Session() created earlier
base_api_endpoint = "{sensitive_url}?page[size]=300"
response = session.get(base_api_endpoint)
print(response)
print(" ")

d = response.json()
data = [item['attributes']['name'] for item in d['data']]

result = {}
sorted_result = {}
for i in data:
    result[i] = data.count(i)

sorted_value_index = np.argsort(list(result.values()))
dictionary_keys = list(result.keys())
sorted_dict = {dictionary_keys[i]: sorted(result.values())[i]
               for i in range(len(dictionary_keys))}

sorted_d = dict(sorted(result.items(), key=operator.itemgetter(1), reverse=True))
for key, value in sorted_d.items():
    print(key)
For some context, this structure of dictionary is used in other areas of the program to print both the key and value pair, but for simplicity's sake I'm just printing the key here.
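As a rough sketch of one way to avoid guessing the exact page size: request the maximum page size and stop once a page comes back with fewer items than requested. The page[number] parameter and the 'data' key are assumptions based on the question, so adjust them to the actual API; the endpoint placeholder is kept from the question.
import requests

session = requests.Session()  # assumed to be authenticated as in the question
base_api_endpoint = "{sensitive_url}"  # placeholder kept from the question
PAGE_SIZE = 500

all_items = []
page = 1
while True:
    # page[number] is an assumption; many JSON:API-style services use it
    response = session.get(
        base_api_endpoint,
        params={"page[size]": PAGE_SIZE, "page[number]": page},
    )
    payload = response.json()
    items = payload.get("data", [])  # .get() avoids a KeyError on a short or empty page
    all_items.extend(items)
    if len(items) < PAGE_SIZE:  # last page reached
        break
    page += 1

names = [item['attributes']['name'] for item in all_items]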

Python While Loop Problem - Instagram API Returns Pagination Objects but not new results

I am trying to extract a list of Instagram posts that have been tagged with a certain hashtag. I am using a RAPIDAPI found here. Instagram paginates the results which are returned, so I have to cycle through the pages to get all results. I am encountering a very strange bug/error where I am receiving the next page as requested, but the posts are from the previous page.
To use the analogy of a book, I can see page 1 of the book and I can request to the book to show me page 2. The book is showing me a page labeled page 2, but the contents of the page are the same as page 1.
Using the container provided by the RapidAPI website, I do not encounter this error. This leads me to believe that problem must be on my end, presumably in the while loop I have written.
If somebody could please review my while loop, or suggest anything else which would correct the problem, I would greatly appreciate it. The list index out of range error at the bottom is easily fixable, so I'm not concerned about it.
Other info: This particular hashtag has 694 results, and the API returns a page containing 50 items of results.
import http.client
import json
import time

conn = http.client.HTTPSConnection("instagram-data1.p.rapidapi.com")  # endpoint supplied by RapidAPI

## Begin Credential Section
headers = {
    'x-rapidapi-key': "*removed*",
    'x-rapidapi-host': "instagram-data1.p.rapidapi.com"
}
## End Credential Section

hashtag = 'givingtuesdayaus'
conn.request("GET", "/hashtag/feed?hashtag=" + hashtag, headers=headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))  # Purely for debugging, can be disabled
json_dictionary = json.loads(data.decode("utf-8"))  # Saving returned results into JSON format, because I find it easier to work with

i = 1  # Results cycle through pages; 'i' tracks the number of loops and is used in the name of the saved file

with open(hashtag + str(i) + '.json', 'w') as json_file:
    json.dump(json_dictionary['collector'], json_file)

# json_dictionary contains five fields: 'count' (number of results for the hashtag query),
# 'has_more' (boolean indicating if there are additional pages),
# 'end_cursor' (string which can be added to the url to cycle to the next page),
# 'collector' (list containing post information), and 'len'.
# The while loop checks whether 'has_more' indicates there are additional pages;
# if true, it uses the 'end_cursor' value to cycle to the next page.
while json_dictionary['has_more']:
    time.sleep(1)
    cursor = json_dictionary['end_cursor']
    conn.request("GET", "/hashtag/feed?hashtag=" + hashtag + '&end-cursor=' + cursor, headers=headers)
    res = conn.getresponse()
    data = res.read()
    json_dictionary = json.loads(data.decode("utf-8"))
    i += 1

    print(i)
    print(json_dictionary['collector'][1]['id'])
    print(cursor)  # these three print rows are only used for debugging

    with open(hashtag + str(i) + '.json', 'w') as json_file:
        json.dump(json_dictionary['collector'], json_file)
Results from the python console (as you can see, cursor and 'i' advance, but the post id remains the same; the saved JSON files also all contain the same posts):
> {"count":694,"has_more":true,"end_cursor":"QVFCd2pVdEN2d01rNkw3UmRKSGVUN1EyanBlYzBPMS15MkIyUG1VdHhjWlJWMDBwRmVhaEYxd0czSE0wMktFcGhfMnItak5ZOE1GTzJvd05FU0pTMWxmVg==","collector":[{"id":"2467140087692742224","shortcode":"CI9CtaaDU5Q","type":"GraphImage",.....}
> #shortened by poster 2 2464906276234990574 QVFCd2pVdEN2d01rNkw3UmRKSGVUN1EyanBlYzBPMS15MkIyUG1VdHhjWlJWMDBwRmVhaEYxd0czSE0wMktFcGhfMnItak5ZOE1GTzJvd05FU0pTMWxmVg==
> 3 2464906276234990574
> QVFDVUlROFVKVVB3SEwyR05MSzJHZ2V1UXZqSzlzTVFhWDNBM3hXNENMcThKWExwWU90RFRnRm1FNWtSRGtrbTdORFIwRlU2QWZaSVByOHZhSXFnQnJsVg==
> 4 2464906276234990574
> QVFEVFpheV9SeFZCcWlKYkc3NUZZdG00Rk5KMWJsQVBNakJlZDcyMGlTWm9rUTlIQzRoYjVtTU1uRmhJZG5TTFBSOXdhbHozVUViUjZEbVpLdjVUQlJtVQ==
> Traceback (most recent call last): File "<input>", line 33, in
> <module> IndexError: list index out of range
It looks like you are indexing past the end of a list, i.e. asking for an index that the list does not have.
For example:
data = [1,2,3,4,5]
You must use an index that exists:
data[4]
not like this:
data[6]
which raises
IndexError: list index out of range
The mistake is probably in this part:
print(json_dictionary['collector'][1]['id'])
print(cursor)  # these three print rows are only used for debugging
with open(hashtag + str(i) + '.json', 'w') as json_file:
    json.dump(json_dictionary['collector'], json_file)
Apologies to everyone who has read this far, I am an idiot.
I identified the error shortly after posting:
conn.request("GET", "/hashtag/feed?hashtag=" + hashtag +'&end-cursor=' + cursor, headers=headers)
'end-cursor' should be 'end_cursor'.
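With that fix, the request line inside the loop becomes:
conn.request("GET", "/hashtag/feed?hashtag=" + hashtag + '&end_cursor=' + cursor, headers=headers)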

Do while emulation in python is not working properly

testurl = '{}/testplan/Plans/{}/Suites/{}/Testpoint?includePointDetails=true&api-version=5.1-preview.2'.format(base, planId, suiteId)
print(testurl)
while True:
    c = count_testpoints(testplanAPI(base, planId, suiteId, callAPI(testurl)))
    if c < 200:
        break
callAPI() is a function that returns a header from the response; that header is passed as an argument to testplanAPI() to build a new testurl, using the argument as a URL parameter. testplanAPI() returns testurl, while count_testpoints() returns the count of testpoints.
I have to exit the loop as soon as the first count below 200 is returned.
With the code above, the url is built only once and the loop evaluates the same condition infinitely; the url is not updated after the first iteration.
Can you please suggest a better way, or what can be rectified here?
As @deceze correctly wrote, you have to build the url inside the loop, and you most likely also have to save the new base and IDs...
while True:
    testurl = '{}/testplan/Plans/{}/Suites/{}/Testpoint?includePointDetails=true&api-version=5.1-preview.2'.format(base, planId, suiteId)
    print(testurl)
    c = count_testpoints(testplanAPI(base, planId, suiteId, callAPI(testurl)))
    # sth like: base, planId, suiteId = new values for these...
    if c < 200:
        break

How to get data from all pages in Github API with Python?

I'm trying to export a repo list and it always returns information about the first page only. I can extend the number of items per page using URL+"?per_page=100", but that's not enough to get the whole list.
I need to know how I can get the list by extracting data from page 1, 2, ..., N.
I'm using Requests module, like this:
while i <= 2:
    r = requests.get('https://api.github.com/orgs/xxxxxxx/repos?page{0}&per_page=100'.format(i), auth=('My_user', 'My_passwd'))
    repo = r.json()
    j = 0
    while j < len(repo):
        print repo[j][u'full_name']
        j = j + 1
    i = i + 1
I use that while condition because I know there are 2 pages, and I try to increase it in that way, but it doesn't work.
import requests

url = "https://api.github.com/XXXX?simple=yes&per_page=100&page=1"
res = requests.get(url, headers={"Authorization": git_token})
repos = res.json()
while 'next' in res.links.keys():
    res = requests.get(res.links['next']['url'], headers={"Authorization": git_token})
    repos.extend(res.json())
If you aren't making a full blown app use a "Personal Access Token"
https://github.com/settings/tokens
From github docs:
Response:
Status: 200 OK
Link: <https://api.github.com/resource?page=2>; rel="next",
<https://api.github.com/resource?page=5>; rel="last"
X-RateLimit-Limit: 5000
X-RateLimit-Remaining: 4999
You get the links to the next and the last page of that organization. Just check the headers.
On Python Requests, you can access your headers with:
response.headers
It is a dictionary containing the response headers. If link is present, then there are more pages and it will contain related information. It is recommended to traverse using those links instead of building your own.
You can try something like this:
import requests

url = 'https://api.github.com/orgs/xxxxxxx/repos?page{0}&per_page=100'
response = requests.get(url)
link = response.headers.get('link', None)
if link is not None:
    print link
If link is not None it will be a string containing the relevant links for your resource.
From my understanding, link will be None if only a single page of data is returned, otherwise link will be present even when going beyond the last page. In this case link will contain previous and first links.
Here is some sample Python, wrapped in a small helper function, which returns the link for the next page, or None if there is no next page, so it can be incorporated in a loop.
def next_page_url(r):
    # r is a requests Response object
    link = r.headers.get('link')
    if link is None:
        return None
    # Should be a comma separated string of links
    links = link.split(',')
    for link in links:
        # If there is a 'next' link return the URL between the angle brackets, or None
        if 'rel="next"' in link:
            return link[link.find("<") + 1:link.find(">")]
    return None
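Incorporating it in a loop could then look roughly like this, using the next_page_url helper above; the starting URL and credentials are taken from the question and are illustrative only:
import requests

url = 'https://api.github.com/orgs/xxxxxxx/repos?per_page=100'
all_repos = []
while url is not None:
    r = requests.get(url, auth=('My_user', 'My_passwd'))
    all_repos.extend(r.json())
    url = next_page_url(r)  # None once the last page has been reached

print(len(all_repos))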
Extending on the answers above, here is a recursive function to deal with GitHub pagination. It iterates through all pages, concatenating the list with each recursive call, and finally returns the complete list when there are no more pages to retrieve, unless the optional failsafe kicks in and returns the list once it contains more than 500 items.
import requests

api_get_users = 'https://api.github.com/users'

def call_api(apicall, **kwargs):
    data = kwargs.get('page', [])
    resp = requests.get(apicall)
    data += resp.json()
    # failsafe
    if len(data) > 500:
        return data
    if 'next' in resp.links.keys():
        return call_api(resp.links['next']['url'], page=data)
    return data
data = call_api(api_get_users)
First, use
print(a.headers.get('link'))
(where a is the response object). This will show how many pages the repository has, similar to the output below:
<https://api.github.com/organizations/xxxx/repos?page=2&type=all>; rel="next",
<https://api.github.com/organizations/xxxx/repos?page=8&type=all>; rel="last"
From this you can see that we are currently on the first page of repos, rel="next" says that the next page is 2, and rel="last" tells us that the last page is 8.
Once you know the number of pages to traverse, you just need to use '=' for the page number in the request and run the while loop up to the last page number, not len(repo), as that will return 100 each time.
For example:
i = 1
while i <= 8:
    r = requests.get('https://api.github.com/orgs/xxxx/repos?page={0}&type=all'.format(i),
                     auth=('My_user', 'My_passwd'))
    repo = r.json()
    for j in repo:
        print(j[u'full_name'])
    i = i + 1
link = res.headers.get('link', None)
if link is not None:
    link_next = [l for l in link.split(',') if 'rel="next"' in l]
    if len(link_next) > 0:
        # take only the number that follows "page=" in the next-page URL
        page_part = link_next[0][link_next[0].find("page=") + 5:]
        return int(page_part.split("&")[0].rstrip(">"))

Get all data from API in single hit - Python-Requests

import requests

url = 'http://www.justdial.com/autosuggest.php?'
param = {
    'cases': 'popular',
    'strtlmt': '24',
    'city': 'Mumbai',
    'table': 'b2c',
    'where': '',
    'scity': 'Mumbai',
    'casename': 'tmp,tmp1,24-24',
    'id': '2'
}
res = requests.get(url, params=param)
res = res.json()
Although on the first hit of the base URL in a browser the last 3 params do not show up in the query string, the request still works.
When I hit this API it returns a JSON object containing 2 keys (total and results).
The results key contains a list of dictionaries (this is the main data), and the other key, total, contains the total number of different categories available on Justdial.
In the present case total=49, so I have to hit the API 3 times, because the API returns only 24 results at a time (24+24+1, so 3 hits are needed).
My question is: is there any way to get the complete JSON at once? There are 49 results, so instead of hitting the API 3 times, can we get all the data (all 49 categories) in a single hit? I've already tried many combinations of params without success.
Generally, APIs have a count or max_results parameter -- set this on the URL and you'll get more results back.
Here's the documentation for Twitter's API count parameter: https://dev.twitter.com/docs/api/1.1/get/statuses/user_timeline
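Applying that idea to the endpoint in the question: the 'strtlmt' parameter looks like it might be the per-request limit, but that is only an assumption; if it is, raising it to the reported total could return everything in one hit.
import requests

url = 'http://www.justdial.com/autosuggest.php?'
param = {
    'cases': 'popular',
    'strtlmt': '49',   # assumption: treat this as the per-request limit and raise it to the total
    'city': 'Mumbai',
    'table': 'b2c',
    'where': '',
    'scity': 'Mumbai',
    'casename': 'tmp,tmp1,24-24',
    'id': '2'
}
res = requests.get(url, params=param).json()
print(res['total'], len(res['results']))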
The GitHub API requires you to retrieve the data in pages (up to 100 results per page), and the response's links attribute has a 'next' entry with the URL of the next page of results.
The code below iterates through all teams in an organisation until it finds the team it is looking for.
params = {'page': 1, 'per_page': 100}
another_page = True
api = GH_API_URL + 'orgs/' + org['login'] + '/teams'
while another_page:  # the list of teams is paginated
    r = requests.get(api, params=params, auth=(username, password))
    json_response = json.loads(r.text)
    for i in json_response:
        if i['name'] == team_name:
            return i['id']
    if 'next' in r.links:  # check if there is another page of teams
        api = r.links['next']['url']
    else:
        another_page = False
