I request data from an API with Python, in 2 steps:
1. I request the URLs for the data. They are obtained with the requests.post() method.
2. With those URLs, I can finally request the data I need.
But the problem is:
The URLs' resources are not ready immediately after the URLs are obtained. If I request the data right away, the request usually fails with status_code != 200 and a message saying the data is not ready.
So I sleep for a random number of seconds and then request the data again, repeating until I get it. However, the code eventually crashes with a 'Max retries exceeded with url' error.
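Roughly, my retry loop looks like this (data_url comes from the first step; the number of attempts and the sleep range are arbitrary placeholders):

import random
import time
import requests

MAX_ATTEMPTS = 30                          # placeholder cap so the loop cannot run forever

for attempt in range(MAX_ATTEMPTS):
    resp = requests.get(data_url)          # data_url was returned by the earlier POST request
    if resp.status_code == 200:
        break                              # the resource is finally ready
    time.sleep(random.uniform(5, 15))      # wait a random number of seconds and try again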
How can I find out the exact time when the URL's resources are ready, so that I don't need to retry again and again?
# this is pseudo code
import time

cache = {}  # e.g. {"user_information": {"response": ..., "last_call": ...}}

async def update_caching(
    app,
    token,
    user_information=None,
    guild_id=None,
    user_guilds=None,
    guild_channels=None,
    guild_data=None,
    guild_roles=None,
):
    if user_information:
        entry = cache.get("user_information")
        # only refresh if the cache was not updated in the last 5 seconds
        if entry is None or time.monotonic() - entry["last_call"] > 5:
            response = await request(...)  # make request to API
            cache["user_information"] = {
                "response": response,
                "last_call": time.monotonic(),
            }
        return
I have this function to cache results from my API so that I don't have to keep making requests. However, if a user hits this endpoint twice, and the cache has not yet been updated because the first request is still waiting on the API, then there is no data in it. This means that both requests are forwarded to the API, causing one of them to become rate limited.
How can I fix this? I could set the last_call datetime before the request is made inside the update_caching function, but then on the 2nd request my API would think that there is data cached when it's not ready yet...
Any tips would be very helpful.
Essentially:
The 1st request is made and my code starts collecting data to cache, but before the new data is ready the user makes another request for data - now two requests are being made to the external third-party API and they get rate limited.
I appreciate that the code is not functional, but I think it shows the problem I am having.
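Roughly, the race plays out like this (reusing the non-functional update_caching sketch above, so app and token are placeholders):

import asyncio

async def simulate_user_request():
    # each incoming request triggers a cache refresh; both see a stale
    # (or empty) cache, so both end up calling the external API
    await update_caching(app, token, user_information=True)

async def main():
    # two requests arriving at nearly the same time -> two API calls -> rate limit
    await asyncio.gather(simulate_user_request(), simulate_user_request())

asyncio.run(main())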
I'm trying to get some data from the PUBG API using requests.get().
While the code was executing, response.status_code returned 429.
After I got the 429, I couldn't get a 200 again.
How can I fix this situation?
Here is part of my code.
for num in range(len(platform)):
    url = "https://api.pubg.com/shards/" + platform[num] + "/players/" + playerID[num] + "/seasons/" + seasonID + "/ranked"
    req = requests.get(url, headers=header)
    print(req.status_code)
[output]
200
429
As per https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429,
you are sending too many requests at once / in a short span of time. I recommend adding time.sleep(10) inside your loop:
import time

for num in range(len(platform)):
    ....
    ....
    time.sleep(10)
I used 10 seconds, but you have to test it and work out how much of a time gap is required. I would also recommend using https://pypi.org/project/retry/ once you figure out the right amount of sleep time.
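For instance, something along these lines (the exception type, tries, delay, and backoff values here are guesses you would need to tune):

import requests
from retry import retry  # pip install retry

@retry(requests.exceptions.RequestException, tries=5, delay=10, backoff=2)
def fetch(url, headers):
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()  # a 429 raises HTTPError, which triggers another retry
    return resp

for num in range(len(platform)):
    url = "https://api.pubg.com/shards/" + platform[num] + "/players/" + playerID[num] + "/seasons/" + seasonID + "/ranked"
    req = fetch(url, header)
    print(req.status_code)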
As mentioned by sam, HTTP error 429 means you are making too many requests in a certain amount of time.
According to the official PUBG API documentation, the API actually tells you these rate limits by sending an additional header called X-RateLimit-Limit with every request. Each request also has a header called X-RateLimit-Remaining which tells you how many requests you have currently left until the next rate reset which happens at the time specified in the third header X-RateLimit-Reset.
Since you seem to be using the requests library, you can access these headers with a simple req.headers after making your request in Python.
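For example (reusing url and header from your code; whether X-RateLimit-Reset is a Unix timestamp is an assumption you should check against the PUBG documentation):

import time
import requests

req = requests.get(url, headers=header)
remaining = req.headers.get("X-RateLimit-Remaining")
reset = req.headers.get("X-RateLimit-Reset")

# if no requests are left, wait until the reset time before continuing
# (this assumes X-RateLimit-Reset is a Unix timestamp)
if remaining is not None and int(remaining) == 0 and reset is not None:
    time.sleep(max(0, int(reset) - time.time()))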
I am making a GET call through requests in Python 3.7. This GET request triggers a job on the host side. Once the job is triggered, it has a set of attributes such as a run id and other parameters. The host has an idle timeout of 120 seconds, and the job that was triggered runs for more than 120 seconds. Since the GET request blocks until a response is returned, in this case it times out after 120 seconds and we get the expected 504 error. But if the job completes within 120 seconds, the GET response headers contain the run id and other attributes.
What I am trying to accomplish: the moment requests.get is submitted, is there a way to get an immediate response back with the run id and other details? I could then use that run id to poll the host for the response code even after 120 seconds through a separate API call. I tried searching but was unsuccessful. If the requests module cannot help here, please advise whether other modules would suit my need.
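For illustration, the call looks roughly like this (the endpoint URL and the header carrying the run id are placeholders):

import requests

# the GET blocks until the job finishes or the host's 120-second idle timeout is hit
resp = requests.get("https://host.example.com/api/trigger-job", timeout=150)

if resp.status_code == 200:
    # job finished within 120 seconds: the response headers carry the run details
    print(resp.headers.get("X-Run-Id"))    # placeholder header name
else:
    # job ran longer than the idle timeout -> 504, and I never see the run id
    print(resp.status_code)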
Any input is much appreciated.
I have to get several pages of a json API with about 130'000 entries.
The request is fairly simple with:
response = requests.request("GET", url, headers=headers, params=querystring)
Where the querystring is an access token and the headers fairly simple.
I created a while loop where basically every request URL is of the form
https://urlprovider.com/endpointname?pageSize=10000&rowStart=0
and rowStart increments by pageSize until there are no further pages.
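Roughly, the loop looks like this (the access-token parameter name and the key holding the entries in the response are simplified placeholders):

import requests

page_size = 10000
row_start = 0
all_entries = []

while True:
    querystring = {"access_token": token, "pageSize": page_size, "rowStart": row_start}
    response = requests.request("GET", url, headers=headers, params=querystring)
    data = response.json()
    entries = data.get("entries", [])      # actual key depends on the API's response shape
    all_entries.extend(entries)
    if len(entries) < page_size:           # last page reached
        break
    row_start += page_size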
The problem I encounter is the following response after about 5-8 successful requests:
{'errorCode': 'ERROR_XXX', 'code': 503, 'message': 'Maximum limit for unprocessed API requests have been reached. Please try again later.', 'success': False}
From the error message I gather that I am initiating the next request before the last one has finished. Does anyone know how I can make sure the GET request has finished before the next one starts (other than something crude like a sleep()), or whether the error could lie elsewhere?
I found the answer to my question.
requests is synchronous, meaning that it will ALWAYS wait until the call has finished before continuing.
The response from the API provider is therefore misleading, as each request has already been processed before the next one is sent.
The root cause is difficult to assess, but it may be to do with a limit imposed by the API provider.
What has worked:
A crude sleep(10), which makes the program wait 10 seconds before processing the next request
Better solution: Create a Session. According to the documentation:
The Session object [...] will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
Not only does this resolve the problem but also increases the performance compared to my initial code.
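A minimal sketch of the change, with url, headers and querystring as in the original code:

import requests

# one Session for all paginated requests, so the underlying TCP connection is reused
session = requests.Session()
session.headers.update(headers)

response = session.get(url, params=querystring)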
I have a request to the fb graph api that goes like so:
https://graph.facebook.com/?access_token=<ACCESSTOKEN>&fields=id,name,email,installed&ids=<A LONG LONG LIST OF IDS>
If the number of ids goes above 200-ish in the request, the following things happen:
- in the browser: works
- in local tests with urllib: timeout
- on the deployed App Engine application: "Invalid request URL (followed by url)" - this one doesn't hang at all though
For numbers of ids below 200 or so, it works fine in all of them.
Sure I could just slice the id list up and fetch them separately, but I would like to know why this is happening and what it means?
I didn't read your question thoroughly the first time around - I didn't scroll the embedded code to the right, so I didn't realize you were using such a long URL.
There's usually a maximum URL length. This prevents you from sending a very long HTTP GET request. The way around that is to embed the parameters in the body of a POST request.
It looks like FB's Graph API does support it, according to this question:
using POST request on Facebook Graph API
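A rough sketch using the requests library; whether the Graph API accepts a GET-style read sent as a POST with a method parameter is taken from the linked question and not verified here, and long_list_of_ids and access_token are placeholders:

import requests

ids = ",".join(long_list_of_ids)     # the long list that exceeded the URL length limit

resp = requests.post(
    "https://graph.facebook.com/",
    data={
        "access_token": access_token,
        "fields": "id,name,email,installed",
        "ids": ids,
        "method": "get",             # ask the Graph API to treat this POST as a GET (assumption)
    },
)
print(resp.json())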