waiting for completion of get request - python

I have to fetch several pages from a JSON API with about 130'000 entries.
The request is fairly simple:
response = requests.request("GET", url, headers=headers, params=querystring)
where the querystring contains an access token and the headers are fairly simple.
I created a while loop where basically every request url is in the form of
https://urlprovider.com/endpointname?pageSize=10000&rowStart=0
and the rowStart increments by pageSize until there are no further pages.
The problem I encounter is the following response after about 5-8 successful requests:
{'errorCode': 'ERROR_XXX', 'code': 503, 'message': 'Maximum limit for unprocessed API requests have been reached. Please try again later.', 'success': False}
From the error message I gather that I initiate the next request before the last one has finished. Does anyone know how I can make sure the GET request has finished before the next one starts (other than something crude like a sleep()), or whether the error could lie elsewhere?

I found the answer to my question.
Requests is synchronous, meaning that it will ALWAYS wait until the call has finished before continuing.
The response from the API provider is misleading, as each request has therefore already been processed before the next one starts.
The root cause is difficult to assess, but it may have to do with a limit imposed by the API provider.
What has worked:
A crude time.sleep(10), which makes the program wait 10 seconds before sending the next request
Better solution: Create a Session. According to the documentation:
The Session object [...] will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
Not only does this resolve the problem but also increases the performance compared to my initial code.
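For illustration, here is a minimal sketch of the paginated loop using a shared Session, assuming headers and querystring are the same dicts as in the original snippet; the endpoint is the placeholder from the question, and the empty-page stop condition is an assumption about this particular API.

import requests

session = requests.Session()

page_size = 10000
row_start = 0
results = []

while True:
    response = session.get(
        "https://urlprovider.com/endpointname",
        headers=headers,  # same simple headers as before
        params={**querystring, "pageSize": page_size, "rowStart": row_start},
    )
    response.raise_for_status()
    page = response.json()
    if not page:  # assumes the API returns an empty list past the last page
        break
    results.extend(page)
    row_start += page_size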

Related

Caching data in an API - Handling concurrent/fast requests

# this is pseudo code, tidied into a runnable sketch
import time

cache = {}  # e.g. {"user_information": (last_updated, response)}

async def update_caching(
    app,
    token,
    user_information=None,
    guild_id=None,
    user_guilds=None,
    guild_channels=None,
    guild_data=None,
    guild_roles=None,
):
    if user_information:
        entry = cache.get("user_information")
        # only refresh if the cache was not updated in the last 5 seconds
        if entry is None or time.monotonic() - entry[0] > 5:
            response = await request(...)  # make request to API
            cache["user_information"] = (time.monotonic(), response)
    return
I have this function to cache results from my API so that I don't have to keep making requests. However, if a user hits this endpoint twice, and the first request has not yet populated the cache because it is still waiting on the API, then there is no data in it. This means that both requests to the API will be sent, causing one to become rate limited.
How can I fix this? I could set the last_call datetime before the request is made inside of the update_caching function, but then on the 2nd request my API will think that there is data cached when it's not ready yet...
Any tips would be very helpful.
Essentially:
The 1st request is made and my code is collecting data to cache, but before the new data is ready, the user makes another request to get data - now there are two requests being made to the external third-party API and they get rate limited.
I appreciate the code is not functional, but I think it illustrates the problem I am having.
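One way this kind of race is commonly handled (a sketch under assumptions, not an answer from the original thread) is to serialize the cache refresh behind an asyncio.Lock, so the second caller waits for the first refresh instead of firing a duplicate API request; fetch here is a hypothetical stand-in for the real API call:

import asyncio
import time

cache = {}
cache_lock = asyncio.Lock()

async def get_user_information(fetch):
    async with cache_lock:
        entry = cache.get("user_information")
        # refresh only if missing or older than 5 seconds; a second caller
        # that was waiting on the lock sees the fresh entry and skips the API
        if entry is None or time.monotonic() - entry[0] > 5:
            response = await fetch()
            entry = (time.monotonic(), response)
            cache["user_information"] = entry
        return entry[1]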

how to avoid 429 error requests.get() in python?

I'm trying to get some data from the PUBG API using requests.get().
While the code was executing, response.status_code returned 429.
After I got the 429, I couldn't get a 200 again.
How can I fix this situation?
Here is part of my code.
for num in range(len(platform)):
    url = "https://api.pubg.com/shards/" + platform[num] + "/players/" + playerID[num] + "/seasons/" + seasonID + "/ranked"
    req = requests.get(url, headers=header)
    print(req.status_code)
[output]
200
429
As per https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/429
you are sending too many requests at once / in a short span of time. I recommend using time.sleep(10):
import time

for num in range(len(platform)):
    ....
    ....
    time.sleep(10)
I used 10 seconds, but you have to test it and understand how much of a time gap is required. I would also recommend using https://pypi.org/project/retry/ after you figure out the right amount of sleep time; see the sketch below.
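For instance, a hedged sketch combining the two suggestions; the tries, delay, and backoff values are assumptions to be tuned, and raise_for_status() turns the 429 into an exception the decorator can retry:

import requests
from retry import retry

@retry(exceptions=requests.HTTPError, tries=5, delay=10, backoff=2)
def fetch(url, header):
    req = requests.get(url, headers=header)
    req.raise_for_status()  # raises HTTPError on 429, triggering a retry
    return req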
As mentioned by sam, HTTP error 429 means you are making too many requests in a certain amount of time.
According to the official PUBG API documentation, the API actually tells you these rate limits by sending an additional header called X-RateLimit-Limit with every response. Each response also has a header called X-RateLimit-Remaining which tells you how many requests you currently have left until the next rate reset, which happens at the time specified in the third header, X-RateLimit-Reset.
Since you seem to be using the requests library, you can access these headers with a simple req.headers after making your request in Python.
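A small sketch of reading those headers; the header names come from the PUBG docs, but treating X-RateLimit-Reset as a Unix timestamp is an assumption to verify against the documentation:

import time
import requests

req = requests.get(url, headers=header)
remaining = int(req.headers.get("X-RateLimit-Remaining", 1))
reset_at = int(req.headers.get("X-RateLimit-Reset", 0))

if req.status_code == 429 or remaining == 0:
    # wait until the reported reset time before sending the next request
    time.sleep(max(0, reset_at - time.time()))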

how to know when the status_code will be 200?

I request data from an API with Python, in two steps:
First, I have to request the URLs for the data; these are obtained via the requests.post() method.
With those URLs, I can finally request the data I need.
But the problem is:
The URLs' resources are not ready the instant the URLs are obtained. If you request the data right away, the outcome is usually a failure with status_code != 200 and a message that the data is not ready.
So I set a sleep for a random number of seconds, and after some time I request the data again, repeating until I get it. However, the code eventually collapses with a 'Max retries exceeded with url' error.
How could I know the exact time when the URLs' resources are ready, so that I don't need to try again and again?
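There is usually no way to know the exact time from the client side unless the API exposes a status endpoint, so one common pattern (a sketch with placeholder names, not a verified fix for this API) is polling with exponential backoff instead of random sleeps:

import time
import requests

def poll_until_ready(url, headers, max_attempts=10):
    delay = 1
    for attempt in range(max_attempts):
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response
        time.sleep(delay)
        delay = min(delay * 2, 60)  # double the wait, capped at 60 seconds
    raise TimeoutError(f"resource at {url} never became ready")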

Python requests module: getting first response after requests.get API call

I am making a GET call through requests in Python 3.7. This GET request triggers a job. Once the job is triggered on the host side, it has a set of attributes such as a run id and other parameters. The host has an idle timeout of about 120 seconds, and the triggered job will run for more than 120 seconds. Since the GET request blocks until a response is returned, in this case it times out after 120 seconds and we get the expected 504 error. But if the job completes within 120 seconds, the GET response headers include the run id and other attributes.
What I am trying to accomplish: the moment the requests.get call is submitted, is there a way to get an immediate response back with the run id and other details? I could use that run id to poll the host for the response code even after 120 seconds through a separate API call. I tried to search for this but was unsuccessful. If the requests module cannot help in this case, please advise whether other modules would come in handy for my need.
Any input is much appreciated.
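Whether an immediate response is possible depends on the host API, but the separate polling call described above could look roughly like this sketch; the status endpoint path and the response field names are hypothetical:

import time
import requests

def wait_for_job(base_url, run_id, headers, interval=15):
    while True:
        status = requests.get(f"{base_url}/runs/{run_id}", headers=headers)
        status.raise_for_status()
        state = status.json().get("state")  # hypothetical field name
        if state in ("SUCCEEDED", "FAILED"):
            return status.json()
        time.sleep(interval)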

504 in a request to Typeform API using request library

I am integrating the answers given by some users to a Typeform poll. I am using the requests library, and 50% of the time I get the response and 50% of the time I get a 504 code. Does anyone have advice or a link on how to solve this?
This may not be something that you can correct on the client side.
A 504 status code indicates that some gateway server or service attempted to forward your request to the origin server and either received a 504 from upstream or no response within its timeout interval. For example, this is a code used by Amazon's CDN and load-balancing services. If the failure ratio is close to 50% with a significant sample size, there's a good chance load balancing is involved.
As with any 5xx status code, the implication is that once the issue causing the failure is resolved upstream, you should be able to send the same request without modification and get a successful response. So any attempt to mitigate it on your side should take the form of a retry/backoff strategy, rather than changing the content of your request.
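As an illustration of such a retry/backoff strategy, here is a sketch using the retry support built into requests/urllib3 (urllib3 1.26+); the Typeform form id and token are placeholders:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(
    total=5,
    backoff_factor=1,                  # exponential backoff between attempts
    status_forcelist=[502, 503, 504],  # retry only on these gateway errors
    allowed_methods=["GET"],
)
session.mount("https://", HTTPAdapter(max_retries=retries))

response = session.get(
    "https://api.typeform.com/forms/FORM_ID/responses",
    headers={"Authorization": "Bearer TOKEN"},
)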
