I need to make a lot of HTTP requests (> 1000) against a public service which only allows 500 HTTP requests per day. Hence, I have to count the number of executed requests, stop when I reach the daily maximum, and continue the next day with the remaining calls. In particular, I iterate over an unsorted list, so I cannot assume that the elements are in any particular order. My code looks like this:
import requests

request_parameters = {'api_key': api_key}

for user_id in all_user_ids:
    r = requests.get('http://public-api.com/%s' % user_id, params=request_parameters)
    text = r.content
    # do some stuff with text
Is there any package or pattern which you can recommend for counting and resuming API calls like this?
I would suggest implementing a simple counter to stop when you hit your limit for the day, along with a local cache of the data you have already received. When you run the process again the next day, check each record against your local cache first and only call the web service if there is no record in the cache. That way you will eventually have all of the data, unless you are generating more new requests per day than the service's usage limit allows.
The format of the cache will depend on what is returned from the web service and how much of that data you need, but it may be as simple as a csv file with a unique identifier to search against and the other fields you will need to retrieve in future.
Another option would be to store the whole response from each call (if you need a lot of it) in a dictionary with the key being a unique identifier and the value being the response. This could be saved as a json file and easily loaded back into memory for checking against on future runs.
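A minimal sketch of the counter-plus-cache idea, assuming the responses are text that fits in a single JSON file keyed by user_id, and reusing all_user_ids and request_parameters from the question; the cache file name and the daily limit are placeholders:

import json
import os
import requests

CACHE_FILE = 'api_cache.json'    # placeholder name for the local cache
DAILY_LIMIT = 500                # the service's daily quota

def load_cache():
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE) as f:
            return json.load(f)
    return {}

def save_cache(cache):
    with open(CACHE_FILE, 'w') as f:
        json.dump(cache, f)

cache = load_cache()
calls_made = 0

for user_id in all_user_ids:
    if str(user_id) in cache:
        continue                 # already fetched on a previous day
    if calls_made >= DAILY_LIMIT:
        break                    # quota reached, resume tomorrow
    r = requests.get('http://public-api.com/%s' % user_id, params=request_parameters)
    cache[str(user_id)] = r.text # store the raw response body
    calls_made += 1

save_cache(cache)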
Related
I'm still working on this cocktail app and I ran into another issue. The API connection only stays open for an hour; after that hour, I have to make a new API call. But I don't want to start all over when trying to pull data back. The data I'm using has over 300 thousand records to match, which on average takes about 10 hours to complete. Sorry if this question does not make any sense; I'm still new to coding and I'm doing my best!
For instance:
Make api call
Use data from a csv file (300k records) to request data for each record
Retrieve the data back.
API connection drops, so a new API connection will be needed.
Make API call
Determine what data has been matched from csv file, then pick up from the next record that has not been matched.
Is there a way to make sure that when I have to get a new API key or when the connection drops, whenever the connection is made again it picks up where it left off?
Sounds like you want your API to have something like "sessions" that the client can disconnect and reconnect to.
Just some sort of session ID the server can use to reconnect you to whatever long running query it is doing.
It probably gets quite complex though. The query may finish while no client is connected, so the results need to be stored somewhere keyed by the session id so that when the client reconnects it can get the results.
PS. 10 hours to match 300K records? That is a crazy amount of time! I'd be doing some profiling on that match algorithm.
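Not the server-side session the answer above describes, but for completeness, a hedged client-side sketch of the resumption part of the question ("pick up from the next record that has not been matched"): keep a checkpoint file of the record ids that have already been processed. It assumes each CSV row has an 'id' column; the file names and fetch_match() are hypothetical placeholders.

import csv
import os

DONE_FILE = 'matched_ids.txt'    # checkpoint of ids already processed

def load_done():
    if os.path.exists(DONE_FILE):
        with open(DONE_FILE) as f:
            return set(line.strip() for line in f)
    return set()

done = load_done()

with open('records.csv', newline='') as src, open(DONE_FILE, 'a') as done_log:
    for row in csv.DictReader(src):
        record_id = row['id']
        if record_id in done:
            continue                      # matched on an earlier run
        result = fetch_match(row)         # hypothetical API call that may fail mid-run
        # ... store `result` wherever it needs to go ...
        done_log.write(record_id + '\n')  # checkpoint after each success
        done_log.flush()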
I want to get info via Python from an API that updates continuously (it is live, for example live video or live monitoring). I want to stop this GET request after an interval (for example 1 second), then process the information, and then repeat this cycle.
Any ideas? (I am currently using the requests module, but I do not know how to stop receiving data and then work with what was received.)
I might be off here, but if you hit an endpoint at a specific time, it should return the JSON at that particular moment. You could then store it and use it in whatever process you have created.
If you want to hit it again, you would just use requests to hit the endpoint.
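A hedged sketch of that "hit it again" approach: poll the endpoint once per interval, process the snapshot it returns, and repeat. The URL and process() are placeholders; requests' timeout= keeps a single call from hanging longer than you want to wait.

import time
import requests

INTERVAL = 1.0   # seconds between polls

while True:
    started = time.monotonic()
    r = requests.get('http://example.com/live-endpoint', timeout=INTERVAL)
    process(r.json())                              # placeholder for your own handling
    elapsed = time.monotonic() - started
    time.sleep(max(0.0, INTERVAL - elapsed))       # keep roughly one poll per interval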
I am new to this concept. Currently, I am working on API calls.
Background:
Here is my requirement.
I have a GET API URL:
/details/v1/employee/{employeeid}/employeecodes?%s&limit=5000&offset={1}.
This URL returns one record at a time, so I need to pass the employeeid value to it in a loop.
I have 300K employee IDs in a list.
Issue:
The issue here is that I have a limit of 25K API calls per day, but I have 300K employees, so the program errors out saying I am out of call volume quota.
Question:
Is there any approach I can use to limit my API calls?
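One hedged way to stay under the quota (not a feature of the API itself): process at most 25K ids per run and write the unprocessed ids back out, so the next day's run continues where this one stopped. The file name, host and URL template below are illustrative placeholders.

import requests

DAILY_LIMIT = 25000

# Hypothetical template standing in for the endpoint in the question;
# fill in the host and the remaining query parameters your API expects.
API_URL_TEMPLATE = 'https://api.example.com/details/v1/employee/{employeeid}/employeecodes'

with open('employee_ids.txt') as f:
    employee_ids = [line.strip() for line in f if line.strip()]

todays_batch = employee_ids[:DAILY_LIMIT]
remaining = employee_ids[DAILY_LIMIT:]

for employee_id in todays_batch:
    r = requests.get(API_URL_TEMPLATE.format(employeeid=employee_id))
    # ... handle r.json() here ...

# Persist the ids that were not called today so tomorrow's run picks them up.
with open('employee_ids.txt', 'w') as f:
    f.write('\n'.join(remaining))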
I'm using django to develop a website. On the server side, I need to transfer some data that must be processed on the second server (on a different machine). I then need a way to retrieve the processed data. I figured that the simplest would be to send back to the Django server a POST request, that would then be handled on a view dedicated for that job.
But I would like to add some minimum security to this process: When I transfer the data to the other machine, I want to join a randomly generated token to it. When I get the processed data back, I expect to also get back the same token, otherwise the request is ignored.
My problem is the following: How do I store the generated token on the Django server?
I could use a global variable, but I had the impression from browsing here and there on the web that global variables should not be used, for safety reasons (not that I really understand why).
I could store the token on disk/database, but it seems to be an unjustified waste of performance (even if in practice it would probably not change much).
Is there a third solution, or a canonical way to do such a thing using Django?
You can store your token in the Django cache; in most cases it will be faster than database or disk storage.
Another approach is to use redis.
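A minimal sketch using Django's low-level cache API (this assumes a cache backend is configured in settings.CACHES); the key prefix and function names are illustrative:

import secrets
from django.core.cache import cache

def start_job(data):
    token = secrets.token_hex(16)                         # random per-job token
    cache.set('job-token-%s' % token, True, timeout=3600) # valid for one hour
    # ... send `data` and `token` to the second server here ...
    return token

def handle_result(received_token):
    # Ignore the callback unless the token is one we issued.
    if not cache.get('job-token-%s' % received_token):
        return False
    cache.delete('job-token-%s' % received_token)         # one-time use
    return True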
You can also calculate your token:
save some random secret token in the settings of both servers
calculate a token based on the current timestamp rounded to 10 seconds, for example:
import hashlib, time

rounded_timestamp = int(time.time() // 10) * 10   # current time rounded down to 10 seconds
token = hashlib.sha1(secret_token.encode())       # secret_token is the shared value from settings
token.update(str(rounded_timestamp).encode())
token = token.hexdigest()
If the token generated on the remote server when POSTing the request matches the token generated on the local server when the response arrives, the request is valid and can be processed.
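A hedged sketch of that verification side: recompute the token for the current 10-second window and, as an extra assumption to tolerate a request crossing a window boundary, the previous window as well. Function names are illustrative.

import hashlib
import hmac
import time

def expected_tokens(secret_token):
    now = int(time.time() // 10) * 10
    for ts in (now, now - 10):
        h = hashlib.sha1(secret_token.encode())
        h.update(str(ts).encode())
        yield h.hexdigest()

def token_is_valid(received_token, secret_token):
    # Constant-time comparison against both acceptable windows.
    return any(hmac.compare_digest(received_token, t)
               for t in expected_tokens(secret_token))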
The simple obvious solution would be to store the token in your database. Other possible solutions are Redis or something similar. Finally, you can have a look at distributed async task queues like Celery...
I'm using the Instagram API to retrieve all photos from a list of hashtags. Starting today, I've been hitting the API rate limits (429 error, specifically). To debug this, I've been trying to see how to get the number of calls I have left per hour and incorporate that into my script.
The Instagram API says that you can access the number of calls left in the hour through the HTTP header, but I'm not sure how to access that in Python.
The following fields are provided in the header of each response and their values are related to the type of call that was made (authenticated or unauthenticated):
X-Ratelimit-Remaining: the remaining number of calls available to your app within the 1-hour window
X-Ratelimit-Limit: the total number of calls allowed within the 1-hour window
http://instagram.com/developer/limits
How would I access this data in Python?
I assumed it would be a much fancier solution based on a few other answers on SO, but after some research, I found a really simple one!
import requests
r = requests.get('URL of the API response here')
r.headers
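Continuing the snippet above, the two rate-limit headers quoted from the Instagram docs can then be read directly; header lookup in requests is case-insensitive:

remaining_calls = r.headers.get('X-Ratelimit-Remaining')
hourly_limit = r.headers.get('X-Ratelimit-Limit')
print(remaining_calls, hourly_limit)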