Trying to put data into LruCache and then get it back. I am using Postman to get the data via the API; only every fourth or fifth time I retry the API is data returned from the cache. On the first tries it takes time and never returns from the cache, so I think the data is not being cached in the first place. Please help, what might be the issue here with Python's LRU cache and Gunicorn?
I am expecting data to be returned from the cache the second time I hit my API, but in reality data is returned from the cache only after several tries of hitting the API.
I am facing an issue where, when I call an endpoint of my API (built in Flask, running on Gunicorn with 9 workers), the data loads the first time; then, when I refresh the page, it throws the following error:
{'message': 'Internal Server Error'}
For some detail, the API does not store anything persistently; the data is kept in runtime/RAM until a specific endpoint is called (in my case, localhost:8000/see/), which then frees** the RAM by deleting said data.
** I have disabled the function that frees the stored data, so no matter how many times I refresh the page, it should show the data.
So, I am facing two issues:
1: I have an endpoint localhost:8000/data/ where I store information being sent to the server. When I call this endpoint with an int that is inside of a dict, it should return all the data for that int. All data is shown at first, but when I refresh the page, it gives me the error mentioned above. After a few refreshes with this error, the data shows up again, but it is gone on the next refresh.
2: When I call the /see/ endpoint, it should show me all the data I have collected, which it does, but when I refresh this page, it returns {} (clearing disabled, as mentioned above), which means the data is being cleared from runtime.
Furthermore, I have also tried the base Flask dev server, where I DO NOT encounter this error, and another WSGI server, Waitress, which also does not give me these issues. I have tried running Gunicorn with supervisor and without, but I am still encountering these errors.
Any help would be appreciated as I would like to use Gunicorn. If there are good alternatives, please let me know.
I want to get info from an API via Python that updates its info indefinitely (it is live, for example live video or live monitoring). I want to stop this GET request after an interval (for example 1 second), then process the information, and then repeat this cycle.
Any ideas? (I am currently using the requests module, but I do not know how to stop receiving data and then get it.)
I might be off here, but if you hit an endpoint at a specific time, it should return the JSON at that particular moment. You could then store it and use it in whatever process you have created.
If you want to hit it again, you would just use requests to hit the endpoint.
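For illustration, a minimal polling sketch along those lines (the URL is a placeholder and I'm assuming the endpoint returns a JSON snapshot; the 1-second interval is just an example):

import time
import requests

URL = "https://example.com/live/status"  # hypothetical endpoint

while True:
    try:
        # Bound each request so a slow or streaming response cannot hang forever.
        r = requests.get(URL, timeout=1)
        r.raise_for_status()
        data = r.json()
        # ... process this snapshot of the live data here ...
        print(data)
    except requests.RequestException as exc:
        print("request failed:", exc)
    time.sleep(1)  # wait before taking the next snapshot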
So I'm trying to get data from an API which has a max call limit of 60/min. If I go over this it will return a response [429] too many requests.
I thought maybe a better way was to keep requesting the API until I get a response [200] but I am unsure how to do this.
import json
import requests

# key holds my Steam API key
r = requests.get("https://api.steampowered.com/IDOTA2Match_570/GetTopLiveGame/v1/?key=" + key + "&partner=0")
livematches = json.loads(r.text)['game_list']
So usually it runs but if it returns anything other than a response [200], my program will fail and anyone who uses the website won't see anything.
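To be concrete, this is roughly the retry loop I had in mind, though I'm not sure it's the right approach (sketch only; key is my API key and the backoff values are guesses):

import time
import requests

def get_top_live_games(key, max_retries=5):
    url = ("https://api.steampowered.com/IDOTA2Match_570/"
           "GetTopLiveGame/v1/?key=" + key + "&partner=0")
    for attempt in range(max_retries):
        r = requests.get(url)
        if r.status_code == 200:
            return r.json()['game_list']
        if r.status_code == 429:
            # Rate limited: back off before trying again.
            time.sleep(2 ** attempt)
        else:
            r.raise_for_status()
    return []  # give up after max_retries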
Generally with cases like this, I would suggest adding a layer of cache indirection. You definitely don't want to (and can't) solve this from the frontend as you suggested, since that's not what the frontend of your app is for. Sure, you can add a "wait", but all someone has to do is pull up Chrome Developer Tools, grab your API key, and call the API as many times as they want. Think of it like this: say you have a chef who can only cook 10 things per hour. Someone can easily come into the restaurant, order 10 things, and then nobody else can order anything for the next hour, which isn't fair. Instead, add a "cache" layer, which means you only call the Steam API every couple of seconds. If 5 people request your site within, say, 5 seconds, you only go to the Steam API on the first call, then you save that response. To the other 4 (and anyone else who comes within those next few seconds), you return the "cached" version.
The two main reasons for adding a cache API layer are the following:
You stop exposing the key from the frontend. You never want to expose your API key to the frontend directly like this, since anyone could just take your key, run many requests, and then bam, your site is down (a denial-of-service attack becomes trivial). You would instead have users hit your custom mysite.io/getLatestData endpoint, which doesn't need the key, since that would be securely stored in your backend.
You won't run into the rate-limiting issue. Essentially, if your cache only hits the API once every minute, you'll never hit a point where users can't access your site due to the API limit, since it will return cached results to them.
This may be a bit tricky to visualize, so here's how this works:
You write a little API in your favorite server-side language. Let's say NodeJS. There are lots of resources for learning the basics of ExpressJS. You'd write an endpoint like /getTopGame that is attached to a cache like redis. If there's a cached entry in the redis cache, return that to the user. If not, go to the steam API and get the latest data. Store that with an expiration of, say, 5 seconds. Boom, you're done!
It seems a bit daunting at first but as you said being a college student, this is a great way to learn a lot about how backend development works.
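Since the rest of this thread is Python, here is a rough Flask + Redis sketch of the same idea; the /getTopGame route, the "top_game" cache key, and the 5-second expiry are just the example values from above, not a definitive implementation:

import json

import redis
import requests
from flask import Flask, jsonify

app = Flask(__name__)
cache = redis.Redis(host="localhost", port=6379)

STEAM_KEY = "..."  # stays on the server, never sent to the browser
STEAM_URL = ("https://api.steampowered.com/IDOTA2Match_570/"
             "GetTopLiveGame/v1/?key={key}&partner=0")

@app.route("/getTopGame")
def get_top_game():
    cached = cache.get("top_game")
    if cached is not None:
        # Cache hit: serve the stored copy, the Steam API is not called at all.
        return jsonify(json.loads(cached))
    # Cache miss: fetch fresh data and keep it for ~5 seconds.
    r = requests.get(STEAM_URL.format(key=STEAM_KEY))
    data = r.json()
    cache.setex("top_game", 5, json.dumps(data))
    return jsonify(data)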
I'm using django to develop a website. On the server side, I need to transfer some data that must be processed on the second server (on a different machine). I then need a way to retrieve the processed data. I figured that the simplest would be to send back to the Django server a POST request, that would then be handled on a view dedicated for that job.
But I would like to add some minimum security to this process: When I transfer the data to the other machine, I want to join a randomly generated token to it. When I get the processed data back, I expect to also get back the same token, otherwise the request is ignored.
My problem is the following: How do I store the generated token on the Django server?
I could use a global variable, but from browsing here and there on the web I have the impression that global variables should not be used for safety reasons (not that I really understand why).
I could store the token on disk/database, but it seems to be an unjustified waste of performance (even if in practice it would probably not change much).
Is there third solution, or a canonical way to do such a thing using Django?
You can store your token in the Django cache; it will be faster than database or disk storage in most cases.
Another approach is to use redis.
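For example, with Django's cache framework (the "processing_token" key and the one-hour timeout are just illustrative choices):

import secrets
from django.core.cache import cache

def issue_token():
    # Called just before sending the data to the second server.
    token = secrets.token_hex(32)
    cache.set("processing_token", token, timeout=3600)  # keep for up to an hour
    return token

def is_valid(received_token):
    # Called in the view that receives the processed data back.
    stored = cache.get("processing_token")
    return stored is not None and stored == received_token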
You can also calculate your token:
save some random token in settings of both servers
calculate token based on current timestamp rounded to 10 seconds, for example using:
import hashlib, time
# secret_token is the shared random value from both servers' settings
rounded_timestamp = int(time.time() // 10) * 10
token = hashlib.sha1(secret_token.encode())
token.update(str(rounded_timestamp).encode())
token = token.hexdigest()
If the token generated on the remote server when POSTing the request matches the token generated on the local server when the response comes back, the request is valid and can be processed.
The simple, obvious solution would be to store the token in your database. Other possible solutions are Redis or something similar. Finally, you can have a look at distributed async task queues like Celery...
I need to make a lot of HTTP requests (> 1000) against a public service which only allows 500 HTTP requests per day. Hence, I have to count the number of executed requests and stop when I reach the maximum daily amount to continue the next day with the remaining calls. In particular, I iterate over a non-sorted list, so I cannot assume that the elements are in any order. My code looks like this:
import requests

request_parameters = {'api_key': api_key}

for user_id in all_user_ids:
    r = requests.get('http://public-api.com/%s' % user_id, params=request_parameters)
    text = r.content
    # do some stuff with text
Is there any package or pattern which you can recommend for counting and resuming API calls like this?
I would suggest implementing a simple counter to stop when you have hit your limit for the day, along with a local cache of the data you have already received. Then, when you run the process again the next day, check each record against your local cache first and only call the web service if there is no record in the local cache. That way you will eventually have all of the data, unless you are generating more new requests per day than the service usage limit allows.
The format of the cache will depend on what is returned from the web service and how much of that data you need, but it may be as simple as a csv file with a unique identifier to search against and the other fields you will need to retrieve in future.
Another option would be to store the whole response from each call (if you need a lot of it) in a dictionary with the key being a unique identifier and the value being the response. This could be saved as a json file and easily loaded back into memory for checking against on future runs.
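A rough sketch of that pattern, assuming a 500-requests-per-day budget and a cache.json file next to the script (both of those, and the reuse of the question's all_user_ids and api_key variables, are illustrative):

import json
import os
import requests

DAILY_LIMIT = 500
CACHE_FILE = "cache.json"

# Load whatever was fetched on previous days.
cache = {}
if os.path.exists(CACHE_FILE):
    with open(CACHE_FILE) as f:
        cache = json.load(f)

calls_made = 0
for user_id in all_user_ids:
    if str(user_id) in cache:
        continue  # already fetched on an earlier run
    if calls_made >= DAILY_LIMIT:
        break     # daily budget exhausted; resume tomorrow
    r = requests.get('http://public-api.com/%s' % user_id,
                     params={'api_key': api_key})
    cache[str(user_id)] = r.text
    calls_made += 1

# Persist the cache so tomorrow's run picks up where this one left off.
with open(CACHE_FILE, "w") as f:
    json.dump(cache, f)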