Caching data in an API - Handling concurrent/fast requests - python

# this is pseudocode
cache = {"user_information": None}  # last response from the external API

async def update_caching(
    app,
    token,
    user_information=None,
    guild_id=None,
    user_guilds=None,
    guild_channels=None,
    guild_data=None,
    guild_roles=None,
):
    if user_information:
        if cache_not_updated_in_last_5_seconds():  # pseudocode freshness check
            response = await request(...)  # make request to the external API
            cache["user_information"] = response
        return
I have this function to cache results from my API so that I don't have to keep making requests. However, if a user hits this endpoint twice and the first call is still waiting on the external API, the cache is still empty when the second call arrives. This means both requests go out to the external API, causing one of them to be rate limited.
How can I fix this? I could set the last_call datetime before the request is made inside update_caching, but then on the second request my API would think the data is cached when it isn't ready yet...
Any tips would be very helpful.
Essentially: the first request is made and my code starts collecting data to cache, but before the new data is ready the user makes another request. Now there are two requests in flight to the external third-party API and they get rate limited.
I appreciate the code is not functional, but I think it shows the problem I'm having.
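For illustration only (this is not from the original post), one common way to avoid the duplicate in-flight request is to guard the refresh with an asyncio.Lock, so a second caller waits for the first refresh to finish instead of issuing its own; request_user_information() here is a hypothetical stand-in for the real API call:

import asyncio
import time

cache = {}                    # cached responses, keyed by data type
last_updated = {}             # monotonic timestamp of the last refresh per key
cache_lock = asyncio.Lock()   # serialises cache refreshes

async def get_user_information(token):
    async with cache_lock:
        # If another request refreshed the cache while we were waiting
        # on the lock, reuse that result instead of calling the API again.
        if time.monotonic() - last_updated.get("user_information", 0) < 5:
            return cache["user_information"]
        response = await request_user_information(token)  # hypothetical external API call
        cache["user_information"] = response
        last_updated["user_information"] = time.monotonic()
        return response

With the lock held across both the freshness check and the refresh, the second concurrent request blocks until the first has populated the cache, then returns the cached value rather than hitting the third-party API again.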

Related

waiting for completion of get request

I have to get several pages of a JSON API with about 130,000 entries.
The request is fairly simple with:
response = requests.request("GET", url, headers=headers, params=querystring)
where the querystring contains an access token and the headers are fairly simple.
I created a while loop where basically every request url is in the form of
https://urlprovider.com/endpointname?pageSize=10000&rowStart=0
and the rowStart increments by pageSize until there are no further pages.
The problem I encounter is the following response after about 5-8 successful requests:
{'errorCode': 'ERROR_XXX', 'code': 503, 'message': 'Maximum limit for unprocessed API requests have been reached. Please try again later.', 'success': False}
From the error message I gather that I'm initiating the next request before the previous one has finished. Does anyone know how I can make sure the GET request has finished before the next one starts (other than something crude like a sleep()), or whether the error could lie elsewhere?
I found the answer to my question.
Requests is synchronous, meaning that it will ALWAYS wait until the call has finished before continuing.
The response from the API provider is misleading, as each request has in fact already been processed before the next one is sent.
The root cause is difficult to assess, but it may have to do with a limit imposed by the API provider.
What has worked:
A crude sleep(10), which makes the program wait 10 seconds before processing the next request
Better solution: Create a Session. According to the documentation:
The Session object [...] will use urllib3’s connection pooling. So if you’re making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
Not only does this resolve the problem, it also improves performance compared to my initial code.
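A rough sketch of that approach (the endpoint, page size and token are placeholders from the question, and an empty page is assumed to mean there are no further pages):

import requests

BASE_URL = "https://urlprovider.com/endpointname"   # endpoint from the question
PAGE_SIZE = 10000
headers = {"Accept": "application/json"}             # placeholder headers
querystring = {"access_token": "YOUR_TOKEN"}         # placeholder access token

session = requests.Session()  # one Session: the TCP connection is pooled and reused

all_rows = []
row_start = 0
while True:
    params = dict(querystring, pageSize=PAGE_SIZE, rowStart=row_start)
    response = session.get(BASE_URL, headers=headers, params=params)
    response.raise_for_status()
    page = response.json()
    if not page:  # assumption: an empty page signals the end of the data
        break
    all_rows.extend(page)
    row_start += PAGE_SIZE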

Can I persist an http connection (or other data) across Flask requests?

I'm working on a Flask app which retrieves the user's XML from the myanimelist.net API (sample), processes it, and returns some data. The data returned can be different depending on the Flask page being viewed by the user, but the initial process (retrieve the XML, create a User object, etc.) done before each request is always the same.
Currently, retrieving the XML from myanimelist.net is the bottleneck for my app's performance and adds on a good 500-1000ms to each request. Since all of the app's requests are to the myanimelist server, I'd like to know if there's a way to persist the http connection so that once the first request is made, subsequent requests will not take as long to load. I don't want to cache the entire XML because the data is subject to frequent change.
Here's the general overview of my app:
from flask import Flask
from functools import wraps
import requests

app = Flask(__name__)

def get_xml(f):
    @wraps(f)
    def wrap():
        # Get the XML before each app function
        r = requests.get('page_from_MAL')  # Current bottleneck
        user = User(data_from_r)  # User object
        response = f(user)
        return response
    return wrap

@app.route('/one')
@get_xml
def page_one(user_object):
    return 'some data from user_object'

@app.route('/two')
@get_xml
def page_two(user_object):
    return 'some other data from user_object'

if __name__ == '__main__':
    app.run()
So is there a way to persist the connection like I mentioned? Please let me know if I'm approaching this from the right direction.
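For reference, the HTTP connection reuse asked about here usually comes from a module-level requests.Session (the same pooling behaviour quoted in the earlier answer). A minimal sketch of the decorator using one, keeping the question's placeholder names:

import requests
from flask import Flask
from functools import wraps

app = Flask(__name__)
session = requests.Session()  # module-level Session: pooled, reusable TCP connection

def get_xml(f):
    @wraps(f)
    def wrap():
        # Reuse the same Session on every request instead of calling requests.get()
        r = session.get('page_from_MAL')  # placeholder URL from the question
        user = User(r.text)               # User is the question's own (assumed) class
        return f(user)
    return wrap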
I think you aren't approaching this from the right direction, because your app ends up acting too much like a proxy for myanimelist.net.
What happens when you have 2000 users? Your app ends up making tons of requests to myanimelist.net, and a malicious user could definitely DoS your app (or use it to DoS myanimelist.net).
This is a much cleaner way, IMHO (a rough sketch of the server side follows after this list):
Server side:
Create a websocket server (ex: https://github.com/aaugustin/websockets/blob/master/example/server.py)
When a user connects to the websocket server, add the client to a list; remove it from the list on disconnect.
For every connected user, frequently check myanimelist.net to get the associated XML (maybe lower the frequency as the number of online users grows).
For every XML document, make a diff against your server's local version and send that diff to the client over the websocket channel (assuming there is a diff).
Client side:
On receiving a diff: update the local XML with the differences.
Disconnect from the websocket after n seconds of inactivity; when disconnected, add a button to the interface to reconnect.
I doubt you can do much better, assuming myanimelist.net doesn't provide a "push" API.
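A very rough server-side sketch with the websockets library (fetch_xml_for() and diff_against_local() are hypothetical helpers, and the polling interval is a placeholder):

import asyncio
import websockets  # https://github.com/aaugustin/websockets

connected = set()  # currently connected clients

async def handler(websocket):  # note: older websockets versions also pass a `path` argument
    connected.add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        connected.discard(websocket)

async def poll_myanimelist():
    while True:
        for client in list(connected):
            xml = await fetch_xml_for(client)     # hypothetical helper
            diff = diff_against_local(xml)        # hypothetical helper
            if diff:
                await client.send(diff)
        await asyncio.sleep(30)  # placeholder polling interval

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await poll_myanimelist()

asyncio.run(main())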

is there a way to partially return results from Python web service....?

I am new to Python. I am using Flask to create a web service which makes lots of API calls to LinkedIn. The problem is that getting the final result set takes a lot of time, and the frontend stays idle while it waits. I was thinking of returning the partial results found up to that point and continuing the API calls on the server side. Is there any way to do this in Python? Thanks.
Flask has the ability to stream data back to the client. Sometimes this requires JavaScript modifications to do what you want, but it is possible to send content to a user in chunks using Flask and Jinja2. It requires some wrangling, but it's doable.
A view that uses a generator to break up content could look like this (though the linked SO answer is much more comprehensive).
from flask import Response
@app.route('/image')
def generate_large_image():
    def generate():
        while True:
            if not processing_finished():
                yield ""
            else:
                yield get_image()
    return Response(generate(), mimetype='image/jpeg')
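For completeness, a hypothetical client consuming such a streamed response with requests (not part of the original answer) could look like:

import requests

# Read the response in chunks as it is streamed, instead of waiting for it to finish
with requests.get("http://localhost:5000/image", stream=True) as resp:
    for chunk in resp.iter_content(chunk_size=8192):
        if chunk:
            handle_chunk(chunk)  # hypothetical helper, e.g. write to a file or buffer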
There are a few ways to do this. The simplest would be to return the initial response via Flask immediately and then use JavaScript on the returned page to make an additional request to another URL, loading that data when it comes back, perhaps with a loading indicator in the meantime.
The handler for that additional URL would look like this:
#app.route("/linkedin-data")
def linkedin():
# make some call to the linked in api which returns "data", probably in json
return flask.jsonify(**data)
Fundamentally, no. You can't return a partial response, so you have to break your requests up into smaller units. You could stream data using websockets, but you would still send back an initial response, which would then open a websocket connection via JavaScript, which would then start streaming data back to the user.

Appengine urlfetch issue (Python)

I'm trying to use urlfetch to make a request to my application (the same application that is sending the request); however, it doesn't work.
My code is as follows:
uploadurl = 'http://myapp.appspot.com/posturl'
result = urlfetch.fetch(
    url=uploadurl,
    payload=data,
    method=urlfetch.POST,
    headers={'Content-Type': 'application/x-www-form-urlencoded'})
There is no error at all when I call this, and everything seems to work correctly; however, the request never arrives. For debugging purposes, I changed the uploadurl to a different application that I own, and it worked fine. Any ideas why I can't send requests using urlfetch to the same application?
The full (real) URL that I would call is made by:
session = str(os.urandom(16).encode('hex'))
uploadurl = blobstore.create_upload_url('/process?session=' + session)
So I can't understand how that could be incorrect, as the URL is generated for me.
Thanks.
I don't know how you're verifying that the request "never arrives". The blobstore URLs are not handled by your application's actual code, but by the App Engine runtime itself, so if you're looking in the logs you won't see that request there.
I think it is not possible; the restriction exists to prevent endless loops. From the urlfetch API documentation page:
To prevent an app from causing an endless recursion of requests, a request handler is not allowed to fetch its own URL. It is still possible to cause an endless recursion with other means, so exercise caution if your app can be made to fetch requests for URLs supplied by the user.

Facebook graph api on appengine Invalid Request URL

I have a request to the fb graph api that goes like so:
https://graph.facebook.com/?access_token=<ACCESSTOKEN>&fields=id,name,email,installed&ids=<A LONG LONG LIST OF IDS>
If the number of ids goes above 200-ish in the request, the following things happen:
in the browser: it works
in local tests with urllib: it times out
on the deployed appengine application: "Invalid request URL" (followed by the URL); this one doesn't hang at all, though
For numbers of ids below 200 or so, it works fine in all three cases.
Sure I could just slice the id list up and fetch them separately, but I would like to know why this is happening and what it means?
I didn't read your question through the first time around; I didn't scroll the embedded code to the right, so I didn't realize you were using a long URL.
There's usually a maximum URL length, which prevents you from sending a very long HTTP GET request. The way to get around that is to embed the parameters in the body of a POST request.
It looks like FB's Graph API does support it, according to this question:
using POST request on Facebook Graph API
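A hedged sketch of that POST workaround with requests; the "method=GET" override and the exact parameter names follow the linked question and should be verified against the Graph API documentation:

import requests

ACCESS_TOKEN = "YOUR_TOKEN"              # placeholder
long_list_of_ids = ["id1", "id2"]        # the long list of ids from the question

resp = requests.post(
    "https://graph.facebook.com/",
    data={
        "access_token": ACCESS_TOKEN,
        "fields": "id,name,email,installed",
        "ids": ",".join(long_list_of_ids),
        "method": "GET",                 # asks the Graph API to treat this POST as a GET
    },
)
result = resp.json()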
