I have a Django app and need to generate files that can take up to a minute, so I pass that off to a background worker.
Currently, the process works as follows: I POST to the server, which replies with a URL that I can poll. I then poll that URL every 2 seconds, and it either sends back "busy" or the URL of where the file is located in my S3 bucket.
I want to replace this polling with Django Channels, but I'm unsure of the best way to achieve this, as I can't find any examples online. Is Channels even intended for something like this?
My current thoughts are the following:
Start the file generation as soon as the client opens a connection on a specific route (previously this would have been a POST).
The background task gets started as soon as the client connects and receives the channel name as a parameter.
Once it is done, it sends the file path back to the consumer, which in turn sends it to the browser, where I'll use JS to create a download button.
Below is an example:
from asgiref.sync import async_to_sync
from celery import shared_task
from channels.layers import get_channel_layer

@shared_task
def my_bg_task(channel_name):
    # some long running calc here
    channel_layer = get_channel_layer()
    async_to_sync(channel_layer.send)(channel_name, {'type': 'generation_done', 'f_path': 'path/to/s3/bucket'})
import json

from channels.generic.websocket import WebsocketConsumer

class ReloadConsumer(WebsocketConsumer):
    def connect(self):
        my_bg_task.delay(self.channel_name)
        self.accept()

    def generation_done(self, event):
        self.send(text_data=json.dumps(event))  # event is already a dict; {event} would be a set
Is this the best way to achieve this?
Obviously, from a security point of view, it should not be accessible to anybody other than the user that opened the connection.
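One way to enforce that, sketched below and untested, is to reject anonymous connections before starting the task; this assumes the routing is wrapped in Channels' AuthMiddlewareStack so that self.scope["user"] holds the Django user:

class ReloadConsumer(WebsocketConsumer):
    def connect(self):
        user = self.scope["user"]  # populated by AuthMiddlewareStack (assumption)
        if not user.is_authenticated:
            self.close()  # reject anyone who is not logged in
            return
        my_bg_task.delay(self.channel_name)
        self.accept()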
I'm working on a Python/Flask application and I have my logging handled on a different server. The way I currently have it set up is a function which sends a request to the external server whenever somebody visits a webpage.
This, of course, extends my time to first byte (TTFB), because execution only continues after the request to the external server has completed. I've heard about threading, but have read that it also adds a little extra time.
Summary of current code:
import os

import requests
from flask import Flask

app = Flask(__name__)
log_auth_token = os.environ["log_auth"]

def send_log(data):
    post_data = {
        "data": data,
        "auth": log_auth_token
    }
    r = requests.post("https://example.com/log", data=post_data)  # post_data, not data, so the auth token is included

@app.route('/log')
def log():
    send_log("/log was just accessed")
    return "OK"
In short:
Intended behavior: User requests webpage -> User receives response -> Request is logged.
Current behavior: User requests webpage -> Request is logged -> User receives response.
What would be the fastest way to achieve my intended behavior?
Log locally and periodically send the log files to a separate server. More specifically, you need to create rotating log files and archive them so you don't end up with one huge file. To do this, you can configure your reverse proxy (like NGINX) to handle the access logging.
Or log locally and create an application that allows you to read the log files remotely.
Sending a log to a separate server on every request simply isn't efficient unless another process handles it; users shouldn't have to wait for your log action to complete.
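As a rough sketch of the local-logging option, Python's standard logging module can handle the rotation itself; the file name, size limits, and the Flask app object here are assumptions based on the question's setup:

import logging
from logging.handlers import RotatingFileHandler

# Rotate at roughly 10 MB and keep 5 archives, so no single file grows huge.
handler = RotatingFileHandler("access.log", maxBytes=10_000_000, backupCount=5)
handler.setFormatter(logging.Formatter("%(asctime)s %(message)s"))
app.logger.addHandler(handler)

@app.route('/log')
def log():
    app.logger.info("/log was just accessed")  # local write, no network round trip
    return "OK"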
I've created a page that allows a user to upload an Excel file, which is then parsed into its columns; the rows are then inserted into the database 500 rows at a time.
This isn't a terribly long process - between 25 and 90 seconds - but long enough that I would like to give the user some feedback to let them know it's actually still working, in the form of status messages and/or a progress bar.
My app is written in Flask, like this:
app.py
from flask import Flask, render_template, request
from tqdm import tqdm

import pandas

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def fun():
    if request.method == 'GET':
        return render_template('index.html')
    else:
        filename = request.form['filename']
        print('opening file')  # change from console print to webpage
        df = pandas.read_excel(filename)
        print('File read.. processing data\n')  # change from console to webpage
        processData()
        print('connecting to db....\n')  # change to webpage print
        db.connect()
        print('connected to db! inserting rows')  # change to webpage print
        bulk_inserts = rows // 500  # integer division so range() gets an int
        for i in tqdm(range(bulk_inserts)):  # wrapping tqdm around range makes a progress bar
            insert500rows()
        db.commit()
        db.disconnect()
        return 'Complete. ' + str(rows) + ' inserted.'  # this gets sent as the response to the POST

app.run()
I know you can only send one response to a POST request, but how can I give the user the status of the process if I can only send one response? Maybe I'm going about this the wrong way, but I think this is a pretty common use case. How else should I set this up if this approach won't work?
For some reason this was marked as a duplicate of this question. That question asks how to print a continuous stream of values to a screen, while here I am asking how to send a message at certain points of execution. I think the comments provided about Flask-SocketIO offer a different approach to a different problem.
The "one response to one request" is a matter of how HTTP protocol works: the client sends a query and some data (the POST request), and the server responds with some other data (your "one response"). While you could technically get the server to send back pieces of the response in chunks, that is not how it works in practice; for one, browsers don't handle that too well.
You need to do this a different way. For instance, create a "side channel" with SocketIO, as the commenters helpfully suggest. Then you can send updates to the client through this side channel - instead of your prints, you would use socketio.emit.
On the client side, you would first subscribe to a SocketIO channel when the page loads. Then you would submit the file through an AJAX call (or in a separate iframe), and keep the SocketIO connection open on the page to display the updates.
This way the POST request is separated from your "page loading". The JavaScript on the page remains active and can read and display progress updates, while the upload (with the associated wait time) happens in the background.
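A minimal server-side sketch of that idea, assuming the Flask-SocketIO extension is installed; the 'status' event name and the batch count are illustrative placeholders:

from flask import Flask, request
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

@app.route('/upload', methods=['POST'])
def upload():
    filename = request.form['filename']
    socketio.emit('status', {'msg': 'opening ' + filename})
    bulk_inserts = 10  # placeholder: number of 500-row batches from the file
    for i in range(bulk_inserts):
        # insert500rows() would go here, as in the question
        socketio.emit('status', {'msg': 'batch %d of %d done' % (i + 1, bulk_inserts)})
    return 'Complete.'

if __name__ == '__main__':
    socketio.run(app)

The page's JavaScript subscribes to the 'status' event and appends each message to the DOM as it arrives.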
I would also do it like @matejcik explained in their answer, but there is also another way. What websockets do is push data back to the browser when there is an update. There is also the pull method.
You can send queries to the server periodically and the server will give you the updates. You still have to use AJAX to send the requests, and JavaScript's setTimeout function to wait between queries, but what you are doing is basically refreshing the page without showing it to the user. It is easier to understand for beginners, as the technology used is still plain GET. Instead of printing your new logs, you add them to a string (or an array); when the GET request is made, you return this array, clear your text output, and write the new array, with both old and new info.
This method is far less efficient than websockets, but for prototyping it can be faster.
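A rough sketch of that pull approach, with the caveat that a module-level list like this only works in a single-process prototype:

from flask import Flask, jsonify

app = Flask(__name__)
progress_log = []  # messages get appended here instead of being printed

@app.route('/progress')
def progress():
    # Return everything logged since the last poll, then clear the list.
    messages = list(progress_log)
    progress_log.clear()
    return jsonify(messages)

The client calls /progress with AJAX inside a setTimeout loop and appends whatever comes back.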
I'm working on a Flask app which retrieves the user's XML from the myanimelist.net API (sample), processes it, and returns some data. The data returned can be different depending on the Flask page being viewed by the user, but the initial process (retrieve the XML, create a User object, etc.) done before each request is always the same.
Currently, retrieving the XML from myanimelist.net is the bottleneck for my app's performance and adds on a good 500-1000ms to each request. Since all of the app's requests are to the myanimelist server, I'd like to know if there's a way to persist the http connection so that once the first request is made, subsequent requests will not take as long to load. I don't want to cache the entire XML because the data is subject to frequent change.
Here's the general overview of my app:
from flask import Flask
from functools import wraps
import requests

app = Flask(__name__)

def get_xml(f):
    @wraps(f)
    def wrap():
        # Get the XML before each app function
        r = requests.get('page_from_MAL')  # Current bottleneck
        user = User(data_from_r)  # User object
        response = f(user)
        return response
    return wrap

@app.route('/one')
@get_xml
def page_one(user_object):
    return 'some data from user_object'

@app.route('/two')
@get_xml
def page_two(user_object):
    return 'some other data from user_object'

if __name__ == '__main__':
    app.run()
So is there a way to persist the connection like I mentioned? Please let me know if I'm approaching this from the right direction.
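For what it's worth, the requests library already keeps connections alive when calls go through a shared Session object, so the smallest version of "persisting the connection" might look like this (untested against MAL itself):

import requests

# Connections to the same host are pooled and reused (HTTP keep-alive),
# so only the first request pays the TCP/TLS setup cost.
session = requests.Session()
r = session.get('page_from_MAL')  # same placeholder URL as in the code above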
I think you aren't approaching this from the right direction, because you're placing your app too much as a proxy for myanimelist.net.
What happens when you have 2000 users? Your app ends up making tons of requests to myanimelist.net, and a malicious user could definitely DoS your app (or use it to DoS myanimelist.net).
This is a much cleaner way IMHO (a rough sketch follows after the steps below):
Server side:
Create a websocket server (ex: https://github.com/aaugustin/websockets/blob/master/example/server.py)
When a user connects to the websocket server, add the client to a list; remove it from the list on disconnect.
For every connected user, frequently check myanimelist.net to get the associated XML (maybe lowering the frequency as more users are online).
For every XML document, make a diff against your server's local version, and send that diff to the client over the websocket channel (assuming there is a diff).
Client side:
On receiving a diff: update the local XML with the differences.
Disconnect from the websocket after n seconds of inactivity, and when disconnected, add a button to the interface to reconnect.
I doubt you can do anything much better assuming myanimelist.net doesn't provide a "push" API.
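A loose, untested sketch of that server using the websockets library linked above (assuming a newer version of the library, where the handler takes a single argument); fetch_diff is a hypothetical stand-in for the myanimelist.net fetch plus the diffing step:

import asyncio

import websockets

CLIENTS = set()

async def handler(websocket):
    # Track connected clients; drop them from the set on disconnect.
    CLIENTS.add(websocket)
    try:
        await websocket.wait_closed()
    finally:
        CLIENTS.discard(websocket)

async def poll_and_push():
    while True:
        for ws in list(CLIENTS):
            diff = fetch_diff(ws)  # hypothetical: fetch the XML, diff against the local copy
            if diff:
                await ws.send(diff)
        await asyncio.sleep(30)  # lower the frequency as more users connect

async def main():
    async with websockets.serve(handler, "localhost", 8765):
        await poll_and_push()

asyncio.run(main())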
I want to log requests (i.e. user page views) to a database, but only log the request metadata to the DB after the request has finished and the data was successfully sent to the client.
Is Flask's request_tearing_down the correct signal to subscribe to? How about request_finished?
It looks like you don't want request_finished. From the docs:
This signal is sent right before the response is sent to the client.
From what I gather, request_tearing_down is also triggered before a response is sent.
I don't think there is a specific signal you can subscribe to in order to do something after the response has been sent. You might be able to modify Flask's code to add one yourself.
Your best option might be to make the logging happen asynchronously so that it doesn't delay the response. You could do this yourself with threads or subprocesses, or you could use a library like Celery to do some of the work for you.
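A small sketch of the thread-based option; send_log here is a hypothetical stand-in for whatever actually writes the record to the database:

import queue
import threading

from flask import Flask, request

app = Flask(__name__)
log_queue = queue.Queue()

def log_worker():
    # Drain the queue forever; the daemon thread dies with the process.
    while True:
        record = log_queue.get()
        send_log(record)  # hypothetical: the actual DB write
        log_queue.task_done()

threading.Thread(target=log_worker, daemon=True).start()

@app.teardown_request
def queue_log(exc):
    # Hand the metadata off so the slow write happens off the request thread.
    log_queue.put(request.path)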
Also see this question
My webapp has two parts:
a GAE server which handles web requests and sends them to an EC2 REST server
an EC2 REST server which does all the calculations given information from GAE and sends back results
It works fine when the calculations are simple; otherwise, I get a timeout error on the GAE side.
I realize there are some approaches to this timeout issue, but after some research I found (please correct me if I am wrong):
taskqueue would not fit my needs, since some of the calculations could take more than half an hour.
'GAE backend instances' work if I reserve another instance all the time. But since I have already reserved an EC2 instance, I would like to find a "cheap" solution (not paying for a GAE backend instance and EC2 at the same time).
'GAE asynchronous requests' are also not an option, since the handler still waits for the response from EC2, even though users can send other requests while they are waiting.
Below is a simple case of my code. It:
asks the user to upload a CSV
parses the CSV and sends its information to EC2
generates an output page from the EC2 response
OutputPage.py
import cgi

from google.appengine.ext import webapp

from przm import przm_batchmodel

class OutputPage(webapp.RequestHandler):
    def post(self):
        form = cgi.FieldStorage()
        thefile = form['upfile']
        # this is where the uploaded file is processed and sent to EC2 for computing
        html = przm_batchmodel.loop_html(thefile)
        przm_batchoutput_backend.przmBatchOutputPageBackend(thefile)
        self.response.out.write(html)

app = webapp.WSGIApplication([('/.*', OutputPage)], debug=True)
przm_batchmodel.py

import csv

from google.appengine.api import urlfetch

# This is the code which sends the info to EC2
def loop_html(thefile):
    # parses the uploaded csv and sends its info to the REST server; the returned value is an html page
    data = csv.reader(thefile.file.read().splitlines())
    response = urlfetch.fetch(url=REST_server, payload=data, method=urlfetch.POST, headers=http_headers, deadline=60)
    return response
At this moment, my questions are:
Is there a way on the GAE side that allows me to just send the request to EC2 without waiting for its response? If this is possible, then on the EC2 side I can send users emails to notify them when the results are ready.
If question 1 is not possible, is there a way to create a monitor on EC2 which will invoke the calculation once information is received from the GAE side?
I appreciate any suggestions.
Here are some points:
For Question 1: you do not need to wait on the GAE side for EC2 to complete its work. You are already using URLFetch to send the data across to EC2. As long as it is able to send that data across to the EC2 side within 60 seconds, and its size is not more than 10 MB, you are fine.
You will need to make sure that you have a receipt handler on the EC2 side that is capable of collecting this data and sending back an Ack. An Ack will be sufficient for the GAE side to track the activity. You can then always write some code on the EC2 side to report back to GAE that the conversion is done or, as you mentioned, send off an email if needed.
I suggest that you create your own little tracker on the GAE side. For example, when the file is uploaded, create a Task and send the Ack back to the client immediately. Then use a Cron Job or Task Queue on the App Engine side to send the work off to EC2, without waiting for EC2 to complete its job. Then let EC2 report back to GAE that its work is done for a particular Task Id, and send off an email (if required) to notify the users that the work is done. In fact, EC2 can even report back with a batch of Task Ids that it completed, instead of sending a notification for each Task Id.
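A hedged sketch of that tracker, using the same webapp stack as the question plus the App Engine taskqueue and ndb APIs; FileTask and the /send_to_ec2 handler URL are illustrative names:

from google.appengine.api import taskqueue
from google.appengine.ext import ndb, webapp

class FileTask(ndb.Model):
    status = ndb.StringProperty(default='queued')

class OutputPage(webapp.RequestHandler):
    def post(self):
        key = FileTask().put()  # the tracker entity
        # Queue the EC2 hand-off instead of calling urlfetch inline,
        # then ack the client immediately.
        taskqueue.add(url='/send_to_ec2', params={'task_id': key.urlsafe()})
        self.response.out.write('Queued as %s' % key.urlsafe())

EC2 can later POST the finished Task Id back to another GAE handler, which flips the FileTask status and triggers the notification email.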