I have a recommendation site. Everything was working fine until, at times when the site was under a decent amount of traffic, the recommendations would take longer than 30 seconds (Heroku's request limit) and time out, throwing a 500 error. I realize this is a very long time for an HTTP request.
So I read up online and implemented RQ with Redis. I got that to work, but after some testing it still throws the Internal Server Error, even though the requests are going through a queue.
I'm really just lacking knowledge here and have no idea what to do. I think I'm missing the whole idea of RQ and Redis, I guess? Here's some of my code if it helps, but I'm mostly hoping for guidance on where to go from here to fix this error.
worker.py
import os

import redis
from rq import Worker, Queue, Connection

listen = ['high', 'default', 'low']

redis_url = os.getenv('REDISTOGO_URL',
                      'redis://redistogo:sampleurl:portNo/')

if not redis_url:
    raise RuntimeError('Set up Redis To Go first.')

conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(map(Queue, listen))
        worker.work()
part of my views.py
q = Queue(connection=conn)


@app.route('/')
def home():
    form = ArtistsForm()
    return render_template('home.html', form=form)


@app.route('/results', methods=['POST'])
def results():
    form = ArtistsForm()
    error = None
    try:
        if request.method == 'POST' and form.validate():
            table = 'Artists'
            artists = []
            for value in form.data.items():
                if value[1] != '':
                    artists.append(value[1])
            results = q.enqueue_call(func=getArtists, args=(table, *artists))
            # this loop still blocks the web request until the worker finishes,
            # so the dyno can still hit the 30-second limit
            while results.result is None:
                time.sleep(1)
            results = results.result.values.tolist()
            return render_template('results.html', results=results)
        else:
            error = ("Please be sure to enter 5 artists with correct spelling"
                     " and punctuation")
    except pylast.WSError:
        return render_template('error.html')
    return render_template('home.html', form=form, error=error)
Any guidance is appreciated
You can always try dividing the work: have a web dyno simply acknowledge the request, then have worker dynos do the heavy lifting.
Kafka or something similar could be used to accomplish this.
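As a minimal sketch of that pattern, reusing the RQ setup you already have rather than Kafka: the view only enqueues the job and returns at once, and a second route lets the browser poll for the result instead of sleeping inside the request. The /job_status route, the template variables, and the list comprehension over form.data are my own placeholders, not code from your app.

from rq.job import Job

@app.route('/results', methods=['POST'])
def results():
    form = ArtistsForm()
    if form.validate():
        artists = [v for v in form.data.values() if v != '']
        # enqueue and return immediately; a worker dyno does the heavy lifting
        job = q.enqueue_call(func=getArtists, args=('Artists', *artists), result_ttl=5000)
        return render_template('results.html', job_id=job.get_id())
    return render_template('home.html', form=form,
                           error="Please be sure to enter 5 artists with correct spelling and punctuation")

@app.route('/job_status/<job_id>')
def job_status(job_id):
    # the results page polls this endpoint (e.g. from JavaScript) until the job is done
    job = Job.fetch(job_id, connection=conn)
    if job.is_finished:
        return {'status': 'finished', 'results': job.result.values.tolist()}
    return {'status': 'pending'}, 202

The key point is that no view ever waits on job.result, so nothing comes close to Heroku's 30-second limit.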
Using Tornado, I have a POST request that takes a long time as it makes many requests to another API service and processes the data. This can take minutes to fully complete. I don't want this to block the entire web server from responding to other requests, which it currently does.
I looked at multiple threads here on SO, but they are often 8 years old and the code no longer works since Tornado removed the "engine" component from tornado.gen.
Is there an easy way to kick off this long get call and not have it block the entire web server in the process? Is there anything I can put in the code to say.. "submit the POST response and work on this one function without blocking any concurrent server requests from getting an immediate response"?
Example:
main.py
def make_app():
    return tornado.web.Application([
        (r"/v1", MainHandler),
        (r"/v1/addfile", AddHandler, dict(folderpaths=folderpaths)),
        (r"/v1/getfiles", GetHandler, dict(folderpaths=folderpaths)),
        (r"/v1/getfile", GetFileHandler, dict(folderpaths=folderpaths)),
    ])


if __name__ == "__main__":
    app = make_app()
    sockets = tornado.netutil.bind_sockets(8888)
    tornado.process.fork_processes(0)
    tornado.process.task_id()
    server = tornado.httpserver.HTTPServer(app)
    server.add_sockets(sockets)
    tornado.ioloop.IOLoop.current().start()
addHandler.py
class AddHandler(tornado.web.RequestHandler):

    def initialize(self, folderpaths):
        self.folderpaths = folderpaths

    def blockingFunction(self):
        time.sleep(320)
        post("AWAKE")

    def post(self):
        user = self.get_argument('user')
        folderpath = self.get_argument('inpath')
        outpath = self.get_argument('outpath')
        workflow_value = self.get_argument('workflow')

        status_code, status_text = validateInFolder(folderpath)

        if (status_code == 200):
            logging.info("Status Code 200")
            result = self.folderpaths.add_file(user, folderpath, outpath, workflow_value)
            self.write(result)
            self.finish()
            # At this point the path is validated.
            # The POST response should be sent out. The internal process should
            # continue; new requests should not be blocked.
            self.blockingFunction()
The idea is that if the input parameters are validated, the POST response should be sent out.
Then the internal process (blockingFunction()) should be started, without blocking the Tornado server from processing other API POST requests.
I tried defining blockingFunction() as async, which allows me to process multiple concurrent user requests; however, there was a warning about a missing "await" for the async method.
Any help welcome. Thank you
import asyncio  # needed for get_event_loop() below


class AddHandler(tornado.web.RequestHandler):

    def initialize(self, folderpaths):
        self.folderpaths = folderpaths

    def blockingFunction(self):
        time.sleep(320)
        post("AWAKE")

    async def post(self):
        user = self.get_argument('user')
        folderpath = self.get_argument('inpath')
        outpath = self.get_argument('outpath')
        workflow_value = self.get_argument('workflow')

        status_code, status_text = validateInFolder(folderpath)

        if (status_code == 200):
            logging.info("Status Code 200")
            result = self.folderpaths.add_file(user, folderpath, outpath, workflow_value)
            self.write(result)
            self.finish()
            # At this point the path is validated and the POST response has been
            # sent. The internal process continues in a thread-pool executor,
            # so new requests are not blocked.
            loop = asyncio.get_event_loop()
            await loop.run_in_executor(None, self.blockingFunction)
            # if this had multiple parameters it would be
            # await loop.run_in_executor(None, self.blockingFunction, param1, param2)
Thank you @xyres
Further read: https://www.tornadoweb.org/en/stable/faq.html
So I'm creating this application, and part of it is a web page where a trading algorithm tests itself using live data. All that is working, but the issue is that if I leave (exit) the web page, it stops. I was wondering how I can keep it running in the background indefinitely, as I want the algorithm to keep doing its thing.
This is the route which I would like to run in the background.
@app.route('/live-data-source')
def live_data_source():
    def get_live_data():
        live_options = lo.Options()
        while True:
            live_options.run()
            live_options.update_strategy()
            trades = live_options.get_all_option_trades()
            trades = trades[0]
            json_data = json.dumps({'data': trades})
            yield f"data:{json_data}\n\n"
            time.sleep(5)
    return Response(get_live_data(), mimetype='text/event-stream')
I've looked into multi-threading but I'm not too sure if that's the right tool for the job. I am still kind of new to Flask, hence the poor question. If you need more info, please do comment.
You can do it the following way; a full working example is below. Note that in production you should use Celery for such tasks, or write another daemon app (a separate process) yourself and feed it tasks from the HTTP server with the help of a message queue (e.g. RabbitMQ) or a common database.
If you have any questions regarding the code below, feel free to ask; it was quite a good exercise for me:
from flask import Flask, current_app
import threading
from threading import Thread, Event
import time
from random import randint

app = Flask(__name__)

# use the dict to store events used to stop other threads
# one event per thread!
app.config["ThreadWorkerActive"] = dict()


def do_work(e: Event):
    """Function just for another thread to do some work."""
    while True:
        if e.is_set():
            break  # can be stopped from another thread
        print(f"{threading.current_thread().getName()} working now ...")
        time.sleep(2)
    print(f"{threading.current_thread().getName()} was stopped ...")


@app.route("/long_thread", methods=["GET"])
def long_thread_task():
    """Allows starting a new thread"""
    th_name = f"Th-{randint(100000, 999999)}"  # not really unique actually
    stop_event = Event()  # is used to stop another thread
    th = Thread(target=do_work, args=(stop_event,), name=th_name, daemon=True)
    th.start()
    current_app.config["ThreadWorkerActive"][th_name] = stop_event
    return f"{th_name} was created!"


@app.route("/stop_thread/<th_id>", methods=["GET"])
def stop_thread_task(th_id):
    th_name = f"Th-{th_id}"
    if th_name in current_app.config["ThreadWorkerActive"].keys():
        e = current_app.config["ThreadWorkerActive"].get(th_name)
        if e:
            e.set()
            current_app.config["ThreadWorkerActive"].pop(th_name)
            return f"Th-{th_id} was asked to stop"
        else:
            return "Sorry something went wrong..."
    else:
        return f"Th-{th_id} not found"


@app.route("/", methods=["GET"])
def index_route():
    text = ("/long_thread - create another thread. "
            "/stop_thread/th_id - stop thread with a certain id. "
            f"Available Threads: {'; '.join(current_app.config['ThreadWorkerActive'].keys())}")
    return text


if __name__ == '__main__':
    app.run(host="0.0.0.0", port=9999)
I have 2 functions.
The 1st function stores the data received in a list, and the 2nd function writes the data into a CSV file.
I'm using Flask. Whenever the web service is called, it stores the data and sends a response; as soon as it sends the response, it should trigger the 2nd function.
My Code:
from flask import Flask, flash, request, redirect, url_for, session
import json
import pandas as pd

app = Flask(__name__)

arr = []


@app.route("/test", methods=['GET', 'POST'])
def check():
    arr.append(request.form['a'])
    arr.append(request.form['b'])
    res = {'Status': True}
    return json.dumps(res)


def trigger():
    df = pd.DataFrame({'x': arr})
    df.to_csv("docs/xyz.csv", index=False)
    return
Obviously the 2nd function is not called.
Is there a way to achieve this?
P.S.: My real-life problem is different: the trigger function is time-consuming and I don't want the user to wait for it to finish executing.
One solution would be to have a background thread that watches a queue. You put your CSV data in the queue and the background thread consumes it. You can start such a thread before the first request:
import threading
import time
from queue import Queue  # queue.Queue supports task_done()/join(), which is what we need for a thread

import pandas as pd


class CSVWriterThread(threading.Thread):
    def __init__(self, *args, **kwargs):
        threading.Thread.__init__(self, *args, **kwargs)
        self.input_queue = Queue()

    def send(self, item):
        self.input_queue.put(item)

    def close(self):
        self.input_queue.put(None)
        self.input_queue.join()

    def run(self):
        while True:
            csv_array = self.input_queue.get()
            if csv_array is None:
                break
            # Do something here ...
            df = pd.DataFrame({'x': csv_array})
            df.to_csv("docs/xyz.csv", index=False)
            self.input_queue.task_done()
            time.sleep(1)
        # Done
        self.input_queue.task_done()
        return


@app.before_first_request
def activate_job_monitor():
    thread = CSVWriterThread()
    app.csvwriter = thread
    thread.start()
And in your code put the message in the queue before returning:
#app.route("/test", methods=['GET','POST'])
def check():
arr.append(request.form['a'])
arr.append(request.form['b'])
res = {'Status': True}
app.csvwriter.send(arr)
return json.dumps(res)
P.S.: My real-life problem is different: the trigger function is time-consuming and I don't want the user to wait for it to finish executing.
Consider using Celery, which is made for the very problem you're trying to solve. From the docs:
Celery is a simple, flexible, and reliable distributed system to process vast amounts of messages, while providing operations with the tools required to maintain such a system.
I recommend you integrate Celery with your Flask app as described here. Your trigger method would then become a straightforward Celery task that you can execute without having to worry about long response times.
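For illustration only, here is a rough sketch of what that could look like; the module name, the Redis broker URL, and the worker command are assumptions of mine, not part of the linked guide:

# tasks.py - hypothetical Celery setup; the Redis broker URL is an assumption
from celery import Celery
import pandas as pd

celery_app = Celery('tasks', broker='redis://localhost:6379/0')

@celery_app.task
def trigger(arr):
    # the former trigger() function, now executed by a Celery worker
    df = pd.DataFrame({'x': arr})
    df.to_csv("docs/xyz.csv", index=False)

You would start a worker alongside the Flask app (for example, celery -A tasks worker) and call trigger.delay(arr) from the view instead of calling trigger(arr) directly, so the request returns immediately.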
I'm actually working on another interesting case on my side where I pass the work off to a Python worker that sends the job to a Redis queue. There are some great blogs on using Redis with Flask; you basically need to ensure Redis is running (able to connect on port 6379).
The worker would look something like this:
import os

import redis
from rq import Worker, Queue, Connection

listen = ['default']
redis_url = os.getenv('REDISTOGO_URL', 'redis://localhost:6379')
conn = redis.from_url(redis_url)

if __name__ == '__main__':
    with Connection(conn):
        worker = Worker(list(map(Queue, listen)))
        worker.work()
In my example I have a function that queries a database for usage, and since it might be a lengthy process I pass it off to the worker (running as a separate script).
def post(self):
    data = Task.parser.parse_args()
    job = q.enqueue_call(
        func=migrate_usage, args=(my_args),
        result_ttl=5000
    )
    print("Job ID is: {}".format(job.get_id()))
    job_key = job.get_id()

    print(str(Job.fetch(job_key, connection=conn).result))

    if job:
        return {"message": "Job : {} added to queue".format(job_key)}, 201
Credit due to the following article:
https://realpython.com/flask-by-example-implementing-a-redis-task-queue/#install-requirements
You can try using streaming. See the next example:
import time
from flask import Flask, Response

app = Flask(__name__)


@app.route('/')
def main():
    return '''<div>start</div>
    <script>
        var xhr = new XMLHttpRequest();
        xhr.open('GET', '/test', true);
        xhr.onreadystatechange = function(e) {
            var div = document.createElement('div');
            div.innerHTML = '' + this.readyState + ':' + this.responseText;
            document.body.appendChild(div);
        };
        xhr.send();
    </script>
'''


@app.route('/test')
def test():
    def generate():
        app.logger.info('request started')
        for i in range(5):
            time.sleep(1)
            yield str(i)
        app.logger.info('request finished')
        yield ''
    return Response(generate(), mimetype='text/plain')


if __name__ == '__main__':
    app.run('0.0.0.0', 8080, True)
All the magic in this example is in the generator: you can start sending response data, then do some other work, and finally yield empty data to end your stream.
For details, look at http://flask.pocoo.org/docs/patterns/streaming/.
You can defer route-specific actions with limited context by combining after_this_request and response.call_on_close. Note that the request and response contexts won't be available, but the route function's context remains available, so you'll need to copy any request/response data you need into local variables for deferred access.
I moved your array to a local variable to show how the function context is preserved. You could change your CSV write to an append so you're not pushing data endlessly into memory.
from flask import Flask, flash, request, redirect, url_for, session
import flask  # for flask.after_this_request
import json
import pandas as pd

app = Flask(__name__)


@app.route("/test", methods=['GET', 'POST'])
def check():
    arr = []
    arr.append(request.form['a'])
    arr.append(request.form['b'])
    res = {'Status': True}

    @flask.after_this_request
    def add_close_action(response):
        @response.call_on_close
        def process_after_request():
            df = pd.DataFrame({'x': arr})
            df.to_csv("docs/xyz.csv", index=False)
        return response

    return json.dumps(res)
I'm still new to Flask and I am trying to build a Flask app that flashes a status message as and when tasks complete.
@app.route("/", methods=['GET', 'POST'])
def root():
    form = ReusableForm(request.form)
    if request.method == 'POST':
        name = request.form['name']
        if form.validate():
            Reports1 = RunReport1()
            flash('Report 1 run successfully')
            Report2 = RunReport2()
            flash('Report 2 run successfully')
    return render_template('run.html', form=form)
However, there doesn't seem to be a way to flash messages without rendering the HTML page each time. Any assistance would be greatly appreciated. Thanks.
You can do it with Flask without a round trip (re-rendering the page) for every message by streaming the content. For this you have to use stream_with_context for the response.
#app.route("/")
def root():
reports = ["Reports1", "Reports2"]
def generate_output():
for i in range(0, len(reports)):
report = reports[i]
template = '<p>{{ report }}.</p>'
context = {'report': report}
yield render_template_string(template, **context)
time.sleep(5)
return Response(stream_with_context(generate_output()))
So this will keep updating the response until the server side completes its tasks. Normally this is used for long-running processes. Refer to the documentation.
Hope this helps!
I have a problem with Django.
When users submit some data, it goes into view.py for processing, and eventually they reach the success page.
But the processing takes too long, and I don't want users to wait that long. What I want is for users to reach the success page right away after submitting the data, and for the server to process the data after returning the success page.
Can you please tell me how to deal with it?
Here is my code, but I don't know why it didn't work.
url.py
from django.conf.urls import patterns, url
from hebeu.views import handleRequest

urlpatterns = patterns('',
    url(r'^$', handleRequest),
)
view.py
def handleRequest(request):
    if request.method == 'POST':
        response = HttpResponse(parserMsg(request))
        return response
    else:
        return None


def parserMsg(request):
    rawStr = smart_str(request.body)
    msg = paraseMsgXml(ET.fromstring(rawStr))
    queryStr = msg.get('Content')
    openID = msg.get('FromUserName')
    arr = smart_unicode(queryStr).split(' ')

    # start a new thread
    cache_classroom(openID, arr[1], arr[2], arr[3], arr[4]).start()

    return "success"
My English is not good; I hope you can understand.
Take a look at Celery; it is a distributed task queue that will handle your situation perfectly. There is a little bit of setup to get everything working, but once that is out of the way, Celery is really easy to work with.
For integration with Django, start here: http://docs.celeryproject.org/en/latest/django/index.html
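For a rough idea of what that looks like once Celery is wired up for the project (the task module path and the import of cache_classroom are assumptions of mine, not from the linked guide):

# hebeu/tasks.py - hypothetical module; assumes the project's Celery app is
# already configured as described in the linked Django guide
from celery import shared_task

from hebeu.views import cache_classroom  # assumption: wherever cache_classroom is defined


@shared_task
def cache_classroom_task(open_id, arr):
    # the slow work formerly started inside the request now runs in a worker
    cache_classroom(open_id, arr[1], arr[2], arr[3], arr[4]).start()

In parserMsg you would then call cache_classroom_task.delay(openID, arr) and return "success" right away; a separate celery worker process picks the job up.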
Write a management command for parserMsg, trigger it using subprocess.Popen, and return success to the user; the parserMsg processing will then run in the background. If there are more operations of this kind in the application, then you should use Celery.
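A rough sketch of that approach, with the command name, argument handling, and import being placeholders of my own rather than anything from the answer:

# hebeu/management/commands/parsemsg.py - hypothetical command name and layout
from django.core.management.base import BaseCommand

from hebeu.views import cache_classroom  # assumption: wherever cache_classroom is defined


class Command(BaseCommand):
    help = "Run the slow classroom caching outside the request/response cycle"

    def add_arguments(self, parser):
        parser.add_argument("open_id")
        parser.add_argument("values", nargs=4)

    def handle(self, *args, **options):
        v = options["values"]
        cache_classroom(options["open_id"], v[0], v[1], v[2], v[3]).start()

The view would then launch it with something like subprocess.Popen(['python', 'manage.py', 'parsemsg', openID] + arr[1:5]) and return "success" immediately, while the command finishes in its own process.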
This is quite easy: wrap the "# start a new thread" part with the code below.
from threading import Thread
from datetime import datetime


class ProcessThread(Thread):
    def __init__(self, name, open_id, arr):
        Thread.__init__(self)
        self.name = name
        self.open_id = open_id
        self.arr = arr
        self.started = datetime.now()

    def run(self):
        cache_classroom(self.open_id, self.arr[1], self.arr[2],
                        self.arr[3], self.arr[4]).start()

        # I added this so you might know how long the process lasted,
        # just in case any optimization of your code is needed
        finished = datetime.now()
        duration = (finished - self.started).seconds
        print("%s thread started at %s and finished at %s in "
              "%s seconds" % (self.name, self.started, finished, duration))


# let us now start the thread (inside parserMsg, where openID and arr exist)
my_thread = ProcessThread("CacheClassroom", openID, arr)
my_thread.start()