I am creating a fairly straightforward little app, and am trying to use apscheduler to send a text to a user at a certain time. It works wonderfully locally, but the clock process, as specified in the Procfile, crashes every time. The only information I get is "Process exited with status code 0. State changed from up to crashed." when I run heroku logs -t -p clock.
My Procfile looks as follows:
web: uvicorn remind_me.main:app --host=0.0.0.0 --port=${PORT:-8000}
clock: python remind_me/views/register.py
I also ran the heroku ps:scale clock=1 command.
The code for my "view" is below:
from apscheduler.schedulers.background import BackgroundScheduler
from dateutil.parser import parse

sched = BackgroundScheduler({'apscheduler.timezone': 'EST'})


def scheduled_job(msg, number, carrier):
    send(msg, number, carrier)


@router.post('/')
@template()
async def home(request: Request):
    vm = HomeViewModel(request)
    await vm.load()

    if vm.error:
        return vm.to_dict()

    msg, number, carrier, run_date = vm.task, vm.number, vm.carrier, vm.date_and_time
    run_date = parse(run_date)
    sched.add_job(scheduled_job, args=(msg, number, carrier), trigger='date', run_date=run_date)

    try:
        sched.start()
    except:
        pass

    db_storage.store_events(vm.name, vm.number, vm.carrier, vm.task, vm.date_and_time)
    return fastapi.responses.RedirectResponse(url="/", status_code=status.HTTP_302_FOUND)
I left out some imports related to FastAPI and some database code for brevity. Everything works great locally, and all the pages work perfectly on Heroku; it's just that the clock process crashes every time.
I have also tried this with the blocking scheduler, but have had no luck, as that doesn't return and obviously I need the page to return/redirect.
I am wondering if it has something to do with the async aspect of the view itself, and if I might need to use APScheduler's async scheduler (I can't remember its name offhand). Any advice that you folks could give would be much appreciated. Thanks!
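For reference, the asyncio-based scheduler I had in mind is presumably AsyncIOScheduler, which runs jobs on the same event loop as the async views. A minimal sketch of swapping it in, assuming access to the FastAPI app object and reusing scheduled_job from the snippet above:

from apscheduler.schedulers.asyncio import AsyncIOScheduler

sched = AsyncIOScheduler(timezone='EST')


@app.on_event("startup")
async def start_scheduler():
    # Start the scheduler once the event loop is running; jobs are then
    # added from the view exactly as before with sched.add_job(...).
    sched.start()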
Setup:
The language I am using is Python.
I am running a Flask app with threaded=True; inside the Flask app, when an endpoint is hit, it starts a thread and returns "thread started successfully" with status code 200.
Inside the thread, an re.search call runs, which takes 30-40 seconds (worst case), and once the thread completes it hits a callback URL, completing one request.
Ideally, the Flask app should be able to handle concurrent requests.
Issue:
When the re.search is happening inside the thread, the Flask app does not accept concurrent requests. I am assuming some kind of thread locking is happening and am unable to figure out where.
Question:
Is it OK to do threading (or use multiprocessing) inside a Flask app that has threaded=True?
When a regex search is running, does it hold any thread lock?
Code snippet (hit_callback does a POST request to another API, which is not relevant to this issue):
import threading
from flask import *
import re

app = Flask(__name__)


@app.route('/text')
def temp():
    text = request.json["text"]

    def extract_company(text):
        attrib_list = ["llc", "ltd"]  # real one is 700 in length
        entity_attrib = r"((\s|^)" + r"(.)?(\s|$)|(\s|^)".join(attrib_list) + r"(\s|$))"
        raw_client = re.search("(.*(?:" + entity_attrib + "))", text, re.I)
        hit_callback(raw_client)

    extract_thread = threading.Thread(target=extract_company, args=(text,))
    extract_thread.start()
    return jsonify({"Response": True}), 200


if __name__ == "__main__":
    app.run(debug=True, host='0.0.0.0', port=4557, threaded=True)
Please read up on the GIL - basically, Python can only execute ONE piece of Python code at a time, even in threads. So your re.search runs and blocks all other threads until it runs to completion.
Solution: use Gunicorn or similar to run multiple Flask processes; do not attempt to do everything in one Flask process.
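For instance, a minimal sketch, assuming the module above is saved as app.py with the Flask instance named app and Gunicorn is installed:

# Four worker processes, each a separate interpreter with its own GIL
gunicorn --workers 4 --bind 0.0.0.0:4557 app:app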
To add something: your design is also problematic. Forty seconds or so for an HTTP answer is way too long. A better design would have at least two services: a web server and a search service. The web server would register a search and give back an id. The search service would communicate asynchronously with the web service and return the result + id when the result is ready. Your clients could then poll the web service until they get a result.
I'm doing some load testing using Locust. For what it's worth, I am also using Graphite and Grafana for analytics of the results, but I can produce this issue without loading or using either one in my code.
At its simplest, the issue can be reproduced with the following very simple locustfile:
from locust import HttpLocust, TaskSet, task, between
import locust.events


class Tasks(TaskSet):
    @task
    def make_request(self):
        self.client.get('/')
        print('doing thing')


class Locust(HttpLocust):
    wait_time = between(1, 3)
    task_set = Tasks

    def __init__(self):
        super(Locust, self).__init__()
        locust.events.request_success += self.hook_request_success

    def hook_request_success(self, request_type, name, response_time, response_length):
        # This is where I would make a call to send the request's metadata to Graphite
        print("sending thing")
Called like so:
locust -H <host> -t 10s -c 1000 -r 10 --no-web -f test.py
As you can see, the spec is basic. I have a one-task plan, and I want to perform a single request, sending the result of each request to Graphite. However, in the stdout of all of my runs, I get nearly sixty (60) times more instances of "sending thing" than "doing thing", when I would assume they would be exactly one to one! I've confirmed the function is being called with the same parameters and represents the exact same request, just called numerous times for every "actual" request locust makes. I don't want this at all; I only want to send it once per request.
Why is this? Is there something I'm doing wrong?
You're adding the event listener every time you spawn/instantiate a locust.
Make hook_request_success a module-level function and register it at module level as well, changing the registration to something like this:
locust.events.request_success += hook_request_success
That way it will only be added once, which is what you want.
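A minimal sketch of that change, reusing the names from the question (the hook body itself is unchanged):

from locust import HttpLocust, TaskSet, task, between
import locust.events


def hook_request_success(request_type, name, response_time, response_length):
    # Runs once per completed request; the Graphite call would go here
    print("sending thing")


# Registered once, at import time, instead of once per spawned Locust
locust.events.request_success += hook_request_success


class Tasks(TaskSet):
    @task
    def make_request(self):
        self.client.get('/')
        print('doing thing')


class Locust(HttpLocust):
    wait_time = between(1, 3)
    task_set = Tasks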
I have a web endpoint for users to upload file.
When the endpoint receives the request, I want to run a background job to process the file.
Since the job would take time to complete, I wish to return the job_id to the user to track the status of the request while the job is running in background.
I am wondering if asyncio would help in this case.
import asyncio


@asyncio.coroutine
def process_file(job_id, file_obj):
    ...  # <process the file and dump results in db>


@app.route('/file-upload', methods=['POST'])
def upload_file():
    job_id = uuid()
    process_file(job_id, request.files['file'])  # I want this call to be async, without any await
    return jsonify({'message': 'Request received. Track the status using: ' + job_id})
With the above code, the process_file method is never called, and I am not able to understand why.
I am not sure if this is the right way to do it though, please help if I am missing something.
Flask doesn't support async calls yet.
To create and execute heavy tasks in the background you can use the Celery library: https://flask.palletsprojects.com/en/1.1.x/patterns/celery/
You can use this for reference:
Making an asynchronous task in Flask
Official documentation:
http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#installing-celery
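A minimal sketch of how the upload endpoint from the question could hand the work off to Celery (this assumes a Redis broker at its default URL; the names are illustrative, not from the original post):

import uuid

from celery import Celery
from flask import Flask, request, jsonify

app = Flask(__name__)
# Any supported broker works; Redis on localhost is assumed here
celery = Celery(app.name, broker='redis://localhost:6379/0')


@celery.task
def process_file(job_id, data):
    # <process the file contents and dump results in db>
    pass


@app.route('/file-upload', methods=['POST'])
def upload_file():
    job_id = uuid.uuid4().hex
    # .delay() queues the task on the broker and returns immediately
    process_file.delay(job_id, request.files['file'].read())
    return jsonify({'message': 'Request received. Track the status using: ' + job_id})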
Even though you wrapped the function with @asyncio.coroutine, it is never awaited; without an event loop running it and an await (or an explicit task), calling it only creates a coroutine object and the body never executes.
Asyncio is also not a good fit for this kind of task, because the work is blocking. It is usually used for I/O-bound calls that need to return results quickly.
As @py_dude mentioned, Flask does not support async calls. If you are looking for a library that functions and feels similar to Flask but is asynchronous, I recommend checking out Sanic. Here is some sample code:
from sanic import Sanic
from sanic.response import json

app = Sanic()


@app.route("/")
async def test(request):
    return json({"hello": "world"})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
Updating your database asynchronously shouldn't be an issue; refer to here to find asyncio-supported database drivers. For processing your file, check out aiohttp. You can run your server extremely fast on a single thread without any hiccup if you do so asynchronously.
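As a rough illustration (not from the original post), the upload endpoint from the question could look something like this in Sanic, with the processing scheduled on the event loop so the response returns immediately; the file-access details and names here are assumptions:

import asyncio
import uuid

from sanic import Sanic
from sanic.response import json

app = Sanic()


async def process_file(job_id, body):
    # <process the file contents and dump results in db>
    pass


@app.route("/file-upload", methods=["POST"])
async def upload_file(request):
    job_id = uuid.uuid4().hex
    # Schedule the coroutine on the running loop and return without awaiting it
    asyncio.ensure_future(process_file(job_id, request.files.get("file").body))
    return json({"message": "Request received. Track the status using: " + job_id})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)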
I have an odd problem: when I run my code below in PyCharm or through the console (python script.py), the Flask server takes an extremely long time to boot, meaning that when I try to access it, it shows no content for a good few minutes.
import threading

from flask import render_template, request, logging, Flask, redirect


def setupFlask():
    appn = Flask(__name__)
    log = logging.getLogger('werkzeug')
    log.setLevel(logging.ERROR)

    @appn.route('/')
    def page():
        return render_template('index.html')

    @appn.route('/submit', methods=['POST'])
    def submit():
        token = request.form['ID']
        ID = token
        return redirect('/')

    appn.run()


a = threading.Thread(target=setupFlask)
a.daemon = True
a.start()

while True:
    pass
The odd thing is that when I run the same code above in the PyCharm debugger, the Flask server takes about 5 seconds to boot, massively quicker than the few minutes it takes when run in the console. I would love that kind of speed when running the script normally, and I can't find a solution because the problem fixes itself in the debugger!
This code snippet is part of a larger application, however I have adapted it to be run on its own and the same problem occurs.
I am not running in a virtualenv.
All help appreciated.
EDIT: The index.html document is very basic and only contains a few scripts and elements, so I cannot see it taking a long time to load.
The problem was with your Flask installation, but there is another issue: you should not busy-wait for your thread with a while loop. The better way is to join the thread, like this:
a = threading.Thread(target=setupFlask)
a.daemon = True
a.start()
a.join()
I am writing a web application which will do some heavy work. With that in mind, I thought of making the tasks background tasks (non-blocking) so that other requests are not blocked by the previous ones.
I went with daemonizing the thread so that it doesn't exit once the main thread (since I am using threaded=True) is finished. Now if a user sends a request, my code will immediately tell them that their request is in progress; it will run in the background, and the application is ready to serve other requests.
My current application code looks something like this:
from flask import Flask
from flask import request, abort
import threading


class threadClass:
    def __init__(self):
        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True  # Daemonize thread
        thread.start()        # Start the execution

    def run(self):
        #
        # This might take several minutes to complete
        someHeavyFunction()


app = Flask(__name__)


@app.route('/start', methods=['POST'])
def start():  # handler function (name assumed; the original snippet omitted this line)
    try:
        begin = threadClass()
    except:
        abort(500)
    return "Task is in progress"


def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)


main()
I just want it to be able to handle a few concurrent requests (it's not gonna be used in production)
Could I have done this better? Did I miss anything? I was going through Python's multiprocessing package and found this:
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Can I daemonize a process using multiprocessing? How can I achieve something better than what I have with the threading module?
EDIT
I went through Python's multiprocessing package; it is similar to threading.
from flask import Flask
from flask import request, abort
from multiprocessing import Process


class processClass:
    def __init__(self):
        p = Process(target=self.run, args=())
        p.daemon = True  # Daemonize it
        p.start()        # Start the execution

    def run(self):
        #
        # This might take several minutes to complete
        someHeavyFunction()


app = Flask(__name__)


@app.route('/start', methods=['POST'])
def start():  # handler function (name assumed; the original snippet omitted this line)
    try:
        begin = processClass()
    except:
        abort(500)
    return "Task is in progress"


def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)


main()
Does the above approach look good?
Best practice
The best way to implement background tasks in Flask is with Celery, as explained in this SO post. A good starting point is the official Flask documentation and the Celery documentation.
Crazy way: Build your own decorator
As @MrLeeh pointed out in a comment, Miguel Grinberg presented a solution in his PyCon 2016 talk by implementing a decorator. I want to emphasize that I have the highest respect for his solution; he called it a "crazy solution" himself. The code below is a minor adaptation of his solution.
Warning!!!
Don't use this in production! The main reason is that this app has a memory leak by using the global tasks dictionary. Even if you fix the memory leak issue, maintaining this sort of code is hard. If you just want to play around or use this in a private project, read on.
Minimal example
Assume you have a long-running function call in your /foo endpoint. I mock this with a 10-second sleep timer. If you call the endpoint three times, it will take 30 seconds to finish.
Miguel Grinberg's decorator solution is implemented in flask_async. It runs a new thread in a Flask context which is identical to the current Flask context. Each thread is issued a new task_id. The result is saved in a global dictionary tasks[task_id]['result'].
With the decorator in place you only need to decorate the endpoint with @flask_async and the endpoint is asynchronous - just like that!
import threading
import time
import uuid
from functools import wraps

from flask import Flask, current_app, request, abort
from werkzeug.exceptions import HTTPException, InternalServerError

app = Flask(__name__)
tasks = {}


def flask_async(f):
    """
    This decorator transforms a sync route to asynchronous by running it in a background thread.
    """
    @wraps(f)
    def wrapped(*args, **kwargs):
        def task(app, environ):
            # Create a request context similar to that of the original request
            with app.request_context(environ):
                try:
                    # Run the route function and record the response
                    tasks[task_id]['result'] = f(*args, **kwargs)
                except HTTPException as e:
                    tasks[task_id]['result'] = current_app.handle_http_exception(e)
                except Exception as e:
                    # The function raised an exception, so we set a 500 error
                    tasks[task_id]['result'] = InternalServerError()
                    if current_app.debug:
                        # We want to find out if something happened so reraise
                        raise

        # Assign an id to the asynchronous task
        task_id = uuid.uuid4().hex

        # Record the task, and then launch it
        tasks[task_id] = {'task': threading.Thread(
            target=task, args=(current_app._get_current_object(), request.environ))}
        tasks[task_id]['task'].start()

        # Return a 202 response, with an id that the client can use to obtain task status
        return {'TaskId': task_id}, 202

    return wrapped


@app.route('/foo')
@flask_async
def foo():
    time.sleep(10)
    return {'Result': True}


@app.route('/foo/<task_id>', methods=['GET'])
def foo_results(task_id):
    """
    Return results of asynchronous task.
    If this request returns a 202 status code, it means that task hasn't finished yet.
    """
    task = tasks.get(task_id)
    if task is None:
        abort(404)
    if 'result' not in task:
        return {'TaskID': task_id}, 202
    return task['result']


if __name__ == '__main__':
    app.run(debug=True)
However, you need a little trick to get your results. The endpoint /foo will only return the HTTP code 202 and the task id, but not the result. You need another endpoint /foo/<task_id> to get the result. Here is an example for localhost:
import time

import requests

task_ids = [requests.get('http://127.0.0.1:5000/foo').json().get('TaskId')
            for _ in range(2)]

time.sleep(11)

results = [requests.get(f'http://127.0.0.1:5000/foo/{task_id}').json()
           for task_id in task_ids]
# [{'Result': True}, {'Result': True}]
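As a side note, one rough way to mitigate the memory leak mentioned in the warning above would be to periodically evict finished tasks from the global tasks dictionary in the server module. The sketch below is illustrative only; it assumes an extra 'finished_at' timestamp that the decorator would have to record when it stores a result:

def clean_old_tasks(max_age_seconds=600):
    # Illustrative only: drop finished tasks so the global dict does not grow forever.
    while True:
        time.sleep(60)
        now = time.time()
        for task_id in list(tasks):
            task = tasks[task_id]
            # 'finished_at' is an assumed extra field, not set by the code above
            if 'result' in task and now - task.get('finished_at', now) > max_age_seconds:
                tasks.pop(task_id, None)


threading.Thread(target=clean_old_tasks, daemon=True).start()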