Django Celery: Something wrong with shared_task

I use django-rest-framework and celery.
This is my views.py:
# GET /server/test/<para>/
class Testcelery(APIView):
    def test(self):
        print(celery_test())

    def get(self, request, para, format=None):
        print('test')
        self.test()
        # result = add.delay(4, 4)
        # print(result.id)
        result = OrderedDict()
        result['result'] = 'taskid'
        result['code'] = status.HTTP_200_OK
        result['message'] = 'success'
        return Response(result, status=status.HTTP_200_OK)
This is a simple celery task:
from celery import shared_task

@shared_task()
def celery_test():
    print('celerytest')
    return True
I debugged the Django app. It reaches the test method, but the program gets stuck at the next step, in __call__ in local.py, which is where the error happens. The debugger stops there and shows the following:
[screenshot: debug result]

There are several problems:
Tasks are supposed to be run with whatever.delay(); see http://docs.celeryproject.org/en/latest/userguide/calling.html#basics
I wouldn't call a class Testsomething unless it's a test class.
Make sure the worker is running and initialized correctly, and don't forget to check that the broker is running correctly.
When debugging, take into account that the celery worker is a different process. Your debugger is probably attached only to the process running manage.py. If you run the worker as a command from the IDE, it'll probably be easier to debug.
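Putting those points together: define the task with the @shared_task decorator and queue it from the view with .delay(). A minimal sketch (the module layout and the renamed view class are illustrative, not from the question):
# tasks.py -- a sketch; assumes Celery is already wired into the Django project
from celery import shared_task

@shared_task
def celery_test():
    print('celerytest')
    return True

# views.py -- queue the task instead of calling it synchronously
from collections import OrderedDict
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

from .tasks import celery_test

class CeleryDemo(APIView):
    def get(self, request, para, format=None):
        async_result = celery_test.delay()  # returns an AsyncResult immediately
        result = OrderedDict()
        result['result'] = async_result.id
        result['code'] = status.HTTP_200_OK
        result['message'] = 'success'
        return Response(result, status=status.HTTP_200_OK)
Note that the print('celerytest') output will appear in the worker's console, not in the process running manage.py, which is why a debugger attached to the web process never sees it.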

Related

How to make Celery + Django + Pytest work together

Trying to test an endpoint that has a celery task. Celery tasks don't seem to run in the test.
django==4.1.5
celery==5.2.7
pytest==7.2.1
pytest-django==4.5.2
An endpoint:
def do_some_stuff(blah: Blah) -> Blah:
    res = cool_task.apply_async(
        kwargs={
            'cool_id': int(pk),
            'config': config,
            'name': RESUBMIT.value,
        },
        link=update_status.si(
            cool_id=int(pk),
            new_status="why is this so hard",
        ),
    )
    [...]
A test:
@pytest.mark.django_db
def test_my_thing(django_client: Client) -> None:
    [...]
    response = django_client.post(f"/api/myendpoint/{mything.id}/do_some_stuff/")
It hits the endpoint. Gets a 202 back as expected. But celery doesn't seem to be picking up the task in the test. The update_status method updates the db, and I'm not seeing that happen.
I've tried creating a celery app in the test, creating a worker in the test, changing the test to use the main DB instead of the test DB, and setting override_settings with BROKER_BANDEND='memory'.
I'd like a full working example. It seems kind of basic but it's eluding me. I don't understand what combination of fixtures and overrides I need for this to work.
Seems to work when I actually call the application.
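Not an answer from the thread, but the usual fix is Celery's eager mode, which runs tasks synchronously in-process so their side effects land in the test database. A sketch, assuming the Celery app is configured from Django settings with namespace='CELERY':
# test settings -- a sketch; assumes app.config_from_object('django.conf:settings', namespace='CELERY')
CELERY_TASK_ALWAYS_EAGER = True       # run .delay()/.apply_async() inline
CELERY_TASK_EAGER_PROPAGATES = True   # re-raise task exceptions in the test
Set these in the settings module used by pytest-django rather than in a fixture, since the Celery app may read its configuration only once at startup. Whether link= callbacks also run in eager mode has varied across Celery versions, so verify that part against celery==5.2.7.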


How to send a task to Celery without waiting?

I have the following code in my tasks.py file:
@app.task(bind=True)
def create_car(self, car):
    if car is None:
        return False
    status = subprocess.run(["<some_command_to_run>"])
    return True
It should run the command <some_command_to_run>, but for some reason the website waits for it to finish. I thought the whole point of Celery was that it runs in the background and returns a status. How can I submit this task asynchronously? The desired behaviour: the user asks to create a new car instance, a task is added to the queue, and true is returned immediately, indicating that the car was requested correctly. In the background it runs that command and reports the status (somewhere; not sure yet where). How do I do it?
You just need to call create_car.delay(instance.pk); delay() makes it asynchronous.
The arguments are JSON-encoded, so make sure to pass only a primary key or other JSON-serializable data (a model instance is not).
Be careful, because post_save is not asynchronous either :)
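For example, wiring the task to a post_save signal might look like this (a sketch; the Car model and import paths are hypothetical):
# signals.py -- a sketch; Car and the module paths are hypothetical
from django.db.models.signals import post_save
from django.dispatch import receiver

from .models import Car
from .tasks import create_car

@receiver(post_save, sender=Car)
def queue_create_car(sender, instance, created, **kwargs):
    if created:
        # Pass only the primary key -- it must be JSON-serializable.
        create_car.delay(instance.pk)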

Background tasks in flask

I am writing a web application which does some heavy work. With that in mind, I thought of making those tasks background tasks (non-blocking) so that other requests are not blocked by the previous ones.
I went with daemonizing the thread so that it doesn't exit once the main thread (since I am using threaded=True) is finished. Now if a user sends a request, my code will immediately tell them that their request is in progress; it'll run in the background, and the application is ready to serve other requests.
My current application code looks something like this:
from flask import Flask
from flask import request
from flask import abort  # needed for abort(500) below
import threading

class threadClass:
    def __init__(self):
        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True   # Daemonize thread
        thread.start()         # Start the execution

    def run(self):
        #
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():  # view function restored; the name is illustrative
    try:
        begin = threadClass()
    except:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
I just want it to be able to handle a few concurrent requests (it's not gonna be used in production).
Could I have done this better? Did I miss anything? I was going through Python's multiprocessing docs and found this:
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Can I daemonize a process using multiprocessing? How can I achieve better than what I have with the threading module?
EDIT
I went through the multiprocessing package of Python; it is similar to threading.
from flask import Flask
from flask import request
from flask import abort  # needed for abort(500) below
from multiprocessing import Process

class processClass:
    def __init__(self):
        p = Process(target=self.run, args=())
        p.daemon = True   # Daemonize it
        p.start()         # Start the execution

    def run(self):
        #
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start():  # view function restored; the name is illustrative
    try:
        begin = processClass()
    except:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
Does the above approach look good?
Best practice
The best way to implement background tasks in Flask is with Celery, as explained in this SO post. A good starting point is the official Flask documentation and the Celery documentation.
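For reference, a minimal Celery wiring for the app above might look like this (a sketch; the Redis broker URL is an assumption, and someHeavyFunction stands in for the question's long-running work):
# app.py -- a sketch; the broker URL is an assumption
from flask import Flask
from celery import Celery

app = Flask(__name__)
celery = Celery(app.name, broker='redis://localhost:6379/0')

@celery.task
def heavy_task():
    someHeavyFunction()  # the long-running work from the question

@app.route('/start', methods=['POST'])
def start():
    heavy_task.delay()  # queue the work and return immediately
    return "Task is in progress"
A worker started with celery -A app.celery worker then picks up the queued calls.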
Crazy way: Build your own decorator
As @MrLeeh pointed out in a comment, Miguel Grinberg presented a solution in his PyCon 2016 talk by implementing a decorator. I want to emphasize that I have the highest respect for his solution; he called it a "crazy solution" himself. The code below is a minor adaptation of his solution.
Warning!!!
Don't use this in production! The main reason is that this app has a memory leak by using the global tasks dictionary. Even if you fix the memory leak issue, maintaining this sort of code is hard. If you just want to play around or use this in a private project, read on.
Minimal example
Assume you have a long-running function call in your /foo endpoint. I mock this with a 10 second sleep timer. If you call the endpoint three times, it will take 30 seconds to finish.
Miguel Grinberg's decorator solution is implemented in flask_async. It runs a new thread in a Flask context which is identical to the current Flask context. Each thread is issued a new task_id. The result is saved in a global dictionary tasks[task_id]['result'].
With the decorator in place you only need to decorate the endpoint with @flask_async and the endpoint is asynchronous - just like that!
import threading
import time
import uuid
from functools import wraps

from flask import Flask, current_app, request, abort
from werkzeug.exceptions import HTTPException, InternalServerError

app = Flask(__name__)
tasks = {}


def flask_async(f):
    """
    This decorator transforms a sync route to asynchronous by running it in a background thread.
    """
    @wraps(f)
    def wrapped(*args, **kwargs):
        def task(app, environ):
            # Create a request context similar to that of the original request
            with app.request_context(environ):
                try:
                    # Run the route function and record the response
                    tasks[task_id]['result'] = f(*args, **kwargs)
                except HTTPException as e:
                    tasks[task_id]['result'] = current_app.handle_http_exception(e)
                except Exception as e:
                    # The function raised an exception, so we set a 500 error
                    tasks[task_id]['result'] = InternalServerError()
                    if current_app.debug:
                        # We want to find out if something happened so reraise
                        raise

        # Assign an id to the asynchronous task
        task_id = uuid.uuid4().hex

        # Record the task, and then launch it
        tasks[task_id] = {'task': threading.Thread(
            target=task, args=(current_app._get_current_object(), request.environ))}
        tasks[task_id]['task'].start()

        # Return a 202 response, with an id that the client can use to obtain task status
        return {'TaskId': task_id}, 202

    return wrapped


@app.route('/foo')
@flask_async
def foo():
    time.sleep(10)
    return {'Result': True}


@app.route('/foo/<task_id>', methods=['GET'])
def foo_results(task_id):
    """
    Return results of asynchronous task.
    If this request returns a 202 status code, it means that task hasn't finished yet.
    """
    task = tasks.get(task_id)
    if task is None:
        abort(404)
    if 'result' not in task:
        return {'TaskID': task_id}, 202
    return task['result']


if __name__ == '__main__':
    app.run(debug=True)
However, you need a little trick to get your results. The endpoint /foo will only return the HTTP code 202 and the task id, but not the result. You need another endpoint /foo/<task_id> to get the result. Here is an example for localhost:
import time
import requests

task_ids = [requests.get('http://127.0.0.1:5000/foo').json().get('TaskId')
            for _ in range(2)]
time.sleep(11)
results = [requests.get(f'http://127.0.0.1:5000/foo/{task_id}').json()
           for task_id in task_ids]
# [{'Result': True}, {'Result': True}]

Making an asynchronous task in Flask

I am writing an application in Flask, which works really well except that WSGI is synchronous and blocking. I have one task in particular which calls out to a third-party API, and that task can take several minutes to complete. I would like to make that call (it's actually a series of calls) and let it run while control is returned to Flask.
My view looks like:
@app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
    ...
    data = json.loads(request.data)
    text_list = data.get('text_list')
    final_file = audio_class.render_audio(data=text_list)
    # do stuff
    return Response(
        mimetype='application/json',
        status=200
    )
Now, what I want to do is have the line
final_file = audio_class.render_audio()
run and provide a callback to be executed when the method returns, whilst Flask can continue to process requests. This is the only task which I need Flask to run asynchronously, and I would like some advice on how best to implement this.
I have looked at Twisted and Klein, but I'm not sure whether they are overkill, as maybe threading would suffice. Or maybe Celery is a good choice for this?
I would use Celery to handle the asynchronous task for you. You'll need to install a broker to serve as your task queue (RabbitMQ and Redis are recommended).
app.py:
from flask import Flask
from celery import Celery

broker_url = 'amqp://guest@localhost'        # Broker URL for RabbitMQ task queue
app = Flask(__name__)
celery = Celery(app.name, broker=broker_url)
celery.config_from_object('celeryconfig')    # Your celery configurations in a celeryconfig.py

@celery.task(bind=True)
def some_long_task(self, x, y):
    # Do some long task
    ...

@app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
    ...
    data = json.loads(request.data)
    text_list = data.get('text_list')
    final_file = audio_class.render_audio(data=text_list)
    some_long_task.delay(x, y)  # Call your async task and pass whatever necessary variables
    return Response(
        mimetype='application/json',
        status=200
    )
Run your Flask app, and start another process to run your celery worker:
$ celery -A app.celery worker --loglevel=debug
(On Celery 5 the -A option must come before the worker subcommand; older versions also accepted celery worker -A app.celery.)
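The celeryconfig.py referenced above might look something like this (a sketch; the Redis result backend is an assumption):
# celeryconfig.py -- a sketch; the result backend choice is an assumption
broker_url = 'amqp://guest@localhost'
result_backend = 'redis://localhost:6379/0'  # needed if you want to fetch results later
task_serializer = 'json'
result_serializer = 'json'
accept_content = ['json']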
I would also refer to Miguel Grinberg's write-up for a more in-depth guide to using Celery with Flask.
Threading is another possible solution. Although the Celery-based solution is better for applications at scale, if you are not expecting too much traffic on the endpoint in question, threading is a viable alternative.
This solution is based on Miguel Grinberg's PyCon 2016 Flask at Scale presentation, specifically slide 41 in his slide deck. His code is also available on GitHub for those interested in the original source.
From a user perspective the code works as follows:
You make a call to the endpoint that performs the long-running task.
This endpoint returns 202 Accepted with a link to check on the task status.
Calls to the status link return 202 while the task is still running, and return 200 (and the result) when the task is complete.
To convert an API call to a background task, simply add the @async_api decorator.
Here is a fully contained example:
from flask import Flask, g, abort, current_app, request, url_for
from werkzeug.exceptions import HTTPException, InternalServerError
from flask_restful import Resource, Api
from datetime import datetime
from functools import wraps
import threading
import time
import uuid

tasks = {}

app = Flask(__name__)
api = Api(app)


@app.before_first_request
def before_first_request():
    """Start a background thread that cleans up old tasks."""
    def clean_old_tasks():
        """
        This function cleans up old tasks from our in-memory data structure.
        """
        global tasks
        while True:
            # Only keep tasks that are running or that finished less than 5
            # minutes ago.
            five_min_ago = datetime.timestamp(datetime.utcnow()) - 5 * 60
            tasks = {task_id: task for task_id, task in tasks.items()
                     if 'completion_timestamp' not in task or task['completion_timestamp'] > five_min_ago}
            time.sleep(60)

    if not current_app.config['TESTING']:
        thread = threading.Thread(target=clean_old_tasks)
        thread.start()


def async_api(wrapped_function):
    @wraps(wrapped_function)
    def new_function(*args, **kwargs):
        def task_call(flask_app, environ):
            # Create a request context similar to that of the original request
            # so that the task can have access to flask.g, flask.request, etc.
            with flask_app.request_context(environ):
                try:
                    tasks[task_id]['return_value'] = wrapped_function(*args, **kwargs)
                except HTTPException as e:
                    tasks[task_id]['return_value'] = current_app.handle_http_exception(e)
                except Exception as e:
                    # The function raised an exception, so we set a 500 error
                    tasks[task_id]['return_value'] = InternalServerError()
                    if current_app.debug:
                        # We want to find out if something happened so reraise
                        raise
                finally:
                    # We record the time of the response, to help in garbage
                    # collecting old tasks
                    tasks[task_id]['completion_timestamp'] = datetime.timestamp(datetime.utcnow())
                    # close the database session (if any)

        # Assign an id to the asynchronous task
        task_id = uuid.uuid4().hex

        # Record the task, and then launch it
        tasks[task_id] = {'task_thread': threading.Thread(
            target=task_call, args=(current_app._get_current_object(),
                                    request.environ))}
        tasks[task_id]['task_thread'].start()

        # Return a 202 response, with a link that the client can use to
        # obtain task status
        print(url_for('gettaskstatus', task_id=task_id))
        return 'accepted', 202, {'Location': url_for('gettaskstatus', task_id=task_id)}
    return new_function


class GetTaskStatus(Resource):
    def get(self, task_id):
        """
        Return status about an asynchronous task. If this request returns a 202
        status code, it means that task hasn't finished yet. Else, the response
        from the task is returned.
        """
        task = tasks.get(task_id)
        if task is None:
            abort(404)
        if 'return_value' not in task:
            return '', 202, {'Location': url_for('gettaskstatus', task_id=task_id)}
        return task['return_value']


class CatchAll(Resource):
    @async_api
    def get(self, path=''):
        # perform some intensive processing
        print("starting processing task, path: '%s'" % path)
        time.sleep(10)
        print("completed processing task, path: '%s'" % path)
        return f'The answer is: {path}'


api.add_resource(CatchAll, '/<path:path>', '/')
api.add_resource(GetTaskStatus, '/status/<task_id>')


if __name__ == '__main__':
    app.run(debug=True)
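To exercise the example, a client can follow the Location header and poll until the task finishes. A sketch using requests ('/some/path' is an arbitrary CatchAll route):
# client sketch -- '/some/path' hits the CatchAll resource above
import time
from urllib.parse import urljoin

import requests

resp = requests.get('http://127.0.0.1:5000/some/path')
# The Location header may be relative depending on the Werkzeug version.
status_url = urljoin(resp.url, resp.headers['Location'])

while True:
    status = requests.get(status_url)
    if status.status_code != 202:  # 202 means the task is still running
        break
    time.sleep(1)

print(status.status_code, status.text)  # e.g. 200 and 'The answer is: some/path'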
You can also try using multiprocessing.Process with daemon=True; the process.start() method does not block, and you can return a response/status immediately to the caller while your expensive function executes in the background.
I experienced a similar problem while working with the Falcon framework, and using a daemon process helped.
You'd need to do the following:
from multiprocessing import Process
import time  # needed by my_func below

@app.route('/render/<id>', methods=['POST'])
def render_script(id=None):
    ...
    heavy_process = Process(  # Create a daemonic process with heavy "my_func"
        target=my_func,
        daemon=True
    )
    heavy_process.start()
    return Response(
        mimetype='application/json',
        status=200
    )

# Define some heavy function
def my_func():
    time.sleep(10)
    print("Process finished")
You should get a response immediately, and after 10 seconds you should see a printed message in the console.
NOTE: Keep in mind that daemonic processes are not allowed to spawn any child processes.
Flask 2.0
Flask 2.0 supports async routes now. You can use the httpx library and asyncio coroutines for that. You can change your code a bit, as below:
import asyncio  # needed for asyncio.gather

@app.route('/render/<id>', methods=['POST'])
async def render_script(id=None):
    ...
    data = json.loads(request.data)
    text_list = data.get('text_list')
    final_file = await asyncio.gather(
        audio_class.render_audio(data=text_list),
        do_other_stuff_function()
    )
    # Just make sure the coroutines do not make any blocking calls.
    return Response(
        mimetype='application/json',
        status=200
    )
The above is just pseudocode; check out how asyncio works with Flask 2.0, and for HTTP calls you can use httpx. Also make sure the coroutines only perform I/O-bound work.
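A small self-contained sketch of an async route (assumes Flask is installed with the async extra, i.e. flask[async], plus httpx; the URLs are placeholders):
# async route sketch -- requires flask[async] and httpx
import asyncio

import httpx
from flask import Flask

app = Flask(__name__)

@app.route('/fetch')
async def fetch():
    async with httpx.AsyncClient() as client:
        # Run two I/O-bound requests concurrently.
        r1, r2 = await asyncio.gather(
            client.get('https://example.com/a'),
            client.get('https://example.com/b'),
        )
    return {'a': r1.status_code, 'b': r2.status_code}

if __name__ == '__main__':
    app.run(debug=True)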
If you are using Redis, you can use its Pub/Sub feature to handle background tasks.
See more: https://redis.com/ebook/part-2-core-concepts/chapter-3-commands-in-redis/3-6-publishsubscribe/
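A bare-bones Pub/Sub sketch with redis-py (the channel name 'jobs' is arbitrary): the web process publishes a job, and a separate worker process listens and does the heavy work.
# worker.py -- a sketch using redis-py; 'jobs' is an arbitrary channel name
import redis

r = redis.Redis()
p = r.pubsub()
p.subscribe('jobs')

# In the web process you would call: r.publish('jobs', 'render:42')
for message in p.listen():
    if message['type'] == 'message':
        print('got job:', message['data'])  # do the heavy work here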
