I have a Django + Celery project. One of the Celery tasks does a lot of small HTTP requests using the requests library, while others do lots of talking to the database via the Django ORM. The HTTP-heavy task is already running in its own celery worker using its own Celery queue. I'd like to make the HTTP-heavy worker use eventlet while leaving the rest of the tasks to use the prefork execution pool. How do I do this?
The Celery docs seem to suggest that I gain magical concurrency powers by just running celery ... -P eventlet. However, this SO answer says that I need to use a patched version of the requests library. Which is correct? Additionally, if I have to explicitly patch requests, do I have to put this task in a separate module from the rest of the regular tasks so that these other tasks can continue using the regular version of requests?
TL;DR: use the patched version of the requests library. There is no need for a separate module.
celery -P eventlet gives you concurrency between Celery jobs. The pool may or may not call eventlet.monkey_patch() to make all code compatible, and that behaviour may change in the future. Explicitly using the patched version removes the ambiguity and also documents the intent.
There is no point in separating the concurrent requests from the blocking ones. Your prefork pool can use the concurrent version as well. Eventlet does not poison anything.
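A minimal sketch of what "explicitly using the patched version" could look like, assuming eventlet and requests are installed. The helper names green_requests and fetch_status_codes are hypothetical, and the Celery task decorator is left out so that the snippet stays focused on the patched import. eventlet.import_patched returns a green copy of the module without replacing the regular requests that other tasks import.

```python
from functools import lru_cache


@lru_cache(maxsize=1)
def green_requests():
    """Return a copy of `requests` whose sockets are eventlet-green.

    The import is done lazily and cached, so other code in the same
    module can keep using the regular `requests` import untouched.
    """
    import eventlet  # assumption: eventlet is installed on the HTTP worker

    return eventlet.import_patched("requests")


def fetch_status_codes(urls, concurrency=20):
    """Fetch all URLs concurrently and return {url: status_code}.

    Hypothetical helper: call it from the body of the HTTP-heavy task.
    """
    import eventlet

    requests = green_requests()
    pool = eventlet.GreenPool(size=concurrency)

    def fetch(url):
        # Each call runs in its own green thread; blocking on the
        # socket yields to the other green threads in the pool.
        return url, requests.get(url, timeout=10).status_code

    return dict(pool.imap(fetch, urls))
```

Because the patched copy is only obtained inside these helpers, the task can live in the same module as the ORM-heavy tasks.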
I have a FastAPI API that is run with uvicorn. Now I want to add a queue system, and I think Celery and Flower could be great tools for me, since my API has some endpoints that use a lot of CPU and take several seconds to answer. However, I have a couple of questions about adding Celery:
Does Celery substitute Uvicorn? Do I still need Uvicorn? I cannot find any example on the website that uses uvicorn as well, and when you run Celery it does not seem to need it...
I have read a lot about using Celery to create a queue for FastAPI. However, you can manage a queue in FastAPI without using Celery. Which is better, and why?
Does Celery substitute Uvicorn?
No. Celery is not a replacement for Uvicorn. Uvicorn is meant to run your FastAPI application, Celery will not do that for you.
I have read a lot about using Celery to create a queue for FastAPI. However, you can manage a queue in FastAPI without using Celery. Which is better, and why?
I guess you mean BackgroundTasks here, but that is not a replacement for Celery. FastAPI BackgroundTasks are meant to execute simple tasks, not CPU-bound ones.
Answering the question, ideally, you'd have to start both services: Uvicorn, and Celery. You can see an example on how to do it here.
Not that it matters much here, but I'm one of the Uvicorn maintainers.
My site is written in Django. I need to run some tasks in the background of a container (I am using EC2).
Recently I looked into Celery, but it requires Redis or another queue server to run. That means I cannot use Celery, because I am not allowed to install anything else.
Question: Can I set up Celery standalone? If yes, how? If no, is there an alternative that can be installed standalone?
The answer is - no, you cannot use Celery without a broker (Redis, RabbitMQ, or any other from the list of supported brokers).
I am not aware of a service that does both (queue management AND an execution environment for your tasks). The best services follow the UNIX paradigm - "do one thing, and do it right". The service you describe would have to do two different, non-trivial things, which is probably why no such service exists (at least not in the Python world).
I am fairly new to creating web services in Python. I have created a Flask web service successfully and run it with Gunicorn (as Flask's built-in server is not suitable for production).
This is how I run my app (with 4 worker nodes).
gunicorn --bind 0.0.0.0:5000 My_Web_Service:app -w 4
The problem is, this only handles 4 requests at a time. I want it to be able to handle potentially 1000's of requests concurrently. Should I be using multi-threading? Any other options/suggestions?
According to the section on Workers, you have to switch to an async worker, which can handle thousands of connections if your work is IO bound. Using more processes than CPUs is not recommended.
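As a rough illustration, switching to an async worker could be expressed in a Gunicorn configuration file like the one below. It assumes the gevent package is installed (gevent is only one of the available async worker classes), and the file name gunicorn.conf.py and the connection limit are illustrative choices rather than values taken from the question.

```python
# gunicorn.conf.py (hypothetical file name)
# Start with: gunicorn -c gunicorn.conf.py My_Web_Service:app

# Same address and worker count as the original command line.
bind = "0.0.0.0:5000"
workers = 4

# Use an async worker class instead of the default sync worker.
# Assumption: the gevent package is installed.
worker_class = "gevent"

# Maximum number of simultaneous clients per worker process
# (illustrative value; tune it for the actual workload).
worker_connections = 1000
```

With these values each of the 4 processes may hold up to 1000 open connections, which only helps if the request handlers spend most of their time waiting on IO.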
I'd switch from Flask to FastAPI and combine it with either Async IO or (if it's not possible to find non-blocking versions for all your functions) with a multiprocessing pool (not multithreading, which would be still blocked by the GIL and thus slightly slower).
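A minimal, framework-free sketch of the combination described above, using only the standard library: coroutines handle the requests, and any blocking or CPU-heavy function is pushed to a process pool so that it does not stall the event loop. The names cpu_heavy, handle_request and main are hypothetical stand-ins for real handlers.

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor


def cpu_heavy(n):
    """Stand-in for a blocking, CPU-bound function with no async version."""
    return sum(i * i for i in range(n))


async def handle_request(pool, n):
    """Run the blocking function in a worker process and await its result."""
    loop = asyncio.get_running_loop()
    # run_in_executor returns an awaitable, so the event loop stays free
    # to serve other requests while the worker process computes.
    return await loop.run_in_executor(pool, cpu_heavy, n)


async def main():
    """Handle several 'requests' concurrently on a small process pool."""
    with ProcessPoolExecutor(max_workers=2) as pool:
        jobs = (handle_request(pool, n) for n in (10, 100, 1000))
        return await asyncio.gather(*jobs)
```

To try it, call asyncio.run(main()) from under an if __name__ == "__main__": guard, which the process pool needs on platforms that spawn worker processes.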
Among production servers gunicorn is probably still the best process manager, but since FastAPI needs ASGI, you need to combine it with uvicorn workers.
I use celery, with python 3 and supervisor in Ubuntu.
I've been working to make a new API, which will get an image from the internet using PIL(Pillow) and save it in a server.
However, the problem is that I use Celery as a scheduler. The original API returns its result in about a millisecond, but when I use PIL the wait becomes almost a second.
So as a solution, I am looking for a way to make the Celery worker run in the background.
Is it possible?
What you probably want is to daemonize your Celery worker.
If you follow the steps provided in the Celery running the worker as a daemon documentation you will be able to do that.
It is a somewhat complicated process, but it will allow the Celery worker to run in the background.
I need to run some tasks in background of web app (checking the code out, etc) without blocking the views.
The twist in typical Queue/Celery scenario is that I have to ensure that the tasks will complete, surviving even web app crash or restart until those tasks complete, whatever their final result.
I was thinking about recording the parameters for multiprocessing.Pool in a database and starting all the incomplete tasks at webapp restart. It's doable, but I'm wondering if there's a simpler or more cost-effective approach?
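The idea of recording tasks in a database and re-running the incomplete ones on restart could be sketched roughly as follows. This is only an illustration under simplifying assumptions: it uses sqlite3 and subprocess from the standard library, runs tasks sequentially rather than through multiprocessing.Pool, and the table layout and function names are made up for the example.

```python
import json
import sqlite3
import subprocess


def init_db(conn):
    """Create the (hypothetical) task table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id INTEGER PRIMARY KEY, "
        "argv TEXT NOT NULL, "
        "status TEXT NOT NULL DEFAULT 'pending')"
    )
    conn.commit()


def enqueue(conn, argv):
    """Record a command (list of arguments) before it is executed."""
    cur = conn.execute("INSERT INTO tasks (argv) VALUES (?)", (json.dumps(argv),))
    conn.commit()
    return cur.lastrowid


def run_incomplete(conn):
    """Run every task that never finished; call this at web app startup.

    Tasks still marked 'running' were interrupted by a crash or restart,
    so they are executed again. Returns the number of tasks run.
    """
    rows = conn.execute(
        "SELECT id, argv FROM tasks WHERE status IN ('pending', 'running')"
    ).fetchall()
    for task_id, argv in rows:
        conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (task_id,))
        conn.commit()
        result = subprocess.run(json.loads(argv))
        status = "done" if result.returncode == 0 else "failed"
        conn.execute("UPDATE tasks SET status = ? WHERE id = ?", (status, task_id))
        conn.commit()
    return len(rows)
```

A call such as enqueue(conn, ["git", "clone", url]) would record the work first, so that a crash between recording and completion leaves a row that run_incomplete picks up on the next start. Re-running an interrupted task assumes the command is safe to repeat.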
UPDATE: Why not Celery itself? Well, I used Celery in some projects and it's really a great solution, but for this task it's on the big side: it requires a separate server, communication, etc., while all I need is spawning a few processes/threads, doing some work in them (git clone ..., svn co ...) and checking whether they succeeded or failed. Another issue is that I need the solution to be as small as possible since I have to make it follow elaborate corporate guidelines, procedures, etc., and the human administrative and bureaucratic overhead I'd have to go through to get Celery onboard is something I'd prefer to avoid if I can.
I would suggest using Celery.
Celery does not require its own server; you can have a worker running on the same machine. You can also have a "poor man's queue" using an SQL database instead of a "real" queue/messaging server such as RabbitMQ - this setup would look very much like what you're describing, only with a separate process doing the long-running tasks.
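As a hedged illustration of the "poor man's queue", a Celery configuration module pointing the broker at a local SQLite file might look like this. It assumes the SQLAlchemy transport from Kombu is available in the installed versions (support for database brokers has varied between Celery releases, so this should be checked against the version in use) and that SQLAlchemy is installed. The module name and file names are illustrative.

```python
# celeryconfig.py (hypothetical module name)
# Load it with app.config_from_object("celeryconfig") in the Celery app.

# Assumption: the SQLAlchemy transport is available in this Celery/Kombu
# version. Messages are stored in a local SQLite file, not a queue server.
broker_url = "sqla+sqlite:///celerydb.sqlite"

# Store task states and results in a database as well, so the outcome
# (succeeded or failed) can be checked later.
result_backend = "db+sqlite:///results.sqlite"
```

The worker would then run as a separate process on the same machine as the web app, reading tasks from the SQLite file.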
The problem with starting long-running tasks from the webserver process is that in the production environment the web "workers" are normally managed by the webserver - multiple workers can be spawned or killed at any time. The viability of your approach would highly depend on the web server you're using and its configuration. Also, with multiple workers each trying to do a task you may have some concurrency issues.
Apart from Celery, another option is to look at UWSGI's spooler subsystem, especially if you're already using UWSGI.