Problem
Celery workers are hanging on task execution when using a package which accesses a ZEO server. However, if I were to access the server directly within tasks.py, there's no problem at all.
Background
I have a program that reads and writes to a ZODB file. Because I want multiple users to be able to access and modify this database concurrently, I have it managed by a ZEO server, which should make it safe across multiple processes and threads. I define the database within a module of my program:
from ZEO import ClientStorage
from ZODB.DB import DB
addr = 'localhost', 8090
storage = ClientStorage.ClientStorage(addr, wait=False)
db = DB(storage)
SSCCE
I'm obviously attempting more complex operations, but let's assume I only want the keys of a root object, or its children. I can produce the problem in this context.
I create dummy_package with the above code in a module, databases.py, and a bare-bones module meant to perform database access:
# main.py
def get_keys(dict_like):
    return dict_like.keys()
If I don't try any database access with dummy_package, I can import the database and access root without issue:
# tasks.py
from dummy_package import databases
@task()
def simple_task():
    connection = databases.db.open()
    keys = connection.root().keys()
    connection.close(); databases.db.close()
    return keys  # Works perfectly
However, trying to pass a connection or a child of root makes the task hang indefinitely.
@task()
def simple_task():
    connection = databases.db.open()
    root = connection.root()
    ret = main.get_keys(root)  # Hangs indefinitely
    ...
If it makes any difference, these Celery tasks are accessed by Django.
Question
So, first of all, what's going on here? Is there some sort of race condition caused by accessing the ZEO server in this way?
I could make all database access Celery's responsibility, but that will make for ugly code. Furthermore, it would ruin my program's ability to function as a standalone program. Is it not possible to interact with ZEO within a routine called by a Celery worker?
Do not save an open connection or its root object as a global.
You need a connection per thread; ZEO makes it possible for multiple threads to access the database, but it sounds like you are sharing something that is not thread-local (e.g. a module-level global in databases.py).
Save the db as a global, but call db.open() during each task. See http://zodb.readthedocs.org/en/latest/api.html#connection-pool
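A hedged sketch of that suggestion, reusing the question's `databases` module and task decorator (the import path for `task` varies by Celery version, and this will not run without the ZEO server from the question): only the `DB` object stays global, and each task opens and closes its own connection.

```python
from celery import task           # import path may differ by Celery version
from dummy_package import databases  # only the DB object is module-level

@task()
def safe_keys():
    connection = databases.db.open()   # a fresh connection per task invocation
    try:
        return list(connection.root().keys())
    finally:
        connection.close()             # hand the connection back to the pool
```

Note that the task closes only the connection, not the `DB` object, so the pool survives across tasks.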
I don't completely understand what's going on, but I'm thinking the deadlock has something to do with the fact that Celery uses multiprocessing by default for concurrency. Switching over to using Eventlet for tasks that need to access the ZEO server solved my problem.
My process
Start up a worker that uses Eventlet, and one that uses standard multiprocessing.
celery is the name of the default queue (for historical reasons), so have the Eventlet worker handle this queue:
$ celery worker --concurrency=500 --pool=eventlet --loglevel=debug \
-Q celery --hostname eventlet_worker
$ celery worker --loglevel=debug \
-Q multiprocessing_queue --hostname multiprocessing_worker
Route tasks which need standard multiprocessing to the appropriate queue. All others will be routed to the celery queue (Eventlet-managed) by default. (If using Django, this goes in settings.py):
CELERY_ROUTES = {'project.tasks.ex_task': {'queue': 'multiprocessing_queue'}}
Related
There is a specific periodic task that needs to be removed from the message queue. I am using Redis and celery here.
tasks.py
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    """
    some operations here
    """
There are other periodic tasks in the project as well, but I need this specific task to stop from now on.
As explained in this answer, will the following code work?
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    pass
In this example the periodic task schedule is defined directly in code, meaning it is hard-coded and cannot be altered dynamically without a code change and an app re-deploy.
The provided code, with the task logic deleted or with a simple return at the beginning, will "work", but it does not really answer the question: the task will still run on schedule, there just is no code left for it to execute.
Also, it is recommended NOT to use @periodic_task, since it is deprecated; its docstring reads:
"""Deprecated decorator, please use :setting:`beat_schedule`."""
First, change the method from @periodic_task to a regular celery @task; and because you are using Django, it is better to go straight for @shared_task:
from celery import shared_task

@shared_task
def task_abcd():
    ...
Now this is just one of your celery tasks, which needs to be called explicitly. Or it can be run periodically if added to the celery beat schedule.
For production, and if using multiple workers, it is not recommended to run the celery worker with embedded beat (-B); run a separate instance of the celery beat scheduler.
The schedule can be specified in celery.py or in the django project settings (settings.py).
This is still not very dynamic, as the app needs to be reloaded to re-read the settings.
Then, use the Database Scheduler (from django-celery-beat), which allows dynamically creating schedules: which tasks need to be run, when, and with what arguments. It even provides nice django admin web views for administration!
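For instance, a hedged settings.py fragment that moves the schedule into the beat_schedule setting (the task path `project.tasks.task_abcd` is assumed from the question's routing example):

```python
from celery.schedules import crontab

# settings.py - replaces the @periodic_task decorator; the worker-side
# function is now a plain @shared_task registered as project.tasks.task_abcd
CELERY_BEAT_SCHEDULE = {
    'task-abcd-every-6-minutes': {
        'task': 'project.tasks.task_abcd',
        'schedule': crontab(minute='*/6'),
    },
}
```

Removing the entry (or switching to the database scheduler) then controls whether the task runs, without touching the task code itself.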
That code will work but I'd go for something that doesn't force you to update your code every time you need to disable/enable the task.
What you could do is use a configurable flag whose value could come from an admin panel, a configuration file, or wherever you want, and have the task return early when it is disabled.
For instance:
@periodic_task(run_every=crontab(minute='*/6'))
def task_abcd():
    config = load_config_for_task_abcd()
    if not config.is_enabled:
        return
    # some operations here
In this way, even if your task is scheduled, its operations won't be executed.
If you simply want to remove the periodic task, have you tried removing the function and then restarting your celery service? You can restart your Redis service as well as your Django server for good measure.
Make sure that the function you removed is not referenced anywhere else.
I run a flexible service for heavy load on Google App Engine Python Flexible Environment. I run PSQ workers to handle tasks through Pub/Sub.
This is all fine and dandy as long as I work with single-threaded workers. On single-threaded workers, if I instantiate a datastore client like so:
from google.cloud import datastore
_client = datastore.Client(project='project-name-kept-private')
... and retrieve an entity:
entity = _client.get(_client.key('EntityKind', 1234))
... it works fine.
However, once I do this exact same thing in a multi-threaded worker, it freezes on the last line:
entity = _client.get(_client.key('EntityKind', 1234))
I know it fails exactly on this line because I use logging.error before and after that specific line like so:
import logging
logging.error('entity test1')
entity = _client.get(_client.key('EntityKind', 1234))
logging.error('entity test2')
The line entity test1 and entity test2 both appear in the logs on a single-threaded worker, but only entity test1 gets printed on a multi-threaded worker. It never finishes the task – it just gets stuck on that line.
Any advice or pointers in the right direction would be of great help. I've been struggling with this issue for quite some time now.
I figured out what the problem was: when the datastore client constructs its API client, it uses gRPC by default. Apparently this freezes if you use multi-threaded workers. By setting GOOGLE_CLOUD_DISABLE_GRPC to True in the environment variables, you force it to use the HTTPDatastoreAPI instead. This 'fixes' my problem.
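A minimal sketch of that workaround using only the stdlib (the commented lines show where the client creation from the question would go; the flag must be set before the library is imported):

```python
import os

# Force the datastore client library to fall back to the HTTP API instead
# of gRPC; this must be set before `google.cloud.datastore` is imported.
os.environ["GOOGLE_CLOUD_DISABLE_GRPC"] = "True"

# from google.cloud import datastore   # import only after the flag is set
# _client = datastore.Client(project='project-name-kept-private')
```

Setting the variable in the worker's environment (e.g. in app.yaml or the process supervisor) achieves the same thing without code changes.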
I keep running into weird MySQL issues when workers execute tasks just after creation.
We use django 1.3, celery 3.1.17, djorm-ext-pool 0.5
We start celery process with concurrency 3.
My observation so far is that when the worker processes start, they all get the same MySQL connection. We log the db connection id as below.
from django.db import connection
connection.cursor()
logger.info("Task %s processing with db connection %s", str(task_id), str(connection.connection.thread_id()))
When all the workers get tasks, the first one executes successfully but the other two give weird MySQL errors. They either fail with "MySQL server has gone away", or Django throws a "DoesNotExist" error, even though the objects that Django is querying clearly do exist.
After this error, each worker starts getting its own database connection after which we don't find any issue.
What is the default behavior of celery? Is it designed to share the same database connection? If so, how is the inter-process communication handled?
I would ideally prefer different database connection for each worker.
I tried the code mentioned in the link below, which did not work:
Celery Worker Database Connection Pooling
We have also applied the celery fix suggested below.
https://github.com/celery/celery/issues/2453
For those who downvote the question, kindly let me know the reason for downvote.
Celery is started with below command
celery -A myproject worker --loglevel=debug --concurrency=3 -Q testqueue
myproject.py, as part of the master process, was making some queries to the mysql database before forking the worker processes.
As part of query flow in main process, django ORM creates a sqlalchemy connection pool if it does not already exist. Worker processes are then created.
Celery as part of django fixups closes existing connections.
def close_database(self, **kwargs):
    if self._close_old_connections:
        return self._close_old_connections()  # Django 1.6
    if not self.db_reuse_max:
        return self._close_database()
    if self._db_recycles >= self.db_reuse_max * 2:
        self._db_recycles = 0
        self._close_database()
    self._db_recycles += 1
In effect, what could be happening is that the sqlalchemy pool object, holding one unused db connection, gets copied to the 3 worker processes on fork. So the 3 different pools have 3 connection objects pointing to the same connection file descriptor.
When the workers ask for a db connection while executing tasks, they all get the same unused connection from their sqlalchemy pool, because it is the one currently unused. All the connections pointing to the same file descriptor is what causes the "MySQL server has gone away" errors.
Connections created after that are all new and don't point to the same socket file descriptor.
Solution:
In the main process add
from django.db import connection
connection.cursor()
before any other import is done, i.e. before even the djorm-ext-pool module is added.
That way all the db queries in the main process will use the connection created by django, outside the pool. When the celery django fixup closes the connection, it actually gets closed, as opposed to going back to the sqlalchemy pool; this leaves the pool with no connections in it at the time it is copied over to the workers on fork. When the workers later ask for a db connection, sqlalchemy returns one of the newly created connections.
preface: I would like to separate these problems into smaller questions, but apparently I am missing some pieces of the puzzle, and it seems impossible to me.
I developed my cherrypy application using cherrypy's built in WSGI server. I naively assumed that when the time comes, I will be able to use created WSGI Application class and deploy it using any WSGI compliant server.
I used this blog post to create my own (but very similar) cherrypy Plugin and Tool to connect to database using SQLAlchemy during http requests.
I expected that any server will somehow work like cherrypy's built in server:
main process will spawn X threads to satisfy X concurrent requests
my engine Plugin will create an SQLAlchemy engine with connection pool = X (so every request will have its own connection)
on request arrival, my Tool will supply sql alchemy connection from pool
This flow does not match with uWSGI (as long as I understand it).
I assign my application.py in uWSGI configuration. This file looks something like this:
import cherrypy

cherrypy.tools.db = DbConnectorTool()
cherrypy.engine.dbengine = DbEnginePlugin(cherrypy.engine, settings.database)
cherrypy.config.update({
    'engine.dbengine.on': True
})

from myapp.application import Application

root = Application(settings)
application = cherrypy.Application(root, script_name='', config=settings)
I was using this application.py to mount my application into cherrypy's built in server when I was developing and testing it.
The problems are that uWSGI does not create any threads itself, and my SQLAlchemy plugin does not work with it, because cherrypy.engine is never started.
Does uWSGI support threading in the sense of using threads to serve multiple concurrent requests? Can I start these threads in my application.py? Will uWSGI understand that and use these threads for concurrent requests? And how can this be done? I think cherrypy can be used somehow, or can't it?
And what about my SQLAlchemy Plugin, how can I start cherrypy.engine when using only WSGI cherrypy.Application?
Any help or information that could help me will be appreciated.
Edit:
My uWSGI configuration:
<uwsgi>
<socket>127.0.0.1:9001</socket>
<master/>
<daemonize>/var/log/uwsgi/app.log</daemonize>
<logdate/>
<threads/>
<pidfile>/home/web/uwsgi.pid</pidfile>
<uid>uwsgi</uid>
<gid>uwsgi</gid>
<workers>2</workers>
<harakiri>90</harakiri>
<harakiri-verbose/>
<home>/home/web/</home>
<pythonpath>/home/web/instance</pythonpath>
<module>core.application</module>
<no-orphans/>
<touch-reload>/home/web/uwsgi-reload-web</touch-reload>
</uwsgi>
uWSGI uses worker processes, not threads. It's worth noting that this means globals are no longer shared between all requests. You can use SharedArea for global data.
The processes are forked by default, so make sure you're ok with that or adjust settings (see Things to know).
Get Cherrypy's WSGI application with cherrypy.tree.mount(root, config=settings) call.
If your DB plugin does not have threading / shared-data issues, chances are it will work. Like you say, you may need cherrypy.engine.start(), but definitely not cherrypy.engine.block(), since your main thread is now a uWSGI worker.
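Putting those pieces together, a hedged sketch of how the question's application.py might look under uWSGI (same names as in the question; whether DbEnginePlugin needs more wiring depends on its implementation):

```python
import cherrypy

cherrypy.tools.db = DbConnectorTool()
cherrypy.engine.dbengine = DbEnginePlugin(cherrypy.engine, settings.database)
cherrypy.config.update({'engine.dbengine.on': True})

from myapp.application import Application

root = Application(settings)

# Start the engine so plugins (including DbEnginePlugin) receive their
# 'start' event, but do NOT call cherrypy.engine.block(): uWSGI owns the
# main loop and this module only has to expose `application`.
cherrypy.engine.start()
application = cherrypy.tree.mount(root, script_name='', config=settings)
```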
You should post your uWSGI config, otherwise it will be hard to understand what is going on.
By the way, to spawn additional threads (per worker) you simply need to add --threads N.
I have a bunch of Django requests which execute some mathematical computations (written in C and executed via a Cython module) which may take an indeterminate amount of time (on the order of 1 second) to execute. Also the requests don't need to access the database and are all independent of each other and of Django.
Right now everything is synchronous (using Gunicorn with sync worker types), but I'd like to make this asynchronous and nonblocking. In short, I'd like to do something like this:
Receive the AJAX request
Allocate task to an available worker ( without blocking the main Django web application )
Worker executes task in some unknown amount of time
Django returns the result of the computation (a list of strings) as JSON whenever the task completes
I am very new to asynchronous Django, and so my question is what is the best stack for doing this.
Is this sort of process something a task queue is well suited for? Would anyone recommend Tornado + Celery + RabbitMQ, or perhaps something else?
Thanks in advance!
Celery would be perfect for this.
Since what you're doing is relatively simple (read: you don't need complex rules about how tasks should be routed), you could probably get away with using the Redis backend, which means you don't need to setup/configure RabbitMQ (which, in my experience, is more difficult).
I use Redis with a dev build of Celery, and here are the relevant bits of my config:
# Use redis as a queue
BROKER_BACKEND = "kombu.transport.pyredis.Transport"
BROKER_HOST = "localhost"
BROKER_PORT = 6379
BROKER_VHOST = "0"
# Store results in redis
CELERY_RESULT_BACKEND = "redis"
REDIS_HOST = "localhost"
REDIS_PORT = 6379
REDIS_DB = "0"
I'm also using django-celery, which makes the integration with Django smooth.
Comment if you need any more specific advice.
Since you are planning to make it async (presumably using something like gevent), you could also consider making a threaded/forked backend web service for the computational work.
The async frontend server could handle all the light work, get data from databases that are suitable for async (redis or mysql with a special driver), etc. When a computation has to be done, the frontend server can post all input data to the backend server and retrieve the result when the backend server is done computing it.
Since the frontend server is async, it will not block while waiting for the results. The advantage of this as opposed to using celery, is that you can return the result to the client as soon as it becomes available.
client browser <> async frontend server <> backend server for computations
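The flow above can be sketched with stdlib pieces only; the executor here is just a stand-in for the separate backend server, and `heavy_computation` is a placeholder for the Cython routine:

```python
from concurrent.futures import ThreadPoolExecutor

def heavy_computation(n):
    # placeholder for the C/Cython math that takes ~1 second
    return [str(i * i) for i in range(n)]

# stand-in for the backend computation server
backend = ThreadPoolExecutor(max_workers=4)

def handle_request(n):
    # the async frontend submits the job and resumes only when the result
    # is ready, without blocking other requests in the meantime
    future = backend.submit(heavy_computation, n)
    return future.result()

print(handle_request(4))  # → ['0', '1', '4', '9']
```

In the real setup the submit/result pair would be an HTTP call from the async frontend to the backend service, with the frontend's event loop free to serve other clients while it waits.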