I keep running into weird MySQL issues when workers execute tasks just after they are created.
We use Django 1.3, Celery 3.1.17 and djorm-ext-pool 0.5.
We start the Celery process with concurrency 3.
My observation so far is that when the worker processes start, they all get the same MySQL connection. We log the DB connection id as below.
from django.db import connection
connection.cursor()
logger.info("Task %s processing with db connection %s", str(task_id), str(connection.connection.thread_id()))
When all the workers get tasks, the first one executes successfully, but the other two give weird MySQL errors: either "MySQL server has gone away", or a case where Django raises DoesNotExist even though the objects being queried clearly do exist.
After this error, each worker starts getting its own database connection, after which we don't see any issues.
What is the default behaviour of Celery? Is it designed to share the same database connection? If so, how is the inter-process communication handled?
I would ideally prefer a separate database connection for each worker.
I tried the code mentioned in the link below, which did not work:
Celery Worker Database Connection Pooling
We have also applied the Celery fix suggested here:
https://github.com/celery/celery/issues/2453
For those who downvoted the question, kindly let me know the reason.
Celery is started with the command below:
celery -A myproject worker --loglevel=debug --concurrency=3 -Q testqueue
myproject.py, as part of the master process, was making some queries to the MySQL database before forking the worker processes.
As part of that query flow in the main process, the Django ORM creates a SQLAlchemy connection pool if it does not already exist. The worker processes are then forked.
Celery, as part of its Django fixups, closes existing connections:
def close_database(self, **kwargs):
    if self._close_old_connections:
        return self._close_old_connections()  # Django 1.6
    if not self.db_reuse_max:
        return self._close_database()
    if self._db_recycles >= self.db_reuse_max * 2:
        self._db_recycles = 0
        self._close_database()
    self._db_recycles += 1
In effect, what could be happening is that the SQLAlchemy pool object, holding one unused DB connection, gets copied to the 3 worker processes when they are forked. The 3 separate pools then hold 3 connection objects that all point to the same connection file descriptor.
When the workers execute tasks and ask for a DB connection, they all get that same unused connection from their copy of the SQLAlchemy pool, because it is the only one marked as unused. The fact that all of these connections point to the same file descriptor is what causes the "MySQL server has gone away" errors.
Connections created after that are genuinely new and don't point to the same socket file descriptor.
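To make the shared-descriptor picture concrete, here is a toy stand-alone demonstration of the failure mode; it is not the Celery code path, and the host/credentials are placeholders:

import os
import MySQLdb  # the driver a Django 1.3 setup typically uses; credentials are placeholders

conn = MySQLdb.connect(host="localhost", user="app", passwd="secret", db="mydb")

if os.fork() == 0:
    # Child: reuse the inherited connection object, then exit without closing it.
    cur = conn.cursor()
    cur.execute("SELECT CONNECTION_ID()")
    print("child  sees connection id:", cur.fetchone()[0])
    os._exit(0)

os.wait()  # let the child finish before the parent touches the shared socket
cur = conn.cursor()
cur.execute("SELECT CONNECTION_ID()")
print("parent sees connection id:", cur.fetchone()[0])
# Both lines print the same id: parent and child share one server-side connection
# over one socket. If they queried concurrently instead of one after the other,
# one of them would hit "MySQL server has gone away" style protocol errors.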
Solution:
In the main process, add
from django.db import connection
connection.cursor()
before any other import is done, i.e. before the djorm-ext-pool module is even loaded.
That way all the DB queries in the master process use a connection created by Django outside the pool. When the Celery Django fixup closes that connection, it actually gets closed instead of being returned to the SQLAlchemy pool, leaving the pool with no connections in it at the time it is copied over to the workers on fork. Thereafter, when the workers ask for a DB connection, SQLAlchemy returns a newly created connection to each of them.
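A minimal sketch of what that ordering looks like at the top of the master module; the settings path is an assumption, and everything after the two fix lines stands in for whatever the module already does:

# myproject.py (master process)
import os
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "myproject.settings")  # assumed path

# The actual fix: force Django to open a plain, non-pooled connection before
# djorm-ext-pool is imported, so the master's start-up queries never leave an
# idle connection sitting in the SQLAlchemy pool that later gets forked.
from django.db import connection
connection.cursor()

# ... only now do the remaining imports (djorm-ext-pool, the Celery app, models)
# and the start-up queries that run before the workers are forked ...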
Related
I am working on an application based on Celery/RabbitMQ with Celery tasks writing to Mongo 4.2.2 (using PyMongo 3.9.0). The Mongo database is a replica set, and everything runs in a Kubernetes cluster. Occasionally, writing to Mongo starts to fail for a run of inserts (40 to 300 failures) with the odd ServerSelectionTimeoutError: no primary available for writes, even though no primary election is taking place and all secondaries seem to be available. This is pretty much impossible to reproduce while testing, but it happens randomly in production environments, of course. Outside Celery tasks it does not happen. I also tried having each task use its own MongoClient instance, with the same result.
I am using gevent for Celery workers:
--pool gevent --concurrency 128 --prefetch-multiplier 64 --without-heartbeat --without-gossip --without-mingle -Ofair
The task is basically using MongoClient as a singleton (simplified snippet):
class HttpCallTask(Task):
    def __init__(self):
        self._mongo_client = None

    @property
    def mongo_client(self):
        """Set up a singleton mongo client for this worker."""
        if not self._mongo_client:
            self._mongo_client = new_mongo_client_from_env()
        return self._mongo_client

    def run(self, notification, notification_template=None):
        try:
            notification_id = self.mongo_client[DATABASE][COLLECTION].insert_one(notification_template)
The Mongo client is created from some env variables and uses default options, except for the server selection timeout:
mongo_client = MongoClient(
MONGO_URI,
serverSelectionTimeoutMS=10000
)
Is reusing a MongoClient instance a bad idea? It seems to improve performance a lot, and I have seen the same problem when using a new MongoClient instance in every task.
Is the PyMongo connection pooling configuration relevant for my case?
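For reference, these are the pool-related knobs that MongoClient exposes; the option names are real PyMongo parameters, but the values below are illustrative only, not a known fix for the ServerSelectionTimeoutError:

from pymongo import MongoClient

mongo_client = MongoClient(
    MONGO_URI,                      # taken from env, as in the snippet above
    serverSelectionTimeoutMS=10000,
    maxPoolSize=200,                # default is 100; 128 greenlets can exhaust it
    waitQueueTimeoutMS=5000,        # fail fast instead of waiting forever for a free socket
)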
I am using SQLAlchemy with pandas.to_sql() to copy some data into SQL Server. After the copying is done and engine.dispose() is called, I see the following INFO message in the logs:
[INFO] sqlalchemy.pool.impl.QueuePool: Pool recreating
I was wondering if this message means that even though I dispose of the engine, the connection is still being kept alive. And if so, what would be the safe and correct way to handle it?
The connection is not alive. But you can restart the connection with the help of the Pool object.
This is described in detail in the documentation:
The Engine has logic which can detect disconnection events and refresh the pool automatically.
When the Connection attempts to use a DBAPI connection, and an exception is raised that corresponds to a “disconnect” event, the connection is invalidated. The Connection then calls the Pool.recreate() method, effectively invalidating all connections not currently checked out so that they are replaced with new ones upon next checkout.
Also check out the code example in the link. It is really neat.
If there are connections that were already checked out from the pool, those connections will still be alive, since they are still referenced by something.
You may refer to the following links for detailed information.
https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L2512-L2539
https://docs.sqlalchemy.org/en/13/core/connections.html#engine-disposal
https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.Engine.dispose
If you are using QueuePool (it's the default if you don't specify any pool class when creating the engine object) and don't want any connections to be kept alive, close the connection (conn.close() or session.close()), which in turn returns it to the pool as a checked-in connection. Later, when you call engine.dispose() after your copy job is done, that will take care of really closing the checked-in connections, and none of them will be kept alive.
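A minimal sketch of that lifecycle with pandas; the connection string and table name are placeholders:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mssql+pyodbc://user:pass@my_dsn")  # QueuePool by default

df = pd.DataFrame({"a": [1, 2, 3]})
df.to_sql("my_table", engine, if_exists="append", index=False)
# to_sql checks a connection out of the pool and checks it back in when done;
# the checked-in connection stays open inside the pool at this point.

engine.dispose()  # now the checked-in connections are really closed ("Pool recreating")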
I'm running a Tornado HTTPS server across multiple processes using the first method described at http://www.tornadoweb.org/en/stable/guide/running.html (server.start(n)).
The server is connected to a local MySQL instance, and I would like to have an independent MySQL connection per Tornado process.
However, right now I only have one MySQL connection according to the output of SHOW PROCESSLIST. I guess this happens because I establish the connection before calling server.start(n) and IOLoop.current().start(), right?
What I don't really understand is whether the processes created after calling server.start(n) share some data (for instance, global variables within the same module) or are totally independent.
Should I establish the connection after calling server.start(n), or after calling IOLoop.current().start()? If I do so, will I have one MySQL connection per Tornado process?
Thanks
Each child process gets a copy of the variables that existed in the parent process when start(n) was called. For things like connections, this will usually cause problems. When using multi-process mode, it's important to do as little as possible before starting the child processes, so don't create the mysql connections until after start(n) (but before IOLoop.start(); IOLoop.start() doesn't return until the server is stopped).
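A minimal sketch of that ordering; the handler is trivial and the pymysql call is just a stand-in for whatever per-process connection setup you need:

import pymysql
import tornado.httpserver
import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("ok")

app = tornado.web.Application([(r"/", MainHandler)])
server = tornado.httpserver.HTTPServer(app)  # pass ssl_options=... here for HTTPS
server.bind(8888)
server.start(4)  # forks the worker processes at this point

# This line now runs once in *each* child, so every Tornado process
# gets its own MySQL connection (one row each in SHOW PROCESSLIST).
db = pymysql.connect(host="localhost", user="app", password="secret", database="mydb")

tornado.ioloop.IOLoop.current().start()  # blocks until the server is stopped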
I am working on an online judge. I am using Python 2.7 and MySQL (I am working on the back-end part).
My Method:
I create a main thread which pulls submissions out of the database (10 at a time) and puts them in a queue. Then I have multiple threads that take submissions from the queue, evaluate them, and write the results back to the database.
My doubts:
1. The main thread and the other threads each have their own database connection assigned to them at the beginning. But I guess this is not a good approach, because sometimes I get the error "Lost connection to MySQL server while querying", which I guess happens when the resources of a DB connection are exhausted. Then I looked up psqlpool, so I want to know whether the connections provided by the pool are dedicated or shared (I want dedicated ones; see the per-thread sketch after this list).
2. Also, when I stop my main thread, all the other threads stop (as their daemon flag is set to True), but the DB connections are not closed (since I stop the main thread with Ctrl-Z). So the next time I start my program I run into "Lock wait timeout exceeded; try restarting transaction" errors, which are caused by the previous connections that were never closed. Rather than manually killing them from SHOW FULL PROCESSLIST, is there any other method? Also, how would we solve this in the case of psqlpool, or is that already handled by the library?
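As an illustration of what a dedicated per-thread connection looks like, independent of any pooling library, a thread-local holder works; the driver import and credentials are placeholders:

import threading
import MySQLdb  # or another DB-API driver; credentials below are placeholders

_local = threading.local()

def get_connection():
    """Return a MySQL connection that is private to the calling thread."""
    if not hasattr(_local, "conn"):
        _local.conn = MySQLdb.connect(
            host="localhost", user="judge", passwd="secret", db="judge_db")
    return _local.conn

def worker(queue):
    while True:
        submission = queue.get()
        conn = get_connection()
        cur = conn.cursor()
        # ... evaluate the submission and write the verdict back ...
        conn.commit()
        queue.task_done()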
Problem
Celery workers are hanging on task execution when using a package which accesses a ZEO server. However, if I were to access the server directly within tasks.py, there's no problem at all.
Background
I have a program that reads and writes to a ZODB file. Because I want multiple users to be able to access and modify this database concurrently, I have it managed by a ZEO server, which should make it safe across multiple processes and threads. I define the database within a module of my program:
from ZEO import ClientStorage
from ZODB.DB import DB
addr = 'localhost', 8090
storage = ClientStorage.ClientStorage(addr, wait=False)
db = DB(storage)
SSCCE
I'm obviously attempting more complex operations, but let's assume I only want the keys of the root object, or of one of its children. I can reproduce the problem in this context.
I create dummy_package with the above code in a module, databases.py, and a bare-bones module meant to perform database access:
# main.py
def get_keys(dict_like):
    return dict_like.keys()
If I don't perform any database access inside dummy_package itself, I can import the database and access root without issue:
# tasks.py
from dummy_package import databases
@task()
def simple_task():
    connection = databases.db.open()
    keys = connection.root().keys()
    connection.close(); databases.db.close()
    return keys  # Works perfectly
However, trying to pass a connection or a child of root makes the task hang indefinitely.
@task()
def simple_task():
    connection = databases.db.open()
    root = connection.root()
    ret = main.get_keys(root)  # Hangs indefinitely
    ...
If it makes any difference, these Celery tasks are accessed by Django.
Question
So, first of all, what's going on here? Is there some sort of race condition caused by accessing the ZEO server in this way?
I could make all database access Celery's responsibility, but that would make for ugly code. Furthermore, it would ruin my program's ability to function as a standalone program. Is it not possible to interact with ZEO from within a routine called by a Celery worker?
Do not save an open connection or its root object as a global.
You need a connection per thread; ZEO makes it possible for multiple threads to access the database, but each thread needs its own connection. It sounds like you are sharing something that is not thread-local (e.g. a module-level global in databases.py).
Save the db as a global, but call db.open() during each task. See http://zodb.readthedocs.org/en/latest/api.html#connection-pool
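A sketch of that pattern, reusing the names from the question (the task decorator import is whatever tasks.py already uses):

# tasks.py
from dummy_package import databases   # databases.db stays a module-level global

@task()
def simple_task():
    connection = databases.db.open()   # fresh connection for this task/thread
    try:
        root = connection.root()
        return list(root.keys())
    finally:
        connection.close()             # hand the connection back to the DB's pool
        # note: databases.db itself is left open, unlike in the original snippet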
I don't completely understand what's going on, but I'm thinking the deadlock has something to do with the fact that Celery uses multiprocessing by default for concurrency. Switching over to using Eventlet for tasks that need to access the ZEO server solved my problem.
My process
Start up a worker that uses Eventlet, and one that uses standard multiproccesing.
celery is the name of the default queue (for historical reasons), so have the Eventlet worker handle this queue:
$ celery worker --concurrency=500 --pool=eventlet --loglevel=debug \
-Q celery --hostname eventlet_worker
$ celery worker --loglevel=debug \
-Q multiprocessing_queue --hostname multiprocessing_worker
Route tasks which need standard multiprocessing to the appropriate queue. All others will be routed to the celery queue (Eventlet-managed) by default. (If using Django, this goes in settings.py):
CELERY_ROUTES = {'project.tasks.ex_task': {'queue': 'multiprocessing_queue'}}
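For a one-off call the queue can also be chosen explicitly, which overrides the route above; the task name comes from the settings snippet and the argument is a placeholder:

from project.tasks import ex_task

ex_task.delay("payload")                                # -> multiprocessing_queue via CELERY_ROUTES
ex_task.apply_async(args=("payload",), queue="celery")  # explicit per-call override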