I'm running a Tornado HTTPS server across multiple processes, using the first method described here: http://www.tornadoweb.org/en/stable/guide/running.html (server.start(n)).
The server is connected to a local MySQL instance, and I would like to have an independent MySQL connection per Tornado process.
However, right now I only have one MySQL connection according to the output of SHOW PROCESSLIST. I guess this happens because I establish the connection before calling server.start(n) and IOLoop.current().start(), right?
What I don't really understand is whether the processes created after calling server.start(n) share some data (for instance, global variables within the same module) or are totally independent.
Should I establish the connection after calling server.start(n)? Or after calling IOLoop.current().start()? If I do so, will I have one MySQL connection per Tornado process?
Thanks
Each child process gets a copy of the variables that existed in the parent process when start(n) was called. For things like connections, this will usually cause problems. When using multi-process mode, it's important to do as little as possible before starting the child processes, so don't create the MySQL connections until after start(n), but before IOLoop.start() (IOLoop.start() doesn't return until the server is stopped).
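A minimal sketch of that ordering, assuming the PyMySQL driver (the handler, port, and credentials are illustrative placeholders):

import tornado.httpserver
import tornado.ioloop
import tornado.web
import pymysql  # assumption: PyMySQL; any DB-API client behaves similarly

db = None  # filled in per child process below

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        with db.cursor() as cursor:
            cursor.execute("SELECT CONNECTION_ID()")
            self.write("served by MySQL connection %s" % cursor.fetchone()[0])

def main():
    global db
    app = tornado.web.Application([(r"/", MainHandler)])
    server = tornado.httpserver.HTTPServer(app)
    server.bind(8888)
    server.start(4)  # forks 4 children; everything below runs in each child

    # Connect *after* the fork so each process gets its own connection,
    # but *before* IOLoop.start(), which blocks until the server stops.
    db = pymysql.connect(host="localhost", user="user",
                         password="secret", database="mydb")

    tornado.ioloop.IOLoop.current().start()

if __name__ == "__main__":
    main()

With this ordering, SHOW PROCESSLIST should show one connection per child process.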
Related
I am using SQLAlchemy with pandas.to_sql() to copy some data into SQL Server. After the copying is done and engine.dispose() is called, I see the following INFO message in the logs:
[INFO] sqlalchemy.pool.impl.QueuePool: Pool recreating
I was wondering if this message means that even though I dispose of the engine, the connection is still being kept live. And if so, what would be the safe and correct way to do it?
The connection is not alive. But you can restart the connection with the help of the Pool object.
This is described in detail in the documentation:
The Engine has logic which can detect disconnection events and refresh the pool automatically.
When the Connection attempts to use a DBAPI connection, and an exception is raised that corresponds to a “disconnect” event, the connection is invalidated. The Connection then calls the Pool.recreate() method, effectively invalidating all connections not currently checked out so that they are replaced with new ones upon next checkout.
Also check out the code example in the link. It is really neat.
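For reference, a hedged sketch of the pessimistic variant of that logic: with pool_pre_ping enabled, the pool tests each connection on checkout and transparently replaces stale ones before they reach your code (the connection string is a placeholder):

from sqlalchemy import create_engine

# Issues a lightweight "ping" on every checkout; stale connections are
# recycled instead of raising a disconnect error mid-query.
engine = create_engine(
    "mssql+pyodbc://user:secret@myserver/mydb?driver=ODBC+Driver+17+for+SQL+Server",
    pool_pre_ping=True,
)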
If there are connections already checked out from the pool, those connections will still be alive, as they are still being referenced by something.
You may refer to the following links for detailed information.
https://github.com/sqlalchemy/sqlalchemy/blob/master/lib/sqlalchemy/engine/base.py#L2512-L2539
https://docs.sqlalchemy.org/en/13/core/connections.html#engine-disposal
https://docs.sqlalchemy.org/en/13/core/connections.html#sqlalchemy.engine.Engine.dispose
If you are using QueuePool (the default if you don't specify a poolclass when creating the engine) and don't want any connections kept alive, close the connection (conn.close() or session.close()), which returns it to the pool as a checked-in connection. Later, when you call engine.dispose() after your copy job is done, it will actually close those checked-in connections, and none will be kept alive.
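A minimal sketch of that flow (the connection string and table name are placeholders):

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "mssql+pyodbc://user:secret@myserver/mydb?driver=ODBC+Driver+17+for+SQL+Server")

df = pd.DataFrame({"a": [1, 2, 3]})
df.to_sql("my_table", engine, if_exists="replace", index=False)

# to_sql checks its connection back into the pool when it finishes, so
# dispose() can really close it; the "Pool recreating" message only means
# the old, now-empty pool was swapped for a fresh one.
engine.dispose()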
If I don't need transactions, can I reuse the same database connection for multiple requests?
Flask documentation says:
Because database connections encapsulate a transaction, we also need to make sure that only one request at the time uses the connection.
Here's how I understand the meaning of the above sentence:
Python DB-API connection can only handle one transaction at a time; to start a new transaction, one must first commit or roll back the previous one. So if each of our requests needs its own transaction, then of course each request needs its own database connection.
Please let me know if I got it wrong.
But let's say I set autocommit mode, and handle each request in a single SQL statement. Or, alternatively, let's say I only read - not write - to the database. In either case, it seems I can just reuse the same database connection for all my requests to save the overhead of multiple connections. But I'm not sure if there's any downside to this approach.
Edit: I can see one issue with what I'm proposing: each request might be handled by a different process. Since connections should probably not be reused across processes, let me clarify my question: I mean creating one connection per process, and using it for all requests that happen to be handled by this process.
On the other hand, the whole point of (green or native) threads is usually to serve one request per thread, so my proposed approach implies sharing connection across threads. It seems one connection can be used concurrently in multiple native threads, but not in multiple green threads.
So let's say for concreteness my environment is flask + gunicorn with multiple multi-threaded sync workers.
Based on @Craig Ringer's comment on a different question, I think I know the answer.
The only possible advantage of connection sharing is performance (other factors, like transaction encapsulation and simplicity, favor a separate connection per request). And since a connection can't be shared across processes or green threads, it only has a chance with native threads. But psycopg2 (and presumably other drivers) doesn't allow concurrent queries on the same connection. So unless each request spends very little time talking to the database, connection sharing will likely hurt performance rather than help it.
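For completeness, a sketch of the connection-per-native-thread alternative under the assumptions above (flask + gunicorn threaded sync workers; the DSN is a placeholder):

import threading
import psycopg2

_local = threading.local()

def get_conn():
    # One connection per worker thread: never shared, so concurrent
    # requests never contend for the same connection.
    if not hasattr(_local, "conn"):
        _local.conn = psycopg2.connect("dbname=mydb user=me password=secret")
    return _local.conn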
I'm using Python and writing something that connects to a remote object using Pyro4.
When running some unit tests (using pyunit) that repeatedly connect to a remote object with Pyro, I found I couldn't run more than 9 tests; beyond that, the tests would get stuck and just hang.
I've now managed to fix this by using
with Pyro4.Proxy("PYRONAME:name") as pyroObject:
    # do something with pyroObject...
whereas before I was creating the object in the test setup:
def setUp(self):
    self.pyroObject = Pyro4.Proxy("PYRONAME:name")
and then using self.pyroObject within the tests
Does anyone know why this has fixed the issue? Thanks
When you don't clean up the proxy objects, they keep a connection to the Pyro daemon open. By default the daemon accepts 16 concurrent connections.
If you use the with ... as ... syntax, you close the proxy cleanly when you're done with it, which releases a connection in the daemon and makes it available for a new proxy.
You can raise that limit of 16 by increasing Pyro's threadpool size via the config. Alternatively, you could use the multiplex server type instead of the default threaded one.
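A sketch of both fixes under those assumptions (the object name is a placeholder):

import unittest
import Pyro4

class MyTest(unittest.TestCase):
    def setUp(self):
        self.pyroObject = Pyro4.Proxy("PYRONAME:name")

    def tearDown(self):
        # Release the daemon connection explicitly, mirroring what the
        # `with` statement does automatically.
        self.pyroObject._pyroRelease()

# Server side, set before creating the daemon:
Pyro4.config.THREADPOOL_SIZE = 64        # allow more concurrent proxies
# Pyro4.config.SERVERTYPE = "multiplex"  # or avoid the thread pool entirely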
I'm from a PHP/Apache background. With a PHP/Apache setup, the PHP interpreter is loaded by an Apache module, and on every page request a new worker is created that executes the entire script. I've been working with WSGI applications in Python recently, and it seems that Apache (mod_wsgi) loads the entire application and keeps it alive, executing pieces of code based on incoming requests. If my understanding is correct, would that explain why __del__ never executes on some of my objects?
Edit:
In my case I have a wrapper class around the Python MySQL module. There's only one instance used in the entire application. Its responsibilities are to run queries and to reuse the MySQL connection throughout the code where possible, but I have to make sure the connection is closed when everything is processed. In some parts I'm using multiprocessing, where I tell the object not to reuse the connection for spawned child processes. Initially I thought I could use __del__ to close the MySQL connection, but noticed it would never be called. I ended up using Flask's teardown functions to make sure the connection is closed after each request, but I was wondering if there are other options to handle this nicely.
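For reference, a minimal sketch of that teardown approach, assuming the PyMySQL driver (credentials are placeholders; the original wrapper class is not shown):

from flask import Flask, g
import pymysql

app = Flask(__name__)

def get_db():
    # Lazily open one connection per request context.
    if "db" not in g:
        g.db = pymysql.connect(host="localhost", user="user",
                               password="secret", database="mydb")
    return g.db

@app.teardown_appcontext
def close_db(exc):
    # Runs after every request, even when an exception was raised,
    # which is why it is more reliable than __del__ here.
    db = g.pop("db", None)
    if db is not None:
        db.close()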
I am working on an online judge. I am using Python 2.7 and MySQL (as I am working on the back-end part).
My Method:
I create a main thread which pulls submissions from the database (10 at a time) and puts them in a queue. Then I have multiple threads that take submissions from the queue, evaluate them, and write the results back to the database.
Now I have some doubts (I know they are about different topics, but guidance on the approach to any of them is highly appreciated).
Currently, when I start the threads, I give each its own db connection, which it uses. Is it good practice to give one connection per thread? Does sharing connections between threads create problems? How do I go about this?
My main thread uses a single connection, as its only work is to pull submissions from the db and put them in the queue (and also update their status in the db to "Assessing Submission"). But sometimes I get the error: Lost connection to MySQL server while querying. I keep getting it even when I stop the program and start it again. What do I do about it? Also, should I implement a pool of connections just for the main thread?
Also, does a db connection stay alive forever? What do I do when its session memory etc. gets exhausted; how do I handle that?
Use a connection pool. Sharing a database connection is not always bad, but you have to be careful about it. You can try SQLAlchemy to manage a lot of this for you: http://docs.sqlalchemy.org/en/rel_0_8/orm/session.html#unitofwork-contextual
The server might be out of connections, or your connection might have been killed because it used too many resources, etc. A connection pool could help you solve this.
It all depends; in theory a connection could stay alive indefinitely, but usually there is a timeout somewhere.
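A minimal sketch of the pooled approach, assuming SQLAlchemy with the PyMySQL driver and a placeholder schema; pool_recycle guards against MySQL's wait_timeout killing idle connections, one likely cause of the "Lost connection" error above:

from sqlalchemy import create_engine, text

engine = create_engine(
    "mysql+pymysql://user:secret@localhost/judge",
    pool_size=10,        # roughly one pooled connection per worker thread
    pool_recycle=3600,   # refresh connections before the server times them out
)

def worker():
    # Each checkout hands the calling thread its own connection;
    # it goes back to the pool when the block exits.
    with engine.connect() as conn:
        rows = conn.execute(
            text("SELECT id FROM submissions WHERE status = 'pending' LIMIT 10"))
        for row in rows:
            ...  # evaluate the submission and write the result back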
If you give the same connection to every thread, the threads' queries will interfere with one another (a MySQL connection is not safe for concurrent use), and race conditions will occur. So you do need a separate connection per thread, and it is indeed a good idea. Use a connection pool for this purpose; it will hand out distinct connections.
A connection pool will surely help.
Release the connection once your work is done. There is a limit on how long a connection lives, termed the connection timeout, so you may need a third-party library to handle that; c3p0 is a good library that can help you with this.
Please refer to the link below to configure it:
Best configuration of c3p0