I want to create and use a connection pool for an app I'm making. I'm trying to figure out whether I need to close the connection pool itself when exiting the app. I know how and when to close the connections I check out of the pool, but I can't find an answer as to whether I need to close the pool itself, or how to do that for that matter. I mean the actual connections to SQL, not the connection handles the pool gives out.
Sure thing, no worries! Just grab SQLAlchemy's Connection Pools and go to town.
Really, though, unless you're doing this for exercise and are willing to think through all of the corner cases (and as you can tell from the length of that manual page, they're not few), don't implement connection pooling yourself.
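For reference, here is roughly what that looks like with SQLAlchemy (a minimal sketch; the connection URL and pool sizes are placeholders). Note that engine.dispose() is the piece that answers the "do I close the pool itself" question:

    # Minimal sketch using SQLAlchemy's built-in pooling; swap in your own URL.
    from sqlalchemy import create_engine, text

    # create_engine() builds a QueuePool behind the scenes.
    engine = create_engine(
        "postgresql://user:password@localhost/mydb",
        pool_size=5,       # connections kept open in the pool
        max_overflow=2,    # extra connections allowed under load
    )

    with engine.connect() as conn:      # checks a connection out of the pool
        conn.execute(text("SELECT 1"))  # returned to the pool on exit

    # On application exit, dispose() closes every connection the pool holds;
    # that is the "closing the pool itself" step.
    engine.dispose()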
My Python application uses concurrent.futures.ProcessPoolExecutor with 5 workers, and each process makes multiple database queries.
Between giving each process its own DB client and making all processes share a single client, which is considered safer and more conventional?
Short answer: Give each process (that needs it) its own db client.
Long answer: What problem are you trying to solve?
Sharing a DB client between processes basically doesn't happen; you'd have to have the one process which does have the DB client proxy the queries from the others, using more-or-less your own protocol. That can have benefits, if that protocol is specific to your application, but it will add complexity: you'll now have two different kinds of workers in your program, rather than just one kind, plus the protocol between them. You'd want to make sure that the benefits outweigh the additional complexity.
Sharing a DB client between threads is usually possible; you'd have to check the documentation to see which objects and operations are "thread-safe". However, since your application is otherwise CPU-heavy, threading is not suitable, due to Python limitations (the GIL).
At the same time, there's little cost to having a DB client in each process; you will in any case need some sort of client, and it might as well be the direct one.
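For instance, with ProcessPoolExecutor you can open one connection per worker in the pool's initializer (a sketch assuming mysql-connector-python; the credentials and the users table are placeholders):

    import concurrent.futures
    import mysql.connector

    _conn = None  # module-level, so each worker process gets its own copy

    def _init_worker():
        # Runs once per worker process: open that process's own client.
        global _conn
        _conn = mysql.connector.connect(
            host="localhost", user="app", password="secret", database="mydb"
        )

    def run_query(user_id):
        cur = _conn.cursor()
        cur.execute("SELECT name FROM users WHERE id = %s", (user_id,))
        row = cur.fetchone()
        cur.close()
        return row

    if __name__ == "__main__":
        with concurrent.futures.ProcessPoolExecutor(
            max_workers=5, initializer=_init_worker
        ) as pool:
            print(list(pool.map(run_query, [1, 2, 3])))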
There isn't going to be much more IO, since that's mostly based on the total number of queries and amount of data, regardless of whether that comes from one process or gets spread among several. The only additional IO will be in the login, and that's not much.
If you're running out of connections at the database, you can either tune/upgrade your database for more connections, or use a separate off-the-shelf "connection pooler" to share them; that's likely to be much better than trying to implement a connection pooler from scratch.
More generally, and this applies well beyond this particular question, it's often better to combine several off-the-shelf pieces in a straightforward way than to try to build a complex custom piece that does the whole thing all at once.
So, what problem are you trying to solve?
It is better to use multithreading or an asynchronous approach instead of multiprocessing, because it will consume fewer resources. That way you could get by with a single DB connection, but I would recommend creating a separate session for each worker or coroutine to avoid exceptions and locking problems.
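One way to read "a separate session per worker" in SQLAlchemy terms (a sketch; the connection URL is a placeholder): share one engine and its pool across the app, but give every thread its own Session:

    import threading

    from sqlalchemy import create_engine, text
    from sqlalchemy.orm import sessionmaker

    # One engine (and one underlying connection pool) for the whole app.
    engine = create_engine("postgresql://user:password@localhost/mydb")
    Session = sessionmaker(bind=engine)

    def worker():
        session = Session()  # one Session per thread; never share these
        try:
            session.execute(text("SELECT 1"))
            session.commit()
        finally:
            session.close()

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()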
I'm programming a bit of server code, and the MQTT side of it runs in its own thread using the threading module. That works great with no issues, but now I'm wondering how to proceed.
I have two MariaDB databases, one local and one remote (there is a good, niche reason for this), and I'm writing a class which handles both. This class starts new threads of worker classes that submit data to their respective databases: if certain conditions are true, it starts a new thread to push the data to one database; if they are false, the data goes to the other. The MQTT thread holds an instance of the "database handler" class and passes data to it by calling its methods.
Will this work to allow one thread to concentrate on MQTT tasks while another does the database work? There are other threads as well; I've just never combined databases and threads before, so I'd like an opinion or any information from more seasoned programmers.
Writing code that is "thread safe" can be tricky. I doubt that the Python connector to MySQL is thread-safe; there is very little need for it.
MySQL is quite happy to have multiple connections to it from clients. But they must be separate connections, not the same connection running in separate threads.
Very few projects need multi-threaded access to the database. Do you have a particular need? If so, let's hear about it and discuss the "right" way to do it.
For now, each of your threads that needs to talk to the database should create its own connection. Generally, such a connection can be created soon after starting the thread (or process) and kept open until close to the end of the thread. That is, normally you should have only one connection per thread.
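A common way to arrange that is a lazily created per-thread connection via threading.local (a sketch assuming mysql-connector-python, which also speaks to MariaDB; the readings table and credentials are placeholders):

    import threading

    import mysql.connector

    _local = threading.local()

    def get_connection():
        # Lazily open one connection per thread, then reuse it.
        if not hasattr(_local, "conn"):
            _local.conn = mysql.connector.connect(
                host="localhost", user="app", password="secret", database="mydb"
            )
        return _local.conn

    def push_reading(sensor_id, value):
        conn = get_connection()
        cur = conn.cursor()
        cur.execute(
            "INSERT INTO readings (sensor_id, value) VALUES (%s, %s)",
            (sensor_id, value),
        )
        conn.commit()
        cur.close()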
I'm using Python's multiprocessing package to set up worker threads to do work and update the result to a MySQL database. What's the right way to set this up so that a database connection is not re-established each time a worker thread is initialized?
Each thread must use its own separate connection.
The MySQL protocol is not stateless (unlike, say, HTTP). If you try to share a single MySQL connection among multiple threads, the server gets confused about which request it's responding to, and the client threads get confused because the wrong thread might read a response.
The same is true for any other stateful protocol, FTP for example.
A better way to reduce overhead is to use a connection pool. Each thread requests a connection in the thread's initialization, and the pool manager assigns the thread exclusive use of one of the connections from the pool, until the thread is done with it. Then it returns the connection to the pool, where it will be allocated to another thread requesting a connection.
Even better is to have threads request a connection not at thread initialization, but later, when they actually need to do some database work, and release it when that work is done. If requesting a connection from the pool is very low-overhead, there's no reason to hold on to it for the life of the thread.
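With mysql-connector-python that pattern looks roughly like this (a sketch; the pool sizing, credentials, and results table are placeholders):

    import mysql.connector.pooling

    pool = mysql.connector.pooling.MySQLConnectionPool(
        pool_name="workers",
        pool_size=5,
        host="localhost", user="app", password="secret", database="mydb",
    )

    def save_result(value):
        conn = pool.get_connection()  # borrow only while doing db work
        try:
            cur = conn.cursor()
            cur.execute("INSERT INTO results (value) VALUES (%s)", (value,))
            conn.commit()
            cur.close()
        finally:
            conn.close()  # for a pooled connection this returns it to the
                          # pool rather than tearing down the TCP connection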
It's better to share!
I've built a server listening on a specific port using Python (asyncore and sockets), and I was curious to know whether there is anything I can do when too many people connect to my server at once.
The code itself cannot be changed, but will adding more processes work? Or is this a hardware question, and should I focus on putting a load balancer in front and spreading the requests across multiple servers?
This question is borderline between Stack Overflow (code/Python) and Server Fault (server management). I decided to go with SO because of the code, but if you think Server Fault is better, let me know.
1.
asyncore relies on the operating system for all of the connection handling, so what you are asking is OS-dependent; it has very little to do with Python. Using Twisted instead of asyncore wouldn't solve your problem.
On Windows, for example, you can listen only for 5 connections coming in simultaneously.
So, the first requirement is to run it on a *nix platform.
The rest depends on how long your handlers take and on your bandwidth.
2.
What you can do is combine asyncore and threading to speed up waiting for the next connection.
That is, you can make handlers that run in separate threads. It will be a little messy, but it is one possible solution.
When the server accepts a connection, instead of creating a traditional handler (which would slow down checking for the following connection, because asyncore waits until that handler does at least a little bit of its job), you create a handler that treats reads and writes as non-blocking.
That is, the handler starts a thread to do the job, and only when the data is ready does it send it on a following loop() check.
This way, you allow asyncore.loop() to check the server's socket more often.
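A rough sketch of that idea (asyncore was removed in Python 3.12, so this assumes an older interpreter; the uppercase "work" stands in for real processing):

    import asyncore
    import socket
    import threading

    class ThreadedHandler(asyncore.dispatcher):
        def __init__(self, sock):
            super().__init__(sock)
            self.outbuf = b""
            self.lock = threading.Lock()

        def handle_read(self):
            data = self.recv(4096)
            if data:
                # Do the slow part off the asyncore loop.
                threading.Thread(target=self._work, args=(data,)).start()

        def _work(self, data):
            result = data.upper()  # stand-in for the expensive job
            with self.lock:
                self.outbuf += result

        def writable(self):
            # Only ask for write events once a worker thread produced data,
            # so loop() spends its time polling the listening socket.
            return bool(self.outbuf)

        def handle_write(self):
            with self.lock:
                sent = self.send(self.outbuf)
                self.outbuf = self.outbuf[sent:]

    class Server(asyncore.dispatcher):
        def __init__(self, host, port):
            super().__init__()
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.set_reuse_addr()
            self.bind((host, port))
            self.listen(5)

        def handle_accepted(self, sock, addr):
            ThreadedHandler(sock)

    Server("localhost", 8080)
    asyncore.loop()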
3.
Or you can use two different socket_maps with two different asyncore.loop()s.
You use one map (a dictionary), let's say the default one, asyncore.socket_map, to check the server, and run one asyncore.loop(), let's say in the main thread, for the server only.
And you start the second asyncore.loop() in a thread, using your custom dictionary for the client handlers.
So one loop checks only the server that accepts connections; when a connection arrives, it creates a handler which goes into the separate handler map, checked by the other asyncore.loop() running in a thread.
This way, you do not mix the server's connection checks with client handling, so the server is checked immediately after it accepts a connection, while the other loop balances between clients.
If you are determined to go even faster, you can exploit multiprocessor computers by having more handler maps.
For example, one per CPU, with as many threads running asyncore.loop()s.
Note that socket operations are IO performed through system calls, and select() is one too, so the GIL is released while asyncore.loop() is waiting for results. This means you get the full advantage of multithreading, and each CPU can deal with its share of clients in a practically parallel way.
What you would have to do is make the server distribute the load and start threaded loops as connections arrive.
Don't forget that asyncore.loop() ends when its map empties. So the loop() in a thread that manages clients must be started when a new connection is accepted, and restarted if at some point no connections are left.
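Here is a compact sketch of the two-map layout, including that restart caveat (again assuming a pre-3.12 interpreter, since asyncore has been removed):

    import asyncore
    import socket
    import threading

    client_map = {}  # handlers for accepted clients live here

    class EchoHandler(asyncore.dispatcher_with_send):
        def handle_read(self):
            data = self.recv(4096)
            if data:
                self.send(data)

    class Server(asyncore.dispatcher):
        def __init__(self, host, port):
            super().__init__()  # stays in the default asyncore.socket_map
            self.create_socket(socket.AF_INET, socket.SOCK_STREAM)
            self.set_reuse_addr()
            self.bind((host, port))
            self.listen(5)
            self.client_thread = None

        def handle_accepted(self, sock, addr):
            EchoHandler(sock, map=client_map)
            # loop() returns once its map empties, so (re)start it whenever
            # a connection arrives and no client loop is running.
            if self.client_thread is None or not self.client_thread.is_alive():
                self.client_thread = threading.Thread(
                    target=asyncore.loop, kwargs={"map": client_map}, daemon=True
                )
                self.client_thread.start()

    Server("localhost", 8080)
    asyncore.loop()  # main thread: watches only the listening socket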
4.
If you want to be able to run your server on multiple computers and use them as a cluster, then you install the process balancer in front.
I do not see a serious need for it if you wrote the asyncore server correctly and want to run it on a single computer only.
I'm not sure I understand the use case for DB connection pools (e.g. psycopg2.pool and mysql.connector.pooling) in Python. It seems to me that parallelism is usually achieved in Python using a multi-process rather than a multi-thread approach because of the GIL, and that in the multi-process case these pools are not very useful, since each process will initialize its own pool and will only have a single thread running at a time. Is this correct? Is there any strategy for sharing a DB connection pool when using multiple processes? If not, is the usefulness of pooling limited to multi-threaded Python applications, or are there other scenarios where you would use them?
Keith,
You're on the right track. As mentioned in the S.O. post "Accessing a MySQL connection pool from Python multiprocessing":
Making a separate pool for each process is redundant and opens up way too many connections.
Check out the other S.O. post, "What is the best solution for database connection pooling in python?"; it contains a sample pooling solution in Python. That post also discusses the limitations of DB pooling if your application were to become multi-threaded:
Making your own connection pool is a BAD idea if your app ever decides to start using multi-threading. Making a connection pool for a multi-threaded application is much more complicated than one for a single-threaded application. You can use something like PySQLPool in that case.
In terms of implementing DB pooling in Python, as mentioned in "Application vs Database Resident Connection Pool", if your database supports it, the best implementation would involve:
Letting the connection pool be maintained and managed by the database itself (example: Oracle's DRCP), with calling modules just asking for connections from the connection broker described by Oracle DRCP.
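If your database doesn't offer that, the client-side pools from your question are still the standard answer in the multi-threaded case; the usage looks roughly like this (a sketch with psycopg2; the credentials and query are placeholders):

    import threading

    from psycopg2.pool import ThreadedConnectionPool

    pool = ThreadedConnectionPool(
        minconn=1, maxconn=5,
        host="localhost", user="app", password="secret", dbname="mydb",
    )

    def worker(n):
        conn = pool.getconn()  # exclusive use while checked out
        try:
            with conn.cursor() as cur:
                cur.execute("SELECT %s", (n,))
                cur.fetchone()
        finally:
            pool.putconn(conn)  # hand it back for the next thread

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(10)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    pool.closeall()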
Please let me know if you have any questions!