My python application uses concurrent.futures.ProcessPoolExecutor with 5 workers and each process makes multiple database queries.
Between the choice of giving each process its own db client, or alternatively , making all process to share a single client, which is considered more safe and conventional?
Short answer: Give each process (that needs it) its own db client.
Long answer: What problem are you trying to solve?
Sharing a DB client between processes basically doesn't happen; you'd have to have the one process which does have the DB client proxy the queries from the others, using more-or-less your own protocol. That can have benefits, if that protocol is specific to your application, but it will add complexity: you'll now have two different kinds of workers in your program, rather than just one kind, plus the protocol between them. You'd want to make sure that the benefits outweigh the additional complexity.
Sharing a DB client between threads is usually possible; you'd have to check the documentation to see which objects and operations are "thread-safe". However, since your application is otherwise CPU-heavy, threading is not suitable, due to Python limitations (the GIL).
At the same time, there's little cost to having a DB client in each process; you will in any case need some sort of client, it might as well be the direct one.
There isn't going to be much more IO, since that's mostly based on the total number of queries and amount of data, regardless of whether that comes from one process or gets spread among several. The only additional IO will be in the login, and that's not much.
If you're running out of connections at the database, you can either tune/upgrade your database for more connections, or use a separate off-the-shelf "connection pooler" to share them; that's likely to be much better than trying to implement a connection pooler from scratch.
More generally, and this applies well beyond this particular question, it's often better to combine several off-the-shelf pieces in a straightforward way, than it is to try to put together a custom complex piece that does the whole thing all at once.
So, what problem are you trying to solve?
It is better to use multithreading or asynchronous approach instead of multiprocessing because it will consume fewer resources. That way you could use a single db connection, but I would recommend creating a separate session for each worker or coroutine to avoid some exceptions or problems with locking.
Related
I'm having a python 3.8+ program using Django and Postgresql which requires multiple threads or processes. I cannot use threads since the GLI will restrict them to a single process which results in an awful performance (especially since most of the threads are CPU bound).
So the obvious solution was to use the multiprocessing module. But I've encountered several problems:
When using spawn to generate new processes, I get the "Apps aren't loaded yet" error when the new process imports the Django models. This is because the new process doesn't have the database connection given to the main process by python manage.py runserver. I circumvented it by using fork instead of spawn (like advised here) so the connections are copied to the other processes but I feel like this is not the best solution and there should be a clean way to start new processes with the necessary connections.
When several of the processes simultaneously access the database, sometimes false results are given back (partly even from wrong models / relations) which crashes the program. This can happen in the initial startup when fetching data but also when the program is running. I tried to use ISOLATION LEVEL SERIALIZABLE (like advised here) by adding it in the options in the database settings but that didn't work.
A possible solution might be using custom locks that are given to every process but that doesn't feel like a good solution as well.
So in general, the question is: Is there a good and clean way to use multiprocessing in Django without these issues? A way that new processes have the database connections without needing to rely on fork and that all processes can just access the database without having any race conditions sometimes producing false results like this?
One important thing: I don't use a Pool since the processes aren't running the same simple task. The processes are each running different specific tasks, share data via multiprocessing Signals, Queues, Values and Namespaces (shared memory) and new processes can be triggered by user interaction (websockets).
I've tried to look into Celery since this has been recommended on a lot of questions about Django and multiprocessing but I wouldn't know how to use something like that in the project structure with the specific different processes that need to be created at specific points and the data that gets transferred over the Queues, Signals, Values and Namespaces in the existing project.
Thank you for reading; any help is appreciated!
With every new process, a setup function calling Django.setup() is first called before executing the real function. My hope was that with this way, every process would create an independent connection to the database so that the current system could work.
Yes - you can do that with initializer,
as explained in my other answer from yesteryear.
However, it still throws errors like django.db.utils.OperationalError: lost synchronization with server: got message type "1", length 976434746
That means you're using the fork start method for subprocesses, and any database connections and their state has been forked into the subprocesses too, and they will be out of sync when used by multiple processes.
You'll need to close them:
def subprocess_setup():
django.setup()
from django.db import connections
for conn in connections.all():
conn.close()
with ProcessPoolExecutor(max_workers=5, initializer=subprocess_setup) as executor:
I'm writing a Python server for a telnet-like protocol. Clients connect and authenticate a session, and then issue a series of commands that each have a response. The sessions have state, in the sense that a user authenticates once and then it's assumed that subsequent commands are performed by that user. The command/response operations in different sessions are effectively independent, although they do involve reads and occasional writes to a shared IO resource (postgres) that is largely capable of managing its own concurrency.
It's a design goal to support a large number of users with a small number of 8 or 16-core servers. I'm looking for a reasonably efficient way to architect the server implementation.
Some options I've considered include:
Using threads for each session; I suspect with the GIL this will make poor use of available cores
Using multiple processes for each session; I suspect that with a high ratio of sessions to servers (1000-to-1, say) the overhead of 1000 python interpreters may exceed memory limitations. You also have a "slow start" problem when a user connects.
Assigning sessions to process pools of 32 or so processes; idle sessions may get assigned to all 32 processes and prevent non-idle sessions from being processed.
Using some type of "routing" system where all sessions are handled by a single process and then individual commands are farmed out to a process pool. This still sounds substantially single-threaded to me (as there's a big single-threaded bottleneck), and this system may introduce substantial overhead if some commands are very trivial but must cross an IPC boundary two times and wait for a free process to get a response.
Use Jython/IronPython and multithreading; lack of C extensions is a concern
Python isn't a good fit for this problem; use Go/C++/Scala/Java either as a router for Python processes or abandon Python completely.
Using threads for each session; I suspect with the GIL this will make poor use of available cores
Is your code actually CPU-bound?* If it spends all its time waiting on I/O, then the GIL doesn't matter at all.** So there's absolutely no reason to use processes, or a GIL-less Python implementation.
Of course if your code is CPU-bound, then you should definitely use processes or a GIL-less implementation. But in that case, you're really only going to be able to efficiently handle N clients at a time with N CPUs, which is a very different problem than the one you're describing. Having 10000 users all fighting to run CPU-bound code on 8 cores is just going to frustrate all of them. The only way to solve that is to only handle, say, 8 or 32 at a time, which means the whole "10000 simultaneous connections" problem doesn't even arise.
So, I'll assume your code I/O-bound and your problem is a sensible and solvable one.
There are other reasons threads can be limiting. In particular, if you want to handle 10000 simultaneous clients, your platform probably can't run 10000 simultaneous threads (or can't switch between them efficiently), so this will not work. But in that case, processes usually won't help either (in fact, on some platforms, they'll just make things a lot worse).
For that, you need to use some kind of asynchronous networking—either a proactor (a small thread pool and I/O completion), or a reactor (a single-threaded event loop around an I/O readiness multiplexer). The Socket Programming HOWTO in the Python docs shows how to do this with select; doing it with more powerful mechanisms is a bit more complicated, and a lot more platform-specific, but not that much harder.
However, there are libraries that make this a lot easier. Python 3.4 comes with asyncio,*** which lets you abstract all the obnoxious details out and just write protocols that talk to transports via coroutines. Under the covers, there's either a reactor or a proactor (and a good one for each platform), without you having to worry about it.
If you can't wait for 3.4 to be finalized, or want to use something that's less-bleeding-edge, there are popular third-party frameworks like Twisted, which have other advantages as well.****
Or, if you prefer to think in a threaded paradigm, you can use a library like gevent, while uses greenlets to fake a bunch of threads on a single socket on top of a reactor.
From your comments, it sounds like you really have two problems:
First, you need to handle 10000 connections that are mostly sitting around doing nothing. The actual scheduling and multiplexing of 10000 connections is itself a major I/O bound if you try to do it with something like select, and as I said about, running 10000 threads or processes is not going to work. So, you need a good proactor or reactor for your platform, which is all described above.
Second, a few of those connections will be alive at a time.
First, for simplicity, let's assume it's all CPU-bound. So you will want processes. In particular, you want a pool of N processes, where N is the number of cores. Which you do by just creating a concurrent.futures.ProcessPoolExecutor() or multiprocessing.Pool().
But you claim they're doing a mix of CPU-bound and I/O-bound work. If all the tasks spend, say, 1/4th of their time burning CPU, use 4N processes instead. There's a bit of wasted overhead in context switching, but you're unlikely to notice it. You can get N as n = multiprocessing.cpu_count(); then use ProcessPoolExecutor(4*n) or Pool(4*n). If they're not that consistent or predictable, you can still almost always pretend they are—measure average CPU time over a bunch of tasks, and use n/avg. You can fudge this up or down depending on whether you're more concerned with maximizing peak performance or typical performance, but it's just one knob to twiddle, and you can just twiddle it empirically.
And that's it.*****
* … and in Python or in C extensions that don't release the GIL. If you're using, e.g., NumPy, it will do much of its slow work without holding the GIL.
** Well, it matters before Python 3.2. But hopefully if you're already using 3.x you can upgrade to 3.2+.
*** There's also asyncore and its friend asynchat, which have been in the stdlib for decades, but you're better off just ignoring them.
**** For example, frameworks like Twisted are chock full of protocol implementations and wrappers and adaptors and so on to tie all kinds of other functionality in without having to write a mess of complicated code yourself.
***** What if it really isn't good enough, and the task switching overhead or the idleness when all of your tasks happen to be I/O-waiting at the same time kills performance? Well, those are both very unlikely except in specific kinds of apps. If it happens, you will need to either break your tasks up to separate out the actual CPU-bound subtasks from the I/O-bound, or write some kind of application-specific adaptive load balancer.
I've got a fairly simple Python program as outlined below:
It has 2 threads plus the main thread. One of the threads collects some data and puts it on a Queue.
The second thread takes stuff off the queue and logs it. Right now it's just printing out the stuff from the queue, but I'm working on adding it to a local MySQL database.
This is a process that needs to run for a long time (at least a few months).
How should I deal with the database connection? Create it in main, then pass it to the logging thread, or create it directly in the logging thread? And how do I handle unexpected situations with the DB connection (interrupted, MySQL server crashes, etc) in a robust manner?
How should I deal with the database connection? Create it in main,
then pass it to the logging thread, or create it directly in the
logging thread?
I would perhaps configure your logging component with the class that creates the connection and let your logging component request it. This is called dependency injection, and makes life easier in terms of testing e.g. you can mock this out later.
If the logging component created the connections itself, then testing the logging component in a standalone fashion would be difficult. By injecting a component that handles these, you can make a mock that returns dummies upon request, or one that provides connection pooling (and so on).
How you handle database issues robustly depends upon what you want to happen. Firstly make your database interactions transactional (and consequently atomic). Now, do you want your logger component to bring your system to a halt whilst it retries a write. Do you want it to buffer writes up and try out-of-band (i.e. on another thread) ? Is it mission critical to write this or can you afford to lose data (e.g. abandon a bad write). I've not provided any specific answers here, since there are so many options depending upon your requirements. The above details a few possible options.
Let's say I have a simple script with two functions below. As callback_func is invoked I would assume it will only run in a singular basis. That i, there won't be two events passing through the code block at the same time. Is that correct? Also, if the callback_func is run in a singular basis, the messaging service itself would have to perform some buffering so no messages are lost, and that buffering is depending on the service originating the event. Is that also correct?
def callback_func(event):
# Can be called anytime
def main_func():
# Sets up a connection to a messaging service
Then what if I add a send_func? If I receive one message but I have three going out, how will send_func deal with a situation when it gets called while sending a message? Is such a situation handled by the Python interpreter?
def send_func(event):
# Can be called anytime
def callback_func(event):
# Can be called anytime
def main_func():
# Sets up a connection to a messaging service
Then lastly, if I change the language to C, how do the answers to my questions above change?
Confusing Two Concepts ( Asynchronous != Concurrent )
Asynchronous does not imply Concurrent, and Concurrent does not imply Asynchronous. These terms get semantically confused by beginners ( and some experts ), but they are different concepts!
You can have one without the other, or both sometimes.
Asynchronous means you don't wait for something, it doesn't imply that it happens while other things do, just that it may happen later..
Concurrent means more than one completely individual thing is happening at the exact same time, these things can be Synchronous while being isolated and concurrent.
Implementation Specific
CPython is single threaded, there is no concern about re-entry. Other Python runtimes allow for concurrency and would need locking mechanisms if those features were used.
C is inherently single threaded as well unless you are specifically starting new threads, then you would need a locking mechanism.
I'd like to add that there are many places that message buffering can happen other than the "service". At a low level, I believe the operating system will buffer incoming and outgoing bytes on a socket.
I currently have a couple of small applications < 500 lines. I am intending to eventually run them on a small LINUX ARM box. Is it better to combine these applications and use threading, or continue to have them as two separate applications?
These applications plus a very small website use a small sqlite database, though only one of the applications write everything else currently does reads. Due to constraints of the target box I am using Python 2.6.
I am using SQLite to prevent data loss over several days of use. There is no direct interaction between the two application though there is the potential for database locking issue especially during period data maintenance. Stopping these issue are a concern also the foot print of the two applications as the target devices are pretty small.
Depends on whether you need them to share data and how involved the sharing is. Other than that, from a speed point of view, for a multiprocessing machine, threading won't give you much of an advantage over separate processes.
If sharing can easily take place via a flat file or database then just let them be separate rather than complicating via threading.
For performance purpose, I will suggest you to use threads, process consumes much more resources than threads, it will be faster to create and need less memory (usefull in embedded environment), but of course, you'll have to deal with the common traps of multithreading programmation (concurent access solved by locks, but locks may lead to interlocking...)
If you plan to use many libraries that make low level calls, maybe with C extension developped that could not release properkly the GIL (Global Interpreter Lock), in this case, processes can be a better solution, to allow your applications to run even when one is blocked by GIL.
If you need to pass data between the two, you could use the Queues and other mechanisms in the multiprocessing module.
It's often much simpler to use multiprocessing rather than sharing memory or objects using the threading module.
If you don't need to pass data between your programs, just run them separately (and release any locks on files or databases as soon as possible to reduce contention).
I have decided to adopt a process rather than a threaded approach to resolving this issue. The primary factor in this decision is simplicity. The second factor is whilst one of these applications will be carrying out data acquisition the other will be communicating with a modem on an ad-hoc basis (receiving calls) I don't have control over the calling application but based on my investigations, there is the potential for a lot to go wrong.
There are a couple of factor which may change the approach further down the line primarily the need for the two processes to interact to prevent data contention on the database. Secondly if resources (memory/disk space/cpu) become an issue (due to the size of the device) one application should give me the ability to manage these problems.
That said the data acquisition application is already threaded. This allows the parent thread to manage the worker when exceptions arise, as the device will not be in a managed environment.