I have two python programs that are supposed to run in parallel and do the same thing:
Read and unzip data from disk (takes about 1 min)
Process data (takes about 2-3 min)
Send data to database (takes about 3-5 min)
As you can see, it would be nice to have the execution of both instances synchronized in a way that one does the processor-heavy steps 1 and 2 (the implementation is multithreaded, so the CPU can actually be maxed out) while the other does the I/O-heavy step 3 and vice versa.
My first idea was to use a lockfile, which is acquired by each process upon entering phase 3 and released after completing it. So the other process will wait until the lock is released and then set it when it enters phase 3. However, this seems like a very cumbersome way to do it. Also, the system is supposed to run unsupervised for days and weeks with the ability to recover from errors, scheduled reboots or power failures. Especially in the last case, the lockfile could simply lock up everything.
Is there a more elegant way to communicate the lockout between the two processes? Or should I rather use the lockfile and try to implement some smart cleanup functionality to keep a deadlock from happening?
It seems that every solution has some drawbacks: either some mechanism or module is not available on all platforms (e.g. Linux only or Windows only), or you may run into error recovery issues with a file-system based approach (as you have already pointed out in your question).
Here is a list of some possible options:
Use Python's multiprocessing module
This allows you to create a lock like this:
lock = multiprocessing.Lock()
and to acquire and release it like this:
lock.acquire()
# do something
lock.release()
Here is a complete example.
Pro: Straightforward to use; cross-platform; no issues with error recovery.
Con: Since you currently have two separate programs, you will have to rearrange your code to start two processes from the same python module.
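As a minimal sketch of this option, assuming both pipelines are started from one module: phases 1 and 2 run freely in each process, and only phase 3 is wrapped in the shared lock. The names run_pipeline, db_lock and report are hypothetical, and the phase bodies are placeholders rather than real work.

```python
import multiprocessing
import time


def run_pipeline(name, db_lock, report=print):
    """Run one pass of the three phases; only phase 3 is serialized."""
    report((name, "read"))            # phase 1: read and unzip (placeholder)
    report((name, "process"))         # phase 2: CPU-heavy processing (placeholder)
    with db_lock:                     # phase 3: blocks while the other process uploads
        report((name, "upload start"))
        time.sleep(0.01)              # stand-in for the database upload
        report((name, "upload done"))
    # leaving the "with" block releases the lock, even if the upload raised


def main():
    db_lock = multiprocessing.Lock()  # created once, handed to both processes
    workers = [
        multiprocessing.Process(target=run_pipeline, args=(name, db_lock))
        for name in ("A", "B")
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
```

To try it, call main() from under an `if __name__ == "__main__":` guard, which multiprocessing needs on platforms that start child processes by re-importing the module.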
Use fcntl (Linux)
For Linux/Unix systems, there is fcntl (with fcntl.flock()) available as a python module. This is based on lockfiles.
See also this discussion with some recommendations that I am repeating here:
Write the process ID of the locked process to the file for being able to recognize and fix possible deadlocks.
Put your lock files in a temporary location or a RAM file system.
Con: Not cross-platform, available on Linux/Unix systems only.
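A rough sketch of the fcntl.flock() approach, for Linux/Unix only. The helper name phase_lock and the lock file path are assumptions made for illustration. An flock() lock belongs to the open file, so the kernel drops it when the holding process exits or the machine reboots; a leftover lock file on disk does not by itself block the next run. The PID is written only as a debugging aid, as recommended above.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def phase_lock(path, blocking=True):
    """Hold an exclusive flock() on `path` for the duration of the with-block."""
    fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        fcntl.flock(fd, flags)        # raises BlockingIOError if non-blocking and busy
        os.ftruncate(fd, 0)
        os.write(fd, str(os.getpid()).encode())  # record the holder for debugging
        try:
            yield fd
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Each program would wrap its phase 3 in `with phase_lock("/tmp/phase3.lock"):` (the path is a placeholder).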
Use posix_ipc (Linux)
For Linux/Unix systems, there is posix_ipc (with a Semaphore class) available as a python module.
Pro: Not file-system based, no issues with error recovery.
Con: Not cross-platform, available on Linux/Unix systems only.
Use msvcrt (Windows)
For Windows systems, there is msvcrt (with msvcrt.locking()) available as a python module.
See also this discussion.
Con: Not cross-platform, available on Windows systems only.
Use a third-party library
You might want to check out the following python libraries:
ilock
portalocker
filelock
If you are running into synchronization problems, in my opinion there is no better way than using semaphores. How you handle the cleanup and the locking depends a lot on your problem. There are a lot of resources for this kind of issue. Python already implements some of these primitives.
You can check this post for an example.
Also check out ZooKeeper. I have never used it from Python, but it's widely used in other languages.
I'm creating a website using Flask. My WSGI server, Gunicorn, spawns multiple processes.
I have some cross-process objects (notably files) that I want to constrain access to within these processes, and raise events when they are modified.
The choice is normally to use system-wide mutexes/semaphores and events.
However, I can't find a portable (Windows/Mac/Linux) solution for these on Python.
The multiprocessing module (see this question), as far as I can tell, only works for processes spawned by the multiprocessing module itself, which these are not.
There are POSIX semaphores also, but these only work on Linux.
Does anyone know of a more general solution?
I have been researching this for a while, and the closest I could find is the python file-locking library fasteners:
It works quite well on all platforms. The problem is that it only implements a system-wide mutex, not semaphore-like counting. I have implemented my own counting in a locked file with an integer counter and active waiting, but this is still fragile and will leave the system in a bad state if one of the processes crashes and doesn't update the count properly.
I'm writing a program to run svn up in parallel and it is causing the machine to freeze. The server is not experiencing any load issues when this happens.
The commands are run using ThreadPool.map() onto subprocess.Popen():
def cmd2args(cmd):
    if isinstance(cmd, basestring):
        return cmd if sys.platform == 'win32' else shlex.split(cmd)
    return cmd

def logrun(cmd):
    popen = subprocess.Popen(cmd2args(cmd),
                             stdout=subprocess.PIPE,
                             stderr=subprocess.STDOUT,
                             cwd=curdir,
                             shell=sys.platform == 'win32')
    for line in iter(popen.stdout.readline, ""):
        sys.stdout.write(line)
        sys.stdout.flush()
...
pool = multiprocessing.pool.ThreadPool(argv.jobcount)
pool.map(logrun, _commands)
argv.jobcount is the lesser of multiprocessing.cpu_count() and the number of jobs to run (in this case it is 4). _commands is a list of strings with the commands listed below. shell is set to True on Windows so the shell can find the executables since Windows doesn't have a which command and finding an executable is a bit more complex on Windows (the commands used to be of the form cd directory&&svn up .. which also requires shell=True but that is now done with the cwd parameter instead).
the commands that are being run are
svn up w:/srv/lib/dktabular
svn up w:/srv/lib/dkmath
svn up w:/srv/lib/dkforms
svn up w:/srv/lib/dkorm
where each folder is a separate project/repository, but existing on the same Subversion server. The svn executable is the one packaged with TortoiseSVN 1.8.8 (build 25755 - 64 Bit). The code is up-to-date (i.e. svn up is a no-op).
When the client freezes, the memory bar in Task Manager first goes blank:
and sometimes everything goes dark
If I wait for a while (several minutes) the machine eventually comes back.
Q1: Is it copacetic to invoke svn in parallel?
Q2: Are there any issues with how I'm using ThreadPool.map() and subprocess.Popen()?
Q3: Are there any tools/strategies for debugging these kinds of issues?
I will do the best that I can to answer all three questions thoroughly, and I welcome corrections to my statements.
Q1: Is it copacetic to invoke svn in parallel?
Copacetic, that is up for determination, but I would say that it's neither recommended nor unrecommended. With that statement, source control tools have specific functionality that requires process and block-level (best guess) locking. The checksumming, file transfers, and file reads/writes require locking in order to process correctly or you risk both duplicating effort and file contention, which will lead to process failures.
Q2: Are there any issues with how I'm using ThreadPool.map() and subprocess.Popen()?
I don't know the finer details of subprocess.Popen(), as I last used it in Python 2.6, but I can speak about the overall design. What you are doing in your code is creating a pool for one specific subprocess, instead of calling the processes directly. As far as I understand it, ThreadPool() does not perform any locking by default. This may cause issues with subprocess.Popen(); I'm not sure. As in my answer to Q1, locking is something that will need to be implemented. I would recommend looking at https://stackoverflow.com/a/3044626/2666240 for a better understanding of the differences between threading and pooling, as I would recommend using threading instead of multiprocessing.
Because source control applications require locking, if you are going to parallelise operations while handling locking, you will also need to synchronise the threads so that work is not duplicated. I ran a test a few months back on Linux with multiprocessing, and I noticed that grep was repeating the global search. I'll see if I can find the code I wrote and paste it. With thread synchronisation, I would hope that Python could pass the svn thread status between threads in a way that svn understands, so that work is not duplicated. That said, I don't know how svn works under the hood in that respect, so I am only making a best guess.
As svn is likely using a fairly complicated locking method (I would guess block-level locking rather than inode locking), it would likely make sense to implement semaphore locking instead of Lock() or RLock(). You will have to test various locking and synchronisation methods to figure out which works best for svn. This is a good resource on thread synchronisation: http://effbot.org/zone/thread-synchronization.htm
Q3: Are there any tools/strategies for debugging these kinds of issues?
Sure, threading and multiprocessing should both have logging functionality that you can utilise in conjunction with logging. I would just log to a file so that you can have something to reference instead of just console output. You should, in theory, be able to just use logging.debug(pool.map(logrun, _commands)) and that would log the processes taken. That being said, I'm not a logging expert with threading or multiprocessing, so someone else can likely answer that better than I.
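One way to get a usable trace, sketched here for Python 3 (the code in the question is Python 2), is to log from inside each worker and include the thread name in the log format. The logger name, the log file name and the helper names configure and run_all are assumptions for illustration; logrun is a simplified stand-in for the function in the question and expects each command as an argument list.

```python
import logging
import subprocess
from multiprocessing.pool import ThreadPool

log = logging.getLogger("parallel_run")


def configure(logfile="parallel_run.log"):
    """Send debug output to a file, tagged with time and worker thread name."""
    logging.basicConfig(
        filename=logfile,
        level=logging.DEBUG,
        format="%(asctime)s %(threadName)s %(message)s",
    )


def logrun(args):
    """Run one command and log its start and exit code from the worker thread."""
    log.debug("starting %s", args)
    proc = subprocess.run(
        args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    log.debug("finished %s with exit code %d", args, proc.returncode)
    return proc.returncode, proc.stdout


def run_all(commands, jobcount):
    """Run all commands on a thread pool; results come back in input order."""
    with ThreadPool(jobcount) as pool:
        return pool.map(logrun, commands)
```

Comparing the start and finish timestamps per thread shows which command was running when the machine stalled.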
Are you using Python 2.x or 3.x?
How can I initiate IPC with a child process, without letting it inherit all handles? To make it more interesting, this should work on windows as well as unix.
The background: I am writing a library that interfaces with a third-party shared library (let's just call it IT) which in turn contains global data (that really should be objects!). I want to have multiple instances of this global data. As far as I understand, I have two options to solve this:
1. create a cython module that links against a static variant of IT, then copy and import the module whenever I want a new instance. Analogously, I could copy IT but that's even more work to create a ctypes interface.
2. spawn a subprocess that loads IT and establish an IPC connection to it.
There are a few reasons to use (2):
I am not sure, if (1) is reliable in any way and it feels like a bad idea (what happens with all the extra modules, when the application exits in an uncontrolled way?).
boxing IT into a separate process might actually be a good idea anyway for security considerations: IT deals with potentially unsafe input and IT's code quality isn't overly good. So, I'd rather not have any secure resources open when running it.
there is probably lots of need for this kind of IPC in future applications
So what are my options? I have already looked into:
multiprocessing.Process at first looked nice, until I realized that the new process gets a copy of all my handles. Needless to say that this is quite problematic, since now resources cannot be reliably freed by closing them in the parent process + the security issues mentioned earlier.
Use os.closerange within a multiprocessing.Process to close all handles manually, except for the Pipe I'm interested in. Does os.closerange close only files or does it take care of other types of resources as well? If so: how can I determine the range, given the Pipe object?
subprocess.Popen(.., close_fds=True, stdin=PIPE, stdout=PIPE) works fine on unix but isn't possible on win32.
Named pipes are very different on win32 and unix. Are there any libraries that unify their usage?
Sockets. Promising, especially since there are handy RPC libraries that can work with sockets. On the other hand, I fear that this may cause a whole bunch of security issues. Are sockets that I have determined to be of local origin (sock.getpeername()[0] == '127.0.0.1') secure against tampering?
Are there any possibilities that I have overlooked?
To round up: the main question is how to establish a secure IPC with a child process on windows+unix? But please don't hesitate to answer if you know any answers to only partial problems.
Thanks for taking the time to read it!
It seems on python>=3.4 subprocess.Popen(..., stdin=PIPE, stdout=PIPE, close_fds=False) is a possible option. This is due to a patch that makes all opened file descriptors non-inheritable by default. To be more precise, they will be automatically closed on execv (so still can't use multiprocessing.Process), see PEP 446.
This is also a valid option for other python versions:
on windows, HANDLEs are created non-inheritable by default, so you will leak only handles that were made inheritable explicitly
on POSIX/python<=3.3 you can still use os.closerange to close open file descriptors after spawning the subprocess
for a corresponding example see:
https://github.com/coldfix/python-ipc-test
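The non-inheritable default introduced by PEP 446 can be checked directly on Python 3.4 or later. This is only a small illustration; the variable names are arbitrary.

```python
import os

read_fd, write_fd = os.pipe()

# Since Python 3.4 (PEP 446) new descriptors are created non-inheritable.
default_flag = os.get_inheritable(read_fd)

# Inheritance has to be requested explicitly, one descriptor at a time.
os.set_inheritable(read_fd, True)
explicit_flag = os.get_inheritable(read_fd)

os.close(read_fd)
os.close(write_fd)
print(default_flag, explicit_flag)
```

Only descriptors that were opted in like this, or the ones wired up through the stdin/stdout/stderr arguments of subprocess.Popen, should reach the child.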
The most useful combinations are:
stdio:pickle
pro: completely cross-platform in my tests
pro: fastest option (with 2)
con: stdin/stdout can not be redirected independently
inherit_unidir:pickle
pro: you can redirect STDIO streams independently
pro: fastest option together with stdio:pickle
con: very low level platform specific code
socket:sockpipe
pro: cross-platform with little effort
con: there is a short period when "attackers" may connect to the port, you could require a pass-phrase or something to prevent that from happening
con: slightly slower than alternatives on windows (factor 1.6 in my measurements)
when not using AF_UNIX there are unpredictable performance hits on linux
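The stdio:pickle combination can be sketched roughly as follows for Python 3. This is not the code from the linked repository; the child program, which just doubles whatever it receives, and the helper names start_child and call are made up for illustration. Note that unpickling executes code chosen by the sender, so this is only appropriate when the parent can trust what the child sends back.

```python
import pickle
import subprocess
import sys

# Hypothetical child: reads pickled requests on stdin, answers on stdout.
CHILD_SOURCE = r"""
import pickle, sys
inp, out = sys.stdin.buffer, sys.stdout.buffer
while True:
    try:
        request = pickle.load(inp)
    except EOFError:          # parent closed its end of the pipe
        break
    pickle.dump(request * 2, out)
    out.flush()
"""


def start_child():
    """Spawn the child with only its stdin/stdout connected to the parent."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD_SOURCE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )


def call(child, obj):
    """Send one pickled request and block until the pickled reply arrives."""
    pickle.dump(obj, child.stdin)
    child.stdin.flush()
    return pickle.load(child.stdout)
```

Closing child.stdin makes the child see EOF and exit, which also illustrates the listed drawback: the child's stdin and stdout are taken up by the protocol and cannot be redirected independently.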
I currently have a couple of small applications < 500 lines. I am intending to eventually run them on a small LINUX ARM box. Is it better to combine these applications and use threading, or continue to have them as two separate applications?
These applications, plus a very small website, use a small SQLite database, though only one of the applications writes to it; everything else currently only reads. Due to constraints of the target box I am using Python 2.6.
I am using SQLite to prevent data loss over several days of use. There is no direct interaction between the two applications, though there is the potential for database locking issues, especially during periodic data maintenance. Preventing these issues is a concern, as is the footprint of the two applications, since the target devices are pretty small.
Depends on whether you need them to share data and how involved the sharing is. Other than that, from a speed point of view, for a multiprocessing machine, threading won't give you much of an advantage over separate processes.
If sharing can easily take place via a flat file or database then just let them be separate rather than complicating via threading.
For performance, I would suggest using threads. A process consumes much more resources than a thread; threads are faster to create and need less memory (useful in an embedded environment). Of course, you will have to deal with the common traps of multithreaded programming (concurrent access is solved by locks, but locks may lead to deadlocks...).
If you plan to use many libraries that make low-level calls, perhaps with C extensions that may not release the GIL (Global Interpreter Lock) properly, processes can be a better solution, since one application can keep running even when the other is blocked by the GIL.
If you need to pass data between the two, you could use the Queues and other mechanisms in the multiprocessing module.
It's often much simpler to use multiprocessing rather than sharing memory or objects using the threading module.
If you don't need to pass data between your programs, just run them separately (and release any locks on files or databases as soon as possible to reduce contention).
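For the SQLite case in the question, releasing locks early could look like the sketch below: each access opens the database, does one short transaction, and closes again, while the timeout lets a caller wait briefly for the other program instead of failing at once. The table name samples and both function names are hypothetical.

```python
import sqlite3
from contextlib import closing


def record_sample(db_path, value, timeout=10.0):
    """Write one row in a single short transaction, then close the connection."""
    with closing(sqlite3.connect(db_path, timeout=timeout)) as conn:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("CREATE TABLE IF NOT EXISTS samples (value REAL)")
            conn.execute("INSERT INTO samples (value) VALUES (?)", (value,))


def read_samples(db_path, timeout=10.0):
    """Read all rows; waits up to `timeout` seconds if the writer holds a lock."""
    with closing(sqlite3.connect(db_path, timeout=timeout)) as conn:
        rows = conn.execute("SELECT value FROM samples ORDER BY rowid")
        return [row[0] for row in rows]
```

Long-running maintenance is the risky part; splitting it into several small transactions keeps each lock short.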
I have decided to adopt a process-based rather than a threaded approach to resolving this issue. The primary factor in this decision is simplicity. The second factor is that while one of these applications will be carrying out data acquisition, the other will be communicating with a modem on an ad-hoc basis (receiving calls). I don't have control over the calling application, but based on my investigations there is the potential for a lot to go wrong.
There are a couple of factors which may change the approach further down the line. The first is the need for the two processes to interact to prevent data contention on the database. The second is that if resources (memory/disk space/CPU) become an issue (due to the size of the device), a single application should give me the ability to manage these problems.
That said the data acquisition application is already threaded. This allows the parent thread to manage the worker when exceptions arise, as the device will not be in a managed environment.
I use Python 2.5.4. My computer: CPU AMD Phenom X3 720BE, Mainboard 780G, 4GB RAM, Windows 7 32 bit.
I use Python threading but cannot make every python.exe process consume 100% CPU. Why are they using only about 33-34% on average?
I wish to direct all available computer resources toward these large calculations so as to complete them as quickly as possible.
EDIT:
Thanks everybody. Now I'm using Parallel Python and everything works well. My CPU is now always at 100%. Thanks all!
It appears that you have a 3-core CPU. If you want to use more than one CPU core in native Python code, you have to spawn multiple processes. (Two or more Python threads cannot run concurrently on different CPUs)
As R. Pate said, Python's multiprocessing module is one way. However, I would suggest looking at Parallel Python instead. It takes care of distributing tasks and message-passing. You can even run tasks on many separate computers with little change to your code.
Using it is quite simple:
import pp
def parallel_function(arg):
    return arg
job_server = pp.Server()
# Define your jobs
job1 = job_server.submit(parallel_function, ("foo",))
job2 = job_server.submit(parallel_function, ("bar",))
# Compute and retrieve answers for the jobs.
print job1()
print job2()
Try the multiprocessing module, as Python, while it has real, native threads, will restrict their concurrent use while the GIL is held. Another alternative, and something you should look at if you need real speed, is writing a C extension module and calling functions in it from Python. You can release the GIL in those C functions.
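A minimal sketch of the multiprocessing route, written for Python 3 rather than the 2.5 in the question: the work is cut into one chunk per worker process, so each chunk runs in its own interpreter with its own GIL. Prime counting is just a stand-in for the real calculation, and all function names are made up for illustration.

```python
import multiprocessing


def is_prime(n):
    """Trial division; deliberately CPU-bound."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split_range(limit, parts):
    """Cut [0, limit) into at most `parts` contiguous chunks."""
    step = -(-limit // parts)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def parallel_count(limit, workers=None):
    """Count primes below `limit` using one worker process per chunk."""
    workers = workers or multiprocessing.cpu_count()
    with multiprocessing.Pool(workers) as pool:
        return sum(pool.map(count_primes, split_range(limit, workers)))
```

On Windows, call parallel_count() from under an `if __name__ == "__main__":` guard, because worker processes are started by re-importing the main module.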
Also see David Beazley's Mindblowing GIL.
Global Interpreter Lock
The reasons of employing such a lock include:
* increased speed of single-threaded programs (no necessity to acquire or release locks on all data structures separately)
* easy integration of C libraries that usually are not thread-safe.
Applications written in languages with a GIL have to use separate processes (i.e. interpreters) to achieve full concurrency, as each interpreter has its own GIL.
From CPU usage it looks like you're still running on a single core. Try running a trivial calculation with 3 or more threads with same threading code and see if it utilizes all cores. If it doesn't, something might be wrong with your threading code.
What about Stackless Python?
Your bottleneck is probably somewhere else, like the hard drive (paging) or memory access.
You should perform some operating system and Python monitoring to determine where the bottleneck is.
Here is some info for windows 7:
Performance Monitor: You can use Windows Performance Monitor to examine how programs you run affect your computer’s performance, both in real time and by collecting log data for later analysis. (Control Panel -> All Control Panel Items -> Performance Information and Tools -> Advanced Tools -> View Performance Monitor)
Resource Monitor: Windows Resource Monitor is a system tool that allows you to view information about the use of hardware (CPU, memory, disk, and network) and software (file handles and modules) resources in real time. You can use Resource Monitor to start, stop, suspend, and resume processes and services. (Control Panel -> All Control Panel Items -> Performance Information and Tools -> Advanced Tools -> View Resource Monitor)
I solved the problems that led me to this post by running a second script manually. This post helped me run multiple python scripts at the same time.
I managed to execute in the newly-opened terminal window typing a command there. Not as convenient as shift + enter but does the job.