Shared and exclusive named lock for Python

I need to synchronize Python threads and processes (not necessarily related to each other) with a named lock (a file lock, for example). Preferably it should be a readers-writer lock. I have tried fcntl.flock (it offers both exclusive and shared lock acquisition), but it does not provide the desired level of locking - see Does python's fcntl.flock function provide thread level locking of file access?
My solution so far is to use a lockfile together with memcached (or an mmap'ed, locked file). The lockfile will synchronize access and memcached will count readers/writers.
Are there any better/faster solutions? Do you know of any project that already solves this problem?

Here is a link http://semanchuk.com/philip/ with libraries implementing POSIX and System V semaphores. You can use one of those. Beware, though, that if the process holding a semaphore dies without releasing it, all the others get stuck. If you are afraid of this, you can use System V semaphores with UNDO (SEM_UNDO), but they are a little slower. Also, if you happen to use System V shared memory primitives, remember that they live in the kernel and keep living after process termination - you have to explicitly remove them from the system.
If you are not afraid of a dying process deadlocking the whole system, and the processes are related, you could use Python's multiprocessing semaphores (they are POSIX named semaphores).
The page you linked as a related question (fcntl) does not say that fcntl is unsuitable for inter-thread locking. It says that fcntl locks are tied to file descriptors, so you can use fcntl for inter-process and inter-thread locking as long as you open the lock file and get a new fd for each lock instance.
You could also use a combination of fcntl for inter-process locking and Python's semaphores for inter-thread locking.
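As an illustration, here is a minimal sketch of that approach (the lock file path is just an example); each lock instance opens its own file descriptor, so separate threads get separate fds and fcntl.flock works for both inter-process and inter-thread locking:

import fcntl
import os

class NamedRWLock:
    # Each instance opens its own fd, so using one instance per thread
    # works across threads as well as across processes.
    def __init__(self, path="/tmp/example.lock"):  # example path
        self._fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)

    def acquire_shared(self):
        fcntl.flock(self._fd, fcntl.LOCK_SH)   # many readers may hold this

    def acquire_exclusive(self):
        fcntl.flock(self._fd, fcntl.LOCK_EX)   # only one writer at a time

    def release(self):
        fcntl.flock(self._fd, fcntl.LOCK_UN)

    def close(self):
        os.close(self._fd)                     # closing the fd also drops the lock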
And finally: rethink your architecture. Locking is generally bad. Delegate the resource to a single process that takes care of it without locking. It will be much simpler to maintain. Believe me.

Best way to communicate resource lock between processes

I have two python programs that are supposed to run in parallel and do the same thing:
Read and unzip data from disk (takes about 1 min)
Process data (takes about 2-3 min)
Send data to database (takes about 3-5 min)
As you can see, it would be nice to have the execution of both instances synchronized in a way that one does the processor-heavy steps 1 and 2 (the implementation is multithreaded, so the CPU can actually be maxed out) while the other does the I/O-heavy step 3 and vice versa.
My first idea was to use a lockfile, which is acquired by each process upon entering phase 3 and released after completing it. The other process will then wait until the lock is released and acquire it when it enters phase 3 itself. However, this seems like a very cumbersome way to do it. Also, the system is supposed to run unsupervised for days and weeks with the ability to recover from errors, scheduled reboots or power failures. Especially in the last case, the lockfile could simply lock up everything.
Is there a more elegant way to communicate the lockout between the two processes? Or should I rather use the lockfile and try to implement some smart cleanup functionality to keep a deadlock from happening?
It seems that every solution has some drawbacks - either some mechanism or module is not available on all platforms (i.e. Linux only or Windows only), or you may run into error recovery issues with a file-system based approach (as you have already pointed out in your question).
Here is a list of some possible options:
Use Python's multiprocessing module
This allows you to create a lock like this:
lock = multiprocessing.Lock()
and to acquire and release it like this:
lock.acquire()
# do something
lock.release()
Here is a complete example.
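For illustration, a minimal sketch of the pattern might look like this (the worker function and job names are placeholders), assuming both jobs are started from the same parent script:

import multiprocessing
import time

def worker(name, lock):
    # ... steps 1 and 2 (read/unzip and process) would run here ...
    with lock:                        # only one process uploads at a time
        print(name, "sending data to database")
        time.sleep(1)                 # stand-in for the real upload

if __name__ == "__main__":
    lock = multiprocessing.Lock()
    procs = [multiprocessing.Process(target=worker, args=(n, lock))
             for n in ("job-a", "job-b")]
    for p in procs:
        p.start()
    for p in procs:
        p.join()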
Pro: Straightforward to use; cross-platform; no issues with error recovery.
Con: Since you currently have two separate programs, you will have to rearrange your code to start two processes from the same python module.
Use fcntl (Linux)
For Linux/Unix systems, there is fcntl (with fcntl.flock()) available as a python module. This is based on lockfiles.
See also this discussion with some recommendations that I am repeating here:
Write the process ID of the locked process to the file for being able to recognize and fix possible deadlocks.
Put your lock files in a temporary location or a RAM file system.
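Putting those recommendations together, a minimal sketch might look like this (the path under /tmp is just an example):

import fcntl
import os

LOCKFILE = "/tmp/phase3.lock"   # example path; a RAM filesystem also works

def phase3_lock():
    fd = os.open(LOCKFILE, os.O_RDWR | os.O_CREAT, 0o600)
    fcntl.flock(fd, fcntl.LOCK_EX)           # blocks until the other process is done
    os.ftruncate(fd, 0)
    os.write(fd, str(os.getpid()).encode())  # record who holds the lock, for debugging
    return fd                                # keep the fd open while in phase 3

def phase3_unlock(fd):
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)

Because the lock is tied to the open file descriptor, it is released automatically when the process exits or crashes, which helps with the error-recovery concern.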
Con: Not cross-platform, available on Linux/Unix systems only.
Use posix_ipc (Linux)
For Linux/Unix systems, there is posix_ipc (with a Semaphore class) available as a python module.
Pro: Not file-system based, no issues with error recovery.
Con: Not cross-platform, available on Linux/Unix systems only.
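A minimal sketch using posix_ipc, assuming a semaphore name such as /phase3 that both programs agree on (the name is just an example):

import posix_ipc

# O_CREAT creates the semaphore if it does not exist yet; initial_value=1
# makes it behave like a mutex shared by both programs.
sem = posix_ipc.Semaphore("/phase3", posix_ipc.O_CREAT, initial_value=1)

sem.acquire()        # blocks while the other process is in phase 3
try:
    pass             # ... send data to the database ...
finally:
    sem.release()
sem.close()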
Use msvcrt (Windows)
For Windows systems, there is msvcrt (with msvcrt.locking()) available as a python module.
See also this discussion.
Con: Not cross-platform, available on Windows systems only.
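A minimal sketch using msvcrt, assuming a lock file path that both programs agree on (the path is just an example); note that LK_LOCK retries for roughly ten seconds and then raises OSError if the region is still locked:

import msvcrt
import os

fd = os.open(r"C:\temp\phase3.lock", os.O_RDWR | os.O_CREAT)
msvcrt.locking(fd, msvcrt.LK_LOCK, 1)        # lock the first byte of the file
try:
    pass                                      # ... send data to the database ...
finally:
    msvcrt.locking(fd, msvcrt.LK_UNLCK, 1)
    os.close(fd)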
Use a third-party library
You might want to check out the following python libraries:
ilock
portalocker
filelock
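For illustration, with filelock the same pattern becomes a short context manager (the path is just an example):

from filelock import FileLock

lock = FileLock("/tmp/phase3.lock")   # example path
with lock:                            # blocks until the other process releases it
    pass                              # ... send data to the database ...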
If you are running into synchronization problems, in my opinion there is no better way than using semaphores. How you handle the cleanup and the lock parts depends a lot on your problem. There are a lot of resources for this kind of issue. Python has already implemented some primitives.
You can check this post for an example.
Also check ZooKeeper; I have never used it from Python, but it is widely used in other languages.

Python portable interprocess Semaphore/Event

I'm creating a website using Flask. My WSGI server, Gunicorn, spawns multiple processes.
I have some cross-process objects (notably files) that I want to constrain access to within these processes, and raise events when they are modified.
The choice is normally to use system-wide mutexes/semaphores and events.
However, I can't find a portable (Windows/Mac/Linux) solution for these on Python.
The multiprocessing module (see this question), as far as I can tell, only works for processes spawned by the multiprocessing module itself, which these are not.
There are POSIX semaphores also, but these only work on Linux.
Does anyone know of a more general solution?
I have been researching this for a while, and the closest I could find is the python file-locking library fasteners:
It works quite well on all platforms. The problem is that it only implements a system-wide mutex, not semaphore-like counting. I have implemented my own counting in a locked file with an integer counter and active waiting, but this is still fragile and will leave the system in a bad state if one of the processes crashes and doesn't update the count properly.
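For what it's worth, the mutex part with fasteners looks roughly like this (the lock file path is just an example); it is the counting layered on top of it that remains fragile:

import fasteners

lock = fasteners.InterProcessLock("/tmp/shared_resource.lock")  # example path
with lock:            # works on Windows, macOS and Linux
    pass              # ... read or modify the shared file ...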

python-twisted: fork for background non-returning processing

How do I correctly fork a child process in twisted that does not use anything from twisted (but uses data from the parent process), e.g. to process a “snapshot” of some data from the parent process and write it to a file, without blocking?
It seems that if I do anything like a clean shutdown in the child process after os.fork(), it closes some of the sockets / descriptors in the parent process; the only way I see to avoid that is to do os.kill(os.getpid(), signal.SIGKILL), which does seem like a bad idea (though not directly problematic).
(additionally, if a dict is changed in the parent process, can it be that it will change in the child process too? Quick test shows that it doesn't change, though. OS/kernels are debian stable / sid)
IReactorProcess.spawnProcess (usually available as from twisted.internet import reactor; reactor.spawnProcess) can spawn a process running any available executable on your system. The subprocess does not need to use Twisted, or, indeed, even be in Python.
Do not call os.fork yourself. As you've discovered, it has lots of very peculiar interactions with process state, that spawnProcess will manage for you.
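A minimal sketch of that approach (the snapshot_writer.py script is hypothetical); the snapshot data is passed to the child over stdin instead of relying on memory copied by fork, and the reactor is assumed to be running already:

import sys
from twisted.internet import reactor, protocol

class SnapshotWriter(protocol.ProcessProtocol):
    def __init__(self, data):
        self.data = data                      # bytes to hand to the child

    def connectionMade(self):
        self.transport.write(self.data)       # send the snapshot over stdin
        self.transport.closeStdin()

    def processEnded(self, reason):
        print("snapshot child finished:", reason.value)

def write_snapshot(data):
    # snapshot_writer.py is a hypothetical stand-alone script that reads
    # stdin and writes it to a file; it does not need to import Twisted.
    reactor.spawnProcess(SnapshotWriter(data), sys.executable,
                         [sys.executable, "snapshot_writer.py"])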
Among the problems with os.fork are:
Forking copies your current process state, but doesn't copy the state of threads. This means that any thread in the middle of modifying some global state will leave things half-broken, possibly holding some locks which will never be released. Don't run any threads in your application? Have you audited every library you use, every one of its dependencies, to ensure that none of them have ever or will ever use a background thread for anything?
You might think you're only touching certain areas of your application memory, but thanks to Python's reference counting, any object which you even peripherally look at (or is present on the stack) may have reference counts being incremented or decremented. Incrementing or decrementing a refcount is a write operation, which means that whole page (not just that one object) gets copied back into your process. So forked processes in Python tend to accumulate a much larger copied set than, say, forked C programs.
Many libraries, famously all of the libraries that make up the systems on macOS and iOS, cannot handle fork() correctly and will simply crash your program if you attempt to use them after fork but before exec.
There's a flag for telling file descriptors to close on exec - but no such flag to have them close on fork. So any files (including log files, and again, any background temp files opened by libraries you might not even be aware of) can get silently corrupted or truncated if you don't manage access to them carefully.

Having issues with flock() function

I have a question about how flock() works, particularly in python. I have a module that opens a serial connection (via os.open()). I need to make this thread safe. It's easy enough making it thread safe when working in the same module using threading.Lock(), but if the module gets imported from different places, it breaks.
I was thinking of using flock(), but I'm having trouble finding enough information about how exactly flock works. I read that flock() unlocks the file once the file is closed. But is there a situation that will keep the file open if python crashes?
And what exactly is allowed to use the locked file if LOCK_EX is set? Just the module that locked the file? Any module that was imported from the script that was originally run?
When a process dies the OS should clean up any open file resources (with some caveats, I'm sure). This is because the advisory lock is released when the file is closed, an operation which occurs as part of the OS cleanup when the python process exits.
Remember, flock(2) is merely advisory:
Advisory locks allow cooperating processes to perform consistent operations on files, but [other, poorly behaved] processes may still access those files without using advisory locks.
flock(2) implements a readers-writer lock. You can't flock the same file twice with LOCK_EX, but any number of people can flock it with LOCK_SH simultaneously (as long as nobody else has a LOCK_EX on it).
The locking mechanism allows two types of locks: shared locks and exclusive locks. At any time multiple shared locks may be applied to a file, but at no time are multiple exclusive, or both shared and exclusive, locks allowed simultaneously on a file.
flock works at the OS/process level and is independent of python modules. One module may request n locks, or n locks could be requested across m modules. However, only one process can hold a LOCK_EX lock on a given file at a given time.
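A quick way to see this behaviour from a single script is to take locks on two independent file descriptors (the file name is just an example):

import fcntl

a = open("/tmp/demo.lock", "w")
b = open("/tmp/demo.lock", "w")          # a second, independent fd

fcntl.flock(a, fcntl.LOCK_SH)            # first shared lock succeeds
fcntl.flock(b, fcntl.LOCK_SH)            # a second shared lock also succeeds
fcntl.flock(b, fcntl.LOCK_UN)

try:
    fcntl.flock(b, fcntl.LOCK_EX | fcntl.LOCK_NB)  # exclusive while a holds LOCK_SH
except BlockingIOError:
    print("exclusive lock refused while a shared lock is held")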
YMMV on a "non-UNIX" system or a non-local filesystem.

Mutex locks vs Threading locks. Which to use?

My main question is: does the threading Lock object create atomic locks? The module documentation doesn't say that the lock is atomic. In Python's mutex documentation it does say the mutex lock is atomic, but I seem to have read somewhere that in fact it isn't. I am wondering if someone could give me a bit of insight on this matter. Which lock should I use? I am currently running my scripts using Python 2.4.
Locks of any nature would be rather useless if they weren't atomic - the whole point of the lock is to allow for higher-level atomic operations.
All of threading's synchronization objects (locks, rlocks, semaphores, boundedsemaphores) utilize atomic instructions, as do mutexes.
You should use threading, since mutex is actually deprecated going forward (and removed in Python 3).
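A minimal sketch with threading.Lock (kept compatible with older Python versions):

import threading

counter = 0
lock = threading.Lock()

def increment():
    global counter
    lock.acquire()          # the acquire/release pair makes the update atomic
    try:
        counter += 1
    finally:
        lock.release()

threads = [threading.Thread(target=increment) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)              # always 10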
