I have a question about how flock() works, particularly in Python. I have a module that opens a serial connection (via os.open()). I need to make this thread-safe. It's easy enough to make it thread-safe when working within the same module using threading.Lock(), but if the module gets imported from different places, it breaks.
I was thinking of using flock(), but I'm having trouble finding enough information about how exactly flock() works. I read that flock() unlocks the file once the file is closed. But is there a situation that could keep the file open if Python crashes?
And what exactly is allowed to use the locked file if LOCK_EX is set? Just the module that locked the file? Any module that was imported from the script that was originally run?
When a process dies, the OS should clean up any open file resources (with some caveats, I'm sure). The advisory lock is released when the file is closed, an operation which happens as part of the OS cleanup when the Python process exits.
Remember, flock(2) is merely advisory:
Advisory locks allow cooperating processes to perform consistent operations on files, but [other, poorly behaved] processes may still access those files without using advisory locks.
flock(2) implements a readers-writer lock. You can't flock the same file twice with LOCK_EX, but any number of people can flock it with LOCK_SH simultaneously (as long as nobody else has a LOCK_EX on it).
The locking mechanism allows two types of locks: shared locks and exclusive locks. At any time multiple shared locks may be applied to a file, but at no time are multiple exclusive, or both shared and exclusive, locks allowed simultaneously on a file.
flock works at the OS/process level and is independent of python modules. One module may request n locks, or n locks could be requested across m modules. However, only one process can hold a LOCK_EX lock on a given file at a given time.
YMMV on a "non-UNIX" system or a non-local filesystem.
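For illustration, here is a minimal sketch of cross-process locking with flock; the lock-file path /tmp/serial.lock is just an assumed name that every cooperating process would have to agree on:

import fcntl
import os

fd = os.open("/tmp/serial.lock", os.O_RDWR | os.O_CREAT, 0o644)
fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until no other process holds the exclusive lock
try:
    pass  # talk to the serial port here
finally:
    fcntl.flock(fd, fcntl.LOCK_UN)
    os.close(fd)                 # the lock is also dropped if the process dies with the fd open

Another process that opens the same path and asks for LOCK_SH or LOCK_EX will simply block (or fail immediately if it adds LOCK_NB) until the first one releases the lock.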
I've been scratching my head trying to figure out if this is possible.
I have a server program running with about 30 different socket connections to it from all over the country. I need to update this server program now, and although the client devices will automatically reconnect, it's not totally reliable.
I was wondering if there is a way of saving the socket objects to a file and loading them back up when the server restarts, or of forcibly keeping a socket open even after the program stops. That way the clients would never disconnect at all.
Could really do with hot-swappable code here!
Solution 1.
It can be done with some process magic, at least under Linux (although I believe a similar Windows API exists). First of all, note that sockets cannot be stored in a file; these objects are temporary by nature. But you can keep them alive in a separate process. Have a look at this:
Can I open a socket and pass it to another process in Linux
So one way to accomplish this is the following:
Create a "keeper" process at some point (make sure that the process is not a child of the main process so that it stays alive when the main process is gone)
Send all sockets to the keeper process via sendmsg() with SCM_RIGHTS (see the sketch after this list)
Shutdown the main process
Do whatever update you have to
Fire the main process
Retrieve sockets from the keeper process
Shutdown the keeper process
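A hedged sketch of the send/retrieve steps, assuming Python 3.9+ (for socket.send_fds/recv_fds) on Linux; the Unix-socket path keeper.sock is a made-up name for the control channel to the keeper process:

import socket

# In the main process: hand the live client sockets over to the keeper.
def send_to_keeper(client_sockets):
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as ctrl:
        ctrl.connect("keeper.sock")
        fds = [s.fileno() for s in client_sockets]
        socket.send_fds(ctrl, [b"take"], fds)   # uses SCM_RIGHTS under the hood

# In the keeper (and later in the restarted main process): get them back.
def receive_sockets(conn, maxfds=64):
    msg, fds, flags, addr = socket.recv_fds(conn, 1024, maxfds)
    return [socket.socket(fileno=fd) for fd in fds]

On older Python versions you would build the same messages by hand with sendmsg()/recvmsg() and the SCM_RIGHTS ancillary data type.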
However, this solution is quite difficult to maintain. You have two separate processes, and it is unclear which is the master and which is the slave, so you would probably need yet another master process at the top. Things get nasty very quickly, not to mention the security issues.
Solution 2.
Reloading modules, as suggested by @gavinb, might be a solution. Note, however, that in practice this often breaks the app. You never know what those modules do under the hood unless you know the code of every single Python file you use. Plus it imposes some restrictions on the modules, i.e. they have to be reloadable. For example, some modules use inline caching, which makes reloading difficult.
Also, once a module has been imported by another module, that other module keeps a reference to it. So you not only have to reload it, but also update the references in every other module that imported it earlier. The maintenance cost rises very quickly unless you thought about it at the beginning of the project (so that every import is encapsulated for easy reloading). And bugs caused by two different versions of a module running in the same process are (I imagine; I've never been in this situation) extremely difficult to find.
Anyway I would avoid that.
Solution 3.
So this is an XY problem. Instead of saving sockets, how about putting a proxy in front of the main server? IMO this is the safest and at the same time the simplest solution. The proxy communicates with the main server (for example over Unix domain sockets), buffers the data, and automatically reconnects to the main server once it is available again. Perhaps you can even reuse some existing tech, e.g. nginx.
No, sockets are special file handles that belong to the process. When the process exits, the runtime force-closes any open files/sockets. This is not Python-specific; it is just how operating systems manage resources.
What you can do, however, is dynamically reload one or more modules while keeping the process alive. It may take some careful management when you have open sockets, but in theory it should be possible. So yes, hot-swappable code is actually supported by Python.
Do some reading and research on "dynamic reloading". The importlib module in Python 3 provides the reload function which is used to:
Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter.
I think your critical question is how to hot reload.
And as mentioned by @gavinb, you can import importlib and then use importlib.reload(module) to reload a module dynamically.
Be careful: the argument passed to reload() must be a module object.
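A minimal, runnable illustration; here the stdlib json module just stands in for whatever module you actually want to hot-swap:

import importlib
import json

json = importlib.reload(json)   # re-executes the module's code and returns the module object
# Caveat: names bound earlier via "from json import dumps" keep pointing at the
# old objects; after a reload, go through the module attribute (json.dumps) instead.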
How do I correctly fork a child process in Twisted that does not use anything from Twisted (but does use data from the parent process), e.g. to process a "snapshot" of some data from the parent process and write it to a file, without blocking?
It seems that if I do anything like a clean shutdown in the child process after os.fork(), it closes some of the sockets/descriptors in the parent process; the only way I see to avoid that is to do os.kill(os.getpid(), signal.SIGKILL), which does seem like a bad idea (though it isn't directly problematic).
(Additionally, if a dict is changed in the parent process, could it change in the child process too? A quick test shows that it doesn't. The OS/kernels are Debian stable / sid.)
IReactorProcess.spawnProcess (usually available as from twisted.internet import reactor; reactor.spawnProcess) can spawn a process running any available executable on your system. The subprocess does not need to use Twisted, or, indeed, even be in Python.
Do not call os.fork yourself. As you've discovered, it has lots of very peculiar interactions with process state, which spawnProcess will manage for you.
Among the problems with os.fork are:
Forking copies your current process state, but doesn't copy the state of threads. This means that any thread in the middle of modifying some global state will leave things half-broken, possibly holding some locks which will never be released. Don't run any threads in your application? Have you audited every library you use, every one of its dependencies, to ensure that none of them have ever or will ever use a background thread for anything?
You might think you're only touching certain areas of your application memory, but thanks to Python's reference counting, any object which you even peripherally look at (or is present on the stack) may have reference counts being incremented or decremented. Incrementing or decrementing a refcount is a write operation, which means that whole page (not just that one object) gets copied back into your process. So forked processes in Python tend to accumulate a much larger copied set than, say, forked C programs.
Many libraries, famously all of the libraries that make up the systems on macOS and iOS, cannot handle fork() correctly and will simply crash your program if you attempt to use them after fork but before exec.
There's a flag for telling file descriptors to close on exec - but no such flag to have them close on fork. So any files (including log files, and again, any background temp files opened by libraries you might not even be aware of) can get silently corrupted or truncated if you don't manage access to them carefully.
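For reference, a hedged sketch of handing the snapshot work to a child via spawnProcess instead of os.fork; the helper script name snapshot_writer.py and the output path are hypothetical:

import sys
from twisted.internet import reactor, protocol

class SnapshotProtocol(protocol.ProcessProtocol):
    def connectionMade(self):
        # Send the snapshot data to the child's stdin, then close it.
        self.transport.write(b'{"snapshot": "..."}')
        self.transport.closeStdin()

    def processEnded(self, reason):
        print("child finished:", reason.value)
        reactor.stop()

reactor.spawnProcess(
    SnapshotProtocol(),
    sys.executable,                                        # run a plain Python interpreter
    args=[sys.executable, "snapshot_writer.py", "/tmp/out.json"],
    env=None,                                              # None means: inherit os.environ
)
reactor.run()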
I need to synchronize Python threads and processes (not necessarily related to each other) with a named lock (a file lock, for example). Preferably it should be a readers-writer lock. I have tried fcntl.flock (it has both exclusive and shared lock acquisition), but it does not provide the desired level of locking - Does python's fcntl.flock function provide thread level locking of file access?
My solution so far is to use lockfile with memcached (or mmap'ed locked file). Lockfile will synchronize access and memcached will count readers/writers.
Are there any better/faster solutions? Do you know any project which already solves this problem?
Here is a link, http://semanchuk.com/philip/, to libraries implementing POSIX and System V semaphores. You can use one of those. Beware, though, that if a process holding a semaphore dies without releasing it, all the others get stuck. If you are afraid of this, you can use System V semaphores with UNDO, but they are a little slower. Also, if you happen to use System V shared-memory primitives, remember that they live in the kernel and persist after process termination - you have to explicitly remove them from the system.
If you are not afraid of dying processes deadlocking the whole system, and your processes are related to each other, you could use Python's semaphores (they are POSIX named semaphores).
The page you linked as a related question (fcntl) is not saying that fcntl is unsuitable for inter-thread locking. It is saying that fcntl cares about file descriptors. So you can use fcntl for both inter-process and inter-thread locking, as long as you open the lock file and get a new fd for each lock instance.
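A hedged sketch of that idea, where each lock instance opens the lock file itself so every acquire holds its own file descriptor; the path /tmp/my.lock is just an example:

import fcntl
import os

class FlockLock:
    def __init__(self, path="/tmp/my.lock"):
        self.path = path
        self.fd = None

    def acquire(self, shared=False):
        # A fresh fd per acquire, so two threads in the same process
        # contend with each other just like two separate processes would.
        self.fd = os.open(self.path, os.O_RDWR | os.O_CREAT, 0o644)
        fcntl.flock(self.fd, fcntl.LOCK_SH if shared else fcntl.LOCK_EX)

    def release(self):
        fcntl.flock(self.fd, fcntl.LOCK_UN)
        os.close(self.fd)
        self.fd = None

This gives readers-writer behaviour across both threads and processes on the same machine.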
You could also use a combination of fcntl for inter-process locking and Python's semaphores for inter-thread locking.
And finally: rethink your architecture. Locking is generally bad. Delegate the resource to a process that will take care of it without locking. It will be much simpler to maintain. Believe me.
I have a pipeline which at some point splits work into various sub-processes that do the same thing in parallel. Thus their output should go into the same file.
Is it too risky to have all of those processes write to the same file? Or does Python try and retry if it sees that the resource is occupied?
This is system-dependent. On Windows, the resource is locked and you get an exception. On Linux, two processes can write to the file (the written data could end up interleaved).
Ideally in such cases you should use semaphores to synchronize access to shared resources.
If using semaphores is too heavy for your needs, then the only alternative is to write in separate files...
Edit: As pointed out by eye in a later post, a resource manager is another alternative to handle concurrent writers
In general, this is not a good idea and will take a lot of care to get right. Since the writes will have to be serialized, it might also adversely affect scalability.
I'd recommend writing to separate files and merging (or just leaving them as separate files).
A better solution is to implement a resource manager (writer) to avoid opening the same file twice. This manager could use threading synchronization mechanisms (threading.Lock) to avoid simultaneous access on some platforms.
How about having all of the different processes write their output into a queue, and have a single process that reads that queue, and writes to the file?
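A minimal sketch of that single-writer pattern; the worker function and the output path results.txt are only placeholders:

import multiprocessing as mp

def worker(q, n):
    q.put(f"result from worker {n}\n")

def writer(q, path):
    with open(path, "a") as f:
        while True:
            line = q.get()
            if line is None:          # sentinel: no more work coming
                break
            f.write(line)

if __name__ == "__main__":
    q = mp.Queue()
    w = mp.Process(target=writer, args=(q, "results.txt"))
    w.start()
    workers = [mp.Process(target=worker, args=(q, i)) for i in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
    q.put(None)                       # tell the writer to stop
    w.join()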
Use multiprocessing.Lock() instead of threading.Lock(). Just a word of caution: it might slow down your concurrent processing, because each process has to wait for the lock to be released.
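For illustration, a minimal sketch of that suggestion, with out.txt as a placeholder output file:

import multiprocessing as mp

def worker(lock, n):
    with lock:                        # only one process writes at a time
        with open("out.txt", "a") as f:
            f.write(f"line from worker {n}\n")

if __name__ == "__main__":
    lock = mp.Lock()
    procs = [mp.Process(target=worker, args=(lock, i)) for i in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()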
I have two threads, one which writes to a file, and another which periodically moves the file to a different location. The writer always calls open before writing a message and calls close after writing it. The mover uses shutil.move to do the move.
I see that after the first move is done, the writer cannot write to the file anymore, i.e. the size of the file is always 0 after the first move. Am I doing something wrong?
Locking is a possible solution, but I prefer the general architecture of having each external resource (including a file) dealt with by a single, separate thread. Other threads send work requests to the dedicated thread on a Queue.Queue instance (and provide a separate queue of their own as part of the work request's parameters if they need results back). The dedicated thread spends most of its time waiting on a .get on that queue, and whenever it gets a request it executes it (and returns the results on the passed-in queue if needed).
I've provided detailed examples of this approach e.g. in "Python in a Nutshell". Python's Queue is intrinsically thread-safe and simplifies your life enormously.
Among the advantages of this architecture is that it translates smoothly to multiprocessing if and when you decide to switch some work to a separate process instead of a separate thread (e.g. to take advantage of multiple cores) -- multiprocessing provides its own workalike Queue type to make such a transition smooth as silk;-).
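A hedged sketch of that dedicated-thread pattern (using Python 3's queue.Queue); the file name log.txt and the request format are illustrative only:

import queue
import threading

work_q = queue.Queue()

def file_worker():
    # The only thread that ever touches the file.
    while True:
        request = work_q.get()
        if request is None:            # sentinel: shut down
            break
        message, reply_q = request
        with open("log.txt", "a") as f:
            f.write(message + "\n")
        if reply_q is not None:
            reply_q.put("ok")          # report back if the caller asked for it

t = threading.Thread(target=file_worker, daemon=True)
t.start()

# Any other thread just queues a request instead of touching the file itself.
reply_q = queue.Queue()
work_q.put(("hello from another thread", reply_q))
print(reply_q.get())                   # -> "ok"
work_q.put(None)
t.join()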
When two threads access the same resources, weird things happen. To avoid that, always lock the resource. Python has the convenient threading.Lock for that, as well as some other tools (see documentation of the threading module).
Check out http://www.evanfosmark.com/2009/01/cross-platform-file-locking-support-in-python/
You can use a simple lock with his code, as written by Evan Fosmark in an older StackOverflow question:
from filelock import FileLock

with FileLock("myfile.txt"):
    # work with the file as it is now locked
    print("Lock acquired.")
One of the more elegant libraries I've ever seen.