I'm working with a piece of hardware that must be stopped and started at different intervals. Unfortunately, it doesn't teardown gracefully, so restarting within the same process results in libusb errors. One workaround would be to move the configuration of the hardware to a different process, and stop/start the process when required.
What would be the best way to do this in Python?
The pickle module allows you to serialize objects to a string, so you can transfer it via the disk or a socket.
You could also use multiprocessing, which is intended for parallelism, but could be used here too. (Actually, multiprocessing relies on pickle.)
Related
I'm creating a website using Flask. My WSGI server, Gunicorn, spawns multiple processes.
I have some cross-process objects (notably files) that I want to constrain access to within these processes, and raise events when they are modified.
The choice is normally to use system-wide mutexes/semaphores and events.
However, I can't find a portable (Windows/Mac/Linux) solution for these on Python.
The multiprocessing module (see this question), as far as I can tell, only works for processes spawned by the multiprocessing module itself, which these are not.
There are POSIX semaphores also, but these only work on Linux.
Does anyone know of a more general solution?
I have been researching this for a while, and the closest I could find is the python file-locking library fasteners:
It works quite well in all platforms. The problem it only implements system mutex, but not semaphore like counting. I have implementing my own counting in a locked file with an integer counter and active waiting, but this is still fragile and will leave the system in bad state if one of the process crashes and doesn't update the count properly.
I've been scratching my head trying to figure out if this is possible.
I have a server program running with about 30 different socket connections to it from all over the country. I need to update this server program now and although the client devices will automatically reconnect, its not totally reliable.
I was wondering if there is a way of saving the socket object to a file? then load it back up when the server restarts? or forcefully keeping a socket open even after the program stops. This way the clients never disconnect at all.
Could really do with hot swappable code here really!
Solution 1.
It can be done with some process magic, at least under linux (although I do believe similar windows api exists). First of all note that sockets cannot be stored in a file. These objects are temporary by their nature. But you can keep them in a separate process. Have a look at this:
Can I open a socket and pass it to another process in Linux
So one way to accomplish this is the following:
Create a "keeper" process at some point (make sure that the process is not a child of the main process so that it stays alive when the main process is gone)
Send all sockets to the keeper process via sendmsg() with SCM_RIGHTS
Shutdown the main process
Do whatever update you have to
Fire the main process
Retrieve sockets from the keeper process
Shutdown the keeper process
However this solution is quite difficult to maintain. You have two separate processes, it is unclear which is the master and which is a slave. So you would probably need another master process at the top. Things get nasty very quickly, not to mention security issues.
Solution 2.
Reloading modules as suggested by #gavinb might be a solution. Note however that in practice this often breaks the app. You never know what those modules do under the hood unless you know the code of every single Python file you use. Plus it imposes some restrictions on modules, i.e. they have to be reloadable. For example some modules use inline caching which makes reloading difficult.
Also once a module is loaded in a different module it keeps a reference to that module. So you not only have to reload it but also update references in every other module that loaded it earlier. The maintanance costs raise very quickly unless you thought about it at the begining of the project (so that every import is encapsulated for easy reload). And bugs caused by two different versions of a module running in the same process are (I imagine, never been in this situation though) extremely difficult to find.
Anyway I would avoid that.
Solution 3.
So this is XY problem. Instead of saving sockets how about you put a proxy in front of the main server? IMO this is the safest and at the same time simpliest solution. The proxy will communicate with the main server (for example over unix domain sockets) and will buffer the data and automatically reconnect to the main server once it is available again. Perhaps you can even reuse some existing tech, e.g. nginx.
No, the sockets are special file handles that belong to the process. If you close the process, the runtime will force close any open files/sockets. This is not Python specific; it is just how operating systems manage resources.
Now what you can do however is dynamically reload one or more modules while keeping the process active. It might take some careful management when you have open sockets, but in theory it should be possible. So yes, hot swappable code is actually supported by Python.
Do some reading and research on "dynamic reloading". The importlib module in Python 3 provides the reload function which is used to:
Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter.
I think your critical question is how to hot reload.
And as mentioned by #gavinb, you can import importlib and then use importlib.reload(module) to reload a module dynamically.
Be careful, the parameter of reload(param) must be a module.
I'm trying to implement a plugin architecture in Python.
I've started writing it using the Threading module where each plugin is a thread which I invoke using the Thread.start() method (since all plugins subclass BasePlugin which subclasses Thread). However I've just come across the multiprocessing module.
I'm currently wondering if I should switch to the multiprocessing module and share data using shared memory / Pipes etc...
I'd like to get other's opinions on this.
The plugin architecture I've been working on works as follows:
An event is received by the Plugin Manager. The Plugin Manager checks for all the plugins who've subscribed to that type of event. It activates them and sends them the event object (since it holds additional information). If one of the plugins is already active there is no need to spawn it (just send the event object to it).
In addition there are a few resources which belong only to one plugin at any point in time. Each plugin can request the resource (I'm not worrying about any race condition here since there won't be that many plugins active at once).
Threads share memory with the primary process and each other. For example you can have a list that is available to all threads. An item appended to a list can be seen by other threads. But you have to be careful. You have to understand which operations on data structures are thread safe and which are not. What happens to the behaviour of your program when two threads are checking for the existence of a key in a dictionary and then writing to it?
Multiple processes do not share memory. The new process that you start gets a copy of the memory at the point where it was spawned.
Threads use less resources. But can be hard to reason about. On the other hand communication between processes is tricky. And you can't just access an arbitrary Python data structure. Which it sounds like you want to be able to do.
A badly written plugin, if it was in a thread, could crash your whole program. Whereas if it was in a separate process this wouldn't happen. Maybe that's a consideration?
How to correctly fork a child process in twisted that does not use anything from twisted (but uses data from the parent process) (e.g. to process a “snapshot” of some data from the parent process and write it to file, without blocking)?
It seems if I do anything like clean shutdown in the child process after os.fork(), it closes some of the sockets / descriptors in the parent process; the only way to avoid that that I see is to do os.kill(os.getpid(), signal.SIGKILL), which does seem like a bad idea (though not directly problematic).
(additionally, if a dict is changed in the parent process, can it be that it will change in the child process too? Quick test shows that it doesn't change, though. OS/kernels are debian stable / sid)
IReactorProcess.spawnProcess (usually available as from twisted.internet import reactor; reactor.spawnProcess) can spawn a process running any available executable on your system. The subprocess does not need to use Twisted, or, indeed, even be in Python.
Do not call os.fork yourself. As you've discovered, it has lots of very peculiar interactions with process state, that spawnProcess will manage for you.
Among the problems with os.fork are:
Forking copies your current process state, but doesn't copy the state of threads. This means that any thread in the middle of modifying some global state will leave things half-broken, possibly holding some locks which will never be released. Don't run any threads in your application? Have you audited every library you use, every one of its dependencies, to ensure that none of them have ever or will ever use a background thread for anything?
You might think you're only touching certain areas of your application memory, but thanks to Python's reference counting, any object which you even peripherally look at (or is present on the stack) may have reference counts being incremented or decremented. Incrementing or decrementing a refcount is a write operation, which means that whole page (not just that one object) gets copied back into your process. So forked processes in Python tend to accumulate a much larger copied set than, say, forked C programs.
Many libraries, famously all of the libraries that make up the systems on macOS and iOS, cannot handle fork() correctly and will simply crash your program if you attempt to use them after fork but before exec.
There's a flag for telling file descriptors to close on exec - but no such flag to have them close on fork. So any files (including log files, and again, any background temp files opened by libraries you might not even be aware of) can get silently corrupted or truncated if you don't manage access to them carefully.
Task is:
I have task queue stored in db. It grows. I need to solve tasks by python script when I have resources for it. I see two ways:
python script working all the time. But i don't like it (reason posible memory leak).
python script called by cron and do a little part of task. But i need to solve the problem of one working active script in memory (To prevent active scripts count grow). What is the best solution to implement it in python?
Any ideas to solve this problem at all?
You can use a lockfile to prevent multiple scripts from running out of cron. See the answers to an earlier question, "Python: module for creating PID-based lockfile". This is really just good practice in general for anything that you need to make sure won't have multiple instances running, actually, so you should look into it even if you do have the script running constantly, which I do suggest.
For most things, it shouldn't be too hard to avoid memory leaks, but if you're having a lot of trouble with it (I sometimes do with complex third-party web frameworks, for example), I would suggest instead writing the script with a small, carefully-designed main loop that monitors the database for new jobs, and then uses the multiprocessing module to fork off new processes to complete each task.
When a task is complete, the child process can exit, immediately freeing any memory that isn't properly garbage collected, and the main loop should be simple enough that you can avoid any memory leaks.
This also offers the advantage that you can run multiple tasks in parallel if your system has more than one CPU core, or if your tasks spend a lot of time waiting for I/O.
This is a bit of a vague question. One thing you should remember is that it is very difficult to leak memory in Python, because of the automatic garbage collection. croning a Python script to handle the queue isn't very nice, although it would work fine.
I would use method 1; if you need more power you could make a small Python process that monitors the DB queue and starts new processes to handle the tasks.
I'd suggest using Celery, an asynchronous task queuing system which I use myself.
It may seem a bit heavy for your use case, but it makes it easy to expand later by adding more worker resources if/when needed.