I wrote a daemon script in Python that takes dicts from a queue and processes files based on the information in those dicts. Now I want to insert some additional dicts into that queue from a separate Django script. Is it possible to expose the queue as a file to other software? If not, is there any other solution?
My project runs on debian linux.
If you start the daemon from the django script, then you just need to use the object's methods (or directly access its queue) from the django script.
If the daemon is already started, then you need inter-process communication. Sockets or pipes are some options. Regularly checking a file's content is another solution, but not as responsive.
You might take a look at the official documentation.
I'm not a big fan of IPC when it comes down to rather trivial communication setups, and building a network-based client-server model also adds a lot of overhead, since most probably the two processes will be running on the same machine.
You could create a file-based queue, and either pickle it or use some other kind of serialization format.
The client (your separate Django script) will fill that file.
Your daemon will watch the file and append the deserialized (or unpickled) items to its own queue object.
Use pyinotify to watch the file if you're running a GNU/Linux OS.
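For illustration, here is a minimal sketch of that handoff using one pickled dict per file in a spool directory. The directory path, the file naming and the one-file-per-message layout are my own assumptions, and a pyinotify handler could replace the explicit drain() call.

```python
# Minimal sketch of a file-based queue between the Django script (producer)
# and the daemon (consumer). The spool directory and one-file-per-message
# layout are illustrative assumptions.
import os
import pickle
import queue
import uuid

SPOOL_DIR = "/var/spool/mydaemon"   # hypothetical location
os.makedirs(SPOOL_DIR, exist_ok=True)

def enqueue(item):
    """Producer side (Django script): write one pickled dict per file."""
    name = uuid.uuid4().hex
    tmp_path = os.path.join(SPOOL_DIR, "." + name + ".tmp")
    with open(tmp_path, "wb") as f:
        pickle.dump(item, f)
    # Rename is atomic on the same filesystem, so the daemon never sees
    # a half-written message.
    os.rename(tmp_path, os.path.join(SPOOL_DIR, name + ".msg"))

def drain(target_queue):
    """Consumer side (daemon): unpickle every pending file into its queue."""
    for name in sorted(os.listdir(SPOOL_DIR)):
        if not name.endswith(".msg"):
            continue
        path = os.path.join(SPOOL_DIR, name)
        with open(path, "rb") as f:
            target_queue.put(pickle.load(f))
        os.remove(path)

if __name__ == "__main__":
    q = queue.Queue()
    enqueue({"path": "/tmp/example.txt", "action": "process"})
    drain(q)          # the daemon would call this from its loop,
    print(q.get())    # or from a pyinotify event handler
```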
I have just learned about Python concurrency and its library module multiprocessing. Most examples I have encountered are within ONE Python script: it spawns several processes and communicates among them using multiprocessing.Queue.
My question is: without using message broker or a third supervising application, can TWO python script communicate with each other using multiprocessing.Queue?
The multiprocessing module is a package that supports spawning processes, so that you can write code that executes in parallel. This means that you can write one python script that spawns multiple processes transparently, without worrying much about how these processes serialize data & pass it to each other.
As for your question, it depends... Why do they need to be separate?
If the only concern is that your functions are defined in different modules/scripts, you can just import everything you need in the script that uses the Queue and make all your functions available in one script.
If your use-case is that you want one script to wait for requests (server) & the other script to be a client (it sends requests to the server when needed and waits for response), then you need to implement some sort of RPC protocol.
You can make an http server using web frameworks like Flask & send http requests to it from the client, or if you only need to share short simple messages, you can implement your own message exchange protocol using sockets.
So to sum up: it is possible for two Python processes to communicate without a message broker (e.g. through sockets). But you want to use multiprocessing if you want to run one Python script that spawns multiple processes that can communicate with one another. If instead you need to start two independent scripts and have one of them ask the other to do some work and return the output, you need to implement some RPC protocol between them. The multiprocessing.Queue object itself is not a replacement for a message broker. If you want independently started scripts to communicate through a message queue, that queue needs to live either in one of the communicating processes (i.e. the server) or in a third process.
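As a hedged sketch of that last option, multiprocessing.managers.BaseManager can expose a plain Queue over a local socket, so a second, independently started script can connect to it. The address, port, authkey and method name below are illustrative assumptions.

```python
# server.py -- this process owns the queue and exposes it over a local socket.
import queue
from multiprocessing.managers import BaseManager

shared_queue = queue.Queue()

class QueueManager(BaseManager):
    pass

QueueManager.register("get_queue", callable=lambda: shared_queue)

if __name__ == "__main__":
    manager = QueueManager(address=("127.0.0.1", 50000), authkey=b"change-me")
    server = manager.get_server()
    print("queue server listening on 127.0.0.1:50000")
    server.serve_forever()
```

```python
# client.py -- a separately started script that connects to the same manager
# and uses the queue proxy as if it were local.
from multiprocessing.managers import BaseManager

class QueueManager(BaseManager):
    pass

QueueManager.register("get_queue")

if __name__ == "__main__":
    manager = QueueManager(address=("127.0.0.1", 50000), authkey=b"change-me")
    manager.connect()
    q = manager.get_queue()
    q.put({"task": "hello from the client"})
```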
I've been scratching my head trying to figure out if this is possible.
I have a server program running with about 30 different socket connections to it from all over the country. I need to update this server program now, and although the client devices will automatically reconnect, it's not totally reliable.
I was wondering if there is a way of saving the socket objects to a file and loading them back up when the server restarts, or of forcefully keeping a socket open even after the program stops. That way the clients would never disconnect at all.
Could really do with hot-swappable code here!
Solution 1.
It can be done with some process magic, at least under Linux (although I believe a similar Windows API exists). First of all, note that sockets cannot be stored in a file; these objects are temporary by their nature. But you can keep them alive in a separate process. Have a look at this:
Can I open a socket and pass it to another process in Linux
So one way to accomplish this is the following:
Create a "keeper" process at some point (make sure that the process is not a child of the main process so that it stays alive when the main process is gone)
Send all sockets to the keeper process via sendmsg() with SCM_RIGHTS (see the sketch after this list)
Shutdown the main process
Do whatever update you have to
Fire up the updated main process
Retrieve sockets from the keeper process
Shutdown the keeper process
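To make the descriptor-passing step concrete, here is a rough sketch using socket.send_fds()/recv_fds() (Python 3.9+), which wrap sendmsg() with SCM_RIGHTS. The Unix socket path, buffer size and fd limit are my own assumptions.

```python
# Rough sketch of handing a live client socket to a "keeper" process over a
# Unix domain socket. socket.send_fds()/recv_fds() (Python 3.9+) wrap
# sendmsg() with SCM_RIGHTS. Path and sizes are illustrative assumptions.
import os
import socket

KEEPER_PATH = "/tmp/keeper.sock"

def hand_over(client_sock):
    """Run in the main server: give one client socket to the keeper."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as ctrl:
        ctrl.connect(KEEPER_PATH)
        socket.send_fds(ctrl, [b"fd"], [client_sock.fileno()])

def keeper_loop():
    """Run in the keeper process: accept descriptors and keep them open."""
    held = []
    if os.path.exists(KEEPER_PATH):
        os.unlink(KEEPER_PATH)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as srv:
        srv.bind(KEEPER_PATH)
        srv.listen()
        while True:
            conn, _ = srv.accept()
            with conn:
                _, fds, _, _ = socket.recv_fds(conn, 1024, maxfds=16)
                # Re-wrap the raw descriptors so they stay open here even
                # after the main process exits.
                held.extend(socket.socket(fileno=fd) for fd in fds)
```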
However, this solution is quite difficult to maintain. You have two separate processes, and it is unclear which is the master and which is the slave. So you would probably need another master process at the top. Things get nasty very quickly, not to mention security issues.
Solution 2.
Reloading modules as suggested by #gavinb might be a solution. Note however that in practice this often breaks the app. You never know what those modules do under the hood unless you know the code of every single Python file you use. Plus it imposes some restrictions on modules, i.e. they have to be reloadable. For example some modules use inline caching which makes reloading difficult.
Also, once a module is imported by another module, the importer keeps a reference to it. So you not only have to reload it but also update the references in every other module that imported it earlier. The maintenance costs rise very quickly unless you thought about it at the beginning of the project (so that every import is encapsulated for easy reload). And bugs caused by two different versions of a module running in the same process are (I imagine, never having been in this situation though) extremely difficult to find.
Anyway I would avoid that.
Solution 3.
So this is an XY problem. Instead of saving sockets, how about putting a proxy in front of the main server? IMO this is the safest and at the same time the simplest solution. The proxy will communicate with the main server (for example over unix domain sockets), buffer the data, and automatically reconnect to the main server once it is available again. Perhaps you can even reuse some existing tech, e.g. nginx.
No, the sockets are special file handles that belong to the process. If you close the process, the runtime will force close any open files/sockets. This is not Python specific; it is just how operating systems manage resources.
Now what you can do however is dynamically reload one or more modules while keeping the process active. It might take some careful management when you have open sockets, but in theory it should be possible. So yes, hot swappable code is actually supported by Python.
Do some reading and research on "dynamic reloading". The importlib module in Python 3 provides the reload function which is used to:
Reload a previously imported module. The argument must be a module object, so it must have been successfully imported before. This is useful if you have edited the module source file using an external editor and want to try out the new version without leaving the Python interpreter.
I think your critical question is how to hot reload.
And as mentioned by #gavinb, you can import importlib and then use importlib.reload(module) to reload a module dynamically.
Be careful: the argument to reload() must be a module object.
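As a small illustration, here is a minimal reload sketch. The module name handlers and its process() function are hypothetical stand-ins for your own code.

```python
# A minimal sketch of dynamic reloading. The module "handlers" and its
# process() function are hypothetical stand-ins for your own code.
import importlib

import handlers                       # module you expect to edit on disk

def handle_request(data):
    # Always go through the module attribute, so a reload takes effect on
    # the next call instead of being hidden behind a stale local reference.
    return handlers.process(data)

def hot_reload():
    importlib.reload(handlers)        # the argument must be a module object
```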
I'm trying to implement a plugin architecture in Python.
I've started writing it using the threading module, where each plugin is a thread which I invoke using the Thread.start() method (since all plugins subclass BasePlugin, which subclasses Thread). However, I've just come across the multiprocessing module.
I'm currently wondering if I should switch to the multiprocessing module and share data using shared memory / Pipes etc...
I'd like to get other's opinions on this.
The plugin architecture I've been working on works as follows:
An event is received by the Plugin Manager. The Plugin Manager checks for all the plugins who've subscribed to that type of event. It activates them and sends them the event object (since it holds additional information). If one of the plugins is already active there is no need to spawn it (just send the event object to it).
In addition there are a few resources which belong only to one plugin at any point in time. Each plugin can request the resource (I'm not worrying about any race condition here since there won't be that many plugins active at once).
Threads share memory with the primary process and each other. For example you can have a list that is available to all threads. An item appended to a list can be seen by other threads. But you have to be careful. You have to understand which operations on data structures are thread safe and which are not. What happens to the behaviour of your program when two threads are checking for the existence of a key in a dictionary and then writing to it?
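Here is a small sketch of that check-then-write hazard; the key name and iteration count are arbitrary.

```python
# The read and the write on the shared dict are separate steps, so a thread
# switch can land in between and updates may be lost.
import threading

shared = {}
lock = threading.Lock()

def unsafe_count(key):
    for _ in range(100_000):
        shared[key] = shared.get(key, 0) + 1      # read-modify-write, not atomic

def safe_count(key):
    for _ in range(100_000):
        with lock:                                 # serialise the whole update
            shared[key] = shared.get(key, 0) + 1

if __name__ == "__main__":
    threads = [threading.Thread(target=safe_count, args=("hits",)) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # 400000 with the lock; with unsafe_count the total can come up short,
    # depending on where the interpreter switches threads.
    print(shared["hits"])
```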
Multiple processes do not share memory. A new process that you fork gets a copy of the parent's memory at the point where it was created, and changes made after that are not seen by the other process.
Threads use fewer resources but can be hard to reason about. On the other hand, communication between processes is tricky, and you can't just access an arbitrary Python data structure, which it sounds like you want to be able to do.
A badly written plugin, if it was in a thread, could crash your whole program. Whereas if it was in a separate process this wouldn't happen. Maybe that's a consideration?
I need to share a queue between two applications on the same machine: one is a Tornado app which will occasionally add messages to that queue, and the other is a Python script run from cron which adds new messages on every iteration. Can anyone suggest a module for this?
(Can this be solved with Redis? I want to avoid using MySQL for this purpose.)
I would use Redis with a list. You can LPUSH an element onto the head and RPOP to remove one from the tail.
See redis rpop and redis lpushx.
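A minimal sketch with the redis-py client, assuming a Redis server on localhost; the queue name "jobs" is made up. BRPOP lets the consumer block instead of polling.

```python
# pip install redis -- the queue name and connection details are assumptions.
import json
import redis

r = redis.Redis(host="localhost", port=6379, db=0)

# Producer side (the cron script or a Tornado handler):
r.lpush("jobs", json.dumps({"path": "/tmp/example.txt", "action": "process"}))

# Consumer side: BRPOP blocks until an item is available.
_queue_name, raw = r.brpop("jobs")
job = json.loads(raw)
print(job)
```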
The purest way I can think of to do this is with IPC. Python has very good support for IPC between two processes when one process spawns another, but not in your scenario. There are python modules for ipc such as sysv_ipc and posix_ipc. But if you are going to have your main application built in tornado, why not just have it listen on a zeromq socket for published messages.
Here is a link with more information. You want the Publisher-Subscriber model.
http://zeromq.github.io/pyzmq/eventloop.html#tornado-ioloop
Your cron job will start and publish messages to a zeromq socket. Your already-running application will receive them as a subscriber.
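A bare-bones pyzmq sketch of that split is below; the endpoint address is an assumption, and in the real app the subscriber side would be wired into Tornado's ioloop as in the link above.

```python
# Minimal publisher/subscriber sketch with pyzmq. The endpoint address is an
# illustrative assumption.
import time
import zmq

ENDPOINT = "tcp://127.0.0.1:5556"

def cron_publisher():
    ctx = zmq.Context()
    pub = ctx.socket(zmq.PUB)
    pub.connect(ENDPOINT)      # the long-running app owns the bind side
    time.sleep(0.2)            # give the connection a moment to establish;
                               # a short-lived publisher can otherwise drop messages
    pub.send_json({"task": "rotate-logs"})
    pub.close()
    ctx.term()

def tornado_subscriber():
    ctx = zmq.Context()
    sub = ctx.socket(zmq.SUB)
    sub.bind(ENDPOINT)
    sub.setsockopt_string(zmq.SUBSCRIBE, "")   # subscribe to everything
    while True:
        print("received:", sub.recv_json())
```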
Try RabbitMQ for hosting the queue independent of your applications, then access using Pika, which even comes with a Tornado adapter. Just pick the appropriate model: queue/exchange/topic and protocol of the message you want (strings, json, xml, yaml) and you are set.
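If you go that route, a bare-bones Pika sketch (1.x API, plain blocking connection rather than the Tornado adapter) looks roughly like this; the queue name and localhost broker are assumptions.

```python
# pip install pika -- assumes a RabbitMQ broker running on localhost.
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
channel.queue_declare(queue="tasks", durable=True)

# Producer side: publish a message to the "tasks" queue.
channel.basic_publish(exchange="", routing_key="tasks", body='{"job": 1}')

# Consumer side: acknowledge each message after handling it.
def on_message(ch, method, properties, body):
    print("got", body)
    ch.basic_ack(delivery_tag=method.delivery_tag)

channel.basic_consume(queue="tasks", on_message_callback=on_message)
channel.start_consuming()
```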
What packages should I look at for writing a python daemon and processing jobs? Also, what do I need to do for a python daemon?
I'm pretty happy with beanstalkd, which has client libraries available in various languages:
Daemon:
http://kr.github.com/beanstalkd/
Python client library:
http://code.google.com/p/pybeanstalk/
Your question is a bit ambiguous, but I'm assuming you mean you would like to write a python daemon that will process jobs that get thrown in a queue. If not, please say as much. :-)
I've heard a lot of great things about redis. The folks at github built resque as a job processing daemon for Ruby. If you're language flexible, you could just use that, but if you're not, you could emulate it in as much or as little depth as you like making use of redis as your queue system. Depending on how pluggable and extensible you need it to be, this could be a really simple thing to implement.
Another option I ran across after some more googling is redqueue. It looks like it might already implement most of a job queue.
If you're using django, you may wish to consider the Celery project. It's a job queue system based on RabbitMQ which is yet another queuing server with excellent reviews.
As far as creating a daemon in python, there are a number of options. You can look at this page on activestate, which is a good start. Better yet, you can use python-daemon to do it all for you. But if you use one of the above options or beanstalkd as recommended by mczepiel, you probably won't have to make your process run as a daemon.
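If you do end up daemonising it yourself, a minimal sketch with the python-daemon package might look like the following; the log path and the worker loop are placeholders for your own job-processing code.

```python
# pip install python-daemon -- a minimal sketch (PEP 3143 style).
import time

import daemon

def process_jobs_forever():
    while True:
        # pull a job from your queue of choice (beanstalkd, redis, ...) here
        time.sleep(1)

if __name__ == "__main__":
    log = open("/tmp/worker.log", "a+")
    with daemon.DaemonContext(stdout=log, stderr=log):
        process_jobs_forever()
```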
I have recently (this week) implemented a queue in RabbitMQ with a Python daemon extracting the information and storing it in a database (using the Django ORM). The daemon has an intermediate buffer, so it waits a little and writes to the database in batches instead of writing every time a small message arrives.
I've done the integration with the queue using this little flopsy module, which is easy to set up. The only problem I had was setting up a timeout for waiting for a message, as the module doesn't have a clear way of doing that. After a while playing with the interactive shell and running a few dir() calls, I managed to get to the socket object and set the timeout.
I also considered Celery, but it seems to be more focused on using RabbitMQ internally to let you launch tasks (periodically or asynchronously), rather than on using a queue to communicate with other systems. In our case, the queue can be fed both by Python systems and by Ruby ones.
Once I completed the process, I made some adjustments to allow running it as a daemon (mostly redirecting standard output to a file to allow easy logging) and then created a bash script that launches a start-stop-daemon command. I've followed more or less this scheme.
I discovered python-daemon just about one day too late, so now that the work is done it makes no sense to revisit it, but it may make more sense for a Python project.