Need queue module to be shared between two applications

I need to share a queue between two applications on the same machine. One is a Tornado server that will occasionally add messages to the queue; the other is a Python script run from cron that adds new messages on every iteration. Can anyone suggest a module for this?
(Can this be solved with redis? I would rather not use MySQL for this purpose.)

I would use redis with a list. You can push an element onto the head of the list, and use rpop to remove one from the tail.
See redis rpop
and redis lpushx
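A minimal sketch of that list-as-queue idea, assuming the redis-py client and a key name (`shared:messages`) that I made up for illustration. Note that LPUSHX only pushes when the list already exists, so this sketch uses plain LPUSH, and BRPOP (the blocking variant of RPOP) on the consuming side. Both the Tornado app and the cron script would call `enqueue`; whoever consumes the queue calls `dequeue`.

```python
import json

QUEUE_KEY = "shared:messages"  # hypothetical key name, pick your own


def enqueue(client, message):
    """Push a JSON-encoded message onto the head of the list."""
    client.lpush(QUEUE_KEY, json.dumps(message))


def dequeue(client, timeout=1):
    """Pop the oldest message from the tail, waiting up to `timeout` seconds.

    Returns None if nothing arrived in time.
    """
    item = client.brpop(QUEUE_KEY, timeout=timeout)
    if item is None:
        return None
    _key, raw = item  # brpop returns a (key, value) pair
    return json.loads(raw)


# Usage, assuming a redis server on localhost:
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   enqueue(client, {"source": "cron", "body": "hello"})
#   print(dequeue(client))
```

Messages are JSON-encoded here only so that both applications agree on a format; any serialization both sides understand would do.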

The purest way I can think of to do this is with IPC. Python has very good support for IPC between two processes when one process spawns the other, but that is not your scenario. There are Python modules for IPC such as sysv_ipc and posix_ipc. But if your main application is going to be built in Tornado, why not just have it listen on a zeromq socket for published messages?
Here is a link with more information. You want the Publisher-Subscriber model.
http://zeromq.github.io/pyzmq/eventloop.html#tornado-ioloop
Your cron job will start and publish messages to a zeromq socket. Your already running application will receive them as a subscriber.

Try RabbitMQ for hosting the queue independent of your applications, then access using Pika, which even comes with a Tornado adapter. Just pick the appropriate model: queue/exchange/topic and protocol of the message you want (strings, json, xml, yaml) and you are set.

Related

Is it possible to use multiprocessing.Queue to communicate between TWO python scripts?

I have just learned about Python concurrency and its library module multiprocessing. Most examples I have encountered are within ONE Python script that spawns several processes and communicates among them using multiprocessing.Queue.
My question is: without using a message broker or a third supervising application, can TWO Python scripts communicate with each other using multiprocessing.Queue?

The multiprocessing module is a package that supports spawning processes, so that you can write code that executes in parallel. This means that you can write one python script that spawns multiple processes transparently, without worrying much about how these processes serialize data & pass it to each-other.
As for your question, it depends... Why do they need to be separate?
If the only concern is that your functions are defined in different modules/scripts, you can just import everything you need in the script that uses the Queue and make all your functions available in one script.
If your use-case is that you want one script to wait for requests (server) & the other script to be a client (it sends requests to the server when needed and waits for response), then you need to implement some sort of RPC protocol.
You can make an http server using web frameworks like Flask & send http requests to it from the client, or if you only need to share short simple messages, you can implement your own message exchange protocol using sockets.
So to sum up: It is possible for 2 python processes to communicate without a message broker (e.g: through sockets). But you want to use multiprocessing if you want to run 1 python script that spawns multiple processes that can communicate with one-another. If instead you need to start 2 independent scripts and have one of them request the other one to do some work & return the output, you need to implement some RPC protocol between them. The multiprocessing.Queue object itself is not a replacement for message brokers. If you want independent scripts that are started independently to communicate through a message queue, that queue needs to live either in one of the processes that are communicating (i.e: the server), or in a 3rd process.
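To illustrate the last point (the queue has to live in one process, and the other scripts connect to it), here is a rough sketch using the standard library's multiprocessing.managers.BaseManager. The address, the authkey and the names `make_server` / `connect` are placeholders I chose for the example, not anything from the question.

```python
import queue
from multiprocessing.managers import BaseManager

AUTHKEY = b"change-me"  # hypothetical shared secret, must match on both sides


class QueueServer(BaseManager):
    """Manager used by the process that owns the queue."""


class QueueClient(BaseManager):
    """Manager used by any script that connects to the queue."""


# Clients only need to know the name of the exposed callable.
QueueClient.register("get_queue")


def make_server(address=("127.0.0.1", 50000)):
    """Create the queue and a server exposing it; call .serve_forever() on the result."""
    shared = queue.Queue()
    QueueServer.register("get_queue", callable=lambda: shared)
    manager = QueueServer(address=address, authkey=AUTHKEY)
    return manager.get_server()


def connect(address=("127.0.0.1", 50000)):
    """Return a proxy to the queue living in the server process."""
    manager = QueueClient(address=address, authkey=AUTHKEY)
    manager.connect()
    return manager.get_queue()
```

One script would run `make_server().serve_forever()`; the other scripts call `connect()` and then use `put()` / `get()` on the returned proxy as if it were a local queue.Queue.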

Python Server, Job Queue, Launch Multiprocessing Job

I need to create a Python server that can accept multiple job requests. It should process the jobs one at a time, but still be able to accept new jobs while it is processing a task.
Does anyone have any suggestions on how to do this?
Thanks

Sure. Create a multiprocessing.Pool which will by default spawn one process per core. Then use the original process to run an HTTP service or something else that accepts jobs via some protocol. The main process then listens for new requests and submits them to the pool for async processing.

Use twisted. Twisted is an event-driven networking engine. Twisted also supports many common network protocols, including SMTP, POP3, IMAP, SSHv2, and DNS.

Python stdout to queue broker or a websocket

Is there a way to push stdout into a queue broker or to a websocket?
So far I've been unable to find a clear explanation on how to do this.
I have several processes running in parallel and the idea is to create a UI where you can switch from process to process and take a look at what they are doing.

One approach that will work (and is non-blocking, can serve multiple clients) is using Python, Twisted and Autobahn:
Connect one or multiple ProcessProtocol instances
http://twistedmatrix.com/documents/current/core/howto/process.html
to a Twisted WebSocket server
https://github.com/tavendo/AutobahnPython
https://github.com/tavendo/AutobahnPython/tree/master/examples/websocket/broadcast
https://github.com/tavendo/AutobahnPython/tree/master/examples/wamp/pubsub/simple
Disclosure: I am author of Autobahn.

How can I offer concurrency with Pika in long-working consumers?

Short version: How can I prevent blocking Pika in a Remote Procedure Call situation?
Long version:
None of the Pika examples demonstrate my use case.
I have a Tornado server which communicates with other processes/machines over AMQP (RabbitMQ, Pika). These other processes are not very well-defined, but they will, for the most part, be returning data (see the RPC example on RabbitMQ's website). Sometimes, a process might need to take an extremely long time to process a large amount of information, but it shouldn't completely block smaller requests from being taken by the process. Or maybe the remote server is blocking because it sent out a web request. Think of it like a web server, but using AMQP instead of HTTP.
Since Pika documentation claims that it's not thread-safe, I cannot pass the connection to multiple threads (or processes, for that matter). What I want to do is start a new process, and add a socket event (for the pipe to that program) to the Pika IOLoop, as I would be able to do with Tornado. The Pika IOLoop is much different from the Tornado IOLoop, and it doesn't seem to support adding multiple handlers; it seems to operate using one "poller" on one socket.
I'd like to avoid requiring the Tornado package for this package, because I would only be using the IOLoop. It's not out of the question, but I want to see what my other options are, or if there is a solution to my problem by somehow connecting multiple Pika IOLoops/Pollers. RabbitMQ's documentation says that workers can often be "scaled up" by adding more. I'd like to avoid creating a connection for every request that comes in (if they're coming in fast).

From what you described, I believe you unfortunately either need a different communication model or need multiple Pika IOLoops/Pollers/Redundant Connections.
It sounds like from documentation and from other sites that RPC in Pika is always a blocking statement and unable to be passed around between threads. See http://www.rabbitmq.com/tutorials/tutorial-six-python.html where the author points out that RPC in Pika is inherently blocking once you actually call the ioloop.
"When in doubt avoid RPC. If you can, you should use an asynchronous pipeline - instead of RPC-like blocking"
If you want to keep sending multiple RPC calls on the same connection before one completes, you'll need a different Asynchronous model. Multiple RPC calls on the same connection before completion isn't the usual implementation of the RPC model, though it's not technically forbidden ( http://pic.dhe.ibm.com/infocenter/aix/v6r1/index.jsp?topic=%2Fcom.ibm.aix.progcomm%2Fdoc%2Fprogcomc%2Frpc_mod.htm ). I don't think Pika operates with this model, though it does have asynchronous support via callbacks (not what you are looking for I think).
If you just want to easily be able to generate new connections on the fly you could use a thread or process wrapper on a connection, where you create and block on the RPC in the other context and push to a common Queue which the main thread can monitor. Tornado might give you this, but I agree that it's a bit of overkill, and making such a connection wrapper shouldn't be all that difficult as I've done something similar for other I/O ops in less than 100 lines of Python (see Queue package for Threaded wrapper version). I think you already saw this possibility though based on your talk of multiple IOLoops.
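Roughly what I mean by the threaded wrapper, as a sketch only: `blocking_call` stands in for whatever function opens its own connection and blocks on the RPC (it is a placeholder, not a Pika API), and `run_in_background` is a name I made up.

```python
import queue
import threading


def run_in_background(blocking_call, request, results):
    """Run blocking_call(request) in its own thread and report to `results`.

    Each item put on `results` is a (request, response, error) tuple, so the
    main thread can tell which request finished and whether it failed.
    """
    def worker():
        try:
            results.put((request, blocking_call(request), None))
        except Exception as exc:  # report the failure instead of losing it
            results.put((request, None, exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Usage: the main thread keeps one common queue and polls it.
#   results = queue.Queue()
#   run_in_background(my_rpc, {"op": "big_report"}, results)
#   run_in_background(my_rpc, {"op": "ping"}, results)
#   request, response, error = results.get()
```

Since every worker creates and uses its own connection inside `blocking_call`, nothing is shared between threads except the results queue, which is thread-safe.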

How to expose a queue from a python script to other software?

I wrote a daemon script in Python that takes dicts from a queue and processes files based on the information in those dicts. Now I want to insert some additional dicts into that queue from a separate Django script. Is it possible to expose the queue as a file to other software? If not, is there any other solution?
My project runs on debian linux.

If you start the daemon from the django script, then you just need to use the object's methods (or directly access its queue) from the django script.
If the daemon is already started, then you need inter-process communication. Sockets or pipes are some options. Regularly checking a file's content is another solution, but not as responsive.
You might take a look at the official documentation.
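For the socket option, a small sketch assuming a Unix domain socket (fine on Debian) and newline-delimited JSON as the wire format. The socket path and the names `JobHandler`, `make_listener` and `send_job` are mine, chosen only for illustration.

```python
import json
import queue
import socket
import socketserver

jobs = queue.Queue()  # stands in for the daemon's existing queue


class JobHandler(socketserver.StreamRequestHandler):
    """Read one JSON dict per line and put it on the daemon's queue."""

    def handle(self):
        for line in self.rfile:
            jobs.put(json.loads(line))


def make_listener(path):
    """Daemon side: bind the socket; run .serve_forever() in a thread."""
    return socketserver.UnixStreamServer(path, JobHandler)


def send_job(path, job):
    """Client side (e.g. the Django script): send one dict to the daemon."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        sock.sendall(json.dumps(job).encode("utf-8") + b"\n")
```

The daemon would start the listener in a background thread next to its normal processing loop, so dicts sent from Django end up on the same queue it already consumes.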

I'm not a big fan of IPC when it comes down to rather trivial communication setups. Building a network-based client-server model also adds a lot of overhead, since the two processes will most probably be running on the same machine.
You could create a file-based queue: either pickle it or use some other serialization format.
The client, your separate Django script, will fill that file.
Your daemon will watch the file and append the deserialized or unpickled queue objects to the daemon's queue object.
Use pyinotify to watch the file if you're running a GNU/Linux OS.
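A sketch of the file-based variant, assuming one JSON document per line instead of pickle and leaving the file watching out (the daemon would call `read_new_jobs` whenever it is notified of a change, or on a timer). The function names are made up for the example.

```python
import json


def append_job(path, job):
    """Client side: append one dict to the queue file as a JSON line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(job) + "\n")


def read_new_jobs(path, offset=0):
    """Daemon side: return (jobs, new_offset) for lines written after `offset`.

    The file is read in binary mode so that `offset` is a plain byte count.
    A trailing line without a newline is treated as a partial write and is
    left for the next call.
    """
    jobs = []
    with open(path, "rb") as fh:
        fh.seek(offset)
        for line in fh:
            if not line.endswith(b"\n"):
                break
            jobs.append(json.loads(line))
            offset += len(line)
    return jobs, offset
```

The daemon keeps the returned offset between calls and puts each returned dict on its own queue. The file grows forever in this sketch, so a real setup would need some rotation or truncation scheme.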
