I'm using a Queue in Python, and it works something like a channel: whenever one thread inserts a value, another thread that is waiting fetches it. That value is then yielded.
@classsynchronized('mutex')
def send(self, message):
    # This might raise Full
    if not self._is_closed:
        self._queue.put(message)
        return True
    return False

@classsynchronized('mutex')
def close(self):
    # Don't do what we don't need to
    if self._is_closed:
        return
    # Make the _queue.get() request fail with an Empty exception.
    # This will cause the channel to stop listening for messages.
    # First acquire the write lock, then notify the read lock and
    # finally release the write lock. This is equivalent to an
    # empty write, which will cause the Empty exception.
    print("ACQUIRING not_full")
    self._queue.not_full.acquire()
    # Close first. If the queue is empty it will raise Empty as fast as
    # possible, instead of waiting for the timeout
    self._is_closed = True
    try:
        print("NOTIFYING not_empty")
        self._queue.not_empty.notify()
        print("NOTIFIED not_empty")
    finally:
        self._queue.not_full.release()
        print("RELEASED not_full")

def _yield_response(self):
    try:
        while True:
            # Fetch from the queue; wait until available, or a timeout
            timeout = self.get_timeout()
            print("[WAITING]")
            message = self._queue.get(True, timeout)
            print("[DONE WAITING] " + message)
            self._queue.task_done()
            # Don't yield messages on closed queues, flush instead.
            # This prevents writing to a closed stream, but it's
            # up to the user to close the queue before closing the stream
            if not self._is_closed:
                yield message
            # The queue is closed, ignore all remaining messages.
            # Allow subclasses to change the way ignored messages are handled
            else:
                self.handle_ignored(message)
    # This exception will be thrown when the channel is closed or
    # when it times out. Close the channel, in case a timeout caused
    # an exception
    except Empty:
        pass
    # Make sure the channel is closed; we can get here by timeout
    self.close()
    # Finally, empty the queue, ignoring all remaining messages
    try:
        while True:
            message = self._queue.get_nowait()
            self.handle_ignored(message)
    except Empty:
        pass
I only included the relevant methods, but note that this is a class. The problem is that this does not behave as I expected: the queue does get closed and all the prints show up in the console, but the thread waiting for messages never gets notified. Instead, it always exits with a timeout.
All @classsynchronized('mutex') annotations synchronize methods class-wide by identifier; that is, every method in the class annotated with the same ID ('mutex') is synchronized with every other such method.
The reason I acquire the not_full lock before closing is to prevent inserting into a closed channel. Only then do I notify the not_empty lock.
Any idea why this doesn't work? Any other suggestions?
Thanks in advance.
Edit:
I made a few changes to the prints. I create the channel and immediately send a message, then I send an HTTP request to delete it. This is the output:
[WAITING]
[DONE WAITING] message
[WAITING]
ACQUIRING not_full
NOTIFYING not_empty
NOTIFIED not_empty
RELEASED not_full
So:
The first message gets processed and successfully dispatched (I get it in the client, so...)
Then the queue is waiting. It should be waiting on the not_empty lock, right?
Then I issue a DELETE request for the channel. It acquires the not_full lock (to prevent writes) and notifies the not_empty lock.
I really don't get it... if the thread gets notified, why does it not unblock?
It seems like a bad idea to tamper with Queue's internal locks. How about formulating the problem in terms of only the official interface of Queue?
To emulate closing a queue, for example, you put None or some other special value with self._queue.put(None). A waiting thread that gets this special value knows the queue has been closed. The problem is that the special value is then no longer in the queue for potentially more threads to see, but this is easily fixed: when a thread gets the special value, it immediately puts it back into the queue.
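For illustration, here is a minimal sketch of that sentinel approach; the Channel wrapper and its method names are mine, not taken from the code above:

import queue

_CLOSED = object()  # sentinel; None also works if None is never a real message

class Channel:
    def __init__(self):
        self._queue = queue.Queue()

    def send(self, message):
        self._queue.put(message)

    def close(self):
        # an ordinary put, so no fiddling with Queue's internal locks is needed
        self._queue.put(_CLOSED)

    def messages(self):
        while True:
            message = self._queue.get()
            if message is _CLOSED:
                # put the sentinel back so any other waiting reader also wakes up
                self._queue.put(_CLOSED)
                return
            yield message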
I used a dummy insertion instead.
self._queue.put(None)
This wakes up the other thread, and I know we're closing the channel because the message is None.
Related
I have a single background process running alongside the main one; they use a Queue to communicate (with multiprocessing, not multithreading). The main process runs constantly, and the background process runs once per queue item, so if it gets backlogged it can still catch up. Instead of closing with the main script (I've enabled daemon for that), I would prefer it to run until the queue is empty, then save and quit.
It's started like this:
q_send = Queue()
q_recv = Queue()
p1 = Process(target=background_process, args=(q_send, q_recv))
p1.daemon = True
p1.start()
Here's how the background process currently runs:
while True:
    received_data = q_recv.get()
    # do stuff
One way I've considered is to switch the loop to run all the time but check the size of the queue before trying to read it, waiting a few seconds if it's empty before trying again. There are a couple of problems, though. The whole point is that it runs once per item, so if there are 1000 queued commands it seems a little inefficient to check the queue size before each one. Also, there's no real limit on how long the main process can go without sending an update, so I'd have to set the timeout quite high, as opposed to exiting instantly when the connection is broken and the queue has emptied. With the background process using up to 2 GB of RAM, it could do with exiting as soon as possible.
It'd also make the code look a lot messier:
afk_time = 0
while True:
    if afk_time > 300:
        return
    if not q_recv.qsize():
        time.sleep(2)
        afk_time += 2
    else:
        received_data = q_recv.get()
        # do stuff
I came across is_alive(), and thought perhaps getting the main process from current_process() might work, but it gave a pickling error when I tried to send it through the queue.
Queue.get has a keyword argument timeout which determines how long to wait for an item if the queue is empty. If no item is available when the timeout elapses, an Empty exception is raised.
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
So you can catch that exception and break out of the loop:
try:
    received_data = q_recv.get(timeout=300)
except queue.Empty:
    return
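Putting that into the background loop from the question might look roughly like this; save_results is a placeholder for whatever saving you need to do before quitting:

import queue  # multiprocessing.Queue.get raises queue.Empty on timeout

def background_process(q_send, q_recv):
    while True:
        try:
            received_data = q_recv.get(timeout=300)
        except queue.Empty:
            # no update from the main process for 5 minutes: save and quit
            save_results()
            return
        # do stuff with received_data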
I've written a basic utility that listens for messages in one thread, adds them to a FIFO queue and processes them in another thread. Each message takes a fixed time to process (it's waiting for a blinking light to stop blinking), but messages can arrive randomly. patterns in the code is a dictionary mapping regexes to color patterns; if an incoming message matches one of the regexes, the corresponding color pattern to blink is added to the queue.
blink_queue = Queue()

def receive(data):
    message = data['text']
    for pattern in patterns:
        if re.match(pattern, message):
            blink_queue.put(patterns[pattern])
            break
    return True

def blinker(q):
    while True:
        args = q.get().split()
        subprocess.Popen(
            [blink_app] + args,
            startupinfo=startupinfo,
            stderr=subprocess.PIPE,
            stdout=subprocess.PIPE)
        time.sleep(blink_wait)
        q.task_done()

def subscribe():
    print("Listening for messages on '%s' channel..." % channel)
    pubnub.subscribe({
        'channel': channel,
        'callback': receive
    })

blink_worker = Thread(target=blinker, args=(blink_queue,))
blink_worker.daemon = True
blink_worker.start()

sub_thread = Thread(target=subscribe)
sub_thread.daemon = True
sub_thread.start()
sub_thread.join()
How do I implement a FIFO queue in Python that automatically trims the oldest (first) entries if it grows too big? Do I create another watching thread, or do I keep the size in check from the subscribe thread? I'm really new to Python, so if there is a totally logical data type, feel free to call me a noob and point me in the right direction.
Turns out there is a logical type collections.deque. From the documentation:
If maxlen is not specified or is None, deques may grow to an arbitrary
length. Otherwise, the deque is bounded to the specified maximum
length. Once a bounded length deque is full, when new items are added,
a corresponding number of items are discarded from the opposite end.
(and here is the commit that implements this datatype)
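A minimal sketch, assuming a cap of 100 entries (an arbitrary choice); note that unlike Queue, a deque has no blocking get(), so the consumer side needs its own wait/poll:

from collections import deque

blink_queue = deque(maxlen=100)

blink_queue.append('red red green')   # producer: once full, the oldest entry is dropped
if blink_queue:                       # consumer: popleft() never blocks
    args = blink_queue.popleft().split()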
For this I would subclass Queue and override the put method to remove items in the fashion you desire if the queue gets too large.
e.g.
class NukeOldDataQueue(Queue.Queue):
    def put(self, *args, **kwargs):
        if self.full():
            try:
                oldest_data = self.get()
                print('[WARNING]: throwing away old data: ' + repr(oldest_data))
            # a True value from `full()` does not guarantee
            # that anything remains in the queue when `get()` is called
            except Queue.Empty:
                pass
        Queue.Queue.put(self, *args, **kwargs)
You may also want to pass the block=False parameter or manipulate the timeout parameter depending on how bad it is to accidentally throw away new data or whether blocking on the put() call is acceptable.
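A quick usage sketch of that subclass (maxsize=5 is arbitrary; on Python 3 the base class and exception live in the queue module rather than Queue):

q = NukeOldDataQueue(maxsize=5)
for i in range(10):
    q.put(i)        # once full, the oldest item is discarded before each put
print(q.qsize())    # 5
print(q.get())      # 5 -- items 0 through 4 were thrown away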
I have a consumer which listens for messages, and if the flow of messages is more than the consumer can handle I want to start another instance of this consumer.
But I also want to be able to poll the consumer(s) for information. My thought was that I could use RPC to request this information from the producers by using a fanout exchange, so all the producers get the RPC call.
My question is, first of all, is this possible, and secondly, is it reasonable?
If the question is "is it possible to send an RPC message to more than one server?" the answer is yes.
When you build an RPC call you attach a temporary queue to the message (usually in header.reply_to but you can also use internal message fields). This is the queue where RPC targets will publish their answers.
When you send an RPC to a single server you can receive more than one message on the temporary queue: this means that an RPC answer could be formed by:
a single message from a single source
more than one message from a single source
more than one message from several sources
The problems arising in this scenario are:
when do you stop listening? If you know the number of RPC servers you can wait until each of them has sent an answer; otherwise you have to implement some form of timeout
do you need to track the source of each answer? You can add special fields to your messages to keep this information. The same goes for message order.
Here is some code to show how you can do it (Python with the Pika library). Be aware this is far from perfect: the biggest problem is that you should reset the timeout when you get a new answer.
def consume_rpc(self, queue, result_len=1, callback=None, timeout=None, raise_timeout=False):
    if timeout is None:
        timeout = self.rpc_timeout
    result_list = []

    def _callback(channel, method, header, body):
        print "### Got 1/%s RPC result" % (result_len)
        msg = self.encoder.decode(body)
        result_dict = {}
        result_dict.update(msg['content']['data'])
        result_list.append(result_dict)
        if callback is not None:
            callback(msg)
        if len(result_list) == result_len:
            print "### All results are here: stopping RPC"
            channel.stop_consuming()

    def _outoftime():
        self.channel.stop_consuming()
        raise TimeoutError

    if timeout != -1:
        print "### Setting timeout %s seconds" % (timeout)
        self.conn_broker.add_timeout(timeout, _outoftime)

    self.channel.basic_consume(_callback, queue=queue, consumer_tag=queue)
    if raise_timeout is True:
        print "### Start consuming RPC with raise_timeout"
        self.channel.start_consuming()
    else:
        try:
            print "### Start consuming RPC without raise_timeout"
            self.channel.start_consuming()
        except TimeoutError:
            pass
    return result_list
After some research it seems that this is not possible. If you look at the tutorial on RabbitMQ.com, you see that there is an id for the call which, as far as I understand, gets consumed.
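For reference, a rough sketch of that id (reply_to plus correlation_id) as it appears in the RabbitMQ RPC tutorial, written with Pika; the queue name 'rpc_queue' and the payload are placeholders:

import uuid

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# exclusive, server-named queue where the single reply will be delivered
result = channel.queue_declare(queue='', exclusive=True)
callback_queue = result.method.queue

corr_id = str(uuid.uuid4())
channel.basic_publish(
    exchange='',
    routing_key='rpc_queue',
    properties=pika.BasicProperties(reply_to=callback_queue,
                                    correlation_id=corr_id),
    body='request payload')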
I've chosen to go another way, which is reading the log files and aggregating the data.
I am debugging an application that gathers information from two sensors: a webcam and a microphone.
The general architecture is quite simple :
the main process sends messages (start, stop, get_data) via pipes to the child processes (one for each).
child processes gather the data and send it to the main process
Child & main processes are in infinite loops to process commands (the main process from the user, the child process from the main process).
It works on the whole, but I have trouble stopping the child processes.
I have added logging, and two things seem to happen:
The 'stop' message is sent but doesn't get through the pipe.
The child process continues to send data, and the conn.send(data) call blocks.
The behavior is clearly linked to the state of the connection, as child processes that send nothing back don't show this behavior. Still, I don't see how to debug/modify the current architecture, which seems reasonable.
So, what causes this blocking behavior and how can I avoid it?
This is the code which is executed for each iteration of the infinite loop in the child process :
def do(self):
    while self.cnx.poll():
        msg = self.cnx.recv()
        self.queue.append(msg)
    #==
    if not self.queue:
        func_name = 'default_action'
        self.queue.append([func_name, ])
    #==
    msg = self.queue.pop()
    func_name, args = msg[0], msg[1:]
    #==
    res = self.target.__getattribute__(func_name)(*args)
    #==
    running = func_name != 'stop'
    #==
    if res and self.send:
        assert running
        self.output_queue.append(res[0])
    if self.output_queue and running:
        self.cnx.send(self.output_queue.popleft())
    #==
    return running
Update: it seems that the Pipe cannot be written to simultaneously on both ends. It works if I change the last few lines of the above code to:
if self.output_queue and running:
    if not self.cnx.poll():
        self.cnx.send(self.output_queue.popleft())
The question stays open though, as Pipes are documented as full duplex by default and this behavior is not documented at all. I must have misunderstood something. Please enlighten me!
Update 2: just to be clear, no connection is closed in this situation. To describe the sequence of events:
the main process sends a message ("stop") (it empties the connection before sending the message)
the main process enters an (infinite) loop that stops when the child process has terminated
meanwhile, the child process is blocked in send and never gets the message
A full duplex multiprocessing.Pipe is implemented as socketpair(). Calling .send can block for all the normal reasons when talking to a socket. Based on your description I think it's likely that the reader of your Pipe has quit reading and data has built up in the buffers in the kernel to the point where your .send blocks.
If you explicitly .close the receiving side you'll probably get some kind of error (although possibly SIGPIPE as well, not sure) when you try to .send. If your receiving connection was going out of scope this would probably happen automatically. You may be able to fix the problem by just being more careful not to store references (direct or indirect) to the receiving side so it gets deallocated when that thread goes away.
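A minimal sketch of that point, i.e. that once the receiving end is explicitly closed, .send fails rather than blocks (the exact exception can vary by platform and Python version, hence the broad OSError):

import multiprocessing

a, b = multiprocessing.Pipe()
b.close()                    # receiving side explicitly closed
try:
    a.send("hello world")
except OSError as e:         # typically BrokenPipeError on CPython 3
    print("send failed:", e)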
Trivial demo of blocking .send:
import multiprocessing

a, b = multiprocessing.Pipe()

while True:
    print "send!"
    a.send("hello world")
Now note that after a while it quits printing "send!"
(I'm using the pyprocessing module in this example, but replacing processing with multiprocessing should probably work if you run python 2.6 or use the multiprocessing backport)
I currently have a program that listens on a Unix socket (using a processing.connection.Listener), accepts connections and spawns a thread handling each request. At a certain point I want to quit the process gracefully, but since the accept() call is blocking I see no way of cancelling it nicely. I have one way that works here (OS X) at least: setting a signal handler and signalling the process from another thread, like so:
import processing
from processing.connection import Listener
import threading
import time
import os
import signal
import socket
import errno

# This is actually called by the connection handler.
def closeme():
    time.sleep(1)
    print 'Closing socket...'
    listener.close()
    os.kill(processing.currentProcess().getPid(), signal.SIGPIPE)

oldsig = signal.signal(signal.SIGPIPE, lambda s, f: None)

listener = Listener('/tmp/asdf', 'AF_UNIX')
# This is a thread that handles one already accepted connection, left out for brevity
threading.Thread(target=closeme).start()
print 'Accepting...'
try:
    listener.accept()
except socket.error, e:
    if e.args[0] != errno.EINTR:
        raise
# Cleanup here...
print 'Done...'
The only other way I've thought of is reaching deep into the connection (listener._listener._socket) and setting the non-blocking option... but that probably has some side effects and is generally really scary.
Does anyone have a more elegant (and perhaps even correct!) way of accomplishing this? It needs to be portable to OS X, Linux and BSD, but Windows portability etc is not necessary.
Clarification:
Thanks all! As usual, ambiguities in my original question are revealed :)
I need to perform cleanup after I have cancelled the listening, and I don't always want to actually exit that process.
I need to be able to access this process from other processes not spawned from the same parent, which makes Queues unwieldy
The reasons for threads are that:
They access a shared state. Actually more or less a common in-memory database, so I suppose it could be done differently.
I must be able to have several connections accepted at the same time, but the actual threads are blocking for something most of the time. Each accepted connection spawns a new thread; this is in order not to block all clients on I/O ops.
Regarding threads vs. processes, I use threads for making my blocking ops non-blocking and processes to enable multiprocessing.
Isn't that what select is for?
Only call accept on the socket if select indicates it will not block... select takes a timeout, so you can break out occasionally to check whether it's time to shut down...
I thought I could avoid it, but it seems I have to do something like this:
from processing import connection

connection.Listener.fileno = lambda self: self._listener._socket.fileno()

import select

l = connection.Listener('/tmp/x', 'AF_UNIX')
r, w, e = select.select((l, ), (), ())

if l in r:
    print "Accepting..."
    c = l.accept()
    # ...
I am aware that this breaks the Law of Demeter and introduces some evil monkey-patching, but it seems like the most easy-to-port way of accomplishing this. If anyone has a more elegant solution I would be happy to hear it :)
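For completeness, a sketch of the select-with-a-timeout loop the earlier answer describes, using the same fileno() monkey-patch (written against multiprocessing here; the processing module should work the same way). shutdown_requested is a hypothetical flag set elsewhere, for example by a signal handler or another thread:

import select
from multiprocessing import connection

connection.Listener.fileno = lambda self: self._listener._socket.fileno()

listener = connection.Listener('/tmp/x', 'AF_UNIX')
shutdown_requested = False

while not shutdown_requested:
    readable, _, _ = select.select((listener,), (), (), 5.0)  # wake up every 5 s
    if listener in readable:
        conn = listener.accept()   # select says it's ready, so this won't block
        # hand conn off to a worker thread here

listener.close()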
I'm new to the multiprocessing module, but it seems to me that mixing the processing module and the threading module is counter-intuitive; aren't they targeted at solving the same problem?
Anyway, how about wrapping your listen functions into a process itself? I'm not clear how this affects the rest of your code, but this may be a cleaner alternative.
from multiprocessing import Process
from multiprocessing.connection import Listener

class ListenForConn(Process):
    def run(self):
        listener = Listener('/tmp/asdf', 'AF_UNIX')
        listener.accept()
        # do your other handling here

listen_process = ListenForConn()
listen_process.start()
print listen_process.is_alive()

listen_process.terminate()
listen_process.join()
print listen_process.is_alive()
print 'No more listen process.'
Probably not ideal, but you can release the block by sending the socket some data from the signal handler or the thread that is terminating the process.
EDIT: Another way to implement this might be to use the Connection Queues, since they seem to support timeouts (apologies, I misread your code in my first read).
I ran into the same issue. I solved it by sending a "stop" command to the listener. In the listener's main thread (the one that processes the incoming messages), every time a new message is received I just check whether it's a "stop" command and, if so, exit the main thread.
Here's the code I'm using:
def start(self):
    """
    Start listening
    """
    # set the command being executed
    self.command = self.COMMAND_RUN
    # start the 'listener_main' method as a daemon thread
    self.listener = Listener(address=self.address, authkey=self.authkey)
    self._thread = threading.Thread(target=self.listener_main, daemon=True)
    self._thread.start()

def listener_main(self):
    """
    The main application loop
    """
    while self.command == self.COMMAND_RUN:
        # block until a client connection is received
        with self.listener.accept() as conn:
            # receive the subscription request from the client
            message = conn.recv()
            # if it's a shut down command, return to stop this thread
            if isinstance(message, str) and message == self.COMMAND_STOP:
                return
            # process the message

def stop(self):
    """
    Stops the listening thread
    """
    self.command = self.COMMAND_STOP
    client = Client(self.address, authkey=self.authkey)
    client.send(self.COMMAND_STOP)
    client.close()
    self._thread.join()
I'm using an authentication key to prevent would-be hackers from shutting down my service by sending a stop command from an arbitrary client.
Mine isn't a perfect solution. A better one might be to revise the code in multiprocessing.connection.Listener and add a stop() method, but that would require sending the change through the Python team for approval.
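For completeness, usage of that pattern might look roughly like this; the wrapping class name, address and COMMAND_* values are assumptions, since the answer only shows the methods:

service = MessageListener(address=('localhost', 6000), authkey=b'secret')  # hypothetical class
service.start()   # spawns the daemon thread running listener_main()
# ... application does its work ...
service.stop()    # connects as a client, sends COMMAND_STOP, joins the thread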