Python multiprocessing queue.get with timeout vs sleep

I can't find any documentation on Python's queue get with a timeout, get([block[, timeout]]), whereas there is good documentation on Python's time.sleep() at http://www.pythoncentral.io/pythons-time-sleep-pause-wait-sleep-stop-your-code/.
I've used the Linux time command to time loops of 5, 500 and 5000 iterations over both snippets, with a period of 100 ms, and the two seem similar.
Snippet 1: with queue timeout
while True:
    try:
        if self._queue.get(True, period) == '!STOP!':
            break
    except Empty:  # from Queue import Empty (queue on Python 3)
        # Queue.Empty exception, keep going
        pass
    # -- do stuff here --
Snippet 2: with time.sleep
while True:
    try:
        if self._queue.get_nowait() == '!STOP!':
            break
    except Empty:  # from Queue import Empty (queue on Python 3)
        # Queue.Empty exception, keep going
        pass
    # -- do stuff here --
    time.sleep(period)
Snippet 1 is preferred because instead of sleeping and then checking the queue for the poison pill, it 'sleeps' while checking the queue. Of course it is a pretty moot point, since the period will normally only be between 0.100 and 0.500 seconds, but I want to make sure there isn't something in queue.get that I'm missing.

As you said, the first option is the better choice. Instead of unconditionally sleeping for period, then checking whether anything is in the queue, and then sleeping again, you're actively waiting for something to be put into the queue for the entire period, and only briefly doing something other than waiting for the '!STOP!' to arrive. There are no hidden gotchas; get internally uses time.time() + timeout to decide how long to wait 1) to acquire the internal lock on the queue, and 2) for something to actually be in the queue to get. Here's the relevant code from multiprocessing/queues.py:
if block:
    deadline = time.time() + timeout
if not self._rlock.acquire(block, timeout):  # Waits for up to `timeout` to get the lock
    raise Empty  # raise Empty if it didn't get it
try:
    if block:
        timeout = deadline - time.time()
        # Once it has the lock, waits for however much time is left
        # before `deadline` for something to arrive
        if timeout < 0 or not self._poll(timeout):
            raise Empty
    elif not self._poll():
        raise Empty
    res = self._recv()
    self._sem.release()
    return res
finally:
    self._rlock.release()
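For reference, the pattern from Snippet 1 can be written as a small self-contained function. This is only a sketch under a few assumptions: the names run_worker, do_stuff and STOP are made up for illustration, and a thread with queue.Queue stands in for the separate process so the example stays short (multiprocessing.Queue raises the same queue.Empty exception on timeout).

```python
import queue
import threading

STOP = '!STOP!'  # poison pill, same string as in the snippets above


def run_worker(q, period, do_stuff):
    """Call do_stuff() roughly every `period` seconds until STOP arrives."""
    while True:
        try:
            # Blocks for at most `period` seconds, but returns as soon as
            # an item is put on the queue.
            if q.get(True, period) == STOP:
                break
        except queue.Empty:
            pass  # nothing arrived within `period`; fall through to the work
        do_stuff()


if __name__ == '__main__':
    q = queue.Queue()
    ticks = []
    t = threading.Thread(target=run_worker,
                         args=(q, 0.1, lambda: ticks.append(1)))
    t.start()
    q.put(STOP)  # the worker wakes up immediately instead of finishing a sleep
    t.join()
    print('worker stopped after', len(ticks), 'ticks')
```

The point of the design is that the stop request is noticed as soon as it is put on the queue, not at the end of a fixed sleep.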

Related

How to slice a queue?

I have a queue which I process in a loop
while True:
    # a processing loop
    batch = []
    while True:
        e = q.get()
        if e:
            batch.append(e)
        else:
            # the queue is empty
            break
    do_something_with(batch)
    # wait a moment before emptying the queue again
    time.sleep(2)
The idea is to empty the queue, process its content and wait a moment before checking the content again.
I sometimes hit a race condition where the queue is being fed while I get() an element, and I end up with a growing batch which is never processed further.
One solution would be to check the batch size and process the batch when the size is right. This does not work if few events arrive in the queue and the batch never reaches the required size: I need the events (however many there are) to be processed, not to wait until enough accumulate.
The second solution is to build a check based on both the size and the time the batch has been idle, but this overly complicates the code.
One good solution would be to "get up to n elements from the queue at once". I could not find anything like that in the documentation. Is there a way to pop several elements at once from the queue (à la slicing for a list)?
Queue.get blocks by default; that is the source of the infinite loop.
Queue.get(block=True, timeout=None)
Remove and return an item from the queue. If optional args block is
true and timeout is None (the default), block if necessary until an
item is available. If timeout is a positive number, it blocks at most
timeout seconds and raises the Empty exception if no item was
available within that time. Otherwise (block is false), return an item
if one is immediately available, else raise the Empty exception
(timeout is ignored in that case).
You should use Queue.get_nowait or Queue.get(block=False) to avoid blocking, or use Queue.get(timeout=<seconds>) to wait at most <seconds> when the queue is empty.
The solution mentioned in your question sounds good:
BATCH_SIZE = 10
while True:
    batch = []
    # Get out of the loop once enough items are collected or the queue is empty
    while len(batch) < BATCH_SIZE:
        try:
            e = q.get_nowait()  # OR q.get(timeout=0.1)
        except Empty:
            # To prevent an empty batch, break only when something was collected:
            # if batch:
            #     break
            break
        batch.append(e)
    do_something_with(batch)
    # wait a moment before emptying the queue again
    time.sleep(2)
The workaround I am using now, until a better idea/solution:
while True:
    # get up to 1000 elements from the queue
    batch = []
    for _ in range(1000):
        try:
            e = q.get(block=False)
        except queue.Empty:
            continue
        else:
            batch.append(e)
    do_something_with(batch)
    time.sleep(2)
I may make 1000 useless attempts to get an element (queue empty), or get all 1000 of them (even when the queue grows), or anything in between.
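The "get up to n elements at once" idea can be wrapped in a small helper. The sketch below is illustrative only: drain is a made-up name, not a standard library function, and it stops at the first Empty instead of retrying. It is shown with queue.Queue; with multiprocessing.Queue a freshly put item may not be visible to get_nowait() immediately, so a short timeout may be the safer choice there.

```python
import queue


def drain(q, max_items):
    """Return up to `max_items` items currently available, without blocking."""
    batch = []
    while len(batch) < max_items:
        try:
            batch.append(q.get_nowait())
        except queue.Empty:
            break  # queue is empty right now; stop instead of retrying
    return batch


if __name__ == '__main__':
    q = queue.Queue()
    for i in range(5):
        q.put(i)
    print(drain(q, 3))   # [0, 1, 2]
    print(drain(q, 10))  # [3, 4]
```

Because the helper returns whatever is available, the caller can process a partial batch right away and then sleep before draining again.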

Detect if main process has been quit from background process

I have a single background process running alongside the main one, and it uses Queue to communicate (using multiprocessing, not multithreading). The main process runs constantly, and the background process runs once per queue item so that if it gets backlogged, it can still catch up. Instead of closing with the main script (I've enabled daemon for that), I would prefer it to run until the queue is empty, then save and quit.
It's started like this:
q_send = Queue()
q_recv = Queue()
p1 = Process(target=background_process, args=(q_send, q_recv))
p1.daemon = True
p1.start()
Here's how the background process currently runs:
while True:
    received_data = q_recv.get()
    # do stuff
One way I've considered is to switch the loop to run all the time, but check the size of the queue before trying to read it, and wait a few seconds if it's empty before trying again. There are a couple of problems though. The whole point is that it runs once per item, so if there are 1000 queued commands, it seems a little inefficient to check the queue size before each one. Also, there's no real limit on how long the main process can go without sending an update, so I'd have to set the timeout quite high, as opposed to exiting instantly when the connection is broken and the queue emptied. With the background process using up to 2 GB of RAM, it could probably do with exiting as soon as possible.
It'd also make it look a lot more messy:
afk_time = 0
while True:
    if afk_time > 300:
        return
    if not q_recv.qsize():
        time.sleep(2)
        afk_time += 2
    else:
        received_data = q_recv.get()
        # do stuff
I came across is_alive(), and thought perhaps getting the main process from current_process() might work, but it gave a pickling error when I tried to send it to the queue.
Queue.get has a keyword argument timeout which determines how long to wait for an item if the queue is empty. If no item is available when the timeout elapses, an Empty exception is raised.
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
So you can catch that exception and break out of the loop:
try:
    received_data = q_recv.get(timeout=300)
except queue.Empty:
    return
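Put together, the background loop could look roughly like the sketch below. The function name consume_until_idle and the handle callback are hypothetical, and queue.Queue is used only to keep the example runnable in one process; multiprocessing.Queue raises the same queue.Empty on timeout.

```python
import queue


def consume_until_idle(q, idle_timeout, handle):
    """Process items one by one; return once the queue has stayed
    empty for `idle_timeout` seconds. Returns the number processed."""
    processed = 0
    while True:
        try:
            # Blocks without polling; wakes as soon as an item is available.
            item = q.get(timeout=idle_timeout)
        except queue.Empty:
            return processed  # nothing arrived in time: save and quit here
        handle(item)
        processed += 1


if __name__ == '__main__':
    q = queue.Queue()
    for command in ('a', 'b', 'c'):
        q.put(command)
    seen = []
    count = consume_until_idle(q, 0.1, seen.append)
    print(count, seen)
```

There is no qsize() check per item: a backlog of 1000 commands is consumed back to back, and the timeout only comes into play once the queue is actually empty.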

Interrupting a Queue.get

How can I interrupt a blocking Queue.get() in Python 3.X?
In Python 2.X setting a long timeout seems to work but the same cannot be said for Python 3.5.
Running on Windows 7, CPython 3.5.1, 64 bit both machine and Python.
Seems like it does not behave the same on Ubuntu.
The reason it works on Python 2 is that Queue.get with a timeout on Python 2 is implemented incredibly poorly, as a polling loop with increasing sleeps between non-blocking attempts to acquire the underlying lock; Python 2 doesn't actually feature a lock primitive that supports a timed blocking acquire (which is what a Queue internal Condition variable needs, but lacks, so it uses the busy loop). When you're trying this on Python 2, all you're checking is whether the Ctrl-C is processed after one of the (short) time.sleep calls finishes, and the longest sleep in Condition is only 0.05 seconds, which is so short you probably wouldn't notice even if you hit Ctrl-C the instant a new sleep started.
Python 3 has true timed lock acquire support (thanks to narrowing the number of target OSes to those which feature a native timed mutex or semaphore of some sort). As such, you're actually blocking on the lock acquisition for the whole timeout period, not blocking for 0.05s at a time between polling attempts.
It looks like Windows allows for registering handlers for Ctrl-C that mean that Ctrl-C doesn't necessarily generate a true signal, so the lock acquisition isn't interrupted to handle it. Python is informed of the Ctrl-C when the timed lock acquisition eventually fails, so if the timeout is short, you'll eventually see the KeyboardInterrupt, but it won't be seen until the timeout lapses. Since Python 2 Condition is only sleeping 0.05 seconds at a time (or less) the Ctrl-C is always processed quickly, but Python 3 will sleep until the lock is acquired.
Ctrl-Break is guaranteed to behave as a signal, but it also can't be handled by Python properly (it just kills the process) which probably isn't what you want either.
If you want Ctrl-C to work, you're stuck polling to some extent, but at least (unlike Python 2) you can effectively poll for Ctrl-C while live blocking on the queue the rest of the time (so you're alerted to an item becoming free immediately, which is the common case).
import time
import queue

def get_timed_interruptable(q, timeout):
    stoploop = time.monotonic() + timeout - 1
    while time.monotonic() < stoploop:
        try:
            return q.get(timeout=1)  # Allow check for Ctrl-C every second
        except queue.Empty:
            pass
    # Final wait for last fraction of a second
    return q.get(timeout=max(0, stoploop + 1 - time.monotonic()))
This blocks for a second at a time until:
The time remaining is less than a second (it blocks for the remaining time, then allows the Empty to propagate normally)
Ctrl-C was pressed during the one second interval (after the remainder of that second elapses, KeyboardInterrupt is raised)
An item is acquired (if Ctrl-C was pressed, it will raise at this point too)
As mentioned in the comment thread on the great answer @ShadowRanger provided above, here is an alternate, simplified form of his function:
import queue

def get_timed_interruptable(in_queue, timeout):
    '''
    Perform a queue.get() with a short timeout to avoid
    blocking SIGINT on Windows.
    '''
    while True:
        try:
            # Allow check for Ctrl-C every second
            return in_queue.get(timeout=min(1, timeout))
        except queue.Empty:
            if timeout < 1:
                raise
            else:
                timeout -= 1
And as @Bharel pointed out in the comments, this could run a few milliseconds longer than the absolute timeout, which may be undesirable. As such, here is a version with significantly better precision:
import time
import queue

def get_timed_interruptable_precise(in_queue, timeout):
    '''
    Perform a queue.get() with a short timeout to avoid
    blocking SIGINT on Windows. Track the time closely
    for high precision on the timeout.
    '''
    timeout += time.monotonic()  # from here on, `timeout` is an absolute deadline
    while True:
        try:
            # Allow check for Ctrl-C every second
            return in_queue.get(timeout=max(0, min(1, timeout - time.monotonic())))
        except queue.Empty:
            if time.monotonic() > timeout:
                raise
Just use get_nowait which won't block.
import time
...
while True:
    if not q.empty():
        q.get_nowait()
        break
    time.sleep(1)  # optional timeout
This is obviously busy waiting, but q.get() does basically the same thing.

efficient python raw_input and serial port polling

I am working on a python project that is polling for data on a COM port and also polling for user input. As of now, the program is working flawlessly but seems to be inefficient. I have the serial port polling occurring in a while loop running in a separate thread and sticking data into a Queue. The user input polling is also occurring in a while loop running in a separate thread sticking input into a Queue. Unfortunately I have too much code and posting it would take away from the point of the question.
So is there a more efficient way to poll a serial or raw_input() without sticking them in an infinite loop and running them in their own thread?
I have been doing a lot of research on this topic and keep coming across the "separate thread and Queue" paradigm. However, when I run this program I am using nearly 30% of my CPU resources on a quad-core i7. There has to be a better way.
I have worked with ISRs in C and was hoping there is something similar to interrupts that I could be using. My recent research has uncovered a lot of "Event" libraries with callbacks, but I can't seem to wrap my head around how they would fit my situation. I am developing on a Windows 7 (64-bit) machine but will be moving the finished product to an RPi when I am finished. I'm not looking for code, I just need to be pointed in the right direction. Thank you for any info.
You're seeing the high CPU usage because your main thread is using the non-blocking get_nowait call to poll two different queues in an infinite loop, which means that most of the time the loop is spinning without doing any useful work. Constantly running through the loop uses CPU cycles, just as any tight infinite loop does. To avoid using lots of CPU, you want your infinite loops to use blocking I/O, so that they wait until there's actually data to process before continuing. This way, you're not constantly running through the loop, and therefore not burning CPU.
So, user input thread:
while True:
    data = raw_input()  # This blocks, and won't use CPU while doing so
    queue.put({'type': 'input', 'data': data})
COM thread:
while True:
    data = com.get_com_data()  # This blocks, and won't use CPU while doing so
    queue.put({'type': 'COM', 'data': data})
main thread:
while True:
    data = queue.get()  # This call will block, and won't use CPU while doing so
    # process data
The blocking get call will just wait until it's woken up by a put in another thread, using a threading.Condition object. It's not repeatedly polling. From Queue.py:
# Notify not_empty whenever an item is added to the queue; a
# thread waiting to get is notified then.
self.not_empty = _threading.Condition(self.mutex)
...
def get(self, block=True, timeout=None):
    self.not_empty.acquire()
    try:
        if not block:
            if not self._qsize():
                raise Empty
        elif timeout is None:
            while not self._qsize():
                self.not_empty.wait()  # This is where the code blocks
        elif timeout < 0:
            raise ValueError("'timeout' must be a non-negative number")
        else:
            endtime = _time() + timeout
            while not self._qsize():
                remaining = endtime - _time()
                if remaining <= 0.0:
                    raise Empty
                self.not_empty.wait(remaining)
        item = self._get()
        self.not_full.notify()
        return item
    finally:
        self.not_empty.release()

def put(self, item, block=True, timeout=None):
    self.not_full.acquire()
    try:
        if self.maxsize > 0:
            if not block:
                if self._qsize() == self.maxsize:
                    raise Full
            elif timeout is None:
                while self._qsize() == self.maxsize:
                    self.not_full.wait()
            elif timeout < 0:
                raise ValueError("'timeout' must be a non-negative number")
            else:
                endtime = _time() + timeout
                while self._qsize() == self.maxsize:
                    remaining = endtime - _time()
                    if remaining <= 0.0:
                        raise Full
                    self.not_full.wait(remaining)
        self._put(item)
        self.unfinished_tasks += 1
        self.not_empty.notify()  # This is what wakes up `get`
    finally:
        self.not_full.release()
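The difference between polling with get_nowait and blocking in get can be observed directly by measuring the CPU time each approach consumes while waiting on an empty queue. The following is a rough sketch for Python 3 (the helper names cpu_seconds, polling_wait and blocking_wait are invented for this illustration); the exact numbers depend on the machine, so none are quoted here.

```python
import queue
import time


def cpu_seconds(wait_fn):
    """Return the CPU time this process spent while running wait_fn()."""
    start = time.process_time()
    wait_fn()
    return time.process_time() - start


def polling_wait(q, seconds):
    """Busy loop: call get_nowait() over and over until the deadline."""
    deadline = time.monotonic() + seconds
    while time.monotonic() < deadline:
        try:
            return q.get_nowait()
        except queue.Empty:
            pass
    return None


def blocking_wait(q, seconds):
    """Blocking get: the thread sleeps until an item arrives or time runs out."""
    try:
        return q.get(timeout=seconds)
    except queue.Empty:
        return None


if __name__ == '__main__':
    q = queue.Queue()  # stays empty, so both calls wait the full time
    print('polling :', cpu_seconds(lambda: polling_wait(q, 0.5)))
    print('blocking:', cpu_seconds(lambda: blocking_wait(q, 0.5)))
```

On a typical run the polling variant should report CPU time close to the wall-clock wait, while the blocking variant should report close to zero.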

Python - Threading and a While True Loop

I have a thread that appends rows to self.output and a loop that runs until self.done is True (or the max execution time is reached).
Is there a more efficient way to do this than a while loop that constantly checks whether it's done? The while loop causes the CPU to spike to 100% while it's running.
time.clock()
while True:
    if len(self.output):
        yield self.output.pop(0)
    elif self.done or 15 < time.clock():
        if 15 < time.clock():
            yield "Maximum Execution Time Exceeded %s seconds" % time.clock()
        break
Are your threads appending to self.output here, with your main task consuming them? If so, this is a tailor-made job for Queue.Queue. Your code should become something like:
import Queue
# Initialise queue as:
self.queue = Queue.Queue()
Finished = object()  # Unique marker the producer will put in the queue when finished
# Consumer:
try:
    while True:
        next_item = self.queue.get(timeout=15)
        if next_item is Finished:
            break
        yield next_item
except Queue.Empty:
    print "Timeout exceeded"
Your producer threads add items to the queue with queue.put(item).
[Edit] The original code has a race issue when checking self.done (for example, multiple items may be appended to the queue before the flag is set, causing the code to bail out at the first one). Updated with a suggestion from ΤΖΩΤΖΙΟΥ: the producer thread should instead append a special token (Finished) to the queue to indicate it is complete.
Note: If you have multiple producer threads, you'll need a more general approach to detecting when they're all finished. You could accomplish this with the same strategy: each thread puts a Finished marker on the queue, and the consumer terminates when it has seen num_threads markers.
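The multi-producer variant described in the note might look like the following Python 3 sketch. The function names producer and consume are illustrative, and note one assumption that differs from the question: the timeout here applies to each individual get (maximum idle time between items), not to the total run time.

```python
import queue
import threading

FINISHED = object()  # unique marker; compared with `is`, never equal to real data


def producer(q, rows):
    for row in rows:
        q.put(row)
    q.put(FINISHED)  # one marker per producer signals that it is done


def consume(q, num_producers, timeout=15):
    """Yield items until every producer has sent FINISHED.

    Raises queue.Empty if nothing arrives for `timeout` seconds.
    """
    remaining = num_producers
    while remaining:
        item = q.get(timeout=timeout)  # blocks without spinning
        if item is FINISHED:
            remaining -= 1
        else:
            yield item


if __name__ == '__main__':
    q = queue.Queue()
    threads = [threading.Thread(target=producer, args=(q, [n, n + 1]))
               for n in (0, 10)]
    for t in threads:
        t.start()
    print(sorted(consume(q, len(threads))))  # [0, 1, 10, 11]
    for t in threads:
        t.join()
```

Counting markers avoids any shared "done" flag, so there is no window in which the consumer can see the flag before it has seen the last items.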
Use a semaphore; have the working thread release it when it's finished, and block your appending thread until the worker is finished with the semaphore.
i.e. in the worker, do something like self.done = threading.Semaphore(0) at the beginning of work (the initial value of 0 is what makes the first acquire() block), and self.done.release() when finished. In the code you noted above, instead of the busy loop, simply do self.done.acquire(); when the worker thread is finished, control will return.
Edit: I'm afraid I don't address your needed timeout value, though; this issue describes the need for a semaphore timeout in the standard library.
Use time.sleep(seconds) to create a brief pause after each iteration of the while loop to relinquish the cpu. You will have to set the time you sleep during each iteration based on how important it is that you catch the job quickly after it's complete.
Example:
time.clock()
while True:
    if len(self.output):
        yield self.output.pop(0)
    elif self.done or 15 < time.clock():
        if 15 < time.clock():
            yield "Maximum Execution Time Exceeded %s seconds" % time.clock()
        break
    time.sleep(0.01)  # sleep for 10 milliseconds
use mutex module or event/semaphore
You have to use a synchronization primitive here. Look here: http://docs.python.org/library/threading.html.
Event objects seem very simple and should solve your problem. You can also use a condition object or a semaphore.
I don't post an example because I've never used Event objects, and the alternatives are probably less simple.
Edit: I'm not really sure I understood your problem. If a thread can wait until some condition is satisfied, use synchronization. Otherwise the sleep() solution that someone posted will take care of the excessive CPU usage.
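For completeness, here is what the Event approach could look like. This is a minimal sketch with made-up names (worker, wait_for_worker), and it only covers waiting for completion with a maximum execution time; it does not stream rows as they are produced, which is where the Queue-based answer fits better.

```python
import threading


def worker(output, done):
    """Append some rows, then signal completion."""
    for i in range(3):
        output.append(i)
    done.set()  # wakes up anyone blocked in done.wait()


def wait_for_worker(done, max_seconds=15):
    """Block without spinning; True if the worker finished, False on timeout."""
    return done.wait(timeout=max_seconds)


if __name__ == '__main__':
    output = []
    done = threading.Event()
    t = threading.Thread(target=worker, args=(output, done))
    t.start()
    if wait_for_worker(done):
        print(output)
    else:
        print('Maximum Execution Time Exceeded')
    t.join()
```

Event.wait sleeps inside the interpreter's lock machinery until set() is called or the timeout expires, so the waiting thread does not consume CPU in a loop.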
