How can I interrupt a blocking Queue.get() in Python 3.X?
In Python 2.X setting a long timeout seems to work but the same cannot be said for Python 3.5.
Running on Windows 7, CPython 3.5.1, 64 bit both machine and Python.
Seems like it does not behave the same on Ubuntu.
The reason it works on Python 2 is that Queue.get with a timeout on Python 2 is implemented incredibly poorly, as a polling loop with increasing sleeps between non-blocking attempts to acquire the underlying lock; Python 2 doesn't actually feature a lock primitive that supports a timed blocking acquire (which is what a Queue internal Condition variable needs, but lacks, so it uses the busy loop). When you're trying this on Python 2, all you're checking is whether the Ctrl-C is processed after one of the (short) time.sleep calls finishes, and the longest sleep in Condition is only 0.05 seconds, which is so short you probably wouldn't notice even if you hit Ctrl-C the instant a new sleep started.
Python 3 has true timed lock acquire support (thanks to narrowing the number of target OSes to those which feature a native timed mutex or semaphore of some sort). As such, you're actually blocking on the lock acquisition for the whole timeout period, not blocking for 0.05s at a time between polling attempts.
It looks like Windows allows for registering handlers for Ctrl-C that mean that Ctrl-C doesn't necessarily generate a true signal, so the lock acquisition isn't interrupted to handle it. Python is informed of the Ctrl-C when the timed lock acquisition eventually fails, so if the timeout is short, you'll eventually see the KeyboardInterrupt, but it won't be seen until the timeout lapses. Since Python 2 Condition is only sleeping 0.05 seconds at a time (or less) the Ctrl-C is always processed quickly, but Python 3 will sleep until the lock is acquired.
Ctrl-Break is guaranteed to behave as a signal, but it also can't be handled by Python properly (it just kills the process) which probably isn't what you want either.
If you want Ctrl-C to work, you're stuck polling to some extent, but at least (unlike Python 2) you can effectively poll for Ctrl-C while live blocking on the queue the rest of the time (so you're alerted to an item becoming free immediately, which is the common case).
import time
import queue

def get_timed_interruptable(q, timeout):
    stoploop = time.monotonic() + timeout - 1
    while time.monotonic() < stoploop:
        try:
            return q.get(timeout=1)  # Allow check for Ctrl-C every second
        except queue.Empty:
            pass
    # Final wait for last fraction of a second
    return q.get(timeout=max(0, stoploop + 1 - time.monotonic()))
This blocks for a second at a time until:
The time remaining is less than a second (it blocks for the remaining time, then allows the Empty to propagate normally)
Ctrl-C was pressed during the one second interval (after the remainder of that second elapses, KeyboardInterrupt is raised)
An item is acquired (if Ctrl-C was pressed, it will raise at this point too)
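For illustration, a hedged usage sketch (it assumes the function above is in scope; the queue and its contents are made up):

import queue

q = queue.Queue()
q.put('work item')  # pretend a producer put something on the queue

try:
    item = get_timed_interruptable(q, 30)  # waits up to ~30 s, stays Ctrl-C responsive
    print('got:', item)
except queue.Empty:
    print('timed out with nothing to get')
except KeyboardInterrupt:
    print('interrupted by Ctrl-C')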
As mentioned in the comment thread to the great answer @ShadowRanger provided above, here is an alternate simplified form of his function:
import queue

def get_timed_interruptable(in_queue, timeout):
    '''
    Perform a queue.get() with a short timeout to avoid
    blocking SIGINT on Windows.
    '''
    while True:
        try:
            # Allow check for Ctrl-C every second
            return in_queue.get(timeout=min(1, timeout))
        except queue.Empty:
            if timeout < 1:
                raise
            else:
                timeout -= 1
And as @Bharel pointed out in the comments, this could run a few milliseconds longer than the absolute timeout, which may be undesirable. As such, here is a version with significantly better precision:
import time
import queue

def get_timed_interruptable_precise(in_queue, timeout):
    '''
    Perform a queue.get() with a short timeout to avoid
    blocking SIGINT on Windows. Track the time closely
    for high precision on the timeout.
    '''
    timeout += time.monotonic()
    while True:
        try:
            # Allow check for Ctrl-C every second
            return in_queue.get(timeout=max(0, min(1, timeout - time.monotonic())))
        except queue.Empty:
            if time.monotonic() > timeout:
                raise
Just use get_nowait which won't block.
import time
...

while True:
    if not q.empty():
        q.get_nowait()
        break
    time.sleep(1)  # optional timeout
This is obviously busy waiting, but q.get() does basically the same thing.
Related
Is there a way in Python to interrupt a thread when it's sleeping? (As we can do in Java.)
I am looking for something like this:
import threading
from time import sleep

def f():
    print('started')
    try:
        sleep(100)
        print('finished')
    except SleepInterruptedException:
        print('interrupted')

t = threading.Thread(target=f)
t.start()

if input() == 'stop':
    t.interrupt()
The thread is sleeping for 100 seconds and if I type 'stop', it interrupts
The correct approach is to use threading.Event. For example:
import threading
e = threading.Event()
e.wait(timeout=100) # instead of time.sleep(100)
In the other thread, you need to have access to e. You can interrupt the sleep by issuing:
e.set()
This will immediately interrupt the sleep. You can check the return value of e.wait to determine whether it timed out or was interrupted. For more information refer to the documentation: https://docs.python.org/3/library/threading.html#event-objects.
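For completeness, here is a minimal two-thread sketch of that pattern (the worker name and the 100-second timeout mirror the question; this is just an illustration):

import threading

e = threading.Event()

def worker():
    print('started')
    # wait() returns True if e.set() was called, False if the timeout elapsed
    if e.wait(timeout=100):
        print('interrupted')
    else:
        print('finished')

t = threading.Thread(target=worker)
t.start()

if input() == 'stop':
    e.set()  # wakes the waiting thread immediately
t.join()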
How about using condition objects: https://docs.python.org/2/library/threading.html#condition-objects
Instead of sleep() you use wait(timeout). To "interrupt" you call notify().
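A minimal sketch of that approach, with made-up helper names (interruptible_sleep and interrupt are not standard APIs):

import threading

cond = threading.Condition()

def interruptible_sleep(timeout):
    # Returns early if another thread calls interrupt() while we are waiting
    with cond:
        cond.wait(timeout)

def interrupt():
    with cond:
        cond.notify()

Note that, unlike an Event, a notify() issued before the other thread has reached wait() is simply lost, which is one reason Event is usually the better fit here.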
If you, for whatever reason, needed to use the time.sleep function and happened to expect the time.sleep function to throw an exception and you simply wanted to test what happened with large sleep values without having to wait for the whole timeout...
Firstly, sleeping threads are lightweight and there's no problem just letting them run in daemon mode with threading.Thread(target=f, daemon=True) (so that they exit when the program does). You can check the result of the thread without waiting for the whole execution with t.join(0.5).
But if you absolutely need to halt the execution of the function, you could use multiprocessing.Process, and call .terminate() on the spawned process. This does not give the process time to clean up (e.g. except and finally blocks aren't run), so use it with care.
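A rough sketch of the terminate() approach (sleepy is just a stand-in for your real function):

import time
from multiprocessing import Process

def sleepy():
    time.sleep(100)
    print('finished')  # never reached if the process is terminated first

if __name__ == '__main__':
    p = Process(target=sleepy)
    p.start()
    time.sleep(1)
    p.terminate()  # kills the child abruptly; except/finally blocks do not run
    p.join()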
I have a single background process running alongside the main one, communicating via a Queue (using multiprocessing, not multithreading). The main process runs constantly, and the background process runs once per queue item so that if it gets backlogged, it can still catch up. Instead of closing with the main script (I've enabled daemon for that), I would prefer it to run until the queue is empty, then save and quit.
It's started like this:
q_send = Queue()
q_recv = Queue()
p1 = Process(target=background_process, args=(q_send, q_recv))
p1.daemon = True
p1.start()
Here's how the background process currently runs:
while True:
    received_data = q_recv.get()
    # do stuff
One way I've considered is to switch the loop to run all the time, but check the size of the queue before trying to read it, and wait a few seconds if it's empty before trying again. There are a couple of problems though. The whole point is that it runs once per item, so if there are 1000 queued commands, it seems a little inefficient to check the queue size before each one. Also, there's no real limit on how long the main process can go without sending an update, so I'd have to set the timeout quite high, as opposed to instantly exiting when the connection is broken and the queue emptied. With the background process using up to 2 GB of RAM, it could probably do with exiting as soon as possible.
It'd also make it look a lot more messy:
afk_time = 0
while True:
    if afk_time > 300:
        return
    if not q_recv.qsize():
        time.sleep(2)
        afk_time += 2
    else:
        received_data = q_recv.get()
        # do stuff
I came across is_alive(), and thought perhaps getting the main process from current_process() might work, but it gave a pickling error when I tried to send it to the queue.
Queue.get has a keyword argument timeout which determines the time to wait for an item if the queue is empty. If no item is available when the timeout elapses then an Empty exception is raised.
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
So you can catch that exception and break out of the loop:
try:
    received_data = q_recv.get(timeout=300)
except queue.Empty:
    return
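Putting that together, the background process's loop might look something like this (a sketch assuming Python 3, where multiprocessing's get raises queue.Empty; the 300-second figure comes from the question):

import queue  # multiprocessing.Queue raises queue.Empty on timeout

def background_process(q_send, q_recv):
    while True:
        try:
            received_data = q_recv.get(timeout=300)
        except queue.Empty:
            return  # no update for 5 minutes: save and quit
        # do stuff with received_data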
I can't find any documentation on Python's Queue.get with a timeout: get([block[, timeout]]), whereas there is good documentation on Python's time.sleep() at http://www.pythoncentral.io/pythons-time-sleep-pause-wait-sleep-stop-your-code/.
I've used the Linux time command to time loops of 5, 500 and 5000 iterations over both, with a period of 100 ms, and they both seem similar.
Snippet 1: with queue timeout
while True:
    try:
        if self._queue.get(True, period) == '!STOP!': break
    except:  # Queue.Empty, keep going
        pass
    -- do stuff here --
Snippet 2: With time sleep
while True:
    try:
        if self._queue.get_nowait() == '!STOP!': break
    except:  # Queue.Empty, keep going
        pass
    -- do stuff here --
    time.sleep(period)
Snippet 1 is preferred because instead of sleeping, and then checking the poison pill queue, it 'sleeps' checking the queue. Of course it is a pretty moot point, since the period will normally only be between 0.100 and 0.500 secs, but I want to make sure there isn't something in the queue.get that I'm missing.
As you said, the first option is a better choice because instead of just unconditionally sleeping for period, then checking to see if anything is in the queue, and then sleeping again, you're actively waiting for something to be put into the queue for the entire period, and then just briefly doing something other than waiting for the '!STOP!' to arrive. There are no hidden gotchas; get is internally using time.time() + period to decide how long to wait 1) to be able to acquire the internal lock on the queue, and 2) for something to actually be in the queue to get. Here's the relevant code from multiprocessing/queues.py:
if block:
    deadline = time.time() + timeout
if not self._rlock.acquire(block, timeout):  # waits for up to `timeout` to get the lock
    raise Empty  # raise Empty if it didn't get it
try:
    if block:
        timeout = deadline - time.time()
        # Once it has the lock, waits for however much time is left before `deadline` for something to arrive
        if timeout < 0 or not self._poll(timeout):
            raise Empty
    elif not self._poll():
        raise Empty
    res = self._recv()
    self._sem.release()
    return res
finally:
    self._rlock.release()
Running Python 2.6 and 2.7 on Windows 7 and Server 2012
Event.wait causes a delay when used with a timeout, even when the timeout is not triggered because the event is set in time. I don't understand why.
Can someone explain?
The following program shows this and gives a possible explanation:
'''Shows that using a timeout in Event::wait (same for Queue::wait) causes a
delay. This is perhaps caused by a polling loop inside the wait implementation.
This polling loop sleeps some time depending on the timeout.
Probably wait timeout > 1ms => sleep = 1ms
A wait with timeout can take at least this sleep time even though the event is
set or queue filled much faster.'''
import threading

event1 = threading.Event()
event2 = threading.Event()

def receiver():
    '''wait 4 event2, clear event2 and set event1.'''
    while True:
        event2.wait()
        event2.clear()
        event1.set()

receiver_thread = threading.Thread(target=receiver)
receiver_thread.start()

def do_transaction(timeout):
    '''Performs a transaction; clear event1, set event2 and wait for thread to set event1.'''
    event1.clear()
    event2.set()
    event1.wait(timeout=timeout)

while True:
    # With timeout None this runs fast and CPU bound.
    # With timeout set to some value this runs slow and not CPU bound.
    do_transaction(timeout=10.0)
Looking at the source code for wait() method of the threading.Condition class, there are two very different code paths. Without a timeout, we just wait on a lock forever, and when we get the lock, we return immediately.
However, with a timeout you cannot simply wait on the lock forever, and the low-level lock provides no timeout implementation. So the code sleeps for exponentially longer periods of time, after each sleep checking if the lock can be acquired. The relevant comment from the code:
# Balancing act: We can't afford a pure busy loop, so we
# have to sleep; but if we sleep the whole timeout time,
# we'll be unresponsive. The scheme here sleeps very
# little at first, longer as time goes on, but never longer
# than 20 times per second (or the timeout time remaining).
So in an average scenario where the condition/event is not notified within a short period of time, you will see a 25 ms delay (a random incoming event will arrive on average with half the max sleep time of 50 ms left before the sleep ends).
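You can see that extra latency directly with a small timing sketch, assuming a Python 2 interpreter where Condition.wait still uses the polling loop (on Python 3.2+ the timed acquire is native and the overhead largely disappears):

import threading
import time

event = threading.Event()

def setter():
    time.sleep(2.0)  # let the waiter back off to its longest 50 ms poll sleeps
    event.set()

threading.Thread(target=setter).start()

start = time.time()
event.wait(timeout=10.0)
# On Python 2 this typically prints a few tens of milliseconds more than 2.0,
# because the waiter only notices the set() when its current poll sleep ends.
print('wait took %.4f s' % (time.time() - start))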
I have a thread that appends rows to self.output and a loop that runs until self.done is True (or the max execution time is reached).
Is there a more efficient way to do this other than using a while loop that constantly checks to see if it's done? The while loop causes the CPU to spike to 100% while it's running.
time.clock()
while True:
    if len(self.output):
        yield self.output.pop(0)
    elif self.done or 15 < time.clock():
        if 15 < time.clock():
            yield "Maximum Execution Time Exceeded %s seconds" % time.clock()
        break
Are your threads appending to self.output here, with your main task consuming them? If so, this is a tailor-made job for Queue.Queue. Your code should become something like:
import Queue

# Initialise queue as:
queue = Queue.Queue()
Finished = object()  # Unique marker the producer will put in the queue when finished

# Consumer:
try:
    while True:
        next_item = self.queue.get(timeout=15)
        if next_item is Finished: break
        yield next_item
except Queue.Empty:
    print "Timeout exceeded"
Your producer threads add items to the queue with queue.put(item)
[Edit] The original code has a race issue when checking self.done (for example multiple items may be appended to the queue before the flag is set, causing the code to bail out at the first one). Updated with a suggestion from ΤΖΩΤΖΙΟΥ - the producer thread should instead append a special token (Finished) to the queue to indicate it is complete.
Note: If you have multiple producer threads, you'll need a more general approach to detecting when they're all finished. You could accomplish this with the same strategy - each thread appends a Finished marker and the consumer terminates when it sees num_threads markers.
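A minimal sketch of that multi-producer shutdown, reusing the Finished marker from above (num_threads is the number of producer threads; wrap the get in the same try/except Queue.Empty as before if you also want the timeout):

def consume(queue, num_threads):
    '''Yield items until every producer thread has put a Finished marker.'''
    finished_count = 0
    while finished_count < num_threads:
        next_item = queue.get(timeout=15)
        if next_item is Finished:
            finished_count += 1
            continue
        yield next_item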
Use a semaphore; have the working thread release it when it's finished, and block your appending thread until the worker is finished with the semaphore.
i.e. in the worker, do something like self.done = threading.Semaphore() at the beginning of work, and self.done.release() when finished. In the code you noted above, instead of the busy loop, simply do self.done.acquire(); when the worker thread is finished, control will return.
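A rough sketch of that pattern (Worker and wait_for_done are made-up names; note the semaphore here is created with an initial count of 0 so that acquire() actually blocks until release() is called):

import threading

class Worker(object):
    def __init__(self):
        # Count of 0 means acquire() blocks until work() calls release()
        self.done = threading.Semaphore(0)

    def work(self):
        # ... do the actual work here ...
        self.done.release()  # signal completion

    def wait_for_done(self):
        self.done.acquire()  # blocks until the worker has finished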
Edit: I'm afraid I don't address your needed timeout value, though; this issue describes the need for a semaphore timeout in the standard library.
Use time.sleep(seconds) to create a brief pause after each iteration of the while loop to relinquish the cpu. You will have to set the time you sleep during each iteration based on how important it is that you catch the job quickly after it's complete.
Example:
time.clock()
while True:
    if len(self.output):
        yield self.output.pop(0)
    elif self.done or 15 < time.clock():
        if 15 < time.clock():
            yield "Maximum Execution Time Exceeded %s seconds" % time.clock()
        break
    time.sleep(0.01)  # sleep for 10 milliseconds
Use the mutex module, or an event/semaphore.
You have to use a synchronization primitive here. Look here: http://docs.python.org/library/threading.html.
Event objects seem very simple and should solve your problem. You can also use a condition object or a semaphore.
I don't post an example because I've never used Event objects, and the alternatives are probably less simple.
Edit: I'm not really sure I understood your problem. If a thread can wait until some condition is satisfied, use synchronization. Otherwise the sleep() solution that someone posted will do without taking too much CPU time.