Python semaphore "hangs" in tight loops

I have been having an issue with Python Semaphores appearing to "lock" for an unbounded amount of time when there is a tight relationship between acquire and release. I do not have this issue with Lock/RLock.
Below is the code, distilled to the simplest case which exhibits the concerning behavior.
import threading
import time

sem = threading.Semaphore()
#sem = threading.RLock()
exit = False

def spinner():
    while not exit:
        sem.acquire()
        time.sleep(1)
        sem.release()

t = threading.Thread(target=spinner)
t.start()

print time.strftime("%H:%M:%S", time.gmtime())
for i in range(0, 10):
    sem.acquire()
    print "Accessed!"
    sem.release()
print time.strftime("%H:%M:%S", time.gmtime())

exit = True
t.join()
When I use the Semaphore, this takes an unpredictable amount of time to complete (sometimes 20 minutes!).
When I use a Lock or RLock, this completes quickly as I expect.
Am I missing something? It seems like semaphore with the default value=1 should behave the same as Lock.
According to the documentation I'm looking at, calling release on one thread should unblock an indeterminate other blocked thread. However what I think is happening is that the thread which calls the release is free to keep running, then re-acquire if it is still within its time-slice. When hitting acquire again it sees an unblocked semaphore and gets access again. Bad luck thus forces the waiting thread to keep waiting a long time.
Am I missing something? Why would Lock/RLock work any better?
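(One way to test this theory, purely as an illustration of the guess above and not a confirmed fix, would be to force the spinner to yield right after releasing:)

def spinner():
    while not exit:
        sem.acquire()
        time.sleep(1)
        sem.release()
        time.sleep(0)  # hint the OS to reschedule, giving the waiting thread a chance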


Python - How to break immediately out of loop without waiting for next iteration, or stop thread? [duplicate]

Is there a way in Python to interrupt a thread when it's sleeping (as you can do in Java)?
I am looking for something like this:
import threading
from time import sleep

def f():
    print('started')
    try:
        sleep(100)
        print('finished')
    except SleepInterruptedException:
        print('interrupted')

t = threading.Thread(target=f)
t.start()

if input() == 'stop':
    t.interrupt()
The thread sleeps for 100 seconds and, if I type 'stop', it should be interrupted.
The correct approach is to use threading.Event. For example:
import threading
e = threading.Event()
e.wait(timeout=100) # instead of time.sleep(100)
In the other thread, you need to have access to e. You can interrupt the sleep by issuing:
e.set()
This will immediately interrupt the sleep. You can check the return value of e.wait to determine whether it timed out or was interrupted: it returns True if the event was set and False if the timeout expired. For more information refer to the documentation: https://docs.python.org/3/library/threading.html#event-objects
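A runnable version of the question's sketch using this approach might look like the following (the names are carried over from the question where possible):

import threading

e = threading.Event()

def f():
    print('started')
    if e.wait(timeout=100):   # True if e.set() was called, False on timeout
        print('interrupted')
    else:
        print('finished')

t = threading.Thread(target=f)
t.start()
if input() == 'stop':
    e.set()
t.join()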
How about using condition objects: https://docs.python.org/2/library/threading.html#condition-objects
Instead of sleep() you use wait(timeout). To "interrupt" you call notify().
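As a sketch (assuming Python 3, where Condition.wait returns False on timeout; note the lock must be held around both wait and notify):

import threading
import time

cond = threading.Condition()

def f():
    print('started')
    with cond:                       # the lock must be held around wait()
        notified = cond.wait(timeout=100)
    print('interrupted' if notified else 'finished')

t = threading.Thread(target=f)
t.start()
time.sleep(0.1)   # crude: let the thread reach wait() first (illustrative only)
with cond:
    cond.notify() # "interrupts" the wait
t.join()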
If you, for whatever reason, need to use time.sleep itself (and were hoping it could throw an exception like the hypothetical SleepInterruptedException above), or simply want to test what happens with large sleep values without waiting out the whole timeout, there are two things to consider.
Firstly, sleeping threads are lightweight and there's no problem just letting them run in daemon mode with threading.Thread(target=f, daemon=True) (so that they exit when the program does). You can check the result of the thread without waiting for the whole execution with t.join(0.5).
But if you absolutely need to halt the execution of the function, you could use multiprocessing.Process, and call .terminate() on the spawned process. This does not give the process time to clean up (e.g. except and finally blocks aren't run), so use it with care.
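A minimal sketch of that approach, reusing the question's f (note: no exception is raised in the child; the process simply dies):

import multiprocessing
from time import sleep

def f():
    print('started')
    sleep(100)
    print('finished')      # never reached if the process is terminated

if __name__ == '__main__':
    p = multiprocessing.Process(target=f)
    p.start()
    if input() == 'stop':
        p.terminate()      # kills the process; except/finally blocks won't run
        p.join()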

Python Gevent Shared Queue (Listener Process)

I am trying to get some code working where I can implement logging into a multi-threaded program using gevent. What I'd like to do is set up custom logging handlers to put log events into a Queue, while a listener process is continuously watching for new log events to handle appropriately. I have done this in the past with Multiprocessing, but never with Gevent.
I'm having an issue where the program is getting caught up in the infinite loop (listener process), and not allowing the other threads to "do work"...
Ideally, after the worker processes have finished, I can pass an arbitrary value to the listener process to tell it to break the loop, and then join all the processes together. Here's what I have so far:
import gevent
from gevent.pool import Pool
import Queue
import random
import time

def listener(q):
    while True:
        if not q.empty():
            num = q.get()
            print "The number is: %s" % num
            if num <= 100:
                print q.get()
            else:
                # got passed 101, break out
                break
        else:
            continue

def worker(pid, q):
    if pid == 0:
        listener(q)
    else:
        gevent.sleep(random.randint(0, 2) * 0.001)
        num = random.randint(1, 100)
        q.put(num)

def main():
    q = Queue.Queue()
    all_threads = []
    all_threads = [gevent.spawn(worker, pid, q) for pid in xrange(10)]
    gevent.wait(all_threads[1:])
    q.put(101)
    gevent.joinall(all_threads)

if __name__ == '__main__':
    main()
As I said, the program seems to be getting hung up on that first process and does not allow the other workers to do their thing. I have also tried spawning the listener process completely separately itself (which is actually how I would rather do it), but that didn't seem to work either so I tried this way.
Any help would be appreciated, feel like I am probably just missing something obvious about gevent's back end.
Thanks
The first problem is that your listener never yields if the queue is initially empty. The first task you spawn is your listener. When it starts, there's a while True:, the q will be empty, so you go to the else branch, which just continues, looping back to the start of the while loop, where the q is still empty. So you just sit in the first task, constantly checking that the q is empty.
The key thing here is that gevent does not use "native" threads or processes. Unlike "real" threads, which can be switched to at any time by something behind the scenes (like your OS scheduler), gevent uses 'greenlets', which require that you do something to "yield control" to another task. That something is whatever gevent thinks would block, such as reading from the network or disk, or using one of the blocking gevent operations.
One crude fix would be to start your listener when pid == 9 rather than 0. By making it spawn last, there will be items in the q, and it will go into the main if branch. The downside is that this doesn't fix the logic problem, so the first time the queue is empty, you'll get stuck in your infinite loop again.
A more correct fix would be to put gevent.sleep() instead of continue. sleep is a blocking operation, so your other tasks will get a chance to run. Without arguments, it waits for no time, but still gives gevent the chance to decide to switch to another task if it is ready to run. This still isn't very efficient, though, as if the Queue is empty, it's going to spend a lot of pointless time checking that over and over and asking to run again as soon as it can. sleep'ing for longer than the default of 0 will be more efficient, but would delay processing your log messages.
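Applied to the question's listener, that fix looks something like this (the queue-handling logic is simplified here for illustration):

def listener(q):
    while True:
        if not q.empty():
            num = q.get()
            print "The number is: %s" % num
            if num > 100:
                break  # got passed 101, break out
        else:
            gevent.sleep(0)  # yield to the hub so other greenlets can run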
However, you can instead take advantage of the fact that many of gevent's types, such as Queue, can be used in more Pythonic ways and make your code a lot simpler and easier to understand, as well as more efficient.
import gevent
from gevent.queue import Queue
import random

def listener(q):
    for msg in q:
        print "the number is %d" % msg

def worker(pid, q):
    gevent.sleep(random.randint(0, 2) * 0.001)
    num = random.randint(1, 100)
    q.put(num)

def main():
    q = Queue()
    listener_task = gevent.spawn(listener, q)
    worker_tasks = [gevent.spawn(worker, pid, q) for pid in xrange(1, 10)]
    gevent.wait(worker_tasks)
    q.put(StopIteration)
    listener_task.join()

if __name__ == '__main__':
    main()
Here, Queue can operate as an iterator in a for loop. As long as there are messages, it will get an item, run the loop, and then wait for another item. If there are no items, it will just block and hang around until the next one arrives. Since it blocks, though, gevent will switch to one of your other tasks to run, avoiding the infinite loop problem your example code has.
Because this version is using the Queue as a for loop iterator, there's also automatically a nice sentinel value we can put in the queue to make the listener task quit. If a for loop gets StopIteration from its iterator, it will exit cleanly. So when our for loop that's reading from q gets StopIteration from the q, it exits, and then the function exits, and the spawned task is finished.

Difference between thread.join and thread.abort in python multithreading

I am new to Python multithreading and am trying to understand the basic difference between joining multiple worker threads and calling abort on them after I am done processing with them. Can somebody please explain it to me with an example?
.join() and setting an abort flag are two different steps in cleanly shutting down a thread.
join() just waits for a thread that is going to terminate anyway to be finished. Thus:
import threading
import time

def thread_main():
    time.sleep(10)

t = threading.Thread(target=thread_main)
t.start()
t.join()
This is a reasonable program. The join just waits until the thread is finished. It doesn't do anything to make that happen, but the thread will terminate anyway, because it is just a 10 second sleep.
In contrast
import threading
import time

def thread_main():
    while True:
        time.sleep(10)

t = threading.Thread(target=thread_main)
t.start()
t.join()
is not a good idea, because join will still wait for the thread to terminate on its own. But the thread will never do that, because it loops forever. Thus the whole program can't terminate.
That's the point where you want some kind of signaling to the thread for it to stop itself:
import threading
import time

stop_thread = False

def thread_main():
    while not stop_thread:
        time.sleep(10)

t = threading.Thread(target=thread_main)
t.start()

stop_thread = True
t.join()
Here stop_thread takes the role of your __abort flag and signals the thread to stop after it has finished with its latest work (the sleep(10) in this case).
Thus this program again is reasonable and terminates when asked to do so.
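A variation on the same idea uses threading.Event instead of a bare flag, which also makes the shutdown prompt, because the wait can be interrupted instead of always sleeping the full 10 seconds (a minimal sketch):

import threading

stop_event = threading.Event()

def thread_main():
    # wait() returns True as soon as stop_event is set, False on timeout
    while not stop_event.wait(timeout=10):
        pass  # periodic work goes here

t = threading.Thread(target=thread_main)
t.start()
stop_event.set()   # the thread stops immediately, not after up to 10 seconds
t.join()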
Another popular way to signal a thread to stop when the thread uses a consumer pattern (i.e. gets its work from a queue) is to post a special 'terminate now' work item as alternative to setting a flag variable:
def thread_main():
    while True:
        (quit, data) = work_queue.get()
        if quit:
            break
        do_work(data)
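For instance, a runnable sketch of that pattern (the queue contents and do_work body here are illustrative, not from the answer above):

import threading
import queue  # Queue on Python 2

work_queue = queue.Queue()

def do_work(data):
    print('working on', data)   # stand-in for real work

def thread_main():
    while True:
        (quit, data) = work_queue.get()
        if quit:
            break
        do_work(data)

t = threading.Thread(target=thread_main)
t.start()
work_queue.put((False, 'job 1'))
work_queue.put((False, 'job 2'))
work_queue.put((True, None))    # the 'terminate now' work item
t.join()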

Python threading with queue: how to avoid using join?

I have a scenario with 2 threads:
a thread waiting for messages from a socket (embedded in a C library - blocking call is "Barra.ricevi") then putting an element on a queue
a thread waiting to get element from the queue and do something
Sample code
import Barra
import Queue
import threading

posQu = Queue.Queue(maxsize=0)

def threadCAN():
    while True:
        canMsg = Barra.ricevi("can0")
        if canMsg[0] == 'ERR':
            print (canMsg)
        else:
            print ("Enqueued message"), canMsg
            posQu.put(canMsg)

thCan = threading.Thread(target = threadCAN)
thCan.daemon = True
thCan.start()

while True:
    posMsg = posQu.get()
    print ("Message from the queue"), posMsg
The result is that every time a new message comes in from the socket, a new element is added to the queue, BUT the main thread that should get items from the queue is never woken up.
The output is as follow:
Enqueued message
Enqueued message
Enqueued message
Enqueued message
I expected to have:
Enqueued message
Message from the queue
Enqueued message
Message from the queue
The only way to solve this issue seems to be adding the line:
posQu.join()
at the end of the thread waiting for messages from the socket, and the line:
posQu.task_done()
at the end of the main thread.
In this case, after a new message has been received from the socket, the thread blocks, waiting for the main thread to process the enqueued item.
Unfortunately this isn't the desired behavior, since I would like a thread that is always ready to get messages from the socket, not one waiting for a job to be completed by another thread.
What am I doing wrong?
Thanks
Andrew
(Italy)
This is likely because your Barra module does not release the global interpreter lock (GIL) during the Barra.ricevi call. You may want to check this though.
The GIL ensures that only one thread can run at any one time (limiting the usefulness of threads in a multi-processor system). The GIL switches threads every 100 "ticks" -- a tick loosely mapping to bytecode instructions. See here for more details.
In your producer thread, not much happens outside of the C-library call. This means the producer thread will get to call Barra.ricevi a great many times before the GIL switches to another thread.
Solutions to this, in order of increasing complexity, are:
Call time.sleep(0) after adding an item to the queue. This yields the thread so that another thread can run (see the sketch after this list).
Use sys.setcheckinterval() to lower the number of "ticks" executed before switching threads. This will come at the cost of making the program much more computationally expensive.
Use multiprocessing rather than threading. This includes using multiprocessing.Queue instead of Queue.Queue.
Modify Barra so that it does release the GIL when its functions are called.
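A sketch of the first option, applied to the question's producer (this reuses the question's Barra and posQu names, so it is not standalone; only the time.sleep(0) call is new):

import time

def threadCAN():
    while True:
        canMsg = Barra.ricevi("can0")
        if canMsg[0] != 'ERR':
            posQu.put(canMsg)
            time.sleep(0)  # give up the GIL so the consumer thread can run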
Example using multiprocessing. Be aware that when using multiprocessing, your processes no longer have an implied shared state. You will need to have a look at multiprocessing to see how to pass information between processes.
import Barra
import multiprocessing

def threadCAN(posQu):
    while True:
        canMsg = Barra.ricevi("can0")
        if canMsg[0] == 'ERR':
            print(canMsg)
        else:
            print("Enqueued message", canMsg)
            posQu.put(canMsg)

if __name__ == "__main__":
    posQu = multiprocessing.Queue(maxsize=0)
    procCan = multiprocessing.Process(target=threadCAN, args=(posQu,))
    procCan.daemon = True
    procCan.start()

    while True:
        posMsg = posQu.get()
        print("Message from the queue", posMsg)

Python: Pass or Sleep for long running processes?

I am writing a queue processing application which uses threads to wait on and respond to queue messages delivered to the app. For the main part of the application, it just needs to stay active. For a code example like:
while True:
    pass
or
while True:
    time.sleep(1)
Which one will have the least impact on a system? What is the preferred way to do nothing, but keep a python app running?
I would imagine time.sleep() will have less overhead on the system. Using pass will cause the loop to immediately re-evaluate and peg the CPU, whereas using time.sleep will allow the execution to be temporarily suspended.
EDIT: just to prove the point, if you launch the python interpreter and run this:
>>> while True:
...     pass
...
You can watch Python start eating up 90-100% CPU instantly, versus:
>>> import time
>>> while True:
...     time.sleep(1)
...
Which barely even registers on the Activity Monitor (using OS X here but it should be the same for every platform).
Why sleep? You don't want to sleep, you want to wait for the threads to finish.
So
# store the threads you start in a your_threads list, then
for a_thread in your_threads:
    a_thread.join()
See: thread.join
If you are looking for a short, zero-cpu way to loop forever until a KeyboardInterrupt, you can use:
from threading import Event
Event().wait()
Note: Due to a bug, this only works on Python 3.2+. In addition, it appears to not work on Windows. For this reason, while True: sleep(1) might be the better option.
For some background, Event objects are normally used for waiting for long running background tasks to complete:
from threading import Event, Thread
from time import sleep

def do_task():
    sleep(10)
    print('Task complete.')
    event.set()

event = Event()
Thread(target=do_task).start()
event.wait()
print('Continuing...')
Which prints:
Task complete.
Continuing...
signal.pause() is another solution, see https://docs.python.org/3/library/signal.html#signal.pause
Cause the process to sleep until a signal is received; the appropriate handler will then be called. Returns nothing. Not on Windows. (See the Unix man page signal(2).)
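A minimal sketch (Unix only; the handler and choice of signal here are illustrative):

import signal

def handler(signum, frame):
    print('received signal', signum)

signal.signal(signal.SIGUSR1, handler)  # any catchable signal works; SIGUSR1 is arbitrary
signal.pause()                          # blocks until a signal arrives
print('Continuing...')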
I've always seen/heard that using sleep is the better way to do it. Using sleep will keep your Python interpreter's CPU usage from going wild.
You don't give much context to what you are really doing, but maybe Queue could be used instead of an explicit busy-wait loop? If not, I would assume sleep would be preferable, as I believe it will consume less CPU (as others have already noted).
Maybe this is obvious, but anyway: in a case where you are reading information from blocking sockets, you can have one thread read from the socket and post suitably formatted messages into a Queue, and then have the rest of your "worker" threads read from that queue; the workers will then block on reading from the queue, with no need for either pass or sleep.
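A minimal sketch of that layout (the socket handling and the handle function are illustrative, not from the question):

import threading
import queue  # Queue on Python 2

msg_queue = queue.Queue()

def reader(sock):
    while True:
        data = sock.recv(4096)    # blocks until data arrives on the socket
        if not data:
            break
        msg_queue.put(data)

def worker():
    while True:
        msg = msg_queue.get()     # blocks on the queue; no pass or sleep needed
        handle(msg)               # hypothetical processing function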
Running a method as a background thread with sleep in Python:
import threading
import time

class ThreadingExample(object):
    """ Threading example class

    The run() method will be started and it will run in the background
    until the application exits.
    """

    def __init__(self, interval=1):
        """ Constructor

        :type interval: int
        :param interval: Check interval, in seconds
        """
        self.interval = interval

        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True    # Daemonize thread
        thread.start()          # Start the execution

    def run(self):
        """ Method that runs forever """
        while True:
            # Do something
            print('Doing something important in the background')
            time.sleep(self.interval)

example = ThreadingExample()
time.sleep(3)
print('Checkpoint')
time.sleep(2)
print('Bye')
