I was wondering why setting block=False would ever make sense:
from multiprocessing import Process, Lock
lock.acquire(block=False)
If I didn't need to block, why would I use a Lock at all?
From Python in a Nutshell:
L.acquire()
When blocking is True, acquire locks L. If L is already locked, the calling thread suspends and waits until L is unlocked, then locks L. Even if the calling thread was the one that last locked L, it still suspends and waits until another thread releases L. When blocking is False and L is unlocked, acquire locks L and returns True. When blocking is False and L is locked, acquire does not affect L, and returns False.
And a practical example using the following simple code:
from multiprocessing import Process, Lock, current_process

def blocking_testing(lock):
    if not lock.acquire(False):
        print('{} Couldn\'t get lock'.format(current_process().ident))
    else:
        print('{} Got lock'.format(current_process().ident))

if __name__ == '__main__':
    lock = Lock()
    procs = []
    for i in range(3):
        p = Process(target=blocking_testing, args=(lock,))
        procs.append(p)
        p.start()
    for p in procs:
        p.join()
With the above version (block=False) this outputs
12206 Got lock
12207 Couldn't get lock
12208 Couldn't get lock
If I set block=True (or remove it, as it defaults to True) the main process will hang, because the Lock is never released.
Finally, if I set block=True and add a lock.release() at the end, my output will be
12616 Got lock
12617 Got lock
12618 Got lock
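For reference, a sketch of that last blocking variant (same setup as above; the release is the only real change):

def blocking_testing(lock):
    lock.acquire()  # block=True by default; waits until the lock is free
    try:
        print('{} Got lock'.format(current_process().ident))
    finally:
        lock.release()  # without this, the other processes wait forever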
I hope this was a clear enough explanation.
multiprocessing.Lock is not used for blocking, it's used to protect one or more resources from concurrent access.
The simplest of the examples could be a file written by multiple processes. To guarantee that only one process at a time is writing on the given file, you protect it with a Lock.
There are situations where your logic cannot block. For example, if your logic is orchestrated by an event loop like the asyncio module, blocking would stop the entire execution until the Lock is released.
In such cases the common approach is trying to acquire the Lock. If you succeed, you proceed accessing the protected resource, otherwise you move to other routines and try later.
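A rough sketch of that try-and-move-on pattern (work_on_resource and do_other_work are placeholder names):

from multiprocessing import Lock

lock = Lock()

def try_work(lock):
    # Try to get the lock without blocking
    if lock.acquire(block=False):
        try:
            work_on_resource()  # placeholder: access the protected resource
        finally:
            lock.release()
    else:
        do_other_work()  # placeholder: come back and retry later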
This makes sense given the parameter's name: block. block=False provides a non-blocking way to access protected resources.
Example one:
You have a GUI thread and a background worker thread. The GUI thread needs to modify some data generated by the worker thread, but it cannot block, since that would freeze the whole interface. So it can use lock.acquire(block=False) to safely check whether the data is ready, without blocking.
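A minimal sketch of that idea using threading (where the keyword is blocking rather than block); poll_data, shared_data, and update_widgets are placeholders for a periodic GUI callback:

import threading

data_lock = threading.Lock()  # protects shared_data, written by the worker

def poll_data():
    # Called periodically by the GUI, e.g. on a timer; must never block
    if data_lock.acquire(blocking=False):
        try:
            update_widgets(shared_data)  # placeholder GUI update
        finally:
            data_lock.release()
    # else: worker is busy; simply try again on the next timer tick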
Example two:
Another example involves event loops, such as asyncio: there, too, you need non-blocking access to protected resources, since blocking would stall the entire loop.
I understand the purpose of joining a thread; I'm asking about resource use. My specific use-case here is that I have a long-running process that needs to spawn many threads, and during operation, checks if they have terminated and then cleans them up. The main thread waits on inotify events and spawns threads based on those, so it can't block on join() calls, because it needs to block on inotify calls.
I know that with pthreads, for instance, not joining a terminated thread will cause a resource leak:
PTHREAD_JOIN(3): Failure to join with a thread that is joinable (i.e., one that is not detached), produces a "zombie thread". Avoid doing this, since each zombie thread consumes some system resources, and when enough zombie threads have accumulated, it will no longer be possible to create new threads (or processes).
Python's documentation says no such thing, but it also doesn't specify that join() can be disregarded without issue when many threads are expected to end on their own, unjoined, during normal operation.
I'm wondering, can I simply take my thread list and do the following:
threads = [thread for thread in threads if thread.is_alive()]
for each check, or will this leak? Or must I do the following?
alive_threads = list()
for thread in threads:
    if thread.is_alive():
        alive_threads.append(thread)
    else:
        thread.join()
threads = alive_threads
TLDR: No, this will not leak. A Thread cleans up its underlying resources by itself.
Thread.join merely waits for the thread to end, it does not perform cleanup. Basically, each Thread has a lock that is released when the thread is done and subsequently cleaned up. Thread.join just waits for the lock to be released.
There is some minor cleanup done by Thread.join, namely removing the lock and setting a flag to mark the thread as dead. This is an optimisation to avoid needlessly waiting for the lock. These are internal, however, and also performed by all other public methods relying on the lock and flag. Finally, this cleanup is functionally equivalent to a Thread being cleaned up automatically due to garbage collection.
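So a reaping loop along the lines of the question's list comprehension is fine; as a sketch:

import threading

threads = []  # populated as workers are spawned

def reap(threads):
    # Keep only live threads; dead Thread objects are garbage collected,
    # which performs the same cleanup that join() would.
    return [t for t in threads if t.is_alive()]

# e.g. in the main loop, after handling an inotify event:
# threads.append(new_worker); threads = reap(threads)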
I have a reading thread in my application that listens on stdin. It blocks until some input is available. When some arrives, it accepts the lines, checks whether they are valid commands, and puts them in a queue.
from sys import stdin
from threading import Thread
from queue import Queue

def ReadCommands(queue):
    for cmd in stdin:
        if cmd == "":
            break
        # Check if cmd is valid and add to queue

queue = Queue()
thread = Thread(target=ReadCommands, args=(queue,))
thread.start()
Now when the main program wants to exit, it first has to join on this reading thread. The problem is that the thread is in a loop I have no control over. Even stdin.close() does not work.
How can I break the for loop in the reading thread from the main?
Alternatively, how can I write the for loop (with a while?) so that I can add my own boolean variable that breaks the loop? Beware that I don't want a busy-waiting loop!
If you have threads that you just want to shut down on exiting your program, starting them in daemon mode is often the best way to go. If all non-daemon threads exit, your application will end, taking all daemon threads with it.
Note that you should only do this for threads that do not have to perform cleanup; your example seems to be fine for this.
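For the stdin-reading example above, that would look roughly like this (the daemon flag must be set before start()):

from threading import Thread
from queue import Queue

queue = Queue()
thread = Thread(target=ReadCommands, args=(queue,))
thread.daemon = True  # the thread dies with the main program
thread.start()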
Also, if a thread is performing a blocking C-level operation, it may still block until control returns to actual Python code; in that case there is no Python-level option to break the block to begin with. Reading from broken sockets can be such an issue, for example.
If you need to explicitly kill a blocking thread before stopping your program, you will probably have to use Python's C API. This cannot be implemented cleanly, but it works in principle.
I have a function that does some file writing. The semaphore limits the number of concurrent threads to 2, but the total number of threads is 3. How can I prevent starvation of the third thread? Is a queue an option for that?
import time
import threading

sema = threading.Semaphore(2)

def write_file(file, data):
    sema.acquire()
    try:
        f = open(file, "a")
        f.write(data)
        f.close()
    finally:
        sema.release()
I have to object to the accepted answer. It is true that the Condition queues waiters, but the more important part is what happens when a thread tries to acquire the Condition's lock.
The order in which threads are released is not deterministic
The implementation may pick one at random, so the order in which blocked threads are awakened should not be relied on.
In the case of three threads, I agree it's very unlikely that two are trying to acquire the lock at the same time (one working, one waiting, one acquiring the lock), but there still might be interference.
A good solution for your problem, IMO, would be a thread whose single purpose is to read data from a queue and write it to a file. All the other threads can write to the queue and continue working.
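A sketch of that design (the names and the sentinel are illustrative):

import threading
import queue

write_queue = queue.Queue()
STOP = object()  # sentinel telling the writer to exit

def writer(path):
    # The only thread that touches the file, so no semaphore is needed
    with open(path, "a") as f:
        while True:
            data = write_queue.get()
            if data is STOP:
                break
            f.write(data)

t = threading.Thread(target=writer, args=("out.txt",))
t.start()
# Worker threads just call write_queue.put(data) and keep working;
# Queue does the locking, and items are written in FIFO order.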
If a thread is waiting to acquire the semaphore, one of the other two threads will eventually finish writing and release the semaphore.
You might worry that, with a lot of writing going on, the writers could reacquire the semaphore before the waiting thread is notified. I don't think this can happen, though.
The Semaphore object in Python (2.7) uses a Condition. The Condition adds waiting threads (actually a lock, which the waiting thread is blocking on) to the end of a waiters list, and when notifying threads, the notified threads are taken from the beginning of the list. So the list acts like a FIFO queue.
It looks something like this:
def wait(self, timeout=None):
    self.__waiters.append(waiter)
    ...

def notify(self, n=1):
    ...
    waiters = self.__waiters[:n]
    for waiter in waiters:
        waiter.release()
    ...
My understanding, after reading the source code, is that Python's Semaphores are FIFO. I couldn't find any other information about this, so please correct me if I'm wrong.
My Python script creates a lot of threads, all of them daemon threads, and I find that I get an error saying "out of memory".
How do I kill a daemon thread whilst my script/application is running?
I understand the concept of daemon threads, that they destroy themselves when my process (script or application) closes/finishes. But I want to kill some of my daemon threads while my script is still running, to avoid the "out of memory" error.
Will my thread below kill itself when there are no more tasks in the queue?
import threading
import Queue  # Python 2 module name

class ParsePageThread(threading.Thread):
    THREAD_NUM = 0

    def __init__(self, _queue):
        threading.Thread.__init__(self)
        self.queue = _queue

    def run(self):
        while True:
            try:
                url = self.queue.get()
            except Queue.Empty as e:
                return  # WILL this kill the thread?
            finally:
                self.queue.task_done()
I'll answer your second question first because it is easier: yes, returning from the run method does indeed stop the thread. A detailed explanation is in the threading: Thread Objects documentation.
To stop a thread that is running before its natural completion, you have to get a little more creative. There is no direct kill method on a thread object. What you need to do is use a shared variable to define the state of the thread.
import threading

alive = True

class MyThread(threading.Thread):
    def run(self):
        while alive:
            pass  # do work here
In some other piece of code, when you detect a condition for stopping that thread, the other thread simply sets alive to False:
alive = False
This is a simple example, I'll leave it to you to scale to multiple threads.
DANGER
This example works because reading and setting a boolean variable are atomic actions in Python, thanks to the Global Interpreter Lock. Here is an excellent tutorial on lower-level Python threading. You should stick to using the Queue object, because that's exactly what it's for.
If you do anything more than reading and setting simple variables from multiple threads you should use Locks or alternatively Reentrant Locks depending on your design and needs. Even something as simple as a compare and swap without a lock can cause problems in your program that are very difficult to debug.
Another piece of advice for Python multithreading is to never do any significant work in the interpreter's main thread. It should set up and start all the other threads and then sleep or wait on a condition object until the program exits. The reason for this is that no other Python thread can receive operating system signals, which means no other thread can deal with Ctrl+C, a.k.a. the KeyboardInterrupt exception. It is good practice to have the main thread handle the KeyboardInterrupt exception and then set all the alive variables to False so you can exit your program quickly. This is especially helpful while developing, so you don't have to constantly kill things when you make a mistake.
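A sketch of that main-thread pattern, reusing the alive flag and MyThread from above:

import time

def main():
    global alive
    workers = [MyThread() for _ in range(3)]
    for w in workers:
        w.start()
    try:
        while True:
            time.sleep(1)  # main thread idles, free to receive signals
    except KeyboardInterrupt:
        alive = False  # every worker loop sees this and returns
        for w in workers:
            w.join()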
I am trying to write a Unix client program that listens to a socket, stdin, and reads from file descriptors. I assign each of these tasks to an individual thread and have them successfully communicating with the "main" application using synchronized queues and a semaphore. The problem is that when I want to shut these child threads down, they are all blocking on input. Also, the threads cannot register signal handlers, because in Python only the main thread of execution is allowed to do so.
Any suggestions?
There is no good way to work around this, especially when the thread is blocking.
I had a similar issue (Python: How to terminate a blocking thread), and the only way I was able to stop my threads was to close the underlying connection. That caused the blocking thread to raise an exception, which then allowed me to check the stop flag and close.
Example code:
import threading
from threading import Thread

class Example(object):
    def __init__(self):
        self.stop = threading.Event()
        self.connection = Connection()  # placeholder for the blocking resource
        self.mythread = Thread(target=self.dowork)
        self.mythread.start()

    def dowork(self):
        while not self.stop.is_set():
            try:
                blockingcall()  # placeholder for the blocking operation
            except CommunicationException:
                pass

    def terminate(self):
        self.stop.set()
        self.connection.close()  # breaks the blocking call with an exception
        self.mythread.join()
Another thing to note is that blocking operations commonly offer a timeout. If you have that option, I would consider using it. My last comment is that you could always set the thread to be daemonic.
From the pydoc:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
Also, the threads cannot register signal handlers
Using signals to kill threads is potentially horrible, especially in C, and especially if you allocate memory as part of the thread, since that memory won't be freed when the particular thread dies (it belongs to the heap of the process). There is no garbage collection in C, so if that pointer goes out of scope, the memory remains allocated. So be careful with that one: only do it in C if you're going to kill all the threads and end the process, so that the memory is handed back to the OS. Adding and removing threads from a thread pool this way, for example, will give you a memory leak.
The problem is that when I want to shutdown these child threads they are all blocking on input.
Funnily enough, I've been fighting with the same thing recently. The solution is literally: don't make blocking calls without a timeout. So, for example, what you ideally want is:
def threadfunc(running):
    while running:
        blockingcall(timeout=1)
where running is passed from the controlling thread. I've never used threading, but I have used multiprocessing, and there you actually need to pass an Event() object and check is_set(). But you asked for design patterns; that's the basic idea.
Then, when you want this thread to end, you run:
running.clear()
mythread.join()
and your main thread should then allow your client thread to handle its last call, and return, and the whole program folds up nicely.
What do you do if you have a blocking call without a timeout? Use the asynchronous option, and sleep (as in call whatever method you have to suspend the thread for a period of time so you're not spinning) if you need to. There's no other way around it.
See these answers:
Python SocketServer
How to exit a multithreaded program?
Basically, don't block on recv() by using select() with a timeout to check for readability of the socket, and poll a quit flag when select() times out.
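A sketch of that loop (sock is an already-connected socket and quit_flag a shared Event):

import select
import threading

quit_flag = threading.Event()

def reader(sock):
    while not quit_flag.is_set():
        # Wait at most one second for the socket to become readable
        readable, _, _ = select.select([sock], [], [], 1.0)
        if readable:
            data = sock.recv(4096)
            if not data:
                break  # peer closed the connection
        # on timeout, loop around and re-check quit_flag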