I have a simple requirement: when an event occurs, a thread is created and sleeps for x minutes before waking up to carry out its tasks and terminate.
But if another event occurs, any thread that is sleeping should be terminated and a new thread should be spawned for the same purpose.
In Python I believe the best way to make a thread sleep is:
import time
time.sleep(x*60)
Is there a way to learn the state of a thread (currently sleeping/idle or alive)?
There is really no good way to do this, as a thread is either alive (running) or not. Technically, even if it's sleeping it is still running/alive; it's just not doing anything.
In general using sleeps in a thread is not really desirable as it can be a pain to adjust the time it sleeps and/or wake it when you need it to do something.
One thing I have used in the past for this is Condition in the threading module. This allows you to put a thread to "sleep" by calling .wait(). You can then do an .acquire(False) to see if it's blocked, and then .acquire(), .notify(), .release() to wake it up again if you need to.
It's a simple way to keep a thread around without spinning or using some crazy sleep paradigm.
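As a rough sketch of that idea, here is a delay built on threading.Condition that another thread can cut short. The CancellableDelay class and delayed_task function are made-up names for illustration, not part of the threading module, and a cancelled flag is kept alongside the condition so that an early cancel is not lost.

```python
import threading


class CancellableDelay:
    """A sleep that another thread can cut short (hypothetical helper)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._cancelled = False

    def wait(self, seconds):
        """Block for up to `seconds`. Return True if the full delay elapsed."""
        with self._cond:
            # wait_for() re-checks the flag, so a cancel() that arrives
            # before the wait starts is not missed.
            self._cond.wait_for(lambda: self._cancelled, timeout=seconds)
            return not self._cancelled

    def cancel(self):
        """Wake the waiting thread early and mark the delay as cancelled."""
        with self._cond:
            self._cancelled = True
            self._cond.notify_all()


def delayed_task(delay, seconds, log):
    # Only do the work if nobody cancelled us while we were waiting.
    if delay.wait(seconds):
        log.append("ran")
    else:
        log.append("cancelled")
```

For the original requirement, each new event would call cancel() on the previous delay and then start a fresh thread with a new CancellableDelay.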
Another good option is just to have the thread consume from a blocking queue in a while True loop (the Queue module in Python 2, renamed queue in Python 3), which will manage all of that for you.
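A minimal sketch of the blocking-queue approach, using Python 3 names. The STOP sentinel and the doubling "work" are assumptions made up for the example; any unique object and any real task would do.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel meaning "no more work"


def consumer(tasks, results):
    while True:
        item = tasks.get()  # blocks without spinning until something arrives
        if item is STOP:
            break
        results.append(item * 2)  # stand-in for the real task


tasks = queue.Queue()
results = []
thread = threading.Thread(target=consumer, args=(tasks, results))
thread.start()
for number in (1, 2, 3):
    tasks.put(number)
tasks.put(STOP)  # ask the consumer to finish once the backlog is done
thread.join()
```

The thread is idle inside get() whenever there is nothing to do, so there is no sleep interval to tune.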
If you are using something like time.sleep, you could set a variable to False and change it to True just after the thread "awakes", for instance:
    import time
    from threading import Thread

    class MyThread(Thread):
        def __init__(self, minutes):
            super().__init__()
            self.minutes = minutes
            self.awake = False

        def run(self):
            time.sleep(self.minutes * 60)
            self.awake = True  # reached only after the sleep has finished

        def is_sleeping(self):
            # is_alive() is False until start() is called, so an unstarted thread is not "sleeping"
            return not self.awake and self.is_alive()
then you could:
    if some_thread.is_sleeping():
        pass  # more code here.
Note: As pointed out by Luke Wahlmeier, once you call thread.start() the thread is running. If it hits a line such as time.sleep(10) it is still running, but is now counting down 10 seconds. What this code does is check whether the thread has reached a certain execution point (the line after the sleep in this case).
This is more out of theoretical curiosity than an actual problem I am having.
Say you want to run some code at a regular interval, what are the pros and cons of using a Timer vs using a thread + time.sleep in terms of CPU consumption?
The two approaches below do the same thing. I am aware that the Thread approach does not give exactly a one-second interval, but rather adds a delay after each execution, which can matter if the task_function operation takes a long time. I am also aware that there are many other ways to solve this problem, but let's focus on the threading package.
Timer approach
    import threading
    import time

    def task_function():
        print(time.time())

    def task():
        task_function()
        threading.Timer(1, task).start()  # schedule the next run on a new thread

    task()
Thread approach
    def task_function():
        while True:
            print(time.time())
            time.sleep(1)

    threading.Thread(target=task_function).start()
I read somewhere that starting a thread is quite resource intensive. So I wonder that if you had some code you wanted to run every 0.1 seconds, would the Timer approach not be sub-optimal since a new thread has to be started so often?
If the code must repeat on an interval, use the plain Thread (to be clear, Timer is just a thin wrapper around a Thread in the first place; it's implemented as a subclass). Spawning a new thread (via Timer) 10x a second is wasteful, and gains you nothing in any event.
You should make the worker thread a daemon thread though, unless you really want it to keep the process alive indefinitely.
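Something along these lines is what I mean: one long-lived daemon thread that loops, rather than a new Timer per tick. This is only a sketch; run_every, stop_event and ticks are names invented for the example, and Event.wait is used in place of time.sleep so the loop can also be stopped promptly.

```python
import threading
import time


def run_every(interval, func, stop_event):
    # Event.wait() acts as an interruptible sleep: it returns True as soon
    # as stop_event is set, and False when the timeout expires.
    while not stop_event.wait(interval):
        func()


stop_event = threading.Event()
ticks = []
worker = threading.Thread(
    target=run_every,
    args=(0.01, lambda: ticks.append(time.time()), stop_event),
    daemon=True,  # will not keep the process alive on its own
)
worker.start()
time.sleep(0.1)   # let it tick a few times
stop_event.set()  # ask the loop to end
worker.join()
```

Like the time.sleep version, this adds the delay after each call rather than holding an exact period.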
I have a fairly simple program that each task added into the taskq is executing and computing something, say for 30 seconds. This task is 'not' running in some kind of while or for loop.
    def run(self):
        while not self.stopper.is_set():
            DO_MY_30_SECONDS_WORK(self)
            self.task_done()
Now, assume I have a threading.Event that the thread can check before or after each task. Is there a way to tell the already running thread to stop or exit its execution?
There's no way to stop your running thread while DO_MY_30_SECONDS_WORK(self) is blocking. Arguably you could make it a daemon thread so that it is abruptly killed when your main program finishes, but this causes problems if the thread is holding resources (e.g. writing to a file), and it is generally not a good way to end a thread.
What you could do is re-design DO_MY_30_SECONDS_WORK(self) to be non-blocking, which means cutting the work into small pieces and making it check for the stop flag at a reasonable interval, so that your thread is responsive enough to finish itself when you tell it to.
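To illustrate the "small pieces" idea, here is a sketch in which the long task is a loop of short slices with a flag check between them. The function name, the slice count and the done list are placeholders; the real work would replace the append.

```python
import threading


def do_work_in_slices(stopper, total_slices, done):
    """Run up to `total_slices` small steps, checking the stop flag between them.

    Returns True if every slice ran, False if the work was abandoned early.
    """
    for index in range(total_slices):
        if stopper.is_set():
            return False
        done.append(index)  # stand-in for one short piece of the real work
    return True


stopper = threading.Event()
finished = do_work_in_slices(stopper, 5, [])
```

The shorter each slice, the faster the thread reacts to the stop request, at the cost of checking the flag more often.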
I have this python threading code.
    import threading

    def sum(value):
        sum = 0
        for i in range(value+1):
            sum += i
        print("I'm done with %d - %d\n" % (value, sum))
        return sum

    r = range(500001, 500000*2, 100)

    ts = []
    for u in r:
        t = threading.Thread(target=sum, args=(u,))
        ts.append(t)
        t.start()

    for t in ts:
        t.join()
Executing this, I have hundreds of threads working.
However, when I move the t.join() right after the t.start(), I have only two threads working.
    for u in r:
        t = threading.Thread(target=sum, args=(u,))
        ts.append(t)
        t.start()
        t.join()
I also tested the code without calling t.join() at all, and it seems to work fine.
So when and how should I use thread.join()?
You seem to misunderstand what Thread.join does. When you call join, the current thread blocks until that thread finishes. So you are waiting for each thread to finish, which prevents you from starting any other thread in the meantime.
The idea behind join is to wait for other threads before continuing. In your case, you want to wait for all threads to finish at the end of the main program. If you didn't, the main thread would carry on without knowing that the work is done; the interpreter still waits for non-daemon threads before the process exits, but daemon threads would be killed at that point. So usually you should have a loop at the end that joins all created threads, so that the main thread does not move on or exit early.
Short answer: this one:
    for t in ts:
        t.join()
is generally the idiomatic way to wait for a small number of threads. Calling .join means that your main thread waits until the given thread finishes before proceeding. You generally do this after you've started all of the threads.
Longer answer:
len(list(range(500001, 500000*2, 100)))
Out[1]: 5000
You're trying to start 5000 threads at once. It's miraculous your computer is still in one piece!
Your method of .join-ing in the loop that dispatches workers is never going to be able to have more than 2 threads (i.e. only one worker thread) going at once. Your main thread has to wait for each worker thread to finish before moving on to the next one. You've prevented a computer-meltdown, but your code is going to be WAY slower than if you'd just never used threading in the first place!
At this point I'd talk about the GIL, but I'll put that aside for the moment. What you need to limit your thread creation to a reasonable limit (i.e. more than one, less than 5000) is a ThreadPool. There are various ways to do this. You could roll your own - this is fairly simple with a threading.Semaphore. You could use 3.2+'s concurrent.futures package. You could use some 3rd party solution. Up to you, each is going to have a different API so I can't really discuss that further.
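For the concurrent.futures route, a minimal sketch might look like the following. The pool size of 4 is an arbitrary choice, total_up_to is a renamed version of the question's function, and the input range is scaled down from the question's so the example finishes quickly.

```python
from concurrent.futures import ThreadPoolExecutor


def total_up_to(value):
    return sum(range(value + 1))


values = list(range(1001, 2000, 100))  # scaled-down inputs for illustration

# At most 4 worker threads exist at any time, however many inputs there are.
with ThreadPoolExecutor(max_workers=4) as pool:
    totals = list(pool.map(total_up_to, values))
```

Leaving the with block waits for all submitted work, so no explicit join loop is needed, and pool.map returns results in input order.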
Obligatory GIL Discussion
CPython programmers have to live with the GIL. The Global Interpreter Lock, in short, means that only one thread can be executing Python bytecode at once. This means that on processor-bound tasks (like adding a bunch of numbers), threading will not result in any speed-up. In fact, the overhead involved in setting up and tearing down threads (not to mention context switching) will result in a slowdown. Threading is better positioned to provide gains on I/O-bound tasks, such as retrieving a bunch of URLs.
multiprocessing and friends sidestep the GIL limitation by, well, using multiple processes. This isn't free: data transfer between processes is expensive, so a lot of care needs to be taken not to write workers that depend on shared state.
join() waits for your thread to finish, so the first version starts all the threads and then waits for all of them to finish. The second version waits for each thread to end before it launches the next one, which rather defeats the purpose of threading.
The first use makes most sense. You run the threads (all of them) to do some parallel computation, and then wait until all of them finish, before you move on and use the results, to make sure the work is done (i.e. the results are actually there).
My Python script creates a lot of threads. They are all daemon threads, and I find that I get an error saying "out of memory".
How do I kill a daemon thread whilst my script/application is running?
I understand the concept of daemon threads, that they destroy themselves when my process(script or application) closes/finishes. But I want to kill some of my daemon threads whilst my script is still running to avoid the "out of memory" error.
Will my thread below kill itself when there are no more tasks in the queue?
    class ParsePageThread(threading.Thread):
        THREAD_NUM = 0

        def __init__(self, _queue):
            threading.Thread.__init__(self)
            self.queue = _queue

        def run(self):
            while(True):
                try:
                    url = self.queue.get()
                except Queue.Empty,e:
                    return # WILL this kill the thread?
                finally:
                    self.queue.task_done()
I'll answer your second question first because it is easier. Yes, returning from the run method will indeed stop the thread. A detailed explanation is in the Thread Objects section of the threading documentation.
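One caveat about the code in the question: get() with no timeout blocks until an item arrives, so the Empty branch is never reached, and the finally clause would call task_done() even when nothing was fetched. A sketch of a version that does exit when the queue stays empty, written with Python 3 names, could look like this. The idle_timeout parameter and the seen list are assumptions added for the example.

```python
import queue
import threading


class ParsePageThread(threading.Thread):
    def __init__(self, tasks, idle_timeout=0.1):
        super().__init__(daemon=True)
        self.tasks = tasks
        self.idle_timeout = idle_timeout  # hypothetical: how long to wait for work
        self.seen = []

    def run(self):
        while True:
            try:
                url = self.tasks.get(timeout=self.idle_timeout)
            except queue.Empty:
                return  # returning from run() ends the thread
            try:
                self.seen.append(url)  # stand-in for parsing the page
            finally:
                self.tasks.task_done()  # only for items actually fetched
```

With this shape, each thread ends on its own once no work has arrived for idle_timeout seconds.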
To stop a thread that is running before its natural completion you have to get a little more creative. There is no direct kill method on a thread object. What you need to do is use a shared variable to define the state of the thread.
    alive = True

    class MyThread(threading.Thread):
        def run(self):
            while alive:
                pass  # do work here
In some other piece of code, when you detect a condition for stopping that thread, the other thread simply sets alive to False:
alive = False
This is a simple example, I'll leave it to you to scale to multiple threads.
DANGER
This example works because reading and setting a boolean variable are atomic actions in python because of the Global Interpreter Lock. Here is an excellent tutorial for lower level python threading. You should stick to using the Queue object because that's exactly what it's for.
If you do anything more than reading and setting simple variables from multiple threads you should use Locks or alternatively Reentrant Locks depending on your design and needs. Even something as simple as a compare and swap without a lock can cause problems in your program that are very difficult to debug.
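As a small illustration of the point about locks, here is a counter whose increment is a read-modify-write and is therefore guarded. Counter and hammer are names made up for the sketch.

```python
import threading


class Counter:
    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # "value += 1" is a read, an add and a write; the lock makes the
        # three steps happen as one unit with respect to other threads.
        with self._lock:
            self.value += 1


def hammer(counter, times):
    for _ in range(times):
        counter.increment()


counter = Counter()
threads = [threading.Thread(target=hammer, args=(counter, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

A threading.RLock would be the choice instead if increment could be called again by code that already holds the lock.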
Another piece of advice for Python multithreading is to never do any significant work in the main (interpreter) thread. It should set up and start all the other threads and then sleep or wait on a condition object until the program exits. The reason is that only the main thread receives operating system signals, so no other thread can deal with Ctrl+C, i.e. the KeyboardInterrupt exception. It can be good practice to have the main thread handle the KeyboardInterrupt exception and then set all the alive variables to False so you can exit your program quickly. This is especially helpful while developing, so you don't have to constantly kill things when you make a mistake.
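A sketch of that main-thread pattern, using a threading.Event in place of the alive variables. The supervise function and its idle parameter are inventions for the example (idle exists only so the wait can be swapped out, e.g. in a test); call supervise from the main thread.

```python
import threading
import time


def worker(stop):
    while not stop.is_set():
        stop.wait(0.05)  # stand-in for a short unit of work


def supervise(threads, stop, idle=time.sleep):
    """Main-thread loop: idle until interrupted, then shut the workers down."""
    try:
        while any(t.is_alive() for t in threads):
            idle(0.1)
    except KeyboardInterrupt:
        pass  # Ctrl+C is delivered here because this runs in the main thread
    finally:
        stop.set()  # tell every worker to finish
        for t in threads:
            t.join()
```

Typical use would be to create the Event, start the worker threads, and then call supervise(threads, stop) as the last thing the main thread does.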
I am trying to write a unix client program that is listening to a socket, stdin, and reading from file descriptors. I assign each of these tasks to an individual thread and have them successfully communicating with the "main" application using synchronized queues and a semaphore. The problem is that when I want to shutdown these child threads they are all blocking on input. Also, the threads cannot register signal handlers in the threads because in Python only the main thread of execution is allowed to do so.
Any suggestions?
There is no good way to work around this, especially when the thread is blocking.
I had a similar issue (Python: How to terminate a blocking thread) and the only way I was able to stop my threads was to close the underlying connection. This caused the blocking call in the thread to raise an exception, which then allowed the thread to check the stop flag and exit.
Example code:
    class Example(object):
        def __init__(self):
            self.stop = threading.Event()
            self.connection = Connection()
            self.mythread = Thread(target=self.dowork)
            self.mythread.start()

        def dowork(self):
            while not self.stop.is_set():
                try:
                    blockingcall()
                except CommunicationException:
                    pass

        def terminate(self):
            self.stop.set()
            self.connection.close()
            self.mythread.join()
Another thing to note is that blocking operations commonly offer a timeout. If you have that option I would consider using it. My last comment is that you could always make the thread a daemon thread.
From the pydoc:
A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the daemon property.
Also, the threads cannot register signal handlers
Using signals to kill threads is potentially horrible, especially in C, and especially if you allocate memory as part of the thread, since it won't be freed when that particular thread dies (it belongs to the heap of the process). There is no garbage collection in C, so if the pointer goes out of scope the memory simply remains allocated. So be careful with that one: only do it that way in C if you're going to kill all the threads and end the process, so that the memory is handed back to the OS. Adding and removing threads from a thread pool this way, for example, will give you a memory leak.
The problem is that when I want to shutdown these child threads they are all blocking on input.
Funnily enough I've been fighting with the same thing recently. The solution is literally don't make blocking calls without a timeout. So, for example, what you want ideally is:
    def threadfunc(running):
        while running:
            blockingcall(timeout=1)
where running is passed from the controlling thread - I've never used threading but I have used multiprocessing and with this you actually need to pass an Event() object and check is_set(). But you asked for design patterns, that's the basic idea.
Then, when you want this thread to end, you run:
running.clear()
mythread.join()
and your main thread should then allow your client thread to handle its last call, and return, and the whole program folds up nicely.
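Spelled out with the threading module, the pattern might look roughly like this. It is a sketch under assumptions: queue.Queue.get(timeout=...) stands in for whatever blocking call you really have, and inbox and handled are names invented for the example.

```python
import queue
import threading


def threadfunc(running, inbox, handled):
    while running.is_set():
        try:
            item = inbox.get(timeout=0.1)  # blocking call with a timeout
        except queue.Empty:
            continue  # nothing arrived; loop round and re-check the flag
        handled.append(item)


running = threading.Event()
running.set()
inbox = queue.Queue()
handled = []
mythread = threading.Thread(target=threadfunc, args=(running, inbox, handled))
mythread.start()

# Later, to shut down:
running.clear()
mythread.join()
```

The timeout bounds how long the thread can take to notice the cleared flag, so shorter timeouts mean a quicker shutdown and a little more polling.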
What do you do if you have a blocking call without a timeout? Use the asynchronous option, and sleep (as in call whatever method you have to suspend the thread for a period of time so you're not spinning) if you need to. There's no other way around it.
See these answers:
Python SocketServer
How to exit a multithreaded program?
Basically, don't block on recv() by using select() with a timeout to check for readability of the socket, and poll a quit flag when select() times out.
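A rough sketch of that select()-with-timeout loop follows. The reader function, the 0.1 second timeout and the 4096-byte buffer are arbitrary choices for illustration, and a threading.Event plays the part of the quit flag.

```python
import select
import socket
import threading


def reader(sock, quit_flag, received):
    while not quit_flag.is_set():
        # Wait at most 0.1 s for the socket to become readable.
        readable, _, _ = select.select([sock], [], [], 0.1)
        if not readable:
            continue  # timed out: loop round and re-check the quit flag
        data = sock.recv(4096)  # select reported the socket as readable
        if not data:
            break  # peer closed the connection
        received.append(data)
```

To stop the thread, set quit_flag from the main thread and join; the reader notices within one select timeout.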