Which is more efficient? threading.Thread vs threading.Timer

Which is more efficient? threading.Thread vs threading.Timer - python

This is more out of theoretical curiosity than an actual problem I am having.
Say you want to run some code at a regular interval, what are the pros and cons of using a Timer vs using a thread + time.sleep in terms of CPU consumption?
The two below approaches do the same. I am aware that the Thread approach is not exactly one second interval, but rather adds a delay after each execution, which can matter if the task_function operation takes a long time. I am also aware that there are many other ways to solve this problem, but lets focus on the threading package.
Timer approach
def task_function():
print(time.time())
def task():
task_function()
threading.Timer(1,task).start()
task()
Thread approach
def task_function():
while True:
print(time.time())
time.sleep(1)
threading.Thread(target=task_function).start()
I read somewhere that starting a thread is quite resource intensive. So I wonder that if you had some code you wanted to run every 0.1 seconds, would the Timer approach not be sub-optimal since a new thread has to be started so often?

If the code must repeat on an interval, use the plain Thread (to be clear, Timer is just a thin wrapper around a Thread in the first place; it's implemented as a subclass). Spawning a new thread (via Timer) 10x a second is wasteful, and gains you nothing in any event.
You should make the worker thread a daemon thread though, unless you really want it to keep the process alive indefinitely.

Related

Is it better to deadlock main thread instead of busy waiting?

I have a main thread that starts multiple deamon threads listening to filesystem events.
I need to keep the main thread alive. To do so I can use a while loop, or "deadlock" it. Which is better, performance wise?
while True:
time.sleep(1)
or
from threading import Event
Event().wait()
I can interrupt the main thread using ctrl+c in both cases.
EDIT: Or is there a better way?

With time.sleep(delay) you will have to wait until the end of the sleep time to respond to the event, so the responsiveness of your code depends on the delay time. While with Event().wait() event management your application should be much more responsive as it will immediately respond to the external stimulus without waiting for the end of the delay. On the other hand, however, this also means that the managed ad will have to acquire / release the GIL much more frequently than the time.sleep(delay) that will release the GIL for the delay time. How does this affect performance?
Depending on the type of application, if you have a lot of active threads, you can probably see some small differences. This problem was particularly evident in Python 2x or earlier versions, in Python 3x these functions are impoverished at low level and the problem is much less obvious.
If you are interested in learning more about the subject, here you will find the C implementation of the function to acquire the lock using python3.
I hope I have answered your question fully.

When, why, and how to call thread.join() in Python?

I have this python threading code.
import threading
def sum(value):
sum = 0
for i in range(value+1):
sum += i
print "I'm done with %d - %d\n" % (value, sum)
return sum
r = range(500001, 500000*2, 100)
ts = []
for u in r:
t = threading.Thread(target=sum, args = (u,))
ts.append(t)
t.start()
for t in ts:
t.join()
Executing this, I have hundreds of threads are working.
However, when I move the t.join() right after the t.start(), I have only two threads working.
for u in r:
t = threading.Thread(target=sum, args = (u,))
ts.append(t)
t.start()
t.join()
I tested with the code that does not invoke the t.join(), but it seems to work fine?
Then when, how, and how to use thread.join()?

You seem to not understand what Thread.join does. When calling join, the current thread will block until that thread finished. So you are waiting for the thread to finish, preventing you from starting any other thread.
The idea behind join is to wait for other threads before continuing. In your case, you want to wait for all threads to finish at the end of the main program. Otherwise, if you didn’t do that, and the main program would end, then all threads it created would be killed. So usually, you should have a loop at the end, that joins all created threads to prevent the main thread from exiting down early.

Short answer: this one:
for t in ts:
t.join()
is generally the idiomatic way to start a small number of threads. Doing .join means that your main thread waits until the given thread finishes before proceeding in execution. You generally do this after you've started all of the threads.
Longer answer:
len(list(range(500001, 500000*2, 100)))
Out[1]: 5000
You're trying to start 5000 threads at once. It's miraculous your computer is still in one piece!
Your method of .join-ing in the loop that dispatches workers is never going to be able to have more than 2 threads (i.e. only one worker thread) going at once. Your main thread has to wait for each worker thread to finish before moving on to the next one. You've prevented a computer-meltdown, but your code is going to be WAY slower than if you'd just never used threading in the first place!
At this point I'd talk about the GIL, but I'll put that aside for the moment. What you need to limit your thread creation to a reasonable limit (i.e. more than one, less than 5000) is a ThreadPool. There are various ways to do this. You could roll your own - this is fairly simple with a threading.Semaphore. You could use 3.2+'s concurrent.futures package. You could use some 3rd party solution. Up to you, each is going to have a different API so I can't really discuss that further.
Obligatory GIL Discussion
cPython programmers have to live with the GIL. The Global Interpreter Lock, in short, means that only one thread can be executing python bytecode at once. This means that on processor-bound tasks (like adding a bunch of numbers), threading will not result in any speed-up. In fact, the overhead involved in setting up and tearing down threads (not to mention context switching) will result in a slowdown. Threading is better positioned to provide gains on I/O bound tasks, such as retrieving a bunch of URLs.
multiprocessing and friends sidestep the GIL limitation by, well, using multiple processes. This isn't free - data transfer between processes is expensive, so a lot of care needs to be made not to write workers that depend on shared state.

join() waits for your thread to finish, so the first use starts a hundred threads, and then waits for all of them to finish. The second use wait for end of every thread before it launches another one, which kind of defeats the purpose of threading.
The first use makes most sense. You run the threads (all of them) to do some parallel computation, and then wait until all of them finish, before you move on and use the results, to make sure the work is done (i.e. the results are actually there).

threadable delay in python 2.7

I'm currently using python (2.7) to write a GUI that has some threads going on. I come across a point that I need to do a roughly about a second delay before getting a piece of information, but I can't afford to have the function takes more than a few millisecond to run. With that in mind, I'm trying to create a Threaded timer that will set a flag timer.doneFlag and have the main function to keep poking to see whether it's done or not.
It is working. But not all the time. The problem that I run into is that sometimes I feel like the time.sleep function in run , doesn't wait fully for a second (sometimes it may not even wait). All I need is that I can have a flag that allow me control the start time and raise the flag when it reaches 1 second.
I maybe doing too much just to get a delay that is threadable, if you can suggest something, or help me find a bug in the following code, I'd be very grateful!
I've attached a portion of the code I used:
from main program:
class dataCollection:
def __init__(self):
self.timer=Timer(5)
self.isTimerStarted=0
return
def StateFunction(self): #Try to finish the function within a few milliseconds
if self.isTimerStarted==0:
self.timer=Timer(1.0)
self.timer.start()
self.isTimerStarted=1
if self.timer.doneFlag:
self.timer.doneFlag=0
self.isTimerStarted=0
#and all the other code
import time
import threading
class Timer(threading.Thread):
def __init__(self, seconds):
self.runTime = seconds
self.doneFlag=0
threading.Thread.__init__(self)
def run(self):
time.sleep(self.runTime)
self.doneFlag=1
print "Buzzzz"
x=dataCollection()
while 1:
x.StateFunction()
time.sleep(0.1)

First, you've effectively rebuilt threading.Timer with less flexibility. So I think you're better off using the existing class. (There are some obvious downsides with creating a thread for each timer instance. But if you just want a single one-shot timer, it's fine.)
More importantly, having your main thread repeatedly poll doneFlag is probably a bad idea. This means you have to call your state function as often as possible, burning CPU for no good reason.
Presumably the reason you have to return within a few milliseconds is that you're returning to some kind of event loop, presumably for your GUI (but, e.g., a network reactor has the same issue, with the same solutions, so I'll keep things general).
If so, almost all such event loops have a way to schedule a timed callback within the event loop—Timer in wx, callLater in twisted, etc. So, use that.
If you're using a framework that doesn't have anything like that, it hopefully at least has some way to send an event/fire a signal/post a message/whatever it's called from outside. (If it's a simple file-descriptor-based reactor, it may not have that, but you can add it yourself just by tossing a pipe into the reactor.) So, change your Timer callback to signal the event loop, instead of writing code that polls the Timer.
If for some reason you really do need to poll a variable shared across threads, you really, really, should be protecting it with a Condition or RLock. There is no guarantee in the language that, when thread 0 updates the value, thread 1 will see the new value immediately, or even ever. If you understand enough of the internals of (a specific version of) CPython, you can often prove that the GIL makes a lock unnecessary in specific cases. But otherwise, this is a race.
Finally:
The problem that I run into is that sometimes I feel like the time.sleep function in run , doesn't wait fully for a second (sometimes it may not even wait).
Well, the documentation clearly says this can happen:
The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine.
So, if you need a guarantee that it actually sleeps for at least 1 second, the only way to do this is something like this:
t0 = time.time()
dur = 1.0
while True:
time.sleep(dur)
t1 = time.time()
dur = 1.0 - (t1 - t0)
if dur <= 0:
break

Resource usage of "time.sleep" in loop vs. "threading.Timer"

First method:
import threading
import time
def keepalive():
while True:
print 'Alive.'
time.sleep(200)
threading.Thread(target=keepalive).start()
Second method:
import threading
def keepalive():
print 'Alive.'
threading.Timer(200, keepalive).start()
threading.Timer(200, keepalive).start()
Which method takes up more RAM? And in the second method, does the thread end after being activated? or does it remain in the memory and start a new thread? (multiple threads)

Timer creates a new thread object for each started timer, so it certainly needs more resources when creating and garbage collecting these objects.
As each thread exits immediately after it spawned another active_count stays constant, but there are constantly new Threads created and destroyed, which causes overhead. I'd say the first method is definitely better.
Altough you won't realy see much difference, only if the interval is very small.

Here's an example of how to test this yourself:
And in the second method, does the thread end after being activated? or does it remain in the memory and start a new thread? (multiple threads)
import threading
def keepalive():
print 'Alive.'
threading.Timer(200, keepalive).start()
print threading.active_count()
threading.Timer(200, keepalive).start()
I also changed the 200 to .2 so it wouldn't take as long.
The thread count was 3 forever.
Then I did this:
top -pid 24767
The #TH column never changed.
So, there's your answer: We don't have enough info to know whether Python maintains a single timer thread for all of the timers, or ends and cleans up the thread as soon as the timer runs, but we can be sure the threads doesn't stick around and pile up. (If you do want to know which of the former is happening, you can, e.g., print the thread ids.)
An alternative way to find out is to look at the source. As the documentation says, "Timer is a subclass of Thread and as such also functions as an example of creating custom threads". The fact that it's a subclass of Thread already tells you that each Timer is a Thread. And the fact that it "functions as an example" implies that it ought to be easy to read. If you click the link form the documentation to the source, you can see how trivial it is. Most of the work is done by Event, but that's in the same source file, and it's almost as simple. Effectively, it just creates a condition variable, waits on it (so it blocks until it times out, or you notify the condition by calling cancel), then quits.
The reason I'm answering one sub-question and explaining how I did it, rather than answering each sub-question, is because I think it would be more useful for you to walk through the same steps.
On further reflection, this probably isn't a question to be decided by optimization in the first place:
If you have a simple, synchronous program that needs to do nothing for 200 seconds, make a blocking call to sleep. Or, even simpler, just do the job and quit, and pick an external tool to schedule your script to run every 200s.
On the other hand, if your program is inherently asynchronous—especially if you've already got thread, signal handlers, and/or an event loop—there's just no way you're going to get sleep to work. If Timer is too inefficient, go to PyPI or ActiveState and find a better timer that lets you schedule repeatable timers (or even multiple timers) with a single instance and thread. (Or, if you're using signals, use signal.alarm or setitimer, and if you're using an event loop, build the timer into your main loop.)
I can't think of any use case where sleep and Timer would both be serious contenders.

Python: set a function timeout without using signal or threads?

Is there a way to have a function raise an error if it takes longer than a certain amount of time to return? I want to do this without using signal (because I am not in the main thread) or by spawning more threads, which is cumbersome.

If your function is looping through a lot of things, you could check the elapsed time during each iteration of the loop... but if it's blocked on something for the long period, then you need to have some other thread which can be handling the timing stuff while the thread you're timing is blocked.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.