First method:
import threading
import time

def keepalive():
    while True:
        print('Alive.')
        time.sleep(200)

threading.Thread(target=keepalive).start()
Second method:
import threading

def keepalive():
    print('Alive.')
    threading.Timer(200, keepalive).start()

threading.Timer(200, keepalive).start()
Which method takes up more RAM? And in the second method, does the thread end after being activated? or does it remain in the memory and start a new thread? (multiple threads)
Timer creates a new thread object for each started timer, so it certainly needs more resources when creating and garbage collecting these objects.
As each thread exits immediately after spawning the next one, active_count stays constant, but new Threads are constantly being created and destroyed, which causes overhead. I'd say the first method is definitely better.
Although you won't really see much difference unless the interval is very small.
Here's an example of how to test this yourself:
And in the second method, does the thread end after being activated? or does it remain in the memory and start a new thread? (multiple threads)
import threading

def keepalive():
    print('Alive.')
    threading.Timer(200, keepalive).start()
    print(threading.active_count())

threading.Timer(200, keepalive).start()
I also changed the 200 to .2 so it wouldn't take as long.
The thread count was 3 forever.
Then I did this:
top -pid 24767
The #TH column never changed.
So, there's your answer: We don't have enough info to know whether Python maintains a single timer thread for all of the timers, or ends and cleans up the thread as soon as the timer runs, but we can be sure the threads don't stick around and pile up. (If you do want to know which of the two is happening, you can, e.g., print the thread ids.)
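One way to run that check is sketched below. It is only an illustration in Python 3 syntax: the short interval, the run limit, and the `names` and `done` helpers are assumptions added for the demo. It records thread names rather than ids, because an id can be reused once a thread has exited.

```python
import threading

INTERVAL = 0.05   # short interval so the demo finishes quickly (assumption)
MAX_RUNS = 3      # stop after a few firings instead of running forever

names = []                 # name of the thread each firing ran on
done = threading.Event()   # set once the last firing has happened

def keepalive():
    # Record which thread is executing this firing.
    names.append(threading.current_thread().name)
    if len(names) < MAX_RUNS:
        threading.Timer(INTERVAL, keepalive).start()
    else:
        done.set()

threading.Timer(INTERVAL, keepalive).start()
done.wait(timeout=5)
print(names)
```

If every entry in `names` is different, each firing ran on its own freshly created thread.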
An alternative way to find out is to look at the source. As the documentation says, "Timer is a subclass of Thread and as such also functions as an example of creating custom threads". The fact that it's a subclass of Thread already tells you that each Timer is a Thread. And the fact that it "functions as an example" implies that it ought to be easy to read. If you click the link from the documentation to the source, you can see how trivial it is. Most of the work is done by Event, but that's in the same source file, and it's almost as simple. Effectively, it just creates a condition variable, waits on it (so it blocks until it times out, or you notify the condition by calling cancel), then quits.
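The behaviour described above can be sketched as a simplified stand-in for Timer. This is not the standard library source, and the class name `SimpleTimer` is hypothetical. It only shows the wait-then-quit shape in Python 3 syntax:

```python
import threading

class SimpleTimer(threading.Thread):
    """Simplified stand-in for threading.Timer (illustrative only)."""

    def __init__(self, interval, function):
        super().__init__()
        self.interval = interval
        self.function = function
        self.finished = threading.Event()

    def cancel(self):
        # Wakes run() up early; the function is then skipped.
        self.finished.set()

    def run(self):
        # Block until the interval elapses or cancel() sets the event.
        self.finished.wait(self.interval)
        if not self.finished.is_set():
            self.function()
        self.finished.set()
        # run() returns here, so the thread ends.

fired = []
t = SimpleTimer(0.01, lambda: fired.append("ran"))
t.start()
t.join()

cancelled = SimpleTimer(60, lambda: fired.append("should not run"))
cancelled.start()
cancelled.cancel()
cancelled.join()
```

Because `run()` returns after a single wait, each timer thread exits on its own once it has fired or been cancelled.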
The reason I'm answering one sub-question and explaining how I did it, rather than answering each sub-question, is because I think it would be more useful for you to walk through the same steps.
On further reflection, this probably isn't a question to be decided by optimization in the first place:
If you have a simple, synchronous program that needs to do nothing for 200 seconds, make a blocking call to sleep. Or, even simpler, just do the job and quit, and pick an external tool to schedule your script to run every 200s.
On the other hand, if your program is inherently asynchronous, especially if you've already got threads, signal handlers, and/or an event loop, there's just no way you're going to get sleep to work. If Timer is too inefficient, go to PyPI or ActiveState and find a better timer that lets you schedule repeatable timers (or even multiple timers) with a single instance and thread. (Or, if you're using signals, use signal.alarm or setitimer, and if you're using an event loop, build the timer into your main loop.)
I can't think of any use case where sleep and Timer would both be serious contenders.
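For reference, a repeatable timer that uses a single instance and thread could look roughly like the sketch below. The class name `RepeatingTimer` and the `calls` and `three_calls` helpers are hypothetical, and the interval is shortened for the demo. It is Python 3 syntax and not a replacement for a tested library.

```python
import threading

class RepeatingTimer(threading.Thread):
    """Call `function` every `interval` seconds on one thread until stopped."""

    def __init__(self, interval, function):
        super().__init__(daemon=True)
        self.interval = interval
        self.function = function
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def run(self):
        # wait() returns False on timeout (run the job) and True once stopped.
        while not self._stop_event.wait(self.interval):
            self.function()

calls = []
three_calls = threading.Event()

def keepalive():
    calls.append("Alive.")
    if len(calls) >= 3:
        three_calls.set()

timer = RepeatingTimer(0.01, keepalive)
timer.start()
three_calls.wait(timeout=5)
timer.stop()
timer.join(timeout=5)
```

Waiting on an Event instead of calling sleep means `stop()` takes effect immediately rather than after the current interval.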
I have a program that uses threads to start another thread once a certain threshold is reached. Right now the second thread is being started multiple times. I implemented a lock but I don't think I did it right.
for i in range(max_threads):
    t1 = Thread(target=grab_queue)
    t1.start()
in grab_queue, I have:
...
rows.append(resultJson)
if len(rows.value()) >= 250:
    with Lock():
        row_thread = Thread(target=insert_rows, kwargs={'rows': rows.value()})
        row_thread.start()
        rows.reset()
Which starts another thread to process the list of rows. I would like to make sure that as soon as one thread hits the if condition, the other threads won't run, so that extra threads to process the list of rows aren't started.
Your lock is covering the wrong portion of the code. You have a race condition between the check for the size of rows, and the portion of the code where you reset the rows. Given that the lock is taken only after the size check, two threads could easily both decide that the array has grown too large, and only then would the lock kick in to serialize the resetting of the array. "Serialize" in this case means that the task would still be performed twice, once by each thread, but it would happen in succession rather than in parallel.
The correct code could look like this:
rows.append(resultJson)
with grow_lock:
    if len(rows.value()) >= 250:
        row_thread = Thread(target=insert_rows, kwargs={'rows': rows.value()})
        row_thread.start()
        rows.reset()
There is another issue with the code as shown in the question: if Lock() refers to threading.Lock, it is creating and locking a new lock on each invocation, and in each thread! A lock protects a resource shared among threads, and to perform that function, the lock must itself be shared. To fix the problem, instantiate the lock once and pass it to the thread's target function.
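A minimal sketch of sharing one lock might look like this (Python 3 syntax). The plain `rows` list, the `batches` list standing in for `insert_rows`, and the item counts are all assumptions made for illustration. Because `rows` is an ordinary list here, the append is placed under the lock as well.

```python
import threading

BATCH_SIZE = 250
rows = []       # shared buffer (a plain list here, so appends are locked too)
batches = []    # stands in for handing the rows to insert_rows

def grab_queue(lock, n_items):
    for item in range(n_items):
        with lock:                          # the same lock object in every thread
            rows.append(item)
            if len(rows) >= BATCH_SIZE:
                batches.append(list(rows))  # snapshot the full batch
                del rows[:]                 # reset while still holding the lock

grow_lock = threading.Lock()                # instantiated exactly once
threads = [threading.Thread(target=grab_queue, args=(grow_lock, 500))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the check and the reset inside the same critical section, only one thread can ever see a given full batch.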
Taking a step back, your code implements a custom thread pool. Getting that right and covering all the corner cases takes a lot of work, testing, and debugging. There are production-tested modules specialized for that purpose, such as the multiprocessing module shipped with Python (which supports both process and thread pools), and it is a good idea to get acquainted with them before reimplementing their functionality. See, for example, this article for an accessible introduction to multiprocessing-based thread pools.
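As a rough illustration of the thread pools in multiprocessing, the sketch below uses `multiprocessing.pool.ThreadPool` (Python 3 syntax). The `fetch` function is a hypothetical placeholder for the real per-item work.

```python
from multiprocessing.pool import ThreadPool

def fetch(n):
    # Placeholder for real work, such as the body of grab_queue.
    return n * n

# Four worker threads; map() distributes the inputs and keeps result order.
with ThreadPool(processes=4) as pool:
    results = pool.map(fetch, range(10))
```

The pool owns the worker threads and the internal queue, so none of the locking above has to be written by hand.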
I have a Python app that runs a pinball machine. It needs to run at a fairly consistent loop rate to do pinball-type things, but I also need to be able to load images and sounds at various points throughout the games. I don't have enough memory to pre-load all the sound files I need for the entire game, so I want to use an additional thread (or threads) to load them in the background while the main game loop continues on.
Using Python's threading module is easy enough, as is using a Queue.Queue to maintain a list of assets that need to load. My question is whether it's "ok" (for lack of a better word) to have the asset loader thread always running, or whether I should just create the thread when I need it and then let it end when I'm done. In my case the pinball machine (and my Python app) will be on and running for many hours (or days) at a time.
All of the examples of Python threading I've found tend to be for apps that do something and then end, versus creating (potentially) temporary threads for a long-running app.
In my case I think I have two options:
Option 1, where the loader thread runs forever:
self.loader_queue = Queue.Queue()

def loader_thread(self):
    while True:
        do_my_work(self.loader_queue.get())
Option 2, where the loader thread ends when the queue is empty:
def loader_thread(self):
    while not self.loader_queue.empty():
        do_my_work(self.loader_queue.get())
Obviously I've left some things out, such as some try: blocks and a method for creating the thread in Option 2, but I think these snippets explain my two options.
My real question is whether Option 1 is "bad" because I'd be wasting half of Python's execution cycles while the loader thread just spins and does nothing for the 99.99% of the time the queue is empty.
Or is this a case where I should use the first option, but use self.loader_queue.get(block=True)? I assume if my loader thread is just blocking while waiting for an item in the Queue then that's an efficient type of wait and I won't be wasting a bunch of cycles?
Thanks!
Brian
The default for Queue.get is to block, which is what you need:
Remove and return an item from the queue. If optional args block is true and timeout is None (the default), block if necessary until an item is available. If timeout is a positive number, it blocks at most timeout seconds and raises the Empty exception if no item was available within that time. Otherwise (block is false), return an item if one is immediately available, else raise the Empty exception (timeout is ignored in that case).
This way the while loop only runs a single time for each item in the queue and is blocked when the queue is empty.
You can actually test this yourself by doing something visible (like printing some output) in the while loop.
Option 1 is good if you are waiting for items, since option 2 may terminate before you get them (if loader is fast enough).
The thread is probably not taking up enough resources to be considered an optimization candidate. And since it blocks whenever it has nothing to do, option 1 seems to be the way to go.
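A minimal sketch of Option 1 with a blocking get follows, written in Python 3 syntax (where the module is named `queue` rather than `Queue`). The `STOP` sentinel, the asset names, and the `loaded` list are hypothetical additions so that the demo can shut down cleanly.

```python
import queue
import threading

loader_queue = queue.Queue()
loaded = []          # records what was "loaded", for demonstration
STOP = object()      # hypothetical sentinel that tells the loader to exit

def do_my_work(asset):
    loaded.append(asset)

def loader_thread():
    while True:
        asset = loader_queue.get()   # blocks without spinning while empty
        if asset is STOP:
            break
        do_my_work(asset)

worker = threading.Thread(target=loader_thread, daemon=True)
worker.start()
for name in ("ball.png", "bumper.wav"):
    loader_queue.put(name)
loader_queue.put(STOP)
worker.join(timeout=5)
```

The loop body runs once per queued item. Between items the thread sits inside `get()` and uses no CPU.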
I'm using Python with wxPython for writing an app.
The method I'm considering to accomplish this may not be the best - if that's the case, let me know because I'm open to refactoring.
Right now, I have one GUI form. The main program start point instantiates an instance of the GUI form then runs wx.mainLoop(), which causes the app's main initial thread to block for the lifetime of the app.
We of course know that when events happen in the UI, the UI thread runs the code for them.
Now, I have another thread - a worker thread. This thread needs to sit idle, and then when something happens in the UI thread, e.g. a button is clicked, I want the worker thread to stop idling and do something else - run a function, say.
I can't envision this right now but I could see as the app gets more complex also having to signal the worker thread while it's actually busy doing something.
I have two questions about this setup:
How can I make my worker thread idle without using up CPU time? Doing something like while True: pass will suck CPU time, while something like while True: time.sleep(0.1) will not allow instantaneous reaction to events.
What's the best way to signal into the worker thread to do something? I don't want the UI thread to execute something, I want the worker thread to be signaled, by the UI thread, that it should change what it's doing. Ideally, I'd have some way for the worker thread to register a callback with the UI itself, so that when a button is clicked or any other UI Event happens, the worker thread is signalled to change what it's doing.
So, is this the best way to accomplish this? And what's the best way to do it?
Thanks!
First: Do you actually need a background thread to sit around idle in the first place?
On most platforms, starting a new thread is cheap. (Except on Windows and Linux, where it's supercheap.) So, why not just kick off a thread whenever you need it? (It's just as easy to keep around a list of threads as a single thread, right?)
Alternatively, why not just create a ThreadPoolExecutor, and just submit jobs to it, and let the executor worry about when they get run and on which thread. Any time you can just think in terms of "tasks that need to get run without blocking the main thread" instead of "worker threads that need to wait on work", you're making your life easier. Under the covers, there's still one or more worker threads waiting on a queue, or something equivalent, but that part's all been written (and debugged and optimized) for you. All you have to write are the tasks, which are just regular functions.
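A small sketch of the executor approach, assuming Python 3's `concurrent.futures`. The `task` function and its argument are hypothetical stand-ins for whatever the button click should trigger.

```python
from concurrent.futures import ThreadPoolExecutor

def task(text):
    # An ordinary function; the executor decides which thread runs it.
    return text.upper()

executor = ThreadPoolExecutor(max_workers=2)
future = executor.submit(task, "clicked")   # returns immediately
result = future.result(timeout=5)           # blocks only for this demo
executor.shutdown()
```

In a real GUI handler you would probably not block on `result()`. Attaching a callback with `future.add_done_callback` is one way to be told when the task finishes.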
But, if you want to write explicit background threads, you can, so I'll explain that.
How can I make my worker thread idle without using up CPU time? … What's the best way to signal into the worker thread to do something?
The way to idle a thread until a value is ready is to wait on a synchronization object. On any modern OS, waiting on a synchronization object means the operating system stops giving you any CPU time until the object is ready for you.*
There are a variety of different options you can see in the Threading module docs, but the obvious one to use in most cases like this is a Condition. The way to signal the worker thread is then to notify the Condition.
However, often a Queue is a lot simpler. To wait on a Queue, just call its get method with block=True. To signal another thread to wake up, just put something on the Queue. (Under the covers, a Queue wraps up a list or deque or other collection, a Lock, and a Condition, so you just tell it what you want to do—check for a value, block until there's a value, add a value—instead of dealing with waiting and signaling and protecting the collection.)
See the answer to controlling UI elements in wxPython using threading for how to signal in both directions, from a worker thread to a UI thread and vice-versa.
I'd have some way for the worker thread to register a callback with the UI itself, so that when a button is clicked or any other UI Event happens, the worker thread is signalled to change what it's doing.
You can do it this way if you want. Just pass self.queue.put or def callback(value): self.value = value; self.condition.notify() or whatever as a callback, and the GUI thread doesn't even have to know that the callback is triggering another thread.
In fact, that's a pretty nice design that may make you very happy later, when you decide to move some code back and forth between inline and background-threaded, or move it off to a child process instead of a background thread, or whatever.
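The callback idea can be sketched as follows (Python 3 syntax). The `Worker` class and `fake_button_click` are hypothetical stand-ins for the real worker and GUI. Note that a Condition has to be held while notifying, which the one-line version glosses over.

```python
import threading

class Worker:
    def __init__(self):
        self.condition = threading.Condition()
        self.value = None
        self.received = None

    def callback(self, value):
        # Called from the GUI thread; the lock must be held to notify.
        with self.condition:
            self.value = value
            self.condition.notify()

    def run(self):
        # Idle until a value arrives (or the demo timeout expires).
        with self.condition:
            self.condition.wait_for(lambda: self.value is not None, timeout=5)
            self.received = self.value

def fake_button_click(callback):
    # Stand-in for a GUI event handler; it only knows it has a callable.
    callback("clicked")

worker = Worker()
thread = threading.Thread(target=worker.run)
thread.start()
fake_button_click(worker.callback)
thread.join()
```

The caller never touches the Condition, so the same callback slot could later hold a plain function or a queue's put method instead.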
I can't envision this right now but I could see as the app gets more complex also having to signal the worker thread while it's actually busy doing something.
But what do you want to happen if it's busy?
If you just want to say "If you're idle, wake up and do this task; otherwise, hold onto it and do it whenever you're ready", that's exactly what a Queue, or an Executor, will do for you automatically.
If you want to say, "If you're idle, wake up, otherwise, don't worry about it", that's what a Condition or Event will do.
If you want to say, "If you're idle, wake up and do this, otherwise, cancel what you're doing and do this instead", that's a bit more complicated. You pretty much need to have the background thread periodically check an "interrupt_me" variable while it's busy (and put a Lock around it), and then you'll set that flag as well as notifying the Condition… although in some cases, you can merge the idle and busy cases into a single Condition or Event (by calling an infinite wait() when idle, and a quick-check wait(timeout=0) when busy).
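The single-Event variant might be sketched like this (Python 3 syntax). The names `signal`, `run_steps`, and `idle_worker` are hypothetical, and the "work" is just a counter.

```python
import threading

signal = threading.Event()   # one Event for both "wake up" and "interrupt"

def run_steps(steps):
    """Run up to `steps` units of work, stopping early if signalled."""
    completed = 0
    for _ in range(steps):
        if signal.wait(timeout=0):   # busy: quick, non-blocking check
            break
        completed += 1               # placeholder for one unit of real work
    return completed

uninterrupted = run_steps(5)         # nothing signalled: all steps run
signal.set()
interrupted = run_steps(5)           # signalled: stops before the first step
signal.clear()

woke = []

def idle_worker():
    signal.wait()                    # idle: blocks until the Event is set
    woke.append(True)

t = threading.Thread(target=idle_worker)
t.start()
signal.set()
t.join()
```

How often the busy loop checks the Event determines how quickly an interrupt is noticed.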
* In some cases (e.g., a Linux futex or a Windows CriticalSection) it may actually spin for a little bit of CPU time first, because that happens to be a good optimization. But the point is, you're not asking for any CPU time until you're ready to use it.
I'm currently using Python (2.7) to write a GUI that has some threads going on. I've come to a point where I need a delay of roughly a second before getting a piece of information, but I can't afford to have the function take more than a few milliseconds to run. With that in mind, I'm trying to create a threaded timer that will set a flag, timer.doneFlag, and have the main function keep polling to see whether it's done or not.
It is working, but not all the time. The problem I run into is that sometimes I feel like the time.sleep function in run doesn't wait for the full second (sometimes it may not even wait at all). All I need is a flag that lets me control the start time and is raised when it reaches 1 second.
I may be doing too much just to get a delay that is threadable. If you can suggest something, or help me find a bug in the following code, I'd be very grateful!
I've attached a portion of the code I used:
from main program:
class dataCollection:
    def __init__(self):
        self.timer=Timer(5)
        self.isTimerStarted=0
        return

    def StateFunction(self): #Try to finish the function within a few milliseconds
        if self.isTimerStarted==0:
            self.timer=Timer(1.0)
            self.timer.start()
            self.isTimerStarted=1
        if self.timer.doneFlag:
            self.timer.doneFlag=0
            self.isTimerStarted=0
        #and all the other code

import time
import threading

class Timer(threading.Thread):
    def __init__(self, seconds):
        self.runTime = seconds
        self.doneFlag=0
        threading.Thread.__init__(self)

    def run(self):
        time.sleep(self.runTime)
        self.doneFlag=1
        print("Buzzzz")

x=dataCollection()
while 1:
    x.StateFunction()
    time.sleep(0.1)
First, you've effectively rebuilt threading.Timer with less flexibility. So I think you're better off using the existing class. (There are some obvious downsides with creating a thread for each timer instance. But if you just want a single one-shot timer, it's fine.)
More importantly, having your main thread repeatedly poll doneFlag is probably a bad idea. This means you have to call your state function as often as possible, burning CPU for no good reason.
Presumably the reason you have to return within a few milliseconds is that you're returning to some kind of event loop, presumably for your GUI (but, e.g., a network reactor has the same issue, with the same solutions, so I'll keep things general).
If so, almost all such event loops have a way to schedule a timed callback within the event loop—Timer in wx, callLater in twisted, etc. So, use that.
If you're using a framework that doesn't have anything like that, it hopefully at least has some way to send an event/fire a signal/post a message/whatever it's called from outside. (If it's a simple file-descriptor-based reactor, it may not have that, but you can add it yourself just by tossing a pipe into the reactor.) So, change your Timer callback to signal the event loop, instead of writing code that polls the Timer.
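As a sketch of signalling instead of polling a flag, the example below has the standard threading.Timer post a message to a queue that a pretend event loop drains (Python 3 syntax). The `events` queue, the `"timer_done"` message, and the loop itself are assumptions standing in for whatever mechanism your framework provides.

```python
import queue
import threading
import time

events = queue.Queue()   # stands in for the event loop's message queue

# When the timer fires, its callback posts an event instead of setting a flag.
timer = threading.Timer(0.05, events.put, args=("timer_done",))
timer.start()

handled = []
deadline = time.monotonic() + 5      # safety limit for this demo
while not handled and time.monotonic() < deadline:
    try:
        # One iteration of a pretend event loop: wait briefly for an event.
        handled.append(events.get(timeout=0.01))
    except queue.Empty:
        pass                         # nothing yet; other loop work goes here
timer.join()
```

The queue handles the locking, so the shared-flag race described next does not arise.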
If for some reason you really do need to poll a variable shared across threads, you really, really, should be protecting it with a Condition or RLock. There is no guarantee in the language that, when thread 0 updates the value, thread 1 will see the new value immediately, or even ever. If you understand enough of the internals of (a specific version of) CPython, you can often prove that the GIL makes a lock unnecessary in specific cases. But otherwise, this is a race.
Finally:
The problem that I run into is that sometimes I feel like the time.sleep function in run , doesn't wait fully for a second (sometimes it may not even wait).
Well, the documentation clearly says this can happen:
The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine.
So, if you need a guarantee that it actually sleeps for at least 1 second, the only way to do this is something like this:
import time

t0 = time.time()
dur = 1.0
while True:
    time.sleep(dur)
    t1 = time.time()
    dur = 1.0 - (t1 - t0)
    if dur <= 0:
        break
I have a Qt application written in PySide (Qt Python binding). This application has a GUI thread and many different QThreads that are in charge of performing some heavy lifting - some rather long tasks. As such a long task sometimes gets stuck (usually because it is waiting for a server response), the application sometimes freezes.
I was therefore wondering if it is safe to call QCoreApplication.processEvents() "manually" every second or so, so that the GUI event queue is cleared (processed)? Is that a good idea at all?
It's safe to call QCoreApplication.processEvents() whenever you like. The docs explicitly state your use case:
You can call this function occasionally when your program is busy performing a long operation (e.g. copying a file).
There is no good reason, though, why threads would block the event loop in the main thread. (Unless your system really can't keep up.) So that's worth looking into anyway.
A couple of hints people might find useful:
A. You need to beware of the following:
Every so often the threads want to send stuff back to the main thread. So they post an event and call processEvents.
If the code run from that event also calls processEvents, then instead of returning to the next statement, Python can dispatch a worker thread again, and that can then repeat this process.
The net result of this can be hundreds or thousands of nested processEvents calls, which can then result in a recursion level exceeded error message.
Moral - if you are running a multi-threaded application do NOT call processEvents in any code initiated by a thread which runs in the main thread.
B. You need to be aware that CPython has a Global Interpreter Lock (GIL) that limits threads so that only one can run at any one time, and the way that Python decides which threads to run is counter-intuitive. Running processEvents from a worker thread does not seem to do what it says on the can, and CPU time is not allocated to the main thread or to Python internal threads. I am still experimenting, but it seems that putting worker threads to sleep for a few milliseconds allows other threads to get a look in.