I'm currently using Python (2.7) to write a GUI that has some threads going on. I've come to a point where I need a delay of roughly a second before getting a piece of information, but I can't afford to have the function take more than a few milliseconds to run. With that in mind, I'm trying to create a threaded timer that sets a flag timer.doneFlag and have the main function keep polling to see whether it's done or not.
It is working, but not all the time. The problem I run into is that sometimes the time.sleep call in run doesn't seem to wait the full second (sometimes it may not wait at all). All I need is a flag that lets me control the start time and that is raised once one second has elapsed.
I may be overcomplicating things just to get a delay that plays well with threads. If you can suggest something, or help me find a bug in the following code, I'd be very grateful!
I've attached a portion of the code I used:
from main program:
class dataCollection:
    def __init__(self):
        self.timer = Timer(5)
        self.isTimerStarted = 0

    def StateFunction(self):  # try to finish the function within a few milliseconds
        if self.isTimerStarted == 0:
            self.timer = Timer(1.0)
            self.timer.start()
            self.isTimerStarted = 1
        if self.timer.doneFlag:
            self.timer.doneFlag = 0
            self.isTimerStarted = 0
        # and all the other code
import time
import threading

class Timer(threading.Thread):
    def __init__(self, seconds):
        self.runTime = seconds
        self.doneFlag = 0
        threading.Thread.__init__(self)

    def run(self):
        time.sleep(self.runTime)
        self.doneFlag = 1
        print "Buzzzz"

x = dataCollection()
while 1:
    x.StateFunction()
    time.sleep(0.1)
First, you've effectively rebuilt threading.Timer with less flexibility. So I think you're better off using the existing class. (There are some obvious downsides with creating a thread for each timer instance. But if you just want a single one-shot timer, it's fine.)
More importantly, having your main thread repeatedly poll doneFlag is probably a bad idea. This means you have to call your state function as often as possible, burning CPU for no good reason.
Presumably the reason you have to return within a few milliseconds is that you're returning to some kind of event loop, presumably for your GUI (but, e.g., a network reactor has the same issue, with the same solutions, so I'll keep things general).
If so, almost all such event loops have a way to schedule a timed callback within the event loop—Timer in wx, callLater in twisted, etc. So, use that.
If you're using a framework that doesn't have anything like that, it hopefully at least has some way to send an event/fire a signal/post a message/whatever it's called from outside. (If it's a simple file-descriptor-based reactor, it may not have that, but you can add it yourself just by tossing a pipe into the reactor.) So, change your Timer callback to signal the event loop, instead of writing code that polls the Timer.
If for some reason you really do need to poll a variable shared across threads, you really, really, should be protecting it with a Condition or RLock. There is no guarantee in the language that, when thread 0 updates the value, thread 1 will see the new value immediately, or even ever. If you understand enough of the internals of (a specific version of) CPython, you can often prove that the GIL makes a lock unnecessary in specific cases. But otherwise, this is a race.
Finally:
The problem that I run into is that sometimes I feel like the time.sleep function in run doesn't wait fully for a second (sometimes it may not even wait).
Well, the documentation clearly says this can happen:
The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine.
So, if you need a guarantee that it actually sleeps for at least 1 second, the only way to do this is something like this:
t0 = time.time()
dur = 1.0
while True:
    time.sleep(dur)
    t1 = time.time()
    dur = 1.0 - (t1 - t0)
    if dur <= 0:
        break
Related
This is more out of theoretical curiosity than an actual problem I am having.
Say you want to run some code at a regular interval, what are the pros and cons of using a Timer vs using a thread + time.sleep in terms of CPU consumption?
The two approaches below do the same thing. I am aware that the Thread approach is not an exact one-second interval, but rather adds a delay after each execution, which can matter if task_function takes a long time. I am also aware that there are many other ways to solve this problem, but let's focus on the threading package.
Timer approach
def task_function():
    print(time.time())

def task():
    task_function()
    threading.Timer(1, task).start()

task()
Thread approach
def task_function():
    while True:
        print(time.time())
        time.sleep(1)

threading.Thread(target=task_function).start()
I read somewhere that starting a thread is quite resource-intensive. So I wonder: if you had some code you wanted to run every 0.1 seconds, wouldn't the Timer approach be suboptimal, since a new thread has to be started so often?
If the code must repeat on an interval, use the plain Thread (to be clear, Timer is just a thin wrapper around a Thread in the first place; it's implemented as a subclass). Spawning a new thread (via Timer) 10x a second is wasteful, and gains you nothing in any event.
You should make the worker thread a daemon thread though, unless you really want it to keep the process alive indefinitely.
I'm fairly new to Python, so maybe my whole concept of how this should work is wrong:
I'm building an RFID reader for time-management purposes. E.g. a user logs in with an RFID chip -> a timer starts counting and updates a Google spreadsheet every minute. The updating part works fine, but takes a little while. But I want to check for RFID logins all the time. Somewhere I read that event-driven programming is what I'm looking for.
Currently I'm doing everything in a while True loop, which feels like a hack in itself. Can I somehow just execute my code when the RFID reader sends a signal? And then have my update run every minute or so in parallel? I'd like to know what's best practice here.
Parallel and event-driven are basically orthogonal, although it is generally "easy" to parallelize events. I'll first cover the event-driven part and then the parallelisation, although you may only want to use the latter.
The "normal" controlflow in python is iterative.
That means you define the instructions the code should do and then the pc executes these step for step.
There are different ways to organize your code (functional, event driven, object oriented, although I don't want to say that these are absolute categories where you can only do X or Y). Event driven normally means you define events and how to handle them.
There is nothing you could program with event driven which you couldn't program iterativly and vice versa.
Python mainly got support for asynchronous code with version 3.4, when the asyncio library was introduced.
With 3.5 you also got the syntactic sugar await and async. Because you are on 2.7, this is not available to you.
There is a backport of asyncio named trollius, but this is overkill if you only have a low number of events. Also, it's not hard to roll your own basic event loop (of course asyncio and trollius do much more, but if we are not going to use those features, why bother?).
The basic workflow is waiting for events and then handling them as they occur:
events = []
while waiting_for_events:
    if events:
        event = events.pop()
        handle_event(event)
You somehow need to know how to differentiate between events and how to handle them.
For a full-featured event loop you would probably use different classes with inheritance, but let's just use a name for each event.
We also probably need some kind of data, such as which RFID we encountered.
from collections import namedtuple
Event = namedtuple("Event", ["name", "data"])
Then we simply need to map events to how to handle them:
def handle_rfid(data):
    ...

def handle_timer(data):
    ...

event_handler_mapping = {"rfid": handle_rfid, "timer": handle_timer}

def handle_event(event):
    event_handler_mapping[event.name](event.data)
We still need to generate events, so let's rewrite the event loop to get them:
timer = 0
current_time = time.perf_counter()  # Python 3 only; on 2.7 use time.time()
while waiting_for_events:
    rfid = get_rfids()
    if rfid:
        events.append(Event("rfid", rfid))
    if timer > 1:
        events.append(Event("timer", timer))
        timer = 0
    else:
        timer += time.perf_counter() - current_time
    current_time = time.perf_counter()
    if events:
        event = events.pop()
        handle_event(event)
And now we are "event driven". The nice thing is that we can easily extend this to more events.
The bad thing is that it still does the same thing you probably already have, only more complicated.
Also, if handling an event takes a lot of time (which seems to be the case with updating the spreadsheet), the other events will not be generated and handled in the meantime. This is where parallelism comes into play.
Parallelism basically means we can use multiple cores.
Here we actually only need "concurrency", which means two things can happen at once.
This is "easier" than true parallelism; we can just switch between different things but still do everything sequentially.
In Python this basically boils down to multiprocessing (parallelism) versus threads ("only" concurrency). (In other languages threads actually give you parallelism, but in Python, for reasons I don't want to go into here, they do not.)
The problem with concurrency is always synchronisation. If things can happen at the same time, bad things can happen if two threads try to change the same variable. In general, as long as you only use thread-safe functions to access variables shared between threads, you are safe.
In Python, threads are created with the threading module.
I personally find it hard to understand if you don't already know threads from somewhere else, but the gist is the following:
to run a function in a thread, use threading.Thread(target=function) and then thread.start().
You could use it like the following:
from threading import Thread

def run_in_thread(f, *args, **kwargs):
    thread = Thread(target=f, args=args, kwargs=kwargs)
    thread.start()
def _update_spreadsheet(data):
    ...  # logic here

# when using the event driven approach from above
def handle_timer(data):
    run_in_thread(_update_spreadsheet, data)  # pass the function, don't call it
Note that if you access variables from within _update_spreadsheet, you need to be careful to only use thread-safe functions.
It is "best" to use as little inter-thread communication as possible.
A queue is often a good choice.
You can use parallelism/concurrency without the event-driven organisation.
Because we already divided the code into event handlers, we can call long-running event handlers in a separate thread.
If we have lots of events and event handlers, running everything in threads is a bad idea (because thread switching has an overhead).
Thus asyncio (and probably all other event loops) implement some kind of "wait until at least one event can be handled".
This is most interesting for network input and output, because these take "a long time".
Often something like select is used.
Other events (timers, reads from disk, waits for hardware events, ...) need other mechanisms for "wake me up when something happens". Integrating all of these is one of the features asyncio offers you.
In this case, say I wanted to wait for a condition that may become true at any random time.
while True:
    if condition:
        pass  # Do whatever
    else:
        pass
As you can see, pass will just happen until the condition is True. But while the condition isn't True, the CPU is being pegged by this loop, causing high CPU usage, when I simply want it to wait until the condition occurs. How can I do this?
See Busy_loop#Busy-waiting_alternatives:
Most operating systems and threading libraries provide a variety of system calls that will block the process on an event, such as lock acquisition, timer changes, I/O availability or signals.
Basically, to wait for something, you have two options (same as IRL):
Check for it periodically with a reasonable interval (this is called "polling")
Make the event you're waiting for notify you: invoke (or, as a special case, unblock) your code somehow (this is called "event handling" or "notifications". For system calls that block, "blocking call" or "synchronous call" or call-specific terms are typically used instead)
As already mentioned, you can (a) poll, i.e. check for the condition and, if it is not true, wait for some time interval; (b) if your condition is an external event, arrange a blocking wait for the state to change; or (c) take a look at the publish/subscribe model, pubsub, where your code registers an interest in a given item and other parts of the code publish the item.
This is not really a Python problem. Optimally, you want to put your process to sleep and wait for some sort of signal that the action has occurred, which will use no CPU while waiting. So it's not so much a case of writing Python code as of figuring out what mechanism makes the condition true, and then waiting on that.
If the condition is a simple flag set by another thread in your program rather than an external resource, you need to go back and learn from scratch how threading works.
Only if the thing that you're waiting for does not provide any sort of push notification that you can wait on should you consider polling it in a loop. A sleep will help reduce the CPU load but not eliminate it and it will also increase the response latency as the sleep has to complete before you can commence processing.
As for waiting on events, an event-driven paradigm might be what you want unless your program is utterly trivial. Python has the Twisted framework for this.
I've been integrating the Python "apscheduler" package (Advanced Python Scheduler) into my app. So far it's going well; I'm able to do almost everything I had envisioned doing with it.
Only one kink left to iron out...
The function my events call will only accept around 3 calls a second before failing, as it triggers very slow hardware I/O :(
I've tried limiting the max number of threads in the threadpool from 20 down to just 1 to try to slow down execution, but since I'm not really putting a big load on apscheduler, my events still fire pretty much concurrently (well... very, very close together, at least).
Is there a way to 'stagger' different events that fire within the same second?
I have recently found this question because I, like yourself, was trying to stagger scheduled jobs slightly to compensate for slow hardware.
Including an argument like this in the scheduler's add_job call staggers the start time of each job by 200 ms (while incrementing idx for each job):
next_run_time=datetime.datetime.now() + datetime.timedelta(seconds=idx * 0.2)
What you want to use is the 'jitter' option.
From the docs:
The jitter option enables you to add a random component to the execution time. This might be useful if you have multiple servers and don't want them to run a job at the exact same moment or if you want to prevent multiple jobs with similar options from always running concurrently
Example:
# Run the `job_function` every hour with an extra-delay picked randomly
# in a [-120,+120] seconds window.
sched.add_job(job_function, 'interval', hours=1, jitter=120)
I don't know about apscheduler, but have you considered using a Redis LIST (queue) and simply serializing the event feed into that one critically bounded function, so that it fires no more than three times per second? For example, you could have it do a blocking POP with a one-second max delay, increment your trigger count for every event, sleep when it hits three, and zero the trigger count any time the blocking POP times out. (Or you could just use 333-millisecond sleeps after each event.)
My solution for future reference:
I added a basic bool lock in the function being called, plus a wait, which seems to do the trick nicely, since it's not the calling of the function itself that raises the error, but rather a deadlock situation with what the function carries out :D
First method:
import threading
import time

def keepalive():
    while True:
        print 'Alive.'
        time.sleep(200)

threading.Thread(target=keepalive).start()
Second method:
import threading

def keepalive():
    print 'Alive.'
    threading.Timer(200, keepalive).start()

threading.Timer(200, keepalive).start()
Which method uses more RAM? And in the second method, does the thread end after being activated, or does it remain in memory and start a new thread (multiple threads)?
Timer creates a new thread object for each started timer, so it certainly needs more resources when creating and garbage collecting these objects.
As each thread exits immediately after spawning another, active_count stays constant, but new threads are constantly created and destroyed, which causes overhead. I'd say the first method is definitely better, although you won't really see much difference unless the interval is very small.
And in the second method, does the thread end after being activated? or does it remain in the memory and start a new thread? (multiple threads)
Here's an example of how to test this yourself:
import threading

def keepalive():
    print 'Alive.'
    threading.Timer(200, keepalive).start()
    print threading.active_count()

threading.Timer(200, keepalive).start()
I also changed the 200 to .2 so it wouldn't take as long.
The thread count was 3 forever.
Then I did this:
top -pid 24767
The #TH column never changed.
So, there's your answer: we don't have enough info to know whether Python maintains a single timer thread for all of the timers, or ends and cleans up the thread as soon as the timer runs, but we can be sure the threads don't stick around and pile up. (If you want to know which of those is happening, you can, e.g., print the thread ids.)
An alternative way to find out is to look at the source. As the documentation says, "Timer is a subclass of Thread and as such also functions as an example of creating custom threads". The fact that it's a subclass of Thread already tells you that each Timer is a Thread. And the fact that it "functions as an example" implies that it ought to be easy to read. If you click the link from the documentation to the source, you can see how trivial it is. Most of the work is done by Event, but that's in the same source file, and it's almost as simple. Effectively, it just creates a condition variable, waits on it (so it blocks until it times out, or you notify the condition by calling cancel), then quits.
The reason I'm answering one sub-question and explaining how I did it, rather than answering each sub-question, is because I think it would be more useful for you to walk through the same steps.
On further reflection, this probably isn't a question to be decided by optimization in the first place:
If you have a simple, synchronous program that needs to do nothing for 200 seconds, make a blocking call to sleep. Or, even simpler, just do the job and quit, and pick an external tool to schedule your script to run every 200s.
On the other hand, if your program is inherently asynchronous (especially if you've already got threads, signal handlers, and/or an event loop) there's just no way you're going to get sleep to work. If Timer is too inefficient, go to PyPI or ActiveState and find a better timer that lets you schedule repeatable timers (or even multiple timers) with a single instance and thread. (Or, if you're using signals, use signal.alarm or setitimer, and if you're using an event loop, build the timer into your main loop.)
I can't think of any use case where sleep and Timer would both be serious contenders.