Parallel or Event driven functions in Python?

I'm fairly new to Python, so maybe my whole concept of how this should work is wrong.
I'm building an RFID reader for time-tracking purposes. For example: a user logs in with an RFID chip, then a timer starts counting and updates a Google spreadsheet every minute. The updating part works fine, but takes a little while. However, I want to check for RFID logins all the time. Somewhere I read that event-driven programming is what I'm looking for.
Currently I'm doing everything in a while True loop, which feels like a hack in itself. Can I somehow just execute my code when the RFID reader sends a signal, and then schedule my update to run every minute or so in parallel? I'd like to know what the best practice is here.

Parallel and event-driven are basically orthogonal, although it is generally "easy" to parallelize events. I'll first cover the event-driven part and then the parallelisation, although you may only want to use the latter.
The "normal" control flow in Python is iterative.
That means you define the instructions the code should carry out, and the computer then executes them step by step.
There are different ways to organize your code (functional, event-driven, object-oriented, although I don't want to say that these are absolute categories where you can only do X or Y). Event-driven normally means you define events and how to handle them.
There is nothing you could program in an event-driven style that you couldn't program iteratively, and vice versa.
Python mainly gained support for asynchronous programming in version 3.4, when the asyncio library was introduced.
Version 3.5 added the syntactic sugar async and await. Because you are on 2.7, this is not available to you.
There is a backport of asyncio named trollius, but this is overkill if you only have a "low amount of events". Also, it's not hard to "roll your own basic event loop" (of course asyncio and trollius do much more, but if we are not going to use these features, why bother?).
The basic workflow is waiting for events and then handling them as they occur:
events = []
while waiting_for_events:
    if events:
        event = events.pop()
        handle_event(event)
You somehow need to know how to differentiate between events and how to handle them.
For a "full-featured event loop" you would probably use different classes with inheritance, but let's just use a name for each event.
We also probably need some kind of data, such as which RFID tag we encountered.
from collections import namedtuple
Event = namedtuple("Event", ["name", "data"])
Then we simply need to map events to how to handle them:
def handle_rfid(data):
    ...

def handle_timer(data):
    ...

event_handler_mapping = {"rfid": handle_rfid, "timer": handle_timer}

def handle_event(event):
    event_handler_mapping[event.name](event.data)
We still need to generate events, so let's rewrite the event loop to get events:
import time

timer = 0
current_time = time.time()  # time.perf_counter() only exists on Python 3.3+
while waiting_for_events:
    rfid = get_rfids()
    if rfid:
        events.append(Event("rfid", rfid))
    if timer > 1:
        # more than one second has accumulated: emit a timer event
        events.append(Event("timer", timer))
        timer = 0
    else:
        timer += time.time() - current_time
        current_time = time.time()
    if events:
        event = events.pop()
        handle_event(event)
And now we are "event driven". The nice thing is that we can easily extend this to more events.
The bad thing is that it still does the same thing you probably already have, but it's more complicated.
Also, if the event handling needs a lot of time (which seems to be the case with updating the spreadsheet), the other events will not be generated and handled. This is where parallelism comes into play.
Parallelism basically means we can use multiple cores.
Here we actually only need "concurrency", which means two things can be in progress at once.
This is "easier" than true parallelism: we can just switch between different things but still do all of them sequentially.
In Python this basically boils down to multiprocessing (parallelism) and threads ("only" concurrency). In many other programming languages threads do run in parallel, but in Python, for reasons I don't want to go into, this is not the case.
The problem with concurrency is always synchronisation. If things can happen at the same time, bad things can happen if two threads try to change the same variable. In general, as long as you only use thread-safe functions to access variables shared between threads, you are safe.
In Python, threads are created with the threading module.
I personally find it hard to understand if you don't already know threads from somewhere else, but the gist is the following:
To run a function in a thread, use threading.Thread(target=function) and then thread.start().
You could use it as follows:
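from threading import Thread

def run_in_thread(f, *args, **kwargs):
    thread = Thread(target=f, args=args, kwargs=kwargs)
    thread.start()

def _update_spreadsheet(data):
    # logic here
    pass

# when using the event driven approach from above
def handle_timer(data):
    # pass the function and its argument separately; calling
    # _update_spreadsheet(data) here would run it in the current thread
    run_in_thread(_update_spreadsheet, data)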
Note that if you access shared variables from within _update_spreadsheet, you need to be careful to use only thread-safe functions.
It is "best" to use as little inter-thread communication as possible.
A queue is often a good choice.
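As a minimal sketch of the queue idea (written for Python 3, where the module is called queue; on 2.7 it is Queue), the main loop could hand work to a single worker thread instead of starting a new thread per update. The names spreadsheet_worker, jobs and results are made up for this illustration, and the string formatting stands in for the real spreadsheet call.

```python
import queue
import threading

def spreadsheet_worker(jobs, results):
    """Consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()          # blocks without burning CPU
        if job is None:
            break
        # Hypothetical stand-in for the slow spreadsheet update.
        results.append("updated %s" % job)

jobs = queue.Queue()
results = []
worker = threading.Thread(target=spreadsheet_worker, args=(jobs, results))
worker.start()

# The event loop would call jobs.put(...) from its timer handler.
for minutes in (1, 2, 3):
    jobs.put(minutes)
jobs.put(None)                    # tell the worker to finish
worker.join()
```

Because only the queue is shared, the main loop never waits for an update to complete; it just drops a job in and goes back to checking the reader.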
You can use parallelism/concurrency without the event-driven organisation.
Because we already divided the code into event handlers, we can call long-running event handlers in a separate thread.
If we have lots of events and event handlers, running everything in threads is a bad idea (because thread switching has an overhead).
Thus asyncio (and probably all other event loops) implements some kind of "wait until at least one event can be handled".
This is most interesting for internet input and output, because these need "a long time".
Often something like select is used.
Other events (timers, reads from disk, waiting for hardware events, ...) need other mechanisms for "wake me up when something happens". Integrating all of these is one of the features asyncio offers you.
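For comparison, here is roughly what the same structure might look like with asyncio. This is only a sketch and needs Python 3.7+ (asyncio.run), so it does not apply to 2.7. The coroutines watch_rfid and periodic_update are invented for this example, and the list of tags stands in for a real reader.

```python
import asyncio

async def watch_rfid(reads, log):
    # Hypothetical reader: 'reads' stands in for tags arriving from hardware.
    for tag in reads:
        await asyncio.sleep(0.01)     # yields to the loop while "waiting"
        log.append(("rfid", tag))

async def periodic_update(interval, count, log):
    # Stand-in for the once-a-minute spreadsheet update.
    for tick in range(count):
        await asyncio.sleep(interval)
        log.append(("timer", tick))

async def main():
    log = []
    # Both coroutines make progress concurrently on one thread.
    await asyncio.gather(
        watch_rfid(["tag-a", "tag-b"], log),
        periodic_update(0.015, 2, log),
    )
    return log

log = asyncio.run(main())
```

Note that a genuinely blocking call inside a coroutine would still stall the loop, so a slow spreadsheet update would still have to be pushed to a thread.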

Related

twisted: processing incoming events in synchronous code

Suppose there's a synchronous function in a twisted-powered Python program that takes a long time to execute, doing that in a lot of reasonable-sized pieces of work. If the function could return deferreds, this would be a no-brainer, however the function happens to be deep inside some synchronous code, so that yielding deferreds to continue is impossible.
Is there a way to let twisted handle outstanding events without leaving that function? I.e. what I want to do is something along the lines of
def my_func():
    results = []
    for item in a_lot_of_items():
        results.append(do_computation(item))
        reactor.process_outstanding_events()
    return results
Of course, this imposes reentrancy requirements on the code, but still, there's QCoreApplication.processEvents for that in Qt, is there anything in twisted?
The solution taken by some event-loop-based systems (essentially the solution you're referencing via Qt's QCoreApplication.processEvents API) is to make the main loop re-entrant. In Twisted terms, this would mean something like (not working code):
def my_expensive_task_that_cannot_be_asynchronous():
    @inlineCallbacks
    def do_work(units):
        for unit in units:
            yield do_one_work_asynchronously(unit)
    work = do_work(some_work_units())
    work.addBoth(lambda ignored: reactor.stop())
    reactor.run()

def main():
    # Whatever your setup is...
    # Then, hypothetical event source triggering your
    # expensive function:
    reactor.callLater(
        30,
        my_expensive_task_that_cannot_be_asynchronous,
    )
    reactor.run()
Notice how there are two reactor.run calls in this program. If Twisted had a re-entrant event loop, this second call would start spinning the reactor again and not return until a matching reactor.stop call is encountered. The reactor would process all events it knows about, not just the ones generated by do_work, and so you would have the behavior you desire.
This requires a re-entrant event loop because my_expensive_task_... is already being called by the reactor loop. The reactor loop is on the call stack. Then, reactor.run is called and the reactor loop is now on the call stack again. So the usual issues apply: the event loop cannot have left over state in its frame (otherwise it may be invalid by the time the nested call is complete), it cannot leave its instance state inconsistent during any calls out to other code, etc.
Twisted does not have a re-entrant event loop. This is a feature that has been considered and, at least in the past, explicitly rejected. Supporting this feature brings a huge amount of additional complexity (described above) to the implementation and the application. If the event loop is re-entrant then it becomes very difficult to avoid requiring all application code to be re-entrant safe as well. This negates one of the major benefits of the cooperative multitasking approach Twisted takes to concurrency (that you are guaranteed your functions will not be re-entered).
So, when using Twisted, this solution is out.
I'm not aware of another solution which would allow you to continue to run this code in the reactor thread. You mentioned that the code in question is nested deeply within some other synchronous code. The other options that come to mind are:
make the synchronous code capable of dealing with asynchronous things
factor the expensive parts out and compute them first, then pass the result in to the rest of the code
run all of that code, not just the computationally expensive part, in another thread
You could use deferToThread.
http://twistedmatrix.com/documents/13.2.0/core/howto/threading.html
That method runs your calculation in a separate thread and returns a deferred that is called back when the calculation is actually finished.
The issue is that if do_heavy_computation() is code that blocks, execution won't move on to the next function. In this case use deferToThread or blockingCallFromThread for heavy calculations. Also, if you don't care about the results of the calculation, you can use callInThread. Take a look at the documentation on threads.
This should do:
for item in items:
    reactor.callLater(0, heavy_func, item)
reactor.callLater should bring you back into the event loop.

How does Python's Twisted Reactor work?

Recently, I've been diving into the Twisted docs. From what I gathered, the basis of Twisted's functionality is its event loop, called the "Reactor". The reactor listens for certain events and dispatches them to registered callback functions that have been designed to handle these events. In the book, there is some pseudocode describing what the Reactor does, but I'm having trouble understanding it; it just doesn't make any sense to me.
while True:
    timeout = time_until_next_timed_event()
    events = wait_for_events(timeout)
    events += timed_events_until(now())
    for event in events:
        event.process()
What does this mean?
In case it's not obvious, it's called the reactor because it reacts to things. The loop is how it reacts.
One line at a time:
while True:
It's not actually while True; it's more like while not loop.stopped. You can call reactor.stop() to stop the loop, and (after performing some shut-down logic) the loop will in fact exit. But it is portrayed in the example as while True because when you're writing a long-lived program (as you often are with Twisted) it's best to assume that your program will either crash or run forever, and that "cleanly exiting" is not really an option.
timeout = time_until_next_timed_event()
If we were to expand this calculation a bit, it might make more sense:
def time_until_next_timed_event():
    now = time.time()
    timed_events.sort(key=lambda event: event.desired_time)
    soonest_event = timed_events[0]
    return soonest_event.desired_time - now
timed_events is the list of events scheduled with reactor.callLater; i.e. the functions that the application has asked for Twisted to run at a particular time.
events = wait_for_events(timeout)
This line here is the "magic" part of Twisted. I can't expand wait_for_events in a general way, because its implementation depends on exactly how the operating system makes the desired events available. And, given that operating systems are complex and tricky beasts, I can't expand on it in a specific way while keeping it simple enough for an answer to your question.
What this function is intended to mean is, ask the operating system, or a Python wrapper around it, to block, until one or more of the objects previously registered with it - at a minimum, stuff like listening ports and established connections, but also possibly things like buttons that might get clicked on - is "ready for work". The work might be reading some bytes out of a socket when they arrive from the network. The work might be writing bytes to the network when a buffer empties out sufficiently to do so. It might be accepting a new connection or disposing of a closed one. Each of these possible events are functions that the reactor might call on your objects: dataReceived, buildProtocol, resumeProducing, etc, that you will learn about if you go through the full Twisted tutorial.
Once we've got our list of hypothetical "event" objects, each of which has an imaginary "process" method (the exact names of the methods are different in the reactor just due to accidents of history), we then go back to dealing with time:
events += timed_events_until(now())
First, this is assuming events is simply a list of an abstract Event class, which has a process method that each specific type of event needs to fill out.
At this point, the loop has "woken up", because wait_for_events stopped blocking. However, we don't know how many timed events we might need to execute based on how long it was "asleep" for. We might have slept for the full timeout if nothing was going on, but if lots of connections were active we might have slept for effectively no time at all. So we check the current time ("now()"), and we add to the list of events we need to process every timed event with a desired_time that is at, or before, the present time.
Finally,
for event in events:
    event.process()
This just means that Twisted goes through the list of things that it has to do and does them. In reality of course it handles exceptions around each event, and the concrete implementation of the reactor often just calls straight into an event handler rather than creating an Event-like object to record the work that needs to be done first, but conceptually this is just what happens. event.process here might mean calling socket.recv() and then yourProtocol.dataReceived with the result, for example.
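To make the loop above concrete, here is a toy version built only on the standard library (Python 3). It is not how Twisted's reactor is implemented; ToyReactor, call_later and add_reader are names made up for this sketch, with selectors playing the role of wait_for_events and a heap holding the timed events.

```python
import heapq
import selectors
import socket
import time

class ToyReactor:
    def __init__(self):
        self.selector = selectors.DefaultSelector()
        self.timed = []        # heap of (desired_time, sequence, callback)
        self.sequence = 0      # tie-breaker so callbacks are never compared
        self.running = False

    def call_later(self, delay, callback):
        self.sequence += 1
        heapq.heappush(self.timed, (time.monotonic() + delay, self.sequence, callback))

    def add_reader(self, sock, callback):
        self.selector.register(sock, selectors.EVENT_READ, callback)

    def stop(self):
        self.running = False

    def run(self):
        self.running = True
        while self.running:
            # time_until_next_timed_event()
            timeout = None
            if self.timed:
                timeout = max(0.0, self.timed[0][0] - time.monotonic())
            # wait_for_events(timeout): block until a socket is ready or time is up
            events = [key.data for key, _ in self.selector.select(timeout)]
            # timed_events_until(now())
            now = time.monotonic()
            while self.timed and self.timed[0][0] <= now:
                events.append(heapq.heappop(self.timed)[2])
            for callback in events:
                callback()

log = []
reactor = ToyReactor()
left, right = socket.socketpair()
reactor.add_reader(left, lambda: log.append(left.recv(1024)))
reactor.call_later(0.01, lambda: right.send(b"ping"))  # timed event writes data
reactor.call_later(0.05, reactor.stop)                 # timed event ends the loop
reactor.run()
left.close()
right.close()
```

The timed event at 0.01 s sends bytes, the next pass through select reports the socket as readable, and the reader callback records what arrived.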
I hope this expanded explanation helps you get your head around it. If you'd like to learn more about Twisted by working on it, I'd encourage you to join the mailing list, hop on to the IRC channel, #twisted to talk about applications or #twisted-dev to work on Twisted itself, both on Freenode.
I will try to elaborate:
The program yields control and goes to sleep, waiting for events.
I suppose the most interesting part here is the event.
An event is:
On external demand (receiving a network packet, a key press, a timer, a call from a different program) the program receives control (in some other thread or in a special routine). Somehow the sleep in wait_for_events is interrupted and wait_for_events returns.
When that happens, the event handler stores information about the event in some data structure, events, which is later used to do something about those events (event->process).
Not just one but many events can occur between entering and exiting wait_for_events, and all of them must be processed.
The event->process() procedure is custom and should usually call the interesting part: the user's Twisted code.

Python - wait on a condition without high cpu usage

In this case, say I wanted to wait on a condition to happen, that may happen at any random time.
while True:
    if condition:
        #Do Whatever
    else:
        pass
As you can see, pass will just happen until the condition is True. But while the condition isn't True the cpu is being pegged with pass causing higher cpu usage, when I simply just want it to wait until the condition occurs. How may I do this?
See Busy_loop#Busy-waiting_alternatives:
Most operating systems and threading libraries provide a variety of system calls that will block the process on an event, such as lock acquisition, timer changes, I/O availability or signals.
Basically, to wait for something, you have two options (same as IRL):
Check for it periodically with a reasonable interval (this is called "polling")
Make the event you're waiting for notify you: invoke (or, as a special case, unblock) your code somehow (this is called "event handling" or "notifications". For system calls that block, "blocking call" or "synchronous call" or call-specific terms are typically used instead)
As already mentioned, you can (a) poll, i.e. check for a condition and, if it is not true, wait for some time interval; (b) if your condition is an external event, arrange for a blocking wait for the state to change; or (c) take a look at the publish-subscribe model (pubsub), where your code registers an interest in a given item and other parts of the code then publish the item.
This is not really a Python problem. Optimally, you want to put your process to sleep and wait for some sort of signal that the action has occurred, which will use no CPU while waiting. So it's not so much a case of writing Python code as figuring out what mechanism makes the condition true, and then waiting on that.
If the condition is a simple flag set by another thread in your program rather than an external resource, you need to go back and learn from scratch how threading works.
Only if the thing that you're waiting for does not provide any sort of push notification that you can wait on should you consider polling it in a loop. A sleep will help reduce the CPU load but not eliminate it and it will also increase the response latency as the sleep has to complete before you can commence processing.
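If the condition is set from another thread of the same program, one way to get that kind of push notification is threading.Event from the standard library. A minimal sketch, assuming a timer thread stands in for whatever actually makes the condition true (the names condition_met, waiter and seen are made up here):

```python
import threading

condition_met = threading.Event()
seen = []

def waiter():
    # Blocks without consuming CPU until set() is called elsewhere.
    condition_met.wait()
    seen.append("condition occurred")

thread = threading.Thread(target=waiter)
thread.start()

# Hypothetical trigger: a timer stands in for whatever makes the condition true.
threading.Timer(0.05, condition_met.set).start()
thread.join()
```

wait() also accepts a timeout, which is useful if the waiting code has to do some housekeeping now and then.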
As for waiting on events, an event-driven paradigm might be what you want unless your program is utterly trivial. Python has the Twisted framework for this.

threadable delay in python 2.7

I'm currently using Python (2.7) to write a GUI that has some threads going on. I've come to a point where I need a delay of roughly a second before getting a piece of information, but I can't afford to have the function take more than a few milliseconds to run. With that in mind, I'm trying to create a threaded timer that sets a flag, timer.doneFlag, and have the main function keep checking whether it's done or not.
It is working, but not all the time. The problem I run into is that sometimes I feel like the time.sleep function in run doesn't wait for a full second (sometimes it may not even wait at all). All I need is a flag that lets me control the start time and is raised when 1 second has passed.
I may be doing too much just to get a delay that is threadable. If you can suggest something, or help me find a bug in the following code, I'd be very grateful!
I've attached a portion of the code I used:
from main program:
class dataCollection:
    def __init__(self):
        self.timer=Timer(5)
        self.isTimerStarted=0
        return
    def StateFunction(self): #Try to finish the function within a few milliseconds
        if self.isTimerStarted==0:
            self.timer=Timer(1.0)
            self.timer.start()
            self.isTimerStarted=1
        if self.timer.doneFlag:
            self.timer.doneFlag=0
            self.isTimerStarted=0
        #and all the other code

import time
import threading

class Timer(threading.Thread):
    def __init__(self, seconds):
        self.runTime = seconds
        self.doneFlag=0
        threading.Thread.__init__(self)
    def run(self):
        time.sleep(self.runTime)
        self.doneFlag=1
        print "Buzzzz"

x=dataCollection()
while 1:
    x.StateFunction()
    time.sleep(0.1)
First, you've effectively rebuilt threading.Timer with less flexibility. So I think you're better off using the existing class. (There are some obvious downsides with creating a thread for each timer instance. But if you just want a single one-shot timer, it's fine.)
More importantly, having your main thread repeatedly poll doneFlag is probably a bad idea. This means you have to call your state function as often as possible, burning CPU for no good reason.
Presumably the reason you have to return within a few milliseconds is that you're returning to some kind of event loop, presumably for your GUI (but, e.g., a network reactor has the same issue, with the same solutions, so I'll keep things general).
If so, almost all such event loops have a way to schedule a timed callback within the event loop—Timer in wx, callLater in twisted, etc. So, use that.
If you're using a framework that doesn't have anything like that, it hopefully at least has some way to send an event/fire a signal/post a message/whatever it's called from outside. (If it's a simple file-descriptor-based reactor, it may not have that, but you can add it yourself just by tossing a pipe into the reactor.) So, change your Timer callback to signal the event loop, instead of writing code that polls the Timer.
If for some reason you really do need to poll a variable shared across threads, you really, really, should be protecting it with a Condition or RLock. There is no guarantee in the language that, when thread 0 updates the value, thread 1 will see the new value immediately, or even ever. If you understand enough of the internals of (a specific version of) CPython, you can often prove that the GIL makes a lock unnecessary in specific cases. But otherwise, this is a race.
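A rough sketch of the "signal the loop instead of polling a flag" idea, using only the standard library (Python 3 spelling; the module is Queue on 2.7). The while loop here is just a stand-in for a real GUI or reactor loop, and on_timer_done, messages and handled are names invented for the example:

```python
import queue
import threading

messages = queue.Queue()

def on_timer_done():
    # Runs in the timer's thread; it only touches the thread-safe queue.
    messages.put("timer done")

timer = threading.Timer(0.05, on_timer_done)
timer.start()

handled = []
while not handled:
    try:
        # Stand-in for one pass of an event loop: block briefly for a message.
        handled.append(messages.get(timeout=0.01))
    except queue.Empty:
        pass  # the loop would do its other per-iteration work here
timer.join()
```

The queue does the locking internally, so no bare attribute is shared between the two threads.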
Finally:
The problem that I run into is that sometimes I feel like the time.sleep function in run , doesn't wait fully for a second (sometimes it may not even wait).
Well, the documentation clearly says this can happen:
The actual suspension time may be less than that requested because any caught signal will terminate the sleep() following execution of that signal’s catching routine.
So, if you need a guarantee that it actually sleeps for at least 1 second, the only way to do this is something like this:
t0 = time.time()
dur = 1.0
while True:
    time.sleep(dur)
    t1 = time.time()
    dur = 1.0 - (t1 - t0)
    if dur <= 0:
        break

Python - How can I make this code asynchronous?

Here's some code that illustrates my problem:
def blocking1():
    while True:
        yield 'first blocking function example'

def blocking2():
    while True:
        yield 'second blocking function example'

for i in blocking1():
    print 'this will be shown'
for i in blocking2():
    print 'this will not be shown'
I have two functions which contain while True loops. These will yield data which I will then log somewhere (most likely, to an sqlite database).
I've been playing around with threading and have gotten it working. However, I don't really like it... What I would like to do is make my blocking functions asynchronous. Something like:
def blocking1(callback):
    while True:
        callback('first blocking function example')

def blocking2(callback):
    while True:
        callback('second blocking function example')

def log(data):
    print data

blocking1(log)
blocking2(log)
How can I achieve this in Python? I've seen the standard library comes with asyncore and the big name in this game is Twisted but both of these seem to be used for socket IO.
How can I async my non-socket related, blocking functions?
A blocking function is a function which doesn't return, but still leaves your process idle - unable to complete more work.
You're asking us to make your blocking functions non-blocking. However – unless you're writing an operating system – you don't have any blocking functions. You might have functions which block because they make calls to blocking system calls, or you might have functions which "block" because they do a lot of computation.
Making the former type of function non-blocking is impossible without making the underlying system call non-blocking. Depending on what that system call is, it may be difficult to make it non-blocking without also adding an event loop to your program; you don't just need to make the call and have it not block, you also have to make another call to determine that the result of that call will be delivered somewhere you could associate it.
The answer to this question is a very long python program and a lot of explanations of different OS interfaces and how they work, but luckily I already wrote that answer on a different site; I called it Twisted. If your particular task is already supported by a Twisted reactor, then you're in luck. Otherwise, as long as your task maps to some existing operating system concept, you can extend a reactor to support it. Practically speaking there are only 2 of these mechanisms: file descriptors on every sensible operating system ever, and I/O Completion Ports on Windows.
In the other case, if your functions are consuming a lot of CPU, and therefore not returning, they're not really blocking; your process is still chugging along and getting work done. There are three ways to deal with that:
separate threads
separate processes
if you have an event loop, write a task that periodically yields, by writing the task in such a way that it does some work, then asks the event loop to resume it in the near future in order to allow other tasks to run.
In Twisted this last technique can be accomplished in various ways, but here's a syntactically convenient trick that makes it easy:
from twisted.internet import reactor
from twisted.internet.task import deferLater
from twisted.internet.defer import inlineCallbacks, returnValue

@inlineCallbacks
def slowButSteady():
    result = SomeResult()
    for something in somethingElse:
        result.workHardForAMoment(something)
        # give the reactor a chance to run other tasks
        yield deferLater(reactor, 0, lambda : None)
    returnValue(result)
You can use generators for cooperative multitasking, but you have to write your own main loop that passes control between them.
Here's a (very simple) example using your example above:
def blocking1():
    while True:
        yield 'first blocking function example'

def blocking2():
    while True:
        yield 'second blocking function example'

tasks = [blocking1(), blocking2()]
# Repeat until all tasks have stopped
while tasks:
    # Iterate through all current tasks. Use
    # tasks[:] to copy the list because we
    # might mutate it.
    for t in tasks[:]:
        try:
            print t.next()
        except StopIteration:
            # If the generator stops, remove it from the task list
            tasks.remove(t)
You could further improve it by allowing the generators to yield new generators, which then could be added to tasks, but hopefully this simplified example will give the general idea.
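One possible shape for that improvement, as a sketch in Python 3 syntax (next(task) rather than task.next()): a generator that yields another generator is treated as a request to schedule it. The functions child, parent and run are invented for this illustration.

```python
from collections import deque
import types

def child(name, steps):
    for i in range(steps):
        yield "%s step %d" % (name, i)

def parent():
    yield "parent start"
    yield child("child", 2)   # yielding a generator asks the loop to schedule it
    yield "parent end"

def run(tasks):
    output = []
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            item = next(task)
        except StopIteration:
            continue              # finished tasks are simply dropped
        if isinstance(item, types.GeneratorType):
            ready.append(item)    # new task joins the round-robin
        else:
            output.append(item)
        ready.append(task)        # current task goes to the back of the line
    return output

output = run([parent()])
```

Here the parent and the spawned child take turns, so the collected output is parent start, child step 0, parent end, child step 1.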
The twisted framework is not just sockets. It has asynchronous adapters for many scenarios, including interacting with subprocesses. I recommend taking a closer look at that. It does what you are trying to do.
If you don't want to use full OS threading, you might try Stackless, which is a variant of Python that adds many interesting features, including "microthreads". There are a number of good examples that you will find helpful.
Your code isn’t blocking. blocking1() and its sibling return iterators immediately (not blocking), and a single iteration doesn't block either (in your case).
If you want to “eat” from both iterators one-by-one, don’t make your program try to eat up “blocking1()” entirely before continuing...
from itertools import izip  # Python 2's built-in zip would try to build the whole list first and never finish

for b1, b2 in izip(blocking1(), blocking2()):
    print 'this will be shown', b1, 'and this, too', b2
