How does Python's Twisted Reactor work?

Recently, I've been diving into the Twisted docs. From what I gathered, the basis of Twisted's functionality is its event loop, called the "Reactor". The reactor listens for certain events and dispatches them to registered callback functions that have been designed to handle those events. In the book, there is some pseudocode describing what the Reactor does, but I'm having trouble understanding it; it just doesn't make sense to me.
while True:
    timeout = time_until_next_timed_event()
    events = wait_for_events(timeout)
    events += timed_events_until(now())
    for event in events:
        event.process()
What does this mean?

In case it's not obvious, It's called the reactor because it reacts to
things. The loop is how it reacts.
One line at a time:
while True:
It's not actually while True; it's more like while not loop.stopped. You can call reactor.stop() to stop the loop, and (after performing some shut-down logic) the loop will in fact exit. But it is portrayed in the example as while True because when you're writing a long-lived program (as you often are with Twisted) it's best to assume that your program will either crash or run forever, and that "cleanly exiting" is not really an option.
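That stoppable loop can be sketched as a tiny, hypothetical reactor class (names invented for illustration; Twisted's real implementation is far more involved):

```python
class ToyReactor:
    """Minimal sketch of a stoppable event loop (not Twisted's actual code)."""

    def __init__(self):
        self.stopped = False
        self.iterations = 0

    def run(self):
        # The "while True" from the pseudocode, really "while not stopped".
        while not self.stopped:
            self.iterate()

    def iterate(self):
        # A real reactor would wait for and dispatch events here.
        self.iterations += 1
        if self.iterations >= 3:
            self.stop()  # stand-in for application code calling reactor.stop()

    def stop(self):
        self.stopped = True


reactor = ToyReactor()
reactor.run()  # returns once stop() is called, instead of looping forever
```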
timeout = time_until_next_timed_event()
If we were to expand this calculation a bit, it might make more sense:
import time

def time_until_next_timed_event():
    now = time.time()
    # timed_events: the reactor's list of delayed calls (explained below)
    timed_events.sort(key=lambda event: event.desired_time)
    soonest_event = timed_events[0]
    return soonest_event.desired_time - now
timed_events is the list of events scheduled with reactor.callLater; i.e. the functions that the application has asked for Twisted to run at a particular time.
events = wait_for_events(timeout)
This line here is the "magic" part of Twisted. I can't expand wait_for_events in a general way, because its implementation depends on exactly how the operating system makes the desired events available. And, given that operating systems are complex and tricky beasts, I can't expand on it in a specific way while keeping it simple enough for an answer to your question.
What this function is intended to mean is: ask the operating system, or a Python wrapper around it, to block until one or more of the objects previously registered with it - at a minimum, stuff like listening ports and established connections, but also possibly things like buttons that might get clicked on - is "ready for work". The work might be reading some bytes out of a socket when they arrive from the network. The work might be writing bytes to the network when a buffer empties out sufficiently to do so. It might be accepting a new connection or disposing of a closed one. Each of these possible events corresponds to a method that the reactor might call on your objects: dataReceived, buildProtocol, resumeProducing, etc., which you will learn about if you go through the full Twisted tutorial.
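One concrete way the blocking step can work on Unix-like systems is the stdlib select call, which many reactors use under the hood. Here is a minimal sketch (my example, not Twisted's code), with a connected socket pair standing in for a network connection:

```python
import select
import socket

def wait_for_events(readers, timeout):
    """Block until at least one registered source is readable, or until
    `timeout` seconds pass. This is the role select plays in real reactors."""
    ready, _, _ = select.select(readers, [], [], timeout)
    return ready

# Demo: a connected socket pair stands in for an established connection.
a, b = socket.socketpair()
before = wait_for_events([b], 0.1)  # nothing to read yet -> times out, []
a.sendall(b"hello")
after = wait_for_events([b], 0.1)   # b now has data waiting -> [b]
a.close()
b.close()
```

When `wait_for_events` returns a socket, the reactor would read from it and hand the bytes to your protocol's dataReceived.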
Once we've got our list of hypothetical "event" objects, each of which has an imaginary "process" method (the exact names of the methods are different in the reactor just due to accidents of history), we then go back to dealing with time:
events += timed_events_until(now())
First, this is assuming events is simply a list of an abstract Event class, which has a process method that each specific type of event needs to fill out.
At this point, the loop has "woken up" because wait_for_events stopped blocking. However, we don't know how many timed events we might need to execute based on how long it was "asleep" for. We might have slept for the full timeout if nothing was going on, but if lots of connections were active we might have slept for effectively no time at all. So we check the current time ("now()"), and we add to the list of events we need to process every timed event with a desired_time that is at, or before, the present time.
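Continuing the same hypothetical sketch, `timed_events_until` might look like this (names invented for illustration; Twisted's real delayed-call bookkeeping differs):

```python
import time
from collections import namedtuple

# A scheduled call: when it should run, and what to do.
TimedEvent = namedtuple("TimedEvent", ["desired_time", "process"])
timed_events = []  # populated elsewhere, e.g. by a callLater-style helper

def timed_events_until(now):
    """Remove and return every scheduled event whose desired_time has passed."""
    due = [e for e in timed_events if e.desired_time <= now]
    for e in due:
        timed_events.remove(e)
    return due

# Demo: one event already due, one far in the future.
timed_events.append(TimedEvent(time.time() - 1, lambda: "late"))
timed_events.append(TimedEvent(time.time() + 60, lambda: "future"))
ready = timed_events_until(time.time())  # only the overdue event is returned
```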
Finally,
for event in events:
    event.process()
This just means that Twisted goes through the list of things that it has to do and does them. In reality of course it handles exceptions around each event, and the concrete implementation of the reactor often just calls straight into an event handler rather than creating an Event-like object to record the work that needs to be done first, but conceptually this is just what happens. event.process here might mean calling socket.recv() and then yourProtocol.dataReceived with the result, for example.
I hope this expanded explanation helps you get your head around it. If you'd like to learn more about Twisted by working on it, I'd encourage you to join the mailing list and hop onto IRC: #twisted to talk about applications, or #twisted-dev to work on Twisted itself, both on Freenode.

I will try to elaborate:
The program yields control and goes to sleep inside wait_for_events.
I suppose the most interesting part here is the event.
An event works like this:
On an external stimulus (an incoming network packet, a key press, a timer firing, a call from a different program), the program receives control (in some other thread or in a special routine). Somehow the sleep inside wait_for_events is interrupted, and wait_for_events returns.
On that occurrence of control, the event handler stores information about the event in some data structure, events, which is later used to do something about those events (event.process).
Not just one but many events can happen between entering and exiting wait_for_events; all of them must be processed.
The event.process() procedure is custom and usually calls the interesting part: the user's Twisted code.


Parallel or Event driven functions in Python?

I'm fairly new to Python, so maybe my whole concept of how this should work is wrong:
I'm building an RFID reader for time-management purposes. E.g. a user logs in with an RFID chip -> a timer starts counting and updates a Google spreadsheet every minute. The updating part works fine, but takes a little while. But I want to check for RFID logins all the time. Somewhere I read that event-driven programming is what I'm looking for.
Currently I'm doing everything in a while True loop, which feels like a hack itself. Can I somehow just execute my code when the RFID reader sends a signal? And then time my update to run every minute or so in parallel? I'd like to know what's best practice here.
Parallel and event-driven are basically orthogonal, although it is generally "easy" to parallelize event handling. I'll first cover the event-driven part and then the parallelisation, although you may only want to use the latter.
The "normal" control flow in Python is iterative.
That means you define the instructions the code should execute, and the computer runs them step by step.
There are different ways to organize your code (functional, event-driven, object-oriented; although I don't want to say these are absolute categories where you can only do X or Y). Event-driven normally means you define events and how to handle them.
There is nothing you could write event-driven that you couldn't write iteratively, and vice versa.
Python mainly got support for asynchronous programming in version 3.4, when the asyncio library was introduced.
With 3.5 you also got the syntactic sugar await and async. Because you are on 2.7, this is not available to you.
There is a backport of asyncio named trollius, but this is overkill if you only have a low number of events. Also, it's not hard to roll your own basic event loop (of course asyncio and trollius do much more, but if we are not going to use those features, why bother?).
The basic workflow is waiting for events and then handling them as they occur:
events = []
while waiting_for_events:
    if events:
        event = events.pop()
        handle_event(event)
You somehow need to know how to differentiate between events and how to handle them.
For a "full featured event loop" you would probably use different classes with inheritance, but lets just use a name for each event.
Also we probably need some kind of data like which RFID we encountered.
from collections import namedtuple
Event = namedtuple("Event", ["name", "data"])
Then we simply need to map events to how to handle them:
def handle_rfid(data):
    ...

def handle_timer(data):
    ...

event_handler_mapping = {"rfid": handle_rfid, "timer": handle_timer}

def handle_event(event):
    event_handler_mapping[event.name](event.data)
We still need to generate events, so let's rewrite the event loop to get events:
import time  # note: time.perf_counter() needs Python 3.3+; on 2.7, time.time() works

events = []
timer = 0
current_time = time.perf_counter()
while waiting_for_events:
    rfid = get_rfids()
    if rfid:
        events.append(Event("rfid", rfid))
    if timer > 1:
        events.append(Event("timer", timer))
        timer = 0
    else:
        timer += time.perf_counter() - current_time
    current_time = time.perf_counter()
    if events:
        event = events.pop()
        handle_event(event)
And now we are "event driven". The nice thing is that we can easily extend this to more events.
The bad thing is that it still does the same thing you probably already have, but in a more complicated way.
Also, if handling an event takes a lot of time (which seems to be the case with updating the spreadsheet), the other events will not be generated and handled in the meantime. This is where parallelism comes into play.
Parallelism basically means we can use multiple cores.
Here we actually only need "concurrency", which means two things can happen at once.
This is "easier" than true parallelism: we can just switch between different things but still do everything sequentially.
In Python this basically boils down to multiprocessing (parallelism) and threads ("only" concurrency). (In other programming languages threads actually provide parallelism, but in Python, for reasons I don't want to go into here, this is not the case.)
The problem with concurrency is always synchronisation. If things can happen at the same time, bad things can happen if two threads try to change the same variable. In general, as long as you only use thread-safe functions to access variables shared between threads, you are safe.
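As a minimal illustration of that synchronisation problem (my example, not from the answer): without the lock below, the `counter += 1` read-modify-write can interleave between threads and lose updates.

```python
import threading

counter = 0
lock = threading.Lock()

def increment(times):
    global counter
    for _ in range(times):
        with lock:       # only one thread may execute this block at a time
            counter += 1

threads = [threading.Thread(target=increment, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: every increment accounted for
```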
In python threads are created by the threading module.
I personally find it hard to understand if you don't already know threads from somewhere else, but the gist is the following:
To run a function in a thread use threading.Thread(target=function) and then thread.start().
You could use it like this:
from threading import Thread

def run_in_thread(f, *args, **kwargs):
    thread = Thread(target=f, args=args, kwargs=kwargs)
    thread.start()
def _update_spreadsheet(data):
    ...  # logic here

# when using the event driven approach from above
def handle_timer(data):
    # pass the function and its argument separately; calling
    # _update_spreadsheet(data) here would run it synchronously
    run_in_thread(_update_spreadsheet, data)
Note that if you access variables from within _update_spreadsheet, you need to be careful to only use thread-safe functions.
It is "best" to use as little inter-thread communication as possible.
A queue is often a good choice.
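For example (my sketch, not from the answer), the stdlib queue.Queue is thread-safe, so the event loop can hand work to a background thread without any explicit locking:

```python
import queue
import threading

work = queue.Queue()
handled = []

def worker():
    while True:
        item = work.get()
        if item is None:       # sentinel value: time to shut down
            break
        handled.append(item)   # stand-in for the slow spreadsheet update
        work.task_done()

t = threading.Thread(target=worker)
t.start()

work.put("rfid-event")  # the event loop just enqueues and moves on
work.join()             # here we wait only to demonstrate the hand-off
work.put(None)          # ask the worker to exit
t.join()
```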
You can use parallelism/concurrency without the event driven organisation.
Because we already divided the code into event handlers, we can run long-running event handlers in a separate thread.
If we have lots of events and event handlers, running everything in threads is a bad idea (because thread switching has an overhead).
Thus asyncio (and probably every other event loop) implements some kind of "wait until at least one event can be handled".
This is most interesting for network input and output, because these take "a long time".
Often something like select is used.
Other events (timers, reads from disk, waiting for hardware events, ...) need other mechanisms for "wake me up when something happens". Integrating all of these is one of the features asyncio offers.

twisted: processing incoming events in synchronous code

Suppose there's a synchronous function in a twisted-powered Python program that takes a long time to execute, doing that in a lot of reasonable-sized pieces of work. If the function could return deferreds, this would be a no-brainer, however the function happens to be deep inside some synchronous code, so that yielding deferreds to continue is impossible.
Is there a way to let twisted handle outstanding events without leaving that function? I.e. what I want to do is something along the lines of
def my_func():
    results = []
    for item in a_lot_of_items():
        results.append(do_computation(item))
        reactor.process_outstanding_events()
    return results
Of course, this imposes reentrancy requirements on the code, but still, there's QCoreApplication.processEvents for that in Qt, is there anything in twisted?
The solution taken by some event-loop-based systems (essentially the solution you're referencing via Qt's QCoreApplication.processEvents API) is to make the main loop re-entrant. In Twisted terms, this would mean something like (not working code):
from twisted.internet import reactor
from twisted.internet.defer import inlineCallbacks

def my_expensive_task_that_cannot_be_asynchronous():
    @inlineCallbacks
    def do_work(units):
        for unit in units:
            yield do_one_work_asynchronously(unit)
    work = do_work(some_work_units())
    work.addBoth(lambda ignored: reactor.stop())
    reactor.run()

def main():
    # Whatever your setup is...
    # Then, hypothetical event source triggering your
    # expensive function:
    reactor.callLater(
        30,
        my_expensive_task_that_cannot_be_asynchronous,
    )
    reactor.run()
Notice how there are two reactor.run calls in this program. If Twisted had a re-entrant event loop, this second call would start spinning the reactor again and not return until a matching reactor.stop call is encountered. The reactor would process all events it knows about, not just the ones generated by do_work, and so you would have the behavior you desire.
This requires a re-entrant event loop because my_expensive_task_... is already being called by the reactor loop. The reactor loop is on the call stack. Then, reactor.run is called and the reactor loop is now on the call stack again. So the usual issues apply: the event loop cannot have left over state in its frame (otherwise it may be invalid by the time the nested call is complete), it cannot leave its instance state inconsistent during any calls out to other code, etc.
Twisted does not have a re-entrant event loop. This is a feature that has been considered and, at least in the past, explicitly rejected. Supporting this feature brings a huge amount of additional complexity (described above) to the implementation and the application. If the event loop is re-entrant then it becomes very difficult to avoid requiring all application code to be re-entrant safe as well. This negates one of the major benefits of the cooperative multitasking approach Twisted takes to concurrency (that you are guaranteed your functions will not be re-entered).
So, when using Twisted, this solution is out.
I'm not aware of another solution which would allow you to continue to run this code in the reactor thread. You mentioned that the code in question is nested deeply within some other synchronous code. The other options that come to mind are:
make the synchronous code capable of dealing with asynchronous things
factor the expensive parts out and compute them first, then pass the result in to the rest of the code
run all of that code, not just the computationally expensive part, in another thread
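The last option can be sketched with the stdlib's concurrent.futures (my example, not from the answer; within Twisted you would use deferToThread instead, which returns a Deferred rather than a Future):

```python
import concurrent.futures

def expensive(items):
    # stand-in for the computationally expensive synchronous code
    return [i * i for i in items]

with concurrent.futures.ThreadPoolExecutor() as pool:
    future = pool.submit(expensive, range(5))
    # the event loop (or any other code) is free to keep running here
    result = future.result()  # blocks only when the value is finally needed
```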
You could use deferToThread.
http://twistedmatrix.com/documents/13.2.0/core/howto/threading.html
That method runs your calculation in a separate thread and returns a deferred that is called back when the calculation is actually finished.
The issue is that if do_heavy_computation() is code that blocks, then execution won't proceed to the next function. In this case use deferToThread or blockingCallFromThread for heavy calculations. Also, if you don't care about the results of the calculation, you can use callInThread. Take a look at the documentation on threads.
This should do:
for item in items:
    reactor.callLater(0, heavy_func, item)
reactor.callLater should bring you back into the event loop.

Is it possible to prevent python's http.client.HTTPResponse.read() from hanging when there is no data?

I'm using Python http.client.HTTPResponse.read() to read data from a stream. That is, the server keeps the connection open forever and sends data periodically as it becomes available. There is no expected length of response. In particular, I'm getting Tweets through the Twitter Streaming API.
To accomplish this, I repeatedly call http.client.HTTPResponse.read(1) to get the response, one byte at a time. The problem is that the program will hang on that line if there is no data to read, which there isn't for large periods of time (when no Tweets are coming in).
I'm looking for a method that will get a single byte of the HTTP response, if available, but that will fail instantly if there is no data to read.
I've read that you can set a timeout when the connection is created, but setting a timeout on the connection defeats the whole purpose of leaving it open for a long time waiting for data to come in. I don't want to set a timeout, I want to read data if there is data to be read, or fail if there is not, without waiting at all.
I'd like to do this with what I have now (using http.client), but if it's absolutely necessary that I use a different library to do this, then so be it. I'm trying to write this entirely myself, so suggesting that I use someone else's already-written Twitter API for Python is not what I'm looking for.
This code gets the response; it runs in a separate thread from the main one:
while True:
    try:
        readByte = dc.request.read(1)
    except:
        readByte = []
    if len(readByte) != 0:
        dc.responseLock.acquire()
        dc.response = dc.response + chr(readByte[0])
        dc.responseLock.release()
Note that the request is stored in dc.request and the response in dc.response, these are created elsewhere. dc.responseLock is a Lock that prevents dc.response from being accessed by multiple threads at once.
With this running on a separate thread, the main thread can then get dc.response, which contains the entire response received so far. New data is added to dc.response as it comes in without blocking the main thread.
This works perfectly when it's running, but I run into a problem when I want it to stop. I changed my while statement to while not dc.twitterAbort, so that when I want to abort this thread I just set dc.twitterAbort to True, and the thread will stop.
But it doesn't. This thread remains for a very long time afterward, stuck on the dc.request.read(1) part. There must be some sort of timeout, because it does eventually get back to the while statement and stop the thread, but it takes around 10 seconds for that to happen.
How can I get my thread to stop immediately when I want it to, if it's stuck on the call to read()?
Again, this method is working to get Tweets, the problem is only in getting it to stop. If I'm going about this entirely the wrong way, feel free to point me in the right direction. I'm new to Python, so I may be overlooking some easier way of going about this.
Your idea is not new; there are OS mechanisms(*) for making sure that an application only issues I/O-related system calls when they are guaranteed not to block. These mechanisms are usually used by async I/O frameworks, such as tornado or gevent. Use one of those, and you will find it very easy to run code "while" your application is waiting for an I/O event, such as incoming data on a socket.
If you use gevent's monkey-patching method, you can proceed using http.client, as requested. You just need to get used to the cooperative scheduling paradigm introduced by gevent/greenlets, in which your execution flow "jumps" between sub-routines.
Of course you can also perform blocking I/O in another thread (like you did), so that it does not affect the responsiveness of your main thread. Regarding your "How can I get my thread to stop immediately" problem:
Forcing a thread that's blocking in a system call to stop is usually not a clean or even valid operation (also see Is there any way to kill a Thread in Python?). Either -- if your application has finished its job -- you take down the entire process, which also terminates all contained threads, or you just leave the thread be and give it as much time to terminate as it needs (are the 10 seconds you were referring to really a problem?)
If you do not want to have such long-blocking system calls anywhere in your application (be it in the main thread or not), then use above-mentioned techniques to prevent blocking system calls.
(*) see e.g. O_NONBLOCK option in http://man7.org/linux/man-pages/man2/open.2.html
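The readiness-checking idea can be sketched with select using a zero timeout (my example; note that http.client does not officially expose its underlying socket, so this is an illustration of the mechanism, not a drop-in fix):

```python
import select
import socket

def read_byte_if_available(sock):
    """Return one byte if the socket is readable right now, else None."""
    ready, _, _ = select.select([sock], [], [], 0)  # timeout 0: poll, never block
    if ready:
        return sock.recv(1)
    return None

# Demo: a connected socket pair stands in for the streaming HTTP connection.
a, b = socket.socketpair()
empty = read_byte_if_available(b)  # no data yet -> returns None immediately
a.sendall(b"x")
byte = read_byte_if_available(b)   # data waiting -> returns b"x"
a.close()
b.close()
```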

Python - wait on a condition without high cpu usage

In this case, say I wanted to wait on a condition to happen, that may happen at any random time.
while True:
    if condition:
        pass  # Do whatever
    else:
        pass
As you can see, pass will just happen until the condition is True. But while the condition isn't True the cpu is being pegged with pass causing higher cpu usage, when I simply just want it to wait until the condition occurs. How may I do this?
See Busy_loop#Busy-waiting_alternatives:
Most operating systems and threading libraries provide a variety of system calls that will block the process on an event, such as lock acquisition, timer changes, I/O availability or signals.
Basically, to wait for something, you have two options (same as IRL):
Check for it periodically with a reasonable interval (this is called "polling")
Make the event you're waiting for notify you: invoke (or, as a special case, unblock) your code somehow (this is called "event handling" or "notifications". For system calls that block, "blocking call" or "synchronous call" or call-specific terms are typically used instead)
As already mentioned, you can (a) poll, i.e. check for the condition and, if it is not true, wait for some time interval; (b) if your condition is an external event, arrange a blocking wait for the state to change; or (c) take a look at the publish-subscribe model, pubsub, where your code registers an interest in a given item and other parts of the code publish the item.
This is not really a Python problem. Optimally, you want to put your process to sleep and wait for some sort of signal that the action has occurred, which will use no CPU while waiting. So it's not so much a case of writing Python code as of figuring out what mechanism makes condition true, and waiting on that.
If the condition is a simple flag set by another thread in your program rather than an external resource, you need to go back and learn from scratch how threading works.
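For the flag-set-by-another-thread case, the standard tool is threading.Event, which blocks in the OS instead of spinning (my example, not from the answer):

```python
import threading

condition_met = threading.Event()
results = []

def waiter():
    condition_met.wait()  # sleeps in the OS, using ~0% CPU, until set() is called
    results.append("woke up")

t = threading.Thread(target=waiter)
t.start()
# ... meanwhile, another thread decides the condition now holds:
condition_met.set()
t.join()
```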
Only if the thing that you're waiting for does not provide any sort of push notification that you can wait on should you consider polling it in a loop. A sleep will help reduce the CPU load but not eliminate it and it will also increase the response latency as the sleep has to complete before you can commence processing.
As for waiting on events, an event-driven paradigm might be what you want unless your program is utterly trivial. Python has the Twisted framework for this.

How will this asynchronous code execute in Python vs. C?

Let's say I have a simple script with the two functions below. When callback_func is invoked, I would assume it will only run on a singular basis. That is, there won't be two events passing through the code block at the same time. Is that correct? Also, if callback_func runs on a singular basis, the messaging service itself would have to perform some buffering so no messages are lost, and that buffering depends on the service originating the event. Is that also correct?
def callback_func(event):
    pass  # Can be called anytime

def main_func():
    pass  # Sets up a connection to a messaging service
Then what if I add a send_func? If I receive one message but I have three going out, how will send_func deal with a situation when it gets called while sending a message? Is such a situation handled by the Python interpreter?
def send_func(event):
    pass  # Can be called anytime

def callback_func(event):
    pass  # Can be called anytime

def main_func():
    pass  # Sets up a connection to a messaging service
Then lastly, if I change the language to C, how do the answers to my questions above change?
Confusing Two Concepts (Asynchronous != Concurrent)
Asynchronous does not imply Concurrent, and Concurrent does not imply Asynchronous. These terms get semantically confused by beginners ( and some experts ), but they are different concepts!
You can have one without the other, or both sometimes.
Asynchronous means you don't wait for something; it doesn't imply that it happens while other things do, just that it may happen later.
Concurrent means more than one completely individual thing is happening at the exact same time, these things can be Synchronous while being isolated and concurrent.
Implementation Specific
CPython runs your code in a single thread unless you explicitly start more (and its GIL ensures only one thread executes Python bytecode at a time), so in a simple single-threaded program there is no concern about re-entry. Runtimes or programs that do use concurrency would need locking mechanisms.
C is effectively single threaded as well unless you specifically start new threads; in that case you would need a locking mechanism.
I'd like to add that there are many places where message buffering can happen other than the "service". At a low level, I believe the operating system will buffer incoming and outgoing bytes on a socket.
