Why run_in_executor placed in BaseEventLoop?

Why run_in_executor placed in BaseEventLoop? - python

For example, asyncio.gather has signature asyncio.gather(*coros_or_futures, loop=None, return_exceptions=False).
I can pass specific loop or leave None (and default event loop will be used).
Why doesn't BaseEventLoop.run_in_executor defined same way, like: asyncio.run_in_executor(executor, callback, *args, loop=None)?
If there was some important reason to place it into BaseEventLoop?

Historically run_in_executor appeared very early and it was an event loop's method. It's modeled after twisted's methods for running code in thread pool. After appearing the run_in_executor has never changed.
It's low-level function, that accepts callback and sits pretty close to other functions which accepts callback, not couroutine: call_soon(), call_later(), add_reader() etc. All those are methods of event loop.
asyncio.gather was invited much later, after about a year of the library development. It is placed on higher abstraction level, works with coroutines and pushed along with other coroutine-related functions like wait() or sleep().

Related

Why is it that only asynchronous functions can yield in asynchronous code?

In the article "I'm not feeling the async pressure" Armin Ronacher makes the following observation:
In threaded code any function can yield. In async code only async functions can. This means for instance that the writer.write method cannot block.
This observation is made with reference to the following code sample:
from asyncio import start_server, run
async def on_client_connected(reader, writer):
while True:
data = await reader.readline()
if not data:
break
writer.write(data)
async def server():
srv = await start_server(on_client_connected, '127.0.0.1', 8888)
async with srv:
await srv.serve_forever()
run(server())
I do not understand this comment. Specifically:
How come synchronous functions cannot yield when inside of asynchronous functions?
What does yield have to do with blocking execution? Why is it that a function that cannot yield, cannot block?

Going line-by-line:
In threaded code any function can yield.
Programs running on a machine are organized in terms of processes. Each process may have one or more threads. Threads, like processes, are scheduled by (and interruptible by) the operating system. The word "yield" in this context means "letting other code run". When work is split between multiple threads, functions "yield" easily: the operating system suspends the code running in one thread, runs some code in a different thread, suspends that, comes back, and works some more on the first thread, and so on. By switching between threads in this way, concurrency is achieved.
In this execution model, whether the code being suspended is synchronous or asynchronous does not matter. The code within the thread is being run line-by-line, so the fundamental assumption of a synchronous function---that no changes occurred in between running one line of code and the next---is not violated.
In async code only async functions can.
"Async code" in this context means a single-threaded application that does the same work as the multi-threaded application, except that it achieves concurrency by using asynchronous functions within a thread, instead of splitting the work between different threads. In this execution model, your interpreter, not the operating system, is responsible for switching between functions as needed to achieve concurrency.
In this execution model, it is unsafe for work to be suspended in the middle of a synchronous function that's located inside of an asynchronous function. Doing so would mean running some other code in the middle of running your synchronous function, breaking the "line-by-line" assumption made by the synchronous function.
As a result, the interpreter will wait only suspend the execution of an asynchronous function in between synchronous sub-functions, never within one. This is what is meant by the statement that synchronous functions in async code cannot yield: once a synchronous function starts running, it must complete.
This means for instance that the writer.write method cannot block.
The writer.write method is synchronous, and hence, when run in an async program, uninterruptible. If this method were to block, it would block not just the asynchronous function it is running inside of, but the entire program. That would be bad. writer.write avoids blocking the program by writing to a write buffer instead and returning immediately.
Strictly speaking, writer.write can block, it's just inadvisable to do so.
If you need to block inside of an async function, the proper way to do so is to await another async function. This is what e.g. await writer.drain() does. This will block asynchronously: while this specific function remains blocked, it will correctly yield to other functions that can run.

“Yield” here refers to cooperative multitasking (albeit within a process rather than among them). In the context of the async/await style of Python programming, asynchronous functions are defined in terms of Python’s pre-existing generator support: if a function blocks (typically for I/O), all its callers that are performing awaits suspend (with an invisible yield/yield from that is indeed of the generator variety). The actual call for any generator is to its next method; that function actually returns.
Every caller, up to some sort of driver that most programmers never write, must participate for this approach to work: any function that did not suspend would suddenly have the responsibility of the driver of deciding what to do next while waiting on the function it called to complete. This “infectious” aspect of asynchronicity has been called a “color”; it can be problematic, as for example when people forget to await a coroutine call that looks correct because it looks like any other call. (The async/await syntax exists to minimize the disruption of the program’s structure from the concurrency by implicitly converting functions into state machines, but this ambiguity remains.) It can also be a good thing: an asynchronous function can be interrupted exactly when it awaits, so it’s straightforward to reason about the consistency of data structures.
A synchronous function therefore cannot yield simply as a matter of definition. The import of the restriction is rather that a function called with a normal (synchronous) call cannot yield: its caller is not prepared to handle such an interaction. (What will happen if it does anyway is of course the same “forgotten await”.) This also affects refactoring: a function cannot be changed to be asynchronous without changing all its clients (and making them asynchronous as well if they are not already). (This is similar to how all I/O works in Haskell, since it affects the type of any function that performs any.)
Note that yield is allowed in its role as a normal generator used with an ordinary for even in an asynchronous function, but that’s just the general fact that the caller must expect the same protocol as the callee: if an enhanced generator (an “old-style” coroutine) is used with for, it just gets None from every (yield), and if an async function is used with for, it produces awaitables that probably break when they are sent None.
The distinction with threading, or with so-called stackful coroutines or fibers, is that no special resumption support is needed from the caller because the actual function call simply doesn’t return until the thread/fiber is resumed. (In the thread case, the kernel also chooses when to resume it.) In that sense, these approaches are easier to use, but with fibers the ability to “sneak” a pause into any function is partially compromised by the need to specify arguments to that function to tell it about the userspace scheduler with which to register itself (unless you’re willing to use global variables for that…). Threads, on the other hand, have even higher overhead than fibers, which matters when great numbers of them are running.

From running thread call coroutine on other thread

I'm replacing part of an existing program. That original program uses threads. There's this particular class which inherits from threading.Thread which functionality I need to replace but I need to keep the interface the same.
The functionality I'm integrating is packaged in a library which uses asyncio a lot.
The original calls to the class I'm replacing go something like this:
network = Network()
network.start()
network.fetch_something() # crashes!
network.stop()
I've gotten to a point where my replacing class inherits from threading.Thread too and I can connect, from within the run method to my backends via the client library:
class Network(threading.Thread):
def __init__(self):
self._loop = asyncio.new_event_loop()
self._client = Client() # this is the library
def run(self):
self._loop.run_until_complete(self.__connect()) # works dandy, implementation not shown
self._loop.run_forever()
def fetch_something(self):
return self._loop.run_until_complete(self._client.fetch_something())
Running this code throws an exception:
RuntimeError: Non-thread-safe operation invoked on an event loop other than the current one
I sort of get what's going on here. In the run method things worked out because the same thread running the event loop was the caller. In the other case an other thread was the caller hence the problem.
As you might have noticed I was hoping the problem would have been solved by using the same event loop. Alas, that didn't work out.
I really want to keep the interface exactly as it is otherwise I'm refactoring for the remainder of the year. I could relatively easily pass arguments to the constructor of the Network class. I've tried passing in an event loop created on the main thread but the result was the same.
(Note that this is the opposite problem this author has: Call coroutine within Thread)

When scheduling a coroutine from a different thread, you must use asyncio.run_coroutine_threadsafe. For example:
def fetch_something(self):
future = asyncio.run_coroutine_threadsafe(
self._client.fetch_something(), loop)
return future.result()
run_coroutine_threadsafe schedules the coroutine with the event loop in a thread-safe way and returns a concurrent.futures.Future. You can use the returned future to simply wait for the result as shown above, but you can also pass it to other functions, poll whether the result has arrived, or implement timeouts.
When combining threads and asyncio, remember to make sure that all interfacing with the event loop from other threads (even to call something as simple as loop.stop to implement Network.stop) is done using loop.call_soon_threadsafe and asyncio.run_coroutine_threadsafe.

asyncio - how many coroutines?

I have been struggling for a few days now with a python application where I am expecting to look for a file or files in a folder and iterate through the each file and each record in it and create objects to be persisted on a Janusgraph database. The particular OGM that I am using, requires that the transactions with the database are done in an asynchronously using asyncio. I have read a lot of blogs, posts about asyncio and I think I understand the concept of async, await, tasks, etc... In my application I have defined several functions that handle different parts of the processing:
Retrieves the list of all files available
Select one file for processing
Iterates through the selected file and reads a line/record for processing
Receives the record, determines parses the from in and calls several other functions that are responsible for creating the Model objects before they are persisted to the database. For instance, I different functions that creates: User, Session, Browser, DeviceUsed, Server, etc...
I understand (and I may be wrong) that the big advantage of using asyncio is for situations where the call to a function will block usually for I/O, database transaction, network latency, etc...
So my question is if I need to convert all my functions into coroutines and schedule to run through the event loop, or just the ones that would block, like committing transaction to the database. I tried this approach to begin with and had all sorts of problems.

So my question is if I need to convert all my functions into coroutines and schedule to run through the event loop, or just the ones that would block,
You might need to convert most of them, but the conversion should be largely mechanical, boiling down to changing def to async def, and adding await when calling other coroutines.
Obviously, you cannot avoid converting the ones that actually block, either by switching to the appropriate asyncio API or by using loop.run_in_executor() for those that don't have one. (DNS resolution used to be an outstanding example of the latter.)
But then you also need to convert their callers, because calling a coroutine from a blocking function is not useful unless the function implements event-loop-like functionality. On the other hand, when a coroutine is called from another coroutine, everything works because suspends are automatically propagated to the top of the chain. Once the whole call chain consists of coroutines, the top-level ones are fed to the event loop using loop.create_task() or loop.run_until_complete().
Of course, convenience functions that neither block nor call blocking functions can safely remain non-async, and are invoked by either sync or async code without any difference.
The above applies to asyncio, which implements stackless coroutines. A different approach is used by greenlet, whose tasks encapsulate the call stack, which allows them to be switched at arbitrary places in code that uses normal function calls. Greenlets are a bit more heavyweight and less portable than coroutines, though, so I'd first converting to asyncio.

twisted: processing incoming events in synchronous code

Suppose there's a synchronous function in a twisted-powered Python program that takes a long time to execute, doing that in a lot of reasonable-sized pieces of work. If the function could return deferreds, this would be a no-brainer, however the function happens to be deep inside some synchronous code, so that yielding deferreds to continue is impossible.
Is there a way to let twisted handle outstanding events without leaving that function? I.e. what I want to do is something along the lines of
def my_func():
results = []
for item in a_lot_of_items():
results.append(do_computation(item))
reactor.process_outstanding_events()
return results
Of course, this imposes reentrancy requirements on the code, but still, there's QCoreApplication.processEvents for that in Qt, is there anything in twisted?

The solution taken by some event-loop-based systems (essentially the solution you're referencing via Qt's QCoreApplication.processEvents API) is to make the main loop re-entrant. In Twisted terms, this would mean something like (not working code):
def my_expensive_task_that_cannot_be_asynchronous():
#inlineCallbacks
def do_work(units):
for unit in units:
yield do_one_work_asynchronously(unit)
work = do_work(some_work_units())
work.addBoth(lambda ignored: reactor.stop())
reactor.run()
def main():
# Whatever your setup is...
# Then, hypothetical event source triggering your
# expensive function:
reactor.callLater(
30,
my_expensive_task_that_cannot_be_asynchronous,
)
reactor.run()
Notice how there are two reactor.run calls in this program. If Twisted had a re-entrant event loop, this second call would start spinning the reactor again and not return until a matching reactor.stop call is encountered. The reactor would process all events it knows about, not just the ones generated by do_work, and so you would have the behavior you desire.
This requires a re-entrant event loop because my_expensive_task_... is already being called by the reactor loop. The reactor loop is on the call stack. Then, reactor.run is called and the reactor loop is now on the call stack again. So the usual issues apply: the event loop cannot have left over state in its frame (otherwise it may be invalid by the time the nested call is complete), it cannot leave its instance state inconsistent during any calls out to other code, etc.
Twisted does not have a re-entrant event loop. This is a feature that has been considered and, at least in the past, explicitly rejected. Supporting this features brings a huge amount of additional complexity (described above) to the implementation and the application. If the event loop is re-entrant then it becomes very difficult to avoid requiring all application code to be re-entrant safe as well. This negates one of the major benefits of the cooperative multitasking approach Twisted takes to concurrency (that you are guaranteed your functions will not be re-entered).
So, when using Twisted, this solution is out.
I'm not aware of another solution which would allow you to continue to run this code in the reactor thread. You mentioned that the code in question is nested deeply within some other synchronous code. The other options that come to mind are:
make the synchronous code capable of dealing with asynchronous things
factor the expensive parts out and compute them first, then pass the result in to the rest of the code
run all of that code, not just the computationally expensive part, in another thread

You could use deferToThread.
http://twistedmatrix.com/documents/13.2.0/core/howto/threading.html
That method runs your calculation in a separate thread and returns a deferred that is called back when the calculation is actually finished.

The issue is if do_heavy_computation() is code that blocks then execution won't go to the next function. In this case use deferToThread or blockingCallFromThread for heavy calculations. Also if you don't care for the results of the calculation then you can use callInThread. Take a look at documentation on threads

This should do:
for item in items:
reactor.callLater(0, heavy_func, item)
reactor.callLater should bring you back into the event loop.

Do I need to manually call .quit() method when QThread() stops running (Python)?

I'm writing a multi-threaded application that utilizes QThreads. I know that, in order to start a thread, I need to override the run() method and call that method using the thread.start() somewhere (in my case in my GUI thread).
I was wondering, however, is it required to call the .wait() method anywhere and also am I supposed to call the .quit() once the thread finishes, or is this done automatically?
I am using PySide.
Thanks

Both answers depend on what your code is doing and what you expect from the thread.
If your logic which uses the thread needs to wait synchronously for the moment QThread finishes, then yes, you need to call wait(). However such requirement is a sign of sloppy threading model, except very specific situations like application startup and shutdown. Usage of QThread::wait() suggests creeping sequential operation, which means that you are effectively not using threads concurrently.
quit() exits QThread-internal event loop, which is not mandatory to use. A long-running thread (as opposed to one-task worker) must have an event loop of some sort - this is a generic statement, not specific to QThread. You either do it yourself (in form of some while(keepRunning) { } cycle) or use Qt-provided event loop, which you fire off by calling exec() in your run() method. The former implementation is finishable by you, because you did provide the keepRunning condition. The Qt-provided implementation is hidden from you and here goes the quit() call - which internally does nothing more than setting some sort of similar flag inside Qt.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Why run_in_executor placed in BaseEventLoop? - python

Related

Why is it that only asynchronous functions can yield in asynchronous code?

From running thread call coroutine on other thread

asyncio - how many coroutines?

twisted: processing incoming events in synchronous code

Do I need to manually call .quit() method when QThread() stops running (Python)?

Categories

Resources