asyncio - how many coroutines? - python

I have been struggling for a few days now with a Python application where I am expected to look for a file or files in a folder, iterate through each file and each record in it, and create objects to be persisted to a JanusGraph database. The particular OGM that I am using requires that transactions with the database are done asynchronously, using asyncio. I have read a lot of blogs and posts about asyncio, and I think I understand the concept of async, await, tasks, etc... In my application I have defined several functions that handle different parts of the processing:
Retrieves the list of all files available
Select one file for processing
Iterates through the selected file and reads a line/record for processing
Receives the record, parses it, and calls several other functions that are responsible for creating the Model objects before they are persisted to the database. For instance, I have different functions that create: User, Session, Browser, DeviceUsed, Server, etc...
I understand (and I may be wrong) that the big advantage of using asyncio is for situations where the call to a function will block, usually for I/O, database transactions, network latency, etc...
So my question is whether I need to convert all my functions into coroutines and schedule them to run through the event loop, or just the ones that would block, like committing a transaction to the database. I tried this approach to begin with and had all sorts of problems.

So my question is whether I need to convert all my functions into coroutines and schedule them to run through the event loop, or just the ones that would block,
You might need to convert most of them, but the conversion should be largely mechanical, boiling down to changing def to async def, and adding await when calling other coroutines.
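For illustration, a minimal sketch of that conversion (the record-handling names here are hypothetical, not from the question):

import asyncio

def parse_record(line):
    # Pure computation: no I/O, so it can stay an ordinary function.
    return line.strip().split(",")

async def save_to_db(fields):
    # Stand-in for the OGM's commit; a real driver call would go here.
    await asyncio.sleep(0)

async def process_record(line):
    fields = parse_record(line)  # plain call, no await needed
    await save_to_db(fields)     # add await when calling another coroutine

asyncio.run(process_record("a,b,c"))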
Obviously, you cannot avoid converting the ones that actually block, either by switching to the appropriate asyncio API or by using loop.run_in_executor() for those that don't have one. (DNS resolution used to be an outstanding example of the latter.)
But then you also need to convert their callers, because calling a coroutine from a blocking function is not useful unless the function implements event-loop-like functionality. On the other hand, when a coroutine is called from another coroutine, everything works because suspends are automatically propagated to the top of the chain. Once the whole call chain consists of coroutines, the top-level ones are fed to the event loop using loop.create_task() or loop.run_until_complete().
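Continuing the sketch above: once process_record is a coroutine, its caller becomes one too, and only the very top of the chain touches the event loop (the file names are hypothetical):

import asyncio

async def process_file(path):
    # Suspensions inside process_record propagate up through here
    # automatically; this function needs no special handling.
    with open(path) as f:
        for line in f:
            await process_record(line)

async def main(paths):
    # Top of the chain: gather schedules each process_file as a task.
    await asyncio.gather(*(process_file(p) for p in paths))

loop = asyncio.get_event_loop()
loop.run_until_complete(main(["file1.csv", "file2.csv"]))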
Of course, convenience functions that neither block nor call blocking functions can safely remain non-async, and are invoked by either sync or async code without any difference.
The above applies to asyncio, which implements stackless coroutines. A different approach is used by greenlet, whose tasks encapsulate the call stack, which allows them to be switched at arbitrary places in code that uses normal function calls. Greenlets are a bit more heavyweight and less portable than coroutines, though, so I'd first try converting to asyncio.

Related

python async/await callback prevalence

I feel like there is a gap in my understanding regarding async/await functionality in Python. From my understanding, once a task is created via asyncio.create_task() it is automatically scheduled, and then a later await call will actually block other code execution until the task has completed. So if you create two tasks and then await them sequentially, the second task could finish first, but the first task must be completed before code execution can continue. However, code in between the task creation and the await call will obviously proceed immediately (unlike the sync case), which is what I think is the benefit of async/await (please correct me if I am wrong). Are there also other benefits?
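As a quick check of that understanding, a small sketch (sleep standing in for real work): the code after create_task runs first, and the shorter task finishes while the longer one is still being awaited:

import asyncio

async def job(name, delay):
    await asyncio.sleep(delay)
    print(name, "finished")

async def main():
    t1 = asyncio.create_task(job("slow", 2))   # scheduled immediately
    t2 = asyncio.create_task(job("fast", 1))   # also scheduled immediately
    print("runs before either task completes")
    await t1   # "fast finished" prints while we wait here
    await t2   # t2 is already done; this returns immediately

asyncio.run(main())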
Alternatively, one can send off multiple tasks and then use as_completed or gather to handle them as they finish (perhaps out of order).
This flow makes sense in some data aggregating workflow, if you want to send off like 1000 requests simultaneously and want to aggregate or operate on the results sequentially. This all makes sense if you know how many requests you will have beforehand, and what exactly the calls will look like, since you are essentially creating async tasks en masse. But what if you want to do async tasks frequently, but not all at once, like sending quotes to a trade matching engine?
In many situations, you likely want to fire a request, continue doing some other work, maybe fire more requests, and then handle the responses upon receiving them (in order of receipt, since speed is critical), in a callback-mechanism fashion. I don't see much recommendation about add_done_callback, which leads me to believe that while I could probably achieve what I am looking for, there must be a gap in my understanding, since it's not recommended much. What alternatives in the asyncio sphere exist to achieve what I'm talking about? Can an asyncio.Queue be used to achieve this? I'm just confused, because nearly every tutorial on the internet involves sending 1000 HTTP requests at once and handling them using gather or as_completed, but I feel that is such a synthetic and non-real-world workflow.
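For what it's worth, a rough sketch of the Queue-based shape the question alludes to, with producers fired at arbitrary times and a single consumer handling responses in order of receipt (all names and the sleep-based "request" are hypothetical):

import asyncio, random

async def fire_request(queue, i):
    await asyncio.sleep(random.random())  # stand-in for a network round trip
    await queue.put((i, "response"))      # delivered in order of completion

async def handle_responses(queue, expected):
    for _ in range(expected):
        i, payload = await queue.get()    # wakes as each response lands
        print("handling", i, payload)

async def main():
    queue = asyncio.Queue()
    tasks = [asyncio.create_task(fire_request(queue, i)) for i in range(3)]
    # More requests can be fired later from anywhere; the consumer
    # does not need to know the full set in advance.
    await handle_responses(queue, expected=3)

asyncio.run(main())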

Is asyncio.run_in_executor multithreading?

The event loop is meant to be thread-specific, since asyncio is about cooperative multitasking using a single thread. So I don't understand how asyncio.run_in_executor works together with ThreadPoolExecutor?
I would like to know the purpose of the function
The loop.run_in_executor awaitable has two main use cases:
Perform an I/O operation that cannot be managed through the file descriptor interface of the selector loop (i.e. using the loop.add_reader/remove_reader methods). This happens occasionally; see how the code for loop.getaddrinfo uses loop.run_in_executor under the hood, for instance.
Perform a heavy CPU operation that would block the event loop context switching mechanism for too long. There are plenty of legitimate use cases for that, imagine running some data processing task in the context of an asyncio application for instance.
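A minimal sketch of the second use case (the squaring function is just a stand-in for real CPU-bound work):

import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n):
    # CPU-bound work that would otherwise stall the event loop.
    return sum(i * i for i in range(n))

async def main():
    loop = asyncio.get_running_loop()
    # Passing None here would use the default ThreadPoolExecutor;
    # a process pool sidesteps the GIL for CPU-bound work like this.
    result = await loop.run_in_executor(ProcessPoolExecutor(), crunch, 10_000_000)
    print(result)

if __name__ == "__main__":
    asyncio.run(main())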

Why is it that only asynchronous functions can yield in asynchronous code?

In the article "I'm not feeling the async pressure" Armin Ronacher makes the following observation:
In threaded code any function can yield. In async code only async functions can. This means for instance that the writer.write method cannot block.
This observation is made with reference to the following code sample:
from asyncio import start_server, run

async def on_client_connected(reader, writer):
    while True:
        data = await reader.readline()
        if not data:
            break
        writer.write(data)

async def server():
    srv = await start_server(on_client_connected, '127.0.0.1', 8888)
    async with srv:
        await srv.serve_forever()

run(server())
I do not understand this comment. Specifically:
How come synchronous functions cannot yield when inside of asynchronous functions?
What does yield have to do with blocking execution? Why is it that a function that cannot yield, cannot block?
Going line-by-line:
In threaded code any function can yield.
Programs running on a machine are organized in terms of processes. Each process may have one or more threads. Threads, like processes, are scheduled by (and interruptible by) the operating system. The word "yield" in this context means "letting other code run". When work is split between multiple threads, functions "yield" easily: the operating system suspends the code running in one thread, runs some code in a different thread, suspends that, comes back, and works some more on the first thread, and so on. By switching between threads in this way, concurrency is achieved.
In this execution model, whether the code being suspended is synchronous or asynchronous does not matter. The code within the thread is being run line-by-line, so the fundamental assumption of a synchronous function---that no changes occurred in between running one line of code and the next---is not violated.
In async code only async functions can.
"Async code" in this context means a single-threaded application that does the same work as the multi-threaded application, except that it achieves concurrency by using asynchronous functions within a thread, instead of splitting the work between different threads. In this execution model, your interpreter, not the operating system, is responsible for switching between functions as needed to achieve concurrency.
In this execution model, it is unsafe for work to be suspended in the middle of a synchronous function that's located inside of an asynchronous function. Doing so would mean running some other code in the middle of running your synchronous function, breaking the "line-by-line" assumption made by the synchronous function.
As a result, the interpreter will only suspend the execution of an asynchronous function in between synchronous sub-functions, never within one. This is what is meant by the statement that synchronous functions in async code cannot yield: once a synchronous function starts running, it must complete.
This means for instance that the writer.write method cannot block.
The writer.write method is synchronous, and hence, when run in an async program, uninterruptible. If this method were to block, it would block not just the asynchronous function it is running inside of, but the entire program. That would be bad. writer.write avoids blocking the program by writing to a write buffer instead and returning immediately.
Strictly speaking, writer.write can block, it's just inadvisable to do so.
If you need to block inside of an async function, the proper way to do so is to await another async function. This is what e.g. await writer.drain() does. This will block asynchronously: while this specific function remains blocked, it will correctly yield to other functions that can run.
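Applied to the echo server above, the flow-control-aware version of the loop would look like this:

async def on_client_connected(reader, writer):
    while True:
        data = await reader.readline()
        if not data:
            break
        writer.write(data)    # synchronous: only appends to the write buffer
        await writer.drain()  # suspension point: pauses if the buffer is full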
“Yield” here refers to cooperative multitasking (albeit within a process rather than among them). In the context of the async/await style of Python programming, asynchronous functions are defined in terms of Python’s pre-existing generator support: if a function blocks (typically for I/O), all its callers that are performing awaits suspend (with an invisible yield/yield from that is indeed of the generator variety). The actual call for any generator is to its next method; that function actually returns.
Every caller, up to some sort of driver that most programmers never write, must participate for this approach to work: any function that did not suspend would suddenly have the responsibility of the driver of deciding what to do next while waiting on the function it called to complete. This “infectious” aspect of asynchronicity has been called a “color”; it can be problematic, as for example when people forget to await a coroutine call that looks correct because it looks like any other call. (The async/await syntax exists to minimize the disruption of the program’s structure from the concurrency by implicitly converting functions into state machines, but this ambiguity remains.) It can also be a good thing: an asynchronous function can be interrupted exactly when it awaits, so it’s straightforward to reason about the consistency of data structures.
A synchronous function therefore cannot yield simply as a matter of definition. The import of the restriction is rather that a function called with a normal (synchronous) call cannot yield: its caller is not prepared to handle such an interaction. (What will happen if it does anyway is of course the same “forgotten await”.) This also affects refactoring: a function cannot be changed to be asynchronous without changing all its clients (and making them asynchronous as well if they are not already). (This is similar to how all I/O works in Haskell, since it affects the type of any function that performs any.)
Note that yield is allowed in its role as a normal generator used with an ordinary for even in an asynchronous function, but that’s just the general fact that the caller must expect the same protocol as the callee: if an enhanced generator (an “old-style” coroutine) is used with for, it just gets None from every (yield), and if an async function is used with for, it produces awaitables that probably break when they are sent None.
The distinction with threading, or with so-called stackful coroutines or fibers, is that no special resumption support is needed from the caller because the actual function call simply doesn’t return until the thread/fiber is resumed. (In the thread case, the kernel also chooses when to resume it.) In that sense, these approaches are easier to use, but with fibers the ability to “sneak” a pause into any function is partially compromised by the need to specify arguments to that function to tell it about the userspace scheduler with which to register itself (unless you’re willing to use global variables for that…). Threads, on the other hand, have even higher overhead than fibers, which matters when great numbers of them are running.

Concurrently searching a graph in Python 3

I'd like to create a small p2p application that concurrently processes incoming data from other known / trusted nodes (it mostly stores it in an SQLite database). In order to recognize these nodes, upon connecting, each node introduces itself and my application then needs to check whether it knows this node directly or maybe indirectly through another node. Hence, I need to do a graph search which obviously needs processing time and which I'd like to outsource to a separate process (or even multiple worker processes? See my 2nd question below). Also, in some cases it is necessary to adjust the graph, add new edges or vertices.
Let's say I have 4 worker processes accepting and handling incoming connections via asynchronous I/O. What's the best way for them to access (read / modify) the graph? A single queue obviously doesn't do the trick for read access because I need to pass the search results back somehow.
Hence, one way to do it would be another queue which would be filled by the graph searching process and which I could add to the event loop. The event loop could then pass the results to a handler. However, this event/callback-based approach would make it necessary to also always pass the corresponding sockets to the callbacks and thus to the Queue – which is nasty because sockets are not picklable. (Let alone the fact that callbacks lead to spaghetti code.)
Another idea that's just crossed my mind might be to create a pipe to the graph process for each incoming connection and then, on the graph's side, do asynchronous I/O as well. However, in order to avoid callbacks, if I understand correctly, I would need an async I/O library making use of yield from (i.e. tulip / PEP 3156). Are there other options?
Regarding async I/O on the graph's side: This is certainly the best way to handle many incoming requests at once but doing graph lookups is a CPU intensive task, thus could profit from using multiple worker threads or processes. The problem is: Multiple threads allow shared data but Python's GIL somewhat negates the performance benefit. Multiple processes on the other hand don't have this problem but how can I share and synchronize data between them? (For me it seems quite impossible to split up a graph.) Is there any way to solve this problem in a nice way? Also, does it make sense in terms of performance to mix asynchronous I/O with multithreading / multiprocessing?
Answering your last question: it does! But, IMHO, the question is: does it make sense to mix Events and Threads? You can check this article about hybrid concurrency models: http://bibliotecadigital.sbc.org.br/download.php?paper=3027
My tip: Start with just one process and an event loop, like in the tulip model. I'll try to explain how you can use tulip to have Events + async I/O (and threads or other processes) without callbacks at all.
You could have something like accept = yield from check_incoming(), where check_incoming is a tulip coroutine; inside this function you could use loop.run_in_executor() to run your graph search in a thread/process pool (I'll explain more about this later). run_in_executor() returns a Future, on which you can yield from tasks.wait([future_returned_by_run_in_executor], loop=self). The next step would be result = future_returned_by_run_in_executor.result(), and finally return True or False.
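Roughly, in modern spelling (await where this answer writes yield from; the graph and node names are hypothetical):

import asyncio
from concurrent.futures import ProcessPoolExecutor

graph = {"alice", "bob"}  # hypothetical stand-in for the real graph

def graph_search(node_id):
    # Must be self-contained and picklable to run in a process pool.
    return node_id in graph

async def check_incoming(pool, node_id):
    loop = asyncio.get_running_loop()
    # run_in_executor returns a future; awaiting it suspends this
    # coroutine until the pool delivers the result.
    return await loop.run_in_executor(pool, graph_search, node_id)

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:
        print(asyncio.run(check_incoming(pool, "alice")))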
The process pool requires that only picklable objects can be executed and returned. This requirement is not a problem, but it implies that the graph operation must be self-contained in a function and must obtain the graph instance somehow. The thread pool has the GIL problem, since you mentioned CPU-bound tasks, which can lead to 'acquiring-GIL conflicts', though this was improved in the new Python 3.x GIL. Both solutions have limitations.
So, instead of a pool, you can have another single process with its own event loop, just to manage all the graph work, and connect both processes with a unix domain socket, for instance.
This second process, just like the first one, must also accept incoming connections (but now they are from a known source) and can use a thread pool just like I said earlier, but it won't "conflict" with the first event loop process (the one that handles external clients), only with the second event loop. Threads sharing the same graph instance require some locking/unlocking.
Hope it helped!

Twisted: Making code non-blocking

I'm a bit puzzled about how to write asynchronous code in Python/Twisted. Suppose (for argument's sake) I am exposing a function to the world that will take a number and return True/False if it is prime/non-prime, so it looks vaguely like this:
def IsPrime(numberin):
    for n in range(2, numberin):
        if numberin % n == 0:
            return False
    return True
(just to illustrate).
Now let's say there is a webserver which needs to call IsPrime based on a submitted value. This will take a long time for a large numberin.
If in the meantime another user asks for the primality of a small number, is there a way to run the two function calls asynchronously using the reactor/deferreds architecture so that the result of the short calc gets returned before the result of the long calc?
I understand how to do this if the IsPrime functionality came from some other webserver to which my webserver would do a deferred getPage, but what if it's just a local function?
i.e., can Twisted somehow time-share between the two calls to IsPrime, or would that require an explicit invocation of a new thread?
Or, would the IsPrime loop need to be chunked into a series of smaller loops so that control can be passed back to the reactor rapidly?
Or something else?
I think your current understanding is basically correct. Twisted is just a Python library and the Python code you write to use it executes normally as you would expect Python code to: if you have only a single thread (and a single process), then only one thing happens at a time. Almost no APIs provided by Twisted create new threads or processes, so in the normal course of things your code runs sequentially; IsPrime cannot execute a second time until after it has finished executing the first time.
Still considering just a single thread (and a single process), all of the "concurrency" or "parallelism" of Twisted comes from the fact that instead of doing blocking network I/O (and certain other blocking operations), Twisted provides tools for performing the operation in a non-blocking way. This lets your program continue on to perform other work when it might otherwise have been stuck doing nothing waiting for a blocking I/O operation (such as reading from or writing to a socket) to complete.
It is possible to make things "asynchronous" by splitting them into small chunks and letting event handlers run in between these chunks. This is sometimes a useful approach, if the transformation doesn't make the code too much more difficult to understand and maintain. Twisted provides a helper for scheduling these chunks of work, cooperate. It is beneficial to use this helper since it can make scheduling decisions based on all of the different sources of work and ensure that there is time left over to service event sources without significant additional latency (in other words, the more jobs you add to it, the less time each job will get, so that the reactor can keep doing its job).
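A rough sketch of what chunking the question's IsPrime with cooperate might look like (the chunk size is arbitrary):

from twisted.internet import defer, task

def isPrimeCooperatively(numberin, chunk=1000):
    # Returns a Deferred that eventually fires with the primality result.
    d = defer.Deferred()
    def work():
        for n in range(2, numberin):
            if numberin % n == 0:
                d.callback(False)
                return
            if n % chunk == 0:
                yield None  # hand control back to the reactor
        d.callback(True)
    # cooperate steps through the generator, interleaving the chunks
    # with other event sources.
    task.cooperate(work())
    return d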
Twisted does also provide several APIs for dealing with threads and processes. These can be useful if it is not obvious how to break a job into chunks. You can use deferToThread to run a (thread-safe!) function in a thread pool. Conveniently, this API returns a Deferred which will eventually fire with the return value of the function (or with a Failure if the function raises an exception). These Deferreds look like any other, and as far as the code using them is concerned, it could just as well come back from a call like getPage - a function that uses no extra threads, just non-blocking I/O and event handlers.
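For example, using the question's IsPrime unchanged (a minimal sketch; a real server would not stop the reactor after one result):

from twisted.internet import reactor
from twisted.internet.threads import deferToThread

def report(result):
    print("prime?", result)

d = deferToThread(IsPrime, 1000003)   # runs in the reactor's thread pool
d.addCallback(report)
d.addBoth(lambda _: reactor.stop())   # shut down once the result arrives
reactor.run()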
Since Python isn't ideally suited for running multiple CPU-bound threads in a single process, Twisted also provides a non-blocking API for launching and communicating with child processes. You can offload calculations to such processes to take advantage of additional CPUs or cores without worrying about the GIL slowing you down, something that neither the chunking strategy nor the threading approach offers. The lowest level API for dealing with such processes is reactor.spawnProcess. There is also Ampoule, a package which will manage a process pool for you and provides an analog to deferToThread for processes, deferToAMPProcess.
