I'm trying to run a function on separate threads using asyncio and futures. I have a decorator that takes the long-running function and its arguments, runs it asynchronously, and returns its value. Unfortunately the work does not seem to run asynchronously.
def multiprocess(self, function, executor=None, *args, **kwargs):
    async def run_task(function, *args, **kwargs):
        @functools.wraps(function)
        async def wrap(*args, **kwargs):
            while True:
                execution_runner = executor or self._DEFAULT_POOL_
                executed_job = execution_runner.submit(function, *args, **kwargs)
                print(
                    f"Pending {function.__name__}:",
                    execution_runner._work_queue.qsize(),
                    "jobs",
                )
                print(
                    f"Threads: {function.__name__}:", len(execution_runner._threads)
                )
                future = await asyncio.wrap_future(executed_job)
                return future
        return wrap
    return asyncio.run(run_task(function, *args, **kwargs))
To call the decorator I have two functions _async_task and task_function. _async_task contains a loop that runs task_function for each document that needs to be processed.
@staticmethod
def _async_task(documents):
    processed_docs = asyncio.run(task_function(documents))
    return processed_docs
task_function processes each document in documents as below,
@multiprocess
async def task_function(documents):
    processed_documents = None
    try:
        for doc in documents:
            processed_documents = process_document(doc)
            print(processed_documents)
    except Exception as err:
        print(err)
    return processed_documents
The clue that this doesn't work asynchronously is that the diagnostics I have for the multithreading decorator print the following.
Pending summarise_news: 0 jobs
Threads: summarise_news: 2
Since there are no pending jobs and the entire process takes as long as the synchronous run, it's running synchronously.
I had some issues setting up your code, but I think I've come up with an answer.
First of all, as @dano mentioned in his comment, asyncio.run blocks until the coroutine passed to it has completed. Thus, you won't get any speedup from using this approach.
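To make that concrete, here is a minimal sketch (my own illustration, not the code above) of the pattern the original decorator reduces to: every call spins up its own event loop via asyncio.run and waits for its single job to finish, so the calls can never overlap.

import asyncio
import concurrent.futures
import time

pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def blocking_work(i):
    time.sleep(1)
    return i

def run_one(i):
    async def waiter():
        # wrap_future() ties the executor's future to the running loop
        return await asyncio.wrap_future(pool.submit(blocking_work, i))
    # asyncio.run() does not return until waiter() has finished, so each
    # call to run_one() blocks for the full second before the next begins.
    return asyncio.run(waiter())

start = time.time()
print([run_one(i) for i in range(3)], f"{time.time() - start:.1f}s")  # ~3.0s: fully sequential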
I used a slightly modified multiprocess decorator
def multiprocess(executor=None, *args, **kwargs):
    def run_task(function, *args, **kwargs):
        def wrap(*args, **kwargs):
            execution_runner = executor or DEFAULT_EXECUTOR
            executed_job = execution_runner.submit(function, *args, **kwargs)
            print(
                f"Pending {function.__name__}:",
                execution_runner._work_queue.qsize(),
                "jobs",
            )
            print(
                f"Threads: {function.__name__}:", len(execution_runner._threads)
            )
            future = asyncio.wrap_future(executed_job)
            return future
        return wrap
    return run_task
As you can see, there's no asyncio.run here, and both the decorator and the inner wrapper are synchronous, since asyncio.wrap_future does not need to be awaited.
The updated multiprocess decorator is now applied to the process_document function instead. The reason is that you won't get any benefit from parallelizing a function that calls blocking functions in sequence; you have to convert the blocking function itself to be runnable in an executor.
NOTE that this dummy process_document is exactly like I described - fully blocking and synchronous.
@multiprocess()
def process_document(doc):
    print(f"Processing doc: {doc}...")
    time.sleep(2)
    print(f"Doc {doc} done.")
Now, to the last point. We already made process_document kind of asynchronous by converting it to be runnable in an executor, BUT it still matters HOW exactly you invoke it.
Consider the following examples:
for doc in documents:
    result = await process_document(doc)

results = await asyncio.gather(*[process_document(doc) for doc in documents])
In the former, we await the futures one by one, waiting for each to finish before starting the next.
In the latter, all of them are submitted up front and run in parallel, so it really depends on how exactly you invoke them.
Here's the full code snippet I used:
import asyncio
import concurrent.futures
import time

DEFAULT_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def multiprocess(executor=None, *args, **kwargs):
    def run_task(function, *args, **kwargs):
        def wrap(*args, **kwargs):
            execution_runner = executor or DEFAULT_EXECUTOR
            executed_job = execution_runner.submit(function, *args, **kwargs)
            print(
                f"Pending {function.__name__}:",
                execution_runner._work_queue.qsize(),
                "jobs",
            )
            print(
                f"Threads: {function.__name__}:", len(execution_runner._threads)
            )
            future = asyncio.wrap_future(executed_job)
            return future
        return wrap
    return run_task


@multiprocess()
def process_document(doc):
    print(f"Processing doc: {doc}...")
    time.sleep(2)
    print(f"Doc {doc} done.")


async def task_function_sequential(documents):
    start = time.time()
    for doc in documents:
        await process_document(doc)
    end = time.time()
    print(f"task_function_sequential took: {end-start}s")


async def task_function_parallel(documents):
    start = time.time()
    jobs = [process_document(doc) for doc in documents]
    await asyncio.gather(*jobs)
    end = time.time()
    print(f"task_function_parallel took: {end-start}s")


async def main():
    documents = [i for i in range(5)]
    await task_function_sequential(documents)
    await task_function_parallel(documents)


asyncio.run(main())
Notice that the task_function_parallel example still takes around 4 seconds instead of 2, because the thread pool is limited to 4 workers and there are 5 jobs, so the last job has to wait for a worker to become available.
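For what it's worth, sizing the pool to the number of jobs (a hypothetical tweak, not part of the code above) should bring the parallel run down to roughly 2 seconds:

# Hypothetical tweak: one worker per job, so no job has to queue.
DEFAULT_EXECUTOR = concurrent.futures.ThreadPoolExecutor(max_workers=5)
# task_function_parallel should now finish in ~2s for 5 two-second jobs.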
I have some HTML pages that I am trying to extract the text from using asynchronous web requests through aiohttp and asyncio; after extracting the text I save the files locally. I am using BeautifulSoup (under extract_text()) to process the text from the response and extract the relevant text within the HTML page (excluding the code, etc.), but I am facing an issue where my synchronous version of the script is faster than my asynchronous + multiprocessing version.
As I understand it, using the BeautifulSoup function causes the main event loop to block within parse(), so based on these two StackOverflow questions [0, 1], I figured the best thing to do was to run extract_text() in its own process (as it's a CPU-bound task), which should prevent the event loop from blocking.
This results in the script taking 1.5x longer than the synchronous version (with no multiprocessing).
To confirm that this was not an issue with my implementation of the asynchronous code, I removed the use of extract_text() and instead saved the raw text from the response object. Doing this resulted in my asynchronous code being much faster, showing that the issue is purely from extract_text() being run in a separate process.
Am I missing some important detail here?
import asyncio
from asyncio import Semaphore
import json
import logging
from pathlib import Path
from typing import List, Optional

import aiofiles
from aiohttp import ClientSession
import aiohttp
from bs4 import BeautifulSoup
import concurrent.futures
import functools


def extract_text(raw_text: str) -> str:
    return " ".join(BeautifulSoup(raw_text, "html.parser").stripped_strings)


async def fetch_text(
    url: str,
    session: ClientSession,
    semaphore: Semaphore,
    **kwargs: dict,
) -> str:
    async with semaphore:
        response = await session.request(method="GET", url=url, **kwargs)
        response.raise_for_status()
        logging.info("Got response [%s] for URL: %s", response.status, url)
        text = await response.text(encoding="utf-8")
        return text


async def parse(
    url: str,
    session: ClientSession,
    semaphore: Semaphore,
    **kwargs,
) -> Optional[str]:
    try:
        text = await fetch_text(
            url=url,
            session=session,
            semaphore=semaphore,
            **kwargs,
        )
    except (
        aiohttp.ClientError,
        aiohttp.http_exceptions.HttpProcessingError,
    ) as e:
        logging.error(
            "aiohttp exception for %s [%s]: %s",
            url,
            getattr(e, "status", None),
            getattr(e, "message", None),
        )
    except Exception as e:
        logging.exception(
            "Non-aiohttp exception occurred: %s",
            getattr(e, "__dict__", None),
        )
    else:
        loop = asyncio.get_running_loop()
        with concurrent.futures.ProcessPoolExecutor() as pool:
            extract_text_ = functools.partial(extract_text, text)
            text = await loop.run_in_executor(pool, extract_text_)
            logging.info("Found text for %s", url)
            return text


async def process_file(
    url: dict,
    session: ClientSession,
    semaphore: Semaphore,
    **kwargs: dict,
) -> None:
    category = url.get("category")
    link = url.get("link")
    if category and link:
        text = await parse(
            url=f"{URL}/{link}",
            session=session,
            semaphore=semaphore,
            **kwargs,
        )
        if text:
            save_path = await get_save_path(
                link=link,
                category=category,
            )
            await write_file(html_text=text, path=save_path)
        else:
            logging.warning("Text for %s not found, skipping it...", link)


async def process_files(
    html_files: List[dict],
    semaphore: Semaphore,
) -> None:
    async with ClientSession() as session:
        tasks = [
            process_file(
                url=file,
                session=session,
                semaphore=semaphore,
            )
            for file in html_files
        ]
        await asyncio.gather(*tasks)


async def write_file(
    html_text: str,
    path: Path,
) -> None:
    # Write to file using aiofiles
    ...


async def get_save_path(link: str, category: str) -> Path:
    # return path to save
    ...


async def main_async(
    num_files: Optional[int],
    semaphore_count: int,
) -> None:
    html_files = ...  # get all the files to process
    semaphore = Semaphore(semaphore_count)
    await process_files(
        html_files=html_files,
        semaphore=semaphore,
    )


if __name__ == "__main__":
    NUM_FILES = ...  # passed through CLI args
    SEMAPHORE_COUNT = ...  # passed through CLI args
    asyncio.run(
        main_async(
            num_files=NUM_FILES,
            semaphore_count=SEMAPHORE_COUNT,
        )
    )
SnakeViz charts across 1000 samples:
- Async version with extract_text and multiprocessing
- Async version without extract_text
- Sync version with extract_text (notice how the html_parser from BeautifulSoup takes up the majority of the time here)
- Sync version without extract_text
Here is roughly what your asynchronous program does:
- Launch num_files parse() tasks concurrently
- Each parse() task creates its own ProcessPoolExecutor and asynchronously awaits extract_text (which is executed in the freshly created process pool).
This is suboptimal for several reasons:
- It creates num_files process pools, which are expensive to create and take up memory
- Each pool is only used for one single operation, which is counterproductive: as many concurrent operations as possible should be submitted to a given pool
You are creating a new ProcessPoolExecutor each time the parse() function is called. You could try to instantiate it once (as a global, for instance, or passed through a function argument):
from concurrent.futures import ProcessPoolExecutor

async def parse(loop, executor, ...):
    ...
    text = await loop.run_in_executor(executor, extract_text, text)

# and then in `process_file` (or `process_files`):

async def process_file(...):
    ...
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as executor:
        ...
        await parse(loop, executor, ...)
I benchmarked the overhead of creating a ProcessPoolExecutor on my old MacBook Air 2015 and it shows that it is quite slow (almost 100 ms for pool creation, opening, submit and shutdown):
from time import perf_counter
from concurrent.futures import ProcessPoolExecutor


def do_nothing():
    # module-level no-op: lambdas cannot be pickled for a process pool
    pass


def main_1():
    """Pool created once"""
    reps = 100
    t1 = perf_counter()
    with ProcessPoolExecutor() as executor:
        for _ in range(reps):
            executor.submit(do_nothing)
    t2 = perf_counter()
    print(f"{(t2 - t1) / reps * 1_000} ms")  # 2 ms/it


def main_2():
    """Pool created at each iteration"""
    reps = 100
    t1 = perf_counter()
    for _ in range(reps):
        with ProcessPoolExecutor() as executor:
            executor.submit(do_nothing)
    t2 = perf_counter()
    print(f"{(t2 - t1) / reps * 1_000} ms")  # 100 ms/it


if __name__ == "__main__":
    main_1()
    main_2()
You may again hoist it up into the process_files function, which avoids recreating the pool for each file; a sketch of that follows below.
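A minimal sketch of that hoisting, assuming the parse/process_file signatures from the question are extended to accept the pool (the loop and pool parameters here are illustrative, not the asker's actual code):

import asyncio
from concurrent.futures import ProcessPoolExecutor
from aiohttp import ClientSession


async def process_files(html_files, semaphore):
    loop = asyncio.get_running_loop()
    # One pool for the whole run, shared by every process_file() task.
    with ProcessPoolExecutor() as pool:
        async with ClientSession() as session:
            tasks = [
                process_file(
                    url=file,
                    session=session,
                    semaphore=semaphore,
                    loop=loop,
                    pool=pool,  # passed down so parse() can reuse it
                )
                for file in html_files
            ]
            await asyncio.gather(*tasks)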
Also, try to inspect more closely your first SnakeViz chart in order to know what exactly in process.py:submit is taking that much time.
One last thing, be careful of the semantics of using a context manager on an executor:
from concurrent.futures import ProcessPoolExecutor

with ProcessPoolExecutor() as executor:
    for i in range(100):
        executor.submit(some_work, i)
Not only does this create an executor and submit work to it, it also waits for all the submitted work to finish before leaving the with block.
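If that implicit wait is not what you want, the equivalent without the context manager looks roughly like this (a sketch; some_work is the same stand-in as above):

from concurrent.futures import ProcessPoolExecutor

executor = ProcessPoolExecutor()
for i in range(100):
    executor.submit(some_work, i)
# This point is reached immediately; the pool keeps working in the background.
# Later, when you do want to wait for everything and release the workers:
executor.shutdown(wait=True)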
In the Go language, running another task in a non-blocking way is pretty simple.
import "net/http"
func Call(request http.Request) http.Response {
response := getResponse(request)
go doTask(request) // Non-blocking. No need to wait for result.
return response
}
func getResponse(request http.Request) http.Response {
// Do something synchronously here
return http.Response{}
}
func doTask(r http.Request) {
// Some task which takes time to finish
}
How can I achieve this in python?
I tried it like this:
import asyncio
import threading
from asyncio.events import AbstractEventLoop

loop: AbstractEventLoop


def initialize():
    global loop
    loop = asyncio.new_event_loop()
    thread = threading.Thread(target=run_event_loop)
    thread.start()


def run_event_loop():
    loop.run_forever()


def call(request):
    response = get_response(request)
    # This should not block
    asyncio.run_coroutine_threadsafe(do_task(request), loop)
    return response


def get_response(r):
    # Do something synchronously here
    return 42


async def do_task(r):
    # Some task which takes time to finish
    return
This works, but it is kind of cumbersome.
Besides, the Python code only makes use of one thread for tasks, while Go automatically spreads its tasks across multiple threads and cores (and so does Kotlin).
Is there a better way?
We can run non-blocking tasks by using the concurrent.futures module; the ThreadPoolExecutor object is used for this purpose. You could modify your code along the following lines (note that the executor is created once, outside call(), and is not used as a with block there, since exiting a with block waits for the submitted work to finish):
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)


def call(request):
    response = get_response(request)
    # Submit the background work; submit() returns immediately.
    executor.submit(do_task, request)
    return response


def get_response(r):
    # Do something synchronously here
    return 42


def do_task(r):
    # Some task which takes time to finish
    # (a plain sync function, so the worker thread can actually run it)
    return
response is returned right away, while do_task keeps running on a worker thread in the background.
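A quick way to convince yourself of the ordering (a small sketch, assuming the definitions above plus a deliberately slow do_task):

import time


def do_task(r):
    time.sleep(2)  # pretend this is slow work
    print("do_task done")


print(call("some request"))   # prints 42 right away
print("call() returned")      # printed ~2 seconds before "do_task done"
executor.shutdown(wait=True)  # optional: wait for background work at exit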
I have a tiny async event system like this:
from collections import defaultdict
from uuid import uuid4


class EventSystem:
    def __init__(self):
        self.handlers = defaultdict(dict)

    def register_handler(self, event, callback, register_id=None):
        register_id = register_id or uuid4()
        self.handlers[event][register_id] = callback
        return register_id

    def unregister_handler(self, event, register_id):
        del self.handlers[event][register_id]

    def clear_handlers(self, event):
        handler_register_ids = list(self.handlers[event].keys())
        for register_id in handler_register_ids:
            self.unregister_handler(event, register_id)

    async def fire_event(self, event, data):
        handlers = self.handlers[event]
        for register_id, callback in handlers.items():
            await callback(data)
        return len(handlers)
Which currently forces handlers to be async functions.
I cannot decide which is more Pythonic: enforcing this policy and providing an async2sync wrapper for sync functions:
async def async2sync(func, *args, **kwargs):
    return func(*args, **kwargs)
Or changing fire_event to check the handler's return type, using inspect.isawaitable:
async def fire_event(self, event, data):
    handlers = self.handlers[event]
    for register_id, callback in handlers.items():
        ret = callback(data)
        if inspect.isawaitable(ret):
            await ret
    return len(handlers)
I am not worried about long-running or blocking sync functions.
Since the wrapper in your first approach wraps sync functions into async, shouldn't it be called sync2async rather than async2sync?
If long-running or blocking sync functions are not a concern, both approaches are fine. Both have benefits and drawbacks. The first approach is a bit more minimalistic and easier to reason about. The second is a bit more clever (which can bite you when you least expect it), but also much more pleasant to use, because you can just write either kind of function as a handler and things will "just work". If the user of your API is someone other than yourself, they will probably appreciate it.
TL;DR Either is fine; I'd personally probably go with the second.
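For illustration, a small usage sketch of the second approach (assuming EventSystem uses the isawaitable version of fire_event shown above, with inspect imported):

import asyncio
import inspect  # needed by the isawaitable version of fire_event


def sync_handler(data):
    print("sync handler got", data)


async def async_handler(data):
    await asyncio.sleep(0)
    print("async handler got", data)


async def main():
    events = EventSystem()
    events.register_handler("ping", sync_handler)
    events.register_handler("ping", async_handler)
    # Both handlers fire; the sync one is just called, the async one awaited.
    count = await events.fire_event("ping", {"n": 1})
    print("handlers fired:", count)


asyncio.run(main())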
There is a tricky post handler; sometimes it can take a lot of time (depending on the input values), sometimes not.
What I want is to write something back whenever 1 second passes, choosing the response dynamically.
def post(self):
    def callback():
        self.write('too-late')
        self.finish()

    timeout_obj = IOLoop.current().add_timeout(
        dt.timedelta(seconds=1),
        callback,
    )
    # some asynchronous operations
    if not self.request.connection.stream.closed():
        self.write('here is your response')
        self.finish()
        IOLoop.current().remove_timeout(timeout_obj)
Turns out I can't do much from within callback.
Even raising an exception is suppressed by the inner context and won't be passed through the post method.
Any other ways to achieve the goal?
Thank you.
UPD 2020-05-15:
I found a similar question.
Thanks @ionut-ticus, using with_timeout() is much more convenient.
After some tries, I think I came really close to what I'm looking for:
def wait(fn):
    @gen.coroutine
    @wraps(fn)
    def wrap(*args):
        try:
            result = yield gen.with_timeout(
                dt.timedelta(seconds=20),
                IOLoop.current().run_in_executor(None, fn, *args),
            )
            raise gen.Return(result)
        except gen.TimeoutError:
            logging.error('### TOO LONG')
            raise gen.Return('Next time, bro')
    return wrap


@wait
def blocking_func(item):
    time.sleep(30)
    # this is not a Subprocess.
    # It is file IO and DB access
    return 'we are done here'
Still not sure: should the wait() decorator itself be wrapped in a coroutine?
Sometimes, in the chain of calls made by blocking_func(), there can be another ThreadPoolExecutor. My concern is: would this work without making "mine" global and passing it to Tornado's run_in_executor()?
Tornado: v5.1.1
An example of the usage of tornado.gen.with_timeout. Keep in mind that the task needs to be asynchronous, or else the IOLoop will be blocked and won't be able to process the timeout:
@gen.coroutine
def async_task():
    # some async code


@gen.coroutine
def get(self):
    delta = datetime.timedelta(seconds=1)
    try:
        task = self.async_task()
        result = yield gen.with_timeout(delta, task)
        self.write("success")
    except gen.TimeoutError:
        self.write("timeout")
I'd advise using https://github.com/aio-libs/async-timeout:
import asyncio
import async_timeout


async def post(self):
    try:
        async with async_timeout.timeout(1):
            # some asynchronous operations
            if not self.request.connection.stream.closed():
                self.write('here is your response')
                self.finish()
    except asyncio.TimeoutError:
        self.write('too-late')
        self.finish()
Background: I'm a very experienced Python programmer who is completely clueless about the new coroutines/async/await features. I can't write an async "hello world" to save my life.
My question is: I am given an arbitrary coroutine function f. I want to write a coroutine function g that will wrap f, i.e. I will give g to the user as if it was f, and the user will call it and be none the wiser, since g will be using f under the hood. Like when you decorate a normal Python function to add functionality.
The functionality that I want to add: Whenever the program flow goes into my coroutine, it acquires a context manager that I provide, and as soon as program flow goes out of the coroutine, it releases that context manager. Flow comes back in? Re-acquire the context manager. It goes back out? Re-release it. Until the coroutine is completely finished.
To demonstrate, here is the described functionality with plain generators:
def generator_wrapper(self, function, *args, **kwargs):
    gen = function(*args, **kwargs)
    method, incoming = gen.send, None
    while True:
        with self:
            outgoing = method(incoming)
        try:
            method, incoming = gen.send, (yield outgoing)
        except Exception as e:
            method, incoming = gen.throw, e
Is it possible to do it with coroutines?
Coroutines are built on iterators - the __await__ special method returns a regular iterator. This allows you to wrap the underlying iterator in yet another iterator. The trick is that you must unwrap the iterator of your target using its __await__, then re-wrap your own iterator using your own __await__.
The core functionality that works on instantiated coroutines looks like this:
class CoroWrapper:
    """Wrap ``target`` to have every send issued in a ``context``"""
    def __init__(self, target: 'Coroutine', context: 'ContextManager'):
        self.target = target
        self.context = context

    # wrap an iterator for use with 'await'
    def __await__(self):
        # unwrap the underlying iterator
        target_iter = self.target.__await__()
        # emulate 'yield from'
        iter_send, iter_throw = target_iter.send, target_iter.throw
        send, message = iter_send, None
        while True:
            # communicate with the target coroutine
            try:
                with self.context:
                    signal = send(message)
            except StopIteration as err:
                return err.value
            else:
                send = iter_send
            # communicate with the ambient event loop
            try:
                message = yield signal
            except BaseException as err:
                send, message = iter_throw, err
Note that this explicitly works on a Coroutine, not an Awaitable - Coroutine.__await__ implements the generator interface. In theory, an Awaitable does not necessarily provide __await__().send or __await__().throw.
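As a quick illustration of that distinction (my own check, not part of the original answer): a native coroutine's __await__() iterator does expose send and throw, which is exactly what CoroWrapper relies on.

import collections.abc


async def demo():
    return 1


coro = demo()
print(isinstance(coro, collections.abc.Coroutine))  # True
it = coro.__await__()
print(hasattr(it, "send"), hasattr(it, "throw"))    # True True
coro.close()  # suppress the "never awaited" warning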
This is enough to pass messages in and out:
import asyncio


class PrintContext:
    def __enter__(self):
        print('enter')

    def __exit__(self, exc_type, exc_val, exc_tb):
        print('exit via', exc_type)
        return False


async def main_coro():
    print(
        'wrapper returned',
        await CoroWrapper(test_coro(), PrintContext())
    )


async def test_coro(delay=0.5):
    await asyncio.sleep(delay)
    return 2


asyncio.run(main_coro())
# enter
# exit via None
# enter
# exit via <class 'StopIteration'>
# wrapper returned 2
You can delegate the wrapping part to a separate decorator. This also ensures that you have an actual coroutine, not a custom class - some async libraries require this.
from functools import wraps


def send_context(context: 'ContextManager'):
    """Wrap a coroutine to issue every send in a context"""
    def coro_wrapper(target: 'Callable[..., Coroutine]') -> 'Callable[..., Coroutine]':
        @wraps(target)
        async def context_coroutine(*args, **kwargs):
            return await CoroWrapper(target(*args, **kwargs), context)
        return context_coroutine
    return coro_wrapper
This allows you to directly decorate a coroutine function:
@send_context(PrintContext())
async def test_coro(delay=0.5):
    await asyncio.sleep(delay)
    return 2
print('async run returned:', asyncio.run(test_coro()))
# enter
# exit via None
# enter
# exit via <class 'StopIteration'>
# async run returned: 2