So currently I have 4 API requests that are called one after another using a third-party lib, which is synchronous. What I want is to run them in parallel, so that my total time to call all 4 APIs is reduced.
I am using FastAPI as the micro framework.
utilities.py
async def get_api_1_data():
    data = some_third_party_lib()
    return data

async def get_api_2_data():
    data = some_third_party_lib()
    return data

async def get_api_3_data():
    data = some_third_party_lib()
    return data

async def get_api_4_data():
    data = some_third_party_lib()
    return data
My main.py looks something like this:
import asyncio
from fastapi import FastAPI
from utilities import get_api_1_data, get_api_2_data, get_api_3_data, get_api_4_data

app = FastAPI()

@app.get("/")
async def fetch_new_exposure_api_data(node: str):
    functions_to_run = [get_api_1_data(), get_api_2_data(), get_api_3_data(), get_api_4_data()]
    r1, r2, r3, r4 = await asyncio.gather(*functions_to_run)
    return [r1, r2, r3, r4]
So the issue is I cannot put await in front of some_third_party_lib(), as it's not an async lib, it's a sync lib.
So is there any way I can convert them to async functionality to run them in parallel?
Unfortunately, you cannot make a synchronous function asynchronous without changing its implementation. If some_third_party_lib() is a synchronous function, you cannot make it run in parallel using the asyncio library alone.
One workaround is to run each call to some_third_party_lib() in a separate thread. You can use the concurrent.futures library to create and manage a pool of worker threads. Here's an example setup that I came up with:
import concurrent.futures
def get_api_1_data():
    return some_third_party_lib()

def get_api_2_data():
    return some_third_party_lib()

def get_api_3_data():
    return some_third_party_lib()

def get_api_4_data():
    return some_third_party_lib()

@app.get("/")
async def fetch_new_exposure_api_data(node: str):
    with concurrent.futures.ThreadPoolExecutor() as executor:
        futures = [executor.submit(fn) for fn in [get_api_1_data, get_api_2_data, get_api_3_data, get_api_4_data]]
        return [future.result() for future in concurrent.futures.as_completed(futures)]
This way, each call to some_third_party_lib() runs in a separate thread, and the calls run in parallel. Note that because the results are collected with as_completed, their order in the returned list may not match the order in which the functions were submitted; a variant that preserves order is sketched below.
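If you are on Python 3.9+, a minimal sketch of an alternative, reusing the plain sync wrappers above: asyncio.to_thread submits each call to the default thread pool without blocking the event loop, and asyncio.gather returns the results in submission order.

import asyncio

@app.get("/")
async def fetch_new_exposure_api_data(node: str):
    # Each sync wrapper runs in the default thread pool; the event loop stays free,
    # and gather preserves submission order in the returned list.
    return await asyncio.gather(
        asyncio.to_thread(get_api_1_data),
        asyncio.to_thread(get_api_2_data),
        asyncio.to_thread(get_api_3_data),
        asyncio.to_thread(get_api_4_data),
    )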
Related
There is a method in C#, Task.Run, that accepts a delegate as a parameter and returns a task that can be awaited.
Is there such a thing in Python asyncio?
I need to wrap a sync block of code into an async task.
There is something like this in Python; see the asyncio docs:
https://docs.python.org/3/library/asyncio.html
For example:
import asyncio

async def print_number():
    print(1)

async def main():
    task = asyncio.create_task(print_number())
    await task

asyncio.run(main())
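Note that asyncio.create_task only schedules a coroutine that already exists; it does not wrap synchronous code. For the Task.Run use case of wrapping a sync block, the closer analogue (on Python 3.9+) is asyncio.to_thread. A minimal sketch, with blocking_work as a hypothetical stand-in for the sync block:

import asyncio
import time

def blocking_work():
    time.sleep(1)  # hypothetical sync block, like a delegate passed to C#'s Task.Run
    return 42

async def main():
    # Runs the sync function in a worker thread and gives back an awaitable,
    # roughly analogous to awaiting Task.Run in C#.
    result = await asyncio.to_thread(blocking_work)
    print(result)

asyncio.run(main())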
I have an external library that uses the requests module to perform HTTP requests.
I need to use the library asynchronously without using many threads (that would be the last choice if nothing else works). And I can't change its source code either.
It would be easy to monkey-patch the library, since all the interaction with the requests module is done from a single function, but I don't know if I can monkey-patch a synchronous function with an asynchronous one (I mean the async keyword).
Roughly, the problem simplifies to the following code:
import asyncio
import aiohttp
import types
import requests

# Can't modify Library class.
class Library:
    def do(self):
        self._request('example.com')
        # Some other code here..

    def _request(self, url):
        return requests.get(url).text

# Monkey-patched to this method.
async def new_request(self, url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    library = Library()
    # Do monkey-patch.
    library._request = types.MethodType(new_request, library)
    # Call library asynchronously in hope that it will perform requests using aiohttp.
    asyncio.gather(
        library.do(),
        library.do(),
        library.do()
    )
    print('Done.')

asyncio.run(main())
But as expected, it doesn't work. I get TypeError: An asyncio.Future, a coroutine or an awaitable is required on the asyncio.gather call, and also RuntimeWarning: coroutine 'new_request' was never awaited on self._request('example.com').
So the question is: is it possible to make that code work without modifying the Library class' source code? Otherwise, what options do I have to make asynchronous requests using the library?
Is it possible to make that code work without modifying the Library class' source code? Otherwise, what options do I have to make asynchronous requests using the library?
Yes, it is possible, and you do not even need monkey-patching for that. You should use asyncio.to_thread to make the synchronous do method of Library an asynchronous function (coroutine). The main coroutine should look like this:
async def main():
    library = Library()
    await asyncio.gather(
        asyncio.to_thread(library.do),
        asyncio.to_thread(library.do),
        asyncio.to_thread(library.do)
    )
    print('Done.')
Here asyncio.to_thread wraps the library.do method and returns a coroutine object, avoiding the first error; but you also need await before asyncio.gather.
NOTE: If you are going to check my answer with the above example, please do not forget to set a valid URL instead of 'example.com'.
Edit
If you do not want to use threads at all, I would recommend an async wrapper like the to_async function below, replacing asyncio.to_thread with it. Note, though, that since func() is called directly on the event loop, the blocking calls will still run one after another; this wrapper only makes them acceptable to asyncio.gather.
async def to_async(func):
    return func()

async def main():
    library = Library()
    await asyncio.gather(
        to_async(library.do),
        to_async(library.do),
        to_async(library.do),
    )
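A minimal sketch that makes that limitation visible, with a hypothetical slow() standing in for the library's blocking request:

import asyncio
import time

async def to_async(func):
    return func()

def slow():
    time.sleep(1)  # stands in for one blocking request

async def main():
    start = time.perf_counter()
    await asyncio.gather(to_async(slow), to_async(slow), to_async(slow))
    # Prints roughly 3.0s, not 1.0s: the three calls ran sequentially on the event loop.
    print(f"elapsed: {time.perf_counter() - start:.1f}s")

asyncio.run(main())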
I thought I'd post another answer, as it solves the problem in a different way.
Why not extend the default behavior of the Library class and override the do and _request methods?
class AsyncLibrary(Library):
    async def do(self):
        return await self._request('https://google.com/')

    async def _request(self, url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                return await response.text()

async def main():
    library = AsyncLibrary()
    await asyncio.gather(
        library.do(),
        library.do(),
        library.do(),
    )
There is no way to do this with requests without threads, but you can limit the number of threads active at any one time to address your "without using many threads" requirement.
import asyncio
import requests

# Can't modify Library class.
class Library:
    def do(self):
        self._request('http://example.com')

    def _request(self, url):
        return requests.get(url).text

async def as_thread(semaphore, func):
    async with semaphore:  # limit the number of threads active
        await asyncio.to_thread(func)

async def main():
    library = Library()
    semaphore = asyncio.Semaphore(2)  # limit to 2 for example
    tasks = [library.do] * 10  # pretend there are a lot of sites to read
    await asyncio.gather(
        *[as_thread(semaphore, x) for x in tasks]
    )
    print('Done.')

asyncio.run(main())
I want to run a simple background task in FastAPI that involves some computation before dumping it into the database. However, the computation blocks FastAPI from receiving any more requests.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
db = Database()

async def task(data):
    otherdata = await db.fetch("some sql")
    newdata = somelongcomputation(data, otherdata)  # this blocks other requests
    await db.execute("some sql", newdata)

@app.post("/profile")
async def profile(data: Data, background_tasks: BackgroundTasks):
    background_tasks.add_task(task, data)
    return {}
What is the best way to solve this issue?
Your task is defined as async, which means FastAPI (or rather Starlette) will run it in the asyncio event loop.
And because somelongcomputation is synchronous (i.e. not waiting on some IO, but doing computation), it will block the event loop for as long as it runs.
I see a few ways of solving this:
Use more workers (e.g. uvicorn main:app --workers 4). This will allow up to 4 somelongcomputation calls in parallel.
Rewrite your task to not be async (i.e. define it as def task(data): ... etc). Starlette will then run it in a separate thread.
Use fastapi.concurrency.run_in_threadpool, which will also run it in a separate thread. Like so:
from fastapi.concurrency import run_in_threadpool

async def task(data):
    otherdata = await db.fetch("some sql")
    newdata = await run_in_threadpool(lambda: somelongcomputation(data, otherdata))
    await db.execute("some sql", newdata)
Or use asyncio's run_in_executor directly (which run_in_threadpool uses under the hood):
import asyncio

async def task(data):
    otherdata = await db.fetch("some sql")
    loop = asyncio.get_running_loop()
    newdata = await loop.run_in_executor(None, lambda: somelongcomputation(data, otherdata))
    await db.execute("some sql", newdata)
You could even pass in a concurrent.futures.ProcessPoolExecutor as the first argument to run_in_executor to run it in a separate process (a sketch of this follows the list).
Spawn a separate thread / process yourself, e.g. using concurrent.futures.
Use something more heavy-handed like Celery (also mentioned in the fastapi docs here).
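For that process-pool variant, a minimal sketch, assuming somelongcomputation is a top-level (picklable) function, which process pools require:

import asyncio
import concurrent.futures

def somelongcomputation(data, otherdata):
    # Stand-in for the CPU-bound work; must be a top-level function so it can be pickled.
    return data + otherdata

async def task(data, otherdata):
    loop = asyncio.get_running_loop()
    # In a real app you would create the pool once and reuse it across tasks.
    with concurrent.futures.ProcessPoolExecutor() as pool:
        # Arguments are passed positionally because lambdas can't be pickled.
        return await loop.run_in_executor(pool, somelongcomputation, data, otherdata)

if __name__ == '__main__':
    print(asyncio.run(task(1, 2)))  # -> 3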
If your task is CPU bound, you could use multiprocessing; there is a way to do that with background tasks in FastAPI:
https://stackoverflow.com/a/63171013
Although you should consider using something like Celery if there are a lot of CPU-heavy tasks.
Read this issue.
Also in the example below, my_model.function_b could be any blocking function or process.
TL;DR
from starlette.concurrency import run_in_threadpool

@app.get("/long_answer")
async def long_answer():
    rst = await run_in_threadpool(my_model.function_b, arg_1, arg_2)
    return rst
This is an example of a background task in FastAPI:
from fastapi import FastAPI
import asyncio

app = FastAPI()
x = [1]  # a global variable x

@app.get("/")
def hello():
    return {"message": "hello", "x": x}

async def periodic():
    while True:
        # code to run periodically starts here
        x[0] += 1
        print(f"x is now {x}")
        # code to run periodically ends here
        # sleep for 3 seconds after running above code
        await asyncio.sleep(3)

@app.on_event("startup")
async def schedule_periodic():
    loop = asyncio.get_event_loop()
    loop.create_task(periodic())

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app)
I am using my Raspberry Pi and the pigpio and websockets libraries.
I want my program to run asynchronously (i.e. I will use async def main as the entry point).
The pigpio library expects a synchronous callback function to be called in response to events, which is fine, but from within that callback I want to call another, asynchronous function from the websockets library.
So it would look like:
def sync_cb():  # <- This can not be made async, therefore I can not use await
    [ws.send('test') for ws in connected_ws]  # <- This is async and has to be awaited
Currently I can get it to work with:
def sync_cb():
    asyncio.run(asyncio.wait([ws.send('test') for ws in connected_ws]))
but the docs say this use of asyncio.run is discouraged.
So my synchronous callback needs to call ws.send (also from a third-party library), which is async, from a function that is synchronous.
Another option that works is:
loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
loop.run_until_complete(asyncio.gather(*[ws.send(json.dumps(message)) for ws in connected_ws]))
But the three lines of creating and setting an event loop seem like a lot just to run a simple async function.
My questions are:
Is it possible to substitute an async function where a synchronous callback is required (i.e. is there a way to make cb async in this example)?
And what kind of overhead am I incurring by using asyncio.run and asyncio.wait just to call a simple async method (in the list comprehension)?
You could use the run_coroutine_threadsafe function, which returns a concurrent.futures.Future that can be waited on synchronously, to wrap a coroutine into a regular function and call it from synchronous code.
As I understand it, this approach is most appropriate when the sync code (of the third-party lib) is executed in a separate thread, but it can be adapted to single-threaded execution with some modifications.
An example to illustrate the approach:
import asyncio

def async_to_sync(loop, foo):
    def foo_(*args, **kwargs):
        return asyncio.run_coroutine_threadsafe(foo(*args, **kwargs), loop).result()
    return foo_

def sync_code(cb):
    for i in range(10):
        cb(i)

async def async_cb(a):
    print("async callback:", a)

async def main():
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(None, sync_code, async_to_sync(loop, async_cb))

asyncio.run(main())
Output:
async callback: 0
async callback: 1
async callback: 2
...
Is it possible to substitute an async function where a synchronous callback is required
It is possible. You can run an event loop in a separate thread and submit async code to it, but you have to keep the GIL in mind.
import asyncio
import threading

class Portal:
    def __init__(self, stop_event):
        self.loop = asyncio.get_event_loop()
        self.stop_event = stop_event

    async def _call(self, fn, args, kwargs):
        return await fn(*args, **kwargs)

    async def _stop(self):
        self.stop_event.set()

    def call(self, fn, *args, **kwargs):
        return asyncio.run_coroutine_threadsafe(self._call(fn, args, kwargs), self.loop)

    def stop(self):
        return self.call(self._stop)

def create_portal():
    portal = None

    async def wait_stop():
        nonlocal portal
        stop_event = asyncio.Event()
        portal = Portal(stop_event)
        running_event.set()
        await stop_event.wait()

    def run():
        asyncio.run(wait_stop())

    running_event = threading.Event()
    thread = threading.Thread(target=run)
    thread.start()
    running_event.wait()
    return portal
Usage example:
async def test(msg):
    await asyncio.sleep(0.5)
    print(msg)
    return "HELLO " + msg

# it'll run a new event loop in a separate thread
portal = create_portal()
# it'll call `test` in the separate thread and return a Future
print(portal.call(test, "WORLD").result())
portal.stop().result()
In your case:
def sync_cb():
    calls = [portal.call(ws.send, 'test') for ws in connected_ws]
    # if you want to get results from these calls:
    # [c.result() for c in calls]
And, what kind of overhead am I incurring by using asyncio.run and asyncio.wait just to call a simple async method
asyncio.run creates a new event loop and closes it afterwards. If the callback is not called often, that is most likely not a problem. But if you use asyncio.run in another callback too, the two callbacks won't be able to work concurrently.
I am running a program that makes three different requests to a REST API. The data, indicator, and request functions all fetch data from BitMEX's API using a wrapper I've made.
I have used asyncio to try to speed up the process, so that while I am waiting on a response from a previous request, it can begin to make another one.
However, my asynchronous version is not running any quicker for some reason. The code works, and as far as I know I have set everything up correctly. But there could be something wrong with how I am setting up the coroutines?
Here is the asynchronous version:
import time
import asyncio
from bordemwrapper import BitMEXData, BitMEXFunctions

'''
asynchronous I/O
'''

async def data():
    data = BitMEXData().get_ohlcv(symbol='XBTUSD', timeframe='1h',
                                  instances=25)
    await asyncio.sleep(0)
    return data

async def indicator():
    indicator = BitMEXData().get_indicator(symbol='XBTUSD',
        timeframe='1h', indicator='RSI', period=20, source='close',
        instances=25)
    await asyncio.sleep(0)
    return indicator

async def request():
    request = BitMEXFunctions().get_price()
    await asyncio.sleep(0)
    return request

async def chain():
    data_ = await data()
    indicator_ = await indicator()
    request_ = await request()
    return data_, indicator_, request_

async def main():
    await asyncio.gather(chain())

if __name__ == '__main__':
    start = time.perf_counter()
    asyncio.run(main())
    end = time.perf_counter()
    print('process finished in {} seconds'.format(end - start))
Unfortunately, asyncio isn't magic. Although you've put them in async functions, the BitMEXData().get_<foo> functions are not themselves async (i.e. you can't await them), and they therefore block while they run. The concurrency in asyncio can only occur while awaiting something.
You'll need a library that makes the actual HTTP requests asynchronously, like aiohttp. It sounds like you wrote bordemwrapper yourself, so you should rewrite the get_<foo> functions to use asynchronous HTTP requests. Feel free to submit a separate question if you need help with that.
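As a rough illustration only (a sketch; the real BitMEX endpoints and bordemwrapper internals aren't shown in the question, so the URL, parameters, and function names below are assumptions), an aiohttp-based fetch might look like this:

import asyncio
import aiohttp

# Hypothetical endpoint; the wrapper's real URLs and parameters are not shown in the question.
BASE_URL = 'https://www.bitmex.com/api/v1'

async def get_ohlcv(session, symbol, timeframe, instances):
    # One awaitable HTTP request; while it waits on the network, other requests can run.
    params = {'symbol': symbol, 'binSize': timeframe, 'count': instances}
    async with session.get(f'{BASE_URL}/trade/bucketed', params=params) as resp:
        return await resp.json()

async def main():
    async with aiohttp.ClientSession() as session:
        # The three fetches now overlap instead of running back to back.
        results = await asyncio.gather(
            get_ohlcv(session, 'XBTUSD', '1h', 25),
            get_ohlcv(session, 'XBTUSD', '5m', 25),
            get_ohlcv(session, 'XBTUSD', '1d', 25),
        )
        print(len(results))

asyncio.run(main())

Note also that even with async fetches, the chain() coroutine above awaits its three calls one after another; to overlap them you would gather them directly, as in this sketch.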