No speedup using asyncio despite awaiting API response - python

I am running a program that makes three different requests to a REST API. The data, indicator, and request functions all fetch data from BitMEX's API using a wrapper I've made.
I have used asyncio to try to speed up the process, so that while I am waiting on a response to a previous request, the program can begin to make another one.
However, my asynchronous version is not running any quicker for some reason. The code works, and as far as I know I have set everything up correctly. But could there be something wrong with how I am setting up the coroutines?
Here is the asynchronous version:
import time
import asyncio
from bordemwrapper import BitMEXData, BitMEXFunctions
'''
asynchronous I/O
'''

async def data():
    data = BitMEXData().get_ohlcv(symbol='XBTUSD', timeframe='1h',
                                  instances=25)
    await asyncio.sleep(0)
    return data

async def indicator():
    indicator = BitMEXData().get_indicator(symbol='XBTUSD',
        timeframe='1h', indicator='RSI', period=20, source='close',
        instances=25)
    await asyncio.sleep(0)
    return indicator

async def request():
    request = BitMEXFunctions().get_price()
    await asyncio.sleep(0)
    return request

async def chain():
    data_ = await data()
    indicator_ = await indicator()
    request_ = await request()
    return data_, indicator_, request_

async def main():
    await asyncio.gather(chain())

if __name__ == '__main__':
    start = time.perf_counter()
    asyncio.run(main())
    end = time.perf_counter()
    print('process finished in {} seconds'.format(end - start))

Unfortunately, asyncio isn't magic. Although you've put them in async functions, the BitMEXData().get_<foo> functions are not themselves async (i.e. you can't await them), and therefore block while they run. The concurrency in asyncio can only occur while awaiting something.
You'll need a library which makes the actual HTTP requests asynchronously, like aiohttp. It sounds like you wrote bordemwrapper yourself - you should rewrite the get_<foo> functions to use asynchronous HTTP requests. Feel free to submit a separate question if you need help with that.
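(Note too that chain() awaits the three calls one after the other, so even with truly async functions you would want to gather them side by side rather than chain them.) As a rough illustration of the aiohttp approach, here is a minimal sketch of what async versions of the wrapper calls might look like; the endpoint paths and parameter names are assumptions for illustration, not bordemwrapper's real API:

import asyncio
import aiohttp

# Hypothetical async rewrite; the endpoint paths and parameter names
# are assumptions for illustration, not bordemwrapper's real API.
BASE_URL = 'https://www.bitmex.com/api/v1'

async def get_ohlcv(session, symbol, timeframe, instances):
    # aiohttp yields to the event loop while waiting on the network,
    # so the other request can proceed in the meantime
    params = {'symbol': symbol, 'binSize': timeframe, 'count': str(instances)}
    async with session.get(f'{BASE_URL}/trade/bucketed', params=params) as resp:
        return await resp.json()

async def get_price(session, symbol):
    params = {'symbol': symbol, 'count': '1', 'reverse': 'true'}
    async with session.get(f'{BASE_URL}/trade', params=params) as resp:
        return await resp.json()

async def main():
    async with aiohttp.ClientSession() as session:
        # the two requests now genuinely overlap
        ohlcv, price = await asyncio.gather(
            get_ohlcv(session, 'XBTUSD', '1h', 25),
            get_price(session, 'XBTUSD'),
        )
        print(price)

asyncio.run(main())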

Related

How to integrate asynchronous python code into synchronous function?

I have an external library that uses the requests module to perform HTTP requests.
I need to use the library asynchronously without using many threads (it would be the last choice if nothing else works). And I can't change its source code either.
It would be easy to monkey-patch the library, since all the interaction with the requests module is done from a single function, but I don't know if I can monkey-patch a synchronous function with an asynchronous one (I mean the async keyword).
Roughly, the problem simplifies to the following code:
import asyncio
import aiohttp
import types
import requests

# Can't modify Library class.
class Library:
    def do(self):
        self._request('example.com')
        # Some other code here..

    def _request(self, url):
        return requests.get(url).text

# Monkey-patched to this method.
async def new_request(self, url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    library = Library()
    # Do monkey-patch.
    library._request = types.MethodType(new_request, library)
    # Call library asynchronously in hope that it will perform requests using aiohttp.
    asyncio.gather(
        library.do(),
        library.do(),
        library.do()
    )
    print('Done.')

asyncio.run(main())
But as expected, it doesn't work. I get TypeError: An asyncio.Future, a coroutine or an awaitable is required on the asyncio.gather call, and also RuntimeWarning: coroutine 'new_request' was never awaited on self._request('example.com').
So the question is: is it possible to make that code work without modifying the Library class' source code? Otherwise, what options do I have to make asynchronous requests using the library?
Yes, it is possible, and you do not even need monkey-patching to do it. You can use asyncio.to_thread to run the synchronous do method of Library in a separate thread, getting back a coroutine you can await. The main coroutine should look like this:
async def main():
    library = Library()
    await asyncio.gather(
        asyncio.to_thread(library.do),
        asyncio.to_thread(library.do),
        asyncio.to_thread(library.do)
    )
    print('Done.')
Here asyncio.to_thread wraps the library.do method and returns a coroutine object, which avoids the first error; note the added await before asyncio.gather.
NOTE: If you are going to check my answer with the above example, please do not forget to set a valid URL instead of 'example.com'.
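(asyncio.to_thread was only added in Python 3.9; on older versions, a rough equivalent, close to what to_thread does internally, is the sketch below.)

import asyncio
import functools

async def to_thread(func, *args, **kwargs):
    # run func in the default thread-pool executor so the
    # blocking call doesn't stall the event loop
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, functools.partial(func, *args, **kwargs))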
Edit
If you do not want to use threads at all, you could use an async wrapper like the to_async function below in place of asyncio.to_thread. Be aware, though, that since requests still blocks inside func(), the calls will run sequentially rather than concurrently.
async def to_async(func):
    # note: this calls func() synchronously inside the event loop,
    # so the blocking requests call still blocks and the three
    # library.do calls below run one after another, not concurrently
    return func()

async def main():
    library = Library()
    await asyncio.gather(
        to_async(library.do),
        to_async(library.do),
        to_async(library.do),
    )
I'm posting another answer, as it solves the problem in a different way.
Why not extend the default behavior of the Library class and override the do and _request methods with async versions?
class AsyncLibrary(Library):
    async def do(self):
        return await self._request('https://google.com/')

    async def _request(self, url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                return await response.text()

async def main():
    library = AsyncLibrary()
    await asyncio.gather(
        library.do(),
        library.do(),
        library.do(),
    )
There is no way to do this with requests without threads, but you can limit the number of threads active at any one time to address your "without using many threads" requirement.
import asyncio
import requests

# Can't modify Library class.
class Library:
    def do(self):
        self._request('http://example.com')

    def _request(self, url):
        return requests.get(url).text

async def as_thread(semaphore, func):
    async with semaphore:  # limit the number of threads active
        await asyncio.to_thread(func)

async def main():
    library = Library()
    semaphore = asyncio.Semaphore(2)  # limit to 2 for example
    tasks = [library.do] * 10  # pretend there are a lot of sites to read
    await asyncio.gather(
        *[as_thread(semaphore, x) for x in tasks]
    )
    print('Done.')

asyncio.run(main())

FastAPI asynchronous background tasks block other requests?

I want to run a simple background task in FastAPI, which involves some computation before dumping it into the database. However, the computation would block it from receiving any more requests.
from fastapi import BackgroundTasks, FastAPI

app = FastAPI()
db = Database()

async def task(data):
    otherdata = await db.fetch("some sql")
    newdata = somelongcomputation(data, otherdata)  # this blocks other requests
    await db.execute("some sql", newdata)

@app.post("/profile")
async def profile(data: Data, background_tasks: BackgroundTasks):
    background_tasks.add_task(task, data)
    return {}
What is the best way to solve this issue?
Your task is defined as async, which means fastapi (or rather starlette) will run it in the asyncio event loop.
And because somelongcomputation is synchronous (i.e. not waiting on some IO, but doing computation) it will block the event loop as long as it is running.
I see a few ways of solving this:
Use more workers (e.g. uvicorn main:app --workers 4). This will allow up to 4 somelongcomputation in parallel.
Rewrite your task to not be async (i.e. define it as def task(data): ... etc). Then starlette will run it in a separate thread.
Use fastapi.concurrency.run_in_threadpool, which will also run it in a separate thread. Like so:
from fastapi.concurrency import run_in_threadpool

async def task(data):
    otherdata = await db.fetch("some sql")
    newdata = await run_in_threadpool(lambda: somelongcomputation(data, otherdata))
    await db.execute("some sql", newdata)
Or use asyncio's run_in_executor directly (which run_in_threadpool uses under the hood):
import asyncio

async def task(data):
    otherdata = await db.fetch("some sql")
    loop = asyncio.get_running_loop()
    newdata = await loop.run_in_executor(None, lambda: somelongcomputation(data, otherdata))
    await db.execute("some sql", newdata)
You could even pass in a concurrent.futures.ProcessPoolExecutor as the first argument to run_in_executor to run it in a separate process (see the sketch after this list).
Spawn a separate thread / process yourself. E.g. using concurrent.futures.
Use something more heavy-handed like celery. (Also mentioned in the fastapi docs here).
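As a minimal sketch of the ProcessPoolExecutor variant mentioned above (the somelongcomputation body and the None arguments are stand-ins for illustration):

import asyncio
from concurrent.futures import ProcessPoolExecutor

# stand-in for the CPU-bound work from the question
def somelongcomputation(data, otherdata):
    return sum(i * i for i in range(10_000_000))

# create one pool up front and reuse it, rather than per request
process_pool = ProcessPoolExecutor()

async def task(data, otherdata):
    loop = asyncio.get_running_loop()
    # runs in a separate process, so neither the GIL nor the event
    # loop is blocked; note the callable and its arguments must be
    # picklable, which rules out the lambda used in the thread examples
    return await loop.run_in_executor(process_pool, somelongcomputation, data, otherdata)

if __name__ == "__main__":
    print(asyncio.run(task(None, None)))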
If your task is CPU-bound you could use multiprocessing; there is a way to do that with background tasks in FastAPI:
https://stackoverflow.com/a/63171013
Although you should consider using something like Celery if there are a lot of CPU-heavy tasks.
Read this issue.
Also in the example below, my_model.function_b could be any blocking function or process.
TL;DR
from starlette.concurrency import run_in_threadpool

@app.get("/long_answer")
async def long_answer():
    rst = await run_in_threadpool(my_model.function_b, arg_1, arg_2)
    return rst
This is an example of a background task in FastAPI:
from fastapi import FastAPI
import asyncio

app = FastAPI()
x = [1]  # a global variable x

@app.get("/")
def hello():
    return {"message": "hello", "x": x}

async def periodic():
    while True:
        # code to run periodically starts here
        x[0] += 1
        print(f"x is now {x}")
        # code to run periodically ends here
        # sleep for 3 seconds after running above code
        await asyncio.sleep(3)

@app.on_event("startup")
async def schedule_periodic():
    loop = asyncio.get_event_loop()
    loop.create_task(periodic())

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app)

Safely awaiting two event sources in asyncio/Quart

Quart is a Python web framework which re-implements the Flask API on top of the asyncio coroutine system of Python. In my particular case, I have a Quart websocket endpoint which is supposed to have not just one source of incoming events, but two possible sources of events which are supposed to continue the asynchronous loop.
An example with one event source:
from quart import Quart, websocket

app = Quart(__name__)

@app.websocket("/echo")
async def echo():
    while True:
        incoming_message = await websocket.receive()
        await websocket.send(incoming_message)
Taken from https://pgjones.gitlab.io/quart/
This example has one source: the incoming message stream. But what is the correct pattern if I had two possible sources, one being await websocket.receive() and another one being something along the lines of await system.get_next_external_notification()?
If either of them arrives, I'd like to send a websocket message.
I think I'll have to use asyncio.wait(..., return_when=FIRST_COMPLETED), but how do I make sure that I miss no data (i.e. in the race condition where websocket.receive() and system.get_next_external_notification() both finish almost exactly at the same time)? What's the correct pattern in this case?
One idea is to use a Queue to funnel the events from the different sources together, then have an async function listening to that queue in the background. Something like this might get you started:
import asyncio
from quart import Quart, websocket

app = Quart(__name__)

@app.before_serving
async def startup():
    print('starting')
    app.q = asyncio.Queue(1)
    asyncio.ensure_future(listener(app.q))

async def listener(q):
    while True:
        returnq, msg = await q.get()
        print(msg)
        await returnq.put(f'hi: {msg}')

@app.route("/echo/<message>")
async def echo(message):
    while True:
        returnq = asyncio.Queue(1)
        await app.q.put((returnq, message))
        response = await returnq.get()
        return response

@app.route("/echo2/<message>")
async def echo2(message):
    while True:
        returnq = asyncio.Queue(1)
        await app.q.put((returnq, message))
        response = await returnq.get()
        return response
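For the asyncio.wait(..., return_when=FIRST_COMPLETED) approach the question mentions, the usual way to avoid dropping data is to keep the still-pending tasks across loop iterations and re-create only the ones that completed; when both sources fire at once, both tasks show up in done, so neither result is lost. A minimal sketch, assuming the system.get_next_external_notification() coroutine from the question:

import asyncio

async def pump(websocket, system):
    # one task per event source; each is re-created only after it completes
    recv_task = asyncio.ensure_future(websocket.receive())
    notify_task = asyncio.ensure_future(system.get_next_external_notification())
    while True:
        done, pending = await asyncio.wait(
            {recv_task, notify_task},
            return_when=asyncio.FIRST_COMPLETED,
        )
        # if both sources fired at once, both tasks are in `done`,
        # so both results get handled in this iteration
        if recv_task in done:
            await websocket.send(recv_task.result())
            recv_task = asyncio.ensure_future(websocket.receive())
        if notify_task in done:
            await websocket.send(notify_task.result())
            notify_task = asyncio.ensure_future(system.get_next_external_notification())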

Removing async pollution from Python

How do I remove the async-everywhere insanity in a program like this?
import asyncio

async def async_coro():
    await asyncio.sleep(1)

async def sync_func_1():
    # This is blocking and synchronous
    await async_coro()

async def sync_func_2():
    # This is blocking and synchronous
    await sync_func_1()

if __name__ == "__main__":
    # Async pollution goes all the way to __main__
    asyncio.run(sync_func_2())
I need to have 3 async markers and asyncio.run at the top level just to call one async function. I assume I'm doing something wrong - how can I clean up this code to make it use async less?
FWIW, I'm interested mostly because I'm writing an API using asyncio and I don't want my users to have to think too much about whether their functions need to be def or async def depending on whether they're using a async part of the API or not.
After some research, one answer is to manually manage the event loop:
import asyncio

async def async_coro():
    await asyncio.sleep(1)

def sync_func_1():
    # This is blocking and synchronous
    loop = asyncio.get_event_loop()
    coro = async_coro()
    loop.run_until_complete(coro)

def sync_func_2():
    # This is blocking and synchronous
    sync_func_1()

if __name__ == "__main__":
    # No more async pollution
    sync_func_2()
If you must do that, I would recommend an approach like this:
import asyncio, threading

async def async_coro():
    await asyncio.sleep(1)

_loop = asyncio.new_event_loop()
threading.Thread(target=_loop.run_forever, daemon=True).start()

def sync_func_1():
    # This is blocking and synchronous
    return asyncio.run_coroutine_threadsafe(async_coro(), _loop).result()

def sync_func_2():
    # This is blocking and synchronous
    sync_func_1()

if __name__ == "__main__":
    sync_func_2()
The advantage of this approach over one where each sync function runs the event loop itself is that it supports nesting of sync functions: you can't re-enter a running event loop, so an approach built on run_until_complete breaks down when one such function calls another. It also runs only a single event loop, so if the underlying library wants to set up e.g. a background task for monitoring, that task will run continuously rather than being spawned anew on each call.

Enforce serial requests in asyncio

I've been using asyncio and the http requests package aiohttp recently and I've run into a problem.
My application talks to a REST API.
For some API endpoints it makes sense to dispatch multiple requests in parallel, e.g. sending different queries to the same endpoint to get different data.
For other endpoints, this doesn't make sense: the endpoint always takes the same arguments (authentication) and returns the requested information, so there is no point asking for the same data multiple times before the server has responded once. For these endpoints I need to enforce a 'serial' flow of requests, in which my program can only send a request when it's not waiting for a response (the typical behavior of blocking requests).
Of course, I don't want to block.
This is an abstraction of what I intend to do: essentially, wrap the endpoint in an async generator that enforces this serial behavior.
I feel like I'm reinventing the wheel. Is there a common solution to this issue?
import asyncio
from time import sleep

# Encapsulate the idea of an endpoint that can't handle multiple requests
async def serialendpoint():
    count = 0
    while True:
        count += 1
        await asyncio.sleep(2)
        yield str(count)

# Pretend client object
class ExampleClient(object):
    gen = serialendpoint()

    # Simulate a http request that sends multiple requests
    async def simulate_multiple_http_requests(self):
        print(await self.gen.asend(None))
        print(await self.gen.asend(None))
        print(await self.gen.asend(None))
        print(await self.gen.asend(None))

async def other_stuff():
    for _ in range(6):
        await asyncio.sleep(1)
        print('doing async stuff')

client = ExampleClient()
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(client.simulate_multiple_http_requests(),
                                       client.simulate_multiple_http_requests(),
                                       other_stuff()))
outputs
doing async stuff
1
doing async stuff
doing async stuff
2
doing async stuff
doing async stuff
3
doing async stuff
4
5
6
7
8
update
This is the actual async generator I implemented:
All the endpoints that require serial behavior get assigned a serial_request_async_generator during the import phase, which meant I couldn't prime them with an await async_gen.asend(None), as await is only allowed inside a coroutine. The compromise is that every serial request at runtime must call .asend(None) before asending the actual arguments. There must be a better way!
async def serial_request_async_generator():
    args, kwargs = yield
    while True:
        yield await request(*args, **kwargs)  # request is an aiohttp request
        args, kwargs = yield
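A common alternative to priming an async generator is to serialize access with asyncio.Lock, so a caller only sends a request when no response is outstanding. A minimal sketch, where request is the same aiohttp call as above and serial_request is a hypothetical name:

import asyncio

_serial_lock = asyncio.Lock()

async def serial_request(*args, **kwargs):
    # only one request is in flight at a time; other callers suspend
    # here (without blocking the event loop) until the response arrives
    async with _serial_lock:
        return await request(*args, **kwargs)  # request is an aiohttp request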
