I've been using asyncio and the http requests package aiohttp recently and I've run into a problem.
My application talks to a REST API.
For some API endpoints it makes sense to dispatch multiple requests in parallel, e.g. sending different queries to the same endpoint to get different data.
For other endpoints this doesn't make sense: the endpoint always takes the same arguments (authentication) and returns the requested information, so there is no point asking for the same data multiple times before the server has responded once. For these endpoints I need to enforce a 'serial' flow of requests: my program should only be able to send a request when it's not waiting for a response (the typical behavior of blocking requests).
Of course I don't want to block.
This is an abstraction of what I intend to do. Essentially wrap the endpoint in an async generator that enforces this serial behavior.
I feel like I'm reinventing the wheel. Is there a common solution to this issue?
import asyncio
from time import sleep
# Encapsulate the idea of an endpoint that can't handle multiple requests
async def serialendpoint():
    count = 0
    while True:
        count += 1
        await asyncio.sleep(2)
        yield str(count)
# Pretend client object
class ExampleClient(object):
    gen = serialendpoint()
    # Simulate a http request that sends multiple requests
    async def simulate_multiple_http_requests(self):
        print(await self.gen.asend(None))
        print(await self.gen.asend(None))
        print(await self.gen.asend(None))
        print(await self.gen.asend(None))
async def other_stuff():
    for _ in range(6):
        await asyncio.sleep(1)
        print('doing async stuff')
client = ExampleClient()
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(client.simulate_multiple_http_requests(),
                                       client.simulate_multiple_http_requests(),
                                       other_stuff()))
outputs
doing async stuff
1
doing async stuff
doing async stuff
2
doing async stuff
doing async stuff
3
doing async stuff
4
5
6
7
8
Update
This is the actual async generator I implemented:
All the endpoints that require serial behavior get assigned a serial_request_async_generator during the import phase. That means I can't prime them with await async_gen.asend(None), because await is only allowed inside a coroutine. The compromise is that every serial request at runtime must call .asend(None) before sending the actual arguments with asend. There must be a better way!
async def serial_request_async_generator():
    args, kwargs = yield
    while True:
        yield await request(*args, **kwargs) # request is an aiohttp request
        args, kwargs = yield
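One commonly used alternative to the generator approach is to guard the endpoint with an asyncio.Lock, so that a second caller simply waits until the first response has arrived. The sketch below is only an illustration under assumptions: SerialEndpoint and fake_request are hypothetical names, and fake_request stands in for the real aiohttp call. The lock is created lazily on first use, so the wrapper can be instantiated at import time without a running event loop.

```python
import asyncio

class SerialEndpoint:
    """Hypothetical wrapper that allows only one in-flight request at a time."""

    def __init__(self, request_func):
        self._request = request_func
        self._lock = None  # created on first call, inside the running event loop

    async def __call__(self, *args, **kwargs):
        if self._lock is None:
            self._lock = asyncio.Lock()
        async with self._lock:  # later callers wait here without blocking the loop
            return await self._request(*args, **kwargs)

in_flight = 0      # requests currently awaiting a response
max_in_flight = 0  # highest overlap observed, for the demo

async def fake_request(query):
    """Stand-in for an aiohttp request; records how many calls overlap."""
    global in_flight, max_in_flight
    in_flight += 1
    max_in_flight = max(max_in_flight, in_flight)
    await asyncio.sleep(0.05)
    in_flight -= 1
    return f"response to {query}"

# Safe to create at import time: no await is needed here.
serial_request = SerialEndpoint(fake_request)

async def main():
    return await asyncio.gather(*(serial_request(i) for i in range(3)))

results = asyncio.run(main())
print(results, "max in flight:", max_in_flight)
```

Callers just await serial_request(...) with the real arguments; no priming call is needed, and other coroutines keep running while one request is pending.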
Related
I have an external library that uses the requests module to perform HTTP requests.
I need to use the library asynchronously without using many threads (that would be my last choice if nothing else works), and I can't change its source code either.
It would be easy to monkey-patch the library, since all interaction with the requests module is done from a single function, but I don't know if I can monkey-patch a synchronous function with an asynchronous one (I mean one defined with the async keyword).
Roughly, the problem simplifies to the following code:
import asyncio
import aiohttp
import types
import requests
# Can't modify Library class.
class Library:
    def do(self):
        self._request('example.com')
        # Some other code here..
    def _request(self, url):
        return requests.get(url).text
# Monkey-patched to this method.
async def new_request(self, url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()
async def main():
    library = Library()
    # Do monkey-patch.
    library._request = types.MethodType(new_request, library)
    # Call library asynchronously in hope that it will perform requests using aiohttp.
    asyncio.gather(
        library.do(),
        library.do(),
        library.do()
    )
    print('Done.')
asyncio.run(main())
But as expected, it doesn't work. I get TypeError: An asyncio.Future, a coroutine or an awaitable is required on asyncio.gather call. And also RuntimeWarning: coroutine 'new_request' was never awaited on self._request('example.com').
So the question is: is it possible to make that code work without modifying the Library class' source code? Otherwise, what options do I have to make asynchronous requests using the library?
Is it possible to make that code work without modifying the Library class' source code? Otherwise, what options do I have to make asynchronous requests using the library?
Yes, it is possible, and you do not even need monkey-patching for it. You can use asyncio.to_thread (available since Python 3.9) to run the synchronous do method of Library in a worker thread and get an awaitable back. So the main coroutine should look like this:
async def main():
    library = Library()
    await asyncio.gather(
        asyncio.to_thread(library.do),
        asyncio.to_thread(library.do),
        asyncio.to_thread(library.do)
    )
    print('Done.')
Here the asyncio.to_thread wraps the library.do method and returns a coroutine object avoiding the first error, but you also need await before asyncio.gather.
NOTE: If you are going to check my answer with the above example, please do not forget to set a valid URL instead of 'example.com'.
Edit
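If you do not want to use threads at all, you can use an async wrapper like the to_async function below in place of asyncio.to_thread. Be aware that this only makes the call awaitable: func still runs synchronously on the event loop thread, so the three calls execute one after another rather than concurrently.
async def to_async(func):
    return func()
async def main():
    library = Library()
    await asyncio.gather(
        to_async(library.do),
        to_async(library.do),
        to_async(library.do),
    )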
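To see how a pass-through wrapper like to_async differs from asyncio.to_thread, the following sketch times both. It is an illustration under assumptions: blocking_call is a hypothetical stand-in for Library.do, using time.sleep to block the way requests.get would, and asyncio.to_thread requires Python 3.9 or later.

```python
import asyncio
import time

def blocking_call():
    """Hypothetical stand-in for Library.do(); blocks like requests.get would."""
    time.sleep(0.2)
    return "ok"

async def to_async(func):
    # Pass-through wrapper: func runs synchronously on the event loop thread.
    return func()

async def timed(make_awaitable, n=3):
    """Run n awaitables through gather and return the elapsed wall time."""
    start = time.perf_counter()
    await asyncio.gather(*(make_awaitable() for _ in range(n)))
    return time.perf_counter() - start

async def main():
    sequential = await timed(lambda: to_async(blocking_call))
    threaded = await timed(lambda: asyncio.to_thread(blocking_call))
    return sequential, threaded

sequential, threaded = asyncio.run(main())
print(f"to_async: {sequential:.2f}s, to_thread: {threaded:.2f}s")
```

With the pass-through wrapper the three calls take roughly three times the single-call delay, because each one blocks the loop until it returns. With asyncio.to_thread they overlap in worker threads.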
I decided to post another answer, as it solves the problem in a different way.
Why not extend the default behavior of the Library class and make the do and _request methods polymorphic?
class AsyncLibrary(Library):
    async def do(self):
        return await self._request('https://google.com/')
    async def _request(self, url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                return await response.text()
async def main():
    library = AsyncLibrary()
    await asyncio.gather(
        library.do(),
        library.do(),
        library.do(),
    )
There is no way to do this with requests without threads, but you can limit the number of threads active at any one time to address your requirement of not using many threads.
import asyncio
import requests
# Can't modify Library class.
class Library:
    def do(self):
        self._request('http://example.com')
    def _request(self, url):
        return requests.get(url).text
async def as_thread(semaphore, func):
    async with semaphore: # limit the number of threads active
        await asyncio.to_thread(func)
async def main():
    library = Library()
    semaphore = asyncio.Semaphore(2) # limit to 2 for example
    tasks = [library.do] * 10 # pretend there are a lot of sites to read
    await asyncio.gather(
        *[as_thread(semaphore, x) for x in tasks]
    )
    print('Done.')
asyncio.run(main())
I'm extracting information using an API that has rate limits. I'm doing this asynchronously with asyncio and aiohttp to speed up the process. I'm gathering the calls in batches of 10, so I make 10 concurrent calls each time. If I receive a 429, I wait for 2 minutes and retry. For the retry part, I'm using the backoff decorator.
My problem is that the retry is executed for all 10 calls and not only for the failing call. I'm not sure how to do that:
@backoff.on_exception(backoff.expo,aiohttp.ClientError,max_tries=20,logger=my_logger)
async def get_symbols(size,position):
    async with aiohttp.ClientSession() as session:
        queries = get_tasks(session,size,position)
        responses = await asyncio.gather(*queries)
        print("gathering responses")
        for response in responses:
            if response.status == 429:
                print(response.headers)
                print("Code 429 received waiting for 2 minutes")
                print(response)
                time.sleep(120)
                raise aiohttp.ClientError()
            else:
                query_data = await response.read()
Does anyone have a way of executing just the failing call and not the whole batch?
There are two issues in your code. The first is the duplicate sleep, which suggests a misunderstanding of how backoff works. Its whole point is to 1) try, 2) sleep for an exponentially increasing delay if there was an error, and 3) retry your function/coroutine for you. (The time.sleep(120) call also blocks the event loop while it waits.) The second is that get_symbols is decorated with backoff, so it is retried as a whole.
How to improve?
Decorate individual request function
Let backoff do its "sleeping" job
Let aiohttp do its job by letting it raise for HTTP error status codes (400 and above) by setting raise_for_status=True in the ClientSession initialiser
It should look something like the following.
import asyncio
import aiohttp
import backoff
# Retry each request on its own, so one failure does not re-run the whole batch.
@backoff.on_exception(backoff.expo, aiohttp.ClientError, max_tries=20)
async def send_task(client, params):
    async with client.get('https://python.org/', params=params) as resp:
        return await resp.text()
def get_tasks(client, size, position):
    for params in get_param_list(size, position):
        yield send_task(client, params)
async def get_symbols(size, position):
    async with aiohttp.ClientSession(raise_for_status=True) as client:
        tasks = get_tasks(client, size, position)
        responses = await asyncio.gather(*tasks)
        for response in responses:
            print(response)  # send_task already returns the body text
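The effect of decorating the individual request rather than the whole batch can be checked without any network access. The sketch below is only an illustration: retry is a hypothetical, stdlib-only stand-in for backoff.on_exception(backoff.expo, ...), RateLimited stands in for the aiohttp.ClientError raised on a 429, and fetch simulates one call that is rate limited twice.

```python
import asyncio
import functools

def retry(exceptions, max_tries=5, base_delay=0.01):
    """Hypothetical minimal stand-in for backoff.on_exception(backoff.expo, ...)."""
    def decorator(func):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            for attempt in range(max_tries):
                try:
                    return await func(*args, **kwargs)
                except exceptions:
                    if attempt == max_tries - 1:
                        raise
                    # Exponentially increasing delay: base, 2*base, 4*base, ...
                    await asyncio.sleep(base_delay * 2 ** attempt)
        return wrapper
    return decorator

class RateLimited(Exception):
    """Stand-in for the aiohttp.ClientError raised on HTTP 429."""

attempts = {}  # call_id -> number of times the request was sent

@retry(RateLimited)
async def fetch(call_id):
    attempts[call_id] = attempts.get(call_id, 0) + 1
    # Pretend call 3 is rate limited on its first two attempts.
    if call_id == 3 and attempts[call_id] <= 2:
        raise RateLimited()
    return f"data {call_id}"

async def main():
    return await asyncio.gather(*(fetch(i) for i in range(10)))

results = asyncio.run(main())
print(attempts)
```

Only call 3 is sent more than once; the other nine calls complete on their first attempt and are never repeated.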
I have subscribed to an MQ queue. Every time I get a message, I pass it to a function that then performs a number of time-consuming I/O actions on it.
The issue is that everything happens serially.
A request comes in, it picks up the request, performs the action by calling the function, and then picks up the next request.
I want to do this asynchronously so that multiple requests can be dealt with in an async manner.
results = []
queue = queue.subscribe(name)
async for message in queue:
    yield my_function(message)
The biggest issue is that my_function is slow because it calls external web services and I want my code to process other messages in the meantime.
I tried to implement it above but it doesn't work! I am not sure how to implement async here.
I can't create a task because I don't know how many requests will be received. It's an MQ which I have subscribed to. I loop over each message and perform an action. I don't want to wait for the function to complete before I perform the action on the next message. I want it to happen asynchronously.
If I understand your request, what you need is a queue that your request handlers fill, and that you read from in the code that needs to do something with the results.
If you insist on an async iterator, it is straightforward to use a generator to expose the contents of a queue. For example:
def make_asyncgen():
    queue = asyncio.Queue(1)
    async def feed(item):
        await queue.put(item)
    async def exhaust():
        while True:
            item = await queue.get()
            yield item
    return feed, exhaust()
make_asyncgen returns two objects: an async function and an async generator. The two are connected in such a way that, when you call the function with an item, the item gets emitted by the generator. For example:
import random, asyncio
# Emulate a server that takes some time to process each message,
# and then provides a result. Here it takes an async function
# that it will call with the result.
async def serve(server_ident, on_message):
    while True:
        await asyncio.sleep(random.uniform(1, 5))
        await on_message('%s %s' % (server_ident, random.random()))
async def main():
    # create the feed function, and the generator
    feed, get = make_asyncgen()
    # subscribe to serve several requests in parallel
    asyncio.create_task(serve('foo', feed))
    asyncio.create_task(serve('bar', feed))
    asyncio.create_task(serve('baz', feed))
    # process results from all three servers as they arrive
    async for msg in get:
        print('received', msg)
asyncio.run(main())
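Not knowing the number of messages in advance does not rule out tasks: a task can be started for each message as it arrives. The sketch below illustrates that idea under assumptions. fake_subscription is a hypothetical stand-in for the MQ subscription, my_function simulates the slow handler with asyncio.sleep, and the semaphore limit of 5 is an arbitrary choice to bound how many handlers run at once.

```python
import asyncio

running = 0       # handlers currently in progress
max_running = 0   # highest overlap observed, for the demo

async def fake_subscription(n):
    """Hypothetical stand-in for the MQ subscription: an async iterator."""
    for i in range(n):
        await asyncio.sleep(0.01)
        yield f"message {i}"

async def my_function(message):
    """Stand-in for the slow handler that calls external web services."""
    global running, max_running
    running += 1
    max_running = max(max_running, running)
    await asyncio.sleep(0.1)
    running -= 1
    return message.upper()

async def consume(subscription, max_concurrent=5):
    results = []
    tasks = set()
    limit = asyncio.Semaphore(max_concurrent)

    async def worker(message):
        try:
            results.append(await my_function(message))
        finally:
            limit.release()

    async for message in subscription:
        await limit.acquire()  # wait here if max_concurrent handlers are busy
        task = asyncio.create_task(worker(message))
        tasks.add(task)  # hold a reference until the task finishes
        task.add_done_callback(tasks.discard)

    # The demo stream ends, so wait for the stragglers; a real subscription may never end.
    await asyncio.gather(*tasks)
    return results

results = asyncio.run(consume(fake_subscription(10)))
print(len(results), "messages handled, at most", max_running, "at once")
```

The loop picks up the next message as soon as a handler has been started, so several handlers overlap. Results arrive in completion order, not necessarily in message order.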
I am running a program that makes three different requests to a REST API. The data, indicator and request functions all fetch data from BitMEX's API using a wrapper I've made.
I have used asyncio to try to speed up the process, so that while I am waiting on a response from a previous request, it can begin to make another one.
However, my asynchronous version is not running any quicker for some reason. The code works and, as far as I know, I have set everything up correctly. But could there be something wrong with how I am setting up the coroutines?
Here is the asynchronous version:
import time
import asyncio
from bordemwrapper import BitMEXData, BitMEXFunctions
'''
asynchronous I/O
'''
async def data():
    data = BitMEXData().get_ohlcv(symbol='XBTUSD', timeframe='1h',
                                  instances=25)
    await asyncio.sleep(0)
    return data
async def indicator():
    indicator = BitMEXData().get_indicator(symbol='XBTUSD',
        timeframe='1h', indicator='RSI', period=20, source='close',
        instances=25)
    await asyncio.sleep(0)
    return indicator
async def request():
    request = BitMEXFunctions().get_price()
    await asyncio.sleep(0)
    return request
async def chain():
    data_ = await data()
    indicator_ = await indicator()
    request_ = await request()
    return data_, indicator_, request_
async def main():
    await asyncio.gather(chain())
if __name__ == '__main__':
    start = time.perf_counter()
    asyncio.run(main())
    end = time.perf_counter()
    print('process finished in {} seconds'.format(end - start))
Unfortunately, asyncio isn't magic. Although you've put them in async functions, the BitMEXData().get_<foo> functions are not themselves async (i.e. you can't await them), and therefore block while they run. The concurrency in asyncio can only occur while awaiting something.
You'll need a library which makes the actual HTTP requests asynchronously, like aiohttp. It sounds like you wrote bordemwrapper yourself - you should rewrite the get_<foo> functions to use asynchronous HTTP requests. Feel free to submit a separate question if you need help with that.
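Even once the requests are truly asynchronous, awaiting them one after another inside chain() still runs them sequentially; the coroutines have to be passed to asyncio.gather together to overlap. The sketch below illustrates the difference. fetch is a hypothetical stand-in that uses asyncio.sleep in place of a real asynchronous HTTP request.

```python
import asyncio
import time

async def fetch(name, delay=0.2):
    """Hypothetical stand-in for a truly asynchronous request."""
    await asyncio.sleep(delay)
    return name

async def chain():
    # Each await finishes before the next call starts: about 3 * delay in total.
    return (await fetch("data"), await fetch("indicator"), await fetch("price"))

async def concurrent():
    # gather starts all three at once: about 1 * delay in total.
    return tuple(await asyncio.gather(fetch("data"), fetch("indicator"), fetch("price")))

async def timed(coro):
    """Await coro and return its result together with the elapsed wall time."""
    start = time.perf_counter()
    result = await coro
    return result, time.perf_counter() - start

async def main():
    return await timed(chain()), await timed(concurrent())

(chain_result, chain_time), (gather_result, gather_time) = asyncio.run(main())
print(f"sequential awaits: {chain_time:.2f}s, gather: {gather_time:.2f}s")
```

Both versions return the same values, but only the gather version lets the waiting periods overlap.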
Sorry, library first-timer here. I am polling a RESTful endpoint every 10 seconds.
It's not obvious to me which of the following is appropriate:
import aiohttp
import asyncio
async def poll(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as r:
            return await r.text()
async def main():
    while True:
        await asyncio.sleep(10)
        print(await poll('http://example.com/api'))
loop = asyncio.get_event_loop()
loop.create_task(main())
loop.run_forever()
Or the session variable persists forever:
import aiohttp
import asyncio
async def poll(url):
    # One session for the lifetime of the poller, reused by every request.
    async with aiohttp.ClientSession() as session:
        while True:
            await asyncio.sleep(10)
            async with session.get(url) as r:
                print(await r.text())
loop = asyncio.get_event_loop()
loop.create_task(poll('http://example.com/api'))
loop.run_forever()
I expect the latter is desirable, but coming from the non-asynchronous requests library, I'm not used to the idea of sessions. Will I actually experience faster response times because of connection pooling or other things?
From the official documentation:
Don’t create a session per request. Most likely you need a
session per application which performs all requests altogether.
A session contains a connection pool inside. Connection reusage and
keep-alives (both are on by default) may speed up total performance.
So the latter is the better choice: reusing one session lets aiohttp reuse connections, which should give you faster responses.