How to use python asyncio to asynchronously dispatch tasks?

I am looking to implement a simple p2p file downloader in Python. The downloader needs to work within the following limitations:

- When queried, the tracker responds with exactly 2 peers
- The tracker ignores requests made within 2 seconds of the previous request
- The downloader should attempt to download a single file as fast as possible

From my reading about this, I thought that I should be using asyncio, and that I'd structure the code something like this pseudocode:
async def downloader(peer):
    while file is not downloaded:
        download a new block from the peer without blocking

response = synchronously query tracker for initial pair of peers
newPeers = extractPeers(response)
while file not downloaded:
    for peer in newPeers:
        dispatch new downloader(peer)
    wait 2 seconds without blocking downloaders
    response = query tracker for initial pair of peers without blocking downloaders
    newPeers = extractPeers(response)
What asyncio methods should I be using to "dispatch" new downloaders and query the tracker? From my testing it seems that awaiting an async function blocks the event loop:
import asyncio
from random import randint

async def myfunc(i):
    print("hello", i)
    await asyncio.sleep(randint(1, 3))
    print("world", i)

async def main():
    await myfunc(1)
    await myfunc(2)
    await myfunc(3)
    await myfunc(4)

asyncio.run(main())
This code runs each myfunc call as though it were defined without async. I'd appreciate anything, including basic pointers.
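For reference, here is a minimal sketch of how the same calls can be made to overlap (standard asyncio, not code from this thread): schedule each coroutine as a task with asyncio.create_task and wait for all of them with asyncio.gather. Each await asyncio.sleep then yields control to the event loop instead of blocking it.

import asyncio
from random import randint

async def myfunc(i):
    print("hello", i)
    await asyncio.sleep(randint(1, 3))
    print("world", i)

async def main():
    # create_task schedules each coroutine to run concurrently;
    # gather waits until all of them have finished
    tasks = [asyncio.create_task(myfunc(i)) for i in range(1, 5)]
    await asyncio.gather(*tasks)

asyncio.run(main())

The same shape fits the downloader pseudocode: spawn one downloader task per peer with create_task, and in the main coroutine await asyncio.sleep(2) between tracker queries so the downloader tasks keep running in the meantime.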

Related

Python HTTPX | RuntimeError: The connection pool was closed while 6 HTTP requests/responses were still in-flight

I've come across this error multiple times while using the HTTPX module. I believe I know what it means but I don't know how to solve it.
In the following example, I have an asynchronous function gather_players() that sends get requests to an API I'm using and then returns a list of all the players from a specified NBA team. Inside of teamRoster() I'm using asyncio.run() to initiate gather_players() and that's the line that produces this error: RuntimeError: The connection pool was closed while 6 HTTP requests/responses were still in-flight
async def gather_players(list_of_urlCodes):
    async def get_json(client, link):
        response = await client.get(BASE_URL + link)
        return response.json()['league']['standard']['players']

    async with httpx.AsyncClient() as client:
        tasks = []
        for code in list_of_urlCodes:
            link = f'/prod/v1/2022/teams/{code}/roster.json'
            tasks.append(asyncio.create_task(get_json(client, link)))

        list_of_people = await asyncio.gather(*tasks)
        return list_of_people
def teamRoster(list_of_urlCodes: list) -> list:
    list_of_personIds = asyncio.run(gather_players(list_of_urlCodes))

    finalResult = []
    for person in list_of_personIds:
        personId = person['personId']

        # listOfPlayers is a list of every NBA player that I got
        # from a previous get request
        for player in listOfPlayers:
            if personId == player['personId']:
                finalResult.append({
                    "playerName": f"{player['firstName']} {player['lastName']}",
                    "personId": player['personId'],
                    "jersey": player['jersey'],
                    "pos": player['pos'],
                    "heightMeters": player['heightMeters'],
                    "weightKilograms": player['weightKilograms'],
                    "dateOfBirthUTC": player['dateOfBirthUTC'],
                    "nbaDebutYear": player['nbaDebutYear'],
                    "country": player['country']
                })
    return finalResult
*Note: The teamRoster() function in my original script is actually a class method, and I've also used the same technique with the asynchronous function to send multiple GET requests in an earlier part of my script.
I was able to finally find a solution to this problem. For some reason the context manager: async with httpx.AsyncClient() as client fails to properly close the AsyncClient. A quick fix to this problem is closing it manually using: client.aclose()
Before:
async with httpx.AsyncClient() as client:
    tasks = []
    for code in list_of_urlCodes:
        link = f'/prod/v1/2022/teams/{code}/roster.json'
        tasks.append(asyncio.create_task(get_json(client, link)))

    list_of_people = await asyncio.gather(*tasks)
    return list_of_people
After:
client = httpx.AsyncClient()
tasks = []
for code in list_of_urlCodes:
    link = f'/prod/v1/2022/teams/{code}/roster.json'
    tasks.append(asyncio.create_task(get_json(client, link)))

list_of_people = await asyncio.gather(*tasks)
client.aclose()
return list_of_people
The accepted answer claims that the original code failed to properly close the client because it didn't call aclose(), and while that's technically true, the implementation of the async context manager's exit method (__aexit__) essentially duplicates the aclose() implementation.
In fact, you can tell that the connection is closed because the error message complains about 6 HTTP requests remaining in-flight after the connection is closed.
By contrast, the accepted answer "fixes" the error by explicitly not closing the connection. Because httpx.AsyncClient.aclose is an async function, calling it without awaiting creates a coroutine that is not actually scheduled for execution on the event loop. That coroutine is then destroyed when the function returns immediately after without having ever actually executed, meaning the connection is never closed. Python should print a RuntimeWarning that client.aclose() was never awaited. As a result, each request has plenty of time to complete before the process terminates and force-closes each connection so the RuntimeError is never raised.
While I don't know the full reason that some requests were still in-flight, I suspect it was some cleanup at the end that didn't finish before the function returned and the connections were closed. For instance, if you put await asyncio.sleep(1) right before the return, then the error would likely go away as the client would have time to finish and clean up after each of its requests. (Note I'm not saying this is a good fix, but rather would help provide evidence to back up my explanation.)
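To make the distinction concrete, a manual close that actually runs would have to be awaited, for example in a try/finally. This is only a sketch for illustration; it is functionally equivalent to the original async with form, so it would still be exposed to the same in-flight error:

client = httpx.AsyncClient()
try:
    tasks = []
    for code in list_of_urlCodes:
        link = f'/prod/v1/2022/teams/{code}/roster.json'
        tasks.append(asyncio.create_task(get_json(client, link)))
    list_of_people = await asyncio.gather(*tasks)
finally:
    # awaited, so the close coroutine actually executes
    await client.aclose()
return list_of_people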
Instead of using asyncio.gather, try using TaskGroups (available since Python 3.11), as recommended by the Python docs for asyncio.gather. Your new code could look something like this:
async def gather_players(list_of_urlCodes):
    async def get_json(client, link):
        response = await client.get(BASE_URL + link)
        return response.json()['league']['standard']['players']

    async with httpx.AsyncClient() as client:
        async with asyncio.TaskGroup() as tg:
            tasks = [
                tg.create_task(get_json(client, f'/prod/v1/2022/teams/{code}/roster.json'))
                for code in list_of_urlCodes
            ]
        # the TaskGroup block only exits once every task has finished
        list_of_people = [task.result() for task in tasks]
    return list_of_people
This is obviously not production-grade code, as it is missing error-handling, but demonstrates the suggestion clearly enough.

How do I implement async generators?

I have subscribed to an MQ queue. Every time I get a message, I pass it to a function that then performs a number of time-consuming I/O actions on it.
The issue is that everything happens serially.
A request comes in, it picks up the request, performs the action by calling the function, and then picks up the next request.
I want to do this asynchronously so that multiple requests can be dealt with in an async manner.
results = []
queue = queue.subscribe(name)
async for message in queue:
    yield my_function(message)
The biggest issue is that my_function is slow because it calls external web services and I want my code to process other messages in the meantime.
I tried to implement it above but it doesn't work! I am not sure how to implement async here.
I can't create a task because I don't know how many requests will be received. It's an MQ that I have subscribed to. I loop over each message and perform an action, but I don't want to wait for the function to complete before I perform the action on the next message. I want it to happen asynchronously.
If I understand your request, what you need is a queue that your request handlers fill, and that you read from in the code that needs to do something with the results.
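A rough sketch of that plain-queue setup (handle, pump, consume, and results are illustrative names, and it assumes my_function is, or is wrapped as, a coroutine):

import asyncio

results = asyncio.Queue()

async def handle(message):
    # assumes my_function is a coroutine doing the slow I/O work
    await results.put(await my_function(message))

async def pump(queue):
    # queue is the subscribed MQ iterator from the question
    async for message in queue:
        asyncio.create_task(handle(message))  # schedule it; don't wait for it

async def consume():
    while True:
        print('processing', await results.get())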
If you insist on an async iterator, it is straightforward to use a generator to expose the contents of a queue. For example:
def make_asyncgen():
    queue = asyncio.Queue(1)

    async def feed(item):
        await queue.put(item)

    async def exhaust():
        while True:
            item = await queue.get()
            yield item

    return feed, exhaust()
make_asyncgen returns two objects: an async function and an async generator. The two are connected in such a way that, when you call the function with an item, the item gets emitted by the generator. For example:
import random, asyncio

# Emulate a server that takes some time to process each message,
# and then provides a result. Here it takes an async function
# that it will call with the result.
async def serve(server_ident, on_message):
    while True:
        await asyncio.sleep(random.uniform(1, 5))
        await on_message('%s %s' % (server_ident, random.random()))

async def main():
    # create the feed function, and the generator
    feed, get = make_asyncgen()
    # subscribe to serve several requests in parallel
    asyncio.create_task(serve('foo', feed))
    asyncio.create_task(serve('bar', feed))
    asyncio.create_task(serve('baz', feed))
    # process results from all three servers as they arrive
    async for msg in get:
        print('received', msg)

asyncio.run(main())

Using Mongo Motor with async create_task

I want to submit data to MongoDB in a non-blocking way. Other questions on this website recommend either:

- Phrasing this problem as a single-producer, single-consumer problem with queue communication.
- Using Motor with asyncio and create_task.
All examples in the Motor documentation use asyncio.get_event_loop() and run_until_complete(). However, this is a blocking way of running co-routines. I would like to instead use the non-blocking method of create_task.
Here's what I've tried:
client = motor.motor_asyncio.AsyncIOMotorClient(my_login_url)

async def do_find_one():
    doc = client.my_db.my_collection.find_one()
    pprint.pprint(doc)

async def main():
    task = asyncio.create_task(do_find_one())
    print("other things")
    await task
    print("done")

asyncio.run(main())
However, instead of a MongoDB entry, task just prints a pending Future which I don't know how to get the value of. How am I supposed to be using motor with create_task?
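One likely culprit, sketched below (assuming Motor's async API and the setup shown above): find_one returns an awaitable, so without an await inside do_find_one, doc is just the pending future being printed.

async def do_find_one():
    # await the query so doc is the actual document rather than a pending future
    doc = await client.my_db.my_collection.find_one()
    pprint.pprint(doc)

create_task is still the right tool for scheduling the coroutine; the missing piece is the await on the query itself.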

Enforce serial requests in asyncio

I've been using asyncio and the http requests package aiohttp recently and I've run into a problem.
My application talks to a REST API.
For some API endpoints it makes sense to be able to dispatch multiple requests in parallel, e.g. sending different queries to the same endpoint to get different data.
For other endpoints, though, this doesn't make sense: the endpoint always takes the same arguments (authentication) and returns the requested information, so there is no point asking for the same data multiple times before the server has responded once. For these endpoints I need to enforce a 'serial' flow of requests, in which my program can only send a request when it's not waiting for a response (the typical behavior of blocking requests).
Of course I don't want to block.
This is an abstraction of what I intend to do. Essentially wrap the endpoint in an async generator that enforces this serial behavior.
I feel like I'm reinventing the wheel. Is there a common solution to this issue?
import asyncio
from time import sleep

# Encapsulate the idea of an endpoint that can't handle multiple requests
async def serialendpoint():
    count = 0
    while True:
        count += 1
        await asyncio.sleep(2)
        yield str(count)

# Pretend client object
class ExampleClient(object):
    gen = serialendpoint()

    # Simulate a http request that sends multiple requests
    async def simulate_multiple_http_requests(self):
        print(await self.gen.asend(None))
        print(await self.gen.asend(None))
        print(await self.gen.asend(None))
        print(await self.gen.asend(None))

async def other_stuff():
    for _ in range(6):
        await asyncio.sleep(1)
        print('doing async stuff')

client = ExampleClient()

loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.gather(client.simulate_multiple_http_requests(),
                                       client.simulate_multiple_http_requests(),
                                       other_stuff()))
outputs
doing async stuff
1
doing async stuff
doing async stuff
2
doing async stuff
doing async stuff
3
doing async stuff
4
5
6
7
8
update
This is the actual async generator I implemented:
All the endpoints that require serial behavior get assigned a serial_request_async_generator during the import phase, which meant I couldn't initialize them with an await 'async_gen'.asend(None), since await is only allowed inside a coroutine. The compromise is that every serial request at runtime must call .asend(None) before sending the actual arguments. There must be a better way!
async def serial_request_async_generator():
    args, kwargs = yield
    while True:
        yield await request(*args, **kwargs)  # request is an aiohttp request
        args, kwargs = yield
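One common alternative (a sketch, not from the original post) is to serialize access with an asyncio.Lock instead of priming a generator: only one caller can hold the lock at a time, so requests to that endpoint go out strictly one after another while the event loop keeps running other tasks.

import asyncio

class SerialEndpoint:
    def __init__(self, do_request):
        # do_request is the aiohttp request coroutine from the snippet above
        self._do_request = do_request
        self._lock = asyncio.Lock()

    async def request(self, *args, **kwargs):
        # only one caller holds the lock at a time, so requests go out serially
        async with self._lock:
            return await self._do_request(*args, **kwargs)

Callers simply await endpoint.request(...): there is no .asend(None) priming step, and concurrent callers queue up on the lock.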

Collecting results from python coroutines before loop finishes

I have a python discord bot built with discord.py, meaning the entire program runs inside an event loop.
The function I'm working on involves making several hundred HTTP requests and adding the results to a final list. It takes about two minutes to do these in order, so I'm using aiohttp to make them async. The related parts of my code are identical to the quickstart example in the aiohttp docs, but it's throwing a RuntimeError: Session is closed. The methodology was taken from an example at https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html under 'Fetch multiple URLs'.
async def searchPostList(postUrls, searchString):
    futures = []
    async with aiohttp.ClientSession() as session:
        for url in postUrls:
            task = asyncio.ensure_future(searchPost(url, searchString, session))
            futures.append(task)

    return await asyncio.gather(*futures)

async def searchPost(url, searchString, session):
    async with session.get(url) as response:
        page = await response.text()

        # Assorted verification and parsing
        return data
I don't know why this error turns up since my code is so similar to two presumably functional examples. The event loop itself is working fine. It runs forever, since this is a bot application.
In the example you linked, the gathering of results was within the async with block. If you do it outside, there's no guarantee that the session won't close before the requests are even made!
Moving your return statement inside the block should work:
async with aiohttp.ClientSession() as session:
    for url in postUrls:
        task = asyncio.ensure_future(searchPost(url, searchString, session))
        futures.append(task)

    return await asyncio.gather(*futures)
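As an aside (not part of the fix above): since the bot process runs forever, another workable pattern is to create a single aiohttp.ClientSession when the bot starts, pass it into functions like searchPostList, and close it only on shutdown. aiohttp's documentation recommends reusing a session across requests rather than opening a new one for each batch.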
