Problem:
I'd like to use aiohttp producers to download a large number of files and aiofiles consumers to pull the downloads from a queue and write them to disk.
Here's my draft structure; any thoughts, please?
async def producer():  # downloader
    async with session.get(url) as r:
        # ...something
        await queue.put(await r.read())

async def consumer():  # writer
    await queue.get()
    # job for async aiofiles
    queue.task_done()

async def main():
    queue = asyncio.Queue()
    async with aiohttp.ClientSession() as session:
        for url in urls:
            # ...
            asyncio.create_task(producer())
            # append task to task list
        for n in range(number_of_consumers):
            asyncio.create_task(consumer())
            # append task to task list
        await queue.join()
        await asyncio.gather(*tasks, return_exceptions=True)
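Since the draft above is only a skeleton, here is a minimal sketch of how the pieces could fit together; the file naming, the number_of_consumers default, and the cancellation of the consumers at the end are my own assumptions, not part of the draft:

import asyncio
import os
import aiohttp
import aiofiles

async def producer(session, url, queue):
    # Download one file and hand (filename, bytes) to the writers.
    async with session.get(url) as r:
        data = await r.read()
        await queue.put((os.path.basename(url), data))

async def consumer(queue):
    # Pull (filename, bytes) pairs off the queue and write them with aiofiles.
    while True:
        name, data = await queue.get()
        async with aiofiles.open(name, "wb") as f:
            await f.write(data)
        queue.task_done()

async def main(urls, number_of_consumers=4):
    queue = asyncio.Queue()
    async with aiohttp.ClientSession() as session:
        producers = [asyncio.create_task(producer(session, u, queue)) for u in urls]
        consumers = [asyncio.create_task(consumer(queue)) for _ in range(number_of_consumers)]
        await asyncio.gather(*producers, return_exceptions=True)  # all downloads finished
        await queue.join()                                        # every queued file written
        for c in consumers:
            c.cancel()                                            # consumers loop forever

asyncio.run(main(["https://example.com/a.bin", "https://example.com/b.bin"]))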
I'm using Python 3.7 and trying to make a crawler that can crawl multiple domains asynchronously. I'm using asyncio and aiohttp for this, but I'm experiencing problems with the aiohttp.ClientSession. This is my reduced code:
import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        print(await response.text())

async def main():
    loop = asyncio.get_event_loop()
    async with aiohttp.ClientSession(loop=loop) as session:
        cwlist = [loop.create_task(fetch(session, url)) for url in ['http://python.org', 'http://google.com']]
        asyncio.gather(*cwlist)

if __name__ == "__main__":
    asyncio.run(main())
The thrown exception is this:
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=RuntimeError('Session is closed')>
What am I doing wrong here?
You forgot to await the asyncio.gather result:
async with aiohttp.ClientSession(loop=loop) as session:
    cwlist = [loop.create_task(fetch(session, url)) for url in ['http://python.org', 'http://google.com']]
    await asyncio.gather(*cwlist)
If you ever have an async with block containing no await expressions, you should be fairly suspicious.
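For completeness, the whole corrected script might look like this. I've also swapped loop.create_task for asyncio.create_task and dropped the loop= argument; that is a stylistic cleanup on my part, not part of the original fix:

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        print(await response.text())

async def main():
    async with aiohttp.ClientSession() as session:
        cwlist = [asyncio.create_task(fetch(session, url))
                  for url in ['http://python.org', 'http://google.com']]
        # Awaiting here keeps the session open until every fetch is done.
        await asyncio.gather(*cwlist)

if __name__ == "__main__":
    asyncio.run(main())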
When a user sends !bot in a special channel, the code runs a function that takes 20-30 seconds (depending on some_var). My problem is that if several people write !bot, the code runs that function multiple times in parallel. How do I queue these requests?
I tried to understand asyncio and discord.ext.tasks, but I can't figure out how they work.
@client.command()
async def test(ctx, *args):
    data = args
    if data:
        answer = some_function(data[0])  # takes from 5 seconds to 1 minute or more
        await ctx.send(answer)
Everything works well, but I don't want to load the system so much; I need a loop that processes the requests first in, first out.
You can use an asyncio.Queue to queue tasks, then process them sequentially in a background loop:
import asyncio
from discord.ext import commands, tasks

queue = asyncio.Queue()
bot = commands.Bot('!')

@tasks.loop(seconds=1.0)
async def executor():
    task = await queue.get()
    await task
    queue.task_done()

@executor.before_loop
async def before():
    await bot.wait_until_ready()

@bot.command()
async def example(ctx, num: int):
    await queue.put(helper(ctx, num))

async def helper(ctx, num):
    await asyncio.sleep(num)
    await ctx.send(num)

executor.start()
bot.run('token')
Make some_function() async and then await it. Then all test commands will be processed concurrently.
async def some_function(...):
    # something to do

@client.command()
async def test(ctx, *args):
    if args:
        answer = await some_function(args[0])
        await ctx.send(answer)
When I run this, it lists the websites in the database one by one along with the response code, and it takes about 10 seconds to run through a very small list. It should be way faster and isn't running asynchronously, but I'm not sure why.
import dblogin
import aiohttp
import asyncio
import async_timeout
dbconn = dblogin.connect()
dbcursor = dbconn.cursor(buffered=True)
dbcursor.execute("SELECT thistable FROM adatabase")
website_list = dbcursor.fetchall()
async def fetch(session, url):
    with async_timeout.timeout(30):
        async with session.get(url, ssl=False) as response:
            await response.read()
            return response.status, url

async def main():
    async with aiohttp.ClientSession() as session:
        for all_urls in website_list:
            url = all_urls[0]
            resp = await fetch(session, url)
            print(resp, url)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
    dbcursor.close()
    dbconn.close()
This article explains the details. What you need to do is wrap each fetch call in a Future (task) object, and then pass a list of those to either asyncio.wait or asyncio.gather, depending on your needs.
Your code would look something like this:
async def fetch(session, url):
    with async_timeout.timeout(30):
        async with session.get(url, ssl=False) as response:
            await response.read()
            return response.status, url

async def main():
    tasks = []
    async with aiohttp.ClientSession() as session:
        for all_urls in website_list:
            url = all_urls[0]
            task = asyncio.create_task(fetch(session, url))
            tasks.append(task)
        responses = await asyncio.gather(*tasks)
if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # asyncio.create_task() needs a running loop, so run main() directly here.
    loop.run_until_complete(main())
Also, are you sure that loop.close() call is needed? The docs mention that
The loop must not be running when this function is called. Any pending callbacks will be discarded.
This method clears all queues and shuts down the executor, but does not wait for the executor to finish.
As mentioned in the docs and in the link that @user4815162342 posted, it is better to use the create_task method instead of the ensure_future method when we know that the argument is a coroutine. Note that create_task was added in Python 3.7, so previous versions should continue using ensure_future instead.
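As a small illustration of that note (my own sketch, not part of the original answer), a hypothetical schedule() helper could pick between the two based on the interpreter version:

import sys
import asyncio

def schedule(coro):
    # asyncio.create_task() exists only on Python 3.7+ and needs a running loop;
    # asyncio.ensure_future() also works on older versions.
    if sys.version_info >= (3, 7):
        return asyncio.create_task(coro)
    return asyncio.ensure_future(coro)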
I need to repeatedly parse the content of one link. The synchronous way gives me 2-3 responses per second; I need it faster (yes, I know that too fast is bad too).
I found some async examples, but all of them show how to handle the results after all links are parsed, whereas I need to parse each one immediately after receiving it, something like this, but this code doesn't give any speed improvement:
import aiohttp
import asyncio
import time

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    while True:
        async with aiohttp.ClientSession() as session:
            html = await fetch(session, 'https://example.com')
            print(time.time())
            # do_something_with_html(html)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
but this code doesn't give any speed improvement
asyncio (and async/concurrency in general) gives a speed improvement for I/O-bound work that can interleave.
When everything you do is await something and you never create any parallel tasks (using asyncio.create_task(), asyncio.ensure_future(), etc.), then you are basically doing classic synchronous programming :)
So, how to make the requests faster:
import aiohttp
import asyncio
import time

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def check_link(session):
    html = await fetch(session, 'https://example.com')
    print(time.time())
    # do_something_with_html(html)

async def main():
    async with aiohttp.ClientSession() as session:
        while True:
            asyncio.create_task(check_link(session))
            await asyncio.sleep(0.05)

asyncio.run(main())
Notice: the async with aiohttp.ClientSession() as session: must be above (outside) the while True: loop for this to work. Actually, having a single ClientSession() for all your requests is good practice anyway.
I gave up on async; threading solved my problem, thanks to this answer:
https://stackoverflow.com/a/23102874/5678457
from threading import Thread
import requests
import time

class myClassA(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
        self.start()

    def run(self):
        while True:
            r = requests.get('https://ex.com')
            print(r.status_code, time.time())

for i in range(5):
    myClassA()
I am becoming familiar with the libraries, but I am stumped by the following situation:
I wish to continuously process update messages from two different websites' websockets.
But I am having trouble achieving this with a single session variable, given that aiohttp.ClientSession() objects should exist within coroutines.
import asyncio
import aiohttp
url1 = 'wss://example.com'
async def main():
    session = aiohttp.ClientSession()
    async with session.ws_connect(url1) as ws1:
        async for msg in ws1:
            # do some processing
loop = asyncio.get_event_loop()
loop.create_task(main())
loop.run_forever()
The above will work for a single websocket connection. But because async for msg in ws: is an infinite loop, I can't see where I can put the ws2 version of this async loop.
Since each websocket needs its own infinite loop, you can abstract it into a coroutine that serves that websocket and accepts the session from its caller. The calling coroutine will create the serve tasks "in the background" using loop.create_task, which can also be called from a coroutine:
async def setup():
    session = aiohttp.ClientSession()
    loop = asyncio.get_event_loop()
    loop.create_task(serve(url1, session))
    loop.create_task(serve(url2, session))

async def serve(url, session):
    async with session.ws_connect(url) as ws:
        async for msg in ws:
            ...

loop = asyncio.get_event_loop()
loop.run_until_complete(setup())
loop.run_forever()
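If you are on Python 3.7+, a rough equivalent (my sketch, not part of the original answer, reusing the same serve() coroutine and url1/url2) can let asyncio.run manage the loop and close the session via a context manager:

async def main():
    async with aiohttp.ClientSession() as session:
        # Both websocket readers run concurrently; gather only returns once
        # both exit, so this keeps running as long as the connections stay open.
        await asyncio.gather(serve(url1, session), serve(url2, session))

asyncio.run(main())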