Asyncio not running Aiohttp requests in parallel - python

I want to run many HTTP requests in parallel using Python. I tried the aiohttp module together with asyncio.
import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        for i in range(10):
            async with session.get('https://httpbin.org/get') as response:
                html = await response.text()
                print('done' + str(i))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
I expected it to execute all the requests in parallel, but they are executed one by one.
I later solved this using threading, but I would like to know what's wrong with this approach.

You need to make the requests concurrently. Currently, you have a single task defined by main(), so the HTTP requests are run serially within that task.
You could also consider using asyncio.run() if you are on Python 3.7+, which abstracts away the creation of the event loop:
import aiohttp
import asyncio

async def getResponse(session, i):
    async with session.get('https://httpbin.org/get') as response:
        html = await response.text()
        print('done' + str(i))

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [getResponse(session, i) for i in range(10)]  # create list of tasks
        await asyncio.gather(*tasks)  # execute them concurrently

asyncio.run(main())
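If you are on an older Python version without asyncio.run(), the same main() can be driven with an explicit event loop, exactly as the question's original code does:

# pre-3.7 equivalent of asyncio.run(main())
loop = asyncio.get_event_loop()
loop.run_until_complete(main())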

Related

python for each run async function without await and parallel

I have 10 links in my CSV which I'm trying to request all at the same time in a loop from the getTasks function. However, the way it works now, it sends a request to link 1, waits for it to complete, then link 2, and so on. I want all 10 links to run whenever startTask is called, giving roughly 10 requests per second.
Does anyone know how to do that with the code below? Thanks in advance.
import requests
from bs4 import BeautifulSoup
import asyncio

def getTasks(tasks):
    for task in tasks:
        asyncio.run(startTask(task))

async def startTask(task):
    success = await getProduct(task)
    if success is None:
        return startTask(task)

    success = await addToCart(task)
    if success is None:
        return startTask(task)
    ...
    ...
    ...

getTasks(tasks)
First of all, to send your requests concurrently, you should use aiohttp instead of the requests package, which blocks on I/O. You can also use asyncio's semaphore to limit the number of requests that run at the same time.
import asyncio
import aiohttp

# read links from CSV
links = [
    ...
]

semaphore = asyncio.BoundedSemaphore(10)
# 10 is the max count of concurrent tasks
# that can be processed at the same time.
# In this case, tasks are requests.

async def async_request(url):
    async with aiohttp.ClientSession() as session:
        async with semaphore, session.get(url) as response:
            return await response.text()

async def main():
    result = await asyncio.gather(*[
        async_request(link) for link in links
    ])
    print(result)  # [response1, response2, ...]

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
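As a side note, a variant of the above that reuses a single ClientSession for all requests (the practice recommended elsewhere on this page) might look like the sketch below. The links list is a stand-in for the URLs read from the CSV, and the semaphore is created inside the running coroutine:

import asyncio
import aiohttp

# stand-in for the URLs read from the CSV
links = ['https://httpbin.org/get'] * 10

async def async_request(session, semaphore, url):
    # the semaphore caps how many requests are in flight at once
    async with semaphore, session.get(url) as response:
        return await response.text()

async def main():
    semaphore = asyncio.BoundedSemaphore(10)
    # one session (and one connection pool) shared by all requests
    async with aiohttp.ClientSession() as session:
        return await asyncio.gather(*[
            async_request(session, semaphore, link) for link in links
        ])

if __name__ == "__main__":
    print(asyncio.run(main()))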

asyncio tasks using aiohttp.ClientSession

I'm using Python 3.7 and trying to make a crawler that can crawl multiple domains asynchronously. I'm using asyncio and aiohttp for this, but I'm experiencing problems with aiohttp.ClientSession. This is my reduced code:
import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        print(await response.text())

async def main():
    loop = asyncio.get_event_loop()
    async with aiohttp.ClientSession(loop=loop) as session:
        cwlist = [loop.create_task(fetch(session, url)) for url in ['http://python.org', 'http://google.com']]
        asyncio.gather(*cwlist)

if __name__ == "__main__":
    asyncio.run(main())
The exception thrown is this:
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=RuntimeError('Session is closed')>
What am I doing wrong here?
You forgot to await the asyncio.gather result:
async with aiohttp.ClientSession(loop=loop) as session:
    cwlist = [loop.create_task(fetch(session, url)) for url in ['http://python.org', 'http://google.com']]
    await asyncio.gather(*cwlist)
If you ever have an async with block that contains no await expressions, you should be fairly suspicious.

Handling async responses immediately

I need to repeatedly parse the content of one link. The synchronous way gives me 2-3 responses per second; I need it faster (yes, I know that too fast is bad too).
I found some async examples, but all of them show how to handle the results after all links have been fetched, whereas I need to parse each one immediately after receiving it, something like this. But this code doesn't give any speed improvement:
import aiohttp
import asyncio
import time

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    while True:
        async with aiohttp.ClientSession() as session:
            html = await fetch(session, 'https://example.com')
            print(time.time())
            #do_something_with_html(html)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
"but this code doesn't give any speed improvement"
asyncio (and async/concurrency in general) gives a speed improvement for I/O-bound work that can interleave. When everything you do is await something and you never create any parallel tasks (using asyncio.create_task(), asyncio.ensure_future(), etc.), you are basically doing classic synchronous programming :)
So, how to make the requests faster:
import aiohttp
import asyncio
import time

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def check_link(session):
    html = await fetch(session, 'https://example.com')
    print(time.time())
    #do_something_with_html(html)

async def main():
    async with aiohttp.ClientSession() as session:
        while True:
            asyncio.create_task(check_link(session))
            await asyncio.sleep(0.05)

asyncio.run(main())
Notice: the async with aiohttp.ClientSession() as session: must be above (outside) the while True: loop for this to work. Actually, having a single ClientSession() for all your requests is good practice anyway.
I gave up on async; threading solved my problem, thanks to this answer:
https://stackoverflow.com/a/23102874/5678457
from threading import Thread
import requests
import time

class myClassA(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
        self.start()

    def run(self):
        while True:
            r = requests.get('https://ex.com')
            print(r.status_code, time.time())

for i in range(5):
    myClassA()

Using Requests library to make asynchronous requests with Python 3.7

I need to make asynchronous requests using the Requests library. In Python 3.7, if I try from requests import async I get SyntaxError: invalid syntax.
async has become a reserved keyword in Python 3.7. How do I get around this situation?
Lukasa, who works on the requests library, said:
At the current time there are no plans to support async and await. This is not because they aren't a good idea: they are. It's because to use them requires quite substantial code changes.
Right now requests is a purely synchronous library that, at the bottom of its stack, uses httplib to send and receive data. We cannot move to an async model unless we replace httplib. The best we could do is provide a shorthand to run a request in a thread, but asyncio already has just such a shorthand, so I don't believe it would be valuable.
Right now I am quietly looking at whether we can rewrite requests to work just as well in a synchronous environment as in an async one. However, the reality is that doing so will be a lot of work, involving rewriting a lot of our stack, and may not happen for many years, if ever.
But don't worry, aiohttp is very similar to requests.
Here's an example.
import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'http://python.org')
        print(html)

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
You can use asyncio to run the blocking requests calls in a thread pool executor. Here is an example:
import asyncio
import requests

async def main():
    loop = asyncio.get_event_loop()
    futures = [
        loop.run_in_executor(
            None,
            requests.get,
            'http://example.org/'
        )
        for i in range(20)
    ]
    for response in await asyncio.gather(*futures):
        pass

loop = asyncio.get_event_loop()
loop.run_until_complete(main())

Aiohttp - should I keep the session alive 24/7 when polling restful api?

Sorry, library first-timer here. I am polling a RESTful endpoint every 10 seconds.
It's not obvious to me which of the following is appropriate:
import aiohttp
import asyncio

async def poll(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as r:
            return await r.text()

async def main():
    while True:
        await asyncio.sleep(10)
        print(await poll('http://example.com/api'))

loop = asyncio.get_event_loop()
loop.create_task(main())
loop.run_forever()
Or this, where the session variable persists for the whole run:
import aiohttp
import asyncio

async def poll(url):
    async with aiohttp.ClientSession() as session:
        await asyncio.sleep(10)
        async with session.get(url) as r:
            print(await r.text())

loop = asyncio.get_event_loop()
loop.create_task(poll('http://example.com/api'))
loop.run_forever()
I expect the latter is desirable, but coming from the non-asynchronous requests library, I'm not used to the idea of sessions. Will I actually experience faster response times because of connection pooling or other things?
From the official documentation:
Don't create a session per request. Most likely you need a session per application which performs all requests altogether.
A session contains a connection pool inside. Connection reusage and keep-alives (both are on by default) may speed up total performance.
So the latter is surely better, and you will definitely see faster responses thanks to connection reuse.
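A minimal sketch of that "session per application" pattern, assuming the same example URL and 10-second interval as the question:

import aiohttp
import asyncio

async def poll(session, url):
    async with session.get(url) as r:
        return await r.text()

async def main():
    # the session (and its connection pool) lives for the whole program,
    # so keep-alive connections can be reused across polls
    async with aiohttp.ClientSession() as session:
        while True:
            print(await poll(session, 'http://example.com/api'))
            await asyncio.sleep(10)

asyncio.run(main())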

Categories

Resources