aiohttp + uvloop parallel HTTP requests are slower than without uvloop - python

I'm writing a script to make millions of API calls in parallel.
I'm using Python 3.6 with aiohttp for this purpose.
I was expecting that uvloop would make it faster, but it seems to have made it slower. Am I doing something wrong?
with uvloop: 22 seconds
without uvloop: 15 seconds
import asyncio
import aiohttp
import uvloop
import time
import logging

from aiohttp import ClientSession, TCPConnector

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()

urls = ["http://www.yahoo.com", "http://www.bbcnews.com", "http://www.cnn.com", "http://www.buzzfeed.com",
        "http://www.walmart.com", "http://www.emirates.com", "http://www.kayak.com", "http://www.expedia.com",
        "http://www.apple.com", "http://www.youtube.com"]
bigurls = 10 * urls

def run(enable_uvloop):
    try:
        if enable_uvloop:
            loop = uvloop.new_event_loop()
        else:
            loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        start = time.time()
        conn = TCPConnector(limit=5000, use_dns_cache=True, loop=loop, verify_ssl=False)
        with ClientSession(connector=conn) as session:
            tasks = asyncio.gather(*[asyncio.ensure_future(do_request(url, session)) for url in bigurls])  # tasks to do
            results = loop.run_until_complete(tasks)  # loop until done
            end = time.time()
            logger.debug('total time:')
            logger.debug(end - start)
            return results
        loop.close()
    except Exception as e:
        logger.error(e, exc_info=True)

async def do_request(url, session):
    """
    """
    try:
        async with session.get(url) as response:
            resp = await response.text()
            return resp
    except Exception as e:
        logger.error(e, exc_info=True)

run(True)
# run(False)

aiohttp recommends using aiodns for DNS resolution.
Also, as I recall, this line should use async with: with ClientSession(connector=conn) as session:
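For reference, a minimal sketch combining both suggestions (assuming the aiodns package is installed; aiohttp's AsyncResolver uses it under the hood):

import asyncio
import aiohttp

async def fetch_all(urls):
    # AsyncResolver requires the aiodns package to be installed.
    resolver = aiohttp.AsyncResolver()
    conn = aiohttp.TCPConnector(limit=5000, resolver=resolver, use_dns_cache=True)
    # Note the async with: ClientSession is an async context manager.
    async with aiohttp.ClientSession(connector=conn) as session:
        async def fetch(url):
            async with session.get(url) as response:
                return await response.text()
        return await asyncio.gather(*(fetch(u) for u in urls))

# results = asyncio.get_event_loop().run_until_complete(fetch_all(bigurls))  # bigurls as defined in the question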

You're not alone; I actually just got similar results (which led me to google my findings and brought me here).
My experiment involves running 500 concurrent GET requests to Google.com using aiohttp.
Here is the code for reference:
import asyncio, aiohttp, concurrent.futures
from datetime import datetime
import uvloop

class UVloopTester():
    def __init__(self):
        self.timeout = 20
        self.threads = 500
        self.totalTime = 0
        self.totalRequests = 0

    @staticmethod
    def timestamp():
        return f'[{datetime.now().strftime("%H:%M:%S")}]'

    async def getCheck(self):
        async with aiohttp.ClientSession() as session:
            response = await session.get('https://www.google.com', timeout=self.timeout)
            response.close()
            await session.close()
            return True

    async def testRun(self, id):
        now = datetime.now()
        try:
            if await self.getCheck():
                elapsed = (datetime.now() - now).total_seconds()
                print(f'{self.timestamp()} Request {id} TTC: {elapsed}')
                self.totalTime += elapsed
                self.totalRequests += 1
        except concurrent.futures._base.TimeoutError:
            print(f'{self.timestamp()} Request {id} timed out')

    async def main(self):
        await asyncio.gather(*[asyncio.ensure_future(self.testRun(x)) for x in range(self.threads)])

    def start(self):
        # comment these lines to toggle
        uvloop.install()
        asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
        loop = asyncio.get_event_loop()
        now = datetime.now()
        loop.run_until_complete(self.main())
        elapsed = (datetime.now() - now).total_seconds()
        print(f'{self.timestamp()} Main TTC: {elapsed}')
        print()
        print(f'{self.timestamp()} Average TTC per Request: {self.totalTime / self.totalRequests}')
        if len(asyncio.Task.all_tasks()) > 0:
            for task in asyncio.Task.all_tasks():
                task.cancel()
            try:
                loop.run_until_complete(asyncio.gather(*asyncio.Task.all_tasks()))
            except asyncio.CancelledError:
                pass
        loop.close()

test = UVloopTester()
test.start()
I haven't planned out and executed any sort of careful experiment where I'm logging my findings and calculating standard deviations and p-values. But I have run this a (tiring) number of times and have come up with the following results.
Running without uvloop:
loop.run_until_complete(main()) takes about 10 seconds.
The average time to complete per request is about 4 seconds.
Running with uvloop:
loop.run_until_complete(main()) takes about 16 seconds.
The average time to complete per request is about 8.5 seconds.
I've shared this code with a friend of mine who is actually the one who suggested I try uvloop (since he gets a speed boost from it). Upon running it several times, his results confirm that he does in fact see an increase in speed from using uvloop (shorter time to complete for both main() and requests on average).
This leads me to believe that the differences in our findings have to do with our setups: I'm using a Debian virtual machine with 8 GB of RAM on a mid-tier laptop, while he's using a native Linux desktop with a lot more 'muscle' under the hood.
My answer to your question is: no, I do not believe you are doing anything wrong, because I am experiencing the same results and do not appear to be doing anything wrong either; any constructive criticism is welcome and appreciated.
I wish I could be of more help; I hope my chiming in can be of some use.

I tried a similar experiment and see no real difference between the uvloop and asyncio event loops for parallel HTTP GETs:
asyncio event loop: avg=3.6285968542099 s. stdev=0.5583842811362075 s.
uvloop event loop: avg=3.419699764251709 s. stdev=0.13423859428541632 s.
It might be that the noticeable benefits of uvloop come into play when it is used in server code, i.e. for handling many incoming requests.
Code:
import time
from statistics import mean, stdev
import asyncio
import uvloop
import aiohttp

urls = [
    'https://aws.amazon.com', 'https://google.com', 'https://microsoft.com', 'https://www.oracle.com/index.html',
    'https://www.python.org', 'https://nodejs.org', 'https://angular.io', 'https://www.djangoproject.com',
    'https://reactjs.org', 'https://www.mongodb.com', 'https://reinvent.awsevents.com',
    'https://kafka.apache.org', 'https://github.com', 'https://slack.com', 'https://authy.com',
    'https://cnn.com', 'https://fox.com', 'https://nbc.com', 'https://www.aljazeera.com',
    'https://fly4.emirates.com', 'https://www.klm.com', 'https://www.china-airlines.com',
    'https://en.wikipedia.org/wiki/List_of_Unicode_characters', 'https://en.wikipedia.org/wiki/Windows-1252'
]

def timed(func):
    async def wrapper():
        start = time.time()
        await func()
        return time.time() - start
    return wrapper

@timed
async def main():
    conn = aiohttp.TCPConnector(use_dns_cache=False)
    async with aiohttp.ClientSession(connector=conn) as session:
        coroutines = [fetch(session, url) for url in urls]
        await asyncio.gather(*coroutines)

async def fetch(session, url):
    async with session.get(url) as resp:
        await resp.text()

asyncio_results = [asyncio.run(main()) for i in range(10)]
print(f'asyncio event loop: avg={mean(asyncio_results)} s. stdev={stdev(asyncio_results)} s.')

# Change to uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
uvloop_results = [asyncio.run(main()) for i in range(10)]
print(f'uvloop event loop: avg={mean(uvloop_results)} s. stdev={stdev(uvloop_results)} s.')
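To illustrate the server-side scenario mentioned above, here is a minimal sketch (names and port are illustrative) of a plain asyncio echo server running under uvloop; this is the kind of workload where uvloop's faster loop and protocol implementations are usually reported to matter most:

import asyncio
import uvloop

async def handle(reader, writer):
    # Echo a single line back to the client and close the connection.
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, '127.0.0.1', 8888)
    async with server:
        await server.serve_forever()

uvloop.install()  # make uvloop the default event loop
asyncio.run(main())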

Related

Running blocking function (f.e. requests) concurrently but asynchronous with Python

There is a function that blocks the event loop (e.g. it makes an API request). I need to make a continuous stream of requests that run in parallel rather than sequentially, so each new request is started before the previous one has finished.
I found an answered question suggesting the loop.run_in_executor() solution and used it to start with:
import asyncio
import requests

# blocking_request_func() defined somewhere

async def main():
    loop = asyncio.get_event_loop()
    future1 = loop.run_in_executor(None, blocking_request_func, 'param')
    future2 = loop.run_in_executor(None, blocking_request_func, 'param')
    response1 = await future1
    response2 = await future2
    print(response1)
    print(response2)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
This works well and the requests run in parallel, but there is a problem for my task: in this example we build the whole group of tasks/futures up front and then run that group at once. What I need is something like this:
1. Sending request_1 and not awaiting when it's done.
(AFTER step 1 but NOT in the same time when step 1 starts):
2. Sending request_2 and not awaiting when it's done.
(AFTER step 2 but NOT in the same time when step 2 starts):
3. Sending request_3 and not awaiting when it's done.
(Request 1(or any other) gives the response)
(AFTER step 3 but NOT in the same time when step 3 starts):
4. Sending request_4 and not awaiting when it's done.
(Request 2(or any other) gives the response)
and so on...
I tried using asyncio.TaskGroup():
async def request_func():
    global result  # the list of request results, defined somewhere in the global area
    loop = asyncio.get_event_loop()
    result.append(await loop.run_in_executor(None, blocking_request_func, 'param'))
    await asyncio.sleep(0)  # adding or removing this line gives the same result

async def main():
    async with asyncio.TaskGroup() as tg:
        for i in range(0, 10):
            tg.create_task(request_func())
All these approaches gave the same result: first the whole group of tasks/futures is defined, and only then is that group run concurrently. But is there a way to run these requests concurrently "in a stream", adding new ones while earlier ones are still in flight?
I made a visualization in case my explanation is not clear enough (two diagrams: "What I have for now" and "What I need").
================ Update with the answer ===================
The closest answer so far, though with some limitations:
import asyncio
import concurrent.futures
import random
import time

def blockme(n):
    x = random.random() * 2.0
    time.sleep(x)
    return n, x

def cb(fut):
    print("Result", fut.result())

async def main():
    # You need to control the number of threads
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
    loop = asyncio.get_event_loop()
    futs = []
    n = 0
    # You need to control requests per second
    delay = 0.5
    while await asyncio.sleep(delay, result=True):
        fut = loop.run_in_executor(pool, blockme, n)
        fut.add_done_callback(cb)
        futs.append(fut)
        n += 1
        # You need to control the number of pending futures, e.g. like this:
        if len(futs) > 40:
            completed, pending = await asyncio.wait(futs,
                                                    timeout=5,
                                                    return_when=asyncio.FIRST_COMPLETED)
            futs = list(pending)

asyncio.run(main())
I think this might be what you want. You don't have to await each request - the run_in_executor function returns a Future. Instead of awaiting that, you can attach a callback function:
import asyncio
import random
import time

def blockme(n):
    x = random.random() * 2.0
    time.sleep(x)
    return n, x

def cb(fut):
    print("Result", fut.result())

async def main():
    loop = asyncio.get_event_loop()
    futs = []
    for n in range(20):
        fut = loop.run_in_executor(None, blockme, n)
        fut.add_done_callback(cb)
        futs.append(fut)
    await asyncio.gather(*futs)
    # await asyncio.sleep(10)

asyncio.run(main())
All the requests are started at the beginning, but they don't all execute in parallel because the number of threads is limited by the ThreadPool. You can adjust the number of threads if you want.
Here I simulated a blocking call with time.sleep. I needed a way to prevent main() from ending before all the callbacks occurred, so I used gather for that purpose. You can also wait for some length of time, but gather is cleaner.
Apologies if I don't understand what you want. But I think you want to avoid using await for each call, and I tried to show one way you can do that.
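As a follow-up to the note about adjusting the number of threads, here is a minimal sketch (the max_workers value of 4 is just an illustrative choice) that passes an explicit ThreadPoolExecutor to run_in_executor instead of relying on the default executor:

import asyncio
import concurrent.futures
import random
import time

def blockme(n):
    x = random.random() * 2.0
    time.sleep(x)
    return n, x

async def main():
    loop = asyncio.get_event_loop()
    # At most 4 blocking calls run at any one time; the rest queue up in the pool.
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        futs = [loop.run_in_executor(pool, blockme, n) for n in range(20)]
        for fut in asyncio.as_completed(futs):
            print("Result", await fut)

asyncio.run(main())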
This is taken directly from the Python documentation. The snippet from the asyncio library documentation shows how you can run blocking code concurrently using asyncio; it uses the to_thread() method to hand the blocking call off to a separate thread.
You can find more here: https://docs.python.org/3/library/asyncio-task.html#running-in-threads
import asyncio
import time

def blocking_io():
    print(f"start blocking_io at {time.strftime('%X')}")
    # Note that time.sleep() can be replaced with any blocking
    # IO-bound operation, such as file operations.
    time.sleep(1)
    print(f"blocking_io complete at {time.strftime('%X')}")

async def main():
    print(f"started main at {time.strftime('%X')}")
    await asyncio.gather(
        asyncio.to_thread(blocking_io),
        asyncio.sleep(1))
    print(f"finished main at {time.strftime('%X')}")

asyncio.run(main())
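Applied to the original question, a minimal sketch (assuming blocking_request_func exists as in the question) might look like the following; to_thread() is available on Python 3.9+ and is essentially a convenience wrapper around run_in_executor() with the default thread pool:

import asyncio

# blocking_request_func('param') is assumed to be defined as in the question.
async def main():
    results = await asyncio.gather(
        *(asyncio.to_thread(blocking_request_func, 'param') for _ in range(10))
    )
    print(results)

asyncio.run(main())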

python for each run async function without await and parallel

I have 10 links in my CSV that I'm trying to run all at the same time in a loop from the getTasks function. However, the way it works now, it sends a request to link 1, waits for it to complete, then moves on to link 2, and so on. I want all 10 links to run whenever startTask is called, leading to 10 requests a second.
Anyone know how to code that using the code below? Thanks in advance.
import requests
from bs4 import BeautifulSoup
import asyncio

def getTasks(tasks):
    for task in tasks:
        asyncio.run(startTask(task))

async def startTask(task):
    success = await getProduct(task)
    if success is None:
        return startTask(task)
    success = await addToCart(task)
    if success is None:
        return startTask(task)
    ...
    ...
    ...

getTasks(tasks)
First of all, to send your requests concurrently, you should use aiohttp instead of the requests package, which blocks on I/O. Then use an asyncio semaphore to limit the number of concurrent tasks at any one time.
import asyncio
import aiohttp

# read links from CSV
links = [
    ...
]

semaphore = asyncio.BoundedSemaphore(10)
# 10 is the max count of concurrent tasks
# that can be processed at the same time.
# In this case, tasks are requests.

async def async_request(url):
    async with aiohttp.ClientSession() as session:
        async with semaphore, session.get(url) as response:
            return await response.text()

async def main():
    result = await asyncio.gather(*[
        async_request(link) for link in links
    ])
    print(result)  # [response1, response2, ...]

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()
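As a side note, on Python 3.7+ the manual loop handling at the end can be replaced with asyncio.run(main()). If you do that on versions before 3.10, it is safest to create the semaphore inside main() rather than at module level, because older asyncio synchronization primitives bind to the event loop that exists when they are constructed. A minimal sketch of that variant (async_request here takes the semaphore as a parameter, a small change from the answer above):

import asyncio
import aiohttp

links = []  # fill with links read from the CSV, as above

async def async_request(url, semaphore):
    async with aiohttp.ClientSession() as session:
        async with semaphore, session.get(url) as response:
            return await response.text()

async def main():
    semaphore = asyncio.BoundedSemaphore(10)  # at most 10 requests in flight
    return await asyncio.gather(*(async_request(link, semaphore) for link in links))

if __name__ == "__main__":
    print(asyncio.run(main()))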

Python Requests : How to send many post requests in the same time wait response the first and second

_1 = requests.post(logUrl, data=userDayta, headers=logHead)
I want to send many POST requests like this at the same time.
Here are two methods that work:
The reason I posted both methods in full is that the examples given on the main website throw "RuntimeError: Event loop is closed" messages, whereas both of these work.
Method 1: fewer lines of code, but a longer run time (6.5 seconds):
import aiohttp
import asyncio
import time

start_time = time.time()

async def main():
    async with aiohttp.ClientSession() as session:
        for number in range(1, 151):
            pokemon_url = f'https://pokeapi.co/api/v2/pokemon/{number}'
            async with session.get(pokemon_url) as resp:
                pokemon = await resp.json()
                print(pokemon['name'])

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
# Wait 250 ms for the underlying SSL connections to close
loop.run_until_complete(asyncio.sleep(0.250))
loop.close()
Method 2: more code, but shorter run time (1.5 seconds):
import aiohttp
import asyncio
import time

start_time = time.time()

async def get_pokemon(session, url):
    async with session.get(url) as resp:
        pokemon = await resp.json()
        return pokemon['name']

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = []
        for number in range(1, 151):
            url = f'https://pokeapi.co/api/v2/pokemon/{number}'
            tasks.append(asyncio.ensure_future(get_pokemon(session, url)))
        original_pokemon = await asyncio.gather(*tasks)
        for pokemon in original_pokemon:
            print(pokemon)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
# Wait 250 ms for the underlying SSL connections to close
loop.run_until_complete(asyncio.sleep(0.250))
loop.close()
Both methods are considerably faster than the equivalent synchronous code!
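As a side note on the "RuntimeError: Event loop is closed" messages mentioned above: on Windows this is commonly caused by the default proactor event loop tearing down transports after asyncio.run() has already closed the loop. If that is the situation, one possible workaround is to switch to the selector event loop before running, which lets Method 2 use asyncio.run() directly (a sketch, only relevant on Windows):

import sys
import asyncio

if sys.platform.startswith('win'):
    # Use the selector loop to avoid "Event loop is closed" noise from
    # proactor transports being finalized after the loop has closed.
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

asyncio.run(main())  # main() as defined in Method 2 above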

Python aiohttp ClientSession requests memory leak?

I believe I have unearthed a memory leak in my long-lived application when using aiohttp ClientSession requests. If each coroutine which makes a request is awaited sequentially, then all seems fine. However there seems to be a leak of request context manager objects when run concurrently.
Please consider the following example code:
import logging
import tracemalloc
import asyncio
import aiohttp

async def log_allocations_coro():
    while True:
        await asyncio.sleep(120)
        snapshot = tracemalloc.take_snapshot()
        top_stats = snapshot.statistics('lineno')
        str_list = [str(x) for x in top_stats[:5]]
        logging.info("\n".join(str_list))

async def do_request():
    try:
        async with session.request("GET", "http://192.168.1.1") as response:
            text = await response.text()
    except:
        logging.exception("Request failed")

async def main():
    tracemalloc.start()
    asyncio.ensure_future(log_allocations_coro())
    timeout = aiohttp.ClientTimeout(total=1)
    global session
    session = aiohttp.ClientSession(timeout=timeout)
    while True:
        tasks = [do_request(), do_request()]
        await asyncio.gather(*tasks)
        await asyncio.sleep(2)

if __name__ == '__main__':
    logging.basicConfig(format='%(asctime)s %(message)s', level=logging.INFO)
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
The tracemalloc coroutine logs memory allocations every two minutes. It shows the allocation count in aiohttp/client.py, where request() returns a _RequestContextManager, increasing over time: quickly at first, then more slowly, until it reaches a peak and seems to stay fairly stable.
However, I have then observed that if there is a network problem and requests start to fail, the count ramps back up again and doesn't come back down after the problem has been resolved.
Is this a leak? If so, is there a way to work around it?
Thanks for reading!
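One way to make that growth easier to read, for what it's worth, is to compare successive snapshots instead of logging absolute top statistics; this is plain tracemalloc usage, not anything specific to aiohttp (a sketch that could replace the logging coroutine above):

import asyncio
import logging
import tracemalloc

async def log_allocation_growth_coro():
    previous = tracemalloc.take_snapshot()
    while True:
        await asyncio.sleep(120)
        current = tracemalloc.take_snapshot()
        # Show which source lines gained the most memory since the last snapshot.
        top_diffs = current.compare_to(previous, 'lineno')[:5]
        logging.info("\n".join(str(d) for d in top_diffs))
        previous = current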

limit number of concurrent requests aiohttp

I'm downloading images using aiohttp, and was wondering if there is a way to limit the number of open requests that haven't finished. This is the code I currently have:
import asyncio
import aiohttp

async def get_images(url, session):
    chunk_size = 100
    # Print statement to show when a request is being made.
    print(f'Making request to {url}')
    async with session.get(url=url) as r:
        with open('path/name.png', 'wb') as file:
            while True:
                chunk = await r.content.read(chunk_size)
                if not chunk:
                    break
                file.write(chunk)

# List of urls to get images from
urls = [...]

conn = aiohttp.TCPConnector(limit=3)
loop = asyncio.get_event_loop()
session = aiohttp.ClientSession(connector=conn, loop=loop)
loop.run_until_complete(asyncio.gather(*(get_images(url, session=session) for url in urls)))
The problem is, I added a print statement to show me when each request is made, and it appears to make almost 21 requests at once instead of the 3 I want to limit it to (i.e., once an image is done downloading, it should move on to the next url in the list). I'm just wondering what I am doing wrong here.
Your limit setting works correctly; you made a mistake while debugging.
As Mikhail Gerasimov pointed out in the comments, you put your print() call in the wrong place: it must be inside the session.get() context.
To confirm that the limit is respected, I tested your code against a simple logging server, and the test shows that the server receives exactly the number of connections you set in the TCPConnector. Here is the test:
import asyncio
import aiohttp

loop = asyncio.get_event_loop()

class SilentServer(asyncio.Protocol):
    def connection_made(self, transport):
        # We will know when the connection is actually made:
        print('SERVER |', transport.get_extra_info('peername'))

async def get_images(url, session):
    chunk_size = 100
    # This log doesn't guarantee that we will connect,
    # session.get() will freeze if you reach TCPConnector limit
    print(f'CLIENT | Making request to {url}')
    async with session.get(url=url) as r:
        while True:
            chunk = await r.content.read(chunk_size)
            if not chunk:
                break

urls = [f'http://127.0.0.1:1337/{x}' for x in range(20)]

conn = aiohttp.TCPConnector(limit=3)
session = aiohttp.ClientSession(connector=conn, loop=loop)

async def test():
    await loop.create_server(SilentServer, '127.0.0.1', 1337)
    await asyncio.gather(*(get_images(url, session=session) for url in urls))

loop.run_until_complete(test())
asyncio.Semaphore solves exactly this issue.
In your case it'll be something like this:
semaphore = asyncio.Semaphore(3)

async def get_images(url, session):
    async with semaphore:
        print(f'Making request to {url}')
        # ...
You may also be interested to take a look at this ready-to-run code example that demonstrates how semaphore works.
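Putting the pieces together, a minimal self-contained sketch of the semaphore approach applied to the original download loop (the file-name scheme and the urls list are placeholders):

import asyncio
import aiohttp

async def get_image(url, session, semaphore, index):
    async with semaphore:  # at most 3 downloads in flight at once
        print(f'Making request to {url}')
        async with session.get(url) as r:
            data = await r.read()
        with open(f'image_{index}.png', 'wb') as file:  # placeholder naming scheme
            file.write(data)

async def main(urls):
    semaphore = asyncio.Semaphore(3)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(get_image(url, session, semaphore, i)
                               for i, url in enumerate(urls)))

# asyncio.run(main([...]))  # pass the list of image urls here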
