Python aiohttp ClientSession requests memory leak?

I believe I have unearthed a memory leak in my long-lived application when using aiohttp ClientSession requests. If each coroutine which makes a request is awaited sequentially, then all seems fine. However, there seems to be a leak of request context manager objects when the requests are run concurrently.
Please consider the following example code:
import logging
import tracemalloc
import asyncio
import aiohttp

async def log_allocations_coro():
    while True:
        await asyncio.sleep(120)
        snapshot = tracemalloc.take_snapshot()
        top_stats = snapshot.statistics('lineno')
        str_list = [str(x) for x in top_stats[:5]]
        logging.info("\n".join(str_list))

async def do_request():
    try:
        async with session.request("GET", "http://192.168.1.1") as response:
            text = await response.text()
    except:
        logging.exception("Request failed")

async def main():
    tracemalloc.start()
    asyncio.ensure_future(log_allocations_coro())
    timeout = aiohttp.ClientTimeout(total=1)
    global session
    session = aiohttp.ClientSession(timeout=timeout)
    while True:
        tasks = [do_request(), do_request()]
        await asyncio.gather(*tasks)
        await asyncio.sleep(2)

if __name__ == '__main__':
    logging.basicConfig(format='%(asctime)s %(message)s', level=logging.INFO)
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
The tracemalloc coroutine logs memory allocations every two minutes. It shows the count of allocations in aiohttp/client.py, where request() returns a _RequestContextManager, increasing over time: quickly at first, then more slowly, until it reaches a peak and stays fairly stable.
However, I have also observed that if there is a network problem and requests start to fail, the count ramps back up again, and it doesn't come back down after the problem has been resolved.
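One thing worth ruling out before calling it a leak: objects that are merely awaiting garbage collection will still show up in a snapshot. A minimal tweak to the logging coroutine above (a diagnostic sketch, not a fix) forces a collection first, so tracemalloc only reports allocations that are still reachable:

import gc

async def log_allocations_coro():
    while True:
        await asyncio.sleep(120)
        gc.collect()  # drop anything that is only awaiting collection
        snapshot = tracemalloc.take_snapshot()
        top_stats = snapshot.statistics('lineno')
        logging.info("\n".join(str(x) for x in top_stats[:5]))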
Is this a leak? If so, is there a way to work around it?
Thanks for reading!

Related

python for each run async function without await and parallel

I have 10 links in my CSV which I'm trying to run all at the same time in a loop from the getTasks function. However, the way it's working now, it sends a request to link 1, waits for it to complete, then link 2, and so on. I want all 10 of my links to run whenever startTask is called, leading to 10 requests a second.
Anyone know how to code that using the code below? Thanks in advance.
import requests
from bs4 import BeautifulSoup
import asyncio

def getTasks(tasks):
    for task in tasks:
        asyncio.run(startTask(task))

async def startTask(task):
    success = await getProduct(task)
    if success is None:
        return startTask(task)
    success = await addToCart(task)
    if success is None:
        return startTask(task)
    ...
    ...
    ...

getTasks(tasks)
First of all, to send your requests concurrently you should use aiohttp instead of the requests package, which blocks on I/O. Use asyncio's semaphore to limit the number of concurrent requests in flight at the same time.
import asyncio
import aiohttp

# read links from CSV
links = [
    ...
]

semaphore = asyncio.BoundedSemaphore(10)
# 10 is the max count of concurrent tasks
# that can be processed at the same time.
# In this case, tasks are requests.

async def async_request(url):
    async with aiohttp.ClientSession() as session:
        async with semaphore, session.get(url) as response:
            return await response.text()

async def main():
    result = await asyncio.gather(*[
        async_request(link) for link in links
    ])
    print(result)  # [response1, response2, ...]

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
    loop.close()

get live price in milliseconds (Binance Websocket)

How can I change my code so that I get the information every 100 milliseconds?
import asyncio
from binance import AsyncClient, BinanceSocketManager

async def main():
    client = await AsyncClient.create()
    bm = BinanceSocketManager(client)
    # start any sockets here, i.e a trade socket
    ts = bm.trade_socket('BTCBUSD')
    # then start receiving messages
    async with ts as tscm:
        while True:
            res = await tscm.recv()
            print(res)
    await client.close_connection()

if __name__ == "__main__":
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())
I appreciate every answer I can get, thanks a lot!
You asked about "how to get live price in milliseconds (Binance Websocket)"
Here is a list of available streams on binance with the available options for "update speed": https://github.com/binance/binance-spot-api-docs/blob/master/web-socket-streams.md#detailed-stream-information
As far as I understand, most stream types update once a second (update speed: 1000ms).
Only depth streams can update 10 times per second (1000ms and 100ms intervals).
Trade streams (agg and normal) and bookTicker (individual and all) are in real time. That means, as soon as this info is available, you will receive it...
If there is no trade, you will not receive anything, because there is no change in price... but as soon as a trade happens, it will get reported to you.
If you want to know the current best buy and sell price for an asset, you can use the bookTicker, which is much less data compared to the depth and diff. depth streams... If you need more than the first positions of the current order book, I recommend using a local depth cache: https://www.lucit.tech/unicorn-binance-local-depth-cache.html
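For example, the diff. depth stream can be subscribed at the 100ms update speed by selecting it in the stream name. A minimal sketch using the websockets package (the same library as the answers below), assuming raw diff. depth updates are acceptable for your use case:

import asyncio
import json
import websockets

async def depth_100ms():
    # Diff. depth stream with the 100ms update speed selected in the stream name.
    url = "wss://stream.binance.com:9443/ws/btcusdt@depth@100ms"
    async with websockets.connect(url) as ws:
        while True:
            update = json.loads(await ws.recv())
            print(update)

asyncio.get_event_loop().run_until_complete(depth_100ms())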
To get a stable websocket connection I recommend using the UNICORN Binance WebSocket API. It catches most exceptions and reconnects automatically after a disconnect, it uses asyncio internally (the callback function runs inside an event loop), and the syntax is easy:
from unicorn_binance_websocket_api.manager import BinanceWebSocketApiManager

def process_new_receives(stream_data, stream_buffer_name=False):
    print(str(stream_data))

ubwa = BinanceWebSocketApiManager(exchange="binance.com")
ubwa.create_stream('trade',
                   ['ethbtc', 'btcusdt', 'bnbbtc', 'ethbtc'],
                   process_stream_data=process_new_receives)
Since you seem to be in a rush, below is what I use, although I'm using the websockets library to make the calls. I'll take a look at the Binance API when I have some more time to see if I can get the calls to be faster, but this should hopefully achieve what you want.
You can change the delay between requests by changing the sleep time in await asyncio.sleep(0.5), but if you put it any lower than 0.5 seconds it will trigger an error: received 1008 (policy violation) Too many requests; then sent 1008 (policy violation) Too many requests.
import asyncio
import websockets
import json

msg = {"method": "SUBSCRIBE",
       "params": ["btcusdt@depth"],
       "id": 1}

async def call_api():
    async with websockets.connect('wss://stream.binance.com:9443/ws/btcusdt@depth') as ws:
        while True:
            await ws.send(json.dumps(msg))
            response = await asyncio.wait_for(ws.recv(), timeout=2)
            response = json.loads(response)
            print(response)
            await asyncio.sleep(0.5)

asyncio.get_event_loop().run_until_complete(call_api())
Try this out:
import asyncio
import websockets
import json

async def hello():
    async with websockets.connect("wss://stream.binance.com:9443/ws/btcusdt@bookTicker") as ws:
        while True:
            response = await asyncio.wait_for(ws.recv(), timeout=2)
            response = json.loads(response)
            print(response)
            await asyncio.sleep(0.5)

asyncio.get_event_loop().run_until_complete(hello())

No speedup using asyncio despite awaiting API response

I am running a program that makes three different requests to a REST API. The data, indicator, and request functions all fetch data from BitMEX's API using a wrapper I've made.
I have used asyncio to try to speed up the process so that while I am waiting on a response from the previous request, it can begin to make another one.
However, my asynchronous version is not running any quicker for some reason. The code works and, as far as I know, I have set everything up correctly. But could there be something wrong with how I am setting up the coroutines?
Here is the asynchronous version:
import time
import asyncio
from bordemwrapper import BitMEXData, BitMEXFunctions

'''
asynchronous I/O
'''

async def data():
    data = BitMEXData().get_ohlcv(symbol='XBTUSD', timeframe='1h',
                                  instances=25)
    await asyncio.sleep(0)
    return data

async def indicator():
    indicator = BitMEXData().get_indicator(symbol='XBTUSD',
        timeframe='1h', indicator='RSI', period=20, source='close',
        instances=25)
    await asyncio.sleep(0)
    return indicator

async def request():
    request = BitMEXFunctions().get_price()
    await asyncio.sleep(0)
    return request

async def chain():
    data_ = await data()
    indicator_ = await indicator()
    request_ = await request()
    return data_, indicator_, request_

async def main():
    await asyncio.gather(chain())

if __name__ == '__main__':
    start = time.perf_counter()
    asyncio.run(main())
    end = time.perf_counter()
    print('process finished in {} seconds'.format(end - start))
Unfortunately, asyncio isn't magic. Although you've put them in async functions, the BitMEXData().get_<foo> functions are not themselves async (i.e. you can't await them), and therefore block while they run. The concurrency in asyncio can only occur while awaiting something.
You'll need a library which makes the actual HTTP requests asynchronously, like aiohttp. It sounds like you wrote bordemwrapper yourself - you should rewrite the get_<foo> functions to use asynchronous HTTP requests. Feel free to submit a separate question if you need help with that.
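A minimal sketch of the shape such a rewrite could take. The endpoint paths and parameters below are placeholders for illustration only, not bordemwrapper's or BitMEX's actual interface:

import asyncio
import aiohttp

BASE = 'https://www.bitmex.com/api/v1'  # hypothetical base URL for illustration

async def fetch(session, path, params=None):
    async with session.get(BASE + path, params=params) as resp:
        return await resp.json()

async def main():
    async with aiohttp.ClientSession() as session:
        # The three requests now overlap: each await yields control while it
        # waits for the network, so the other requests can make progress.
        ohlcv, indicator, price = await asyncio.gather(
            fetch(session, '/ohlcv', {'symbol': 'XBTUSD'}),      # placeholder path
            fetch(session, '/indicator', {'symbol': 'XBTUSD'}),  # placeholder path
            fetch(session, '/price', {'symbol': 'XBTUSD'}),      # placeholder path
        )
        return ohlcv, indicator, price

asyncio.run(main())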

Aiohttp - should I keep the session alive 24/7 when polling restful api?

Sorry, library first-timer here. I am polling a restful endpoint every 10 seconds.
It's not obvious to me which of the following is appropriate:
import aiohttp
import asyncio

async def poll(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as r:
            return await r.text()

async def main():
    while True:
        await asyncio.sleep(10)
        print(await poll('http://example.com/api'))

loop = asyncio.get_event_loop()
loop.create_task(main())
loop.run_forever()
Or the session variable persists forever:
import aiohttp
import asyncio

async def poll(url):
    async with aiohttp.ClientSession() as session:
        while True:
            await asyncio.sleep(10)
            async with session.get(url) as r:
                print(await r.text())

loop = asyncio.get_event_loop()
loop.create_task(poll('http://example.com/api'))
loop.run_forever()
I expect the latter is desirable, but coming from the non-asynchronous requests library, I'm not used to the idea of sessions. Will I actually experience faster response times because of connection pooling or other things?
From the official documentation:
Don’t create a session per request. Most likely you need a
session per application which performs all requests altogether.
A session contains a connection pool inside. Connection reusage and
keep-alives (both are on by default) may speed up total performance.
Surely the latter one is better, and you will definitely have a faster experience.
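A minimal sketch of that pattern, with the polling loop moved inside the coroutine so a single session (and its connection pool) lives for the whole run:

import aiohttp
import asyncio

async def main():
    # One session for the life of the application; its connection pool
    # keeps connections alive between polls.
    async with aiohttp.ClientSession() as session:
        while True:
            async with session.get('http://example.com/api') as r:
                print(await r.text())
            await asyncio.sleep(10)

asyncio.run(main())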

aiohttp + uvloop parallel HTTP requests are slower than without uvloop

I'm writing a script to make millions of API calls in parallel.
I'm using Python 3.6 with aiohttp for this purpose.
I was expecting that uvloop would make it faster, but it seems to have made it slower. Am I doing something wrong?
with uvloop: 22 seconds
without uvloop: 15 seconds
import asyncio
import aiohttp
import uvloop
import time
import logging
from aiohttp import ClientSession, TCPConnector

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger()

urls = ["http://www.yahoo.com", "http://www.bbcnews.com", "http://www.cnn.com", "http://www.buzzfeed.com", "http://www.walmart.com", "http://www.emirates.com", "http://www.kayak.com", "http://www.expedia.com", "http://www.apple.com", "http://www.youtube.com"]
bigurls = 10 * urls

def run(enable_uvloop):
    try:
        if enable_uvloop:
            loop = uvloop.new_event_loop()
        else:
            loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        start = time.time()
        conn = TCPConnector(limit=5000, use_dns_cache=True, loop=loop, verify_ssl=False)
        with ClientSession(connector=conn) as session:
            tasks = asyncio.gather(*[asyncio.ensure_future(do_request(url, session)) for url in bigurls])  # tasks to do
            results = loop.run_until_complete(tasks)  # loop until done
            end = time.time()
            logger.debug('total time:')
            logger.debug(end - start)
            return results
        loop.close()
    except Exception as e:
        logger.error(e, exc_info=True)

async def do_request(url, session):
    """
    """
    try:
        async with session.get(url) as response:
            resp = await response.text()
            return resp
    except Exception as e:
        logger.error(e, exc_info=True)

run(True)
# run(False)
aiohttp recommends using aiodns.
Also, as I remember, this line: with ClientSession(connector=conn) as session: should be async with.
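A minimal sketch of what that correction could look like, assuming the rest of run() stays the same (fetch_all is a helper name introduced here for illustration; entering ClientSession with async with means the surrounding code must itself be a coroutine):

async def fetch_all(loop):
    conn = TCPConnector(limit=5000, use_dns_cache=True, loop=loop, verify_ssl=False)
    async with ClientSession(connector=conn) as session:
        # Gather all requests while the session is still open.
        return await asyncio.gather(*(do_request(url, session) for url in bigurls))

# inside run():
# results = loop.run_until_complete(fetch_all(loop))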
You're not alone; I actually just got similar results (which led me to google my findings and brought me here).
My experiment involves running 500 concurrent GET requests to Google.com using aiohttp.
Here is the code for reference:
import asyncio, aiohttp, concurrent.futures
from datetime import datetime
import uvloop

class UVloopTester():
    def __init__(self):
        self.timeout = 20
        self.threads = 500
        self.totalTime = 0
        self.totalRequests = 0

    @staticmethod
    def timestamp():
        return f'[{datetime.now().strftime("%H:%M:%S")}]'

    async def getCheck(self):
        async with aiohttp.ClientSession() as session:
            response = await session.get('https://www.google.com', timeout=self.timeout)
            response.close()
            await session.close()
        return True

    async def testRun(self, id):
        now = datetime.now()
        try:
            if await self.getCheck():
                elapsed = (datetime.now() - now).total_seconds()
                print(f'{self.timestamp()} Request {id} TTC: {elapsed}')
                self.totalTime += elapsed
                self.totalRequests += 1
        except concurrent.futures._base.TimeoutError:
            print(f'{self.timestamp()} Request {id} timed out')

    async def main(self):
        await asyncio.gather(*[asyncio.ensure_future(self.testRun(x)) for x in range(self.threads)])

    def start(self):
        # comment these lines to toggle
        uvloop.install()
        asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
        loop = asyncio.get_event_loop()
        now = datetime.now()
        loop.run_until_complete(self.main())
        elapsed = (datetime.now() - now).total_seconds()
        print(f'{self.timestamp()} Main TTC: {elapsed}')
        print()
        print(f'{self.timestamp()} Average TTC per Request: {self.totalTime / self.totalRequests}')
        if len(asyncio.Task.all_tasks()) > 0:
            for task in asyncio.Task.all_tasks():
                task.cancel()
            try:
                loop.run_until_complete(asyncio.gather(*asyncio.Task.all_tasks()))
            except asyncio.CancelledError:
                pass
        loop.close()

test = UVloopTester()
test.start()
I haven't planned out and executed any sort of careful experiment where I'm logging my findings and calculating standard deviations and p-values. But I have run this a (tiring) number of times and have come up with the following results.
Running without uvloop:
loop.run_until_complete(main()) takes about 10 seconds.
average time to complete for request takes about 4 seconds.
Running with uvloop:
loop.run_until_complete(main()) takes about 16 seconds.
average time to complete for request takes about 8.5 seconds.
I've shared this code with a friend of mine who is actually the one who suggested I try uvloop (since he gets a speed boost from it). Upon running it several times, his results confirm that he does in fact see an increase in speed from using uvloop (shorter time to complete for both main() and requests on average).
This leads me to believe that the difference has to do with our setups: I'm using a Debian virtual machine with 8 GB RAM on a mid-tier laptop, while he's using a native Linux desktop with a lot more 'muscle' under the hood.
My answer to your question is: no, I do not believe you are doing anything wrong, because I am experiencing the same results and it does not appear that I am doing anything wrong either; any constructive criticism is welcome and appreciated.
I wish I could be of more help; I hope my chiming in can be of some use.
I tried a similar experiment and see no real difference between the uvloop and asyncio event loops for parallel HTTP GETs:
asyncio event loop: avg=3.6285968542099 s. stdev=0.5583842811362075 s.
uvloop event loop: avg=3.419699764251709 s. stdev=0.13423859428541632 s.
It might be that the noticeable benefits of uvloop come into play when it is used in server code, i.e. for handling many incoming requests.
Code:
import time
from statistics import mean, stdev
import asyncio
import uvloop
import aiohttp
urls = [
'https://aws.amazon.com', 'https://google.com', 'https://microsoft.com', 'https://www.oracle.com/index.html'
'https://www.python.org', 'https://nodejs.org', 'https://angular.io', 'https://www.djangoproject.com',
'https://reactjs.org', 'https://www.mongodb.com', 'https://reinvent.awsevents.com',
'https://kafka.apache.org', 'https://github.com', 'https://slack.com', 'https://authy.com',
'https://cnn.com', 'https://fox.com', 'https://nbc.com', 'https://www.aljazeera.com',
'https://fly4.emirates.com', 'https://www.klm.com', 'https://www.china-airlines.com',
'https://en.wikipedia.org/wiki/List_of_Unicode_characters', 'https://en.wikipedia.org/wiki/Windows-1252'
]
def timed(func):
async def wrapper():
start = time.time()
await func()
return time.time() - start
return wrapper
#timed
async def main():
conn = aiohttp.TCPConnector(use_dns_cache=False)
async with aiohttp.ClientSession(connector=conn) as session:
coroutines = [fetch(session, url) for url in urls]
await asyncio.gather(*coroutines)
async def fetch(session, url):
async with session.get(url) as resp:
await resp.text()
asycio_results = [asyncio.run(main()) for i in range(10)]
print(f'asyncio event loop: avg={mean(asycio_results)} s. stdev={stdev(asycio_results)} s.')
# Change to uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
uvloop_results = [asyncio.run(main()) for i in range(10)]
print(f'uvloop event loop: avg={mean(uvloop_results)} s. stdev={stdev(uvloop_results)} s.')
