I need to parse the content of one link repeatedly. The synchronous way gives me 2-3 responses per second, and I need it faster (yes, I know, too fast is bad too).
I found some async examples, but all of them show how to handle the results after all links are parsed, whereas I need to process each result immediately after receiving it, something like this, but this code doesn't give any speed improvement:
import aiohttp
import asyncio
import time

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    while True:
        async with aiohttp.ClientSession() as session:
            html = await fetch(session, 'https://example.com')
            print(time.time())
            #do_something_with_html(html)

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
asyncio (and async/concurrency in general) gives a speed improvement for I/O-bound work that can be interleaved.
When everything you do is await something and you never create any parallel tasks (using asyncio.create_task(), asyncio.ensure_future(), etc.), then you are basically doing classic synchronous programming :)
So, how to make the requests faster:
import aiohttp
import asyncio
import time

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def check_link(session):
    html = await fetch(session, 'https://example.com')
    print(time.time())
    #do_something_with_html(html)

async def main():
    async with aiohttp.ClientSession() as session:
        while True:
            asyncio.create_task(check_link(session))
            await asyncio.sleep(0.05)

asyncio.run(main())
Notice: the async with aiohttp.ClientSession() as session: must be above (outside) the while True: for this to work. Actually, having a single ClientSession() for all your requests is good practice anyway.
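Since the question mentions that going too fast is also a problem, it may be worth capping how many requests are in flight at once. A minimal sketch, assuming an asyncio.Semaphore and a hypothetical MAX_IN_FLIGHT value; the structure otherwise mirrors the answer above:

import aiohttp
import asyncio
import time

MAX_IN_FLIGHT = 10  # hypothetical cap on concurrent requests

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def check_link(session, sem):
    # at most MAX_IN_FLIGHT requests run at any moment
    async with sem:
        html = await fetch(session, 'https://example.com')
        print(time.time())
        #do_something_with_html(html)

async def main():
    sem = asyncio.Semaphore(MAX_IN_FLIGHT)
    async with aiohttp.ClientSession() as session:
        while True:
            asyncio.create_task(check_link(session, sem))
            await asyncio.sleep(0.05)

asyncio.run(main())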
I gave up on async; threading solved my problem, thanks to this answer:
https://stackoverflow.com/a/23102874/5678457
from threading import Thread
import requests
import time

class myClassA(Thread):
    def __init__(self):
        Thread.__init__(self)
        self.daemon = True
        self.start()

    def run(self):
        while True:
            r = requests.get('https://ex.com')
            print(r.status_code, time.time())

for i in range(5):
    myClassA()
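For reference, the same idea can be written with concurrent.futures.ThreadPoolExecutor instead of subclassing Thread; this is just a sketch of an equivalent setup, polling the same URL with five worker threads:

from concurrent.futures import ThreadPoolExecutor
import requests
import time

def poll_forever():
    # each worker thread keeps requesting the URL in a loop
    while True:
        r = requests.get('https://ex.com')
        print(r.status_code, time.time())

# five worker threads, matching the five myClassA instances above
executor = ThreadPoolExecutor(max_workers=5)
for _ in range(5):
    executor.submit(poll_forever)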
Related
I have a very simple bot that makes a request to https://jsonplaceholder.typicode.com/todos/1 every 0.5 seconds (this is a practice project; I know it's not recommended). The bot starts, but due to the high volume of requests it won't execute sayHi(), because the thread is blocked.
What I want is a way to execute the requests inside another thread, if possible.
@bot.command()
async def sayHi(ctx, arg):
    print("Hi")

async def make_request():
    r = requests.get("https://jsonplaceholder.typicode.com/todos/1")
    print(r)

@tasks.loop(seconds=0.5)
async def scan_loop():
    await make_request()

@bot.event
async def on_ready():
    print("Bot is running")
    scan_loop.start()

bot.run(TOKEN)
Of course, you can use multiprocessing or threading to run the request in another thread, but there is a second way: using the asynchronous aiohttp library instead of the blocking requests library.
So, your make_request function will be something like this with aiohttp:
import aiohttp

async def make_request():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://jsonplaceholder.typicode.com/todos/1") as response:
            print(await response.json())
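Alternatively, if you want to keep using requests, the threading route mentioned above can be done with asyncio.to_thread (available in Python 3.9+), which runs the blocking call in a worker thread without blocking the event loop; a minimal sketch:

import asyncio
import requests

async def make_request():
    # run the blocking requests call in a separate thread (Python 3.9+)
    r = await asyncio.to_thread(requests.get, "https://jsonplaceholder.typicode.com/todos/1")
    print(r.json())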
Much is written about creating self-contained functions for other tasks,
How to call a async function contained in a class?
How to set class attribute with await in __init__
but none address how to do so for GET requests.
Considering the following MWE -- how might this be transformed into a self-contained class?
import aiohttp
import asyncio
import time

async def get_pokemon(session, url):
    async with session.get(url) as resp:
        pokemon = await resp.json()
        return pokemon['name']

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = []
        for number in range(1, 151):
            url = f'https://pokeapi.co/api/v2/pokemon/{number}'
            tasks.append(asyncio.ensure_future(get_pokemon(session, url)))

        original_pokemon = await asyncio.gather(*tasks)
        for pokemon in original_pokemon:
            print(pokemon)

asyncio.run(main())
Code credit: https://www.twilio.com/blog/asynchronous-http-requests-in-python-with-aiohttp
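One possible answer, sketched as a self-contained class that owns both the requests and the event-loop entry point (the class and method names here are illustrative, not from the original code):

import aiohttp
import asyncio

class PokemonClient:
    """Illustrative wrapper around the MWE above."""

    def __init__(self, base_url='https://pokeapi.co/api/v2/pokemon'):
        self.base_url = base_url

    async def _get_name(self, session, number):
        async with session.get(f'{self.base_url}/{number}') as resp:
            pokemon = await resp.json()
            return pokemon['name']

    async def _fetch_all(self, numbers):
        # the session lives only for the duration of this batch of requests
        async with aiohttp.ClientSession() as session:
            tasks = [self._get_name(session, n) for n in numbers]
            return await asyncio.gather(*tasks)

    def fetch_names(self, numbers):
        # synchronous entry point so callers don't need to know about asyncio
        return asyncio.run(self._fetch_all(numbers))

# usage:
# client = PokemonClient()
# for name in client.fetch_names(range(1, 151)):
#     print(name)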
I want to run many HTTP requests in parallel using Python.
I tried the aiohttp module with asyncio.
import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        for i in range(10):
            async with session.get('https://httpbin.org/get') as response:
                html = await response.text()
                print('done' + str(i))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
I expected it to execute all the requests in parallel, but they are executed one by one.
I later solved this using threading, but I would like to know what's wrong with this approach.
You need to make the requests in a concurrent manner. Currently, you have a single task defined by main(), so the HTTP requests are run serially within that task.
You could also consider using asyncio.run() if you are on Python 3.7+; it abstracts away the creation of the event loop:
import aiohttp
import asyncio

async def getResponse(session, i):
    async with session.get('https://httpbin.org/get') as response:
        html = await response.text()
        print('done' + str(i))

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [getResponse(session, i) for i in range(10)]  # create list of tasks
        await asyncio.gather(*tasks)  # execute them concurrently

asyncio.run(main())
I'm using Python 3.7 and trying to make a crawler that can crawl multiple domains asynchronously. I'm using asyncio and aiohttp for this, but I'm experiencing problems with aiohttp.ClientSession. This is my reduced code:
import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        print(await response.text())

async def main():
    loop = asyncio.get_event_loop()
    async with aiohttp.ClientSession(loop=loop) as session:
        cwlist = [loop.create_task(fetch(session, url)) for url in ['http://python.org', 'http://google.com']]
        asyncio.gather(*cwlist)

if __name__ == "__main__":
    asyncio.run(main())
The thrown exception is this:
_GatheringFuture exception was never retrieved
future: <_GatheringFuture finished exception=RuntimeError('Session is closed')>
What am I doing wrong here?
You forgot to await the asyncio.gather result:
async with aiohttp.ClientSession(loop=loop) as session:
    cwlist = [loop.create_task(fetch(session, url)) for url in ['http://python.org', 'http://google.com']]
    await asyncio.gather(*cwlist)
If you ever have an async with containing no await expressions you should be fairly suspicious.
Sorry, library first-timer here. I am polling a RESTful endpoint every 10 seconds.
It's not obvious to me which of the following is appropriate:
import aiohttp
import asyncio

async def poll(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as r:
            return await r.text()

async def main():
    while True:
        await asyncio.sleep(10)
        print(await poll('http://example.com/api'))

loop = asyncio.get_event_loop()
loop.create_task(main())
loop.run_forever()
Or this one, where the session persists for the lifetime of the program:
import aiohttp
import asyncio

async def poll(url):
    async with aiohttp.ClientSession() as session:
        while True:
            await asyncio.sleep(10)
            async with session.get(url) as r:
                print(await r.text())

loop = asyncio.get_event_loop()
loop.create_task(poll('http://example.com/api'))
loop.run_forever()
I expect the latter is desirable, but coming from the non-asynchronous requests library, I'm not used to the idea of sessions. Will I actually experience faster response times because of connection pooling or other things?
From the official documentation:
Don’t create a session per request. Most likely you need a
session per application which performs all requests altogether.
A session contains a connection pool inside. Connection reusage and
keep-alives (both are on by default) may speed up total performance.
Surely the latter is better, and you will definitely see faster responses thanks to connection reuse.
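Applying that advice to the polling example, a minimal sketch that creates the session once and reuses it for every poll (same URL and interval as in the question):

import aiohttp
import asyncio

async def poll(session, url):
    async with session.get(url) as r:
        return await r.text()

async def main():
    # one session for the whole application, reused for every request
    async with aiohttp.ClientSession() as session:
        while True:
            await asyncio.sleep(10)
            print(await poll(session, 'http://example.com/api'))

asyncio.run(main())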