I need to make asynchronous requests using the Requests library. In Python 3.7, if I try from requests import async, I get SyntaxError: invalid syntax.
async has become a reserved word in Python 3.7. How do I get around this?
Lukasa, who works on the requests library, said:
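A quick way to confirm the cause, as a minimal sketch using only the standard library: since Python 3.7, async is a keyword, so it cannot be used as an imported name. The statement is rejected at compile time, before any import is attempted.

```python
import keyword

# Since Python 3.7, `async` and `await` are reserved keywords.
is_reserved = keyword.iskeyword("async")

# Compiling the import statement fails before any import is attempted,
# so the error does not depend on Requests being installed.
try:
    compile("from requests import async", "<example>", "exec")
    raised_syntax_error = False
except SyntaxError:
    raised_syntax_error = True

print(is_reserved, raised_syntax_error)
```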
At the current time there are no plans to support async and await. This is not because they aren't a good idea: they are. It's because to use them requires quite substantial code changes.
Right now requests is a purely synchronous library that, at the bottom of its stack, uses httplib to send and receive data. We cannot move to an async model unless we replace httplib. The best we could do is provide a shorthand to run a request in a thread, but asyncio already has just such a shorthand, so I don't believe it would be valuable.
Right now I am quietly looking at whether we can rewrite requests to work just as well in a synchronous environment as in an async one. However, the reality is that doing so will be a lot of work, involving rewriting a lot of our stack, and may not happen for many years, if ever.
But don't worry: aiohttp is very similar to requests.
Here's an example.
import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def main():
    async with aiohttp.ClientSession() as session:
        html = await fetch(session, 'http://python.org')
        print(html)

if __name__ == '__main__':
    # asyncio.run (Python 3.7+) creates and closes the event loop for us.
    asyncio.run(main())
You can use asyncio to make asynchronous requests. Here is an example:
import asyncio
import requests

async def main():
    loop = asyncio.get_running_loop()
    # Run each blocking requests.get call in the default thread pool executor.
    futures = [
        loop.run_in_executor(
            None,
            requests.get,
            'http://example.org/'
        )
        for i in range(20)
    ]
    for response in await asyncio.gather(*futures):
        pass

asyncio.run(main())
I have an external library that uses the requests module to perform HTTP requests.
I need to use the library asynchronously without using many threads (that would be a last resort if nothing else works), and I can't change its source code either.
It would be easy to monkey-patch the library, since all interaction with the requests module happens in a single function, but I don't know whether I can monkey-patch a synchronous function with an asynchronous one (one defined with the async keyword).
Roughly, the problem simplifies to the following code:
import asyncio
import aiohttp
import types
import requests

# Can't modify Library class.
class Library:
    def do(self):
        self._request('example.com')
        # Some other code here..

    def _request(self, url):
        return requests.get(url).text

# Monkey-patched to this method.
async def new_request(self, url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    library = Library()
    # Do monkey-patch.
    library._request = types.MethodType(new_request, library)
    # Call library asynchronously in hope that it will perform requests using aiohttp.
    asyncio.gather(
        library.do(),
        library.do(),
        library.do()
    )
    print('Done.')

asyncio.run(main())
But as expected, it doesn't work. I get TypeError: An asyncio.Future, a coroutine or an awaitable is required on the asyncio.gather call, and also RuntimeWarning: coroutine 'new_request' was never awaited at self._request('example.com').
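The warning comes from the fact that calling an async def function from synchronous code does not run it; the call only creates a coroutine object. A minimal sketch of that behavior follows, with a hypothetical fetch_text standing in for new_request and no network access:

```python
import asyncio
import inspect


async def fetch_text(url):
    # Hypothetical stand-in for new_request; no network access involved.
    await asyncio.sleep(0)
    return "body of " + url


def sync_caller():
    # A plain function cannot await, so it gets a coroutine object back
    # instead of the response text.
    return fetch_text("http://example.com")


result = sync_caller()
got_coroutine = inspect.iscoroutine(result)

# Only an event loop can actually run the coroutine to completion.
text = asyncio.run(result)
print(got_coroutine, text)
```

This is why patching only _request is not enough: do() is still a synchronous function, so it never awaits the coroutine it receives.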
So the question is: is it possible to make that code work without modifying the Library class' source code? Otherwise, what options do I have to make asynchronous requests using the library?
Yes, it is possible, and you do not even need monkey-patching. You can use asyncio.to_thread (available since Python 3.9) to run the synchronous do method of Library in a worker thread and get back an awaitable. The main coroutine should look like this:
async def main():
    library = Library()
    await asyncio.gather(
        asyncio.to_thread(library.do),
        asyncio.to_thread(library.do),
        asyncio.to_thread(library.do)
    )
    print('Done.')
Here asyncio.to_thread wraps the library.do method and returns a coroutine object, which avoids the first error; you also need await before asyncio.gather.
NOTE: If you are going to check my answer with the above example, please do not forget to set a valid URL instead of 'example.com'.
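To see the effect without touching the network, here is a minimal sketch, assuming Python 3.9+. The hypothetical blocking_call uses time.sleep as a stand-in for Library.do. Three 0.2-second blocking calls finish in roughly 0.2 seconds in total because each runs in its own worker thread.

```python
import asyncio
import time


def blocking_call(delay):
    # Stand-in for Library.do: blocks the calling thread like requests.get.
    time.sleep(delay)
    return delay


async def run_all():
    start = time.perf_counter()
    # Each call is handed to a worker thread, so the sleeps overlap.
    results = await asyncio.gather(
        asyncio.to_thread(blocking_call, 0.2),
        asyncio.to_thread(blocking_call, 0.2),
        asyncio.to_thread(blocking_call, 0.2),
    )
    return results, time.perf_counter() - start


results, elapsed = asyncio.run(run_all())
print(results, round(elapsed, 2))
```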
Edit
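If you do not want to use threads at all, you could replace asyncio.to_thread with an async wrapper like the to_async function below. Note that such a wrapper only makes the call awaitable: func() still runs synchronously on the event loop thread, so the three calls execute one after another rather than concurrently.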
async def to_async(func):
    return func()

async def main():
    library = Library()
    await asyncio.gather(
        to_async(library.do),
        to_async(library.do),
        to_async(library.do),
    )
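One caveat worth checking for yourself: a wrapper like to_async makes the call awaitable but does not make it concurrent. In the sketch below, the hypothetical blocking_call (time.sleep as a stand-in for a requests-based call) is run three times, and the total time comes out as the sum of the three calls.

```python
import asyncio
import time


def blocking_call():
    time.sleep(0.1)  # stand-in for a requests-based call


async def to_async(func):
    # Runs func directly on the event loop thread; nothing else can run meanwhile.
    return func()


async def run_all():
    start = time.perf_counter()
    await asyncio.gather(*(to_async(blocking_call) for _ in range(3)))
    return time.perf_counter() - start


elapsed = asyncio.run(run_all())
print(round(elapsed, 2))
```

So this variant avoids threads, but only at the cost of running the requests serially.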
I decided to post another answer, as it solves the problem in a different way.
Why not extend the default behavior of the Library class and make the do and _request methods polymorphic?
class AsyncLibrary(Library):
    async def do(self):
        return await self._request('https://google.com/')

    async def _request(self, url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                return await response.text()

async def main():
    library = AsyncLibrary()
    await asyncio.gather(
        library.do(),
        library.do(),
        library.do(),
    )
There is no way to do this with requests without threads, but you can limit the number of threads active at any one time to address your "without using many threads" requirement.
import asyncio
import requests

# Can't modify Library class.
class Library:
    def do(self):
        self._request('http://example.com')

    def _request(self, url):
        return requests.get(url).text

async def as_thread(semaphore, func):
    async with semaphore:  # limit the number of threads active
        await asyncio.to_thread(func)

async def main():
    library = Library()
    semaphore = asyncio.Semaphore(2)  # limit to 2 for example
    tasks = [library.do] * 10  # pretend there are a lot of sites to read
    await asyncio.gather(
        *[as_thread(semaphore, x) for x in tasks]
    )
    print('Done.')

asyncio.run(main())
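Another way to cap the thread count, sketched here as an alternative to the semaphore rather than a replacement for it, is to pass a ThreadPoolExecutor with a small max_workers to loop.run_in_executor. The hypothetical blocking_call uses time.sleep in place of Library.do, and the counters exist only to show how many calls overlap.

```python
import asyncio
import threading
import time
from concurrent.futures import ThreadPoolExecutor

active = 0
peak = 0
lock = threading.Lock()


def blocking_call():
    # Stand-in for Library.do; records how many calls overlap.
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)
    with lock:
        active -= 1


async def main():
    loop = asyncio.get_running_loop()
    # At most two worker threads exist, so at most two calls run at once.
    with ThreadPoolExecutor(max_workers=2) as pool:
        await asyncio.gather(
            *(loop.run_in_executor(pool, blocking_call) for _ in range(10))
        )


asyncio.run(main())
print(peak)
```

With the executor the limit is on the threads themselves, whereas the semaphore version limits how many calls are submitted to the default pool at a time.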
I want to run many HTTP requests in parallel using Python.
I tried the aiohttp module with asyncio.
import aiohttp
import asyncio

async def main():
    async with aiohttp.ClientSession() as session:
        for i in range(10):
            async with session.get('https://httpbin.org/get') as response:
                html = await response.text()
                print('done' + str(i))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
I expected it to execute all the requests in parallel, but they are executed one by one.
I later solved this using threading, but I would still like to know what is wrong with this code.
You need to make the requests concurrently. Currently, you have a single task defined by main(), so the HTTP requests run serially within that task.
If you are using Python 3.7+, you could also consider using asyncio.run(), which abstracts away the creation of the event loop:
import aiohttp
import asyncio

async def getResponse(session, i):
    async with session.get('https://httpbin.org/get') as response:
        html = await response.text()
        print('done' + str(i))

async def main():
    async with aiohttp.ClientSession() as session:
        tasks = [getResponse(session, i) for i in range(10)]  # create list of tasks
        await asyncio.gather(*tasks)  # execute them in concurrent manner

asyncio.run(main())
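The difference between the two versions can be measured without a network. This is a minimal sketch in which the hypothetical fake_request uses asyncio.sleep in place of session.get: awaiting inside a loop takes about five delays, while asyncio.gather takes about one.

```python
import asyncio
import time


async def fake_request(i):
    # Stand-in for session.get(...); asyncio.sleep yields to the event loop.
    await asyncio.sleep(0.1)
    return i


async def sequential():
    start = time.perf_counter()
    # Each await finishes before the next request is even started.
    results = [await fake_request(i) for i in range(5)]
    return results, time.perf_counter() - start


async def concurrent():
    start = time.perf_counter()
    # All five coroutines are scheduled first, then awaited together.
    results = await asyncio.gather(*(fake_request(i) for i in range(5)))
    return results, time.perf_counter() - start


seq_results, seq_time = asyncio.run(sequential())
con_results, con_time = asyncio.run(concurrent())
print(round(seq_time, 2), round(con_time, 2))
```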
I'm getting started with asyncio and aiohttp, and I'm writing some basic code to get familiar with the syntax. I tried the following code, which should perform 3 requests concurrently:
import time
import logging
import asyncio
import aiohttp
import json
from aiohttp import ClientSession, ClientResponseError
from aiocfscrape import CloudflareScraper

async def nested(url):
    async with CloudflareScraper() as session:
        async with session.get(url) as resp:
            return await resp.text()

async def main():
    URL = "https://www.binance.com/api/v3/exchangeInfo"
    await asyncio.gather(nested(URL), nested(URL), nested(URL))

asyncio.run(main())
Here is the output:
raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed
I don't understand why I get that error. Can anyone help me with this?
Update
Originally I was recommending Greg's answer below:
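import asyncio
import sys

# WindowsSelectorEventLoopPolicy only exists on Windows, so check the platform explicitly.
if sys.platform == 'win32':
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())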
It turned out that using WindowsSelectorEventLoop has functional limitations, such as:
Can't support more than 512 sockets
Can't use pipes
Can't use subprocesses
This is because Windows uses I/O completion ports, unlike *nix; SelectorEventLoop was not designed for Windows and is not fully implemented there.
If those limitations matter to you, you might be better off using the lengthy workaround in this answer.
See the documentation for more about the differences.
Alternatively, consider using Trio instead of asyncio, which is much more stable and consistent.
import trio

async def task():
    await trio.sleep(5)

trio.run(task)
Original post
I've finally figured out how to keep ProactorEventLoop running, preventing unsuccessful IO closure.
I'm really not sure why the Windows event loop is so faulty, as this also happens with asyncio.open_connection and asyncio.start_server.
To work around this, you need to run the event loop with run_forever and stop it manually.
The following code covers both Windows and other environments.
import asyncio
from aiocfscrape import CloudflareScraper

async def nested(url):
    async with CloudflareScraper() as session:
        async with session.get(url) as resp:
            return await resp.text()

async def main():
    await nested("https://www.binance.com/api/v3/exchangeInfo")

try:
    assert isinstance(loop := asyncio.new_event_loop(), asyncio.ProactorEventLoop)
    # No ProactorEventLoop is in asyncio on other OS, will raise AttributeError in that case.
except (AssertionError, AttributeError):
    asyncio.run(main())
else:
    async def proactor_wrap(loop_: asyncio.ProactorEventLoop, fut: asyncio.coroutines):
        await fut
        loop_.stop()

    loop.create_task(proactor_wrap(loop, main()))
    loop.run_forever()
This code checks whether the new event loop is a ProactorEventLoop.
If so, it runs the loop forever until proactor_wrap has awaited main and scheduled the loop to stop.
Otherwise (probably on any OS other than Windows), these additional steps are not needed, so it simply calls asyncio.run() instead.
An IDE such as PyCharm will complain about passing an AbstractEventLoop to a ProactorEventLoop parameter; this is safe to ignore.
Although this has been answered and accepted, you can fix this issue with one line of code: asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())
Event loop is closed is a known issue on Windows (see https://github.com/encode/httpx/issues/914). I suspect this will be fixed in later versions of Python. To get around the error, simply set the event loop policy to WindowsSelectorEventLoopPolicy.
If you plan to run the code in a non-Windows environment, you'll want to either add an if statement to prevent an error, e.g. if sys.platform == 'win32', or add code to set the policies.
Working example:
import asyncio
from aiocfscrape import CloudflareScraper
import sys

async def nested(url):
    async with CloudflareScraper() as session:
        async with session.get(url) as resp:
            print(resp.status)
            return await resp.text()

async def main():
    URL = "https://www.binance.com/api/v3/exchangeInfo"
    await asyncio.gather(nested(URL), nested(URL), nested(URL))

# Only perform this check if your code will also run on non-Windows environments.
if sys.platform == 'win32':
    # Set the policy to prevent "Event loop is closed" error on Windows - https://github.com/encode/httpx/issues/914
    asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

asyncio.run(main())
I am running a program that makes three different requests to a REST API. The data, indicator, and request functions all fetch data from BitMEX's API using a wrapper I've made.
I have used asyncio to try to speed up the process, so that while I am waiting on a response from a previous request, it can begin to make another one.
However, my asynchronous version is not running any quicker for some reason. The code works and, as far as I know, I have set everything up correctly. But could there be something wrong with how I am setting up the coroutines?
Here is the asynchronous version:
import time
import asyncio
from bordemwrapper import BitMEXData, BitMEXFunctions

'''
asynchronous I/O
'''

async def data():
    data = BitMEXData().get_ohlcv(symbol='XBTUSD', timeframe='1h',
                                  instances=25)
    await asyncio.sleep(0)
    return data

async def indicator():
    indicator = BitMEXData().get_indicator(symbol='XBTUSD',
        timeframe='1h', indicator='RSI', period=20, source='close',
        instances=25)
    await asyncio.sleep(0)
    return indicator

async def request():
    request = BitMEXFunctions().get_price()
    await asyncio.sleep(0)
    return request

async def chain():
    data_ = await data()
    indicator_ = await indicator()
    request_ = await request()
    return data_, indicator_, request_

async def main():
    await asyncio.gather(chain())

if __name__ == '__main__':
    start = time.perf_counter()
    asyncio.run(main())
    end = time.perf_counter()
    print('process finished in {} seconds'.format(end - start))
Unfortunately, asyncio isn't magic. Although you've put them in async functions, the BitMEXData().get_<foo> functions are not themselves async (i.e. you can't await them), and therefore block while they run. The concurrency in asyncio can only occur while awaiting something.
You'll need a library which makes the actual HTTP requests asynchronously, like aiohttp. It sounds like you wrote bordemwrapper yourself - you should rewrite the get_<foo> functions to use asynchronous HTTP requests. Feel free to submit a separate question if you need help with that.
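To illustrate the point, here is a minimal sketch with hypothetical stand-ins: time.sleep plays the role of a blocking get_<foo> call and asyncio.sleep plays the role of a genuinely asynchronous request. Even when three coroutines are passed to asyncio.gather, the blocking style takes the sum of the delays, and the trailing await asyncio.sleep(0) does not change that.

```python
import asyncio
import time


async def blocking_style():
    time.sleep(0.1)          # blocks the whole event loop, like a sync HTTP call
    await asyncio.sleep(0)   # yielding afterwards does not win the time back


async def awaitable_style():
    await asyncio.sleep(0.1)  # stand-in for an awaited async HTTP request


async def timed(factory):
    start = time.perf_counter()
    # Pass three separate coroutines to gather, rather than awaiting them in turn.
    await asyncio.gather(factory(), factory(), factory())
    return time.perf_counter() - start


blocking_time = asyncio.run(timed(blocking_style))
awaitable_time = asyncio.run(timed(awaitable_style))
print(round(blocking_time, 2), round(awaitable_time, 2))
```

Note also that the chain() coroutine in the question awaits its three calls one after another, so even with async HTTP calls it would need to pass them to asyncio.gather together to overlap them.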
Sorry, library first-timer here. I am polling a RESTful endpoint every 10 seconds.
It's not obvious to me which of the following is appropriate:
import aiohttp
import asyncio

async def poll(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as r:
            return await r.text()

async def main():
    while True:
        await asyncio.sleep(10)
        print(await poll('http://example.com/api'))

loop = asyncio.get_event_loop()
loop.create_task(main())
loop.run_forever()
Or the session variable persists forever:
import aiohttp
import asyncio

async def poll(url):
    # One session for the lifetime of the poller, so connections can be reused.
    async with aiohttp.ClientSession() as session:
        while True:
            await asyncio.sleep(10)
            async with session.get(url) as r:
                print(await r.text())

loop = asyncio.get_event_loop()
loop.create_task(poll('http://example.com/api'))
loop.run_forever()
I expect the latter is desirable, but coming from the non-asynchronous requests library, I'm not used to the idea of sessions. Will I actually experience faster response times because of connection pooling or other things?
From the official documentation:
Don’t create a session per request. Most likely you need a
session per application which performs all requests altogether.
A session contains a connection pool inside. Connection reusage and
keep-alives (both are on by default) may speed up total performance.
So the latter is the better choice, and thanks to connection reuse you should see faster responses.