Asynchronous Requests with Python requests

I tried the sample provided in the documentation of the requests library for Python.
With async.map(rs) I get the response codes, but I want to get the content of each page requested. This, for example, does not work:
out = async.map(rs)
print out[0].content

Note
The answer below is not applicable to requests v0.13.0+. The asynchronous functionality was moved to grequests after this question was written. However, you could just replace requests with grequests below and it should work.
I've left this answer as-is to reflect the original question, which was about using requests < v0.13.0.
To do multiple tasks with async.map asynchronously you have to:
Define a function for what you want to do with each object (your task)
Add that function as an event hook in your request
Call async.map on a list of all the requests / actions
Example:
from requests import async
# If using requests > v0.13.0, use
# from grequests import async

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print(response.url)

# A list to hold our things to do via async
async_list = []

for u in urls:
    # The "hooks = {..." part is where you define what you want to do
    #
    # Note the lack of parentheses following do_something: the response
    # will be passed to it as the first argument automatically
    action_item = async.get(u, hooks={'response': do_something})

    # Add the task to our list of things to do via async
    async_list.append(action_item)

# Do our list of things to do via async
async.map(async_list)

async is now an independent module: grequests.
See here: https://github.com/kennethreitz/grequests
And also: Ideal method for sending multiple HTTP requests over Python?
Installation:
$ pip install grequests
Usage:
Build a stack:
import grequests

urls = [
    'http://www.heroku.com',
    'http://tablib.org',
    'http://httpbin.org',
    'http://python-requests.org',
    'http://kennethreitz.com'
]

rs = (grequests.get(u) for u in urls)
Send the stack:
grequests.map(rs)
The result looks like:
[<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
grequests doesn't seem to set a limit on concurrent requests by default, i.e. when multiple requests are sent to the same server.
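For what it's worth, grequests.map does accept a size argument that caps the underlying gevent pool, which effectively limits how many requests are in flight at once. A minimal sketch, assuming a grequests version that supports this argument (the URLs are illustrative):
import grequests

urls = ['http://httpbin.org/delay/1'] * 20

rs = (grequests.get(u) for u in urls)

# size caps the gevent pool, so at most 5 requests run concurrently;
# on a grequests version without this argument the call raises a TypeError
responses = grequests.map(rs, size=5)
print(responses)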

I tested both requests-futures and grequests. grequests is faster, but it brings monkey patching and additional problems with dependencies. requests-futures is several times slower than grequests. I decided to write my own solution and simply wrapped requests in a ThreadPoolExecutor; it was almost as fast as grequests, but without external dependencies.
import concurrent.futures

import requests

def get_urls():
    return ["url1", "url2"]

def load_url(url, timeout):
    return requests.get(url, timeout=timeout)

# counters for successful and failed responses
resp_ok = 0
resp_err = 0

with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    future_to_url = {executor.submit(load_url, url, 10): url for url in get_urls()}
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        try:
            data = future.result()
        except Exception as exc:
            resp_err = resp_err + 1
        else:
            resp_ok = resp_ok + 1

Unfortunately, as far as I know, the requests library is not equipped for performing asynchronous requests. You can wrap async/await syntax around requests, but that will make the underlying requests no less synchronous. If you want truly asynchronous requests, you must use other tooling that provides it. One such solution is aiohttp (Python 3.5.3+). It works well in my experience with the Python 3.7 async/await syntax. Below are three implementations of performing n web requests:
Purely synchronous requests (sync_requests_get_all) using the Python requests library
Synchronous requests (async_requests_get_all) using the Python requests library wrapped in Python 3.7 async/await syntax and asyncio
A truly asynchronous implementation (async_aiohttp_get_all) with the Python aiohttp library wrapped in Python 3.7 async/await syntax and asyncio
"""
Tested in Python 3.5.10
"""
import time
import asyncio
import requests
import aiohttp
from asgiref import sync
def timed(func):
"""
records approximate durations of function calls
"""
def wrapper(*args, **kwargs):
start = time.time()
print('{name:<30} started'.format(name=func.__name__))
result = func(*args, **kwargs)
duration = "{name:<30} finished in {elapsed:.2f} seconds".format(
name=func.__name__, elapsed=time.time() - start
)
print(duration)
timed.durations.append(duration)
return result
return wrapper
timed.durations = []
#timed
def sync_requests_get_all(urls):
"""
performs synchronous get requests
"""
# use session to reduce network overhead
session = requests.Session()
return [session.get(url).json() for url in urls]
#timed
def async_requests_get_all(urls):
"""
asynchronous wrapper around synchronous requests
"""
session = requests.Session()
# wrap requests.get into an async function
def get(url):
return session.get(url).json()
async_get = sync.sync_to_async(get)
async def get_all(urls):
return await asyncio.gather(*[
async_get(url) for url in urls
])
# call get_all as a sync function to be used in a sync context
return sync.async_to_sync(get_all)(urls)
#timed
def async_aiohttp_get_all(urls):
"""
performs asynchronous get requests
"""
async def get_all(urls):
async with aiohttp.ClientSession() as session:
async def fetch(url):
async with session.get(url) as response:
return await response.json()
return await asyncio.gather(*[
fetch(url) for url in urls
])
# call get_all as a sync function to be used in a sync context
return sync.async_to_sync(get_all)(urls)
if __name__ == '__main__':
# this endpoint takes ~3 seconds to respond,
# so a purely synchronous implementation should take
# little more than 30 seconds and a purely asynchronous
# implementation should take little more than 3 seconds.
urls = ['https://postman-echo.com/delay/3']*10
async_aiohttp_get_all(urls)
async_requests_get_all(urls)
sync_requests_get_all(urls)
print('----------------------')
[print(duration) for duration in timed.durations]
On my machine, this is the output:
async_aiohttp_get_all started
async_aiohttp_get_all finished in 3.20 seconds
async_requests_get_all started
async_requests_get_all finished in 30.61 seconds
sync_requests_get_all started
sync_requests_get_all finished in 30.59 seconds
----------------------
async_aiohttp_get_all finished in 3.20 seconds
async_requests_get_all finished in 30.61 seconds
sync_requests_get_all finished in 30.59 seconds

Maybe requests-futures is another choice.
from requests_futures.sessions import FuturesSession
session = FuturesSession()
# first request is started in background
future_one = session.get('http://httpbin.org/get')
# second requests is started immediately
future_two = session.get('http://httpbin.org/get?foo=bar')
# wait for the first request to complete, if it hasn't already
response_one = future_one.result()
print('response one status: {0}'.format(response_one.status_code))
print(response_one.content)
# wait for the second request to complete, if it hasn't already
response_two = future_two.result()
print('response two status: {0}'.format(response_two.status_code))
print(response_two.content)
It is also recommended in the official documentation. If you don't want to involve gevent, it's a good option.
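For many URLs the same idea scales by submitting all requests up front and collecting them as they complete. A minimal sketch, assuming the requests-futures defaults (the max_workers value and URL list are illustrative):
from concurrent.futures import as_completed

from requests_futures.sessions import FuturesSession

urls = ['http://httpbin.org/get', 'http://httpbin.org/get?foo=bar']

# each request is started in the background as soon as it is submitted
session = FuturesSession(max_workers=10)
futures = {session.get(url): url for url in urls}

# handle responses in completion order rather than submission order
for future in as_completed(futures):
    response = future.result()
    print(futures[future], response.status_code)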

I have a lot of issues with most of the answers posted: they either use deprecated libraries that have been ported over with limited features, or provide a solution with so much magic around the execution of the request that error handling becomes difficult. If they do not fall into one of the above categories, they are third-party libraries or deprecated.
Some of the solutions work all right purely for HTTP requests, but they fall short for any other kind of request. A highly customized solution is not necessary here.
Simply using the Python built-in library asyncio is sufficient to perform asynchronous requests of any type, and it provides enough flexibility for complex, use-case-specific error handling.
import asyncio

import requests

loop = asyncio.get_event_loop()

def do_thing(params):
    # perform_grpc_call, do_chores and URL are placeholders for your own code
    async def get_rpc_info_and_do_chores(element):
        # do things
        response = perform_grpc_call(element)
        do_chores(response)

    async def get_httpapi_info_and_do_chores(element):
        # do things
        response = requests.get(URL)
        do_chores(response)

    async_tasks = []
    for element in list(params.list_of_things):
        async_tasks.append(loop.create_task(get_rpc_info_and_do_chores(element)))
        async_tasks.append(loop.create_task(get_httpapi_info_and_do_chores(element)))

    loop.run_until_complete(asyncio.gather(*async_tasks))
How it works is simple: you create a series of tasks you would like to run asynchronously, and then ask a loop to execute those tasks and exit upon completion. No extra libraries subject to lack of maintenance, and no loss of functionality.
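As a minimal, self-contained sketch of the same idea on Python 3.9+ (the URLs are placeholders, and asyncio.to_thread stands in for the undefined helpers above), blocking requests calls can be pushed onto worker threads and gathered:
import asyncio

import requests

async def main():
    urls = ['https://example.com', 'https://example.org']
    # run each blocking requests.get in a worker thread and gather the results
    responses = await asyncio.gather(*(asyncio.to_thread(requests.get, url) for url in urls))
    for response in responses:
        print(response.url, response.status_code)

asyncio.run(main())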

You can use httpx for that.
import asyncio

import httpx

async def get_async(url):
    async with httpx.AsyncClient() as client:
        return await client.get(url)

urls = ["http://google.com", "http://wikipedia.org"]

# Note that you need an async context to use `await`.
await asyncio.gather(*map(get_async, urls))
If you want a functional syntax, the gamla lib wraps this into get_async.
Then you can do:
await gamla.map(gamla.get_async(10))(["http://google.com", "http://wikipedia.org"])
The 10 is the timeout in seconds.
(disclaimer: I am its author)

I know this has been closed for a while, but I thought it might be useful to promote another async solution built on the requests library.
from simple_requests import Requests

list_of_requests = ['http://moop.com', 'http://doop.com', ...]

for response in Requests().swarm(list_of_requests):
    print(response.content)
The docs are here: http://pythonhosted.org/simple-requests/

If you want to use asyncio, then requests-async provides async/await functionality for requests - https://github.com/encode/requests-async
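A minimal sketch of what that looks like, based on the project's README (the library mirrors the requests API, but calls are awaited; the URL is illustrative):
import asyncio

import requests_async as requests

async def main():
    response = await requests.get('https://example.org')
    print(response.status_code, response.text[:80])

asyncio.run(main())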

DISCLAIMER: The following code creates a different thread for each function call.
This might be useful in some cases, as it is simpler to use. But be aware that it is not async; it only gives the illusion of async by using multiple threads, even though the decorator suggests otherwise.
You can use the following decorator to run a callback once the execution of the function is completed; the callback must handle the processing of the data returned by the function.
Please note that after the function is decorated it will return a Future object.
import asyncio

## Decorator implementation of async runner !!
def run_async(callback, loop=None):
    if loop is None:
        loop = asyncio.get_event_loop()

    def inner(func):
        def wrapper(*args, **kwargs):
            def __exec():
                out = func(*args, **kwargs)
                callback(out)
                return out
            return loop.run_in_executor(None, __exec)
        return wrapper
    return inner
Example of implementation:
import requests

urls = ["https://google.com", "https://facebook.com", "https://apple.com", "https://netflix.com"]
loaded_urls = []  # OPTIONAL, used for showing in real time which urls are loaded !!

def _callback(resp):
    print(resp.url)
    print(resp)
    loaded_urls.append((resp.url, resp))  # OPTIONAL, used for showing in real time which urls are loaded !!

# Must provide a callback function; it will be executed after the decorated function completes
# and will receive the value returned by that function.
@run_async(_callback)
def get(url):
    return requests.get(url)

for url in urls:
    get(url)
If you wish to see which urls are loaded in real time, you can add the following code at the end as well:
while True:
    print(loaded_urls)
    if len(loaded_urls) == len(urls):
        break

from threading import Thread

threads = list()

for requestURI in requests:
    t = Thread(target=self.openURL, args=(requestURI,))
    t.start()
    threads.append(t)

for thread in threads:
    thread.join()

...

def openURL(self, requestURI):
    o = urllib2.urlopen(requestURI, timeout=600)
    o...

I second the suggestion above to use HTTPX, but I often use it in a different way, so I am adding my answer.
I personally use asyncio.run (introduced in Python 3.7) rather than asyncio.gather, and also prefer the aiostream approach, which can be used in combination with asyncio and httpx.
As in this example I just posted, this style is helpful for processing a set of URLs asynchronously even when errors occur (as they commonly do). I particularly like how that style clarifies where the response processing occurs and how it eases error handling, of which async calls tend to need more.
It's easier to post a simple example of just firing off a bunch of requests asynchronously, but often you also want to handle the response content (compute something with it, perhaps with reference to the original object that the requested URL relates to).
The core of that approach looks like:
async with httpx.AsyncClient(timeout=timeout) as session:
    ws = stream.repeat(session)
    xs = stream.zip(ws, stream.iterate(urls))
    ys = stream.starmap(xs, fetch, ordered=False, task_limit=20)
    process = partial(process_thing, things=things, pbar=pbar, verbose=verbose)
    zs = stream.map(ys, process)
    return await zs
where:
process_thing is an async response content handling function
things is the input list (which the urls generator of URL strings came from), e.g. a list of objects/dictionaries
pbar is a progress bar (e.g. tqdm.tqdm) [optional but useful]
All of that goes in an async function, async_fetch_urlset, which is then run by calling a synchronous 'top-level' function, named e.g. fetch_things, that runs the coroutine (this is what an async function returns) and manages the event loop:
def fetch_things(urls, things, pbar=None, verbose=False):
    return asyncio.run(async_fetch_urlset(urls, things, pbar, verbose))
Since a list passed as input (here it's things) can be modified in place, you effectively get output back, as we're used to from synchronous function calls.
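For reference, a fetch coroutine for that pipeline could look roughly like the sketch below; the signature matches the (session, url) pairs produced by stream.zip, and the error handling is illustrative rather than the author's exact code:
import httpx

async def fetch(session: httpx.AsyncClient, url: str):
    # return the response on success, or None so downstream processing can skip failures
    try:
        response = await session.get(url)
        response.raise_for_status()
    except httpx.HTTPError as exc:
        print(f"Request failed for {url}: {exc!r}")
        return None
    return response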

I have been using Python requests for async calls against GitHub's gist API for some time.
For an example, see the code here:
https://github.com/davidthewatson/flasgist/blob/master/views.py#L60-72
This style of Python may not be the clearest example, but I can assure you that the code works. Let me know if it is confusing and I will document it.

I have also tried some things using the asynchronous methods in Python; however, I have had much better luck using Twisted for asynchronous programming. It has fewer problems and is well documented. Here is a link to something similar to what you are trying, in Twisted:
http://pythonquirks.blogspot.com/2011/04/twisted-asynchronous-http-request.html

None of the answers above helped me, because they assume that you have a predefined list of requests, while in my case I need to be able to listen for requests and respond asynchronously (similar to how it works in Node.js).
import grequests

def handle_finished_request(r, **kwargs):
    print(r)

def main():
    while True:
        address = listen_to_new_msg()  # based on your server

        # schedule async requests and run 'handle_finished_request' on response
        req = grequests.get(address, timeout=1, hooks=dict(response=handle_finished_request))
        job = grequests.send(req)  # does not block! for more info see https://stackoverflow.com/a/16016635/10577976

main()
The handle_finished_request callback is called when a response is received. Note: for some reason a timeout (or no response) does not trigger an error here.
This simple loop can trigger async requests similarly to how it would work in a Node.js server.

Related

How to do async api requests in a GAE application?

I am working on an application based on GAE with Python 2.7.13. What I want to do is make a bunch of async API calls inside a handler. Something like this:
class MakeRequests(webapp2.RequestHandler):
    def post(self, *v, **kv):
        *do an async api call#1*
        *do an async api call#2*
        *do an async api call#3*
        *wait for response from all of above api requests*
        *make response in a way like if call#1 fails, make its expected*
        *attributes in response as None; if call#2 succeeds, add its*
        *attributes in response etc. This is just an example.*
For that purpose, I have tried libraries like asyncio, grequests, requests and simple-requests; they don't seem to be working, because they are not compatible either with GAE or with Python 2.7.13.
Can anyone help me here?
Urlfetch, which is bundled by default with GAE, has a way of making asynchronous calls:
from google.appengine.api import urlfetch

def post(self, *v, **kv):
    rpcs = []
    for url in urls:
        rpc = urlfetch.create_rpc()
        urlfetch.make_fetch_call(rpc, url)
        rpcs.append(rpc)

    results = [rpc.get_result() for rpc in rpcs]

    # do stuff with results
If, for some reason, you don't want to use urlfetch, you can parallelize the requests manually by using threading and a synchronized Queue to read the results, as sketched below.
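A minimal sketch of that manual approach, written for Python 2.7 to match the question (the URLs and timeout are illustrative):
import threading
import urllib2
from Queue import Queue  # this module is named queue in Python 3

def worker(url, results):
    # each thread fetches one URL and pushes the outcome onto the shared queue
    try:
        results.put((url, urllib2.urlopen(url, timeout=10).read()))
    except Exception:
        results.put((url, None))

urls = ['http://httpbin.org/get', 'http://httpbin.org/ip']
results = Queue()
threads = [threading.Thread(target=worker, args=(url, results)) for url in urls]

for t in threads:
    t.start()
for t in threads:
    t.join()

# drain the queue; failed fetches appear with a None body
responses = [results.get() for _ in urls]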

Is there an after request method in aiohttp like in flask

Flask provides this nice @app.after_request decorator, which allows you to execute a method after an HTTP request has been handled. See the documentation here.
How would you achieve a similar pattern with aiohttp?
Typically to send logs after the request has been handled.
The aiohttp web server supports signals, which are hooks to be called at specific points.
The Application.on_response_prepare signal is the moral equivalent of Flask's after_request handler. Use it to modify the response as it is being prepared to be returned to the client:
async def on_prepare(request, response):
    response.headers['My-Header'] = 'value'

app.on_response_prepare.append(on_prepare)
The signal receives both the request and response objects. If you want to implement the Flask pattern for registering a callback per request, and are using Python 3.7, you can use a contextvars context variable:
from contextvars import ContextVar
from typing import Iterable, Callable

from aiohttp import web

PrepareCallback = Callable[[web.Request, web.StreamResponse], None]
call_on_prepare: ContextVar[Iterable[PrepareCallback]] = ContextVar('call_on_prepare', default=())

async def per_request_callbacks(request, response):
    # executed sequentially, in order of registration!
    for callback in call_on_prepare.get():
        await callback(request, response)

app.on_response_prepare.append(per_request_callbacks)

def response_prepare_after_this_request(awaitable):
    call_on_prepare.set(call_on_prepare.get() + (awaitable,))
    return awaitable
then use it like this in a request:
def invalidate_username_cache():
    @response_prepare_after_this_request
    async def delete_username_cookie(request, response):
        response.del_cookie('username')
        return response
If you need to support Python versions < 3.7, you'd have to store the list of callbacks on the app, request or response objects instead; see the data sharing section of the aiohttp FAQ. Personally, I think that contextvars are the better pattern here, as this provides better encapsulation for utilities like response_prepare_after_this_request, which now can be distributed separately without fear of clashing with other data set in the aiohttp.web object mappings.
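A rough sketch of that request-object variant, assuming only that web.Request supports the mapping interface for data sharing (the key and helper names are illustrative):
from aiohttp import web

CALLBACK_KEY = 'on_prepare_callbacks'

async def per_request_callbacks(request, response):
    # run any callbacks registered on this specific request, in order
    for callback in request.get(CALLBACK_KEY, ()):
        await callback(request, response)

def response_prepare_after_this_request(request, awaitable):
    # the request has to be passed explicitly, since there is no context variable
    request.setdefault(CALLBACK_KEY, []).append(awaitable)
    return awaitable

app = web.Application()
app.on_response_prepare.append(per_request_callbacks)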

how to call a method and make it run in background in python 3.4?

I have implemented a Google Cloud Messaging server in Python and I want that method to be asynchronous. I do not expect any return values from it. Is there a simple way to do this?
I have tried using async from the asyncio package:
...
loop = asyncio.get_event_loop()
if module_status == "Fail":
    loop.run_until_complete(sendNotification(module_name, module_status))
...
and here is my method sendNotification():
async def sendNotification(module_name, module_status):
    gcm = GCM("API_Key")
    data = {"message": module_status, "moduleName": module_name}
    reg_ids = ["device_tokens"]
    response = gcm.json_request(registration_ids=reg_ids, data=data)
    print("GCM notification sent!")
Since GCM is not an async-compatible library, you need to use an external event loop. There are a few; the simplest one, IMO, is probably gevent.
Note that gevent monkey patching may introduce deadlocks if the underlying libraries rely on blocking behaviour to operate.
import gevent
from gevent.greenlet import Greenlet
from gevent import monkey
monkey.patch_all()

def sendNotification(module_name, module_status):
    gcm = GCM("API_Key")
    data = {"message": module_status, "moduleName": module_name}
    reg_ids = ["device_tokens"]
    response = gcm.json_request(registration_ids=reg_ids, data=data)
    print("GCM notification sent!")

# positional arguments after the function are passed straight through to it
greenlet = Greenlet.spawn(sendNotification, module_name, module_status)

# Yield control to gevent's event loop without blocking
# to allow background tasks to run
gevent.sleep(0)

#
# Other code, other greenlets etc here
#

# Call get to get the return value if needed
greenlet.get()
You could use a ThreadPoolExecutor:
from concurrent.futures import ThreadPoolExecutor

def send_notification(module_name, module_status):
    [...]

with ThreadPoolExecutor() as executor:
    future = executor.submit(send_notification, module_name, module_status)
You can use asyncio's API: loop.run_in_executor(None, callable).
This will run the code using an executor (by default a ThreadPoolExecutor).
See the documentation.
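A minimal, self-contained sketch of that approach; send_notification here is a plain (non-async) stand-in for the blocking GCM call from the question, and the arguments are illustrative:
import asyncio

def send_notification(module_name, module_status):
    # stand-in for the blocking GCM call
    print("GCM notification sent!", module_name, module_status)

loop = asyncio.get_event_loop()

# schedule the blocking call on the default ThreadPoolExecutor without blocking the event loop
future = loop.run_in_executor(None, send_notification, "reporting", "Fail")

# no return value is expected, but the future can still be run to completion if needed
loop.run_until_complete(future)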

calling functions via grequests

I realize there have been many posts on grequests, such as Asynchronous Requests with Python requests, which describes the basic usage of grequests and how to send hooks via grequests.get(). I pulled this bit of code right from that link.
import grequests

urls = [
    'http://python-requests.org',
    'http://httpbin.org',
    'http://python-guide.org',
    'http://kennethreitz.com'
]

# A simple task to do to each response object
def do_something(response):
    print('print_test')

# A list to hold our things to do via async
async_list = []

for u in urls:
    action_item = grequests.get(u, hooks={'response': do_something})
    async_list.append(action_item)

# Do our list of things to do via async
grequests.map(async_list)
When I run this, however, I get no output:
/$ python test.py
/$
Since there are 4 links, I would expect the output to be:
print_test
print_test
print_test
print_test
I have been searching around and haven't been able to find a reason for the lack of output. I am assuming that there is a bit of key information that I am missing.
I still need to check the sources, but if you rewrite your hook function as
# A simple task to do to each response object
def do_something(response, *args, **kwargs):
    print('print_test')
it produces output. So it is probably failing to call your original hook (because it passes more arguments than you accept) and catching the exception, which is why you get no output.

How to handle DNS timeouts with aiohttp?

The aiohttp readme says:
If you want to use timeouts for aiohttp client please use standard asyncio approach:
yield from asyncio.wait_for(client.get(url), 10)
But that doesn't handle DNS timeouts, which are, I guess, handled by the OS. Also, the with aiohttp.Timeout context manager doesn't handle OS DNS lookups.
There has been a discussion at the asyncio repo without a final conclusion, and Saghul has made aiodns, but I'm not sure how to mix it into aiohttp and whether that will allow asyncio.wait_for functionality.
Test case (takes 20 seconds on my Linux box):
async def fetch(url):
    url = 'http://alicebluejewelers.com/'
    with aiohttp.Timeout(0.001):
        resp = await aiohttp.get(url)
Timeout works as expected, but unfortunately your example hangs in the Python shutdown procedure: it waits for the termination of the background thread that performs the DNS lookup.
As a solution, I can suggest using aiodns for manual IP resolving:
import asyncio
import aiohttp
import aiodns

async def fetch():
    dns = 'alicebluejewelers.com'
    # dns = 'google.com'
    with aiohttp.Timeout(1):
        ips = await resolver.query(dns, 'A')
        print(ips)
        url = 'http://{}/'.format(ips[0].host)
        async with aiohttp.get(url) as resp:
            print(resp.status)

loop = asyncio.get_event_loop()
resolver = aiodns.DNSResolver(loop=loop)
loop.run_until_complete(fetch())
Maybe the solution is worth including in TCPConnector as an optional feature.
A pull request is welcome!
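For what it's worth, newer aiohttp versions ship an aiodns-backed resolver that can be plugged into TCPConnector directly, so the DNS lookup runs on the event loop instead of a background thread. A minimal sketch, assuming a recent aiohttp with aiodns installed (API details may differ across versions):
import asyncio

import aiohttp
from aiohttp.resolver import AsyncResolver

async def fetch(url):
    # AsyncResolver uses aiodns, so the lookup does not block a thread
    connector = aiohttp.TCPConnector(resolver=AsyncResolver())
    timeout = aiohttp.ClientTimeout(total=1)
    async with aiohttp.ClientSession(connector=connector, timeout=timeout) as session:
        async with session.get(url) as resp:
            return resp.status

print(asyncio.run(fetch('http://alicebluejewelers.com/')))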
