Tornado async call to a function - python

I am making a web application using Python + Tornado which basically serves files to users. I have no database.
The files are either directly picked up and served if they are available, or generated on the fly if not.
I want the clients to be served in an async manner, because some files may already be available, while others need to be generated (thus they need to wait, and I don't want them to block other users).
I have a class that manages the picking or generation of files, and I just need to call it from Tornado.
What is the best way (most efficient on CPU and RAM) to achieve that? Should I use a thread? A sub process? A simple gen.Task like this one?
Also, I would like my implementation to work on Google App Engine (I think it does not allow sub processes to be spawned?).
I'm relatively new to async web serving, so any help is welcome.

I've found the answers to my questions: the gen.Task example is indeed the best way to implement an async call. It works because the example uses a Python coroutine, which I didn't understand at first glance because I thought yield was only used to return values from generators.
Concrete example:
class MyHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @gen.engine
    def get(self):
        response = yield gen.Task(self.dosomething, 'argument')
What is important here is the combination of two things:
yield, which in fact spawns a coroutine (or pseudo-thread), which is very efficient and designed to be highly concurrency-friendly.
http://www.python.org/dev/peps/pep-0342/
gen.Task(), which is a non-blocking (async) function, because if you spawn a coroutine on a blocking function, it won't be async. gen.Task() is provided by Tornado specifically to work with the coroutine syntax of Python. More info:
http://www.tornadoweb.org/documentation/gen.html
So a canonical example of an async call in Python using coroutines:
response = yield non_blocking_func(**kwargs)
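For illustration, here is a minimal, hedged sketch of how the pieces above fit together on the older Tornado API this answer targets; dosomething is a made-up callback-style helper whose result is delivered through the callback keyword argument that gen.Task supplies.

import tornado.web
from tornado import gen
from tornado.ioloop import IOLoop

class MyHandler(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    @gen.engine
    def get(self):
        # gen.Task calls self.dosomething('argument', callback=...) and the
        # coroutine resumes with whatever value gets passed to that callback.
        response = yield gen.Task(self.dosomething, 'argument')
        self.write(response)
        self.finish()

    def dosomething(self, argument, callback):
        # Simulate deferred work: report the result on a later IOLoop
        # iteration instead of blocking.
        IOLoop.instance().add_callback(lambda: callback('done: ' + argument))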

The documentation now has a solution for this.
Simple example:
import os.path

import tornado.web
from tornado import gen


class MyHandler(tornado.web.RequestHandler):

    @gen.coroutine
    def get(self, filename):
        result = yield self.some_usefull_process(filename)
        self.write(result)

    @gen.coroutine
    def some_usefull_process(self, filename):
        if not os.path.exists(filename):
            status = yield self.generate_file(filename)
            result = 'File created'
        else:
            result = 'File exists'
        raise gen.Return(result)

    @gen.coroutine
    def generate_file(self, filename):
        fd = open(filename, 'w')
        fd.write('created')
        fd.close()
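For comparison, here is a rough modern equivalent of the example above using native coroutines (Tornado 5+ / Python 3.5+); the method names mirror the original and the file handling is kept deliberately simple, so treat it as a sketch rather than a drop-in replacement.

import os.path
import tornado.web

class MyHandler(tornado.web.RequestHandler):
    async def get(self, filename):
        result = await self.some_useful_process(filename)
        self.write(result)

    async def some_useful_process(self, filename):
        if not os.path.exists(filename):
            await self.generate_file(filename)
            return 'File created'
        return 'File exists'

    async def generate_file(self, filename):
        # Still plain blocking file I/O; fine for tiny files, otherwise
        # offload it with IOLoop.run_in_executor.
        with open(filename, 'w') as fd:
            fd.write('created')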

Related

how to use httpx.AsyncClient as class member, and close asynchronously

I want to use an HTTP client as a class member, but the __del__ method cannot call await client.aclose().
e.g.:
import httpx

class Foo(object):
    def __init__(self):
        self.client = httpx.AsyncClient()

    def __del__(self):
        await self.client.aclose()
refer: https://www.python-httpx.org/async/#opening-and-closing-clients
How can I safely call aclose()?
Although this is an older question, I may have something compelling to share as I had a similar situation. To @Isabi's point (answered 2020-12-28), you need to use an event loop to decouple the client from your operations and then manually control its lifecycle.
In my case, I need more control over the client, such that I can separate building the request from sending it and from when the client is closed, so I can take advantage of session pooling, etc. The example provided below shows how to use httpx.AsyncClient as a class member and close it on exit.
In figuring this out, I bumped into an Asyncio learning curve but quickly discovered that it's ... actually not too bad. It's not as clean as Go[lang] but it starts making sense after an hour or two of fiddling around with it. Full disclosure: I still question whether this is 100% correct.
The critical pieces are in the __init__, close, and __del__ methods. What, to me, remains to be answered is whether using httpx.AsyncClient in a context manager actually resets connections, etc. I can only assume it does because that's what makes sense to me. I can't help but wonder: is this even necessary?
import asyncio
import httpx
import time
from typing import Callable, List
from rich import print


class DadJokes:
    headers = dict(Accept='application/json')

    def __init__(self):
        """
        Since we want to reuse the client, we can't use a context manager that closes it.
        We need to use a loop to exert more control over when the client is closed.
        """
        self.client = httpx.AsyncClient(headers=self.headers)
        self.loop = asyncio.get_event_loop()

    async def close(self):
        # httpx.AsyncClient.aclose must be awaited!
        await self.client.aclose()

    def __del__(self):
        """
        A destructor is provided to ensure that the client and the event loop are closed at exit.
        """
        # Use the loop to call the async close, then stop/close the loop.
        self.loop.run_until_complete(self.close())
        self.loop.close()

    async def _get(self, url: str, idx: int = None):
        start = time.time()
        response = await self.client.get(url)
        print(response.json(), int((time.time() - start) * 1000), idx)

    def get(self, url: str):
        self.loop.run_until_complete(self._get(url))

    def get_many(self, urls: List[str]):
        start = time.time()
        group = asyncio.gather(*(self._get(url, idx=idx) for idx, url in enumerate(urls)))
        self.loop.run_until_complete(group)
        print("Runtime: ", int((time.time() - start) * 1000))


url = 'https://www.icanhazdadjoke.com'
dj = DadJokes()
dj.get_many([url for x in range(4)])
Since I've been using Go as of late, I originally wrote some of these methods with closures as they seemed to make sense; in the end I was able to (IMHO) provide a nice balance between separation / encapsulation / isolation by converting the closures to class methods.
The resulting usage interface feels approachable and easy to read - I see myself writing class based async moving forward.
The problem might be due to the fact that client.aclose() returns an awaitable, which cannot be awaited in a normal def function.
It could be worth giving asyncio.run(self.client.aclose()) a try. An exception might occur here, complaining that you are using a different event loop (or the same one, I don't know much about your context so I can't tell) from the currently running one. In that case you could get the currently running event loop and run the function from there.
See https://docs.python.org/3/library/asyncio-eventloop.html for more information on how you could accomplish it.
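As a rough sketch of that suggestion (assuming Python 3.7+, and a hypothetical close() method that is called explicitly rather than relying on __del__):

import asyncio
import httpx

class Foo:
    def __init__(self):
        self.client = httpx.AsyncClient()

    def close(self):
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            # No loop is running: spin one up just to await aclose().
            asyncio.run(self.client.aclose())
        else:
            # A loop is already running: schedule aclose() on it instead
            # (the task may not finish if the loop stops right away).
            loop.create_task(self.client.aclose())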

Asynchronous download of files with twisted and (tx)requests

I'm trying to download file(s) from the internet from within a Twisted application. I'd like to do this using requests, due to the other features it provides directly or through well-maintained libraries (retries, proxies, cache control, etc.). I am open to a Twisted-only solution which does not have these features, but I can't seem to find one anyway.
The files should be expected to be fairly large and will be downloaded on slow connections. I'm therefore using requests' stream=True interface and the response's iter_content. A more or less complete code fragment is listed at the end of this question. The entry point for this would be http_download function, called with a url, a dst to write the file to, and a callback and an optional errback to handle a failed download. I've stripped away some of the code involved in preparing the destination (create folders, etc) and code to close the session during reactor exit but I think it should still work as is.
This code works. The file is downloaded, and the Twisted reactor continues to operate. However, I seem to have a problem with this bit of code:
def _stream_download(r, f):
    for chunk in r.iter_content(chunk_size=128):
        f.write(chunk)
        yield None

cooperative_dl = cooperate(_stream_download(response, filehandle))
Because iter_content returns only when it has a chunk to return, the reactor handles a chunk, runs other bits of code, then returns to waiting for the next chunk instead of keeping itself busy updating a spinning wait animation on the GUI (code not actually posted here).
Here's the question -
Is there a way to get twisted to operate on this generator in such a way that it yields control when the generator itself is not prepared to yield something? I came across some docs for twisted.flow which seemed appropriate, but this does not seem to have made it into twisted or no longer exists today. This question can be read independent of the specifics, i.e., with respect to any arbitrary blocking generator, or can be read in the immediate context of the question.
Is there a way to get twisted to download files asynchronously using something full-featured like requests? Is there an existing twisted module which just does this which I can just use?
What would the basic approach be to such a problem with Twisted, independent of the HTTP features I want to use from requests? Let's assume I'm prepared to ditch them or otherwise implement them. How would I download a file asynchronously over HTTP?
import os
import re
from functools import partial

from six.moves.urllib.parse import urlparse
from requests import HTTPError
from twisted.internet.task import cooperate
from txrequests import Session


class HttpClientMixin(object):
    def __init__(self, *args, **kwargs):
        self._http_session = None

    def http_download(self, url, dst, callback, errback=None, **kwargs):
        dst = os.path.abspath(dst)
        # Log request
        deferred_response = self.http_session.get(url, stream=True, **kwargs)
        deferred_response.addCallback(self._http_check_response)
        deferred_response.addCallbacks(
            partial(self._http_download, destination=dst, callback=callback),
            partial(self._http_error_handler, url=url, errback=errback)
        )

    def _http_download(self, response, destination=None, callback=None):
        def _stream_download(r, f):
            for chunk in r.iter_content(chunk_size=128):
                f.write(chunk)
                yield None

        def _rollback(r, f, d):
            if r:
                r.close()
            if f:
                f.close()
            if os.path.exists(d):
                os.remove(d)

        filehandle = open(destination, 'wb')
        cooperative_dl = cooperate(_stream_download(response, filehandle))
        cooperative_dl.whenDone().addCallback(lambda _: response.close)
        cooperative_dl.whenDone().addCallback(lambda _: filehandle.close)
        cooperative_dl.whenDone().addCallback(
            partial(callback, url=response.url, destination=destination)
        )
        cooperative_dl.whenDone().addErrback(
            partial(_rollback, r=response, f=filehandle, d=destination)
        )

    def _http_error_handler(self, failure, url=None, errback=None):
        failure.trap(HTTPError)
        # Log error message
        if errback:
            errback(failure)

    @staticmethod
    def _http_check_response(response):
        response.raise_for_status()
        return response

    @property
    def http_session(self):
        if not self._http_session:
            # Log session start
            self._http_session = Session()
        return self._http_session
Is there a way to get twisted to operate on this generator in such a way that it yields control when the generator itself is not prepared to yield something?
No. All Twisted can do is invoke the code. If the code blocks indefinitely, then the calling thread is blocked indefinitely. This is a basic premise of the Python runtime.
Is there a way to get twisted to download files asynchronously using something full-featured like requests?
There's treq. You didn't say what "full-featured" means here but earlier you mentioned "retries", "proxies", and "cachecontrol". I don't believe treq currently has these features. You can find some kind of feature matrix in the treq docs (though I notice it doesn't include any of the features you mentioned - even for requests). I expect implementations of such features would be welcome as treq contributions.
Is there a way to get twisted to download files asynchronously using something full-featured like requests?
Run it in a thread - probably using Twisted's threadpool APIs.
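A minimal sketch of that thread-based approach, assuming a plain blocking download helper built on requests (names are illustrative):

import requests
from twisted.internet.threads import deferToThread

def blocking_download(url, dst):
    # Ordinary blocking requests code, unchanged.
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(dst, 'wb') as f:
            for chunk in r.iter_content(chunk_size=128):
                f.write(chunk)
    return dst

def download(url, dst):
    # Runs blocking_download in the reactor's thread pool and returns a
    # Deferred that fires with its result (or a Failure).
    return deferToThread(blocking_download, url, dst)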
What would the basic approach be to such a problem with twisted, independent of the http features I want to use from requests.
treq.
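For completeness, a hedged sketch of what a treq-based download might look like, streaming the response body to a file as chunks arrive (the URL and filename are placeholders):

import treq
from twisted.internet import defer
from twisted.internet.task import react

@defer.inlineCallbacks
def download(reactor, url, dst):
    response = yield treq.get(url)
    with open(dst, 'wb') as f:
        # treq.collect calls f.write for every received chunk and returns a
        # Deferred that fires once the body has been fully consumed.
        yield treq.collect(response, f.write)

if __name__ == '__main__':
    react(download, ['https://example.com/file.bin', 'file.bin'])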

Python asyncio task got bad yield

I am confused about how to play around with the asyncio module in Python 3.4. I have a searching API for a search engine, and I want each search request to run either in parallel or asynchronously, so that I don't have to wait for one search to finish before starting another.
Here is my high-level searching API to build some objects with the raw search results. The search engine itself is using some kind of asyncio mechanism, so I won't bother with that.
# No asyncio module used here now
class search(object):
    ...
    self.s = some_search_engine()
    ...
    def searching(self, *args, **kwargs):
        ret = {}
        # do some raw searching according to args and kwargs and build the wrapped results
        ...
        return ret
To try to make the requests async, I wrote the following test case to see how I can make my code interact with the asyncio module.
# Here is my testing script
@asyncio.coroutine
def handle(f, *args, **kwargs):
    r = yield from f(*args, **kwargs)
    return r

s = search()
loop = asyncio.get_event_loop()
loop.run_until_complete(handle(s.searching, arg1, arg2, ...))
loop.close()
By running with pytest, it will return a RuntimeError: Task got bad yield : {results from searching...}, when it hits the line r = yield from ....
I also tried another way.
# same handle as above
def handle(..):
    ....

s = search()
loop = asyncio.get_event_loop()
tasks = [
    asyncio.async(handle(s.searching, arg11, arg12, ...)),
    asyncio.async(handle(s.searching, arg21, arg22, ...)),
    ...
]
loop.run_until_complete(asyncio.wait(tasks))
loop.close()
Running this test case with pytest passes, but some weird exception from the search engine is raised, and it says Future/Task exception was never retrieved.
Things I wish to ask:
For my 1st try, is that the right way to use yield from, by returning the actual result from a function call?
I think I need to add some sleep to my 2nd test case to wait for the tasks to finish, but how should I do that? And how can I get my function calls to return in my 2nd test case?
Is that a good way to implement asyncio with an existing module, by creating an async handler to handle requests?
If the answer to question 2 is NO, does every client call to the search class need to include loop = get_event_loop() and similar boilerplate to make the requests async?
The problem is that you can't just call existing synchronous code as if it was an asyncio.coroutine and get asynchronous behavior. When you call yield from searching(...), you're only going to get asynchronous behavior if searching itself is actually an asyncio.coroutine, or at least returns an asyncio.Future. Right now, searching is just a regular synchronous function, so calling yield from searching(...) is just going to throw an error, because it doesn't return a Future or coroutine.
To get the behavior you want, you'll need to have an asynchronous version of searching in addition to a synchronous version (or just drop the synchronous version altogether if you don't need it). You have a few options to support both:
Rewrite searching as an asyncio.coroutine so that it uses asyncio-compatible calls to do its I/O, rather than blocking I/O. This will make it work in an asyncio context, but it means you won't be able to call it directly in a synchronous context anymore. Instead, you'd need to also provide an alternative synchronous searching method that starts an asyncio event loop and calls return loop.run_until_complete(self.searching(...)). See this question for more details on that.
Keep your synchronous implementation of searching, and provide an alternative asynchronous API that uses BaseEventLoop.run_in_executor to run the searching method in a background thread:
class search(object):
    ...
    self.s = some_search_engine()
    ...
    def searching(self, *args, **kwargs):
        ret = {}
        ...
        return ret

    @asyncio.coroutine
    def searching_async(self, *args, **kwargs):
        loop = kwargs.get('loop', asyncio.get_event_loop())
        try:
            del kwargs['loop']  # assuming searching doesn't take loop as an arg
        except KeyError:
            pass
        r = yield from loop.run_in_executor(None, self.searching, *args)  # Passing None tells asyncio to use the default ThreadPoolExecutor
        return r
Testing script:
s = search()
loop = asyncio.get_event_loop()
loop.run_until_complete(s.searching_async(arg1, arg2, ...))
loop.close()
This way, you can keep your synchronous code as is, and at least provide methods that can be used in asyncio code without blocking the event loop. It's not as clean a solution as it would be if you actually used asynchronous I/O in your code, but it's better than nothing.
Provide two completely separate versions of searching, one that uses blocking I/O, and one that's asyncio-compatible. This gives ideal implementations for both contexts, but requires twice the work.

gevent to Tornado ioloop - Structure code with coroutines/generators

I'm trying to convert some fairly straightforward gevent code to use the async facilities of Tornado. The sample code below uses the ZMQ library to do a very simple request-response.
import zmq.green as zmq

def fun():
    i = zmq.Context.instance()
    sock = i.socket(zmq.REQ)
    sock.connect('tcp://localhost:9005')
    sock.send('Ping')
    return sock.recv()
I can run this as fun() anywhere in my code. The .recv() call blocks while waiting for a reply, and the gevent hub can schedule the other parts of the code. When values are received, the function returns the value.
I have read about the problems that can arise with these implicit returns, and I want to run this using the Tornado IOLoop (also because I want to run it within the IPython Notebook). The following is an option, where recv_future() returns a Future that contains the result:
@gen.coroutine
def fun():
    i = zmq.Context.instance()
    sock = i.socket(zmq.REQ)
    sock.connect('tcp://localhost:9005')
    sock.send('Ping')
    msg = yield recv_future(sock)
    print "Received {}".format(msg[0])
    raise gen.Return(msg)

def recv_future(socket):
    zmqstream = ZMQStream(socket)  # Required for ZMQ
    future = Future()
    def _finish(reply):
        future.set_result(reply)
    zmqstream.on_recv(_finish)
    return future
The problem is that now fun() is not a function, but is a generator. So if I need to call it from another function, I need to use yield fun(). But then the calling function also becomes a generator!
What is the right way to structure code that uses Python generators? Do I have to make every function a generator to make it work? What if I need to call one of these functions from __init__()? Should that also become a generator?
What if I need to call one of these functions from __init__()? Should that also become a generator?
This is one of the currently unsolved issues with explicit asynchronous programming via yield/yield from (on Python 3.3+). Magic methods don't support them. You can read some interesting thoughts from a Python core developer on asynchronous programming that touch on this issue here.
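One common workaround (a sketch, not the only option) is to keep __init__ synchronous and move the asynchronous setup into a coroutine factory method that callers yield instead:

from tornado import gen

class Client(object):
    def __init__(self):
        self.reply = None  # filled in by create()

    @classmethod
    @gen.coroutine
    def create(cls):
        # Build the instance synchronously, then do the async work.
        self = cls()
        self.reply = yield fun()  # fun() is the coroutine defined above
        raise gen.Return(self)

# Usage, from inside another coroutine:
#     client = yield Client.create()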
What is the right way to structure code that uses Python generators?
Do I have to make every function a generator to make it work?
Not every function, but every function from which you want to call a coroutine and wait for that coroutine to finish before continuing. When you switch to an explicit asynchronous programming model, you generally want to go all-in with it - your entire program runs inside the Tornado ioloop. So, with this toy example, you would just do:
import zmq
from zmq.eventloop.zmqstream import ZMQStream

from tornado.ioloop import IOLoop
from tornado import gen
from tornado.concurrent import Future

@gen.coroutine
def fun():
    i = zmq.Context.instance()
    sock = i.socket(zmq.REQ)
    sock.connect('tcp://localhost:9005')
    sock.send('Ping')
    msg = yield recv_future(sock)
    print "Received {}".format(msg[0])
    raise gen.Return(msg)

def recv_future(socket):
    zmqstream = ZMQStream(socket)  # Required for ZMQ
    future = Future()
    def _finish(reply):
        future.set_result(reply)
    zmqstream.on_recv(_finish)
    return future

if __name__ == "__main__":
    ioloop = IOLoop.instance()
    ioloop.add_callback(fun)
    ioloop.start()  # This will run fun, and then block forever.
    #ioloop.run_sync(fun)  # This will start the ioloop, run fun, then stop the ioloop
It looks like you might be able to get access to the ioloop IPython is using via the IPython.kernel API:
In [4]: from IPython.kernel.ioloop import manager
In [5]: manager.ioloop.IOLoop.instance()
Out[5]: <zmq.eventloop.ioloop.ZMQIOLoop at 0x4249ac8>

Simple async example with tornado python

I want to find a simple async server example.
I have a function with a lot of waiting, database transactions, etc.:
def blocking_task(n):
    for i in xrange(n):
        print i
        sleep(1)
    return i
I need to run this function in a separate process without blocking. Is that possible?
Tornado is designed to run all your operations in a single thread, but utilize asynchronous I/O to avoid blocking as much as possible. If the DB you're using has asynchronous Python bindings (ideally ones geared for Tornado specifically, like Motor for MongoDB or momoko for Postgres), then you'll be able to run your DB queries without blocking the server; no separate processes or threads needed.
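To make that concrete, here is a hedged sketch of what a non-blocking query might look like with Motor; the database name, collection, and query are made up for illustration.

import tornado.web
from tornado import gen
from motor.motor_tornado import MotorClient

db = MotorClient().example_db  # hypothetical database

class UserHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self, user_id):
        # The driver returns a Future, so the IOLoop keeps serving other
        # requests while this query is in flight.
        doc = yield db.users.find_one({'_id': user_id})
        self.write(doc or {})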
To address the exact example you gave, where time.sleep(1) is called, you could use this approach to do it asynchronously via tornado coroutines:
#!/usr/bin/python

import tornado.web
from tornado.ioloop import IOLoop
from tornado import gen
import time

@gen.coroutine
def async_sleep(seconds):
    yield gen.Task(IOLoop.instance().add_timeout, time.time() + seconds)

class TestHandler(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self):
        for i in xrange(100):
            print i
            yield async_sleep(1)
        self.write(str(i))
        self.finish()

application = tornado.web.Application([
    (r"/test", TestHandler),
])

application.listen(9999)
IOLoop.instance().start()
The interesting part is async_sleep. That method is creating an asynchronous Task, which is calling the ioloop.add_timeout method. add_timeout will run a specified callback after a given number of seconds, without blocking the ioloop while waiting for the timeout to expire. It expects two arguments:
add_timeout(deadline, callback) # deadline is the number of seconds to wait, callback is the method to call after deadline.
As you can see in the example above, we're only actually providing one parameter to add_timeout explicitly in the code, which means we end up with this:
add_timeout(time.time() + seconds, ???)
We're not providing the expected callback parameter. In fact, when gen.Task executes add_timeout, it appends a callback keyword argument to the end of the explicitly provided parameters. So this:
yield gen.Task(loop.add_timeout, time.time() + seconds)
Results in this being executed inside gen.Task():
loop.add_timeout(time.time() + seconds, callback=gen.Callback(some_unique_key))
When gen.Callback is executed after the timeout, it signals that the gen.Task is complete, and the program execution will continue on to the next line. This flow is kind of difficult to fully understand, at least at first (it certainly was for me when I first read about it). It'll probably be helpful to read over the Tornado gen module documentation a few times.
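As an aside (not part of the original answer), newer Tornado versions ship a helper that hides this plumbing: gen.sleep returns a Future that resolves after the delay, so the hand-rolled async_sleep above reduces to roughly this:

from tornado import gen

@gen.coroutine
def async_sleep(seconds):
    # gen.sleep (Tornado 4.1+) yields control back to the IOLoop until
    # the timeout expires, without blocking.
    yield gen.sleep(seconds)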
import tornado.web
from tornado.ioloop import IOLoop
from tornado import gen
from tornado.concurrent import run_on_executor
from concurrent.futures import ThreadPoolExecutor  # `pip install futures` for python2

MAX_WORKERS = 16

class TestHandler(tornado.web.RequestHandler):
    executor = ThreadPoolExecutor(max_workers=MAX_WORKERS)

    """
    In below function goes your time consuming task
    """
    @run_on_executor
    def background_task(self):
        sm = 0
        for i in range(10 ** 8):
            sm = sm + 1
        return sm

    @tornado.gen.coroutine
    def get(self):
        """ Request that asynchronously calls background task. """
        res = yield self.background_task()
        self.write(str(res))

class TestHandler2(tornado.web.RequestHandler):
    @gen.coroutine
    def get(self):
        self.write('Response from server')
        self.finish()

application = tornado.web.Application([
    (r"/A", TestHandler),
    (r"/B", TestHandler2),
])

application.listen(5000)
IOLoop.instance().start()
When you run the above code, you can trigger a computationally expensive operation at http://127.0.0.1:5000/A without blocking execution, as you can see by visiting http://127.0.0.1:5000/B immediately after you visit http://127.0.0.1:5000/A.
Here I update the information for Tornado 5.0. Tornado 5.0 adds a new method, IOLoop.run_in_executor. From the "Calling blocking functions" section of the Coroutine patterns chapter:
The simplest way to call a blocking function from a coroutine is to use IOLoop.run_in_executor, which returns Futures that are compatible with coroutines:
@gen.coroutine
def call_blocking():
    yield IOLoop.current().run_in_executor(None, blocking_func, args)
Also, the documentation of run_on_executor says:
This decorator should not be confused with the similarly-named IOLoop.run_in_executor. In general, using run_in_executor when calling a blocking method is recommended instead of using this decorator when defining a method. If compatibility with older versions of Tornado is required, consider defining an executor and using executor.submit() at the call site.
In version 5.0, IOLoop.run_in_executor is the recommended way to call blocking functions.
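A short sketch of the two spellings mentioned above - run_in_executor on Tornado 5.0+, and an explicit executor with submit() for older versions; blocking_func here is just a stand-in for any blocking call:

import time
from concurrent.futures import ThreadPoolExecutor
from tornado import gen
from tornado.ioloop import IOLoop

def blocking_func(seconds):
    time.sleep(seconds)  # stands in for any blocking call
    return seconds

executor = ThreadPoolExecutor(max_workers=4)

@gen.coroutine
def call_blocking_new(seconds):
    # Tornado 5.0+: run_in_executor returns a Future that coroutines can yield.
    result = yield IOLoop.current().run_in_executor(None, blocking_func, seconds)
    raise gen.Return(result)

@gen.coroutine
def call_blocking_old(seconds):
    # Older Tornado: decorated coroutines can yield concurrent.futures Futures,
    # so executor.submit() at the call site also works.
    result = yield executor.submit(blocking_func, seconds)
    raise gen.Return(result)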
Python 3.5 introduced the async and await keywords (functions using these keywords are also called “native coroutines”). For compatibility with older versions of Python, you can use “decorated” or “yield-based” coroutines using the tornado.gen.coroutine decorator.
Native coroutines are the recommended form whenever possible. Only use decorated coroutines when compatibility with older versions of Python is required. Examples in the Tornado documentation will generally use the native form.
Translation between the two forms is generally straightforward:
# Decorated: normal function declaration with a decorator.
@gen.coroutine
def a():
    # "yield" all async funcs
    b = yield c()
    # "return" and "yield" cannot be mixed in Python 2,
    # so raise a special exception.
    raise gen.Return(b)

# Native: "async def" keywords.
async def a():
    # "await" all async funcs
    b = await c()
    # Return normally
    return b
Other differences between the two forms of coroutine are outlined below.
Native coroutines:
are generally faster.
can use async for and async with statements which make some patterns much simpler.
do not run at all unless you await or yield them. Decorated coroutines can start running “in the background” as soon as they are called. Note that for both kinds of coroutines it is important to use await or yield so that any exceptions have somewhere to go.
Decorated coroutines:
have additional integration with the concurrent.futures package, allowing the result of executor.submit to be yielded directly. For native coroutines, use IOLoop.run_in_executor instead.
support some shorthand for waiting on multiple objects by yielding a list or dict. Use tornado.gen.multi to do this in native coroutines (see the sketch after this list).
can support integration with other packages including Twisted via a registry of conversion functions. To access this functionality in native coroutines, use tornado.gen.convert_yielded.
always return a Future object. Native coroutines return an awaitable object that is not a Future. In Tornado the two are mostly interchangeable.
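A brief, hedged sketch of the gen.multi point above, waiting on several awaitables in parallel from a native coroutine (the URLs are placeholders):

from tornado import gen
from tornado.httpclient import AsyncHTTPClient
from tornado.ioloop import IOLoop

async def fetch_both():
    client = AsyncHTTPClient()
    # gen.multi waits on a list (or dict) of awaitables concurrently.
    first, second = await gen.multi([
        client.fetch('http://example.com/a'),
        client.fetch('http://example.com/b'),
    ])
    return first.body, second.body

# Run with: IOLoop.current().run_sync(fetch_both)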
Worth seeing:
Simplest async/await example
