In Python I want to create an async method in a class that starts a thread without blocking the main thread. When the new thread finishes, I want to return a value from that function/thread.
For example, the class is used to retrieve some information from web pages. I want to run the processing in parallel, in a function that downloads the page and returns an object.
from threading import Thread

class WebDown:
    def display(self, url):
        print 'display(): ' + content

    def download(self, url):
        thread = Thread(target=self.get_info, args=(url,))
        # thread join
        print 'download(): ' + content
        # return the info

    def get_info(self, url):
        # download page
        # retrieve info
        return info

if __name__ == '__main__':
    wd = WebDown()
    ret = wd.download('http://...')
    wd.display('http://...')
In this example I first call download() to retrieve the info, then display() to print other information. The print output should be
display(): foo, bar, ....
download(): blue, red, ....
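For reference, the non-blocking pattern the question describes can be sketched with the standard library's concurrent.futures: download() returns immediately with a Future, and the value is only waited for when it is actually needed. This is a minimal sketch (Python 3), not the asker's code; the page-parsing step is a placeholder:

from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen  # Python 3

class WebDown:
    def __init__(self):
        self.executor = ThreadPoolExecutor(max_workers=4)

    def download(self, url):
        # Returns immediately with a Future; the download runs in a worker thread.
        return self.executor.submit(self.get_info, url)

    def get_info(self, url):
        # Download the page; extracting the interesting info is left as a placeholder.
        return urlopen(url).read()

if __name__ == '__main__':
    wd = WebDown()
    future = wd.download('http://httpbin.org/html')
    # ... the main thread is free to do other work here ...
    info = future.result()  # blocks only when the value is actually needed
    print('download():', info[:40])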
One way to write asynchronous, non-blocking code in Python is with Twisted. Twisted does not rely on multithreading or multiprocessing; it is built around a single-threaded event loop (the reactor). It gives you a convenient way to create Deferred objects and add callbacks and errbacks to them. The example you give would look like this in Twisted; I'm using the treq (Twisted Requests) library, which makes generating requests a little quicker and easier:
from treq import get
from twisted.internet import reactor

class WebAsync(object):
    def download(self, url):
        request = get(url)
        request.addCallback(self.deliver_body)
        return request

    def deliver_body(self, response):
        deferred = response.text()
        deferred.addCallback(self.display)
        return deferred

    def display(self, response_body):
        print response_body
        reactor.stop()

if __name__ == "__main__":
    web_client = WebAsync()
    web_client.download("http://httpbin.org/html")
    reactor.run()
Both the 'download' and 'deliver_body' methods return deferreds; you add callbacks to them that are executed when the results are available.
I would simply use requests together with gevent, packaged as the grequests library.
>>> import grequests
>>> urls = [
...     'http://...',
...     'http://...'
... ]
>>> rs = (grequests.get(u) for u in urls)
>>> grequests.map(rs)
[<Response [200]>, <Response [200]>]
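As a usage note: grequests.map also accepts a size argument to cap concurrency and an exception_handler callback for failed requests; a short sketch, assuming the same rs generator as above:

>>> def on_error(request, exception):
...     print('request failed:', exception)
>>> grequests.map(rs, size=10, exception_handler=on_error)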
Using Tornado, I have a POST request that takes a long time, as it makes many requests to another API service and processes the data. This can take minutes to fully complete. I don't want this to block the entire web server from responding to other requests, which it currently does.
I looked at multiple threads here on SO, but they are often 8 years old and the code no longer works, as Tornado removed the "engine" component from tornado.gen.
Is there an easy way to kick off this long call and not have it block the entire web server in the process? Is there anything I can put in the code to say: "submit the POST response and work on this one function without blocking any concurrent server requests from getting an immediate response"?
Example:
main.py
def make_app():
    return tornado.web.Application([
        (r"/v1", MainHandler),
        (r"/v1/addfile", AddHandler, dict(folderpaths=folderpaths)),
        (r"/v1/getfiles", GetHandler, dict(folderpaths=folderpaths)),
        (r"/v1/getfile", GetFileHandler, dict(folderpaths=folderpaths)),
    ])

if __name__ == "__main__":
    app = make_app()
    sockets = tornado.netutil.bind_sockets(8888)
    tornado.process.fork_processes(0)
    tornado.process.task_id()
    server = tornado.httpserver.HTTPServer(app)
    server.add_sockets(sockets)
    tornado.ioloop.IOLoop.current().start()
addHandler.py
class AddHandler(tornado.web.RequestHandler):
    def initialize(self, folderpaths):
        self.folderpaths = folderpaths

    def blockingFunction(self):
        time.sleep(320)
        post("AWAKE")

    def post(self):
        user = self.get_argument('user')
        folderpath = self.get_argument('inpath')
        outpath = self.get_argument('outpath')
        workflow_value = self.get_argument('workflow')
        status_code, status_text = validateInFolder(folderpath)
        if (status_code == 200):
            logging.info("Status Code 200")
            result = self.folderpaths.add_file(user, folderpath, outpath, workflow_value)
            self.write(result)
            self.finish()
            # At this point the path is validated.
            # POST response should be sent out. Internal process should continue,
            # new requests should not be blocked.
            self.blockingFunction()
The idea is that if the input parameters are validated, the POST response should be sent out. Then the internal process (blockingFunction()) should start, without blocking the Tornado server from processing another API POST request.
I tried defining blockingFunction() as async, which allowed me to process multiple concurrent user requests; however, there was a warning about a missing "await" with the async method.
Any help welcome. Thank you.
You can declare post() as a coroutine and push the blocking work onto a thread pool with run_in_executor, so the response goes out first and the long-running work continues without blocking the server:

import asyncio

class AddHandler(tornado.web.RequestHandler):
    def initialize(self, folderpaths):
        self.folderpaths = folderpaths

    def blockingFunction(self):
        time.sleep(320)
        post("AWAKE")

    async def post(self):
        user = self.get_argument('user')
        folderpath = self.get_argument('inpath')
        outpath = self.get_argument('outpath')
        workflow_value = self.get_argument('workflow')
        status_code, status_text = validateInFolder(folderpath)
        if (status_code == 200):
            logging.info("Status Code 200")
            result = self.folderpaths.add_file(user, folderpath, outpath, workflow_value)
            self.write(result)
            self.finish()
            # At this point the path is validated and the POST response has been
            # sent out. The internal process continues; new requests are not blocked.
            loop = asyncio.get_event_loop()
            await loop.run_in_executor(None, self.blockingFunction)
            # if this had multiple parameters it would be
            # await loop.run_in_executor(None, self.blockingFunction, param1, param2)
Thank you @xyres
Further read: https://www.tornadoweb.org/en/stable/faq.html
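For what it's worth, newer Tornado (5.0+) exposes the same mechanism directly on its IOLoop, so the asyncio loop never has to be touched explicitly; a minimal sketch of just the changed lines, with everything else as above:

    async def post(self):
        # ... validation, self.write(result), self.finish() as above ...
        await tornado.ioloop.IOLoop.current().run_in_executor(
            None, self.blockingFunction)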
I am trying to understand how to handle a gRPC API with bidirectional streaming (using the Python API).
Say I have the following simple server definition:
syntax = "proto3";
package simple;
service TestService {
rpc Translate(stream Msg) returns (stream Msg){}
}
message Msg
{
string msg = 1;
}
Say that the messages sent from the client arrive asynchronously (as a consequence of the user selecting some UI elements).
The generated Python stub for the client will contain a method Translate that accepts a generator function and returns an iterator.
What is not clear to me is how I would write a generator function that returns messages as they are created by the user. Sleeping on the thread while waiting for messages doesn't sound like the best solution.
This is a bit clunky right now, but you can accomplish your use case as follows:
#!/usr/bin/env python
from __future__ import print_function

import time
import random
import collections
import threading
from concurrent import futures

import grpc
from translate_pb2 import Msg
from translate_pb2_grpc import TestServiceStub
from translate_pb2_grpc import TestServiceServicer
from translate_pb2_grpc import add_TestServiceServicer_to_server


def translate_next(msg):
    return ''.join(reversed(msg))


class Translator(TestServiceServicer):
    def Translate(self, request_iterator, context):
        for req in request_iterator:
            print("Translating message: {}".format(req.msg))
            yield Msg(msg=translate_next(req.msg))


class TranslatorClient(object):
    def __init__(self):
        self._stop_event = threading.Event()
        self._request_condition = threading.Condition()
        self._response_condition = threading.Condition()
        self._requests = collections.deque()
        self._last_request = None
        self._expected_responses = collections.deque()
        self._responses = {}

    def _next(self):
        with self._request_condition:
            while not self._requests and not self._stop_event.is_set():
                self._request_condition.wait()
            if len(self._requests) > 0:
                return self._requests.popleft()
            else:
                raise StopIteration()

    def next(self):
        return self._next()

    def __next__(self):
        return self._next()

    def add_response(self, response):
        with self._response_condition:
            request = self._expected_responses.popleft()
            self._responses[request] = response
            self._response_condition.notify_all()

    def add_request(self, request):
        with self._request_condition:
            self._requests.append(request)
            with self._response_condition:
                self._expected_responses.append(request.msg)
            self._request_condition.notify()

    def close(self):
        self._stop_event.set()
        with self._request_condition:
            self._request_condition.notify()

    def translate(self, to_translate):
        self.add_request(to_translate)
        with self._response_condition:
            while True:
                self._response_condition.wait()
                if to_translate.msg in self._responses:
                    return self._responses[to_translate.msg]


def _run_client(address, translator_client):
    with grpc.insecure_channel(address) as channel:
        stub = TestServiceStub(channel)
        responses = stub.Translate(translator_client)
        for resp in responses:
            translator_client.add_response(resp)


def main():
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    add_TestServiceServicer_to_server(Translator(), server)
    server.add_insecure_port('[::]:50054')
    server.start()
    translator_client = TranslatorClient()
    client_thread = threading.Thread(
        target=_run_client, args=('localhost:50054', translator_client))
    client_thread.start()

    def _translate(to_translate):
        return translator_client.translate(Msg(msg=to_translate)).msg

    translator_pool = futures.ThreadPoolExecutor(max_workers=4)
    to_translate = ("hello", "goodbye", "I", "don't", "know", "why",)
    translations = translator_pool.map(_translate, to_translate)
    print("Translations: {}".format(list(zip(to_translate, translations))))

    translator_client.close()
    client_thread.join()
    server.stop(None)


if __name__ == "__main__":
    main()
The basic idea is to have an object called TranslatorClient running on a separate thread, correlating requests and responses. It expects that responses will return in the order that requests were sent out. It also implements the iterator interface so that you can pass it directly to an invocation of the Translate method on your stub.
We spin up a thread running _run_client, which pulls requests out of TranslatorClient (via its iterator interface), sends them over the stub, and feeds the responses back in at the other end with add_response.
The main function I included here is really just a strawman, since I don't have the particulars of your UI code. I'm running _translate in a ThreadPoolExecutor to demonstrate that, even though translator_client.translate is synchronous, it yields the thread while waiting, allowing you to have multiple in-flight requests at once.
We recognize that this is a lot of code to write for such a simple use case. Ultimately, the answer will be asyncio support. We have plans for this in the not-too-distant future. But for the moment, this sort of solution should keep you going whether you're running python 2 or python 3.
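Since this answer was written, that asyncio support has shipped as grpc.aio. Under that API a bidirectional-streaming call returns an object you can write to and read from directly, so no hand-rolled iterator thread is needed. A minimal client-side sketch, assuming the same generated stub and a server on localhost:50054:

import asyncio

import grpc
from translate_pb2 import Msg
from translate_pb2_grpc import TestServiceStub

async def main():
    async with grpc.aio.insecure_channel('localhost:50054') as channel:
        stub = TestServiceStub(channel)
        call = stub.Translate()             # stream-stream call object
        await call.write(Msg(msg='hello'))  # send whenever the user acts
        response = await call.read()        # await the matching response
        print(response.msg)
        await call.done_writing()

asyncio.run(main())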
I am trying to play with this piece of code to understand @tornado.web.asynchronous. The code as intended should handle asynchronous web requests, but it doesn't seem to work as intended. There are two endpoints:
1) http://localhost:5000/A (this is the time-consuming request and takes a few seconds)
2) http://localhost:5000/B (this is the fast request and takes no time to return)
However, when I point the browser at http://localhost:5000/A and then, while that is running, go to http://localhost:5000/B, the second request is queued and runs only after A has finished.
In other words, one task is time-consuming, but it blocks the other, faster task. What am I doing wrong?
import tornado.web
from tornado.ioloop import IOLoop
import sys, random, signal


class TestHandler(tornado.web.RequestHandler):
    """
    In below function goes your time consuming task
    """
    def background_task(self):
        sm = 0
        for i in range(10 ** 8):
            sm = sm + 1
        return str(sm + random.randint(0, sm)) + "\n"

    @tornado.web.asynchronous
    def get(self):
        """ Request that asynchronously calls background task. """
        res = self.background_task()
        self.write(str(res))
        self.finish()


class TestHandler2(tornado.web.RequestHandler):
    @tornado.web.asynchronous
    def get(self):
        self.write('Response from server: ' + str(random.randint(0, 100000)) + "\n")
        self.finish()


def sigterm_handler(signal, frame):
    # save the state here or do whatever you want
    print('SIGTERM: got kill, exiting')
    sys.exit(0)


def main(argv):
    signal.signal(signal.SIGTERM, sigterm_handler)
    try:
        if argv:
            print ":argv:", argv
        application = tornado.web.Application([
            (r"/A", TestHandler),
            (r"/B", TestHandler2),
        ])
        application.listen(5000)
        IOLoop.instance().start()
    except KeyboardInterrupt:
        print "Caught interrupt"
    except Exception as e:
        print e.message
    finally:
        print "App: exited"


if __name__ == '__main__':
    sys.exit(main(sys.argv))
According to the documentation:
To minimize the cost of concurrent connections, Tornado uses a
single-threaded event loop. This means that all application code
should aim to be asynchronous and non-blocking because only one
operation can be active at a time.
To achieve this goal you need to prepare the RequestHandler properly. Simply adding the @tornado.web.asynchronous decorator to any of the functions (get, post, etc.) is not enough if the function performs only synchronous actions.
What does the @tornado.web.asynchronous decorator do?
Let's look at the get function. The statements are executed one after another in a synchronous manner. Once the work is done and the function returns, the request is closed; a call to self.finish() is made under the hood. However, when we use the @tornado.web.asynchronous decorator, the request is not closed after the function returns, so self.finish() must be called by the user to finish the HTTP request. Without this decorator the request is automatically finished when the get() method returns.
Look at the "Example 21" from this page - tornado.web.asynchronous:
#web.asynchronous
def get(self):
http = httpclient.AsyncHTTPClient()
http.fetch("http://example.com/", self._on_download)
def _on_download(self, response):
self.finish()
The get() function performs an asynchronous call to the http://example.com/ page. Let's assume this call is a long action. So the http.fetch() function is called and, a moment later, the get() function returns (http.fetch() is still running in the background). Tornado's IOLoop can move on to serve the next request while the data from http://example.com/ is being fetched. Once the http.fetch() call finishes, the callback function, self._on_download, is called. Then self.finish() is called and the request is finally closed. This is the moment when the user can see the result in the browser.
It's possible thanks to httpclient.AsyncHTTPClient(). If you use the synchronous version, httpclient.HTTPClient(), you will need to wait for the call to http://example.com/ to finish. Then the get() function will return, and only then will the next request be processed.
To sum up: you use the @tornado.web.asynchronous decorator when you use asynchronous code inside the RequestHandler, which is advised. Otherwise it doesn't make much difference to the performance.
EDIT: To solve your problem you can run your time-consuming function in a separate thread. Here's a simple example of your TestHandler class:
class TestHandler(tornado.web.RequestHandler):
    def on_finish(self, response):
        self.write(response)
        self.finish()

    def async_function(base_function):
        @functools.wraps(base_function)
        def run_in_a_thread(*args, **kwargs):
            func_t = threading.Thread(target=base_function, args=args, kwargs=kwargs)
            func_t.start()
        return run_in_a_thread

    @async_function
    def background_task(self, callback):
        sm = 0
        for i in range(10 ** 8):
            sm = sm + 1
        callback(str(sm + random.randint(0, sm)))

    @tornado.web.asynchronous
    def get(self):
        res = self.background_task(self.on_finish)
You also need to add these imports to your code:

import threading
import functools
async_function is a decorator function. If you're not familiar with the topic, I suggest reading up on it (e.g.: decorators) and trying it on your own. In general, our decorator allows the function to return immediately (so the main program execution can go forward) while the processing takes place at the same time in a separate thread. Once the function in the thread is finished, we call a callback function which writes the results out to the end user and closes the connection.
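One caveat not in the original answer: Tornado's RequestHandler is not thread-safe, so having the worker thread call self.write()/self.finish() directly (as on_finish does above) is fragile; the documented thread-safe entry point is IOLoop.add_callback. A hedged variant of the end of background_task, keeping the IOLoop.instance() style of this Tornado-era code:

    result = str(sm + random.randint(0, sm))
    # hop back onto the IOLoop thread before touching the handler
    tornado.ioloop.IOLoop.instance().add_callback(
        functools.partial(callback, result))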
I have a Python script that makes many async requests. The API I'm using takes a callback.
The main function calls run, and I want it to block execution until all the requests have come back.
What could I use within Python 2.7 to achieve this?
def run():
    for request in requests:
        client.send_request(request, callback)

def callback(error, response):
    # handle response
    pass

def main():
    run()
    # I want to block here
I found that the simplest, least invasive way is to use threading.Event, available in 2.7.
import threading
import functools

def run():
    events = []
    for request in requests:
        event = threading.Event()
        callback_with_event = functools.partial(callback, event)
        client.send_request(request, callback_with_event)
        events.append(event)
    return events

def callback(event, error, response):
    # handle response
    event.set()

def wait_for_events(events):
    for event in events:
        event.wait()

def main():
    events = run()
    wait_for_events(events)
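If a lost response must not hang main() forever, note that Event.wait() accepts a timeout and (since Python 2.7) returns whether the flag was actually set; a small variant of wait_for_events:

def wait_for_events(events, timeout=30.0):
    for event in events:
        if not event.wait(timeout):
            raise RuntimeError('timed out waiting for a response')  # hypothetical handling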
What is the best way to make an asynchronous call appear synchronous? E.g., something like this, but how do I coordinate the calling thread and the async reply thread? In Java I might use a CountDownLatch() with a timeout, but I can't find a definite solution for Python.
def getDataFromAsyncSource():
    asyncService.subscribe(callback=functionToCallbackTo)
    # wait for data
    return dataReturned

def functionToCallbackTo(data):
    dataReturned = data
There is a module you can use:

import concurrent.futures

Check this post for sample code and a module download link: Concurrent Tasks Execution in Python.
The executor puts each result in a Future, from which you can then get it. Here is the sample code from http://pypi.python.org:
import concurrent.futures
import urllib.request

URLS = ['http://www.foxnews.com/',
        'http://www.cnn.com/',
        'http://europe.wsj.com/',
        'http://www.bbc.co.uk/',
        'http://some-made-up-domain.com/']

def load_url(url, timeout):
    return urllib.request.urlopen(url, timeout=timeout).read()

with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
    future_to_url = dict((executor.submit(load_url, url, 60), url)
                         for url in URLS)
    for future in concurrent.futures.as_completed(future_to_url):
        url = future_to_url[future]
        if future.exception() is not None:
            print('%r generated an exception: %s' % (url, future.exception()))
        else:
            print('%r page is %d bytes' % (url, len(future.result())))
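Closer to the CountDownLatch idiom in the question, a bare concurrent.futures.Future can itself serve as a one-shot latch: pass its set_result as the callback and block on result() with a timeout. A minimal sketch, assuming the asyncService API from the question delivers the data as the callback's only argument:

import concurrent.futures

def get_data_from_async_source(timeout=30.0):
    future = concurrent.futures.Future()
    asyncService.subscribe(callback=future.set_result)  # reply thread fills the future
    return future.result(timeout=timeout)  # blocks; raises TimeoutError on timeout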
A common solution is to use a synchronized Queue and pass it to the callback function. See http://docs.python.org/library/queue.html.
So for your example this could look like the following (I'm just guessing at the API for passing additional arguments to the callback function):
from Queue import Queue

def async_call():
    q = Queue()
    asyncService.subscribe(callback=callback, args=(q,))
    data = q.get()
    return data

def callback(data, q):
    q.put(data)
This solution uses the threading module internally, so it might not work depending on your async library.
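If you also need the timeout behaviour of a CountDownLatch, Queue.get accepts one; a one-line variant of the blocking call above:

data = q.get(timeout=30.0)  # raises the Queue module's Empty exception if no callback arrives in time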