How to test a Faust agent that sends data to a sink?

I am trying to write unit tests using pytest for my Faust application. I have referred to the documentation here but it does not mention what to do when my Faust agent is sending data to a sink.
Without a sink, my tests are working fine but when I use a sink, I get this error:
RuntimeError: Task <Task pending name='Task-2' coro=<Agent._execute_actor() running at /Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/faust/agents/agent.py:647> cb=[<TaskWakeupMethWrapper object at 0x7fc28967c5b0>()]> got Future <Future pending> attached to a different loop
INFO faust.agents.agent:logging.py:265 [^-AgentTestWrapper: ml_exporter.processDetections]: Stopping...
I have tried various approaches: patching out the decorator in my Faust app that sends the data to the sink, testing my function without the decorator (by trying to bypass it), patching out the sink parameter in my Faust app to have a None value (so it doesn't send my data to the sink), and so on. I have had no luck with any of these.
Here is my Faust agent:
import faust

app = faust.App('ml-exporter', broker=dx_broker, value_serializer='json')
detection_topic = app.topic(dx_topic)
graph_topic = app.topic(gwh_topic)

@app.agent(detection_topic, sink=[graph_topic])
async def processDetections(detections):
    detection_count = 0
    async for detection in detections:
        detection_count += 1
        # r.set("detection_count", detection_count)
        yield detection
Here is my current testing code:
import faust
import pytest
from unittest.mock import patch

import ml_exporter

patch('ml_exporter.graph_topic', None)

def create_app():
    return faust.App('ml-exporter', value_serializer='json')

@pytest.fixture()
def test_app(event_loop):
    app = create_app()
    app.finalize()
    app.flow_control.resume()
    return app

@pytest.mark.asyncio()
async def test_processDetections(test_app):
    async with ml_exporter.processDetections.test_context() as agent:
        event = await agent.put('hey')
        assert agent.results[event.message.offset] == 'hey'
I get the same error as mentioned above when I run this test. Is there any way to test my Faust app successfully?
Thank you!

Force pytest to use Faust's asyncio event loop as the default global loop. Add the following fixture to your test code:
@pytest.mark.asyncio()
@pytest.fixture()
def event_loop():
    yield app.loop
As described in the pytest documentation:
The event_loop fixture can be easily overridden in any of the standard pytest locations (e.g. directly in the test file, or in conftest.py) to use a non-default event loop. If the pytest.mark.asyncio marker is applied, a pytest hook will ensure the produced loop is set as the default global loop.
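For example, a minimal conftest.py along those lines (a sketch, assuming the question's Faust app is importable as ml_exporter.app):

import pytest
import ml_exporter

@pytest.fixture()
def event_loop():
    # Hand pytest-asyncio the loop Faust created, so the agent's tasks
    # and the test run on the same loop.
    yield ml_exporter.app.loop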

This works for me:
with patch('faust.types.agents.SinkT') as mock:
    async with process.test_context(sink=mock) as agent:
        ...
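Applied to the question's agent, that might look like the following sketch (reusing the question's fixtures and payload; sink=mock is passed through to test_context exactly as above):

import pytest
from unittest.mock import patch

import ml_exporter

@pytest.mark.asyncio()
async def test_processDetections(test_app):
    with patch('faust.types.agents.SinkT') as mock:
        async with ml_exporter.processDetections.test_context(sink=mock) as agent:
            event = await agent.put('hey')
            assert agent.results[event.message.offset] == 'hey'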

Related

How to close aiohttp.ClientSession automatically before program ends from a library POV?

I'm building an async library with aiohttp. The library has a single client that on instantiation creates a ClientSession and uses it to make requests to an API (it's a REST API wrapper).
The problem I'm facing is how to cleanly close the client session on exit.
If the session is not explicitly closed, a whole lot of errors come out, but I can't simply use context managers to close the session since I don't know when the program will end.
A typical use would be this:
import asyncio
from mylibrary import Client

client = Client()

async def main():
    await client.get_foo(...)
    await client.patch_bar(...)

asyncio.run(main())
I could add await client.close_session() in main, but I want to remove this responsibility from the end user, so ideally the client would automatically close the ClientSession when the program ends.
How can I do this?
I have tried using __del__ on the client to get the loop and close the session, without success, as well as using the atexit library, but it seems that by the time these run the asyncio loop has already been destroyed and I still get the warnings.
The specific error is:
Fatal error on SSL transport
protocol: <asyncio.sslproto.SSLProtocol object at 0x0000013ACFD54AF0>
transport: <_ProactorSocketTransport fd=1052 read=<_OverlappedFuture cancelled>>
I did some research on this error and Google seems to think it's because I need to implement flow control; I have, however, and this error only occurs if I don't explicitly close the session.
Unfortunately, it seems like the only clean pattern that can apply there is to make your client itself an (async) context manager, and require that your users use it in a with block.
The __del__ method could work in some cases - but it would require that code from your users would not "leak" the Client instance itself.
So, the code is trivial, but the burden on your users is not zero:
class Client:
    ...
    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc_value, tb):
        await self.close_session()
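Usage then mirrors the question's own example, just wrapped in a with block (a sketch; get_foo and patch_bar are the question's placeholder calls):

import asyncio
from mylibrary import Client

async def main():
    async with Client() as client:  # session is closed when the block exits
        await client.get_foo(...)
        await client.patch_bar(...)

asyncio.run(main())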
Creating a pseudo-hook on loop.stop:
Another way, though not "clean" and not guaranteed to work, could be to decorate the running loop stop function to add a call to close_session.
If the user code just "halts" and does not tear down the loop properly, this can't help anyway - but I guess it might be an option for "well behaved" users.
The big problem here is that this is not documented - but taking a peek at asyncio internals, it looks like it will always go through self.stop().
import asyncio

class ShutDownCb:
    def __init__(self, cb):
        self.cb = cb
        self.stopping = False
        loop = self.loop = asyncio.get_running_loop()
        self.original_stop = loop.stop
        loop.stop = self.new_stop

    async def _stop(self):
        # Wait for the cleanup callback to finish before really stopping
        await self.task
        return self.original_stop()

    def new_stop(self):
        if not self.stopping:
            self.stopping = True
            self.task = asyncio.create_task(self.cb())
            asyncio.create_task(self._stop())
            return
        return self.original_stop()

class Client:
    def __init__(self, ...):
        ...
        ShutDownCb(self.close_session)
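A usage sketch under the same caveats (mylibrary and get_foo are the question's placeholders): ShutDownCb calls asyncio.get_running_loop(), so the Client must be constructed while the loop is running, and cleanup relies on the undocumented detail that asyncio.run() ends up calling loop.stop():

import asyncio
from mylibrary import Client

async def main():
    client = Client()  # must be created inside the running loop
    await client.get_foo(...)

asyncio.run(main())  # close_session is scheduled when the loop stops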

python aio library logs and async method logs not showing in function application insights

I am using the Python aio management client library to create Azure resources, for example azure.mgmt.eventhub.aio.EventHubManagementClient.
Observation:
While the counterpart sync Azure libraries (for example azure.mgmt.eventhub.EventHubManagementClient) print HTTP logs when the management service API is called, the async libraries don't print similar logs.
Also, when I use the Python logger inside async methods, even those logs don't print.
Sample Code
logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)
logging.basicConfig(level=logging.INFO)

async def _create_or_update_eventhub_namespace_authorization_rule(self, authorization_rule_name,
                                                                  rights: AccessRights):
    await self._event_management_client.namespaces.create_or_update_authorization_rule(
        resource_group_name=EnvironmentVariables.RESOURCE_GROUP_NAME,
        namespace_name=self._env_var_obj.event_hub_name_space_name,
        authorization_rule_name=authorization_rule_name,
        parameters={
            "rights": [rights]
        }
    )
    logger.info('Provisioned EH-Namespace Rule:' + authorization_rule_name)
Here, neither create_or_update_authorization_rule() nor my own log shows up in Azure Function insights.
This issue got resolved when I defined the Azure Function main function itself as an async function; the Azure Python worker manages the event loop and calls this main method asynchronously:
async def main(req: func.HttpRequest) -> func.HttpResponse:
With this, all aio client logs and my own logs are getting logged in App Insights.
Note:
Earlier I was defining the above method as non-async, and called the async method from main() by explicitly creating an event loop and using the asgiref async_to_sync library.
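Put together, a minimal sketch of the resolved shape (the provision_rule helper is a stand-in for the method above, not the actual class):

import logging
import azure.functions as func

logger = logging.getLogger(__name__)

async def provision_rule(rule_name: str) -> None:
    # Stand-in for the awaited aio management-client call; its HTTP logs
    # surface once the whole chain runs on the worker-managed loop.
    logger.info('Provisioned EH-Namespace Rule:' + rule_name)

async def main(req: func.HttpRequest) -> func.HttpResponse:
    # The Azure Functions Python worker awaits this coroutine on the event
    # loop it manages, so no manual loop or async_to_sync is needed.
    await provision_rule('my-rule')
    return func.HttpResponse('done', status_code=200)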

Python Flask - Asyncio asynchronous Question

I have a web endpoint for users to upload file.
When the endpoint receives the request, I want to run a background job to process the file.
Since the job would take time to complete, I wish to return the job_id to the user to track the status of the request while the job runs in the background.
I am wondering if asyncio would help in this case.
import asyncio
import uuid

@asyncio.coroutine
def process_file(job_id, file_obj):
    ...  # process the file and dump results in db

@app.route('/file-upload', methods=['POST'])
def upload_file():
    job_id = uuid.uuid4()
    process_file(job_id, request.files['file'])  # I want this call to be async without any await
    return jsonify(message='Request received. Track the status using: ' + str(job_id))
With the above code, the process_file method is never called, and I am not able to understand why.
I am not sure if this is the right way to do it though; please help if I am missing something.
Flask doesn't support async calls yet.
To create and execute heavy tasks in the background you can use the Celery library: https://flask.palletsprojects.com/en/1.1.x/patterns/celery/
You can use this for reference:
Making an asynchronous task in Flask
Official documentation:
http://docs.celeryproject.org/en/latest/getting-started/first-steps-with-celery.html#installing-celery
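A minimal sketch of that pattern applied to the question (the Redis broker URL and the save-to-disk step are assumptions, not from the question):

import uuid
from celery import Celery
from flask import Flask, request, jsonify

app = Flask(__name__)
celery = Celery(app.name, broker='redis://localhost:6379/0')  # assumed broker

@celery.task
def process_file(job_id, path):
    ...  # process the file and dump results in db

@app.route('/file-upload', methods=['POST'])
def upload_file():
    job_id = str(uuid.uuid4())
    path = f'/tmp/{job_id}'           # assumption: stage the upload on disk,
    request.files['file'].save(path)  # since file objects don't serialize
    process_file.delay(job_id, path)  # hand off to the worker and return at once
    return jsonify(message='Request received. Track the status using: ' + job_id)

Run a worker alongside the Flask app (e.g. celery -A yourmodule.celery worker) to pick up the queued tasks.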
Even though you wrapped the function in @asyncio.coroutine, it is never awaited, and a coroutine only runs when it is awaited.
asyncio is not good for this kind of task anyway, because the work is blocking; asyncio is usually used to make I/O-bound calls and return results fast.
As @py_dude mentioned, Flask does not support async calls. If you are looking for a library that functions and feels similar to Flask but is asynchronous, I recommend checking out Sanic. Here is some sample code:
from sanic import Sanic
from sanic.response import json

app = Sanic()

@app.route("/")
async def test(request):
    return json({"hello": "world"})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
Updating your database asynchronously shouldn't be an issue; refer to here to find asyncio-supported database drivers. For processing your file, check out aiohttp. You can run your server extremely fast on a single thread without any hiccup if you do so asynchronously.

Background tasks in flask

I am writing a web application which would do some heavy work. With that in mind I thought of making the tasks background (non-blocking) tasks, so that other requests are not blocked by the previous ones.
I went with daemonizing the thread so that it doesn't exit once the main thread is finished (since I am using threaded=True). Now if a user sends a request, my code will immediately tell them that their request is in progress; it'll be running in the background, and the application is ready to serve other requests.
My current application code looks something like this:
from flask import Flask
from flask import request, abort
import threading

class threadClass:
    def __init__(self):
        thread = threading.Thread(target=self.run, args=())
        thread.daemon = True  # Daemonize thread
        thread.start()        # Start the execution

    def run(self):
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start_task():
    try:
        begin = threadClass()
    except:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
I just want it to be able to handle a few concurrent requests (it's not gonna be used in production).
Could I have done this better? Did I miss anything? I was going through Python's multiprocessing package and found this:
multiprocessing is a package that supports spawning processes using an
API similar to the threading module. The multiprocessing package
offers both local and remote concurrency, effectively side-stepping
the Global Interpreter Lock by using subprocesses instead of threads.
Due to this, the multiprocessing module allows the programmer to fully
leverage multiple processors on a given machine. It runs on both Unix
and Windows.
Can I daemonize a process using multiprocessing? How can I achieve better than what I have with the threading module?
EDIT:
I went through the multiprocessing package of Python; it is similar to threading.
from flask import Flask
from flask import request, abort
from multiprocessing import Process

class processClass:
    def __init__(self):
        p = Process(target=self.run, args=())
        p.daemon = True  # Daemonize it
        p.start()        # Start the execution

    def run(self):
        # This might take several minutes to complete
        someHeavyFunction()

app = Flask(__name__)

@app.route('/start', methods=['POST'])
def start_task():
    try:
        begin = processClass()
    except:
        abort(500)
    return "Task is in progress"

def main():
    """
    Main entry point into program execution

    PARAMETERS: none
    """
    app.run(host='0.0.0.0', threaded=True)

main()
Does the above approach look good?
Best practice
The best way to implement background tasks in flask is with Celery as explained in this SO post. A good starting point is the official Flask documentation and the Celery documentation.
Crazy way: Build your own decorator
As @MrLeeh pointed out in a comment, Miguel Grinberg presented a solution in his PyCon 2016 talk by implementing a decorator. I want to emphasize that I have the highest respect for his solution; he called it a "crazy solution" himself. The below code is a minor adaptation of his solution.
Warning!!!
Don't use this in production! The main reason is that this app has a memory leak by using the global tasks dictionary. Even if you fix the memory leak issue, maintaining this sort of code is hard. If you just want to play around or use this in a private project, read on.
Minimal example
Assume you have a long-running function call in your /foo endpoint. I mock this with a 10 second sleep timer. If you call the endpoint three times, it will take 30 seconds to finish.
Miguel Grinberg's decorator solution is implemented in flask_async. It runs a new thread in a Flask context which is identical to the current Flask context. Each thread is issued a new task_id. The result is saved in a global dictionary tasks[task_id]['result'].
With the decorator in place you only need to decorate the endpoint with @flask_async and the endpoint is asynchronous - just like that!
import threading
import time
import uuid
from functools import wraps

from flask import Flask, current_app, request, abort
from werkzeug.exceptions import HTTPException, InternalServerError

app = Flask(__name__)
tasks = {}

def flask_async(f):
    """
    This decorator transforms a sync route to asynchronous by running it in a background thread.
    """
    @wraps(f)
    def wrapped(*args, **kwargs):
        def task(app, environ):
            # Create a request context similar to that of the original request
            with app.request_context(environ):
                try:
                    # Run the route function and record the response
                    tasks[task_id]['result'] = f(*args, **kwargs)
                except HTTPException as e:
                    tasks[task_id]['result'] = current_app.handle_http_exception(e)
                except Exception as e:
                    # The function raised an exception, so we set a 500 error
                    tasks[task_id]['result'] = InternalServerError()
                    if current_app.debug:
                        # We want to find out if something happened so reraise
                        raise

        # Assign an id to the asynchronous task
        task_id = uuid.uuid4().hex

        # Record the task, and then launch it
        tasks[task_id] = {'task': threading.Thread(
            target=task, args=(current_app._get_current_object(), request.environ))}
        tasks[task_id]['task'].start()

        # Return a 202 response, with an id that the client can use to obtain task status
        return {'TaskId': task_id}, 202

    return wrapped

@app.route('/foo')
@flask_async
def foo():
    time.sleep(10)
    return {'Result': True}

@app.route('/foo/<task_id>', methods=['GET'])
def foo_results(task_id):
    """
    Return results of asynchronous task.
    If this request returns a 202 status code, it means that task hasn't finished yet.
    """
    task = tasks.get(task_id)
    if task is None:
        abort(404)
    if 'result' not in task:
        return {'TaskID': task_id}, 202
    return task['result']

if __name__ == '__main__':
    app.run(debug=True)
However, you need a little trick to get your results. The endpoint /foo will only return the HTTP code 202 and the task id, but not the result. You need another endpoint /foo/<task_id> to get the result. Here is an example for localhost:
import time
import requests

task_ids = [requests.get('http://127.0.0.1:5000/foo').json().get('TaskId')
            for _ in range(2)]
time.sleep(11)
results = [requests.get(f'http://127.0.0.1:5000/foo/{task_id}').json()
           for task_id in task_ids]
# [{'Result': True}, {'Result': True}]

Start backend with async urlfetch on Google App Engine

I am experimenting with several of GAE's features.
I've built a dynamic backend, but I am having several issues getting this thing to work without task queues.
Backend code:
class StartHandler(webapp2.RequestHandler):
    def get(self):
        # ... do stuff ...
        pass

if __name__ == '__main__':
    _handlers = [(r'/_ah/start', StartHandler)]
    run_wsgi_app(webapp2.WSGIApplication(_handlers))
The Backend is dynamic, so whenever it receives a call it does its stuff and then stops.
Everything is working fine when I use this inside my handlers:
url = backends.get_url('worker') + '/_ah/start'
urlfetch.fetch(url)
But I want this call to be async, because the Backend might take up to 10 minutes to finish its work.
So I changed the above code to:
url = backends.get_url('worker') + '/_ah/start'
rpc = urlfetch.create_rpc()
urlfetch.make_fetch_call(rpc, url)
But then the Backend does not start. I am not interested in the completion of the request or in getting any data out of it.
What am I missing or implementing wrong?
Thank you all
Using RPC for an async call without calling get_result() on the rpc object will not guarantee that the urlfetch will be executed. Once your code exits, any pending async calls that were not completed will be aborted.
The only way to make the handler async is to queue the work in a push queue.
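For completeness, a sketch of the push-queue approach (assuming the first-generation GAE Python runtime's taskqueue API; the handler path and backend name come from the question):

from google.appengine.api import taskqueue

def kick_off_worker():
    # The push queue delivers the task to the backend and retries it,
    # independently of the request that enqueued it.
    taskqueue.add(
        url='/_ah/start',  # handled by StartHandler above
        target='worker',   # route the task to the 'worker' backend
        method='GET',
    )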
