Smoothing latency in async Python

I'm using grpc.aio.server and I'm stuck on a problem: when I run a load test against my service, some requests lag for 10 seconds, even though all the requests are similar. The load is stable (200 rps) and the latency of almost all requests is nearly identical. I'm OK with higher latency, as long as it's stable. I've googled things like "async task priority", because it feels as if something is wrong with task prioritization: either tasks that have long since finished are being waited on, or the request's task waits a long time before it even starts.
E.g. 1000 requests are sent to the gRPC service; they all execute the same logic, against the same DB instance, with the same query, the same time to get results from the DB, etc. Everything is the same. Yet I see that, say, the 10th request's latency is 10 seconds while the 13th's is 5 seconds. The logs also show that the DB queries have almost the same execution time.
Any suggestions? Maybe I'm misunderstanding something.

There are multiple reasons why this behaviour may happen. Here are a few things that you can take a look at:
What type of workload do you have? Is it I/O bound or CPU bound?
Does your code block the event loop at some point? Is the path for each request fully asynchronous? The docs state pretty clearly that blocking the event loop is costly:
Blocking (CPU-bound) code should not be called directly. For example, if a function performs a CPU-intensive calculation for 1 second, all concurrent asyncio Tasks and IO operations would be delayed by 1 second.
What happens to memory when you see those big latencies? You can run memory profiling using this tool and check. There's a good chance you'll see a correlation between the latency spikes and intense activity by the Python memory manager as it tries to reclaim memory. Here is a nice article on memory management that you can check out.
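To keep the event loop responsive, CPU-bound work can be pushed into an executor instead of being called inline. A minimal sketch using only the standard library (the `crunch` function is a hypothetical stand-in for the question's per-request work):

```python
import asyncio

def crunch(n: int) -> int:
    # CPU-bound work; called inline it would stall every other task
    return sum(i * i for i in range(n))

async def main() -> int:
    loop = asyncio.get_running_loop()
    # Hand the blocking call to the default ThreadPoolExecutor so the
    # event loop stays free (a ProcessPoolExecutor sidesteps the GIL
    # for pure-Python number crunching).
    return await loop.run_in_executor(None, crunch, 100_000)

result = asyncio.run(main())
```

While `crunch` runs in the executor, heartbeats, other requests, and timers on the loop keep being scheduled on time.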

Every server will show latency differences between requests; however, the scale should be much lower than what you're experiencing.
Your question doesn't include the server initialization code, so we can't know what configuration is used. I would start by looking at the server's thread pool size.
According to the docs the thread pool instance is a required argument, so try a different pool size. My guess is that the threads are exhausted, and latency goes up because each request has to wait for a thread to free up.
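The queuing effect described here can be reproduced with a plain ThreadPoolExecutor, independent of gRPC (the numbers are illustrative): when there are more in-flight requests than threads, the extra ones wait in the queue and their end-to-end latency grows even though the work itself is constant.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handler(_):
    # stand-in for one request's blocking work
    time.sleep(0.1)
    return time.monotonic()

start = time.monotonic()
with ThreadPoolExecutor(max_workers=2) as pool:
    # 6 "requests" on 2 threads: the last pair waits ~0.2s just to start
    done = list(pool.map(handler, range(6)))
elapsed = time.monotonic() - start
# elapsed is ~0.3s instead of ~0.1s: the queue, not the work, adds latency
```
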

Related

Using downtime in an asyncio application

I have an asyncio-based program which has very inconsistent CPU load. I need to do some relatively computation-intensive things to fill up a buffer which the program reads from. However, if I do this while there's high load, I may end up making the latency-sensitive parts slower than I'd like, as the "precompute the stuff" coroutine will hog a lot of CPU time. There are also coroutines that must run frequently (handling heartbeats for a websocket connection), so if this preprocessing takes too long, those will die.
One solution I've come up with is to simply do this in another process which has lower priority, but if I could keep this all in a single program I'd be much happier. What is a good design for handling this sort of situation?
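One in-process pattern for the situation described above is to chunk the heavy work and yield the event loop between chunks, so heartbeats get a chance to run in between. A sketch under assumed names (`precompute` and the squaring work are purely illustrative):

```python
import asyncio

async def precompute(buffer, chunks, chunk_size):
    # Low-priority filler: do a small slice of CPU work, then yield the
    # loop so heartbeats and latency-sensitive tasks can run in between.
    for _ in range(chunks):
        buffer.extend(i * i for i in range(chunk_size))
        await asyncio.sleep(0)  # cooperative yield point

async def main():
    buffer = []
    await asyncio.gather(
        precompute(buffer, chunks=10, chunk_size=1_000),
        asyncio.sleep(0.01),  # stand-in for a heartbeat coroutine
    )
    return buffer

buffer = asyncio.run(main())
```

The tradeoff is picking a chunk size small enough that the time between `await` points stays below the latency budget of the sensitive coroutines.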

Flask: spawning a single async sub-task within a request

I have seen a few variants of my question but not quite exactly what I am looking for, hence opening a new question.
I have a Flask/Gunicorn app that for each request inserts some data in a store and, consequently, kicks off an indexing job. The indexing is 2-4 times longer than the main data write and I would like to do that asynchronously to reduce the response latency.
The overall request lifespan is 100-150ms for a large request body.
I have thought about a few ways to do this, that is as resource-efficient as possible:
Use Celery. This seems the most obvious way to do it, but I don't want to introduce a large library and most of all, a dependency on Redis or other system packages.
Use subprocess.Popen. This may be a good route but my bottleneck is I/O, so threads could be more efficient.
Using threads? I'm not sure how, or if, that can be done. All I know is how to launch multiple tasks concurrently with ThreadPoolExecutor, but I only need to spawn one additional task and return immediately, without waiting for the results.
asyncio? This too I'm not sure how to apply to my situation. As far as I can tell, asyncio always involves a blocking call.
Launching data write and indexing concurrently: not doable. I have to wait for a response from the data write to launch indexing.
Any suggestions are welcome!
Thanks.
Celery will be your best bet - it's exactly what it's for.
If you need to introduce dependencies, that's not a bad thing in itself, as long as you don't carry unneeded ones.
Depending on your architecture, though, more advanced and locked-in solutions might be available. You could, if you're using AWS, launch an AWS Lambda function by firing off an AWS SNS notification, and have that handle what it needs to do. The sky is the limit.
I actually should have perused the Python manual section on concurrency better: the threading module does just what I needed: https://docs.python.org/3.5/library/threading.html
And I confirmed with some dummy sleep code that the sub-thread gets completed even after the Flask request is completed.
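The pattern the poster settled on looks roughly like this; the Flask plumbing is reduced to a plain function so the sketch runs standalone, and names such as `index_job` are illustrative:

```python
import threading
import time

indexed = []

def index_job(doc_id):
    # hypothetical indexing work that outlives the request
    time.sleep(0.05)
    indexed.append(doc_id)

def handle_request(doc_id):
    # inside a Flask view you would start the thread and return at once,
    # without waiting for the indexing to finish
    t = threading.Thread(target=index_job, args=(doc_id,))
    t.start()
    return t, "202 Accepted"

t, status = handle_request(42)
t.join()  # only for this demo; the view itself would not join
```

One caveat: threads started this way die with the worker process, so under Gunicorn a worker restart can drop an in-flight indexing job. That is the gap a real queue like Celery closes.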

celery and long running tasks

I just watched a YouTube video where the presenter mentioned that one should design one's Celery tasks to be short. Tasks running for several minutes are bad.
Is this correct? What I do see is that I have some long-running tasks which take, say, 10 minutes to finish. When these kinds of tasks are scheduled frequently, the queue is swamped and no other tasks get scheduled. Is this the reason?
If so, what should be used for long running tasks?
Long-running tasks aren't great, but it's by no means appropriate to say they are bad. The best way to handle long-running tasks is to create a queue for just those tasks and have them run on a separate worker from the short tasks.
The problem with long running tasks is that you have to wait for them when you're pushing a new software version on your server. If you don't wait, your task may run possibly incompatible code, especially if you pickled some complex object as a parameter (which is strongly discouraged).
As @user2097159 said, it's good practice to keep the long-running tasks in a dedicated queue. You should do that by routing with settings.CELERY_ROUTES; more info here.
If you can estimate how long a task may run, I recommend setting a soft_time_limit per task so you'll be able to handle timeouts gracefully.
There is a gist from a talk I gave here
Augment the basic Task definition to optionally treat the task instantiation as a generator, and check for TERM or a soft timeout on every iteration through the generator. Generically inject a "state" dict kwarg into tasks that support it: if it's the first time the task is run, allocate a new one in the results cache; otherwise look up the existing one from the results cache.
In your task, figure out a good place to yield which results in short execution times. Update the state parameter as necessary.
When control returns to the master task class, check for TERM or soft timeout, and if there is one, save off the state object and respond to the signal.
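The checkpoint scheme above can be sketched in plain Python, with no real Celery APIs; the dict-based state store and the `budget` counter (standing in for TERM / soft-timeout checks) are assumptions for the demo:

```python
def resumable_task(state):
    # Task body written as a generator: each yield is a checkpoint where
    # the runner may interrupt, persist `state`, and resume later.
    for i in range(state.get("cursor", 0), 5):
        state.setdefault("processed", []).append(i)
        state["cursor"] = i + 1
        yield  # runner checks for TERM / soft timeout here

def run_task(state, budget):
    # Drive the generator; stop after `budget` chunks (simulated timeout).
    gen = resumable_task(state)
    for _ in range(budget):
        try:
            next(gen)
        except StopIteration:
            state["done"] = True
            break
    return state  # if interrupted, state can be saved and resumed later

state = run_task({}, budget=3)      # interrupted midway through
state = run_task(state, budget=10)  # resumed from the saved cursor
```

The point is that the task never runs longer than one chunk without giving the master task class a chance to save state and respond to the signal.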

What happens when you have an infinite loop in Django view code?

Something that I just thought about:
Say I'm writing view code for my Django site, and I make a mistake and create an infinite loop.
Whenever someone would try to access the view, the worker assigned to the request (be it a Gevent worker or a Python thread) would stay in a loop indefinitely.
If I understand correctly, the server would send a timeout error to the client after 30 seconds. But what will happen with the Python worker? Will it keep on working indefinitely? That sounds dangerous!
Imagine I've got a server in which I've allocated 10 workers. I let it run and at some point, a client tries to access the view with the infinite loop. A worker will be assigned to it, and will be effectively dead until the next server restart. The dangerous thing is that at first I wouldn't notice it, because the site would just be imperceptibly slower, having 9 workers instead of 10. But then it might happen again and again throughout a long span of time, maybe months. The site would just get progressively slower, until eventually it would be really slow with just one worker.
A server restart would solve the problem, but I'd hate to have my site's functionality depend on server restarts.
Is this a real problem that happens? Is there a way to avoid it?
Update: I'd also really appreciate a way to take a stacktrace of the thread/worker that's stuck in an infinite loop, so I could have that emailed to me so I'll be aware of the problem. (I don't know how to do this because there is no exception being raised.)
Update to people saying things to the effect of "Avoid writing code that has infinite loops": In case it wasn't obvious, I do not spend my free time intentionally putting infinite loops into my code. When these things happen, they are mistakes, and mistakes can be minimized but never completely avoided. I want to know that even when I make a mistake, there'll be a safety net that will notify me and allow me to fix the problem.
It is a real problem. In case of gevent, due to context switching, it can even immediately stop your website from responding.
Everything depends on your environment. For example, when running Django in production through uwsgi you can set harakiri - the time in seconds after which the thread handling a request will be killed if it hasn't finished producing a response. It is strongly recommended to set such a value in order to deal with faulty requests or bad code. Such an event is reported in the uwsgi log. I believe other solutions for running Django in production have similar options.
Otherwise, due to the network architecture, a client disconnecting will not stop the infinite loop, and by default there will be no response at all - just infinite loading. Various timeout options (harakiri being one of them) may end up producing a connection timeout - for example, PHP has (as far as I remember) a default timeout of 30 seconds, after which it returns a 504 Gateway Timeout. The socket disconnection timeout depends on the HTTP server settings; it will not stop the application thread, it will only close the client's socket.
If you're not using gevent (or any other green threads), an infinite loop will tend to take up 100% of available CPU power (limited to one core), possibly eating more and more memory, so your website will become very slow and/or time out quickly. Django itself is not aware of request time, so - as mentioned before - your production environment stack is the way to prevent this from happening. In the case of uwsgi, http://uwsgi-docs.readthedocs.org/en/latest/Options.html#harakiri-verbose is the way to go.
Harakiri does print a stack trace of the killed process (https://uwsgi-docs.readthedocs.org/en/latest/Tracebacker.html?highlight=harakiri) straight to the uwsgi log, and via the alarm subsystem you can get notified by e-mail (http://uwsgi-docs.readthedocs.org/en/latest/AlarmSubsystem.html).
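Putting those options together, a minimal uwsgi fragment might look like this (the 30-second value is illustrative; pick it from your slowest legitimate request):

```ini
[uwsgi]
# kill any request that takes longer than 30 seconds to finish
harakiri = 30
# log a traceback of the worker that was killed
harakiri-verbose = true
```
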
I just tested this on Django's development server.
Results:
Does not give a timeout after 30 seconds (this might be because it's not a production server, though).
Stays loading until I close the page.
I guess one way to avoid it, short of simply not writing code like that, would be to use threading so you have control over timeouts and can stop the thread.
Maybe something like:
import threading
from django.http import HttpResponse

class MyThread(threading.Thread):
    def run(self):
        print("your possible infinite loop code here")

def possible_loop_view(request):
    thread = MyThread()
    thread.start()
    return HttpResponse("html response")
Yes, your analysis is correct. The worker thread/process will keep running. Moreover, if there is no wait/sleep in the loop, it will hog the CPU. Other threads/processes will get very little CPU time, slowing down your entire site's responses.
Also, I don't think the server will send any explicit timeout error to the client. If a TCP timeout is set, the TCP connection will simply be closed.
The client may also have its own timeout for receiving a response, which can come into play.
Avoiding such code in the first place is the best defense. You can also run a monitoring tool on the server that watches CPU/memory usage and notifies you of abnormal activity so that you can take action.

CherryPy and concurrency

I'm using CherryPy in order to serve a python application through WSGI.
I tried benchmarking it, but it seems CherryPy can only handle exactly 10 req/sec, no matter what I do.
I built a simple app with a 3-second pause in order to determine accurately what is going on, and I can confirm that the 10 req/sec limit has nothing to do with the resources used by the Python script.
Any ideas?
By default, CherryPy's builtin HTTP server will use a thread pool with 10 threads. If you are still using the defaults, you could try increasing this in your config file.
[global]
server.thread_pool = 30
See the cpserver documentation
Or the archive.org copy of the old documentation
This was extremely confounding for me too. The documentation says that CherryPy will automatically scale its thread pool based on observed load, but in my experience it does not. If you have tasks which might take a while and use hardly any CPU in the meantime, then you will need to estimate a thread_pool size based on your expected load and target response time.
For instance, if the average request will take 1.5 seconds to process and you want to handle 50 requests per second, then you will need 75 threads in your thread_pool to handle your expectations.
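The arithmetic in that example is just Little's law: concurrent requests in flight equal the arrival rate times the time each request spends in the system. A tiny helper (the function name is mine, not CherryPy's):

```python
import math

def thread_pool_size(target_rps: float, avg_latency_s: float) -> int:
    # Little's law: requests in flight = arrival rate x time in system
    return math.ceil(target_rps * avg_latency_s)

size = thread_pool_size(50, 1.5)  # the example from the text
```

In practice you would add headroom on top of this floor, since latency is an average and arrivals are bursty.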
In my case, I delegated the heavy lifting out to other processes via the multiprocessing module. This leaves the main CherryPy process and threads at idle. However, the CherryPy threads will still be blocked awaiting output from the delegated multiprocessing processes. For this reason, the server needs enough threads in the thread_pool to have available threads for new requests.
My initial thinking was that the thread_pool would not need to be larger than the multiprocessing pool worker size, but this too turns out to be a misled assumption. Somehow, the CherryPy threads remain blocked even when there is available capacity in the multiprocessing pool.
Another misled assumption was that the blocking and poor performance had something to do with the Python GIL. They did not. In my case I was already farming out the work via multiprocessing and still needed a thread_pool sized on the average request time and the desired requests-per-second target. Raising the thread_pool size addressed the issue, even though it looks like an incorrect fix.
Simple fix for me:
cherrypy.config.update({
    'server.thread_pool': 100
})
Your client needs to actually READ the server's response. Otherwise the socket/thread will stay open/running until it times out and is garbage collected.
Use a client that behaves correctly and you'll see that your server behaves too.
