How to profile django channels?

How to profile django channels? - python

My technology stack is Redis as a channels backend, Postgresql as a database, Daphne as an ASGI server, Nginx in front of a whole application. Everything is deployed using Docker Swarm, with only Redis and Database outside. I have about 20 virtual hosts, with 20 interface servers, 40 http workers and 20 websocket workers. Load balancing is done using Ingress overlay Docker network.
The problem is, sometimes very weird things happen regarding performance. Most of requests are handled in under 400ms, but sometimes request can take up to 2-3s, even during very small load. Profiling workers with Django Debug Toolbar or middleware-based profilers shows nothing (timing 0.01s or so)
My question: is there any good method of profiling a whole request path with django-channels? I would like how much time each phase takes, i.e when request was processed by Daphne, when worker started processing, when it finished, when interface server sent response to the client. Currently, I have no idea how to solve this.

Django-silk might be helpful to you in profiling the request and database searching time with following reasons:
It is easy to set by simply adding the configs on settings.py of your Django project.
Can be customised: by using the provided decorator, you can profile the function or methods and get their running performance.
Dynamic setting: you can choose to dynamically allocate silk to methods and also set the profiling rate you want during the running time.
As the documentation states:
Silk is a live profiling and inspection tool for the Django framework. Silk intercepts and stores HTTP requests and database queries before presenting them in a user interface for further inspection
Note: silk may double your database searching time, so it may cause some trouble if you set it on your production environment. However, the increase from silk will be shown separately on the dash board.
https://github.com/jazzband/django-silk

Why not stick a monitoring tool something like Kibana or New Relic and monitor why and what's taking so long for a small payload response. It can tell you the time spent on Python, PostgreSQL and Memcache (Redis).

Related

Routing requests to a specific Heroku Dyno

I built an real-time collaboration application with Prosemirror that uses a centralised operational transform algorithm (described here by Marijn Haverbeke) with a Python server using Django Channels with prosemirror-py as the central point.
The server creates a DocumentInstance for every document users are collaborating on, keeps it in memory and occasionally stores it in a Redis database. As long as there is only one Dyno, all requests are routed there. Then the server looks up the instance the request belongs to and updates it.
I would like to take advantage of Heroku's horizontal scaling and run more than one dyno. But as I understand, this would imply requests being routed to any of the running dynos. But since one DocumentInstance can only live on one server this would not work.
Is there a way to make sure that requests belonging to a specific DocumentInstance are only routed to the machine that keep that keeps it?
Or maybe there is an alternative architecture that I am overlooking?

When building scalable architecture the horizontal autoscaling layer is typically ephemeral, meaning no state is kept there it typically serves either for processing or IO to other services as these boxes are shut down and spun up with regularity. You should not be keeping the DocumentInstance data there as it will be lost given dynos spin up and shut down during autoscaling.
If you're using Redis as a centralized datastore you would be much better off either saving in real-time to Redis and having all dynos write/call to Redis to keep in sync, Redis is capable of this and would move all state from ephemeral dynos to a non-emphereal datastore. You may however be able to find a middle ground for your case using https://devcenter.heroku.com/articles/session-affinity.
I run 123 Dyno an autoscaling and monitoring add-on for Heroku for more options and faster, configurable autoscaling on Heroku when you get there.

Production ready Python apps on Kubernetes

I have been deploying apps to Kubernetes for the last 2 years. And in my org, all our apps(especially stateless) are running in Kubernetes. I still have a fundamental question, just because very recently we found some issues with respect to our few python apps.
Initially when we deployed, our python apps(Written in Flask and Django), we ran it using python app.py. It's known that, because of GIL, python really doesn't have support for system threads, and it will only serve one request at a time, but in case the one request is CPU heavy, it will not be able to process further requests. This is causing sometimes the health API to not work. We have observed that, at this moment, if there is a single request which is not IO and doing some operation, we will hold the CPU and cannot process another request in parallel. And since it's only doing fewer operations, we have observed there is no increase in the CPU utilization also. This has an impact on how HorizontalPodAutoscaler works, its unable to scale the pods.
Because of this, we started using uWSGI in our pods. So basically uWSGI can run multiple pods under the hood and handle multiple requests in parallel, and automatically spin new processes on demand. But here comes another problem, that we have seen, uwsgi is lacking speed in auto-scaling the process tocorrected serve the request and its causing HTTP 503 errors, Because of this we are unable to serve our few APIs in 100% availability.
At the same time our all other apps, written in nodejs, java and golang, is giving 100% availability.
I am looking at what is the best way by which I can run a python app in 100%(99.99) availability in Kubernetes, with the following
Having health API and liveness API served by the app
An app running in Kubernetes
If possible without uwsgi(Single process per pod is the fundamental docker concept)
If with uwsgi, are there any specific config we can apply for k8s env

We use Twisted's WSGI server with 30 threads and it's been solid for our Django application. Keeps to a single process per pod model which more closely matches Kubernetes' expectations, as you mentioned. Yes, the GIL means only one of those 30 threads can be running Python code at time, but as with most webapps, most of those threads are blocked on I/O (usually waiting for a response from the database) the vast majority of the time. Then run multiple replicas on top of that both for redundancy and to give you true concurrency at whatever level you need (we usually use 4-8 depending on the site traffic, some big ones are up to 16).

I have exactly the same problem with a python deployment running the Flask application. Most api calls are handled in a matter of seconds, but there are some cpu intensive requests that acquire GIL for 2 minutes.... The pod keep accepting requests, ignores the configured timeouts, ignores a closed connection by the user; then after 1 minute of liveness probes failing, the pod is restarted by kubelet.
So 1 fat request can dramatically drop the availability.
I see two different solutions:
have a separate deployment that will host only long running api calls; configure ingress to route requests between these two deployments;
using multiprocessing handle liveness/readyness probes in a main process, every other request must be handled in the child process;
There are pros and cons for each solution, maybe I will need a combination of both. Also if I need a steady flow of prometheus metrics, I might need to create a proxy server on the application layer (1 more container on the same pod). Also need to configure ingress to have a single upstream connection to python pods, so that long running request will be queued, whereas short ones will be processed concurrently (yep, python, concurrency, good joke). Not sure tho it will scale well with HPA.
So yeah, running production ready python rest api server on kubernetes is not a piece of cake. Go and java have a much better ecosystem for microservice applications.
PS
here is a good article that shows that there is no need to run your app in kubernetes with WSGI
https://techblog.appnexus.com/beyond-hello-world-modern-asynchronous-python-in-kubernetes-f2c4ecd4a38d
PPS
Im considering to use prometheus exporter for flask. Looks better than running a python client in a separate thread;
https://github.com/rycus86/prometheus_flask_exporter

gunicorn and/or celery: What is the way get the best out of both?

I've a machine learning application which uses flask to expose api(for production this is not a good idea, but even if I'll use django in future the idea of the question shouldn't change).
The main problem is how to serve multiple requests to my app. Few months back celery has been added to get around this problem. The number of workers in celery that was spawned is equal to the number of cores present in the machine. For very few users this was looking fine and was in production for some time.
When the number of concurrent users got increased, it was evident that we should do a performance testing on it. It turns out: it is able to handle 20 users for 30 GB and 8 core machine without authentication and without any front-end. Which is not looking like a good number.
I didn't know there are things like: application server, web server, model server. When googling for this problem: gunicorn was a good application server python application.
Should I use gunicorn or any other application server along with celery and why
If I remove celery and only use gunicorn with the application can I achieve concurrency. I have read somewhere celery is not good for machine learning applications.
What are the purposes of gunicorn and celery. How can we achieve the best out of both.
Note: Main goal is to maximize concurrency. While serving in production authentication will be added. One front-end application might come into action in between in production.

There is no shame in flask. If in fact you just need a web API wrapper, flask is probably a much better choice than django (simply because django is huge and you'd be using only a fraction of its capability).
However, your concurrency problems are apparently stemming from the fact that you are doing some heavy-duty processing for each request. There is simply no way around that; if you require a certain amount of computational resources per request, you can't magic those up. From here on, it's a juggling act.
If you want a guaranteed response immediately, you need to have as many workers as potential simultaneous requests. This may involve load balancing over multiple servers, if you can't scrounge up enough resources on one server. (cue gunicorn, a web application server, responsible for accepting connections and then distributing them to multiple application processes.)
If you are okay with not getting an immediate response, you can let stuff queue up. (cue celery, a task queue, which worker processes can use to retrieve the next thing to be done, and deposit results). This works best if you don't need a response in the same request-response cycle; e.g. you submit a job from client, and they only get an acknowledgement that the job has been received; you would need a second request to ask about the status of the job, and possibly the results of the job if it is finished.
Alternately, instead of Flask you could use websockets or Tornado, to push out the response to the client when it is available (as opposed to user polling for results, or waiting on a live HTTP connection and taking up a server process).

Django Parallel Processing

I have a simple Django project.
Each time a user hits the homepage,some operations are performed based on which,view is generated. Now the problem is that when a user hits the homepage ,sometimes the operations take a long time based on network connectivity. If in the meantime, a new user hits the homepage,he has to wait for the request from the previous user to get serviced before the page gets rendered.
I found Celery is used for task scheduling and queuing . But I wonder if Celery is what i need.I need each user to have his request be processed independently and not queued.
My project is a single app project and will receive a maximum of 100 users a time.
Thanks.

If the long process needs to be done in order to serve the request and generate the proper response then you cannot use Celery.
The debug web-server that is shipped with Django is a multi-threaded-single-process server, but is really very limited and should not be used in production.
If you use gunicorn or other wsgi servers you can run your application in multiple processes but you will hit the limit quickly if you're doing heavy processing.
The solution would be in my opinion is to either change the way you're processing stuff, either prepare ahead or serve the request and do the processing in the background, you can show the user a Please wait... message, here you can use Celery to do the processing.
The other solution would be to use event-based web-server like Twisted or cyclone or others

Google App Engine Application Extremely slow

I created a Hello World website in Google App Engine. It is using Django 1.1 without any patch.
Even though it is just a very simple web page, it takes long time and often it times out.
Any suggestions to solve this?
Note: It is responding fast after the first call.

Now Google has added a payment option "Always On" which is 0.30$ a day.
Using this feature, your application will not have to cold start any more.
Always On
While warmup requests help your
application scale smoothly, they do
not help if your application has very
low amounts of traffic. For
high-priority applications with low
traffic, you can reserve instances via
App Engine's Always On feature.
Always On is a premium feature which
reserves three instances of your
application, never turning them off,
even if the application has no
traffic. This mitigates the impact of
loading requests on applications that
have small or variable amounts of
traffic. Additionally, if an Always On
instance dies accidentally, App Engine
automatically restarts the instance
with a warmup request. As a result,
Always On applications should be sure
to do as much initialization as
possible during warmup requests.
Even after enabling Always On, your
application may experience loading
requests if there is a sudden increase
in traffic.
To enable Always On, go to the Billing
Settings page in your application's
Admin Console, and click the Always On
checkbox.
http://code.google.com/intl/de-DE/appengine/docs/adminconsole/instances.html

This is a horrible suggestion but I'll make it anyway:
Build a little client application or just use wget with cron to periodically access your app, maybe once every 5 minutes or so. That should keep Google from putting it into a dormant state.
I say this is a horrible suggestion because it's a waste of resources and an abuse of Google's free service. I'd expect you to do this only during a short testing/startup phase.

To summarize this thread so far:
Cold starts take a long time
Google discourages pinging apps to keep them warm, but people do not know the alternative
There is an issue filed to pay for a warm instance (of the Java)
There is an issue filed for Python. Among other things, .py files are not precompiled.
Some apps are disproportionately affected (can't find Google Groups ref or issue)
March 2009 thread about Python says <1s (!)
I see less talk about Python on this issue.

If it's responding quickly after the first request, it's probably just a case of getting the relevant process up and running. Admittedly it's slightly surprising that it takes so long that it times out. Is this after you've updated the application and verified that the AppEngine dashboard shows it as being ready?
"First hit slowness" is quite common in many web frameworks. It's a bit of a pain during development, but not a problem for production.

One more tip which might increase the response time.
Enabling billing does increase the quotas, and, to my personal experience, increase the overall response of an application as well. Probably because of the higher priority for billing-enabled applications google has. For instance, an app with billing disabled, can send up to 5-10 emails/request, an app with billing enabled easily copes with 200 emails/request.
Just be sure to set low billing levels - you never know when Slashdot, Digg or HackerNews notices your site :)

I encounteres the same with pylons based app. I have the initial page server as static, and have a dummy ajax call in it to bring the app up, before the user types in credentials. It is usually enough to avoid a lengthy response... Just an idea that you might use before you actually have a million users ;).

I used pingdom for obvious reasons - no cold starts is a bonus. Of course the customers will soon come flocking and it will be a non-issue

You may want to try CloudUp. It pings your google apps periodically to keep them active. It's free and you can add as many apps as you want. It also supports azure and heroku.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.