I'm writing an e-commerce plug-in app in Python/Django that integrates with Shopify stores. Whenever a customer for a store reaches checkout, Shopify sends a request to my app with shopping cart and destination address data, and my app is required to respond with shipping price information. The problem is that I need to make an external API call between them sending me the request and sending them the response, and under moderate load, my WSGI workers get filled very easily.
I'm trying to avoid scaling out unnecessarily. Should I simply increase my number of workers past the recommended cores * 2 + 1? Do I simply monitor CPU load in order to adjust this number? What's the ideal CPU load % I should be looking for? Since I'm also handing short non-blocked requests from the same app, will this cause any problems?
Is Django simply not a good match for this kind of use-case? If so, what is a good match, and what would be the best way to apply it without rewriting my whole app?
EDIT: My WSGI server is Gunicorn
There are a couple of things you can do to improve the performance of gunicorn here. Given your design, it's almost certain that your workers are IO-bound. So for a start you could configure them to use multiple threads per worker; the docs suggest 2-4.
However, again because of the IO-bound nature of your site, it seems likely that you'll get even better improvements by using one of the asynchronous worker types. See the design docs for details: I don't think there is much to choose between gevent and eventlet, personally I've had good results from the former.
Related
I've a machine learning application which uses flask to expose api(for production this is not a good idea, but even if I'll use django in future the idea of the question shouldn't change).
The main problem is how to serve multiple requests to my app. Few months back celery has been added to get around this problem. The number of workers in celery that was spawned is equal to the number of cores present in the machine. For very few users this was looking fine and was in production for some time.
When the number of concurrent users got increased, it was evident that we should do a performance testing on it. It turns out: it is able to handle 20 users for 30 GB and 8 core machine without authentication and without any front-end. Which is not looking like a good number.
I didn't know there are things like: application server, web server, model server. When googling for this problem: gunicorn was a good application server python application.
Should I use gunicorn or any other application server along with celery and why
If I remove celery and only use gunicorn with the application can I achieve concurrency. I have read somewhere celery is not good for machine learning applications.
What are the purposes of gunicorn and celery. How can we achieve the best out of both.
Note: Main goal is to maximize concurrency. While serving in production authentication will be added. One front-end application might come into action in between in production.
There is no shame in flask. If in fact you just need a web API wrapper, flask is probably a much better choice than django (simply because django is huge and you'd be using only a fraction of its capability).
However, your concurrency problems are apparently stemming from the fact that you are doing some heavy-duty processing for each request. There is simply no way around that; if you require a certain amount of computational resources per request, you can't magic those up. From here on, it's a juggling act.
If you want a guaranteed response immediately, you need to have as many workers as potential simultaneous requests. This may involve load balancing over multiple servers, if you can't scrounge up enough resources on one server. (cue gunicorn, a web application server, responsible for accepting connections and then distributing them to multiple application processes.)
If you are okay with not getting an immediate response, you can let stuff queue up. (cue celery, a task queue, which worker processes can use to retrieve the next thing to be done, and deposit results). This works best if you don't need a response in the same request-response cycle; e.g. you submit a job from client, and they only get an acknowledgement that the job has been received; you would need a second request to ask about the status of the job, and possibly the results of the job if it is finished.
Alternately, instead of Flask you could use websockets or Tornado, to push out the response to the client when it is available (as opposed to user polling for results, or waiting on a live HTTP connection and taking up a server process).
I have a Flask app deployed on Elastic Beanstalk onto an EC2 instance on AWS. If 100 people simultaneously connected to my server, then wouldn't that mean that they have to wait in a queue of 100 since the app can only handle one instance at a time?
How can I make it so that I can handle more requests using the same IP address to connect to? Thanks!
The short answer is to use uWSGI or gunicorn.
The longer answer is that your intuition is correct - what you are worrying about is "concurrency", or the number of simultaneous requests your app can handle. And yes, a single Flask app without any application server can handle one request at a time. How do you change that? For most Python apps, the unit of concurrency is a process (there are frameworks that change that, but the majority of app deployments are probably process-based). That is, you run a process for each concurrent request you think you'll need. App servers like uWSGI do the listening for your app, then dispatch the request to a process from a pool. So, how many processes do you need?
The second concept you need is "throughput" - how many requests can be served in a specific time, which is influenced by, but different from, "concurrency" and is where your intuition may mislead you. Let's say you have 8 processes. You may think "but I'll have 100 users, 8 is clearly not enough". Let's assume you know that each request completes in 1/8 (.125) seconds. That means that each process can serve 8 requests a second. Times 8 processes; your throughput will be (roughly) 64 requests per second. 8 process gets you a lot closer to your 100 users than you may have otherwise expected. Your 100 users probably won't actually issue requests in that 1 second window. Possible, but unlikely. The issue isn't really the concurrency, but whether the user gets a response in a reasonable time.
Hope this helps. Scaling is a wonderful topic - both straightforward and frustratingly nuanced at the same time. As your traffic increases, the above guidance will shift and you'll need more and more advanced techniques. But to get started - keep it simple and focus on the basics.
See How many concurrent requests does a single Flask process receive?
Edit for clarify my question:
I want to attach a python service on uwsgi using this feature (I can't understand the examples) and I also want to be able to communicate results between them. Below I present some context and also present my first thought on the communication matter, expecting maybe some advice or another approach to take.
I have an already developed python application that uses multiprocessing.Pool to run on demand tasks. The main reason for using the pool of workers is that I need to share several objects between them.
On top of that, I want to have a flask application that triggers tasks from its endpoints.
I've read several questions here on SO looking for possible drawbacks of using flask with python's multiprocessing module. I'm still a bit confused but this answer summarizes well both the downsides of starting a multiprocessing.Pool directly from flask and what my options are.
This answer shows an uWSGI feature to manage daemon/services. I want to follow this approach so I can use my already developed python application as a service of the flask app.
One of my main problems is that I look at the examples and do not know what I need to do next. In other words, how would I start the python app from there?
Another problem is about the communication between the flask app and the daemon process/service. My first thought is to use flask-socketIO to communicate, but then, if my server stops I need to deal with the connection... Is this a good way to communicate between server and service? What are other possible solutions?
Note:
I'm well aware of Celery, and I pretend to use it in a near future. In fact, I have an already developed node.js app, on which users perform actions that should trigger specific tasks from the (also) already developed python application. The thing is, I need a production-ready version as soon as possible, and instead of modifying the python application, that uses multiprocessing, I thought it would be faster to create a simple flask server to communicate with node.js through HTTP. This way I would only need to implement a flask app that instantiates the python app.
Edit:
Why do I need to share objects?
Simply because the creation of the objects in questions takes too long. Actually, the creation takes an acceptable amount of time if done once, but, since I'm expecting (maybe) hundreds to thousands of requests simultaneously having to load every object again would be something I want to avoid.
One of the objects is a scikit classifier model, persisted on a pickle file, which takes 3 seconds to load. Each user can create several "job spots" each one will take over 2k documents to be classified, each document will be uploaded on an unknown point in time, so I need to have this model loaded in memory (loading it again for every task is not acceptable).
This is one example of a single task.
Edit 2:
I've asked some questions related to this project before:
Bidirectional python-node communication
Python multiprocessing within node.js - Prints on sub process not working
Adding a shared object to a manager.Namespace
As stated, but to clarify: I think the best solution would be to use Celery, but in order to quickly have a production ready solution, I trying to use this uWSGI attach daemon solution
I can see the temptation to hang on to multiprocessing.Pool. I'm using it in production as part of a pipeline. But Celery (which I'm also using in production) is much better suited to what you're trying to do, which is distribute work across cores to a resource that's expensive to set up. Have N cores? Start N celery workers, which of which can load (or maybe lazy-load) the expensive model as a global. A request comes in to the app, launch a task (e.g., task = predict.delay(args), wait for it to complete (e.g., result = task.get()) and return a response. You're trading a little bit of time learning celery for saving having to write a bunch of coordination code.
I am building REST API with Flask-restplus. One of my endpoints takes a file uploaded from client and run some analysis. The job uses up to 30 seconds. I don't want the job to block the main process. So the endpoint will return a response with 200 or 201 right away, the job can still be running. Results will be saved to database which will be retrieved later.
It seems I have two options for long-running jobs.
Threading
Task-queue
Threading is relatively simpler. But problem is, there is a limit of thread numbers for Flask app. In a standalone Python app, I could use a queue for the threads. But this is REST api, each request call is independent. I don't know if there is a way to maintain a global queue for that. So if the requests exceed the thread limit, it won't be able to take more requests.
Task-queue with Celery and Redis is probably better option. But this is just a proof of concept thing, and time line is kind of tight. Setting up Celery, Redis with Flask is not easy, I am having lots of trouble on my dev machine which is a Windows. It will be deployed on AWS which is kind of complex.
I wonder if there is a third option for this case?
I would HIGHLY recommend using Celery as you have already mentioned in your post. It is built exactly for this use case. Their docs are really informative and there are no shortage of examples online that can get you up and running quickly.
Additionally, I would say THIS would be an excellent first resource for you to start with.
Celery is a fantastic solution to this problem I have used quite successfully in the past to manage millions of jobs per day.
The only real downside is the initial learning curve and complexity of debugging when things go sour (it can happen, especially with millions of jobs).
I'm using Django with Uwsgi. We have 8 processes running, and I have no real indication that our code is particularly thread safe, as it was never designed with threads in mind.
Recently, we added the ability to get live rates from vendors of a service through their various APIs and display them at once for the user. The problem is these requests are old web services technologies, and due to their response times, the time needed before the all rates from vendors are acquired (or it gives up), can be up to 10 seconds.
This presents a problem. We have a pretty decent amount of traffic on our site, and the customers need to look at these rates pretty often. With only 8 processes, it's quite easy to see how the server can get tied up waiting on these upstream requests. Especially when other optimizations need to be made to make the site baseline faster anyway (we're working on that).
We made a separate library (which should be mostly threadsafe, and if not, should be converted to it easily enough) for the rates requesting, and we can separate out its configuration. So I was thinking of making a separate service with its own threads, perhaps in Twisted, and having the browser contact that service for JSON instead of having it run in the main Django server.
Is this solution a good one? Can you think of a better or simpler way to do it? Should I use something other than Twisted, and if so, why?
If you want to use your code in-process with Django, you can simply call out to your Twisted by using Crochet, which can automatically manage the creation, running, and shutdown of the reactor within whatever WSGI implementation you choose (presuming that it behaves like a regular Python process, at least).
Obviously it might be less complex to just run within the Twisted WSGI container :-).
It might also be worth looking at TReq to issue your service client requests; your new "thread safe" library will still have the disadvantage of tying up an entire thread for each blocking client, which is a non-trivial amount of memory and additional concurrency overhead, whereas with Twisted you will only need to worry about a couple of objects.