I just came across Docker a couple days ago and have been doing some research, but one thing is still a bit unclear to me.
In a video I watched by the creator of Docker, he likened this utility to a shipping container so that you could guarantee that your stack works as intended once set up inside.
But I'm seeing a lot of container images which are just a single part of the stack, i.e. an nginx image or a uwsgi image.
Basically I want to run a web server using python, flask, nginx, and uwsgi. They're all part of the stack so should they go in a single container or should certain parts be in their own container?
I'll have a MySQL server as well, and this seems more logical to run in its own container.
Apologies if this is a matter of opinion, but to me it feels like there is only one right way to go about this.
Docker containers are treated as deployment units - which means you package an application (or part of an application) and all its dependencies into a docker container that can be deployed independently. Your application could be monolithic where your entire application fits into one container just exposing HTTP end points for the browser to access, or an application that is composed of sub-components that can be independently deployed and managed - something like microservices - that when put together form the complete application. In such a case, each independent sub-component would reside in a container of its own. So, the decision on how many containers and how many processes within a container depends on the composition of your application and the kind of scalability you want to achieve.
Docker containers are meant to run single processes, but of course you can work around it by running process management tools like supervisord. I'm not very familiar with the python stack you are talking about, but I could explain this to you in terms of a stack comprising of Nginx + Node + Redis. I have elaborated on a sample docker workflow with this stack in my blog post as well: http://anandmanisankar.com/posts/docker-container-nginx-node-redis-example/
In my example used in the blog post, Nginx, Node and Redis run on separate containers. The reason being the following:
I want to be able to scale my node application depending on the load. So it makes sense to run it on separate container that I can scale independently.
I run Redis on a separate container which acts as a shared data store for my node containers.
To load balance my node containers I run Nginx - again on a separate container - that can dynamically balance the load across the scaled out node containers. The load balancing configuration can also be dynamically updated based on the state/availability/health of the node containers. It would be ideal if I could implement a service discovery mechanism which dynamically generates the Nginx configuration based on the availability of the containers. So scaling up would just be a matter of adding additional containers, and fault tolerance (failure of some node containers) would be automatically handled as well.
You can find the code behind this docker worflow on my github repo: https://github.com/msanand/docker-workflow
You could try to draw an analogy from this to any other web architecture stack. Hope this helps!
I think you could like this, I made a public (and open source) Docker image with all the bells and whistles that you can use to build a Python Flask web application.
It has uWSGI for running the application, Nginx to serve HTTP and Supervisord to control them, so you don't have to learn how to install and configure all those to build your Python Flask web app.
It seems like uWSGI with Nginx is one of the more robust (and with great performance) ways to deploy a Python web app. Here are the benchmarks: http://nichol.as/benchmark-of-python-web-servers.
There are even some template projects you can use to bootstrap your own. And also, you don't have to clone the full project or something, you can just use it as a base image.
Docker Hub: https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/
GitHub: https://github.com/tiangolo/uwsgi-nginx-flask-docker
And about the "one process per container" debate, some say that this is one of the key misconceptions when you look at it from a microservices point of view: https://valdhaus.co/writings/docker-misconceptions/
As others here have said, officially it is recommended to have one process per container, see
https://docs.docker.com/articles/dockerfile_best-practices/
However, I think there is much debate about how much process isolation you need. One example that is closer to the full stack is the phusion passenger image (phusion/baseimage and phusion/passenger-docker)
that bundles, amongst other things, nginx, ruby and passenger. Some people hate this, others think there is a place for such images. Opinions expressed about this particular image and a linked article discussing it can be found here: https://news.ycombinator.com/item?id=7258009. I think that you can generalize a lot of what is said there to your case and that the variety of arguments supports the variety of image types you have observed.
Personally, I think the full stack vs single process debate is down to the requirements of what you are trying to achieve. If you worry about scalability the single process paradigm might be better for you. If you care about quickly bringing up a dev environment, it could be more straight forward to create/take a container that feels a bit more like a virtual machine.
Related
TL;DR: Any advice or resources on extracting code to reusable, well-structured and maintainable libraries?
I'm working on python applications in a microservice-style architecture, where we'll be developing and deploying a bunch of small applications, each solving a specific issues, maybe (or maybe not) by interacting with other applications/external services.
We just started moving to that microservice architecture, so we already have quite a bit of code in a monolithic project. As we're adding new microservices, it's obvious that we need to extract common code(e.g. utilities, base classes, ...) into libraries to avoid reimplementing or copy-pasting code that will then have to be maintained separately. As I'm trying to do that(which I've never really done before), I'm realizing it's not trivial and can become complicated pretty quickly, and I could spend some time overthinking it too.
So I'm looking for advices, or pointers to resources on best-practices related to this situation, i.e. writing well-structured python libraries, packaging and distributing libraries, sharing code in a microservice architecture and avoiding making mistakes that might put me in problematic situations, .
Concrete problems/challenges I'm facing:
* How best to group/separate code in version control. Like, one repository per package? The number of repositories can explode pretty quickly...
For shared libraries you can publish it to git in individual repositories and set them up to use Python package managers to install them in your project.
As far as application deployments, service dependencies, etc. I would advise for you to take a look at Docker for containerization, docker-compose for service dependencies locally, Artifactory or ECR for Docker image registries, and container orchestration platforms like Kubernetes.
Containers are similar to the virtual machines but at a more granular level, the process level. This effectively will allow you to run services together locally for testing and deploying them. It would no longer matter that each service is in a different repository.
If you don't have too many microservices, you could definitely use a mono-repo but if your engineering organization is large, its pretty costly to download all the updates for all the services. As an alternative, you may have your services that are divided in respective bounded contexts all live in a single repo to remove this deterrent. Long story short, it really depends what you will find beneficial. At the end of the day, the largest problems are never how many Git repositories you have, its how you define the bounds of your services, the service-to-service communication and infrastructure for deploying the services.
What is the easiest way to deploy a python app with only two .py files but keep it in one dyno? My files are friend.py and foe.py and my Procfile looks like this:
worker: python friend.py
worker: python foe.py
But when deployed to Heroku, the only dyno I have is foe.py. I've read other similar questions but they seem to complicated and I don't yet understand the inner workings of a python web application.
If they are distinct processes working in parallel, the most direct path is two dynos, using different names (actually, friend and foe would work fine as process names) in the Procfile. Right now, you are using the name worker twice, so foe.py shows up because it's the last one defined. Two things to keep in mind -
The names in Procfile can be arbitrary; as far as I know, the only "special" name is web, which tells Heroku to expect that process to bind to a port and accept HTTP traffic from the routing mesh. worker isn't special; it's just a convention people tend to use for "something running other than the web dyno"
A dyno is closer to a Docker container than a virtual machine, so the general best practice is one kind of process per container.
If you really need only one dyno (cost?), you could write a third script whose sole job is to spawn friend.py and foe.py as subprocesses. In that case, everything comes up and down as a unit; you can't manage friend and foe independently.
Hope that helps.
I am dockerizing a Python webapp using the https://hub.docker.com/r/tiangolo/uwsgi-nginx image, which uses supervisor to control the uWSGI instance.
My app actually requires an additional supervisor-mediated process to run (LibreOffice headless, with which I generate documents through the appy module), and I'm wondering what is the proper pattern to implement it.
The way I see it, I could extend the above image with the extra supervisor config for my needs (along with all the necessary OS-level install steps), but this would be in contradiction with the general principle of running the least amount of distinct processes in a given container. However, since my Python app is designed to talk with LibreOffice only locally, I'm not sure how I could achieve it with a more containerized approach. Thanks for any help or suggestion.
The recommendation for one-process-per-container is sound - Docker only monitors the process it starts when the container runs, so if you have multiple processes they're not watched by Docker. It's also a better design - you have lightweight, focused containers with single responsibilities, and you can manage them independently.
user2105103 is right though, the image you're using already loses that benefit because it runs Python and Nginx, and you could extend it with LibreOffice headless and package your whole app without changing code.
If you move to a more "best practice" approach, you'd have a distributed app running across three containers in a Docker network:
nginx - web proxy, this is the public entry point to the app. Nginx can do routing, caching, SSL termination, rate limiting etc.
app - your Python app, only visible inside the Docker network. Receives requests from nginx and uses libreoffice for document manipulation;
libreoffice - running in headless mode with the API exposed, but only available within the Docker network.
You'd need code changes for this, bringing in something like PyOO to use the LibreOffice API remotely from the app container.
You've already blown the "one process per container" -- just add another process. It's not a hard rule, or even one that everybody agrees with.
Extend away, or better yet author your own custom container. That way you own it, you understand it, and it's optimized for your purpose.
I have a Python script that consumes an Azure queue, and I would like to scale this easily inside Azure infrastructure. I'm looking for the easiest solution possible to
run the Python script in an environment that is as managed as possible
have a centralized way to see the scripts running and their output, and easily scale the amount of scripts running through a GUI or something very easy to use
I'm looking at Docker at the moment, but this seems very complicated for the extremely simple task I'm trying to achieve. What possible approaches are known to do this? An added bonus would be if I could scale wrt the amount of items on the queue, but it is fine if we'd just be able to manually control the amount of parallelism.
You should have a look at Azure Web Apps, which also support Python.
This would be a managed and scaleable environment and also supports background tasks (WebJobs) with a central logging.
Azure Web Apps also offer a free plan for development and testing.
Per my experience, I think CoreOS on Azure can satisfy your needs. You can try to refer to the doc https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-coreos-how-to/ to know how to get started.
CoreOS is a Linux distribution for running Docker as Linux container, that you can remote access via SSH client like putty. For using Docker, you can search the key words Docker tutorial via Bing to rapidly learning some simple usage that enough for running Python scripts.
Sounds to me like you are describing something like a micro-services architecture. From that perspective, Docker is a great choice. I recommend you consider using an orchestration framework such as Apache Mesos or Docker Swarm which will allow you to run your containers on a cluster of VMs with the ability to easily scale, deploy new versions, rollback and implement load balancing. The schedulers Mesos supports (Marathon and Chronos) also have a Web UI. I believe you can also implement some kind of triggered scaling like you describe but that will probably not be off the shelf.
This does seem like a bit of a learning curve but I think is worth it especially once you start considering the complexities of deploying new versions (with possible rollbacks), monitoring failures and even integrating things like Jenkins and continuous delivery.
For Azure, an easy way to deploy and configure a Mesos or Swarm cluster is by using Azure Container Service (ACS) which does all the hard work of configuring the cluster for you. Find additional info here: https://azure.microsoft.com/en-us/documentation/articles/container-service-intro/
We are developing a distributed application in Python. Right now, we are about to re-organize some of our system components and deploy them on separate servers, so I'm looking to understand more about deployment for an application such as this. We will have several back-end code servers, several database servers (of different types) and possibly several front-end servers.
My question is this: what / which are good deployment patterns for distributed applications (in Python or in general)? How can I manage pushing code to several servers (whose IP's should be parameterized in the deployment system), static files to several front ends, starting / stopping processes in the servers, etc.? We are looking for possibly an easy-to-use solution, but mostly, something that once set-up will get out of our way and let us deploy as painlessly as possible.
To clarify: we are aware that there is no one standard solution for this particular application, but this question is rather more geared towards a guide of best practices for different types / parts of deployment than a single, unified solution.
Thanks so much! Any suggestions regarding this or other deployment / architecture pointers will be very appreciated.
It all depends on your application.
You can:
use Puppet to deploy servers,
use Fabric to remotely connect to the servers and execute specific tasks,
use pip for distributing Python modules (even non-public ones) and install dependencies,
use other tools for specific tasks (such as use boto to work with Amazon Web Services APIs, eg. to start new instance),
It is not always that simple and you will most likely need something customized. Just take a look at your system: it is not so "standard", so do not expect it to be handled in a "standard" way.