I am currently planning a sizeable ROS project which will contain upwards of 15 Nodes developed either in Python2.x or C++ talking to each other. We will try to isolate different tasks as individual nodes to guarentee unit-testability and modularity and to improve reusability for future projects.
The question is if creating a docker container for each individual node is worth the effort and if there are any downsides. The big upside would be that we could have vastly different systems from developer-pcs to build machines and then the devices the nodes are deployed to. Docker keeps dependencies and environment configurations in one managable place and fixes many hurdles.
But are there any significant drawbacks to that idea? Are there significant performance hits or other hurdles docker does introduce in such a scenario?
Creation of isolated ROS nodes in docker containers is not as complicated and already done before:
How can I run two ROS2 nodes each in a separate docker container?
Running 2 nodes in 2 separate docker containers
Since the effort is not as high, I think the improvements in testing, reusability, integration, delivery, ... are absolutely arguable if these points are scoped in your project.
Since CPU performance is no drawback in Docker, you should have a closer look on communication of ROS nodes, which means passing messages between your containered ROS nodes. Here you can find two drawbacks you should know:
Socket communication:
As you can see in Networking with Docker: Don’t settle for the defaults, network access in Docker is done by additional interfaces, which will affect the TCP transmission of ROS messages. But a performance analysis in Docker network performance shows only small impact.
Nodelet usage:
Since nodelets are loaded in an existing host ROS node communication between ROS node can be improved imense. Using Docker, the usage of nodelets would not be possible.
All in all, you have to consider which and how many messages are transmitted between your ROS nodes. If many images or other large messages are transmitted, further analysis would be absolutely advisable. Except a little more complicated handling of your nodes, I can't find notable points that speaks against the concept of using ROS nodes in separated Docker containers.
Related
TL;DR: Any advice or resources on extracting code to reusable, well-structured and maintainable libraries?
I'm working on python applications in a microservice-style architecture, where we'll be developing and deploying a bunch of small applications, each solving a specific issues, maybe (or maybe not) by interacting with other applications/external services.
We just started moving to that microservice architecture, so we already have quite a bit of code in a monolithic project. As we're adding new microservices, it's obvious that we need to extract common code(e.g. utilities, base classes, ...) into libraries to avoid reimplementing or copy-pasting code that will then have to be maintained separately. As I'm trying to do that(which I've never really done before), I'm realizing it's not trivial and can become complicated pretty quickly, and I could spend some time overthinking it too.
So I'm looking for advices, or pointers to resources on best-practices related to this situation, i.e. writing well-structured python libraries, packaging and distributing libraries, sharing code in a microservice architecture and avoiding making mistakes that might put me in problematic situations, .
Concrete problems/challenges I'm facing:
* How best to group/separate code in version control. Like, one repository per package? The number of repositories can explode pretty quickly...
For shared libraries you can publish it to git in individual repositories and set them up to use Python package managers to install them in your project.
As far as application deployments, service dependencies, etc. I would advise for you to take a look at Docker for containerization, docker-compose for service dependencies locally, Artifactory or ECR for Docker image registries, and container orchestration platforms like Kubernetes.
Containers are similar to the virtual machines but at a more granular level, the process level. This effectively will allow you to run services together locally for testing and deploying them. It would no longer matter that each service is in a different repository.
If you don't have too many microservices, you could definitely use a mono-repo but if your engineering organization is large, its pretty costly to download all the updates for all the services. As an alternative, you may have your services that are divided in respective bounded contexts all live in a single repo to remove this deterrent. Long story short, it really depends what you will find beneficial. At the end of the day, the largest problems are never how many Git repositories you have, its how you define the bounds of your services, the service-to-service communication and infrastructure for deploying the services.
I am dockerizing a Python webapp using the https://hub.docker.com/r/tiangolo/uwsgi-nginx image, which uses supervisor to control the uWSGI instance.
My app actually requires an additional supervisor-mediated process to run (LibreOffice headless, with which I generate documents through the appy module), and I'm wondering what is the proper pattern to implement it.
The way I see it, I could extend the above image with the extra supervisor config for my needs (along with all the necessary OS-level install steps), but this would be in contradiction with the general principle of running the least amount of distinct processes in a given container. However, since my Python app is designed to talk with LibreOffice only locally, I'm not sure how I could achieve it with a more containerized approach. Thanks for any help or suggestion.
The recommendation for one-process-per-container is sound - Docker only monitors the process it starts when the container runs, so if you have multiple processes they're not watched by Docker. It's also a better design - you have lightweight, focused containers with single responsibilities, and you can manage them independently.
user2105103 is right though, the image you're using already loses that benefit because it runs Python and Nginx, and you could extend it with LibreOffice headless and package your whole app without changing code.
If you move to a more "best practice" approach, you'd have a distributed app running across three containers in a Docker network:
nginx - web proxy, this is the public entry point to the app. Nginx can do routing, caching, SSL termination, rate limiting etc.
app - your Python app, only visible inside the Docker network. Receives requests from nginx and uses libreoffice for document manipulation;
libreoffice - running in headless mode with the API exposed, but only available within the Docker network.
You'd need code changes for this, bringing in something like PyOO to use the LibreOffice API remotely from the app container.
You've already blown the "one process per container" -- just add another process. It's not a hard rule, or even one that everybody agrees with.
Extend away, or better yet author your own custom container. That way you own it, you understand it, and it's optimized for your purpose.
I have a Python script that consumes an Azure queue, and I would like to scale this easily inside Azure infrastructure. I'm looking for the easiest solution possible to
run the Python script in an environment that is as managed as possible
have a centralized way to see the scripts running and their output, and easily scale the amount of scripts running through a GUI or something very easy to use
I'm looking at Docker at the moment, but this seems very complicated for the extremely simple task I'm trying to achieve. What possible approaches are known to do this? An added bonus would be if I could scale wrt the amount of items on the queue, but it is fine if we'd just be able to manually control the amount of parallelism.
You should have a look at Azure Web Apps, which also support Python.
This would be a managed and scaleable environment and also supports background tasks (WebJobs) with a central logging.
Azure Web Apps also offer a free plan for development and testing.
Per my experience, I think CoreOS on Azure can satisfy your needs. You can try to refer to the doc https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-coreos-how-to/ to know how to get started.
CoreOS is a Linux distribution for running Docker as Linux container, that you can remote access via SSH client like putty. For using Docker, you can search the key words Docker tutorial via Bing to rapidly learning some simple usage that enough for running Python scripts.
Sounds to me like you are describing something like a micro-services architecture. From that perspective, Docker is a great choice. I recommend you consider using an orchestration framework such as Apache Mesos or Docker Swarm which will allow you to run your containers on a cluster of VMs with the ability to easily scale, deploy new versions, rollback and implement load balancing. The schedulers Mesos supports (Marathon and Chronos) also have a Web UI. I believe you can also implement some kind of triggered scaling like you describe but that will probably not be off the shelf.
This does seem like a bit of a learning curve but I think is worth it especially once you start considering the complexities of deploying new versions (with possible rollbacks), monitoring failures and even integrating things like Jenkins and continuous delivery.
For Azure, an easy way to deploy and configure a Mesos or Swarm cluster is by using Azure Container Service (ACS) which does all the hard work of configuring the cluster for you. Find additional info here: https://azure.microsoft.com/en-us/documentation/articles/container-service-intro/
I just came across Docker a couple days ago and have been doing some research, but one thing is still a bit unclear to me.
In a video I watched by the creator of Docker, he likened this utility to a shipping container so that you could guarantee that your stack works as intended once set up inside.
But I'm seeing a lot of container images which are just a single part of the stack, i.e. an nginx image or a uwsgi image.
Basically I want to run a web server using python, flask, nginx, and uwsgi. They're all part of the stack so should they go in a single container or should certain parts be in their own container?
I'll have a MySQL server as well, and this seems more logical to run in its own container.
Apologies if this is a matter of opinion, but to me it feels like there is only one right way to go about this.
Docker containers are treated as deployment units - which means you package an application (or part of an application) and all its dependencies into a docker container that can be deployed independently. Your application could be monolithic where your entire application fits into one container just exposing HTTP end points for the browser to access, or an application that is composed of sub-components that can be independently deployed and managed - something like microservices - that when put together form the complete application. In such a case, each independent sub-component would reside in a container of its own. So, the decision on how many containers and how many processes within a container depends on the composition of your application and the kind of scalability you want to achieve.
Docker containers are meant to run single processes, but of course you can work around it by running process management tools like supervisord. I'm not very familiar with the python stack you are talking about, but I could explain this to you in terms of a stack comprising of Nginx + Node + Redis. I have elaborated on a sample docker workflow with this stack in my blog post as well: http://anandmanisankar.com/posts/docker-container-nginx-node-redis-example/
In my example used in the blog post, Nginx, Node and Redis run on separate containers. The reason being the following:
I want to be able to scale my node application depending on the load. So it makes sense to run it on separate container that I can scale independently.
I run Redis on a separate container which acts as a shared data store for my node containers.
To load balance my node containers I run Nginx - again on a separate container - that can dynamically balance the load across the scaled out node containers. The load balancing configuration can also be dynamically updated based on the state/availability/health of the node containers. It would be ideal if I could implement a service discovery mechanism which dynamically generates the Nginx configuration based on the availability of the containers. So scaling up would just be a matter of adding additional containers, and fault tolerance (failure of some node containers) would be automatically handled as well.
You can find the code behind this docker worflow on my github repo: https://github.com/msanand/docker-workflow
You could try to draw an analogy from this to any other web architecture stack. Hope this helps!
I think you could like this, I made a public (and open source) Docker image with all the bells and whistles that you can use to build a Python Flask web application.
It has uWSGI for running the application, Nginx to serve HTTP and Supervisord to control them, so you don't have to learn how to install and configure all those to build your Python Flask web app.
It seems like uWSGI with Nginx is one of the more robust (and with great performance) ways to deploy a Python web app. Here are the benchmarks: http://nichol.as/benchmark-of-python-web-servers.
There are even some template projects you can use to bootstrap your own. And also, you don't have to clone the full project or something, you can just use it as a base image.
Docker Hub: https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/
GitHub: https://github.com/tiangolo/uwsgi-nginx-flask-docker
And about the "one process per container" debate, some say that this is one of the key misconceptions when you look at it from a microservices point of view: https://valdhaus.co/writings/docker-misconceptions/
As others here have said, officially it is recommended to have one process per container, see
https://docs.docker.com/articles/dockerfile_best-practices/
However, I think there is much debate about how much process isolation you need. One example that is closer to the full stack is the phusion passenger image (phusion/baseimage and phusion/passenger-docker)
that bundles, amongst other things, nginx, ruby and passenger. Some people hate this, others think there is a place for such images. Opinions expressed about this particular image and a linked article discussing it can be found here: https://news.ycombinator.com/item?id=7258009. I think that you can generalize a lot of what is said there to your case and that the variety of arguments supports the variety of image types you have observed.
Personally, I think the full stack vs single process debate is down to the requirements of what you are trying to achieve. If you worry about scalability the single process paradigm might be better for you. If you care about quickly bringing up a dev environment, it could be more straight forward to create/take a container that feels a bit more like a virtual machine.
There are lots of different modules for threading/parallelizing python. Dispy and pp/ParallelPython seem especially popular. It looks like these are all designed for a single interface (e.g. desktop) which has many cores/processors. Is there a module which works on massively parallel architectures which are run by queue systems (specifically: SLURM)?
The most used parallel framework on large compute clusters for scientific/technical applications is MPI. The name of the Python package is MPI4py, which is part of SciPy.
MPI offers a high-level API for creating parallel software using messages for communicating over the network; remote process creation, data scatter/gather, reductions, etc. All implementations are able to take advantage of fast and low-latency networks if present. It is fully integrated with all cluster managers, including Slurm.
Via the ParallelPython main page:
"PP is a python module which provides mechanism for parallel execution of python code on SMP (systems with multiple processors or cores) and clusters (computers connected via network)."