I am dockerizing a Python webapp using the https://hub.docker.com/r/tiangolo/uwsgi-nginx image, which uses supervisor to control the uWSGI instance.
My app actually requires an additional supervisor-mediated process to run (LibreOffice headless, with which I generate documents through the appy module), and I'm wondering what is the proper pattern to implement it.
The way I see it, I could extend the above image with the extra supervisor config for my needs (along with all the necessary OS-level install steps), but this would be in contradiction with the general principle of running the least amount of distinct processes in a given container. However, since my Python app is designed to talk with LibreOffice only locally, I'm not sure how I could achieve it with a more containerized approach. Thanks for any help or suggestion.
The recommendation for one-process-per-container is sound - Docker only monitors the process it starts when the container runs, so if you have multiple processes they're not watched by Docker. It's also a better design - you have lightweight, focused containers with single responsibilities, and you can manage them independently.
user2105103 is right, though: the image you're using already gives up that benefit because it runs both Python and Nginx, so you could extend it with headless LibreOffice and package your whole app without changing any code.
If you move to a more "best practice" approach, you'd have a distributed app running across three containers in a Docker network:
nginx - the web proxy and the public entry point to the app; Nginx can handle routing, caching, SSL termination, rate limiting, etc.
app - your Python app, only visible inside the Docker network; it receives requests from nginx and uses libreoffice for document manipulation.
libreoffice - running in headless mode with its API exposed, but only reachable within the Docker network.
You'd need code changes for this, bringing in something like PyOO to use the LibreOffice API remotely from the app container.
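For a rough idea of what that could look like, here is a minimal PyOO sketch; the hostname libreoffice, port 2002 and the shared /data volume are assumptions about how you'd wire up the containers, not something the image provides for you:

```python
# Sketch only: talking to a LibreOffice container over its UNO socket with PyOO.
# Assumes the "libreoffice" container was started headless with something like:
#   soffice --headless --accept="socket,host=0.0.0.0,port=2002;urp;"
# and that the Docker network resolves the hostname "libreoffice".
import pyoo

desktop = pyoo.Desktop('libreoffice', 2002)   # hostname and port are assumptions

doc = desktop.create_spreadsheet()            # or desktop.open_spreadsheet('/data/template.ods')
sheet = doc.sheets[0]
sheet[0, 0].value = 'Generated from the app container'
doc.save('/data/output.ods')                  # a volume shared between the containers
doc.close()
```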
You've already blown the "one process per container" -- just add another process. It's not a hard rule, or even one that everybody agrees with.
Extend away, or better yet author your own custom container. That way you own it, you understand it, and it's optimized for your purpose.
I want to create a web form that stays up permanently on a single computer. Users can come to the computer, fill out the form and submit it. After a submission, it will record the responses in an Excel file and send emails, and the next user can then come and fill out a fresh form. I was planning on using Flask for this task since it is simple to build with, but since I am not doing this on some production server, I will just have it running locally in development mode on the single computer.
I have never seen anyone do something like this with Flask, so I was wondering if my idea is possible or if I should avoid it. I am also new to web development, so I was wondering what problems there could be with keeping a Flask application running 24/7 on a local development computer.
Thanks
There is nothing wrong with doing this in principle; however, it is likely not the best solution in terms of time-to-reward payoff.
First, to answer your question: this could easily be done, even by a beginner; with minimal Python and HTML experience you could complete it in a few hours. Your app could crash in the background for many reasons (running out of disk space, memory errors, etc.), but most likely you will be fine.
As for actually building it, it is all possible: there are libraries you can use to write the results to an Excel file, or you can simply append to a CSV (which is what I would recommend). Creating and sending an email is similarly straightforward, although, again, doing it without Python would be even easier.
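To give you an idea, here is a minimal sketch of such a Flask app; the form fields, CSV path and SMTP settings are just assumptions for illustration:

```python
# Minimal sketch: a Flask route that appends submissions to a CSV and sends an email.
# The form field names, CSV path and SMTP settings are all assumptions.
import csv
import smtplib
from email.message import EmailMessage

from flask import Flask, request, render_template_string

app = Flask(__name__)

FORM = '''
<form method="post">
  Name: <input name="name">
  Email: <input name="email">
  <button type="submit">Submit</button>
</form>
'''

@app.route('/', methods=['GET', 'POST'])
def form():
    if request.method == 'POST':
        # Append the response to a CSV file instead of an Excel workbook.
        with open('responses.csv', 'a', newline='') as f:
            csv.writer(f).writerow([request.form['name'], request.form['email']])

        # Send a notification email (assumes an SMTP server on localhost).
        msg = EmailMessage()
        msg['Subject'] = 'New form submission'
        msg['From'] = 'forms@example.com'
        msg['To'] = 'you@example.com'
        msg.set_content(f"New submission from {request.form['name']}")
        with smtplib.SMTP('localhost') as smtp:
            smtp.send_message(msg)

        return 'Thanks! The form is ready for the next user.'
    return render_template_string(FORM)

if __name__ == '__main__':
    app.run(host='127.0.0.1', port=5000)
```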
If you are not set on Flask/Python, you could check out Google Forms, but if you are set on Python or want to use this as a learning experience, it can definitely be done.
Your idea is possible and while there are many ways to do this kind of thing, what you are suggesting is not necessarily to be avoided.
All apps that run on a computer over a long period of time start a process and keep it going until closed. That is essentially what you are doing.
Having done this myself (and still currently doing it) at my business, I can say that it works great.
The only caveat is that to ensure it is always available, you need to have the process monitored by some tool that restarts it if it ever dies for any reason.
On Linux, supervisor is a great tool for that. On Windows, you could register it as a service. You could also just give users an easy way to restart it themselves if it is down when they need it.
Yes, this could be done. It's very similar to the applications that run on the servers in data centers.
To keep the application running forever, or to restart it after your system boots, you'll need a service manager similar to systemd on Unix. On Windows you could use NSSM - the Non-Sucking Service Manager - or Service Control to monitor your application and restart it if it crashes. It will also have to be enabled on startup.
Other than this, you could use Waitress to serve your Flask application. Waitress is a WSGI web server with which you can easily configure the number of threads and workers to serve multiple users at the same time.
In a production environment, it's always recommended to use a production WSGI server like Gunicorn or Waitress rather than the development server.
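For illustration, a tiny sketch of serving a Flask app with Waitress; the module name myapp and the thread count are assumptions:

```python
# Sketch: serving the Flask app with Waitress instead of the development server.
# "myapp" and the thread count are assumptions; adjust to your project.
from waitress import serve

from myapp import app  # your Flask application object

serve(app, host='0.0.0.0', port=8080, threads=4)
```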
I have a Python script that consumes an Azure queue, and I would like to scale this easily inside Azure infrastructure. I'm looking for the easiest solution possible to
run the Python script in an environment that is as managed as possible
have a centralized way to see the scripts running and their output, and easily scale the amount of scripts running through a GUI or something very easy to use
I'm looking at Docker at the moment, but this seems very complicated for the extremely simple task I'm trying to achieve. What possible approaches are known to do this? An added bonus would be if I could scale with the number of items on the queue, but it is fine if we can just manually control the degree of parallelism.
You should have a look at Azure Web Apps, which also support Python.
This would be a managed and scalable environment that also supports background tasks (WebJobs) with centralized logging.
Azure Web Apps also offer a free plan for development and testing.
In my experience, CoreOS on Azure can satisfy your needs. You can refer to the doc https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-coreos-how-to/ to get started.
CoreOS is a Linux distribution built for running Docker containers, and you can access it remotely via an SSH client such as PuTTY. To get going with Docker, you can search for the keywords "Docker tutorial" on Bing and quickly pick up enough basic usage to run Python scripts.
Sounds to me like you are describing something like a micro-services architecture. From that perspective, Docker is a great choice. I recommend you consider using an orchestration framework such as Apache Mesos or Docker Swarm which will allow you to run your containers on a cluster of VMs with the ability to easily scale, deploy new versions, rollback and implement load balancing. The schedulers Mesos supports (Marathon and Chronos) also have a Web UI. I believe you can also implement some kind of triggered scaling like you describe but that will probably not be off the shelf.
This does involve a bit of a learning curve, but I think it is worth it, especially once you start considering the complexities of deploying new versions (with possible rollbacks), monitoring failures, and even integrating things like Jenkins and continuous delivery.
For Azure, an easy way to deploy and configure a Mesos or Swarm cluster is by using Azure Container Service (ACS) which does all the hard work of configuring the cluster for you. Find additional info here: https://azure.microsoft.com/en-us/documentation/articles/container-service-intro/
I just came across Docker a couple days ago and have been doing some research, but one thing is still a bit unclear to me.
In a video I watched by the creator of Docker, he likened this utility to a shipping container so that you could guarantee that your stack works as intended once set up inside.
But I'm seeing a lot of container images which are just a single part of the stack, e.g. an nginx image or a uwsgi image.
Basically I want to run a web server using python, flask, nginx, and uwsgi. They're all part of the stack so should they go in a single container or should certain parts be in their own container?
I'll have a MySQL server as well, and this seems more logical to run in its own container.
Apologies if this is a matter of opinion, but to me it feels like there is only one right way to go about this.
Docker containers are treated as deployment units - which means you package an application (or part of an application) and all its dependencies into a docker container that can be deployed independently. Your application could be monolithic where your entire application fits into one container just exposing HTTP end points for the browser to access, or an application that is composed of sub-components that can be independently deployed and managed - something like microservices - that when put together form the complete application. In such a case, each independent sub-component would reside in a container of its own. So, the decision on how many containers and how many processes within a container depends on the composition of your application and the kind of scalability you want to achieve.
Docker containers are meant to run single processes, but of course you can work around it by running process management tools like supervisord. I'm not very familiar with the python stack you are talking about, but I could explain this to you in terms of a stack comprising of Nginx + Node + Redis. I have elaborated on a sample docker workflow with this stack in my blog post as well: http://anandmanisankar.com/posts/docker-container-nginx-node-redis-example/
In the example used in the blog post, Nginx, Node and Redis run in separate containers, for the following reasons:
I want to be able to scale my Node application depending on the load. So it makes sense to run it in a separate container that I can scale independently.
I run Redis on a separate container which acts as a shared data store for my node containers.
To load balance my node containers I run Nginx - again on a separate container - that can dynamically balance the load across the scaled out node containers. The load balancing configuration can also be dynamically updated based on the state/availability/health of the node containers. It would be ideal if I could implement a service discovery mechanism which dynamically generates the Nginx configuration based on the availability of the containers. So scaling up would just be a matter of adding additional containers, and fault tolerance (failure of some node containers) would be automatically handled as well.
You can find the code behind this Docker workflow in my GitHub repo: https://github.com/msanand/docker-workflow
You could try to draw an analogy from this to any other web architecture stack. Hope this helps!
I think you might like this: I made a public (and open source) Docker image, with all the bells and whistles, that you can use to build a Python Flask web application.
It has uWSGI for running the application, Nginx to serve HTTP and Supervisord to control them, so you don't have to learn how to install and configure all those to build your Python Flask web app.
It seems like uWSGI with Nginx is one of the more robust (and best performing) ways to deploy a Python web app. Here are the benchmarks: http://nichol.as/benchmark-of-python-web-servers.
There are even some template projects you can use to bootstrap your own. You also don't have to clone the full project; you can just use it as a base image.
Docker Hub: https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/
GitHub: https://github.com/tiangolo/uwsgi-nginx-flask-docker
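To give an idea of how little code you need on top of the base image, here is roughly the kind of minimal app it runs; the /app/main.py path and the app variable name are assumptions on my part, and the image's README documents the exact layout it expects:

```python
# ./app/main.py -- a minimal Flask app to drop into the base image.
# The /app/main.py path and the "app" variable name are assumptions here;
# check the image's README for the exact conventions it expects.
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'Hello from uWSGI + Nginx + Flask in Docker'
```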
And about the "one process per container" debate, some say that this is one of the key misconceptions when you look at it from a microservices point of view: https://valdhaus.co/writings/docker-misconceptions/
As others here have said, officially it is recommended to have one process per container, see
https://docs.docker.com/articles/dockerfile_best-practices/
However, I think there is much debate about how much process isolation you need. One example that is closer to a full stack is the Phusion Passenger image (phusion/baseimage and phusion/passenger-docker), which bundles, amongst other things, Nginx, Ruby and Passenger. Some people hate this, others think there is a place for such images. Opinions about this particular image, and a linked article discussing it, can be found here: https://news.ycombinator.com/item?id=7258009. I think you can generalize a lot of what is said there to your case, and that the variety of arguments supports the variety of image types you have observed.
Personally, I think the full-stack vs single-process debate comes down to the requirements of what you are trying to achieve. If you worry about scalability, the single-process paradigm might be better for you. If you care about quickly bringing up a dev environment, it could be more straightforward to create or take a container that feels a bit more like a virtual machine.
As part of an effort to make the scikit-image examples gallery interactive, I would like to build a web service that receives a Python code snippet, executes it, and provides me with the generated output image.
For safety, the Python instances launched should be sandboxed and resource controlled, so I was thinking of using LXC containers.
Is this a good way to approach the problem? If so, what is the recommended way of launching one Python VM per request?
Stefan, perhaps "Docker" could be of use? I get the impression that you could constrain the VM that the application is run in -- an example web service:
http://docs.docker.io/en/latest/examples/python_web_app/
You could try running the application on Digital Ocean, like so:
https://www.digitalocean.com/community/articles/how-to-install-and-use-docker-getting-started
[disclaimer: I'm an engineer at Continuum working on Wakari]
Wakari Enterprise (http://enterprise.wakari.io) is aiming to do exactly this, and we're hoping to back-port the functionality into Wakari Cloud (http://wakari.io) so "published" IPython Notebooks can have some knobs on them for variable input control, then they can be "invoked" in a sandboxed state, and then the output given back to the user.
However for things that exist now, you should look at Sage Notebook. A few years ago several people worked hard on a Sage Notebook Cell Server that could do exactly what you were asking for: execute small code snippets. I haven't followed it since then, but it seems it is still alive and well from a quick search:
http://sagecell.sagemath.org/?q=ejwwif
http://sagecell.sagemath.org
http://www.sagemath.org/eval.html
For the last URL, check out Graphics->Mandelbrot and you can see that Sage already has some great capabilities for UI widgets that are tied to the "cell execution".
I think Docker is the way to go for this. The instances are very lightweight, and Docker is designed to spin up hundreds of instances at a time (start-up time is a fraction of a second, versus a couple of seconds for traditional VMs). Configured correctly, I believe it also gives you a completely sandboxed environment, so you don't have to worry about trying to sandbox Python itself :-D
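As a rough sketch (the image name, resource limits and timeout below are assumptions, not a vetted setup), spawning one throwaway container per snippet could look something like this:

```python
# Sketch: run an untrusted snippet in a throwaway container with no network
# access and capped resources. Image name, limits and timeout are assumptions.
import subprocess

def run_snippet(code: str, timeout: int = 10) -> str:
    result = subprocess.run(
        [
            'docker', 'run', '--rm',
            '--network', 'none',     # no network access for the snippet
            '--memory', '256m',      # cap memory
            '--cpus', '0.5',         # cap CPU
            'python:3',
            'python', '-c', code,
        ],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout

print(run_snippet("print(2 + 2)"))
```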
I'm not sure if you really have to go as far as setting up LXC containers:
There is seccomp-nurse, a Python sandbox that leverages the seccomp feature of the Linux kernel.
Another option would be to use PyPy, which has explicit support for sandboxing out of the box.
In any case, do not use pysandbox, it is broken by design and has severe security risks.
I have a web service to which users upload Python scripts that are run on a server. Those scripts process files that are on the server, and I want them to be able to see only a certain part of the server's filesystem (ideally, a temporary folder into which I copy the scripts and the files I want processed).
The server will ultimately be a linux based one but if a solution is also possible on Windows it would be nice to know how.
What I thought of is creating a user with restricted access to folders of the filesystem - ultimately only the folder containing the scripts and files - and launching the Python interpreter as this user.
Can someone suggest a better alternative? Relying only on this makes me feel insecure; I would like a real sandboxing or virtual filesystem feature where I could safely run untrusted code.
Either a chroot jail or a higher-order security mechanism such as SELinux can be used to restrict access to specific resources.
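As a very rough sketch of the chroot-plus-restricted-user idea (the paths, UID/GID and limits below are assumptions, and this alone is not a complete sandbox):

```python
# Sketch: launch the user's script in a chroot as an unprivileged user with
# resource limits. The launcher must run as root; paths, UID/GID and limits
# are assumptions, and this is not a substitute for a real sandbox.
import os
import resource
import subprocess

JAIL = '/srv/sandbox'        # contains a minimal Python install and the user files
UNPRIVILEGED_UID = 1001
UNPRIVILEGED_GID = 1001

def lock_down():
    # Runs in the child process just before exec.
    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))                     # 5 s of CPU time
    resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20))  # 256 MB address space
    os.chroot(JAIL)          # confine the process to the jail directory
    os.chdir('/')
    os.setgid(UNPRIVILEGED_GID)
    os.setuid(UNPRIVILEGED_UID)  # drop root last, after chroot

subprocess.run(['/usr/bin/python3', '/scripts/user_script.py'],
               preexec_fn=lock_down, timeout=30)
```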
You are probably best to use a virtual machine like VirtualBox or VMware (perhaps even creating one per user/session).
That will allow you some control over other resources such as memory and network, as well as disk.
The only python that I know of that has such features built in is the one on Google App Engine. That may be a workable alternative for you too.
This is inherently insecure software. By letting users upload scripts you are introducing a remote code execution vulnerability. You have more to worry about than just file modification: what's stopping the Python script from accessing the network or other resources?
To solve this problem you need to use a sandbox. To better harden the system you can use a layered security approach.
The first layer, and the most important one, is a Python sandbox. User-supplied scripts will be executed within that sandbox, which gives you the fine-grained limits that you need. Then the entire Python app should run within its own dedicated chroot. I highly recommend using the grsecurity kernel patches, which improve the strength of any chroot: a grsecurity chroot cannot be broken out of unless the attacker can rip a hole into kernel land, which is very difficult to do these days. Make sure your kernel is up to date.
The end result is that you are trying to limit the resources that an attacker's script has. Layers are a proven approach to security, as long as the layers are different enough that the same attack won't break both of them. You want to isolate the script from the rest of the system as much as possible. Any resources that are shared are also paths for an attacker.