I have a Python script that consumes an Azure queue, and I would like to scale this easily inside Azure infrastructure. I'm looking for the easiest solution possible to
run the Python script in an environment that is as managed as possible
have a centralized way to see the scripts running and their output, and easily scale the number of scripts running through a GUI or something similarly simple
I'm looking at Docker at the moment, but this seems very complicated for the extremely simple task I'm trying to achieve. What approaches are known to work for this? An added bonus would be if I could scale with respect to the number of items on the queue, but it would be fine if we could just control the amount of parallelism manually.
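For reference, the consumer is essentially a loop like the following (a minimal sketch using the azure-storage-queue package; the queue name and the environment variable holding the connection string are placeholders):

    import os
    import time

    from azure.storage.queue import QueueClient

    # The environment variable and queue name below are placeholders.
    queue = QueueClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"],
        queue_name="work-items",
    )

    while True:
        for msg in queue.receive_messages(messages_per_page=16, visibility_timeout=60):
            print("processing", msg.id)   # the real work happens here
            queue.delete_message(msg)     # remove the message once handled
        time.sleep(5)                     # back off briefly when the queue is empty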
You should have a look at Azure Web Apps, which also supports Python.
This would be a managed and scalable environment that also supports background tasks (WebJobs) with centralized logging.
Azure Web Apps also offers a free plan for development and testing.
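If you go the WebJobs route, a continuous WebJob can be as small as a run.py at the root of the zip you upload, and whatever it prints shows up in the WebJobs dashboard. A minimal sketch (the consume_one helper is hypothetical and would wrap your existing queue logic):

    # run.py - the entry point Kudu picks up for a continuous WebJob; anything
    # printed here shows up in the WebJobs log viewer in the portal.
    import sys
    import time

    from worker import consume_one  # hypothetical module wrapping your queue logic

    if __name__ == "__main__":
        print("worker started", flush=True)
        while True:
            handled = consume_one()   # process one message, return False if the queue is empty
            if not handled:
                time.sleep(5)         # back off briefly when there is nothing to do
            sys.stdout.flush()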
In my experience, CoreOS on Azure can satisfy your needs. See the documentation at https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-coreos-how-to/ to get started.
CoreOS is a Linux distribution built for running Docker containers, and you can access it remotely with an SSH client such as PuTTY. As for Docker itself, a basic tutorial will teach you enough simple usage to run Python scripts in containers.
Sounds to me like you are describing something like a microservices architecture. From that perspective, Docker is a great choice. I recommend you consider an orchestration framework such as Apache Mesos or Docker Swarm, which will let you run your containers on a cluster of VMs with the ability to easily scale, deploy new versions, roll back and implement load balancing. The schedulers Mesos supports (Marathon and Chronos) also have a web UI. I believe you could also implement the kind of triggered scaling you describe, but that will probably not be available off the shelf.
This does involve a bit of a learning curve, but I think it is worth it, especially once you start considering the complexities of deploying new versions (with possible rollbacks), monitoring failures and even integrating things like Jenkins and continuous delivery.
For Azure, an easy way to deploy and configure a Mesos or Swarm cluster is by using Azure Container Service (ACS) which does all the hard work of configuring the cluster for you. Find additional info here: https://azure.microsoft.com/en-us/documentation/articles/container-service-intro/
I am working on a project that allows users to upload a Python script to an API and run it on a schedule. Currently, I'm trying to figure out a way to limit the functionality of the script so that it cannot access local files, mess with the Flask server running the API, etc. Do you have any ideas on how I can achieve this? Is there any way to make it so that only specific libraries are available for importing?
Running other people's scripts on your server is a serious security issue. If you are trying to expose a Python interpreter from your web application, you can try something like judge0 (on GitHub). It is free if you deploy it yourself, and it runs scripts safely inside containers.
The simplest way is to ensure the user running the script is not root, but a user created specifically for this task (e.g. part of a group that can only read, not write or execute). This means that at a minimum you should ensure all files have the appropriate mode. Then you can run the script in a subprocess and read its output through a pipe.
Alternatively, you could use a runtime that's not "local", like a VM or a compute service (AWS Lambda, etc.). The latter would be simplest, and there are lots of vendors who offer compute services with a programmatic API.
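A rough sketch of the "dedicated unprivileged user plus a pipe" idea, assuming Python 3.9+ (for the user=/group= arguments) and a hypothetical low-privilege account named script-runner created for this purpose:

    import resource
    import subprocess

    def limit_resources():
        # Applied in the child just before exec: cap CPU time and memory.
        resource.setrlimit(resource.RLIMIT_CPU, (5, 5))                      # 5 seconds of CPU
        resource.setrlimit(resource.RLIMIT_AS, (256 * 2**20, 256 * 2**20))   # 256 MB address space

    # "script-runner" is a hypothetical account; user=/group= need Python 3.9+
    # and the parent process must have the privileges to switch users.
    result = subprocess.run(
        ["python3", "/srv/uploads/user_script.py"],   # hypothetical upload path
        user="script-runner",
        group="script-runner",
        preexec_fn=limit_resources,
        capture_output=True,
        text=True,
        timeout=30,                                    # wall-clock limit as a backstop
    )
    print(result.stdout)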
I am dockerizing a Python webapp using the https://hub.docker.com/r/tiangolo/uwsgi-nginx image, which uses supervisor to control the uWSGI instance.
My app actually requires an additional supervisor-mediated process to run (LibreOffice headless, with which I generate documents through the appy module), and I'm wondering what is the proper pattern to implement it.
The way I see it, I could extend the above image with the extra supervisor config for my needs (along with all the necessary OS-level install steps), but this would contradict the general principle of running the smallest number of distinct processes in a given container. However, since my Python app is designed to talk to LibreOffice only locally, I'm not sure how I could achieve this with a more containerized approach. Thanks for any help or suggestion.
The recommendation for one-process-per-container is sound - Docker only monitors the process it starts when the container runs, so if you have multiple processes they're not watched by Docker. It's also a better design - you have lightweight, focused containers with single responsibilities, and you can manage them independently.
user2105103 is right though, the image you're using already loses that benefit because it runs Python and Nginx, and you could extend it with LibreOffice headless and package your whole app without changing code.
If you move to a more "best practice" approach, you'd have a distributed app running across three containers in a Docker network:
nginx - the web proxy and public entry point to the app; it can handle routing, caching, SSL termination, rate limiting, etc.;
app - your Python app, only visible inside the Docker network; it receives requests from nginx and uses LibreOffice for document manipulation;
libreoffice - running in headless mode with its API exposed, but only reachable within the Docker network.
You'd need code changes for this, bringing in something like PyOO to use the LibreOffice API remotely from the app container.
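A rough idea of what that change could look like with PyOO; the service name, port and paths are placeholders for whatever you expose on the Docker network, and it assumes the libreoffice container starts soffice headless with a UNO socket open:

    import pyoo

    # "libreoffice" is the hypothetical service name on the Docker network and
    # 2002 a typical UNO port; the container would start soffice with something
    # like: soffice --headless --accept="socket,host=0.0.0.0,port=2002;urp;"
    desktop = pyoo.Desktop("libreoffice", 2002)

    doc = desktop.open_spreadsheet("/data/template.ods")  # path on an assumed shared volume
    try:
        sheet = doc.sheets[0]
        sheet[0, 0].value = "generated remotely"
        doc.save("/data/output.ods")
    finally:
        doc.close()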
You've already blown the "one process per container" rule -- just add another process. It's not a hard rule, nor even one that everybody agrees with.
Extend away, or better yet author your own custom container. That way you own it, you understand it, and it's optimized for your purpose.
As part of an effort to make the scikit-image examples gallery interactive, I would like to build a web service that receives a Python code snippet, executes it, and provides me with the generated output image.
For safety, the Python instances launched should be sandboxed and resource controlled, so I was thinking of using LXC containers.
Is this a good way to approach the problem? If so, what is the recommended way of launching one Python VM per request?
Stefan, perhaps "Docker" could be of use? I get the impression that you could constrain the VM that the application is run in -- an example web service:
http://docs.docker.io/en/latest/examples/python_web_app/
You could try running the application on Digital Ocean, like so:
https://www.digitalocean.com/community/articles/how-to-install-and-use-docker-getting-started
[disclaimer: I'm an engineer at Continuum working on Wakari]
Wakari Enterprise (http://enterprise.wakari.io) is aiming to do exactly this, and we're hoping to back-port the functionality into Wakari Cloud (http://wakari.io) so that "published" IPython Notebooks can have some knobs on them for variable input control; they can then be "invoked" in a sandboxed state and the output given back to the user.
However for things that exist now, you should look at Sage Notebook. A few years ago several people worked hard on a Sage Notebook Cell Server that could do exactly what you were asking for: execute small code snippets. I haven't followed it since then, but it seems it is still alive and well from a quick search:
http://sagecell.sagemath.org/?q=ejwwif
http://sagecell.sagemath.org
http://www.sagemath.org/eval.html
For the last URL, check out Graphics->Mandelbrot and you can see that Sage already has some great capabilities for UI widgets that are tied to the "cell execution".
I think Docker is the way to go for this. The instances are very lightweight, and Docker is designed to spawn hundreds of instances at a time (spin-up time is a fraction of a second versus a couple of seconds for traditional VMs). Configured correctly, I believe it also gives you a completely sandboxed environment, so you don't have to worry about sandboxing Python itself :-D
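One way to sketch the "one throwaway container per request" idea is with the Docker SDK for Python; the image name and limits below are illustrative choices, not requirements:

    import docker

    client = docker.from_env()

    def run_snippet(code):
        """Run a code snippet in a throwaway container and return its stdout."""
        output = client.containers.run(
            image="python:3.11-slim",   # illustrative image choice
            command=["python", "-c", code],
            network_disabled=True,      # no network access for the snippet
            mem_limit="256m",           # cap memory
            pids_limit=64,              # guard against fork bombs
            remove=True,                # delete the container when it exits
        )
        return output.decode()

    print(run_snippet("print(2 + 2)"))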
I'm not sure if you really have to go as far as setting up LXC containers:
There is seccomp-nurse, a Python sandbox that leverages the seccomp feature of the Linux kernel.
Another option would be to use PyPy, which has explicit support for sandboxing out of the box.
In any case, do not use pysandbox, it is broken by design and has severe security risks.
We want to use continuous deployment.
We have:
all sources (python) in a local RhodeCode (git) server.
Jenkins for automated testing
SSH connections to the production systems (linux).
a tool which can update servers in one command.
Now something like this should be implemented:
run tests with Jenkins
if there is a failure: stop and mail the developers
If all tests are OK:
deploy
We have been in the business long enough to write some scripts to do this ourselves.
My questions:
How do you update the version numbers? You could increment them, or you could use a timestamp ... (a small sketch of the timestamp approach follows after these questions)
Since we already use Jenkins, I think we do it in a script called by Jenkins. Any reason to do it with a different (better) tool?
My fear: Jenkins becomes a central server for things which are not related to testing (deployment). I think other tools like SaltStack or Ansible should be used for this. Up to now we have used Fabric (a simple layer on top of SSH). Maybe we should switch to a central management system before starting with continuous deployment.
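For the timestamp option, something as small as the following (a rough sketch a Jenkins job could call; the format is just an example) would be enough:

    import subprocess
    from datetime import datetime, timezone

    def build_version():
        """Return something like 20240131.142500+a1b2c3d."""
        stamp = datetime.now(timezone.utc).strftime("%Y%m%d.%H%M%S")
        commit = subprocess.check_output(
            ["git", "rev-parse", "--short", "HEAD"], text=True
        ).strip()
        return stamp + "+" + commit

    if __name__ == "__main__":
        print(build_version())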
Since we already use Jenkins, I think we do it in a script called by Jenkins. Any reason to do it with a different (better) tool?
To answer your question: No, there aren't any big reasons to not go with Jenkins for deployment.
Pros:
You already know Jenkins (and you probably know some of the quirks)
You don't need to introduce yet another technology
You said that you want to write scripts called by Jenkins, so you can switch easily to a different system later.
Cons:
there might be better tools out there for deployment
It does not integrate well with change control tools.
Additional Considerations:
Do not use the same server for prod deployment and continuous build/integration. These are two different tasks performed by two different roles. Therefore two different permission schemes might be employed.
Use permissions wisely. I use two different permission schemes for my deploy and CI servers. We have three Jenkins servers right now:
CI and deploys to uncontrolled environments (developers can play with these environments)
Deploys to controlled environments (QA environments and upwards)
Deploys to prod (yes, that's the only purpose in life of this server), with the most restrictive permission scheme.
Sandbox; actually, there is a fourth server for Jenkins admins to play with.
Store your deployable artifacts outside of Jenkins (and you do if I read your question correctly).
So depending on your existing infrastructure and procedures, you decide on the tooling. Jenkins won't lock you in as long as you keep as much of the logic as possible in scripts that are merely executed by Jenkins.
We are developing a distributed application in Python. Right now, we are about to re-organize some of our system components and deploy them on separate servers, so I'm looking to understand more about deployment for an application such as this. We will have several back-end code servers, several database servers (of different types) and possibly several front-end servers.
My question is this: what are good deployment patterns for distributed applications (in Python or in general)? How can I manage pushing code to several servers (whose IPs should be parameterized in the deployment system), pushing static files to several front ends, starting/stopping processes on the servers, etc.? We are looking for an easy-to-use solution if possible, but mostly something that, once set up, will get out of our way and let us deploy as painlessly as possible.
To clarify: we are aware that there is no one standard solution for this particular application, but this question is rather more geared towards a guide of best practices for different types / parts of deployment than a single, unified solution.
Thanks so much! Any suggestions regarding this or other deployment / architecture pointers will be very appreciated.
It all depends on your application.
You can:
use Puppet to deploy servers,
use Fabric to connect to the servers remotely and execute specific tasks (see the sketch after this list),
use pip for distributing Python modules (even non-public ones) and installing dependencies,
use other tools for specific tasks (such as boto to work with the Amazon Web Services APIs, e.g. to start a new instance).
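A small Fabric sketch of the "connect and execute specific tasks" part (Fabric 2.x style; the hosts, artifact path and service name are placeholders):

    from fabric import Connection

    # Hosts, artifact path and service name below are placeholders.
    APP_SERVERS = ["app1.example.com", "app2.example.com"]

    def deploy(version):
        for host in APP_SERVERS:
            with Connection(host) as conn:
                conn.put(f"dist/myapp-{version}.tar.gz", remote="/tmp/")
                conn.run(f"tar -xzf /tmp/myapp-{version}.tar.gz -C /srv/myapp")
                conn.sudo("systemctl restart myapp")

    deploy("1.2.3")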
It is not always that simple and you will most likely need something customized. Just take a look at your system: it is not so "standard", so do not expect it to be handled in a "standard" way.