We are developing a distributed application in Python. Right now, we are about to re-organize some of our system components and deploy them on separate servers, so I'm looking to understand more about deployment for an application such as this. We will have several back-end code servers, several database servers (of different types) and possibly several front-end servers.
My question is this: what / which are good deployment patterns for distributed applications (in Python or in general)? How can I manage pushing code to several servers (whose IP's should be parameterized in the deployment system), static files to several front ends, starting / stopping processes in the servers, etc.? We are looking for possibly an easy-to-use solution, but mostly, something that once set-up will get out of our way and let us deploy as painlessly as possible.
To clarify: we are aware that there is no one standard solution for this particular application, but this question is rather more geared towards a guide of best practices for different types / parts of deployment than a single, unified solution.
Thanks so much! Any suggestions regarding this or other deployment / architecture pointers will be very appreciated.
It all depends on your application.
You can:
use Puppet to deploy servers,
use Fabric to remotely connect to the servers and execute specific tasks,
use pip for distributing Python modules (even non-public ones) and install dependencies,
use other tools for specific tasks (such as use boto to work with Amazon Web Services APIs, eg. to start new instance),
It is not always that simple and you will most likely need something customized. Just take a look at your system: it is not so "standard", so do not expect it to be handled in a "standard" way.
Related
TL;DR: Any advice or resources on extracting code to reusable, well-structured and maintainable libraries?
I'm working on python applications in a microservice-style architecture, where we'll be developing and deploying a bunch of small applications, each solving a specific issues, maybe (or maybe not) by interacting with other applications/external services.
We just started moving to that microservice architecture, so we already have quite a bit of code in a monolithic project. As we're adding new microservices, it's obvious that we need to extract common code(e.g. utilities, base classes, ...) into libraries to avoid reimplementing or copy-pasting code that will then have to be maintained separately. As I'm trying to do that(which I've never really done before), I'm realizing it's not trivial and can become complicated pretty quickly, and I could spend some time overthinking it too.
So I'm looking for advices, or pointers to resources on best-practices related to this situation, i.e. writing well-structured python libraries, packaging and distributing libraries, sharing code in a microservice architecture and avoiding making mistakes that might put me in problematic situations, .
Concrete problems/challenges I'm facing:
* How best to group/separate code in version control. Like, one repository per package? The number of repositories can explode pretty quickly...
For shared libraries you can publish it to git in individual repositories and set them up to use Python package managers to install them in your project.
As far as application deployments, service dependencies, etc. I would advise for you to take a look at Docker for containerization, docker-compose for service dependencies locally, Artifactory or ECR for Docker image registries, and container orchestration platforms like Kubernetes.
Containers are similar to the virtual machines but at a more granular level, the process level. This effectively will allow you to run services together locally for testing and deploying them. It would no longer matter that each service is in a different repository.
If you don't have too many microservices, you could definitely use a mono-repo but if your engineering organization is large, its pretty costly to download all the updates for all the services. As an alternative, you may have your services that are divided in respective bounded contexts all live in a single repo to remove this deterrent. Long story short, it really depends what you will find beneficial. At the end of the day, the largest problems are never how many Git repositories you have, its how you define the bounds of your services, the service-to-service communication and infrastructure for deploying the services.
I have a Python script that consumes an Azure queue, and I would like to scale this easily inside Azure infrastructure. I'm looking for the easiest solution possible to
run the Python script in an environment that is as managed as possible
have a centralized way to see the scripts running and their output, and easily scale the amount of scripts running through a GUI or something very easy to use
I'm looking at Docker at the moment, but this seems very complicated for the extremely simple task I'm trying to achieve. What possible approaches are known to do this? An added bonus would be if I could scale wrt the amount of items on the queue, but it is fine if we'd just be able to manually control the amount of parallelism.
You should have a look at Azure Web Apps, which also support Python.
This would be a managed and scaleable environment and also supports background tasks (WebJobs) with a central logging.
Azure Web Apps also offer a free plan for development and testing.
Per my experience, I think CoreOS on Azure can satisfy your needs. You can try to refer to the doc https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-coreos-how-to/ to know how to get started.
CoreOS is a Linux distribution for running Docker as Linux container, that you can remote access via SSH client like putty. For using Docker, you can search the key words Docker tutorial via Bing to rapidly learning some simple usage that enough for running Python scripts.
Sounds to me like you are describing something like a micro-services architecture. From that perspective, Docker is a great choice. I recommend you consider using an orchestration framework such as Apache Mesos or Docker Swarm which will allow you to run your containers on a cluster of VMs with the ability to easily scale, deploy new versions, rollback and implement load balancing. The schedulers Mesos supports (Marathon and Chronos) also have a Web UI. I believe you can also implement some kind of triggered scaling like you describe but that will probably not be off the shelf.
This does seem like a bit of a learning curve but I think is worth it especially once you start considering the complexities of deploying new versions (with possible rollbacks), monitoring failures and even integrating things like Jenkins and continuous delivery.
For Azure, an easy way to deploy and configure a Mesos or Swarm cluster is by using Azure Container Service (ACS) which does all the hard work of configuring the cluster for you. Find additional info here: https://azure.microsoft.com/en-us/documentation/articles/container-service-intro/
We want to use continuous deployment.
We have:
all sources (python) in a local RhodeCode (git) server.
Jenkins for automated testing
SSH connections to the production systems (linux).
a tool which can update servers in one command.
Now something like this should be implemented:
run tests with Jenkins
if there is a failure. Stop, mail developers
If all tests are OK:
deploy
We are long enough in the business to write some scripts to do this.
My questions:
How to you update the version numbers? You could increment them, you could use a timestamp ...
Since we already use Jenkins, I think we do it in a script called by Jenkins. Any reason to do it with a different (better) tool?
My fear: Jenkins becomes a central server for things which are not related to testing (deploy). I think other tools like SaltStack or Ansible should be used for this. Up to now we use Fabric (simple layer above ssh). Maybe we should switch to a central management system before starting with continuous deployment.
Since we already use Jenkins, I think we do it in a script called by
Jenkins. Any reason to do it with a different (better) tool?
To answer your question: No, there aren't any big reasons to not go with Jenkins for deployment.
Pros:
You already know Jenkins (and you probably know some of the quirks)
You don't need to introduce yet another technology
You said that you want to write scripts called by Jenkins, so you can switch easily to a different system later.
Cons:
there might be better tools out there for deployment
Does not tie the best with Change Control tools.
Additional Considerations:
Do not use the same server for prod deployment and continuous build/integration. These are two different tasks performed by two different roles. Therefore two different permission schemes might be employed.
Use permissions wisely. I use two different permissions for my deploy and CI servers. We have 3 Jenkins servers right now.
CI and deploy to uncontrolled environments (Developers can play with these environments)
Deploy to controlled environments. (QA environemnts and upwards)
Deploy to prod (yes, that's the only purpose in live of this server.) with the most restrictive permission scheme.
sandbox, actually there is this forth server for Jenkins admins to play with.
Store your deployable artifacts outside of Jenkins (and you do if I read your question correctly).
So depending on your existing infrastructure and procedure you decide for the tooling. Jenkins won't log you in as long as you keep as much of the logic as possible in scripts that are only executed by Jenkins.
We're writing a web-based tool to configure our services provided by multiple servers. This includes interfaces configuration, dhcp configs etc. etc.
Having configs in database and views that generate proper output, how to send it/make it available for servers?
I'm thinking about sending it through scp and invoking reload command to services through ssh. I'm also thinking about using Func to do all the job, as this is Python tool and will seemingly integrate with python-based (django) config tool.
Any other proposals?
I tried using Puppet for config management, mostly because of all the buzz around it. Unfortunately, I discovered (too late) that the puppetmaster scales horribly, and does not handle heterogeneous environments well. It works for tens of servers, but its inherent architecture prevents scaling.
So I switched to Cfengine 3, which you barely notice any performance impact of, and scales much better because of its distributed architecture. Also, I later discovered that Puppet is just an attempt to reimplement Cfengine 2 inefficiently in Ruby. See http://verticalsysadmin.com/blog/uncategorized/relative-origins-of-cfengine-chef-and-puppet
If your setup is going to be used for something useful, not just play around with, go with Cfengine 3!
You can take a look at Fabric.
As an example, this is an adapted excerpt from one of my backup scripts that starts Mercurial server on remote host and pushes local changesets there:
from fabric.api import *
env.hosts = ['login#my.host.com']
def mybckp():
run('cd ~/somedir; hg serve -a 111.222.111.222 -d') # start mercurial server in daemon mode
local('hg push') # push local changesets
To execute it, I simply type:
fab mybckp
Basically, what Fabric offers is easy&convenient SSH access to shell of one more (remote) hosts, from inside of Python script.
I think you are looking for Puppet and Foreman to manage puppet (create groups of servers).
There are many ways to do this, including Chef, Bcfg2, Capistrano etc. Puppet has biggest "lead" now. There is definitely a learning curve, but the results are worth it.
You could keep your servers config files on the puppet master (in version control). And when you deploy the latest config files on the master, puppet clients can automatically pull them and restart services. Puppet "templates" can dynamically generate config files for each server.
Puppet has "Providers" for things like Packages(apt, yum), Files and OS awareness.
It really depends what you're intending to do, as the question is a little vague. The other answers cover the tools available; choosing one over the other comes down to purpose.
Are you intending to manage servers, and services on those servers? If so, try Puppet, CFEngine, or some other tool for managing server configurations.
Or, more specifically, are you looking for a deployment/buildout tool that talks to servers? So that you can type in something along the lines of "mytool deploy myproject", and have your project propagate to all the servers? In which case, fabric would be the tool to use.
Generally a good configuration will consist of both anyway... but for what it's worth, from the sound of it (managing DHCP/network/etc.), Puppet's the way to go.
I have an application for Tomcat which needs to offer/consume web services. Since Java web services are a nightmare (xml, code generation, etc.) compared with what is possible in Python, I would like to learn from your experience using jython instead of java for offerring/consuming web services.
What I have done so far involves adapting http://pywebsvcs.sourceforge.net/ to Jython. I still get errors (namespaces, types and so), although some of it is succesful for the simplest services.
I've put together more details on how to use webservices in jython using axis. Read about it here: How To Script Webservices with Jython and Axis.
PyServlet helps you configure Tomcat to serve up Jython scripts from a URL. You could use this is a "REST-like" way to do some basic web services without much effort. (It is also described here.)
We used a similar home grown framework to provide a variety of data services in a large multiple web application very successfully.