TL;DR: Any advice or resources on extracting code to reusable, well-structured and maintainable libraries?
I'm working on Python applications in a microservice-style architecture, where we'll be developing and deploying a bunch of small applications, each solving a specific issue, possibly by interacting with other applications or external services.
We just started moving to that microservice architecture, so we already have quite a bit of code in a monolithic project. As we add new microservices, it's obvious that we need to extract common code (e.g. utilities, base classes, ...) into libraries to avoid reimplementing or copy-pasting code that would then have to be maintained separately. As I try to do that (which I've never really done before), I'm realizing it's not trivial and can become complicated pretty quickly, and I could easily spend too much time overthinking it.
So I'm looking for advice, or pointers to resources on best practices related to this situation, i.e. writing well-structured Python libraries, packaging and distributing libraries, sharing code in a microservice architecture, and avoiding mistakes that might put me in problematic situations.
Concrete problems/challenges I'm facing:
* How best to group/separate code in version control. Like, one repository per package? The number of repositories can explode pretty quickly...
For shared libraries, you can publish them to Git in individual repositories and use a Python package manager (e.g. pip) to install them into your projects.
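For example (the repository name and tag here are hypothetical), a requirements.txt entry like this lets pip install a shared library straight from a Git repository:

```
# requirements.txt -- install a shared library directly from Git (names are placeholders)
git+https://github.com/your-org/your-shared-lib.git@1.2.0#egg=your-shared-lib
```

Tagging releases in the library repo and pinning each service to a tag keeps services from breaking when the library changes underneath them.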
As far as application deployments, service dependencies, etc. go, I would advise you to take a look at Docker for containerization, docker-compose for wiring up service dependencies locally, Artifactory or ECR for Docker image registries, and container orchestration platforms like Kubernetes.
Containers are similar to virtual machines, but at a more granular level: the process level. This effectively allows you to run services together locally for testing and to deploy them the same way. It no longer matters that each service lives in a different repository.
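As a rough sketch (the service names and images here are made up), a docker-compose.yml for running one of your services together with its local dependencies might look like this:

```yaml
# docker-compose.yml -- minimal local setup; service names and images are hypothetical
version: "3"
services:
  orders-service:
    build: ./orders          # each microservice has its own Dockerfile
    ports:
      - "8000:8000"
    depends_on:
      - redis
  redis:
    image: redis:6
```

Running `docker-compose up` builds and starts both containers, regardless of which repositories the code came from.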
If you don't have too many microservices, you could definitely use a mono-repo, but if your engineering organization is large, it's pretty costly to download all the updates for all the services. As an alternative, you could have the services that share a bounded context live together in a single repo to remove this deterrent. Long story short, it really depends on what you will find beneficial. At the end of the day, the largest problems are never how many Git repositories you have; it's how you define the bounds of your services, the service-to-service communication, and the infrastructure for deploying the services.
Related
I just came across Docker a couple days ago and have been doing some research, but one thing is still a bit unclear to me.
In a video I watched by the creator of Docker, he likened this utility to a shipping container so that you could guarantee that your stack works as intended once set up inside.
But I'm seeing a lot of container images which are just a single part of the stack, e.g. an Nginx image or a uWSGI image.
Basically I want to run a web server using Python, Flask, Nginx, and uWSGI. They're all part of the stack, so should they go in a single container, or should certain parts be in their own containers?
I'll have a MySQL server as well, and this seems more logical to run in its own container.
Apologies if this is a matter of opinion, but to me it feels like there is only one right way to go about this.
Docker containers are treated as deployment units - you package an application (or part of an application) and all its dependencies into a Docker container that can be deployed independently. Your application could be monolithic, in which case the entire application fits into one container that just exposes HTTP endpoints for the browser to access; or it could be composed of sub-components that can be independently deployed and managed - something like microservices - which, put together, form the complete application. In that case, each independent sub-component would reside in a container of its own. So the decision on how many containers, and how many processes within a container, depends on the composition of your application and the kind of scalability you want to achieve.
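To make that concrete for the stack in the question, here is a hedged sketch of one common split: the Flask app served by uWSGI in one container, Nginx in another, and MySQL in a third (file names and credentials below are placeholders):

```yaml
# docker-compose.yml -- one possible split of the Flask/uWSGI + Nginx + MySQL stack
version: "3"
services:
  web:
    build: .                 # Dockerfile that runs uWSGI with the Flask app
    expose:
      - "5000"
    depends_on:
      - db
  nginx:
    image: nginx:latest
    ports:
      - "80:80"
    volumes:
      - ./nginx.conf:/etc/nginx/conf.d/default.conf   # proxies requests to the web service
    depends_on:
      - web
  db:
    image: mysql:5.7
    environment:
      MYSQL_ROOT_PASSWORD: example
```

Whether Nginx and uWSGI share a container or get split like this is exactly the judgment call discussed below.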
Docker containers are meant to run single processes, but of course you can work around that by running process management tools like supervisord. I'm not very familiar with the Python stack you are talking about, but I can explain this in terms of a stack comprising Nginx + Node + Redis. I have elaborated on a sample Docker workflow with this stack in my blog post as well: http://anandmanisankar.com/posts/docker-container-nginx-node-redis-example/
In the example used in the blog post, Nginx, Node and Redis run in separate containers, for the following reasons:
* I want to be able to scale my Node application depending on the load, so it makes sense to run it in a separate container that I can scale independently.
* I run Redis in a separate container because it acts as a shared data store for my Node containers.
* To load balance my Node containers I run Nginx - again in a separate container - which can dynamically balance the load across the scaled-out Node containers. The load-balancing configuration can also be updated dynamically based on the state/availability/health of the Node containers. Ideally I would implement a service discovery mechanism which generates the Nginx configuration based on the availability of the containers. Scaling up would then just be a matter of adding containers, and fault tolerance (failure of some Node containers) would be handled automatically as well.
You can find the code behind this Docker workflow in my GitHub repo: https://github.com/msanand/docker-workflow
You could try to draw an analogy from this to any other web architecture stack. Hope this helps!
I think you might like this: I made a public (and open source) Docker image with all the bells and whistles that you can use to build a Python Flask web application.
It has uWSGI for running the application, Nginx to serve HTTP and Supervisord to control them, so you don't have to learn how to install and configure all those to build your Python Flask web app.
It seems like uWSGI with Nginx is one of the more robust (and best-performing) ways to deploy a Python web app. Here are the benchmarks: http://nichol.as/benchmark-of-python-web-servers.
There are even some template projects you can use to bootstrap your own. And also, you don't have to clone the full project or anything; you can just use it as a base image.
Docker Hub: https://hub.docker.com/r/tiangolo/uwsgi-nginx-flask/
GitHub: https://github.com/tiangolo/uwsgi-nginx-flask-docker
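For illustration, a project Dockerfile built on top of it can be as short as this (the /app/main.py layout shown is what the image's README describes; check the current docs for the exact convention, and the tag you want to pin):

```dockerfile
# Dockerfile -- build on the uwsgi-nginx-flask base image (tag is an example)
FROM tiangolo/uwsgi-nginx-flask:python3.8
# The image expects the Flask app at /app/main.py, exposing a variable named "app"
COPY ./app /app
```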
And about the "one process per container" debate, some say that this is one of the key misconceptions when you look at it from a microservices point of view: https://valdhaus.co/writings/docker-misconceptions/
As others here have said, officially it is recommended to have one process per container, see
https://docs.docker.com/articles/dockerfile_best-practices/
However, I think there is much debate about how much process isolation you need. One example that is closer to the full stack is the phusion passenger image (phusion/baseimage and phusion/passenger-docker)
that bundles, amongst other things, nginx, ruby and passenger. Some people hate this, others think there is a place for such images. Opinions expressed about this particular image and a linked article discussing it can be found here: https://news.ycombinator.com/item?id=7258009. I think that you can generalize a lot of what is said there to your case and that the variety of arguments supports the variety of image types you have observed.
Personally, I think the full-stack vs. single-process debate comes down to the requirements of what you are trying to achieve. If you worry about scalability, the single-process paradigm might be better for you. If you care about quickly bringing up a dev environment, it could be more straightforward to create/take a container that feels a bit more like a virtual machine.
I have some machines available in our office that I want to use to deploy Python scripts to.
My idea was to have one central machine that manages the deployment of the Python scripts, and to have each node communicate with that central machine to pick up new scripts, etc.
There is no dependency amongst the scripts, they can just run (scrape) and store the results locally.
I am not sure where to start with this... I have some ideas on writing apps that do this automatically, but I just cannot imagine I am the only one trying to do this.
A couple of things to note: you may not even need a management node. I have run calculations both with a management node and without one. Depending on your cluster, it may honestly be easier to just qsub a shell script. There is usually spare memory in a cluster for this.
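For reference, "qsub a shell script" on a PBS/Torque-style cluster looks roughly like this (job name, resource limits and script name are just examples):

```bash
# scrape_job.sh -- minimal PBS/Torque submission script (values are examples)
#PBS -N scrape_job
#PBS -l nodes=1:ppn=1,walltime=01:00:00
cd $PBS_O_WORKDIR          # run from the directory qsub was called in
python scraper.py
```

You submit it with `qsub scrape_job.sh` and the scheduler picks a free node to run it on.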
Here is a list of other alternatives to IPython, but Peter Sutton's recommendation for IPython is great.
http://mpi4py.scipy.org/docs/usrman/intro.html
If you're still looking for a long-term solution (or in case this answer can serve people who come here later), here is my suggestion. I am assuming you want a solution for long-term management, not just a one-time task (correct me if I got that wrong).
Use a configuration management tool. For your case - a single management node, merely deploying .py scripts - I'd recommend using Ansible.
With Ansible you can start with a controller node, make an inventory (a list of hosts/machines), and share the controller's public key with all the nodes/machines.
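As a rough sketch (the host group, paths and schedule are made up), a playbook that pushes a scraper script to every node and schedules it could look like this, run against an inventory file that lists your machines under a [scrapers] group:

```yaml
# deploy_scrapers.yml -- hypothetical playbook
# run with: ansible-playbook -i inventory.ini deploy_scrapers.yml
- hosts: scrapers
  become: yes
  tasks:
    - name: Copy the scraper script to the node
      copy:
        src: scraper.py
        dest: /opt/scrapers/scraper.py
        mode: "0755"

    - name: Run it hourly via cron
      cron:
        name: "run scraper"
        minute: "0"
        job: "/usr/bin/python3 /opt/scrapers/scraper.py"
```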
Here are a few very plain-English blog posts to get started with:
How to install Ansible.
Getting started with Ansible.
Disclaimer: I am the author of the above blog posts.
So I've been looking into ways to use Heroku for a small-scale personal project (Python Flask + MongoDB); however, I can't seem to find much information on how to do simple continuous integration testing or simple unit testing on a Heroku staging instance. I feel this is necessary to make sure everything will work in production before actually making it public.
There doesn't seem to be much information on how I could achieve this. There are a couple of CI add-ons that would help, but they currently work only with Ruby/RoR (tddium, Rails on Fire), and proper testing on Heroku seems like a problem that should already have been solved by a number of people. Buildpacks seem like a potential way to achieve what I need, but I'd rather use existing tools than re-invent the wheel myself.
So the question is, what are my options?
I wouldn't advise running your tests on Heroku, as the platform isn't designed for this. It will probably take you much longer to get the platform to work than to simply use another hosted service. There are lots of other alternatives (e.g. Codeship, where I am one of the founders).
At Codeship we are currently working on Python support which will be released soon. MongoDB (as well as lots of other tools) is integrated nicely and works out of the box. We are also focusing very strongly on helping you deploy often and integrate that nicely, so you can work on your app and not your infrastructure.
CircleCI has Python support! It also directly supports MongoDB. You'll be able to set it up very easily.
None of the hosted CI solutions, Circle included, run directly on Heroku. We (Circle - I'm a founder) have looked into it, but the way people write tests makes this awkward (they're really designed to be run on the same machine). Heroku is also very slow and memory-constrained, while the main goal of a CI system is to get results to you quickly.
We are developing a distributed application in Python. Right now, we are about to re-organize some of our system components and deploy them on separate servers, so I'm looking to understand more about deployment for an application such as this. We will have several back-end code servers, several database servers (of different types) and possibly several front-end servers.
My question is this: what are good deployment patterns for distributed applications (in Python or in general)? How can I manage pushing code to several servers (whose IPs should be parameterized in the deployment system), static files to several front ends, starting/stopping processes on the servers, etc.? We are looking for something that is ideally easy to use, but mostly something that, once set up, will get out of our way and let us deploy as painlessly as possible.
To clarify: we are aware that there is no one standard solution for this particular application, but this question is rather more geared towards a guide of best practices for different types / parts of deployment than a single, unified solution.
Thanks so much! Any suggestions regarding this or other deployment / architecture pointers will be very appreciated.
It all depends on your application.
You can:
* use Puppet to deploy servers,
* use Fabric to remotely connect to the servers and execute specific tasks (see the sketch after this list),
* use pip for distributing Python modules (even non-public ones) and installing dependencies,
* use other tools for specific tasks (such as boto to work with the Amazon Web Services APIs, e.g. to start a new instance).
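To give a feel for the Fabric option mentioned above, here is a minimal fabfile sketch (classic Fabric 1.x API; the host names, paths and restart command are placeholders):

```python
# fabfile.py -- minimal deployment sketch using the classic Fabric 1.x API
# (host names, paths and the restart command are placeholders)
from fabric.api import env, run, put, cd

env.hosts = ["app1.example.com", "app2.example.com"]
env.user = "deploy"

def deploy():
    """Upload a release archive, install dependencies and restart the service."""
    put("dist/myapp.tar.gz", "/tmp/myapp.tar.gz")
    with cd("/srv/myapp"):
        run("tar xzf /tmp/myapp.tar.gz")
        run("pip install -r requirements.txt")
        run("supervisorctl restart myapp")
```

Running `fab deploy` then executes the task on every host in env.hosts.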
It is not always that simple and you will most likely need something customized. Just take a look at your system: it is not so "standard", so do not expect it to be handled in a "standard" way.
I want to remove as much complexity as I can from administering Python on Amazon EC2, following some truly awful experiences with hosting providers who claim support for Python. I am looking for some guidance on which AMI to choose so that I have a stable and easily managed environment which already includes Python and, ideally, an Apache web server and a database.
I am agnostic to Python version, web server, DB and OS as I am still early enough in my development cycle that I can influence those choices. Cost is not a consideration (within bounds) so Windows will work fine if it means easy administration.
Anyone have any practical experience or recommendations they can share?
Try the Ubuntu EC2 images. Python 2.7 is installed by default. The rest you just apt-get install, then optionally create an image when the baseline is the way you want it (or just maintain a script that installs all the pieces and run it after you create the base Ubuntu instance).
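A provisioning script in the spirit of that suggestion might look like this (package names are for an older Ubuntu release; adjust to the Python version, web server and database you actually pick):

```bash
#!/usr/bin/env bash
# provision.sh -- hypothetical baseline setup for a fresh Ubuntu EC2 instance
set -e
sudo apt-get update
sudo apt-get install -y python-pip python-dev apache2 libapache2-mod-wsgi mysql-server
sudo pip install virtualenv
```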
If you can get by with using the Amazon provided ones, I'd recommend it. I tend to use ami-84db39ed.
Honestly though, if you plan on leaving this running all the time, you would probably save a bit of money by just going with a VPS. Amazon tends to be cheaper if you are turning the service on and off over time.