How to organize shared libraries with Docker and a monorepo in Python

What I have
I have two Python apps that share a few bits of code, enough that I am trying to isolate the shared parts into modules/packages/libraries (I'm keeping the term vague on purpose, as I am not sure what the solution is). All my code is in a monorepo, because I am hoping to avoid some of the annoyances of managing more repos than we have team members.
Currently my file layout looks like:
+ myproject
  + appA
  | + python backend A
  | + js frontend
  + appB
  | + B stuff
  + libs
    + lib1
    + lib2
Both appA and appB use lib1 and lib2 (they are essentially data models to abstract away the shared database). appA is a webapp with several components, not all of which are Python. It is deployed as a Docker stack that involves a bunch of containers.
I manage my dependencies with Poetry to ensure reproducible builds, etc. Each Python component (appA, appB...) has its own pyproject.toml file, virtualenv, etc.
appB is deployed separately.
All development is on Linux, if that makes any difference.
What I need
I am looking for a clean way to deal with the libs:
development for appA is done in a local docker-compose setup. The backend auto-reloads on file changes (via a Docker volume), and I would like the same to happen for changes in the libs too.
development for appB is simpler, but it is moving to Docker, so the problem will be the same.
What I've tried
My initial "solution" was to copy the libs folder over to a temporary location for development in appA. It works for imports, but it's messy as soon as I want to change the libs code (which is still quite often), as I need to change the original file, copy over, rebuild the container.
I tried symlinking the libs into the backend's Docker environment, but symlinks don't work well with Docker: the build does not follow the link, so the files don't end up in the image unless I essentially copy them into the build context, which defeats the purpose of the link.
I have tried packaging each lib into a proper Python package and installing them via poetry add ../../libs/lib1, which doesn't really work inside Docker because the relative paths don't match, and then I'm back to the symlink issue.
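For reference, the kind of pyproject.toml entry this produces looks roughly like the following (the exact form depends on the Poetry version; the develop flag is an assumption). The relative path is resolved against the pyproject.toml's location on the host, so once the file is copied into an image where ../../libs does not exist, the dependency can no longer be found:

# appA pyproject.toml, sketch (names assumed)
[tool.poetry.dependencies]
lib1 = { path = "../../libs/lib1", develop = true }
lib2 = { path = "../../libs/lib2", develop = true }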
I am sure there is a clean way to do this, but I can't figure it out. I know I could break the repo up into smaller ones and install the libs as regular dependencies, but development would still be painful inside Docker, as I would still need to rebuild the container each time I change the lib files, so I would rather keep the monorepo.

If you are using docker-compose anyway, you could use volumes to mount the local libs into your container, so you can edit them on the host system and in the container. Not super fancy, but that should work, right?
@ckaserer your suggestion does indeed seem to work. In short, in the Dockerfiles I do COPY ../libs/lib1 /app/lib1, and then for local development I mount ../libs/lib1 onto /app/lib1. That gives me the behavior I was looking for. I use a split docker-compose file for this. The setup causes a few issues with various tools that need some extra configuration to know that the libs are part of the code base, but nothing impossible. Thanks for the idea!
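For illustration, a minimal sketch of that split setup, assuming the images are built with the monorepo root as the build context and the compose files live there as well (base image, service and path names are assumptions):

# appA backend Dockerfile (sketch), built with the repo root as the context
# so the libs can be baked into the production image
FROM python:3.11-slim
WORKDIR /app
COPY appA/backend /app/backend
COPY libs/lib1 /app/lib1
COPY libs/lib2 /app/lib2

# docker-compose.override.yml (sketch), used only for local development:
# bind-mount the live sources over the copies baked into the image
services:
  backend:
    volumes:
      - ./appA/backend:/app/backend
      - ./libs/lib1:/app/lib1
      - ./libs/lib2:/app/lib2

The override file is only picked up locally (docker-compose merges it with the base file by default), so the production image still ships with the copied-in libs.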
So even though it's not an ideal solution, locally mounting over the app and lib directories works on Linux systems.
FYI: on Windows hosts you might run into trouble if you want to watch for file changes, as file change notifications are not propagated from a Windows host to a Linux container.

Related

Using a local python project with Poetry within a Dockerized project

I have a dockerized project that uses Poetry for dependency management. I'm developing a Python library that I'm using within that dockerized project and would like to be able to make a change to that library and then use it within the project, preferably without doing more than saving those changes.
Right now, this works:
Make a change
poetry build
Copy the *.tar.gz file into the dockerized project's root directory
Run docker-compose up --build
Since the dockerized project and the lib both live at the same level on my filesystem, I've tried changing the path field for the dependency to something like:
my-lib = {path="../my-lib", develop=true}
Poetry can't find it, so I added a COPY command to my Dockerfile. Docker didn't like that. I started looking for a workaround, but thought, "Maybe somebody knows a better way."
Is there something I'm missing?
Is there a better way to do what I'm trying to do?
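For what it's worth, COPY cannot reference files outside the build context, which is likely why Docker rejected it; for the COPY approach to work at all, the build context would have to move up a level so that both directories are inside it. A sketch, with all directory names assumed:

# docker-compose.yml in the dockerized project (sketch; names assumed)
# The build context is the parent directory, so both the project and
# ../my-lib are inside it and the Dockerfile can COPY the library.
services:
  app:
    build:
      context: ..
      dockerfile: dockerized-project/Dockerfile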

environment variables applied during elastic beanstalk deploy

My basic question: How would I set an environment variable that will be in effect during the Elastic Beanstalk deploy process?
I am not talking about setting environment variables during deployment that will be accessible by my application after it is deployed, I want to set environment variables that will modify a specific behavior of Elastic Beanstalk's build scripts.
To be clear - I generally think this is a bad idea, but it might be OK in this case so I am trying this out as an experiment. Here is some background about why I am looking into this, and why I think it might be OK:
I am in the process of transferring a server from AWS in the US to AWS in China, and am finding that server deploys fail between 50% ~ 100% of the time, depending on the day. This is a major pain during development, but I am primarily concerned about how I am going to make this work in production.
This is an Amazon Linux server running Python 2.7, and logs indicate that the failures are mainly Read Timeout Errors, with a few Connection Reset by Peers thrown in once in a while, all generated by pip install while attempting to download packages from pypi. To verify this I have ssh'd into my instances to manually install a few packages, and on a small sample size see similar failure rates. Note that this is pretty common when trying to access content on the other side of China's GFW.
So, I wrote a script that uses pip to download the packages to my local machine and then aws s3 sync to push them to an S3 bucket located in the same region as my server. This would eliminate the need to cross the GFW while deploying.
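A sketch of what such a script might look like (bucket name and paths are assumptions; pip download requires a reasonably recent pip):

#!/usr/bin/env bash
# Run on a machine with unrestricted access to PyPI, then sync the
# downloaded packages to an S3 bucket in the same region as the server.
pip download -r requirements.txt -d ./packages
aws s3 sync ./packages s3://my-bucket/packages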
My original plan was to add an .ebextensions config that aws s3 cp's the packages from S3 into the pip cache, but (unless I missed something) this somewhat surprisingly doesn't appear to be straightforward.
So, as plan B I am redirecting the packages into a local directory on the instance. This is working well, but I can't get pip install to pull packages from the local directory rather than downloading the packages from pypi.
Following the pip documentation, I expected that pointing the PIP_FIND_LINKS environment variable to my package directory would have pip "naturally" pull packages from my directory, rather than pypi. Which would make the change transparent to the EB build scripts, and why I thought that this might be a reasonable solution.
So far I have tried:
1) a command which exports PIP_FIND_LINKS=/path/to/package, with no luck. I assumed that this was due to the deploy step being called from a different session, so I then tried:
2) a command which (in addition to the previous export) appends export PIP_FIND_LINKS=/path/to/package to ~/.profile, in an attempt to have this apply to any new sessions.
I have tried issuing the commands as both ec2-user and root, and neither works.
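For context, attempt 1) was roughly an .ebextensions command like the one below (names assumed). Each command runs in its own shell, so the exported variable is gone by the time the deployment's pip install step runs, which matches the "different session" suspicion above:

commands:
  01_set_pip_find_links:
    command: export PIP_FIND_LINKS=/path/to/package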
Rather than keep poking a stick at this, I was hoping that someone with a bit more experience with the nuances of EB, pip, etc might be able to provide some guidance.
After some thought I decided that a pip config file should be a more reliable solution than environment variables.
This turned out to be easy to implement with .ebextensions. I first create the download script, then create the config file directly in the virtualenv folder:
files:
  /home/ec2-user/download_packages.sh:
    mode: "000500"
    owner: root
    group: root
    content: |
      #!/usr/bin/env bash
      package_dir=/path/to/packages
      mkdir -p $package_dir
      aws s3 sync s3://bucket/packages $package_dir
  /opt/python/run/venv/pip.conf:
    mode: "000755"
    owner: root
    group: root
    content: |
      [install]
      find-links = file:///path/to/packages
      no-index=false
Finally, a command is used to call the script that we just created:
commands:
  03_download_packages:
    command: bash /home/ec2-user/download_packages.sh
One potential issue is that pip bypasses the local package directory for packages that are installed directly from our private git repo, so there is still some potential for timeout errors, but those represent just a small fraction of the packages that need to be installed, so it should be workable.
Still unsure if this will be a long-term solution, but it is very simple and (after just one day of testing...) failure rates have fallen from 50% ~ 100% to 0%.

How to deploy a Django/Tornado based web app that's built with platter?

This question is mostly about the technical details + some best practices of how to efficiently deploy a python web app that's built using platter.
Taking Django for instance, I have a project that's already built into a tarball distribution. This includes all wheels of all deps + the package of the app itself.
My repo directory also contains some other files that need to be distributed with the deployed code, such as: manage.py, a fabfile package with fabric utils, and some configuration files (for supervisor, nginx, etc).
So my questions are:
How can I wrap these extra files into the distribution that contains the project?
If I simply use git to clone/pull the project on the server I have these files, but then I have a duplicate of the source code, both checked out in the project and zipped in the tarball. How can I avoid that? By committing the tarball into a separate repo?
Perhaps the duplication is not so bad, and I'll just end up with multiple tarballs in my dist/ directory with only one symlinked as the current one, from which I deploy?
Same goes for a Tornado based app.
My first rule of deployment is "whatever works". Every production environment has different requirements. But to give opinions on your questions:
Not everything should be in your Python project. Perhaps there is a way to do it, but I think it's using the wrong hammer.
You can create a separate Git repo that holds the configuration and asset files for your production deployment (this does not even need to be managed by Git if you don't care about keeping old, irrelevant configuration files). This does not have to be a Python project, just the files for the production deployment. You may optionally put a Python script or two in here (or just a README.txt or fab files or a Buildout config) to automate tasks such as unpacking your platter or copying config files around.
It's tempting (and possible) to put production config things in your main Git repo. This is even suggested by apps that create boilerplate files for development and production configuration. This doesn't mean it's the best way to do things though.
My rule is that the main Git repo is "development only". It's cloned by developers who are setting up and working in development environments. It stretches a Python project far too much to try to be a Python application and also a place from which to manage a production system, IMHO.
Production is managed separately. Sometimes by people different from the developers or at least the developer is wearing a different hat when thinking about a production deployment. This way you can also have a small, clean repo that tracks just changes to your production system.
Playing with symlinks within a single deployment that represents different builds is an extra layer of confusion. And the impetus to do so comes from trying to do everything from a single Python project.
Deploy your Python application to something like /var/myapp/build-2015-10-29/. Then create a symlink at /var/myapp/current/ that points to this location. This way you can create a full deployment at /var/myapp/build-2015-11-05/, tweak the config to start on a separate port, bring the app up and ensure everything works, then just switch the symlink from the old build to the new build with minimal downtime.
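A minimal sketch of the switch-over described above, using the paths from the answer:

# after the new build has been verified on its own port,
# repoint the "current" symlink from the old build to the new one
ln -sfn /var/myapp/build-2015-11-05 /var/myapp/current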

Django Dev/Prod Deployment using Mercurial

I have a development and production Django setup that I can't figure out a clean, simple way to deploy. Here's the setup:
/srv/www/projectprod contains my production code, served at www.domain.com
/srv/www/projectbeta contains my development code, served at www.dev.domain.com
Prod and Dev are also split into two different virtualenvs, to isolate their various Python packages, just in case.
What I want to do here is to make a bunch of changes in dev, then push to my Mercurial server, and then re-pull those changes in production when stable. But there are a few things making this complicated:
wsgi.py contains the activate_this.py call for the virtualenv, but the path is scoped to either prod or dev, so that needs to be edited before deployment.
manage.py has a shebang at the top to define the correct python path for the virtualenv. (This is currently #!/srv/ve/.virtualenvs/project-1.2/bin/python so I'm wondering if I can just remove this to simplify things)
settings.py contains paths to the templates, staticfiles, media root, etc. which are all stored under /srv/www/project[prod|dev]/*
I've looked into Fabric, but I don't see anything in it that would re-write these files for me prior to doing the mercurial push/pull.
Does anyone have any tips for simplifying this, or a way to automate this deployment?
Two branches, one per environment (with the env-specific changes in each, and thus an additional merge before each deploy)
or
the MQ extension: keep "clean" code in the changesets and maintain an MQ patch for each environment on top of a single branch (taking care with applying and unapplying the patches).

How to re-use a reusable app in Django

I am trying to create my first site in Django and as I'm looking for example apps out there to draw inspiration from, I constantly stumble upon a term called "reusable apps".
I understand the concept of an app that is reusable easily enough, but the means of reusing an app in Django are quite lost on me. A few questions that are bugging me in this whole business are:
What is the preferred way to re-use an existing Django app? Where do I put it and how do I reference it?
From what I understand, the recommendation is to put it on your "PYTHONPATH", but that breaks as soon as I need to deploy my app to a remote location that I have limited access to (e.g. on a hosting service).
So, if I develop my site on my local computer and intend to deploy it on an ISP where I only have ftp access, how do I re-use 3rd party Django apps so that if I deploy my site, the site keeps working (e.g. the only thing I can count on is that the service provider has Python 2.5 and Django 1.x installed)?
How do I organize my Django project so that I could easily deploy it along with all of the reusable apps I want to use?
In general, the only thing required to use a reusable app is to make sure it's on sys.path, so that you can import it from Python code. In most cases (if the author follows best practice), the reusable app tarball or bundle will contain a top-level directory with docs, a README, a setup.py, and then a subdirectory containing the actual app (see django-voting for an example; the app itself is in the "voting" subdirectory). This subdirectory is what needs to be placed in your Python path. Possible methods for doing that include:
running pip install appname, if the app has been uploaded to PyPI (these days most are)
installing the app with setup.py install (this has the same result as pip install appname, but requires that you first download and unpack the code yourself; pip will do that for you)
manually symlinking the code directory to your Python site-packages directory
using software like virtualenv to create a "virtual Python environment" that has its own site-packages directory, and then running setup.py install or pip install appname with that virtualenv active, or placing or symlinking the app in the virtualenv's site-packages (highly recommended over all the "global installation" options, if you value your future sanity)
placing the application in some directory where you intend to place various apps, and then adding that directory to the PYTHONPATH environment variable
You'll know you've got it in the right place if you can fire up a Python interpreter and "import voting" (for example) without getting an ImportError.
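For example, a quick check from the shell (using the django-voting app mentioned above):

python -c "import voting; print(voting.__file__)"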
On a server where you have FTP access only, your only option is really the last one, and they have to set it up for you. If they claim to support Django they must provide some place where you can upload packages and they will be available for importing in Python. Without knowing details of your webhost, it's impossible to say how they structure that for you.
An old question, but here's what I do:
If you're using a version control system (VCS), I suggest putting all of the reusable apps and libraries (including Django) that your software needs into the VCS. If you don't want to put them directly under your project root, you can modify settings.py to add their location to sys.path.
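A minimal sketch of that settings.py tweak, assuming the bundled apps live in a vendor/ directory next to it (the directory name is an assumption):

# settings.py
import os
import sys

# Make the bundled reusable apps importable without a separate install step.
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "vendor"))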
After that deployment is as simple as cloning or checking out the VCS repository to wherever you want to use it.
This has two added benefits:
No version mismatches: your software always uses the version that you tested it with, not whatever version happened to be available at deployment time.
If multiple people work on the project, nobody else has to deal with installing the dependencies.
When it's time to update a component's version, update it in your VCS and then propagate the update to your deployments through the VCS.
