Just curious how people are deploying their Django projects in combination with virtualenv
More specifically, how do you keep your production virtualenvs synced correctly with your development machine?
I use git for SCM, but I don't have my virtualenv inside the git repo. Should I? Or is it best to use pip freeze and then re-create the environment on the server from the freeze output? (If you do this, could you please describe the steps? I'm finding very little good documentation on the unfreezing process; is something like pip install -r freeze_output.txt possible?)
I just set something like this up at work using pip, Fabric and git. The flow is basically like this, and borrows heavily from this script:
In our source tree, we maintain a requirements.txt file. We'll maintain this manually.
When we do a new release, the Fabric script creates an archive based on whatever treeish we pass it.
Fabric will find the SHA for what we're deploying with git log -1 --format=format:%h TREEISH. That gives us SHA_OF_THE_RELEASE
Fabric will get the last SHA for our requirements file with git log -1 --format=format:%h SHA_OF_THE_RELEASE requirements.txt. This spits out the short version of the hash, like 1d02afc, which is the SHA for that file in this particular release.
The Fabric script will then look into a directory where our virtualenvs are stored on the remote host(s).
If there is not a directory named 1d02afc, a new virtualenv is created and setup with pip install -E /path/to/venv/1d02afc -r /path/to/requirements.txt
If there is an existing path/to/venv/1d02afc, nothing is done
The little magic part of this is passing whatever tree-ish you want to git, and having it do the packaging (from Fabric). By using git archive my-branch, git archive 1d02afc or whatever else, I'm guaranteed to get the right packages installed on my remote machines.
I went this route since I really didn't want extra virtualenvs floating around if the packages hadn't changed between releases. I also don't like the idea of having the actual packages I depend on in my own source tree.
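The reuse check at the heart of this flow is small; here's a rough sketch in Python (directory names are assumptions, and the real flow gets req_sha from git before calling this):

```python
import os

def venv_path_for(venv_root, req_sha):
    """Path of the virtualenv keyed by the requirements.txt SHA."""
    return os.path.join(venv_root, req_sha)

def needs_new_venv(venv_root, req_sha):
    """True when no virtualenv exists yet for this requirements SHA,
    i.e. the requirements changed since the last deployed release."""
    return not os.path.isdir(venv_path_for(venv_root, req_sha))
```

If needs_new_venv returns True, the Fabric task builds the env and installs requirements into it; otherwise it just points the release at the existing one.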
I use this bootstrap.py: http://github.com/ccnmtl/ccnmtldjango/blob/master/ccnmtldjango/template/bootstrap.py
which expects a directory called 'requirements' that looks something like this: http://github.com/ccnmtl/ccnmtldjango/tree/master/ccnmtldjango/template/requirements/
There's an apps.txt, a libs.txt (which apps.txt includes; I just like to keep Django apps separate from other Python modules) and a src directory which contains the actual tarballs.
When ./bootstrap.py is run, it creates the virtualenv (wiping a previous one if it exists) and installs everything from requirements/apps.txt into it. I never install anything into the virtualenv otherwise. If I want to include a new library, I put the tarball into requirements/src/, add a line to one of the text files, and re-run ./bootstrap.py.
bootstrap.py and requirements get checked into version control (along with a copy of pip.py, so I don't even have to have that installed system-wide anywhere). The virtualenv itself isn't. The scripts I have that push out to production run ./bootstrap.py on the production server each time I push. (bootstrap.py also goes to some lengths to ensure that it sticks to Python 2.5, since that's what we have on the production servers (Ubuntu Hardy); my dev machine (Ubuntu Karmic) defaults to Python 2.6 if you're not careful.)
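This is not the actual bootstrap.py, just a sketch of what a run like the one described above boils down to, expressed as the command list such a script would execute in order (the venv and requirements paths are assumptions):

```python
import sys

def bootstrap_commands(venv_dir="ve", requirements="requirements/apps.txt"):
    """Commands a bootstrap run would execute, in order."""
    return [
        ["rm", "-rf", venv_dir],                         # wipe any previous env
        [sys.executable, "-m", "virtualenv", venv_dir],  # create a fresh one
        [venv_dir + "/bin/pip", "install", "-r", requirements],
    ]
```

A real bootstrap script would run each of these with subprocess.check_call, plus the interpreter-version checks mentioned above.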
Related
There is a similar question from last year, but I don't think the responses are widely applicable, and it has no accepted answer.
Edit: this is in the context of developing small jobs that will only be run in docker in-house; I'm not talking about sharing work with anyone outside a small team, or about projects getting heavy re-use.
What advantage do you see in using requirements.txt to install, instead of pip install commands in the Dockerfile? I see one: your Dockerfiles for various projects are more cookie-cutter.
I'm not even thinking of the use of setup envisioned in the question I linked.
What downside is there to naming the packages in Dockerfile:
RUN pip install --target=/build django==3.0.1 Jinja2==2.11.1 . . .
EDIT 2: #superstormer asked "what are the upsides to putting it in the Dockerfile?" Fair question. I read co-workers' Dockerfiles in GitLab and have to navigate to the requirements file; I don't have it locally in an editor. EDIT 3: Note to self: so clone it and look at it in an editor.
First consider going with the flow of the tools:
To manually install those packages, inside or outside a Docker Container, or to test that it works without building a new Docker Image, do pip install -r requirements.txt. You won't have to copy/paste the list of packages.
To "freeze" on specific versions of the packages to make builds more repeatable, pip freeze > requirements.txt will create (or overwrite) that requirements.txt file for you.
PyCharm will look for a requirements.txt file, let you know if your currently installed packages don't match that specification, help you fix that, show you if updated packages are available, and help you update.
Presumably other modern IDEs do the same, but if you're developing in plain text editors, you can still run a script like this to check the installed packages (this is also handy in a git post-checkout hook):
echo -e "\nRequirements diff (requirements.txt vs current pips):"
diff --ignore-case <(sed 's/ *#.*//;s/^ *--.*//;/^$/d' requirements.txt | sort --ignore-case) \
<(pip freeze 2>/dev/null | sort --ignore-case) -yB --suppress-common-lines
Hopefully this makes it clearer that requirements.txt declares the required packages and, usually, the package versions. It's more modular and reusable to keep it separate than to embed it inside a Dockerfile.
It's a question of single responsibility.
Dockerfile's job is to package an application up to be built as an image. That is: it should describe every step needed to turn an application into a container image.
requirements.txt's job is to list every dependency of a Python application, regardless of its deployment strategy. Many Python workflows expect a requirements.txt and know how to add new dependencies while updating that requirements.txt file. Many other workflows can at least interoperate with requirements.txt. None of them know how to auto-populate a Dockerfile.
In short, the application is not complete if it does not include a requirements.txt. Including that information in the Dockerfile is like writing documentation that teaches your operations folks how to pull and install every individual dependency while deploying the application, rather than including it in a dependency manager that packages into the binary you deliver to ops.
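With that separation, the Dockerfile reduces to a short, cookie-cutter recipe. A minimal sketch (the base image, paths, and run command here are assumptions, not from the question):

```dockerfile
FROM python:3.8-slim
WORKDIR /app
# Copy requirements first so this layer is cached until dependencies change
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "manage.py", "runserver", "0.0.0.0:8000"]
```

Copying requirements.txt before the rest of the source also buys you Docker layer caching: the pip install layer is rebuilt only when the dependency list changes, not on every code edit.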
I apologize if this is not the correct site for this. If it is not, please let me know.
Here's some background on what I am attempting. We are working on a series of chat bots that will go into production. Each of them will run in an environment in Anaconda. However, our setup uses TensorFlow, which needs gcc to compile, and compliance has banned compilers from production. In addition, compliance rules also frown on us using pip or conda install in production.
As a way to get around this, I'm trying to tar the Anaconda 3 folder and move it into prod with all dependencies already compiled and installed. However, the accounts between environments have different names, so this requires me to go into the bin folder (at the very least; I'm sure I will need to change them in the lib and pkgs folders as well) and use sed -i to rename the hard-coded paths from /home/<dev account>/anaconda to /home/<prod account>/anaconda. While this seems to work, it's also a good way to mangle my installation.
My questions are as follows:
Is there any good way to transfer anaconda from one user to another, without having to use sed -i on these paths? I've already read that Anaconda itself does not support this, but I would like your input.
Is there any way for me to install Anaconda in dev so the scripts in it are either hard-coded to use the production account name in their paths, or use ~?
If I must continue to use sed, is there anything critical I should be aware of? For example, when I use grep <dev account> *, I see some files listed as binary file matches. Do I need to do anything special to change these?
And once again, I am well aware that I should just create a new Anaconda installation on the production machine, but that is simply not an option.
Edit:
So far, I've changed the conda.sh and conda.csh files in /etc, as well as the conda, activate, and deactivate files in the root bin. As such, I'm able to activate and deactivate my environment on the new user account. Also, I've changed the files in the bin folder under the bot environment. Right now, I'm trying to train the bot to test if this works, but it keeps failing and stating that a custom action does not exist in the list. I don't think that is related to this, though.
Edit2:
I've confirmed that the error I was getting was not related to this. In order to get the bot to work properly with a ported version of Anaconda, all I had to change was the conda.sh and conda.csh files in /etc so their paths to python use ~, do the same for the activate and deactivate files in /bin, and change the shebang line in the conda file in /bin to use the actual account name. This leaves every other file in /bin and lib still using the old account name in their shebang lines and other variables that use the path, and yet the bots work as expected. By all rights, I don't think this should work, but it does.
Anaconda is touchy about path names. They're obviously inserted into scripts, but they may be inserted into binaries as well. Some approaches that come to mind are:
Use Docker images in production. When building the image:
Install compilers as needed.
Build your stuff.
Uninstall the compilers and other stuff not needed at runtime.
Squash the image into a single layer.
This makes sure that the uninstalled stuff is actually gone.
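One way to sketch those steps is a multi-stage build, which gets the same effect as squashing because only what you COPY into the final stage survives (the image names and package list here are assumptions):

```dockerfile
# Build stage: compilers exist only in this throwaway image
FROM continuumio/miniconda3 AS build
RUN apt-get update && apt-get install -y gcc
# Build the env at the same path it will occupy at runtime
RUN conda create -y -p /opt/env python=3.7 tensorflow

# Runtime stage: copy the finished environment, no compilers included
FROM debian:buster-slim
COPY --from=build /opt/env /opt/env
ENV PATH=/opt/env/bin:$PATH
```

Because the env is created and copied at the identical path /opt/env, no path rewriting is needed at all.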
Install Anaconda into the directory /home/<prod account>/anaconda on the development or build systems as well. Even though the accounts are different, there should be a way to create a user-writeable directory in the same location.
Even better: install Anaconda into a directory /opt/anaconda in all environments, or some other directory that does not contain a username.
If you cannot get a directory outside of the user home, negotiate for a symlink (or, on Windows, a junction via mklink.exe /d or /j) at a fixed path /opt/anaconda that points into the user home.
If necessary, play it from the QA angle: Differing directory paths in production, as compared to all other environments, introduce a risk for bugs that can only be detected and reproduced in production. The QA or operations team should mandate that all applications use fixed paths everywhere, rather than make an exception for yours ;-)
Build inside a Docker container using the directory /home/<prod account>/anaconda, then export an archive and run on the production system without Docker.
It's generally a good idea to build inside a reproducible Docker environment, even if you can get a fixed path without an account name in it.
Bundle your whole application as a pre-compiled Anaconda package, so that it can be installed without compilers.
That doesn't really address your problem though, because even conda install is frowned upon in production. But it could simplify building Docker images without squashing.
I've been building Anaconda environments inside Docker and running them on bare metal in production, too. But we've always made sure that the paths are identical across environments. I found mangling the paths too scary to even try. Life has become much simpler when we switched to Docker images everywhere. But if you have to keep using sed... Good Luck :-)
This is probably what you need : pip2pi.
This only works for pip compatible packages.
As I understand it, you need to move your whole setup, with everything already compiled, as a .tar.gz file. Here are a few things you could try:
Create a requirements.txt. These packages can help:
a. pipreqs
$ pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt
b. snakefood.
Then, install pip2pi
$ pip install pip2pi
$ pip2tgz packages/ foo==1.2
...
$ ls packages/
foo-1.2.tar.gz
bar-0.8.tar.gz
pip2tgz passes package arguments directly to pip, so packages can be specified in any format that pip recognises:
$ cat requirements.txt
foo==1.2
http://example.com/baz-0.3.tar.gz
$ pip2tgz packages/ -r requirements.txt bam-2.3/
...
$ ls packages/
foo-1.2.tar.gz
bar-0.8.tar.gz
baz-0.3.tar.gz
bam-2.3.tar.gz
After getting all the .tar.gz files, they can be turned into a PyPI-compatible "simple" package index using the dir2pi command:
$ ls packages/
bar-0.8.tar.gz
baz-0.3.tar.gz
foo-1.2.tar.gz
$ dir2pi packages/
$ find packages/
packages/
packages/bar-0.8.tar.gz
packages/baz-0.3.tar.gz
packages/foo-1.2.tar.gz
packages/simple
packages/simple/bar
packages/simple/bar/bar-0.8.tar.gz
packages/simple/baz
packages/simple/baz/baz-0.3.tar.gz
packages/simple/foo
packages/simple/foo/foo-1.2.tar.gz
but they may be inserted into binaries as well
I can confirm that some packages hard-code the absolute path (including the username) into the compiled binary. But if you restrict usernames to have the same length, you can apply sed on both binary and text files and make almost everything work perfectly.
On the other hand, if you copy the entire folder and use sed to replace usernames in only the text files, you can run most of the installed packages. However, operations involving run-time compilation might fail; one example is installing a new package that requires compilation during installation.
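The same-length trick above can be sketched in a few lines of Python. Because the old and new account names have equal length, a byte-level replace is safe even inside compiled binaries (string offsets and lengths are unchanged); the function names here are made up for illustration:

```python
def patch_prefix(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace a hard-coded path fragment, refusing length changes
    that would corrupt offsets inside a binary file."""
    if len(old) != len(new):
        raise ValueError("old and new must be the same length")
    return data.replace(old, new)
```

Applied to each file's raw bytes, this is equivalent to running sed over both text and binary files, but with the length invariant enforced.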
Using GitHub's .gitignore, I was able to filter out some files and directories. However, there are a few things that left me a little bit confused:
GitHub's .gitignore did not include the /bin and /share directories created by venv. I assume they should be ignored by git anyway, since the user is meant to build the virtual environment themselves.
Pip generated a pip-selfcheck.json file, which seemed mostly like clutter. I assume it usually does this, and I just haven't seen the file before because it's been placed with my global pip.
pyvenv.cfg is what I really can't make any sense of, though. On one hand, it specifies the Python version, which ought to be needed by others who want to use the project. On the other hand, it also specifies home = /usr/bin, which, while probably correct on a lot of Linux distributions, won't necessarily apply to all systems.
Are there any other files/directories I missed? Are there any stricter guidelines for how to structure a project and what to include?
Although venv is a very useful tool, you should not assume (unless you have good reason to do so) that everyone who looks at your repository uses it. Avoid committing any files used only by venv; these are not strictly necessary to be able to run your code and they are confusing to people who don't use venv.
The only configuration file you need to include in your repository is the requirements.txt file generated by pip freeze > requirements.txt which lists package dependencies. You can then add a note in your readme instructing users to install these dependencies with the command pip install -r requirements.txt. It would also be a good idea to specify the required version of Python in your readme.
I have a number of python "script suites" (as I call them) which I would like to make easy-to-install for my colleagues. I have looked into pip, and that seems really nice, but in that regimen (as I understand it) I would have to submit a static version and update them on every change.
As it happens I am going to be adding and changing a lot of stuff in my script suites along the way, and whenever someone installs it, I would like them to get the newest version. With pip, that means that on every commit to my repository, I will also have to re-submit a package to the PyPI index. That's a lot of unnecessary work.
Is there any way to provide an easy cross-platform installation (via pip or otherwise) which pulls the files directly from my github repo?
I'm not sure if I understand your problem entirely, but you might want to use pip's editable installs[1]
Here's a brief example. In this artificial example, let's suppose you want to use git as your VCS.
git clone url_to_myrepo.git path/to/local_repository
pip install [--user] -e path/to/local_repository
The installation of the package will reflect the state of your local repository. Therefore there is no need to reinstall the package with pip when the remote repository gets updated. Whenever you pull changes to your local repository, the installation will be up-to-date as well.
[1] http://pip.readthedocs.org/en/latest/reference/pip_install.html#editable-installs
I have a Django project that uses a lot of 3rd party apps, so I want to decide between two approaches to manage my situation:
I can use [ virtualenv + pip ] along with pip freeze as requirements file to manage my project dependencies.
I don't have to worry about the apps, but can't have that committed with my code to svn.
I can have a lib folder in my svn structure and have my apps sit there and add that to sys.path
This way, my dependencies can be committed to svn, but I have to manage sys.path
Which way should I proceed ?
What are the pros and cons of each approach ?
Update:
Method 1 disadvantage: difficult to work with App Engine.
This has been an unanswered question (at least for me) so far. There has been some discussion on this recently:
https://plus.google.com/u/0/104537541227697934010/posts/a4kJ9e1UUqE
Ian Bicking said this in the comment:-
I do think we can do something that incorporates both systems. I
posted a recipe for handling this earlier, for instance (I suppose I
should clean it up and repost it). You can handle libraries in a very
similar way in Python, while still using the tools we have to manage
those libraries. You can do that, but it's not at all obvious how to
do that, so people tend to rely on things like reinstalling packages
at deploy time.
http://tarekziade.wordpress.com/2012/02/10/defining-a-wsgi-app-deployment-standard/
The first approach seems the most common among Python devs. When I first started doing development in Django, it felt a bit weird, since in PHP it's quite common to check third-party libs into the project repo. But as Ian Bicking said in the linked post, PHP-style deployment leaves out things such as non-portable libraries. You don't want to package things like mysqldb or PIL into your project; those are better handled by tools like pip or distribute.
So this is what I'm using currently.
All projects will have a virtualenv directory at the project root. We name it .env and ignore it in VCS. The first thing a dev does when starting development is to initialize this virtualenv and install all requirements specified in the requirements.txt file. I prefer having the virtualenv inside the project dir so that it's obvious to developers, rather than having it in some other place such as $HOME/.virtualenv and then doing source $HOME/virtualenv/project_name/bin/activate to activate the environment. Instead, developers interact with the virtualenv by invoking the env executables directly from the project root, such as:
.env/bin/python
.env/bin/python manage.py runserver
To deploy, we have a fabric script that will first export our project directory, together with the .env directory, into a tarball, then copy the tarball to the live server, untar it into the deployment dir, and do some other tasks like restarting the server. When we untar the tarball on the live server, the fabric script makes sure to run virtualenv once again so that all the shebang paths in .env/bin get fixed. This means we don't have to reinstall dependencies on the live server. The fabric workflow for deployment looks like:
fab create_release:1.1 # create release-1.1.tar.gz
fab deploy:1.1 # copy release-1.1.tar.gz to live server and do the deployment tasks
fab deploy:1.1,reset_env=1 # same as above but recreate virtualenv and re-install all dependencies
fab deploy:1.1,update_pkg=1 # only reinstall deps but do not destroy previous virtualenv like above
We also do not install the project src into the virtualenv using setup.py, but instead add its path to sys.path. So when deploying under mod_wsgi, we have to specify 2 paths in our vhost config for mod_wsgi, something like:
WSGIDaemonProcess project1 user=joe group=joe processes=1 threads=25 python-path=/path/to/project1/.env/lib/python2.6/site-packages:/path/to/project1/src
In short:
We still use pip+virtualenv to manage dependencies.
We don't have to reinstall requirements when deploying.
We have to maintain the extra entries in sys.path a bit.
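Outside of mod_wsgi (for manage.py, cron scripts, etc.), maintaining that extra sys.path entry can be a few lines; a sketch with an assumed layout and made-up function name:

```python
import os
import sys

def add_lib_to_path(project_root):
    """Prepend <project_root>/lib to sys.path if it isn't there yet,
    so bundled third-party apps are importable."""
    lib_dir = os.path.join(project_root, "lib")
    if lib_dir not in sys.path:
        sys.path.insert(0, lib_dir)
    return lib_dir
```

Calling this once at the top of manage.py keeps the checked-in libs importable without installing anything into the virtualenv.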
Virtualenv and pip are fantastic for working on multiple django projects on one machine. However, if you only have one project that you are editing, it is not necessary to use virtualenv.