Why use requirements.txt in a Docker image

There is a similar question from last year but I don't think the responses are widely applicable and it's not accepted.
Edit: this is in the context of developing small jobs that will only be run in docker in-house; I'm not talking about sharing work with anyone outside a small team, or about projects getting heavy re-use.
What advantage do you see in using requirements.txt to install instead of pip install commands in Dockerfile? I see one: your Dockerfile for various projects is more cookie-cutter.
I'm not even thinking of the use of setup envisioned in the question I linked.
What downside is there to naming the packages in Dockerfile:
RUN pip install --target=/build django==3.0.1 Jinja2==2.11.1 . . .
EDIT 2: @superstormer asked "what are the upsides to putting it in Dockerfile" -- fair question. I read co-workers' Dockerfiles in GitLab and have to navigate to the requirements; I don't have them locally in an editor. EDIT 3: Note to self: so clone it and look at it in an editor.

First consider going with the flow of the tools:
To manually install those packages, inside or outside a Docker Container, or to test that it works without building a new Docker Image, do pip install -r requirements.txt. You won't have to copy/paste the list of packages.
To "freeze" on specific versions of the packages to make builds more repeatable, pip freeze > requirements.txt will write the currently installed packages and their exact versions to that file for you.
PyCharm will look for a requirements.txt file, let you know if your currently installed packages don't match that specification, help you fix that, show you if updated packages are available, and help you update.
Presumably other modern IDEs do the same, but if you're developing in plain text editors, you can still run a script like this to check the installed packages (this is also handy in a git post-checkout hook):
echo -e "\nRequirements diff (requirements.txt vs current pips):"
diff --ignore-case <(sed 's/ *#.*//;s/^ *--.*//;/^$/d' requirements.txt | sort --ignore-case) \
<(pip freeze 2>/dev/null | sort --ignore-case) -yB --suppress-common-lines
Hopefully this makes it clearer that requirements.txt declares the required packages and, usually, their versions. Keeping it separate is more modular and reusable than embedding it inside a Dockerfile.
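For anyone who would rather do that check in Python than in bash, here is a minimal sketch of the same idea. It is an illustration, not a replacement for pip's own tooling: the function names (normalize, parse_pins, installed_versions, diff_pins, report) are made up for this example, it only understands simple name==version lines, and unlike the diff above it only reports pins that are missing or mismatched, not extra installed packages.

```python
import re
from importlib import metadata


def normalize(name):
    # Compare names case-insensitively and treat runs of "-", "_" and "." alike.
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text):
    """Return {normalized name: version} for simple 'name==version' lines."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        if not line or line.startswith("-"):     # skip blanks and pip options
            continue
        if "==" in line:
            name, version = line.split("==", 1)
            version = version.split(";", 1)[0]   # ignore environment markers
            pins[normalize(name.strip())] = version.strip()
    return pins


def installed_versions():
    """Return {normalized name: version} for the current environment."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:                                 # skip broken metadata
            found[normalize(name)] = dist.version
    return found


def diff_pins(required, installed):
    """Return {name: (required, installed or None)} for every mismatch."""
    return {
        name: (version, installed.get(name))
        for name, version in required.items()
        if installed.get(name) != version
    }


def report(path="requirements.txt"):
    """Print one line per pinned package that is missing or mismatched."""
    with open(path) as handle:
        required = parse_pins(handle.read())
    mismatches = diff_pins(required, installed_versions())
    for name, (want, have) in sorted(mismatches.items()):
        print(f"{name}: required {want}, installed {have or 'nothing'}")
```

Calling report() from the project directory prints nothing when every pinned package is installed at the pinned version.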

It's a question of single responsibility.
Dockerfile's job is to package an application up to be built as an image. That is: it should describe every step needed to turn an application into a container image.
requirements.txt's job is to list every dependency of a Python application, regardless of its deployment strategy. Many Python workflows expect a requirements.txt and know how to add new dependencies while updating that requirements.txt file. Many other workflows can at least interoperate with requirements.txt. None of them know how to auto-populate a Dockerfile.
In short, the application is not complete if it does not include a requirements.txt. Including that information in the Dockerfile is like writing documentation that teaches your operations folks how to pull and install every individual dependency while deploying the application, rather than including it in a dependency manager that packages into the binary you deliver to ops.

Related

When working with a venv virtual environment, which files should I be committing to my git repository?

Using GitHub's .gitignore, I was able to filter out some files and directories. However, there are a few things that left me a little confused:
GitHub's .gitignore did not include /bin and /share created by venv. I assumed they should be ignored by git, however, as the user is meant to build the virtual environment themselves.
Pip generated a pip-selfcheck.json file, which seemed mostly like clutter. I assume it usually does this, and I just haven't seen the file before because it's been placed with my global pip.
pyvenv.cfg is what I really can't make any sense of, though. On one hand, it specifies the Python version, which ought to be needed by others who want to use the project. On the other hand, it also specifies home = /usr/bin, which, while probably correct on a lot of Linux distributions, won't necessarily apply to all systems.
Are there any other files/directories I missed? Are there any stricter guidelines for how to structure a project and what to include?

Although venv is a very useful tool, you should not assume (unless you have good reason to do so) that everyone who looks at your repository uses it. Avoid committing any files used only by venv; these are not strictly necessary to be able to run your code and they are confusing to people who don't use venv.
The only configuration file you need to include in your repository is the requirements.txt file generated by pip freeze > requirements.txt which lists package dependencies. You can then add a note in your readme instructing users to install these dependencies with the command pip install -r requirements.txt. It would also be a good idea to specify the required version of Python in your readme.

Getting dependencies in a Python development environment

When working with JVM languages a pattern commonly followed is to use a build system (ant+ivy / maven / gradle), where using a build file, the dependencies of your code can be defined. The build system is able to fetch these dependencies when you build your code. Moreover IDEs like Eclipse/IntelliJ are also able to read these build files and continuously build/verify your code as you write it.
How is something similar done while developing in Python? While there may not necessarily be a build step, I want a developer to be able to check out my code and then run a single bootstrap command that will set up a virtualenv and pull in any third-party dependencies necessary to run the code. I could include some sort of script to do this, but I am wondering if there is a tool for it. Most of my searching so far has led me to packaging tools, which are more for distribution to end users than for this purpose (or so I understand).

This is managed by virtualenv and the pip install -r requirements.txt command. More info here: Virtual Environments

I guess requirements.txt is what you are looking for. For example, PyCharm IDE will definitely see it as a dependency list.
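As a rough illustration of the "single bootstrap command" asked about above, the following sketch creates an environment and installs requirements.txt into it. Note the assumptions: it uses the standard library venv module rather than the third-party virtualenv tool named in the answers, the directory name .venv is arbitrary, and the function names are invented for this example.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(env_dir):
    """Path of the interpreter inside a virtual environment."""
    env_dir = Path(env_dir)
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"   # Windows layout
    return env_dir / "bin" / "python"               # POSIX layout


def pip_install_command(env_dir, requirements="requirements.txt"):
    """Command that installs the requirements file into env_dir."""
    python = str(venv_python(env_dir))
    return [python, "-m", "pip", "install", "-r", str(requirements)]


def bootstrap(env_dir=".venv", requirements="requirements.txt"):
    """Create the environment if it is missing, then install dependencies."""
    if not venv_python(env_dir).exists():
        venv.create(env_dir, with_pip=True)
    subprocess.check_call(pip_install_command(env_dir, requirements))
```

Saved as, say, bootstrap.py with a call to bootstrap() at the bottom, a new developer would only need to run python bootstrap.py after checking out the code. Running pip through the environment's own interpreter avoids having to activate the environment first.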

Why should I create node_modules folder for nodejs dependencies for each expressjs app

I really don't get it. When I run npm install in the main folder, why does it have to download all the dependencies into node_modules, and why does this need to be done for every single project? In Sinatra (a Ruby microframework), I never had to do this, and it is easy to use the gems that are installed globally without having to download and save each one of them into the project folder again.
I read somewhere that it is done to avoid version mismatch issues, but if it works by installing globally and simply 'require'ing it in many other languages like Python (which uses virtualenv to tackle version issues), Ruby etc., why can't it be the same for node.js?
Whatever happened to DRY?

You can do npm install --global (or -g for short) to install globally. But that creates a problem when two projects depend on different versions of the same dependency: you will have conflicts that are hard to trace. Installing locally also makes the project more portable. You can refer to this document.
This problem does not exist only in the node.js world. Different languages just approach it differently. I don't know Ruby well, but:
In Python, people use virtualenv to separate dependencies, which takes more effort.
Java's Maven caches artifacts in the $HOME/.m2 folder, but when it compiles a project, it copies the bytecode from the .m2 folder into the local target folder.
There are times when you do want npm install --global, though, for tools such as grunt.js/gulp.js.
I just think it has nothing to do with DRY, because you didn't write code twice. It's just downloaded twice.
That being said, you can still install everything globally.

Distributing python code with virtualenv?

I want to distribute some python code, with a few external dependencies, to machines with only core python installed (and users that are unfamiliar with easy_install etc.).
I was wondering if perhaps virtualenv can be used for this purpose? I should be able to write some bash scripts that trigger the virtualenv (with the suitable packages) and then run my code.. but this seems somewhat messy, and I'm wondering if I'm re-inventing the wheel?
Are there any simple solutions to distributing python code with dependencies, that ideally doesn't require sudo on client machines?

Buildout - http://pypi.python.org/pypi/zc.buildout
As a sample, look at my clean project at http://hg.jackleo.info/hyde-0.5.3-buildout-enviroment/src where only 2 files do the magic. Moreover, the Makefile is optional, but without it you'll need bootstrap.py (the Makefile downloads it, but it runs only on Linux). buildout.cfg is the main file, where you write the dependencies and the configuration of how the project is laid out.
To get bootstrap.py, just download it from http://svn.zope.org/repos/main/zc.buildout/trunk/bootstrap/bootstrap.py
Then run python bootstrap.py and bin/buildout. I do not recommend installing buildout locally, although it is possible; just use the one that bootstrap downloads.
I must admit that buildout is not the easiest solution, but it's really powerful, so learning it is worth the time.
UPDATE 2014-05-30
Since this was recently up-voted and used as an answer (probably), I want to note a few changes.
First off, buildout is now downloaded from GitHub: https://raw.githubusercontent.com/buildout/buildout/master/bootstrap/bootstrap.py
That hyde project would probably fail due to buildout 2 breaking changes.
Here you can find better samples http://www.buildout.org/en/latest/docs/index.html also I want to suggest to look at "collection of links related to Buildout" part, it might contain info for your project.
Secondly, I am personally more in favor of a setup.py script that can be installed using python. More about the egg structure can be found here http://peak.telecommunity.com/DevCenter/PythonEggs and if that looks too scary, look it up on Google (query for python egg). In my opinion it is actually simpler than buildout (definitely easier to debug), and probably more useful, since it can be distributed more easily and installed anywhere with the help of virtualenv or globally, whereas with buildout you have to provide all of the build scripts with the source all of the time.

You can use a tool like PyInstaller for this purpose. Your application will appear as a single executable on all platforms, and include dependencies. The user doesn't even need Python installed!
See as an example my logview package, which has dependencies on PyQt4 and ZeroMQ and includes distributions for Linux, Mac OSX and Windows all created using PyInstaller.

You don't want to distribute your virtualenv, if that's what you're asking. But you can use pip to create a requirements file - typically called requirements.txt - and tell your users to create a virtualenv then run pip install -r requirements.txt, which will install all the dependencies for them.
See the pip docs for a description of the requirements file format, and the Pinax project for an example of a project that does this very well.

Django and VirtualEnv Development/Deployment Best Practices

Just curious how people are deploying their Django projects in combination with virtualenv
More specifically, how do you keep your production virtualenv's synched correctly with your development machine?
I use git for scm but I don't have my virtualenv inside the git repo - should I, or is it best to use the pip freeze and then re-create the environment on the server using the freeze output? (If you do this, could you please describe the steps - I am finding very little good documentation on the unfreezing process - is something like pip install -r freeze_output.txt possible?)

I just set something like this up at work using pip, Fabric and git. The flow is basically like this, and borrows heavily from this script:
In our source tree, we maintain a requirements.txt file. We'll maintain this manually.
When we do a new release, the Fabric script creates an archive based on whatever treeish we pass it.
Fabric will find the SHA for what we're deploying with git log -1 --format=format:%h TREEISH. That gives us SHA_OF_THE_RELEASE
Fabric will get the last SHA for our requirements file with git log -1 --format=format:%h SHA_OF_THE_RELEASE requirements.txt. This spits out the short version of the hash, like 1d02afc which is the SHA for that file for this particular release.
The Fabric script will then look into a directory where our virtualenvs are stored on the remote host(s).
If there is not a directory named 1d02afc, a new virtualenv is created and setup with pip install -E /path/to/venv/1d02afc -r /path/to/requirements.txt
If there is an existing path/to/venv/1d02afc, nothing is done
The little magic part of this is passing whatever tree-ish you want to git, and having it do the packaging (from Fabric). By using git archive my-branch, git archive 1d02afc or whatever else, I'm guaranteed to get the right packages installed on my remote machines.
I went this route since I really didn't want to have extra virtualenvs floating around if the packages hadn't changed between releases. I also don't like the idea of having the actual packages I depend on in my own source tree.
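The core of that flow, one environment per version of requirements.txt, can be sketched without Fabric. This is only an illustration under several assumptions: it runs locally instead of on a remote host, the function names are invented, the environment is built with the standard library venv module, and pip is invoked through the environment's own interpreter because, as far as I know, the -E option shown above is no longer available in current pip releases.

```python
import subprocess
import venv
from pathlib import Path


def requirements_sha(treeish, path="requirements.txt"):
    """Short hash of the last commit reachable from treeish that touched path."""
    out = subprocess.check_output(
        ["git", "log", "-1", "--format=format:%h", treeish, "--", path],
        text=True,
    )
    return out.strip()


def build_env(env_dir, requirements="requirements.txt"):
    """Create env_dir and install the requirements into it (POSIX layout assumed)."""
    venv.create(env_dir, with_pip=True)
    python = Path(env_dir) / "bin" / "python"
    subprocess.check_call([str(python), "-m", "pip", "install", "-r", requirements])


def ensure_env(venv_root, sha, create=build_env):
    """Create venv_root/<sha> with create() unless it already exists.

    Returns (path, created) so the caller can tell whether work was done.
    """
    env_dir = Path(venv_root) / sha
    if env_dir.is_dir():
        return env_dir, False
    create(env_dir)
    return env_dir, True
```

A deploy script would call ensure_env("/path/to/venv", requirements_sha("my-branch")) and point the release at the returned directory. Because the directory name is derived from the requirements file's history, two releases with identical requirements share one environment.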

I use this bootstrap.py: http://github.com/ccnmtl/ccnmtldjango/blob/master/ccnmtldjango/template/bootstrap.py
which expects a directory called 'requirements' that looks something like this: http://github.com/ccnmtl/ccnmtldjango/tree/master/ccnmtldjango/template/requirements/
There's an apps.txt, a libs.txt (which apps.txt includes--I just like to keep django apps separate from other python modules) and a src directory which contains the actual tarballs.
When ./bootstrap.py is run, it creates the virtualenv (wiping a previous one if it exists) and installs everything from requirements/apps.txt into it. I do not ever install anything into the virtualenv otherwise. If I want to include a new library, I put the tarball into requirements/src/, add a line to one of the textfiles and re-run ./bootstrap.py.
bootstrap.py and requirements get checked into version control (also a copy of pip.py so I don't even have to have that installed system-wide anywhere). The virtualenv itself isn't. The scripts that I have that push out to production run ./bootstrap.py on the production server each time I push. (bootstrap.py also goes to some lengths to ensure that it's sticking to Python 2.5 since that's what we have on the production servers (Ubuntu Hardy) and my dev machine (Ubuntu Karmic) defaults to Python 2.6 if you're not careful)
