Here's the situation:
I have developed some python code, and I have pip available on my machine. I pull in some libraries to make the program behave better, do advanced things, have colored output, etc...
Now I need to ship this code to boxes that do not have pip and will not ever have pip.
In node, it is trivial to achieve this. You can just deploy your node_modules folder and the npm install step isn't needed.
What's the best practice to do this for Python? The version I am working with is Python 2.6.8, and I only need to care about distributing to Linux machines.
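(The closest analogue to node_modules that I can picture is vendoring the libraries next to the code, roughly like this, assuming my pip supports the -t/--target option; the vendor/ directory name is just something I made up:)

$ pip install -t ./vendor -r requirements.txt   # run on the machine that has pip

and then adding that directory to PYTHONPATH (or sys.path) on the target boxes - but I don't know whether that's considered good practice.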
I've read that it's possible to freeze the entire project (including the Python interpreter) and just ship a binary. I would like to avoid that if possible, as it seems wasteful in an environment where Python is available on all of your systems.
Thanks for any advice.
Related
I would like to package a Python program and ship it in a deb package.
For reasons (I know that in 99% of cases it is bad practice) I want to ship the program in a Python virtual environment within a Debian package.
I know I can do this using dh-virtualenv. This works great - generally no problem.
But the problem arises when I want to upload this to launchpad. Uploading to launchpad means uploading a source package. In dh-virtualenv terms, a source package is just the package description; the virtualenv has not been created yet.
What happens when I upload this to launchpad is that the package will not build: the dh-virtualenv step executed during the build on launchpad tries to install Python modules into the virtualenv, which means installing them from PyPI, and that fails because launchpad does not allow external network access.
So basically there are two possible solutions:
Approach A
Prepare the virtualenv, somehow incorporate it into the source package, and have the dh build process simply "move" this prepared virtualenv to its final location. This could work with virtualenv --relocatable. But the relocation strips the utf-8 marker at the beginning of all Python scripts, rendering every Python script in the virtualenv broken.
Approach B
Somehow cache all necessary python packages in the source package and have dh_virtualenv install from the cache instead of from PyPI.
This seems to be doable with pip2pi, but certain experiments show that it will not install packages, even though they are located in the local package index.
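For reference, the pip2pi route would look roughly like this (a sketch; the packages/ directory and the requirements file are just names I'm assuming):

$ pip2tgz packages/ -r requirements.txt   # download all needed sdists into packages/
$ dir2pi packages/                        # build a PyPI-style simple/ index inside packages/

You would then ship packages/ inside the source package and point dh_virtualenv at file:///.../packages/simple instead of PyPI - assuming your dh-virtualenv version lets you override the index URL or pass extra pip arguments.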
Both approaches seem a bit clumsy and prone to errors.
What do you think of this?
What are your experiences?
What would you recommend?
I'm working by myself right now, but am looking at ways to scale my operation.
I'd like to find an easy way to version my Python distribution, so that I can recreate it very easily. Is there a tool to do this? Or can I add /usr/local/lib/python2.7/site-packages/ (or whatever) to an svn repo? This doesn't solve the problems with PATHs, but I can always write a script to alter the path. Ideally, the solution would be to build my Python env in a VM, and then hand copies of the VM out.
How have other people solved this?
virtualenv + requirements.txt are your friend.
You can create several virtual Python installs for your projects, each containing exactly the library versions you need (tip: pip freeze spits out a requirements.txt with the exact library versions).
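For example (a minimal sketch; the library name is just a placeholder):

$ virtualenv env
$ source env/bin/activate
$ pip install somelibrary            # whatever your project actually needs
$ pip freeze > requirements.txt      # record the exact versions
$ pip install -r requirements.txt    # later, on another machine or in a fresh virtualenv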
Find a good reference to virtualenv here: http://simononsoftware.com/virtualenv-tutorial/ (it's from this question Comprehensive beginner's virtualenv tutorial?).
Alternatively, if you just want to distribute your code together with libraries, PyInstaller is worth a try. You can package everything together in a static executable - you don't even have to install the software afterwards.
You want to use virtualenv. It lets you create an application-specific directory for installed packages. You can also use pip to generate a requirements.txt and install from it.
For the same goal, i.e. having the exact same Python distribution as my colleagues, I tried to create a virtual environment on a network drive, so that all of us could use it without anyone making a local copy.
The idea was to share the same packages installed in a shared folder.
Outcome: Python ran so unbearably slowly that it could not be used, and installing a package was also very sluggish.
So it looks like there is no way around virtualenv and a requirements file. (Even if, unfortunately, it does not always work smoothly on Windows and requires manual installation of some packages and dependencies, at least at the time of writing.)
My goal is to distribute a Python package that has several other widely used Python packages as dependencies. My package depends on well written, PyPI-indexed packages like pandas, scipy and numpy, and specifies in the setup.py that certain versions or higher of these are needed, e.g. "numpy >= 1.5".
I found that it's immensely frustrating and nearly impossible for Unix savvy users who are not experts in Python packaging (even if they know how to write Python) to install a package like mine, even when using what are supposed to be easy to use package managers. I am wondering if there is an alternative to this painful process that someone can offer, or if my experience just reflects the very difficult current state of Python packaging and distribution.
Suppose users download your package onto their system. Most will try to install it "naively", using something like:
$ python setup.py install
since that is usually what comes up if you google instructions on installing Python packages. This will fail for the vast majority of users, since most do not have root access on their Unix/Linux servers. With more searching, they will discover the "--prefix" option and try:
$ python setup.py install --prefix=/some/local/dir
Since the users are not aware of the intricacies of Python packaging, they will pick an arbitrary directory as an argument to --prefix, e.g. "~/software/mypackage/". It will not be a cleanly curated directory where all other Python packages reside, because again, most users are not aware of these details. If they install another package "myotherpackage", they might pass it "~/software/myotherpackage", and you can imagine how down the road this will lead to frustrating hacking of PYTHONPATH and other complications.
Continuing with the installation process, the call to "setup.py install" with "--prefix" will also fail once users try to use the package, even though it appeared to have been installed correctly, since one of the dependencies might be missing (e.g. pandas, scipy or numpy) and a package manager is not used. They will try to install these packages individually. Even if successful, the packages will inevitably not be in the PYTHONPATH due to the non-standard directories given to "--prefix" and patient users will dabble with modifications of their PYTHONPATH to get the dependencies to be visible.
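In practice that dabbling ends up being something like the following for every non-standard prefix (the path is purely illustrative):

$ export PYTHONPATH=~/software/mypackage/lib/python2.7/site-packages:$PYTHONPATH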
At this stage, users might be told by a Python savvy friend that they should use a package manager like "easy_install", the mainstream manager, to install the software and have dependencies taken care of. After installing "easy_install", which might be difficult, they will try:
$ easy_install setup.py
This too will fail, since users again do not typically have permission to install software globally on production Unix servers. With more reading, they will learn about the "--user" option, and try:
$ easy_install setup.py --user
They will get the error:
usage: easy_install [options] requirement_or_url ...
or: easy_install --help
error: option --user not recognized
They will be extremely puzzled why their easy_install does not have the --user option when there are clearly pages online describing the option. They might try to upgrade their easy_install to the latest version and find that it still fails.
If they continue and consult a Python packaging expert, they will discover that there are two versions of easy_install, both named "easy_install" so as to maximize confusion, one part of "distribute" and the other part of "setuptools". It happens that only the "easy_install" from "distribute" supports "--user", and the vast majority of servers/sysadmins install "setuptools"'s easy_install, so local installation will not be possible. Keep in mind that these distinctions between "distribute" and "setuptools" are meaningless and hard to understand for people who are not experts in Python package management.
At this point, I would have lost 90% of even the most determined, savvy and patient users who try to install my software package -- and rightfully so! They wanted to install a piece of software that happened to be written in Python, not to become experts in state of the art Python package distribution, and this is far too confusing and complex. They will give up and be frustrated at the time wasted.
The tiny minority of users who continue on and ask more Python experts will be told that they ought to use pip/virtualenv instead of easy_install. Installing pip and virtualenv and figuring out how these tools work and how they are different from the conventional "python setup.py" or "easy_install" calls is in itself time consuming and difficult, and again too much to ask from users who just wanted to install a simple piece of Python software and use it. Even those who pursue this path will be confused as to whether whatever dependencies they installed with easy_install or setup.py install --prefix are still usable with pip/virtualenv or if everything needs to be reinstalled from scratch.
This problem is exacerbated if one or more of the packages in question depends on installing a different version of Python than the default one. The difficulty of ensuring that your Python package manager is using the Python version you want it to, and that the required dependencies are installed in the relevant Python 2.x directory and not Python 2.y, will be so endlessly frustrating to users that they will certainly give up at that stage.
Is there a simpler way to install Python software that doesn't require users to delve into all of these technical details of Python packages, paths and locations? For example, I am not a big Java user, but I do use some Java tools occasionally, and don't recall ever having to worry about X and Y dependencies of the Java software I was installing, and I have no clue how Java package managing works (and I'm happy that I don't -- I just wanted to use a tool that happened to be written in Java.) My recollection is that if you download a Jar, you just get it and it tends to work.
Is there an equivalent for Python? A way to distribute software in a way that doesn't depend on users having to chase down all these dependencies and versions? A way to perhaps compile all the relevant packages into something self-contained that can just be downloaded and used as a binary?
I would like to emphasize that this frustration happens even with the narrow goal of distributing a package to savvy Unix users, which makes the problem simpler by not worrying about cross-platform issues, etc. I assume that the users are Unix savvy, and might even know Python, but just aren't aware (and don't want to be made aware) of the ins and outs of Python packaging and the myriad of internal complications/rivalries of different package managers. A disturbing feature of this issue is that it happens even when all of your Python package dependencies are well-known, well-written and well-maintained PyPI-available packages like pandas, scipy and numpy. It's not like I was relying on some obscure dependencies that are not properly formed packages: rather, I was using the most mainstream packages that many might rely on.
Any help or advice on this will be greatly appreciated. I think Python is a great language with great libraries, but I find it virtually impossible to distribute the software I write in it (once it has dependencies) in a way that is easy for people to install locally and just run. I would like to clarify that the software I'm writing is not a Python library for programmatic use, but software that has executable scripts that users run as individual programs. Thanks.
We also develop software projects that depend on numpy, scipy and other PyPI packages. Hands down, the best tool currently available out there for managing remote installations is zc.buildout. It is very easy to use. You download a bootstrapping script from their website and distribute that with your package. You write a "local deployment" file, normally called buildout.cfg, that explains how to install the package locally. You ship both the bootstrap.py file and buildout.cfg with your package - we use the MANIFEST.in file in our Python packages to force the embedding of these two files with the zip or tarballs distributed by PyPI. When users unpack it, they execute two commands:
$ python bootstrap.py # this will download zc.buildout and setuptools
$ ./bin/buildout # this will build and **locally** install your package + deps
The package is compiled and all dependencies are installed locally, which means that the user installing your package doesn't even need root privileges, which is an added feature. The scripts are (normally) placed under ./bin, so the user can just execute them after that. zc.buildout uses setuptools for interaction with PyPI so everything you expect works out of the box.
You can extend zc.buildout quite easily if all that power is not enough - you create the so-called "recipes" that can help the user create extra configuration files, download other stuff from the net or instantiate custom programs. The zc.buildout website contains a video tutorial that explains in detail how to use buildout and how to extend it. Our project Bob makes extensive use of buildout for distributing packages for scientific usage. If you would like, please visit the following page that contains detailed instructions for our developers on how they can set up their Python packages so other people can build and install them locally using zc.buildout.
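For orientation, a minimal buildout.cfg can be as small as this (the package name "mypackage" is a placeholder; zc.recipe.egg is the stock recipe that generates the ./bin scripts):

[buildout]
develop = .
parts = scripts

[scripts]
recipe = zc.recipe.egg
eggs = mypackage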
We're currently working to make it easier for users to get started installing Python software in a platform-independent manner (in particular see https://python-packaging-user-guide.readthedocs.org/en/latest/future.html and http://www.python.org/dev/peps/pep-0453/)
For right now, the problem of two competing versions of easy_install has been resolved, with the competing fork "distribute" being merged back into the setuptools main line of development.
The best currently available advice on cross-platform distribution and installation of Python software is captured here: https://packaging.python.org/
I'm not happy with the way that I currently deploy Python code and I was wondering if there is a better way. First I'll explain what I'm doing, then the drawbacks:
When I develop, I use virtualenv to do dependency isolation and install all libraries using pip. Python itself comes from my OS (Ubuntu)
Then I build my code into a ".deb" debian package consisting of my source tree and a pip bundle of my dependencies
Then when I deploy, I rebuild the virtualenv environment, source foo/bin/activate and then run my program (under Ubuntu's upstart)
Here are the problems:
The pip bundle is pretty big and increases the size of the debian package significantly. This is not too big a deal, but it's annoying.
I have to build all the C libraries (PyMongo, BCrypt, etc) every time I deploy. This takes a little while (a few minutes), and it's a bit lame to do this CPU-bound job on production
Here are my constraints:
Must work on Python 3. Preferably 3.2
Must have dependency isolation
Must work with libraries that use C (like PyMongo)
I've heard things about freezing, but I haven't been able to get this to work. cx_Freeze from PyPI doesn't seem to compile (on my Python, at least). The other freeze utilities don't seem to work with Python 3. How can I do this better?
Wheel is probably the best way to do this at the moment.
Create a virtualenv on the deployment machine, and deploy a wheel along with any dependencies (also built as wheels) to that virtualenv.
This solves the problems:
Having separate wheels for dependencies means you don't have to redeploy dependencies that haven't changed, cutting the size of the deployment artefact
Building big packages (such as lxml or scipy) can be done locally, and then the compiled wheel pushed to production
Also, it works fine with libraries that use C.
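A rough sketch of that workflow (the wheelhouse/ directory name is just a convention I'm assuming here):

$ pip wheel -r requirements.txt -w wheelhouse/                           # on a build box with the same arch/Python as production
$ pip install --no-index --find-links=wheelhouse/ -r requirements.txt   # on production, inside the target virtualenv, no network or compiler needed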
Have you looked at buildout (zc.buildout)? With a custom recipe you may be able to automate most of this.
I'm looking to create the following:
A portable version of Python that can be run on any system (with any previous version of Python or no Python installed) and have it pre-configured with various Python packages (e.g. django, lxml, pysqlite, etc.)
The closest I've found to the above is virtualenv, but this only goes so far.
If I package up a nice virtualenv for Python on one machine, it contains symlinks to a lot of the libraries it needs. I can take those symlinks and convert them to their actual files, but if I try to move this entire directory to another machine, I get seg fault after seg fault.
To launch python on a different machine, I'm using:
LD_LIBRARY_PATH=lib/ ./bin/python
and in lib/ I have all of the shared libraries I copied from the original machine. The problem here is that these shared libraries might rely on other shared libraries that I'm not including, so executing this on other Linux distros does not work, probably because it falls back on older shared libraries installed on the system that do not work with what I copied over.
Anyone have an idea on how to get this working? Is this even possible?
EDIT:
To clarify, the desired outcome is to create a tar.gz of a Python binary and associated packages (django, lxml, pysqlite, etc.) that can be extracted and run on any Linux-based system, e.g. Ubuntu 8.04, Red Hat 5, SUSE 11, etc., all 32-bit distros, where the locally installed version of Python doesn't impact what's in the tar.gz.
I just tested this and it works great.
Get a copy of the Python source you want to install, untar it, and cd into the untarred folder first.
Also get a copy of setuptools and untar that.
/opt/portapy used below is of course just the name I came up with for this post; it could be any path. The full path should be tarred up, and the same path should be used on any system you put this on, because of absolute-path linking.
mkdir /opt/portapy
cd <python source dir>
./configure --prefix=/opt/portapy && make && make install
cd <setuptools source dir>
/opt/portapy/bin/python ./setup.py install
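(If virtualenv isn't already available to this interpreter, presumably you would install it at this point with the easy_install that setuptools just provided, e.g.:)

/opt/portapy/bin/easy_install virtualenv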
Make the virtual env folder inside the portapy folder.
mkdir /opt/portapy/virtenv
/opt/portapy/bin/virtualenv /opt/portapy/virtenv
cd /opt/portapy/virtenv
source bin/activate
Done. You are ready to install all of your libraries here and have the option of creating multiple virtual envs this way.
You can then tar up the whole /opt/portapy folder and transport it to any Linux system of the same arch, within reason I suspect.
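For example (paths as above; the archive name is arbitrary):

tar -czf portapy.tar.gz -C / opt/portapy    # on the build box
tar -xzf portapy.tar.gz -C /                # on the target, restores /opt/portapy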
I compiled 2.7.5 on CentOS 5.8 64-bit and moved the folder to a CentOS 6.9 system and it runs perfectly.
I don't know how this is even possible. If it were, they wouldn't need to distribute binary packages of Python for different platforms. You can't simply distribute Python that will run on any platform. It has to be built from source for that arch. Virtualenv will expect you to tell it which system Python to use (using links).
This pretty much goes for almost any binary package that links against system libs. Again, if it were possible, we wouldn't need any platform-specific binary distributions.
You can, however, achieve part of what you want. That is, running Python on another machine that doesn't have Python installed, as long as it's the same arch. This is the same concept behind freezing, or py2exe/py2app/pyinstaller. An interpreter is bundled into a standalone environment. So the app can run on any similar platform.
Edit
I just realized that while your question speaks about "system" agnostically, your title contains the reference "linux". There are different flavors of linux, so in order for it to work you would have to build it fat for multiple archs and also completely contain the standalone links. You might try building a package with pyinstaller and using that to include in your project.
You can try just building python from source, in your virtualenv:
$ ./configure --prefix=/path/to/virtualenv && make && make install
If you still have problems with the links to libs, you can also investigate building it statically.
I'm not sure that working solely in Python is the way to go here. You might have better luck with Puppet or Chef, which are configuration tools that can be used to create a local environment. There is plenty of code out there to install virtualenv and Python on just about any Linux plus OS X (probably not Windows though).
Your workflow would be to install Chef or Puppet (your choice), run a script to install the Python you want, then enter a virtualenv and pip install any packages you might need.
Sorry this isn't as easy as virtualenv alone, but it is much more robust.
Well, since I rarely accept "can't be done", there is a way to do it. Warning: it isn't pretty and you should probably look into a different scenario.
What you will need to do is determine a standard location for this top-level directory. Second, using that directory as your root, you will need to compile Python on each Linux distribution you want to run this on. For this you would use something like "/usr/local/myappname/platform/" to configure and compile Python to live in. In each case substitute "platform" with the name of the platform, such as "/usr/local/rhel/". If memory serves, the configure option you are looking for here is --prefix.
Once you have each distribution compiled you will need a script to determine which one to use and either set environment variables or have it create symlinks to the appropriate "installation" of python. I would then use virtualenv and bootstrap in that tree to keep the "in-use" python libraries even more specific.
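A rough sketch of the kind of selector script meant here (the install root, platform names and detection logic are purely illustrative; it would be sourced by the app's launcher so the PATH change sticks):

#!/bin/sh
# pick the interpreter tree that was compiled for this distro
root=/usr/local/myappname
if [ -f /etc/redhat-release ]; then
    platform=rhel
else
    platform=debian
fi
ln -sfn "$root/$platform" "$root/current"
export PATH="$root/current/bin:$PATH"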
I can't think of a common Linux distribution that doesn't have Python by default. As such you could use setup.py and/or basic Python scripts to script this out, since you should be able to rely on Python being present - even if it's ye olde version as in RHEL installs. Personally I find the above method overly complicated, but it would meet your stated requirements with the allowance for a final script. Of course, you could use shar (SHell ARchive) to tar all of this into a runnable shell script to do the installation and avoid the need for secondary scripts. If you gzip the resulting shell archive then you can decompress it on target systems and execute it to set everything up.
All that said, I would not recommend this. I would recommend determining the minimum Python version you can run on and ensuring that it is installed by the distribution whenever possible, and if need be pulling it down from a repo and installing it. Then, use virtualenv and bootstrap with a requirements.txt to install the necessary Python libraries and apps into the virtualenv. For that see this documentation
I faced the same problem, so I created PortableVirtualenv. Your question is essentially its definition.
I use it as a base for a commercial multiplatform app I develop. (But PortableVirtualenv is public domain - use it freely.)
If needed, you can pip-install any package and zip the whole directory to also distribute the packages you need.
One nice option is to make a "snap" portable Linux application. They have a Python mode which lets you specify exactly what modules you need. From https://snapcraft.io/first-snap#python :
Snaps let you distribute a dependency-isolated Python app in an app store experience for end users.
Another option is to containerize your application with something like docker. Then instead of executing your script directly, the user is actually running a small OS with just your application and its dependencies. https://www.infoq.com/articles/docker-executable-images/ has more about executable containers.
Container images can also be used for short lived processes: a containerized executable meant to be run on your computer. These containers execute a single task, are short lived and can generally be removed after use. We call these executable images. Examples are compilers (Golang) or build tools (Maven), presentation software (I love to hack a simple presentation in Markdown format and let a RevealJS Docker image serve that) and browsers (a fresh contained browser to follow that fishy link). A real evangelist for executable images is Docker's own Jessie Frazelle. To get some great inspiration be sure to read her blog about them or check out this presentation at DockerCon 2015.
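For illustration, a minimal executable image for a Python tool might look roughly like this (the base image tag, file names and requirements file are placeholders, not anything prescribed by Docker):

FROM python:3
WORKDIR /app
COPY . /app
RUN pip install -r requirements.txt
ENTRYPOINT ["python", "mytool.py"]

Users would then build it once and run it like a normal command:

$ docker build -t mytool .
$ docker run --rm mytool --help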