Creating cross platform self contained python environments?

Creating cross platform self contained python environments? - python

I would like to develop some Python3.6 software. The problem is that the software would run on hundreds of uniquely configured build environments that may or may not have python installed and do not have access to the internet or pypi. The machine are a mix between windows and Suse. It's important not to mess with the build environment so I would like to package my software with a isolated python environment with all the dependencies.
I'm finding it difficult to find a solution that would meet my criteria.
I've come across python virtual environments but they do not have an interpreter and are not really intended to copied around.
Another person on stack overflow recommend PEX, this looks perfect but does not seem to be compatible with Windows.
I also have thought about making the software a statically linked binary, using Cython. But again to my knowledge this still requires the correct python to be installed and has to use pure Python.

https://pyoxidizer.readthedocs.io/en/latest/comparisons.html has a comparison of various solutions in this space. It looks that if you need a cross-platform solution that doesn't require the target systems to be pre-configured (e.g. with a particular version of python pre-installed), your options are PyInstaller, PyOxidizer and Docker.
PyInstaller is more established, while PyOxidizer claims to have faster startup.
I'd expect Docker to be the least problematic if you have complex dependencies. It must be preinstalled on the target systems, but the build environments will probably have it already installed. Obviously it comes with more overhead.

Related

Building a python package to publish in pypi

I am greatly confused with the process of building a python package that I want to distribute on pypi.
There are some specific, basic things that I did not understand:
What exactly is that gets published? Binaries? Source code? How do I do one or the other?
How do I build multiple platform-specific, os-specific build from the same codebase?
How do I build a the package for multiple versions of Python from the same codebase? Is it necessary if I want to support many python versions?
I am using a .toml file for the setup configuration.
I found some answers only, but all refer to procedures with either a setup.py or a setup.cfg.

What exactly is that gets published? Binaries? Source code?
Yes, and yes.
It depends on details of your project and your package config.
Arbitrary commands can be run during a package build.
You might, for example, run a fortran compiler locally
and ship binaries, or you might insist that each person
installing the package run their own local fortran compiler.
We usually expect full *.py source code will appear on pypi.org.
"Binaries" here will usually describe compiled machine code,
and not *.pyc bytecode files.
How do I build multiple platform-specific, os-specific build from the same codebase?
I have only done this via git pull on a target platform, followed
by a local build, but there is certainly support for cross target
toolchains if you need that.
How do I build a the package for multiple versions of Python from the same codebase?
Same as above -- do a build under each separate target version.
Is it necessary if I want to support many python versions?
Typically the answer is "no".
Pick a minimum required interpreter version, such as 3.7,
and do all your development / testing / release work in that.
Backward compatibility of interpreters is excellent.
Folks running 3.8 or 3.11 should have no trouble
with your package.
There can be a fly in the ointment.
Suppose your project depends on library X,
or depends on X which depends on Y.
And one of them stopped being updated a few years ago,
or went through a big change like a rename.
Your users who are on 3.11 might find it
inconvenient to obtain a compatible version of X or Y.
This might motivate you to do split releases,
for example via major version number or by
slightly altering your project name.
Clearly you haven't crossed that bridge quite yet.
The poetry ecosystem
is reasonably mature. It has tried to fix many of the
rough edges surrounding the python packaging practices
of the last few decades. I recommend that you prefer
modern over ancient practices, and that you adopt poetry
for your project.
If that won't fly for some reason, and especially if
binaries are a big deal for your project, consider publishing via
conda.
There are many pip pitfalls with target system
needing compilers and libraries. Conda does an
excellent job of ensuring that conda install ...
will Just Work.

How to install multiple versions of Python in Windows?

up until recently I have only worked with one version of Python and used virtual environments every now and then. Now, I am working with some libraries that require older version of Python. So, I am very confused. Could anyone please clear up some of my confusion?
How do I install multiple Python versions?
I initially had Python version 3.8.x but upgraded to 3.10.x last month. There is currently only that one version on my PC now.
I wanted to install one of the Python 3.8.x version and went to https://www.python.org/downloads/. It lists a lot of versions and subversions like 3.6, 3.7, 3.8 etc. etc. with 3.8.1, 3.8.2 till 3.8.13. Which one should I pick?
I actually went ahead with 3.8.12 and downloaded the Tarball on the page: https://www.python.org/downloads/release/python-3812/
I extracted the tarball (23.6MB) and it created a folder with a setup.py file.
Is Python 3.8.12 now installed? Clicking on the setup.py file simply flashes the terminal for a second.
I have a few more questions. Hopefully, they won't get me downvoted. I am just confused and couldn't find proper answers for them.
Why does Python have such heavy dependency on the exact versions of libraries and packages etc?
For example, this question
How can I run Mozilla TTS/Coqui TTS training with CUDA on a Windows system?. This seems very beginner unfriendly. Slightly mismatched package
version can prevent any program from running.
Do virtual environments copy all the files from the main Python installation to create a virtual environment and then install specific packages inside it? Isn't that a lot of wasted resources in duplication because almost all projects require there own virtual environment.

Your questions depend a bit on "all the other software". For example, as #leiyang indicated, the answer will be different if you use conda vs just pip on vanilla CPython (the standard Windows Python).
I'm also going to assume you're actually on Windows, because on Linux I would recommend looking at pyenv. There is a pyenv-win, which may be worth looking into, but I don't use it myself because it doesn't play as nice if you also want (mini)conda environments.
1. (a) How do I install multiple Python versions?
Simply download the various installers and install them in sensible locations. E.g. "C:\Program Files\Python39" for Python 3.9, or some other location where you're allowed to install software.
Don't have Python add itself to the PATH though, since that'll only find the last version to do so and can really confuse things.
Also, you probably want to use virtual environments consistently, as this ties a specific project very clearly to a specific Python version, avoiding future confusion or problems.
1. (b) "3.8.1, 3.8.2 till 3.8.13" which should I pick?
Always pick the latest 3.x.y, so if there's a 3.8.13 for Windows, but no 3.8.14, pick that. Check if the version is actually available for your operating system, sometimes there are later versions for one OS, but not for another.
The reason is that between a verion like 3.6 and 3.7, there may be major changes that change how Python works. Generally, there will be backwards compatibility, but some changes may break how some of your packages work. However, when going up a minor version, there won't be any such breaking changes, just fixes and additions that don't get in the way of what was already there. A change from 2.x to 3.x only happens if the language itself goes through a major change, and rarely happens (and perhaps never will again, depending on who you ask).
An exception to the "no minor version change problems" is of course if you run some script that very specifically relies on something that was broken in 3.8.6, but no fixed in 3.8.7+ (as an example). However, that's very bad coding, to rely on what's broken and not fixing it later, so only go along with that if you have no other recourse. Otherwise, just the latest minor version of any version you're after.
Also: make sure you pick the correct architecture. If there's no specific requirement, just pick 64-bit, but if your script needs to interact with other installed software at the binary level, it may require you to install 32-bit Python (and 32-bit packages as well). If you have no such requirement, 64-bit allows more memory access and has some other benefits on modern computers.
2. Why does Python have such heavy dependency on the exact versions of libraries and packages etc?
It's not just Python, this is true for many languages. It's just more visible to the end user for Python, because you run it as an interpreted language. It's only compiled at the very last moment, on the computer it's running on.
This has the advantage that the code can run on a variety of computers and operating systems, but the downside that you need the right environment where you're running it. For people who code in languages like C++, they have to deal with this problem when they're coding, but target a much smaller number of environments (although there's still runtimes to contend with, and DirectX versions, etc.). Other languages just roll everything up into the program that's being distributed, while a Python script by itself can be tiny. It's a design choice.
There are a lot of tools to help you automate the process though and well-written packages will make the process quite painless. If you feel Python is very shakey when it comes to this, that's probable to blame on the packages or scripts you're using, not really the language. The only fault of the language is that it makes it very easy for developers to make such a mess for you and make your life hard with getting specific requirements.
Look for alternatives, but if you can't avoid using a specific script or package, once you figure out how to install or use it, document it or better yet, automate it so you don't have to think about it again.
3. Do virtual environments copy all the files from the main Python installation to create a virtual environment and then install specific packages inside it? Isn't that a lot of wasted resources in duplication because almost all projects require there own virtual environment.
Not all of them, but quite a few of them. However, you still need the original installation to be present on the system. Also, you can't pick up a virtual environment and put it somewhere else, not even on the same PC without some careful changes (often better to just recreate it).
You're right that this is a bit wasteful - but this is a difficult choice.
Either Python would be even more complicated, having to manage many different version of packages in a single environment (Java developers will be able to tell you war stories about this, with their dependency management - or wax lyrically about it, once they get it themselves).
Or you get what we have: a bit wasteful, but in the end diskspace is a lot cheaper than your time. And unlike your time, diskspace is almost infinitely expandable.
You can share virtual environments between very similar projects though, but especially if you get your code from someone else, it's best to not have to worry and just give up a few dozen MB for the project. On the upside: you can just delete a virtual environment directory and that pretty much gets rid of the whole things. Some applications like PyCharm may remember that it was once there, but other than that, that's the virtual environment gone.

Just install them. You can have any number of Python installations side by side. Unless you need to have 2 different minor versions, for example 3.10.1 and 3.10.2, there is no need to do anything special. (And if you do need that then you don't need any advice.) Just set up separate shortcuts for each one.
Remember you have to install any 3rd-party libraries you need in each version. To do this, navigate to the Scripts folder in the version you want to do the install in, and run pip from that folder.
Python's 3rd-party libraries are open-source and come from projects that have release schedules that don't necessarily coincide with Python's. So they will not always have a version available that coincides with the latest version of Python.
Often you can get around this by downloading unofficial binaries from Christoph Gohlke's site. Google Python Gohlke.

Install Python using the windows executable installers from python.org. If the version is 3.x.y, use the highest y that has a windows executable installer. Unless your machine is very old, use the 64-bit versions. Do not have them add python to your PATH environment variable, but in only one of the installs have it install the python launcher py. That will help you in using multiple versions. See e.g. here.
Python itself does not. But some modules/libraries do. Especially those that are not purely written in Python but contain extensions written in C(++). The reason for this is that compiling programs on ms-windows can be a real PITA. Unlike UNIX-like operating systems with Linux, ms-windows doesn't come with development tools as standard. Nor does it have decent package management. Since the official Python installers are built with microsoft tools, you need to use those with C(++) extensions as well. Before 2015, you even had to use exactly the same version of the compiler that Python was built with. That is still a good idea, but no longer strictly necessary. So it is a signigicant amount of work for developers to release binary packages for each supported Python version. It is much easier for them to say "requires Python 3.x".

Best practice management of conda, pip, system, and personal software

In my work I need a mix of C++, Fortran and Python codes to interact, in the latter case mainly via Cython and SWIG. They are a mix of widely-used libraries (typically available via some package management system), ones specific to my field but not written by me (mostly not packaged beyond source tarballs), and ones for which I'm the developer.
For a long time we've been able to get by without worrying too much about Python 3, and so I operated a ~/local install area with a mix of the compiled and Python2-based software in the usual bin, lib, lib/python2.*/site-packages, etc. structure, and with the relevant subdir paths included in my .bashrc PATH, LD_LIBRARY_PATH, PYTHONPATH environment variables. But in particular with the rise of Python 3 for machine learning, and an increase in incompatible ML packages, I've had to start operating virtualenv directories for some projects. This, and also the switch of system tools like meld to Python 3, mean that my single, global, Python 2 environment isn't fit for purpose anymore.
At the same time, I've become aware that conda and condaforge are now being pushed for a lot of relevant software. So there's now going to be system packages and Python versions, potentially conda environments (for specific Pythons), pip packages (in virtualenvs or not), and then my personal builds. This is quite a lot to operate consistently, and there doesn't seem to be much information out there about best-practice, especially when sharing some code between multiple projects, and mixing in non-Python libraries with these Python-focused tools. Installing a whole chain of manual dependencies in an independent conda or virtualenv environment for each project would be very difficult to manage and wasteful in terms of duplication of large libraries, but on the other hand there seems to be at least a need for separate Python 2/3 environments, perhaps with more project-specific virtualenvs within them.
So, scene set -- apologies for the length, but it's intrinsically complex. Is anyone else wrestling with this problem, and is there an emerging standard or best-practice way to manage the mix of system, conda, pip, and manual package dependencies for development of many projects, without undue duplication?
PS. I appreciate answers may be opinion-based to some extent, although good ones will evidence and justify their recommendations. On the other hand, it's definitely about software development rather than just software management. So I hope it's an appropriate question for SO, since I don't see a better fit within the SX network.

Moving a Python environment over to a new OS install

I have reinstalled my operating system (moved from windows XP to Windows 7).
I have reinstalled Python 2.7.
But i had a lot of packages installed in my old environment.
(Django, sciPy, jinja2, matplotlib, numpy, networkx, to name just a view)
I still have my old Python installation lying around on a data partition, so i wondered if i can just copy-paste the old Python library folders onto the new installation?
Or do i need to reinstall every package?
Do the packages keep any information in registry, system variables or similar?
Does it depend on the package?

That's the point where you must be able to layout your project, thus having special tools for that.
Normally, Python packages do not do such wierd things as dealing with registry (unless they are packaged via MSI installer). The problems may start with packages that contain C extensions, so moving to another version of OS or from 32 to 64-bit architecture will require recompiling/rebuilding of those. So, it would be much better to reinstall all packages to new system as written below.
Your demands may vary, but you definitely must choose the way of building your environment. If you don't have and plan to have a large variety of projects you may consider the first approach as follows below, the second approach is more likely for setting up development environment for different projects or for different versions of the same project.
Global environment (your Python installation in your system along with installed packages).
Here you can consider using pip. In this case your project can have requirements file containing all needed packages for your project. Basically, requirements file is a text file containing package names (on PyPI and their versions).
Isolated environment. It can be achieved using special tools or specially organized path.
Here where pip can be gracefully combined with virtualenv. This way is highly recommended by a lot of developers (I must remind that Python 3.3 that will soon be released contains virtualenv as a part of standard library). This approach assumes creating virtual shell with its own instance of Python interpreter and installed packages.
Another popular tool for achieving isolated environment is called buildout. It lays out your project source and dependencies in one path so you achieve the same effect as virtualenv creates. The great advantage of buildout that it's built upon an idea of pluggable recipes (pieces of code implementing different common project deployment tasks) and there are hundreds of stable and reliable recipes over Internet.
Both virtualenv and buildout help you to remove head-ache when installing dependencies and solve the problem of different versions of the same package kept on a single machine.
Choose your destiny...

The short answer to this question is "no", since packages can execute arbitrary code on installation and do whatever the heck they want wherever they want on your system.
Just reinstall all of them.

Are there any other good alternatives to zc.buildout and/or virtualenv for installing non-python dependencies?

I am a member of a team that is about to launch a beta of a python (Django specifically) based web site and accompanying suite of backend tools. The team itself has doubled in size from 2 to 4 over the past few weeks and we expect continued growth for the next couple of months at least. One issue that has started to plague us is getting everyone up to speed in terms of getting their development environment configured and having all the right eggs installed, etc.
I'm looking for ways to simplify this process and make it less error prone. Both zc.buildout and virtualenv look like they would be good tools for addressing this problem but both seem to concentrate primarily on the python-specific issues. We have a couple of small subprojects in other languages (Java and Ruby specifically) as well as numerous python extensions that have to be compiled natively (lxml, MySQL drivers, etc). In fact, one of the biggest thorns in our side has been getting some of these extensions compiled against appropriate versions of the shared libraries so as to avoid segfaults, malloc errors and all sorts of similar issues. It doesn't help that out of 4 people we have 4 different development environments -- 1 leopard on ppc, 1 leopard on intel, 1 ubuntu and 1 windows.
Ultimately what would be ideal would be something that works roughly like this, from the dos/unix prompt:
$ git clone [repository url]
...
$ python setup-env.py
...
that then does what zc.buildout/virtualenv does (copy/symlink the python interpreter, provide a clean space to install eggs) then installs all required eggs, including installing any native shared library dependencies, installs the ruby project, the java project, etc.
Obviously this would be useful for both getting development environments up as well as deploying on staging/production servers.
Ideally I would like for the tool that accomplishes this to be written in/extensible via python, since that is (and always will be) the lingua franca of our team, but I am open to solutions in other languages.
So, my question then is: does anyone have any suggestions for better alternatives or any experiences they can share using one of these solutions to handle larger/broader install bases?

Setuptools may be capable of more of what you're looking for than you realize -- if you need a custom version of lxml to work correctly on MacOS X, for instance, you can put a URL to an appropriate egg inside your setup.py and have setuptools download and install that inside your developers' environments as necessary; it also can be told to download and install a specific version of a dependency from revision control.
That said, I'd lean towards using a scriptably generated virtual environment. It's pretty straightforward to build a kickstart file which installs whichever packages you depend on and then boot virtual machines (or production hardware!) against it, with puppet or similar software doing other administration (adding users, setting up services [where's your database come from?], etc). This comes in particularly handy when your production environment includes multiple machines -- just script the generation of multiple VMs within their handy little sandboxed subnet (I use libvirt+kvm for this; while kvm isn't available on all the platforms you have developers working on, qemu certainly is, or you can do as I do and have a small number of beefy VM hosts shared by multiple developers).
This gets you out of the headaches of supporting N platforms -- you only have a single virtual platform to support -- and means that your deployment process, as defined by the kickstart file and puppet code used for setup, is source-controlled and run through your QA and review processes just like everything else.

I always create a develop.py file at the top level of the project, and have also a packages directory with all of the .tar.gz files from PyPI that I want to install, and also included an unpacked copy of virtualenv that is ready to run right from that file. All of this goes into version control. Every developer can simply check out the trunk, run develop.py, and a few moments later will have a virtual environment ready to use that includes all of our dependencies at exactly the versions the other developers are using. And it works even if PyPI is down, which is very helpful at this point in that service's history.

Basically, you're looking for a cross-platform software/package installer (on the lines of apt-get/yum/etc.) I'm not sure something like that exists?
An alternative might be specifying the list of packages that need to be installed via the OS-specific package management system such as Fink or DarwinPorts for Mac OS X and having a script that sets up the build environment for the in-house code?

I have continued to research this issue since I posted the question. It looks like there are some attempts to address some of the needs I outlined, e.g. Minitage and Puppet which take different approaches but both may accomplish what I want -- although Minitage does not explicitly state that it supports Windows. Lacking any better options I will try to make either one of these or just extensive customized use of zc.buildout work for our needs, but I still feel like there must be better options out there.

You might consider creating virtual machine appliances with whatever production OS you are running, and all of the software dependencies pre-built. Code can be edited either remotely, or with a shared folder. It worked pretty well for me in a past life that had a fairly complicated development environment.

Puppet doesn't (easily) support the Win32 world either. If you're looking for a deployment mechanism and not just a "dev setup" tool, you might consider looking into ControlTier (http://open.controltier.com/) which has a open-source cross-platform solution.
Beyond that you're looking at "enterprise" software such as BladeLogic or OpsWare and typically an outrageous pricetag for the functionality offered (my opinion, obviously).
A lot of folks have been aggressively using a combination of Puppet and Capistrano (even non-rails developers) for deployment automation tools to pretty good effect. Downside, again, is that it's expecting a somewhat homogeneous environment.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.