Managing python project CI/CD with Conda, Poetry and pip - python

I have read through a couple of dozens of writeups on how to equip a modern python project to automate linting, testing, coverage, type checking etc (and eventually deploy it on cloud servers. Not interested in the latter part yet.)
I am thinking about using conda as my environment manager. This would give me the advantage of being able to install non python packages if needed and also by creating the project environment with a specified python version I believe it would replace pyenv.
Within the newly conda created environment I would use poetry to manage dependencies and to initialize its TOML and add other needed python packages which would be installed by pip from PyPI.
Did I get the above right?
To complete the "building" phase I am also looking into using pytest and pytest-cov for code coverage, mypy/pydantic for type checking and black for formatting.
All of this should work both on my local development machine but then when I push on GitHub trigger an Action and perform the same checks, so that any contributor will go through them. Currently managed to do it for pylint, pytest and coverage for very simple projects without requirements.
Does all of this make sense? Am I missing some important component/step? For example I'm trying to understand if tox would help me in this workflow by automating testing on different python versions, but haven't come to grips integrating this flood on (new to me) concepts. Thanks

Related

How to have python libraries already installed in python project?

I am working on a python project that requires a few libraries. This project will further be shared with other people.
The problem I have is that I can't use the usual pip install 'library' as part of my code because the project could be shared with offline computers and the work proxy could block the download.
So what I first thought of was installing .whl files and running pip install 'my_file.whl' but this is limited since some .whl files work on some computers but not on others, so this couldn't be the solution of my problem.
I tried sharing my project with another project and i had an error with a .whl file working on one computer but not the other.
What I am looking for is to have all the libraries I need to be already downloaded before sharing my project. So that when the project is shared, the peers can launch it without needing to download the libraries.
Is this possible or is there something else that can solve my problem ?
There are different approaches to the issue here, depending on what the constraints are:
1. Defined Online Dependencies
It is a good practice to define the dependencies of your project (not only when shared). Python offers different methods for this.
In this scenario every developer has access to a pypi repository via the network. Usually the official main mirrors (i.e. via internet). New packages need to be pulled individually from here, whenever there are changes.
Repository (internet) access is only needed when pulling new packages.
Below the most common ones:
1.1 requirements.txt
The requirements.txt is a plain text list of required packages and versions, e.g.
# requirements.txt
matplotlib==3.6.2
numpy==1.23.5
scipy==1.9.3
When you check this in along with your source code, users can freely decide how to install it. The mosty simple (and most convoluted way) is to install it in the base python environment via
pip install -r requirements.txt
You can even automatically generate such a file, if you lost track with pipreqs. The result is usually very good. However, a manual cleanup afterwards is recommended.
Benefits:
Package dependency is clear
Installation is a one line task
Downsides:
Possible conflicts with multiple projects
Not sure that everyone has the exact same version if flexibility is allowed (default)
1.2 Pipenv
There is a nice and almost complete Answer to Pipenv. Also the Pipenv documentation itself is very good.
In a nutshell: Pipenv allows you to have virtual environments. Thus, version conflicts from different projects are gone for good. Also, the Pipfile used to define such an environment allows seperation of production and development dependencies.
Users now only need to run the following commands in the folder with the source code:
pip install pipenv # only needed first time
pipenv install
And then, to activate the virtual environment:
pipenv shell
Benefits:
Seperation between projects
Seperation of development/testing and production packages
Everyone uses the exact same version of the packages
Configuration is flexible but easy
Downsides:
Users need to activate the environment
1.3 conda environment
If you are using anaconda, a conda environment definition can be also shared as a configuration file. See this SO answer for details.
This scenario is as the pipenv one, but with anaconda as package manager. It is recommended not to mix pip and conda.
1.4 setup.py
When you are anyway implementing a library, you want to have a look on how to configure the dependencies via the setup.py file.
2. Defined local dependencies
In this scenario the developpers do not have access to the internet. (E.g. they are "air-gapped" in a special network where they cannot communicate to the outside world. In this case all the scenarios from 1. can still be used. But now we need to setup our own mirror/proxy. There are good guides (and even comlplete of the shelf software) out there, depending on the scenario (above) you want to use. Examples are:
Local Pypi mirror [Commercial solution]
Anaconda behind company proxy
Benefits:
Users don't need internet access
Packages on the local proxy can be trusted (cannot be corrupted / deleted anymore)
The clean and flexible scenarios from above can be used for setup
Downsides:
Network connection to the proxy is still required
Maintenance of the proxy is extra effort
3. Turn key environments
Last, but not least, there are solutions to share the complete and installed environment between users/computers.
3.1 Copy virtual-env folders
If (and only if) all users (are forced to) use an identical setup (OS, install paths, uses paths, libraries, LOCALS, ...) then one can copy the virtual environments for pipenv (1.2) or conda (1.3) between PCs.
These "pre-compiled" environments are very fragile, as a sall change can cause the setup to malfunction. So this is really not recommended.
Benefits:
Can be shared between users without network (e.g. USB stick)
Downsides:
Very fragile
3.2 Virtualisation
The cleanest way to support this is some kind of virtualisation technique (virtual machine, docker container, etc.).
Install python and the dependencies needed and share the complete container.
Benefits:
Users can just use the provided container
Downsides:
Complex setup
Complex maintenance
Virtualisation layer needed
Code and environment may become convoluted
Note: This answer is compiled from the summary of (mostly my) comments

How to distribute a python project for personal use

So I've been working on a project.
I've reached the stage where everything works.
The only thing that's left for me to do is to deploy/distribute the project so I'll be able to install / run it on my own machines with minimal effort. Since this project is for personal use only, I wasn't planning on distributing it and putting on on PyPi.
So what is the best distribution scheme given this?
Is it to install using distutils and setup.py?
My original plan was to pull the entire project from my git repository on to the host machine and automatically install the dependencies (Which I understood can be done using Pip and a requirement file). Does this make more sense?
I'd appreciate some guidance since I've never done this before.

Post install script after installing a wheel

Using from setuptools.command.install import install, I can easily run a custom post-install script if I run python setup.py install. This is fairly trivial to do.
Currently, the script does nothing but print some text but I want it to deal with system changes that need to happen when a new package is installed -- for example, back up the database that the package is using.
I want to generate the a Python wheel for my package and then copy that and install it on a a set of deployment machines. However, my custom install script is no longer run on the deployment machine.
What am I doing wrong? Is that even possible?
Do not mix package installation and system deployment
Installation of Python packages (using any sort of packaging tools or formats) shall be focused on making that package usable from Python code.
Deployment, what might include database modifications etc. is definitely out of scope and shall be handled by other tools like fab, salt-stack etc.
The fact, that something seems fairly trivial does not mean, one shall do it.
The risk is, you will make your package installation difficult to reuse, as it will be spoiled by others things, which are not related to pure package installation.
The option to hook into installation process and modify environment is by some people even considered flaw in design, causing big mess in Python packaging situation - see Armin Roacher in Python Packaging: Hate, Hate, Hate Everywhere, chapter "PTH: The failed Design that enabled it all"
PEP 427 which specifies the wheel package format does not leave any provisions for custom pre or post installation scripts.
Therefore running a custom script is not possible during wheel package installation.
You'll have to add the custom script to a place in your package where you expect the developer to execute first.

How to easily distribute Python software that has Python module dependencies? Frustrations in Python package installation on Unix

My goal is to distribute a Python package that has several other widely used Python packages as dependencies. My package depends on well written, Pypi-indexed packages like pandas, scipy and numpy, and specifies in the setup.py that certain versions or higher of these are needed, e.g. "numpy >= 1.5".
I found that it's immensely frustrating and nearly impossible for Unix savvy users who are not experts in Python packaging (even if they know how to write Python) to install a package like mine, even when using what are supposed to be easy to use package managers. I am wondering if there is an alternative to this painful process that someone can offer, or if my experience just reflects the very difficult current state of Python packaging and distribution.
Suppose users download your package onto their system. Most will try to install it "naively", using something like:
$ python setup.py install
Since if you google instructions on installing Python packages, this is usually what comes up. This will fail for the vast majority of users, since most do not have root access on their Unix/Linux servers. With more searching, they will discover the "--prefix" option and try:
$ python setup.py install --prefix=/some/local/dir
Since the users are not aware of the intricacies of Python packaging, they will pick an arbitrary directory as an argument to --prefix, e.g. "~/software/mypackage/". It will not be a cleanly curated directory where all other Python packages reside, because again, most users are not aware of these details. If they install another package "myotherpackage", they might pass it "~/software/myotherpackage", and you can imagine how down the road this will lead to frustrating hacking of PYTHONPATH and other complications.
Continuing with the installation process, the call to "setup.py install" with "--prefix" will also fail once users try to use the package, even though it appeared to have been installed correctly, since one of the dependencies might be missing (e.g. pandas, scipy or numpy) and a package manager is not used. They will try to install these packages individually. Even if successful, the packages will inevitably not be in the PYTHONPATH due to the non-standard directories given to "--prefix" and patient users will dabble with modifications of their PYTHONPATH to get the dependencies to be visible.
At this stage, users might be told by a Python savvy friend that they should use a package manager like "easy_install", the mainstream manager, to install the software and have dependencies taken care of. After installing "easy_install", which might be difficult, they will try:
$ easy_install setup.py
This too will fail, since users again do not typically have permission to install software globally on production Unix servers. With more reading, they will learn about the "--user" option, and try:
$ easy_install setup.py --user
They will get the error:
usage: easy_install [options] requirement_or_url ...
or: easy_install --help
error: option --user not recognized
They will be extremely puzzled why their easy_install does not have the --user option where there are clearly pages online describing the option. They might try to upgrade their easy_install to the latest version and find that it still fails.
If they continue and consult a Python packaging expert, they will discover that there are two versions of easy_install, both named "easy_install" so as to maximize confusion, but one part of "distribute" and the other part of "setuptools". It happens to be that only the "easy_install" of "distribute" supports "--user" and the vast majority of servers/sys admins install "setuptools"'s easy_install and so local installation will not be possible. Keep in mind that these distinctions between "distribute" and "setuptools" are meaningless and hard to understand for people who are not experts in Python package management.
At this point, I would have lost 90% of even the most determined, savvy and patient users who try to install my software package -- and rightfully so! They wanted to install a piece of software that happened to be written in Python, not to become experts in state of the art Python package distribution, and this is far too confusing and complex. They will give up and be frustrated at the time wasted.
The tiny minority of users who continue on and ask more Python experts will be told that they ought to use pip/virtualenv instead of easy_install. Installing pip and virtualenv and figuring out how these tools work and how they are different from the conventional "python setup.py" or "easy_install" calls is in itself time consuming and difficult, and again too much to ask from users who just wanted to install a simple piece of Python software and use it. Even those who pursue this path will be confused as to whether whatever dependencies they installed with easy_install or setup.py install --prefix are still usable with pip/virtualenv or if everything needs to be reinstalled from scratch.
This problem is exacerbated if one or more of the packages in question depends on installing a different version of Python than the one that is the default. The difficulty of ensuring that your Python package manger is using the Python version you want it to, and that the required dependencies are installed in the relevant Python 2.x directory and not Python 2.y, will be so endlessly frustrating to users that they will certainly give up at that stage.
Is there a simpler way to install Python software that doesn't require users to delve into all of these technical details of Python packages, paths and locations? For example, I am not a big Java user, but I do use some Java tools occasionally, and don't recall ever having to worry about X and Y dependencies of the Java software I was installing, and I have no clue how Java package managing works (and I'm happy that I don't -- I just wanted to use a tool that happened to be written in Java.) My recollection is that if you download a Jar, you just get it and it tends to work.
Is there an equivalent for Python? A way to distribute software in a way that doesn't depend on users having to chase down all these dependencies and versions? A way to perhaps compile all the relevant packages into something self-contained that can just be downloaded and used as a binary?
I would like to emphasize that this frustration happens even with the narrow goal of distributing a package to savvy Unix users, which makes the problem simpler by not worrying about cross platform issues, etc. I assume that the users are Unix savvy, and might even know Python, but just aren't aware (and don't want to be made aware) about the ins and outs of Python packaging and the myriad of internal complications/rivalries of different package managers. A disturbing feature of this issue is that it happens even when all of your Python package dependencies are well-known, well-written and well-maintained Pypi-available packages like Pandas, Scipy and Numpy. It's not like I was relying on some obscure dependencies that are not properly formed packages: rather, I was using the most mainstream packages that many might rely on.
Any help or advice on this will be greatly appreciated. I think Python is a great language with great libraries, but I find it virtually impossible to distribute the software I write in it (once it has dependencies) in a way that is easy for people to install locally and just run. I would like to clarify that the software I'm writing is not a Python library for programmatic use, but software that has executable scripts that users run as individual programs. Thanks.
We also develop software projects that depend on numpy, scipy and other PyPI packages. Hands down, the best tool currently available out there for managing remote installations is zc.buildout. It is very easy to use. You download a bootstrapping script from their website and distribute that with your package. You write a "local deployment" file, called normally buildout.cfg, that explains how to install the package locally. You ship both the bootstrap.py file and buildout.cfg with your package - we use the MANIFEST.in file in our python packages to force the embedding of these two files with the zip or tar balls distributed by PyPI. When the user unpackages it, it should execute two commands:
$ python bootstrap.py # this will download zc.buildout and setuptools
$ ./bin/buildout # this will build and **locally** install your package + deps
The package is compiled and all dependencies are installed locally, which means that the user installing your package doesn't even need root privileges, which is an added feature. The scripts are (normally) placed under ./bin, so the user can just execute them after that. zc.buildout uses setuptools for interaction with PyPI so everything you expect works out of the box.
You can extend zc.buildout quite easily if all that power is not enough - you create the so-called "recipes" that can help the user to create extra configuration files, download other stuff from the net or instantiate custom programs. zc.buildout website contains a video tutorial that explains in details how to use buildout and how to extend it. Our project Bob makes extensive use of buildout for distributing packages for scientific usage. If you would like, please visit the following page that contains detailed instructions for our developers on how they can setup their python packages so other people can build and install them locally using zc.buildout.
We're currently working to make it easier for users to get started installing Python software in a platform independent manner (in particular see https://python-packaging-user-guide.readthedocs.org/en/latest/future.html and http://www.python.org/dev/peps/pep-0453/)
For right now, the problem with two competing versions of easy_install has been resolved, with the competing fork "distribute" being merged backing into the setuptools main line of development.
The best currently available advice on cross-platform distribution and installation of Python software is captured here: https://packaging.python.org/

Tool (or combination of tools) for reproducible environments in Python

I used to be a java developer and we used tools like ant or maven to manage our development/testing/UAT environments in a standardized way. This allowed us to handle library dependencies, setting OS variables, compiling, deploying, running unit tests, and all the required tasks. Also, the scripts generated guaranteed that all the environments were almost equally configured, and all the task were performed in the same way by all the members of the team.
I'm starting to work in Python now and I'd like your advice in which tools should I use to accomplish the same as described for java.
virtualenv to create a contained virtual environment (prevent different versions of Python or Python packages from stomping on each other). There is increasing buzz from people moving to this tool. The author is the same as the older working-env.py mentioned by Aaron.
pip to install packages inside a virtualenv. The traditional is easy_install as answered by S. Lott, but pip works better with virtualenv. easy_install still has features not found in pip though.
scons as a build tool, although you won't need this if you stay purely Python.
Fabric paste, or paver for deployment.
buildbot for continuous integration.
Bazaar, mercurial, or git for version control.
Nose as an extension for unit testing.
PyFit for FIT testing.
I also work with both java and python.
For python development the maven equivalent is setuptools (http://peak.telecommunity.com/DevCenter/setuptools). For web application development I use this in combination with paster (http://pythonpaste.org/) for the deployment process
Other than easy_install?
For our Linux servers, we use easy_install and yum.
For our Windows development laptops, we use easy_install and a few MSI's for some projects.
Most of the Python libraries we use are source-only, so we can use the same distribution on all boxes. If we could have a network shared device, we'd put them all there. Sadly, our infrastructure is kind of scattered, so we have to either move .TAR files around or redo the installs to rebuild the environments.
In a few cases (e.g., PIL), we have to recompile and check the version numbers.
You will want easy_setup to get the eggs (roughly what Maven calls an artifact).
For setting up your environment, have a look at working-env.py
Python is not compiled but you can put all files for a project in an egg. This is done with setuptools
For CI, check this answer.
We would be remiss not to also mention Paver, which was created by Kevin Dangoor of TurboGears fame. The project is still in alpha, but it appears very promising. A snippet from the project page:
Paver is a Python-based build/distribution/deployment scripting tool along the lines of Make or Rake. What makes Paver unique is its integration with commonly used Python libraries. Common tasks that were easy before remain easy. More importantly, dealing with your applications specific needs and requirements is now much easier.
I do exactly this with a combination of setuptools and Hudson. I know Hudson is a java app, but it can run Python stuff just fine.
You might want to check our Devenv. It allows you to standardize the build environments for development, QA and UAT. It's free as in "free beer".
HTH

Categories

Resources