Setup.py for Utility Python project


Should/does a utility Python project need a setup.py file? My utility project will train a computer vision model with sample images. It depends on a computer vision python module/package. It will be used internally and not publicly distributed.
Is a setup.py file useful or applicable for this kind of Python project?

Yes, it is certainly useful to set up packaging even for internal libraries. You'll want to write a setup.py if you'd like any of the benefits below:
Users to be able to install your stuff with pip install myutility
Different versions, e.g. one user on myutility==1.0.1 and another on myutility==1.2.1
Control / management of dependencies (see install_requires)
Use of continuous integration and software development best practices (e.g. the tests must pass before code is released)
The only time I would consider omitting setup.py is for quick throwaway scripts that have no external dependencies.
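Once a package is installed from its setup.py, pip records its metadata, so code can check at runtime which release is present. This is handy when different users run different versions. Below is a minimal stdlib sketch. It assumes Python 3.8+, where importlib.metadata exists, and "myutility" is the hypothetical distribution name from the list above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        # Reads the metadata that pip wrote when the package was installed.
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "myutility" is a hypothetical distribution name.
    print(installed_version("myutility"))
```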


Building a Python package to publish on PyPI

I am greatly confused by the process of building a Python package that I want to distribute on PyPI.
There are some specific, basic things that I did not understand:
What exactly is it that gets published? Binaries? Source code? How do I do one or the other?
How do I build multiple platform-specific, OS-specific builds from the same codebase?
How do I build the package for multiple versions of Python from the same codebase? Is it necessary if I want to support many Python versions?
I am using a .toml file for the setup configuration.
I found some answers online, but all of them refer to procedures with either a setup.py or a setup.cfg.

What exactly is it that gets published? Binaries? Source code?
Yes, and yes.
It depends on details of your project and your package config.
Arbitrary commands can be run during a package build.
You might, for example, run a fortran compiler locally
and ship binaries, or you might insist that each person
installing the package run their own local fortran compiler.
We usually expect full *.py source code will appear on pypi.org.
"Binaries" here will usually describe compiled machine code,
and not *.pyc bytecode files.
How do I build multiple platform-specific, OS-specific builds from the same codebase?
I have only done this via git pull on a target platform, followed
by a local build, but there is certainly support for cross target
toolchains if you need that.
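The difference between source and binary releases shows up in the wheel filename. Besides the source distribution (sdist, usually a .tar.gz), a release can include wheels. PEP 427 names them name-version(-build)?-pythontag-abitag-platformtag.whl. A pure-Python wheel is tagged none-any and installs everywhere. A compiled wheel carries a specific platform tag, which is why each target platform needs its own build. Here is a minimal parser sketch using only the stdlib. The myutility filenames are hypothetical examples.

```python
from typing import NamedTuple, Optional


class WheelTags(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str

    @property
    def is_pure(self) -> bool:
        # A pure-Python wheel has no ABI dependency and runs on any platform.
        return self.abi == "none" and self.platform == "any"


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into the fields defined by PEP 427."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, ver, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        name, ver, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return WheelTags(name, ver, build, py, abi, plat)


if __name__ == "__main__":
    pure = parse_wheel_filename("myutility-1.0.1-py3-none-any.whl")
    built = parse_wheel_filename("myutility-1.0.1-cp311-cp311-win_amd64.whl")
    print(pure.is_pure, built.is_pure)  # True False
```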
How do I build the package for multiple versions of Python from the same codebase?
Same as above -- do a build under each separate target version.
Is it necessary if I want to support many python versions?
Typically the answer is "no".
Pick a minimum required interpreter version, such as 3.7,
and do all your development / testing / release work in that.
Backward compatibility of interpreters is excellent.
Folks running 3.8 or 3.11 should have no trouble
with your package.
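Packaging tools normally declare the minimum version in metadata: python_requires in setup.py, or requires-python in a pyproject.toml [project] table. pip then refuses to install on older interpreters. A complementary runtime guard might look like the sketch below. The (3, 7) floor simply mirrors the example above and is an assumption, not a recommendation.

```python
import sys
from typing import Tuple

# Hypothetical floor, matching the "pick a minimum such as 3.7" advice.
MINIMUM_PYTHON: Tuple[int, int] = (3, 7)


def check_python(minimum: Tuple[int, int] = MINIMUM_PYTHON,
                 current: Tuple[int, ...] = tuple(sys.version_info[:2])) -> None:
    """Raise RuntimeError if the running interpreter is older than `minimum`."""
    # Tuples compare element by element, so (3, 11) >= (3, 7) as intended.
    if current < minimum:
        found = ".".join(map(str, current))
        wanted = ".".join(map(str, minimum))
        raise RuntimeError(f"Python {wanted}+ required, found {found}")


if __name__ == "__main__":
    check_python()
```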
There can be a fly in the ointment.
Suppose your project depends on library X,
or depends on X which depends on Y.
And one of them stopped being updated a few years ago,
or went through a big change like a rename.
Your users who are on 3.11 might find it
inconvenient to obtain a compatible version of X or Y.
This might motivate you to do split releases,
for example via major version number or by
slightly altering your project name.
Clearly you haven't crossed that bridge quite yet.
The poetry ecosystem
is reasonably mature. It has tried to fix many of the
rough edges surrounding the python packaging practices
of the last few decades. I recommend that you prefer
modern over ancient practices, and that you adopt poetry
for your project.
If that won't fly for some reason, and especially if
binaries are a big deal for your project, consider publishing via
conda.
There are many pip pitfalls with the target system
needing compilers and libraries. Conda does an
excellent job of ensuring that conda install ...
will Just Work.

Python: A standard skeleton seed for initiating standalone projects

In Scala I can use the following command (using the Scala build tool, sbt) to get an initial project with a pretty much standard skeleton:
sbt new scala/scala-seed.g8
This saved me loads of headache when it comes to clean code and initial structure of the project.
I want to achieve the same thing with Python. Is there a way, a “seed” I can use, that pretty much sums up the standard skeleton for a Python project? My criteria are:
Config files: files that capture the dependencies and configure testing, specifically coverage testing and coverage report generation.
Source: the source folder for source files.
Test: For unit, integration and property-based tests.
Manageable build tool: a build tool I can use to generate docs, compile, test, and run.
I also asked myself what the big open source projects in Python look like. None, I mean none, look the same in terms of how the code is structured. I looked at TensorFlow, Scikit, Zulip, and Keras on their GitHub pages.

Cookiecutter should work in this situation. It's a command-line utility that sets up a project from a template. You can install it with pip install --user cookiecutter.
You can use a variety of templates, from a full blown Python package to a minimal pip installable project.
Take a look here in order to see how to set up tests, CI, coverage reports, documentation etc.
Full documentation: https://cookiecutter.readthedocs.io/en/latest/readme.html
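Cookiecutter renders a template directory. The hand-rolled stdlib sketch below only illustrates the kind of layout such templates produce for the criteria in the question: a source folder, a tests folder, and config files. It is not Cookiecutter itself. The layout and file names (src/, tests/, pyproject.toml, the package name) are assumptions, not the output of any particular template.

```python
from pathlib import Path
from typing import List


def make_skeleton(root: Path, package: str) -> List[Path]:
    """Create a minimal src/ + tests/ layout under `root`; return the files created."""
    files = {
        root / "pyproject.toml": f'[project]\nname = "{package}"\nversion = "0.1.0"\n',
        root / "README.md": f"# {package}\n",
        root / "src" / package / "__init__.py": "",
        root / "tests" / "__init__.py": "",
        root / "tests" / f"test_{package}.py": (
            f"import {package}\n\n\ndef test_import():\n    assert {package} is not None\n"
        ),
    }
    created = []
    for path, content in files.items():
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():  # never overwrite existing work
            path.write_text(content)
            created.append(path)
    return created
```

Running it twice is safe. The second call creates nothing because existing files are skipped.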

Post install script after installing a wheel

Using from setuptools.command.install import install, I can easily run a custom post-install script if I run python setup.py install. This is fairly trivial to do.
Currently, the script does nothing but print some text but I want it to deal with system changes that need to happen when a new package is installed -- for example, back up the database that the package is using.
I want to generate a Python wheel for my package, then copy it to a set of deployment machines and install it there. However, my custom install script is no longer run on the deployment machines.
What am I doing wrong? Is that even possible?

Do not mix package installation and system deployment
Installation of Python packages (using any sort of packaging tools or formats) should be focused on making that package usable from Python code.
Deployment, which might include database modifications etc., is definitely out of scope and should be handled by other tools like fab, salt-stack etc.
The fact that something seems fairly trivial does not mean one should do it.
The risk is that you will make your package installation difficult to reuse, as it will be polluted by other things that are not related to pure package installation.
Some people even consider the ability to hook into the installation process and modify the environment a design flaw, one that caused a big mess in the Python packaging situation; see Armin Ronacher in Python Packaging: Hate, Hate, Hate Everywhere, chapter "PTH: The failed Design that enabled it all".

PEP 427, which specifies the wheel package format, makes no provision for custom pre- or post-installation scripts.
Therefore, running a custom script during wheel package installation is not possible.
You'll have to put the custom script in a place in your package that you expect the developer to execute first.
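One way to follow that advice is to expose the setup work as an explicit step that the user runs after pip install, and to make it idempotent with a marker file. This is a minimal sketch. The function name, the marker location, and the backup task are hypothetical. In a real package you might wire run_post_install to a console_scripts entry point so it becomes a command on the deployment machine.

```python
from pathlib import Path
from typing import Callable

# Hypothetical marker recording that the one-time setup already ran.
MARKER_NAME = ".myutility-post-install-done"


def run_post_install(state_dir: Path, task: Callable[[], None]) -> bool:
    """Run `task` once per state_dir; return True if it ran this time."""
    marker = state_dir / MARKER_NAME
    if marker.exists():
        return False
    task()  # e.g. back up the database before the new version is used
    state_dir.mkdir(parents=True, exist_ok=True)
    marker.write_text("done\n")  # only written if task() did not raise
    return True
```

Because the marker is written only after the task succeeds, a failed run is retried the next time the command is executed.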

Distributing python code with virtualenv?

I want to distribute some Python code, with a few external dependencies, to machines with only core Python installed (and users who are unfamiliar with easy_install etc.).
I was wondering if perhaps virtualenv can be used for this purpose? I should be able to write some bash scripts that trigger the virtualenv (with the suitable packages) and then run my code, but this seems somewhat messy, and I'm wondering if I'm reinventing the wheel.
Are there any simple solutions to distributing python code with dependencies, that ideally doesn't require sudo on client machines?

Buildout - http://pypi.python.org/pypi/zc.buildout
As a sample, look at my clean project: http://hg.jackleo.info/hyde-0.5.3-buildout-enviroment/src. It's only 2 files that do the magic. Moreover, the Makefile is optional, but without it you'll need bootstrap.py (the Makefile downloads it, but it runs only on Linux). buildout.cfg is the main file, where you write the dependencies and the configuration of how the project is laid out.
To get bootstrap.py, just download it from http://svn.zope.org/repos/main/zc.buildout/trunk/bootstrap/bootstrap.py
Then run python bootstrap.py and bin/buildout. I do not recommend installing buildout locally, although it is possible; just use the one that bootstrap downloads.
I must admit that buildout is not the easiest solution, but it's really powerful, so learning it is worth the time.
UPDATE 2014-05-30
Since this was recently upvoted and (probably) used as an answer, I want to point out a few changes.
First of all, the buildout bootstrap is now downloaded from GitHub: https://raw.githubusercontent.com/buildout/buildout/master/bootstrap/bootstrap.py
That hyde project would probably fail due to the breaking changes in buildout 2.
You can find better samples here: http://www.buildout.org/en/latest/docs/index.html. I also suggest looking at the "collection of links related to Buildout" part; it might contain info relevant to your project.
Secondly, I am personally more in favor of a setup.py script that can be installed using Python. More about the egg structure can be found here: http://peak.telecommunity.com/DevCenter/PythonEggs. If that looks too scary, search Google for "python egg". In my opinion it's actually simpler than buildout (and definitely easier to debug). It is probably also more useful, since the package can be distributed more easily and installed anywhere, either with the help of virtualenv or globally, whereas with buildout you have to ship all of the build scripts with the source all of the time.

You can use a tool like PyInstaller for this purpose. Your application will appear as a single executable on all platforms, and include dependencies. The user doesn't even need Python installed!
See as an example my logview package, which has dependencies on PyQt4 and ZeroMQ and includes distributions for Linux, Mac OSX and Windows all created using PyInstaller.

You don't want to distribute your virtualenv, if that's what you're asking. But you can use pip to create a requirements file - typically called requirements.txt - and tell your users to create a virtualenv then run pip install -r requirements.txt, which will install all the dependencies for them.
See the pip docs for a description of the requirements file format, and the Pinax project for an example of a project that does this very well.
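That workflow can be scripted with the standard-library venv module instead of shipping the virtualenv itself. Below is a minimal sketch. It assumes Python 3.3+ for venv and, when a requirements file is given, that the target machine can reach a package index. The function names and directory layout are illustrative. No sudo is needed as long as env_dir is writable by the user.

```python
import os
import subprocess
import venv
from pathlib import Path
from typing import Optional


def env_python(env_dir: Path) -> Path:
    """Path of the interpreter inside a virtual environment."""
    if os.name == "nt":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def bootstrap_env(env_dir: Path, requirements: Optional[Path] = None) -> Path:
    """Create a venv at env_dir and optionally pip-install a requirements file."""
    # pip is only bootstrapped (via ensurepip) when there is something to install.
    venv.create(env_dir, with_pip=requirements is not None)
    python = env_python(env_dir)
    if requirements is not None:
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

Users then start the program with the returned interpreter, for example env/bin/python main.py on Linux.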

Tool (or combination of tools) for reproducible environments in Python

I used to be a Java developer, and we used tools like Ant or Maven to manage our development/testing/UAT environments in a standardized way. This allowed us to handle library dependencies, set OS variables, compile, deploy, run unit tests, and perform all the other required tasks. Also, the generated scripts guaranteed that all the environments were configured almost identically, and that all the tasks were performed in the same way by all the members of the team.
I'm starting to work in Python now, and I'd like your advice on which tools I should use to accomplish the same things described above for Java.

virtualenv to create a contained virtual environment (preventing different versions of Python or Python packages from stomping on each other). There is increasing buzz from people moving to this tool. Its author also wrote the older working-env.py mentioned by Aaron.
pip to install packages inside a virtualenv. The traditional tool is easy_install, as answered by S. Lott, but pip works better with virtualenv. easy_install still has some features not found in pip, though.
scons as a build tool, although you won't need this if you stay purely Python.
Fabric, Paste, or Paver for deployment.
buildbot for continuous integration.
Bazaar, mercurial, or git for version control.
Nose as an extension for unit testing.
PyFit for FIT testing.
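Whichever of these tools you pick, reproducible environments come down to recording exact versions, as pip freeze does. Here is a stdlib-only sketch of the same idea. It assumes Python 3.8+ for importlib.metadata and lists what is installed in the current environment. It is not a replacement for pip freeze; for example, it does not handle editable installs or direct URLs.

```python
from importlib.metadata import distributions
from typing import List


def freeze() -> List[str]:
    """Return sorted 'name==version' pins for the current environment."""
    pins = set()
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs that have no Name field
            pins.add(f"{name}=={dist.version}")
    return sorted(pins, key=str.lower)


if __name__ == "__main__":
    print("\n".join(freeze()))
```

Saving this output as requirements.txt on one machine and installing it elsewhere with pip install -r gives every team member the same package versions.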

I also work with both Java and Python.
For Python development, the Maven equivalent is setuptools (http://peak.telecommunity.com/DevCenter/setuptools). For web application development I use it in combination with paster (http://pythonpaste.org/) for the deployment process.

Other than easy_install?
For our Linux servers, we use easy_install and yum.
For our Windows development laptops, we use easy_install and a few MSI's for some projects.
Most of the Python libraries we use are source-only, so we can use the same distribution on all boxes. If we could have a network shared device, we'd put them all there. Sadly, our infrastructure is kind of scattered, so we have to either move .TAR files around or redo the installs to rebuild the environments.
In a few cases (e.g., PIL), we have to recompile and check the version numbers.

You will want easy_install to get the eggs (roughly what Maven calls an artifact).
For setting up your environment, have a look at working-env.py.
Python is not compiled, but you can put all the files for a project in an egg. This is done with setuptools.
For CI, check this answer.

We would be remiss not to also mention Paver, which was created by Kevin Dangoor of TurboGears fame. The project is still in alpha, but it appears very promising. A snippet from the project page:
Paver is a Python-based build/distribution/deployment scripting tool along the lines of Make or Rake. What makes Paver unique is its integration with commonly used Python libraries. Common tasks that were easy before remain easy. More importantly, dealing with your application's specific needs and requirements is now much easier.

I do exactly this with a combination of setuptools and Hudson. I know Hudson is a java app, but it can run Python stuff just fine.

You might want to check out our Devenv. It allows you to standardize the build environments for development, QA and UAT. It's free as in "free beer".
HTH

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.