Distribution format for Python libraries uploaded on PyPI

I went through the tutorial of uploading packages to https://test.pypi.org/ and I was successful in doing so.
However, python setup.py sdist bdist_wheel produces a .whl file and a .tar.gz file in the dist/ directory. twine allows uploading just the .whl file, just the .tar.gz file, or both. I see that many repositories on https://pypi.org/ have uploaded both formats.
I want to understand what the best practice is. Is one format preferred over the other? If the .whl file is enough for distributing my code, should I upload the .tar.gz file too? Or is there anything else I am completely missing here?

Best practice is to provide both.
A "built distribution" (.whl) for users that are able to use that distribution. This saves install time as a "built distribution" is pre-built and can just be dropped into place on the users machine, without any compilation step or without executing setup.py. There may be more than one built distribution for a given release -- once you start including compiled binaries with your distribution, they become platform-specific (see https://pypi.org/project/tensorflow/#files for example)
A "source distribution" (.tar.gz) is essentially a fallback for any user that cannot use your built distribution(s). Source distributions are not "built" meaning they may require compilation to install. At the minimum, they require executing a build-backend (for most projects, this is invoking setup.py with setuptools as the build-backend). Any installer should be able to install from source. In addition, a source distribution makes it easier for users who want to audit your source code (although this is possible with built distributions as well).
For the majority of Python projects, turning a "source distribution" into a "built distribution" results in a single pure-Python wheel (which is indicated by the none-any in a filename like projectname-1.2.3-py2.py3-none-any.whl). There's not much difference between this and the source distribution, but it's still best practice to upload both.
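For illustration, producing and uploading both artifacts with the current build tooling looks roughly like this (the project name and version in the output are placeholders):

python -m pip install --upgrade build twine
python -m build
# dist/projectname-1.2.3.tar.gz
# dist/projectname-1.2.3-py3-none-any.whl
python -m twine upload dist/*

The python setup.py sdist bdist_wheel invocation from the question produces the same pair of files; twine uploads both in one go when pointed at dist/*.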

A .tar.gz is a so-called source distribution. It contains the source code of your package and instructions on how to build it, and the target system will perform the build before installing it.
A .whl (spec details in PEP 427) is a built distribution, which means that the target system doesn't need to build it any more. Installing a wheel usually just means copying its contents into the right site-packages directory.
The wheel sounds downright superior, because it is. It is still best practice to upload both wheels and a source distribution, because any built distribution format only works for a subset of target systems. For a package that contains only Python code, that subset is "everything" -- people still often upload source distributions though, maybe to be forward compatible in case a new standard turns up [1], maybe to anticipate system-specific extensions that will suddenly require source distributions in order to support all platforms, maybe to give users the option to run a custom build with specific build parameters.
A good example package for observing the different cases is numpy, which uploads a whopping 25 wheels to cover the most popular platforms, plus a source distribution. If you install numpy on any of the supported platforms, you'll get a nice short install taking a couple of seconds, where the contents of the wheel are copied over. If you are on an unsupported platform (such as Alpine), a normal computer will probably take at least 20 minutes to build numpy from source before it can be installed, and you need to have all kinds of system-level dev tools for building C extensions installed. A bit of a pain, but still better than not being able to install it at all.
[1] After all, before wheel there was egg, and the adoption of the wheel format would have been a lot harder than it already was if package maintainers had decided to only upload egg distributions and no sources.

You can upload just the .whl file and install it with the command below:

pip install yourpackage.whl


what is use_develop in tox and development mode

I was trying to understand the purpose of use_develop and from the docs, I found this:
Install the current package in development mode with develop mode. For pip this uses -e option, so should be avoided if you’ve specified a custom install_command that does not support -e.
I don't understand what "development mode" means. Is this a python concept or it's specific to tox? Either way what does it mean?
Development mode, or editable installs, is a Python concept, or even more specifically a Python packaging concept.
Usually, when you package a Python application or library, the source files are packaged into a "container", either a wheel or a source distribution.
This is a good thing for distributing a package, but not for developing, as the source files are then no longer directly accessible.
An editable install is the concept that, instead of moving/copying the files into the package container, the files are just symlinked (at least that is one way to do it).
So when you edit the source files, the installed package is updated immediately as well.
For tox this also means that the files in the root of the source tree are importable by Python, not only the ones in the package.
This might be convenient, but there is one huge caveat: if you misconfigure the packaging setup, the tests run by tox may be green, yet it is entirely possible that you forgot to include the source files in the package you deliver to your users.
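For illustration, a minimal tox.ini using the option might look like this (env name, deps, and test command are placeholders for whatever your project uses):

[tox]
envlist = py39

[testenv]
usedevelop = true
deps = pytest
commands = pytest

Outside of tox, the equivalent manual step is pip install -e ., which is exactly the -e option the quoted docs refer to.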

How to edit and use a python package containing an .so file

I have installed a Python package as usual using pip install package_name. It contains the main/most relevant file in the form of a .so extension. I want to MODIFY it and use it for my work. Is it even possible to do that? Is there underlying source code for the .so file, in Python or something else, that comes along with the package, or is it a standalone program?
Go to the site whence pip fetches things, find your package, and follow the link to its source distribution. Building that yourself often requires more tools and expertise than using pip, which is the cost of customization. (The GPL, despite being more restrictive in this peculiar sense than most free licenses, certainly allows merely providing Internet access to the sources for binaries distributed this way.)
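If the package publishes a source distribution on PyPI, one way to fetch and rebuild it is pip itself; a rough sketch, with package_name as a placeholder:

pip download package_name --no-binary :all:   # forces the sdist instead of a wheel
tar xzf package_name-*.tar.gz
# edit the C/C++/Cython sources the .so is built from, then reinstall:
pip install ./package_name-*/

If the project ships no sources at all for the .so, it is compiled machine code and there is nothing practical to edit.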

How to include files downloadable from a server into a pip installation?

I want to create a pip-installable Python package (this is important; we already have a mostly-working easy-install version, but we want to switch to pip), which is essentially a wrapper for some C functions. As I understand it, I cannot count on users having compilers installed (e.g. on Windows), so preferably I would precompile these files and upload them to a server. What I would like is for pip to download a suitable file during the installation (I would prefer if it wasn't necessary for all these files to come shipped with the package). I've tried reading the docs, but failed to find any solutions for my problem there. Is pip able to download a compiled C file from a server during the installation? If so, what is the course of action? Should I perhaps try to include a Python script, to be run at installation, which would determine the OS and the architecture, and then access a specific link?
You are correct in most of your assumptions. You can offer a source distribution, or sdist, which requires build tools on the target machine in order to be installed. It is often uploaded as a fallback for when the platform wheel that you need doesn't exist, or if you want your users to be able to build it themselves.
Speaking of wheels: that is the name of the current standard for binary Python distributions, or bdists. If your package contains code that needs to be compiled, wheels will end up being platform-specific - depending on the build system used, that can be Linux, macOS, or Windows. See for example the sklearn entry on PyPI, which features one wheel per OS for all supported Python versions (plus 32/64-bit support, but that's another story).
If you specify an index (or just a directory that has the wheels in it) from which pip should install, it will automatically make sure that it downloads/installs the correct wheel, which avoids writing platform-specific logic into your source code. The hard part is building the wheels.
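For reference, a minimal setup.py sketch for a package wrapping C functions (all names here are placeholders, not taken from the question); running python setup.py bdist_wheel on each target platform then yields the platform-specific wheels:

from setuptools import setup, Extension

setup(
    name='mypackage',  # placeholder
    version='0.1',
    packages=['mypackage'],
    ext_modules=[
        # compiled at build time, so users installing the resulting
        # wheel never need a compiler themselves
        Extension('mypackage._native', sources=['src/native.c']),
    ],
)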
Related questions:
Pip install and platform specific wheels
How to avoid building C library with my python package? (ends up building platform-specific wheels)

setuptools "eager_resources" to executable directory

I maintain a Python utility that allows bpy to be installable as a Python module. Due to the hugeness of the source code, and the length of time it takes to download the libraries, I have chosen to provide this module as a wheel.
Unfortunately, platform differences and Blender runtime expectations make support for this tricky at times.
Currently, one of my big goals is to get the Blender addon scripts directory to install into the correct location. The directory (simply named after the version of the Blender API) has to exist in the same directory as the Python executable.
Unfortunately, the way that setuptools works (or at least the way that I have it configured), the 2.79 directory is not always placed as a sibling to the Python executable. It fails on Windows platforms outside of virtual environments.
However, I noticed in setuptools documentation that you can specify eager_resources that supposedly guarantees the location of extracted files.
https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-resource-extraction
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction
There was a lot of hand-waving and jargon in the documentation, and zero examples. I'm really confused as to how to structure my setup.py file in order to guarantee the resource extraction. Currently, I just label the whole 2.79 directory as "scripts" in my setuptools Extension and ship it.
Is there a way to write my setup.py and package my module so as to guarantee the 2.79 directory's location is the same as the currently running python executable when someone runs
py -3.6.8-32 -m pip install bpy
Besides simply "hacking it in"? I was considering writing a install_requires module that would simply move it if possible but that is mangling with the user's file system and kind of hacky. However it's the route I am going to go if this proves impossible.
Here is the original issue for anyone interested.
https://github.com/TylerGubala/blenderpy/issues/13
My build process is identical to the process described in my answer here:
https://stackoverflow.com/a/51575996/6767685
Maybe try the data_files option of distutils/setuptools.
You could start by adding data_files=[('mydata', ['setup.py'])], to your setuptools.setup function call. Build a wheel, then install it and see if you can find mydata/setup.py somewhere in your sys.prefix.
In your case the difficult part will be to compute the actual target directory (mydata in this example). It will depend on the platform (Linux, Windows, etc.), on whether it's in a virtual environment or not, on whether it's a global or local install (not actually feasible with wheels currently, see update below), and so on.
Finally, of course, check that everything gets removed cleanly on uninstall. This matters less when working with virtual environments, but is very important in the case of a global installation.
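To make that concrete, a sketch of the starting point (the target directory and file names are hypothetical, chosen only to show the shape of the option):

from setuptools import setup

setup(
    name='bpy',
    version='2.79',
    data_files=[
        # (target directory relative to sys.prefix, [files to copy there])
        ('2.79/scripts', ['scripts/addon_utils.py']),  # hypothetical paths
    ],
)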
Update
Looks like your use case requires a custom step at install time of your package (since the location of the Python interpreter binary relative to sys.prefix cannot be known in advance). This cannot currently be done with wheels. You have seen it yourself in this discussion.
Knowing this, my recommendation would be to follow the advice from Jan Vlcinsky in his comment on his answer to this question:
Post install script after installing a wheel.
Add an extra setuptools console entry point to your package (let's call it bpyconfigure); a sketch of this is shown after this list.
Instruct the users of your package to run it immediately after installing your package (pip install bpy && bpyconfigure).
The purpose of bpyconfigure should be clearly stated (in the documentation, and maybe also as a notice shown in the console right after starting bpyconfigure), since it would write to locations of the file system where pip install does not usually write.
bpyconfigure should figure out where the Python interpreter is and where to write the extra data.
The extra data to write should be packaged as package_data, so that it can be found with pkg_resources.
Of course bpyconfigure --uninstall should be available as well!
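A minimal sketch of such an entry point, with hypothetical module and function names:

from setuptools import setup

setup(
    name='bpy',
    version='2.79',
    packages=['bpy_install'],
    entry_points={
        # exposes a bpyconfigure command that calls bpy_install.post_install:main
        'console_scripts': ['bpyconfigure = bpy_install.post_install:main'],
    },
)

main() would then locate sys.executable, compute the sibling directory, and copy the packaged data there (and undo all of that when invoked with --uninstall).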

How can I build an RPM for an earlier version of python?

How can I build a python distribution RPM that is only dependent on an earlier version of python?
Why? I'm trying to build distribution RPMs for RHEL6/CentOS 6, which only includes Python 2.6, but I am usually building on machines with Python 2.7.
This is an open source project, and I have already ensured that it shouldn't be including any libraries/APIs that are not in 2.6.
I am building the RPMs with:
python setup.py bdist_rpm
setup.py file:
from distutils.core import setup

setup(name='pyresttest',
      version='0.1',
      description='Text',
      maintainer='Not listing here',
      maintainer_email='no,just no',
      url='project url here',
      keywords='rest web http testing',
      packages=['pyresttest'],
      license='Apache License, Version 2.0',
      requires=['yaml', 'pycurl']
      )
(Specifics removed for the url, maintainer, email and description).
The RPM appears to be valid, but when I try to install on RHEL6, I get this error:
python(abi) = 2.7 is needed by pyresttest-0.1-1.noarch
There should be some way to get it to override the default python version to require, or supply a custom SPEC file, but after several hours of fiddling with it, I'm stuck. Ideas?
EDIT: I suppose I should clarify why I'm doing a RPM for python code, instead of just using setuptools or pip: this will hopefully go to production at work, where all deployments are RPM-based and most VMs are still RHEL6. Asking them to adopt another packaging tool is likely to be a non-starter, since our company is closely tied to the RPM format.
Actually, there's no such thing as a plain "rpm-package". There are rpm-packages for RHEL6, rpm-packages for Fedora NN, rpm-packages for OpenSUSE X.Y, and so on. And besides those there are Debian, Ubuntu, Arch, and Gentoo :)
You have the following possibilities with your Python package:
You may completely avoid rpm, deb, and the other "native Linux packaging systems", and opt instead for a "Python-native" packaging system like pip. That way you completely avoid the complexity and the lack of compatibility between packaging systems across the various versions and flavours of Linux. And for a package which doesn't "infiltrate" deeply into the "core system", this could be the best solution.
You may continue to use RPM as an archive format for your package but completely turn off the automatic dependency calculation. This can be done with the AutoReqProv: no directive in the spec. To be able to work with a customized spec, one may use the --spec-only and --spec-file distutils options (a sketch follows this list). But remember that a package built this way is even worse than a zip from point 1: without proper dependencies it contains less of the necessary metainformation and thus "defames" the whole idea behind Linux packaging systems, which were invented to build consistent systems, to avoid problems like "DLL hell", and to be suitable for automatic maintenance and updates. You may of course add the dependency information manually, via the Requires: <something> tag, but this may become even harder and more boring if you target several Linux platforms at once.
In order to take into account all these complex and boring details and nuances of a particular packaging system, you may create "build sandboxes" with appropriate versions of the necessary Linux flavours. My preferred way to create such sandboxes is to use pre-created "OpenVZ templates", but without OpenVZ per se: simply unpack a given archive into a subdirectory (being root, to preserve permissions), then chroot into the subdirectory, and voila! You've got Debian, RHEL, etc. Fedora people have created Mock for the same purposes, and Mock is likely the more elaborate solution. As @BobMcGee suggests in the comments, one may also consider the Jenkins Docker plugin.
Once you have a build sandbox with the Python distribution specific to that system, distutils, etc., you may automate the build process using simple scripting, bash, or Python.
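To make point 2 concrete, here is a sketch of the spec-customization route (the manually added Requires: names are guesses at typical RHEL package names, not verified):

python setup.py bdist_rpm --spec-only   # writes the generated spec under dist/

Then edit the generated spec, e.g. add:

AutoReqProv: no
Requires: PyYAML, python-pycurl

and build the RPM from the customized spec with rpmbuild.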
That's it.
I do not do very much Python work but have done some RPM packaging. You probably need to do what one would normally do in the RPM's spec file and require a particular release of Python, like so:
# this would be in your spec file
Requires: python <= 2.6
Take a look here for more info:
http://ftp.rpm.org/max-rpm/s1-rpm-depend-manual-dependencies.html
