What is the point of built distributions for pure Python packages?

One can share Python as a source distribution (.tar.gz format) or as a built distribution (wheel format).
As I understand it, the point of built distributions is:
Save time: Compilation might be pretty time-consuming. We can do this once on the server and share it for many users.
Reduce requirements: The user does not have to have a compiler installed.
However, those two arguments for bdist files do not seem to hold for pure-Python packages. Still, I see that natsort comes as both an sdist and a bdist. Is there any advantage to sharing a pure-Python package in bdist format?

From pythonwheels.com:
Advantages of wheels
Faster installation for pure Python and native C extension packages.
Avoids arbitrary code execution for installation. (Avoids setup.py)
Installation of a C extension does not require a compiler on Linux, Windows or macOS.
Allows better caching for testing and continuous integration.
Creates .pyc files as part of installation to ensure they match the Python interpreter used.
More consistent installs across platforms and machines.
So for me, the first and second points are the most meaningful for a pure Python package: installing from a wheel is faster and also more secure, since no setup.py is executed.
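To see both artifacts side by side, you can build them yourself. A minimal sketch, assuming the build package is installed and the command is run from the project root (the version number shown is illustrative):

    $ pip install build              # PEP 517 build frontend
    $ python -m build                # writes both distributions into dist/
    $ ls dist/
    natsort-8.4.0-py3-none-any.whl   # built distribution; none-any marks it pure Python
    natsort-8.4.0.tar.gz             # source distribution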

Related

Building a Python package to publish on PyPI

I am greatly confused with the process of building a python package that I want to distribute on pypi.
There are some specific, basic things that I did not understand:
What exactly is it that gets published? Binaries? Source code? How do I do one or the other?
How do I build multiple platform-specific, OS-specific builds from the same codebase?
How do I build the package for multiple versions of Python from the same codebase? Is it necessary if I want to support many Python versions?
I am using a .toml file for the setup configuration.
I found some answers online, but all refer to procedures with either a setup.py or a setup.cfg.
What exactly is it that gets published? Binaries? Source code?
Yes, and yes.
It depends on details of your project and your package config.
Arbitrary commands can be run during a package build.
You might, for example, run a Fortran compiler locally
and ship binaries, or you might insist that each person
installing the package run their own local Fortran compiler.
We usually expect the full *.py source code to appear on pypi.org.
"Binaries" here will usually describe compiled machine code,
and not *.pyc bytecode files.
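Since a wheel is just a zip archive, you can inspect exactly what ships in it. A sketch with a hypothetical project name:

    $ unzip -l dist/yourproject-1.0-py3-none-any.whl   # filename is hypothetical
    # expect plain *.py files plus a yourproject-1.0.dist-info/ directory --
    # no *.pyc files and no compiled machine code for a pure-Python wheel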
How do I build multiple platform-specific, OS-specific builds from the same codebase?
I have only done this via git pull on a target platform, followed
by a local build, but there is certainly support for cross-target
toolchains if you need that.
How do I build the package for multiple versions of Python from the same codebase?
Same as above -- do a build under each separate target version.
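For projects with C extensions that really do need one artifact per interpreter, that loop is just a repeated build. A sketch, assuming each target interpreter is installed and has the build package available:

    $ python3.8 -m build --wheel    # for a project with C extensions, yields a cp38-tagged wheel
    $ python3.11 -m build --wheel   # repeat under each target interpreter
    # a pure-Python project skips this: one py3-none-any wheel covers them all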
Is it necessary if I want to support many Python versions?
Typically the answer is "no".
Pick a minimum required interpreter version, such as 3.7,
and do all your development / testing / release work in that.
Backward compatibility of interpreters is excellent.
Folks running 3.8 or 3.11 should have no trouble
with your package.
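That minimum version is declared in the package metadata, so installers enforce it for you. A sketch of the relevant pyproject.toml lines (project name and version are hypothetical):

    $ cat pyproject.toml
    [project]
    name = "yourproject"
    version = "1.0"
    requires-python = ">=3.7"    # pip refuses to install on older interpreters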
There can be a fly in the ointment.
Suppose your project depends on library X,
or depends on X which depends on Y.
And one of them stopped being updated a few years ago,
or went through a big change like a rename.
Your users who are on 3.11 might find it
inconvenient to obtain a compatible version of X or Y.
This might motivate you to do split releases,
for example via major version number or by
slightly altering your project name.
Clearly you haven't crossed that bridge quite yet.
The poetry ecosystem
is reasonably mature. It has tried to fix many of the
rough edges surrounding the python packaging practices
of the last few decades. I recommend that you prefer
modern over ancient practices, and that you adopt poetry
for your project.
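The day-to-day poetry workflow is short. A sketch, assuming poetry itself is already installed and the project name is hypothetical:

    $ poetry new yourproject    # scaffold a pyproject.toml-based project
    $ poetry build              # produce both an sdist and a wheel in dist/
    $ poetry publish            # upload the artifacts to PyPI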
If that won't fly for some reason, and especially if
binaries are a big deal for your project, consider publishing via
conda.
There are many pip pitfalls with target systems
needing compilers and libraries. Conda does an
excellent job of ensuring that conda install ...
will Just Work.
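From the user's side that reduces to a single command. A sketch with a hypothetical package name:

    $ conda install -c conda-forge yourproject   # prebuilt binaries, no compiler needed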

Distribution format for Python libraries uploaded on PyPI

I went through the tutorial of uploading packages to https://test.pypi.org/ and I was successful in doing so.
However, $ python setup.py sdist bdist_wheel produces a .whl file and a .tar.gz file in the dist/ directory. twine allows uploading just the .whl or .tar.gz file, or both. I see many repositories on https://pypi.org/ have uploaded both formats.
I want to understand what is the best practice. Is one format preferred over the other? If .whl file is enough for distributing my code, should I upload tar.gz file too? Or is there anything else I am completely missing here.
Best practice is to provide both.
A "built distribution" (.whl) for users that are able to use that distribution. This saves install time as a "built distribution" is pre-built and can just be dropped into place on the users machine, without any compilation step or without executing setup.py. There may be more than one built distribution for a given release -- once you start including compiled binaries with your distribution, they become platform-specific (see https://pypi.org/project/tensorflow/#files for example)
A "source distribution" (.tar.gz) is essentially a fallback for any user that cannot use your built distribution(s). Source distributions are not "built" meaning they may require compilation to install. At the minimum, they require executing a build-backend (for most projects, this is invoking setup.py with setuptools as the build-backend). Any installer should be able to install from source. In addition, a source distribution makes it easier for users who want to audit your source code (although this is possible with built distributions as well).
For the majority of Python projects, turning a "source distribution" into a "built distribution" results in a single pure-Python wheel (which is indicated by the none-any in a filename like projectname-1.2.3-py2.py3-none-any.whl). There's not much difference between this and the source distribution, but it's still best practice to upload both.
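Concretely, "provide both" is a two-command workflow. A sketch, assuming the build and twine packages are installed:

    $ python -m build       # writes both the .whl and the .tar.gz into dist/
    $ twine upload dist/*   # uploads every distribution file found there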
A .tar.gz is a so called source distribution. It contains the source code of your package and instructions on how to build it, and the target system will perform the build before installing it.
A .whl file (spec details in PEP 427) is a built distribution format, which means that the target system doesn't need to build it any more. Installing a wheel usually just means copying its contents into the right site-packages.
The wheel sounds downright superior, because it is. It is still best practice to upload both, wheels and a source distribution, because any built distribution format only works for a subset of target systems. For a package that contains only python code, that subset is "everything" - people still often upload source distributions though, maybe to be forward compatible in case a new standard turns up[1], maybe to anticipate system specific extensions that will suddenly require source distributions in order to support all platforms, maybe to give users the option to run a custom build with specific build-parameters.
A good example package to observe the different cases is numpy, which uploads a whopping 25 wheels to cover the most popular platforms, plus a source distribution. If you install numpy on any of the supported platforms, you'll get a nice short install taking a couple of seconds, where the contents of the wheel are copied over. If you are on an unsupported platform (such as Alpine), a normal computer will probably take at least 20 minutes to build numpy from source before it can be installed, and you need to have all kinds of system-level dev tools for building C extensions installed. A bit of a pain, but still better than not being able to install it at all.
[1] After all, before wheel there was egg, and the adoption of the wheel format would have been a lot harder than it already was if package managers had decided to only upload egg distributions and no sources.
You can also upload just the .whl file and install it with the command below (filename illustrative):
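    $ pip install yourproject-1.0-py3-none-any.whl   # filename is illustrative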

Distributing a Python module with ffmpeg dependency

Rookie software developer here. I’m working on a Python module that harnesses some functionality from the FFmpeg framework - specifically, the ebur128 filter function. Ideally the module will stand on its own as an independent, platform agnostic tool for verifying that audio clips comply with EBU loudness standards. It’s being designed so that end users need only perform one simple, (hopefully!) painless installation procedure, which will encompass the installation of both the FFmpeg libraries and my Python wrapper/GUI.
I apologize for the rather vague question, but does anyone have general advice for creating Python module with external dependencies, or specific advice for standardizing the FFmpeg installation across platforms? Distutils seems pretty helpful – are there other guidelines or standard practices for developing a neatly packaged Python tool? I want to minimize any installation headaches for end users.
Thanks very much.
For Windows
I think it will be easy to find ffmpeg binaries that work on any system, just like for Qt or whatever GUI library you are using. You can ship these binaries with your project and things will work (you may want to distinguish 32-bit and 64-bit systems, though).
It looks like you want to create software that is self-contained and easily installable for end users. Inkscape is such an example -- its installer contains Python and all other dependencies, in binary form (if required). That is, for Windows, you do not need to create a real Python package (which would allow installation with pip), and you do not need to look into distutils (which supports building C extensions). Neither of those do you need or want, I guess.
Maybe it will be enough for you to assemble a good directory structure and to distribute a ZIP archive with your software. This is enough if you do not need to interact with the Windows registry, for instance. Such programs are usually called "standalone", in the Windows world. However, you might still want to have a real Windows installer (even if it is just a self-extracting archive). The following article covers your requirements, I believe: http://cyrille.rossant.net/create-a-standalone-windows-installer-for-your-python-application/
It suggests using http://www.jrsoftware.org/isinfo.php for creating such an installer.
Other platforms
On other operating systems it will be more difficult. For instance, I think it will be almost impossible to create ffmpeg binaries that run on every Linux system, because ffmpeg itself has so many binary dependencies. I do not know whether you can statically build ffmpeg at all.

Python: Is there a performance difference between `bdist` and `sdist`?

Python setuptools can create a source distribution:
python setup.py sdist # create a source distribution (tarball, zip file, etc.)
Or a binary distribution:
python setup.py bdist # create a built (binary) distribution
As far as I understand, there should not be any performance difference:
bdist installs the already-compiled .pyc files from the binary package.
sdist compiles the .py files to .pyc files, and installs them.
When executed, it should not matter how the .pyc files were compiled -- they should have the same performance.
Is there any performance difference between bdist and sdist Python packages?
If you have pure Python code, the difference in deployment time will be slim. Note that there is no performance difference between .py and .pyc, except that the latter loads slightly faster the first time. The so-called optimised .pyo files only strip asserts and, optionally, get rid of the docstrings, so they are not very much optimised.
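You can trigger the bytecode step explicitly to see that it is a one-time cost. A sketch using the standard library's compileall module (directory name hypothetical):

    $ python -m compileall yourproject/   # pre-compiles *.py to *.pyc in __pycache__/
    # the interpreter does the same thing automatically on first import,
    # so runtime performance is identical either way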
The big difference comes when you have C files. sdist will include them if properly referenced, but the user will need a working and appropriate compiler, Python header files, and so on. Also, the build will have to run on every client. The same distribution will be valid for any platform you deploy to.
On the other hand, bdist compiles the code once. Installing in the client is immediate, as they don't need to build anything, and easier as they don't require a compiler installed. The downside is that you have to build for that platform. Setuptools is capable of doing cross-compilation, provided you have installed and configured the right tools.

Should I bundle C libraries with my Python application?

If I have a Python package that depends on some C libraries (like say the Gnu Scientific Library (GSL) for numerical computations), is it a good idea to bundle the library with my code?
I'd like to make my package as easy to install as possible for users and I don't want them to have to download C libraries by hand and supply include-paths. Also I could always ensure that the version of the library that I ship is compatible with my code.
However, is it possible that there are clashes if the user has the library installed already, or are there any other reasons why I shouldn't do this?
I know that I can make it easier for users by just providing a binary distribution, but I'd like to avoid having to maintain binary distributions for all possible OSs. So, I'd like to stick to a source distribution, but for the user (who proudly owns a C compiler) installation should be as easy as python setup.py install.
Distribution is one of the hard parts for any software project. Java and .NET lift part of this burden by defining a standard runtime and then just saying "just distribute everything else." Of course there's a drawback: everything must be rewritten in a language supported by the runtime - as soon as you want to use native code, you lose all the advantages.
That's harder in Python, as it is in Ruby, C, C++ and other languages, as they usually leverage existing native libraries.
Generally speaking:
Make it possible to get a source distribution (sdist), via pypi.python.org for example. Correctly set your install_requires (probably you'll require Python bindings for GSL, not GSL itself; see the sketch after this answer). Use the standard setuptools/distribute layout. This will let anyone -- say, a package maintainer for any distro -- pick up your software and package it.
Additionally, consider providing a full-blown installable package for your audience. You don't have to support all the distros and operating systems; pick one or two that you consider will be used most. Tools like PyInstaller will let you create an installable, runnable package for many operating systems, but especially for Linux you might want the user to install the distribution's own version of transitive deps (libgsl?) -- you'll need a full-blown deb or rpm package to satisfy that. Again, don't try to support each and every distro, or you'll go mad. Support the ones you use most, and let other users help you with other packaging needs.
Also take a look at Python Packaging Guide
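For the install_requires point above, a minimal sketch of the file contents; "pygsl" stands in for whichever GSL binding you actually depend on, and the project name is hypothetical:

    $ cat setup.py
    from setuptools import setup, find_packages
    setup(
        name="yourproject",          # hypothetical project name
        version="1.0",
        packages=find_packages(),
        install_requires=["pygsl"],  # the Python binding, not the C library itself
    )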
You could have two separate branches of the src, one containing the libraries and another that doesn't. That way you can explicitly warn your users in case they already have the libraries installed. Another solution (if the licences of the libraries allow it) is to wrap them up in a single file.
I think there's no unique solution, but these are the ideas I could think of so far.
Good luck
You can use virtualenv to create a private Python environment for your application. This avoids conflicts with other libraries. It is best if you package modules and dependencies such as your libraries using Distribute. Distutils is something else that is worth researching.
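A sketch of that isolation, using the standard library's venv module (the modern equivalent of virtualenv); the package name is hypothetical:

    $ python -m venv .venv        # create a private environment for the application
    $ . .venv/bin/activate        # activate it (on Windows: .venv\Scripts\activate)
    $ pip install yourproject     # dependencies now land inside .venv only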
