Python: Is there a performance difference between `bdist` and `sdist`?

Python setuptools can create a source distribution:
python setup.py sdist # create a source distribution (tarball, zip file, etc.)
Or a binary distribution:
python setup.py bdist # create a built (binary) distribution
As far as I understand, there should not be any performance difference:
bdist installs the already-compiled .pyc files from the binary package.
sdist compiles the .py files to .pyc files, and installs them.
When executed, it should not matter how the .pyc files were compiled - they should have the same performance.
Is there any performance difference between bdist and sdist Python packages?

If you have pure Python code, the difference in deployment time will be slim. Note that there is no difference in performance between .py and .pyc files, except that the latter are read slightly faster the first time. The so-called optimised .pyo files only strip asserts and, optionally, docstrings, so they are not optimised by much.
The big difference comes when you have C files. sdist will include them if properly referenced, but the user will need a working and appropriate compiler, the Python header files, and so on, and the build has to be repeated on every client. On the upside, the same distribution will be valid for any platform you deploy to.
On the other hand, bdist compiles the code once. Installation on the client is immediate, as nothing needs to be built, and easier, since no compiler is required. The downside is that you have to build for each target platform. Setuptools is capable of cross-compilation, provided you have installed and configured the right tools.
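To make the C-file case concrete, here is a minimal setup.py sketch (the package, module, and source file names are hypothetical):
from setuptools import setup, Extension

setup(
    name="mypackage",
    version="1.0",
    # For an sdist, src/fast.c is shipped and compiled on the client;
    # for a bdist, it is compiled once on the build machine.
    ext_modules=[Extension("mypackage.fast", sources=["src/fast.c"])],
)
With this setup.py, python setup.py sdist defers compilation to install time, while python setup.py bdist ships the resulting platform-specific binary.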

Related

Distribution format for Python libraries uploaded on PyPI

I went through the tutorial on uploading packages to https://test.pypi.org/ and I was successful in doing so.
However, python setup.py sdist bdist_wheel produces a .whl file and a .tar.gz file in the dist/ directory. twine allows uploading just the .whl or the .tar.gz file, or both. I see many repositories on https://pypi.org/ have uploaded both formats.
I want to understand what the best practice is. Is one format preferred over the other? If a .whl file is enough for distributing my code, should I upload the .tar.gz file too? Or is there anything else I am completely missing here?
Best practice is to provide both.
A "built distribution" (.whl) for users that are able to use that distribution. This saves install time as a "built distribution" is pre-built and can just be dropped into place on the users machine, without any compilation step or without executing setup.py. There may be more than one built distribution for a given release -- once you start including compiled binaries with your distribution, they become platform-specific (see https://pypi.org/project/tensorflow/#files for example)
A "source distribution" (.tar.gz) is essentially a fallback for any user that cannot use your built distribution(s). Source distributions are not "built" meaning they may require compilation to install. At the minimum, they require executing a build-backend (for most projects, this is invoking setup.py with setuptools as the build-backend). Any installer should be able to install from source. In addition, a source distribution makes it easier for users who want to audit your source code (although this is possible with built distributions as well).
For the majority of Python projects, turning a "source distribution" into a "built distribution" results in a single pure-Python wheel (which is indicated by the none-any in a filename like projectname-1.2.3-py2.py3-none-any.whl). There's not much difference between this and the source distribution, but it's still best practice to upload both.
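Concretely, building and uploading both artifacts takes two commands; a sketch, assuming twine is installed and your PyPI credentials are configured:
python setup.py sdist bdist_wheel   # writes the .tar.gz and the .whl into dist/
twine upload dist/*                 # uploads both to PyPI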
A .tar.gz file is a so-called source distribution. It contains the source code of your package and instructions on how to build it, and the target system will perform the build before installing it.
A .whl file (the wheel format, specified in PEP 427) is a built distribution, which means that the target system doesn't need to build it any more. Installing a wheel usually just means copying its contents into the right site-packages directory.
The wheel sounds downright superior, because it is. It is still best practice to upload both wheels and a source distribution, because any built distribution format only works for a subset of target systems. For a package that contains only Python code, that subset is "everything". People still often upload source distributions though, maybe to be forward-compatible in case a new standard turns up[1], maybe to anticipate system-specific extensions that would suddenly require source distributions in order to support all platforms, maybe to give users the option to run a custom build with specific build parameters.
A good example package to observe the different cases is numpy, which uploads a whopping 25 wheels to cover the most popular platforms, plus a source distribution. If you install numpy on any of the supported platforms, you'll get a nice short install taking a couple of seconds, where the contents of the wheel are copied over. If you are on an unsupported platform (such as Alpine), a normal computer will probably take at least 20 minutes to build numpy from source before it can be installed, and you need all kinds of system-level dev tools for building C extensions. A bit of a pain, but still better than not being able to install it at all.
[1] After all, before wheel there was egg, and the adoption of the wheel format would have been a lot harder than it already was if package managers had decided to only upload egg distributions and no sources.
You can just upload the .whl file; users can then install it with pip (for example, pip install yourpackage.whl).

What is the point of built distributions for pure Python packages?

One can share a Python package as a source distribution (.tar.gz format) or as a built distribution (wheel format).
As I understand it, the point of built distributions is:
Save time: Compilation might be pretty time-consuming. We can do this once on the server and share it for many users.
Reduce requirements: The user does not have to have a compiler installed
However, those two arguments for bdist files seem not to hold for pure-Python packages. Still, I see that natsort comes as both an sdist and a bdist. Is there any advantage to sharing a pure-Python package in bdist format?
From pythonwheels.com:
Advantages of wheels
Faster installation for pure Python and native C extension packages.
Avoids arbitrary code execution for installation. (Avoids setup.py)
Installation of a C extension does not require a compiler on Linux, Windows or macOS.
Allows better caching for testing and continuous integration.
Creates .pyc files as part of installation to ensure they match the Python interpreter used.
More consistent installs across platforms and machines.
So for me, I think the first and second points are most meaningful for a pure Python package. It's smaller, faster and also more secure.
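As a quick check, building a wheel for a pure-Python project is cheap and yields a platform-independent wheel whose filename carries the none-any tag (the project name and version below are hypothetical):
python -m pip install wheel    # the bdist_wheel command requires the wheel package
python setup.py bdist_wheel
# dist/ now contains something like projectname-1.2.3-py3-none-any.whl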

setuptools "eager_resources" to executable directory

I maintain a Python utility that allows bpy to be installable as a Python module. Due to the hugeness of the source code and the length of time it takes to download the libraries, I have chosen to provide this module as a wheel.
Unfortunately, platform differences and Blender runtime expectations makes support for this tricky at times.
Currently, one of my big goals is to get the Blender addon scripts directory to install into the correct location. The directory (simply named after the version of Blender API) has to exist in the same directory as the Python executable.
Unfortunately the way that setuptools works (or at least the way that I have it configured) the 2.79 directory is not always placed as a sibling to the Python executable. It fails on Windows platforms outside of virtual environments.
However, I noticed in setuptools documentation that you can specify eager_resources that supposedly guarantees the location of extracted files.
https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-resource-extraction
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction
There was a lot of hand-waving and jargon in the documentation, and zero examples. I'm really confused as to how to structure my setup.py file in order to guarantee the resource extraction. Currently, I just label the whole 2.79 directory as "scripts" in my setuptools Extension and ship it.
Is there a way to write my setup.py and package my module so as to guarantee the 2.79 directory's location is the same as the currently running python executable when someone runs
py -3.6.8-32 -m pip install bpy
Besides simply "hacking it in"? I was considering writing an install_requires module that would simply move it if possible, but that is meddling with the user's file system and kind of hacky. However, it's the route I am going to take if this proves impossible.
Here is the original issue for anyone interested.
https://github.com/TylerGubala/blenderpy/issues/13
My build process is identical to the process described in my answer here
https://stackoverflow.com/a/51575996/6767685
Maybe try the data_files option of distutils/setuptools.
You could start by adding data_files=[('mydata', ['setup.py'],)], to your setuptools.setup function call. Build a wheel, then install it and see if you can find mydata/setup.py somewhere in your sys.prefix.
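Spelled out as a complete setup.py, that experiment looks something like this (mydata is just a placeholder directory name, and the version is made up):
from setuptools import setup

setup(
    name="bpy",
    version="1.0",
    # Relative paths in data_files are typically resolved against
    # sys.prefix at install time, so setup.py should end up in
    # <sys.prefix>/mydata/.
    data_files=[("mydata", ["setup.py"])],
)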
In your case the difficult part will be to compute the actual target directory (mydata in this example). It will depend on the platform (Linux, Windows, etc.), if it's in a virtual environment or not, if it's a global or local install (not actually feasible with wheels currently, see update below) and so on.
Finally of course, check that everything gets removed cleanly on uninstall. It's a bit unnecessary when working with virtual environments, but very important in case of a global installation.
Update
Looks like your use case requires a custom step at install time of your package (since the location of the binary for the Python interpreter relative to sys.prefix cannot be known in advance). This cannot currently be done with wheels. You have seen it yourself in this discussion.
Knowing this, my recommendation would be to follow the advice from Jan Vlcinsky in his comment for his answer to this question:
Post install script after installing a wheel.
Add an extra setuptools console entry point to your package (let's call it bpyconfigure; see the sketch after this list).
Instruct the users of your package to run it immediately after installing your package (pip install bpy && bpyconfigure).
The purpose of bpyconfigure should be clearly stated (in the documentation and maybe also as a notice shown in the console right after starting bpyconfigure) since it would write into locations of the file system where pip install does not usually write.
bpyconfigure should figure out where the Python interpreter is, and where to write the extra data.
The extra data to write should be packaged as package_data, so that it can be found with pkg_resources.
Of course bpyconfigure --uninstall should be available as well!
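A sketch of how the extra console entry point could be declared; the module path bpy.post_install:main is hypothetical:
from setuptools import setup

setup(
    name="bpy",
    version="1.0",   # placeholder
    packages=["bpy"],
    entry_points={
        "console_scripts": [
            # puts a `bpyconfigure` command on the user's PATH
            "bpyconfigure = bpy.post_install:main",
        ],
    },
)
Inside main(), sys.executable gives the path of the running interpreter, which is exactly the piece of information that cannot be known at wheel build time.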

.pyc files generated in different folder

Friends,
I was trying to create a python distribution. I executed the commands:
python setup.py sdist followed by python setup.py install
but the resulting distribution showed build and dist folders without any .pyc files.
Then I tried to find those files using Windows search and found that they are present in
C:\Python27\Lib\site-packages
Could anybody tell me what mistake I made in the setup, or what I missed?
Thanks in advance,
Saurabh
The sdist command creates a source distribution, which should not and does not contain .pyc files.
The install command installs into your local environment. Creating Python distributions is not its purpose.
If your package is pure Python, then the source distribution is enough to install it anywhere you like; you don't need .pyc files, which depend on the Python version and are thus less general. bdist or bdist_egg (setuptools) generate, among other things, .pyc files.
You won't get any .pyc files until the first time you run the module. A source distribution is just that. Distributing .pyc files is not usually useful anyway; they are not portable. If you intended to only distribute .pyc files, then you are in for a lovely set of problems when different Python versions and different operating systems are used. Always distribute the source.
For most modules, the time taken to generate them the first time they are used is trivial - it is not worth worrying about.
By the way, when you move to Python 3, the .pyc files are stored in a directory called __pycache__ (since 3.2).
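You can trigger the compilation yourself and watch where the files land; a small sketch with a hypothetical module name:
python -m compileall mymodule.py
# Python 2.7: writes mymodule.pyc next to the source file.
# Python 3.2+: writes __pycache__/mymodule.cpython-312.pyc, where the tag
# names the interpreter that compiled it (here CPython 3.12).
Each CPython release also stamps its own magic number into the .pyc header (exposed as importlib.util.MAGIC_NUMBER in Python 3), which is another reason the files are not portable across versions.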

What can be (safely) removed from custom python installation on linux, to make it smaller

For one of my projects at work I needed to create a standalone Python installation (from source). However, the complete directory takes ~90 MB of disk space - not much, but too much to be replicated over and over.
Which files can I remove from the custom python installation directory?
There is a large "test" folder (./lib/python2.7/test), everything is precompiled (but 99% of the modules will not be used in this project), libpython2.7.a is placed twice (./lib and ./lib/python2.7/config), etc.
freeze.py should help you - it's part of the standard installation.
See: Python: Where is freeze.py?
and: http://wiki.python.org/moin/Freeze
and the README: http://svn.python.org/projects/python/trunk/Tools/freeze/README
It tries to only include what is required.
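If freezing is not an option, a cautious manual trim based only on the items already identified in the question might look like this (a sketch; the install prefix is hypothetical, and anything you remove should be re-tested):
import os
import shutil

prefix = "/opt/custom-python"  # hypothetical location of the standalone install

# The stdlib test suite is a big chunk and is not needed at runtime.
shutil.rmtree(os.path.join(prefix, "lib/python2.7/test"), ignore_errors=True)

# libpython2.7.a exists in two places; drop one copy (only needed at all
# if you build C extensions against this installation).
dup = os.path.join(prefix, "lib/python2.7/config/libpython2.7.a")
if os.path.exists(dup):
    os.remove(dup)

# Drop precompiled bytecode; Python regenerates it on first import
# (or recompiles in memory on each run if the tree is read-only).
for root, dirs, files in os.walk(prefix):
    for name in files:
        if name.endswith((".pyc", ".pyo")):
            os.remove(os.path.join(root, name))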
