About the only reason I can think of to distribute a Python package as an egg is so that you can avoid including the .py files with your package (shipping only .pyc files, which is a dubious way to protect your code anyway). Aside from that, I can't really think of any reason to upload a package as an egg rather than an sdist. In fact, pip doesn't even support eggs.
Is there any real reason to use an egg rather than an sdist?
One reason: eggs can include compiled C extension modules, so that the end user does not need the build tools (and possibly additional headers and libraries) required to build the extension module from scratch. The drawback is that the packager may need to supply multiple eggs to match each targeted platform and Python configuration. If there are many supported configurations, that can prove to be a daunting task, but it can be effective for more homogeneous environments.
Related
It seems that non-python resources are included in python distribution packages one of 4 ways:
MANIFEST.in file (I'm not sure when this is preferred over package_data or data_files)
package_data in setup.py (for including resources within python import packages)
data_files in setup.py (for including resources outside python import packages)
something called setuptools-scm (which I believe uses your version control system to find resources instead of MANIFEST.in)
Which of these are accessible from importlib.resources?
(It is my understanding that importlib.resources is the preferred way to access such resources.) If any of these are not accessible via importlib.resources, then how could/should one access such resources?
Other people online have been scolded for suggesting the use of __file__ to find the path to a resource, because installed distributions may be stored as zip files, in which case there is no proper filesystem path to your resources at all. When are wheels extracted into site-packages and when do they remain zipped?
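For reference, a minimal sketch of how (1)-(3) are typically declared (the project name, file names and paths below are made up for illustration):

    # setup.py -- hypothetical layout: mypkg/templates/default.html and extra/config.ini
    from setuptools import setup, find_packages

    setup(
        name="mypkg",
        version="0.1",
        packages=find_packages(),
        # (2) package_data: files installed inside the import package
        package_data={"mypkg": ["templates/*.html"]},
        # (3) data_files: files installed outside the import package
        data_files=[("share/mypkg", ["extra/config.ini"])],
        # (1) MANIFEST.in controls what goes into the sdist; with this flag,
        # package files selected there are installed as well
        include_package_data=True,
    )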
All of (1)-(3) will put files into your package (don't know about (4)).
At runtime, importlib.resources will then be able to access any data in your package.
At least with Python 3.9+, which can access resources in subdirectories; before that, you had to make each subdirectory a package of its own by adding an __init__.py.
As for why not to use __file__: Python's import system has some weird ways to resolve packages. For example, it can look them up in a zip file if you use zipapp.
You may even have a custom loader for a package you are asked to load some resources from.
Who knows where those resources are located? Answer: importlib.resources.
AFAIK, wheels are not a contender here, as they are unpacked on installation.
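As a minimal sketch (the package name "mypkg" and the resource path are hypothetical), accessing such a resource with importlib.resources on Python 3.9+ could look like this:

    from importlib import resources

    # files() traverses subdirectories and works whether the package is
    # installed on disk or imported from a zip
    text = resources.files("mypkg").joinpath("data/config.json").read_text()

    # If a real filesystem path is needed (e.g. to pass to a C library),
    # as_file() extracts the resource to a temporary file when necessary
    with resources.as_file(resources.files("mypkg") / "data/config.json") as path:
        print(path)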
I am packaging a Python project and am looking for the "right" way to specify dependencies on external packages. I am aware of multiple ways to specify dependencies, but I would like to know how these ways work in practice, how they differ, and at which point in time each of them takes effect.
Known ways to specify dependencies (see here)
We could specify the dependency in the install_requires argument of setuptools.setup
We could use requirements.txt
We could use setup.cfg
My main questions
What are advantages and disadvantages of the different methods?
When and in which order are the information in the respective files read? Would e.g. pip first execute setup.py and then check the requirements.txt afterwards?
What happens if I specify a requirement in only one of the ways given above? What happens if I specify different requirements in different ways?
Motivating example
I need to create a package that uses cython and numpy. As can be seen e.g. here, cython (and similarly numpy) must be imported before setuptools.setup is called. Hence, setup.py would raise an ImportError if the library is not available. How would I make the requirement visible anyway, so that the necessary packages are installed before setup.py is called? Should I move the compilation to a different file that is then called from setup.py? How would I do that?
These are many broad questions. It is difficult to give an exhaustive answer and to keep opinions aside...
When it comes strictly to packaging Python projects, I believe pip's "requirements" files are not commonly useful nor used (there are counter examples of course, but I would argue that they are not a good idea for packaging in general). But in some other use cases, I believe they are very helpful.
If setuptools is used for packaging (there are perfectly valid alternatives), then install_requires is the way to go to declare the direct dependencies of the project. Unless there is no way around it, I would recommend placing the whole setuptools configuration (including install_requires) in the setup.cfg file instead of passing it as arguments to setuptools.setup(). The result is exactly the same in both cases, but (as is now commonly accepted) a static declarative file is a much more natural fit for configuration than executable code. As for the order of precedence, I believe a setuptools.setup() argument would override the same configuration item read from setup.cfg.
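As a sketch of that recommendation (project name and dependencies are invented), the declarative equivalent of passing install_requires to setuptools.setup() looks roughly like this in setup.cfg, with setup.py reduced to a bare setup() call:

    # setup.cfg
    [metadata]
    name = myproject
    version = 0.1

    [options]
    install_requires =
        requests>=2.20
        numpy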
Would e.g. pip first execute setup.py and then check the requirements.txt afterwards?
These are two different things. The requirements file is more or less a list of projects that one wants to install. Each of these projects might be packaged (with setuptools or something else) and distributed (as source or as a pre-built wheel) in different ways; some will require the execution of a setup.py, some won't. But again, when packaging (and distributing) a project there is usually no need to bother with a requirements file (never when the project is a library, and not at first when the project is an application; see Donald Stufft's article "setup.py vs requirements.txt").
I would rather not go into details regarding the question about the combination of Cython and setuptools. It would be a good additional question. These two links might help get you started though:
https://setuptools.readthedocs.io/en/latest/setuptools.html#distributing-extensions-compiled-with-cython
https://medium.com/@grassfedcode/pep-517-and-518-in-plain-english-47208ca8b7a6
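As a rough sketch of the PEP 518 approach those links describe (the listed build dependencies are only an example), a pyproject.toml lets pip install Cython and numpy before setup.py is ever imported:

    # pyproject.toml
    [build-system]
    requires = ["setuptools", "wheel", "Cython", "numpy"]
    build-backend = "setuptools.build_meta"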
There are Python tools like check-manifest that verify that all the files under your VCS are also included in your MANIFEST.in, and release helpers like zest.releaser recommend using them.
I think files in the tests or docs directories are never used directly from the Python package. Usually services like Read the Docs or Travis CI access those files, and they get them from the VCS, not from the package. I have also seen packages that include .travis.yml files, which makes even less sense to me.
What is the advantage of including all the files in the python package?
My python script runs with several imports. On some systems where it needs to run, some of those modules may not be installed. Is there a way to distribute a standalone script that will automagically work? Perhaps by just including all of those imports in the script itself?
Including all necessary modules in a standalone script is probably extremely tricky and not nice. However, you can distribute modules along with your script (by distributing an archive for example).
Most modules will work if they are just installed in the same folder as your script instead of the usual site-packages. According to the sys.path order, the system's module will be loaded in preference to the one you ship, but if it doesn't exist the latter will be imported transparently.
You can also bundle the dependencies in a zip and add that zip to the path, if you think that approach is cleaner.
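A minimal sketch of that approach (the archive name deps.zip and the imported module name are hypothetical):

    import os
    import sys

    # Append (rather than prepend) the bundled archive so that a system-installed
    # copy still wins and the shipped one is only a fallback, as described above.
    here = os.path.dirname(os.path.abspath(__file__))
    sys.path.append(os.path.join(here, "deps.zip"))

    import some_bundled_module  # hypothetical pure-Python dependency resolved from deps.zip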
However, some modules cannot be that flexible. One example are extensions that must first be compiled (like C extensions) and are thus bound to the platform.
IMHO, the cleanest solution is still to package your script properly using distutils with proper dependency definitions, and to write an installation routine that installs missing dependencies from your bundle or via pip.
You can take a look at Python eggs:
http://mrtopf.de/blog/en/a-small-introduction-to-python-eggs/
I've read a bit about .egg files and I've noticed them in my lib directory, but what are the advantages/disadvantages of using them as a developer?
From the Python Enterprise Application Kit community:
"Eggs are to Pythons as Jars are to Java..."
Python eggs are a way of bundling additional information with a Python project, that allows the project's dependencies to be checked and satisfied at runtime, as well as allowing projects to provide plugins for other projects. There are several binary formats that embody eggs, but the most common is the '.egg' zipfile format, because it's a convenient one for distributing projects. All of the formats support including package-specific data, project-wide metadata, C extensions, and Python code.

The primary benefits of Python Eggs are:

They enable tools like the "Easy Install" Python package manager.

.egg files are a "zero installation" format for a Python package; no build or install step is required, just put them on PYTHONPATH or sys.path and use them (may require the runtime installed if C extensions or data files are used).

They can include package metadata, such as the other eggs they depend on.

They allow "namespace packages" (packages that just contain other packages) to be split into separate distributions (e.g. zope.*, twisted.*, peak.* packages can be distributed as separate eggs, unlike normal packages which must always be placed under the same parent directory; this allows what are now huge monolithic packages to be distributed as separate components).

They allow applications or libraries to specify the needed version of a library, so that you can e.g. require("Twisted-Internet>=2.0") before doing an import twisted.internet.

They're a great format for distributing extensions or plugins to extensible applications and frameworks (such as Trac, which uses eggs for plugins as of 0.9b1), because the egg runtime provides simple APIs to locate eggs and find their advertised entry points (similar to Eclipse's "extension point" concept).

There are also other benefits that may come from having a standardized format, similar to the benefits of Java's "jar" format.

-Adam
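For illustration, the runtime side of this is setuptools' pkg_resources API, which is what eggs rely on for version checks and plugin discovery (the requirement string and entry-point group below are just examples):

    import pkg_resources

    # Fail early if a required distribution (at a minimum version) is missing
    pkg_resources.require("Twisted>=2.0")

    # Discover plugins advertised by installed distributions via entry points
    for entry_point in pkg_resources.iter_entry_points("console_scripts"):
        print(entry_point.name)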
One egg by itself is not better than a proper source release. The good part is the dependency handling: like Debian or RPM packages, you can declare that you depend on other eggs and they'll be installed automatically (through pypi.python.org).
A second comment: the egg format itself is a binary packaged format. Normal Python packages that consist of just Python code are best distributed as "source releases", i.e. "python setup.py sdist", which results in a .tar.gz. These are also commonly called "eggs" when uploaded to PyPI.
Where you need binary eggs: when you're bundling some C code extension. You'll then need several binary eggs (a 32-bit Unix one, a Windows one, etc.).
Eggs are a pretty good way to distribute Python apps. Think of it as a platform-independent .deb file that will install all dependencies and whatnot. The advantage is that it's easy to use for the end user. The disadvantage is that it can be cumbersome to package your app up as a .egg file.
You should also offer an alternative means of installation in addition to .eggs. There are some people who don't like using eggs because they don't like the idea of a software program installing whatever software it wants. These usually tend to be sysadmin types.
.egg files are basically a nice way to deploy your python application. You can think of it as something like .jar files for Java.
More info here.
Whatever you do, do not stop distributing your application also as a tarball, as that is the most easily packageable format for operating systems with a package system.
For simple Python programs, you probably don't need to use eggs. Distributing the raw .py files should suffice; it's like distributing source files for GNU/Linux. You can also use the various OS "packagers" (like py2exe or py2app) to create .exe, .dmg, or other files for different operating systems.
More complex programs, e.g. Django, pretty much require eggs due to the various modules and dependencies required.