Why is it recommended to add extra files to a Python source package? - python

There are Python tools like check-manifest that verify that all the files tracked by your VCS are also included in your MANIFEST.in, and release helpers like zest.releaser recommend using them.
I think files in tests or docs directories are never used directly from the Python package. Usually services like Read the Docs or Travis CI are the ones that access those files, and they get them from the VCS, not from the package. I have even seen packages that include .travis.yml files, which makes even less sense to me.
What is the advantage of including all these files in the Python package?

Related

Packaging python resources (Manifest.in vs package_data vs data_files)

It seems that non-Python resources are included in Python distribution packages in one of 4 ways:
A MANIFEST.in file (I'm not sure when this is preferred over package_data or data_files)
package_data in setup.py (for including resources within Python import packages)
data_files in setup.py (for including resources outside Python import packages)
something called setuptools-scm (which I believe uses your version control system to find resources instead of a MANIFEST.in)
Which of these are accessible from importlib.resources?
(It is my understanding that importlib.resources is the preferred way to access such resources.) If any of these are not accessible via importlib.resources, then how could/should one access such resources?
Other people online have been scolded for suggesting the use of __file__ to find the path to a resource, because installed distributions may be stored as zip files, in which case there won't even be a real filesystem path to your resources. When are wheels extracted into site-packages, and when do they remain zipped?
All of (1)-(3) will put files into your package (don't know about (4)).
At runtime, importlib.resources will then be able to access any data in your package.
At least as of Python 3.9, which can access resources in subdirectories.
Before that, you had to make each subdirectory a package by adding an __init__.py.
As for why not to use __file__: Python's import system has some unusual ways to resolve packages. For example, it can look them up in a zip file if you use zipapp.
You may even have a custom loader for a package you are asked to load some resources from.
Who knows where those resources are located? Ans: importlib.resources.
AFAIK, wheels are not a concern here, as they are unpacked at install time.
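As a concrete illustration, here is a minimal sketch of the importlib.resources API. The package name mypkg and its data file are made up on the fly purely to have something to read; in a real project the package would come from your installed distribution (with the data declared via package_data or MANIFEST.in):

```python
import importlib.resources
import pathlib
import sys
import tempfile

# Build a throwaway package with a data file in a subdirectory.
tmp = tempfile.mkdtemp()
pkg = pathlib.Path(tmp) / "mypkg"
(pkg / "data").mkdir(parents=True)  # note: no __init__.py needed in data/
(pkg / "__init__.py").write_text("")
(pkg / "data" / "config.txt").write_text("debug = true\n")
sys.path.insert(0, tmp)

# importlib.resources.files() (Python 3.9+) returns a Traversable that works
# whether the package lives in a plain directory, a zip, or a custom loader.
resource = importlib.resources.files("mypkg") / "data" / "config.txt"
text = resource.read_text()
print(text.strip())
```

Because files() abstracts over the loader, the same read works unchanged if mypkg is later imported from a zip archive.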

Converting a python package into a single importable file

Is there a way to convert a Python package, i.e. a folder of Python files, into a single file that can be copied and then directly imported into a Python script without needing to run any extra shell commands? I know it is possible to zip all of the files and then unzip them from Python when they are needed, but I'm hoping that there is a more elegant solution.
It's not totally clear what the question is. I could interpret it two ways.
If you are looking to manage the symbols from many modules in a more organized way:
You'll want to put an __init__.py file in your directory and make it a package. In it you can define the symbols for your package, and create a graceful import packagename behavior. Details on packages.
If you are looking to make your code portable to another environment:
One way or the other, the package needs to be accessible in whatever environment it is run in. That means it either needs to be installed in the python environment (likely using pip), copied into a location that is in a subdirectory relative to the running code, or in a directory that is listed in the PYTHONPATH environment variable.
The most straightforward way to package up code and make it portable is to use setuptools to create a portable package that can be installed into any Python environment. The manual page for Packaging Projects gives the details of how to go about building a package archive, and optionally uploading it to PyPI for public distribution. If it is for private use, the resulting archive can be passed around without uploading it to the public repository.
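For reference, a minimal setup.py along those lines might look like the following sketch (the project name "mypackage" is a placeholder, not taken from the question):

```python
# setup.py -- minimal sketch; "mypackage" is a hypothetical project name
from setuptools import setup, find_packages

setup(
    name="mypackage",
    version="0.1.0",
    packages=find_packages(),  # picks up every directory with an __init__.py
)
```

Running python setup.py sdist then produces a dist/mypackage-0.1.0.tar.gz that can be pip-installed into any environment, or handed around privately.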

Python compile all modules into a single python file

I am writing a program in Python to be sent to other people, who are running the same Python version; however, there are some 3rd-party modules that need to be installed to use it.
Is there a way to compile it into a .pyc (I only say .pyc because it's a compiled Python file) that has all the dependent modules inside it as well?
So they can run the programme without needing to install the modules separately?
Edit:
Sorry if it wasn't clear, but I am aware of tools such as cx_Freeze, etc.; what I'm trying to get is just a single Python file.
So they can just type "python myapp.py" and then it will run. No installation of anything. As if all the module code were in my .py file.
If you are on Python 2.3 or later and your dependencies are pure Python:
If you don't want to go the setuptools or distutils routes, you can provide a zip file with the .pycs for your code and all of its dependencies. You will have to do a little work to make any complex pathing inside the zip file available (if the dependencies are just lying around at the root of the zip, this is not necessary). Then just add the zip location to your path and it should work just as if the dependency files had been installed.
If your dependencies include .pyds or other binary dependencies you'll probably have to fall back on distutils.
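The pure-Python zip approach can be sketched like this (mypkg and its contents are made up on the fly purely to show the mechanics; in practice you would zip the .py/.pyc files of your real code and its dependencies):

```python
import pathlib
import sys
import tempfile
import zipfile

# Create a tiny stand-in package and put it inside a zip archive.
tmp = pathlib.Path(tempfile.mkdtemp())
pkg = tmp / "mypkg"
pkg.mkdir()
(pkg / "__init__.py").write_text("answer = 42\n")

archive = tmp / "bundle.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.write(pkg / "__init__.py", "mypkg/__init__.py")

# A zip file can sit directly on sys.path; zipimport resolves the rest,
# so the archive never has to be unpacked.
sys.path.insert(0, str(archive))
import mypkg

print(mypkg.answer)
```

No extraction step is needed; the import machinery reads straight out of the archive.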
You can simply include .pyc files for the required libraries, but no: a .pyc cannot work as a container for multiple files (unless you collect all the source into one .py file and then compile it).
It sounds like what you're after is the ability for your end users to run one command, e.g. install my_custom_package_and_all_required_dependencies, and have it assemble everything it needs.
This is a perfect use case for distutils, with which you can make manifests for your own code that link out to external dependencies. If your 3rd party modules are available publicly in a standard format (they should be, and if they're not, it's pretty easy to package them yourself), then this approach has the benefit of allowing you to very easily change what versions of 3rd party libraries your code runs against (see this section of the above linked doc). If you're dead set on packaging others' code with your own, you can always include the required files in the .egg you create with distutils.
Two options:
build a package that will install the dependencies for them (I don't recommend this if the only dependencies are python packages that are installed with pip)
Use virtual environments. You use an existing python on their system but python modules are installed into the virtualenv.
or I suppose you could just punt, and create a shell script that installs them, and tell them to run it once before they run your stuff.

Package python directory for different architectures

I have a personal python library consisting of several modules of scientific programs that I use. These live on a directory with the structure:
root/__init__.py
root/module1/__init__.py
root/module1/someprog.py
root/module1/ (...)
root/module2/__init__.py
root/module2/someprog2.py
root/module2/somecython.pyx
root/module2/somecython.so
root/module2/somefortran.f
root/module2/somefortran.so
(...)
I am constantly making changes to these programs and adding new files. With my current setup at work, I share the same directory with several machines of different architectures. What I want is a way to use these packages from python in the different architectures. If the packages were all pure python, this would be no problem. But the issue is that I have several compiled binaries (as shown in the example) from Cython and from f2py.
Is there a clever way to repackage these binaries so that python in the different systems only imports the relevant binaries? I'd like to keep the code organised in the same directory.
Obviously the simplest way would be to duplicate the directory or create another directory of symlinks. But this would mean that when new files are created, I'd have to update the symlinks manually.
Has anyone bumped into a similar problem, or can suggest a more pythonic approach to this organisation problem?
Probably you should use setuptools/distribute. You can then define a setup.py that compiles all files according to your current platform, copies them to an adequate directory, and makes sure they are available on your sys.path.
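A sketch of such a setup.py, assuming the layout from the question and that the Cython-generated C source is checked in (the f2py module would historically go through numpy.distutils instead, which this sketch leaves out):

```python
# setup.py -- hypothetical sketch; builds the compiled parts per platform
from setuptools import setup, Extension

setup(
    name="root",
    packages=["root", "root.module1", "root.module2"],
    ext_modules=[
        # Each machine compiles its own .so from the same C source, so the
        # shared directory never has to hold a binary for a foreign arch.
        Extension("root.module2.somecython", ["root/module2/somecython.c"]),
    ],
)
```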
You would do the following when you compile Python itself from source:
Pass the --exec-prefix flag to ./configure.
For more info, ./configure --help will give you the following:
Installation directories:
  --prefix=PREFIX         install architecture-independent files in PREFIX
                          [/usr/local]
  --exec-prefix=EPREFIX   install architecture-dependent files in EPREFIX
                          [PREFIX]
Hope this helps :)
There is unfortunately no way to do this with a regular package: it must reside entirely in one directory. PEP 382 proposed support for namespace packages that could be split across directories, but it was rejected; its successor, PEP 420, later added implicit namespace packages in Python 3.3, though those are special packages without __init__.py files and not a general solution here.
Given that python packages have to be in a single directory, it is not possible to mix compiled extension modules for different architectures. There are two ways to mitigate this problem:
Keep binary extensions on a separate directory, and have all the python packages in a common directory that can be shared between architectures. The separate directory for binary extension can then be selected for different architectures with PYTHONPATH.
Keep a common directory with all the python files and extensions for different architectures. For each architecture, create a new directory with the package name. Then symlink all the python files and binaries in each of these directories. This will still allow a single place where the code lives, at the expense of having to create new symlinks for each new file.
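The first option can be wired up at runtime with a few lines; the binaries/<system>-<machine> layout here is just one made-up convention:

```python
import os
import platform
import sys

# Hypothetical layout: pure-Python code in a shared directory, compiled
# extensions in e.g. binaries/linux-x86_64/ or binaries/darwin-arm64/.
arch_dir = os.path.join(
    "binaries", f"{platform.system().lower()}-{platform.machine().lower()}"
)
sys.path.insert(0, arch_dir)  # now 'import somecython' finds this arch's .so
print(arch_dir)
```

The same line could equally live in a sitecustomize.py or be expressed as a per-machine PYTHONPATH entry.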
The option suggested by Thorsten Krans is unfortunately not viable for this problem: using distutils/setuptools/distribute still requires all the Python source files to be installed into a separate directory for each architecture, negating the advantage of having them in a single directory. (This is not a finished package, but a permanent work in progress.)

Why would one use an egg over an sdist?

About the only reason I can think of to distribute a Python package as an egg is so that you can avoid including the .py files with your package (shipping only .pyc files, which is a dubious way to protect your code anyway). Aside from that, I can't really think of any reason to upload a package as an egg rather than an sdist. In fact, pip doesn't even support installing eggs.
Is there any real reason to use an egg rather than an sdist?
One reason: eggs can include compiled C extension modules, so that the end user does not need to have the necessary build tools and possibly additional headers and libraries to build the extension module from scratch. The drawback is that the packager may need to supply multiple eggs to match each targeted platform and Python configuration. If there are many supported configurations, that can prove to be a daunting task, but it can be effective for more homogeneous environments.
