Override the shebang mangling in python setuptools

Override the shebang mangling in python setuptools - python

Background
I write small python packages for a system that uses modules (https://luarocks.org/) to manage packages. For those of you who don't know it, you can run module load x and a small script is run that modifies various environmental variables to make software 'x' work, you can then undo this with module unload x.
This method of software management is nearly ubiquitous in scientific computing and has a lot of value in that arena: you can run ancient unmaintained software alongside packages that that software would interfere with, you can run multiple versions of software, which allows you to reproduce your data exactly (you can go back to old versions), and you can run frankly poorly written non updated software with outdated dependencies.
These features are great, but they create an issue with the python 2/3 split:
What if you want to write a package that works with both python 2 and 3 and use it alongside software that requires either python 2 or 3?
The way you make old python2 dependent software work on these large systems is that you make a python/2.7.x module and a python/3.5 module. When you want to run a script that uses python 2, you load that module, etc.
However, I want to write a single python package that can work in either environment, because I want that software to be active regardless of which python interpreter is being used.
This is fundamentally extremely easy: just use a #!/usr/bin/env python shebang line, done. That works. I write all my software to work with either, so no problem.
Question
The issue is: I want to use setuptools to distribute my package to other scientists in the same situation, and setup tools mangles the shebang line.
I don't want to get into a debate about whether mangling the shebang line is a good idea or not, I am sure it is since it has existed for years now in the same state. I honestly don't care, it doesn't work for me. The default setuptools install causes the software not to run because when a python interpreter's module is not loaded, that python interpreter does not function, the PYTHONPATH is totally wrong for it.
If all of my users had root access, I could use the data_files option to just copy the scripts to /usr/bin, but this is a bad idea for compatibility, and my users don't have root access anyway so it is a moot point.
Things I tried so far:
I tried setting the sys.executable to /usr/bin/env python in the setup.py file, but that doesn't work, because then the shebang is: #!"/usr/bin/env python", which obviously doesn't work.
I tried the Don't touch my shebang class idea in this question: Don't touch my shebang! (it is the bottom answer with 0 votes). That didn't work either, probably because it is written for distutils and not setuptools. Plus that question is 6 years old.
I also looked at these questions:
Setuptools entry_points/console_scripts have specific Python version in shebang
Changing console_script entry point interpreter for packaging
The methods described there do not work, the shebang line is still altered.
Creating a setup.cfg file with the contents::
[build]
executable = /usr/bin/env python
also does not change the shebang line mangling behavior.
There is an open issue on the setuptools github page that discusses something similar:
https://github.com/pypa/setuptools/issues/494
So I assume this isn't possible to do natively, but I wonder if there is a workaround?
Finally, I don't like any solution that involves asking the user to modify their install flags, e.g. with -e.
Is there anyway to modify this behavior, or is there another distribution system I can use instead? Or is this too much of an edge case and I just need to write some kind of custom installation script?
Thanks all.
Update
I think I was not clear enough in my original question, what I want the user to be able to do is:
Install the package in both python2 and python3 (the modules will go into lib/pythonx/site-lib.
Be able to run the scripts irrespective of which python environment is active.
If there is a way to accomplish this without preventing shebang munging, that would be great.
All my code is already compatible with python 2.7 and python 3.3+ out of the box, the main thing is just making the scripts run irrespective of active python environment.

I accidentally stumbled onto a workaround while trying to write a custom install script.
import os
from setuptools import setup
from setuptools.command.install import install
here = os.path.abspath(os.path.dirname(__file__))
# Generate a list of python scripts
scpts = []
scpt_dir = os.listdir(os.path.join(here, 'bin'))
for scpt in scpt_dir:
scpts.append(os.path.join(here, 'bin', scpt))
class ScriptInstaller(install):
"""Install scripts directly."""
def run(self):
"""Wrapper for parent run."""
super(ScriptInstaller, self).run()
setup(
cmdclass={'install': ScriptInstaller},
scripts=scpts,
...
)
This code doesn't do exactly what I wanted (alter just the shebang line), it actually just copies the whole script to ~/.local/bin, instead of wrapping it in::
__import__('pkg_resources').run_script()
Additionally, and more concerningly, this method makes setuptools create a root module directory plus an egg-info directory like this::
.local/lib/python3.5/site-packages/cluster
.local/lib/python3.5/site-packages/python_cluster-0.6.1-py3.5.egg-info
Instead of a single egg, which is the usual behavior::
.local/lib/python3.5/site-packages/python_cluster-0.6.1-py3.5.egg
As far as I am aware this is the behavior of the old distutils, which makes me worry that this install would fail on some systems or have other unexpected behavior (although please correct me if I am wrong, I really am no expert on this).
However, given that my code is going to be used almost entirely on linux and OS X, this isn't the end of the world. I am more concerned that this behavior will just disappear sometime very soon.
I posted a comment on an open feature request on the setuptools github page:
https://github.com/pypa/setuptools/issues/494
The ideal solution would be if I could add an executable=/usr/bin/env python statement to setup.cfg, hopefully that is reimplemented soon.
This workaround will work for me for now though. Thanks all.

Related

Multiple Python Version Env Setup

I'm working in the VFX industry and we deal with different software packages that ship with their own Python interpreter. We are running on Linux and use modules to handle our environments to make sure that people are using the correct version of all applications depending on the project they are working on.
Since months, we are trying to setup an environment that supports multiple versions of Python. And what is blocking right now are additional Python libraries that we are using in our in-house tools, like sqlalchemy, psycopg2, openpyxl, zmq, etc.
So far, for each project, we have config file that defines the version of each package to be use including the additional Python modules. And to use the correct Python version of each Python module, we look up the main Python interpreter defined in that same modules definition file. This works as long as the major and minor versions of all Python interpreters do line up.
But now, I would like to start an application that ships with a Python 3.7 interpreter and another application with a Python 3.9 interpreter and so on. All applications do you use our in-house tools which need the additional Python modules. Of course, this fails when trying to import any additional module.
For now, the only solution that I see is to install the corresponding Python modules in the 'site-packages' of each application that comes with its own Python interpreter. That should work. But this means that we have to install for each application version all necessary Python modules (ideally, the same version of it to avoid compatibility issues) and when we decide to update one of them, this needs to be done again for all 3rd party applications.
That does not sound super-efficient to me.
Do you have similar experiences and what did you came up with to handle this? I know, that there are more advanced packages like rez to handle complex environment setups, but although I do not know the details of rez, I could imagine that the problems stays the same. I guess that it is not possible to globally populate PYTHONPATH with additional modules so that it works on multiple Python interpreter versions.
Another solution that I could imagine is to make sure that on startup of each application that needs additional Python modules, we do our own sys.path modification depending on the interpreter version. That would imply some coding but we could keep a global version handling without installing them everywhere.
Anyway, if you have any further hints, please let me know.
Greets,
Carlo

Python sys.path vs import

What I wish to understand is what is good/bad practice, and why, when it comes to imports. What I want to understand is the agreed upon view by the community on the matter, if there's any one such in some PEP document or similar.
What I see normally is people have a python environment, use conda/pip to install packages and all that's needed to do in the code is to use "import X" (and variants). In my current understanding this is the right way to do things.
Whenever python interacts with C++ at my firm, though, it always ends up with the need to use sys.path and absolute imports (but we have some standardized paths to use as "base" and usually define relative paths based on those).
There are 2 major cases:
Python wrapper for C++ library (pybind/ctype/etc) - in this case the user python code must use sys.path to specify where the C++ library to import is.
A project that establish communications between python and C++ (say C++ server, python clients, TCP connections and flatbuffer serialization between the two) - Here the python code lives together with the C++ code, and if it some python files end up using sys.path to import python modules from the same project but that live in a different directory - Essentially we deploy the python together with the C++ through our C++ deployment procedure.
I am not fully sure if we can do something better for case #1, but case #2 seems quite unnecessary, and basically forced just by the choice to not deploy the python code through a python package manager. Choice ends up forcing us to use sys.path on both the library and user code.
This seems very bad to me as basically this way of doing things doesn't allow us to fully manage our python environments (since we have some libraries that we import even thought they are not technically installed in the environment), and that is probably why I have a negative view of using sys.path for imports. But I need to find if I'm right, and if so I need some official (or almost) documents to support my case if I'm to propose fixes to our procedures.

For your scenario 2, my understanding is you have some C++ and accompanying python in one place, and a separate python project wants to import that python.
Could you structure the imported python as a package and install it to your environment with pip install path/to/package? If it's a package that you'll continue to edit, you can add the -e flag to pip install so that when the package changes your imports get the latest code.

setuptools "eager_resources" to executable directory

I maintain a Python utility that allows bpy to be installable as a Python module. Due to the hugeness of the spurce code, and the length of time it takes to download the libraries, I have chosen to provide this module as a wheel.
Unfortunately, platform differences and Blender runtime expectations makes support for this tricky at times.
Currently, one of my big goals is to get the Blender addon scripts directory to install into the correct location. The directory (simply named after the version of Blender API) has to exist in the same directory as the Python executable.
Unfortunately the way that setuptools works (or at least the way that I have it configured) the 2.79 directory is not always placed as a sibling to the Python executable. It fails on Windows platforms outside of virtual environments.
However, I noticed in setuptools documentation that you can specify eager_resources that supposedly guarantees the location of extracted files.
https://setuptools.readthedocs.io/en/latest/setuptools.html#automatic-resource-extraction
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction
There was a lot of hand waving and jargon in the documentation, and 0 examples. I'm really confused as to how to structure my setup.py file in order to guarantee the resource extraction. Currently, I just label the whole 2.79 directory as "scripts" in my setuptools Extension and ship it.
Is there a way to write my setup.py and package my module so as to guarantee the 2.79 directory's location is the same as the currently running python executable when someone runs
py -3.6.8-32 -m pip install bpy
Besides simply "hacking it in"? I was considering writing a install_requires module that would simply move it if possible but that is mangling with the user's file system and kind of hacky. However it's the route I am going to go if this proves impossible.
Here is the original issue for anyone interested.
https://github.com/TylerGubala/blenderpy/issues/13
My build process is identical to the process descsribed in my answer here
https://stackoverflow.com/a/51575996/6767685

Maybe try the data_files option of distutils/setuptools.
You could start by adding data_files=[('mydata', ['setup.py'],)], to your setuptools.setup function call. Build a wheel, then install it and see if you can find mydata/setup.py somewhere in your sys.prefix.
In your case the difficult part will be to compute the actual target directory (mydata in this example). It will depend on the platform (Linux, Windows, etc.), if it's in a virtual environment or not, if it's a global or local install (not actually feasible with wheels currently, see update below) and so on.
Finally of course, check that everything gets removed cleanly on uninstall. It's a bit unnecessary when working with virtual environments, but very important in case of a global installation.
Update
Looks like your use case requires a custom step at install time of your package (since the location of the binary for the Python interpreter relative to sys.prefix can not be known in advance). This can not be done currently with wheels. You have seen it yourself in this discussion.
Knowing this, my recommendation would be to follow the advice from Jan Vlcinsky in his comment for his answer to this question:
Post install script after installing a wheel.
Add an extra setuptools console entry point to your package (let's call it bpyconfigure).
Instruct the users of your package to run it immediately after installing your package (pip install bpy && bpyconfigure).
The purpose of bpyconfigure should be clearly stated (in the documentation and maybe also as a notice shown in the console right after starting bpyconfigure) since it would write into locations of the file system where pip install does not usually write.
bpyconfigure should figure out where is the Python interpreter, and where to write the extra data.
The extra data to write should be packaged as package_data, so that it can be found with pkg_resources.
Of course bpyconfigure --uninstall should be available as well!

alternatives to DYLD_LIBRARY_PATH/LD_LIBRARY_PATH

I'm developing python C++ extensions for use in both OSX and linux. Currently, I can run my code with a wrapper script wrapper.sh:
#!/bin/bash
trunk=`dirname $0`
trunk=`cd $trunk; pwd`
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:$trunk/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$trunk/lib/:$trunk/src/hdf5/lib/:$trunk/src/python/lib
$trunk/src/python/bin/python "$#"
which is able to set up my run like this: wrapper.sh app.py
What I would like to do is to eliminate the need for wrapper.sh, so I need alternatives for DYLD_LIBRARY_PATH and LD_LIBRARY_PATH. I can not put my libraries in some standard location like /usr/local/lib because on my machine, I maintain several independent instances of my libraries. That is, my libraries need to be kept somewhere relative to my installation path. I can't put these environment variables in my login script for the same reason. Currently, I need to call one of my wrapper.sh scripts to use the associated libraries. My goal is to be able to run merely app.py, which if it lives in my installation path, should be able to find its associated python and libraries. The purpose is to simplify execution for users, and to simplify usage of external tools like nosetests.
One alternative seems to be using rpath when I build my version of python:
./configure --enable-shared --prefix=$(CURDIR)/$(PYTHON_DIR) LDFLAGS="-Wl,-rpath,$(CURDIR)/lib/ -Wl,-rpath,$(CURDIR)/src/hdf5/lib -Wl,-rpath,$(CURDIR)/src/python/lib"
This trick seems to work fine on linux, even though one of my libraries ended up needing to be copied directly into trunk/src/python/lib/python2.6/lib-dynload for some reason unclear to me. However, this trick is not working on OSX; it looks like I need to run install_name_tool on all my dylibs libraries.
The other alternative I came up with was to do something like this:
ln -s wrapper.sh python
so that my scripts could all use #! ../python, but I'm getting Unmatched ". errors. Same thing if I use #! ../wrapper.sh. I'm not really an expert in bash...
However, these all seem so unnecessarily complicated, and surely this is something that other people have solved?? Thanks for any advice!

For python extensions, consider using PYTHONPATH: the Python interpreter will search the PYTHONPATH for .py/.pyc/.pyo/.so modules, as well as packages. See docs for Python 2.x as well as docs for Python 3.x; specifically the section named "The Module Search Path" on both pages. This also references information that seems to indicate that it is possible to update the module search path at runtime, which, if true, means that you could add all that logic to your program and it can hunt for its libraries on its own (say if it installs a copy in /usr/libexec/pkgname/... somewhere or something).
For all but the most complex of cases, though, setting PYTHONPATH and using a shell-script or native-compiled binary wrapper to start the core program is an okay approach, and one that is also used in other language environments including Mono and Java.

Not sure if this would be an acceptable (partial) solution in your circumstances, but another way to get libraries noticed by ld on linux is to add the path to the libraries to /etc/ld.so.conf and then runldconfig
For the Mac I don't remember the details, but I think Apple provide some resources for distributing apps packaged as a .app which includes some default locations (relative to the root of the .app) for libraries, or "frameworks" as they call them. Would require some googling from there - sorry can't help further on that but hope you get some progress :-)

Is there a standard way to make sure a python script will be interpreted by python2 and not python3?

Is there a standard way to make sure a python script will be interpreted by python2 and not python3? On my distro, I can use #!/usr/bin/env python2 as the shebang, but it seems not all distros ship "python2". I could explicitly call a specific version (eg. 2.6) of python, but that would rule out people who don't have that version.
It seems to me that this is going to be increasingly a problem when distros will start putting python3 as the default python interpreter.

http://docs.python.org/library/sys.html#sys.version_info
using the sys module you can determine the version of python that is running and raise an exception or exit or whatever you like.
UPDATE:
You could use this to call the appropriate interpreter. For example, set up a small script that does the checking for you, and use it in the shbang. It would check the python version running, and if not what you want, looks for one you want. Then it would run the script in that version of python (or fail if nothing good was found).

This is a bit of a messy issue during what will be a very long transition time period. Unfortunately, there is no fool-proof, cross-platform way to guarantee which Python version is being invoked, other than to have the Python script itself check once started. Many, if not most, distributions that ship Python 3 are ensuring the generic python command is aliased by default to the most recent Python 2 version while python3 is aliased to the most recent Python 3. Those distributions that don't should be encouraged to do so. But there is no guarantee that a user won't override that. I think the best practice available for the foreseeable future is to for packagers, distributors, and users to assume python refers to Python 2 and, where necessary, build a run-time check into the script.

Using sys.version_info you can do a simple value test against it. For example if you only want to support version 2.6 or lower:
import sys
if sys.version_info > (2,6):
sys.exit("Sorry, only we only support up to Python 2.6!")

Not quite the same situation, but the company I work for has an app that can run Python scripts (among its many features). After numerous support issues involving Python installations on various platforms, we decided to just install our own Python interpreter with the app. That way we know exactly where it is installed and what version it is. This approach may be too heavyweight for your needs (the Python package is only about 10% of our app's bits) but it definitely works.

Depends on how you're distributing it, I guess.
If you're using a normal setup.py file to manage your distribution, have it bomb out if the user is trying to install it in Python 3.
Once it's installed, the shebang of the console script created by (say) setuptools will likely be linked to the specific interpreter used to install it.
If you're doing something weird for your installation, you can in whatever installation script you're using look for python interpreters and store a choice. You might first check whether whatever is called "python" is a 2.x. If not, check for "python2.7", "python2.6", etc to see what's available.

As I understand different distros will be in different locations in your drive. Here are some suggestions that come to mind -
You could use UNIX alias to create shortcuts pointing to the different distros. Eg: alias py2="/usr/bin/python2.X". So when you run your script you could use py2 xx.py
Or other way could be to modify your PYTHON_PATH environment variable.
Or if I am not wrong there is a provision in sys module to get the current python version number. You could get that & deal appropriately.
This should do it...

You can use the autotools to pick a Python 2 interpreter. Here is how to do that. Guaranteeing a correct shebang may be tricky to do elegantly; here is one way to do that. It may be easier to simply have a light Bash wrapper script, wrapper.sh.in that looks something like:
#!/bin/bash
PYTHON2="#PYTHON#" #That first link enables this autotool variable
"$PYTHON2" "$#" #Call the desired Python 2 script with its arguments
Call wrapper.sh (after a ./configure) like:
./wrapper.sh my_python2_script.py --an_option an_argument

I believe this will do what you want, namely test for a non-specific version of Python less than 3.x (as long as it doesn't contain a from __future__ import print_function statement).
try:
py3 = eval('print')
except SyntaxError:
py3 = False
if py3: exit('requires Python 2')
...
It works by testing to see if print is a built-in function as opposed to a statement, as it is in Python3. When it's not a function, the eval() function will raise an exception, meaning the code is running on a pre-Python 3.0 interpreter with the caveat mentioned above.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.