Python sys.path vs import - python

What I wish to understand is what is good/bad practice, and why, when it comes to imports. What I want to understand is the agreed upon view by the community on the matter, if there's any one such in some PEP document or similar.
What I see normally is people have a python environment, use conda/pip to install packages and all that's needed to do in the code is to use "import X" (and variants). In my current understanding this is the right way to do things.
Whenever python interacts with C++ at my firm, though, it always ends up with the need to use sys.path and absolute imports (but we have some standardized paths to use as "base" and usually define relative paths based on those).
There are 2 major cases:
Python wrapper for C++ library (pybind/ctype/etc) - in this case the user python code must use sys.path to specify where the C++ library to import is.
A project that establish communications between python and C++ (say C++ server, python clients, TCP connections and flatbuffer serialization between the two) - Here the python code lives together with the C++ code, and if it some python files end up using sys.path to import python modules from the same project but that live in a different directory - Essentially we deploy the python together with the C++ through our C++ deployment procedure.
I am not fully sure if we can do something better for case #1, but case #2 seems quite unnecessary, and basically forced just by the choice to not deploy the python code through a python package manager. Choice ends up forcing us to use sys.path on both the library and user code.
This seems very bad to me as basically this way of doing things doesn't allow us to fully manage our python environments (since we have some libraries that we import even thought they are not technically installed in the environment), and that is probably why I have a negative view of using sys.path for imports. But I need to find if I'm right, and if so I need some official (or almost) documents to support my case if I'm to propose fixes to our procedures.

For your scenario 2, my understanding is you have some C++ and accompanying python in one place, and a separate python project wants to import that python.
Could you structure the imported python as a package and install it to your environment with pip install path/to/package? If it's a package that you'll continue to edit, you can add the -e flag to pip install so that when the package changes your imports get the latest code.

Related

Multiple Python Version Env Setup

I'm working in the VFX industry and we deal with different software packages that ship with their own Python interpreter. We are running on Linux and use modules to handle our environments to make sure that people are using the correct version of all applications depending on the project they are working on.
Since months, we are trying to setup an environment that supports multiple versions of Python. And what is blocking right now are additional Python libraries that we are using in our in-house tools, like sqlalchemy, psycopg2, openpyxl, zmq, etc.
So far, for each project, we have config file that defines the version of each package to be use including the additional Python modules. And to use the correct Python version of each Python module, we look up the main Python interpreter defined in that same modules definition file. This works as long as the major and minor versions of all Python interpreters do line up.
But now, I would like to start an application that ships with a Python 3.7 interpreter and another application with a Python 3.9 interpreter and so on. All applications do you use our in-house tools which need the additional Python modules. Of course, this fails when trying to import any additional module.
For now, the only solution that I see is to install the corresponding Python modules in the 'site-packages' of each application that comes with its own Python interpreter. That should work. But this means that we have to install for each application version all necessary Python modules (ideally, the same version of it to avoid compatibility issues) and when we decide to update one of them, this needs to be done again for all 3rd party applications.
That does not sound super-efficient to me.
Do you have similar experiences and what did you came up with to handle this? I know, that there are more advanced packages like rez to handle complex environment setups, but although I do not know the details of rez, I could imagine that the problems stays the same. I guess that it is not possible to globally populate PYTHONPATH with additional modules so that it works on multiple Python interpreter versions.
Another solution that I could imagine is to make sure that on startup of each application that needs additional Python modules, we do our own sys.path modification depending on the interpreter version. That would imply some coding but we could keep a global version handling without installing them everywhere.
Anyway, if you have any further hints, please let me know.
Greets,
Carlo

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

I want to know if I can create a python script with a folder in the same directory with all the assets of a python module, so when someone wants to use it, they would not have to pip install module, because it would import from the directory.
Yes, you can, but it doesn't mean that you should.
First, ask yourself who is suposed to use that code.
If you plan to give it to consumers, it would be a good idea to use a tool like py2exe and create executable file which would include all modules and not allow for code to be changed.
If you plan to share it with another developer, you might want to look into virtual environments and requirements.txt file.
There are multiple reasons why sharing modules is bad idea:
It is harder to update modules later, at least without upgrading whole project.
It uses more space on version control, which can create issues on huge projects with hundreds of modules and branches
It might be illegal as some licenses specifically forbid including their code in your source code.
The pip install of some module might do different things depending on operating system version or installed packages. The modules on your machine might be suboptimal on someone else's machine, and in some instances might not even work.
And probably more that I can't think of right now.
The only situation where I saw this being unavoidable was when the module didn't support python implementation the application was running on. The module was changed, and its source was put under lib folder with the rest of the libraries.
I think you can add the directory with python modules into PYTHONPATH. Then people want to use those modules just need has this envvar set.
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH

Override the shebang mangling in python setuptools

Background
I write small python packages for a system that uses modules (https://luarocks.org/) to manage packages. For those of you who don't know it, you can run module load x and a small script is run that modifies various environmental variables to make software 'x' work, you can then undo this with module unload x.
This method of software management is nearly ubiquitous in scientific computing and has a lot of value in that arena: you can run ancient unmaintained software alongside packages that that software would interfere with, you can run multiple versions of software, which allows you to reproduce your data exactly (you can go back to old versions), and you can run frankly poorly written non updated software with outdated dependencies.
These features are great, but they create an issue with the python 2/3 split:
What if you want to write a package that works with both python 2 and 3 and use it alongside software that requires either python 2 or 3?
The way you make old python2 dependent software work on these large systems is that you make a python/2.7.x module and a python/3.5 module. When you want to run a script that uses python 2, you load that module, etc.
However, I want to write a single python package that can work in either environment, because I want that software to be active regardless of which python interpreter is being used.
This is fundamentally extremely easy: just use a #!/usr/bin/env python shebang line, done. That works. I write all my software to work with either, so no problem.
Question
The issue is: I want to use setuptools to distribute my package to other scientists in the same situation, and setup tools mangles the shebang line.
I don't want to get into a debate about whether mangling the shebang line is a good idea or not, I am sure it is since it has existed for years now in the same state. I honestly don't care, it doesn't work for me. The default setuptools install causes the software not to run because when a python interpreter's module is not loaded, that python interpreter does not function, the PYTHONPATH is totally wrong for it.
If all of my users had root access, I could use the data_files option to just copy the scripts to /usr/bin, but this is a bad idea for compatibility, and my users don't have root access anyway so it is a moot point.
Things I tried so far:
I tried setting the sys.executable to /usr/bin/env python in the setup.py file, but that doesn't work, because then the shebang is: #!"/usr/bin/env python", which obviously doesn't work.
I tried the Don't touch my shebang class idea in this question: Don't touch my shebang! (it is the bottom answer with 0 votes). That didn't work either, probably because it is written for distutils and not setuptools. Plus that question is 6 years old.
I also looked at these questions:
Setuptools entry_points/console_scripts have specific Python version in shebang
Changing console_script entry point interpreter for packaging
The methods described there do not work, the shebang line is still altered.
Creating a setup.cfg file with the contents::
[build]
executable = /usr/bin/env python
also does not change the shebang line mangling behavior.
There is an open issue on the setuptools github page that discusses something similar:
https://github.com/pypa/setuptools/issues/494
So I assume this isn't possible to do natively, but I wonder if there is a workaround?
Finally, I don't like any solution that involves asking the user to modify their install flags, e.g. with -e.
Is there anyway to modify this behavior, or is there another distribution system I can use instead? Or is this too much of an edge case and I just need to write some kind of custom installation script?
Thanks all.
Update
I think I was not clear enough in my original question, what I want the user to be able to do is:
Install the package in both python2 and python3 (the modules will go into lib/pythonx/site-lib.
Be able to run the scripts irrespective of which python environment is active.
If there is a way to accomplish this without preventing shebang munging, that would be great.
All my code is already compatible with python 2.7 and python 3.3+ out of the box, the main thing is just making the scripts run irrespective of active python environment.
I accidentally stumbled onto a workaround while trying to write a custom install script.
import os
from setuptools import setup
from setuptools.command.install import install
here = os.path.abspath(os.path.dirname(__file__))
# Generate a list of python scripts
scpts = []
scpt_dir = os.listdir(os.path.join(here, 'bin'))
for scpt in scpt_dir:
scpts.append(os.path.join(here, 'bin', scpt))
class ScriptInstaller(install):
"""Install scripts directly."""
def run(self):
"""Wrapper for parent run."""
super(ScriptInstaller, self).run()
setup(
cmdclass={'install': ScriptInstaller},
scripts=scpts,
...
)
This code doesn't do exactly what I wanted (alter just the shebang line), it actually just copies the whole script to ~/.local/bin, instead of wrapping it in::
__import__('pkg_resources').run_script()
Additionally, and more concerningly, this method makes setuptools create a root module directory plus an egg-info directory like this::
.local/lib/python3.5/site-packages/cluster
.local/lib/python3.5/site-packages/python_cluster-0.6.1-py3.5.egg-info
Instead of a single egg, which is the usual behavior::
.local/lib/python3.5/site-packages/python_cluster-0.6.1-py3.5.egg
As far as I am aware this is the behavior of the old distutils, which makes me worry that this install would fail on some systems or have other unexpected behavior (although please correct me if I am wrong, I really am no expert on this).
However, given that my code is going to be used almost entirely on linux and OS X, this isn't the end of the world. I am more concerned that this behavior will just disappear sometime very soon.
I posted a comment on an open feature request on the setuptools github page:
https://github.com/pypa/setuptools/issues/494
The ideal solution would be if I could add an executable=/usr/bin/env python statement to setup.cfg, hopefully that is reimplemented soon.
This workaround will work for me for now though. Thanks all.

Can an arbitrary Python program have its dependencies inlined?

In the JavaScript ecosystem, "compilers" exist which will take a program with a significant dependency chain (of other JavaScript libraries) and emit a standalone JavaScript program (often, with optimizations applied).
Does any equivalent tool exist for Python, able to generate a script with all non-standard-library dependencies inlined? Are there other tools/practices available for bundling in dependencies in Python?
Since your goal is to be cross-architecture, any Python program which relies on native C modules will not be possible with this approach.
In general, using virtualenv to create a target environment will mean that even users who don't have permission to install new system-level software can install dependencies under their own home directory; thus, what you ask about is not often needed in practice.
However, if you wanted to do things that are consider evil / bad practices, pure-Python modules can in fact be bundled into a script; thus, a tool of this sort would be possible for modules with only native-Python dependencies!
If I were writing such a tool, I might start the following way:
Use pickle to serialize content of modules on the "sending" side
In the loader code, use imp.create_module() to create new module objects, and assign unpickled objects to them.

Some way to create a cross-platform, self-contained, cloud-synchronized python library of modules for personal use? [duplicate]

I need to ship a collection of Python programs that use multiple packages stored in a local Library directory: the goal is to avoid having users install packages before using my programs (the packages are shipped in the Library directory). What is the best way of importing the packages contained in Library?
I tried three methods, but none of them appears perfect: is there a simpler and robust method? or is one of these methods the best one can do?
In the first method, the Library folder is simply added to the library path:
import sys
import os
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
import package_from_Library
The Library folder is put at the beginning so that the packages shipped with my programs have priority over the same modules installed by the user (this way I am sure that they have the correct version to work with my programs). This method also works when the Library folder is not in the current directory, which is good. However, this approach has drawbacks. Each and every one of my programs adds a copy of the same path to sys.path, which is a waste. In addition, all programs must contain the same three path-modifying lines, which goes against the Don't Repeat Yourself principle.
An improvement over the above problems consists in trying to add the Library path only once, by doing it in an imported module:
# In module add_Library_path:
sys.path.insert(0, os.path.join(os.path.dirname(__file__), 'Library'))
and then to use, in each of my programs:
import add_Library_path
import package_from_Library
This way, thanks to the caching mechanism of CPython, the module add_Library_path is only run once, and the Library path is added only once to sys.path. However, a drawback of this approach is that import add_Library_path has an invisible side effect, and that the order of the imports matters: this makes the code less legible, and more fragile. Also, this forces my distribution of programs to inlude an add_Library_path.py program that users will not use.
Python modules from Library can also be imported by making it a package (empty __init__.py file stored inside), which allows one to do:
from Library import module_from_Library
However, this breaks for packages in Library, as they might do something like from xlutils.filter import …, which breaks because xlutils is not found in sys.path. So, this method works, but only when including modules in Library, not packages.
All these methods have some drawback.
Is there a better way of shipping programs with a collection of packages (that they use) stored in a local Library directory? or is one of the methods above (method 1?) the best one can do?
PS: In my case, all the packages from Library are pure Python packages, but a more general solution that works for any operating system is best.
PPS: The goal is that the user be able to use my programs without having to install anything (beyond copying the directory I ship them regularly), like in the examples above.
PPPS: More precisely, the goal is to have the flexibility of easily updating both my collection of programs and their associated third-party packages from Library by having my users do a simple copy of a directory containing my programs and the Library folder of "hidden" third-party packages. (I do frequent updates, so I prefer not forcing the users to update their Python distribution too.)
Messing around with sys.path() leads to pain... The modern package template and Distribute contain a vast array of information and were in part set up to solve your problem.
What I would do is to set up setup.py to install all your packages to a specific site-packages location or if you could do it to the system's site-packages. In the former case, the local site-packages would then be added to the PYTHONPATH of the system/user. In the latter case, nothing needs to changes
You could use the batch file to set the python path as well. Or change the python executable to point to a shell script that contains a modified PYTHONPATH and then executes the python interpreter. The latter of course, means that you have to have access to the user's machine, which you do not. However, if your users only run scripts and do not import your own libraries, you could use your own wrapper for scripts:
#!/path/to/my/python
And the /path/to/my/python script would be something like:
#!/bin/sh
PYTHONPATH=/whatever/lib/path:$PYTHONPATH /usr/bin/python $*
I think you should have a look at path import hooks which allow to modify the behaviour of python when searching for modules.
For example you could try to do something like kde's scriptengine does for python plugins[1].
It adds a special token to sys.path(like "<plasmaXXXXXX>" with XXXXXX being a random number just to avoid name collisions) and then when python try to import modules and can't find them in the other paths, it will call your importer which can deal with it.
A simpler alternative is to have a main script used as launcher which simply adds the path to sys.path and execute the target file(so that you can safely avoid putting the sys.path.append(...) line on every file).
Yet an other alternative, that works on python2.6+, would be to install the library under the per-user site-packages directory.
[1] You can find the source code under /usr/share/kde4/apps/plasma_scriptengine_python in a linux installation with kde.

Categories

Resources