The first entry of sys.path is the directory of the current script, according to the docs. In the following setup, I would like to change this default. Imagine the following directory structure:
src/
    core/
        stuff/
        tools/
            tool1.py
            tool2.py
    gui/
        morestuff/
        gui.py
The scripts tool*.py and gui.py are intended to be run as scripts, like the following:
python src/core/tools/tool2.py
python src/gui/gui.py
Now all tools import from src.core.stuff, and the GUI needs gui.morestuff. This means that sys.path[0] should point to src/, but it points to src/core/tools/ or src/gui/ by default.
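For instance, tool2.py contains an import along these lines (names illustrative):

from core.stuff import helpers  # resolves only if src/ is on sys.path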
I can adjust sys.path[0] in every script, e.g. with a construct like the following at the beginning of gui.py:

import os.path
import sys

if __name__ == '__main__':
    if sys.path[0]:
        sys.path[0] = os.path.dirname(os.path.abspath(sys.path[0]))
However, this is sort of redundant, and it becomes tedious for a mature code base with thousands of scripts. I also know about the -m switch:
python -m gui.gui
But this requires the current directory to be src/.
Is there a better way to achieve the desired result, e.g. by modifying the __init__.py files?
EDIT: This is for Python 2.7:
~$ python -V
Python 2.7.3
The only officially approved way to run a script that is in a package is by using the -m flag. While you could run a script directly and try to do sys.path manipulations yourself in each script, it's likely to be a big pain. If you move a script between folders, the logic for rewriting sys.path may also need to be changed to reflect the new location. Even if you get sys.path right, explicit relative imports will not work correctly.
Now, making python -m mypackage.mymodule work requires that either you be in the project's top level folder (src in your case), or for that top level folder to be on the Python search path. Requiring you to be in a specific folder is awkward, and you've said that you don't want that. Getting src into the search path is our goal then.
I think the best approach is to use the PYTHONPATH environment variable to point the interpreter to your project's src folder so that it can find your packages from anywhere.
This solution is simple to set up (the environment variable can be set automatically in your .profile, .bashrc or some other equivalent place), and will work for any number of scripts. If you move your project, just update your environment settings and you'll be all set, without needing to do any more work for each script.
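For example, a single line in your .bashrc would cover the whole project (the path is illustrative):

export PYTHONPATH=/path/to/project/src

With that in place, python -m core.tools.tool2 and python -m gui.gui work from any directory.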
You've got three basic options here. I've been through all three in both a production environment and personal projects. In many ways they build on each other. However, my advice is to just skip to the last one.
The fundamental problem is that you need your ./src directory to be in the python search path. This is really what python packaging is all about.
PYTHONPATH
The most straightforward, user-defined way to adjust your python path is through the PYTHONPATH environment variable. You can set it at run time, doing something like:
PYTHONPATH=./src python src/gui/gui.py
You can of course also set this up in your global environment so hopefully all processes that need it will find the correct PYTHONPATH. But, just remember, you'll always forget one. Usually at 3 AM when your cron task finally runs.
Site Packages
To avoid needing an environment variable, your options are pretty much to include your software in an existing entry on the search path, or find some additional way to add a new search path. So this can mean dropping the contents of your src directory into /usr/lib/python2.7/site-packages or wherever your system's site-packages is located.
Since you may not want to actually include the code in site-packages, you can create a symlink for your two sub-packages.
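For example (paths illustrative; the exact site-packages location varies by system):

$ ln -s /path/to/project/src/core /usr/lib/python2.7/site-packages/core
$ ln -s /path/to/project/src/gui /usr/lib/python2.7/site-packages/gui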
This is of course less than ideal for a number of reasons. If you're not careful with naming, then suddenly every python program on the machine is exposed to potential name conflicts. You're exposing your software to every user on the machine. You might run into issues if python gets updated. And if you add a new sub-package, you have to create a new symlink.
A slightly better approach is to include a .pth file somewhere in your site-packages. When python encounters these files, it adds the contents (which is supposed to be the name of a directory) to the search path. This avoids the problem of having to remember to add a new symlink for each new sub-package.
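A sketch of the idea (the file name is arbitrary; paths illustrative):

$ echo /path/to/project/src > /usr/lib/python2.7/site-packages/myproject.pth

Now every interpreter start-up adds /path/to/project/src to the search path, and new sub-packages under src are picked up with no extra work.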
virtualenv and packaging
The best solution is to just bite the bullet and do real python packaging. This, combined with great tools like virtualenv and pip, lets you have an isolated (or semi-isolated) python environment.
Under virtualenv, you would have a custom site-packages just for your project, into which you can easily install your software, avoiding all the problems of the earlier solutions. virtualenv also makes it easy to maintain executable scripts, so that the python environment they run under is exactly the one you expect.
The one downside is that you have to write and maintain a setup.py which will instruct pip (the python installer) to include your software in the virtualenv. The contents would be something like:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from distutils.core import setup

setup(
    name='myproject',
    package_dir={'myproject': 'src'},
    scripts=['src/gui/gui.py', 'src/core/tools/tool1.py', 'src/core/tools/tool2.py'],
)
So, to set up this environment, it's going to look something like this:
virtualenv env
env/bin/pip install -e .
To run your script, then you'd just do something like:
env/bin/tool1.py
"I wanted to do this to avoid having to set PYTHONPATH in the first place"
There are other places you can hook into Python's sys.path initialization, using the site module, which is (by default) automatically imported when Python initializes.
Based on this code in site.py...
# Prefixes for site-packages; add additional prefixes like /usr/local here
PREFIXES = [sys.prefix, sys.exec_prefix]
...it looks as if this file was designed to be modified after installation, which is one option, although it also provides other ways to influence sys.path, e.g. by placing a .pth file somewhere inside your site-packages directory.
Assuming the desired result is to make the code work 'out of the box', this would work, but only for all users on a single system.
If you need it to work on multiple systems, then you'd have to apply the same changes to all systems.
For deployment, this is no big deal. Indeed, many Python packages already do something like this. e.g. on Ubuntu...
~$ dpkg -L python-imaging | grep pth
/usr/share/pyshared/PIL.pth
/usr/lib/python2.7/dist-packages/PIL.pth
...but if your intention is to make it easy for multiple concurrent developers, each using their own system, you may be better off sticking with the current option of adding some 'boilerplate' code to every Python module which is intended to be run as a script.
There may be another option, but it depends on exactly what you're trying to achieve.
Related
I have a python project with multiple scripts (scriptA, scriptB, scriptC) that must be able to find packages located in a subpath of the project that is not a python package or module. The organization is like so:
|_project
  |_scriptA.py
  |_scriptB.py
  |_scriptC.py
  |_thrift_packages
    |_gen-py
      |_thriftA
        |__init__.py
        |_thriftA.py
On a per-script basis, I am adding the absolute path to this directory to sys.path.
Is there a way that I can alter the PYTHONPATH or sys.path every time a script is executed within the project so that I do not have to append the path to this directory to sys.path on a per-script basis?
You have an XY problem, albeit an understandable one since the "proper" way to develop a Python project is not obvious, and there aren't a lot of good guides to getting started (there are also differing opinions on this, especially in the details, though I think what I'll describe here is fairly commonplace).
First, at the root of your project, you can create a setup.py. These days this file can just be a stub; eventually the need for it should go away entirely, but some tools still require it:
$ cat setup.py
#!/usr/bin/env python
from setuptools import setup
setup()
Then create a setup.cfg. For most Python projects this should be sufficient--you only need to put additional code in setup.py for special cases. Here's a template for a more sophisticated setup.cfg, but for your project you need at a minimum:
$ cat setup.cfg
[metadata]
name = project
version = 0.1.0

[options]
package_dir =
    =thrift_packages/gen-py
packages = find:
Now create and activate a virtual environment for your project (going in-depth into virtual environments will be out of scope for this answer but I'm happy to answer follow-up questions).
$ mkdir -p ~/.virtualenvs
$ python3 -m venv ~/.virtualenvs/thrift
$ source ~/.virtualenvs/thrift/bin/activate
Now you should have a prompt that looks something like (thrift) $, indicating that you're in an isolated virtualenv. Here you can install any dependencies for your package, as well as the package itself, without interfering with your main Python installation.
Now install your package in "editable" mode, meaning that the path to the sources you're actively developing on will automatically be added to sys.path when you run Python (including your top-level scripts):
$ pip install -e .
If you then start Python and import your package you can see, for example, something like:
$ python -c 'import thriftA; print(thriftA)'
<module 'thriftA' from '/path/to/your/source/code/project/thrift_packages/gen-py/thriftA/__init__.py'>
If this seems like too much trouble, trust me, it isn't. Once you get the hang of this (and there are several project templates, e.g. made with cookiecutter, to take the thinking out of it), you'll see that it makes things like paths much less trouble. This is how I start any non-trivial project (anything more than a single file); if you set everything up properly, you'll never have to worry about fussing with sys.path or $PYTHONPATH manually.
In this guide, although the first part is a bit application-specific, a lot of it is actually pretty generic if you ignore the specific purpose of the code, especially the section titled "Packaging our package", which repeats some of this advice in more detail.
As an aside, if you're already using conda you don't need to create a virtualenv, as conda environments are just fancy virtualenvs.
An advantage to doing this the "right" way, is that when it comes time to install your package, whether by yourself, or by users, if your setup.cfg and setup.py are set up properly, then all users have to do is run pip install . (without the -e) and it should work the same way.
You should add an __init__.py to each package, and then use fully qualified imports in your scripts.
|_project
  |_scriptA.py
  |_scriptB.py
  |_scriptC.py
  |__init__.py <====== Here
  |_thrift_packages
    |__init__.py <====== Here
    |_gen-py
      |_thriftA
        |__init__.py
        |_thriftA.py
Assuming that project is on your PYTHONPATH, you can do:
from project.thrift_packages.thriftA import thriftA
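Then, from the directory containing project (so that it's on the default search path), you can also launch a script as a module, assuming it has a __main__ block:

$ python -m project.scriptA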
Yes, you can. However, I think the other answers are better adapted to your current issue. Here I just explain how to call subprocesses with another environment and another current working directory. This can come in handy in other situations.
You can get the current environment as a dict (os.environ), copy it to another dict, modify it, and pass it to a subprocess call (subprocess functions have an env parameter).
import os
import subprocess

new_env = dict(os.environ)  # copy of the current environment
new_env["PYTHONPATH"] = "/path/you/want/here"  # add the pythonpath that you want
cmd = ["mycmd.py", "myarg1", "myarg2"]  # replace with your command
subprocess.call(cmd, env=new_env)
# or alternatively, if you also want to change the current working directory:
# subprocess.call(cmd, env=new_env, cwd=path_of_your_choice)
There are many similar questions about PYTHONPATH and imports, but I didn't find exactly what I needed.
I have a git repository that contains a few python helper scripts. The scripts are naturally organized in a few packages. Something like:
scripts/main.py
scripts/other_main.py
scripts/__init__.py
a/foo.py
a/bar.py
a/__init__.py
b/foo.py
b/bar.py
b/__init__.py
__init__.py
scripts depends on a and b. I'm using absolute imports in all modules. I run python3 scripts/main.py. Everything works as long as I set PYTHONPATH to the root of my project.
However, I'd like to spare users the hassle of setting up an environment variable.
What would be the right way to go? I expected this to work like in Java, where the current directory is on the classpath by default, but that doesn't seem to be the case. I've also tried relative imports without success.
EDIT: it seems to work if I remove the top-level __init__.py
Firstly, you're right in that I don't think you need the top-level __init__.py. Removing it doesn't solve any import error for me though.
You won't need to set PYTHONPATH and there are a few alternatives that I can think of:
Use a virtual environment (https://virtualenv.pypa.io/en/latest/). This would also require you to package up your code into an installable package (https://packaging.python.org/). I won't explain this option further since it's not directly related to your question.
Move your modules under your scripts directory. Python automatically adds the script's directory into the Python path.
Modify the sys.path variable in your scripts so they can find your local modules.
The second option is the most straightforward.
The third option would require you to add some python code to the top of your scripts, above your normal imports. In your main.py it would look like:
#!/usr/bin/env python
import os.path, sys

# Prepend the project root (the parent of this script's directory) to sys.path
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import a
import b
What this does is:
Take the absolute filename of the script
Calculate the parent directory of the directory of the script
Prepend that directory to sys.path
Then do normal imports of your modules
I have a virtualenv in a structure like this:
venv/
src/
    project_files/
I want to run a makefile (which calls out to Python) in the project_files, but I want to run it from a virtual environment. Because of the way my deployment orchestration works, I can't simply do a source venv/bin/activate.
Instead, I've tried to export PYTHONPATH={project_path}/venv/bin/python2.7. When I try to run the makefile, however, the python scripts aren't finding the dependencies installed in the virtualenv. Am I missing something obvious?
The PYTHONPATH environment variable is not used to select the path of the Python executable; which executable is selected depends, as in all other cases, on the shell's PATH environment variable. PYTHONPATH is used to augment the search list of directories (sys.path in Python) in which Python will look for modules to satisfy imports.
The interpreter puts certain directories on sys.path before it processes PYTHONPATH, precisely to ensure that replacement modules with standard names do not shadow the standard library names. So any standard library module will be imported from the library associated with the interpreter it was installed with (unless you do some manual furkling, which I wouldn't recommend).
venv/bin/activate does a lot of stuff that needs to be handled in the calling shell's namespace, which can make scripting around it rather difficult if you can't find a way to source the script.
You can actually just call the Python interpreter in your virtual environment. So, in your Makefile, instead of calling python, call venv/bin/python.
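A minimal sketch of such a Makefile (target and script names are illustrative; recipe lines must be indented with a tab):

PYTHON := venv/bin/python

run:
	$(PYTHON) main.py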
To run a command in a virtualenv, you could use vex utility:
$ vex venv make
You could also check, whether make PYTHON=venv/bin/python would be enough in your case.
PYTHONPATH adjusts the sys.path list. It doesn't change the python binary. Don't use it here.
I have a Django app and I'm getting an error whenever I try to run my code:
Error: No module named django_openid
Let me step back a bit and tell you how I came about this:
I formatted my computer and completely re-installed everything, including virtualenv and all dependent packages (in addition to Django) required for my project, based on settings in my requirements.txt file
I tried doing python manage.py syncdb and got the error
I googled the issue, and many people say it could be a path problem.
I'm confused as to how I go about changing the path variables though, and what exactly they mean. I found some documentation, but being somewhat of a hack-ish noob, it kind of goes over my head.
So my questions are:
What exactly is their purpose -- and are they on a system based level based on the version of Python or are they project dependent?
How can I see what mine are set to currently?
How can I change them (i.e. where is this .profile file they talk of, and can I just use a text editor)?
Any input you would have would be great as this one is stumping me and I just want to get back to writing code :-)
The path is just the locations in your filesystem in which python will search for the modules you are trying to import. For example, when you run import somemodule, Python will perform a search for somemodule in all the locations contained in the path (sys.path variable).
You should check the path attribute in sys module:
import sys
print sys.path
It is just a regular list, so you can append/remove elements from it:
sys.path.append('/path/to/some/module/folder/')
If you want to change your path for every interactive python session you start, you should create a file to be loaded at startup, like so:
Create a PYTHONSTARTUP environment variable and set it to your startup file, e.g. PYTHONSTARTUP=/home/user/.pythonrc (in a unix shell);
Edit the startup file so it contains the commands you want to be auto-executed when python is loaded;
An example of a .pythonrc could be:
import sys
sys.path.append('/path/to/some/folder/')
Do you really need to alter the path? It's always best to actually think about your reasons first. If you're only going to be running a single application on the server or you just don't care about polluting the system packages directory with potentially unnecessary packages, then put everything in the main system site-packages or dist-packages directory. Otherwise, use virtualenv.
The system-level package directory is always on the path. Virtualenv will add its site-packages directory to the path when activated, and Django will add the project directory to the path when activated. There shouldn't be a need to add anything else to the path, and really it's something you should never really have to worry about in practice.
I have written a small DB access module that is extensively reused in many programs.
My code is stored in a single directory tree /projects for backup and versioning reasons, and so the module should be placed within this directory tree, say at /projects/my_py_lib/dbconn.py.
I want to easily configure Python to automatically search for modules at the /projects/my_py_lib directory structure (of course, __init__.py should be placed within any subdirectory).
What's the best way to do this under Ubuntu?
Thanks,
Adam
You can add a PYTHONPATH environment variable to your .bashrc file, e.g.
export PYTHONPATH=/projects/my_py_lib
On Linux, this directory will be added to your sys.path automatically for pythonN.M:
~/.local/lib/pythonN.M/site-packages/
So you can put your packages in there for each version of python you are using.
You need a copy for each version of python; otherwise the .pyc files will be recompiled every time you import the module with a different python version.
This also allows fine-grained control if the module only works for some of the versions of python you have installed.
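If you're unsure which N.M applies, you can ask the interpreter itself (this one-liner works on both Python 2 and 3):

$ python -c "import sys; print('%d.%d' % sys.version_info[:2])"
2.7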
If you create this file
~/.local/lib/pythonN.M/site-packages/usercustomize.py
it will be imported each time you start the python interpreter
Another option is to create a soft link in /usr/lib*/python*/site-packages/:
ln -s /projects/my_py_lib /usr/lib*/python*/site-packages/
That will make the project visible to all Python programs plus any changes will be visible immediately, too.
The main drawback is that you will eventually have *.pyc files owned by root or another user unless you make sure you compile the files yourself before you start python as another user.
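To avoid that, you can pre-compile everything as root ahead of time, e.g.:

$ sudo python -m compileall /projects/my_py_lib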