There are many similar questions about PYTHONPATH and imports, but I didn't find exactly what I needed.
I have a git repository that contains a few python helper scripts. The scripts are naturally organized in a few packages. Something like:
scripts/main.py
scripts/other_main.py
scripts/__init__.py
a/foo.py
a/bar.py
a/__init__.py
b/foo.py
b/bar.py
b/__init__.py
__init__.py
scripts depends on a and b. I'm using absolute imports in all modules. I run python3 scripts/main.py. Everything works as long as I set PYTHONPATH to the root of my project.
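For reference, the invocation that currently works looks something like this (the project root path here is a placeholder):
PYTHONPATH=/path/to/project python3 scripts/main.py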
However, I'd like to spare users the hassle of setting an environment variable.
What would be the right way to go? I expected this to work like in Java, where the current directory is on the classpath by default, but that doesn't seem to be the case. I've also tried relative imports, without success.
EDIT: it seems to work if I remove the top-level __init__.py
Firstly, you're right: I don't think you need the top-level __init__.py. Removing it doesn't solve any import error for me, though.
You won't need to set PYTHONPATH, and there are a few alternatives that I can think of:
Use a virtual environment (https://virtualenv.pypa.io/en/latest/). This would also require you to package up your code into an installable package (https://packaging.python.org/). I won't explain this option further since it's not directly related to your question.
Move your modules under your scripts directory. Python automatically adds the script's directory to the Python path.
Modify the sys.path variable in your scripts so they can find your local modules.
The second option is the most straightforward.
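For illustration, option 2 would turn the tree from the question into something like this (just a sketch):
scripts/main.py
scripts/other_main.py
scripts/a/foo.py
scripts/a/bar.py
scripts/a/__init__.py
scripts/b/foo.py
scripts/b/bar.py
scripts/b/__init__.py
Since Python puts the directory of the script being run (here scripts/) at the front of sys.path, import a and import b then resolve without any PYTHONPATH setup.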
The third option would require you to add some Python code to the top of your scripts, above your normal imports. In your main.py it would look like this:
#!/usr/bin/env python
import os.path, sys
# prepend the project root (the parent of the scripts/ directory) to the search path
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
import a
import b
What this does is:
Take the absolute path of the script
Calculate the parent directory of the directory of the script
Prepend that directory to sys.path
Then do normal imports of your modules
Related
I've been reading a lot about how to set up projects, with __init__.py files in the folder structure and one more in the root folder where the project sits.
I have this folder structure, and running it in PyCharm works, since PyCharm adds the project path to the environment variables when it starts.
C:\test_folder\folder_struture\project
C:\test_folder\folder_struture\project\pak1
C:\test_folder\folder_struture\project\pak1\pak_num1.py
C:\test_folder\folder_struture\project\pak1\__init__.py
C:\test_folder\folder_struture\project\program
C:\test_folder\folder_struture\project\program\program.py
C:\test_folder\folder_struture\project\program\__init__.py
C:\test_folder\folder_struture\project\__init__.py
C:\test_folder\folder_struture\__init__.py
When I try to run program.py where I have:
from project.pak1 import pak_num1
I get an error that the module doesn't exist. When I add the project to the PYTHONPATH variable or the PATH variable on my Windows machine, everything works fine. Is it possible that the tutorials are missing the part about setting the root folder of the project in the environment, since they assume you have already done it?
For every project, do I need to put the project root into the environment variables, or is there a way for Python to recognize that it is inside a package structure?
Adding it to the environment lets me use absolute imports, but if I try a relative import with
from ..pak1 import pak_num1
I get:
ImportError: attempted relative import with no known parent package
If I run program.py, does it look for the __init__.py in the same folder, and if it finds one, does it go one level up to find another __init__.py, and so on, to build up the structure?
If you are writing lots of little scripts and start wanting to organize some of them into utility packages, mostly to be used by other scripts outside those packages, put all the scripts that are to be called from the command line (your entry points for execution) side by side, in the same folder as the root folders of all your packages (see the sketch below).
Then you can import anything from any package from the top-level scripts. Starting execution from the top-level scripts not only gives access to every package, but also allows all of the packages' internal imports to work, provided an __init__.py file exists in every package and sub-package folder.
For code inside a package to import from a sibling package (sibling at the top-level folder), you need either to append the sibling package to sys.path, or to wrap everything in yet another folder, again with an __init__.py file there. In that case you should start execution from outside this overall package, not from the scripts that were top-level before.
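A sketch of that layout (all names here are hypothetical):
myproject/
    run_me.py         # entry point, executed as: python run_me.py
    another_tool.py   # another entry point
    utilpkg/
        __init__.py
        helpers.py
    otherpkg/
        __init__.py
        stuff.py
Running python run_me.py puts myproject/ at the front of sys.path, so the entry points can do from utilpkg import helpers or from otherpkg import stuff directly.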
Think of packages as something to be imported for use, not something to start running from a random inner entry point.
Many times a better alternative is to configure a virtualenv: every package you install in the environment becomes known in that environment. This also isolates the dependencies from project to project, including the Python version in use if you need that. Note that this solves a different, and rather hairy, problem along the way.
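As a hedged illustration of "start execution from outside the package", using the paths from the question above: with the __init__.py files in place, running program.py via the -m switch from the folder above project gives it a known parent package:
cd C:\test_folder\folder_struture
python -m project.program.program
Run this way, from ..pak1 import pak_num1 resolves to project.pak1.pak_num1 instead of raising "attempted relative import with no known parent package".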
I have inherited quite a bit of Python code, and all over it is the following snippet, which adds the file path of the parent directory to the system path.
import sys
from os.path import join, dirname
# prepend the parent directory of the running script to the search path
sys.path.insert(0, join(dirname(sys.argv[0]), "..\\"))
from utilities import find, execute
My understanding is that this adds a path to the search path. Over the course of a program's run it adds numerous paths to the search path, and presumably makes it slower, as each file adds its own parent directory.
I prefer the syntax
from scm_tools.general.utilities import find, execute
because it is easier to understand and far less code. This might have implications if I move the code around, but it's all in a single package.
Am I right in assuming that, inside a package, the latter syntax is the more Pythonic way of doing things? Or does it not really matter, because under the hood Python is doing some magic?
Use relative imports when you can:
from ..utilities import find, execute
This requires that you stay within the module space, which means each directory you traverse requires an __init__.py file.
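For example, a layout where that relative import works (the module names below are hypothetical, following the scm_tools package from the question):
scm_tools/
    __init__.py
    general/
        __init__.py
        utilities.py      # defines find and execute
        tools/
            __init__.py
            mytool.py     # contains: from ..utilities import find, execute
From mytool.py, .. refers to the scm_tools.general package, so ..utilities resolves to scm_tools.general.utilities; this only works because every directory along the way has an __init__.py.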
There are cases where this breaks down, for example if your tests directory isn't inside the module structure. In these cases you need to edit the path, but you shouldn't edit the path blindly, as in the example above.
Either add to the PYTHONPATH environment variable before your code starts, so you can always reference the root of the directory, or only add paths that aren't already in sys.path, and avoid adding anything but module roots.
The PYTHONPATH change is a bit risky for code you wish to distribute. It's easy for a change in PYTHONPATH to be out of your control, or for the addition not to transfer to distributed code. It also adds an annoying setup requirement that others have to deal with -- so reserve this for adding whole swaths of modules that you want to include, like custom site-packages directories. It's almost always better to use virtualenv for such situations.
If you do need to change sys.path inside code, you should at least avoid clobbering it all over the place, or you'll have a headache trying to fix it when it goes awry. To that end, only add root module paths, so you can always import in a root.submodule.desiredmodule pattern. Additionally, check whether a path is already present before you insert it into sys.path, to avoid a very long sys.path. In my test directories I often have an importable file that fixes sys.path to the root of the directory structure I am testing:
# Add parent import capabilities
import os
import sys

parentdir = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))
if parentdir not in sys.path:
    sys.path.insert(0, parentdir)
So I structure almost all of my projects like this:
root/
|- scripts/
|- src/
|- etc. ...
I put runnable scripts in scripts/ and importable modules in src/, and by convention run every script from the root directory (so I always stay in root, then type 'python scripts/whatever')
In order to be able to import code from src/, I've decided to start every script with this:
import sys
import os
# use this to make sure we always have the dir we ran from in path
sys.path.append(os.getcwd())
To make sure root/ is always in the path for scripts being run from root.
My question is: is this considered bad style? I like my conventions of always running scripts from the root directory, and keeping my scripts separate from my modules, but it seems like a weird policy to always edit the path variable for every script I write.
If this is considered bad style, could you provide alternative recommendations? Either different ways for me to keep my existing conventions or recommendations for different ways to structure projects would be great!
Thanks!
My recommendation is to use:
root/$ python -m scripts.whatever
With the -m you use the . notation rather than the file path and you won't need to setup path code in each of the scripts because the -m tells Python to start looking for imports in the directory where you called Python.
If your file structure is also installed using setup.py and may be found within site-packages, there are some other things to consider:
If you call -m from the root of the directory structure (as I've shown above) it will call the code found in your directories
If you call -m from anywhere else, it will find the installed code from sys.path and call that
This can be a subtle gotcha if you happen to be running an interpreter that has your package installed and you are trying to make changes to your scripts locally and can't figure out why your changes aren't there (this happened to a coworker of mine who wasn't using virtual environments).
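One quick way to see which copy you are actually importing is to print the module's __file__ (using the example name from above):
python -c "import scripts.whatever; print(scripts.whatever.__file__)"
If that prints a path under site-packages rather than your working tree, you are running the installed code, not your local changes.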
The first entry of sys.path is the directory of the current script, according to the docs. In the following setup, I would like to change this default. Imagine the following directory structure:
src/
core/
stuff/
tools/
tool1.py
tool2.py
gui/
morestuff/
gui.py
The scripts tool*.py and gui.py are intended to be run as scripts, like the following:
python src/core/tools/tool2.py
python src/gui/gui.py
Now all tools import from src.core.stuff, and the GUI needs gui.morestuff. This means that sys.path[0] should point to src/, but it points to src/core/tools/ or src/gui/ by default.
I can adjust sys.path[0] in every script (with a construct like the following, e.g., at the beginning of gui.py):
import os, sys
if __name__ == '__main__':
    if sys.path[0]:  # replace the script's directory with its parent (src/)
        sys.path[0] = os.path.dirname(os.path.abspath(sys.path[0]))
However, this is sort of redundant, and it becomes tedious for a mature code base with thousands of scripts. I also know about the -m switch:
python -m gui.gui
But this requires the current directory to be src/.
Is there a better way to achieve the desired result, e.g. by modifying the __init__.py files?
EDIT: This is for Python 2.7:
~$ python -V
Python 2.7.3
The only officially approved way to run a script that is in a package is by using the -m flag. While you could run a script directly and try to do sys.path manipulations yourself in each script, it's likely to be a big pain. If you move a script between folders, the logic for rewriting sys.path may also need to be changed to reflect the new location. Even if you get sys.path right, explicit relative imports will not work correctly.
Now, making python -m mypackage.mymodule work requires that either you be in the project's top level folder (src in your case), or for that top level folder to be on the Python search path. Requiring you to be in a specific folder is awkward, and you've said that you don't want that. Getting src into the search path is our goal then.
I think the best approach is to use the PYTHONPATH environment variable to point the interpreter to your project's src folder so that it can find your packages from anywhere.
This solution is simple to set up (the environment variable can be set automatically in your .profile, .bashrc or some other equivalent place), and will work for any number of scripts. If you move your project, just update your environment settings and you'll be all set, without needing to do any more work for each script.
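For instance, a single line like this in your .bashrc would do it (the project location is an assumption):
export PYTHONPATH="${PYTHONPATH}:/home/you/myproject/src"
After that, the tools and the GUI can be started from any directory, because import core.stuff and import gui.morestuff find the packages under src/.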
You've got three basic options here. I've been through all three in both a production environment and personal projects. In many ways they build on each other. However, my advice is to just skip to the last one.
The fundamental problem is that you need your ./src directory to be on the Python search path. This is really what Python packaging is all about.
PYTHONPATH
The most straightforward, user-defined way to adjust your Python path is through the PYTHONPATH environment variable. You can set it at run time, doing something like:
PYTHONPATH=./src python src/gui/gui.py
You can of course also set this up in your global environment so hopefully all processes that need it will find the correct PYTHONPATH. But, just remember, you'll always forget one. Usually at 3 AM when your cron task finally runs.
Site Packages
To avoid needing an environment variable, your options are pretty much to include your software in an existing entry on the search path, or to find some additional way to add a new search path. This can mean dropping the contents of your src directory into /usr/lib/python2.7/site-packages or wherever your system's site-packages is located.
Since you may not want to actually include the code in site-packages, you can create a symlink for your two sub-packages.
This is of course less than ideal for a number of reasons. If you're not careful with naming, then suddenly every Python program on the machine is exposed to potential name conflicts. You're exposing your software to every user on the machine. You might run into issues if Python gets updated. And if you add a new sub-package, you now have to create a new symlink.
A slightly better approach is to include a .pth file somewhere in your site-packages. When Python encounters these files, it adds their contents (each line is supposed to be the name of a directory) to the search path. This avoids the problem of having to remember to add a new symlink for each new sub-package.
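A sketch of such a .pth file (the filename and project path are assumptions):
# myproject.pth, dropped into site-packages
/home/you/myproject/src
At interpreter startup, each non-comment line that names an existing directory is appended to sys.path.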
virtualenv and packaging
The best solution is to just bite the bullet and do real Python packaging. This, combined with great tools like virtualenv and pip, lets you have an isolated (or semi-isolated) Python environment.
Under virtualenv, you would have a custom site-packages for just your project where you can easily install your software into it, avoiding all the problems of the earlier solutions. virtualenv also makes it easy to maintain executable scripts so that the python environment it runs under is exactly as you expect.
The one downside is that you have to write and maintain a setup.py which will instruct pip (the python installer) to include your software in the virtualenv. The contents would be something like:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from distutils.core import setup
setup(
name='myproject',
package_dir={'myproject': 'src'},
scripts=['src/gui/gui.py', 'src/core/tools/tool1.py', 'src/core/tools/tool2.py']
)
So, setting up this environment is going to look something like this:
virtualenv env
env/bin/pip install -e .
To run your script, then you'd just do something like:
env/bin/tool1.py
I wanted to do this to avoid having to set PYTHONPATH in the first place
There are other places you can hook into Python's sys.path initialization, using the site module, which is (by default) automatically imported when Python initializes.
Based on this code in site.py...
# Prefixes for site-packages; add additional prefixes like /usr/local here
PREFIXES = [sys.prefix, sys.exec_prefix]
...it looks as if the intention was for this file to be modified after installation, which is one option, although the site module also provides other ways to influence sys.path, e.g. by placing a .pth file somewhere inside your site-packages directory.
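For example, the site module also tries to import a module named sitecustomize at startup, so a minimal sketch like this works too (the project path is an assumption):
# sitecustomize.py, placed somewhere the interpreter can already find it
import sys
sys.path.insert(0, '/home/you/myproject/src')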
Assuming the desired result is to make the code work 'out of the box', this would work, but only for all users on a single system.
If you need it to work on multiple systems, then you'd have to apply the same changes to all systems.
For deployment, this is no big deal. Indeed, many Python packages already do something like this. e.g. on Ubuntu...
~$ dpkg -L python-imaging | grep pth
/usr/share/pyshared/PIL.pth
/usr/lib/python2.7/dist-packages/PIL.pth
...but if your intention is to make it easy for multiple concurrent developers, each using their own system, you may be better off sticking with the current option of adding some 'boilerplate' code to every Python module which is intended to be run as a script.
There may be another option, but it depends on exactly what you're trying to achieve.
I am tinkering with some pet projects with Python in Linux (Mint 13) and I plan to do the following:
Create a Dropbox subfolder named "pybin" where I put all my home-made python modules;
Put a symlink to this folder somewhere in the system (first candidate: /usr/lib/python2.7/dist-packages, which is in sys.path, or some similar path);
Then I just do import mymodule from any python session, and the module is imported.
I tried it and it didn't work. I suspect this has to do with the differences between modules and packages, and __init__.py files, but I confess that every time I read something about this stuff I get pretty confused. Besides learning a bit more about this, all I really want is a way to import my modules as described. It is crucial that the actual folder is inside Dropbox (or any other file-syncing folder), not in a system folder.
Thanks for any help!
Why not simply set the PYTHONPATH envvar in your .bash_profile? That way, every time you execute a bash shell (which normally happens upon login), this environment variable will be set to wherever you place your user-defined modules. The Python interpreter uses this variable to determine where to search for module imports:
PYTHONPATH="${PYTHONPATH}:/path/to/some/cool/python/package/:/path/to/another/cool/python/package/"
export PYTHONPATH
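After re-sourcing your profile, you can confirm the interpreter picked the paths up with a quick check:
python -c "import sys; print(sys.path)"
Your added directories should appear in the output, and import mymodule should then work from any Python session.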