I have a directory with a python package as follows:
--docs/index.rst
--docs/...
--app/__init__.py
--app/foo.py
and I'm using Sphinx with autodoc to document the app (in Python 3.3).
Now, in the conf.py (inside docs/), I have
sys.path.insert(0, os.path.abspath('../app'))
I cd into docs/, run
make html
which gives me
SystemError: Parent module '' not loaded, cannot perform relative import
for all the modules that have a
from .foo import Bar
I have a clean virtualenv installation of Sphinx using
pip install Sphinx
after I created the (clean) environment for python 3.3.
What am I missing?
I was moving the project from Python 2.x to Python 3.x when this happened. The whole project works, except for this...
Your app directory is a package. A package is a directory with an __init__.py and other files inside it.
If you put a package directory on your sys.path, all kinds of things go wrong.
Let's take an example:
root/
app/
app/__init__.py
app/spam.py
app/eggs.py
If you have root on your sys.path (because it's your current working directory, or because you do it explicitly, or because you've installed things correctly to your site-packages), then app is a package, app.spam is a module, and, within app.eggs, .spam is that module. So, everything works.
If you have app on your sys.path, then app is not a package, spam is a module, and, within eggs, .spam isn't anything. So, you can't use relative imports.
If you have both on your sys.path, then app is a package, spam and app.spam are both different modules (with the same contents, executed twice), and within app.eggs, .spam is a module, but within eggs, .spam isn't anything. This will cause you no end of problems.
So, most likely, the fix you want is this:
sys.path.insert(0, os.path.abspath('..'))
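For reference, here is a minimal sketch of how that line might sit in docs/conf.py under your layout (assuming docs/ and app/ are siblings; your real conf.py will contain much more):
# docs/conf.py (sketch) -- put the directory *containing* the package on sys.path
import os
import sys
sys.path.insert(0, os.path.abspath('..'))  # the repo root, which contains app/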
If there are other packages, or directories full of Python code that aren't packages, in .. that you don't want to autodoc (e.g., a tests directory with tests/test_spam.py), then you will need to restructure your directories to put app into some directory that doesn't have any other Python code in it, like this:
root/
    src/
        app/
    tests/
    doc/
Alternatively, if you didn't want app to be a package, but rather to be a sys.path root directory, then kill the __init__.py, and leave app directly in sys.path. But in that case, you can't use intra-package relative imports; all of the modules in app are top-level modules, and have to be imported as such.
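For example, under that alternative, eggs.py would import its sibling as a top-level module (a sketch using the spam/eggs layout from above):
# eggs.py, with app/ itself on sys.path and no app/__init__.py
import spam  # instead of: from . import spam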
The Packages section of the tutorial (and the rest of the chapter above it) explains some of this, but there's probably better introductory documentation out there.
For full details, in 3.3+, The import system has everything, nicely organized; for older versions, the reference docs are muddy, incomplete, and scattered; you have to start at The import statement, and then read The Knights Who Say Neeeow ... Wum ... Ping! (which is basically a PEP but 1.5 didn't have PEPs yet), and possibly even the ni documentation, if you can find it, plus various PEPs and minor change log entries that explain how things have changed between 1.5 and 2.7 or 3.2 or whatever.
I am trying to build a Python package that contains sub-modules and sub-packages ("libraries").
I have looked everywhere for the right way to do it, but surprisingly I find it very complicated. I have also gone through multiple Stack Overflow threads, of course.
The problem is as follows:
In order to import a module or a package from another directory, it seems to me that there are 2 options:
a. Adding the absolute path to sys.path.
b. Installing the package with the setuptools.setup function in a setup.py file, in the main directory of the package - which installs the package into the site-packages directory of the specific Python version that is in use.
Option a seems too clumsy to me. Option b is great, however I find it impractical because I am currently working on and editing the package's source code - and the changes are, of course, not reflected in the installed copy of the package. In addition, the installed directory of the package is not tracked by Git, and needless to say I use Git in the original directory.
To conclude the question:
What is the best practice to import modules and sub-packages freely and nicely from within sub-directories of a Python package that is currently under construction?
I feel I am missing something but couldn't find a decent solution so far.
Thanks!
This is a great question, and I wish more people would think along these lines. Making a module importable and ultimately installable is absolutely necessary before it can be easily used by others.
On sys.path munging
Before I answer I will say I do use sys.path munging when I do initial development on a file outside of an existing package structure. I have an editor snippet that constructs code like this:
import sys, os
sys.path.append(os.path.expanduser('~/path/to/parent'))
from module_of_interest import * # NOQA
Given the path to the current file I use:
import ubelt as ub
fpath = ub.Path('/home/username/path/to/parent/module_of_interest.py')
modpath, modname = ub.split_modpath(fpath, check=False)
modpath = ub.Path(modpath).shrinkuser() # abstract home directory
to construct the necessary parts the snippet inserts into the file, so I can interact with it from within IPython. I find that taking the little bit of extra time to remove the reference to my explicit home folder (so the code still works as long as users have the same relative path structure with respect to their home directory) makes this slightly more portable.
Proper Python Package Management
That being said, sys.path munging is not a sustainable solution. Ultimately you want your package to be managed by a Python package manager. I know a lot of people use poetry, but I like plain old pip, so I'll describe that process, but know this isn't the only way to do it.
To do this we need to go over some basics.
Basics
You must know what Python environment you are working in. Ideally this is a virtual environment managed with pyenv (or conda or mamba or poetry ...). But it's also possible to do this in your global system Python environment, although that is not recommended. I like working in a single default Python virtual environment that is always activated in my .bashrc. It's always easy to switch to a new one or blow it away / start fresh.
You need to consider two root paths: the root of your repository, which I will call your repo path, and the root of your package, the package path or module path, which should be a folder with the name of the top-level Python package. You will use this name to import it. This package path must live inside the repo path. Some repos, like xdoctest, like to put the module path in a src directory. Others, like ubelt, like to have the package path at the top level of the repository. I think the second case is conceptually easier for new package creators / maintainers, so let's go with that.
Setting up the repo path
So now, you are in an activated Python virtual environment, and we have designated a path where we will check out the repo. I like to clone repos in $HOME/code, so perhaps the repo path is $HOME/code/my_project.
In this repo path you should have your root package path. Let's say your package is named mypymod. Any directory that contains an __init__.py file is conceptually a Python module, where the contents of __init__.py are what you get when you import that directory name. The only difference between a directory module and a normal file module is that a directory module/package can have submodules or subpackages.
For example if you are in the my_project repo, i.e. when you ls you see mypymod, and you have a file structure that looks something like this...
+ my_project
    + mypymod
        + __init__.py
        + submod1.py
        + subpkg
            + __init__.py
            + submod2.py
you can import the following modules:
import mypymod
import mypymod.submod1
import mypymod.subpkg
import mypymod.subpkg.submod2
If you ensured that your current working directory was always the repo root, or you put the repo root into sys.path, then this would be all you need. Being visible in sys.path or the CWD is all that is needed for another module to see your module.
The package manifest: setup.py / pyproject.toml
Now the trick is: how do you ensure your other packages / scripts can always see this module? That is where the package manager comes in. For this we will need a setup.py or the newer pyproject.toml variant. I'll describe the older setup.py way of doing things.
All you need to do is put the setup.py in your repo root. Note: it does not go in your package directory. There are plenty of resources for how to write a setup.py so I won't describe it in much detail, but basically all you need is to populate it with enough information so it knows about the name of the package, its location, and its version.
from setuptools import setup, find_packages

setup(
    name='mypymod',
    version='0.1.0',
    packages=find_packages(include=['mypymod', 'mypymod.*']),
    install_requires=[],
)
So your package structure will look like this:
+ my_project
    + setup.py
    + mypymod
        + __init__.py
        + submod1.py
        + subpkg
            + __init__.py
            + submod2.py
There are plenty of other things you can specify; I recommend looking at ubelt and xdoctest as examples. I'll note they contain a non-standard way of parsing requirements out of requirements.txt or requirements/*.txt files, which I think is generally better than the standard way people handle requirements. But I digress.
Given something that pip or some other package manager (e.g. pipx, poetry) recognizes as a package manifest (a file that describes the contents of your package), you can now install it. If you are still developing it you can install it in editable mode, so instead of the package being copied into your site-packages, only a symbolic link is made; any changes in your code are then reflected each time you invoke Python (or immediately if you have autoreload on with IPython).
With pip it is as simple as running pip install -e <path-to-repo-root>, which is typically done by navigating into the repo and running pip install -e . from there.
Congrats, you now have a package you can reference from anywhere.
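As a quick sanity check (a sketch; it only assumes the editable install above succeeded), you can import it from any unrelated directory:
# run from any directory, in the same virtual environment
import mypymod
print(mypymod.__file__)  # should point back into $HOME/code/my_project/mypymod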
Making the most of your package
The python -m invocation
Now you have a package that you can reference as if it were installed via pip from PyPI. There are a few tricks for using it effectively. The first is running scripts.
You don't need to specify a path to a file to run it as a script in Python. It is possible to run a script as __main__ using only its module name. This is done with the -m argument to Python. For instance you can run python -m mypymod.submod1, which will invoke $HOME/code/my_project/mypymod/submod1.py as the main module (i.e. its __name__ attribute will be set to "__main__").
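As a sketch, submod1.py might look like this to support that invocation (the main function here is purely illustrative, not part of any real package):
# mypymod/submod1.py (sketch)
def main():
    print('submod1 running as a script')

if __name__ == '__main__':  # true when run via: python -m mypymod.submod1
    main()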
Furthermore if you want to do this with a directory module you can make a special file called __main__.py in that directory, and that is the script that will be executed. For instance if we modify our package structure
+ my_project
    + setup.py
    + mypymod
        + __init__.py
        + __main__.py
        + submod1.py
        + subpkg
            + __init__.py
            + __main__.py
            + submod2.py
Now python -m mypymod will execute $HOME/code/my_project/mypymod/__main__.py and python -m mypymod.subpkg will execute $HOME/code/my_project/mypymod/subpkg/__main__.py. This is a very handy way to make a module double as both an importable package and a command line executable (e.g. xdoctest does this).
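A sketch of what the package-level __main__.py could contain (it reuses the hypothetical main function from the submod1.py sketch above):
# mypymod/__main__.py (sketch) -- executed by: python -m mypymod
from mypymod.submod1 import main

main()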
Easier imports
One pain point you might notice is that in the above code if you run:
import mypymod
mypymod.submod1
you will get an AttributeError, because by default a package doesn't know about its submodules until they are imported. You need to populate the __init__.py to expose any attributes you want to be accessible at the top level. You could populate mypymod/__init__.py with:
from mypymod import submod1
And now the above code would work.
This has a tradeoff though. The more things you make accessible immediately, the more time it takes to import the module, and with big packages it can get fairly cumbersome. Also, you have to manually write the code to expose what you want, which is a pain if you want everything.
If you take a look at ubelt's __init__.py you will see it has a good deal of code to explicitly make every function in every submodule accessible at the top level. I've written yet another library called mkinit that automates this process, and it also has the option of using the lazy_loader library to mitigate the performance impact of exposing all attributes at the top level. I find the mkinit tool very helpful when writing large nested packages.
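To give a flavour of what that kind of tooling produces, here is a hand-rolled lazy-import sketch using the module-level __getattr__ hook from PEP 562 (Python 3.7+); mkinit and lazy_loader automate and generalize this:
# mypymod/__init__.py (sketch) -- submodules are only imported on first access
import importlib

_submodules = {'submod1', 'subpkg'}

def __getattr__(name):
    if name in _submodules:
        return importlib.import_module('.' + name, __name__)
    raise AttributeError('module {!r} has no attribute {!r}'.format(__name__, name))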
Summary
To summarize the above content:
Make sure you are working in a Python virtualenv (I recommend pyenv)
Identify your "package path" inside of your "repo path".
Put an __init__.py in every directory you want to be a Python package or subpackage.
Optionally, use mkinit to autogenerate the content of your __init__.py files.
Put a setup.py / pyproject.toml in the root of your "repo path".
Use pip install -e . to install the package in editable mode while you develop it.
Use python -m to invoke module names as scripts.
Hope this helps.
This is my directory structure:
Projects
+ Project_1
+ Project_2
- Project_3
    - Lib1
        __init__.py  # empty
        moduleA.py
    - Tests
        __init__.py  # empty
        foo_tests.py
        bar_tests.py
        setpath.py
    __init__.py  # empty
    foo.py
    bar.py
Goals:
1. Have an organized project structure
2. Be able to independently run each .py file when necessary
3. Be able to reference/import both sibling and cousin modules
4. Keep all import/from statements at the beginning of each file.
I achieved #1 by using the above structure.
I've mostly achieved 2, 3, and 4 by doing the following (as recommended by this excellent guide)
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath # Annoyingly, PyCharm warns me that this is an unused import statement
import foo.py
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below work when running bar_tests.py directly.
import setpath
import Project_3.foo.py # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind-the-scenes magic with the Python path to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environment variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.
Those types of questions qualify as "primarily opinion based", so let me share my opinion on how I would do it.
First "be able to independently run each .py file when necessary": either the file is an module, so it should not be called directly, or it is standalone executable, then it should import its dependencies starting from top level (you may avoid it in code or rather move it to common place, by using setup.py entry_points, but then your former executable effectively converts to a module). And yes, it is one of weak points of Python modules model, that causes misunderstandings.
Second, use virtualenv (or venv in Python 3) and put each of your Project_x into a separate one. This way the project's name won't be part of the Python module path.
Third, the link you've provided mentions setup.py – you may make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py, and finally put your modules into src/mylib1/mylib1/module.py. Then you can install your code with pip like any other package (or use pip install -e so you can work on the code directly without reinstalling it, though that unfortunately has some limitations).
And finally, as you've already confirmed in a comment ;) the problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you mistakenly used Python's relative-import notation, which is incorrect for filesystem paths and should be replaced with '../..' to work as expected.
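In other words, a corrected setpath.py would look something like this (a sketch; note that all of these resolve relative to the current working directory):
# setpath.py (corrected sketch) -- '..' and '../..' are filesystem paths; '...' is not
import os
import sys
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('../..'))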
I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when necessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top-level folder (e.g. Projects) using the -m flag to the interpreter to give it a dotted path to the main module, and using explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m Project_3.Tests.foo_tests from the Projects folder (or python -m Tests.foo_tests from within Project_3 perhaps), and have foo_tests.py use from .. import foo.
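A sketch of that first option, using the directory names from the question (it assumes Projects is your current directory and that Project_3 and Tests keep their __init__.py files):
# Project_3/Tests/foo_tests.py (sketch)
from .. import foo  # explicit relative import of the cousin module

if __name__ == '__main__':
    print('imported', foo.__name__)

# run from the Projects folder as:
#   python -m Project_3.Tests.foo_tests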
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system-wide basis (e.g. add the Projects folder to the PYTHONPATH environment variable), and then use absolute imports for all your modules (e.g. import Project_3.foo). This is effectively what your setpath module does, but doing it system-wide as part of your system's configuration, rather than at run time, is much cleaner. It also avoids the multiple names that setpath allows you to use to import a module (e.g. try import foo_tests and import Tests.foo_tests and you'll get two separate copies of the same module).
I am creating a Python package with multiple modules. I want to make sure that when I import modules within the package that they are importing only from the package and not something outside the package that has the same name.
Is the correct way of doing this is to use relative imports? Will this interfere when I move my package to a different location on my machine (or gets installed wherever on a customer's machine)?
Modern relative imports (here's a reference) are package-relative and package-specific, so as long as the internal structure of your package does not change you can move the package as a whole around wherever you want.
While Joran Beasley's answer should work as well (though it does not seem necessary in those older versions of Python where absolute imports aren't the default, as the old style of importing checked within the package's directory first), I personally don't really like modifying the import path like that when you don't have to, especially if you need to load some of the other packages or modules that your modules or packages now shadow.
A warning, however: relative imports require that the module in question is loaded as part of a package, or at least has its __name__ set to indicate a location in a package. Relative imports won't work for a module when __name__ == '__main__', so if you're writing a simple/casual script that uses another module in the same directory as it (and you want to make sure the script refers to the proper directory, since things won't work right if the current working directory is not set to the script's), you could do something like import os, sys; sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) (with thanks to https://stackoverflow.com/a/1432949/138772 for the idea). As noted in S.Lott's answer to the same question, this probably isn't something you'd want to do professionally or as part of a team project, but for something personal where you're just doing some menial task automation or the like it should be fine.
sys.path tells Python where to look for imports
add
import sys
sys.path.insert(0,".")
to the top of your main Python script; this will ensure local packages are imported BEFORE built-in packages (although tbh I think this happens automagically)
if you really want to import only packages in your folder do
import sys
sys.path = ["."]
however I do not recommend this at all as it will probably break lots of your stuff ...
most IDEs (Eclipse/PyCharm/etc.) provide mechanisms to set up the environment a project uses, including its paths
really the best option is not to name packages the same as built-in packages or 3rd-party modules that are installed on your system
and it is also best to distribute your code via a correctly bundled package; this should more than suffice
I do a lot of work on different projects (I'm a scientist) in a fairly standardised directory structure. e.g.:
project
    /analyses/
    /lib
    /doc
    /results
    /bin
I put all my various utility scripts in /bin/ because cleanliness is next to godliness. However, I have to hard code paths (e.g. ../../x/y/z) and then I have to run things within ./bin/ or they break.
I've used Django and that has /manage.py which runs various django-things and automatically handles the path. I've also used fabric to run various user defined functions.
Question: how do I do something similar, and what's the best way? I can easily write something in /manage.py to inject the root dir into sys.path etc., but then I'd like to be able to do "./manage.py foo" which would run /bin/foo.py. Or is it possible to get fabric to call executables from a certain directory?
Basically - I want something easy and low maintenance. I want to be able to drop an executable script/file/whatever into ./bin/ and not have to deal with path issues or import issues.
What is the best way to do this?
Keep Execution at TLD
In general, try to keep your runtime at top-level. This will straighten out your imports tremendously.
If you have to do a lot of import addressing with relative imports, there's probably a better way.
Modifying The Path
Other posters have mentioned PYTHONPATH. That's a great way to do it permanently in your shell.
If you don't want to or aren't able to manipulate PYTHONPATH directly, you can use sys.path to get yourself out of relative-import hell.
Using sys.path.append
sys.path is just a list internally. You can append to it to add stuff to your path.
Say I'm in /bin/ and there's a library markdown in lib/. You can append a relative path to sys.path to import what you want.
import sys
sys.path.append('../lib')

import markdown

print(markdown.markdown("""
Hello world!
------------
"""))
Word to the wise: don't get too crazy with your sys.path additions. Keep your schema simple to avoid a lot of confusion.
Overly eager imports can sometimes lead to circular imports, where a Python module ends up (indirectly) importing itself, at which point execution will halt!
Using Packages and __init__.py
Another great trick is creating Python packages by adding __init__.py files. __init__.py gets loaded before any other module in the package, so it's a great way to add imports across the entire directory. This makes it an ideal spot to add sys.path hackery.
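As a sketch of that kind of hackery, applied to the bin/ and lib/ layout from the question (treat it as a convenience, not gospel):
# bin/__init__.py (sketch) -- make the sibling lib/ directory importable
import os
import sys

_lib = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'lib'))
if _lib not in sys.path:
    sys.path.append(_lib)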
You don't necessarily need to add anything to the file. It's sufficient to just do touch __init__.py at the console to make a directory a package.
See this SO post for a more concrete example.
In a shell script that you source (not run) in your current shell you set the following environment variables:
export PATH=$PATH:$PROJECTDIR/bin
export PYTHONPATH=$PROJECTDIR/lib
Then you put your Python modules and package tree in your project's ./lib directory. Python automatically adds the PYTHONPATH environment variable to sys.path.
Then you can run any top-level script from the shell without specifying the path, and any imports from your library modules are looked for in the lib directory.
I recommend very simple top-level scripts, such as:
#!/usr/bin/python
import sys
import mytool
mytool.main(sys.argv)
Then you never have to change that, you just edit the module code, and also benefit from the byte-code caching.
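The module it imports might be as simple as this sketch (the name mytool and the main(argv) signature are just taken from the wrapper above):
# lib/mytool.py (sketch)
def main(argv):
    print('mytool called with arguments:', argv[1:])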
You can easily achieve your goals by creating a mini package that hosts each one of your projects. Use Paste Script to create a simple project skeleton, and to make it executable, just install it via setup.py develop. Now your bin scripts just need to import the entry point to this package and execute it.
As a beginning programmer (I'm rather doing scripting), I'm struggling with my growing collection of scripts and modules.
Currently, small scripts live in a general folder that is added to my PYTHONPATH. Specific projects get their own subfolder, but more and more I try to write the general parts of those projects as modules that can be used by other projects. But I cannot just import them unless this subfolder is also on my PYTHONPATH.
I don't know how to organise all this. I will be happy to get your hints and recommendations on organising (python) code. Thanks!
Be sure to read about Packages, specifically the part about creating a file named __init__.py in directories you want to import from.
I have a git repository named MyCompany in my home directory. There's a link from /usr/local/lib/python2.7/site-packages/ to it. Inside MyCompany are package1, package2, package3, etc. In my code, I write import MyCompany.package1.modulefoo. Python looks in site-packages and finds MyCompany. Then it finds the package1 subdirectory with an __init__.py file in it - yay, a package! Then it imports the modulefoo.py file in that directory, and I'm off and running.
General-purpose custom modules should go to ~/.local/lib/pythonX.Y/site-packages, or to /usr/local/lib/pythonX.Y/site-packages if they should be available to everyone. Both paths are automatically included in Python's module search path (sys.path). (The former is available since Python 2.6 – see PEP 370 – Per user site-packages directory.)