Imports with complex folder hierarchy - python

Hey I'm working on a project that has a set hierarchical modules with a folder structure set up like so:
module_name/
    __init__.py
    constants.py
    utils.py
    class_that_all_submodules_need_access_to.py
    sub_module_a/
        __init__.py
        sub_module_a_class_a.py
        sub_module_a_class_b.py
        useful_for_sub_module_a/
            __init__.py
            useful_class_a.py
            useful_class_b.py
    sub_module_b/
        __init__.py
        sub_module_b_class_a.py
        sub_module_b_class_b.py
    etc etc etc...
The problem is, I can't figure out how to set up the imports in the __init__.py files so that I can access class_that_all_submodules_need_access_to.py from sub_module_a/useful_for_sub_module_a/useful_class_a.py.
I've tried looking this up on Google/StackOverflow to exhaustion and I've come up short. The peculiar thing is that PyCharm sets up the paths in such a way that I don't encounter this problem when working on the project in PyCharm; it only appears in other environments.
So here's one particularly inelegant solution that I've come up with. My sub_module_a/useful_for_sub_module_a/__init__.py looks like:
import sys
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..', '..')))
import module_name
This is similar in sub_module_*/, where instead of 3 '..'s it's just two (i.e. '..', '..' instead of '..', '..', '..' for the sys.path.insert line above). And then in sub_module_a/useful_for_sub_module_a/useful_class_a.py, I have to import module_name/constants.py (and others) like this:
import module_name.constants
from module_name.class_that_all_submodules_need_access_to import ImportantClass
While this solution works, I was wondering if there is a better/more elegant way to set up the imports and/or folder hierarchy? I'm concerned about messing with the python system path for users of this module. Is that even a valid concern?

There are two kinds of Python import: absolute and relative (see the Python documentation on imports). But the first thing you need to understand is how Python finds your package. If you just put your package under your home folder, Python knows nothing about it. This blog post explains how Python searches for packages: https://leemendelowitz.github.io/blog/how-does-python-find-packages.html
Thus, to import your modules, you first have to let Python know about your package. Once you understand where Python searches for packages, there are two common ways to accomplish this:
(1) The PYTHONPATH environment variable. Set this variable in your shell configuration file, e.g., .bash_profile. This is the simplest way.
(2) Use setuptools, which can also help you distribute your package. This is a longer story, and I would not suggest it unless you plan to distribute your package in the future. However, it is worth knowing about; see the sketch below.
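For example, a minimal setup.py at the project root (one level above module_name/) might look like this; the name and version are placeholder values. After running pip install -e . in that directory, the package becomes importable from anywhere without any sys.path manipulation.
# setup.py -- a minimal sketch; name and version are placeholder values
from setuptools import setup, find_packages
setup(
    name='module_name',
    version='0.1.0',
    packages=find_packages(),  # discovers module_name and its subpackages
)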
If your path is set up correctly, importing is straightforward. If you would like to use an absolute import, try
from module_name import class_that_all_submodules_need_access_to
If you would like to use a relative import, it depends on which module you are in. Suppose you are writing the module module_name.sub_module_a.sub_module_a_class_a; then try
from ..class_that_all_submodules_need_access_to import XX_you_want
Note that relative imports support only the from .xx import xx form; import .xx is a syntax error.
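For the deeper case in the original question, each extra leading dot climbs one package level. So from sub_module_a/useful_for_sub_module_a/useful_class_a.py, three dots reach the top-level package (a sketch, assuming the whole tree is imported as the module_name package rather than run as loose scripts):
# in module_name/sub_module_a/useful_for_sub_module_a/useful_class_a.py
# one dot = current package, two dots = sub_module_a, three dots = module_name
from ...class_that_all_submodules_need_access_to import ImportantClass
from ... import constants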
Thanks.

Related

Importing modules from adjacent folders in Python

I wanted to ask if there is a way to import modules/functions from a folder in a project that is adjacent to another folder.
For example, let's say I have two files:
project/src/training.py
project/lib/functions.py
Both of these folders have an __init__.py file in them. When I try to import functions.py into training.py, it doesn't seem to be detected. I'm trying to use from lib.functions import *. I know this works from the upper level of the folder structure, where I can call both files from a script, but is there a way to do it for files in higher-up or sideways folders?
Fundamentally, the best way of doing this depends on exactly how the two modules are related. There are two possibilities:
The modules are part of one cohesive unit that is intended to be used as a single whole, rather than as separate pieces. (This is the most common situation, and it should be your default if you're not sure.)
The functions.py module is a dependency for training.py but is intended to be used separately.
Cohesive unit
If the modules are one cohesive unit, then the layout you have is not the standard way of structuring a project in Python.
If you need multiple modules in the same project, the standard way of structuring the folders is to include all the modules in a single package, like so:
project/
    trainingproject/
        __init__.py
        training.py
        functions.py
    other/
        ...
    project/
        ...
    folders/
        ...
The __init__.py file causes Python to recognize the trainingproject/ directory as a single unit called a package. Using a package enables the use of relative imports:
training.py
from . import functions
# The rest of training.py code
Assuming your current directory is project, you can then invoke training.py as a module:
python -m trainingproject.training
Separate units
If your modules are actually separate packages, then the simplest idiomatic solution during development is to modify the PYTHONPATH environment variable:
sh-derivative syntax:
# All the extra PYTHONPATH references on the right are to append if it already has values
# Using $PWD ensures that the path in the environment variable is absolute.
export PYTHONPATH=$PYTHONPATH${PYTHONPATH:+:}$PWD/lib/
python ./src/training.py
PowerShell syntax:
$env:PYTHONPATH = $(if($env:PYTHONPATH) {$env:PYTHONPATH + ';'}) + (Resolve-Path ./lib)
python ./src/training.py
(This is possible in Command Prompt, too, but I'm omitting that since PowerShell is preferred.)
In your module, you would just do a normal import statement:
training.py
import functions
# Rest of training.py code
Doing this will work when you deploy your code to production as well if you copy all the files over and set up the correct paths, but you might want to consider putting functions.py in a wheel and then installing it with pip. That will eliminate the need to set up PYTHONPATH by installing functions.py to site-packages, which will make the import statement just work out of the box. That will also make it easier to distribute functions.py for use with other scripts independent of training.py. I'm not going to cover how to create a wheel here since that is beyond the scope of this question, but here's an introduction.
Yes, it’s as simple as writing the entire path from the working directory:
from project.src.training import *
Or
from project.lib.functions import *
I agree with what polymath stated above. If you were also wondering how to run these specific functions once they are imported, call them directly: your_function_name(parameters). To run a whole script you have imported from the same directory, you would need something like exec(open('script_name.py').read()). I would recommend writing functions instead of using exec, however, because exec can be a bit hard to use correctly.
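As a concrete sketch of the function-based approach (train_model is a hypothetical name standing in for whatever functions.py actually defines):
# training.py -- call an imported function instead of exec'ing a script
from lib.functions import train_model  # train_model is a hypothetical name
result = train_model(epochs=10)  # an ordinary function call with parameters
print(result)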

Differences in import expressions?

I have a question regarding the import/from statements in Python.
In my views.py file (Project/App/views.py) I have this line:
from django.views.generic import TemplateView, ListView
Why do I have to include 'django' in that line? Why is it not enough to specify the directory (views) that the generic module is located in? That is what I have done in many of my previous Python-only scripts, one example being:
from random import foo
as well as in my current django url.py file. There, I have:
from app.views import view
Why don't I have to specify that further, like with the first example where 'django' is included in the path-specification? How come I don't have to write it like this:
from project.app.views import view
Thank you!
Welcome to the wild world of Python's import system!
To expand on freude's answer a little bit, you've run into one of the most consistently confusing portions of the Python language: relative vs absolute imports. While the import examples that you gave are syntactically fine, they hide some of the complexity of what's going on behind the scenes. When you run:
from django.views.generic import TemplateView, ListView
Python searches through its module search path (which you can see with something like print(sys.path)) for a package or module named django. It ends up finding one somewhere among your installed libraries. Similarly, when you run:
from project.app.views import view
It searches those same paths, but instead finds the project package in the directory that the Python interpreter is currently executing in. However, if you had installed a library named project, how would it know which one you actually meant? This is generally solved by using absolute imports and by being explicit when you intend to use relative imports like this. If you wanted to be more precise in your example, you would specify that you wanted to import relative to the current module by using a leading dot, like so:
from .project.app.views import view
You can even see this in action in an example in the django tutorial.
See this classic answer on the subject for more detailed information.
A Python script sees certain paths: those pointing to the global default locations like site-packages or dist-packages (you may find these directories in the Python directory tree; random and django are located in one of them), plus any specified by the environment variable PYTHONPATH. Usually, PYTHONPATH includes your project directory (in fact, you may add whatever directory you want to it). Your example suggests that the packages django, app, and random are located on those paths, while project is not. In Python, a package is represented by a directory containing an __init__.py file as well as other files representing modules. You may then import modules from packages in those locations using the paths you have shown in your examples.
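To see exactly which directories your interpreter searches, you can print sys.path from any script or REPL (a minimal check):
# list every directory Python will search for imports, in order
import sys
for path in sys.path:
    print(path)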

python: include files from other directories into a project

I have multiple Python projects which should use a number of shared files, but I cannot figure out how to do this in Python.
If I just copy the file into the Python working directory, it works fine with:
from HBonds import calculateHBondsForSeveralReplicas, compareSameShapeHBMatrices, calculateHBonds
But I do not want to copy it. I want to include it from: /home/b/data/pythonWorkspace/util/HBonds
For me it would make sense to do it like this (but it does not work):
from /home/b/data/pythonWorkspace/util/HBonds/HBonds.py import calculateHBondsForSeveralReplicas, compareSameShapeHBMatrices, calculateHBonds
How can I do this?
You have to make sure PYTHONPATH includes the path to that directory, as was pointed out in the previous answer.
Or you can use a nastier way: make it available at runtime with a piece of code like this.
import os
import sys
# os.path.dirname() strips the trailing slash, leaving
# /home/b/data/pythonWorkspace/util as the path to add
folder = os.path.dirname('/home/b/data/pythonWorkspace/util/')
if folder not in sys.path:
    sys.path.append(folder)
from HBonds import HBonds
For 3rd-party libraries, it's best to install them the stock way - either to the system's site-packages or into a virtualenv.
For project(s) you're actively developing on the machine where they run, a maintainable solution is to add their root directory(ies) to PYTHONPATH so that you can import <top_level_module>.<submodule>.<etc> from anywhere. That's what we did at my previous job. The main plus here is that updating and switching between code bases is trivial.
Another way is to use relative imports, but they're intended for intra-package references, so that you don't have to repeat the package's name everywhere. If many otherwise unrelated parts of the code use the same module(s), it's probably more convenient to make the shared part a separate package that is a dependency for all of them.
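As a sketch of the PYTHONPATH approach applied to the paths in this question (assuming /home/b/data/pythonWorkspace has been added to PYTHONPATH, and that util/ and util/HBonds/ each contain an __init__.py):
# importable from any project once pythonWorkspace is on PYTHONPATH
from util.HBonds.HBonds import (
    calculateHBondsForSeveralReplicas,
    compareSameShapeHBMatrices,
    calculateHBonds,
)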

Python- working with files in multiple directories

I realize this may be considered a duplicate question to some of the other questions out there, but I've spent over an hour now reading through various pages and documentation and still don't understand what's going on here.
I'm trying to work with python files in multiple directories; I have essentially this:
myproject/
    __init__.py
    some_file.py
    some_data.dat
    tests/
        __init__.py
        test_some_file.py
test_some_file.py is run from the command line and, as the name implies, is intended to test the code contained in some_file.py, so it needs to import it. However, I can't seem to do so.
I've tried:
from myproject import some_file
and also
from .. import some_file
I did manage to make it run using sys.path, but that doesn't seem to be the correct way to do things based on what I've read.
Secondly, when I did make it run using sys.path, I got an error that it couldn't find some_data.dat, which is used by some_file.py.
This is a perennial question from Python programmers. The issue is that Python doesn't play nicely with scripts that are inside of packages. The situation has improved a bit over the last few releases, but it still doesn't do the right thing a lot of the time.
I think the best answer is to restrict where you run your test_some_file.py from, and use the Python interpreter's -m parameter. That is, change into the directory above myproject, and then run python -m myproject.tests.test_some_file. This is the only way that will work without messing around with sys.path.
This will allow either of your import lines to work correctly. PEP 8 currently recommends using absolute imports always, so the first version is probably better than the relative version using ...
For cases like yours, I add the directory of some_file.py to sys.path (temporarily).
Code:
import sys
import os
# sys.argv[0] is the path to this script; one dirname() gives tests/,
# a second gives the myproject/ directory that contains some_file.py
dirname = os.path.dirname(os.path.dirname(sys.argv[0]))
sys.path.append(dirname)
import some_file

Best practice for handling path/executables in project scripts in Python (e.g. something like Django's manage.py, or fabric)

I do a lot of work on different projects (I'm a scientist) in a fairly standardised directory structure. e.g.:
project/
    analyses/
    lib/
    doc/
    results/
    bin/
I put all my various utility scripts in /bin/ because cleanliness is next to godliness. However, I have to hard code paths (e.g. ../../x/y/z) and then I have to run things within ./bin/ or they break.
I've used Django and that has /manage.py which runs various django-things and automatically handles the path. I've also used fabric to run various user defined functions.
Question: how do I do something similar, and what's the best way? I can easily write something in /manage.py to inject the root dir into sys.path etc., but then I'd like to be able to do "./manage.py foo" to run /bin/foo.py. Or is it possible to get fabric to call executables from a certain directory?
Basically - I want something easy and low maintenance. I want to be able to drop an executable script/file/whatever into ./bin/ and not have to deal with path issues or import issues.
What is the best way to do this?
Keep Execution at TLD
In general, try to keep your runtime at top-level. This will straighten out your imports tremendously.
If you have to do a lot of import addressing with relative imports, there's probably a better way.
Modifying The Path
Other posters have mentioned PYTHONPATH. That's a great way to do it permanently in your shell.
If you don't want to, or aren't able to, manipulate PYTHONPATH directly, you can use sys.path to get yourself out of relative-import hell.
Using sys.path.append
sys.path is just a list internally. You can append to it to add entries to your path.
Say I'm in bin/ and there's a library markdown in lib/. You can append a relative path to sys.path to import what you want.
import sys
sys.path.append('../lib')
import markdown
print(markdown.markdown("""
Hello world!
------------
"""))
Word to the wise: don't get too crazy with your sys.path additions. Keep your scheme simple to avoid a lot of confusion.
Overly eager path additions can also lead to circular imports, where a module ends up (indirectly) importing itself; at that point the import fails partway through, as sketched below.
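Here is that failure mode with hypothetical module names; importing either module pulls in the other before it has finished loading, and Python raises an ImportError about a partially initialized module:
# one.py
from two import helper  # triggers two.py, which circles back to one.py
def helper_for_two():
    pass
# two.py
from one import helper_for_two  # one.py hasn't finished loading: ImportError
def helper():
    pass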
Using Packages and __init__.py
Another great trick is creating Python packages by adding __init__.py files. __init__.py runs when the package is first imported, before any of the modules inside it, so it's a great place to add imports that apply across the entire directory. This makes it an ideal spot for sys.path hackery.
You don't necessarily need to add anything to the file, though. It's sufficient to just do touch __init__.py at the console to make a directory a package.
See this SO post for a more concrete example.
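For instance, a package's __init__.py might prepend the project root so that every module inside the package can use absolute imports (a sketch; adjust the number of dirname() calls to match your depth):
# bin/__init__.py -- make the project root importable for modules in bin/
import os
import sys
_ROOT = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if _ROOT not in sys.path:
    sys.path.insert(0, _ROOT)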
In a shell script that you source (not run) in your current shell, you set the following environment variables (exported, so that the python child process inherits them):
export PATH=$PATH:$PROJECTDIR/bin
export PYTHONPATH=$PROJECTDIR/lib
Then you put your Python modules and package tree in your project's ./lib directory. Python automatically adds the PYTHONPATH environment variable to sys.path.
Then you can run any top-level script from the shell without specifying the path, and any imports from your library modules are looked for in the lib directory.
I recommend very simple top-level scripts, such as:
#!/usr/bin/python
import sys
import mytool
mytool.main(sys.argv)
Then you never have to change the script; you just edit the module code, and you also benefit from byte-code caching.
You can easily achieve your goals by creating a mini package that hosts each of your projects. Use Paste Script to create a simple project skeleton, and to make it executable, install it via setup.py develop. Then your bin scripts just need to import the package's entry point and execute it, as in the sketch below.
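A sketch of such a bin script; myproject.commands.main is an assumed entry-point name:
#!/usr/bin/python
# bin/foo -- thin wrapper around the installed package's entry point
import sys
from myproject.commands import main
main(sys.argv)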
