ADVICE REQUEST: Best Practices for Python Imports
I need advice re: how major projects do Python imports and the standard / Pythonic way to set this up.
I have a project Foobar that's a subproject of another, much larger and well-used project Dammit. My project directory looks like this:
/opt/foobar
/opt/foobar/bin/startFile.py
/opt/foobar/bin/tests/test_startFile.py
/opt/foobar/foobarLib/someModule.py
/opt/foobar/foobarLib/tests/test_someModule.py
So, for the Dammit project, I add to PYTHONPATH:
export PYTHONPATH=$PYTHONPATH:/opt/foobar
This lets me add a very clear, easy-to-track-down import of:
from foobarLib.someModule import SomeClass
to all 3 of:
* Dammit - using foobarLib in the import makes everyone know it's not in Dammit project, it's in Foobar.
* My startFile.py is an adjunct process that has the same pythonpath import.
* test_someModule.py has an explicit import, too.
PROBLEM: This only works for foobarLib, not bin.
I have tests I want to run on in /opt/foobar/bin/tests/test_startFile.py. But, how to set up the path and do the import? Should I do a relative import like this:
PROJ_ROOT = os.path.abspath(os.path.join(os.path.dirname(__file__),".."))
sys.path.insert(0, PROJ_ROOT)
Or, should I rename the bin dir to be foobarBin so I do the import as:
from foobarBin.startFile import StartFile
I'm tempted to do the following:
/opt/foobarProject/foobar
/opt/foobarProject/foobar/bin/startFile.py
/opt/foobarProject/foobar/bin/tests/test_startFile.py
/opt/foobarProject/foobar/foobarLib/someModule.py
/opt/foobarProject/foobar/foobarLib/tests/test_someModule.py
then, I can do all my imports with:
import foobar.bin.startFile
import foobar.lib.someModule
Basically, for large Python projects (AND THEIR ADJUNCTS), it seems to me we have goals of:
minimize number of dirs added to pythonpath;
make it obvious where imports are coming from. That is, people use 'import lib.thing' a lot, and if there's more than one directory in the pythonpath named 'lib',
it's troublesome / non-obvious where that is, short of invoking
python and searching sys.modules, etc.
minimize number of times we add paths to sys.path since that's runtime and somewhat non-obvious.
I've used Python a long time, and been part of projects that do it various ways, some of them more stupid and some less so. I guess I'm wondering if there is a best-practices here, and the reason for it? Is my convention of adding foobarProject layer okay, good, bad, or just one more way among many?
Related
root
- Module_1
- utils.py
- Module_2
- basic.py
I have a couple of functions in utils.py which I want to use in basic.py, I tried every possible option but not able to do it, I referred many answers, I don't want to use sys path. Is it possible, thank u in advance (:
If I have to add __init__.py, where all should I add it??
In fact there's multiple ways.
I personally would suggest a way, where you do not have to mess with the environment variable PYTHONPATH and where you do not have to change sys.path
The simplest suggestion and preferred if possible in your context
If you can, then just move Module_2/basic.py one level up, so that it is located in the parent directory of Module_1 and Module_2.
create an empty file Module_1/__init__.py
and use following import in basic.py
from Module_1.utils import function
My second preferred suggestion:
I'd suggest to create two empty files.
Module_1/__init__.py
Module_2/__init__.py
By the way I hope, that the real directory names do not contain any uppercase characters. This is strongly discouraged. compared to python2 python 3 contains no (or almost no) module with uppercase letters anymore. So I'd try to keep the same spirit.
then in basic just write
from Module_1.utils import function
!!! But:
Instead of typing
python Module_2/basic.py
you had to be in the parent directory of Module_1 and Module_2
python -m Module_2.basic
or if under a unix like operation system you could also type
PYTHONPATH="/path/to/parent_of_Mdule_1" python Module_2/basic.py
My not so preferred suggestion:
It looks simpler but is a little more magic and might not be desired with bigger projects in a bigger contents.
It seems simpler, but for bigger projects you might get lost and when you start using coding style checkers like flake8 you will get coding style warnings
Just add at the beginning of Module_2/basic.py following lines.
import os
import sys
TOPDIR = os.path.dirname(os.path.dirname(os.path.realpath(__file__)))
sys.path.insert(0, TOPDIR)
and import then the same way as in my first solution
from Module_1.utils import function
If you want to make a custom package importable the, put __init__.py in that dir. In your case, __init__.py should be in Module_1 dir. Similarly, if you want utilise the code of basic.py somewhere else then put __init__.py in Module_2 dir.
try add __init__.py in Module_1 folder and import in basic.py like this:
from Module_1.utils import func
I have a Python package called Util. It includes a bunch of files. Here is the include statements on top of one of the files in there:
from config_util import ConfigUtil #ConfigUtil is a class inside the config_util module
import error_helper as eh
This works fine when I run my unit tests.
When I install Util in a virtual environment in another package everything breaks. I will need to change the import statements to
from Util.config_util import ConfigUtil
from Util import error_helper as eh
and then everything works as before. So is there any point in using the first form or is it safe to say that it is better practice to always use the second form?
If there is no point in using the first form, then why is it allowed?
Just wrong:
from config_util import ConfigUtil
import error_helper as eh
It will only work if you happen to be in the directory Util, so that the imports resolve in the current working directory. Or you have messed with sys.path using some bad hack.
Right (using absolute imports):
from Util.config_util import ConfigUtil
import Util.error_helper as eh
Also right (using relative imports):
from .config_util import ConfigUtil
import .error_helper as eh
There is no particular advantage to using relative imports, only a couple of minor things I can think of:
Saves a few bytes in the source file (so what / who cares?)
Enables you to rename the top level without editing import statements in source code (...but how often do you do that?)
For your practical problems, maybe this answer can help you.
Regarding your direct question: there's not a lot to it, but they let you move files and rename containing directories more easily. You may also prefer relative imports for stylistic reasons; I sure do.
The semantics are the same if the paths are correct. If your module is foo.bar, then from foo.bar.baz import Baz and from .baz import Baz are the same. If they don't do the same, then you're likely calling your Python file as a script (python foo/bar.py), in which case it will be module __main__ instead of foo.bar.
I have flask web app and its structure is as follows:
/app
/__init__.py
/wsgi.py
/app
/__init__.py
/views.py
/models.py
/method.py
/common.py
/db_client.py
/amqp_client.py
/cron
/__init.py__
/daemon1.py
/daemon2.py
/static/
/main.css
/templates/
/base.html
/scripts
/nginx
/supervisor
/Dockerfile
/docker-compose.yml
In app/app/cron i have written standalone daemons which i want to call outside the docker. e.g.
python daemon1.py
daemon1.py code
from ..common import stats
from ..method import msapi, dataformater
from ..db_client import db_connection
def run_daemon():
......
......
......
if name =="main":
run_daemon()
So when i am trying to run this daemon1.py its throwing ValueError: Attempted relative import in non-package
Please suggest the right approach for import as well as to structure these daemons.
Thanks in advance.
I ran into the exact same problem with an app that was running Flask and Celery. I spent far too many hours Googling for what should be an easy answer. Alas, there was not.
I did not like the "python -m" syntax, as that was not terribly practical for calling functions within running code. And on account of of my seemingly small brain, I was not able to come to grips with any of the other answers out there.
So...there is the wrong way and the long way. Both of them work (for me) and I'm sure I'll get a tongue lashing from the community.
The Wrong Way
You can call a module directly using the imp package like so:
import imp
common = imp.load_source('common', os.path.dirname(os.path.abspath('__file__')) + '/common.py')
result = common.stats() #not sure how you call stats, but you hopefully get the idea
I had a quick search for the references that said that is a no-no, but I can't find them...sorry.
The Long Way
This method involves temporarily appending each of your modules to you PATH. This has worked for me on my Docker deploys and works nicely regardless of the container's directory structure. Here are the steps:
1) You must import the relevant modules from the parent directories in your __init__ files. This is really the whole point of the __init__ - allowing the modules in its package to be callable. So, in your case, cron/__init__ should contain:
from . import common
It doesn't look like your directories go any higher than that, but you would do the same for any other packages levels up as well.
2) Now you need to append the path of the module to the PATH variable. You can see what is in there right now by running:
sys.path
As expected, you probably won't see any of your modules in there. That means, that Python can't figure out what you want when you call the common module. In order to add the path, you need to figure out the directory structure. You will want to make this dynamic to account for changing directories.
It's worth noting that this will need to run each time your module runs. I'm not sure what your cron module is, but in my case it is Celery. So, this runs only when I fire up workers and the initial crontabs.
Here is the hack I threw together (I'm sure there is a cleaner way to do it):
curr_path = os.getcwd() #current path where cron is running
parrent_path = os.path.abspath(os.path.join(os.getcwd(), '..')) #the parent directory path
parrent_dir = os.path.basename(os.path.abspath(parrent_path)) #the parent directory name
while parrent_dir <> 'project_name': #loop until you get to the top directory - should be the project name
parrent_path = os.path.abspath(os.path.join(par_path, '..'))
parrent_dir = os.path.basename(os.path.abspath(parrent_path))
In your case, this might be a challenge since you have two directories named 'app'. Your top level 'app' is my 'project_name'. For the next step, let's assume you have changed it to 'project_name'.
3) Now you can append the path for each of your modules to the PATH variable:
sys.path.append(parrent_dir + '/app')
Now if you run sys.path again, you should see the path to /app in there.
In summary: make sure all of your __init__'s have imports, determine the paths to the modules you want to import, append the paths to the PATH variable.
I hope that helps.
#greenbergé Thank you for your solution. i tried but didn't worked for me.
So to make things work I have changed my code little bit. Apart from calling run_daemon() in main of daemon1.py, i have called function run_daemon() directly.
python -m 'from app.cron.daemon1 import run_daemon(); run_daemon()'
As it is not exact solution of the problem but things worked for me.
Is it ok to use a module referencing with more than two dots in a path? Like in this example:
# Project structure:
# sound
# __init__.py
# codecs
# __init__.py
# echo
# __init__.py
# nix
# __init__.py
# way1.py
# way2.py
# way2.py source code
from .way1 import echo_way1
from ...codecs import cool_codec
# Do something with echo_way1 and cool_codec.
UPD: Changed the example. And I know, this will work in a practice. But is it a common method of importing or not?
update Nov. 24,2020
If you wanna dig deeper in python's relative-import, I strongly recommend you this answer.
Is it ok to use a module referencing with more than two dots in a path?
Yes. You can use multiple dots in relative import path, but it is only feasible when using from xxx import yyy syntax, not import xxx syntax. Moreover, single dot, two dots and three dots mean current directory, parent directory and grandparent directory respectively, and so on.
And I know, this will work in a practice. But is it a common method of importing or not?
It depends. If your project has complex directory structure, using absolute import would be "disgusting". For example,
from sub1.sub2.sub3.sub4.sub5 import yourmethod
. In this case, using relative import will make your code clean and neat. Maybe look like
from ...sub5 import yourmethod
From PEP8:
Absolute imports are recommended, as they are usually more readable and tend to be better behaved (or at least give better error messages) if the import system is incorrectly configured (such as when a directory inside a package ends up on sys.path):
import mypkg.sibling
from mypkg import sibling
from mypkg.sibling import example
However, explicit relative imports are an acceptable alternative to absolute imports, especially when dealing with complex package layouts where using absolute imports would be unnecessarily verbose:
from . import sibling
from .sibling import example
Standard library code should avoid complex package layouts and always use absolute imports.
I am importing a lot of different scripts, so at the top of my file it gets cluttered with import statements, i.e.:
from somewhere.fileA import ...
from somewhere.fileB import ...
from somewhere.fileC import ...
...
Is there a way to move all of these somewhere else and then all I have to do is import that file instead so it's just one clean import?
I strongly advise against what you want to do. You are doing the global include file mistake again. Although only one module is importing all your modules (as opposed to all modules importing the global one), the remaining point is that if there's a valid reason for all those modules to be collected under a common name, fine. If there's no reason, then they should be kept as separate includes. The reason is documentation. If I open your file, and see only one import, I don't get any information about what is imported and where it comes from. If on the other hand, I have the list of imports, I know at a glance what is needed and what not.
Also, there's another important error I assume you are doing. When you say
from somewhere.fileA import ...
from somewhere.fileB import ...
from somewhere.fileC import ...
I assume you are importing, for example, a class, like this
from somewhere.fileA import MyClass
this is wrong. This alternative solution is much better
from somewhere import fileA
<later>
a=fileA.MyClass()
Why? two reasons: first, namespacing. If you have two modules having a class named MyClass, you would have a clash. Second, documentation. Suppose you use the first option, and I find in your code the following line
a=MyClass()
now I have no idea where this MyClass comes from, and I will have to grep around all your files in order to find it. Having it qualified with the module name allows me to immediately understand where it comes from, and immediately find, via a /search, where stuff coming from the fileA module is used in your program.
Final note: when you say "fileA" you are doing a mistake. There are modules (or packages), not files. Modules map to files, and packages map to directories, but they may also map to egg files, and you may even create a module having no file at all. This is naming of concepts, and it's a lateral issue.
Of course there is; just create a file called myimports.py in the same directory where your main file is and put your imports there. Then you can simply use from myimports import * in your main script.