Refactoring Python module configuration to avoid relative imports

This is related to a previous question of mine.
I understand how to store and read configuration files. There are choices such as ConfigParser and ConfigObj.
Consider this structure for a hypothetical 'eggs' module:
eggs/
    common/
        __init__.py
        config.py
    foo/
        __init__.py
        a.py
'eggs.foo.a' needs some configuration information. What I am currently doing is, in 'a', import eggs.common.config. One problem with this is that if 'a' is moved to a deeper level in the module tree, the relative imports break. Absolute imports don't, but they require your module to be on your PYTHONPATH.
A possible alternative to the above absolute import is a relative import. Thus, in 'a',
from ..common import config
Without debating the merits of relative vs absolute imports, I was wondering about other possible solutions?
edit- Removed the VCS context

"imports ... require your module to be on your PYTHONPATH"
Right.
So, what's wrong with setting PYTHONPATH?

The require function from pkg_resources may be what you need.
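A minimal sketch of how that might look, assuming an 'eggs' distribution has been installed (e.g. with setup.py develop) so that pkg_resources can locate it:
import pkg_resources
pkg_resources.require("eggs")    # locates the distribution and puts it on sys.path

from eggs.common import config   # now importable regardless of the working directory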

As I understand it from this and your previous question, you only need one path to be in sys.path. If we are talking about git as the VCS (mentioned in the previous question), only one branch is checked out at any time (single working directory), and you can switch and merge branches as frequently as you like.

I'm thinking of something along the lines of a more 'push-based' kind of solution. Instead of importing the shared objects (be they for configuration, or utility functions of some sort), have the top-level init export it, and each intermediate init import it from the layer above, and immediately re-export it.
I'm not sure if I've got the python terminology right, please correct me if I'm wrong.
Like this, any module that needs to use the shared object (which in the context of this example represents configuration information) simply imports it from the init at its own level.
Does this sound sensible/feasible?
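A minimal sketch of that re-export chain, assuming the 'eggs' layout from the question:
# eggs/__init__.py -- the top-level init exports the shared config object
from eggs.common import config

# eggs/foo/__init__.py -- each intermediate init re-imports from the layer above
from .. import config

# eggs/foo/a.py -- consumers import from the init at their own level
from . import config
With this chain, if 'a' later moves deeper, only the intervening __init__.py files need the one-line re-export; 'a' itself is unchanged.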

You can trick the import mechanism by adding each subdirectory to eggs/__init__.py:
import os
__path__.append(os.path.join(__path__[0], "common"))
__path__.append(os.path.join(__path__[0], "foo"))
then you simply import all modules from the eggs namespace; e.g. import eggs.bar (provided you have the file eggs/foo/bar.py).
Note that foo and common should not be packages themselves - in other words, they should not contain an __init__.py file.
This solution completely solves the issue of eventually moving files around; however it flattens the namespace and therefore it may not be as good, especially in big projects - personally, I prefer full name resolution.

Related

Why does python packaging documentation say __init__.py file of package should be empty?

Based on the documentation for Packaging Python Projects, __init__.py should be empty. I want to know why, because I'm placing certain objects in the __init__.py file that are used in every module in the package. On checking a bunch of __init__.py files in my local environment for standard packages like importlib, multiprocessing, etc., all of them have a bunch of code in the file.
The purpose of __init__.py is to indicate that the folder containing it is a package and should be treated as one; that's why it is recommended to leave it empty.
Consider the following hierarchy:
foo/
    __init__.py
    bar.py
When you use from foo import bar or import foo.bar, the Python interpreter looks for __init__.py in the foo folder; if it finds it, the bar module is imported, otherwise it isn't. However, this behavior has changed over time (with namespace packages, the import may succeed even if __init__.py is missing), but remember the Zen of Python: "Explicit is better than implicit", so it's always safe to have it.
But if you need some package-level variables to be defined, you can do it inside the __init__.py file, and all the modules inside the package will be able to use them.
And in fact, if you look at PEP 257, it mentions that __init__.py can also contain package-level documentation.
You're taking that statement as more general than it's meant. You're reading a statement from a tutorial, where they walk you through creating a simple example project. That particular example project's __init__.py should be empty, simply because the example doesn't need to do anything in __init__.py.
Most projects' __init__.py files will not be empty. Taking a few examples from popular packages, such as numpy, requests, flask, sortedcontainers, or the stdlib asyncio, none of these example __init__.py files are empty. They may perform package initialization, import things from submodules into the main package namespace, or include metadata like __all__, __version__, or a package docstring. The example project is just simplified to the point where it doesn't have any of that.
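For illustration, a hypothetical non-empty __init__.py combining these uses (all names here are invented):
# mypkg/__init__.py
"""mypkg: a small example package (package-level docstring, per PEP 257)."""

__version__ = "1.0.0"                  # package metadata

from mypkg.submodule import do_thing   # promote a submodule name to the package namespace

__all__ = ["do_thing"]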
To my knowledge, there are three things you need to be aware of when you create a non-empty __init__ file:
it might be more difficult to follow the code. If you instantiate a = B() in __init__.py it's even worse. I know developers who dislike non-empty init files for this reason alone
on package import, the contents of __init__.py are evaluated. Sometimes that is computation-heavy or simply not needed.
namespace conflicts. You can't really define a name bar in __init__.py and also have a bar.py file in your package; see the sketch after this list.
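A small illustration of the third point, assuming a hypothetical package foo whose __init__.py defines bar = 42 while a module foo/bar.py also exists:
import foo
print(foo.bar)     # 42 -- the value from __init__.py

import foo.bar     # loading the submodule rebinds the attribute on the package...
print(foo.bar)     # <module 'foo.bar'> -- the int is now shadowed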
I like importing package contents in __init__ as otherwise in bigger projects import statements become ugly. Overall it's not a good or bad practice. This advice applies only to the project in this particular example.
In some cases, you don't have any shared components in your package. Suppose you define a little package for calculating some algorithms; then you don't need anything shared in your __init__.py.

importing a python module from another repo

I have written a python module. It resides in a repo. Now to get the unit tests for the module to run, they can't be subfolder of that module. The suggested structure to fix this is:
ModuleRepo/
    MyModule/
        __init__.py
        some_utils.py         (does: from . import some_helpers)
        some_helpers.py
    Tests/
        some_utils_test.py    (does: import MyModule.some_utils)
        some_helpers_test.py  (does: import MyModule.some_helpers)
Now that works just fine. If I run the tests, they are able to import the module. The module is able to import its own files (e.g. some_helpers) with a relative import, the leading '.' indicating the local package.
The issue is that a different repo now wants to share this module and I don't know how to make it find the module.
e.g.:
ModuleRepo/
    MyModule/
        __init__.py
        ...
    Tests/
        ...
AnotherRepo/
    using_my_module.py    (import ??? <-- how to find MyModule?)
NOT WORKING ATTEMPT #1:
I tried initially to include ModuleRepo under AnotherRepo using git's submodule functionality. However, I don't actually want the root folder of ModuleRepo, I want the subfolder 'MyModule' only. It turns out git submodule doesn't do that - one can't choose only a part of a repo to include.
UNDESIRABLE SYMLINK: While a symlink might work, it's not something one can commit to a repository, and so is somewhat undesirable. Additionally, I am developing on both Windows and Linux, so I need a solution which works on both.
POSSIBLE SOLUTION: Turn the ModuleRepo root into a package too (adding an __init__.py). Then I could use git to make it a submodule of AnotherRepo. My import would be ugly, but it would be import ModuleRepo.MyModule.some_utils instead of import MyModule.some_utils.
Does anyone have any better solutions?
Several possibilities.
Tweak the sys.path variable somewhere at the top level of your code so that the ModuleRepo directory is listed there. The upside is that this approach works with any solution you use to keep these two repositories alongside one another - be it submodules or subtree merging. The downside is that you'd need to repeat this tweak in the unit tests of the code in using_my_module.py.
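A minimal sketch of that tweak, assuming the two repositories are checked out side by side (the layout is hypothetical):
# AnotherRepo/using_my_module.py
import os
import sys

# climb out of AnotherRepo and point at the sibling ModuleRepo checkout
_here = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, os.path.join(os.path.dirname(_here), "ModuleRepo"))

import MyModule.some_utils   # resolvable now that ModuleRepo is on sys.path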
Use virtualenv for development. Part of setting up the development environment for the project would then be installing "MyModule" the regular way (e.g. with pip).
If the "MyModule" module is not in flux and/or you are okay with
manual periodical incorporation of the developments happening
in "MyModule" into your "main" code base, you can go with so-called "vendoring" by means of using git subtree split and
git subtree add commands (or the so-called "subtree merging" instead
of the latter).
Basically, the git subtree split command allows you to
extract out of the repo hosting "MyModule"
a synthetic history graph containing only the commits which touch
files under the specified prefix — "MyModule", in your case,
and the git subtree add allows you to "subtree-merge" that subgraph
at the specified prefix in another repository.
git subtree command,
and its manual page.
The "subtree" merge strategy.

Imports with complex folder hierarchy

Hey, I'm working on a project that has a set of hierarchical modules, with a folder structure set up like so:
module_name/
    __init__.py
    constants.py
    utils.py
    class_that_all_submodules_need_access_to.py
    sub_module_a/
        __init__.py
        sub_module_a_class_a.py
        sub_module_a_class_b.py
        useful_for_sub_module_a/
            __init__.py
            useful_class_a.py
            useful_class_b.py
    sub_module_b/
        __init__.py
        sub_module_b_class_a.py
        sub_module_b_class_b.py
    etc...
The problem is, I can't figure out how to set up the imports in the __init__.py files so that I can access class_that_all_submodules_need_access_to.py from sub_module_a/useful_for_sub_module_a/useful_class_a.py.
I've tried looking this up on Google/Stack Overflow to exhaustion and I've come up short. The peculiar thing is that PyCharm has the paths set up in such a way that I don't encounter this problem when working on the project in PyCharm, only from other environments.
So here's one particularly inelegant solution that I've come up with. My sub_module_a/useful_for_sub_module_a/__init__.py looks like:
import sys
import os
sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..', '..', '..')))
import module_name
This is similar in sub_module_*/, where instead of 3 '..'s it's just two (i.e. '..', '..' instead of '..', '..', '..' for the sys.path.insert line above). And then in sub_module_a/useful_for_sub_module_a/useful_class_a.py, I have to import module_name/constants.py (and others) like this:
import module_name.constants
from module_name.class_that_all_submodules_need_access_to import ImportantClass
While this solution works, I was wondering if there is a better/more elegant way to set up the imports and/or folder hierarchy? I'm concerned about messing with the python system path for users of this module. Is that even a valid concern?
There are two kinds of Python imports: absolute and relative (see the Python documentation on imports). But the first important thing is to understand how Python finds your package. If you just put your package under your home folder, Python knows nothing about it. You can check this blog post on how Python finds packages: https://leemendelowitz.github.io/blog/how-does-python-find-packages.html
Thus, to import your modules, the first thing is to let Python know about your package. Given the locations where Python searches for packages, there are two common ways to accomplish this goal:
(1) The PYTHONPATH environment variable. Set this variable in your environment configuration file, e.g. .bash_profile. This is also the simplest way.
(2) Use setuptools, which can help you distribute your package. This is a long story; I would not suggest choosing it unless you would like to distribute your package in the future. However, it is worth knowing about.
If you set your path correctly, Python imports are straightforward. If you would like to use an absolute import, try
from module_name import class_that_all_submodules_need_access_to
If you would like to use a relative import, it depends on which module you are in. Suppose you are writing the module module_name.sub_module_a.sub_module_a_class_a; then try
from .class_that_all_submodules_need_access_to import XX_you_want
Note that relative imports support only the from .xx import yy form.
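For the deeper module from the question, the relative form would look like this (a sketch, assuming the code is imported as part of the package rather than run as a standalone script):
# module_name/sub_module_a/useful_for_sub_module_a/useful_class_a.py
# three dots climb two package levels up, to module_name itself
from ...class_that_all_submodules_need_access_to import ImportantClass
from ... import constants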
Thanks.

How do I structure my Python project to allow named modules to be imported from sub directories

This is my directory structure:
Projects
+ Project_1
+ Project_2
- Project_3
- Lib1
__init__.py # empty
moduleA.py
- Tests
__init__.py # empty
foo_tests.py
bar_tests.py
setpath.py
__init__.py # empty
foo.py
bar.py
Goals:
1. Have an organized project structure
2. Be able to independently run each .py file when necessary
3. Be able to reference/import both sibling and cousin modules
4. Keep all import/from statements at the beginning of each file
I achieved #1 by using the above structure.
I've mostly achieved #2, #3, and #4 by doing the following (as recommended by this excellent guide).
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath  # Annoyingly, PyCharm warns me that this is an unused import statement
import foo
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below work when running bar_tests.py directly.
import setpath
import Project_3.foo  # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind-the-scenes magic with the Python path variable to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environment variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.
Those types of questions qualify as "primarily opinion based", so let me share my opinion how I would do it.
First "be able to independently run each .py file when necessary": either the file is an module, so it should not be called directly, or it is standalone executable, then it should import its dependencies starting from top level (you may avoid it in code or rather move it to common place, by using setup.py entry_points, but then your former executable effectively converts to a module). And yes, it is one of weak points of Python modules model, that causes misunderstandings.
Second, use virtualenv (or venv in Python 3) and put each of your Project_x into a separate one. This way the project's name won't be part of the Python module path.
Third, the link you've provided mentions setup.py - you may make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py, and finally put your modules into src/mylib1/mylib1/module.py. Then you may install your code with pip like any other package (or with pip install -e, so you may work on the code directly without reinstalling it, though that unfortunately has some limitations).
And finally, as you've confirmed in a comment already ;). The problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you'd mistakenly used Python's relative-import notation, which is incorrect for filesystem paths and should be replaced with '../..' to work as expected.
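In other words, the corrected setpath.py would be (the same file as in the question, with only the last line changed):
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('../..'))  # filesystem notation, not '...'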
I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when necessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top-level folder (e.g. projects) using the -m flag to the interpreter to give it a dotted path to the main module, and using explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m project_3.tests.foo_tests from the projects folder (or python -m tests.foo_tests from within project_3 perhaps), and have foo_tests.py use from .. import foo.
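A sketch of what foo_tests.py might contain under this option, assuming you launch it as python -m project_3.tests.foo_tests from the Projects folder (the run() function is hypothetical):
# project_3/tests/foo_tests.py
from .. import foo        # explicit relative import of a cousin module

if __name__ == '__main__':
    foo.run()             # hypothetical test entry point in foo.py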
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system-wide basis (e.g. add the projects folder to the PYTHONPATH environment variable), and then use absolute imports for all your modules (e.g. import project_3.foo). This is effectively what your setpath module does, but doing it system-wide as part of your system's configuration, rather than at run time, is much cleaner. It also avoids the multiple names that setpath allows you to use to import a module (e.g. try import foo_tests and import tests.foo_tests and you'll get two separate copies of the same module).

Add path to python package to sys.path

I have a case for needing to add a path to a python package to sys.path (instead of its parent directory), but then refer to the package normally by name.
Maybe that's weird, but let me exemplify what I need and maybe you guys know how to achieve that.
I have all kind of experimental folders, modules, etc inside a path like /home/me/python.
Now I don't want to add that folder to my sys.path (PYTHONPATH), since there are experimental modules whose names could clash with something useful.
But inside /home/me/python I want to have a folder like pyutils. So I want to add /home/me/python/pyutils to PYTHONPATH, but be able to refer to the package by its name pyutils... like I would if I had added /home/me/python to the path.
One helpful fact is that adding something to the python path is different from importing it into your interpreter. You can structure your modules and submodules such that names will not clash.
Look at the documentation on how to create modules. I read the documents, but I think that module layout takes a little learning-by-doing: create your modules, import them into scripts, and see if the importing is awkward or requires too much qualification.
Separately, consider the Python import system. When you import something, you can use the import ... as feature to name it something different as you import, and thereby prevent naming clashes.
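For example (both package names here are hypothetical):
# both packages define a module named utils; aliasing avoids the clash
import pyutils.utils as putils
import experiments.utils as exutils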
You seem to have already understood how you can change the search path using sys.path, as documented in the sys module.
You have a number of options:
Make a new directory pyutilsdir, place pyutils in pyutilsdir, and then add pyutilsdir to PYTHONPATH.
Move the experimental code outside of /home/me/python and add python to your PYTHONPATH.
Rename the experimental modules so their names do not clash with other modules, then add python to PYTHONPATH.
Use a version control system like git or hg to make the experimental modules available or unavailable as desired. You could have a master branch without the experimental modules and a feature branch that includes them. With git, for example, you could switch between the two with
git checkout [master|feature]
The contents of /home/me/python/pyutils (the git repo directory) would change depending on which commit is checked out. Thus, using version control, you can keep the experimental modules in pyutils, but only make them present when you check out the feature branch.
I'll answer my own question, since I got an idea while writing it, and maybe someone will need it.
I added a symlink from that folder into my site-packages folder, like this:
ln -s /home/me/python/pyutils /path/to/site-packages/pyutils
Then, since PYTHONPATH contains the /path/to/site-packages folder, and I have a pyutils folder in it with an __init__.py, I can just import like:
from pyutils import mymodule
And the rest of /home/me/python is not on the PYTHONPATH.
