importing a python module from another repo - python

I have written a python module. It resides in a repo. Now to get the unit tests for the module to run, they can't be subfolder of that module. The suggested structure to fix this is:
ModuleRepo/
MyModule/
__init__.py
some_utils.py
import .some_helpers
some_helpers.py
Tests/
some_utils_test.py
import MyModule.some_utils
some_helpers_test.py
import MyModule.some_helpers
Now that works just fine. If I run the tests, they are able to import the module. The module is able to import its own files (eg: some_helpers) by pre-pending '.' to indicate the local folder.
The issue is that a different repo now wants to share this module and I don't know how to make it find the module.
eg:
ModuleRepo/
MyModule/
__init__.py
...
Tests/
...
AnotherRepo/
using_my_module.py
import ??? <-- how to find MyModule?
NOT WORKING ATTEMPT #1:
I tried initially to include the ModuleRepo under AnotherRepo using git's submodule functionality. However I dont actually want the root folder for ModuleRepo, I want the subfolder 'MyModule' only. It turns out git's submodule doesn't do that - one cant choose only a part of a repo to include.
UNDESIRABLE SYMLINK: While a symlink might work, its not something one can 'commit' to a repository and so is somewhat undesirable. Additionally I am developing both on Windows and Linux, so I need a solution which works on both.
POSSIBLE SOLUTION: Turn ModuleRepo root into a module too (adding an init.py). Then I could use git to make it a submodule of AnotherRepo. My import would be ugly but it would be: import my.module.some_utils instead of import mymodule.some_utils
Does anyone have any better solutions?

Several possibilities.
Tweak the sys.path
variable somewhere at the top level of your code to make the ModuleRepo directory listed there.
The upside is that this approach works with any solution you can
use to have these two repositories aside of one another — be it
submodules or subtree merging.
The downside is that you'd need to repeat this tweak also in the
test unit(s) of the code in using_my_module.py.
Use virtualenv for
development.
A part of the setting up the development environment for the project
would be installing of "MyModule" "the regular way".
If the "MyModule" module is not in flux and/or you are okay with
manual periodical incorporation of the developments happening
in "MyModule" into your "main" code base, you can go with so-called "vendoring" by means of using git subtree split and
git subtree add commands (or the so-called "subtree merging" instead
of the latter).
Basically, the git subtree split command allows you to
extract out of the repo hosting "MyModule"
a synthetic history graph containing only the commits which touch
files under the specified prefix — "MyModule", in your case,
and the git subtree add allows you to "subtree-merge" that subgraph
at the specified prefix in another repository.
git subtree command,
and its manual page.
The "subtree" merge strategy.

Related

Importing modules from adjacent folders in Python

I wanted to ask if there is a way to import modules/functions from a folder in a project that is adjacent to another folder.
For example lets say I have two files:
project/src/training.py
project/lib/functions.py
Now both these folders have the __init__.py file in them. If I wanted to import functions.py into training.py, it doesn't seem to detect. I'm trying to use from lib.functions import * .I know this works from the upper level of the folder structure, where I can call both files from a script, but is there a way to do it files in above/sideways folders?
Fundamentally, the best way of doing this depends on exactly how the two modules are related. There's two possibilities:
The modules are part of one cohesive unit that is intended to be used as a single whole, rather than as separate pieces. (This is the most common situation, and it should be your default if you're not sure.)
The functions.py module is a dependency for training.py but is intended to be used separately.
Cohesive unit
If the modules are one cohesive unit, this is not the standard way of structuring a project in Python.
If you need multiple modules in the same project, the standard way of structuring the folders is to include all the modules in a single package, like so:
project/
trainingproject/
__init__.py
training.py
functions.py
other/
...
project/
...
folders/
...
The __init__.py file causes Python to recognize the trainproject/ directory as a single unit called a package. Using a package enables to use of relative imports:
training.py
from . import functions
# The rest of training.py code
Assuming your current directory is project, you can then invoke training.py as a module:
python -m trainingproject.training
Separate units
If your modules are actually separate packages, then the simplest idiomatic solutions during development is to modify the PYTHONPATH environment variable:
sh-derviative syntax:
# All the extra PYTHONPATH references on the right are to append if it already has values
# Using $PWD ensures that the path in the environment variable is absolute.
PYTHONPATH=$PYTHONPATH${PYTHONPATH:+:}$PWD/lib/
python ./src/training.py
PowerShell syntax:
$env:PYTHONPATH = $(if($env:PYTHONPATH) {$env:PYTHONPATH + ';'}) + (Resolve-Path ./lib)
python ./src/training.py
(This is possible in Command Prompt, too, but I'm omitting that since PowerShell is preferred.)
In your module, you would just do a normal import statement:
training.py
import functions
# Rest of training.py code
Doing this will work when you deploy your code to production as well if you copy all the files over and set up the correct paths, but you might want to consider putting functions.py in a wheel and then installing it with pip. That will eliminate the need to set up PYTHONPATH by installing functions.py to site-packages, which will make the import statement just work out of the box. That will also make it easier to distribute functions.py for use with other scripts independent of training.py. I'm not going to cover how to create a wheel here since that is beyond the scope of this question, but here's an introduction.
Yes, it’s as simple as writing the entire path from the working directory:
from project.src.training import *
Or
from project.lib.functions import *
I agree with what polymath stated above. If you were also wondering how to run these specific scripts or functions once they are imported, use: your_function_name(parameters), and to run a script that you have imported from the same directory, etc, use: exec(‘script_name.py). I would recommend making functions instead of using the exec command however, because it can be a bit hard to use correctly.

How do I structure my Python project to allow named modules to be imported from sub directories

This is my directory structure:
Projects
+ Project_1
+ Project_2
- Project_3
- Lib1
__init__.py # empty
moduleA.py
- Tests
__init__.py # empty
foo_tests.py
bar_tests.py
setpath.py
__init__.py # empty
foo.py
bar.py
Goals:
Have an organized project structure
Be able to independently run each .py file when necessary
Be able to reference/import both sibling and cousin modules
Keep all import/from statements at the beginning of each file.
I Achieved #1 by using the above structure
I've mostly achieved 2, 3, and 4 by doing the following (as recommended by this excellent guide)
In any package that needs to access parent or cousin modules (such as the Tests directory above) I include a file called setpath.py which has the following code:
import os
import sys
sys.path.insert(0, os.path.abspath('..'))
sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('...'))
Then, in each module that needs parent/cousin access, such as foo_tests.py, I can write a nice clean list of imports like so:
import setpath # Annoyingly, PyCharm warns me that this is an unused import statement
import foo.py
Inside setpath.py, the second and third inserts are not strictly necessary for this example, but are included as a troubleshooting step.
My problem is that this only works for imports that reference the module name directly, and not for imports that reference the package. For example, inside bar_tests.py, neither of the two statements below work when running bar_tests.py directly.
import setpath
import Project_3.foo.py # Error
from Project_3 import foo # Error
I receive the error "ImportError: No module named 'Project_3'".
What is odd is that I can run the file directly from within PyCharm and it works fine. I know that PyCharm is doing some behind the scenes magic with the Python Path variable to make everything work, but I can't figure out what it is. As PyCharm simply runs python.exe and sets some environmental variables, it should be possible to clone this behavior from within a Python script itself.
For reasons not really germane to this question, I have to reference bar using the Project_3 qualifier.
I'm open to any solution that accomplishes the above while still meeting my earlier goals. I'm also open to an alternate directory structure if there is one that works better. I've read the Python doc on imports and packages but am still at a loss. I think one possible avenue might be manually setting the __path__ variable, but I'm not sure which one needs to be changed or what to set it to.
Those types of questions qualify as "primarily opinion based", so let me share my opinion how I would do it.
First "be able to independently run each .py file when necessary": either the file is an module, so it should not be called directly, or it is standalone executable, then it should import its dependencies starting from top level (you may avoid it in code or rather move it to common place, by using setup.py entry_points, but then your former executable effectively converts to a module). And yes, it is one of weak points of Python modules model, that causes misunderstandings.
Second, use virtualenv (or venv in Python3) and put each of your Project_x into separate one. This way project's name won't be part of Python module's path.
Third, link you've provided mentions setup.py – you may make use of it. Put your custom code into Project_x/src/mylib1, create src/mylib1/setup.py and finally your modules into src/mylib1/mylib1/module.py. Then you may install your code by pip as any other package (or pip -e so you may work on the code directly without reinstalling it, though it unfortunately has some limitations).
And finally, as you've confirmed in comment already ;). Problem with your current model was that in sys.path.insert(0, os.path.abspath('...')) you'd mistakenly used Python module's notation, which in incorrect for system paths and should be replaced with '../..' to work as expected.
I think your goals are not reasonable. Specifically, goal number 2 is a problem:
Be able to independently run each .py file when neccessary
This doesn't work well for modules in a package. At least, not if you're running the .py files naively (e.g. with python foo_tests.py on the command line). When you run the files that way, Python can't tell where the package hierarchy should start.
There are two alternatives that can work. The first option is to run your scripts from the top level folder (e.g. projects) using the -m flag to the interpreter to give it a dotted path to the main module, and using explicit relative imports to get the sibling and cousin modules. So rather than running python foo_tests.py directly, run python -m project_3.tests.foo_tests from the projects folder (or python -m tests.foo_tests from within project_3 perhaps), and have have foo_tests.py use from .. import foo.
The other (less good) option is to add a top-level folder to your Python installation's module search path on a system wide basis (e.g. add the projects folder to the PYTHON_PATH environment variable), and then use absolute imports for all your modules (e.g. import project3.foo). This is effectively what your setpath module does, but doing it system wide as part of your system's configuration, rather than at run time, it's much cleaner. It also avoids the multiple names that setpath will allow to you use to import a module (e.g. try import foo_tests, tests.foo_tests and you'll get two separate copies of the same module).

Add path to python package to sys.path

I have a case for needing to add a path to a python package to sys.path (instead of its parent directory), but then refer to the package normally by name.
Maybe that's weird, but let me exemplify what I need and maybe you guys know how to achieve that.
I have all kind of experimental folders, modules, etc inside a path like /home/me/python.
Now I don't want to add that folder to my sys.path (PYTHONPATH) since there are experimental modules which names could clash with something useful.
But, inside /home/me/python I want to have a folder like pyutils. So I want to add /home/me/python/pyutils to PYTHONPATH, but, be able to refer to the package by its name pyutils...like I would have added /home/me/python to the path.
One helpful fact is that adding something to the python path is different from importing it into your interpreter. You can structure your modules and submodules such that names will not clash.
Look here regarding how to create modules. I read the documents, but I think that module layout takes a little learning-by-doing, which means creating your modules, and then importing them into scripts, and see if the importing is awkward or requires too much qualification.
Separately consider the python import system. When you import something you can use the "import ... as" feature to name it something different as you import, and thereby prevent naming clashes.
You seem to have already understood how you can change the PYTHONPATH using sys.path(), as documented here.
You have a number of options:
Make a new directory pyutilsdir, place pyutils in pyutilsdir,
and then add pyutilsdir to PYTHONPATH.
Move the experimental code outside of /home/me/python and add python to
your PYTHONPATH.
Rename the experimental modules so their names do not clash with
other modules. Then add python to PYTHONPATH.
Use a version control system like git or hg to make the
experimental modules available or unavailable as desired.
You could have a master branch without the experimental modules,
and a feature branch that includes them. With git, for example, you could switch between
the two with
git checkout [master|feature]
The contents of /home/me/python/pyutils (the git repo directory) would
change depending on which commit is checked out. Thus, using version control, you can keep the experimental modules in pyutils, but only make them present when you checkout the feature branch.
I'll answer to my own question since I got an idea while writing the question, and maybe someone will need that.
I added a link from that folder to my site-packages folder like that:
ln -s /home/me/python/pyutils /path/to/site-packages/pyutils
Then, since the PYTHONPATH contains the /path/to/site-packages folder, and I have a pyutils folder in it, with init.py, I can just import like:
from pyutils import mymodule
And the rest of the /home/me/python is not in the PYTHONPATH

Best practice for handling path/executables in project scripts in Python (e.g. something like Django's manage.py, or fabric)

I do a lot of work on different projects (I'm a scientist) in a fairly standardised directory structure. e.g.:
project
/analyses/
/lib
/doc
/results
/bin
I put all my various utility scripts in /bin/ because cleanliness is next to godliness. However, I have to hard code paths (e.g. ../../x/y/z) and then I have to run things within ./bin/ or they break.
I've used Django and that has /manage.py which runs various django-things and automatically handles the path. I've also used fabric to run various user defined functions.
Question: How do I do something similar? and what's the best way? I can easily write something in /manage.py to inject the root dir into sys.path etc, but then I'd like to be able to do "./manage.py foo" which would run /bin/foo.py. Or is it possible to get fabric to call executables from a certain directory?
Basically - I want something easy and low maintenance. I want to be able to drop an executable script/file/whatever into ./bin/ and not have to deal with path issues or import issues.
What is the best way to do this?
Keep Execution at TLD
In general, try to keep your runtime at top-level. This will straighten out your imports tremendously.
If you have to do a lot of import addressing with relative imports, there's probably a
better way.
Modifying The Path
Other poster's have mentioned the PYTHONPATH. That's a great way to do it permanently in your shell.
If you don't want to/aren't able to manipulate the PYTHONPATH project path directly you can use sys.path to get yourself out of relative import hell.
Using sys.path.append
sys.path is just a list internally. You can append to it to add stuff to into your path.
Say I'm in /bin and there's a library markdown in lib/. You can append a relative paths with sys.path to import what you want.
import sys
sys.path.append('../lib')
import markdown
print markdown.markdown("""
Hello world!
------------
""")
Word to the wise: Don't get too crazy with your sys.path additions. Keep your schema simple to avoid yourself a lot confusion.
Overly eager imports can sometimes lead to cases where a python module needs to import itself, at which point execution will halt!
Using Packages and __init__.py
Another great trick is creating python packages by adding __init__.py files. __init__.py gets loaded before any other modules in the directory, so it's a great way to add imports across the entire directory. This makes it an ideal spot to add sys.path hackery.
You don't even need to necessarily add anything to the file. It's sufficient to just do touch __init__.py at the console to make a directory a package.
See this SO post for a more concrete example.
In a shell script that you source (not run) in your current shell you set the following environment variables:
PATH=$PATH:$PROJECTDIR/bin
PYTHONPATH=$PROJECTDIR/lib
Then you put your Python modules and package tree in your projects ./lib directory. Python automatically adds the PYTHONPATH environment variable to sys.path.
Then you can run any top-level script from the shell without specifying the path, and any imports from your library modules are looked for in the lib directory.
I recommend very simple top-level scripts, such as:
#!/usr/bin/python
import sys
import mytool
mytool.main(sys.argv)
Then you never have to change that, you just edit the module code, and also benefit from the byte-code caching.
You can easily achieve your goals by creating a mini package that hosts each one of your projects. Use paste scripts to create a simple project skeleton. And to make it executable, just install it via setup.py develop. Now your bin scripts just need to import the entry point to this package and execute it.

Refactoring python module configuration to avoid relative imports

This is related to a previous question of mine.
I understand how to store and read configuration files. There are choices such as ConfigParser and ConfigObj.
Consider this structure for a hypothetical 'eggs' module:
eggs/
common/
__init__.py
config.py
foo/
__init__.py
a.py
'eggs.foo.a' needs some configuration information. What I am currently doing is, in 'a', import eggs.common.config. One problem with this is that if 'a' is moved to a deeper level in the module tree, the relative imports break. Absolute imports don't, but they require your module to be on your PYTHONPATH.
A possible alternative to the above absolute import is a relative import. Thus, in 'a',
import .common.config
Without debating the merits of relative vs absolute imports, I was wondering about other possible solutions?
edit- Removed the VCS context
"imports ... require your module to be on your PYTHONPATH"
Right.
So, what's wrong with setting PYTHONPATH?
require statement from pkg_resources maybe what you need.
As I understand it from this and previous questions you only need one path to be in sys.path. If we are talking about git as VCS (mentioned in previous question) when only one branch is checked out at any time (single working directory). You can switch, merge branches as frequently as you like.
I'm thinking of something along the lines of a more 'push-based' kind of solution. Instead of importing the shared objects (be they for configuration, or utility functions of some sort), have the top-level init export it, and each intermediate init import it from the layer above, and immediately re-export it.
I'm not sure if I've got the python terminology right, please correct me if I'm wrong.
Like this, any module that needs to use the shared object(which in the context of this example represents configuration information) simply imports it from the init at its own level.
Does this sound sensible/feasible?
You can trick the import mechanism, by adding each subdirectory to egg/__init__.py:
__path__.append(__path__[0]+"\\common")
__path__.append(__path__[0]+"\\foo")
then, you simply import all modules from the egg namespace; e.g. import egg.bar (provided you have file egg/foo/bar.py).
Note that foo and common should not be a package - in other words, they should not contain __init__.py file.
This solution completely solves the issue of eventually moving files around; however it flattens the namespace and therefore it may not be as good, especially in big projects - personally, I prefer full name resolution.

Categories

Resources