Python module importing with sys.path and os.path issue - python

I spent some time researching this and I just cannot work this out in my head.
I run a program in its own directory home/program/core/main.py
In main.py I try to import a module called my_module.py that's located in a different directory, say home/program/modules/my_module.py
In main.py this is how I append to sys.path so the program can be run on anyone's machine (hopefully).
import os.path
import sys
# This should give the path to home/program
sys.path.append(os.path.join(os.path.abspath(os.path.dirname(__file__)), '..'))
# Which it does when checking with
print os.path.join(os.path.abspath(os.path.dirname(__file__)), '..')
# So now sys.path knows the location of where modules directory is, it should work right?
import modules.my_module # <----RAISES ImportError WHY?
However if I simply do:
sys.path.append('home/program/modules')
import my_module
It all works fine. But this is not ideal as it now depends on the fact that the program must exist under home/program.

That's because modules isn't a valid Python package, probably because it doesn't contain an __init__.py file. (You cannot traverse directories with import unless they are marked with __init__.py.)
So either add an empty __init__.py file to modules, or add the path all the way up to modules so that your first snippet is equivalent to the second one:
sys.path.append(os.path.join(os.path.abspath(os.path.dirname(__file__)), '..', 'modules'))
import my_module
Note that you can also import the module by giving its full path, using more advanced import features; see: How to import a module given the full path?
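The full-path route mentioned here can be sketched with importlib.util on Python 3. This is only an illustration: the helper name load_module_from_path and the throwaway demo file are hypothetical, while the importlib.util calls themselves are standard-library API. Unlike the sys.path approach, it does not change the search path.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def load_module_from_path(module_name, file_path):
    """Load a single .py file as a module without modifying sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {file_path}")
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module can be found by name while it loads.
    sys.modules[module_name] = module
    spec.loader.exec_module(module)
    return module


# Demo with a throwaway file standing in for home/program/modules/my_module.py.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "my_module.py"
    demo_file.write_text("myvar = 21\n\ndef myfunc(x):\n    return x * 2\n")
    my_module = load_module_from_path("my_module", demo_file)
    result = my_module.myfunc(my_module.myvar)
```

In main.py the file path would be built from os.path.dirname(__file__) in the same way as the sys.path entry in the question.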

Although the answer can be found here, for convenience and completeness here is a quick solution:
import importlib
import os
import sys
dirname, basename = os.path.split(pyfilepath) # pyfilepath: /my/path/mymodule.py
sys.path.append(dirname) # only directories should be added to PYTHONPATH
module_name = os.path.splitext(basename)[0] # /my/path/mymodule.py --> mymodule
module = importlib.import_module(module_name) # name space of defined module (otherwise we would literally look for "module_name")
Now you can directly use the namespace of the imported module, like this:
a = module.myvar
b = module.myfunc(a)

Related

Given a Python *.py file what's the canonical way to determine the dotted import path for that file, programmatically?

Suppose you get a pathlib.Path that points to a *.py file.
And also suppose that this is a resource you could import in another Python file, given the appropriate import path, because your sys.path allows for it.
How do you determine the dotted import path to use in import, from just the python file?
Unlike most of the import-related questions, this is NOT about the path from one Python file to another within a directory hierarchy, it is really more about the import path you could specify in the REPL from anywhere to import that module and that's affected by sys.path contents.
Example:
$test_366_importpath$ tree -I __pycache__
.
└── sub
    └── somemodule.py
somemodule.py:
"some module"
class Foo:
    "Foo class"
If I start python at that location, because sys.path gets the current directory, this works:
from sub.somemodule import Foo
sub.somemodule is what I am interested in.
However, if the sys.path gets altered, then I can use a different import path.
import sys
sys.path.insert(0, "/Users/me/explore/test_366_importpath/sub")
from somemodule import Foo
(Note: I wouldn't be doing this "for real", neither the sys.path.insert nor varying the dotted path I'd use; see @CryptoFool's comment. This is just a convenient way to show the impact of sys.path.)
Question:
How do I determine, programmatically, that import sub.somemodule needs to be used as the dotted path? Or import somemodule given different sys.path conditions?
Raising an ImportError or ValueError or some other exceptions if the *.py file is not importable is perfectly OK.
I'm writing a helper using
pa_script = pathlib.Path("somemodule.py").absolute().resolve() and then looking at sys.path. Once I find that a given sys.path entry is the parent for the pa_script, I can use pa_script.relative_to(parent).
From there it's trivial to get the import path by removing the .py extension and replacing os.sep with ".".
Then I can feed that dotted path to importlib. Or paste into my code editor.
It's a bit tricky but not particularly hard. It makes me wonder, however, whether there is a builtin or canonical way.
I can post my code, but if there is a canonical way to do it, that would just give the wrong impression that it's necessary to go through these complicated steps.
Well here goes then, in case anyone needs something similar
(for Python 3.10+, but removing typehints should make it work down to much earlier 3.x versions)
from pathlib import Path
import sys
import os
def get_dotted_path(path_to_py: str | Path, paths: list[str] | None = None) -> str:
    """
    return a dotted-path import string from a Python filename
    if given, `paths` will be examined as if it was `sys.path` else
    `sys.path` is used to determine import points (this is to compute paths
    assuming a different sys.path context than the current one)
    example:
    .../lib/python3.10/collections/__init__.py => "collections"
    .../lib/python3.10/collections/abc.py => "collections.abc"
    raises ImportError if the Python script is not in sys.path or paths
    """
    parent = None
    pa_target = Path(path_to_py)
    paths = paths or sys.path
    # get the full file path AND resolve if it's a symlink
    pa_script = pa_target.absolute().resolve().absolute()
    # consider pkg/subpk/__init__.py as pkg/subpk
    if pa_script.name == "__init__.py":
        pa_script = pa_script.parent
    for path in paths:
        pa_path = Path(path)
        if pa_path in pa_script.parents:
            parent = pa_path
            break
    else:
        newline = "\n"
        raise ImportError(
            f"{pa_script} nowhere in sys.path: {newline.join([''] + paths)}"
        )
    pa_relative = pa_script.relative_to(parent)
    res = str(pa_relative).removesuffix(".py").replace(os.sep, ".")
    return res
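A more compact variant of the same idea, offered only as a sketch: it relies on Path.relative_to raising ValueError when an entry is not a parent of the file, and builds the name from Path.parts instead of replacing os.sep. The name dotted_name and the demo tree are hypothetical. Like the helper above, it does not check that the intermediate directories are importable packages.

```python
import sys
import tempfile
from pathlib import Path


def dotted_name(path_to_py, search_paths=None):
    """Return the dotted import name of a .py file relative to the first
    matching entry of search_paths (defaults to sys.path)."""
    script = Path(path_to_py).resolve()
    # Treat pkg/__init__.py as the package pkg itself.
    target = script.parent if script.name == "__init__.py" else script.with_suffix("")
    for entry in (sys.path if search_paths is None else search_paths):
        try:
            relative = target.relative_to(Path(entry).resolve())
        except ValueError:
            continue  # this entry is not a parent of the file
        if relative.parts:
            return ".".join(relative.parts)
    raise ImportError(f"{script} is not reachable from the given search paths")


# Rebuild the example tree from the question in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    module_file = Path(tmp) / "sub" / "somemodule.py"
    module_file.parent.mkdir()
    module_file.write_text('"some module"\n')
    from_root = dotted_name(module_file, [tmp])
    from_sub = dotted_name(module_file, [str(Path(tmp) / "sub")])
```

The two calls mirror the two sys.path situations in the question: with the project root on the path the result is sub.somemodule, and with sub itself on the path it is somemodule.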

Python: Unit Testing Module and Relative Imports

Currently have the following file hierarchy:
\package
    __init__.py
    run_everything.py
    \subpackage
        __init__.py
        work.py
        work1.py
        work2.py
    \test
        __init__.py
        test_work.py
        test_work1.py
1) My first question is regarding relative imports. Suppose in \subpackage\work.py I have a function called custom_function(), and I would like to test that function in test_work.py. For some reason I cannot figure out how to make this import from one module to another. Trying from .. subpackage.work1 import custom_function() does not seem to work, and yields the error: Attempted relative import in non-package. Is there any way to resolve this?
2) I would like to run all test files from run_everything.py with one function. Would adding a suite() function to each test_work*.py file, which adds each test class via suite.addTest(unittest.makeSuite(TestClass)), and finally importing them into the top-level run_everything.py be the most conventional way in Python 2.7?
Here is a hack*
Insert the paths to "subpackage" and "test" into your Python path in run_everything using:
import sys
sys.path.insert(0, '/path/to/package/subpackage')
sys.path.insert(0, '/path/to/package/test')
And then, you can import all your files using vanilla imports in run_everything:
import work, work1, work2
import test_work, test_work1
*This won't permanently affect your PYTHONPATH.
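For the second question, one alternative to hand-written suite() functions is unittest's built-in test discovery, which has been available since Python 2.7. The sketch below is written for Python 3 and builds a throwaway directory with a hypothetical test_work.py so that it runs on its own. In the layout above, start_dir would be the package's test directory instead.

```python
import io
import tempfile
import unittest
from pathlib import Path

# Hypothetical stand-in for one of the test_work*.py files.
TEST_SOURCE = '''
import unittest

class TestWork(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)
'''

with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "test_work.py").write_text(TEST_SOURCE)
    # Collect every module matching the pattern under start_dir.
    suite = unittest.defaultTestLoader.discover(start_dir=tmp, pattern="test_*.py")
    # Send the runner's report to a buffer to keep the demo quiet.
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
```

The same discovery can be started from the command line with python -m unittest discover.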

Import Python file from within executing script

I am attempting to import a Python file (called test.py, which resides in the parent directory) from within the currently executing Python file (I'll call it a.py). All the directories involved contain a file called __init__.py.
My Problem: When I attempt to import the desired file I get the following error
Attempted relative import in non-package
My code inside a.py:
try:
    from .linkIO can_follow # error occurs here
except Exception,e:
    print e
print success
Note: I know that if I were to create a file called b.py and import a.py (which in itself imports the desired Python file) it all works, so what's going wrong?
For example:
b.py:
import a
print "success 2"
As stated in PEP 328, all imports are absolute by default to prevent modules from masking each other. Absolute means the module/package must be on the module search path, sys.path. Relative imports (that's what the dot is for) are only allowed within a package, that is, when modules from the same package want to import each other.
So this leaves you with the following possibilities:
1. You make a package (which you seem to have done already) and add the package path to sys.path
2. You just adjust sys.path for each module
3. You put all your custom modules into the same directory as the start script/main application
For 1. and 2. you may add a package/module to sys.path like this:
import sys
from os.path import dirname, join
sys.path.append(dirname(__file__)) #package-root-directory
or
module_dir = 'mymodules'
sys.path.append(join(dirname(__file__), module_dir)) # in the main-file
BTW:
from .linkIO can_follow
can't work! The import statement is missing!
As a reminder: if using relative imports you MUST use the from-version: from .relmodule import xyz. An import .XYZ without the from isn't allowed!
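The error in the question appears because a file that is run directly is executed as __main__ with no parent package, so the leading dot has nothing to resolve against. A minimal sketch of the difference, using a hypothetical package pkg with a corrected a.py; it assumes Python 3, where the message reads "attempted relative import with no known parent package":

```python
import subprocess
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny package: pkg/__init__.py, pkg/linkIO.py, pkg/a.py
    pkg = Path(tmp) / "pkg"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "linkIO.py").write_text("def can_follow():\n    return True\n")
    (pkg / "a.py").write_text("from .linkIO import can_follow\nprint(can_follow())\n")

    # Run a.py directly as a script: the relative import has no package to resolve against.
    as_script = subprocess.run(
        [sys.executable, str(pkg / "a.py")], capture_output=True, text=True
    )
    # Run it as a module of the package, from the directory that contains pkg.
    as_module = subprocess.run(
        [sys.executable, "-m", "pkg.a"], cwd=tmp, capture_output=True, text=True
    )
```

This also matches the observation in the question that importing a.py from b.py works: in that case a.py is loaded as a module rather than executed as the entry point.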

Getting a Python Module's Directory from Inside Itself

I have a Python module and I'd like to get that module's directory from inside itself. I want to do this because I have some files that I'd like to reference relative to the module.
First you need to get a reference to the module inside itself.
import sys
mod = sys.modules[__name__]
Then you can use __file__ to get to the module file.
mod.__file__
Its directory is a dirname of that.
As you are inside the module all you need is this:
import os
path_to_this_module = os.path.dirname(__file__)
However, if the module in question is actually your program's entry point, then __file__ may be only a relative file name and you'll need to expand the path:
import os
path_to_this_module = os.path.dirname(os.path.abspath(__file__))
I think this is what you are looking for:
import <module>
import os
print os.path.dirname(<module>.__file__)
You should be using pkg_resources for this. The resource* family of functions does just about everything you need without having to muck about with the filesystem.
import pkg_resources
data = pkg_resources.resource_string(__name__, "some_file")
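The __file__-based answers can be wrapped in a small helper. A sketch using pathlib on Python 3; module_dir is a hypothetical name, and it assumes the module was loaded from a regular file (built-in or frozen modules may have no __file__).

```python
import json
from pathlib import Path


def module_dir(module):
    """Return the absolute directory containing a module's source file."""
    module_file = getattr(module, "__file__", None)
    if module_file is None:
        raise ValueError(f"{module.__name__} has no __file__ attribute")
    return Path(module_file).resolve().parent


# The standard-library json package serves as the example here;
# inside your own module you would pass sys.modules[__name__].
json_dir = module_dir(json)
```

A data file next to the module can then be addressed as module_dir(mod) / "some_file".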

Loading each .py file in a path - imp.load_module complains about relative import

I am trying to parse a given path for python source files, import each file and DoStuff™ to each imported module.
def ParsePath(path):
    for root, dirs, files in os.walk(path):
        for source in (s for s in files if s.endswith(".py")):
            name = os.path.splitext(os.path.basename(source))[0]
            m = imp.load_module(name, *imp.find_module(name, [root]))
            DoStuff(m)
The above code works, but packages aren't recognized: ValueError: Attempted relative import in non-package
My question is basically, how do I tell imp.load_module that a given module is part of a package?
You cannot directly tell Importer Protocol method load_module that the module given is part of the package. Taken from PEP 302 New Import Hooks:
The built-in __import__ function (known as PyImport_ImportModuleEx in import.c) will then check to see whether the module doing the import is a package or a submodule of a package. If it is indeed a (submodule of a) package, it first tries to do the import relative to the package (the parent package for a submodule). For example if a package named "spam" does "import eggs", it will first look for a module named "spam.eggs". If that fails, the import continues as an absolute import: it will look for a module named "eggs". Dotted name imports work pretty much the same: if package "spam" does "import eggs.bacon" (and "spam.eggs" exists and is itself a package), "spam.eggs.bacon" is tried. If that fails "eggs.bacon" is tried. (There are more subtleties that are not described here, but these are not relevant for implementers of the Importer Protocol.)
Deeper down in the mechanism, a dotted name import is split up by its components. For "import spam.ham", first an "import spam" is done, and only when that succeeds is "ham" imported as a submodule of "spam".
The Importer Protocol operates at this level of individual imports. By the time an importer gets a request for "spam.ham", module "spam" has already been imported.
You must then simulate what the built-in import does and load parent packages before loading sub modules.
The function imp.find_module always takes a plain module name without dots, but the documentation of imp.load_module says
The name argument indicates the full module name (including the package name, if this is a submodule of a package).
So you could try this:
def ParsePath(path):
    for root, dirs, files in os.walk(path):
        for source in (s for s in files if s.endswith(".py")):
            name = os.path.splitext(os.path.basename(source))[0]
            full_name = os.path.splitext(source)[0].replace(os.path.sep, '.')
            m = imp.load_module(full_name, *imp.find_module(name, [root]))
            DoStuff(m)
I had the same problem. Good news is that there is a way of doing it, but you have to use a combination of imp and importlib. Here's an illustrative example:
import imp
import importlib
package_path = r"C:\path_to_package"
package_name = "module"
module_absolute_name = "module.sub_module"
module_relative_name = ".sub_module"
# Load the package first
package_info = imp.find_module(package_name, [package_path])
package_module = imp.load_module(package_name, *package_info)
# Try an absolute import
importlib.import_module(module_absolute_name, package_name)
# Try a relative import
importlib.import_module(module_relative_name, package_name)
This allows sub_module to use relative imports, because the parent package has already been loaded and importlib has loaded the submodule knowing what it is being imported relative to.
I believe this solution is only necessary for those of us stuck in Python 2.*, but would need someone to confirm that.
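On Python 3 the imp module is deprecated and was removed in Python 3.12, and importlib.import_module on its own imports parent packages before their submodules. A sketch of the original ParsePath idea on that basis; import_all and the demo tree are hypothetical, and it assumes path is the directory that contains the top-level packages:

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_all(path):
    """Import every .py file under path by its full dotted name."""
    root = Path(path).resolve()
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    modules = {}
    try:
        for source in sorted(root.rglob("*.py")):
            parts = source.relative_to(root).with_suffix("").parts
            if parts[-1] == "__init__":
                parts = parts[:-1]  # pkg/__init__.py is the package pkg
            if parts:
                name = ".".join(parts)
                # import_module imports the parent packages first, so
                # relative imports inside the submodule can be resolved.
                modules[name] = importlib.import_module(name)
    finally:
        sys.path.remove(str(root))
    return modules


# Demo tree: spam/__init__.py, spam/eggs.py, spam/ham.py (ham uses a relative import).
with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "spam"
    pkg.mkdir()
    (pkg / "__init__.py").write_text("")
    (pkg / "eggs.py").write_text("VALUE = 7\n")
    (pkg / "ham.py").write_text("from .eggs import VALUE\n")
    loaded = import_all(tmp)
    ham_value = loaded["spam.ham"].VALUE
```

Each imported module can then be passed to DoStuff, as in the question.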
