I have a Python package that has an optional [extras] dependency, yet I want to adhere to typing on all methods.
The situation is that in my file, I have this
class MyClass:
    def __init__(self, datastore: Datastore):  # <- Datastore is azureml.core.Datastore
        ...

    def my_func(self):
        from azureml.core import Datastore
        ...
I import from within the function because there are other classes in the same file that should be importable when not using the extras (extras being azureml).
So this obviously fails, because I refer to Datastore before importing it. Removing the Datastore typing from the __init__ method obviously solves the problem.
So in general my question is whether it is possible, and if so how, to use type annotations that refer to an optional (extras) package.
Note that importing inside the class definition (just below the class MyClass statement) is not a valid solution either, as that code runs when the module is imported.
You can use TYPE_CHECKING:
A special constant that is assumed to be True by 3rd party static type
checkers. It is False at runtime.
Since it is False at runtime, it doesn't affect your module's behavior.
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from azureml.core import Datastore


class MyClass:
    def __init__(self, datastore: "Datastore"):  # quoted: the name only exists while type checking
        ...

    def my_func(self):
        from azureml.core import Datastore
        ...
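If you prefer unquoted annotations, a variant (Python 3.7+) is to add from __future__ import annotations, which stops all annotations from being evaluated at runtime; a minimal sketch:
from __future__ import annotations  # annotations become strings at runtime

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from azureml.core import Datastore


class MyClass:
    def __init__(self, datastore: Datastore):  # no quotes needed now
        ...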
Since I want to show this in action, I will use operator.itemgetter as an example, because type checkers recognize it, whereas azureml.core is not available here:
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from operator import itemgetter


class MyClass:
    def __init__(self, datastore: "itemgetter"):  # quoted forward reference
        ...

    def my_func(self):
        from operator import itemgetter
        ...


obj1 = MyClass(itemgetter(1))  # line 16
obj2 = MyClass(10)  # line 17
Here is the Mypy error:
main.py:17: error: Argument 1 to "MyClass" has incompatible type "int"; expected "itemgetter[Any]"
Found 1 error in 1 file (checked 1 source file)
Which shows it works as expected.
Just to add my two cents:
While it is certainly a solution, I consider the use of the TYPE_CHECKING constant a red flag regarding the project structure. It typically (though not always) either shows the presence of circular dependencies or poor separation of concerns.
In your case it seems to be the latter, as you state this:
I import from within the function because there are other classes in the same file that should be importable when not using the extras
If MyClass provides optional functionality to your package, it should absolutely reside in its own module and not alongside other classes that provide core functionality.
When you put MyClass into its own module (say my_class), you can place its dependencies at the top with all the other imports. Then you put the import from my_class inside a function that handles the logic of loading internal optional dependencies.
Aside from visibility and arguably better style, one advantage of such a setup over the one you presented is that the my_class module will be self-consistent and will fail on import if the extra azureml dependency is missing (or broken/renamed/deprecated), rather than only at runtime when MyClass.my_func is called.
You'd be surprised how easy it is to accidentally forget to install all extra dependencies (even in a production environment). Then you'll thank the stars when the code fails immediately and transparently, rather than causing errors at some later point at runtime.
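As a minimal sketch of that layout (the module, package, and extra names here are illustrative, not from the question): the optional class lives in its own module with a normal top-level import, and the core package defers importing that module behind a function:
# my_class.py -- optional functionality in its own module
from azureml.core import Datastore  # fails immediately if the extra is missing


class MyClass:
    def __init__(self, datastore: Datastore):
        self.datastore = datastore


# core.py (hypothetical, in the same package) -- loads the optional module on demand
def get_my_class():
    try:
        from .my_class import MyClass
    except ImportError as err:
        raise ImportError(
            "MyClass requires the 'azureml' extra: pip install mypackage[azureml]"
        ) from err
    return MyClass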
Related
What I'd like to do
I'd like to import a Python module without adding it to the local namespace.
In other words, I'd like to do this:
import foo
del foo
Is there a cleaner way to do this?
Why I want to do it
The short version is that importing foo has a side effect that I want, but I don't really want it in my namespace afterwards.
The long version is that I have a base class that uses __init_subclass__() to register its subclasses. So base.py looks like this:
class Base:
    _subclasses = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._subclasses[cls.__name__] = cls

    @classmethod
    def get_subclass(cls, class_name):
        return cls._subclasses[class_name]
And its subclasses are defined in separate files, e.g. foo_a.py:
from base import Base
class FooA(Base):
    pass
and so on.
The net effect here is that if I do
from base import Base
print(f"Before import: {Base._subclasses}")
import foo_a
import foo_b
print(f"After import: {Base._subclasses}")
then I would see
Before import: {}
After import: {'FooA': <class 'foo_a.FooA'>, 'FooB': <class 'foo_b.FooB'>}
So I needed to import these modules for the side effect of adding a reference to Base._subclasses, but now that that's done, I don't need them in my namespace anymore because I'm just going to be using Base.get_subclass().
I know I could just leave them there, but this is going into an __init__.py so I'd like to tidy up that namespace.
del works perfectly fine, I'm just wondering if there's a cleaner or more idiomatic way to do this.
If you want to import a module without assigning the module object to a variable, you can use importlib.import_module and ignore the return value:
import importlib
importlib.import_module("foo")
Note that using importlib.import_module is preferable to calling the __import__ builtin directly for simple usages. See the __import__ documentation for details.
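Applied to the subclass-registration example above, the package's __init__.py could trigger the side effects without cluttering its namespace. A minimal sketch, assuming foo_a and foo_b are submodules of the package (hence the relative form with the package argument):
import importlib

# Import each submodule purely for its side effect of registering
# subclasses with Base; the modules themselves are not bound to names here.
for _mod in (".foo_a", ".foo_b"):
    importlib.import_module(_mod, package=__name__)

del importlib, _mod  # tidy up the helper names as well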
I have the following (toy) package structure
root/
- package1/
- __init__.py
- class_a.py
- class_b.py
- run.py
In both class_a.py and class_b.py I have a class definition that I want to expose to run.py. If I want to import them this way, I will have to use
from package1.class_a import ClassA # works but doesn't look nice
I don't like that this shows the class_a.py module, and would rather use the import style
from package1 import ClassA # what I want
This is also closer to what I see from larger libraries. I found a way to do this by importing the classes in the __init__.py file like so
from .class_a import ClassA
from .class_b import ClassB
This would work fine if it weren't for one downside: as soon as I import ClassA the way I want (see above), the package's __init__.py runs and therefore also imports ClassB. In my real scenario, this means I implicitly import a huge class that I use only situationally (and which itself imports tensorflow), so I really want to avoid this somehow. Is there a way to get the nice-looking imports without automatically importing everything in the package?
It is possible, but it requires a rather low-level customization: you will have to customize the class of your package's module object (possible since Python 3.5). That way, you can declare a __getattr__ method that is called when a missing attribute is requested. At that moment, you know that you have to import the relevant submodule and extract the correct attribute.
The __init__.py file should contain (names can of course be changed):
import importlib
import sys
import types


class SpecialModule(types.ModuleType):
    """Customization of a module that is able to dynamically load submodules.

    It is expected to be a plain package (and to be declared in the __init__.py).
    The special attribute is a dictionary mapping attribute name -> relative module name.
    The first time a name is requested, the corresponding module is loaded and
    the attribute is bound into the package.
    """
    special = {'ClassA': '.class_a', 'ClassB': '.class_b'}

    def __getattr__(self, name):
        if name in self.special:
            m = importlib.import_module(self.special[name], __name__)  # import the submodule
            o = getattr(m, name)  # find the required member
            setattr(sys.modules[__name__], name, o)  # bind it into the package
            return o
        else:
            raise AttributeError(f'module {__name__} has no attribute {name}')


sys.modules[__name__].__class__ = SpecialModule  # customize the class of the package
You can now use it that way:
import package1
...
obj = package1.ClassA(...) # dynamically loads class_a on first call
The downside is that clever IDEs that look at the declared members could choke on this and claim that you are accessing a nonexistent member, because ClassA is not statically declared in package1/__init__.py. But all will be fine at run time.
As it is a low-level customization, it is up to you to decide whether it is worth it...
Since Python 3.7 you can also declare a __getattr__(name) function directly at the module level (see PEP 562).
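A minimal sketch of that module-level __getattr__ variant (PEP 562), which achieves the same lazy loading without swapping the module's class:
# __init__.py
import importlib

_lazy = {'ClassA': '.class_a', 'ClassB': '.class_b'}


def __getattr__(name):
    if name in _lazy:
        module = importlib.import_module(_lazy[name], __name__)
        obj = getattr(module, name)
        globals()[name] = obj  # cache it so __getattr__ is not called again
        return obj
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")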
As I dig further into Python internals, I see ABCs (abstract base classes) mentioned more and more often in the documentation. Unfortunately, the docs don't explain how they can be used. I haven't even been able to use the "concrete implementations" of these abstract base classes.
For example, reading about class importlib.abc.SourceLoader, one learns that "is_package" is a concrete implementation of InspectLoader.is_package(). But what if I'd like to use that in my code? Is it possible? I've tried many ways but the method can't be imported.
ExtensionFileLoader is documented as a concrete implementation of importlib.abc.ExecutionLoader, but if I try to use it (such as: from importlib import machinery.ExecutionLoader), once again it can't be found.
If these methods can't be imported, why are they documented? Is there any sample code to show how they can be used? Example:
import importlib.abc.SourceLoader  # doesn't work


class try_pkg_check():
    def main(self, source_file_name):
        possible_pkgs = ['math', 'numpy']
        for posbl_pkg in possible_pkgs:
            answer = SourceLoader.is_package(posbl_pkg)
            print("For {}, the answer is: {}".format(posbl_pkg, answer))
        return None


if __name__ == "__main__":
    instantiated_obj = try_pkg_check()
    instantiated_obj.main()
People might comment that I shouldn't try to import an abstract class. But "is_package" is documented as concrete, so I should be able to use it somehow, which is my question.
import importlib.abc.SourceLoader
The error message that this line produces should give you a hint where you've gone wrong:
ModuleNotFoundError: No module named 'importlib.abc.SourceLoader'; 'importlib.abc' is not a package
"import foo" requires that foo be a module, but SourceLoader is a class inside a module. You need to instead write:
from importlib.abc import SourceLoader
However, there are further problems with this line:
answer = SourceLoader.is_package(posbl_pkg)
First of all, SourceLoader.is_package is an instance method, not a class or static method; it has to be called on an instance of SourceLoader, not on the class itself. However, SourceLoader is an abstract class, so it can't be directly instantiated; you need to use a concrete subclass like SourceFileLoader instead. (When the docs call SourceLoader.is_package a "concrete implementation" of InspectLoader.is_package, I believe what they mean is that SourceLoader provides a default implementation for is_package so that its subclasses don't need to override it in order to be non-abstract.)
Hence, you need to write:
from importlib.machinery import SourceFileLoader
...
answer = SourceFileLoader(fullname, path).is_package(fullname)
where fullname is "a fully resolved name of the module the loader is to handle" and path is "the path to the file for the module."
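Putting it together, a small sketch (the module name and path are illustrative):
from importlib.machinery import SourceFileLoader

# fullname is the module name the loader handles; path is its source file.
loader = SourceFileLoader("mymodule", "/path/to/mymodule.py")
print(loader.is_package("mymodule"))  # False: a single .py file is not a package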
Here's the minimal reproduction for something I'm working on. This is using Python 3.6.5:
sample.py:
import importlib.util
import inspect
from test import Test
t = Test()
spec = importlib.util.spec_from_file_location('test', './test.py')
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)
loaded_test = None
for name, obj in inspect.getmembers(module):
    if inspect.isclass(obj):
        loaded_test = obj
print(type(t))
print(loaded_test)
print(isinstance(t, loaded_test))
print(issubclass(t.__class__, loaded_test))
test.py (in the same directory):
class Test(object):
    pass
Running this code will give you the following output:
<class 'test.Test'>
<class 'test.Test'>
False
False
So why is the object that we load using importlib, which is identified as 'test.Test', not an instance or subclass of the 'test.Test' class I created using import? Is there a way to programmatically check if they're the same class, or is it impossible because the context of their instantiation is different?
Why is the object that we load using importlib, which is identified as test.Test, not an instance or subclass of the test.Test class I created using import?
A class is "just" an instance of a metaclass. The import system generally prevents class objects from being instantiated more than once: classes are usually defined at a module scope, and if a module has already been imported the existing module is just reused for subsequent import statements. So, different references to the same class all end up pointing to an identical class object living at the same memory location.
By using exec_module you prevented this "cache hit" in sys.modules, forcing the class declaration to be executed again, and a new class object to be created in memory.
issubclass is not doing anything clever like a deep inspection of the class source code; it is more or less just checking identity (see CPython's implementation, which has a fast track for an exact match and some complications for supporting ABCs).
Is there a way to programmatically check if they're the same class, or is it impossible because the context of their instantiation is different?
They are not the same class. Although the source code is identical, they exist at different memory locations. You don't need the complications of exec_module to see this, by the way; there are simpler ways to force recreation of the "same" class:
>>> import sys
>>> import test
>>> t = test.Test()
>>> isinstance(t, test.Test)
True
>>> del sys.modules['test']
>>> import test
>>> isinstance(t, test.Test)
False
Or, define the class in a function block and return it from the function call. Or, create classes from the same source code by using the three-argument version of type(name, bases, dict). The isinstance check (see the CPython implementation) is simple and will not detect these misdirections.
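For example, building the "same" class twice with type() (an illustrative sketch):
# Two classes created from identical arguments are still distinct objects.
C1 = type("Test", (object,), {})
C2 = type("Test", (object,), {})

print(C1 is C2)              # False: different objects in memory
print(isinstance(C1(), C2))  # False: the instance's class is C1, not C2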
Let's say I have the following directory structure:
a\
    __init__.py
    b\
        __init__.py
        c\
            __init__.py
            c_file.py
        d\
            __init__.py
            d_file.py
In the a package's __init__.py, the c package is imported. But c_file.py imports a.b.d.
The program fails, saying b doesn't exist when c_file.py tries to import a.b.d. (And it really doesn't exist, because we were in the middle of importing it.)
How can this problem be remedied?
You may defer the import, for example in a/__init__.py:
def my_function():
    from a.b.c import Blah
    return Blah()
that is, defer the import until it is really needed. However, I would also have a close look at my package definitions/uses, as a cyclic dependency like the one pointed out might indicate a design problem.
If a depends on c and c depends on a, aren't they actually the same unit then?
You should really examine why you have split a and c into two packages, because either you have some code you should split off into another package (to make them both depend on that new package, but not each other), or you should merge them into one package.
I've wondered this a couple times (usually while dealing with models that need to know about each other). The simple solution is just to import the whole module, then reference the thing that you need.
So instead of doing
from models import Student
in one, and
from models import Classroom
in the other, just do
import models
in one of them, then call models.Classroom when you need it.
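A minimal sketch with hypothetical student and classroom modules; using the plain import form on both sides makes the setup order-independent, because the attribute lookups are deferred until call time:
# student.py
import classroom  # module import instead of "from classroom import Classroom"


class Student:
    def enroll(self, room_name):
        return classroom.Classroom(room_name)  # looked up at call time, not import time


# classroom.py
import student


class Classroom:
    def __init__(self, name):
        self.name = name
        self.students = []  # will hold student.Student instances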
Circular Dependencies due to Type Hints
With type hints, there are more opportunities for creating circular imports. Fortunately, there is a solution using the special constant: typing.TYPE_CHECKING.
The following example defines a Vertex class and an Edge class. An edge is defined by two vertices and a vertex maintains a list of the adjacent edges to which it belongs.
Without Type Hints, No Error
File: vertex.py
class Vertex:
    def __init__(self, label):
        self.label = label
        self.adjacency_list = []
File: edge.py
class Edge:
    def __init__(self, v1, v2):
        self.v1 = v1
        self.v2 = v2
Type Hints Cause ImportError
ImportError: cannot import name 'Edge' from partially initialized module 'edge' (most likely due to a circular import)
File: vertex.py
from typing import List
from edge import Edge
class Vertex:
    def __init__(self, label: str):
        self.label = label
        self.adjacency_list: List[Edge] = []
File: edge.py
from vertex import Vertex
class Edge:
    def __init__(self, v1: Vertex, v2: Vertex):
        self.v1 = v1
        self.v2 = v2
Solution using TYPE_CHECKING
File: vertex.py
from typing import List, TYPE_CHECKING
if TYPE_CHECKING:
    from edge import Edge


class Vertex:
    def __init__(self, label: str):
        self.label = label
        self.adjacency_list: List['Edge'] = []
File: edge.py
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from vertex import Vertex


class Edge:
    def __init__(self, v1: 'Vertex', v2: 'Vertex'):
        self.v1 = v1
        self.v2 = v2
Quoted vs. Unquoted Type Hints
Unless postponed evaluation of annotations is enabled, conditionally imported types must be enclosed in quotes, making them "forward references", which hides them from the interpreter at runtime.
In Python 3.7 and later, postponed evaluation can be enabled with the following special import.
from __future__ import annotations
This enables using unquoted type hints combined with conditional imports.
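With that import in place, vertex.py can use the unquoted form:
from __future__ import annotations

from typing import List, TYPE_CHECKING

if TYPE_CHECKING:
    from edge import Edge


class Vertex:
    def __init__(self, label: str):
        self.label = label
        self.adjacency_list: List[Edge] = []  # no quotes needed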
PEP 563 (Postponed Evaluation of Annotations) was expected to make this behavior the default in Python 3.10:
In Python 3.10, function and variable annotations will no longer be
evaluated at definition time. Instead, a string form will be preserved
in the respective annotations dictionary. Static type checkers
will see no difference in behavior, whereas tools using annotations at
runtime will have to perform postponed evaluation.
The string form is obtained from the AST during the compilation step,
which means that the string form might not preserve the exact
formatting of the source. Note: if an annotation was a string literal
already, it will still be wrapped in a string.
Note that this change was ultimately deferred and did not become the default in Python 3.10, so the __future__ import is still required for unquoted conditional imports.
The problem is that, when you run from a directory, by default only packages that are subdirectories of it are visible as candidate imports, so you cannot import a.b.d. You can, however, import b.d, since b is a subpackage of a.
If you really want to import a.b.d in c/__init__.py, you can accomplish this by adding the directory above a to the system path and changing the import in a/__init__.py to import a.b.c.
Your a/__init__.py should look like this:
import sys
import os

# Set the system path to the directory above, so that a can be
# a package namespace.
DIRECTORY_SCRIPT = os.path.dirname(os.path.realpath(__file__))
sys.path.insert(0, DIRECTORY_SCRIPT + "/..")

import a.b.c
An additional difficulty arises when you want to run modules in c as scripts, because then the packages a and b do not exist. You can hack the __init__.py in the c directory to point sys.path at the top-level directory, and then import __init__ in any modules inside c to be able to use the full path to import a.b.d. I doubt that importing __init__.py is good practice, but it has worked for my use cases.
I suggest the following pattern. Using it will allow auto-completion and type hinting to work properly.
cyclic_import_a.py
import playground.cyclic_import_b


class A(object):
    def __init__(self):
        pass

    def print_a(self):
        print('a')


if __name__ == '__main__':
    a = A()
    a.print_a()
    b = playground.cyclic_import_b.B(a)
    b.print_b()
b.print_b()
cyclic_import_b.py
import playground.cyclic_import_a


class B(object):
    def __init__(self, a):
        self.a: playground.cyclic_import_a.A = a

    def print_b(self):
        print('b1-----------------')
        self.a.print_a()
        print('b2-----------------')
You cannot import classes A and B using this syntax:
from playground.cyclic_import_a import A
from playground.cyclic_import_b import B
You cannot declare the type of parameter a in the signature of class B's __init__ method, but you can "cast" it this way:
def __init__(self, a):
    self.a: playground.cyclic_import_a.A = a
Another solution is to use a proxy for the d_file.
For example, let's say that you want to share the blah class with the c_file. The d_file thus contains:
class blah:
    def __init__(self):
        print("blah")
Here is what you enter in c_file.py:
# Do not import the d_file!
# Instead, use a placeholder for the proxy of d_file;
# it will be set by a's __init__.py after imports are done.
d_file = None


def c_blah():  # a function that calls d_file's blah
    d_file.blah()
And in a's __init__.py:
from .b.c import c_file
from .b.d import d_file


class Proxy(object):  # module proxy
    pass


d_file_proxy = Proxy()
# Now you need to explicitly list the class(es) exposed by d_file.
d_file_proxy.blah = d_file.blah
# Finally, share the proxy with c_file.
c_file.d_file = d_file_proxy
# c_file is now able to call d_file.blah.
c_file.c_blah()