Duck Typing Annotations in Python 3

I am trying to add a type annotation to a function argument. The argument is a dataclass whose attributes overlap with those of another dataclass, which is what actually gets passed in.
Consider the following code:
from dataclasses import dataclass
from typing import TypeVar

@dataclass
class Foo:
    a: str
    zar: str

@dataclass
class Car(Foo):
    b: str

@dataclass
class CarInterface:
    a: str
    b: str

mar = TypeVar("mar", bound=CarInterface)

def blah(x: mar):
    print(x.a)

car_instance = Car(a="blah blah", zar="11", b="bb")
blah(car_instance)
In this example, I'm trying to create my own type annotation mar which is bound by CarInterface. I want to check that whatever class is passed into blah() at least has a and b attributes (don't care if the class has other attributes such as zar). I want to do it this way because class Car (which actually gets passed in) is one of many classes that will be written in the future and passed into this function.
I also want it to be very easy to define a new Car, so I would like to avoid abstract classes as I don't think the added complexity is worth mypy being happy.
So I'm trying to create mar which uses duck typing to say that Car satisfies the interface of CarInterface.
However, I get two mypy errors.
The first is on the mar annotation in def blah
TypeVar "mar" appears only once in generic function signature Pylance(reportInvalidTypeVarUse)
And the other is where I pass car_instance into blah()
Argument of type "Car" cannot be assigned to parameter "x" of type "bar@blah" in function "blah"
Type "Car" cannot be assigned to type "CarInterface"
"Car" is incompatible with "CarInterface" Pylance(reportGeneralTypeIssues)

Use a Protocol to define CarInterface rather than a dataclass:
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Foo:
    a: str
    zar: str

@dataclass
class Car(Foo):
    b: str

class CarInterface(Protocol):
    a: str
    b: str

def blah(x: CarInterface):
    print(x.a)

car_instance = Car(a="blah blah", zar="11", b="bb")
blah(car_instance)
The above code will typecheck fine, but if you try to pass blah a Foo instead of a Car you'll get a mypy error like this:
test.py:22: error: Argument 1 to "blah" has incompatible type "Foo"; expected "CarInterface"
test.py:22: note: "Foo" is missing following "CarInterface" protocol member:
test.py:22: note: b
Found 1 error in 1 file (checked 1 source file)
A Protocol can be used as the bound for a TypeVar, but it's only necessary to use a TypeVar if you want to indicate that two variables not only implement the protocol but are also the same specific type (e.g. to indicate that a function takes any object implementing CarInterface and returns the same exact type of object rather than some other arbitrary CarInterface implementation).
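As a sketch of that last point, a TypeVar bound by the Protocol lets a function return the same concrete type it received (the `refresh` function name below is made up for illustration):

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar

class CarInterface(Protocol):
    a: str
    b: str

C = TypeVar("C", bound=CarInterface)

def refresh(x: C) -> C:
    # the return type is the *same* concrete type as the argument,
    # not just "some CarInterface implementation"
    return x

@dataclass
class Car:
    a: str
    b: str
    zar: str = ""

car = refresh(Car(a="aa", b="bb"))  # type checkers infer `car: Car`
```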


Python: how to type hint a dataclass?

The code below works, but I'm getting the following warning by PyCharm:
Cannot find reference __annotations__ in '(...) -> Any'.
I guess it's because I'm using Callable. I didn't find something like Dataclass. Which type should I use instead?
from __future__ import annotations
from dataclasses import dataclass
from typing import Callable

@dataclass
class Fruit:
    color: str
    taste: str

def get_cls() -> Callable:
    return Fruit

attrs = get_cls().__annotations__  # <- IDE warning
print(attrs)
In this particular example you can just hint it directly:
from dataclasses import dataclass

@dataclass
class Fruit:
    x: str

def get_cls() -> type[Fruit]:
    return Fruit

attrs = get_cls().__annotations__
print(attrs)
$ python d.py
{'x': <class 'str'>}
$ mypy d.py
Success: no issues found in 1 source file
However I don't know if this is what you're asking. Are you after a generic type for any dataclass? (I would be tempted just to hint the union of all possible return types of get_cls(): the whole point about using a dataclass rather than e.g. a dict is surely to distinguish between types of data. And you do want your typechecker to warn you if you try to access attributes not defined on one of your dataclasses.)
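Hinting the union of the possible return types might look like the sketch below (the `Vegetable` class and the `kind` parameter are invented for illustration):

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Fruit:
    color: str

@dataclass
class Vegetable:  # hypothetical second dataclass
    crunch: int

def get_cls(kind: str) -> type[Fruit] | type[Vegetable]:
    # the checker now knows the result is one of exactly these two classes
    return Fruit if kind == "fruit" else Vegetable
```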
References
See the docs on typing.Type which is now available as type (just like we can now use list and dict rather than typing.List and typing.Dict).
The simplest option is to remove the return type annotation in its entirety.
Note: PyCharm is usually smart enough to infer the return type automatically.
from __future__ import annotations
from dataclasses import dataclass
# remove this line
# from typing import Callable

@dataclass
class Fruit:
    color: str
    taste: str

# def get_cls() -> Callable:  <== No, the return annotation is wrong (Fruit is *more* than a callable)
def get_cls():
    return Fruit

attrs = get_cls().__annotations__  # <- No IDE warning, Yay!
print(attrs)
In PyCharm, the return type is then correctly inferred.
To generically type hint a dataclass: since dataclasses are essentially plain Python classes under the hood, with auto-generated methods and some "extra" class attributes mixed in, you can type hint one with a typing.Protocol as shown below:
from __future__ import annotations
from dataclasses import dataclass, Field
from typing import TYPE_CHECKING, Any, Callable, Iterable, Protocol

if TYPE_CHECKING:
    # this won't print
    print('Oh YEAH !!')

    class DataClass(Protocol):
        __dict__: dict[str, Any]
        __doc__: str | None
        # if using `@dataclass(slots=True)`
        __slots__: str | Iterable[str]
        __annotations__: dict[str, str | type]
        __dataclass_fields__: dict[str, Field]
        # the actual class definition is marked as private, and here I define
        # it as a forward reference, as I don't want to encourage
        # importing private or "unexported" members.
        __dataclass_params__: '_DataclassParams'
        __post_init__: Callable | None

@dataclass
class Fruit:
    color: str
    taste: str

# noinspection PyTypeChecker
def get_cls() -> type[DataClass]:
    return Fruit

attrs = get_cls().__annotations__  # <- No IDE warning, Yay!
Costs to class def
To address the comments: there does appear to be a non-negligible runtime cost associated with class definitions, which is why I wrap the class definition in an if TYPE_CHECKING block above.
The following code compares the performance with both approaches, to confirm this suspicion:
from __future__ import annotations
from dataclasses import dataclass, Field
from timeit import timeit
from typing import TYPE_CHECKING, Any, Callable, Iterable, Protocol

n = 100_000

print('class def: ', timeit("""
class DataClass(Protocol):
    __dict__: dict[str, Any]
    __doc__: str | None
    __slots__: str | Iterable[str]
    __annotations__: dict[str, str | type]
    __dataclass_fields__: dict[str, Field]
    __dataclass_params__: '_DataclassParams'
    __post_init__: Callable | None
""", globals=globals(), number=n))

print('if <bool>: ', timeit("""
if TYPE_CHECKING:
    class DataClass(Protocol):
        __dict__: dict[str, Any]
        __doc__: str | None
        __slots__: str | Iterable[str]
        __annotations__: dict[str, str | type]
        __dataclass_fields__: dict[str, Field]
        __dataclass_params__: '_DataclassParams'
        __post_init__: Callable | None
""", globals=globals(), number=n))
Results, on Mac M1 running Python 3.10:
class def: 0.7453760829521343
if <bool>: 0.0009954579873010516
Hence, it appears to be much faster overall to wrap a class definition (when used purely for type hinting purposes) with an if block as above.
While the provided solutions do work, I just want to add a bit of context.
IMHO your annotation is not wrong. It is just not strict enough and not all that useful.
Fruit is a class. And technically speaking a class is a callable because type (the class of all classes) implements the __call__ method. In fact, that method is executed every time you create an instance of a class; even before the class' __init__ method. (For details refer to the "Callable types" subsection in this section of the data model docs.)
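A small illustration of that order of execution; the metaclass here exists purely to make the call order observable and is not part of the original answer:

```python
calls = []

class Meta(type):
    def __call__(cls, *args, **kwargs):
        # runs on every instantiation, before __init__
        calls.append("Meta.__call__")
        return super().__call__(*args, **kwargs)

class Fruit(metaclass=Meta):
    def __init__(self, color: str):
        calls.append("Fruit.__init__")
        self.color = color

f = Fruit("red")
# __call__ on the (meta)class fires first, then __init__
```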
One problem with your annotation however, is that Callable is a generic type. Thus, you should specify its type arguments. In this case you would have a few options, depending on how narrow you want your annotation to be. The simplest one that would still be correct here is the "catch-all" callable:
def get_cls() -> Callable[..., Any]:
    return Fruit
But since you know that calling the class Fruit returns an instance of that class, you might as well write this:
def get_cls() -> Callable[..., Fruit]:
    return Fruit
Finally, if you know which arguments will be allowed for instantiating a Fruit (namely the color and taste attributes you defined on the dataclass), you could narrow it down even further:
def get_cls() -> Callable[[str, str], Fruit]:
    return Fruit
Technically, all of those are correct. (Try it with mypy --strict.)
However, even that last annotation is not particularly useful since Fruit is not just any Callable returning a Fruit instance, it is the class Fruit itself. Therefore the most sensible annotation is (as @2e0byo pointed out) this one:
def get_cls() -> type[Fruit]:
    return Fruit
That is what I would do as well.
I disagree with @rv.kvetch that removing the annotation is a solution (in any situation).
His DataClass protocol is an interesting proposal. However I would advise against it in this case for a few reasons:
It might give you all the magic attributes that make up any dataclass, but annotating with it makes you lose all information about the actual specific class you return from get_cls, namely Fruit. In practical terms, this means no IDE auto-suggestions for Fruit-specific attributes/methods.
You still have to place a type checker exception/ignore in get_cls because in the eyes of any static type checker type[Fruit] is not a subtype of type[DataClass]. The built-in dataclass protocol is a hack that is carried by specially tailored plugins for mypy, PyCharm etc. and those do not cover this kind of structural subtyping.
Even the forward reference to _DataclassParams is still a problem because it will never be resolved, unless you (surprise, surprise) import that protected member from the depths of the dataclasses package. Thus, this is not a stable annotation.
So from a type safety standpoint, there are two big errors in that code -- the subtyping and the unresolved reference -- and two minor errors; those being the non-parameterized generic annotations for __dataclass_fields__ (Field is generic) and __post_init__ (Callable is generic).
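A sketch of how those two generic members could be parameterized; note it still leaves the private `_DataclassParams` problem untouched, so this remains a best-effort annotation:

```python
from __future__ import annotations
from dataclasses import Field
from typing import Any, Callable, Protocol

class DataClass(Protocol):
    # `Field` is generic, so give it a type argument
    __dataclass_fields__: dict[str, Field[Any]]
    # `Callable` is generic too; the exact signature varies per dataclass
    __post_init__: Callable[..., None] | None
```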
Still, I like protocols. Python is a protocol-oriented language. The approach is interesting.

Access type argument in any specific subclass of user-defined Generic[T] class

Context
Say we want to define a custom generic (base) class that inherits from typing.Generic.
For the sake of simplicity, we want it to be parameterized by a single type variable T. So the class definition starts like this:
from typing import Generic, TypeVar

T = TypeVar("T")

class GenericBase(Generic[T]):
    ...
Question
Is there a way to access the type argument T in any specific subclass of GenericBase?
The solution should be universal enough to work in a subclass with additional bases besides GenericBase and be independent of instantiation (i.e. work on the class level).
The desired outcome is a class-method like this:
class GenericBase(Generic[T]):
    @classmethod
    def get_type_arg(cls) -> Type[T]:
        ...
Usage
class Foo:
    pass

class Bar:
    pass

class Specific(Foo, GenericBase[str], Bar):
    pass

print(Specific.get_type_arg())
The output should be <class 'str'>.
Bonus
It would be nice if all relevant type annotations were made such that static type checkers could correctly infer the specific class returned by get_type_arg.
Related questions
Generic[T] base class - how to get type of T from within instance? - This question focuses on direct instances of the custom generic class itself, not on specified subclasses.
How can I access T from a Generic[T] instance early in its lifecycle? - This is a variation on the previous one.
How to access the type arguments of typing.Generic? - This is very close, but does not cover the possibility of other base classes.
TL;DR
Grab the GenericBase from the subclass' __orig_bases__ tuple, pass it to typing.get_args, grab the first element from the tuple it returns, and make sure what you have is a concrete type.
1) Starting with get_args
As pointed out in this post, the typing module for Python 3.8+ provides the get_args function. It is convenient because given a specialization of a generic type, get_args returns its type arguments (as a tuple).
Demonstration:
from typing import Generic, TypeVar, get_args

T = TypeVar("T")

class GenericBase(Generic[T]):
    pass

print(get_args(GenericBase[int]))
Output:
(<class 'int'>,)
This means that once we have access to a specialized GenericBase type, we can easily extract its type argument.
2) Continuing with __orig_bases__
As further pointed out in the aforementioned post, there is this handy little class attribute __orig_bases__ that is set by the type metaclass when a new class is created. It is mentioned here in PEP 560, but is otherwise hardly documented.
This attribute contains (as the name suggests) the original bases as they were passed to the metaclass constructor in the form of a tuple. This distinguishes it from __bases__, which contains the already resolved bases as returned by types.resolve_bases.
Demonstration:
from typing import Generic, TypeVar

T = TypeVar("T")

class GenericBase(Generic[T]):
    pass

class Specific(GenericBase[int]):
    pass

print(Specific.__bases__)
print(Specific.__orig_bases__)
Output:
(<class '__main__.GenericBase'>,)
(__main__.GenericBase[int],)
We are interested in the original base because that is the specialization of our generic class, meaning it is the one that "knows" about the type argument (int in this example), whereas the resolved base class is just an instance of type.
3) Simplistic solution
If we put these two together, we can quickly construct a simplistic solution like this:
from typing import Generic, TypeVar, get_args

T = TypeVar("T")

class GenericBase(Generic[T]):
    @classmethod
    def get_type_arg_simple(cls):
        return get_args(cls.__orig_bases__[0])[0]

class Specific(GenericBase[int]):
    pass

print(Specific.get_type_arg_simple())
Output:
<class 'int'>
But this will break as soon as we introduce another base class on top of our GenericBase.
from typing import Generic, TypeVar, get_args

T = TypeVar("T")

class GenericBase(Generic[T]):
    @classmethod
    def get_type_arg_simple(cls):
        return get_args(cls.__orig_bases__[0])[0]

class Mixin:
    pass

class Specific(Mixin, GenericBase[int]):
    pass

print(Specific.get_type_arg_simple())
Output:
Traceback (most recent call last):
...
return get_args(cls.__orig_bases__[0])[0]
IndexError: tuple index out of range
This happens because cls.__orig_bases__[0] now happens to be Mixin, which is not a parameterized type, so get_args returns an empty tuple ().
So what we need is a way to unambiguously identify the GenericBase from the __orig_bases__ tuple.
4) Identifying with get_origin
Just like typing.get_args gives us the type arguments for a generic type, typing.get_origin gives us the unspecified version of a generic type.
Demonstration:
from typing import Generic, TypeVar, get_origin

T = TypeVar("T")

class GenericBase(Generic[T]):
    pass

print(get_origin(GenericBase[int]))
print(get_origin(GenericBase[str]) is GenericBase)
Output:
<class '__main__.GenericBase'>
True
5) Putting them together
With these components, we can now write a function get_type_arg that takes a class as an argument and -- if that class is a specialized form of our GenericBase -- returns its type argument:
from typing import Generic, TypeVar, get_origin, get_args

T = TypeVar("T")

class GenericBase(Generic[T]):
    pass

class Specific(GenericBase[int]):
    pass

def get_type_arg(cls):
    for base in cls.__orig_bases__:
        origin = get_origin(base)
        if origin is None or not issubclass(origin, GenericBase):
            continue
        return get_args(base)[0]

print(get_type_arg(Specific))
Output:
<class 'int'>
Now all that is left to do is embed this directly as a class-method of GenericBase, optimize it a little bit and fix the type annotations.
One thing we can do to optimize this is to run the algorithm only once for any given subclass of GenericBase, namely when it is defined, and then save the type in a class attribute. Since the type argument presumably never changes for a specific class, there is no need to compute it every time we want to access it. To accomplish this, we can hook into __init_subclass__ and do our loop there.
We should also define a proper response for when get_type_arg is called on an (unspecified) generic class. An AttributeError seems appropriate.
6) Full working example
from typing import Any, Generic, Optional, Type, TypeVar, get_args, get_origin

# The `GenericBase` must be parameterized with exactly one type variable.
T = TypeVar("T")

class GenericBase(Generic[T]):
    _type_arg: Optional[Type[T]] = None  # set in specified subclasses

    @classmethod
    def __init_subclass__(cls, **kwargs: Any) -> None:
        """
        Initializes a subclass of `GenericBase`.

        Identifies the specified `GenericBase` among all base classes and
        saves the provided type argument in the `_type_arg` class attribute.
        """
        super().__init_subclass__(**kwargs)
        for base in cls.__orig_bases__:  # type: ignore[attr-defined]
            origin = get_origin(base)
            if origin is None or not issubclass(origin, GenericBase):
                continue
            type_arg = get_args(base)[0]
            # Do not set the attribute for GENERIC subclasses!
            if not isinstance(type_arg, TypeVar):
                cls._type_arg = type_arg
            return

    @classmethod
    def get_type_arg(cls) -> Type[T]:
        if cls._type_arg is None:
            raise AttributeError(
                f"{cls.__name__} is generic; type argument unspecified"
            )
        return cls._type_arg

def demo_a() -> None:
    class SpecificA(GenericBase[int]):
        pass

    print(SpecificA.get_type_arg())

def demo_b() -> None:
    class Foo:
        pass

    class Bar:
        pass

    class GenericSubclass(GenericBase[T]):
        pass

    class SpecificB(Foo, GenericSubclass[str], Bar):
        pass

    type_b = SpecificB.get_type_arg()
    print(type_b)
    e = type_b.lower("E")  # static type checkers correctly infer `str` type
    assert e == "e"

if __name__ == '__main__':
    demo_a()
    demo_b()
Output:
<class 'int'>
<class 'str'>
An IDE like PyCharm even provides the correct auto-suggestions for whatever type is returned by get_type_arg, which is really nice. 🎉
7) Caveats
The __orig_bases__ attribute is not well documented. I am not sure it should be considered entirely stable. Although it doesn't appear to be "just an implementation detail" either. I would suggest keeping an eye on that.
mypy seems to agree with this caution and raises a "no attribute" error on the line where __orig_bases__ is accessed, so a type: ignore comment was placed there.
The entire setup is for one single type parameter for our generic class. It can be adapted relatively easily to multiple parameters, though annotations for type checkers might become more tricky.
This method does not work when called directly from a specialized GenericBase class, i.e. GenericBase[str].get_type_arg(). But for that one just needs to call typing.get_args on it as shown in the very beginning.
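For that direct case, a minimal sketch of the get_args approach:

```python
from typing import Generic, TypeVar, get_args

T = TypeVar("T")

class GenericBase(Generic[T]):
    pass

# get_type_arg() is not needed for a parameterized alias like GenericBase[str];
# get_args recovers the type argument directly
print(get_args(GenericBase[str]))  # (<class 'str'>,)
```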

How to annotate a custom type's __iter__ to correctly indicate non-uniform returned types?

I have a custom type, for which I'd like to enable unpacking its values (a la tuple unpacking, etc.). The simplest way I know to do this in Python is to implement __iter__. This works great at runtime but I'd like however to provide type annotations so that the correct types are returned for each item, for example:
import typing as t
from dataclasses import dataclass

@dataclass
class Foo:
    a: str
    b: bool

    def __iter__(self) -> t.Iterable[str, bool]:
        yield self.a
        yield self.b
At runtime, this works as-expected:
string, bool = Foo("Hello", False)
However, string and bool above are reported as Any types. Is there a reasonable way to provide this use-case whilst preserving types?
The real-world type is not easily translate-able to a NamedTuple etc.
Similar-ish to How to annotate types of multiple return values?
The feature you want is very specific to tuple builtin, and is supported via special-casing in mypy and other type checkers. However, you can tweak the type checker to make it think that your class is actually a tuple subclass, so it will get similar treatment on unpacking.
The following works (playground):
import typing as t
from dataclasses import dataclass

if t.TYPE_CHECKING:
    base = tuple[str, bool]
else:
    base = object

@dataclass
class Foo(base):
    a: str
    b: bool

    def __iter__(self) -> t.Iterator[str | bool]:
        yield self.a
        yield self.b

p, q = Foo('a', True)
reveal_type(p)
reveal_type(q)
typing.TYPE_CHECKING is a special constant which is False at runtime (so the code inside is not executed), but True for type checkers.
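A quick sanity check of the runtime side of this trick (the reveal_type calls are removed so it runs as plain Python): the fake tuple base leaves no trace outside type checking.

```python
import typing as t
from dataclasses import dataclass

if t.TYPE_CHECKING:
    base = tuple[str, bool]
else:
    base = object

@dataclass
class Foo(base):
    a: str
    b: bool

    def __iter__(self):
        yield self.a
        yield self.b

p, q = Foo("a", True)
# unpacking works via __iter__, and Foo is NOT actually a tuple subclass
assert not issubclass(Foo, tuple)
```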

Mypy check with dataclass field and alias

I'm struggling with mypy and dataclasses and especially with the field function.
Here is an example
from dataclasses import field, dataclass

@dataclass
class C:
    some_int: int
    some_str: str = field(metadata={"doc": "foo"})
    another_int: int

c = C(42, "bla", 43)
So far, so good. Mypy and python are happy
However, if I want to make a small helper around field to easily write my doc
def doc(documentation: str):
    return field(metadata={"doc": documentation})
Now I write my class like this:
@dataclass
class C:
    some_int: int
    some_str: str = doc("foo")
    another_int: int
And mypy throws
error: Attributes without a default cannot follow attributes with one
Both are equivalent, but it seems mypy has a special case around field (if I understand correctly):
https://github.com/python/mypy/blob/v0.790/mypy/plugins/dataclasses.py#L359
So, my question is: is there a workaround to be able to write an alias for field?
Should I raise a bug on mypy?

How to get Python variable annotations?

When defining a class/module with annotated fields, how can I get the annotations, like I can for functions?
class Test:
    def __init__(self):
        self.x: int

t = Test()
Now I need 'int' from getattr(t, 'x').
With baseline Python, there is no option to do what you want without changing the definition of Test. The minimalist change would be to annotate the attribute at class level:
class Test:
    x: int

    def __init__(self):
        ...  # define self.x or not, but it needn't be annotated again
This is actually perfectly fine; by default, annotations at class scope are assumed to refer to instance attributes, not class attributes (assigning to a value at class scope creates a class attribute, but annotating it does not); you have to explicitly use typing.ClassVar to indicate the annotated type is intended to be a class attribute only. PEP 526's section on class and instance variable annotations defines these behaviors; they're something you can rely on, not just an accident of implementation.
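A quick demonstration of that distinction (the class and attribute names are made up):

```python
from typing import ClassVar

class Config:
    instances: ClassVar[int] = 0  # explicitly a class attribute
    name: str                     # instance-attribute annotation only

# annotating alone creates no class attribute...
assert not hasattr(Config, "name")
# ...but the annotation itself is recorded on the class
assert "name" in Config.__annotations__
```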
Once you've done this, typing.get_type_hints will return {'x': int} for both Test and t in your example case.
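For example:

```python
from typing import get_type_hints

class Test:
    x: int

    def __init__(self):
        self.x = 0

t = Test()
# the class-level annotation is visible via the class and the instance alike
print(get_type_hints(Test))  # {'x': <class 'int'>}
print(get_type_hints(t))     # {'x': <class 'int'>}
```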
While that's enough on its own, I'll note that in many such cases nowadays, as long as you're annotating anyway, you can simplify your code with the dataclasses module, getting the annotations and basic functionality defined for you with minimal typing. Simple replacement code for your case would be:
import dataclasses

@dataclasses.dataclass
class Test:
    x: int
While your case doesn't showcase the full feature set (it's basically just replacing __init__ with the decorator), it's still doing more than meets the eye. In addition to defining __init__ for you (it expects to receive an x argument which is annotated to be an int), as well as a suitable __repr__ and __eq__, you can define defaults easily (just assign the default at point of annotation or for more complex or mutable cases, assign a dataclasses.field instead), and you can pass arguments to dataclass to make it produce sortable or immutable instances.
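A brief sketch of those extras; the field names here are invented for illustration:

```python
import dataclasses

@dataclasses.dataclass(frozen=True, order=True)
class Test:
    x: int = 0
    tags: list = dataclasses.field(default_factory=list)  # mutable default via field

a = Test(1)
b = Test(2)
assert a < b          # order=True generates the comparison methods
assert a == Test(1)   # the generated __eq__ compares field values
raised = False
try:
    a.x = 5           # frozen=True forbids attribute assignment
except dataclasses.FrozenInstanceError:
    raised = True
assert raised
```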
In your case, the main advantage is removing redundancy; x is annotated and referenced exactly once, rather than being annotated once at class level, then used (and optionally, annotated again) during initialization.
I am not sure you can get the annotations of self.x easily.
Assuming your code:
class Test:
    def __init__(self):
        self.x: int = None

t = Test()
I tried looking for __annotations__ in Test and t (where I would expect it to be), without much luck.
However, what you could do is this workaround:
class Test:
    x: int

    def __init__(self):
        # annotation from here seems to be unreachable from `__annotations__`
        self.x: str

t = Test()

print(Test.__annotations__)
# {'x': <class 'int'>}
print(t.__annotations__)
# {'x': <class 'int'>}
EDIT
If you want to be able to inspect the type of self.x within mypy, check the answer from @ruohola.
EDIT 2
Note that mypy (at least v.0.560) does get confused by annotating x both from the class and from the __init__, i.e. it looks like the annotation of self.x is boldly ignored:
import sys

class Test:
    x: str = "0"

    def __init__(self):
        self.x: int = 1

t = Test()

print(Test.x, t.x)
# 0 1
print(Test.x is t.x)
# False

if "mypy" in sys.modules:
    reveal_type(t.x)
    # from mypy: annotated_self.py:14: error: Revealed type is 'builtins.str'
    reveal_type(Test.x)
    # from mypy: annotated_self.py:15: error: Revealed type is 'builtins.str'

Test.x = 2
# from mypy: annotated_self.py:17: error: Incompatible types in assignment (expression has type "int", variable has type "str")
t.x = "3"
# no complaining from `mypy`
t.x = 4
# from mypy: annotated_self.py:19: error: Incompatible types in assignment (expression has type "int", variable has type "str")
print(Test.x, t.x)
# 2 4
If you're using mypy, you can use reveal_type() to check the type annotation of any expression. Note that this function is only usable when running mypy, and not at normal runtime.
I also use typing.TYPE_CHECKING, to not get an error when running the file normally, since this special constant is only assumed to be True by 3rd party type checkers.
test.py:
from typing import Dict, Optional, TYPE_CHECKING

class Test:
    def __init__(self) -> None:
        self.x: Optional[Dict[str, int]]

test = Test()

if TYPE_CHECKING:
    reveal_type(test.x)
else:
    print("not running with mypy")
Example when running mypy on it:
$ mypy test.py
test.py:10: error: Revealed type is 'Union[builtins.dict[builtins.str, builtins.int], None]'
And when running it normally:
$ python3 test.py
not running with mypy
