I'm struggling with mypy and dataclasses and especially with the field function.
Here is an example:

```python
from dataclasses import field, dataclass

@dataclass
class C:
    some_int: int
    some_str: str = field(metadata={"doc": "foo"})
    another_int: int

c = C(42, "bla", 43)
```
So far, so good: mypy and Python are both happy.
However, I want to write a small helper around field so I can document fields more easily:

```python
def doc(documentation: str):
    return field(metadata={"doc": documentation})
```

Now I write my class like this:

```python
@dataclass
class C:
    some_int: int
    some_str: str = doc("foo")
    another_int: int
```
And mypy reports:

```
error: Attributes without a default cannot follow attributes with one
```
Both versions are equivalent at runtime, but it seems mypy special-cases field (if I understand correctly):
https://github.com/python/mypy/blob/v0.790/mypy/plugins/dataclasses.py#L359
So, my question is: is there a workaround that lets me write an alias for field?
Should I file a bug against mypy?
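For what it's worth, the runtime half of the "both are equivalent" claim can be checked directly. The sketch below (assuming Python 3.7+) annotates the helper's return type as Any so it does not clash with the `str` annotation on the attribute, and confirms that doc() yields a Field with no default, exactly like the inline field() call. It does not change what mypy reports: at least in the mypy version linked above, the dataclass plugin appears to recognize only literal field() calls.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import Any


def doc(documentation: str) -> Any:
    """Helper from the question: a field() carrying only a "doc" metadata entry."""
    # No default and no default_factory are passed, so the resulting Field is
    # still "required" at runtime, just like field(metadata=...) written inline.
    return field(metadata={"doc": documentation})


@dataclass
class C:
    some_int: int
    some_str: str = doc("foo")
    another_int: int


c = C(42, "bla", 43)
some_str_field = fields(C)[1]
print(some_str_field.metadata["doc"])     # foo
print(some_str_field.default is MISSING)  # True
```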
I am trying to annotate a function argument with a dataclass whose attributes overlap with those of another dataclass, which is what actually gets passed in.
Consider the following code:
```python
from dataclasses import dataclass
from typing import TypeVar

@dataclass
class Foo:
    a: str
    zar: str

@dataclass
class Car(Foo):
    b: str

@dataclass
class CarInterface:
    a: str
    b: str

mar = TypeVar("mar", bound=CarInterface)

def blah(x: mar):
    print(x.a)

car_instance = Car(a="blah blah", zar="11", b="bb")
blah(car_instance)
```
In this example, I'm trying to create my own type annotation, mar, which is bound to CarInterface. I want to check that whatever class is passed into blah() has at least the a and b attributes (I don't care if the class has other attributes such as zar). I want to do it this way because class Car (which actually gets passed in) is one of many classes that will be written in the future and passed into this function.
I also want it to be very easy to define a new Car, so I would like to avoid abstract classes, as I don't think the added complexity is worth making mypy happy.
So I'm trying to create mar which uses duck typing to say that Car satisfies the interface of CarInterface.
However, I get two errors from the type checker (Pylance).
The first is on the mar annotation in def blah:

```
TypeVar "mar" appears only once in generic function signature  Pylance(reportInvalidTypeVarUse)
```

And the other is where I pass car_instance into blah():

```
Argument of type "Car" cannot be assigned to parameter "x" of type "bar@blah" in function "blah"
  Type "Car" cannot be assigned to type "CarInterface"
    "Car" is incompatible with "CarInterface"  Pylance(reportGeneralTypeIssues)
```
Use a Protocol to define CarInterface rather than a dataclass:
```python
from dataclasses import dataclass
from typing import Protocol

@dataclass
class Foo:
    a: str
    zar: str

@dataclass
class Car(Foo):
    b: str

class CarInterface(Protocol):
    a: str
    b: str

def blah(x: CarInterface):
    print(x.a)

car_instance = Car(a="blah blah", zar="11", b="bb")
blah(car_instance)
```
The above code will typecheck fine, but if you try to pass blah a Foo instead of a Car you'll get a mypy error like this:
```
test.py:22: error: Argument 1 to "blah" has incompatible type "Foo"; expected "CarInterface"
test.py:22: note: "Foo" is missing following "CarInterface" protocol member:
test.py:22: note:     b
Found 1 error in 1 file (checked 1 source file)
```
A Protocol can be used as the bound for a TypeVar, but it's only necessary to use a TypeVar if you want to indicate that two variables not only implement the protocol but are also the same specific type (e.g. to indicate that a function takes any object implementing CarInterface and returns the same exact type of object rather than some other arbitrary CarInterface implementation).
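A sketch of that TypeVar case, assuming Python 3.8+ (the function shout_a is hypothetical, not from the question): binding T to the CarInterface protocol lets the function accept any structurally compatible object while telling the type checker it returns that same type, so the result is still known to be a Car.

```python
from dataclasses import dataclass
from typing import Protocol, TypeVar


class CarInterface(Protocol):
    a: str
    b: str


# T may be any type that structurally provides `a` and `b`.
T = TypeVar("T", bound=CarInterface)


@dataclass
class Foo:
    a: str
    zar: str


@dataclass
class Car(Foo):
    b: str


def shout_a(x: T) -> T:
    """Hypothetical helper: upper-cases `a` and returns the same object."""
    x.a = x.a.upper()
    return x


car = shout_a(Car(a="blah blah", zar="11", b="bb"))
# A type checker infers `car: Car`, so accessing the Car-only `zar` is fine.
print(type(car).__name__, car.a, car.zar)  # Car BLAH BLAH 11
```

With a plain `x: CarInterface -> CarInterface` signature instead, the checker would only know the result is some CarInterface, and `car.zar` would be flagged.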
The variable below is initialized as None, but during __post_init__ it is replaced with an instance of an Outlook client.
```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Config:
    """Outlook configuration"""
    mailbox: str
    inbox: str
    mailbox_obj: Union["Mailbox", None] = None
```
However, static type analysis correctly reports that mailbox_obj may be None, so every member I access on it is flagged ("... is not a known member of "None""). I don't want to guard every access with if mailbox_obj just to satisfy the type checker. Is there another way, using a dataclass field or something similar?
The problem would go away if I used a regular class, since I could initialize the problem variable in __init__, where its type would be inferred from the assigned value, but then I would have to write that extra boilerplate.
Writing this question has reminded me of the following, which is probably what I'm looking for:

```python
mailbox_obj: "Mailbox" = field(init=False)
```

Is that the right way?
Yes, you want to specify that it is not an init field, so you just want something like this:
```python
import dataclasses

class Mailbox:
    pass

@dataclasses.dataclass
class Config:
    """Outlook configuration"""
    mailbox: str
    inbox: str
    mailbox_obj: "Mailbox" = dataclasses.field(init=False)

    def __post_init__(self):
        # do some stuff...
        self.mailbox_obj = Mailbox()
```
I saved the above code in a file called test_typing.py, and here is mypy's output:

```
(py310) Juans-MBP:test juan$ mypy test_typing.py
Success: no issues found in 1 source file
```
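To see what field(init=False) changes at runtime, here is a small sketch (the Mailbox class is a stand-in for the real client, and the constructor arguments are placeholders): the attribute is dropped from the generated __init__ signature but remains a dataclass field, filled in by __post_init__. One caveat: because the field has no default, the class attribute is removed, so if __post_init__ ever forgets to assign it, accessing it (or calling repr) raises AttributeError.

```python
import dataclasses
import inspect


class Mailbox:
    """Stand-in for the real Outlook mailbox client (hypothetical)."""


@dataclasses.dataclass
class Config:
    """Outlook configuration"""
    mailbox: str
    inbox: str
    mailbox_obj: Mailbox = dataclasses.field(init=False)

    def __post_init__(self) -> None:
        self.mailbox_obj = Mailbox()


# The init=False field is not a parameter of the generated __init__ ...
print(list(inspect.signature(Config).parameters))  # ['mailbox', 'inbox']

cfg = Config("user@example.com", "Inbox")
# ... but it is still a dataclass field, and __post_init__ has filled it in.
print([f.name for f in dataclasses.fields(cfg)])  # ['mailbox', 'inbox', 'mailbox_obj']
print(isinstance(cfg.mailbox_obj, Mailbox))  # True
```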
Both Pydantic and dataclasses provide type hints on object creation, based on the attributes and their annotations, as in these examples:
```python
from pydantic import BaseModel, PrivateAttr, Field
from dataclasses import dataclass

# Pydantic way
class Person(BaseModel):
    name: str
    address: str
    _valid: bool = PrivateAttr(default=False)

# dataclass way
@dataclass
class PersonDataclass():
    name: str
    address: str
    _valid: bool = False

bob = Person(name="Bob", address="New York")
bobDataclass = PersonDataclass("Bob", "New York")
```
With this code, I get type hints on object creation (see the screenshots below):
[Screenshot: pydantic type hint on object creation]
[Screenshot: dataclass type hint on object creation]
Not only that, but the object's attributes also get documented.
I studied the code of pydantic to try to achieve the same result, but I couldn't. The code that I tried was this:
```python
class MyBaseModelMeta(type):
    def __new__(cls, name, bases, dct):
        def new_init(self: cls, /, name: str, address: str):
            self.name = name
            self.address = address
            self._valid = False
        dct["__init__"] = new_init
        dct["__annotations__"] = {"__init__": {"name": str, "address": str, "_valid": bool}}
        return super().__new__(cls, name, bases, dct)

class MyBaseModel(metaclass=MyBaseModelMeta):
    def __repr__(self) -> str:
        return f"MyBaseModel: {self.__dict__}"

class MyPerson(MyBaseModel):
    pass

myBob = MyPerson("Bob", "New York")
```
My class works (the dynamic __init__ insertion works), but neither the class nor the object gets type hints.
[Screenshot: my class works, but it doesn't get type hints]
What am I doing wrong? How can I achieve the typehints?
@Daniil Fajnberg is mostly correct, but depending on your type checker you can use the dataclass_transform decorator (typing.dataclass_transform, Python 3.11+) or the earlier __dataclass_transform__ decorator from the "early adopters" program.
At least Pylance and Pyright (usually used in VS Code) work with these.
You can only mimic the behaviour of dataclasses that way, though; I don't think you can declare that your metaclass adds extra fields. :/
Edit:
At least pydantic uses this decorator for their BaseModel: https://pydantic-docs.helpmanual.io/visual_studio_code/#technical-details
If you dig through the code of pydantic you'll find that their ModelMetaclass is decorated with __dataclass_transform__
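A minimal sketch of the decorator approach applied to a metaclass, assuming Python 3.11+ for typing.dataclass_transform (on older versions typing_extensions provides it; here a no-op stand-in keeps the sketch runnable). The decorator has no runtime effect, so the metaclass still has to build __init__ itself; this sketch simply delegates that to dataclasses.dataclass, which is a swapped-in technique and not how pydantic does it internally.

```python
import dataclasses
import sys

if sys.version_info >= (3, 11):
    from typing import dataclass_transform
else:
    # Assumption: on older Pythons you would import this from typing_extensions;
    # a no-op stand-in keeps this sketch runnable without extra packages.
    def dataclass_transform(**kwargs):
        def decorator(obj):
            return obj
        return decorator


@dataclass_transform()
class MyBaseModelMeta(type):
    """Metaclass that turns every class it creates into a stdlib dataclass."""

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Runtime work is delegated to dataclasses; dataclass_transform only
        # tells static checkers to expect dataclass-like __init__ signatures.
        return dataclasses.dataclass(cls)


class MyBaseModel(metaclass=MyBaseModelMeta):
    pass


class MyPerson(MyBaseModel):
    name: str
    address: str
    _valid: bool = False


my_bob = MyPerson("Bob", "New York")
print(my_bob)
```

Note that the fields are declared as annotations on MyPerson itself, which is exactly the limitation mentioned above: the checker learns the signature from the annotations, not from anything the metaclass adds dynamically.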
@chepner is right.
Static type checkers don't execute your code, they just read it.
And to answer your question about how Pydantic and dataclasses do it: they cheat.
mypy.dataclasses plugin
Pydantic mypy plugin
Pydantic PyCharm plugin
Special plugins allow mypy to infer the signatures that are actually only created at runtime. (I am just joking about the "cheating" of course, but you get my point.)
If you want your own dynamic annotations to be considered by static type checkers, you will have to write your own plugins for them.
In Python 3.7, I can create a dataclass with a defaulted InitVar just fine:
```python
from dataclasses import dataclass, InitVar, field

@dataclass
class Foo:
    seed: InitVar[str] = field(default='tomato')
    stored: str = field(init=False)

    def __post_init__(self, seed: str):
        self.stored = f'planted {seed}'

print(Foo())
```
Now I try to create a similar dataclass with a mutable default, for which I need to use default_factory instead:
```python
from dataclasses import dataclass, InitVar, field
from typing import List

@dataclass
class Bar:
    seeds: InitVar[List[str]] = field(default_factory=list)
    stored: List[str] = field(init=False)

    def __post_init__(self, seeds: List[str]):
        self.stored = [f'planted {seed}' for seed in seeds]

print(Bar())
```
However, this is not valid. Python raises TypeError: field seeds cannot have a default factory.
The dataclasses.py file from the standard library does not explain why:
```python
# Special restrictions for ClassVar and InitVar.
if f._field_type in (_FIELD_CLASSVAR, _FIELD_INITVAR):
    if f.default_factory is not MISSING:
        raise TypeError(f'field {f.name} cannot have a '
                        'default factory')
    # Should I check for other field settings? default_factory
    # seems the most serious to check for. Maybe add others. For
    # example, how about init=False (or really,
    # init=<not-the-default-init-value>)? It makes no sense for
    # ClassVar and InitVar to specify init=<anything>.
```
Why? What is the rationale behind this special restriction? How does this make sense?
The rationale is that supplying a default_factory would almost always be an error.
The intent of InitVar is to create a pseudo-field, called an "init-only field". It is almost always handled by __post_init__() if the value is other than the default. It is never returned by the module-level fields() function. The primary use case is initializing field values that depend on one or more other fields.
Given this intent, it would almost always be a user error to supply a default_factory, which is:
- something we would want to see returned by the fields() function;
- entirely unnecessary if we're using __post_init__(), where you can call a factory directly;
- not suited to the case where object creation depends on other field values.
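Following the second point, a common workaround (a swapped-in pattern, not something the quoted rationale prescribes) is to give the InitVar a None sentinel default and call the "factory" inside __post_init__. A sketch, assuming Python 3.7+:

```python
from dataclasses import InitVar, dataclass, field
from typing import List, Optional


@dataclass
class Bar:
    # None is a sentinel meaning "no seeds given"; no default_factory needed.
    seeds: InitVar[Optional[List[str]]] = None
    stored: List[str] = field(init=False)

    def __post_init__(self, seeds: Optional[List[str]]) -> None:
        if seeds is None:
            seeds = []  # the "factory" is simply called here, once per instance
        self.stored = [f'planted {seed}' for seed in seeds]


print(Bar())            # Bar(stored=[])
print(Bar(['tomato']))  # Bar(stored=['planted tomato'])
```

Because the empty list is created inside __post_init__, each instance gets its own list, which is the problem default_factory normally solves.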
When there is a field in a dataclass for which the type can be anything, how can you omit the annotation?
```python
@dataclass
class Favs:
    fav_number: int = 80085
    fav_duck = object()
    fav_word: str = 'potato'
```
It seems the code above doesn't actually create a field for fav_duck. It just makes that a plain old class attribute.
```
>>> Favs()
Favs(fav_number=80085, fav_word='potato')
>>> print(*Favs.__dataclass_fields__)
fav_number fav_word
>>> Favs.fav_duck
<object object at 0x7fffea519850>
```
The dataclass decorator examines the class to find fields by looking for names in __annotations__. It is the presence of an annotation that makes a field, so you do need an annotation.
You can, however, use a generic one:

```python
import typing
from dataclasses import dataclass

@dataclass
class Favs:
    fav_number: int = 80085
    fav_duck: 'typing.Any' = object()
    fav_word: str = 'potato'
```
According to PEP 557, which defines the meaning of data classes:
The dataclass decorator examines the class to find fields. A field is defined as any variable identified in __annotations__. That is, a variable that has a type annotation.
Which is to say that the premise of this question (i.e. "How can I use dataclass with a field that has no type annotation?") must be rejected. By definition, the term 'field' in the context of dataclass requires the attribute to have a type annotation.
Note that using a generic type annotation like typing.Any is not the same as having an unannotated attribute, since the attribute will appear in __annotations__.
Finally, the helper function make_dataclass will automatically use typing.Any for the type annotation in cases when only an attribute name is supplied, and this is also mentioned in the PEP with an example.
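A short sketch of that make_dataclass behaviour: when an entry is a bare name, the generated class still records an annotation for it, so it becomes a real field. Depending on the Python version, the stored annotation is either typing.Any itself or the string 'typing.Any'; both print as typing.Any.

```python
from dataclasses import fields, make_dataclass

# "fav_duck" is given as a bare name, so make_dataclass supplies an Any annotation.
Favs = make_dataclass("Favs", [("fav_number", int), "fav_duck", ("fav_word", str)])

print([f.name for f in fields(Favs)])  # ['fav_number', 'fav_duck', 'fav_word']
print(Favs.__annotations__["fav_duck"])

favs = Favs(80085, object(), "potato")
```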
Type hints are an optional feature of Python. This also means that using @dataclass does not require you to define types.
You can write many things in an annotation; they are not checked unless you choose to check them. All of these examples work:
```python
@dataclass
class ColoredObject:
    color: ""
    name: ""

@dataclass
class ColoredObject:
    color: ...
    name: ...

@dataclass
class ColoredObject:
    color: True
    name: True

@dataclass
class ColoredObject:
    color: object
    name: object

@dataclass
class ColoredObject:
    color: None
    name: None
```
I listed so many options here so that you can decide if you like some of them or not. It is your decision how you use code annotations.
For people who are used to languages that are more statically typed than Python, this may be an awkward style. It looks like abusing an empty string or the Ellipsis object for this purpose, and that is true. But keep in mind that code readability is also important in programming. Interestingly, most readers of your code would intuitively understand ... without even knowing that there is such a thing as the Ellipsis object.
Of course, if you don't want to confuse people who prefer type hints, or if you want to use tools that expect correct type hints, you should agree on a convention for this.
I have not listed the solution with typing.Any. Of course, that is a good solution. But types are an optional feature of Python, and that also means that knowing about typing.Any is optional knowledge.
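To illustrate that nothing checks these annotations at runtime, here is a small sketch using the ... variant from above: the instance accepts values of any type, and fields() still reports both attributes, with Ellipsis recorded as their type.

```python
from dataclasses import dataclass, fields


@dataclass
class ColoredObject:
    # Ellipsis used as a placeholder "annotation"; dataclasses only needs the
    # names to appear in __annotations__ and never validates the values.
    color: ...
    name: ...


obj = ColoredObject(color="red", name=42)  # accepted: no runtime type check
print(obj)  # ColoredObject(color='red', name=42)
print([f.type for f in fields(obj)])  # [Ellipsis, Ellipsis]
```

A static checker such as mypy would, however, report `...` as an invalid type, which is the trade-off described above.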
If you are able to add a from __future__ import annotations at the top of the file, this will convert all annotations in the file to strings, so that they are evaluated lazily; this can be quite useful for defining annotations that don't necessarily need to be resolved at runtime.
For example, one way to use a short and simple type annotation (_) for a field:
```python
from __future__ import annotations

from dataclasses import dataclass
from typing import TYPE_CHECKING

# added to silence any IDE warnings (e.g. PyCharm)
if TYPE_CHECKING:
    _ = object

@dataclass
class Favs:
    fav_number: int = 80085
    fav_duck: _ = object()
    fav_word: str = 'potato'

print(Favs())
```
Prints:

```
Favs(fav_number=80085, fav_duck=<object object at 0x11b754ad0>, fav_word='potato')
```
If you don't want to or are unable to use a __future__ import (e.g. if you are on Python 3.6 or below, or want to silence IDE warnings about "unresolved references"), you can always define a value for the type annotation beforehand:
```python
from dataclasses import dataclass

# or:
# _ = ...
_ = object

@dataclass
class Favs:
    fav_number: int = 80085
    fav_duck: _ = object()
    fav_word: str = 'potato'

print(Favs())
```