PEP 557 introduces data classes into the Python standard library.
They make use of the @dataclass decorator and they are supposed to be "mutable namedtuples with defaults", but I'm not really sure I understand what this actually means and how they differ from common classes.
What exactly are python data classes and when is it best to use them?
Data classes are just regular classes that are geared towards storing state, rather than containing a lot of logic. Every time you create a class that mostly consists of attributes, you make a data class.
What the dataclasses module does is to make it easier to create data classes. It takes care of a lot of boilerplate for you.
This is especially useful when your data class must be hashable, because that requires a __hash__ method as well as an __eq__ method. If you also add a custom __repr__ method for ease of debugging, the class can become quite verbose:
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def __init__(
        self,
        name: str,
        unit_price: float,
        quantity_on_hand: int = 0
    ) -> None:
        self.name = name
        self.unit_price = unit_price
        self.quantity_on_hand = quantity_on_hand

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand

    def __repr__(self) -> str:
        return (
            'InventoryItem('
            f'name={self.name!r}, unit_price={self.unit_price!r}, '
            f'quantity_on_hand={self.quantity_on_hand!r})')

    def __hash__(self) -> int:
        return hash((self.name, self.unit_price, self.quantity_on_hand))

    def __eq__(self, other) -> bool:
        if not isinstance(other, InventoryItem):
            return NotImplemented
        return (
            (self.name, self.unit_price, self.quantity_on_hand) ==
            (other.name, other.unit_price, other.quantity_on_hand))
With dataclasses you can reduce it to:
from dataclasses import dataclass
@dataclass(unsafe_hash=True)
class InventoryItem:
    '''Class for keeping track of an item in inventory.'''
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
The same class decorator can also generate comparison methods (__lt__, __gt__, etc.) and handle immutability.
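For example, a rough sketch showing both keywords (the Point class here is made up purely for illustration):

from dataclasses import dataclass

@dataclass(order=True, frozen=True)
class Point:
    x: int = 0
    y: int = 0

Point(1, 2) < Point(2, 0)   # True; fields are compared like a tuple: (1, 2) < (2, 0)
# Point(1, 2).x = 5         # frozen=True: this would raise dataclasses.FrozenInstanceError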
namedtuple classes are also data classes, but are immutable by default (as well as being sequences). dataclasses are much more flexible in this regard, and can easily be structured such that they can fill the same role as a namedtuple class.
The PEP was inspired by the attrs project, which can do even more (including slots, validators, converters, metadata, etc.).
If you want to see some examples, I recently used dataclasses for several of my Advent of Code solutions, see the solutions for day 7, day 8, day 11 and day 20.
If you want to use the dataclasses module in Python versions < 3.7, you can install the backported module (it requires 3.6) or use the attrs project mentioned above.
Overview
The question has been addressed. However, this answer adds some practical examples to aid in the basic understanding of dataclasses.
What exactly are python data classes and when is it best to use them?
Data classes play two roles:
code generators: they generate boilerplate code; you can choose to implement the special methods in a regular class yourself or have a dataclass implement them automatically.
data containers: structures that hold data (e.g. tuples and dicts), often with dotted attribute access, such as classes, namedtuple and others.
"mutable namedtuples with default[s]"
Here is what the latter phrase means:
mutable: by default, dataclass attributes can be reassigned. You can optionally make them immutable (see Examples below).
namedtuple: you have dotted, attribute access like a namedtuple or a regular class.
default: you can assign default values to attributes.
Compared to common classes, you primarily save on typing boilerplate code.
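As a small illustration of all three points (the Pixel class below is made up for this example):

from dataclasses import dataclass

@dataclass
class Pixel:
    x: int = 0      # default value
    y: int = 0

p = Pixel()         # defaults apply: Pixel(x=0, y=0)
p.x                 # dotted, attribute access -> 0
p.x = 10            # mutable: fields can be reassigned by default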
Features
This is an overview of dataclass features (TL;DR? See the Summary Table in the next section).
What you get
Here are features you get by default from dataclasses.
Attributes + Representation + Comparison
import dataclasses

@dataclasses.dataclass
# @dataclasses.dataclass()   # equivalent alternative
class Color:
    r: int = 0
    g: int = 0
    b: int = 0
These defaults are provided by automatically setting the following keywords to True:
@dataclasses.dataclass(init=True, repr=True, eq=True)
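For instance, assuming the Color class above, the generated methods behave roughly like this:

>>> Color(255, 0, 0)                    # __init__ accepts positional or keyword arguments
Color(r=255, g=0, b=0)                  # __repr__ gives a readable representation
>>> Color(1, 2, 3) == Color(1, 2, 3)    # __eq__ compares fields as a tuple
True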
What you can turn on
Additional features are available if the appropriate keywords are set to True.
Order
@dataclasses.dataclass(order=True)
class Color:
    r: int = 0
    g: int = 0
    b: int = 0
The ordering methods are now implemented (overloading the operators <, >, <=, >=), similar to what functools.total_ordering provides, but with stronger equality tests.
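A short sketch of what order=True enables, assuming the Color class above (fields are compared element-wise, like tuples):

>>> Color(0, 0, 0) < Color(0, 0, 1)
True
>>> sorted([Color(125, 0, 0), Color(0, 255, 0)])
[Color(r=0, g=255, b=0), Color(r=125, g=0, b=0)]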
Hashable, Mutable
@dataclasses.dataclass(unsafe_hash=True)   # override base `__hash__`
class Color:
    ...
Although the object is potentially mutable (possibly undesired), a hash is implemented.
Hashable, Immutable
@dataclasses.dataclass(frozen=True)        # with `eq=True` (the default), `__hash__` is also generated
class Color:
    ...
A hash is now implemented and changing the object or assigning to attributes is disallowed.
Overall, the object is hashable if either unsafe_hash=True or frozen=True.
See also the original hashing logic table with more details.
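For example, with either option enabled, instances can be placed in sets or used as dict keys; with frozen=True, assignment additionally fails (a sketch, assuming the Color class above):

>>> {Color(), Color()}       # __hash__ is generated, so equal instances collapse
{Color(r=0, g=0, b=0)}
>>> Color().r = 255          # with frozen=True this raises dataclasses.FrozenInstanceError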
What you don't get
To get the following features, special methods must be manually implemented:
Unpacking
@dataclasses.dataclass
class Color:
    r: int = 0
    g: int = 0
    b: int = 0

    def __iter__(self):
        yield from dataclasses.astuple(self)
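With __iter__ defined this way, instances unpack like tuples:

>>> r, g, b = Color(255, 0, 0)
>>> r
255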
Optimization
@dataclasses.dataclass
class SlottedColor:
    __slots__ = ["r", "g", "b"]
    r: int
    g: int
    b: int
The object size is now reduced:
>>> import sys
>>> sys.getsizeof(Color)
1056
>>> sys.getsizeof(SlottedColor)
888
In some circumstances, __slots__ also improves the speed of creating instances and accessing attributes. Note that fields listed in __slots__ cannot be given default values in the class body; doing so raises a ValueError because the default would conflict with the slot descriptor. (Python 3.10 added @dataclass(slots=True), which generates __slots__ for you and avoids this limitation.)
See more on slots in this blog post.
Summary Table
+----------------------+----------------------+------------------------------------------------+-----------------------------------------+
| Feature              | Keyword              | Example                                        | Implement in a Class                    |
+----------------------+----------------------+------------------------------------------------+-----------------------------------------+
| Attributes           | init                 | Color().r -> 0                                 | __init__                                |
| Representation       | repr                 | Color() -> Color(r=0, g=0, b=0)                | __repr__                                |
| Comparison*          | eq                   | Color() == Color(0, 0, 0) -> True              | __eq__                                  |
|                      |                      |                                                |                                         |
| Order                | order                | sorted([Color(0, 50, 0), Color()]) -> ...      | __lt__, __le__, __gt__, __ge__          |
| Hashable             | unsafe_hash/frozen   | {Color(), Color()} -> {Color(r=0, g=0, b=0)}   | __hash__                                |
| Immutable            | frozen + eq          | Color().r = 10 -> FrozenInstanceError          | __setattr__, __delattr__                |
|                      |                      |                                                |                                         |
| Unpacking+           | -                    | r, g, b = Color()                              | __iter__                                |
| Optimization+        | -                    | sys.getsizeof(SlottedColor) -> 888             | __slots__                               |
+----------------------+----------------------+------------------------------------------------+-----------------------------------------+
+These methods are not automatically generated and require manual implementation in a dataclass.
* __ne__ is not needed and thus not implemented.
Additional features
Post-initialization
@dataclasses.dataclass
class RGBA:
    r: int = 0
    g: int = 0
    b: int = 0
    a: float = 1.0

    def __post_init__(self):
        self.a: int = int(self.a * 255)

RGBA(127, 0, 255, 0.5)
# RGBA(r=127, g=0, b=255, a=127)
Inheritance
@dataclasses.dataclass
class RGBA(Color):
    a: int = 0
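Fields from the base class come first in the generated __init__, so (assuming the Color class from above):

>>> RGBA(127, 0, 255, 10)
RGBA(r=127, g=0, b=255, a=10)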
Conversions
Convert a dataclass to a tuple or a dict, recursively:
>>> dataclasses.astuple(Color(128, 0, 255))
(128, 0, 255)
>>> dataclasses.asdict(Color(128, 0, 255))
{'r': 128, 'g': 0, 'b': 255}
Limitations
Lacks mechanisms to handle starred arguments
Working with nested dataclasses can be complicated
References
R. Hettinger's talk on Dataclasses: The code generator to end all code generators
T. Hunner's talk on Easier Classes: Python Classes Without All the Cruft
Python's documentation on hashing details
Real Python's guide on The Ultimate Guide to Data Classes in Python 3.7
A. Shaw's blog post on A brief tour of Python 3.7 data classes
E. Smith's github repository on dataclasses
From the PEP specification:
A class decorator is provided which inspects a class definition for
variables with type annotations as defined in PEP 526, "Syntax for
Variable Annotations". In this document, such variables are called
fields. Using these fields, the decorator adds generated method
definitions to the class to support instance initialization, a repr,
comparison methods, and optionally other methods as described in the
Specification section. Such a class is called a Data Class, but
there's really nothing special about the class: the decorator adds
generated methods to the class and returns the same class it was
given.
The @dataclass decorator adds generated methods to the class that you'd otherwise have to define yourself, such as __repr__, __init__, __lt__, and __gt__.
Consider this simple class Foo
from dataclasses import dataclass

@dataclass
class Foo:
    def bar(self):
        pass
Here is the dir() built-in comparison. On the left-hand side is Foo without the @dataclass decorator, and on the right is with the @dataclass decorator.
Here is another diff, after using the inspect module for comparison.
Related
Is it possible to have something like
class MyAbstract {
    final int myFieldSomebodyHasToDefine;
}

class MyAbstractImplementation extends MyAbstract {
    final int myFieldSomebodyHasToDefine = 5;
}
using dataclasses in python?
If you are working with a Python interpreter before version 3.8, there is no straightforward way. However, since Python 3.8, the final decorator has been part of the language: after importing it from the typing module, you can apply it to methods and classes.
You may also use the Final type annotation for values.
Here is an example
from typing import final, Final

@final
class Base:
    @final
    def h(self) -> None:
        print("old")

class Child(Base):
    # Bad overriding
    def h(self) -> None:
        print("new")

if __name__ == "__main__":
    b = Base()
    b.h()
    c = Child()
    c.h()

    RATE: Final = 3000
    # Bad value assignment
    RATE = 7
    print(RATE)
Important note: Python does not enforce final and Final at runtime. You can still override the methods and reassign the values if you wish; these markers are mostly informative for developers and for static type checkers.
For more information, you may visit: https://peps.python.org/pep-0591/
Update: here is also an example using a dataclass:
@dataclass
class Item:
    """Class for keeping track of an item in inventory."""
    unit_price: float
    quantity_on_hand: int = 0
    name: Final[str] = "ItemX"

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
As you can see, name is a Final field. However, because it is given a default value, it must be placed after all of the fields that have no default value.
I am creating a new data class in Python.
@dataclass
class User(Mixin):
    id: int = None
    items: List[DefaultItem] = None
The items field is a list of DefaultItem objects, but I need it to allow multiple possible object types, like:
items: List[DefaultItem OR SomeSpecificItem OR SomeOtherItem] = None
How can I do something like this in python?
You can use typing.Union for this.
items: List[Union[DefaultItem, SomeSpecificItem, SomeOtherItem]] = None
And if you are on Python 3.10, they've added a convenient shorthand notation:
items: list[DefaultItem | SomeSpecificItem | SomeOtherItem] = None
Also just as a note: If items is allowed to be None, you should mark the type as Optional.
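For example (a sketch, keeping the None default from your code):

from typing import List, Optional, Union

items: Optional[List[Union[DefaultItem, SomeSpecificItem, SomeOtherItem]]] = None
# or, on Python 3.10+:
# items: list[DefaultItem | SomeSpecificItem | SomeOtherItem] | None = None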
Also, note that in Python 3.10 you can pass the kw_only parameter to the @dataclass decorator to work around the issue I suspect you're having, wherein all fields in a subclass are required to have a default value whenever at least one field in the superclass (Mixin in this case) has a default value.
I added an example below to illustrate this a little better:
from dataclasses import dataclass

@dataclass
class Mixin:
    string: str
    integer: int = 222

@dataclass(kw_only=True)
class User(Mixin):
    id: int
    items: list['A | B | C']

class A: ...
class B: ...
class C: ...

u = User(string='abc', id=321, integer=123, items=[])
print(u)
Note that I've also wrapped the Union arguments in a string, so that the expression is forward-declared (i.e. not evaluated yet), since the classes in the Union arguments are defined a bit later.
This code works in 3.10 because the kw_only param is enabled, so now only keyword arguments are accepted to the constructor. This allows you to work around that issue as mentioned, where you would otherwise need to define a default value for all fields in a subclass when there's at least one default field in a parent class.
In Python versions earlier than 3.10, which lack the kw_only argument, you'd run into a TypeError as below:
TypeError: non-default argument 'id' follows default argument
The workaround for this in a pre-3.10 scenario is exactly how you had it: define a default value for all fields in the User class as below.
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Mixin:
    string: str
    integer: int = 222

@dataclass
class User(Mixin):
    id: int = None
    items: list[A | B | C] = field(default_factory=list)

class A: ...
class B: ...
class C: ...

u = User('abc', 123, 321)
print(u)
Is there an inverse function for Type[SomeType] so that Instance[Type[SomeType]] == SomeType?
I'm given a class and I'd like to annotate the return value of calling its constructor
class FixedSizeUInt(int):
    size: int = 0

    def __new__(cls, value: int):
        cls_max: int = cls.max_value()
        if not 0 <= value <= cls_max:
            raise ValueError(f"{value} is outside range " +
                             f"[0, {cls_max}]")
        new: Callable[[cls, int], Instance[cls]] = super().__new__  ### HERE
        return new(cls, value)

    @classmethod
    def max_value(cls) -> int:
        return 2**(cls.size) - 1
Edit:
This class is abstract, it needs to be subclassed for it to make sense, as a size of 0 only allows for 0 as its value.
class NodeID(FixedSizeUInt):
    size: int = 40

class NetworkID(FixedSizeUInt):
    size: int = 64
Edit 2: For this specific case, using generics will suffice, as explained in https://stackoverflow.com/a/39205612/5538719. Still, the question of an inverse of Type remains. Maybe the question then is: will generics cover every case, so that an inverse function is never needed?
I believe you want:
new: Callable[[Type[FixedSizeUInt], int], FixedSizeUInt] = ...
Or a little more dynamically:
from typing import TypeVar, Callable
T = TypeVar('T')
...
def __new__(cls: Type[T], value: int):
    ...
    new: Callable[[Type[T], int], T] = ...
Still, the question of an inverse of Type remains. Maybe the question then is: will generics cover every case, so that an inverse function is never needed?
It's not about generics, it's about type hints in general. Take int as an example. int is the class. int() creates an instance of the class. In type hints, int means instance of int. Using a class as a type hint always talks about an instance of that type, not the class itself. Because talking about instances-of is the more typical case, talking about the class itself is less common.
So, you need to use a class in a type hint and a class in a type hint means instance of that class. Logically, there's no need for an Instance[int] type hint, since you cannot have a non-instance type hint to begin with. On the contrary, a special type hint Type[int] is needed for the special case that you want to talk about the class.
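To make that distinction concrete, here is a small sketch (the make_default function is purely illustrative):

from typing import Type

def make_default(cls: Type[int]) -> int:
    # `cls` is the class itself (int or a subclass); the return value is an instance
    return cls()

make_default(int)    # -> 0
make_default(bool)   # -> False (bool is a subclass of int)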
The typing module contains many protocols and abstract base classes that formally specify protocols which are informally described in the data model, so they can be used for type hints.
However, I was unable to find such a protocol or abstract base class for objects that support __add__. Is there any formal specification of such a protocol? If not, how would such an implementation look?
Update:
Since I'm interested in such a class for the purpose of typing, it would only be useful if it is fully typed itself, like the examples in the typing module.
You could define one yourself using the abc module. The ABC base class provided there (backed by the ABCMeta metaclass) allows you to define a __subclasshook__, in which you can check for methods such as __add__. If that method is defined for a certain class, the class is then considered a subclass of the ABC.
from abc import ABC

class Addable(ABC):
    @classmethod
    def __subclasshook__(cls, C):
        if cls is Addable:
            if any("__add__" in B.__dict__ for B in C.__mro__):
                return True
        return NotImplemented

class Adder:
    def __init__(self, x):
        self.x = x

    def __add__(self, x):
        return x + self.x

inst = Adder(5)
# >>> isinstance(inst, Addable)
# True
# >>> issubclass(Adder, Addable)
# True
To my knowledge there is no pre-defined Addable protocol. You can define one yourself:
from typing import Protocol, TypeVar

T = TypeVar("T")

class Addable(Protocol):
    def __add__(self: T, other: T) -> T: ...
This protocol requires that both summands and the result share a common ancestor type.
The protocol can then be used as follows:
Tadd = TypeVar("Tadd", bound=Addable)

def my_sum(*args: Tadd, acc: Tadd) -> Tadd:
    res = acc
    for value in args:
        res += value
    return res

my_sum("a", "b", "c", acc="")          # correct, returns string "abc"
my_sum(1, 2, 6, acc=0)                 # correct, returns int 9
my_sum(1, 2.0, 6, acc=0)               # correct, returns float 9.0
my_sum(True, False, False, acc=False)  # correct, returns bool 1 (mypy reveal_type says bool; running it in Python gives 1)
my_sum(True, False, 1, acc=1.0)        # incorrect IMHO, but not detected by mypy, returns float 3.0
my_sum(1, 2, 6, acc="")                # incorrect, detected by mypy
my_sum(1, 2, "6", acc=0)               # incorrect, detected by mypy
There exists such a protocol, just not in code that is executed at runtime.
There is a special package called typeshed that type checkers use to add type hints to code that doesn't ship with any; the typeshed package doesn't exist at runtime, though.
So you can use _typeshed.SupportsAdd during type checking, but to check during execution you'd need to look for the __add__ method dynamically, or implement the SupportsAdd protocol yourself...
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from _typeshed import SupportsAdd

# SupportsAdd must always be written in quotation marks when used later...
def require_adding_str_results_in_int(a_obj: "SupportsAdd[str, int]"):
    # the type checker guarantees that this should work now
    assert type(a_obj + "a str") == int
    # to check during runtime though you'd probably use
    assert hasattr(a_obj, "__add__")
Hope that helped...
Most likely the best way is to implement your own protocol and use the @runtime_checkable decorator. You can look at the source code of typeshed for inspiration.
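A minimal sketch of that approach (the Addable protocol below mirrors the one defined in the previous answer):

from typing import Protocol, TypeVar, runtime_checkable

T = TypeVar("T")

@runtime_checkable
class Addable(Protocol):
    def __add__(self: T, other: T) -> T: ...

isinstance(3, Addable)         # True; runtime checks only look for the method's presence
isinstance(object(), Addable)  # False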
I have a function that takes a list of objects and prints it.
bc_directives = t.Union[
    data.Open,
    data.Close,
    data.Commodity,
    data.Balance,
    data.Pad,
    data.Transaction,
    data.Note,
    data.Event,
    data.Query,
    data.Price,
    data.Document,
    data.Custom,
]

def print_entries(entries: t.List[bc_directives], file: t.IO) -> None:
    pass
but if I do :
accounts: t.List[bc_directives] = []
for entry in data.sorted(entries):
    if isinstance(entry, data.Open):
        accounts.append(entry)
        continue

accounts = sorted(accounts, key=lambda acc: acc.account)
# the attribute account does not exist for the other classes.
print_entries(accounts)
Here I have a problem: mypy complains that the other classes do not have an account attribute. Of course, they are designed like that.
Item "Commodity" of "Union[Open, Close, Commodity, Balance, Pad, Transaction, Note, Event, Query, Price, Document, Custom]" has no attribute "account"
If I change the definition of accounts to t.List[data.Open], mypy complains when I use print_entries (even though that annotation would be the most accurate).
So how can I use a subset of a union and get mypy to not complain?
You should make print_entries accept a Sequence, not a List. Here is a simplified example demonstrating a type-safe version of your code:
from typing import IO, List, Sequence, Union

class Open:
    def __init__(self, account: int) -> None:
        self.account = account

class Commodity: pass

Directives = Union[Open, Commodity]

def print_entries(entries: Sequence[Directives]) -> None:
    for entry in entries:
        print(entry)

accounts: List[Open] = [Open(1), Open(2), Open(3)]
print_entries(accounts)
The reason why making print_entries accept a List of your directive types is disallowed is that it would introduce a potential bug in your code -- if print_entries were to do entries.append(Commodity()), your list of accounts would no longer contain only Open objects, breaking type safety.
Sequence is a read-only version of a list and so sidesteps this problem entirely, letting it have fewer restrictions. (That is, List[T] is a subclass of Sequence[T]).
More precisely, we say that Sequence is a covariant type: if we have some child type C that subclasses a parent type P (i.e. P :> C), it is always true that Sequence[P] :> Sequence[C].
In contrast, Lists are invariant: List[P] and List[C] will have no inherent relationship to each other, and neither subclasses the other.
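To see why the invariance of List matters here, consider this sketch reusing the classes above (add_commodity is a made-up helper):

def add_commodity(entries: List[Directives]) -> None:
    # perfectly legal for a List[Directives]...
    entries.append(Commodity())

accounts: List[Open] = [Open(1)]
# add_commodity(accounts)   # rejected by mypy: it would sneak a Commodity into a List[Open]
print_entries(accounts)     # fine: Sequence[Directives] accepts a List[Open] (covariance)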
Here is a tabular summary of the different kinds of relationships generic types can be designed to have:
              | Foo[P] :> Foo[C] | Foo[C] :> Foo[P] | Used for
--------------+------------------+------------------+----------------------------------
Covariant     | True             | False            | Read-only types
Contravariant | False            | True             | Write-only types
Invariant     | False            | False            | Both readable and writable types
Bivariant     | True             | True             | Nothing (not type safe)