Constructing dataclasses with properties while using explicit keyword arguments - python

I would like to use a dataclass with an invariant that should not change during the lifetime of its instances. To indicate that the instance variables of these objects are private, their names are prefixed with an underscore. These instance variables can easily be accessed through properties, as demonstrated with the example code below:
from dataclasses import dataclass
@dataclass
class C:
    _x: int = 3

    @property
    def x(self) -> int:
        return self._x

    def p(self) -> None:
        print(self._x)
The problem arises when I want to call the constructor of this class with explicit keyword arguments. To do so, I now have to provide the names of the instance variables with an underscore as well. This seems really counterintuitive, since the private variables are now accessed from outside of the class.
a = C() # sets 'a.x' to 3
a.p() # prints 3
b = C(5) # sets 'b.x' to 5
b.p() # prints 5
c = C(_x=7) # sets 'c.x' to 7
c = C(x=7) # error: unexpected keyword argument 'x'
One way to solve this problem is to simply provide an explicit constructor with matching arguments:
def __init__(self, x: int = 3) -> None:
    self._x = x
However, this also seems to be dreadfully counterintuitive as this approach contradicts the whole notion of a dataclass. Is there a way to use a dataclass in combination with properties that allows me to use explicit keyword arguments when constructing such objects without having to access/acknowledge instance variables intended to be private?

The dataclass decorator is essentially a handful of generated methods that get attached to your class. These methods provide reusable logic that the dataclass developers thought applies to certain use cases. Setting private fields via __init__ arguments is not among those use cases, so what you want is not supported by the dataclasses module.
Luckily, it appears someone else has written a different module that does cover this use case: https://pypi.org/project/dataclass-property/
You could also look at some alternative frameworks, such as pydantic, to see if they meet your needs better.
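If the invariant is the main concern, one alternative worth sketching (not a drop-in replacement for the private-field pattern above, since it drops the underscore-prefixed storage and the property) is a frozen dataclass with a public field, which keeps keyword construction and makes the attribute read-only:
from dataclasses import dataclass

@dataclass(frozen=True)
class C:
    x: int = 3              # public field, so C(x=7) works as a keyword argument

    def p(self) -> None:
        print(self.x)

c = C(x=7)                  # keyword construction, no underscore needed
c.p()                       # prints 7
# c.x = 9                   # would raise dataclasses.FrozenInstanceError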

Related

dataclasses.dataclass with __init_subclass__

My confusion is with the interplay between dataclasses & __init_subclass__.
I am trying to implement a base class that will exclusively be inherited from. In this example, A is the base class. It is my understanding from reading the python docs on dataclasses that simply adding a decorator should automatically create some special dunder methods for me. Quoting their docs:
For example, this code:
from dataclasses import dataclass
@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
will add, among other things, a __init__() that looks like:
def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0):
    self.name = name
    self.unit_price = unit_price
    self.quantity_on_hand = quantity_on_hand
This is an instance variable, no? The classes docs show a toy example, which reads super clear.
class Dog:
    kind = 'canine'          # class variable shared by all instances

    def __init__(self, name):
        self.name = name     # instance variable unique to each instance
A main gap in my understanding is: is it an instance variable or a class variable? From my testing below, it is a class variable, but the docs show an instance variable as its proximal implementation. It may be that most of my problem is there. I've also read the Python docs on classes, which do not go into dataclasses.
The problem continues with the seemingly limited docs on __init_subclass__, which yields another gap in my understanding. I am also making use of __init_subclass__ in order to enforce that my subclasses have indeed set the variable x.
Below, we have A, which has an instance variable x set to None. B, C, and D all subclass A, in different ways (hoping) to determine implementation specifics.
B inherits from A, setting a class variable of x.
D is a dataclass, which inherits from A, setting what would appear to be a class variable of x. However, given their docs from above, it seems that the class variable x of D should be created as an instance variable. Thus, when D is created, it should first call __init_subclass__; in that function, it will check to see whether x exists in D - by my understanding, it should not; however, the code passes scot-free. I believe D() will create x as an instance variable because the dataclass docs show that this will create an __init__ for the user.
"will add, among other things..." <insert __init__ code>
I must be wrong here but I'm struggling to put it together.
import dataclasses

class A:
    def __init__(self):
        self.x = None

    def __init_subclass__(cls):
        if not getattr(cls, 'x') or not cls.x:
            raise TypeError(
                f'Cannot instantiate {cls.__name__}, as all subclasses of {cls.__base__.__name__} must set x.'
            )

class B(A):
    x = 'instantiated-in-b'

@dataclasses.dataclass
class D(A):
    x: str = 'instantiated-in-d'

class C(A):
    def __init__(self):
        self.x = 'instantiated-in-c'

print('B', B())
print('D', D())
print('C', C())
The code, per my expectation, properly fails with C(). Executing the above code will succeed with D, which does not compute for me. In my understanding (which is wrong), I am defining a field, which means that dataclass should expand my class variables as instance variables. (The previous statement is most probably where I am wrong, but I cannot find anything that documents this behavior. Are data classes not actually expanding class variables as instance variables? It certainly appears that way from the visual explanation in their docs.) From the dataclass docs:
The dataclass() decorator examines the class to find fields. A field is defined as a class variable that has a type annotation.
Thus - why - when creating an instance D() - does it slide past the __init_subclass__ of its parent A?
Apologies for the lengthy post, I must be missing something simple, so if one can point me in the right direction, that would be excellent. TIA!
I have just found the implementation for dataclasses from the CPython github.
__init_subclass__ is called when initializing a subclass. Not when initializing an instance of a subclass - it's called when initializing the subclass itself. Your exception occurs while trying to create the C class, not while trying to evaluate C().
Decorators, such as @dataclass, are a post-processing mechanism, not a pre-processing mechanism. A class decorator takes an existing class that has already gone through all the standard initialization, including __init_subclass__, and modifies the class. Since this happens after __init_subclass__, __init_subclass__ doesn't see any of the modifications that @dataclass performs.
Even if the decorator were to be applied first, D still would have passed the check in A.__init_subclass__, because the dataclass decorator will set D.x to the default value of the x field anyway, so __init_subclass__ will find a value of x. In this case, that happens to be the same thing you set D.x to in the original class definition, but it can be a different object in cases where you construct field objects explicitly.
(Also, you probably wanted to write hasattr instead of getattr in not getattr(cls, 'x').)
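As a quick illustration of the ordering described above (the class and function names here are made up for the demo):
class Base:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        print(f"__init_subclass__ sees {cls.__name__} while the class is being created")

def decorate(cls):
    print(f"the decorator runs after {cls.__name__} already exists")
    return cls

@decorate
class Child(Base):   # prints the __init_subclass__ line first, then the decorator line
    pass

Child()              # instantiating prints nothing extra: __init_subclass__ already ran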

Call the generated __init__ from custom constructor in dataclass for defaults

Is it possible to benefit from dataclasses.field, especially for default values, but using a custom constructor? I know the @dataclass decorator sets default values in the generated __init__, and won't do it anymore if I replace it. So, is it possible to replace the generated __init__, and to still call it inside?
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class A:
    l: list[int] = field(default_factory=list)
    i: int = field(default=0)

    def __init__(self, a: Optional[int]):      # completely different args than instance attributes
        self.call_dataclass_generated_init()   # call generated init to set defaults
        if a is not None:                      # custom setting of attributes
            self.i = 2 * a
A workaround would be to define __new__ instead of overriding __init__, but I prefer to avoid that.
This question is quite close, but the answers only address the specific use-case that is given as a code example. Also, I don't want to use __post_init__ because I need to use __setattr__ which is an issue for static type checking, and it doesn't help tuning the arguments that __init__ will take anyway.
I don't want to use a class method either, I really want callers to use the custom constructor.
This one is also close, but it's only about explaining why the new constructor replaces the generated one, not about how to still call the latter (there's also a reply suggesting to use Pydantic, but I don't want to have to subclass BaseModel, because it will mess up my inheritance).
So, in short, I want to benefit from dataclass's feature of having default values for attributes, without cumbersome workarounds. Note that raw default values are not an option for me because they set class attributes:
class B:
    a: int = 0         # this will create a B.a class attribute, and vars(B()) will be empty
    l: list[int] = []  # worse, a mutable object will be shared between instances
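One hedged sketch of a possible approach (the decorator name with_custom_init is made up, not part of dataclasses): because @dataclass only generates __init__ when the class body does not define one, you can let it generate the method first and then install the custom constructor afterwards with a second decorator that keeps a reference to the generated one.
from dataclasses import dataclass, field
from typing import Optional

def with_custom_init(cls):
    generated_init = cls.__init__              # the __init__ that @dataclass generated

    def __init__(self, a: Optional[int] = None):
        generated_init(self)                   # sets the field defaults: l=[] and i=0
        if a is not None:
            self.i = 2 * a

    cls.__init__ = __init__
    return cls

@with_custom_init
@dataclass
class A:
    l: list[int] = field(default_factory=list)
    i: int = field(default=0)

print(A(3))   # A(l=[], i=6)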

Type hint for return value in subclass

I am writing a CustomEnum class in which I want to add some helper methods that would then be available to the classes subclassing my CustomEnum. One of the methods is to return a random enum value, and this is where I am stuck. The function works as expected, but on the type-hinting side, I cannot figure out a way of saying "the return type is the same type as cls".
I am fairly sure there's some TypeVar or similar magic involved, but since I never had to use them I never took the time to figure them out.
import random
from enum import Enum

class CustomEnum(Enum):
    @classmethod
    def random(cls) -> ???:
        return random.choice(list(cls))

class SubclassingEnum(CustomEnum):
    A = "a"
    B = "b"

random_subclassing_enum: SubclassingEnum
random_subclassing_enum = SubclassingEnum.random()  # Incompatible types in assignment (expression has type "CustomEnum", variable has type "SubclassingEnum")
Can somebody help me or give me a hint on how to proceed?
Thanks!
The syntax here is kind of horrible, but I don't think there's a cleaner way to do this. The following passes MyPy:
from typing import TypeVar
from enum import Enum
import random

T = TypeVar("T", bound="CustomEnum")

class CustomEnum(Enum):
    @classmethod
    def random(cls: type[T]) -> T:
        return random.choice(list(cls))
(In python versions <= 3.8, you have to use typing.Type rather than the builtin type if you want to parameterise it.)
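With that annotation in place, the assignment from the question should now type-check (a quick illustration):
class SubclassingEnum(CustomEnum):
    A = "a"
    B = "b"

random_subclassing_enum: SubclassingEnum = SubclassingEnum.random()  # accepted by MyPy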
What's going on here?
T is defined at the top as being a type variable that is "bound" to the CustomEnum class. This means that a variable annotated with T can only be an instance of CustomEnum or an instance of a class inheriting from CustomEnum.
In the classmethod above, we're actually using this type-variable to define the type of the cls parameter with respect to the return type. Usually we do the opposite — we usually define a function's return types with respect to the types of that function's input parameters. So it's understandable if this feels a little mind-bending!
We're saying: this method returns an instance of a class. We don't know which class it will be, but we know it will be either CustomEnum or a class inheriting from CustomEnum. We also know that, whatever is returned, the type of the cls parameter in the function will be "one level up" in the type hierarchy from the type of the return value.
In a lot of situations, we might know that type[cls] will always be a fixed value. In those situations, it would be possible to hardcode that into the type annotations. However, it's best not to do so, and instead to use this method, which clearly shows the relationship between the type of the input and the return type (even if it uses horrible syntax to do so!).
Further reading: the MyPy documentation on the type of class objects.
Further explanation and examples
For the vast majority of classes (not with Enums, they use metaclasses, but let's leave that aside for the moment), the following will hold true:
Example 1
class A:
    pass

instance_of_a = A()

type(instance_of_a) == A  # True
type(A) == type           # True
Example 2
class B:
    pass

instance_of_b = B()

type(instance_of_b) == B  # True
type(B) == type           # True
For the cls parameter of your CustomEnum.random() method, we're annotating the equivalent of A rather than instance_of_a in my Example 1 above.
The type of instance_of_a is A.
But the type of A is not A — A is a class, not an instance of a class.
Classes are not instances of classes; they are either instances of type or instances of custom metaclasses that inherit from type.
No metaclasses are being used here; ergo, the type of A is type.
The rule is as follows:
The type of all python class instances will be the class they're an instance of.
The type of all python classes will be either type or (if you're being too clever for your own good) a custom metaclass that inherits from type.
With your CustomEnum class, we could annotate the cls parameter with the metaclass that the enum module uses (enum.EnumType, if you want to know). But, as I say — best not to. The solution I've suggested illustrates the relationship between the input type and the return type more clearly.
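For the curious, the metaclass relationship can be checked directly (enum.EnumMeta is the older spelling, kept as an alias where EnumType exists):
import enum

type(SubclassingEnum) is enum.EnumMeta   # True: the class's type is the Enum metaclass
type(SubclassingEnum.A)                  # <enum 'SubclassingEnum'>: members are instances of the class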
Starting in Python 3.11, the correct return annotation for this code is Self:
from typing import Self

class CustomEnum(Enum):
    @classmethod
    def random(cls) -> Self:
        return random.choice(list(cls))
Quoting from the PEP:
This PEP introduces a simple and intuitive way to annotate methods that return an instance of their class. This behaves the same as the TypeVar-based approach specified in PEP 484 but is more concise and easier to follow.
The current workaround for this is unintuitive and error-prone:
Self = TypeVar("Self", bound="Shape")

class Shape:
    @classmethod
    def from_config(cls: type[Self], config: dict[str, float]) -> Self:
        return cls(config["scale"])
We propose using Self directly:
from typing import Self

class Shape:
    @classmethod
    def from_config(cls, config: dict[str, float]) -> Self:
        return cls(config["scale"])
This avoids the complicated cls: type[Self] annotation and the TypeVar declaration with a bound. Once again, the latter code behaves equivalently to the former code.
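Applied back to the enum from the question (a sketch assuming Python 3.11+), this becomes:
import random
from enum import Enum
from typing import Self

class CustomEnum(Enum):
    @classmethod
    def random(cls) -> Self:
        return random.choice(list(cls))

class SubclassingEnum(CustomEnum):
    A = "a"
    B = "b"

member: SubclassingEnum = SubclassingEnum.random()  # type-checks without a TypeVar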

Is there a shorthand initializer in Python?

I have found no reference for a short constructor call that would initialize variables of the caller's choice. I am looking for
class AClass:
    def __init__(self):
        pass

instance = AClass(var1=3, var2=5)
instead of writing the heavier
class AClass:
    def __init__(self, var1, var2):
        self.var1 = var1
        self.var2 = var2
or the much heavier
instance = AClass()
instance.var1 = 3
instance.var2 = 5
Am I missing something?
This is an excellent question and has been a puzzle also for me.
In the modern Python world, there are three (excellent) shorthand initializers (this term is clever, I am adopting it), depending on your needs. None requires any footwork with __init__ methods (which is what you wanted to avoid in the first place).
Namespace object
If you wish to assign arbitrary values to an instance (i.e. not enforced by the class), you should use a particular data structure called a namespace. A namespace object is an object whose attributes are accessible with dot notation and to which you can assign basically whatever you want.
You can import the Namespace class from argparse (it is covered here: How do I create a Python namespace (argparse.parse_args value)?). Since Python 3.3, a SimpleNamespace class has been available in the standard types module.
from types import SimpleNamespace
instance = SimpleNamespace(var1=var1, var2=var2)
You can also write:
instance = SimpleNamespace()
instance.var1 = var1
instance.var2 = var2
Let's say it's the "quick and dirty way", which would work in a number of cases. In general there is not even the need to declare your class.
If you want your instances to still have a few methods and properties you could still do:
class AClass(SimpleNamespace):
    def mymethod(self):
        ...
And then:
instance = AClass(var1=var1, var2=var2)  # and so on
That gives you maximum flexibility.
Named tuple
On the other hand, if you want the class to enforce those attributes, then you have another, more solid option.
A named tuple produces immutable instances, which are initialized once and for all. Think of them as ordinary tuples, but with each item also accessible with dot notation. The namedtuple factory is part of the standard distribution of Python. This is how you generate your class:
from collections import namedtuple
AClass = namedtuple("AClass", "var1 var2")
Note how cool and short the definition is: no __init__ method required. You can actually complete your class after that.
And to create an object:
instance = AClass(var1, var2)
or
instance = AClass(var1=var1, var2=var2)
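A quick illustration of the immutability and tuple behaviour mentioned above:
instance = AClass(var1=1, var2=2)
instance.var1            # 1: items are accessible with dot notation
var1, var2 = instance    # ordinary tuple unpacking still works
# instance.var1 = 3      # would raise AttributeError: can't set attribute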
Named list
But what if you want that instance to be mutable, i.e. to allow you to update the properties of the instance? The answer is the named list (also known as RecordClass). Conceptually it is like a normal list, where the items are also accessible with dot notation.
There are various implementations. I personally use the aptly named namedlist.
The syntax is identical:
from namedlist import namedlist
AClass = namedlist("AClass", "var1 var2")
And to create an object:
instance = AClass(var1, var2)
or:
instance = AClass(var1=var1, var2=var2)
And you can then modify them:
instance.var1 = var3
But you can't add an attribute that is not defined.
>>> instance.var4 = var4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'instance' object has no attribute 'var4'
Usage
Here is my two-bit:
Namespace object is for maximum flexibility and there is not even the need to declare a class; with the risk of having instances that don't behave properly (but Python is a language for consenting adults). If you have only one instance and/or you know what you're doing, that would be the way to go.
namedtuple class generator is perfect for generating objects to return from functions (see this brief explanation in a lecture from Raymond Hettinger). Rather than returning bland tuples that the user needs to look up in the documentation, the tuple returned is self-explanatory (a dir or help will do it). And it's compatible with tuple usage anyway (e.g. k, v, z = my_func()). Plus it's immutable, which has its own advantages.
namedlist class generator is useful in a wide range of cases, including when you need to return multiple values from a function, which then need to be amended at a later stage (and you can still unpack them: k, v, z = instance). If you need a mutable object from a proper class with enforced attributes, that might be the go-to solution.
If you use them well, this might significantly cut down time spent on writing classes and handling instances!
Update (September 2020)
@PPC: your dream has come true.
Since Python 3.7, a new tool is available as a standard: dataclasses (unsurprisingly, the designer of the named list package, Eric V. Smith, is also behind it).
In essence, it provides an automatic initialization of class variables.
from dataclasses import dataclass
@dataclass
class InventoryItem:
    """Class for keeping track of an item in inventory."""
    name: str
    unit_price: float
    quantity_on_hand: int = 0

    def total_cost(self) -> float:
        return self.unit_price * self.quantity_on_hand
(from the official doc)
What the @dataclass decorator will do is automatically add the __init__() method:
def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0):
    self.name = name
    self.unit_price = unit_price
    self.quantity_on_hand = quantity_on_hand
IMHO, it's a pretty, eminently pythonic solution.
Eric also maintains a backport of dataclasses on github, for Python 3.6.
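A brief illustrative usage of the generated constructor and repr (the values here are made up):
item = InventoryItem(name="widget", unit_price=2.5, quantity_on_hand=4)
item.total_cost()   # 10.0
item                # InventoryItem(name='widget', unit_price=2.5, quantity_on_hand=4)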
You can update the __dict__ attribute of your object directly, which is where the attributes are stored:
class AClass:
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

c = AClass(var1=1, var2='a')
You can use the dictionary representation of the object's attributes, and update its elements with the keyword arguments given to the constructor:
class AClass:
    def __init__(self, **kwargs):
        self.__dict__.update(**kwargs)

instance = AClass(var1=3, var2=5)
print(instance.var1, instance.var2)  # prints 3 5
However, consider this question and its answers regarding the style of this approach. Unless you know what you are doing, it is better to explicitly set the arguments one by one. It will be easier for you and other people to understand later - explicit is better than implicit. If you do it the __dict__.update way, document it properly.
Try
class AClass:
    def __init__(self, **vars):
        self.var1 = vars.get('var1')

python class keyword arguments

I'm writing a class for something and I keep stumbling across the same tiresome-to-type-out construction. Is there some simple way I can set up the class so that all the parameters in the constructor get initialized as their own name, i.e. fish = 0 -> self.fish = fish?
class Example(object):
    def __init__(self, fish=0, birds=0, sheep=0):
        self.fish = fish
        self.birds = birds
        self.sheep = sheep
Short answer: no. You are not required to initialize everything in the constructor (you could do it lazily), unless you need it immediately or expose it (meaning that you don't control access). But, since in Python you don't declare data fields, it will become much more difficult to track them all if they appear in different parts of the code.
More comprehensive answer: you could do some magic with **kwargs (which holds a dictionary of argument name/value pairs), but that is highly discouraged, because it makes documenting the changes almost impossible and makes it difficult for users to check whether a certain argument is accepted or not. Use it only for optional, internal flags. It could be useful when you have 20 or more parameters to pass, but in that case I would suggest rethinking the design and clustering the data.
In case you need a simple key/value storage, consider using a builtin, such as dict.
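For completeness, a minimal sketch of the **kwargs approach mentioned above (with the caveats already noted about discoverability and documentation):
class Example:
    def __init__(self, **kwargs):
        # assign every keyword argument as an attribute of the same name
        for name, value in kwargs.items():
            setattr(self, name, value)

e = Example(fish=1, birds=2)
e.fish   # 1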
You could use the inspect module:
import inspect

class Example(object):
    def __init__(self, fish=0, birds=0, sheep=0):
        frame = inspect.currentframe()
        args, _, _, values = inspect.getargvalues(frame)
        for i in args:
            setattr(self, i, values[i])
This works, but is more complicated than just setting them manually. It should be possible to hide this with a decorator:
@set_attributes
def __init__(self, fish=0, birds=0, sheep=0):
    pass
but defining set_attributes gets tricky because the decorator inserts another stack frame into the mix, and I can't quite get the details right.
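One way around the stack-frame issue is to skip frame inspection entirely and bind the call against the wrapped __init__'s signature instead; a hedged sketch (this set_attributes is my own illustration, not a library function):
import functools
import inspect

def set_attributes(init):
    sig = inspect.signature(init)

    @functools.wraps(init)
    def wrapper(self, *args, **kwargs):
        # bind the actual call to the parameter names and fill in defaults
        bound = sig.bind(self, *args, **kwargs)
        bound.apply_defaults()
        for name, value in bound.arguments.items():
            if name != "self":
                setattr(self, name, value)
        return init(self, *args, **kwargs)
    return wrapper

class Example(object):
    @set_attributes
    def __init__(self, fish=0, birds=0, sheep=0):
        pass

Example(fish=5).fish   # 5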
For Python 3.7+, you can try using data classes in combination with type annotations.
https://docs.python.org/3/library/dataclasses.html
Import the module and use the decorator. Type-annotate your variables and there's no need to define an init method, because it will automatically be created for you.
from dataclasses import dataclass
@dataclass
class Example:
    fish: int = 0
    birds: int = 0
    sheep: int = 0
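Illustrative usage of the generated constructor and repr:
e = Example(fish=1, sheep=4)
e   # Example(fish=1, birds=0, sheep=4), courtesy of the generated __repr__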
