Class __init__ attributes, DRY vs IDE functionality - python

What's the proper way to follow the DRY principle in a class's __init__ method?
I know these two ways:
class Foo:
    def __init__(self, x, y, z=None):
        self.x = x
        self.y = y
        self.z = z

class Bar:
    _fields = ['x', 'y', 'z']

    def __init__(self, x, y, z=None):
        for field in self.__class__._fields:
            setattr(self, field, locals()[field])
The method in Foo is very repetitive: you have to type each attribute name three times, which gets quite exhausting even in classes with a small number of attributes and not-so-lengthy names.
On the other hand, the method used in Bar is way shorter, but it has the drawback of constant 'unresolved reference' warnings from IDEs. It also breaks the IDEs' dot-operator functionality for auto-completing attributes.
I'm looking for a way to create classes without repeating attribute names all over, while still being able to use some of the IDE functionality.
I'm using PyCharm as my IDE, but I would happily switch to any other that supports what I'm trying to do.

"On the other hand, the method used in Bar is way shorter"
Not really... In your example it is the same number of lines and WAY more characters. Of course, this might not be the case if you're passing LOTS of arguments to the constructor, but... once you start having too many arguments in the constructor then you're about due for a refactor to figure out how to cut down the number of arguments anyway...
Use the first version. Your future self/collaborators will thank you for it¹.
¹Actually, your future collaborators probably won't know to thank you... I suppose it might be more accurate to say that your future self/collaborators won't hunt you down to make your life miserable for writing hard-to-read code :-)

Related

Should I repeat parent class __init__ arguments in the child class's __init__, or use **kwargs instead?

Imagine a base class that you'd like to inherit from:
class Shape:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y
There seem to be two common patterns of handling a parent's kwargs in a child class's __init__ method.
You can restate the parent's interface completely:
class Circle(Shape):
    def __init__(self, x: float, y: float, radius: float):
        super().__init__(x=x, y=y)
        self.radius = radius
Or you can specify only the part of the interface which is specific to the child, and hand the remaining kwargs to the parent's __init__:
class Circle(Shape):
    def __init__(self, radius: float, **kwargs):
        super().__init__(**kwargs)
        self.radius = radius
Both of these seem to have pretty big drawbacks, so I'd be interested to hear what is considered standard or best practice.
The "restate the interface" method is appealing in toy examples like you commonly find in discussions of Python inheritance, but what if we're subclassing something with a really complicated interface, like pandas.DataFrame or logging.Logger?
Also, if the parent interface changes, I have to remember to change all of my child class's interfaces to match, type hints and all. Not very DRY.
In these cases, you're almost certain to go for the **kwargs option.
But the **kwargs option leaves the user unsure about which arguments are actually required.
In the toy example above, a user might naively write:
circle = Circle() # Argument missing for parameter "radius"
Their IDE (or mypy or Pyright) is being helpful and saying that the radius parameter is required.
circle = Circle(radius=5)
The IDE (or type checker) is now happy, but the code won't actually run:
Traceback (most recent call last):
  File "foo.py", line 13, in <module>
    circle = Circle(radius=5)
  File "foo.py", line 9, in __init__
    super().__init__(**kwargs)
TypeError: Shape.__init__() missing 2 required positional arguments: 'x' and 'y'
So I'm stuck with a choice between writing out the parent interface multiple times, and not being warned by my IDE when I'm using a child class incorrectly.
What to do?
Research
This mypy issue is loosely related to this.
This reddit thread has a good rehearsal of the relevant arguments for/against each approach I outline.
This SO question is maybe a duplicate of this one. Does the fact I'm talking about __init__ make any difference though?
I've found a real duplicate, although the answer is a bit esoteric and doesn't seem like it would qualify as best, or normal, practice.
If the parent class has required (positional) arguments (as your Shape class does), then I'd argue that you must include those arguments in the __init__ of the child (Circle) for the sake of being able to pass around "shape-like" instances and be sure that a Circle will behave like any other shape. So this would be your Circle class:
class Shape:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

class Circle(Shape):
    def __init__(self, x: float, y: float, radius: float):
        super().__init__(x=x, y=y)
        self.radius = radius

# The expectation is that this should work with all instances of `Shape`
def move_shape(shape: Shape, x: float, y: float):
    shape.x = x
    shape.y = y
However, if the parent class is using optional kwargs, that's where stuff gets tricky. You shouldn't have to define colour: str on your Circle class just because colour is an optional argument for Shape. It's up to the developer using your Circle class to know the interface of all shapes and, if need be, interrogate the code and note that Circle can accept colour="green" since it passes **kwargs to its parent constructor:
class Shape:
    def __init__(self, x: float, y: float, colour: str = "black"):
        self.x = x
        self.y = y
        self.colour = colour

class Circle(Shape):
    def __init__(self, x: float, y: float, radius: float, **kwargs):
        super().__init__(x=x, y=y, **kwargs)
        self.radius = radius

def move_shape(shape: Shape, x: float, y: float):
    shape.x = x
    shape.y = y

def colour_shape(shape: Shape, colour: str):
    shape.colour = colour
Generally my attitude is that a docstring exists to explain why something is written the way it is, not what it's doing. That should be clear from the code. So, if your Circle requires an x and y parameter for use in the parent class, then it should say as much in the signature. If the parent class has optional requirements, then **kwargs is sufficient in the child class and it's incumbent upon the developer to interrogate Circle and Shape to see what the options are.
The solution I would consider most reasonable (though I realize what I'm saying might not be canonical) is to repeat the parent-class parameters that are required, but leave the optional ones to **kwargs; a sketch follows the list below.
Benefits:
clean code that is easy for a human reader to understand,
keeps the type checkers happy,
repeats only the essential stuff,
supports all the optional parameters without repeating those.
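A minimal sketch of this hybrid pattern, reusing the optional colour parameter from the previous answer (the usage lines are illustrative only):
class Shape:
    def __init__(self, x: float, y: float, colour: str = "black"):
        self.x = x
        self.y = y
        self.colour = colour

class Circle(Shape):
    # Required parent parameters are restated; optional ones ride in **kwargs.
    def __init__(self, x: float, y: float, radius: float, **kwargs):
        super().__init__(x, y, **kwargs)
        self.radius = radius

Circle(1.0, 2.0, radius=5.0, colour="green")  # optional kwarg still accepted
# Circle(radius=5.0)  # flagged statically: x and y are missing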
I think the best way to do this is to use the **kwargs approach, but to also define a __signature__ attribute on the class. This is an inspect.Signature object that describes the arguments the class expects.
from inspect import Parameter, Signature

class Shape:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

class Circle(Shape):
    def __init__(self, radius: float, **kwargs):
        super().__init__(**kwargs)
        self.radius = radius

    __signature__ = Signature(
        parameters=[
            Parameter('radius', Parameter.POSITIONAL_OR_KEYWORD, annotation=float),
            Parameter('x', Parameter.POSITIONAL_OR_KEYWORD, annotation=float),
            Parameter('y', Parameter.POSITIONAL_OR_KEYWORD, annotation=float),
        ]
    )
This lets signature-aware tools (help(), inspect.signature(), and IDEs that use them) see that radius, x, and y are the expected arguments. (Static checkers such as mypy generally ignore a runtime __signature__, though.)
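For instance, a quick introspection check, assuming the classes above:
import inspect

print(inspect.signature(Circle))  # (radius: float, x: float, y: float)
help(Circle)                      # shows the same merged signature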
TL;DR
In Python 3.10, dataclasses provide a clean solution
Dataclasses provide inheritance mechanisms with automatically generated cumulative constructors that give mypy or vscode everything needed for type checking, yet are completely DRY.
Starting with Python 3.10, dataclasses are suitable more often than one might think, so I contend they could be considered a canonical answer to the question here, although probably not for earlier versions of Python.
Details
My thought process
I've been doing some more thinking and research on this topic, partly inspired by other answers and comments here. I have also done some tests, and eventually convinced myself that the canonical answer should be to use @dataclass whenever possible (which is more often than one might think, at least with Python >= 3.10).
There will of course be cases that really cannot be cast to dataclasses, and then I'll say use my first answer for those.
Augmented example without dataclasses
Let me augment the example a bit to illustrate my idea.
class Shape:
    def __init__(self, x: float, y: float, name="default name", verbose=False):
        self.x = x
        self.y = y
        self.name = name
        if verbose:
            print("Initialized:", self)

    def __repr__(self):
        return f"{type(self).__name__}(x={self.x},y={self.y},name={self.name})"

class Circle(Shape):
    def __init__(self, x: float, y: float, r: float, **kwargs):
        self.r = r
        super().__init__(x, y, **kwargs)

    def __repr__(self):
        return f"{type(self).__name__}(r={self.r},x={self.x},y={self.y},name={self.name})"
Here I've added the optional parameter name with a default value that gets stored, and the optional parameter verbose that affects what __init__ does without getting stored. Those add parameters to __init__ beyond just required data fields.
And I've already applied the solution I suggested in my first answer, which was to repeat the required arguments, but only the required arguments.
Solution using dataclasses in Python 3.10
Now, let's rewrite this with dataclasses:
from dataclasses import dataclass, InitVar, field

@dataclass
class Shape:
    x: float
    y: float
    name: str = field(kw_only=True, default="default name")
    verbose: InitVar[bool] = field(kw_only=True, default=False)

    def __post_init__(self, verbose):
        if verbose:
            print("Initialized:", self)

@dataclass
class Circle(Shape):
    r: float
Notice how much shorter the code is. name is still optional with a default value. verbose is still accepted as an initialization parameter that is not stored. I get my __repr__ for free. And it makes the constructor of Circle explicitly require x and y, as well as r, so both mypy and pylint (and presumably vscode too) do complain if any of them is missing. In fact, being automatically generated, the Circle constructor repeats everything in the Shape constructor, but I didn't have to write it, so that's perfectly DRY.
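A quick check of the generated constructor, assuming the definitions above (output shown in comments):
c = Circle(1.0, 2.0, 3.0, verbose=True)
# Initialized: Circle(x=1.0, y=2.0, name='default name', r=3.0)
# Circle(1.0, 2.0) would be flagged by mypy/pylint: missing argument 'r'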
An inherited init-only parameter
Let's add another init-only parameter, scale, which has the effect of scaling everything in both classes, i.e., x, y and r are multiplied by scale. This case is a bit twisted, but it lets me require a __post_init__ in the subclass too, which I would like to illustrate.
from dataclasses import dataclass, InitVar, field

@dataclass
class DCShape:
    x: float
    y: float
    scale: InitVar[float] = field(kw_only=True, default=1)
    name: str = field(kw_only=True, default="default name")
    verbose: InitVar[bool] = field(kw_only=True, default=False)

    def __post_init__(self, scale, verbose):
        self.x = scale * self.x
        self.y = scale * self.y
        if verbose:
            print("Initialized:", self)

@dataclass
class DCCircle(DCShape):
    r: float

    def __post_init__(self, scale, verbose):
        self.r = scale * self.r
        super().__post_init__(scale, verbose)
This is a pretty decent solution too, in my opinion. I did have to repeat scale and verbose in both classes' __post_init__ methods, and the subclass's __post_init__ has to call the superclass's explicitly, but this is still something I'd be happy to use in real production code.
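Usage looks like this, assuming the definitions above (output in the comment):
c = DCCircle(1.0, 2.0, 3.0, scale=2.0, verbose=True)
# Initialized: DCCircle(x=2.0, y=4.0, name='default name', r=6.0)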
Why not with Python <= 3.9?
To make this clean, I had to use keyword-only fields, which were only introduced to dataclasses with Python 3.10.
With earlier versions, I would have had to give Circle.r a default value too, and then presumably add custom code to make sure that default value wasn't used, which would mean mypy would not notice if r was missing; I feel that kills that solution. Although for cases where the base class has only required fields, dataclasses work well before 3.10 too.
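For illustration, a minimal sketch of that pre-3.10 failure mode, assuming a base class with one defaulted field:
from dataclasses import dataclass

@dataclass
class Shape:
    x: float
    y: float
    name: str = "default name"

# On Python <= 3.9 (no kw_only), this raises at class-definition time:
#   TypeError: non-default argument 'r' follows default argument
@dataclass
class Circle(Shape):
    r: float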
References
The dataclasses module
__post_init__: other initialization code
InitVar: init only variables
I come from C++, where this is OOP 101. But since I transitioned to Python, I have used this approach throughout my career no matter how annoying it was to duplicate parent constructor arguments. I also found out the hard way that **kwargs is very hard to debug in a large enough code base. So, based on my experience, I will upvote @joanis's answer, highlighting the benefits:
Benefits:
clean code that is easy for a human reader to understand,
keeps the type checkers happy,
repeats only the essential stuff,
supports all the optional parameters without repeating those.
Although **kwargs is an option if one needs it, as far as I know type checking will still not work with it; i.e., a debugging nightmare. Python is, after all, an interpreted language.
And if you analyse the problem a bit deeper, it makes sense. Python object creation only looks at a single class name, e.g. r = Circle(x, y); there is no way to identify what it inherits from without looking at the class. In C++ we could write something like the code below, which is compiled and then resolved at runtime. Python IDEs report most errors prior to execution, but I can see why it has taken time to solve this particular problem, since **kwargs essentially carries no information that could be validated before a run.
// C++ code
#include <iostream>
#include <string>
using namespace std;

class Shape {
public:
    string x = "";
};

class Circle : public Shape {
public:
    string x = "50";
};

int main(void) {
    Shape r = Circle();  // note: the Circle is sliced down to a Shape here
    cout << r.x;
}
Do I look forward to this in Python? Absolutely, along with some proper polymorphism like C++'s. But sadly, as advertised, these languages have very different mechanisms. To quote from here: "Python does not have the concept of virtual function or interface like C++. However, Python provides an infrastructure that allows us to build something resembling interfaces’ functionality." (Not a very well reputed site, but I found this quote noteworthy.)
Along the same line, a comment on this part of the problem statement:
Also, if the parent interface changes, I have to remember to change all of my child class's interfaces to match, type hints and all. Not very DRY.
I actually believe this is necessary! We should be careful not to change Parent classes after a bunch of code is written.
However, I recommend reading "Learning Path: Advanced Python Programming" by Dr. Gabriele Lanaro (or any other book regarding Python design patterns) to learn a bit about how you can avoid many pitfalls like this via design patterns.
Lastly, while this may not be the complete and satisfactory answer you are looking for, my suggestions are:
If the project is large enough, stick to duplicating class constructors without using **kwargs (the solution with __signature__ was nice! I can't see how it could harm at scale either).
If the parent class is a library file or third-party class, consider using an Adapter pattern.
Have a look at the Builder pattern and something called the Fluent Builder pattern.
Hope this helps!
I searched a little bit and this is what I found:
1.- There are lots of libraries that can do the job, like makefun, merge_args, or Python Forge; they all work the same way:
Using the inspect and/or functools modules, they merge the signatures: they get all parameters via e.g. tuple(signature(your_function).parameters.values()) or getfullargspec(your_function.__init__) (remember to slice it), and then replace the signature.
Since they have put a lot of effort into that task, I'd recommend using them if you want a solid solution.
2.- I had the same problem long ago, and I ended up restating only the most important parameters and leaving the rest in **kwargs. But there's a better workaround if you want something complete without any library (though not so DRY on my part, haha): just use print(signature(Shape.__init__)) (remember to from inspect import signature) and copy whatever is useful to you :).
3.- I saw @cactusboat's answer and I came up with this too; hope it helps:
from inspect import signature, Signature
from itertools import chain
from collections import OrderedDict

def make_signature(subclass, superclass):
    """Returns a signature object for the subclass constructor's __signature__ attribute."""
    base_params = signature(superclass.__init__).parameters
    sub_params = signature(subclass.__init__).parameters
    # Superclass parameters first, then the subclass's; duplicate names
    # (self, **kwargs) are collapsed by the OrderedDict.
    mixed_params = OrderedDict(chain(base_params.items(), sub_params.items()))
    return Signature(parameters=list(mixed_params.values()))

class Shape:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

class Circle(Shape):
    def __init__(self, radius: float, **kwargs):
        super().__init__(**kwargs)
        self.radius = radius

Circle.__signature__ = make_signature(Circle, Shape)

if __name__ == "__main__":
    help(Circle)  # Circle(self, x: float, y: float, radius: float, **kwargs) ...
4.- As you can see, there are many ways, but there isn't a canonical one. The closest might be PEP 362, which has an example of how to visualize callable objects' signatures. But it's hard, since you would be falling into the adapter pattern.
Hope it helps! I'm kinda new to programming, so I did my best to find the best that I could. Greetings!
I believe you have to answer this question (which calls for opinions) from the point of view that you intend your code to be used.
For you as a developer who knows the classes you are coding, **kwargs can save some lengthy copy/pasting and refactoring if a superclass is modified. Meanwhile, you can use the method you like most, and even mix usage, because you have full control over your own code. In the end, this is a matter of convenience.
For dev-users that will use your set of classes as a library, they will expect a fully documented API, and your own potential refactoring work does not matter to them. So in that case it will be more intuitive for them to have the full parameter list.
For lambda-users that just use your code without knowing what is inside it, it does not matter at all, but it should work no matter what. A general rule of thumb is to catch as many errors as possible in the IDE, while you can still fix them, not at runtime when it is too late. In that regard, **kwargs is more sensitive and more prone to lead to a bad user experience.
I would suggest using Pydantic. This introduces a dependency which might be a deal breaker, but I think it might be what you are looking for.
Example:
from pydantic import BaseModel

class ShapePydantic(BaseModel):
    x: float
    y: float

class CirclePydantic(ShapePydantic):
    radius: float
Type hint: the IDE now shows the full constructor signature, including the inherited fields.
Worth noting that Pydantic allows extra fields (or arguments) by default, but this can be turned off by using extra=Extra.forbid:
from pydantic import Extra

class CirclePydantic(ShapePydantic, extra=Extra.forbid):
    radius: float
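A quick demonstration, assuming the pydantic v1 API shown above:
from pydantic import BaseModel, Extra, ValidationError

class ShapePydantic(BaseModel):
    x: float
    y: float

class CirclePydantic(ShapePydantic, extra=Extra.forbid):
    radius: float

CirclePydantic(x=1, y=2, radius=3)  # OK
try:
    CirclePydantic(x=1, y=2, radius=3, colour="red")
except ValidationError as e:
    print(e)  # reports that extra fields are not permitted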
It's a bit tricky and depends on how you intend your class to be used. For example, if you allow positional arguments, then you get the situation you describe and basically break the expected order of arguments.
My opinion is that if you use **kwargs, then it's better to prohibit positional arguments entirely (notice the asterisk before the arguments):
class Shape:
    def __init__(self, *, x: float, y: float):
        self.x = x
        self.y = y

class Circle(Shape):
    def __init__(self, *, radius: float, **kwargs):
        super().__init__(**kwargs)
        self.radius = radius
This solves the issue of unexpected order but does not help the end user.
For that I would suggest using stubs. This can still be considered code duplication, although it can be generated for you (granted your code is not too complicated) and tweaked manually if needed. Besides that, it allows the developer to provide better type annotations in complicated situations.
That way, as a developer you can actually use any variant you like, as long as the stubs match your implementation; you can even support overloaded initializers, and the IDE will show the user which signature is applicable to their arguments.
Still, I would suggest not mixing named positional arguments and **kwargs unless there is a good, clear reason for it (like generic decorators or some kind of proxy). Things become especially complicated with the *args, **kwargs combo, since the client can now unknowingly pass the same argument twice (if there are no type annotations/stubs). This forces you to handle such cases and write complicated "parsing" of arguments. Such an approach can be justified in a large and complicated interface, and in a way even considered better since it provides more flexibility, but for a small interface it would be overkill and a pain.
If using *args, **kwargs, then a stub file is a must in my opinion.
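For illustration, a hypothetical circle.pyi stub that spells out the merged keyword-only interface while the implementation keeps using **kwargs:
# circle.pyi (hypothetical file name)
class Shape:
    x: float
    y: float
    def __init__(self, *, x: float, y: float) -> None: ...

class Circle(Shape):
    radius: float
    def __init__(self, *, x: float, y: float, radius: float) -> None: ...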
This is probably not the best practice, but if you want to avoid docstrings and want the IDE (type checker) to find these errors, you can use composition instead of inheritance:
class Shape:
    def __init__(self, x: float, y: float):
        self.x = x
        self.y = y

class Circle:
    def __init__(self, radius: float, shape: Shape):
        self.shape = shape
        self.radius = radius

Is this a valid use of metaclasses?

I've been watching some videos on decorators and metaclasses and I think I understand them better now. One maxim I took away was "don't use metaclasses if you can do it more simply without using them". Some time ago I wrote a metaclass without really understanding what I was doing and I went back and reviewed it. I'm pretty certain that I've done something sensible here but I thought I'd check ....
PS I'm mildly concerned that the Colour class is used in the Metaclass definition, I feel it ought to be used at the Class level but that would complicate the code.
import webcolors

# This is a holding class for demo purposes;
# actual code allows much more, e.g. 0.3*c0 + 0.7*c1
class Colour(list):
    def __init__(self, *args):
        super().__init__(args)

# define a metaclass to implement Class.name
class MetaColour(type):
    def __getattr__(cls, name):
        try:
            _ = webcolors.name_to_rgb(name)
            return Colour(_.blue, _.green, _.red)
        except ValueError as e:
            raise ValueError(f"{name} is not a valid name for a colour ({e})")
        return f"name({name})"  # note: unreachable, both branches above exit

# a class based on the metaclass MetaColour
class Colours(metaclass=MetaColour):
    pass

print("blue = ", Colours.blue)
print("green = ", Colours.green)
print("lime = ", Colours.lime)
print("orange = ", Colours.orange)
print()
print("lilac = ", Colours.lilac)
Edit: I realise I could have written the Colour class so that Colour("red") was equivalent to Colours.red but felt at the time that using Colours.red was more elegant and added the implication that the Colour 'red' was a constant, not something that has to be looked up and can vary.
If you really need Colours to be a class, then this metaclass just does its job - and seems fine. There is no problem at all in making use of Colour inside it - there is no such thing as "metaclass code cannot make use of any 'ordinary' class" - it is Python code as usual.
The remark I'd do there is that maybe you don't need to use Colours as a class, and instead just create the Colours class, with all the functionality you need, and create a single instance of it. The remainder of the code will use this instance instead of the Colours class.
Yes, a single instance is the "singleton pattern" - but unlike some complicated code you can find around on how to make your class "be a singleton" (including some widely spread bad-practice about needing a metaclass to have a singleton in Python), you can just create the instance, assign it to a name, and be done with it. Just like in your example you have the "webcolors" object you are using.
For an extra-singleton bonus, you can make your single instance of Colours be named Colours, and shadow the class, preventing any accidental use of the class instead of the instance.
(And, although it might be obvious, for the sake of completeness: in the "use Colours as an instance" case there is no need for this metaclass at all - the same __getattr__ method goes into the class body.)
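A minimal sketch of that instance-based variant, reusing the Colour class from the question:
import webcolors

class Colour(list):
    def __init__(self, *args):
        super().__init__(args)

class _Colours:
    def __getattr__(self, name):
        try:
            rgb = webcolors.name_to_rgb(name)
            return Colour(rgb.blue, rgb.green, rgb.red)
        except ValueError as e:
            raise ValueError(f"{name} is not a valid name for a colour ({e})")

# Shadow the helper class with its single instance, as suggested above.
Colours = _Colours()

print("blue = ", Colours.blue)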
Of course, again, if you have uses for Colours as a class, there is no problem with this design.

Reducing syntactic cruft when a method accesses instance variables?

The following code is a simplified example of a task I'm working on, using Python, that seems to be a natural fit for an OOP style:
class Foo:
    def __init__(self):
        self.x = 1
        self.y = 1
        self.z = 1

    def method(self):
        return bar(self.x, self.y, self.z)

def bar(x, y, z):
    return x + y + z

f = Foo()
print(f.method())
In the example code above, I have three instance variables in my object, but in my actual application it would be more like 10 or 15 variables, and if I implement what I have in mind in this style, then I'm going to end up with a lot of code that looks like this:
return bar(self.a, self.b, self.c, self.d, self.e, self.f, self.g, self.h, self.i)
Wow, it sure would be nice to be able to write this in a style more like this:
return bar(a,b,c,d,e,f,g,h,i)
That would be a lot more concise and readable. One way to do this might be to rewrite bar so that it takes a Foo object as an input rather than a bunch of scalar variables, but I would prefer not to do that. Actually, that would just push the syntactic cruft down into the bar function, where I guess I would have code that looked like this:
def bar(f):
    return f.a + f.b + f.c
Is there a nicer way to handle this? My understanding is that without the "self.", I would be referencing class variables rather than instance variables. I thought about using a dictionary, but that seems even cruftier, with all the ["a"] stuff. Might there be some automated way to take a dictionary with keys like "a","b","c",... and kind of unload the values into local variables named a, b, c, and so on?
I think you're going about this the wrong way. You're correct that your examples are hard to read, but I don't think the root cause is Python's syntax. An argument list that contains 10-15 variables is going to be difficult to read in any programming language. I think the problem is your program's structure. Instead of trying to find ways around Python's syntax and conventions, consider trying to refactor your program so your classes don't need so many attributes, or refactor your methods so they don't need to pass around so many values.
Unfortunately I can't help you do that without seeing the full version of your code, but Code Review Stack Exchange would be a good place to get some help with that. Reducing the number of values returned and not coming up with unconventional ways to list and manipulate your attributes will make your code easier to read and maintain, both for others and yourself in the future.
Well, you could do it like so if you really wanted to, but I would advise against it. What if you add a field to your class, and so on? Also, it just makes things more complicated.
class Foo:
    def __init__(self):
        self.x = 1
        self.y = 1
        self.z = 1

    def method(self):
        return bar(**vars(self))  # expand all attributes as keyword arguments

def bar(x, y, z):
    return x + y + z

f = Foo()
print(f.method())
You can use __dict__ to create attributes from data of varying length, and then use a classmethod to sum the attributes passed:
import string

class Foo:
    def __init__(self, data):
        self.__dict__ = dict(zip(string.ascii_lowercase, data))

    @classmethod
    def bar(cls, instance, vals=None):
        return sum(instance.__dict__.values()) if not vals else sum(getattr(instance, i) for i in vals)

f = Foo(range(20))
print(Foo.bar(f))
print(Foo.bar(f, ['a', 'c', 'e', 'k', 'm']))
Output:
190
28

Python class design: explicit keyword arguments vs. **kwargs vs. @property

Is there a generally accepted best practice for creating a class whose instances will have many (non-defaultable) variables?
For example, by explicit arguments:
class Circle(object):
    def __init__(self, x, y, radius):
        self.x = x
        self.y = y
        self.radius = radius
using **kwargs:
class Circle(object):
    def __init__(self, **kwargs):
        if 'x' in kwargs:
            self.x = kwargs['x']
        if 'y' in kwargs:
            self.y = kwargs['y']
        if 'radius' in kwargs:
            self.radius = kwargs['radius']
or using properties:
class Circle(object):
    def __init__(self):
        pass

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        self._x = value

    @property
    def y(self):
        return self._y

    @y.setter
    def y(self, value):
        self._y = value

    @property
    def radius(self):
        return self._radius

    @radius.setter
    def radius(self, value):
        self._radius = value
For classes which implement a small number of instance variables (like the example above), it seems like the natural solution is to use explicit arguments, but this approach quickly becomes unruly as the number of variables grows. Is there a preferred approach when the number of instance variables grows lengthy?
I'm sure there are many different schools of thought on this, but here's how I've usually thought about it:
Explicit Keyword Arguments
Pros
Simple, less code
Very explicit, clear what attributes you can pass to the class
Cons
Can get very unwieldy, as you mention, when you have LOTS of things to pass in
Prognosis
This should usually be your method of first attack. If you find, however, that the list of things you are passing in is getting too long, it is likely pointing to a structural problem with the code. Do some of these things you are passing in share any common ground? Could you encapsulate that in a separate object? Sometimes I've used config objects for this, and then you go from passing in a gazillion args to passing in 1 or 2.
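For example, a hypothetical config-object sketch (all names invented for illustration):
from dataclasses import dataclass

# Related constructor arguments are bundled into one value, so the
# consumer passes a single object instead of many scalars.
@dataclass
class TireSpec:
    size: float = 16.0
    tread: str = "all-season"

class Car:
    def __init__(self, tire: TireSpec):
        self.tire = tire

car = Car(TireSpec(size=17.0))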
Using **kwargs
Pros
Seamlessly modify or transform arguments before passing it to a wrapped system
Great when you want to make a variable number of arguments look like part of the api, e.g. if you have a list or dictionary
Avoid endlessly long and hard to maintain passthrough definitions to a lower level system,
e.g.
def do_it(a, b, thing=None, zip=2, zap=100, zimmer='okay', zammer=True):
    # do some stuff with a and b
    # ...
    get_er_done(abcombo, thing=thing, zip=zip, zap=zap, zimmer=zimmer, zammer=zammer)
Instead becomes:
def do_it(a, b, **kwargs):
    # do some stuff with a and b
    # ...
    get_er_done(abcombo, **kwargs)
Much cleaner in cases like this, and the reader can consult get_er_done for the full signature, although good docstrings can also just list all the arguments as if they were real arguments accepted by do_it
Cons
Makes it less readable and explicit what the arguments are in cases where it is not a more or less simple passthrough
Can really easily hide bugs and obfuscate things for maintainers if you are not careful
Prognosis
The *args and **kwargs syntax is super useful, but also can be super dangerous and hard to maintain as you lose the explicit nature of what arguments you can pass in. I usually like to use these in situations when I have a method that basically is just a wrapper around another method or system and you want to just pass things through without defining everything again, or in interesting cases where the arguments need to be pre-filtered or made more dynamic, etc. If you are just using it to hide the fact that you have tons and tons of arguments and keyword arguments, **kwargs will probably just exacerbate the problem by making your code even more unwieldy and arcane.
Using Properties
Pros
Very explicit
Provides a great way of creating objects when they are somehow still "valid" even though not all parameters are known yet, passing half-formed objects through a pipeline to slowly populate them. Also, for attributes that don't need to be set but could be, it sometimes provides a clean way of paring down your __init__'s
Are great when you want to present a simple interface of attributes, e.g. for an api, but under the hood are doing more complicated cooler things like maintaining caches, or other fun stuff
Cons
A lot more verbose, more code to maintain
Counterpoint to the above: can introduce danger by allowing invalid objects, with some properties not yet fully initialized, to be created when they should never be allowed to exist
Prognosis
I actually really like taking advantage of getter and setter properties, especially when I am doing tricky stuff with private versions of those attributes that I don't want to expose. They can also be good for config objects and other things, and are nice and explicit, which I like. However, if I am initializing an object where I don't want half-formed ones walking around serving no purpose, it's still better to just go with explicit arguments and keyword arguments.
TL;DR
**kwargs and properties have nice specific use cases, but just stick to explicit keyword arguments whenever practical/possible. If there are too many instance variables, consider breaking up your class into hierarchical container objects.
Without really knowing the particulars of your situation, the classic answer is this: if your class initializer requires a whole bunch of arguments, then it is probably doing too much, and it should be factored into several classes.
Take a Car class defined as such:
class Car:
    def __init__(self, tire_size, tire_tread, tire_age, paint_color,
                 paint_condition, engine_size, engine_horsepower):
        self.tire_size = tire_size
        self.tire_tread = tire_tread
        # ...
        self.engine_horsepower = engine_horsepower
Clearly a better approach would be to define Engine, Tire, and Paint classes (or namedtuples) and pass instances of these into Car():
class Car:
    def __init__(self, tire, paint, engine):
        self.tire = tire
        self.paint = paint
        self.engine = engine
If something is required to make an instance of a class, for example, radius in your Circle class, it should be a required argument to __init__ (or factored into a smaller class which is passed into __init__, or set by an alternative constructor). The reason is this: IDEs, automatic documentation generators, code autocompleters, linters, and the like can read a method's argument list. If it's just **kwargs, there's no information there. But if it has the names of the arguments you expect, then these tools can do their work.
Now, properties are pretty cool, but I'd hesitate to use them until necessary (and you'll know when they are necessary). Leave your attributes as they are and allow people to access them directly. If they shouldn't be set or changed, document it.
Lastly, if you really must have a whole bunch of arguments, but don't want to write a bunch of assignments in your __init__, you might be interested in Alex Martelli's answer to a related question.
Passing arguments to __init__ is usually the best practice, as in any object-oriented programming language. In your example, setters/getters would allow the object to be in a weird state where it doesn't have any attributes yet.
Specifying the arguments, or using **kwargs depends on the situation. Here's a good rule of thumb:
If you have many arguments, **kwargs is a good solution, since it avoids code like this:
def __init__(first, second, third, fourth, fifth, sixth, seventh,
             ninth, tenth, eleventh, twelfth, thirteenth, fourteenth,
             ...
             )
If you're heavily using inheritance, **kwargs is the best solution:
class Parent:
    def __init__(self, many, arguments, here):
        self.many = many
        self.arguments = arguments
        self.here = here

class Child(Parent):
    def __init__(self, **kwargs):
        self.extra = kwargs.pop('extra')
        super().__init__(**kwargs)
avoids writing:
class Child(Parent):
    def __init__(self, many, arguments, here, extra):
        self.extra = extra
        super().__init__(many, arguments, here)
For all other cases, specifying the arguments is better since it allows developers to use both positional and named arguments, like this:
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
Can be instantiated by Point(1, 2) or Point(x=1, y=2).
For general knowledge, you can see how namedtuple does it and use it.
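For reference, a minimal namedtuple example showing both calling styles:
from collections import namedtuple

Point = namedtuple('Point', ['x', 'y'])

p = Point(1, 2)       # positional
q = Point(x=1, y=2)   # keyword
print(p.x + q.y)      # 3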
Your second approach can be written in a more elegant way:
class A:
    def __init__(self, **kwargs):
        self.__dict__ = {**self.__dict__, **kwargs}

a = A(x=1, y=2, verbose=False)
b = A(x=5, y=6, z=7, comment='bar')
print(a.x + b.x)
But all the disadvantages already mentioned persist...

Python: must __init__(self, foo) always be followed by self.foo = foo?

I've been striving mightily for three days to wrap my head around __init__ and "self", starting at Learn Python the Hard Way exercise 42, and moving on to read parts of the Python documentation, Alan Gauld's chapter on Object-Oriented Programming, Stack threads like this one on "self", and this one, and frankly, I'm getting ready to hit myself in the face with a brick until I pass out.
That being said, I've noticed a really common convention in initial __init__ definitions, which is to follow up with (self, foo) and then immediately declare, within that definition, that self.foo = foo.
From LPTHW, ex42:
class Game(object):
    def __init__(self, start):
        self.quips = ["a list", "of phrases", "here"]
        self.start = start
From Alan Gauld:
def __init__(self, val): self.val = val
I'm in that horrible space where I can see that there's just One Big Thing I'm not getting, and it's remaining opaque no matter how much I read about it and try to figure it out. Maybe if somebody can explain this little bit of consistency to me, the light will turn on. Is this because we need to say that "foo," the variable, will always be equal to the (foo) parameter, which is itself contained in the "self" parameter that's automatically assigned to the def it's attached to?
You might want to study up on object-oriented programming.
Loosely speaking, when you say
class Game(object):
    def __init__(self, start):
        self.start = start
you're saying:
I have a type of "thing" named Game
Whenever a new Game is created, it will demand some extra piece of information from me, start. (This is because the Game's initializer, named __init__, asks for this information.)
The initializer (also referred to as the "constructor", although that's a slight misnomer) needs to know which object (which was created just a moment ago) it's initializing. That's the first parameter -- which is usually called self by convention (but which you could call anything else...).
The game probably needs to remember what the start I gave it was. So it stores this information "inside" itself, by creating an instance variable also named start (nothing special, it's just whatever name you want), and assigning the value of the start parameter to the start variable.
If it doesn't store the value of the parameter, it won't have that information available for later use.
Hope this explains what's happening.
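Concretely, a toy usage of the snippet above (the start value is arbitrary):
class Game(object):
    def __init__(self, start):
        self.start = start

game = Game("room one")
print(game.start)  # room one -- stored on the instance for later use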
I'm not quite sure what you're missing, so let me hit some basic items.
There are two "special" intialization names in a Python class object, one that is relatively rare for users to worry about, called __new__, and one that is much more usual, called __init__.
When you invoke a class-object constructor, e.g. (based on your example) x = Game(args), this first calls Game.__new__ to obtain memory in which to hold the object, and then Game.__init__ to fill in that memory. Most of the time, you can allow the underlying object.__new__ to allocate the memory, and you just need to fill it in. (You can use your own allocator for special weird rare cases like objects that never change and may share identities, the way ordinary integers do for instance. It's also for "metaclasses" that do weird stuff. But that's all a topic for much later.)
Your Game.__init__ function is called with "all the arguments to the constructor" plus one stashed in the front, which is the memory allocated for that object itself. (For "ordinary" objects that's mostly a dictionary of "attributes", plus the magic glue for classes, but for objects with __slots__ the attributes dictionary is omitted.) Naming that first argument self is just a convention—but don't violate it, people will hate you if you do. :-)
There's nothing that requires you to save all the arguments to the constructor. You can set any or all instance attributes you like:
class Weird(object):
    def __init__(self, required_arg1, required_arg2, optional_arg3='spam'):
        self.irrelevant = False

    def __str__(self):
        ...
The thing is that a Weird() instance is pretty useless after initialization, because you're required to pass two arguments that are simply thrown away, and given a third optional argument that is also thrown away:
x = Weird(42, 0.0, 'maybe')
The only point in requiring those thrown-away arguments is for future expansion, as it were (you might have these unused fields during early development). So if you're not immediately using and/or saving arguments to __init__, something is definitely weird in Weird.
Incidentally, the only reason for using (object) in the class definition is to indicate to Python 2.x that this is a "new-style" class (as distinguished from very-old-Python "instance only" classes). But it's generally best to use it—it makes what I said above about object.__new__ true, for instance :-) —until Python 3, where the old-style stuff is gone entirely.
Parameter names should be meaningful, to convey the role they play in the function/method or some information about their content.
You can see parameters of constructors to be even more important because they are often required for the working of the new instance and contain information which is needed in other methods of the class as well.
Imagine you have a Game class which accepts a playerList.
class Game:
    def __init__(self, playerList):
        self.playerList = playerList  # or self.players = playerList

    def printPlayerList(self):
        print(self.playerList)  # or print(self.players)
This list is needed in various methods of the class. Hence it makes sense to assign it to self.playerList. You could also assign it to self.players, whatever you feel more comfortable with and you think is understandable. But if you don't assign it to self.<somename> it won't be accessible in other methods.
So there is nothing special about how to name parameters/attributes/etc (there are some special class methods though), but using meaningful names makes the code easier to understand. Or would you understand the meaning of the above class if you had:
class G:
    def __init__(self, x):
        self.y = x

    def ppl(self):
        print(self.y)
? :) It does exactly the same but is harder to understand...
