When (and why) was the Python __new__() function introduced?
There are three steps in creating an instance of a class, e.g. MyClass():
MyClass.__call__() is called. This method must be defined in the metaclass of MyClass.
MyClass.__new__() is called (by __call__). Defined on MyClass itself. This creates the instance.
MyClass.__init__() is called (also by __call__). This initializes the instance.
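To make the sequence concrete, here is a minimal sketch (the class names are made up) that prints each step in order:

class Meta(type):
    def __call__(cls, *args, **kwargs):
        print("1. Meta.__call__")
        return super().__call__(*args, **kwargs)

class MyClass(metaclass=Meta):
    def __new__(cls, *args, **kwargs):
        print("2. MyClass.__new__")
        return super().__new__(cls)

    def __init__(self, *args, **kwargs):
        print("3. MyClass.__init__")

MyClass()   # prints steps 1, 2, 3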
Creation of the instance can be influenced either by overloading __call__ or __new__. There usually is little reason to overload __call__ instead of __new__ (e.g. Using the __call__ method of a metaclass instead of __new__?).
We have some old code (still running strong!) where __call__ is overloaded. The reason given was that __new__ was not available at the time. So I tried to learn more about the history of both Python and our code, but I could not figure out when __new__ was introduced.
__new__ appears in the documentation for Python 2.4 but not in that for Python 2.3, yet it does not appear in the whatsnew documents of any of the Python 2 versions. The first commit that introduced __new__ (Merge of descr-branch back into trunk.) that I could find is from 2001, but the 'back into trunk' message indicates that there was something before. PEP 252 (Making Types Look More Like Classes) and PEP 253 (Subtyping Built-in Types) from a few months earlier seem to be relevant.
Learning more about the introduction of __new__ would teach us more about why Python is the way it is.
Edit for clarification:
It seems that class.__new__ duplicates functionality that is already provided by metaclass.__call__. It seems un-Pythonic to add a method only to replicate existing functionality in a better way.
__new__ is one of the few class methods that you get out of the box (i.e. with cls as first argument), thereby introducing complexity that wasn't there before. If the class is the first argument of a function, then it can be argued that the function should be a normal method of the metaclass. But that method did already exist: __call__(). I feel like I'm missing something.
There should be one-- and preferably only one --obvious way to do it.
The blog post The Inside Story on New-Style Classes
(from the aptly named http://python-history.blogspot.com) written by Guido van Rossum (Python's BDFL) provides some good information regarding this subject.
Some relevant quotes:
New-style classes introduced a new class method __new__() that lets the class author customize how new class instances are created. By overriding __new__() a class author can implement patterns like the Singleton Pattern, return a previously created instance (e.g., from a free list), or return an instance of a different class (e.g., a subclass). However, the use of __new__ has other important applications. For example, in the pickle module, __new__ is used to create instances when unserializing objects. In this case, instances are created, but the __init__ method is not invoked.
Another use of __new__ is to help with the subclassing of immutable types. By the nature of their immutability, these kinds of objects cannot be initialized through a standard __init__() method. Instead, any kind of special initialization must be performed as the object is created; for instance, if the class wanted to modify the value being stored in the immutable object, the __new__ method can do this by passing the modified value to the base class __new__ method.
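The create-without-initialize behavior mentioned for pickle is easy to demonstrate with a small sketch (the class is made up):

class Point:
    def __init__(self, x, y):
        print("__init__ called")
        self.x, self.y = x, y

p = Point.__new__(Point)    # the instance exists...
print(hasattr(p, "x"))      # ...but __init__ never ran, so no attributes: False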
You can read the entire post for more information on this subject.
Another post about New-style Classes which was written along with the above quoted post has some additional information.
Edit:
In response to OP's edit and the quote from the Zen of Python, I would say this.
The Zen of Python was not written by the creator of the language but by Tim Peters, and it was only published on August 19, 2004. We also have to take into account that __new__ appears only in the documentation of Python 2.4 (which was released on November 30, 2004), so this particular guideline (or aphorism) did not even exist publicly when __new__ was introduced into the language.
Even if such a document of guidelines existed informally before, I do not think that the author(s) intended them to be misinterpreted as a design document for an entire language and ecosystem.
I will not explain the history of __new__ here because I have only used Python since 2005, so after it was introduced into the language. But here is the rationale behind it.
The normal configuration method for a new object is the __init__ method of its class. The object has already been created (usually via an indirect call to object.__new__) and the method merely initializes it. Put simply: if you have a truly immutable object, __init__ is too late to set its value.
In that use case the Pythonic way is the __new__ method, which builds and returns the new object. The nice thing about it is that it is still included in the class definition and does not require a specific metaclass. The standard documentation states:
__new__() is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation. It is also commonly overridden in custom metaclasses in order to customize class creation.
Defining a __call__ method on the metaclass is indeed allowed but is IMHO non-Pythonic, because __new__ should be enough. In addition, __init__, __new__ and metaclasses each dig deeper into the internal Python machinery. So the rule should be: do not use __new__ if __init__ is enough, and do not use metaclasses if __new__ is enough.
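As a concrete illustration of that rule, here is a small sketch (the class is made up) that passes a modified value to the base class __new__, exactly as the quoted passage describes:

class SortedTuple(tuple):
    def __new__(cls, iterable):
        # the contents must be fixed before creation; __init__ would be too late
        return super().__new__(cls, sorted(iterable))

print(SortedTuple([3, 1, 2]))   # (1, 2, 3)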
Related
I've been working on multiprocessing and C++ extensions and I don't quite get the __getstate_manages_dict__ attribute (I know how to use it, but I'm not really sure how it works). The Boost.Python documentation for pickle support says this:
The author of a Boost.Python extension class might provide a __getstate__ method without considering the possibilities that:
* his class is used in Python as a base class. Most likely the __dict__ of instances of the derived class needs to be pickled in order to restore the instances correctly.
* the user adds items to the instance's __dict__ directly. Again, the __dict__ of the instance then needs to be pickled.
To alert the user to this highly unobvious problem, a safety guard is provided. If __getstate__ is defined and the instance's __dict__ is not empty, Boost.Python tests if the class has an attribute __getstate_manages_dict__. An exception is raised if this attribute is not defined.
I've seen some examples where the object's __dict__ is returned in __getstate__ and then updated in __setstate__. What is this __dict__ referring to? Is it the __dict__ attribute of the derived class object? Also, why does this dict need to be handled explicitly if pickle calls __init__ to create a new object and then sets the attributes?
Thanks
I know how to use it, but I'm not really sure how it works
It's a boolean attribute that is False by default. The point is to signal to Boost.Python that your __getstate__/__setstate__ implementation handles the instance's __dict__ attribute, so that that information won't be lost in the pickling process.
The idea is that Boost::Python can't actually determine whether the code is written properly, so instead you are made to jump through this extra hurdle so that, if you are unaware of the problem, you see an error message - as they say, to alert the user to this highly unobvious problem.
It's not doing anything magical. It's just there to confirm that you "read the warranty", so to speak.
The Boost.Python documentation for pickle support says this:
This is just explaining the reasons why it's important to consider the __dict__ contents - even if you don't want to pickle all the attributes that you set explicitly in the __init__ (for example, because the class holds a reference to a large resource that you intend to load in some other way, or a results cache, or...). In short, instances of your class might contain information that you didn't expect them to contain, and that your __getstate__ implementation won't know how to handle, unless it takes the instance's __dict__ into account.
Hence the "practical advice" offered: "If __getstate__ is required, include the instance's __dict__ in the Python object that is returned."
What is this __dict__ referring to? Is it the __dict__ attribute of the derived class object?
It's the __dict__ attribute of the instance that __getstate__ was called upon. That could be an instance of a derived class, if that class doesn't override the methods. Or it could be an instance of the base class, which may or may not have had extra attributes added outside the class implementation.
Also, why does this dict need to be handled explicitly if pickle calls __init__ to create a new object and then sets the attributes?
See above. When you get the attributes (so that the pickle file can be written), you need to make sure that you actually get all the necessary attributes, or else they'll be missing upon restoration. Hard-coded logic can miss some.
According to this post:
Python memoising/deferred lookup property decorator
A memoizing decorator can be used to declare a lazy property in a class. There is even an 'official' package that can be used out of the box:
https://pypi.python.org/pypi/lazy
However, both of these implementations have a severe problem: Python will attempt to pickle any memoized values. If those values are unpicklable, the program will break.
My question is: is there an easy way to implement the equivalent of Scala's @transient lazy val declaration without too much tinkering? Such a declaration should remember the property across multiple invocations, but drop it once the object is serialized.
I'm not aware of Scala's implementation details, but the easiest solution that comes to mind, if you're satisfied with the other aspects of the 'lazy property' library you've found, would be to implement the __getstate__ and __setstate__ object methods, as described in Pickling and unpickling normal class instances.
These methods are called by the pickle machinery during (de)serialization of object instances.
This way you have fine-grained control over how, and which, attributes of your object are serialized.
You should read the corresponding documentation on the other two pickle-related methods as well (take special care with __getinitargs__): the initialization of deserialized objects differs from the usual __new__ & __init__ sequence.
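For instance, here is a minimal sketch (all names are made up) of such a 'transient' memoized property: it is computed once, remembered across accesses, and dropped when the object is pickled.

import pickle

class Report:
    def _compute(self):
        return sum(range(10 ** 6))      # stand-in for an expensive or unpicklable value

    @property
    def total(self):
        if "_total" not in self.__dict__:
            self._total = self._compute()   # computed once, cached on the instance
        return self._total

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("_total", None)           # drop the cache: the 'transient' part
        return state

r = Report()
_ = r.total                              # populates the cache
r2 = pickle.loads(pickle.dumps(r))       # the unpickled copy has no cache
print("_total" in r2.__dict__)           # False; recomputed on next access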
I read the highest-rated answer to this question, and it says we should call the superclass's __init__ if we need to, but that we don't have to. My question is more about convention.
Should I normally, as a general rule, always call the superclass' __init__ in my class' __init__, regardless of whether or not I currently 'need' the functionality in that method?
Some classes need their __init__ method to be called in order to work, because their __init__ method sets attributes that the other methods rely on.
Example:
class One:
    def __init__(self):
        self.number = 20

    def show_number(self):
        return self.number
If you inherit from the above class, you will need to call its __init__ method in order to define the attribute number. If the __init__ method is not called, you will get an AttributeError when you try to call show_number.
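For instance, a subclass of the class above might look like this (a minimal sketch):

class Two(One):
    def __init__(self):
        super().__init__()              # defines self.number
        self.doubled = self.number * 2

t = Two()
print(t.show_number(), t.doubled)       # 20 40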
As for convention: if nothing happens in the __init__ method of the inherited class, you don't need to call it. If you think not calling the __init__ method would confuse others, you can always explain your reasoning with comments. It does no harm to call it even if you don't need it.
This answer has some downvotes because the downvoters disagree with me on the focus, and perhaps on what "convention" means. I think we mostly agree on the actual practice when it comes to writing code.
No. You should not normally, as a general rule, always call the superclass's __init__ in your class's __init__, regardless of whether or not you currently "need" the functionality in that method.
But please note that my emphasis is on that last phrase, starting with "regardless", and that is what my "no" answer is meant to address. You shouldn't be throwing something into your Python code "just because someone told you to" or "just because that seems to be what most people are doing".
You should include something if it is needed, and not include something if it is not.
It is very often the case, some would argue that it is normally the case, that you do want to call the superclass's __init__ method in your subclass's __init__ method. I do this myself most of the time.
But why?
Crucially, it is not because of some "convention". I do it because my subclass normally needs the same initialization as the superclass, plus a bit of extra customization. Note that the extra customization is the whole reason for overriding __init__ in the first place. If the initialization of your subclass is meant to be identical to that of the superclass, then you shouldn't be defining your own __init__ at all.
It's not a convention in Python to code something you don't need. Some people have their own conventions to include unnecessary things; perhaps in the name of "defensive programming" or because they are used to a different language in which more boilerplate is required.
Where Python's conventions come in is when you have a choice between multiple ways to express something useful. It's true that Python does not emphasize brevity above all else. But that doesn't mean it emphasizes verbosity either. So let me add this, in case it's not clear:
You should normally, as a general rule, always avoid unnecessary boilerplate code. (And not just in Python.)
[For those who think the phrase "normally always" is awkward or nonsensical: I completely agree, but I was trying to emphasize my point by repeating the asker's own choice of words.]
Yes. As a general rule you should call the superclass's __init__ from a subclass's __init__ method. This is not a Python convention; it is what OO best practice suggests you should do (given that the language you happen to be using leaves this decision up to you):
The superclass doesn't know about the subclass, but the subclass is expected to know about the semantics of the superclass it inherits from. Ultimately it is up to the subclass to maintain the consistent behavior of a true subtype (which, unfortunately, the Python language does little to help the programmer with). You, as the subclass implementer, get to decide whether or not you need to call the superclass's __init__, just as you get to decide whether you do or don't need to call the superclass's implementation of any method you override.
However, initialization of an object tends to be a pretty important step in its life-cycle. Until the object has been initialized, one can argue it is not truly an instance of the given class. It is an implied and important post-condition in virtually all sane OO languages that when you instantiate an object, certain things have happened that we can depend on, initialization (or "construction" in other languages) being the central one, whether it involves complex parameter validation and computed value generation or is just a no-op. So if you don't call the super's initializer, you had better know exactly what you're in for.
Another argument for calling the super's __init__ as a general rule: if you didn't write the superclass, or you're not tightly controlling its version, its implementation of __init__ may change in the future (does adding something to __init__ call for a major version bump? I'm not sure, but I bet a lot of folks wouldn't bump the major version number for that, even if technically they should). So if you do call the super's __init__, your code is less likely to break with updates to the superclass implementation.
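A minimal sketch of that general rule (the names are made up): the subclass forwards to the superclass's __init__ to preserve its post-conditions, then adds its own state.

class Base:
    def __init__(self, name):
        self.name = name

class Child(Base):
    def __init__(self, name, age):
        super().__init__(name)   # keep the superclass's initialization contract
        self.age = age

c = Child("Ada", 36)
print(c.name, c.age)             # Ada 36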
Update: I should have read the linked question before answering this one. Most of the answers over there echo the general sentiment here, perhaps not in the same terms or as strongly in favor of calling __init__() as a general rule as I am. But I'll leave this answer for others who land here.
I'm familiar with the theory about __new__ vs __init__. The former defines how an instance of a class is created (a new object in memory), whereas the latter initializes it (assigns the initial state attributes, i.e. fields). There are a couple of articles on the web about this, such as this one:
Use __new__ when you need to control the creation of a new instance.
Use __init__ when you need to control initialization of a new instance.
As I said, I do understand the difference, yet I can't imagine a real-world situation where I need to use __new__ instead of __init__. If I can customize something during object creation, I can move it to object initialization, as long as it's the same object. The mentioned link says:
In general, you shouldn't need to override __new__ unless you're subclassing an immutable type like str, int, unicode or tuple.
And here comes my question - can someone give an example of situation, when overriding __new__ is in fact the right solution that can't be done using __init__ and why is that?
The Singleton pattern is the most obvious example.
Once you've created one more object, it is no longer a singleton, right?
Thus, you have to handle this while your object is being created. One variant of the solution is to use the __new__() method.
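A minimal sketch of that variant (the class name is made up):

class Singleton:
    _instance = None

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            cls._instance = super().__new__(cls)   # create the instance exactly once
        return cls._instance

a = Singleton()
b = Singleton()
print(a is b)   # True: both names refer to the same instance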
This question already has answers here:
How do I access the child classes of an object in django without knowing the name of the child class?
I have a model
BaseModel
and several subclasses of it
ChildModelA(BaseModel), ChildModelB(BaseModel), ...
using multi-table inheritance. In the future I plan to have dozens of subclass models.
All subclasses have some implementation of method
do_something()
How can I call do_something() from a BaseModel instance?
Almost identical problem (without solution) is posted here:
http://peterbraden.co.uk/article/django-inheritance
A simpler question: how do I resolve a BaseModel instance to an instance of one of its subclasses without checking all possible subclasses?
If you want to avoid checking all possible subclasses, the only way I can think of would be to store the class name associated with the subclass in a field defined on the base class. Your base class might have a method like this:
from importlib import import_module   # at module level

def resolve(self):
    module, cls_name = self.class_name.rsplit(".", 1)
    module = import_module(module)
    cls = getattr(module, cls_name)
    return cls.objects.get(pk=self.pk)
This answer does not make me happy and I too would love to see a better solution, as I will be facing a similar problem soon.
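For completeness, here is a hedged sketch (the field and the save override are assumptions, not from the original post) of how such a class_name field could be kept up to date on the base model:

from django.db import models

class BaseModel(models.Model):
    class_name = models.CharField(max_length=100, editable=False)

    def save(self, *args, **kwargs):
        if not self.class_name:
            # record the dotted path of the concrete subclass on first save
            self.class_name = f"{type(self).__module__}.{type(self).__qualname__}"
        super().save(*args, **kwargs)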
Will you ever be working with an instance of the base type, or will you always be working with instances of the children? If the latter is the case, then just call the method: even if you have a reference to the base type, the object itself IS-A child type.
Since Python supports duck typing, the method call will be bound appropriately, because the child instance really does have this method.
A pythonic programming style which determines an object's type by inspection of its method or attribute signature rather than by explicit relationship to some type object ("If it looks like a duck and quacks like a duck, it must be a duck.") By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution. Duck-typing avoids tests using type() or isinstance(). (Note, however, that duck-typing can be complemented with abstract base classes.) Instead, it typically employs hasattr() tests or EAFP programming.
Note that EAFP stands for Easier to Ask Forgiveness than Permission:
Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.
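For example, an EAFP version of the dispatch discussed above might look like this sketch (do_something is the method from the question; the helper name is made up):

def call_do_something(obj):
    # EAFP: just try the call and handle the failure,
    # instead of checking isinstance() first (LBYL)
    try:
        return obj.do_something()
    except AttributeError:
        return None   # this instance doesn't provide do_something()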
I agree with Andrew. On a couple of sites we have a class that supports a whole bunch of methods (but not fields; this was before an ORM refactor) that are common to most-but-not-all of our content classes. They make use of hasattr to sidestep situations where the method doesn't make sense.
This means most of our classes are defined as:
class Foo(models.Model, OurKitchenSinkClass):
    ...
Basically it's sort of a MixIn type of thing. Works great, easy to maintain.