How to (or why not) call unicode.__init__ from subclass - python

I've encountered a situation where subclassing unicode results in Deprecation Warnings on Python prior to 3.3 and errors on Python 3.3:
# prove that unicode.__init__ accepts parameters
s = unicode('foo')
s.__init__('foo')
unicode.__init__(s, 'foo')

class unicode2(unicode):
    def __init__(self, other):
        super(unicode2, self).__init__(other)

s = unicode2('foo')

class unicode3(unicode):
    def __init__(self, other):
        unicode.__init__(self, other)

s = unicode3('foo')
Curiously, the warnings/errors don't occur in the first three lines, but instead occur on lines 8 and 14. Here's the output on Python 2.7.
> python -Wd .\init.py
.\init.py:8: DeprecationWarning: object.__init__() takes no parameters
super(unicode2, self).__init__(other)
.\init.py:14: DeprecationWarning: object.__init__() takes no parameters
unicode.__init__(self, other)
The code is simplified to exemplify the issue. In a real-world application, I would perform more than simply calling the super __init__.
It appears from the first three lines that the unicode class implements __init__ and that method accepts at least a single parameter. However, if I want to call that method from a subclass, I appear to be unable to do so, whether I invoke super() or not.
Why is it okay to call unicode.__init__ on a unicode instance but not on a unicode subclass? What is an author to do if subclassing the unicode class?

I suspect the issue comes from the fact that unicode is immutable.
After a unicode instance is created, it cannot be modified. So, any initialization logic is going to be in the __new__ method (which is called to do the instance creation), rather than __init__ (which is called only after the instance exists).
A subclass of an immutable type doesn't have the same strict requirements, so you can do things in unicode2.__init__ if you want, but calling unicode.__init__ is unnecessary (and probably won't do what you think it would do anyway).
A better solution is probably to do your customized logic in your own __new__ method:
class unicode2(unicode):
    def __new__(cls, value):
        # optionally do stuff to value here
        self = super(unicode2, cls).__new__(cls, value)
        # optionally do stuff to self here
        return self
You can make your class immutable too, if you want, by giving it a __setattr__ method that always raises an exception (you might also want to give the class a __slots__ property to save memory by omitting the per-instance __dict__).
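For illustration only, here is a minimal sketch of that suggestion (the stripping in __new__ is a hypothetical customization, not something from the original code):
class unicode2(unicode):
    __slots__ = ()  # no per-instance __dict__, so there is nothing extra to store

    def __new__(cls, value):
        # hypothetical customization: strip surrounding whitespace
        return super(unicode2, cls).__new__(cls, value.strip())

    def __setattr__(self, name, value):
        # refuse attribute assignment to keep instances effectively immutable
        raise TypeError("unicode2 instances are immutable")
With __slots__ = () in place, ordinary attribute assignment would already fail with an AttributeError; the explicit __setattr__ just makes the intent and the error message clearer.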

Related

Inheriting from "str" in Python 2.7 and Python 3

I'm porting some older code from Python 2.7 to Python 3, and I'm trying to figure out how inheriting from str works in both versions. Here is some of the code.
class OtoString(str):
    def __init__(self, p_string):
        str.__init__(self, p_string)

    def is_url(self):
        if self.startswith("http://") or self.startswith("https://"):
            return True
        else:
            return False

print(OtoString("https://stackoverflow.com").is_url())
Running this in Python 2.7 works just fine, but when I run this code in Python 3.7 I get a TypeError:
TypeError: object.__init__() takes exactly one argument (the instance to initialize)
It would be helpful if someone could explain how exactly inheriting from str works, what this line does,
str.__init__(self, p_string)
why this doesn't work in Python 3 and how I could make it work.
str is an immutable type, and like all immutable types, should perform both construction and initialization in __new__, not __init__. The correct code (that should work on both Python 2 and Python 3) to replace __init__ would be:
def __new__(cls, p_string):
    return str.__new__(cls, p_string)
Note that it receives a class object, not an existing instance, and it returns the result of calling __new__ on the superclass (because __new__ actually makes the new object, it doesn't just initialize one handed to it like __init__ does).
In this particular case, you should just omit the definition of __init__/__new__ entirely (you'll inherit str's version automatically). But if you need to do additional work (e.g. compute some normalized version of p_string before final construction), the __new__ above is the correct pattern.
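For example, a rough sketch of that pattern (normalize here is a hypothetical helper standing in for whatever extra work you need; it is not part of the question or of str itself):
def normalize(p_string):
    # placeholder for real preprocessing
    return p_string.strip()

class OtoString(str):
    def __new__(cls, p_string):
        # do the extra work before the immutable value is created
        return str.__new__(cls, normalize(p_string))

    def is_url(self):
        return self.startswith(("http://", "https://"))

print(OtoString("  https://stackoverflow.com  ").is_url())  # True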
Also, to avoid bloating the memory use of your class, I suggest adding:
__slots__ = ()
as the first line inside your class body; that will avoid making room for an unused __dict__ and __weakref__, keeping your behavior and overhead much closer to that of str (on my 64 bit Python 3.6, it reduces the per instance memory overhead, above the cost of the string data itself, from 217 bytes to 81 bytes). Final version would be just:
class OtoString(str):
    __slots__ = ()

    def is_url(self):
        return self.startswith(("http://", "https://"))
__init__ is called after the object has been constructed. str is immutable, so you cannot modify its value in __init__. The construction must take place in __new__, which receives the class rather than an instance; that's why its first parameter is cls and not self:
class OtoString(str):
    def __new__(cls, *args, **kw):
        return str.__new__(cls, *args, **kw)

    def is_url(self):
        if self.startswith("http://") or self.startswith("https://"):
            return True
        else:
            return False

print(OtoString("https://stackoverflow.com").is_url())
Prints:
True
try this:
class OtoString(str):
    def __init__(self, p_string):
        str.__init__(self)
        self.p_string = p_string

    def is_url(self):
        if self.p_string.startswith('https://') or self.p_string.startswith('http://'):
            return True
        return False
it works on my computer.
This line
str.__init__(self, p_string)
should be str.__init__(self) in Python 3.x, and it actually inherits str's __init__: class OtoString(str) inherits all of str's methods (including __init__).

Porting Subclass of Unicode to Python 3

I'm porting a legacy codebase from Python 2.7 to Python 3.6. In that codebase I have a number of instances of things like:
class EntityName(unicode):
    @staticmethod
    def __new__(cls, s):
        clean = cls.strip_junk(s)
        return super(EntityName, cls).__new__(cls, clean)

    def __init__(self, s):
        self._clean = s
        self._normalized = normalized_name(self._clean)
        self._simplified = simplified_name(self._clean)
        self._is_all_caps = None
        self._is_all_lower = None
        super(EntityName, self).__init__(self._clean)
It might be called like this:
EntityName("Guy DeFalt")
When porting this to Python 3, the above code fails because unicode is no longer a class you can extend (at least, if there is an equivalent class I cannot find it). Given that str is unicode now, I tried to just swap str in, but the parent __init__ doesn't take the string value I'm trying to pass:
TypeError: object.__init__() takes no parameters
This makes sense because str does not have an __init__ method - this does not seem to be an idiomatic way of using this class. So my question has two major branches:
Is there a better way to be porting classes that sub-classed the old unicode class?
If subclassing str is appropriate, how should I modify the __init__ function for idiomatic behavior?
The right way to subclass a string or another immutable class in Python 3 is the same as in Python 2:
class MyString(str):
    def __new__(cls, initial_arguments):  # no staticmethod
        desired_string_value = get_desired_string_value(initial_arguments)
        return super(MyString, cls).__new__(cls, desired_string_value)
        # can be shortened to super().__new__(...)

    def __init__(self, initial_arguments):  # arguments are unused
        self.whatever = whatever(self)
        # no need to call super().__init__(), but if you must, do not pass arguments
There are several issues with your sample. First, why is __new__ declared with @staticmethod? __new__ is already special-cased as a static method that receives the class as its first argument, so you don't need to specify anything. Second, the code seems to operate under the assumption that when you call __new__ of the superclass, it somehow calls your __init__ as well (I'm inferring this from how self._clean is supposed to be set). This is not the case. When you call MyString(arguments), the following happens:
First, Python calls __new__ with the class parameter (usually called cls) and the arguments. __new__ must return the instance. To do this it can create one, as we do here, or do something else; e.g. it may return an existing instance or, in fact, anything.
Then Python calls __init__ with the instance it received from __new__ (this parameter is usually called self) and the same arguments.
(There's a special case: Python won't call __init__ if __new__ returned something that is not an instance of the class being constructed, or of a subclass of it.)
Python uses the class hierarchy to determine which __new__ and __init__ to call. It's up to you to correctly sort out the arguments and use proper superclass calls in these two methods.
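As a small sketch of that sequence (class and attribute names are illustrative, not taken from the question), the prints below show what each method receives:
class MyString(str):
    def __new__(cls, raw):
        # __new__ receives the class and creates the (immutable) value
        print("__new__ got class:", cls.__name__)
        return super(MyString, cls).__new__(cls, raw.strip())

    def __init__(self, raw):
        # __init__ receives the already-created instance and the same arguments
        print("__init__ got instance:", repr(self))
        self.raw = raw

s = MyString("  Guy DeFalt  ")
# __new__ got class: MyString
# __init__ got instance: 'Guy DeFalt'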

calling super subclass in python 2.7

I am trying to understand some basic OOP in python. If I try to subclass a class like list, how do I invoke the parent constructor? After tinkering for a bit I found that it is:
super(subclass_name, self).__init__(args).
However, I don't intuitively understand this. Why can't I just do list(args)? Or
list.__init__(args)?
The following is the relevant snippet:
class slist(list):
    def __init__(self, iterable):
        # super(slist, self).__init__(iterable)  <--- This works
        list.__init__(iterable)  # This does not work
        self.__l = list(iterable)

    def __str__(self):
        return ",".join([str(s) for s in self.__l])
list.__init__(iterable) is missing the information of which list to initialize, and list(iterable) builds a different list entirely unrelated to the one you're trying to initialize.
If you don't want to use super, you can do list.__init__(self, iterable).
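Applied to the snippet from the question, a corrected version might look like this (the separate __l copy is kept only because the question had it; it is not required):
class slist(list):
    def __init__(self, iterable):
        iterable = list(iterable)  # materialize once so a one-shot iterator isn't consumed twice
        list.__init__(self, iterable)  # or: super(slist, self).__init__(iterable)
        self.__l = iterable

    def __str__(self):
        return ",".join(str(s) for s in self.__l)

print(slist([1, 2, 3]))  # prints: 1,2,3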
list.__init__(iterable) is incorrect. You need to tell __init__() which object it is initializing. list(args) is even more incorrect, as it creates a completely new list object (going through list.__new__() and list.__init__() for that new object) rather than initializing yours. You need to pass self to the constructor call to correctly initialize the parent class:
list.__init__(self, args)
and that would work. Using super() though usually allows for cleaner syntax. For example, the above could be rewritten as:
super(slist, self).__init__(args)
However, the main reason usage of super() is encouraged over simply calling the parent constructor is for cases where you have multiple inheritance. super() will automatically call the constructor of each parent class in the correct order. This is closely related to Python's Method Resolution Order.
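A minimal sketch of that point (a diamond-shaped hierarchy with invented names): with super(), each __init__ runs exactly once, in MRO order, whereas direct parent calls from D would run A.__init__ twice.
class A(object):
    def __init__(self):
        print("A.__init__")
        super(A, self).__init__()

class B(A):
    def __init__(self):
        print("B.__init__")
        super(B, self).__init__()

class C(A):
    def __init__(self):
        print("C.__init__")
        super(C, self).__init__()

class D(B, C):
    def __init__(self):
        print("D.__init__")
        super(D, self).__init__()

D()                                     # D.__init__, B.__init__, C.__init__, A.__init__
print([k.__name__ for k in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']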

Explicit passing of Self when calling super class's __init__ in python

This question is in relation to the posts What does 'super' do in Python?, How do I initialize the base (super) class?, and Python: How do I make a subclass from a superclass?, which describe two ways to initialize a SuperClass from within a SubClass:
class SuperClass:
    def __init__(self):
        return

    def superMethod(self):
        return

## One version of Initiation
class SubClass(SuperClass):
    def __init__(self):
        SuperClass.__init__(self)

    def subMethod(self):
        return
or
class SuperClass:
    def __init__(self):
        return

    def superMethod(self):
        return

## Another version of Initiation
class SubClass(SuperClass):
    def __init__(self):
        super(SubClass, self).__init__()

    def subMethod(self):
        return
So I'm a little confused about needing to explicitly pass self as a parameter in
SuperClass.__init__(self)
and
super(SubClass, self).__init__().
(In fact, if I call SuperClass.__init__() I get the error
TypeError: __init__() missing 1 required positional argument: 'self'
). But when calling constructors or any other class method (i.e.:
## Calling class constructor / initiation
c = SuperClass()
k = SubClass()

## Calling class methods
c.superMethod()
k.superMethod()
k.subMethod()
), the self parameter is passed implicitly.
My understanding of the self keyword is that it is not unlike the this pointer in C++, in that it provides a reference to the class instance. Is this correct?
If there would always be a current instance (in this case SubClass), then why does self need to be explicitly included in the call to SuperClass.__init__(self)?
Thanks
This is simply method binding, and has very little to do with super. When you call x.method(*args), Python checks the type of x for a method named method. If it finds one, it "binds" the function to x, so that when you call it, x will be passed as the first parameter, before the rest of the arguments.
When you call a (normal) method via its class, no such binding occurs. If the method expects its first argument to be an instance (e.g. self), you need to pass it in yourself.
The actual implementation of this binding behavior is pretty neat. Python objects are "descriptors" if they have a __get__ method (and/or __set__ or __delete__ methods, but those don't matter for methods). When you look up an attribute like a.b, Python checks the class of a to see if it has an attribute b that is a descriptor. If it does, it translates a.b into type(a).b.__get__(a, type(a)). If b is a function, it will have a __get__ method that implements the binding behavior I described above. Other kinds of descriptors can have different behaviors. For instance, the classmethod decorator replaces a method with a special descriptor that binds the function to the class, rather than the instance.
Python's super creates special objects that handle attribute lookups differently than normal objects, but the details don't matter too much for this issue. The binding behavior of methods called through super is just like what I described in the first paragraph, so self gets passed automatically to the bound method when it is called. The only thing special about super is that it may bind a different function than you'd get by looking up the same method name on self (that's the whole point of using it).
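To make the descriptor part concrete, here is a small sketch (Python 3 syntax; the class is invented for the example) showing that ordinary attribute lookup and calling __get__ by hand produce the same bound method:
class A:
    def method(self):
        return "called on " + repr(self)

a = A()
bound = a.method                                  # normal lookup: binds the function to a
also_bound = A.__dict__["method"].__get__(a, A)   # the same binding, done by hand
print(bound())       # called on <__main__.A object at 0x...>
print(also_bound())  # same output: both calls pass a as self automatically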
The following example might elucidate things:
class Example:
    def method(self):
        pass

>>> print(Example.method)
<unbound method Example.method>
>>> print(Example().method)
<bound method Example.method of <__main__.Example instance at 0x01EDCDF0>>
When a method is bound, the instance is passed implicitly. When a method is unbound, the instance needs to be passed explicitly.
The other answers will definitely offer some more detail on the binding process, but I think it's worth showing the above snippet.
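One caveat worth adding (not part of the original answer): Python 3 dropped the separate "unbound method" type, so the class-level lookup just gives back the plain function, while the instance-level lookup still yields a bound method:
class Example:
    def method(self):
        pass

print(Example.method)    # <function Example.method at 0x...>  (Python 3)
print(Example().method)  # <bound method Example.method of <__main__.Example object at 0x...>>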
The answer is non-trivial and would probably warrant a good article. A very good explanation of how super() works is brilliantly given by Raymond Hettinger in a Pycon 2015 talk, available here and a related article.
I will attempt a short answer and if it is not sufficient I (and hopefully the community) will expand on it.
The answer has two key pieces:
Python's super() needs an object on which the overridden method is called, so the instance is explicitly passed as self. This is not the only possible implementation; in fact, in Python 3 the zero-argument form super() no longer requires you to spell out the class and instance.
Python's super() is not like super in Java or other compiled languages. Python's implementation is designed to support the multiple collaborative inheritance paradigm, as explained in Hettinger's talk.
This has an interesting consequence in Python: the method resolution in super() depends not only on the parent class, but on the child classes as well (a consequence of multiple inheritance). Note that Hettinger is using Python 3.
The official Python 2.7 documentation on super is also a good source of information (better understood after watching the talk, in my opinion).
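For completeness, here is a tiny sketch of the Python 3 form mentioned in the first point (the zero-argument super(); the class names are made up for the example):
class SuperClass:
    def __init__(self):
        print("SuperClass.__init__")

class SubClass(SuperClass):
    def __init__(self):
        super().__init__()  # Python 3 only: no class or instance argument needed

SubClass()  # prints: SuperClass.__init__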
Because in SuperClass.__init__(self), you're calling the method on the class, not the instance, so it cannot be passed implicitly. Similarly, you cannot just call SubClass.subMethod(), but you can call SubClass.subMethod(k) and it'll be equivalent to k.subMethod(). Likewise, if self refers to a SubClass instance, then self.__init__() means SubClass.__init__(self), so if you want to call SuperClass.__init__ you have to call it directly.
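In other words (a small illustration reusing the classes from the question), calling through the class is just the explicit form of the same call:
k = SubClass()
k.subMethod()          # instance call: k is passed implicitly as self
SubClass.subMethod(k)  # class call: the instance must be passed explicitly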

Emulating membership-test in Python: delegating __contains__ to contained-object correctly

I am used to Python allowing some neat tricks to delegate functionality to other objects. One example is delegation to contained objects.
But it seems that I don't have luck when I want to delegate __contains__:
class A(object):
    def __init__(self):
        self.mydict = {}
        self.__contains__ = self.mydict.__contains__

a = A()
1 in a
I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: argument of type 'A' is not iterable
What am I doing wrong? When I call a.__contains__(1), everything goes smoothly. I even tried to define an __iter__ method in A to make A look more like an iterable, but it did not help. What am I missing here?
Special methods such as __contains__ are only special when defined on the class, not on the instance (except in legacy classes in Python 2, which you should not use anyway).
So, do your delegation at class level:
class A(object):
    def __init__(self):
        self.mydict = {}

    def __contains__(self, other):
        return self.mydict.__contains__(other)
I'd actually prefer to spell the latter as return other in self.mydict, but that's a minor style issue.
Edit: if and when "totally dynamic per-instance redirecting of special methods" (like old-style classes offered) is indispensable, it's not hard to implement it with new-style classes: you just need each instance that has such peculiar need to be wrapped in its own special class. For example:
class BlackMagic(object):
    def __init__(self):
        self.mydict = {}
        self.__class__ = type(self.__class__.__name__, (self.__class__,), {})
        self.__class__.__contains__ = self.mydict.__contains__
Essentially, after the little bit of black magic reassigning self.__class__ to a new class object (which behaves just like the previous one but has an empty dict and no other instances except this one self), anywhere in an old-style class you would assign to self.__magicname__, assign to self.__class__.__magicname__ instead (and make sure it's a built-in or staticmethod, not a normal Python function, unless of course in some different case you do want it to receive the self when called on the instance).
Incidentally, the in operator on an instance of this BlackMagic class is faster, as it happens, than with any of the previously proposed solutions -- or at least so I'm measuring with my usual trusty -mtimeit (going directly to the built-in method, instead of following normal lookup routes involving inheritance and descriptors, shaves a bit of the overhead).
A metaclass to automate the self.__class__-per-instance idea would not be hard to write (it could do the dirty work in the generated class's __new__ method, and maybe also set all magic names to actually assign on the class if assigned on the instance, either via __setattr__ or many, many properties). But that would be justified only if the need for this feature was really widespread (e.g. porting a huge ancient Python 1.5.2 project that liberally use "per-instance special methods" to modern Python, including Python 3).
Do I recommend "clever" or "black magic" solutions? No, I don't: almost invariably it's better to do things in simple, straightforward ways. But "almost" is an important word here, and it's nice to have at hand such advanced "hooks" for the rare, but not non-existent, situations where their use may actually be warranted.
