I'm porting a legacy codebase from Python 2.7 to Python 3.6. In that codebase I have a number of instances of things like:
class EntityName(unicode):
    @staticmethod
    def __new__(cls, s):
        clean = cls.strip_junk(s)
        return super(EntityName, cls).__new__(cls, clean)

    def __init__(self, s):
        self._clean = s
        self._normalized = normalized_name(self._clean)
        self._simplified = simplified_name(self._clean)
        self._is_all_caps = None
        self._is_all_lower = None
        super(EntityName, self).__init__(self._clean)
It might be called like this:
EntityName("Guy DeFalt")
When porting this to Python 3 the above code fails because unicode is no longer a class you can extend (at least, if there is an equivalent class I cannot find it). Given that str is unicode now, I tried to just swap str in, but the parent init doesn't take the string value I'm trying to pass:
TypeError: object.__init__() takes no parameters
This makes sense because str does not have an __init__ method - this does not seem to be an idiomatic way of using this class. So my question has two major branches:
Is there a better way to be porting classes that sub-classed the old unicode class?
If subclassing str is appropriate, how should I modify the __init__ function for idiomatic behavior?
The right way to subclass a string or another immutable class in Python 3 is same as in Python 2:
class MyString(str):
    def __new__(cls, initial_arguments):  # no @staticmethod
        desired_string_value = get_desired_string_value(initial_arguments)
        return super(MyString, cls).__new__(cls, desired_string_value)
        # can be shortened to super().__new__(...)

    def __init__(self, initial_arguments):  # arguments are unused
        self.whatever = whatever(self)
        # no need to call super().__init__(), but if you must, do not pass arguments
There are several issues with your sample. First, why is __new__ decorated with @staticmethod? It's a @classmethod, although you don't need to specify this. Second, the code seems to operate under the assumption that when you call __new__ of the superclass, it somehow calls your __init__ as well; I'm deriving this from looking at how self._clean is supposed to be set. This is not the case. When you call MyString(arguments), the following happens:
First, Python calls __new__ with the class (the parameter usually called cls) and the arguments. __new__ is expected to return the instance. To do this it can create one, as we do here, or do something else; e.g. it may return an existing instance or, in fact, anything.
Then Python calls __init__ with the instance it received from __new__ (this parameter is usually called self) and the same arguments.
(There's a special case: Python won't call __init__ if __new__ returned something that is not an instance of the class that was passed in.)
Python uses class hierarchy to see which __new__ and __init__ to call. It's up to you to correctly sort out the arguments and use proper superclass calls in these two methods.
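Putting this together for the EntityName class from the question, a Python 3 port could look like the sketch below. The strip_junk, normalized_name and simplified_name bodies are placeholders standing in for the question's real helpers, so adjust them to taste:

def normalized_name(s):
    return s.lower()           # placeholder for the real normalization

def simplified_name(s):
    return s.replace(" ", "")  # placeholder for the real simplification

class EntityName(str):
    @staticmethod
    def strip_junk(s):
        return s.strip()       # placeholder cleaning logic

    def __new__(cls, s):
        # The immutable string value must be chosen here, in __new__.
        clean = cls.strip_junk(s)
        return super().__new__(cls, clean)

    def __init__(self, s):
        # self already holds the cleaned value; just attach extra attributes
        # and do not pass anything to super().__init__().
        self._clean = str(self)
        self._normalized = normalized_name(self._clean)
        self._simplified = simplified_name(self._clean)
        self._is_all_caps = None
        self._is_all_lower = None

print(EntityName("  Guy DeFalt  "))  # prints "Guy DeFalt"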
Related
I'm porting some older code from Python 2.7 to Python 3, and I'm trying to figure out how inheriting from str works in both versions. Here is some of the code.
class OtoString(str):
    def __init__(self, p_string):
        str.__init__(self, p_string)

    def is_url(self):
        if self.startswith("http://") or self.startswith("https://"):
            return True
        else:
            return False
print(OtoString("https://stackoverflow.com").is_url())
Running this in Python 2.7 works just fine, but when I run this code in Python 3.7 I get a TypeError:
TypeError: object.__init__() takes exactly one argument (the instance to initialize)
It would be helpful if someone could explain how exactly inheriting from str works, what this line does,
str.__init__(self, p_string)
why this doesn't work in Python 3 and how I could make it work.
str is an immutable type, and like all immutable types, should perform both construction and initialization in __new__, not __init__. The correct code (that should work on both Python 2 and Python 3) to replace __init__ would be:
def __new__(cls, p_string):
    return str.__new__(cls, p_string)
Note that it receives a class object, not an existing instance, and it returns the result of calling __new__ on the superclass (because __new__ actually makes the new object, it doesn't just initialize one handed to it like __init__ does).
In this particular case, you should just omit the definition of __init__/__new__ entirely (you'll inherit str's version automatically). But if you need to do additional work (e.g. compute some normalized version of p_string before final construction), the __new__ above is the correct pattern.
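For example, a minimal sketch of such extra work (stripping surrounding whitespace before the value is fixed; the .strip() call is just an illustrative choice):

class OtoString(str):
    def __new__(cls, p_string):
        # compute the final value before construction; it cannot change afterwards
        cleaned = p_string.strip()
        return str.__new__(cls, cleaned)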
Also, to avoid bloating the memory use of your class, I suggest adding:
__slots__ = ()
as the first line inside your class body; that will avoid making room for an unused __dict__ and __weakref__, keeping your behavior and overhead much closer to that of str (on my 64 bit Python 3.6, it reduces the per instance memory overhead, above the cost of the string data itself, from 217 bytes to 81 bytes). Final version would be just:
class OtoString(str):
    __slots__ = ()

    def is_url(self):
        return self.startswith(("http://", "https://"))
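Usage stays the same, and sys.getsizeof gives a rough view of the __slots__ savings (exact numbers vary by build; the NoSlots class below exists only for comparison and is not part of the answer):

import sys

print(OtoString("https://stackoverflow.com").is_url())  # True

class NoSlots(str):
    pass  # gets room for __dict__ and __weakref__, unlike OtoString

# The __slots__ version reports a smaller per-instance size:
print(sys.getsizeof(NoSlots("x")), sys.getsizeof(OtoString("x")))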
__init__ is called after the object is constructed. str is immutable, so you cannot modify the value in the constructor. The construction must take place in __new__, which is a class method, and that's why its first parameter is cls and not self:
class OtoString(str):
    def __new__(cls, *args, **kw):
        return str.__new__(cls, *args, **kw)

    def is_url(self):
        if self.startswith("http://") or self.startswith("https://"):
            return True
        else:
            return False
print(OtoString("https://stackoverflow.com").is_url())
Prints:
True
try this:
class OtoString(str):
    def __init__(self, p_string):
        str.__init__(self)
        self.p_string = p_string

    def is_url(self):
        if self.p_string.startswith('https://') or self.p_string.startswith('http://'):
            return True
        return False
it works on my computer.
this line
str.__init__(self, p_string)
should be str.__init__(self) in Python 3.x, and it
actually inherits str's __init__: class OtoString(str) inherits all methods (including __init__) from str.
Say I want to implement a metaclass that should serve as a class factory. But unlike the type constructor, which takes 3 arguments, my metaclass should be callable without any arguments:
Cls1 = MyMeta()
Cls2 = MyMeta()
...
For this purpose I defined a custom __new__ method with no parameters:
class MyMeta(type):
    def __new__(cls):
        return super().__new__(cls, 'MyCls', (), {})
But the problem is that python automatically calls the __init__ method with the same arguments as the __new__ method, so trying to call MyMeta() ends up throwing an exception:
TypeError: type.__init__() takes 1 or 3 arguments
Which makes sense, since type can be called with 1 or 3 arguments. But what's the correct way to fix this? I see 3 (4?) options:
I could add an empty __init__ method to my metaclass, but since I'm not sure if type.__init__ does anything important, this might not be a good idea.
I could implement an __init__ method that calls super().__init__(cls.__name__, cls.__bases__, vars(cls)).
I could use a meta-metaclass and override its __call__ method, rather than messing with __new__ and __init__.
Bonus option: Maybe I shouldn't try to change the signature?
So my question is: Are the 3 solutions I listed correct or are there any subtle bugs hidden in them? Which solution is best (i.e. the most correct)?
An interface deviating from the parent signature is a questionable design in regular classes too. You don't need the extra complexity of metaclasses to get into this kind of mess - you can cause the same new/init jumble by subclassing a datetime or whatever.
I want to have a metaclass and an easy way to create instances of that metaclass.
The usual pattern in Python is to write a factory using a from_something classmethod. To take the example of creating datetime instances from a different init signature, there is for example datetime.fromtimestamp, but you have many other examples too (dict.fromkeys, int.from_bytes, bytes.fromhex...)
There is nothing specific to metaclasses here, so use the same pattern:
class MyMeta(type):
    @classmethod
    def from_no_args(cls, name=None):
        if name is None:
            name = cls.__name__ + 'Instance'
        return cls(name, (), {})
Usage:
>>> class A(metaclass=MyMeta):
... pass
...
>>> B = MyMeta.from_no_args()
>>> C = MyMeta.from_no_args(name='C')
>>> A.__name__
'A'
>>> B.__name__
'MyMetaInstance'
>>> C.__name__
'C'
TL;DR -
I have a class that uses a metaclass.
I would like to access the parameters of the object's constructor from the metaclass, just before the initialization process, but I couldn't find a way to access those parameters.
How can I access the constructor's parameters from the metaclass function __new__?
In order to practice the use of metaclasses in python, I would like to create a class that would be used as the supercomputer "Deep Thought" from the book "The Hitchhiker's Guide to the Galaxy".
The purpose of my class would be to store the various queries the supercomputer gets from users.
At the bottom line, it would just get some arguments and store them.
If one of the given arguments is number 42 or the string "The answer to life, the universe, and everything", I don't want to create a new object but rather return a pointer to an existing object.
The idea behind this is that those objects would be the exact same so when using the is operator to compare those two, the result would be true.
In order to be able to use the is operator and get True as an answer, I would need to make sure those variables point to the same object. So, in order to return a pointer to an existing object, I need to intervene in the middle of the initialization process of the object. I cannot check the given arguments at the constructor itself and modify the object's inner-variables accordingly because it would be too late: If I check the given parameters only as part of the __init__ function, those two objects would be allocated on different portions of the memory (they might be equal but won't return True when using the is operator).
I thought of doing something like that:
class SuperComputer(type):
    answer = 42

    def __new__(meta, name, bases, attributes):
        # Check if args contains the number "42"
        # or has the string "The answer to life, the universe, and everything"
        # If so, just return a pointer to an existing object:
        return SuperComputer.answer
        # Else, just create the object as it is:
        return super(SuperComputer, meta).__new__(meta, name, bases, attributes)

class Query(object):
    __metaclass__ = SuperComputer

    def __init__(self, *args, **kwargs):
        self.args = args
        for key, value in kwargs.items():
            setattr(self, key, value)
def main():
    number = Query(42)
    string = Query("The answer to life, the universe, and everything")
    other = Query("Sunny", "Sunday", 123)
    num2 = Query(45)
    print number is string  # Should print True
    print other is string   # Should print False
    print number is num2    # Should print False

if __name__ == '__main__':
    main()
But I'm stuck on getting the parameters from the constructor.
I saw that the __new__ method gets only four arguments:
The metaclass instance itself, the name of the class, its bases, and its attributes.
How can I send the parameters from the constructor to the metaclass?
What can I do in order to achieve my goal?
You don't need a metaclass for that.
The fact is, __init__ is not the "constructor" of an object in Python; rather, it is commonly called the "initializer". __new__ is closer to the role of a "constructor" in other languages, and it is not available only on metaclasses - all classes have a __new__ method. If it is not explicitly implemented, object.__new__ is called directly.
And actually, it is object.__new__ which creates a new object in Python. From pure Python code, there is no other possible way to create an object: it will always go through there. That means that if you implement the __new__ method on your own class, you have the option of not creating a new instance, and instead return another pre-existing instance of the same class (or any other object).
You only have to keep in mind that: if __new__ returns an instance of the same class, then the default behavior is that __init__ is called on the same instance. Otherwise, __init__ is not called.
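A tiny illustration of that rule (my own sketch, not code from the question):

class Weird:
    def __new__(cls, make_instance):
        if make_instance:
            return super().__new__(cls)
        return "not a Weird instance"  # not an instance of cls

    def __init__(self, make_instance):
        print("__init__ called")

w = Weird(True)   # prints "__init__ called"
x = Weird(False)  # __init__ is NOT called, because __new__ returned a str
print(type(x))    # <class 'str'>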
It is also worth noting that in recent years a recipe for creating "singletons" in Python using metaclasses became popular - it is actually an overkill approach, as overriding __new__ is preferable for creating singletons.
In your case, you just need to have a dictionary with the parameters you want to track as your keys, and check if you create a new instance or "recycle" one whenever __new__ runs. The dictionary may be a class attribute, or a global variable at module level - that is your pick:
class Recycler:
    _instances = {}

    def __new__(cls, parameter1, *args, **kwargs):
        if parameter1 in cls._instances:
            return cls._instances[parameter1]
        self = super().__new__(cls)  # don't pass the remaining parameters to object.__new__
        cls._instances[parameter1] = self
        return self
If you'd have any code in __init__ besides that, move it to __new__ as well.
You can have a baseclass with this behavior and have a class hierarchy without needing to re-implement __new__ for every class.
As for a metaclass, none of its __new__ or __init__ methods are called when actually creating a new instance of the classes created with that metaclass. It would only be of use to automatically insert this behavior, by decorating or creating a fresh __new__ method, on the classes created with it. Since this behavior is easier to track, maintain, and combine with other classes when done with ordinary inheritance, there is no need for a metaclass at all.
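Applied to the Query class from the question, a sketch of this __new__-based recycling (my own adaptation; THE_ANSWER and _canonical are illustrative names) could look like:

class Query:
    THE_ANSWER = (42, "The answer to life, the universe, and everything")
    _canonical = None  # the single shared "answer" instance

    def __new__(cls, *args, **kwargs):
        if any(arg in cls.THE_ANSWER for arg in args):
            if cls._canonical is None:
                cls._canonical = super().__new__(cls)
            return cls._canonical
        return super().__new__(cls)

    def __init__(self, *args, **kwargs):
        # note: this still runs on the recycled instance each time it is "created"
        self.args = args
        for key, value in kwargs.items():
            setattr(self, key, value)

number = Query(42)
string = Query("The answer to life, the universe, and everything")
other = Query("Sunny", "Sunday", 123)
num2 = Query(45)
print(number is string)  # True
print(other is string)   # False
print(number is num2)    # False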
This question is in relation to posts at What does 'super' do in Python? , How do I initialize the base (super) class? , and Python: How do I make a subclass from a superclass? which describes two ways to initialize a SuperClass from within a SubClass as
class SuperClass:
    def __init__(self):
        return

    def superMethod(self):
        return

## One version of Initiation
class SubClass(SuperClass):
    def __init__(self):
        SuperClass.__init__(self)

    def subMethod(self):
        return
or
class SuperClass:
    def __init__(self):
        return

    def superMethod(self):
        return

## Another version of Initiation
class SubClass(SuperClass):
    def __init__(self):
        super(SubClass, self).__init__()

    def subMethod(self):
        return
So I'm a little confused about needing to explicitly pass self as a parameter in
SuperClass.__init__(self)
and
super(SubClass, self).__init__().
(In fact if I call SuperClass.__init__() I get the error
TypeError: __init__() missing 1 required positional argument: 'self'
). But when calling constructors or any other class method (i.e.:
## Calling class constructor / initiation
c = SuperClass()
k = SubClass()
## Calling class methods
c.superMethod()
k.superMethod()
k.subMethod()
), the self parameter is passed implicitly.
My understanding of self is that it is not unlike the this pointer in C++, in that it provides a reference to the class instance. Is this correct?
If there would always be a current instance (in this case SubClass), then why does self need to be explicitly included in the call to SuperClass.__init__(self)?
Thanks
This is simply method binding, and has very little to do with super. When you call x.method(*args), Python checks the type of x for a method named method. If it finds one, it "binds" the function to x, so that when you call it, x will be passed as the first parameter, before the rest of the arguments.
When you call a (normal) method via its class, no such binding occurs. If the method expects its first argument to be an instance (e.g. self), you need to pass it in yourself.
The actual implementation of this binding behavior is pretty neat. Python objects are "descriptors" if they have a __get__ method (and/or __set__ or __delete__ methods, but those don't matter for methods). When you look up an attribute like a.b, Python checks the class of a to see if it has an attribute b that is a descriptor. If it does, it translates a.b into type(a).b.__get__(a, type(a)). If b is a function, it will have a __get__ method that implements the binding behavior I described above. Other kinds of descriptors can have different behaviors. For instance, the classmethod decorator replaces a method with a special descriptor that binds the function to the class, rather than the instance.
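A quick way to see that translation in action (a sketch of my own; the class A is just an example):

class A:
    def greet(self):
        return "hello from " + type(self).__name__

a = A()
print(a.greet())                           # bound call: a is passed as self automatically
bound = type(a).greet.__get__(a, type(a))  # roughly what the attribute lookup does
print(bound())                             # same result
print(A.greet(a))                          # through the class: self must be explicit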
Python's super creates special objects that handle attribute lookups differently than normal objects, but the details don't matter too much for this issue. The binding behavior of methods called through super is just like what I described in the first paragraph, so self gets passed automatically to the bound method when it is called. The only thing special about super is that it may bind a different function than you'd get by looking up the same method name on self (that's the whole point of using it).
The following example might elucidate things:
class Example:
    def method(self):
        pass
>>> print(Example.method)
<unbound method Example.method>
>>> print(Example().method)
<bound method Example.method of <__main__.Example instance at 0x01EDCDF0>>
When a method is bound, the instance is passed implicitly. When a method is unbound, the instance needs to be passed explicitly.
The other answers will definitely offer some more detail on the binding process, but I think it's worth showing the above snippet.
The answer is non-trivial and would probably warrant a good article. A very good explanation of how super() works is given by Raymond Hettinger in his PyCon 2015 talk and a related article.
I will attempt a short answer and if it is not sufficient I (and hopefully the community) will expand on it.
The answer has two key pieces:
Python's super() needs an object on which to call the method being overridden, so the instance is passed explicitly as self. This is not the only possible implementation, and in fact, in Python 3, super() can be called with no arguments inside a method and the class and instance are filled in for you.
Python's super() is not like super in Java or other compiled languages. Python's implementation is designed to support the cooperative multiple inheritance paradigm, as explained in Hettinger's talk.
This has an interesting consequence in Python: the method resolution in super() depends not only on the parent class, but on the child classes as well (a consequence of multiple inheritance). Note that Hettinger is using Python 3.
The official Python 2.7 documentation on super is also a good source of information (better understood after watching the talk, in my opinion).
Because in SuperClass.__init__(self), you're calling the method on the class, not the instance, so self cannot be passed implicitly. Similarly, you cannot just call SubClass.subMethod(), but you can call SubClass.subMethod(k) and it'll be equivalent to k.subMethod(). Similarly, if self refers to a SubClass instance, then self.__init__() means SubClass.__init__(self), so if you want to call SuperClass.__init__ you have to call it directly.
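A short sketch using the question's class names makes the equivalence concrete (the initialized attribute is just for illustration):

class SuperClass:
    def __init__(self):
        self.initialized = True

class SubClass(SuperClass):
    def __init__(self):
        SuperClass.__init__(self)  # called on the class, so self is passed explicitly

    def subMethod(self):
        return "subMethod result"

k = SubClass()
print(k.subMethod())          # bound call: self passed implicitly
print(SubClass.subMethod(k))  # same call with self passed explicitly
print(k.initialized)          # True, because SuperClass.__init__ ran on k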
I've encountered a situation where subclassing unicode results in Deprecation Warnings on Python prior to 3.3 and errors on Python 3.3:
# prove that unicode.__init__ accepts parameters
s = unicode('foo')
s.__init__('foo')
unicode.__init__(s, 'foo')

class unicode2(unicode):
    def __init__(self, other):
        super(unicode2, self).__init__(other)

s = unicode2('foo')

class unicode3(unicode):
    def __init__(self, other):
        unicode.__init__(self, other)

s = unicode3('foo')
Curiously, the warnings/errors don't occur in the first three lines, but instead occur on lines 8 and 14. Here's the output on Python 2.7.
> python -Wd .\init.py
.\init.py:8: DeprecationWarning: object.__init__() takes no parameters
super(unicode2, self).__init__(other)
.\init.py:14: DeprecationWarning: object.__init__() takes no parameters
unicode.__init__(self, other)
The code is simplified to exemplify the issue. In a real-world application, I would perform more than simply calling the super __init__.
It appears from the first three lines that the unicode class implements __init__ and that method accepts at least a single parameter. However, if I want to call that method from a subclass, I appear to be unable to do so, whether I invoke super() or not.
Why is it okay to call unicode.__init__ on a unicode instance but not on a unicode subclass? What is an author to do if subclassing the unicode class?
I suspect the issue comes from the fact that unicode is immutable.
After a unicode instance is created, it cannot be modified. So, any initialization logic is going to be in the __new__ method (which is called to do the instance creation), rather than __init__ (which is called only after the instance exists).
A subclass of an immutable type doesn't have the same strict requirements, so you can do things in unicode2.__init__ if you want, but calling unicode.__init__ is unnecessary (and probably won't do what you think it would do anyway).
A better solution is probably to do your customized logic in your own __new__ method:
class unicode2(unicode):
    def __new__(cls, value):
        # optionally do stuff to value here
        self = super(unicode2, cls).__new__(cls, value)
        # optionally do stuff to self here
        return self
You can make your class immutable too, if you want, by giving it a __setattr__ method that always raises an exception (you might also want to give the class an empty __slots__ declaration to save memory by omitting the per-instance __dict__).
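A sketch of that suggestion (Python 2, to match the question; the error message wording is my own):

class unicode2(unicode):
    __slots__ = ()  # no per-instance __dict__ or __weakref__

    def __new__(cls, value):
        # optionally do stuff to value here
        return super(unicode2, cls).__new__(cls, value)

    def __setattr__(self, name, value):
        raise TypeError("%s instances are immutable" % type(self).__name__)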