Python: how to extend `str` and overload its constructor? [duplicate]

This question already has answers here:
inheritance from str or int
(6 answers)
Closed 7 years ago.
I have a sequence of characters, a string if you will, but I want to store metadata about the origin of the string. Additionally, I want to provide a simplified constructor.
I've tried extending the str class in as many ways as Google would resolve for me. I gave up when I came to this:
class WcStr(str):
    """wc value and string flags"""
    FLAG_NIBBLES = 8  # Four Bytes
    def __init__(self, value, flags):
        super(WcStr, self).__init__()
        self.value = value
        self.flags = flags
    @classmethod
    def new_nibbles(cls, nibbles, flag_nibbles=None):
        if flag_nibbles is None:
            flag_nibbles = cls.FLAG_NIBBLES
        return cls(
            nibbles[flag_nibbles+1:],
            nibbles[:flag_nibbles]
        )
When I comment out both arguments to the cls() call in the @classmethod, it gives me this error:
TypeError: __init__() takes exactly 3 arguments (1 given)
A pretty typical wrong-number-of-arguments error.
With the two extra arguments (e.g. as shown in the example code):
TypeError: str() takes at most 1 argument (2 given)
I've tried changing the __init__'s args and the super().__init__'s args; neither seems to make any difference.
With only one argument passed to cls(...) call, as the str class's error asks, I get this:
TypeError: __init__() takes exactly 3 arguments (2 given)
So I can't win here. What's gone wrong?
P.S. This should probably be a separate post, but what property does str's raw string value get put into? I'd like to overload as little of the str class as I can to add this metadata in the constructor.

This is exactly what the __new__ method is for.
In Python, creating an object actually has two steps. In pseudocode:
value = the_class.__new__(the_class, *args, **kwargs)
if isinstance(value, the_class):
    value.__init__(*args, **kwargs)
The two steps are called construction and initialization. Most types don't need anything fancy in construction, so they can just use the default __new__ and define an __init__ method—which is why tutorials, etc. only mention __init__.
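Spelling the two steps out by hand for an ordinary class can make the pseudocode concrete. This is a minimal sketch; Point is an illustrative class, not something from the question, and the manual calls only approximate what Python does internally when you write Point(1, 2).

```python
class Point:
    """Illustrative class (not from the question) with an ordinary __init__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


args = (1, 2)

# Construction: Point doesn't override __new__, so this is object.__new__,
# which tolerates the extra arguments because Point overrides __init__.
p = Point.__new__(Point, *args)

# Initialization: only runs if __new__ handed back an instance of the class.
if isinstance(p, Point):
    p.__init__(*args)

print(p.x, p.y)  # 1 2
```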
But str objects are immutable, so the initializer can't do the usual job of setting up the object's value: by the time __init__ runs, __new__ has already fixed the string's contents, and they can't be changed afterwards.
So, if you want to change what the str actually holds, you have to override its __new__ method, and call the super __new__ with your modified arguments.
In this case, you don't actually want to do that… but you do want to make sure str.__new__ doesn't see your extra arguments, so you still need to override it, just to hide those arguments from it.
Meanwhile, you ask:
what property does str's raw string value get put into?
It doesn't. What would be the point? Its value is a string, so you'd have a str which had an attribute which was the same str which had an attribute which etc. ad infinitum.
Under the covers, of course, it has to be storing something. But that's under the covers. In particular, in CPython, the str class is implemented in C, and it contains, among other things, a C char * array of the actual bytes used to represent the string. You can't access that directly.
But, as a subclass of str, if you want to know your value as a string, that's just self. That's the whole point of being a subclass, after all.
So:
class WcStr(str):
    """wc value and string flags"""
    FLAG_NIBBLES = 8  # Four Bytes
    def __new__(cls, value, *args, **kwargs):
        # explicitly only pass value to the str constructor
        return super(WcStr, cls).__new__(cls, value)
    def __init__(self, value, flags):
        # ... and don't even call the str initializer
        self.flags = flags
Of course you don't really need __init__ here; you could do your initialization along with your construction in __new__. But if you don't intend for flags to be an immutable, only-set-during-construction kind of value, it makes more conceptual sense to do it in the initializer, just like any normal class.
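A quick way to check this class is to build one and poke at it. The sketch below reuses the class above, switched to Python 3's zero-argument super() (an assumption about your Python version); the sample value and flags are arbitrary. It shows that the string value is simply self, while flags is an ordinary attribute.

```python
class WcStr(str):
    """wc value and string flags"""
    FLAG_NIBBLES = 8  # Four Bytes

    def __new__(cls, value, *args, **kwargs):
        # Only the value reaches str.__new__; flags are hidden from it.
        return super().__new__(cls, value)

    def __init__(self, value, flags):
        # str has no initializer of its own, so just attach the metadata.
        self.flags = flags


s = WcStr("deadbeef", flags="0000000f")
print(s)            # deadbeef  (the string value is self)
print(s.flags)      # 0000000f
print(s.upper())    # DEADBEEF  (inherited methods work, but return plain str)
```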
Meanwhile:
I'd like to overload as little of the str class as I can
That may not do what you want. For example, str.__add__ and str.__getitem__ are going to return a str, not an instance of your subclass. If that's good, then you're done. If not, you will have to overload all of those methods and change them to wrap up the return value with the appropriate metadata. (You can do this programmatically, either by generating wrappers at class definition time, or by using a __getattribute__ method that generates wrappers on the fly. A plain __getattr__ won't help on a str subclass, because it only runs when normal lookup fails, and operators such as + and [] bypass instance attribute lookup entirely.)
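Wrapping by hand looks roughly like the sketch below. FlaggedStr and its _wrap helper are hypothetical names, and only slicing/indexing and + are covered; every other method (upper, replace, and so on) still returns a plain str until you wrap it too.

```python
class FlaggedStr(str):
    """Hypothetical str subclass that keeps `flags` through [] and +."""

    def __new__(cls, value, flags=None):
        return super().__new__(cls, value)

    def __init__(self, value, flags=None):
        self.flags = flags

    def _wrap(self, result):
        # Re-wrap a plain str result so it carries this instance's metadata.
        return type(self)(result, self.flags)

    def __getitem__(self, key):
        return self._wrap(super().__getitem__(key))

    def __add__(self, other):
        result = super().__add__(other)
        if result is NotImplemented:  # e.g. other is not a str
            return result
        return self._wrap(result)


s = FlaggedStr("abcdef", flags="x")
print(type(str.__getitem__(s, slice(0, 3))).__name__)  # str: the built-in drops the subclass
print(type(s[:3]).__name__, s[:3].flags)               # FlaggedStr x
print(type(s + "gh").__name__, (s + "gh").flags)       # FlaggedStr x
```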
One last thing to consider: the str constructor doesn't take exactly one argument.
It can take 0:
str() == ''
And, while this isn't relevant in Python 2, in Python 3 it can take 2:
str(b'abc', 'utf-8') == 'abc'
Plus, even when it takes 1 argument, it obviously doesn't have to be a string:
str(123) == '123'
So… are you sure this is the interface you want? Maybe you'd be better off creating an object that owns a string (in self.value), and just using it explicitly. Or even using it implicitly, duck-typing as a str by just delegating most or all of the str methods to self.value?
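The "owns a string" alternative might look like this minimal sketch. WcValue is a hypothetical name. Ordinary method calls are forwarded with __getattr__, but implicit special-method lookups (str(), len(), ==, +) skip __getattr__, so each one you need has to be defined explicitly; here + is deliberately left out.

```python
class WcValue:
    """Hypothetical wrapper that owns a string instead of being one."""

    def __init__(self, value, flags):
        self.value = value
        self.flags = flags

    def __getattr__(self, name):
        # Only called when normal lookup fails, so .value and .flags are
        # unaffected; upper, split, startswith, ... go to the wrapped str.
        if name == "value":  # avoid infinite recursion before __init__ runs
            raise AttributeError(name)
        return getattr(self.value, name)

    # Special methods are looked up on the type, not via __getattr__.
    def __str__(self):
        return self.value

    def __len__(self):
        return len(self.value)

    def __eq__(self, other):
        if isinstance(other, WcValue):
            return self.value == other.value
        return self.value == other

    def __hash__(self):
        return hash(self.value)


w = WcValue("hello", flags=1)
print(w.upper(), len(w), str(w), w.flags)  # HELLO 5 hello 1
```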

Instead of __init__, try __new__:
def __new__(cls, value, flags):
    obj = str.__new__(cls, value)
    obj.flags = flags
    return obj
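Dropped into a class, the method above can be checked like this (the value "abc" and flags 3 are arbitrary). No __init__ is needed: str inherits object.__init__, which tolerates the extra arguments when a class overrides __new__ but not __init__.

```python
class WcStr(str):
    def __new__(cls, value, flags):
        obj = str.__new__(cls, value)  # only the value goes to str
        obj.flags = flags              # metadata set during construction
        return obj


s = WcStr("abc", 3)
print(s, s.flags)  # abc 3
```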

Related

Issues extending python in-built types with additional attributes and methods

I'm currently creating a container for data that are of built-in python types. The data provided is of an unknown built-in type.
Built-in types can't have attributes and methods added or modified, so I'm subclassing them instead.
However, I have issues with instantiating the subclass.
NOTE: The following code would be inside of another class
With this code:
def createContainer(data):
    dataType = type(data)
    class container(dataType):
        def __init__(self, data):
            super().__init__(data)
            ## Instantiate the container attributes here
    return container(data)
Of the types I tested, data of these types were properly created:
List
Dictionary
Set
Bytearray
However, this left a lot of types that wouldn't work; they raise this error:
TypeError: object.__init__() takes exactly one argument (the instance to initialize)
Question 1 - Why does this error occur when I am only passing one argument, regardless of the data type?
When attempting this route:
class container(dataType):
    def __init__(self, data):
        super().__init__()
        ## Instantiate the container attributes here
Of the types tested, data of these types were properly stored:
String
Integer
Float
Tuple
Complex
Byte
Frozenset
The other types (list, dictionary, set, bytearray) were created, but were empty, and did not contain the data I passed.
Question 2a - I am not passing any argument to the super constructor, so why are the above types being initialized properly, when no data is being passed?
Question 2b - When I replace super().__init__() with dataType.__init__(data) I get the same results. I'm assuming they do the same thing in this instance, but how is that possible if we didn't pass the argument to super().__init__()?
Question 3 - From what I can see, method 1 only works with mutable types, whereas method 2 only works with immutable types. But how does this affect the way these types are instantiated?
Final Question - I know I could write conditionals in the constructor using isinstance() to check the type and create the container and initialize in the method that works. But is there a 'one size fits all' approach?
The constructors in both classes are redundant. It sounds like you are aiming to create a data structure that works both for mutable and immutable data types. I doubt it will ever work.
Note, though, that type(...) is not the culprit here. Applied to an instance, it returns that instance's class, just like __class__ does; it only returns <class 'type'> when you apply it to a class:
# example
>>> type(str)  # str is itself a class, and the type of a class is type
<class 'type'>
>>> type("my data is str")  # an instance: its type is its class
<class 'str'>
So dataType = type(data) and dataType = data.__class__ give you the same base class here:
# second example
>>> "my data is str".__class__
<class 'str'>
EDIT:
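When you call super().__init__(data) and get the error TypeError: object.__init__() takes exactly one argument (the instance to initialize), it's because immutable built-ins such as str, int, float, complex, tuple, bytes and frozenset don't define an __init__ of their own: their value is fixed in __new__, before __init__ ever runs. So super().__init__ resolves to object.__init__, which, because your class overrides __init__, accepts only self and rejects the extra argument. Mutable built-ins such as list, dict, set and bytearray do define __init__, and that is where they fill in their contents. That is also why your second version leaves them empty: their __new__ ignores the data, and their __init__ is then called with no arguments. The immutable types still come out right in that version because Python passes the arguments of container(data) to __new__ as well as to __init__.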
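If you do want a single code path, one possible sketch is to forward the data to the base __init__ only when the built-in actually defines one. create_container and the origin attribute are illustrative names, not from the question. The check compares against object.__init__, which holds in CPython but is an observation rather than a documented guarantee, and bool and NoneType still can't be used because they can't be subclassed at all.

```python
def create_container(data):
    """Hypothetical single-path variant of the question's createContainer."""
    data_type = type(data)  # same as data.__class__ for ordinary objects

    class Container(data_type):
        def __init__(self, value):
            # Mutable built-ins (list, dict, set, bytearray) fill in their
            # contents in __init__. Immutable ones (str, int, tuple, ...)
            # already got their value in __new__ and inherit object.__init__,
            # which would reject the extra argument, so skip the call.
            if data_type.__init__ is not object.__init__:
                super().__init__(value)
            self.origin = data_type  # example metadata attribute

    return Container(data)


print(create_container("abc"), create_container([1, 2]))  # abc [1, 2]
```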

Porting Subclass of Unicode to Python 3

I'm porting a legacy codebase from Python 2.7 to Python 3.6. In that codebase I have a number of instances of things like:
class EntityName(unicode):
    @staticmethod
    def __new__(cls, s):
        clean = cls.strip_junk(s)
        return super(EntityName, cls).__new__(cls, clean)
    def __init__(self, s):
        self._clean = s
        self._normalized = normalized_name(self._clean)
        self._simplified = simplified_name(self._clean)
        self._is_all_caps = None
        self._is_all_lower = None
        super(EntityName, self).__init__(self._clean)
It might be called like this:
EntityName("Guy DeFalt")
When porting this to Python 3, the above code fails because unicode is no longer a class you can extend (at least, if there is an equivalent class I cannot find it). Given that str is unicode now, I tried to just swap str in, but the parent init doesn't take the string value I'm trying to pass:
TypeError: object.__init__() takes no parameters
This makes sense, because str does not define its own __init__ method; calling it this way does not seem to be idiomatic. So my question has two major branches:
Is there a better way to be porting classes that sub-classed the old unicode class?
If subclassing str is appropriate, how should I modify the __init__ function for idiomatic behavior?
The right way to subclass a string or another immutable class in Python 3 is same as in Python 2:
class MyString(str):
    def __new__(cls, initial_arguments):  # no staticmethod
        desired_string_value = get_desired_string_value(initial_arguments)
        return super(MyString, cls).__new__(cls, desired_string_value)
        # can be shortened to super().__new__(...)
    def __init__(self, initial_arguments):  # arguments are unused
        self.whatever = whatever(self)
        # no need to call super().__init__(), but if you must, do not pass arguments
There are several issues with your sample. First, why is __new__ decorated with @staticmethod? __new__ is implicitly a static method (Python special-cases it), so you don't need to declare it as one. Second, the code seems to assume that calling __new__ of the superclass somehow calls your __init__ as well, with the cleaned value; I'm deriving this from how self._clean is supposed to be set. This is not the case: __init__ receives the original, uncleaned s. When you call MyString(arguments), the following happens:
First, Python calls __new__ with the class parameter (usually called cls) and the arguments. __new__ should return an instance of the class. It can create one, as we do, or do something else; e.g. it may return an existing instance or, in fact, anything.
Then Python calls __init__ with the instance it received from __new__ (this parameter is usually called self) and the same arguments.
(There's a special case: Python won't call __init__ if __new__ returned something that is not an instance of the passed class or one of its subclasses.)
Python uses class hierarchy to see which __new__ and __init__ to call. It's up to you to correctly sort out the arguments and use proper superclass calls in these two methods.
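Applied to the question's class, a Python 3 port might look like the sketch below. strip_junk, normalized_name and simplified_name are not shown in the question, so the versions here are hypothetical stand-ins; the point is that __init__ reads the cleaned text from self rather than from its s argument, and never calls str's initializer.

```python
import re


def normalized_name(s):
    # Hypothetical stand-in for the codebase's helper: case-fold the name.
    return s.casefold()


def simplified_name(s):
    # Hypothetical stand-in: keep only case-folded letters and digits.
    return re.sub(r"[^0-9a-z]", "", s.casefold())


class EntityName(str):
    @staticmethod
    def strip_junk(s):
        # Hypothetical stand-in: collapse runs of whitespace, trim the ends.
        return re.sub(r"\s+", " ", s).strip()

    def __new__(cls, s):  # already a static method; no decorator needed
        return super().__new__(cls, cls.strip_junk(s))

    def __init__(self, s):
        # `s` is the raw argument again; the cleaned text is `self` itself.
        self._clean = str(self)
        self._normalized = normalized_name(self._clean)
        self._simplified = simplified_name(self._clean)
        self._is_all_caps = None
        self._is_all_lower = None
        # No super().__init__(...): str inherits object.__init__, which
        # would reject the argument.


name = EntityName("  Guy   DeFalt ")
print(repr(name), name._simplified)  # 'Guy DeFalt' guydefalt
```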

How to (or why not) call unicode.__init__ from subclass

I've encountered a situation where subclassing unicode results in Deprecation Warnings on Python prior to 3.3 and errors on Python 3.3:
# prove that unicode.__init__ accepts parameters
s = unicode('foo')
s.__init__('foo')
unicode.__init__(s, 'foo')

class unicode2(unicode):
    def __init__(self, other):
        super(unicode2, self).__init__(other)

s = unicode2('foo')

class unicode3(unicode):
    def __init__(self, other):
        unicode.__init__(self, other)

s = unicode3('foo')
Curiously, the warnings/errors don't occur in the first three lines, but instead occur on lines 8 and 14. Here's the output on Python 2.7.
> python -Wd .\init.py
.\init.py:8: DeprecationWarning: object.__init__() takes no parameters
super(unicode2, self).__init__(other)
.\init.py:14: DeprecationWarning: object.__init__() takes no parameters
unicode.__init__(self, other)
The code is simplified to exemplify the issue. In a real-world application, I would perform more than simply calling the super __init__.
It appears from the first three lines that the unicode class implements __init__ and that method accepts at least a single parameter. However, if I want to call that method from a subclass, I appear to be unable to do so, whether I invoke super() or not.
Why is it okay to call unicode.__init__ on a unicode instance but not on a unicode subclass? What is an author to do if subclassing the unicode class?
I suspect the issue comes from the fact that unicode is immutable.
After a unicode instance is created, it cannot be modified. So, any initialization logic is going to be in the __new__ method (which is called to do the instance creation), rather than __init__ (which is called only after the instance exists).
A subclass of an immutable type doesn't have the same strict requirements, so you can do things in unicode2.__init__ if you want, but calling unicode.__init__ is unnecessary (and probably won't do what you think it would do anyway).
A better solution is probably to do your customized logic in your own __new__ method:
class unicode2(unicode):
    def __new__(cls, value):
        # optionally do stuff to value here
        self = super(unicode2, cls).__new__(cls, value)
        # optionally do stuff to self here
        return self
You can make your class immutable too, if you want, by giving it a __setattr__ method that always raises an exception (you might also want to give the class a __slots__ property to save memory by omitting the per-instance __dict__).
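A minimal sketch of the __setattr__ idea in Python 3, where str plays the role of unicode. FrozenTagged and its tag attribute are hypothetical names, and __slots__ is left out to keep the example short. Metadata is written once in __new__ through object.__setattr__, which bypasses the class's own refusing __setattr__.

```python
class FrozenTagged(str):
    """Hypothetical str subclass whose metadata can't change after creation."""

    def __new__(cls, value, tag):
        self = super().__new__(cls, value)
        # Our own __setattr__ refuses all writes, so bypass it once here.
        object.__setattr__(self, "tag", tag)
        return self

    def __setattr__(self, name, value):
        raise AttributeError(f"{type(self).__name__} objects are immutable")

    def __delattr__(self, name):
        raise AttributeError(f"{type(self).__name__} objects are immutable")


t = FrozenTagged("abc", "from-file")
print(t, t.tag)  # abc from-file
```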

Python: Why would float.__new__(cls) work when cls isn't float?

I'm a little confused by the following example from the python documentation here.
>>> class inch(float):
...     "Convert from inch to meter"
...     def __new__(cls, arg=0.0):
...         return float.__new__(cls, arg*0.0254)
...
>>> print inch(12)
0.3048
>>>
Presumably, float here is the actual float class defined somewhere deep inside Python. When we call float.__new__(cls, argument), we're sneakily calling the function that returns instances of float for a given argument, but we're passing it the inch class instead of the float class. Since the inch class doesn't really do anything, why does this work?
Because inch is a subclass of float, it satisfies all the requirements that the float.__new__() instance factory has. It is the job of the __new__(cls) static method to create instances of the first argument, not of its 'own' class.
Note the words 'static method' there. The __new__ factory is really just a specialist function tied to a class only for inheritance reasons. In other words, it is a function that plays well in an object-oriented hierarchy. You are supposed to find it via super() or perhaps call it directly (as done here). The following would actually be a little more pythonic:
def __new__(cls, arg=0.0):
    return super(inch, cls).__new__(cls, arg*0.0254)
because that would call the 'correct' __new__ function if inch were to be used in a multiple-inheritance hierarchy; in this simple example it'll end up calling float.__new__ just the same.
So, __new__(cls, ...) is expected to create an instance of type cls. Why then tie it to a class at all and not make it a more generic function? Because in the case of float.__new__(cls, value) it not only creates a new instance of type cls, it also sets its initial value to value. And in order for that to work, float.__new__(...) needs to have intimate knowledge of what the float class looks like. Because inch() is a subclass of float(), it has the exact same necessary bits to be a float() too, and thus when the float.__new__() factory creates a new inch instance, all those bits are there to make it an inch() instance instead.
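The same example in Python 3 syntax, using the super()-based __new__ suggested above, makes both points visible: float.__new__ builds an instance of whatever class it is handed, and calling it directly skips inch.__new__, so no conversion happens.

```python
class inch(float):
    """Convert from inch to meter"""

    def __new__(cls, arg=0.0):
        # super() finds the next __new__ in the MRO (here float.__new__).
        return super().__new__(cls, arg * 0.0254)


m = inch(12)
print(type(m).__name__, round(m, 4))  # inch 0.3048

# float.__new__ given the subclass still returns an inch, but bypasses
# inch.__new__, so the value is stored unconverted.
raw = float.__new__(inch, 12)
print(type(raw).__name__, raw)  # inch 12.0
```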
A little background is needed to answer this question:
In essence, only object.__new__() creates new instances; instances are one kind of object, and they cannot be subclassed.
An instance has a type, which is assigned by passing the class cls to __new__(cls) as the first argument. The class keyword creates another kind of object: classes (a.k.a. types), and these can be subclassed.
Now, going back to your example, what
float.__new__(cls, argument)
essentially does is use object.__new__(cls) to create a new instance (float.__base__ is object), assign the type cls (inch in this case) to it, and then do whatever float.__new__ defines with argument.
So it is not surprising that it works when cls is inch rather than float: the class/type was already created by class inch(float); you are just assigning that type to a new instance.

__new__ and __init__ in Python

I am learning Python and so far I can tell the things below about __new__ and __init__:
__new__ is for object creation
__init__ is for object initialization
__new__ is invoked before __init__: __new__ returns a new instance, and __init__ is invoked afterwards to initialize its inner state.
__new__ is good for immutable objects, as they cannot be changed once they are created. So we can return a new instance which has the new state.
For mutable objects we can use both __new__ and __init__, since their inner state can be changed.
But I have another questions now.
When I create a new instance such as a = MyClass("hello","world"), how are these arguments passed? I mean how I should structure the class using __init__ and __new__ as they are different and both accepts arbitrary arguments besides default first argument.
self keyword is in terms of name can be changed to something else? But I am wondering cls is in terms of name is subject to change to something else as it is just a parameter name?
I made a little experiments as such below:
>>> class MyClass(tuple):
...     def __new__(tuple):
...         return [1,2,3]
...
and I did below:
>>> a = MyClass()
>>> a
[1, 2, 3]
Albeit I said I want to return tuple, this code works fine and returned me [1,2,3]. I thought the first parameter was the type we wanted to receive once __new__ is invoked. We are talking about a "new" function, right? I don't know of other languages that return a type other than the bound type.
And I did another thing as well:
>>> issubclass(MyClass,list)
False
>>> issubclass(MyClass,tuple)
True
>>> isinstance(a,MyClass)
False
>>> isinstance(a,tuple)
False
>>> isinstance(a,list)
True
I didn't experiment further because it wasn't getting any clearer, so I decided to stop there and ask on StackOverflow.
The SO posts I read:
Python object creation
Python's use of __new__ and __init__?
how I should structure the class using __init__ and __new__ as they are different and both accepts arbitrary arguments besides default first argument.
Only rarely will you have to worry about __new__. Usually, you'll just define __init__ and let the default __new__ pass the constructor arguments to it.
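Both methods receive the full argument list from MyClass(...): __new__ gets the class in front of it, __init__ gets the new instance. The sketch below mirrors the question's call; OnlyInit is a hypothetical second class showing the usual case, where the default object.__new__ quietly ignores the arguments and leaves them for __init__.

```python
class MyClass:
    def __new__(cls, *args, **kwargs):
        # Receives the class plus exactly the arguments given to MyClass(...).
        print("__new__ got", args)
        return super().__new__(cls)  # object.__new__ wants only the class

    def __init__(self, first, second):
        # Receives the new instance plus the same arguments again.
        print("__init__ got", (first, second))
        self.first = first
        self.second = second


a = MyClass("hello", "world")
# __new__ got ('hello', 'world')
# __init__ got ('hello', 'world')


class OnlyInit:
    # No __new__ override: the default one ignores the arguments.
    def __init__(self, first, second):
        self.first = first
        self.second = second


b = OnlyInit("hello", "world")
```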
self keyword is in terms of name can be changed to something else? But I am wondering cls is in terms of name is subject to change to something else as it is just a parameter name?
Both are just parameter names with no special meaning in the language. But their use is a very strong convention in the Python community; most Pythonistas will never change the names self and cls in these contexts and will be confused when someone else does.
Note that your use of def __new__(tuple) re-binds the name tuple inside the constructor function. When actually implementing __new__, you'll want to do it as
def __new__(cls, *args, **kwargs):
    # do allocation to get an object, say, obj
    obj = super().__new__(cls)  # e.g. plain allocation via object.__new__
    return obj
Albeit I said I want to return tuple, this code works fine and returned me [1,2,3].
MyClass() will have the value that __new__ returns. There's no implicit type checking in Python; it's the responsibility of the programmer to return the correct type ("we're all consenting adults here"). Being able to return a different type than requested can be useful for implementing factories: you can return a subclass of the type requested.
This also explains the issubclass/isinstance behavior you observe: the subclass relationship follows from your use of class MyClass(tuple), the isinstance reflects that you return the "wrong" type from __new__.
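As an illustration of the factory use mentioned above, here is a minimal sketch in which calling the base class returns a more specific subclass. Shape, Triangle and Square are hypothetical names. Because the returned object is still an instance of Shape, Python goes on to call __init__ on it.

```python
class Shape:
    """Hypothetical factory: Shape(n_sides) may return a more specific subclass."""

    def __new__(cls, n_sides):
        target = cls
        if cls is Shape:
            # Pick a concrete subclass based on the argument.
            target = {3: Triangle, 4: Square}.get(n_sides, Shape)
        return super().__new__(target)

    def __init__(self, n_sides):
        self.n_sides = n_sides


class Triangle(Shape):
    pass


class Square(Shape):
    pass


print(type(Shape(3)).__name__, type(Shape(7)).__name__)  # Triangle Shape
```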
For reference, check out the requirements for __new__ in the Python Language Reference.
Edit: ok, here's an example of potentially useful use of __new__. The class Eel keeps track of how many eels are alive in the process and refuses to allocate if this exceeds some maximum.
class HovercraftFull(Exception):
    """Hypothetical exception; the original snippet leaves it undefined."""

class Eel(object):
    MAX_EELS = 20
    n_eels = 0

    def __new__(cls, *args, **kwargs):
        if cls.n_eels == cls.MAX_EELS:
            raise HovercraftFull()
        obj = super(Eel, cls).__new__(cls)
        cls.n_eels += 1
        return obj

    def __init__(self, voltage):
        self.voltage = voltage

    def __del__(self):
        type(self).n_eels -= 1

    def electric(self):
        """Is this an electric eel?"""
        return self.voltage > 0
Mind you, there are smarter ways to accomplish this behavior.
