Inheriting from "str" in Python 2.7 and Python 3 - python

I'm porting some older code from Python 2.7 to Python 3, and I'm trying to figure out how inheriting from str works in both versions. Here is some of the code.
class OtoString(str):
    def __init__(self, p_string):
        str.__init__(self, p_string)
    def is_url(self):
        if self.startswith("http://") or self.startswith("https://"):
            return True
        else:
            return False
print(OtoString("https://stackoverflow.com").is_url())
Running this in Python 2.7 works just fine, but when I run this code in Python 3.7 I get a TypeError:
TypeError: object.__init__() takes exactly one argument (the instance to initialize)
It would be helpful if someone could explain how exactly inheriting from str works, what this line does,
str.__init__(self, p_string)
why this doesn't work in Python 3 and how I could make it work.

str is an immutable type, and like all immutable types, should perform both construction and initialization in __new__, not __init__. The correct code (that should work on both Python 2 and Python 3) to replace __init__ would be:
def __new__(cls, p_string):
    return str.__new__(cls, p_string)
Note that it receives a class object, not an existing instance, and it returns the result of calling __new__ on the superclass (because __new__ actually makes the new object, it doesn't just initialize one handed to it like __init__ does).
In this particular case, you should just omit the definition of __init__/__new__ entirely (you'll inherit str's version automatically). But if you need to do additional work (e.g. compute some normalized version of p_string before final construction), the __new__ above is the correct pattern.
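As a minimal sketch of the "additional work" case, the class below normalizes its argument before construction. The name NormalizedName and the whitespace-collapsing rule are made up for illustration and are not part of the question; the zero-argument super() form assumes Python 3.

```python
class NormalizedName(str):
    """Hypothetical str subclass that collapses whitespace before construction."""

    def __new__(cls, raw):
        # Compute the final value first; it cannot be changed afterwards.
        cleaned = " ".join(str(raw).split())
        return super().__new__(cls, cleaned)


name = NormalizedName("  Guy   DeFalt ")
print(repr(name))            # 'Guy DeFalt'
print(type(name).__name__)   # NormalizedName
```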
Also, to avoid bloating the memory use of your class, I suggest adding:
__slots__ = ()
as the first line inside your class body; that will avoid making room for an unused __dict__ and __weakref__, keeping your behavior and overhead much closer to that of str (on my 64 bit Python 3.6, it reduces the per instance memory overhead, above the cost of the string data itself, from 217 bytes to 81 bytes). Final version would be just:
class OtoString(str):
    __slots__ = ()
    def is_url(self):
        return self.startswith(("http://", "https://"))
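To see what __slots__ = () changes, a small comparison can help. This is only a sketch: PlainSub and SlottedSub are illustrative names, and the sizes reported by sys.getsizeof vary by Python version and platform, so no expected numbers are shown.

```python
import sys


class PlainSub(str):
    pass


class SlottedSub(str):
    __slots__ = ()


plain = PlainSub("abc")
slotted = SlottedSub("abc")

print(hasattr(plain, "__dict__"))    # True
print(hasattr(slotted, "__dict__"))  # False

# Sizes are platform dependent; getsizeof does not count a separate __dict__.
print(sys.getsizeof("abc"), sys.getsizeof(plain), sys.getsizeof(slotted))

try:
    slotted.extra = 1  # no __dict__, so there is nowhere to store this
except AttributeError as exc:
    print("cannot add attributes:", exc)
```

The trade-off is that a slotted subclass cannot carry extra per-instance attributes, which is fine for a class like OtoString that only adds methods.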

__init__ is called after the object is constructed. str is immutable, so you cannot modify its value in the initializer. The construction must take place in __new__, which is implicitly a static method that receives the class as its first argument; that's why the first parameter is cls and not self:
class OtoString(str):
    def __new__(cls, *args, **kw):
        return str.__new__(cls, *args, **kw)
    def is_url(self):
        if self.startswith("http://") or self.startswith("https://"):
            return True
        else:
            return False
print(OtoString("https://stackoverflow.com").is_url())
Prints:
True

Try this:
class OtoString(str):
    def __init__(self, p_string):
        str.__init__(self)
        self.p_string = p_string
    def is_url(self):
        if self.p_string.startswith('https://') or self.p_string.startswith('http://'):
            return True
        return False
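It works on my computer.
This line
str.__init__(self, p_string)
should be str.__init__(self) in Python 3.x. OtoString actually inherits str's __init__: class OtoString(str) inherits all methods (including __init__) from str. Note that self already holds the string value, so storing p_string separately is not strictly necessary.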

Related

Porting Subclass of Unicode to Python 3

I'm porting a legacy codebase from Python 2.7 to Python 3.6. In that codebase I have a number of instances of things like:
class EntityName(unicode):
    @staticmethod
    def __new__(cls, s):
        clean = cls.strip_junk(s)
        return super(EntityName, cls).__new__(cls, clean)
    def __init__(self, s):
        self._clean = s
        self._normalized = normalized_name(self._clean)
        self._simplified = simplified_name(self._clean)
        self._is_all_caps = None
        self._is_all_lower = None
        super(EntityName, self).__init__(self._clean)
It might be called like this:
EntityName("Guy DeFalt")
When porting this to Python 3, the above code fails because unicode is no longer a class you can extend (at least, if there is an equivalent class, I cannot find it). Given that str is Unicode now, I tried to just swap str in, but the parent __init__ doesn't take the string value I'm trying to pass:
TypeError: object.__init__() takes no parameters
This makes sense because str does not have an __init__ method - this does not seem to be an idiomatic way of using this class. So my question has two major branches:
Is there a better way to be porting classes that sub-classed the old unicode class?
If subclassing str is appropriate, how should I modify the __init__ function for idiomatic behavior?
The right way to subclass a string or another immutable class in Python 3 is the same as in Python 2:
class MyString(str):
    def __new__(cls, initial_arguments):  # no staticmethod
        desired_string_value = get_desired_string_value(initial_arguments)
        return super(MyString, cls).__new__(cls, desired_string_value)
        # can be shortened to super().__new__(...)
    def __init__(self, initial_arguments):  # arguments are unused
        self.whatever = whatever(self)
        # no need to call super().__init__(), but if you must, do not pass arguments
There are several issues with your sample. First, why is __new__ decorated with @staticmethod? Python already treats __new__ as a static method that takes the class as its first argument, so you don't need to specify this. Second, the code seems to operate under the assumption that when you call __new__ of the superclass, it somehow calls your __init__ as well. I'm deriving this from looking at how self._clean is supposed to be set. This is not the case. When you call MyString(arguments), the following happens:
First Python calls __new__ with the class parameter (usually called cls) and arguments. __new__ must return the class instance. To do this it can create it, as we do, or do something else; e.g. it may return an existing one or, in fact, anything.
Then Python calls __init__ with the instance it received from __new__ (this parameter is usually called self) and the same arguments.
(There's a special case: Python won't call __init__ if __new__ returned something that is not an instance of the passed class or of one of its subclasses.)
Python uses class hierarchy to see which __new__ and __init__ to call. It's up to you to correctly sort out the arguments and use proper superclass calls in these two methods.
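The sequence described above can be observed directly. The following is a sketch for Python 3 with made-up class names (Traced, NotAnInstance) that records each call in a list, including the special case where __init__ is skipped.

```python
calls = []


class Traced(str):
    def __new__(cls, value, tag):
        calls.append(("__new__", cls.__name__, value, tag))
        # Only the string value goes to str.__new__.
        return super().__new__(cls, value)

    def __init__(self, value, tag):
        # Receives the instance from __new__ plus the same arguments.
        calls.append(("__init__", type(self).__name__, value, tag))
        self.tag = tag


class NotAnInstance(str):
    def __new__(cls, value):
        # Returns a plain str, which is not an instance of NotAnInstance.
        return value.upper()

    def __init__(self, value):
        calls.append(("__init__ of NotAnInstance",))  # never reached


t = Traced("abc", "meta")
u = NotAnInstance("abc")

for entry in calls:
    print(entry)
print(type(u).__name__, u)  # str ABC
```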

Specify value of base class [duplicate]

Why do I have a problem creating a class that inherits from str (or from int)?
class C(str):
    def __init__(self, a, b):
        str.__init__(self,a)
        self.b = b
C("a", "B")
TypeError: str() takes at most 1 argument (2 given)
The same happens if I try to use int instead of str, but it works with custom classes. Do I need to use __new__ instead of __init__? Why?
>>> class C(str):
...     def __new__(cls, *args, **kw):
...         return str.__new__(cls, *args, **kw)
...
>>> c = C("hello world")
>>> type(c)
<class '__main__.C'>
>>> c.__class__.__mro__
(<class '__main__.C'>, <type 'str'>, <type 'basestring'>, <type 'object'>)
Since __init__ is called after the object is constructed, it is too late to modify the value for immutable types. Note that __new__ receives the class rather than an instance, so I have called the first parameter cls.
See here for more information
>>> class C(str):
...     def __new__(cls, value, meta):
...         obj = str.__new__(cls, value)
...         obj.meta = meta
...         return obj
...
>>> c = C("hello world", "meta")
>>> c
'hello world'
>>> c.meta
'meta'
When you instantiate a class, the arguments that you pass in are passed first to the __new__ (constructor) and then to the __init__ (initializer) method of the class. So if you inherit from a class that restricts the number of arguments that may be supplied during instantiation, you must guarantee that neither its __new__ nor its __init__ gets more arguments than it expects. That is the problem you have. You instantiate your class with C("a", "B"). The interpreter looks for a __new__ method in C. C doesn't have one, so Python looks in its base class str. As str has one, that one is used and is supplied with both arguments. But str.__new__ expects only one argument (besides the class object as its first argument), so a TypeError is raised. That is why you must override __new__ in your child class, similarly to what you do with __init__. But bear in mind that it must return a class instance and that it is a static method (irrespective of whether it is defined with the @staticmethod decorator or not), which matters if you use the super function: you have to pass the class explicitly.
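A minimal sketch of that fix for the C("a", "B") example, assuming Python 3 (for the zero-argument super()): __new__ forwards only the string value, while __init__ accepts the same two arguments and stores the extra one.

```python
class C(str):
    def __new__(cls, a, b):
        # Forward only the string value; str.__new__ must not see b.
        return super().__new__(cls, a)

    def __init__(self, a, b):
        # Gets the same arguments as __new__; no call to str.__init__ is needed.
        self.b = b


c = C("a", "B")
print(c, c.b)  # a B
```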
Inheriting built-in types is very seldom worth while. You have to deal with several issues and you don't really get much benefit.
It is almost always better to use composition. Instead of inheriting str, you would keep a str object as an attribute.
class EnhancedString(object):
    def __init__(self, *args, **kwargs):
        self.s = str(*args, **kwargs)
You can delegate any methods you want to the underlying str self.s, either manually or automatically using __getattr__.
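One way the automatic delegation might look is sketched below. The is_url method and the guard on the attribute name "s" are illustrative additions, not part of the answer above.

```python
class EnhancedString(object):
    def __init__(self, *args, **kwargs):
        self.s = str(*args, **kwargs)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so self.s and is_url win.
        if name == "s":
            # Avoid infinite recursion if self.s was never set.
            raise AttributeError(name)
        return getattr(self.s, name)

    def __str__(self):
        return self.s

    def is_url(self):
        return self.s.startswith(("http://", "https://"))


e = EnhancedString("https://example.com")
print(e.is_url())  # True
print(e.upper())   # HTTPS://EXAMPLE.COM
```

Note that __getattr__ does not cover special methods looked up on the type, such as len(e), e[0] or e == "...", so those would need explicit definitions if you want them.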
That being said, needing your own string type is something that should give you pause. There are many classes that should store a string as their main data, but you generally want to use str or unicode (the latter if you're representing text) for general representation of strings. (One common exception is if you have need to use a UI toolkit's string type.) If you want to add functionality to your strings, try if you can to use functions that operate on strings rather than new objects to serve as strings, which keeps your code simpler and more compatible with everyone else's programs.
After carefully reading this, here is another attempt at subclassing str. The change from other answers is creating the instance in the correct class using super(TitleText, cls).__new__ . This one seems to behave like a str whenever it's used, but has allowed me to override a method:
class TitleText(str):
    title_text=""
    def __new__(cls,content,title_text):
        o=super(TitleText, cls).__new__(cls,content)
        o.title_text = title_text
        return o
    def title(self):
        return self.title_text
>>> a=TitleText('name','A nice name')
>>> a
'name'
>>> a[0]
'n'
>>> a[0:2]
'na'
>>> a.title()
'A nice name'
This lets you do slicing and subscripting correctly. What's this for? For renaming the Django application in the admin index page.
Use __new__ in case of immutable types:
class C(str):
    def __new__(cls, content, b):
        return str.__new__(cls, content)
    def __str__(self):
        return str.__str__(self)
a=C("hello", "world")
print(a)
This prints hello.
Python strings are immutable. __new__ is called to create a new instance of C. The __new__ method exists mainly to allow inheritance from immutable types.
The question was already answered above, this is just a tangential observation that may be useful to somebody.
I hit this question when trying to figure out a way to remove a temporary file after the dictionary that refers to it is deleted.
The context is a Flask session: the user can upload some files but give up before committing the whole workflow they have to go through to get their data into the final destination. Until then, I keep the files in a temporary directory. If the user gives up and closes the browser window, I don't want those files lingering around.
Since I keep the temporary path in a Flask session, which is just a dictionary that is eventually deleted (e.g., on timeout), I can customize a str class to hold the temporary directory path and have its __del__ method handle the temporary directory deletion.
Here it goes:
class Tempdir(str):
    def __new__(cls, *args, **kwargs):
        from tempfile import mkdtemp
        _dir = mkdtemp()
        return super().__new__(cls, _dir)
    def __del__(self):
        from shutil import rmtree
        rmtree(str(self))
Instantiate it in your python interpreter/app:
> d = Tempdir()
> d
'/var/folders/b1/frq3gywj3ljfqrf1yc7zk06r0000gn/T/tmptwa_g5fw'
>
> import os
> os.path.exists(d)
True
When you exit the interpreter:
$ ls /var/folders/b1/frq3gywj3ljfqrf1yc7zk06r0000gn/T/tmptwa_g5fw
ls: /var/folders/b1/frq3gywj3ljfqrf1yc7zk06r0000gn/T/tmptwa_g5fw: No such file or directory
There you go.

Python how to extend `str` and overload its constructor? [duplicate]

This question already has answers here:
inheritance from str or int
(6 answers)
Closed 7 years ago.
I have a sequence of characters, a string if you will, but I want to store metadata about the origin of the string. Additionally, I want to provide a simplified constructor.
I've tried extending the str class in as many ways as Google would resolve for me. I gave up when I came to this:
class WcStr(str):
    """wc value and string flags"""
    FLAG_NIBBLES = 8 # Four Bytes
    def __init__(self, value, flags):
        super(WcStr, self).__init__()
        self.value = value
        self.flags = flags
    @classmethod
    def new_nibbles(cls, nibbles, flag_nibbles=None):
        if flag_nibbles is None:
            flag_nibbles = cls.FLAG_NIBBLES
        return cls(
            nibbles[flag_nibbles+1:],
            nibbles[:flag_nibbles]
        )
When I comment out both arguments to the @classmethod's cls() call, it gives me this error:
TypeError: __init__() takes exactly 3 arguments (1 given)
A pretty typical wrong-number-of-arguments error.
With two arguments (e.g. as shown in the example code):
TypeError: str() takes at most 1 argument (2 given)
I've tried changing __init__'s arguments and the super().__init__ call's arguments; neither seems to make any change.
With only one argument passed to the cls(...) call, as the str class's error asks, I get this:
TypeError: __init__() takes exactly 3 arguments (2 given)
So I can't win here. What's gone wrong?
P.S. This should probably be a second post, but what property does str's raw string value get put into? I'd like to overload as little of the str class as I can to add this metadata into the constructor.
This is exactly what the __new__ method is for.
In Python, creating an object actually has two steps. In pseudocode:
value = the_class.__new__(the_class, *args, **kwargs)
if isinstance(value, the_class):
    value.__init__(*args, **kwargs)
The two steps are called construction and initialization. Most types don't need anything fancy in construction, so they can just use the default __new__ and define an __init__ method—which is why tutorials, etc. only mention __init__.
But str objects are immutable, so the initializer can't do the usual job of setting up the object's value: by the time __init__ runs, the string's contents are already fixed.
So, if you want to change what the str actually holds, you have to override its __new__ method, and call the super __new__ with your modified arguments.
In this case, you don't actually want to do that… but you do want to make sure str.__new__ doesn't see your extra arguments, so you still need to override it, just to hide those arguments from it.
Meanwhile, you ask:
what property does str's raw string value get put into?
It doesn't. What would be the point? Its value is a string, so you'd have a str which had an attribute which was the same str which had an attribute which etc. ad infinitum.
Under the covers, of course, it has to be storing something. But that's under the covers. In particular, in CPython, the str class is implemented in C, and it contains, among other things, a C char * array of the actual bytes used to represent the string. You can't access that directly.
But, as a subclass of str, if you want to know your value as a string, that's just self. That's the whole point of being a subclass, after all.
So:
class WcStr(str):
    """wc value and string flags"""
    FLAG_NIBBLES = 8 # Four Bytes
    def __new__(cls, value, *args, **kwargs):
        # explicitly only pass value to the str constructor
        return super(WcStr, cls).__new__(cls, value)
    def __init__(self, value, flags):
        # ... and don't even call the str initializer
        self.flags = flags
Of course you don't really need __init__ here; you could do your initialization along with your construction in __new__. But if you don't intend for flags to be an immutable, only-set-during-construction kind of value, it makes more conceptual sense to do it in the initializer, just like any normal class.
Meanwhile:
I'd like to overload as little of the str class as I can
That may not do what you want. For example, str.__add__ and str.__getitem__ are going to return a str, not an instance of your subclass. If that's good, then you're done. If not, you will have to overload all of those methods and change them to wrap up the return value with the appropriate metadata. (You can do this programmatically, either by generating wrappers at class definition time, or by using a __getattr__ method that generates wrappers on the fly.)
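Generating the wrappers at class definition time could look roughly like the sketch below. FlaggedStr, _rewrap and the short list of wrapped method names are hypothetical choices for illustration; a real class would need to decide which str methods to cover and how the metadata should propagate. It assumes Python 3.

```python
def _rewrap(method_name):
    """Build a method that calls str's version and re-wraps str results."""
    str_method = getattr(str, method_name)

    def wrapper(self, *args, **kwargs):
        result = str_method(self, *args, **kwargs)
        if isinstance(result, str):
            # Carry the metadata over to the new instance.
            return type(self)(result, self.flags)
        return result  # e.g. NotImplemented from __add__

    wrapper.__name__ = method_name
    return wrapper


class FlaggedStr(str):
    def __new__(cls, value, flags=None):
        obj = super().__new__(cls, value)
        obj.flags = flags
        return obj


# Only these methods are wrapped; all others still return plain str.
for _name in ("__add__", "__getitem__", "upper", "lower", "strip"):
    setattr(FlaggedStr, _name, _rewrap(_name))

s = FlaggedStr("Hello", flags="f")
t = s + " world"
print(type(t).__name__, t, t.flags)   # FlaggedStr Hello world f
print(type(s[0:2]).__name__, s[0:2])  # FlaggedStr He
print(type(s.title()).__name__)       # str (title was not wrapped)
```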
One last thing to consider: the str constructor doesn't take exactly one argument.
It can take 0:
str() == ''
And, while this isn't relevant in Python 2, in Python 3 it can take 2:
str(b'abc', 'utf-8') == 'abc'
Plus, even when it takes 1 argument, it obviously doesn't have to be a string:
str(123) == '123'
So… are you sure this is the interface you want? Maybe you'd be better off creating an object that owns a string (in self.value), and just using it explicitly. Or even using it implicitly, duck-typing as a str by just delegating most or all of the str methods to self.value?
Instead of __init__, try __new__:
def __new__(cls, value, flags):
    obj = str.__new__(cls, value)
    obj.flags = flags
    return obj

How to (or why not) call unicode.__init__ from subclass

I've encountered a situation where subclassing unicode results in Deprecation Warnings on Python prior to 3.3 and errors on Python 3.3:
# prove that unicode.__init__ accepts parameters
s = unicode('foo')
s.__init__('foo')
unicode.__init__(s, 'foo')

class unicode2(unicode):
    def __init__(self, other):
        super(unicode2, self).__init__(other)

s = unicode2('foo')

class unicode3(unicode):
    def __init__(self, other):
        unicode.__init__(self, other)

s = unicode3('foo')
Curiously, the warnings/errors don't occur in the first three lines, but instead occur on lines 8 and 14. Here's the output on Python 2.7.
> python -Wd .\init.py
.\init.py:8: DeprecationWarning: object.__init__() takes no parameters
  super(unicode2, self).__init__(other)
.\init.py:14: DeprecationWarning: object.__init__() takes no parameters
  unicode.__init__(self, other)
The code is simplified to exemplify the issue. In a real-world application, I would perform more than simply calling the super __init__.
It appears from the first three lines that the unicode class implements __init__ and that method accepts at least a single parameter. However, if I want to call that method from a subclass, I appear to be unable to do so, whether I invoke super() or not.
Why is it okay to call unicode.__init__ on a unicode instance but not on a unicode subclass? What is an author to do if subclassing the unicode class?
I suspect the issue comes from the fact that unicode is immutable.
After a unicode instance is created, it cannot be modified. So, any initialization logic is going to be in the __new__ method (which is called to do the instance creation), rather than __init__ (which is called only after the instance exists).
A subclass of an immutable type doesn't have the same strict requirements, so you can do things in unicode2.__init__ if you want, but calling unicode.__init__ is unnecessary (and probably won't do what you think it would do anyway).
A better solution is probably to do your customized logic in your own __new__ method:
class unicode2(unicode):
    def __new__(cls, value):
        # optionally do stuff to value here
        self = super(unicode2, cls).__new__(cls, value)
        # optionally do stuff to self here
        return self
You can make your class immutable too, if you want, by giving it a __setattr__ method that always raises an exception (you might also want to give the class a __slots__ property to save memory by omitting the per-instance __dict__).
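A rough sketch of the __setattr__ idea, written for Python 3 with str in place of unicode. FrozenTagged and its tag attribute are invented for illustration; the one-time setup in __new__ goes through object.__setattr__ to bypass the blocking override.

```python
class FrozenTagged(str):
    """Hypothetical str subclass whose extra attribute is set once."""

    def __new__(cls, value, tag):
        self = super().__new__(cls, value)
        # Bypass our own __setattr__ for the one-time setup.
        object.__setattr__(self, "tag", tag)
        return self

    def __setattr__(self, name, value):
        raise AttributeError("FrozenTagged instances are read-only")

    def __delattr__(self, name):
        raise AttributeError("FrozenTagged instances are read-only")


f = FrozenTagged("foo", tag="source-a")
print(f, f.tag)  # foo source-a

try:
    f.tag = "other"
except AttributeError as exc:
    print("rejected:", exc)
```

This only discourages modification; code that calls object.__setattr__ directly can still change the attribute.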

__new__ and __init__ in Python

I am learning Python and so far I can tell the things below about __new__ and __init__:
__new__ is for object creation
__init__ is for object initialization
__new__ is invoked before __init__: __new__ returns a new instance, and __init__ is invoked afterwards to initialize its inner state.
__new__ is good for immutable objects, as they cannot be changed once they are created. So we can return a new instance which has the new state.
We can use both __new__ and __init__ for mutable objects, as their inner state can be changed.
But I have some other questions now.
When I create a new instance such as a = MyClass("hello","world"), how are these arguments passed? I mean, how should I structure the class using __init__ and __new__, as they are different and both accept arbitrary arguments besides the default first argument?
Can the name self be changed to something else? And I am wondering whether cls can also be renamed, as it is just a parameter name.
I made a little experiment as below:
>>> class MyClass(tuple):
...     def __new__(tuple):
...         return [1,2,3]
and then I did this:
>>> a = MyClass()
>>> a
[1, 2, 3]
Although I said I wanted to return a tuple, this code works fine and returned [1,2,3]. I knew we were passing, as the first parameter, the type we wanted to receive once the __new__ function is invoked. We are talking about a "new" function, right? I don't know of other languages where it can return a type other than the bound type.
And I did another thing as well:
>>> issubclass(MyClass,list)
False
>>> issubclass(MyClass,tuple)
True
>>> isinstance(a,MyClass)
False
>>> isinstance(a,tuple)
False
>>> isinstance(a,list)
True
I didn't experiment further because the way ahead wasn't clear, so I decided to stop there and ask Stack Overflow.
The SO posts I read:
Python object creation
Python's use of __new__ and __init__?
how should I structure the class using __init__ and __new__, as they are different and both accept arbitrary arguments besides the default first argument?
Only rarely will you have to worry about __new__. Usually, you'll just define __init__ and let the default __new__ pass the constructor arguments to it.
Can the name self be changed to something else? And I am wondering whether cls can also be renamed, as it is just a parameter name.
Both are just parameter names with no special meaning in the language. But their use is a very strong convention in the Python community; most Pythonistas will never change the names self and cls in these contexts and will be confused when someone else does.
Note that your use of def __new__(tuple) re-binds the name tuple inside the constructor function. When actually implementing __new__, you'll want to do it as
def __new__(cls, *args, **kwargs):
    # do allocation to get an object, say, obj
    return obj
Although I said I wanted to return a tuple, this code works fine and returned [1,2,3].
MyClass() will have the value that __new__ returns. There's no implicit type checking in Python; it's the responsibility of the programmer to return the correct type ("we're all consenting adults here"). Being able to return a different type than requested can be useful for implementing factories: you can return a subclass of the type requested.
This also explains the issubclass/isinstance behavior you observe: the subclass relationship follows from your use of class MyClass(tuple), the isinstance reflects that you return the "wrong" type from __new__.
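The factory use mentioned above might look like the following sketch. Number, Even and Odd are made-up classes: calling the base class picks a subclass in __new__, and because the result is still an instance of the requested class, __init__ runs as usual.

```python
class Number(object):
    def __new__(cls, value):
        if cls is Number:
            # Act as a factory only when the base class is requested.
            cls = Even if value % 2 == 0 else Odd
        return super(Number, cls).__new__(cls)

    def __init__(self, value):
        self.value = value


class Even(Number):
    pass


class Odd(Number):
    pass


n = Number(4)
m = Number(7)
print(type(n).__name__, n.value)  # Even 4
print(type(m).__name__, m.value)  # Odd 7
print(isinstance(n, Number))      # True
```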
For reference, check out the requirements for __new__ in the Python Language Reference.
Edit: ok, here's an example of potentially useful use of __new__. The class Eel keeps track of how many eels are alive in the process and refuses to allocate if this exceeds some maximum.
class HovercraftFull(Exception):
    """Raised when no more eels can be allocated."""
class Eel(object):
    MAX_EELS = 20
    n_eels = 0
    def __new__(cls, *args, **kwargs):
        if cls.n_eels == cls.MAX_EELS:
            raise HovercraftFull()
        obj = super(Eel, cls).__new__(cls)
        cls.n_eels += 1
        return obj
    def __init__(self, voltage):
        self.voltage = voltage
    def __del__(self):
        type(self).n_eels -= 1
    def electric(self):
        """Is this an electric eel?"""
        return self.voltage > 0
Mind you, there are smarter ways to accomplish this behavior.
