I have a big tree with hundreds of thousands of nodes, and I'm using __slots__ to reduce the memory consumption. I just found a very strange bug and fixed it, but I don't understand the behavior that I saw.
Here's a simplified code sample:
class NodeBase(object):
    __slots__ = ["name"]
    def __init__(self, name):
        self.name = name
class NodeTypeA(NodeBase):
    name = "Brian"
    __slots__ = ["foo"]
I then execute the following:
>>> node = NodeTypeA("Monty")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in __init__
AttributeError: 'NodeTypeA' object attribute 'name' is read-only
There is no error if NodeTypeA.name is not defined (side note: that attribute was there by mistake, and had no reason for being there). There is also no error if NodeTypeA.__slots__ is never defined, and it therefore has a __dict__.
The thing I don't understand is: why does the existence of a class variable in a subclass interfere with setting an instance variable in a slot defined in the parent class?
Can anybody explain why this combination results in the "object attribute is read-only" error? I know my example is contrived and unlikely to be intentional in a real program, but that doesn't make this behavior any less strange.
Thanks,
Jonathan
A smaller example:
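class C(object):
    __slots__ = ('x',)
    x = 0
# Python 2: AttributeError: 'C' object attribute 'x' is read-only
# (Python 3 already rejects the class: ValueError: 'x' in __slots__ conflicts with class variable)
C().x = 1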
The documentation on slots states at one point:
__slots__ are implemented at the class level by creating descriptors (Implementing Descriptors) for each variable name. As a result, class attributes cannot be used to set default values for instance variables defined by __slots__; otherwise, the class attribute would overwrite the descriptor assignment.
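When __slots__ is in use, assignment to a slot attribute has to go through the descriptor created for that slot. Attribute lookup walks the MRO starting from the instance's own class, so a plain class attribute with the same name in a subclass is found before the base class's slot descriptor. Shadowing the descriptor this way leaves Python unable to find the routine needed to set the attribute, and since the instance has no __dict__ there is nowhere else to store the value. Python can still see that an attribute is there, though (because it finds the object that's shadowing the descriptor), so it reports that the attribute is read-only.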
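A minimal sketch, assuming Python 3 and reusing the question's NodeBase/NodeTypeA classes, that makes the shadowing visible: the slot's member descriptor still lives on NodeBase, and calling it directly stores the value even though ordinary lookup keeps returning the class attribute. The names slot_descriptor, error_message and node are illustrative only.

```python
class NodeBase:
    __slots__ = ["name"]

    def __init__(self, name):
        self.name = name


class NodeTypeA(NodeBase):
    name = "Brian"          # plain class attribute that shadows NodeBase's slot descriptor
    __slots__ = ["foo"]


# __slots__ created a member descriptor named 'name' on the base class.
slot_descriptor = NodeBase.__dict__["name"]
print(type(slot_descriptor).__name__)   # member_descriptor

# Normal assignment finds "Brian" on NodeTypeA first; a str has no __set__,
# and there is no instance __dict__, so CPython reports the attribute as read-only.
error_message = None
try:
    NodeTypeA("Monty")
except AttributeError as exc:
    error_message = str(exc)
print(error_message)

# Calling the base class descriptor directly bypasses the shadowing.
node = NodeTypeA.__new__(NodeTypeA)     # create an instance without running __init__
slot_descriptor.__set__(node, "Monty")
print(slot_descriptor.__get__(node, NodeTypeA))  # Monty: the value stored in the slot
print(node.name)                                 # Brian: ordinary lookup still hits the class attribute
```

This is only a diagnostic; the practical fix is the one the asker found, removing the stray class attribute.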
EDIT: Disclaimer - I don't mean deletion in the sense that applies to languages that aren't memory-managed (e.g. free in C or delete in C++). Deletion here means that, after the class has been deleted, the superclass no longer lists it among its subclasses.
In Python, you can delete a class (yes I do mean a class, not an instance) by doing the following:
class Super:
    ...
class DeleteMe(Super):
    ...
print(Super.__subclasses__())
# [<class '__main__.DeleteMe'>]
del DeleteMe
import gc
gc.collect()  # Force a collection
print(Super.__subclasses__())
# []
I am trying to emulate this behaviour but I want the DeleteMe class to be able to destroy itself. Here is what I've tried:
class Super:
    ...
class DeleteMe(Super):
    def self_delete(self):
        print(self.__class__)
        # <class '__main__.DeleteMe'>, this looks right
        del self.__class__  # this fails
        import gc
        gc.collect()
        print(Super.__subclasses__())
        # [<class '__main__.DeleteMe'>]
DeleteMe().self_delete()
It fails with the following traceback:
Traceback (most recent call last):
File "/Users/rayan/Desktop/test.py", line 10, in <module>
DeleteMe().self_delete()
File "/Users/rayan/Desktop/test.py", line 4, in self_delete
del self.__class__
TypeError: can't delete __class__ attribute
How can I achieve this self-destructing behaviour?
Note: this is not a duplicate of "How to remove classes from __subclasses__?"; that question covers the first case, where the deletion happens outside the class.
del DeleteMe
This is not deleting the class. This is deleting the name that happens to refer to the class. If there are no other references to the class (and that includes the name you just deleted, any module that's ever imported the class, any instances of the class, and any other places where the class might happen to be stored), then the garbage collector might delete the class when you gc.collect().
Now an instance always knows its own class, via the __class__ attribute. It makes little sense to delete self.__class__, because then what would we be left with? An instance with no class? What can we do with it? We can't call methods on it since those are defined on the class, and we can't do anything object-like on it since it's no longer an instance of object (a superclass of the class we just removed). So really we have a sort of silly looking dictionary that doesn't even do all of the dict things in Python. Hence, disallowed.
You cannot delete data in Python. That's the garbage collector's job. There is no Python equivalent of C's free or C++'s delete. del in Python deletes bindings or dictionary entries. It does not remove data; it removes pointers that happen to point to data.
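A short sketch of this point, assuming CPython's garbage collector: del only removes the name, so the class stays alive (and stays in Super.__subclasses__()) for as long as an instance still refers to it. The weakref is used purely to observe when the class object is actually collected; instance, class_ref, still_alive and collected are illustrative names.

```python
import gc
import weakref


class Super:
    pass


class DeleteMe(Super):
    pass


instance = DeleteMe()              # the instance references its class via __class__
class_ref = weakref.ref(DeleteMe)  # weak reference: does not keep the class alive

del DeleteMe                       # removes the module-level name only
gc.collect()
still_alive = class_ref() is not None
print(still_alive)                    # True: 'instance' still refers to the class
print(len(Super.__subclasses__()))    # 1

del instance                       # drop the last strong reference
gc.collect()                       # classes sit in reference cycles, so a collection is needed
collected = class_ref() is None
print(collected)                      # True
print(Super.__subclasses__())         # []
```

In other words, a class cannot remove itself; the most it can do is drop references, and the collector decides when the object goes away.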
I have a piece of code that I am trying to understand, and even with the existing answers I really couldn't understand the purpose of the following code. Can someone please help me understand it?
I have already looked at various relevant questions (__get__()) here and couldn't find specific answers. I understand that the class below is trying to create a method on the fly (possibly we get to this class from a __getattr__() method that fails to find an attribute) and return the method to the caller. I have commented right above the lines of code I need help understanding.
class MethodGen(object):
    def __getattr__(self, name):
        method = self.method_gen(name)
        if method:
            return self.method_gen(name)
    def method_gen(self, name):
        def method(*args, **kwargs):
            print("Creating a method here")
        # Below are the two lines of code I need help understanding with
        method.__name__ = name
        setattr(self, name, method.__get__(self))
        return method
If I'm not mistaken, the first of those lines sets the __name__ attribute of the method() function, but in the setattr() call, what is the attribute name on the MethodGen instance set to?
This question really intrigued me. The two answers provided didn't seem to tell the whole story. What bothered me was the fact that in this line:
setattr(self, name, method.__get__(self))
the code is not setting things up so that method.__get__ will be called at some point. Rather, method.__get__ is actually being called, right there! But isn't the idea that this __get__ method will be called when a particular attribute of an object (an instance of MethodGen in this case) is actually referenced? If you read the docs, this is the impression you get: an attribute is linked to a descriptor that implements __get__, and that implementation determines what gets returned when that attribute is referenced. But again, that's not what's going on here. This all happens before that point. So what IS really going on here?
The answer lies in the Descriptor HowTo Guide in the Python documentation. The key language is this:
To support method calls, functions include the __get__() method for binding methods during attribute access. This means that all functions are non-data descriptors which return bound methods when they are invoked from an object.
method.__get__(self) is exactly what's being described here. So what method.__get__(self) is actually doing is returning a reference to the "method" function that is bound to self. Since in our case, self is an instance of MethodGen, this call is returning a reference to the "method" function that is bound to an instance of MethodGen. In this case, the __get__ method has nothing to do with the act of referencing an attribute. Rather, this call is turning a function reference into a method reference!
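A minimal sketch of that conversion, assuming Python 3 (where function.__get__ accepts the instance alone): calling __get__ by hand produces the same bound method that types.MethodType would. MethodGen here is an empty stand-in, not the full class from the question.

```python
import types


class MethodGen:
    pass                                  # empty stand-in for the question's class


def method(*args, **kwargs):
    return args                           # return the arguments so the binding is visible


m = MethodGen()
bound = method.__get__(m)                 # invoke the descriptor protocol by hand
print(type(bound).__name__)               # method
print(bound.__self__ is m, bound.__func__ is method)  # True True
print(bound("x") == (m, "x"))             # True: the instance is passed as the first argument

same = types.MethodType(method, m)        # the usual explicit way to bind a function
print(same == bound)                      # True
```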
So now we have a reference to a method we've created on the fly. But how do we set it up so it gets called at the right time, when an attribute with the right name is referenced on the instance it is bound to? That's where the setattr(self, name, X) part comes in. This call takes our new method and binds it to the attribute with name name on our instance.
All of the above then is why:
setattr(self, name, method.__get__(self))
is adding a new method to self, the instance of the MethodGen class on which method_gen has been called.
The method.__name__ = name part is not all that important. Executing just the line of code discussed above gives you all the behavior you really want. This extra step just attaches a name to our new method so that code that asks for the name of the method, like code that uses introspection to write documentation, will get the right name. It is the instance attribute's name...the name passed to setattr...that really matters, and really "names" the method.
Interesting, never seen this done before, seems tough to maintain (probably will make some fellow developers want to hang you).
I changed some code so you can see a little more of what is happening.
class MethodGen(object):
    def method_gen(self, name):
        print("Creating a method here")
        def method(*args, **kwargs):
            print("Calling method")
            print(args)  # so we can see what is actually being passed in
        # Below are the two lines of code I need help understanding with
        method.__name__ = name  # Sets the function's __name__ (for introspection; it doesn't affect how you call it)
        # The following adds the new method to the current instance.
        setattr(self, name, method.__get__(self))  # Adds a bound method to this instance, not to the class
        # I would do: setattr(self, name, method) though and remove the __get__ (but then self is no longer passed in)
        return method  # Returns the plain, unbound function
m = MethodGen()
test = m.method_gen("my_method")  # Creates a method called my_method on the instance m
test("test")  # The returned function can be called directly; args is ('test',)
m.my_method("test")  # Or call it through the instance; args now starts with m itself
m.method_gen("method_2")
m.method_2("test2")
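A sketch contrasting the two options mentioned in the code comments. The bind parameter is hypothetical, added only for this comparison: a bound method stored on the instance receives the instance as its first argument, while a plain function stored on the instance does not, because descriptors are only invoked for attributes found on the class.

```python
class MethodGen:
    def method_gen(self, name, bind=True):
        # 'bind' is a hypothetical parameter used only to compare the two approaches
        def method(*args, **kwargs):
            return args                   # expose what the call actually received
        method.__name__ = name
        value = method.__get__(self) if bind else method
        setattr(self, name, value)        # stored in the instance __dict__, not on the class
        return method


m = MethodGen()
m.method_gen("bound_version")
m.method_gen("plain_version", bind=False)

print(m.bound_version("test") == (m, "test"))   # True: self is passed automatically
print(m.plain_version("test"))                  # ('test',): instance attributes are not bound
print("bound_version" in vars(MethodGen))       # False: the class itself is unchanged
```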
Consider the class below:
class Foo:
    def bar(self):
        print("hi")
f = Foo()
f.bar()
bar is a class attribute that has a function as its value. Because functions implement the descriptor protocol, however, accessing it as Foo.bar or f.bar does not simply return the stored value; it causes the function's __get__ method to be invoked, and that returns either the original function (as in Foo.bar in Python 3; Python 2 returns an unbound method instead) or a new bound method object (as in f.bar; its type is method in Python 3 and instancemethod in Python 2). f.bar() is evaluated as Foo.__dict__['bar'].__get__(f, Foo)().
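A minimal sketch of that lookup, assuming Python 3; bar returns its greeting instead of printing it so the results can be compared:

```python
class Foo:
    def bar(self):
        return "hi"                       # return instead of print so the result can be compared


f = Foo()
raw = Foo.__dict__["bar"]                 # the plain function stored in the class namespace
print(Foo.bar is raw)                     # True: class access returns the function itself

bound = raw.__get__(f, Foo)               # what the lookup f.bar does behind the scenes
print(bound.__self__ is f, bound.__func__ is raw)  # True True
print(bound() == f.bar())                 # True: both call bar(f)
```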
method_gen takes the function named method, calls its __get__ method to obtain a bound method, and attaches that bound method to an object. The intent is so that something like this works:
>>> m = MethodGen()
>>> n = MethodGen()
>>> m.foo()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'MethodGen' object has no attribute 'foo'
>>> n.foo()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'MethodGen' object has no attribute 'foo'
>>> m.method_gen('foo')
<function foo at 0x10465c758>
>>> m.foo()
Creating a method here
>>> n.foo()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'MethodGen' object has no attribute 'foo'
Initially, MethodGen does not have any methods other than method_gen. You can see the exception raised when attempting to invoke a method named foo on either of two instances. Calling method_gen, however, attaches a new method to just that particular instance. After calling m.method_gen("foo"), m.foo() calls the method defined by method_gen. That call does not affect other instances of MethodGen like n.
From https://stackoverflow.com/a/1529099/156458
To support arbitrary attribute assignment, an object needs a __dict__: a dict associated with the object, where arbitrary attributes can be stored. Otherwise, there's nowhere to put new attributes.
An instance of object does not carry around a __dict__ -- if it did, beyond the horrible circular dependence problem (since __dict__, like most everything else, inherits from object;-), this would saddle every object in Python with a dict, which would mean an overhead of many bytes per object that currently doesn't have or need a dict (essentially, all objects that don't have arbitrarily assignable attributes don't have or need a dict).
...
When the class has the __slots__ special attribute (a sequence of strings), then the class statement (more precisely, the default metaclass, type) does not equip every instance of that class with a __dict__ (and therefore the ability to have arbitrary attributes), just a finite, rigid set of "slots" (basically places which can each hold one reference to some object) with the given names.
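A minimal sketch of the difference, assuming Python 3; Slotted and Plain are illustrative names. With a fixed set of slots there is nowhere to put an attribute that was not declared, whereas an ordinary instance stores it in its __dict__.

```python
class Slotted:
    __slots__ = ("a",)                   # one fixed slot, no per-instance __dict__


class Plain:
    pass                                 # ordinary class: instances get a __dict__


s = Slotted()
s.a = 1                                  # fine: 'a' has a slot
try:
    s.b = 2                              # no slot named 'b' and no __dict__ to fall back on
    slotted_accepts_b = True
except AttributeError:
    slotted_accepts_b = False

p = Plain()
p.b = 2                                  # stored in p.__dict__
print(slotted_accepts_b, hasattr(s, "__dict__"))  # False False
print(p.__dict__)                                 # {'b': 2}
```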
If an object doesn't have __dict__, must its class have a __slots__ attribute?
For example, an instance of object doesn't have a __dict__:
>>> object().__dict__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'object' object has no attribute '__dict__'
but it doesn't have __slots__ either:
>>> object.__slots__
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: type object 'object' has no attribute '__slots__'
Do object instances have any attributes at all?
How many possibilities are there:
an object has __dict__, and its class has __dict__ but no __slots__
an object doesn't have __dict__, and its class has __slots__
an object doesn't have __dict__, and its class doesn't have __slots__?
Is it possible to tell if an object has __dict__ from its class?
if its class has __slots__, then it doesn't have __dict__, correct?
if its class doesn't have __slots__, how can I tell if it has __dict__ or not?
For user defined classes (defined using the class keyword in regular Python code), a class will always have __slots__ on the class, __dict__ on the instance, or both (if one of the slots defined is '__dict__', or one of the user defined classes in an inheritance chain defines __slots__ and another one does not, creating __dict__ implicitly). So that's three of four possibilities covered for user defined classes.
Edit: A correction: Technically, a user-defined class could have neither; the class would be defined with __slots__, but have it deleted after definition time (the machinery that sets up the type doesn't require __slots__ to persist after the class definition finishes). No sane person should do this, and it could have undesirable side-effects (full behavior untested), but it's possible.
For built-in types, at least in the CPython reference interpreter, they're extremely unlikely to have __slots__ (if they did, it would only be to simulate a user-defined class; defining it doesn't actually do anything useful). A built-in type typically stores its attributes as raw C-level values and pointers in a C-level struct, optionally with explicitly created descriptors or accessor methods. That eliminates the purpose of __slots__, which are just a convenient, limited-purpose equivalent of such struct games for user-defined classes. __dict__ is opt-in for built-in types, not on by default (though opting in is fairly easy: you put a PyObject* entry somewhere in the struct and provide its offset in the type definition).
To be clear, __dict__ need not appear on the class for it to appear on its instances; __slots__ is class level, and can suppress the __dict__ on the instance, but has no effect on whether the class itself has a __dict__; user defined classes always have __dict__, but their instances won't if you're careful to use __slots__ appropriately.
So in short:
(Sane) User defined classes have at least one of __dict__ (on the instances) or __slots__ (on the class), and can have both. Insane user defined classes could have neither, but only a deranged developer would do it.
Built-in classes often have neither, may provide __dict__, and almost never provide __slots__ as it is pointless for them.
Examples:
# Class has __slots__, instances don't have __dict__
class DictLess:
    __slots__ = ()
# Instances have __dict__, class lacks __slots__
class DictOnly:
    pass
# Class has __slots__, instances have __dict__ because __slots__ declares it
class SlottedDict:
    __slots__ = '__dict__',
# Class has __slots__ without __dict__ slot, instances have it anyway from unslotted parent
class DictFromParent(DictOnly):
    __slots__ = ()
# Complete insanity: __slots__ takes effect at class definition time, but can
# be deleted later, without changing the class behavior:
class NoSlotNoDict:
    __slots__ = ()
del NoSlotNoDict.__slots__
# Instances have no __dict__, class has no __slots__ but acts like it does
# (the machinery to make it slotted isn't undone by deleting __slots__)
# Please, please don't actually do this
# Built-in type without instance __dict__ or class defined __slots__:
int().__dict__ # Raises AttributeError
int.__slots__ # Also raises AttributeError
# Built-in type that opts in to __dict__ on instances:
import functools
functools.partial(int).__dict__ # Works fine
functools.partial.__slots__ # Raises AttributeError
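For the "can I tell from the class?" part: in CPython, type objects expose __dictoffset__, which is 0 when instances have no __dict__. This is a CPython implementation detail (other interpreters may differ), so the sketch below cross-checks it against hasattr on a fresh instance; instances_have_dict is a hypothetical helper name.

```python
def instances_have_dict(cls):
    # CPython implementation detail: __dictoffset__ is 0 when instances have no __dict__.
    return cls.__dictoffset__ != 0


class DictLess:
    __slots__ = ()


class DictOnly:
    pass


class SlottedDict:
    __slots__ = ("__dict__",)


class DictFromParent(DictOnly):
    __slots__ = ()


results = {}
for cls in (DictLess, DictOnly, SlottedDict, DictFromParent, int):
    # (answer derived from the class, answer observed on an actual instance)
    results[cls.__name__] = (instances_have_dict(cls), hasattr(cls(), "__dict__"))
    print(cls.__name__, results[cls.__name__])
```

Because NoSlotNoDict keeps its slotted layout after __slots__ is deleted, this check also reports it correctly, which a plain hasattr(cls, "__slots__") test would not.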
While trying to wrap arbitrary objects, I came across a problem with dictionaries and lists. Investigating, I managed to come up with a simple piece of code whose behaviour I simply do not understand. I hope some of you can tell me what is going on:
>>> class Cl(object): # simple class that prints (and suppresses) each attribute lookup
...     def __getattribute__(self, name):
...         print 'Access:', name
...
>>> i = Cl() # instance of class
>>> i.test # test that __getattribute__ override works
Access: test
>>> i.__getitem__ # test that it works for special functions, too
Access: __getitem__
>>> i['foo'] # but why doesn't this work?
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'Cl' object has no attribute '__getitem__'
Magic __methods__() are treated specially: they are internally assigned to "slots" in the type data structure to speed up their lookup, and they are looked up only in these slots, on the type itself, so your __getattribute__ override is never consulted. If the slot is empty, you get the error message you got.
See Special method lookup for new-style classes in the documentation for further details. Excerpt:
In addition to bypassing any instance attributes in the interest of correctness, implicit special method lookup generally also bypasses the __getattribute__() method even of the object’s metaclass.
[…]
Bypassing the __getattribute__() machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method must be set on the class object itself in order to be consistently invoked by the interpreter).
I subclassed StringIO to create a MockFile class. There should be an attribute "name" in the derived class, but creating this attribute throws an AttributeError.
Puzzled, I did a __dict__ lookup and found that there was already a name key. Iterating through the __mro__, I found a property named 'name', obviously read-only, in the io.TextIOWrapper class.
So I have basically two questions:
What is this 'name' property intended for?
Is it safe to overwrite it with a setattr assignment?
The example code for completeness:
class MockFile(StringIO):
    def __init__(self, name, buffer_ = None):
        super(MockFile, self).__init__(buffer_)
        self.name = name
>>> mfile = MockFile('stringio.tmp', u'#MockFile')
leads to:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 4, in __init__
AttributeError: can't set attribute
The name property of io.StringIO in Python 2.6 comes from the class hierarchy in the io module. It's a somewhat complex setup with both inheritance and composition, and the name property is used to propagate names from underlying objects to the various wrappers and specializations. The actual property on io.StringIO is gone in Python 2.7 and later, though, so you should be fine to shadow it in your subclass.
You can't use setattr() to set the property any more than plain assignment can; setattr() and attribute assignment work the same way. The nature of property prevents you from shadowing the base class property with an instance attribute (without doing more). You can, however, define a property of your own with the same name, or trick Python into not seeing the property in the first place:
class MockFile(StringIO):
    name = None
    def __init__(self, name, buffer_ = None):
        super(MockFile, self).__init__(buffer_)
        self.name = name
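Since Python 3's io.StringIO no longer has this name property, the sketch below uses a hypothetical Base class with a read-only property to show both workarounds: shadowing the property with a plain class attribute, and defining a property of your own with a setter. Broken, Shadowed and WithProperty are illustrative names.

```python
class Base:
    @property
    def name(self):                      # read-only property: no setter defined
        return "base name"


class Broken(Base):
    def __init__(self, name):
        self.name = name                 # AttributeError: the inherited property has no setter


class Shadowed(Base):
    name = None                          # plain class attribute hides Base.name in the MRO

    def __init__(self, name):
        self.name = name                 # now stored in the instance __dict__


class WithProperty(Base):
    def __init__(self, name):
        self.name = name                 # goes through the setter below

    @property
    def name(self):                      # overrides the inherited read-only property
        return self._name

    @name.setter
    def name(self, value):
        self._name = value


try:
    Broken("stringio.tmp")
    broken_failed = False
except AttributeError:
    broken_failed = True

print(broken_failed)                        # True
print(Shadowed("stringio.tmp").name)        # stringio.tmp
print(WithProperty("stringio.tmp").name)    # stringio.tmp
```

The class-attribute trick works because a plain value is not a data descriptor, so the instance __dict__ wins on both assignment and lookup.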