TL;DR Is there any way to create a weak reference that will call a callback upon having 1 strong reference left instead of 0?
For those who think it's an X Y problem, here's the long explanation:
I have quite a challenging issue that I'm trying to solve with my code.
Suppose we have an instance of some class Foo, and a different class Bar which references the instance as it uses it:
class Foo: # Can be anything
pass
class Bar:
"""I must hold the instance in order to do stuff"""
def __init__(self, inst):
self.inst = inst
foo_to_bar = {}
def get_bar(foo):
"""Creates Bar if one doesn't exist"""
return foo_to_bar.setdefault(foo, Bar(foo))
# We can either have
bar = get_bar(Foo())
# Bar must hold a strong reference to foo
# Or
foo = Foo()
bar = get_bar(foo)
bar2 = get_bar(foo) # Same Bar
del bar
del bar2
bar3 = get_bar(foo) # Same Bar
# In this case, as long as foo exists, we want the same bar to show up,
# therefore, foo must in some way hold a strong reference back to bar
Now here's the tricky part: you could solve this with a circular reference, where foo references bar and bar references foo, but hey, where's the fun in that? It will take longer to clean up, it won't work if Foo defines __slots__, and it is generally a poor solution.
Is there any way I can create a foo_to_bar mapping that cleans up once only a single reference to each of foo and bar remains? In essence:
import weakref
foo_to_bar = weakref.WeakKeyDictionary()
# If bar is referenced only once (as the dict value) and foo is
# referenced only once (from bar.inst) their mapping will be cleared out
This way it can work perfectly: having foo outside the function makes sure bar is still there (I might require that Foo's __slots__, if defined, include __weakref__), and having bar outside the function results in foo still being there (because of the strong reference in Bar).
WeakKeyDictionary does not work because the value (bar) holds a strong reference back to the key (via bar.inst), which recreates exactly the circular reference we were trying to avoid.
Alternatively, is there any way to hook into the reference counting mechanism (in order to clean when both objects get to 1 reference each) without incurring significant overhead?
You are overthinking this. You don't need to track whether there is just one reference left; your mistake is creating a circular reference in the first place.
Store _BarInner objects in your cache; these hold no reference to the Foo instances. Upon access to the mapping, return a lightweight Bar instance that contains both the _BarInner and Foo references:
from weakref import WeakKeyDictionary
from collections.abc import Mapping
class Foo:
pass
class Bar:
"""I must hold the instance in order to do stuff"""
def __init__(self, inst, inner):
self._inst = inst
self._inner = inner
# Access to interesting stuff is proxied on to the inner object,
# with the instance information included *as needed*.
    @property
    def spam(self):
        return self._inner.spam(self._inst)
class _BarInner:
"""The actual data you want to cache"""
    def spam(self, instance):
        # do something with instance, but *do not store any references
        # to that object on self*
        ...
class BarMapping(Mapping):
    def __init__(self):
        self._mapping = WeakKeyDictionary()

    def __getitem__(self, inst):
        inner = self._mapping.get(inst)
        if inner is None:
            inner = self._mapping[inst] = _BarInner()
        return Bar(inst, inner)

    # Mapping is also abstract over __len__ and __iter__; delegate both
    def __len__(self):
        return len(self._mapping)

    def __iter__(self):
        return iter(self._mapping)
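A quick usage sketch (hypothetical driver code, not part of the answer's API) showing the lifetimes at work:

foo_to_bar = BarMapping()

foo = Foo()
bar = foo_to_bar[foo]    # lightweight wrapper that holds foo strongly
bar2 = foo_to_bar[foo]   # a distinct wrapper, but the same _BarInner
del bar, bar2            # wrappers vanish; the cached _BarInner stays

bar3 = foo_to_bar[foo]   # backed by the very same _BarInner as before

del foo, bar3            # once the last strong reference to foo is gone,
                         # the WeakKeyDictionary entry (and with it the
                         # _BarInner) is cleared automatically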
Translating this to the bdict project linked in the comments, you can simplify things drastically:
Don't worry about a lack of support for weak references. Document that your project will only support per-instance data on types that have a __weakref__ attribute. That's enough.
Don't distinguish between slots and no-slots types. Always store per-instance data away from the instances. This lets you simplify your code.
The same goes for the 'strong' and 'autocache' flags. The flyweight should always keep a strong reference. Per-instance data should always be stored.
Use a single class for the descriptor return value. The ClassBoundDict type is all you need. Store the instance and owner data passed to __get__ in that object, and vary behaviour in __setitem__ accordingly.
Look at collections.ChainMap() to encapsulate access to the class and instance mappings for read access.
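For that last point, here's a minimal sketch of what ChainMap buys you (the dict names are illustrative, not from the bdict project):

from collections import ChainMap

class_data = {'colour': 'red', 'size': 'large'}   # shared, class-level data
instance_data = {'size': 'small'}                 # per-instance overrides

view = ChainMap(instance_data, class_data)
print(view['size'])    # 'small' -- the instance mapping wins
print(view['colour'])  # 'red' -- falls through to the class mapping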
Related
I have a somewhat complex class Thing, and an associated mixin IterMixin (to make the class iterable)...and a funky method elsewhere in the codebase which receives an instance of my class as an argument.
In fact, I'm attempting to bundle up a bunch of parameters as single object to be passed to multiple external functions beyond the funky function below. A parameter object design pattern of sorts...
class IterMixin():
def __iter__(self):
for attr, value in self.__dict__.items():
yield attr, value
class Thing(IterMixin):
def __iter__(self, foo=None, bar=None, baz=999):
if foo is None:
self.foo = {}
else:
self.foo = foo
if bar is None:
self.bar = {}
else:
self.bar = bar
self.baz = baz
    @property
    def foo(self):
        return self._foo

    @foo.setter
    def foo(self, data):
        self._foo = self.parser(data)

    @property
    def bar(self):
        return self._bar

    @bar.setter
    def bar(self, more_data):
        self._bar, self.baz = self.another_parser(more_data)
def parser(self, data):
...do stuff...
return foo
def another_parser(self, more_data):
...do add'l stuff...
return bar, baz
With regard to the funky function, in a completely different module, via the Thing class, I want to pass Thing's attributes (foo, bar, and baz) to the funky function as one argument...like so:
thing_args = Thing()
def funky(*thing_args):
...do stuff...
...expecting to manipulate keys from thing_args
...
return whatever
PROBLEM:
If I do not make the setters for the attributes foo and bar store to "private" names (for example, self._foo), i.e., by way of an underscore, then I provoke infinite recursion during class initialization, as __init__ and the setters for these attributes call themselves over and over. To avoid that, I used the @property decorator and "privatized" foo and bar when setting them.
However, when I pass an instance of the Thing class and unpack its attributes as args to the funky function via a splat or asterisk, if I introspect the resultant keys for those attributes, I still get _foo and _bar. I can't seem to get rid of the underscores. (In other words, I get the "privatized" attribute names of Thing.)
The biz logic of funky needs the unpacked values to not have any underscores.
Why is this happening (the underscores upon unpacking)? How can I fix this? Is there a more elegant way to either initialize the foo and bar attributes without privatizing anything? Or perhaps a more Pythonic way to pass all the attributes in the Thing class to my funky function?
First, you've got a major problem that will prevent you from even seeing the problem you've asked for help with: your Thing class defines an __iter__ method that doesn't call the superclass's, and doesn't yield or return anything. Hopefully that part is just a typo and you know how to fix it to do whatever you actually wanted there.
Now, onto the problem you're asking about:
class IterMixin():
def __iter__(self):
for attr, value in self.__dict__.items():
yield attr, value
Try printing out the __dict__ of your instances. Or, better, instances of a minimal example like this:
class Thing:
    @property
    def foo(self):
        return self._foo

    @foo.setter
    def foo(self, data):
        self._foo = data
t = Thing()
t.foo = 2
print(t.__dict__)
The output is {'_foo': 2}.
You've tried to hide the attributes by giving them private names and putting them behind properties, but then you've gone around behind the properties' backs and looked directly into the __dict__ where the real attributes are.
And what else could be there? Your actual _foo has to be stored somewhere on each instance. foo, on the other hand, isn't really a value; it's a property (a getter/setter pair) that lives on the class and uses that private attribute, so it isn't stored on the instance at all.
If you really want to use reflection to find all of the "public values" on an instance, you can do something like this:
import inspect  # at the top of the module

for attr, value in inspect.getmembers(self):
    if not attr.startswith('_') and not callable(value):
        yield attr, value
However, I think it would be much better to not do this reflectively. Simpler and cleaner options include:
Add a _fields = 'foo', 'bar', 'baz' class attribute and have the base class iterate over _fields (see the sketch after this list).
Write a decorator that registers a property, and have the base class iterate that registry.
Build something that lets you specify the attributes more declaratively and writes the boilerplate for you. See namedtuple, dataclass, and attrs for some inspiration.
Just use attrs (or, if you're not the OP but someone reading this from the future who can rely on 3.7+, dataclass) to do that work for you.
Rethink your design. A class whose instances iterate name-value pairs of their public attributes is weird in the first place. A "parameter object" that acted like a mapping to be used for keyword-splatting could be useful; one that acted like a normal iterable could be useful; but one that acts as an iterable of name-value pairs is useless for anything except passing to the dict constructor (at which point it's, again, simpler to be a mapping). Plus, a mixin really isn't helping you with the hard part of doing it. Whatever you actually need to do, ask for help on how to do that, instead of how to make this code that shouldn't work, work anyway.
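For example, the first option might look something like this (a sketch; _fields is just a naming convention you would define yourself):

class IterMixin:
    _fields = ()  # subclasses list their public attribute names here

    def __iter__(self):
        for attr in self._fields:
            yield attr, getattr(self, attr)

class Thing(IterMixin):
    _fields = ('foo', 'bar', 'baz')
    # ... __init__ and the properties as before ...

Since getattr goes through the properties, dict(Thing(...)) now yields the public names foo, bar, and baz, with no leading underscores.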
This is a two-part query, which broadly relates to class attributes referencing mutable and immutable objects, and how these should be dealt with in code design. I have abstracted away the details to provide an example class below.
In this example, the class is designed for two instances. Through an instance method, each instance can access a class attribute referencing a mutable object (a list, in this case) and "take" elements of that object into its own instance attribute by mutating it. If one instance "takes" an element of the class attribute, that element is subsequently unavailable to the other instance, which is the effect I wish to achieve. I find this a convenient way of avoiding the use of class methods, but is it bad practice?
Also in this example, there is a class method that reassigns an immutable object (a Boolean value, in this case) to a class attribute based on the state of an instance attribute. I can achieve this by using a class method with cls as the first argument and self as the second argument, but I’m not sure if this is correct. On the other hand, perhaps this is how I should be dealing with the first part of this query?
class Foo(object):
mutable_attr = ['1', '2']
immutable_attr = False
def __init__(self):
self.instance_attr = []
def change_mutable(self):
self.instance_attr.append(self.mutable_attr[0])
self.mutable_attr.remove(self.mutable_attr[0])
    @classmethod
def change_immutable(cls, self):
if len(self.instance_attr) == 1:
cls.immutable_attr = True
eggs = Foo()
spam = Foo()
If you want a class-level attribute (which, as you say, is "visible" to all instances of this class) using a class method like you show is fine. This is, mostly, a question of style and there are no clear answers here. So what you show is fine.
I just want to point out that you don't have to use a class method to accomplish your goal. To accomplish your goal this is also perfectly fine (and in my opinion, more standard):
class Foo(object):
# ... same as it ever was ...
def change_immutable(self):
"""If instance has list length of 1, change immutable_attr for all insts."""
if len(self.instance_attr) == 1:
type(self).immutable_attr = True
Or even:
def change_immutable(self):
"""If instance has list length of 1, change immutable_attr for all insts."""
if len(self.instance_attr) == 1:
Foo.immutable_attr = True
if that's what you want to do. The major point being that you are not forced into using a class method to get/set class level attributes.
The type builtin function (https://docs.python.org/2/library/functions.html#type) simply returns the class of an instance. For new style classes (most classes nowadays, ones that ultimately descend from object) type(self) is the same as self.__class__, but using type is the more idiomatic way to access an object's type.
You use type when you want to write code that gets an object's ultimate type, even if it's subclassed. This may or may not be what you want to do. For example, say you have this:
class Baz(Foo):
pass
bazzer = Baz()
bazzer.change_mutable()
bazzer.change_immutable()
Then the code:
type(self).immutable_attr = True
Changes the immutable_attr on the Baz class, not the Foo class. That may or may not be what you want -- just be aware that only objects that descend from Baz see this. If you want to make it visible to all descendants of Foo, then the more appropriate code is:
Foo.immutable_attr = True
Hope this helps -- this question is a good one but a bit open ended. Again, major point being you are not forced to use class methods to set/get class attrs -- but not that there's anything wrong with that either :)
Finally, note that the way you first wrote it:
@classmethod
def change_immutable(cls, self):
if len(self.instance_attr) == 1:
cls.immutable_attr = True
is equivalent to the
type(self).immutable_attr = True
way, because the cls variable will not necessarily be Foo if it's subclassed. If you for sure want to set it for all instances of Foo, then just setting the Foo class directly:
Foo.immutable_attr = True
is the way to go.
This is one possibility:
class Foo(object):
__mutable_attr = ['1', '2']
__immutable_attr = False
def __init__(self):
self.instance_attr = []
def change_mutable(self):
self.instance_attr.append(self.__class__.__mutable_attr.pop(0))
if len(self.instance_attr) == 1:
self.__class__.__immutable_attr = True
    @property
def immutable_attr(self):
return self.__class__.__immutable_attr
So a little bit of explanation:
1. I'm making it harder to access class attributes from the outside to protect them from accidental change by prefixing them with double underscore.
2. I'm doing pop() and append() in one line.
3. I'm setting the value for __immutable_attr immediately after modifying __mutable_attr if the condition is met.
4. I'm exposing immutable_attr as a read-only property to provide an easy way to check its value.
5. I'm using self.__class__ to access class of the instance - it's more readable than type(self) and gives us direct access to attributes with double underscore.
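A quick demonstration of the class above (hypothetical driver code):

eggs = Foo()
spam = Foo()

eggs.change_mutable()        # eggs takes '1'; the shared flag flips to True
spam.change_mutable()        # spam takes '2' -- '1' is already gone

print(eggs.instance_attr)    # ['1']
print(spam.instance_attr)    # ['2']
print(spam.immutable_attr)   # True -- read-only property, shared by both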
It's possible to use type in Python to create a new class object, as you probably know:
A = type('A', (object,), {})
a = A() # create an instance of A
What I'm curious about is whether there's any problem with creating different class objects with the same name, eg, following on from the above:
B = type('A', (object,), {})
In other words, is there an issue with this second class object, B, having the same name as our first class object, A?
The motivation for this is that I'd like to get a clean copy of a class to apply different decorators to without using the inheritance approach described in this question.
So I'd like to define a class normally, eg:
class Fruit(object):
pass
and then make a fresh copy of it to play with:
def copy_class(cls):
return type(cls.__name__, cls.__bases__, dict(cls.__dict__))
FreshFruit = copy_class(Fruit)
In my testing, things I do with FreshFruit are properly decoupled from things I do to Fruit.
However, I'm unsure whether I should also be mangling the name in copy_class in order to avoid unexpected problems.
In particular, one concern I have is that this could cause the class to be replaced in the module's dictionary, such that future imports (e.g., from module import Fruit) return the copied class.
There is no reason you can't have two classes with the same __name__ in the same module, if you have a good reason to do so.
E.g., in your example, from module import Fruit -- Python doesn't care at all about the __name__ of the class; it looks in the module's globals for Fruit and imports whatever it finds there.
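A tiny demonstration that __name__ is just data, not identity:

A = type('A', (object,), {})
B = type('A', (object,), {})      # same __name__, distinct class object

print(A is B)                     # False
print(A.__name__ == B.__name__)   # True
print(isinstance(A(), B))         # False -- identity matters, not the name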
Note that, in general, this approach isn't great if you're using super (although the same can be said for class decorators ...):
class A(Base):
def foo(self):
super(A, self).foo()
B = copy_class(A)
In this case, when B.foo is called, it ends up calling super(A, self), where self is an instance of B -- and since B is a copy of A rather than a subclass of it, that raises a TypeError.
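Concretely, using the copy_class from the question (Base here is a hypothetical stand-in):

class Base(object):
    def foo(self):
        print('Base.foo')

class A(Base):
    def foo(self):
        super(A, self).foo()   # 'A' is looked up globally at call time

B = copy_class(A)
B().foo()   # TypeError: super(type, obj): obj must be an instance or
            # subtype of type -- a B instance is not an A instance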
Consider this code snippet:
import gc
from weakref import ref
def leak_class(create_ref):
class Foo(object):
# make cycle non-garbage collectable
def __del__(self):
pass
if create_ref:
# create a strong reference cycle
Foo.bar = Foo()
return ref(Foo)
# without reference cycle
r = leak_class(False)
gc.collect()
print r() # prints None
# with reference cycle
r = leak_class(True)
gc.collect()
print r() # prints <class '__main__.Foo'>
It creates a reference cycle that cannot be collected, because the referenced instance has a __del__ method. The cycle is created here:
# create a strong reference cycle
Foo.bar = Foo()
This is just a proof of concept; the reference could be added by some external code, a descriptor, or anything else. If that's not clear to you, remember that each object maintains a reference to its class:
+-------------+ +--------------------+
| | Foo.bar | |
| Foo (class) +------------>| foo (Foo instance) |
| | | |
+-------------+ +----------+---------+
^ |
| foo.__class__ |
+--------------------------------+
If I could guarantee that Foo.bar is only accessed from Foo, the cycle wouldn't be necessary, as theoretically the instance could hold only a weak reference to its class.
Can you think of a practical way to make this work without a leak?
As some are asking why external code would modify a class whose lifecycle it can't control, consider this example, similar to the real-life case I was working on:
class Descriptor(object):
def __get__(self, obj, kls=None):
if obj is None:
try:
obj = kls._my_instance
except AttributeError:
obj = kls()
kls._my_instance = obj
return obj.something()
# usage example #
class Example(object):
foo = Descriptor()
def something(self):
return 100
print Example.foo
In this code only Descriptor (a non-data descriptor) is part of the API I'm implementing. Example class is an example of how the descriptor would be used.
Why does the descriptor store a reference to an instance inside the class itself? Basically for caching purposes. Descriptor required this contract with the implementor: it would be used in any class assuming that
The class has a constructor with no args, that gives an "anonymous instance" (my definition)
The class has some behavior-specific methods (something here).
An instance of the class can stay alive for an undefined amount of time.
It doesn't assume anything about:
How long it takes to construct an object
Whether the class implements __del__ or other magic methods
How long the class is expected to live
Moreover the API was designed to avoid any extra load on the class implementor. I could have moved the responsibility for caching the object to the implementor, but I wanted a standard behavior.
There actually is a simple solution to this problem: make the default behavior to cache the instance (like it does in this code) but allow the implementor to override it if they have to implement __del__.
Of course this wouldn't be as simple if we assumed that the class state had to be preserved between calls.
As a starting point, I was coding a "weak object", an implementation of object that only kept a weak reference to its class:
from weakref import proxy
def make_proxy(strong_kls):
kls = proxy(strong_kls)
class WeakObject(object):
def __getattribute__(self, name):
try:
attr = kls.__dict__[name]
except KeyError:
raise AttributeError(name)
try:
return attr.__get__(self, kls)
except AttributeError:
return attr
def __setattr__(self, name, value):
# TODO: implement...
pass
return WeakObject
Foo.bar = make_proxy(Foo)()
It appears to work for a limited number of use cases, but I'd have to reimplement the whole set of object methods, and I don't know how to deal with classes that override __new__.
For your example, why don't you store _my_instance in a dict on the descriptor class, rather than on the class holding the descriptor? You could use a weakref or WeakValueDictionary in that dict, so that when the object disappears the dict will just lose its reference and the descriptor will create a new one on the next access.
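That suggestion might look something like the sketch below (reusing the question's names; note the trade-off it implies: since nothing else holds the cached instance, it can be collected between accesses and will simply be rebuilt on the next one):

from weakref import WeakValueDictionary

class Descriptor(object):
    def __init__(self):
        # cache keyed by the owning class, held outside that class;
        # weak values mean no uncollectable cycle even if the cached
        # instance defines __del__
        self._cache = WeakValueDictionary()

    def __get__(self, obj, kls=None):
        if obj is None:
            try:
                obj = self._cache[kls]
            except KeyError:
                obj = self._cache[kls] = kls()
        return obj.something()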
Edit: I think you have a misunderstanding about the possibility of collecting the class while the instance lives on. Methods in Python are stored on the class, not the instance (barring peculiar tricks). If you have an object obj of class Class, and you allowed Class to be garbage collected while obj still exists, then calling a method obj.meth() on the object would fail, because the method would have disappeared along with the class. That is why your only option is to weaken your class->obj reference; even if you could make objects weakly reference their class, all it would do is break the class if the weakness ever "took effect" (i.e., if the class were collected while an instance still existed).
The problem you're facing is just a special case of the general ref-cycle-with-__del__ problem.
I don't see anything unusual in the way the cycles are created in your case, which is to say, you should resort to the standard ways of avoiding the general problem.
I think implementing and using a weak object would be hard to get right, and you would still need to remember to use it in all places where you define __del__. It doesn't sound like the best approach.
Instead, you should try the following:
consider not defining __del__ in your class (recommended)
in classes which define __del__, avoid reference cycles (in general, it might be hard/impossible to make sure no cycles are created anywhere in your code. In your case, seems like you want the cycles to exist)
explicitly break the cycles, using del (if there are appropriate points to do that in your code)
scan the gc.garbage list, and explicitly break reference cycles (using del)
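For the last option, a rough sketch against the question's leak_class example (Python 2 semantics; on Python 3.4+, PEP 442 makes such cycles collectable and gc.garbage stays empty):

import gc

gc.collect()                  # uncollectable cycles land in gc.garbage
for obj in gc.garbage:
    # the Foo instance (the object with __del__) is what shows up here;
    # drop the class -> instance edge that completes the cycle
    if type(obj).__name__ == 'Foo':
        type(obj).bar = None
del gc.garbage[:]             # release gc.garbage's own strong references
gc.collect()                  # the broken cycle can now be reclaimed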
Suppose we have the following code:
class A:
var = 0
a = A()
I do understand that a.var and A.var are different variables, and I think I understand why this happens. I thought it was just a side effect of Python's data model, since why would someone want to modify a class variable on an instance?
However, today I came across a strange example of such a usage: it is in google app engine db.Model reference. Google app engine datastore assumes we inherit db.Model class and introduce keys as class variables:
class Story(db.Model):
title = db.StringProperty()
body = db.TextProperty()
created = db.DateTimeProperty(auto_now_add=True)
s = Story(title="The Three Little Pigs")
I don't understand why they expect me to do it like that. Why not introduce a constructor and use only instance variables?
The db.Model class is a 'Model' style class in classic Model View Controller design pattern.
Each of the assignments in there are actually setting up columns in the database, while also giving an easy to use interface for you to program with. This is why
title="The Three Little Pigs"
will update the object as well as the column in the database.
There is a constructor (no doubt in db.Model) that handles this pass-off logic, and it will take a keyword args list and digest it to create this relational model.
This is why the variables are setup the way they are, so that relation is maintained.
Edit: Let me describe that better. A normal class just sets up the blueprint for an object: it has instance variables and class variables. Because of the inheritance from db.Model, this is actually doing a third thing: setting up column definitions in a database. In order to do this third task it is making EXTENSIVE behind-the-scenes changes to things like attribute setting and getting. Pretty much once you inherit from db.Model you aren't really a plain class anymore, but a DB template. Long story short, this is a VERY specific edge case of the use of a class.
If all the variables were declared as instance variables, then classes using Story as a superclass would inherit nothing from it.
From the Model and Property docs, it looks like Model has overridden __getattr__ and __setattr__ methods so that, in effect, "Story.title = ..." does not actually set the instance attribute; instead it sets the value stored with the instance's Property.
If you ask for story.__dict__['title'], what does it give you?
I do understand that a.var and A.var are different variables
First off: as of now, no, they aren't.
In Python, everything you declare inside the class block belongs to the class. You can look up attributes of the class via the instance, if the instance doesn't already have something with that name. When you assign to an attribute of an instance, the instance now has that attribute, regardless of whether it had one before. (__init__, in this regard, is just another function; it's called automatically by Python's machinery, but it simply adds attributes to an object, it doesn't magically specify some kind of template for the contents of all instances of the class - there's the magic __slots__ class attribute for that, but it still doesn't do quite what you might expect.)
But right now, a has no .var of its own, so a.var refers to A.var. And you can modify a class attribute via an instance - but note modify, not replace. This requires, of course, that the original value of the attribute is something modifiable - a list qualifies, a str doesn't.
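A short illustration of that modify-versus-replace distinction:

class A:
    items = []    # mutable class attribute
    count = 0     # immutable class attribute

a = A()
a.items.append(1)   # modifies the shared list in place
print(A.items)      # [1] -- the change is visible through the class

a.count += 1        # really a.count = a.count + 1: this *replaces*,
                    # creating an instance attribute that shadows the class one
print(a.count)      # 1
print(A.count)      # 0 -- the class attribute is untouched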
Your GAE example, though, is something totally different. The class Story has attributes which specifically are "properties", which can do assorted magic when you "assign to" them. This works by using the class' __getattr__, __setattr__ etc. methods to change the behaviour of the assignment syntax.
The other answers have it mostly right, but miss one critical thing.
If you define a class like this:
class Foo(object):
a = 5
and an instance:
myinstance = Foo()
Then Foo.a and myinstance.a are the very same variable. Changing one will change the other, and if you create multiple instances of Foo, the .a property on each will be the same variable. This is because of the way Python resolves attribute access: First it looks in the object's dict, and if it doesn't find it there, it looks in the class's dict, and so forth.
That also helps explain why assignments don't work the way you'd expect given the shared nature of the variable:
>>> bar = Foo()
>>> baz = Foo()
>>> Foo.a = 6
>>> bar.a = 7
>>> bar.a
7
>>> baz.a
6
What happened here is that when we assigned to Foo.a, it modified the variable that all instances of Foo normally resolve when you ask for instance.a. But when we assigned to bar.a, Python created a new variable on that instance called a, which now masks the class variable -- from now on, that particular instance will always see its own local value.
If you wanted each instance of your class to have a separate variable initialized to 5, the normal way to do it would be like this:
class Foo(object):
    def __init__(self):
        self.a = 5
That is, you define a class with a constructor that sets the a variable on the new instance to 5.
Finally, what App Engine is doing is an entirely different kind of black magic called descriptors. In short, Python allows objects to define special __get__ and __set__ methods. When an instance of a class that defines these special methods is attached to another class as an attribute, and you create an instance of that other class, attempts to access the attribute will, instead of setting or returning the instance or class variable, call the special __get__ and __set__ methods. A much more comprehensive introduction to descriptors can be found here, but here's a simple demo:
class MultiplyDescriptor(object):
def __init__(self, multiplicand, initial=0):
self.multiplicand = multiplicand
self.value = initial
def __get__(self, obj, objtype):
if obj is None:
return self
return self.multiplicand * self.value
def __set__(self, obj, value):
self.value = value
Now you can do something like this:
class Foo(object):
a = MultiplyDescriptor(2)
bar = Foo()
bar.a = 10
print bar.a # Prints 20!
Descriptors are the secret sauce behind a surprising amount of the Python language. For instance, property is implemented using descriptors, as are methods, static and class methods, and a bunch of other stuff.
These class variables are metadata that Google App Engine uses to generate its models.
FYI, in your example, a.var == A.var.
>>> class A:
...     var = 0
...
>>> a = A()
>>> A.var = 3
>>> a.var == A.var
True