Given the example:
>>> import gc
>>> d = { 1 : object() }
>>> gc.get_referrers(d[1])
[] # Python 2.7
[{1: <object object at 0x003A0468>}] # Python 2.5
Why is d not listed as refererrer to to the object?
EDIT1: Although the dict in d references the object, why is the dictionairy not listed?
The doc mentions that:
This function will only locate those containers which support garbage
collection; extension types which do refer to other objects but do not
support garbage collection will not be found.
Seems that dictionaries do not support it.
And here is why:
The garbage collector tries to avoid tracking simple containers which
can’t be part of a cycle. In Python 2.7, this is now true for tuples
and dicts containing atomic types (such as ints, strings, etc.).
Transitively, a dict containing tuples of atomic types won’t be
tracked either. This helps reduce the cost of each garbage collection
by decreasing the number of objects to be considered and traversed by
the collector.
— From What's new in Python 2.7
It seems that object() is considered an atomic type, and trying this with an instance of a user-defined class (that is, not object) confirms this as your code now works.
# Python 2.7
>>> class A(object): pass
>>> r = A()
>>> d = {1: r}
>>> del r
>>> gc.get_referrers(d[1])
[{1: <__main__.A instance at 0x0000000002663708>}]
See also issue 4688.
This is a change in how objects are tracked in Python 2.7; tuples and dictionaries containing only atomic types (including instances of object()), which would never require cycle breaking, are not listed anymore.
See http://bugs.python.org/issue4688; this was implemented to avoid a performance issues with creating loads of tuples or dictionaries.
The work-around is to add an object to your dictionary that does need tracking:
>>> gc.is_tracked(d)
False
>>> class Foo(object): pass
...
>>> d['_'] = Foo()
>>> gc.is_tracked(d)
True
>>> d in gc.get_referrers(r)
True
Once tracked, a dictionary only goes back to being untracked after a gc collection cycle:
>>> del d['_']
>>> gc.is_tracked(d)
True
>>> d in gc.get_referrers(r)
True
>>> gc.collect()
0
>>> gc.is_tracked(d)
False
>>> d in gc.get_referrers(r)
False
Related
Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))
Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))
Why does CPython (no clue about other Python implementations) have the following behavior?
tuple1 = ()
tuple2 = ()
dict1 = {}
dict2 = {}
list1 = []
list2 = []
# makes sense, tuples are immutable
assert(id(tuple1) == id(tuple2))
# also makes sense dicts are mutable
assert(id(dict1) != id(dict2))
# lists are mutable too
assert(id(list1) != id(list2))
assert(id(()) == id(()))
# why no assertion error on this?
assert(id({}) == id({}))
# or this?
assert(id([]) == id([]))
I have a few ideas why it may, but can't find a concrete reason why.
EDIT
To further prove Glenn's and Thomas' point:
[1] id([])
4330909912
[2] x = []
[3] id(x)
4330909912
[4] id([])
4334243440
When you call id({}), Python creates a dict and passes it to the id function. The id function takes its id (its memory location), and throws away the dict. The dict is destroyed. When you do it twice in quick succession (without any other dicts being created in the mean time), the dict Python creates the second time happens to use the same block of memory as the first time. (CPython's memory allocator makes that a lot more likely than it sounds.) Since (in CPython) id uses the memory location as the object id, the id of the two objects is the same. This obviously doesn't happen if you assign the dict to a variable and then get its id(), because the dicts are alive at the same time, so their id has to be different.
Mutability does not directly come into play, but code objects caching tuples and strings do. In the same code object (function or class body or module body) the same literals (integers, strings and certain tuples) will be re-used. Mutable objects can never be re-used, they're always created at runtime.
In short, an object's id is only unique for the lifetime of the object. After the object is destroyed, or before it is created, something else can have the same id.
CPython is garbage collecting objects as soon as they go out of scope, so the second [] is created after the first [] is collected. So, most of the time it ends up in the same memory location.
This shows what's happening very clearly (the output is likely to be different in other implementations of Python):
class A:
def __init__(self): print("a")
def __del__(self): print("b")
# a a b b False
print(A() is A())
# a b a b True
print(id(A()) == id(A()))
#!/usr/bin/env python
def modify_dict():
d['two'] = 2
d = {'one':1}
modify_dict()
print d
I get
$ ./globaltest.py
{'two': 2, 'one': 1}
I was hoping to see only {'one':1} since d is not declared global within the function. Why did d get both key-value pairs ?
Take a look at the data model of python. Dictionaries and lists are mutable objects, which is why globally defined dictionaries for example do not need to be declared global. Their contents may be changed at any time.
To understand mutability, think of the strings in python. They are an immutable object. You can for example replace the contents of a given string but in doing so the interpreter creates a new string object, and thus gives this string object a new identity (and thus a memory address).
>>> s = "foo"
>>> id(s)
140202745404072
>>> s = "bar"
>>> id(s)
140202745404112
I've answered a few similar questions before, so take a look if you can find more information from them.
Python search for variables is based on the LEGB rule:
Local, Enclosing functions, Global, Built-in
When you call your function it tries to find a variable named d, and will find in global scope since you created d before calling the function. And since d is mutable, it will get updated.
In a quick workaround is copy dictionary in local scope of the function.
import copy
d = {'one':1}
def modify_dict():
local_d = copy.deepcopy(d)
local_d['two'] = 2
print local_d
modify_dict()
print d
you will see output following :
>>>{'two': 2, 'one': 1}
>>>{'one': 1}
I think the best way to explain the situation is with an example:
>>> class Person:
... def __init__(self, brother=None):
... self.brother = brother
...
>>> bob = Person()
>>> alice = Person(brother=bob)
>>> import shelve
>>> db = shelve.open('main.db', writeback=True)
>>> db['bob'] = bob
>>> db['alice'] = alice
>>> db['bob'] is db['alice'].brother
True
>>> db['bob'] == db['alice'].brother
True
>>> db.close()
>>> db = shelve.open('main.db',writeback=True)
>>> db['bob'] is db['alice'].brother
False
>>> db['bob'] == db['alice'].brother
False
The expected output for both comparisons is True again. However, pickle (which is used by shelve) seems to be re-instantiating bob and alice.brother separately. How can I "fix" this using shelve/pickle? Is it possible for db['alice'].brother to point to db['bob'] or something similar? Notice I do not want only to compare both, I need both to actually be the same.
As suggested by Blckknght I tried pickling the entire dictionary at once, but the problem persists since it seems to pickle each key separately.
I believe that the issue you're seeing comes from the way the shelve module stores its values. Each value is pickled independently of the other values in the shelf, which means that if the same object is inserted as a value under multiple keys, the identity will not be preserved between the keys. However, if a single value has multiple references to the same object, the identity will be maintained within that single value.
Here's an example:
a = object() # an arbitrary object
db = shelve.open("text.db")
db['a'] = a
db['another_a'] = a
db['two_a_references'] = [a, a]
db.close()
db = shelve.open("text.db") # reopen the db
print(db['a'] is db['another_a']) # prints False
print(db['two_a_references'][0] is db['two_a_references'][1]) # prints True
The first print tries to confirm the identity of two versions of the object a that were inserted in the database, one under the key 'a' directly, and another under 'another_a'. It doesn't work because the separate values are pickled separately, and so the identity between them was lost.
The second print tests whether the two references to a that were stored under the key 'two_a_references' were maintained. Because the list was pickled in one go, the identity is kept.
So to address your issue you have a few options. One approach is to avoid testing for identity and rely on an __eq__ method in your various object types to determine if two objects are semantically equal, even if they are not the same object. Another would be to bundle all your data into a single object (e.g. a dictionary) which you'd then save with pickle.dump and restore with pickle.load rather than using shelve (or you could adapt this recipe for a persistent dictionary, which is linked from the shelve docs, and does pretty much that).
The appropriate way, in Python, is to implement the __eq__ and __ne__ functions inside of the Person class, like this:
class Person(object):
def __eq__(self, other):
return (isinstance(other, self.__class__)
and self.__dict__ == other.__dict__)
def __ne__(self, other):
return not self.__eq__(other)
Generally, that should be sufficient, but if these are truly database objects and have a primary key, it would be more efficient to check that attribute instead of self.__dict__.
Problem
To preserve identity with shelve you need to preserve identity with pickleread this.
Solution
This class saves all the objects on its class site and restores them if the identity is the same. You should be able to subclass from it.
>>> class PickleWithIdentity(object):
identity = None
identities = dict() # maybe use weakreference dict here
def __reduce__(self):
if self.identity is None:
self.identity = os.urandom(10) # do not use id() because it is only 4 bytes and not random
self.identities[self.identity] = self
return open_with_identity, (self.__class__, self.__dict__), self.__dict__
>>> def open_with_identity(cls, dict):
if dict['identity'] in cls.identities:
return cls.identities[dict['identity']]
return cls()
>>> p = PickleWithIdentity()
>>> p.asd = 'asd'
>>> import pickle
>>> import os
>>> pickle.loads(pickle.dumps(p))
<__main__.PickleWithIdentity object at 0x02D2E870>
>>> pickle.loads(pickle.dumps(p)) is p
True
Further problems can occur because the state may be overwritten:
>>> p.asd
'asd'
>>> ps = pickle.dumps(p)
>>> p.asd = 123
>>> pickle.loads(ps)
<__main__.PickleWithIdentity object at 0x02D2E870>
>>> p.asd
'asd'