I am trying to understand how refcounts work in Python. Can someone explain why the count is printed as 2 when there is just one reference to the object? (Is it because of the temporary reference passed to getrefcount?) Also, why is the number higher when getrefcount is invoked from a method? (Is the self reference bumping the refcount?)
import sys
class A:
    def print_ref_count(self):
        print sys.getrefcount(self)
a = A()
print sys.getrefcount(a)  # prints 2
a.print_ref_count()       # prints 4
print sys.getrefcount(a)  # prints 2
There are implicit reference increments everywhere in Python. When you pass a as an argument to a function, it is incref-ed, as getrefcount's own docstring notes:
The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to getrefcount().
The two additional references when you invoke the method are, respectively:
The reference held by the name self within the method
The implicit reference created when Python builds a bound method object specific to that instance as soon as you evaluate a.print_ref_count (even before actually calling it). a.print_ref_count is a temporary bound method, which must hold references to both the instance and the underlying function (you could do to_print = a.print_ref_count, then del a, and to_print must still work even though the name a no longer references the instance).
Getting hung up on the reference counts is not very helpful though; it's an implementation detail of Python (CPython specifically; other Python interpreters can and do use non-reference counting garbage collection), and as you can see, many implicit references are created that may not actually be stored in named variables at all.
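The bound-method point can be checked without relying on exact refcount values. The following is a minimal sketch for Python 3, assuming CPython-style reference counting; the names A, hello and probe are illustrative and not from the question. A weak reference serves as a probe that observes the instance without keeping it alive.

```python
import gc
import weakref

class A:
    def hello(self):
        return "still alive"

a = A()
probe = weakref.ref(a)   # weak reference: observes the instance without keeping it alive
bound = a.hello          # the bound method stores a strong reference to the instance
del a                    # removes the name, not the object

alive_while_bound = probe() is not None   # the bound method keeps the instance alive
call_result = bound()                     # the method still works without the name a
same_object = bound.__self__ is probe()   # __self__ is the instance the method is bound to

del bound                # drop the last strong reference
gc.collect()             # not needed on CPython; covers collectors that free lazily
alive_after = probe() is not None
```

Once the bound method is gone, nothing refers to the instance any more, so the probe returns None.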
Related
I have the following function:
def myfn():
    big_obj = BigObj()
    result = consume(big_obj)
    return result
When is the reference count for the value of BigObj() increased / decreased:
Is it:
when consume(big_obj) is called (since big_obj is not referenced afterwards in myfn)
when the function returns
at some other point that I don't know yet
Would it make a difference to change the last line to:
return consume(big_obj)
Edit (clarification for comments):
A local variable exists until the function returns
the reference can be deleted with del obj
But what about temporaries (e.g. f1(f2()))?
I checked references to temporaries with this code:
import sys
def f2(c):
    print("f2: References to c", sys.getrefcount(c))
def f0():
    print("f0")
    f2(object())
def f1():
    c = object()
    print("f1: References to c", sys.getrefcount(c))
    f2(c)
f0()
f1()
This prints:
f0
f2: References to c 3
f1: References to c 2
f2: References to c 4
It seems that references to temporaries are held. Note that getrefcount gives one more than you would expect, because it holds a reference too.
When is the reference count for big_obj decreased
big_obj does not have a reference count. Variables don't have reference counts. Values do.
big_obj = BigObj()
This line of code creates an instance of the BigObj class. The reference count for that instance may increase or decrease multiple times, depending on the implementation details of that creation process (which is not necessarily written in Python). Notably, though, the assignment to the name big_obj increases the reference count.
when the function returns
At this point, the name big_obj ceases to exist - the name does not disappear simply because it won't be used again. (That's really hard to detect in the general case, and there isn't a particular benefit to it normally). If you must cause a name to cease to exist at a specific point in the operation (for example, because you know that is the last reference and want to trigger garbage collection; or perhaps because you are doing something tricky with __weakref__) then that is what the del statement is for.
Because a name for the object ceases to exist, its reference count decreases. If and when that count reaches zero, the object is garbage collected. It may have any number of references stored in other places, for a wide variety of reasons. (For example, there could be a bug in C code that implements the class; or the class might deliberately maintain its own list of every instance ever created.)
Note that all of the above pertains specifically to the reference implementation. In other implementations, things will differ. There might be some other trigger for garbage collection to happen. There might not be reference counting at all (as with Jython).
From the comments, it seems like what you are worried about is the potential for a memory leak. The code that you show cannot cause a memory leak - but neither can it fix a memory leak caused elsewhere. In Python, as in garbage-collected languages in general, memory leaks happen because objects hold on to references to each other that aren't needed. But there is no concept of "ownership" or "transfer" of references normally - all you need to do is not do things like "maintain a list of every instance ever created" without a) a good reason and b) a way to take instances out of that list when you want to forget about them.
A local variable, though, by definition, cannot extend the lifetime of an object beyond the local scope.
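The lifetime rules described in this answer can be observed with a weak reference. This is a sketch under the assumption of CPython's reference counting; BigObj and consume are stand-ins for the names in the question, and the release_early flag is hypothetical, added only to compare the two cases.

```python
import gc
import weakref

class BigObj:
    pass

def consume(obj):
    return "result"

def myfn(release_early):
    big_obj = BigObj()
    probe = weakref.ref(big_obj)    # observes the object without keeping it alive
    consume(big_obj)
    if release_early:
        del big_obj                 # the name ceases to exist here
        gc.collect()                # not needed on CPython; harmless elsewhere
    alive_before_return = probe() is not None
    return alive_before_return, probe

kept_alive, probe_a = myfn(release_early=False)   # name survives until return
released, probe_b = myfn(release_early=True)      # name removed with del
gc.collect()
```

Without del, the object stays alive until myfn returns, even though big_obj is never used again after consume. In both cases the object is gone once the function has returned.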
Disclaimer: Most of this information is from the comments, so credit goes to everyone who participated in the discussion.
When an object is deleted is an implementation detail in general.
I will refer to CPython, which is based on reference counting. I ran the code examples with CPython 3.10.0.
An object is deleted when its reference count hits zero.
Returning from a function deletes all local references.
Assigning a name to a new value decreases the reference count of the old value.
Passing a local as an argument increases the reference count. The reference is held on the stack (frame).
Returning from a function removes the reference from the stack.
The last point holds even for temporary references like f(g()). The last reference to the result of g() is deleted when f returns (assuming that g does not save a reference somewhere).
So for the example from the question:
def myfn():
    big_obj = BigObj()          # reference 1
    result = consume(big_obj)   # reference 2 on the stack frame for
                                # consume. Not yet counting any
                                # reference inside of consume.
                                # After consume returns: the stack frame
                                # and reference 2 are deleted. Reference
                                # 1 remains.
    return result               # myfn returns; reference 1 is deleted.
                                # BigObj is deleted.
def consume(big_obj):
    pass                        # consume is holding reference 3
If we changed this to:
def myfn():
    return consume(BigObj())    # reference is still saved on the stack
                                # frame and will only be deleted after
                                # consume returns
def consume(big_obj):
    pass                        # consume is holding reference 2
How can I check reliably, if an object was deleted?
You cannot rely on gc.get_objects(). The gc module is used to detect and recycle reference cycles, and not every object is tracked by it.
You can create a weak reference and check if the reference is still valid.
import sys
import weakref
class BigObj:
    pass
ref = None
def make_ref(obj):
    global ref
    ref = weakref.ref(obj)
    return obj
def myfn():
    return consume(make_ref(BigObj()))
def consume(obj):
    obj = None  # remove this line to see the impact on the ref count
    print(sys.getrefcount(ref()))
    print(ref())  # There is still a valid reference. It is the one from the consume stack frame
myfn()
How to pass a reference to a function and remove all references in the calling function?
You can box the reference, pass the box to the function, and clear the boxed reference from inside the function:
class Ref:
    def __init__(self, ref):
        self.ref = ref
    def clear(self):
        self.ref = None
def f1(ref):
    r = ref.ref
    ref.clear()
def f2():
    f1(Ref(object()))
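To see that boxing really hands over the only strong reference, here is a small sketch assuming CPython. Box, take, Payload, consumer and caller are hypothetical names chosen for this illustration; take combines the read and the clear into one step.

```python
import gc
import weakref

class Payload:
    pass

class Box:
    """Holds one reference that the callee can take over."""
    def __init__(self, value):
        self.value = value

    def take(self):
        # Hand the value to the caller and forget it in the box.
        value, self.value = self.value, None
        return value

def consumer(box):
    payload = box.take()            # the box no longer refers to the payload
    probe = weakref.ref(payload)
    del payload                     # drop the only remaining strong reference
    gc.collect()                    # not needed on CPython; harmless elsewhere
    return probe() is None          # True if the payload was freed inside consumer

def caller():
    box = Box(Payload())
    freed_inside = consumer(box)
    return freed_inside, box.value

freed_inside, leftover = caller()
```

The caller still holds the box, but the box is empty, so the payload can be released before consumer returns.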
Variables have function scope in Python, so they aren't removed until the function returns. As far as I can tell, you can't destroy a reference to a local variable in a function from outside that function. I added some gc calls in the example code to test this.
import gc
class BigObj:
    pass
def consume(obj):
    del obj  # only deletes the local reference to obj, but another one still exists in the calling function
def myfn():
    big_obj = BigObj()
    big_obj_id = id(big_obj)  # in CPython, this is the memory address of the object
    consume(big_obj)
    print(any(id(obj) == big_obj_id for obj in gc.get_objects()))
    return big_obj_id
>>> big_obj_id = myfn()
True
>>> gc.collect()  # I'm not sure where the reference cycle is, but I needed to run this to clean out the big object from the gc's list of objects in my shell
>>> print(any(id(obj) == big_obj_id for obj in gc.get_objects()))
False
Since True was printed, the big object still existed after we forced garbage collection to occur even though there were no references to that variable after that point in the function. Forcing garbage collection after the function returns rightfully determines that the reference count to the big object is 0, so it cleans that object up. NOTE: As the comments below point out, ids for deleted objects may be reused so checking for equal ids may result in false positives. However, I'm confident that the conclusion is still correct.
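Because ids can be reused, one alternative that avoids comparing ids is weakref.finalize, which registers a callback that runs when the object is reclaimed. The sketch below assumes CPython, where the callback fires as soon as the last reference disappears; the log list and its messages are illustrative only.

```python
import gc
import weakref

class BigObj:
    pass

log = []

def consume(obj):
    del obj                                  # removes only consume's local name
    log.append("consume dropped its reference")

def myfn():
    big_obj = BigObj()
    # Register a callback that runs when big_obj is reclaimed.
    weakref.finalize(big_obj, log.append, "big_obj freed")
    consume(big_obj)
    log.append("myfn about to return")

myfn()
gc.collect()                                 # not needed on CPython; harmless elsewhere
```

The order of the entries in log shows that the object outlives the call to consume and is only freed when myfn returns.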
One thing you can do to reclaim that memory earlier is to make the big object global, which could allow you to delete it from within the called function.
def consume():
    # do whatever you need to do with the big object
    big_obj_id = id(big_obj)
    del globals()["big_obj"]
    print(any(id(obj) == big_obj_id for obj in gc.get_objects()))
    # do anything else you need to do without the big object
def myfn():
    globals()["big_obj"] = BigObj()
    result = consume()
    return result
>>> myfn()
False
This sort of pattern is pretty weird and likely very hard to maintain though, so I would advise against using this. If you only need to delete the big object right after consume() is called, you could do something like this in order to free up the memory used by the big object as soon as possible.
big_obj = BigObj()
consume(big_obj)
del big_obj
Another strategy you might try is deleting the references within the big object that's passed in from the consume() function with del big_obj.x for some attribute x.
I'm trying to create a class with a static list that collects all new instances of the class. The problem I'm facing is that as soon as I use a list the same way as, for example, an integer, the magic method __del__ no longer seems to work.
My Example:
class MyClass(object):
    count = 0
    #instances = []
    def __init__(self, a, b):
        self.a = a
        self.b = b
        MyClass.count += 1
        #MyClass.instances.append(self)
    def __str__(self):
        return self.__repr__()
    def __repr__(self):
        return "a: " + str(self.a) + ", b: " + str(self.b)
    def __del__(self):
        MyClass.count -= 1
        #MyClass.instances.remove(self)
A = MyClass(1,'abc')
B = MyClass(2,'def')
print MyClass.count
del B
print MyClass.count
With comments I get the correct answer:
2
1
But without the comments - including now the static object list MyClass.instances I get the wrong answer:
2
2
It seems like MyClass can't reach its __del__ method anymore! How Come?
From the docs,
del x doesn’t directly call x.__del__() — the former decrements the reference
count for x by one, and the latter is only called when x‘s reference count
reaches zero.
When you uncomment,
instances = []
...
...
MyClass.instances.append(self)
You are storing a reference to the current object in MyClass.instances. That means the object's reference count is incremented by 1, so del B does not bring it down to zero. That is why __del__ is not called.
To resolve this problem, explicitly remove the item from the list like this
MyClass.instances.remove(B)
del B
Now it will print
2
1
as expected.
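The effect of the class-level list can be reproduced in a few lines. This is a Python 3 sketch assuming CPython; Tracked and its finalized list are hypothetical names used only to record when __del__ runs.

```python
import gc

class Tracked:
    instances = []      # strong references to every instance
    finalized = []      # names recorded by __del__

    def __init__(self, name):
        self.name = name
        Tracked.instances.append(self)

    def __del__(self):
        Tracked.finalized.append(self.name)

b = Tracked("B")
del b                                   # the list still refers to the object
gc.collect()
after_del = list(Tracked.finalized)     # snapshot: __del__ has not run yet

Tracked.instances.clear()               # drop the last strong reference
gc.collect()
after_clear = list(Tracked.finalized)   # snapshot: now __del__ has run
```

Deleting the name alone changes nothing; __del__ runs only after the list gives up its reference.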
There is one more way to fix this problem. That is to use weakref. From the docs,
A weak reference to an object is not enough to keep the object alive:
when the only remaining references to a referent are weak references,
garbage collection is free to destroy the referent and reuse its
memory for something else. A primary use for weak references is to
implement caches or mappings holding large objects, where it’s desired
that a large object not be kept alive solely because it appears in a
cache or mapping.
So, having a weakref will not postpone object's deletion. With weakref, this can be fixed like this
MyClass.instances.append(weakref.ref(self))
...
...
# MyClass.instances.remove(weakref.ref(self))
MyClass.instances = [w_ref for w_ref in MyClass.instances if w_ref() is not None]
Instead of using the remove method, we can call each of the weakref objects; if one returns None, its referent is already dead. So we filter those out with the list comprehension and keep only the live ones.
So now, when you say del B, even though weakrefs exist for B, it will call __del__ (unless you have made some other variable point to the same object, for example through an assignment).
From http://docs.python.org/2.7/reference/datamodel.html#basic-customization I quote (paragraph in gray after object.__del__):
del x doesn’t directly call x.__del__() — the former decrements the reference count for x by one, and the latter is only called when x‘s reference count reaches zero.
Here you call del B, but there is still a reference to that object in MyClass.instances, so the object is still referenced and hence not destroyed, and the __del__ method is not called.
If you call B.__del__() directly, the count is updated, but that only runs the method body; it does not destroy the object.
__del__ is only called when no more references to the object are left.
You should consider putting only weak refs into the MyClass.instances list.
This can be achieved with import weakref and then
either using a WeakSet for the list
or putting weakref.ref(self) into the list.
__del__ is automatically called when the last strong reference is removed. The weakrefs disappear automatically.
But be aware that there are some caveats on __del__ mentioned in the docs.
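A minimal Python 3 sketch of the WeakSet variant, assuming CPython (the gc.collect() call is only there for implementations that free objects lazily). MyClass mirrors the question's class; the count_before and count_after names are illustrative.

```python
import gc
import weakref

class MyClass:
    # A WeakSet holds its members weakly, so it never keeps an instance alive.
    instances = weakref.WeakSet()

    def __init__(self, a, b):
        self.a = a
        self.b = b
        MyClass.instances.add(self)

A = MyClass(1, 'abc')
B = MyClass(2, 'def')
count_before = len(MyClass.instances)

del B                       # the last strong reference to the second instance
gc.collect()
count_after = len(MyClass.instances)
```

With this approach no __del__ method is needed just to keep the registry accurate, since dead instances drop out of the set by themselves.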
__del__ is called when the garbage collector removes an object from memory. If you add your object to MyClass.instances, the object is still referenced, so the garbage collector will never try to remove it, and __del__ is never called.
You'd better use an explicit function (for example MyClass.del_element()), because you can't really predict when __del__ will be called (even if you don't add the object to a list).
I would like to do something like the following:
class Foo(object):
    def __init__(self):
        self.member = 10
        pass
def factory(foo):
    foo = Foo()
aTestFoo = None
factory(aTestFoo)
print aTestFoo.member
However it crashes with AttributeError: 'NoneType' object has no attribute 'member':
the object aTestFoo has not been modified inside the call of the function factory.
What is the pythonic way of doing that? Is it a pattern to avoid? If it is a common mistake, what is it called?
In C++, in the function prototype, I would have added a reference to the pointer to be created in the factory... but maybe this is not the kind of things I should think about in Python.
In C#, there's the key word ref that allows to modify the reference itself, really close to the C++ way. I don't know in Java... and I do wonder in Python.
Python does not have pass by reference. One of the few things it shares with Java, by the way. Some people describe argument passing in Python as call by value (and define the values as references, where reference means not what it means in C++), some people describe it as pass by reference with reasoning I find quite questionable (they re-define it to refer to what Python calls "reference", and end up with something which has nothing to do with what has been known as pass by reference for decades), others go for terms which are not as widely used and abused (popular examples are "{pass,call} by {object,sharing}"). See Call By Object on effbot.org for a rather extensive discussion on the definitions of the various terms, on history, and on the flaws in some of the arguments for the terms pass by reference and pass by value.
The short story, without naming it, goes like this:
Every variable, object attribute, collection item, etc. refers to an object.
Assignment, argument passing, etc. create another variable, object attribute, collection item, etc. which refers to the same object but has no knowledge which other variables, object attributes, collection items, etc. refer to that object.
Any variable, object attribute, collection item, etc. can be used to modify an object, and any other variable, object attribute, collection item, etc. can be used to observe that modification.
No variable, object attribute, collection item, etc. refers to another variable, object attribute, collection items, etc. and thus you can't emulate pass by reference (in the C++ sense) except by treating a mutable object/collection as your "namespace". This is excessively ugly, so don't use it when there's a much easier alternative (such as a return value, or exceptions, or multiple return values via iterable unpacking).
You may consider this like using pointers, but not pointers to pointers (but sometimes pointers to structures containing pointers) in C. And then passing those pointers by value. But don't read too much into this simile. Python's data model is significantly different from C's.
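The points above can be illustrated with the question's Foo class. This is a Python 3 sketch; rebind, fill and make_pair are hypothetical helper names, and the dict stands in for the "mutable object as namespace" workaround that the answer advises against.

```python
class Foo:
    def __init__(self):
        self.member = 10

def rebind(foo):
    foo = Foo()                 # rebinds the local name only; the caller sees nothing

def fill(namespace):
    namespace["foo"] = Foo()    # mutates an object that the caller also refers to

def make_pair():
    return Foo(), "ok"          # the simple alternative: return what you created

a_test_foo = None
rebind(a_test_foo)
unchanged = a_test_foo is None  # the caller's variable was not touched

ns = {}
fill(ns)                        # works, but is clumsier than a return value

foo, status = make_pair()       # multiple results via iterable unpacking
```

Returning the new object keeps the data flow visible at the call site, which is why it is usually preferred over the namespace trick.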
You are making a mistake here because in Python
"We call the argument passing technique _call by sharing_, because the argument objects are shared between the caller and the called routine. This technique does not correspond to most traditional argument passing techniques (it is similar to argument passing in LISP). In particular it is not call by value because mutations of arguments performed by the called routine will be visible to the caller. And it is not call by reference because access is not given to the variables of the caller, but merely to certain objects."
in Python, the variables in the formal argument list are bound to the actual argument objects. the objects are shared between caller and callee; there are no "fresh locations" or extra "stores" involved. (which, of course, is why the CLU folks called this mechanism "call-by-sharing".)
and btw, Python functions doesn't run in an extended environment, either. function bodies have very limited access to the surrounding environment.
The Assignment Statements section of the Python docs might be interesting.
The = statement in Python acts differently depending on the situation, but in the case you present, it just rebinds the local variable foo to the new object:
def factory(foo):
    # This makes a new instance of Foo,
    # and binds it to a local variable `foo`,
    foo = Foo()
# This binds `None` to a top-level variable `aTestFoo`
aTestFoo = None
# Call `factory` with first argument of `None`
factory(aTestFoo)
print aTestFoo.member
Although it can potentially be more confusing than helpful, the dis module can show you the byte-code representation of a function, which can reveal how Python works internally. Here is the disassembly of factory:
>>> dis.dis(factory)
  4           0 LOAD_GLOBAL              0 (Foo)
              3 CALL_FUNCTION            0
              6 STORE_FAST               0 (foo)
              9 LOAD_CONST               0 (None)
             12 RETURN_VALUE
What that says is, Python loads the global Foo class by name (0), and calls it (3, instantiation and calling are very similar), then stores the result in a local variable (6, see STORE_FAST). Then it loads the default return value None (9) and returns it (12)
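The same conclusion can be reached programmatically. The sketch below is for Python 3 and assumes CPython; the exact opcode names and offsets differ between interpreter versions, so it only collects the opcode names for inspection and relies on the code object's list of local names for the actual check.

```python
import dis

class Foo:
    def __init__(self):
        self.member = 10

def factory(foo):
    foo = Foo()   # rebinds the local name only

# Names that the compiler treats as locals of factory.
local_names = factory.__code__.co_varnames

# Opcode names for inspection; the exact list depends on the Python version.
opnames = [instruction.opname for instruction in dis.get_instructions(factory)]

# No return statement, so the call evaluates to None.
returned = factory(None)
```

Since foo appears among the function's locals, the assignment inside factory can never reach the caller's variable.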
What is the pythonic way of doing that? Is it a pattern to avoid? If it is a common mistake, what is it called?
Factory functions are rarely necessary in Python. In the occasional case where they are necessary, you would just return the new instance from your factory (instead of trying to assign it to a passed-in variable):
class Foo(object):
    def __init__(self):
        self.member = 10
        pass
def factory():
    return Foo()
aTestFoo = factory()
print aTestFoo.member
Your factory function doesn't return anything, so by default its return value is None. You bind aTestFoo to None but never re-assign it, which is where your actual error is coming from.
Fixing these issues:
class Foo(object):
    def __init__(self):
        self.member = 10
        pass
def factory(obj):
    return obj()
aTestFoo = factory(Foo)
print aTestFoo.member
This should do what I think you are after, although such patterns are not that typical in Python (ie, factory methods).
I have to write a testing module and have c++-Background. That said, I am aware that there are no pointers in python but how do I achieve the following:
I have a test method which looks in pseudocode like this:
def check(self, obj, prop, value):
    if obj.prop <> value:  # this does not work,
        # getattr does not work either (object has no such method, per the interpreter output)
        # I am working with objects from InCyte's python interface
        # the supplied findProp method does not work either (I get
        # None for objects I can access on the shell with obj.prop,
        # and yes, I supply the method with a string 'prop')
        if self._autoadjust:
            print("Adjusting prop from x to y")
            obj.prop = value  # setattr does not work, see above
        else:
            print("Warning Value != expected value for obj")
Since I want to check many different objects in separate functions I would like to be able to keep the check method in place.
In general, how do I ensure that a function affects the passed object and does not create a copy?
myobj.size=5
resize(myobj,10)
print myobj.size #jython =python2.5 => print is not a function
I can't make resize a member method since the myobj implementation is out of reach, and I don't want to type myobj=resize(myobj, 10) everywhere
Also, how can I make it so that I can access those attributes in a function to which i pass the object and the attribute name?
getattr isn't a method, you need to call it like this
getattr(obj, prop)
similarly setattr is called like this
setattr(obj, prop, value)
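Put together, the check from the question might look roughly like this. It is a Python 3 sketch with a stand-in Widget class, since the InCyte objects are not available here; the autoadjust parameter and the returned status strings are assumptions made for illustration.

```python
class Widget:
    """Stand-in for an object whose implementation is out of reach."""
    def __init__(self):
        self.size = 5

def check(obj, prop, expected, autoadjust=False):
    # prop is the attribute NAME as a string, e.g. "size".
    actual = getattr(obj, prop)
    if actual == expected:
        return "ok"
    if autoadjust:
        setattr(obj, prop, expected)    # modifies the caller's object in place
        return "adjusted"
    return "mismatch"

w = Widget()
first = check(w, "size", 10)                     # reports only
second = check(w, "size", 10, autoadjust=True)   # fixes the attribute
third = check(w, "size", 10)                     # now matches
```

Because the object itself is passed (not a copy), the adjustment made inside check is visible to the caller afterwards.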
In general how do I ensure that a function affects the passed object and does not create a copy?
Python is not C++, you never create copies unless you explicitly do so.
I can't make resize a member method since the myobj implementation is out of reach, and I don't want to type myobj=resize(myobj,10) everywhere
I don't get it. Why should it be out of reach? If you have the instance, you can invoke its methods.
In general, how do I ensure that a function affects the passed object
By writing code inside the function that affects the passed-in object, instead of re-assigning to the name.
and does not create a copy?
A copy is never created unless you ask for one.
Python "variables" are names for things. They don't store objects; they refer to objects. However, unlike C++ references, they can be made to refer to something else.
When you write
def change(parameter):
    parameter = 42
x = 23
change(x)
# x is still 23
The reason x is still 23 is not because a copy was made, because a copy wasn't made. The reason is that, inside the function, parameter starts out as a name for the passed-in integer object 23, and then the line parameter = 42 causes parameter to stop being a name for 23, and start being a name for 42.
If you do
def change(parameter):
    parameter.append(42)
x = [23]
change(x)
# now x is [23, 42]
The passed-in parameter changes, because .append on a list changes the actual list object.
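Applied to the resize example from the question, the difference looks like this. This is a Python 3 sketch; Shape, rebind and identity_of are hypothetical names, and comparing id values is just one way to show that no copy is made.

```python
class Shape:
    def __init__(self, size):
        self.size = size

def resize(obj, new_size):
    obj.size = new_size       # mutates the object the caller passed in

def rebind(obj, new_size):
    obj = Shape(new_size)     # only makes the local name refer to a new object

def identity_of(obj):
    return id(obj)            # identity of the object as seen inside the function

myobj = Shape(5)
resize(myobj, 10)
size_after_resize = myobj.size

rebind(myobj, 99)
size_after_rebind = myobj.size

no_copy = identity_of(myobj) == id(myobj)
```

So a free function like resize(myobj, 10) is enough; there is no need for myobj = resize(myobj, 10).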
I can't make resize a member method since the myobj implementation is out of reach
That doesn't matter. When Python compiles, there is no type-checking step, and there is no step to look up the implementation of a method to insert the call. All of that is handled when the code actually runs. The code will get to the point myobj.resize(), look for a resize attribute of whatever object myobj currently refers to (after all, it can't know ahead of time even what kind of object it's dealing with; variables don't have types in Python but instead objects do), and attempt to call it (throwing the appropriate exceptions if (a) the object turns out not to have that attribute; (b) the attribute turns out not to actually be a method or other sort of function).
Also, how can I make it so that I can access those attributes in a function to which i pass the object and the attribute name? / getattr does not work either
Certainly it works if you use it properly. It is not a method; it is a built-in top-level function. Same thing with setattr.
Refer to this
>>> def foo(counter=[0]):
...     counter[0] += 1
...     print("Counter is %i." % counter[0])
...
>>> foo()
Counter is 1.
>>> foo()
Counter is 2.
>>>
Default values are initialized only when the function is first evaluated, not each time it is executed, so you can use a list or any other mutable object to maintain static values.
Question> Why can counter keep its updated value across calls? Is it true that counter refers to the same memory that stores the list created for the default parameter, so that it refers to the same memory address and keeps the updated values between calls?
The object created as the default argument becomes part of the function and persists until the function is destroyed.
First things first: if you are coding in Python: forget about "memory address" - you will never need one. Yes, there are objects, and they are placed in memory, and if you are referring to the same object, it is in the same "memory address" - but that does not matter - there could even be an implementation where objects don't have a memory address at all (just a place in a data structure, for example).
Then, when Python encounters the function definition shown above, it creates a code object with the contents of the function body and executes the function definition line, resolving any expressions inlined there and setting the results of those expressions as the default parameters for that function. There is nothing "temporary" about these objects. The expressions (in this case [0]) are evaluated, the resulting objects (in this case a Python list with a single element) are created, and assigned to a reference in the function object (a position in the function's "func_defaults" attribute - remember that functions themselves are objects in Python.)
Whenever you call that function, if you don't pass a value for the counter parameter, it is bound to the object recorded in the func_defaults attribute - in this case, a Python list. And it is the same Python list that was created at function definition time.
What happens is that Python lists themselves are mutable: one can change their contents, add more elements, etc., but they are still the same list.
What the code above does is simply increment the element at position 0 of the list.
You can access this list at any time in the example above by typing foo.func_defaults[0] (func_defaults is the Python 2 name; in Python 3 the attribute is foo.__defaults__).
If you want to "reset" the counter, you can just do: foo.func_defaults[0][0]=0, for example.
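For Python 3, where the attribute is spelled __defaults__, the same inspection and reset can be sketched as follows. The function returns the counter instead of printing it so the values are easy to compare; that change is mine, not part of the original example.

```python
def foo(counter=[0]):
    counter[0] += 1
    return counter[0]

first = foo()
second = foo()

# The list created at definition time lives in the function's defaults tuple.
stored_default = foo.__defaults__[0]
stored_snapshot = list(stored_default)

# "Resetting" the counter means mutating that same list.
foo.__defaults__[0][0] = 0
after_reset = foo()
```

Every call without an argument sees the one list stored on the function object, which is why the count carries over.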
Of course, it is a side effect of how things are implemented in Python, and though consistent, and even documented, should not be used in "real code". If you need "static variables", use a class instead of a function like the above:
class Foo(object):
    def __init__(self):
        self.counter = 0
    def __call__(self):
        self.counter += 1
        print("Counter is %i." % self.counter)
And on the console:
>>> foo = Foo()
>>> foo()
Counter is 1.
>>> foo()
Counter is 2.
>>>
Default arguments are evaluated at the function definition, and not at its calls. When the foo function object is created (remember, functions are first-class citizens in Python), the values of its default arguments are evaluated and stored on the function object. The function object is created when the def statement is executed.
In this case counter is set to a mutable list in the definition of foo(), so calling it without an argument gives the original mutable list instantiated at definition the name counter. This makes calls to foo() use and modify the original list.
Like all function arguments, counter is a local name defined when the function is declared. The default argument, too, is evaluated once when the function is declared.
If, when calling foo(), no value is passed for counter, that default value, the exact object instance provided at function definition time, is given the name counter. Since this is a mutable object (in this case, a list), any changes made to it remain after the function has completed.
The function contains a reference to the list in its default argument tuple, func_defaults (__defaults__ in Python 3), which prevents it from being destroyed.
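That the default expression is evaluated exactly once can be made visible with a helper that records each evaluation. This is a Python 3 sketch; make_default, remember and fresh are hypothetical names, and the None-sentinel version is a common idiom rather than something taken from the answers above.

```python
evaluations = []

def make_default():
    evaluations.append("evaluated")   # records each time the default expression runs
    return []

def remember(item, seen=make_default()):   # default expression runs once, at definition
    seen.append(item)
    return seen

def fresh(item, seen=None):
    # Common idiom when a new list is wanted on every call.
    if seen is None:
        seen = []
    seen.append(item)
    return seen

first = remember("a")
second = remember("b")
fresh_first = fresh("a")
fresh_second = fresh("b")
```

Both calls to remember return the very same list, while fresh builds a new one each time.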