I have read several articles about garbage collection. They say that we need the gc module to clean up objects involved in reference cycles. But can we do the cleanup simply by using del?
For example, if I do the following, do I successfully free the memory of this reference-cycle object? If yes, why do we need the gc module at all? If not, why not?
>>> import sys
>>> x = [1, 2, 3]
>>> x.append(x) # create reference cycle
>>> print(x)
[1, 2, 3, [...]]
>>> sys.getrefcount(x)
3
>>> del x[3]
>>> print(x)
[1, 2, 3]
>>> sys.getrefcount(x)
2
>>> del x # reference count of x goes to 0!
CPython is reference counted. When an object's reference count goes to zero, the object is deleted, and that happens inline with the code that caused the decrement. The garbage collector is there to handle the case where an object is unreachable but its reference count is not zero. It's not that every circular reference needs a garbage collection; what matters is how that reference is unwound.
>>> x = [1, 2, 3]
We created a list and bound it to x. The list itself is just an anonymous object in memory, currently reachable through x. Let's call the actual object "the list".
>>> x.append(x) # create reference cycle
>>> print(x)
[1, 2, 3, [...]]
>>> sys.getrefcount(x)
3
At this point, the list has references from x, its own 4th element and the getrefcount parameter (which disappears when the function returns, leaving 2 references).
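The absolute numbers reported by sys.getrefcount can differ between interpreter versions, so it may be more robust to compare the count before and after creating the cycle. A minimal sketch, assuming CPython (the helper name cycle_refcount_delta is made up for this illustration):

```python
import sys

def cycle_refcount_delta():
    """Return how many references x.append(x) adds to the list."""
    x = [1, 2, 3]
    before = sys.getrefcount(x)   # includes getrefcount's own temporary reference
    x.append(x)                   # the list now also refers to itself
    after = sys.getrefcount(x)
    del x[3]                      # unwind the cycle so the list can be freed normally
    return after - before

# On CPython this is expected to print 1: the self-reference adds one count.
print(cycle_refcount_delta())
```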
>>> del x[3]
del doesn't actually delete things; it unbinds objects. In this case, the list's __delitem__ method is called, unbinding the list from its own 4th element and leaving a single reference.
>>> del x
This time del unbinds the list from x and removes x from the namespace. The reference count went to zero and the list was deleted. The garbage collector was never involved.
Now let's mix it up.
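To check that reference counting alone frees the object once the cycle is unwound, one option is to watch it through a weak reference while the cycle collector is switched off. Lists do not support weak references, so this sketch uses a hypothetical Node class instead; it assumes CPython's reference counting.

```python
import gc
import weakref

class Node:
    """Hypothetical self-referencing object; unlike list, it supports weak references."""
    def __init__(self):
        self.me = self            # reference cycle: the instance refers to itself

def freed_after_unwinding():
    was_enabled = gc.isenabled()
    gc.disable()                  # make sure the cycle collector cannot step in
    try:
        node = Node()
        probe = weakref.ref(node) # observes the object without keeping it alive
        node.me = None            # unwind the cycle by hand
        del node                  # the last reference is gone
        return probe() is None    # a dead weak reference returns None
    finally:
        if was_enabled:
            gc.enable()

print(freed_after_unwinding())    # True on CPython
```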
>>> x = [1, 2, 3]
>>> x.append(x) # create reference cycle
We have 2 references to the list. But this time just
>>> del x
del unbinds the list from x and decrements its reference count to 1. That's not enough to delete the list. Now we have a problem. The list is no longer assigned to any variable that we can get to. It's not in x because x doesn't exist any more. But it's still in memory. That is what the garbage collector attempts to fix.
We can't just use del because there is no variable to do the del on.
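The stranded-cycle case can be observed the same way. In this sketch (again using a hypothetical Node class, because lists cannot be weakly referenced, and assuming CPython), the object survives del and only disappears after an explicit gc.collect():

```python
import gc
import weakref

class Node:
    """Hypothetical self-referencing object used only for this illustration."""
    def __init__(self):
        self.me = self                    # reference cycle

def cycle_needs_gc():
    was_enabled = gc.isenabled()
    gc.disable()                          # rule out an automatic collection
    try:
        node = Node()
        probe = weakref.ref(node)
        del node                          # count drops to 1 (the self-reference)
        alive_after_del = probe() is not None
        gc.collect()                      # the cycle collector reclaims it
        alive_after_collect = probe() is not None
        return alive_after_del, alive_after_collect
    finally:
        if was_enabled:
            gc.enable()

print(cycle_needs_gc())                   # (True, False) on CPython
```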
I found this line in the pip source:
sys.path[:] = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
As I understand the line above is doing the same as below:
sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
With one difference: in the first case sys.path still points to the same object in memory while in the second case sys.path points to the new list created from two existing.
Another thing is that the first case is about two times slower than the second:
>>> from timeit import timeit
>>> timeit('a[:] = a + [1,2]', setup='a=[]', number=20000)
2.111023200035561
>>> timeit('a = a + [1,2]', setup='a=[]', number=20000)
1.0290934000513516
The reason, I think, is that with slice assignment the objects from a (references to objects) are copied to a new list and then copied back into the resized a.
So what are the benefits of using a slice assignment?
Assigning to a slice is useful if there are other references to the same list, and you want all references to pick up the changes.
So if you do something like:
bar = [1, 2, 3]
foo = bar
bar[:] = [5, 4, 3, 2, 1]
print(foo)
this will print [5, 4, 3, 2, 1]. If you instead do:
bar = [5, 4, 3, 2, 1]
print(foo)
the output will be [1, 2, 3].
With one difference: in the first case sys.path still points to the same object in memory while in the second case sys.path points to the new list created from two existing.
Right: That’s the whole point, you’re modifying the object behind the name instead of the name. Thus all other names referring to the same object also see the changes.
Another one thing is that the first case is two times slower than second:
Not really. Slice assignment performs a copy. Performing a copy is an O(n) operation while performing a name assignment is O(1). In other words, the bigger the list, the slower the copy; whereas the name assignment always takes the same (short) time.
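The practical difference shows up as soon as some other piece of code holds a reference to the same list. A minimal sketch with made-up names (search_path stands in for something like sys.path; both helper functions are hypothetical):

```python
def prepend_in_place(target, extra):
    """Put the items of extra in front of target without rebinding target."""
    target[:] = list(extra) + target

def prepend_by_rebinding(target, extra):
    """Build a new list; the caller's list object is left untouched."""
    target = list(extra) + target
    return target

search_path = ["/usr/lib/python"]
alias = search_path                      # e.g. another module holding the same list

prepend_in_place(search_path, ["a.whl"])
print(alias)                             # ['a.whl', '/usr/lib/python']

rebound = prepend_by_rebinding(search_path, ["b.whl"])
print(alias)                             # still ['a.whl', '/usr/lib/python']
print(rebound is alias)                  # False
```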
Your assumptions are very good!
In Python, a variable is a name that has been set to point to an object in memory. This is in essence what lets Python be a dynamically typed language, i.e. you can have the same variable refer to a number, then reassign it to a string, and so on.
As shown here, whenever you assign a new value to a variable, you are just pointing the name at a different object in memory:
>>> a = 1
>>> id(a)
10968800
>>> a = 1.0
>>> id(a)
140319774806136
>>> a = 'hello world'
>>> id(a)
140319773005552
(in CPython the id refers to its address in memory).
Now for your question: sys.path is a list, and a Python list is a mutable type, meaning that the object itself can change in place, i.e.
>>> l = []
>>> id(l)
140319772970184
>>> l.append(1)
>>> id(l)
140319772970184
>>> l.append(2)
>>> id(l)
140319772970184
Even though I modified the list by adding items, the name l still points to the same object. Following the nature of Python, a list's elements are also only pointers to different areas in memory (the elements aren't the objects themselves; they are like variables referring to the objects held there), as shown here:
>>> l
[1, 2]
>>> id(l[0])
10968800
>>> l[0] = 3
>>> id(l[0])
10968864
>>> id(l)
140319772970184
After reassigning l[0], the id of that element has changed, but once again the id of the list hasn't.
Assigning to an index in the list only changes where that element points. In the same way, when I reassign l, I don't modify the list; I just change where the name l points:
>>> id(l)
140319772970184
>>> l = [4, 5, 6]
>>> id(l)
140319765766728
But if I reassign all of l's indexes, then l stays the same object; only the elements point to different places:
>>> id(l)
140319765766728
>>> l[:] = [7, 8, 9]
>>> id(l)
140319765766728
That also explains why it is slower: Python is reassigning every element of the list, not just pointing the name somewhere else.
One more little point if you are wondering about the part where the line finishes with
sys.path[:] = ... + sys.path
It follows the same concept. Python first creates the object on the right side of the = and only then performs the assignment. While Python is still building the new list on the right side, sys.path is still the original list, so Python takes all of its elements for the concatenation and then, because we used [:], stores the elements of the newly created list into the original sys.path object.
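The two steps can be spelled out by hand. This is only an illustration with a made-up list named path and an explicit temporary (temp) that the one-line form creates implicitly:

```python
path = ["b", "c"]
original_id = id(path)

temp = ["a"] + path              # step 1: the right-hand side builds a brand-new list
assert temp is not path          # path itself is still ['b', 'c'] at this point

path[:] = temp                   # step 2: the items of temp are stored into the old object
print(path)                      # ['a', 'b', 'c']
print(id(path) == original_id)   # True: same list object as before
print(path is temp)              # False: the temporary is a different object
```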
Now, as for why pip is using [:] instead of reassigning, I don't really know, but I would guess that the benefit is reusing the same object in memory for sys.path.
Python itself also reuses objects for the small integers, for example:
>>> a = 1
>>> b = 1
>>> c = 1
>>> id(a)
10968800
>>> id(b)
10968800
>>> id(c)
10968800
a, b and c all point to the same object in memory even though each assignment asked for a 1. Since small numbers are very likely to be used a lot in programs (for example in for loops), CPython creates them once and reuses them throughout.
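This reuse is a CPython implementation detail (it keeps a cache of small integers, roughly -5 through 256), not a language guarantee, so code should compare numbers with == rather than is. A small sketch that builds the integers at run time so the compiler cannot merge identical literals:

```python
small_a = int("1")
small_b = int("1")
print(small_a is small_b)   # True on CPython: both names refer to the cached 1

big_a = int("100000")
big_b = int("100000")
print(big_a == big_b)       # True: equal values
print(big_a is big_b)       # implementation-dependent; do not rely on it
```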
(you might also find it being the case with filehandles that python will recycle instead of creating a new one.)
You are right, slice assignment will not rebind the name. A slice is itself one type of object in Python, and you can use it both to get and to set items.
In [1]: a = [1, 2, 3, 4]
In [2]: a[slice(0, len(a), 2)]
Out[2]: [1, 3]
In [3]: a[slice(0, len(a), 2)] = 6, 6
In [4]: a[slice(0, len(a), 1)] = range(10)
In [5]: a
Out[5]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [6]: a[:] = range(4)
In [7]: a
Out[7]: [0, 1, 2, 3]
The id of the object before and after should be the same, but that is not what happens. Can someone explain why a new object is being made?
L = [1, 2, 3]
print(id(L))
L = L + [4]
print(id(L))
The two ids being printed are different. Shouldn't they be the same, since a list is a mutable object? When I use the list's append method to add 4, the id stays the same.
While lists are mutable, that doesn't mean that all operations involving them mutate the list in place. In your example, you're doing L + [4] to concatenate two lists. The list.__add__ method that gets invoked to implement that creates a new list, rather than modifying L. You're binding the old name L to the new list, so the value you get from id(L) changes.
If you want to mutate L while adding a value onto the end, there are several ways you can do it. L.append(4) is the obvious pick if you have just a single item to add. L.extend([4]) or the nearly synonymous L += [4] can work if the second list has more than one item in it.
Note that sometimes creating a new list will be what you want to do! If you want to keep an unmodified reference to the old list, it may be desirable to create a new list with most of its contents at the same time you add new values. While you could copy the list and then use one of the in-place methods I mentioned above, you can also just use + to copy and add values to the list at the same time (just bind the result to a new name):
L = [1, 2, 3]
M = L + [4] # this is more convenient than M = list(L); M.append(4)
print(L) # unchanged, still [1, 2, 3]
print(M) # new list [1, 2, 3, 4]
its a mutable object
Yes, you can change the value without creating a new object. But with the +, you are creating a new object.
To mutate a mutable value, use methods (such as append) or set items (a[0] = ...). As soon as you have L=, the object formerly referenced by L is lost (if it doesn't have any other references) and L gets a new value.
This makes sense because, in fact, with L = L+[0], you are saying "calculate the value of L+[0] and assign it to L" not "add [0] to L".
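A quick way to see which operations mutate and which rebind is to keep a second name for the original list and compare identities. A minimal sketch (alias is just an illustrative name):

```python
L = [1, 2, 3]
alias = L            # second name for the same list object

L.append(4)          # in-place mutation
print(L is alias)    # True

L += [5]             # in-place for lists (behaves like extend)
print(L is alias)    # True

L = L + [6]          # builds a new list, then rebinds the name L
print(L is alias)    # False
print(alias)         # [1, 2, 3, 4, 5]
print(L)             # [1, 2, 3, 4, 5, 6]
```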
I am studying programming languages, and a quiz question and its solution was this:
def foo(x):
    x.append(3)
    x = [8]
    return x

x = [1, 5]
y = foo(x)
print(x)
print(y)
Why does this print as follows:
[1, 5, 3]
[8]
Why doesn't x equal [8]?
The other two answers are great. I suggest you try id() to see the addresses.
See the following example:
def foo(x):
    x.append(3)
    print("global", id(x))
    x = [8]
    print("local", id(x))
    return x

x = [1, 5]
print("global", id(x))
y = foo(x)
print("global", id(x))
print(x)
print(y)
And the output
global 140646798391920
global 140646798391920
local 140646798392928
global 140646798391920
[1, 5, 3]
[8]
As you can see, the address of the object that x refers to stays the same when you mutate it, but changes when you use =. Assignment inside a function only rebinds the function's local name x; it does not affect the caller's variable.
You have a lot of things going on there. So let's go step by step.
x = [1, 5]: you are assigning the list [1, 5] to x.
y = foo(x): you are calling foo, passing in x, and assigning whatever foo returns to y.
Inside foo you call x.append(3), which appends 3 to the list that was passed in.
You then set x = [8], which rebinds the local name x to a new list; that list is returned from foo, ultimately setting y to [8].
Object References
The key to understanding this is that Python passes variables around using object references. These are similar to pointers in a language like C++, but differ in some key ways.
When an assignment is made (using the assignment operator, =):
x = [1, 5]
Actually TWO things have been created. First is the object itself, which is the list [1, 5]. This object is a separate entity from the second thing, which is the (global) object reference to the object.
Object(s) Object Reference(s)
[1, 5] x (global) <--- New object created
In Python, objects are passed into functions by object reference; they are not passed "by reference" or "by value" as in C++. This means that when x is passed into the foo function, a new, local object reference to the same object is created.
Object(s) Object Reference(s)
[1, 5] x (global), x (local to foo) <--- Now two object references
Now inside of foo we call x.append(3), which directly changes the object (referred to by the foo-local x object reference) itself:
Object(s) Object Reference(s)
[1, 5, 3] x (global), x (local to foo) <--- Still two object references
Next, we do something different. We assign the local-foo x object reference (or re-assign, since the object reference already existed previously) to a NEW LIST.
Object(s) Object Reference(s)
[1, 5, 3] x (global) <--- One object reference remains
[8] x (local to foo) <--- New object created
Notice that the global object reference, x, is still there! It has not been impacted. We only re-assigned the local-foo x object reference to a NEW list.
And finally, it should be clear that when the function returns, we have this:
Object(s) Object Reference(s)
[1, 5, 3] x (global) <--- Unchanged
[8] y (global) <--- New object reference created; the local-foo x no longer exists
Sidebar
The local-foo x object reference goes away when foo returns; the list [8] survives only because y now refers to it. Some objects do persist between calls, though, and this is important behavior to understand, because if we do something like this:
def f(a=[]):
    a.append(1)
    return a

print(f())
print(f())
print(f())
We will NOT get:
[1]
[1]
[1]
Instead, we will get:
[1]
[1, 1]
[1, 1, 1]
The default value expression [] is evaluated only ONCE, when the def statement is executed, and the resulting list is stored on the function object (it goes away only if we delete the function itself).
As a result, when f() is called without an argument, the local name a is not set to a fresh []. It is bound to that same stored list every time, so the list "remembers" its previous contents; the function object keeps a reference to it, and therefore the list does not get garbage collected between function calls.
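One way to see where the default list lives is to inspect the function's __defaults__ attribute. The sketch below also shows the common None-sentinel idiom for getting a fresh list on every call; g is a hypothetical variant added only for comparison.

```python
def f(a=[]):
    a.append(1)
    return a

def g(a=None):
    # Common idiom: create a fresh list on each call instead of sharing one.
    if a is None:
        a = []
    a.append(1)
    return a

f()
f()
print(f.__defaults__)   # ([1, 1],) -- the shared default list, stored on the function
print(g(), g())         # [1] [1]   -- each call gets its own list
```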
Contrast with Pointers
One of the ways object references are different from pointers is in assignment. If Python used actual pointers, you would expect behavior such as the following:
a = 1
b = a
b = 2
assert a == 2
However, this assertion fails with an AssertionError. The reason is that b = 2 does NOT impact the object "pointed to" by the object reference b. It re-assigns b to a different object (2), while a still refers to 1.
Another way object references are different from pointers is in deletion:
a = 1
b = a
del a
assert b is None
This assertion also produces an error. The reason is the same as in the example above; del a does NOT impact the object "pointed to" by the object reference, b. It simply deletes the object reference, a. The object reference, b, and the object it points to, 1, are not impacted.
You might be asking, "Well then how do I delete the actual object?" The answer is YOU CAN'T! You can only delete all the references to that object. Once there are no longer any references to an object, it becomes eligible for reclamation and will be deleted for you (in CPython this happens as soon as the reference count reaches zero; the gc module is only needed to reclaim objects stuck in reference cycles, and it lets you force such a collection). This feature is known as automatic memory management, and it is one of the primary strengths of Python, and it is also one of the reasons why Python uses object references in the first place.
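The point that del removes names rather than objects can be observed with a weak reference, which watches an object without keeping it alive. A minimal sketch, assuming CPython's immediate reclamation; Payload and del_demo are hypothetical names used only here:

```python
import weakref

class Payload:
    """Hypothetical placeholder class; its instances support weak references."""

def del_demo():
    first = Payload()
    second = first                 # two names, one object
    probe = weakref.ref(first)

    del first                      # removes one name only
    alive_with_one_name = probe() is second

    del second                     # the last reference is gone
    gone_with_no_names = probe() is None
    return alive_with_one_name, gone_with_no_names

print(del_demo())                  # (True, True) on CPython
```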
Mutability
Another subject that needs to be understood is that there are two types of objects: mutable, and immutable. Mutable objects can be changed, while immutable objects cannot be changed.
A list, such as [1, 5], is mutable. A tuple or int, on the other hand, is immutable.
The append Method
Now that all of this is clear, we should be able to intuit the answer to the question "How does append work in Python?" The result of the append() method is an operation on the mutable list object itself. The list is changed "in place", so to speak. There is not a new list created and then assigned to the foo-local x. This is in contrast to the assignment x = [8], where the expression on the right creates a NEW object and the = operator binds the object reference to it.
The append function modifies the x that was passed into the function, whereas assigning something new to x changed the locally scoped value and returned it.
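Putting the pieces together, a function can affect the caller's list by mutating it (including through slice assignment), but not by rebinding its own parameter. A short sketch with hypothetical function names:

```python
def mutate(items):
    items.append(3)          # changes the object the caller also sees

def rebind(items):
    items = [8]              # only the local name now points elsewhere
    return items

def replace_contents(items):
    items[:] = [8]           # slice assignment: same object, new contents

data = [1, 5]
mutate(data)
print(data)                  # [1, 5, 3]

result = rebind(data)
print(data, result)          # [1, 5, 3] [8]

replace_contents(data)
print(data)                  # [8]
```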
The scope of x inside foo is specific to the function and is independent from the main calling context. x inside foo starts out referencing the same object as x in the main context because that's the parameter that was passed in, but the moment you use the assignment operator to set it to [8] you have allocated a new object to which x inside foo points, which is now totally different from x in the main context. To illustrate further, try changing foo to this:
def foo(x):
    print("Inside foo(): id(x) = " + str(id(x)))
    x.append(3)
    print("After appending: id(x) = " + str(id(x)))
    x = [8]
    print("After setting x to [8], id(x) = " + str(id(x)))
    return x
When I executed, I got this output:
Inside foo(): id(x) = 4418625336
After appending: id(x) = 4418625336
After setting x to [8], id(x) = 4418719896
[1, 5, 3]
[8]
(the IDs you see will vary, but the point will still be clear I hope)
You can see that append just mutates the existing object - no need to allocate a new one. But once the = operator executes, a new object gets allocated (and eventually returned to the main context, where it is assigned to y).
So I came across something very weird in Python. I tried adding a reference to a list to the list itself. The code might demonstrate what I am saying better than I can express it. I am using the IDLE editor (interactive mode).
>>> l = [1, 2, 3]
>>> l.append(l)
>>> print(l)
[1, 2, 3, [...]]
>>> del l[:-1]
>>> print(l)
[[...]]
So far the output is as expected. But then I do this:
>>> y = l[:]
>>> print(y)
To me it seems that the output should be
[[...]]
But it is
[[[...]]]
Apparently instead of creating a copy of the list, it puts a reference to the list in y.
y[0] is l returns True. I can't seem to find a good explanation for this. Any ideas?
The difference is only in the way the list is displayed. I.e. the value of y is exactly what you'd expect.
The difference in the way the lists are displayed results from the fact that, unlike l, y is not a self-referencing list:
l[0] is l
=> True
y[0] is y
=> False
y is not self-referencing, because y does not reference y. It references l, which is self-referencing.
Therefore, the logic that translates the list to a string detects the potential infinite recursion one level deeper when working on y than on l.
This is perfectly expected. When Python prints recursive lists, it checks whether the list it is printing has already been encountered, and if it has, it prints [...]. An important point to understand is that it doesn't test for equality (as in ==) but for identity (as in is). Therefore:
When you print l, which is effectively l = [l], l[0] is l returns True, and therefore it prints [[...]].
Now y = l[:] makes a copy of l, and therefore y is l returns False. So here is what happens. It starts printing y, so it prints [ ??? ] where ??? is replaced by the printing of y[0]. Now y[0] is l and is not y, so it prints [[???]] with ??? replaced by the printing of y[0][0]. Now y[0][0] is l, which has already been encountered, so it prints [...] for it, finally giving [[[...]]].
You need a full copy of the objects: use copy.deepcopy and you will see the expected results.
>>> from copy import deepcopy
>>> l=[1,2,3]
>>> l.append(l)
>>> print(l)
[1, 2, 3, [...]]
>>> del l[:-1]
>>> print(l)
[[...]]
>>> y=deepcopy(l)
>>> print(y)
[[...]]
>>> y[0] is l
False
When you use the slice notation to copy the list, the inner references are retained, which causes the behavior that you observe.
Slicing generates a new list of the items. There is only one item, the list l, so we get a new list with one element: the list l.
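The three cases discussed above (the self-referencing list, its shallow slice copy, and a deep copy) can be compared side by side. A minimal sketch; the names shallow and deep are illustrative:

```python
from copy import deepcopy

l = [1, 2, 3]
l.append(l)
del l[:-1]                 # l is now a one-element list containing itself

shallow = l[:]             # new outer list, but its element is still l
deep = deepcopy(l)         # new list whose element is the new list itself

print(repr(l))             # [[...]]
print(repr(shallow))       # [[[...]]]
print(repr(deep))          # [[...]]
print(shallow[0] is l)     # True: the slice copied the reference, not the list
print(deep[0] is deep)     # True: deepcopy reproduced the cycle in the copy
```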
Why does the variable L get manipulated in the sorting(L) function call? In other languages, a copy of L would be passed to sorting(), so that any changes to x would not change the original variable.
def sorting(x):
    A = x  # Passed by reference?
    A.sort()

def testScope():
    L = [5, 4, 3, 2, 1]
    sorting(L)  # Passed by reference?
    return L

>>> print(testScope())
[1, 2, 3, 4, 5]
Long story short: Python uses pass-by-value, but the things that are passed by value are references. The actual objects have 0 to infinity references pointing at them, and for purposes of mutating that object, it doesn't matter who you are and how you got a reference to the object.
Going through your example step by step:
L = [...] creates a list object somewhere in memory, the local variable L stores a reference to that object.
sorting (strictly speaking, the callable object pointed to by the global name sorting) gets called with a copy of the reference stored by L, and stores it in a local called x.
The method sort of the object pointed to by the reference contained in x is invoked. It gets a reference to the object (in the self parameter) as well. It somehow mutates that object (the object, not some reference to the object, which is little more than a memory address).
Now, since references were copied, but not the object the references point to, all the other references we discussed still point to the same object. The one object that was modified "in-place".
testScope then returns another reference to that list object.
print uses it to request a string representation (calls the __str__ method) and outputs it. Since it's still the same object, of course it's printing the sorted list.
So whenever you pass an object anywhere, you share it with whoever receives it. Functions can (but usually won't) mutate the objects (pointed to by the references) they are passed, from calling mutating methods to assigning members. Note though that assigning a member is different from assigning a plain ol' name, which merely means mutating your local scope, not any of the caller's objects. So you can't mutate the caller's locals (this is why it's not pass-by-reference).
Further reading: A discussion on effbot.org why it's not pass-by-reference and not what most people would call pass-by-value.
Python has the concept of Mutable and Immutable objects. An object like a string or integer is immutable - every change you make creates a new string or integer.
Lists are mutable and can be manipulated in place. See below.
a = [1, 2, 3]
b = [1, 2, 3]
c = a
print(a is b, a is c)
# False True
print(a, b, c)
# [1, 2, 3] [1, 2, 3] [1, 2, 3]
a.reverse()
print(a, b, c)
# [3, 2, 1] [1, 2, 3] [3, 2, 1]
print(a is b, a is c)
# False True
Note how c was reversed, because c "is" a. There are many ways to copy a list to a new object in memory. An easy method is to slice: c = a[:]
The documentation specifically mentions that the .sort() method mutates the list in place. If you want to iterate over a sorted collection without changing it, use sorted(L) instead. This returns a new sorted list and leaves the original untouched.
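The difference between the two is easy to check directly: sorted() hands back a new list, while list.sort() reorders the existing list and returns None. A minimal sketch (data, ordered and returned are illustrative names):

```python
data = [5, 4, 3, 2, 1]

ordered = sorted(data)           # new list; data is not modified
print(type(ordered).__name__)    # list
print(data)                      # [5, 4, 3, 2, 1]

returned = data.sort()           # sorts in place
print(returned)                  # None
print(data)                      # [1, 2, 3, 4, 5]
```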
a = 1
b = a
a = 2
print(b)  # prints 1: rebinding a does not affect b
References are not the same as separate objects.
.sort() also mutates the collection.