Numpy Matrix inside Python class exhibiting linked behaviour? - python

If you make a class as such in Python:
import numpy as np

class Foo:
    def __init__(self, data):
        self.data = data
        self.data_copy = self.copy(self.data)

    def copy(self, data):
        a = []
        for e in data:
            a.append(e)
        return a

    def change(self, val):
        for i in range(0, len(self.data_copy)):
            self.data_copy[i] += val
And then create an instance of the class such as:
a = Foo([np.matrix([1,2,3]), np.matrix([5,6,7])])
Now if we call the a.change(np.matrix([5,5,2])) function, which should only modify the self.data_copy list, the self.data list also updates with the changes. It appears, even after making a new list, the Numpy matrices in the two lists remain linked.
This is a nice feature in some respects, but does not work had I passed in an ordinary Python list of numbers. Is this a bug, or just a side-effect of how Numpy matrices are copied? And if so, what's the best way to replicate the behaviour with ordinary Python lists?

When you make your "copy", you're just making a new list that contains the same objects as the old list. That is, when you iterate through data you're iterating through references to the objects in it, and when you append e you're appending a reference rather than a new or copied object. Thus any changes to those objects will be visible in any other list that references them. It seems like what you want is copies of the actual matrices. To do this, in your copy method, instead of appending e, append a copy such as e.copy() (np.array(e, copy=True) also copies, but returns a plain ndarray rather than a matrix). This will create true copies of the matrices and not just new references to the old ones.
More generally, Python objects are effectively always passed by reference. This doesn't matter for immutable objects (strings, integers, tuples, etc), but for lists, dictionaries, or user defined classes that can mutate, you will need to make explicit copies. Often the built in copy module, or simply constructing a new object directly from the old, is what you want to do.
Edit: I now see what you mean. I had slightly misunderstood your original question. You're referring to += mutating the matrix objects rather than truly being = self + other. This is simply a fact of how += works for most Python collection types. += is in fact a separate method, distinct from assigning the result of adding. You will still see this behavior with normal Python lists.
a = [1, 2, 3]
b = a
b += [4]
>>> print(a)
[1, 2, 3, 4]
You can see that += mutates the original object rather than creating a new one and setting b to reference it. However, if you start again from a = [1, 2, 3] and b = a, and instead do:
b = b + [4]
>>> print(a)
[1, 2, 3]
>>> print(b)
[1, 2, 3, 4]
This will have the desired behavior. The + operator for collections (lists, numpy arrays) does indeed return a new object, however += usually just mutates the old one.
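One way to keep the two lists in the question's class independent is to copy each element, not just the outer list. The following is a minimal sketch under some assumptions: it uses plain nested lists as a stand-in for the matrices so that it runs without NumPy, and the class name IndependentFoo is hypothetical. For np.matrix elements, e.copy() would play the same role as the deep copy here.

```python
import copy


class IndependentFoo:
    """Variant of Foo whose data_copy shares no elements with data."""

    def __init__(self, data):
        self.data = data
        # deepcopy copies the outer list *and* every mutable element in it
        self.data_copy = copy.deepcopy(data)

    def change(self, val):
        for i in range(len(self.data_copy)):
            # += on a list extends it in place; only the copies are touched
            self.data_copy[i] += val


foo = IndependentFoo([[1, 2, 3], [5, 6, 7]])
foo.change([9])
print(foo.data)       # [[1, 2, 3], [5, 6, 7]]
print(foo.data_copy)  # [[1, 2, 3, 9], [5, 6, 7, 9]]
```

Because the elements themselves were copied, the in-place += no longer reaches the originals.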

Related

Why does accessing list items via index in a function work when changing its value, but the iterator variable way doesn't? [duplicate]

I am trying to increment the elements of a list by passing it into a increment() function that I have defined.
I have tried two ways to do this.
Accessing using the index.
# List passed to a function
def increment(LIST):
    for i in range(len(LIST)):
        LIST[i] += 1
    return LIST

li = [1, 2, 3, 4]
li = increment(li)
print(li)
This outputs the desired result: [2, 3, 4, 5]
Accessing using iterator variables.
# List passed to a function
def increment(LIST):
    for item in LIST:
        item += 1
    return LIST

li = [1, 2, 3, 4]
li = increment(li)
print(li)
This outputs: [1, 2, 3, 4]
I wish to know the reason behind this difference.
Python's in-place operators can be confusing. The "in-place" refers to the current binding of the object, not necessarily the object itself. Whether the object mutates itself or creates a new object for the in-place binding, depends on its own implementation.
If the object implements __iadd__, then the object performs the operation and returns a value. Python binds that value to the current variable. That's the "in-place" part. A mutable object may return itself whereas an immutable object returns a different object entirely. If the object doesn't implement __iadd__, python falls back to several other operators, but the result is the same. Whatever the object chooses to return is bound to the current variable.
In this bit of code
for item in LIST:
    item += 1
a value of the list is bound to a variable called "item" on each iteration. It is still also bound to the list. The in-place add rebinds item, but doesn't do anything to the list. If this were an object that mutated itself in __iadd__, it would still be bound to the list and you would see the mutated value. But Python integers are immutable. item was rebound to the new integer, but the original int is still bound to the list.
Which way any given object works, you kinda just have to know. Immutables like integers and mutables like lists are pretty straightforward. Packages that rely heavily on fancy meta-coding like pandas are all over the map.
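The rebinding described above can be sketched with a small mutable class next to plain integers. The Counter class is hypothetical and exists only to show an __iadd__ that mutates and returns self:

```python
class Counter:
    """Mutable: __iadd__ changes the object and returns itself."""

    def __init__(self, n):
        self.n = n

    def __iadd__(self, other):
        self.n += other
        return self  # the same object is rebound to the name


counters = [Counter(1), Counter(2)]
for item in counters:
    item += 1  # mutates the object the list also refers to

ints = [1, 2]
for item in ints:
    item += 1  # int has no __iadd__; item is rebound to a new int

print([c.n for c in counters])  # [2, 3]
print(ints)                     # [1, 2]
```

In both loops the name item is rebound; the list only appears to change in the first case because the object it holds was mutated.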
The reason is that integers are immutable in Python. You are essentially creating a new integer when performing the operation item += 1.
This post has more information on the topic
If you wished to update the list, you would need to create a new list or update the list entry.
def increment(LIST):
    result = []
    for item in LIST:
        result.append(item + 1)
    return result

li = [1, 2, 3, 4]
li = increment(li)
print(li)

The id should be the same but it isn't; can someone explain why?

The id of the object before and after should be the same, but it isn't. Can someone explain why a new object is being made?
L = [1, 2, 3]
print(id(L))
L = L + [4]
print(id(L))
The two ids that are printed are different. Shouldn't they be the same, since it's a mutable object? But when I use the list's append method to add 4, the id stays the same.
While lists are mutable, that doesn't mean that all operations involving them mutate the list in place. In your example, you're doing L + [4] to concatenate two lists. The list.__add__ method that gets invoked to implement that creates a new list, rather than modifying L. You're binding the old name L to the new list, so the value you get from id(L) changes.
If you want to mutate L while adding a value onto the end, there are several ways you can do it. L.append(4) is the obvious pick if you have just a single item to add. L.extend([4]) or the nearly synonymous L += [4] can work if the second list has more items in it than one.
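A short sketch of the difference between the in-place operations and +. It compares object identity with `is` against a saved reference rather than printing raw id values, which vary from run to run:

```python
L = [1, 2, 3]
original = L          # second reference to the same list object

L.append(4)           # mutates in place
L.extend([5, 6])      # mutates in place
L += [7]              # list.__iadd__ also mutates in place
same_after_mutation = L is original

L = L + [8]           # list.__add__ builds a new list, then L is rebound
same_after_concat = L is original

print(same_after_mutation)  # True
print(same_after_concat)    # False
print(original)             # [1, 2, 3, 4, 5, 6, 7]
print(L)                    # [1, 2, 3, 4, 5, 6, 7, 8]
```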
Note that sometimes creating a new list will be what you want to do! If you want to keep an unmodified reference to the old list, it may be desirable to create a new list with most of its contents at the same time you add new values. While you could copy the list and then use one of the in-place methods I mentioned above, you can also just use + to copy and add values to the list at the same time (just bind the result to a new name):
L = [1, 2, 3]
M = L + [4] # this is more convenient than M = list(L); M.append(4)
print(L) # unchanged, still [1, 2, 3]
print(M) # new list [1, 2, 3, 4]
its a mutable object
Yes, you can change the value without creating a new object. But with +, you are creating a new object.
To mutate a mutable value, use methods (such as append) or set items (a[0] = ...). As soon as you write L =, the object formerly referenced by L is lost (if it doesn't have any other references) and L gets a new value.
This makes sense because, in fact, with L = L+[0], you are saying "calculate the value of L+[0] and assign it to L" not "add [0] to L".

Why deepcopy of list of integers returns the same integers in memory?

I understand the differences between shallow copy and deep copy as I have learnt in class. However the following doesn't make sense
import copy
a = [1, 2, 3, 4, 5]
b = copy.deepcopy(a)
print(a is b)
print(a[0] is b[0])
----------------------------
~Output~
>False
>True
----------------------------
Shouldn't print(a[0] is b[0]) evaluate to False as the objects and their constituent elements are being recreated at a different memory location in a deep copy? I was just testing this out as we had discussed this in class yet it doesn't seem to work.
It was suggested in another answer that this may be due to the fact Python has interned objects for small integers. While this statement is correct, it is not what causes that behaviour.
Let's have a look at what happens when we use bigger integers.
> from copy import deepcopy
> x = 1000
> x is deepcopy(x)
True
If we dig down in the copy module we find out that calling deepcopy with an atomic value defers the call to the function _deepcopy_atomic.
def _deepcopy_atomic(x, memo):
    return x
So what is actually happening is that deepcopy will not copy an atomic value, but only return it.
For example, this is the case for int, float, str, function and more.
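A small check of this on CPython, using values that are well outside any small-integer cache. The variable names are illustrative only:

```python
import copy

big_int = 10 ** 20            # far too large to be a cached small integer
text = "not interned " * 3
number = 3.14

for value in (big_int, text, number):
    # deepcopy hands back the very same object for these immutable types
    assert copy.deepcopy(value) is value

nested = [big_int, [text]]
clone = copy.deepcopy(nested)
print(clone is nested)        # False: the outer list is copied
print(clone[1] is nested[1])  # False: the inner list is copied too
print(clone[0] is nested[0])  # True: the int itself is shared
```

The containers are duplicated, while the atomic values inside them are simply referenced again.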
The reason for this behavior is that Python optimizes small integers so that they do not actually live in different memory locations. Check the id of 1; it is always the same:
>>> x = 1
>>> y = 1
>>> id(x)
1353557072
>>> id(y)
1353557072
>>> a = [1, 2, 3, 4, 5]
>>> id(a[0])
1353557072
>>> import copy
>>> b = copy.deepcopy(a)
>>> id(b[0])
1353557072
Reference from Integer Objects:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
Olivier Melançon's answer is the correct one if we take this as a mechanical question of how the deepcopy function call ends up returning references to the same int objects rather than copies of them. I'll take a step back and answer the question of why that is the sensible thing for deepcopy to do.
The reason we need to make copies of data structures - either deep or shallow copies - is so we can modify their contents without affecting the state of the original; or so we can modify the original while still keeping a copy of the old state. A deep copy is needed for that purpose when a data structure has nested parts which are themselves mutable. Consider this example, which multiplies every number in a 2D grid, like [[1, 2], [3, 4]]:
import copy
def multiply_grid(grid, k):
    new_grid = copy.deepcopy(grid)
    for row in new_grid:
        for i in range(len(row)):
            row[i] *= k
    return new_grid
Objects such as lists are mutable, so the operation row[i] *= k changes their state. Making a copy of the list is a way to defend against mutation; a deep copy is needed here to make copies of both the outer list and the inner lists (i.e. the rows), which are also mutable.
But objects such as integers and strings are immutable, so their state cannot be modified. If an int object is 13 then it will stay 13, even if you multiply it by k; the multiplication results in a different int object. There is no mutation to defend against, and hence no need to make a copy.
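To see why a shallow copy would not be enough for the grid, here is a minimal sketch comparing copy.copy with copy.deepcopy on the same 2D list. The names grid_a and grid_b are illustrative:

```python
import copy

grid_a = [[1, 2], [3, 4]]
shallow = copy.copy(grid_a)     # new outer list, same row objects
shallow[0][0] *= 10             # writes through to grid_a's first row
print(grid_a)                   # [[10, 2], [3, 4]]

grid_b = [[1, 2], [3, 4]]
deep = copy.deepcopy(grid_b)    # new outer list and new rows
deep[0][0] *= 10
print(grid_b)                   # [[1, 2], [3, 4]]
print(deep)                     # [[10, 2], [3, 4]]
```

Only the rows needed copying; the integers inside them are replaced by new integers on multiplication, never modified.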
Interestingly, deepcopy doesn't need to make copies of tuples if their components are all immutable*, but it does when they have mutable components:
>>> import copy
>>> x = ([1, 2], [3, 4])
>>> x is copy.deepcopy(x)
False
>>> y = (1, 2)
>>> y is copy.deepcopy(y)
True
The logic is the same: if an object is immutable but has nested components which are mutable, then a copy is needed to avoid mutation to the components of the original. But if the whole structure is completely immutable, there is no mutation to defend against and hence no need for a copy.
* As Kelly Bundy points out in the comments, deepcopy sometimes does make copies of deeply-immutable objects, for example it does generally make copies of frozenset instances. The principle is that it doesn't need to make copies of those objects; it is an implementation detail whether or not it does in some specific cases.

Nested list or Var list?

I'm trying to make a nested list work, but the problem is that whenever I append a variable it is the same as the first one.
array = [[1,0]]
index = 1
for stuff in array:
    array.insert(0,array[0])
    array[0][0]+=1
    index += 1
    if index == 5:
        break
print(array)
This returns [[5, 0], [5, 0], [5, 0], [5, 0], [5, 0]]
The weird thing is that if I make the inner list into an int, it works.
array = [1]
index = 1
for stuff in array:
    array.insert(0,array[0])
    array[0]+=1
    index += 1
    if index == 5:
        break
print(array)
This one returns [5, 4, 3, 2, 1]
For the program I am writing I need to remember two numbers. Should I just give up on making it a list, or should I make it into two ints or even a tuple? Also, is it even possible to do with lists?
(I changed list into array; same concept though.)
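That is because of this line:
array.insert(0, array[0])
This inserts a reference to the same inner list, array[0], on every pass through the loop, so every element of the outer list ends up being the same object.
A list of integers and a list of lists behave differently here, because integers are immutable and += puts a new integer into the slot.
Also, please mention your expected output.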
Just to follow up with an explanation from the docs:
Assignment statements in Python do not copy objects, they create bindings between a target and an object.
So whenever you want to copy objects that contain other objects, like your list contains integers or how a class may contain other members, you should know about this difference (also from the docs):
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in the original.
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
Note that the built-in list copy method (available on Python 3.3 or higher) makes a shallow copy, not a deep one. That is enough for the inner list here, because it only contains integers:
inner_copy = array[0].copy()
To achieve a true deep copy of a nested structure, use the copy module, which includes a copy.deepcopy() function that returns a deep copy of a list (or any other compound object).
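Applied to the loop from the question, copying the inner list before inserting it gives each entry its own object. This is a sketch of one possible fix; it uses range(4) in place of the manual index counter, which is an assumption about the intended number of iterations based on the break at index == 5:

```python
array = [[1, 0]]

for _ in range(4):
    # insert a *copy* of the first pair, so each entry is its own list
    array.insert(0, array[0].copy())
    array[0][0] += 1

print(array)  # [[5, 0], [4, 0], [3, 0], [2, 0], [1, 0]]
```

A tuple per entry would also work, since replacing a tuple never affects the other entries.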

Python Variable Scope (passing by reference or copy?)

Why does the variable L get manipulated in the sorting(L) function call? In other languages, a copy of L would be passed to sorting(), so that any changes to x would not change the original variable.
def sorting(x):
    A = x  # Passed by reference?
    A.sort()

def testScope():
    L = [5, 4, 3, 2, 1]
    sorting(L)  # Passed by reference?
    return L

>>> print(testScope())
[1, 2, 3, 4, 5]
Long story short: Python uses pass-by-value, but the things that are passed by value are references. The actual objects have 0 to infinity references pointing at them, and for purposes of mutating that object, it doesn't matter who you are and how you got a reference to the object.
Going through your example step by step:
L = [...] creates a list object somewhere in memory, the local variable L stores a reference to that object.
sorting (strictly speaking, the callable object pointed to by the global name sorting) gets called with a copy of the reference stored by L, and stores it in a local called x.
The method sort of the object pointed to by the reference contained in x is invoked. It gets a reference to the object (in the self parameter) as well. It somehow mutates that object (the object, not some reference to the object, which is little more than a memory address).
Now, since references were copied, but not the object the references point to, all the other references we discussed still point to the same object. The one object that was modified "in-place".
testScope then returns another reference to that list object.
print uses it to request a string representation (calls the __str__ method) and outputs it. Since it's still the same object, of course it's printing the sorted list.
So whenever you pass an object anywhere, you share it with whoever receives it. Functions can (but usually won't) mutate the objects (pointed to by the references) they are passed, from calling mutating methods to assigning members. Note though that assigning a member is different from assigning a plain ol' name, which merely means mutating your local scope, not any of the caller's objects. So you can't mutate the caller's locals (this is why it's not pass-by-reference).
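The distinction between mutating a shared object and rebinding a local name can be sketched with two small functions. The names mutate and rebind are hypothetical:

```python
def mutate(x):
    x.append(99)      # changes the object the caller also refers to


def rebind(x):
    x = [0, 0, 0]     # only the local name x now points elsewhere


L = [1, 2, 3]
mutate(L)
print(L)  # [1, 2, 3, 99]

rebind(L)
print(L)  # [1, 2, 3, 99]
```

If Python were truly pass-by-reference, the second call would have replaced the caller's L as well.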
Further reading: A discussion on effbot.org why it's not pass-by-reference and not what most people would call pass-by-value.
Python has the concept of Mutable and Immutable objects. An object like a string or integer is immutable - every change you make creates a new string or integer.
Lists are mutable and can be manipulated in place. See below.
a = [1, 2, 3]
b = [1, 2, 3]
c = a
print(a is b, a is c)
# False True
print(a, b, c)
# [1, 2, 3] [1, 2, 3] [1, 2, 3]
a.reverse()
print(a, b, c)
# [3, 2, 1] [1, 2, 3] [3, 2, 1]
print(a is b, a is c)
# False True
Note how c was reversed, because c "is" a. There are many ways to copy a list to a new object in memory. An easy method is to slice: c = a[:]
The documentation specifically mentions that the .sort() method mutates the list. If you want to iterate over a sorted collection without changing the original, use sorted(L) instead. This returns a new sorted list and leaves L as it is.
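A quick comparison of the two, as a minimal sketch:

```python
L = [5, 4, 3, 2, 1]

S = sorted(L)         # builds and returns a new list
print(S)              # [1, 2, 3, 4, 5]
print(L)              # [5, 4, 3, 2, 1]

result = L.sort()     # sorts L in place and returns None
print(result)         # None
print(L)              # [1, 2, 3, 4, 5]
```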
a = 1
b = a
a = 2
print(b)
References are not the same as separate objects.
.sort() also mutates the collection.
