Arrays in Python are assigned by value or by reference? - python

I am wondering why when I delete the original array it affects the copied array:
arr = [1,2,3]
arr1 = arr
del arr[:]
print(arr1) #this prints []
but when I modify the elements of the original array there is no effect on the copied array:
arr = [1,2,3]
arr1 = arr
arr = [4,5,6]
print(arr1) #this prints [1,2,3]
If someone can explain this issue I appreciate your help, thanks in advance.

You did not modify the elements of the original array; you assigned a new list to the name arr. Your intuition is right that changes to the elements would show up in arr1 if you modified them in place, because lists are mutable in Python. For instance,
arr = [1,2,3]
arr1 = arr
arr[1] = 4
print(arr1) #this prints [1,4,3]

Objects in Python could be described as being passed by reference, although that is not quite accurate.
arr = [1, 2, 3]
This statement does two things. First it creates a list object in memory; second it points the "arr" label to this object.
arr1 = arr
This statement creates a new label "arr1" and points it to the same list object pointed to by arr.
Now in your original code you did this:
del arr[:]
This deleted the elements of the list object and now any label pointing to it will point to an empty list. In your second batch of code you did this:
arr = [4, 5, 6]
This created a new list object in memory, and pointed the "arr" label to it. Now you have two list objects in memory, each pointed to by a different label. I just checked on my console, and arr points to [4,5,6] and arr1 to [1,2,3].
Here is a good post about it: http://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/
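The two cases from the question can be checked directly with the `is` operator, which tests whether two names refer to the same object. This is only a minimal sketch; the helper names same_before, after_del and same_after are made up for illustration.

```python
# Two names, one list: a mutation is visible through both names.
arr = [1, 2, 3]
arr1 = arr
same_before = arr1 is arr      # True: both labels refer to one object

del arr[:]                     # empties the shared list in place
after_del = list(arr1)         # [] because arr1 still refers to that list

# Rebinding: arr gets a brand-new list, arr1 keeps the old one.
arr = [1, 2, 3]
arr1 = arr
arr = [4, 5, 6]
same_after = arr1 is arr       # False: the labels now refer to different objects

print(same_before, after_del, same_after, arr1)
# True [] False [1, 2, 3]
```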

There are several ways to copy an array, but
arr1 = arr
is not one of them (in C-speak, that is just an aliased pointer). Try one of
import copy # only needed for the copy.copy and copy.deepcopy variants
arr1 = arr[:] # slice that includes all elements, i.e. a shallow copy
arr1 = copy.copy(arr) # from the copy module, does a shallow copy
arr1 = copy.deepcopy(arr) # you guessed it.. ;-)
After any of those you will see that changes to arr1 and arr are independent (of course, if you're using a shallow copy then the items in the list will still be shared).
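The difference between the shallow and deep variants only shows up when the list contains mutable items. A small sketch with a nested list (the names by_slice, by_copy and by_deepcopy are chosen here for illustration):

```python
import copy

arr = [1, [2, 3]]

by_slice = arr[:]
by_copy = copy.copy(arr)
by_deepcopy = copy.deepcopy(arr)

arr[0] = 99          # replace a top-level item: no copy sees this
arr[1].append(4)     # mutate the nested list: shallow copies share it

print(by_slice)      # [1, [2, 3, 4]]
print(by_copy)       # [1, [2, 3, 4]]
print(by_deepcopy)   # [1, [2, 3]]
```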

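You did not copy the list at all; arr1 = arr just binds a second name to the same list.
To make a copy of the list, you can do the following (note that this is a shallow copy: the new list holds the same element objects, so use copy.deepcopy if the elements are themselves mutable):
arr1 = [x for x in arr]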
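How deep a given copy goes can be checked by comparing the inner objects with `is`. A minimal sketch, assuming a list of lists so that the difference is observable:

```python
import copy

arr = [[1, 2], [3, 4]]

comp = [x for x in arr]        # new outer list, same inner lists
deep = copy.deepcopy(arr)      # new outer list, new inner lists

print(comp is arr)             # False
print(comp[0] is arr[0])       # True
print(deep[0] is arr[0])       # False
```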

Related

The rules of list references?

I am really confused about list references in Python. Please help me understand. A simple case is below:
arr1 = []
arr2 = [1, 2]
arr1.append(arr2)
#arr2[0] = 5
arr2 = [6]
print(arr1)
So after appending arr2 to arr1 without a deep copy, to my understanding any change on arr2 would be reflected in arr1. However, only changing an element, as in arr2[0] = 5, updates arr1, while arr2 = [6] does not. Why is that?
to my understanding any change on arr2 would be reflected in arr1
This is true for mutation, but assigning a new object to a label does not mutate the old object. arr2 = [6] creates a new list object [6] in memory and binds the label arr2 to it. arr2 now points to this new object (with a different id()), not to the one stored in arr1.
List objects are mutable, so you can mutate them with, say, the .append() method. In that case, any change made to arr2 through .append() is reflected in the list stored in arr1:
arr1 = []
arr2 = [1, 2]
arr1.append(arr2)
arr2.append(6)
print(arr1)
Anytime you want to check if they are the same objects or not, simply print ids before and after. In case of mutation:
print(id(arr1[0])) # 2157378023744
print(id(arr2)) # 2157378023744
When you execute arr2 = [6], you create a new list object that is now referenced by arr2. The reference in the list still points to the initial list, so you cannot use the label arr2 anymore to change the contents of the list referenced in arr1.
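Both statements from the question can be run one after the other to see the difference. A short sketch; after_mutation and still_shared are illustrative names only:

```python
arr1 = []
arr2 = [1, 2]
arr1.append(arr2)          # arr1 stores a reference to the list object, not to the name arr2

arr2[0] = 5                # mutation: visible through arr1[0]
after_mutation = [list(inner) for inner in arr1]

arr2 = [6]                 # rebinding: arr2 now names a different list
still_shared = arr1[0] is arr2

print(after_mutation)      # [[5, 2]]
print(arr1)                # [[5, 2]]
print(still_shared)        # False
```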

What is the need of ellipsis[...] while modifying array values in numpy?

import numpy as np
a = np.arange(0,60,5)
a = a.reshape(3,4)
for x in np.nditer(a, op_flags=['readwrite']):
    x[...] = 2*x  # write through the view into the corresponding cell of a
print('Modified array is:')
print(a)
In the above code, why can't we simply write x=2*x instead of x[...]=2*x?
No matter what kind of object we were iterating over or how that object was implemented, it would be almost impossible for x = 2*x to do anything useful to that object. x = 2*x is an assignment to the variable x; even if the previous contents of the x variable were obtained by iterating over some object, a new assignment to x would not affect the object we're iterating over.
In this specific case, iterating over a NumPy array with np.nditer(a, op_flags = ['readwrite']), each iteration of the loop sets x to a zero-dimensional array that's a writeable view of a cell of a. x[...] = 2*x writes to the contents of the zero-dimensional array, rather than rebinding the x variable. Since the array is a view of a cell of a, this assignment writes to the corresponding cell of a.
This is very similar to the difference between l = [] and l[:] = [] with ordinary lists, where l[:] = [] will clear an existing list and l = [] will replace the list with a new, empty list without modifying the original. Lists don't support views or zero-dimensional lists, though.
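The two spellings can be compared side by side. This sketch assumes NumPy is installed and uses the context-manager form of np.nditer; the names b, unchanged and doubled are illustrative.

```python
import numpy as np

a = np.arange(0, 60, 5).reshape(3, 4)
b = a.copy()                          # reference values for comparison

# Rebinding x: the loop variable changes, the array does not.
with np.nditer(a, op_flags=['readwrite']) as it:
    for x in it:
        x = 2 * x
unchanged = np.array_equal(a, b)      # True

# Writing through the zero-dimensional view: each cell is doubled in place.
with np.nditer(a, op_flags=['readwrite']) as it:
    for x in it:
        x[...] = 2 * x
doubled = np.array_equal(a, 2 * b)    # True

print(unchanged, doubled)
```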

Changing list elements in shallow copy

I have one question about list shallow copy.
In both examples, I modified one element of the list, but in example 1, list b changed, while in example 2, list d is not changed. I am confused since in both examples, I modified an element of the list.
What's the difference?
Example 1:
a=[1,2,[3,5],4]
b=list(a)
a[1]=0
print(a) # [1, 0, [3, 5], 4]
print(b) # [1, 2, [3, 5], 4]
Example 2:
c=[1,2,[3,5],4]
d=list(c)
c[2][0]=0
print(c) # [1, 2, [0, 5], 4]
print(d) # [1, 2, [0, 5], 4]
A shallow copy means that you get a new list but the elements are the same. So both lists have the same first element, second element, etc.
If you add, remove, or replace a value in the shallow-copied list, that change is not reflected in the original (and vice versa) because the shallow copy created a new list. However, if you mutate an element in either list, that change is visible in both because both lists reference the same item. So the inner list is actually shared between the new list and the old list, and if you change it, that change is visible in both.
Note that the two examples do different things: in the first example you replace an element of the list, and in the second example you replace an element of an element of your list.
To illustrate (the original diagrams are not reproduced here):
The shallow copy means you get a new list, but the objects stored in the list are the same.
If you replace an element in one of the lists, the corresponding slot just references a new item (your first example): one list references the 2 and the other references the 0.
A change to a referenced item, on the other hand, changes that item, and every object that references the item will see the change.
1. = copies the reference to the object, hence any change in either list is reflected in the other.
2. b=list(a) or b=a.copy() do the same job (a shallow copy).
That is, a shallow copy copies only the references to the individual elements, as if you had written b[0]=a[0], b[2]=a[2], and so on. With immutable values such as ints and strings, it behaves like x = 10 followed by y = x: rebinding x or y afterwards does not affect the other. This is what happens for the non-list elements of a and b when you do a shallow copy.
So, as in your question, after b=list(a) the assignment a[1]=0 behaves as explained above, and the change is not reflected in the other list. The nested list, however, behaves like plain list assignment: with a=[1,2,3] and b=a, assigning to a[2] changes b[2] as well, i.e. changes through a or b affect both (same as in case 1 above). This is why, for a list within a list, any change is reflected in both lists. In your example, d=list(c) copies the reference d[2]=c[2], which is just list assignment, so changes made through d[2] or c[2] are reflected in both lists, and c[2][0] = 0 also changes d[2][0] to zero.
Try the code at http://www.pythontutor.com/visualize.html#mode=edit
to understand better
a=[1,2,"hello",[3,4],5]
b=a
c=a.copy()
a[0]=2
a[3][0]=6
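For readers who prefer to run the snippet locally rather than in Python Tutor, the same statements with their results printed look like this:

```python
a = [1, 2, "hello", [3, 4], 5]
b = a            # second name for the same list
c = a.copy()     # shallow copy: new outer list, shared inner list

a[0] = 2         # replaces a slot in the list shared by a and b; c keeps its own slot
a[3][0] = 6      # mutates the inner list, which a, b and c all reference

print(a)  # [2, 2, 'hello', [6, 4], 5]
print(b)  # [2, 2, 'hello', [6, 4], 5]
print(c)  # [1, 2, 'hello', [6, 4], 5]
```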
In both examples, you are creating a shallow copy of the list. A shallow copy essentially copies the references to all elements of the first list into the second list.
So you have copied the references in an [int, int, list, int]. The int elements are immutable, but the list element is mutable. The third element of each list therefore points to the same object in Python's memory, and modifying that object is visible through every reference to it.

What is the meaning of arr[:] in assignment in numpy?

I occasionally use numpy, and I'm trying to become smarter about how I vectorize operations. I'm reading some code and trying to understand the semantics of the following:
arr_1[:] = arr_2
I understand that in arr[:, 0] we're selecting the first column of the array, but I'm confused about what the difference is between arr_1[:] = arr_2 and arr_1 = arr_2
Your question involves a mix of basic Python syntax, and numpy specific details. In many ways it is the same for lists, but not exactly.
arr[:, 0] returns the 1st column of arr (a view), arr[:,0]=10 sets the values of that column to 10.
arr[:] returns a view of the whole of arr (whereas alist[:] returns a copy of a list). arr[:]=arr2 performs an in-place replacement, changing the values of arr to the values of arr2. The values of arr2 will be broadcast and copied as needed.
arr=arr2 sets the object that the arr variable is pointing to. Now arr and arr2 point to the same thing (whether array, list or anything else).
arr[...]=arr2 also works when copying all the data
Play about with these actions in an interactive session. Try variations in the shape of arr2 to see how values get broadcasted. Also check id(arr) to see the object that the variable points to. And arr.__array_interface__ to see the data buffer of the array. That helps you distinguish views from copies.
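A short session along those lines might look like the following sketch. It assumes NumPy is installed; the names alias, buffer_before and same_buffer are made up for illustration.

```python
import numpy as np

arr = np.zeros((2, 3), dtype=int)
alias = arr                                   # second name for the same array
buffer_before = arr.__array_interface__['data'][0]

arr[:] = np.array([1, 2, 3])                  # the row is broadcast to both rows, in place
same_buffer = arr.__array_interface__['data'][0] == buffer_before
print(same_buffer)                            # True: same data buffer as before
print(alias)                                  # [[1 2 3]
                                              #  [1 2 3]]

arr[:, 0] = 10                                # scalar broadcast down the first column
print(alias[:, 0])                            # [10 10]

arr = np.array([7, 8, 9])                     # rebinding: alias is unaffected
print(alias is arr)                           # False
```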
arr_1[:] = ... changes the elements of the existing list object that arr_1 refers to.
arr_1 = ... makes the name arr_1 refer to a different list object.
The main difference is what happens if some other name also referred to the original list object. If that's the case, then the former updates the thing that both names refer to; while the latter changes what one name refers to while leaving the other referring to the original thing.
>>> a = [0]
>>> b = a
>>> a[:] = [1]
>>> print(b)
[1] <--- note, change reflected by a and b
>>> a = [2]
>>> print(b)
[1] <--- but now a points at something else, so no change to b
Perhaps it is best understood by using id to examine the identity of the object each variable refers to.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
>>> id(arr1)
4595568512
>>> id(arr2)
4595566192
# Slice assignment
arr1[:] = arr2
>>> arr1
array([4, 5, 6])
>>> id(arr1) # arr1 still refers to the same object as before.
4595568512
# Reassignment.
arr1 = arr2
>>> id(arr1) # arr1 now refers to the same object as `arr2`.
4595566192
Using arr_1[:] = arr_2 is a shortcut for arr_1.__setitem__(slice(None, None), arr_2). The reason it is used instead of arr_1 = arr_2 is that with __setitem__ you are modifying the object arr_1 refers to, whereas with arr_1 = arr_2 you are rebinding the name arr_1. A change made through __setitem__ is therefore visible through every other reference to that object, rather than affecting only the name arr_1.
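The __setitem__ call can be made visible with a small list subclass. Spy is a hypothetical helper written for this illustration, not part of any library.

```python
class Spy(list):
    """Hypothetical list subclass that records the keys passed to __setitem__."""

    def __init__(self, *args):
        super().__init__(*args)
        self.keys_seen = []

    def __setitem__(self, key, value):
        self.keys_seen.append(key)
        super().__setitem__(key, value)


a = Spy([0, 0, 0])
b = a                    # second reference to the same object

a[:] = [1, 2, 3]         # goes through __setitem__ with a full slice
print(b.keys_seen)       # [slice(None, None, None)]
print(b)                 # [1, 2, 3]

a = [9]                  # plain rebinding: __setitem__ is not involved
print(b.keys_seen)       # [slice(None, None, None)]
print(b)                 # [1, 2, 3]
```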

Deepcopy on nested referenced lists created by list multiplication does not work

As much as I love Python, the reference and deepcopy stuff sometimes freaks me out.
Why does deepcopy not work here:
>>> import copy
>>> a = 2*[2*[0]]
>>> a
[[0, 0], [0, 0]]
>>> b = copy.deepcopy(a)
>>> b[0][0] = 1
>>> b
[[1, 0], [1, 0]] #should be: [[1, 0], [0, 0]]
>>>
I am using a numpy array as a workaround, which I need later on anyway. But I really had hoped that if I used deepcopy I would not have to chase any unintended references any more. Are there any more traps where it does not work?
It doesn't work because you are creating a list with two references to the same inner list.
An alternative approach is:
[[0]*2 for i in range(2)]
Or the more explicit:
[[0 for j in range(2)] for i in range(2)]
This works because it creates a new list on each iteration.
Are there any more traps where it does not work?
Any time you have a list containing references to mutable objects you should be careful. For example [Foo()] * 2 is not the same as [Foo() for i in range(2)]. In the first case only one object is constructed and the list contains two references to it. In the second case, two separate objects are constructed.
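The Foo example can be checked directly. Foo here is just an empty placeholder class, and multiplied and built are illustrative names:

```python
class Foo:
    """Empty placeholder class used only for this illustration."""


multiplied = [Foo()] * 2                 # one Foo object, referenced twice
built = [Foo() for _ in range(2)]        # two separate Foo objects

print(multiplied[0] is multiplied[1])    # True
print(built[0] is built[1])              # False
```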
deepcopy works exactly as it is designed to.
a = 2*[2*[0]]
When you multiply [[0,0]] by 2, both elements of the new list point to the SAME [0,0] list. a[0] and a[1] are the same list, because the reference is copied, not the data (list multiplication never copies the items). Changing the first element of one of them changes the first element of the other.
copy.deepcopy copies the list correctly and preserves that structure: because a[0] and a[1] are one object, b[0] and b[1] are also one (new) object.
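This behaviour can be confirmed with `is`, together with one way to get independent rows when that is what is wanted. A minimal sketch; the name c and the per-row slice copy are choices made for this illustration:

```python
import copy

a = 2 * [2 * [0]]
b = copy.deepcopy(a)

print(a[0] is a[1])   # True: the original has one inner list, referenced twice
print(b[0] is b[1])   # True: deepcopy reproduces that sharing in the copy
print(b[0] is a[0])   # False: the copy is still independent of the original

b[0][0] = 1
print(a)              # [[0, 0], [0, 0]]
print(b)              # [[1, 0], [1, 0]]

# Copying each row separately breaks the sharing between rows.
c = [row[:] for row in a]
c[0][0] = 1
print(c)              # [[1, 0], [0, 0]]
```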
