I occasionally use numpy, and I'm trying to become smarter about how I vectorize operations. I'm reading some code and trying to understand the semantics of the following:
arr_1[:] = arr_2
I understand that in arr[:, 0] we're selecting the first column of the array, but I'm confused about the difference between arr_1[:] = arr_2 and arr_1 = arr_2.
Your question involves a mix of basic Python syntax, and numpy specific details. In many ways it is the same for lists, but not exactly.
arr[:, 0] returns the 1st column of arr (a view), arr[:,0]=10 sets the values of that column to 10.
arr[:] returns a view of the whole array (alist[:], by contrast, returns a shallow copy of a list). arr[:]=arr2 performs an in-place replacement, overwriting the values of arr with the values of arr2. The values of arr2 are broadcast and copied as needed.
arr=arr2 sets the object that the arr variable is pointing to. Now arr and arr2 point to the same thing (whether array, list or anything else).
arr[...]=arr2 also works when copying all the data
Play around with these actions in an interactive session. Try varying the shape of arr2 to see how values get broadcast. Also check id(arr) to see which object the variable points to, and arr.__array_interface__ to see the array's data buffer. That helps you distinguish views from copies.
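A minimal sketch of that experiment, assuming NumPy is installed. The names alias, row and buffer_before are made up for illustration and are not part of the question.

```python
import numpy as np

arr = np.zeros((2, 3), dtype=int)
alias = arr                                          # second name for the same array object
buffer_before = arr.__array_interface__["data"][0]   # address of the data buffer

# In-place assignment: the 1-D row is broadcast across both rows of arr.
row = np.array([1, 2, 3])
arr[:] = row

same_object = arr is alias                           # still one array object
same_buffer = arr.__array_interface__["data"][0] == buffer_before
view_shares_memory = np.shares_memory(arr, arr[:, 0])  # the column is a view

# Rebinding: arr now names the row array; alias still names the 2x3 array.
arr = row
rebound = arr is not alias

print(alias)
print(same_object, same_buffer, view_shares_memory, rebound)
```

After the slice assignment both names still see one array with one buffer; only the final plain assignment separates them.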
arr_1[:] = ... changes the elements of the existing list object that arr_1 refers to.
arr_1 = ... makes the name arr_1 refer to a different list object.
The main difference is what happens if some other name also referred to the original list object. If that's the case, then the former updates the thing that both names refer to; while the latter changes what one name refers to while leaving the other referring to the original thing.
>>> a = [0]
>>> b = a
>>> a[:] = [1]
>>> print(b)
[1] <--- note, change reflected by a and b
>>> a = [2]
>>> print(b)
[1] <--- but now a points at something else, so no change to b
Perhaps it is best to understand by using id to examine the memory location of each variable.
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
>>> id(arr1)
4595568512
>>> id(arr2)
4595566192
# Slice assignment
arr1[:] = arr2
>>> arr1
array([4, 5, 6])
>>> id(arr1) # arr1 is still the same object; only its contents changed.
4595568512
# Reassignment.
arr1 = arr2
>>> id(arr1) # arr1 now refers to the same object as `arr2`.
4595566192
Using arr_1[:] = arr_2 is a shortcut for arr_1.__setitem__(slice(None, None), arr_2). The reason that is used instead of arr_1 = arr_2 is when you use __setitem__, you are modifying arr_1, whereas when you say arr_1 = arr_2, you are redefining arr_1. Using __setitem__, therefore, will modify other references to the arr_1 object rather than just redefining arr_1.
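The equivalence can be checked with plain lists. This is only an illustrative sketch; the variable names are arbitrary.

```python
a = [1, 2, 3]
b = a                       # b is another name for the same list

# These two statements are equivalent ways to replace the whole contents.
a[:] = [4, 5]
a.__setitem__(slice(None, None), [6, 7, 8])

shared_after_setitem = (b == [6, 7, 8]) and (a is b)

# Plain assignment only rebinds the name a; b keeps the old list.
a = [9]
print(a, b)  # [9] [6, 7, 8]
```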
I got really confused on the list reference in python. Please help me understand. a simple case as below:
arr1 = []
arr2 = [1, 2]
arr1.append(arr2)
#arr2[0] = 5
arr2 = [6]
print(arr1)
So after append arr2 to arr1 without deep copy, to my understanding any change on arr2 would be reflected in arr1. However, only changing components like arr2[0] = 5 updates the arr1, while arr2 = [6] won't. Any reason why?
to my understanding any change on arr2 would be reflected in arr1
This is true for mutation, but assigning a new object to a label does not mutate the old object. You create a new list object [6] in memory and bind the label to it. arr2 now points to this new object (with a different id()), not to the one stored in arr1.
List objects are mutable, so you can mutate them with, say, the .append() method. In this case, any change made to arr2 through .append() is reflected in the list stored in arr1.
arr1 = []
arr2 = [1, 2]
arr1.append(arr2)
arr2.append(6)
print(arr1)
Anytime you want to check if they are the same objects or not, simply print ids before and after. In case of mutation:
print(id(arr1[0])) # 2157378023744
print(id(arr2)) # 2157378023744
When you execute arr2 = [6], you create a new list object that is now referenced by arr2. The reference in the list still points to the initial list, so you cannot use the label arr2 anymore to change the contents of the list referenced in arr1.
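Both cases side by side, as a small sketch. The names outer, inner and stored are chosen here for illustration and stand in for arr1 and arr2.

```python
outer = []
inner = [1, 2]
outer.append(inner)          # outer[0] and inner are the same object

inner[0] = 5                 # mutation: visible through outer
seen_after_mutation = outer[0][0]

stored = outer[0]
inner = [6]                  # rebinding: outer still holds the old list
still_old = outer[0] is stored and outer[0] is not inner

print(outer)  # [[5, 2]]
```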
I found this line in the pip source:
sys.path[:] = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
As I understand the line above is doing the same as below:
sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
With one difference: in the first case sys.path still points to the same object in memory while in the second case sys.path points to the new list created from two existing.
Another thing is that the first case is about twice as slow as the second:
>>> timeit('a[:] = a + [1,2]', setup='a=[]', number=20000)
2.111023200035561
>>> timeit('a = a + [1,2]', setup='a=[]', number=20000)
1.0290934000513516
I think the reason is that with slice assignment, the references held in a are copied into a new list and then copied back into the resized a.
So what are the benefits of using a slice assignment?
Assigning to a slice is useful if there are other references to the same list, and you want all references to pick up the changes.
So if you do something like:
bar = [1, 2, 3]
foo = bar
bar[:] = [5, 4, 3, 2, 1]
print(foo)
this will print [5, 4, 3, 2, 1]. If you instead do:
bar = [5, 4, 3, 2, 1]
print(foo)
the output will be [1, 2, 3].
With one difference: in the first case sys.path still points to the same object in memory while in the second case sys.path points to the new list created from two existing.
Right: That’s the whole point, you’re modifying the object behind the name instead of the name. Thus all other names referring to the same object also see the changes.
Another thing is that the first case is about twice as slow as the second:
Not really. Slice assignment performs a copy. Performing a copy is an O(n) operation while performing a name assignment is O(1). In other words, the bigger the list, the slower the copy; whereas the name assignment always takes the same (short) time.
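One rough way to see the O(n) versus O(1) behaviour is to time both statements at several list sizes. This is only a sketch: time_both is a hypothetical helper, and the absolute numbers depend on the machine, so no expected output is given.

```python
from timeit import timeit

def time_both(n, repeats=200):
    """Return (slice_assign_seconds, rebind_seconds) for a list of length n."""
    src = list(range(n))
    env = {"src": src, "dst": list(src)}
    # Slice assignment copies n references into the existing list.
    slice_time = timeit("dst[:] = src", globals=env, number=repeats)
    # Name assignment only binds a name, regardless of n.
    rebind_time = timeit("dst = src", globals=env, number=repeats)
    return slice_time, rebind_time

for n in (10, 1_000, 100_000):
    slice_time, rebind_time = time_both(n)
    print(f"n={n:>7}: slice={slice_time:.6f}s  rebind={rebind_time:.6f}s")
```

The slice-assignment time should grow with n while the rebinding time stays roughly flat.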
Your assumptions are very good!
In Python a variable is a name bound to an object in memory, which is essentially what lets Python be dynamically typed: the same variable can hold a number, then be reassigned to a string, and so on.
As shown here, whenever you assign a new value to a variable, you are just pointing the name at a different object in memory:
>>> a = 1
>>> id(a)
10968800
>>> a = 1.0
>>> id(a)
140319774806136
>>> a = 'hello world'
>>> id(a)
140319773005552
(in CPython the id refers to its address in memory).
Now for your question: sys.path is a list, and a Python list is a mutable type, meaning that a list object can be changed in place while remaining the same object, i.e.
>>> l = []
>>> id(l)
140319772970184
>>> l.append(1)
>>> id(l)
140319772970184
>>> l.append(2)
>>> id(l)
140319772970184
Even though I modified the list by adding items, the name still points to the same list object. Following the nature of Python, a list's elements are themselves only pointers to objects elsewhere in memory (the slots aren't the objects; they are more like variables referring to the objects), as shown here:
>>> l
[1, 2]
>>> id(l[0])
10968800
>>> l[0] = 3
>>> id(l[0])
10968864
>>> id(l)
140319772970184
After reassigning l[0], the id of that element has changed, but once again the id of the list hasn't.
Assigning to an index only changes what that slot of the list points to. In the same way, when I reassign l itself I don't modify the old list; I just change where the name l points:
>>> id(l)
140319772970184
>>> l = [4, 5, 6]
>>> id(l)
140319765766728
But if I reassign all of l's indexes at once, then l stays the same object and only its elements point to different places:
>>> id(l)
140319765766728
>>> l[:] = [7, 8, 9]
>>> id(l)
140319765766728
That also explains why it is slower: Python is reassigning every element of the list rather than just pointing a name somewhere else.
One more little point if you are wondering about the part where the line finishes with
sys.path[:] = ... + sys.path
The same concept applies. Python first builds the object on the right side of the = and only then performs the assignment on the left side. While the new list on the right is being built, sys.path is still the original list, so Python reads all of its elements into the new list. Because we used [:], the elements of that new list are then copied into the original sys.path object instead of rebinding the name.
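The two steps can be spelled out separately. In this sketch search_path is a hypothetical stand-in for sys.path, holder represents some other code that kept a reference to it, and the paths are invented.

```python
search_path = ["/usr/lib", "/opt/lib"]   # hypothetical stand-in for sys.path
holder = search_path                      # e.g. a reference kept elsewhere

extra = ["/wheels/a.whl", "/wheels/b.whl"]

# Step 1: the right-hand side is evaluated into a brand-new list.
combined = extra + search_path
is_new_object = combined is not search_path

# Step 2: the elements of that new list are copied into the existing list.
search_path[:] = combined

print(holder)
```

Because the existing list object is updated in place, holder sees the prepended entries without being reassigned.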
As for why pip uses [:] instead of reassigning, I don't really know, but I believe there may be a benefit in reusing the same object in memory for sys.path.
Python itself also does this for small integers, for example
>>> a = 1
>>> b = 1
>>> c = 1
>>> id(a)
10968800
>>> id(b)
10968800
>>> id(c)
10968800
a, b and c all point to the same object in memory even though each one asked to create a 1 and point to it. CPython assumes that small integers will be used a lot in programs (for example in for loops), so it creates them once and reuses them throughout.
(you might also find it being the case with filehandles that python will recycle instead of creating a new one.)
You are right, slice assignment does not rebind the name. A slice is itself an ordinary object in Python, and you can use it both to get and to set items.
In [1]: a = [1, 2, 3, 4]
In [2]: a[slice(0, len(a), 2)]
Out[2]: [1, 3]
In [3]: a[slice(0, len(a), 2)] = 6, 6
In [4]: a[slice(0, len(a), 1)] = range(10)
In [5]: a
Out[5]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [6]: a[:] = range(4)
In [7]: a
Out[7]: [0, 1, 2, 3]
I am wondering why when I delete the original array it affects the copied array:
arr = [1,2,3]
arr1 = arr
del arr[:]
print(arr1) #this prints []
but when I modify the elements of the original array there is no effect on the copied array:
arr = [1,2,3]
arr1 = arr
arr = [4,5,6]
print(arr1) #this prints [1,2,3]
If someone can explain this issue I appreciate your help, thanks in advance.
You did not modify the elements of the original array; you re-assigned a new list to the arr variable. Your intuition is correct that changes to the elements would be reflected in arr1 if you actually modified them in place, since lists are mutable in Python. For instance,
arr = [1,2,3]
arr1 = arr
arr[1] = 4
print(arr1) #this prints [1,4,3]
Objects in Python are often described as being passed by reference. It's a bit different from that, however.
arr = [1, 2, 3]
This statement does two things. First it creates a list object in memory; second it points the "arr" label to this object.
arr1 = arr
This statement creates a new label "arr1" and points it to the same list object pointed to by arr.
Now in your original code you did this:
del arr[:]
This deleted the elements of the list object and now any label pointing to it will point to an empty list. In your second batch of code you did this:
arr = [4, 5, 6]
This created a new list object in memory, and pointed the "arr" label to it. Now you have two list objects in memory, each pointed to by a different label. I just checked on my console, and arr points to [4,5,6] and arr1 to [1,2,3].
Here is a good post about it: http://robertheaton.com/2014/02/09/pythons-pass-by-object-reference-as-explained-by-philip-k-dick/
There are several ways to copy an array, but
arr1 = arr
is not one of them (in C-speak, that is just an aliased pointer). Try one of
import copy  # needed for copy.copy and copy.deepcopy below

arr1 = arr[:]              # slice that includes all elements, i.e. a shallow copy
arr1 = copy.copy(arr)      # from the copy module, also a shallow copy
arr1 = copy.deepcopy(arr)  # deep copy: nested objects are copied too
After any of those you will see that changes to arr1 and arr are independent (of course, if you're using a shallow copy, the items in the list are still shared).
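A nested list makes the difference between these options visible. The sketch below uses made-up names (alias, shallow, also_shallow, deep) purely for illustration.

```python
import copy

arr = [[1, 2], [3, 4]]

alias = arr                    # no copy at all
shallow = arr[:]               # new outer list, same inner lists
also_shallow = copy.copy(arr)
deep = copy.deepcopy(arr)      # new outer list and new inner lists

arr[0][0] = 99                 # mutate an inner list
arr.append([5, 6])             # mutate the outer list

print(alias)         # [[99, 2], [3, 4], [5, 6]]
print(shallow)       # [[99, 2], [3, 4]]
print(also_shallow)  # [[99, 2], [3, 4]]
print(deep)          # [[1, 2], [3, 4]]
```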
You did not copy the list at all; you just bound a second variable name to it.
To make a copy of the list (a shallow one, which is enough for a flat list of numbers), do the following:
arr1 = [x for x in arr]
I am passing a single element of a list to a function. I want to modify that element, and therefore, the list itself.
def ModList(element):
    element = 'TWO'

l = []
l.append('one')
l.append('two')
l.append('three')
print(l)
ModList(l[1])
print(l)
But this method does not modify the list. It's like the element is passed by value. The output is:
['one','two','three']
['one','two','three']
I want the second element of the list to be 'TWO' after the function call:
['one','TWO','three']
Is this possible?
The explanations already here are correct. However, since I have wanted to abuse python in a similar fashion, I will submit this method as a workaround.
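Indexing a list returns a reference to the object stored at that position, not a link to the list slot itself, so rebinding the name you assigned it to cannot change the list. Slicing a list returns a new list holding references to the same elements, so assigning to a slot of the slice does not touch the original either. Consider this example: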
>>> a = [1, 2, 3, 4]
>>> b = a[2]
>>> b
3
>>> c = a[2:3]
>>> c
[3]
>>> b=5
>>> c[0]=6
>>> a
[1, 2, 3, 4]
Neither b, a name bound to the element's value, nor c, a new list sliced from a, can be used here to change the values in a. Rebinding b and assigning to c[0] leave a untouched, despite their common origin.
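With mutable elements it becomes possible to observe what indexing and slicing actually hand back. The following sketch uses small inner lists as elements purely so that identity and mutation are visible.

```python
a = [[1], [2], [3]]

b = a[2]            # b names the very same inner list as a[2]
same_element = b is a[2]

c = a[2:3]          # new outer list, but c[0] is still that same inner list
new_container = c is not a
shared_inner = c[0] is a[2]

b.append(30)        # mutating the shared object is visible through a
b = [5]             # rebinding b affects nothing else
c[0] = [6]          # replaces a slot in c only; a is untouched

print(a)  # [[1], [2], [3, 30]]
```

Mutation of a shared element shows up everywhere, while rebinding a name or replacing a slot in the sliced list does not.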
However, numpy arrays use a "raw-er" memory allocation and allow views of data to be returned. A view allows data to be represented in a different way while maintaining the association with the original data. A working example is therefore
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4])
>>> a
array([1, 2, 3, 4])
>>> b = a[2]
>>> b
3
>>> b=5
>>> a
array([1, 2, 3, 4])
>>> c = a[2:3]
>>> c
array([3])
>>> c[0]=6
>>> a
array([1, 2, 6, 4])
>>>
Extracting a single element still gives you a standalone value, but a one-element slice is a view onto the original element 2 of a (which is now element 0 of c), so the change made through c changes a as well.
Numpy ndarrays have many different types, including a generic object type. This means that you can maintain this "by-reference" behavior for almost any type of data, not only numerical values.
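Applied to the strings from the question, that idea might look like the sketch below. It assumes NumPy is installed; mod_element is a hypothetical helper name, not something from the original post.

```python
import numpy as np

def mod_element(view):
    """Overwrite the single element seen through a one-element view."""
    view[0] = "TWO"

words = np.array(["one", "two", "three"], dtype=object)
mod_element(words[1:2])      # a basic slice is a view, so the write reaches words

print(words.tolist())  # ['one', 'TWO', 'three']
```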
Python doesn't do pass by reference. Have ModList return the new value and assign it back explicitly:
l[1] = ModList(l[1])
Also, since this only changes one element, I'd suggest that ModList is a confusing name.
Python passes object references by value (sometimes called call by sharing), so rebinding the parameter inside ModList cannot change what the caller's list holds. What you can do instead is pass the list and the index into ModList and modify the element that way
def ModList(theList, theIndex):
    theList[theIndex] = 'TWO'
ModList(l, 1)
In many cases you can also consider letting the function both modify the list and return it. This makes the calling code more readable:
def ModList(theList, theIndex):
    theList[theIndex] = 'TWO'
    return theList
l = ModList(l, 1)
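The contrast between rebinding a parameter and mutating the object it refers to can be seen in one short run. The function names rebind and mutate are invented for this sketch.

```python
def rebind(element):
    element = "TWO"          # only changes the local name

def mutate(items, index):
    items[index] = "TWO"     # changes the list object the caller also holds

words = ["one", "two", "three"]

rebind(words[1])
after_rebind = list(words)   # snapshot: still ['one', 'two', 'three']

mutate(words, 1)
print(after_rebind)  # ['one', 'two', 'three']
print(words)         # ['one', 'TWO', 'three']
```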
As much as I love Python, the reference and deepcopy stuff sometimes freaks me out.
Why does deepcopy not work here:
>>> import copy
>>> a = 2*[2*[0]]
>>> a
[[0, 0], [0, 0]]
>>> b = copy.deepcopy(a)
>>> b[0][0] = 1
>>> b
[[1, 0], [1, 0]] #should be: [[1, 0], [0, 0]]
>>>
I am using a numpy array as a workaround, which I need later on anyway. But I really had hoped that if I used deepcopy I would not have to chase any unintended references any more. Are there any more traps where it does not work?
It doesn't work because you are creating a list that contains two references to the same inner list.
An alternative approach is:
[[0]*2 for i in range(2)]
Or the more explicit:
[[0 for j in range(2)] for i in range(2)]
This works because it creates a new inner list on each iteration.
Are there any more traps where it does not work?
Any time you have a list containing references to mutable objects you should be careful. For example [Foo()] * 2 is not the same as [Foo() for i in range(2)]. In the first case only one object is constructed and the list contains two references to it. In the second case, two separate objects are constructed.
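A quick way to confirm this, using a placeholder Foo class with one mutable attribute. The class body is an assumption made for the sketch, since the answer does not define Foo.

```python
class Foo:
    """Hypothetical placeholder class with one mutable attribute."""
    def __init__(self):
        self.items = []

repeated = [Foo()] * 2                  # one Foo, referenced twice
separate = [Foo() for _ in range(2)]    # two distinct Foo objects

repeated[0].items.append("x")
separate[0].items.append("x")

print(repeated[0] is repeated[1], repeated[1].items)  # True ['x']
print(separate[0] is separate[1], separate[1].items)  # False []
```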
deepcopy works exactly as designed; the surprise comes from how a was built.
a = 2*[2*[0]]
When you multiply the one-element list [[0, 0]] by 2, both elements of the new list point to the SAME [0, 0] list. a[0] and a[1] are the same list, because list multiplication copies the reference, not the data. Changing the first element of one of them changes the first element of the other.
copy.deepcopy copies the list correctly and preserves that sharing: in the copy, b[0] and b[1] are again one and the same list, distinct from the list inside a.
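The following sketch checks both properties with the is operator, using the same a and b as in the question.

```python
import copy

a = 2 * [2 * [0]]            # a[0] and a[1] are the same inner list
b = copy.deepcopy(a)

aliasing_kept = b[0] is b[1]         # the copy reproduces the sharing
independent = b[0] is not a[0]       # but shares nothing with the original

b[0][0] = 1
print(a)  # [[0, 0], [0, 0]]
print(b)  # [[1, 0], [1, 0]]
```

So the copy is fully independent of a, yet it has the same internal aliasing, which is why both rows of b change together.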