According to this Official Documentation:
list[:]
creates a new list by shallow copy. I performed following experiments:
>>> squares = [1, 4, 9, 16, 25]
>>> new_squares = square[:]
>>> squares is new_squares
False
>>> squares[0] is new_squares[0]
True
>>> id(squares)
4468706952
>>> id(new_squares)
4468425032
>>> id(squares[0])
4466081856
>>> id(new_squares[0])
4466081856
All here look good! new_square and square are different object (list here), but because of shallow copy, they share the same content. However, the following results make me confused:
>>> new_squares[0] = 0
>>> new_squares
[0, 4, 9, 16, 25]
>>> squares
[1, 4, 9, 16, 25]
I update new_square[0] but square is not affected. I checked their ids:
>>> id(new_squares[0])
4466081824
>>> id(squares[0])
4466081856
You can find that the id of squares[0] keeps no change but the id of new_squares[0] changes. This is quite different from the shallow copy I have understood before.
Could anyone can explain it? Thanks!
You have a list object that represents a container of other objects. When you do a shallow copy, you create a new list object (as you see) that contains references to the same objects that the original list contained.
new_squares[0] = 0 is an assignment. You're saying "set a new object at the 0th index of the list". Well, the lists are now separate objects and you're flatly replacing the object held at an index of the copy. It wouldn't matter if the object at the 0th index was mutable either, since you're just replacing the reference that the list object holds.
If the list instead contained a mutable object and you were to modify that object in place without completely changing what object is stored in that index, then you would see the change across both lists. Not because the lists are in any way linked, but because they hold reference to a mutable object that you have now changed.
This can be illustrated below, where I can separately make modifications to the shallow-copied list, and also cause a mutable object to change across both lists, even when that mutable object is now at difference indices between the two.
# MAKING A CHANGE TO THE LIST
a = [1, {'c': 'd'}, 3, 4]
b = a[:]
b.insert(0, 0)
print(a)
print(b)
print()
# MODIFYING A MUTABLE OBJECT INSIDE THE LIST
a[1]['c'] = 'something_else'
print(a)
print(b)
list are mutables, integers are immutables
when you do:
squares = [1, 4, 9, 16, 25]
new_squares = square[:]
squares and new_squares have different ids
if you do:
[id(squares[i]) for i in range(len(squares))]
[id(new_squares[i]) for i in range(len(new_squares))]
you'll see the same id for each integer.
If you modify an integer with another value, you'll have a new id for this integer
Related
I found this line in the pip source:
sys.path[:] = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
As I understand the line above is doing the same as below:
sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
With one difference: in the first case sys.path still points to the same object in memory while in the second case sys.path points to the new list created from two existing.
Another one thing is that the first case is two times slower than second:
>>> timeit('a[:] = a + [1,2]', setup='a=[]', number=20000)
2.111023200035561
>>> timeit('a = a + [1,2]', setup='a=[]', number=20000)
1.0290934000513516
The reason as I think is that in the case of slice assignment objects from a (references to objects) are copied to a new list and then copied back to the resized a.
So what are the benefits of using a slice assignment?
Assigning to a slice is useful if there are other references to the same list, and you want all references to pick up the changes.
So if you do something like:
bar = [1, 2, 3]
foo = bar
bar[:] = [5, 4, 3, 2, 1]
print(foo)
this will print [5, 4, 3, 2, 1]. If you instead do:
bar = [5, 4, 3, 2, 1]
print(foo)
the output will be [1, 2, 3].
With one difference: in the first case sys.path still points to the same object in memory while in the second case sys.path points to the new list created from two existing.
Right: That’s the whole point, you’re modifying the object behind the name instead of the name. Thus all other names referring to the same object also see the changes.
Another one thing is that the first case is two times slower than second:
Not really. Slice assignment performs a copy. Performing a copy is an O(n) operation while performing a name assignment is O(1). In other words, the bigger the list, the slower the copy; whereas the name assignment always takes the same (short) time.
Your assumptions are very good!
In python a variable is a name that has been set to point to an object in memory, which in essence is what gives python the ability to be a dynamically typed language, i.e. you can have the same variable as a number, then reassign it to a string etc.
as shown here whenever you assign a new value to a variable, you are just pointing a name to a different object in memory
>>> a = 1
>>> id(a)
10968800
>>> a = 1.0
>>> id(a)
140319774806136
>>> a = 'hello world'
>>> id(a)
140319773005552
(in CPython the id refers to its address in memory).
Now for your question sys.path is a list, and a python list is a mutable type, thus meaning that the type itself can change, i.e.
>>> l = []
>>> id(l)
140319772970184
>>> l.append(1)
>>> id(l)
140319772970184
>>> l.append(2)
>>> id(l)
140319772970184
even though I modified the list by adding items, the list still points to the same object, and following the nature of python, a lists elements as well are only pointers to different areas in memory (the elements aren't the objects, the are only like variables to the objects held there) as shown here,
>>> l
[1, 2]
>>> id(l[0])
10968800
>>> l[0] = 3
>>> id(l[0])
10968864
>>> id(l)
140319772970184
After reassigning to l[0] the id of that element has changed. but once again the list hasn't.
Seeing that assigning to an index in the list only changes the places where lists elements where pointing, now you will understand that when I reassign l I don't reassign, I just change where l was pointing
>>> id(l)
140319772970184
>>> l = [4, 5, 6]
>>> id(l)
140319765766728
but if I reassign to all of ls indexes, then l stays the same object only the elements point to different places
>>> id(l)
140319765766728
>>> l[:] = [7, 8, 9]
>>> id(l)
140319765766728
That will also give you understanding on why it is slower, as python is reassigning the elements of the list, and not just pointing the list somewhere else.
One more little point if you are wondering about the part where the line finishes with
sys.path[:] = ... + sys.path
it goes in the same concept, python first creates the object on the right side of the = and then points the name on the left side to the new object, so when python is still creating the new list on the right side, sys.path is in essence the original list, and python takes all of its elements and then reassigns all of the newly created elements to the mappings in the original sys.paths addresses (since we used [:])
now for why pip is using [:] instead of reassigning, I don't really know, but I would believe that it might have a benefit of reusing the same object in memory for sys.path.
python itself also does it for the small integers, for example
>>> id(a)
10968800
>>> id(b)
10968800
>>> id(c)
10968800
a, b and c all point to the same object in memory even though all requested to create an 1 and point to it, since python knows that the small numbers are most probably going to be used a lot in programs (for example in for loops) so they create it and reuse it throughout.
(you might also find it being the case with filehandles that python will recycle instead of creating a new one.)
You are right, slice assignment will not rebind, and slice object is one type of objects in Python. You can use it to set and get.
In [1]: a = [1, 2, 3, 4]
In [2]: a[slice(0, len(a), 2)]
Out[2]: [1, 3]
In [3]: a[slice(0, len(a), 2)] = 6, 6
In [4]: a[slice(0, len(a), 1)] = range(10)
In [5]: a
Out[5]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [6]: a[:] = range(4)
In [7]: a
Out[7]: [0, 1, 2, 3]
After learning about lists and box-and-pointer diagrams, I decided to create random stuff for myself and test out my knowledge. I am going to use the words shallow copy and suspected shallow copies as I'm not really sure whether they are correct by definition. My queries are in the reasons provide for the behaviour of such code, please tell me whether I'm thinking soundly.
Code A
from copy import *
x=[1,[2,[3,[4]]]] #normal copy/hardcopy
a=x
v=list(x) #suspected shallow copy
y=x.copy() #shallow copy
z=deepcopy(x) #theoretical deep copy
w=x[:] #suspected shallow copy
def test():
print("Original:",x)
print("hardcopy:",a)
print("suspected shallow copy",v)
print("shallow copy",y)
print("deep copy:",z)
print("suspected shallow copy",w)
x[1]=x[1]+[4]
test()
Output A:
Original: [1, [2, [3, [4]], 4]]
hardcopy: [1, [2, [3, [4]], 4]]
suspected shallow copy [1, [2, [3, [4]]]]
shallow copy [1, [2, [3, [4]]]]
deep copy: [1, [2, [3, [4]]]]
suspected shallow copy [1, [2, [3, [4]]]]
Code B
a=(1,2,[1,2,3])
def shallow_copy(x):
tup=()
for i in x:
tup+=(i,)
return tup
def hardcopy(x):
return x
b=hardcopy(a)
c=shallow_copy(a)
a[2]+=[3]
Output B:
I see TypeError in IDLE here, but the mutation of the list element is still done, and across ALL a,b,c
Continuation from output B:
a[2][0]=a[2][0]+99
a,b,c
Output C:
((1, 2, [100, 2, 3, 3]), (1, 2, [100, 2, 3, 3]), (1, 2, [100, 2, 3, 3]))
Code D:
a=[1,2,(1,2,3)]
def shallow_copy(x):
tup=[]
for i in x:
tup+=[i]
return tup
def hardcopy(x):
return x
b=hardcopy(a)
c=shallow_copy(a)
d=a.copy()
a[2]=a[2]+(4,)
a,b,c,d
Output D:
[1, 2, (1, 2, 3, 4)], [1, 2, (1, 2, 3, 4)],
[1, 2, (1, 2, 3)], [1, 2, (1, 2, 3)]
From Output A, we observe the following:
1)For lists which have shallow copies, doing x[1]=x[1]+[4] does not affect the shallow copies. My reasons for the above could be
a) = followed by + does __add__ instead of __iadd__(which is +=), and doing __add__ should not modify the object, only changing the value for one pointer(x and its hardcopy in this case)
This is further supported in Output B but somehow contradicted in Output C, could be partly due to reason (b) below, but can't be too sure.
b) We executed this in the first layer(only 1 slice operator), maybe there's some kind of rule which prevents these elements from being modified.
This is supported by both Output B and Output C, though Output B might be argued to be in the first layer, think of it as increasing the elements in the 2nd layer, and it fits the above observation.
2)What is the reason why the TypeError appeared in Output B, but is still executed? I know that whether an Exception might be triggered is based on the final sequence you are actually changing(the list in this case), but why is there still TypeError: 'tuple' object does not support item assignment ?
I have presented my views for the above questions. I appreciate any thoughts(theoretical solutions preferably) on this question as I'm still relatively new to programming.
To answer question 1, which looks complex but whose answer is probably quite simple:
when you have a another name referencing the original object, you will see the changes in the original. Those changes will not reflect in other copies (being those either shallow or deep) if(!) you change the objects using the form x[1] = x[1] + [4]. This is because you are assigning a new object into x[1], instead of making an in-place change like in x[1].append(4).
You can check that with the id() function.
To answer your question 2, and adapted from the official docs:
let's make
a = (['hello'],)
then
a[0] += [' world']
this is the same as
a[0] = operator.iadd(a[0],[' world'])
The iadd changes the list in place, but then the assignment fails because you can't assign to a tuple (immutable type) index.
If you make
a[0] = a[0] + [' world']
the concatenation goes into a new list object, then the assignment to the tuple index fails too. But the new object gets lost. a[0] wasn't changed in place.
To clarify OP's comment, directly from the docs in here it says that
Many operations have an “in-place” version. Listed below are functions providing a more primitive access to in-place operators than the usual syntax does; for example, the statement x += y is equivalent to x = operator.iadd(x, y). Another way to put it is to say that z = operator.iadd(x, y) is equivalent to the compound statement z = x; z += y.
In those examples, note that when an in-place method is called, the
computation and assignment are performed in two separate steps. The
in-place functions listed below only do the first step, calling the
in-place method. The second step, assignment, is not handled.
As for your Output D:
Writing
b = hardcopy(a)
does nothing more than writing
b = a
really, b is a new name referencing the same object that a references.
This is because a is mutable and so a reference pointing to the original object is passed into local function name x. Returning x just returns the same reference into b.
That's why you see further changes in a reflected in b. Again you make a[2] a new different object tuple by assignment, so now a[2] and b[2] reference a new tuple (1,2,3,4), while c and d still reference the old tuple object. And now because they are tuples you can't change them in place, like lists.
As for the term "hardcopy", I wouldn't use it. It doesn't appear even once in official docs, and the mentions in Python SO questions beside this one, appear in other contexts. And it is ambiguous (contrary to "shallow" and "deep" which give a good clue for their meaning). I would think exactly the opposite (an object copy) for the term "hardcopy" you describe (an additional name/reference/pointer to the same object). Of course there are eventually many ways to say the same thing. We say "copy" because its shorter, and for immutables it doesn't matter if the copy happens or not (you can't change them anyway). For mutables saying "copy" usually means "shallow copy", because you have to "go further" in your code if you want a "deep copy".
I have one question about list shallow copy.
In both examples, I modified one element of the list, but in example 1, list b changed, while in example 2, list d is not changed. I am confused since in both examples, I modified an element of the list.
What's the difference?
Example 1:
a=[1,2,[3,5],4]
b=list(a)
a[1]=0
print(a) # [1, 0, [3, 5], 4]
print(b) # [1, 2, [3, 5], 4]
Example 2:
c=[1,2,[3,5],4]
d=list(c)
c[2][0]=0
print(c) # [1, 2, [0, 5], 4]
print(d) # [1, 2, [0, 5], 4]
A shallow copy means that you get a new list but the elements are the same. So both lists have the same first element, second element, etc.
If you add, remove, or replace a value from the shallow copied list that change is not reflected in the original (and vise-versa) because the shallow copy created a new list. However if you change an element in either that change is visible in both because both lists reference the same item. So the inner list is actually shared between both the new list and the old list and if you change it, that change is visible in both.
Note that you actually didn't change an element in either example, you replace an element of the list in the first example and in the second example, you replace an element of an element of your list.
I'm currently using graphviz a lot so let me add some images to illustrate this:
The shallow copy means you get a new list but the objects stored in the list are the same:
If you replace an element in any of these the corresponding element will just reference a new item (your first example). See how one list references the two and the other the zero:
While a change to an referenced item will change that item and every object that references that item will see the change:
[1. = copies the reference of object, hence any changes in either list, reflects in another
b=list(a) or b=a.copy() -> do the same work.
That is it copies the reference of only the individual objects i.e like b[0]=a[0] and b2=a2 and so on. With int, string etc, it's like if x = 10 and y = x and changing the value of 'x' or 'y' won't affect the other. This is what happens for the remaining elements of the a and b when you do a shallow copy.
So as in your question when doing b=list(a) and a[1]=0 using a shallow copy behaves as explained above and hence the changes are not reflected in both the list . But the nested listed acts as list assignment like a=[1,2,3] and b=a and making a2=3 will change b2 to 3 as well, i.e.changes in a or b effect both (same as in case 1 above). So this is why in case of a list with in a list any changes reflects in both the list. As in your example doing d=list(c) (here when copying d[2]=c[2] this is similar to list assignment i.e. the reference is copied and in case of list assignment changes are reflected in both so changes to d2 or c2 is reflected in both list) so doing c[2][0] = 0 will also change d[2][0] to zero.
Try the code at http://www.pythontutor.com/visualize.html#mode=edit
to understand better
a=[1,2,"hello",[3,4],5]
b=a
c=a.copy()
a[0]=2
a[3][0]=6
In the both examples, you are creating a shallow copy of the list. The shallow copies essentially copies the aliases to all elements in the first list to the second list.
So you have copied the reference to an [int, int, list, int]. The int elements are immutable, but the list element is mutable. So the third elements both point to the same object in Python's memory. Modifying that object modifies all references to it.
I thought that if you assign a variable to another list, it's not copied, but it points to the same location. That's why deepcopy() is for. This is not true with Python 2.7: it's copied.
>>> a=[1,2,3]
>>> b=a
>>> b=b[1:]+b[:1]
>>> b
[2, 3, 1]
>>> a
[1, 2, 3]
>>>
>>> a=(1,2,3)
>>> b=a
>>> b=b[1:]+b[:1]
>>> a
(1, 2, 3)
>>> b
(2, 3, 1)
>>>
What am I missing?
This line changes what b points to:
b=b[1:]+b[:1]
List or tuple addition creates a new list or tuple, and the assignment operator makes b refer to that new list while leaving a referring to the original list or tuple.
Slicing a list or tuple also creates a new object, so that line creates three new objects - one for each slice, and then one for the sum. b = a + b would be a simpler example to demonstrate that addition creates a new object.
You will sometimes see c = b[:] as a way to shallow copy a list, making use of the fact that slicing creates a new object.
When you do b=b[1:]+b[:1] you first create a new object of two b slices and then assign b to reference that object. The same is for both list and tuple cases
Im trying to make a nested list work but the problem is that whenever i append a variable it is the same as the first one.
array = [[1,0]]
index = 1
for stuff in list:
array.insert(0,array[0])
array[0][0]+=1
index += 1
if index == 5:
break
print(array)
This returns [[5, 0], [5, 0], [5, 0], [5, 0], [5, 0]]
The weird thing is if i were to make the list into a int it would work.
array = [1]
index = 1
for stuff in array:
array.insert(0,array[0])
array[0]+=1
index += 1
if index == 5:
break
print(array)
This one returns [5, 4, 3, 2, 1]
For the program i am writing i need to remember two numbers. Should i just give up on making it a list or should i make it into two ints or even a tuple? Also is it even possible to do with lists?
I changed list into array same concept though
That is because of this line
list.insert(list[0])
This always refers the list[0] and refereed in all the inserts which you did in the for loop.
And list of integers and list of lists, behave differently.
Also, mention your expected output.
Just to follow up with an explanation from the docs:
Assignment statements in Python do not copy objects, they create
bindings between a target and an object.
So whenever you want to copy objects that contain other objects, like your list contains integers or how a class may contain other members, you should know about this difference (also from the docs):
A shallow copy constructs a new compound object and then (to the extent possible) inserts references into it to the objects found in
the original.
A deep copy constructs a new compound object and then, recursively, inserts copies into it of the objects found in the original.
To achieve a deep copy, as you tried to do in this situation, you can either use Python's built-in list copy method, if you're on Python 3.3 or higher:
deepcopy = list.copy()
Or use the copy module for lower Python versions, which includes a copy.deepcopy() function that returns a deep-copy of a list (or any other compound object).