I am confused about the slice operation.
>>> s = "hello world"
>>> y = s[::]
>>> id(s)
4507906480
>>> id(y)
4507906480 # they are the same - no new object was created
>>> z = s[:2]
>>> z
'he'
>>> id(z)
4507835488 # z is a new object
What allocation rule does slice operation follow?
For most built-in types, slicing is always a shallow copy... in the sense that modifying the copy will not modify the original. This means that for immutable types, an object counts as a copy of itself. The copy module also uses this concept of "copy":
>>> t = (1, 2, 3)
>>> copy.copy(t) is t
True
Objects are free to use whatever allocation strategy they choose, as long as they implement the semantics they document. y can be the same object as s, but z cannot, because s and z store different values.
Related
I have some confusions when playing with data type numpy array in Python.
Question 1
I execute the following scripts in python intepreter
>>> import numpy as np
>>> L = [1000,2000,3000]
>>> A = np.array(L)
>>> B = A
Then I check the following things:
>>> A is B
True
>>> id(A) == id(B)
True
>>> id(A[0]) == id(B[0])
True
That's fine. But some strange things happened then.
>>> A[0] is B[0]
False
But how can A[0] and B[0] be different things? They have the same id!
For List in python, we have
>>> LL = [1000,2000,3000]
>>> SS = LL
>>> LL[0] is SS[0]
True
The method to store numpy array is totally different with list? And we also have
>>> A[0] = 1001
>>> B[0]
1001
It seems that A[0] and B[0] is the identical objects.
Question2
I make a copy of A.
>>> C = A[:]
>>> C is A
False
>>> C[0] is A[0]
False
That is fine. A and C seem to be independent with each other. But
>>> A[0] = 1002
>>> C[0]
1002
It seems that A and C is not independent? I am totally confused.
You are asking two completely independent questions, so here's two answsers.
The data of Numpy arrays is internally stored as a contiguous C array. Each entry in the array is just a number. Python objects on the other hand require some housekeeping data, e.g. the reference count and a pointer to the type object. You can't simply have a raw pointer to a number in memory. For this reason, Numpy "boxes" a number in a Python object if you access an individual elemtent. This happens everytime you access an element, so even A[0] and A[0] are different objects:
>>> A[0] is A[0]
False
This is at the heart of why Numpy can store arrays in a more memory-efficient way: It does not store a full Python object for each entry, and only creates these objects on the fly when needed. It is optimised for vectorised operations on the array, not for individual element access.
When you execute C = A[:] you are creating a new view for the same data. You are not making a copy. You will then have two different wrapper objects, pointed to by A and C respectively, but they are backed by the same buffer. The base attribute of an array refers to the array object it was originally created from:
>>> A.base is None
True
>>> C.base is A
True
New views on the same data are particularly useful when combined with indexing, since you can get views that only include some slice of the original array, but are backed by the same memory.
To actually make a copy of an array, use the copy() method.
As a more general remark, you should not read too much into object identity in Python. In general, if x is y is true, you know that they are really the same object. However, if this returns false, they can still be two different proxies to the same object.
Two variables in Python have the same id:
a = 10
b = 10
a is b
>>> True
If I take two lists:
a = [1, 2, 3]
b = [1, 2, 3]
a is b
>>> False
according to this link Senderle answered that immutable object references have the same id and mutable objects like lists have different ids.
So now according to his answer, tuples should have the same ids - meaning:
a = (1, 2, 3)
b = (1, 2, 3)
a is b
>>> False
Ideally, as tuples are not mutable, it should return True, but it is returning False!
What is the explanation?
Immutable objects don't have the same id, and as a matter of fact this is not true for any type of objects that you define separately. Generally speaking, every time you define an object in Python, you'll create a new object with a new identity. However, for the sake of optimization (mostly) there are some exceptions for small integers (between -5 and 256) and interned strings, with a special length --usually less than 20 characters--* which are singletons and have the same id (actually one object with multiple pointers). You can check this like following:
>>> 30 is (20 + 10)
True
>>> 300 is (200 + 100)
False
>>> 'aa' * 2 is 'a' * 4
True
>>> 'aa' * 20 is 'a' * 40
False
And for a custom object:
>>> class A:
... pass
...
>>> A() is A() # Every time you create an instance you'll have a new instance with new identity
False
Also note that the is operator will check the object's identity, not the value. If you want to check the value you should use ==:
>>> 300 == 3*100
True
And since there is no such optimizational or interning rule for tuples or any mutable type for that matter, if you define two same tuples in any size they'll get their own identities, hence different objects:
>>> a = (1,)
>>> b = (1,)
>>>
>>> a is b
False
It's also worth mentioning that rules of "singleton integers" and "interned strings" are true even when they've been defined within an iterator.
>>> a = (100, 700, 400)
>>>
>>> b = (100, 700, 400)
>>>
>>> a[0] is b[0]
True
>>> a[1] is b[1]
False
* A good and detailed article on this: http://guilload.com/python-string-interning/
Immutable != same object.*
An immutable object is simply an object whose state cannot be altered; and that is all. When a new object is created, a new address will be assigned to it. As such, checking if the addresses are equal with is will return False.
The fact that 1 is 1 or "a" is "a" returns True is due to integer caching and string interning performed by Python so do not let it confuse you; it is not related with the objects in question being mutable/immutable.
*Empty immutable objects do refer to the same object and their isness does return true, this is a special implementation specific case, though.
Take a look at this code:
>>> a = (1, 2, 3)
>>> b = (1, 2, 3)
>>> c = a
>>> id(a)
178153080L
>>> id(b)
178098040L
>>> id(c)
178153080L
In order to figure out why a is c is evaluated as True whereas a is b yields False I strongly recommend you to run step-by-step the snippet above in the Online Python Tutor. The graphical representation of the objects in memory will provide you with a deeper insight into this issue (I'm attaching a screenshot).
According to the documentation, immutables may have same id but it is not guaranteed that they do. Mutables always have different ids.
https://docs.python.org/3/reference/datamodel.html#objects-values-and-types
Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed.
In previous versions of Python, tuples were assigned different IDs. (Pre 3.7)
As of Python 3.7+, two variables with the same tuple assigned may have the same id:
>>>a = (1, 2, 3)
>>>b = (1, 2, 3)
>>>a is b
True
Integers above 256 also have different ids:
>>>a = 123
>>>b = 123
>>>a is b
True
>>>
>>>a = 257
>>>b = 257
>>>a is b
False
Check below code..
tupils a and b are retaining their older references(ID) back when we have assigned their older values back. (BUT, THIS WILL NOT BE THE CASE WITH LISTS AS THEY ARE MUTABLE)
Initially a and b have same values ( (1,2) ), but they have difference IDs. After alteration to their values, when we reassign value (1,2) to a and b, they are now referencing to THEIR OWN same IDs (88264264 and 88283400 respectively).
>>> a = (1,2)
>>> b = (1,2)
>>> a , b
((1, 2), (1, 2))
>>> id(a)
88264264
>>> id(b)
88283400
>>> a = (3,4)
>>> b = (3,4)
>>> id(a)
88280008
>>> id(b)
88264328
>>> a = (1,2)
>>> b = (1,2)
>>> id(a)
88264264
>>> id(b)
88283400
>>> a , b
((1, 2), (1, 2))
>>> id(a) , id(b)
(88264264, 88283400)
>>>
**Check the link Why don't tuples get the same ID when assigned the same values?
also after reading this. Another case also been discussed here.
I thought that if you assign a variable to another list, it's not copied, but it points to the same location. That's why deepcopy() is for. This is not true with Python 2.7: it's copied.
>>> a=[1,2,3]
>>> b=a
>>> b=b[1:]+b[:1]
>>> b
[2, 3, 1]
>>> a
[1, 2, 3]
>>>
>>> a=(1,2,3)
>>> b=a
>>> b=b[1:]+b[:1]
>>> a
(1, 2, 3)
>>> b
(2, 3, 1)
>>>
What am I missing?
This line changes what b points to:
b=b[1:]+b[:1]
List or tuple addition creates a new list or tuple, and the assignment operator makes b refer to that new list while leaving a referring to the original list or tuple.
Slicing a list or tuple also creates a new object, so that line creates three new objects - one for each slice, and then one for the sum. b = a + b would be a simpler example to demonstrate that addition creates a new object.
You will sometimes see c = b[:] as a way to shallow copy a list, making use of the fact that slicing creates a new object.
When you do b=b[1:]+b[:1] you first create a new object of two b slices and then assign b to reference that object. The same is for both list and tuple cases
I understand the differences between shallow copy and deep copy as I have learnt in class. However the following doesn't make sense
import copy
a = [1, 2, 3, 4, 5]
b = copy.deepcopy(a)
print(a is b)
print(a[0] is b[0])
----------------------------
~Output~
>False
>True
----------------------------
Shouldn't print(a[0] is b[0]) evaluate to False as the objects and their constituent elements are being recreated at a different memory location in a deep copy? I was just testing this out as we had discussed this in class yet it doesn't seem to work.
It was suggested in another answer that this may be due to the fact Python has interned objects for small integers. While this statement is correct, it is not what causes that behaviour.
Let's have a look at what happens when we use bigger integers.
> from copy import deepcopy
> x = 1000
> x is deepcopy(x)
True
If we dig down in the copy module we find out that calling deepcopy with an atomic value defers the call to the function _deepcopy_atomic.
def _deepcopy_atomic(x, memo):
return x
So what is actually happening is that deepcopy will not copy an atomic value, but only return it.
By example this is the case for int, float, str, function and more.
The reason of this behavior is that Python optimize small integers so they are not actually in different memory location. Check out the id of 1, they are always the same:
>>> x = 1
>>> y = 1
>>> id(x)
1353557072
>>> id(y)
1353557072
>>> a = [1, 2, 3, 4, 5]
>>> id(a[0])
1353557072
>>> import copy
>>> b = copy.deepcopy(a)
>>> id(b[0])
1353557072
Reference from Integer Objects:
The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)
Olivier Melançon's answer is the correct one if we take this as a mechanical question of how the deepcopy function call ends up returning references to the same int objects rather than copies of them. I'll take a step back and answer the question of why that is the sensible thing for deepcopy to do.
The reason we need to make copies of data structures - either deep or shallow copies - is so we can modify their contents without affecting the state of the original; or so we can modify the original while still keeping a copy of the old state. A deep copy is needed for that purpose when a data structure has nested parts which are themselves mutable. Consider this example, which multiplies every number in a 2D grid, like [[1, 2], [3, 4]]:
import copy
def multiply_grid(grid, k):
new_grid = copy.deepcopy(grid)
for row in new_grid:
for i in range(len(row)):
row[i] *= k
return new_grid
Objects such as lists are mutable, so the operation row[i] *= k changes their state. Making a copy of the list is a way to defend against mutation; a deep copy is needed here to make copies of both the outer list and the inner lists (i.e. the rows), which are also mutable.
But objects such as integers and strings are immutable, so their state cannot be modified. If an int object is 13 then it will stay 13, even if you multiply it by k; the multiplication results in a different int object. There is no mutation to defend against, and hence no need to make a copy.
Interestingly, deepcopy doesn't need to make copies of tuples if their components are all immutable*, but it does when they have mutable components:
>>> import copy
>>> x = ([1, 2], [3, 4])
>>> x is copy.deepcopy(x)
False
>>> y = (1, 2)
>>> y is copy.deepcopy(y)
True
The logic is the same: if an object is immutable but has nested components which are mutable, then a copy is needed to avoid mutation to the components of the original. But if the whole structure is completely immutable, there is no mutation to defend against and hence no need for a copy.
* As Kelly Bundy points out in the comments, deepcopy sometimes does make copies of deeply-immutable objects, for example it does generally make copies of frozenset instances. The principle is that it doesn't need to make copies of those objects; it is an implementation detail whether or not it does in some specific cases.
If we have a list s, is there any difference between calling list(s) versus s[:]? It seems to me like they both create new list objects with the exact elements of s.
In both cases, they should create a (shallow) copy of the list.
Note that there is one corner case (which is hardly worth mentioning) where it might be different...
list = tuple # Don't ever do this!
list_copy = list(some_list) # Oops, actually it's a tuple ...
actually_list_copy = some_list[:]
With that said, nobody in their right mind should ever shadow the builtin list like that.
My advice, use whichever you feel is easier to read and works nicely in the current context.
list(...) makes it explicit that the output is a list and will make a list out of any iterable.
something[:] is a common idiom for "give me a shallow copy of this sequence, I don't really care what kind of sequence it is ...", but it doesn't work on arbitrary iterables.
list() is better - it's more readable. Other than that there is no difference.
The short answer is use list(). In google type python [:] then type python list.
If s is a list then there is no difference, but will s always be a list? Or could it be a sequence or a generator?
In [1]: nums = 1, 2, 3
In [2]: nums
Out[2]: (1, 2, 3)
In [3]: nums[:]
Out[3]: (1, 2, 3)
In [4]: list(nums)
Out[4]: [1, 2, 3]
In [7]: strings = (str(x) for x in nums)
In [8]: strings
Out[8]: <generator object <genexpr> at 0x7f77be460550>
In [9]: strings[:]
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-9-358af12435ff> in <module>()
----> 1 strings[:]
TypeError: 'generator' object has no attribute '__getitem__'
In [10]: list(strings)
Out[10]: ['1', '2', '3']
I didn't realize, but as ventsyv mentioned, s[:] and list(s) both create a copy of s.
Note you can check if an object is the same using is and id() can be used to get the object's memory address to actually see if they are the same or not.
>>> s = [1,2,3]
>>> listed_s = list(s)
>>> id(s)
44056328
>>> id(listed_s) # different
44101840
>>> listed_s is s
False
>>> bracket_s = s[:]
>>> bracket_s is s
False
>>> id(bracket_s)
44123760
>>> z = s # points to the same object in memory
>>> z is s
True
>>> id(z)
44056328
>>>
id(...)
id(object) -> integer
Return the identity of an object. This is guaranteed to be unique among
simultaneously existing objects. (Hint: it's the object's memory address.)