Variables and memory allocated for them? - python

I have a question about how Python handles memory when copying variables.
For example, I have a list (or string, tuple, dictionary, set) variable
A = [1,2,3]
then I assign the value of A to another variable B
B = A
then if I do "some changes" to A, e.g.,
A.pop(0)
then B also changes, i.e.,
print(A,B) will give me ([2,3], [2,3])
I read some material and they say "B=A did not copy the value of A to a new place in memory labelled by B. It just made the name B point to the same position in memory as A." Can I interpret this as we still only have one place of memory, but now it has 2 names?
However, I found that if I make a different kind of change to A, such as
A = [5,6] # reassign A
then
print(A,B)
gives me ([5,6],[1,2,3])
So I am confused here. It seems that now we have two places in memory.

Your first understanding was correct. When you do
B = A
you now have two names pointing to the same object in memory.
Your misunderstanding is about what happens when you do
A = [5, 6]
This doesn't copy [5, 6] to that location in memory. It allocates a new list [5, 6] and then changes the name A to point to this. But B still points to the same list that it pointed to before.
Basically, every time you do
A = <something>
you're changing where A points, not changing the thing that it points to.
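A minimal sketch of that distinction, reusing the names from the question. The `is` operator checks whether two names refer to the same object; the helper names (same_before and so on) are only there for illustration.

```python
A = [1, 2, 3]
B = A                      # B is a second name for the same list object
same_before = A is B       # True: one object, two names

A.pop(0)                   # mutates the shared object in place
seen_through_b = list(B)   # snapshot of what B sees: [2, 3]

A = [5, 6]                 # rebinds the name A to a brand-new list
same_after = A is B        # False: A and B now name different objects

print(same_before, seen_through_b, same_after, A, B)
# → True [2, 3] False [5, 6] [2, 3]
```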

Lists are objects, and names only hold references to objects. When you write B = A, B gets a reference (in CPython, a C pointer) to the object behind A, not to the name A itself, so, as your code is telling you already, A is B evaluates to True. Because the reference is to the object and not to A, rebinding A with A = [5,6] does not touch B: the old list still has a reference (B), so it is kept alive instead of being garbage-collected. Only the address stored in A changes.
If you then, however, reassign B = A, B will be [5,6] too.

The second assignment binds a to a new object:
>>> a= [1,2,3]
>>> id(a)
4353139632
>>> b = a
>>> id(b)
4353139632
>>> a= [4,5]
>>> id(a)
4353139776
>>> id(b)
4353139632

Lists, tuples, and other objects are accessed through references in Python. You can think of variable names as something like pointers in C. So A points to some location that stores the list, and when you did B = A you copied the reference to that location (the address) to B.
So when you change the contents at that location via A, you see the change whether you access the location via A or via B.
However, if you would like to copy the elements, you can use
B = [i for i in A]
or something like that.
And when you assign some other value to A, such as A = [5,6], A now refers to a different memory location while B still refers to the original one, so B stays the same.
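For the copying mentioned above, here is a short sketch of the usual options. It assumes a nested list, because that is where a shallow copy (new outer list, shared inner objects) and a deep copy behave differently.

```python
import copy

A = [[1, 2], [3, 4]]

shallow = list(A)          # new outer list; A[:] or copy.copy(A) do the same
deep = copy.deepcopy(A)    # new outer list and new inner lists

A.append([5, 6])           # changes only A's outer list
A[0].append(99)            # changes an inner list that `shallow` still shares

print(len(A), len(shallow), len(deep))   # → 3 2 2
print(shallow[0], deep[0])               # → [1, 2, 99] [1, 2]
```

For a flat list of immutable values such as [1, 2, 3], a shallow copy is normally all you need.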

Related

difference in variable assigning in Python between integer and list

I am studying Wes McKinney's 'Python for data analysis'.
At some point he says:
"When assigning a variable (or name) in Python, you are creating a reference to the object on the righthand side of the equals sign. In practical terms, consider a list of integers:
In [8]: a = [1, 2, 3]
In [9]: b = a
In [11]: a.append(4)
In [12]: b
output will be:
Out[12]: [1, 2, 3, 4]
He reasons as such:
"In some languages, the assignment of b will cause the data [1, 2, 3] to be copied. In Python, a and b actually now refer to the same object, the original list"
My question is why the same thing does not occur in the case below:
In [8]: a = 5
In [9]: b = a
In [11]: a +=1
In [12]: b
Where I still get
Out[12]: 5
for b?
In the first case, you're creating a list, and both a and b point to that list. When you change the list, both names still point to the same list, so both see its changes.
But incrementing a variable that points to an integer is different. 5 is still 5; you're not changing the integer. You're changing which object the variable a points to. So a now points to the value 6, while b still points to 5. You're not changing the thing that a points to, you're changing WHAT a points to. b doesn't care about that.
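A small sketch contrasting += on an int with += on a list. The second half goes slightly beyond the question, since a list's += modifies the list in place, whereas plain + builds a new list.

```python
a = 5
b = a
a += 1                     # ints are immutable: builds 6 and rebinds a
print(a, b)                # → 6 5

x = [1, 2, 3]
y = x
x += [4]                   # lists implement += in place (like x.extend([4]))
print(x is y, y)           # → True [1, 2, 3, 4]

x = x + [5]                # plain + builds a new list and rebinds x
print(x is y, y)           # → False [1, 2, 3, 4]
```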

Reference Assignment in Python - Why isn't the value changed?

I know that "variable assignment" in python is, in fact, a binding / re-binding of a name (the variable) to an object.
b = [1,2,3]
a = b[2] # binding a to b[2] ?
a = 1000
Afterwards, b is still [1, 2, 3].
Why is b not changed after this?
here is another example:
b = [1,2,3]
a = b
a[0] = 1000
In this case b is [1000, 2, 3].
Isn't assignment in Python reference binding?
Thank you
a = b[2] # binding a to b[2] ?
Specifically, this binds the name a to the same value referenced by b[2]. It does not bind a to the element in the list b at index 2. The name a is entirely independent of the list where it got its value from.
a = 1000
Now you bind the name a to a new value, the integer 1000. Since a has no association with b, b does not change.
In your second example:
a = b
Now a is bound to the same list value that b is bound to. So when you do
a[0] = 1000
you modify an element in the underlying list, which has two different names. When you access the list by either name, you will see the same value.
a[0] = ... is a special form of assignment that desugars to a method call,
a.__setitem__(0, ...)
You aren't reassigning the name a; you are assigning to a "slot" in the object referenced by a (and b).
Lists are mutable objects, ints aren't. 4 can't become 5 in python, but a list can change its contents.
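To make the method call visible, here is a sketch with a hypothetical list subclass (LoggingList is invented for this illustration, not a standard class) that records each call to __setitem__.

```python
class LoggingList(list):
    """Hypothetical list subclass that records every item assignment."""

    def __init__(self, *args):
        super().__init__(*args)
        self.calls = []

    def __setitem__(self, index, value):
        self.calls.append((index, value))   # record the call
        super().__setitem__(index, value)   # then do the real assignment


b = LoggingList([1, 2, 3])
a = b                # second name for the same object
a[0] = 1000          # item assignment: goes through __setitem__
a = "rebound"        # name assignment: no method on the list is called

print(b, b.calls)    # → [1000, 2, 3] [(0, 1000)]
```

Only the item assignment shows up in calls; rebinding the name a never reaches the list at all.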

What's the point of assignment to slice?

I found this line in the pip source:
sys.path[:] = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
As I understand the line above is doing the same as below:
sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
With one difference: in the first case sys.path still points to the same object in memory, while in the second case sys.path points to a new list created from the two existing ones.
Another thing is that the first case is about two times slower than the second:
>>> timeit('a[:] = a + [1,2]', setup='a=[]', number=20000)
2.111023200035561
>>> timeit('a = a + [1,2]', setup='a=[]', number=20000)
1.0290934000513516
The reason, I think, is that in the case of slice assignment the objects from a (references to objects) are copied to a new list and then copied back to the resized a.
So what are the benefits of using a slice assignment?
Assigning to a slice is useful if there are other references to the same list, and you want all references to pick up the changes.
So if you do something like:
bar = [1, 2, 3]
foo = bar
bar[:] = [5, 4, 3, 2, 1]
print(foo)
this will print [5, 4, 3, 2, 1]. If you instead do:
bar = [5, 4, 3, 2, 1]
print(foo)
the output will be [1, 2, 3].
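The same difference shows up when a list is passed to a function. A sketch with two hypothetical helper functions: one rebinds its local name, the other uses slice assignment on the caller's list.

```python
def reset_rebinding(items):
    items = []        # rebinds the local name only; caller's list untouched


def reset_in_place(items):
    items[:] = []     # empties the very list object the caller passed in


data = [1, 2, 3]
reset_rebinding(data)
after_rebinding = list(data)

reset_in_place(data)
after_in_place = list(data)

print(after_rebinding, after_in_place)   # → [1, 2, 3] []
```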
"With one difference: in the first case sys.path still points to the same object in memory, while in the second case sys.path points to a new list created from the two existing ones."
Right: That’s the whole point, you’re modifying the object behind the name instead of the name. Thus all other names referring to the same object also see the changes.
"Another thing is that the first case is about two times slower than the second:"
Not really. Slice assignment performs a copy. Performing a copy is an O(n) operation while performing a name assignment is O(1). In other words, the bigger the list, the slower the copy; whereas the name assignment always takes the same (short) time.
Your assumptions are very good!
In Python, a variable is a name that has been set to point to an object in memory, which is in essence what lets Python be a dynamically typed language: you can bind the same variable to a number, then rebind it to a string, and so on.
As shown here, whenever you assign a new value to a variable, you are just pointing the name at a different object in memory:
>>> a = 1
>>> id(a)
10968800
>>> a = 1.0
>>> id(a)
140319774806136
>>> a = 'hello world'
>>> id(a)
140319773005552
(in CPython the id refers to its address in memory).
Now for your question: sys.path is a list, and a Python list is a mutable type, meaning that a list object itself can change in place, i.e.
>>> l = []
>>> id(l)
140319772970184
>>> l.append(1)
>>> id(l)
140319772970184
>>> l.append(2)
>>> id(l)
140319772970184
Even though I modified the list by adding items, l still points to the same object. Following the nature of Python, a list's elements are also only pointers to objects elsewhere in memory (the elements aren't the objects; they are like variables referring to the objects held there), as shown here:
>>> l
[1, 2]
>>> id(l[0])
10968800
>>> l[0] = 3
>>> id(l[0])
10968864
>>> id(l)
140319772970184
After reassigning l[0], the id of that element has changed, but once again the id of the list hasn't.
Assigning to an index only changes where that element of the list points. By contrast, when I reassign l itself, I don't modify the list at all; I just change where the name l points:
>>> id(l)
140319772970184
>>> l = [4, 5, 6]
>>> id(l)
140319765766728
But if I assign to all of l's indexes at once, l stays the same object and only the elements point to different places:
>>> id(l)
140319765766728
>>> l[:] = [7, 8, 9]
>>> id(l)
140319765766728
That also explains why it is slower: Python is reassigning the elements of the list, not just pointing the name somewhere else.
One more small point, in case you are wondering about the part where the line finishes with
sys.path[:] = ... + sys.path
It follows the same concept. Python first creates the object on the right side of the = and only then performs the assignment on the left side. While Python is still building the new list on the right side, sys.path is still the original list. Python takes all of its elements for the new list and then, since we used [:], stores the elements of the newly created list into the original sys.path object.
As for why pip uses [:] instead of reassigning, I don't really know, but I believe the benefit is that the same object in memory keeps being used for sys.path.
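A sketch of that evaluation order, using a hypothetical stand-in list instead of the real sys.path (the paths are made up). It also shows one plausible benefit of [:], namely that any other reference to the list sees the update.

```python
search_path = ["/usr/lib", "/opt/lib"]   # hypothetical stand-in for sys.path
held_elsewhere = search_path             # e.g. a reference another module kept
original_id = id(search_path)

# The right-hand side is built first from the old contents,
# then its elements are stored into the existing list object.
search_path[:] = ["/wheels/a.whl"] + search_path

print(id(search_path) == original_id)    # → True
print(held_elsewhere)
# → ['/wheels/a.whl', '/usr/lib', '/opt/lib']
```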
Python itself also reuses objects for small integers. For example, with a, b and c each assigned 1:
>>> id(a)
10968800
>>> id(b)
10968800
>>> id(c)
10968800
a, b and c all point to the same object in memory even though each assignment asked for a 1. CPython expects small numbers to be used a lot in programs (for example in for loops), so it creates them once and reuses them throughout.
(You might also see this with file handles, which Python may recycle instead of creating new ones.)
You are right: slice assignment does not rebind the name. A slice is itself an object in Python, and you can use one both to get and to set items.
In [1]: a = [1, 2, 3, 4]
In [2]: a[slice(0, len(a), 2)]
Out[2]: [1, 3]
In [3]: a[slice(0, len(a), 2)] = 6, 6
In [4]: a[slice(0, len(a), 1)] = range(10)
In [5]: a
Out[5]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [6]: a[:] = range(4)
In [7]: a
Out[7]: [0, 1, 2, 3]
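One detail the session above does not show: with a step other than 1 (an "extended" slice), the number of assigned values has to match the number of selected slots, while a step-1 slice may change the length of the list. A small sketch:

```python
a = [1, 2, 3, 4]

a[::2] = [6, 6]            # extended slice (step 2): lengths must match
print(a)                   # → [6, 2, 6, 4]

try:
    a[::2] = [0, 0, 0]     # three values for two slots
except ValueError as exc:
    error = str(exc)       # keep the message for inspection
print(error)

a[:] = range(6)            # step-1 slice: the list may grow or shrink
print(a)                   # → [0, 1, 2, 3, 4, 5]
```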

Why do python lists act like this when using the = operator [duplicate]

How come the following code:
a = [1,2,3]
b = a
b[0] = 3
print(a)
prints [3,2,3], i.e. the list as altered through b?
Also, if that is so, why is it that the following code:
a = [1,2,3]
b = a
b = [0,0,0]
print(a,b)
prints [1, 2, 3] [0, 0, 0]? This seems inconsistent. If the first snippet behaves that way, shouldn't the second one print [0,0,0] [0,0,0]? Can someone please provide an explanation for this?
In Python there are two kinds of data: mutable and immutable. Numbers, strings, booleans, tuples, and other simple types are immutable. Dicts, lists, sets, instances of most classes, and other complex types are mutable.
When you say:
a = [1,2,3]
b = a
You've created a single mutable list in memory, assigned a to point to it, and then assigned b to point to it. It's the same thing in memory.
Therefore when you mutate it (modify it):
b[0] = 3
It is a modification (mutation) of the index [0] of the value which b points to at that same memory location.
However, when you replace it:
b = [0,0,0]
It is creating a new mutable list in memory and assigning b to point at it.
Check out the id() function. It will tell you the "address" of the object that a name refers to. You can see which names are pointing to the same memory location with id(varname).
Bonus: Every value in Python is handled by reference, meaning that when you assign it to a variable, that simply causes the variable to point to the value where it already is in memory. Having immutable types allows Python to "reuse" the same memory location for common immutable values.
Consider some common values when the interpreter starts up:
>>> import sys
>>> sys.getrefcount('abc')
68
>>> sys.getrefcount(100)
110
>>> sys.getrefcount(2)
6471
However, a value that is definitely not present would return 2. This has to do with the fact that a couple of references to that value were in-use during the call to sys.getrefcount
>>> sys.getrefcount('nope not me. I am definitely not here already.')
2
Notice that an empty tuple has a lot of references:
>>> sys.getrefcount(tuple())
34571
But an empty list has no extra references:
>>> sys.getrefcount(list())
1
Why is this? Because tuple is immutable so it is fine to share that value across any number of variables. However, lists are mutable so they MUST NOT be shared across arbitrary variables or changes to one would affect the others.
Incidentally, this is also why you must NEVER use mutable types as default argument values to functions. Consider this innocent little function:
>>> def foo(value=[]):
...     value.append(1)
...     print(value)
...
When you call it you might expect to get [1] printed...
>>> foo()
[1]
However, when you call it again, you probably won't expect to get [1, 1] out:
>>> foo()
[1, 1]
And on and on...
>>> foo()
[1, 1, 1]
>>> foo()
[1, 1, 1, 1]
WHY IS THIS? Because default arguments to functions are evaluated once during function definition, and not at function run time. That way if you use a mutable value as a default argument value, then you will be stuck with that one value, mutating in unexpected ways as the function is called multiple times.
The proper way to do it is this:
>>> def foo(value=None):
...     if value is None:
...         value = []
...     value.append(1)
...     print(value)
...
>>> foo()
[1]
>>> foo()
[1]
>>> foo()
[1]
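You can watch the shared default directly through the function's __defaults__ attribute. In this sketch the functions return the list instead of printing it (a small change from the versions above) so that the results can be compared; bar is just an illustrative name for the fixed version.

```python
def foo(value=[]):
    value.append(1)
    return value


print(foo.__defaults__)     # → ([],)
first = foo()
second = foo()
print(first is second)      # → True
print(foo.__defaults__)     # → ([1, 1],)


def bar(value=None):
    if value is None:
        value = []          # a fresh list on every call
    value.append(1)
    return value


print(bar.__defaults__)     # → (None,)
print(bar() is bar())       # → False
```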

Some confusions on how numpy array stored in Python

I am confused by some behavior of the numpy array data type in Python.
Question 1
I execute the following in the Python interpreter
>>> import numpy as np
>>> L = [1000,2000,3000]
>>> A = np.array(L)
>>> B = A
Then I check the following things:
>>> A is B
True
>>> id(A) == id(B)
True
>>> id(A[0]) == id(B[0])
True
That's fine. But some strange things happened then.
>>> A[0] is B[0]
False
But how can A[0] and B[0] be different things? They have the same id!
For a list in Python, we have
>>> LL = [1000,2000,3000]
>>> SS = LL
>>> LL[0] is SS[0]
True
Is a numpy array stored in a totally different way from a list? We also have
>>> A[0] = 1001
>>> B[0]
1001
It seems that A[0] and B[0] are the identical object.
Question 2
I make a copy of A.
>>> C = A[:]
>>> C is A
False
>>> C[0] is A[0]
False
That is fine. A and C seem to be independent of each other. But
>>> A[0] = 1002
>>> C[0]
1002
It seems that A and C are not independent? I am totally confused.
You are asking two completely independent questions, so here are two answers.
The data of Numpy arrays is internally stored as a contiguous C array. Each entry in the array is just a number. Python objects, on the other hand, require some housekeeping data, e.g. the reference count and a pointer to the type object. You can't simply have a raw pointer to a number in memory. For this reason, Numpy "boxes" a number in a Python object if you access an individual element. This happens every time you access an element, so even A[0] and A[0] are different objects:
>>> A[0] is A[0]
False
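This most likely also explains why id(A[0]) == id(B[0]) was True in your session: in CPython each boxed object is discarded as soon as id() returns, so the next one can be allocated at the same address and get the same id.
This is at the heart of why Numpy can store arrays in a more memory-efficient way: it does not store a full Python object for each entry, and only creates these objects on the fly when needed. It is optimised for vectorised operations on the array, not for individual element access.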
When you execute C = A[:] you are creating a new view for the same data. You are not making a copy. You will then have two different wrapper objects, pointed to by A and C respectively, but they are backed by the same buffer. The base attribute of an array refers to the array object it was originally created from:
>>> A.base is None
True
>>> C.base is A
True
New views on the same data are particularly useful when combined with indexing, since you can get views that only include some slice of the original array, but are backed by the same memory.
To actually make a copy of an array, use the copy() method.
As a more general remark, you should not read too much into object identity in Python. In general, if x is y is true, you know that they are really the same object. However, if this returns false, they can still be two different proxies to the same object.
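The view-versus-copy idea can be sketched without Numpy. Note that this is a standard-library analogy and not Numpy itself: array.array stands in for the contiguous buffer and memoryview stands in for a view such as A[:]. With a real Numpy array you would use A[:] for the view and A.copy() for the copy, as described above.

```python
from array import array

# Contiguous C longs, playing the role of the Numpy array's buffer.
data = array("l", [1000, 2000, 3000])

view = memoryview(data)[:]      # slicing a memoryview gives another view, no copy
copied = array("l", data)       # a real copy with its own buffer

data[0] = 1002                  # write through the original name

print(view[0], copied[0])       # → 1002 1000
print(view.obj is data)         # → True
```

Here view.obj plays a role loosely similar to the base attribute: it tells you which object actually owns the memory.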
