I found this line in the pip source:
sys.path[:] = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
As I understand it, the line above does the same thing as the one below:
sys.path = glob.glob(os.path.join(WHEEL_DIR, "*.whl")) + sys.path
With one difference: in the first case sys.path still points to the same object in memory, while in the second case sys.path points to a new list created from the two existing ones.
Another thing is that the first case is about two times slower than the second:
>>> timeit('a[:] = a + [1,2]', setup='a=[]', number=20000)
2.111023200035561
>>> timeit('a = a + [1,2]', setup='a=[]', number=20000)
1.0290934000513516
The reason, I think, is that with slice assignment the object references from a are copied to a new list and then copied back into the resized a.
So what are the benefits of using a slice assignment?
Assigning to a slice is useful if there are other references to the same list, and you want all references to pick up the changes.
So if you do something like:
bar = [1, 2, 3]
foo = bar
bar[:] = [5, 4, 3, 2, 1]
print(foo)
this will print [5, 4, 3, 2, 1]. If you instead do:
bar = [5, 4, 3, 2, 1]
print(foo)
the output will be [1, 2, 3].
With one difference: in the first case sys.path still points to the same object in memory, while in the second case sys.path points to a new list created from the two existing ones.
Right: That’s the whole point, you’re modifying the object behind the name instead of the name. Thus all other names referring to the same object also see the changes.
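A minimal sketch of why this matters for a shared list such as sys.path, where other code may already hold a reference to it. The names search_path and alias are hypothetical stand-ins and are not taken from pip:

```python
# Hypothetical stand-in for a shared list such as sys.path.
search_path = ["/usr/lib/python"]
alias = search_path            # another reference held elsewhere

# Slice assignment mutates the existing list, so the alias sees it.
search_path[:] = ["pkg.whl"] + search_path
in_place_visible = alias == ["pkg.whl", "/usr/lib/python"]   # True

# Rebinding creates a new list; the alias keeps the old contents.
search_path = ["other.whl"] + search_path
rebind_visible = alias == search_path                        # False

print(in_place_visible, rebind_visible)
```

After the rebinding step, alias and search_path are two different list objects, which is exactly the situation slice assignment avoids.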
Another thing is that the first case is about two times slower than the second:
Not exactly. Both statements first build the new list on the right-hand side, which is an O(n) operation. Slice assignment then copies those n references into the existing list, a second O(n) step, whereas binding a name is O(1). In other words, the bigger the list, the more the extra copy costs, whereas the name assignment itself always takes the same (short) time.
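A rough way to compare the two forms on a list of fixed size is sketched below. The function names slice_assign and rebind are made up for this illustration, and the absolute timings depend on the machine and the Python version:

```python
from timeit import timeit

def slice_assign(n=1000):
    a = list(range(n))
    a[:] = a + [1, 2]      # build a new list, then copy it into a
    return a

def rebind(n=1000):
    a = list(range(n))
    a = a + [1, 2]         # build a new list, then rebind the name
    return a

# timeit accepts a callable; number is the repetition count.
t_slice = timeit(slice_assign, number=2000)
t_rebind = timeit(rebind, number=2000)
print(f"slice assignment: {t_slice:.4f}s, rebinding: {t_rebind:.4f}s")
```

Both functions produce the same list contents; only the amount of copying differs.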
Your assumptions are very good!
In Python, a variable is a name bound to an object in memory. This is what makes Python dynamically typed: the same name can refer to a number, then be reassigned to a string, and so on.
As shown here, whenever you assign a new value to a variable, you are just pointing the name at a different object in memory:
>>> a = 1
>>> id(a)
10968800
>>> a = 1.0
>>> id(a)
140319774806136
>>> a = 'hello world'
>>> id(a)
140319773005552
(in CPython the id refers to its address in memory).
Now for your question: sys.path is a list, and a Python list is a mutable type, meaning that the object itself can be changed in place, i.e.
>>> l = []
>>> id(l)
140319772970184
>>> l.append(1)
>>> id(l)
140319772970184
>>> l.append(2)
>>> id(l)
140319772970184
Even though I modified the list by adding items, the name still points to the same list object. Following the nature of Python, a list's elements are also only references to objects elsewhere in memory (the elements aren't the objects themselves; they behave like variables pointing to the objects), as shown here:
>>> l
[1, 2]
>>> id(l[0])
10968800
>>> l[0] = 3
>>> id(l[0])
10968864
>>> id(l)
140319772970184
After reassigning l[0], the id of that element has changed, but once again the id of the list hasn't.
Assigning to an index only changes what that slot of the list points to. In the same way, when I reassign l itself I don't modify the old list; I just change what the name l points to:
>>> id(l)
140319772970184
>>> l = [4, 5, 6]
>>> id(l)
140319765766728
But if I reassign all of l's indexes with a slice, l stays the same object and only its elements point to different objects:
>>> id(l)
140319765766728
>>> l[:] = [7, 8, 9]
>>> id(l)
140319765766728
That also explains why it is slower: Python is reassigning every element of the list, not just pointing the name somewhere else.
One more little point, in case you are wondering about the part where the line finishes with
sys.path[:] = ... + sys.path
It follows the same concept. Python first creates the object on the right side of the = and only then performs the assignment on the left side. While the new list on the right is being built, sys.path is still the original list, so Python reads all of its elements into the new list and then, because we used [:], stores the new list's elements back into the original sys.path object.
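The two steps described above can be written out separately. This is only a sketch with an ordinary list named paths standing in for sys.path:

```python
paths = ["b", "c"]
before = id(paths)

new_list = ["a.whl"] + paths   # step 1: a brand-new list is built
paths[:] = new_list            # step 2: its items replace those of paths

print(paths)                   # ['a.whl', 'b', 'c']
print(id(paths) == before)     # True: still the same list object
print(paths is new_list)       # False: the temporary list is separate
```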
As for why pip uses [:] instead of rebinding, I don't really know, but I believe the benefit is that the same list object is reused for sys.path.
Python itself also reuses objects for the small integers, for example
>>> a = 1
>>> b = 1
>>> c = 1
>>> id(a)
10968800
>>> id(b)
10968800
>>> id(c)
10968800
a, b and c all point to the same object in memory, even though each one asked for a 1. CPython knows that small numbers are likely to be used a lot in programs (for example in for loops), so it creates them once and reuses them throughout.
(you might also find it being the case with filehandles that python will recycle instead of creating a new one.)
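The small-integer reuse can be observed with the is operator. This is a CPython implementation detail rather than a language guarantee, so the sketch below should be read as an observation, not something to rely on:

```python
a = 1
b = int("1")        # built at run time, not a compile-time constant
print(a is b)       # True on CPython, which caches small integers

big_a = int("100000")
big_b = int("100000")
print(big_a == big_b)   # True: equal values
print(big_a is big_b)   # usually False on CPython; do not rely on it
```

For comparing values, == is the right tool; is only tells you whether two names refer to one object.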
You are right: slice assignment does not rebind the name. A slice is itself an object type in Python, and you can use it both to get and to set items.
In [1]: a = [1, 2, 3, 4]
In [2]: a[slice(0, len(a), 2)]
Out[2]: [1, 3]
In [3]: a[slice(0, len(a), 2)] = 6, 6
In [4]: a[slice(0, len(a), 1)] = range(10)
In [5]: a
Out[5]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [6]: a[:] = range(4)
In [7]: a
Out[7]: [0, 1, 2, 3]
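As a small addition to the session above, a[:] corresponds to the slice object slice(None), and slice.indices shows how the defaults are resolved for a given length. A sketch:

```python
a = [1, 2, 3, 4]
same_list = a

whole = slice(None)         # the slice object behind a[:]
a[whole] = range(3)         # equivalent to a[:] = range(3)

print(a)                    # [0, 1, 2]
print(same_list is a)       # True: the name was not rebound
print(slice(None, None, 2).indices(len(a)))  # (0, 3, 2)
```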
I am studying Wes McKinney's 'Python for data analysis'.
At some point he says:
"When assigning a variable (or name) in Python, you are creating a reference to the object on the righthand side of the equals sign. In practical terms, consider a list of integers:
In [8]: a = [1, 2, 3]
In [9]: b = a
In [11]: a.append(4)
In [12]: b
output will be:
Out[12]: [1, 2, 3, 4]
He reasons as such:
"In some languages, the assignment of b will cause the data [1, 2, 3] to be copied. In Python, a and b actually now refer to the same object, the original list"
My question is why the same thing does not occur in the case below:
In [8]: a = 5
In [9]: b = a
In [11]: a +=1
In [12]: b
Where I still get
Out[12]: 5
for b?
In the first case, you're creating a list, and both a and b point at that one list. When you change the list, both names still point at it, so both see the change.
In the second case, you increase the value of a variable that points at an integer. 5 is still 5; you're not changing the integer. You're changing which object the variable a points to, so a now points at the value 6 while b still points at 5. You're not changing the thing that a is pointing to, you're changing WHAT a is pointing to. b doesn't care about that.
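The contrast can be shown side by side. Note that += behaves differently for the two types: for an int it rebinds the name, for a list it extends the existing object. A short sketch:

```python
a = 5
b = a
a += 1              # int is immutable: a is rebound to a new object, 6
print(a, b)         # 6 5

x = [1, 2, 3]
y = x
x += [4]            # list += extends in place; y sees the change
print(y)            # [1, 2, 3, 4]

x = x + [5]         # plain + builds a new list and rebinds x only
print(x, y)         # [1, 2, 3, 4, 5] [1, 2, 3, 4]
```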
I know the difference in a[:] and a in assignment to a variable and also the special case of slice assignment.
Suppose,
a=[1,2,3,4,5]
What is the difference between the following two statements?
b=a[:]+[6,7,8,9,10] #1
b=a+[6,7,8,9,10] #2
In both cases, both a and b have the same values at the end.
I have referred to the following questions:
When and why to use [:] in python
Understanding slice notation
Python why would you use [:] over =
None of them covers the difference inside an expression like this.
a[:] grabs a full slice of the list – in this context, it has no difference in effect since you're assigning to a new list (though it does copy the list, so it's slower at scale).
# create the list.
>>> a = [1, 2, 3, 4, 5]
# see its address
>>> id(a)
4349194440
# see the (different) address of a copy
>>> id(a[:])
4350338120
# reassign the entire list using slice syntax
>>> a[:] = [5, 6, 7]
>>> a
[5, 6, 7]
# still the same first ID though
>>> id(a)
4349194440
>>>
For a Python list, a[:] and a differ only in their ids, because a[:] makes a (shallow) copy of a at another memory location.
For an immutable string, a[:] and a make no practical difference. In CPython both refer to the same object.
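A sketch comparing the two expressions from the question, followed by the list and string cases just mentioned. The identity result for strings is a CPython detail, so no particular output is claimed for the last line:

```python
a = [1, 2, 3, 4, 5]
b1 = a[:] + [6, 7]      # copies a first, then concatenates
b2 = a + [6, 7]         # concatenates directly

print(b1 == b2)         # True
print(b1 is a, b2 is a) # False False
print(a[:] is a)        # False: slicing a list always copies

s = "hello"
print(s[:] == s)        # True
print(s[:] is s)        # implementation-dependent identity check
```

Either way a itself is left untouched; the a[:] form just makes one extra, unnecessary copy.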
a=[1,2,3,4,5]
b=a[:]+[6,7,8,9,10] #1
b=a+[6,7,8,9,10] #2
Case 1: a[:] means you are slicing the sequence, and a sequence can be anything like a string, a list, etc. It is read as a[start:stop:step], where start and stop are index values and step is the stride. If no values are given, the defaults are start = 0, stop = the length of the sequence, and step = 1. So in your case, you are simply taking all the elements of list a (as a new list).
Case 2: a simply means the list a itself.
Conclusion: with a[start:stop:step] you can pick out just the elements you want.
Examples-->>
a = [1,2,3,4]
a[1:4]
>> [2,3,4]
a[::2]
>> [1,3]
I hope it may help you.
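A few more slices on the same list, as a sketch of how start, stop and step and their defaults combine:

```python
a = [1, 2, 3, 4]

print(a[1:4])       # [2, 3, 4]
print(a[::2])       # [1, 3]
print(a[:])         # [1, 2, 3, 4]
print(a[::-1])      # [4, 3, 2, 1]
print(a[:] == a[0:len(a):1])   # True: the defaults written out
```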
After learning about lists and box-and-pointer diagrams, I decided to create random stuff for myself and test out my knowledge. I am going to use the terms shallow copy and suspected shallow copy, as I'm not really sure whether they are correct by definition. My queries are about the reasons I provide for the behaviour of the code below; please tell me whether I'm thinking soundly.
Code A
from copy import *
x=[1,[2,[3,[4]]]] #normal copy/hardcopy
a=x
v=list(x) #suspected shallow copy
y=x.copy() #shallow copy
z=deepcopy(x) #theoretical deep copy
w=x[:] #suspected shallow copy
def test():
    print("Original:",x)
    print("hardcopy:",a)
    print("suspected shallow copy",v)
    print("shallow copy",y)
    print("deep copy:",z)
    print("suspected shallow copy",w)
x[1]=x[1]+[4]
test()
Output A:
Original: [1, [2, [3, [4]], 4]]
hardcopy: [1, [2, [3, [4]], 4]]
suspected shallow copy [1, [2, [3, [4]]]]
shallow copy [1, [2, [3, [4]]]]
deep copy: [1, [2, [3, [4]]]]
suspected shallow copy [1, [2, [3, [4]]]]
Code B
a=(1,2,[1,2,3])
def shallow_copy(x):
    tup=()
    for i in x:
        tup+=(i,)
    return tup
def hardcopy(x):
    return x
b=hardcopy(a)
c=shallow_copy(a)
a[2]+=[3]
Output B:
I see TypeError in IDLE here, but the mutation of the list element is still done, and across ALL a,b,c
Continuation from output B:
a[2][0]=a[2][0]+99
a,b,c
Output C:
((1, 2, [100, 2, 3, 3]), (1, 2, [100, 2, 3, 3]), (1, 2, [100, 2, 3, 3]))
Code D:
a=[1,2,(1,2,3)]
def shallow_copy(x):
    tup=[]
    for i in x:
        tup+=[i]
    return tup
def hardcopy(x):
    return x
b=hardcopy(a)
c=shallow_copy(a)
d=a.copy()
a[2]=a[2]+(4,)
a,b,c,d
Output D:
[1, 2, (1, 2, 3, 4)], [1, 2, (1, 2, 3, 4)],
[1, 2, (1, 2, 3)], [1, 2, (1, 2, 3)]
From Output A, we observe the following:
1)For lists which have shallow copies, doing x[1]=x[1]+[4] does not affect the shallow copies. My reasons for the above could be
a) = followed by + does __add__ instead of __iadd__(which is +=), and doing __add__ should not modify the object, only changing the value for one pointer(x and its hardcopy in this case)
This is further supported in Output B but somehow contradicted in Output C, could be partly due to reason (b) below, but can't be too sure.
b) We executed this in the first layer(only 1 slice operator), maybe there's some kind of rule which prevents these elements from being modified.
This is supported by both Output B and Output C, though Output B might be argued to be in the first layer, think of it as increasing the elements in the 2nd layer, and it fits the above observation.
2)What is the reason why the TypeError appeared in Output B, but is still executed? I know that whether an Exception might be triggered is based on the final sequence you are actually changing(the list in this case), but why is there still TypeError: 'tuple' object does not support item assignment ?
I have presented my views for the above questions. I appreciate any thoughts(theoretical solutions preferably) on this question as I'm still relatively new to programming.
To answer question 1, which looks complex but whose answer is probably quite simple:
When you have another name referencing the original object, you will see the changes through that name as well. Those changes will not be reflected in the copies (whether shallow or deep) if(!) you change the object using the form x[1] = x[1] + [4]. This is because you are assigning a new object to x[1] instead of making an in-place change as in x[1].append(4).
You can check that with the id() function.
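A sketch of that id() check, using a smaller list than in the question. The first half replaces the inner list, the second half mutates it in place:

```python
x = [1, [2, 3]]
y = x.copy()                # shallow copy: y[1] is the same inner list

inner_id = id(x[1])
x[1] = x[1] + [4]           # builds a new list and stores it in x[1]
print(id(x[1]) == inner_id) # False
print(y)                    # [1, [2, 3]]

x = [1, [2, 3]]
y = x.copy()
inner_id = id(x[1])
x[1].append(4)              # mutates the shared inner list
print(id(x[1]) == inner_id) # True
print(y)                    # [1, [2, 3, 4]]
```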
To answer your question 2, and adapted from the official docs:
let's make
a = (['hello'],)
then
a[0] += [' world']
this is the same as
a[0] = operator.iadd(a[0],[' world'])
The iadd changes the list in place, but then the assignment fails because you can't assign to a tuple (immutable type) index.
If you make
a[0] = a[0] + [' world']
the concatenation produces a new list object, and then the assignment to the tuple index fails too. The new object is simply discarded, and a[0] is not changed in place.
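The three cases above can be run together. In this sketch the tuples b and c are extra names introduced only for the comparison:

```python
import operator

a = (["hello"],)
try:
    a[0] += [" world"]          # list is extended, then item assignment fails
except TypeError as exc:
    print("TypeError:", exc)
print(a)                        # (['hello', ' world'],)

b = (["hello"],)
operator.iadd(b[0], [" world"]) # only the in-place step, no assignment
print(b)                        # (['hello', ' world'],)

c = (["hello"],)
try:
    c[0] = c[0] + [" world"]    # new list is built, assignment fails
except TypeError:
    pass
print(c)                        # (['hello'],)
```

If the goal is just to extend the list inside the tuple, a[0].extend([' world']) does so without raising.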
To clarify in response to the OP's comment, the operator module docs say:
Many operations have an “in-place” version. Listed below are functions providing a more primitive access to in-place operators than the usual syntax does; for example, the statement x += y is equivalent to x = operator.iadd(x, y). Another way to put it is to say that z = operator.iadd(x, y) is equivalent to the compound statement z = x; z += y.
In those examples, note that when an in-place method is called, the
computation and assignment are performed in two separate steps. The
in-place functions listed below only do the first step, calling the
in-place method. The second step, assignment, is not handled.
As for your Output D:
Writing
b = hardcopy(a)
does nothing more than writing
b = a
really, b is a new name referencing the same object that a references.
This has nothing to do with mutability: Python always passes a reference to the original object into the function's local name x. Returning x just hands the same reference back, which is then bound to b.
That's why you see further changes in a reflected in b. Again, you make a[2] a new, different tuple object by assignment, so now a[2] and b[2] reference the new tuple (1,2,3,4), while c and d still reference the old tuple object. And because they are tuples, you can't change them in place the way you can with lists.
As for the term "hardcopy", I wouldn't use it. It doesn't appear even once in the official docs, and the mentions in Python SO questions besides this one appear in other contexts. It is also ambiguous (unlike "shallow" and "deep", which give a good clue to their meaning). I would expect "hardcopy" to mean exactly the opposite (an object copy) of what you describe (an additional name/reference/pointer to the same object). Of course there are many ways to say the same thing. We say "copy" because it's shorter, and for immutables it doesn't matter whether the copy happens or not (you can't change them anyway). For mutables, saying "copy" usually means "shallow copy", because you have to "go further" in your code if you want a "deep copy".
I thought that if you assign a variable to another list, it's not copied; it points to the same location, and that is what deepcopy() is for. This does not seem to be true with Python 2.7: the list appears to be copied.
>>> a=[1,2,3]
>>> b=a
>>> b=b[1:]+b[:1]
>>> b
[2, 3, 1]
>>> a
[1, 2, 3]
>>>
>>> a=(1,2,3)
>>> b=a
>>> b=b[1:]+b[:1]
>>> a
(1, 2, 3)
>>> b
(2, 3, 1)
>>>
What am I missing?
This line changes what b points to:
b=b[1:]+b[:1]
List or tuple addition creates a new list or tuple, and the assignment operator makes b refer to that new list while leaving a referring to the original list or tuple.
Slicing a list or tuple also creates a new object, so that line creates three new objects - one for each slice, and then one for the sum. b = a + b would be a simpler example to demonstrate that addition creates a new object.
You will sometimes see c = b[:] as a way to shallow copy a list, making use of the fact that slicing creates a new object.
When you do b=b[1:]+b[:1], you first create a new object from the two slices of b and then rebind b to reference that object. The same holds for both the list and the tuple case.
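For comparison, here is a sketch of the same rotation done once by rebinding and once by slice assignment. Only the second form changes the list that a refers to:

```python
a = [1, 2, 3]
b = a
b = b[1:] + b[:1]       # new list; b is rebound, a is untouched
print(a, b)             # [1, 2, 3] [2, 3, 1]

a = [1, 2, 3]
b = a
b[:] = b[1:] + b[:1]    # same rotation written into the shared list
print(a, b)             # [2, 3, 1] [2, 3, 1]
print(a is b)           # True
```

The slice-assignment form is not available for tuples, since they cannot be modified in place.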
I am passing a single element of a list to a function. I want to modify that element, and therefore, the list itself.
def ModList(element):
    element = 'TWO'
l = list()
l.append('one')
l.append('two')
l.append('three')
print(l)
ModList(l[1])
print(l)
But this method does not modify the list. It's like the element is passed by value. The output is:
['one','two','three']
['one','two','three']
I want the second element of the list to be 'TWO' after the function call:
['one','TWO','three']
Is this possible?
The explanations already here are correct. However, since I have wanted to abuse python in a similar fashion, I will submit this method as a workaround.
Indexing a list returns a reference to the object stored at that position, and rebinding the name you assigned it to never touches the list. Slicing a list returns a new list (a shallow copy) holding references to the same elements, so assigning into that copy leaves the original unchanged. Consider this example:
>>> a = [1, 2, 3, 4]
>>> b = a[2]
>>> b
3
>>> c = a[2:3]
>>> c
[3]
>>> b=5
>>> c[0]=6
>>> a
[1, 2, 3, 4]
Neither b, a name that was simply rebound to 5, nor c, a sublist copied from a, is able to change the values in a. There is no link back to a, despite their common origin.
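As a small check of what indexing and slicing hand back for plain lists, the sketch below uses mutable elements (inner lists), so that sharing becomes visible. With integers, as in the session above, there is nothing to mutate, which is why no link shows up there:

```python
a = [[1], [2], [3]]
b = a[1]            # b refers to the same inner list, not a copy
print(b is a[1])    # True

b.append(20)        # mutating the shared object is visible through a
print(a)            # [[1], [2, 20], [3]]

b = [99]            # rebinding b does not touch a
print(a)            # [[1], [2, 20], [3]]

c = a[1:2]          # new outer list holding the same inner list
print(c[0] is a[1]) # True
```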
However, numpy arrays use a "raw-er" memory allocation and allow views of data to be returned. A view allows data to be represented in a different way while maintaining the association with the original data. A working example is therefore
>>> import numpy as np
>>> a = np.array([1, 2, 3, 4])
>>> a
array([1, 2, 3, 4])
>>> b = a[2]
>>> b
3
>>> b=5
>>> a
array([1, 2, 3, 4])
>>> c = a[2:3]
>>> c
array([3])
>>> c[0]=6
>>> a
array([1, 2, 6, 4])
>>>
While extracting a single element still gives you a standalone value, the slice c is a view onto element 2 of a (although it is now element 0 of c), so the change made through c changes a as well.
Numpy ndarrays have many different types, including a generic object type. This means that you can maintain this "by-reference" behavior for almost any type of data, not only numerical values.
Python doesn't do pass by reference in the way you want here. Have ModList return the new value and assign it explicitly:
l[1] = ModList(l[1])
Also, since this only changes one element, I'd suggest that ModList is a confusing name.
Python passes object references by value, hence you can't change the caller's list by assigning to the parameter inside ModList. What you could do instead is pass the list and the index into ModList and modify the element that way:
def ModList(theList, theIndex):
    theList[theIndex] = 'TWO'
ModList(l, 1)
In many cases you can also consider letting the function both modify the list and return it. This can make the calling code more readable:
def ModList(theList, theIndex):
    theList[theIndex] = 'TWO'
    return theList
l = ModList(l, 1)
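Both suggestions from the answers can be put side by side in one runnable sketch. The function names upper_at and to_upper are invented for this illustration and are not from the question:

```python
def upper_at(items, index):
    """Replace items[index] with its upper-case form, in place."""
    items[index] = items[index].upper()
    return items            # returned only for caller convenience

def to_upper(element):
    """Return a new value; the caller decides where to store it."""
    return element.upper()

words = ["one", "two", "three"]
upper_at(words, 1)
print(words)                # ['one', 'TWO', 'three']

words[2] = to_upper(words[2])
print(words)                # ['one', 'TWO', 'THREE']
```

The first style mutates the caller's list through the shared reference; the second keeps the function free of side effects and makes the assignment visible at the call site.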