Python shallow copy and deep copy in using append method

Python shallow copy and deep copy in using append method - python

Some problems come out when using append method in python3.5. The code is presented
# generate boson basis in lexicographic order
def boson_basis(L,N):
basis=[]
state=[0 for i in range(1,L+1)]
pos=0
# initialize the state to |N,0,...,0>
state[0]=N
basis.append(state)
# find the first non-zero position in reverse order
while state[L-1]<N:
for i in range(-2,-L-1,-1):
if state[i]>0:
pos=L+i
break
sum=0
for i in range(0,pos):
sum=sum+state[i]
state[pos]=state[pos]-1
state[pos+1]=N-sum-state[pos]
basis.append(state)
return basis
result=boson_basis(3,3)
the expected result should be [[3,0,0],[2,1,0],...,[0,0,3]], but this code generates wrong results with all elements are the same as the last one, i.e. [[0,0,3],...,[0,0,3]]. I use the pdb to debug it and I find that once the state is modified, the former state that has been appended into basis is also changed simultaneously. It implies that append uses deepcopy automatically which is beyond my understanding. In fact, this error can be fixed if we use basis(state.copy()) explicitly.
On the other hand, the following simple code shows no error in using append
x=3
b=[]
b.append(x)
x=x+2
after x is changed to x=5, b remains unchanged b=[3]. It really puzzles me and seems contradictory with the former example.

As revealed in the comments already, there's no copy whatsoever involved in an append operation.
So you'll have to explicitly take care of this yourself, e.g. by replacing
basis.append(state)
with
basis.append(state[:])
The slicing operation with : creates a copy of state.
Mind: it does not copy the lists elements - which as long as you're keeping only plain numbers and not objects in your list should be fine though.

Related

How do you know in advance if a method (or function) will alter the variable when called?

I am new to Python from R. I have recently spent a lot of time reading up on how everything in Python is an object, objects can call methods on themselves, methods are functions within a class, yada yada yada.
Here's what I don't understand. Take the following simple code:
mylist = [3, 1, 7]
If I want to know how many times the number 7 occurs, I can do:
mylist.count(7)
That, of course, returns 1. And if I want to save the count number to another variable:
seven_counts = mylist.count(7)
So far, so good. Other than the syntax, the behavior is similar to R. However, let's say I am thinking about adding a number to my list:
mylist.append(9)
Wait a minute, that method actually changed the variable itself! (i.e., "mylist" has been altered and now includes the number 9 as the fourth digit in the list.) Assigning the code to a new variable (like I did with seven_counts) produces garbage:
newlist = mylist.append(9)
I find the inconsistency in this behavior a bit odd, and frankly undesirable. (Let's say I wanted to see what the result of the append looked like first and then have the option to decide whether or not I want to assign it to a new variable.)
My question is simple:
Is there a way to know in advance if calling a particular method will actually alter your variable (object)?

Aside from reading the documentation (which for some methods will include type annotations specifying the return value) or playing with the method in the interactive interpreter (including using help() to check the docstring for a type annotation), no, you can't know up front just by looking at the method.
That said, the behavior you're seeing is intentional. Python methods either return a new modified copy of the object or modify the object in place; at least among built-ins, they never do both (some methods mutate the object and return a non-None value, but it's never the object just mutated; the pop method of dict and list is an example of this case).
This either/or behavior is intentional; if they didn't obey this rule, you'd have had an even more confusing and hard to identify problem, namely, determining whether append mutated the value it was called on, or returned a new object. You definitely got back a list, but is it a new list or the same list? If it mutated the value it was called on, then
newlist = mylist.append(9)
is a little strange; newlist and mylist would be aliases to the same list (so why have both names?). You might not even notice for a while; you'd continue using newlist, thinking it was independent of mylist, only to look at mylist and discover it was all messed up. By having all such "modify in place" methods return None (or at least, not the original object), the error is discovered more quickly/easily; if you try and use newlist, mistakenly believing it to be a list, you'll immediately get TypeErrors or AttributeErrors.
Basically, the only way to know in advance is to read the documentation. For methods whose name indicates a modifying operation, you can check the return value and often get an idea as to whether they're mutating. It helps to know what types are mutable in the first place; list, dict, set and bytearray are all mutable, and the methods they have that their immutable counterparts (aside from dict, which has no immutable counterpart) lack tend to mutate the object in place.
The default tends to be to mutate the object in place simply because that's more efficient; if you have a 100,000 element list, a default behavior for append that made a new 100,001 element list and returned it would be extremely inefficient (and there would be no obvious way to avoid it). For immutable types (e.g. str, tuple, frozenset) this is unavoidable, and you can use those types if you want a guarantee that the object is never mutate in place, but it comes at a cost of unnecessary creation and destruction of objects that will slow down your code in most cases.

Just checkout the doc:
>>> list.count.__doc__
'L.count(value) -> integer -- return number of occurrences of value'
>>> list.append.__doc__
'L.append(object) -> None -- append object to end'
There isn't really an easy way to tell, but:
immutable object --> no way of changing through method calls
So, for example, tuple has no methods which affect the tuple as it is unchangeable so methods can only return new instances.
And if you "wanted to see what the result of the append looked like first and then have the option to decide whether or not I want to assign it to a new variable" then you can concatenate the list with a new list with one element.
i.e.
>>> l = [1,2,3]
>>> k = l + [4]
>>> l
[1, 2, 3]
>>> k
[1, 2, 3, 4]

Not from merely your invocation (your method call). You can guarantee that the method won't change the object if you pass in only immutable objects, but some methods are defined to change the object -- and will either not be defined for the one you use, or will fault in execution.
I Real Life, you look at the method's documentation: that will tell you exactly what happens.
[I was about to include what Joe Iddon's answer covers ...]

About the way to modify the list in-place in a function in Python

If I try to modify the 'board' list in-place in the way below, it doesn't work, it seems like it generate some new 'board' instead of modify in-place.
def func(self, board):
"""
:type board: List[List[str]]
"""
board = [['A' for j in range(len(board[0]))] for i in range(len(board))]
return
I have to do something like this to modify it in-place, what's the reason? Thanks.
for i in range(len(board)):
for j in range(len(board[0])):
board[i][j] = 'A'

You seem to understand the difference between these two cases, and want to know why Python makes you handle them differently?
I have to do something like this to modify it in-place, what's the reason?
Creating a new copy is something that has a value. So it makes sense for it to be an expression. In fact, list comprehensions would be useless if they weren't expressions.
Mutating a list in-place isn't something that has a value. So, there's no reason to make it an expression, and in fact, it would be weird to do so. Sure, you could come up with some kind of value (like, say, the list being mutated). But that would be at odds with everything else in the design of Python: spam.append(eggs) doesn't return spam, it returns nothing. spam = eggs doesn't have a value. And so on.
Secondarily, the comprehension style feeds very well into the iterable paradigm, which is fundamental to Python. For example, notice that you can turn a list comprehension into a generator comprehension (which gives you a lazy iterator over values that are computed on demand) just by changing the […] to (…). What useful equivalent could there be for mutation?
Making the transforming-copy more convenient also encourages people to use a non-mutating style, which often leads to better answers for many problems. When you want to know how to avoid writing three lines of nested statement to mutate some global, the answer is to stop mutating that global and instead pass in a parameter and return the new value.
Also, the syntax was copied from Haskell, where there is no mutation.
But of course all those "often" and "usually" don't mean "never". Sometimes (unless you're designing a language with no mutation), you need to do things in-place. That's why we have list.sort as well as sorted. (And a lot of work has gone into optimizing the hell out of list.sort; it's not just an afterthought.)
Python doesn't stop you from doing it. It just doesn't bend over quite as far to make it easy as it does for copying.

that is not modifying it in place. The list comprehension syntax [x for y in z] is creating a new list. The original list is not modified by this syntax. Making the name inside the function point to a new list won't change what list the name outside the function is pointing.
In other words, when calling a function python passes a reference to the object, not the name, so there is no easy way to change which object the variable name outside the function is refering to.

Why does modifying what list() return not work?

Let's say I have a list that contains three strings, and I want a new list that drops one of the strings. I know there are alternate ways of doing this, but I was surprised that the following does not work:
x = ['A','B','C']
y = list(x).remove('A')
Why does the above not work?
Edit: Thanks for the answers everyone!

Per the Python Programming FAQ (emphasis added):
Some operations (for example y.append(10) and y.sort()) mutate the object, whereas superficially similar operations (for example y = y + [10] and sorted(y)) create a new object. In general in Python (and in all cases in the standard library) a method that mutates an object will return None to help avoid getting the two types of operations confused. So if you mistakenly write y.sort() thinking it will give you a sorted copy of y, you’ll instead end up with None, which will likely cause your program to generate an easily diagnosed error.
Since remove is a mutating method (changes the list it's called on in-place), it follows the general pattern of returning None. If it didn't, a line like:
y = x.remove('A')
would appear to work, but it would be aliasing y to the same list referenced by x, not creating a new list at all, and it might take some time for that mistake to be noticed, even as you use x and y believing them to be independent. By returning None, any attempt to use y believing it to be a separate list (or a list at all), will likely fail loudly (as it does in your case, with or without the list wrapping, making your misuse of remove obvious).
This also generally encourages Python's (loose) guideline to avoid shoving too many steps in a process on a single line. If you want to copy a list and remove one element, you do it in two steps:
y = list(x)
y.remove('A')
and it works just fine.

How to make a variable (truly) local to a procedure or function

ie we have the global declaration, but no local.
"Normally" arguments are local, I think, or they certainly behave that way.
However if an argument is, say, a list and a method is applied which modifies the list, some surprising (to me) results can ensue.
I have 2 questions: what is the proper way to ensure that a variable is truly local?
I wound up using the following, which works, but it can hardly be the proper way of doing it:
def AexclB(a,b):
z = a+[] # yuk
for k in range(0, len(b)):
try: z.remove(b[k])
except: continue
return z
Absent the +[], "a" in the calling scope gets modified, which is not desired.
(The issue here is using a list method,
The supplementary question is, why is there no "local" declaration?
Finally, in trying to pin this down, I made various mickey mouse functions which all behaved as expected except the last one:
def fun4(a):
z = a
z = z.append(["!!"])
return z
a = ["hello"]
print "a=",a
print "fun4(a)=",fun4(a)
print "a=",a
which produced the following on the console:
a= ['hello']
fun4(a)= None
a= ['hello', ['!!']]
...
>>>
The 'None' result was not expected (by me).
Python 2.7 btw in case that matters.
PS: I've tried searching here and elsewhere but not succeeded in finding anything corresponding exactly - there's lots about making variables global, sadly.

It's not that z isn't a local variable in your function. Rather when you have the line z = a, you are making z refer to the same list in memory that a already points to. If you want z to be a copy of a, then you should write z = a[:] or z = list(a).
See this link for some illustrations and a bit more explanation http://henry.precheur.org/python/copy_list

Python will not copy objects unless you explicitly ask it to. Integers and strings are not modifiable, so every operation on them returns a new instance of the type. Lists, dictionaries, and basically every other object in Python are mutable, so operations like list.append happen in-place (and therefore return None).
If you want the variable to be a copy, you must explicitly copy it. In the case of lists, you slice them:
z = a[:]

There is a great answer than will cover most of your question in here which explains mutable and immutable types and how they are kept in memory and how they are referenced. First section of the answer is for you. (Before How do we get around this? header)
In the following line
z = z.append(["!!"])
Lists are mutable objects, so when you call append, it will update referenced object, it will not create a new one and return it. If a method or function do not retun anything, it means it returns None.
Above link also gives an immutable examle so you can see the real difference.
You can not make a mutable object act like it is immutable. But you can create a new one instead of passing the reference when you create a new object from an existing mutable one.
a = [1,2,3]
b = a[:]
For more options you can check here

What you're missing is that all variable assignment in python is by reference (or by pointer, if you like). Passing arguments to a function literally assigns values from the caller to the arguments of the function, by reference. If you dig into the reference, and change something inside it, the caller will see that change.
If you want to ensure that callers will not have their values changed, you can either try to use immutable values more often (tuple, frozenset, str, int, bool, NoneType), or be certain to take copies of your data before mutating it in place.
In summary, scoping isn't involved in your problem here. Mutability is.
Is that clear now?
Still not sure whats the 'correct' way to force the copy, there are
various suggestions here.
It differs by data type, but generally <type>(obj) will do the trick. For example list([1, 2]) and dict({1:2}) both return (shallow!) copies of their argument.
If, however, you have a tree of mutable objects and also you don't know a-priori which level of the tree you might modify, you need the copy module. That said, I've only needed this a handful of times (in 8 years of full-time python), and most of those ended up causing bugs. If you need this, it's a code smell, in my opinion.
The complexity of maintaining copies of mutable objects is the reason why there is a growing trend of using immutable objects by default. In the clojure language, all data types are immutable by default and mutability is treated as a special cases to be minimized.

If you need to work on a list or other object in a truly local context you need to explicitly make a copy or a deep copy of it.
from copy import copy
def fn(x):
y = copy(x)

Why is my (local) variable behaving like a global variable?

I have no use for a global variable and never define one explicitly, and yet I seem to have one in my code. Can you help me make it local, please?
def algo(X): # randomized algorithm
while len(X)>2:
# do a bunch of things to nested list X
print(X)
# tracing: output is the same every time, where it shouldn't be.
return len(X[1][1])
def find_min(X): # iterate algo() multiple times to find minimum
m = float('inf')
for i in some_range:
new = algo(X)
m = min(m, new)
return m
X = [[[..], [...]],
[[..], [...]],
[[..], [...]]]
print(find_min(X))
print(X)
# same value as inside the algo() call, even though it shouldn't be affected.
X appears to be behaving like a global variable. The randomized algorithm algo() is really performed only once on the first call because with X retaining its changed value, it never makes it inside the while loop. The purpose of iterations in find_min is thus defeated.
I'm new to python and even newer to this forum, so let me know if I need to clarify my question. Thanks.
update Many thanks for all the answers so far. I almost understand it, except I've done something like this before with a happier result. Could you explain why this code below is different, please?
def qsort(X):
for ...
# recursively sort X in place
count+=1 # count number of operations
return X, count
X = [ , , , ]
Y, count = qsort(X)
print(Y) # sorted
print(X) # original, unsorted.
Thank you.
update II To answer my own second question, the difference seems to be the use of a list method in the first code (not shown) and the lack thereof in the second code.

As others have pointed out already, the problem is that the list is passed as a reference to the function, so the list inside the function body is the very same object as the one you passed to it as an argument. Any mutations your function performs are thus visible from outside.
To solve this, your algo function should operate on a copy of the list that it gets passed.
As you're operating on a nested list, you should use the deepcopy function from the copy module to create a copy of your list that you can freely mutate without affecting anything outside of your function. The built-in list function can also be used to copy lists, but it only creates shallow copies, which isn't what you want for nested lists, because the inner lists would still just be pointers to the same objects.
from copy import deepcopy
def algo (X):
X = deepcopy(X)
...

When you do find_min(X), you are passing the object X (a list in this case) to the function. If that function mutates the list (e.g., by appending to it) then yes, it will affect the original object. Python does not copy objects just because you pass them to a function.

When you pass an object to a python function, the object isn't copied, but rather a pointer to the object is passed.
This makes sense because it greatly speeds up execution - in the case of a long list, there is no need to copy all of its elements.
However, this means that when you modify a passed object (for example, your list X), the modification applies to that object, even after the function returns.
For example:
def foo(x):
x.extend('a')
print x
l = []
foo(l)
foo(l)
Will print:
['a']
['a', 'a']

Python lists are mutable (i.e., they can be changed) and the use of algo within find_min function call does change the value of X (i.e., it is pass-by-reference for lists). See this SO question, for example.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.