I'm at the point in learning Python where I'm dealing with the Mutable Default Argument problem.
# BAD: if `a_list` is not passed in, the default will wrongly retain its contents between successive function calls
def bad_append(new_item, a_list=[]):
    a_list.append(new_item)
    return a_list
# GOOD: if `a_list` is not passed in, the default will always correctly be []
def good_append(new_item, a_list=None):
    if a_list is None:
        a_list = []
    a_list.append(new_item)
    return a_list
I understand that a_list is initialized only when the def statement is first encountered, and that's why subsequent calls of bad_append use the same list object.
What I don't understand is why good_append works any differently. It looks like a_list would still be initialized only once; therefore, the if statement would only be true on the first invocation of the function, meaning a_list would only get reset to [] on the first invocation, meaning it would still accumulate all past new_item values and still be buggy.
Why isn't it? What concept am I missing? How does a_list get wiped clean every time good_append runs?
It looks like a_list would still be initialized only once
"initialization" is not something that happens to variables in Python, because variables in Python are just names. "initialization" only happens to objects, and it's done via the class' __init__ method.
When you write a = 0, that is an assignment. That is saying "a shall refer to the object that is described by the expression 0". It is not initialization; a can name anything else of any type at any later time, and that happens as a result of assigning something else to a. Assignment is just assignment. The first one is not special.
When you write def good_append(new_item, a_list=None), that is not "initializing" a_list. It is setting up an internal reference to an object, the result of evaluating None, so that when good_append is called without a second argument, that object is automatically assigned to a_list.
meaning a_list would only get reset to [] on the first invocation
No, a_list gets set to [] any time that a_list is None to begin with. That is, when either None is passed explicitly, or the argument is omitted.
The problem with [] occurs because the expression [] is only evaluated once in this context. When the def statement is executed, [] is evaluated, a specific list object is created (one that happens to be empty to start), and that object is used as the default.
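A quick way to see exactly when a default expression is evaluated is to use a default that records each evaluation. This is only a sketch: make_default, calls and g are hypothetical names introduced for illustration.

```python
# Hypothetical helper that logs every time it runs, so we can observe
# when the default expression is evaluated.
calls = []

def make_default():
    calls.append("evaluated")
    return []

def g(item, a_list=make_default()):  # make_default() runs here, once, at def time
    a_list.append(item)
    return a_list

print(len(calls))  # 1: evaluated when the def statement ran
g(1)
g(2)
print(len(calls))  # still 1: calling g never re-evaluates the default
print(g(3))        # [1, 2, 3]: every call appended to that one list
```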
How does a_list get wiped clean every time good_append runs?
It doesn't. It doesn't need to be.
You know how the problem is described as being with "mutable default arguments"?
None is not mutable.
The problem occurs when you modify the object that the parameter has as a default.
a_list = [] does not modify whatever object a_list previously referred to. It cannot; arbitrary objects cannot magically transform in-place into empty lists. a_list = [] means "a_list shall stop referring to what it previously referred to, and start referring to []". The previously-referred-to object is unchanged.
When the def statement is executed and one of the parameters has a default value, that value (an object) gets baked into the function, which is itself an object. When you write code that mutates an object, the object mutates. If the object being referred to happens to be the object baked into the function, it still mutates.
But you cannot mutate None. It is immutable.
You can mutate []. It is a list, and lists are mutable. Appending an item to a list mutates the list.
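To make the difference concrete, here is a small check run against the question's two functions (reproduced so the sketch is self-contained). bad_append keeps handing back the one mutable default, while good_append's default stays None and a fresh list is created on each call.

```python
def bad_append(new_item, a_list=[]):
    a_list.append(new_item)   # mutates the single default list object
    return a_list

def good_append(new_item, a_list=None):
    if a_list is None:        # true on every call that omits a_list
        a_list = []           # rebinds the local name to a fresh list
    a_list.append(new_item)
    return a_list

first = bad_append(1)
second = bad_append(2)
print(first is second)           # True: both calls returned the default list
print(second)                    # [1, 2]
print(good_append(1))            # [1]
print(good_append(2))            # [2]
print(good_append.__defaults__)  # (None,): the stored default never changes
```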
The default value of a_list (or any other default value, for that matter) is stored inside the function object once it has been evaluated, and can therefore be modified like any other object. In Python 2:
>>> def f(x=[]): return x
...
>>> f.func_defaults
([],)
>>> f.func_defaults[0] is f()
True
And in Python 3:
>>> def f(x=[]): return x
...
>>> f.__defaults__
([],)
>>> f.__defaults__[0] is f()
True
So the object in func_defaults is the same object the function sees as x (my example returns it so that it can be accessed from outside).
In other words, what happens when calling f() is an implicit x = f.func_defaults[0]. If that object is modified subsequently, you'll keep that modification.
In contrast, an assignment inside the function always gets a new []. Any modification lasts until the last reference to that [] is gone; on the next function call, a new [] is created.
In other words, it is not true that [] yields the same object on every execution; rather, in the case of a default argument, it is executed only once and the result is preserved.
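The implicit x = f.__defaults__[0] can be watched directly in Python 3 with a variant of f that mutates its argument instead of just returning it (this variant is an illustration, not part of the original answer).

```python
def f(x=[]):
    x.append(len(x))   # mutate whatever object x refers to
    return x

print(f.__defaults__)            # ([],)
f()
f()
print(f.__defaults__)            # ([0, 1],): the stored default was mutated
print(f() is f.__defaults__[0])  # True: the call returned the stored object
```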
The problem only exists if the default value is mutable, which None is not. What gets stored along with the function object is the default value. When the function is called, the function's context is initialized with the default value.
a_list = []
just assigns a new object to the name a_list in the context of the current function call. It does not modify None in any way.
No, in good_append, a_list is not initialised only once.
Each time the function is called without specifying the a_list argument, the default (None) is used, so a new list instance is created and returned; the new list does not replace the default value.
The Python tutorial says that "the default value is evaluated only once."
The evaluated (only once) default value is stored internally (name it x for simplicity).
case []:
When you define the function with a_list defaulted to [], if you don't provide a_list, it is assigned the internal variable x. Therefore, when you append to a_list, you are actually appending to x (because a_list and x now refer to the same object). When you call the function again without a_list, the updated x is assigned to a_list again.
case None:
The value None is evaluated once and stored in x. If you don't provide a_list, x is assigned to a_list. But you don't append to x, of course; you assign a new empty list to a_list. At this point x and a_list refer to different objects. Likewise, when you call the function again without a_list, it first gets the value None from x, but then a_list is assigned a new empty list again.
Note that, for the a_list = [] case, if you provide an explicit value for a_list when you call the function, the new argument does not override x because that's evaluated only once.
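The last point can be checked with bad_append from the question: passing your own list means the stored default is never touched, and omitting the argument goes back to the stored default. A minimal sketch:

```python
def bad_append(new_item, a_list=[]):
    a_list.append(new_item)
    return a_list

mine = ["x"]
bad_append(1, mine)             # appends to the caller's list, not to the default
print(mine)                     # ['x', 1]
print(bad_append.__defaults__)  # ([],): the stored default is untouched
print(bad_append(2))            # [2]: omitting a_list uses the stored default
print(bad_append.__defaults__)  # ([2],)
```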
If I pass a dataframe to a function and modify it inside the function, is it pass-by-value or pass-by-reference?
I run the following code
import pandas as pd
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
def letgo(df):
    df = df.drop('b',axis=1)
letgo(a)
The value of a does not change after the function call. Does that mean it is pass-by-value?
I also tried the following
import numpy as np
xx = np.array([[1,2], [3,4]])
def letgo2(x):
    x[1,1] = 100
def letgo3(x):
    x = np.array([[3,3],[3,3]])
It turns out letgo2() does change xx and letgo3() does not. Why is it like this?
The short answer is, Python always does pass-by-value, but every Python variable is actually a pointer to some object, so sometimes it looks like pass-by-reference.
In Python every object is either mutable or immutable. For example, lists, dicts, modules and pandas DataFrames are mutable, while ints, strings and tuples are immutable. Mutable objects can be changed internally (e.g., add an element to a list), but immutable objects cannot.
As I said at the start, you can think of every Python variable as a pointer to an object. When you pass a variable to a function, the variable (pointer) within the function is always a copy of the variable (pointer) that was passed in. So if you assign something new to the internal variable, all you are doing is changing the local variable to point to a different object. This doesn't alter (mutate) the original object that the variable pointed to, nor does it make the external variable point to the new object. At this point, the external variable still points to the original object, but the internal variable points to a new object.
If you want to alter the original object (only possible with mutable data types), you have to do something that alters the object without assigning a completely new value to the local variable. This is why letgo() and letgo3() leave the external item unaltered, but letgo2() alters it.
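The same behaviour can be reproduced without NumPy. In this sketch, nested lists stand in for the np.array (an assumption made only so it runs with the standard library); letgo2 mutates the shared object, letgo3 merely rebinds its local name.

```python
def letgo2(x):
    x[1][1] = 100            # mutates the object the caller also refers to

def letgo3(x):
    x = [[3, 3], [3, 3]]     # rebinds the local name only; caller unaffected

xx = [[1, 2], [3, 4]]        # nested list standing in for the NumPy array
before = id(xx)
letgo3(xx)
print(xx)                    # [[1, 2], [3, 4]]: unchanged
letgo2(xx)
print(xx)                    # [[1, 2], [3, 100]]: mutated in place
print(id(xx) == before)      # True: xx still names the same object
```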
As @ursan pointed out, if letgo() used something like this instead, it would alter (mutate) the original object that df points to, which would change the value seen via the global a variable:
def letgo(df):
    df.drop('b', axis=1, inplace=True)
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo(a) # will alter a
In some cases, you can completely hollow out the original object and refill it with new data without assigning to the variable directly. For example, this will alter the original object that v points to, which will change the data seen when you use v later:
def letgo3(x):
    x[:] = np.array([[3,3],[3,3]])
v = np.empty((2, 2))
letgo3(v) # will alter v
Notice that I'm not assigning something directly to x; I'm assigning something to the entire internal range of x.
If you absolutely must create a completely new object and make it visible externally (which is sometimes the case with pandas), you have two options. The 'clean' option would be just to return the new object, e.g.,
def letgo(df):
    df = df.drop('b',axis=1)
    return df
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
a = letgo(a)
Another option would be to reach outside your function and directly alter a global variable. This changes a to point to a new object, and any function that refers to a afterward will see that new object:
def letgo():
    global a
    a = a.drop('b',axis=1)
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo() # will alter a!
Directly altering global variables is usually a bad idea, because anyone who reads your code will have a hard time figuring out how a got changed. (I generally use global variables for shared parameters used by many functions in a script, but I don't let them alter those global variables.)
To add to @Mike Graham's answer, which pointed to a very good read:
In your case, what is important to remember is the difference between names and values. a, df, xx, x, are all names, but they refer to the same or different values at different points of your examples:
In the first example, letgo rebinds df to another value, because df.drop returns a new DataFrame unless you set the argument inplace = True (see doc). That means that the name df (local to the letgo function), which was referring to the value of a, is now referring to a new value, here the df.drop return value. The value a is referring to still exists and hasn't changed.
In the second example, letgo2 mutates x, without rebinding it, which is why xx is modified by letgo2. Unlike the previous example, here the local name x always refers to the value the name xx is referring to, and changes that value in place, which is why the value xx is referring to has changed.
In the third example, letgo3 rebinds x to a new np.array. That causes the name x, local to letgo3 and previously referring to the value of xx, to now refer to another value, the new np.array. The value xx is referring to hasn't changed.
The question isn't PBV vs. PBR. These names only cause confusion in a language like Python; they were invented for languages that work like C or like Fortran (as the quintessential PBV and PBR languages). It is true, but not enlightening, that Python always passes by value. The question here is whether the value itself is mutated or whether you get a new value. Pandas usually errs on the side of the latter.
http://nedbatchelder.com/text/names.html explains very well what Python's system of names is.
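The names-versus-values distinction can be checked with the is operator. In this sketch a list stands in for the DataFrame, and building a filtered copy stands in for df.drop(...) without inplace=True; same_object_demo is a hypothetical name.

```python
def same_object_demo(df_like):
    # On entry, the parameter names the very object the caller passed in.
    same_at_start = df_like is original
    # Like df.drop(...) without inplace=True: build a new object and rebind
    # the local name to it. The caller's object is left alone.
    df_like = [v for v in df_like if v != 2]
    same_at_end = df_like is original
    return same_at_start, same_at_end

original = [1, 2, 3]
print(same_object_demo(original))  # (True, False)
print(original)                    # [1, 2, 3]
```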
Python is neither pass by value nor pass by reference. It is pass by assignment.
Supporting reference, the Python FAQ:
https://docs.python.org/3/faq/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference
IOW:
If you pass an immutable value, changes to it do not change its value in the caller - because you are rebinding the name to a new object.
If you pass a mutable value, changes made in the called function, also change the value in the caller, so long as you do not rebind that name to a new object. If you reassign the variable, creating a new object, that change and subsequent changes to the name are not seen in the caller.
So if you pass a list, and change its 0th value, that change is seen in both the called and the caller. But if you reassign the list with a new list, this change is lost. But if you slice the list and replace that with a new list, that change is seen in both the called and the caller.
EG:
def change_it(list_):
    # This change would be seen in the caller if we left it alone
    list_[0] = 28
    # This change is also seen in the caller, and replaces the above
    # change
    list_[:] = [1, 2]
    # This change is not seen in the caller.
    # If this were pass by reference, this change too would be seen in
    # caller.
    list_ = [3, 4]
thing = [10, 20]
change_it(thing)
# here, thing is [1, 2]
If you're a C fan, you can think of this as passing a pointer by value - not a pointer to a pointer to a value, just a pointer to a value.
HTH.
Here is the doc for drop:
Return new object with labels in requested axis removed.
So a new dataframe is created. The original has not changed.
But, as with all objects in Python, the DataFrame is passed to the function by reference.
You need to declare a as global at the start of the function; otherwise a is a local variable, and assigning to it does not change the a in the main code.
Short answer:
By value: df2 = df.copy()
By reference: df2 = df
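The alias-versus-copy distinction, sketched with a plain list so it runs without pandas (an assumption for illustration). Note that list.copy() is shallow, whereas DataFrame.copy() copies the data by default.

```python
data = [1, 2, 3]      # a plain list standing in for a DataFrame
alias = data          # "by reference": another name for the same object
copied = data.copy()  # "by value": a new, independent (shallow) copy

data.append(4)
print(alias)          # [1, 2, 3, 4]: sees the change
print(copied)         # [1, 2, 3]: unaffected
```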
I'm writing a Python class based on list. The constructor builds a list based on two other lists that are passed as parameters. The logic is roughly: copy list A to the new instance, then iterate over list B, adding some entries and using others to modify entries from list A.
I've got two versions of the constructor. In the first, list A and list B were processed by loops. Then I decided to get clever; I used a comprehension to replace the loop that adds list A to the new instance.
The first version of the constructor works perfectly. The second version returns an empty list, even though I can look at the value of self in the debugger immediately before the constructor ends, and see that it's correct.
Why is this happening, and what can I do to make the second version work?
Here is the code that makes the second version misbehave. It copies list A to the new instance, then iterates over the instance to update data in a dictionary that represents the items in list B. ba is list A; getkey is a function (passed as a parameter) which derives a dictionary key from a list element; _dictb is a dictionary that contains an element for each element in list B.
self = [ [bae,None] for bae in ba ] # Copy list A to self
for n in xrange(0,len(self)) : # Iterate over list B
    _key = getkey( self[n][0])
    if _dictb.has_key(_key) :
        _dictb[_key] = n
In the first version, which works, the code above is replaced by this; the operations performed and the meanings of the variables are the same:
for bae in ba :
    _key = getkey(bae)
    if _dictb.has_key(_key) :
        _dictb[_key] = len(self)
    self.append( [bae,None] )
I will assume you're really talking about Python constructors here and have, for some weird reason, omitted the class definition and the def __init__ statement. In the "second version," which is the first code snippet you gave, your first line assigns a list to a local variable named "self". That DOES NOT replace the object being constructed with a different object. The thing that gets returned from a constructor is the new object, not the variable self.
Solution: don't assign to self. Ever. Don't even think about it.
Also you can't use a list comprehension to create a subclass of list, only an instance of list. Your "first version" (second code snippet) works and there is nothing wrong with it.
Aside: you should replace "_dictb.has_key(key)" with "key in _dictb".
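If you want to keep the comprehension, one option is to mutate self instead of rebinding it, for example with slice assignment (self.extend(...) would also work). This sketch assumes the constructor only needs to copy list A; the getkey/_dictb bookkeeping from the question is omitted, and MyList is a hypothetical class name.

```python
class MyList(list):
    def __init__(self, ba):
        super().__init__()
        # Slice assignment fills the list object being constructed,
        # so the comprehension works without rebinding the name self.
        self[:] = [[bae, None] for bae in ba]

m = MyList(["a", "b"])
print(m)                 # [['a', None], ['b', None]]
print(type(m).__name__)  # MyList
```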
If you are subclassing list, you don't want to reassign self in your constructor. That will not affect the contents of the list being constructed; it just rebinds the local name self to the new list your comprehension created.
In other words it won't have the desired effect you're after.
Instead do something like this:
class MyList(list):
    def __init__(self, xs):
        super(MyList, self).__init__()
        for x in xs: # Assign all values of xs to the list object ``self``
            self.append(x)
def chan(ref, let, mode):
    if mode[0]=="d":
        ref=-ref
    a=ord(let)
    a=a+ref
    let=chr(a)
    return let
ref=1
let="q"
chan(ref, let,"k")
print(let)
When I run this it prints "q", but I want it to print "r".
What have I done wrong and what do I need to do to make it work?
You need to assign the return value of the chan() function back to the let variable:
let = chan(ref, let,"k")
When you pass a variable to a Python function, the function receives a copy of a pointer to the same object the variable refers to. This means that if you change properties on an object passed to a function, those changes persist outside the function (provided the object is mutable). But reassigning that pointer inside the function does not affect the argument outside the function: you are simply pointing the pointer copy (inside the function) at a different object, which does not affect the original variable. Python uses neither pass-by-value nor pass-by-reference in the sense that other languages do. There are many articles detailing this, such as:
http://stupidpythonideas.blogspot.com/2013/11/does-python-pass-by-value-or-by.html
So the code above does not change let as you are modifying a copy of the let pointer inside the function, not changing the original pointer itself.
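A complete corrected version of the question's code, slightly condensed: the function returns the shifted character, and the caller rebinds let to the result.

```python
def chan(ref, let, mode):
    if mode[0] == "d":
        ref = -ref               # rebinding local names is fine here...
    return chr(ord(let) + ref)   # ...because the result is returned

ref = 1
let = "q"
let = chan(ref, let, "k")        # rebind the caller's name to the result
print(let)                       # r
```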
I have no use for a global variable and never define one explicitly, and yet I seem to have one in my code. Can you help me make it local, please?
def algo(X): # randomized algorithm
    while len(X)>2:
        # do a bunch of things to nested list X
    print(X)
    # tracing: output is the same every time, where it shouldn't be.
    return len(X[1][1])
def find_min(X): # iterate algo() multiple times to find minimum
    m = float('inf')
    for i in some_range:
        new = algo(X)
        m = min(m, new)
    return m
X = [[[..], [...]],
     [[..], [...]],
     [[..], [...]]]
print(find_min(X))
print(X)
# same value as inside the algo() call, even though it shouldn't be affected.
X appears to be behaving like a global variable. The randomized algorithm algo() is really performed only once on the first call because with X retaining its changed value, it never makes it inside the while loop. The purpose of iterations in find_min is thus defeated.
I'm new to python and even newer to this forum, so let me know if I need to clarify my question. Thanks.
Update: Many thanks for all the answers so far. I almost understand it, except I've done something like this before with a happier result. Could you explain why the code below is different, please?
def qsort(X):
    for ...
        # recursively sort X in place
        count+=1 # count number of operations
    return X, count
X = [ , , , ]
Y, count = qsort(X)
print(Y) # sorted
print(X) # original, unsorted.
Thank you.
Update II: To answer my own second question, the difference seems to be the use of a list method in the first code (not shown) and the lack thereof in the second code.
As others have pointed out already, the problem is that the list is passed as a reference to the function, so the list inside the function body is the very same object as the one you passed to it as an argument. Any mutations your function performs are thus visible from outside.
To solve this, your algo function should operate on a copy of the list that it gets passed.
As you're operating on a nested list, you should use the deepcopy function from the copy module to create a copy of your list that you can freely mutate without affecting anything outside of your function. The built-in list function can also be used to copy lists, but it only creates shallow copies, which isn't what you want for nested lists, because the inner lists would still just be pointers to the same objects.
from copy import deepcopy
def algo(X):
    X = deepcopy(X)
    ...
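Why the shallow copy from list() is not enough for a nested list: the outer list is new, but its inner lists are still shared with the original. A small sketch with made-up data:

```python
from copy import deepcopy

X = [[1, 2], [3, 4]]
shallow = list(X)      # new outer list, but the same inner lists
deep = deepcopy(X)     # new outer list and new inner lists

shallow[0].append(99)  # mutates an inner list shared with X
deep[1].append(42)     # mutates an inner list that only deep owns

print(X)               # [[1, 2, 99], [3, 4]]
print(deep)            # [[1, 2], [3, 4, 42]]
```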
When you do find_min(X), you are passing the object X (a list in this case) to the function. If that function mutates the list (e.g., by appending to it) then yes, it will affect the original object. Python does not copy objects just because you pass them to a function.
When you pass an object to a python function, the object isn't copied, but rather a pointer to the object is passed.
This makes sense because it greatly speeds up execution - in the case of a long list, there is no need to copy all of its elements.
However, this means that when you modify a passed object (for example, your list X), the modification applies to that object, even after the function returns.
For example:
def foo(x):
    x.extend('a')
    print(x)
l = []
foo(l)
foo(l)
Will print:
['a']
['a', 'a']
Python lists are mutable (i.e., they can be changed) and the use of algo within find_min function call does change the value of X (i.e., it is pass-by-reference for lists). See this SO question, for example.