Why can we modify a list with the append method, but can't do the same with list concatenation?
I know about local and global scope; I'm confused about why this works with the append method but not with +=. Thanks in advance.
some_list = []

def foo():
    some_list.append('apple')

foo()
print(some_list)
# it works
With list concatenation:
some_list = []

def foo():
    some_list += ['apple']

foo()
print(some_list)
# UnboundLocalError: local variable 'some_list' referenced before assignment
Augmented operations like += reassign the original variable, even when that isn't strictly necessary.
Python's operators turn into calls to an object's magic methods: += becomes __iadd__. Immutable objects like int can't change themselves, so you can't do an in-place += the way you can in C. Instead, Python's augmented methods return an object that gets reassigned to the variable being manipulated. Mutable objects like lists just return themselves, while immutable objects return a different object.
Since the variable is being reassigned, it has to follow the same scoping rules as any other variable. The reassignment causes Python to assume that the variable is in the local namespace, and you need the global keyword to override that assumption.
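A minimal sketch of that difference, using the builtin id() to show object identity:

nums = [1, 2]
before = id(nums)
nums += [3]                 # list.__iadd__ mutates in place and returns self
print(id(nums) == before)   # True: same list object, rebound to itself

n = 5
before = id(n)
n += 1                      # int has no in-place add; a new int object is created
print(id(n) == before)      # False: n now points to a different object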
Local and global scopes apply to the variable itself, specifically to what object it references. A variable just points to an object in memory. You can't change the 'pointer' of a variable from a different local scope, but you can change the object that it points to. Here, append works because it changes the list object itself, the one stored in memory; it doesn't change what the variable some_list points to. In the second example, on the other hand, you try to reassign some_list to refer to a new list built from the old one. Since that can't happen implicitly, the interpreter treats this some_list as a local variable, separate from the global one.
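For completeness, a minimal sketch of the usual fix: declare the name global so the rebinding done by += targets the module-level variable:

some_list = []

def foo():
    global some_list          # the rebinding now targets the module-level name
    some_list += ['apple']    # mutates the list and rebinds the name; both fine now

foo()
print(some_list)  # ['apple']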
Related
If I pass a dataframe to a function and modify it inside the function, is it pass-by-value or pass-by-reference?
I ran the following code:
import pandas as pd

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

def letgo(df):
    df = df.drop('b', axis=1)

letgo(a)
The value of a does not change after the function call. Does that mean Python is pass-by-value?
I also tried the following:
import numpy as np

xx = np.array([[1, 2], [3, 4]])

def letgo2(x):
    x[1, 1] = 100

def letgo3(x):
    x = np.array([[3, 3], [3, 3]])
It turns out letgo2() does change xx and letgo3() does not. Why is it like this?
The short answer is, Python always does pass-by-value, but every Python variable is actually a pointer to some object, so sometimes it looks like pass-by-reference.
In Python every object is either mutable or immutable. E.g., lists, dicts, modules and Pandas data frames are mutable; ints, strings and tuples are immutable. Mutable objects can be changed internally (e.g., adding an element to a list), but immutable objects cannot.
As I said at the start, you can think of every Python variable as a pointer to an object. When you pass a variable to a function, the variable (pointer) within the function is always a copy of the variable (pointer) that was passed in. So if you assign something new to the internal variable, all you are doing is changing the local variable to point to a different object. This doesn't alter (mutate) the original object that the variable pointed to, nor does it make the external variable point to the new object. At this point, the external variable still points to the original object, but the internal variable points to a new object.
If you want to alter the original object (only possible with mutable data types), you have to do something that alters the object without assigning a completely new value to the local variable. This is why letgo() and letgo3() leave the external item unaltered, but letgo2() alters it.
As @ursan pointed out, if letgo() used something like this instead, then it would alter (mutate) the original object that df points to, which would change the value seen via the global a variable:
def letgo(df):
    df.drop('b', axis=1, inplace=True)

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
letgo(a)  # will alter a
In some cases, you can completely hollow out the original variable and refill it with new data, without actually doing a direct assignment, e.g. this will alter the original object that v points to, which will change the data seen when you use v later:
def letgo3(x):
    x[:] = np.array([[3, 3], [3, 3]])

v = np.empty((2, 2))
letgo3(v)  # will alter v
Notice that I'm not assigning something directly to x; I'm assigning something to the entire internal range of x.
If you absolutely must create a completely new object and make it visible externally (which is sometimes the case with pandas), you have two options. The 'clean' option would be just to return the new object, e.g.,
def letgo(df):
    df = df.drop('b', axis=1)
    return df

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
a = letgo(a)
Another option would be to reach outside your function and directly alter a global variable. This changes a to point to a new object, and any function that refers to a afterward will see that new object:
def letgo():
    global a
    a = a.drop('b', axis=1)

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
letgo()  # will alter a!
Directly altering global variables is usually a bad idea, because anyone who reads your code will have a hard time figuring out how a got changed. (I generally use global variables for shared parameters used by many functions in a script, but I don't let them alter those global variables.)
To add to @Mike Graham's answer, which points to a very good read:
In your case, what is important to remember is the difference between names and values. a, df, xx, and x are all names, but they refer to the same or different values at different points in your examples:
In the first example, letgo rebinds df to another value, because df.drop returns a new DataFrame unless you set the argument inplace = True (see doc). That means that the name df (local to the letgo function), which was referring to the value of a, is now referring to a new value, here the df.drop return value. The value a is referring to still exists and hasn't changed.
In the second example, letgo2 mutates x without rebinding it, which is why xx is modified by letgo2. Unlike the previous example, the local name x keeps referring to the value that xx refers to, and changes that value in place.
In the third example, letgo3 rebinds x to a new np.array. That causes the name x, local to letgo3 and previously referring to the value of xx, to now refer to another value, the new np.array. The value xx is referring to hasn't changed.
The question isn't PBV vs. PBR. These names only cause confusion in a language like Python; they were invented for languages that work like C or like Fortran (as the quintessential PBV and PBR languages). It is true, but not enlightening, that Python always passes by value. The question here is whether the value itself is mutated or whether you get a new value. Pandas usually errs on the side of the latter.
http://nedbatchelder.com/text/names.html explains very well what Python's system of names is.
Python is neither pass by value nor pass by reference. It is pass by assignment.
Supporting reference, the Python FAQ:
https://docs.python.org/3/faq/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference
IOW:
If you pass an immutable value, changes to it do not change its value in the caller, because you are rebinding the name to a new object.

If you pass a mutable value, changes made in the called function also change the value in the caller, so long as you do not rebind that name to a new object. If you reassign the variable, creating a new object, that change and subsequent changes to the name are not seen in the caller.
So if you pass a list and change its 0th value, that change is seen in both the called function and the caller. But if you reassign the list variable to a new list, that change is lost. However, if you assign a new list to a slice of the old list, that change is seen in both the called function and the caller.
EG:
def change_it(list_):
    # This change would be seen in the caller if we left it alone
    list_[0] = 28
    # This change is also seen in the caller, and replaces the above
    # change
    list_[:] = [1, 2]
    # This change is not seen in the caller.
    # If this were pass by reference, this change too would be seen in
    # the caller.
    list_ = [3, 4]

thing = [10, 20]
change_it(thing)
# here, thing is [1, 2]
If you're a C fan, you can think of this as passing a pointer by value - not a pointer to a pointer to a value, just a pointer to a value.
HTH.
Here is the doc for drop:
Return new object with labels in requested axis removed.
So a new dataframe is created. The original has not changed.
But as with all objects in Python, the data frame is passed to the function by reference (more precisely, the reference itself is passed by value).
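A quick way to see both halves of that, as a minimal sketch:

import pandas as pd

a = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})
b = a.drop('b', axis=1)  # drop returns a brand-new DataFrame
print(b is a)            # False: a different object
print(list(a.columns))   # ['a', 'b']: the original is untouched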
If you want to reassign a inside a function, you need to declare a global at the start of the function; otherwise the assignment creates a local variable and does not change the a in the main code.
Short answer:
By value: df2 = df.copy()
By reference: df2 = df
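A short sketch of the difference between the two:

import pandas as pd

df = pd.DataFrame({'a': [1, 2]})
by_ref = df          # both names point at the same object
by_val = df.copy()   # an independent copy of the data

df.loc[0, 'a'] = 99
print(by_ref.loc[0, 'a'])  # 99: shares the object with df
print(by_val.loc[0, 'a'])  # 1:  the copy is unaffected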
I have a list of variable names whose values I retrieve using an eval statement.
I am trying to use a dictionary comprehension to map each variable name to its evaluated value.
When I use a for loop over range(0, len(ChosenVarNameList)), where len(ChosenVarNameList) is 10:
dictinitial = {}
for i in range(0, len(ChosenVarNameList)):
    dictinitial[ChosenVarNameList[i]] = eval("%s" % ChosenVarNameList[i])
I can create the dictionary.
When I reference individual indexes, I can also see the dictionary populating correctly (with the code below):
dictinitialnew = {ChosenVarNameList[0]: eval("%s" % ChosenVarNameList[0])}
However, when I try a dictionary comprehension like the code below:
dictinitialnew = {ChosenVarNameList[i]: eval("%s" % ChosenVarNameList[i])
                  for i in range(0, len(ChosenVarNameList))}
I get an error saying the first variable name, let's say 'Code1', is not defined. Is there a way to do this using a dictionary comprehension, or is there an alternative that I must use to get around this problem?
Thanks in advance.
Your problem is due to dict comprehensions introducing nested scope. For most practical purposes, a dict comprehension in a function like:
def myfunc(iterable, y):
    return {x: y for x in iterable}
is implemented as something very similar to:
def myfunc(iterable, y):
    def _unnamed_func_(_unnamed_it_):
        retval = {}
        for x in _unnamed_it_:
            retval[x] = y  # Note: y is read from nested scope, not passed to inner func
        return retval
    return _unnamed_func_(iterable)  # Note: iterable passed as argument
That _unnamed_func_, like all functions with closure scope, determines what values from the nested scope are needed at the moment it is defined and folds them into its own closure scope; in this case, it needs y from the nested scope, but not iterable (because the first iterable you iterate over is passed to the virtual function as an argument, not through closure scope).
Problem is, eval is executed with knowledge of only the local and global scopes (as well as implicit knowledge of the builtin scope that all code has); it doesn't know about the nested scope, and since you only reference the variables via eval, the nested function doesn't know it needs them either.
You can demonstrate the general problem with simpler code:
def outer(x):
def inner():
return eval('x')
return inner
If you try to run that with outer(1)() (and no x in global scope), it will die with NameError: name 'x' is not defined, because x was not part of the closure scope of inner, and it was promptly discarded when outer returned. Changing eval('x') to just x allows it to work (it returns 1 in the example case), because without eval getting in the way, inner pulls x into its closure scope at definition time so it's available when inner is run later.
Your design is a bad one to start with (eval should not be used for reading simple variable names), dict comprehensions just make it break completely.
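One common alternative is to look the names up in an explicit namespace instead of eval-ing them. A minimal sketch, assuming the variables live at module level (the names Code1 and Code2 are hypothetical):

Code1, Code2 = 10, 20  # hypothetical variables
ChosenVarNameList = ['Code1', 'Code2']

# globals() is a real dict, so the lookup works the same way
# inside a comprehension's hidden function scope
dictinitial = {name: globals()[name] for name in ChosenVarNameList}
print(dictinitial)  # {'Code1': 10, 'Code2': 20}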
The reason it behaves this way is that the language definition for comprehensions is built off of the definition of a generator expression, and generator expressions must be implemented with closure scope: since they run lazily, if they didn't use a closure scope to keep alive the nested variables they rely on, the scope might no longer exist by the time they are run. List comprehensions in Python 2 used to execute without a closure, but that caused some weird artifacts (e.g., running foo = [x for x in y] in a class definition would give the class a class attribute named x with the final value x took in the comprehension), so in Python 3 all comprehensions were changed to use an implicit closure scope (this was only a change for listcomps; dict and set comprehensions were added later and used closure scopes from the start).
When trying out code that assigns a GUID to class instances, I wrote something similar to the following:
#!/usr/bin/env python
# -*- coding: UTF-8 -*-

x_id = 0         # immutable
x_id_list = [0]  # mutable

def fx(x):
    global x_id  # Disable to get "UnboundLocalError: local variable 'x_id' referenced before assignment"
    if x is None:
        x_id += 2
        x_id_list[0] += 2
    else:
        x_id += 1
        x_id_list[0] += 1
        return (x_id - 1)
    return x_id

expected = [x for x in xrange(10)]
actual = [fx(x) for x in expected]
assert(expected == actual), "expected = {}, actual = {}".format(expected, actual)
print x_id_list
print x_id
Notice that only x_id, which is bound to an immutable int, throws the UnboundLocalError if it is not declared global, while the mutable x_id_list continues to work fine without any global declaration.
Why is that?
The issue is not that x_id is immutable (it isn't - the integer value 0 is what's immutable), but that you cannot assign to a variable defined outside a function without explicitly declaring your intent to do so via global. Mutating the list x_id_list refers to does not change the value of the x_id_list variable itself, and therefore is permitted.
If you tried to do x_id_list += [1] you'd run into the same error. It is assigning to a variable, not mutating a value, that is the problem here.
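You can verify that with a couple of lines:

x_id_list = [0]

def fx():
    x_id_list += [1]  # augmented assignment rebinds the name,
                      # so x_id_list is compiled as a local here

fx()  # UnboundLocalError: local variable 'x_id_list' referenced before assignment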
From the docs:
In Python, variables that are only referenced inside a function are implicitly global. If a variable is assigned a value anywhere within the function’s body, it’s assumed to be a local unless explicitly declared as global.
Though a bit surprising at first, a moment’s consideration explains this. On one hand, requiring global for assigned variables provides a bar against unintended side-effects. On the other hand, if global was required for all global references, you’d be using global all the time. You’d have to declare as global every reference to a built-in function or to a component of an imported module. This clutter would defeat the usefulness of the global declaration for identifying side-effects.
This answer also goes into some more detail.
That's because global variables are effectively read-only inside functions: you can read them and mutate the objects they refer to, but you cannot rebind them without declaring them global.
Since you're rebinding the variable x_id, you need to declare it global before using it, or Python will treat it as a new variable local to the function.
By using the global statement at the beginning of your function, you've told Python that you intend to read and rebind that variable inside the function. Since x_id_list is only mutated, never rebound (the change happens inside the list object, not to what the name points to), you don't need the global keyword for it.
It seems that strings and dicts behave fundamentally differently in Python. When I pass a string to a function, it gets modified in the local function's scope only, but when I do the same with a dict, it gets modified in the scope beyond the function:
def change_str(s):
    s += " qwe"

def change_arr(a):
    a[1] = "qwe"

ss = "asd"
change_str(ss)
print ss
# prints:
# asd

aa = {0: "asd"}
change_arr(aa)
print aa
# prints:
# {0: 'asd', 1: 'qwe'}
Is this behavior intentional, and if so, why?
It is intentional behavior. Strings are immutable in Python, so essentially all string operations return a new string, and as your functions do not return anything, you cannot see the new string "asd qwe". You can change the contents of mutable containers outside of local scope without declaring them global.
You can read more about mutable types in the official documentation of Python's data model.
Don't let the 'assignment' operator fool you. This is what is really going on in each of these functions:
def change_str(s):
    # operation has been split into 2 steps for clarity
    # (str defines no __iadd__, so += falls back to __add__)
    t = s.__add__(" qwe")  # creates a new string object
    s = t  # as you know, without the `global` keyword, this `s` is local

def change_arr(a):
    a.__setitem__(1, "qwe")
As you can see, only one of these functions actually has an assignment operation. The []= is shorthand for (or equivalent to) .__setitem__().
Yes, it's intentional. Each type determines how operators work on it. The dict type is set up so that a[1] = "qwe" modifies the dict object. Such changes will be seen in any piece of code that references that object. The string type is set up so that s += "qwe" does not modify the object, but returns a new object. So other code that was referencing the original object will see no changes.
The shorthand way of saying that is that strings are immutable and dicts are mutable. However, it's worth noting that "dicts are mutable" isn't the whole reason why the behavior happens. The reason is that item assignment (someDict[item] = val) is an operation that actually mutates the dict.
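You can watch this with id(), as a minimal sketch:

d = {0: "asd"}
before = id(d)
d[1] = "qwe"            # __setitem__ mutates the dict in place
print(id(d) == before)  # True: still the same object

s = "asd"
before = id(s)
s += " qwe"             # builds a new string and rebinds s
print(id(s) == before)  # False: s now names a different object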
I thought that changes to variables passed to a Python function remain in the function's local scope and are not passed to global scope. But then I wrote a test script:
#! /usr/bin/python
from numpy import *

def fun(box, var):
    box[0] = box[0]*4
    var = var*4
    return 0

ubox, x = array([1.]), 1.
print ubox, x
fun(ubox, x)
print ubox, x
The output is:
[myplay4]$ ./temp.py
[ 1.] 1.0
[ 4.] 1.0
The float variable x is not affected by the operation inside the function, but the array is. Lists behave the same way as the array: changes made through elements or slices are seen outside, but reassigning the whole variable is not.
Can anyone please explain why the local scope passes to global scope in this case?
The important thing to realize is that when you pass an object to a function, the function does not work with an independent copy of that object; it works with the same object. So any changes to the object are visible outside.
You say that changes to local variables remain local. That's true, but it only applies to changing variables (i.e. reassigning them). It does not apply to mutating an object that a variable points to.
In your example, you reassign var, so the change is not visible on the outside. However you're mutating box (by reassigning one of its elements). That change is visible on the outside. If you simply reassigned box to refer to a different object (box = something), that change would not be visible on the outside.
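A stripped-down version of that contrast (the helper names mutate and rebind are hypothetical):

import numpy as np

def mutate(box):
    box[0] = box[0] * 4   # changes the shared array object

def rebind(box):
    box = box * 4         # rebinds the local name only

ubox = np.array([1.])
mutate(ubox)
print(ubox)  # [4.]: the caller sees the mutation
rebind(ubox)
print(ubox)  # [4.]: unchanged, the rebinding stayed local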
In your function
def fun(box, var):
    box[0] = box[0]*4
    var = var*4
    return 0
both box and var are local, and changing them does not change their value in the calling scope. However, this line:
box[0] = box[0]*4
does not change box; it changes the object that box refers to. If that line were written as
box = box[0]*4 + box[1:]
then box in calling scope would indeed remain unchanged.
This has nothing to do with scope at all.
You're passing an object into a function. Inside that function, that object is mutated. The point is that the variable inside the function refers to the same object as in the calling function, so the changes are visible outside.
This does not depend on scope. It depends on how Python handles objects. Python never copies arguments at call time; every object is passed the same way. What matters is whether the object is mutable or immutable.
Immutable objects (numbers, strings, tuples) cannot be modified, so rebinding the parameter is the only thing a function can do with them, and that never affects the caller; they behave as if copied by value.
Mutable objects (lists, dictionaries, NumPy arrays and most other objects) can be changed in place, and such changes are visible to the caller; they behave as if passed by reference.
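To convince yourself that no copying happens at call time for either kind of object, id() shows that the function receives the very same objects (a minimal sketch):

def show(obj):
    print(id(obj))

n = 42
lst = [1, 2]
print(id(n), id(lst))  # the same two ids are printed below:
show(n)    # the int is not copied
show(lst)  # the list is not copied either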