Scope of an object/variable during method calls [duplicate] - python

If I pass a dataframe to a function and modify it inside the function, is it pass-by-value or pass-by-reference?
I ran the following code:
import pandas as pd
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
def letgo(df):
    df = df.drop('b',axis=1)
letgo(a)
The value of a does not change after the function call. Does that mean it is pass-by-value?
I also tried the following:
import numpy as np
xx = np.array([[1,2], [3,4]])
def letgo2(x):
    x[1,1] = 100
def letgo3(x):
    x = np.array([[3,3],[3,3]])
It turns out letgo2() does change xx and letgo3() does not. Why is that?

The short answer is, Python always does pass-by-value, but every Python variable is actually a pointer to some object, so sometimes it looks like pass-by-reference.
In Python every object is either mutable or immutable. For example, lists, dicts, modules and Pandas data frames are mutable, while ints, strings and tuples are immutable. Mutable objects can be changed internally (e.g., by adding an element to a list), but immutable objects cannot.
As I said at the start, you can think of every Python variable as a pointer to an object. When you pass a variable to a function, the variable (pointer) within the function is always a copy of the variable (pointer) that was passed in. So if you assign something new to the internal variable, all you are doing is changing the local variable to point to a different object. This doesn't alter (mutate) the original object that the variable pointed to, nor does it make the external variable point to the new object. At this point, the external variable still points to the original object, but the internal variable points to a new object.
If you want to alter the original object (only possible with mutable data types), you have to do something that alters the object without assigning a completely new value to the local variable. This is why letgo() and letgo3() leave the external item unaltered, but letgo2() alters it.
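The difference between rebinding a local name and mutating the shared object can be seen without pandas or NumPy. Below is a minimal sketch using a plain list; the function names rebind and mutate are made up for this illustration and do not come from the question.

```python
def rebind(items):
    # Assignment points the local name at a brand-new list;
    # the caller's list is left untouched.
    items = items + ["new"]
    return id(items)

def mutate(items):
    # append() changes the list object itself, which the caller also sees.
    items.append("new")
    return id(items)

data = ["old"]
original_id = id(data)

rebound_id = rebind(data)
print(data)                        # ['old']
print(rebound_id == original_id)   # False: a different object was created

mutated_id = mutate(data)
print(data)                        # ['old', 'new']
print(mutated_id == original_id)   # True: same object, changed in place
```

Comparing id() values is only used here to make the object identity visible; in everyday code the is operator serves the same purpose.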
As @ursan pointed out, if letgo() used something like this instead, then it would alter (mutate) the original object that df points to, which would change the value seen via the global a variable:
def letgo(df):
    df.drop('b', axis=1, inplace=True)
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo(a) # will alter a
In some cases, you can completely hollow out the original variable and refill it with new data, without actually doing a direct assignment, e.g. this will alter the original object that v points to, which will change the data seen when you use v later:
def letgo3(x):
    x[:] = np.array([[3,3],[3,3]])
v = np.empty((2, 2))
letgo3(v) # will alter v
Notice that I'm not assigning something directly to x; I'm assigning something to the entire internal range of x.
If you absolutely must create a completely new object and make it visible externally (which is sometimes the case with pandas), you have two options. The 'clean' option would be just to return the new object, e.g.,
def letgo(df):
    df = df.drop('b',axis=1)
    return df
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
a = letgo(a)
Another option would be to reach outside your function and directly alter a global variable. This changes a to point to a new object, and any function that refers to a afterward will see that new object:
def letgo():
    global a
    a = a.drop('b',axis=1)
a = pd.DataFrame({'a':[1,2], 'b':[3,4]})
letgo() # will alter a!
Directly altering global variables is usually a bad idea, because anyone who reads your code will have a hard time figuring out how a got changed. (I generally use global variables for shared parameters used by many functions in a script, but I don't let them alter those global variables.)

To add to @Mike Graham's answer, which pointed to a very good read:
In your case, what is important to remember is the difference between names and values. a, df, xx, x, are all names, but they refer to the same or different values at different points of your examples:
In the first example, letgo rebinds df to another value, because df.drop returns a new DataFrame unless you set the argument inplace = True (see doc). That means that the name df (local to the letgo function), which was referring to the value of a, is now referring to a new value, here the df.drop return value. The value a is referring to still exists and hasn't changed.
In the second example, letgo2 mutates x, without rebinding it, which is why xx is modified by letgo2. Unlike the previous example, here the local name x always refers to the value the name xx is referring to, and changes that value in place, which is why the value xx is referring to has changed.
In the third example, letgo3 rebinds x to a new np.array. That causes the name x, local to letgo3 and previously referring to the value of xx, to now refer to another value, the new np.array. The value xx is referring to hasn't changed.

The question isn't PBV vs. PBR. These names only cause confusion in a language like Python; they were invented for languages that work like C or like Fortran (as the quintessential PBV and PBR languages). It is true, but not enlightening, that Python always passes by value. The question here is whether the value itself is mutated or whether you get a new value. Pandas usually errs on the side of the latter.
http://nedbatchelder.com/text/names.html explains very well what Python's system of names is.

Python is neither pass by value nor pass by reference. It is pass by assignment.
Supporting reference, the Python FAQ:
https://docs.python.org/3/faq/programming.html#how-do-i-write-a-function-with-output-parameters-call-by-reference
IOW:
If you pass an immutable value, changes to it do not change its
value in the caller - because you are rebinding the name to a new
object.
If you pass a mutable value, changes made in the called function,
also change the value in the caller, so long as you do not rebind
that name to a new object. If you reassign the variable,
creating a new object, that change and subsequent changes to the
name are not seen in the caller.
So if you pass a list and change its 0th value, that change is seen in both the callee and the caller. But if you reassign the name to a new list, this change is lost. And if you slice the whole list and replace that slice with a new list, that change is again seen in both the callee and the caller.
EG:
def change_it(list_):
    # This change would be seen in the caller if we left it alone
    list_[0] = 28
    # This change is also seen in the caller, and replaces the above
    # change
    list_[:] = [1, 2]
    # This change is not seen in the caller.
    # If this were pass by reference, this change too would be seen in
    # caller.
    list_ = [3, 4]
thing = [10, 20]
change_it(thing)
# here, thing is [1, 2]
If you're a C fan, you can think of this as passing a pointer by value - not a pointer to a pointer to a value, just a pointer to a value.
HTH.

Here is the doc for drop:
Return new object with labels in requested axis removed.
So a new dataframe is created. The original has not changed.
But as for all objects in python, the data frame is passed to the function by reference.

you need to make 'a' global at the start of the function otherwise it is a local variable and does not change the 'a' in the main code.

Short answer:
By value: df2 = df.copy()
By reference: df2 = df

Related

why does my list keep growing even when a new class is instantiated? [duplicate]

I'm at the point in learning Python where I'm dealing with the Mutable Default Argument problem.
# BAD: if `a_list` is not passed in, the default will wrongly retain its contents between successive function calls
def bad_append(new_item, a_list=[]):
    a_list.append(new_item)
    return a_list
# GOOD: if `a_list` is not passed in, the default will always correctly be []
def good_append(new_item, a_list=None):
    if a_list is None:
        a_list = []
    a_list.append(new_item)
    return a_list
I understand that a_list is initialized only when the def statement is first encountered, and that's why subsequent calls of bad_append use the same list object.
What I don't understand is why good_append works any different. It looks like a_list would still be initialized only once; therefore, the if statement would only be true on the first invocation of the function, meaning a_list would only get reset to [] on the first invocation, meaning it would still accumulate all past new_item values and still be buggy.
Why isn't it? What concept am I missing? How does a_list get wiped clean every time good_append runs?
It looks like a_list would still be initialized only once
"initialization" is not something that happens to variables in Python, because variables in Python are just names. "initialization" only happens to objects, and it's done via the class' __init__ method.
When you write a = 0, that is an assignment. That is saying "a shall refer to the object that is described by the expression 0". It is not initialization; a can name anything else of any type at any later time, and that happens as a result of assigning something else to a. Assignment is just assignment. The first one is not special.
When you write def good_append(new_item, a_list=None), that is not "initializing" a_list. It is setting up an internal reference to an object, the result of evaluating None, so that when good_append is called without a second parameter, that object is automatically assigned to a_list.
meaning a_list would only get reset to [] on the first invocation
No, a_list gets set to [] any time that a_list is None to begin with. That is, when either None is passed explicitly, or the argument is omitted.
The problem with [] occurs because the expression [] is only evaluated once in this context. When the def statement is executed, [] is evaluated, a specific list object is created (one that happens to be empty to start with), and that object is used as the default.
How does a_list get wiped clean every time good_append runs?
It doesn't. It doesn't need to be.
You know how the problem is described as being with "mutable default arguments"?
None is not mutable.
The problem occurs when you modify the object that the parameter has as a default.
a_list = [] does not modify whatever object a_list previously referred to. It cannot; arbitrary objects cannot magically transform in-place into empty lists. a_list = [] means "a_list shall stop referring to what it previously referred to, and start referring to []". The previously-referred-to object is unchanged.
When the def statement is executed, and one of the arguments has a default value, that value (an object) gets baked into the function (which is also, itself, an object!). When you write code that mutates an object, the object mutates. If the object being referred to happens to be the object baked into the function, it still mutates.
But you cannot mutate None. It is immutable.
You can mutate []. It is a list, and lists are mutable. Appending an item to a list mutates the list.
The default value of a_list (or any other default value, for that matter) is stored inside the function object once it has been evaluated, and can therefore be modified like any other object. In Python 2:
>>> def f(x=[]): return x
...
>>> f.func_defaults
([],)
>>> f.func_defaults[0] is f()
True
Or, in Python 3:
>>> def f(x=[]): return x
...
>>> f.__defaults__
([],)
>>> f.__defaults__[0] is f()
True
So the value in func_defaults is the same object that is visible inside the function (and returned in my example so that it can be accessed from outside).
In other words, what happens when calling f() is an implicit x = f.func_defaults[0]. If that object is subsequently modified, the modification persists.
In contrast, an assignment inside the function always gets a new []. Any modification lasts only until the last reference to that [] is gone; on the next function call, a new [] is created.
In other words again, it is not true that [] yields the same object on every execution; rather, in the case of a default argument, it is executed only once and the result is preserved.
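To make the stored default visible, here is a small sketch that reuses bad_append and good_append from the question and inspects __defaults__ (the Python 3 attribute name) before and after a few calls. It only illustrates the behaviour described above.

```python
def bad_append(new_item, a_list=[]):
    a_list.append(new_item)
    return a_list

def good_append(new_item, a_list=None):
    if a_list is None:
        a_list = []  # a fresh list on every call that omits a_list
    a_list.append(new_item)
    return a_list

print(bad_append.__defaults__)    # ([],)
bad_append(1)
bad_append(2)
print(bad_append.__defaults__)    # ([1, 2],)  the stored default itself was mutated

good_append(1)
good_append(2)
print(good_append.__defaults__)   # (None,)  the stored default is unchanged
```

The stored None is never modified, because a_list = [] only rebinds the local name for the duration of that call.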
The problem only exists if the default value is mutable, which None is not. What gets stored along with the function object is the default value. When the function is called, the function's context is initialized with the default value.
a_list = []
just assigns a new object to the name a_list in the context of the current function call. It does not modify None in any way.
No, in good_append a_list is not initialised only once.
Each time the function is called without specifying the a_list argument, the default None is used, and a new list instance is created and returned; the new list does not replace the default value.
The python tutorial says that
the default value is evaluated only once.
The evaluated (only once) default value is stored internally (name it x for simplicity).
case []:
When you define the function with a_list defaulted to [], if you don't provide a_list, it is assigned the internal variable x when the function is called. Therefore, when you append to a_list, you are actually appending to x (because a_list and x now refer to the same object). When you call the function again without a_list, the updated x is re-assigned to a_list.
case None:
The value None is evaluated once and stored in x. If you don't provide a_list, the variable x is assigned to a_list. But you don't append to x, of course. You reassign a_list to a new empty list. At this point x and a_list refer to different objects. In the same way, when you call the function again without a_list, it first gets the value None from x, but then a_list gets assigned a new empty list again.
Note that, for the a_list = [] case, if you provide an explicit value for a_list when you call the function, the new argument does not override x because that's evaluated only once.

Python weird thing with list local and global scope

Why can we modify list with the append method but can't do the same with the list concatenation?
I know about the local and global scope, I'm confused about why we can do it with append method, thanks in advance
some_list=[]
def foo():
    some_list.append('apple')
foo()
print(some_list)
#it works
With list concatenation:
some_list=[]
def foo():
    some_list+=['apple']
foo()
print(some_list)
#UnboundLocalError: local variable 'some_list' referenced before assignment
Augmented operations like += reassign the original variable, even if it's not strictly necessary.
Python's operators turn into calls to an object's magic methods: __iadd__ for +=. Immutable objects like int can't change themselves so you can't do an in-place += like you can in C. Instead, python's augmented methods return an object to be reassigned to the variable being manipulated. Mutable objects like lists just return themselves while immutable objects return a different object.
Since the variable is being reassigned, it has to follow the same scoping rules as any other variable. The reassignment causes python to assume that the variable is in the local namespace and you need the global keyword to override that assumption.
Local and global scopes apply to the variable itself, specifically to which object it references. The variable just points to an object in memory. Without the global keyword you can't change what a global variable points to from inside a function, but you can change the object that it points to. Here, you can use append because it changes the list object itself, the one stored in memory. It doesn't change what the variable some_list points to. In the second example, on the other hand, you try to reassign some_list. Because assigning to a name inside a function makes that name local, the interpreter treats this some_list as a local variable separate from the global one, and reading it before it has a value raises the error.
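A short sketch of both behaviours follows. The function names broken and fixed are invented for this example; the code is meant to be run as a plain script so that some_list is a module-level global.

```python
some_list = []

def broken():
    # The augmented assignment makes some_list local to this function,
    # so reading it before it has a local value fails.
    some_list += ["apple"]

def fixed():
    global some_list
    # list.__iadd__ extends the list in place and returns the same list.
    some_list += ["apple"]

try:
    broken()
except UnboundLocalError as exc:
    error_name = type(exc).__name__
    print(error_name)               # UnboundLocalError

before = id(some_list)
fixed()
print(some_list)                    # ['apple']
print(id(some_list) == before)      # True: still the same list object
```

Calling some_list.append('apple') or some_list.extend(['apple']) avoids the issue altogether, since neither assigns to the name.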

How to make a variable (truly) local to a procedure or function

ie we have the global declaration, but no local.
"Normally" arguments are local, I think, or they certainly behave that way.
However if an argument is, say, a list and a method is applied which modifies the list, some surprising (to me) results can ensue.
I have 2 questions: what is the proper way to ensure that a variable is truly local?
I wound up using the following, which works, but it can hardly be the proper way of doing it:
def AexclB(a,b):
    z = a+[] # yuk
    for k in range(0, len(b)):
        try: z.remove(b[k])
        except: continue
    return z
Absent the +[], "a" in the calling scope gets modified, which is not desired.
(The issue here is using a list method.)
The supplementary question is, why is there no "local" declaration?
Finally, in trying to pin this down, I made various mickey mouse functions which all behaved as expected except the last one:
def fun4(a):
    z = a
    z = z.append(["!!"])
    return z
a = ["hello"]
print "a=",a
print "fun4(a)=",fun4(a)
print "a=",a
which produced the following on the console:
a= ['hello']
fun4(a)= None
a= ['hello', ['!!']]
...
>>>
The 'None' result was not expected (by me).
Python 2.7 btw in case that matters.
PS: I've tried searching here and elsewhere but not succeeded in finding anything corresponding exactly - there's lots about making variables global, sadly.
It's not that z isn't a local variable in your function. Rather when you have the line z = a, you are making z refer to the same list in memory that a already points to. If you want z to be a copy of a, then you should write z = a[:] or z = list(a).
See this link for some illustrations and a bit more explanation http://henry.precheur.org/python/copy_list
Python will not copy objects unless you explicitly ask it to. Integers and strings are not modifiable, so every operation on them returns a new instance of the type. Lists, dictionaries, and basically every other object in Python are mutable, so operations like list.append happen in-place (and therefore return None).
If you want the variable to be a copy, you must explicitly copy it. In the case of lists, you slice them:
z = a[:]
There is a great answer that will cover most of your question in here, which explains mutable and immutable types, how they are kept in memory and how they are referenced. The first section of the answer is for you (before the "How do we get around this?" header).
In the following line
z = z.append(["!!"])
Lists are mutable objects, so when you call append, it updates the referenced object; it does not create a new one and return it. If a method or function does not return anything, it returns None.
The link above also gives an immutable example so you can see the real difference.
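The following sketch shows why assigning the result of append back to a name yields None, and how building a new list differs. The variable names are chosen only for this illustration.

```python
a = ["hello"]

result = a.append("!!")   # mutates a in place and returns nothing
print(result)             # None
print(a)                  # ['hello', '!!']

# Building a new list instead leaves the original alone.
b = ["hello"]
c = b + ["!!"]
print(b)                  # ['hello']
print(c)                  # ['hello', '!!']
```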
You can not make a mutable object act like it is immutable. But you can create a new one instead of passing the reference when you create a new object from an existing mutable one.
a = [1,2,3]
b = a[:]
For more options you can check here
What you're missing is that all variable assignment in python is by reference (or by pointer, if you like). Passing arguments to a function literally assigns values from the caller to the arguments of the function, by reference. If you dig into the reference, and change something inside it, the caller will see that change.
If you want to ensure that callers will not have their values changed, you can either try to use immutable values more often (tuple, frozenset, str, int, bool, NoneType), or be certain to take copies of your data before mutating it in place.
In summary, scoping isn't involved in your problem here. Mutability is.
Is that clear now?
Still not sure whats the 'correct' way to force the copy, there are
various suggestions here.
It differs by data type, but generally <type>(obj) will do the trick. For example list([1, 2]) and dict({1:2}) both return (shallow!) copies of their argument.
If, however, you have a tree of mutable objects and also you don't know a-priori which level of the tree you might modify, you need the copy module. That said, I've only needed this a handful of times (in 8 years of full-time python), and most of those ended up causing bugs. If you need this, it's a code smell, in my opinion.
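As a rough illustration of the difference between a shallow copy made with the type constructor and a deep copy from the copy module, consider this sketch. The nested dictionary tree is invented for the example.

```python
import copy

tree = {"leaf": [1, 2], "name": "root"}

shallow = dict(tree)          # new dict, but the inner list is shared
deep = copy.deepcopy(tree)    # new dict and a new inner list

tree["leaf"].append(3)

print(shallow["leaf"])                    # [1, 2, 3]
print(deep["leaf"])                       # [1, 2]
print(shallow["leaf"] is tree["leaf"])    # True
```

The shallow copy protects only the top level; anything mutable nested inside is still shared with the original.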
The complexity of maintaining copies of mutable objects is the reason why there is a growing trend of using immutable objects by default. In the Clojure language, all data types are immutable by default and mutability is treated as a special case to be minimized.
If you need to work on a list or other object in a truly local context you need to explicitly make a copy or a deep copy of it.
from copy import copy
def fn(x):
    y = copy(x)

why is python global scope affected by local scope operations?

I thought that changes to variables passed to a python function remain in the function's local scope and are not passed to global scope. But when I wrote a test script:
#! /usr/bin/python
from numpy import *
def fun(box, var):
    box[0]=box[0]*4
    var=var*4
    return 0
ubox,x = array([1.]), 1.
print ubox,x
fun(ubox,x)
print ubox,x
The output is:
[myplay4]$ ./temp.py
[ 1.] 1.0
[ 4.] 1.0
The float variable x is not affected by the operation inside the function but the array is. Lists are also affected but this only happens if operating on list/array slices not on individual elements.
Can anyone please explain why the local scope passes to global scope in this case?
The important thing to realize is that when you pass an object to a function, the function does not work with an independent copy of that object; it works with the same object. So any changes to the object are visible to the outside.
You say that changes to local variables remain local. That's true, but it only applies to changing variables (i.e. reassigning them). It does not apply to mutating an object that a variable points to.
In your example, you reassign var, so the change is not visible on the outside. However you're mutating box (by reassigning one of its elements). That change is visible on the outside. If you simply reassigned box to refer to a different object (box = something), that change would not be visible on the outside.
In your function
def fun(box, var):
    box[0]=box[0]*4
    var=var*4
    return 0
both box and var are local, and changing them does not change their value in the calling scope. However, this line:
box[0]=box[0]*4
does not change box; it changes the object that box refers to. If that line were written as
box = box[0]*4 + box[1:]
then box in calling scope would indeed remain unchanged.
This has nothing to do with scope at all.
You're passing an object into a function. Inside that function, that object is mutated. The point is that the variable inside the function refers to the same object as in the calling function, so the changes are visible outside.
This does not depend on scope. It depends on whether the object you pass can be modified in place. In this respect, there are two kinds of objects: mutable and immutable.
Python never copies an object when it is passed to a function; the function always receives a reference to the same object. Immutable objects only appear to be copied because they cannot be modified, so any "change" rebinds the local name to a new object.
Immutable types include all numeric types, strings and tuples. Mutable types include lists, dictionaries and most other objects.

Parse Method Variable Names In Python?

Given the python function:
def MyPythonMethod(value1, value2):
    # defining some variables
    a = 4
    myValue = 15.65
    listValues = [4, 67, 83, -23]
    # doing some operation on the list
    listValues[0] = listValues[1]
    # looping through the values
    for i in listValues:
        print i
How can I extract the names and types of all the variables in method MyPythonMethod?
Ideally, I'd like to get all variable names and their types given a method name. for example, the output for method MyPythonMethod will look like this:
varNames = ["a", "myValue", "listValues", "i"]
varTypes = ["int", "float", "list", "int"]
Any ideas?
[1] Variables don't have a type in python. Objects have a type, and variables point to objects.
[2] you can use the inspect module to get info about the internals of your function.
Read the docs -- they will tell you what is available for inspection.
MyPythonMethod.func_code.co_varnames will give you the local variable names, for example.
( And note that MyPythonMethod, as defined, is actually a function, not a method. )
[3] But even when you get the names of the local variables, they aren't bound to any objects except while the function is executing. The value 4 is bound to local var 'a' in the function -- before and after the function is called, there is no 'a' and it's not bound to anything.
[4] If you run the function in the debugger, you can halt the execution at any point and inspect the variables and objects created in the function.
[5] If the function raises an exception, you can catch the exception and get access to some of the state of the function at the time of the exception.
You can't do this "from the outside".
Local variables don't exist until the method runs. Although the scope of all variables is known statically, i.e. at compile time, I don't think you can get this information easily without crawling through the AST or bytecode yourself. (Edit: Steven proved me wrong about this one... code objects have a tuple containing all local variable names)
A given chunk of code doesn't have access to any scopes but its own and the surrounding "lexical" scopes (builtins, module-level globals, local scopes of enclosing functions).
There is no such thing as the type of a variable (in Python) - any variable can refer to any number of objects of completely different types during its lifetime. What should the output be if you add a = "foo"? And if you then add a = SomeClass()?
Inside the method itself, you could use locals() to get a dictionary of local variables and the objects they currently refer to, and you could proceed to call type on the values (the objects). Of course this only gets you the type of the object currently referred to. As hinted in the comment, I doubt that this is useful. What do you really want to do, i.e. what problem are you trying to solve?
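Both ideas can be sketched together in Python 3 syntax: the code object lists the local names statically, while locals() reports the objects those names refer to at one moment during a call. The function below is a simplified stand-in for MyPythonMethod with snake_case names chosen for this example, and __code__ is the Python 3 spelling of func_code.

```python
def my_python_function(value1, value2):
    a = 4
    my_value = 15.65
    list_values = [4, 67, 83, -23]
    for i in list_values:
        pass
    # Copy of the local namespace as it looks at this point of the call.
    return dict(locals())

# Static information: names only, available without calling the function.
names = my_python_function.__code__.co_varnames
print(names)   # ('value1', 'value2', 'a', 'my_value', 'list_values', 'i')

# Dynamic information: types of the objects the names referred to in one call.
snapshot = my_python_function(1, 2)
types = {name: type(obj).__name__ for name, obj in snapshot.items()}
print(types["a"], types["my_value"], types["list_values"], types["i"])   # int float list int
```

The types reflect only the objects bound at the moment locals() was called; a different call, or a later line in the same call, could bind the same names to objects of other types.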
If you use pdb can't you set the last line as a breakpoint and then ask the debugger to look at the top stack frame and list the variables for you? Or you could look at the pdb code and copy its tricks for how to introduce the breakpoint and then inspect the stack frame beneath the breakpoint function that you register.
