UnboundLocalError when using += operator [duplicate] - python

I've seen there are actually two (maybe more) ways to concatenate lists in Python:
One way is to use the extend() method:
a = [1, 2]
b = [2, 3]
b.extend(a)
the other to use the plus (+) operator:
b += a
Now I wonder: which of those two options is the 'pythonic' way to do list concatenation and is there a difference between the two? (I've looked up the official Python tutorial but couldn't find anything anything about this topic).

The only difference on a bytecode level is that the .extend way involves a function call, which is slightly more expensive in Python than the INPLACE_ADD.
It's really nothing you should be worrying about, unless you're performing this operation billions of times. It is likely, however, that the bottleneck would lie some place else.

You can't use += for non-local variable (variable which is not local for function and also not global)
def main():
l = [1, 2, 3]
def foo():
l.extend([4])
def boo():
l += [5]
foo()
print l
boo() # this will fail
main()
It's because for extend case compiler will load the variable l using LOAD_DEREF instruction, but for += it will use LOAD_FAST - and you get *UnboundLocalError: local variable 'l' referenced before assignment*

You can chain function calls, but you can't += a function call directly:
class A:
def __init__(self):
self.listFoo = [1, 2]
self.listBar = [3, 4]
def get_list(self, which):
if which == "Foo":
return self.listFoo
return self.listBar
a = A()
other_list = [5, 6]
a.get_list("Foo").extend(other_list)
a.get_list("Foo") += other_list #SyntaxError: can't assign to function call

I would say that there is some difference when it comes with numpy (I just saw that the question ask about concatenating two lists, not numpy array, but since it might be a issue for beginner, such as me, I hope this can help someone who seek the solution to this post), for ex.
import numpy as np
a = np.zeros((4,4,4))
b = []
b += a
it will return with error
ValueError: operands could not be broadcast together with shapes (0,) (4,4,4)
b.extend(a) works perfectly

The .extend() method on lists works with any iterable*, += works with some but can get funky.
import numpy as np
l = [2, 3, 4]
t = (5, 6, 7)
l += t
l
[2, 3, 4, 5, 6, 7]
l = [2, 3, 4]
t = np.array((5, 6, 7))
l += t
l
array([ 7, 9, 11])
l = [2, 3, 4]
t = np.array((5, 6, 7))
l.extend(t)
l
[2, 3, 4, 5, 6, 7]
Python 3.6
*pretty sure .extend() works with any iterable but please comment if I am incorrect
Edit: "extend()" changed to "The .extend() method on lists"
Note: David M. Helmuth's comment below is nice and clear.

Actually, there are differences among the three options: ADD, INPLACE_ADD and extend. The former is always slower, while the other two are roughly the same.
With this information, I would rather use extend, which is faster than ADD, and seems to me more explicit of what you are doing than INPLACE_ADD.
Try the following code a few times (for Python 3):
import time
def test():
x = list(range(10000000))
y = list(range(10000000))
z = list(range(10000000))
# INPLACE_ADD
t0 = time.process_time()
z += x
t_inplace_add = time.process_time() - t0
# ADD
t0 = time.process_time()
w = x + y
t_add = time.process_time() - t0
# Extend
t0 = time.process_time()
x.extend(y)
t_extend = time.process_time() - t0
print('ADD {} s'.format(t_add))
print('INPLACE_ADD {} s'.format(t_inplace_add))
print('extend {} s'.format(t_extend))
print()
for i in range(10):
test()
ADD 0.3540440000000018 s
INPLACE_ADD 0.10896000000000328 s
extend 0.08370399999999734 s
ADD 0.2024550000000005 s
INPLACE_ADD 0.0972940000000051 s
extend 0.09610200000000191 s
ADD 0.1680199999999985 s
INPLACE_ADD 0.08162199999999586 s
extend 0.0815160000000077 s
ADD 0.16708400000000267 s
INPLACE_ADD 0.0797719999999913 s
extend 0.0801490000000058 s
ADD 0.1681250000000034 s
INPLACE_ADD 0.08324399999999343 s
extend 0.08062700000000689 s
ADD 0.1707760000000036 s
INPLACE_ADD 0.08071900000000198 s
extend 0.09226200000000517 s
ADD 0.1668420000000026 s
INPLACE_ADD 0.08047300000001201 s
extend 0.0848089999999928 s
ADD 0.16659500000000094 s
INPLACE_ADD 0.08019399999999166 s
extend 0.07981599999999389 s
ADD 0.1710910000000041 s
INPLACE_ADD 0.0783479999999912 s
extend 0.07987599999999873 s
ADD 0.16435900000000458 s
INPLACE_ADD 0.08131200000001115 s
extend 0.0818660000000051 s

From the CPython 3.5.2 source code:
No big difference.
static PyObject *
list_inplace_concat(PyListObject *self, PyObject *other)
{
PyObject *result;
result = listextend(self, other);
if (result == NULL)
return result;
Py_DECREF(result);
Py_INCREF(self);
return (PyObject *)self;
}

ary += ext creates a new List object, then copies data from lists "ary" and "ext" into it.
ary.extend(ext) merely adds reference to "ext" list to the end of the "ary" list, resulting in less memory transactions.
As a result, .extend works orders of magnitude faster and doesn't use any additional memory outside of the list being extended and the list it's being extended with.
╰─➤ time ./list_plus.py
./list_plus.py 36.03s user 6.39s system 99% cpu 42.558 total
╰─➤ time ./list_extend.py
./list_extend.py 0.03s user 0.01s system 92% cpu 0.040 total
The first script also uses over 200MB of memory, while the second one doesn't use any more memory than a 'naked' python3 process.
Having said that, the in-place addition does seem to do the same thing as .extend.

I've looked up the official Python tutorial but couldn't find anything anything about this topic
This information happens to be buried in the Programming FAQ:
... for lists, __iadd__ [i.e. +=] is equivalent to calling extend on the list and returning the list. That's why we say that for lists, += is a "shorthand" for list.extend
You can also see this for yourself in the CPython source code: https://github.com/python/cpython/blob/v3.8.2/Objects/listobject.c#L1000-L1011

Only .extend() can be used when the list is in a tuple
This will work
t = ([],[])
t[0].extend([1,2])
while this won't
t = ([],[])
t[0] += [1,2]
The reason is that += generates a new object. If you look at the long version:
t[0] = t[0] + [1,2]
you can see how that would change which object is in the tuple, which is not possible. Using .extend() modifies an object in the tuple, which is allowed.

According to the Python for Data Analysis.
“Note that list concatenation by addition is a comparatively expensive operation since a new list must be created and the objects copied over. Using extend to append elements to an existing list, especially if you are building up a large list, is usually preferable. ”
Thus,
everything = []
for chunk in list_of_lists:
everything.extend(chunk)
is faster than the concatenative alternative:
everything = []
for chunk in list_of_lists:
everything = everything + chunk

Related

Learning python: Why these two function yield different results? [duplicate]

I'm trying to understand Python's approach to variable scope. In this example, why is f() able to alter the value of x, as perceived within main(), but not the value of n?
def f(n, x):
n = 2
x.append(4)
print('In f():', n, x)
def main():
n = 1
x = [0,1,2,3]
print('Before:', n, x)
f(n, x)
print('After: ', n, x)
main()
Output:
Before: 1 [0, 1, 2, 3]
In f(): 2 [0, 1, 2, 3, 4]
After: 1 [0, 1, 2, 3, 4]
See also: How do I pass a variable by reference?
Some answers contain the word "copy" in the context of a function call. I find it confusing.
Python doesn't copy objects you pass during a function call ever.
Function parameters are names. When you call a function, Python binds these parameters to whatever objects you pass (via names in a caller scope).
Objects can be mutable (like lists) or immutable (like integers and strings in Python). A mutable object you can change. You can't change a name, you just can bind it to another object.
Your example is not about scopes or namespaces, it is about naming and binding and mutability of an object in Python.
def f(n, x): # these `n`, `x` have nothing to do with `n` and `x` from main()
n = 2 # put `n` label on `2` balloon
x.append(4) # call `append` method of whatever object `x` is referring to.
print('In f():', n, x)
x = [] # put `x` label on `[]` ballon
# x = [] has no effect on the original list that is passed into the function
Here are nice pictures on the difference between variables in other languages and names in Python.
You've got a number of answers already, and I broadly agree with J.F. Sebastian, but you might find this useful as a shortcut:
Any time you see varname =, you're creating a new name binding within the function's scope. Whatever value varname was bound to before is lost within this scope.
Any time you see varname.foo() you're calling a method on varname. The method may alter varname (e.g. list.append). varname (or, rather, the object that varname names) may exist in more than one scope, and since it's the same object, any changes will be visible in all scopes.
[note that the global keyword creates an exception to the first case]
f doesn't actually alter the value of x (which is always the same reference to an instance of a list). Rather, it alters the contents of this list.
In both cases, a copy of a reference is passed to the function. Inside the function,
n gets assigned a new value. Only the reference inside the function is modified, not the one outside it.
x does not get assigned a new value: neither the reference inside nor outside the function are modified. Instead, x’s value is modified.
Since both the x inside the function and outside it refer to the same value, both see the modification. By contrast, the n inside the function and outside it refer to different values after n was reassigned inside the function.
I will rename variables to reduce confusion. n -> nf or nmain. x -> xf or xmain:
def f(nf, xf):
nf = 2
xf.append(4)
print 'In f():', nf, xf
def main():
nmain = 1
xmain = [0,1,2,3]
print 'Before:', nmain, xmain
f(nmain, xmain)
print 'After: ', nmain, xmain
main()
When you call the function f, the Python runtime makes a copy of xmain and assigns it to xf, and similarly assigns a copy of nmain to nf.
In the case of n, the value that is copied is 1.
In the case of x the value that is copied is not the literal list [0, 1, 2, 3]. It is a reference to that list. xf and xmain are pointing at the same list, so when you modify xf you are also modifying xmain.
If, however, you were to write something like:
xf = ["foo", "bar"]
xf.append(4)
you would find that xmain has not changed. This is because, in the line xf = ["foo", "bar"] you have change xf to point to a new list. Any changes you make to this new list will have no effects on the list that xmain still points to.
Hope that helps. :-)
If the functions are re-written with completely different variables and we call id on them, it then illustrates the point well. I didn't get this at first and read jfs' post with the great explanation, so I tried to understand/convince myself:
def f(y, z):
y = 2
z.append(4)
print ('In f(): ', id(y), id(z))
def main():
n = 1
x = [0,1,2,3]
print ('Before in main:', n, x,id(n),id(x))
f(n, x)
print ('After in main:', n, x,id(n),id(x))
main()
Before in main: 1 [0, 1, 2, 3] 94635800628352 139808499830024
In f(): 94635800628384 139808499830024
After in main: 1 [0, 1, 2, 3, 4] 94635800628352 139808499830024
z and x have the same id. Just different tags for the same underlying structure as the article says.
My general understanding is that any object variable (such as a list or a dict, among others) can be modified through its functions. What I believe you are not able to do is reassign the parameter - i.e., assign it by reference within a callable function.
That is consistent with many other languages.
Run the following short script to see how it works:
def func1(x, l1):
x = 5
l1.append("nonsense")
y = 10
list1 = ["meaning"]
func1(y, list1)
print(y)
print(list1)
It´s because a list is a mutable object. You´re not setting x to the value of [0,1,2,3], you´re defining a label to the object [0,1,2,3].
You should declare your function f() like this:
def f(n, x=None):
if x is None:
x = []
...
n is an int (immutable), and a copy is passed to the function, so in the function you are changing the copy.
X is a list (mutable), and a copy of the pointer is passed o the function so x.append(4) changes the contents of the list. However, you you said x = [0,1,2,3,4] in your function, you would not change the contents of x in main().
Python is copy by value of reference. An object occupies a field in memory, and a reference is associated with that object, but itself occupies a field in memory. And name/value is associated with a reference. In python function, it always copy the value of the reference, so in your code, n is copied to be a new name, when you assign that, it has a new space in caller stack. But for the list, the name also got copied, but it refer to the same memory(since you never assign the list a new value). That is a magic in python!
When you are passing the command n = 2 inside the function, it finds a memory space and label it as 2. But if you call the method append, you are basically refrencing to location x (whatever the value is) and do some operation on that.
Python is a pure pass-by-value language if you think about it the right way. A python variable stores the location of an object in memory. The Python variable does not store the object itself. When you pass a variable to a function, you are passing a copy of the address of the object being pointed to by the variable.
Contrast these two functions
def foo(x):
x[0] = 5
def goo(x):
x = []
Now, when you type into the shell
>>> cow = [3,4,5]
>>> foo(cow)
>>> cow
[5,4,5]
Compare this to goo.
>>> cow = [3,4,5]
>>> goo(cow)
>>> goo
[3,4,5]
In the first case, we pass a copy the address of cow to foo and foo modified the state of the object residing there. The object gets modified.
In the second case you pass a copy of the address of cow to goo. Then goo proceeds to change that copy. Effect: none.
I call this the pink house principle. If you make a copy of your address and tell a
painter to paint the house at that address pink, you will wind up with a pink house.
If you give the painter a copy of your address and tell him to change it to a new address,
the address of your house does not change.
The explanation eliminates a lot of confusion. Python passes the addresses variables store by value.
As jouell said. It's a matter of what points to what and i'd add that it's also a matter of the difference between what = does and what the .append method does.
When you define n and x in main, you tell them to point at 2 objects, namely 1 and [1,2,3]. That is what = does : it tells what your variable should point to.
When you call the function f(n,x), you tell two new local variables nf and xf to point at the same two objects as n and x.
When you use "something"="anything_new", you change what "something" points to. When you use .append, you change the object itself.
Somehow, even though you gave them the same names, n in the main() and the n in f() are not the same entity, they only originally point to the same object (same goes for x actually). A change to what one of them points to won't affect the other. However, if you instead make a change to the object itself, that will affect both variables as they both point to this same, now modified, object.
Lets illustrate the difference between the method .append and the = without defining a new function :
compare
m = [1,2,3]
n = m # this tells n to point at the same object as m does at the moment
m = [1,2,3,4] # writing m = m + [4] would also do the same
print('n = ', n,'m = ',m)
to
m = [1,2,3]
n = m
m.append(4)
print('n = ', n,'m = ',m)
In the first code, it will print n = [1, 2, 3] m = [1, 2, 3, 4], since in the 3rd line, you didnt change the object [1,2,3], but rather you told m to point to a new, different, object (using '='), while n still pointed at the original object.
In the second code, it will print n = [1, 2, 3, 4] m = [1, 2, 3, 4]. This is because here both m and n still point to the same object throughout the code, but you modified the object itself (that m is pointing to) using the .append method... Note that the result of the second code will be the same regardless of wether you write m.append(4) or n.append(4) on the 3rd line.
Once you understand that, the only confusion that remains is really to understand that, as I said, the n and x inside your f() function and the ones in your main() are NOT the same, they only initially point to the same object when you call f().
Please allow me to edit again. These concepts are my experience from learning python by try error and internet, mostly stackoverflow. There are mistakes and there are helps.
Python variables use references, I think reference as relation links from name, memory adress and value.
When we do B = A, we actually create a nickname of A, and now the A has 2 names, A and B. When we call B, we actually are calling the A. we create a ink to the value of other variable, instead of create a new same value, this is what we call reference. And this thought would lead to 2 porblems.
when we do
A = [1]
B = A # Now B is an alias of A
A.append(2) # Now the value of A had been changes
print(B)
>>> [1, 2]
# B is still an alias of A
# Which means when we call B, the real name we are calling is A
# When we do something to B, the real name of our object is A
B.append(3)
print(A)
>>> [1, 2, 3]
This is what happens when we pass arguments to functions
def test(B):
print('My name is B')
print(f'My value is {B}')
print(' I am just a nickname, My real name is A')
B.append(2)
A = [1]
test(A)
print(A)
>>> [1, 2]
We pass A as an argument of a function, but the name of this argument in that function is B.
Same one with different names.
So when we do B.append, we are doing A.append
When we pass an argument to a function, we are not passing a variable , we are passing an alias.
And here comes the 2 problems.
the equal sign always creates a new name
A = [1]
B = A
B.append(2)
A = A[0] # Now the A is a brand new name, and has nothing todo with the old A from now on.
B.append(3)
print(A)
>>> 1
# the relation of A and B is removed when we assign the name A to something else
# Now B is a independent variable of hisown.
the Equal sign is a statesment of clear brand new name,
this was the concused part of mine
A = [1, 2, 3]
# No equal sign, we are working on the origial object,
A.append(4)
>>> [1, 2, 3, 4]
# This would create a new A
A = A + [4]
>>> [1, 2, 3, 4]
and the function
def test(B):
B = [1, 2, 3] # B is a new name now, not an alias of A anymore
B.append(4) # so this operation won't effect A
A = [1, 2, 3]
test(A)
print(A)
>>> [1, 2, 3]
# ---------------------------
def test(B):
B.append(4) # B is a nickname of A, we are doing A
A = [1, 2, 3]
test(A)
print(A)
>>> [1, 2, 3, 4]
the first problem is
the left side of and equation is always a brand new name, new variable,
unless the right side is a name, like B = A, this create an alias only
The second problem, there are something would never be changed, we cannot modify the original, can only create a new one.
This is what we call immutable.
When we do A= 123 , we create a dict which contains name, value, and adress.
When we do B = A, we copy the adress and value from A to B, all operation to B effect the same adress of the value of A.
When it comes to string, numbers, and tuple. the pair of value and adress could never be change. When we put a str to some adress, it was locked right away, the result of all modifications would be put into other adress.
A = 'string' would create a protected value and adess to storage the string 'string' . Currently, there is no built-in functions or method cound modify a string with the syntax like list.append, because this code modify the original value of a adress.
the value and adress of a string, a number, or a tuple is protected, locked, immutable.
All we can work on a string is by the syntax of A = B.method , we have to create a new name to storage the new string value.
please extend this discussion if you still get confused.
this discussion help me to figure out mutable / immutable / refetence / argument / variable / name once for all, hopely this could do some help to someone too.
##############################
had modified my answer tons of times and realized i don't have to say anything, python had explained itself already.
a = 'string'
a.replace('t', '_')
print(a)
>>> 'string'
a = a.replace('t', '_')
print(a)
>>> 's_ring'
b = 100
b + 1
print(b)
>>> 100
b = b + 1
print(b)
>>> 101
def test_id(arg):
c = id(arg)
arg = 123
d = id(arg)
return
a = 'test ids'
b = id(a)
test_id(a)
e = id(a)
# b = c = e != d
# this function do change original value
del change_like_mutable(arg):
arg.append(1)
arg.insert(0, 9)
arg.remove(2)
return
test_1 = [1, 2, 3]
change_like_mutable(test_1)
# this function doesn't
def wont_change_like_str(arg):
arg = [1, 2, 3]
return
test_2 = [1, 1, 1]
wont_change_like_str(test_2)
print("Doesn't change like a imutable", test_2)
This devil is not the reference / value / mutable or not / instance, name space or variable / list or str, IT IS THE SYNTAX, EQUAL SIGN.

Networkx: Example for antichains algorithms [duplicate]

I am reading the Python cookbook at the moment and am currently looking at generators. I'm finding it hard to get my head round.
As I come from a Java background, is there a Java equivalent? The book was speaking about 'Producer / Consumer', however when I hear that I think of threading.
What is a generator and why would you use it? Without quoting any books, obviously (unless you can find a decent, simplistic answer direct from a book). Perhaps with examples, if you're feeling generous!
Note: this post assumes Python 3.x syntax.†
A generator is simply a function which returns an object on which you can call next, such that for every call it returns some value, until it raises a StopIteration exception, signaling that all values have been generated. Such an object is called an iterator.
Normal functions return a single value using return, just like in Java. In Python, however, there is an alternative, called yield. Using yield anywhere in a function makes it a generator. Observe this code:
>>> def myGen(n):
... yield n
... yield n + 1
...
>>> g = myGen(6)
>>> next(g)
6
>>> next(g)
7
>>> next(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
As you can see, myGen(n) is a function which yields n and n + 1. Every call to next yields a single value, until all values have been yielded. for loops call next in the background, thus:
>>> for n in myGen(6):
... print(n)
...
6
7
Likewise there are generator expressions, which provide a means to succinctly describe certain common types of generators:
>>> g = (n for n in range(3, 5))
>>> next(g)
3
>>> next(g)
4
>>> next(g)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Note that generator expressions are much like list comprehensions:
>>> lc = [n for n in range(3, 5)]
>>> lc
[3, 4]
Observe that a generator object is generated once, but its code is not run all at once. Only calls to next actually execute (part of) the code. Execution of the code in a generator stops once a yield statement has been reached, upon which it returns a value. The next call to next then causes execution to continue in the state in which the generator was left after the last yield. This is a fundamental difference with regular functions: those always start execution at the "top" and discard their state upon returning a value.
There are more things to be said about this subject. It is e.g. possible to send data back into a generator (reference). But that is something I suggest you do not look into until you understand the basic concept of a generator.
Now you may ask: why use generators? There are a couple of good reasons:
Certain concepts can be described much more succinctly using generators.
Instead of creating a function which returns a list of values, one can write a generator which generates the values on the fly. This means that no list needs to be constructed, meaning that the resulting code is more memory efficient. In this way one can even describe data streams which would simply be too large to fit in memory.
Generators allow for a natural way to describe infinite streams. Consider for example the Fibonacci numbers:
>>> def fib():
... a, b = 0, 1
... while True:
... yield a
... a, b = b, a + b
...
>>> import itertools
>>> list(itertools.islice(fib(), 10))
[0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
This code uses itertools.islice to take a finite number of elements from an infinite stream. You are advised to have a good look at the functions in the itertools module, as they are essential tools for writing advanced generators with great ease.
† About Python <=2.6: in the above examples next is a function which calls the method __next__ on the given object. In Python <=2.6 one uses a slightly different technique, namely o.next() instead of next(o). Python 2.7 has next() call .next so you need not use the following in 2.7:
>>> g = (n for n in range(3, 5))
>>> g.next()
3
A generator is effectively a function that returns (data) before it is finished, but it pauses at that point, and you can resume the function at that point.
>>> def myGenerator():
... yield 'These'
... yield 'words'
... yield 'come'
... yield 'one'
... yield 'at'
... yield 'a'
... yield 'time'
>>> myGeneratorInstance = myGenerator()
>>> next(myGeneratorInstance)
These
>>> next(myGeneratorInstance)
words
and so on. The (or one) benefit of generators is that because they deal with data one piece at a time, you can deal with large amounts of data; with lists, excessive memory requirements could become a problem. Generators, just like lists, are iterable, so they can be used in the same ways:
>>> for word in myGeneratorInstance:
... print word
These
words
come
one
at
a
time
Note that generators provide another way to deal with infinity, for example
>>> from time import gmtime, strftime
>>> def myGen():
... while True:
... yield strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime())
>>> myGeneratorInstance = myGen()
>>> next(myGeneratorInstance)
Thu, 28 Jun 2001 14:17:15 +0000
>>> next(myGeneratorInstance)
Thu, 28 Jun 2001 14:18:02 +0000
The generator encapsulates an infinite loop, but this isn't a problem because you only get each answer every time you ask for it.
First of all, the term generator originally was somewhat ill-defined in Python, leading to lots of confusion. You probably mean iterators and iterables (see here). Then in Python there are also generator functions (which return a generator object), generator objects (which are iterators) and generator expressions (which are evaluated to a generator object).
According to the glossary entry for generator it seems that the official terminology is now that generator is short for "generator function". In the past the documentation defined the terms inconsistently, but fortunately this has been fixed.
It might still be a good idea to be precise and avoid the term "generator" without further specification.
Generators could be thought of as shorthand for creating an iterator. They behave like a Java Iterator. Example:
>>> g = (x for x in range(10))
>>> g
<generator object <genexpr> at 0x7fac1c1e6aa0>
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> list(g) # force iterating the rest
[3, 4, 5, 6, 7, 8, 9]
>>> g.next() # iterator is at the end; calling next again will throw
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Hope this helps/is what you are looking for.
Update:
As many other answers are showing, there are different ways to create a generator. You can use the parentheses syntax as in my example above, or you can use yield. Another interesting feature is that generators can be "infinite" -- iterators that don't stop:
>>> def infinite_gen():
... n = 0
... while True:
... yield n
... n = n + 1
...
>>> g = infinite_gen()
>>> g.next()
0
>>> g.next()
1
>>> g.next()
2
>>> g.next()
3
...
There is no Java equivalent.
Here is a bit of a contrived example:
#! /usr/bin/python
def mygen(n):
x = 0
while x < n:
x = x + 1
if x % 3 == 0:
yield x
for a in mygen(100):
print a
There is a loop in the generator that runs from 0 to n, and if the loop variable is a multiple of 3, it yields the variable.
During each iteration of the for loop the generator is executed. If it is the first time the generator executes, it starts at the beginning, otherwise it continues from the previous time it yielded.
I like to describe generators, to those with a decent background in programming languages and computing, in terms of stack frames.
In many languages, there is a stack on top of which is the current stack "frame". The stack frame includes space allocated for variables local to the function including the arguments passed in to that function.
When you call a function, the current point of execution (the "program counter" or equivalent) is pushed onto the stack, and a new stack frame is created. Execution then transfers to the beginning of the function being called.
With regular functions, at some point the function returns a value, and the stack is "popped". The function's stack frame is discarded and execution resumes at the previous location.
When a function is a generator, it can return a value without the stack frame being discarded, using the yield statement. The values of local variables and the program counter within the function are preserved. This allows the generator to be resumed at a later time, with execution continuing from the yield statement, and it can execute more code and return another value.
Before Python 2.5 this was all generators did. Python 2.5 added the ability to pass values back in to the generator as well. In doing so, the passed-in value is available as an expression resulting from the yield statement which had temporarily returned control (and a value) from the generator.
The key advantage to generators is that the "state" of the function is preserved, unlike with regular functions where each time the stack frame is discarded, you lose all that "state". A secondary advantage is that some of the function call overhead (creating and deleting stack frames) is avoided, though this is a usually a minor advantage.
It helps to make a clear distinction between the function foo, and the generator foo(n):
def foo(n):
yield n
yield n+1
foo is a function.
foo(6) is a generator object.
The typical way to use a generator object is in a loop:
for n in foo(6):
print(n)
The loop prints
# 6
# 7
Think of a generator as a resumable function.
yield behaves like return in the sense that values that are yielded get "returned" by the generator. Unlike return, however, the next time the generator gets asked for a value, the generator's function, foo, resumes where it left off -- after the last yield statement -- and continues to run until it hits another yield statement.
Behind the scenes, when you call bar=foo(6) the generator object bar is defined for you to have a next attribute.
You can call it yourself to retrieve values yielded from foo:
next(bar) # Works in Python 2.6 or Python 3.x
bar.next() # Works in Python 2.5+, but is deprecated. Use next() if possible.
When foo ends (and there are no more yielded values), calling next(bar) throws a StopInteration error.
The only thing I can add to Stephan202's answer is a recommendation that you take a look at David Beazley's PyCon '08 presentation "Generator Tricks for Systems Programmers," which is the best single explanation of the how and why of generators that I've seen anywhere. This is the thing that took me from "Python looks kind of fun" to "This is what I've been looking for." It's at http://www.dabeaz.com/generators/.
This post will use Fibonacci numbers as a tool to build up to explaining the usefulness of Python generators.
This post will feature both C++ and Python code.
Fibonacci numbers are defined as the sequence: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, ....
Or in general:
F0 = 0
F1 = 1
Fn = Fn-1 + Fn-2
This can be transferred into a C++ function extremely easily:
size_t Fib(size_t n)
{
//Fib(0) = 0
if(n == 0)
return 0;
//Fib(1) = 1
if(n == 1)
return 1;
//Fib(N) = Fib(N-2) + Fib(N-1)
return Fib(n-2) + Fib(n-1);
}
But if you want to print the first six Fibonacci numbers, you will be recalculating a lot of the values with the above function.
For example: Fib(3) = Fib(2) + Fib(1), but Fib(2) also recalculates Fib(1). The higher the value you want to calculate, the worse off you will be.
So one may be tempted to rewrite the above by keeping track of the state in main.
// Not supported for the first two elements of Fib
size_t GetNextFib(size_t &pp, size_t &p)
{
int result = pp + p;
pp = p;
p = result;
return result;
}
int main(int argc, char *argv[])
{
size_t pp = 0;
size_t p = 1;
std::cout << "0 " << "1 ";
for(size_t i = 0; i <= 4; ++i)
{
size_t fibI = GetNextFib(pp, p);
std::cout << fibI << " ";
}
return 0;
}
But this is very ugly, and it complicates our logic in main. It would be better to not have to worry about state in our main function.
We could return a vector of values and use an iterator to iterate over that set of values, but this requires a lot of memory all at once for a large number of return values.
So back to our old approach, what happens if we wanted to do something else besides print the numbers? We'd have to copy and paste the whole block of code in main and change the output statements to whatever else we wanted to do.
And if you copy and paste code, then you should be shot. You don't want to get shot, do you?
To solve these problems, and to avoid getting shot, we may rewrite this block of code using a callback function. Every time a new Fibonacci number is encountered, we would call the callback function.
void GetFibNumbers(size_t max, void(*FoundNewFibCallback)(size_t))
{
if(max-- == 0) return;
FoundNewFibCallback(0);
if(max-- == 0) return;
FoundNewFibCallback(1);
size_t pp = 0;
size_t p = 1;
for(;;)
{
if(max-- == 0) return;
int result = pp + p;
pp = p;
p = result;
FoundNewFibCallback(result);
}
}
void foundNewFib(size_t fibI)
{
std::cout << fibI << " ";
}
int main(int argc, char *argv[])
{
GetFibNumbers(6, foundNewFib);
return 0;
}
This is clearly an improvement, your logic in main is not as cluttered, and you can do anything you want with the Fibonacci numbers, simply define new callbacks.
But this is still not perfect. What if you wanted to only get the first two Fibonacci numbers, and then do something, then get some more, then do something else?
Well, we could go on like we have been, and we could start adding state again into main, allowing GetFibNumbers to start from an arbitrary point.
But this will further bloat our code, and it already looks too big for a simple task like printing Fibonacci numbers.
We could implement a producer and consumer model via a couple of threads. But this complicates the code even more.
Instead let's talk about generators.
Python has a very nice language feature that solves problems like these called generators.
A generator allows you to execute a function, stop at an arbitrary point, and then continue again where you left off.
Each time returning a value.
Consider the following code that uses a generator:
def fib():
pp, p = 0, 1
while 1:
yield pp
pp, p = p, pp+p
g = fib()
for i in range(6):
g.next()
Which gives us the results:
0
1
1
2
3
5
The yield statement is used in conjuction with Python generators. It saves the state of the function and returns the yeilded value. The next time you call the next() function on the generator, it will continue where the yield left off.
This is by far more clean than the callback function code. We have cleaner code, smaller code, and not to mention much more functional code (Python allows arbitrarily large integers).
Source
I believe the first appearance of iterators and generators were in the Icon programming language, about 20 years ago.
You may enjoy the Icon overview, which lets you wrap your head around them without concentrating on the syntax (since Icon is a language you probably don't know, and Griswold was explaining the benefits of his language to people coming from other languages).
After reading just a few paragraphs there, the utility of generators and iterators might become more apparent.
I put up this piece of code which explains 3 key concepts about generators:
def numbers():
for i in range(10):
yield i
gen = numbers() #this line only returns a generator object, it does not run the code defined inside numbers
for i in gen: #we iterate over the generator and the values are printed
print(i)
#the generator is now empty
for i in gen: #so this for block does not print anything
print(i)
Performance difference:
macOS Big Sur 11.1
MacBook Pro (13-inch, M1, 2020)
Chip Apple M1
Memory 8gb
CASE 1
import random
import psutil # pip install psutil
import os
from datetime import datetime
def memory_usage_psutil():
# return the memory usage in MB
process = psutil.Process(os.getpid())
mem = process.memory_info().rss / float(2 ** 20)
return '{:.2f} MB'.format(mem)
names = ['John', 'Milovan', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']
print('Memory (Before): {}'.format(memory_usage_psutil()))
def people_list(num_people):
result = []
for i in range(num_people):
person = {
'id': i,
'name': random.choice(names),
'major': random.choice(majors)
}
result.append(person)
return result
t1 = datetime.now()
people = people_list(1000000)
t2 = datetime.now()
print('Memory (After) : {}'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2 - t1))
output:
Memory (Before): 50.38 MB
Memory (After) : 1140.41 MB
Took 0:00:01.056423 Seconds
Function which returns a list of 1 million results.
At the bottom I'm printing out the memory usage and the total time.
Base memory usage was around 50.38 megabytes and this memory after is after I created that list of 1 million records so you can see here that it jumped up by nearly 1140.41 megabytes and it took 1,1 seconds.
CASE 2
import random
import psutil # pip install psutil
import os
from datetime import datetime
def memory_usage_psutil():
# return the memory usage in MB
process = psutil.Process(os.getpid())
mem = process.memory_info().rss / float(2 ** 20)
return '{:.2f} MB'.format(mem)
names = ['John', 'Milovan', 'Adam', 'Steve', 'Rick', 'Thomas']
majors = ['Math', 'Engineering', 'CompSci', 'Arts', 'Business']
print('Memory (Before): {}'.format(memory_usage_psutil()))
def people_generator(num_people):
for i in range(num_people):
person = {
'id': i,
'name': random.choice(names),
'major': random.choice(majors)
}
yield person
t1 = datetime.now()
people = people_generator(1000000)
t2 = datetime.now()
print('Memory (After) : {}'.format(memory_usage_psutil()))
print('Took {} Seconds'.format(t2 - t1))
output:
Memory (Before): 50.52 MB
Memory (After) : 50.73 MB
Took 0:00:00.000008 Seconds
After I ran this that the memory is almost exactly the same and that's because the generator hasn't actually done anything yet it's not holding those million values in memory it's waiting for me to grab the next one.
Basically it didn't take any time because as soon as it gets to the first yield statement it stops.
I think that it is generator a little bit more readable and it also gives you big performance boosts not only with execution time but with memory.
As well and you can still use all of the comprehensions and this generator expression here so you don't lose anything in that area. So those are a few reasons why you would use generators and also some of the advantages that come along with that.
Experience with list comprehensions has shown their widespread utility throughout Python. However, many of the use cases do not need to have a full list created in memory. Instead, they only need to iterate over the elements one at a time.
For instance, the following summation code will build a full list of squares in memory, iterate over those values, and, when the reference is no longer needed, delete the list:
sum([x*x for x in range(10)])
Memory is conserved by using a generator expression instead:
sum(x*x for x in range(10))
Similar benefits are conferred on constructors for container objects:
s = Set(word for line in page for word in line.split())
d = dict( (k, func(k)) for k in keylist)
Generator expressions are especially useful with functions like sum(), min(), and max() that reduce an iterable input to a single value:
max(len(line) for line in file if line.strip())
more

Pass a List into Python Function

As a new python programmer, I have two questions about list and really appreciate your advice:
Question 1:
For the following code:
nums1 = [1,2,3,8,0,0,0]
m = 3
nums2 = [2,5,6]
n = 3
def merge(nums1, m, nums2, n):
nums1[:] = sorted(nums1[:m]+nums2)
merge(nums1, m, nums2, n)
nums1
What it does is: pass list nums1 and list nums2 into merge function, and merge them into list nums1 with the first m items in nums1 and n items in nums2, and sort list nums1. So the results are: [1, 2, 2, 3, 5, 6]
So my question is: since list nums1 was defined outside the scope of function merge, how come it has the ability to update nums1? And in the following example:
x = 10
def reassign(x):
x = 2
reassign(x)
x
Variable x was defined outside of function reassign, and the reassign function was not able to update x defined outside of reassign, which is why x returns 10.
Question 2:
In the above code I provided, if I write it like the following:
Note: I just modified nums1[:] into nums1 when assigning sorted(nums1[:m]+nums2)
nums1 = [1,2,3,8,0,0,0]
m = 3
nums2 = [2,5,6]
n = 3
def merge(nums1, m, nums2, n):
nums1 = sorted(nums1[:m]+nums2)
merge(nums1, m, nums2, n)
nums1
nums1 returns [1,2,3,8,0,0,0], so my question is: after adding [:] after nums1, how come the function has the ability to nums1? What does [:] in that example?
To replicate what you are saying, take the following:
var = 10
lst = [1, 2, 3]
def func():
var = 11
lst[:] = [1, 2, 3, 4]
func()
print(var, lst)
The above will output 10 [1, 2, 3, 4]. Now notice the following:
var = 10
lst = [1, 2, 3]
def func():
print(var)
print(lst)
func()
Outputs 10 [1, 2, 3] -- so we know that functions can access global variables, but in most cases cannot modify them. Now let us look at both cases (int and list): The two cases are followed:
The var variable is not being modified due to the difference of reference between local and global scope (while we can access the global scope, we can't modify it). I recommend playing around with printing globals() and locals() for fun. This case can be fixed if we do:
def func():
global var
var = 11
The lst variable is being modified with the [:] notation because as referenced here, the slice assignment [:] utilizes the operator function setitem(). Therefore, technically, lst[:] = is the equivalent of doing:
from operator import setitem
lst = [1, 2, 3]
# Both of these are equivalent.
lst[:] = [1, 2, 3, 4]
setitem(a, slice(0, len(a)), [1, 2, 3, 4])
setitem does not discriminate between local or global scopes.
(Don't use the [:] thing. That's horrible.)
My informal answer
When you say nums1[:] Python is finding the GLOBAL list called nums1.
However, inside of a function, Python pays attention to new variables first. Why?
- it would suck if any variable name you ised outisde of a function, was now restricted from being used as DIFFERENT variable inside the function
H = True # some variable.pretend it means "High"
def euro_height(inches):
H = inches # since H is a nice abbrev for height in inches
return H*2.54 # Centimeters
I don't want my H inside the function to overwrite something I already stored. Therefore, within euro_height, H is considered a different local variable, which only that function can see and use. If I want to use the H from outside of the function, I'd have to first tell Python to access it. Then I can use it.
H = True # some variable.pretend it means "High"
def euro_height(inches):
global H
print(H) # will say true
renamed_var = inches # since H is a nice abbrev for height in inches
return renamed_var*2.54 # Centimeters
If I were to assign H = inches inside the function now, it would overwrite the True value for the global H. So instead, I rename it, because there is already an H I want to use.
The name for all this is called namespaces. Hopefully, you are doing a Python tutorial. You will understand this when they teach functions. I highyl suggest doing a tutorial if you aren't.
For more about this answer related to what happened to you, look at the interactive examples here https://www.programiz.com/python-programming/global-local-nonlocal-variables
Also, never use mylist[:] again. =) It's poor syntax.
It just returns the entire list. so just use the name of the list mylist. By adding the brackets, you forced finding og the global var, and created your problem
If you are confused, take the following.
Firstly placement and change are not the same things
nums1 = [1,2,3,8,0,0,0]
lst = None
m = 3
nums2 = [2,5,6]
n = 3
def merge(nums1, m, nums2, n):
global lst
lst = sorted(nums1[:m]+nums2)
merge(nums1, m, nums2, n)
print(lst)
nums1[:] means your whole list.
If you use nums1[:] instead of nums1 You can change your former nums1[:] list using sorted(nums1[:m]+nums2). When you do this change what both your former and latest list, so you in the nums1 array list changing new with new attached variables place each other. But if you use only nums1 instead of nums1[:] the latest nums1 now refers to a different list from the former.

How to monkey patch python list __setitem__ method

I'd like to monkey-patch Python lists, in particular, replacing the __setitem__ method with custom code. Note that I am not trying to extend, but to overwrite the builtin types. For example:
>>> # Monkey Patch
... # Replace list.__setitem__ with a Noop
...
>>> myList = [1,2,3,4,5]
>>> myList[0] = "Nope"
>>> myList
[1, 2, 3, 4, 5]
Yes, I know that is a downright perverted thing to do to python code. No, my usecase doesn't really make sense. Nonetheless, can it be done?
Possible avenues:
Setting a read only attribute on builtins using ctypes
The forbiddenfruit module allows patching of C builtins, but does not work when trying to override the list methods
This Gist also manages monkey patching of builtin by manipulating the object's dictionary. I've updated it to Python3 here but it still doesn't allow overriding of the methods.
The Pyrthon library overrides the list type in a module to make it immutable by using AST transformation. This could be worth investigating.
Demonstrative example
I actually manage to override the methods themselves, as shown below:
import ctypes
def magic_get_dict(o):
# find address of dict whose offset is stored in the type
dict_addr = id(o) + type(o).__dictoffset__
# retrieve the dict object itself
dict_ptr = ctypes.cast(dict_addr, ctypes.POINTER(ctypes.py_object))
return dict_ptr.contents.value
def magic_flush_mro_cache():
ctypes.PyDLL(None).PyType_Modified(ctypes.cast(id(object), ctypes.py_object))
print(list.__setitem__)
dct = magic_get_dict(list)
dct['__setitem__'] = lambda s, k, v: s
magic_flush_mro_cache()
print(list.__setitem__)
x = [1,2,3,4,5]
print(x.__setitem__)
x.__setitem__(0,10)
x[1] = 20
print(x)
Which outputs the following:
➤ python3 override.py
<slot wrapper '__setitem__' of 'list' objects>
<function <lambda> at 0x10de43f28>
<bound method <lambda> of [1, 2, 3, 4, 5]>
[1, 20, 3, 4, 5]
But as shown in the output, this doesn't seem to affect the normal syntax for setting an item (x[0] = 0)
Alternative: Monkey patching an individual list instance
As a lesser alternative, if I was able to monkey patch an individual list's instance, this could work too. Perhaps by changing the class pointer of the list to a custom class.
A little late to the party, but nonetheless, here's the answer.
As user2357112 hinted in the comment above, modifying the dict won't suffice, since __getitme__ (and other double-underscore names) are mapped to their slot, and won't be updated without calling update_slot (which isn't exported, so that would be a little tricky).
Inspired by the above comment, here's a working example of making __setitem__ a no-op for specific lists:
# assuming v3.8 (tested on Windows x64 and Ubuntu x64)
# definition of PyTypeObject: https://github.com/python/cpython/blob/3.8/Include/cpython/object.h#L177
# no extensive testing was performed and I'll let other decide if this is a good idea or not, but it's possible
import ctypes
Py_TPFLAGS_HEAPTYPE = (1 << 9)
# calculate the offset of the tp_flags field
offset = ctypes.sizeof(ctypes.c_ssize_t) * 1 # PyObject_VAR_HEAD.ob_base.ob_refcnt
offset += ctypes.sizeof(ctypes.c_void_p) * 1 # PyObject_VAR_HEAD.ob_base.ob_type
offset += ctypes.sizeof(ctypes.c_ssize_t) * 1 # PyObject_VAR_HEAD.ob_size
offset += ctypes.sizeof(ctypes.c_void_p) * 1 # tp_name
offset += ctypes.sizeof(ctypes.c_ssize_t) * 2 # tp_basicsize+tp_itemsize
offset += ctypes.sizeof(ctypes.c_void_p) * 1 # tp_dealloc
offset += ctypes.sizeof(ctypes.c_ssize_t) * 1 # tp_vectorcall_offset
offset += ctypes.sizeof(ctypes.c_void_p) * 7 # tp_getattr+tp_setattr+tp_as_async+tp_repr+tp_as_number+tp_as_sequence+tp_as_mapping
offset += ctypes.sizeof(ctypes.c_void_p) * 6 # tp_hash+tp_call+tp_str+tp_getattro+tp_setattro+tp_as_buffer
tp_flags = ctypes.c_ulong.from_address(id(list) + offset)
assert(tp_flags.value == list.__flags__) # should be the same
lst1 = [1,2,3]
lst2 = [1,2,3]
dont_set_me = [lst1] # these lists cannot be set
# define new method
orig = list.__setitem__
def new_setitem(self, *args):
if [_ for _ in dont_set_me if _ is self]: # check for identical object in list
print('Nope')
else:
return orig(self, *args)
tp_flags.value |= Py_TPFLAGS_HEAPTYPE # add flag, to allow type_setattro to continue
list.__setitem__ = new_setitem # set method, this will already call PyType_Modified and update_slot
tp_flags.value &= (~Py_TPFLAGS_HEAPTYPE) # remove flag
print(lst1, lst2) # > [1, 2, 3] [1, 2, 3]
lst1[0],lst2[0]='x','x' # > Nope
print(lst1, lst2) # > [1, 2, 3] ['x', 2, 3]
Edit
See here why it's not supported to begin with. Mainly, as explained by Guido van Rossum:
This is prohibited intentionally to prevent accidental fatal changes to built-in types (fatal to parts of the code that you never though of). Also, it is done to prevent the changes to affect different interpreters residing in the address space, since built-in types (unlike user-defined classes) are shared between all such interpreters.
I also searched for all usages of Py_TPFLAGS_HEAPTYPE in cpython and they all seem to be related to GC or some validations.
So I guess if:
You don't change the types structure (I believe the above doesnt)
You're not using multiple interpreters in the same process
You remove the flag and immediately restore it in a single-threaded state
You don't really do anything that can affect GC when the flag is removed
You'll just be fine <generic disclaimer here>.
Can't be done. If you do force that using CTypes, you will just crash the Python runtime faster than anything else - as many things itnernally just make use of Python data types.

Python 3.4: Function unexpectedly redefines its input [duplicate]

I'm trying to understand Python's approach to variable scope. In this example, why is f() able to alter the value of x, as perceived within main(), but not the value of n?
def f(n, x):
n = 2
x.append(4)
print('In f():', n, x)
def main():
n = 1
x = [0,1,2,3]
print('Before:', n, x)
f(n, x)
print('After: ', n, x)
main()
Output:
Before: 1 [0, 1, 2, 3]
In f(): 2 [0, 1, 2, 3, 4]
After: 1 [0, 1, 2, 3, 4]
See also: How do I pass a variable by reference?
Some answers contain the word "copy" in the context of a function call. I find it confusing.
Python doesn't copy objects you pass during a function call ever.
Function parameters are names. When you call a function, Python binds these parameters to whatever objects you pass (via names in a caller scope).
Objects can be mutable (like lists) or immutable (like integers and strings in Python). A mutable object you can change. You can't change a name, you just can bind it to another object.
Your example is not about scopes or namespaces, it is about naming and binding and mutability of an object in Python.
def f(n, x): # these `n`, `x` have nothing to do with `n` and `x` from main()
n = 2 # put `n` label on `2` balloon
x.append(4) # call `append` method of whatever object `x` is referring to.
print('In f():', n, x)
x = [] # put `x` label on `[]` ballon
# x = [] has no effect on the original list that is passed into the function
Here are nice pictures on the difference between variables in other languages and names in Python.
You've got a number of answers already, and I broadly agree with J.F. Sebastian, but you might find this useful as a shortcut:
Any time you see varname =, you're creating a new name binding within the function's scope. Whatever value varname was bound to before is lost within this scope.
Any time you see varname.foo() you're calling a method on varname. The method may alter varname (e.g. list.append). varname (or, rather, the object that varname names) may exist in more than one scope, and since it's the same object, any changes will be visible in all scopes.
[note that the global keyword creates an exception to the first case]
f doesn't actually alter the value of x (which is always the same reference to an instance of a list). Rather, it alters the contents of this list.
In both cases, a copy of a reference is passed to the function. Inside the function,
n gets assigned a new value. Only the reference inside the function is modified, not the one outside it.
x does not get assigned a new value: neither the reference inside nor outside the function are modified. Instead, x’s value is modified.
Since both the x inside the function and outside it refer to the same value, both see the modification. By contrast, the n inside the function and outside it refer to different values after n was reassigned inside the function.
I will rename variables to reduce confusion. n -> nf or nmain. x -> xf or xmain:
def f(nf, xf):
nf = 2
xf.append(4)
print 'In f():', nf, xf
def main():
nmain = 1
xmain = [0,1,2,3]
print 'Before:', nmain, xmain
f(nmain, xmain)
print 'After: ', nmain, xmain
main()
When you call the function f, the Python runtime makes a copy of xmain and assigns it to xf, and similarly assigns a copy of nmain to nf.
In the case of n, the value that is copied is 1.
In the case of x the value that is copied is not the literal list [0, 1, 2, 3]. It is a reference to that list. xf and xmain are pointing at the same list, so when you modify xf you are also modifying xmain.
If, however, you were to write something like:
xf = ["foo", "bar"]
xf.append(4)
you would find that xmain has not changed. This is because, in the line xf = ["foo", "bar"] you have change xf to point to a new list. Any changes you make to this new list will have no effects on the list that xmain still points to.
Hope that helps. :-)
If the functions are re-written with completely different variables and we call id on them, it then illustrates the point well. I didn't get this at first and read jfs' post with the great explanation, so I tried to understand/convince myself:
def f(y, z):
y = 2
z.append(4)
print ('In f(): ', id(y), id(z))
def main():
n = 1
x = [0,1,2,3]
print ('Before in main:', n, x,id(n),id(x))
f(n, x)
print ('After in main:', n, x,id(n),id(x))
main()
Before in main: 1 [0, 1, 2, 3] 94635800628352 139808499830024
In f(): 94635800628384 139808499830024
After in main: 1 [0, 1, 2, 3, 4] 94635800628352 139808499830024
z and x have the same id. Just different tags for the same underlying structure as the article says.
My general understanding is that any object variable (such as a list or a dict, among others) can be modified through its functions. What I believe you are not able to do is reassign the parameter - i.e., assign it by reference within a callable function.
That is consistent with many other languages.
Run the following short script to see how it works:
def func1(x, l1):
x = 5
l1.append("nonsense")
y = 10
list1 = ["meaning"]
func1(y, list1)
print(y)
print(list1)
It´s because a list is a mutable object. You´re not setting x to the value of [0,1,2,3], you´re defining a label to the object [0,1,2,3].
You should declare your function f() like this:
def f(n, x=None):
if x is None:
x = []
...
n is an int (immutable), and a copy is passed to the function, so in the function you are changing the copy.
X is a list (mutable), and a copy of the pointer is passed o the function so x.append(4) changes the contents of the list. However, you you said x = [0,1,2,3,4] in your function, you would not change the contents of x in main().
Python is copy by value of reference. An object occupies a field in memory, and a reference is associated with that object, but itself occupies a field in memory. And name/value is associated with a reference. In python function, it always copy the value of the reference, so in your code, n is copied to be a new name, when you assign that, it has a new space in caller stack. But for the list, the name also got copied, but it refer to the same memory(since you never assign the list a new value). That is a magic in python!
When you are passing the command n = 2 inside the function, it finds a memory space and label it as 2. But if you call the method append, you are basically refrencing to location x (whatever the value is) and do some operation on that.
Python is a pure pass-by-value language if you think about it the right way. A python variable stores the location of an object in memory. The Python variable does not store the object itself. When you pass a variable to a function, you are passing a copy of the address of the object being pointed to by the variable.
Contrast these two functions
def foo(x):
x[0] = 5
def goo(x):
x = []
Now, when you type into the shell
>>> cow = [3,4,5]
>>> foo(cow)
>>> cow
[5,4,5]
Compare this to goo.
>>> cow = [3,4,5]
>>> goo(cow)
>>> goo
[3,4,5]
In the first case, we pass a copy the address of cow to foo and foo modified the state of the object residing there. The object gets modified.
In the second case you pass a copy of the address of cow to goo. Then goo proceeds to change that copy. Effect: none.
I call this the pink house principle. If you make a copy of your address and tell a
painter to paint the house at that address pink, you will wind up with a pink house.
If you give the painter a copy of your address and tell him to change it to a new address,
the address of your house does not change.
The explanation eliminates a lot of confusion. Python passes the addresses variables store by value.
As jouell said. It's a matter of what points to what and i'd add that it's also a matter of the difference between what = does and what the .append method does.
When you define n and x in main, you tell them to point at 2 objects, namely 1 and [1,2,3]. That is what = does : it tells what your variable should point to.
When you call the function f(n,x), you tell two new local variables nf and xf to point at the same two objects as n and x.
When you use "something"="anything_new", you change what "something" points to. When you use .append, you change the object itself.
Somehow, even though you gave them the same names, n in the main() and the n in f() are not the same entity, they only originally point to the same object (same goes for x actually). A change to what one of them points to won't affect the other. However, if you instead make a change to the object itself, that will affect both variables as they both point to this same, now modified, object.
Lets illustrate the difference between the method .append and the = without defining a new function :
compare
m = [1,2,3]
n = m # this tells n to point at the same object as m does at the moment
m = [1,2,3,4] # writing m = m + [4] would also do the same
print('n = ', n,'m = ',m)
to
m = [1,2,3]
n = m
m.append(4)
print('n = ', n,'m = ',m)
In the first code, it will print n = [1, 2, 3] m = [1, 2, 3, 4], since in the 3rd line, you didnt change the object [1,2,3], but rather you told m to point to a new, different, object (using '='), while n still pointed at the original object.
In the second code, it will print n = [1, 2, 3, 4] m = [1, 2, 3, 4]. This is because here both m and n still point to the same object throughout the code, but you modified the object itself (that m is pointing to) using the .append method... Note that the result of the second code will be the same regardless of wether you write m.append(4) or n.append(4) on the 3rd line.
Once you understand that, the only confusion that remains is really to understand that, as I said, the n and x inside your f() function and the ones in your main() are NOT the same, they only initially point to the same object when you call f().
Please allow me to edit again. These concepts are my experience from learning python by try error and internet, mostly stackoverflow. There are mistakes and there are helps.
Python variables use references, I think reference as relation links from name, memory adress and value.
When we do B = A, we actually create a nickname of A, and now the A has 2 names, A and B. When we call B, we actually are calling the A. we create a ink to the value of other variable, instead of create a new same value, this is what we call reference. And this thought would lead to 2 porblems.
when we do
A = [1]
B = A # Now B is an alias of A
A.append(2) # Now the value of A had been changes
print(B)
>>> [1, 2]
# B is still an alias of A
# Which means when we call B, the real name we are calling is A
# When we do something to B, the real name of our object is A
B.append(3)
print(A)
>>> [1, 2, 3]
This is what happens when we pass arguments to functions
def test(B):
print('My name is B')
print(f'My value is {B}')
print(' I am just a nickname, My real name is A')
B.append(2)
A = [1]
test(A)
print(A)
>>> [1, 2]
We pass A as an argument of a function, but the name of this argument in that function is B.
Same one with different names.
So when we do B.append, we are doing A.append
When we pass an argument to a function, we are not passing a variable , we are passing an alias.
And here comes the 2 problems.
the equal sign always creates a new name
A = [1]
B = A
B.append(2)
A = A[0] # Now the A is a brand new name, and has nothing todo with the old A from now on.
B.append(3)
print(A)
>>> 1
# the relation of A and B is removed when we assign the name A to something else
# Now B is a independent variable of hisown.
the Equal sign is a statesment of clear brand new name,
this was the concused part of mine
A = [1, 2, 3]
# No equal sign, we are working on the origial object,
A.append(4)
>>> [1, 2, 3, 4]
# This would create a new A
A = A + [4]
>>> [1, 2, 3, 4]
and the function
def test(B):
B = [1, 2, 3] # B is a new name now, not an alias of A anymore
B.append(4) # so this operation won't effect A
A = [1, 2, 3]
test(A)
print(A)
>>> [1, 2, 3]
# ---------------------------
def test(B):
B.append(4) # B is a nickname of A, we are doing A
A = [1, 2, 3]
test(A)
print(A)
>>> [1, 2, 3, 4]
the first problem is
the left side of and equation is always a brand new name, new variable,
unless the right side is a name, like B = A, this create an alias only
The second problem, there are something would never be changed, we cannot modify the original, can only create a new one.
This is what we call immutable.
When we do A= 123 , we create a dict which contains name, value, and adress.
When we do B = A, we copy the adress and value from A to B, all operation to B effect the same adress of the value of A.
When it comes to string, numbers, and tuple. the pair of value and adress could never be change. When we put a str to some adress, it was locked right away, the result of all modifications would be put into other adress.
A = 'string' would create a protected value and adess to storage the string 'string' . Currently, there is no built-in functions or method cound modify a string with the syntax like list.append, because this code modify the original value of a adress.
the value and adress of a string, a number, or a tuple is protected, locked, immutable.
All we can work on a string is by the syntax of A = B.method , we have to create a new name to storage the new string value.
please extend this discussion if you still get confused.
this discussion help me to figure out mutable / immutable / refetence / argument / variable / name once for all, hopely this could do some help to someone too.
##############################
had modified my answer tons of times and realized i don't have to say anything, python had explained itself already.
a = 'string'
a.replace('t', '_')
print(a)
>>> 'string'
a = a.replace('t', '_')
print(a)
>>> 's_ring'
b = 100
b + 1
print(b)
>>> 100
b = b + 1
print(b)
>>> 101
def test_id(arg):
c = id(arg)
arg = 123
d = id(arg)
return
a = 'test ids'
b = id(a)
test_id(a)
e = id(a)
# b = c = e != d
# this function do change original value
del change_like_mutable(arg):
arg.append(1)
arg.insert(0, 9)
arg.remove(2)
return
test_1 = [1, 2, 3]
change_like_mutable(test_1)
# this function doesn't
def wont_change_like_str(arg):
arg = [1, 2, 3]
return
test_2 = [1, 1, 1]
wont_change_like_str(test_2)
print("Doesn't change like a imutable", test_2)
This devil is not the reference / value / mutable or not / instance, name space or variable / list or str, IT IS THE SYNTAX, EQUAL SIGN.

Categories

Resources