Keep a structure "alive" in a function over iterations? - python

I want to keep a structure saved over iterations of a function, in order to keep data for later use.
There is a simple way to do it, using a global variable.
The thing is, anyone could access the variable.
So, is there a way to create a variable accessible only by the function, but one that is not erased between runs of the function?
Ex:
mylist = []
def test():
    global mylist
    if mylist:
        mylist.append(1)
    # other stuff, using mylist
This is exactly what I want to do, but mylist could be accessed by anyone else.
(ok I know, the example is completely dumb)
Sorry for the formulation, but I could not come up with something simpler.
EDIT: OK, so many different (and all interesting) solutions. I'll quickly explain the idea of what I want to do :)
Let's imagine I want to check whether a number is prime. This is usually computationally costly.
So I have an is_prime(value) function, returning False or True.
value can be any number, but there is a good chance (in my case) that value takes the same value several times (literally ^^).
In this case, I could use a (not too long) list of primes I have already found to quickly check and save computation.
But my prime_list is of no use in a function that just returns True/False.
So here is my question :).
Hope this clarifies some minds (including mine!).
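For concreteness, here's roughly the shape of what I'm after, sketched with functools.lru_cache (the trial-division test is just a stand-in):
from functools import lru_cache

@lru_cache(maxsize=None)   # every value -> bool result is cached across calls
def is_prime(value):
    if value < 2:
        return False
    # naive trial division up to sqrt(value)
    for d in range(2, int(value ** 0.5) + 1):
        if value % d == 0:
            return False
    return True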

Here is a case where a list as the default value of a parameter can come in handy:
def test(myList=[]):
    myList.append(1)
    # other stuff, using myList
I leave the if myList off, since the list will always exist (it was created when test was defined). (Besides, if myList tests whether the list is non-empty, not whether it is defined.)
myList, as a function parameter, is local to test. When test is defined, it is initialized to a list. The value of the list persists for the life of your program.
Compare to the standard advice of
def test(myList=None):
    if myList is None:
        myList = []
which is necessary when you want a fresh empty list inside test at each call if no list is provided by the call.
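To see the persistence in action, a quick interactive sketch (I've made test return the list so it can be inspected):
>>> def test(myList=[]):
...     myList.append(1)
...     return myList
...
>>> test()
[1]
>>> test()      # the same default list object, still growing
[1, 1]
>>> test([])    # a caller-supplied list is unaffected
[1]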

You can add a "static variable" to the function as follows:
>>> def fun(x):
...     fun.counter += x
...     print(fun.counter)
>>> fun.counter = 0
>>> fun(1)
1
>>> fun(5)
6
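Applied to the question's is_prime case, the same trick might look like this (a sketch; caching only the primes found so far and using naive trial division are my own assumptions, not from the question):
def is_prime(value):
    # fast path: primes we have already found
    if value in is_prime.known_primes:
        return True
    if value < 2:
        return False
    # naive trial division up to sqrt(value)
    for d in range(2, int(value ** 0.5) + 1):
        if value % d == 0:
            return False
    is_prime.known_primes.add(value)   # remember this prime for next time
    return True

is_prime.known_primes = set()          # attach the cache after defining the function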

One solution is to wrap it in another function:
def make_tester():
    mylist = []
    def test():
        if mylist:
            mylist.append(1)
        # other stuff, using mylist
    return test

test = make_tester()
This incurs a small overhead in that you have to first call make_tester to create the function that will be called later. OTOH, it has the possible benefit that you can establish multiple contexts, each with its own instance of mylist:
test_a = make_tester()
test_b = make_tester()

Related

Dictionary comprehension multiple ways

What would be the difference between the following two statements in python?
l = [1,2,3,4]
a = {item:0 for item in l}
b = dict((item,0) for item in l)
a == b
# True
I believe the first is the proper way to initialize a dictionary via comprehension (per PEP 274), yet the second seems to just create a generator expression and then build a dict from it (so maybe it does exactly the same thing as the first approach behind the scenes?). What actually is the difference between the two, and which one should be preferred over the other?
a = {item:0 for item in l}
Directly constructs a dict, no intermediates.
b = dict((item,0) for item in l)
Generates a tuple for each item in the list and feeds that to the dict() constructor.
Without really digging into the guts of the resulting Python byte code, I doubt there's an easy way of finding out how exactly they differ. Performance-wise, they are likely to be very close as well.
The main thing here I would consider is readability and maintainability. The first way only relies on the elements you need, without involving an intermediate data type (tuple) and without directly calling a type, but instead relying on the language itself to hook things up correctly. As a bonus, it's shorter and simpler - I don't see any advantage in using the second option, except maybe for the explicit use of dict, telling others what the expected type is. But if they don't get that from the {} in the first instance, I doubt they're much good anyway...
I figured I'd test the speed:
from timeit import timeit
from random import randint

l = [randint(0, 1000) for _ in range(1000)]

def first():
    return {item: 0 for item in l}

def second():
    return dict((item, 0) for item in l)

print(timeit(first, number=10000))
print(timeit(second, number=10000))
Result:
0.46899440000000003
1.0817516999999999
The comprehension is consistently faster as well, so there seems to be no reason ever to use the second option. If there's anything surprising here, it's actually how poorly optimised the second example is and how badly it performs.

Python adding additional code to inline for loops?

The following is totally bogus code. But let's say you needed to add some extra side-effecting function calls (for debugging, to logs)? How would you put that in?
[ i for i in range(10) ]
Or does one always have to rewrite it as a normal for loop?
result = []
for i in range(10):
    otherStuff()
    result.append(i)
In C, there is a comma operator for such things...
Plainly, don't use side-effects in list comprehensions. It makes your code incredibly unclear to the next person who has to maintain it, even if you understand it perfectly. List comprehensions are a succinct way of creating a list, not a way to call a function n times.
For further reading, see the question Is it Pythonic to use list comprehensions for just side effects?
In other words, you should use an explicit for loop for that.
You need to include a call to your side-effect-having code somewhere in your value expression, and then ignore its return value.
or is one possible choice for this. Just make sure that your side-effect function returns a "falsey" value (False, None, 0, etc.), and put your debug call on the left-hand side of the or.
def debug_func(i):
    print(i, i**3)
    return None

whole_numbers = [debug_func(i) or i for i in range(10)]
print(whole_numbers)
As an alternative, your function could be an identity function, always returning its sole argument:
def debug_func(i):
    print(i, i**3)
    return i

# Production code:
whole_numbers = [i for i in range(10)]
# Debug code:
whole_numbers = [debug_func(i) for i in range(10)]
Here's one option that doesn't require anything about what your function returns:
[(myfunc(), i)[1] for i in range(10)]
You can also do more than one function at a time:
[(myfunc(), myfunc2(), i)[-1] for i in range(10)]

Why is my (local) variable behaving like a global variable?

I have no use for a global variable and never define one explicitly, and yet I seem to have one in my code. Can you help me make it local, please?
def algo(X):  # randomized algorithm
    while len(X) > 2:
        # do a bunch of things to nested list X
        print(X)
        # tracing: output is the same every time, where it shouldn't be
    return len(X[1][1])

def find_min(X):  # iterate algo() multiple times to find the minimum
    m = float('inf')
    for i in some_range:
        new = algo(X)
        m = min(m, new)
    return m

X = [[[..], [...]],
     [[..], [...]],
     [[..], [...]]]
print(find_min(X))
print(X)
# same value as inside the algo() call, even though it shouldn't be affected
X appears to be behaving like a global variable. The randomized algorithm algo() is really performed only once on the first call because with X retaining its changed value, it never makes it inside the while loop. The purpose of iterations in find_min is thus defeated.
I'm new to python and even newer to this forum, so let me know if I need to clarify my question. Thanks.
update Many thanks for all the answers so far. I almost understand it, except I've done something like this before with a happier result. Could you explain why this code below is different, please?
def qsort(X):
    for ...:
        # recursively sort X in place
        count += 1  # count number of operations
    return X, count

X = [ , , , ]
Y, count = qsort(X)
print(Y)  # sorted
print(X)  # original, unsorted
Thank you.
update II To answer my own second question: the difference seems to be the use of a list method (i.e., in-place mutation) in the first code (not shown) and the lack thereof in the second code.
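To illustrate the distinction with a toy sketch (not the original code):
def mutate(X):
    X.append(99)       # list method: changes the caller's list in place

def rebind(X):
    X = X + [99]       # rebinding: X now names a new local list; caller unaffected

a = [1, 2, 3]
mutate(a)
print(a)   # [1, 2, 3, 99]

b = [1, 2, 3]
rebind(b)
print(b)   # [1, 2, 3]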
As others have pointed out already, the problem is that the list is passed as a reference to the function, so the list inside the function body is the very same object as the one you passed to it as an argument. Any mutations your function performs are thus visible from outside.
To solve this, your algo function should operate on a copy of the list that it gets passed.
As you're operating on a nested list, you should use the deepcopy function from the copy module to create a copy of your list that you can freely mutate without affecting anything outside of your function. The built-in list function can also be used to copy lists, but it only creates shallow copies, which isn't what you want for nested lists, because the inner lists would still just be pointers to the same objects.
from copy import deepcopy

def algo(X):
    X = deepcopy(X)
    ...
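A quick check that this isolates the caller (a sketch with my own toy data):
from copy import deepcopy

def algo(X):
    X = deepcopy(X)
    X[0].append('mutated')   # changes only the local copy
    return X

nested = [[1], [2]]
algo(nested)
print(nested)   # [[1], [2]] - the caller's list is untouched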
When you do find_min(X), you are passing the object X (a list in this case) to the function. If that function mutates the list (e.g., by appending to it) then yes, it will affect the original object. Python does not copy objects just because you pass them to a function.
When you pass an object to a python function, the object isn't copied, but rather a pointer to the object is passed.
This makes sense because it greatly speeds up execution - in the case of a long list, there is no need to copy all of its elements.
However, this means that when you modify a passed object (for example, your list X), the modification applies to that object, even after the function returns.
For example:
def foo(x):
    x.extend('a')
    print(x)

l = []
foo(l)
foo(l)
Will print:
['a']
['a', 'a']
Python lists are mutable (i.e., they can be changed), and the call to algo within find_min does change the value of X: the function receives a reference to the same list object, not a copy. See this SO question, for example.

Python class function default variables are class objects? [duplicate]

I was writing some code this afternoon, and stumbled across a bug in my code. I noticed that the default values for one of my newly created objects were carrying over from another object! For example:
class One(object):
    def __init__(self, my_list=[]):
        self.my_list = my_list

one1 = One()
print(one1.my_list)
# [] - empty list, what you'd expect
one1.my_list.append('hi')
print(one1.my_list)
# ['hi'] - list with the new value in it, what you'd expect
one2 = One()
print(one2.my_list)
# ['hi'] - Hey! It saved the variable from the other One!
So I know it can be solved by doing this:
class One(object):
    def __init__(self, my_list=None):
        self.my_list = my_list if my_list is not None else []
What I would like to know is... Why? Why are Python classes structured so that the default values are saved across instances of the class?
This is a known behaviour of the way Python default values work, which is often surprising to the unwary. The empty list object [] is created at the time the function is defined, rather than each time it is called.
To fix it, try:
def __init__(self, my_list=None):
    if my_list is None:
        my_list = []
    self.my_list = my_list
Several others have pointed out that this is an instance of the "mutable default argument" issue in Python. The basic reason is that the default arguments have to exist "outside" the function in order to be passed into it.
But the real root of this as a problem has nothing to do with default arguments. Any time it would be bad if a mutable default value was modified, you really need to ask yourself: would it be bad if an explicitly provided value was modified? Unless someone is extremely familiar with the guts of your class, the following behaviour would also be very surprising (and therefore lead to bugs):
>>> class One(object):
...     def __init__(self, my_list=[]):
...         self.my_list = my_list
...
>>> alist = ['hello']
>>> one1 = One(alist)
>>> alist.append('world')
>>> one2 = One(alist)
>>>
>>> print(one1.my_list)  # Huh? This isn't what I initialised one1 with!
['hello', 'world']
>>> print(one2.my_list)  # At least this one's okay...
['hello', 'world']
>>> del alist[0]
>>> print(one2.my_list)  # What the hell? I just modified a local variable and a class instance somewhere else got changed?
['world']
9 times out of 10, if you discover yourself reaching for the "pattern" of using None as the default value and using if value is None: value = default, you shouldn't be. You should be just not modifying your arguments! Arguments should not be treated as owned by the called code unless it is explicitly documented as taking ownership of them.
In this case (especially because you're initialising a class instance, so the mutable variable is going to live a long time and be used by other methods and potentially other code that retrieves it from the instance) I would do the following:
class One(object):
    def __init__(self, my_list=[]):
        self.my_list = list(my_list)
Now you're initialising the data of your class from a list provided as input, rather than taking ownership of a pre-existing list. There's no danger that two separate instances end up sharing the same list, nor that the list is shared with a variable in the caller which the caller may want to continue using. It also has the nice effect that your callers can provide tuples, generators, strings, sets, dictionaries, home-brewed custom iterable classes, etc, and you know you can still count on self.my_list having an append method, because you made it yourself.
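A quick interactive illustration of the decoupling (my own toy values):
>>> class One(object):
...     def __init__(self, my_list=[]):
...         self.my_list = list(my_list)
...
>>> alist = ['hello']
>>> one1 = One(alist)
>>> alist.append('world')
>>> one1.my_list          # unaffected by the caller's later append
['hello']
>>> One('abc').my_list    # any iterable works
['a', 'b', 'c']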
There's still a potential problem here: if the elements contained in the list are themselves mutable, then the caller and this instance can still accidentally interfere with each other. I find it's not very often a problem in practice in my code (so I don't automatically take a deep copy of everything), but you have to be aware of it.
Another issue is that if my_list can be very large, the copy can be expensive. There you have to make a trade-off. In that case, maybe it is better to just use the passed-in list after all, and use the if my_list is None: my_list = [] pattern to prevent all default instances sharing the one list. But if you do that you need to make it clear, either in documentation or the name of the class, that callers are relinquishing ownership of the lists they use to initialise the instance. Or, if you really want to be constructing a list solely for the purpose of wrapping up in an instance of One, maybe you should figure out how to encapsulate the creation of the list inside the initialisation of One, rather than constructing it first; after all, it's really part of the instance, not an initialising value. Sometimes this isn't flexible enough though.
And sometimes you really honestly do want to have aliasing going on, and have code communicating by mutating values they both have access to. I think very hard before I commit to such a design, however. And it will surprise others (and you when you come back to the code in X months), so again documentation is your friend!
In my opinion, educating new Python programmers about the "mutable default argument" gotcha is actually (slightly) harmful. We should be asking them "Why are you modifying your arguments?" (and then pointing out the way default arguments work in Python). The very fact of a function having a sensible default argument is often a good indicator that it isn't intended as something that receives ownership of a pre-existing value, so it probably shouldn't be modifying the argument whether or not it got the default value.
Basically, python function objects store a tuple of default arguments, which is fine for immutable things like integers, but lists and other mutable objects are often modified in-place, resulting in the behavior you observed.
This is standard behavior of default arguments anywhere in Python, not just in classes.
For more explanation, see Mutable defaults for function/method arguments.
Python functions are objects. Default arguments of a function are attributes of that function. So if the default value of an argument is mutable and it's modified inside your function, the changes are reflected in subsequent calls to that function.
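You can see this directly: in Python 3 the defaults live in the function's __defaults__ attribute. A short interactive sketch:
>>> def f(x=[]):
...     x.append(1)
...     return x
...
>>> f.__defaults__
([],)
>>> f()
[1]
>>> f.__defaults__        # the very same list object, now mutated
([1],)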
Not an answer, but it's worth noting this is also true for class variables defined outside any class functions.
Example:
>>> class one:
...     myList = []
...
>>> one1 = one()
>>> one1.myList
[]
>>> one2 = one()
>>> one2.myList.append("Hello Thar!")
>>> one1.myList
['Hello Thar!']
Note that not only does the value of myList persist, but every instance of myList points to the same list.
I ran into this bug/feature myself and spent something like 3 hours trying to figure out what was going on. It's rather challenging to debug when you are getting valid data, but it comes from previous computations rather than the local ones.
It's made worse since this is not just a default argument. You can't just declare myList in the class definition; it has to be set equal to something, although whatever it is set equal to is evaluated only once.
The solution, at least for me, was to simply create all the instance variables inside __init__, as sketched below.
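For example, a minimal sketch of that fix:
class one:
    def __init__(self):
        self.myList = []          # a fresh list for every instance

one1 = one()
one2 = one()
one2.myList.append("Hello Thar!")
print(one1.myList)   # [] - the instances no longer share a list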

How to maintain lists and dictionaries between function calls in Python?

I have a function. Inside it I'm maintaining a dictionary of values.
I want that dictionary to be maintained between different function calls.
Suppose the dict is:
a = {'a': 1, 'b': 2, 'c': 3}
On the first call, say, I change a['a'] to 100.
The dict becomes a = {'a': 100, 'b': 2, 'c': 3}.
On another call, I change a['b'] to 200.
I want the dict to be a = {'a': 100, 'b': 200, 'c': 3}.
But in my code a['a'] doesn't remain 100; it reverts to its initial value 1.
I need an answer ASAP... I'm already late. Please help me, friends.
You might be talking about a callable object.
class MyFunction(object):
    def __init__(self):
        self.rememberThis = dict()
    def __call__(self, arg1, arg2):
        # do something
        self.rememberThis['a'] = arg1
        return someValue  # placeholder

myFunction = MyFunction()
From then on, use myFunction as a simple function. You can access the rememberThis dictionary using myFunction.rememberThis.
You could use a static variable:
def foo(k, v):
    foo.a[k] = v

foo.a = {'a': 1, 'b': 2, 'c': 3}
foo('a', 100)
foo('b', 200)
print(foo.a)
Rather than forcing globals on the code base (that can be the decision of the caller) I prefer the idea of keeping the state related to an instance of the function. A class is good for this but doesn't communicate well what you are trying to accomplish and can be a bit verbose. Taking advantage of closures is, in my opinion, a lot cleaner.
def function_the_world_sees():
    a = {'a': 1, 'b': 2, 'c': 3}
    def actual_function(arg0, arg1):
        a[arg0] = arg1
        return a
    return actual_function

stateful_function = function_the_world_sees()
stateful_function("b", 100)
stateful_function("b", 200)
The main caution to keep in mind is that when you make assignments in actual_function, they occur within actual_function. This means you can't reassign a to a different object. The workaround I use is to put every variable I plan to reassign either into a single-element list (one per variable) or into a dictionary, as in the sketch below.
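For instance, a toy sketch of that workaround with a rebindable counter (the counter itself is my own example, not from the question):
def make_counter():
    count = [0]                  # wrap the value so the closure can "reassign" it
    def counter():
        count[0] += 1            # mutate the element instead of rebinding the name
        return count[0]
    return counter

tick = make_counter()
print(tick())   # 1
print(tick())   # 2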
If a is being created inside the function, it goes out of scope when the function returns. Simply create it outside the function (and before the function is called). That way the list/dict will not be discarded when the program leaves the function.
a = {'a': 1, 'b': 2, 'c': 3}
# call your function here
This question doesn't have an elegant answer, in my opinion. The options are callable objects, default values, and attribute hacks. Callable objects are the right answer, but they bring in a lot of structure for what would be a single "static" declaration in another language. Default values are a minor change to the code, but it's kludgy and can be confusing to a new python programmer looking at your code. I don't like them because their existence isn't hidden from anyone who might be looking at your API.
I generally go with an attribute hack. My preferred method is:
def myfunct():
    if not hasattr(myfunct, 'state'):
        myfunct.state = list()
    # access myfunct.state in the body however you want
This keeps the declaration of the state in the first line of the function where it belongs, as well as keeping myfunct as a function. The downside is you do the attribute check every time you call the function. This is almost certainly not going to be a bottleneck in most code.
You can 'cheat' using Python's behavior for default arguments. Default arguments are only evaluated once; they get reused for every call of the function.
>>> def testFunction(persistent_dict={'a': 0}):
...     persistent_dict['a'] += 1
...     print(persistent_dict['a'])
...
>>> testFunction()
1
>>> testFunction()
2
This isn't the most elegant solution; if someone calls the function and passes in a parameter it will override the default, which probably isn't what you want.
If you just want a quick and dirty way to get the results, that will work. If you're doing something more complicated it might be better to factor it out into a class like S. Lott mentioned.
EDIT: Renamed the dictionary so it wouldn't hide the builtin dict as per the comment below.
Personally, I like the idea of the global statement. It doesn't introduce a global variable but states that a local identifier actually refers to one in the global namespace.
d = dict()
l = list()

def foo(bar, baz):
    global d
    global l
    l.append((bar, baz))   # append takes a single argument, so pass a tuple
    d[bar] = baz
In Python 3 there is also a nonlocal statement, which makes this cleaner; see the sketch below.
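With nonlocal, a closure can rebind a name in the enclosing scope directly, with no global needed (a sketch with a toy counter):
def make_counter():
    count = 0
    def counter():
        nonlocal count           # rebind the name in the enclosing scope
        count += 1
        return count
    return counter

tick = make_counter()
print(tick())   # 1
print(tick())   # 2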
