Python Scoping/Static Misunderstanding - python

I'm really stuck on why the following code block 1 result in output 1 instead of output 2?
Code block 1:
class FruitContainer:
def __init__(self,arr=[]):
self.array = arr
def addTo(self,something):
self.array.append(something)
def __str__(self):
ret = "["
for item in self.array:
ret = "%s%s," % (ret,item)
return "%s]" % ret
arrayOfFruit = ['apple', 'banana', 'pear']
arrayOfFruitContainers = []
while len(arrayOfFruit) > 0:
tempFruit = arrayOfFruit.pop(0)
tempB = FruitContainer()
tempB.addTo(tempFruit)
arrayOfFruitContainers.append(tempB)
for container in arrayOfFruitContainers:
print container
**Output 1 (actual):**
[apple,banana,pear,]
[apple,banana,pear,]
[apple,banana,pear,]
**Output 2 (desired):**
[apple,]
[banana,]
[pear,]
The goal of this code is to iterate through an array and wrap each in a parent object. This is a reduction of my actual code which adds all apples to a bag of apples and so forth. My guess is that, for some reason, it's either using the same object or acting as if the fruit container uses a static array. I have no idea how to fix this.

You should never use a mutable value (like []) for a default argument to a method. The value is computed once, and then used for every invocation. When you use an empty list as a default value, that same list is used every time the method is invoked without the argument, even as the value is modified by previous function calls.
Do this instead:
def __init__(self,arr=None):
self.array = arr or []

Your code has a default argument to initialize the class. The value of the default argument is evaluated once, at compile time, so every instance is initialized with the same list. Change it like so:
def __init__(self, arr=None):
if arr is None:
self.array = []
else:
self.array = arr
I discussed this more fully here: How to define a class in Python

As Ned says, the problem is you are using a list as a default argument. There is more detail here. The solution is to change __init__ function as below:
def __init__(self,arr=None):
if arr is not None:
self.array = arr
else:
self.array = []

A better solution than passing in None — in this particular instance, rather than in general — is to treat the arr parameter to __init__ as an enumerable set of items to pre-initialize the FruitContainer with, rather than an array to use for internal storage:
class FruitContainer:
def __init__(self, arr=()):
self.array = list(arr)
...
This will allow you to pass in other enumerable types to initialize your container, which more advanced Python users will expect to be able to do:
myFruit = ('apple', 'pear') # Pass a tuple
myFruitContainer = FruitContainer(myFruit)
myOtherFruit = file('fruitFile', 'r') # Pass a file
myOtherFruitContainer = FruitContainer(myOtherFruit)
It will also defuse another potential aliasing bug:
myFruit = ['apple', 'pear']
myFruitContainer1 = FruitContainer(myFruit)
myFruitContainer2 = FruitContainer(myFruit)
myFruitContainer1.addTo('banana')
'banana' in str(myFruitContainer2)
With all other implementations on this page, this will return True, because you have accidentally aliased the internal storage of your containers.
Note: This approach is not always the right answer: "if not None" is better in other cases. Just ask yourself: am I passing in a set of objects, or a mutable container? If the class/function I'm passing my objects in to changes the storage I gave it, would that be (a) surprising or (b) desirable? In this case, I would argue that it is (a); thus, the list(...) call is the best solution. If (b), "if not None" would be the right approach.

Related

Python DFS, can't figure out why return list is empty if using append/pop() but works for []+[] in recursive call

can someone plz tell me why this code generate empty "res" variable? If uncomment the commented line and remove below will be working fine.
Codes not work:
class Solution(object):
def dfs(self, nums, res, line):
if not nums:
print(line)
res.append(line)
return
for i, num in enumerate(nums):
line.append(num)
# self.dfs(nums[:i]+nums[i+1:], res, line+[num])
self.dfs(nums[:i]+nums[i+1:], res, line)
line.pop()
def permute(self, nums):
"""
:type nums: List[int]
:rtype: List[List[int]]
"""
res = []
self.dfs(nums, res, [])
print(res)
return res
if __name__ == "__main__":
Solution().permute([1,2])
Works fine if changing to:
for i, num in enumerate(nums):
self.dfs(nums[:i]+nums[i+1:], res, line+[num])
except using append/pop for passing the DFS. Even the "line" variable before appending to "res" is correct.
Does it have something to do with referencing? The only thing I could think of is whatever passed to res got cleaned up. I would really appreciate if someone could show me the link to reference.
Your guessing is correct. Just like Java, list in python is copy-by-reference. Consider the following example:
a = [1, 2, 3]
b = a
a.append(4)
print(b)
You will get
[1, 2, 3, 4]
Thus, one easy way to fix the code is that replacing the res.append(line) in dfs() right before return into:
res.append(line[:])
Basically, line[:] will create a separate list object. You can see more information in this thread.
Python function call semantics:
Are not pass-by-reference. The receiving end does not need to de-reference it.
Are not pass-by-value. There is no copy made of the value.
May best be described as pass by object. The object itself is passed to the function; not a reference, not a copy, the very object itself.
See Frederik Lundh's article about call by object semantics. The Wikipedia article for this names it “call by sharing” and attributes Barbara Liskov with coining that term.
This means that any container object you pass as a parameter will be present in the function as the same object, and modifications there will persist in the container object. The same is true of any mutable object.
So, if you want to modify a container passed in, don't modify it directly. Create a copy (usually by calling the type, e.g. list, to get a new instance), and modify that.
If the function's purpose is to generate a new container, it should create its own instance and return that copy.
So I think the dfs function is poorly designed. It should instead do something like:
def dfs(nums, line):
result = []
for i, num in enumerate(nums):
line.append(num)
result.extend(dfs(nums[:i]+nums[i+1:], line))
line.pop()
else:
print(line)
result.append(line)
return result

Python: Passing mutable(?) object to method

I'm trying to implement a class with a method that calls another method with an object that's part of the class where the lowest method mutates the object. My implementation is a little more complicated, so I'll post just some dummy code so you can see what I'm talking about:
class test:
def __init__(self,list):
self.obj = list
def mult(self, x, n):
x = x*n
def numtimes(self, n):
self.mult(self.obj, n)
Now, if I create an object of this type and run the numtimes method, it won't update self.obj:
m = test([1,2,3,4])
m.numtimes(3)
m.obj #returns [1,2,3,4]
Whereas I'd like it to give me [1,2,3,4,1,2,3,4,1,2,3,4]
Basically, I need to pass self.obj to the mult method and have it mutate self.obj so that when I call m.obj, I'll get [1,2,3,4,1,2,3,4,1,2,3,4] instead of [1,2,3,4].
I feel like this is just a matter of understanding how python passes objects as arguments to methods (like it's making a copy of the object, and instead I need to use a pointer), but maybe not. I'm new to python and could really use some help here.
Thanks in advance!!
Allow me to take on the bigger subject of mutability.
Lists are mutable objects, and support both mutable operations, and immutable operations. That means, operations that change the list in-place, and operations that return a new list. Tuples, for contrast, only are only immutable.
So, to multiply a list, you can choose two methods:
a *= b
This is a mutable operation, that will change 'a' in-place.
a = a * b
This is an immutable operation. It will evaluate 'a*b', create a new list with the correct value, and assign 'a' to that new list.
Here, already, lies a solution to your problem. But, I suggest you read on a bit. When you pass around lists (and other objects) as parameters, you are only passing a new reference, or "pointer" to that same list. So running mutable operations on that list will also change the one that you passed. The result might be a very subtle bug, when you write:
>>> my_list = [1,2,3]
>>> t = test(my_list)
>>> t.numtimes(2)
>>> my_list
[1,2,3,1,2,3] # Not what you intended, probably!
So here's my final recommendation. You can choose to use mutable operations, that's fine. But then create a new copy from your arguments, as such:
def __init__(self,l):
self.obj = list(l)
OR use immutable operations, and reassign them to self:
def mult(self, x, n):
self.x = x*n
Or do both, there's no harm in being extra safe :)
The multiplication x * n creates a new instance and does not alter the existing list. See here:
a = [1]
print (id (a) )
a = a * 2
print (id (a) )
This should work:
class test:
def __init__(self,list):
self.obj = list
def mult(_, x, n):
x *= n
def numtimes(self, n):
self.mult(self.obj, n)

Python: printing all nodes of tree unintentionally stores data

I've created a general tree in python, by creating a Node object. Each node can have either 0, 1, or 2 trees.
I'm trying to create a method to print a list of all the nodes in a tree. The list need not be in order. Here's my simplistic attempt:
def allChildren(self, l = list()):
l.append(self)
for child in self.children:
l = child.allChildren(l)
return l
The first time I run this method, it works correctly. However, for some reason it is storing the previous runs. The second time I run the method, it prints all the nodes twice. Even if I create 2 separate trees, it still remembers the previous runs. E.g: I create 2 trees, a and b. If I run a.allChildren() I receive the correct result. Then I run b.allChildren() and recieve all of a's nodes and all of b's nodes.
You have a mutable value as the default value of your function parameter l. In Python, this means that when you call l.append(self), you are permanently modifying the default parameter.
In order to avoid this problem, set l to a new list every time the function is called, if no list is passed in:
def allChildren(self, l = None):
if l is None:
l = list()
l.append(self)
for child in self.children:
l = child.allChildren(l)
return l
This phenomenon is explained much more thoroughly in this question.
try this:
def allChildren(self, l = None):
if(l==None):
l = list()
l.append(self)
for child in self.children:
l = child.allChildren(l)
return l
And check out this answer for explanation.
If you're writing default parameter like l = list(), it will create list when compiling function, so it will one instance of list for all function calls. To prevent this, use None and create new list inside the function:
def allChildren(self, l = None):
if not l: l = []
l.append(self)
for child in self.children:
l = child.allChildren(l)
return l

How to remove duplicates in set for objects?

I have set of objects:
class Test(object):
def __init__(self):
self.i = random.randint(1,10)
res = set()
for i in range(0,1000):
res.add(Test())
print len(res) = 1000
How to remove duplicates from set of objects ?
Thanks for answers, it's work:
class Test(object):
def __init__(self, i):
self.i = i
# self.i = random.randint(1,10)
# self.j = random.randint(1,20)
def __keys(self):
t = ()
for key in self.__dict__:
t = t + (self.__dict__[key],)
return t
def __eq__(self, other):
return isinstance(other, Test) and self.__keys() == other.__keys()
def __hash__(self):
return hash(self.__keys())
res = set()
res.add(Test(2))
...
res.add(Test(8))
result: [2,8,3,4,5,6,7]
but how to save order ? Sets not support order. Can i use list instead set for example ?
Your objects must be hashable (i.e. must have __eq__() and __hash__() defined) for sets to work properly with them:
class Test(object):
def __init__(self):
self.i = random.randint(1, 10)
def __eq__(self, other):
return self.i == other.i
def __hash__(self):
return self.i
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
If you have several attributes, hash and compare a tuple of them (thanks, delnan):
class Test(object):
def __init__(self):
self.i = random.randint(1, 10)
self.k = random.randint(1, 10)
self.j = random.randint(1, 10)
def __eq__(self, other):
return (self.i, self.k, self.j) == (other.i, other.k, other.j)
def __hash__(self):
return hash((self.i, self.k, self.j))
Your first question is already answered by Pavel Anossov.
But you have another question:
but how to save order ? Sets not support order. Can i use list instead set for example ?
You can use a list, but there are a few downsides:
You get the wrong interface.
You don't get automatic handling of duplicates. You have to explicitly write if foo not in res: res.append(foo). Obviously, you can wrap this up in a function instead of writing it repeatedly, but it's still extra work.
It's going to be a lot less efficient if the collection can get large. Basically, adding a new element, checking whether an element already exists, etc. are all going to be O(N) instead of O(1).
What you want is something that works like an ordered set. Or, equivalently, like a list that doesn't allow duplicates.
If you do all your adds first, and then all your lookups, and you don't need lookups to be fast, you can get around this by first building a list, then using unique_everseen from the itertools recipes to remove duplicates.
Or you could just keep a set and a list or elements by order (or a list plus a set of elements seen so far). But that can get a bit complicated, so you might want to wrap it up.
Ideally, you want to wrap it up in a type that has exactly the same API as set. Something like an OrderedSet akin to collections.OrderedDict.
Fortunately, if you scroll to the bottom of that docs page, you'll see that exactly what you want already exists; there's a link to an OrderedSet recipe at ActiveState.
So, copy it, paste it into your code, then just change res = set() to res = OrderedSet(), and you're done.
I think you can easily do what you want with a list as you asked in your first post since you defined the eq operator :
l = []
if Test(0) not in l :
l.append(Test(0))
My 2 cts ...
Pavel Anossov's answer is great for allowing your class to be used in a set with the semantics you want. However, if you want to preserve the order of your items, you'll need a bit more. Here's a function that de-duplicates a list, as long as the list items are hashable:
def dedupe(lst):
seen = set()
results = []
for item in lst:
if item not in seen:
seen.add(item)
results.append(item)
return results
A slightly more idiomatic version would be a generator, rather than a function that returns a list. This gets rid of the results variable, using yield rather than appending the unique values to it. I've also renamed the lst parameter to iterable, since it will work just as well on any iterable object (such as another generator).
def dedupe(iterable):
seen = set()
for item in iterable:
if item not in seen:
seen.add(item)
yield item

Python set with the ability to pop a random element

I am in need of a Python (2.7) object that functions like a set (fast insertion, deletion, and membership checking) but has the ability to return a random value. Previous questions asked on stackoverflow have answers that are things like:
import random
random.sample(mySet, 1)
But this is quite slow for large sets (it runs in O(n) time).
Other solutions aren't random enough (they depend on the internal representation of python sets, which produces some results which are very non-random):
for e in mySet:
break
# e is now an element from mySet
I coded my own rudimentary class which has constant time lookup, deletion, and random values.
class randomSet:
def __init__(self):
self.dict = {}
self.list = []
def add(self, item):
if item not in self.dict:
self.dict[item] = len(self.list)
self.list.append(item)
def addIterable(self, item):
for a in item:
self.add(a)
def delete(self, item):
if item in self.dict:
index = self.dict[item]
if index == len(self.list)-1:
del self.dict[self.list[index]]
del self.list[index]
else:
self.list[index] = self.list.pop()
self.dict[self.list[index]] = index
del self.dict[item]
def getRandom(self):
if self.list:
return self.list[random.randomint(0,len(self.list)-1)]
def popRandom(self):
if self.list:
index = random.randint(0,len(self.list)-1)
if index == len(self.list)-1:
del self.dict[self.list[index]]
return self.list.pop()
returnValue = self.list[index]
self.list[index] = self.list.pop()
self.dict[self.list[index]] = index
del self.dict[returnValue]
return returnValue
Are there any better implementations for this, or any big improvements to be made to this code?
I think the best way to do this would be to use the MutableSet abstract base class in collections. Inherit from MutableSet, and then define add, discard, __len__, __iter__, and __contains__; also rewrite __init__ to optionally accept a sequence, just like the set constructor does. MutableSet provides built-in definitions of all other set methods based on those methods. That way you get the full set interface cheaply. (And if you do this, addIterable is defined for you, under the name extend.)
discard in the standard set interface appears to be what you have called delete here. So rename delete to discard. Also, instead of having a separate popRandom method, you could just define popRandom like so:
def popRandom(self):
item = self.getRandom()
self.discard(item)
return item
That way you don't have to maintain two separate item removal methods.
Finally, in your item removal method (delete now, discard according to the standard set interface), you don't need an if statement. Instead of testing whether index == len(self.list) - 1, simply swap the final item in the list with the item at the index of the list to be popped, and make the necessary change to the reverse-indexing dictionary. Then pop the last item from the list and remove it from the dictionary. This works whether index == len(self.list) - 1 or not:
def discard(self, item):
if item in self.dict:
index = self.dict[item]
self.list[index], self.list[-1] = self.list[-1], self.list[index]
self.dict[self.list[index]] = index
del self.list[-1] # or in one line:
del self.dict[item] # del self.dict[self.list.pop()]
One approach you could take is to derive a new class from set which salts itself with random objects of a type derived from int.
You can then use pop to select a random element, and if it is not of the salt type, reinsert and return it, but if it is of the salt type, insert a new, randomly-generated salt object (and pop to select a new object).
This will tend to alter the order in which objects are selected. On average, the number of attempts will depend on the proportion of salting elements, i.e. amortised O(k) performance.
Can't we implement a new class inheriting from set with some (hackish) modifications that enable us to retrieve a random element from the list with O(1) lookup time? Btw, on Python 2.x you should inherit from object, i.e. use class randomSet(object). Also PEP8 is something to consider for you :-)
Edit:
For getting some ideas of what hackish solutions might be capable of, this thread is worth reading:
http://python.6.n6.nabble.com/Get-item-from-set-td1530758.html
Here's a solution from scratch, which adds and pops in constant time. I also included some extra set functions for demonstrative purposes.
from random import randint
class RandomSet(object):
"""
Implements a set in which elements can be
added and drawn uniformly and randomly in
constant time.
"""
def __init__(self, seq=None):
self.dict = {}
self.list = []
if seq is not None:
for x in seq:
self.add(x)
def add(self, x):
if x not in self.dict:
self.dict[x] = len(self.list)
self.list.append(x)
def pop(self, x=None):
if x is None:
i = randint(0,len(self.list)-1)
x = self.list[i]
else:
i = self.dict[x]
self.list[i] = self.list[-1]
self.dict[self.list[-1]] = i
self.list.pop()
self.dict.pop(x)
return x
def __contains__(self, x):
return x in self.dict
def __iter__(self):
return iter(self.list)
def __repr__(self):
return "{" + ", ".join(str(x) for x in self.list) + "}"
def __len__(self):
return len(self.list)
Yes, I'd implement an "ordered set" in much the same way you did - and use a list as an internal data structure.
However, I'd inherit straight from "set" and just keep track of the added items in an
internal list (as you did) - and leave the methods I don't use alone.
Maybe add a "sync" method to update the internal list whenever the set is updated
by set-specific operations, like the *_update methods.
That if using an "ordered dict" does not cover your use cases. (I just found that trying to cast ordered_dict keys to a regular set is not optmized, so if you need set operations on your data that is not an option)
If you don't mind only supporting comparable elements, then you could use blist.sortedset.

Categories

Resources