In a program I have something like this going on:
class MyClass:
    def __init__(self, index, other_attribute):
        self.index = index  # is of type int
        self.other_attribute = other_attribute

    # I know this is being taken out of python 3,
    # but I haven't converted it to using __eq__ etc.
    def __cmp__(self, other):
        if self.index < other.index:
            return -1
        if self.index == other.index:
            return 0
        if self.index > other.index:
            return 1
Here's the problem:
#here are some objects
a = MyClass(1, something)
b = MyClass(1, something_else)
c = MyClass(2, something_more)
ary = [a,c]
if b not in ary:
    ary.append(b)
This will not append b because a and b have equal indices, even though they are different instances. That is, b == a is true, but b is a is false. I would like to test for membership by address, not by equivalence. Is there a way to have the in and not in operators use is instead of ==? Are there other operators/algorithms that would solve this problem?
If you would like to test membership by address, you could use this any/is combo:
if not any(b is x for x in ary):
    ary.append(b)
If you are stuck on using the in syntax, you could define your own list object and implement a __contains__ method that compares with is rather than ==.
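As a minimal sketch of that second option (Python 3 syntax; IdentityList and Item are hypothetical names, with Item standing in for MyClass and comparing equal on index only), a list subclass can override __contains__ so that in and not in compare by identity:

```python
class IdentityList(list):
    """A list whose `in` / `not in` tests compare by identity (`is`)."""

    def __contains__(self, item):
        # any() stops at the first element that is the very same object.
        return any(item is element for element in self)


class Item:
    """Hypothetical stand-in for MyClass: equality looks only at `index`."""

    def __init__(self, index):
        self.index = index

    def __eq__(self, other):
        return self.index == other.index


a, b, c = Item(1), Item(1), Item(2)
ary = IdentityList([a, c])
if b not in ary:  # b == a, but b is not a, so this branch runs
    ary.append(b)
print(len(ary))  # 3
```

Only the membership test changes here; methods such as index(), count() and remove() still compare with ==, so they would need the same treatment if identity semantics matter there too.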
Related
Python3 removed the cmp parameter which was used for comparison during sorting and replaced it with key.
The cmp_to_key function was introduced to be "primarily used as a transition tool for programs being converted from Python 2".
It is unclear to me how to recreate cmp behavior natively in python3 with key. For example, if I want to sort strings by length, with a tie-breaker by lexicographic order, I would write something like this:
def compare(word1, word2):
    if len(word1) > len(word2):
        return 1
    if len(word2) > len(word1):
        return -1
    if word1 < word2:
        return 1
    if word2 < word1:
        return -1
    return 0
myStrings.sort(key=functools.cmp_to_key(compare), reverse=True)
How can a comparison across objects be done in a python3-native way, meaning without cmp_to_key?
One thing I can think of is returning a more complex key, for example:
# This would return something like 5.12 for "hello" and 5.20 for "world"
def myKey(word):
    return len(word) + myLexicographicalValueFunction(word)/100
myStrings.sort(key=myKey, reverse=True)
But this doesn't seem to scale well as you add more variables to compare. What is a pythonic way of using multiple fields for key evaluation during sorting in python3?
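For that specific comparison, you could use a tuple as the key. Negating the length already puts longer strings first, so reverse=True is not needed (adding it would flip both criteria):
myStrings.sort(key=lambda s: (-len(s), s))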
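To check the tuple-key idea against the comparator from the question, here is a small sketch. The word list is made up for illustration; compare is copied from the question. It suggests that a key of (-len(s), s) with no reverse flag gives the same order as cmp_to_key(compare) with reverse=True:

```python
import functools


def compare(word1, word2):
    # Same comparator as in the question.
    if len(word1) > len(word2):
        return 1
    if len(word2) > len(word1):
        return -1
    if word1 < word2:
        return 1
    if word2 < word1:
        return -1
    return 0


words = ["pear", "fig", "apple", "kiwi", "plum", "date"]  # sample data

by_cmp = sorted(words, key=functools.cmp_to_key(compare), reverse=True)
by_key = sorted(words, key=lambda s: (-len(s), s))

print(by_cmp)  # ['apple', 'date', 'kiwi', 'pear', 'plum', 'fig']
print(by_cmp == by_key)  # True
```

Tuples compare element by element, so each extra tie-breaker is just another tuple entry; negation handles descending order for numeric fields.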
For object classes, you would probably need to implement a comparison operator (e.g. def __lt__(self,other): method) to let the sort know how to compare them natively.
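One way this might look, as a sketch rather than a definitive recipe: the hypothetical Word class below defines __eq__ and __lt__ on a tuple key and lets functools.total_ordering derive the remaining comparison operators.

```python
from functools import total_ordering


@total_ordering
class Word:
    """Hypothetical wrapper: orders by length, then alphabetically."""

    def __init__(self, text):
        self.text = text

    def _key(self):
        return (len(self.text), self.text)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __repr__(self):
        return f"Word({self.text!r})"


words = [Word("pear"), Word("fig"), Word("apple"), Word("kiwi")]
print(sorted(words))
# [Word('fig'), Word('kiwi'), Word('pear'), Word('apple')]
```

sorted() only needs __lt__, but defining the full set keeps expressions such as a >= b consistent with the sort order.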
If you can't modify the object class or if your comparison is very complex, you could create a wrapper object which implements the __lt__() method using a lambda that you provide:
class ObjectComp:
    def __init__(self,instance,cmp):
        self.instance = instance
        self.cmp = cmp
    def __lt__(self,other):
        return self.cmp(self.instance,other.instance)

def cmpsort(objects,cmp):
    return [obj.instance for obj in sorted(ObjectComp(o,cmp) for o in objects)]
s = cmpsort([1,11,21,12,13],cmp=lambda a,b:a%10>b%10)
print(s)
[13, 12, 1, 11, 21]
If you have a complicated comparison, you are applying a distinctive meaning to the string that in pythonic terms probably means it would benefit from its own type. Python is happy to allow you to subclass generic types so in your case you probably want something like this:
class InterestingWord(str):
    def __lt__(self, x: str) -> bool:
        if len(self) != len(x):
            return len(self) < len(x)  # shorter words sort first
        return super().__lt__(x)  # same length: fall back to str ordering

    def __gt__(self, x: str) -> bool:
        if len(self) != len(x):
            return len(self) > len(x)
        return super().__gt__(x)
Thus:
In [2]: InterestingWord("00") > InterestingWord("1")
Out[2]: True
In [3]: "00" > "1"
Out[3]: False
Why does my equality method produce True when the two objects, point and b, are different objects in memory?
import math


def main():
    point = Point(2, 3)
    print(point == Point(2, 3))
    b = Point(2, 3)
    print(id(point), id(b))


class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def distance_from_origin(self):
        return math.hypot(self.x, self.y)

    def __eq__(self, other):
        return id(self.x) == id(other.x) and id(self.y) == id(other.y)

    def __repr__(self):
        return f"Point({self.x!r}, {self.y!r})"

    def __str__(self):
        return f"{self.x!r}, {self.y!r}"


if __name__ == '__main__':
    main()
The ids of the Point objects are different because they are different objects, and there is no cache/interning mechanism for them (which would be wrong anyway, because they're mutable).
== works because when invoking == on Point, you call __eq__ and it's coded like this:
def __eq__(self, other):
    return id(self.x) == id(other.x) and id(self.y) == id(other.y)
so, it's wrong, but it works most of the time because of interning of integers from -5 to 256 in CPython (further tests show that it works with bigger values but it's not guaranteed). Counter example:
a = 912
b = 2345
point = Point(a, b)
print(point == Point(456*2, b))
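you can get False even though 456*2 == 912 (whether you do depends on the interpreter version and on how it folds and shares constants, because identity of equal integers outside the cached range is not guaranteed either way)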
Rewrite as this so you won't have surprises with big integers:
def __eq__(self, other):
    return self.x == other.x and self.y == other.y
If you remove this __eq__ method, you'll get False, as in that case, Python default == operator on an unknown object only has object identity to perform comparisons.
But the purpose of == is to compare object contents, not ids. Coding an equality method that tests identities can lead to surprises, as shown above.
In Python when people use ==, they expect objects to be equal if values are equal. Identities are an implementation detail, just forget about it.
(Python 2 requires you to define __ne__ as well, as it's not automatically the inverse of __eq__ there, which can lead to strange bugs.)
In a nutshell: don't use is (besides is None idiom) or id unless you're writing a very complex low-level program with caching and weird stuff or when debugging your program.
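To make the difference concrete, here is a short sketch of a value-based __eq__ (Python 3). The isinstance check and the use of int("912") to build an integer at run time are additions for illustration, not part of the original code:

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __eq__(self, other):
        if not isinstance(other, Point):
            return NotImplemented  # let Python try the other operand
        # Compare values, not identities.
        return (self.x, self.y) == (other.x, other.y)


big = int("912")  # created at run time, outside the small-int cache range
p = Point(big, 3)
q = Point(912, 3)

print(p == q)  # True: same coordinates
print(p is q)  # False: two distinct objects
print(p != q)  # False: Python 3 derives != from __eq__
```

Note that defining __eq__ without __hash__ makes instances unhashable in Python 3, which is usually what you want for a mutable class like this one.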
CPython caches small integers (in the range [-5, 256]), so id(self.x) == id(other.x) and id(self.y) == id(other.y) is True, since self.x and other.x are the same objects in memory. Find a different way of comparing those two objects or get rid of your custom __eq__ and use the default way (Python will return False for point == Point(2, 3) in that case).
See this answer for more on the issue.
When comparing tuples of objects apparently the __eq__ method of the object is called and then the compare method:
import timeit

setup = """
import random
import string
import operator

random.seed('slartibartfast')
d={}

class A(object):
    eq_calls = 0
    cmp_calls = 0

    def __init__(self):
        self.s = ''.join(random.choice(string.ascii_uppercase)
                         for _ in range(16))

    def __hash__(self): return hash(self.s)

    def __eq__(self, other):
        self.__class__.eq_calls += 1
        return self.s == other.s

    def __ne__(self, other): return self.s != other.s

    def __cmp__(self, other):
        self.__class__.cmp_calls += 1
        return cmp(self.s ,other.s)

for i in range(1000): d[A()] = 0"""

print min(timeit.Timer("""
for k,v in sorted(d.iteritems()): pass
print A.eq_calls
print A.cmp_calls""", setup=setup).repeat(1, 1))

print min(timeit.Timer("""
for k,v in sorted(d.iteritems(),key=operator.itemgetter(0)): pass
print A.eq_calls
print A.cmp_calls""", setup=setup).repeat(1, 1))
Prints:
8605
8605
0.0172435735131
0
8605
0.0103719966418
So in the second case, where we compare the keys (that is, the A instances) directly, __eq__ is not called, while in the first case the first elements of the tuples are apparently compared via __eq__ and then via __cmp__. But why are they not compared directly via __cmp__? What I really don't quite get is the default sorted behavior in the absence of a cmp or key parameter.
It is just how tuple comparison is implemented (see tuplerichcompare):
it searches for the first index where the items differ and then compares on that. That's why you see an __eq__ call and then a __cmp__ call.
Moreover, if you do not implement __eq__ for A, you will see that __cmp__ is called twice: once for equality and once for comparison.
For instance,
print min(timeit.Timer("""
l = list()
for i in range(5):
    l.append((A(),A(),A()))
    l[-1][0].s='foo'
    l[-1][1].s='foo2'
for _ in sorted(l): pass
print A.eq_calls
print A.cmp_calls""", setup=setup).repeat(1, 1))
This prints 24 and 8 calls respectively (the exact numbers depend on the random seed, but in this case they will always have a ratio of 3).
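The same pattern can be observed in CPython 3, where __cmp__ no longer exists and __lt__ takes its place. The sketch below uses a hypothetical Tracked class with call counters; it compares two 2-tuples whose first elements are equal, so == is tried on each position until a difference is found, and only then is < called once:

```python
class Tracked:
    """Hypothetical value wrapper that counts comparison calls."""

    eq_calls = 0
    lt_calls = 0

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        Tracked.eq_calls += 1
        return self.value == other.value

    def __lt__(self, other):
        Tracked.lt_calls += 1
        return self.value < other.value


left = (Tracked(1), Tracked(2))
right = (Tracked(1), Tracked(3))

result = left < right
print(result, Tracked.eq_calls, Tracked.lt_calls)  # True 2 1
```

Sorting with a key that extracts the element itself skips the tuple machinery, which is why no __eq__ calls show up in that case.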
Is it possible to have a list be evaluated lazily in Python?
For example
a = 1
list = [a]
print list
#[1]
a = 2
print list
#[1]
If the list was set to evaluate lazily then the final line would be [2]
The concept of "lazy" evaluation normally comes with functional languages -- but in those you could not reassign two different values to the same identifier, so, not even there could your example be reproduced.
The point is not about laziness at all -- it is that using an identifier is guaranteed to be identical to getting a reference to the same value that identifier is referencing, and re-assigning an identifier, a bare name, to a different value, is guaranteed to make the identifier refer to a different value from then on. The reference to the first value (object) is not lost.
Consider a similar example where re-assignment to a bare name is not in play, but rather any other kind of mutation (for a mutable object, of course -- numbers and strings are immutable), including an assignment to something else than a bare name:
>>> a = [1]
>>> list = [a]
>>> print list
[[1]]
>>> a[:] = [2]
>>> print list
[[2]]
Since there is no a = ... that reassigns the bare name a, but rather an a[:] = ... that reassigns a's contents, it's trivially easy to make Python as "lazy" as you wish (and indeed it would take some effort to make it "eager"!-)... if laziness vs eagerness had anything to do with either of these cases (which it doesn't;-).
Just be aware of the perfectly simple semantics of "assigning to a bare name" (vs assigning to anything else, which can be variously tweaked and controlled by using your own types appropriately), and the optical illusion of "lazy vs eager" might hopefully vanish;-)
Came across this post when looking for a genuine lazy list implementation, but it sounded like a fun thing to try and work out.
The following implementation does basically what was originally asked for:
from collections import Sequence

class LazyClosureSequence(Sequence):
    def __init__(self, get_items):
        self._get_items = get_items

    def __getitem__(self, i):
        return self._get_items()[i]

    def __len__(self):
        return len(self._get_items())

    def __repr__(self):
        return repr(self._get_items())
You use it like this:
>>> a = 1
>>> l = LazyClosureSequence(lambda: [a])
>>> print l
[1]
>>> a = 2
>>> print l
[2]
This is obviously horrible.
Python is not really very lazy in general.
You can use generators to emulate lazy data structures (like infinite lists, et cetera), but as far as things like using normal list syntax, et cetera, you're not going to have laziness.
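As a small illustration of the generator approach (Python 3; squares is a made-up example function), an infinite sequence can be described lazily and only the requested prefix is ever computed:

```python
from itertools import count, islice


def squares():
    """Yield 0, 1, 4, 9, ... without ever building the full list."""
    for n in count():
        yield n * n


# Nothing is computed until items are requested; islice takes the first five.
first_five = list(islice(squares(), 5))
print(first_five)  # [0, 1, 4, 9, 16]
```

A generator is consumed as it is iterated and does not support indexing, which is the main way it differs from a real lazy list.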
Here is a read-only lazy list that only needs a predefined length and a cache-update function:
import copy
import operator
from collections.abc import Sequence
from functools import partialmethod
from typing import Any, Dict, Iterator, Union

Value = Any  # placeholder: the element type is not defined in the original

def _cmp_list(a: list, b: list, op, if_eq: bool, if_long_a: bool) -> bool:
    """utility to implement gt|ge|lt|le class operators"""
    if a is b:
        return if_eq
    for ia, ib in zip(a, b):
        if ia == ib:
            continue
        return op(ia, ib)
    la, lb = len(a), len(b)
    if la == lb:
        return if_eq
    if la > lb:
        return if_long_a
    return not if_long_a

class LazyListView(Sequence):
    def __init__(self, length):
        self._range = range(length)
        self._cache: Dict[int, Value] = {}

    def __len__(self) -> int:
        return len(self._range)

    def __getitem__(self, ix: Union[int, slice]) -> Value:
        length = len(self)
        if isinstance(ix, slice):
            clone = copy.copy(self)
            clone._range = self._range[slice(*ix.indices(length))]  # slicing
            return clone
        else:
            if ix < 0:
                ix += len(self)  # negative indices count from the end
            if not (0 <= ix < length):
                raise IndexError(f"list index {ix} out of range [0, {length})")
            if ix not in self._cache:
                ...  # update cache
            return self._cache[ix]

    def __iter__(self) -> Iterator[Value]:
        for i, _row_ix in enumerate(self._range):
            yield self[i]

    __eq__ = _eq_list  # _eq_list is not shown in the original
    __gt__ = partialmethod(_cmp_list, op=operator.gt, if_eq=False, if_long_a=True)
    __ge__ = partialmethod(_cmp_list, op=operator.ge, if_eq=True, if_long_a=True)
    __le__ = partialmethod(_cmp_list, op=operator.le, if_eq=True, if_long_a=False)
    __lt__ = partialmethod(_cmp_list, op=operator.lt, if_eq=False, if_long_a=False)

    def __add__(self, other):
        """BREAKS laziness and returns a plain-list"""
        return list(self) + other

    def __mul__(self, factor):
        """BREAKS laziness and returns a plain-list"""
        return list(self) * factor

    __radd__ = __add__
    __rmul__ = __mul__
Note that this class is discussed also in this SO.
I have a list of objects in Python. I then have another list of objects. I want to go through the first list and see if any items appear in the second list.
I thought I could simply do
for item1 in list1:
    for item2 in list2:
        if item1 == item2:
            print "item %s in both lists" % item1
However this does not seem to work. Although if I do:
if item1.title == item2.title:
it works okay. I have more attributes than this though so don't really want to do 1 big if statement comparing all the attributes if I don't have to.
Can anyone give me help or advice on what I can do to find the objects which appear in both lists?
Thanks
Assuming that your object has only a title attribute which is relevant for equality, you have to implement the __eq__ method as follows:
class YourObject:
    [...]
    def __eq__(self, other):
        return self.title == other.title
Of course if you have more attributes that are relevant for equality, you must include those as well. You might also consider implementing __ne__ and __cmp__ for consistent behaviour.
In case the objects are not the same instance, you need to implement the __eq__ method for python to be able to tell when 2 objects are actually equal.
Of course, most library types, such as strings and lists, already have __eq__ implemented, which may be the reason comparing titles works for you (are they strings?).
For further information see the python documentation.
Here is a random example for __eq__.
Set intersection will do for that.
>>> x=[1,2,3,4]
>>> y=[3,4,5,6]
>>> for i in set(x) & set(y):
...     print "item %d in both lists" %i
...
item 3 in both lists
item 4 in both lists
Finding objects that appear in both lists:
l1 = [1,2,3,4,5]
l2 = [3,4,5]
common = set(l1).intersection(set(l2))
Combine this with the __eq__ implementation on the object as the others suggested.
You need to write an __eq__ function to define how to compare objects for equality. If you want sorting, then you should have a __cmp__ function, and it makes the most sense to implement __eq__ in terms of __cmp__.
def __eq__(self, other):
    return cmp(self, other) == 0
You should probably also implement __hash__, and you definitely should if you plan to put your objects into a set or dictionary. The default __hash__ for objects is based on id(), which effectively makes all objects unique (i.e. uniqueness is not based on object contents).
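Putting __eq__ and __hash__ together, here is a sketch in Python 3 syntax. Article, title and year are hypothetical names chosen for illustration; the point is that both methods are built from the same tuple of attributes, so set intersection finds objects that are equal by value:

```python
class Article:
    """Hypothetical class; `title` and `year` are assumed attribute names."""

    def __init__(self, title, year):
        self.title = title
        self.year = year

    def _key(self):
        # Every attribute that matters for equality goes in this tuple.
        return (self.title, self.year)

    def __eq__(self, other):
        if not isinstance(other, Article):
            return NotImplemented
        return self._key() == other._key()

    def __hash__(self):
        # Equal objects must hash equally, so hash the same tuple.
        return hash(self._key())

    def __repr__(self):
        return f"Article({self.title!r}, {self.year!r})"


list1 = [Article("A", 2001), Article("B", 2002)]
list2 = [Article("B", 2002), Article("C", 2003)]

common = set(list1) & set(list2)
print(common)  # {Article('B', 2002)}
```

The attributes used in _key should not change while an object is stored in a set or used as a dictionary key, otherwise lookups can silently fail.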
I wrote a base class/interface for a class that does this sort of equivalence comparison. You may find it useful:
class Comparable(object):
    def attrs(self):
        raise Exception("Must be implemented in concrete sub-class!")

    def __values(self):
        return (getattr(self, attr) for attr in self.attrs())

    def __hash__(self):
        return reduce(lambda x, y: 37 * x + hash(y), self.__values(), 0)

    def __cmp__(self, other):
        for s, o in zip(self.__values(), other.__values()):
            c = cmp(s, o)
            if c:
                return c
        return 0

    def __eq__(self, other):
        return cmp(self, other) == 0

    def __lt__(self, other):
        return cmp(self, other) < 0

    def __gt__(self, other):
        return cmp(self, other) > 0

if __name__ == '__main__':
    class Foo(Comparable):
        def __init__(self, x, y):
            self.x = x
            self.y = y

        def attrs(self):
            return ('x', 'y')

        def __str__(self):
            return "Foo[%d,%d]" % (self.x, self.y)

    def foo_iter(x):
        for i in range(x):
            for j in range(x):
                yield Foo(i, j)

    for a in foo_iter(4):
        for b in foo_iter(4):
            if a<b: print "%(a)s < %(b)s" % locals()
            if a==b: print "%(a)s == %(b)s" % locals()
            if a>b: print "%(a)s > %(b)s" % locals()
The derived class must implement attrs() that returns a tuple or list of the object's attributes that contribute to its identity (i.e. unchanging attributes that make it what it is). Most importantly, the code correctly handles equivalence where there are multiple attributes, and this is old school code that is often done incorrectly.
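Since cmp() and __cmp__ are gone in Python 3, a rough equivalent of this base class could compare tuples of attribute values and let functools.total_ordering fill in the remaining operators. This is a sketch of the same idea under that assumption, not a drop-in port of the code above:

```python
from functools import total_ordering


@total_ordering
class Comparable:
    def attrs(self):
        raise NotImplementedError("Must be implemented in concrete sub-class!")

    def _values(self):
        # Tuples compare element by element, like the __cmp__ loop above.
        return tuple(getattr(self, name) for name in self.attrs())

    def __hash__(self):
        return hash(self._values())

    def __eq__(self, other):
        if not isinstance(other, Comparable):
            return NotImplemented
        return self._values() == other._values()

    def __lt__(self, other):
        if not isinstance(other, Comparable):
            return NotImplemented
        return self._values() < other._values()


class Foo(Comparable):
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def attrs(self):
        return ("x", "y")


print(Foo(1, 2) < Foo(1, 3))   # True
print(Foo(1, 2) == Foo(1, 2))  # True
print(Foo(2, 0) > Foo(1, 9))   # True
```

__hash__ is defined in the same class as __eq__ on purpose: in Python 3 a class that defines __eq__ without __hash__ becomes unhashable.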
matches = [x for x in listA if x in listB]
Try the following:
list1 = [item1, item2, item3]
list2 = [item3, item4, item5]
for item in list1:
    if item in list2:
        print "item %s in both lists" % item