I have a list of objects in Python. I then have another list of objects. I want to go through the first list and see if any items appear in the second list.
I thought I could simply do
for item1 in list1:
for item2 in list2:
if item1 == item2:
print "item %s in both lists"
However this does not seem to work. Although if I do:
if item1.title == item2.title:
it works okay. I have more attributes than this though so don't really want to do 1 big if statement comparing all the attributes if I don't have to.
Can anyone give me help or advise on what I can do to find the objects which appear in both lists.
Thanks
Assuming that your object has only a title attribute which is relevant for equality, you have to implement the __eq__ method as follows:
class YourObject:
[...]
def __eq__(self, other):
return self.title == other.title
Of course if you have more attributes that are relevant for equality, you must include those as well. You might also consider implementing __ne__ and __cmp__ for consistent behaviour.
In case the objects are not the same instance, you need to implement the __eq__ method for python to be able to tell when 2 objects are actually equal.
Of course that most library types, such as strings and lists already have __eq__ implemented, which may be the reason comparing titles works for you (are they strings?).
For further information see the python documentation.
Here is a random example for __eq__.
set intersection will do for that.
>>> x=[1,2,3,4]
>>> y=[3,4,5,6]
>>> for i in set(x) & set(y):
... print "item %d in both lists" %i
...
item 3 in both lists
item 4 in both lists
Finding objects who appear in both lists:
l1 = [1,2,3,4,5]
l2 = [3,4,5]
common = set(l1).intersection(set(l2))
Combine this with the __eq__ implementation on the object as the others suggested.
You need to write an __eq__ function to define how to compare objects for equality. If you want sorting, then yo should have a __cmp__ function, and it makes the most sense to implement __eq__ in terms of __cmp__.
def __eq__(self, other):
return cmp(self, other) == 0
You should probably also implement __hash__, and you definitely should if you plan to put your objects into a set or dictionary. The default __hash__ for objects is id(), which effectively makes all objects unique(i.e. uniqueness is not based on object contents).
I wrote a base class/interface for a class that does this sort of equivalence comparison. You may find it useful:
class Comparable(object):
def attrs(self):
raise Exception("Must be implemented in concrete sub-class!")
def __values(self):
return (getattr(self, attr) for attr in self.attrs())
def __hash__(self):
return reduce(lambda x, y: 37 * x + hash(y), self.__values(), 0)
def __cmp__(self, other):
for s, o in zip(self.__values(), other.__values()):
c = cmp(s, o)
if c:
return c
return 0
def __eq__(self, other):
return cmp(self, other) == 0
def __lt__(self, other):
return cmp(self, other) < 0
def __gt__(self, other):
return cmp(self, other) > 0
if __name__ == '__main__':
class Foo(Comparable):
def __init__(self, x, y):
self.x = x
self.y = y
def attrs(self):
return ('x', 'y')
def __str__(self):
return "Foo[%d,%d]" % (self.x, self.y)
def foo_iter(x):
for i in range(x):
for j in range(x):
yield Foo(i, j)
for a in foo_iter(4):
for b in foo_iter(4):
if a<b: print "%(a)s < %(b)s" % locals()
if a==b: print "%(a)s == %(b)s" % locals()
if a>b: print "%(a)s > %(b)s" % locals()
The derived class must implement attrs() that returns a tuple or list of the object's attributes that contribute to its identity (i.e. unchanging attributes that make it what it is). Most importantly, the code correctly handles equivalence where there are multiple attributes, and this is old school code that is often done incorrectly.
matches = [x for x in listA if x in listB]
Try the following:
list1 = [item1, item2, item3]
list2 = [item3, item4, item5]
for item in list1:
if item in list2:
print "item %s in both lists" % item
Related
I am using a hashable object as a key to a dictionary. The objects are hashable and I can store key-value-pairs in the dict, but when I create a copy of the same object (that gives me the same hash), I get a KeyError.
Here is some small example code:
class Object:
def __init__(self, x): self.x = x
def __hash__(self): return hash(self.x)
o1 = Object(1.)
o2 = Object(1.)
hash(o1) == hash(o2) # This is True
data = {}
data[o1] = 2.
data[o2] # Desired: This should output 2.
In my scenario above, how can I achieve that data[o2] also returns 2.?
You need to implement both __hash__ and __eq__:
class Object:
def __init__(self, x): self.x = x
def __hash__(self): return hash(self.x)
def __eq__(self, other): return self.x == other.x if isinstance(other, self.__class__) else NotImplemented
Per Python documentation:
if a class does not define an __eq__() method it should not define a __hash__() operation either
After finding the hash, Python's dictionary compares the keys using __eq__ and realize they're different, that's why you're not getting the correct output.
You can use the __eq__ magic method to implement a equality check on your object.
def __eq__(self, other):
if (isinstance(other, C)):
return self.x == self.x
You can learn more about magic methods from this link.
So as stated before your object need to implement __ eq__ trait (equality ==), If you want to understand why:
Sometimes hash of different object are the same, this is called collision.
Dictionary manages that by testing if the objects are equals. If they are not dictionary has to manage the collision. How they do that Is implementation details and can vary a lot. A dummy implementation would be list of tuple key value.
Under the hood, a dummy implementation may look like that :
dico[key] = [(object1, value), (object2, value)]
I have a list of instances of a certain class. This list contains `duplicates', in the sense that duplicates share the exact same attributes. I want to remove the duplicates from this list.
I can check whether two instances share the same attributes by using
class MyClass:
def __eq__(self, other) :
return self.__dict__ == other.__dict__
I could of course iterate through the whole list of instances and compare them element by element to remove duplicates, but I was wondering if there is a more pythonic way to do this, preferably using the in operator + list comprehension.
sets (no order)
A set cannot contain duplicate elements. list(set(content)) will deduplicate a list. This is not too inefficient and is probably one of the better ways to do it :P You will need to define a __hash__ function for your class though, which must be the same for equal elements and different for unequal elements for this to work. Note that the hash value must obey the aforementioned rule but otherwise it may change between runs without causing issues.
index function (stable order)
You could do lambda l: [l[index] for index in range(len(l)) if index == l.index(l[index])]. This only keeps elements that are the first in the list.
in operator (stable order)
def uniquify(content):
result = []
for element in content:
if element not in result:
result.append(element)
return result
This will keep appending elements to the output list unless they are already in the output list.
A little more on the set approach. You can safely implement a hash by delegating to a tuple's hash - just hash a tuple of all the attributes you want to look at. You will also need to define an __eq__ that behaves properly.
class MyClass:
def __init__(self, a, b, c):
self.a = a
self.b = b
self.c = c
def __eq__(self, other):
return (self.a, self.b, self.c) == (other.a, other.b, other.c)
def __hash__(self):
return hash((self.a, self.b, self.c))
def __repr__(self):
return "MyClass({!r}, {!r}, {!r})".format(self.a, self.b, self.c)
As you're doing so much tuple construction, you could just make your class iterable:
def __iter__(self):
return iter((self.a, self.b, self.c))
This enables you to call tuple on self instead of laboriously doing .a, .b, .c etc.
You can then do something like this:
def unordered_elim(l):
return list(set(l))
If you want to preserve ordering, you can use an OrderedDict instead:
from collections import OrderedDict
def ordered_elim(l):
return list(OrderedDict.fromkeys(l).keys())
This should be faster than using in or index, while still preserving ordering. You can test it something like this:
data = [MyClass("this", "is a", "duplicate"),
MyClass("first", "unique", "datum"),
MyClass("this", "is a", "duplicate"),
MyClass("second", "unique", "datum")]
print(unordered_elim(data))
print(ordered_elim(data))
With this output:
[MyClass('first', 'unique', 'datum'), MyClass('second', 'unique', 'datum'), MyClass('this', 'is a', 'duplicate')]
[MyClass('this', 'is a', 'duplicate'), MyClass('first', 'unique', 'datum'), MyClass('second', 'unique', 'datum')]
NB if any of your attributes aren't hashable, this won't work, and you'll either need to work around it (change a list to a tuple) or use a slow, n ^ 2 approach like in.
I have nested dictionaries that may contain other dictionaries or lists. I need to be able to compare a list (or set, really) of these dictionaries to show that they are equal.
The order of the list is not uniform. Typically, I would turn the list into a set, but it is not possible since there are values that are also dictionaries.
a = {'color': 'red'}
b = {'shape': 'triangle'}
c = {'children': [{'color': 'red'}, {'age': 8},]}
test_a = [a, b, c]
test_b = [b, c, a]
print(test_a == test_b) # False
print(set(test_a) == set(test_b)) # TypeError: unhashable type: 'dict'
Is there a good way to approach this to show that test_a has the same contents as test_b?
You can use a simple loop to check if each of one list is in the other:
def areEqual(a, b):
if len(a) != len(b):
return False
for d in a:
if d not in b:
return False
return True
I suggest writing a function that turns any Python object into something orderable, with its contents, if it has any, in sorted order. If we call it canonicalize, we can compare nested objects with:
canonicalize(test_a) == canonicalize(test_b)
Here's my attempt at writing a canonicalize function:
def canonicalize(x):
if isinstance(x, dict):
x = sorted((canonicalize(k), canonicalize(v)) for k, v in x.items())
elif isinstance(x, collections.abc.Iterable) and not isinstance(x, str):
x = sorted(map(canonicalize, x))
else:
try:
bool(x < x) # test for unorderable types like complex
except TypeError:
x = repr(x) # replace with something orderable
return x
This should work for most Python objects. It won't work for lists of heterogeneous items, containers that contain themselves (which will cause the function to hit the recursion limit), nor float('nan') (which has bizarre comparison behavior, and so may mess up the sorting of any container it's in).
It's possible that this code will do the wrong thing for non-iterable, unorderable objects, if they don't have a repr function that describes all the data that makes up their value (e.g. what is tested by ==). I picked repr as it will work on any kind of object and might get it right (it works for complex, for example). It should also work as desired for classes that have a repr that looks like a constructor call. For classes that have inherited object.__repr__ and so have repr output like <Foo object at 0xXXXXXXXX> it at least won't crash, though the objects will be compared by identity rather than value. I don't think there's any truly universal solution, and you can add some special cases for classes you expect to find in your data if they don't work with repr.
If the elements in both lists are shallow, the idea of sorting them, and then comparing with equality can work. The problem with #Alex's solution is that he is only using "id" - but if instead of id, one uses a function that will sort dictionaries properly, things shuld just work:
def sortkey(element):
if isinstance(element, dict):
element = sorted(element.items())
return repr(element)
sorted(test_a, key=sortkey) == sorted(test_b, key=sotrkey)
(I use an repr to wrap the key because it will cast all elements to string before comparison, which will avoid typerror if different elements are of unorderable types - which would almost certainly happen if you are using Python 3.x)
Just to be clear, if your dictionaries and lists have nested dictionaries themselves, you should use the answer by #m_callens. If your inner lists are also unorderd, you can fix this to work, jsut sorting them inside the key function as well.
In this case they are the same dicts so you can compare ids (docs). Note that if you introduced a new dict whose values were identical it would still be treated differently. I.e. d = {'color': 'red'} would be treated as not equal to a.
sorted(map(id, test_a)) == sorted(map(id, test_b))
As #jsbueno points out, you can do this with the kwarg key.
sorted(test_a, key=id) == sorted(test_b, key=id)
An elegant and relatively fast solution:
class QuasiUnorderedList(list):
def __eq__(self, other):
"""This method isn't as ineffiecient as you think! It runs in O(1 + 2 + 3 + ... + n) time,
possibly better than recursively freezing/checking all the elements."""
for item in self:
for otheritem in other:
if otheritem == item:
break
else:
# no break was reached, item not found.
return False
return True
This runs in O(1 + 2 + 3 + ... + n) flat. While slow for dictionaries of low depth, this is faster for dictionaries of high depth.
Here's a considerably longer snippet which is faster for dictionaries where depth is low and length is high.
class FrozenDict(collections.Mapping, collections.Hashable): # collections.Hashable = portability
"""Adapated from http://stackoverflow.com/a/2704866/1459669"""
def __init__(self, *args, **kwargs):
self._d = dict(*args, **kwargs)
self._hash = None
def __iter__(self):
return iter(self._d)
def __len__(self):
return len(self._d)
def __getitem__(self, key):
return self._d[key]
def __hash__(self):
# It would have been simpler and maybe more obvious to
# use hash(tuple(sorted(self._d.iteritems()))) from this discussion
# so far, but this solution is O(n). I don't know what kind of
# n we are going to run into, but sometimes it's hard to resist the
# urge to optimize when it will gain improved algorithmic performance.
# Now thread safe by CrazyPython
if self._hash is None:
_hash = 0
for pair in self.iteritems():
_hash ^= hash(pair)
self._hash = _hash
return _hash
def freeze(obj):
if type(obj) in (str, int, ...): # other immutable atoms you store in your data structure
return obj
elif issubclass(type(obj), list): # ugly but needed
return set(freeze(item) for item in obj)
elif issubclass(type(obj), dict): # for defaultdict, etc.
return FrozenDict({key: freeze(value) for key, value in obj.items()})
else:
raise NotImplementedError("freeze() doesn't know how to freeze " + type(obj).__name__ + " objects!")
class FreezableList(list, collections.Hashable):
_stored_freeze = None
_hashed_self = None
def __eq__(self, other):
if self._stored_freeze and (self._hashed_self == self):
frozen = self._stored_freeze
else:
frozen = freeze(self)
if frozen is not self._stored_freeze:
self._stored_hash = frozen
return frozen == freeze(other)
def __hash__(self):
if self._stored_freeze and (self._hashed_self == self):
frozen = self._stored_freeze
else:
frozen = freeze(self)
if frozen is not self._stored_freeze:
self._stored_hash = frozen
return hash(frozen)
class UncachedFreezableList(list, collections.Hashable):
def __eq__(self, other):
"""No caching version of __eq__. May be faster.
Don't forget to get rid of the declarations at the top of the class!
Considerably more elegant."""
return freeze(self) == freeze(other)
def __hash__(self):
"""No caching version of __hash__. See the notes in the docstring of __eq__2"""
return hash(freeze(self))
Test all three (QuasiUnorderedList, FreezableList, and UncachedFreezableList) and see which one is faster in your situation. I'll betcha it's faster than the other solutions.
When comparing tuples of objects apparently the __eq__ method of the object is called and then the compare method:
import timeit
setup = """
import random
import string
import operator
random.seed('slartibartfast')
d={}
class A(object):
eq_calls = 0
cmp_calls = 0
def __init__(self):
self.s = ''.join(random.choice(string.ascii_uppercase) for _ in
range(16))
def __hash__(self): return hash(self.s)
def __eq__(self, other):
self.__class__.eq_calls += 1
return self.s == other.s
def __ne__(self, other): return self.s != other.s
def __cmp__(self, other):
self.__class__.cmp_calls += 1
return cmp(self.s ,other.s)
for i in range(1000): d[A()] = 0"""
print min(timeit.Timer("""
for k,v in sorted(d.iteritems()): pass
print A.eq_calls
print A.cmp_calls""", setup=setup).repeat(1, 1))
print min(timeit.Timer("""
for k,v in sorted(d.iteritems(),key=operator.itemgetter(0)): pass
print A.eq_calls
print A.cmp_calls""", setup=setup).repeat(1, 1))
Prints:
8605
8605
0.0172435735131
0
8605
0.0103719966418
So in the second case where we compare the keys (that is the A instances) directly __eq__ is not called, while in the first case apparently the first ellement of the tuple are compared via equal and then via cmp. But why are they not compared directly via cmp ? What I really don't quite get is the default sorted behavior on the absence of a cmp or key parameter.
It is just how tuple comparison is implemented: tuplerichcompare
it searches the first index where items are different and then compare on that. That's why you see an __eq__ and then a __cmp__ call.
Moreover if you do not implement the __eq__ operator for A, you will see that __cmp__ is called twice once for equality and once for comparison.
For instance,
print min(timeit.Timer("""
l =list()
for i in range(5):
l.append((A(),A(),A()))
l[-1][0].s='foo'
l[-1][1].s='foo2'
for _ in sorted(l): pass
print A.eq_calls
print A.cmp_calls""", setup=setup).repeat(1, 1))
prints out 24 and 8 calls respectively (the exact number clearly depends on random seed but in this case they will always have a ratio of 3)
What is the difference between using a special method and just defining a normal class method? I was reading this site which lists a lot of them.
For example it gives a class like this.
class Word(str):
'''Class for words, defining comparison based on word length.'''
def __new__(cls, word):
# Note that we have to use __new__. This is because str is an immutable
# type, so we have to initialize it early (at creation)
if ' ' in word:
print "Value contains spaces. Truncating to first space."
word = word[:word.index(' ')] # Word is now all chars before first space
return str.__new__(cls, word)
def __gt__(self, other):
return len(self) > len(other)
def __lt__(self, other):
return len(self) < len(other)
def __ge__(self, other):
return len(self) >= len(other)
def __le__(self, other):
return len(self) <= len(other)
For each of those special methods why can't I just make a normal method instead, what are they doing different? I think I just need a fundamental explanation that I can't find, thanks.
It is a pythonic way to do this:
word1 = Word('first')
word2 = Word('second')
if word1 > word2:
pass
instead of direct usage of comparator method
NotMagicWord(str):
def is_greater(self, other)
return len(self) > len(other)
word1 = NotMagicWord('first')
word2 = NotMagicWord('second')
if word1.is_greater(word2):
pass
And the same with all other magic method. You define __len__ method to tell python its length using built-in len function, for example. All magic method will be called implicitly while standard operations like binary operators, object calling, comparision and a lot of other. A Guide to Python's Magic Methods is really good, read it and see what behavior you can give to your objects. It similar to operator overloading in C++, if you are familiar with it.
A method like __gt__ is called when you use comparison operators in your code. Writing something like
value1 > value2
Is the equivalent of writing
value1.__gt__(value2)
"Magic methods" are used by Python to implement a lot of its underlying structure.
For example, let's say I have a simple class to represent an (x, y) coordinate pair:
class Point(object):
def __init__(self, x, y):
self.x = x
self.y = y
So, __init__ would be an example of one of these "magic methods" -- it allows me to automatically initialize the class by simply doing Point(3, 2). I could write this without using magic methods by creating my own "init" function, but then I would need to make an explicit method call to initialize my class:
class Point(object):
def init(self, x, y):
self.x = x
self.y = y
return self
p = Point().init(x, y)
Let's take another example -- if I wanted to compare two point variables, I could do:
class Point(object):
def __init__(self, x, y):
self.x = x
self.y = y
def __eq__(self, other):
return self.x == other.x and self.y == other.y
This lets me compare two points by doing p1 == p2. In contrast, if I made this a normal eq method, I would have to be more explicit by doing p1.eq(p2).
Basically, magic methods are Python's way of implementing a lot of its syntactic sugar in a way that allows it to be easily customizable by programmers.
For example, I could construct a class that pretends to be a function by implementing __call__:
class Foobar(object):
def __init__(self, a):
self.a = a
def __call__(self, b):
return a + b
f = Foobar(3)
print f(4) # returns 7
Without the magic method, I would have to manually do f.call(4), which means I can no longer pretend the object is a function.
Special methods are handled specially by the rest of the Python language. For example, if you try to compare two Word instances with <, the __lt__ method of Word will be called to determine the result.
The magic methods are called when you use <, ==, > to compare the objects. functools has a helper called total_ordering that will fill in the missing comparison methods if you just define __eq__ and __gt__.
Because str already has all the comparison operations defined, it's necessary to add them as a mixin if you want to take advantage of total_ordering
from functools import total_ordering
#total_ordering
class OrderByLen(object):
def __eq__(self, other):
return len(self) == len(other)
def __gt__(self, other):
return len(self) > len(other)
class Word(OrderByLen, str):
'''Class for words, defining comparison based on word length.'''
def __new__(cls, word):
# Note that we have to use __new__. This is because str is an immutable
# type, so we have to initialize it early (at creation)
if ' ' in word:
print "Value contains spaces. Truncating to first space."
word = word[:word.index(' ')] # Word is now all chars before first space
return str.__new__(cls, word)
print Word('cat') < Word('dog') # False
print Word('cat') > Word('dog') # False
print Word('cat') == Word('dog') # True
print Word('cat') <= Word('elephant') # True
print Word('cat') >= Word('elephant') # False