Best way to find objects not present in both lists - python

I am working on a module that needs to check whether there are any objects present in one of two lists but not in the other. The implementation is in Python.
Consider the simplified object def:
class Foo(object):
    def __init__(self, attr_one=None, attr_two=None):
        self.attr_one = attr_one
        self.attr_two = attr_two

    def __eq__(self, other):
        return self.attr_one == other.attr_one and self.attr_two == other.attr_two
I have two separate lists, each containing multiple instances of class Foo, as follows:
list1 = [Foo('abc', 2), Foo('bcd', 3), Foo('cde', 4)]
list2 = [Foo('abc', 2), Foo('bcd', 4), Foo('efg', 5)]
I need to figure out the objects that are present in one list and absent in the other, on the basis of attr_one. In this case, the desired output for items present in the first list and missing in the second is:
[Foo('bcd', 3), Foo('cde', 4)]
Similarly, the items present in list 2 but not in list 1
[Foo('bcd', 4), Foo('efg', 5)]
I would like to know if there is a way to match them on the basis of attr_one as well:
List 1 List 2
Foo('bcd', 3) Foo('bcd', 4)
Foo('cde', 4) None
None Foo('efg', 5)

Since you already have an __eq__ method defined, you can use a list comprehension to find the objects that are unique to either of the lists.
print([obj for obj in list1 if obj not in list2])
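The same comprehension works in both directions; a quick sketch using the data from the question (the results shown as comments follow from the __eq__ defined above):
only_in_list1 = [obj for obj in list1 if obj not in list2]
# [Foo('bcd', 3), Foo('cde', 4)]

only_in_list2 = [obj for obj in list2 if obj not in list1]
# [Foo('bcd', 4), Foo('efg', 5)]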

A good way to quickly compare lists to determine which elements are present in one but not the other is to create sets from the original lists and take the difference between the two sets. In order for the list to be made into a set, the objects it contains must be hashable, so you must define a new __hash__() method for your Foo objects:
def __hash__(self):
    return hash((self.attr_one, self.attr_two))
Note that since tuples are hashable, as long as attr_one and attr_two are hashable types, this implementation should be pretty solid.
Now, to determine which elements are present in one list but not the other:
set1 = set(list1)
set2 = set(list2)
missing_from_1 = set2 - set1
missing_from_2 = set1 - set2
To do this on the basis of only one of the attributes, you can create your sets using only the attributes themselves:
set1 = set([i.attr_one for i in list1])
Of course, this means that you'll end up with results that only tell you the attr_one values that are present in one list but not the other, rather than giving you the actual Foo objects. The objects themselves are easy to find, however, once you have the "missing" sets:
missing_Foos = set()
for attr in missing_from_2:
    for i in list1:
        if i.attr_one == attr:
            missing_Foos.add(i)
This can be rather computationally expensive, though, if you have very long lists.
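If the lists are long, one way to avoid the nested loop is to build a one-off mapping from attr_one to the objects first; a rough sketch, not part of the original answer (it assumes attr_one values are unique within list1 and relies on the __hash__ defined above):
by_attr = {obj.attr_one: obj for obj in list1}             # one pass over list1
missing_Foos = {by_attr[attr] for attr in missing_from_2}  # constant-time lookups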
EDIT: using sets is only really useful if you have extremely large lists and therefore need to take advantage of the computational efficiency of set operations. Otherwise, it may be simpler to just use list comprehensions, as suggested in the other answer.

There are two ways I'd do this - either using sets, or with filter:
class Foo(object):
    def __init__(self, attr_one=None, attr_two=None):
        self.attr_one = attr_one
        self.attr_two = attr_two

    def __eq__(self, other):
        return self.attr_one == other.attr_one and self.attr_two == other.attr_two

    def __hash__(self):
        return hash(self.attr_one)

    def __repr__(self):
        return "<Foo {} {}>".format(self.attr_one, self.attr_two)


def main():
    a = Foo('test', 1)
    b = Foo('test', 1)

    list1 = [Foo('abc', 2), Foo('bcd', 3), Foo('cde', 4)]
    list2 = [Foo('abc', 2), Foo('bcd', 4), Foo('efg', 5)]

    # With sets
    list1set = set(list1)
    list2set = set(list2)
    print(list1set.intersection(list2set))
    # Returns {<Foo abc 2>}

    # With filter
    list2attr_one = [l.attr_one for l in list2]
    print(list(filter(lambda x: x.attr_one in list2attr_one, list1)))
    # Returns [<Foo abc 2>, <Foo bcd 3>]
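The block above prints the common objects; to get the objects missing from the other list instead, the same two structures can be flipped (a sketch that continues with the names defined in main() above, not part of the original answer):
    # With sets: objects in list1 with no equal counterpart in list2
    print(list1set - list2set)
    # Returns {<Foo bcd 3>, <Foo cde 4>} (set order may vary)

    # With filter: objects in list1 whose attr_one does not occur in list2
    print(list(filter(lambda x: x.attr_one not in list2attr_one, list1)))
    # Returns [<Foo cde 4>]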

Related

Python: How can 2 dictionaries with a list be compared neglecting the list items order? [duplicate]

a = [1, 2, 3, 1, 2, 3]
b = [3, 2, 1, 3, 2, 1]
a & b should be considered equal, because they have exactly the same elements, only in different order.
The thing is, my actual lists will consist of objects (my class instances), not integers.
O(n): The Counter() method is best (if your objects are hashable):
from collections import Counter

def compare(s, t):
    return Counter(s) == Counter(t)
O(n log n): The sorted() method is next best (if your objects are orderable):
def compare(s, t):
    return sorted(s) == sorted(t)
O(n * n): If the objects are neither hashable nor orderable, you can use equality:
def compare(s, t):
    t = list(t)  # make a mutable copy
    try:
        for elem in s:
            t.remove(elem)
    except ValueError:
        return False
    return not t
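A quick usage sketch with the lists from the question (any of the three compare variants above gives the same verdicts; the Counter variant needs from collections import Counter):
a = [1, 2, 3, 1, 2, 3]
b = [3, 2, 1, 3, 2, 1]
print(compare(a, b))          # True: same elements, different order
print(compare(a, [1, 2, 3]))  # False: lengths and counts differ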
You can sort both:
sorted(a) == sorted(b)
A counting-based comparison could also be more efficient (but it requires the objects to be hashable).
>>> from collections import Counter
>>> a = [1, 2, 3, 1, 2, 3]
>>> b = [3, 2, 1, 3, 2, 1]
>>> print (Counter(a) == Counter(b))
True
If you know the items are always hashable, you can use a Counter() which is O(n)
If you know the items are always sortable, you can use sorted() which is O(n log n)
In the general case you can't rely on being able to sort or hash the elements, so you need a fallback like this, which is unfortunately O(n^2):
len(a)==len(b) and all(a.count(i)==b.count(i) for i in a)
If you have to do this in tests:
https://docs.python.org/3.5/library/unittest.html#unittest.TestCase.assertCountEqual
assertCountEqual(first, second, msg=None)
Test that sequence first contains the same elements as second, regardless of their order. When they don’t, an error message listing the differences between the sequences will be generated.
Duplicate elements are not ignored when comparing first and second. It verifies whether each element has the same count in both sequences. Equivalent to: assertEqual(Counter(list(first)), Counter(list(second))) but works with sequences of unhashable objects as well.
New in version 3.2.
or in 2.7:
https://docs.python.org/2.7/library/unittest.html#unittest.TestCase.assertItemsEqual
Outside of tests I would recommend the Counter method.
The best way to do this is by sorting the lists and comparing them. (Using Counter won't work with objects that aren't hashable.) This is straightforward for integers:
sorted(a) == sorted(b)
It gets a little trickier with arbitrary objects. If you care about object identity, i.e., whether the same objects are in both lists, you can use the id() function as the sort key.
sorted(a, key=id) == sorted(b, key=id)
(In Python 2.x you don't actually need the key= parameter, because you can compare any object to any object. The ordering is arbitrary but stable, so it works fine for this purpose; it doesn't matter what order the objects are in, only that the ordering is the same for both lists. In Python 3, though, comparing objects of different types is disallowed in many circumstances -- for example, you can't compare strings to integers -- so if you will have objects of various types, best to explicitly use the object's ID.)
If you want to compare the objects in the list by value, on the other hand, first you need to define what "value" means for the objects. Then you will need some way to provide that as a key (and for Python 3, as a consistent type). One potential way that would work for a lot of arbitrary objects is to sort by their repr(). Of course, this could waste a lot of extra time and memory building repr() strings for large lists and so on.
sorted(a, key=repr) == sorted(b, key=repr)
If the objects are all your own types, you can define __lt__() on them so that the object knows how to compare itself to others. Then you can just sort them and not worry about the key= parameter. Of course you could also define __hash__() and use Counter, which will be faster.
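A minimal sketch of that last suggestion, using a hypothetical Thing class whose value drives ordering, equality, and hashing:
from collections import Counter

class Thing:
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == other.value

    def __lt__(self, other):
        return self.value < other.value

    def __hash__(self):
        return hash(self.value)

a = [Thing(1), Thing(2), Thing(2)]
b = [Thing(2), Thing(2), Thing(1)]
print(sorted(a) == sorted(b))    # True: __lt__ and __eq__ make sorting work
print(Counter(a) == Counter(b))  # True: __hash__ and __eq__ make counting work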
If the comparison is to be performed in a testing context, use assertCountEqual(a, b) (py>=3.2) and assertItemsEqual(a, b) (2.7<=py<3.2).
Works on sequences of unhashable objects too.
If the list contains items that are not hashable (such as a list of objects) you might be able to use the Counter Class and the id() function such as:
from collections import Counter
...
if Counter(map(id, a)) == Counter(map(id, b)):
    print("Lists a and b contain the same objects")
Let a, b be lists:
def ass_equal(a, b):
    try:
        for x in b:            # try to remove all the elements of b from a
            a.pop(a.index(x))  # a missing element raises ValueError
        return len(a) == 0     # if a is empty, b has removed them all
    except ValueError:
        return False           # b failed to remove some items from a
No need to make them hashable or sort them.
I hope the below piece of code works in your case:
if len(a) == len(b) and all(i in a for i in b):
    print('True')
else:
    print('False')
This will ensure that all the elements in both lists a and b are the same, regardless of order (note that it checks membership only, so duplicate counts are not compared).
For better understanding, refer to my answer in this question
You can write your own function to compare the lists.
Let's get two lists.
list_1=['John', 'Doe']
list_2=['Doe','Joe']
First, we define a helper that counts the list items into a dictionary.
def count_list(list_items):
    empty_dict = {}
    for list_item in list_items:
        list_item = list_item.strip()
        if list_item not in empty_dict:
            empty_dict[list_item] = 1
        else:
            empty_dict[list_item] += 1
    return empty_dict
After that, we'll compare both lists by using the following function.
def compare_list(list_1, list_2):
    if count_list(list_1) == count_list(list_2):
        return True
    return False

compare_list(list_1, list_2)
from collections import defaultdict

def _list_eq(a: list, b: list) -> bool:
    if len(a) != len(b):
        return False
    b_set = set(b)
    a_map = defaultdict(lambda: 0)
    b_map = defaultdict(lambda: 0)
    for item1, item2 in zip(a, b):
        if item1 not in b_set:
            return False
        a_map[item1] += 1
        b_map[item2] += 1
    return a_map == b_map
Sorting can be quite slow if the data is highly unordered (timsort is extra good when the items have some degree of ordering). Sorting both also requires fully iterating through both lists.
Rather than mutating a list, just allocate a set and do a left-to-right membership check, keeping a count of how many of each item exist along the way:
If the two lists are not the same length you can short circuit and return False immediately.
If you hit any item in list a that isn't in list b you can return False
If you get through all items then you can compare the values of a_map and b_map to find out if they match.
This allows you to short-circuit in many cases long before you've iterated both lists.
plug in this:
import collections

def lists_equal(l1: list, l2: list) -> bool:
    """
    Check that two lists contain the same elements, ignoring order.

    ref:
    - https://stackoverflow.com/questions/9623114/check-if-two-unordered-lists-are-equal
    - https://stackoverflow.com/questions/7828867/how-to-efficiently-compare-two-unordered-lists-not-sets
    """
    compare = lambda x, y: collections.Counter(x) == collections.Counter(y)
    set_comp = set(l1) == set(l2)      # set comparison removes duplicates, so it can report equal when the lists are not
    multiset_comp = compare(l1, l2)    # multiset (Counter) comparison
    return set_comp and multiset_comp  # set_comp is here in case the compare function doesn't work

How do I design a set/priority queue in Python 3 paired with a custom comparator for Tuples? [duplicate]

I wish to hold a heap of objects, not just numbers. They will have an integer attribute in them that the heap can sort by. The easiest way to use heaps in python is heapq, but how do I tell it to sort by a specific attribute when using heapq?
According to the example from the documentation, you can use tuples, and it will sort by the first element of the tuple:
>>> h = []
>>> heappush(h, (5, 'write code'))
>>> heappush(h, (7, 'release product'))
>>> heappush(h, (1, 'write spec'))
>>> heappush(h, (3, 'create tests'))
>>> heappop(h)
(1, 'write spec')
So if you don't want to (or can't?) do a __cmp__ method, you can manually extract your sorting key at push time.
Note that if the first elements in a pair of tuples are equal, further elements will be compared. If this is not what you want, you need to ensure that each first element is unique.
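One common way to ensure that, sketched here rather than taken from the answer, is to push a monotonically increasing counter as the second tuple element so ties on the priority are broken by insertion order and the payload is never compared:
import heapq
import itertools

h = []
counter = itertools.count()  # strictly increasing tie-breaker
heapq.heappush(h, (5, next(counter), 'write code'))
heapq.heappush(h, (5, next(counter), 'write more code'))  # same priority, different counter
heapq.heappush(h, (1, next(counter), 'write spec'))
print(heapq.heappop(h))  # (1, 2, 'write spec')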
heapq sorts objects the same way list.sort does, so just define a method __cmp__() within your class definition, which will compare itself to another instance of the same class:
def __cmp__(self, other):
    return cmp(self.intAttribute, other.intAttribute)
Works in Python 2.x.
In 3.x use:
def __lt__(self, other):
    return self.intAttribute < other.intAttribute
According to the official documentation, a solution to this is to store entries as tuples (see Sections 8.4.1 and 8.4.2).
For example, your object would look something like this in tuple format:
(key, value_1, value_2)
When you put the objects (i.e. tuples) into the heap, it will take the first attribute in the object (in this case key) to compare. If a tie happens, the heap will use the next attribute (i.e. value_1) and so on.
For example:
import heapq

heap = []
heapq.heappush(heap, (0, 'one', 1))
heapq.heappush(heap, (1, 'two', 11))
heapq.heappush(heap, (1, 'two', 2))
heapq.heappush(heap, (1, 'one', 3))
heapq.heappush(heap, (1, 'two', 3))
heapq.heappush(heap, (1, 'one', 4))
heapq.heappush(heap, (1, 'two', 5))
heapq.heappush(heap, (1, 'one', 1))
show_tree(heap)
Output:
(0, 'one', 1)
(1, 'one', 1) (1, 'one', 4)
(1, 'one', 3) (1, 'two', 3) (1, 'two', 2) (1, 'two', 5)
(1, 'two', 11)
For pretty-printing a heap in Python, see the show_tree() recipe.
I feel the simplest way is to override the existing cmp_lt function of the heapq module (note that cmp_lt is no longer present in recent Python 3 versions of heapq). A short example:
import heapq

# your custom comparison: here, comparing tuples a and b based on their 2nd element
def new_cmp_lt(a, b):
    return a[1] < b[1]

# override the existing "cmp_lt" module function with your function
heapq.cmp_lt = new_cmp_lt

# Now use everything like normally used
Note: Somebody more qualified should comment if this conflicts with recommended coding practices. But it can still be useful for something "quick & dirty" e.g. in coding interviews with limited time and a lot more to do instead of spending time on subclassing correctly.
I had the same question, but none of the above answers hit the spot; some were close but not elaborated enough. I did some research, tried this piece of code, and hopefully it will be sufficient for the next person looking for an answer.
The problem with using a tuple is that the comparison is driven by its first item, which is not very flexible. I wanted something similar to std::priority_queue in C++, like this:
std::priority_queue<pair<int, int>, vector<pair<int, int>>, comparator> pq;
where I could design my own comparator which is more common in real world applications.
Hopefully the below snippet helps:
https://repl.it/#gururajks/EvenAccurateCylinders
import heapq

class PQNode:
    def __init__(self, key, value):
        self.key = key
        self.value = value

    # compares by the second field (value)
    def __lt__(self, other):
        return self.value < other.value

    def __str__(self):
        return str("{} : {}".format(self.key, self.value))

input = [PQNode(1, 4), PQNode(7, 4), PQNode(6, 9), PQNode(2, 5)]
hinput = []

for item in input:
    heapq.heappush(hinput, item)

while hinput:
    print(heapq.heappop(hinput))
Python 3 Update
The other answers here are out of date:
Some are Python 2 specific. The __cmp__ method doesn't exist anymore.
Some do not reflect best practices and target only __lt__ instead of all the rich comparisons as recommended by PEP 8.
Some do not use modern tooling such as dataclasses, attrgetter, or total_ordering.
Modern solution with Dataclasses
With dataclasses, it is easy to make a data holder with customized ordering. For example, here is a Person class that excludes the name field from the comparison order:
from dataclasses import dataclass, field

@dataclass(order=True)
class Person:
    name: str = field(compare=False)
    age: int

actors = [
    Person('T Hanks', 65),
    Person('E Olson', 33),
    Person('A Tapping', 58),
]
This works perfectly with heaps:
>>> heapify(actors)
>>> heappop(actors)
Person(name='E Olson', age=33)
>>> heappop(actors)
Person(name='A Tapping', age=58)
>>> heappop(actors)
Person(name='T Hanks', age=65)
Handling Existing Classes
Sometimes you have to work with the data as provided and need to control the comparison order without changing the original class.
The solution is to add a wrapper with the new comparison. This leaves the original data and its class unchanged. Here is a modern recipe for adding such a wrapper:
from functools import total_ordering
from operator import attrgetter

def new_compare(*field_names):
    extract = attrgetter(*field_names)

    @total_ordering
    class ComparisonWrapper:
        def __init__(self, obj):
            self.obj = obj

        def __eq__(self, other):
            return extract(self.obj) == extract(other.obj)

        def __lt__(self, other):
            return extract(self.obj) < extract(other.obj)

    return ComparisonWrapper
For example, you may be given the following data and cannot alter it or its class directly:
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def __repr__(self):
        return f'Person({self.name!r}, {self.age})'

actors = [
    Person('T Hanks', 65),
    Person('E Olson', 33),
    Person('A Tapping', 58),
]
The wrapper can be applied gracefully with map(). To unwrap the data, access the obj attribute:
>>> from heapq import heapify, heappop
>>> data = list(map(new_compare('age'), actors))
>>> heapify(data)
>>> heappop(data).obj
Person('E Olson', 33)
>>> heappop(data).obj
Person('A Tapping', 58)
>>> heappop(data).obj
Person('T Hanks', 65)
Wrappers versus Decorating Tuples
As noted in the modern documentation, the traditional solution with decorating tuples no longer works for some essential use cases. In particular, if the objects in the heap are functions, a tuple in the form of (priority, task) no longer works in Python 3 because functions cannot be compared.
The new suggestion is to use a wrapper such as:
from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any = field(compare=False)
This will always work even if the item objects aren't comparable.
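A short usage sketch (the wrapped items here are plain functions, which are not comparable themselves):
from heapq import heappop, heappush

def task_a(): ...
def task_b(): ...

h = []
heappush(h, PrioritizedItem(2, task_a))
heappush(h, PrioritizedItem(1, task_b))
print(heappop(h).item is task_b)  # True: only the priority field is compared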
Unfortunately, you can't, although this is an often requested feature.
One option would be to insert (key, value) tuples into the heap. However, that won't work if the values throw an exception when compared (they will be compared in the case of a tie between keys).
A second option would be to define a __lt__ (less-than) method in the class that will use the appropriate attribute to compare the elements for sorting. However, that might not be possible if the objects were created by another package or if you need them to compare differently elsewhere in the program.
A third option would be to use the sortedlist class from the blist module (disclaimer: I'm the author). The constructor for sortedlist takes a key parameter that lets you specify a function to return the sort key of an element, similar to the key parameter of list.sort and sorted.
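Based on that description, usage would look roughly like the sketch below (untested; blist is a third-party package and the Task class here is just an illustration):
from operator import attrgetter
from blist import sortedlist  # third-party: pip install blist

class Task:
    def __init__(self, priority, name):
        self.priority = priority
        self.name = name

tasks = sortedlist([Task(3, 'c'), Task(1, 'a'), Task(2, 'b')],
                   key=attrgetter('priority'))
print(tasks[0].name)  # 'a' -- the element with the smallest priority sorts first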
You could use a heapdict. Note the use of popitem() to get the lowest priority item.
import heapdict as hd
import string
import numpy as np

h = hd.heapdict()
keys = [char for char in string.ascii_lowercase[:10]]
vals = [i for i in np.random.randint(0, 10, 10)]
for k, v in zip(keys, vals):
    h[k] = v

for i in range(len(vals)):
    print(h.popitem())
There is a module called heapy. The GitHub address is https://github.com/gekco/heapy. You can apply your own key/sort function when instantiating the class or when creating the heap from an array, which is very useful as it saves you passing it as an argument every time you perform an action.
Example where I want the tuple with the smallest last element to be on top of the heap:
>>> from heapy.heap import Heap
>>> a = [(3, 5, 10), (-5, 3, 8), (7, 8, 9), (-4, 0, 2)]
>>> x = Heap.from_array(a, key=lambda t : t[-1])
>>> x.length
4
>>> x.top()
(-4, 0, 2)
>>> x.insert((-1, 0, 1))
>>> x.length
5
>>> x.top()
(-1, 0, 1)
>>> a
[(3, 5, 10), (-5, 3, 8), (7, 8, 9), (-4, 0, 2)]

Comparing list of unique objects with custom function

I need to compare hundreds of objects stored in a unique list to find duplicates:
object_list = {Object_01, Object_02, Object_03, Object_04, Object_05, ...}
I've written a custom function, which returns True, if the objects are equal and False if not:
object_01.compare(object_02)
>>> True
The compare method works well but takes a lot of time per execution. I'm currently using itertools.combinations(x, 2) to iterate through all combinations. I thought it would be a good idea to use a dict to store already compared objects and create new sets dynamically, like:
dct = {'Compared': {}}
dct['Compared'] = set()

import itertools
for a, b in itertools.combinations(x, 2):
    if b.name not in dct['Compared']:
        if compare(a, b) == True:
            #print (a,b)
            key = a.name
            value = b.name
            if key not in dct:
                dct[key] = set()
                dct[key].add(value)
            else:
                dct[key].add(value)
            dct[key].add(key)
            dct['Compared'].add(b)
Current Output:
Compared: {'Object_02', 'Object_01', 'Object_03', 'Object_04', 'Object_05'}
Object_01: {'Object_02', 'Object_03', 'Object_01'}
Object_04: {'Object_05', 'Object_04'}
Object_05: {'Object_04'}
...
I would like to know: Is there a faster way to iterate through all combinations and how to break/prevent the iteration of an object, which is already assigned to a list of duplicates?
Desired Output:
Compared: {'Object_02', 'Object_01', 'Object_03', 'Object_04', 'Object_05'}
Object_01: {'Object_02', 'Object_03', 'Object_01'}
Object_04: {'Object_05', 'Object_04'}
...
Note: the compare method is a C wrapper. The requirement is to find an algorithm around it.
You don't need to calculate all combinations; you just need to check whether a given item is a duplicate:
for i, a in enumerate(x):
    if any(a.compare(b) for b in x[:i]):
        pass  # a is a duplicate of an already seen item, so do something
This is still technically O(n^2), but you've cut out at least half the checks required, and should be a bit faster.
In short, x[:i] returns all items in the list before index i. If the item x[i] appears in that list, you know it's a duplicate. If not, there may be a duplicate after it in the list, but you worry about that when you get there.
Using any is also important here: if it finds any true item, it will immediately stop, without checking the rest of the iterable.
You could also improve the number of checks by removing known duplicates from the list you're checking against:
x_copy = x[:]
removed = 0
for i, a in enumerate(x):
    if any(a.compare(b) for b in x_copy[:i - removed]):
        del x_copy[i - removed]
        removed += 1
        # a is a duplicate of an already seen item, so do something
Note that we use a copy, to avoid changing the sequence we're iterating over, and we need to take account for the number of items we've removed when using indexes.
Next, we just need to figure out how to build the dictionary.
This might be a little more complex. The first step is to figure out exactly which element is a duplicate. This can be done by realising any is just a wrapper around a for loop:
def any(iterable):
    for item in iterable:
        if item:
            return True
    return False
We can then make a minor change, and pass in a function:
def first(iterable, fn):
    for item in iterable:
        if fn(item):
            return item
    return None
Now, we change our duplicate finder as follows:
import collections

d = collections.defaultdict(list)
x_copy = x[:]
removed = 0
for i, a in enumerate(x):
    b = first(x_copy[:i - removed], a.compare)
    if b is not None:
        # b is the first occurring duplicate of a
        del x_copy[i - removed]
        removed += 1
        d[b.name].append(a)
    else:
        # we've not seen a yet, but might see it later
        d[a.name].append(a)
This will put every element in the list into a dict(-like). If you only want the duplicates, it's then just a case of getting all the entries with a length greater than 1.
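Keeping only the actual duplicates from that dict is then a one-liner (a sketch, using the d built above):
duplicates = {name: items for name, items in d.items() if len(items) > 1}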
Group the objects by their attributes if you want to find the dups:
class Foo:
    def __init__(self, i, j):
        self.i = i
        self.j = j

object_list = {Foo(1,2), Foo(3,4), Foo(1,2), Foo(3,4), Foo(5,6)}

from collections import defaultdict

d = defaultdict(list)
for obj in object_list:
    d[(obj.i, obj.j)].append(obj)

print(d)
defaultdict(<type 'list'>, {(1, 2): [<__main__.Foo instance at 0x7fa44ee7d098>, <__main__.Foo instance at 0x7fa44ee7d128>],
(5, 6): [<__main__.Foo instance at 0x7fa44ee7d1b8>],
(3, 4): [<__main__.Foo instance at 0x7fa44ee7d0e0>, <__main__.Foo instance at 0x7fa44ee7d170>]})
If not the name then use a tuple to store all the attributes you use to check for comparison.
Or sort the list by the attributes that matter and use groupby to group:
class Foo:
    def __init__(self, i, j):
        self.i = i
        self.j = j

object_list = {Foo(1,2), Foo(3,4), Foo(1,2), Foo(3,4), Foo(5,6)}

from itertools import groupby
from operator import attrgetter

groups = [list(v) for k, v in groupby(sorted(object_list, key=attrgetter("i", "j")),
                                      key=attrgetter("i", "j"))]
print(groups)
[[<__main__.Foo instance at 0x7f794a944d40>, <__main__.Foo instance at 0x7f794a944dd0>], [<__main__.Foo instance at 0x7f794a944d88>, <__main__.Foo instance at 0x7f794a944e18>], [<__main__.Foo instance at 0x7f794a944e60>]]
You could also implement __lt__, __eq__ and __hash__ to make your objects sortable and hashable:
class Foo(object):
    def __init__(self, i, j):
        self.i = i
        self.j = j

    def __lt__(self, other):
        return (self.i, self.j) < (other.i, other.j)

    def __hash__(self):
        return hash((self.i, self.j))

    def __eq__(self, other):
        return (self.i, self.j) == (other.i, other.j)

object_list = [Foo(1,2), Foo(3,4), Foo(1,2), Foo(3,4), Foo(5,6)]  # a list this time, so it can be sorted

print(set(object_list))
object_list.sort()
print(map(lambda x: (getattr(x, "i"), getattr(x, "j")), object_list))
set([<__main__.Foo object at 0x7fdff2fc08d0>, <__main__.Foo object at 0x7fdff2fc09d0>, <__main__.Foo object at 0x7fdff2fc0810>])
[(1, 2), (1, 2), (3, 4), (3, 4), (5, 6)]
Obviously the attributes need to be hashable; if you had lists you could change them to tuples, etc.

python - Common lists among lists in a list

I need to be able to find the first common list (which is a list of coordinates in this case) among a variable number of lists.
i.e. this list
>>> [[[1,2],[3,4],[6,7]],[[3,4],[5,9],[8,3],[4,2]],[[3,4],[9,9]]]
should return
>>> [3,4]
If easier, I can work with a list of all common lists(coordinates) between the lists that contain the coordinates.
I can't use sets or dictionaries because lists are not hashable (I think?).
Correct, list objects are not hashable because they are mutable. tuple objects are hashable (provided that all their elements are hashable). Since your innermost lists are all just integers, that provides a wonderful opportunity to work around the non-hashableness of lists:
>>> lists = [[[1,2],[3,4],[6,7]],[[3,4],[5,9],[8,3],[4,2]],[[3,4],[9,9]]]
>>> sets = [set(tuple(x) for x in y) for y in lists]
>>> set.intersection(*sets)
set([(3, 4)])
Here I give you a set which contains tuples of the coordinates which are present in all the sublists. To get a list of lists like you started with:
[list(x) for x in set.intersection(*sets)]
does the trick.
To address the concern by #wim, if you really want a reference to the first element in the intersection (where first is defined by being first in lists[0]), the easiest way is probably like this:
#... Stuff as before
intersection = set.intersection(*sets)
reference_to_first = next( (x for x in lists[0] if tuple(x) in intersection), None )
This will return None if the intersection is empty.
If you are looking for the first child list that is common amongst all parent lists, the following will work.
def first_common(lst):
    first = lst[0]
    rest = lst[1:]
    for x in first:
        if all(x in r for r in rest):
            return x
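Applied to the example data from the question, this returns the first common coordinate (a quick usage sketch):
lists = [[[1, 2], [3, 4], [6, 7]], [[3, 4], [5, 9], [8, 3], [4, 2]], [[3, 4], [9, 9]]]
print(first_common(lists))  # [3, 4]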
Solution with recursive function. :)
This gets first duplicated element.
def get_duplicated_element(array):
    global result, checked_elements
    checked_elements = []
    result = -1

    def array_recursive_check(array):
        global result, checked_elements
        if result != -1:
            return
        for i in array:
            if type(i) == list:
                if i in checked_elements:
                    result = i
                    return
                checked_elements.append(i)
                array_recursive_check(i)

    array_recursive_check(array)
    return result
get_duplicated_element([[[1,2],[3,4],[6,7]],[[3,4],[5,9],[8,3],[4,2]],[[3,4],[9,9]]])
[3, 4]
You can achieve this with a list comprehension:
>>> l = [[[1,2],[3,4],[6,7]],[[3,4],[5,9],[8,3],[4,2]],[[3,4],[9,9]]]
>>> lcombined = sum(l, [])
>>> [k[0] for k in [(i,lcombined.count(i)) for i in lcombined] if k[1] > 1][0]
[3, 4]
