common elements in two lists where elements are the same - python

I have two lists like the following:
a=['not','not','not','not']
b=['not','not']
and I have to find the length of the list containing the intersection of the two lists above, so that the result is:
intersection=['not','not']
len(intersection)
2
Now the problem is that I have tried filter(lambda x: x in a, b) and filter(lambda x: x in b, a), but when one of the two lists is longer than the other I do not get an intersection, just a membership check. In the example above, since all the members of a are in b, I get 4 common elements; what I want instead is the intersection, which has length 2.
Using set().intersection(set()) would instead create a set, which is not what I want since all the elements are the same.
Can you suggest a compact solution to the problem?

If you don't mind using collections.Counter, then you could have a solution like
>>> import collections
>>> a=['not','not','not','not']
>>> b=['not','not']
>>> c1 = collections.Counter(a)
>>> c2 = collections.Counter(b)
and then index by 'not' to get the total count across both lists:
>>> c1['not'] + c2['not']
6
For the intersection, use the & operator, which keeps the minimum count of each element:
>>> (c1 & c2) ['not']
2
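If the intersection is needed as a list rather than as a count, the Counter result can be expanded again. The following is a minimal sketch; the function name multiset_intersection is made up for illustration and the elements are assumed to be hashable.

```python
from collections import Counter

def multiset_intersection(a, b):
    """Return the common elements of a and b, respecting multiplicity."""
    # Counter & Counter keeps each key with the minimum of the two counts.
    common = Counter(a) & Counter(b)
    # elements() repeats each key as many times as its count.
    return list(common.elements())

a = ['not', 'not', 'not', 'not']
b = ['not', 'not']
result = multiset_intersection(a, b)
print(result, len(result))  # ['not', 'not'] 2
```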

I don't see any particularly compact way to compute this. Let's just go for a solution first.
The intersection is some sublist of the shorter list (e.g. b). Now, for better performance when the shorter list is not extremely short, make the longer list a set (e.g. set(a)). The intersection can then be expressed as a list comprehension of those items in the shorter list which are also in the longer set:
def common_elements(a, b):
    shorter, longer = (a, b) if len(a) < len(b) else (b, a)
    longer = set(longer)
    intersection = [item for item in shorter if item in longer]
    return intersection
a = ['not','not','not','not']
b = ['not','not']
print(common_elements(a,b))

Have you considered the following approach?
a = ['not','not','not','not']
b = ['not','not']
min(len(a), len(b))
# 2
Since all the elements are the same, the number of common elements is just the minimum of the lengths of both lists.

Do it by set. First make those lists to sets and then take their intersection. Now there might be repetitions in the intersection. So for each elements in intersection take the minimum repetitions in a and b.
>>> a=['not','not','not','not']
>>> b=['not','not']
>>> def myIntersection(A,B):
...     setIn = set(A).intersection(set(B))
...     rt = []
...     for i in setIn:
...         for j in range(min(A.count(i),B.count(i))):
...             rt.append(i)
...     return rt
...
>>> myIntersection(a,b)
['not', 'not']

Related

Indices of elements in a python list of tuples

Problem
Given 2 lists A and B, I want to get the indices of all elements in List A which are present in List B. Each element is a tuple.
I am using lists of size 40,000 elements or so.
Sample case
Input:
A = [(1,2),(3,4),(5,6),(7,8)]
B = [(1,2),(3,4),(5,6)]
Expected output:
[0,1,2]
Attempted solutions
I tried two solutions:
1) using map function
m = map(A.index, B)
list(m)
2) using list comprehension
m = [A.index(item) for item in B if item in A]
These methods seem to be taking too much time. Is there any other way to accomplish this?
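Below would be your best bet. I am using a set (i.e., set(B)) because looking up a tuple in a set takes O(1) time on average. Build the set once, outside the comprehension, so that it is not rebuilt for every element of A.
B_set = set(B)
m = [index for index, item in enumerate(A) if item in B_set]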
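The two attempts in the question return the positions in B's order, as found by A.index. A dictionary gives the same result without the repeated linear scans. This is only a sketch: the name first_positions is hypothetical, and it assumes the tuples are hashable and that the first occurrence in A is the one wanted, as with list.index.

```python
def first_positions(A, B):
    """For each item of B that occurs in A, return its first index in A."""
    first_index = {}
    for index, item in enumerate(A):
        # setdefault keeps the first occurrence, matching list.index
        first_index.setdefault(item, index)
    return [first_index[item] for item in B if item in first_index]

A = [(1, 2), (3, 4), (5, 6), (7, 8)]
B = [(1, 2), (3, 4), (5, 6)]
print(first_positions(A, B))  # [0, 1, 2]
```

Building the dictionary is one pass over A, and each lookup is O(1) on average, so the whole call is roughly linear in len(A) + len(B).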

Checking existence of combination in a list against a list of lists

If I have a list of lists of integers S: [[1,2,3],[3,4,5],[5,6,7]], and a single list T: [2,3,1], I want to return true if T, as a combination, is contained in S. Assume each element of S has the same length as T.
In this case, I want to return true.
Restrictions: No sorting of any kind, and note S has all unique lists, but within a list, it can have duplicate elements.
How can I do this as efficiently as possible? I can iterate through each element of S, turn it into a set and compare it with set(T), but that seems very slow as the size of S and the length of each element of S get bigger.
You can use sorted:
>>> S = [[1,2,3],[3,4,5],[5,6,7]]
>>> T = [2,3,1]
>>> any(sorted(T) == sorted(x) for x in S)
True
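Using itertools? Note that this needs permutations rather than combinations, and the number of permutations grows factorially with len(t):
from itertools import permutations
def contains(s, t):
    # try every ordering of t and look for it in s
    return any(list(p) in s for p in permutations(t))
Or using sorted:
def contains(s, t):
    t = sorted(t)
    return any(sorted(i) == t for i in s)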
Collections and counter? It'll run in O(n), whereas sorting will run in roughly O(n log n), and raw comparison will be O(n²)
import collections
T = collections.Counter([2,3,1])
any(T == collections.Counter(x) for x in S)
If a stipulation is without sorting, you might want to look into an order-independent hash function for the combinations. Two starting points I liked were (1) the sum of the hashes of the individual elements of the combination, and (2) a tuple (sum(combination), product(combination)).
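One way to get an order-independent key without sorting is sketched below. It is a swapped-in variant of the hashing idea rather than either of the two starting points named above: instead of summing hashes, it uses the frozenset of (element, count) pairs, which cannot collide for different multisets. The name multiset_key is hypothetical, and the elements are assumed to be hashable.

```python
from collections import Counter

def multiset_key(items):
    """Hashable key that is equal for lists with the same elements and counts."""
    return frozenset(Counter(items).items())

S = [[1, 2, 3], [3, 4, 5], [5, 6, 7]]
T = [2, 3, 1]

# Build the keys once; each later query is a single set lookup.
S_keys = {multiset_key(x) for x in S}
print(multiset_key(T) in S_keys)  # True
```

If S is queried many times, the cost of building S_keys is paid once and each lookup costs only the time to build the key for T.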

How to compare two lists and return the number of times they match at each index in python?

I have two lists containing 1's and 0's, e.g.
list1 = [1,1,0,1,0,1]
list2 = [0,1,0,1,1,0]
I want to find the number of times they match at each index. So in this case the output would be 3 because they have the same value at indices 1,2 and 3 only.
Currently I'm doing this:
matches_list = []
for i in list1:
    index = list1.index(i)
    if list1[index] == list2[index]:
        matches_list.append(i)
    else:
        pass
return len(matches_list)
However this is very slow and I want to do this many times over to compare a large number of these lists
I was hoping someone could advise me on a quicker way to do this. Is there a way to use the set() function, or something similar, for example to compare two lists but maintain the order of each one?
Zip the lists, compare the elements, and compute the sum.
>>> list1 = [1,1,0,1,0,1]
>>> list2 = [0,1,0,1,1,0]
>>> sum(a == b for a,b in zip(list1, list2))
3
(Consider using itertools.izip in Python 2 for memory efficiency.)
Here's a lightning fast numpy answer:
import numpy as np
list1 = np.array([1,1,0,1,0,1])
list2 = np.array([0,1,0,1,1,0])
len(np.where(list1==list2)[0])
The numpy np.where function, called with only a condition, returns a tuple of index arrays (one per dimension) for the positions where the condition holds; here list1==list2 is true at indices [1,2,3]. In the above case, I take the first (and only) array of indices with [0] and count how many there are with len().
You can use map with operator.eq and sum:
>>> import operator
>>> sum(map(operator.eq, list1, list2))
This works because True is interpreted as 1 when summed and False like 0.
You could also use numpy for this:
>>> import numpy as np
>>> np.count_nonzero(np.asarray(list1) == np.asarray(list2))
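For reference, here is a small pure-Python sketch that wraps the map/operator.eq approach in a function and also reports which indices match. The function names and the length check are additions for illustration; zip and map silently stop at the shorter list, so the check is only one possible way to handle unequal lengths.

```python
import operator

def count_matches(xs, ys):
    """Number of positions where xs and ys hold equal values."""
    if len(xs) != len(ys):
        raise ValueError("lists must have the same length")
    # operator.eq yields True/False; True counts as 1 when summed
    return sum(map(operator.eq, xs, ys))

def matching_indices(xs, ys):
    """Indices at which xs and ys hold equal values."""
    return [i for i, (x, y) in enumerate(zip(xs, ys)) if x == y]

list1 = [1, 1, 0, 1, 0, 1]
list2 = [0, 1, 0, 1, 1, 0]
print(count_matches(list1, list2))     # 3
print(matching_indices(list1, list2))  # [1, 2, 3]
```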

How to check if all items in a list are there in another list?

I have two lists say
List1 = ['a','c','c']
List2 = ['x','b','a','x','c','y','c']
Now I want to find out if all elements of List1 are in List2. In this case they all are. I can't use the subset function because I can have repeated elements in lists. I can use a for loop to count the number of occurrences of each item in List1 and see if it is less than or equal to the number of occurrences in List2. Is there a better way to do this?
Thanks.
When number of occurrences doesn't matter, you can still use the subset functionality, by creating a set on the fly:
>>> list1 = ['a', 'c', 'c']
>>> list2 = ['x', 'b', 'a', 'x', 'c', 'y', 'c']
>>> set(list1).issubset(list2)
True
If you need to check if each element shows up at least as many times in the second list as in the first list, you can make use of the Counter type and define your own subset relation:
>>> from collections import Counter
>>> def counterSubset(list1, list2):
...     c1, c2 = Counter(list1), Counter(list2)
...     for k, n in c1.items():
...         if n > c2[k]:
...             return False
...     return True
>>> counterSubset(list1, list2)
True
>>> counterSubset(list1 + ['a'], list2)
False
>>> counterSubset(list1 + ['z'], list2)
False
If you already have counters (which might be a useful alternative to store your data anyway), you can also just write this as a single line:
>>> all(n <= c2[k] for k, n in c1.items())
True
Be aware of the following:
>>> listA = ['a', 'a', 'b', 'b', 'b', 'c']
>>> listB = ['b', 'a', 'a', 'b', 'c', 'd']
>>> all(item in listB for item in listA)
True
If you read the "all" line as you would in English, this is not wrong but it can be misleading, as listA has a third 'b' but listB does not.
This also has the same issue:
def list1InList2(list1, list2):
    for item in list1:
        if item not in list2:
            return False
    return True
Just a note. The following does not work:
>>> tupA = (1,2,3,4,5,6,7,8,9)
>>> tupB = (1,2,3,4,5,6,6,7,8,9)
>>> set(tupA) < set(tupB)
False
Converting the tuples to lists makes no difference. The result is False because < tests for a proper subset, and the two sets are equal once the duplicate 6 is dropped; <= returns True here.
This works, but has the same issue of not keeping count of element occurrences:
>>> set(tupA).issubset(set(tupB))
True
Using sets is not a comprehensive solution for multi-occurrence element matching.
But here is a one-liner solution/adaptation to shantanoo's answer without try/except:
all(True if sequenceA.count(item) <= sequenceB.count(item) else False for item in sequenceA)
A builtin function wrapping a generator expression using a ternary conditional operator. Python is awesome! Note that the "<=" should not be "==".
With this solution sequence A and B can be of type tuple, list, or any other "sequence" with a "count" method. The elements in both sequences can be most types. I would not use this with dicts as it is now, hence the use of "sequence" instead of "iterable".
A solution using Counter and its builtin multiset difference operator (note that - is proper multiset difference, not an element-wise subtraction):
from collections import Counter
def is_subset(l1, l2):
    c1, c2 = Counter(l1), Counter(l2)
    return not c1 - c2
Test:
>>> List1 = ['a','c','c']
>>> List2 = ['x','b','a','x','c','y','c']
>>> is_subset(List1, List2)
True
I can't use the subset function because I can have repeated elements in lists.
What this means is that you want to treat your lists as multisets rather than sets. The usual way to handle multisets in Python is with collections.Counter:
A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.
And, while you can implement subset for multisets (implemented with Counter) by looping and comparing counts, as in poke's answer, this is unnecessary—just as you can implement subset for sets (implemented with set or frozenset) by looping and testing in, but it's unnecessary.
The Counter type already implements all the set operators extended in the obvious way for multisets.[1] So you can just write subset in terms of those operators, and it will work for both set and Counter out of the box.
With (multi)set difference:[2]
def is_subset(c1, c2):
    return not c1 - c2
Or with (multi)set intersection:
def is_subset(c1, c2):
    return c1 & c2 == c1
1. You may be wondering why, if Counter implements the set operators, it doesn't just implement < and <= for proper subset and subset. Although I can't find the email thread, I'm pretty sure this was discussed, and the answer was that "the set operators" are defined as the specific set of operators defined in the initial version of collections.abc.Set (which has since been expanded, IIRC…), not all operators that set happens to include for convenience, in the exact same way that Counter doesn't have named methods like intersection that's friendly to other types than & just because set does.
2. This depends on the fact that collections in Python are expected to be falsey when empty and truthy otherwise. This is documented here for the builtin types, and the fact that bool tests fall back to len is explained here—but it's ultimately just a convention, so that "quasi-collections" like numpy arrays can violate it if they have a good reason. It holds for "real collections" like Counter, OrderedDict, etc. If you're really worried about that, you can write len(c1 - c2) == 0, but note that this is against the spirit of PEP 8.
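The same multiset difference can also report what is missing, not only whether something is. The sketch below is an illustration under the assumption that the elements are hashable; missing_items is a made-up name. As a side note, Counter gained rich comparison operators (such as <=) in Python 3.10, so on recent versions the subset test can be written directly as Counter(a) <= Counter(b).

```python
from collections import Counter

def missing_items(needed, available):
    """Elements of `needed` not covered by `available`, with the shortfall count."""
    # Counter subtraction drops keys whose resulting count is zero or negative.
    return Counter(needed) - Counter(available)

List1 = ['a', 'c', 'c']
List2 = ['x', 'b', 'a', 'x', 'c', 'y', 'c']

print(dict(missing_items(List1, List2)))               # {}
print(dict(missing_items(List1 + ['a', 'z'], List2)))  # {'a': 1, 'z': 1}
```

An empty result means List1 is a multiset subset of List2, which is exactly the "not c1 - c2" test.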
This will return True if all the items in list1 are in list2:
def list1InList2(list1, list2):
    for item in list1:
        if item not in list2:
            return False
    return True
def check_subset(list1, list2):
    remaining = list(list2)  # work on a copy so list2 is not modified
    try:
        for x in list1:
            remaining.remove(x)  # raises ValueError if x is not left
        return 'all elements in list1 are in list2'
    except ValueError:
        return 'some elements in list1 are not in list2'

Determine if 2 lists have the same elements, regardless of order? [duplicate]

Sorry for the simple question, but I'm having a hard time finding the answer.
When I compare 2 lists, I want to know if they are "equal" in that they have the same contents, but in different order.
Ex:
x = ['a', 'b']
y = ['b', 'a']
I want x == y to evaluate to True.
You can simply check whether the multisets with the elements of x and y are equal:
import collections
collections.Counter(x) == collections.Counter(y)
This requires the elements to be hashable; runtime will be in O(n), where n is the size of the lists.
If the elements are also unique, you can also convert to sets (same asymptotic runtime, may be a little bit faster in practice):
set(x) == set(y)
If the elements are not hashable, but sortable, another alternative (runtime in O(n log n)) is
sorted(x) == sorted(y)
If the elements are neither hashable nor sortable you can use the following helper function. Note that it will be quite slow (O(n²)) and should generally not be used outside of the esoteric case of unhashable and unsortable elements.
def equal_ignore_order(a, b):
    """ Use only when elements are neither hashable nor sortable! """
    unmatched = list(b)
    for element in a:
        try:
            unmatched.remove(element)
        except ValueError:
            return False
    return not unmatched
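The options above can be combined into one helper that tries the cheapest comparison first and falls back when the elements do not support it. This is a sketch, not a definitive implementation: the name same_elements is hypothetical, and the sorted step assumes a total order. Elements such as sets, whose < is only a partial order (and which do not raise TypeError when sorted), should go straight to the O(n²) loop instead.

```python
from collections import Counter

def same_elements(a, b):
    """True if a and b hold the same elements with the same counts, in any order."""
    if len(a) != len(b):
        return False
    try:
        return Counter(a) == Counter(b)   # O(n), needs hashable elements
    except TypeError:
        pass
    try:
        return sorted(a) == sorted(b)     # O(n log n), needs orderable elements
    except TypeError:
        pass
    unmatched = list(b)                   # O(n^2) fallback, needs only ==
    for element in a:
        try:
            unmatched.remove(element)
        except ValueError:
            return False
    return not unmatched

print(same_elements(['a', 'b'], ['b', 'a']))            # True
print(same_elements([[1], [2]], [[2], [1]]))            # True (unhashable, sortable)
print(same_elements([{1: 2}, {3: 4}], [{3: 4}, {1: 2}]))  # True (dicts: fallback loop)
```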
Determine if 2 lists have the same elements, regardless of order?
Inferring from your example:
x = ['a', 'b']
y = ['b', 'a']
that the elements of the lists won't be repeated (they are unique) as well as hashable (which strings and other certain immutable python objects are), the most direct and computationally efficient answer uses Python's builtin sets, (which are semantically like mathematical sets you may have learned about in school).
set(x) == set(y) # prefer this if elements are hashable
In the case that the elements are hashable, but non-unique, the collections.Counter also works semantically as a multiset, but it is far slower:
from collections import Counter
Counter(x) == Counter(y)
Prefer to use sorted:
sorted(x) == sorted(y)
if the elements are orderable. This would account for non-unique or non-hashable circumstances, but this could be much slower than using sets.
Empirical Experiment
An empirical experiment concludes that one should prefer set, then sorted. Only opt for Counter if you need other things like counts or further usage as a multiset.
First setup:
import timeit
import random
from collections import Counter
data = [str(random.randint(0, 100000)) for i in range(100)]  # xrange in Python 2
data2 = data[:] # copy the list into a new one
def sets_equal():
    return set(data) == set(data2)
def counters_equal():
    return Counter(data) == Counter(data2)
def sorted_lists_equal():
    return sorted(data) == sorted(data2)
And testing:
>>> min(timeit.repeat(sets_equal))
13.976069927215576
>>> min(timeit.repeat(counters_equal))
73.17287588119507
>>> min(timeit.repeat(sorted_lists_equal))
36.177085876464844
So we see that comparing sets is the fastest solution, and comparing sorted lists is second fastest.
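To repeat the measurement on a current interpreter, something like the following sketch can be used. It assumes Python 3 and uses a much smaller number of runs than the default so that it finishes quickly; the absolute numbers, and possibly the ranking, depend on the Python version and the machine, so they may differ from the figures quoted above. Keep in mind that the set comparison ignores duplicates, so it only answers the same question when the elements are unique.

```python
import random
import timeit
from collections import Counter

data = [str(random.randint(0, 100000)) for _ in range(100)]
data2 = data[:]  # copy the list into a new one

def sets_equal():
    return set(data) == set(data2)

def counters_equal():
    return Counter(data) == Counter(data2)

def sorted_lists_equal():
    return sorted(data) == sorted(data2)

for func in (sets_equal, counters_equal, sorted_lists_equal):
    # timeit.repeat accepts a callable; take the best of three short runs
    best = min(timeit.repeat(func, number=2000, repeat=3))
    print(f"{func.__name__}: {best:.4f} s for 2000 calls")
```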
This seems to work, though possibly cumbersome for large lists.
>>> A = [0, 1]
>>> B = [1, 0]
>>> C = [0, 2]
>>> not sum([not i in A for i in B])
True
>>> not sum([not i in A for i in C])
False
>>>
However, if each list must contain all the elements of other then the above code is problematic.
>>> A = [0, 1, 2]
>>> not sum([not i in A for i in B])
True
The problem arises when len(A) != len(B) and, in this example, len(A) > len(B). To avoid this, you can add one more statement.
>>> not sum([not i in A for i in B]) if len(A) == len(B) else False
False
One more thing, I benchmarked my solution with timeit.repeat, under the same conditions used by Aaron Hall in his post. As suspected, the results are disappointing. My method is the last one. set(x) == set(y) it is.
>>> def foocomprehend(): return not sum([not i in data for i in data2])
>>> min(timeit.repeat('fooset()', 'from __main__ import fooset, foocount, foocomprehend'))
25.2893661496
>>> min(timeit.repeat('foosort()', 'from __main__ import fooset, foocount, foocomprehend'))
94.3974742993
>>> min(timeit.repeat('foocomprehend()', 'from __main__ import fooset, foocount, foocomprehend'))
187.224562545
As mentioned in comments above, the general case is a pain. It is fairly easy if all items are hashable or all items are sortable. However, I have recently had to try to solve the general case. Here is my solution. I realised after posting that this is a duplicate of a solution above that I missed on the first pass. Anyway, if you use slices rather than list.remove() you can compare immutable sequences.
def sequences_contain_same_items(a, b):
    for item in a:
        try:
            i = b.index(item)
        except ValueError:
            return False
        b = b[:i] + b[i+1:]
    return not b
