Get elements common to a majority of lists in Python

Given 4 lists, I want to get elements that are common to 3 or more lists.
a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 3, 4, 5, 6]
d = [1, 2, 6, 7]
Hence, the output should be [1, 2, 3, 4].
My current code is as follows.
result1 = set(a) & set(b) & set(c)
result2 = set(b) & set(c) & set(d)
result3 = set(c) & set(d) & set(a)
result4 = set(d) & set(a) & set(b)
final_result = list(result1)+list(result2)+list(result3)+list(result4)
print(set(final_result))
It works fine and gives the desired output. However, I am interested in knowing if there is an easier way of doing this in Python, i.e. are there any built-in functions for this?

Using a Counter, you can do this like:
Code:
a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 3, 4, 5, 6]
d = [1, 2, 6, 7]
from collections import Counter
counts = Counter(sum(([list(set(i)) for i in (a, b, c, d)]), []))
print(counts)
at_least_three = [i for i, n in counts.items() if n >= 3]
print(at_least_three)
Results:
Counter({1: 4, 2: 3, 3: 3, 4: 3, 5: 2, 6: 2, 7: 1})
[1, 2, 3, 4]
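A hedged variation on the Counter approach: sum(..., []) copies the accumulated list at every step, so with many lists itertools.chain.from_iterable may be cheaper. The function name common_to_at_least and the parameter k are illustrative and not part of the original answer; the final sorted() assumes the items are orderable.

```python
from collections import Counter
from itertools import chain


def common_to_at_least(lists, k):
    """Return the items that appear in at least k of the given lists."""
    # set(lst) makes each list contribute at most once per distinct value.
    counts = Counter(chain.from_iterable(set(lst) for lst in lists))
    # sorted() only gives a stable, readable order (assumes orderable items).
    return sorted(value for value, count in counts.items() if count >= k)


a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 3, 4, 5, 6]
d = [1, 2, 6, 7]
print(common_to_at_least([a, b, c, d], 3))  # [1, 2, 3, 4]
```

Passing k as a parameter keeps the same code usable for "all lists" (k = number of lists) or "any two lists" (k = 2).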

Iterate over the values in all lists to create a dict of {value: number_of_lists_the_value_appears_in}:
from collections import defaultdict
counts = defaultdict(int)
for list_ in (a, b, c, d):
    for value in set(list_):  # eliminate duplicate values with `set`
        counts[value] += 1
Then in the second step remove all values with a count < 3:
result = [value for value, count in counts.items() if count >= 3]
print(result) # [1, 2, 3, 4]

The code below solves the generalised problem (n lists, with a common element required to be in at least k of them). It also works with non-hashable items; not supporting them is the main disadvantage of the other answers:
a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 2, 3, 4, 4, 5, 6]
d = [1, 2, 6, 7]
lists = [a, b, c, d]
result = []
desired_quantity = 3
for i in range(len(lists) - desired_quantity + 1):  # see point 1 below
    sublist = lists.pop(0)  # see point 2
    for item in sublist:
        counter = 1  # 1 not 0, by virtue of the fact that it is in sublist
        for comparisonlist in lists:
            if item in comparisonlist:
                counter += 1
                comparisonlist.remove(item)  # see point 3
        if counter >= desired_quantity:
            result.append(item)
This has the disadvantage that for each element in every list, we have to check every other list to see if it is there, but we can make things more efficient in a few ways. Also, look-ups are a lot slower in lists than in sets (which we can't use if the lists contain non-hashable items), so this may be slow for very large lists.
1) If we require an item to be in k lists, we don't need to check each item in the last k-1 lists, as we would have already picked it up whilst searching through the first k lists.
2) Once we have searched through a list, we can discard that list, since any items in the just-searched-list that might contribute to our final result, will again already have been dealt with. This means that with each iteration we have fewer lists to search through.
3) When we have checked if an item is in enough lists, we can remove that item from the list, which means not only is the number of lists getting shorter as we proceed, the lists themselves are getting shorter, meaning quicker lookups.
As an afterthought, if the original lists happen to be sorted beforehand, this might also help the algorithm work efficiently.
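The loop above removes items from the input lists as it goes. If the inputs need to stay intact, a simpler (and slower) sketch that still relies only on == comparisons could look like the following. The function name in_at_least_k and the nested-list example are assumptions made for illustration.

```python
def in_at_least_k(lists, k):
    """Return items found in at least k of the lists, comparing with == only.

    Works for unhashable items (e.g. lists of lists) and does not modify
    the inputs, at the cost of repeated linear scans.
    """
    result = []
    for current in lists:
        for item in current:
            if item in result:
                continue  # already reported
            # Count how many of the lists contain this item.
            hits = sum(1 for other in lists if item in other)
            if hits >= k:
                result.append(item)
    return result


a = [1, 2, 3, 4]
b = [1, 2, 3, 4, 5]
c = [1, 2, 3, 4, 4, 5, 6]
d = [1, 2, 6, 7]
print(in_at_least_k([a, b, c, d], 3))  # [1, 2, 3, 4]

nested = [[[1], [2]], [[1], [3]], [[1], [2]], [[4]]]
print(in_at_least_k(nested, 3))  # [[1]]
```

This trades the three optimisations listed above for simplicity, so it is only a reasonable choice for small inputs.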

Create a dictionary of counts and filter out the entries with a count of less than 3.

Related

improve efficiency of a nested loop

I need to find the number of pairs with consecutive numbers in a list. If elements in the list are repeated, they should be treated as members of a distinct pair. For instance, if the list were [1, 1, 1, 2, 2, 5, 8, 8], then there are three ways to choose 1 and two ways to choose 2, or a total of 3×2=6 ways to choose the pair (1, 2), so that the answer would, in this case, be 6.
My solution currently contains a nested loop as below. The code works but I want to optimize for a runtime of less than 2 seconds.
Can anyone give me some pointers on how to improve the runtime of this solution?
L = [1, 2, 5, 8]
count = 0
for i in range(0, len(L) - 1):
    for x in range(i + 1, len(L)):
        if L[x] == L[i] + 1 or L[x] == L[i] - 1:
            count += 1
You could use the Counter class from collections to classify and count the available numbers, then sum up the product of counts for existing pairs of consecutive values:
from collections import Counter
L = [1, 1, 1, 2, 2, 5, 8, 8]
counts = Counter(L)
r = sum(c*counts[n+1] for n,c in counts.items())
print(r) # 6
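As a sanity check, the Counter-based sum can be compared against a brute-force count over all pairs. This is only a sketch; the two function names are made up for the comparison.

```python
from collections import Counter
from itertools import combinations


def count_consecutive_pairs(numbers):
    """Counter-based count: one pass to tally, one pass over distinct values."""
    counts = Counter(numbers)
    # A missing key in a Counter reads as 0 and is not inserted,
    # so looking up counts[n + 1] while iterating is safe.
    return sum(c * counts[n + 1] for n, c in counts.items())


def count_consecutive_pairs_brute(numbers):
    """Quadratic reference implementation, equivalent to the nested loop."""
    return sum(1 for x, y in combinations(numbers, 2) if abs(x - y) == 1)


L = [1, 1, 1, 2, 2, 5, 8, 8]
print(count_consecutive_pairs(L))        # 6
print(count_consecutive_pairs_brute(L))  # 6
```

Each pair (n, n+1) is counted once, from the side of the smaller value, which is why the sum does not double count.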

How to iterate over two lists with repeating elements

list_1 has repeated values that also appear in list_2. I need to find the elements that match between list_1 and list_2 and save them in result_list_1 and result_list_2 respectively, and save the ones that do not match in not_equals_list_1 and not_equals_list_2. The matching items must also correspond in position between the two result lists.
Note: I need to use only arrays and not sets.
list_1 = [1, 5, 2, 3, 2, 4, 1, 3]
list_2 = [1, 2, 2, 3, 1, 6]
It should result in:
result_list_1 = [1, 2, 3, 2, 1]
result_list_2 = [1, 2, 3, 2, 1]
not_equals_list_1 = [3, 4, 5]
not_equals_list_2 = [6]
Any help would be much appreciated.
My method would decrease your time complexity at the cost of space complexity.
def taketwoarray(arr1, arr2):
    my_dict = {}
    for i in arr1:  # add all arr1 elements to the dict
        my_dict[i] = 0
    for i in arr2:  # if the element is in arr1, increase its count; otherwise set it to -1
        try:
            if my_dict[i] != -1:
                my_dict[i] += 1
        except KeyError:
            my_dict[i] = -1
    # Positive counts mark the numbers that appear in both lists;
    # 0 marks numbers found only in arr1 and -1 marks numbers found only in arr2.
    # The count is set to -1 so that numbers repeated only in arr2 are not counted.
    return my_dict
Let me know if there is any issue!
collections.Counter is invaluable for things like this - like a set but with multiplicity. The algorithm is simple - we make a Counter for each list, and then for the opposite list we just check whether each element is in the other list's counter. If so, remove one occurrence and put it in the result_list. Otherwise, put it in the not_equals_list.
from collections import Counter
list_1 = [1, 5, 2, 3, 2, 4, 1, 3]
list_2 = [1, 2, 2, 3, 1, 6]
counter_1 = Counter(list_1)
counter_2 = Counter(list_2)
result_list_1 = []
result_list_2 = []
not_equals_list_1 = []
not_equals_list_2 = []
for item in list_1:
    if counter_2[item] > 0:
        result_list_1.append(item)
        counter_2[item] -= 1
    else:
        not_equals_list_1.append(item)
for item in list_2:
    if counter_1[item] > 0:
        result_list_2.append(item)
        counter_1[item] -= 1
    else:
        not_equals_list_2.append(item)
print(result_list_1) # [1, 2, 3, 2, 1]
print(result_list_2) # [1, 2, 2, 3, 1]
print(not_equals_list_1) # [5, 4, 3]
print(not_equals_list_2) # [6]
Order is preserved in not_equals_list from the order of the first list. If you desire it differently, you can use reversed() where necessary to change either order of iteration or to simply flip the result.
If using this type of solution with custom objects, you'll need to make sure that __hash__() and __eq__() are properly implemented for equality checking, since both sets and dicts are hashtable-based and Counter is just a subclass of dict.
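A minimal sketch of what that means for custom objects: a frozen dataclass gets consistent __eq__() and __hash__() generated for it, so equal instances are tallied together in a Counter. The Point class is a hypothetical example, not something from the question.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen + default eq=True generates __hash__ from the fields
class Point:
    x: int
    y: int


points = [Point(1, 2), Point(1, 2), Point(3, 4)]
counter = Counter(points)

# Two distinct but equal Point instances are counted under one key.
print(counter[Point(1, 2)])  # 2
print(counter[Point(9, 9)])  # 0
```

A plain class without __eq__() and __hash__() would fall back to identity, so the two Point(1, 2) objects would be counted separately.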
From a quick Google search, multiset might provide a more efficient way of doing this by just converting your lists into multisets and then doing set operations on them. I haven't tested this, though.
result_list_1 = []
result_list_2 = []
not_equals_list_1 = []
not_equals_list_2 = []
for item in list_1:
    if item in list_2:
        result_list_1.append(item)
    else:
        not_equals_list_1.append(item)
for item in list_2:
    if item in list_1:
        result_list_2.append(item)
    else:
        not_equals_list_2.append(item)
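A plain membership test like the one above ignores how many times a value occurs. Since the question asks for lists only, one possible sketch that respects multiplicity is to consume matches from a working copy of the second list. The function name match_lists is an assumption.

```python
def match_lists(list_1, list_2):
    """Pair up equal items one-to-one, using lists only (no sets or dicts)."""
    remaining = list(list_2)  # working copy; list_2 itself is left untouched
    matched = []
    unmatched_1 = []
    for item in list_1:
        if item in remaining:
            remaining.remove(item)  # each item of list_2 can match only once
            matched.append(item)
        else:
            unmatched_1.append(item)
    # Whatever is left in `remaining` never found a partner in list_1.
    return matched, unmatched_1, remaining


list_1 = [1, 5, 2, 3, 2, 4, 1, 3]
list_2 = [1, 2, 2, 3, 1, 6]
matched, not_equals_list_1, not_equals_list_2 = match_lists(list_1, list_2)
print(matched)            # [1, 2, 3, 2, 1]
print(not_equals_list_1)  # [5, 4, 3]
print(not_equals_list_2)  # [6]
```

Because every match is a one-to-one pairing, the same matched list can serve as both result_list_1 and result_list_2. The repeated `in` and `remove` calls make this quadratic in the worst case.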
from collections import Counter
list_1 = [1, 5, 2, 3, 2, 4, 1, 3]
list_2 = [1, 2, 2, 3, 1, 6]
c1 = Counter(list_1)
c2 = Counter(list_2)
result_list_1 = [*(c1 & c2).elements()]
result_list_2 = result_list_1[:]
not_equals_list_1 = [*(c1 - c2).elements()]
not_equals_list_2 = [*(c2 - c1).elements()]
print(result_list_1) # [1, 1, 2, 2, 3]
print(result_list_2) # [1, 1, 2, 2, 3]
print(not_equals_list_1) # [5, 3, 4]
print(not_equals_list_2) # [6]

Compare many values and tell if none of them are equal

I'm working on a project and I need to compare some values with each other and tell if they DO NOT match. I have a list of thirteen lists, each with more than 500 values. All thirteen lists have the same length. I would like to find the indices at which the values in the thirteen lists all differ from one another.
However I tried to simplify the problem by making three lists and each of those contain four items.
list1 = [1, 2, 2, 2]
list2 = [1, 3, 2, 2]
list3 = [2, 4, 2, 2]
Blist = [list1, list2, list3]
for i in range(len(Blist)):  # 0, 1, 2
    for j in range(len(Blist)):  # 0, 1, 2
        if i == j:
            pass
        else:
            for k in range(len(list1)):  # 0, 1, 2, 3
                st = Blist[i][k] != Blist[j][k]
                print(st)
I could compare two lists at a time but I can't come up with the solution that would compare all items with the same index and return me a value of the index "ind" whose values don't match (when you compare list1[ind], list2[ind] and list3[ind]).
If there were only three lists I could write
for i in range(len(list1)):
    if list1[i] != list2[i] and list1[i] != list3[i] and list2[i] != list3[i]:
        print(i)
But I'd like to solve a problem even if it has hundreds of lists with hundreds of items.
For every index, create a set from the values found at that index in each nested list. As a set can't have duplicate elements, the length of the set should be equal to the total number of nested lists. Otherwise, there were duplicates, which means the values at that index were not all unique.
values = [
[1, 2, 2, 2, 5],
[1, 3, 2, 2, 7],
[2, 4, 2, 2, 1]
]
n = len(values[0]) # Number of values in each nested list
total = len(values) # Total number of nested lists
for i in range(n):
    s = {v[i] for v in values}
    if len(s) == total:
        print(i)
Output:
1
4
If you've understood the above approach, the code can be cut down using a somewhat functional approach. Basically 2 lines of python code. (written in multiple lines for improved readability).
values = [
[1, 2, 2, 2, 5],
[1, 3, 2, 2, 7],
[2, 4, 2, 2, 1]
]
total = len(values)
# Using a list comprehension to create a list with the unique indices
unique_indices = [
idx
for idx, s in enumerate(map(set, zip(*values)))
if len(s) == total
]
print(unique_indices)
Output:
[1, 4]
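To make the zip(*values) step less opaque, here is a small sketch that spells out the intermediate results. The names columns, distinct_counts and all_distinct_indices are illustrative only.

```python
values = [
    [1, 2, 2, 2, 5],
    [1, 3, 2, 2, 7],
    [2, 4, 2, 2, 1],
]

# zip(*values) transposes the nested lists: one tuple per index position.
columns = list(zip(*values))
print(columns)  # [(1, 1, 2), (2, 3, 4), (2, 2, 2), (2, 2, 2), (5, 7, 1)]

# The number of distinct values at each position.
distinct_counts = [len(set(column)) for column in columns]
print(distinct_counts)  # [2, 3, 1, 1, 3]


def all_distinct_indices(rows):
    """Return the indices at which every row holds a different value."""
    return [
        idx
        for idx, column in enumerate(zip(*rows))
        if len(set(column)) == len(rows)
    ]


print(all_distinct_indices(values))  # [1, 4]
```

Note that zip stops at the shortest row, so this assumes all rows have the same length, as stated in the question.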
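If you are allowed to use NumPy:
import numpy as np
array_nd = np.array(Blist)
uniqueValues, indicesList, occurCount = np.unique(array_nd, return_index=True, return_counts=True)
From the result, keep the values whose occurCount is 1; their positions are given by indicesList. Note that np.unique flattens the array by default, so these counts and indices refer to the flattened array rather than to individual positions across the lists.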

Finding only pairs in list using list comprehension

Looking for a fancy one line solution to finding pairs of items in a list using list comprehension.
I've got some code that finds multiples, but can't figure out how to split those multiples into pairs.
lst = [1,2,4,2,2,3,3,1,1,1,2,4,3,4,1]
len(set([x for x in lst if lst.count(x) > 1]))
Code above returns 4. Answer should be 6 pairs, [1,1,1,1,1] = 2, [2,2,2,2] = 2, [3,3,3] = 1 and [4,4,4] = 1.
Another approach would be using collections.Counter:
>>> from collections import Counter
>>>
>>> lst = [1, 2, 4, 2, 2, 3, 3, 1, 1, 1, 2, 4, 3, 4, 1]
>>>
>>> c = Counter(lst)
>>> c
Counter({1: 5, 2: 4, 4: 3, 3: 3})
>>>
>>> sum(item // 2 for item in c.values())
6
and the one line equivalent:
>>> sum(item // 2 for item in Counter(lst).values())
6
A one-liner with no other intermediate variables would be:
sum(lst.count(x)//2 for x in set(lst))
It loops over set(lst) which contains all the distinct numbers in lst, and adds their pair counts.
You can do the following (if I've understood your pairing method properly):
lst = [1,2,4,2,2,3,3,1,1,1,2,4,3,4,1]
the_dict = {x: lst.count(x) // 2 for x in lst}
print(sum(the_dict.values()))
> 6
print(the_dict)
> {1: 2, 2: 2, 4: 1, 3: 1}
This makes a dictionary with the count of all pairs, then you can sum the values in the dictionary to get the pair count. Then you also have the dictionary available with the pair count of each value, if you need it.
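If the pairs themselves are wanted rather than just their number, a hedged one-expression sketch built on Counter could look like this. The helper name list_pairs is an assumption.

```python
from collections import Counter


def list_pairs(items):
    """Return one (value, value) tuple for every complete pair in items."""
    return [
        (value, value)
        for value, count in Counter(items).items()
        for _ in range(count // 2)  # leftover single items are dropped
    ]


lst = [1, 2, 4, 2, 2, 3, 3, 1, 1, 1, 2, 4, 3, 4, 1]
pairs = list_pairs(lst)
print(pairs)       # [(1, 1), (1, 1), (2, 2), (2, 2), (4, 4), (3, 3)]
print(len(pairs))  # 6
```

Counter tallies the list in a single pass, so this avoids calling lst.count once per element.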

List comprehension with for loop

How can I double the first n odd numbers in a list using list comprehension?
Here is my solution:
>>> n = 2
>>> lst = [1, 2, 3, 4, 5, 6]
>>> lst = [num for num in lst if num % 2 == 1] + [num for num in lst if num % 2 == 0]
>>> lst = [num * 2 for num in lst[:n]] + lst[n:]
>>> print(lst)
[2, 6, 5, 2, 4, 6]
You can see that I can't keep the same order of lst anymore...
More example:
n = 2
lst = [2, 2, 2, 2, 1, 2, 3]
output: lst = [2, 2, 2, 2, 2, 2, 6]
Solution for the original requirement to “double the first n numbers in a list if they are odd”:
Since you do not want to remove any items from your original list, you cannot use the filter of the list comprehension syntax (the if after the for). So what you need to do instead is simply transform the item you are putting into the target list.
Your logic is something like this for an element x at index i:
def transform(x, i, n):
    if i < n:
        if x % 2 == 1:
            return x * 2
    return x
So you can use that exact function and use it in your list comprehension:
>>> n = 2
>>> lst = [1, 2, 3, 4, 5, 6]
>>> [transform(x, i, n) for i, x in enumerate(lst)]
[2, 2, 3, 4, 5, 6]
And of course, you can put this also inline into the list comprehension:
>>> [x * 2 if i < n and x % 2 == 1 else x for i, x in enumerate(lst)]
[2, 2, 3, 4, 5, 6]
First n odd numbers:
If you want to double the first n odd numbers, you cannot solve it with the index-based approach above. You need to remember how many odd numbers you have already encountered while going through the list, which means you need some kind of “memory”. Such a thing is not a good fit for a list comprehension, since list comprehensions are supposed to transform one item at a time without having side effects.
So instead, you would simply do this the straightforward way:
n = 2
lst = [2, 2, 2, 2, 1, 2, 3]
result = []
for x in lst:
    if x % 2 == 1 and n > 0:
        result.append(x * 2)
        n -= 1
    else:
        result.append(x)
print(result) # [2, 2, 2, 2, 2, 2, 6]
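If a comprehension is still wanted, an assignment expression (Python 3.8 and later) can carry the counter. This is only a sketch and arguably less readable than the loop; the function name double_first_n_odds is made up for the example.

```python
def double_first_n_odds(lst, n):
    """Double the first n odd numbers of lst, keeping every item in place."""
    seen = 0  # number of odd items encountered so far
    # `seen := seen + 1` only runs for odd items, because `and` short-circuits.
    return [x * 2 if x % 2 and (seen := seen + 1) <= n else x for x in lst]


print(double_first_n_odds([2, 2, 2, 2, 1, 2, 3], 2))  # [2, 2, 2, 2, 2, 2, 6]
print(double_first_n_odds([1, 2, 3, 4, 5, 6], 2))     # [2, 2, 6, 4, 5, 6]
```

The counter is a side effect inside the comprehension, which is exactly the kind of hidden state the paragraph above warns about, so the explicit loop is usually the clearer choice.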
For this to work, you'll need to keep count of the odd numbers that you've already seen. For example, you could instantiate an itertools.count iterator and advance it each time an odd number is encountered:
from itertools import count
def f(l, n):
    odd = count()
    return [x * 2 if x % 2 and next(odd) < n else x for x in l]
>>> f([1, 2, 3, 4, 5, 6], 2)
[2, 2, 6, 4, 5, 6]
>>> f([2, 2, 2, 2, 1, 2, 3], 2)
[2, 2, 2, 2, 2, 2, 6]
Use the ternary operator.
n = 2
lst = [1, 2, 3, 4, 5, 6]
lst = [x * 2 if x % 2 == 1 and i < n else x for i, x in enumerate(lst)]
or
lst[:n] = [x * 2 if x % 2 == 1 else x for x in lst[:n]]
Update: Under the new requirement of doubling first n odd integers:
lst = [1, 2, 3, 4, 5, 6]
class Doubler:
    def __init__(self, n):
        self.n = n

    def proc(self, x):
        if self.n > 0 and x % 2:
            self.n -= 1
            return 2 * x
        return x
# Double first 2 odd elements
d = Doubler(n=2)
res = [d.proc(x) for x in lst]
print(res)
# [2, 2, 6, 4, 5, 6]
Name things with specificity, and the logic is exposed.
How can I double the first n odd numbers in a list using list comprehension?
We have odd numbers: (v for v in l if v%2). This is a filter.
We can take the first n of them using islice(odds, n). We call this a slice; other languages might call it "take". Doubling them is a per-item operation, so a map. Join these operations and we arrive at one answer to your question:
from itertools import islice
[v*2 for v in islice((v for v in l if v%2), n)]
However, that isn't what you wanted. The issue is specificity; your question doesn't say what to do with other items than the first n odd ones, so I have just ignored them.
So what do we do if we want a replication of all the items your question did not mention? This means we have three groups: early odds, late odds, and evens, all processed distinctly. The latter may be mixed in arbitrarily, while we know late odds come after early odds. It's impractical to split them into individual streams, as you've shown, since that doesn't preserve their relative order.
I'll apply a few more itertools functions to solve this problem.
from itertools import repeat, chain
oddfactors = chain(repeat(2, n), repeat(1))
outlist = [v*next(oddfactors) if v%2 else v
for v in inlist]
Note that the iterator oddfactors is read for each odd item, not even items, because the if-else expression doesn't evaluate the expression if it's not being used. The iterator is consumed and you need to create another to perform the work again.
It is possible to place the oddfactors iterator's creation (and entire scope) within the list comprehension, but the first way I can think of is incredibly ugly:
from itertools import repeat, chain
outlist = [v*next(oddfactors) if v%2 else v
for v,oddfactors in zip(
inlist,
repeat(chain(repeat(2, n), repeat(1)))
)]
The trick here is to ensure we create the chained iterator only once, then feed it into each mapping operation. This exercise sure didn't help readability or performance. Using a nested comprehension would make it a bit cleaner but there's still only the one iterator, so it's a misleading hack.
outlist = [v*next(oddfactors) if v%2 else v
for oddfactors in [chain(repeat(2, n), repeat(1))]
for v in inlist]
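Because the oddfactors iterator is used up after one pass, one way to avoid the single-use problem is to create it inside a function, so every call starts with a fresh iterator. A minimal sketch, with double_leading_odds as an assumed name:

```python
from itertools import chain, repeat


def double_leading_odds(inlist, n):
    """Multiply the first n odd items by 2 and leave everything else as is."""
    # n twos followed by an endless run of ones: one factor per odd item.
    oddfactors = chain(repeat(2, n), repeat(1))
    # next(oddfactors) is only evaluated for odd items.
    return [v * next(oddfactors) if v % 2 else v for v in inlist]


print(double_leading_odds([1, 2, 3, 4, 5, 6], 2))     # [2, 2, 6, 4, 5, 6]
print(double_leading_odds([2, 2, 2, 2, 1, 2, 3], 2))  # [2, 2, 2, 2, 2, 2, 6]
```

This keeps the comprehension readable while confining the iterator's state to a single call.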
How about this?
n = 2
lst = [1, 2, 3, 4, 5, 6]
for i in range(n):
    lst[i] = lst[i] * 2
[2*num if num%2 else num for num in lst]. The expression a if b else c evaluates to a if b is true, otherwise c.
