Python: Check the occurrences in a list against a value

Python: Check the occurrences in a list against a value - python

lst = [1,2,3,4,1]
I want to know 1 occurs twice in this list, is there any efficient way to do?

lst.count(1) would return the number of times it occurs. If you're going to be counting items in a list, O(n) is what you're going to get.
The general function on the list is list.count(x), and will return the number of times x occurs in a list.

Are you asking whether every item in the list is unique?
len(set(lst)) == len(lst)
Whether 1 occurs more than once?
lst.count(1) > 1
Note that the above is not maximally efficient, because it won't short-circuit -- even if 1 occurs twice, it will still count the rest of the occurrences. If you want it to short-circuit you will have to write something a little more complicated.
Whether the first element occurs more than once?
lst[0] in lst[1:]
How often each element occurs?
import collections
collections.Counter(lst)
Something else?

For multiple occurrences, this give you the index of each occurence:
>>> lst=[1,2,3,4,5,1]
>>> tgt=1
>>> found=[]
>>> for index, suspect in enumerate(lst):
... if(tgt==suspect):
... found.append(index)
...
>>> print len(found), "found at index:",", ".join(map(str,found))
2 found at index: 0, 5
If you want the count of each item in the list:
>>> lst=[1,2,3,4,5,2,2,1,5,5,5,5,6]
>>> count={}
>>> for item in lst:
... count[item]=lst.count(item)
...
>>> count
{1: 2, 2: 3, 3: 1, 4: 1, 5: 5, 6: 1}

def valCount(lst):
res = {}
for v in lst:
try:
res[v] += 1
except KeyError:
res[v] = 1
return res
u = [ x for x,y in valCount(lst).iteritems() if y > 1 ]
u is now a list of all values which appear more than once.
Edit:
#katrielalex: thank you for pointing out collections.Counter, of which I was not previously aware. It can also be written more concisely using a collections.defaultdict, as demonstrated in the following tests. All three methods are roughly O(n) and reasonably close in run-time performance (using collections.defaultdict is in fact slightly faster than collections.Counter).
My intention was to give an easy-to-understand response to what seemed a relatively unsophisticated request. Given that, are there any other senses in which you consider it "bad code" or "done poorly"?
import collections
import random
import time
def test1(lst):
res = {}
for v in lst:
try:
res[v] += 1
except KeyError:
res[v] = 1
return res
def test2(lst):
res = collections.defaultdict(lambda: 0)
for v in lst:
res[v] += 1
return res
def test3(lst):
return collections.Counter(lst)
def rndLst(lstLen):
r = random.randint
return [r(0,lstLen) for i in xrange(lstLen)]
def timeFn(fn, *args):
st = time.clock()
res = fn(*args)
return time.clock() - st
def main():
reps = 5000
res = []
tests = [test1, test2, test3]
for t in xrange(reps):
lstLen = random.randint(10,50000)
lst = rndLst(lstLen)
res.append( [lstLen] + [timeFn(fn, lst) for fn in tests] )
res.sort()
return res
And the results, for random lists containing up to 50,000 items, are as follows:
(Vertical axis is time in seconds, horizontal axis is number of items in list)

Another way to get all items that occur more than once:
lst = [1,2,3,4,1]
d = {}
for x in lst:
d[x] = x in d
print d[1] # True
print d[2] # False
print [x for x in d if d[x]] # [1]

You could also sort the list which is O(n*log(n)), then check the adjacent elements for equality, which is O(n). The result is O(n*log(n)). This has the disadvantage of requiring the entire list be sorted before possibly bailing when a duplicate is found.
For a large list with a relatively rare duplicates, this could be the about the best you can do. The best way to approach this really does depend on the size of the data involved and its nature.

Related

how to format a one line conditional return statement?

I am trying to write a function called find_it(seq) that, given list of numbers, returns the number that appears an odd amount of times.
I have tried rearranging the return and for loop.
and tried without the else clause.
can someone point out how to format it?
thanks
def find_it(seq):
#return i for i in seq if seq.count(i) % 2 == 1 else 0
for i in seq: return i if seq.count(i) % 2 == 1 else: pass
#this is my solution without the one line and without using count()
def find_it(seq):
dic = {}
for i in seq:
if i not in dic:
dic.update({i:1})
else:
dic[i] += 1
print(dic)
for item,num in dic.items():
if num % 2 == 1:
return item

If you insist on making one-liner loop I suggest you use generator with next, this will make the code more readable
def find_it(seq):
return next((i for i in seq if seq.count(i) % 2 == 1), None)
However the more efficient way will be a simple loop
def find_it(seq):
for i in seq:
if seq.count(i) % 2 == 1:
return i

def find_it(seq):
return set([el for el in seq if seq.count(el) % 2 == 1])
> print(find_it([1, 2, 3, 1, 1, 2, 3]))
{1}
This snippet returns a set of elements present an odd number of times in a list.
It's not as efficient as can be, as checked elements present multiple times are still counted. For example, count(2) returns an int of how many 2s are in the list, but because of how the loop is, the next time the program sees a 2, it stills calculates the .count of 2 even though it's done it before.
This can be rectified by removing all the occurrences of an element from the list after it's been checked, or ignore checked elements. I was unable to find a way to do this in one line as you requested.

I digress only because OP seems intent on efficiency.
For this problem, the technique used can be influenced by the data being processed. This is best explained by example. Here are six different ways to achieve the same objective.
from collections import Counter
from timeit import timeit
# even number of 1s, odd number of 2s
list_ = [1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,2,2,2]
def find_it_1(seq):
for k, v in Counter(seq).items():
if v % 2:
return k
def find_it_2(seq):
for i in seq:
if seq.count(i) % 2 :
return i
def find_it_3(seq):
s = set()
for e in seq:
if e not in s:
if seq.count(e) % 2:
return e
s.add(e)
def find_it_4(seq):
return next((i for i in seq if seq.count(i) % 2), None)
def find_it_5(seq):
for e in set(seq):
if seq.count(e) % 2:
return e
def find_it_6(seq):
d = {}
for e in seq:
d[e] = d.get(e, 0) + 1
for k, v in d.items():
if v % 2:
return k
for func in find_it_1, find_it_2, find_it_3, find_it_4, find_it_5, find_it_6:
print(func.__name__, timeit(lambda: func(list_)))
Output:
find_it_1 1.627880711999751
find_it_2 2.23142556699986
find_it_3 0.9605982989996846
find_it_4 2.4646536830000514
find_it_5 0.6783656980001069
find_it_6 1.9190425920000962
Now, let's change the data as follows:
list_ = [2,2,2,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1]
Note that there are 3 occurrences of 2 and that they're at the start of the list. This results in:
find_it_1 1.574513012999887
find_it_2 0.3627374699999564
find_it_3 0.4003442379998887
find_it_4 0.5936855530007961
find_it_5 0.674294768999971
find_it_6 1.8698847380001098
Quod Erat Demonstrandum

How to find first value in a list having no duplicates?

l1 = ['A','B','C','D','A','B']
l2 = []
'C' is the first value in list l1, i want to create a function so that it returns C in l2.

In 3.6 and higher, this is very easy. Now that dicts preserve insertion order, collections.Counter can be used to efficiently count all elements in a single pass, then you can just scan the resulting Counter in order to find the first element with a count of 1:
from collections import Counter
l1 = ['A','B','C','D','A','B']
l2 = [next(k for k, v in Counter(l1).items() if v == 1)]
Work is strictly O(n), with only one pass of the input required (plus a partial pass of the unique values in the Counter itself), and the code is incredibly simple. In modern Python, Counter even has a C accelerator for counting inputs that pushes all the Counter construction work to the C layer, making it impossible to beat. If you want to account for the possibility that no such element exists, just wrap the l2 initialization to make it:
try:
l2 = [next(k for k, v in Counter(l1).items() if v == 1)]
except StopIteration:
l2 = []
# ... whatever else makes sense for your scenario ...
or avoid exception handling with itertools.islice (so l2 is 0-1 items, and it still short-circuits once a hit is found):
from itertools import islice
l2 = list(islice((k for k, v in Counter(l1).items() if v == 1), 1))

You can convert list to string and then compare index of each character from left and right using find and rfind functions of string. It stops counting as soon as the first match is found,
l1 = ['A','B','C','D','A','B']
def i_list(input):
l1 = ''.join(input)
for i in l1:
if l1.find(i) == l1.rfind(i):
return(i)
print(i_list(l1))
# output
C

An implementation using a defaultdict:
# Initialize
from collections import defaultdict
counts = defaultdict(int)
# Count all occurrences
for item in l1:
counts[item] += 1
# Find the first non-duplicated item
for item in l1:
if counts[item] == 1:
l2 = [item]
break
else:
l2 = []

As a follow up to ShadowRanger's answer, if you're using a lower version of Python, it's not that more complicated to filter the original list so that you don't have to rely on the ordering of the counter items:
from collections import Counter
l1 = ['A','B','C','D','A','B']
c = Counter(l1)
l2 = [x for x in l1 if c[x] == 1][:1]
print(l2) # ['C']
This is also O(n).

We can do it also with "numpy".
def find_first_non_duplicate(l1):
indexes_counts = np.asarray(np.unique(l1, return_counts=True, return_index=True)[1:]).T
(not_duplicates,) = np.where(indexes_counts[:, 1] == 1)
if not_duplicates.size > 0:
return [l1[np.min(indexes_counts[not_duplicates, 0])]]
else:
return []
print(find_first_non_duplicate(l1))
# output
['C']

A faster way to get the first unique element in a list:
Check each element one by one and store it in a dict.
Loop through the dict and check the first element whose count is
def countElement(a):
"""
returns a dict of element and its count
"""
g = {}
for i in a:
if i in g:
g[i] +=1
else:
g[i] =1
return g
#List to be processed - input_list
input_list = [1,1,1,2,2,2,2,3,3,4,5,5,234,23,3,12,3,123,12,31,23,13,2,4,23,42,42,34,234,23,42,34,23,423,42,34,23,423,4,234,23,42,34,23,4,23,423,4,23,4] #Input from user
try:
if input_list: #if list is not empty
print ([i for i,v in countElement(input_list).items() if v == 1][0]) #get the first element whose count is 1
else: #if list is empty
print ("empty list in Input")
except: #if list is empty - IndexError
print (f"Only duplicate values in list - {input_list}")

looping through loops in python?

I'm trying to solve this problem on the easy section of coderbyte and the prompt is:
Have the function ArrayAdditionI(arr) take the array of numbers stored in arr and return the string true if any combination of numbers in the array can be added up to equal the largest number in the array, otherwise return the string false. For example: if arr contains [4, 6, 23, 10, 1, 3] the output should return true because 4 + 6 + 10 + 3 = 23. The array will not be empty, will not contain all the same elements, and may contain negative numbers.
Here's my solution.
def ArrayAddition(arr):
arr = sorted(arr, reverse=True)
large = arr.pop(0)
storage = 0
placeholder = 0
for r in range(len(arr)):
for n in arr:
if n + storage == large: return True
elif n + storage < large: storage += n
else: continue
storage = 0
if placeholder == 0: placeholder = arr.pop(0)
else: arr.append(placeholder); placeholder = arr.pop(0)
return False
print ArrayAddition([2,95,96,97,98,99,100])
I'm not even sure if this is correct, but it seems to cover all the numbers I plug in. I'm wondering if there is a better way to solve this through algorithm which I know nothing of. I'm thinking a for within a for within a for, etc loop would do the trick, but I don't know how to do that.
What I have in mind is accomplishing this with A+B, A+C, A+D ... A+B+C ... A+B+C+D+E
e.g)
for i in range(len(arr):
print "III: III{}III".format(i)
storage = []
for j in range(len(arr):
print "JJ: II({}),JJ({})".format(i,j)
for k in range(len(arr):
print "K: I{}, J{}, K{}".format(i,j,k)
I've searched all over and found the suggestion of itertool, but I'm wondering if there is a way to write this code up more raw.
Thanks.

A recursive solution:
def GetSum(n, arr):
if len(arr) == 0 and n != 0:
return False
return (n == 0 or
GetSum(n, arr[1:]) or
GetSum(n-arr[0], arr[1:]))
def ArrayAddition(arr):
arrs = sorted(arr)
return GetSum(arrs[-1], arrs[:-1])
print ArrayAddition([2,95,96,97,98,99,100])
The GetSum function returns False when the required sum is non-zero and there are no items in the array. Then it checks for 3 cases:
If the required sum, n, is zero then the goal is achieved.
If we can get the sum with the remaining items after the first item is removed, then the goal is achieved.
If we can get the required sum minus the first element of the list on the rest of the list the goal is achieved.

Your solution doesn't work.
>>> ArrayAddition([10, 11, 20, 21, 30, 31, 60])
False
The simple solution is to use itertools to iterate over all subsets of the input (that don't contain the largest number):
def subsetsum(l):
l = list(l)
target = max(l)
l.remove(l)
for subset_size in xrange(1+len(l)):
for subset in itertools.combinations(l, subset_size):
if sum(subset) == target:
return True
return False
If you want to avoid itertools, you'll need to generate subsets directly. That can be accomplished by counting in binary and using the set bits to determine which elements to pick:
def subsetsum(l):
l = list(l)
target = max(l)
l.remove(l)
for subset_index in xrange(2**len(l)):
subtotal = 0
for i, num in enumerate(l):
# If bit i is set in subset_index
if subset_index & (1 << i):
subtotal += num
if subtotal == target:
return True
return False

Update: I forgot that you want to check all possible combinations. Use this instead:
def ArrayAddition(l):
for length in range(2, len(l)):
for lst in itertools.combinations(l, length):
if sum(lst) in l:
print(lst, sum(lst))
return True
return False
One-liner solution:
>>> any(any(sum(lst) in l for lst in itertools.combinations(l, length)) for length in range(2, len(l)))
Hope this helps!

Generate all the sums of the powerset and test them against the max
def ArrayAddition(L):
return any(sum(k for j,k in enumerate(L) if 1<<j&i)==max(L) for i in range(1<<len(L)))
You could improve this by doing some preprocessing - find the max first and remove it from L

One more way to do it...
Code:
import itertools
def func(l):
m = max(l)
rem = [itertools.combinations([x for x in l if not x == m],i) for i in range(2,len(l)-1)]
print [item for i in rem for item in i if sum(item)==m ]
if __name__=='__main__':
func([1,2,3,4,5])
Output:
[(1, 4), (2, 3)]
Hope this helps.. :)

If I understood the question correctly, simply this should return what you want:
2*max(a)<=sum(a)

My implementation of merging two sorted lists in linear time - what could be improved?

Fromg Google's Python Class:
E. Given two lists sorted in increasing order, create and return a merged
list of all the elements in sorted order. You may modify the passed in lists.
Ideally, the solution should work in "linear" time, making a single
pass of both lists.
Here's my solution:
def linear_merge(list1, list2):
merged_list = []
i = 0
j = 0
while True:
if i == len(list1):
return merged_list + list2[j:]
if j == len(list2):
return merged_list + list1[i:]
if list1[i] <= list2[j]:
merged_list.append(list1[i])
i += 1
else:
merged_list.append(list2[j])
j += 1
First of all, is it okay to use an infinite loop here? Should I break out of the loop using the break keyword when I'm done merging the list, or are the returns okay here?
I've seen similar questions asked here, and all the solutions look quite similar to mine, i.e. very C-like. Is there no more python-like solution? Or is this because of the nature of the algorithm?

This question covers this in more detail than you probably need. ;) The chosen answer matches your requirement. If I needed to do this myself, I would do it in the way that dbr described in his or her answer (add the lists together, sort the new list) as it is very simple.
EDIT:
I'm adding an implementation below. I actually saw this in another answer here which seems to have been deleted. I'm just hoping it wasn't deleted because it had an error which I'm not catching. ;)
def mergeSortedLists(a, b):
l = []
while a and b:
if a[0] < b[0]:
l.append(a.pop(0))
else:
l.append(b.pop(0))
return l + a + b

Here's a generator approach. You've probably noticed that a whole lot of these "generate lists" can be done well as generator functions. They're very useful: they don't require you to generate the whole list before using data from it, to keep the whole list in memory, and you can use them to directly generate many data types, not just lists.
This works if passed any iterator, not just lists.
This approach also passes one of the more useful tests: it behaves well when passed an infinite or near-infinite iterator, eg. linear_merge(xrange(10**9), xrange(10**9)).
The redundancy in the two cases could probably be reduced, which would be useful if you wanted to support merging more than two lists, but for clarity I didn't do that here.
def linear_merge(list1, list2):
"""
>>> a = [1, 3, 5, 7]
>>> b = [2, 4, 6, 8]
>>> [i for i in linear_merge(a, b)]
[1, 2, 3, 4, 5, 6, 7, 8]
>>> [i for i in linear_merge(b, a)]
[1, 2, 3, 4, 5, 6, 7, 8]
>>> a = [1, 2, 2, 3]
>>> b = [2, 2, 4, 4]
>>> [i for i in linear_merge(a, b)]
[1, 2, 2, 2, 2, 3, 4, 4]
"""
list1 = iter(list1)
list2 = iter(list2)
value1 = next(list1)
value2 = next(list2)
# We'll normally exit this loop from a next() call raising StopIteration, which is
# how a generator function exits anyway.
while True:
if value1 <= value2:
# Yield the lower value.
yield value1
try:
# Grab the next value from list1.
value1 = next(list1)
except StopIteration:
# list1 is empty. Yield the last value we received from list2, then
# yield the rest of list2.
yield value2
while True:
yield next(list2)
else:
yield value2
try:
value2 = next(list2)
except StopIteration:
# list2 is empty.
yield value1
while True:
yield next(list1)

Why stop at two lists?
Here's my generator based implementation to merge any number of sorted iterators in linear time.
I'm not sure why something like this isn't in itertools...
def merge(*sortedlists):
# Create a list of tuples containing each iterator and its first value
iterlist = [[i,i.next()] for i in [iter(j) for j in sortedlists]]
# Perform an initial sort of each iterator's first value
iterlist.sort(key=lambda x: x[1])
# Helper function to move the larger first item to its proper position
def reorder(iterlist, i):
if i == len(iterlist) or iterlist[0][1] < iterlist[i][1]:
iterlist.insert(i-1,iterlist.pop(0))
else:
reorder(iterlist,i+1)
while True:
if len(iterlist):
# Reorder the list if the 1st element has grown larger than the 2nd
if len(iterlist) > 1 and iterlist[0][1] > iterlist[1][1]:
reorder(iterlist, 1)
yield iterlist[0][1]
# try to pull the next value from the current iterator
try:
iterlist[0][1] = iterlist[0][0].next()
except StopIteration:
del iterlist[0]
else:
break
Here's an example:
x = [1,10,20,33,99]
y = [3,11,20,99,1001]
z = [3,5,7,70,1002]
[i for i in merge(x,y,z)]

hi i just did this exercise and i was wondering why not use,
def linear_merge(list1, list2):
return sorted(list1 + list2)
pythons sorted function is linear isn't it?

Here's my implementation from a previous question:
def merge(*args):
import copy
def merge_lists(left, right):
result = []
while (len(left) and len(right)):
which_list = (left if left[0] <= right[0] else right)
result.append(which_list.pop(0))
return result + left + right
lists = [arg for arg in args]
while len(lists) > 1:
left, right = copy.copy(lists.pop(0)), copy.copy(lists.pop(0))
result = merge_lists(left, right)
lists.append(result)
return lists.pop(0)

Another generator:
def merge(xs, ys):
xs = iter(xs)
ys = iter(ys)
try:
y = next(ys)
except StopIteration:
for x in xs:
yield x
raise StopIteration
while True:
for x in xs:
if x > y:
yield y
break
yield x
else:
yield y
for y in ys:
yield y
break
xs, ys, y = ys, xs, x

I agree with other answers that extending and sorting is the most straightforward way, but if you must merge, this will be a little faster because it does not make two calls to len every iteration nor does it do a bounds check. The Python pattern, if you could call it that, is to avoid testing for a rare case and catch the exception instead.
def linear_merge(list1, list2):
merged_list = []
i = 0
j = 0
try:
while True:
if list1[i] <= list2[j]:
merged_list.append(list1[i])
i += 1
else:
merged_list.append(list2[j])
j += 1
except IndexError:
if i == len(list1):
merged_list.extend(list2[j:])
if j == len(list2):
merged_list.extend(list1[i:])
return merged_list
edit
Optimized per John Machin's comment. Moved try outside of while True and extended merged_list upon exception.

According to a note here:
# Note: the solution above is kind of cute, but unforunately list.pop(0)
# is not constant time with the standard python list implementation, so
# the above is not strictly linear time.
# An alternate approach uses pop(-1) to remove the endmost elements
# from each list, building a solution list which is backwards.
# Then use reversed() to put the result back in the correct order. That
# solution works in linear time, but is more ugly.
and this link http://www.ics.uci.edu/~pattis/ICS-33/lectures/complexitypython.txt
append is O(1), reverse is O(n) but then it also says that pop is O(n) so which is which? Anyway I have modified the accepted answer to use pop(-1):
def linear_merge(list1, list2):
# +++your code here+++
ret = []
while list1 and list2:
if list1[-1] > list2[-1]:
ret.append(list1.pop(-1))
else:
ret.append(list2.pop(-1))
ret.reverse()
return list1 + list2 + ret

This solution runs in linear time and without editing l1 and l2:
def merge(l1, l2):
m, m2 = len(l1), len(l2)
newList = []
l, r = 0, 0
while l < m and r < m2:
if l1[l] < l2[r]:
newList.append(l1[l])
l += 1
else:
newList.append(l2[r])
r += 1
return newList + l1[l:] + l2[r:]

Find the most common element in a list

What is an efficient way to find the most common element in a Python list?
My list items may not be hashable so can't use a dictionary.
Also in case of draws the item with the lowest index should be returned. Example:
>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'

A simpler one-liner:
def most_common(lst):
return max(set(lst), key=lst.count)

Borrowing from here, this can be used with Python 2.7:
from collections import Counter
def Most_Common(lst):
data = Counter(lst)
return data.most_common(1)[0][0]
Works around 4-6 times faster than Alex's solutions, and is 50 times faster than the one-liner proposed by newacct.
On CPython 3.6+ (any Python 3.7+) the above will select the first seen element in case of ties. If you're running on older Python, to retrieve the element that occurs first in the list in case of ties you need to do two passes to preserve order:
# Only needed pre-3.6!
def most_common(lst):
data = Counter(lst)
return max(lst, key=data.get)

With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [itertools.groupby][1]. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools
import operator
def most_common(L):
# get an iterable of (item, iterable) pairs
SL = sorted((x, i) for i, x in enumerate(L))
# print 'SL:', SL
groups = itertools.groupby(SL, key=operator.itemgetter(0))
# auxiliary function to get "quality" for an item
def _auxfun(g):
item, iterable = g
count = 0
min_index = len(L)
for _, where in iterable:
count += 1
min_index = min(min_index, where)
# print 'item %r, count %r, minind %r' % (item, count, min_index)
return count, -min_index
# pick the highest-count/earliest item
return max(groups, key=_auxfun)[0]
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:
print most_common(['goose', 'duck', 'duck', 'goose'])
emits:
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L):
groups = itertools.groupby(sorted(L))
def _auxfun((item, iterable)):
return len(list(iterable)), -L.index(item)
return max(groups, key=_auxfun)[0]
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g
def most_common_oneliner(L):
return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]

What you want is known in statistics as mode, and Python of course has a built-in function to do exactly that for you:
>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3
Note that if there is no "most common element" such as cases where the top two are tied, this will raise StatisticsError on Python
<=3.7, and on 3.8 onwards it will return the first one encountered.

Without the requirement about the lowest index, you can use collections.Counter for this:
from collections import Counter
a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'

If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.
def most_common(lst):
cur_length = 0
max_length = 0
cur_i = 0
max_i = 0
cur_item = None
max_item = None
for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
if cur_item is None or cur_item != item:
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
max_length = cur_length
max_i = cur_i
max_item = cur_item
cur_length = 1
cur_i = i
cur_item = item
else:
cur_length += 1
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
return cur_item
return max_item

This is an O(n) solution.
mydict = {}
cnt, itm = 0, ''
for item in reversed(lst):
mydict[item] = mydict.get(item, 0) + 1
if mydict[item] >= cnt :
cnt, itm = mydict[item], item
print itm
(reversed is used to make sure that it returns the lowest index item)

Sort a copy of the list and find the longest run. You can decorate the list before sorting it with the index of each element, and then choose the run that starts with the lowest index in the case of a tie.

A one-liner:
def most_common (lst):
return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]

I am doing this using scipy stat module and lambda:
import scipy.stats
lst = [1,2,3,4,5,6,7,5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))
Result:
most_freq_val = 5

# use Decorate, Sort, Undecorate to solve the problem
def most_common(iterable):
# Make a list with tuples: (item, index)
# The index will be used later to break ties for most common item.
lst = [(x, i) for i, x in enumerate(iterable)]
lst.sort()
# lst_final will also be a list of tuples: (count, index, item)
# Sorting on this list will find us the most common item, and the index
# will break ties so the one listed first wins. Count is negative so
# largest count will have lowest value and sort first.
lst_final = []
# Get an iterator for our new list...
itr = iter(lst)
# ...and pop the first tuple off. Setup current state vars for loop.
count = 1
tup = next(itr)
x_cur, i_cur = tup
# Loop over sorted list of tuples, counting occurrences of item.
for tup in itr:
# Same item again?
if x_cur == tup[0]:
# Yes, same item; increment count
count += 1
else:
# No, new item, so write previous current item to lst_final...
t = (-count, i_cur, x_cur)
lst_final.append(t)
# ...and reset current state vars for loop.
x_cur, i_cur = tup
count = 1
# Write final item after loop ends
t = (-count, i_cur, x_cur)
lst_final.append(t)
lst_final.sort()
answer = lst_final[0][2]
return answer
print most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e']) # prints 'e'
print most_common(['goose', 'duck', 'duck', 'goose']) # prints 'goose'

Building on Luiz's answer, but satisfying the "in case of draws the item with the lowest index should be returned" condition:
from statistics import mode, StatisticsError
def most_common(l):
try:
return mode(l)
except StatisticsError as e:
# will only return the first element if no unique mode found
if 'no unique mode' in e.args[0]:
return l[0]
# this is for "StatisticsError: no mode for empty data"
# after calling mode([])
raise
Example:
>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data

Simple one line solution
moc= max([(lst.count(chr),chr) for chr in set(lst)])
It will return most frequent element with its frequency.

You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)
itemList = ['hi', 'hi', 'hello', 'bye']
counter = {}
maxItemCount = 0
for item in itemList:
try:
# Referencing this will cause a KeyError exception
# if it doesn't already exist
counter[item]
# ... meaning if we get this far it didn't happen so
# we'll increment
counter[item] += 1
except KeyError:
# If we got a KeyError we need to create the
# dictionary key
counter[item] = 1
# Keep overwriting maxItemCount with the latest number,
# if it's higher than the existing itemCount
if counter[item] > maxItemCount:
maxItemCount = counter[item]
mostPopularItem = item
print mostPopularItem

ans = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
all_ans={4: 1, 2: 0}
max_key = max(all_ans.keys())
4
print(all_ans[max_key])
1

#This will return the list sorted by frequency:
def orderByFrequency(list):
listUniqueValues = np.unique(list)
listQty = []
listOrderedByFrequency = []
for i in range(len(listUniqueValues)):
listQty.append(list.count(listUniqueValues[i]))
for i in range(len(listQty)):
index_bigger = np.argmax(listQty)
for j in range(listQty[index_bigger]):
listOrderedByFrequency.append(listUniqueValues[index_bigger])
listQty[index_bigger] = -1
return listOrderedByFrequency
#And this will return a list with the most frequent values in a list:
def getMostFrequentValues(list):
if (len(list) <= 1):
return list
list_most_frequent = []
list_ordered_by_frequency = orderByFrequency(list)
list_most_frequent.append(list_ordered_by_frequency[0])
frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
index = 0
while(index < len(list_ordered_by_frequency)):
index = index + frequency
if(index < len(list_ordered_by_frequency)):
testValue = list_ordered_by_frequency[index]
testValueFrequency = list_ordered_by_frequency.count(testValue)
if (testValueFrequency == frequency):
list_most_frequent.append(testValue)
else:
break
return list_most_frequent
#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))
Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]

Here:
def most_common(l):
max = 0
maxitem = None
for x in set(l):
count = l.count(x)
if count > max:
max = count
maxitem = x
return maxitem
I have a vague feeling there is a method somewhere in the standard library that will give you the count of each element, but I can't find it.

This is the obvious slow solution (O(n^2)) if neither sorting nor hashing is feasible, but equality comparison (==) is available:
def most_common(items):
if not items:
raise ValueError
fitems = []
best_idx = 0
for item in items:
item_missing = True
i = 0
for fitem in fitems:
if fitem[0] == item:
fitem[1] += 1
d = fitem[1] - fitems[best_idx][1]
if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
best_idx = i
item_missing = False
break
i += 1
if item_missing:
fitems.append([item, 1, i])
return items[best_idx]
But making your items hashable or sortable (as recommended by other answers) would almost always make finding the most common element faster if the length of your list (n) is large. O(n) on average with hashing, and O(n*log(n)) at worst for sorting.

>>> li = ['goose', 'duck', 'duck']
>>> def foo(li):
st = set(li)
mx = -1
for each in st:
temp = li.count(each):
if mx < temp:
mx = temp
h = each
return h
>>> foo(li)
'duck'

I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with.
def mostPopular(l):
mpEl=None
mpIndex=0
mpCount=0
curEl=None
curCount=0
for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
curCount=curCount+1 if el==curEl else 1
curEl=el
if curCount>mpCount \
or (curCount==mpCount and i<mpIndex):
mpEl=curEl
mpIndex=i
mpCount=curCount
return mpEl, mpCount, mpIndex
I timed it against Alex's solution and it's about 10-15% faster for short lists, but once you go over 100 elements or more (tested up to 200000) it's about 20% slower.

def most_frequent(List):
counter = 0
num = List[0]
for i in List:
curr_frequency = List.count(i)
if(curr_frequency> counter):
counter = curr_frequency
num = i
return num
List = [2, 1, 2, 2, 1, 3]
print(most_frequent(List))

Hi this is a very simple solution, with linear time complexity
L = ['goose', 'duck', 'duck']
def most_common(L):
current_winner = 0
max_repeated = None
for i in L:
amount_times = L.count(i)
if amount_times > current_winner:
current_winner = amount_times
max_repeated = i
return max_repeated
print(most_common(L))
"duck"
Where number, is the element in the list that repeats most of the time

numbers = [1, 3, 7, 4, 3, 0, 3, 6, 3]
max_repeat_num = max(numbers, key=numbers.count) *# which number most* frequently
max_repeat = numbers.count(max_repeat_num) *#how many times*
print(f" the number {max_repeat_num} is repeated{max_repeat} times")

def mostCommonElement(list):
count = {} // dict holder
max = 0 // keep track of the count by key
result = None // holder when count is greater than max
for i in list:
if i not in count:
count[i] = 1
else:
count[i] += 1
if count[i] > max:
max = count[i]
result = i
return result
mostCommonElement(["a","b","a","c"]) -> "a"

The most common element should be the one which is appearing more than N/2 times in the array where N being the len(array). The below technique will do it in O(n) time complexity, with just consuming O(1) auxiliary space.
from collections import Counter
def majorityElement(arr):
majority_elem = Counter(arr)
size = len(arr)
for key, val in majority_elem.items():
if val > size/2:
return key
return -1

def most_common(lst):
if max([lst.count(i)for i in lst]) == 1:
return False
else:
return max(set(lst), key=lst.count)

def popular(L):
C={}
for a in L:
C[a]=L.count(a)
for b in C.keys():
if C[b]==max(C.values()):
return b
L=[2,3,5,3,6,3,6,3,6,3,7,467,4,7,4]
print popular(L)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: Check the occurrences in a list against a value - python

lst = [1,2,3,4,1] I want to know 1 occurs twice in this list, is there any efficient way to do?

lst.count(1) would return the number of times it occurs. If you're going to be counting items in a list, O(n) is what you're going to get. The general function on the list is list.count(x), and will return the number of times x occurs in a list.

Another way to get all items that occur more than once: lst = [1,2,3,4,1] d = {} for x in lst: d[x] = x in d print d[1] # True print d[2] # False print [x for x in d if d[x]] # [1]

Related

how to format a one line conditional return statement?

How to find first value in a list having no duplicates?

looping through loops in python?

My implementation of merging two sorted lists in linear time - what could be improved?

Find the most common element in a list

Categories

Resources