Finding the mode of a list - python

Given a list of items, recall that the mode of the list is the item that occurs most often.
I would like to know how to create a function that can find the mode of a list but that displays a message if the list does not have a mode (e.g., all the items in the list only appear once). I want to make this function without importing any functions. I'm trying to make my own function from scratch.

You can use the max function and a key. Have a look at python max function using 'key' and lambda expression.
max(set(lst), key=lst.count)

You can use the Counter supplied in the collections package which has a mode-esque function
from collections import Counter
data = Counter(your_list_in_here)
data.most_common() # Returns all unique items and their counts
data.most_common(1) # Returns the highest occurring item
Note: Counter is new in python 2.7 and is not available in earlier versions.

Python 3.4 includes the method statistics.mode, so it is straightforward:
>>> from statistics import mode
>>> mode([1, 1, 2, 3, 3, 3, 3, 4])
3
You can have any type of elements in the list, not just numeric:
>>> mode(["red", "blue", "blue", "red", "green", "red", "red"])
'red'

Taking a leaf from some statistics software, namely SciPy and MATLAB, these just return the smallest most common value, so if two values occur equally often, the smallest of these are returned. Hopefully an example will help:
>>> from scipy.stats import mode
>>> mode([1, 2, 3, 4, 5])
(array([ 1.]), array([ 1.]))
>>> mode([1, 2, 2, 3, 3, 4, 5])
(array([ 2.]), array([ 2.]))
>>> mode([1, 2, 2, -3, -3, 4, 5])
(array([-3.]), array([ 2.]))
Is there any reason why you can 't follow this convention?

There are many simple ways to find the mode of a list in Python such as:
import statistics
statistics.mode([1,2,3,3])
>>> 3
Or, you could find the max by its count
max(array, key = array.count)
The problem with those two methods are that they don't work with multiple modes. The first returns an error, while the second returns the first mode.
In order to find the modes of a set, you could use this function:
def mode(array):
most = max(list(map(array.count, array)))
return list(set(filter(lambda x: array.count(x) == most, array)))

Extending the Community answer that will not work when the list is empty, here is working code for mode:
def mode(arr):
if arr==[]:
return None
else:
return max(set(arr), key=arr.count)

In case you are interested in either the smallest, largest or all modes:
def get_small_mode(numbers, out_mode):
counts = {k:numbers.count(k) for k in set(numbers)}
modes = sorted(dict(filter(lambda x: x[1] == max(counts.values()), counts.items())).keys())
if out_mode=='smallest':
return modes[0]
elif out_mode=='largest':
return modes[-1]
else:
return modes

A little longer, but can have multiple modes and can get string with most counts or mix of datatypes.
def getmode(inplist):
'''with list of items as input, returns mode
'''
dictofcounts = {}
listofcounts = []
for i in inplist:
countofi = inplist.count(i) # count items for each item in list
listofcounts.append(countofi) # add counts to list
dictofcounts[i]=countofi # add counts and item in dict to get later
maxcount = max(listofcounts) # get max count of items
if maxcount ==1:
print "There is no mode for this dataset, values occur only once"
else:
modelist = [] # if more than one mode, add to list to print out
for key, item in dictofcounts.iteritems():
if item ==maxcount: # get item from original list with most counts
modelist.append(str(key))
print "The mode(s) are:",' and '.join(modelist)
return modelist

Mode of a data set is/are the member(s) that occur(s) most frequently in the set. If there are two members that appear most often with same number of times, then the data has two modes. This is called bimodal.If there are more than 2 modes, then the data would be called multimodal. If all the members in the data set appear the same number of times, then the data set has no mode at all. Following function modes() can work to find mode(s) in a given list of data:
import numpy as np; import pandas as pd
def modes(arr):
df = pd.DataFrame(arr, columns=['Values'])
dat = pd.crosstab(df['Values'], columns=['Freq'])
if len(np.unique((dat['Freq']))) > 1:
mode = list(dat.index[np.array(dat['Freq'] == max(dat['Freq']))])
return mode
else:
print("There is NO mode in the data set")
Output:
# For a list of numbers in x as
In [1]: x = [2, 3, 4, 5, 7, 9, 8, 12, 2, 1, 1, 1, 3, 3, 2, 6, 12, 3, 7, 8, 9, 7, 12, 10, 10, 11, 12, 2]
In [2]: modes(x)
Out[2]: [2, 3, 12]
# For a list of repeated numbers in y as
In [3]: y = [2, 2, 3, 3, 4, 4, 10, 10]
In [4]: modes(y)
Out[4]: There is NO mode in the data set
# For a list of strings/characters in z as
In [5]: z = ['a', 'b', 'b', 'b', 'e', 'e', 'e', 'd', 'g', 'g', 'c', 'g', 'g', 'a', 'a', 'c', 'a']
In [6]: modes(z)
Out[6]: ['a', 'g']
If we do not want to import numpy or pandas to call any function from these packages, then to get this same output, modes() function can be written as:
def modes(arr):
cnt = []
for i in arr:
cnt.append(arr.count(i))
uniq_cnt = []
for i in cnt:
if i not in uniq_cnt:
uniq_cnt.append(i)
if len(uniq_cnt) > 1:
m = []
for i in list(range(len(cnt))):
if cnt[i] == max(uniq_cnt):
m.append(arr[i])
mode = []
for i in m:
if i not in mode:
mode.append(i)
return mode
else:
print("There is NO mode in the data set")

I wrote up this handy function to find the mode.
def mode(nums):
corresponding={}
occurances=[]
for i in nums:
count = nums.count(i)
corresponding.update({i:count})
for i in corresponding:
freq=corresponding[i]
occurances.append(freq)
maxFreq=max(occurances)
keys=corresponding.keys()
values=corresponding.values()
index_v = values.index(maxFreq)
global mode
mode = keys[index_v]
return mode

Short, but somehow ugly:
def mode(arr) :
m = max([arr.count(a) for a in arr])
return [x for x in arr if arr.count(x) == m][0] if m>1 else None
Using a dictionary, slightly less ugly:
def mode(arr) :
f = {}
for a in arr : f[a] = f.get(a,0)+1
m = max(f.values())
t = [(x,f[x]) for x in f if f[x]==m]
return m > 1 t[0][0] else None

This function returns the mode or modes of a function no matter how many, as well as the frequency of the mode or modes in the dataset. If there is no mode (ie. all items occur only once), the function returns an error string. This is similar to A_nagpal's function above but is, in my humble opinion, more complete, and I think it's easier to understand for any Python novices (such as yours truly) reading this question to understand.
def l_mode(list_in):
count_dict = {}
for e in (list_in):
count = list_in.count(e)
if e not in count_dict.keys():
count_dict[e] = count
max_count = 0
for key in count_dict:
if count_dict[key] >= max_count:
max_count = count_dict[key]
corr_keys = []
for corr_key, count_value in count_dict.items():
if count_dict[corr_key] == max_count:
corr_keys.append(corr_key)
if max_count == 1 and len(count_dict) != 1:
return 'There is no mode for this data set. All values occur only once.'
else:
corr_keys = sorted(corr_keys)
return corr_keys, max_count

For a number to be a mode, it must occur more number of times than at least one other number in the list, and it must not be the only number in the list. So, I refactored #mathwizurd's answer (to use the difference method) as follows:
def mode(array):
'''
returns a set containing valid modes
returns a message if no valid mode exists
- when all numbers occur the same number of times
- when only one number occurs in the list
- when no number occurs in the list
'''
most = max(map(array.count, array)) if array else None
mset = set(filter(lambda x: array.count(x) == most, array))
return mset if set(array) - mset else "list does not have a mode!"
These tests pass successfully:
mode([]) == None
mode([1]) == None
mode([1, 1]) == None
mode([1, 1, 2, 2]) == None

Here is how you can find mean,median and mode of a list:
import numpy as np
from scipy import stats
#to take input
size = int(input())
numbers = list(map(int, input().split()))
print(np.mean(numbers))
print(np.median(numbers))
print(int(stats.mode(numbers)[0]))

Simple code that finds the mode of the list without any imports:
nums = #your_list_goes_here
nums.sort()
counts = dict()
for i in nums:
counts[i] = counts.get(i, 0) + 1
mode = max(counts, key=counts.get)
In case of multiple modes, it should return the minimum node.

Why not just
def print_mode (thelist):
counts = {}
for item in thelist:
counts [item] = counts.get (item, 0) + 1
maxcount = 0
maxitem = None
for k, v in counts.items ():
if v > maxcount:
maxitem = k
maxcount = v
if maxcount == 1:
print "All values only appear once"
elif counts.values().count (maxcount) > 1:
print "List has multiple modes"
else:
print "Mode of list:", maxitem
This doesn't have a few error checks that it should have, but it will find the mode without importing any functions and will print a message if all values appear only once. It will also detect multiple items sharing the same maximum count, although it wasn't clear if you wanted that.

This will return all modes:
def mode(numbers)
largestCount = 0
modes = []
for x in numbers:
if x in modes:
continue
count = numbers.count(x)
if count > largestCount:
del modes[:]
modes.append(x)
largestCount = count
elif count == largestCount:
modes.append(x)
return modes

For those looking for the minimum mode, e.g:case of bi-modal distribution, using numpy.
import numpy as np
mode = np.argmax(np.bincount(your_list))

Okey! So community has already a lot of answers and some of them used another function and you don't want.
let we create our very simple and easily understandable function.
import numpy as np
#Declare Function Name
def calculate_mode(lst):
Next step is to find Unique elements in list and thier respective frequency.
unique_elements,freq = np.unique(lst, return_counts=True)
Get mode
max_freq = np.max(freq) #maximum frequency
mode_index = np.where(freq==max_freq) #max freq index
mode = unique_elements[mode_index] #get mode by index
return mode
Example
lst =np.array([1,1,2,3,4,4,4,5,6])
print(calculate_mode(lst))
>>> Output [4]

How my brain decided to do it completely from scratch. Efficient and concise :) (jk lol)
import random
def removeDuplicates(arr):
dupFlag = False
for i in range(len(arr)):
#check if we found a dup, if so, stop
if dupFlag:
break
for j in range(len(arr)):
if ((arr[i] == arr[j]) and (i != j)):
arr.remove(arr[j])
dupFlag = True
break;
#if there was a duplicate repeat the process, this is so we can account for the changing length of the arr
if (dupFlag):
removeDuplicates(arr)
else:
#if no duplicates return the arr
return arr
#currently returns modes and all there occurences... Need to handle dupes
def mode(arr):
numCounts = []
#init numCounts
for i in range(len(arr)):
numCounts += [0]
for i in range(len(arr)):
count = 1
for j in range(len(arr)):
if (arr[i] == arr[j] and i != j):
count += 1
#add the count for that number to the corresponding index
numCounts[i] = count
#find which has the greatest number of occurences
greatestNum = 0
for i in range(len(numCounts)):
if (numCounts[i] > greatestNum):
greatestNum = numCounts[i]
#finally return the mode(s)
modes = []
for i in range(len(numCounts)):
if numCounts[i] == greatestNum:
modes += [arr[i]]
#remove duplicates (using aliasing)
print("modes: ", modes)
removeDuplicates(modes)
print("modes after removing duplicates: ", modes)
return modes
def initArr(n):
arr = []
for i in range(n):
arr += [random.randrange(0, n)]
return arr
#initialize an array of random ints
arr = initArr(1000)
print(arr)
print("_______________________________________________")
modes = mode(arr)
#print result
print("Mode is: ", modes) if (len(modes) == 1) else print("Modes are: ", modes)

def mode(inp_list):
sort_list = sorted(inp_list)
dict1 = {}
for i in sort_list:
count = sort_list.count(i)
if i not in dict1.keys():
dict1[i] = count
maximum = 0 #no. of occurences
max_key = -1 #element having the most occurences
for key in dict1:
if(dict1[key]>maximum):
maximum = dict1[key]
max_key = key
elif(dict1[key]==maximum):
if(key<max_key):
maximum = dict1[key]
max_key = key
return max_key

def mode(data):
lst =[]
hgh=0
for i in range(len(data)):
lst.append(data.count(data[i]))
m= max(lst)
ml = [x for x in data if data.count(x)==m ] #to find most frequent values
mode = []
for x in ml: #to remove duplicates of mode
if x not in mode:
mode.append(x)
return mode
print mode([1,2,2,2,2,7,7,5,5,5,5])

Here is a simple function that gets the first mode that occurs in a list. It makes a dictionary with the list elements as keys and number of occurrences and then reads the dict values to get the mode.
def findMode(readList):
numCount={}
highestNum=0
for i in readList:
if i in numCount.keys(): numCount[i] += 1
else: numCount[i] = 1
for i in numCount.keys():
if numCount[i] > highestNum:
highestNum=numCount[i]
mode=i
if highestNum != 1: print(mode)
elif highestNum == 1: print("All elements of list appear once.")

If you want a clear approach, useful for classroom and only using lists and dictionaries by comprehension, you can do:
def mode(my_list):
# Form a new list with the unique elements
unique_list = sorted(list(set(my_list)))
# Create a comprehensive dictionary with the uniques and their count
appearance = {a:my_list.count(a) for a in unique_list}
# Calculate max number of appearances
max_app = max(appearance.values())
# Return the elements of the dictionary that appear that # of times
return {k: v for k, v in appearance.items() if v == max_app}

#function to find mode
def mode(data):
modecnt=0
#for count of number appearing
for i in range(len(data)):
icount=data.count(data[i])
#for storing count of each number in list will be stored
if icount>modecnt:
#the loop activates if current count if greater than the previous count
mode=data[i]
#here the mode of number is stored
modecnt=icount
#count of the appearance of number is stored
return mode
print mode(data1)

import numpy as np
def get_mode(xs):
values, counts = np.unique(xs, return_counts=True)
max_count_index = np.argmax(counts) #return the index with max value counts
return values[max_count_index]
print(get_mode([1,7,2,5,3,3,8,3,2]))

Perhaps try the following. It is O(n) and returns a list of floats (or ints). It is thoroughly, automatically tested. It uses collections.defaultdict, but I'd like to think you're not opposed to using that. It can also be found at https://stromberg.dnsalias.org/~strombrg/stddev.html
def compute_mode(list_: typing.List[float]) -> typing.List[float]:
"""
Compute the mode of list_.
Note that the return value is a list, because sometimes there is a tie for "most common value".
See https://stackoverflow.com/questions/10797819/finding-the-mode-of-a-list
"""
if not list_:
raise ValueError('Empty list')
if len(list_) == 1:
raise ValueError('Single-element list')
value_to_count_dict: typing.DefaultDict[float, int] = collections.defaultdict(int)
for element in list_:
value_to_count_dict[element] += 1
count_to_values_dict = collections.defaultdict(list)
for value, count in value_to_count_dict.items():
count_to_values_dict[count].append(value)
counts = list(count_to_values_dict)
if len(counts) == 1:
raise ValueError('All elements in list are the same')
maximum_occurrence_count = max(counts)
if maximum_occurrence_count == 1:
raise ValueError('No element occurs more than once')
minimum_occurrence_count = min(counts)
if maximum_occurrence_count <= minimum_occurrence_count:
raise ValueError('Maximum count not greater than minimum count')
return count_to_values_dict[maximum_occurrence_count]

Related

What is the most efficient way of getting the intersection of k sorted arrays?

Given k sorted arrays what is the most efficient way of getting the intersection of these lists
Example
INPUT:
[[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]
Output:
[1,7]
There is a way to get the union of k sorted arrays based on what I read in the Elements of programming interviews book in nlogk time. I was wondering if there is a way to do something similar for the intersection as well
## merge sorted arrays in nlogk time [ regular appending and merging is nlogn time ]
import heapq
def mergeArys(srtd_arys):
heap = []
srtd_iters = [iter(x) for x in srtd_arys]
# put the first element from each srtd array onto the heap
for idx, it in enumerate(srtd_iters):
elem = next(it, None)
if elem:
heapq.heappush(heap, (elem, idx))
res = []
# collect results in nlogK time
while heap:
elem, ary = heapq.heappop(heap)
it = srtd_iters[ary]
res.append(elem)
nxt = next(it, None)
if nxt:
heapq.heappush(heap, (nxt, ary))
EDIT: obviously this is an algorithm question that I am trying to solve so I cannot use any of the inbuilt functions like set intersection etc
Exploiting sort order
Here is a single pass O(n) approach that doesn't require any special data structures or auxiliary memory beyond the fundamental requirement of one iterator per input.
from itertools import cycle, islice
def intersection(inputs):
"Yield the intersection of elements from multiple sorted inputs."
# intersection(['ABBCD', 'BBDE', 'BBBDDE']) --> B B D
n = len(inputs)
iters = cycle(map(iter, inputs))
try:
candidate = next(next(iters))
while True:
for it in islice(iters, n-1):
while (value := next(it)) < candidate:
pass
if value != candidate:
candidate = value
break
else:
yield candidate
candidate = next(next(iters))
except StopIteration:
return
Here's a sample session:
>>> data = [[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]
>>> list(intersection(data))
[1, 7]
>>> data = [[1,1,2,3], [1,1,4,4]]
>>> list(intersection(data))
[1, 1]
Algorithm in words
The algorithm starts by selecting the next value from the next iterator to be a candidate.
The main loop assumes a candidate has been selected and it loops over the next n - 1 iterators. For each of those iterators, it consumes values until it finds a value that is a least as large as the candidate. If that value is larger than the candidate, that value becomes the new candidate and the main loop starts again. If all n - 1 values are equal to the candidate, then the candidate is emitted and a new candidate is fetched.
When any input iterator is exhausted, the algorithm is complete.
Doing it without libraries (core language only)
The same algorithm works fine (though less beautifully) without using itertools. Just replace cycle and islice with their list based equivalents:
def intersection(inputs):
"Yield the intersection of elements from multiple sorted inputs."
# intersection(['ABBCD', 'BBDE', 'BBBDDE']) --> B B D
n = len(inputs)
iters = list(map(iter, inputs))
curr_iter = 0
try:
it = iters[curr_iter]
curr_iter = (curr_iter + 1) % n
candidate = next(it)
while True:
for i in range(n - 1):
it = iters[curr_iter]
curr_iter = (curr_iter + 1) % n
while (value := next(it)) < candidate:
pass
if value != candidate:
candidate = value
break
else:
yield candidate
it = iters[curr_iter]
curr_iter = (curr_iter + 1) % n
candidate = next(it)
except StopIteration:
return
Yes, it is possible! I've modified your example code to do this.
My answer assumes that your question is about the algorithm - if you want the fastest-running code using sets, see other answers.
This maintains the O(n log(k)) time complexity: all the code between if lowest != elem or ary != times_seen: and unbench_all = False is O(log(k)). There is a nested loop inside the main loop (for unbenched in range(times_seen):) but this only runs times_seen times, and times_seen is initially 0 and is reset to 0 after every time this inner loop is run, and can only be incremented once per main loop iteration, so the inner loop cannot do more iterations in total than the main loop. Thus, since the code inside the inner loop is O(log(k)) and runs at most as many times as the outer loop, and the outer loop is O(log(k)) and runs n times, the algorithm is O(n log(k)).
This algorithm relies upon how tuples are compared in Python. It compares the first items of the tuples, and if they are equal it, compares the second items (i.e. (x, a) < (x, b) is true if and only if a < b).
In this algorithm, unlike in the example code in the question, when an item is popped from the heap, it is not necessarily pushed again in the same iteration. Since we need to check if all sub-lists contain the same number, after a number is popped from the heap, it's sublist is what I call "benched", meaning that it is not added back to the heap. This is because we need to check if other sub-lists contain the same item, so adding this sub-list's next item is not needed right now.
If a number is indeed in all sub-lists, then the heap will look something like [(2,0),(2,1),(2,2),(2,3)], with all the first elements of the tuples the same, so heappop will select the one with the lowest sub-list index. This means that first index 0 will be popped and times_seen will be incremented to 1, then index 1 will be popped and times_seen will be incremented to 2 - if ary is not equal to times_seen then the number is not in the intersection of all sub-lists. This leads to the condition if lowest != elem or ary != times_seen:, which decides when a number shouldn't be in the result. The else branch of this if statement is for when it still might be in the result.
The unbench_all boolean is for when all sub-lists need to be removed from the bench - this could be because:
The current number is known to not be in the intersection of the sub-lists
It is known to be in the intersection of the sub-lists
When unbench_all is True, all the sub-lists that were removed from the heap are re-added. It is known that these are the ones with indices in range(times_seen) since the algorithm removes items from the heap only if they have the same number, so they must have been removed in order of index, contiguously and starting from index 0, and there must be times_seen of them. This means that we don't need to store the indices of the benched sub-lists, only the number that have been benched.
import heapq
def mergeArys(srtd_arys):
heap = []
srtd_iters = [iter(x) for x in srtd_arys]
# put the first element from each srtd array onto the heap
for idx, it in enumerate(srtd_iters):
elem = next(it, None)
if elem:
heapq.heappush(heap, (elem, idx))
res = []
# the number of tims that the current number has been seen
times_seen = 0
# the lowest number from the heap - currently checking if the first numbers in all sub-lists are equal to this
lowest = heap[0][0] if heap else None
# collect results in nlogK time
while heap:
elem, ary = heap[0]
unbench_all = True
if lowest != elem or ary != times_seen:
if lowest == elem:
heapq.heappop(heap)
it = srtd_iters[ary]
nxt = next(it, None)
if nxt:
heapq.heappush(heap, (nxt, ary))
else:
heapq.heappop(heap)
times_seen += 1
if times_seen == len(srtd_arys):
res.append(elem)
else:
unbench_all = False
if unbench_all:
for unbenched in range(times_seen):
unbenched_it = srtd_iters[unbenched]
nxt = next(unbenched_it, None)
if nxt:
heapq.heappush(heap, (nxt, unbenched))
times_seen = 0
if heap:
lowest = heap[0][0]
return res
if __name__ == '__main__':
a1 = [[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]
a2 = [[1, 1], [1, 1, 2, 2, 3]]
for arys in [a1, a2]:
print(mergeArys(arys))
An equivalent algorithm can be written like this, if you prefer:
def mergeArys(srtd_arys):
heap = []
srtd_iters = [iter(x) for x in srtd_arys]
# put the first element from each srtd array onto the heap
for idx, it in enumerate(srtd_iters):
elem = next(it, None)
if elem:
heapq.heappush(heap, (elem, idx))
res = []
# collect results in nlogK time
while heap:
elem, ary = heap[0]
lowest = elem
keep_elem = True
for i in range(len(srtd_arys)):
elem, ary = heap[0]
if lowest != elem or ary != i:
if ary != i:
heapq.heappop(heap)
it = srtd_iters[ary]
nxt = next(it, None)
if nxt:
heapq.heappush(heap, (nxt, ary))
keep_elem = False
i -= 1
break
heapq.heappop(heap)
if keep_elem:
res.append(elem)
for unbenched in range(i+1):
unbenched_it = srtd_iters[unbenched]
nxt = next(unbenched_it, None)
if nxt:
heapq.heappush(heap, (nxt, unbenched))
if len(heap) < len(srtd_arys):
heap = []
return res
You can use builtin sets and sets intersections :
d = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
result = set(d[0]).intersection(*d[1:])
{1, 7}
You can use reduce:
from functools import reduce
a = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
reduce(lambda x, y: x & set(y), a[1:], set(a[0]))
{1, 7}
I've come up with this algorithm. It doesn't exceed O(nk) I don't know if it's good enough for you. the point of this algorithm is that you can have k indexes for each array and each iteration you find the indexes of the next element in the intersection and increase every index until you exceed the bounds of an array and there are no more items in the intersection. the trick is since the arrays are sorted you can look at two elements in two different arrays and if one is bigger than the other you can instantly throw away the other because you know you cant have a smaller number than the one you are looking at. the worst case of this algorithm is that every index will be increased to the bound which takes kn time since an index cannot decrease its value.
inter = []
for n in range(len(arrays[0])):
if indexes[0] >= len(arrays[0]):
return inter
for i in range(1,k):
if indexes[i] >= len(arrays[i]):
return inter
while indexes[i] < len(arrays[i]) and arrays[i][indexes[i]] < arrays[0][indexes[0]]:
indexes[i] += 1
while indexes[i] < len(arrays[i]) and indexes[0] < len(arrays[0]) and arrays[i][indexes[i]] > arrays[0][indexes[0]]:
indexes[0] += 1
if indexes[0] < len(arrays[0]):
inter.append(arrays[0][indexes[0]])
indexes = [idx+1 for idx in indexes]
return inter
You said we can't use sets but how about dicts / hash tables? (yes I know they're basically the same thing) :D
If so, here's a fairly simple approach (please excuse the py2 syntax):
arrays = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
counts = {}
for ar in arrays:
last = None
for i in ar:
if (i != last):
counts[i] = counts.get(i, 0) + 1
last = i
N = len(arrays)
intersection = [i for i, n in counts.iteritems() if n == N]
print intersection
Same as Raymond Hettinger's solution but with more basic python code:
def intersection(arrays, unique: bool=False):
result = []
if not len(arrays) or any(not len(array) for array in arrays):
return result
pointers = [0] * len(arrays)
target = arrays[0][0]
start_step = 0
current_step = 1
while True:
idx = current_step % len(arrays)
array = arrays[idx]
while pointers[idx] < len(array) and array[pointers[idx]] < target:
pointers[idx] += 1
if pointers[idx] < len(array) and array[pointers[idx]] > target:
target = array[pointers[idx]]
start_step = current_step
current_step += 1
continue
if unique:
while (
pointers[idx] + 1 < len(array)
and array[pointers[idx]] == array[pointers[idx] + 1]
):
pointers[idx] += 1
if (current_step - start_step) == len(arrays):
result.append(target)
for other_idx, other_array in enumerate(arrays):
pointers[other_idx] += 1
if pointers[idx] < len(array):
target = array[pointers[idx]]
start_step = current_step
if pointers[idx] == len(array):
return result
current_step += 1
Here's an O(n) answer (where n = sum(len(sublist) for sublist in data)).
from itertools import cycle
def intersection(data):
result = []
maxval = float("-inf")
consecutive = 0
try:
for sublist in cycle(iter(sublist) for sublist in data):
value = next(sublist)
while value < maxval:
value = next(sublist)
if value > maxval:
maxval = value
consecutive = 0
continue
consecutive += 1
if consecutive >= len(data)-1:
result.append(maxval)
consecutive = 0
except StopIteration:
return result
print(intersection([[1,3,5,7], [1,1,3,5,7], [1,4,7,9]]))
[1, 7]
Some of the above methods are not covering the examples when there are duplicates in every subset of the list. The Below code implements this intersection and it will be more efficient if there are lots of duplicates in the subset of the list :) If not sure about duplicates it is recommended to use Counter from collections from collections import Counter. The custom counter function is made for increasing the efficiency of handling large duplicates. But still can not beat Raymond Hettinger's implementation.
def counter(my_list):
my_list = sorted(my_list)
first_val, *all_val = my_list
p_index = my_list.index(first_val)
my_counter = {}
for item in all_val:
c_index = my_list.index(item)
diff = abs(c_index-p_index)
p_index = c_index
my_counter[first_val] = diff
first_val = item
c_index = my_list.index(item)
diff = len(my_list) - c_index
my_counter[first_val] = diff
return my_counter
def my_func(data):
if not data or not isinstance(data, list):
return
# get the first value
first_val, *all_val = data
if not isinstance(first_val, list):
return
# count items in first value
p = counter(first_val) # counter({1: 2, 3: 1, 5: 1, 7: 1})
# collect all common items and calculate the minimum occurance in intersection
for val in all_val:
# collecting common items
c = counter(val)
# calculate the minimum occurance in intersection
inner_dict = {}
for inner_val in set(c).intersection(set(p)):
inner_dict[inner_val] = min(p[inner_val], c[inner_val])
p = inner_dict
# >>>p
# {1: 2, 7: 1}
# Sort by keys of counter
sorted_items = sorted(p.items(), key=lambda x:x[0]) # [(1, 2), (7, 1)]
result=[i[0] for i in sorted_items for _ in range(i[1])] # [1, 1, 7]
return result
Here are the sample Examples
>>> data = [[1,3,5,7],[1,1,3,5,7],[1,4,7,9]]
>>> my_func(data=data)
[1, 7]
>>> data = [[1,1,3,5,7],[1,1,3,5,7],[1,1,4,7,9]]
>>> my_func(data=data)
[1, 1, 7]
You can do the following using the functions heapq.merge, chain.from_iterable and groupby
from heapq import merge
from itertools import groupby, chain
ls = [[1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 4, 7, 9]]
def index_groups(lst):
"""[1, 1, 3, 5, 7] -> [(1, 0), (1, 1), (3, 0), (5, 0), (7, 0)]"""
return chain.from_iterable(((e, i) for i, e in enumerate(group)) for k, group in groupby(lst))
iterables = (index_groups(li) for li in ls)
flat = merge(*iterables)
res = [k for (k, _), g in groupby(flat) if sum(1 for _ in g) == len(ls)]
print(res)
Output
[1, 7]
The idea is to give an extra value (using enumerate) to differentiate between equal values within the same list (see the function index_groups).
The complexity of this algorithm is O(n) where n is the sum of the lengths of each list in the input.
Note that the output for (an extra 1 en each list):
ls = [[1, 1, 3, 5, 7], [1, 1, 3, 5, 7], [1, 1, 4, 7, 9]]
is:
[1, 1, 7]
You can use bit-masking with one-hot encoding. The inner lists become maxterms. You and them together for the intersection and or them for the union. Then you have to convert back, for which I've used a bit hack.
problem = [[1,3,5,7],[1,1,3,5,8,7],[1,4,7,9]];
debruijn = [0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9];
u32 = accum = (1 << 32) - 1;
for vec in problem:
maxterm = 0;
for v in vec:
maxterm |= 1 << v;
accum &= maxterm;
# https://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
result = [];
while accum:
power = accum;
accum &= accum - 1; # Peter Wegner CACM 3 (1960), 322
power &= ~accum;
result.append(debruijn[((power * 0x077CB531) & u32) >> 27]);
print result;
This uses (simulates) 32-bit integers, so you can only have [0, 31] in your sets.
*I am inexperienced at Python, so I timed it. One should definitely use set.intersection.
Here is the single-pass counting algorithm, a simplified version of what others have suggested.
def intersection(iterables):
target, count = None, 0
for it in itertools.cycle(map(iter, iterables)):
for value in it:
if count == 0 or value > target:
target, count = value, 1
break
if value == target:
count += 1
break
else: # exhausted iterator
return
if count >= len(iterables):
yield target
count = 0
Binary and exponential search haven't come up yet. They're easily recreated even with the "no builtins" constraint.
In practice, that would be much faster, and sub-linear. In the worst case - where the intersection isn't shrinking - the naive approach would repeat work. But there's a solution for that: integrate the binary search while splitting the arrays in half.
def intersection(seqs):
seq = min(seqs, key=len)
if not seq:
return
pivot = seq[len(seq) // 2]
lows, counts, highs = [], [], []
for seq in seqs:
start = bisect.bisect_left(seq, pivot)
stop = bisect.bisect_right(seq, pivot, start)
lows.append(seq[:start])
counts.append(stop - start)
highs.append(seq[stop:])
yield from intersection(lows)
yield from itertools.repeat(pivot, min(counts))
yield from intersection(highs)
Both handle duplicates. Both guarantee O(N) worst-case time (counting slicing as atomic). The latter will approach O(min_size) speed; by always splitting the smallest in half it essentially can't suffer from the bad luck of uneven splits.
I couldn't help but notice that this is seems to be a variation on the Welfare Crook problem; see David Gries's book, The Science of Programming. Edsger Dijkstra also wrote an EWD about this, see Ascending Functions and the Welfare Crook.
The Welfare Crook
Suppose we have three long magnetic tapes, each containing a list of names in alphabetical order:
all people working for IBM Yorktown
students at Columbia University
people on welfare in New York City
Practically speaking, all three lists are endless, so no upper bounds are given. It is know that at least one person is on all three lists. Write a program to locate the first such person.
Our intersection of the ordered lists problem is a generalization of the Welfare Crook problem.
Here's a (rather primitive?) Python solution to the Welfare Crook problem:
def find_welfare_crook(f, g, h, i, j, k):
"""f, g, and h are "ascending functions," i.e.,
i <= j implies f[i] <= f[j] or, equivalently,
f[i] < f[j] implies i < j, and the same goes for g and h.
i, j, k define where to start the search in each list.
"""
# This is an implementation of a solution to the Welfare Crook
# problems presented in David Gries's book, The Science of Programming.
# The surprising and beautiful thing is that the guard predicates are
# so few and so simple.
i , j , k = i , j , k
while True:
if f[i] < g[j]:
i += 1
elif g[j] < h[k]:
j += 1
elif h[k] < f[i]:
k += 1
else:
break
return (i,j,k)
# The other remarkable thing is how the negation of the guard
# predicates works out to be: f[i] == g[j] and g[j] == c[k].
Generalization to Intersection of K Lists
This generalizes to K lists, and here's what I devised; I don't know how Pythonic this is, but it pretty compact:
def findIntersectionLofL(lofl):
"""Generalized findIntersection function which operates on a "list of lists." """
K = len(lofl)
indices = [0 for i in range(K)]
result = []
#
try:
while True:
# idea is to maintain the indices via a construct like the following:
allEqual = True
for i in range(K):
if lofl[i][indices[i]] < lofl[(i+1)%K][indices[(i+1)%K]] :
indices[i] += 1
allEqual = False
# When the above iteration finishes, if all of the list
# items indexed by the indices are equal, then another
# item common to all of the lists must be added to the result.
if allEqual :
result.append(lofl[0][indices[0]])
while lofl[0][indices[0]] == lofl[1][indices[1]]:
indices[0] += 1
except IndexError as e:
# Eventually, the foregoing iteration will advance one of the
# indices past the end of one of the lists, and when that happens
# an IndexError exception will be raised. This means the algorithm
# is finished.
return result
This solution does not keep repeated items. Changing the program to include all of the repeated items by changing what the program does in the conditional at the end of the "while True" loop is an exercise left to the reader.
Improved Performance
Comments from #greybeard prompted refinements shown below, in the
pre-computation of the "array index moduli" (the "(i+1)%K" expressions) and further investigation also brought about changes to the inner iteration's structure, to further remove overhead:
def findIntersectionLofLunRolled(lofl):
"""Generalized findIntersection function which operates on a "list of lists."
Accepts a list-of-lists, lofl. Each of the lists must be ordered.
Returns the list of each element which appears in all of the lists at least once.
"""
K = len(lofl)
indices = [0] * K
result = []
lt = [ (i, (i+1) % K) for i in range(K) ] # avoids evaluation of index exprs inside the loop
#
try:
while True:
allUnEqual = True
while allUnEqual:
allUnEqual = False
for i,j in lt:
if lofl[i][indices[i]] < lofl[j][indices[j]]:
indices[i] += 1
allUnEqual = True
# Now all of the lofl[i][indices[i]], for all i, are the same value.
# Store that value in the result, and then advance all of the indices
# past that common value:
v = lofl[0][indices[0]]
result.append(v)
for i,j in lt:
while lofl[i][indices[i]] == v:
indices[i] += 1
except IndexError as e:
# Eventually, the foregoing iteration will advance one of the
# indices past the end of one of the lists, and when that happens
# an IndexError exception will be raised. This means the algorithm
# is finished.
return result

How do I generate a table from a list

I have a list that contains sublists with 3 values and I need to print out a list that looks like:
I also need to compare the third column values with eachother to tell if they are increasing or decreasing as you go down.
bb = 3.9
lowest = 0.4
#appending all the information to a list
allinfo= []
while bb>=lowest:
everything = angleWithPost(bb,cc,dd,ee)
allinfo.append(everything)
bb-=0.1
I think the general idea for finding out whether or not the third column values are increasing or decreasing is:
#Checking whether or not Fnet are increasing or decreasing
ii=0
while ii<=(10*(bb-lowest)):
if allinfo[ii][2]>allinfo[ii+1][2]:
abc = "decreasing"
elif allinfo[ii][2]<allinfo[ii+1][2]:
abc = "increasing"
ii+=1
Then when i want to print out my table similar to the one above.
jj=0
while jj<=(10*(bb-lowest))
print "%8.2f %12.2f %12.2f %s" %(allinfo[jj][0], allinfo[jj][1], allinfo[jj][2], abc)
jj+=1
here is the angle with part
def chainPoints(aa,DIS,SEG,H):
#xtuple x chain points
n=0
xterms = []
xterm = -DIS
while n<=SEG:
xterms.append(xterm)
n+=1
xterm = -DIS + n*2*DIS/(SEG)
#
#ytuple y chain points
k=0
yterms = []
while k<=SEG:
yterm = H + aa*m.cosh(xterms[k]/aa) - aa*m.cosh(DIS/aa)
yterms.append(yterm)
k+=1
return(xterms,yterms)
#
#
def chainLength(aa,DIS,SEG,H):
xterms, yterms = chainPoints(aa,DIS,SEG,H)# using x points and y points from the chainpoints function
#length of chain
ff=1
Lterm=0.
totallength=0.
while ff<=SEG:
Lterm = m.sqrt((xterms[ff]-xterms[ff-1])**2 + (yterms[ff]-yterms[ff-1])**2)
totallength += Lterm
ff+=1
return(totallength)
#
def angleWithPost(aa,DIS,SEG,H):
xterms, yterms = chainPoints(aa,DIS,SEG,H)
totallength = chainLength(aa,DIS,SEG,H)
#Find the angle
thetaradians = (m.pi)/2. + m.atan(((yterms[1]-yterms[0])/(xterms[1]-xterms[0])))
#Need to print out the degrees
thetadegrees = (180/m.pi)*thetaradians
#finding the net force
Fnet = abs((rho*grav*totallength))/(2.*m.cos(thetaradians))
return(totallength, thetadegrees, Fnet)
Review this Python2 implementation which uses map and an iterator trick.
from itertools import izip_longest, islice
from pprint import pprint
data = [
[1, 2, 3],
[1, 2, 4],
[1, 2, 3],
[1, 2, 5],
]
class AddDirection(object):
def __init__(self):
# This default is used if the series begins with equal values or has a
# single element.
self.increasing = True
def __call__(self, pair):
crow, nrow = pair
if nrow is None or crow[-1] == nrow[-1]:
# This is the last row or the direction didn't change. Just return
# the direction we previouly had.
inc = self.increasing
elif crow[-1] > nrow[-1]:
inc = False
else:
# Here crow[-1] < nrow[-1].
inc = True
self.increasing = inc
return crow + ["Increasing" if inc else "Decreasing"]
result = map(AddDirection(), izip_longest(data, islice(data, 1, None)))
pprint(result)
The output:
pts/1$ python2 a.py
[[1, 2, 3, 'Increasing'],
[1, 2, 4, 'Decreasing'],
[1, 2, 3, 'Increasing'],
[1, 2, 5, 'Increasing']]
Whenever you want to transform the contents of a list (in this case the list of rows), map is a good place where to begin thinking.
When the algorithm requires data from several places of a list, offsetting the list and zipping the needed values is also a powerful technique. Using generators so that the list doesn't have to be copied, makes this viable in real code.
Finally, when you need to keep state between calls (in this case the direction), using an object is the best choice.
Sorry if the code is too terse!
Basically you want to add a 4th column to the inner list and print the results?
#print headers of table here, use .format for consistent padding
previous = 0
for l in outer_list:
if l[2] > previous:
l.append('increasing')
elif l[2] < previous:
l.append('decreasing')
previous = l[2]
#print row here use .format for consistent padding
Update for list of tuples, add value to tuple:
import random
outer_list = [ (i, i, random.randint(0,10),)for i in range(0,10)]
previous = 0
allinfo = []
for l in outer_list:
if l[2] > previous:
allinfo.append(l +('increasing',))
elif l[2] < previous:
allinfo.append(l +('decreasing',))
previous = l[2]
#print row here use .format for consistent padding
print(allinfo)
This most definitely can be optimized and you could reduce the number of times you are iterating over the data.

looping through loops in python?

I'm trying to solve this problem on the easy section of coderbyte and the prompt is:
Have the function ArrayAdditionI(arr) take the array of numbers stored in arr and return the string true if any combination of numbers in the array can be added up to equal the largest number in the array, otherwise return the string false. For example: if arr contains [4, 6, 23, 10, 1, 3] the output should return true because 4 + 6 + 10 + 3 = 23. The array will not be empty, will not contain all the same elements, and may contain negative numbers.
Here's my solution.
def ArrayAddition(arr):
arr = sorted(arr, reverse=True)
large = arr.pop(0)
storage = 0
placeholder = 0
for r in range(len(arr)):
for n in arr:
if n + storage == large: return True
elif n + storage < large: storage += n
else: continue
storage = 0
if placeholder == 0: placeholder = arr.pop(0)
else: arr.append(placeholder); placeholder = arr.pop(0)
return False
print ArrayAddition([2,95,96,97,98,99,100])
I'm not even sure if this is correct, but it seems to cover all the numbers I plug in. I'm wondering if there is a better way to solve this through algorithm which I know nothing of. I'm thinking a for within a for within a for, etc loop would do the trick, but I don't know how to do that.
What I have in mind is accomplishing this with A+B, A+C, A+D ... A+B+C ... A+B+C+D+E
e.g)
for i in range(len(arr):
print "III: III{}III".format(i)
storage = []
for j in range(len(arr):
print "JJ: II({}),JJ({})".format(i,j)
for k in range(len(arr):
print "K: I{}, J{}, K{}".format(i,j,k)
I've searched all over and found the suggestion of itertool, but I'm wondering if there is a way to write this code up more raw.
Thanks.
A recursive solution:
def GetSum(n, arr):
if len(arr) == 0 and n != 0:
return False
return (n == 0 or
GetSum(n, arr[1:]) or
GetSum(n-arr[0], arr[1:]))
def ArrayAddition(arr):
arrs = sorted(arr)
return GetSum(arrs[-1], arrs[:-1])
print ArrayAddition([2,95,96,97,98,99,100])
The GetSum function returns False when the required sum is non-zero and there are no items in the array. Then it checks for 3 cases:
If the required sum, n, is zero then the goal is achieved.
If we can get the sum with the remaining items after the first item is removed, then the goal is achieved.
If we can get the required sum minus the first element of the list on the rest of the list the goal is achieved.
Your solution doesn't work.
>>> ArrayAddition([10, 11, 20, 21, 30, 31, 60])
False
The simple solution is to use itertools to iterate over all subsets of the input (that don't contain the largest number):
def subsetsum(l):
l = list(l)
target = max(l)
l.remove(l)
for subset_size in xrange(1+len(l)):
for subset in itertools.combinations(l, subset_size):
if sum(subset) == target:
return True
return False
If you want to avoid itertools, you'll need to generate subsets directly. That can be accomplished by counting in binary and using the set bits to determine which elements to pick:
def subsetsum(l):
l = list(l)
target = max(l)
l.remove(l)
for subset_index in xrange(2**len(l)):
subtotal = 0
for i, num in enumerate(l):
# If bit i is set in subset_index
if subset_index & (1 << i):
subtotal += num
if subtotal == target:
return True
return False
Update: I forgot that you want to check all possible combinations. Use this instead:
def ArrayAddition(l):
for length in range(2, len(l)):
for lst in itertools.combinations(l, length):
if sum(lst) in l:
print(lst, sum(lst))
return True
return False
One-liner solution:
>>> any(any(sum(lst) in l for lst in itertools.combinations(l, length)) for length in range(2, len(l)))
Hope this helps!
Generate all the sums of the powerset and test them against the max
def ArrayAddition(L):
return any(sum(k for j,k in enumerate(L) if 1<<j&i)==max(L) for i in range(1<<len(L)))
You could improve this by doing some preprocessing - find the max first and remove it from L
One more way to do it...
Code:
import itertools
def func(l):
m = max(l)
rem = [itertools.combinations([x for x in l if not x == m],i) for i in range(2,len(l)-1)]
print [item for i in rem for item in i if sum(item)==m ]
if __name__=='__main__':
func([1,2,3,4,5])
Output:
[(1, 4), (2, 3)]
Hope this helps.. :)
If I understood the question correctly, simply this should return what you want:
2*max(a)<=sum(a)

Python: Check the occurrences in a list against a value

lst = [1,2,3,4,1]
I want to know 1 occurs twice in this list, is there any efficient way to do?
lst.count(1) would return the number of times it occurs. If you're going to be counting items in a list, O(n) is what you're going to get.
The general function on the list is list.count(x), and will return the number of times x occurs in a list.
Are you asking whether every item in the list is unique?
len(set(lst)) == len(lst)
Whether 1 occurs more than once?
lst.count(1) > 1
Note that the above is not maximally efficient, because it won't short-circuit -- even if 1 occurs twice, it will still count the rest of the occurrences. If you want it to short-circuit you will have to write something a little more complicated.
Whether the first element occurs more than once?
lst[0] in lst[1:]
How often each element occurs?
import collections
collections.Counter(lst)
Something else?
For multiple occurrences, this give you the index of each occurence:
>>> lst=[1,2,3,4,5,1]
>>> tgt=1
>>> found=[]
>>> for index, suspect in enumerate(lst):
... if(tgt==suspect):
... found.append(index)
...
>>> print len(found), "found at index:",", ".join(map(str,found))
2 found at index: 0, 5
If you want the count of each item in the list:
>>> lst=[1,2,3,4,5,2,2,1,5,5,5,5,6]
>>> count={}
>>> for item in lst:
... count[item]=lst.count(item)
...
>>> count
{1: 2, 2: 3, 3: 1, 4: 1, 5: 5, 6: 1}
def valCount(lst):
res = {}
for v in lst:
try:
res[v] += 1
except KeyError:
res[v] = 1
return res
u = [ x for x,y in valCount(lst).iteritems() if y > 1 ]
u is now a list of all values which appear more than once.
Edit:
#katrielalex: thank you for pointing out collections.Counter, of which I was not previously aware. It can also be written more concisely using a collections.defaultdict, as demonstrated in the following tests. All three methods are roughly O(n) and reasonably close in run-time performance (using collections.defaultdict is in fact slightly faster than collections.Counter).
My intention was to give an easy-to-understand response to what seemed a relatively unsophisticated request. Given that, are there any other senses in which you consider it "bad code" or "done poorly"?
import collections
import random
import time
def test1(lst):
res = {}
for v in lst:
try:
res[v] += 1
except KeyError:
res[v] = 1
return res
def test2(lst):
res = collections.defaultdict(lambda: 0)
for v in lst:
res[v] += 1
return res
def test3(lst):
return collections.Counter(lst)
def rndLst(lstLen):
r = random.randint
return [r(0,lstLen) for i in xrange(lstLen)]
def timeFn(fn, *args):
st = time.clock()
res = fn(*args)
return time.clock() - st
def main():
reps = 5000
res = []
tests = [test1, test2, test3]
for t in xrange(reps):
lstLen = random.randint(10,50000)
lst = rndLst(lstLen)
res.append( [lstLen] + [timeFn(fn, lst) for fn in tests] )
res.sort()
return res
And the results, for random lists containing up to 50,000 items, are as follows:
(Vertical axis is time in seconds, horizontal axis is number of items in list)
Another way to get all items that occur more than once:
lst = [1,2,3,4,1]
d = {}
for x in lst:
d[x] = x in d
print d[1] # True
print d[2] # False
print [x for x in d if d[x]] # [1]
You could also sort the list which is O(n*log(n)), then check the adjacent elements for equality, which is O(n). The result is O(n*log(n)). This has the disadvantage of requiring the entire list be sorted before possibly bailing when a duplicate is found.
For a large list with a relatively rare duplicates, this could be the about the best you can do. The best way to approach this really does depend on the size of the data involved and its nature.

Find the most common element in a list

What is an efficient way to find the most common element in a Python list?
My list items may not be hashable so can't use a dictionary.
Also in case of draws the item with the lowest index should be returned. Example:
>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'
A simpler one-liner:
def most_common(lst):
return max(set(lst), key=lst.count)
Borrowing from here, this can be used with Python 2.7:
from collections import Counter
def Most_Common(lst):
data = Counter(lst)
return data.most_common(1)[0][0]
Works around 4-6 times faster than Alex's solutions, and is 50 times faster than the one-liner proposed by newacct.
On CPython 3.6+ (any Python 3.7+) the above will select the first seen element in case of ties. If you're running on older Python, to retrieve the element that occurs first in the list in case of ties you need to do two passes to preserve order:
# Only needed pre-3.6!
def most_common(lst):
data = Counter(lst)
return max(lst, key=data.get)
With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [itertools.groupby][1]. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools
import operator
def most_common(L):
# get an iterable of (item, iterable) pairs
SL = sorted((x, i) for i, x in enumerate(L))
# print 'SL:', SL
groups = itertools.groupby(SL, key=operator.itemgetter(0))
# auxiliary function to get "quality" for an item
def _auxfun(g):
item, iterable = g
count = 0
min_index = len(L)
for _, where in iterable:
count += 1
min_index = min(min_index, where)
# print 'item %r, count %r, minind %r' % (item, count, min_index)
return count, -min_index
# pick the highest-count/earliest item
return max(groups, key=_auxfun)[0]
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:
print most_common(['goose', 'duck', 'duck', 'goose'])
emits:
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L):
groups = itertools.groupby(sorted(L))
def _auxfun((item, iterable)):
return len(list(iterable)), -L.index(item)
return max(groups, key=_auxfun)[0]
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g
def most_common_oneliner(L):
return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]
What you want is known in statistics as mode, and Python of course has a built-in function to do exactly that for you:
>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3
Note that if there is no "most common element" such as cases where the top two are tied, this will raise StatisticsError on Python
<=3.7, and on 3.8 onwards it will return the first one encountered.
Without the requirement about the lowest index, you can use collections.Counter for this:
from collections import Counter
a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'
If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.
def most_common(lst):
cur_length = 0
max_length = 0
cur_i = 0
max_i = 0
cur_item = None
max_item = None
for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
if cur_item is None or cur_item != item:
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
max_length = cur_length
max_i = cur_i
max_item = cur_item
cur_length = 1
cur_i = i
cur_item = item
else:
cur_length += 1
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
return cur_item
return max_item
This is an O(n) solution.
mydict = {}
cnt, itm = 0, ''
for item in reversed(lst):
mydict[item] = mydict.get(item, 0) + 1
if mydict[item] >= cnt :
cnt, itm = mydict[item], item
print itm
(reversed is used to make sure that it returns the lowest index item)
Sort a copy of the list and find the longest run. You can decorate the list before sorting it with the index of each element, and then choose the run that starts with the lowest index in the case of a tie.
A one-liner:
def most_common (lst):
return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]
I am doing this using scipy stat module and lambda:
import scipy.stats
lst = [1,2,3,4,5,6,7,5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))
Result:
most_freq_val = 5
# use Decorate, Sort, Undecorate to solve the problem
def most_common(iterable):
# Make a list with tuples: (item, index)
# The index will be used later to break ties for most common item.
lst = [(x, i) for i, x in enumerate(iterable)]
lst.sort()
# lst_final will also be a list of tuples: (count, index, item)
# Sorting on this list will find us the most common item, and the index
# will break ties so the one listed first wins. Count is negative so
# largest count will have lowest value and sort first.
lst_final = []
# Get an iterator for our new list...
itr = iter(lst)
# ...and pop the first tuple off. Setup current state vars for loop.
count = 1
tup = next(itr)
x_cur, i_cur = tup
# Loop over sorted list of tuples, counting occurrences of item.
for tup in itr:
# Same item again?
if x_cur == tup[0]:
# Yes, same item; increment count
count += 1
else:
# No, new item, so write previous current item to lst_final...
t = (-count, i_cur, x_cur)
lst_final.append(t)
# ...and reset current state vars for loop.
x_cur, i_cur = tup
count = 1
# Write final item after loop ends
t = (-count, i_cur, x_cur)
lst_final.append(t)
lst_final.sort()
answer = lst_final[0][2]
return answer
print most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e']) # prints 'e'
print most_common(['goose', 'duck', 'duck', 'goose']) # prints 'goose'
Building on Luiz's answer, but satisfying the "in case of draws the item with the lowest index should be returned" condition:
from statistics import mode, StatisticsError
def most_common(l):
try:
return mode(l)
except StatisticsError as e:
# will only return the first element if no unique mode found
if 'no unique mode' in e.args[0]:
return l[0]
# this is for "StatisticsError: no mode for empty data"
# after calling mode([])
raise
Example:
>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data
Simple one line solution
moc= max([(lst.count(chr),chr) for chr in set(lst)])
It will return most frequent element with its frequency.
You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)
itemList = ['hi', 'hi', 'hello', 'bye']
counter = {}
maxItemCount = 0
for item in itemList:
try:
# Referencing this will cause a KeyError exception
# if it doesn't already exist
counter[item]
# ... meaning if we get this far it didn't happen so
# we'll increment
counter[item] += 1
except KeyError:
# If we got a KeyError we need to create the
# dictionary key
counter[item] = 1
# Keep overwriting maxItemCount with the latest number,
# if it's higher than the existing itemCount
if counter[item] > maxItemCount:
maxItemCount = counter[item]
mostPopularItem = item
print mostPopularItem
ans = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
all_ans={4: 1, 2: 0}
max_key = max(all_ans.keys())
4
print(all_ans[max_key])
1
#This will return the list sorted by frequency:
def orderByFrequency(list):
listUniqueValues = np.unique(list)
listQty = []
listOrderedByFrequency = []
for i in range(len(listUniqueValues)):
listQty.append(list.count(listUniqueValues[i]))
for i in range(len(listQty)):
index_bigger = np.argmax(listQty)
for j in range(listQty[index_bigger]):
listOrderedByFrequency.append(listUniqueValues[index_bigger])
listQty[index_bigger] = -1
return listOrderedByFrequency
#And this will return a list with the most frequent values in a list:
def getMostFrequentValues(list):
if (len(list) <= 1):
return list
list_most_frequent = []
list_ordered_by_frequency = orderByFrequency(list)
list_most_frequent.append(list_ordered_by_frequency[0])
frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
index = 0
while(index < len(list_ordered_by_frequency)):
index = index + frequency
if(index < len(list_ordered_by_frequency)):
testValue = list_ordered_by_frequency[index]
testValueFrequency = list_ordered_by_frequency.count(testValue)
if (testValueFrequency == frequency):
list_most_frequent.append(testValue)
else:
break
return list_most_frequent
#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))
Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]
Here:
def most_common(l):
max = 0
maxitem = None
for x in set(l):
count = l.count(x)
if count > max:
max = count
maxitem = x
return maxitem
I have a vague feeling there is a method somewhere in the standard library that will give you the count of each element, but I can't find it.
This is the obvious slow solution (O(n^2)) if neither sorting nor hashing is feasible, but equality comparison (==) is available:
def most_common(items):
if not items:
raise ValueError
fitems = []
best_idx = 0
for item in items:
item_missing = True
i = 0
for fitem in fitems:
if fitem[0] == item:
fitem[1] += 1
d = fitem[1] - fitems[best_idx][1]
if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
best_idx = i
item_missing = False
break
i += 1
if item_missing:
fitems.append([item, 1, i])
return items[best_idx]
But making your items hashable or sortable (as recommended by other answers) would almost always make finding the most common element faster if the length of your list (n) is large. O(n) on average with hashing, and O(n*log(n)) at worst for sorting.
>>> li = ['goose', 'duck', 'duck']
>>> def foo(li):
st = set(li)
mx = -1
for each in st:
temp = li.count(each):
if mx < temp:
mx = temp
h = each
return h
>>> foo(li)
'duck'
I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with.
def mostPopular(l):
mpEl=None
mpIndex=0
mpCount=0
curEl=None
curCount=0
for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
curCount=curCount+1 if el==curEl else 1
curEl=el
if curCount>mpCount \
or (curCount==mpCount and i<mpIndex):
mpEl=curEl
mpIndex=i
mpCount=curCount
return mpEl, mpCount, mpIndex
I timed it against Alex's solution and it's about 10-15% faster for short lists, but once you go over 100 elements or more (tested up to 200000) it's about 20% slower.
def most_frequent(List):
counter = 0
num = List[0]
for i in List:
curr_frequency = List.count(i)
if(curr_frequency> counter):
counter = curr_frequency
num = i
return num
List = [2, 1, 2, 2, 1, 3]
print(most_frequent(List))
Hi this is a very simple solution, with linear time complexity
L = ['goose', 'duck', 'duck']
def most_common(L):
current_winner = 0
max_repeated = None
for i in L:
amount_times = L.count(i)
if amount_times > current_winner:
current_winner = amount_times
max_repeated = i
return max_repeated
print(most_common(L))
"duck"
Where number, is the element in the list that repeats most of the time
numbers = [1, 3, 7, 4, 3, 0, 3, 6, 3]
max_repeat_num = max(numbers, key=numbers.count) *# which number most* frequently
max_repeat = numbers.count(max_repeat_num) *#how many times*
print(f" the number {max_repeat_num} is repeated{max_repeat} times")
def mostCommonElement(list):
count = {} // dict holder
max = 0 // keep track of the count by key
result = None // holder when count is greater than max
for i in list:
if i not in count:
count[i] = 1
else:
count[i] += 1
if count[i] > max:
max = count[i]
result = i
return result
mostCommonElement(["a","b","a","c"]) -> "a"
The most common element should be the one which is appearing more than N/2 times in the array where N being the len(array). The below technique will do it in O(n) time complexity, with just consuming O(1) auxiliary space.
from collections import Counter
def majorityElement(arr):
majority_elem = Counter(arr)
size = len(arr)
for key, val in majority_elem.items():
if val > size/2:
return key
return -1
def most_common(lst):
if max([lst.count(i)for i in lst]) == 1:
return False
else:
return max(set(lst), key=lst.count)
def popular(L):
C={}
for a in L:
C[a]=L.count(a)
for b in C.keys():
if C[b]==max(C.values()):
return b
L=[2,3,5,3,6,3,6,3,6,3,7,467,4,7,4]
print popular(L)

Categories

Resources