Remove similar items in a list in Python

Remove similar items in a list in Python - python

How do you remove similar items in a list in Python but only for a given item. Example,
l = list('need')
If 'e' is the given item then
l = list('nd')
The set() function will not do the trick since it will remove all duplicates.
count() and remove() is not efficient.

use filter
assuming you write function that decide on the items that you want to keep in the list.
for your example
def pred(x):
return x!="e"
l=list("need")
l=list(filter(pred,l))

Assuming given = 'e' and l= list('need').
for i in range(l.count(given)):
l.remove(given)

If you just want to replace 'e' from the list of words in a list, you can use regex re.sub(). If you also want a count of how many occurrences of e were removed from each word, then you can use re.subn(). The first one will provide you strings in a list. The second will provide you a tuple (string, n) where n is the number of occurrences.
import re
lst = list(('need','feed','seed','deed','made','weed','said'))
j = [re.sub('e','',i) for i in lst]
k = [re.subn('e','',i) for i in lst]
The output for j and k are :
j = ['nd', 'fd', 'sd', 'dd', 'mad', 'wd', 'said']
k = [('nd', 2), ('fd', 2), ('sd', 2), ('dd', 2), ('mad', 1), ('wd', 2), ('said', 0)]
If you want to count the total changes made, just iterate thru k and sum it. There are other simpler ways too. You can simply use regEx
re.subn('e','',''.join(lst))[1]
This will give you total number of items replaced in the list.

List comprehension Method. Not sure if the size/complexity is less than that of count and remove.
def scrub(l, given):
return [i for i in l if i not in given]
Filter method, again i'm not sure
def filter_by(l, given):
return list(filter(lambda x: x not in given, l))
Bruteforce with recursion but there are a lot of potential downfalls. Still an option. Again I don't know the size/comp
def bruteforce(l, given):
try:
l.remove(given[0])
return bruteforce(l, given)
except ValueError:
return bruteforce(l, given[1:])
except IndexError:
return l
return l
For those of you curious as to the actual time associated with the above methods, i've taken the liberty to test them below!
Below is the method I've chosen to use.
def timer(func, name):
print("-------{}-------".format(name))
try:
start = datetime.datetime.now()
x = func()
end = datetime.datetime.now()
print((end-start).microseconds)
except Exception, e:
print("Failed: {}".format(e))
print("\r")
The dataset we are testing against. Where l is our original list and q is the items we want to remove, and r is our expected result.
l = list("need"*50000)
q = list("ne")
r = list("d"*50000)
For posterity I've added the count / remove method the OP was against. (For good reason!)
def count_remove(l, given):
for i in given:
for x in range(l.count(i)):
l.remove(i)
return l
All that's left to do is test!
timer(lambda: scrub(l, q), "List Comp")
assert(scrub(l,q) == r)
timer(lambda: filter_by(l, q), "Filter")
assert(filter_by(l,q) == r)
timer(lambda : count_remove(l, q), "Count/Remove")
assert(count_remove(l,q) == r)
timer(lambda: bruteforce(l, q), "Bruteforce")
assert(bruteforce(l,q) == r)
And our results
-------List Comp-------
10000
-------Filter-------
28000
-------Count/Remove-------
199000
-------Bruteforce-------
Failed: maximum recursion depth exceeded
Process finished with exit code 0
The Recursion method failed with a larger dataset, but we expected this. I tested on smaller datasets, and Recursion is marginally slower. I thought it would be faster.

Related

finding repeated substring in k length in a string using function

I just started using function and I'm trying to build one that's find a repeated substring that is length is at least k and returns the results into tuple that contains a dict.
the keys needs to be the substring and the value is how many times it was repeated, and then add to the tuple the length of the substring.
I just started but I didnt really knew how to continue but this is what I tried to do:
def longest_repeat(string, K)
longest = {} ,
if isinstance(K, int) and isinstance(string, str)
for sub_str in string:
if sub_str >= K:
longest[0][sub_seq] = DNA_seq_slic = []
a=0
b=k
for nuc in range(len(DNA_seq)-k+1):
DNA_seq_slic.append(DNA_seq[a:b])
a +=1
b +=1
import collections
for sub_seq in DNA_seq_slic:
repeated = [item for item, count in collections.Counter(DNA_seq_slic).items() if count > 1]
repeated_subseq_dict = dict(zip(repeated,[0 for x in range(0,len(repeated))]))
for key in repeated_subseq_dict:
repeated_subseq_dict[key] = DNA_seq_slic.count(key)
return(repeated_subseq_dict)
Im sorry if its a little bit messed up, I didnt really had direction and I tried to use other function I built to solve this and it didnt really worked. I can clarify more if needed.
the output should be something like this:
longest_repeated("ATAATACATAATA", 5)
output: longest = {ATAATA: 2} , 6
Really appreciate any kind of help! Thanks!

You can try re module:
import re
def longest_repeated(s, k):
m = re.findall(f"(.{{{k},}})(?=.*\\1)", s)
if m:
mx = max(m, key=len)
return {mx: s.count(mx)}, len(mx)
Some tests:
print(longest_repeated("ATAATACATAATA", 5))
({'ATAATA': 2}, 6)
print(longest_repeated("XXXXXATAATACATAATAXXXXX", 5))
({'ATAATA': 2}, 6)

Phone number sequence predictor

I have an Phone number here lets say : 98888888xx the last two XX are the numbers which I want to generate a sequence of like 988888811 , 9888888812, 988888813 and so on.
I am trying to learn python programming so can someone help me how would I go on writing a script for the same

You could use a list comprehension and range():
['98888888' + str(number).zfill(2) for number in range(100)]
['9888888800',
'9888888801',
'9888888802',
...
'9888888897',
'9888888898',
'9888888899']

A robust solution is to use recursion to handle many possible occurrences of "x":
import re
s = '98888888xx'
_len = len(re.sub('\d+', '', s))
def combos(d, current = []):
if len(current) == _len:
yield current
else:
for i in d[0]:
yield from combos(d[1:], current+[i])
_c = combos([range(1, 10)]*_len)
new_result = [(lambda d:re.sub('x', lambda _:str(next(d)), s))(iter(i)) for i in _c]
Output:
['9888888811', '9888888812', '9888888813', '9888888814', '9888888815', '9888888816', '9888888817', '9888888818', '9888888819', '9888888821', '9888888822', '9888888823', '9888888824', '9888888825', '9888888826', '9888888827', '9888888828', '9888888829', '9888888831', '9888888832', '9888888833', '9888888834', '9888888835', '9888888836', '9888888837', '9888888838', '9888888839', '9888888841', '9888888842', '9888888843', '9888888844', '9888888845', '9888888846', '9888888847', '9888888848', '9888888849', '9888888851', '9888888852', '9888888853', '9888888854', '9888888855', '9888888856', '9888888857', '9888888858', '9888888859', '9888888861', '9888888862', '9888888863', '9888888864', '9888888865', '9888888866', '9888888867', '9888888868', '9888888869', '9888888871', '9888888872', '9888888873', '9888888874', '9888888875', '9888888876', '9888888877', '9888888878', '9888888879', '9888888881', '9888888882', '9888888883', '9888888884', '9888888885', '9888888886', '9888888887', '9888888888', '9888888889', '9888888891', '9888888892', '9888888893', '9888888894', '9888888895', '9888888896', '9888888897', '9888888898', '9888888899']
Note that a simple solution in the form of a nested list comprehension presents itself when there are only two 'x':
d = [s.replace('x', '{}').format(a, b) for a in range(1, 10) for b in range(1, 10)]
However, multiple nested loops is not a clean approach to solving the problem when the input string contains three or more 'x's. Instead, recursion works best:
s = '98888888xxxxx'
_len = len(re.sub('\d+', '', s))
_c = combos([range(1, 10)]*_len)
new_result = [(lambda d:re.sub('x', lambda _:str(next(d)), s))(iter(i)) for i in _c]
Output (first twenty strings):
['9888888811111', '9888888811112', '9888888811113', '9888888811114', '9888888811115', '9888888811116', '9888888811117', '9888888811118', '9888888811119', '9888888811121', '9888888811122', '9888888811123', '9888888811124', '9888888811125', '9888888811126', '9888888811127', '9888888811128', '9888888811129', '9888888811131', '9888888811132']

Get sum of every pair unique pair in the array

I am doing this array pair sum problem which says:
Given an integer array, output all the unique pairs that sum up to a specific value k.
So the input:
pair_sum([1,3,2,2],4)
would return 2 pairs:
(1,3)
(2,2)
Seems a pretty simple problem. This is what I my script looks like:
def pair_sum(arr,k):
count =0
l = []
y = []
for i in range(len(arr)-1):
l.append((arr[i], arr[i+1]))
y.append((arr[i-1],arr[i]))
res = list(set(y+l))
for i,j in res:
if i + j == k:
print (i,j)
count +=1
return count
And I is working fine with few test cases except this one:
pair_sum([1,9,2,8,3,7,4,6,5,5,13,14,11,13,-1],10)
My code is returning 5 but it should return 6. I know that this can be manually calculated and then I can check which pair is missing but what if the test case is sufficiently large. I can't check manually for each case.
Can anyone tell what am I doing wrong here because I might have considered almost every possible pair. Also, like what if I don't want to use set here like can there be a more general solution which is not specific to any language. Because how can you get unique pair without using set?

I can't vouch for the efficiency of this, but:
[(x,y)
for (i,x) in enumerate(arr)
for (j,y) in enumerate(arr)
if i < j and x+y == k]
Just loop over all pairs, keeping only those whose first element comes before the second so as to avoid duplicates.

What I did is I added the elements to the set and took the difference between my sum and the element of list. If difference is in my set then I paired them
def pair_sum(arr,k):
count = 0
seen = set()
output = set()
for i in arr:
num = k - i
if num not in seen:
seen.add(i)
else:
output.add( (num,i) )
return len(output)

You can do this by using the pairwise() recipe shown in the itertools documentation as shown below.
import itertools
def pairwise(iterable):
"s -> (s0,s1), (s1,s2), (s2, s3), ..."
a, b = itertools.tee(iterable)
next(b, None)
return zip(a, b)
def pair_sum(arr, k):
for pair in (pair for pair in pairwise(arr) if sum(pair) == k):
print(pair)
pair_sum([1, 3, 2, 2], 4)
print()
pair_sum([1, 2, 3, 1], 3)
Output:
(1, 3)
(2, 2)
(1, 2)

def aps(arr, k):
if len(arr)<2:
return False
seen = set()
output = set()
for num in arr:
target = k - num
if target not in seen:
seen.add(num)
else:
output.add((min(num,target), max(num,target)))
print(output)
print(seen)
return len(output)

Python: Check the occurrences in a list against a value

lst = [1,2,3,4,1]
I want to know 1 occurs twice in this list, is there any efficient way to do?

lst.count(1) would return the number of times it occurs. If you're going to be counting items in a list, O(n) is what you're going to get.
The general function on the list is list.count(x), and will return the number of times x occurs in a list.

Are you asking whether every item in the list is unique?
len(set(lst)) == len(lst)
Whether 1 occurs more than once?
lst.count(1) > 1
Note that the above is not maximally efficient, because it won't short-circuit -- even if 1 occurs twice, it will still count the rest of the occurrences. If you want it to short-circuit you will have to write something a little more complicated.
Whether the first element occurs more than once?
lst[0] in lst[1:]
How often each element occurs?
import collections
collections.Counter(lst)
Something else?

For multiple occurrences, this give you the index of each occurence:
>>> lst=[1,2,3,4,5,1]
>>> tgt=1
>>> found=[]
>>> for index, suspect in enumerate(lst):
... if(tgt==suspect):
... found.append(index)
...
>>> print len(found), "found at index:",", ".join(map(str,found))
2 found at index: 0, 5
If you want the count of each item in the list:
>>> lst=[1,2,3,4,5,2,2,1,5,5,5,5,6]
>>> count={}
>>> for item in lst:
... count[item]=lst.count(item)
...
>>> count
{1: 2, 2: 3, 3: 1, 4: 1, 5: 5, 6: 1}

def valCount(lst):
res = {}
for v in lst:
try:
res[v] += 1
except KeyError:
res[v] = 1
return res
u = [ x for x,y in valCount(lst).iteritems() if y > 1 ]
u is now a list of all values which appear more than once.
Edit:
#katrielalex: thank you for pointing out collections.Counter, of which I was not previously aware. It can also be written more concisely using a collections.defaultdict, as demonstrated in the following tests. All three methods are roughly O(n) and reasonably close in run-time performance (using collections.defaultdict is in fact slightly faster than collections.Counter).
My intention was to give an easy-to-understand response to what seemed a relatively unsophisticated request. Given that, are there any other senses in which you consider it "bad code" or "done poorly"?
import collections
import random
import time
def test1(lst):
res = {}
for v in lst:
try:
res[v] += 1
except KeyError:
res[v] = 1
return res
def test2(lst):
res = collections.defaultdict(lambda: 0)
for v in lst:
res[v] += 1
return res
def test3(lst):
return collections.Counter(lst)
def rndLst(lstLen):
r = random.randint
return [r(0,lstLen) for i in xrange(lstLen)]
def timeFn(fn, *args):
st = time.clock()
res = fn(*args)
return time.clock() - st
def main():
reps = 5000
res = []
tests = [test1, test2, test3]
for t in xrange(reps):
lstLen = random.randint(10,50000)
lst = rndLst(lstLen)
res.append( [lstLen] + [timeFn(fn, lst) for fn in tests] )
res.sort()
return res
And the results, for random lists containing up to 50,000 items, are as follows:
(Vertical axis is time in seconds, horizontal axis is number of items in list)

Another way to get all items that occur more than once:
lst = [1,2,3,4,1]
d = {}
for x in lst:
d[x] = x in d
print d[1] # True
print d[2] # False
print [x for x in d if d[x]] # [1]

You could also sort the list which is O(n*log(n)), then check the adjacent elements for equality, which is O(n). The result is O(n*log(n)). This has the disadvantage of requiring the entire list be sorted before possibly bailing when a duplicate is found.
For a large list with a relatively rare duplicates, this could be the about the best you can do. The best way to approach this really does depend on the size of the data involved and its nature.

Find the most common element in a list

What is an efficient way to find the most common element in a Python list?
My list items may not be hashable so can't use a dictionary.
Also in case of draws the item with the lowest index should be returned. Example:
>>> most_common(['duck', 'duck', 'goose'])
'duck'
>>> most_common(['goose', 'duck', 'duck', 'goose'])
'goose'

A simpler one-liner:
def most_common(lst):
return max(set(lst), key=lst.count)

Borrowing from here, this can be used with Python 2.7:
from collections import Counter
def Most_Common(lst):
data = Counter(lst)
return data.most_common(1)[0][0]
Works around 4-6 times faster than Alex's solutions, and is 50 times faster than the one-liner proposed by newacct.
On CPython 3.6+ (any Python 3.7+) the above will select the first seen element in case of ties. If you're running on older Python, to retrieve the element that occurs first in the list in case of ties you need to do two passes to preserve order:
# Only needed pre-3.6!
def most_common(lst):
data = Counter(lst)
return max(lst, key=data.get)

With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [itertools.groupby][1]. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:
import itertools
import operator
def most_common(L):
# get an iterable of (item, iterable) pairs
SL = sorted((x, i) for i, x in enumerate(L))
# print 'SL:', SL
groups = itertools.groupby(SL, key=operator.itemgetter(0))
# auxiliary function to get "quality" for an item
def _auxfun(g):
item, iterable = g
count = 0
min_index = len(L)
for _, where in iterable:
count += 1
min_index = min(min_index, where)
# print 'item %r, count %r, minind %r' % (item, count, min_index)
return count, -min_index
# pick the highest-count/earliest item
return max(groups, key=_auxfun)[0]
This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:
print most_common(['goose', 'duck', 'duck', 'goose'])
emits:
SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose
As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).
groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].
Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.
This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:
def most_common(L):
groups = itertools.groupby(sorted(L))
def _auxfun((item, iterable)):
return len(list(iterable)), -L.index(item)
return max(groups, key=_auxfun)[0]
same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)
Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).
from itertools import groupby as g
def most_common_oneliner(L):
return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]

What you want is known in statistics as mode, and Python of course has a built-in function to do exactly that for you:
>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3
Note that if there is no "most common element" such as cases where the top two are tied, this will raise StatisticsError on Python
<=3.7, and on 3.8 onwards it will return the first one encountered.

Without the requirement about the lowest index, you can use collections.Counter for this:
from collections import Counter
a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801]
c = Counter(a)
print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'

If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.
def most_common(lst):
cur_length = 0
max_length = 0
cur_i = 0
max_i = 0
cur_item = None
max_item = None
for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
if cur_item is None or cur_item != item:
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
max_length = cur_length
max_i = cur_i
max_item = cur_item
cur_length = 1
cur_i = i
cur_item = item
else:
cur_length += 1
if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
return cur_item
return max_item

This is an O(n) solution.
mydict = {}
cnt, itm = 0, ''
for item in reversed(lst):
mydict[item] = mydict.get(item, 0) + 1
if mydict[item] >= cnt :
cnt, itm = mydict[item], item
print itm
(reversed is used to make sure that it returns the lowest index item)

Sort a copy of the list and find the longest run. You can decorate the list before sorting it with the index of each element, and then choose the run that starts with the lowest index in the case of a tie.

A one-liner:
def most_common (lst):
return max(((item, lst.count(item)) for item in set(lst)), key=lambda a: a[1])[0]

I am doing this using scipy stat module and lambda:
import scipy.stats
lst = [1,2,3,4,5,6,7,5]
most_freq_val = lambda x: scipy.stats.mode(x)[0][0]
print(most_freq_val(lst))
Result:
most_freq_val = 5

# use Decorate, Sort, Undecorate to solve the problem
def most_common(iterable):
# Make a list with tuples: (item, index)
# The index will be used later to break ties for most common item.
lst = [(x, i) for i, x in enumerate(iterable)]
lst.sort()
# lst_final will also be a list of tuples: (count, index, item)
# Sorting on this list will find us the most common item, and the index
# will break ties so the one listed first wins. Count is negative so
# largest count will have lowest value and sort first.
lst_final = []
# Get an iterator for our new list...
itr = iter(lst)
# ...and pop the first tuple off. Setup current state vars for loop.
count = 1
tup = next(itr)
x_cur, i_cur = tup
# Loop over sorted list of tuples, counting occurrences of item.
for tup in itr:
# Same item again?
if x_cur == tup[0]:
# Yes, same item; increment count
count += 1
else:
# No, new item, so write previous current item to lst_final...
t = (-count, i_cur, x_cur)
lst_final.append(t)
# ...and reset current state vars for loop.
x_cur, i_cur = tup
count = 1
# Write final item after loop ends
t = (-count, i_cur, x_cur)
lst_final.append(t)
lst_final.sort()
answer = lst_final[0][2]
return answer
print most_common(['x', 'e', 'a', 'e', 'a', 'e', 'e']) # prints 'e'
print most_common(['goose', 'duck', 'duck', 'goose']) # prints 'goose'

Building on Luiz's answer, but satisfying the "in case of draws the item with the lowest index should be returned" condition:
from statistics import mode, StatisticsError
def most_common(l):
try:
return mode(l)
except StatisticsError as e:
# will only return the first element if no unique mode found
if 'no unique mode' in e.args[0]:
return l[0]
# this is for "StatisticsError: no mode for empty data"
# after calling mode([])
raise
Example:
>>> most_common(['a', 'b', 'b'])
'b'
>>> most_common([1, 2])
1
>>> most_common([])
StatisticsError: no mode for empty data

Simple one line solution
moc= max([(lst.count(chr),chr) for chr in set(lst)])
It will return most frequent element with its frequency.

You probably don't need this anymore, but this is what I did for a similar problem. (It looks longer than it is because of the comments.)
itemList = ['hi', 'hi', 'hello', 'bye']
counter = {}
maxItemCount = 0
for item in itemList:
try:
# Referencing this will cause a KeyError exception
# if it doesn't already exist
counter[item]
# ... meaning if we get this far it didn't happen so
# we'll increment
counter[item] += 1
except KeyError:
# If we got a KeyError we need to create the
# dictionary key
counter[item] = 1
# Keep overwriting maxItemCount with the latest number,
# if it's higher than the existing itemCount
if counter[item] > maxItemCount:
maxItemCount = counter[item]
mostPopularItem = item
print mostPopularItem

ans = [1, 1, 0, 0, 1, 1]
all_ans = {ans.count(ans[i]): ans[i] for i in range(len(ans))}
print(all_ans)
all_ans={4: 1, 2: 0}
max_key = max(all_ans.keys())
4
print(all_ans[max_key])
1

#This will return the list sorted by frequency:
def orderByFrequency(list):
listUniqueValues = np.unique(list)
listQty = []
listOrderedByFrequency = []
for i in range(len(listUniqueValues)):
listQty.append(list.count(listUniqueValues[i]))
for i in range(len(listQty)):
index_bigger = np.argmax(listQty)
for j in range(listQty[index_bigger]):
listOrderedByFrequency.append(listUniqueValues[index_bigger])
listQty[index_bigger] = -1
return listOrderedByFrequency
#And this will return a list with the most frequent values in a list:
def getMostFrequentValues(list):
if (len(list) <= 1):
return list
list_most_frequent = []
list_ordered_by_frequency = orderByFrequency(list)
list_most_frequent.append(list_ordered_by_frequency[0])
frequency = list_ordered_by_frequency.count(list_ordered_by_frequency[0])
index = 0
while(index < len(list_ordered_by_frequency)):
index = index + frequency
if(index < len(list_ordered_by_frequency)):
testValue = list_ordered_by_frequency[index]
testValueFrequency = list_ordered_by_frequency.count(testValue)
if (testValueFrequency == frequency):
list_most_frequent.append(testValue)
else:
break
return list_most_frequent
#tests:
print(getMostFrequentValues([]))
print(getMostFrequentValues([1]))
print(getMostFrequentValues([1,1]))
print(getMostFrequentValues([2,1]))
print(getMostFrequentValues([2,2,1]))
print(getMostFrequentValues([1,2,1,2]))
print(getMostFrequentValues([1,2,1,2,2]))
print(getMostFrequentValues([3,2,3,5,6,3,2,2]))
print(getMostFrequentValues([1,2,2,60,50,3,3,50,3,4,50,4,4,60,60]))
Results:
[]
[1]
[1]
[1, 2]
[2]
[1, 2]
[2]
[2, 3]
[3, 4, 50, 60]

Here:
def most_common(l):
max = 0
maxitem = None
for x in set(l):
count = l.count(x)
if count > max:
max = count
maxitem = x
return maxitem
I have a vague feeling there is a method somewhere in the standard library that will give you the count of each element, but I can't find it.

This is the obvious slow solution (O(n^2)) if neither sorting nor hashing is feasible, but equality comparison (==) is available:
def most_common(items):
if not items:
raise ValueError
fitems = []
best_idx = 0
for item in items:
item_missing = True
i = 0
for fitem in fitems:
if fitem[0] == item:
fitem[1] += 1
d = fitem[1] - fitems[best_idx][1]
if d > 0 or (d == 0 and fitems[best_idx][2] > fitem[2]):
best_idx = i
item_missing = False
break
i += 1
if item_missing:
fitems.append([item, 1, i])
return items[best_idx]
But making your items hashable or sortable (as recommended by other answers) would almost always make finding the most common element faster if the length of your list (n) is large. O(n) on average with hashing, and O(n*log(n)) at worst for sorting.

>>> li = ['goose', 'duck', 'duck']
>>> def foo(li):
st = set(li)
mx = -1
for each in st:
temp = li.count(each):
if mx < temp:
mx = temp
h = each
return h
>>> foo(li)
'duck'

I needed to do this in a recent program. I'll admit it, I couldn't understand Alex's answer, so this is what I ended up with.
def mostPopular(l):
mpEl=None
mpIndex=0
mpCount=0
curEl=None
curCount=0
for i, el in sorted(enumerate(l), key=lambda x: (x[1], x[0]), reverse=True):
curCount=curCount+1 if el==curEl else 1
curEl=el
if curCount>mpCount \
or (curCount==mpCount and i<mpIndex):
mpEl=curEl
mpIndex=i
mpCount=curCount
return mpEl, mpCount, mpIndex
I timed it against Alex's solution and it's about 10-15% faster for short lists, but once you go over 100 elements or more (tested up to 200000) it's about 20% slower.

def most_frequent(List):
counter = 0
num = List[0]
for i in List:
curr_frequency = List.count(i)
if(curr_frequency> counter):
counter = curr_frequency
num = i
return num
List = [2, 1, 2, 2, 1, 3]
print(most_frequent(List))

Hi this is a very simple solution, with linear time complexity
L = ['goose', 'duck', 'duck']
def most_common(L):
current_winner = 0
max_repeated = None
for i in L:
amount_times = L.count(i)
if amount_times > current_winner:
current_winner = amount_times
max_repeated = i
return max_repeated
print(most_common(L))
"duck"
Where number, is the element in the list that repeats most of the time

numbers = [1, 3, 7, 4, 3, 0, 3, 6, 3]
max_repeat_num = max(numbers, key=numbers.count) *# which number most* frequently
max_repeat = numbers.count(max_repeat_num) *#how many times*
print(f" the number {max_repeat_num} is repeated{max_repeat} times")

def mostCommonElement(list):
count = {} // dict holder
max = 0 // keep track of the count by key
result = None // holder when count is greater than max
for i in list:
if i not in count:
count[i] = 1
else:
count[i] += 1
if count[i] > max:
max = count[i]
result = i
return result
mostCommonElement(["a","b","a","c"]) -> "a"

The most common element should be the one which is appearing more than N/2 times in the array where N being the len(array). The below technique will do it in O(n) time complexity, with just consuming O(1) auxiliary space.
from collections import Counter
def majorityElement(arr):
majority_elem = Counter(arr)
size = len(arr)
for key, val in majority_elem.items():
if val > size/2:
return key
return -1

def most_common(lst):
if max([lst.count(i)for i in lst]) == 1:
return False
else:
return max(set(lst), key=lst.count)

def popular(L):
C={}
for a in L:
C[a]=L.count(a)
for b in C.keys():
if C[b]==max(C.values()):
return b
L=[2,3,5,3,6,3,6,3,6,3,7,467,4,7,4]
print popular(L)

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Remove similar items in a list in Python - python

How do you remove similar items in a list in Python but only for a given item. Example, l = list('need') If 'e' is the given item then l = list('nd') The set() function will not do the trick since it will remove all duplicates. count() and remove() is not efficient.

use filter assuming you write function that decide on the items that you want to keep in the list. for your example def pred(x): return x!="e" l=list("need") l=list(filter(pred,l))

Assuming given = 'e' and l= list('need'). for i in range(l.count(given)): l.remove(given)

Related

finding repeated substring in k length in a string using function

Phone number sequence predictor

Get sum of every pair unique pair in the array

Python: Check the occurrences in a list against a value

Find the most common element in a list

Categories

Resources