I am trying to find the item in a list with the highest number of occurrences.
To do this, I am trying to compare every item in the list with all the other items, increasing a count by 1 each time a match is found.
def findInt(array):
    count = []
    count = [1 for i in range(0,len(array))]
    for i,item in enumerate(array):
        if (array[i] == array[i:]): #How can I compare it with all items except itself?
            count[i]+=1
    return max(count), count.index(max(count))
findInt(array=[1,2,3])
My question is "how do I compare the item with all other items except itself"?
Use collections.Counter, which has a most_common method.
import collections
def findInt(array):
    c = collections.Counter(array)
    return c.most_common(1)
DEMO
>>> import collections
>>> array=[1,2,3,1,2,3,2]
>>> c = collections.Counter(array)
>>> c.most_common(1)
[(2, 3)]
DOC
class collections.Counter([iterable-or-mapping])
A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.
most_common([n])
Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily:
Whilst there exist many better ways of solving this problem, for instance as indicated in @zwer's comment to your question, here's how I would solve exactly what you're asking:
# O(n ** 2)
def find_int(array):
    n = len(array)
    count = [1 for i in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j: continue
            if array[i] == array[j]:
                count[i] += 1
    return max(count), count.index(max(count))
# Worse than O(n ** 2)
def find_int_using_slice(array):
    n = len(array)
    count = [1 for i in range(n)]
    for i in range(n):
        for a_j in array[0:i] + array[i+1:]:
            if array[i] == a_j:
                count[i] += 1
    return max(count), count.index(max(count))
print(find_int_using_slice([1,2,3,1,2,3,2]))
We're using a nested for-loop here and using continue to skip the iteration when the two indexes are the same.
Unless this is specifically for the purpose of learning, please consider using built-ins for common tasks like this, as they are well implemented, tested, optimised, etc.
There are many potential solutions, but here are the two I'd recommend, depending on your application's requirements: 1) sort and count in a single pass from left to right, which is O(n * log(n)) and loses the original ordering, or 2) use a dictionary to maintain the counts, which requires only a single pass from left to right and is O(n) but uses more memory. Of course, the better decision would be to use built-in methods, which are highly optimised, but that's your call.
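Option 1 (sort, then count runs in a single left-to-right pass) could look roughly like the sketch below. The function name most_common_sorted is made up for illustration, and the sketch assumes the items can be ordered with sorted(); it works on a sorted copy so the caller's list is left alone.

```python
def most_common_sorted(items):
    """Return (item, count) for the most frequent item, or None if empty."""
    if not items:
        return None
    ordered = sorted(items)  # O(n log n); a sorted copy, the input is untouched
    best_item, best_count = ordered[0], 0
    current, run = ordered[0], 0
    for item in ordered:
        if item == current:
            run += 1              # still inside the same run of equal items
        else:
            current, run = item, 1  # a new run starts here
        if run > best_count:
            best_item, best_count = current, run
    return best_item, best_count

print(most_common_sorted([1, 2, 3, 1, 2, 3, 2]))  # (2, 3)
```

On ties this returns the smallest of the tied items, simply because it is reached first in sorted order.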
Updated answer to reflect OP not wanting to use collections.Counter
Using setdefault to prime the counter for first occurrences, then increment the counter. Then you can use max with a key to find the most common item.
def most_common(ar):
    y = {}
    for item in ar:
        y.setdefault(item, 0)
        y[item] += 1
    return max(y.items(), key=lambda x: x[1])
array = [1, 2, 1, 1, 2, 1, 3, 3, 1]
most_common(array)
(1, 5) # (Most common item, occurrences of item)
def findInt(array):
    count = []
    for i in range(len(array)):
        count.append(array.count(array[i]))
    return max(count), count.index(max(count))
print(findInt(array=[1,2,3,1,2,3,2]))
Fine, I'll bite - given that memory is cheap, hashing is preferred over looping. I'd reckon one of the most performant ways would be to use a temporary registry:
def findInt(array):
    occurrences = dict.fromkeys(array, 0)
    for element in array:
        occurrences[element] += 1
    # list() keeps this working on Python 3, where dict views cannot be indexed
    keys = list(occurrences.keys())
    items = list(occurrences.values())
    max_occurrences = max(items)
    return keys[items.index(max_occurrences)], max_occurrences
Returns a tuple of the element that occurs the most, and the number of times it occurs.
Actually, let's optimize it even more - here's a pure O(N) solution with no extra list building and searching:
def findInt(array):
    occurrences = dict.fromkeys(array, 0)
    candidate = array[0]
    occurs = 0
    for element in array:
        value = occurrences[element] + 1
        occurrences[element] = value
        if value > occurs:
            candidate = element
            occurs = value
    return candidate, occurs
Counter is ideal for counting frequencies of items in an iterable. Alternatively, you can loop once with a defaultdict.
import operator as op
import collections as ct
def findInt(array):
    dd = ct.defaultdict(int)
    for item in array:
        dd[item] += 1
    return dd
# Frequencies
array = [1, 2, 1, 1, 2, 1, 3, 3, 1]
freq = findInt(array)
freq
# Out: defaultdict(int, {1: 5, 2: 2, 3: 2})
# Maximum key-value pair (2 options)
{k:v for k,v in freq.items() if k == max(freq, key=lambda x: freq[x])}
# Out: {1: 5}
max({k:v for k,v in freq.items()}.items(), key=op.itemgetter(-1))
# Out: (1, 5)
Related
A=[2,3,4,1] B=[1,2,3,4]
I need to find how many elements of list A appear earlier in A than the same element does in list B. In this case those are the values 2, 3 and 4, so the expected return value is 3.
def count(a, b):
    muuttuja = 0
    for i in range(0, len(a)-1):
        if a[i] != b[i] and a[i] not in b[:i]:
            muuttuja += 1
    return muuttuja
I have tried this kind of solution but it is very slow to process lists that have great number of values. I would appreciate some suggestions for alternative methods of doing the same thing but more efficiently. Thank you!
If both lists contain unique elements, you can build a map from each element (as key) to its index (as value) using a dictionary. Since a dictionary lookup takes O(1) time on average, this code has an overall time complexity of O(n).
A=[2,3,4,1]
B=[1,2,3,4]
d = {}
count = 0
for i,ele in enumerate(A) :
    d[ele] = i
for i,ele in enumerate(B) :
    if i > d[ele] :
        count+=1
Use a set of already seen B-values.
def count(A, B):
    result = 0
    seen = set()
    for a, b in zip(A, B):
        seen.add(b)
        if a not in seen:
            result += 1
    return result
This only works if the values in your lists are hashable (which in practice usually means immutable), because they are stored in a set.
Your method is slow because it has a time complexity of O(N²): checking if an element exists in a list of length N is O(N), and you do this N times. We can do better by using up some more memory instead of time.
First, iterate over b and create a dictionary mapping the values to the first index that value occurs at:
b_map = {}
for index, value in enumerate(b):
    if value not in b_map:
        b_map[value] = index
b_map is now {1: 0, 2: 1, 3: 2, 4: 3}
Next, iterate over a, counting how many elements have an index less than that element's value in the dictionary we just created:
result = 0
for index, value in enumerate(a):
    if index < b_map.get(value, -1):
        result += 1
Which gives the expected result of 3.
b_map.get(value, -1) is used to protect against the situation when a value in a doesn't occur in b, and you don't want to count it towards the total: .get returns the default value of -1, which is guaranteed to be less than any index. If you do want to count it, you can replace the -1 with len(a).
The second snippet can be replaced by a single call to sum:
result = sum(index < b_map.get(value, -1)
             for index, value in enumerate(a))
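To make the effect of the .get default concrete, here is a small sketch that wraps the two steps in one function. The name count_earlier and the count_missing flag are hypothetical, introduced only for this illustration; the logic is otherwise the same as in the snippets above.

```python
def count_earlier(a, b, count_missing=False):
    """Count elements of a whose index in a is smaller than their first index in b."""
    b_map = {}
    for index, value in enumerate(b):
        b_map.setdefault(value, index)  # keep only the first index of each value
    # -1 is below every index, so missing values are never counted;
    # len(a) is above every index in a, so missing values are always counted.
    default = len(a) if count_missing else -1
    return sum(index < b_map.get(value, default)
               for index, value in enumerate(a))

print(count_earlier([2, 3, 4, 1], [1, 2, 3, 4]))         # 3
print(count_earlier([2, 9], [1, 2]))                     # 1 (9 is ignored)
print(count_earlier([2, 9], [1, 2], count_missing=True)) # 2 (9 is counted)
```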
You can make a prefix-count of A, which is an array where for each index you keep track of the number of occurrences of each element before the index.
You can use this to efficiently look-up the prefix-counts when looping over B:
import collections
A=[2,3,4,1]
B=[1,2,3,4]
prefix_count = [collections.defaultdict(int) for _ in range(len(A))]
prefix_count[0][A[0]] += 1
for i, n in enumerate(A[1:], start=1):
    prefix_count[i] = collections.defaultdict(int, prefix_count[i-1])
    prefix_count[i][n] += 1
prefix_count_b = sum(prefix_count[i][n] for i, n in enumerate(B))
print(prefix_count_b)
This outputs 3.
This could still be O(N²) because of the copy from the previous index when building the prefix_count array. If someone knows a better way to do this, please let me know.
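One possible way to avoid the per-index copy, offered as a sketch rather than a definitive fix: since each prefix count is only ever read at its own index, a single running counter updated while walking A and B in lockstep gives the same sum. The function name count_with_running_counter is made up for this example.

```python
from collections import defaultdict

def count_with_running_counter(A, B):
    """Sum, over each index i, how often B[i] occurs in A[:i+1]."""
    seen = defaultdict(int)  # counts of A's elements up to the current index
    total = 0
    for a, b in zip(A, B):
        seen[a] += 1
        total += seen.get(b, 0)  # .get avoids inserting keys for unseen values
    return total

print(count_with_running_counter([2, 3, 4, 1], [1, 2, 3, 4]))  # 3
```

This is a single O(N) pass with one dictionary. It computes the same quantity as the prefix_count version above, so it inherits that version's behaviour, for example counting an element that sits at the same index in both lists.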
I am playing a code challenge. Simply speaking, the problem is:
Given a list L (max length is of the order of 1000) containing positive integers.
Find the number of "Lucky Triples", i.e. triples where L[i] divides L[j] and L[j] divides L[k].
For example, [1,2,3,4,5,6] should give the answer 3, because the triples are [1,2,4], [1,2,6] and [1,3,6].
My attempt:
Sort the list. (let say there are n elements)
3 For loops: i, j, k (i from 1 to n-2), (j from i+1 to n-1), (k from j+1 to n)
only if L[j] % L[i] == 0, the k for loop will be executed
The algorithm seems to give the correct answer, but the challenge said that my code exceeded the time limit. I tried it on my computer with the list [1,2,3,...,2000] and got count = 40888 (I guess it is correct). It took around 5 seconds.
Is there any faster way to do that?
This is the code I have written in python.
def answer(l):
    l.sort()
    cnt = 0
    if len(l) == 2:
        return cnt
    for i in range(len(l)-2):
        for j in range(1,len(l)-1-i):
            if (l[i+j]%l[i] == 0):
                for k in range(1,len(l)-j-i):
                    if (l[i+j+k]%l[i+j] == 0):
                        cnt += 1
    return cnt
You can use additional space to help yourself. After sorting the input list, build a map/dict where each key is an element of the list and each value is the list of elements that are divisible by that key, so you end up with something like this.
Assuming the sorted list is [1,2,3,4,5,6], your map would be:
1 -> [2,3,4,5,6]
2-> [4,6]
3->[6]
4->[]
5->[]
6->[]
now for every key in the map you find what it can divide and then you find what that divides, for example you know that
1 divides 2 and 2 divides 4 and 6, similarly 1 divides 3 and 3 divides 6
Sorting should be O(n log n), and constructing the map should be better than O(n^2) (though I am not sure about this part). I am also not sure about the complexity of the final check for multiples, but I think this should be much faster than a brute-force O(n^3).
If someone could help me figure out the time complexity of this I would really appreciate it
EDIT :
You can make the map creation part faster by incrementing by X (and not 1) where X is the number in the list you are currently on since it is sorted.
Thank you all for your suggestions. They are brilliant. But it seems that I still can't pass the speed test, or I can't handle duplicate elements.
After discussing with my friend, I have just come up with another solution. It should be O(n^2) and I passed the speed test. Thanks all!!
def answer(lst):
    lst.sort()
    count = 0
    if len(lst) == 2:
        return count
    #for each middle element, count the divisors at the front and the multiples at the back. Then multiply them.
    for i, middle in enumerate(lst[1:len(lst)-1], start = 1):
        countfirst = 0
        countthird = 0
        for first in (lst[0:i]):
            if middle % first == 0:
                countfirst += 1
        for third in (lst[i+1:]):
            if third % middle == 0:
                countthird += 1
        count += countfirst*countthird
    return count
I guess sorting the list is pretty inefficient. I would rather try to iteratively reduce the number of candidates. You could do that in two steps.
At first filter all numbers that do not have a divisor.
from itertools import combinations
candidates = [max(pair) for pair in combinations(l, 2) if max(pair)%min(pair) == 0]
After that, count the number of remaining candidates, that do have a divisor.
result = sum(max(pair)%min(pair) == 0 for pair in combinations(candidates, 2))
Your original code, for reference.
def answer(l):
    l.sort()
    cnt = 0
    if len(l) == 2:
        return cnt
    for i in range(len(l)-2):
        for j in range(1,len(l)-1-i):
            if (l[i+j]%l[i] == 0):
                for k in range(1,len(l)-j-i):
                    if (l[i+j+k]%l[i+j] == 0):
                        cnt += 1
    return cnt
There are a number of misimplementations here, and with just a few tweaks we can probably get this running much faster. Let's start:
def answer(lst): # I prefer not to use `l` because it looks like `1`
    lst.sort()
    count = 0 # use whole words here. No reason not to.
    if len(lst) == 2:
        return count
    for i, first in enumerate(lst):
        # using `enumerate` here means you can avoid ugly ranges and
        # saves you from a look up on the list afterwards. Not really a
        # performance hit, but definitely looks and feels nicer.
        for j, second in enumerate(lst[i+1:], start=i+1):
            # this is the big savings. You know since you sorted the list that
            # lst[1] can't divide lst[n] if n>1, but your code still starts
            # searching from lst[1] every time! Enumerating over `l[i+1:]`
            # cuts out a lot of unnecessary burden.
            if second % first == 0:
                # see how using enumerate makes that look nicer?
                for third in lst[j+1:]:
                    if third % second == 0:
                        count += 1
    return count
I bet that on its own will pass your speed test, but if not, you can check for membership instead. In fact, using a set here is probably a great idea!
def answer2(lst):
    s = set(lst)
    limit = max(s) # we'll never have a valid product higher than this
    multiples = {} # accumulator for our mapping
    for n in sorted(s):
        max_prod = limit // n # n * (max_prod+1) > limit
        multiples[n] = [n*k for k in range(2, max_prod+1) if n*k in s]
    # in [1,2,3,4,5,6]:
    # multiples = {1: [2, 3, 4, 5, 6],
    #              2: [4, 6],
    #              3: [6],
    #              4: [],
    #              5: [],
    #              6: []}
    # multiples is now a mapping you can use a Depth- or Breadth-first-search on
    triples = sum(1 for j in multiples
                  for k in multiples.get(j, [])
                  for l in multiples.get(k, []))
    # This basically just looks up each starting value as j, then grabs
    # each valid multiple and assigns it to k, then grabs each valid
    # multiple of k and assigns it to l. For every possible combination there,
    # it adds 1 more to the result of `triples`
    return triples
I'll give you just an idea, the implementation should be up to you:
Initialize the global counter to zero.
Sort the list, starting with smallest number.
Create a list of integers (one entry per number with same index).
Iterate through each number (index i), and do the following:
Check for dividers at positions 0 to i-1.
Store the number of dividers in the list at the position i.
Fetch the number of dividers from the list for each divider, and add each number to the global counter.
Unless you finished, go to 3rd.
Your result should be in the global counter.
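For readers who want to see the idea above spelled out, here is one way it might be implemented. This is only a sketch of my reading of those steps: the name count_lucky_triples and the variable names are made up, and it sorts a copy rather than the caller's list.

```python
def count_lucky_triples(numbers):
    """Count triples (x, y, z) taken in sorted order where x divides y and y divides z."""
    lst = sorted(numbers)
    # divisor_counts[i] = how many earlier entries divide lst[i]
    divisor_counts = [0] * len(lst)
    total = 0
    for i, value in enumerate(lst):
        for j in range(i):
            if value % lst[j] == 0:
                divisor_counts[i] += 1
                # every divisor of lst[j] found earlier completes a triple
                # ending in (lst[j], value)
                total += divisor_counts[j]
    return total

print(count_lucky_triples([1, 2, 3, 4, 5, 6]))  # 3
```

The two nested loops make this O(n^2), and duplicates are treated as separate entries because the counting is done per position, not per value.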
I'd like to count how many times a big list contains certain elements in a specific order. For example, if I have the elements [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5], I'd like to know how many times [1,2,3] appear next to each other (the answer is 4 in this case).
I was thinking of finding the indexes of the number 3 (which would currently return [2,7,12,17]). Then I would iterate over that list, take the elements at the positions it describes and check the two positions in front of each. If they match 1 and 2, add 1 to the counter and keep looking. I believe this solution isn't really efficient and doesn't look nice. Would there be a better solution?
Here's a generalized solution that works for subsequences of any size and for elements of any type. It's also very space-efficient, as it only operates on iterators.
from itertools import islice
def count(lst, seq):
    it = zip(*(islice(lst, i, None) for i in range(len(seq))))
    seq = tuple(seq)
    return sum(x == seq for x in it)
In [4]: count(l, (1, 2, 3))
Out[4]: 4
The idea is to create a sliding window iterator of width len(seq) over lst, and count the number of tuples equal to tuple(seq). This means that count also counts overlapping matches:
In [5]: count('aaa', 'aa')
Out[5]: 2
For lists of ints, you could convert to strings and then use the count method:
>>> x = [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]
>>> y = [1,2,3]
>>> s = ',' + ','.join(str(i) for i in x) + ','
>>> t = ',' + ','.join(str(i) for i in y) + ','
>>> s.count(t)
4
If the items in the list contained strings which contain commas, this method could fail (as @schwobaseggl points out in the comments). You would need to pick a delimiter known not to occur in any of the strings, or adopt an entirely different approach which doesn't reduce to the string count method.
On Edit: I added a fix suggested by @Rawing to address a bug pointed out by @tobias_k. This turns out to be a more subtle problem than it first seems.
x = [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]
y = [1,2,3]
count = 0
for i in range(len(x)-len(y)+1):  # +1 so the final window is checked too
    if x[i:i+len(y)] == y:
        count += 1
print(count)
You could iterate the list and compare sublists:
In [1]: lst = [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]
In [2]: sub = [1,2,3]
In [3]: [i for i, _ in enumerate(lst) if lst[i:i+len(sub)] == sub]
Out[3]: [0, 5, 10, 15]
Note, however, that on a very large list and sublist, this is pretty wasteful, as it creates very many slices of the original list to compare against the sublist. In a slightly longer version, you could use all to compare each of the relevant positions of the list with those of the sublist:
In [5]: [i for i, _ in enumerate(lst) if all(lst[i+k] == e for k, e in enumerate(sub))]
Out[5]: [0, 5, 10, 15]
This strikes me as the longest common subsequence problem repeated every time until the sequence returned is an empty list.
I think that the best that you can do in this case for an efficient algorithm is O(n*m) where n is the number of elements in your big list and m is the number of elements in your small list. You of course would have to have an extra step of removing the small sequence from the big sequence and repeating the process.
Here's the algorithm:
Find lcs(bigList, smallList)
Remove the first occurrence of the smallList from the bigList
Repeat until lcs is an empty list
Return the number of iterations
Here is an implementation of lcs that I wrote in python:
def lcs(first, second):
    results = dict()
    return lcs_mem(first, second, results)
def lcs_mem(first, second, results):
    # inputs are strings; the pair of arguments is the memo key
    key = (first, second)
    if len(first) == 0 or len(second) == 0:
        return ''
    elif key in results:
        return results[key]
    elif first[-1] == second[-1]:
        # recurse through lcs_mem so the memo table is actually shared
        result = lcs_mem(first[:-1], second[:-1], results) + first[-1]
    else:
        lcsLeft = lcs_mem(first[:-1], second, results)
        lcsRight = lcs_mem(first, second[:-1], results)
        if len(lcsLeft) > len(lcsRight):
            result = lcsLeft
        else:
            result = lcsRight
    results[key] = result
    return result
def main():
    pass
if __name__ == '__main__':
    main()
Feel free to modify it to the above algorithm.
One can judge the efficiency of a solution by its complexity; it is worth reading up on algorithmic complexity.
In your case, the complexity is 2n, where n is the number of elements.
Here is a solution with complexity n, because it traverses the list only once, i.e. n times.
def IsSameError(x,y):
    if (len(x) != len(y)):
        return False
    i = 0
    while (i < len(y)):
        if(x[i] != y[i]):
            return False
        i += 1
    return True
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [1, 2, 3]
xLength = len(x)
yLength = len(y)
cnt = 0
answer = []
while (cnt+3 <= xLength):  # <= so a match at the very end is not skipped
    if(IsSameError([x[cnt], x[cnt+1], x[cnt+2]], y)):
        answer.append(x[cnt])
        answer.append(x[cnt+1])
        answer.append(x[cnt + 2])
        cnt = cnt + 3
    else:
        cnt = cnt + 1
print(answer)
I'm trying to write a Python function that counts the number of entries in a list that occur exactly once.
For example, given the list [17], this function would return 1. Or given [3,3,-22,1,-22,1,3,0], it would return 1.
Restriction: I cannot import anything into my program.
The incorrect code that I've written so far: I'm going the double-loop route, but the index math is getting over-complicated.
def count_unique(x):
    if len(x) == 1:
        return 1
    i = 0
    j = 1
    for i in range(len(x)):
        for j in range(j,len(x)):
            if x[i] == x[j]:
                del x[j]
                j+1
        j = 0
    return len(x)
Since you can't use collections.Counter or sorted/itertools.groupby apparently (one of which would usually be my go to solution, depending on whether the inputs are hashable or sortable), just simulate roughly the same behavior as a Counter, counting all elements and then counting the number of elements that appeared only once at the end:
def count_unique(x):
    if len(x) <= 1:
        return len(x)
    counts = {}
    for val in x:
        counts[val] = counts.get(val, 0) + 1
    return sum(1 for count in counts.values() if count == 1)
lst = [3,3,-22,1,-22,1,3,0]
len(filter(lambda z : z[0] == 1,
map(lambda x : (len(filter(lambda y : y == x, lst)), x), lst)))
sorry :)
Your solution doesn't work because you are doing something weird. Deleting things from a list while iterating through it, j+1 makes no sense etc. Try adding elements that are found to be unique to a new list and then counting the number of things in it. Then figure out what my solution does.
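As a rough sketch of the "collect the unique elements in a new list" suggestion, something like the following would do it without imports and without deleting from the list being iterated. The variable names are mine, and this is just one way to write it; it is O(n^2) because every element is compared with every other element.

```python
def count_unique(values):
    singles = []  # elements found to occur exactly once
    for i, candidate in enumerate(values):
        is_unique = True
        # compare against every other position, skipping index i itself
        for j, other in enumerate(values):
            if i != j and candidate == other:
                is_unique = False
                break
        if is_unique:
            singles.append(candidate)
    return len(singles)

print(count_unique([3, 3, -22, 1, -22, 1, 3, 0]))  # 1
print(count_unique([17]))                          # 1
```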
Here is the O(n) solution btw:
lst = [3,3,-22,1,-22,1,3,0,37]
cnts = {}
for n in lst:
    if n in cnts:
        cnts[n] = cnts[n] + 1
    else:
        cnts[n] = 1
count = 0
for k, v in cnts.items():  # items() works on both Python 2 and 3
    if v == 1:
        count += 1
print(count)
A simpler and more understandable solution:
l = [3, 3, -22, 1, -22, 1, 3, 0]
counter = 0
for el in l:
    if l.count(el) == 1:
        counter += 1
It's pretty simple. You iterate over the items of the list, check whether each element occurs exactly once in the list, and if so add 1 to the counter. You can improve the code (use list comprehensions, lambda expressions and so on), but this is the idea behind it all and the most understandable version, in my opinion.
You are making this overly complicated. Try using a dictionary where the key is the element in your list and the value is how many times it occurs.
To add to this: it is probably the best method when looking at complexity. An `in` lookup on a dictionary is considered O(1) and the for loop is O(n), so your total time complexity is O(n), which is desirable. Using count() on a list element searches the whole list for every element, which is basically O(n^2), and that's bad.
from collections import defaultdict
count_hash_table = defaultdict(int) # a dictionary whose missing keys default to int(), i.e. 0
elements = [3,3,-22,1,-22,1,3,0]
for element in elements:
    count_hash_table[element] += 1 # the default of 0 lets us add 1 for each element seen
print(sum(c for c in count_hash_table.values() if c == 1))
There is a method on lists called count. From this you can go further, I guess.
For example:
for el in l:
    if l.count(el) > 1:
        continue
    else:
        print("found {0}".format(el))
What is the most efficient way to sort a list such as [0,0,1,0,1,1,0], whose elements are only 0 and 1, without using the built-in sort(), sorted() or count() functions? It should run in O(n) or better.
>>> lst = [0,0,1,0,1,1,0]
>>> l, s = len(lst), sum(lst)
>>> result = [0] * (l - s) + [1] * s
>>> result
[0, 0, 0, 0, 1, 1, 1]
There are many different general-purpose sorting algorithms that could be used. In this case, however, the most important consideration is that all the elements to sort belong to the set {0, 1}.
As other contributors have answered, there is a trivial implementation.
def radix_sort(a):
    slist = [[],[]]
    for elem in a:
        slist[elem].append(elem)
    return slist[0] + slist[1]
print(radix_sort([0,0,1,0,1,1,0]))
It must be noted that this is a particular implementation of the Radix sort. And this can be extended easily if the elements of the list to be sorted belong to a defined limited set.
def radix_sort(a, elems):
    slist = {}
    for elem in elems:
        slist[elem] = []
    for elem in a:
        slist[elem].append(elem)
    nslist = []
    for elem in elems:
        nslist += slist[elem]
    return nslist
print(radix_sort([2,0,0,1,3,0,1,1,0],[0,1,2,3]))
No sort() or sorted() or count() function. O(n)
This one is O(n) (you can't get less):
old = [0,0,1,0,1,1,0]
zeroes = old.count(0) #you gotta count them somehow!
new = [0]*zeroes + [1]*(len(old) - zeroes)
As there are no Python loops, this may be the fastest you can get in pure Python...
def sort_arr_with_zero_one():
    main_list = [0,0,1,0,1,1,0]
    zero_list = []
    one_list = []
    for i in main_list:
        if i:
            one_list.append(i)
        else:
            zero_list.append(i)
    return zero_list + one_list
You have only two values, so you know in advance the precise structure of the output: it will be divided into two regions of varying lengths.
I'd try this:
b = [0,0,1,0,1,1,0]
def odd_sort(a):
    zeroes = a.count(0)
    # range instead of xrange so this also runs on Python 3
    return [0 for i in range(zeroes)] + [1 for i in range(len(a) - zeroes)]
You could walk the list with two pointers, one from the start (i) and from the end (j), and compare the values one by one and swap them if necessary:
def sort_binary_values(l):
    i, j = 0, len(l)-1
    while i < j:
        # skip 0 values from the begin
        while i < j and l[i] == 0:
            i = i+1
        if i >= j: break
        # skip 1 values from the end
        while i < j and l[j] == 1:
            j = j-1
        if i >= j: break
        # since all in sequence values have been skipped and i and j did not reach each other
        # we encountered a pair that is out of order and needs to be swapped
        l[i], l[j] = l[j], l[i]
        j = j-1
        i = i+1
    return l
I like the answer by JBernado, but will throw in another monstrous option (although I've not done any profiling on it, and it's not particularly extensible as it relies on the order of a dictionary hash, but it works for 0 and 1):
from itertools import chain, repeat
from collections import Counter
list(chain.from_iterable(map(repeat, *zip(*Counter(bits).items()))))
Or - slightly less convoluted...
from itertools import repeat, chain, islice, ifilter
from operator import not_
list(islice(chain(ifilter(not_, bits), repeat(1)), len(bits)))
This should keep everything at the C level - so it should be fairly optimal.
All you need to know is how long the original sequence is and how many ones are in it.
old = [0,0,1,0,1,1,0]
ones = sum(1 for b in old if b)
new = [0]*(len(old)-ones) + [1]*ones
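Here is a Python solution in O(n) time and O(1) extra space.
There is no need to create new lists, and the input is sorted in place in a single pass.
def sort01(arr):
    i = 0
    j = len(arr)-1
    while i < j:
        # the i < j checks stop the scans from running off either end
        # (e.g. on an all-zero or all-one list)
        while i < j and arr[i] == 0:
            i += 1
        while i < j and arr[j] == 1:
            j -= 1
        if i<j:
            arr[i] = 0
            arr[j] = 1
    return arr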