Count list occurence in list python - python

I'd like to count how many times a big list contains elements in specific order. So for example if i have elements [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5] and i'd like to know how many times [1,2,3] are next to each other (answer is 4 in this case).
I was thinking on checking the indexes of number '3' (so currently it'd return [2,7,12,17]. Then i would iterate over that list, take elements in positions described in the list and check two positions in front of it. If they match '1' and '2' then add 1 to counter and keep looking. I believe this solution isn't really efficient and does not look nice, would there be a better solution?

Here's a generalized solution that works for subsequences of any size and for elements of any type. It's also very space-efficient, as it only operates on iterators.
from itertools import islice
def count(lst, seq):
it = zip(*(islice(lst, i, None) for i in range(len(seq))))
seq = tuple(seq)
return sum(x == seq for x in it)
In [4]: count(l, (1, 2, 3))
Out[4]: 4
The idea is to create a sliding window iterator of width len(seq) over lst, and count the number of tuples equal to tuple(seq). This means that count also counts overlapping matches:
In [5]: count('aaa', 'aa')
Out[5]: 2

For lists of ints, you could convert to strings and then use the count method:
>>> x = [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]
>>> y = [1,2,3]
>>> s = ',' + ','.join(str(i) for i in x) + ','
>>> t = ',' + ','.join(str(i) for i in y) + ','
>>> s.count(t)
4
If the items in the list contained strings which contain commas, this method could fail (as #schwobaseggl points out in the comments). You would need to pick a delimiter known not to occur in any of the strings, or adopt an entirely different approach which doesn't reduce to the string count method.
On Edit: I added a fix suggested by #Rawing to address a bug pointed out by #tobias_k . This turns out to be a more subtle problem than it first seems.

x = [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]
y = [1,2,3]
count = 0
for i in range(len(x)-len(y)):
if x[i:i+len(y)] == y:
count += 1
print(count)

You could iterate the list and compare sublists:
In [1]: lst = [1,2,3,4,5,1,2,3,4,5,1,2,3,4,5,1,2,3,4,5]
In [2]: sub = [1,2,3]
In [3]: [i for i, _ in enumerate(lst) if lst[i:i+len(sub)] == sub]
Out[3]: [0, 5, 10, 15]
Note, however, that on a very large list and sublist, this is pretty wasteful, as it creates very many slices of the original list to compare against the sublist. In a slightly longer version, you could use all to compare each of the relevant positions of the list with those of the sublist:
In [5]: [i for i, _ in enumerate(lst) if all(lst[i+k] == e for k, e in enumerate(sub))]
Out[5]: [0, 5, 10, 15]

This strikes me as the longest common subsequence problem repeated every time until the sequence returned is an empty list.
I think that the best that you can do in this case for an efficient algorithm is O(n*m) where n is the number of elements in your big list and m is the number of elements in your small list. You of course would have to have an extra step of removing the small sequence from the big sequence and repeating the process.
Here's the algorithm:
Find lcs(bigList, smallList)
Remove the first occurrence of the smallList from the bigList
Repeat until lcs is an empty list
Return the number of iterations
Here is an implementation of lcs that I wrote in python:
def lcs(first, second):
results = dict()
return lcs_mem(first, second, results)
def lcs_mem(first, second, results):
key = ""
if first > second:
key = first + "," + second
else:
key = second + "," + first
if len(first) == 0 or len(second) == 0:
return ''
elif key in results:
return results[key]
elif first[-1] == second[-1]:
result = lcs(first[:-1], second[:-1]) + first[-1]
results[key] = result
return result
else:
lcsLeft = lcs(first[:-1], second)
lcsRight = lcs(first, second[:-1])
if len(lcsLeft) > len(lcsRight):
return lcsLeft
else:
return lcsRight
def main():
pass
if __name__ == '__main__':
main()
Feel free to modify it to the above algorithm.

One can define the efficiency of solution from the complexity. Search more about complexity of algorithm in google.
And in your case, complexity is 2n where n is number of elements.
Here is the solution with the complexity n, cause it traverses the list only once, i.e. n number of times.
def IsSameError(x,y):
if (len(x) != len(y)):
return False
i = 0
while (i < len(y)):
if(x[i] != y[i]):
return False
i += 1
return True
x = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
y = [1, 2, 3]
xLength = len(x)
yLength = len(y)
cnt = 0
answer = []
while (cnt+3 < xLength):
if(IsSameError([x[cnt], x[cnt+1], x[cnt+2]], y)):
answer.append(x[cnt])
answer.append(x[cnt+1])
answer.append(x[cnt + 2])
cnt = cnt + 3
else:
cnt = cnt + 1
print answer

Related

How to make a list comprehesion with enumerate while reseating a variable in python

Let's say I want to get the the number of jumps of each consecutive 1's in a binary string of say, 169, which is 10101001.
The answer then it'd be 3, 2, 2 because when the algorithm starts at the most right digit of the binary number needs to move thrice to the left to reach the next 1, twice to get the next 1 and so on.
So the algorithm should have a counter that starts at 0, increments in one while finds 0's and reset each time it reaches a 1.
I need the output to be in form of a list using list comprehesion.
This is what I got so far:
number = 169
list = []
c = 1
for i in bin(number>>1)[:1:-1]:
if i == '1':
list.append(c)
c = 0
c += 1
The algorithm indeed works but the idea is to transform it into one line code using list compreshesion. I think that there should be one way to do it using enumerate().
Just like:
n = 169
list = [i for i, c in enumerate(bin(number>>1)[:1:-1], 1) if c == '1']
The problem is that the output will be [3, 5, 7] instead of [3, 2, 2] because the i (index variable) didn't reset.
I'm looking for an asnwer that isn't just straight list[a+1] - list[a] but more elegant and efficient solution.
Here's a one-liner for this problem that's most probably not readable.
s = "10101001"
result = [p - q for p, q in zip([index for index, a in enumerate(s[::-1]) if a == "1"][1:], [index for index, b in enumerate(s[::-1]) if b == "1"][:s.count("1")-1])]
You can use groupby here:
bs = "10101001"
result = [
sum(1 for _ in g) + 1 # this can also be something like len(list(g)) + 1
for k, g in groupby(reversed(bs))
if k == "0"
]
You cannot really do this easily with just the list comprehension, because what you want cannot be expressed with just mapping/filtering (in any straightforward way I can think of), but once you have a grouping iterator, it simply becomes suming the length of "0" runs.
you can easily do this with a regex pattern
import re
out = [len(i) for i in re.findall("0*1", num)]
output
print(out)
>>> [3, 2, 2]

Looping through all item except itself

I am trying to find the item in a list with highest number of occurrence.
For this, I am trying to compare every item in list with all other items in list and increasing count's value by 1 each time a match is found.
def findInt(array):
count = []
count = [1 for i in range(0,len(array))]
for i,item in enumerate(array):
if (array[i] == array[i:]): #How can I compare it with all items except itself?
count[i]+=1
return max(count), count.index(max(count))
findInt(array=[1,2,3])
My question is "how do I compare the item with all other items except itself"?
use collections.Counter which has a most_common function.
import collections
def findInt(array):
c = collections.Counter(array)
return c.most_common(1)
DEMO
>>> import collections
>>> array=[1,2,3,1,2,3,2]
>>> c = collections.Counter(array)
>>> c.most_common(1)
[(2, 3)]
DOC
class collections.Counter([iterable-or-mapping])
A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.
most_common([n])
Return a list of the n most common elements and their counts from the most common to the least. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts are ordered arbitrarily:
Whilst there exist many better ways of solving this problem, for instance as indicated in #zwer's comment to your question, here's how I would solve exactly what you're asking:
# O(n ** 2)
def find_int(array):
n = len(array)
count = [1 for i in range(n)]
for i in range(n):
for j in range(n):
if i == j: continue
if array[i] == array[j]:
count[i] += 1
return max(count), count.index(max(count))
# Worse than O(n ** 2)
def find_int_using_slice(array):
n = len(array)
count = [1 for i in range(n)]
for i in range(n):
for a_j in array[0:i] + array[i+1:]:
if array[i] == a_j:
count[i] += 1
return max(count), count.index(max(count))
print(find_int_using_slice([1,2,3,1,2,3,2]))
We're using a nested for-loop here and using continue to skip the iteration when the two indexes are the same.
Unless specifically for the purpose of learning, please consider using built-ins for common tasks this, as they are well implemented, tested, optimised, etc.
There are many potential solutions, but here are the two I'd recommend, depending on your application's requirements: 1) sort and count in a single pass from left to right: O(n * log(n)) and losing the original ordering, or 2) use a dictionary to maintain the counts, requiring only a single pass from left to right: O(n) but using more memory. Of course the better decision would be to use in-built methods which are highly optimised, but that's your call
Updated answer to reflect OP not wanting to use collections.Counter
Using setdefault to prime the counter for first occurrences, then increment the counter. Then you can use max with a key to find the most common item.
def most_common(ar):
y = {}
for item in ar:
y.setdefault(item, 0)
y[item] += 1
return max(y.items(), key=lambda x: x[1])
array = [1, 2, 1, 1, 2, 1, 3, 3, 1]
most_common(array)
(1, 5) # (Most common item, occurrences of item)
def findInt(array):
count = []
for i in range(len(array)):
count.append(array.count(array[i]))
return max(count), count.index(max(count))
print(findInt(array=[1,2,3,1,2,3,2]))
Fine, I'll bite - given that memory is cheap, hashing is preferred over looping. I'd reckon one of the most performant ways would be to use a temporary registry:
def findInt(array):
occurrences = dict.fromkeys(array, 0)
for element in array:
occurrences[element] += 1
items = occurrences.values()
max_occurences = max(items)
return occurrences.keys()[items.index(max_occurences)], max_occurences
Returns a tuple of the element that occurs the most, and the number of times it occurs.
Actually, let's optimize it even more - here's a pure O(N) solution with no extra list building and searching:
def findInt(array):
occurrences = dict.fromkeys(array, 0)
candidate = array[0]
occurs = 0
for element in array:
value = occurrences[element] + 1
occurrences[element] = value
if value > occurs:
candidate = element
occurs = value
return candidate, occurs
Counter is ideal for counting frequencies of items in an iterable. Alternatively, you can loop once with a defaultdict.
import operator as op
import collections as ct
def findInt(array):
dd = ct.defaultdict(int)
for item in array:
dd[item] += 1
return dd
# Frequencies
array = [1, 2, 1, 1, 2, 1, 3, 3, 1]
freq = findInt(array)
freq
# Out: defaultdict(int, {1: 5, 2: 2, 3: 2})
# Maximum key-value pair (2 options)
{k:v for k,v in freq.items() if k == max(freq, key=lambda x: freq[x])}
# Out: {1: 5}
max({k:v for k,v in freq.items()}.items(), key=op.itemgetter(-1))
# Out: (1: 5)

Counting number of list entries that occur 1 time

I'm trying to write a Python function that counts the number of entries in a list that occur exactly once.
For example, given the list [17], this function would return 1. Or given [3,3,-22,1,-22,1,3,0], it would return 1.
** Restriction: I cannot import anything into my program.
The incorrect code that I've written so far: I'm going the double-loop route, but the index math is getting over-complicated.
def count_unique(x):
if len(x) == 1:
return 1
i = 0
j = 1
for i in range(len(x)):
for j in range(j,len(x)):
if x[i] == x[j]:
del x[j]
j+1
j = 0
return len(x)
Since you can't use collections.Counter or sorted/itertools.groupby apparently (one of which would usually be my go to solution, depending on whether the inputs are hashable or sortable), just simulate roughly the same behavior as a Counter, counting all elements and then counting the number of elements that appeared only once at the end:
def count_unique(x):
if len(x) <= 1:
return len(x)
counts = {}
for val in x:
counts[val] = counts.get(val, 0) + 1
return sum(1 for count in counts.values() if count == 1)
lst = [3,3,-22,1,-22,1,3,0]
len(filter(lambda z : z[0] == 1,
map(lambda x : (len(filter(lambda y : y == x, lst)), x), lst)))
sorry :)
Your solution doesn't work because you are doing something weird. Deleting things from a list while iterating through it, j+1 makes no sense etc. Try adding elements that are found to be unique to a new list and then counting the number of things in it. Then figure out what my solution does.
Here is the O(n) solution btw:
lst = [3,3,-22,1,-22,1,3,0,37]
cnts = {}
for n in lst:
if n in cnts:
cnts[n] = cnts[n] + 1
else:
cnts[n] = 1
count = 0
for k, v in cnts.iteritems():
if v == 1:
count += 1
print count
A more simple and understandable solution:
l = [3, 3, -22, 1, -22, 1, 3, 0]
counter = 0
for el in l:
if l.count(el) == 1:
counter += 1
It's pretty simple. You iterate over the items of the list. Then you look if the element is exactly one time in the list and then you add +1. You can improve the code (make liste comprehensions, use lambda expressions and so on), but this is the idea behind it all and the most understandable, imo.
you are making this overly complicated. try using a dictionary where the key is the element in your list. that way if it exists it will be unique
to add to this. it is probably the best method when looking at complexity. an in lookup on a dictionary is considered O(1), the for loop is O(n) so total your time complexity is O(n) which is desirable... using count() on a list element does a search on the whole list for every element which is basically O(n^2)... thats bad
from collections import defaultdict
count_hash_table = defaultdict(int) # i am making a regular dictionary but its data type is an integer
elements = [3,3,-22,1,-22,1,3,0]
for element in elements:
count_hash_table[element] += 1 # here i am using that default datatype to count + 1 for each type
print sum(c for c in count_hash_table.values() if c == 1):
There is method on lists called count.... from this you can go further i guess.
for example:
for el in l:
if l.count(el) > 1:
continue
else:
print("found {0}".format(el))

Getting difference from all possible pairs from a list Python

I have an int list with unspecified number. I would like to find the difference between two integers in the list that match a certain value.
#Example of a list
intList = [3, 6, 2, 7, 1]
#This is what I have done so far
diffList = []
i = 0
while (i < len(intList)):
x = intList[i]
j = i +1
while (j < len(intList)):
y = intList[j]
diff = abs(x-y)
diffList.append(diff)
j += 1
i +=1
#Find all pairs that has a difference of 2
diff = diffList.count(2)
print diff
Is there a better way to do this?
EDIT: Made changes to the codes. This is what I was trying to do. What I want to know is what else can I use besides the loop.
seems like a job for itertools.combinations
from itertools import combinations
for a, b in combinations(intList, 2):
print abs(a - b)
You could even turn this one into a list comprehension if you wanted to :)
[abs(a -b) for a, b in combinations(intList, 2)]
int_list = [3, 6, 2, 7, 1]
for x in int_list:
for y in int_list:
print abs(x - y)

Sort a list efficiently which contains only 0 and 1 without using any builtin python sort function?

What is the most efficient way to sort a list, [0,0,1,0,1,1,0] whose elements are only 0 & 1, without using any builtin sort() or sorted() or count() function. O(n) or less than that
>>> lst = [0,0,1,0,1,1,0]
>>> l, s = len(lst), sum(lst)
>>> result = [0] * (l - s) + [1] * s
>>> result
[0, 0, 0, 0, 1, 1, 1]
There are many different general sorting algorithms that can be used. However, in this case, the most important consideration is that all the elements to sort belong to the set (0,1).
As other contributors answered there is a trivial implementation.
def radix_sort(a):
slist = [[],[]]
for elem in a:
slist[elem].append(elem)
return slist[0] + slist[1]
print radix_sort([0,0,1,0,1,1,0])
It must be noted that this is a particular implementation of the Radix sort. And this can be extended easily if the elements of the list to be sorted belong to a defined limited set.
def radix_sort(a, elems):
slist = {}
for elem in elems:
slist[elem] = []
for elem in a:
slist[elem].append(elem)
nslist = []
for elem in elems:
nslist += slist[elem]
return nslist
print radix_sort([2,0,0,1,3,0,1,1,0],[0,1,2,3])
No sort() or sorted() or count() function. O(n)
This one is O(n) (you can't get less):
old = [0,0,1,0,1,1,0]
zeroes = old.count(0) #you gotta count them somehow!
new = [0]*zeroes + [1]*(len(old) - zeroes)
As there are no Python loops, this may be the faster you can get in pure Python...
def sort_arr_with_zero_one():
main_list = [0,0,1,0,1,1,0]
zero_list = []
one_list = []
for i in main_list:
if i:
one_list.append(i)
else:
zero_list.append(i)
return zero_list + one_list
You have only two values, so you know in advance the precise structure of the output: it will be divided into two regions of varying lengths.
I'd try this:
b = [0,0,1,0,1,1,0]
def odd_sort(a):
zeroes = a.count(0)
return [0 for i in xrange(zeroes)] + [1 for i in xrange(len(a) - zeroes)]
You could walk the list with two pointers, one from the start (i) and from the end (j), and compare the values one by one and swap them if necessary:
def sort_binary_values(l):
i, j = 0, len(l)-1
while i < j:
# skip 0 values from the begin
while i < j and l[i] == 0:
i = i+1
if i >= j: break
# skip 1 values from the end
while i < j and l[j] == 1:
j = j-1
if i >= j: break
# since all in sequence values have been skipped and i and j did not reach each other
# we encountered a pair that is out of order and needs to be swapped
l[i], l[j] = l[j], l[i]
j = j-1
i = i+1
return l
I like the answer by JBernado, but will throw in another monstrous option (although I've not done any profiling on it - it's not particulary extensible as it relies on the order of a dictionary hash, but works for 0 and 1):
from itertools import chain, repeat
from collections import Counter
list(chain.from_iterable(map(repeat, *zip(*Counter(bits).items()))))
Or - slightly less convoluted...
from itertools import repeat, chain, islice, ifilter
from operator import not_
list(islice(chain(ifilter(not_, bits), repeat(1)), len(bits)))
This should keep everything at the C level - so it should be fairly optimal.
All you need to know is how long the original sequence is and how many ones are in it.
old = [0,0,1,0,1,1,0]
ones = sum(1 for b in old if b)
new = [0]*(len(old)-ones) + [1]*ones
Here is a Python solution in O(n) time and O(2) space.
Absolutely no need to create new lists and best time performance
def sort01(arr):
i = 0
j = len(arr)-1
while i < j:
while arr[i] == 0:
i += 1
while arr[j] == 1:
j -= 1
if i<j:
arr[i] = 0
arr[j] = 1
return arr

Categories

Resources