Python find frequency of numbers in list of lists - python

I have a list of lists of int as shown below
[[1, 2, 3],
[1, 5],
[4, 2, 6]]
I want to generate the frequency of the numbers in the lists as a dict, for example 1 occurs in 2 of the lists, and so on, expected output is
{1:2,
2:2,
3:1,
4:1,
5:1,
6:1}
How can this be generated?

You could try this:
Here L is your list of lists.
expected = {1:2,
2:2,
3:1,
4:1,
5:1,
6:1}
>>> from itertools import chain
>>> from collections import Counter
>>> flattened = list(chain.from_iterable(L))
>>> flattened
[1, 2, 3, 1, 5, 4, 2, 6]
>>> counts = Counter(flattened)
>>> counts
Counter({1: 2, 2: 2, 3: 1, 5: 1, 4: 1, 6: 1})
# It's easy to turn this into a function or a one-liner too.
>>> counts = Counter(chain.from_iterable(L))
>>> assert counts == expected # your expected result shown above
# silence means matching.

you can use Counter for this
>>> from collections import Counter as c
>>> array = [[1, 2, 3],[1,5],[4,2,6]]
>>> result = c()
>>> for sublist in array:
... result += c(sublist)
...
>>> result
Counter({1: 2, 2: 2, 3: 1, 5: 1, 4: 1, 6: 1})
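The question asks in how many of the lists each number occurs. Both answers above count total occurrences, which matches that only as long as no sublist repeats a number. A minimal sketch, assuming a repeated number within one sublist should count once (the sample data with a duplicated 5 and the names `lists`, `total_counts` and `membership_counts` are illustrative, not from the original posts):

```python
from collections import Counter
from itertools import chain

lists = [[1, 2, 3], [1, 5, 5], [4, 2, 6]]  # note the repeated 5 in the second sublist

# Total occurrences: the repeated 5 is counted twice.
total_counts = Counter(chain.from_iterable(lists))

# Number of sublists containing each value: deduplicate each sublist first.
membership_counts = Counter(chain.from_iterable(set(sub) for sub in lists))

print(total_counts[5])       # 2
print(membership_counts[5])  # 1
print(dict(sorted(membership_counts.items())))
```

If the sublists never contain duplicates, the two counters are identical and the simpler version is enough.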

Related

Checking for key in dict during comprehension

Is it possible to do something like this:
l = [1, 2, 2, 3, 4, 4, 1, 1]
d = {num: [num] if num not in d else d[num].append(num) for num in l}
Inherently, I wouldn't think so, without declaring d = {} first; even then, it doesn't append:
Output: {1: [1], 2: [2], 3: [3], 4: [4]}
# desired: {1: [1, 1, 1], 2: [2, 2], 3: [3], 4: [4, 4]}
Could use a defaultdict, curious if the comprehension is even possible?
No, it's not possible. If you think about it, it makes sense why. When Python evaluates an assignment statement, it first evaluates the right-hand side of the assignment (the expression). Since the assignment has not completed yet, the variable on the left-hand side has not been bound in the current namespace. Thus, while the expression is being evaluated, the variable is undefined.
As suggested, you can use collections.defaultdict to accomplish what you want:
>>> from collections import defaultdict
>>>
>>> l = [1, 2, 2, 3, 4, 4, 1, 1]
>>> d = defaultdict(list)
>>> for num in l:
...     d[num].append(num)
...
>>> d
defaultdict(<class 'list'>, {1: [1, 1, 1], 2: [2, 2], 3: [3], 4: [4, 4]})
>>>
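The evaluation-order argument above can be checked directly. This is a small sketch, not part of the original answer; the names `values`, `grouped` and `error` are made up for the demonstration:

```python
values = [1, 2, 2, 3]
error = None

try:
    # `grouped` is only bound after the whole comprehension has been evaluated,
    # so looking it up inside the comprehension fails.
    grouped = {n: [n] if n not in grouped else grouped[n] + [n] for n in values}
except NameError as exc:  # also covers UnboundLocalError, a subclass of NameError
    error = exc

print(type(error).__name__)
```

If a name `grouped` already existed (for example `grouped = {}` beforehand), the lookup would succeed but would see that old, empty object for every element, which is why the attempt in the question produced single-element lists.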
d doesn't exist in your dictionary comprehension.
Why not:
l = [1, 2, 2, 3, 4, 4, 1, 1]
d = {num: [num] * l.count(num) for num in set(l)}
EDIT: I think, it is better to use a loop there
d = {}
for item in l:
    d.setdefault(item, []).append(item)
No, you cannot refer to the dictionary a comprehension is building before the comprehension has finished and its result has been assigned to a variable.
But you can use collections.Counter to avoid the repeated list.append calls.
from collections import Counter
l = [1, 2, 2, 3, 4, 4, 1, 1]
c = Counter(l)
d = {k: [k]*v for k, v in c.items()}
# {1: [1, 1, 1], 2: [2, 2], 3: [3], 4: [4, 4]}
Related: Create List of Single Item Repeated n Times in Python

Filtering out python dictionary based on a key's values

I have a dictionary dictM in the form of
dictM={movieID:[rating1,rating2,rating3,rating4]}
The key is a movieID, and rating1, rating2, rating3, rating4 are its values. There are several movieIDs with ratings. I want to move certain movieIDs along with their ratings to a new dictionary if a movieID has a certain number of ratings.
What I'm doing is :
for movie in dictM.keys():
    if len(dictM[movie])>=5:
        dF[movie]=d[movie]
But I'm not getting the desired result. Does someone know a solution for this?
You can use dictionary comprehension, as follows:
>>> dictM = {1: [1, 2, 3, 4], 2: [1, 2, 3]}
>>> {k: v for (k, v) in dictM.items() if len(v) ==4}
{1: [1, 2, 3, 4]}
You can try this using a simple dictionary comprehension:
dictM={3:[4, 3, 2, 5, 1]}
new_dict = {a:b for a, b in dictM.items() if len(b) >= 5}
One reason your code above may not produce any results is that you have not defined dF. In addition, the only value in dictM has length 4, but the if statement in your code requires a length of 5 or more.
You don't delete the entries, you could do it like this:
dictM = {1: [1, 2, 3],
         2: [1, 2, 3, 4, 5],
         3: [1, 2, 3, 4, 5, 6, 7],
         4: [1]}
dF = {}
for movieID in list(dictM):
    if len(dictM[movieID]) >= 5:
        dF[movieID] = dictM[movieID] # put the movie + rating in the new dict
        del dictM[movieID] # remove the movie from the old dict
The result looks like this:
>>> dictM
{1: [1, 2, 3], 4: [1]}
>>> dF
{2: [1, 2, 3, 4, 5], 3: [1, 2, 3, 4, 5, 6, 7]}
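If you would rather not delete from the dictionary inside a loop, the same split can be sketched with two comprehensions. The helper name `split_by_rating_count` and its parameters are hypothetical, chosen only for this example:

```python
def split_by_rating_count(ratings_by_movie, minimum):
    """Return (kept, moved): movies with fewer than `minimum` ratings, and the rest."""
    kept = {m: r for m, r in ratings_by_movie.items() if len(r) < minimum}
    moved = {m: r for m, r in ratings_by_movie.items() if len(r) >= minimum}
    return kept, moved

dictM = {1: [1, 2, 3], 2: [1, 2, 3, 4, 5], 3: [1, 2, 3, 4, 5, 6, 7], 4: [1]}
dictM, dF = split_by_rating_count(dictM, 5)
print(dictM)  # {1: [1, 2, 3], 4: [1]}
print(dF)     # {2: [1, 2, 3, 4, 5], 3: [1, 2, 3, 4, 5, 6, 7]}
```

This builds two new dictionaries instead of mutating the original, at the cost of iterating over the items twice.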

Convert array to dictionary (counter)

Suppose I have an array as follows
arr = [1 , 2, 3, 4, 5]
I would like to convert it to a dictionary like
{
1: 1,
2: 1,
3: 1,
4: 1,
5: 1
}
My motivation behind this is so I can quickly increment the count of any of the keys in O(1) time.
Help will be much appreciated. Thanks
from collections import Counter
answer = Counter(arr)
You can use the fromkeys method:
>>> arr = [1 , 2, 3, 4, 5]
>>> dict.fromkeys(arr,1)
{1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
>>>
You can use a dictionary comprehension:
{k: 1 for k in arr}
from collections import Counter
arr = [1, 2, 3, 4, 5]
c = Counter(arr)
Try collections.Counter:
>>> import collections
>>> arr = [1, 2, 3, 4, 5]
>>> collections.Counter(arr)
Counter({1: 1, 2: 1, 3: 1, 4: 1, 5: 1})
arr = [1 , 2, 3, 4, 5]
print({p: arr.count(p) for p in arr})
It's more precise I think and it still works when an element in the array repeats.
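Since the stated motivation is incrementing counts quickly, here is a short sketch of how a Counter behaves for that use. The keys 9 and 42 are arbitrary values picked for illustration:

```python
from collections import Counter

arr = [1, 2, 3, 4, 5]
counts = Counter(arr)

counts[3] += 1   # existing key: 1 -> 2
counts[9] += 1   # missing key: treated as 0, so it becomes 1

print(counts[3], counts[9], counts[42])  # 2 1 0
```

Reading a missing key such as `counts[42]` returns 0 without adding the key, whereas a plain dict built with dict.fromkeys or a comprehension would raise KeyError there.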

extracting item with most common probability in python list

I have a list [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]] and I need [1,2,3,7] as final result (this is kind of reverse engineering). One logic is to check intersections -
while(i<dlistlen):
    j=i+1
    while(j<dlistlen):
        il = dlist1[i]
        jl = dlist1[j]
        tmp = list(set(il) & set(jl))
        print(tmp)
        #print i,j
        j=j+1
    i=i+1
this is giving me output :
[1, 2]
[1, 2, 7]
[1, 2, 7]
[1, 2, 3]
[1, 2, 3]
[1, 2, 3, 7]
[]
Looks like I am close to getting [1,2,3,7] as my final answer, but I can't figure out how. Please note that the very first list ([[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]) may contain more items, leading to one more final answer besides [1,2,3,4]. But as of now, I need to extract only [1,2,3,7].
Please note, this is not kind of homework, I am creating own clustering algorithm that fits my need.
You can use the Counter class to keep track of how often elements appear.
>>> from itertools import chain
>>> from collections import Counter
>>> l = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]
>>> #use chain(*l) to flatten the lists into a single list
>>> c = Counter(chain(*l))
>>> print c
Counter({1: 4, 2: 4, 3: 3, 7: 3, 5: 1, 6: 1})
>>> #sort keys in order of descending frequency
>>> sortedValues = sorted(c.keys(), key=lambda x: c[x], reverse=True)
>>> #show the four most common values
>>> print sortedValues[:4]
[1, 2, 3, 7]
>>> #alternatively, show the values that appear in more than 50% of all lists
>>> print [value for value, freq in c.iteritems() if float(freq) / len(l) > 0.50]
[1, 2, 3, 7]
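The session above uses Python 2 syntax (print statements, iteritems). A rough Python 3 equivalent of the 50% threshold variant might look like the following; sorting the result is an addition here, so that the output does not depend on the order in which values were first seen:

```python
from collections import Counter
from itertools import chain

lists = [[1, 2, 7], [1, 2, 3], [1, 2, 3, 7], [1, 2, 3, 5, 6, 7]]

# Count every value across all sublists.
counts = Counter(chain.from_iterable(lists))

# Keep the values that appear in more than half of the sublists.
frequent = sorted(value for value, freq in counts.items()
                  if freq / len(lists) > 0.5)
print(frequent)  # [1, 2, 3, 7]
```

As in the original, this assumes no sublist contains the same value twice; otherwise the frequency could exceed the number of lists.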
It looks like you're trying to find the largest intersection of two list elements. This will do that:
from itertools import combinations
# convert all list elements to sets for speed
dlist = [set(x) for x in dlist]
intersections = (x & y for x, y in combinations(dlist, 2))
longest_intersection = max(intersections, key=len)

How to count the frequency of the elements in an unordered list? [duplicate]

Given an unordered list of values like
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
How can I get the frequency of each value that appears in the list, like so?
# `a` has 4 instances of `1`, 4 of `2`, 2 of `3`, 1 of `4,` 2 of `5`
b = [4, 4, 2, 1, 2] # expected output
In Python 2.7 (or newer), you can use collections.Counter:
>>> import collections
>>> a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
>>> counter = collections.Counter(a)
>>> counter
Counter({1: 4, 2: 4, 5: 2, 3: 2, 4: 1})
>>> counter.values()
dict_values([2, 4, 4, 1, 2])
>>> counter.keys()
dict_keys([5, 1, 2, 4, 3])
>>> counter.most_common(3)
[(1, 4), (2, 4), (5, 2)]
>>> dict(counter)
{5: 2, 1: 4, 2: 4, 4: 1, 3: 2}
>>> # Get the counts in order matching the original specification,
>>> # by iterating over keys in sorted order
>>> [counter[x] for x in sorted(counter.keys())]
[4, 4, 2, 1, 2]
If you are using Python 2.6 or older, you can download an implementation here.
If the list is sorted, you can use groupby from the itertools standard library (if it isn't, you can just sort it first, although this takes O(n lg n) time):
from itertools import groupby
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]
[len(list(group)) for key, group in groupby(sorted(a))]
Output:
[4, 4, 2, 1, 2]
Python 2.7+ introduces Dictionary Comprehension. Building the dictionary from the list will get you the count as well as get rid of duplicates.
>>> a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>> d = {x:a.count(x) for x in a}
>>> d
{1: 4, 2: 4, 3: 2, 4: 1, 5: 2}
>>> a, b = d.keys(), d.values()
>>> a
[1, 2, 3, 4, 5]
>>> b
[4, 4, 2, 1, 2]
Count the number of appearances manually by iterating through the list and counting them up, using a collections.defaultdict to track what has been seen so far:
from collections import defaultdict
appearances = defaultdict(int)
for curr in a:
    appearances[curr] += 1
In Python 2.7+, you could use collections.Counter to count items
>>> a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>>
>>> from collections import Counter
>>> c=Counter(a)
>>>
>>> c.values()
[4, 4, 2, 1, 2]
>>>
>>> c.keys()
[1, 2, 3, 4, 5]
Counting the frequency of elements is probably best done with a dictionary:
b = {}
for item in a:
    b[item] = b.get(item, 0) + 1
To remove the duplicates, use a set:
a = list(set(a))
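Note that a set does not remember the order in which the elements were seen. If the original order matters, one possible sketch is to go through dict keys instead (the variable name `unique_in_order` is illustrative):

```python
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_in_order = list(dict.fromkeys(a))
print(unique_in_order)  # [5, 1, 2, 4, 3]
```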
You can do this:
import numpy as np
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
np.unique(a, return_counts=True)
Output:
(array([1, 2, 3, 4, 5]), array([4, 4, 2, 1, 2], dtype=int64))
The first array is values, and the second array is the number of elements with these values.
So If you want to get just array with the numbers you should use this:
np.unique(a, return_counts=True)[1]
Here's another succinct alternative using itertools.groupby which also works for unordered input:
from itertools import groupby
items = [5, 1, 1, 2, 2, 1, 1, 2, 2, 3, 4, 3, 5]
results = {value: len(list(freq)) for value, freq in groupby(sorted(items))}
results
# format: {value: number_of_occurrences}
{1: 4, 2: 4, 3: 2, 4: 1, 5: 2}
I would simply use scipy.stats.itemfreq in the following manner:
from scipy.stats import itemfreq
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
freq = itemfreq(a)
a = freq[:,0]
b = freq[:,1]
you may check the documentation here: http://docs.scipy.org/doc/scipy-0.16.0/reference/generated/scipy.stats.itemfreq.html
from collections import Counter
import numpy as np
import pandas as pd
a=["E","D","C","G","B","A","B","F","D","D","C","A","G","A","C","B","F","C","B"]
counter=Counter(a)
kk=[list(counter.keys()),list(counter.values())]
pd.DataFrame(np.array(kk).T, columns=['Letter','Count'])
seta = set(a)
b = [a.count(el) for el in seta]
a = list(seta) #Only if you really want it.
Suppose we have a list:
fruits = ['banana', 'banana', 'apple', 'banana']
We can find out how many of each fruit we have in the list like so:
import numpy as np
(unique, counts) = np.unique(fruits, return_counts=True)
{x:y for x,y in zip(unique, counts)}
Result:
{'banana': 3, 'apple': 1}
This answer is more explicit
a = [1,1,1,1,2,2,2,2,3,3,3,4,4]
d = {}
for item in a:
    if item in d:
        d[item] = d.get(item)+1
    else:
        d[item] = 1
for k,v in d.items():
    print(str(k)+':'+str(v))
# output
#1:4
#2:4
#3:3
#4:2
#remove dups
d = set(a)
print(d)
#{1, 2, 3, 4}
For your first question, iterate over the list and use a dictionary to keep track of how often each element occurs.
For your second question, just use the set operator.
def frequencyDistribution(data):
    return {i: data.count(i) for i in data}
print(frequencyDistribution([1,2,3,4]))
...
{1: 1, 2: 1, 3: 1, 4: 1} # originalNumber: count
I am quite late, but this will also work, and will help others:
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
freq_list = []
a_l = list(set(a))
for x in a_l:
    freq_list.append(a.count(x))
print('Freq', freq_list)
print('number', a_l)
This will produce:
Freq [4, 4, 2, 1, 2]
number [1, 2, 3, 4, 5]
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
counts = dict.fromkeys(a, 0)
for el in a: counts[el] += 1
print(counts)
# {1: 4, 2: 4, 3: 2, 4: 1, 5: 2}
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
# 1. Get counts and store in another list
output = []
for i in set(a):
    output.append(a.count(i))
print(output)
# 2. Remove duplicates using set constructor
a = list(set(a))
print(a)
A set does not allow duplicates, so passing a list to the set() constructor gives an iterable of unique objects. The list's count() method returns how many times a given object occurs in the list. Each unique object is counted this way, and each count is appended to the initially empty list output.
The list() constructor converts set(a) into a list, which is assigned back to the same variable a.
Output
D:\MLrec\venv\Scripts\python.exe D:/MLrec/listgroup.py
[4, 4, 2, 1, 2]
[1, 2, 3, 4, 5]
Simple solution using a dictionary.
def frequency(l):
    d = {}
    for i in l:
        if i in d.keys():
            d[i] += 1
        else:
            d[i] = 1
    for k, v in d.items():
        if v == max(d.values()):
            return k, d.keys()
print(frequency([10,10,10,10,20,20,20,20,40,40,50,50,30]))
#!/usr/bin/python
def frq(words):
    freq = {}
    for w in words:
        if w in freq:
            freq[w] = freq.get(w)+1
        else:
            freq[w] = 1
    return freq
fp = open("poem","r")
text = fp.read()
fp.close()
words = text.split()
print(words)
d = frq(words)
print("frequency of input\n: ")
print(d)
fp1 = open("output.txt","w+")
for k,v in d.items():
    fp1.write(str(k)+':'+str(v)+"\n")
fp1.close()
from collections import OrderedDict
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
def get_count(lists):
    dictionary = OrderedDict()
    for val in lists:
        dictionary.setdefault(val,[]).append(1)
    return [sum(val) for val in dictionary.values()]
print(get_count(a))
>>>[4, 4, 2, 1, 2]
To remove duplicates and Maintain order:
list(dict.fromkeys(get_count(a)))
>>>[4, 2, 1]
I'm using Counter to generate a frequency dict from the words of a text file in one line of code:
import re
from collections import Counter
def _fileIndex(fh):
    ''' create a dict using Counter of a
    flat list of words (re.findall(re.compile(r"[a-zA-Z]+"), lines)) in (lines in file->for lines in fh)
    '''
    return Counter(
        [wrd.lower() for wrdList in
         [words for words in
          [re.findall(re.compile(r'[a-zA-Z]+'), lines) for lines in fh]]
         for wrd in wrdList])
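The nested list comprehensions above can also be flattened into a single generator expression. A sketch of that idea follows, using an in-memory io.StringIO as a stand-in for an open file; the names `WORD_RE`, `word_counts` and `sample` are made up for this example:

```python
import io
import re
from collections import Counter

WORD_RE = re.compile(r"[a-zA-Z]+")  # same pattern as above, compiled once

def word_counts(lines):
    """Count lowercase words across an iterable of text lines."""
    return Counter(word.lower() for line in lines for word in WORD_RE.findall(line))

sample = io.StringIO("The cat saw the dog.\nThe dog ran.\n")  # stand-in for an open file
counts = word_counts(sample)
print(counts["the"], counts["dog"], counts["cat"])  # 3 2 1
```

Because the generator feeds words to Counter one at a time, no intermediate list of all words is built.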
For the record, a functional answer:
>>> L = [1,1,1,1,2,2,2,2,3,3,4,5,5]
>>> import functools
>>> functools.reduce(lambda acc, e: [v+(i==e) for i, v in enumerate(acc,1)] if e<=len(acc) else acc+[0 for _ in range(e-len(acc)-1)]+[1], L, [])
[4, 4, 2, 1, 2]
It's cleaner if you count zeroes too:
>>> functools.reduce(lambda acc, e: [v+(i==e) for i, v in enumerate(acc)] if e<len(acc) else acc+[0 for _ in range(e-len(acc))]+[1], L, [])
[0, 4, 4, 2, 1, 2]
An explanation:
we start with an empty acc list;
if the next element e of L is lower than the size of acc, we just update the corresponding entry: v+(i==e) means v+1 if the index i of acc is the current element e, otherwise the previous value v;
if the next element e of L is greater than or equal to the size of acc, we have to expand acc to host the new 1.
Unlike with itertools.groupby, the elements do not have to be sorted. You'll get weird results if you have negative numbers.
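If negative numbers (or any other hashable values) need to be supported, a variant of the same reduce idea can accumulate into a dict instead of a list. This is only a sketch; the trailing -7 in the sample list is added here to show the negative case:

```python
import functools

L = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 5, -7]

# Each step returns a new dict with the count for `e` incremented.
counts = functools.reduce(lambda acc, e: {**acc, e: acc.get(e, 0) + 1}, L, {})
print(counts)  # {1: 4, 2: 4, 3: 2, 4: 1, 5: 2, -7: 1}
```

Copying the dict at every step keeps the lambda free of side effects but makes it slower than a plain loop or collections.Counter on large inputs.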
Another approach of doing this, albeit by using a heavier but powerful library - NLTK.
import nltk
fdist = nltk.FreqDist(a)
fdist.values()
fdist.most_common()
Found another way of doing this, using sets.
#ar is the list of elements
#convert ar to set to get unique elements
sock_set = set(ar)
#create dictionary of frequency of socks
sock_dict = {}
for sock in sock_set:
    sock_dict[sock] = ar.count(sock)
For an unordered list you should use:
[a.count(el) for el in set(a)]
The output is
[4, 4, 2, 1, 2]
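One caveat: the iteration order of a set is not guaranteed, so the counts above come out in whatever order the set happens to yield. A small sketch that makes the order explicit by sorting the distinct values first (the names `values` and `counts` are illustrative):

```python
a = [5, 1, 2, 2, 4, 3, 1, 2, 3, 1, 1, 5, 2]

values = sorted(set(a))                   # explicit, deterministic order
counts = [a.count(v) for v in values]     # one pass over `a` per distinct value

print(values)  # [1, 2, 3, 4, 5]
print(counts)  # [4, 4, 2, 1, 2]
```

Keeping `values` alongside `counts` also makes it clear which count belongs to which element.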
Yet another solution with another algorithm without using collections:
def countFreq(A):
    if not A:
        return []
    # One slot per possible value 0..max(A); assumes non-negative integers
    count=[0]*(max(A)+1)
    for value in A:
        count[value]+= 1 # increase occurrence for this value
    return [x for x in count if x] # return non-zero count
num=[3,2,3,5,5,3,7,6,4,6,7,2]
print ('\nelements are:\t',num)
count_dict={}
for elements in num:
    count_dict[elements]=num.count(elements)
print ('\nfrequency:\t',count_dict)
You can use the in-built function provided in python
l.count(l[i])
d=[]
for i in range(len(l)):
    if l[i] not in d:
        d.append(l[i])
        print(l.count(l[i]))
The above code automatically removes duplicates in a list and also prints the frequency of each element in original list and the list without duplicates.
Two birds for one shot ! X D
This approach can be tried if you don't want to use any library and keep it simple and short!
a = [1,1,1,1,2,2,2,2,3,3,4,5,5]
marked = []
b = [(a.count(i), marked.append(i))[0] for i in a if i not in marked]
print(b)
Output:
[4, 4, 2, 1, 2]
