unique count in (list of smaller lists) [duplicate] - python

This question already has answers here:
Nested List and count()
(8 answers)
Closed 8 years ago.
Is there a pythonic way to count the elements in a list of lists preferably using collections?
lol = [[1,2,3],[4,2],[5,1,6]]
Out:
1: 2
2: 2
3: 1
4: 1
5: 1
6: 1

from collections import Counter
import itertools
a = [[1,2,3],[4,2],[5,1,6]]
print(Counter(itertools.chain(*a)))
# output: Counter({1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1})
b = Counter(itertools.chain(*a))
for key, val in b.items():
    print(key, ':', val)
output:
1 : 2
2 : 2
3 : 1
4 : 1
5 : 1
6 : 1
Another way of doing this, though less efficient than itertools (thanks to 200OK):
a = [[1,2,3],[4,2],[5,1,6]]
sum(map(Counter, a), Counter())
# output: Counter({1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1})

from collections import Counter
import itertools
lol = [[1,2,3],[4,2],[5,1,6]]
Counter(itertools.chain.from_iterable(lol))
Output
Counter({1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1})
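If you would rather not import itertools, a nested generator expression flattens the list just as well. A minimal sketch that also prints the counts in the layout shown in the question (the variable names counts, value and count are only illustrative):

```python
from collections import Counter

lol = [[1, 2, 3], [4, 2], [5, 1, 6]]

# Flatten with a nested generator expression instead of itertools.chain
counts = Counter(item for sublist in lol for item in sublist)

# Print in the "value: count" layout asked for in the question
for value, count in sorted(counts.items()):
    print(f"{value}: {count}")
```

Sorting the items is optional; it only makes the printed order independent of the order in which elements were first seen.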

Related

Summing up collections.Counter objects using `groupby` in pandas

I am trying to group the words_count column by both essay_set and domain1_score, and to add up the Counter objects in words_count within each group, as described here:
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
I grouped them using this command:
words_freq_by_set = words_freq_by_set.groupby(by=["essay_set", "domain1_score"])
but I do not know how to apply Counter addition, which is simply +, to the words_count column.
Here is my dataframe:
GroupBy.sum works with Counter objects. However, I should mention that the addition is pairwise, so this may not be very fast. Let's try:
words_freq_by_set.groupby(by=["essay_set", "domain1_score"])['words_count'].sum()
A minimal example:
import pandas as pd
from collections import Counter
df = pd.DataFrame({
    'a': [1, 1, 2],
    'b': [Counter([1, 2]), Counter([1, 3]), Counter([2, 3])]
})
df
a b
0 1 {1: 1, 2: 1}
1 1 {1: 1, 3: 1}
2 2 {2: 1, 3: 1}
df.groupby(by=['a'])['b'].sum()
a
1 {1: 2, 2: 1, 3: 1}
2 {2: 1, 3: 1}
Name: b, dtype: object
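For comparison, the same per-group addition can be sketched without pandas. This is only an illustration under assumptions: the rows list and the add_counters_by_key helper are hypothetical names, not part of the question's data. It uses Counter.update, which adds counts in place rather than building a new Counter for every pairwise +.

```python
from collections import Counter, defaultdict

def add_counters_by_key(rows):
    """Sum Counter objects per group key.

    rows: iterable of (key, Counter) pairs (hypothetical input layout).
    """
    totals = defaultdict(Counter)
    for key, counter in rows:
        # update() adds counts in place instead of building a new Counter
        totals[key].update(counter)
    return dict(totals)

# Same toy data as the DataFrame example: column 'a' is the key, 'b' the Counter
rows = [
    (1, Counter([1, 2])),
    (1, Counter([1, 3])),
    (2, Counter([2, 3])),
]
result = add_counters_by_key(rows)
print(result)
```

For a two-column key such as (essay_set, domain1_score), the key would simply be a tuple.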

Iterating through a boolean value in a dictionary issue

I have code like this:
def myfun(*args):
    return {i: sum(k % i == 0 for k in args) for i in range(1,10)}
myfun(1,2,3,4,4,5,10,16,20)
{1: 9, 2: 6, 3: 1, 4: 4, 5: 3, 6: 0, 7: 0, 8: 1, 9: 0}
And I want to convert it to a function without a dict comprehension:
def myfun(*args):
    for item in args:
        for item_2 in range(1,10):
            return {item_2:sum(item % item_2 == 0)}
myfun(1,2,3,4,4,5,10,16,20)
But I get:
----> 4 return {item_2:sum(item % item_2 == 0)}
5
6 myfun(1,2,3,4,4,5,10,16,20)
TypeError: 'bool' object is not iterable
What exactly is the bool value that is returned rather than the sum?
To unwrap the nested dict comprehension, you would end up with two for loops. First you'd iterate over your range and initialize a dict entry (your sum value) to 0. Then you'd loop over your args and do your mod check, and increment the value if necessary. This will emulate your sum expression.
def myfun(*args):
    result = {}
    for i in range(1,10):
        result[i] = 0
        for k in args:
            if k % i == 0:
                result[i] += 1
    return result
>>> myfun(1,2,3,4,4,5,10,16,20)
{1: 9, 2: 6, 3: 1, 4: 4, 5: 3, 6: 0, 7: 0, 8: 1, 9: 0}
This is the correct syntax without a dict comprehension:
def myfun(*args):
    result = {}
    for i in range(1,10):
        result[i] = sum(k % i == 0 for k in args)
    return result
print(myfun(1,2,3,4,4,5,10,16,20))
Output:
{1: 9, 2: 6, 3: 1, 4: 4, 5: 3, 6: 0, 7: 0, 8: 1, 9: 0}
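As for the bool in the error message: item % item_2 == 0 evaluates to a single True or False, and sum() expects an iterable, so handing it one bool raises the TypeError. The sketch below reproduces this with arbitrary values (k = 4, i = 2 and the args tuple are chosen only for the example) and shows why the generator expression works: True counts as 1 and False as 0.

```python
k, i = 4, 2  # arbitrary example values

flag = (k % i == 0)
print(type(flag))  # a plain bool, not an iterable

try:
    sum(flag)  # same mistake as sum(item % item_2 == 0)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# sum() over an iterable of bools counts the True values
args = (1, 2, 3, 4)
count = sum(n % i == 0 for n in args)
print(count)
```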

Initializing a dictionary with zeroes [duplicate]

This question already has answers here:
How do I count the occurrences of a list item?
(30 answers)
Closed 3 years ago.
I am trying to track seen elements, from a big array, using a dict.
Is there a way to force a dictionary object to be integer type and set to zero by default upon initialization?
I have done this with very clunky code and two loops.
Here is what I do now:
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = {}
for val in fl:
    seenit[val] = 0
for val in fl:
    seenit[val] = seenit[val] + 1
Of course, just use collections.defaultdict([default_factory[, ...]]):
from collections import defaultdict
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = defaultdict(int)
for val in fl:
    seenit[val] += 1
print(seenit)
# Output
defaultdict(<class 'int'>, {0: 1, 1: 3, 2: 1, 3: 1, 4: 1})
print(dict(seenit))
# Output
{0: 1, 1: 3, 2: 1, 3: 1, 4: 1}
In addition, if you don't want to import collections, you can use dict.get(key[, default]):
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = {}
for val in fl:
    seenit[val] = seenit.get(val, 0) + 1
print(seenit)
# Output
{0: 1, 1: 3, 2: 1, 3: 1, 4: 1}
Also, if you only want to solve the problem and don't need a plain dictionary, you can use collections.Counter([iterable-or-mapping]):
from collections import Counter
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = Counter(fl)
print(seenit)
# Output
Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})
print(dict(seenit))
# Output
{0: 1, 1: 3, 2: 1, 3: 1, 4: 1}
Both collections.defaultdict and collections.Counter can be read as dictionary[key] and support .keys(), .values(), .items(), etc. Both are subclasses of the built-in dict.
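A short check of the subclass relationship, together with one difference worth knowing: looking up a missing key inserts it into a defaultdict, while a Counter returns 0 without storing anything. This is a minimal sketch; the names values, by_default and by_counter are only illustrative.

```python
from collections import Counter, defaultdict

values = [0, 1, 1, 2, 1, 3, 4]  # same sample list as in the answer

by_default = defaultdict(int)
for v in values:
    by_default[v] += 1
by_counter = Counter(values)

# Both classes inherit from dict and hold the same counts
print(issubclass(defaultdict, dict), issubclass(Counter, dict))
same_counts = dict(by_default) == dict(by_counter)
print(same_counts)

# Difference on a missing key: defaultdict inserts it, Counter does not
missing_default = by_default[99]
missing_counter = by_counter[99]
print(99 in by_default, 99 in by_counter)
```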
As for performance, I used timeit.timeit() to time creating the dictionary and running the loop, over a million executions:
collections.defaultdict: 2.160868141 seconds
dict.get: 1.3540439499999999 seconds
collections.Counter: 4.700308418999999 seconds
collections.Counter may be easier, but it was much slower in this test.
You can use collections.Counter:
from collections import Counter
Counter([0, 1, 1, 2, 1, 3, 4])
Output:
Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})
You can then address it like a dictionary:
>>> Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})[1]
3
>>> Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})[0]
1
Using val in seenit is a bit faster than .get():
seenit = dict()
for val in fl:
    if val in seenit:
        seenit[val] += 1
    else:
        seenit[val] = 1
For larger lists, Counter will eventually outperform all other approaches, and defaultdict is going to be faster than using .get() or val in seenit.
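Timings like the ones quoted above depend on the Python version and on the size of the input, so it is worth measuring on your own data. A minimal sketch, assuming a hypothetical test list of 10,000 random integers (the function names and the data list are illustrative and do not come from the answers):

```python
import random
import timeit
from collections import Counter, defaultdict

def count_defaultdict(values):
    seen = defaultdict(int)
    for v in values:
        seen[v] += 1
    return dict(seen)

def count_get(values):
    seen = {}
    for v in values:
        seen[v] = seen.get(v, 0) + 1
    return seen

def count_in(values):
    seen = {}
    for v in values:
        if v in seen:
            seen[v] += 1
        else:
            seen[v] = 1
    return seen

def count_counter(values):
    return dict(Counter(values))

random.seed(0)  # fixed seed so repeated runs use the same data
data = [random.randrange(100) for _ in range(10_000)]

# Time each approach on the same input; absolute numbers vary by machine
for func in (count_defaultdict, count_get, count_in, count_counter):
    seconds = timeit.timeit(lambda: func(data), number=20)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Changing the length of data or the range of values shows how the ranking shifts between small and large inputs.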

Python dictionary find key of max value

In Python, I want to find the max value in d, but only among the keys 1, 2 and 3 rather than all the keys in d. How can I do this? Thank you.
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
Get the (key, value) pairs for the keys 1, 2 and 3 as a list of tuples, sort the list by value in descending order, and take the key [0] of the first tuple [0].
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
key_max_val = sorted([(k,v) for k,v in d.items() if k in [1,2,3]], key=lambda kv: kv[1], reverse=True)[0][0]
print(key_max_val) # Outputs 1
You can use operator.itemgetter.
This returns the key with the maximum value across all keys of d:
In [873]: import operator
In [874]: d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
In [875]: max(d.items(), key=operator.itemgetter(1))[0]
Out[875]: 1
I think the following should work (based on @Mayank Porwal's idea; sorry, I cannot comment):
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
max(v for k,v in d.items())
Use a generator and the max builtin function:
Max value:
max(v for k,v in d.items() if k in [1,2,3])
Max key (the largest key itself, not the key of the max value):
max(k for k,v in d.items() if k in [1,2,3])
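A compact way to get both the key and its value for the restricted key set is max with key=d.get. A minimal sketch, where the allowed tuple and the names best_key and best_value are introduced here only for illustration:

```python
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
allowed = (1, 2, 3)  # keys to consider

# Key whose value is largest, looking only at the allowed keys
best_key = max((k for k in allowed if k in d), key=d.get)
best_value = d[best_key]
print(best_key, best_value)  # 1 5
```

The k in d filter guards against an allowed key that is absent from the dictionary; if every allowed key is missing, max raises ValueError on the empty sequence.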

nested dictionary of bin sizes from groupby multiple columns

df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3], 'b': [5,5,1,1,3,3,3,1,2,1,1,1,]})
>>> df
a b
0 1 5
1 1 5
2 1 1
3 1 1
4 2 3
5 2 3
6 2 3
7 2 1
8 3 2
9 3 1
10 3 1
11 3 1
>>> df.groupby(['a','b']).size().to_dict()
{(1, 5): 2, (3, 2): 1, (2, 3): 3, (3, 1): 3, (1, 1): 2, (2, 1): 1}
What I get is the count of each a and b combination, keyed by a tuple of the pair, but what I am trying to get is:
{1: {5: 2, 1: 2}, 2: {3: 3, 1: 1}, 3: {2: 1, 1: 3} }
You'll need an additional groupby inside a dict comprehension:
i = df.groupby(['a','b']).size().reset_index(level=1)
j = {k : dict(g.values) for k, g in i.groupby(level=0)}
print(j)
{
1: {1: 2, 5: 2},
2: {1: 1, 3: 3},
3: {1: 3, 2: 1}
}
You can use collections.defaultdict for an O(n) solution.
from collections import defaultdict
import pandas as pd
df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3], 'b': [5,5,1,1,3,3,3,1,2,1,1,1,]})
d = defaultdict(lambda: defaultdict(int))
for i, j in map(tuple, df.values):
    d[i][j] += 1
# defaultdict(<function __main__.<lambda>>,
# {1: defaultdict(int, {1: 2, 5: 2}),
# 2: defaultdict(int, {1: 1, 3: 3}),
# 3: defaultdict(int, {1: 3, 2: 1})})
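Another option is to build a Series from a Counter of (a, b) pairs and group on the first index level:
from collections import Counter
import pandas as pd
s = pd.Series(Counter(zip(df.a, df.b)))
{
    n: d.xs(n).to_dict()
    for n, d in s.groupby(level=0)
}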
{1: {1: 2, 5: 2}, 2: {1: 1, 3: 3}, 3: {1: 3, 2: 1}}
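If you already have the tuple-keyed dictionary from the question, it can also be reshaped in plain Python. A minimal sketch, assuming the dictionary is copied by hand from the question's output (the names pair_counts and nested are illustrative):

```python
# Tuple-keyed counts, as produced by df.groupby(['a','b']).size().to_dict()
pair_counts = {(1, 5): 2, (3, 2): 1, (2, 3): 3, (3, 1): 3, (1, 1): 2, (2, 1): 1}

nested = {}
for (a, b), count in pair_counts.items():
    # setdefault creates the inner dict the first time an outer key appears
    nested.setdefault(a, {})[b] = count

print(nested)
```

This is a single pass over the pairs, so it stays linear in the number of distinct (a, b) combinations.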
