nested dictionary of bin sizes from groupby multiple columns - python

import pandas as pd
df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3], 'b': [5,5,1,1,3,3,3,1,2,1,1,1,]})
>>> df
a b
0 1 5
1 1 5
2 1 1
3 1 1
4 2 3
5 2 3
6 2 3
7 2 1
8 3 2
9 3 1
10 3 1
11 3 1
>>> df.groupby(['a','b']).size().to_dict()
{(1, 5): 2, (3, 2): 1, (2, 3): 3, (3, 1): 3, (1, 1): 2, (2, 1): 1}
What I am getting is the counts of each a and b combination with a tuple of the pair as key but what I am trying to get to is:
{1: {5: 2, 1: 2}, 2: {3: 3, 1: 1}, 3: {2: 1, 1: 3} }

You'll need an additional groupby inside a dict comprehension:
i = df.groupby(['a','b']).size().reset_index(level=1)
j = {k : dict(g.values) for k, g in i.groupby(level=0)}
print(j)
{
1: {1: 2, 5: 2},
2: {1: 1, 3: 3},
3: {1: 3, 2: 1}
}

You can use collections.defaultdict for an O(n) solution.
from collections import defaultdict
df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3], 'b': [5,5,1,1,3,3,3,1,2,1,1,1,]})
d = defaultdict(lambda: defaultdict(int))
for i, j in map(tuple, df.values):
    d[i][j] += 1
# defaultdict(<function __main__.<lambda>>,
# {1: defaultdict(int, {1: 2, 5: 2}),
# 2: defaultdict(int, {1: 1, 3: 3}),
# 3: defaultdict(int, {1: 3, 2: 1})})
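The nested defaultdict repr above is noisy. A minimal sketch of the same single-pass count, assuming the rows are available as plain (a, b) pairs rather than a DataFrame, followed by a conversion to ordinary dicts (the names `pairs`, `counts` and `plain` are illustrative, not from the original answer):

```python
from collections import defaultdict

# Same sample data as the question, as plain (a, b) pairs.
pairs = list(zip([1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                 [5, 5, 1, 1, 3, 3, 3, 1, 2, 1, 1, 1]))

counts = defaultdict(lambda: defaultdict(int))
for a, b in pairs:
    counts[a][b] += 1  # one pass over the rows

# Convert the nested defaultdicts to plain dicts for a cleaner repr.
plain = {a: dict(inner) for a, inner in counts.items()}
print(plain)
```

The conversion also stops later lookups of missing keys from silently creating new entries.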

from collections import Counter
import pandas as pd
s = pd.Series(Counter(zip(df.a, df.b)))
{
    n: d.xs(n).to_dict()
    for n, d in s.groupby(level=0)
}
{1: {1: 2, 5: 2}, 2: {1: 1, 3: 3}, 3: {1: 3, 2: 1}}
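If the tuple-keyed dictionary from `to_dict()` is already in hand, it can also be reshaped without going back to pandas. A small sketch, assuming the flat dictionary shown in the question (the names `flat` and `nested` are illustrative):

```python
# The tuple-keyed result from the question.
flat = {(1, 5): 2, (3, 2): 1, (2, 3): 3, (3, 1): 3, (1, 1): 2, (2, 1): 1}

nested = {}
for (a, b), n in flat.items():
    # setdefault creates the inner dict the first time a value of `a` is seen.
    nested.setdefault(a, {})[b] = n

print(nested)
```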

Related

How to sum keys and multiply keys and values from dictionary in column?

In my dataframe I have a column of dictionaries whose keys and values are numeric:
col1 type
1 {4: 1, 8: 2, 4: 3}
2 {10: 2, 8: 1, 3: 3}
2 {5: 2, 2: 3}
I want to create two new columns: first one is equal to the sum of keys, second is equal to the sum of keys value pairs multiplications. So desired results must be:
col1 type col2 col3
1 {4: 1, 8: 2, 4: 3} 16 32
2 {10: 2, 8: 1, 3: 3} 21 37
2 {5: 2, 2: 3} 7 16
How to do that? When I do df["col2"] = sum(df.type.keys()) it puts same value in each row in column col2
When I do sum(df.type[0]) it rightly calculates 16.
Two problems:
{4: 1, 8: 2, 4: 3} is actually {4: 3, 8: 2}, because the key 4 is used twice, and the second usage overrides the first. Try print({4: 1, 8: 2, 4: 3}).
df.type.keys() gives you the keys of the Series df.type, which is the index. print(df.type.keys()) should output something like RangeIndex(start=0, stop=3, step=1).
To achieve your goal you could use .map and do the following:
df["col2"] = df.type.map(sum)
df["col3"] = df.type.map(lambda d: sum(k * v for k, v in d.items()))
Result:
col1 type col2 col3
0 1 {4: 3, 8: 2} 12 28
1 2 {10: 2, 8: 1, 3: 3} 21 37
2 2 {5: 2, 2: 3} 7 16
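Both points can be checked without pandas. A minimal sketch, assuming the three dictionaries from the question are held in a plain list (the names `rows`, `key_sums` and `weighted_sums` are illustrative):

```python
# A repeated key in a dict literal keeps only the last value,
# so the first dictionary collapses to {4: 3, 8: 2}.
rows = [{4: 1, 8: 2, 4: 3}, {10: 2, 8: 1, 3: 3}, {5: 2, 2: 3}]
print(rows[0])

# Iterating over a dict yields its keys, so sum(d) adds up the keys.
key_sums = [sum(d) for d in rows]

# Sum of key * value over each dictionary.
weighted_sums = [sum(k * v for k, v in d.items()) for d in rows]

print(key_sums)
print(weighted_sums)
```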

Summing up collections.Counter objects using `groupby` in pandas

I am trying to group the words_count column by both essay_set and domain1_score, and to add up the Counter objects in words_count within each group, the way Counters can be added together:
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
I grouped them using this command:
words_freq_by_set = words_freq_by_set.groupby(by=["essay_set", "domain1_score"])
but I do not know how to apply the Counter addition function, which is simply +, to the words_count column.
Here is my dataframe:
GroupBy.sum works with Counter objects. However I should mention the process is pairwise, so this may not be very fast. Let's try
words_freq_by_set.groupby(by=["essay_set", "domain1_score"])['words_count'].sum()
df = pd.DataFrame({
'a': [1, 1, 2],
'b': [Counter([1, 2]), Counter([1, 3]), Counter([2, 3])]
})
df
a b
0 1 {1: 1, 2: 1}
1 1 {1: 1, 3: 1}
2 2 {2: 1, 3: 1}
df.groupby(by=['a'])['b'].sum()
a
1 {1: 2, 2: 1, 3: 1}
2 {2: 1, 3: 1}
Name: b, dtype: object
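If the pairwise addition turns out to be slow, one alternative is to accumulate in place outside pandas. This is a sketch under the assumption that the rows can be iterated as (group key, Counter) pairs; `rows` and `totals` are illustrative names, not part of the original answer:

```python
from collections import Counter, defaultdict

# (group key, Counter) pairs mirroring the small example above.
rows = [
    (1, Counter([1, 2])),
    (1, Counter([1, 3])),
    (2, Counter([2, 3])),
]

totals = defaultdict(Counter)
for key, counter in rows:
    # Counter.update adds counts in place, so no intermediate
    # Counter is built for every pair of rows.
    totals[key].update(counter)

print(dict(totals))
```

For a real DataFrame the pairs could come from something like `zip(df['a'], df['b'])`, with a tuple of the grouping columns as the key when there is more than one.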

Python dictionary find key of max value

In Python, I want to find the max value of d, but considering only the keys 1, 2 and 3 rather than all the keys in d. How can I do that? Thank you.
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
Just get the keys and values for the keys 1, 2 and 3 in a list of tuples, sort the list by value in descending order, and take the key [0] of the first tuple [0].
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
key_max_val = sorted([(k,v) for k,v in d.items() if k in [1,2,3]], key=lambda kv: kv[1], reverse=True)[0][0]
print(key_max_val) # Outputs 1
You can use operator.itemgetter. It will return the key with the maximum value (considering every key in d):
In [873]: import operator
In [874]: d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
In [875]: max(d.items(), key=operator.itemgetter(1))[0]
Out[875]: 1
I think the following should work (based on @Mayank Porwal's idea; sorry, I cannot comment):
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
max(v for k,v in d.items())
Use a generator and the max builtin function:
Max value:
max(v for k,v in d.items() if k in [1,2,3])
Max key (the largest of the keys 1, 2 and 3, not the key of the max value):
max(k for k,v in d.items() if k in [1,2,3])
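To get the key whose value is largest while looking only at a subset of keys, `max` can be given `d.get` as its key function. A minimal sketch; the second dictionary `scores` is hypothetical data added to show a case where the restricted and unrestricted answers differ:

```python
d = {1: 5, 2: 0, 3: 4, 4: 0, 5: 1}
allowed = {1, 2, 3}

# Keep only the keys of interest, then rank them by their value in d.
candidates = [k for k in d if k in allowed]
best_key = max(candidates, key=d.get)
best_value = d[best_key]
print(best_key, best_value)

# Hypothetical data where key 4 holds the overall maximum.
scores = {1: 0, 2: 7, 3: 4, 4: 9}
restricted = max((k for k in scores if k in allowed), key=scores.get)
unrestricted = max(scores, key=scores.get)
print(restricted, unrestricted)
```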

Converting list to nested dictionary

How can I convert a list into a nested dictionary?
For example:
l = [1, 2, 3, 4]
I'd like to convert it to a dictionary that looks like this:
{1: {2: {3: {4: {}}}}}
To do that, iterate over the list in reverse, starting from an empty dictionary and wrapping it in a new dictionary at each step.
l = [1, 2, 3, 4]
d = {}
for i in reversed(l):
    d = {i: d}
>>> print(d)
{1: {2: {3: {4: {}}}}}
You could also use functools.reduce for this.
reduce(lambda cur, k: {k: cur}, reversed(l), {})
Demo
>>> from functools import reduce
>>> l = [1, 2, 3, 4]
>>> reduce(lambda cur, k: {k: cur}, reversed(l), {})
{1: {2: {3: {4: {}}}}}
The flow of construction looks something like
{4: {}} -> {3: {4: {}}} -> {2: {3: {4: {}}}} -> {1: {2: {3: {4: {}}}}}
as reduce traverses the reverse iterator making a new single-element dict.
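The intermediate steps can be made visible with `itertools.accumulate`, which yields every partial result of the same folding function. A small sketch, assuming Python 3.8 or later for the `initial` keyword (the name `steps` is illustrative):

```python
from itertools import accumulate

l = [1, 2, 3, 4]

# accumulate applies the same function as reduce but keeps each
# intermediate value; initial={} is the innermost empty dict.
steps = list(accumulate(reversed(l), lambda cur, k: {k: cur}, initial={}))

for step in steps:
    print(step)
```

The last element of `steps` is the value `reduce` returns.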
You can do something like this:
l = [1,2,3,4]
d = {}
for i in l[::-1]:
    d = {i: d}
print(d)
{1: {2: {3: {4: {}}}}}
Here is an abstraction. Uses for setdefault are typically overshadowed by defaultdict, but here is an interesting application if you have one or more lists (iterables):
def make_nested_dict(*iterables):
    """Return a nested dictionary."""
    d = {}
    for it in iterables:
        temp = d
        for i in it:
            temp = temp.setdefault(i, {})
    return d
make_nested_dict([1, 2, 3, 4])
# {1: {2: {3: {4: {}}}}}
make_nested_dict([1, 2, 3, 4], [5, 6])
# {1: {2: {3: {4: {}}}}, 5: {6: {}}}
Nested Branches
Unlike defaultdict, this technique accepts duplicate keys by appending to existing "branches". For example, we will append a new 7 → 8 branch at the third level of the first (A) branch:
#                     A          B          C
make_nested_dict([1, 2, 3, 4], [5, 6], [1, 2, 7, 8])
# {1: {2: {3: {4: {}}, 7: {8: {}}}}, 5: {6: {}}}
Visually:
1 → 2 → 3 → 4  (A)        5 → 6  (B)
     \
      7 → 8    (C)

unique count in (list of smaller lists) [duplicate]

Is there a pythonic way to count the elements in a list of lists preferably using collections?
lol = [[1,2,3],[4,2],[5,1,6]]
Out:
1: 2
2: 2
3: 1
4: 1
5: 1
6: 1
from collections import Counter
import itertools
a= [[1,2,3],[4,2],[5,1,6]]
print(Counter(itertools.chain(*a)))
#output Counter({1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1})
b=Counter(itertools.chain(*a))
for key,val in b.items():
    print(key,':',val)
output:
1 : 2
2 : 2
3 : 1
4 : 1
5 : 1
6 : 1
Another way of doing this, though less efficient than itertools (thanks to 200OK):
a= [[1,2,3],[4,2],[5,1,6]]
sum(map(Counter, a), Counter())
#output Counter({1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1})
from collections import Counter
import itertools
lol = [[1,2,3],[4,2],[5,1,6]]
Counter(itertools.chain.from_iterable(lol))
Output
Counter({1: 2, 2: 2, 3: 1, 4: 1, 5: 1, 6: 1})
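A third variant, sketched here as an assumption rather than taken from the answers, feeds each sublist to a single Counter with `update`. It avoids building a new Counter per sublist as the `sum(map(Counter, ...))` form does, and prints the counts in key order to match the layout requested in the question:

```python
from collections import Counter

lol = [[1, 2, 3], [4, 2], [5, 1, 6]]

counts = Counter()
for sub in lol:
    counts.update(sub)  # adds the counts of this sublist in place

# Print in ascending key order, one "value: count" pair per line.
for value, n in sorted(counts.items()):
    print(f"{value}: {n}")
```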
