I am trying to group the words_count column by both essay_set and domain1_score and add up the Counter objects in words_count, using Counter addition as shown here:
>>> c = Counter(a=3, b=1)
>>> d = Counter(a=1, b=2)
>>> c + d # add two counters together: c[x] + d[x]
Counter({'a': 4, 'b': 3})
I grouped them using this command:
words_freq_by_set = words_freq_by_set.groupby(by=["essay_set", "domain1_score"])
but I don't know how to apply Counter addition (which is simply +) to the words_count column.
Here is my dataframe:
GroupBy.sum works with Counter objects, so you can try:
words_freq_by_set.groupby(by=["essay_set", "domain1_score"])['words_count'].sum()
However, I should mention that the addition is pairwise, so this may not be very fast. Here is a small example:
import pandas as pd
from collections import Counter
df = pd.DataFrame({
    'a': [1, 1, 2],
    'b': [Counter([1, 2]), Counter([1, 3]), Counter([2, 3])]
})
df
a b
0 1 {1: 1, 2: 1}
1 1 {1: 1, 3: 1}
2 2 {2: 1, 3: 1}
df.groupby(by=['a'])['b'].sum()
a
1 {1: 2, 2: 1, 3: 1}
2 {2: 1, 3: 1}
Name: b, dtype: object
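If the pairwise summation turns out to be too slow, one alternative (a sketch, not benchmarked here) is to aggregate each group into a single Counter that is updated in place, so no new intermediate Counter is built for every addition. The helper name merge_counters is hypothetical. Note that, unlike +, Counter.update keeps zero and negative counts.

```python
from collections import Counter

import pandas as pd


def merge_counters(counters):
    """Hypothetical helper: merge an iterable of Counters into one Counter."""
    total = Counter()
    for c in counters:
        total.update(c)  # adds counts in place; does not drop zero or negative values
    return total


df = pd.DataFrame({
    'a': [1, 1, 2],
    'b': [Counter([1, 2]), Counter([1, 3]), Counter([2, 3])],
})

# agg calls merge_counters once per group, passing that group's Series of Counters
merged = df.groupby('a')['b'].agg(merge_counters)
print(merged)
```

For the question's data this would be words_freq_by_set.groupby(["essay_set", "domain1_score"])['words_count'].agg(merge_counters).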
I have a list of lists of ints, as shown below:
[[1, 2, 3],
[1, 5],
[4, 2, 6]]
I want to get the frequency of each number across the lists as a dict; for example, 1 occurs in 2 of the lists, and so on. The expected output is:
{1:2,
2:2,
3:1,
4:1,
5:1,
6:1}
How can this be generated?
You could try this, assuming L is your list of lists:
expected = {1:2,
2:2,
3:1,
4:1,
5:1,
6:1}
>>> from itertools import chain
>>> from collections import Counter
>>> flattened = list(chain.from_iterable(L))
>>> flattened
[1, 2, 3, 1, 5, 4, 2, 6]
>>> counts = Counter(flattened)
>>> counts
Counter({1: 2, 2: 2, 3: 1, 5: 1, 4: 1, 6: 1})
# This also works as a one-liner:
>>> counts = Counter(chain.from_iterable(L))
>>> assert counts == expected # your expected result shown above
# No output means the assertion passed.
You can use Counter for this:
>>> from collections import Counter
>>> array = [[1, 2, 3], [1, 5], [4, 2, 6]]
>>> result = Counter()
>>> for sublist in array:
...     result += Counter(sublist)
...
>>> result
Counter({1: 2, 2: 2, 3: 1, 5: 1, 4: 1, 6: 1})
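Both answers count total occurrences, which matches the expected output only because no number repeats inside a single sublist. If you literally want the number of lists each value appears in, one option (a sketch; the function name count_lists_containing is hypothetical) is to deduplicate each sublist with set() before counting:

```python
from collections import Counter
from itertools import chain


def count_lists_containing(lists):
    """Count, for each value, how many sublists contain it at least once."""
    # set(sub) collapses repeats within one sublist, so each list contributes at most 1
    return Counter(chain.from_iterable(set(sub) for sub in lists))


L = [[1, 2, 3], [1, 5], [4, 2, 6]]
print(dict(count_lists_containing(L)))

# With a repeat inside a sublist, the two approaches differ:
data = [[1, 1, 2], [1]]
print(Counter(chain.from_iterable(data))[1])  # 3 occurrences in total
print(count_lists_containing(data)[1])        # but present in only 2 lists
```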
I am trying to track seen elements from a big array using a dict.
Is there a way to make a dictionary hold integers that default to zero, without initializing every key first?
I have done this with some very clunky code and two loops.
Here is what I do now:
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = {}
for val in fl:
    seenit[val] = 0
for val in fl:
    seenit[val] = seenit[val] + 1
Of course, just use collections.defaultdict([default_factory[, ...]]):
from collections import defaultdict
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = defaultdict(int)
for val in fl:
    seenit[val] += 1
print(seenit)
# Output
defaultdict(<class 'int'>, {0: 1, 1: 3, 2: 1, 3: 1, 4: 1})
print(dict(seenit))
# Output
{0: 1, 1: 3, 2: 1, 3: 1, 4: 1}
In addition, if you'd rather not import collections, you can use dict.get(key[, default]):
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = {}
for val in fl:
    seenit[val] = seenit.get(val, 0) + 1
print(seenit)
# Output
{0: 1, 1: 3, 2: 1, 3: 1, 4: 1}
Also, if you just want to solve the problem and don't need a plain dict, you can use collections.Counter([iterable-or-mapping]):
from collections import Counter
fl = [0, 1, 1, 2, 1, 3, 4]
seenit = Counter(fl)
print(seenit)
# Output
Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})
print(dict(seenit))
# Output
{0: 1, 1: 3, 2: 1, 3: 1, 4: 1}
Both collections.defaultdict and collections.Counter are subclasses of dict, so you can read them with d[key] and use .keys(), .values(), .items(), etc., just like a regular dictionary.
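A small check of that claim, plus one behavioral difference worth knowing: reading a missing key from a defaultdict inserts it (with value int(), i.e. 0), while a Counter returns 0 without storing anything. The key 99 below is just an arbitrary value not in the list.

```python
from collections import Counter, defaultdict

fl = [0, 1, 1, 2, 1, 3, 4]

dd = defaultdict(int)
for val in fl:
    dd[val] += 1
cnt = Counter(fl)

# Both are dict subclasses, so the usual dict API is available
print(isinstance(dd, dict), isinstance(cnt, dict))  # True True
print(sorted(dd.items()) == sorted(cnt.items()))    # True

# Looking up a missing key returns 0 in both cases...
print(dd[99], cnt[99])  # 0 0
# ...but only defaultdict stores the key as a side effect
print(99 in dd, 99 in cnt)  # True False
```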
If you care about performance: I used timeit.timeit() to time creating the dictionary and running the loop, with one million executions each:
collections.defaultdict: 2.16 seconds
dict.get: 1.35 seconds
collections.Counter: 4.70 seconds
collections.Counter may be more convenient, but in this test it was much slower.
You can use collections.Counter:
from collections import Counter
Counter([0, 1, 1, 2, 1, 3, 4])
Output:
Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})
You can then index it like a dictionary:
>>> Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})[1]
3
>>> Counter({1: 3, 0: 1, 2: 1, 3: 1, 4: 1})[0]
1
Using val in seenit is a bit faster than .get():
seenit = dict()
for val in fl:
    if val in seenit:
        seenit[val] += 1
    else:
        seenit[val] = 1
For larger lists, Counter eventually outperforms all the other approaches, and defaultdict becomes faster than using .get() or val in seenit.
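Since the timings in these answers depend heavily on input size and Python version, it is worth measuring on your own data. A minimal timeit sketch comparing the four approaches above, assuming a small list and a larger synthetic one built by repeating it (the function names and inputs are illustrative, and no particular numbers are claimed here):

```python
import timeit
from collections import Counter, defaultdict
from functools import partial


def with_defaultdict(items):
    seen = defaultdict(int)
    for val in items:
        seen[val] += 1
    return seen


def with_get(items):
    seen = {}
    for val in items:
        seen[val] = seen.get(val, 0) + 1
    return seen


def with_in(items):
    seen = {}
    for val in items:
        if val in seen:
            seen[val] += 1
        else:
            seen[val] = 1
    return seen


def with_counter(items):
    return Counter(items)


APPROACHES = (with_defaultdict, with_get, with_in, with_counter)


def benchmark(items, number=1000):
    """Return the total seconds for `number` runs of each approach on `items`."""
    return {
        f.__name__: timeit.timeit(partial(f, items), number=number)
        for f in APPROACHES
    }


if __name__ == "__main__":
    small = [0, 1, 1, 2, 1, 3, 4]
    larger = small * 1_000
    for label, items in (("small", small), ("larger", larger)):
        for approach, seconds in benchmark(items, number=100).items():
            print(f"{label:6} {approach:16} {seconds:.4f} s")
```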
>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3], 'b': [5,5,1,1,3,3,3,1,2,1,1,1]})
>>> df
a b
0 1 5
1 1 5
2 1 1
3 1 1
4 2 3
5 2 3
6 2 3
7 2 1
8 3 2
9 3 1
10 3 1
11 3 1
>>> df.groupby(['a','b']).size().to_dict()
{(1, 5): 2, (3, 2): 1, (2, 3): 3, (3, 1): 3, (1, 1): 2, (2, 1): 1}
This gives me the count of each (a, b) combination, keyed by a tuple of the pair, but what I am trying to get is:
{1: {5: 2, 1: 2}, 2: {3: 3, 1: 1}, 3: {2: 1, 1: 3} }
You'll need an additional groupby inside a dict comprehension:
i = df.groupby(['a','b']).size().reset_index(level=1)
j = {k : dict(g.values) for k, g in i.groupby(level=0)}
print(j)
{
1: {1: 2, 5: 2},
2: {1: 1, 3: 3},
3: {1: 3, 2: 1}
}
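A possibly simpler variant of the same idea (a sketch, not taken from the answer above) iterates over the groups of column b directly and lets value_counts do the per-group counting:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'b': [5, 5, 1, 1, 3, 3, 3, 1, 2, 1, 1, 1]})

# Iterating a SeriesGroupBy yields (group key, Series of that group's b values)
nested = {k: g.value_counts().to_dict() for k, g in df.groupby('a')['b']}
print(nested)
```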
You can use collections.defaultdict for an O(n) solution.
from collections import defaultdict
import pandas as pd
df = pd.DataFrame({'a': [1,1,1,1,2,2,2,2,3,3,3,3], 'b': [5,5,1,1,3,3,3,1,2,1,1,1]})
d = defaultdict(lambda: defaultdict(int))
for i, j in map(tuple, df.values):
    d[i][j] += 1
# defaultdict(<function __main__.<lambda>>,
# {1: defaultdict(int, {1: 2, 5: 2}),
# 2: defaultdict(int, {1: 1, 3: 3}),
# 3: defaultdict(int, {1: 3, 2: 1})})
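The nested defaultdicts print awkwardly, as the comment above shows. Here is a sketch that wraps the same approach in a hypothetical helper, nested_counts, and converts the result to plain dicts at the end. It accepts any two equal-length iterables, so nested_counts(df['a'], df['b']) should work too (the keys would then be NumPy integers).

```python
from collections import defaultdict


def nested_counts(keys, values):
    """Hypothetical helper: count each value per key as {key: {value: count}}."""
    d = defaultdict(lambda: defaultdict(int))
    for k, v in zip(keys, values):
        d[k][v] += 1
    # Convert the inner and outer defaultdicts to plain dicts
    return {k: dict(inner) for k, inner in d.items()}


a = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
b = [5, 5, 1, 1, 3, 3, 3, 1, 2, 1, 1, 1]
print(nested_counts(a, b))  # {1: {5: 2, 1: 2}, 2: {3: 3, 1: 1}, 3: {2: 1, 1: 3}}
```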
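Alternatively, you can count the (a, b) pairs with Counter and then regroup the resulting Series by its first index level:
from collections import Counter
import pandas as pd
s = pd.Series(Counter(zip(df.a, df.b)))
{
    n: g.xs(n).to_dict()
    for n, g in s.groupby(level=0)
}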
{1: {1: 2, 5: 2}, 2: {1: 1, 3: 3}, 3: {1: 3, 2: 1}}
So I have a DataFrame that looks like the following
a b c
0 AB 10 {a: 2, b: 1}
1 AB 1 {a: 3, b: 2}
2 AC 2 {a: 4, b: 3}
...
400 BC 4 {a: 1, b: 4}
Given another key-value pair like {c: 2}, what's the syntax to add it to every dict in column c?
a b c
0 AB 10 {a: 2, b: 1, c: 2}
1 AB 1 {a: 3, b: 2, c: 2}
2 AC 2 {a: 4, b: 3, c: 2}
...
400 BC 4 {a: 1, b: 4, c: 2}
I've tried df['C'] +=, df['C'].append(), and df.C.append, but none of them seem to work.
Here is a generalized way to update the dictionaries in a column with another dictionary; it works for multiple keys as well.
Test dataframe:
>>> x = pd.Series([{'a':2,'b':1}])
>>> df = pd.DataFrame(x, columns=['c'])
>>> df
c
0 {'b': 1, 'a': 2}
Then just apply a lambda function (apply returns a new Series, so assign it back to df['c'] if you want to keep the result):
>>> update_dict = {'c': 2}
>>> df['c'].apply(lambda x: {**x, **update_dict})
0 {'b': 1, 'a': 2, 'c': 2}
Name: c, dtype: object
Note: this uses the Python 3.5+ dictionary unpacking syntax mentioned in an answer to "How to merge two Python dictionaries in a single expression?". For Python 2, you can use the merge_two_dicts function from the top answer to that question: copy its definition and then write:
df['c'].apply(lambda x: merge_two_dicts(x, update_dict))
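For reference, here is a sketch of the copy-then-update pattern such a helper relies on (the name merge_dicts is hypothetical and written here, not copied from the linked answer). Copying keeps the dicts already stored in the column from being mutated, and the result of apply is assigned back to df['c']. On Python 3.9+, x | update_dict is another way to build the merged dict.

```python
import pandas as pd


def merge_dicts(base, extra):
    """Hypothetical helper: return a new dict with base's items, extended/overridden by extra."""
    merged = base.copy()  # copy so the dict stored in the DataFrame is left untouched
    merged.update(extra)
    return merged


df = pd.DataFrame({
    'a': ['AB', 'AB', 'AC'],
    'b': [10, 1, 2],
    'c': [{'a': 2, 'b': 1}, {'a': 3, 'b': 2}, {'a': 4, 'b': 3}],
})
update_dict = {'c': 2}

# apply returns a new Series; assign it back to replace the column
df['c'] = df['c'].apply(lambda x: merge_dicts(x, update_dict))
print(df)
```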
As I understand it, when I convert dicts to Counters and add them, any key whose value is zero disappears from the result:
from collections import Counter
a = {"a": 1, "b": 5, "d": 0}
b = {"b": 1, "c": 2}
print(Counter(a) + Counter(b))
How can I keep those keys?
This is my expected result:
Counter({'b': 6, 'c': 2, 'a': 1, 'd': 0})
You can also use Counter's update() method instead of the + operator. For example:
>>> a = {"a": 1, "b": 5, "d": 0}
>>> b = {"b": 1, "c": 2}
>>> x = Counter(a)
>>> x.update(Counter(b))
>>> x
Counter({'b': 6, 'c': 2, 'a': 1, 'd': 0})
update() adds counts instead of replacing them, and it does not remove keys whose value is zero. You can also start from Counter(b) and update it with Counter(a). For example:
>>> y = Counter(b)
>>> y.update(Counter(a))
>>> y
Counter({'b': 6, 'c': 2, 'a': 1, 'd': 0})
Unfortunately, when you add two Counters, only elements with a positive count are kept in the result.
If you want to keep the elements with a count of zero, you could define a function like this:
def addall(a, b):
    c = Counter(a)  # copy the counter a, preserving the zero elements
    for x in b:  # for each key in the other counter
        c[x] += b[x]  # add the value in the other counter to the first
    return c
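addall handles exactly two mappings. A sketch generalizing the same idea to any number of them (the name add_many is hypothetical), built on Counter.update so zero and negative totals survive. If you later want the usual Counter behavior back, unary + returns a copy containing only the positive counts.

```python
from collections import Counter


def add_many(*mappings):
    """Hypothetical helper: sum any number of count mappings, keeping zero/negative totals."""
    total = Counter()
    for m in mappings:
        total.update(m)  # update() adds counts and never drops keys
    return total


a = {"a": 1, "b": 5, "d": 0}
b = {"b": 1, "c": 2}

total = add_many(a, b)
print(total)   # Counter({'b': 6, 'c': 2, 'a': 1, 'd': 0})
print(+total)  # Counter({'b': 6, 'c': 2, 'a': 1})
```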
You can just subclass Counter and adjust its __add__ method:
from collections import Counter
class MyCounter(Counter):
    def __add__(self, other):
        """Add counts from two counters.
        Preserves counts with zero values.
        >>> MyCounter('abbb') + MyCounter('bcc')
        MyCounter({'b': 4, 'c': 2, 'a': 1})
        >>> MyCounter({'a': 1, 'b': 0}) + MyCounter({'a': 2, 'c': 3})
        MyCounter({'a': 3, 'c': 3, 'b': 0})
        """
        if not isinstance(other, Counter):
            return NotImplemented
        result = MyCounter()
        for elem, count in self.items():
            newcount = count + other[elem]
            result[elem] = newcount
        for elem, count in other.items():
            if elem not in self:
                result[elem] = count
        return result
counter1 = MyCounter({'a': 1, 'b': 0})
counter2 = MyCounter({'a': 2, 'c': 3})
print(counter1 + counter2) # MyCounter({'a': 3, 'c': 3, 'b': 0})
To add to Anand S Kumar's answer: update() keeps your keys even when a value is negative, whereas + drops them.
from collections import Counter
a = {"a": 1, "b": 5, "d": -1}
b = {"b": 1, "c": 2}
print(Counter(a) + Counter(b))
#Counter({'b': 6, 'c': 2, 'a': 1})
x = Counter(a)
x.update(Counter(b))
print(x)
#Counter({'b': 6, 'c': 2, 'a': 1, 'd': -1})