I have a pandas dataframe:
A B C D
1 1 0 32
1 4
2 0 43
1 12
3 0 58
1 34
2 1 0 37
1 5
[..]
where A, B and C are index columns. What I want to compute is for every group of rows with unique values for A and B: D WHERE C=1 / D WHERE C=0.
The result should look like this:
A B NEW
1 1 4/32
2 12/43
3 58/34
2 1 37/5
[..]
Can you help me?
Use Series.unstack first, so possible divide columns 0,1:
new = df['D'].unstack()
new = new[1].div(new[0]).to_frame('NEW')
print (new)
NEW
A B
1 1 0.125000
2 0.279070
3 0.586207
2 2 0.135135
Related
My problem is the displacement of column names in my data frame after set the column B column as index.
What I had:
A B C
11 6260063207400 1999-02-15 1
22 6260063207400 1999-02-18 2
29 6260063207400 1999-02-20 2
61 6260063207400 1999-02-27 2
What I have:
B
A
1999-02-15 1
1999-02-18 2
1999-02-20 2
1999-02-27 2
1999-02-28 2
What I would:
A B
1999-02-15 1
1999-02-18 2
1999-02-20 2
1999-02-27 2
1999-02-28 2
Let us do
df = df.reset_index()
Let's assume, I have the following data frame.
Id Combinations
1 (A,B)
2 (C,)
3 (A,D)
4 (D,E,F)
5 (F)
I would like to filter out Combination column values with more than value in a set. Something like below. AND I would like count the number of occurrence as whole in Combination column. For example, ID number 2 and 5 should be removed since their value in a set is only 1.
The result I am looking for is:
ID Combination Frequency
1 A 2
1 B 1
3 A 2
3 D 2
4 D 2
4 E 1
4 F 2
Can anyone help to get the above result in Python pandas?
First if necessary convert values to lists:
df['Combinations'] = df['Combinations'].str.strip('(,)').str.split(',')
If need count after filtering only one values by Series.str.len in boolean indexing, then use DataFrame.explode and count values by Series.map with Series.value_counts:
df1 = df[df['Combinations'].str.len().gt(1)].explode('Combinations')
df1['Frequency'] = df1['Combinations'].map(df1['Combinations'].value_counts())
print (df1)
Id Combinations Frequency
0 1 A 2
0 1 B 1
2 3 A 2
2 3 D 2
3 4 D 2
3 4 E 1
3 4 F 1
Or if need count before removing them filter them by Series.duplicated in last step:
df2 = df.explode('Combinations')
df2['Frequency'] = df2['Combinations'].map(df2['Combinations'].value_counts())
df2 = df2[df2['Id'].duplicated(keep=False)]
Alternative:
df2 = df2[df2.groupby('Id').Id.transform('size') > 1]
Or:
df2 = df2[df2['Id'].map(df2['Id'].value_counts() > 1]
print (df2)
Id Combinations Frequency
0 1 A 2
0 1 B 1
2 3 A 2
2 3 D 2
3 4 D 2
3 4 E 1
3 4 F 2
This seems to be easy but couldn't find a working solution for it:
I have a dataframe with 3 columns:
df = pd.DataFrame({'A': [0,0,2,2,2],
'B': [1,1,2,2,3],
'C': [1,1,2,3,4]})
A B C
0 0 1 1
1 0 1 1
2 2 2 2
3 2 2 3
4 2 3 4
I want to select rows based on values of column A, then groupby based on values of column B, and finally transform values of column C into sum. something along the line of this (obviously not working) code:
df[df['A'].isin(['2']), 'C'] = df[df['A'].isin(['2']), 'C'].groupby('B').transform('sum')
desired output for above example is:
A B C
0 0 1 1
1 0 1 1
2 2 2 5
3 2 3 4
I also know how to split dataframe and do it. I am looking more for a solution that does it without the need of split+concat/merge. Thank you.
Is it just
s = df['A'].isin([2])
pd.concat((df[s].groupby(['A','B'])['C'].sum().reset_index(),
df[~s])
)
Output:
A B C
0 2 2 5
1 2 3 4
0 0 1 1
Update: Without splitting, you can assign a new column indicating special values of A:
(df.sort_values('A')
.assign(D=(~df['A'].isin([2])).cumsum())
.groupby(['D','A','B'])['C'].sum()
.reset_index('D',drop=True)
.reset_index()
)
Output:
A B C
0 0 1 1
1 0 1 1
2 2 2 5
3 2 3 4
I have a Pandas dataframe that contains a grouping variable. I would like to merge each group with other dataframes based on the contents of one of the columns. So, for example, I have a dataframe, dfA, which can be defined as:
dfA = pd.DataFrame({'a':[1,2,3,4,5,6],
'b':[0,1,0,0,1,1],
'c':['a','b','c','d','e','f']})
a b c
0 1 0 a
1 2 1 b
2 3 0 c
3 4 0 d
4 5 1 e
5 6 1 f
Two other dataframes, dfB and dfC, contain a common column ('a') and an extra column ('d') and can be defined as:
dfB = pd.DataFrame({'a':[1,2,3],
'd':[11,12,13]})
a d
0 1 11
1 2 12
2 3 13
dfC = pd.DataFrame({'a':[4,5,6],
'd':[21,22,23]})
a d
0 4 21
1 5 22
2 6 23
I would like to be able to split dfA based on column 'b' and merge one of the groups with dfB and the other group with dfC to produce an output that looks like:
a b c d
0 1 0 a 11
1 2 1 b 12
2 3 0 c 13
3 4 0 d 21
4 5 1 e 22
5 6 1 f 23
In this simplified version, I could concatenate dfB and dfC and merge with dfA without splitting into groups as shown below:
dfX = pd.concat([dfB,dfC])
dfA = dfA.merge(dfX,on='a',how='left')
print(dfA)
a b c d
0 1 0 a 11
1 2 1 b 12
2 3 0 c 13
3 4 0 d 21
4 5 1 e 22
5 6 1 f 23
However, in the real-world situation, the smaller dataframes will be generated from multiple different complex sources; generating the dataframes and combining into a single dataframe beforehand may not be feasible because there may be overlapping data on the column that will be used for merging the dataframes (but this will be avoided if the dataframe can be split based on the grouping variable). Is it possible to use Pandas groupby() method to do this instead? I was thinking of something like the following (which doesn't work, perhaps because I'm not combining the groups into a new dataframe correctly):
grouped = dfA.groupby('b')
for name, group in grouped:
if name == 0:
group = group.merge(dfB,on='a',how='left')
elif name == 1:
group = group.merge(dfC,on='a',how='left')
Any thoughts would be appreciated.
This will fix your code
l=[]
grouped = dfA.groupby('b')
for name, group in grouped:
if name == 0:
group = group.merge(dfB,on='a',how='left')
elif name == 1:
group = group.merge(dfC,on='a',how='left')
l.append(group)
pd.concat(l)
Out[215]:
a b c d
0 1 0 a 11.0
1 3 0 c 13.0
2 4 0 d NaN
0 2 1 b NaN
1 5 1 e 22.0
2 6 1 f 23.0
Hi I will show what im trying to do through examples:
I start with a dataframe like this:
> pd.DataFrame({'A':['a','a','a','c'],'B':[1,1,2,3], 'count':[5,6,1,7]})
A B count
0 a 1 5
1 a 1 6
2 a 2 1
3 c 3 7
I need to find a way to get all the unique combinations between column A and B, and merge them. The count column should be added together between the merged columns, the result should be like the following:
A B count
0 a 1 11
1 a 2 1
2 c 3 7
Thans for any help.
Use groupby with aggregating sum:
print (df.groupby(['A','B'], as_index=False)['count'].sum())
A B count
0 a 1 11
1 a 2 1
2 c 3 7
print (df.groupby(['A','B'])['count'].sum().reset_index())
A B count
0 a 1 11
1 a 2 1
2 c 3 7