My problem is the displacement of column names in my data frame after setting column B as the index.
What I had:
A B C
11 6260063207400 1999-02-15 1
22 6260063207400 1999-02-18 2
29 6260063207400 1999-02-20 2
61 6260063207400 1999-02-27 2
What I have:
B
A
1999-02-15 1
1999-02-18 2
1999-02-20 2
1999-02-27 2
1999-02-28 2
What I would like:
A B
1999-02-15 1
1999-02-18 2
1999-02-20 2
1999-02-27 2
1999-02-28 2
Use DataFrame.reset_index to move the index back out into a regular column:
df = df.reset_index()
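A minimal sketch of the round trip (the column names and values here are illustrative, not the asker's real data):

```python
import pandas as pd

# a small frame with a default integer index
df = pd.DataFrame({'A': ['1999-02-15', '1999-02-18', '1999-02-20'],
                   'B': [1, 2, 2]})

df = df.set_index('A')    # 'A' moves into the index, leaving the columns
df = df.reset_index()     # 'A' moves back out as an ordinary column

print(df.columns.tolist())  # ['A', 'B']
```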
I have a pandas dataframe:
A B C D
1 1 0 32
1 4
2 0 43
1 12
3 0 58
1 34
2 1 0 37
1 5
[..]
where A, B and C are index columns. For every group of rows with unique values of A and B, I want to compute D where C=1 divided by D where C=0.
The result should look like this:
A B NEW
1 1 4/32
2 12/43
3 34/58
2 1 5/37
[..]
Can you help me?
Use Series.unstack first, which makes it possible to divide columns 1 and 0:
new = df['D'].unstack()
new = new[1].div(new[0]).to_frame('NEW')
print(new)
NEW
A B
1 1 0.125000
2 0.279070
3 0.586207
2 1 0.135135
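For reference, a self-contained reconstruction of the example; the frame is rebuilt here from the printed values, assuming A, B and C form the MultiIndex:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 1, 1, 1, 1, 1, 2, 2],
    'B': [1, 1, 2, 2, 3, 3, 1, 1],
    'C': [0, 1, 0, 1, 0, 1, 0, 1],
    'D': [32, 4, 43, 12, 58, 34, 37, 5],
}).set_index(['A', 'B', 'C'])

new = df['D'].unstack()                   # levels of C become columns 0 and 1
new = new[1].div(new[0]).to_frame('NEW')  # ratio D(C=1) / D(C=0) per (A, B)
print(new)
```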
Is there a way to pass column index positions to these arguments, rather than column names?
Every example I see uses column names in value_vars. I need to use the column index.
For instance, instead of:
df2 = pd.melt(df,value_vars=['asset1','asset2'])
Using something similar to:
df2 = pd.melt(df,value_vars=[0,1])
Select column names by positional indexing on df.columns:
df = pd.DataFrame({
    'asset1': list('acacac'),
    'asset2': [4]*6,
    'A': [7,8,9,4,2,3],
    'D': [1,3,5,7,1,0],
    'E': [5,3,6,9,2,4]
})
df2 = pd.melt(df,
              id_vars=df.columns[[0,1]],
              value_vars=df.columns[[2,3]],
              var_name='c_name',
              value_name='Value')
print(df2)
asset1 asset2 c_name Value
0 a 4 A 7
1 c 4 A 8
2 a 4 A 9
3 c 4 A 4
4 a 4 A 2
5 c 4 A 3
6 a 4 D 1
7 c 4 D 3
8 a 4 D 5
9 c 4 D 7
10 a 4 D 1
11 c 4 D 0
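The same positional trick works with slices of df.columns; for example, to melt every column after the first two (a sketch using the same frame as above):

```python
import pandas as pd

df = pd.DataFrame({
    'asset1': list('acacac'),
    'asset2': [4] * 6,
    'A': [7, 8, 9, 4, 2, 3],
    'D': [1, 3, 5, 7, 1, 0],
    'E': [5, 3, 6, 9, 2, 4],
})

# slice the column Index positionally: first two as ids, the rest as values
df2 = pd.melt(df,
              id_vars=df.columns[:2],
              value_vars=df.columns[2:],
              var_name='c_name',
              value_name='Value')
print(df2.shape)  # 3 value columns x 6 rows -> (18, 4)
```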
This seems easy, but I couldn't find a working solution for it:
I have a dataframe with 3 columns:
df = pd.DataFrame({'A': [0,0,2,2,2],
                   'B': [1,1,2,2,3],
                   'C': [1,1,2,3,4]})
A B C
0 0 1 1
1 0 1 1
2 2 2 2
3 2 2 3
4 2 3 4
I want to select rows based on the values of column A, group them by the values of column B, and replace the values of column C with the group sums. Something along the lines of this (obviously not working) code:
df[df['A'].isin(['2']), 'C'] = df[df['A'].isin(['2']), 'C'].groupby('B').transform('sum')
desired output for above example is:
A B C
0 0 1 1
1 0 1 1
2 2 2 5
3 2 3 4
I also know how to split the dataframe and do it. I am looking for a solution that does it without the need for split + concat/merge. Thank you.
Is it just:
s = df['A'].isin([2])
pd.concat((df[s].groupby(['A','B'])['C'].sum().reset_index(),
           df[~s]))
Output:
A B C
0 2 2 5
1 2 3 4
0 0 1 1
Update: without splitting, you can assign a helper column that keeps the non-selected rows in groups of their own:
(df.sort_values('A')
   .assign(D=(~df['A'].isin([2])).cumsum())
   .groupby(['D','A','B'])['C'].sum()
   .reset_index('D', drop=True)
   .reset_index()
)
Output:
A B C
0 0 1 1
1 0 1 1
2 2 2 5
3 2 3 4
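Another split-free sketch, under the assumption that the rows with A == 2 should collapse to one row per (A, B) pair: overwrite C with the group sums via transform, then drop the resulting duplicates:

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 2, 2, 2],
                   'B': [1, 1, 2, 2, 3],
                   'C': [1, 1, 2, 3, 4]})

s = df['A'].eq(2)
# sum C within B groups, but only on the selected rows
df.loc[s, 'C'] = df.loc[s].groupby('B')['C'].transform('sum')
# keep untouched rows plus one representative per summed (A, B) pair
out = pd.concat([df[~s], df[s].drop_duplicates(['A', 'B'])]).sort_index()
print(out)
```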
Say I have a dataframe df and group it by a few columns into dfg, taking the median of one of its columns. How could I then expand those median values into a new column of the original df, aligned with the corresponding conditions? This will create duplicates, but I will be using this column for a subsequent calculation, and having the medians in a column makes that possible.
Example data:
import numpy as np
import pandas as pd

data = {'idx': [1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2],
        'condition1': [1,1,2,2,3,3,4,4,1,1,2,2,3,3,4,4],
        'condition2': [1,2,1,2,1,2,1,2,1,2,1,2,1,2,1,2],
        'values': np.random.normal(0,1,16)}
df = pd.DataFrame(data)
dfg = df.groupby(['idx', 'condition2'], as_index=False)['values'].median()
Example of the desired result (note the duplicates corresponding to the correct conditions):
idx condition1 condition2 values medians
0 1 1 1 0.35031 0.656355
1 1 1 2 -0.291736 -0.024304
2 1 2 1 1.593545 0.656355
3 1 2 2 -1.275154 -0.024304
4 1 3 1 0.075259 0.656355
5 1 3 2 1.054481 -0.024304
6 1 4 1 0.9624 0.656355
7 1 4 2 0.243128 -0.024304
8 2 1 1 1.717391 1.155406
9 2 1 2 0.788847 1.006583
10 2 2 1 1.145891 1.155406
11 2 2 2 -0.492063 1.006583
12 2 3 1 -0.157029 1.155406
13 2 3 2 1.224319 1.006583
14 2 4 1 1.164921 1.155406
15 2 4 2 2.042239 1.006583
I believe you need GroupBy.transform with 'median' for the new column:
df['medians'] = df.groupby(['idx', 'condition2'])['values'].transform('median')
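A deterministic miniature of the same pattern (the values here are chosen for illustration, not the asker's random data):

```python
import pandas as pd

df = pd.DataFrame({'idx':        [1, 1, 1, 1],
                   'condition2': [1, 2, 1, 2],
                   'values':     [1.0, 10.0, 3.0, 20.0]})

# transform broadcasts each group's median back onto every member row
df['medians'] = df.groupby(['idx', 'condition2'])['values'].transform('median')
print(df['medians'].tolist())  # [2.0, 15.0, 2.0, 15.0]
```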
I have a Pandas dataframe that contains a grouping variable. I would like to merge each group with a different dataframe based on the contents of one of the columns. So, for example, I have a dataframe, dfA, which can be defined as:
dfA = pd.DataFrame({'a': [1,2,3,4,5,6],
                    'b': [0,1,0,0,1,1],
                    'c': ['a','b','c','d','e','f']})
a b c
0 1 0 a
1 2 1 b
2 3 0 c
3 4 0 d
4 5 1 e
5 6 1 f
Two other dataframes, dfB and dfC, contain a common column ('a') and an extra column ('d') and can be defined as:
dfB = pd.DataFrame({'a': [1,2,3],
                    'd': [11,12,13]})
a d
0 1 11
1 2 12
2 3 13
dfC = pd.DataFrame({'a': [4,5,6],
                    'd': [21,22,23]})
a d
0 4 21
1 5 22
2 6 23
I would like to be able to split dfA based on column 'b' and merge one of the groups with dfB and the other group with dfC to produce an output that looks like:
a b c d
0 1 0 a 11
1 2 1 b 12
2 3 0 c 13
3 4 0 d 21
4 5 1 e 22
5 6 1 f 23
In this simplified version, I could concatenate dfB and dfC and merge with dfA without splitting into groups as shown below:
dfX = pd.concat([dfB,dfC])
dfA = dfA.merge(dfX,on='a',how='left')
print(dfA)
a b c d
0 1 0 a 11
1 2 1 b 12
2 3 0 c 13
3 4 0 d 21
4 5 1 e 22
5 6 1 f 23
However, in the real-world situation, the smaller dataframes will be generated from multiple complex sources. Generating them and combining them into a single dataframe beforehand may not be feasible, because there may be overlapping data in the column used for merging (this is avoided if the dataframe is split on the grouping variable first). Is it possible to use the Pandas groupby() method to do this instead? I was thinking of something like the following, which doesn't work, perhaps because I'm not combining the groups into a new dataframe correctly:
grouped = dfA.groupby('b')
for name, group in grouped:
    if name == 0:
        group = group.merge(dfB, on='a', how='left')
    elif name == 1:
        group = group.merge(dfC, on='a', how='left')
Any thoughts would be appreciated.
This will fix your code:
l = []
grouped = dfA.groupby('b')
for name, group in grouped:
    if name == 0:
        group = group.merge(dfB, on='a', how='left')
    elif name == 1:
        group = group.merge(dfC, on='a', how='left')
    l.append(group)
pd.concat(l)
Output:
a b c d
0 1 0 a 11.0
1 3 0 c 13.0
2 4 0 d NaN
0 2 1 b NaN
1 5 1 e 22.0
2 6 1 f 23.0
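If there are more groups, a mapping from group name to the frame it should be merged with keeps the branching out of the loop. A sketch of the same idea (not required by the answer above):

```python
import pandas as pd

dfA = pd.DataFrame({'a': [1, 2, 3, 4, 5, 6],
                    'b': [0, 1, 0, 0, 1, 1],
                    'c': list('abcdef')})
dfB = pd.DataFrame({'a': [1, 2, 3], 'd': [11, 12, 13]})
dfC = pd.DataFrame({'a': [4, 5, 6], 'd': [21, 22, 23]})

# which lookup frame each value of 'b' is merged with
lookup = {0: dfB, 1: dfC}

out = pd.concat(
    group.merge(lookup[name], on='a', how='left')
    for name, group in dfA.groupby('b')
)
print(out)
```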