I expect to describe well want I need. I have a data frame with the same columns name and another column that works as an index. The data frame looks as follows:
df = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],'X':[1,2,3,4,5,2,3,4,1,3,4,5],'Y':[1,2,3,4,5,2,3,4,5,4,3,2]})
df
Out[21]:
ID X Y
0 1 1 1
1 1 2 2
2 1 3 3
3 1 4 4
4 1 5 5
5 2 2 2
6 2 3 3
7 2 4 4
8 3 1 5
9 3 3 4
10 3 4 3
11 3 5 2
My intention is to copy X as an index or one column (it doesn't matter) and append Y columns from each 'ID' in the following way:
You can try
out = pd.concat([group.rename(columns={'Y': f'Y{name}'}) for name, group in df.groupby('ID')])
out.columns = out.columns.str.replace(r'\d+$', '', regex=True)
print(out)
ID X Y Y Y
0 1 1 1.0 NaN NaN
1 1 2 2.0 NaN NaN
2 1 3 3.0 NaN NaN
3 1 4 4.0 NaN NaN
4 1 5 5.0 NaN NaN
5 2 2 NaN 2.0 NaN
6 2 3 NaN 3.0 NaN
7 2 4 NaN 4.0 NaN
8 3 1 NaN NaN 5.0
9 3 3 NaN NaN 4.0
10 3 4 NaN NaN 3.0
11 3 5 NaN NaN 2.0
Here's another way to do it:
df_org = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],'X':[1,2,3,4,5,2,3,4,1,3,4,5]})
df = df_org.copy()
for i in set(df_org['ID']):
df1 = df_org[df_org['ID']==i]
col = 'Y'+str(i)
df1.columns = ['ID', col]
df = pd.concat([ df, df1[[col]] ], axis=1)
df.columns = df.columns.str.replace(r'\d+$', '', regex=True)
print(df)
Output:
ID X Y Y Y
0 1 1 1.0 NaN NaN
1 1 2 2.0 NaN NaN
2 1 3 3.0 NaN NaN
3 1 4 4.0 NaN NaN
4 1 5 5.0 NaN NaN
5 2 2 NaN 2.0 NaN
6 2 3 NaN 3.0 NaN
7 2 4 NaN 4.0 NaN
8 3 1 NaN NaN 1.0
9 3 3 NaN NaN 3.0
10 3 4 NaN NaN 4.0
11 3 5 NaN NaN 5.0
Another solution could be as follow.
Get unique values for column ID (stored in array s).
Use np.transpose to repeat column ID n times (n == len(s)) and evaluate the array's matches with s.
Use np.where to replace True with values from df.Y and False with NaN.
Finally, drop the orignal df.Y and rename the new columns as required.
import pandas as pd
import numpy as np
df = pd.DataFrame({'ID':[1,1,1,1,1,2,2,2,3,3,3,3],
'X':[1,2,3,4,5,2,3,4,1,3,4,5],
'Y':[1,2,3,4,5,2,3,4,5,4,3,2]})
s = df.ID.unique()
df[s] = np.where((np.transpose([df.ID]*len(s))==s),
np.transpose([df.Y]*len(s)),
np.nan)
df.drop('Y', axis=1, inplace=True)
df.rename(columns={k:'Y' for k in s}, inplace=True)
print(df)
ID X Y Y Y
0 1 1 1.0 NaN NaN
1 1 2 2.0 NaN NaN
2 1 3 3.0 NaN NaN
3 1 4 4.0 NaN NaN
4 1 5 5.0 NaN NaN
5 2 2 NaN 2.0 NaN
6 2 3 NaN 3.0 NaN
7 2 4 NaN 4.0 NaN
8 3 1 NaN NaN 5.0
9 3 3 NaN NaN 4.0
10 3 4 NaN NaN 3.0
11 3 5 NaN NaN 2.0
If performance is an issue, this method should be faster than this answer, especially when the number of unique values for ID increases.
Consider a simple dataframe:
import numpy as np
import pandas as pd
x = pd.DataFrame(np.arange(10).reshape(5,2))
print(x)
0 1
0 0 1
1 2 3
2 4 5
3 6 7
4 8 9
I would like to create a hierarchically indexed dataframe of the form:
0 1
a b a b
0 0 NaN 1 NaN
1 2 NaN 3 NaN
2 4 NaN 5 NaN
3 6 NaN 7 NaN
4 8 NaN 9 NaN
where the 'a' columns correspond to the original dataframe columns and the 'b' columns are blank (or nan).
I can certainly create a hierarchically indexed dataframe with all NaNs and loop over the columns of the original dataframe, writing them into
the new dataframe. Is there something more compact than that?
you can do with MultiIndex.from_product
extra_level = ['a', 'b']
new_cols = pd.MultiIndex.from_product([x.columns, extra_level])
x.columns = new_cols[::len(x.columns)] # select all the first element of extra_level
x = x.reindex(columns=new_cols)
print(x)
0 1
a b a b
0 0 NaN 1 NaN
1 2 NaN 3 NaN
2 4 NaN 5 NaN
3 6 NaN 7 NaN
4 8 NaN 9 NaN
Very much like #Ben.T I am using MultiIndex.from_product:
x.assign(l='a')
.set_index('l', append=True)
.unstack()
.reindex(pd.MultiIndex.from_product([x.columns.tolist(), ['a','b']]), axis=1)
Output:
0 1
a b a b
0 0 NaN 1 NaN
1 2 NaN 3 NaN
2 4 NaN 5 NaN
3 6 NaN 7 NaN
4 8 NaN 9 NaN
I have the following the data frame and I want to add the index value plus the column header and divide by 2.
The initial grid would like this:
grid = pd.DataFrame(
columns=[1,2,3,4,5],
index = [1,2,3,4,5]
)
grid
This results in the following:
1 2 3 4 5
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN
What I'm looking to get is: For example, index of 1 + column header of 3 results in 4, which divided by 2 is 2
1 2 3 4 5
1 1 1.5 2 2.5 3
2 1.5 2 2.5 3 3.5
3 2 2.5 3 3.5 4
4 2.5 3 3.5 4 4.5
5 3 3.5 4 4.5 5
You can use broadcasting here:
grid[:] = (grid.index.values[:,None] + grid.columns.values)/2
I have a dataframe in which I want to apply a rolling mean over a column of numbers that come in 3-pairs where I only want 4 unique values to go into the mean.
Lets say my dataframe looks like:
Group Column to roll
1 9
2 5
2 5
2 4
2 4
2 4
2 3
2 3
2 3
2 6
2 6
2 6
2 8
Since I want 4 unique values to go into the mean but all values to be of equal weight and within the same group, my expected output (assuming I need 4 unique values) would be:
Group Output
1 nan
2 nan
2 nan
2 nan
2 nan
2 nan
2 nan
2 nan
2 nan
2 (6+3+4+5)/4
2 (6+3+4+5)/4
2 (6+3+4+5)/4
2 (8+6+3+4)/4
Any ideas how to do this?
You could try something like this:
df['Column to roll'].drop_duplicates().rolling(4).mean().reindex(df.index).ffill()
Output:
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
5 NaN
6 NaN
7 NaN
8 4.50
9 4.50
10 4.50
11 5.25
Name: Column to roll, dtype: float64
Edit question changed
df_out = df.groupby('Group')['Column to roll']\
.apply(lambda x: x.drop_duplicates().rolling(4).mean()).rename('Output')
df.set_index('Group',append=True).swaplevel(0,1)\
.join(df_out, how='left').ffill().reset_index(level=1, drop=True)
Output:
Column to roll Output
Group
1 9 NaN
2 5 NaN
2 5 NaN
2 4 NaN
2 4 NaN
2 4 NaN
2 3 NaN
2 3 NaN
2 3 NaN
2 6 4.50
2 6 4.50
2 6 4.50
2 8 5.25
Here is a dataframe
a b c d
nan nan 3 5
nan 1 2 3
1 nan 4 5
2 3 7 9
nan nan 2 3
I want to replace the observations in both columns 'a' and 'b' where both of them are NaNs with 0s. Rows 2 and 5 in columns 'a' and 'b' have both both NaN, so I want to replace only those rows with 0's in those matching NaN columns.
so my output must be
a b c d
0 0 3 5
nan 1 2 3
1 nan 4 5
2 3 7 9
0 0 2 3
There might be a easier builtin function in Pandas, but this one should work.
df[['a', 'b']] = df.ix[ (np.isnan(df.a)) & (np.isnan(df.b)), ['a', 'b'] ].fillna(0)
Actually the solution from #Psidom much easier to read.
You can create a boolean series based on the conditions on columns a/b, and then use loc to modify corresponding columns and rows:
df.loc[df[['a','b']].isnull().all(1), ['a','b']] = 0
df
# a b c d
#0 0.0 0.0 3 5
#1 NaN 1.0 2 3
#2 1.0 NaN 4 5
#3 2.0 3.0 7 9
#4 0.0 0.0 2 3
Or:
df.loc[df.a.isnull() & df.b.isnull(), ['a','b']] = 0