I want to shift row values in a pandas DataFrame when they equal a specific value in a column. For the df below, I'm trying to shift the values in column B into column A wherever the value in A == 'x'.
import pandas as pd
df = pd.DataFrame({
'A' : [1,'x','x','x',5],
'B' : ['x',2,3,4,'x'],
})
This is my attempt:
df = df.loc[df.A.shift(-1) == df.A.shift(1), 'x'] = df.A.shift(1)
Intended Output:
A B
0 1 x
1 2
2 3
3 4
4 5 x
You can use:
m = df.A.eq('x')
df[m] = df[m].shift(-1, axis=1)
print(df)
A B
0 1 x
1 2 NaN
2 3 NaN
3 4 NaN
4 5 x
You can use:
df[df.A == 'x'] = df.shift(-1, axis=1)
print(df)
A B
0 1 x
1 2 NaN
2 3 NaN
3 4 NaN
4 5 x
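For reference, the mask-and-shift approach above runs end-to-end as follows (a self-contained sketch of the same idea):

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 'x', 'x', 'x', 5],
    'B': ['x', 2, 3, 4, 'x'],
})

# Mask the rows where A holds the sentinel value 'x'
m = df.A.eq('x')

# Shift those rows one column to the left, so B's value lands in A
# and B becomes NaN
df[m] = df[m].shift(-1, axis=1)

print(df)
```

Column A ends up as 1..5 and B keeps 'x' only on the first and last rows.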
I have a DataFrame:
Columns
0 NaN
1 NaN
2 NaN
3 NaN
I want to fill all the NaN values here with natural numbers, starting from 1 and increasing down the column.
Expected Output
Columns
0 1
1 2
2 3
3 4
Any suggestions to do this?
df['Columns'] = df['Columns'].fillna(??????????)
Solution: if you need to replace only the missing values, use DataFrame.loc with Series.cumsum on the mask; True values are summed as 1:
m = df['Columns'].isna()
#nice solution from #Ch3steR, thank you
df.loc[m, 'Columns'] = m.cumsum()
#alternative
#df.loc[m, 'Columns'] = range(1, m.sum() + 1)
print (df)
Columns
0 1
1 2
2 3
3 4
Test with other data:
print (df)
Columns
0 NaN
1 NaN
2 100.0
3 NaN
m = df['Columns'].isna()
df.loc[m, 'Columns'] = m.cumsum()
print (df)
Columns
0 1.0
1 2.0
2 100.0
3 3.0
If you need to set the values from a range, so that the original column values are overwritten, use:
df['Columns'] = range(1, len(df) + 1)
print (df)
Columns
0 1
1 2
2 3
3 4
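Putting the masked cumsum together as a runnable sketch (same data as the second test above):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Columns': [np.nan, np.nan, 100.0, np.nan]})

# True where the value is missing; cumsum counts the Trues as 1, 2, 3, ...
m = df['Columns'].isna()

# Assign the running count only at the missing positions
df.loc[m, 'Columns'] = m.cumsum()

print(df)
```

Existing values (the 100.0 here) are left untouched; only the gaps are numbered in order.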
I have an initial column (A) with no missing data but with repeated values. How do I fill the missing data in the next column (B) so that a given value on the left always has the same value on the right? I would also like any other columns (C) to remain unchanged.
For example, this is what I have
A B C
1 1 20 4
2 2 NaN 8
3 3 NaN 2
4 2 30 9
5 3 40 1
6 1 NaN 3
And this is what I want
A B C
1 1 20 4
2 2 30* 8
3 3 40* 2
4 2 30 9
5 3 40 1
6 1 20* 3
Asterisk on filled values.
This needs to be scalable with a very large dataframe.
Additionally, if I had a value on the left column that has more than one value on the right side on separate observations, how would I fill with the mean?
You can use groupby on 'A' and use first to find the first corresponding value in 'B' (it will not select NaN).
import pandas as pd
df = pd.DataFrame({'A':[1,2,3,2,3,1],
'B':[20, None, None, 30, 40, None],
'C': [4,8,2,9,1,3]})
# find first 'B' value for each 'A'
lookup = df[['A', 'B']].groupby('A').first()['B']
# only use rows where 'B' is NaN
nan_mask = df['B'].isnull()
# replace NaN values in 'B' with lookup values
df.loc[nan_mask, 'B'] = df.loc[nan_mask].apply(lambda x: lookup[x['A']], axis=1)
print(df)
Which outputs:
A B C
0 1 20.0 4
1 2 30.0 8
2 3 40.0 2
3 2 30.0 9
4 3 40.0 1
5 1 20.0 3
If there are many NaN values in 'B' you might want to exclude them before you use groupby.
import pandas as pd
df = pd.DataFrame({'A':[1,2,3,2,3,1],
'B':[20, None, None, 30, 40, None],
'C': [4,8,2,9,1,3]})
# Only use rows where 'B' is NaN
nan_mask = df['B'].isnull()
# Find first 'B' value for each 'A'
lookup = df[~nan_mask][['A', 'B']].groupby('A').first()['B']
df.loc[nan_mask, 'B'] = df.loc[nan_mask].apply(lambda x: lookup[x['A']], axis=1)
print(df)
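As a side note, the same replacement can be written without `apply`: `groupby(...).first()` already skips NaN, so you can map column 'A' through the lookup and fill only the gaps. A minimal sketch of that variant:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 2, 3, 1],
                   'B': [20, None, None, 30, 40, None],
                   'C': [4, 8, 2, 9, 1, 3]})

# First non-NaN 'B' value per 'A' group
lookup = df.groupby('A')['B'].first()

# Map each row's 'A' to its lookup value and fill only the missing 'B' cells
df['B'] = df['B'].fillna(df['A'].map(lookup))

print(df)
```

This avoids the row-wise lambda, which matters on a very large dataframe.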
You could do sort_values first then forward fill column B based on column A. The way to implement this will be:
import pandas as pd
import numpy as np
x = {'A':[1,2,3,2,3,1],
'B':[20,np.nan,np.nan,30,40,np.nan],
'C':[4,8,2,9,1,3]}
df = pd.DataFrame(x)
#sort_values first, then forward fill based on column B
#this will get the right values for you while maintaining
#the original order of the dataframe
df['B'] = df.sort_values(by=['A','B'])['B'].ffill()
print (df)
Output will be:
Original data:
A B C
0 1 20.0 4
1 2 NaN 8
2 3 NaN 2
3 2 30.0 9
4 3 40.0 1
5 1 NaN 3
Updated data:
A B C
0 1 20.0 4
1 2 30.0 8
2 3 40.0 2
3 2 30.0 9
4 3 40.0 1
5 1 20.0 3
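The follow-up question, filling with the mean when a value on the left has more than one value on the right, isn't covered by the answers above. One hedged sketch (with made-up data where A == 2 maps to both 25 and 35) uses `groupby(...).transform('mean')`, which computes each group's mean of the observed 'B' values and aligns it back to the original rows:

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 2, 2, 1],
                   'B': [20, 25, None, 35, None],
                   'C': [4, 8, 2, 9, 1]})

# Per-group mean of the observed 'B' values, broadcast back to every row
group_mean = df.groupby('A')['B'].transform('mean')

# Fill only the missing cells; observed values stay untouched
df['B'] = df['B'].fillna(group_mean)

print(df)
```

Here the NaN in the A == 2 group becomes 30.0 (the mean of 25 and 35), and the NaN in the A == 1 group becomes 20.0.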
How can I pivot rows into columns in the data below so that all data is preserved?
Test data:
import pandas as pd
data_dic = {
"x": ['a','b','a','a','b'],
"y": [1,2,3,4,5]
}
df = pd.DataFrame(data_dic)
x y
0 a 1
1 b 2
2 a 3
3 b 4
4 b 5
Expected Output:
a b
0 1 2
1 3 4
2 NaN 5
Use GroupBy.cumcount with pivot:
df = df.assign(g = df.groupby('x').cumcount()).pivot(index='g', columns='x', values='y')
Or DataFrame.set_index with Series.unstack:
df = df.set_index([df.groupby('x').cumcount(),'x'])['y'].unstack()
print (df)
x a b
g
0 1.0 2.0
1 3.0 4.0
2 NaN 5.0
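A runnable sketch of the set_index/unstack variant, using data consistent with the expected output above:

```python
import pandas as pd

df = pd.DataFrame({'x': ['a', 'b', 'a', 'b', 'b'],
                   'y': [1, 2, 3, 4, 5]})

# cumcount numbers each row within its 'x' group: a -> 0, 1 and b -> 0, 1, 2
g = df.groupby('x').cumcount()

# Use (group position, x) as a MultiIndex, then unstack 'x' into columns
out = df.set_index([g, 'x'])['y'].unstack()

print(out)
```

The shorter 'a' group is padded with NaN, which is why the result is float.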
I want to add l to column 'A', but instead it creates a new column and puts l there. Why is this happening, and how can I do what I want?
import pandas as pd
l=[1,2,3]
df = pd.DataFrame(columns =['A'])
df = df.append(l, ignore_index=True)
df = df.append(l, ignore_index=True)
print(df)
A 0
0 NaN 1.0
1 NaN 2.0
2 NaN 3.0
3 NaN 1.0
4 NaN 2.0
5 NaN 3.0
Edited
Is this what you want to do:
In[6]:df=df.A.append(pd.Series(l)).reset_index().drop(columns='index').rename(columns={0:'A'})
In[7]:df
Out[7]:
A
0 1
1 2
2 3
Then you can add any list of different length.
Suppose:
a=[9,8,7,6,5]
In[11]:df=df.A.append(pd.Series(a)).reset_index().drop(columns='index').rename(columns={0:'A'})
In[12]:df
Out[12]:
A
0 1
1 2
2 3
3 9
4 8
5 7
6 6
7 5
Previously
Are you looking for this:
df=pd.DataFrame(l,columns=['A'])
df
Out[5]:
A
0 1
1 2
2 3
You can just pass a dictionary to the DataFrame constructor, that is, if I understand your question correctly.
l = [1,2,3]
df = pd.DataFrame({'A': l})
df
A
0 1
1 2
2 3
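Note that `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so with a recent pandas the equivalent of the answers above is `pd.concat`. A sketch using the same lists:

```python
import pandas as pd

l = [1, 2, 3]
df = pd.DataFrame({'A': l})

# Append another list of any length to column 'A'
a = [9, 8, 7, 6, 5]
df = pd.concat([df, pd.DataFrame({'A': a})], ignore_index=True)

print(df)
```

`ignore_index=True` renumbers the rows 0..7 instead of repeating the two original indexes.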
Given a 3-column DataFrame, df:
a b c
0 NaN a True
1 1 b True
2 2 c False
3 3 NaN False
4 4 e True
[5 rows x 3 columns]
I would like to place a NaN in column c for each row where a NaN exists in any other column. My current approach is as follows:
for col in df:
    df['c'][pd.np.isnan(df[col])] = pd.np.nan
I strongly suspect that there is a way to do this via logical indexing instead of iterating through columns as I am currently doing.
How could this be done?
Thank you!
If you don't care about the bool/float issue, I propose:
>>> df.loc[df.isnull().any(axis=1), "c"] = np.nan
>>> df
a b c
0 NaN a NaN
1 1 b 1
2 2 c 0
3 3 NaN NaN
4 4 e 1
[5 rows x 3 columns]
If you really do, then starting again from your frame df you could:
>>> df["c"] = df["c"].astype(object)
>>> df.loc[df.isnull().any(axis=1), "c"] = np.nan
>>> df
a b c
0 NaN a NaN
1 1 b True
2 2 c False
3 3 NaN NaN
4 4 e True
[5 rows x 3 columns]
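The second variant above runs end-to-end as follows (a self-contained sketch with the question's data):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 1, 2, 3, 4],
                   'b': ['a', 'b', 'c', np.nan, 'e'],
                   'c': [True, True, False, False, True]})

# Cast c to object first so True/False survive next to NaN
df['c'] = df['c'].astype(object)

# One row-wise mask: True where any cell in the row is null
df.loc[df.isnull().any(axis=1), 'c'] = np.nan

print(df)
```

Rows 0 and 3 (the ones containing a NaN) get NaN in c, while the remaining booleans stay True/False instead of collapsing to 1/0.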
df.loc[df.loc[:, :'c'].apply(lambda r: r.isnull().any(), axis=1), 'c'] = np.nan
Note that you may need to change the type of column c to float or you'll get an error about being unable to assign nan to integer column.
filter and select the rows where you have NaN for either 'a' or 'b' and assign 'c' to NaN:
In [18]:
df.loc[pd.isnull(df.a) | pd.isnull(df.b), 'c'] = np.nan
In [19]:
df
Out[19]:
a b c
0 NaN a NaN
1 1 b 1
2 2 c 0
3 3 NaN NaN
4 4 e 1
[5 rows x 3 columns]