Pandas insert empty row at 0th position - python

Suppose I have the following data frame:
A B
1 2 3 4 5
4 5 6 7 8
I want to check whether df(0,0) is NaN and, if so, insert pd.Series(np.nan) at position 0. In the case above the result would be
A B
1 2 3 4 5
4 5 6 7 8
I am able to check the (0,0) element, but how do I insert an empty row at the first position?

Use pd.concat to prepend a DataFrame holding one empty row (DataFrame.append, which older answers use, was removed in pandas 2.0):
df1 = pd.DataFrame([[np.nan] * len(df.columns)], columns=df.columns)
df = pd.concat([df1, df], ignore_index=True)
print (df)
A B C D E
0 NaN NaN NaN NaN NaN
1 1.0 2.0 3.0 4.0 5.0
2 4.0 5.0 6.0 7.0 8.0

Perhaps you can first append a row of zeros, shift all rows down by one, and then overwrite the first row with 0:
df
A B C D E
0 1 2 3 4 5
1 4 5 6 7 8
df.loc[len(df)] = 0
df
A B C D E
0 1 2 3 4 5
1 4 5 6 7 8
2 0 0 0 0 0
df = df.shift()
df.loc[0] = 0
df
A B C D E
0 0.0 0.0 0.0 0.0 0.0
1 1.0 2.0 3.0 4.0 5.0
2 4.0 5.0 6.0 7.0 8.0
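Putting the check from the question together with the insertion, a minimal sketch might look like the following. It uses reindex rather than append or shift (a swapped-in technique, not taken from the answers above) and assumes the default 0..n-1 integer index. The helper name insert_empty_first_row is hypothetical, and the condition (insert only when the top-left cell is not already NaN) is an assumption about what the question intends; flip it to pd.isna if the opposite is meant.

```python
import pandas as pd


def insert_empty_first_row(df):
    """Hypothetical helper: return a copy of df with an all-NaN row at position 0.

    Assumes df has the default integer index 0..n-1.
    """
    out = df.copy()
    out.index = out.index + 1                # move every row down by one label
    return out.reindex(range(len(out) + 1))  # label 0 is new, so it is filled with NaN


df = pd.DataFrame([[1, 2, 3, 4, 5], [4, 5, 6, 7, 8]], columns=list("ABCDE"))

# Assumed condition: add the empty row only if the top-left cell is not NaN yet.
if pd.notna(df.iloc[0, 0]):
    df = insert_empty_first_row(df)

print(df)
```

Because the new row holds NaN, the integer columns are upcast to float, as in the outputs above.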

Related

How to assign a value from the last row of a preceding group to the next group?

The goal is to put the digit from the last row of the previous letter group into the new column "last_digit_prev_group". I entered the expected, correct values manually in the column "col_ok". I tried shift(), but the result was far from what I expected. Is there some other way?
Forgive me the inconsistency of my post, I'm not an IT specialist and I don't know English. Thanks in advance for your support.
df = pd.read_csv('C:/Users/.../a.csv',names=['group_letter', 'digit', 'col_ok'] ,
index_col=0,)
df['last_digit_prev_group'] = df.groupby('group_letter')['digit'].shift(1)
print(df)
group_letter digit col_ok last_digit_prev_group
A 1 n NaN
A 3 n 1.0
A 2 n 3.0
A 5 n 2.0
A 1 n 5.0
B 1 1 NaN
B 2 1 1.0
B 1 1 2.0
B 1 1 1.0
B 3 1 1.0
C 5 3 NaN
C 6 3 5.0
C 1 3 6.0
C 2 3 1.0
C 3 3 2.0
D 4 3 NaN
D 3 3 4.0
D 2 3 3.0
D 5 3 2.0
D 7 3 5.0
Use Series.mask with DataFrame.duplicated to keep only the last value of digit in each group, then Series.shift and finally ffill:
df['last_digit_prev_group'] = (df['digit'].mask(df.duplicated('group_letter', keep='last'))
.shift()
.ffill())
print (df)
group_letter digit col_ok last_digit_prev_group
0 A 1 n NaN
1 A 3 n NaN
2 A 2 n NaN
3 A 5 n NaN
4 A 1 n NaN
5 B 1 1 1.0
6 B 2 1 1.0
7 B 1 1 1.0
8 B 1 1 1.0
9 B 3 1 1.0
10 C 5 3 3.0
11 C 6 3 3.0
12 C 1 3 3.0
13 C 2 3 3.0
14 C 3 3 3.0
15 D 4 3 3.0
16 D 3 3 3.0
17 D 2 3 3.0
18 D 5 3 3.0
19 D 7 3 3.0
If the last value of some group may be NaN, forward fill per group instead:
df['last_digit_prev_group'] = (df['digit'].mask(df.duplicated('group_letter', keep='last'))
.shift()
.groupby(df['group_letter']).ffill())
print (df)
group_letter digit col_ok last_digit_prev_group
0 A 1.0 n NaN
1 A 3.0 n NaN
2 A 2.0 n NaN
3 A 5.0 n NaN
4 A 1.0 n NaN
5 B 1.0 1 1.0
6 B 2.0 1 1.0
7 B 1.0 1 1.0
8 B 1.0 1 1.0
9 B 3.0 1 1.0
10 C 5.0 3 3.0
11 C 6.0 3 3.0
12 C 1.0 3 3.0
13 C 2.0 3 3.0
14 C NaN 3 3.0
15 D 4.0 3 NaN
16 D 3.0 3 NaN
17 D 2.0 3 NaN
18 D 5.0 3 NaN
19 D 7.0 3 NaN
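As an alternative way to look at the same problem, here is a small sketch that computes one "last digit" per group and maps it onto the following group. This is a different technique from the mask/shift/ffill answer above, the sample data is made up for illustration, and it assumes that each letter forms one contiguous block of rows.

```python
import pandas as pd

# Made-up sample data; each group_letter is one contiguous block of rows.
df = pd.DataFrame({
    "group_letter": list("AAABBCC"),
    "digit": [1, 3, 2, 5, 4, 7, 6],
})

# Last digit of each group, keeping groups in order of appearance.
# Note: GroupBy.last() skips NaN, so a missing final digit would fall back
# to the last non-missing digit of that group.
last_per_group = df.groupby("group_letter", sort=False)["digit"].last()

# Shift by one group so that each group sees its predecessor's last digit.
prev_last = last_per_group.shift()

# Broadcast the per-group value back onto the rows.
df["last_digit_prev_group"] = df["group_letter"].map(prev_last)
print(df)
```

The first group has no predecessor, so its rows stay NaN.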

How to insert list of values into null values of a column in python?

I am new to pandas and am facing an issue with null values. I have a list of 3 values that has to be inserted into the missing values of a column. How do I do that?
In [57]: df
Out[57]:
a b c d
0 0 1 2 3
1 0 NaN 0 1
2 0 NaN 3 4
3 0 1 2 5
4 0 NaN 2 6
In [58]: list = [11,22,44]
The output I want
Out[57]:
a b c d
0 0 1 2 3
1 0 11 0 1
2 0 22 3 4
3 0 1 2 5
4 0 44 2 6
If your list has the same length as the number of NaN values:
l=[11,22,44]
df.loc[df['b'].isna(),'b'] = l
print(df)
a b c d
0 0 1.0 2 3
1 0 11.0 0 1
2 0 22.0 3 4
3 0 1.0 2 5
4 0 44.0 2 6
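The assignment above only works when the list length matches the number of missing values. A minimal self-contained sketch with an explicit length check could look like this; the variable names values and missing are illustrative, not from the original posts.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 0, 0, 0],
    "b": [1, np.nan, np.nan, 1, np.nan],
    "c": [2, 0, 3, 2, 2],
    "d": [3, 1, 4, 5, 6],
})
values = [11, 22, 44]  # named "values" to avoid shadowing the built-in list

missing = df["b"].isna()  # boolean mask of the rows to fill
if missing.sum() != len(values):
    raise ValueError("number of replacement values must match number of NaNs")

# Values are assigned to the NaN positions in row order.
df.loc[missing, "b"] = values
print(df)
```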
Try stack, assign the values, then unstack back:
s = df.stack(dropna=False)
s.loc[s.isna()] = l # the list is named l here, because calling it list shadows the Python built-in
df = s.unstack()
df
Out[178]:
a b c d
0 0.0 1.0 2.0 3.0
1 0.0 11.0 0.0 1.0
2 0.0 22.0 3.0 4.0
3 0.0 1.0 2.0 5.0
4 0.0 44.0 2.0 6.0

fill NaN values with mean based on another column specific value

I want to fill the NaN values in column c of my dataframe with the mean, but only for rows whose Category is B, ignoring the others.
print (df)
Category b c
0 A 1 5.0
1 C 1 NaN
2 A 1 4.0
3 B 2 NaN
4 A 2 1.0
5 B 2 NaN
6 C 1 3.0
7 C 1 2.0
8 B 1 NaN
What I'm doing at the moment is:
df.c = df.c.fillna(df.c.mean())
But this fills all the NaN values, while I only want to fill rows 3, 5 and 8, whose Category value is B.
Combine fillna with slicing assignment
df.loc[df.Category.eq('B'), 'c'] = (df.loc[df.Category.eq('B'), 'c'].
fillna(df.c.mean()))
Out[736]:
Category b c
0 A 1 5.0
1 C 1 NaN
2 A 1 4.0
3 B 2 3.0
4 A 2 1.0
5 B 2 3.0
6 C 1 3.0
7 C 1 2.0
8 B 1 3.0
Or use a direct assignment with two masks.
Series.eq is the element-wise equality method.
df.loc[df.Category.eq('B') & df.c.isna(), 'c'] = df.c.mean()
Out[745]:
Category b c
0 A 1 5.0
1 C 1 NaN
2 A 1 4.0
3 B 2 3.0
4 A 2 1.0
5 B 2 3.0
6 C 1 3.0
7 C 1 2.0
8 B 1 3.0
This would also answer your question, working row by row (pd.isna is needed because a scalar cell has no fillna method):
df.c = df.apply(
lambda row: df.c.mean() if row['Category'] == 'B' and pd.isna(row['c']) else row['c'], axis=1)
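For reference, here is a self-contained version of the two-mask approach using the data from the question. The variable names overall_mean and target are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Category": list("ACABABCCB"),
    "b": [1, 1, 1, 2, 2, 2, 1, 1, 1],
    "c": [5.0, np.nan, 4.0, np.nan, 1.0, np.nan, 3.0, 2.0, np.nan],
})

# Mean over all non-missing values of c (NaN is skipped by default).
overall_mean = df["c"].mean()

# Rows that are both in category B and missing a value in c.
target = df["Category"].eq("B") & df["c"].isna()

df.loc[target, "c"] = overall_mean
print(df)
```

The NaN in row 1 (category C) is left untouched, which is the behaviour the question asks for.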

How to do forward filling for each group in pandas

I have a dataframe similar to below
id A B C D E
1 2 3 4 5 5
1 NaN 4 NaN 6 7
2 3 4 5 6 6
2 NaN NaN 5 4 1
I want to do a null value imputation for columns A, B, C in a forward filling but for each group. That means, I want the forward filling be applied on each id. How can I do that?
Use GroupBy.ffill to forward fill per group. If the first values in a group are NaN, there is nothing earlier in the group to fill from, so they stay NaN; you can then use fillna and finally cast to integers:
print (df)
id A B C D E
0 1 2.0 3.0 4.0 5 NaN
1 1 NaN 4.0 NaN 6 NaN
2 2 3.0 4.0 5.0 6 6.0
3 2 NaN NaN 5.0 4 1.0
cols = ['A','B','C']
df[cols] = df.groupby('id')[cols].ffill().fillna(0).astype(int)
print (df)
id A B C D E
0 1 2 3 4 5 NaN
1 1 2 4 4 6 NaN
2 2 3 4 5 6 6.0
3 2 3 4 5 4 1.0
Detail (recent pandas versions do not return the grouping column id from ffill, so only cols is assigned):
print (df.groupby('id')[cols].ffill().fillna(0).astype(int))
A B C
0 2 3 4
1 2 4 4
2 3 4 5
3 3 4 5
Or:
cols = ['A','B','C']
df.update(df.groupby('id')[cols].ffill().fillna(0))
print (df)
id A B C D E
0 1 2.0 3.0 4.0 5 NaN
1 1 2.0 4.0 4.0 6 NaN
2 2 3.0 4.0 5.0 6 6.0
3 2 3.0 4.0 5.0 4 1.0
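To make the per-group behaviour visible, here is a small self-contained sketch. The data is a made-up variation of the question's frame in which group 2 starts with a NaN in column A, so you can see that nothing leaks over from group 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "A": [2, np.nan, np.nan, 7],
    "B": [3, 4, 4, np.nan],
    "C": [4, np.nan, 5, 5],
    "D": [5, 6, 6, 4],
})
cols = ["A", "B", "C"]

# Forward fill inside each id group only; D is left alone.
df[cols] = df.groupby("id")[cols].ffill()
print(df)
```

Row 2 of column A stays NaN because it is the first row of its group; a plain df[cols].ffill() would have filled it with 2.0 from group 1.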

Add new dataframe to existing database but only add if column name matches

I have two dataframes that I am trying to combine but I'm not getting the result I want using pandas.concat.
I have an existing set of data that I want to add new data to, but only where the column names match.
Let's say df1 is:
A B C D
1 1 2 2
3 3 4 4
5 5 6 6
and df2 is:
A E D F
7 7 8 8
9 9 0 0
the result I would like to get is:
A B C D
1 1 2 2
3 3 4 4
5 5 6 6
7 - - 8
9 - - 0
The blank data doesn't have to be - it can be anything.
When I use:
results = pandas.concat([df1, df2], axis=0, join='outer')
it gives me a new dataframe with all of the columns A through F, instead of what I want. Any ideas for how I can accomplish this? Thanks!
You want to use the pd.DataFrame.align method and specify that you want to align with the left argument's indices and that you only care about columns.
d1, d2 = df1.align(df2, join='left', axis=1)
Then you can use pd.concat or, on pandas older than 2.0, pd.DataFrame.append
pd.concat([d1, d2], ignore_index=True)
A B C D
0 1 1.0 2.0 2
1 3 3.0 4.0 4
2 5 5.0 6.0 6
3 7 NaN NaN 8
4 9 NaN NaN 0
Or (pandas older than 2.0 only, since DataFrame.append was removed in 2.0)
d1.append(d2, ignore_index=True)
A B C D
0 1 1.0 2.0 2
1 3 3.0 4.0 4
2 5 5.0 6.0 6
3 7 NaN NaN 8
4 9 NaN NaN 0
My preferred way would be to skip the assignment to intermediate names
pd.concat(df1.align(df2, join='left', axis=1), ignore_index=True)
A B C D
0 1 1.0 2.0 2
1 3 3.0 4.0 4
2 5 5.0 6.0 6
3 7 NaN NaN 8
4 9 NaN NaN 0
You can find the intersection of the columns of df1 and df2, then concat:
pd.concat(
[df1, df2[df1.columns.intersection(df2.columns)]]
)
Or, on pandas older than 2.0 (DataFrame.append was removed in 2.0):
df1.append(df2[df1.columns.intersection(df2.columns)])
A B C D
0 1 1.0 2.0 2
1 3 3.0 4.0 4
2 5 5.0 6.0 6
0 7 NaN NaN 8
1 9 NaN NaN 0
You can also use reindex and concat:
pd.concat([df1,df2.reindex(columns=df1.columns)])
Out[81]:
A B C D
0 1 1.0 2.0 2
1 3 3.0 4.0 4
2 5 5.0 6.0 6
0 7 NaN NaN 8
1 9 NaN NaN 0
Transpose first before merging.
df1.T.merge(df2.T, how="left", left_index=True, right_index=True).T
A B C D
0_x 1.0 1.0 2.0 2.0
1_x 3.0 3.0 4.0 4.0
2 5.0 5.0 6.0 6.0
0_y 7.0 NaN NaN 8.0
1_y 9.0 NaN NaN 0.0
The transposed frames look like this:
df1.T
0 1 2
A 1 3 5
B 1 3 5
C 2 4 6
D 2 4 6
df2.T
0 1
A 7 9
E 7 9
D 8 0
F 8 0
The result can then be obtained with a merge using how="left", with the indices as the join key via left_index=True and right_index=True.
df1.T.merge(df2.T, how="left", left_index=True, right_index=True)
0_x 1_x 2 0_y 1_y
A 1 3 5 7.0 9.0
B 1 3 5 NaN NaN
C 2 4 6 NaN NaN
D 2 4 6 8.0 0.0
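For a quick end-to-end check of the column-matching idea, here is a self-contained sketch based on the column-intersection answer, with ignore_index=True added so the result gets a fresh 0..n-1 index. The names shared and result are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 3, 5], "B": [1, 3, 5], "C": [2, 4, 6], "D": [2, 4, 6]})
df2 = pd.DataFrame({"A": [7, 9], "E": [7, 9], "D": [8, 0], "F": [8, 0]})

# Columns that exist in both frames; E and F are dropped from df2.
shared = df1.columns.intersection(df2.columns)

# Columns of df1 that df2 lacks (B and C) are filled with NaN for the new rows.
result = pd.concat([df1, df2[shared]], ignore_index=True)
print(result)
```

If a placeholder other than NaN is wanted for the blank cells, result.fillna("-") would produce the "-" shown in the question, at the cost of turning those columns into object dtype.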
