Fill empty values in a dataframe based on columns in another dataframe - python

I have a dataframe df1 like this.
I want to fill the NaN values and the zeros in the score column with the corresponding values from another dataframe df2, matched by name.
How could I do this?

Option 1
Short version
df1.score = df1.score.mask(df1.score.eq(0)).fillna(
    df1.name.map(df2.set_index('name').score)
)
df1
name score
0 A 10.0
1 B 32.0
2 A 10.0
3 C 30.0
4 B 20.0
5 A 45.0
6 A 10.0
7 A 10.0
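A self-contained sketch of Option 1, using the df1 and df2 from the question below:

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'name': ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'A'],
                    'score': [0, 32, 0, np.nan, np.nan, 45, np.nan, np.nan]})
df2 = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [10, 20, 30]})

# mask() turns the zeros into NaN, then fillna() looks each name up in df2
df1.score = df1.score.mask(df1.score.eq(0)).fillna(
    df1.name.map(df2.set_index('name').score)
)
print(df1.score.tolist())
```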
Option 2
Interesting version using searchsorted. df2 must be sorted by 'name'.
import numpy as np

i = np.where(np.isnan(df1.score.mask(df1.score.values == 0).values))[0]
j = df2.name.values.searchsorted(df1.name.values[i])
df1.score.values[i] = df2.score.values[j]
df1
name score
0 A 10.0
1 B 32.0
2 A 10.0
3 C 30.0
4 B 20.0
5 A 45.0
6 A 10.0
7 A 10.0

If df1 and df2 are your dataframes, you can create a mapping and then call pd.Series.replace:
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'name' : ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'A'],
                    'score': [0, 32, 0, np.nan, np.nan, 45, np.nan, np.nan]})
df2 = pd.DataFrame({'name' : ['A', 'B', 'C'], 'score' : [10, 20, 30]})
print(df1)
name score
0 A 0.0
1 B 32.0
2 A 0.0
3 C NaN
4 B NaN
5 A 45.0
6 A NaN
7 A NaN
print(df2)
name score
0 A 10
1 B 20
2 C 30
mapping = dict(df2.values)
mask = df1.score.isnull() | (df1.score == 0)
df1.loc[mask, 'score'] = df1.loc[mask, 'name'].replace(mapping)
print(df1)
name score
0 A 10.0
1 B 32.0
2 A 10.0
3 C 30.0
4 B 20.0
5 A 45.0
6 A 10.0
7 A 10.0
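A close variant swaps replace for map. The difference is worth knowing: map returns NaN for any name missing from the mapping, while replace would leave the name string itself in the score column. A sketch under the same df1/df2:

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'name': ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'A'],
                    'score': [0, 32, 0, np.nan, np.nan, 45, np.nan, np.nan]})
df2 = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [10, 20, 30]})

# build the name -> score lookup and apply it only to the rows that need it
mapping = dict(zip(df2['name'], df2['score']))
mask = df1['score'].isnull() | df1['score'].eq(0)
df1.loc[mask, 'score'] = df1.loc[mask, 'name'].map(mapping)
```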

Or use merge with fillna:
import pandas as pd
import numpy as np
df1.loc[df1.score==0, 'score'] = np.nan
df1.merge(df2, on='name', how='left').fillna(method='bfill', axis=1)[['name','score_x']]\
   .rename(columns={'score_x':'score'})
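Note that fillna(method=...) is deprecated in recent pandas. A merge-based variant that avoids it, assuming df1 has a default RangeIndex (so the merge result aligns row for row):

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'name': ['A', 'B', 'A', 'C', 'B', 'A', 'A', 'A'],
                    'score': [0, 32, 0, np.nan, np.nan, 45, np.nan, np.nan]})
df2 = pd.DataFrame({'name': ['A', 'B', 'C'], 'score': [10, 20, 30]})

# a left merge keeps df1's row order; suffixes avoid the score/score name clash
lookup = df1.merge(df2, on='name', how='left', suffixes=('', '_df2'))['score_df2']
df1['score'] = df1['score'].replace(0, np.nan).fillna(lookup)
```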

This method changes the order (the result will be sorted by name).
df1.set_index('name').replace(0, np.nan).combine_first(df2.set_index('name')).reset_index()
name score
0 A 10
1 A 10
2 A 45
3 A 10
4 A 10
5 B 32
6 B 20
7 C 30

Related

Division in pandas dataframe

I am trying to divide my data frame with one of its columns:
Here is my data frame:
   A   B   C
0  1  10  10
1  2  20  30
2  3  15  33
Now, I want to divide columns "B" and "C" by column "A"; my desired output would be:
   A   B   C
0  1  10  10
1  2  10  15
2  3   5  11
I tried df/df['A'], but it does not produce the desired output.
Use DataFrame.div:
df[['B','C']] = df[['B','C']].div(df['A'], axis=0)
print (df)
A B C
0 1 10.0 10.0
1 2 10.0 15.0
2 3 5.0 11.0
If you need to divide all columns except A:
cols = df.columns.difference(['A'])
df[cols] = df[cols].div(df['A'], axis=0)
Try this:
import pandas as pd

d = {
    'A': [1, 2, 3],
    'B': [10, 20, 15],
    'C': [10, 30, 33]
}
df = pd.DataFrame(d)
df['B'] = df['B']/df['A']
df['C'] = df['C']/df['A']
print(df)
Output:
A B C
0 1 10.0 10.0
1 2 10.0 15.0
2 3 5.0 11.0

How to use .bfill() with pandas groupby without dropping the grouping variable

I want to use bfill and groupby but have not figured out a way to do so without dropping the grouping variable. I know I can just concatenate back the ID column but there's gotta be another way of doing this.
import pandas as pd
import numpy as np
test = pd.DataFrame({'ID': ['A', 'A', 'A', 'B', 'B', 'B'],
                     'dd': [0, 0, 0, 0, 0, 0],
                     'nu': np.array([0, 1, np.nan, np.nan, 10, 20])})
In [11]:test.groupby('ID').bfill()
Out[11]:
nu
0 0.0
1 1.0
2 NaN
3 10.0
4 10.0
5 20.0
Desired output
ID dd nu
0 A 0 0.0
1 A 0 1.0
2 A 0 NaN
3 B 0 10.0
4 B 0 10.0
5 B 0 20.0
Try df.assign:
>>> test.assign(nu=test.groupby('ID').bfill()['nu'])
ID dd nu
0 A 0 0.0
1 A 0 1.0
2 A 0 NaN
3 B 0 10.0
4 B 0 10.0
5 B 0 20.0
Or use df.groupby.apply:
>>> test.groupby('ID').apply(lambda x:x.bfill())
ID dd nu
0 A 0 0.0
1 A 0 1.0
2 A 0 NaN
3 B 0 10.0
4 B 0 10.0
5 B 0 20.0
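Another option, sketched below: backfill only the column that has the gaps and assign it back. The grouped Series stays aligned to the original index, so ID and dd are never dropped in the first place.

```python
import pandas as pd
import numpy as np

test = pd.DataFrame({'ID': ['A', 'A', 'A', 'B', 'B', 'B'],
                     'dd': [0, 0, 0, 0, 0, 0],
                     'nu': [0, 1, np.nan, np.nan, 10, 20]})

# groupby(...)['nu'].bfill() returns a Series aligned to test's index,
# so assigning it back leaves the other columns untouched
test['nu'] = test.groupby('ID')['nu'].bfill()
```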

Pandas Merge - Bring in identical column values based on keys

I have 3 dataframes like this,
df = pd.DataFrame([[1, 3], [2, 4], [3,6], [4,12], [5,18]], columns=['A', 'B'])
df2 = pd.DataFrame([[1, 5], [2, 6], [3,9]], columns=['A', 'C'])
df3 = pd.DataFrame([[4, 15, "hello"], [5, 19, "yes"]], columns=['A', 'C', 'D'])
They look like this,
df
A B
0 1 3
1 2 4
2 3 6
3 4 12
4 5 18
df2
A C
0 1 5
1 2 6
2 3 9
df3
A C D
0 4 15 hello
1 5 19 yes
My merges: first,
f_merge = pd.merge(df, df2, on='A', how='left')
then the second (f_merge with df3):
s_merge = pd.merge(f_merge, df3, on='A', how='left')
I get the output like this,
A B C_x C_y D
0 1 3 5.0 NaN NaN
1 2 4 6.0 NaN NaN
2 3 6 9.0 NaN NaN
3 4 12 NaN 15.0 hello
4 5 18 NaN 19.0 yes
I need it to be like this:
A B C D
0 1 3 5.0 NaN
1 2 4 6.0 NaN
2 3 6 9.0 NaN
3 4 12 15.0 hello
4 5 18 19.0 yes
How can I achieve this output? Any suggestion would be great.
Concat df2 and df3 before merging.
new_df = pd.merge(df, pd.concat([df2, df3], ignore_index=True), on='A')
new_df
Out:
A B C D
0 1 3 5 NaN
1 2 4 6 NaN
2 3 6 9 NaN
3 4 12 15 hello
4 5 18 19 yes
We can also use combine_first:
df.set_index('A',inplace=True)
df2.set_index('A').combine_first(df).combine_first(df3.set_index('A'))
B C D
A
1 3.0 5.0 NaN
2 4.0 6.0 NaN
3 6.0 9.0 NaN
4 12.0 15.0 hello
5 18.0 19.0 yes

Move Null rows to the bottom of the dataframe

I have a dataframe:
df1 = pd.DataFrame({'a': [1, 2, 10, np.nan, 5, 6, np.nan, 8],
                    'b': list('abcdefgh')})
df1
a b
0 1.0 a
1 2.0 b
2 10.0 c
3 NaN d
4 5.0 e
5 6.0 f
6 NaN g
7 8.0 h
I would like to move all the rows where a is np.nan to the bottom of the dataframe
df2 = pd.DataFrame({'a': [1, 2, 10, 5, 6, 8, np.nan, np.nan],
'b': list('abcefhdg')})
df2
a b
0 1.0 a
1 2.0 b
2 10.0 c
3 5.0 e
4 6.0 f
5 8.0 h
6 NaN d
7 NaN g
I have tried this:
na = df1[df1.a.isnull()]
df1.dropna(subset = ['a'], inplace=True)
df1 = df1.append(na)
df1
Is there a cleaner way to do this? Or is there a function that I can use for this?
New answer (after OP's edit)
You were close but you can clean up your code a bit by using the following:
df1 = pd.concat([df1[df1['a'].notnull()], df1[df1['a'].isnull()]], ignore_index=True)
print(df1)
a b
0 1.0 a
1 2.0 b
2 10.0 c
3 5.0 e
4 6.0 f
5 8.0 h
6 NaN d
7 NaN g
Old answer
Use sort_values with the na_position='last' argument:
df1 = df1.sort_values('a', na_position='last')
print(df1)
a b
0 1.0 a
1 2.0 b
4 5.0 e
5 6.0 f
7 8.0 h
2 10.0 c
3 NaN d
6 NaN g
Note that this sorts the non-null rows by value, so the original row order is not preserved.
No built-in method for this exists in pandas yet; use Series.isna with Series.argsort to get the positions and reorder with DataFrame.iloc:
df1 = df1.iloc[df1['a'].isna().argsort()].reset_index(drop=True)
print (df1)
a b
0 1.0 a
1 2.0 b
2 10.0 c
3 5.0 e
4 6.0 f
5 8.0 h
6 NaN d
7 NaN g
Or pure pandas solution with helper column and DataFrame.sort_values:
df1 = (df1.assign(tmp=df1['a'].isna())
          .sort_values('tmp')
          .drop('tmp', axis=1)
          .reset_index(drop=True))
print (df1)
a b
0 1.0 a
1 2.0 b
2 10.0 c
3 5.0 e
4 6.0 f
5 8.0 h
6 NaN d
7 NaN g
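On pandas 1.1+, sort_values also accepts a key function, which makes the helper column unnecessary. The explicit kind='mergesort' (a stable sort) guarantees the rows keep their original relative order within the null and non-null groups:

```python
import pandas as pd
import numpy as np

df1 = pd.DataFrame({'a': [1, 2, 10, np.nan, 5, 6, np.nan, 8],
                    'b': list('abcdefgh')})

# sort only on "is this NaN?"; mergesort is stable, so the original order
# is preserved within the False group and within the True group
df1 = df1.sort_values('a', key=lambda s: s.isna(), kind='mergesort',
                      ignore_index=True)
```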

Pandas Create New Column Based on Value in Another Column, If False Return Previous Value of New Column

This is a Python pandas problem I've been struggling with for a while. Let's say I have a simple dataframe df where df['a'] = [1,2,3,1,4,6] and df['b'] = [10,20,30,40,50,60]. I would like to create a third column 'c': if df['a'] == 1, then df['c'] = df['b']; otherwise, df['c'] = the previous value of df['c']. I have tried using np.where to make this happen, but the result is not what I was expecting. Any advice?
df = pd.DataFrame()
df['a'] = [1,2,3,1,4,6]
df['b'] = [10,20,30,40,50,60]
df['c'] = np.nan
df['c'] = np.where(df['a'] == 1, df['b'], df['c'].shift(1))
The result is:
a b c
0 1 10 10.0
1 2 20 NaN
2 3 30 NaN
3 1 40 40.0
4 4 50 NaN
5 6 60 NaN
Whereas I would have expected:
a b c
0 1 10 10.0
1 2 20 10.0
2 3 30 10.0
3 1 40 40.0
4 4 50 40.0
5 6 60 40.0
Try forward-filling the NaN values that the np.where call leaves behind:
df['c'] = df['c'].ffill()
Output:
a b c
0 1 10 10.0
1 2 20 10.0
2 3 30 10.0
3 1 40 40.0
4 4 50 40.0
5 6 60 40.0
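The two steps can also be collapsed into one expression with Series.where, which keeps b where the condition holds and inserts NaN elsewhere, ready for ffill. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 1, 4, 6],
                   'b': [10, 20, 30, 40, 50, 60]})

# where() keeps b when a == 1 and puts NaN elsewhere;
# ffill() then carries the last kept value forward
df['c'] = df['b'].where(df['a'].eq(1)).ffill()
```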
