Pandas Fill Column with Dictionary - python

I have a data frame like this:
A B C D
0 1 0 nan nan
1 8 0 nan nan
2 8 1 nan nan
3 2 1 nan nan
4 0 0 nan nan
5 1 1 nan nan
And I have a dictionary like this:
dc = {'C': 5, 'D' : 10}
I want to fill the NaN values in the data frame with the dictionary, but only for the rows in which column B is 0. I want to obtain this:
A B C D
0 1 0 5 10
1 8 0 5 10
2 8 1 nan nan
3 2 1 nan nan
4 0 0 5 10
5 1 1 nan nan
I know how to subset the data frame, but I can't find a way to fill the values with the dictionary. Any ideas?

You could use fillna with loc and pass your dict to it:
In [13]: df.loc[df.B==0,:].fillna(dc)
Out[13]:
A B C D
0 1 0 5 10
1 8 0 5 10
4 0 0 5 10
To do it for your data frame, select with the same mask and assign the result above to it:
df.loc[df.B==0, :] = df.loc[df.B==0,:].fillna(dc)
In [15]: df
Out[15]:
A B C D
0 1 0 5 10
1 8 0 5 10
2 8 1 NaN NaN
3 2 1 NaN NaN
4 0 0 5 10
5 1 1 NaN NaN
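An alternative sketch that avoids slicing twice: build a fill frame that only has rows where B is 0 and pass it to fillna. fillna accepts a DataFrame and aligns it on index and columns, so rows outside the mask have nothing to fill from and stay NaN. The data is retyped from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 8, 8, 2, 0, 1],
    "B": [0, 0, 1, 1, 0, 1],
    "C": np.nan,
    "D": np.nan,
})
dc = {"C": 5, "D": 10}

mask = df["B"] == 0
# One row per masked label, with the dict values broadcast to every row.
fill = pd.DataFrame(dc, index=df.index[mask])
# fillna aligns `fill` with df; unmatched rows and columns are left untouched.
df = df.fillna(fill)
print(df)
```

The design choice here is to let index alignment do the masking, so the condition is written only once.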

Related

Fast way to fill NaN in DataFrame

I have a DataFrame object df with a column like this:
[In]: df
[Out]:
id sum
0 1 NaN
1 1 NaN
2 1 2
3 1 NaN
4 1 4
5 1 NaN
6 2 NaN
7 2 NaN
8 2 3
9 2 NaN
10 2 8
10 2 NaN
... ... ...
[1810601 rows x 2 columns]
I have a lot of NaN values in the sum column, and I want to fill them as follows:
if a NaN comes before the first non-NaN value for its id, it should become 0;
otherwise, it should take the value from the previous row with the same id.
The output should look like this:
[In]: df
[Out]:
id sum
0 1 0
1 1 0
2 1 2
3 1 2
4 1 4
5 1 4
6 2 0
7 2 0
8 2 3
9 2 3
10 2 8
10 2 8
... ... ...
[1810601 rows x 2 columns]
I tried to do it "step by step" with a loop over iterrows(), but that is very slow. I believe it can be done faster with pandas methods.
Try ffill with groupby, as suggested; fillna(0) then handles the leading NaNs of each id:
df['sum'] = df.groupby('id')['sum'].ffill().fillna(0)
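A self-contained run of the one-liner above on the sample rows from the question. The repeated row label 10 in the question is replaced by a default RangeIndex here, which is an assumption made only to keep the example simple.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "id":  [1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2],
    "sum": [np.nan, np.nan, 2, np.nan, 4, np.nan,
            np.nan, np.nan, 3, np.nan, 8, np.nan],
})

# ffill runs separately inside each id, so values never leak across ids;
# anything still NaN afterwards was at the start of its group and becomes 0.
df["sum"] = df.groupby("id")["sum"].ffill().fillna(0)
print(df)
```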

set entire group to NaN if containing a single NaN and combine columns

I have a df
a b c d
0 1 nan 1
0 2 2 nan
0 2 3 4
1 3 1 nan
1 1 nan 3
1 1 2 3
1 1 2 4
I need to group by a and b, and then, if c or d contains one or more NaNs within a group, I want that entire group to be NaN in that column:
a b c d
0 1 nan 1
0 2 2 nan
0 2 3 nan
1 3 1 nan
1 1 nan 3
1 1 nan 3
1 1 nan 4
and then combine c and d so that there are no NaNs left:
a b c d e
0 1 nan 1 1
0 2 2 nan 2
0 2 3 nan 3
1 3 1 nan 1
1 1 nan 3 3
1 1 nan 3 3
1 1 nan 4 4
You will want to check each group for NaNs, set the appropriate values (NaN or the existing values), and then use combine_first() to combine the columns.
from io import StringIO
import pandas as pd
import numpy as np
df = pd.read_csv(StringIO("""
a b c d
0 1 nan 1
0 2 2 nan
0 2 3 4
1 3 1 nan
1 1 nan 3
1 1 2 3
1 1 2 4
"""), sep=' ')
for col in ['c', 'd']:
    # If any value in the (a, b) group is NaN, set the whole group to NaN
    df[col] = df.groupby(['a', 'b'])[col].transform(lambda x: np.nan if x.isna().any() else x)
df['e'] = df['c'].combine_first(df['d'])
df
a b c d e
0 0 1 NaN 1.0 1.0
1 0 2 2.0 NaN 2.0
2 0 2 3.0 NaN 3.0
3 1 3 1.0 NaN 1.0
4 1 1 NaN 3.0 3.0
5 1 1 NaN 3.0 3.0
6 1 1 NaN 4.0 4.0
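A vectorized variant of the same idea, sketched without the Python lambda: compute a per-group "has any NaN" flag with transform("any") on the boolean isna() Series, then blank those rows with mask. This is a swapped-in technique, not the answer's; the data is retyped from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [0, 0, 0, 1, 1, 1, 1],
    "b": [1, 2, 2, 3, 1, 1, 1],
    "c": [np.nan, 2, 3, 1, np.nan, 2, 2],
    "d": [1, np.nan, 4, np.nan, 3, 3, 4],
})

for col in ["c", "d"]:
    # True on every row whose (a, b) group has at least one NaN in this column.
    group_has_nan = df[col].isna().groupby([df["a"], df["b"]]).transform("any")
    # mask() puts NaN wherever the condition is True.
    df[col] = df[col].mask(group_has_nan)

# Take c where it is present, otherwise d.
df["e"] = df["c"].combine_first(df["d"])
print(df)
```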

Merging data into an existing pandas dataframe column conditionally

I have the following data:
import numpy as np
import pandas as pd

one_dict = {0: "zero", 1: "one", 2: "two", 3: "three", 4: "four"}
two_dict = {0: "light", 1: "calc", 2: "line", 3: "blur", 4: "color"}
np.random.seed(2)
n = 15
a_df = pd.DataFrame(dict(a=np.random.randint(0, 4, n), b=np.random.randint(0, 3, n)))
a_df["c"] = np.nan
a_df = a_df.sort_values("b").reset_index(drop=True)
where the data frame looks like this:
In [45]: a_df
Out[45]:
a b c
0 3 0 NaN
1 1 0 NaN
2 0 0 NaN
3 2 0 NaN
4 3 0 NaN
5 1 0 NaN
6 2 1 NaN
7 2 1 NaN
8 3 1 NaN
9 0 2 NaN
10 3 2 NaN
11 3 2 NaN
12 0 2 NaN
13 3 2 NaN
14 1 2 NaN
I would like to fill column c with values from the dictionaries one_dict
and two_dict, with the following result:
In [45]: a_df
Out[45]:
a b c
0 3 0 three
1 1 0 one
2 0 0 zero
3 2 0 .
4 3 0 .
5 1 0 .
6 2 1 line
7 2 1 line
8 3 1 blur
9 0 2 NaN
10 3 2 NaN
11 3 2 NaN
12 0 2 NaN
13 3 2 NaN
14 1 2 NaN
Attempt
I'm not sure what a good approach to this would be, though.
I thought I might do something along the following lines:
merge_df = pd.DataFrame(dict(one = one_dict, two=two_dict)).reset_index()
merge_df['zeros'] = 0
merge_df['ones'] = 1
giving
In [62]: merge_df
Out[62]:
index one two zeros ones
0 0 zero light 0 1
1 1 one calc 0 1
2 2 two line 0 1
3 3 three blur 0 1
4 4 four color 0 1
Then I would merge this into a_df, but I'm not sure how to merge and update
at the same time, or whether this is a good approach.
Edit
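The dictionary keys correspond to the values of column a (one_dict for rows where b is 0, two_dict for rows where b is 1).
The . entries are just shorthand; they should be filled in the same way as the other rows.
This is just a matter of building a new data frame with the correct structure and merging it: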
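(a_df.drop('c', axis=1)
     .merge(pd.DataFrame([one_dict, two_dict])       # row label = b (0, 1), column label = a
              .rename_axis(index='b', columns='a')
              .stack().reset_index(name='c'),        # long format with columns b, a, c
            on=['a', 'b'],
            how='left')                              # keep every row of a_df; b == 2 gets NaN
)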
Output:
a b c
0 3 0 three
1 1 0 one
2 0 0 zero
3 2 0 two
4 3 0 three
5 1 0 one
6 2 1 line
7 2 1 line
8 3 1 blur
9 0 2 NaN
10 3 2 NaN
11 3 2 NaN
12 0 2 NaN
13 3 2 NaN
14 1 2 NaN
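If you would rather update c in place than build a merged copy, one option is a lookup Series keyed by (b, a) tuples and reindexed by each row's (b, a) pair; pairs with no entry, such as b == 2, come back as NaN. This sketch assumes the same mapping the answer uses (one_dict for b == 0, two_dict for b == 1) and types in the frame shown in the question instead of regenerating it from the random seed.

```python
import numpy as np
import pandas as pd

one_dict = {0: "zero", 1: "one", 2: "two", 3: "three", 4: "four"}
two_dict = {0: "light", 1: "calc", 2: "line", 3: "blur", 4: "color"}

a_df = pd.DataFrame({
    "a": [3, 1, 0, 2, 3, 1, 2, 2, 3, 0, 3, 3, 0, 3, 1],
    "b": [0, 0, 0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2, 2],
})
a_df["c"] = np.nan

# Tuple keys give the lookup Series a two-level MultiIndex (b, a).
lookup = pd.Series(
    {(b, a): word for b, d in enumerate([one_dict, two_dict]) for a, word in d.items()}
)
# One (b, a) pair per row; duplicates are fine because the lookup index is unique.
keys = pd.MultiIndex.from_arrays([a_df["b"], a_df["a"]])
a_df["c"] = lookup.reindex(keys).to_numpy()
print(a_df)
```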

Groupby function gives me a table, not a series form?

df:
id cond1 a b c d
0 Q b 1 1 nan 1
1 R b 8 3 nan 3
2 Q a 12 4 8 nan
3 Q b 8 3 nan 1
4 R b 1 2 nan 3
5 Q a 7 9 8 nan
6 Q b 4 4 nan 1
7 R b 9 8 nan 3
8 Q a 0 10 8 nan
Group by id and cond1 and, within each group, take a rolling(2).sum() of the column named by the group's cond1 value (a or b):
df.groupby(['id','cond1']).apply(lambda x: x[x.name[1]].rolling(2).sum())
Output:
id cond1
Q a 2 nan
5 19.00000
8 7.00000
b 0 nan
3 4.00000
6 7.00000
R b 1 nan
4 5.00000
7 10.00000
dtype: float64
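Why does the output look like a table? Can it be a plain Series with its index reset?
It is already a Series: the groupby keys (id, cond1) are added as the outer levels of a three-level MultiIndex, with the original row label as the innermost level. Use reset_index() to move those levels into columns of a DataFrame, or reset_index(level=[0, 1], drop=True).sort_index() to get a flat Series indexed by the original row labels.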
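Alternatively, a sketch that skips apply entirely: for each value of cond1, select those rows, group them by id, and use transform so the rolling sum comes back with the original row labels. Concatenating the pieces and sorting gives one flat Series aligned with df. Only the a and b columns from the question are retyped, since c and d are not used.

```python
import pandas as pd

df = pd.DataFrame({
    "id":    ["Q", "R", "Q", "Q", "R", "Q", "Q", "R", "Q"],
    "cond1": ["b", "b", "a", "b", "b", "a", "b", "b", "a"],
    "a": [1, 8, 12, 8, 1, 7, 4, 9, 0],
    "b": [1, 3, 4, 3, 2, 9, 4, 8, 10],
})

# Selecting rows by cond1 and then grouping by id is the same split as
# grouping by (id, cond1); transform keeps each row's original label.
pieces = [
    df[df["cond1"] == col].groupby("id")[col].transform(lambda s: s.rolling(2).sum())
    for col in df["cond1"].unique()
]
result = pd.concat(pieces).sort_index()
print(result)
```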

replace each column of pandas dataframe with each value of array

df looks like this:
A B C
0 NaN 150 -150
1 100 NaN 150
2 -100 -150 NaN
3 -100 -150 NaN
4 NaN 150 150
5 100 NaN -150
I also have an array: array([1, 2, 3]).
I want to replace the non-null values in each column with the corresponding value from the array, so that the result is:
A B C
0 NaN 2 3
1 1 NaN 3
2 1 2 NaN
3 1 2 NaN
4 NaN 2 3
5 1 NaN 3
How can I achieve this in a simple way? I tried something like:
df[df.notnull()] = np.array([1,2,3])
df[df.notnull()].loc[:,] = np.array([1,2,3])
but neither works.
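How about this? df * 0 + 1 turns every non-null cell into 1 (NaN stays NaN), and multiplying by the array scales each column by its value:
>>> arr = np.array([1, 2, 3])
>>> (df * 0 + 1) * arr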
A B C
0 NaN 2 3
1 1 NaN 3
2 1 2 NaN
3 1 2 NaN
4 NaN 2 3
5 1 NaN 3
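Another option, sketched with NumPy's where instead of arithmetic: the notna() mask picks between the array (broadcast so that each column gets its own value) and NaN, and the result is wrapped back into a DataFrame with the original labels. The data is retyped from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [np.nan, 100, -100, -100, np.nan, 100],
    "B": [150, np.nan, -150, -150, 150, np.nan],
    "C": [-150, 150, np.nan, np.nan, 150, -150],
})
arr = np.array([1, 2, 3])

# arr has one entry per column, so it broadcasts across the rows;
# cells that were null keep NaN.
result = pd.DataFrame(np.where(df.notna(), arr, np.nan),
                      index=df.index, columns=df.columns)
print(result)
```

Unlike df * 0 + 1, this does not depend on the original values being finite numbers (an inf cell would turn into NaN under the arithmetic trick).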
