Pandas: Custom fillna() function?

Pandas: Custom fillna() function? - python

Lets say I have data like this:
>>> df = pd.DataFrame({'values': [5, np.nan, 2, 2, 2, 5, np.nan, 4, 5]})
>>> print(df)
values
0 5.0
1 NaN
2 2.0
3 2.0
4 2.0
5 5.0
6 NaN
7 4.0
8 5.0
I know that I can use fillna(), with arguments such as fillna(method='ffill') to fill missing values with the previous value. Is there a way of writing a custom method for fillna? Lets say I want every NaN value to be replaced by the arithmetic middle of to previous 2 values and the next 2 values, how would I do that? (I am not saying that is a good method of filling the values, but I want to know if it can be done).
Example for what the output would have to look like:
0 5.0
1 3.0
2 2.0
3 2.0
4 2.0
5 5.0
6 4.0
7 4.0
8 5.0

You can use ffill and bfill together as follows :
df['values'] = df['values'].ffill().add(df['values'].bfill()).div(2)
print(df)
values
0 5.0
1 3.0
2 2.0
3 2.0
4 2.0
5 5.0
6 4.0
7 4.0
8 5.0
Just change the df['values'] to df to apply over the whole dataframe!

Related

How to merge two tables while preserving all values?

I am relatively new to python and I am wondering how I can merge these two tables and preserve both their values?
Consider these two tables:
df = pd.DataFrame([[1, 3], [2, 4],[2.5,1],[5,6],[7,8]], columns=['A', 'B'])
A B
1 3
2 4
2.5 1
5 6
7 8
df2 = pd.DataFrame([[1],[2],[3],[4],[5],[6],[7],[8]], columns=['A'])
A
1
2
...
8
I want to obtain the following result:
A B
1 3
2 4
2.5 1
3 NaN
4 NaN
5 6
6 NaN
7 8
8 NaN
You can see that column A includes all values from both the first and second dataframe in an ordered manner.
I have attempted:
pd.merge(df,df2,how='outer')
pd.merge(df,df2,how='right')
But the former does not result in an ordered dataframe and the latter does not include rows that are unique to df.

Let us do concat then drop_duplicates
out = pd.concat([df2,df]).drop_duplicates('A',keep='last').sort_values('A')
Out[96]:
A B
0 1.0 3.0
1 2.0 4.0
2 2.5 1.0
2 3.0 NaN
3 4.0 NaN
3 5.0 6.0
5 6.0 NaN
4 7.0 8.0
7 8.0 NaN

Forward fill on custom value in pandas dataframe

I am looking to perform forward fill on some dataframe columns.
the ffill method replaces missing values or NaN with the previous filled value.
In my case, I would like to perform a forward fill, with the difference that I don't want to do that on Nan but for a specific value (say "*").
Here's an example
import pandas as pd
import numpy as np
d = [{"a":1, "b":10},
{"a":2, "b":"*"},
{"a":3, "b":"*"},
{"a":4, "b":"*"},
{"a":np.nan, "b":50},
{"a":6, "b":60},
{"a":7, "b":70}]
df = pd.DataFrame(d)
with df being
a b
0 1.0 10
1 2.0 *
2 3.0 *
3 4.0 *
4 NaN 50
5 6.0 60
6 7.0 70
The expected result should be
a b
0 1.0 10
1 2.0 10
2 3.0 10
3 4.0 10
4 NaN 50
5 6.0 60
6 7.0 70
If replacing "*" by np.nan then ffill, that would cause to apply ffill to column a.
Since my data has hundreds of columns, I was wondering if there is a more efficient way than looping over all columns, check if it countains "*", then replace and ffill.

You can use df.mask with df.isin with df.replace
df.mask(df.isin(['*']),df.replace('*',np.nan).ffill())
a b
0 1.0 10
1 2.0 10
2 3.0 10
3 4.0 10
4 NaN 50
5 6.0 60
6 7.0 70

I think you're going in the right direction, but here's a complete solution. What I'm doing is 'marking' the original NaN values, then replacing "*" with NaN, using ffill, and then putting the original NaN values back.
df = df.replace(np.NaN, "<special>").replace("*", np.NaN).ffill().replace("<special>", np.NaN)
output:
a b
0 1.0 10.0
1 2.0 10.0
2 3.0 10.0
3 4.0 10.0
4 NaN 50.0
5 6.0 60.0
6 7.0 70.0
And here's an alternative solution that does the same thing, without the 'special' marking:
original_nan = df.isna()
df = df.replace("*", np.NaN).ffill()
df[original_nan] = np.NaN

Perform arithmetic operations on null values

When i am trying to do arithmetic operation including two or more columns facing problem with null values.
One more thing which i want to mention here that i don't want to fill missed/null values.
Actually i want something like 1 + np.nan = 1 but it is giving np.nan. I tried to solve it by np.nansum but it didn't work.
df = pd.DataFrame({"a":[1,2,3,4],"b":[1,2,np.nan,np.nan]})
df
Out[6]:
a b c
0 1 1.0 2.0
1 2 2.0 4.0
2 3 NaN NaN
3 4 NaN NaN
And,
df["d"] = np.nansum([df.a + df.b])
df
Out[13]:
a b d
0 1 1.0 6.0
1 2 2.0 6.0
2 3 NaN 6.0
3 4 NaN 6.0
But i want actually like,
df
Out[10]:
a b c
0 1 1.0 2.0
1 2 2.0 4.0
2 3 NaN 3.0
3 4 NaN 4.0

The np.nansum here calculated the sum, of the entire column. You do not want that, you probably want to call the np.nansum on the two columns, like:
df['d'] = np.nansum((df.a, df.b), axis=0)
This then yield the expected:
>>> df
a b d
0 1 1.0 2.0
1 2 2.0 4.0
2 3 NaN 3.0
3 4 NaN 4.0

Simply use DataFrame.sum over axis=1:
df['c'] = df.sum(axis=1)
Output
a b c
0 1 1.0 2.0
1 2 2.0 4.0
2 3 NaN 3.0
3 4 NaN 4.0

Fill NaN values in dataframe with previous values in column

Hi I have a dataframe with some missing values ex:
The black numbers 40 and 50 are the values already inputted and the red ones are to autofill from the previous values. Row 2 is blank as there is no previous number to fill.
Any idea how I can do this efficiently? I was trying loops but maybe there is a better way

It can be done easily with ffill method in pandas fillna.
To illustrate the working consider the following sample dataframe
df = pd.DataFrame()
df['Vals'] = [1, 2, 3, np.nan, np.nan, 6, 7, np.nan, 8]
Vals
0 1.0
1 2.0
2 3.0
3 NaN
4 5.0
5 6.0
6 7.0
7 NaN
8 8.0
To fill the missing value do this
df['Vals'].fillna(method='ffill', inplace=True)
Vals
0 1.0
1 2.0
2 3.0
3 3.0
4 3.0
5 6.0
6 7.0
7 7.0
8 8.0

There is a direct synonym function for this, pandas.DataFrame.ffill
df['Vals',inplace=True]

pandas rolling apply to allow nan

I have a very simple Pandas Series:
xx = pd.Series([1, 2, np.nan, np.nan, 3, 4, 5])
If I run this I get what I want:
>>> xx.rolling(3,1).mean()
0 1.0
1 1.5
2 1.5
3 2.0
4 3.0
5 3.5
6 4.0
But if I have to use .apply() I cannot get it to work by ignoring NaNs in the mean() operation:
>>> xx.rolling(3,1).apply(np.mean)
0 1.0
1 1.5
2 NaN
3 NaN
4 NaN
5 NaN
6 4.0
>>> xx.rolling(3,1).apply(lambda x : np.mean(x))
0 1.0
1 1.5
2 NaN
3 NaN
4 NaN
5 NaN
6 4.0
What should I do in order to both use .apply() and have the result in the first output? My actual problem is more complicated that I have to use .apply() to realize but it boils down to this issue.

You can use np.nanmean()
xx.rolling(3,1).apply(lambda x : np.nanmean(x))
Out[59]:
0 1.0
1 1.5
2 1.5
3 2.0
4 3.0
5 3.5
6 4.0
dtype: float64
If you have to process the nans explicitly, you can do:
xx.rolling(3,1).apply(lambda x : np.mean(x[~np.isnan(x)]))
Out[94]:
0 1.0
1 1.5
2 1.5
3 2.0
4 3.0
5 3.5
6 4.0
dtype: float64

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Pandas: Custom fillna() function? - python

You can use ffill and bfill together as follows : df['values'] = df['values'].ffill().add(df['values'].bfill()).div(2) print(df) values 0 5.0 1 3.0 2 2.0 3 2.0 4 2.0 5 5.0 6 4.0 7 4.0 8 5.0 Just change the df['values'] to df to apply over the whole dataframe!

Related

How to merge two tables while preserving all values?

Forward fill on custom value in pandas dataframe

Perform arithmetic operations on null values

Fill NaN values in dataframe with previous values in column

pandas rolling apply to allow nan

Categories

Resources