Pandas - Fillna based on last non-blank value and next column - python

I have the following pandas dataframe:
       A      B    C
0  100.0  110.0  100
1   90.0  120.0  110
2    NaN  105.0  105
3    NaN  100.0  103
4    NaN    NaN  107
5    NaN    NaN  110
I need to fill NaNs in all columns in a particular way. Let's take column "A" as an example: the last non-NaN value is row #1 (90.0). So for column "A" I need to fill NaNs with the following formula:
Column_A-Row_1 * Column_B-CurrentRow / Column_B-Row_1
For example, the first NaN of column A (row #2) should be filled with: 90 * 105 / 120. The following NaN of column A should be filled with: 90 * 100 / 120.
Please note that column names can change, so I can't reference columns by name.
This is the expected output:
        A       B      C
0  100.00  110.00  100.0
1   90.00  120.00  110.0
2   78.75  105.00  105.0
3   75.00  100.00  103.0
4     NaN  103.88  107.0
5     NaN  106.80  110.0
Any ideas? Thanks

You can fill the first NaN that follows a number by using shift along both axes:
df2 = df.combine_first(df.shift().mul(df.div(df.shift()).shift(-1,axis=1)))
output:
        A           B    C
0  100.00  110.000000  100
1   90.00  120.000000  110
2   78.75  105.000000  105
3     NaN  100.000000  103
4     NaN  103.883495  107
5     NaN         NaN  110
It is unclear how you get the 75, though. Do you want to iterate the process?
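If the rule is "anchor on the last non-NaN value of each column and scale by the next column's original values" (which also produces the 75), a per-column sketch using last_valid_index could look like this; the frame is reconstructed from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [100.0, 90.0, np.nan, np.nan, np.nan, np.nan],
    "B": [110.0, 120.0, 105.0, 100.0, np.nan, np.nan],
    "C": [100, 110, 105, 103, 107, 110],
})

filled = df.copy()
cols = df.columns
for j in range(len(cols) - 1):            # the last column has no "next" column
    col, nxt = cols[j], cols[j + 1]
    mask = df[col].isna()
    if not mask.any():
        continue
    r = df[col].last_valid_index()        # row of the last non-NaN value
    # anchor_value * next_col_current / next_col_at_anchor
    filled.loc[mask, col] = df.loc[r, col] * df.loc[mask, nxt] / df.loc[r, nxt]
```

Rows where the next column is itself NaN (rows 4-5 of A) stay NaN, matching the expected output.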

Related

Merge rows duplicate values in a column using Pandas

I have DataFrame like this
A B C D
010 100 NaN 300
020 NaN 200 400
020 100 NaN NaN
030 NaN NaN 19
030 1 NaN NaN
040 NaN 2 1
How can I merge all rows that have duplicate (same value) in Column A so that other values fill the empty places?
End result
A B C D
010 100 NaN 300
020 100 200 400
030 1 NaN 19
040 NaN 2 1
Check with
df=df.groupby('A',as_index=False).first()
Out[65]:
A B C D
0 10 100.0 NaN 300.0
1 20 100.0 200.0 400.0
2 30 1.0 NaN 19.0
3 40 NaN 2.0 1.0
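The reason this works is that GroupBy.first returns the first non-NaN value per column within each group. A minimal reproduction, assuming the A values are numeric:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [10, 20, 20, 30, 30, 40],
    "B": [100, np.nan, 100, np.nan, 1, np.nan],
    "C": [np.nan, 200, np.nan, np.nan, np.nan, 2],
    "D": [300, 400, np.nan, 19, np.nan, 1],
})

# first() skips NaN, so each group keeps the first value that exists per column
merged = df.groupby("A", as_index=False).first()
```

If A should keep its leading zeros ("010"), store it as a string; the groupby behaves the same way.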

How To Map Column Values where two others match? "Reindexing only valid with uniquely valued Index objects"?

I have one DataFrame, df, I have four columns shown below:
IDP1 IDP1Number IDP2 IDP2Number
1 100 1 NaN
3 110 2 150
5 120 3 NaN
7 140 4 160
9 150 5 190
NaN NaN 6 130
NaN NaN 7 NaN
NaN NaN 8 200
NaN NaN 9 90
NaN NaN 10 NaN
I want to map values from df.IDP1Number into IDP2Number by matching IDP1 against IDP2: where a value in IDP2 also appears in IDP1, replace IDP2Number with the corresponding IDP1Number; otherwise leave IDP2Number alone.
The error message that appears reads, "Reindexing only valid with uniquely valued Index objects".
The Dataframe below is what I wish to have:
IDP1 IDP1Number IDP2 IDP2Number
1 100 1 100
3 110 2 150
5 120 3 110
7 140 4 160
9 150 5 120
NaN NaN 6 130
NaN NaN 7 140
NaN NaN 8 200
NaN NaN 9 150
NaN NaN 10 NaN
Here's a way to do it:
# filter the data and create a mapping dict
maps = df.query("IDP1.notna()")[['IDP1', 'IDP1Number']].set_index('IDP1')['IDP1Number'].to_dict()
# create new column using ifelse condition
df['IDP2Number'] = df.apply(lambda x: maps.get(x['IDP2'], None) if (pd.isna(x['IDP2Number']) or x['IDP2'] in maps) else x['IDP2Number'], axis=1)
print(df)
IDP1 IDP1Number IDP2 IDP2Number
0 1.0 100.0 1 100.0
1 3.0 110.0 2 150.0
2 5.0 120.0 3 110.0
3 7.0 140.0 4 160.0
4 9.0 150.0 5 120.0
5 NaN NaN 6 130.0
6 NaN NaN 7 140.0
7 NaN NaN 8 200.0
8 NaN NaN 9 150.0
9 NaN NaN 10 NaN
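An alternative that avoids the row-wise apply: build the IDP1 -> IDP1Number lookup as a Series, map IDP2 through it, and fall back to the existing IDP2Number where there is no match. A sketch on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "IDP1": [1, 3, 5, 7, 9] + [np.nan] * 5,
    "IDP1Number": [100, 110, 120, 140, 150] + [np.nan] * 5,
    "IDP2": list(range(1, 11)),
    "IDP2Number": [np.nan, 150, np.nan, 160, 190, 130, np.nan, 200, 90, np.nan],
})

# lookup table: IDP1 -> IDP1Number (only rows where IDP1 exists)
lookup = df.dropna(subset=["IDP1"]).set_index("IDP1")["IDP1Number"]
# matched values win; unmatched rows keep their old IDP2Number
df["IDP2Number"] = df["IDP2"].map(lookup).fillna(df["IDP2Number"])
```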

How to find last index in Pandas Data Frame row and count backwards using column information?

For example:
If I have a data frame like this:
    20   40   60   80  100  120  140
1    1    1    1  NaN  NaN  NaN  NaN
2    1    1    1    1    1  NaN  NaN
3    1    1    1    1  NaN  NaN  NaN
4    1    1  NaN  NaN    1    1    1
How do I find the last index in each row and then count the difference in columns elapsed so I get something like this?
    20   40   60   80  100  120  140
1   40   20    0  NaN  NaN  NaN  NaN
2   80   60   40   20    0  NaN  NaN
3   60   40   20    0  NaN  NaN  NaN
4   20    0  NaN  NaN   40   20    0
You can reverse each row, take the cumulative count of consecutive non-null values, and scale by the column step of 20:
def fill_values(row):
    # walk the row right-to-left
    row = row[::-1]
    a = row == 1
    b = a.cumsum()
    # cumulative count of consecutive values, reset at every NaN
    return (b - b.mask(a).ffill().fillna(0).astype(int))[::-1] * 20

df.apply(fill_values, axis=1).replace(0, np.nan) - 20
Out:
     20    40    60    80   100   120   140
1  40.0  20.0   0.0   NaN   NaN   NaN   NaN
2  80.0  60.0  40.0  20.0   0.0   NaN   NaN
3  60.0  40.0  20.0   0.0   NaN   NaN   NaN
4  20.0   0.0   NaN   NaN  40.0  20.0   0.0
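Spelled out as a plain loop, the value computed for every non-NaN cell is its distance (in column steps of 20) to the end of its consecutive run. This sketch reconstructs the question's frame and makes that logic explicit:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, 1, 1, np.nan, np.nan, np.nan, np.nan],
     [1, 1, 1, 1, 1, np.nan, np.nan],
     [1, 1, 1, 1, np.nan, np.nan, np.nan],
     [1, 1, np.nan, np.nan, 1, 1, 1]],
    columns=[20, 40, 60, 80, 100, 120, 140],
    index=[1, 2, 3, 4],
)

m = df.notna().to_numpy()
out = np.full(m.shape, np.nan)
for i in range(m.shape[0]):
    run_end = -1                              # column where the current run ends
    for j in range(m.shape[1] - 1, -1, -1):   # scan each row right-to-left
        if m[i, j]:
            if run_end == -1:
                run_end = j
            out[i, j] = (run_end - j) * 20
        else:
            run_end = -1                      # a NaN breaks the run

result = pd.DataFrame(out, index=df.index, columns=df.columns)
```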

How can I filter consecutive data rows between NaN rows in a pandas dataframe?

I have a dataframe that looks like the following. There are one or more consecutive rows where y_l is populated and y_h is NaN, and vice versa.
When there is more than one consecutive populated row between the NaNs, we only want to keep the one with the lowest y_l (or the highest y_h).
E.g. in the df below, of the three consecutive y_l rows we would only keep the 2nd (95.0) and discard the other two.
What would be a smart way to implement that?
df = pd.DataFrame({'y_l': [np.nan, 97, 95, 98, np.nan], 'y_h': [90, np.nan, np.nan, np.nan, 95]}, columns=['y_l', 'y_h'])
>>> df
y_l y_h
0 NaN 90.0
1 97.0 NaN
2 95.0 NaN
3 98.0 NaN
4 NaN 95
Desired result:
y_l y_h
0 NaN 90.0
1 95.0 NaN
2 NaN 95
You need to create a new column or Series to distinguish each consecutive run, then use groupby and aggregate with agg; last, to restore the column order, use reindex:
a = df['y_l'].isnull()
b = a.ne(a.shift()).cumsum()
df = (df.groupby(b, as_index=False)
.agg({'y_l':'min', 'y_h':'max'})
.reindex(columns=['y_l','y_h']))
print (df)
y_l y_h
0 NaN 90.0
1 95.0 NaN
2 NaN 95.0
Detail:
print (b)
0 1
1 2
2 2
3 2
4 3
Name: y_l, dtype: int32
What if you had more columns? For example:
df = pd.DataFrame({'A': [np.nan, 15, 20, 25, np.nan], 'y_l': [np.nan, 97, 95, 98, np.nan], 'y_h': [90, np.nan, np.nan, np.nan, 95]}, columns=['A', 'y_l', 'y_h'])
>>> df
A y_l y_h
0 NaN NaN 90.0
1 15.0 97.0 NaN
2 20.0 95.0 NaN
3 25.0 98.0 NaN
4 NaN NaN 95.0
How could you keep the values in column A after filtering out the irrelevant rows as below?
A y_l y_h
0 NaN NaN 90.0
1 20.0 95.0 NaN
2 NaN NaN 95.0
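One way to keep the extra columns is to select whole rows instead of aggregating: label each consecutive run as above, then pick the row with the lowest y_l (or, in a y_h run, the highest y_h) via idxmin/idxmax. A sketch, assuming every row populates exactly one of y_l and y_h:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [np.nan, 15, 20, 25, np.nan],
    "y_l": [np.nan, 97, 95, 98, np.nan],
    "y_h": [90, np.nan, np.nan, np.nan, 95],
})

a = df["y_l"].isna()
grp = a.ne(a.shift()).cumsum()        # label each consecutive run

def pick(g):
    # keep the whole row with the lowest y_l, or the highest y_h
    if g["y_l"].notna().any():
        return g.loc[[g["y_l"].idxmin()]]
    return g.loc[[g["y_h"].idxmax()]]

result = df.groupby(grp, group_keys=False).apply(pick).reset_index(drop=True)
```

Because whole rows are selected, the values in A survive (20.0 for the kept middle row).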

Interpolate missing values using row and column values

In Python Pandas, how should I interpolate a dataframe that contains some all-NaN rows and columns?
For example, the following dataframe -
90 92.5 95 100 110 120
Index
1 NaN NaN NaN NaN NaN NaN
2 0.469690 NaN NaN NaN NaN NaN
3 0.478220 NaN 0.492232 0.505685 NaN NaN
4 0.486377 NaN 0.503853 0.518890 0.550517 NaN
5 0.485862 NaN 0.502130 0.515076 0.537675 0.564383
My goal is to interpolate and fill all the NaNs efficiently, i.e. to interpolate whatever NaNs can be interpolated. However, if I use
df.interpolate(inplace=True, axis=0, method='spline', order=1, limit=20, limit_direction='both')
it will return "TypeError: Cannot interpolate with all NaNs."
You can try this (thanks to @Boud for df.dropna(axis=1, how='all')):
In [138]: new = df.dropna(axis=1, how='all').interpolate(limit=20, limit_direction='both')
In [139]: new
Out[139]:
90 95 100 110 120
Index
1 0.469690 0.492232 0.505685 0.550517 0.564383
2 0.469690 0.492232 0.505685 0.550517 0.564383
3 0.478220 0.492232 0.505685 0.550517 0.564383
4 0.486377 0.503853 0.518890 0.550517 0.564383
5 0.485862 0.502130 0.515076 0.537675 0.564383
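If you need the dropped all-NaN column (92.5) to stay in the frame, you can reindex the interpolated result back onto the original columns. A sketch using the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {90: [np.nan, 0.469690, 0.478220, 0.486377, 0.485862],
     92.5: [np.nan] * 5,
     95: [np.nan, np.nan, 0.492232, 0.503853, 0.502130],
     100: [np.nan, np.nan, 0.505685, 0.518890, 0.515076],
     110: [np.nan, np.nan, np.nan, 0.550517, 0.537675],
     120: [np.nan, np.nan, np.nan, np.nan, 0.564383]},
    index=pd.Index([1, 2, 3, 4, 5], name="Index"),
)

new = df.dropna(axis=1, how="all").interpolate(limit=20, limit_direction="both")
# restore the all-NaN columns in their original positions
new = new.reindex(columns=df.columns)
```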
