Merge multiple DataFrames preserving columns and filling the rest with NaN - python

I have a set of DataFrames df1, df2, ..., dfn. Each df looks like:
id | date | metric_value
001 | 2013-01-01 | 0.73
001 | 2013-03-01 | 0.73
002 | 2013-01-01 | 0.73
002 | 2013-02-01 | 0.73
But the id and date columns do not necessarily match across frames, so I could have a df1 like:
id | date | metric_value1
001 | 2013-01-01 | 0.73
001 | 2013-03-01 | 0.73
002 | 2013-01-01 | 0.73
002 | 2013-02-01 | 0.73
004 | 2013-03-01 | 0.73
And a df2 like:
id | date | metric_value2
001 | 2013-01-01 | 0.72
003 | 2013-02-01 | 0.72
003 | 2013-03-01 | 0.72
004 | 2013-01-01 | 0.72
How could I merge df1 and df2, generally speaking df1 ... dfn, so I could have something like:
id | date | metric_value1 | metric_value2
001 | 2013-01-01 | 0.73 | 0.72
001 | 2013-02-01 | NaN | NaN
001 | 2013-03-01 | 0.73 | NaN
002 | 2013-01-01 | 0.73 | NaN
002 | 2013-02-01 | 0.73 | NaN
002 | 2013-03-01 | NaN | NaN
003 | 2013-01-01 | NaN | NaN
003 | 2013-02-01 | NaN | 0.72
003 | 2013-03-01 | NaN | 0.72
004 | 2013-01-01 | NaN | 0.72
004 | 2013-02-01 | NaN | NaN
004 | 2013-03-01 | 0.73 | NaN
The result should cover all ids over the entire date range, from the min date to the max date.

Taking @JonathanLeon's solution a little further:
import io
import pandas as pd
data='''id|date|metric_value1
001|2013-01-01|0.73
001|2013-03-01|0.73
002|2013-01-01|0.73
002|2013-02-01|0.73
004|2013-03-01|0.73'''
df1 = pd.read_csv(io.StringIO(data), sep='|', engine='python')
data='''id|date|metric_value2
001|2013-01-01|0.72
003|2013-02-01|0.72
003|2013-03-01|0.72
004|2013-01-01|0.72'''
df2 = pd.read_csv(io.StringIO(data), sep='|', engine='python')
df_out = df1.merge(df2, on=['id', 'date'], how='outer')
df_out['date'] = pd.to_datetime(df_out['date'])
(df_out.set_index(['id', 'date'])
       .reindex(pd.MultiIndex.from_product([df_out['id'].unique(),
                                            df_out['date'].unique()],
                                           names=['id', 'date']))
       .sort_index()
       .reset_index())
Output:
id date metric_value1 metric_value2
0 1 2013-01-01 0.73 0.72
1 1 2013-02-01 NaN NaN
2 1 2013-03-01 0.73 NaN
3 2 2013-01-01 0.73 NaN
4 2 2013-02-01 0.73 NaN
5 2 2013-03-01 NaN NaN
6 3 2013-01-01 NaN NaN
7 3 2013-02-01 NaN 0.72
8 3 2013-03-01 NaN 0.72
9 4 2013-01-01 NaN 0.72
10 4 2013-02-01 NaN NaN
11 4 2013-03-01 0.73 NaN
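Since the question asks about df1 ... dfn in general, the same two steps (outer merge, then reindex onto the full id x date grid) extend to any number of frames with functools.reduce. A minimal sketch, assuming every frame shares the id and date key columns:
import functools
dfs = [df1, df2]  # ..., dfn
merged = functools.reduce(
    lambda left, right: left.merge(right, on=['id', 'date'], how='outer'), dfs)
merged['date'] = pd.to_datetime(merged['date'])
full_index = pd.MultiIndex.from_product([merged['id'].unique(),
                                         merged['date'].unique()],
                                        names=['id', 'date'])
df_out = merged.set_index(['id', 'date']).reindex(full_index).sort_index().reset_index()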

Try:
import io
import pandas as pd

data='''id|date|metric_value1
001|2013-01-01|0.73
001|2013-03-01|0.73
002|2013-01-01|0.73
002|2013-02-01|0.73
004|2013-03-01|0.73'''
df1 = pd.read_csv(io.StringIO(data), sep='|', engine='python')
data='''id|date|metric_value2
001|2013-01-01|0.72
003|2013-02-01|0.72
003|2013-03-01|0.72
004|2013-01-01|0.72'''
df2 = pd.read_csv(io.StringIO(data), sep='|', engine='python')
df1.merge(df2, on=['id', 'date'], how='outer')
Output:
id date metric_value1 metric_value2
0 1 2013-01-01 0.730 0.720
1 1 2013-03-01 0.730 NaN
2 2 2013-01-01 0.730 NaN
3 2 2013-02-01 0.730 NaN
4 4 2013-03-01 0.730 NaN
5 3 2013-02-01 NaN 0.720
6 3 2013-03-01 NaN 0.720
7 4 2013-01-01 NaN 0.720

import pandas
import datetime

# build your list of unique ids
ids = pandas.concat([df1['id'], df2['id']])
ids = pandas.Series(ids.unique())

# can do as above to get all possible dates; here they are just generated
dates = pandas.DataFrame(pandas.date_range(datetime.date.today(), freq='D', periods=10),
                         columns=['date'])

# use a cross merge (pandas >= 1.2) to generate the cartesian product of all dates and all ids
combinations = pandas.merge(left=dates,
                            right=pandas.DataFrame(ids.unique(), columns=['id']),
                            how='cross')

# merge your dataframes against the full grid on your 'key' columns
df3 = pandas.merge(left=combinations, right=df1, on=['date', 'id'], how='left')
df4 = pandas.merge(left=combinations, right=df2, on=['date', 'id'], how='left')
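This stops at df3 and df4; a final outer merge along these lines would combine them into the single wide frame (this last step is an assumption, not part of the original answer):
# combine the two grid-aligned frames into one wide frame
df_final = pandas.merge(df3, df4, on=['date', 'id'], how='outer')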

Related

Merging multiple columns

I have dataframes like the ones below (the actual data has 70 columns with timestamps), with column names A_Timestamp, BC_Timestamp, DA_Timestamp, CA_Timestamp, B_Values, C_Values, D_Values, Q_Values:
A_Timestamp | B_Values
2020-11-08 11:15:00 | 1
2020-11-10 15:34:00 | 2

BC_Timestamp | C_Values
2020-11-11 12:13:00 | 8
2020-11-15 02:47:00 | 4

DA_Timestamp | D_Values
2020-1-13 14:47:00 | 3
2020-11-9 5:34:00 | 5

CA_Timestamp | Q_Values
2020-7-18 01:04:00 | 7
2020-04-10 16:34:00 | 6
And I want it like this:
Timestamp | B_Values | C_Values | D_Values | Q_Values
2020-11-08 11:15:00 | 1 | NaN | NaN | NaN
2020-11-10 15:34:00 | 2 | NaN | NaN | NaN
2020-11-11 12:13:00 | NaN | 8 | NaN | NaN
2020-11-15 02:47:00 | NaN | 4 | NaN | NaN
2020-1-13 14:47:00 | NaN | NaN | 3 | NaN
2020-11-9 05:34:00 | NaN | NaN | 5 | NaN
2020-7-18 01:04:00 | NaN | NaN | NaN | 7
I want to merge all the columns ending with 'Timestamp' into one single column, with each timestamp keeping its value in the respective value column.
You can use a renamer for the Timestamp columns:
dfs = [df1, df2, df3, df4]
renamer = lambda x: 'Timestamp' if x.endswith('Timestamp') else x
out = pd.concat([d.rename(renamer, axis=1) for d in dfs])
Output:
Timestamp B_Values C_Values D_Values Q_Values
0 2020-11-08 11:15:00 1.0 NaN NaN NaN
1 2020-11-10 15:34:00 2.0 NaN NaN NaN
0 2020-11-11 12:13:00 NaN 8.0 NaN NaN
1 2020-11-15 02:47:00 NaN 4.0 NaN NaN
0 2020-1-13 14:47:00 NaN NaN 3.0 NaN
1 2020-11-9 5:34:00 NaN NaN 5.0 NaN
0 2020-7-18 01:04:00 NaN NaN NaN 7.0
1 2020-04-10 16:34:00 NaN NaN NaN 6.0
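If the repeated 0/1 index from concat is unwanted, ignore_index drops it; parsing and sorting by Timestamp is an optional extra (an assumption about the desired row order):
out = pd.concat([d.rename(renamer, axis=1) for d in dfs], ignore_index=True)
out['Timestamp'] = pd.to_datetime(out['Timestamp'])  # parse so sorting is chronological
out = out.sort_values('Timestamp')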
alternative
Assuming you have a single DataFrame as input:
A_Timestamp B_Values BC_Timestamp C_Values DA_Timestamp D_Values CA_Timestamp Q_Values
0 2020-11-08 11:15:00 1 2020-11-11 12:13:00 8 2020-1-13 14:47:00 3 2020-7-18 01:04:00 7
1 2020-11-10 15:34:00 2 2020-11-15 02:47:00 4 2020-11-9 5:34:00 5 2020-04-10 16:34:00 6
You can then reshape with a MultiIndex:
m = df.columns.str.endswith('Timestamp')
s = df.columns.to_series().mask(m)
out = (df
       .set_axis(pd.MultiIndex.from_arrays(
           [s.bfill(), s.fillna('Timestamp')]), axis=1)
       .T.stack().unstack(-2).droplevel(0)
       )
Output:
B_Values C_Values D_Values Q_Values Timestamp
0 1 NaN NaN NaN 2020-11-08 11:15:00
1 2 NaN NaN NaN 2020-11-10 15:34:00
0 NaN 8 NaN NaN 2020-11-11 12:13:00
1 NaN 4 NaN NaN 2020-11-15 02:47:00
0 NaN NaN 3 NaN 2020-1-13 14:47:00
1 NaN NaN 5 NaN 2020-11-9 5:34:00
0 NaN NaN NaN 7 2020-7-18 01:04:00
1 NaN NaN NaN 6 2020-04-10 16:34:00
Or, if order of the rows doesn't matter:
m = df.columns.str.endswith('Timestamp')
s = df.columns.to_series().mask(m)
(df.set_axis(pd.MultiIndex.from_arrays(
     [s.fillna('Timestamp'), s.bfill()]), axis=1)
   .stack()
 )

Convert list with multiple entries per day to standard datetime index and give each entry its own column

I have a file that looks like this:
Date | col1 | col2 | col3
2010-01-01 | -1.4 | 0.0 | 0.0
2010-01-01 | -1.4 | 0.0 | 0.0
2010-01-01 | -2.4 | 0.0 | 0.66
2010-01-02 | -2.4 | 0.0 | 0.08
2010-01-02 | -4.3 | 0.0 | 0.1
2010-01-02 | -4.3 | 0.0 | 1.04
Rows sharing a day each refer to a specific city, so for 2010-01-01 there is data for 3 cities, and the same for 2010-01-02 and all other days (it is always the same number of rows; at the moment 13 cities = 13 rows per day).
The city names are in a list whose order matches the order of the rows within each day:
["city1", "city2", "city3"]
So "city1" is the first row for each day, then "city2", then "city3" and so on.
I need to get this into a standard format where I can set the Date as the index, i.e. a format like this:
Date | city1_col1 | city1_col2 | city1_col3 | city2_col1| city2_col2 | city2_col3 | city3_col1| city3_col2 | city3_col3
2010-01-01 | -1.4 | 0.0 | 0.0 | -1.4 | 0.0 | 0.0 | -2.4 | 0.0 | 0.66
2010-01-02 | -2.4 | 0.0 | 0.08 | -4.3 | 0.0 | 0.1 | -4.3 | 0.0 | 1.04
The data is later merged with other dataframes where the indexes are also the days of the year so a multiindex won't work.
How can I achieve this with pandas?
Here's a way to do that:
df["city"] = cities * (len(df) // len(cities))
df = pd.pivot_table(df, index="Date", columns="city")
df.columns = [c[1] + "_" + c[0] for c in df.columns]
df=df.sort_index(axis=1)
The output is:
city1_col1 city1_col2 city1_col3 city2_col1 city2_col2 city2_col3 city3_col1 city3_col2 city3_col3
Date
2010-01-01 -1.4 0.0 0.00 -1.4 0.0 0.0 -2.4 0.0 0.66
2010-01-02 -2.4 0.0 0.08 -4.3 0.0 0.1 -4.3 0.0 1.04
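For reference, a self-contained sketch (reconstructing the sample frame from the question) that uses pivot instead of pivot_table; pivot raises on duplicate (Date, city) pairs rather than silently averaging them, which seems safer here:
import pandas as pd
df = pd.DataFrame({
    'Date': ['2010-01-01'] * 3 + ['2010-01-02'] * 3,
    'col1': [-1.4, -1.4, -2.4, -2.4, -4.3, -4.3],
    'col2': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    'col3': [0.0, 0.0, 0.66, 0.08, 0.1, 1.04]})
cities = ['city1', 'city2', 'city3']
df['city'] = cities * (len(df) // len(cities))
wide = df.pivot(index='Date', columns='city')   # columns become (col, city) pairs
wide.columns = [f'{city}_{col}' for col, city in wide.columns]
wide = wide.sort_index(axis=1)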

Maximum variation within one second for each row of a DataFrame

I'm having a calculation problem with pandas and I'd like to know if anyone could help me.
Having this df created using this code:
df = pd.DataFrame({'B': [0, 2, 1, np.nan, 4, 1, 3, 10, np.nan, 3, 6]},
index = [pd.Timestamp('20130101 09:31:23.999'),
pd.Timestamp('20130101 09:31:24.200'),
pd.Timestamp('20130101 09:31:24.250'),
pd.Timestamp('20130101 09:31:25.000'),
pd.Timestamp('20130101 09:31:25.375'),
pd.Timestamp('20130101 09:31:25.850'),
pd.Timestamp('20130101 09:31:26.100'),
pd.Timestamp('20130101 09:31:27.150'),
pd.Timestamp('20130101 09:31:28.050'),
pd.Timestamp('20130101 09:31:28.850'),
pd.Timestamp('20130101 09:31:29.200')])
df
| | B |
|-------------------------|------|
| 2013-01-01 09:31:23.999 | 0.0 |
| 2013-01-01 09:31:24.200 | 2.0 |
| 2013-01-01 09:31:24.250 | 1.0 |
| 2013-01-01 09:31:25.000 | NaN |
| 2013-01-01 09:31:25.375 | 4.0 |
| 2013-01-01 09:31:25.850 | 1.0 |
| 2013-01-01 09:31:26.100 | 3.0 |
| 2013-01-01 09:31:27.150 | 10.0 |
| 2013-01-01 09:31:28.050 | NaN |
| 2013-01-01 09:31:28.850 | 3.0 |
| 2013-01-01 09:31:29.200 | 6.0 |
I would like to calculate, for each row, the maximum variation of B during the following second.
For example, for the first row you have to look at how much B changes with respect to the second and third rows, which are the ones within the one-second interval, and take the difference against the maximum value.
In this case the maximum value is in the second row, at 09:31:24.200, so the maximum variation is 2 - 0 = 2.
Then, we will create a new column with all these maximum variations for each of the rows.
df
| | B | Maximum Variation |
|-------------------------|------|--------------------|
| 2013-01-01 09:31:23.999 | 0.0 | 2.0 |
| 2013-01-01 09:31:24.200 | 2.0 | 1.0 |
| 2013-01-01 09:31:24.250 | 1.0 | 0.0 |
| 2013-01-01 09:31:25.000 | NaN | 4.0 |
| 2013-01-01 09:31:25.375 | 4.0 | -3.0 |
| 2013-01-01 09:31:25.850 | 1.0 | 2.0 |
| 2013-01-01 09:31:26.100 | 3.0 | 0.0 |
| 2013-01-01 09:31:27.150 | 10.0 | 0.0 |
| 2013-01-01 09:31:28.050 | NaN | 3.0 |
| 2013-01-01 09:31:28.850 | 3.0 | 3.0 |
| 2013-01-01 09:31:29.200 | 6.0 | 0.0 |
I hope that's clear enough.
A solution has been found and shared in the answers, but an efficiency improvement that avoids looping over every row of the df would be more than welcome.
I've finally found the solution:
df = pd.DataFrame({'B': [0, 1, 2, 8, 6, 1, 3, 10, np.nan, 3, 6]},
index = [pd.Timestamp('20130101 09:31:23.999'),
pd.Timestamp('20130101 09:31:24.200'),
pd.Timestamp('20130101 09:31:24.250'),
pd.Timestamp('20130101 09:31:25.000'),
pd.Timestamp('20130101 09:31:25.375'),
pd.Timestamp('20130101 09:31:25.850'),
pd.Timestamp('20130101 09:31:26.100'),
pd.Timestamp('20130101 09:31:27.150'),
pd.Timestamp('20130101 09:31:28.050'),
pd.Timestamp('20130101 09:31:28.850'),
pd.Timestamp('20130101 09:31:29.200')])
df = df.reset_index()
df = df.rename(columns={"index": "start_date"})
df['duration_in_seconds'] = 1
df['end_date'] = df['start_date'] + pd.to_timedelta(df['duration_in_seconds'], unit='s')
df['max'] = np.nan
for index, row in df.iterrows():
    start = row['start_date']
    end = row['end_date']
    maxi = df[(df['start_date'] >= start) & (df['start_date'] <= end)]['B'].max()
    df.iloc[index, df.columns.get_loc('max')] = maxi
df['Maximum Variation'] = df['max'] - df['B']
df
| | start_date | B | duration_in_seconds | end_date | max | Maximum Variation |
|----|-------------------------|------|---------------------|-------------------------|------|-------------------|
| 0 | 2013-01-01 09:31:23.999 | 0.0 | 1 | 2013-01-01 09:31:24.999 | 2.0 | 2.0 |
| 1 | 2013-01-01 09:31:24.200 | 1.0 | 1 | 2013-01-01 09:31:25.200 | 8.0 | 7.0 |
| 2 | 2013-01-01 09:31:24.250 | 2.0 | 1 | 2013-01-01 09:31:25.250 | 8.0 | 6.0 |
| 3 | 2013-01-01 09:31:25.000 | 8.0 | 1 | 2013-01-01 09:31:26.000 | 8.0 | 0.0 |
| 4 | 2013-01-01 09:31:25.375 | 6.0 | 1 | 2013-01-01 09:31:26.375 | 6.0 | 0.0 |
| 5 | 2013-01-01 09:31:25.850 | 1.0 | 1 | 2013-01-01 09:31:26.850 | 3.0 | 2.0 |
| 6 | 2013-01-01 09:31:26.100 | 3.0 | 1 | 2013-01-01 09:31:27.100 | 3.0 | 0.0 |
| 7 | 2013-01-01 09:31:27.150 | 10.0 | 1 | 2013-01-01 09:31:28.150 | 10.0 | 0.0 |
| 8 | 2013-01-01 09:31:28.050 | NaN | 1 | 2013-01-01 09:31:29.050 | 3.0 | NaN |
| 9 | 2013-01-01 09:31:28.850 | 3.0 | 1 | 2013-01-01 09:31:29.850 | 6.0 | 3.0 |
| 10 | 2013-01-01 09:31:29.200 | 6.0 | 1 | 2013-01-01 09:31:30.200 | 6.0 | 0.0 |
More time-efficient solutions are still welcome.
More efficient solution
df = df.reset_index()
df = df.rename(columns={"index": "start_date"})
df['duration_in_seconds'] = 1
df['end_date'] = df['start_date'] + pd.to_timedelta(df['duration_in_seconds'], unit='s')
df['max'] = np.nan
df["max"] = df.apply(lambda row : df.loc[(df["start_date"] >= row['start_date']) & (df["start_date"] <=row['end_date'])]["B"].max(), axis = 1)
df['Maximum Variation'] = df['max'] - df['B']
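Since start_date is already sorted, a numpy.searchsorted sketch (my addition, not one of the posted answers) finds each window's end position once and only slices, instead of re-filtering the whole frame for every row:
import numpy as np
starts = df['start_date'].to_numpy()
# first position whose start_date falls outside each row's one-second window
stops = np.searchsorted(starts, df['end_date'].to_numpy(), side='right')
b = df['B'].to_numpy()
df['max'] = [pd.Series(b[i:j]).max() for i, j in enumerate(stops)]  # Series.max skips NaN
df['Maximum Variation'] = df['max'] - df['B']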
import numpy as np
import pandas as pd
df = pd.DataFrame({'B': [0, 2, 1, np.nan, 4, 1, 3, 10, np.nan, 3, 6]},
index = [pd.Timestamp('20130101 09:31:23.999'),
pd.Timestamp('20130101 09:31:24.200'),
pd.Timestamp('20130101 09:31:24.250'),
pd.Timestamp('20130101 09:31:25.000'),
pd.Timestamp('20130101 09:31:25.375'),
pd.Timestamp('20130101 09:31:25.850'),
pd.Timestamp('20130101 09:31:26.100'),
pd.Timestamp('20130101 09:31:27.150'),
pd.Timestamp('20130101 09:31:28.050'),
pd.Timestamp('20130101 09:31:28.850'),
pd.Timestamp('20130101 09:31:29.200')])
print(df)
B
2013-01-01 09:31:23.999 0.0
2013-01-01 09:31:24.200 2.0
2013-01-01 09:31:24.250 1.0
2013-01-01 09:31:25.000 NaN
2013-01-01 09:31:25.375 4.0
2013-01-01 09:31:25.850 1.0
2013-01-01 09:31:26.100 3.0
2013-01-01 09:31:27.150 10.0
2013-01-01 09:31:28.050 NaN
2013-01-01 09:31:28.850 3.0
2013-01-01 09:31:29.200 6.0
df_min = df.resample('1S').min()
print(df_min)
B
2013-01-01 09:31:23 0.0
2013-01-01 09:31:24 1.0
2013-01-01 09:31:25 1.0
2013-01-01 09:31:26 3.0
2013-01-01 09:31:27 10.0
2013-01-01 09:31:28 3.0
2013-01-01 09:31:29 6.0
df_max = df.resample('1S').max()
print(df_max)
B
2013-01-01 09:31:23 0.0
2013-01-01 09:31:24 2.0
2013-01-01 09:31:25 4.0
2013-01-01 09:31:26 3.0
2013-01-01 09:31:27 10.0
2013-01-01 09:31:28 3.0
2013-01-01 09:31:29 6.0
df_diff = df_max - df_min
print(df_diff)
B
2013-01-01 09:31:23 0.0
2013-01-01 09:31:24 1.0
2013-01-01 09:31:25 3.0
2013-01-01 09:31:26 0.0
2013-01-01 09:31:27 0.0
2013-01-01 09:31:28 0.0
2013-01-01 09:31:29 0.0
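Note that resample bins by calendar second rather than looking one second ahead of each row, so this measures the spread within each bin, which is close to, but not exactly, the per-row definition above. The three resample calls can also be collapsed into one aggregation (a small rewrite of the same idea):
spread = df['B'].resample('1S').agg(lambda s: s.max() - s.min())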

How to unpivot pandas dataframe to create datetime column

I have a pivoted pandas dataframe that looks like the one below.
I need to unpivot it into a dataframe indexed by datetime, and the variables (columns) reduced to only one of each.
I tried using melt but I am struggling to reshape it because of the hour row.
What would be the best option to reshape such a dataframe?
The dataframe I have
+----------+------+------+------+------+------+
| nan | var1 | var1 | var2 | var2 | var3 |
+----------+------+------+------+------+------+
| Hour | 2 | 3 | 0 | 2 | 0 |
| 1/1/2019 | 0.8 | 0.4 | 0.6 | 0.9 | 0.7 |
| 1/2/2019 | 0.2 | 0.2 | 0.7 | 0.3 | 0.1 |
| 1/3/2019 | 0.1 | 0.0 | 0.3 | 0.4 | 1.0 |
+----------+------+------+------+------+------+
The dataframe I need to get
+---------------+------+------+------+
| Datetime | var1 | var2 | var3 |
+---------------+------+------+------+
| 1/1/2019 0:00 | NaN | 0.6 | 0.7 |
| 1/1/2019 1:00 | NaN | NaN | NaN |
| 1/1/2019 2:00 | 0.8 | 0.9 | NaN |
| 1/1/2019 3:00 | 0.4 | NaN | NaN |
| 1/2/2019 0:00 | NaN | 0.7 | 0.1 |
| 1/2/2019 1:00 | NaN | NaN | NaN |
| 1/2/2019 2:00 | 0.2 | 0.3 | NaN |
| 1/2/2019 3:00 | 0.2 | NaN | NaN |
| 1/3/2019 0:00 | NaN | 0.3 | 1.0 |
| 1/3/2019 1:00 | NaN | NaN | NaN |
| 1/3/2019 2:00 | 0.1 | 0.4 | NaN |
| 1/3/2019 3:00 | 0.0 | NaN | NaN |
+---------------+------+------+------+
Here's a rough, unidiomatic pandas answer, but it gets the job done for the data in the format you presented. If you have a massive amount of data I highly recommend finding a more optimized way.
dff = df.copy()
mn, mx = df.loc['Hour'].agg([min, max]).astype(int)
df = df.loc[df.index.repeat(mx - mn + 1)]
df = df.loc[df.index != 'Hour']
# one run of hours per date row, so the multiplier is the number of dates
df = df.assign(time=list(range(mn, mx + 1)) * (len(df) // (mx - mn + 1)))
df = df.set_index('time', append=True).iloc[:, :0]
for i, v in enumerate(dff.columns):
    d = dff.iloc[:, i].to_frame()
    hour = int(d.at['Hour', v])
    for idx, row in d.iloc[1:].iterrows():
        df.loc[(idx, hour), v] = row[v]
df = df.reset_index().rename(columns={'level_0': 'date'})
df['datetime'] = df[['date', 'time']].apply(lambda x: f"{x['date']} {x['time']}:00", axis=1)
df = df.drop(columns=['date', 'time']).set_index('datetime').reset_index()
print(df)
datetime var1 var2 var3
0 1/1/2019 0:00 NaN 0.6 0.7
1 1/1/2019 1:00 NaN NaN NaN
2 1/1/2019 2:00 0.8 0.9 NaN
3 1/1/2019 3:00 0.4 NaN NaN
4 1/2/2019 0:00 NaN 0.7 0.1
5 1/2/2019 1:00 NaN NaN NaN
6 1/2/2019 2:00 0.2 0.3 NaN
7 1/2/2019 3:00 0.2 NaN NaN
8 1/3/2019 0:00 NaN 0.3 1.0
9 1/3/2019 1:00 NaN NaN NaN
10 1/3/2019 2:00 0.1 0.4 NaN
11 1/3/2019 3:00 0.0 NaN NaN
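A more idiomatic alternative (my sketch, not part of the posted answer) moves the Hour row into a second column-header level and stacks it out; the frame construction below is an assumption reconstructed from the question's sample:
import pandas as pd

# hypothetical reconstruction of the question's input
df = pd.DataFrame(
    [[2, 3, 0, 2, 0],
     [0.8, 0.4, 0.6, 0.9, 0.7],
     [0.2, 0.2, 0.7, 0.3, 0.1],
     [0.1, 0.0, 0.3, 0.4, 1.0]],
    index=['Hour', '1/1/2019', '1/2/2019', '1/3/2019'],
    columns=['var1', 'var1', 'var2', 'var2', 'var3'])

# move the 'Hour' row into the header as a second level, then stack it out
cols = pd.MultiIndex.from_arrays([df.columns, df.loc['Hour'].astype(int)],
                                 names=[None, 'hour'])
out = df.drop('Hour').set_axis(cols, axis=1).stack('hour')

# reindex so every date carries the full hour range (adds the all-NaN hour 1)
hours = range(int(cols.get_level_values('hour').min()),
              int(cols.get_level_values('hour').max()) + 1)
out = out.reindex(pd.MultiIndex.from_product([df.index.drop('Hour'), hours],
                                             names=['date', 'hour']))
out.index = (pd.to_datetime(out.index.get_level_values('date'))
             + pd.to_timedelta(out.index.get_level_values('hour'), unit='h'))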

Pandas row-wise aggregation with multi-index

I have a pandas dataframe with three levels of row indexing. The last level is a datetime index. There are NaN values and I am trying to fill them with the average of each row at the datetime level. How can I go about doing this?
data_df
Level 0 | Level 1 | Level 2 | a | b | c
A | 123 | 2019-01-28 17:00:00 | 3 | 1 | nan
  |     | 2019-01-28 18:00:00 | 2 | nan | 1
  |     | 2019-01-28 19:00:00 | nan | nan | 5
  | 234 | 2019-01-28 05:00:00 | 1 | 1 | 3
  |     | 2019-01-28 06:00:00 | nan | nan | nan
Some rows may be all NaN values; in this case I want to fill the row with 0's. Other rows may have all values filled in, so imputing with the average isn't needed.
I want the following result:
Level 0 | Level 1 | Level 2 | a | b | c
A | 123 | 2019-01-28 17:00:00 | 3 | 1 | 2
  |     | 2019-01-28 18:00:00 | 2 | 1.5 | 1
  |     | 2019-01-28 19:00:00 | 5 | 5 | 5
  | 234 | 2019-01-28 05:00:00 | 1 | 1 | 3
  |     | 2019-01-28 06:00:00 | 0 | 0 | 0
Use DataFrame.mask with the row-wise mean, and then convert the remaining all-NaN rows with DataFrame.fillna:
df = df.mask(df.isna(), df.mean(axis=1), axis=0).fillna(0)
print (df)
a b c
Level 0 Level 1 Level 2
A 123 2019-01-28 17:00:00 3.0 1.0 2.0
2019-01-28 18:00:00 2.0 1.5 1.0
2019-01-28 19:00:00 5.0 5.0 5.0
234 2019-01-28 05:00:00 1.0 1.0 3.0
2019-01-28 06:00:00 0.0 0.0 0.0
Another solution is to use DataFrame.fillna for the replacement, but because df.fillna(df.mean(axis=1), axis=1) is not implemented, a double transpose is necessary:
df = df.T.fillna(df.mean(axis=1)).fillna(0).T
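A minimal, self-contained reproduction (the index tuples below are an assumption based on the question's sample):
import numpy as np
import pandas as pd

idx = pd.MultiIndex.from_tuples(
    [('A', 123, '2019-01-28 17:00:00'),
     ('A', 123, '2019-01-28 18:00:00'),
     ('A', 123, '2019-01-28 19:00:00'),
     ('A', 234, '2019-01-28 05:00:00'),
     ('A', 234, '2019-01-28 06:00:00')],
    names=['Level 0', 'Level 1', 'Level 2'])
df = pd.DataFrame({'a': [3, 2, np.nan, 1, np.nan],
                   'b': [1, np.nan, np.nan, 1, np.nan],
                   'c': [np.nan, 1, 5, 3, np.nan]}, index=idx)

# fill each NaN with its row mean, then zero out rows that were entirely NaN
df = df.mask(df.isna(), df.mean(axis=1), axis=0).fillna(0)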
