I want to resample the data in the SMS, CALL and INTERNET columns by replacing the values with their mean for every hour.
Code 1 tried:
df1.reset_index().set_index('TIME').resample('1H').mean()
error: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'Index'
Code 2 tried:
df1['TIME'] = pd.to_datetime(data['TIME'])
df1.CALL.resample('60min', how='mean')
error: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'RangeIndex'
Dataframe:
ID TIME SMS CALL INTERNET
0 1 2013-11-30 23:00:00 0.277204 0.273629 13.674575
1 1 2013-11-30 23:10:00 0.341536 0.058176 13.330858
2 1 2013-11-30 23:20:00 0.379427 0.054601 11.329552
3 1 2013-11-30 23:30:00 0.600781 0.218489 13.166163
4 1 2013-11-30 23:40:00 0.405565 0.134176 13.347791
5 1 2013-11-30 23:50:00 0.187700 0.080738 12.434744
6 1 2013-12-01 00:00:00 0.282651 0.135964 13.860353
7 1 2013-12-01 00:10:00 0.109826 0.056388 12.583463
8 1 2013-12-01 00:20:00 0.348638 0.053438 12.644995
9 1 2013-12-01 00:30:00 0.138375 0.054062 12.251733
10 1 2013-12-01 00:40:00 0.054062 0.163803 11.292642
df1.dtypes
ID int64
TIME object
SMS float64
CALL float64
INTERNET float64
dtype: object
You can use the on parameter of resample:
on : string, optional
For a DataFrame, column to use instead of index for resampling. Column must be datetime-like.
New in version 0.19.0.
df1['TIME'] = pd.to_datetime(df1['TIME'])
df = df1.resample('60min', on='TIME').mean()
print (df)
ID SMS CALL INTERNET
TIME
2013-11-30 23:00:00 1 0.365369 0.136635 12.880614
2013-12-01 00:00:00 1 0.186710 0.092731 12.526637
Or use set_index to create a DatetimeIndex before resampling:
df1['TIME'] = pd.to_datetime(df1['TIME'])
df = df1.set_index('TIME').resample('60min').mean()
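To make the comparison concrete, here is a minimal sketch that runs both variants on a few rows copied from the question and checks that they agree. Only the ID, TIME and SMS columns are kept to keep it short; the variable names by_column and by_index are just illustrative.

```python
import pandas as pd

# A few rows copied from the question; TIME starts out as plain strings (dtype object).
df1 = pd.DataFrame({
    "ID": [1, 1, 1, 1],
    "TIME": ["2013-11-30 23:40:00", "2013-11-30 23:50:00",
             "2013-12-01 00:00:00", "2013-12-01 00:10:00"],
    "SMS": [0.405565, 0.187700, 0.282651, 0.109826],
})

# Resampling needs real datetimes, so convert the string column first.
df1["TIME"] = pd.to_datetime(df1["TIME"])

# Variant 1: resample on the TIME column directly.
by_column = df1.resample("60min", on="TIME").mean()

# Variant 2: move TIME into the index, then resample.
by_index = df1.set_index("TIME").resample("60min").mean()

print(by_column)
print(by_column.equals(by_index))
```

Both variants fail with the error from the question if the pd.to_datetime step is skipped, because TIME would still hold strings.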
I have the following dataframe, where date is set as the index column:
date        renormalized
2017-01-01  6
2017-01-08  5
2017-01-15  3
2017-01-22  3
2017-01-29  3
I want to append 00:00:00 to each datetime in the index column, so that it looks like this:
date                 renormalized
2017-01-01 00:00:00  6
2017-01-08 00:00:00  5
2017-01-15 00:00:00  3
2017-01-22 00:00:00  3
2017-01-29 00:00:00  3
I'm stuck and can't find a way to make this happen. It would be great if anyone could help.
Thanks
AL
When the time is 00:00:00 for every entry, pandas doesn't show it by default (each entry is still a Timestamp, so it does carry the time). Your data is probably already normalized to midnight, and you can perform timedelta operations as usual.
You can inspect a single entry with df.index[0], for instance, or look at all the times with df.index.time.
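A short sketch of that check, using three of the dates from the question (the variable names are only for illustration):

```python
import pandas as pd

# Dates and values taken from the question.
df = pd.DataFrame(
    {"renormalized": [6, 5, 3]},
    index=pd.to_datetime(["2017-01-01", "2017-01-08", "2017-01-15"]),
)
df.index.name = "date"

first = df.index[0]      # a pandas Timestamp, time component included
times = df.index.time    # array of datetime.time objects, all midnight here

print(repr(first))
print(times)

# Timedelta arithmetic works without reformatting the index.
shifted = df.index + pd.Timedelta(hours=6)
print(shifted[0])
```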
You can use DatetimeIndex.strftime (note that this turns the index into strings):
df.index = pd.to_datetime(df.index).strftime('%Y-%m-%d %H:%M:%S')
print(df)
renormalized
date
2017-01-01 00:00:00 6
2017-01-08 00:00:00 5
2017-01-15 00:00:00 3
2017-01-22 00:00:00 3
2017-01-29 00:00:00 3
Or, if the index already holds strings rather than datetimes, you can append the time directly:
df.index = df.index + ' 00:00:00'
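One thing worth checking before formatting the index is what type comes back. The sketch below, using dates from the question, shows that strftime yields strings and that pd.to_datetime can parse them again if datetime features such as resampling are needed later.

```python
import pandas as pd

idx = pd.to_datetime(["2017-01-01", "2017-01-08", "2017-01-15"])

# strftime returns an index of strings, not a DatetimeIndex.
as_text = idx.strftime("%Y-%m-%d %H:%M:%S")
print(as_text[0])
print(isinstance(as_text, pd.DatetimeIndex))

# Parsing the strings again restores a DatetimeIndex.
restored = pd.to_datetime(as_text)
print(isinstance(restored, pd.DatetimeIndex))
```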
I'm doing some resampling on data and I was wondering why resampling 1min data to 5min data creates MORE time intervals than my original dataset?
Also, why does it resample until 2018-12-11 (11 days longer than the original dataset)?
1-min data:
Result of resampling to 5-min intervals:
This is how I do the resampling:
df1.loc[:,'qKfz_gesamt'].resample('5min').mean()
I was wondering why resampling 1min data to 5min data creates MORE time intervals than my original dataset?
The problem is that the original data has gaps: resample creates consecutive 5-minute intervals over the whole range, and intervals that contain no values are filled with NaN:
df1 = pd.DataFrame({'qKfz_gesamt': range(4)},
index=pd.to_datetime(['2018-11-25 00:00:00','2018-11-25 00:01:00',
'2018-11-25 00:02:00','2018-11-25 00:15:00']))
print (df1)
qKfz_gesamt
2018-11-25 00:00:00 0
2018-11-25 00:01:00 1
2018-11-25 00:02:00 2
2018-11-25 00:15:00 3
print (df1['qKfz_gesamt'].resample('5min').mean())
2018-11-25 00:00:00 1.0
2018-11-25 00:05:00 NaN
2018-11-25 00:10:00 NaN
2018-11-25 00:15:00 3.0
Freq: 5T, Name: qKfz_gesamt, dtype: float64
print (df1['qKfz_gesamt'].resample('5min').mean().dropna())
2018-11-25 00:00:00 1.0
2018-11-25 00:15:00 3.0
Name: qKfz_gesamt, dtype: float64
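As an alternative to resample followed by dropna, a sketch using groupby on floored timestamps is shown below. This is a different technique from the one in the answer above: groupby only produces groups that occur in the data, so no empty bins appear in the first place.

```python
import pandas as pd

df1 = pd.DataFrame(
    {"qKfz_gesamt": range(4)},
    index=pd.to_datetime(["2018-11-25 00:00:00", "2018-11-25 00:01:00",
                          "2018-11-25 00:02:00", "2018-11-25 00:15:00"]),
)

# Round every timestamp down to the start of its 5-minute bin and group on that.
bins = df1.index.floor("5min")
means = df1["qKfz_gesamt"].groupby(bins).mean()
print(means)
```

The result has no fixed frequency, so use resample instead when a regular grid (with NaN for the gaps) is what you need.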
why does it resample until 2018-12-11 (11 days longer!) than the original dataset?
You need to filter by the maximum index value you want to keep:
rng = pd.date_range('2018-11-25', periods=10)
df1 = pd.DataFrame({'a': range(10)}, index=rng)
print (df1)
a
2018-11-25 0
2018-11-26 1
2018-11-27 2
2018-11-28 3
2018-11-29 4
2018-11-30 5
2018-12-01 6
2018-12-02 7
2018-12-03 8
2018-12-04 9
df1 = df1.loc[:'2018-11-30']
print (df1)
a
2018-11-25 0
2018-11-26 1
2018-11-27 2
2018-11-28 3
2018-11-29 4
2018-11-30 5
Or:
df1 = df1.loc[df1.index <= '2018-11-30']
print (df1)
a
2018-11-25 0
2018-11-26 1
2018-11-27 2
2018-11-28 3
2018-11-29 4
2018-11-30 5
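Since resample spans from the first to the last timestamp in the index, one way to find out why the output runs too long is to look at both ends of the index and list any rows past the expected end. The data and the cutoff below are hypothetical, chosen only to mimic the situation in the question.

```python
import pandas as pd

# Hypothetical data: three 1-minute readings plus one stray timestamp days later.
df1 = pd.DataFrame(
    {"a": range(4)},
    index=pd.to_datetime(["2018-11-25 00:00", "2018-11-25 00:01",
                          "2018-11-25 00:02", "2018-12-11 00:00"]),
)

# Check both ends of the index.
print(df1.index.min(), df1.index.max())

# List the rows that fall after the expected end of the data.
cutoff = pd.Timestamp("2018-11-30 23:59:59")  # assumed end of the real data
late_rows = df1.loc[df1.index > cutoff]
print(late_rows)

# Drop them before resampling so the output stops where the data does.
trimmed = df1.loc[df1.index <= cutoff]
resampled = trimmed["a"].resample("5min").mean()
print(resampled.index.max())
```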
I'm trying to upsample my data from daily to hourly frequency and forward fill missing data.
I start with the following code:
df1 = pd.read_csv("DATA.csv")
df1.head(5)
I then used the following to convert to a datetime string and set the date/time as an index:
df1['DT'] = pd.to_datetime(df1['DT']).dt.strftime('%Y-%m-%d %H:%M:%S')
df1.set_index('DT')
I try to resample hourly as follows:
df1['DT'] = df1.resample('H').ffill()
But I get the following error:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or
PeriodIndex, but got an instance of 'RangeIndex'
I thought my dtype was already date time as instructed by the pd.to_datetime code above. Nothing I try seems to be working. Can anyone please help me?
My expected output is as follows:
DT VALUE
2016-08-01 00:00:00 0.000000
2016-08-01 01:00:00 0.000000
2016-08-01 02:00:00 0.000000
etc.
The file itself has approximately 1000 rows. The first 50 rows or so are zero so to clarify where there's actual data:
DT VALUE
2018-12-13 00:00:00 24000.000000
2018-12-13 01:00:00 24000.000000
2018-12-13 02:00:00 24000.000000
...
2018-12-13 23:00:00 24000.000000
2018-12-14 00:00:00 26000.000000
2018-12-14 01:00:00 26000.000000
etc.
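set_index returns a new DataFrame and leaves the original unchanged, so assign the result back:
df1=df1.set_index('DT')
Or modify the frame in place:
df1.set_index('DT',inplace=True)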
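A small sketch that shows the difference, using two made-up rows in the question's DT/VALUE layout:

```python
import pandas as pd

df1 = pd.DataFrame({
    "DT": pd.to_datetime(["2016-08-01", "2016-08-02"]),
    "VALUE": [0.0, 0.0],
})

df1.set_index("DT")                  # the returned frame is discarded
before = type(df1.index).__name__    # df1 still has its default index
print(before)

df1 = df1.set_index("DT")            # keep the returned frame
after = type(df1.index).__name__
print(after)
```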
Assuming the first rows of your dataset look like this, as you described:
DT VALUE
0 2016-08-01 0
1 2016-08-02 0
2 2016-08-03 0
3 2016-08-04 0
4 2016-08-05 0
5 2016-08-06 0
6 2016-08-07 0
7 2016-08-08 0
8 2016-08-09 0
Then set DT as the index (convert it with pd.to_datetime first if it holds strings):
df = df.set_index('DT')
df
Output:
VALUE
DT
2016-08-01 0
2016-08-02 0
2016-08-03 0
2016-08-04 0
2016-08-05 0
2016-08-06 0
2016-08-07 0
2016-08-08 0
2016-08-09 0
Now resample the dataframe:
df = df.resample('H').ffill()
df
Output (first rows only):
VALUE
DT
2016-08-01 00:00:00 0
2016-08-01 01:00:00 0
2016-08-01 02:00:00 0
2016-08-01 03:00:00 0
2016-08-01 04:00:00 0
2016-08-01 05:00:00 0
2016-08-01 06:00:00 0
2016-08-01 07:00:00 0
2016-08-01 08:00:00 0
2016-08-01 09:00:00 0
2016-08-01 10:00:00 0
You could convert the index to a pd.DatetimeIndex and then resample that. I also don't think you need (or want) the strftime() call:
df1 = pd.read_csv("DATA.csv")
df1['DT'] = pd.to_datetime(df1['DT'])
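df1 = df1.set_index('DT')  # set_index returns a new frame, so assign it back
df1.index = pd.DatetimeIndex(df1.index)  # redundant if the index is already a DatetimeIndex
df1 = df1.resample('H').ffill()  # the result is a whole DataFrame, not a single column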
NOTE: You could probably combine a bunch of this and it would still be quite clear, like:
df1 = pd.read_csv("DATA.csv")
df1.index = pd.DatetimeIndex(pd.to_datetime(df1['DT']))
df1 = df1.drop(columns='DT').resample('H').ffill()  # DT now lives in the index
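For a version that can be run without the file, here is a sketch that reads a two-row stand-in for DATA.csv from a string. The column names follow the question; the rows are made up, and parse_dates/index_col are used so the DatetimeIndex is built while reading.

```python
import io
import pandas as pd

# Stand-in for DATA.csv (hypothetical contents).
csv_text = "DT,VALUE\n2018-12-13,24000\n2018-12-14,26000\n"

df1 = pd.read_csv(io.StringIO(csv_text), parse_dates=["DT"], index_col="DT")

# Upsample daily rows to hourly ones; ffill copies each day's value forward.
# "60min" is used here as the hourly frequency string.
hourly = df1.resample("60min").ffill()

print(hourly.head(3))
print(len(hourly))
```

Note that the resampled range ends at the last timestamp in the data, so the final day contributes only its midnight row.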
I'm trying to run fillna on a column of type datetime64[ns]. When I run something like:
df['date'].fillna(datetime("2000-01-01"))
I get:
TypeError: an integer is required
Any way around this?
This should work in 0.12 and 0.13 (just released).
@DSM points out that datetimes are constructed like: datetime.datetime(2012,1,1)
So the error comes from failing to construct the value that you are passing to fillna.
Note that using a Timestamp WILL parse the string.
In [3]: s = Series(date_range('20130101',periods=10))
In [4]: s.iloc[3] = pd.NaT
In [5]: s.iloc[7] = pd.NaT
In [6]: s
Out[6]:
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 NaT
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 NaT
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00
dtype: datetime64[ns]
datetime.datetime will work as well
In [7]: s.fillna(Timestamp('20120101'))
Out[7]:
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 2012-01-01 00:00:00
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 2012-01-01 00:00:00
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00
dtype: datetime64[ns]
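To tie this back to the question, the sketch below reproduces the failing constructor call and then fills with both kinds of value. The series is a shortened version of the one in the session above.

```python
import datetime
import pandas as pd

s = pd.Series(pd.date_range("2013-01-01", periods=4))
s.iloc[1] = pd.NaT

# The standard-library constructor takes integers, so a string argument fails.
try:
    datetime.datetime("2000-01-01")
except TypeError as exc:
    print("TypeError:", exc)

# Both of these produce a valid fill value.
filled_dt = s.fillna(datetime.datetime(2000, 1, 1))
filled_ts = s.fillna(pd.Timestamp("2000-01-01"))

print(filled_ts)
```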
Right now, df['date'].fillna(pd.Timestamp("20210730")) works in pandas 1.3.1
This example works with dynamic data, if you want to replace NaT values in one column with values from another datetime column.
df['column_with_NaT'] = df['column_with_NaT'].fillna(df['dt_column_with_thesame_index'])
This worked for me when I had updated some rows in a datetime column: the rows that were not updated held NaT, and I needed them to inherit the old series data. The code above solved my problem.
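A runnable sketch of the same idea follows. The column names updated and previous are hypothetical; the point is that fillna aligns the two series on their shared index labels.

```python
import pandas as pd

# Hypothetical column names; both columns share the same index.
df = pd.DataFrame({
    "updated": pd.to_datetime(["2021-07-30", None, "2021-08-02"]),
    "previous": pd.to_datetime(["2021-07-01", "2021-07-02", "2021-07-03"]),
})

# Where "updated" is NaT, take the value from "previous" at the same index label.
df["updated"] = df["updated"].fillna(df["previous"])
print(df)
```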
How do I convert a column of datetime64 objects to strings that would read 01-11-2013 for today's date of November 1?
I have tried
df['DateStr'] = df['DateObj'].strftime('%d%m%Y')
but I get this error
AttributeError: 'Series' object has no attribute 'strftime'
As of pandas 0.17.0, you can format with the dt accessor:
df['DateStr'] = df['DateObj'].dt.strftime('%d%m%Y')
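Note that '%d%m%Y' produces 01112013 without separators. If the dashed form from the question (01-11-2013) is wanted, a sketch with dashes in the format string would be:

```python
import pandas as pd

df = pd.DataFrame({"DateObj": pd.to_datetime(["2013-11-01", "2013-11-02"])})

# %d-%m-%Y puts dashes between day, month and year, as in the question's example.
df["DateStr"] = df["DateObj"].dt.strftime("%d-%m-%Y")
print(df["DateStr"].tolist())
```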
In [6]: df = DataFrame(dict(A = date_range('20130101',periods=10)))
In [7]: df
Out[7]:
A
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 2013-01-04 00:00:00
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 2013-01-08 00:00:00
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00
In [8]: df['A'].apply(lambda x: x.strftime('%d%m%Y'))
Out[8]:
0 01012013
1 02012013
2 03012013
3 04012013
4 05012013
5 06012013
6 07012013
7 08012013
8 09012013
9 10012013
Name: A, dtype: object
It works directly if you first set the column as the index: strftime is then called on a DatetimeIndex rather than a Series.
df = df.set_index('DateObj').copy()
df['DateStr'] = df.index.strftime('%d%m%Y')