I have the following dataframe, where date was set as the index column:

            renormalized
date
2017-01-01             6
2017-01-08             5
2017-01-15             3
2017-01-22             3
2017-01-29             3
I want to append 00:00:00 to each datetime in the index, so it looks like:

                     renormalized
date
2017-01-01 00:00:00             6
2017-01-08 00:00:00             5
2017-01-15 00:00:00             3
2017-01-22 00:00:00             3
2017-01-29 00:00:00             3
I am stuck and cannot find a solution to make this happen. It would be great if anyone could help.
Thanks
AL
When the time is 0 for all entries, pandas doesn't show it by default (although each entry is a Timestamp, so it does have the time!). Your data is probably already normalized, and you can perform time-delta operations on it as usual.
You can see a target observation with df.index[0] for instance, or take a look at all the times with df.index.time.
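For example, a quick check might look like this (a minimal sketch, rebuilding the question's frame by hand):

import pandas as pd

df = pd.DataFrame({'renormalized': [6, 5, 3, 3, 3]},
                  index=pd.to_datetime(['2017-01-01', '2017-01-08', '2017-01-15',
                                        '2017-01-22', '2017-01-29']))
df.index.name = 'date'

print(df.index[0])    # Timestamp('2017-01-01 00:00:00') -- the time is already there
print(df.index.time)  # array of datetime.time(0, 0) objects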
You can use DatetimeIndex.strftime
df.index = pd.to_datetime(df.index).strftime('%Y-%m-%d %H:%M:%S')
print(df)
renormalized
date
2017-01-01 00:00:00 6
2017-01-08 00:00:00 5
2017-01-15 00:00:00 3
2017-01-22 00:00:00 3
2017-01-29 00:00:00 3
Or, if the index already holds plain strings, you can concatenate directly:
df.index = df.index + ' 00:00:00'
Note that both options leave you with an index of strings rather than datetimes.
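A minimal sketch of the concatenation variant, assuming the index was read in as strings (for example straight from a CSV):

import pandas as pd

df = pd.DataFrame({'renormalized': [6, 5, 3, 3, 3]},
                  index=['2017-01-01', '2017-01-08', '2017-01-15',
                         '2017-01-22', '2017-01-29'])
df.index.name = 'date'

df.index = df.index + ' 00:00:00'  # element-wise string concatenation
print(df.index[0])  # '2017-01-01 00:00:00'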
So I have a dataset where every value has a specific date attached. I want to fill these values, according to their dates, into an Excel sheet that contains the date range of the whole year: the dates start at 01-01-2020 00:00:00 and end at 31-12-2020 23:45:00 with a frequency of 15 minutes, so there will be a total of 35136 date-time values in Excel (2020 is a leap year).
My data looks like this:
load date
12 01-02-2020 06:30:00
21 29-04-2020 03:45:00
23 02-07-2020 12:15:00
54 07-08-2020 16:00:00
23 22-09-2020 16:30:00
As you can see, these values are not continuous, but each has a specific date attached. I want to use these dates as the index, place each value at its particular date in the Excel sheet's date column, and put zero for the missing values. Can someone please help?
Use DataFrame.reindex with date_range, which inserts 0 for all datetimes that do not exist in the original data:
rng = pd.date_range('2020-01-01','2020-12-31 23:45:00', freq='15Min')
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date').reindex(rng, fill_value=0)
print (df)
load
2020-01-01 00:00:00 0
2020-01-01 00:15:00 0
2020-01-01 00:30:00 0
2020-01-01 00:45:00 0
2020-01-01 01:00:00 0
...
2020-12-31 22:45:00 0
2020-12-31 23:00:00 0
2020-12-31 23:15:00 0
2020-12-31 23:30:00 0
2020-12-31 23:45:00 0
[35136 rows x 1 columns]
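Since the target is an Excel sheet, a possible final step is writing the reindexed frame out (a sketch; the file name is made up for illustration, and an Excel engine such as openpyxl must be installed):

df.index.name = 'date'
df.to_excel('load_2020.xlsx')  # one row per 15-minute slot, zeros where no data existed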
I have a DataFrame with a DatetimeIndex, and I want to create a new column that is an aggregation of another column, aggregated by the Datetime at a slower frequency. For example, hourly values and the daily mean of the day they're a part of:
[index]              A  A_daily_mean
2018-08-01 00:00:00  6           7.5
2018-08-01 01:00:00  7           7.5
2018-08-01 02:00:00  8           7.5
2018-08-01 03:00:00  9           7.5
Is there a one-liner for this?
This is super easy for other groupby aggregations (on non-datetimes):
df['groupby_mean'] = df.groupby([col1, col2]).mean()
but grouping on the date of the DatetimeIndex fails miserably:
df['mean_of_resampled'] = df.groupby(df.index.date).mean()
or alternatively
df['mean_of_resampled'] = df.resample('1d').mean()
which both give:
[index]              A  mean_of_resampled
2018-08-01 00:00:00  6                7.5
2018-08-01 01:00:00  7                NaN
2018-08-01 02:00:00  8                NaN
2018-08-01 03:00:00  9                NaN
I know I can add back the values by doing a merge or join, but I'm wondering if I'm missing some better, happier way.
I think you are looking for transform(), e.g.:
In []:
df['A_daily_mean'] = df.groupby(df.index.date)['A'].transform('mean')
df
Out[]:
A A_daily_mean
2018-08-01 00:00:00 6 7.5
2018-08-01 01:00:00 7 7.5
2018-08-01 02:00:00 8 7.5
2018-08-01 03:00:00 9 7.5
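An equivalent one-liner that stays on the datetime index is grouping with a daily pd.Grouper (a sketch of an alternative, not part of the original answer):

df['A_daily_mean'] = df.groupby(pd.Grouper(freq='D'))['A'].transform('mean')

This avoids materialising datetime.date objects for the group keys.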
I have a pandas dataframe time column like the following:
segments_data['time']
Out[1585]:
0 04:50:00
1 04:50:00
2 05:00:00
3 05:12:00
4 06:04:00
5 06:44:00
6 06:44:00
7 06:47:00
8 06:47:00
9 06:47:00
I want to add 5 hours and 30 mins to above time column.
I am doing the following in Python:
pd.DatetimeIndex(segments_data['time']) + pd.DateOffset(hours=5,minutes=30)
But it gives me an error.
TypeError: object of type 'datetime.time' has no len()
Please help.
As of pandas 0.25.3 this is as simple as:
df[column] = df[column] + pd.Timedelta(hours=1)
Note this assumes the column already has a datetime64 or timedelta64 dtype; it will not work on raw datetime.time objects like those in the question.
You can try importing timedelta:
from datetime import datetime, timedelta
and then, converting the time objects to strings first so they can be parsed:
segments_data['time'] = pd.DatetimeIndex(segments_data['time'].astype(str)) + timedelta(hours=5, minutes=30)
Pandas does not support vectorised operations with datetime.time objects. For efficient, vectorised operations, there is no requirement to use the datetime module from the standard library.
You have a couple of options to vectorise your calculation: either a Pandas timedelta series, if your times represent durations, or a Pandas datetime series, if your times represent specific points in time.
The choice depends entirely on what your data represents.
timedelta series
df['time'] = pd.to_timedelta(df['time'].astype(str)) + pd.to_timedelta('05:30:00')
print(df['time'].head())
0 10:20:00
1 10:20:00
2 10:30:00
3 10:42:00
4 11:34:00
Name: 1, dtype: timedelta64[ns]
datetime series
df['time'] = pd.to_datetime(df['time'].astype(str)) + pd.DateOffset(hours=5, minutes=30)
print(df['time'].head())
0 2018-12-24 10:20:00
1 2018-12-24 10:20:00
2 2018-12-24 10:30:00
3 2018-12-24 10:42:00
4 2018-12-24 11:34:00
Name: 1, dtype: datetime64[ns]
Notice that, by default, the current date is assumed.
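If you need plain time-of-day values back after the arithmetic, one option is the .dt.time accessor (a sketch; note it returns datetime.time objects in an object-dtype column, so vectorised operations are lost again):

df['time'] = df['time'].dt.time  # e.g. 10:20:00 instead of 2018-12-24 10:20:00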
This is a gnarly way of doing it. The principal problem here is the lack of vectorised support for time objects, so you first need to convert each time to a datetime using combine, then apply the offset, and take the time component back:
In [28]:
import datetime as dt
df['new_time'] = df['time'].apply(lambda x: (dt.datetime.combine(dt.datetime(1, 1, 1), x) + dt.timedelta(hours=5, minutes=30)).time())
df
Out[28]:
time new_time
index
0 04:50:00 10:20:00
1 04:50:00 10:20:00
2 05:00:00 10:30:00
3 05:12:00 10:42:00
4 06:04:00 11:34:00
5 06:44:00 12:14:00
6 06:44:00 12:14:00
7 06:47:00 12:17:00
8 06:47:00 12:17:00
9 06:47:00 12:17:00
I'm trying to run fillna on a column of type datetime64[ns]. When I run something like:
df['date'].fillna(datetime("2000-01-01"))
I get:
TypeError: an integer is required
Any way around this?
This should work in 0.12 and 0.13 (just released).
@DSM points out that datetimes are constructed like datetime.datetime(2012, 1, 1), so the error comes from failing to construct the value that you are passing to fillna.
Note that using a Timestamp WILL parse the string.
In [3]: s = pd.Series(pd.date_range('20130101', periods=10))
In [4]: s.iloc[3] = pd.NaT
In [5]: s.iloc[7] = pd.NaT
In [6]: s
Out[6]:
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 NaT
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 NaT
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00
dtype: datetime64[ns]
datetime.datetime will work as well
In [7]: s.fillna(pd.Timestamp('20120101'))
Out[7]:
0 2013-01-01 00:00:00
1 2013-01-02 00:00:00
2 2013-01-03 00:00:00
3 2012-01-01 00:00:00
4 2013-01-05 00:00:00
5 2013-01-06 00:00:00
6 2013-01-07 00:00:00
7 2012-01-01 00:00:00
8 2013-01-09 00:00:00
9 2013-01-10 00:00:00
dtype: datetime64[ns]
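For completeness, the datetime.datetime form mentioned above would be (a sketch against the same series s):

import datetime
s.fillna(datetime.datetime(2012, 1, 1))  # converted to Timestamp('2012-01-01 00:00:00'), same result as above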
Right now, df['date'].fillna(pd.Timestamp("20210730")) works in pandas 1.3.1
This example works with dynamic data, when you want to replace NaT values in rows with data from another DateTime column:
df['column_with_NaT'].fillna(df['dt_column_with_thesame_index'], inplace=True)
It worked for me when I had updated some rows of a DateTime column, the rows that were not updated held NaT, and I needed them to inherit the old series data. The code above resolved my problem.
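A minimal sketch of that pattern (column names are made up for illustration; modern pandas prefers plain assignment over inplace=True):

import pandas as pd

df = pd.DataFrame({
    'updated': pd.to_datetime(['2021-01-05', None, '2021-01-07']),
    'original': pd.to_datetime(['2021-01-01', '2021-01-02', '2021-01-03']),
})
df['updated'] = df['updated'].fillna(df['original'])  # NaT rows inherit the old dates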
I have a dataframe which I want to split into 5 chunks (more generally n chunks), so that I can apply a groupby on the chunks.
I want the chunks to have equal time intervals but in general each group may contain different numbers of records.
Let's call the data
s = pd.Series(pd.date_range('2012-1-1', periods=100, freq='D'))
and the time interval ti = (s.max() - s.min()) / n
So the first chunk should include all rows with dates between s.min() and s.min() + ti, the second, all rows with dates between s.min() + ti and s.min() + 2*ti, etc.
Can anyone suggest an easy way to achieve this? If somehow I could convert all my dates into seconds since the epoch, then I could do something like thisgroup = floor(thisdate/ti).
Is there an easy 'pythonic' or 'panda-ista' way to do this?
Thanks very much (and Merry Christmas!),
Robin
You can use numpy.array_split:
>>> import pandas as pd
>>> import numpy as np
>>> s = pd.Series(pd.date_range('2012-1-1', periods=10, freq='D'))
>>> np.array_split(s, 5)
[0 2012-01-01 00:00:00
1 2012-01-02 00:00:00
dtype: datetime64[ns], 2 2012-01-03 00:00:00
3 2012-01-04 00:00:00
dtype: datetime64[ns], 4 2012-01-05 00:00:00
5 2012-01-06 00:00:00
dtype: datetime64[ns], 6 2012-01-07 00:00:00
7 2012-01-08 00:00:00
dtype: datetime64[ns], 8 2012-01-09 00:00:00
9 2012-01-10 00:00:00
dtype: datetime64[ns]]
>>> np.array_split(s, 2)
[0 2012-01-01 00:00:00
1 2012-01-02 00:00:00
2 2012-01-03 00:00:00
3 2012-01-04 00:00:00
4 2012-01-05 00:00:00
dtype: datetime64[ns], 5 2012-01-06 00:00:00
6 2012-01-07 00:00:00
7 2012-01-08 00:00:00
8 2012-01-09 00:00:00
9 2012-01-10 00:00:00
dtype: datetime64[ns]]
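Note that np.array_split divides by equal record counts rather than equal time spans, which is not quite what the question asks for. For chunks covering equal time intervals, pd.cut can bin the datetimes directly (a sketch of an alternative, not part of the original answer):

import pandas as pd

s = pd.Series(pd.date_range('2012-1-1', periods=100, freq='D'))
bins = pd.cut(s, bins=5)  # 5 equal-width time intervals
for interval, chunk in s.groupby(bins):
    print(interval, len(chunk))  # each group covers the same time span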
The answer is as follows:
s = pd.DataFrame(pd.date_range('2012-1-1', periods=20, freq='D'), columns=["date"])
n = 5
s["date"] = s["date"].astype(np.int64)  # nanoseconds since the epoch; this step may not be needed in future pandas releases
s["bin"] = np.floor((n - 0.001) * (s["date"] - s["date"].min()) / (s["date"].max() - s["date"].min()))