I have a DatetimeIndex in pandas and I want to convert it to a rolling DatetimeIndex using the last date in the series.
So if I create a sample datetime index:
from datetime import datetime as dt
import pandas as pd

dates = pd.DatetimeIndex(pd.date_range(dt(2017, 10, 1), dt(2018, 2, 2)))
An example
Input: DatetimeIndex with all dates in the above range:
dates
2017-10-01
2017-10-02
.
.
2018-02-01
2018-02-02
Desired Output: DatetimeIndex with only the 2nd of every month (as that is the last date in the input):
dates
2017-10-02
2017-11-02
2017-12-02
2018-01-02
2018-02-02
Attempts
I've tried
dates[::-1][::30]
and also
dates[dates.apply(lambda x: x.date().day==2)]
Unfortunately months can be 28 to 31 days long, so the first way doesn't work, and while the second method works for anchor days 1-28, for a day like the 31st it skips any month that doesn't have that day. So, for example, if I had:
dates
2017-10-01
2017-10-02
.
.
2018-01-31
I would want:
dates
2017-10-31
2017-11-30
2017-12-31
2018-01-31
while the second method skips November, as November doesn't have a 31st.
Is there any way to use dateutil's relativedelta to do this?
You can use the .is_month_end attribute in pandas. It gives an array of boolean values: True if the date is a month-end, False otherwise. (It is available directly on a DatetimeIndex; on a Series you would go through .dt.is_month_end.)
import pandas as pd
dates = pd.date_range('2017-10-1', '2017-12-31')
print(dates[dates.is_month_end])
Output
DatetimeIndex(['2017-10-31', '2017-11-30', '2017-12-31'], dtype='datetime64[ns]', freq=None)
This will help you filter things.
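If the last date in the index is not a month-end (like the 2nd in the question), here is a sketch of the general case; the fall-back-to-month-end behaviour for short months is an assumption about the desired output rather than part of the answer above:
import pandas as pd
from datetime import datetime as dt

dates = pd.DatetimeIndex(pd.date_range(dt(2017, 10, 1), dt(2018, 2, 2)))
anchor_day = dates[-1].day  # day of the last date in the index, here 2

# Keep the anchor day in each month; if a month is shorter than the
# anchor day (e.g. day 31 vs. November), keep that month's last day.
mask = (dates.day == anchor_day) | (
    dates.is_month_end & (dates.days_in_month < anchor_day)
)
print(dates[mask])
This prints the five dates from the desired output; with a range ending on the 31st it would substitute November 30th, as the question asks.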
Related
I am able to convert columns to datetime64[ns] individually as Series, but when I try to do it over the whole dataframe I get this error:
df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']]=pd.to_datetime(df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']],format='%d-%m-%Y %H:%M:%S')
to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
Date Range           ME Created Date/Time   Ready For Books Date/Time
11-05-2022 00:00:00  02-05-2022 14:31:37    11-05-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03    10-09-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03    10-09-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03    10-09-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03    10-09-2022 00:00:00
I solved it with the apply method, but I wanted to do it directly with .to_datetime():
df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']] = df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']].apply(pd.to_datetime, format='%d-%m-%Y %H:%M:%S')
So I have 2 questions:
Is it possible to use to_datetime() directly on the dataframe as shown above, without the apply method?
Is it possible for to_datetime() to return the output as a date, without the input timestamp and without the help of the .dt.date accessor?
I'm not sure this is the most efficient way, but it's certainly one of the easiest to read:
df = df.applymap(lambda x: pd.to_datetime(x).date())
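On the first question: pd.to_datetime applied to a whole DataFrame tries to assemble one datetime from component columns named year, month, day, and so on, which is exactly the assembly error above, so a per-column apply (or a loop) is still needed. A sketch combining both questions, built on hypothetical rows from the sample table:
import pandas as pd

df = pd.DataFrame({
    'Date Range': ['11-05-2022 00:00:00', '10-09-2022 00:00:00'],
    'ME Created Date/Time': ['02-05-2022 14:31:37', '06-09-2022 14:19:03'],
    'Ready For Books Date/Time': ['11-05-2022 00:00:00', '10-09-2022 00:00:00'],
})

cols = ['Date Range', 'ME Created Date/Time', 'Ready For Books Date/Time']
# Parse each column and drop the time component in the same pass; the
# resulting columns hold datetime.date objects (dtype object).
df[cols] = df[cols].apply(
    lambda s: pd.to_datetime(s, format='%d-%m-%Y %H:%M:%S').dt.date
)
print(df)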
I am trying to convert my column in a df into a time series. The dataset goes from March 23rd 2015-August 17th 2019 and the dataset looks like this:
time 1day_active_users
0 2015-03-23 00:00:00-04:00 19687.0
1 2015-03-24 00:00:00-04:00 19437.0
I am trying to convert the time column into a datetime series but it returns the column as an object. Here is the code:
data = pd.read_csv(data_path)
data.set_index('time', inplace=True)
data.index = pd.to_datetime(data.index)
data.index.dtype
data.index.dtype returns dtype('O'). I assume this is why when I try to index an element in time, it returns an error. For example, when I run this:
data.loc['2015']
It gives me this error
KeyError: '2015'
Any help or feedback would be appreciated. Thank you.
As commented, the problem might be due to the different timezones. Try passing utc=True to pd.to_datetime:
df['time'] = pd.to_datetime(df['time'],utc=True)
df['time']
Test Data
time 1day_active_users
0 2015-03-23 00:00:00-04:00 19687.0
1 2015-03-24 00:00:00-05:00 19437.0
Output:
0 2015-03-23 04:00:00+00:00
1 2015-03-24 05:00:00+00:00
Name: time, dtype: datetime64[ns, UTC]
And then:
df.set_index('time', inplace=True)
df.loc['2015']
gives
1day_active_users
time
2015-03-23 04:00:00+00:00 19687.0
2015-03-24 05:00:00+00:00 19437.0
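For completeness, the same fix folded into the loading step (a sketch assuming the CSV layout from the question; data_path is the questioner's variable):
import pandas as pd

data = pd.read_csv(data_path)
# utc=True collapses the mixed -04:00/-05:00 offsets into a single
# datetime64[ns, UTC] dtype instead of object.
data['time'] = pd.to_datetime(data['time'], utc=True)
data = data.set_index('time')
print(data.loc['2015'])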
I'm trying to print a dataframe in Jupyter with every row whose datetime falls on 2/29/2020 omitted. With the conditional statement I wrote (top cell in the linked picture), only the row for the first hour of the day (2/29/2020 00:00:00) was omitted; the rows for the 2/29/2020 01:00:00 through 2/29/2020 23:00:00 datetimes remained. How can I change the conditional statement so that all of the 2/29/2020 datetimes disappear?
To omit all datetimes of 2/29/2020, you need to first convert the datetimes to dates in your comparison.
Change:
post_retrofit[post_retrofit['Unit Datetime'] != date(2020, 2, 29)]
To:
post_retrofit[post_retrofit['Unit Datetime'].dt.date != datetime(2020, 2, 29).date()]
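A minimal runnable sketch of that comparison, with hypothetical sample rows (date(2020, 2, 29) is equivalent to datetime(2020, 2, 29).date()):
import pandas as pd
from datetime import date

post_retrofit = pd.DataFrame({'Unit Datetime': pd.to_datetime(
    ['2020-02-28 23:00:00', '2020-02-29 00:00:00', '2020-02-29 22:00:00'])})

# .dt.date reduces each timestamp to its calendar date, so every hour
# of 2020-02-29 is dropped, not just midnight.
print(post_retrofit[post_retrofit['Unit Datetime'].dt.date != date(2020, 2, 29)])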
Your question is not clear.
Let's assume I have the following:
Data
post_retrofit_without_2_29=pd.DataFrame({'Unit Datetime':['2020-02-28 23:00:00','2020-02-28 22:00:00','2020-02-29 22:00:00']})
print(post_retrofit_without_2_29)
Unit Datetime
0 2020-02-28 23:00:00
1 2020-02-28 22:00:00
2 2020-02-29 22:00:00
Solution
To filter by date, I coerce the datetime to a date string as follows:
post_retrofit_without_2_29['Unit Date']=pd.to_datetime(post_retrofit_without_2_29['Unit Datetime']).dt.strftime("%y-%m-%d")
print(post_retrofit_without_2_29)
Unit Datetime Unit Date
0 2020-02-28 23:00:00 20-02-28
1 2020-02-28 22:00:00 20-02-28
2 2020-02-29 22:00:00 20-02-29
Filter
post_retrofit_without_2_29[post_retrofit_without_2_29['Unit Date']>'20-02-28']
Unit Datetime Unit Date
2 2020-02-29 22:00:00 20-02-29
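A variant sketch that compares real dates instead of '%y-%m-%d' strings, sidestepping two-digit-year string ordering:
import pandas as pd
from datetime import date

dt_col = pd.to_datetime(post_retrofit_without_2_29['Unit Datetime'])
# dt.date gives datetime.date objects, which compare chronologically.
print(post_retrofit_without_2_29[dt_col.dt.date > date(2020, 2, 28)])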
You can also do this by creating a pivot_table() with the dates as the index, so the time component is no longer a problem.
post_retrofit_without_2_29_pivot = pd.pivot_table(data=post_retrofit_without_2_29, index=post_retrofit_without_2_29['Unit Datetime'])
post_retrofit_without_2_29_pivot.loc[post_retrofit_without_2_29_pivot.index != pd.to_datetime('2020-02-29')]
I know this is a bit lengthy, but it's simple to understand.
Hope this answer helps :}
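Note that comparing the index to a single timestamp only removes the exact midnight row. To drop the whole day, normalize the index first (a sketch assuming 'Unit Datetime' was parsed with pd.to_datetime before pivoting, so the index is a DatetimeIndex):
mask = post_retrofit_without_2_29_pivot.index.normalize() != pd.Timestamp('2020-02-29')
post_retrofit_without_2_29_pivot[mask]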
I have a column in a dataframe that I want to convert to a date. The values of the column are either DDMONYYY or DD Month YYYY 00:00:00.000 GMT. For example, one row in the dataframe could have the value 31DEC2002 and the next row could have 31 December 2015 00:00:00.000 GMT. I think this is why I get an error when trying to convert the column to a date using pd.to_datetime or datetime.strptime to convert.
Anyone got any ideas? I'd be very grateful for any help/pointers.
For me, to_datetime works with utc=True (converting all values to UTC) and errors='coerce' (converting unparseable values to NaT, i.e. missing datetimes):
df = pd.DataFrame({'date':['31DEC2002','31 December 2015 00:00:00.000 GMT','.']})
df['date'] = pd.to_datetime(df['date'], utc=True, errors='coerce')
print(df)
date
0 2002-12-31 00:00:00+00:00
1 2015-12-31 00:00:00+00:00
2 NaT
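If you would rather fail loudly on genuinely bad values, an alternative sketch parses each known format explicitly and combines the results (the format strings are guesses from the two examples):
import pandas as pd

s = pd.Series(['31DEC2002', '31 December 2015 00:00:00.000 GMT'])
short = pd.to_datetime(s, format='%d%b%Y', errors='coerce')
longer = pd.to_datetime(s.str.replace(' GMT', '', regex=False),
                        format='%d %B %Y %H:%M:%S.%f', errors='coerce')
# Rows that miss the first format fall back to the second; anything
# still NaT after both matched neither format.
print(short.fillna(longer))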
I have a dataframe with a datetime column. I want to group by the time component only and aggregate, e.g. by taking the mean.
I know that I can use pd.Grouper to group by date AND time, but it doesn't work on time only.
Say we have the following dataframe:
import numpy as np
import pandas as pd
drange = pd.date_range('2019-08-01 00:00', '2019-08-12 12:00', freq='1T')
time = drange.time
c0 = np.random.rand(len(drange))
c1 = np.random.rand(len(drange))
df = pd.DataFrame(dict(drange=drange, time=time, c0=c0, c1=c1))
print(df.head())
drange time c0 c1
0 2019-08-01 00:00:00 00:00:00 0.031946 0.159739
1 2019-08-01 00:01:00 00:01:00 0.809171 0.681942
2 2019-08-01 00:02:00 00:02:00 0.036720 0.133443
3 2019-08-01 00:03:00 00:03:00 0.650522 0.409797
4 2019-08-01 00:04:00 00:04:00 0.239262 0.814565
In this case, the following throws a TypeError:
grouper = pd.Grouper(key='time', freq='5T')
grouped = df.groupby(grouper).mean()
I could set key='drange' to group by date and time and then:
Reset the index
Transform the new column to float
Bin with pd.cut
Cast back to time
Finally group-by and then aggregate
... But I wonder whether there is a cleaner way to achieve the same results.
Series.dt.time/DatetimeIndex.time returns the time as datetime.time objects. This isn't great, because pandas works best with timedelta64, so your 'time' column is cast to object, losing all datetime functionality.
You can subtract off the normalized date to obtain the time as a timedelta so you can continue to use the datetime tools of pandas. You can floor this to group.
s = (df.drange - df.drange.dt.normalize()).dt.floor('5T')
df.groupby(s).mean()
c0 c1
drange
00:00:00 0.436971 0.530201
00:05:00 0.441387 0.518831
00:10:00 0.465008 0.478130
... ... ...
23:45:00 0.523233 0.515991
23:50:00 0.468695 0.434240
23:55:00 0.569989 0.510291
Alternatively, if you feel unsure of floor, this gets identical output up to the index name:
df['time'] = (df.drange - df.drange.dt.normalize()) # timedelta64[ns]
df.groupby(pd.Grouper(key='time', freq='5T')).mean()
When you use DataFrame.groupby you can pass a Series as the grouping argument. Moreover, if your series is a datetime, you can use series.dt to access its date/time properties. In your case df['drange'].dt.hour or df['drange'].dt.time should do it.
# df['drange']=pd.to_datetime(df['drange'])
df.groupby(df['drange'].dt.hour).agg(...)
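For instance, with the df built in the question, grouping by the time of day directly (a sketch; note the group keys are plain datetime.time objects, so further datetime operations on the result index are limited):
# One row per minute of day, averaged across all days in the range.
print(df.groupby(df['drange'].dt.time)[['c0', 'c1']].mean().head())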