I am able to convert this to datetime64[ns] when doing it individually as a Series, but when I try to do it over the DataFrame I get this error:
df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']]=pd.to_datetime(df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']],format='%d-%m-%Y %H:%M:%S')
to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
Date Range           ME Created Date/Time  Ready For Books Date/Time
11-05-2022 00:00:00  02-05-2022 14:31:37   11-05-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03   10-09-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03   10-09-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03   10-09-2022 00:00:00
10-09-2022 00:00:00  06-09-2022 14:19:03   10-09-2022 00:00:00
I solved it with the apply method, but I wanted to do it directly with to_datetime():
df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']] = df[['Date Range','ME Created Date/Time','Ready For Books Date/Time']].apply(pd.to_datetime, format='%d-%m-%Y %H:%M:%S')
So I have 2 questions:
Is it possible to use to_datetime() directly on the DataFrame as shown above, without the apply method?
Is it possible for to_datetime() to return the output as a date, without the input timestamp and without the help of the .dt.date accessor?
I'm not sure this is the most efficient way, but it's certainly one of the easiest to read:
df = df.applymap(lambda x: pd.to_datetime(x).date())
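For context: passing a whole DataFrame to pd.to_datetime() makes pandas try to assemble dates from component columns named year, month, day, etc., which is why the error above complains about a missing [day,month,year]. A column-wise apply is the usual route; a minimal sketch, with the column names and sample values taken from the question:

```python
import pandas as pd

df = pd.DataFrame({
    'Date Range': ['11-05-2022 00:00:00', '10-09-2022 00:00:00'],
    'ME Created Date/Time': ['02-05-2022 14:31:37', '06-09-2022 14:19:03'],
    'Ready For Books Date/Time': ['11-05-2022 00:00:00', '10-09-2022 00:00:00'],
})

cols = ['Date Range', 'ME Created Date/Time', 'Ready For Books Date/Time']
# Parse each column as a Series (vectorized), then drop the time part.
# Note: datetime64 always stores a timestamp, so keeping only the date
# leaves Python date objects in an object-dtype column.
df[cols] = df[cols].apply(
    lambda s: pd.to_datetime(s, format='%d-%m-%Y %H:%M:%S').dt.date
)
```

As far as I know, to_datetime() itself cannot return a plain date: the datetime64[ns] dtype always carries a time component, so .dt.date (or .dt.normalize() to keep the dtype with time zeroed out) is still needed.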
I'm trying to print a dataframe in Jupyter with all datetimes on the date 2/29/2020 omitted. When I typed the conditional statement shown below and output the dataframe with all of the datetimes after 2/28/2020 22:00:00, only the row for the first hour of the day (2/29/2020 00:00:00) was omitted, not the rows for 2/29/2020 01:00:00 through 2/29/2020 23:00:00 as I wanted. How can I change the conditional statement so that all of the datetimes for 2/29/2020 disappear?
To omit all datetimes of 2/29/2020, you need to first convert the datetimes to dates in your comparison.
Change:
post_retrofit[post_retrofit['Unit Datetime'] != date(2020, 2, 29)]
To:
from datetime import date
post_retrofit[post_retrofit['Unit Datetime'].dt.date != date(2020, 2, 29)]
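A minimal runnable version of that comparison, with the column name and a few sample timestamps assumed from the question:

```python
import pandas as pd
from datetime import date

post_retrofit = pd.DataFrame({'Unit Datetime': pd.to_datetime(
    ['2020-02-28 22:00:00', '2020-02-29 00:00:00', '2020-02-29 23:00:00'])})

# Compare calendar dates, not timestamps, so every hour of 2/29 is dropped
filtered = post_retrofit[post_retrofit['Unit Datetime'].dt.date != date(2020, 2, 29)]
```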
Your question is not clear.
Let's assume I have the following:
Data
post_retrofit_without_2_29=pd.DataFrame({'Unit Datetime':['2020-02-28 23:00:00','2020-02-28 22:00:00','2020-02-29 22:00:00']})
print(post_retrofit_without_2_29)
Unit Datetime
0 2020-02-28 23:00:00
1 2020-02-28 22:00:00
2 2020-02-29 22:00:00
Solution
To filter by date, I have to coerce the datetime to a date as follows:
post_retrofit_without_2_29['Unit Date']=pd.to_datetime(post_retrofit_without_2_29['Unit Datetime']).dt.strftime("%y-%m-%d")
print(post_retrofit_without_2_29)
Unit Datetime Unit Date
0 2020-02-28 23:00:00 20-02-28
1 2020-02-28 22:00:00 20-02-28
2 2020-02-29 22:00:00 20-02-29
Filter
post_retrofit_without_2_29[post_retrofit_without_2_29['Unit Date']>'20-02-28']
Unit Datetime Unit Date
2 2020-02-29 22:00:00 20-02-29
You can do this easily by creating a pivot_table() with the dates as the index, so the time component is no longer a problem.
post_retrofit_without_2_29_pivot = pd.pivot_table(data=post_retrofit_without_2_29, index=post_retrofit_without_2_29['Unit Datetime'])
post_retrofit_without_2_29_pivot.loc[post_retrofit_without_2_29_pivot.index != pd.to_datetime("2020-02-29")]
I know this is a bit lengthy, but it's simple to understand.
Hope this answer helps :}
I have a dataframe (df) with two columns where the head looks like
name start end
0 John 2018-11-09 00:00:00 2012-03-01 00:00:00
1 Steve 1990-09-03 00:00:00
2 Debs 1977-09-07 00:00:00 2012-07-02 00:00:00
3 Mandy 2009-01-09 00:00:00
4 Colin 1993-08-22 00:00:00 2002-06-03 00:00:00
The start and end columns have the type object. I want to change the type to datetime so I can use the following:
referenceError = DeptTemplate['start'] > DeptTemplate['end']
I am trying to change the type using:
df['start'].dt.strftime('%d/%m/%Y')
df['end'].dt.strftime('%d/%m/%Y')
but I think the rows with no date in these columns are causing a problem. How can I handle the blank values so I can change the type to datetime and run my analysis?
As shown in the .to_datetime docs, you can set the error-handling behavior with the errors kwarg. You can also set the parsing format with the format kwarg.
# Bad values will be NaT
df["start"] = pd.to_datetime(df.start, errors='coerce', format='%d/%m/%Y')
As mentioned in the comments, you can prepare the column with replace if you absolutely must use strftime.
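A minimal sketch of the whole flow, assuming the blanks are empty strings and the dates parse with the default format (names and values taken from the question's sample):

```python
import pandas as pd

df = pd.DataFrame({
    'name':  ['John', 'Steve'],
    'start': ['2018-11-09 00:00:00', '1990-09-03 00:00:00'],
    'end':   ['2012-03-01 00:00:00', ''],   # Steve has no end date
})

# Blank or invalid entries become NaT instead of raising an error
df['start'] = pd.to_datetime(df['start'], errors='coerce')
df['end'] = pd.to_datetime(df['end'], errors='coerce')

# Comparisons against NaT evaluate to False, so the check runs cleanly
referenceError = df['start'] > df['end']
```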
I have a DatetimeIndex in pandas and I want to convert it to a rolling DatetimeIndex using the last date in the series.
So if I create a sample datetime index:
dates = pd.DatetimeIndex(pd.date_range(dt(2017, 10, 1), dt(2018, 2, 2)))
An example
Input: DatetimeIndex with all dates in the above range:
dates
2017-10-01
2017-10-02
.
.
2018-02-01
2018-02-02
Desired Output: DatetimeIndex with only the 2nd of every month (as that is the last date in the input):
dates
2017-10-02
2017-11-02
2017-12-02
2018-01-02
2018-02-02
Attempts
I've tried
dates[::-1][::30]
and also
dates[dates.apply(lambda x: x.date().day==2)]
Unfortunately months differ in length, so the first way doesn't work, and while the second method works for days in the range 1-30, for the 31st it skips the months that lack a 31st. So, for example, if I had:
dates
2017-10-01
2017-10-02
.
.
2018-01-31
I would want:
dates
2017-10-31
2017-11-30
2017-12-31
2018-01-31
while the second method skips November as it doesn't have a 31st.
Is there any way to use RelativeDelta to do this?
You can use the .is_month_end property in pandas. This gives an array of boolean values – True if the date is a month-end, False otherwise.
import pandas as pd
dates = pd.date_range('2017-10-01', '2017-12-31')  # a DatetimeIndex
print(dates[dates.is_month_end])
Output
DatetimeIndex(['2017-10-31', '2017-11-30', '2017-12-31'], dtype='datetime64[ns]', freq=None)
This will help you filter things.
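is_month_end covers the month-end case. For the general behaviour the question asks for (keep the last date's day-of-month each month, falling back to the last available day in shorter months), one possible sketch, not taken from the answer above:

```python
import pandas as pd

dates = pd.Series(pd.date_range('2017-10-01', '2018-02-02'))
target_day = dates.iloc[-1].day  # day-of-month of the last date (2 here)

# Keep dates on or before the target day, then take the latest per month,
# so a target of 31 would fall back to e.g. Nov 30
eligible = dates[dates.dt.day <= target_day]
rolled = eligible.groupby([eligible.dt.year, eligible.dt.month]).max()
```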
I have plain strings with more than a million data points from a .csv file, with the format below:
Datetime
22/12/2015 17:00:00
22/12/2015 18:00:00
I loaded it into pandas and tried to convert it to datetime format using pandas.to_datetime(df['Datetime']). However, the new time series I got is not correct: some new datetimes are produced during the conversion. For example, 2016-12-11 23:30:00, which is not in the original data.
It has been a while since I worked with pandas, but in your example you have a different date format than in the example lines from the csv:
yyyy-mm-dd hh:mm:ss
instead of
dd/mm/yyyy hh:mm:ss
The to_datetime function takes a parameter format; this should help if that is the cause.
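A minimal sketch of the format approach, assuming the csv strings are day-first as in the sample:

```python
import pandas as pd

df = pd.DataFrame({'Datetime': ['22/12/2015 17:00:00', '22/12/2015 18:00:00']})
# An explicit format removes any day/month ambiguity during parsing
parsed = pd.to_datetime(df['Datetime'], format='%d/%m/%Y %H:%M:%S')
```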
You want to use the option dayfirst=True
pd.to_datetime(df.Datetime, dayfirst=True)
This:
Datetime
22/12/2015 17:00:00
22/12/2015 18:00:00
11/12/2015 23:30:00
Gets converted to
0 2015-12-22 17:00:00
1 2015-12-22 18:00:00
2 2015-12-11 23:30:00
Name: Datetime, dtype: datetime64[ns]
I have a column with a birthdate. Some values are N.A., some are 01.01.2016, but some contain 01.01.2016 01:01:01.
Filtering the N.A. values works fine, but handling the different date formats seems clumsy. Is it possible to have pandas handle these gracefully, e.g. interpret only the date for a birthdate and not fail?
pd.to_datetime() will handle multiple formats
>>> ser = pd.Series(['NaT', '01.01.2016', '01.01.2016 01:01:01'])
>>> pd.to_datetime(ser)
0 NaT
1 2016-01-01 00:00:00
2 2016-01-01 01:01:01
dtype: datetime64[ns]
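One caveat: newer pandas versions (2.x) infer a single format from the first value and raise on mixed input; passing format='mixed' restores the per-element parsing. A version-tolerant sketch:

```python
import pandas as pd

ser = pd.Series(['NaT', '01.01.2016', '01.01.2016 01:01:01'])
try:
    out = pd.to_datetime(ser)                  # pandas < 2.0 parses per element
except ValueError:
    out = pd.to_datetime(ser, format='mixed')  # pandas >= 2.0
```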