Data
I import a date from an Excel workbook and store it in a variable called reportdate.
reportdate = pd.read_excel(file, header=None, nrows=1, usecols='A:B').dropna(axis=1, how='all').loc[0, 1]
I then convert reportdate to a DataFrame using rdf = pd.DataFrame({'Date':[reportdate]}).
type(reportdate) returns pandas._libs.tslibs.timestamps.Timestamp.
reportdate returns Timestamp('2019-12-02 07:19:07.703000').
I don't know how to recreate reportdate with that exact type and value.
Here is a sample data set.
df = pd.DataFrame({'CN ON': ['WD-D5','JF-04','P5'],
'Date Range': ['10/05/2019 - 11/06/2019','09/05/2019 - 12/15/2019','05/09/2019 - 10/25/2019']
})
What I do
I then parse apart Date Range to get the last date in the range and convert it to a datetime type.
df['End Date'] = df['Date Range'].str[-10:]
df['End Date'] = pd.to_datetime(df['End Date'], errors='coerce')
I need to calculate the day difference between reportdate and End Date.
What I try
Here is what I try.
df['ReportDate'] = reportdate
df['ReportDate'] = pd.to_datetime(df['ReportDate'], errors='coerce')
df['Days'] = df['End Date'] - df['ReportDate']
Then I check the types.
df.dtypes returns datetime64[ns] for both ReportDate and End Date.
What I need
I need the difference in days to be an integer or float because I need to check if those days are between certain values.
I keep getting the following error TypeError: ufunc subtract cannot use operands with types dtype('<U10') and dtype('<M8[ns]').
Any guidance on how I can get the difference between the dates as a number of days (int, float, etc.) would be greatly appreciated. I don't know where my TypeError is being raised.
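errors='coerce' is not the cause: it is a valid pd.to_datetime option that turns unparseable values into NaT. The error message says that one operand is still a 10-character string (dtype '<U10') while the other is datetime64[ns], so the subtraction ran on a column that had not yet been converted to datetime. Converting both columns first, as below, works: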
import pandas as pd
df = pd.DataFrame({'CN ON': ['WD-D5','JF-04','P5'],
'Date Range': ['10/05/2019 - 11/06/2019','09/05/2019 - 12/15/2019','05/09/2019 - 10/25/2019']
})
df['End Date'] = df['Date Range'].str[-10:]
df['End Date'] = pd.to_datetime(df['End Date'])
df['ReportDate'] = '2019-12-02 07:19:08'
df['ReportDate'] = pd.to_datetime(df['ReportDate'])
df['Days'] = df['End Date'] - df['ReportDate']
print(df)
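The question asks for the difference as a number so it can be compared against bounds. A minimal sketch of one way to do that, assuming the dates are month/day/year (as the sample suggests) and using a hypothetical window of -30 to 30 days: `.dt.days` turns the timedelta column into integers, and `normalize()` drops the time of day from the report date so the count is in whole calendar days (without it, `.dt.days` rounds negative differences down).

```python
import pandas as pd

# Stand-in for the Timestamp read from the Excel workbook
report_date = pd.Timestamp("2019-12-02 07:19:07.703")

df = pd.DataFrame({
    "CN ON": ["WD-D5", "JF-04", "P5"],
    "Date Range": ["10/05/2019 - 11/06/2019",
                   "09/05/2019 - 12/15/2019",
                   "05/09/2019 - 10/25/2019"],
})

# Last 10 characters hold the end date; the format is an assumption (MM/DD/YYYY)
df["End Date"] = pd.to_datetime(df["Date Range"].str[-10:], format="%m/%d/%Y")

# Subtract a midnight-aligned report date, then take whole days as integers
df["Days"] = (df["End Date"] - report_date.normalize()).dt.days

# Hypothetical bounds: flag rows whose end date is within 30 days of the report
df["In Range"] = df["Days"].between(-30, 30)

print(df[["CN ON", "End Date", "Days", "In Range"]])
```

The same subtraction works against a whole datetime column instead of a single Timestamp, so storing the report date in its own column is optional.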
Related
I have a df with a column 'start date'. It has date values. They are of string type. The column also has blank values.
--> df.loc[0,'start date']
output : '2022-10-04'
--> type(df.loc[0,'start date'])
output : str
I have imported datetime
import datetime
And I try to convert this entire column to pandas datetime type.
df['start date'] = pd.to_datetime(df['start date'], errors='coerce')
Because this keeps the time component as well and I want only the date, I did this:
df['start date'] = pd.to_datetime(df['start date'], errors='coerce').datetime.date
But this raises AttributeError: 'Series' object has no attribute 'datetime'
Whereas if I tweak the code to :
df['start date'] = pd.to_datetime(df['start date'], errors='coerce').dt.date
It works!
What's happening?! Any help in understanding the concept behind this would be appreciated :)
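A short sketch of what is going on, using made-up sample values: `pd.to_datetime` returns a Series, and a Series has no attribute called `datetime` (that is only the name of the standard-library module). pandas exposes datetime properties of a whole Series through the `.dt` accessor, while a single Timestamp has its own `.date()` method.

```python
import datetime

import pandas as pd

# Hypothetical column with one valid date, one blank and one missing value
s = pd.Series(["2022-10-04", "", None])

parsed = pd.to_datetime(s, errors="coerce")  # blanks become NaT

# Element-wise access on the whole Series goes through the .dt accessor
dates = parsed.dt.date

# A single element is a Timestamp, which has .date() directly
first = parsed.iloc[0].date()

print(dates)
print(type(first), first)
print(hasattr(parsed, "datetime"))  # the Series has no such attribute
```

Note that `.dt.date` yields a column of Python `date` objects with object dtype, so later datetime arithmetic on that column is slower than on a datetime64 column.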
I have a dataframe trades_df which looks like this -
Open Time         Open Price  Close Time
19-08-2020 12:19  1.19459     19-08-2020 12:48
28-08-2020 03:09  0.90157     08-09-2020 12:20
It has columns open_time and close_time in the format 19-08-2020 12:19. I want to make a new column 'time_spent' which would be the difference between 'close_time' and 'open_time'.
Is this what you are trying to do?
df = pd.DataFrame({
'Start_Time' : ['19-08-2020 12:19', '28-08-2020 03:09'],
'End_Time' : ['19-08-2020 12:48', '28-08-2020 06:03']
})
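# the strings are day-first, so state the format explicitly (infer_datetime_format is deprecated in pandas 2.0)
df['Start_Time'] = pd.to_datetime(df['Start_Time'], format='%d-%m-%Y %H:%M')
df['End_Time'] = pd.to_datetime(df['End_Time'], format='%d-%m-%Y %H:%M')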
df['Time_Difference'] = df['End_Time'] - df['Start_Time']
df
Your datetimes have days come first, so dayfirst should be True:
# use dayfirst argument to convert to correct datetimes
df['time_spent'] = pd.to_datetime(df.close_time, dayfirst=True) - pd.to_datetime(df.open_time, dayfirst=True)
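To see why `dayfirst` matters, here is a small sketch using the second close time from the question. The column names `open_time` and `close_time` follow the question's description; the explicit format string is an assumption based on the sample values.

```python
import pandas as pd

raw = "08-09-2020 12:20"

# Default parsing treats an ambiguous string as month-first
month_first = pd.to_datetime(raw)
# dayfirst=True reads the same string as 8 September
day_first = pd.to_datetime(raw, dayfirst=True)

print(month_first, day_first)

trades_df = pd.DataFrame({
    "open_time": ["19-08-2020 12:19", "28-08-2020 03:09"],
    "close_time": ["19-08-2020 12:48", "08-09-2020 12:20"],
})

# An explicit format removes the ambiguity altogether
fmt = "%d-%m-%Y %H:%M"
time_spent = (pd.to_datetime(trades_df["close_time"], format=fmt)
              - pd.to_datetime(trades_df["open_time"], format=fmt))
trades_df["time_spent"] = time_spent

print(trades_df)
```

With an explicit `format`, a value that does not fit raises an error instead of being silently read with day and month swapped.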
I have an issue with dates that come from an Excel sheet, which I transform into a CSV and then load into a DataFrame. The data I deal with each day can arrive in two different formats. The two date columns are called Appointment Date and Attended Date.
The formats are (DD/MM/YYYY HH:MM) and (YYYY/MM/DD HH:MM). The data comes from a third party, so I can't set the date format. What I need to do is parse the data, remove the HH:MM, and output the date only as DD/MM/YYYY.
My current code is currently the following:
df['Appointment Date'] = df['Appointment Date'].str.replace(' ', '/', regex=True)
df['Attended Date'] = df['Attended Date'].str.replace(' ', '/', regex=True)
df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")
df['Attended Date'] = pd.to_datetime(df['Attended Date'], format="%d/%m/%Y/%H:%M").dt.strftime("%d/%m/%Y")
But I'm not able to parse the data when it comes through as YYYY/MM/DD HH:MM
Exception error:
time data '2021-10-08/00:00:00' does not match format '%d/%m/%Y/%H:%M' (match)
Any ideas on how I can get around this?
Try it one way, and if it doesn't work, try it the other way.
try:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except WhateverDateParseException:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")
Of course, instead of WhateverDateParseException use the actual exception that is raised in your code; pd.to_datetime raises ValueError when a string does not match the given format.
Edit: fixed missing "%S"
I would use regular expressions for that as follows:
import pandas as pd
df = pd.DataFrame({"daytime": ["31/12/2020 23:59", "2020/12/31 23:59"]})
df["daypart"] = df["daytime"].str.replace(r" \d\d:\d\d", "", regex=True) # drop HH:MM part
df["day"] = df["daypart"].str.replace(r"(\d\d\d\d)/(\d\d)/(\d\d)", r"\3/\2/\1", regex=True)
print(df)
output
            daytime     daypart         day
0  31/12/2020 23:59  31/12/2020  31/12/2020
1  2020/12/31 23:59  2020/12/31  31/12/2020
Explanation: the second .replace uses capturing groups. When a value matches (4 digits)/(2 digits)/(2 digits), the groups are reordered so that the 3rd comes first, the 2nd stays second, and the 1st comes last (note that groups are 1-based, not 0-based like general Python indexing). As the day format is now consistent, you should be able to parse it easily.
As mentioned by @C14L, that method can be followed, but judging by your exception you need to add a seconds field (%S) to your time format, so the updated code would look like this:
try:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%d/%m/%Y/%H:%M:%S").dt.strftime("%d/%m/%Y")
except WhateverDateParseException:
    df['Appointment Date'] = pd.to_datetime(df['Appointment Date'], format="%Y/%m/%d/%H:%M:%S").dt.strftime("%d/%m/%Y")
The format, %d/%m/%Y/%H:%M does not match with the Date-Time string, 2021-10-08/00:00:00. You need to use %Y-%m-%d/%H:%M:%S for this Date-Time string.
Demo:
from datetime import datetime
date_time_str = '2021-10-08/00:00:00'
date_str = datetime.strptime(date_time_str, '%Y-%m-%d/%H:%M:%S').strftime('%d/%m/%Y')
print(date_str)
Output:
08/10/2021
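If a single column can mix several formats row by row, a whole-column try/except is not enough. One possible approach, sketched here with made-up sample values and an assumed list of candidate formats, is to parse with each format using errors="coerce" and fill the gaps left by the previous attempt.

```python
import pandas as pd

# Hypothetical sample mixing the formats mentioned in the question
raw = pd.Series(["31/12/2020 23:59", "2020/12/31 23:59", "2021-10-08 00:00:00"])

# Candidate formats are an assumption; adjust them to the real data
formats = ["%d/%m/%Y %H:%M", "%Y/%m/%d %H:%M", "%Y-%m-%d %H:%M:%S"]

# Parse with the first format, then fill remaining NaT values with later ones
parsed = pd.to_datetime(raw, format=formats[0], errors="coerce")
for fmt in formats[1:]:
    parsed = parsed.fillna(pd.to_datetime(raw, format=fmt, errors="coerce"))

# Drop the time part and render as DD/MM/YYYY strings
out = parsed.dt.strftime("%d/%m/%Y")
print(out)
```

Values that match none of the formats stay NaT, which makes them easy to find with `parsed.isna()`.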
I have a python dataframe with 2 columns that contain dates as strings e.g. start_date "2002-06-12" and end_date "2009-03-01". I would like to calculate the difference (days) between these 2 columns for each row and save the results into a new column called for example time_diff of type float.
I have tried:
df["time_diff"] = (pd.Timestamp(df.end_date) - pd.Timestamp(df.start_date )).astype("timedelta64[d]")
pd.to_numeric(df["time_diff"])
based on some tutorials but this gives TypeError: Cannot convert input for the first line. What do I need to change to get this running?
Here is a working example that converts the string columns of a DataFrame to datetime and saves the difference between them in a new column as a float (number of seconds).
import pandas as pd
from datetime import timedelta
tmp = [("2002-06-12","2009-03-01"),("2016-04-28","2022-03-14")]
df = pd.DataFrame(tmp,columns=["col1","col2"])
df["col1"]=pd.to_datetime(df["col1"])
df["col2"]=pd.to_datetime(df["col2"])
df["time_diff"]=df["col2"]-df["col1"]
df["time_diff"]=df["time_diff"].apply(timedelta.total_seconds)
Time difference in seconds can be converted to minutes or days by using simple math.
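As a rough illustration of that conversion, assuming the column names from the question (`start_date`, `end_date`): dividing the seconds by 86400 gives float days, and dividing the timedelta column by a one-day `pd.Timedelta` gives the same result without going through seconds.

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": ["2002-06-12", "2016-04-28"],
    "end_date": ["2009-03-01", "2022-03-14"],
})

delta = pd.to_datetime(df["end_date"]) - pd.to_datetime(df["start_date"])

# Seconds to days: 24 * 60 * 60 = 86400 seconds per day
df["time_diff"] = delta.dt.total_seconds() / 86400

# Equivalent: divide the timedelta column by a one-day Timedelta
df["time_diff_alt"] = delta / pd.Timedelta(days=1)

print(df)
```

Both columns come out as float64, so fractional days are preserved when the inputs carry a time of day.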
Try:
import numpy as np
enddates = np.asarray([pd.Timestamp(end) for end in df.end_date.values])
startdates = np.asarray([pd.Timestamp(start) for start in df.start_date.values])
df['time_diff'] = (enddates - startdates).astype("timedelta64")
First convert strings to datetime, then calculate difference in days.
df['start_date'] = pd.to_datetime(df['start_date'], format='%Y-%m-%d')
df['end_date'] = pd.to_datetime(df['end_date'], format='%Y-%m-%d')
df['time_diff'] = (df.end_date - df.start_date).dt.days
You can also do it by converting your columns into date and then computing the difference :
from datetime import datetime
df = pd.DataFrame({'Start Date' : ['2002-06-12', '2002-06-12' ], 'End date' : ['2009-03-01', '2009-03-06']})
df['Start Date'] = [ datetime.strptime(x, "%Y-%m-%d").date() for x in df['Start Date'] ]
df['End date'] = [ datetime.strptime(x, "%Y-%m-%d").date() for x in df['End date'] ]
df['Diff'] = df['End date'] - df['Start Date']
Out :
End date Start Date Diff
0 2009-03-01 2002-06-12 2454 days
1 2009-03-06 2002-06-12 2459 days
You should just use pd.to_datetime to convert your string values:
df["time_diff"] = (pd.to_datetime(df.end_date) - pd.to_datetime(df.start_date))
The result will automatically be a timedelta64 column.
You can try this :
df = pd.DataFrame()
df['Arrived'] = [pd.Timestamp('01-04-2017')]
df['Left'] = [pd.Timestamp('01-06-2017')]
diff = df['Left'] - df['Arrived']
days = pd.Series([delta.days for delta in diff])
result = days[0]
I am using a pandas DataFrame loaded from CSV files that contain dates. Let's say:
Assigned Date  Resolved Date
1/15/2019      1/20/2019
I am calculating the difference:
df0['ResDate'] = df0['Resolved Date'].apply(lambda t: pd.to_datetime(t).date())
df0['RepDate'] = df0['Assigned Date'].apply(lambda t: pd.to_datetime(t).date())
df0['Woda']=df0['ResDate']-df0['RepDate']
I am getting the correct difference, but I need to subtract the weekends from it.
How do I proceed?
Thanks
Use numpy.busday_count:
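import numpy as np
df0['Assigned Date'] = pd.to_datetime(df0['Assigned Date'])
df0['Resolved Date'] = pd.to_datetime(df0['Resolved Date'])
# busday_count expects datetime64[D] values and counts weekdays in [start, end)
df0['Woda'] = np.busday_count(df0['Assigned Date'].values.astype('datetime64[D]'), df0['Resolved Date'].values.astype('datetime64[D]'))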
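A self-contained sketch with the dates from the question, assuming they are month/day/year. `np.busday_count` counts Monday to Friday in the half-open range from the start date up to, but not including, the end date, and it works on date-resolution (`datetime64[D]`) values.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({
    "Assigned Date": ["1/15/2019"],
    "Resolved Date": ["1/20/2019"],
})

# Convert to datetime, then down to day resolution for busday_count
start = pd.to_datetime(df0["Assigned Date"], format="%m/%d/%Y").values.astype("datetime64[D]")
end = pd.to_datetime(df0["Resolved Date"], format="%m/%d/%Y").values.astype("datetime64[D]")

# Tue 15 Jan to Sun 20 Jan 2019: Tue, Wed, Thu, Fri are counted
df0["Woda"] = np.busday_count(start, end)

print(df0)
```

If public holidays should be skipped too, `np.busday_count` also accepts a `holidays` argument.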
You can use the datetime module to find the difference between two dates:
import datetime
d1 = datetime.datetime.strptime('2019-01-15', '%Y-%m-%d')
d2 = datetime.datetime.strptime('2019-01-20', '%Y-%m-%d')
diff_days = (d2 - d1).days
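# approximation: removes two days per complete week only; a weekend inside a partial week is not removed
diff_weekdays = diff_days - (diff_days // 7) * 2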
print(diff_weekdays)
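The `diff_days // 7` formula only accounts for weekends in complete weeks, so for 15 to 20 January 2019 it still includes the Saturday. A minimal standard-library sketch that counts weekdays exactly by walking the range; the helper name `weekdays_between` is hypothetical.

```python
import datetime


def weekdays_between(start, end):
    """Count Monday-Friday dates in the half-open range [start, end)."""
    total_days = (end - start).days
    return sum(
        1
        for offset in range(total_days)
        if (start + datetime.timedelta(days=offset)).weekday() < 5  # 0-4 are Mon-Fri
    )


d1 = datetime.date(2019, 1, 15)  # Tuesday
d2 = datetime.date(2019, 1, 20)  # Sunday

count = weekdays_between(d1, d2)
print(count)
```

The loop is linear in the number of days, which is fine for short ranges; for large columns a vectorised function such as `numpy.busday_count` is the better fit.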