I have a pandas dataframe with 2 columns that contain dates as strings, e.g. start_date "2002-06-12" and end_date "2009-03-01". I would like to calculate the difference in days between these 2 columns for each row and save the results in a new column, for example time_diff, of type float.
I have tried:
df["time_diff"] = (pd.Timestamp(df.end_date) - pd.Timestamp(df.start_date )).astype("timedelta64[d]")
pd.to_numeric(df["time_diff"])
based on some tutorials but this gives TypeError: Cannot convert input for the first line. What do I need to change to get this running?
Here is a working example that converts the string columns of a dataframe to datetime type and saves the time difference between the datetime columns in a new column as a float (number of seconds):
import pandas as pd
from datetime import timedelta
tmp = [("2002-06-12","2009-03-01"),("2016-04-28","2022-03-14")]
df = pd.DataFrame(tmp,columns=["col1","col2"])
df["col1"]=pd.to_datetime(df["col1"])
df["col2"]=pd.to_datetime(df["col2"])
df["time_diff"]=df["col2"]-df["col1"]
df["time_diff"]=df["time_diff"].apply(timedelta.total_seconds)
Time difference in seconds can be converted to minutes or days by using simple math.
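As a minimal sketch of that "simple math", assuming the same two-row frame as above (the column names time_diff_minutes and time_diff_days are made up for illustration), dividing the seconds by 60 gives minutes and dividing by 86400 gives days, both as floats:

```python
import pandas as pd

df = pd.DataFrame(
    [("2002-06-12", "2009-03-01"), ("2016-04-28", "2022-03-14")],
    columns=["col1", "col2"],
)
df["col1"] = pd.to_datetime(df["col1"])
df["col2"] = pd.to_datetime(df["col2"])

# .dt.total_seconds() is the vectorized counterpart of applying timedelta.total_seconds
seconds = (df["col2"] - df["col1"]).dt.total_seconds()

# hypothetical column names, chosen only for this example
df["time_diff_minutes"] = seconds / 60      # 60 seconds per minute
df["time_diff_days"] = seconds / 86400      # 86400 seconds per day
print(df)
```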
Try:
import numpy as np
enddates = np.asarray([pd.Timestamp(end) for end in df.end_date.values])
startdates = np.asarray([pd.Timestamp(start) for start in df.start_date.values])
df['time_diff'] = (enddates - startdates).astype("timedelta64")
First convert strings to datetime, then calculate difference in days.
df['start_date'] = pd.to_datetime(df['start_date'], format='%Y-%m-%d')
df['end_date'] = pd.to_datetime(df['end_date'], format='%Y-%m-%d')
df['time_diff'] = (df.end_date - df.start_date).dt.days
You can also do it by converting your columns to dates and then computing the difference:
from datetime import datetime
df = pd.DataFrame({'Start Date' : ['2002-06-12', '2002-06-12' ], 'End date' : ['2009-03-01', '2009-03-06']})
df['Start Date'] = [ datetime.strptime(x, "%Y-%m-%d").date() for x in df['Start Date'] ]
df['End date'] = [ datetime.strptime(x, "%Y-%m-%d").date() for x in df['End date'] ]
df['Diff'] = df['End date'] - df['Start Date']
Out :
End date Start Date Diff
0 2009-03-01 2002-06-12 2454 days
1 2009-03-06 2002-06-12 2459 days
You should just use pd.to_datetime to convert your string values:
df["time_diff"] = (pd.to_datetime(df.end_date) - pd.to_datetime(df.start_date))
The result will automatically be a timedelta64.
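If the new column should hold plain floats rather than timedelta64 values, as the question asks, one option is to divide by a one-day Timedelta. A small sketch, assuming a frame that uses the question's start_date and end_date column names with its sample values:

```python
import pandas as pd

df = pd.DataFrame({"start_date": ["2002-06-12"], "end_date": ["2009-03-01"]})

delta = pd.to_datetime(df["end_date"]) - pd.to_datetime(df["start_date"])

# Dividing a timedelta Series by a Timedelta yields float64;
# any fraction of a day would be kept rather than truncated.
df["time_diff"] = delta / pd.Timedelta(days=1)
print(df["time_diff"].dtype)
```

Use .dt.days instead if whole days as integers are enough.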
You can try this:
df = pd.DataFrame()
df['Arrived'] = [pd.Timestamp('01-04-2017')]
df['Left'] = [pd.Timestamp('01-06-2017')]
diff = df['Left'] - df['Arrived']
days = pd.Series(delta.days for delta in diff)
result = days[0]
I had a column in a data frame called startEndDate, for example '10.12-20.05.2019', and split it into columns start_date and end_date with the same year, for example start_date '10.12.2019' and end_date '20.05.2019'. But the year in this example is wrong: it should be 2018, because the start date cannot be after the end date. How can I check the entire dataframe and replace values so that it contains the correct start dates based on an if statement (some start dates should keep 2019 as the year)?
This will show you the rows in which Start_Date is later than End_Date:
import numpy as np
import pandas as pd
data = {
'Start_Date' : ['2020-01-01', '2020-01-02', '2020-01-03', '2020-01-04'],
'End_Date' : ['2020-02-01', '2019-01-02', '2019-01-03', '2020-01-05']
}
df = pd.DataFrame(data)
df['Start_Date'] = pd.to_datetime(df['Start_Date'])
df['End_Date'] = pd.to_datetime(df['End_Date'])
df['Check'] = np.where(df['Start_Date'] > df['End_Date'], 'Error', 'No Error')
df
Without seeing more of your data or your intended final data this is the best we will be able to do to help identify problems in the data.
This method first splits up the date string to two dates and creates start and end date columns. Then it subtracts 1 year from the start date if it is greater than the end date.
import pandas as pd
import numpy as np
# mock data
df = pd.DataFrame({"dates": ["10.12-20.05.2019", "02.04-31.10.2019"]})
# split date string to two dates, convert to datetime and stack to columns
df[["start", "end"]] = np.vstack(
df.dates.apply(lambda x: pd.to_datetime(
[x.split("-")[0] + x[-5:],
x.split("-")[1]], format="%d.%m.%Y")))
# subtract 1 year from start date if greater than end date
df["start"] = np.where(df["start"]>df["end"],
df["start"] - pd.DateOffset(years=1),
df["start"])
df
# dates start end
#0 10.12-20.05.2019 2018-12-10 2019-05-20
#1 02.04-31.10.2019 2019-04-02 2019-10-31
Although I have used split here for the initial splitting of the string, there is no need for it: the start date always takes up the first 5 characters before the hyphen and the year always takes up the last 5 (including the leading dot), so that line could instead be:
df[["start", "end"]] = np.vstack(
df.dates.apply(lambda x: pd.to_datetime(
[x[:5] + x[-5:],
x[6:]], format="%d.%m.%Y")))
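For comparison, here is a sketch of the same idea written with the vectorized .str accessor and Series.where instead of apply and np.vstack. It assumes the same mock data and the same fixed "dd.mm-dd.mm.YYYY" layout; it is an alternative formulation, not the method used above:

```python
import pandas as pd

# same mock data as above
df = pd.DataFrame({"dates": ["10.12-20.05.2019", "02.04-31.10.2019"]})

# split "dd.mm-dd.mm.YYYY" on the hyphen into two string columns
parts = df["dates"].str.split("-", expand=True)
year = parts[1].str[-5:]  # e.g. ".2019", including the leading dot

df["start"] = pd.to_datetime(parts[0] + year, format="%d.%m.%Y")
df["end"] = pd.to_datetime(parts[1], format="%d.%m.%Y")

# keep the start where it is not after the end, otherwise move it back one year
df["start"] = df["start"].where(
    df["start"] <= df["end"],
    df["start"] - pd.DateOffset(years=1),
)
print(df)
```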
So, basically, I have 2 df columns with date content. The initial content is in dd/mm/YYYY format, and I want to subtract one from the other. I can't subtract strings, so I converted them to datetime, but when I do that the day and month are sometimes swapped (the date is read as mm/dd/YYYY), so when I subtract them I get a wrong result. For example:
Initial Content:
a: 05/09/2021
b: 30/09/2021
result expected: 25 days.
Converted to DateTime:
a: 2021-05-09
b: 2021-09-30 (for some reason this date stays the same)
result: 144 days.
I'm using pandas and datetime to make this project.
So, I wanted to know how I can subtract these 2 columns and get the proper result.
--- Answer
When I used
pd.to_datetime(date, format="%d/%m/%Y")
It worked. Thank you all for your time. This is my first project in pandas. :)
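To illustrate why the explicit format matters, here is a small sketch using the two dates from the worked answer below (05/09/2021 and 30/09/2021); the variable names are made up for the example. Without a format, pandas reads an ambiguous string such as 05/09 month-first, while 30/09 can only be day-first, which is why only one of the two dates appeared to change:

```python
import pandas as pd

a, b = "05/09/2021", "30/09/2021"

# No format given: the ambiguous string is parsed month-first (9 May 2021)
guessed = pd.to_datetime(a)

# Explicit day-first format: parsed as 5 September 2021
start = pd.to_datetime(a, format="%d/%m/%Y")
end = pd.to_datetime(b, format="%d/%m/%Y")

print((end - guessed).days)  # 144, the wrong result from the question
print((end - start).days)    # 25, the expected result
```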
df = pd.DataFrame({'Date1': ['05/09/2021'], 'Date2': ['30/09/2021']})
df = df.apply(lambda x:pd.to_datetime(x,format=r'%d/%m/%Y')).assign(Delta=lambda x: (x.Date2-x.Date1).dt.days)
print(df)
Date1 Date2 Delta
0 2021-09-05 2021-09-30 25
I just answered a similar query here: subtracting dates in python
from datetime import datetime
date_format_str = '%Y-%m-%d %H:%M:%S.%f'
date_1 = '2016-09-24 17:42:27.839496'
date_2 = '2017-01-18 10:24:08.629327'
start = datetime.strptime(date_1, date_format_str)
end = datetime.strptime(date_2, date_format_str)
# get the interval between the two timestamps as a timedelta object
diff = end - start
diff_in_hours = diff.total_seconds() / 3600
print(diff_in_hours)
# get the difference between two dates as timedelta object
diff = end.date() - start.date()
print(diff.days)
Pandas
import pandas as pd
date_1 = '2016-09-24 17:42:27.839496'
date_2 = '2017-01-18 10:24:08.629327'
start = pd.to_datetime(date_1, format='%Y-%m-%d %H:%M:%S.%f')
end = pd.to_datetime(date_2, format='%Y-%m-%d %H:%M:%S.%f')
# get the difference between two datetimes as timedelta object
diff = end - start
print(diff.days)
Data
I import a date from an Excel workbook and store it in a variable called reportdate.
reportdate = pd.read_excel(file, header=None, nrows=1, usecols='A:B').dropna(axis=1, how='all').loc[0,1]
I then convert reportdate to a DataFrame using rdf = pd.DataFrame({'Date':[reportdate]}).
type(reportdate) returns pandas._libs.tslibs.timestamps.Timestamp.
reportdate returns Timestamp('2019-12-02 07:19:07.703000').
I don't know how to recreate reportdate to be that exact format and timestamp format.
Here is a sample data set.
df = pd.DataFrame({'CN ON': ['WD-D5','JF-04','P5'],
'Date Range': ['10/05/2019 - 11/06/2019','09/05/2019 - 12/15/2019','05/09/2019 - 10/25/2019']
})
What I do
I then parse apart Date Range to get the last date in the range and convert it to a datetime type.
df['End Date'] = df['Date Range'].str[-10:]
df['End Date'] = pd.to_datetime(df['End Date'], errors='coerce')
I need to calculate the day difference between reportdate and End Date.
What I try
Here is what I try.
df['ReportDate'] = reportdate
df['ReportDate'] = pd.to_datetime(df['ReportDate'], errors='coerce')
df['Days'] = df['End Date'] - df['ReportDate']
Then I check the types.
df.dtypes returns datetime64[ns] for both ReportDate and End Date.
What I need
I need the difference in days to be an integer or float because I need to check if those days are between certain values.
I keep getting the following error TypeError: ufunc subtract cannot use operands with types dtype('<U10') and dtype('<M8[ns]').
Any guidance on how I can get the difference in days between the dates as a number (int, float, etc.) would be greatly appreciated. I don't know where my TypeError is being thrown.
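The error message says that one operand is still a string (dtype('<U10')) rather than datetime64[ns], so the subtraction is running on a column that was never converted. errors='coerce' is a valid pd.to_datetime option (it turns unparseable values into NaT), but try removing it so that parsing problems raise an error instead of being hidden.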
import pandas as pd
df = pd.DataFrame({'CN ON': ['WD-D5','JF-04','P5'],
'Date Range': ['10/05/2019 - 11/06/2019','09/05/2019 - 12/15/2019','05/09/2019 - 10/25/2019']
})
df['End Date'] = df['Date Range'].str[-10:]
df['End Date'] = pd.to_datetime(df['End Date'])
df['ReportDate'] = '2019-12-02 07:19:08'
df['ReportDate'] = pd.to_datetime(df['ReportDate'])
df['Days'] = df['End Date'] - df['ReportDate']
print(df)
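The Days column above is still a timedelta. To get the number the question asks for, a sketch along these lines may help; it assumes the sample data and the reportdate value from the question, and the Recent column and the -30 to 0 bounds are invented purely to show a numeric range check:

```python
import pandas as pd

df = pd.DataFrame({"Date Range": ["10/05/2019 - 11/06/2019",
                                  "09/05/2019 - 12/15/2019",
                                  "05/09/2019 - 10/25/2019"]})
reportdate = pd.Timestamp("2019-12-02 07:19:07.703000")

# the last 10 characters of each range are the end date in mm/dd/YYYY form
df["End Date"] = pd.to_datetime(df["Date Range"].str[-10:], format="%m/%d/%Y")

# normalize() drops the time of day, so the difference is a whole number of days
df["Days"] = (df["End Date"] - reportdate.normalize()).dt.days

# integer days can now be compared against numeric bounds (bounds are hypothetical)
df["Recent"] = df["Days"].between(-30, 0)
print(df[["End Date", "Days", "Recent"]])
```

Without normalize(), .dt.days floors towards negative infinity, so an end date 26 days and a few hours before the report timestamp would come out as -27.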
I have a column with dates that look like this: 10-apr-18.
When I transpose my df or do anything else with it, pandas automatically sorts this column by the day (the first number), so it is not chronological.
I've tried to use to_datetime, but because the month is a string it won't work.
How can I convert this to a date OR disable the automatic sorting (my raw data is already in the right order)?
I suggest converting to datetimes with to_datetime and the format parameter:
df = pd.DataFrame({'dates':['10-may-18','10-apr-18']})
#also working for me
#df['dates'] = pd.to_datetime(df['dates'])
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%y')
df = df.sort_values('dates')
df['dates'] = df['dates'].dt.strftime('%d-%B-%y')
print (df)
dates
1 10-April-18
0 10-May-18
df = pd.DataFrame({'dates':['10-may-18','10-apr-18']})
#also working for me
#df['dates'] = pd.to_datetime(df['dates'])
df['datetimes'] = pd.to_datetime(df['dates'], format='%d-%b-%y')
df = df.sort_values('datetimes')
df['full'] = df['datetimes'].dt.strftime('%d-%B-%y')
print (df)
dates datetimes full
1 10-apr-18 2018-04-10 10-April-18
0 10-may-18 2018-05-10 10-May-18
df['dates'] = pd.to_datetime(df['dates'], format='%d-%b-%y').dt.strftime('%d/%B/%y')
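If the original strings should stay exactly as they are, another option is to sort on the parsed dates without storing them. This sketch assumes pandas 1.1 or later, where sort_values accepts a key argument; df_sorted is just an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({"dates": ["10-may-18", "10-apr-18"]})

# sort by the parsed dates, but leave the stored strings untouched
df_sorted = df.sort_values(
    "dates",
    key=lambda s: pd.to_datetime(s, format="%d-%b-%y"),
)
print(df_sorted)
```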
I am using a pandas dataframe that is loaded from csv files with dates in them. Let's say
Assigned Date
1/15/2019
Resolved Date
1/20/2019
I am calculating the difference:
df0['ResDate'] = df0['Resolved Date'].apply(lambda t: pd.to_datetime(t).date())
df0['RepDate'] = df0['Assigned Date'].apply(lambda t: pd.to_datetime(t).date())
df0['Woda']=df0['ResDate']-df0['RepDate']
I am getting the correct difference, but I need to subtract the weekends from it.
How do I proceed?
Thanks
Use numpy.busday_count:
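import numpy as np
df0['Assigned Date'] = pd.to_datetime(df0['Assigned Date'])
df0['Resolved Date'] = pd.to_datetime(df0['Resolved Date'])
# busday_count needs datetime64[D] values; it counts weekdays from the start date up to, but not including, the end date
df0['Woda'] = np.busday_count(df0['Assigned Date'].values.astype('datetime64[D]'), df0['Resolved Date'].values.astype('datetime64[D]'))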
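You can use the datetime module to find the difference between two dates. Note that the weekend adjustment below only subtracts weekends in full 7-day blocks, so it is approximate when the leftover days include a Saturday or Sunday: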
import datetime
d1 = datetime.datetime.strptime('2019-01-15', '%Y-%m-%d')
d2 = datetime.datetime.strptime('2019-01-20', '%Y-%m-%d')
diff_days = (d2 - d1).days
diff_weekdays = diff_days - (diff_days // 7) * 2
print(diff_weekdays)
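For an exact count with only the standard library, the leftover days after the full weeks can be checked one by one. The helper below is a sketch, not part of the answer above: count_weekdays is a hypothetical name, it assumes start is not after end, and it counts Monday to Friday from start up to but not including end (the same convention as numpy.busday_count):

```python
from datetime import date, timedelta

def count_weekdays(start: date, end: date) -> int:
    """Count Monday-Friday days in the half-open range [start, end)."""
    total_days = (end - start).days
    full_weeks, extra_days = divmod(total_days, 7)

    # every full week contributes exactly five weekdays
    count = full_weeks * 5

    # check the remaining (at most six) days individually
    first_extra = start + timedelta(days=full_weeks * 7)
    for offset in range(extra_days):
        if (first_extra + timedelta(days=offset)).weekday() < 5:  # 0-4 = Mon-Fri
            count += 1
    return count

# Tuesday 2019-01-15 up to Sunday 2019-01-20: Tue, Wed, Thu, Fri are counted
print(count_weekdays(date(2019, 1, 15), date(2019, 1, 20)))  # 4
```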