I am using a pandas DataFrame loaded from CSV files that contain dates. For example:
Assigned Date
1/15/2019
Resolved Date
1/20/2019
I am calculating the difference:
df0['ResDate'] = df0['Resolved Date'].apply(lambda t: pd.to_datetime(t).date())
df0['RepDate'] = df0['Assigned Date'].apply(lambda t: pd.to_datetime(t).date())
df0['Woda']=df0['ResDate']-df0['RepDate']
I am getting the correct difference, but I need to subtract the weekends from it.
How do I proceed?
Thanks
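Use numpy.busday_count, which counts weekdays from the start date up to but not including the end date. It expects day-resolution values, so cast the parsed columns to datetime64[D] first:
df0['Assigned Date'] = pd.to_datetime(df0['Assigned Date'])
df0['Resolved Date'] = pd.to_datetime(df0['Resolved Date'])
df0['Woda'] = np.busday_count(df0['Assigned Date'].values.astype('datetime64[D]'), df0['Resolved Date'].values.astype('datetime64[D]'))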
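As a quick check of the busday_count approach, here is a minimal sketch on a tiny frame. The first row uses the dates from the question; the second row is a hypothetical extra added only for illustration.

```python
import numpy as np
import pandas as pd

# First row: dates from the question. Second row: hypothetical extra sample.
df0 = pd.DataFrame({
    "Assigned Date": ["1/15/2019", "1/18/2019"],
    "Resolved Date": ["1/20/2019", "1/22/2019"],
})

# busday_count works on day-resolution datetime64 values, so cast explicitly.
start = pd.to_datetime(df0["Assigned Date"], format="%m/%d/%Y").values.astype("datetime64[D]")
end = pd.to_datetime(df0["Resolved Date"], format="%m/%d/%Y").values.astype("datetime64[D]")

# Counts Monday-Friday dates in the half-open interval [start, end).
df0["Woda"] = np.busday_count(start, end)
print(df0["Woda"].tolist())  # [4, 2]
```

Because the end date is excluded, add one day to the end if the resolved date itself should count.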
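You can use the datetime module to find the difference between two dates. Note that the formula below only removes weekend days for complete 7-day blocks, so it is an approximation: for the sample dates (a Tuesday to a Sunday) it prints 5, although only 4 of those days are weekdays: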
import datetime
d1 = datetime.datetime.strptime('2019-01-15', '%Y-%m-%d')
d2 = datetime.datetime.strptime('2019-01-20', '%Y-%m-%d')
diff_days = (d2 - d1).days
diff_weekdays = diff_days - (diff_days // 7) * 2
print(diff_weekdays)
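The whole-week formula above does not look at which weekdays the range actually covers. A minimal standard-library sketch that counts the weekdays one by one might look like this; weekdays_between is a hypothetical helper name, not part of the datetime module.

```python
from datetime import date, timedelta


def weekdays_between(start, end):
    """Count Monday-Friday dates in the half-open range [start, end)."""
    total_days = (end - start).days
    return sum(
        1
        for offset in range(total_days)
        if (start + timedelta(days=offset)).weekday() < 5  # 0-4 are Mon-Fri
    )


d1 = date(2019, 1, 15)  # Tuesday
d2 = date(2019, 1, 20)  # Sunday

# Whole-week approximation versus the day-by-day count.
approx = (d2 - d1).days - ((d2 - d1).days // 7) * 2
exact = weekdays_between(d1, d2)
print(approx, exact)  # 5 4
```

The loop is linear in the number of days, which is fine for single pairs of dates; for whole columns a vectorized function such as numpy.busday_count is likely the better fit.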
I have the following DF :
Date
01/07/2022
10/07/2022
20/07/2022
The date x is
12/07/2022
So basically the function should return
10/07/2022
I am trying to avoid looping over the whole column but I don't know how to specify that I want the max date before a given date.
max(DF['Date']) #Returns 20/07/2022
Try this:
d = '12/07/2022'
f = '%d/%m/%Y'
(pd.to_datetime(df['Date'],format=f)
.where(lambda x: x.lt(pd.to_datetime(d,format=f)))
.max())
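You can filter the rows with a boolean mask (this assumes df.Date has already been converted to datetime; the format is given explicitly because the dates are day-first):
df[df.Date < pd.to_datetime('12/07/2022', format='%d/%m/%Y')]
Then find the max:
df[df.Date < pd.to_datetime('12/07/2022', format='%d/%m/%Y')].Date.max()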
# Setting some stuff up
Date = ["01/07/2022", "10/07/2022", "20/07/2022"]
df = pd.DataFrame({"Date":Date})
df.Date = pd.to_datetime(df.Date, format='%d/%m/%Y')
target_date = pd.to_datetime("12/07/2022", format='%d/%m/%Y')
df = df.sort_values(by=["Date"]) # Sort by date
# Find all dates that are before target date, then choose the last one (i.e. the most recent one)
df.Date[df.Date < target_date][-1:].dt.date.values[0]
Output:
datetime.date(2022, 7, 10)
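The same lookup can be done without sorting, by masking and taking the maximum. This is a minimal sketch using the sample dates from the question; latest_before is just an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["01/07/2022", "10/07/2022", "20/07/2022"]})
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
target_date = pd.to_datetime("12/07/2022", format="%d/%m/%Y")

# Keep only the dates strictly before the target, then take the latest one.
latest_before = df.loc[df["Date"] < target_date, "Date"].max()
print(latest_before.date())  # 2022-07-10
```

If no date lies before the target, max() returns NaT, so it is worth checking with pd.isna before using the result.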
I'm trying to get the number of days between two dates using the function below:
df['date'] = pd.to_datetime(df.date)
# Creating a function that returns the number of days
def calculate_days(date):
    today = pd.Timestamp('today')
    return today - date
# Apply the function to the column date
df['days'] = df['date'].apply(lambda x: calculate_days(x))
The result looks like this:
153 days 10:16:46.294037
but I want it to say 153. How do I handle this?
For performance, you can subtract the values directly and avoid the loop in apply. Use Series.rsub to subtract from the right side:
df['date'] = pd.to_datetime(df.date)
df['days'] = df['date'].rsub(pd.Timestamp('today')).dt.days
This is equivalent to:
df['days'] = (pd.Timestamp('today') - df['date']).dt.days
If you want to keep your solution:
df['date'] = pd.to_datetime(df.date)
def calculate_days(date):
    today = pd.Timestamp('today')
    return (today - date).days
df['days'] = df['date'].apply(lambda x: calculate_days(x))
Or:
df['date'] = pd.to_datetime(df.date)
def calculate_days(date):
    today = pd.Timestamp('today')
    return (today - date)
df['days'] = df['date'].apply(lambda x: calculate_days(x)).dt.days
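To see what .dt.days does to a value such as 153 days 10:16:46, here is a small sketch. It swaps pd.Timestamp('today') for a hypothetical fixed reference timestamp so that the numbers are reproducible; the sample dates are made up.

```python
import pandas as pd

# Made-up sample dates.
df = pd.DataFrame({"date": ["2023-01-01", "2023-06-15"]})
df["date"] = pd.to_datetime(df["date"])

# Hypothetical fixed "now" so the result does not change from day to day.
reference = pd.Timestamp("2023-07-01 10:16:46")

# .dt.days keeps only the whole-day component of each timedelta.
df["days"] = (reference - df["date"]).dt.days
print(df["days"].tolist())  # [181, 16]
```

The hours, minutes and seconds are simply dropped, so the result is an integer column.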
df['date'] = pd.to_datetime(df.date)
a) pandas
(pd.Timestamp("today") - df.date).dt.days
b) this numpy built-in function allows you to select a weekmask
np.busday_count(df.date.values.astype('datetime64[D]'), pd.Timestamp("today").date(), weekmask=[1,1,1,1,1,1,1])
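The effect of the weekmask can be seen on a fixed pair of dates. This sketch uses made-up sample dates and the string form of the mask, where each character stands for one day from Monday to Sunday.

```python
import numpy as np

start = np.datetime64("2019-01-15")  # Tuesday
end = np.datetime64("2019-01-20")    # Sunday

# Default mask counts Monday-Friday only; "1111111" counts every day.
weekdays_only = np.busday_count(start, end)
every_day = np.busday_count(start, end, weekmask="1111111")
print(weekdays_only, every_day)  # 4 5
```

With every day enabled, busday_count reduces to a plain calendar-day difference.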
Data
I import a date from an Excel workbook and store it in a variable called reportdate.
reportdate = pd.read_excel(file, header=None, nrows=1, usecols='A:B').dropna(axis=1, how='all').loc[0,1]
I then convert reportdate to a DataFrame using rdf = pd.DataFrame({'Date':[reportdate]}).
type(reportdate) returns pandas._libs.tslibs.timestamps.Timestamp.
reportdate returns Timestamp('2019-12-02 07:19:07.703000').
I don't know how to recreate reportdate with that exact type and format.
Here is a sample data set.
df = pd.DataFrame({'CN ON': ['WD-D5','JF-04','P5'],
'Date Range': ['10/05/2019 - 11/06/2019','09/05/2019 - 12/15/2019','05/09/2019 - 10/25/2019']
})
What I do
I then parse apart Date Range to get the last date in the range and convert it to a datetime type.
df['End Date'] = df['Date Range'].str[-10:]
df['End Date'] = pd.to_datetime(df['End Date'], errors='coerce')
I need to calculate the day difference between reportdate and End Date.
What I try
Here is what I try.
df['ReportDate'] = reportdate
df['ReportDate'] = pd.to_datetime(df['ReportDate'], errors='coerece')
df['Days'] = df['End Date'] - df['ReportDate']
Then I check the types.
df.dtypes returns datetime64[ns] for both ReportDate and End Date.
What I need
I need the difference in days to be an integer or float because I need to check if those days are between certain values.
I keep getting the following error TypeError: ufunc subtract cannot use operands with types dtype('<U10') and dtype('<M8[ns]').
Any guidance on how I can get the day difference between the dates as a number (int, float, etc.) would be greatly appreciated. I don't know where the TypeError is being thrown.
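The problem seems to be related to the errors argument. Note that errors='coerce' is itself a valid pandas option (it turns unparseable values into NaT), but the question's code misspells it as 'coerece' in one of the calls. Try removing the argument; the following runs without the TypeError: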
import pandas as pd
df = pd.DataFrame({'CN ON': ['WD-D5','JF-04','P5'],
'Date Range': ['10/05/2019 - 11/06/2019','09/05/2019 - 12/15/2019','05/09/2019 - 10/25/2019']
})
df['End Date'] = df['Date Range'].str[-10:]
df['End Date'] = pd.to_datetime(df['End Date'])
df['ReportDate'] = '2019-12-02 07:19:08'
df['ReportDate'] = pd.to_datetime(df['ReportDate'])
df['Days'] = df['End Date'] - df['ReportDate']
print(df)
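Since the question asks for a number that can be checked against a range, here is a minimal sketch of one way to get there. The -30 and 30 thresholds and the "In Window" column name are hypothetical, and splitting on " - " is a swapped-in alternative to slicing the last ten characters.

```python
import pandas as pd

df = pd.DataFrame({
    "CN ON": ["WD-D5", "JF-04", "P5"],
    "Date Range": ["10/05/2019 - 11/06/2019",
                   "09/05/2019 - 12/15/2019",
                   "05/09/2019 - 10/25/2019"],
})
reportdate = pd.Timestamp("2019-12-02 07:19:07.703000")

# Take the text after the " - " separator and parse it with an explicit format.
end_text = df["Date Range"].str.split(" - ").str[-1]
df["End Date"] = pd.to_datetime(end_text, format="%m/%d/%Y")

# Drop the time of day from reportdate, then keep the integer day count.
df["Days"] = (df["End Date"] - reportdate.normalize()).dt.days

# Hypothetical thresholds for the range check.
df["In Window"] = df["Days"].between(-30, 30)
print(df[["End Date", "Days", "In Window"]])
```

Normalizing reportdate to midnight avoids off-by-one results caused by the 07:19 time component.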
I need to convert a variable I created from a datetime into a timestamp.
I need it in a timestamp format to perform a lambda function against my pandas series, which is stored as a datetime64.
The lambda function should find the difference in months between startDate and the entire pandas series. Please help?
I've tried using relativedelta to calculate the difference in months but I'm not sure how to implement it with a pandas series.
from datetime import datetime
import pandas as pd
from dateutil.relativedelta import relativedelta as rd
#open the data set and store in the series ('df')
file = pd.read_csv("test_data.csv")
df = pd.DataFrame(file)
#extract column "AccountOpenedDate into a data frame"
open_date_data = pd.Series.to_datetime(df['AccountOpenedDate'], format = '%Y/%m/%d')
#set the variable startDate
dateformat = '%Y/%m/%d %H:%M:%S'
set_date = datetime.strptime('2017/07/01 00:00:00',dateformat)
startDate = datetime.timestamp(set_date)
#This function calculates the difference in months between two dates: ignore
def month_delta(start_date, end_date):
    delta = rd(end_date, start_date)
    # >>> relativedelta(years=+2, months=+3, days=+28)
    return 12 * delta.years + delta.months
d1 = datetime(2017, 7, 1)
d2 = datetime(2019, 10, 29)
total_months = month_delta(d1, d2)
# Apply a lambda function to each row by adding 5 to each value in each column
dfobj = open_date_data.apply(lambda x: x + startDate)
print(dfobj)
I'm only using a single column from the loaded data set. It's a date column in the following format ("%Y/%m/%d %H:%M:%S"). I want to find the difference in months between startDate and all the dates in the series.
As I don't have your original csv, I've made up some sample data and hopefully managed to shorten your code quite a bit:
open_date_data = pd.Series(pd.date_range('2017/07/01', periods=10, freq='M'))
startDate = pd.Timestamp("2017/07/01")
Then, with help from this answer to get the appropriate month_diff formula:
def month_diff(a, b):
    return 12 * (a.year - b.year) + (a.month - b.month)
open_date_data.apply(lambda x: month_diff(x, startDate))
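The same month_diff formula can also be written without apply by using the .dt accessor. This is a sketch on made-up sample dates, not the asker's CSV data.

```python
import pandas as pd

# Made-up sample dates standing in for the AccountOpenedDate column.
open_date_data = pd.Series(pd.to_datetime(["2017-07-31", "2018-01-15", "2019-10-29"]))
startDate = pd.Timestamp("2017-07-01")

# Same formula as month_diff, applied to the whole Series at once.
months = (12 * (open_date_data.dt.year - startDate.year)
          + (open_date_data.dt.month - startDate.month))
print(months.tolist())  # [0, 6, 27]
```

Like month_diff, this counts calendar-month boundaries crossed and ignores the day of the month.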
I have a python dataframe with 2 columns that contain dates as strings e.g. start_date "2002-06-12" and end_date "2009-03-01". I would like to calculate the difference (days) between these 2 columns for each row and save the results into a new column called for example time_diff of type float.
I have tried:
df["time_diff"] = (pd.Timestamp(df.end_date) - pd.Timestamp(df.start_date )).astype("timedelta64[d]")
pd.to_numeric(df["time_diff"])
based on some tutorials but this gives TypeError: Cannot convert input for the first line. What do I need to change to get this running?
Here is a working example that converts the string columns of a DataFrame to datetime and stores the difference between them in a new column as a float (number of seconds):
import pandas as pd
from datetime import timedelta
tmp = [("2002-06-12","2009-03-01"),("2016-04-28","2022-03-14")]
df = pd.DataFrame(tmp,columns=["col1","col2"])
df["col1"]=pd.to_datetime(df["col1"])
df["col2"]=pd.to_datetime(df["col2"])
df["time_diff"]=df["col2"]-df["col1"]
df["time_diff"]=df["time_diff"].apply(timedelta.total_seconds)
Time difference in seconds can be converted to minutes or days by using simple math.
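The "simple math" is a division by the number of seconds per unit. A minimal sketch, using the first sample row from above and the hypothetical column names diff_minutes and diff_days:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["2002-06-12"], "col2": ["2009-03-01"]})
delta = pd.to_datetime(df["col2"]) - pd.to_datetime(df["col1"])

# total_seconds() returns floats, so the divisions below also give floats.
seconds = delta.dt.total_seconds()
df["diff_minutes"] = seconds / 60
df["diff_days"] = seconds / 86_400  # 24 * 60 * 60 seconds per day
print(df["diff_days"].iloc[0])  # 2454.0
```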
Try:
import numpy as np
enddates = np.asarray([pd.Timestamp(end) for end in df.end_date.values])
startdates = np.asarray([pd.Timestamp(start) for start in df.start_date.values])
df['time_diff'] = (enddates - startdates).astype("timedelta64")
First convert strings to datetime, then calculate difference in days.
df['start_date'] = pd.to_datetime(df['start_date'], format='%Y-%m-%d')
df['end_date'] = pd.to_datetime(df['end_date'], format='%Y-%m-%d')
df['time_diff'] = (df.end_date - df.start_date).dt.days
You can also do it by converting your columns into date and then computing the difference :
from datetime import datetime
df = pd.DataFrame({'Start Date' : ['2002-06-12', '2002-06-12' ], 'End date' : ['2009-03-01', '2009-03-06']})
df['Start Date'] = [ datetime.strptime(x, "%Y-%m-%d").date() for x in df['Start Date'] ]
df['End date'] = [ datetime.strptime(x, "%Y-%m-%d").date() for x in df['End date'] ]
df['Diff'] = df['End date'] - df['Start Date']
Out :
     End date  Start Date       Diff
0  2009-03-01  2002-06-12  2454 days
1  2009-03-06  2002-06-12  2459 days
You should just use pd.to_datetime to convert your string values:
df["time_diff"] = (pd.to_datetime(df.end_date) - pd.to_datetime(df.start_date))
The result will automatically have a timedelta64 dtype.
You can try this :
df = pd.DataFrame()
df['Arrived'] = [pd.Timestamp('01-04-2017')]
df['Left'] = [pd.Timestamp('01-06-2017')]
diff = df['Left'] - df['Arrived']
days = pd.Series([delta.days for delta in diff])
result = days[0]