How to get number of days between two dates using pandas

I'm trying to get the number of days between two dates using the function below:
df['date'] = pd.to_datetime(df.date)
# Creating a function that returns the number of days
def calculate_days(date):
    today = pd.Timestamp('today')
    return today - date
# Apply the function to the column date
df['days'] = df['date'].apply(lambda x: calculate_days(x))
The result looks like this:
153 days 10:16:46.294037
but I want it to say 153. How do I handle this?

For better performance, subtract the values directly instead of using apply, which loops over the rows. Series.rsub subtracts from the right side:
df['date'] = pd.to_datetime(df.date)
df['days'] = df['date'].rsub(pd.Timestamp('today')).dt.days
This works the same as:
df['days'] = (pd.Timestamp('today') - df['date']).dt.days
If you want to keep your solution, return the days attribute of each Timedelta:
df['date'] = pd.to_datetime(df.date)
def calculate_days(date):
    today = pd.Timestamp('today')
    return (today - date).days
df['days'] = df['date'].apply(lambda x: calculate_days(x))
Or take .dt.days from the resulting Series:
df['date'] = pd.to_datetime(df.date)
def calculate_days(date):
    today = pd.Timestamp('today')
    return (today - date)
df['days'] = df['date'].apply(lambda x: calculate_days(x)).dt.days
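The approach above can be sketched end to end as follows. This is a minimal example under stated assumptions: the two dates are made up, and a fixed `reference` timestamp stands in for `pd.Timestamp('today')` so the result is reproducible.

```python
import pandas as pd

# Hypothetical sample data; the original question does not show its dates.
df = pd.DataFrame({"date": ["2023-01-01", "2023-03-15"]})
df["date"] = pd.to_datetime(df["date"])

# Fixed stand-in for pd.Timestamp("today"), so the output does not change from day to day.
reference = pd.Timestamp("2023-06-03 10:16:46")

delta = reference - df["date"]                   # Series of Timedelta values
df["days"] = delta.dt.days                       # whole days only, time of day dropped
df["days_float"] = delta / pd.Timedelta(days=1)  # fractional days as floats

print(df)
```

`.dt.days` drops the time-of-day part, while dividing by `pd.Timedelta(days=1)` keeps it as a fraction.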

df['date'] = pd.to_datetime(df.date)
a) pandas
(pd.Timestamp("today") - df.date).dt.days
b) this NumPy built-in function lets you select a weekmask
np.busday_count(df.date.values.astype('datetime64[D]'), np.datetime64('today'), weekmask=[1,1,1,1,1,1,1])

Related

In column dataframe, how do I find the date just before a given date

I have the following DF :
Date
01/07/2022
10/07/2022
20/07/2022
The date x is
12/07/2022
So basically the function should return
10/07/2022
I am trying to avoid looping over the whole column but I don't know how to specify that I want the max date before a given date.
max(DF['Date']) # Returns 20/07/2022

Try this:
d = '12/07/2022'
f = '%d/%m/%Y'
(pd.to_datetime(df['Date'], format=f)
   .where(lambda x: x.lt(pd.to_datetime(d, format=f)))
   .max())

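You can filter the dates with a boolean mask. This assumes df.Date has already been converted with pd.to_datetime; dayfirst=True makes '12/07/2022' parse as 12 July rather than 7 December:
df[df.Date < pd.to_datetime('12/07/2022', dayfirst=True)]
Then find the max:
max(df[df.Date < pd.to_datetime('12/07/2022', dayfirst=True)].Date)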

# Setting some stuff up
Date = ["01/07/2022", "10/07/2022", "20/07/2022"]
df = pd.DataFrame({"Date":Date})
df.Date = pd.to_datetime(df.Date, format='%d/%m/%Y')
target_date = pd.to_datetime("12/07/2022", format='%d/%m/%Y')
df = df.sort_values(by=["Date"]) # Sort by date
# Find all dates that are before target date, then choose the last one (i.e. the most recent one)
df.Date[df.Date < target_date][-1:].dt.date.values[0]
Output:
datetime.date(2022, 7, 10)
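A compact sketch of the mask-then-max idea, using the dates from the question. The names `dates`, `target` and `previous` are only illustrative. It also shows why an explicit format matters for day-first strings.

```python
import pandas as pd

# Dates from the question, parsed explicitly as day/month/year.
dates = pd.to_datetime(
    pd.Series(["01/07/2022", "10/07/2022", "20/07/2022"]), format="%d/%m/%Y"
)
target = pd.Timestamp(year=2022, month=7, day=12)

# Keep only the dates strictly before the target, then take the latest one.
previous = dates[dates < target].max()
print(previous)

# Pitfall: without a format or dayfirst=True, this string is read month-first.
ambiguous = pd.to_datetime("12/07/2022")
print(ambiguous.month)

# When no earlier date exists, max() of the empty selection is NaT.
nothing_before = dates[dates < pd.Timestamp("2000-01-01")].max()
```

No sorting is needed for this version, since `max()` scans the filtered values directly.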

Create new column based on multiple conditions of existing column while manipulating existing column

I am new to Python/pandas, coming from an R background. I am having trouble understanding how I can manipulate an existing column to create a new column based on multiple conditions of the existing column. There are 10 different conditions that need to be met, but for simplicity I will use a 2-case scenario.
In R:
install.packages("lubridate")
library(lubridate)
df <- data.frame("Date" = c("2020-07-01", "2020-07-15"))
df$Date <- as.Date(df$Date, format = "%Y-%m-%d")
df$Fiscal <- ifelse(day(df$Date) > 14,
                    paste0(year(df$Date),"-",month(df$Date) + 1,"-01"),
                    paste0(year(df$Date),"-",month(df$Date),"-01")
                    )
df$Fiscal <- as.Date(df$Fiscal, format = "%Y-%m-%d")
In Python I have:
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst = True, format = "%Y-%m-%d")
df.loc[df['Date'].dt.day > 14,
'Fiscal'] = "-".join([dt.datetime.strftime(df['Date'].dt.year), dt.datetime.strftime(df['Date'].dt.month + 1),"01"])
df.loc[df['Date'].dt.day <= 14,
'Fiscal'] = "-".join([dt.datetime.strftime(df['Date'].dt.year), dt.datetime.strftime(df['Date'].dt.month),"01"])
If I don't convert the 'Date' field it says that it expects a string, however if I do convert the date field, I still get an error as it seems it is applying to a 'Series' object.
TypeError: descriptor 'strftime' for 'datetime.date' objects doesn't apply to a 'Series' object
I understand I may have some terminology or concepts incorrect and apologize, however the answers I have seen dealing with creating a new column with multiple conditions do not seem to be manipulating the existing column they are checking the condition on, and simply taking on an assigned value. I can only imagine there is a more efficient way of doing this that is less 'R-ey' but I am not sure where to start.

This isn't intended as a full answer, just as an illustration of how strftime works: strftime is a method of a date(time) object that takes a format string as its argument:
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst = True, format = "%Y-%m-%d")
s = [dt.date(df['Date'][i].year, df['Date'][i].month + 1, 1).strftime('%Y-%m-%d')
     for i in df['Date'].index]
print(s)
Result:
['2020-08-01', '2020-08-01']
Again: No full answer, just a hint.
EDIT: You can apply this to the whole column without writing an explicit loop, for example:
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst=True, format='%Y-%m-%d')
# Note: d.month + 1 raises ValueError for dates in December
df['Fiscal'] = df['Date'].apply(lambda d: dt.date(d.year, d.month, 1)
                                if d.day < 15 else
                                dt.date(d.year, d.month + 1, 1))
print(df)
print(df)
Result:
Date Fiscal
0 2020-07-01 2020-07-01
1 2020-07-15 2020-08-01
Here I'm using an on-the-fly lambda function. You could also do it with an externally defined function:
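def to_fiscal(date):
    if date.day < 15:
        return dt.date(date.year, date.month, 1)
    # Roll over to January of the next year when the month is December
    if date.month == 12:
        return dt.date(date.year + 1, 1, 1)
    return dt.date(date.year, date.month + 1, 1)
df['Fiscal'] = df['Date'].apply(to_fiscal)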
In general, vectorisation is better than looping over rows because the looping is done at a lower level, which is much more efficient. Note that apply still calls the Python function once per row, so it is not vectorised in the strict sense.
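For comparison, here is a sketch of a column-wise version that avoids a per-row Python function. It assumes the same rule as in the question (days after the 14th roll to the first of the next month). The third date is an extra made-up row added to show the December rollover, and the names `month_start` and `next_month_start` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2020-07-01", "2020-07-15", "2020-12-20"])})

# First day of each date's own month.
month_start = df["Date"].dt.to_period("M").dt.to_timestamp()

# First day of the following month; the offset handles the year change in December.
next_month_start = month_start + pd.DateOffset(months=1)

# Keep month_start where the day is 14 or earlier, otherwise use next_month_start.
df["Fiscal"] = month_start.where(df["Date"].dt.day <= 14, next_month_start)

print(df)
```

With ten conditions rather than two, the same pattern could be extended with `numpy.select`, which takes a list of boolean masks and a matching list of result columns.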

Until someone tells me otherwise I will do it this way. If there's a way to do it vectorized (or just a better way in general) I would greatly appreciate it
import pandas as pd
import datetime as dt
df = {'Date': ['2020-07-01', '2020-07-15']}
df = pd.DataFrame(df)
df['Date'] = pd.to_datetime(df['Date'], yearfirst=True, format='%Y-%m-%d')
test_list = list()
for i in df['Date'].index:
    mth = df['Date'][i].month
    yr = df['Date'][i].year
    dy = df['Date'][i].day
    if dy > 14:
        new_date = dt.date(yr, mth + 1, 1)
    else:
        new_date = dt.date(yr, mth, 1)
    test_list.append(new_date)
df['New_Date'] = test_list

How to remove year from datetime to plot years on top of each other

I'm trying to compare 10 years of data. I would like to remove the 'year' from the datetime, so I can plot each January on top of each other.
I've tried the following
df_data = pd.read_csv("P11-B2.csv", skiprows=[i for i in range(1,35)], usecols=[1,2,4])
df = pd.DataFrame(columns = ['Datetime', 'FH'])
df1 = pd.to_datetime(df_data['YYYYMMDD'], format='%Y%m%d')
df2 = df_data[' HH'].astype('timedelta64[h]')
df['Datetime'] = df1 + df2
df['FH'] = pd.to_numeric(df_data[' FH'], errors ='coerce')
del df1
del df2
del df_data
df['month'] = pd.DatetimeIndex(df['Datetime']).month
df100 = pd.to_datetime(df['month'], format='%m')
df['day'] = pd.DatetimeIndex(df['Datetime']).day
df101 = pd.to_datetime(df['day'], format='%d')
df['hour'] = pd.DatetimeIndex(df['Datetime']).hour
df102 = df['hour'].astype('timedelta64[h]')
df['year'] = 1900
df104 = pd.to_datetime(df['year'], format='%Y')
#df['DATE'] = df104 + df100 + df101 + df102
df['DATE'] = df['year'] + df['month'] + df['day'] + df['hour']
Though this returns an integer.
Is there a different way to only remove the year and keep the %m%d%H format?
Or is there a simple way to override the x-axis and use the integer?
This is what I would like to plot:
I want to make a plot for each month, showing a different line for each year.
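One possible way to line the years up, sketched under assumptions: the sample values are made up, and the `md_hour` key and the `wide` table are hypothetical names. The idea is to build a year-free key from the datetime and pivot so that each year becomes its own column.

```python
import pandas as pd

# Tiny made-up stand-in for the Datetime/FH frame described in the question.
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2019-01-01 00:00", "2019-01-01 01:00",
        "2020-01-01 00:00", "2020-01-01 01:00",
    ]),
    "FH": [1.0, 2.0, 3.0, 4.0],
})

df["year"] = df["Datetime"].dt.year
# Year-free key that keeps month, day and hour; zero padding makes it sort correctly.
df["md_hour"] = df["Datetime"].dt.strftime("%m-%d %H")

# One row per month-day-hour, one column per year.
wide = df.pivot(index="md_hour", columns="year", values="FH")
print(wide)

# With matplotlib installed, wide.plot() would draw one line per year on a shared x-axis.
```

A single month can be selected before pivoting, for example with `df[df["Datetime"].dt.month == 1]`, to get one plot per month.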

difference between 2 dates in days saved into new column as float

I have a python dataframe with 2 columns that contain dates as strings e.g. start_date "2002-06-12" and end_date "2009-03-01". I would like to calculate the difference (days) between these 2 columns for each row and save the results into a new column called for example time_diff of type float.
I have tried:
df["time_diff"] = (pd.Timestamp(df.end_date) - pd.Timestamp(df.start_date )).astype("timedelta64[d]")
pd.to_numeric(df["time_diff"])
based on some tutorials but this gives TypeError: Cannot convert input for the first line. What do I need to change to get this running?

Here is a working example that converts the string columns of a dataframe to datetime type and saves the time difference between them in a new column as a float (number of seconds):
import pandas as pd
tmp = [("2002-06-12","2009-03-01"),("2016-04-28","2022-03-14")]
df = pd.DataFrame(tmp,columns=["col1","col2"])
df["col1"]=pd.to_datetime(df["col1"])
df["col2"]=pd.to_datetime(df["col2"])
df["time_diff"]=df["col2"]-df["col1"]
df["time_diff"]=df["time_diff"].dt.total_seconds()
The time difference in seconds can be converted to minutes or days by dividing by 60 or 86400.

Try:
import numpy as np
enddates = pd.to_datetime(df.end_date).to_numpy()
startdates = pd.to_datetime(df.start_date).to_numpy()
# Dividing by a one-day timedelta gives the difference as float days
df['time_diff'] = (enddates - startdates) / np.timedelta64(1, 'D')

First convert strings to datetime, then calculate difference in days.
df['start_date'] = pd.to_datetime(df['start_date'], format='%Y-%m-%d')
df['end_date'] = pd.to_datetime(df['end_date'], format='%Y-%m-%d')
df['time_diff'] = (df.end_date - df.start_date).dt.days

You can also do it by converting your columns to dates and then computing the difference:
from datetime import datetime
df = pd.DataFrame({'Start Date' : ['2002-06-12', '2002-06-12' ], 'End date' : ['2009-03-01', '2009-03-06']})
df['Start Date'] = [ datetime.strptime(x, "%Y-%m-%d").date() for x in df['Start Date'] ]
df['End date'] = [ datetime.strptime(x, "%Y-%m-%d").date() for x in df['End date'] ]
df['Diff'] = df['End date'] - df['Start Date']
Out:
     End date  Start Date       Diff
0  2009-03-01  2002-06-12  2454 days
1  2009-03-06  2002-06-12  2459 days

You should just use pd.to_datetime to convert your string values:
df["time_diff"] = (pd.to_datetime(df.end_date) - pd.to_datetime(df.start_date))
The result will automatically be a timedelta64 column.

You can try this :
df = pd.DataFrame()
df['Arrived'] = [pd.Timestamp('01-04-2017')]
df['Left'] = [pd.Timestamp('01-06-2017')]
diff = df['Left'] - df['Arrived']
days = pd.Series(delta.days for delta in diff)
result = days[0]

How can I subtract weekends in my list of dates?

I am using a pandas dataframe that is loaded from csv files with dates in them. Let's say:
Assigned Date
1/15/2019
Resolved Date
1/20/2019
I am calculating the difference:
df0['ResDate'] = df0['Resolved Date'].apply(lambda t: pd.to_datetime(t).date())
df0['RepDate'] = df0['Assigned Date'].apply(lambda t: pd.to_datetime(t).date())
df0['Woda']=df0['ResDate']-df0['RepDate']
I am getting the correct difference, but I need to subtract the weekends from it.
How do I proceed?
Thanks

Use numpy.busday_count, which expects day-precision dates (datetime64[D]):
import numpy as np
df0['Assigned Date'] = pd.to_datetime(df0['Assigned Date'])
df0['Resolved Date'] = pd.to_datetime(df0['Resolved Date'])
df0['Woda'] = np.busday_count(df0['Assigned Date'].values.astype('datetime64[D]'), df0['Resolved Date'].values.astype('datetime64[D]'))
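As a self-contained check with the dates from the question, the following sketch builds a one-row frame and counts the business days. The explicit `format` is an assumption that the csv stores dates as month/day/year.

```python
import numpy as np
import pandas as pd

df0 = pd.DataFrame({"Assigned Date": ["1/15/2019"], "Resolved Date": ["1/20/2019"]})

# Parse the strings, then truncate to day precision for NumPy's business-day functions.
start = pd.to_datetime(df0["Assigned Date"], format="%m/%d/%Y").values.astype("datetime64[D]")
end = pd.to_datetime(df0["Resolved Date"], format="%m/%d/%Y").values.astype("datetime64[D]")

# Counts Monday to Friday in the half-open range [start, end).
df0["Woda"] = np.busday_count(start, end)

print(df0)
```

The start date is included and the end date is excluded, so 15 January (a Tuesday) through 18 January are counted while the weekend of 19 and 20 January is skipped.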

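You can use the datetime module to find the difference between two dates. Subtracting two days per full week is only correct for spans that are a whole number of weeks, so count the weekdays directly:
import datetime
d1 = datetime.datetime.strptime('2019-01-15', '%Y-%m-%d')
d2 = datetime.datetime.strptime('2019-01-20', '%Y-%m-%d')
diff_days = (d2 - d1).days
# Count the days in [d1, d2) that fall on Monday to Friday
diff_weekdays = sum((d1 + datetime.timedelta(days=n)).weekday() < 5 for n in range(diff_days))
print(diff_weekdays)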
