How to get inclusive difference between timestamps - python

I'd like to get the difference between the end and start date columns, inclusive of both endpoints:
df = pd.DataFrame()
df['start'] = pd.to_datetime(['1/1/2020','1/2/2020'])
df['end'] = pd.to_datetime(['1/31/2020', '1/25/2020'])
df['diff'] = df['end'] - df['start']
So instead of
start end diff
0 2020-01-01 2020-01-31 30 days
1 2020-01-02 2020-01-25 23 days
I want to get 31 and 24 days. I can solve it by adding a one-day Timedelta, but it seems a bit fragile. Is there any other way?
df['diff'] = df['end'] - df['start'] + pd.Timedelta(days=1)

That is exactly what you need. Adding pd.Timedelta(days=1) is how I do it when working with dates.
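If a plain integer count is more useful than a Timedelta, the same idea can be sketched with the `.dt.days` accessor. The column name `diff_days` below is just an illustrative choice, and the sketch assumes both columns hold dates without a time-of-day component.

```python
import pandas as pd

df = pd.DataFrame({
    "start": pd.to_datetime(["1/1/2020", "1/2/2020"]),
    "end": pd.to_datetime(["1/31/2020", "1/25/2020"]),
})

# Exclusive difference in whole days, plus 1 so both endpoints are counted
df["diff_days"] = (df["end"] - df["start"]).dt.days + 1

print(df["diff_days"].tolist())
```

With the sample data this gives 31 and 24, matching the result of adding a one-day Timedelta.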

Related

How to replace NaT in pandas dataframe with a date using information from other columns

noob here.
I have a dataframe that looks like this:
start  end  start_year
NaT    NaT  2020
NaT    NaT  2021
and I want to fill in the NaT's with the first and last day of the year listed in the start_year column. So it would look like this:
start       end         start_year
2020-01-01  2020-12-31  2020
2021-01-01  2021-12-31  2021
I tried to fill in the NaTs in the 'end' column like this:
df2.loc[df2['start'].isnull()
& df2['end'].isnull()
& df2['start_year'].notnull()
, "end"] = dt.date(df2["start_year"], 12, 31)
but I get this error:
TypeError: cannot convert the series to <class 'int'>
When I look at just the start year column it says this:
Name: start_year, Length: 4213, dtype: int64
I also tried using
df2["start_year"].values
but that didn't help.
Apologies if I'm just being an idiot. I searched around on here and google but couldn't find an answer.
For both the start and end columns, keep the value if it is present; otherwise fill it with the first day (for start) or the last day (for end) of the year:
df['start'] = df.apply(lambda x: x['start'] if pd.notna(x['start']) else pd.Timestamp(year=int(x['start_year']), month=1, day=1), axis=1)
df['end'] = df.apply(lambda x: x['end'] if pd.notna(x['end']) else pd.Timestamp(year=int(x['start_year']), month=12, day=31), axis=1)
Use:
#if necessary
#df['start'] = pd.to_datetime(df['start'])
#df['end'] = pd.to_datetime(df['end'])
#replace missing values by Year - first day and last day
df['start'] = df['start'].fillna(pd.to_datetime(df['start_year'],format='%Y'))
df['end'] = (df['end'].fillna(pd.to_datetime(df['start_year'].add(1), format='%Y')
- pd.Timedelta('1 day')))
print (df)
start end start_year
0 2020-01-01 2020-12-31 2020
1 2021-01-01 2021-12-31 2021
df['start_year'].apply(pd.Period).dt.to_timestamp('A')
output:
0 2020-12-31
1 2021-12-31
Name: start_year, dtype: datetime64[ns]
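On the error in the question: dt.date expects a single integer year, so passing the whole start_year Series raises the TypeError. Another way to build the dates column-wise, as a rough sketch, is to hand pd.to_datetime a frame of year/month/day components. The sample frame below is made up to mirror the question, and the names `first_day` and `last_day` are illustrative only.

```python
import pandas as pd

# Toy frame mirroring the question: both date columns start out as NaT
df2 = pd.DataFrame({
    "start": pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]"),
    "end": pd.Series([pd.NaT, pd.NaT], dtype="datetime64[ns]"),
    "start_year": [2020, 2021],
})

# to_datetime accepts a DataFrame with year/month/day columns
first_day = pd.to_datetime(
    pd.DataFrame({"year": df2["start_year"], "month": 1, "day": 1})
)
last_day = pd.to_datetime(
    pd.DataFrame({"year": df2["start_year"], "month": 12, "day": 31})
)

# fillna only touches the missing entries, aligned on the index
df2["start"] = df2["start"].fillna(first_day)
df2["end"] = df2["end"].fillna(last_day)

print(df2)
```

Rows that already have a start or end date keep their values, since fillna leaves non-missing entries alone.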

Python / Pandas convert object with date and time into separate columns

I have two columns called Start_Time and End_Time, they each contain dates in this format: "dd/mm/yyyy hh:mm".
I am trying to extract/ clean up the info so that Start/ End Time show time only, and that new columns Start_Date and End_Date show date only.
I have seen 101 examples of this online and by all accounts the following should work:
df['Start_Date'] = pd.to_datetime(df['Start_Time']).dt.date
df['Start_Time'] = pd.to_datetime(df['Start_Time']).dt.time
df['End_Date'] = pd.to_datetime(df['End_Time']).dt.date
df['End_Time'] = pd.to_datetime(df['End_Time']).dt.time
I am getting the following error however:
"TypeError: <class 'datetime.time'> is not convertible to datetime"
Start_Time and End_Time are currently objects - I have tried converting them to type datetime but also run into errors.
Can anyone tell me what I am doing wrong? Thank you!
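Use DataFrame.assign so that every new column is computed from the original strings before Start_Time and End_Time are overwritten, and pass dayfirst=True because the dates are in dd/mm/yyyy order: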
df = pd.DataFrame({'Start_Time':['02/03/2022 15:15','05/03/2022 15:15'],
'End_Time':['12/03/2022 20:15','07/04/2022 20:15']})
df = df.assign(Start_Date = pd.to_datetime(df['Start_Time'], dayfirst=True).dt.date,
Start_Time = pd.to_datetime(df['Start_Time'], dayfirst=True).dt.time,
End_Date = pd.to_datetime(df['End_Time'], dayfirst=True).dt.date,
End_Time = pd.to_datetime(df['End_Time'], dayfirst=True).dt.time )
print (df)
Start_Time End_Time Start_Date End_Date
0 15:15:00 20:15:00 2022-03-02 2022-03-12
1 15:15:00 20:15:00 2022-03-05 2022-04-07
Or use helper variables:
df = pd.DataFrame({'Start_Time':['02/03/2022 15:15','05/03/2022 15:15'],
'End_Time':['12/03/2022 20:15','07/04/2022 20:15']})
s1 = pd.to_datetime(df['Start_Time'], dayfirst=True)
s2 = pd.to_datetime(df['End_Time'], dayfirst=True)
df['Start_Date'] = s1.dt.date
df['End_Date'] = s2.dt.date
df['Start_Time'] = s1.dt.time
df['End_Time'] = s2.dt.time
print (df)
Start_Time End_Time Start_Date End_Date
0 15:15:00 20:15:00 2022-03-02 2022-03-12
1 15:15:00 20:15:00 2022-03-05 2022-04-07
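The error message suggests the column already held datetime.time objects when pd.to_datetime was called, which would be consistent with running the conversion a second time on an already-converted column (that is a guess, since the question doesn't show it). A small sketch that parses each column exactly once, with an explicit format instead of dayfirst=True, could look like this:

```python
import pandas as pd

df = pd.DataFrame({
    "Start_Time": ["02/03/2022 15:15", "05/03/2022 15:15"],
    "End_Time": ["12/03/2022 20:15", "07/04/2022 20:15"],
})

for prefix in ("Start", "End"):
    # Parse the original string once; the format matches "dd/mm/yyyy hh:mm"
    parsed = pd.to_datetime(df[f"{prefix}_Time"], format="%d/%m/%Y %H:%M")
    # Derive both pieces from the parsed values, then overwrite the source
    df[f"{prefix}_Date"] = parsed.dt.date
    df[f"{prefix}_Time"] = parsed.dt.time

print(df)
```

Re-running the loop on the converted frame would fail again, so it is safest to start from the raw strings each time.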

How to get rid of MonthEnds type

I am trying to get the delta in months between a starting date and an ending date within Pandas DataFrame. The result is not totally satisfying...
First, the outcome is some sort of Datetime type in the form of <[value] * MonthEnds>. I can't use this to calculate with. First question is how to convert this to an integer. I tried the .n attribute but then I get the following error:
AttributeError: 'Series' object has no attribute 'n'
Second, the outcome is 'missing' one month. Can this be avoided by using another solution/method? Or should I just add 1 month to the answer?
To support my questions I created some simplified code:
dates = [{'Start':'1-1-2020', 'End':'31-10-2020'}, {'Start':'1-2-2020', 'End':'30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)
df['Duration'] = (df['End'].dt.to_period('M') - df['Start'].dt.to_period('M'))
df
This results in:
Start End Duration
0 2020-01-01 2020-10-31 <9 * MonthEnds>
1 2020-02-01 2020-11-30 <9 * MonthEnds>
The preferred result would be:
Start End Duration
0 2020-01-01 2020-10-31 10
1 2020-02-01 2020-11-30 10
Subtract the start-date from the end-date and convert the time delta to months.
import pandas as pd
dates = [{'Start':'1-1-2020', 'End':'31-10-2020'}, {'Start':'1-2-2020', 'End':'30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)
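# Newer pandas versions reject .astype('<m8[M]'); dividing by numpy's average month length (30.436875 days) gives the same result
df['Duration'] = ((df['End']-df['Start']).dt.days / 30.436875).astype(int)+1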
print(df)
Output:
Start End Duration
0 2020-01-01 2020-10-31 10
1 2020-02-01 2020-11-30 10
Try this, which approximates a month as 30 days:
dates = [{'Start':'1-1-2020', 'End':'31-10-2020'}, {'Start':'1-2-2020', 'End':'30-11-2020'}]
df = pd.DataFrame(dates)
df['Start'] = pd.to_datetime(df['Start'], dayfirst=True)
df['End'] = pd.to_datetime(df['End'], dayfirst=True)
df['Duration'] = (df['End'] - df['Start']).apply(lambda x:x.days//30)
print(df)
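On the first question: the .n attribute belongs to each individual offset object, not to the Series, so it has to be read element by element. The sketch below shows that, plus a plain year/month calculation that avoids offsets altogether. The column name `Duration_n` is only for illustration, and the `+ 1` assumes you want to count both the starting and the ending month.

```python
import pandas as pd

df = pd.DataFrame({
    "Start": pd.to_datetime(["1-1-2020", "1-2-2020"], dayfirst=True),
    "End": pd.to_datetime(["31-10-2020", "30-11-2020"], dayfirst=True),
})

# Option 1: read .n from each MonthEnd offset in the period difference
offsets = df["End"].dt.to_period("M") - df["Start"].dt.to_period("M")
df["Duration_n"] = offsets.apply(lambda off: off.n) + 1

# Option 2: year/month arithmetic, no offset objects involved
df["Duration"] = (
    (df["End"].dt.year - df["Start"].dt.year) * 12
    + (df["End"].dt.month - df["Start"].dt.month)
    + 1
)

print(df)
```

Both columns come out as 10 for the sample rows. Unlike a days-based approximation, these count calendar months, so the result doesn't depend on how long each month is.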

How to find the difference between two formatted dates in days?

I have a pandas DataFrame with the following content:
df =
start end
01/April 02/May
12/April 12/April
I need to add a column with the difference (in days) between end and start values (end - start).
How can I do it?
I tried the following:
import pandas as pd
df.startdate = pd.datetime(df.start, format='%B/%d')
df.enddate = pd.datetime(df.end, format='%B/%d')
But not sure if this is a right direction.
import pandas as pd
df = pd.DataFrame({"start":["01/April", "12/April"], "end": ["02/May", "12/April"]})
df["start"] = pd.to_datetime(df["start"])
df["end"] = pd.to_datetime(df["end"])
df["diff"] = (df["end"] - df["start"])
Output:
end start diff
0 2018-05-02 2018-04-01 31 days
1 2018-04-12 2018-04-12 0 days
This is one way: append an explicit year so that a format string can parse the dates.
df['start'] = pd.to_datetime(df['start']+'/2018', format='%d/%B/%Y')
df['end'] = pd.to_datetime(df['end']+'/2018', format='%d/%B/%Y')
df['diff'] = df['end'] - df['start']
# start end diff
# 0 2018-04-01 2018-05-02 31 days
# 1 2018-04-12 2018-04-12 0 days
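If the new column should hold plain integers rather than Timedelta values, `.dt.days` does the conversion. This is only a sketch: the year is an assumption (the data carries none, and it matters when the range spans February of a leap year), the `YEAR` constant and `diff_days` column name are made up for the example, and `%B` relies on English month names in the current locale.

```python
import pandas as pd

df = pd.DataFrame({"start": ["01/April", "12/April"],
                   "end": ["02/May", "12/April"]})

YEAR = 2018  # assumed year; pick whichever year the data belongs to

for col in ("start", "end"):
    # Append the year so the format string describes the whole value
    df[col] = pd.to_datetime(df[col] + f"/{YEAR}", format="%d/%B/%Y")

# Integer number of days between end and start
df["diff_days"] = (df["end"] - df["start"]).dt.days

print(df)
```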

Pandas get days between two dates from a particular month

I have a pandas dataframe with three columns. A start and end date and a month.
I would like to add a column for how many days within the month are between the two dates. I started doing something with apply, the calendar library and some math, but it started to get really complex. I bet pandas has a simple solution, but am struggling to find it.
Input:
import pandas as pd
df1 = pd.DataFrame(data=[['2017-01-01', '2017-06-01', '2016-01-01'],
['2015-03-02', '2016-02-10', '2016-02-01'],
['2011-01-02', '2018-02-10', '2016-03-01']],
columns=['start date', 'end date date', 'Month'])
Desired Output:
start date end date date Month Days in Month
0 2017-01-01 2017-06-01 2016-01-01 0
1 2015-03-02 2016-02-10 2016-02-01 10
2 2011-01-02 2018-02-10 2016-03-01 31
Here is one solution:
build the list of dates between the start and end dates with pd.date_range, then count how many of them have the same year and month as the target month.
def overlap(row):
    # Target month as a Timestamp
    md = pd.to_datetime(row.iloc[2])
    # (year, month) of every day from start to end, inclusive
    cand = [(ad.year, ad.month) for ad in pd.date_range(row.iloc[0], row.iloc[1])]
    return len([ym for ym in cand if ym == (md.year, md.month)])
df1["Days in Month"] = df1.apply(overlap, axis=1)
You'll get:
start date end date date Month Days in Month
0 2017-01-01 2017-06-01 2016-01-01 0
1 2015-03-02 2016-02-10 2016-02-01 10
2 2011-01-02 2018-02-10 2016-03-01 31
You can convert every column to datetime with
df = df.apply(pd.to_datetime)
Then find the days of overlap with this function, which counts both endpoints
def intersectionDaysInMonth(start, end, month):
    # First day of the following month; DateOffset also handles December
    next_month = month + pd.DateOffset(months=1)
    # Clamp [start, end] to the month, then count the days inclusively
    first = max(start, month)
    last = min(end, next_month - pd.Timedelta(days=1))
    if last < first:
        return pd.to_timedelta(0)
    return last - first + pd.Timedelta(days=1)
Then apply it
df['Days in Month'] = df.apply(lambda row: intersectionDaysInMonth(*row).days, axis=1)
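The interval-overlap idea can also be written without apply, which may matter on larger frames. The following is a sketch under two assumptions: the Month column always holds the first day of the month, and both the start and end dates count as days in the range. The helper names `month_start`, `month_end`, `first` and `last` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    data=[["2017-01-01", "2017-06-01", "2016-01-01"],
          ["2015-03-02", "2016-02-10", "2016-02-01"],
          ["2011-01-02", "2018-02-10", "2016-03-01"]],
    columns=["start date", "end date date", "Month"],
).apply(pd.to_datetime)

month_start = df1["Month"]
# MonthEnd(0) rolls each date forward to the last day of its month
month_end = month_start + pd.offsets.MonthEnd(0)

# Later of (start, month start) and earlier of (end, month end)
first = df1["start date"].where(df1["start date"] > month_start, month_start)
last = df1["end date date"].where(df1["end date date"] < month_end, month_end)

# Inclusive day count; negative values mean no overlap, so floor at 0
df1["Days in Month"] = ((last - first).dt.days + 1).clip(lower=0)

print(df1)
```

For the sample rows this yields 0, 10 and 31, the same as the desired output in the question.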
