Subtract 2 datetime lists dd/mm/YYYY in pandas - python

I have two DataFrame columns containing dates as strings in dd/mm/YYYY format, and I want to subtract one from the other. Strings can't be subtracted, so I converted them to datetime, but after the conversion some values come out as YYYY-dd-mm (the day and month are swapped), so the subtraction gives a wrong result. For example:
Initial Content:
a: 05/09/2021
b: 30/09/2021
result expected: 25 days.
Converted to DateTime:
a: 2021-05-09
b: 2021-09-30 (for some reason this date stays the same)
result: 144 days.
I'm using pandas and datetime for this project.
How can I subtract these two columns and get the correct result?
--- Answer
When I used
pd.to_datetime(date, format="%d/%m/%Y")
It worked. Thank you all for your time. This is my first project in pandas. :)
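
To see why the explicit format matters, here is a minimal sketch that parses the question's first date both ways. It assumes both dates are in 2021, which is the reading under which the 25-day and 144-day figures in the question come out; the variable names are only illustrative.

```python
import pandas as pd

a, b = '05/09/2021', '30/09/2021'

# Month-first reading of `a`: 9 May 2021 (the unwanted interpretation)
a_month_first = pd.to_datetime(a, format='%m/%d/%Y')
# Day-first reading of `a`, matching the dd/mm/YYYY source data: 5 September 2021
a_day_first = pd.to_datetime(a, format='%d/%m/%Y')
# `b` can only be read day-first, because there is no month 30
b_day_first = pd.to_datetime(b, format='%d/%m/%Y')

wrong_days = (b_day_first - a_month_first).days
right_days = (b_day_first - a_day_first).days
print(wrong_days, right_days)
```

Passing the format explicitly removes the ambiguity for values such as 05/09, where both the day and the month are 12 or less.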

import pandas as pd

df = pd.DataFrame({'Date1': ['05/09/2021'], 'Date2': ['30/09/2021']})
# Parse both columns as day-first dates, then add the difference in whole days
df = df.apply(lambda x: pd.to_datetime(x, format='%d/%m/%Y')).assign(Delta=lambda x: (x.Date2 - x.Date1).dt.days)
print(df)
       Date1      Date2  Delta
0 2021-09-05 2021-09-30     25

I just answered a similar question here: subtracting dates in python
from datetime import datetime

date_format_str = '%Y-%m-%d %H:%M:%S.%f'
date_1 = '2016-09-24 17:42:27.839496'
date_2 = '2017-01-18 10:24:08.629327'
start = datetime.strptime(date_1, date_format_str)
end = datetime.strptime(date_2, date_format_str)
# difference between the two timestamps as a timedelta object
diff = end - start
# the same interval expressed in hours
diff_in_hours = diff.total_seconds() / 3600
print(diff_in_hours)
# difference between the two calendar dates (time of day ignored), in days
diff = end.date() - start.date()
print(diff.days)
With pandas:
import pandas as pd

date_1 = '2016-09-24 17:42:27.839496'
date_2 = '2017-01-18 10:24:08.629327'
start = pd.to_datetime(date_1, format='%Y-%m-%d %H:%M:%S.%f')
end = pd.to_datetime(date_2, format='%Y-%m-%d %H:%M:%S.%f')
# difference between the two datetimes as a Timedelta object
diff = end - start
# whole days in the interval (the time of day is taken into account)
print(diff.days)
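
One detail worth checking: the number of whole elapsed days and the number of calendar days between two timestamps can differ. The sketch below uses the same two timestamps and compares both counts; `Timestamp.normalize()` resets the time of day to midnight. The variable names are only illustrative.

```python
import pandas as pd

fmt = '%Y-%m-%d %H:%M:%S.%f'
start = pd.to_datetime('2016-09-24 17:42:27.839496', format=fmt)
end = pd.to_datetime('2017-01-18 10:24:08.629327', format=fmt)

elapsed = end - start
# Whole 24-hour periods contained in the interval
full_days = elapsed.days
# Calendar days between the two dates, ignoring the time of day
calendar_days = (end.normalize() - start.normalize()).days
# The same interval expressed in hours
hours = elapsed.total_seconds() / 3600
print(full_days, calendar_days, hours)
```

Which of the two counts is the right one depends on whether you care about elapsed time or about dates on a calendar.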

Related

Python PySpark subtract 1 year from given end date to work with one year of data range

I want to get one year of data.
I take the latest date in the date column as my end date, then subtract one year from it to get the start date. After that, I can filter the data between the start and end dates.
I managed to get the end date, but I can't work out how to get the start date.
Below is the code I have so far. The "- 1 year" step is what needs to be solved.
Advice on how to filter in PySpark is also welcome.
from pyspark.sql.functions import min, max
import datetime
import pyspark.sql.functions as F
from pyspark.sql.functions import date_format, col
# convert string to date type
df = df.withColumn('risk_date', F.to_date(F.col('chosen_risk_prof_date'), 'dd.MM.yyyy'))
# filter only 1 year of data from the big data set
# calculate the start date and end date; latest_date = end date
latest_date = df.select((max("risk_date"))).show()
start_date = latest_date - *1 year*
new_df = df.date > start_date & df.date < end_date
After this, I want to get all the data between the start date and the end date.
You can use relativedelta, as below:
from datetime import datetime
from dateutil.relativedelta import relativedelta
print(datetime.now() - relativedelta(years=1))
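
To connect this to the question, here is a minimal plain-Python sketch of the start/end logic, not PySpark code. The list `risk_dates` is hypothetical sample data standing in for the values of the risk_date column.

```python
from datetime import date
from dateutil.relativedelta import relativedelta

# Hypothetical sample values standing in for the 'risk_date' column
risk_dates = [date(2020, 3, 15), date(2021, 1, 10), date(2021, 9, 1), date(2022, 2, 28)]

# The latest date is the end of the window; one year earlier is the start
end_date = max(risk_dates)
start_date = end_date - relativedelta(years=1)

# Keep only the dates that fall inside the one-year window
one_year = [d for d in risk_dates if start_date < d <= end_date]
print(start_date, end_date)
print(one_year)

# relativedelta clamps to the last valid day, so a leap day maps to 28 February
leap_start = date(2024, 2, 29) - relativedelta(years=1)
print(leap_start)
```

The two bounds computed this way are ordinary date objects, so they can then be used as the limits of whatever filter you apply to the real data.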

subtracting dates in python

I have a dataframe with dates and would like to get the time between the first date and the last date. When I run the code below
df.sort_values('timestamp', inplace=True)
firstDay = df.iloc[0]['timestamp']
lastDay = df.iloc[len(df)-1]['timestamp']
print(firstDay)
print(lastDay)
it prints the dates in the following format:
2016-09-24 17:42:27.839496
2017-01-18 10:24:08.629327
I'm trying to get the difference between them, but they're in str format, and I've been having trouble converting them to a form where I can get the difference.
here you go :o)
from datetime import datetime

date_format_str = '%Y-%m-%d %H:%M:%S.%f'
date_1 = '2016-09-24 17:42:27.839496'
date_2 = '2017-01-18 10:24:08.629327'
start = datetime.strptime(date_1, date_format_str)
end = datetime.strptime(date_2, date_format_str)
# difference between the two timestamps as a timedelta object
diff = end - start
# the same interval expressed in hours
diff_in_hours = diff.total_seconds() / 3600
print(diff_in_hours)
# difference between the two calendar dates (time of day ignored), in days
diff = end.date() - start.date()
print(diff.days)
With pandas:
import pandas as pd

date_1 = '2016-09-24 17:42:27.839496'
date_2 = '2017-01-18 10:24:08.629327'
start = pd.to_datetime(date_1, format='%Y-%m-%d %H:%M:%S.%f')
end = pd.to_datetime(date_2, format='%Y-%m-%d %H:%M:%S.%f')
# difference between the two datetimes as a Timedelta object
diff = end - start
# whole days in the interval (the time of day is taken into account)
print(diff.days)
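
Since the question starts from a dataframe column, the whole column can also be converted at once, which avoids sorting and indexing. This is a sketch under the assumption that the column is called timestamp, as in the question; the middle row is invented sample data.

```python
import pandas as pd

# Hypothetical frame with string timestamps like the ones in the question
df = pd.DataFrame({'timestamp': ['2017-01-18 10:24:08.629327',
                                 '2016-09-24 17:42:27.839496',
                                 '2016-11-02 08:00:00.000000']})

# Convert the whole column from str to datetime64 in one step
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d %H:%M:%S.%f')

# Time between the earliest and the latest entry, regardless of row order
span = df['timestamp'].max() - df['timestamp'].min()
print(span)
print(span.days)
```

Using max() and min() means the frame does not need to be sorted first.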

Unable to subtract a day from any specific date format

I'm trying to subtract a day from the date 06-30-2019 to get 06-29-2019, but I can't figure out how to achieve that.
I've tried with:
import datetime
date = "06-30-2019"
date = datetime.datetime.strptime(date,'%m-%d-%Y').strftime('%m-%d-%Y')
print(date)
As expected, it just gives me back the date I started with.
How can I subtract a day from a date in the above format?
Try this:
import datetime
date = "06/30/19"
date = datetime.datetime.strptime(date, "%m/%d/%y")
NewDate = date + datetime.timedelta(days=-1)
print(NewDate) # 2019-06-29 00:00:00
Your code:
date = "06-30-2019"
date = datetime.datetime.strptime(date,'%m-%d-%Y').strftime('%m-%d-%Y')
Check the type of the date variable:
type(date)
Out[]: str
It is a string, because strftime converts the parsed datetime back to text. To subtract from it, you must first convert it to a datetime. You can use pd.to_datetime():
# Import packages
import pandas as pd
from datetime import timedelta
# Input date
date = "06-30-2019"
# Parse the string into a pandas Timestamp (month-day-year)
date = pd.to_datetime(date, format='%m-%d-%Y')
print(date)
# Subtract days
number_of_days = 1
new_date = date - timedelta(number_of_days)
print(new_date)
Output:
2019-06-30 00:00:00
2019-06-29 00:00:00
If you want to drop the time component, you can use:
str(new_date.date())
Out[]: '2019-06-29'
Use timedelta:
import datetime
date = datetime.datetime.strptime("06/30/19" ,"%m/%d/%y")
print( date - datetime.timedelta(days=1))
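
If the result should end up in the same MM-DD-YYYY text format as the input, the subtraction has to happen on the datetime object and the formatting comes last. A minimal sketch using only the standard library (variable names are illustrative):

```python
from datetime import datetime, timedelta

date_str = "06-30-2019"
fmt = "%m-%d-%Y"

# Parse first, subtract on the datetime object, format back to text at the end
previous_day = datetime.strptime(date_str, fmt) - timedelta(days=1)
previous_day_str = previous_day.strftime(fmt)
print(previous_day_str)  # 06-29-2019
```

The original attempt called strftime straight after strptime, which is why it returned the input unchanged.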

difference between 2 dates in days saved into new column as float

I have a pandas dataframe with two columns that contain dates as strings, e.g. start_date "2002-06-12" and end_date "2009-03-01". I would like to calculate the difference in days between these two columns for each row and save the result in a new column, called for example time_diff, of type float.
I have tried:
df["time_diff"] = (pd.Timestamp(df.end_date) - pd.Timestamp(df.start_date )).astype("timedelta64[d]")
pd.to_numeric(df["time_diff"])
based on some tutorials, but the first line raises TypeError: Cannot convert input. What do I need to change to get this running?
Here is a working example that converts the string columns of a dataframe to datetime and saves the difference between them in a new column as a float (number of seconds):
import pandas as pd
tmp = [("2002-06-12","2009-03-01"),("2016-04-28","2022-03-14")]
df = pd.DataFrame(tmp,columns=["col1","col2"])
df["col1"]=pd.to_datetime(df["col1"])
df["col2"]=pd.to_datetime(df["col2"])
df["time_diff"]=df["col2"]-df["col1"]
# total_seconds() returns each difference as a float number of seconds
df["time_diff"]=df["time_diff"].dt.total_seconds()
The difference in seconds can be converted to minutes or days with simple arithmetic (divide by 60 or by 86400).
Try:
import numpy as np
enddates = np.asarray([pd.Timestamp(end) for end in df.end_date.values])
startdates = np.asarray([pd.Timestamp(start) for start in df.start_date.values])
df['time_diff'] = (enddates - startdates).astype("timedelta64")
First convert the strings to datetime, then calculate the difference in days:
df['start_date'] = pd.to_datetime(df['start_date'], format='%Y-%m-%d')
df['end_date'] = pd.to_datetime(df['end_date'], format='%Y-%m-%d')
df['time_diff'] = (df.end_date - df.start_date).dt.days
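
Note that .dt.days gives integers. Since the question asks for a float column, one option is to divide the timedelta column by a one-day Timedelta. A minimal sketch, using sample dates that appear elsewhere in this thread:

```python
import pandas as pd

df = pd.DataFrame({'start_date': ['2002-06-12', '2002-06-12'],
                   'end_date': ['2009-03-01', '2009-03-06']})

start = pd.to_datetime(df['start_date'], format='%Y-%m-%d')
end = pd.to_datetime(df['end_date'], format='%Y-%m-%d')

# Dividing a timedelta column by a one-day Timedelta yields float64 days
df['time_diff'] = (end - start) / pd.Timedelta(days=1)
print(df['time_diff'].tolist())
print(df['time_diff'].dtype)
```

With timestamps that include a time of day, this division also keeps the fractional part of a day instead of truncating it.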
You can also convert your columns to date objects and then compute the difference:
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'Start Date' : ['2002-06-12', '2002-06-12' ], 'End date' : ['2009-03-01', '2009-03-06']})
df['Start Date'] = [ datetime.strptime(x, "%Y-%m-%d").date() for x in df['Start Date'] ]
df['End date'] = [ datetime.strptime(x, "%Y-%m-%d").date() for x in df['End date'] ]
df['Diff'] = df['End date'] - df['Start Date']
Out:
     End date  Start Date       Diff
0  2009-03-01  2002-06-12  2454 days
1  2009-03-06  2002-06-12  2459 days
You should just use pd.to_datetime to convert your string values:
df["time_diff"] = (pd.to_datetime(df.end_date) - pd.to_datetime(df.start_date))
The result will automatically be a timedelta64 column.
You can try this:
import pandas as pd
df = pd.DataFrame()
df['Arrived'] = [pd.Timestamp('01-04-2017')]
df['Left'] = [pd.Timestamp('01-06-2017')]
diff = df['Left'] - df['Arrived']
# number of whole days in each difference
days = pd.Series(delta.days for delta in diff)
result = days[0]

pandas.to_datetime with different length date strings

I have a column of timestamps that I would like to convert to datetime in my pandas dataframe. The format of the dates is %Y-%m-%d-%H-%M-%S which pd.to_datetime does not recognize. I have manually entered the format as below:
df['TIME'] = pd.to_datetime(df['TIME'], format = '%Y-%m-%d-%H-%M-%S')
My problem is that some of the timestamps have no seconds, so they are shorter (format = %Y-%m-%d-%H-%M).
How can I get all of these strings to datetimes?
I was thinking I could add zero seconds (-0) to the end of my shorter dates but I don't know how to do that.
Try strftime to bring every string into the same format. If pandas can't recognize your custom datetime format, you should provide it explicitly:
import pandas as pd
from functools import partial
df1 = pd.DataFrame({'Date': ['2018-07-02-06-05-23','2018-07-02-06-05']})
newdatetime_fmt = partial(pd.to_datetime, format='%Y-%m-%d-%H-%M-%S')
df1['Clean_Date'] = (df1.Date.str.replace('-','').apply(lambda x: pd.to_datetime(x).strftime('%Y-%m-%d-%H-%M-%S'))
                     .apply(newdatetime_fmt))
print(df1,df1.dtypes)
Output:
                  Date          Clean_Date
0  2018-07-02-06-05-23 2018-07-02 06:05:23
1     2018-07-02-06-05 2018-07-02 06:05:00
Date                  object
Clean_Date    datetime64[ns]
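
The idea from the question, adding zero seconds to the shorter strings, can also be done directly. This is a sketch, assuming every value is either the full %Y-%m-%d-%H-%M-%S form or the same form without the seconds field; the column name TIME comes from the question and the sample rows are the ones used above.

```python
import pandas as pd

df = pd.DataFrame({'TIME': ['2018-07-02-06-05-23', '2018-07-02-06-05']})

# A full timestamp has five '-' separators; shorter ones lack the seconds field
has_seconds = df['TIME'].str.count('-') == 5
# Keep complete strings as they are and append '-00' to the rest
padded = df['TIME'].where(has_seconds, df['TIME'] + '-00')

df['TIME'] = pd.to_datetime(padded, format='%Y-%m-%d-%H-%M-%S')
print(df)
```

After padding, every string has the same shape, so a single explicit format covers the whole column.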
