How can I calculate number of days between two dates? [duplicate] - python

This question already has answers here:
How to calculate number of days between two given dates
(15 answers)
Closed 3 years ago.
If I have two dates (ex. 19960104 and 19960314), what is the best way to get the number of days between these two dates?
Actually I have to calculate many dates in my dataframe. I use the code:
`for j in range(datenumbers[i]):
    date_format = "%Y%m%d"
    a = datetime.strptime(str(df_first_day.date[j]), date_format)
    b = datetime.strptime(str(df_first_day.exdate[j]), date_format)
    delta = (b - a).days
    df_first_day.days_to_expire[j] = delta`
I need to store the difference for every pair of dates in a column of my dataframe. Is there a better way to do this that avoids the for loop?

You will find dates much easier to handle if you first convert text to datetime.
Then it becomes trivial to compute a timedelta that answers your question.
import datetime as dt
fmt = '%Y%m%d'
d1 = dt.datetime.strptime('19960104', fmt)
d2 = dt.datetime.strptime('19960314', fmt)
print((d2 - d1).days)
This displays:
70
EDIT
If you choose to define a function:
def num_to_timestamp(ymd):
    return dt.datetime.strptime(str(ymd), '%Y%m%d')
then you can conveniently apply it to a column:
df['date'] = df['date'].apply(num_to_timestamp)
Similarly, passing axis=1 to df.apply lets you construct a column that is the difference of two existing columns.
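For the loop-free version the question asks about, a vectorized variant is one option. The sketch below assumes the dates are stored as integers in YYYYMMDD form; the small frame is made up for illustration and only borrows the question's column names.

```python
import pandas as pd

# Hypothetical frame mirroring the question's columns (integer dates, YYYYMMDD).
df = pd.DataFrame({"date": [19960104, 19960201],
                   "exdate": [19960314, 19960301]})

# Convert both columns to datetime64 in one call each, without a Python loop.
start = pd.to_datetime(df["date"].astype(str), format="%Y%m%d")
end = pd.to_datetime(df["exdate"].astype(str), format="%Y%m%d")

# Subtracting two datetime columns gives a timedelta column; .dt.days extracts whole days.
df["days_to_expire"] = (end - start).dt.days
print(df)
```

The first row reproduces the 70 days computed above for 19960104 and 19960314.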

Related

Converting Pandas Object to minutes and seconds [duplicate]

This question already has answers here:
Pandas - convert strings to time without date
(3 answers)
Closed 1 year ago.
I have a Stop_Time column with values like 05:38 (MM:SS), but it shows up with dtype object. Is there a way to turn this into a time?
I tried using perf_dfExtended['Stop_Time'] = pd.to_datetime(perf_dfExtended['Stop_Time'], format='%M:%S')
but then it adds a date to the output: 1900-01-01 00:05:38
I guess what you're looking for is pd.to_timedelta (https://pandas.pydata.org/docs/reference/api/pandas.to_timedelta.html). pd.to_datetime will of course always try to create a date.
Keep in mind, though, that pd.to_timedelta may raise a ValueError for your column, as it expects the hh:mm:ss format. Try using apply on your column to prepend '00:' to each value (which I assume are strings), and then convert the column to timedelta. It could be something like:
pd.to_timedelta(perf_dfExtended['Stop_Time'].apply(lambda x: f'00:{x}'))
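As a small self-contained check of the idea above, the sketch below uses a made-up Series in place of perf_dfExtended['Stop_Time'] and assumes every value is an MM:SS string.

```python
import pandas as pd

# Made-up MM:SS strings standing in for the Stop_Time column.
stop_time = pd.Series(["05:38", "10:17", "23:45"])

# Prepend an hours field so each value matches the hh:mm:ss layout.
as_timedelta = pd.to_timedelta("00:" + stop_time)

print(as_timedelta)
# Total seconds are convenient if the durations feed into arithmetic later.
print(as_timedelta.dt.total_seconds())
```

A timedelta column keeps the values as durations, so they can be summed or averaged, which a column of datetime.time objects does not support directly.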
This may work for you:
perf_dfExtended['Stop_Time'] = \
    pd.to_datetime(perf_dfExtended['Stop_Time'], format='%M:%S').dt.time
Output (with some additional examples)
0 00:05:38
1 00:10:17
2 00:23:45

calculate the difference of two timestamp columns [duplicate]

This question already has answers here:
Calculate Time Difference Between Two Pandas Columns in Hours and Minutes
(4 answers)
calculate the time difference between two consecutive rows in pandas
(2 answers)
Closed 2 years ago.
I have a dataset like this:
data = pd.DataFrame({'order_date-time':['2017-09-13 08:59:02', '2017-06-28 11:52:20', '2018-05-18 10:25:53', '2017-08-01 18:38:42', '2017-08-10 21:48:40','2017-07-27 15:11:51',
'2018-03-18 21:00:44','2017-08-05 16:59:05', '2017-08-05 16:59:05','2017-06-05 12:22:19'],
'delivery_date_time':['2017-09-20 23:43:48', '2017-07-13 20:39:29','2018-06-04 18:34:26','2017-08-09 21:26:33','2017-08-24 20:04:21','2017-08-31 20:19:52',
'2018-03-28 21:57:44','2017-08-14 18:13:03','2017-08-14 18:13:03','2017-06-26 13:52:03']})
I want to calculate the time difference between these dates as a number of days and add it to the table as a delivery delay column. The calculation needs to take both the date and the time into account;
for example, if the difference is 7 days 14:44:46, we can round this to 7 days.
from datetime import datetime
datetime.strptime(date_string, format)
You could use this to convert each string to a datetime object, store it in a variable, and then do the calculation.
Visit https://www.journaldev.com/23365/python-string-to-datetime-strptime/
Python's datetime library works well for individual timestamps. If you have your data in a pandas DataFrame, as in your case, you should use the pandas datetime functionality instead.
To convert a column of timestamps from strings to a proper datetime format, you can use pandas.to_datetime():
# note: the question's source column is named 'order_date-time' (with a hyphen)
data['order_date_time'] = pd.to_datetime(data['order_date-time'], format="%Y-%m-%d %H:%M:%S")
data['delivery_date_time'] = pd.to_datetime(data['delivery_date_time'], format="%Y-%m-%d %H:%M:%S")
The format argument is optional, but I think it is a good idea to always use it to make sure your datetime format is not "interpreted" incorrectly. It also makes the process much faster on large data-sets.
Once you have the columns in a datetime format you can simply calculate the timedelta between them:
data['delay'] = data['delivery_date_time'] - data['order_date_time']
And finally, if you want to round this timedelta, pandas has a method for that too:
data['approx_delay'] = data['delay'].dt.round('D')
Here the dt accessor gives access to datetime-specific methods, and round takes a frequency as its argument, in this case 'D' for one day. Note that round goes to the nearest day, so 7 days 14:44:46 becomes 8 days; to drop the time part and get 7 days as in the question, use data['delay'].dt.floor('D') or data['delay'].dt.days instead.
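To make the difference between rounding and truncating concrete, here is a minimal sketch using the first two rows of the question's data. The output column names delay_days and delay_rounded are chosen for illustration only.

```python
import pandas as pd

# First two rows of the question's data; note the hyphen in 'order_date-time'.
data = pd.DataFrame({
    "order_date-time": ["2017-09-13 08:59:02", "2017-06-28 11:52:20"],
    "delivery_date_time": ["2017-09-20 23:43:48", "2017-07-13 20:39:29"],
})

fmt = "%Y-%m-%d %H:%M:%S"
ordered = pd.to_datetime(data["order_date-time"], format=fmt)
delivered = pd.to_datetime(data["delivery_date_time"], format=fmt)

delay = delivered - ordered                          # timedelta column
data["delay_days"] = delay.dt.days                   # whole days, time part dropped
data["delay_rounded"] = delay.dt.round("D").dt.days  # nearest day
print(data[["delay_days", "delay_rounded"]])
```

For the first row the delay is 7 days 14:44:46, so truncation gives 7 while rounding gives 8.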

Pandas time series decomposition based on leap year [duplicate]

This question already has answers here:
Subtract a year from a datetime column in pandas
(4 answers)
Closed 4 years ago.
I have a pandas Time Series (called df) that has one column (with name data) that contains data with a daily frequency over a time period of 5 years. The following code produces some random data:
import pandas as pd
import numpy as np
df_index = pd.date_range('01-01-2012', periods=5 * 365 + 2, freq='D')
df = pd.DataFrame({'data': np.random.rand(len(df_index))}, index=df_index)
I want to perform a simple yearly trend decomposition, where for each day I subtract the value from one year earlier. Additionally, I want to account for leap years in the subtraction. Is there an elegant way to do that? My approach is to compute differences at lags of 365 and 366 days and assign them to new columns.
df['diff_365'] = df['data'].diff(365)
df['diff_366'] = df['data'].diff(366)
Afterwards, I apply a function to each row that selects the right value based on whether the same date last year is 365 or 366 days ago.
def decide(row):
    # row.name is the row's Timestamp; subtract 59 days explicitly
    if (row.name - pd.Timedelta(days=59)).is_leap_year:
        return row['diff_366']
    else:
        return row['diff_365']
df['yearly_diff'] = df[['diff_365', 'diff_366']].apply(decide, axis=1)
Explanation: the function decide takes as argument a row from the DataFrame consisting of the columns diff_365 and diff_366 (along with the DatetimeIndex). The expression row.name returns the date of the row and assuming the time series has daily frequency (freq = 'D'), 59 days are subtracted which is the number of days from 1st January to 28th February. Based on whether the resulting date is a day from a leap year, the value from the diff_366 column is returned, otherwise the value from the diff_365 column.
This took 8 lines, and it feels like the subtraction could be done in one or two. I tried to apply a similar function directly to the data column (via apply with the default argument axis=0), but in that case I cannot take my DatetimeIndex into account. Is there a better way to perform the subtraction?
You may not need to worry about dealing with leap years explicitly. When you construct a DatetimeIndex, you can specify start and end parameters. As per the docs:
Of the four parameters start, end, periods, and freq, exactly three
must be specified.
Here's an example of how you can restructure your logic:
df_index = pd.date_range(start='01-01-2012', end='12-31-2016', freq='D')
df = pd.DataFrame({'data': np.random.rand(len(df_index))}, index=df_index)
df['yearly_diff'] = df['data'] - (df_index - pd.DateOffset(years=1)).map(df['data'].get)
Explanation
We construct a DatetimeIndex object by supplying start, end and freq arguments.
Subtract 1 year from your index by subtracting pd.DateOffset(years=1).
Use pd.Series.map to map these 1yr behind dates to data.
Subtract the resulting series from the original data series.
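The steps above can also be sketched with reindex in place of map. This is a variation, not the answer's exact code. The values here are simply day counters rather than random data, an assumption made so that the result is easy to verify: each yearly difference equals the number of days back to the same calendar date one year earlier.

```python
import numpy as np
import pandas as pd

idx = pd.date_range(start="2012-01-01", end="2016-12-31", freq="D")
# Day counter instead of random data, so the differences are easy to check.
data = pd.Series(np.arange(len(idx), dtype=float), index=idx)

# Same calendar date one year earlier (29 February maps to 28 February).
prev_dates = idx - pd.DateOffset(years=1)

# Look up last year's values; dates before the start of the series become NaN.
prev_values = data.reindex(prev_dates).to_numpy()

yearly_diff = data - prev_values
print(yearly_diff.loc["2013-01-01"], yearly_diff.loc["2014-01-01"])
```

With the counter data, a difference of 366 means the one-year window crossed 29 February, and 365 means it did not.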

Get last date in each month of a time series pandas

Currently I'm generating a DateTimeIndex using a certain function, zipline.utils.tradingcalendar.get_trading_days. The time series is roughly daily but with some gaps.
My goal is to get the last date in the DateTimeIndex for each month.
.to_period('M') & .to_timestamp('M') don't work since they give the last day of the month rather than the last value of the variable in each month.
As an example, if this is my time series I would want to select '2015-05-29' while the last day of the month is '2015-05-31'.
['2015-05-18', '2015-05-19', '2015-05-20', '2015-05-21',
'2015-05-22', '2015-05-26', '2015-05-27', '2015-05-28',
'2015-05-29', '2015-06-01']
Condla's answer came closest to what I needed, except that my time index stretched over more than a year, so I needed to group by both year and month and then select the maximum date. Below is the code I ended up with.
# tempTradeDays is the initial DatetimeIndex
dateRange = []
tempYear = None
dictYears = tempTradeDays.groupby(tempTradeDays.year)
for yr in dictYears.keys():
    tempYear = pd.DatetimeIndex(dictYears[yr]).groupby(pd.DatetimeIndex(dictYears[yr]).month)
    for m in tempYear.keys():
        dateRange.append(max(tempYear[m]))
# .order() has been removed from pandas; sort_values() is the replacement
dateRange = pd.DatetimeIndex(dateRange).sort_values()
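The same year-and-month grouping can be sketched without explicit loops by grouping on a monthly period. The dates below are the sample from the question; the variable names are chosen for illustration.

```python
import pandas as pd

# Sample trading days from the question (31 May 2015 is not among them).
dates = pd.DatetimeIndex([
    "2015-05-18", "2015-05-19", "2015-05-20", "2015-05-21",
    "2015-05-22", "2015-05-26", "2015-05-27", "2015-05-28",
    "2015-05-29", "2015-06-01",
])

# A monthly period encodes year and month together, so one key is enough.
s = dates.to_series()
last_per_month = s.groupby(dates.to_period("M")).max()

print(pd.DatetimeIndex(last_per_month))
```

Because the key is a year-month period, the same month in different years lands in different groups.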
Suppose your data frame looks like this:
[image: original dataframe]
Then the following code will give you the last row of each month:
df_monthly = df.reset_index().groupby([df.index.year,df.index.month],as_index=False).last().set_index('index')
[image: transformed dataframe]
This one-liner does the job :)
My strategy would be to group by month and then select the "maximum" of each group.
If "dt" is your DatetimeIndex object:
last_dates_of_the_month = []
dt_month_group_dict = dt.groupby(dt.month)
for month in dt_month_group_dict:
    last_date = max(dt_month_group_dict[month])
    last_dates_of_the_month.append(last_date)
The list "last_dates_of_the_month" contains the last occurring date of each month in your dataset. You can use this list to create a DatetimeIndex in pandas again (or whatever you want to do with it).
This is an old question, but none of the existing answers is perfect. This is the solution I came up with (assuming that the date is a sorted index). It could be written in one line, but I split it up for readability:
month1 = pd.Series(apple.index.month)
month2 = pd.Series(apple.index.month).shift(-1)
mask = (month1 != month2)
apple[mask.values].head(10)
A few notes here:
Shifting a datetime series requires another pd.Series instance.
Boolean mask indexing requires .values.
By the way, when the dates are business days, it would be easier to use resampling: apple.resample('BM') (the alias is spelled 'BME' in pandas 2.2 and later).
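Maybe an answer is not needed anymore, but while searching for a solution to the same question I found what may be a simpler one:
import pandas as pd
sample_dates = pd.date_range(start='2010-01-01', periods=100, freq='B')
month_end_dates = sample_dates[sample_dates.is_month_end]
Note that is_month_end is True only on the calendar month end, so a month whose last available date falls earlier (for example, when the month ends on a weekend) is skipped.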
Try this to create a new diff column, where the value 1 marks the change from one month to the next.
df['diff'] = np.where(df['Date'].dt.month.diff() != 0,1,0)

How to calculate the difference between dates in python [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How can I subtract a day from a python date?
subtract two times in python
I generated a date in Python as below:
import time
time_stamp = time.strftime('%Y-%m-%d')
print(time_stamp)
Result:
'2012-12-19'
If the time_stamp above is today's date, I want to get the date '2012-12-17' by subtracting two days.
How can I subtract two days from the current date in Python?
To perform calculations between dates in Python, use the timedelta class from the datetime module.
To do what you want to achieve, the following code should suffice.
import datetime
time_stamp = datetime.datetime(day=21, month=12, year=2012)
difference = time_stamp - datetime.timedelta(days=2)
print('%s-%s-%s' % (difference.year, difference.month, difference.day))
Explanation of the above:
The second line creates a new datetime object (from the datetime class) with the specified day, month and year.
The third line creates a timedelta object and subtracts it from the datetime; timedelta objects are used to perform calculations between datetime objects.
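For the question's actual goal (two days before today, formatted as YYYY-MM-DD), a minimal sketch might look like this. The helper name days_before is made up for illustration and is not part of the datetime module.

```python
from datetime import date, timedelta


def days_before(day, n=2):
    """Return the date n days before the given date (hypothetical helper)."""
    return day - timedelta(days=n)


today = date.today()
# strftime produces the same 'YYYY-MM-DD' layout the question uses.
print(days_before(today).strftime("%Y-%m-%d"))
```

Working with date objects and formatting only at the end avoids doing arithmetic on strings, and month and year boundaries are handled automatically.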
