I have a pandas DataFrame of stock prices with a date index. The data is daily, but only covers working days.
I am trying to compute price performance year-to-date and versus one year ago.
To get the first date of the current year in my DataFrame I used the following method:
import datetime
today = datetime.date.today()
curr_year = today.year
curr_month = today.month
# .loc with a year string selects that year's rows from a DatetimeIndex
first_date_year = df.loc[str(curr_year)].first_valid_index()
Now I am trying to get the closest date one year back (exactly one year before last_valid_index()). I could extract the month and the year, but that would not be precise enough. Any suggestions?
Thanks
Since you didn't provide any data, I am assuming that you have a list of dates (string types) like the following:
dates = ['11/01/2016', '12/01/2016', '02/01/2017', '03/01/2017']
You then need to convert that to datetime format; I would suggest using pandas:
import pandas as pd
pd_dates = pd.to_datetime(dates)
Then you have to define today and one year ago. I would suggest using datetime for that:
from datetime import datetime
today = datetime.today()
# Note: this raises ValueError when today is February 29
date_1yr_ago = datetime(today.year - 1, today.month, today.day)
Lastly, filter the dates to those later than date_1yr_ago and take the first one (this assumes the dates are sorted in ascending order):
pd_dates[pd_dates > date_1yr_ago][0]
This returns the first date that is later than the date one year ago.
output:
Timestamp('2017-02-01 00:00:00')
You can convert that datetime value to string with the following code:
datetime.strftime(pd_dates[pd_dates > date_1yr_ago][0], '%Y/%m/%d')
output:
'2017/02/01'
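When the prices already live in a DataFrame with a DatetimeIndex, the lookup can be done on the index itself. The following is only a sketch under assumptions: the frame, its `close` column, and the date range are made up for illustration, and the index is assumed to be sorted. `pd.DateOffset(years=1)` steps back one calendar year, `get_indexer(..., method="nearest")` finds the closest available date in either direction, and `Index.asof` returns the last date on or before the target.

```python
import pandas as pd

# Hypothetical data: one price per weekday (names and values are made up)
idx = pd.bdate_range("2016-01-01", "2018-01-01")
df = pd.DataFrame({"close": range(len(idx))}, index=idx)

last_date = df.index[-1]                     # Monday 2018-01-01
target = last_date - pd.DateOffset(years=1)  # Sunday 2017-01-01, not in the index

# Closest available date in either direction
nearest_pos = df.index.get_indexer([target], method="nearest")[0]
nearest_date = df.index[nearest_pos]

# Last available date on or before the target (requires a sorted index)
previous_date = df.index.asof(target)

print(nearest_date.date(), previous_date.date())
```

Which of the two is appropriate depends on whether a date slightly after the one-year mark is acceptable for the performance calculation.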
I know I should import datetime to get the current date, but the rest is black magic to me right now.
ex.
dates = ['2019-10-11', '2013-05-16', '2011-06-16', '2000-04-22']
actual_date = datetime.datetime.now()
How can I subtract these and get a new list with the number of days that have passed from each date to actual_date?
If I'm understanding correctly, you need to find the current date, and then find the number of days between the current date and the dates in your list?
If so, you could try this:
from datetime import datetime, date
dates = ['2019-10-11', '2013-05-16', '2011-06-16', '2000-04-22']
actual_date = date.today()
days = []
for date_string in dates:
    # Parse the string, then drop the time part to get a plain date
    date_object = datetime.strptime(date_string, '%Y-%m-%d').date()
    # Subtracting two dates gives a timedelta; .days is its whole-day count
    days_difference = (actual_date - date_object).days
    days.append(days_difference)
print(days)
What I am doing here is:
Converting the individual date strings to a "date" object
Subtracting this date from the actual date. The result is a timedelta object, so we add .days to get the number of days as an integer.
Save the outcome to a list, although of course you could do whatever you wanted with the output.
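The same steps can be condensed into a small function. This is a sketch, not the only way to do it: the name `days_since` and its optional `today` parameter are made up here so that the result can be reproduced with a fixed reference date, and `date.fromisoformat` assumes Python 3.7 or later and strings in YYYY-MM-DD form.

```python
from datetime import date

def days_since(date_strings, today=None):
    """Return the number of days from each YYYY-MM-DD string to `today`."""
    if today is None:
        today = date.today()
    # date - date gives a timedelta; .days is the whole-day difference
    return [(today - date.fromisoformat(s)).days for s in date_strings]

# Fixed reference date so the result does not depend on when this is run
result = days_since(['2019-10-11', '2013-05-16'], today=date(2020, 1, 1))
print(result)  # [82, 2421]
```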
Essentially I want to create a new column that holds the number of days remaining until maturity, counted from today. The code below doesn't work, and I am stuck on what to do next, as nearly all examples show math on two DataFrame columns.
today = date.today()
today = today.strftime("%m/%d/%y")
df['Maturity Date'] = df['Maturity Date'].apply(pd.to_datetime)
df['Remaining Days til Maturity'] = (df['Maturity Date'] - today)
You're mixing types; it's like subtracting apples from pears. In your example, today is a string that represents a date to us humans (in what looks like the format used in the USA). The column of interest in your DataFrame, a pandas Series, has type datetime64[ns] after you did the apply(pd.to_datetime). You could do that conversion more efficiently without apply, which runs the operation element by element rather than in a vectorized way; have a look below, where I convert the strings to datetime64[ns] in a vectorized way.
The main idea is that whenever you do operations with multiple objects, they should be of the same type. Sometimes frameworks will automatically convert types for you, but don't rely on it.
import pandas as pd
df = pd.DataFrame({"date": ["2000-01-01"]})
df["date"] = pd.to_datetime(df["date"])
today = pd.Timestamp.today().floor("D") # That's one way to do it
today
# Timestamp('2021-11-02 00:00:00')
today - df["date"]
# 0   7976 days
# Name: date, dtype: timedelta64[ns]
Parse Maturity Date as a datetime using the month/day/year format, then subtract today's date from it as a date type and store the difference in days as Remaining Days til Maturity:
import pandas as pd
from datetime import date
today = date.today()
df=pd.DataFrame({'Maturity Date':'11/04/2021'},index=[0])
df['Maturity Date'] = pd.to_datetime(df['Maturity Date'], format='%m/%d/%Y')
df['Remaining Days til Maturity'] = (df['Maturity Date'].dt.date - today).dt.days
print(df)
output:
  Maturity Date  Remaining Days til Maturity
0    2021-11-04                            2
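A variant that stays in pandas types throughout may be easier to reason about, since both sides of the subtraction are then datetime-like. This is a sketch under assumptions: the helper name `remaining_days` and its `today` parameter are invented for illustration, and the second maturity date is made-up sample data.

```python
import pandas as pd

def remaining_days(maturity, today=None):
    """Whole days from `today` (default: the current date) to each maturity date."""
    # Normalize to midnight so partial days do not affect the count
    today = pd.Timestamp.today().normalize() if today is None else pd.Timestamp(today)
    maturity = pd.to_datetime(maturity, format="%m/%d/%Y")
    # datetime64 Series minus Timestamp gives a timedelta64 Series
    return (maturity - today).dt.days

df = pd.DataFrame({"Maturity Date": ["11/04/2021", "12/31/2021"]})
df["Remaining Days til Maturity"] = remaining_days(df["Maturity Date"], today="2021-11-02")
print(df)
```

Passing `today` explicitly keeps the example reproducible; omitting it uses the current date.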
I am trying to generate a set of dates with the pandas date_range function. Then I want to iterate over this range and subtract several months from each of the dates (the exact number of months is determined in the loop) to get a new date.
I get some very odd results when I do this.
MVP:
#get date range (test_size is 3 in this example)
dates = pd.date_range(start = '1/1/2013', end='1/1/2018', freq=str(test_size)+'MS', closed='left', normalize=True)
#take first date as example
date = dates[0]
date
Timestamp('2013-01-01 00:00:00', freq='3MS')
So far so good.
Now let's say I want to go just one month back from this date. I define numpy timedelta (it supports months for definition, while pandas' timedelta doesn't):
#get timedelta of 1 month
deltaGap = np.timedelta64(1,'M')
#subtract one month from date
date - deltaGap
Timestamp('2012-12-01 13:30:54', freq='3MS')
Why is that? Why do I get 13:30:54 in the time component instead of midnight?
Moreover, if I subtract more than 1 month, the shift becomes so large that I lose a whole day:
#let's say I want to subtract both 2 years and then 1 month
deltaTrain = np.timedelta64(2,'Y')
#subtract 2 years and then subtract 1 month
date - deltaTrain - deltaGap
Timestamp('2010-12-02 01:52:30', freq='3MS')
I've had similar issues with timedelta, and the solution I ended up with was relativedelta from dateutil, which is built specifically for this kind of application (it takes into account all the calendar weirdness like leap years, weekdays, etc.). For example, given:
>>> from dateutil.relativedelta import relativedelta
>>> date = dates[0]
>>> date
Timestamp('2013-01-01 00:00:00', freq='10MS')
>>> deltaGap = relativedelta(months=1)
>>> date - deltaGap
Timestamp('2012-12-01 00:00:00', freq='10MS')
>>> deltaGap = relativedelta(years=2, months=1)
>>> date - deltaGap
Timestamp('2010-12-01 00:00:00', freq='10MS')
Check out the documentation for more info on relativedelta
The issues with numpy.timedelta64
I think that the problem with np.timedelta64 is revealed in these two parts of the docs:
There are two Timedelta units (‘Y’, years and ‘M’, months) which are treated specially, because how much time they represent changes depending on when they are used. While a timedelta day unit is equivalent to 24 hours, there is no way to convert a month unit into days, because different months have different numbers of days.
and
The length of the span is the range of a 64-bit integer times the length of the date or unit. For example, the time span for ‘W’ (week) is exactly 7 times longer than the time span for ‘D’ (day), and the time span for ‘D’ (day) is exactly 24 times longer than the time span for ‘h’ (hour).
So the timedeltas are fine for hours, days and weeks, because these are fixed-length timespans. However, months and years are variable in length (think leap years), and so to take this into account, numpy seems to use an average. One numpy "year" comes out as 365 days, 5 hours, 49 minutes and 12 seconds, which matches the mean Gregorian year of 365.2425 days, while one numpy "month" comes out as 30 days, 10 hours, 29 minutes and 6 seconds, which is one twelfth of that.
# Adding one numpy month adds 30 days + 10:29:06:
deltaGap = np.timedelta64(1,'M')
date+deltaGap
# Timestamp('2013-01-31 10:29:06', freq='10MS')
# Adding one numpy year adds 1 year + 05:49:12:
deltaGap = np.timedelta64(1,'Y')
date+deltaGap
# Timestamp('2014-01-01 05:49:12', freq='10MS')
This is not so easy to work with, which is why I would just go to relativedelta, which is much more intuitive (to me).
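The odd time components can be reproduced with plain arithmetic. The sketch below assumes that the averages in question are the mean Gregorian year of 365.2425 days and one twelfth of it; that assumption is consistent with the timestamps shown in this thread, and the code only checks the arithmetic, it does not inspect numpy itself.

```python
from datetime import datetime, timedelta

# 365.2425 days, written as an integer division to avoid float rounding
mean_year = timedelta(days=3652425) / 10000
mean_month = mean_year / 12

print(mean_year)   # 365 days, 5:49:12
print(mean_month)  # 30 days, 10:29:06

# Subtracting the mean month from midnight on 2013-01-01 gives the
# 13:30:54 time component reported in the question
shifted = datetime(2013, 1, 1) - mean_month
print(shifted)     # 2012-12-01 13:30:54
```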
You can try using pd.DateOffset, which is designed for applying calendar offset logic (month, year, hour) to dates.
# get sample dates (hourly)
dates = pd.date_range(start = '1/1/2013', freq='H',periods=100,closed='left', normalize=True)
#take first date as example
date = dates[0]
# subtract a month
dates[0] - pd.DateOffset(months=1)
Timestamp('2012-12-01 00:00:00')
# to apply this on all dates
new_dates = list(map(lambda x: x - pd.DateOffset(months=1), dates))
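As far as I know, a DateOffset can also be subtracted from a whole DatetimeIndex at once, which avoids the explicit map. A minimal sketch, using a quarterly month-start range similar to the one in the question as assumed sample data:

```python
import pandas as pd

# Sample data: month-start dates every three months
dates = pd.date_range(start="2013-01-01", end="2018-01-01", freq="3MS")

# Subtract one calendar month from every element in one operation
new_dates = dates - pd.DateOffset(months=1)
print(new_dates[:3])

# Month ends are clipped to the last valid day of the target month
clipped = pd.Timestamp("2013-03-31") - pd.DateOffset(months=1)
print(clipped)  # 2013-02-28 00:00:00
```

The time component stays at midnight because the offset works on calendar fields rather than on a fixed number of seconds.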
I have a DataFrame traindf with a Date column in the format "YYYY-MM-DD". I am trying to convert each date to the year followed by the day of the year, e.g. "2010-02-05" becomes "2010036". The code below works, but I want to check whether there is a more efficient way to do it.
dtstrip = [int('%d%03d' % (datetime.datetime.strptime(dt, fmt).timetuple().tm_year, datetime.datetime.strptime(dt, fmt).timetuple().tm_yday)) for dt in traindf['Date']]
traindf['Date'] = dtstrip
You want something like this?
today = datetime.datetime.now()
print(today.strftime('%Y%j'))
For today the output is:
2017260
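Applied to a whole column, the same `%Y%j` format can be used through the pandas `.dt` accessor instead of a Python loop. This is a sketch with made-up sample rows; the column name `DateCode` is an assumption chosen so the original `Date` column is left untouched.

```python
import pandas as pd

# Sample data in the YYYY-MM-DD format described in the question
traindf = pd.DataFrame({"Date": ["2010-02-05", "2010-12-31", "2012-03-01"]})

parsed = pd.to_datetime(traindf["Date"], format="%Y-%m-%d")

# %Y is the four-digit year, %j the zero-padded day of the year
traindf["DateCode"] = parsed.dt.strftime("%Y%j").astype(int)

# Equivalent integer arithmetic, without going through strings
alt = parsed.dt.year * 1000 + parsed.dt.dayofyear

print(traindf)
```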
I have a string that is the full year followed by the ISO week of the year (so some years have 53 weeks, because the week counting starts at the first full week of the year). I want to convert it to a datetime object using pandas.to_datetime(). So I do:
pandas.to_datetime('201145', format='%Y%W')
and it returns:
Timestamp('2011-01-01 00:00:00')
which is not right. Or if I try:
pandas.to_datetime('201145', format='%Y%V')
it tells me that %V is a bad directive.
What am I doing wrong?
I think that the following question would be useful to you: Reversing date.isocalendar()
Using the functions provided in that question this is how I would proceed:
import datetime
import pandas as pd
def iso_year_start(iso_year):
    "The gregorian calendar date of the first day of the given ISO year"
    # January 4th always falls in ISO week 1
    fourth_jan = datetime.date(iso_year, 1, 4)
    delta = datetime.timedelta(fourth_jan.isoweekday()-1)
    return fourth_jan - delta
def iso_to_gregorian(iso_year, iso_week, iso_day):
    "Gregorian calendar date for the given ISO year, week and day"
    year_start = iso_year_start(iso_year)
    return year_start + datetime.timedelta(days=iso_day-1, weeks=iso_week-1)
def time_stamp(yourString):
    # Split 'YYYYWW' into year and week; use Monday (1) as the day
    year = int(yourString[0:4])
    week = int(yourString[-2:])
    day = 1
    return year, week, day
yourTimeStamp = iso_to_gregorian(*time_stamp('201145'))
print(yourTimeStamp)
Then run that function for your values and append them as date time objects to the dataframe.
The result I got from your specified string was:
2011-11-07
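On newer Python versions the standard library can do this conversion directly, so the helper functions may not be needed. The sketch below assumes Python 3.8 or later for `date.fromisocalendar` (the ISO directives `%G`, `%V` and `%u` in `strptime` need 3.6 or later) and picks Monday as the day of the week, as the answer above does. Note that `%W` on its own is ignored by `strptime` unless a weekday directive is also given, which is why the original attempt fell back to January 1st.

```python
from datetime import date, datetime

import pandas as pd

year_week = "201145"

# Option 1: parse with ISO directives, appending "1" for Monday
monday = datetime.strptime(year_week + "1", "%G%V%u")

# Option 2: split the string and build the date from ISO year, week, weekday
same_monday = date.fromisocalendar(int(year_week[:4]), int(year_week[4:]), 1)

# Wrap the result in a pandas Timestamp if that type is needed
ts = pd.Timestamp(monday)

print(monday.date(), same_monday, ts)
```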