When computing the expiration date of some financial instruments, usually all we have to go off of is the following:
Trade Date RIC
5/22/1989 SPH0
5/23/1989 SPH0
5/24/1989 SPH0
5/25/1989 SPH0
5/26/1989 SPH0
Where the trade date is a day the instrument is being traded, and the RIC is a 4-character string composed of:
First 2 characters = an asset class
3rd character = expiration month
last character = last digit of expiration year
Expiration months explained:
month_codes_to_int = {'F':'1', 'G':'2', 'H':'3', 'J':'4', 'K':'5', 'M':'6',
'N':'7', 'Q':'8', 'U':'9', 'V':'10', 'X':'11', 'Z':'12'}
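For example, decoding a single RIC such as SPH0 with this mapping (a minimal sketch):

```python
# Decode one RIC string using the month-code mapping above
month_codes_to_int = {'F': '1', 'G': '2', 'H': '3', 'J': '4', 'K': '5', 'M': '6',
                      'N': '7', 'Q': '8', 'U': '9', 'V': '10', 'X': '11', 'Z': '12'}

ric = 'SPH0'
asset_class = ric[:2]                           # first 2 characters: 'SP'
expiration_month = month_codes_to_int[ric[2]]   # 'H' -> '3' (March)
year_digit = ric[3]                             # last digit of expiration year: '0'
```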
I am generating an expiration date for every row using the trade date and the RIC. I am doing so by iterating over the DataFrame and filling an empty column cell each time with a function that will correctly compute the expiration date:
for index, row in df.iterrows():
    row['Trade Date'] = pd.to_datetime(row['Trade Date'])
    print(row['Trade Date'], row['RIC'])
    current_year = row['Trade Date'].year
    asset_class = row['RIC'][0:2]
    expiration_month = row['RIC'][2]
    expiration_year_last_digit = row['RIC'][3]
    expiration_year =  # this is where I'm stuck
My methodology is for the expiration date to be the closest date with that month whose year ends in the given digit, on or after the trade date. For 5/22/1989, for example, it would be 3/15/1990 (the day doesn't matter; 1990 is the closest year ending in 0 to 1989).
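The "closest year ending in that digit" rule can be written with modular arithmetic (a minimal sketch; the function name is mine):

```python
def expiration_year(trade_year: int, year_digit: int) -> int:
    """Smallest year >= trade_year whose last digit equals year_digit."""
    return trade_year + (year_digit - trade_year % 10) % 10
```

So a trade in 1989 against a contract whose RIC ends in 0 resolves to 1990, while a digit of 9 stays in 1989.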
Is there a way to automate this using pandas datetime features?
First, create Series containing the expiration month, the expiration-year digit, and the last digit of the year of your TradeDate column.
m = df.RIC.str[2].map(month_codes_to_int)
y = df.RIC.str[3].astype(int)
s = df.TradeDate.dt.year.mod(10)
Then calculate your offset, i.e. the number of years to add so the trade year's last digit matches the RIC's year digit:
offset = (y - s) % 10
Finally, create your new column:
pd.to_datetime((df.TradeDate.dt.year + offset).astype(str) + m.str.zfill(2), format='%Y%m')
Output:
0 1990-03-01
1 1990-03-01
2 1990-03-01
3 1990-03-01
4 1990-03-01
dtype: datetime64[ns]
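Putting those steps together into one self-contained sketch (the sample data and the modular offset are assumptions on my part; the column names TradeDate and RIC follow the answer):

```python
import pandas as pd

month_codes_to_int = {'F': '1', 'G': '2', 'H': '3', 'J': '4', 'K': '5', 'M': '6',
                      'N': '7', 'Q': '8', 'U': '9', 'V': '10', 'X': '11', 'Z': '12'}

df = pd.DataFrame({
    'TradeDate': pd.to_datetime(['5/22/1989', '5/23/1989', '5/24/1989']),
    'RIC': ['SPH0', 'SPH0', 'SPH0'],
})

m = df.RIC.str[2].map(month_codes_to_int)   # expiration month as a string
y = df.RIC.str[3].astype(int)               # expiration-year digit
s = df.TradeDate.dt.year.mod(10)            # trade year's last digit
offset = (y - s) % 10                       # years until the last digit matches
df['Expiration'] = pd.to_datetime(
    (df.TradeDate.dt.year + offset).astype(str) + m.str.zfill(2), format='%Y%m')
```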
I have a dataset which comprises minute-level data for 2 stocks over 3 months. I have to create a date in the first column and a time (at 1-minute intervals) in the next column for the 3 months. I am attaching a snapshot of one such dataset. Kindly help me solve this problem.
Data Format
- Create a 3-month date/time range with minute frequency:
date_rng = pd.date_range(start='1/1/2021', end='3/31/2021', freq='min')
- Isolate the dates:
date = date_rng.date
- Isolate the times:
time = date_rng.time
- Create a pandas DataFrame with 2 columns (date and time):
df = pd.DataFrame({'date': date, 'time': time})
- Then simply concat the new dataframe with your existing dataframe on the column axis.
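The concat step in the last bullet might look like this (the existing frame here is a made-up stand-in for your minute-level stock data; it must have the same number of rows):

```python
import pandas as pd

# three minutes of the range, for a small demonstration
date_rng = pd.date_range(start='1/1/2021', end='1/1/2021 00:02', freq='min')
dt_df = pd.DataFrame({'date': date_rng.date, 'time': date_rng.time})

# toy stand-in for your existing minute-level data (same row count)
existing = pd.DataFrame({'price': [100.0, 100.5, 101.0]})

# align on position, not on any pre-existing index
combined = pd.concat([dt_df.reset_index(drop=True),
                      existing.reset_index(drop=True)], axis=1)
```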
***** Remove Saturday and Sunday *****
You could remove weekends by creating a column with the weekday names and then taking a slice of the dataframe excluding Saturday and Sunday:
date_rng = pd.date_range(start='1/1/2021', end='3/31/2021', freq='min')
date = date_rng.date
time = date_rng.time
day = date_rng.day_name()
df = pd.DataFrame({'date': date, 'time': time, 'day': day})
Remove Sat and Sun with this code:
sat = df.day != 'Saturday'
sun = df.day != 'Sunday'
df = df[sat & sun]
As for holidays, you could use the same method, but you would need a list of the holidays applicable in your region.
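For holidays, the same slicing idea works with isin (the holiday list here is made up for illustration):

```python
import pandas as pd

date_rng = pd.date_range(start='1/1/2021', end='1/5/2021', freq='D')
df = pd.DataFrame({'date': date_rng.date})

# hypothetical regional holiday list
holidays = [pd.Timestamp('2021-01-01').date()]

# keep only rows whose date is not a holiday
df = df[~df.date.isin(holidays)]
```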
****** Trading times ******
from datetime import datetime

marketOpen = datetime.strptime('9:15:00', "%H:%M:%S").time()
marketClose = datetime.strptime('15:59:00', "%H:%M:%S").time()
df = df[(df.time >= marketOpen) & (df.time <= marketClose)]
******* Exclude specific day ******
holiday = datetime.strptime("03/30/2021", "%m/%d/%Y").date()
df = df[df.date != holiday]
Lastly, don't forget to reset your dataframe's index.
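Resetting the index drops the gaps left by the removed rows (a tiny illustration):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]}).drop(index=1)  # index is now 0, 2
df = df.reset_index(drop=True)                     # index becomes 0, 1
```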
import datetime as dt
quarter = pd.Timestamp(dt.date(2020, 1, 1)).quarter
assert quarter == 1
df['quarter'] = df['date'].dt.quarter
This returns a 1,2,3 or 4 in df['quarter'] depending on the date in column df['date'].
What I would like to have is this format in column df['quarter']:
Qx-2019 or Qx-2020 depending on the year, where x is the quarter found with the script above.
How can I get the specific year from the quarter and produce the format Qx-year?
Thank you.
Try with to_period
s.dt.to_period('Q')
Out[159]:
0 2020Q4
1 2019Q1
dtype: period[Q-DEC]
Update
'Q' + df['date'].dt.quarter.astype(str) + '-' + df['date'].dt.year.astype(str)
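A quick check of that expression on a small frame (the column name date is assumed from the question):

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2019-02-10', '2020-11-05'])})

# build 'Qx-year' labels from the quarter and year components
df['quarter'] = ('Q' + df['date'].dt.quarter.astype(str)
                 + '-' + df['date'].dt.year.astype(str))
```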
I am trying to work out how to get the year-to-date (YTD) versus last-year-to-date (LYTD) values from a dataframe.
Dataframe:
ID start_date distance
1 2019-7-25 2
2 2019-7-26 2
3 2020-3-4 1
4 2020-3-4 1
5 2020-3-5 3
6 2020-3-6 3
There is data back to 2017 and more data will keep getting added so I would like the YTD and LYTD to be dynamic based upon the current year.
I know how to get the cumulative sum for each year and month but I am really struggling with how to calculate the YTD and LYTD.
year_month_distance_df = distance_kpi_df.groupby(["Start_Year","Start_Month"]).agg({"distance":"sum"}).reset_index()
The other code I tried:
cum_sum_distance_ytd = distance_kpi_df[["start_date_local", "distance"]]
cum_sum_distance_ytd = cum_sum_distance_ytd.set_index("start_date_local")
cum_sum_distance_ytd = cum_sum_distance_ytd.groupby(pd.Grouper(freq = "D")).sum()
When I try this logic and add Start_Day into the group by it obviously just sums all the data for that day.
Expected output:
Year to Date = 8
Last Year to Date = 4
You could split the date into its components (via the .dt accessor) and get the YTD for all years with:
expanding = df.groupby([
    df.start_date.dt.month, df.start_date.dt.day, df.start_date.dt.year
]).distance.sum().unstack().cumsum()
Unstacking will fill with np.nan wherever a year does not have a value for that row's (month, day); if that is a problem, you can use the fill_value parameter:
.unstack(fill_value=0).cumsum()
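A more direct way to get the two scalars, with as_of as an explicit parameter in place of today's date (a sketch on the question's sample rows; note the snippet has no 2019 data before March, so LYTD comes out 0 here):

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': pd.to_datetime(['2019-07-25', '2019-07-26', '2020-03-04',
                                  '2020-03-04', '2020-03-05', '2020-03-06']),
    'distance': [2, 2, 1, 1, 3, 3],
})

as_of = pd.Timestamp('2020-03-06')  # stand-in for pd.Timestamp.today()

# keep rows on or before the as-of day of year (small leap-year wrinkle aside)
in_window = df[df.start_date.dt.dayofyear <= as_of.dayofyear]

ytd = in_window.loc[in_window.start_date.dt.year == as_of.year, 'distance'].sum()
lytd = in_window.loc[in_window.start_date.dt.year == as_of.year - 1, 'distance'].sum()
```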
I am trying to add a set of common date-related columns to my dataframe, building them from the .date_range() pandas method that holds my dataframe's date range.
While I can use attributes like .index.day or .index.weekday_name for general date columns, I would like to set a business-day column based on the date_range I constructed, but I am not sure whether I can use the freq nickname 'B' or whether I need to create a new date range.
Further, I am hoping to not count those business days based on a list of holiday dates that I have.
Here is my setup:
Holiday table
holiday_table = holiday_table.set_index('date')
holiday_table_dates = holiday_table.index.to_list() # ['2019-12-31', etc..]
Base Date Table
data_date_range = pd.date_range(start=date_range_start, end=date_range_end)
df = pd.DataFrame({'date': data_date_range}).set_index('date')
df['day_index'] = df.index.day
# Weekday Name
df['weekday_name'] = df.index.weekday_name
# Business day
df['business_day'] = data_date_range.freq("B")
Error at df['business_day'] = data_date_range.freq("B"):
---> 13 df['business_day'] = data_date_range.freq("B")
ApplyTypeError: Unhandled type: str
OK, I think I understand your question now. You are looking to create a new column of working business days (excluding your custom holidays). In my example I just used the regular US holidays from pandas, but you already have your holidays as a list in holiday_table_dates, so you should still be able to follow the general layout of this example for your specific use. I also assume that you are OK with boolean values for your business_day column:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as h_cal
# sample data
data_date_range = pd.date_range(start='1/1/2019', end='12/31/2019')
df = pd.DataFrame({'date': data_date_range}).set_index('date')
df['day_index'] = df.index.day
# Weekday Name
df['weekday_name'] = df.index.day_name()
# this is just a sample using US holidays
hday = h_cal().holidays(df.index.min(), df.index.max())
# b is the same date range as above, just with freq set to business days
b = pd.date_range(start='1/1/2019', end='12/31/2019', freq='B')
# find all the working business day where b is not a holiday
bday = b[~b.isin(hday)]
# create a boolean col where the date index is in your custom business day we just created
df['bday'] = df.index.isin(bday)
day_index weekday_name bday
date
2019-01-01 1 Tuesday False
2019-01-02 2 Wednesday True
2019-01-03 3 Thursday True
2019-01-04 4 Friday True
2019-01-05 5 Saturday False
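As an alternative sketch, pandas can build the custom business-day set directly via bdate_range with freq='C' and a holidays list (the dates here are illustrative, not from the question):

```python
import pandas as pd

# custom business days for the first week of 2019, skipping New Year's Day
bdays = pd.bdate_range(start='1/1/2019', end='1/7/2019', freq='C',
                       holidays=['2019-01-01'])

# flag each calendar day as business day or not
all_days = pd.date_range(start='1/1/2019', end='1/7/2019')
df = pd.DataFrame({'bday': all_days.isin(bdays)}, index=all_days)
```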
I have a table in Excel where the column headers are months and the rows are days. I need to get today's date, which I have already done. Then I need to match the month and day against the column "cy_day".
Example:
if today's date is Jan 3, then it should return only "2".
Excel File:
cy_day jan feb mar
1 1 1 1
2 3 2 4
3 4 4 5
4 7 5 6
import pandas as pd
from pandas import DataFrame
import calendar
cycle_day_path = 'test\\Documents\\cycle_day_calendar.xlsx'
df = pd.read_excel(cycle_day_path)
df = DataFrame(df, index=None)
print(df)
month = pd.to_datetime('today').strftime("%b")
day = pd.to_datetime('today').strftime("%d")
Try this:
today = pd.Timestamp('2019-01-03')
col = today.strftime('%b').lower()
df[df[col] == today.day]
Given you've extracted month using '%b', it should just be this, after correcting for the capitalization of the '%b' month name (see http://strftime.org/):
df.loc[df[month.lower()] == day, 'cy_day']
Now for Jan 3 you will get 2 (as a DataFrame). If you want just the number 2 do:
df.loc[df[month.lower()] == day, 'cy_day'].values[0]
The value of the month variable returned by pd.to_datetime('today').strftime("%b") is a capitalized string, so in order to use it to access a column of your dataframe you should lowercase it.
So first you should do:
month = month.lower()
After that, you need to make sure that the values in your month columns are of type str, since you are going to compare them with a str value.
day_of_month = df[month] == day
df["cy_day"][day_of_month]
If they are not of type str, you should instead convert the day variable to the same type as the month columns.
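Concretely, coercing day to an int up front avoids the mismatch (the toy frame mirrors the Excel layout shown above):

```python
import pandas as pd

df = pd.DataFrame({'cy_day': [1, 2, 3, 4],
                   'jan': [1, 3, 4, 7],
                   'feb': [1, 2, 4, 5],
                   'mar': [1, 4, 5, 6]})

today = pd.Timestamp('2019-01-03')
month = today.strftime('%b').lower()   # 'jan' -- lowercased to match the columns
day = int(today.strftime('%d'))        # 3 as an int, not the string '03'

# look up the cy_day whose jan value equals today's day
result = df.loc[df[month] == day, 'cy_day'].values[0]
```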