When computing the expiration date of some financial instruments, usually all we have to go off of is the following:
Trade Date RIC
5/22/1989 SPH0
5/23/1989 SPH0
5/24/1989 SPH0
5/25/1989 SPH0
5/26/1989 SPH0
Where the trade date is a day the instrument is being traded, and the RIC is a 4-character string composed of:
First 2 characters = an asset class
3rd character = expiration month
last character = last digit of expiration year
Expiration months explained:
month_codes_to_int = {'F':'1', 'G':'2', 'H':'3', 'J':'4', 'K':'5', 'M':'6',
'N':'7', 'Q':'8', 'U':'9', 'V':'10', 'X':'11', 'Z':'12'}
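For example, decoding a single RIC such as SPH0 with this mapping (a minimal sketch):

```python
# Decode one RIC string using the month-code mapping above
month_codes_to_int = {'F': '1', 'G': '2', 'H': '3', 'J': '4', 'K': '5', 'M': '6',
                      'N': '7', 'Q': '8', 'U': '9', 'V': '10', 'X': '11', 'Z': '12'}

ric = 'SPH0'
asset_class = ric[:2]                           # first 2 characters: 'SP'
expiration_month = month_codes_to_int[ric[2]]   # 'H' -> '3' (March)
year_digit = ric[3]                             # last digit of expiration year: '0'
```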
I am generating an expiration date for every row using the trade date and the RIC. I am doing so by iterating over the DataFrame and filling an empty column cell each time with a function that will correctly compute the expiration date:
for index, row in df.iterrows():
    row['Trade Date'] = pd.to_datetime(row['Trade Date'])
    print(row['Trade Date'], row['RIC'])
    current_year = row['Trade Date'].year
    asset_class = row['RIC'][0:2]
    expiration_month = row['RIC'][2]
    expiration_year_last_digit = row['RIC'][3]
    expiration_year =  # this is where I'm stuck
My methodology is for the expiration date to be the closest date with that month whose year ends in the given digit, on or after the trade date. For 5/22/1989, for example, it would be 3/15/1990 (the day doesn't matter; 1990 is the closest year ending in 0 to 1989).
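The "closest year ending in that digit" rule can be written with modular arithmetic (a minimal sketch; the function name is mine):

```python
def expiration_year(trade_year: int, year_digit: int) -> int:
    """Smallest year >= trade_year whose last digit equals year_digit."""
    return trade_year + (year_digit - trade_year % 10) % 10
```

So a trade in 1989 against a contract whose RIC ends in 0 resolves to 1990, while a digit of 9 stays in 1989.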
Is there a way to automate this using pandas datetime features?
First, create Series containing the expiration month, the expiration-year digit, and the last digit of the year of your TradeDate column.
m = df.RIC.str[2].map(month_codes_to_int)
y = df.RIC.str[3].astype(int)
s = df.TradeDate.dt.year.mod(10)
Then calculate your offset, i.e. the number of years to add so the trade year's last digit matches the RIC's year digit:
offset = (y - s) % 10
Finally, create your new column:
pd.to_datetime((df.TradeDate.dt.year + offset).astype(str) + m.str.zfill(2), format='%Y%m')
Output:
0 1990-03-01
1 1990-03-01
2 1990-03-01
3 1990-03-01
4 1990-03-01
dtype: datetime64[ns]
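Putting those steps together into one self-contained sketch (the sample data and the modular offset are assumptions on my part; the column names TradeDate and RIC follow the answer):

```python
import pandas as pd

month_codes_to_int = {'F': '1', 'G': '2', 'H': '3', 'J': '4', 'K': '5', 'M': '6',
                      'N': '7', 'Q': '8', 'U': '9', 'V': '10', 'X': '11', 'Z': '12'}

df = pd.DataFrame({
    'TradeDate': pd.to_datetime(['5/22/1989', '5/23/1989', '5/24/1989']),
    'RIC': ['SPH0', 'SPH0', 'SPH0'],
})

m = df.RIC.str[2].map(month_codes_to_int)   # expiration month as a string
y = df.RIC.str[3].astype(int)               # expiration-year digit
s = df.TradeDate.dt.year.mod(10)            # trade year's last digit
offset = (y - s) % 10                       # years until the last digit matches
df['Expiration'] = pd.to_datetime(
    (df.TradeDate.dt.year + offset).astype(str) + m.str.zfill(2), format='%Y%m')
```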
I have a dataset which comprises minute-level data for 2 stocks over 3 months. I have to create a date in the first column and a time (at 1-minute intervals) in the next column for the 3 months. I am attaching a snapshot of one such dataset. Kindly help me solve this problem.
Data Format
- Create a 3-month date/time range with minute frequency:
date_rng = pd.date_range(start='1/1/2021', end='3/31/2021', freq='min')
- Isolate the dates:
date = date_rng.date
- Isolate the times:
time = date_rng.time
- Create a pandas DataFrame with 2 columns (date and time):
df = pd.DataFrame({'date': date, 'time': time})
- Then simply concat the new dataframe with your existing dataframe on the column axis.
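The concat step in the last bullet might look like this (the existing frame here is a made-up stand-in for your minute-level stock data; it must have the same number of rows):

```python
import pandas as pd

# three minutes of the range, for a small demonstration
date_rng = pd.date_range(start='1/1/2021', end='1/1/2021 00:02', freq='min')
dt_df = pd.DataFrame({'date': date_rng.date, 'time': date_rng.time})

# toy stand-in for your existing minute-level data (same row count)
existing = pd.DataFrame({'price': [100.0, 100.5, 101.0]})

# align on position, not on any pre-existing index
combined = pd.concat([dt_df.reset_index(drop=True),
                      existing.reset_index(drop=True)], axis=1)
```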
***** Remove Saturday and Sunday *****
You could remove weekends by creating a column with the weekday names and then taking a slice of the dataframe excluding Saturday and Sunday:
date_rng = pd.date_range(start='1/1/2021', end='3/31/2021', freq='min')
date = date_rng.date
time = date_rng.time
day = date_rng.day_name()
df = pd.DataFrame({'date': date, 'time': time, 'day': day})
Remove Sat and Sun with this code:
sat = df.day != 'Saturday'
sun = df.day != 'Sunday'
df = df[sat & sun]
As for holidays, you could use the same method, but you would need a list of the holidays applicable in your region.
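For holidays, the same slicing idea works with isin (the holiday list here is made up for illustration):

```python
import pandas as pd

date_rng = pd.date_range(start='1/1/2021', end='1/5/2021', freq='D')
df = pd.DataFrame({'date': date_rng.date})

# hypothetical regional holiday list
holidays = [pd.Timestamp('2021-01-01').date()]

# keep only rows whose date is not a holiday
df = df[~df.date.isin(holidays)]
```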
****** Trading times ******
from datetime import datetime

marketOpen = datetime.strptime('9:15:00', "%H:%M:%S").time()
marketClose = datetime.strptime('15:59:00', "%H:%M:%S").time()
df = df[(df.time >= marketOpen) & (df.time <= marketClose)]
******* Exclude specific day ******
holiday = datetime.strptime("03/30/2021", "%m/%d/%Y").date()
df = df[df.date != holiday]
Lastly, don't forget to reset your dataframe's index.
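Resetting the index drops the gaps left by the removed rows (a tiny illustration):

```python
import pandas as pd

df = pd.DataFrame({'x': [1, 2, 3]}).drop(index=1)  # index is now 0, 2
df = df.reset_index(drop=True)                     # index becomes 0, 1
```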
import datetime as dt
quarter = pd.Timestamp(dt.date(2020, 1, 1)).quarter
assert quarter == 1
df['quarter'] = df['date'].dt.quarter
This returns a 1,2,3 or 4 in df['quarter'] depending on the date in column df['date'].
What I would like to have is this format in column df['quarter']:
Qx-2019 or Qx-2020 depending on the year, where x is the quarter found with the script above.
How can I get the specific year from the quarter and produce the format Qx-year?
Thank you.
Try with to_period
s.dt.to_period('Q')
Out[159]:
0 2020Q4
1 2019Q1
dtype: period[Q-DEC]
Update
'Q' + df['date'].dt.quarter.astype(str) + '-' + df['date'].dt.year.astype(str)
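A quick check of that expression on a small frame (the column name date is assumed from the question):

```python
import pandas as pd

df = pd.DataFrame({'date': pd.to_datetime(['2019-02-10', '2020-11-05'])})

# build 'Qx-year' labels from the quarter and year components
df['quarter'] = ('Q' + df['date'].dt.quarter.astype(str)
                 + '-' + df['date'].dt.year.astype(str))
```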
I am trying to work out how to get the year-to-date (YTD) versus last-year-to-date (LYTD) values from a dataframe.
Dataframe:
ID start_date distance
1 2019-7-25 2
2 2019-7-26 2
3 2020-3-4 1
4 2020-3-4 1
5 2020-3-5 3
6 2020-3-6 3
There is data back to 2017 and more data will keep getting added so I would like the YTD and LYTD to be dynamic based upon the current year.
I know how to get the cumulative sum for each year and month but I am really struggling with how to calculate the YTD and LYTD.
year_month_distance_df = distance_kpi_df.groupby(["Start_Year","Start_Month"]).agg({"distance":"sum"}).reset_index()
The other code I tried:
cum_sum_distance_ytd = distance_kpi_df[["start_date_local", "distance"]]
cum_sum_distance_ytd = cum_sum_distance_ytd.set_index("start_date_local")
cum_sum_distance_ytd = cum_sum_distance_ytd.groupby(pd.Grouper(freq = "D")).sum()
When I try this logic and add Start_Day into the group by it obviously just sums all the data for that day.
Expected output:
Year to Date = 8
Last Year to Date = 4
You could split the date into its components (via the .dt accessor) and get the YTD for all years with:
expanding = df.groupby([
    df.start_date.dt.month, df.start_date.dt.day, df.start_date.dt.year
]).distance.sum().unstack().cumsum()
Unstacking will fill with np.nan wherever a year does not have a value for that row's (month, day); if that is a problem, you can use the fill_value parameter:
.unstack(fill_value=0).cumsum()
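A more direct way to get the two scalars, with as_of as an explicit parameter in place of today's date (a sketch on the question's sample rows; note the snippet has no 2019 data before March, so LYTD comes out 0 here):

```python
import pandas as pd

df = pd.DataFrame({
    'start_date': pd.to_datetime(['2019-07-25', '2019-07-26', '2020-03-04',
                                  '2020-03-04', '2020-03-05', '2020-03-06']),
    'distance': [2, 2, 1, 1, 3, 3],
})

as_of = pd.Timestamp('2020-03-06')  # stand-in for pd.Timestamp.today()

# keep rows on or before the as-of day of year (small leap-year wrinkle aside)
in_window = df[df.start_date.dt.dayofyear <= as_of.dayofyear]

ytd = in_window.loc[in_window.start_date.dt.year == as_of.year, 'distance'].sum()
lytd = in_window.loc[in_window.start_date.dt.year == as_of.year - 1, 'distance'].sum()
```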
I am trying to add a set of common date-related columns to my dataframe, building them from the .date_range() pandas method that holds my dataframe's date range.
While I can use attributes like .index.day or .index.weekday_name for general date columns, I would like to set a business-day column based on the date_range I constructed, but I am not sure whether I can use the freq nickname 'B' or whether I need to create a new date range.
Further, I am hoping to not count those business days based on a list of holiday dates that I have.
Here is my setup:
Holiday table
holiday_table = holiday_table.set_index('date')
holiday_table_dates = holiday_table.index.to_list() # ['2019-12-31', etc..]
Base Date Table
data_date_range = pd.date_range(start=date_range_start, end=date_range_end)
df = pd.DataFrame({'date': data_date_range}).set_index('date')
df['day_index'] = df.index.day
# Weekday Name
df['weekday_name'] = df.index.weekday_name
# Business day
df['business_day'] = data_date_range.freq("B")
Error at df['business_day'] = data_date_range.freq("B"):
---> 13 df['business_day'] = data_date_range.freq("B")
ApplyTypeError: Unhandled type: str
OK, I think I understand your question now. You are looking to create a new column of working business days (excluding your custom holidays). In my example I just used the regular US holidays from pandas, but you already have your holidays as a list in holiday_table_dates, so you should still be able to follow the general layout of this example for your specific use. I also assume that you are OK with boolean values for your business_day column:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as h_cal
# sample data
data_date_range = pd.date_range(start='1/1/2019', end='12/31/2019')
df = pd.DataFrame({'date': data_date_range}).set_index('date')
df['day_index'] = df.index.day
# Weekday Name
df['weekday_name'] = df.index.day_name()
# this is just a sample using US holidays
hday = h_cal().holidays(df.index.min(), df.index.max())
# b is the same date range as above, just with freq set to business days
b = pd.date_range(start='1/1/2019', end='12/31/2019', freq='B')
# find all the working business day where b is not a holiday
bday = b[~b.isin(hday)]
# create a boolean col where the date index is in your custom business day we just created
df['bday'] = df.index.isin(bday)
day_index weekday_name bday
date
2019-01-01 1 Tuesday False
2019-01-02 2 Wednesday True
2019-01-03 3 Thursday True
2019-01-04 4 Friday True
2019-01-05 5 Saturday False
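As an alternative sketch, pandas can build the custom business-day set directly via bdate_range with freq='C' and a holidays list (the dates here are illustrative, not from the question):

```python
import pandas as pd

# custom business days for the first week of 2019, skipping New Year's Day
bdays = pd.bdate_range(start='1/1/2019', end='1/7/2019', freq='C',
                       holidays=['2019-01-01'])

# flag each calendar day as business day or not
all_days = pd.date_range(start='1/1/2019', end='1/7/2019')
df = pd.DataFrame({'bday': all_days.isin(bdays)}, index=all_days)
```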
I have a table in Excel where the column headers are months and the rows are days. I need to get today's date, which I have already done. Then I need to match the month and day against the column "cy_day".
Example:
if today's date is Jan 3, then it should return only "2".
Excel File:
cy_day jan feb mar
1 1 1 1
2 3 2 4
3 4 4 5
4 7 5 6
import pandas as pd
from pandas import DataFrame
import calendar
cycle_day_path = 'test\\Documents\\cycle_day_calendar.xlsx'
df = pd.read_excel(cycle_day_path)
df = DataFrame(df, index=None)
print(df)
month = pd.to_datetime('today').strftime("%b")
day = pd.to_datetime('today').strftime("%d")
Try this:
today = pd.Timestamp('2019-01-03')
col = today.strftime('%b').lower()
df[df[col] == today.day]
Given you've extracted month using '%b', it should just be this, after correcting for the capitalization of the '%b' month name (see http://strftime.org/):
df.loc[df[month.lower()] == day, 'cy_day']
Now for Jan 3 you will get 2 (as a DataFrame). If you want just the number 2 do:
df.loc[df[month.lower()] == day, 'cy_day'].values[0]
The value of the month variable returned by pd.to_datetime('today').strftime("%b") is a capitalized string, so in order to use it to access a column of your dataframe you should lowercase it.
So first you should do:
month = month.lower()
After that, you need to make sure that the values in your month columns are of type str, since you are going to compare them with a str value.
day_of_month = df[month] == day
df["cy_day"][day_of_month]
If they are not of type str, you should instead convert the day variable to the same type as the month columns.
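Concretely, coercing day to an int up front avoids the mismatch (the toy frame mirrors the Excel layout shown above):

```python
import pandas as pd

df = pd.DataFrame({'cy_day': [1, 2, 3, 4],
                   'jan': [1, 3, 4, 7],
                   'feb': [1, 2, 4, 5],
                   'mar': [1, 4, 5, 6]})

today = pd.Timestamp('2019-01-03')
month = today.strftime('%b').lower()   # 'jan' -- lowercased to match the columns
day = int(today.strftime('%d'))        # 3 as an int, not the string '03'

# look up the cy_day whose jan value equals today's day
result = df.loc[df[month] == day, 'cy_day'].values[0]
```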