Setting Time with interval of 1 minute - python

I have a dataset comprising minute-level data for 2 stocks over 3 months. I need to create a date column and a time column (at 1-minute intervals) covering the 3 months. I am attaching a snapshot of one such dataset. Kindly help me solve this problem.
Data Format

import pandas as pd

# Create a 3-month range of dates and times at minute frequency
date_rng = pd.date_range(start='1/1/2021', end='3/31/2021', freq='min')
# Isolate the dates
date = date_rng.date
# Isolate the times
time = date_rng.time
# Create a pandas dataframe with 2 columns (date and time)
df = pd.DataFrame({'date': date, 'time': time})
Then simply concat the new dataframe with your existing dataframe on the column axis.
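If your existing minutely dataframe is aligned row-for-row with the new one, the concat step might look like this (the stock column name and values are illustrative, not from your data):

```python
import pandas as pd

# new date/time columns built from the minutely range
date_rng = pd.date_range(start='1/1/2021', periods=3, freq='min')
dt_cols = pd.DataFrame({'date': date_rng.date, 'time': date_rng.time})

# existing stock data with the same number of rows (made-up values)
stocks = pd.DataFrame({'stock_a': [100.0, 100.5, 101.0]})

# stack the frames side by side on the column axis
combined = pd.concat([dt_cols.reset_index(drop=True),
                      stocks.reset_index(drop=True)], axis=1)
print(combined.columns.tolist())
```

Resetting both indexes first keeps the rows aligned positionally rather than by index label.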
***** Remove Saturday and Sunday *****
You could remove weekends by creating a column with day names and then taking a slice of the dataframe excluding Saturday and Sunday:
date_rng = pd.date_range(start='1/1/2021', end='3/31/2021', freq='min')
date = date_rng.date
time = date_rng.time
day = date_rng.day_name()
df = pd.DataFrame({'date': date, 'time': time, 'day': day})
Remove Sat and Sun with this code:
sat = df.day != 'Saturday'
sun = df.day != 'Sunday'
df = df[sat & sun]
As for Holidays, you could use the same method but you would need a list of the holidays applicable in your region.
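With such a list in hand, the same boolean-slice idea works; a short sketch (the holiday dates below are made up, not a real exchange calendar):

```python
import pandas as pd

date_rng = pd.date_range(start='1/1/2021', end='1/31/2021', freq='D')
df = pd.DataFrame({'date': date_rng.date})

# illustrative holiday list; substitute the holidays for your region
holidays = pd.to_datetime(['2021-01-01', '2021-01-26']).date

# keep only the rows whose date is not in the holiday list
df = df[~df['date'].isin(holidays)]
print(len(df))  # 31 days minus 2 holidays
```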
****** Trading times ******
from datetime import datetime

marketOpen = datetime.strptime('9:15:00', "%H:%M:%S").time()
marketClose = datetime.strptime('15:59:00', "%H:%M:%S").time()
df = df[(df.time >= marketOpen) & (df.time <= marketClose)]
******* Exclude specific day ******
holiday = datetime.strptime("03/30/2021", "%m/%d/%Y").date()
df = df[df.date != holiday]
Lastly, don't forget to reset your dataframe's index.
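Putting the steps above together, a minimal runnable version (the market hours and the holiday date are the same illustrative values used above):

```python
from datetime import datetime
import pandas as pd

date_rng = pd.date_range(start='1/1/2021', end='3/31/2021', freq='min')
df = pd.DataFrame({'date': date_rng.date, 'time': date_rng.time,
                   'day': date_rng.day_name()})

# drop weekends
df = df[(df.day != 'Saturday') & (df.day != 'Sunday')]

# keep trading hours only
marketOpen = datetime.strptime('9:15:00', '%H:%M:%S').time()
marketClose = datetime.strptime('15:59:00', '%H:%M:%S').time()
df = df[(df.time >= marketOpen) & (df.time <= marketClose)]

# drop a specific holiday
holiday = datetime.strptime('03/30/2021', '%m/%d/%Y').date()
df = df[df.date != holiday]

# reset the index so it runs 0..n-1 again
df = df.reset_index(drop=True)
print(df.head())
```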

Related

Resample daily time series to business day

I have the following daily time series that I want to resample (aggregate by sum) over business days only (Mon - Fri),
but this code also aggregates the weekends (Sat & Sun):
df_resampled = df.resample('5B').sum()
You can exclude weekends with boolean indexing on DatetimeIndex.dayofweek:
df_resampled = df[~df.index.dayofweek.isin([5, 6])].resample('5B').sum()
or, equivalently:
df_resampled = df[df.index.dayofweek < 5].resample('5B').sum()
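As a quick check that the weekend rows no longer contribute, a small sketch with made-up values (two full weeks of 1s starting on a Monday):

```python
import pandas as pd

# fourteen days of dummy data starting on Monday 2021-03-01
idx = pd.date_range('2021-03-01', periods=14, freq='D')
df = pd.DataFrame({'value': [1] * 14}, index=idx)

# keep Mon-Fri only, then aggregate in 5-business-day bins
df_resampled = df[df.index.dayofweek < 5].resample('5B').sum()
print(df_resampled)
```

Each bin now sums exactly five weekday rows, so both bins come out to 5 instead of 7.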
You can pivot the table on day of week and remove weekends. Check this out.
Step 0: generate a random example (you already have data, so you can skip this step)
import pandas as pd
import numpy as np

def random_dates(start, end, n, freq, seed=None):
    if seed is not None:
        np.random.seed(seed)
    dr = pd.date_range(start, end, freq=freq)
    return pd.to_datetime(np.sort(np.random.choice(dr, n, replace=False)))

dates = random_dates('2015-01-01', '2018-01-01', 10, 'H', seed=[3, 1415])
df = pd.DataFrame()
df.index = pd.DatetimeIndex(dates.date)
df['Sales'] = np.random.randint(1, 5, size=len(df))
Step 1: Get days of week
df['Day of week'] = df.index.to_series().dt.dayofweek
# 0 is Monday - 6 is Sunday
Step 2: Get the result you asked for
# remove days 5 and 6 (Sat and Sun) and pivot on day of week
result = df[df['Day of week'] < 5].pivot_table(index = 'Day of week', values = "Sales", aggfunc = "sum")
print(result)
Example output:
Sales
Day of week
0 11
3 1
4 14
Again remember: 0 is Monday and 6 is Sunday. You can map these numbers to day names for more readable output.
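For example, mapping the numeric index to names with the standard library's calendar module (the Sales values here are just the ones from the sample output above):

```python
import calendar
import pandas as pd

result = pd.Series([11, 1, 14], name='Sales',
                   index=pd.Index([0, 3, 4], name='Day of week'))

# calendar.day_name is indexed 0 = Monday ... 6 = Sunday
result.index = result.index.map(lambda d: calendar.day_name[d])
print(result)
```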

Convert week number in dataframe to start date of week (Monday)

I'm looking to convert daily data into weekly data. Here is the code I've used to achieve this
daily_data['Week_Number'] = pd.to_datetime(daily_data['candle_date']).dt.week
daily_data['Year'] = pd.to_datetime(daily_data['candle_date']).dt.year
df2 = daily_data.groupby(['Year', 'Week_Number']).agg({'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum', 'market_cap': 'sum'})
Currently, the dataframe output looks as below -
open high low close volume market_cap
Year Week_Number
2020 31 11106.793367 12041.230145 10914.007709 11059.660924 86939673211 836299315108
32 11059.658520 11903.881608 11011.841384 11653.660942 125051146775 1483987715241
33 11665.874956 12047.515879 11199.052457 11906.236593 141819289223 1513036354035
34 11915.898402 12382.422676 11435.685834 11671.520767 136888268138 1533135548697
35 11668.211439 11806.669046 11183.114210 11704.963980 122232543594 1490089199926
36 11713.540300 12044.196936 9951.201578 10277.329333 161912442921 1434502733759
I'd like the output to have a column week_date showing the Monday of each week as the start date, e.g. 27-07-2020 in place of week 31 of 2020, and so on. This final piece is where I'm badly stuck. I'd appreciate some help to achieve this.
***** SOLUTION FOR THOSE WHO NEED IT *****
The entire function used to convert daily data to weekly data is below:
from datetime import datetime
import pandas as pd

def convert_dailydata_to_weeklydata(daily_data):
    # Print function name
    SupportMethods.print_func_name()
    # Advance until the first row whose date is a Monday
    row_counter_start = 0
    while datetime.weekday(daily_data['candle_date'][row_counter_start]) != 0:
        row_counter_start += 1
    # Copy all rows from the first Monday onwards
    daily_data_temp = daily_data[row_counter_start:].copy()
    # Get the week number (.dt.week is deprecated; isocalendar().week is the replacement)
    daily_data_temp['Week_Number'] = pd.to_datetime(daily_data_temp['candle_date']).dt.isocalendar().week
    # Get the year. Week numbers repeat across years, so year + week number gives a unique index
    daily_data_temp['Year'] = pd.to_datetime(daily_data_temp['candle_date']).dt.year
    # Group on the required values
    df = daily_data_temp.groupby(['Year', 'Week_Number']).agg(
        {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum', 'market_cap': 'sum'})
    # Reset index
    df = df.reset_index()
    # Create the week date (start of week). The + "1" is for the day of the week:
    # %w counts weekdays 0-6 with 0 being Sunday, so 1 is Monday.
    # The week number is zero-padded so single-digit weeks parse correctly.
    df['week_date'] = pd.to_datetime(
        df['Year'].astype(str) + df['Week_Number'].astype(str).str.zfill(2) + "1", format='%G%V%w')
    # Set indexes
    df = df.set_index(['Year', 'Week_Number'])
    # Re-order columns into a new dataframe
    weekly_data = df[["week_date", "open", "high", "low", "close", "volume", "market_cap"]]
    weekly_data = weekly_data.rename({'week_date': 'candle_date'}, axis=1)
    # Drop the index columns
    weekly_data.reset_index(drop=True, inplace=True)
    # Drop the current week's (incomplete) row if the daily data does not end on a Sunday
    if datetime.weekday(daily_data['candle_date'].iloc[-1]) != 6:
        return weekly_data.head(-1)
    else:
        return weekly_data
An equivalent one-liner, building the string as week + year instead:
df['week_date-Week'] = pd.to_datetime(df['Week_Number'].astype(str) + df['Year'].astype(str).add('-1'), format='%V%G-%u')
Try using pd.to_datetime on the 'Year' and 'Week_Number' columns with a format string for Year, Week of Year, and Day of Week ('%G%V%w'):
df = df.reset_index()
df['week_date'] = pd.to_datetime(
df['Year'].astype(str) + df['Week_Number'].astype(str) + "1",
format='%G%V%w'
)
df = df.set_index(['Year', 'Week_Number'])
The + "1" is for the day of the week: weekday numbers run 0-6, with 0 being Sunday and 6 being Saturday. (Ref. Format Codes)
df:
open close week_date
Year Week_Number
2020 31 11106.793367 11059.660924 2020-07-27
32 11059.658520 11653.660942 2020-08-03
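One caveat with building the string this way: single-digit week numbers must be zero-padded, or '%V' cannot parse the string reliably. A sketch using only the standard library:

```python
from datetime import datetime

def week_to_monday(year, week):
    # %G = ISO year, %V = two-digit ISO week, %w = weekday with 1 meaning Monday
    return datetime.strptime(f"{year}{str(week).zfill(2)}1", "%G%V%w").date()

print(week_to_monday(2020, 31))  # matches the 2020-07-27 row above
print(week_to_monday(2021, 5))   # a single-digit week works thanks to zfill
```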
Try via apply() and the datetime.strptime() method (note this uses '%W' week numbering, which can differ from the ISO '%V' weeks used above):
import datetime

df = df.reset_index()
df['week_date'] = (df[['Year', 'Week_Number']].astype(str)
                   .apply(lambda x: datetime.datetime.strptime('-W'.join(x) + '-1', "%Y-W%W-%w"), 1))
df = df.set_index(['Year', 'Week_Number'])
Try dt.strftime with '%V' (going the other way, from a date to its ISO week number):
pd.to_datetime(pd.Series(['27-07-2020'])).dt.strftime('%V')
PySpark SQL
When the data is too heavy, .apply takes a long time to process. I used the code below to get the first date of the month and the week start date (week starting on Monday).
from pyspark.sql.functions import trunc, next_day, date_sub

df = df.withColumn('month_date', trunc('date', 'month'))
Output:
date month_date
2019-05-28 2019-05-01
The following gets the week start date from the date column, with weeks starting on Monday:
df = df.withColumn("week_end", next_day("date", "SUN")).withColumn("week_start_date", date_sub("week_end", 6))
I used these on Databricks. .apply took more than 2 hours on 200 billion rows of data, while this took only around 5 minutes.

Selecting the last week of each month only from a data frame - Python/Pandas

If I have a data frame that is indexed by weekly dates (2019-01-07, 2019-01-14, 2019-01-21...etc), is there a way in Pandas to efficiently select only the rows that correspond to the last week of each month in the index?
Just get the last day of the month (MonthEnd), then filter. For example (assuming you have a Date column in your DataFrame):
from pandas.tseries.offsets import MonthEnd
df['MonthEnd'] = df['Date'] + MonthEnd(1)
df[ (df['MonthEnd'] - df['Date']).dt.days <= 7 ]
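A runnable version of that filter on a weekly Monday index like the one in the question:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

# weekly Monday dates, as in the question
df = pd.DataFrame({'Date': pd.date_range('2019-01-07', periods=8, freq='W-MON')})

# distance to the month end; within 7 days means it is the month's last week
df['MonthEnd'] = df['Date'] + MonthEnd(1)
last_weeks = df[(df['MonthEnd'] - df['Date']).dt.days <= 7]
print(last_weeks['Date'].dt.date.tolist())
```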

Pandas Manipulating Freq for Business Day DateRange

I am trying to add a set of common date-related columns to my data frame, and my approach to building them is based on the pandas .date_range() method that holds the date range for my dataframe.
While I can use methods like .index.day or .index.weekday_name for general date columns, I would like to set a business-day column based on the date_range I constructed, but I am not sure whether I can use the freq alias 'B' or whether I need to create a new date range.
Further, I am hoping not to count business days that fall on a list of holiday dates that I have.
Here is my setup:
Holiday table
holiday_table = holiday_table.set_index('date')
holiday_table_dates = holiday_table.index.to_list() # ['2019-12-31', etc..]
Base Date Table
data_date_range = pd.date_range(start=date_range_start, end=date_range_end)
df = pd.DataFrame({'date': data_date_range}).set_index('date')
df['day_index'] = df.index.day
# Weekday Name
df['weekday_name'] = df.index.weekday_name
# Business day
df['business_day'] = data_date_range.freq("B")
Error at df['business_day'] = data_date_range.freq("B"):
---> 13 df['business_day'] = data_date_range.freq("B")
ApplyTypeError: Unhandled type: str
OK, I think I understand your question now. You are looking to create a new column of working business days (excluding your custom holidays). In my example I just used the regular US holidays from pandas, but you already have your holidays as a list in holiday_table_dates, so you should still be able to follow the general layout of my example for your specific use. I also assumed that you are OK with boolean values for your business_day column:
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar as h_cal
# sample data
data_date_range = pd.date_range(start='1/1/2019', end='12/31/2019')
df = pd.DataFrame({'date': data_date_range}).set_index('date')
df['day_index'] = df.index.day
# Weekday name (.weekday_name was removed in newer pandas; day_name() is the replacement)
df['weekday_name'] = df.index.day_name()
# this is just a sample using US holidays
hday = h_cal().holidays(df.index.min(), df.index.max())
# b is the same date range as above, just with the freq set to business days
b = pd.date_range(start='1/1/2019', end='12/31/2019', freq='B')
# find all the working business days, i.e. where b is not a holiday
bday = b[~b.isin(hday)]
# create a boolean column flagging whether the date index is in the custom business days we just built
df['bday'] = df.index.isin(bday)
day_index weekday_name bday
date
2019-01-01 1 Tuesday False
2019-01-02 2 Wednesday True
2019-01-03 3 Thursday True
2019-01-04 4 Friday True
2019-01-05 5 Saturday False
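An alternative worth knowing about is CustomBusinessDay, which lets pandas build the holiday-aware business-day range in one step (sketched here with the same US calendar; you could pass your own holiday list via the holidays= argument instead):

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# a business-day frequency that also skips the given holiday calendar
us_bd = CustomBusinessDay(calendar=USFederalHolidayCalendar())

data_date_range = pd.date_range(start='1/1/2019', end='12/31/2019')
df = pd.DataFrame(index=data_date_range)

# business days excluding holidays, generated directly by the offset
bdays = pd.date_range(start='1/1/2019', end='12/31/2019', freq=us_bd)
df['bday'] = df.index.isin(bdays)
print(df.head())
```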

Is there some Python function like .to_period that could help me extract a fiscal year's week number based on a date?

Essentially, I want to apply some lambda function (?) to a column in my dataframe that contains dates. Originally, I used dt.week to extract the week number, but the calendar dates don't match up with the fiscal year I'm using (Apr 2019 - Mar 2020).
I have tried using pandas' to_period('Q-MAR') function, but that seems to be a little bit off. I have been researching other ways but nothing seems to work properly.
Apr 1 2019 -> Week 1
Apr 3 2019 -> Week 1
Apr 30 2019 -> Week 5
May 1 2019 -> Week 5
May 15 2019 -> Week 6
Thank you for any advice or tips in advance!
You can create a DataFrame which contains the dates with a frequency of weeks:
date_rng = pd.date_range(start='2019-04-01', end='2020-03-31', freq='W')
df = pd.DataFrame(date_rng, columns=['date'])
You can then query df for which index the date is smaller than or equal to the value:
df.index[df.date <= query_date][-1]
This will output the largest index which is smaller than or equal to the date you want to examine (note that a date earlier than the first weekly timestamp will find no match). I imagine you can pour this into a lambda yourself?
NOTE
This solution has limitations, the biggest one being you have to manually define the datetime dataframe.
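Another option that avoids the lookup table is to count seven-day blocks from the fiscal year start. A sketch, assuming the fiscal year starts on April 1 (week boundaries may differ by one from hand-counted examples, depending on the convention you want):

```python
import pandas as pd

def fiscal_week(ts, fy_start_month=4):
    # the fiscal year is assumed to start on the 1st of fy_start_month
    start_year = ts.year if ts.month >= fy_start_month else ts.year - 1
    fy_start = pd.Timestamp(year=start_year, month=fy_start_month, day=1)
    return (ts - fy_start).days // 7 + 1

dates = pd.Series(pd.to_datetime(['2019-04-01', '2019-04-03', '2019-04-30', '2019-05-01']))
print(dates.apply(fiscal_week).tolist())  # [1, 1, 5, 5]
```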
I created a fiscal calendar that can later be turned into a function in Spark:
from pyspark.sql.functions import explode, expr, weekofyear, col, year
import pandas as pd
import numpy as np

beginDate = '2016-01-01'
endDate = '2021-12-31'
# Create an empty dataframe
df = spark.createDataFrame([()])
# Create a date column from the given date range
df1 = df.withColumn("date", explode(expr(f"sequence(to_date('{beginDate}'), to_date('{endDate}'), interval 1 day)")))
# Get the week and year
df1 = df1.withColumn('week', weekofyear(col("date"))).withColumn('year', year(col("date")))
# Convert to pandas
df1 = df1.toPandas()
# Get the fiscal year
df1['financial_year'] = df1['date'].map(lambda x: x.year if x.month > 3 else x.year - 1)
df1['date'] = pd.to_datetime(df1['date'])
# Get the calendar quarter
df1['quarter_old'] = df1['date'].dt.quarter
# Get the fiscal quarter
df1['quarter'] = np.where(df1['financial_year'] < (df1['year']), df1['quarter_old'] + 3, df1['quarter_old'])
df1['quarter'] = np.where(df1['financial_year'] == (df1['year']), df1['quarter_old'] - 1, df1['quarter'])
# Get the fiscal week by shifting as per the number of days the fiscal calendar is offset from the usual calendar
df1["fiscal_week"] = df1.week.shift(91)
df1 = df1.loc[(df1['date'] >= '2020-01-01')]
df1.display()
