Rolling Mean of Rolling Correlation dataframe in Python?

Python beginner here.
What I've done so far:
Imported price data from Yahoo Finance from a list of stocks.
Between the stocks (every combination), computed the 20 day rolling correlation into a dataframe.
I would like to:
1) Calculate the 200 day simple moving average for each of the 20 day rolling correlations.
2) Report the 200 day moving average results in a matrix.
How to do this in python/pandas? Thanks, this would help me out a ton!
Here is what I have so far...
import pandas as pd
from pandas import DataFrame
import datetime
import pandas.io.data as web
from pandas.io.data import DataReader
stocks = ['spy', 'gld', 'uso']
start = datetime.datetime(2014,1,1)
end = datetime.datetime(2015,1,1)
f = web.DataReader(stocks, 'yahoo', start, end)
adj_close_df = f['Adj Close']
correls = pd.rolling_corr(adj_close_df, 20)
means = pd.rolling_mean(correls, 200) #<---- I get an error message here!

This is a start which answers questions 1-3 (you should only have one question per post).
import pandas.io.data as web
import datetime as dt
import pandas as pd
end_date = dt.datetime.now().date()
start_date = end_date - pd.DateOffset(years=5)
symbols = ['AAPL', 'IBM', 'GM']
prices = web.get_data_yahoo(symbols=symbols, start=start_date, end=end_date)['Adj Close']
returns = prices.pct_change()
rolling_corr = pd.rolling_corr_pairwise(returns, window=20)
Getting the rolling mean of the rolling correlation is relatively simple for a single stock against all others. For example:
pd.rolling_mean(rolling_corr.major_xs('AAPL').T, 200).tail()
Out[34]:
            AAPL        GM       IBM
Date
2015-05-08     1  0.313391  0.324728
2015-05-11     1  0.315561  0.327537
2015-05-12     1  0.317844  0.330375
2015-05-13     1  0.320137  0.333189
2015-05-14     1  0.322119  0.335659
To view the correlation matrix for the most recent 200 day window:
>>> rolling_corr.iloc[-200:].mean(axis=0)
          AAPL        GM       IBM
AAPL  1.000000  0.322119  0.335659
GM    0.322119  1.000000  0.383672
IBM   0.335659  0.383672  1.000000
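For readers on a recent pandas release, where pd.rolling_corr_pairwise, pd.rolling_mean and Panel are no longer available, the same idea can probably be sketched with the .rolling() accessor. This is a minimal sketch, not the code from the answer above: the returns are random numbers standing in for the Yahoo data, and the names wide, mean_corr and latest_matrix are made up for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: random daily returns stand in for the downloaded price data.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2014-01-01", periods=300)
returns = pd.DataFrame(
    rng.normal(0, 0.01, size=(300, 3)),
    index=dates,
    columns=["AAPL", "GM", "IBM"],
)

# 20-day rolling correlation for every pair of columns.
# The result is indexed by (date, symbol) with one column per symbol.
rolling_corr = returns.rolling(window=20).corr(pairwise=True)

# Move the symbol level into the columns: one column per (symbol, symbol) pair.
wide = rolling_corr.unstack()

# 200-day simple moving average of each 20-day rolling correlation.
mean_corr = wide.rolling(window=200).mean()

# Reshape the most recent row back into a symbol-by-symbol matrix.
latest_matrix = mean_corr.iloc[-1].unstack()
print(latest_matrix)
```

With 300 rows, the first 219 rows of mean_corr are NaN, since each value needs 200 valid 20-day correlations behind it.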

Related

Iterating through a range of dates in Python with missing dates

I have a pandas DataFrame of daily stock returns with two columns: date and return rate.
I only want to keep the last day of each week, but the data has some missing days. How can I do this?
import pandas as pd
df = pd.read_csv('Daily_return.csv')
df.Date = pd.to_datetime(df.Date)
count = 300
for last_day in ('2017-01-01' + 7n for n in range(count)):
My brain stopped working at this point. One of the biggest problems is that the "+7n" approach is meaningless when some dates are missing.

I'll create a sample dataset with 40 dates and 40 sample returns, then sample 90 percent of that randomly to simulate the missing dates.
The key here is that you need to convert your date column into datetime if it isn't already, and make sure your df is sorted by the date.
Then you can group by year/week and take the last value. If you run this repeatedly, you'll see that the selected dates can change when the dropped value was the last day of a week.
import pandas as pd
import numpy as np
df = pd.DataFrame()
df['date'] = pd.date_range(start='04-18-2022',periods=40, freq='D')
df['return'] = np.random.uniform(size=40)
# Keep 90 percent of the records so we can see what happens when some days are missing
df = df.sample(frac=.9)
# In case your dates are actually strings
df['date'] = pd.to_datetime(df['date'])
# Make sure they are sorted from oldest to newest
df = df.sort_values(by='date')
df = df.groupby([df['date'].dt.isocalendar().year,
                 df['date'].dt.isocalendar().week], as_index=False).last()
print(df)
Output
        date    return
0 2022-04-24  0.299958
1 2022-05-01  0.248471
2 2022-05-08  0.506919
3 2022-05-15  0.541929
4 2022-05-22  0.588768
5 2022-05-27  0.504419
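Because the sample above is random, the effect of a missing date is easier to see with a fixed gap. The following is a small deterministic sketch of the same group-by-ISO-week idea. The dates and return values are made up, and weekly_last is just an illustrative name. One Friday is removed on purpose, so the Thursday before it should be picked as the last day of that week.

```python
import pandas as pd

# Two weeks of business days (Monday 2022-04-18 to Friday 2022-04-29).
dates = pd.bdate_range("2022-04-18", "2022-04-29")
df = pd.DataFrame({"date": dates, "return": range(len(dates))})

# Simulate a missing day: drop the first Friday.
df = df[df["date"] != pd.Timestamp("2022-04-22")]

# Group by ISO year and ISO week, then keep the last available row per week.
iso = df["date"].dt.isocalendar()
weekly_last = (
    df.sort_values("date")
      .groupby([iso["year"], iso["week"]])
      .last()
      .reset_index(drop=True)
)
print(weekly_last)
```

The first week ends on Thursday 2022-04-21 in the result, since Friday is absent, while the second week still ends on Friday 2022-04-29.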

How to construct the daily returns of a index

I should use the snp500 series, which contains the closing prices of the S&P 500 index for the years 2010-2019, to construct the daily returns of this index (returns can be defined as the percentage increase in price: $r_1=(P_1-P_0)/P_0$) and convert them to yearly returns, building on the function x = lambda p,r,n,t: "%"+str(round(p*(1+(r/n))**(n*t),2)/100). I should pay attention to the units of measurement and assume that there are 252 days in a year. Maybe I can use the method .shift() for this assignment.
Firstly, I defined the function $r_1=(P_1-P_0)/P_0$
def percentage_increase_in_price(P_0, P_1):
    r_1 = (P_1 - P_0) / P_0
    return r_1
Secondly, I wrote the function for finding the data about the index of snp500 from 2010 to 2019
import pandas as pd
import pandas_datareader.data as web
import datetime as dt
start = dt.datetime(2010, 1, 1)
end = dt.datetime(2019, 12, 31)
snp500 = web.DataReader('SP500', 'fred', start, end)
snp500
Then, I have no idea what my next step is.
Could you advise me on how to complete this task?

How about this?
import pandas as pd
import pandas_datareader.data as web
snp500 = web.DataReader('SP500', 'fred', '2010-01-01', '2019-12-31')
# calculate simple returns
snp500["daily_ret"] = snp500["SP500"].pct_change()
snp500.dropna(inplace=True)
# scale daily returns to annual returns and apply rounding
def annualize(r, n, p, t=1):
    return round(p * (1 + r/n)**(n*t),2)/100
snp500["inv"] = snp500["daily_ret"].apply(annualize, p=100, n=252)
Output:
              SP500  daily_ret     inv
DATE
2012-03-27  1412.52  -0.002817  0.9972
2012-03-28  1405.54  -0.004942  0.9951
2012-03-29  1403.28  -0.001608  0.9984
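Since the question mentions .shift(), here is a small sketch showing that the formula $r_1=(P_1-P_0)/P_0$ written with .shift() gives the same numbers as pct_change(). The prices are made up. The last step uses a different convention from the annualize function above: it compounds each daily return over 252 trading days, which is only one possible reading of "yearly returns" and is an assumption on my part.

```python
import pandas as pd

# Assumption: a few made-up closing prices instead of the FRED download.
prices = pd.Series([100.0, 101.0, 99.99, 102.0], name="SP500")

# Daily simple return: (P_t - P_{t-1}) / P_{t-1}, using shift() for P_{t-1}.
previous = prices.shift(1)
daily_ret = (prices - previous) / previous

# pct_change() computes the same quantity in one call.
same_as_pct_change = prices.pct_change()

# One possible annualization (an assumption): compound the daily
# return over 252 trading days.
TRADING_DAYS = 252
annualized = (1 + daily_ret) ** TRADING_DAYS - 1
print(pd.DataFrame({"daily_ret": daily_ret, "annualized": annualized}))
```

The first row is NaN in both columns because there is no previous price to compare with.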

Numpy busday_count not considering holidays

I have a dataset and I need to calculate working days from a given date to today, excluding the given list of holidays. I will be including weekends.
Date Sample:
This is the code I tried:
import pandas as pd
import numpy as np
from datetime import date
df = pd.read_excel('C:\\sample.xlsx')
#get todays date
df["today"] = date.today()
#Convert data type
start = df["R_REL_DATE"].values.astype('datetime64[D]')
end = df["today"].values.astype('datetime64[D]')
holiday = ['2021-06-19', '2021-06-20']
#Numpy function to find in between days
days = np.busday_count(start, end, weekmask='1111111', holidays=holiday)
#Add this column to dataframe
df["Days"] = days
df
When I run this code, it gives difference between R_REL_DATE and today, but doesn't subtract given holidays.
Please help, I want the given list of holidays deducted from the days.

Make sure today and R_REL_DATE are in pandas datetime format with pd.to_datetime():
import pandas as pd
import numpy as np
import datetime
df = pd.DataFrame({'R_REL_DATE': {0: '7/23/2020', 1: '8/26/2020'},
                   'DAYS IN QUEUE': {0: 338, 1: 304}})
df["today"] = pd.to_datetime(datetime.date.today())
df["R_REL_DATE"] = pd.to_datetime(df["R_REL_DATE"])
start = df["R_REL_DATE"].values.astype('datetime64[D]')
end = df["today"].values.astype('datetime64[D]')
holiday = ['2021-06-19', '2021-06-20']
#Numpy function to find in between days
days = np.busday_count(start, end, weekmask='1111111', holidays=holiday)
#Add this column to dataframe
df["Days"] = days - 1
df
Out[1]:
  R_REL_DATE  DAYS IN QUEUE      today  Days
0 2020-07-23            338 2021-06-27   336
1 2020-08-26            304 2021-06-27   302
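To check how the holidays argument behaves on its own, here is a small sketch with fixed dates instead of the spreadsheet. The date range is made up. It relies on two properties of np.busday_count: the end date is not counted, and a holiday is only subtracted when it falls inside the range being counted.

```python
import numpy as np

holidays = np.array(["2021-06-19", "2021-06-20"], dtype="datetime64[D]")
start = np.datetime64("2021-06-14")
end = np.datetime64("2021-06-28")

# Every day of the week counts as a working day with this weekmask.
all_days = np.busday_count(start, end, weekmask="1111111")

# Same range, with the two holidays removed from the count.
without_holidays = np.busday_count(
    start, end, weekmask="1111111", holidays=holidays
)

# A range that stops before the first holiday is not affected by the list.
before_holidays = np.busday_count(
    start, np.datetime64("2021-06-19"), weekmask="1111111", holidays=holidays
)

print(all_days, without_holidays, before_holidays)
```

The two-week range gives 14 days without the holiday list and 12 with it, while the range ending on 2021-06-19 gives 5 either way.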

Finding the elapsed time between two columns

I recently started using pandas and I am trying to teach myself training models. I have a dataset that has end_time and start_time columns and I am currently struggling to find the time elapsed between these columns in the same row in seconds.
This is the code I tried:
[IN]
from datetime import datetime
from datetime import date
st = pd.to_datetime(df['start_time'], format='%Y-%m-%d')
et = pd.to_datetime(df['end_time'], format='%Y-%m-%d')
print((et-st).dt.days)*60*60*24
[OUT]
0 0
1 0
2 0
3 0
4 0
..
10000 0
Length: 10001, dtype: int64
I looked up other similar questions; where this one differs is that the data comes from a CSV file. I can easily apply the solutions from those questions to dummy data, but they don't work in my case.

See the following. I fabricated some data; if you have a data example that produces the error, please feel free to add it to the question.
import pandas as pd
from datetime import datetime
from datetime import date
df = pd.DataFrame({'start_time':pd.date_range('2015-01-01 01:00:00', periods=3), 'end_time':pd.date_range('2015-01-02 02:00:00', periods=3, freq='23H')})
st = pd.to_datetime(df['start_time'], format='%Y-%m-%d')
et = pd.to_datetime(df['end_time'], format='%Y-%m-%d')
diff = et-st
df['seconds'] = diff.dt.total_seconds()
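One possible explanation for the zeros in the question, assuming the start and end times in each row are less than a day apart: .dt.days only reports the whole-day part of each difference, while .dt.total_seconds() reports the full duration. A small sketch with made-up timestamps:

```python
import pandas as pd

# Assumption: made-up timestamps; the first row spans 30 minutes,
# the second spans 1 day and 2 hours.
df = pd.DataFrame({
    "start_time": pd.to_datetime(["2015-01-01 01:00:00", "2015-01-01 23:00:00"]),
    "end_time": pd.to_datetime(["2015-01-01 01:30:00", "2015-01-03 01:00:00"]),
})

elapsed = df["end_time"] - df["start_time"]

# Whole days only: the 30-minute gap becomes 0.
whole_days = elapsed.dt.days

# Full duration in seconds, including the sub-day part.
seconds = elapsed.dt.total_seconds()

print(pd.DataFrame({"whole_days": whole_days, "seconds": seconds}))
```

Here whole_days is 0 and 1, while seconds is 1800.0 and 93600.0.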

why is my beta different from yahoo finance?

I have some code that calculates the beta of any stock against the S&P 500, in this case the ticker symbol "FET". However, the result is completely different from what I see on Yahoo Finance. Historically this stock has been very volatile, which would explain the beta value of 1.55 on Yahoo Finance (http://finance.yahoo.com/q?s=fet). Can someone please advise why I am seeing a completely different number (0.0088)? Thanks in advance.
from pandas.io.data import DataReader
from datetime import datetime
from datetime import date
import numpy
import sys
today = date.today()
stock_one = DataReader('FET','yahoo',datetime(2009,1,1), today)
stock_two = DataReader('^GSPC','yahoo',stock_one['Adj Close'].keys()[0], today)
a = stock_one['Adj Close'].pct_change()
b = stock_two['Adj Close'].pct_change()
covariance = numpy.cov(a[1:],b[1:])[0][1]
variance = numpy.var(b[1:])
beta = covariance / variance
print 'beta value ' + str(beta)

Ok, so I played with the code a bit and this is what I have.
from pandas.io.data import DataReader
import pandas.io.data as web
from datetime import datetime
from datetime import date
import numpy
import sys
start = datetime(2009, 1, 1)
today = date.today()
stock1 = 'AAPL'
stock2 = '^GSPC'
stocks = web.DataReader([stock1, stock2],'yahoo', start, today)
# stock_two = DataReader('^GSPC','yahoo', start, today)
a = stocks['Adj Close'].pct_change()
covariance = a.cov() # Cov Matrix
variance = a.var() # Of stock2
var = variance[stock2]
cov = covariance.loc[stock2, stock1]
beta = cov / var
print "The Beta for %s is: " % (stock2), str(beta)
The two price series did not have the same length, so that was problem #1. Also, your final line computed a beta for every entry of the covariance matrix, which is probably not what you wanted. You don't need betas based on cov(0,0) and cov(1,1); you just need cov(0,1) or cov(1,0). Those are positions in the matrix, not values.
Anyway, here is the answer I got:
The Beta for ^GSPC is: 0.885852632799
* Edit *
Made the code easier to run, and changed it so there is only one line for inputting what stocks you want to pull from Yahoo.

You need to convert the closing prices into the correct format before the calculation: both the index and the stock prices should be converted into percentage returns.

In order to match Yahoo Finance, you need to use three years of monthly Adjusted Close prices.
https://help.yahoo.com/kb/finance/SLN2347.html?impressions=true
Beta
The Beta used is Beta of Equity. Beta is the monthly price change of a
particular company relative to the monthly price change of the S&P500.
The time period for Beta is 3 years (36 months) when available.
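A rough sketch of a beta computed along those lines (36 monthly returns, stock against index) could look like this. It is not Yahoo's actual method, just one reading of the description above. The function name monthly_beta and the simulated prices are made up for illustration, and the two series are aligned on common dates before anything else is computed.

```python
import numpy as np
import pandas as pd

def monthly_beta(stock_prices, index_prices, months=36):
    """Beta from the last `months` monthly returns (hypothetical helper)."""
    # Align both series on the dates they have in common.
    prices = pd.concat(
        {"stock": stock_prices, "market": index_prices}, axis=1
    ).dropna()
    # Last available price in each calendar month.
    month_end = prices.groupby(prices.index.to_period("M")).last()
    # Monthly simple returns, restricted to the most recent window.
    returns = month_end.pct_change().dropna().tail(months)
    # Beta = cov(stock, market) / var(market), both with the same ddof.
    return returns["stock"].cov(returns["market"]) / returns["market"].var()

# Assumption: simulated daily prices instead of downloaded data.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2018-01-01", "2021-12-31")
index_ret = rng.normal(0.0003, 0.01, len(dates))
stock_ret = 1.5 * index_ret + rng.normal(0, 0.01, len(dates))
index_prices = pd.Series(100 * np.cumprod(1 + index_ret), index=dates)
stock_prices = pd.Series(50 * np.cumprod(1 + stock_ret), index=dates)

beta = monthly_beta(stock_prices, index_prices)
print(beta)
```

Using the pandas cov and var methods keeps the same degrees of freedom in the numerator and the denominator. Mixing numpy.cov (ddof=1 by default) with numpy.var (ddof=0 by default) introduces a small inconsistency.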
