How to calculate stock pullback - python

I am trying to calculate a stock's pullback, i.e. the percentage change off of its high. Not necessarily the change from the high to today, but the percentage change from the high to the lowest point after that high.
Where I am drawing a blank is that I don't know where to begin finding the lowest point in the stock after its high. I can find the high for each stock, but how do I trim each column so that it only has the prices after that high?
import numpy as np
import pandas as pd
import datetime as dt
import pandas_datareader.data as web  # pandas.io.data was removed from pandas; use pandas_datareader
stocks = ['AAPL', 'NFLX', 'MSFT', 'MCD', 'DIS']
start = dt.datetime(2015, 1, 1)
end = dt.datetime.today()
df = web.DataReader(stocks, 'yahoo', start, end)
df = df['Close']
dfMax = df.max()
From here, I have 5 columns, one for each stock, holding the prices on each day. I am stumped...

First, you need to use the Adj Close price so that you can accurately measure daily returns (i.e. so your results aren't impacted by splits and dividends).
To calculate the forward min (i.e. the lowest point AFTER the most recent high), take a cummin over the prices in reverse order, then reverse the result back: df[::-1].cummin()[::-1].
The pullback from the cumulative max price is one minus the ratio of this forward min price to the cumulative max price: 1 - df[::-1].cummin()[::-1] / df.cummax()
df = web.DataReader(stocks, 'yahoo', start, end)['Adj Close']
df_pullback = 1 - df[::-1].cummin()[::-1] / df.cummax()
df_pullback.plot()
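As a quick sanity check of the reverse-cummin trick, here is a minimal, self-contained example on a toy price series (the numbers are made up for illustration):

import pandas as pd

# toy series: rises to a high of 100, falls to 80, recovers to 90
prices = pd.Series([90.0, 100.0, 95.0, 80.0, 90.0])

# forward min: lowest price at or after each date
fwd_min = prices[::-1].cummin()[::-1]
pullback = 1 - fwd_min / prices.cummax()

print(pullback.round(4).tolist())
# [0.1111, 0.2, 0.2, 0.2, 0.1] -- a 20% pullback from the 100 high to the 80 low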

Related

calculate monthly customer churn with the 1st of each month

I am working with a subscription-based data set, of which this is an exemplar:
import pandas as pd
import numpy as np
from datetime import timedelta
start_date = pd.date_range(start = "2015-01-09", end = "2022-09-11", freq = "6D")
cancel_date = [sd + timedelta(days = np.random.exponential(scale = 100)) for sd in start_date]
churned = [bool(np.random.randint(0, 2)) for _ in range(len(start_date))]  # original used random.randint without importing random
df = pd.DataFrame(
    {"start_date": start_date,
     "cancel_date": cancel_date,
     "churned": churned}
)
df["cancel_date"] = df["cancel_date"].dt.date
df["cancel_date"] = df["cancel_date"].astype("datetime64[ns]")
I need a way to calculate monthly customer churn in python using the following steps:
Firstly, I need to obtain the number of subscriptions that started before the 1st of each month that are still active
Secondly, I need to obtain the number of subscriptions that started before the 1st of each month and which were cancelled after the 1st of each month
These two steps constitute the denominator of the monthly calculation
Finally, I need to obtain the number of subscriptions that cancelled in each month
This step produces the numerator of the monthly calculation.
The numerator is divided by the denominator and multiplied by 100 to obtain the percentage of customers that churn each month.
I am really lost with this problem; can someone please point me in the right direction? I have been working on it for so long.
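Not an authoritative answer, but here is a minimal sketch of the steps described above, using the df built earlier; treating a subscription as "active on the 1st" when it started before the 1st and has not churned before it is an interpretive assumption:

import pandas as pd

# the 1st of each month in the observed range
months = pd.date_range(df["start_date"].min().replace(day=1),
                       df["cancel_date"].max(), freq="MS")

rows = []
for first in months:
    month_end = first + pd.offsets.MonthEnd(0)
    started_before = df["start_date"] < first
    # denominator: started before the 1st, and either never churned
    # or cancelled on/after the 1st
    at_risk = (started_before & (~df["churned"] | (df["cancel_date"] >= first))).sum()
    # numerator: of those, the ones that churned during this month
    n_churned = (started_before & df["churned"]
                 & df["cancel_date"].between(first, month_end)).sum()
    rows.append({"month": first,
                 "churn_pct": 100 * n_churned / at_risk if at_risk else None})

monthly_churn = pd.DataFrame(rows)
print(monthly_churn.head())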

How to construct the daily returns of an index

Using the snp500 series, which contains the closing prices of the S&P 500 index for the years 2010-2019, I should construct the daily returns of this index (a return can be defined as the percentage increase in price: $r_1=(P_1-P_0)/P_0$) and convert them to yearly returns, building on the function x = lambda p,r,n,t: "%"+str(round(p*(1+(r/n))**(n*t),2)/100). I should pay attention to the units of measurement and assume that there are 252 days in a year. Maybe I can use the method .shift() for this assignment.
Firstly, I defined the function $r_1=(P_1-P_0)/P_0$
def percentage_increase_in_price():
    r_1 = (P_1 - P_0) / P_0
Secondly, I wrote the code for fetching the snp500 index data from 2010 to 2019:
import pandas as pd
import pandas_datareader.data as web
import datetime as dt
start = dt.datetime(2010, 1, 1)
end = dt.datetime(2019, 12, 31)
snp500 = web.DataReader('SP500', 'fred', start, end)
snp500
Then, I have no idea what my next step is.
Could you advise me on how to complete this task?
How about this?
import pandas as pd
import pandas_datareader.data as web
snp500 = web.DataReader('SP500', 'fred', '2010-01-01', '2019-12-31')
# calculate simple returns
snp500["daily_ret"] = snp500["SP500"].pct_change()
snp500.dropna(inplace=True)
# scale daily returns to annual returns and apply rounding
def annualize(r, n, p, t=1):
    return round(p * (1 + r / n) ** (n * t), 2) / 100
snp500["inv"] = snp500["daily_ret"].apply(annualize, p=100, n=252)
Output:
SP500 daily_ret inv
DATE
2012-03-27 1412.52 -0.002817 0.9972
2012-03-28 1405.54 -0.004942 0.9951
2012-03-29 1403.28 -0.001608 0.9984
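Since the assignment hints at .shift(), it may also help to see that pct_change() is exactly the shift-based form of $r_1=(P_1-P_0)/P_0$; a tiny self-contained check:

import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 103.0])
# r_t = (P_t - P_{t-1}) / P_{t-1}, written out with shift()
returns = (prices - prices.shift(1)) / prices.shift(1)
print(returns.round(6).equals(prices.pct_change().round(6)))  # True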

Difference between a multi-year timeseries and its 'standard year'

Assume I have a timeseries of a certain number of years, as in:
import numpy as np
import pandas as pd
rng = pd.date_range(start='2001-01-01', periods=5113)
ts = pd.Series(np.random.randn(len(rng)), rng)  # pd.TimeSeries was removed; pd.Series works here
Then I can calculate its standard year (the average value of each day over all years) by doing:
std = ts.groupby([ts.index.month, ts.index.day]).mean()
Now I was wondering how I could subtract this standard year from my multi-year timeseries, in order to get a timeseries that shows which days were below or above the standard.
You can do this using the same groupby; just subtract each group's mean from the values in that group:
average_diff = ts.groupby([ts.index.month, ts.index.day]).apply(
    lambda g: g - g.mean()
)
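For what it's worth, the same result can be had with transform, which broadcasts each group's mean back to the original index and is usually faster on large series:

# equivalent: subtract each (month, day) group's mean in one vectorized step
average_diff = ts - ts.groupby([ts.index.month, ts.index.day]).transform("mean")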

why is my beta different from yahoo finance?

I have some code which calculates the beta of any stock against the S&P 500 - in this case the ticker symbol "FET". However, the result seems to be completely different from what I am seeing on Yahoo Finance; historically this stock has been very volatile, which would explain the beta value of 1.55 on Yahoo Finance - http://finance.yahoo.com/q?s=fet. Can someone please advise as to why I am seeing a completely different number (0.0088)? Thanks in advance.
from pandas_datareader.data import DataReader  # pandas.io.data is deprecated; use pandas_datareader
from datetime import datetime
from datetime import date
import numpy
import sys
today = date.today()
stock_one = DataReader('FET','yahoo',datetime(2009,1,1), today)
stock_two = DataReader('^GSPC','yahoo',stock_one['Adj Close'].keys()[0], today)
a = stock_one['Adj Close'].pct_change()
b = stock_two['Adj Close'].pct_change()
covariance = numpy.cov(a[1:],b[1:])[0][1]
variance = numpy.var(b[1:])
beta = covariance / variance
print('beta value ' + str(beta))
Ok, so I played with the code a bit and this is what I have.
from pandas_datareader.data import DataReader  # pandas.io.data is deprecated; use pandas_datareader
import pandas_datareader.data as web
from datetime import datetime
from datetime import date
import numpy
import sys
start = datetime(2009, 1, 1)
today = date.today()
stock1 = 'AAPL'
stock2 = '^GSPC'
stocks = web.DataReader([stock1, stock2],'yahoo', start, today)
# stock_two = DataReader('^GSPC','yahoo', start, today)
a = stocks['Adj Close'].pct_change()
covariance = a.cov() # Cov Matrix
variance = a.var() # Of stock2
var = variance[stock2]
cov = covariance.loc[stock2, stock1]
beta = cov / var
print "The Beta for %s is: " % (stock2), str(beta)
The lengths of the two price series were not equal, so that was problem #1. Also, your final line computed a beta for every entry of the covariance matrix, which is probably not what you wanted: you don't need betas based on cov(0,0) and cov(1,1), just cov(0,1) or cov(1,0). Those are positions in the matrix, not values.
Anyway, here is the answer I got:
The Beta for ^GSPC is: 0.885852632799
* Edit *
Made the code easier to run, and changed it so there is only one line for inputting what stocks you want to pull from Yahoo.
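One more subtlety in the original snippet is worth flagging: numpy.cov uses ddof=1 by default while numpy.var uses ddof=0, so even on perfectly aligned data the ratio comes out scaled by n/(n-1). A small self-contained illustration:

import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)

mismatched = np.cov(a, b)[0][1] / np.var(b)          # cov ddof=1, var ddof=0
consistent = np.cov(a, b)[0][1] / np.var(b, ddof=1)  # same ddof on both sides
print(mismatched / consistent)  # 500/499, the n/(n-1) scaling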
You need to convert the closing prices into the correct format for the calculation: percentage returns, for both the index and the stock.
In order to match Yahoo Finance, you need to use three years of monthly Adjusted Close prices.
https://help.yahoo.com/kb/finance/SLN2347.html?impressions=true
Beta
The Beta used is Beta of Equity. Beta is the monthly price change of a
particular company relative to the monthly price change of the S&P500.
The time period for Beta is 3 years (36 months) when available.
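A minimal sketch of that monthly calculation, with synthetic stand-in prices (in practice the 36 month-end Adjusted Close values would come from a data source; the tickers and numbers here are purely illustrative):

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2013-01-31", periods=37, freq="M")  # 37 prices -> 36 monthly returns
monthly = pd.DataFrame({
    "^GSPC": 1500 * np.exp(np.cumsum(rng.normal(0.005, 0.03, 37))),
    "FET":   30 * np.exp(np.cumsum(rng.normal(0.005, 0.09, 37))),
}, index=idx)

rets = monthly.pct_change().dropna()  # 36 monthly returns
beta = rets["FET"].cov(rets["^GSPC"]) / rets["^GSPC"].var()
print("monthly beta:", beta)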

How can I make pandas treat the start of the next business day as the next time after the previous business day?

I have financial trade data (timestamped with the trade time, so there are duplicate times and the datetimes are irregularly spaced). Basically I have just a datetime column and a price column in a pandas dataframe, and I've calculated returns, but I want to linearly interpolate the data so that I can get an estimate of prices every second, minute, day, etc...
It seems the best way to do this is to treat the beginning of a Tuesday as occurring just after the end of Monday, essentially modding out by the time between days. Does pandas provide an easy way to do this? I've searched the documentation and found BDay, but that doesn't seem to do what I want.
Edit: Here's a sample of my code:
df = pd.read_csv(filePath, usecols=[0,4]) #column 0 is date_time and column 4 is price
df.date_time = pd.to_datetime(df.date_time, format = '%m-%d-%Y %H:%M:%S.%f')
def get_returns(df):
    # note: shift(1)/Price gives the negative of the usual log return
    return np.log(df.Price.shift(1) / df.Price)
But my issue is that this is trade data, so that I have every trade that occurs for a given stock over some time period, trading happens only during a trading day (9:30 am - 4 pm), and the data is timestamped. I can take the price that every trade happens at and make a price series, but when I calculate kurtosis and other stylized facts, I'm getting very strange results because these sorts of statistics are usually run on evenly spaced time series data.
What I started to do was write code to interpolate my data linearly so that I could get the price every 10 seconds, minute, 10 minutes, hour, day, etc. However, with business days, weekends, holidays, and all the time where trading can't happen, I want to make python think that the only time which exists is during a business day, so that my real world times still match up with the correct date times, but not such that I need a price stamp for all the times when trading is closed.
def lin_int_tseries(series, timeChange):
    tDelta = datetime.timedelta(seconds=timeChange)
    data_times = series['date_time']
    new_series = []
    sample_times = []
    sample_times.append(data_times[0])
    while max(sample_times) < max(data_times):
        sample_times.append(sample_times[-1] + tDelta)
    for position, time in enumerate(sample_times):
        try:
            ind = data_times.index(time)
            new_series.append(series[ind])
        except:
            t_next = getnextTime(time, data_times)  #get next largest timestamp in data
            t_prev = getprevTime(time, data_times)  #get next smallest timestamp in data
            ind_next = data_times.index(t_next)  #index of next largest timestamp
            ind_prev = data_times.index(t_prev)  #index of next smallest timestamp
            p_next = series[ind_next][1]  #price at next timestamp
            p_prev = series[ind_prev][1]  #price at prev timestamp
            omega = (float(time) - t_prev) / (t_next - t_prev)  #linear interpolation weight
            p_interp = (1 - omega) * p_prev + omega * p_next
            new_series.append([time, p_interp])
    return new_series
Sorry if it's still unclear. I just want to find some way to stitch the end of one trading day to the beginning of the next trading day, while not losing the actual datetime information.
You should use pandas resample (in modern pandas, followed by an aggregation):
df = df.resample("D").mean()
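Extending that idea to the per-second estimates described in the question, here is one hedged sketch, assuming df has the date_time and Price columns from above; restricting to trading hours drops the overnight and weekend stretches, though it does not literally stitch one day's close to the next day's open:

df = df.set_index("date_time")  # resample needs a DatetimeIndex
# 1-second bars: average trades within each second, then fill gaps linearly
per_second = df["Price"].resample("1s").mean().interpolate(method="linear")
# keep only the regular trading session
per_second = per_second.between_time("09:30", "16:00")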
