Group by and multiply columns with a shift in Python

I have a DataFrame df with the price history of several companies and a dividend adjustment factor.
I want to calculate the adjusted close price (which takes the dividend adjustment factor into account) for each company.
I tried some variations of:
df['Adj Close'] = df.groupby('companyName')['priceUSD']*df['divAdjFactor'].shift(1)
[Screenshot: a snippet of the original (ungrouped) DataFrame, and a test view of a filtered frame where the calculation is applied as intended.]
In the second frame I multiplied 0.595318 by 36.48 (here the first two divAdjFactor values happen to be the same, which is not always the case). I want to do this calculation on the original DataFrame.
testlist = ['General Electric Company']
df_adj = df.query('companyName == @testlist')
df_adj['Adj Close'] = df_adj['priceUSD'] * df_adj['divAdjFactor'].shift(1)

You are close; you need DataFrameGroupBy.shift on divAdjFactor, so that the factor is shifted within each company:
df['Adj Close'] = df['priceUSD']*df.groupby('companyName')['divAdjFactor'].shift(1)
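To see why the grouping matters, here is a small sketch on made-up data (the company names A and B and all values are hypothetical). The grouped shift leaves the first row of each company empty, while a plain Series.shift carries the last factor of one company into the first row of the next:

```python
import pandas as pd

# Hypothetical toy data: two companies, three rows each (values are made up).
df = pd.DataFrame({
    "companyName": ["A", "A", "A", "B", "B", "B"],
    "priceUSD": [10.0, 11.0, 12.0, 20.0, 21.0, 22.0],
    "divAdjFactor": [0.5, 0.6, 0.7, 0.9, 0.8, 0.7],
})

# Shift within each company: the first row of every group becomes NaN
# instead of borrowing the previous company's last factor.
df["Adj Close"] = df["priceUSD"] * df.groupby("companyName")["divAdjFactor"].shift(1)

# For comparison, a plain shift ignores the company boundary,
# so row 3 (company B) would use company A's last factor.
naive = df["priceUSD"] * df["divAdjFactor"].shift(1)

print(df)
```

The shift is positional, so this assumes the rows are already in chronological order within each company; if they are not, sort first, e.g. df.sort_values(['companyName', 'date']), where 'date' stands for whatever (hypothetical) date column the frame has.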

Related

Calculating the annualized average returns with resample('Y') and without

I'm trying to calculate the annualized return of Amazon stock and can't figure out the main difference between the following two approaches:
from datetime import datetime
import pandas_datareader as pdr
df = pdr.get_data_yahoo('amzn', datetime(2015, 1, 1), datetime(2019, 12, 31))['Adj Close']
1) df.pct_change().mean() * 252
Result = 0.400
2) df.resample('Y').last().pct_change().mean()
Result = 0.472
Why is there a difference of about 7 percentage points?
After reading the documentation for these functions, I'd like to go through an example of resampling time-series data step by step.
With the resample method, the price column of the DataFrame is grouped by a time span; here 'Y' means resampling by year, and last() returns the price at the end of each year.
data.resample('Y').last()
[Output of step 1: the year-end prices]
Next, pct_change() calculates the percentage change between each row and the previous row, i.e. between the consecutive year-end prices obtained in step 1.
data.resample('Y').last().pct_change()
[Output of step 2: the year-over-year returns]
Finally, mean() calculates the average of these annual percentage changes over the entire period.
data.resample('Y').last().pct_change().mean()
[Output of step 3: the mean annual return]
As @itprorh66 already wrote, the main difference between the two approaches is when the averaging happens. Approach 1 averages daily returns and scales the result linearly by 252, so compounding within each year is ignored; approach 2 averages year-over-year returns, each of which already includes that compounding. In addition, approach 2 has no return for the first year (pct_change() yields NaN for the 2015 year-end price, which has no predecessor), so 2015 does not enter its average, whereas approach 1 includes the daily returns from 2015.
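The effect of when the averaging happens can be reproduced on synthetic data. A minimal sketch, assuming a hypothetical price series that rises by exactly 0.1% every business day: approach 1 reduces to 252 × 0.001, while approach 2 compounds roughly 260 daily gains within each calendar year before averaging. Grouping by the calendar year of the index stands in for resample('Y').last() here:

```python
import numpy as np
import pandas as pd

# Synthetic price series (an assumption for illustration): every business day
# the price rises by exactly 0.1%, over three calendar years.
dates = pd.bdate_range("2016-01-01", "2018-12-31")
daily_r = 0.001
prices = pd.Series(100 * (1 + daily_r) ** np.arange(len(dates)), index=dates)

# Approach 1: average the daily returns, then scale linearly by 252 trading days.
approach1 = prices.pct_change().mean() * 252

# Approach 2: take each year's last price, compute year-over-year returns,
# then average them. The first year has no predecessor, so its return is
# NaN and mean() skips it.
year_end = prices.groupby(prices.index.year).last()
approach2 = year_end.pct_change().mean()

print(f"approach 1: {approach1:.4f}")  # 252 * 0.001
print(f"approach 2: {approach2:.4f}")  # compounded within each year, so larger here
```

With real prices the gap also depends on volatility and on which years each approach covers, so the direction of the difference is not guaranteed in general.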

Renaming dataframes in python loop

I have a database with monthly time-series data on around 15 different indicators. The data is all in the same format: year-to-date values and year-to-date growth. January data is missing; each indicator starts with the year-to-date total as of February.
For each indicator I want to turn the year-to-date data into monthly values. The code below does that.
But I want to run this as a loop over all 15 indicators and then automatically rename each resulting DataFrame to include a reference to the category it belongs to. For example, one category is sales in value terms, so when I apply the code to that category I want df_m to be renamed sales_m and df_yoy to be renamed sales_yoy.
I thought I could do this by defining a list of the 15 indicators up front and then somehow assigning names from that list to the DataFrames produced by the loop, but I can't make that work.
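category = ['sales', 'construction']
# columns holding monthly values
df_m = df.loc[:, df.columns.str.contains('Monthly')]
# the remaining columns hold year-to-date totals
df_ytd = df.drop(df.filter(regex='Monthly').columns, axis=1)
# January is missing: back-fill it with the February YTD total...
df_ytd = df_ytd.bfill(limit=1)
# ...and split that total evenly between January and February
df_ytd.loc[df_ytd.index.month.isin([1,2]), :] = df_ytd / 2
# strip the suffixes so both sets of columns line up by name
df_ytd.columns = df_ytd.columns.str.replace(', YTD', '')
df_m.columns = df_m.columns.str.replace('YTD, ', '').str.replace(', Monthly', '')
# fill gaps in the monthly values from the YTD-derived estimates
df_m = df_m.fillna(df_ytd)
# year-over-year growth in percent
df_yoy = df_m.pct_change(periods=12) * 100
sales_m = df_m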
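Rather than generating variable names such as sales_m at runtime, a common pattern is to wrap the steps in a function and store each result in a dict keyed by the desired name. A minimal sketch: the helpers ytd_to_monthly and make_category, and the synthetic data with columns named like "sales, YTD" and "YTD, sales, Monthly", are hypothetical stand-ins for the real database layout:

```python
import numpy as np
import pandas as pd

def ytd_to_monthly(df):
    """The question's YTD -> monthly steps, applied to one category's DataFrame.
    Returns (monthly values, year-over-year growth in percent)."""
    df_m = df.loc[:, df.columns.str.contains('Monthly')].copy()
    df_ytd = df.drop(df.filter(regex='Monthly').columns, axis=1)
    df_ytd = df_ytd.bfill(limit=1)
    df_ytd.loc[df_ytd.index.month.isin([1, 2]), :] = df_ytd / 2
    df_ytd.columns = df_ytd.columns.str.replace(', YTD', '')
    df_m.columns = df_m.columns.str.replace('YTD, ', '').str.replace(', Monthly', '')
    df_m = df_m.fillna(df_ytd)
    df_yoy = df_m.pct_change(periods=12) * 100
    return df_m, df_yoy

# Hypothetical data in the assumed layout: 24 months, January missing.
idx = pd.date_range("2019-01-01", periods=24, freq="MS")

def make_category(name, seed):
    rng = np.random.default_rng(seed)
    monthly = pd.Series(rng.uniform(90, 110, len(idx)), index=idx)
    ytd = monthly.groupby(idx.year).cumsum()
    frame = pd.DataFrame({f"{name}, YTD": ytd, f"YTD, {name}, Monthly": monthly})
    frame.loc[idx.month == 1, :] = np.nan  # January missing everywhere
    frame.loc[idx.month == 2, f"YTD, {name}, Monthly"] = np.nan  # only the YTD total for February
    return frame

category = ['sales', 'construction']
raw = {cat: make_category(cat, seed) for seed, cat in enumerate(category)}

# Store each output under a name built from its category instead of
# creating separate variables.
results = {}
for cat in category:
    df_m, df_yoy = ytd_to_monthly(raw[cat])
    results[f"{cat}_m"] = df_m
    results[f"{cat}_yoy"] = df_yoy

print(results["sales_m"].head())
```

Each result is then available as results['sales_m'], results['construction_yoy'], and so on; if a plain variable is still wanted for one category, sales_m = results['sales_m'] does it explicitly.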

DataFrame VWAP Does not match TradingView

I'm not sure why my DataFrame VWAP calculation does not match the TradingView version described at this link: https://www.tradingview.com/support/solutions/43000502018-volume-weighted-average-price-vwap/
They provide a simple calculation method, which I can reproduce in a DataFrame, but my results do not match. I believe it has something to do with the TradingView “Anchor Session” setting, but I'm not sure how to adjust or extend my DataFrame calculation to match TradingView. I have also tried the Python Technical Analysis Library, which does not match TradingView either.
Simple Calculation
There are five steps in calculating VWAP:
1. Calculate the Typical Price for the period: (High + Low + Close) / 3
2. Multiply the Typical Price by the period Volume: Typical Price x Volume
3. Create a Cumulative Total of Typical Price x Volume: Cumulative(Typical Price x Volume)
4. Create a Cumulative Total of Volume: Cumulative(Volume)
5. Divide the Cumulative Totals: VWAP = Cumulative(Typical Price x Volume) / Cumulative(Volume)
Anchor Period
Indicator calculation period. This setting specifies the Anchor, i.e. how frequently the VWAP calculation will be reset. For VWAP to work properly, each VWAP period should include several bars inside of it, so e.g. setting Anchor to 'Session' and timeframe to '1D' is not useful because the indicator will be reset on every bar.
Possible values: Session, Week, Month, Quarter, Year, Decade, Century, Earnings (reset on earnings), Dividends (reset on dividends), Splits (reset on splits).
# My VWAP - DataFrame Code:
df['Typ_Price'] = (df['high'] + df['low'] + df['close'] ) /3
df['Typ_PriceVol'] = df['Typ_Price'] * df['volume']
df['Cum_Vol_Price'] = df['Typ_PriceVol'].cumsum()
df['Cum_Vol'] = df['volume'].cumsum()
df['VWAP'] = df['Cum_Vol_Price'] / df['Cum_Vol']
print(df)
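A likely source of the mismatch is the anchor: the DataFrame code above accumulates over the whole frame, while with the Anchor set to Session the cumulative totals restart at the beginning of each session. A minimal sketch of a session-anchored VWAP, assuming df has a DatetimeIndex of intraday bars and treating each calendar day as one session (an assumption; TradingView's session boundaries depend on the exchange's trading hours and time zone, so results may still differ). The helper name session_vwap and the sample bars are hypothetical:

```python
import pandas as pd

def session_vwap(df):
    """VWAP that restarts at the first bar of each calendar day.
    Assumes a DatetimeIndex and 'high', 'low', 'close', 'volume' columns."""
    typ_price = (df['high'] + df['low'] + df['close']) / 3
    typ_price_vol = typ_price * df['volume']
    session = df.index.normalize()  # midnight timestamp of each bar's day = session key
    cum_vol_price = typ_price_vol.groupby(session).cumsum()
    cum_vol = df['volume'].groupby(session).cumsum()
    return cum_vol_price / cum_vol

# Hypothetical bars: two days with two bars each.
idx = pd.to_datetime(["2024-01-02 09:30", "2024-01-02 09:31",
                      "2024-01-03 09:30", "2024-01-03 09:31"])
bars = pd.DataFrame({"high": [10.0, 20.0, 30.0, 40.0],
                     "low": [10.0, 20.0, 30.0, 40.0],
                     "close": [10.0, 20.0, 30.0, 40.0],
                     "volume": [1.0, 3.0, 2.0, 2.0]}, index=idx)
vwap = session_vwap(bars)
print(vwap)  # the third bar starts a new session, so its VWAP equals its own typical price
```

For a Week or Month anchor, the grouping key could be swapped for df.index.to_period('W') or df.index.to_period('M'), again assuming calendar boundaries match the ones TradingView uses.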

I need to calculate annual daily returns for 252 days of 30 stocks in a dataframe and print them separately

I have the Adj Close prices of 30 stocks in a DataFrame. I need to calculate annualized daily returns and annual volatility for each of these 30 stocks, i.e. for all 30 columns.
For a single column, the operation is performed as follows:
variance = data['Axis Bank'].var()
daily_volatility = np.sqrt(variance)
annual_volatility = np.sqrt(252)*daily_volatility
Is there a way to perform the above operation in a loop for all the columns in the DataFrame?
I tried this loop, but it's not working; I can't store these values in a variable:
for columns in data.columns.values.tolist():
    variance = data[columns].var()
    daily_volatility = np.sqrt(variance)
    annual_volatility = np.sqrt(252)*daily_volatility
    print(annual_volatility)
I think a loop is not necessary here: compute the standard deviation of all columns at once and then multiply by a scalar (assuming every column in the DataFrame needs to be processed):
df = adj_close.std()
print("Annual HDFC STD Daily Returns:", df*np.sqrt(252))
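Volatility is normally measured on daily returns rather than on price levels, so a minimal sketch computes returns with pct_change() first and then annualizes every column at once. The three price columns below are hypothetical placeholders for the 30 stocks; the dict at the end shows how a loop can keep one value per stock instead of overwriting a single variable:

```python
import numpy as np
import pandas as pd

# Hypothetical prices for three stocks (column names and values are placeholders).
prices = pd.DataFrame({
    "Axis Bank": [100.0, 101.0, 99.5, 102.0, 103.5],
    "HDFC": [50.0, 50.5, 50.2, 51.0, 50.8],
    "Flat": [10.0, 10.0, 10.0, 10.0, 10.0],
})

# Daily simple returns; the first row has no previous price and is dropped.
daily_returns = prices.pct_change().dropna()

# Vectorized: one value per column, no explicit loop needed.
annual_volatility = daily_returns.std() * np.sqrt(252)
annual_mean_return = daily_returns.mean() * 252

summary = pd.DataFrame({"annual_return": annual_mean_return,
                        "annual_volatility": annual_volatility})
print(summary)

# If a loop is still preferred, collect the results in a dict keyed by column.
vol_by_stock = {}
for col in daily_returns.columns:
    vol_by_stock[col] = daily_returns[col].std() * np.sqrt(252)
```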

Offset Index Charting

I am trying to plot two stock prices on an indexed chart. This kind of chart is common because it starts both stocks, despite their different prices, at the same level.
Below is the code that produces a chart of IBM vs. TSLA:
import numpy as np
from pandas_datareader import data as wb  # assumed import; not shown in the original

def get_historical_closes(ticker, start_date, end_date):
    # get the data for the tickers. This will be a Panel (old pandas versions only)
    p = wb.DataReader(ticker, "yahoo", start_date, end_date)
    # convert the Panel to a DataFrame, select only Adj Close,
    # and turn all index levels into columns
    d = p.to_frame()['Adj Close'].reset_index()
    # rename the columns
    d.rename(columns={'minor': 'Ticker', 'Adj Close': 'Close'}, inplace=True)
    # pivot each ticker to a column
    pivoted = d.pivot(index='Date', columns='Ticker')
    # and drop the one level on the columns
    pivoted.columns = pivoted.columns.droplevel(0)
    return pivoted

tickers = ['IBM', 'TSLA']
start = '2015-12-31'
end = '2016-12-22'
# daily returns; the first row has no previous price, so set it to 0
df_ret = get_historical_closes(tickers, start, end).pct_change().fillna(0)
# cumulative growth of 1 unit, so both series start at 1.00
df_ret = (1 + df_ret).cumprod()
df_ret.plot()
As you can see, both start at 1.00.
What I would like to do is to have the lines converge at 1.00 at an arbitrary point in the date index. For example, I would like to see the same chart, except that the lines converge at 1 on July 31, 2016; that is, I want to offset the point at which the indexed series converge.
Does anyone have any idea how to accomplish this?
I was making this more difficult than it needed to be. See below:
df_day = df_ret[df_ret.index == '2016-03-31']
df_plot = pd.DataFrame(index=df_ret.index, columns=df_ret.columns)
for col in df_ret.columns:  # for each column
    df_plot[col] = df_ret[col] / df_day[col].values
df_plot.plot()
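The same rebasing can be done without a loop, because DataFrame.div with a Series divides each column by the matching value of that Series. A minimal sketch on synthetic data (the random returns and the helper name rebase are hypothetical). It uses DataFrame.asof so that an anchor date missing from the index, such as a weekend (July 31, 2016 fell on a Sunday), falls back to the last available trading day:

```python
import numpy as np
import pandas as pd

# Hypothetical cumulative-growth series for two tickers on business days.
idx = pd.bdate_range("2016-01-04", "2016-12-30")
rng = np.random.default_rng(0)
daily = pd.DataFrame(rng.normal(0, 0.01, (len(idx), 2)), index=idx, columns=["IBM", "TSLA"])
growth = (1 + daily).cumprod()

def rebase(frame, anchor):
    """Divide every column by its value at `anchor` so all lines equal 1.0 there.
    If `anchor` is not in the index, the last row at or before it is used."""
    base = frame.asof(pd.Timestamp(anchor))  # one value per column
    return frame.div(base)

rebased = rebase(growth, "2016-07-31")  # a Sunday: falls back to Friday 2016-07-29
# rebased.plot()  # plotting requires matplotlib
```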
