Offset Index Charting - python

I am trying to plot two stock prices on an index plot. This kind of plot is very common: it rebases stocks that trade at different prices so that both series start at the same value. Below is the code that produces such a chart for IBM vs. TSLA:
import numpy as np
import pandas as pd
import pandas_datareader.data as wb

def get_historical_closes(ticker, start_date, end_date):
    # get the data for the tickers; this returns a Panel
    p = wb.DataReader(ticker, "yahoo", start_date, end_date)
    # convert the Panel to a DataFrame, selecting only Adj Close
    # while making all index levels columns
    d = p.to_frame()['Adj Close'].reset_index()
    # rename the columns
    d.rename(columns={'minor': 'Ticker', 'Adj Close': 'Close'}, inplace=True)
    # pivot each ticker to a column
    pivoted = d.pivot(index='Date', columns='Ticker')
    # and drop the extra level on the columns
    pivoted.columns = pivoted.columns.droplevel(0)
    return pivoted

tickers = ['IBM', 'TSLA']
start = '2015-12-31'
end = '2016-12-22'

# fillna(0) handles the NaN in the first pct_change row
# (replace('NaN', 0) would only match the string 'NaN', not actual NaN values)
df_ret = get_historical_closes(tickers, start, end).pct_change().fillna(0)
df_ret = np.cumprod(1 + df_ret)
df_ret.plot()
As you can see, both start at 1.00.
What I would like to do is have the lines converge at 1.00 at some arbitrary point in the date index. For example, I would like to see the same chart, except that the lines converge at 1 on July 31, 2016; in other words, the index is rebased to a given date.
Does anyone have any idea how to accomplish this?

I was trying to make it more difficult than it actually needed to be. Rebase the cumulative return series by dividing every row by the row at the chosen date:
df_day = df_ret[df_ret.index == '2016-03-31']
df_plot = pd.DataFrame(index=df_ret.index, columns=df_ret.columns)
for col in df_ret.columns:  # rebase each ticker
    df_plot[col] = df_ret[col] / df_day[col].values
df_plot.plot()
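The loop can also be replaced by a single division; a minimal vectorized sketch, assuming the chosen date exists in the index:

df_plot = df_ret.div(df_ret.loc['2016-03-31'])
df_plot.plot()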

Related

Group by and multiply columns with a shift in Python

I have a dataframe df with different companies' price histories and a dividend adjustment factor.
I want to calculate the adjusted close price (which considers the dividend adjustment factor) for each company.
I tried some variations of
df['Adj Close'] = df.groupby('companyName')['priceUSD']*df['divAdjFactor'].shift(1)
Pictured are a snippet of my original dataframe (non-grouped) and a test view of a filtered frame where I apply the calculation as I want. In the second frame I multiplied 0.595318 by 36.48 (it just so happens here that the first two divAdjFactor values are the same; that is not always the case). I want to do this calculation on the original dataframe.
testlist = ['General Electric Company']
df_adj = df.query('companyName == @testlist')
df_adj['Adj Close'] = df_adj['priceUSD'] * df_adj['divAdjFactor'].shift(1)
You are close; you need DataFrameGroupBy.shift on divAdjFactor, applied per company:
df['Adj Close'] = df['priceUSD']*df.groupby('companyName')['divAdjFactor'].shift(1)
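A minimal sketch with made-up data to illustrate the per-group shift (the company names and values here are hypothetical):

import pandas as pd

df = pd.DataFrame({
    'companyName': ['GE', 'GE', 'GE', 'IBM', 'IBM'],
    'priceUSD': [36.48, 37.00, 36.10, 120.00, 121.50],
    'divAdjFactor': [0.595318, 0.595318, 0.600000, 0.980000, 0.985000],
})
# the shift happens within each company, so each group's first row gets NaN
df['Adj Close'] = df['priceUSD'] * df.groupby('companyName')['divAdjFactor'].shift(1)
print(df)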

Accessing last value in a time series dataframe with pandas and plotly

How would I grab the very last value of a time series?
I have a df with time series info for many countries that tracks several variables and does some simple averaging, etc.
I just want to grab the most recent value(s) for each country and graph them with plotly. I have tried using .last(), but I'm not really sure where to fit it into the loop.
I need to grab both the last value for one chart, and the last n values for another chart.
import plotly.graph_objs as go
import plotly.offline as pyo

# Daily Change
country = "X"
# Plot rolling average new cases
data = [go.Scatter(x=df_join.loc[f'{country}']['Date'],
                   y=df_join.loc[f'{country}']['Pct Change'],
                   mode='lines',
                   name='Pct Change')]
layout = go.Layout(title=f'{country}: Pct Change')
fig = go.Figure(data=data, layout=layout)
pyo.plot(fig)
IIUC you need to filter your dataframe beforehand:
dates = pd.date_range(pd.Timestamp('today'), pd.Timestamp('today') + pd.DateOffset(days=5))
df = pd.DataFrame({'Date': dates, 'ID': ['A', 'A', 'A', 'B', 'B', 'B']})
df2 = df.loc[df.groupby(['ID'])['Date'].idxmax()]
print(df2)

                        Date ID
2 2020-05-16 12:26:06.772939  A
5 2020-05-19 12:26:06.772939  B
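For the second chart, which needs the last n rows per country, a sketch using sort_values plus groupby().tail() (n=3 is an arbitrary choice):

n = 3
df_last_n = df.sort_values('Date').groupby('ID').tail(n)
print(df_last_n)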

Pandas Rolling: Return Min and Max Dates, Sum of Exposure

I am trying to implement a rolling window and am struggling with the very last part. As you can see below, the code returns the sum of exposure, attached to the last date in the rolling window. I also want a column that holds the first date in the window (the rows are ordered by date, and I am ultimately after the min and max dates for each window as well as the sum of exposure). Trying to take the date out of the index and using the min and max functions on it produces this error:
NotImplementedError: ops for Rolling for this dtype datetime64[ns] are not implemented
dates = pd.date_range('1/1/2018', periods=24, freq='M')
df = pd.DataFrame(
    {
        'ID1': ['A'] * 24,
        'ID2': ['B'] * 24,
        'Date': dates,
        'EXPOSURE': [1] * 24,
    }
)
df.set_index(['ID1', 'ID2', 'Date'], inplace=True)
result = df.groupby(['ID1', 'ID2']).rolling(12, min_periods=12).sum()
result.head(100)
Operations on rolling windows are limited to aggregation functions
and must be performed on numbers, not on dates.
To circumvent this limitation, notice that the Date at the beginning of
a rolling window of size 12 is simply the Date from the 11th row before.
So you can:
generate result just as you do,
compute result2 = df.groupby(['ID1','ID2']).Date.shift(11) on a frame where Date is still a column,
drop the 2 top levels from the MultiIndex of result, and merge both results.
A sketch of these steps is below.
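A minimal sketch of these three steps, reusing the sample frame from the question (the START_DATE column name is my own choice):

import pandas as pd

dates = pd.date_range('1/1/2018', periods=24, freq='M')
df = pd.DataFrame({'ID1': ['A'] * 24, 'ID2': ['B'] * 24,
                   'Date': dates, 'EXPOSURE': [1] * 24})

# step 1: rolling sum per group, as in the question
rolled = (df.set_index(['ID1', 'ID2', 'Date'])
            .groupby(['ID1', 'ID2'])
            .rolling(12, min_periods=12).sum())
# groupby().rolling() prepends the group keys, so drop the duplicated levels
rolled.index = rolled.index.droplevel([0, 1])

# step 2: the window start date is the Date 11 rows earlier within each group
start = (df.assign(START_DATE=df.groupby(['ID1', 'ID2'])['Date'].shift(11))
           .set_index(['ID1', 'ID2', 'Date'])['START_DATE'])

# step 3: merge both results on the shared (ID1, ID2, Date) index
result = rolled.join(start)
print(result.head(15))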

Could someone explain axis=0 or 1 with iteration in Python? [duplicate]

This question already has answers here:
What does axis in pandas mean?
(27 answers)
Closed 3 years ago.
I am working on a class project and have the code pasted here. It creates 3 dataframes of stock data: close prices, volumes, and dividends. All the data has been pivoted so that the dates are the index, the columns are the tickers, and the values are as named above.

The question asks to create an index of weights based on the percentage of cash in each stock, i.e. (price x volume at each date) / sum(price x volume for all the dates of a particular ticker) (note: this is how we are told to calculate the weight based on the instructions). I wrote the code in the function below, but initially I set axis=0 because I thought this would add up the values in each column (iterating down the rows). However, this answer was not accepted, and the correct answer is axis=1. This makes no sense to me, as this would add up the prices for every stock on a given date rather than for all the dates of a given stock, wouldn't it? Am I missing something?
df = pd.read_csv('../../data/project_3/eod-quotemedia.csv')
percent_top_dollar = 0.2
high_volume_symbols = project_helper.large_dollar_volume_stocks(df, 'adj_close', 'adj_volume', percent_top_dollar)
df = df[df['ticker'].isin(high_volume_symbols)]
close = df.reset_index().pivot(index='date', columns='ticker', values='adj_close')
volume = df.reset_index().pivot(index='date', columns='ticker', values='adj_volume')
dividends = df.reset_index().pivot(index='date', columns='ticker', values='dividends')

def generate_dollar_volume_weights(close, volume):
    """
    Generate dollar volume weights.

    Parameters
    ----------
    close : DataFrame
        Close price for each ticker and date
    volume : DataFrame
        Volume for each ticker and date

    Returns
    -------
    dollar_volume_weights : DataFrame
        The dollar volume weights for each ticker and date
    """
    assert close.index.equals(volume.index)
    assert close.columns.equals(volume.columns)
    # TODO: Implement function
    close_adj = close * volume
    return close_adj.apply(lambda x: x / x.sum(), axis=1)
Here is the explanation: with apply, axis=1 passes the function one row at a time (a Series indexed by ticker), so x / x.sum() divides each date's dollar volume by that date's total across all tickers, which is exactly what makes the weights sum to 1 on each date. With axis=0 the function would receive one column at a time, normalizing each ticker over all dates instead.
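To see the difference concretely, a small sketch with two hypothetical tickers over two dates (the numbers are made up):

import pandas as pd

dollar = pd.DataFrame({'AAPL': [100.0, 200.0], 'MSFT': [300.0, 200.0]},
                      index=['2018-01-02', '2018-01-03'])
# axis=1: each row (date) is normalized, so the weights per date sum to 1
weights = dollar.apply(lambda x: x / x.sum(), axis=1)
print(weights)
# the same result without apply, dividing by the row totals
weights = dollar.div(dollar.sum(axis=1), axis=0)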

How to plot stacked time histogram starting from a Pandas DataFrame?

Consider the following DataFrame df:
Date                 Kind
2018-09-01 13:15:32  Red
2018-09-02 16:13:26  Blue
2018-09-04 22:10:09  Blue
2018-09-04 09:55:30  Red
...                  ...
Here you have a column with a datetime64[ns] dtype and another containing an np.object that can assume only a finite number of values (in this case, 2).
You have to plot a date histogram in which you have:
On the x-axis, the dates (per-day histogram showing month and day);
On the y-axis, the number of items belonging to that date, showing in a stacked bar the difference between Blue and Red.
How is it possible to achieve this using Matplotlib?
I was thinking to do a set_index and resample as follows:
df.set_index('Date', inplace=True)
df.resample('1d').count()
But I'm losing the information on the number of items per Kind. I also want to keep any missing day as zero.
Any help is very much appreciated.
Use groupby, count and unstack to adjust the dataframe:
df2 = df.groupby(['Date', 'Kind'])['Kind'].count().unstack('Kind').fillna(0)
Next, re-sample the dataframe and sum the count for each day. This will also add any missing days that are not in the dataframe (as specified). Then adjust the index to only keep the date part.
df2 = df2.resample('D').sum()
df2.index = df2.index.date
Now plot the dataframe with stacked=True:
df2.plot(kind='bar', stacked=True)
Alternatively, the plt.bar() function can be used for the final plotting:
import matplotlib.pyplot as plt

cols = df['Kind'].unique()  # find all original values in the column
ind = range(len(df2))
p1 = plt.bar(ind, df2[cols[0]])
p2 = plt.bar(ind, df2[cols[1]], bottom=df2[cols[0]])
Here it is necessary to set the bottom argument of each part to be the sum of all the parts that came before.
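For completeness, a minimal sketch wiring the plt.bar pieces together with date labels and a legend (assuming the df2 built in the steps above):

import matplotlib.pyplot as plt

cols = df['Kind'].unique()
ind = range(len(df2))
plt.bar(ind, df2[cols[0]], label=cols[0])
plt.bar(ind, df2[cols[1]], bottom=df2[cols[0]], label=cols[1])
plt.xticks(ind, [d.strftime('%m-%d') for d in df2.index], rotation=45)
plt.legend()
plt.show()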
