Imagine I have a dataframe with a MultiIndex (stock and date). I would like to calculate the pct_change of the Close column. I use this code:
data['win'] = data.Close.pct_change()*100
Open Close High Low Volume Adj_Close win
Ticker Date
AAPL 2018-12-14 169.000000 165.479996 169.080002 165.279999 40703700 165.479996 NaN
2018-12-17 165.449997 163.940002 168.350006 162.729996 44287900 163.940002 -0.930622
2018-12-18 165.380005 166.070007 167.529999 164.389999 33841500 166.070007 1.299259
2018-12-19 166.000000 160.889999 167.449997 159.089996 48889400 160.889999 -3.119171
AMZN 2018-12-14 1638.000000 1591.910034 1642.569946 1585.000000 6367200 1591.910034 **889.440017**
2018-12-17 1566.000000 1520.910034 1576.130005 1505.010010 8829800 1520.910034 -4.460051
2018-12-18 1540.000000 1551.479980 1567.550049 1523.010010 6523000 1551.479980 2.009977
2018-12-19 1543.050049 1495.079956 1584.530029 1483.180054 8654400 1495.079956 -3.635240
It works fine, but when the AMZN block starts, its first pct_change is incorrect because it uses the last value of AAPL.
How can I change the formula to calculate the pct_change correctly?
The solution should be this:
Open Close High Low Volume Adj_Close win
Ticker Date
AAPL 2018-12-14 169.000000 165.479996 169.080002 165.279999 40703700 165.479996 NaN
2018-12-17 165.449997 163.940002 168.350006 162.729996 44287900 163.940002 -0.930622
2018-12-18 165.380005 166.070007 167.529999 164.389999 33841500 166.070007 1.299259
2018-12-19 166.000000 160.889999 167.449997 159.089996 48889400 160.889999 -3.119171
AMZN 2018-12-14 1638.000000 1591.910034 1642.569946 1585.000000 6367200 1591.910034 NaN
2018-12-17 1566.000000 1520.910034 1576.130005 1505.010010 8829800 1520.910034 -4.460051
2018-12-18 1540.000000 1551.479980 1567.550049 1523.010010 6523000 1551.479980 2.009977
2018-12-19 1543.050049 1495.079956 1584.530029 1483.180054 8654400 1495.079956 -3.635240
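One way to get the desired output (a minimal sketch, assuming the index levels are named Ticker and Date as shown above) is to compute pct_change within each ticker group, so the calculation never crosses from one ticker to the next:
# group by the first index level (Ticker) so pct_change restarts for each stock
data['win'] = data.groupby(level='Ticker')['Close'].pct_change() * 100
data.groupby(level=0) works the same way if you prefer positional levels.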
I have a large dataset with daily stock prices and company codes. I need to compute the 52-week high for each stock at every point in time, based on the previous 52 weeks. The problem is that some of the companies do not necessarily have data in some periods, so if I use a fixed window size for a rolling max, the results are not correct.
First I tried this:
df['52wh'] = df["PRC"].groupby(df['id']).shift(1).rolling(253).max()
However, this doesn't work since it does not take into account the dates but only the previous 253 entries.
I also tried this:
df['date'] = pd.to_datetime(df['date'])
df['52wh'] = df.set_index('date').groupby('id').rolling(window=365, freq='D', min_periods=1).max()['PRC']
But this gives me this error:
ValueError: cannot handle a non-unique multi-index!
I am thinking maybe a rolling function with custom window bounds (get_window_bounds) could work, but I don't know how to write a good one.
Here is an example of what the data frame looks like:
date id PRC
0 2010-01-09 10158 11.87
1 2010-01-10 10158 12.30
2 2010-01-11 10158 12.37
3 2010-01-12 10158 12.89
4 2010-02-08 10158 10.13
... ... ... ...
495711 2018-12-12 93188 14.48
495712 2018-12-13 93188 14.48
495713 2018-12-14 93188 14.48
495714 2018-12-17 93188 14.48
495715 2018-12-18 93188 NaN
Can someone help? Thanks in advance guys! :)
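One possible approach (a sketch, not a verified solution; the 365-day window is only an approximation of 52 weeks) is to use a calendar-based rolling window per id, which copes with gaps in the dates, and to align the result back by position after sorting:
import pandas as pd

df['date'] = pd.to_datetime(df['date'])
# sort so each id's rows are in date order; the grouped rolling output below
# comes back in the same (id, date) order, so it can be assigned by position
df = df.sort_values(['id', 'date']).reset_index(drop=True)

# shift within each id so the current day's price is excluded from its own 52-week high
df['prev_prc'] = df.groupby('id')['PRC'].shift(1)

rolled = (
    df.set_index('date')
      .groupby('id')['prev_prc']
      .rolling('365D', min_periods=1)   # time-based window instead of a fixed row count
      .max()
)
df['52wh'] = rolled.to_numpy()
df = df.drop(columns='prev_prc')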
Given a DataFrame of stock prices, I am interested in filtering based on the latest closing price. I am aware of how to do this for a simple DataFrame, but cannot figure out how to do it for a multi-indexed dataframe.
Simple dataframe:
AAPL AMZN GOOG MSFT
2021-02-08 136.91 3322.94 2092.91 242.47
2021-02-09 136.01 3305.00 2083.51 243.77
2021-02-10 135.39 3286.58 2095.38 242.82
2021-02-11 135.13 3262.13 2095.89 244.49
2021-02-12 135.37 3277.71 2104.11 244.99
Operation: df.loc[:,df.iloc[-1] < 250]
Output:
AAPL MSFT
2021-02-08 136.91 242.47
2021-02-09 136.01 243.77
2021-02-10 135.39 242.82
2021-02-11 135.13 244.49
2021-02-12 135.37 244.99
However I cannot figure out how to accomplish this on a DataFrame with a MultiIndex (such as OHLC)
Multiindex DataFrame:
Close High Low ... Open Volume
AAPL AMZN GOOG MSFT AAPL AMZN GOOG MSFT AAPL ... MSFT AAPL AMZN GOOG MSFT AAPL AMZN GOOG MSFT
2021-02-08 136.91 3322.94 2092.91 242.47 136.96 3365.00 2123.55 243.68 134.92 ... 240.81 136.03 3358.50 2105.91 243.15 71297200 3257400 1241900 22211900
2021-02-09 136.01 3305.00 2083.51 243.77 137.88 3338.00 2105.13 244.76 135.85 ... 241.38 136.62 3312.49 2078.54 241.87 76774200 2203500 889900 23565000
2021-02-10 135.39 3286.58 2095.38 242.82 136.99 3317.95 2108.37 245.92 134.40 ... 240.89 136.48 3314.00 2094.21 245.00 73046600 3151600 1135500 22186700
2021-02-11 135.13 3262.13 2095.89 244.49 136.39 3292.00 2102.03 245.15 133.77 ... 242.15 135.90 3292.00 2099.51 244.78 64280000 2301400 945700 15751100
2021-02-12 135.37 3277.71 2104.11 244.99 135.53 3280.25 2108.82 245.30 133.69 ... 242.73 134.35 3250.00 2090.25 243.93 60029300 2329300 855700 16552000
[5 rows x 20 columns]
Filter: df_filter = df.iloc[-1].loc['Close'] < 250
AAPL True
AMZN False
GOOG False
MSFT True
Name: 2021-02-12 00:00:00, dtype: bool
Operation???:
Maybe something like df.loc[:, df_filter], but I receive the error:
pandas.core.indexing.IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match)
I understand it's a multi-index so I also tried using pd.IndexSlice: df.loc[:,idx[:,df_filter]] but still get:
ValueError: cannot index with a boolean indexer that is not the same length as the index
Desired Output:
Close High Low Open Volume
AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT
2021-02-08 136.91 242.47 136.96 243.68 134.92 240.81 136.03 243.15 71297200 22211900
2021-02-09 136.01 243.77 137.88 244.76 135.85 241.38 136.62 241.87 76774200 23565000
2021-02-10 135.39 242.82 136.99 245.92 134.40 240.89 136.48 245.00 73046600 22186700
2021-02-11 135.13 244.49 136.39 245.15 133.77 242.15 135.90 244.78 64280000 15751100
2021-02-12 135.37 244.99 135.53 245.30 133.69 242.73 134.35 243.93 60029300 16552000
I'm not sure if IndexSlice works with boolean indexing. You can try passing the valid index:
df.loc[:,pd.IndexSlice[:, df_filter.index[df_filter]]]
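Another option (a sketch, assuming the same df and df_filter as above) is to broadcast the per-ticker mask onto the full column MultiIndex, which yields a plain boolean indexer of the right length:
# keep every (field, ticker) column whose ticker passed the filter
mask = df.columns.get_level_values(1).isin(df_filter.index[df_filter])
result = df.loc[:, mask]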
How can we convert a Pandas DataFrame with MultiIndex columns, such as
FB AAPL
open volume open volume
date
2019-10-30 189.56 28734995 244.76 31130522
2019-10-31 196.70 42286529 247.24 34790520
2019-11-01 192.85 21711829 249.54 37781334
to one with regular columns, where one of the levels is now a column repeated in all rows
open volume ticker
date
2019-10-30 189.56 28734995 FB
2019-10-31 196.70 42286529 FB
2019-11-01 192.85 21711829 FB
2019-10-30 244.76 31130522 AAPL
2019-10-31 247.24 34790520 AAPL
2019-11-01 249.54 37781334 AAPL
The main idea is to use DataFrame.stack to move the ticker level of the columns into the index, then DataFrame.reset_index to turn that second index level into a column:
df = df.stack(0).rename_axis(('date','ticker')).reset_index(level=1)
print (df)
ticker open volume
date
2019-10-30 AAPL 244.76 31130522
2019-10-30 FB 189.56 28734995
2019-10-31 AAPL 247.24 34790520
2019-10-31 FB 196.70 42286529
2019-11-01 AAPL 249.54 37781334
2019-11-01 FB 192.85 21711829
If ordering is important, use an ordered Categorical for the tickers, sort by it, and move the column to the last position by reassigning the result of DataFrame.pop:
df1 = df.stack(0).rename_axis(('date','ticker')).reset_index(level=1)
df1['ticker'] = pd.Categorical(df1.pop('ticker'),
                               ordered=True,
                               categories=df.columns.get_level_values(0).unique())
df1 = df1.sort_values(['ticker','date'])
print (df1)
open volume ticker
date
2019-10-30 189.56 28734995 FB
2019-10-31 196.70 42286529 FB
2019-11-01 192.85 21711829 FB
2019-10-30 244.76 31130522 AAPL
2019-10-31 247.24 34790520 AAPL
2019-11-01 249.54 37781334 AAPL
I have the following data format.
Date Open High Low Close
2018-11-12 **10607.80** 10645.50 10464.05 10482.20
2018-11-13 10451.90 10596.25 10440.55 10582.50
2018-11-14 10634.90 10651.60 10532.70 10576.30
2018-11-15 10580.60 10646.50 10557.50 10616.70
2018-11-16 10644.00 10695.15 10631.15 **10682.20**
2018-11-19 **10731.25** 10774.70 10688.80 10763.40
2018-11-20 10740.10 10740.85 10640.85 10656.20
2018-11-21 10670.95 10671.30 10562.35 10600.05
2018-11-22 10612.65 10646.25 10512.00 **10526.75**
2018-11-26 **10568.30** 10637.80 10489.75 10628.60
2018-11-27 10621.45 10695.15 10596.35 10685.60
2018-11-28 10708.75 10757.80 10699.85 10728.85
2018-11-29 10808.70 10883.05 10782.35 10858.70
2018-11-30 10892.10 10922.45 10835.10 **10876.75**
I want to get the open price of Monday and the closing price of the following Friday.
This is my code for the same:
open = df.Open.resample('W-MON').last()
print(open.tail(5))
close = df.Close.resample('W-FRI').last().resample('W-MON').first()
print(close.tail(5))
weekly_data = pd.concat([open, close], axis=1)
print(weekly_data.tail(5))
It gives me the correct data for open and close individually, but when I merge them into weekly_data, it gives the wrong output for close: it shows me the previous Friday's closing price.
How to fix this issue?
You can shift by -4 days to align both DatetimeIndex objects:
open = df.Open.resample('W-MON').last()
print (open.tail(5))
Date
2018-11-12 10607.80
2018-11-19 10731.25
2018-11-26 10568.30
2018-12-03 10892.10
Freq: W-MON, Name: Open, dtype: float64
close = df.Close.resample('W-FRI').last().shift(-4, freq='D')
print (close.tail(5))
Date
2018-11-12 10682.20
2018-11-19 10526.75
2018-11-26 10876.75
Freq: W-MON, Name: Close, dtype: float64
weekly_data = pd.concat([open, close], axis=1)
print (weekly_data)
Open Close
Date
2018-11-12 10607.80 10682.20
2018-11-19 10731.25 10526.75
2018-11-26 10568.30 10876.75
2018-12-03 10892.10 NaN
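An alternative worth trying (a sketch, not verified for holiday-shortened weeks) is a single weekly resample anchored on Friday, taking the first Open and the last Close of each week; the rows are then labelled by the Friday date rather than the Monday:
weekly_data = df.resample('W-FRI').agg({'Open': 'first', 'Close': 'last'})
print(weekly_data.tail(5))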
I have the dataframe below, which has stock values for about 200 companies. I am trying to find a way to loop over it and build a new dataframe which includes these companies' different yearly features.
Date Symbol Open High Low Close Volume Daily Return
2016-01-04 AAPL 102.61 105.37 102.00 105.35 67281190 0.025703
2016-01-05 AAPL 105.75 105.85 102.41 102.71 55790992 0.019960
2016-12-28 AMZN 776.25 780.00 770.50 772.13 3301025 0.009122
2016-12-29 AMZN 772.40 773.40 760.85 765.15 3158299 0.020377
I have tried different ways; the closest I have come is:
stocks_features = pd.DataFrame(data=stocks_data.Symbol.unique(), columns = ['Symbol'])
stocks_features['Max_Yearly_Price'] = stocks_data['High'].max()
stocks_features['Min_Yearly_Price'] = stocks_data['Low'].min()
stocks_features
But it gives me the same values for all stocks:
Symbol Max_Yearly_Price Min_Yearly_Price
AAPL 847.21 89.47
AMZN 847.21 89.47
What am I doing wrong, and how can I accomplish this?
By using groupby + agg:
df.groupby('Symbol').agg({'High':'max','Low':'min'}).\
   rename(columns={'High':'Max_Yearly_Price','Low':'Min_Yearly_Price'})
Out[861]:
Max_Yearly_Price Min_Yearly_Price
Symbol
AAPL 105.85 102.00
AMZN 780.00 760.85
Wen's answer is great as well. I had a different way of solving it. I'll explain as I go along:
# creates a dictionary mapping each symbol to its max High value
value_maps = dict(stocks_features.loc[stocks_features
                  .groupby('Symbol').High.agg('idxmax')][['Symbol', 'High']].values)
# sets Max_Yearly_Price equal to the symbol
stocks_features['Max_Yearly_Price'] = stocks_features['Symbol']
# replaces the symbol with the corresponding value from the dictionary
stocks_features['Max_Yearly_Price'] = stocks_features['Max_Yearly_Price'].map(value_maps)
# output
        Date Symbol    Open    High     Low   Close    Volume  Daily Return  Max_Yearly_Price
0 2016-01-04   AAPL  102.61  105.37  102.00  105.35  67281190      0.025703            105.85
1 2016-01-05   AAPL  105.75  105.85  102.41  102.71  55790992      0.019960            105.85
2 2016-12-28   AMZN  776.25  780.00  770.50  772.13   3301025      0.009122            780.00
3 2016-12-29   AMZN  772.40  773.40  760.85  765.15   3158299      0.020377            780.00
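For what it's worth, the same per-row columns can also be added more directly (a sketch, assuming the row-level frame is called stocks_features as in the code above) with a grouped transform:
# broadcast each symbol's max/min back onto every one of its rows
stocks_features['Max_Yearly_Price'] = stocks_features.groupby('Symbol')['High'].transform('max')
stocks_features['Min_Yearly_Price'] = stocks_features.groupby('Symbol')['Low'].transform('min')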