I have following data format.
Date Open High Low Close
2018-11-12 **10607.80** 10645.50 10464.05 10482.20
2018-11-13 10451.90 10596.25 10440.55 10582.50
2018-11-14 10634.90 10651.60 10532.70 10576.30
2018-11-15 10580.60 10646.50 10557.50 10616.70
2018-11-16 10644.00 10695.15 10631.15 **10682.20**
2018-11-19 **10731.25** 10774.70 10688.80 10763.40
2018-11-20 10740.10 10740.85 10640.85 10656.20
2018-11-21 10670.95 10671.30 10562.35 10600.05
2018-11-22 10612.65 10646.25 10512.00 **10526.75**
2018-11-26 **10568.30** 10637.80 10489.75 10628.60
2018-11-27 10621.45 10695.15 10596.35 10685.60
2018-11-28 10708.75 10757.80 10699.85 10728.85
2018-11-29 10808.70 10883.05 10782.35 10858.70
2018-11-30 10892.10 10922.45 10835.10 **10876.75**
I want to get the open price of monday and closing price of following Friday.
This is my code for same.
open = df.Open.resample('W-MON').last()
print open.tail(5)
close = df.Close.resample('W-FRI').last().resample('W-MON').first()
print close.tail(5)
weekly_data = pd.concat([open, close], axis=1)
print weekly_data.tail(5)
It gives me correct data for open and close individually, but when i merge to weekly_data, it gives wrong output for close. It shows me previous friday closing price.
How to fix this issue?
You can use shift by -4 days for align both DatetimeIndex:
open = df.Open.resample('W-MON').last()
print (open.tail(5))
Date
2018-11-12 10607.80
2018-11-19 10731.25
2018-11-26 10568.30
2018-12-03 10892.10
Freq: W-MON, Name: Open, dtype: float64
close = df.Close.resample('W-FRI').last().shift(-4, freq='D')
print (close.tail(5))
Date
2018-11-12 10682.20
2018-11-19 10526.75
2018-11-26 10876.75
Freq: W-MON, Name: Close, dtype: float64
weekly_data = pd.concat([open, close], axis=1)
print (weekly_data)
Open Close
Date
2018-11-12 10607.80 10682.20
2018-11-19 10731.25 10526.75
2018-11-26 10568.30 10876.75
2018-12-03 10892.10 NaN
Related
Let's say I have an array of values corresponding to stock symbols: all = {AAPL, TSLA, MSFT}.
I created a function to scrap their historical prices on YFinance.
def Scrapping(symbol):
aapl = yf.Ticker(symbol)
ainfo = aapl.history(period='1y')
global castex_date
castex_date = ainfo.index
return ainfo.Close
all_Assets = list(map(Scrapping, all))
print(all_Assets)
This is an output sample:
[Date
2018-12-12 00:00:00-05:00 53.183998
2018-12-13 00:00:00-05:00 53.095001
Name: Close, Length: 1007, dtype: float64, Date
2018-12-12 00:00:00-05:00 24.440001
2018-12-13 00:00:00-05:00 25.119333
2022-12-08 00:00:00-05:00 247.399994
2022-12-09 00:00:00-05:00 245.419998
Name: Close, Length: 1007, dtype: float64]
The issue is that all these symbols' historical data have the same name 'Output'. When putting all of these in a dataframe, we get:
df = pd.DataFrame(all_Assets)
df2 = df.transpose()
print(df2)
Close Close Close
Date
2018-12-12 00:00:00-05:00 53.183998 24.440001 104.500511
2018-12-13 00:00:00-05:00 53.095001 25.119333 104.854965
2018-12-14 00:00:00-05:00 52.105000 24.380667 101.578560
2018-12-17 00:00:00-05:00 50.826500 23.228001 98.570389
2018-12-18 00:00:00-05:00 51.435501 22.468666 99.605042
This creates an issue when plotting the DF.
I need these columns names to be equal to the 'symbol' parameter of the function. SO, automatically, the column names would be AAPL TSLA MSFT
You can use:
import pandas as pd
import yfinance as yf
all_symbols = ['AAPL', 'TSLA', 'MSFT']
def scrapping(symbol):
ticker = yf.Ticker(symbol)
data = ticker.history(period='1y')
return data['Close'].rename(symbol)
all_assets = map(scrapping, all_symbols)
df = pd.concat(all_assets, axis=1)
One-liner version:
df = pd.concat({symbol: yf.Ticker(symbol).history(period='1y')['Close']
for symbol in all_symbols}, axis=1)
Output:
>>> df
AAPL TSLA MSFT
Date
2022-02-14 167.863144 291.920013 292.261475
2022-02-15 171.749573 307.476654 297.680695
2022-02-16 171.511032 307.796661 297.333221
2022-02-17 167.863144 292.116669 288.626678
2022-02-18 166.292648 285.660004 285.846893
... ... ... ...
2023-02-08 151.688400 201.289993 266.730011
2023-02-09 150.639999 207.320007 263.619995
2023-02-10 151.009995 196.889999 263.100006
2023-02-13 153.850006 194.639999 271.320007
2023-02-14 152.949997 203.729996 272.790009
[252 rows x 3 columns]
I'm trying to convert daily prices into weekly, monthly, quarterly, semesterly, yearly, but the code only works when I run it for one stock. When I add another stock to the list the code crashes and gives two errors. 'ValueError: Length of names must match number of levels in MultiIndex.' and 'TypeError: other must be a MultiIndex or a list of tuples.' I'm not experienced with MultiIndexing and have searched everywhere with no success.
This is the code:
import yfinance as yf
from pandas_datareader import data as pdr
symbols = ['AMZN', 'AAPL']
yf.pdr_override()
df = pdr.get_data_yahoo(symbols, start = '2014-12-01', end = '2021-01-01')
df = df.reset_index()
df.Date = pd.to_datetime(df.Date)
df.set_index('Date', inplace = True)
res = {'Open': 'first', 'Adj Close': 'last'}
dfw = df.resample('W').agg(res)
dfw_ret = (dfw['Adj Close'] / dfw['Open'] - 1)
dfm = df.resample('BM').agg(res)
dfm_ret = (dfm['Adj Close'] / dfm['Open'] - 1)
dfq = df.resample('Q').agg(res)
dfq_ret = (dfq['Adj Close'] / dfq['Open'] - 1)
dfs = df.resample('6M').agg(res)
dfs_ret = (dfs['Adj Close'] / dfs['Open'] - 1)
dfy = df.resample('Y').agg(res)
dfy_ret = (dfy['Adj Close'] / dfy['Open'] - 1)
print(dfw_ret)
print(dfm_ret)
print(dfq_ret)
print(dfs_ret)
print(dfy_ret)```
This is what the original df prints:
```Adj Close Open
AAPL AMZN AAPL AMZN
Date
2014-12-01 26.122288 326.000000 29.702499 338.119995
2014-12-02 26.022408 326.309998 28.375000 327.500000
2014-12-03 26.317518 316.500000 28.937500 325.730011
2014-12-04 26.217640 316.929993 28.942499 315.529999
2014-12-05 26.106400 312.630005 28.997499 316.799988
... ... ... ... ...
2020-12-24 131.549637 3172.689941 131.320007 3193.899902
2020-12-28 136.254608 3283.959961 133.990005 3194.000000
2020-12-29 134.440399 3322.000000 138.050003 3309.939941
2020-12-30 133.294067 3285.850098 135.580002 3341.000000
2020-12-31 132.267349 3256.929932 134.080002 3275.000000
And this is what the different df_ret print when I go from daily
to weekly/monthly/etc but it can only do it for one stock and
the idea is to be able to do it for multiple stocks:
Date
2014-12-07 -0.075387
2014-12-14 -0.013641
2014-12-21 -0.029041
2014-12-28 0.023680
2015-01-04 0.002176
...
2020-12-06 -0.014306
2020-12-13 -0.012691
2020-12-20 0.018660
2020-12-27 -0.008537
2021-01-03 0.019703
Freq: W-SUN, Length: 318, dtype: float64
Date
2014-12-31 -0.082131
2015-01-30 0.134206
2015-02-27 0.086016
2015-03-31 -0.022975
2015-04-30 0.133512
...
2020-08-31 0.085034
2020-09-30 -0.097677
2020-10-30 -0.053569
2020-11-30 0.034719
2020-12-31 0.021461
Freq: BM, Length: 73, dtype: float64
Date
2014-12-31 -0.082131
2015-03-31 0.190415
2015-06-30 0.166595
2015-09-30 0.165108
2015-12-31 0.322681
2016-03-31 -0.095461
2016-06-30 0.211909
2016-09-30 0.167275
2016-12-31 -0.103026
2017-03-31 0.169701
2017-06-30 0.090090
2017-09-30 -0.011760
2017-12-31 0.213143
2018-03-31 0.234932
2018-06-30 0.199052
2018-09-30 0.190349
2018-12-31 -0.257182
2019-03-31 0.215363
2019-06-30 0.051952
2019-09-30 -0.097281
2019-12-31 0.058328
2020-03-31 0.039851
2020-06-30 0.427244
2020-09-30 0.141676
2020-12-31 0.015252
Freq: Q-DEC, dtype: float64
Date
2014-12-31 -0.082131
2015-06-30 0.388733
2015-12-31 0.538386
2016-06-30 0.090402
2016-12-31 0.045377
2017-06-30 0.277180
2017-12-31 0.202181
2018-06-30 0.450341
2018-12-31 -0.107405
2019-06-30 0.292404
2019-12-31 -0.039075
2020-06-30 0.471371
2020-12-31 0.180907
Freq: 6M, dtype: float64
Date
2014-12-31 -0.082131
2015-12-31 1.162295
2016-12-31 0.142589
2017-12-31 0.542999
2018-12-31 0.281544
2019-12-31 0.261152
2020-12-31 0.737029
Freq: A-DEC, dtype: float64```
Without knowing what your df DataFrame looks like I am assuming it is an issue with correctly handling the resampling on a MultiIndex similar to the one talked about in this question.
The solution listed there is to use pd.Grouper with the freq and level parameters filled out correctly.
# This is just from the listed solution so I am not sure if these is the correct level to choose
df.groupby(pd.Grouper(freq='W', level=-1))
If this doesn't work, I think you would need to provide some more detail or a dummy data set to reproduce the issue.
This question already has answers here:
How do I expand the output display to see more columns of a Pandas DataFrame?
(22 answers)
Closed 3 years ago.
I am trying to understand rolling function on pandas on python here is my example code
# importing pandas as pd
import pandas as pd
# By default the "date" column was in string format,
# we need to convert it into date-time format
# parse_dates =["date"], converts the "date" column to date-time format
# Resampling works with time-series data only
# so convert "date" column to index
# index_col ="date", makes "date" column
df = pd.read_csv("apple.csv", parse_dates = ["date"], index_col = "date")
print (df.close.rolling(3).sum())
print (df.close.rolling(3, win_type ='triang').sum())
cvs input file has 255 entries but I get few entries on the output, I get "..." between 2018-10-04 and 2017-12-26. I verified the input file, it has a lot more valid entries in between these dates.
date
2018-11-14 NaN
2018-11-13 NaN
2018-11-12 578.63
2018-11-09 590.87
2018-11-08 607.13
2018-11-07 622.91
2018-11-06 622.21
2018-11-05 615.31
2018-11-02 612.84
2018-11-01 631.29
2018-10-31 648.56
2018-10-30 654.38
2018-10-29 644.40
2018-10-26 641.84
2018-10-25 648.34
2018-10-24 651.19
2018-10-23 657.62
2018-10-22 658.47
2018-10-19 662.69
2018-10-18 655.98
2018-10-17 656.52
2018-10-16 659.36
2018-10-15 660.70
2018-10-12 661.62
2018-10-11 653.92
2018-10-10 652.92
2018-10-09 657.68
2018-10-08 667.00
2018-10-05 674.93
2018-10-04 676.05
...
2017-12-26 512.25
2017-12-22 516.18
2017-12-21 520.59
2017-12-20 524.37
2017-12-19 523.90
2017-12-18 525.31
2017-12-15 524.93
2017-12-14 522.61
2017-12-13 518.46
2017-12-12 516.19
2017-12-11 516.64
2017-12-08 513.74
2017-12-07 511.36
2017-12-06 507.70
2017-12-05 507.97
2017-12-04 508.45
2017-12-01 510.49
2017-11-30 512.70
2017-11-29 512.38
2017-11-28 514.40
2017-11-27 516.64
2017-11-24 522.13
2017-11-22 524.02
2017-11-21 523.07
2017-11-20 518.08
2017-11-17 513.27
2017-11-16 511.23
2017-11-15 510.33
2017-11-14 511.52
2017-11-13 514.39
Name: close, Length: 254, dtype: float64
thank you for your help ...
... just means that pandas isn't showing you all the rows, that's where the 'missing' ones are.
To display all rows:
with pd.option_context("display.max_rows", None):
print (df.close.rolling(3, win_type ='triang').sum())
Imagine I have a dataframe with multindex (stock and date). I would like to calculate the pct_change from column close. I use this code:
data['win'] = data.Close.pct_change()*100
Open Close High Low Volume Adj_Close win
Ticker Date
AAPL 2018-12-14 169.000000 165.479996 169.080002 165.279999 40703700 165.479996 NaN
2018-12-17 165.449997 163.940002 168.350006 162.729996 44287900 163.940002 -0.930622
2018-12-18 165.380005 166.070007 167.529999 164.389999 33841500 166.070007 1.299259
2018-12-19 166.000000 160.889999 167.449997 159.089996 48889400 160.889999 -3.119171
AMZN 2018-12-14 1638.000000 1591.910034 1642.569946 1585.000000 6367200 1591.910034 **889.440017**
2018-12-17 1566.000000 1520.910034 1576.130005 1505.010010 8829800 1520.910034 -4.460051
2018-12-18 1540.000000 1551.479980 1567.550049 1523.010010 6523000 1551.479980 2.009977
2018-12-19 1543.050049 1495.079956 1584.530029 1483.180054 8654400 1495.079956 -3.635240
It works fine, but when it starts with AMZN pct_change, the first one is incorrect because it use the last value of AAPL.
How can I change the formula to calculate the pct_change correctly?
The solution should be this:
Open Close High Low Volume Adj_Close win
Ticker Date
AAPL 2018-12-14 169.000000 165.479996 169.080002 165.279999 40703700 165.479996 NaN
2018-12-17 165.449997 163.940002 168.350006 162.729996 44287900 163.940002 -0.930622
2018-12-18 165.380005 166.070007 167.529999 164.389999 33841500 166.070007 1.299259
2018-12-19 166.000000 160.889999 167.449997 159.089996 48889400 160.889999 -3.119171
AMZN 2018-12-14 1638.000000 1591.910034 1642.569946 1585.000000 6367200 1591.910034 Nan
2018-12-17 1566.000000 1520.910034 1576.130005 1505.010010 8829800 1520.910034 -4.460051
2018-12-18 1540.000000 1551.479980 1567.550049 1523.010010 6523000 1551.479980 2.009977
2018-12-19 1543.050049 1495.079956 1584.530029 1483.180054 8654400 1495.079956 -3.635240
Let's say I have financial data in a pandas.Series, called fin_series.
Here's a peek at fin_series.
In [565]: fin_series
Out[565]:
Date
2008-05-16 1000.000000
2008-05-19 1001.651747
2008-05-20 1004.137434
...
2014-12-22 1158.085200
2014-12-23 1150.139126
2014-12-24 1148.934665
Name: Close, Length: 1665
I'm interested in looking at the quarterly endpoints of the data. However, not all financial trading days fall exactly on the 'end of the quarter.'
For example:
In [566]: fin_series.asfreq('q')
Out[566]:
2008-06-30 976.169624
2008-09-30 819.518923
2008-12-31 760.429261
...
2009-06-30 795.768956
2009-09-30 870.467121
2009-12-31 886.329978
...
2011-09-30 963.304679
2011-12-31 NaN
2012-03-31 NaN
....
2012-09-30 NaN
2012-12-31 1095.757137
2013-03-31 NaN
2013-06-30 NaN
...
2014-03-31 1138.548881
2014-06-30 1168.248194
2014-09-30 1147.000073
Freq: Q-DEC, Name: Close, dtype: float64
Here's a little function that accomplishes what I'd like, along with the desired end result.
def bmg_qt_asfreq(series):
ind = series[1:].index.quarter != series[:-1].index.quarter
ind = numpy.append(ind, True)
return tmp[ind]
which gives me:
In [15]: bmg_asfreq(tmp)
Out[15]:
Date
2008-06-30 976.169425
2008-09-30 819.517607
2008-12-31 760.428770
...
2011-09-30 963.252831
2011-12-30 999.742132
2012-03-30 1049.848583
...
2012-09-28 1086.689824
2012-12-31 1093.943357
2013-03-28 1117.111859
Name: Close, dtype: float64
Note that I'm preserving the dates of the "closest prior price," instead of simply using pandas.asfreq(freq = 'q', method = 'ffill'), as the preservations of dates that exist within the original Series.Index is crucial.
This seems like a silly problem that many people have had and must be addressed by all of the pandas time manipulation functionality, but I can't figure out how to do it with resample or asfreq.
Anyone who could show me the builtin pandas functionality to accomplish this would be greatly appreciated.
Regards,
Assuming the input is a dataframe Series , first do
import pandas as pd
fin_series.resample("q",pd.Series.last_valid_index)
to get a series with the last non-NA index for each quarter. Then
fin_series.resample("q","last")
for the last non-NA value. You can then join these together. As you suggested in your comment:
fin_series[fin_series.resample("q",pd.Series.last_valid_index)]
df.asfreq('d').interpolate().asfreq('q')