I'm trying to convert daily prices into weekly, monthly, quarterly, semiannual, and yearly returns, but the code only works when I run it for one stock. When I add another stock to the list, the code crashes with two errors: 'ValueError: Length of names must match number of levels in MultiIndex.' and 'TypeError: other must be a MultiIndex or a list of tuples.' I'm not experienced with MultiIndexes and have searched everywhere with no success.
This is the code:
import pandas as pd
import yfinance as yf
from pandas_datareader import data as pdr

symbols = ['AMZN', 'AAPL']
yf.pdr_override()
df = pdr.get_data_yahoo(symbols, start='2014-12-01', end='2021-01-01')
df = df.reset_index()
df.Date = pd.to_datetime(df.Date)
df.set_index('Date', inplace=True)

# Period-opening price from 'Open', period-closing price from 'Adj Close'
res = {'Open': 'first', 'Adj Close': 'last'}
dfw = df.resample('W').agg(res)
dfw_ret = (dfw['Adj Close'] / dfw['Open'] - 1)
dfm = df.resample('BM').agg(res)
dfm_ret = (dfm['Adj Close'] / dfm['Open'] - 1)
dfq = df.resample('Q').agg(res)
dfq_ret = (dfq['Adj Close'] / dfq['Open'] - 1)
dfs = df.resample('6M').agg(res)
dfs_ret = (dfs['Adj Close'] / dfs['Open'] - 1)
dfy = df.resample('Y').agg(res)
dfy_ret = (dfy['Adj Close'] / dfy['Open'] - 1)

print(dfw_ret)
print(dfm_ret)
print(dfq_ret)
print(dfs_ret)
print(dfy_ret)
This is what the original df prints:
Adj Close Open
AAPL AMZN AAPL AMZN
Date
2014-12-01 26.122288 326.000000 29.702499 338.119995
2014-12-02 26.022408 326.309998 28.375000 327.500000
2014-12-03 26.317518 316.500000 28.937500 325.730011
2014-12-04 26.217640 316.929993 28.942499 315.529999
2014-12-05 26.106400 312.630005 28.997499 316.799988
... ... ... ... ...
2020-12-24 131.549637 3172.689941 131.320007 3193.899902
2020-12-28 136.254608 3283.959961 133.990005 3194.000000
2020-12-29 134.440399 3322.000000 138.050003 3309.939941
2020-12-30 133.294067 3285.850098 135.580002 3341.000000
2020-12-31 132.267349 3256.929932 134.080002 3275.000000
And this is what the different df_ret variables print when I go from daily to weekly/monthly/etc. It can only do this for one stock, and the idea is to be able to do it for multiple stocks:
Date
2014-12-07 -0.075387
2014-12-14 -0.013641
2014-12-21 -0.029041
2014-12-28 0.023680
2015-01-04 0.002176
...
2020-12-06 -0.014306
2020-12-13 -0.012691
2020-12-20 0.018660
2020-12-27 -0.008537
2021-01-03 0.019703
Freq: W-SUN, Length: 318, dtype: float64
Date
2014-12-31 -0.082131
2015-01-30 0.134206
2015-02-27 0.086016
2015-03-31 -0.022975
2015-04-30 0.133512
...
2020-08-31 0.085034
2020-09-30 -0.097677
2020-10-30 -0.053569
2020-11-30 0.034719
2020-12-31 0.021461
Freq: BM, Length: 73, dtype: float64
Date
2014-12-31 -0.082131
2015-03-31 0.190415
2015-06-30 0.166595
2015-09-30 0.165108
2015-12-31 0.322681
2016-03-31 -0.095461
2016-06-30 0.211909
2016-09-30 0.167275
2016-12-31 -0.103026
2017-03-31 0.169701
2017-06-30 0.090090
2017-09-30 -0.011760
2017-12-31 0.213143
2018-03-31 0.234932
2018-06-30 0.199052
2018-09-30 0.190349
2018-12-31 -0.257182
2019-03-31 0.215363
2019-06-30 0.051952
2019-09-30 -0.097281
2019-12-31 0.058328
2020-03-31 0.039851
2020-06-30 0.427244
2020-09-30 0.141676
2020-12-31 0.015252
Freq: Q-DEC, dtype: float64
Date
2014-12-31 -0.082131
2015-06-30 0.388733
2015-12-31 0.538386
2016-06-30 0.090402
2016-12-31 0.045377
2017-06-30 0.277180
2017-12-31 0.202181
2018-06-30 0.450341
2018-12-31 -0.107405
2019-06-30 0.292404
2019-12-31 -0.039075
2020-06-30 0.471371
2020-12-31 0.180907
Freq: 6M, dtype: float64
Date
2014-12-31 -0.082131
2015-12-31 1.162295
2016-12-31 0.142589
2017-12-31 0.542999
2018-12-31 0.281544
2019-12-31 0.261152
2020-12-31 0.737029
Freq: A-DEC, dtype: float64
Without knowing what your df DataFrame looks like, I am assuming it is an issue with correctly handling the resampling on a MultiIndex, similar to the one discussed in this question.
The solution listed there is to use pd.Grouper with the freq and level parameters filled out correctly.
# This is just from the listed solution, so I am not sure if this is the correct level to choose
df.groupby(pd.Grouper(freq='W', level=-1))
If this doesn't work, I think you would need to provide some more detail or a dummy data set to reproduce the issue.
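Alternatively, judging from the df output you printed, the columns are a MultiIndex with the field ('Adj Close', 'Open') on the first level and the ticker on the second. In that case you can sidestep the dict-based agg entirely: select each field first, which leaves one plain column per ticker, and resample those. A minimal sketch, assuming that column layout:
# Assumes df has MultiIndex columns (field, ticker) as printed above.
opens = df['Open'].resample('W').first()       # period-opening price, one column per ticker
closes = df['Adj Close'].resample('W').last()  # period-closing price, one column per ticker
dfw_ret = closes / opens - 1                   # weekly returns, Date x ticker
print(dfw_ret)
The same pattern works for the 'BM', 'Q', '6M', and 'Y' frequencies.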
Related
Let's say I have an array of stock symbols: all = ['AAPL', 'TSLA', 'MSFT'].
I created a function to scrape their historical prices from yfinance.
def Scrapping(symbol):
    aapl = yf.Ticker(symbol)
    ainfo = aapl.history(period='1y')
    global castex_date
    castex_date = ainfo.index
    return ainfo.Close
all_Assets = list(map(Scrapping, all))
print(all_Assets)
This is an output sample:
[Date
2018-12-12 00:00:00-05:00 53.183998
2018-12-13 00:00:00-05:00 53.095001
Name: Close, Length: 1007, dtype: float64, Date
2018-12-12 00:00:00-05:00 24.440001
2018-12-13 00:00:00-05:00 25.119333
2022-12-08 00:00:00-05:00 247.399994
2022-12-09 00:00:00-05:00 245.419998
Name: Close, Length: 1007, dtype: float64]
The issue is that all these symbols' historical series have the same name, 'Close'. When putting all of these in a dataframe, we get:
df = pd.DataFrame(all_Assets)
df2 = df.transpose()
print(df2)
Close Close Close
Date
2018-12-12 00:00:00-05:00 53.183998 24.440001 104.500511
2018-12-13 00:00:00-05:00 53.095001 25.119333 104.854965
2018-12-14 00:00:00-05:00 52.105000 24.380667 101.578560
2018-12-17 00:00:00-05:00 50.826500 23.228001 98.570389
2018-12-18 00:00:00-05:00 51.435501 22.468666 99.605042
This creates an issue when plotting the DF.
I need these column names to be equal to the 'symbol' parameter of the function, so that, automatically, the column names would be AAPL, TSLA, MSFT.
You can use:
import pandas as pd
import yfinance as yf
all_symbols = ['AAPL', 'TSLA', 'MSFT']
def scrapping(symbol):
    ticker = yf.Ticker(symbol)
    data = ticker.history(period='1y')
    return data['Close'].rename(symbol)
all_assets = map(scrapping, all_symbols)
df = pd.concat(all_assets, axis=1)
One-liner version:
df = pd.concat({symbol: yf.Ticker(symbol).history(period='1y')['Close']
                for symbol in all_symbols}, axis=1)
Output:
>>> df
AAPL TSLA MSFT
Date
2022-02-14 167.863144 291.920013 292.261475
2022-02-15 171.749573 307.476654 297.680695
2022-02-16 171.511032 307.796661 297.333221
2022-02-17 167.863144 292.116669 288.626678
2022-02-18 166.292648 285.660004 285.846893
... ... ... ...
2023-02-08 151.688400 201.289993 266.730011
2023-02-09 150.639999 207.320007 263.619995
2023-02-10 151.009995 196.889999 263.100006
2023-02-13 153.850006 194.639999 271.320007
2023-02-14 152.949997 203.729996 272.790009
[252 rows x 3 columns]
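Since the mismatched names originally showed up when plotting, here is a quick check (assuming matplotlib is installed) that the named columns now produce a proper legend:
import matplotlib.pyplot as plt

df.plot(title='1y close prices')  # one line per ticker, legend taken from the column names
plt.show()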
I have 7 columns of data, indexed by datetime (30-minute frequency), starting 2017-05-31 and ending 2018-05-25. I want to plot the mean over specific date ranges (seasons). I have been trying groupby, but I can't group by a specific range; I get wrong results if I do df.groupby(df.date.dt.month).mean().
A few lines from the dataset (date range is from 2017-05-31 to 2018-05-25)
50 51 56 58
date
2017-05-31 00:00:00 200.213542 276.929198 242.879051 NaN
2017-05-31 00:30:00 200.215478 276.928229 242.879051 NaN
2017-05-31 01:00:00 200.215478 276.925324 242.878083 NaN
2017-06-01 01:00:00 200.221288 276.944691 242.827729 NaN
2017-06-01 01:30:00 200.221288 276.944691 242.827729 NaN
2017-08-31 09:00:00 206.961886 283.374453 245.041349 184.358250
2017-08-31 09:30:00 206.966727 283.377358 245.042317 184.360187
2017-12-31 09:00:00 212.925877 287.198416 247.455413 187.175144
2017-12-31 09:30:00 212.926846 287.196480 247.465097 187.179987
2018-03-31 23:00:00 213.304498 286.933093 246.469647 186.887548
2018-03-31 23:30:00 213.308369 286.938902 246.468678 186.891422
2018-04-30 23:00:00 215.496812 288.342024 247.522230 188.104749
2018-04-30 23:30:00 215.497781 288.340086 247.520294 188.103780
I have created these variables (these are the ranges I need):
increment_rates_winter = df['2017-08-30'].mean() - df['2017-06-01'].mean()
increment_rates_spring = df['2017-11-30'].mean() - df['2017-09-01'].mean()
increment_rates_summer = df['2018-02-28'].mean() - df['2017-12-01'].mean()
increment_rates_fall = df['2018-05-24'].mean() - df['2018-03-01'].mean()
Concatenated them:
df_seasons = pd.concat([increment_rates_winter, increment_rates_spring,
                        increment_rates_summer, increment_rates_fall], axis=1)
After plotting, I got the wrong result. What I've been trying to get is this:
df_seasons
Out[664]:
Winter Spring Summer Fall
50 6.697123 6.948447 -1.961549 7.662622
51 6.428329 4.760650 -2.188402 5.927087
52 5.580953 6.667529 1.136889 12.939295
53 6.406259 2.506279 -2.105125 6.964549
54 4.332826 3.678492 -2.574769 6.569398
56 2.222032 3.359607 -2.694863 5.348258
58 NaN 1.388535 -0.035889 4.213046
The seasons on the x-axis and the mean of each column plotted, where the season ranges are:
Winter = df['2017-06-01':'2017-08-30']
Spring = df['2017-09-01':'2017-11-30']
Summer = df['2017-12-01':'2018-02-28']
Fall = df['2018-03-01':'2018-05-30']
Thank you in advance!
We can select a specific date range in the following way; you can then define whatever ranges you want and take the mean:
import pandas as pd
df = pd.read_csv('test.csv')
df['date'] = pd.to_datetime(df['date'])
start_date = "2017-12-31 09:00:00"
end_date = "2018-04-30 23:00:00"
mask = (df['date'] > start_date) & (df['date'] <= end_date)
f_df = df.loc[mask]
This gives the output
date 50 ... 58
8 2017-12-31 09:30:00 212.926846 ... 187.179987 NaN
9 2018-03-31 23:00:00 213.304498 ... 186.887548 NaN
10 2018-03-31 23:30:00 213.308369 ... 186.891422 NaN
11 2018-04-30 23:00:00 215.496812 ... 188.104749 NaN
Hope this helps
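Building on that, here is a sketch that takes the mean of each column over the four seasonal ranges from the question and assembles the result into the desired df_seasons layout; it assumes df is indexed by a DatetimeIndex, and you can swap in whatever per-range computation you actually need:
# Season ranges taken from the question.
seasons = {
    'Winter': ('2017-06-01', '2017-08-30'),
    'Spring': ('2017-09-01', '2017-11-30'),
    'Summer': ('2017-12-01', '2018-02-28'),
    'Fall':   ('2018-03-01', '2018-05-30'),
}
# Mean of every column over each date range; the seasons become the columns.
df_seasons = pd.DataFrame({name: df.loc[start:end].mean()
                           for name, (start, end) in seasons.items()})
print(df_seasons)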
How about transposing it, so the seasons end up on the x-axis:
df_seasons.T.plot()
Output: one line per original column, with the seasons on the x-axis.
I have quite an interesting case.
There is df_1, with a time column containing fine-grained (2-second) data like this:
2018-08-31 22:59:47.980000+00:00 41.77
2018-08-31 22:59:49.979000+00:00 42.76
2018-08-31 22:59:51.979000+00:00 40.86
2018-08-31 22:59:53.979000+00:00 41.83
2018-08-31 22:59:55.979000+00:00 41.73
2018-08-31 22:59:57.979000+00:00 42.71
There is also df_2, with labels for this data and a time column on an hourly basis:
2018-08-31 22:00:00 0.0
2018-08-31 23:00:00 1.0
2018-09-01 00:00:00 0.0
2018-09-01 01:00:00 1.0
2018-09-01 02:00:00 0.0
I would like to merge df_1 with df_2 so that each time from df_1 falls between two consecutive time rows in df_2 (within one hour, to assign the label). If I had two time columns in df_2 (like startTime and endTime), I would use pandasql:
import pandasql as ps
sqlcode = '''
select *
from df_1
inner join df_2 on df_1.time >= df_2.startTime and df_1.time <= df_2.endTime
'''
newdf = ps.sqldf(sqlcode, locals())
But in this case I only have one column. Is there any way to solve this problem in Pandas?
This is a pd.merge_asof problem. I create a keydate copy of the dates in df_2 in order to show which date from df_2 each row merges with:
# df1.Date = pd.to_datetime(df1.Date)
# df2.Date = pd.to_datetime(df2.Date)
yourdf = pd.merge_asof(df1, df2.assign(keydate=df2.Date), on='Date', direction='forward')
yourdf
Date ... keydate
0 2018-08-31 22:59:47.980 ... 2018-08-31 23:00:00
1 2018-08-31 22:59:49.979 ... 2018-08-31 23:00:00
2 2018-08-31 22:59:51.979 ... 2018-08-31 23:00:00
3 2018-08-31 22:59:53.979 ... 2018-08-31 23:00:00
4 2018-08-31 22:59:55.979 ... 2018-08-31 23:00:00
5 2018-08-31 22:59:57.979 ... 2018-08-31 23:00:00
[6 rows x 4 columns]
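Note that direction='forward' assigns each df_1 row the date of the next hourly row (22:59:47 matches 2018-08-31 23:00:00, as shown above). If the label should instead come from the hour each timestamp falls in, a backward merge on the same assumed frames would do it:
# 22:59:47 now matches the 22:00:00 row instead of 23:00:00.
yourdf = pd.merge_asof(df1, df2.assign(keydate=df2.Date),
                       on='Date', direction='backward')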
I solved the problem with a workaround that splits the time into separate date and hour columns. Maybe not too fancy, but it gets the job done and is pretty straightforward:
import pandasql as ps
df_1['date'] = [d.date() for d in df_1['time']]
df_1['time'] = df_1['time'].dt.round('H').dt.hour
df_2['date'] = [d.date() for d in df_2['time']]
df_2['time'] = df_2['time'].dt.round('H').dt.hour
sqlcode = '''
select *
from df_1
inner join df_2 on df_1.time=df_2.time and df_1.date=df_2.date
'''
newdf = ps.sqldf(sqlcode, locals())
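For comparison, a pure-pandas sketch of the same idea, starting again from the original frames and assuming df_2's time column holds the hourly label timestamps: round each df_1 timestamp to the nearest hour (as the workaround above does) and join on that single datetime key, with no date/hour split:
# Round 22:59:47 up to 23:00:00, then join on the rounded key.
df_1['key'] = df_1['time'].dt.round('H')
newdf = df_1.merge(df_2.rename(columns={'time': 'key'}), on='key')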
I created a series of all the business days in 2016 and then assigned a random number to each date:
Created a datetime index for the year 2016:
import numpy as np
import pandas as pd

df = pd.bdate_range('2016-01-01', '2016-12-31')
Output:
DatetimeIndex(['2016-01-01', '2016-01-04', '2016-01-05', '2016-01-06',
'2016-01-07', '2016-01-08', '2016-01-11', '2016-01-12',
'2016-01-13', '2016-01-14',
...
'2016-12-19', '2016-12-20', '2016-12-21', '2016-12-22',
'2016-12-23', '2016-12-26', '2016-12-27', '2016-12-28',
'2016-12-29', '2016-12-30'],
dtype='datetime64[ns]', length=261, freq='B')
Created a Series of random values indexed by those dates:
s = pd.Series(np.random.randn(len(df)), index=df)
Output:
2016-01-01 0.430445
2016-01-04 -0.378483
2016-01-05 0.410059
2016-01-06 2.276409
2016-01-07 1.102603
2016-01-08 -0.339722
2016-01-11 0.542110
2016-01-12 -0.898154
......
2016-12-28 -0.952172
2016-12-29 -1.522073
2016-12-30 -1.065957
I would like to get the sum of the values that fall on a Tuesday, and also the mean of the values for each month.
Problem 1: Sum of Tuesday values
Use dayofweek and index where dayofweek == 1 (which represents Tuesday):
s[s.index.dayofweek == 1].sum()
# Output:
2.1416224135016124
Problem 2: Mean by month
Use groupby with pd.Grouper(freq='M'):
s.groupby(pd.Grouper(freq='M')).mean()
# Output:
2016-01-31 0.072559
2016-02-29 0.009706
2016-03-31 0.118553
2016-04-30 -0.228017
2016-05-31 0.132211
2016-06-30 -0.188015
2016-07-31 0.008239
2016-08-31 -0.181972
2016-09-30 0.554330
2016-10-31 -0.293271
2016-11-30 -0.092587
2016-12-31 -0.268706
Freq: M, dtype: float64
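For the monthly means, resample is an equivalent and slightly more direct spelling, and day_name() can replace the dayofweek == 1 check if you find it more readable:
monthly_means = s.resample('M').mean()                  # same result as the Grouper version
tuesday_sum = s[s.index.day_name() == 'Tuesday'].sum()  # same result as dayofweek == 1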
I'm looking for an equivalent specification to W-MON (weekly, ending Monday) for monthly data.
Specifically, I have a pandas data frame of daily data, and I want to only take monthly observations, starting with the most recent date and going back monthly.
So if today is 17/06/2016, my date index would be 17/06/2016, 17/05/2016, 17/04/2016... etc.
Right now I can only find month-start and month-end as specifications for df.asfreq().
Thanks.
You can create the relevant dates using relativedelta and select them with .loc[]:
from datetime import date, datetime
from dateutil.relativedelta import relativedelta
from pandas_datareader.data import DataReader
Using daily sample data:
stock_data = (DataReader('FB', 'yahoo', datetime(2013, 1, 1), datetime.today())
              .resample('D').fillna(method='ffill')['Open'])
and a month-end date, to show how relativedelta handles that edge case:
today = date(2016, 1, 31)
Create the sequence of dates:
n_months = 30
dates = [today - relativedelta(years=m // 12, months=m % 12) for m in range(n_months)]
to get:
stock_data.loc[dates]
Date
2016-01-31 108.989998
2015-12-31 106.000000
2015-11-30 105.839996
2015-10-31 104.510002
2015-09-30 88.440002
2015-08-31 90.599998
2015-07-31 94.949997
2015-06-30 86.599998
2015-05-31 79.949997
2015-04-30 80.010002
2015-03-31 82.900002
2015-02-28 80.680000
2015-01-31 78.000000
2014-12-31 79.540001
2014-11-30 77.669998
2014-10-31 74.930000
2014-09-30 79.349998
2014-08-31 74.300003
2014-07-31 74.000000
2014-06-30 67.459999
2014-05-31 63.950001
2014-04-30 57.580002
2014-03-31 60.779999
2014-02-28 69.470001
2014-01-31 60.470001
2013-12-31 54.119999
2013-11-30 46.750000
2013-10-31 47.160000
2013-09-30 50.139999
2013-08-31 42.020000
Name: Open, dtype: float64
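For reference, a minimal standalone sketch of just the date arithmetic, showing how relativedelta clamps a month-end anchor when the target month is shorter:
from datetime import date
from dateutil.relativedelta import relativedelta

today = date(2016, 1, 31)
print(today - relativedelta(months=4))   # 2015-09-30: day 31 clamped to September's last day
print(today - relativedelta(months=11))  # 2015-02-28: clamped to February's last day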