Pandas column multi-index to rows - python

I'm using yfinance to download the price history for multiple symbols, which returns a dataframe with MultiIndex columns (the attribute on the first level, the symbol on the second). For example:
import yfinance as yf
df = yf.download(tickers = ['AAPL', 'MSFT'], period = '2d')
A similar dataframe could be constructed without yfinance like:
import pandas as pd
pd.options.display.float_format = '{:.2f}'.format
import numpy as np
attributes = ['Adj Close', 'Close', 'High', 'Low', 'Open', 'Volume']
symbols = ['AAPL', 'MSFT']
dates = ['2020-07-23', '2020-07-24']
data = [[[371.38, 202.54], [371.38, 202.54], [388.31, 210.92], [368.04, 202.15], [387.99, 207.19], [49251100, 67457000]],
[[370.46, 201.30], [370.46, 201.30], [371.88, 202.86], [356.58, 197.51 ], [363.95, 200.42], [46323800, 39799500]]]
data = np.array(data).reshape(len(dates), len(symbols) * len(attributes))
cols = pd.MultiIndex.from_product([attributes, symbols])
df = pd.DataFrame(data, index=dates, columns=cols)
df
Output:
Adj Close Close High Low Open Volume
AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT AAPL MSFT
2020-07-23 371.38 202.54 371.38 202.54 388.31 210.92 368.04 202.15 387.99 207.19 49251100.0 67457000.0
2020-07-24 370.46 201.30 370.46 201.30 371.88 202.86 356.58 197.51 363.95 200.42 46323800.0 39799500.0
Once I have this dataframe, I want to restructure it so that I have a row for each symbol and date. I'm currently doing this by looping through a list of symbols and calling the API once each time, and appending the results. I'm sure there must be a more efficient way:
df = pd.DataFrame()
symbols = ['AAPL', 'MSFT']
for x in range(0, len(symbols)):
    symbol = symbols[x]
    result = yf.download(tickers = symbol, start = '2020-07-23', end = '2020-07-25')
    result.insert(0, 'symbol', symbol)
    df = pd.concat([df, result])
Example of the desired output:
df
symbol Open High Low Close Adj Close Volume
Date
2020-07-23 AAPL 387.989990 388.309998 368.040009 371.380005 371.380005 49251100
2020-07-24 AAPL 363.950012 371.880005 356.579987 370.459991 370.459991 46323800
2020-07-23 MSFT 207.190002 210.919998 202.149994 202.539993 202.539993 67457000
2020-07-24 MSFT 200.419998 202.860001 197.509995 201.300003 201.300003 39799500

This looks like a simple stacking operation: stack the symbol level (level 1 of the columns) into the index, then move it out into a regular column:
df = yf.download(tickers = ['AAPL', 'MSFT'], period = '2d') # Get your data
df.stack(level=1).rename_axis(['Date', 'symbol']).reset_index(level=1)
Output:
symbol Adj Close ... Open Volume
Date ...
2020-07-23 AAPL 371.380005 ... 387.989990 49251100
2020-07-23 MSFT 202.539993 ... 207.190002 67457000
2020-07-24 AAPL 370.459991 ... 363.950012 46323800
2020-07-24 MSFT 201.300003 ... 200.419998 39799500
[4 rows x 7 columns]
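The same reshaping can be checked offline. The following is a minimal sketch that assumes a frame shaped like the one built in the question, reduced to two attributes; the names `wide` and `long_df` are only illustrative.

```python
import numpy as np
import pandas as pd

# Small stand-in for the yfinance result: attributes on column level 0,
# symbols on column level 1 (values taken from the question).
attributes = ["Close", "Open"]
symbols = ["AAPL", "MSFT"]
dates = pd.to_datetime(["2020-07-23", "2020-07-24"])
cols = pd.MultiIndex.from_product([attributes, symbols])
data = np.array([[371.38, 202.54, 387.99, 207.19],
                 [370.46, 201.30, 363.95, 200.42]])
wide = pd.DataFrame(data, index=dates, columns=cols)

# Move the symbol level from the columns into the index, name both
# index levels, then turn the symbol level into an ordinary column.
long_df = (
    wide.stack(level=1)
        .rename_axis(["Date", "symbol"])
        .reset_index(level=1)
)
print(long_df)
```

Each (date, symbol) pair becomes one row, so two dates and two symbols give four rows. Recent pandas versions may emit a FutureWarning about the stack implementation; the call itself is unchanged.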

Related

Match df['items'] values against df1 columns and store the matched values in a new dataframe out

I tried the code below. The values match, but the result is stored under the same column, i.e. items. I converted the df1 columns into variables and matched them against df, but I want the result in a new dataframe, out:
for ticker in tickers:
    dfs = df1[df1["ticker"] == ticker]
    asset = str(int(dfs['asset'].values[0]))
    debt = str(int(dfs['debt'].values[0]))
    debtc = str(int(dfs['debtc'].values[0]))
    data = [ticker, asset, debt, debtc]
    def checkIfValuesExists1(df, ticker):
        for ele in ticker:
            if ele in df['items'].values:
                out[ele] = data
        return out
    out = checkIfValuesExists1(df, data)
My current output is:
out:
ticker  items
AAPL    4564
MSFT    7778
GOOGL   7654
df:
ticker  items
AAPL    4564
MSFT    7778
GOOGL   7654
df1:
ticker  asset  debt  debtc
AAPL    4564   9674  9755
MSFT    4477   7778  6545
GOOGL   5675   5535  7654
Expected output, i.e. out:
ticker  asset  debt  debtc
AAPL    4564
MSFT           7778
GOOGL                7654
Use eq to test for equality in a vectorised way, and use where to replace the non-matching values with NaN.
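import pandas as pd

# df holds the values to look for, indexed by ticker
df = pd.Series([4564, 7778, 7654], index=["AAPL", "MSFT", "GOOGL"])
df1 = pd.DataFrame(
    {
        "asset": [4564, 4477, 5675],
        "debt": [9674, 7778, 5535],
        "debtc": [9755, 6545, 7654],
    },
    index=["AAPL", "MSFT", "GOOGL"],
)
# keep a cell only where it equals the value for that row's ticker
df1.where(df1.eq(df, axis=0))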
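The answer above builds its inputs with the ticker already in the index. A minimal sketch of the same idea starting from the column layout shown in the question (a ticker column in both frames) might look like this; the intermediate names `items` and `values` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ticker": ["AAPL", "MSFT", "GOOGL"],
                   "items": [4564, 7778, 7654]})
df1 = pd.DataFrame({"ticker": ["AAPL", "MSFT", "GOOGL"],
                    "asset": [4564, 4477, 5675],
                    "debt": [9674, 7778, 5535],
                    "debtc": [9755, 6545, 7654]})

# Index both sides by ticker so the comparison aligns row by row.
items = df.set_index("ticker")["items"]
values = df1.set_index("ticker")

# eq(..., axis=0) compares every column against the ticker's item value;
# where() keeps matching cells and fills the rest with NaN.
out = values.where(values.eq(items, axis=0)).reset_index()
print(out)
```

The non-matching cells become NaN, so the numeric columns come back as floats.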

How to append to another df from inside a for loop

How can you append to an existing df from inside a for loop? For example:
import pandas as pd
from pandas_datareader import data as web
stocks = ['amc', 'aapl']
colnames = ['Datetime', 'Open', 'High', 'Low', 'Close', 'Adj Close', 'Volume', 'Name']
df1 = pd.DataFrame(data=None, columns=colnames)
for stock in stocks:
    df = web.DataReader(stock, 'yahoo')
    df['Name'] = stock
What should I do next so that df is appended to df1?
You could try pandas.concat():
df1 = pd.DataFrame(data=None, columns=colnames)
for stock in stocks:
    df = web.DataReader(stock, 'yahoo')
    df['Name'] = stock
    df1 = pd.concat([df1, df], ignore_index=True)
Instead of concatenating the dataframes on every iteration, you could also append each dataframe to a list and concatenate once at the end:
dfs = []
for stock in stocks:
    df = web.DataReader(stock, 'yahoo')
    df['Name'] = stock
    dfs.append(df)
df_ = pd.concat(dfs, ignore_index=True)
print(df_)
High Low Open Close Volume Adj Close Name
0 32.049999 31.549999 31.900000 31.549999 1867000.0 24.759203 amc
1 31.799999 30.879999 31.750000 31.000000 1225900.0 24.327585 amc
2 31.000000 30.350000 30.950001 30.799999 932100.0 24.170631 amc
3 30.900000 30.250000 30.700001 30.350000 1099000.0 23.817492 amc
4 30.700001 30.100000 30.549999 30.650000 782500.0 24.052916 amc
... ... ... ... ... ... ... ...
2515 179.009995 176.339996 176.690002 178.960007 100589400.0 178.960007 aapl
2516 179.610001 176.699997 178.550003 177.770004 92633200.0 177.770004 aapl
2517 178.029999 174.399994 177.839996 174.610001 103049300.0 174.610001 aapl
2518 174.880005 171.940002 174.029999 174.309998 78699800.0 174.309998 aapl
2519 174.880005 171.940002 174.029999 174.309998 78751328.0 174.309998 aapl
[2520 rows x 7 columns]
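As a variation on the list-based approach, pd.concat also accepts a mapping, in which case the keys become an extra index level that can be turned into the Name column. The sketch below uses two small hand-made frames with placeholder values in place of the DataReader calls, so both the frames and their numbers are assumptions for illustration only.

```python
import pandas as pd

# Placeholder frames standing in for web.DataReader(stock, 'yahoo');
# the prices and volumes are made up.
dates = pd.to_datetime(["2022-03-31", "2022-04-01"])
frames = {
    "amc": pd.DataFrame({"Close": [24.0, 23.0], "Volume": [100, 200]}, index=dates),
    "aapl": pd.DataFrame({"Close": [174.0, 175.0], "Volume": [300, 400]}, index=dates),
}

# The dict keys become the outer index level "Name"; reset_index then
# moves that level into a regular column and keeps the dates as index.
combined = pd.concat(frames, names=["Name", "Date"]).reset_index(level="Name")
print(combined)
```

Unlike ignore_index=True, this keeps the dates as the index, which may or may not be what you want.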
What you're trying to do won't quite work, since the data retrieved by DataReader has several columns and you need that data for several stocks. However, each of those columns is a time series.
So what you probably want is something that looks like this:
Stock amc
Field High Low Open ...
2022-03-30 29.230000 25.350000 ...
2022-03-31 25.920000 23.260000 ...
2022-04-01 25.280001 22.340000 ...
2022-04-01 25.280001 22.340000 ...
...
You could then use df[('amc', 'Low')] to get the time series for that stock, or df[('amc', 'Low')]['2022-04-01'][0] to get the 'Low' value for 'amc' on April 1st.
This gets you exactly that:
import pandas as pd
from pandas_datareader import data as web
stocks = ['amc', 'aapl']
df = pd.DataFrame()
for stock_name in stocks:
    stock_df = web.DataReader(stock_name, data_source='yahoo')
    for col in stock_df:
        df[(stock_name, col)] = stock_df[col]
df.columns = pd.MultiIndex.from_tuples(df.columns, names=['Stock', 'Field'])
print(f'\nall data:\n{"-"*40}\n', df)
print(f'\none series:\n{"-"*40}\n', df[('aapl', 'Volume')])
print(f'\nsingle value:\n{"-"*40}\n', df[('amc', 'Low')]['2022-04-01'][0])
The solution uses a MultiIndex to achieve what you need. It first loads all the data as retrieved from the API into columns labeled with tuples of stock name and field, and it then converts that into a proper MultiIndex after loading completes.
Output:
all data:
----------------------------------------
Stock amc ... aapl
Field High Low ... Volume Adj Close
Date ...
2017-04-04 32.049999 31.549999 ... 79565600.0 34.171505
2017-04-05 31.799999 30.879999 ... 110871600.0 33.994480
2017-04-06 31.000000 30.350000 ... 84596000.0 33.909496
2017-04-07 30.900000 30.250000 ... 66688800.0 33.833969
2017-04-10 30.700001 30.100000 ... 75733600.0 33.793839
... ... ... ... ... ...
2022-03-29 34.330002 26.410000 ... 100589400.0 178.960007
2022-03-30 29.230000 25.350000 ... 92633200.0 177.770004
2022-03-31 25.920000 23.260000 ... 103049300.0 174.610001
2022-04-01 25.280001 22.340000 ... 78699800.0 174.309998
2022-04-01 25.280001 22.340000 ... 78751328.0 174.309998
[1260 rows x 12 columns]
one series:
----------------------------------------
Date
2017-04-04 79565600.0
2017-04-05 110871600.0
2017-04-06 84596000.0
2017-04-07 66688800.0
2017-04-10 75733600.0
...
2022-03-29 100589400.0
2022-03-30 92633200.0
2022-03-31 103049300.0
2022-04-01 78699800.0
2022-04-01 78751328.0
Name: (aapl, Volume), Length: 1260, dtype: float64
single value:
----------------------------------------
22.34000015258789
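A similar two-level column layout can also be produced in one step with pd.concat along the column axis. This is a minimal sketch under the assumption that every per-stock frame shares the same date index; the frames below are small hand-made stand-ins for the DataReader results, not the real download.

```python
import pandas as pd

# Stand-ins for the per-stock frames returned by the data reader.
idx = pd.to_datetime(["2022-03-31", "2022-04-01"])
amc = pd.DataFrame({"High": [25.92, 25.28], "Low": [23.26, 22.34]}, index=idx)
aapl = pd.DataFrame({"High": [178.03, 174.88], "Low": [174.40, 171.94]}, index=idx)

# axis=1 places the frames side by side; the dict keys become the outer
# column level, and names labels both column levels.
wide = pd.concat({"amc": amc, "aapl": aapl}, axis=1, names=["Stock", "Field"])

# A single value can be read with one .loc call: row label, column tuple.
low = wide.loc[pd.Timestamp("2022-04-01"), ("amc", "Low")]
print(wide)
print(low)
```

Reading a value through a single .loc call avoids the chained df[...][...][0] lookup, which depends on positional indexing of the resulting Series.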

Download multiple stocks with pandas yahoo finance datareader and putting them in a DataFrame

Hi guys, I would like to download multiple stocks from Yahoo Finance using pandas.
But I only need to keep the "Adj Close" column for each stock.
I would also like to create a DataFrame from all these "Adj Close" columns, with each column named after its stock ticker.
I tried the code below, but I'm stuck.
import numpy as np
import pandas as pd
from datetime import datetime
import pandas_datareader.data as web
stocks = ['ORCL', 'TSLA', 'IBM','YELP', 'MSFT']
ls_key = 'Adj Close'
start = datetime(2014,1,1)
end = datetime(2015,1,1)
f = web.DataReader(stocks, 'yahoo',start,end)
f
I hope someone can help me.
df = f[[("Adj Close", s) for s in stocks]]
df.columns = df.columns.droplevel(level=0)
df
>>
Symbols ORCL TSLA IBM YELP MSFT
Date
2014-01-02 33.703285 30.020000 137.696884 67.919998 31.983477
2014-01-03 33.613930 29.912001 138.520721 67.660004 31.768301
2014-01-06 33.479893 29.400000 138.045746 71.720001 31.096956
2014-01-07 33.819431 29.872000 140.799240 72.660004 31.337952
2014-01-08 33.703274 30.256001 139.507858 78.419998 30.778502
... ... ... ... ...
2014-12-24 41.679443 44.452000 123.015839 53.000000 42.568497
2014-12-26 41.562233 45.563999 123.411110 52.939999 42.338593
2014-12-29 41.120468 45.141998 122.019974 53.009998 41.958347
2014-12-30 40.877041 44.445999 121.670265 54.240002 41.578117
2014-12-31 40.543465 44.481998 121.966751 54.730000 41.074078
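When the columns are a two-level MultiIndex with the attribute on the first level, selecting the top-level key is usually enough to get one column per ticker. The sketch below is built on a small synthetic frame rather than a real download; the level name "Attributes" and the "Close" values are assumptions made for illustration.

```python
import pandas as pd

# Synthetic frame with the same column layout as the downloaded data:
# attribute on level 0, ticker symbol on level 1.
cols = pd.MultiIndex.from_product(
    [["Adj Close", "Close"], ["ORCL", "TSLA"]],
    names=["Attributes", "Symbols"],
)
f = pd.DataFrame(
    [[33.70, 30.02, 38.00, 30.02],
     [33.61, 29.91, 37.90, 29.91]],
    index=pd.to_datetime(["2014-01-02", "2014-01-03"]),
    columns=cols,
)

# Selecting a top-level key returns the sub-frame and drops that level,
# leaving plain ticker columns.
adj_close = f["Adj Close"]
print(adj_close)
```

This keeps the tickers in whatever order they have in the original columns, whereas the list comprehension in the answer lets you choose the order explicitly.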

Rolling mean returns over DataFrame

I want to add columns to the following DataFrame containing the 5-year (60-month) rolling returns for each stock. The following code obtains the financial data for the period 1995 to 2010.
quandl.ApiConfig.api_key = 'Enter Key'
stocks = ['MSFT', 'AAPL', 'WMT', 'GE', 'KO']
stockdata = quandl.get_table('WIKI/PRICES', ticker = stocks, paginate=True,
                             qopts = { 'columns': ['date', 'ticker', 'adj_close'] },
                             date = { 'gte': '1995-1-1', 'lte': '2010-12-31' })
# Setting date as index with columns of tickers and adjusted closing price
df = stockdata.pivot(index = 'date',columns='ticker')
df.index = pd.to_datetime(df.index)
df.resample('1M').mean()
df = df.pct_change()
df.head()
Out[1]:
rets
ticker AAPL BA F GE JNJ KO
date
1995-01-03 NaN NaN NaN NaN NaN NaN
1995-01-04 0.026055 -0.002567 0.026911 0.000000 0.006972 -0.019369
1995-01-05 -0.012697 0.002573 -0.008735 0.002549 -0.002369 -0.004938
1995-01-06 0.080247 0.018824 0.000000 -0.004889 -0.006758 0.000000
1995-01-09 -0.019048 0.000000 0.017624 -0.009827 -0.011585 -0.014887
df.tail()
Out[2]:
rets
ticker AAPL BA F GE JNJ KO
date
2010-12-27 0.003337 -0.004765 0.005364 0.008315 -0.005141 -0.007777
2010-12-28 0.002433 0.001699 -0.008299 0.007147 0.001938 0.004457
2010-12-29 -0.000553 0.002929 0.000598 -0.002729 0.001289 0.001377
2010-12-30 -0.005011 -0.000615 -0.002987 -0.004379 -0.003058 0.000764
2010-12-31 -0.003399 0.003846 0.005992 0.005498 -0.001453 0.004122
Any assistance of how to do this would be awesome!
The problem is the multi-level index in the columns. Start by selecting the top-level 'rets' key, which leaves a frame with one column per ticker; after that the rolling mean works:
means = df['rets'].rolling(60).mean()
means.tail()
Gives:
The error you are receiving occurs because you are passing the entire dataframe into the rolling function while your frame uses a multi-index. You can't pass a multi-index frame to a rolling function, since rolling only accepts numpy arrays of one column. You'll probably have to write a for loop and return the values individually per ticker.
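The select-then-roll approach from the first answer can be tried on synthetic data. The following is a minimal sketch assuming a frame with a single top-level 'rets' key; the window of 3 (the question would use 60), the made-up return values and the "_rolling" suffix are all choices made here for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic returns with the same column layout as the question:
# top level 'rets', second level 'ticker'.
dates = pd.date_range("2000-01-31", periods=6, freq="D")
cols = pd.MultiIndex.from_product([["rets"], ["AAPL", "KO"]], names=[None, "ticker"])
df = pd.DataFrame(np.arange(12, dtype=float).reshape(6, 2) / 100,
                  index=dates, columns=cols)

window = 3  # stand-in for the 60-period window in the question

# Selecting 'rets' leaves one plain column per ticker; rolling then
# computes the mean column by column.
rets = df["rets"]
means = rets.rolling(window).mean()

# Put the rolling means next to the original returns as extra columns.
out = rets.join(means.add_suffix("_rolling"))
print(out)
```

The first window - 1 rows of each rolling column are NaN because a full window is not yet available.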

Updating pandas DataFrame by key

I have a dataframe of historical stock trades. The frame has columns like ['ticker', 'date', 'cusip', 'profit', 'security_type']. Initially:
trades['cusip'] = np.nan
trades['security_type'] = np.nan
I have historical config files that I can load into frames that have columns like ['ticker', 'cusip', 'date', 'name', 'security_type', 'primary_exchange'].
I would like to UPDATE the trades frame with the cusip and security_type from config, but only where the ticker and date match.
I thought I could do something like:
pd.merge(trades, config, on=['ticker', 'date'], how='left')
But that doesn't update the columns, it just adds the config columns to trades.
The following works, but I think there has to be a better way. If not, I will probably do it outside of pandas.
for date in trades['date'].unique():
    config = get_config_file_as_df(date)
    ## config['date'] == date
    for ticker in trades['ticker'][trades['date'] == date]:
        trades['cusip'][
            (trades['ticker'] == ticker)
            & (trades['date'] == date)
        ] \
            = config['cusip'][config['ticker'] == ticker].values[0]
        trades['security_type'][
            (trades['ticker'] == ticker)
            & (trades['date'] == date)
        ] \
            = config['security_type'][config['ticker'] == ticker].values[0]
Suppose you have this setup:
import pandas as pd
import numpy as np
import datetime as DT
nan = np.nan
trades = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date' : pd.date_range('1/1/2000', periods = 4),
                       'cusip' : [nan, nan, 100, nan]
                       })
trades = trades.set_index(['ticker', 'date'])
print(trades)
#                    cusip
# ticker date
# IBM    2000-01-01    NaN
# MSFT   2000-01-02    NaN
# GOOG   2000-01-03    100  # <-- We do not want to overwrite this
# AAPL   2000-01-04    NaN
config = pd.DataFrame({'ticker' : ['IBM', 'MSFT', 'GOOG', 'AAPL'],
                       'date' : pd.date_range('1/1/2000', periods = 4),
                       'cusip' : [1,2,3,nan]})
config = config.set_index(['ticker', 'date'])
# Let's permute the index to show `DataFrame.update` correctly matches rows based on the index, not on the order of the rows.
new_index = sorted(config.index)
config = config.reindex(new_index)
print(config)
#                    cusip
# ticker date
# AAPL   2000-01-04    NaN
# GOOG   2000-01-03      3
# IBM    2000-01-01      1
# MSFT   2000-01-02      2
Then you can update NaN values in trades with values from config using the DataFrame.update method. Note that DataFrame.update matches rows based on indices (which is why set_index was called above).
trades.update(config, join = 'left', overwrite = False)
print(trades)
#                    cusip
# ticker date
# IBM    2000-01-01      1
# MSFT   2000-01-02      2
# GOOG   2000-01-03    100  # If overwrite = True, then 100 is overwritten by 3.
# AAPL   2000-01-04    NaN
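If trades needs to keep its flat column layout, as in the question, the index can be set only for the duration of the update and restored afterwards. The sketch below assumes small hand-made frames; the profit and name values are placeholders, and cusip is kept numeric so the column dtype does not change during the update.

```python
import numpy as np
import pandas as pd

keys = ["ticker", "date"]

trades = pd.DataFrame({
    "ticker": ["IBM", "MSFT", "GOOG"],
    "date": pd.to_datetime(["2000-01-01", "2000-01-02", "2000-01-03"]),
    "profit": [10.0, -5.0, 2.5],        # placeholder values
    "cusip": [np.nan, np.nan, 100.0],   # 100.0 must survive the update
})
config = pd.DataFrame({
    "ticker": ["GOOG", "IBM", "MSFT"],  # deliberately in a different order
    "date": pd.to_datetime(["2000-01-03", "2000-01-01", "2000-01-02"]),
    "cusip": [3.0, 1.0, 2.0],
    "name": ["Google", "IBM Corp", "Microsoft"],  # extra column, not used
})

# Index both frames by the matching keys, fill only the NaN cells of
# trades from config, then restore the flat layout.
trades = trades.set_index(keys)
trades.update(config.set_index(keys)[["cusip"]], overwrite=False)
trades = trades.reset_index()
print(trades)
```

Only the columns passed to update are touched, so profit and any other trade columns are left as they were.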
