pandas-ta with multiindex dataframe - python

I want to use pandas-ta.
Although most aspects of this library make technical analysis easier, I can only make it work on single-ticker dataframes.
I would like to figure out how to get pandas-ta to work across multiple tickers in a multiindex dataframe.
I get the data using the following, where stocks comes from a csv list:
df = yf.download(stocks, '2021-1-1', interval='1d')
The pandas-ta download method below only creates a single-ticker dataframe, and only fetches the first ticker when given the stocks list.
df.ta.ticker('GOOG', period = '1y', interval = "1h")
My current dataframe looks something like this (the list of tickers will change):
Adj Close Close High Low Open Volume
BTC-USD ETH-USD BTC-USD ETH-USD BTC-USD ETH-USD BTC-USD ETH-USD BTC-USD ETH-USD BTC-USD ETH-USD
Date
2020-12-31 29001.720703 737.803406 29001.720703 737.803406 29244.876953 754.299438 28201.992188 726.511902 28841.574219 751.626648 46754964848 13926846861
2021-01-01 29374.152344 730.367554 29374.152344 730.367554 29600.626953 749.201843 28803.585938 719.792236 28994.009766 737.708374 40730301359 13652004358
2021-01-02 32127.267578 774.534973 32127.267578 774.534973 33155.117188 786.798462 29091.181641 718.109497 29376.455078 730.402649 67865420765 19740771179
2021-01-03 32782.023438 975.507690 32782.023438 975.507690 34608.558594 1006.565002 32052.316406 771.561646 32129.408203 774.511841 78665235202 45200463368
2021-01-04 31971.914062 1040.233032 31971.914062 1040.233032 33440.218750 1153.189209 28722.755859 912.305359 32810.949219 977.058838 81163475344 56945985763
When I try to apply a pandas-ta function such as:
df[stocks] = data[stocks].ta.sma(length=10)
I get this error:
AttributeError: 'Series' object has no attribute 'ta'
When I use the documentation standard method
sma10 = ta.sma(df["Close"], length=10)
I don't know how to target the ticker-specific 'Close' columns, e.g. ('Close', 'BTC-USD'), for all tickers in the .csv list.
In both examples pandas-ta's sma uses the 'close' values, but I'm hoping to be able to apply all pandas-ta methods to a multiindex.
I can download 'Close'-only data with:
data = yf.download(stocks, '2021-1-1', interval='1d')['Close']
However, the columns will then be the ticker names containing the close data, and I still have the same issue of pandas-ta trying to find a 'close' column.
I don't know how to make pandas-ta function over multiple tickers in the same dataframe.
Is there a solution to this?
Thanks for any help!

Since each column of the MultiIndex is a tuple, you can work with the wide-format dataframe by addressing columns in tuple form with .loc, etc. Two kinds of technical indicator are added in a loop, and the columns are reordered as a last step. If you need indicators based on more than just the closing price, extend the loop in the same way.
import pandas as pd
import pandas_ta as ta
import yfinance as yf

stocks = 'BTC-USD ETH-USD XRP-USD XEM-USD'
df = yf.download(stocks, '2021-1-1', interval='1d')

technicals = ['sma10', 'sma25', 'vwma']
tickers = stocks.split(' ')
for ticker in tickers:
    for t in technicals:
        if t[:3] == 'sma':  # t[:2] would only yield 'sm' and never match
            l = int(t[3:])
            df[(t, ticker)] = ta.sma(df.loc[:, ('Close', ticker)], length=l)
        else:
            df[(t, ticker)] = ta.vwma(df.loc[:, ('Close', ticker)],
                                      df.loc[:, ('Volume', ticker)])
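As an illustration of that final reordering step, here is a minimal, self-contained sketch with made-up data and a plain rolling mean standing in for ta.sma; sorting the column MultiIndex groups each indicator's columns together:

```python
import numpy as np
import pandas as pd

# Minimal stand-in for the yfinance multi-index frame (hypothetical prices).
idx = pd.date_range('2021-01-01', periods=5)
cols = pd.MultiIndex.from_product([['Close', 'Volume'], ['BTC-USD', 'ETH-USD']])
df = pd.DataFrame(np.arange(20.0).reshape(5, 4), index=idx, columns=cols)

# Add one indicator column per ticker (rolling mean stands in for ta.sma),
# then sort the column MultiIndex so related columns sit next to each other.
for ticker in ['BTC-USD', 'ETH-USD']:
    df[('sma2', ticker)] = df[('Close', ticker)].rolling(2).mean()
df = df.sort_index(axis=1)
```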

Related

Python: inserting pandas dataframe data into an Excel file

I have a data frame created with pandas; the data has one column with multiple rows,
and each row holds dict-like data such as ({buy_quantity:0, symbol:nse123490,....}).
I want to insert it into an Excel sheet using a pandas data frame with the python xlwings lib, keeping only some selected data. Please help me.
wb = xw.Book('Easy_Algo.xlsx')
ts = wb.sheets['profile']
pdata=sas.get_profile()
df = pd.DataFrame(pdata)
ts.range('A1').value = df[['symbol','product','avg price','buy avg']]
The output should contain only the selected data.
Please help me: how do I insert only the selected data into Excel?
Assuming that the dataframe below is named df and that the column positions contains dicts, you can use the code below to transform the keys into columns and the values into rows.
out = df.join(pd.DataFrame(df.pop('positions').values.tolist()))
out.to_excel('Easy_Algo.xlsx', sheet_name='profile', index=False)  # store the result in an Excel file; sheet_name takes a string, not a list
Note: make sure to add the two lines below first if the column positions does not already hold dicts.
import ast
df['positions']=df['positions'].apply(ast.literal_eval)
# A sample dataframe for testing:
import pandas as pd
import ast

# a list, not a set: pd.DataFrame rejects unordered set data
string_dicts = ['{"Symbol": "NIFTY2292218150CE NFO", "Produc": "NRML", "Avg. Price": 18.15, "Buy Avg": 0}',
                '{"Symbol": "NIFTY22SEP18500CE NFO", "Produc": "NRML", "Avg. Price": 20.15, "Buy Avg": 20.15}',
                '{"Symbol": "NIFTY22SEP16500PE NFO", "Produc": "NRML", "Avg. Price": 16.35, "Buy Avg": 16.35}']
df = pd.DataFrame(string_dicts, columns=['positions'])
df['positions'] = df['positions'].apply(ast.literal_eval)
out = df.join(pd.DataFrame(df.pop('positions').values.tolist()))
>>> print(out)
                  Symbol Produc  Avg. Price  Buy Avg
0  NIFTY2292218150CE NFO   NRML       18.15     0.00
1  NIFTY22SEP18500CE NFO   NRML       20.15    20.15
2  NIFTY22SEP16500PE NFO   NRML       16.35    16.35
If I understood correctly, you want only those columns written to an Excel file:
df = df[['symbol', 'product', 'avg price', 'buy avg']]
df.to_excel("final.xlsx")
df.to_excel("final.xlsx", index=False)  # in case pandas generated a default index and you want to get rid of it
I hope this helps.
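Putting the two answers together, here is a minimal sketch with one made-up row (the column names are assumptions based on the question) that expands the dict column and keeps only the columns to be written:

```python
import ast
import pandas as pd

# Hypothetical raw data: the 'positions' column holds dict-like strings.
df = pd.DataFrame({'positions': [
    '{"symbol": "nse123490", "product": "NRML", "avg price": 18.15, "buy avg": 0}',
]})
df['positions'] = df['positions'].apply(ast.literal_eval)

# Expand the dicts into columns, then keep only the columns to write out.
out = df.join(pd.DataFrame(df.pop('positions').values.tolist()))
selected = out[['symbol', 'product', 'avg price', 'buy avg']]
# selected.to_excel('Easy_Algo.xlsx', sheet_name='profile', index=False)
```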

List doesn't append in a for loop (pandas autocorrelation)

I have a DataFrame which looks like this:
Date Open High Low Adj Close Change
4844 26/10/2020 3441.419922 3441.419922 3233.939941 3269.959961 -5.243488
4845 02/11/2020 3296.199951 3529.050049 3279.739990 3509.439941 6.076183
4846 09/11/2020 3583.040039 3645.989990 3511.909912 3585.149902 0.058850
4847 16/11/2020 3600.159912 3628.510010 3543.840088 3557.540039 -1.198015
4848 20/11/2020 3579.310059 3581.229980 3556.850098 3557.540039 -0.611940
I'm trying to create a new list which contains a autocorrelation coefficient for various lookback periods via a for loop. I've tried this:
import pandas as pd

Df = pd.read_csv("SP500 Weekly Data.csv", delimiter=",")
Df.fillna('')
Df['Change'] = ((Df['Adj Close'] - Df['Open']) / Df['Adj Close'] * 100)

for t in range(1, 20):
    wk = []
    auto = Df['Change'].autocorr(t).astype(float)
    wk.append(auto)
print(wk)
but instead of getting a list of values, all I get from the print is the last value:
[0.002519726414980291]
At first I thought it was the type of value being returned (I got a 'numpy.float64' object is not iterable error with .extend()), but .append() doesn't appear to be adding to the list on each loop.
Any help is appreciated, as well as any advice on the mistake I've made, so I can look out for it next time! Thanks
In your code, the wk list is re-initialized as empty on every iteration, so you need to place it outside the loop, as below, for it to work.
import pandas as pd

Df = pd.read_csv("SP500 Weekly Data.csv", delimiter=",")
Df.fillna('')
Df['Change'] = ((Df['Adj Close'] - Df['Open']) / Df['Adj Close'] * 100)

wk = []
for t in range(1, 20):
    auto = Df['Change'].autocorr(t).astype(float)
    wk.append(auto)
print(wk)
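For what it's worth, once the list lives outside the loop, the same accumulation can be written as a list comprehension. A sketch with a synthetic Change series standing in for the CSV data:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for Df['Change'] (the real values come from the CSV).
rng = np.random.default_rng(0)
change = pd.Series(rng.normal(size=200))

# One autocorrelation coefficient per lookback lag, accumulated in one list.
wk = [change.autocorr(t) for t in range(1, 20)]
```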

How to start a for loop for this given DataFrame in Pandas for multiple same name rows?

I need some help. I am working in a .ipynb file to filter data and get certain things from a DataFrame.
This is the DataFrame I'm working with.
From this dataframe, as you can see, there are multiple rows for the same SYMBOL.
I need help writing a "for" loop that will get me, for every symbol, the highest CHG_IN_OI and the row containing it.
For example, if there are 14 rows with ACC as the symbol, I need to find the highest CHG_IN_OI for ACC in the CHG_IN_OI column, get the row of that highest change, and retain the remaining columns as well.
I have made a list named multisymbols, which contains these symbols:
multisymbols = [
'ACC',
'ADANIENT',
'ADANIPORTS',
'AMARAJABAT',
'AMBUJACEM',
'APOLLOHOSP',
'APOLLOTYRE',
'ASHOKLEY',
'ASIANPAINT',
'AUROPHARMA',
'AXISBANK',
'BAJAJ-AUTO',
'BAJAJFINSV',
'BAJFINANCE',
'BALKRISIND',
'BANDHANBNK',
'BANKBARODA',
'BATAINDIA',
'BEL',
'BERGEPAINT',
'BHARATFORG',
'BHARTIARTL',
'BHEL',
'BIOCON',
'BOSCHLTD',
'BPCL',
'BRITANNIA',
'CADILAHC',
'CANBK',
'CENTURYTEX',
'CHOLAFIN',
'CIPLA',
'COALINDIA',
'COLPAL',
'CONCOR',
'CUMMINSIND',
'DABUR',
'DIVISLAB',
'DLF',
'DRREDDY',
'EICHERMOT',
'EQUITAS',
'ESCORTS',
'EXIDEIND',
'FEDERALBNK',
'GAIL',
'GLENMARK',
'GMRINFRA',
'GODREJCP',
'GODREJPROP',
'GRASIM',
'HAVELLS',
'HCLTECH',
'HDFC',
'HDFCBANK',
'HDFCLIFE',
'HEROMOTOCO',
'HINDALCO',
'HINDPETRO',
'HINDUNILVR',
'IBULHSGFIN',
'ICICIBANK',
'ICICIPRULI',
'IDEA',
'IDFCFIRSTB',
'IGL',
'INDIGO',
'INDUSINDBK',
'INFRATEL',
'INFY',
'IOC',
'ITC',
'JINDALSTEL',
'JSWSTEEL',
'JUBLFOOD',
'KOTAKBANK',
'L&TFH',
'LICHSGFIN',
'LT',
'LUPIN',
'M&M',
'M&MFIN',
'MANAPPURAM',
'MARICO',
'MARUTI',
'MCDOWELL-N',
'MFSL',
'MGL',
'MINDTREE',
'MOTHERSUMI',
'MRF',
'MUTHOOTFIN',
'NATIONALUM',
'NAUKRI',
'NESTLEIND',
'NIITTECH',
'NMDC',
'NTPC',
'ONGC',
'PAGEIND',
'PEL',
'PETRONET',
'PFC',
'PIDILITIND',
'PNB',
'POWERGRID',
'PVR',
'RAMCOCEM',
'RBLBANK',
'RECLTD',
'RELIANCE',
'SAIL',
'SBILIFE',
'SBIN',
'SHREECEM',
'SEIMENS',
'SRF',
'SRTRANSFIN',
'SUNPHARMA',
'SUNTV',
'TATACHEM',
'TATACONSUM',
'TATAMOTORS',
'TATAPOWER',
'TATASTEEL',
'TCS',
'TECHM',
'TITAN',
'TORNTPHARM',
'TORNTPOWER',
'TVSMOTOR',
'UBL',
'UJJIVAN',
'ULTRACEMCO',
'UPL',
'VEDL',
'VOLTAS',
'WIPRO',
'ZEEL'
]
df = df[df['SYMBOL'].isin(multisymbols)]
df
These are all the shares on the NSE. Hope you can understand and help me out. I used .groupby(), which successfully gave me the highest CHG_IN_OI, and .agg() to retain the remaining columns, but the data was not correct. I simply want, for every symbol, the row with the highest CHG_IN_OI.
Thanks in Advance!
Although the data differs from that presented in the question, here is an answer to the same problem using equity data as an example of financial data.
import pandas as pd
import pandas_datareader.data as web
import datetime

with open('./alpha_vantage_api_key.txt') as f:
    api_key = f.read()

start = datetime.datetime(2019, 1, 1)
end = datetime.datetime(2020, 8, 1)

df_all = pd.DataFrame()
symbols = ['AAPL', 'TSLA']
for i in symbols:
    df = web.DataReader(i, 'av-daily', start, end, api_key=api_key)
    df['symbol'] = i
    df_all = pd.concat([df_all, df], axis=0)
df_all.index = pd.to_datetime(df_all.index)  # convert the combined index, not just the last df's
Aggregating a single column
df_all.groupby('symbol')['volume'].agg('max').reset_index()
symbol volume
0 AAPL 106721200
1 TSLA 60938758
Multi-Column Aggregation
df_all.groupby('symbol')[['high','volume']].agg(high=('high','max'), volume=('volume','max'))
high volume
symbol
AAPL 425.66 106721200
TSLA 1794.99 60938758
Extract the target line
symbol_max = df_all.groupby('symbol').apply(lambda x: x.loc[x['volume'].idxmax()]).reset_index(drop=True)
symbol_max
open high low close volume symbol
0 257.26 278.4100 256.37 273.36 106721200 AAPL
1 882.96 968.9899 833.88 887.06 60938758 TSLA
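Applied to the question's own column names, the "extract the target line" step looks like this (a sketch with a few made-up rows): idxmax on the grouped column picks, for each symbol, the label of the row with the highest CHG_IN_OI, and .loc then returns those full rows with every other column retained.

```python
import pandas as pd

# Hypothetical rows in the shape described in the question.
df = pd.DataFrame({
    'SYMBOL':    ['ACC', 'ACC', 'ACC', 'RELIANCE', 'RELIANCE'],
    'CHG_IN_OI': [120, 450, 300, 75, 900],
    'OPEN_INT':  [10, 11, 12, 13, 14],
})

# For each symbol, keep the full row holding its highest CHG_IN_OI.
best = df.loc[df.groupby('SYMBOL')['CHG_IN_OI'].idxmax()].reset_index(drop=True)
```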

How do I make this function iterable (getting indexerror)

I am fairly new to python and coding in general.
I have a big data file that provides daily data for the period 2011-2018 for a number of stock tickers (300~).
The data is a .csv file with circa 150k rows and looks as follows (short example):
Date,Symbol,ShortExemptVolume,ShortVolume,TotalVolume
20110103,AAWW,0.0,28369,78113.0
20110103,AMD,0.0,3183556,8095093.0
20110103,AMRS,0.0,14196,18811.0
20110103,ARAY,0.0,31685,77976.0
20110103,ARCC,0.0,177208,423768.0
20110103,ASCMA,0.0,3930,26527.0
20110103,ATI,0.0,193772,301287.0
20110103,ATSG,0.0,23659,72965.0
20110103,AVID,0.0,7211,18896.0
20110103,BMRN,0.0,21740,213974.0
20110103,CAMP,0.0,2000,11401.0
20110103,CIEN,0.0,625165,1309490.0
20110103,COWN,0.0,3195,24293.0
20110103,CSV,0.0,6133,25394.0
I have a function that allows me to filter for a specific symbol and get 10 observations before and after a specified date (could be any date between 2011 and 2018).
import pandas as pd
import datetime

def get_data(issue_date, stock_ticker):
    df = pd.read_csv(r'D:\Project\Data\Short_Interest\exampledata.csv')
    df['Date'] = pd.to_datetime(df['Date'], format="%Y%m%d")
    short = df.loc[df.Symbol.eq(stock_ticker)]
    # get the index label of the row of interest
    ix = short[short.Date.eq(issue_date)].index[0]
    # get the positional index for that row's label
    iloc_ix = short.index.get_loc(ix)
    # get the +/-10 iloc rows (+11 because that is how slices work), i.e. 10 trading days either side
    short_data = short.iloc[iloc_ix - 10: iloc_ix + 11]
    return [short_data]
I want to create a script that iterates over a list of issue_dates and stock_tickers. The list (a .csv) looks as follows:
ARAY,07/08/2017
ARAY,24/04/2014
ACETQ,16/11/2015
ACETQ,16/11/2015
NVLNA,15/08/2014
ATSG,29/09/2017
ATI,24/05/2016
MDRX,18/06/2013
MDRX,18/06/2013
AMAGX,10/05/2017
AMAGX,14/02/2014
AMD,14/09/2016
To break down my problem and question I would like to know how to do the following:
First, how do I load the inputs?
Second, how do I call the function on each input?
And last, how do I accumulate all the function returns in one dataframe?
To load the inputs and call the function for each row, iterate over the csv file, pass each row's values to the function, and accumulate the resulting Series in a list.
I modified your function a bit: removed the DataFrame creation so it is only done once, and added a try/except block to account for missing dates or tickers (your example data didn't match up too well). The dates in the second csv look like day/month/year, so I converted them from that format.
import pandas as pd
import datetime, csv

def get_data(df, issue_date, stock_ticker):
    '''Return a Series for the ticker centered on the issue date.'''
    short = df.loc[df.Symbol.eq(stock_ticker)]
    try:
        # get the index label of the row of interest
        ix = short[short.Date.eq(issue_date)].index[0]
        # get the positional index for that row's label
        iloc_ix = short.index.get_loc(ix)
        # get the +/-10 iloc rows (+11 because that is how slices work)
        short_data = short.iloc[iloc_ix - 10: iloc_ix + 11]
    except IndexError:
        msg = f'no data for {stock_ticker} on {issue_date}'
        #log.info(msg)
        print(msg)
        short_data = None
    return short_data

df = pd.read_csv(datafile)
df['Date'] = pd.to_datetime(df['Date'], format="%Y%m%d")

results = []
with open('issues.csv') as issues:
    for ticker, date in csv.reader(issues):
        day, month, year = map(int, date.split('/'))
        # dt = datetime.datetime.strptime(date, r'%d/%m/%Y')
        date = datetime.date(year, month, day)
        s = get_data(df, date, ticker)
        results.append(s)
        # print(s)
Creating a single DataFrame or table for all that info may be problematic, especially since the date ranges are all different. You should probably ask a separate question about that; its mcve should include just a few minimal Pandas Series with a couple of different date ranges and tickers.
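That said, if you do want one frame anyway, one option is to concat the non-empty results into a MultiIndexed frame. A sketch with made-up windows; the keys are whatever labels identify each query:

```python
import pandas as pd

# Hypothetical get_data results: two date windows plus one failed lookup (None).
a = pd.DataFrame({'ShortVolume': [28369, 31685]},
                 index=pd.to_datetime(['2011-01-03', '2011-01-04']))
b = pd.DataFrame({'ShortVolume': [3183556]},
                 index=pd.to_datetime(['2011-02-01']))
results = [a, None, b]

# Skip the misses, stack the rest, keyed by a per-query label.
combined = pd.concat([r for r in results if r is not None],
                     keys=['ARAY_q1', 'AMD_q2'])
```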

Reindexing a specific level of a MultiIndex dataframe

I have a DataFrame with two indices and would like to reindex it by one of the indices.
from pandas_datareader import data
import matplotlib.pyplot as plt
import pandas as pd
# Instruments to download
tickers = ['AAPL']
# Online source one should use
data_source = 'yahoo'
# Data range
start_date = '2000-01-01'
end_date = '2018-01-09'
# Load the desired data
panel_data = data.DataReader(tickers, data_source, start_date, end_date).to_frame()
panel_data.head()
The reindexing goes as follows:
# Get just the adjusted closing prices
adj_close = panel_data['Adj Close']
# Get all weekdays between start and end dates
all_weekdays = pd.date_range(start=start_date, end=end_date, freq='B')
# Align the existing prices in adj_close with our new set of dates
adj_close = adj_close.reindex(all_weekdays, method="ffill")
The last line gives the following error:
TypeError: '<' not supported between instances of 'tuple' and 'int'
This is because the DataFrame index is a list of tuples:
panel_data.index[0]
(Timestamp('2018-01-09 00:00:00'), 'AAPL')
Is it possible to reindex adj_close? By the way, if I don't convert the Panel object to a DataFrame using to_frame(), the reindexing works as it is. But it seems that Panel objects are deprecated...
If you're looking to reindex on a certain level, then reindex accepts a level argument you can pass -
adj_close.reindex(all_weekdays, level=0)
When passing a level argument, you cannot pass a method argument at the same time (reindex throws a TypeError), so you can chain a ffill call after -
adj_close.reindex(all_weekdays, level=0).ffill()
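A self-contained sketch of that level-based reindex on made-up (Date, Ticker) data; here the target dates all exist in the index, so the call aligns the flat date index against level 0 and keeps every ticker under each date:

```python
import pandas as pd

# Hypothetical stand-in for the MultiIndexed 'Adj Close' series.
dates = pd.to_datetime(['2018-01-02', '2018-01-03', '2018-01-04'])
idx = pd.MultiIndex.from_product([dates, ['AAPL', 'MSFT']],
                                 names=['Date', 'Ticker'])
adj_close = pd.Series([170.0, 85.0, 171.0, 86.0, 172.0, 87.0], index=idx)

# Reindex on level 0 only; ffill is chained separately because
# method= cannot be combined with level=.
weekdays = pd.date_range('2018-01-02', '2018-01-03', freq='B')
aligned = adj_close.reindex(weekdays, level=0).ffill()
```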
