How to add a column in multilevel Dataframe using pandas and yfinance?

I have the below code:
import yfinance as yf
import pandas as pd
import datetime as dt
end=dt.datetime.today()
start=end-dt.timedelta(59)
tickers=['WBA', 'HD']
ohlcv={}
df=pd.DataFrame
df = yf.download(tickers,group_by=tickers,start=start,end=end,interval='5m')
df['h-l']=abs(df.High-df.Low)
df['h-pc']=abs (df.High-df['Adj Close'].shift(1))
df['l-pc']=abs(df.Low-df['Adj Close'].shift(1))
df['tr']=df[['h-l','h-pc','l-pc']].max(axis=1)
df['atr']=df['tr'].rolling(window=n, min_periods=n).mean()
When I run it, I get the following error:
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'High'
I tried using this code:
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
but the resulting report has calculation errors because the tickers are not kept separate.
What I actually need is, for each ticker in the tickers list, a column called "h-l" that subtracts the low of that row from the high of that row, and so on for the other columns.

Option 1: Multi-Level Column Names
Multi-level columns are accessed by passing a tuple, for example:
df[('WBA', 'High')]
Package versions used:
pandas: at least 1.0.5 (print(pd.__version__))
yfinance: 0.1.54 (print(yf.__version__))
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
end = datetime.today()
start = end - timedelta(59)
tickers = ['WBA', 'HD']
# group_by='ticker' puts the ticker in column level 0 and the price field in level 1;
# auto_adjust=False keeps the 'Adj Close' column in newer yfinance versions
df = yf.download(tickers, group_by='ticker', start=start, end=end, interval='5m', auto_adjust=False)
# iterate over level 0 ticker names
for ticker in tickers:
    df[(ticker, 'h-l')] = abs(df[(ticker, 'High')] - df[(ticker, 'Low')])
    df[(ticker, 'h-pc')] = abs(df[(ticker, 'High')] - df[(ticker, 'Adj Close')].shift(1))
    df[(ticker, 'l-pc')] = abs(df[(ticker, 'Low')] - df[(ticker, 'Adj Close')].shift(1))
    df[(ticker, 'tr')] = df[[(ticker, 'h-l'), (ticker, 'h-pc'), (ticker, 'l-pc')]].max(axis=1)
    # df[(ticker, 'atr')] = df[(ticker, 'tr')].rolling(window=n, min_periods=n).mean()  # not included because n is not defined
# sort the columns
df = df.reindex(sorted(df.columns), axis=1)
# display(df.head())
HD WBA
Adj Close Close High Low Open Volume h-l h-pc l-pc tr Adj Close Close High Low Open Volume h-l h-pc l-pc tr
Datetime
2020-06-08 09:30:00-04:00 253.937500 253.937500 253.960007 252.360001 252.490005 210260.0 1.600006 NaN NaN 1.600006 46.049999 46.049999 46.070000 45.490002 45.490002 239860.0 0.579998 NaN NaN 0.579998
2020-06-08 09:35:00-04:00 253.470001 253.470001 254.339996 253.220093 253.990005 95906.0 1.119904 0.402496 0.717407 1.119904 46.330002 46.330002 46.330002 46.040001 46.070000 104259.0 0.290001 0.280003 0.009998 0.290001
2020-06-08 09:40:00-04:00 253.580002 253.580002 253.829895 252.955002 253.429993 55868.0 0.874893 0.359894 0.514999 0.874893 46.610001 46.610001 46.660000 46.240002 46.330002 113174.0 0.419998 0.329998 0.090000 0.419998
2020-06-08 09:45:00-04:00 253.740005 253.740005 253.929993 253.289993 253.529999 61892.0 0.639999 0.349991 0.290009 0.639999 46.880001 46.880001 46.950001 46.624100 46.624100 121388.0 0.325901 0.340000 0.014099 0.340000
2020-06-08 09:50:00-04:00 253.703400 253.703400 253.910004 253.419998 253.740005 60809.0 0.490005 0.169998 0.320007 0.490005 46.919998 46.919998 46.990002 46.820000 46.880001 154239.0 0.170002 0.110001 0.060001 0.170002
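As an alternative to looping over the tickers, each price field can be pulled out for all tickers at once with DataFrame.xs, and the results joined back as (ticker, field) columns. The sketch below runs offline on a small made-up frame shaped like the group_by='ticker' download, and uses a hypothetical ATR window n = 3, since the question never defines n. The maximum is taken with concat plus groupby so that, like .max(axis=1), it skips the missing previous close on the first row.

```python
import pandas as pd

# Made-up data shaped like yf.download(..., group_by='ticker'):
# column level 0 = ticker, level 1 = price field.
idx = pd.date_range("2020-06-08 09:30", periods=4, freq="5min")
cols = pd.MultiIndex.from_product([["HD", "WBA"], ["High", "Low", "Adj Close"]])
df = pd.DataFrame(
    [
        [254.0, 252.0, 253.0, 46.0, 45.5, 45.8],
        [254.5, 253.0, 253.5, 46.3, 46.0, 46.2],
        [254.0, 252.5, 253.2, 46.6, 46.2, 46.5],
        [253.9, 253.2, 253.7, 46.9, 46.6, 46.8],
    ],
    index=idx,
    columns=cols,
)

# One field for every ticker at once: the columns become the ticker names.
high = df.xs("High", axis=1, level=1)
low = df.xs("Low", axis=1, level=1)
prev_close = df.xs("Adj Close", axis=1, level=1).shift(1)

h_l = (high - low).abs()
h_pc = (high - prev_close).abs()
l_pc = (low - prev_close).abs()

# Row-wise maximum of the three ranges per ticker, skipping NaN like .max(axis=1).
tr = pd.concat([h_l, h_pc, l_pc]).groupby(level=0).max()

n = 3  # hypothetical ATR window; the question leaves n undefined
atr = tr.rolling(window=n, min_periods=n).mean()

# Stack the new fields as (field, ticker), swap to (ticker, field), and append.
new = pd.concat({"tr": tr, "atr": atr}, axis=1).swaplevel(axis=1)
df = pd.concat([df, new], axis=1).sort_index(axis=1)
```

Because the shift happens on columns that each belong to a single ticker, values never leak from one ticker into another.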
Option 2: Single-Level Column Names
As demonstrated in "How to deal with multi-level column names downloaded with yfinance?", it's easier to work with single-level column names.
With the tickers in a column instead of in multi-level column headers, use pandas.DataFrame.groupby on the Ticker column.
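import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
end = datetime.today()
start = end - timedelta(59)
tickers = ['WBA', 'HD']
# auto_adjust=False keeps the 'Adj Close' column in newer yfinance versions
df = yf.download(tickers, group_by='ticker', start=start, end=end, interval='5m', auto_adjust=False)
# create single level column names
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
# function with calculations; it receives the rows of one ticker at a time
def my_calculations(df):
    df = df.copy()  # avoid mutating the group passed in by groupby
    df['h-l'] = abs(df.High - df.Low)
    df['h-pc'] = abs(df.High - df['Adj Close'].shift(1))
    df['l-pc'] = abs(df.Low - df['Adj Close'].shift(1))
    df['tr'] = df[['h-l', 'h-pc', 'l-pc']].max(axis=1)
    # df['atr'] = df['tr'].rolling(window=n, min_periods=n).mean()  # n is not defined in the question
    return df
# sort first, then apply the function per ticker;
# group_keys=False stops Ticker from also being added as an index level
df_updated = (df.reset_index()
              .sort_values(['Ticker', 'Date'])
              .groupby('Ticker', group_keys=False)
              .apply(my_calculations))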
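A variant that skips groupby().apply() altogether: only the previous close needs to respect ticker boundaries, so it can be computed with groupby().shift() and the rest done with plain column arithmetic. The sketch uses a small made-up long-format frame with the column names produced by the stack step above; the ATR window n = 2 is a hypothetical choice.

```python
import pandas as pd

# Made-up long-format data, like the result of the stack() step.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-06-08 09:30", "2020-06-08 09:35"] * 2),
    "Ticker": ["HD", "HD", "WBA", "WBA"],
    "High": [254.0, 254.5, 46.0, 46.3],
    "Low": [252.0, 253.0, 45.5, 46.0],
    "Adj Close": [253.0, 253.5, 45.8, 46.2],
})
df = df.sort_values(["Ticker", "Date"])

# Previous close within each ticker only: the first row of every ticker gets NaN.
prev_close = df.groupby("Ticker")["Adj Close"].shift(1)

df["h-l"] = (df["High"] - df["Low"]).abs()
df["h-pc"] = (df["High"] - prev_close).abs()
df["l-pc"] = (df["Low"] - prev_close).abs()
df["tr"] = df[["h-l", "h-pc", "l-pc"]].max(axis=1)

n = 2  # hypothetical ATR window
# The rolling mean must also restart for each ticker, hence transform().
df["atr"] = df.groupby("Ticker")["tr"].transform(
    lambda s: s.rolling(window=n, min_periods=n).mean()
)
```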

Here are some columns I've created: the percent change from the previous row (the previous day for daily data), the range, and the percent range.
df['% Change'] = (df['Adj Close'] / df['Adj Close'].shift(1))-1
df['Range'] = df['High'] - df['Low']
df['% Range'] = df['Range'] / df['Open']
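On the multi-level frame from the question, the same three columns can be added per ticker by using tuple keys. A sketch on a small made-up frame standing in for the download (no network access):

```python
import pandas as pd

# Made-up data with (ticker, field) columns, like group_by='ticker' output.
cols = pd.MultiIndex.from_product([["HD", "WBA"], ["Open", "High", "Low", "Adj Close"]])
df = pd.DataFrame(
    [[252.5, 254.0, 252.0, 253.0, 45.5, 46.0, 45.5, 45.8],
     [253.0, 254.5, 253.0, 253.5, 46.0, 46.3, 46.0, 46.2]],
    index=pd.date_range("2020-06-08 09:30", periods=2, freq="5min"),
    columns=cols,
)

for t in df.columns.get_level_values(0).unique():
    # shift(1) on a single ticker's column never reads another ticker's price
    df[(t, "% Change")] = df[(t, "Adj Close")] / df[(t, "Adj Close")].shift(1) - 1
    df[(t, "Range")] = df[(t, "High")] - df[(t, "Low")]
    df[(t, "% Range")] = df[(t, "Range")] / df[(t, "Open")]
```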

Related

Mismatch when filling yearly data into dataframe with daily data

I am trying to download data and add statistics and economic indicators; however, my data is daily and the indicators are yearly.
I tried to store year/indicator pairs in a dictionary, go through each day in the Date column returned by yfinance, and populate a list with the GDP deflator for each day using the dictionary. Then I convert that list to a DataFrame, add it as a column to the DataFrame returned by yfinance, and save it as a CSV.
However, when I look at the CSV file, the GDP deflator for 2004 shows up on the last day of 2003, and the last two days of 2004 show the GDP deflator of 2005.
What am I doing wrong?
Code below:
import pandas as pd
import yfinance as yf
import world_bank_data as wb
df = pd.DataFrame() # Empty DataFrame
GDPD = []
df = yf.download(tickers = 'USDSGD=X' , period='max', interval='1d')
df.reset_index(inplace=True)
date = df['Date']
SGD_def_dict = {"Year":[],"GDP_Deflator":[]}
for i in range(len(date)):
    if date[i].year in SGD_def_dict['Year']:
        GDPD.append(list(SGD_def_dict.values())[-1][-1])
    else:
        SGD_def_dict["Year"].append(date[i].year)
        try:
            SGD_def_dict["GDP_Deflator"].append(wb.get_series('NY.GDP.DEFL.ZS', country= 'SGP', date=date[i].year, id_or_value='id', simplify_index=True))
        except:
            SGD_def_dict["GDP_Deflator"].append(float("nan"))
        #GDPD.append(list(SGD_def_dict.values())[-1][-1])
df2 = pd.DataFrame({"GDP_Deflator":GDPD})
df["GDP_Deflator"] = df2
df.to_csv(r'C:..WBTEST.csv')

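You need to match the year of each day to the corresponding GDP deflator in the dictionary, and then use the same value for all days in that year. In your loop, the first day of each new year never gets a value appended to GDPD (that line is commented out), so GDPD ends up one entry shorter per year and later values slide onto earlier dates.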
import pandas as pd
import yfinance as yf
import world_bank_data as wb
df = yf.download(tickers = 'USDSGD=X' , period='max', interval='1d')
df.reset_index(inplace=True)
date = df['Date']
SGD_def_dict = {"Year":[],"GDP_Deflator":[]}
# fetch one deflator value per distinct year
for i in range(len(date)):
    year = date[i].year
    if year not in SGD_def_dict['Year']:
        SGD_def_dict["Year"].append(year)
        try:
            SGD_def_dict["GDP_Deflator"].append(wb.get_series('NY.GDP.DEFL.ZS', country= 'SGP', date=year, id_or_value='id', simplify_index=True))
        except Exception:
            SGD_def_dict["GDP_Deflator"].append(float("nan"))
# join on the year so every day of a year gets that year's value
df['Year'] = df['Date'].dt.year
df = df.merge(pd.DataFrame(SGD_def_dict), on='Year')
df.drop(['Year'], axis=1, inplace=True)
df.to_csv(r'C:..WBTEST.csv')
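A simpler variant of the same idea: once the year to deflator pairs are in a plain dict, Series.map looks up every day's year in one step, so there is no separate list that can drift out of sync with the rows. The deflator numbers below are made-up placeholders, not World Bank data; in practice the dict would be filled from the wb.get_series calls above.

```python
import pandas as pd

# Hypothetical year -> GDP deflator values (placeholders, not real data).
deflator_by_year = {2003: 95.0, 2004: 97.5, 2005: 100.0}

df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2003-12-30", "2003-12-31", "2004-01-02", "2004-12-31", "2005-01-03"]
    ),
})

# Each row looks up its own year; years missing from the dict become NaN.
df["GDP_Deflator"] = df["Date"].dt.year.map(deflator_by_year)
```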

How can I sort timestamp from following data dictionary?

Code:
import pandas as pd
from pycoingecko import CoinGeckoAPI
c=CoinGeckoAPI()
bdata=c.get_coin_market_chart_by_id(id='bitcoin',vs_currency='usd',days=30)
data_=pd.DataFrame(bdata)
print(data_)
data=pd.to_datetime(data_['prices'],unit='ms')
print(data)
But I need output with 4 columns:
Timestamp, Prices, Market_caps, Total_volume
and I want to convert the timestamp with to_datetime.
In the code above, I am just trying to sort out the bitcoin data from pycoingecko.

You can convert this into a DataFrame like this:
import pandas as pd
from pycoingecko import CoinGeckoAPI
c=CoinGeckoAPI()
bdata=c.get_coin_market_chart_by_id(id='bitcoin',vs_currency='usd',days=30)
prices = pd.DataFrame(bdata['prices'], columns=['TimeStamp', 'Price']).set_index('TimeStamp')
market_caps = pd.DataFrame(bdata['market_caps'], columns=['TimeStamp', 'Market Cap']).set_index('TimeStamp')
total_volumes = pd.DataFrame(bdata['total_volumes'], columns=['TimeStamp', 'Total Volumes']).set_index('TimeStamp')
# combine the separate dataframes
df_market = pd.concat([prices, market_caps, total_volumes], axis=1)
# convert the index to a datetime dtype
df_market.index = pd.to_datetime(df_market.index, unit='ms')
Code adapted from this answer.
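The same pattern can be driven by a dict of key to column label, which avoids writing the DataFrame line three times. The sketch below uses a small hand-written payload that only mimics the shape used above (lists of [timestamp_ms, value] pairs under 'prices', 'market_caps' and 'total_volumes'), so it runs offline; the numbers are made up.

```python
import pandas as pd

# Made-up payload with the same shape as the bdata used above.
bdata = {
    "prices": [[1600000000000, 10900.0], [1600003600000, 10950.0]],
    "market_caps": [[1600000000000, 2.01e11], [1600003600000, 2.02e11]],
    "total_volumes": [[1600000000000, 3.1e10], [1600003600000, 3.2e10]],
}

labels = {"prices": "Price", "market_caps": "Market Cap", "total_volumes": "Total Volumes"}

# One DataFrame per key, indexed by the shared millisecond timestamp.
frames = [
    pd.DataFrame(bdata[key], columns=["TimeStamp", label]).set_index("TimeStamp")
    for key, label in labels.items()
]
df_market = pd.concat(frames, axis=1)

# Convert the millisecond epoch index to datetimes.
df_market.index = pd.to_datetime(df_market.index, unit="ms")
```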

You can extract the timestamp column and convert it to dates as follows, with minimal changes to your code; you can then merge the new column into your DataFrame:
import pandas as pd
from pycoingecko import CoinGeckoAPI
c=CoinGeckoAPI()
bdata=c.get_coin_market_chart_by_id(id='bitcoin',vs_currency='usd',days=30)
data_=pd.DataFrame(bdata)
print(data_)
#data=pd.to_datetime(data_["prices"],unit='ms')
df = pd.DataFrame([pd.Series(x) for x in data_["prices"]])
df.columns = ["timestamp","data"]
df["timestamp"] = pd.to_datetime(df["timestamp"], unit='ms')  # convert the column in place instead of replacing df with a Series
print(df)

How to filter columns in a multindex dataframe (pandas)

I have the below dataframe:
WBA ... HD
Open High Low Close ... l-pc h-l h-pc l-pc
Datetime ...
2020-06-08 09:30:00-04:00 45.490002 46.090000 45.490002 46.049999 ... NaN 2.100006 NaN NaN
2020-06-08 09:35:00-04:00 46.070000 46.330002 46.040001 46.330002 ... 0.009998 1.119904 0.402496 0.717407
2020-06-08 09:40:00-04:00 46.330002 46.660000 46.240002 46.610001 ... 0.090000 0.874893 0.359894 0.514999
2020-06-08 09:45:00-04:00 46.624100 46.950001 46.624100 46.880001 ... 0.014099 0.639999 0.349991 0.290009
2020-06-08 09:50:00-04:00 46.880001 46.990002 46.820000 46.919998 ... 0.060001 0.490005 0.169998 0.320007
This DataFrame was obtained using the code below:
import yfinance as yf
import pandas as pd
import datetime as dt
end=dt.datetime.today()
start=end-dt.timedelta(59)
tickers=['WBA', 'HD']
ohlcv={}
df=pd.DataFrame
df = yf.download(tickers,group_by=tickers,start=start,end=end,interval='5m')
for i in tickers:
    df[i,"h-l"]=abs(df[i]['High']-df[i]['Low'])
    df[i,'h-pc']=abs(df[i]["High"]-df[i]['Adj Close'].shift(1))
    df[i,'l-pc']=abs(df[i]["Low"]-df[i]['Adj Close'].shift(1))
I am trying to apply this calculation to every ticker in the "tickers" list:
df['tr']=df[['h-l','h-pc','l-pc']].max(axis=1)
df['atr']=df['tr'].rolling(window=n, min_periods=n).mean()
For each ticker I need to find the "tr" and then use the "tr" to find the "atr", but I am not able to get the "tr".

Be systematic about accessing columns through tuples and it all just works.
import yfinance as yf
import pandas as pd
import datetime as dt
end=dt.datetime.today()
start=end-dt.timedelta(59)
tickers=['WBA', 'HD']
df = yf.download(tickers,group_by='ticker',start=start,end=end,interval='5m',auto_adjust=False)
dfc = df.copy()
n=5
for t in tickers:
    dfc[(t,"h-l")] = abs(dfc.loc[:,(t,'High')] - dfc.loc[:,(t,'Low')])
    dfc[(t,"h-pc")] = abs(dfc.loc[:,(t,'High')] - dfc.loc[:,(t,'Adj Close')].shift(1))
    dfc[(t,"l-pc")] = abs(dfc.loc[:,(t,'Low')] - dfc.loc[:,(t,'Adj Close')].shift(1))
    # access the new columns through tuples, e.g. ("WBA","h-l"), so tr and atr stay per ticker
    dfc[(t,"tr")] = dfc[[(t, c) for c in ['h-l','h-pc','l-pc']]].max(axis=1)
    dfc[(t,"atr")] = dfc[(t,"tr")].rolling(window=n, min_periods=n).mean()

Select all rows in df with the same value in second value of index tuple

In the same vein as my previous question about pandas index tuples:
How do I access all rows of a df that have the same second element of a tuple in the index?
I have the following df, which continues with the next date. It would be great to show the dates for the same Symbol together.
Open Close Day Change
Date Symbol
11-01-2018 AEDAUD 0.3470 0.3448 -0.0022
AEDCAD 0.3415 0.3408 -0.0007
AEDCHF 0.2663 0.2656 -0.0007
AEDDKK 1.6955 1.6838 -0.0117
AEDEUR 0.2277 0.2261 -0.0016
I'm having trouble selecting all the rows with the same value in the Symbol level of the index.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
print(pd.__version__)
forex_11 = pd.read_csv('FOREX_20180111.csv', sep=',', parse_dates=['Date'])
forex_12 = pd.read_csv('FOREX_20180112.csv', sep=',', parse_dates=['Date'])
time_format = '%d-%m-%Y'
forex = pd.concat([forex_11, forex_12], ignore_index=False)  # DataFrame.append was removed in pandas 2.0
forex['Date'] = forex['Date'].dt.strftime(time_format)
tuples = list(forex[['Date', 'Symbol']].itertuples(index=False, name=None))
forex.index = pd.MultiIndex.from_tuples(tuples, names=['Date', 'Symbol'])
forex_open_close = pd.DataFrame(np.array(forex[['Open','Close']]), index=forex.index)
forex_open_close.columns = ['Open', 'Close']
forex_open_close['Day Change'] = forex_open_close['Close'] - forex_open_close['Open']
print(forex_open_close.head())

OK, with credit to ChuHo, the following code solves my problem:
idx = pd.IndexSlice
print(forex_open_close.loc[idx[:,['AUDARS']], :])
It gives this output:
Open Close Day Change
Date Symbol
11-01-2018 AUDARS 14.6193 14.7489 0.1296
12-01-2018 AUDARS 14.7486 14.7758 0.0272
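An alternative to IndexSlice is DataFrame.xs, which takes a cross-section on a named index level; swapping the two index levels and sorting then groups each Symbol's dates together, as the question asked. A sketch on a small frame shaped like forex_open_close (the AEDAUD row for 12-01-2018 is made up):

```python
import pandas as pd

# Small (Date, Symbol)-indexed frame; the 12-01-2018 AEDAUD values are made up.
idx = pd.MultiIndex.from_tuples(
    [("11-01-2018", "AEDAUD"), ("11-01-2018", "AUDARS"),
     ("12-01-2018", "AEDAUD"), ("12-01-2018", "AUDARS")],
    names=["Date", "Symbol"],
)
df = pd.DataFrame(
    {"Open": [0.3470, 14.6193, 0.3448, 14.7486],
     "Close": [0.3448, 14.7489, 0.3460, 14.7758]},
    index=idx,
)

# All rows for one symbol; drop_level=False keeps the Symbol level in the result.
audars = df.xs("AUDARS", level="Symbol", drop_level=False)

# Every symbol with its dates next to each other.
by_symbol = df.swaplevel("Date", "Symbol").sort_index()
```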

Pandas Yahoo Stock API

I am new to pandas (and Python) and am trying to work with the Yahoo API for stock prices.
I need to get the data, loop through it, and grab the dates and values.
Here is the code:
df = pd.get_data_yahoo( symbols = 'AAPL',
start = datetime( 2011, 1, 1 ),
end = datetime( 2012, 1, 1 ),
interval = 'm' )
The results are:
df
Open High Low Close Volume
Date
2011-01-03 325.640015 348.600006 324.840027 339.320007 140234700
2011-02-01 341.299988 364.899994 337.720001 353.210022 127618700
2011-03-01 355.470001 361.669983 326.259979 348.510010 125874700
I can get the dates, but not the month of each date, because the date is the index(?)
How best to loop through the data for this information? This is about processing the data and not sorting or searching it.

If you need to iterate over the rows in your DataFrame and do some processing, pandas.DataFrame.apply() works well.
Code:
Some mock processing code:
def process_data(row):
    # the index becomes the name when converted to a series (row)
    print(row.name.month, row.Close)
Test Code:
import datetime as dt
from pandas_datareader import data
df = data.get_data_yahoo(
'AAPL',
start=dt.datetime(2011, 1, 1),
end=dt.datetime(2011, 5, 1),
interval='m')
print(df)
# process each row
df.apply(process_data, axis=1)
Results:
Open High Low Close Volume \
Date
2011-01-03 325.640015 348.600006 324.840027 339.320007 140234700
2011-02-01 341.299988 364.899994 337.720001 353.210022 127618700
2011-03-01 355.470001 361.669983 326.259979 348.510010 125874700
2011-04-01 351.110016 355.130005 320.160004 350.130005 128252100
Adj Close
Date
2011-01-03 43.962147
2011-02-01 45.761730
2011-03-01 45.152802
2011-04-01 45.362682
1 339.320007
2 353.210022
3 348.51001
4 350.130005
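For the specific goal of getting each row's month, a loop may not be needed at all: a DatetimeIndex exposes .month directly. If a row loop is still required, itertuples() is typically faster than apply(axis=1). A sketch using the Close values from the output above, without the download:

```python
import pandas as pd

# Monthly Close values copied from the output above, with a DatetimeIndex.
df = pd.DataFrame(
    {"Close": [339.320007, 353.210022, 348.510010, 350.130005]},
    index=pd.DatetimeIndex(
        ["2011-01-03", "2011-02-01", "2011-03-01", "2011-04-01"], name="Date"
    ),
)

# Vectorized: the month of every row straight from the index.
months = df.index.month

# Row loop: itertuples() exposes the index as row.Index.
pairs = [(row.Index.month, row.Close) for row in df.itertuples()]
```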

Here is what made my life groovy when working with the data from Yahoo.
First was getting the date from the DataFrame index:
df = df.assign(date=df.index.date)
Here are a few others I found helpful when dealing with the data:
df['diff'] = df['Close'].diff()
df['pct_chg'] = df['Close'].pct_change()
df['hl'] = df['High'] - df['Low']
Pandas is amazing stuff.

I believe this should work for you.
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2013, 1, 1)
end = datetime.datetime(2016, 1, 27)
df = web.DataReader("GOOGL", 'yahoo', start, end)
dates =[]
for x in range(len(df)):
    newdate = str(df.index[x])
    newdate = newdate[0:10]
    dates.append(newdate)
df['dates'] = dates
print(df.head())
print(df.tail())
Also, take a look at the link below for more helpful hints on how to do these kinds of things.
https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#yahoo-finance
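The string-slicing loop above can be replaced by a single vectorized DatetimeIndex.strftime call. A sketch on a small made-up frame rather than the GOOGL download:

```python
import pandas as pd

# Made-up daily data with a DatetimeIndex named Date.
df = pd.DataFrame(
    {"Close": [100.0, 101.5, 99.8]},
    index=pd.date_range("2013-01-02", periods=3, freq="D", name="Date"),
)

# Same strings as str(timestamp)[0:10], without a Python loop.
df["dates"] = df.index.strftime("%Y-%m-%d")
```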

from pandas_datareader import data as pdr
from datetime import date
import yfinance as yf
yf.pdr_override()
import pandas as pd
import requests
import json
import os
from os import listdir
from os.path import isfile, join
# Tickers List
tickers_list = ['AAPL', 'GOOGL','FB', 'WB' , 'MO']
today = date.today()
# Choose the start of the date range to download
start_date= "2010-01-01"
files=[]
def getData(ticker):
    print(ticker)
    data = pdr.get_data_yahoo(ticker, start=start_date, end=today)
    dataname= ticker+'_'+str(today)
    files.append(dataname)
    SaveData(data, dataname)
# Save each ticker's data as a CSV file in the ./data folder (created if it does not exist).
def SaveData(df, filename):
    os.makedirs('./data', exist_ok=True)
    df.to_csv('./data/'+filename+'.csv')
for tik in tickers_list:
    getData(tik)
