I am new to Pandas (and Python) and trying to work with the Yahoo API for stock prices.
I need to get the data, loop through it and grab the dates and values.
Here is the code:
df = pd.get_data_yahoo(symbols='AAPL',
                       start=datetime(2011, 1, 1),
                       end=datetime(2012, 1, 1),
                       interval='m')
results are:
df
Open High Low Close Volume
Date
2011-01-03 325.640015 348.600006 324.840027 339.320007 140234700
2011-02-01 341.299988 364.899994 337.720001 353.210022 127618700
2011-03-01 355.470001 361.669983 326.259979 348.510010 125874700
I can get the dates, but not the month value, because the date is the index(?).
What is the best way to loop through the data for this information? This is about processing the data, not sorting or searching it.
If you need to iterate over the rows in your dataframe, and do some processing, then pandas.DataFrame.apply() works great.
Code:
Some mock processing code...
def process_data(row):
    # the index becomes the name when converted to a series (row)
    print(row.name.month, row.Close)
Test Code:
import datetime as dt
from pandas_datareader import data
df = data.get_data_yahoo(
    'AAPL',
    start=dt.datetime(2011, 1, 1),
    end=dt.datetime(2011, 5, 1),
    interval='m')
print(df)
# process each row
df.apply(process_data, axis=1)
Results:
Open High Low Close Volume \
Date
2011-01-03 325.640015 348.600006 324.840027 339.320007 140234700
2011-02-01 341.299988 364.899994 337.720001 353.210022 127618700
2011-03-01 355.470001 361.669983 326.259979 348.510010 125874700
2011-04-01 351.110016 355.130005 320.160004 350.130005 128252100
Adj Close
Date
2011-01-03 43.962147
2011-02-01 45.761730
2011-03-01 45.152802
2011-04-01 45.362682
1 339.320007
2 353.210022
3 348.51001
4 350.130005
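The same row.name pattern can be checked offline on a small synthetic frame (the prices below are made up stand-ins, no network needed):

```python
import pandas as pd

# a tiny frame with a DatetimeIndex standing in for the Yahoo download
df = pd.DataFrame(
    {'Close': [339.32, 353.21, 348.51]},
    index=pd.to_datetime(['2011-01-03', '2011-02-01', '2011-03-01']))
df.index.name = 'Date'

months = []

def process_data(row):
    # inside apply, the DatetimeIndex value becomes row.name
    months.append(row.name.month)

df.apply(process_data, axis=1)
print(months)  # [1, 2, 3]
```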
Here is what made my life groovy when trying to work with the data from Yahoo.
First was getting the date out of the dataframe index.
df = df.assign(date=df.index.date)
Here are a few other columns I found helpful when dealing with the data.
df['diff'] = df['Close'].diff()
df['pct_chg'] = df['Close'].pct_change()
df['hl'] = df['High'] - df['Low']
Pandas is amazing stuff.
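To see exactly what each of those helper columns computes, here is a minimal sketch on synthetic prices (the numbers are made up for illustration):

```python
import pandas as pd

# synthetic prices, just to show what each helper column measures
df = pd.DataFrame({'High': [12.0, 13.0],
                   'Low': [10.0, 11.0],
                   'Close': [11.0, 12.1]})
df['diff'] = df['Close'].diff()           # change vs the previous row
df['pct_chg'] = df['Close'].pct_change()  # relative change vs the previous row
df['hl'] = df['High'] - df['Low']         # high-low range per row
print(df['hl'].tolist())  # [2.0, 2.0]
```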
I believe this should work for you.
import pandas_datareader.data as web
import datetime
start = datetime.datetime(2013, 1, 1)
end = datetime.datetime(2016, 1, 27)
df = web.DataReader("GOOGL", 'yahoo', start, end)
dates = []
for x in range(len(df)):
    newdate = str(df.index[x])
    newdate = newdate[0:10]
    dates.append(newdate)
df['dates'] = dates
print(df.head())
print(df.tail())
Also, take a look at the link below for more helpful hints on how to do these kinds of things.
https://pandas-datareader.readthedocs.io/en/latest/remote_data.html#yahoo-finance
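The character-slicing loop above can also be replaced by a single vectorized call; a minimal sketch on synthetic data (the dates and prices are made up):

```python
import pandas as pd

# synthetic frame with a DatetimeIndex, standing in for the DataReader result
df = pd.DataFrame({'Close': [1.0, 2.0]},
                  index=pd.to_datetime(['2013-01-02', '2013-01-03']))
# one vectorized strftime call instead of looping over df.index
df['dates'] = df.index.strftime('%Y-%m-%d')
print(df['dates'].tolist())  # ['2013-01-02', '2013-01-03']
```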
from pandas_datareader import data as pdr
from datetime import date
import yfinance as yf
yf.pdr_override()
# Tickers List
tickers_list = ['AAPL', 'GOOGL','FB', 'WB' , 'MO']
today = date.today()
# We can get data for whatever date range we choose
start_date = "2010-01-01"
files = []

def getData(ticker):
    print(ticker)
    data = pdr.get_data_yahoo(ticker, start=start_date, end=today)
    dataname = ticker + '_' + str(today)
    files.append(dataname)
    SaveData(data, dataname)

# Create a data folder to save these data files in.
def SaveData(df, filename):
    df.to_csv('./data/' + filename + '.csv')

for tik in tickers_list:
    getData(tik)
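Since to_csv raises FileNotFoundError when ./data does not exist, one way to make SaveData robust is to create the folder on demand; a small sketch with a hypothetical tiny frame in place of a real download:

```python
import os
import pandas as pd

def SaveData(df, filename):
    # create the folder first so to_csv does not fail when it is missing
    os.makedirs('./data', exist_ok=True)
    df.to_csv('./data/' + filename + '.csv')

# hypothetical stand-in data instead of a pdr.get_data_yahoo download
SaveData(pd.DataFrame({'Close': [1.0]}), 'AAPL_demo')
print(os.path.isfile('./data/AAPL_demo.csv'))  # True
```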
I am trying to download data and add statistics and economic indicators, however my data is on a daily basis and the indicators are on a yearly basis.
I tried to store year/indicator pairs as a dictionary, go through each day in the dates column returned from yfinance, and populate a list with the GDP Deflator for each day using the dictionary. Then I convert that list to a DataFrame, add it as a column to the dataframe returned from yfinance, and save it as a csv.
However, when I look at the csv file, the GDP deflator for 2004 shows up for the last day in 2003, and for the last two days in 2004 the GDP Deflator is that of 2005.
What am I doing wrong?
code below:
import pandas as pd
import yfinance as yf
import world_bank_data as wb
df = pd.DataFrame()  # Empty DataFrame
GDPD = []
df = yf.download(tickers='USDSGD=X', period='max', interval='1d')
df.reset_index(inplace=True)
date = df['Date']
SGD_def_dict = {"Year": [], "GDP_Deflator": []}
for i in range(len(date)):
    if date[i].year in SGD_def_dict['Year']:
        GDPD.append(list(SGD_def_dict.values())[-1][-1])
    else:
        SGD_def_dict["Year"].append(date[i].year)
        try:
            SGD_def_dict["GDP_Deflator"].append(wb.get_series('NY.GDP.DEFL.ZS', country='SGP', date=date[i].year, id_or_value='id', simplify_index=True))
        except:
            SGD_def_dict["GDP_Deflator"].append(float("nan"))
        # GDPD.append(list(SGD_def_dict.values())[-1][-1])
df2 = pd.DataFrame({"GDP_Deflator": GDPD})
df["GDP_Deflator"] = df2
df.to_csv(r'C:..WBTEST.csv')
You need to match the year of each day to the corresponding GDP deflator in the dictionary, and then use the same value for all days in that year.
import pandas as pd
import yfinance as yf
import world_bank_data as wb

df = yf.download(tickers='USDSGD=X', period='max', interval='1d')
df.reset_index(inplace=True)
date = df['Date']
SGD_def_dict = {"Year": [], "GDP_Deflator": []}
for i in range(len(date)):
    year = date[i].year
    if year not in SGD_def_dict['Year']:
        SGD_def_dict["Year"].append(year)
        try:
            SGD_def_dict["GDP_Deflator"].append(wb.get_series('NY.GDP.DEFL.ZS', country='SGP', date=year, id_or_value='id', simplify_index=True))
        except:
            SGD_def_dict["GDP_Deflator"].append(float("nan"))
df['Year'] = df['Date'].dt.year
df = df.merge(pd.DataFrame(SGD_def_dict), on='Year')
df.drop(['Year'], axis=1, inplace=True)
df.to_csv(r'C:..WBTEST.csv')
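The merge-on-year trick can be seen in isolation on synthetic data (prices and deflator values below are invented, and a left merge is used here so every day keeps a row even if a year were missing from the indicator table):

```python
import pandas as pd

# synthetic daily prices and a one-row-per-year indicator table
daily = pd.DataFrame({
    'Date': pd.to_datetime(['2003-12-31', '2004-01-02', '2004-01-05']),
    'Close': [1.00, 1.10, 1.20]})
indicator = pd.DataFrame({'Year': [2003, 2004],
                          'GDP_Deflator': [80.0, 82.5]})

# merging on the year broadcasts each yearly value to every day in that year
daily['Year'] = daily['Date'].dt.year
merged = daily.merge(indicator, on='Year', how='left').drop(columns='Year')
print(merged['GDP_Deflator'].tolist())  # [80.0, 82.5, 82.5]
```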
I have the below code:
import yfinance as yf
import pandas as pd
import datetime as dt
end = dt.datetime.today()
start = end - dt.timedelta(59)
tickers = ['WBA', 'HD']
ohlcv = {}
df = pd.DataFrame
df = yf.download(tickers, group_by=tickers, start=start, end=end, interval='5m')
df['h-l'] = abs(df.High - df.Low)
df['h-pc'] = abs(df.High - df['Adj Close'].shift(1))
df['l-pc'] = abs(df.Low - df['Adj Close'].shift(1))
df['tr'] = df[['h-l', 'h-pc', 'l-pc']].max(axis=1)
df['atr'] = df['tr'].rolling(window=n, min_periods=n).mean()
When I try to run it, I get the error below:
return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'High'
I tried using this code:
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
but the report extracted has mathematical errors because there is no separation between the tickers.
What I actually need is, for each ticker in the tickers list, a column called "h-l" that subtracts the low of that row from the high of that row, and so on.
Option 1: Multi-Level Column Names
Multi-level columns are accessed by passing a tuple:
df[('WBA', 'High')]
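Tuple selection and tuple assignment can be tried offline on a synthetic two-ticker frame shaped like a yf.download result (the prices are made up):

```python
import pandas as pd

# synthetic multi-level columns like yf.download(..., group_by=tickers) returns
cols = pd.MultiIndex.from_product([['WBA', 'HD'], ['High', 'Low']])
df = pd.DataFrame([[46.07, 45.49, 253.96, 252.36]], columns=cols)

# a tuple selects one (ticker, field) column from the MultiIndex
print(df[('WBA', 'High')].iloc[0])  # 46.07
# assigning with a new tuple adds a column under that ticker
df[('WBA', 'h-l')] = df[('WBA', 'High')] - df[('WBA', 'Low')]
```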
Package versions used:
pd.__version__ is at least '1.0.5'
yf.__version__ is '0.1.54'
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
end = datetime.today()
start = end - timedelta(59)
tickers = ['WBA', 'HD']
df = yf.download(tickers,group_by=tickers,start=start,end=end,interval='5m')
# iterate over level 0 ticker names
for ticker in tickers:
    df[(ticker, 'h-l')] = abs(df[(ticker, 'High')] - df[(ticker, 'Low')])
    df[(ticker, 'h-pc')] = abs(df[(ticker, 'High')] - df[(ticker, 'Adj Close')].shift(1))
    df[(ticker, 'l-pc')] = abs(df[(ticker, 'Low')] - df[(ticker, 'Adj Close')].shift(1))
    df[(ticker, 'tr')] = df[[(ticker, 'h-l'), (ticker, 'h-pc'), (ticker, 'l-pc')]].max(axis=1)
    # df[(ticker, 'atr')] = df[(ticker, 'tr')].rolling(window=n, min_periods=n).mean()  # not included because n is not defined

# sort the columns
df = df.reindex(sorted(df.columns), axis=1)
# display(df.head())
HD WBA
Adj Close Close High Low Open Volume h-l h-pc l-pc tr Adj Close Close High Low Open Volume h-l h-pc l-pc tr
Datetime
2020-06-08 09:30:00-04:00 253.937500 253.937500 253.960007 252.360001 252.490005 210260.0 1.600006 NaN NaN 1.600006 46.049999 46.049999 46.070000 45.490002 45.490002 239860.0 0.579998 NaN NaN 0.579998
2020-06-08 09:35:00-04:00 253.470001 253.470001 254.339996 253.220093 253.990005 95906.0 1.119904 0.402496 0.717407 1.119904 46.330002 46.330002 46.330002 46.040001 46.070000 104259.0 0.290001 0.280003 0.009998 0.290001
2020-06-08 09:40:00-04:00 253.580002 253.580002 253.829895 252.955002 253.429993 55868.0 0.874893 0.359894 0.514999 0.874893 46.610001 46.610001 46.660000 46.240002 46.330002 113174.0 0.419998 0.329998 0.090000 0.419998
2020-06-08 09:45:00-04:00 253.740005 253.740005 253.929993 253.289993 253.529999 61892.0 0.639999 0.349991 0.290009 0.639999 46.880001 46.880001 46.950001 46.624100 46.624100 121388.0 0.325901 0.340000 0.014099 0.340000
2020-06-08 09:50:00-04:00 253.703400 253.703400 253.910004 253.419998 253.740005 60809.0 0.490005 0.169998 0.320007 0.490005 46.919998 46.919998 46.990002 46.820000 46.880001 154239.0 0.170002 0.110001 0.060001 0.170002
Option 2: Single-Level Column Names
As demonstrated in How to deal with multi-level column names downloaded with yfinance?, it's easier to deal with single-level column names.
With the tickers in a column instead of in multi-level column headers, use pandas.DataFrame.groupby on the Ticker column.
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
tickers = ['WBA', 'HD']
end = datetime.today()
start = end - timedelta(59)
df = yf.download(tickers, group_by='Ticker', start=start, end=end, interval='5m')
# create single level column names
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
# function with calculations
def my_calculations(df):
    df['h-l'] = abs(df.High - df.Low)
    df['h-pc'] = abs(df.High - df['Adj Close'].shift(1))
    df['l-pc'] = abs(df.Low - df['Adj Close'].shift(1))
    df['tr'] = df[['h-l', 'h-pc', 'l-pc']].max(axis=1)
    # df['atr'] = df['tr'].rolling(window=n, min_periods=n).mean()  # n is not defined in the question
    return df
# apply the function
df_updated = df.reset_index().groupby('Ticker').apply(my_calculations).sort_values(['Ticker', 'Date'])
Here are some columns I've created: the percent change from the previous day, the range, and the percent range.
df['% Change'] = (df['Adj Close'] / df['Adj Close'].shift(1))-1
df['Range'] = df['High'] - df['Low']
df['% Range'] = df['Range'] / df['Open']
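These three columns can be exercised on a couple of synthetic OHLC rows (values invented for illustration):

```python
import pandas as pd

# synthetic OHLC rows to show what each column measures
df = pd.DataFrame({'Open': [100.0, 102.0],
                   'High': [105.0, 104.0],
                   'Low': [99.0, 101.0],
                   'Adj Close': [104.0, 102.0]})
df['% Change'] = (df['Adj Close'] / df['Adj Close'].shift(1)) - 1
df['Range'] = df['High'] - df['Low']
df['% Range'] = df['Range'] / df['Open']
print(df['Range'].tolist())  # [6.0, 3.0]
```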
I have been trying to extract stock prices using pandas_datareader.data, but
I kept receiving an error message.
I have checked other threads relating to this problem, and I have tried installing the package with both conda install DataReader and pip install DataReader.
import pandas as pd
import datetime
from pandas import Series,DataFrame
import pandas_datareader.data as web
pandas_datareader.__version__
'0.6.0'
start=datetime.datetime(2009,1,1)
end=datetime.datetime(2019,1,1)
df=web.DataReader( 'AT&T Inc T',start,end)
df.head()
My expected result should be a data frame with all the features and rows of the stock.
Below is the error message I got:
<ipython-input-45-d75bedd6b2dd> in <module>
1 start=datetime.datetime(2009,1,1)
2 end=datetime.datetime(2019,1,1)
----> 3 df=web.DataReader( 'AT&T Inc T',start,end)
4 df.head()
~\Anaconda3\lib\site-packages\pandas_datareader\data.py in DataReader(name,
data_source, start, end, retry_count, pause, session, access_key)
456 else:
457 msg = "data_source=%r is not implemented" % data_source
--> 458 raise NotImplementedError(msg)
459
460
NotImplementedError: data_source=datetime.datetime(2009, 1, 1, 0, 0) is not implemented
Please, how do I fix this problem?
Thanks.
The following worked:
import pandas as pd
import datetime
from pandas import Series,DataFrame
import pandas_datareader
import pandas_datareader.data as web
pandas_datareader.__version__
start = datetime.datetime(2009, 1, 1)
end = datetime.datetime(2019, 1, 1)
df = web.DataReader('T', "yahoo", start, end)
print(df.head())
The output is as follows:
High Low ... Volume Adj Close
Date ...
2009-01-02 29.459999 28.430000 ... 21879800.0 16.438549
2009-01-05 28.889999 28.059999 ... 32414700.0 15.885386
2009-01-06 28.700001 28.000000 ... 28746100.0 15.812749
2009-01-07 27.650000 27.000000 ... 30532700.0 15.427205
2009-01-08 27.350000 26.820000 ... 21431200.0 15.410195
[5 rows x 6 columns]
This is how I would do it.
import pandas as pd
import numpy as np
from pandas_datareader import data as wb
import datetime as dt
start = '2019-6-20'
end = '2019-7-20'
tickers = ['CSCO', 'AXP', 'HD', 'PG']
thelen = len(tickers)
price_data = []
for ticker in tickers:
    prices = wb.DataReader(ticker, start=start, end=end, data_source='yahoo')[['Adj Close']]
    price_data.append(prices.assign(ticker=ticker)[['ticker', 'Adj Close']])
df = pd.concat(price_data)
df.dtypes
df.head()
df.shape
pd.set_option('display.max_columns', 500)
df = df.reset_index()
df = df.set_index('Date')
table = df.pivot(columns='ticker')
# By specifying col[1] in below list comprehension
# You can select the stock names under multi-level column
table.columns = [col[1] for col in table.columns]
table.head()
Result:
AXP CSCO HD PG
Date
2019-06-20 124.530563 57.049965 211.250000 111.021019
2019-06-21 124.341156 56.672348 209.389999 110.484497
2019-06-24 123.752991 56.821407 205.500000 111.607231
2019-06-25 122.776054 55.728306 204.740005 111.001152
2019-06-26 123.204704 56.245041 206.419998 109.023956
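The pivot step at the heart of this answer can be reproduced offline on a few synthetic long-format rows (prices are made up, patterned after the table above):

```python
import pandas as pd

# long-format prices (synthetic): one row per (Date, ticker) pair
df = pd.DataFrame({
    'Date': pd.to_datetime(['2019-06-20', '2019-06-20',
                            '2019-06-21', '2019-06-21']),
    'ticker': ['AXP', 'CSCO', 'AXP', 'CSCO'],
    'Adj Close': [124.53, 57.05, 124.34, 56.67]}).set_index('Date')

table = df.pivot(columns='ticker')
table.columns = [col[1] for col in table.columns]  # keep only the ticker level
print(list(table.columns))  # ['AXP', 'CSCO']
```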
So I'm trying to grab per-minute stock data over a one-year span. I know the Google Finance API doesn't work anymore, so I did some digging around and found some code from an old GitHub thread that can fetch a range of up to 5 days from Yahoo Finance data; however, it does no more than that, and when I pass a keyword like '1Y' it defaults to 1 day. Here is the code below:
import requests
import pandas as pd
import arrow
import datetime
import os
def get_quote_data(symbol='AAPL', data_range='5d', data_interval='1m'):
    res = requests.get('https://query1.finance.yahoo.com/v8/finance/chart/{symbol}?range={data_range}&interval={data_interval}'.format(**locals()))
    data = res.json()
    body = data['chart']['result'][0]
    dt = datetime.datetime
    dt = pd.Series(map(lambda x: arrow.get(x).datetime.replace(tzinfo=None), body['timestamp']), name='Datetime')
    df = pd.DataFrame(body['indicators']['quote'][0], index=dt)
    dg = pd.DataFrame(body['timestamp'])
    df = df.loc[:, ('open', 'high', 'low', 'close', 'volume')]
    df.dropna(inplace=True)  # removing NaN rows
    df.columns = ['OPEN', 'HIGH', 'LOW', 'CLOSE', 'VOLUME']  # Renaming columns in pandas
    return df
body['meta']['validRanges'] tells you:
['1d', '5d', '1mo', '3mo', '6mo', '1y', '2y', '5y', '10y', 'ytd', 'max']
You are requesting 1Y instead of 1y. This difference is important.
By the way, you can load the timestamps much more easily like this:
pd.to_datetime(body['timestamp'], unit='s')
print('stock ticker: {0}'.format(get_quote_data(symbol='AAPL', data_range='1d', data_interval='1m')))
works
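The to_datetime conversion can be checked without hitting the API at all; the epoch values below are hypothetical stand-ins for body['timestamp']:

```python
import pandas as pd

# epoch seconds like those returned in body['timestamp'] (hypothetical values)
stamps = [1546300800, 1546304400]
idx = pd.to_datetime(stamps, unit='s')
print(idx[0])  # 2019-01-01 00:00:00
```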
I have five stock portfolios that I have imported from Yahoo! finance and need to create a DataFrame with the closing prices for 2016 of all of the stocks. However, I'm struggling to label the columns with the corresponding stock names.
import pandas_datareader.data as web
import pandas as pd
from pandas import Series, DataFrame
import numpy as np
import datetime
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2016, 12, 31)
NFLX = web.DataReader("NFLX", 'yahoo', start, end)
AAPL = web.DataReader("AAPL", 'yahoo', start, end)
GOOGL = web.DataReader("GOOGL", 'yahoo', start, end)
FB = web.DataReader("FB", 'yahoo', start, end)
TSLA = web.DataReader("TSLA", 'yahoo', start, end)
df_NFLX = pd.DataFrame(NFLX['Close'])
df_AAPL = pd.DataFrame(AAPL['Close'])
df_GOOGL = pd.DataFrame(GOOGL['Close'])
df_FB = pd.DataFrame(FB['Close'])
df_TSLA = pd.DataFrame(TSLA['Close'])
frames = [df_NFLX, df_AAPL, df_GOOGL, df_FB, df_TSLA]
result = pd.concat(frames, axis = 1)
result = result.rename(columns = {'Two':'N'})
result
My code produces this - and I want to title each column accordingly.
Out[15]:
Close Close Close Close Close
Date
2016-01-04 109.959999 105.349998 759.440002 102.220001 223.410004
2016-01-05 107.660004 102.709999 761.530029 102.730003 223.429993
2016-01-06 117.680000 100.699997 759.330017 102.970001 219.039993
2016-01-07 114.559998 96.449997 741.000000 97.919998 215.649994
2016-01-08 111.389999 96.959999 730.909973 97.330002 211.000000
2016-01-11 114.970001 98.529999 733.070007 97.510002 207.850006
2016-01-12 116.580002 99.959999 745.340027 99.370003 209.970001
A simple way to patch up the code you've written is to just assign a list of names to result.columns.
result.columns = ['NFLX', 'AAPL', 'GOOGL', 'FB', 'TSLA']
However, there are ways to make large chunks of your code more concise which also allow you to specify the stock names as column names cleanly. I would go back to the beginning and (after defining start and end) start by creating a list of the stock tickers you want to fetch.
start = datetime.datetime(2016, 1, 1)
end = datetime.datetime(2016, 12, 31)
tickers = ['NFLX', 'AAPL', 'GOOGL', 'FB', 'TSLA']
Then you can construct all the data frames in a loop of some kind. If you want only the Close column, you can extract that column immediately, and in fact you can make a dict out of all these columns and then construct a DataFrame directly from that dict.
result = DataFrame({t: web.DataReader(t, 'yahoo', start, end)['Close']
                    for t in tickers})
An alternative would be to put all the stock data in a Panel, which would be useful if you might want to work with other columns. (Note that Panel has since been deprecated and was removed in pandas 1.0; a DataFrame with a MultiIndex is the modern equivalent.)
p = pd.Panel({t: web.DataReader(t, 'yahoo', start, end) for t in tickers})
Then you can extract the Close figures with
result = p[:,:,'Close']
You'll notice it has the proper column labels automatically.
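The dict-of-Series construction can be seen in miniature with synthetic Close series standing in for the DataReader results (the prices are made up):

```python
import pandas as pd

idx = pd.to_datetime(['2016-01-04', '2016-01-05'])
# stand-ins for the Close series each web.DataReader call would return
closes = {'NFLX': pd.Series([109.96, 107.66], index=idx),
          'AAPL': pd.Series([105.35, 102.71], index=idx)}

# the dict keys become the column labels automatically
result = pd.DataFrame(closes)
print(list(result.columns))  # ['NFLX', 'AAPL']
```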
To rename the columns in the constructed table, you can change this:
df_NFLX = pd.DataFrame(NFLX['Close'])
to this:
df_NFLX = pd.DataFrame(NFLX['Close']).rename(columns={'Close': 'NFLX'})