I'm trying to use pandas to download historical stock data for all Stockholm Large Cap stocks. It works fine for most symbols, but for some it doesn't.
import pandas_datareader.data as pdr
import datetime
import csv

with open('stockholm_largecap.csv', 'rb') as f:
    reader = csv.reader(f)
    stockholmLargeCap = list(reader)

start = datetime.datetime(1970, 1, 1)
end = datetime.datetime.today()

stockData = {}
for symbol in stockholmLargeCap:
    f = pdr.DataReader(symbol, 'yahoo', start, end)
    print f
The stockholm_largecap.csv file contains all the symbols in alphabetical order, but once I reach certain stocks (for example BETS-B.ST) I get: SymbolWarning: Failed to read symbol: 'BETS-B.ST', replacing with NaN. and the script terminates. Is there some way to continue the program, ignoring the error, and what could cause some stocks not to work?
raise RemoteDataError(msg.format(self.__class__.__name__))
pandas_datareader._utils.RemoteDataError: No data fetched using 'YahooDailyReader'
Use try and except:
import pandas_datareader.data as pdr

for symbol in ['SPY', 'holla']:
    try:
        f = pdr.DataReader(symbol, 'yahoo', "2001-01-01", "2010-01-01")
        print f.head(5)
    except:
        print ('did not find: ' + symbol)
Open High Low Close Volume Adj Close
Date
2001-01-02 132.0000 132.1562 127.5625 128.8125 8737500 95.2724
2001-01-03 128.3125 136.0000 127.6562 135.0000 19431600 99.8488
2001-01-04 134.9375 135.4687 133.0000 133.5468 9219000 98.7740
2001-01-05 133.4687 133.6250 129.1875 129.1875 12911400 95.5497
2001-01-08 129.8750 130.1875 127.6875 130.1875 6625300 96.2893
did not find: holla
I had the same problem while trying to get stocks from a list. I used an exception-handling block, which continued execution of the code despite the symbol warning, viz. SymbolWarning: Failed to read symbol: 'AXZZW', replacing with NaN.
warnings.warn(msg.format(sym), SymbolWarning)
from pandas_datareader._utils import RemoteDataError
from pandas_datareader.data import Options
import pandas_datareader.data as web  # needed for web.DataReader below

# rows holds the list of ticker symbols read from the file
for i in range(len(rows)):
    try:
        df1 = web.DataReader(rows[i], 'yahoo', "2001-01-01", "2010-01-01")
        print("Downloading", i, "/", len(rows), "............")
        print(df1)
    except KeyError:
        print("Data not found at Ticker %s" % rows[i])
        continue
    except RemoteDataError:
        print("Data not found at Ticker %s" % rows[i])
        continue
    print("Success!")
Hope this works for you too!
I'm using the Python yfinance Yahoo API for stock data retrieval. Right now I'm getting the PEG ratio, which is an indicator of a company's price relative to its growth and earnings. I have a CSV downloaded from here: https://www.nasdaq.com/market-activity/stocks/screener.
It has exactly 8000 stocks.
What I do is get the symbol list and iterate over it to access the Yahoo ticker. Then I use the ticker.info method, which returns a dictionary. I repeat this process for all 8000 symbols. It goes at a speed of 6 symbols per minute, which is not viable. Is there a faster way, with another API or another structure? I don't care about the API as long as I can get basic info such as growth, earnings, EPS and so on.
Here is the code:
import pandas as pd
import yfinance as yf

data = pd.read_csv("data/stock_list.csv")
symbols = data['Symbol']

for symbol in symbols:
    stock = yf.Ticker(symbol)
    try:
        if stock.info['pegRatio']:
            print(stock.info['shortName'] + " : " + str(stock.info['pegRatio']))
    except KeyError:
        pass
It seems that when certain data are needed from the Ticker.info attribute, HTTP requests are made to acquire them. Multithreading will help to improve matters. Try this:-
import pandas as pd
import yfinance as yf
import concurrent.futures

data = pd.read_csv('data/stock_list.csv')

def getPR(symbol):
    sn = None
    pr = None
    try:
        stock = yf.Ticker(symbol)
        pr = stock.info['pegRatio']
        sn = stock.info['shortName']
    except Exception:
        pass
    return (sn, pr)

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = {executor.submit(getPR, sym): sym for sym in data['Symbol']}
    for future in concurrent.futures.as_completed(futures):
        sn, pr = future.result()
        if sn:
            print(f'{sn} : {pr}')
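As a design note, ThreadPoolExecutor also accepts a max_workers argument, so the pool size can be capped rather than left at the default. A rough sketch reusing getPR and data from the code above, with an arbitrary cap of 16 threads:
import concurrent.futures

# Cap the number of simultaneous requests to Yahoo; 16 is only an illustrative value.
with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
    futures = {executor.submit(getPR, sym): sym for sym in data['Symbol']}
    for future in concurrent.futures.as_completed(futures):
        sn, pr = future.result()
        if sn:
            print(f'{sn} : {pr}')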
I am trying to use Yahoo Finance to import stock data.
I am using this code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')
plt.style.use("fivethirtyeight")
%matplotlib inline
# For reading stock data from yahoo
from pandas_datareader.data import DataReader
# For time stamps
from datetime import datetime
It is running fine.
from pandas_datareader import data as pdr
import yfinance as yf
yf.pdr_override() # <== that's all it takes :-)
# download dataframe
# The tech stocks we'll use for this analysis
tech_list = ['WIPRO.BO', 'INFY.BO', 'TCS.BO', 'HAPPSTMNDS.BO']
# Set up End and Start times for data grab
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)
# For loop for grabbing yahoo finance data and setting it as a dataframe
for stock in tech_list:
    # Set DataFrame as the Stock Ticker
    globals()[stock] = pdr.get_data_yahoo(stock, start, end)
While running the code below, I am getting an error:
company_list = ['WIPRO.BO', 'INFY.BO', 'TCS.BO', 'HAPPSTMNDS.BO']
company_name = ["Wipro", "Infosys", "Tata_Consultancy_Services", "Happiest_Minds_Technologies"]
for company, com_name in zip(company_list, company_name):
    company["company_name"] = com_name
df = pd.concat(company_list, axis=0)
df.tail(10)
Error Message:
TypeError Traceback (most recent call last)
<ipython-input-6-4753fcd8a7a3> in <module>
3
4 for company, com_name in zip(company_list, company_name):
----> 5 company["company_name"] = com_name
6
7 df = pd.concat(company_list, axis=0)
TypeError: 'str' object does not support item assignment
Please help me in solving this.
Thanks a lot ^_^
ARIMA is used for forecasting univariate time-series data, and I'm not sure which feature you want to forecast. I came up with the code below. (Upvote if it works for you!)
# For loop for grabbing yahoo finance data and setting it as a dataframe
lt = []
for stock in tech_list:
    # Set DataFrame as the Stock Ticker
    temp_df = pdr.get_data_yahoo(stock, start, end)
    temp_df = temp_df.reset_index()
    lt.append(temp_df)

# Each element in the list is a DataFrame
df = pd.concat([lt[0], lt[1], lt[2], lt[3]], axis=0)
df = df.reset_index(drop=True)
print(df.head())
Output:
Date Open High Low Close Adj Close Volume
0 2020-07-09 224.850006 224.850006 219.800003 221.600006 221.103027 198245
1 2020-07-10 221.600006 223.449997 219.449997 222.000000 221.502121 109461
2 2020-07-13 224.000000 229.000000 222.750000 227.550003 227.039673 385205
3 2020-07-14 229.000000 231.600006 224.199997 225.050003 224.545288 449975
4 2020-07-15 237.000000 265.500000 233.800003 262.950012 262.360291 6313161
The name of the fix_yahoo_finance package has been changed to yfinance. So please try this code.
from pandas_datareader import data as pdr
from datetime import datetime
import yfinance as yf

yf.pdr_override() # <== that's all it takes :-)

# download dataframe
# The tech stocks we'll use for this analysis
tech_list = ['WIPRO.BO', 'INFY.BO', 'TCS.BO', 'HAPPSTMNDS.BO']

# Set up End and Start times for data grab
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)

# For loop for grabbing yahoo finance data and setting it as a dataframe
for stock in tech_list:
    # Set DataFrame as the Stock Ticker
    globals()[stock] = pdr.get_data_yahoo(stock, start, end)
The get_data_yahoo() method returns a pandas DataFrame. So, depending on what you want to do, you can generate a list of DataFrames and concatenate the list together, as sketched below.
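For example, a rough sketch of that approach (reusing pdr, tech_list, start and end from the code above, plus the company_name list from the question) could look like this:
import pandas as pd

frames = []
company_name = ["Wipro", "Infosys", "Tata_Consultancy_Services", "Happiest_Minds_Technologies"]
for stock, com_name in zip(tech_list, company_name):
    temp_df = pdr.get_data_yahoo(stock, start, end)
    temp_df["company_name"] = com_name  # tag each DataFrame before concatenating
    frames.append(temp_df)

df = pd.concat(frames, axis=0)
print(df.tail(10))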
I am trying to scrape stock market information for all the stocks in the FTSE 250.
I use Yahoo_Fin to do so. The code works, but I get an error that a stock is delisted.
Hence, I tried to put in an except clause. I read the documentation about try and except, but could not find the correct answer. I don't receive a syntax error, but the except block doesn't do anything.
EDIT: Putting two except exceptions works, below is the updated code.
index_df = pdr.get_data_yahoo(index_name, start_date, end_date, progress=False)
index_df['Percent Change'] = index_df['Adj Close'].pct_change()
index_return = (index_df['Percent Change'] + 1).cumprod()[-1]

for ticker in tickers:
    # Download historical data as CSV for each stock (makes the process faster)
    try:
        df = pdr.get_data_yahoo(ticker, start_date, end_date, progress=False)
        df.to_csv(f'{ticker}.csv')
    except Exception:
        if ticker not in tickers:
            next(ticker)

for ticker in tickers:
    try:
        # Calculating returns relative to the market (returns multiple)
        df['Percent Change'] = df['Adj Close'].pct_change()
        stock_return = (df['Percent Change'] + 1).cumprod()[-1]
        returns_multiple = round((stock_return / index_return), 2)
        returns_multiples.extend([returns_multiple])
    except Exception:
        if ticker not in tickers:
            next(ticker)
Your exception doesn't make sense to me. You are looping over the tickers, then you check if the ticker is not in tickers. That will always be False, so your "next" statement is never going to be executed and it will just continue on.
Seems what you want is something like this:
for ticker in tickers:
    # Download historical data as CSV for each stock (makes the process faster)
    try:
        df = pdr.get_data_yahoo(ticker, start_date, end_date, progress=False)
        df.to_csv(f'{ticker}.csv')

        # Calculating returns relative to the market (returns multiple)
        df['Percent Change'] = df['Adj Close'].pct_change()
        stock_return = (df['Percent Change'] + 1).cumprod()[-1]
        returns_multiple = round((stock_return / index_return), 2)
        returns_multiples.extend([returns_multiple])

        # change the name of the stock index
        print(f'Ticker: {ticker}; Returns Multiple against FTSE 250 : {returns_multiple}\n')
        time.sleep(1)
    except IndexError:
        print(f"Error in ticker: {ticker}, skipping...")
Pass a proper except condition depending on what the traceback shows (I can't tell whether it's FileNotFoundError or IndexError). But I think all you want to do is move on to the next ticker in the list?
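For example, several exception types can be listed in one except clause; a rough sketch assuming the culprit is either FileNotFoundError or IndexError:
for ticker in tickers:
    try:
        df = pdr.get_data_yahoo(ticker, start_date, end_date, progress=False)
        df.to_csv(f'{ticker}.csv')
    except (FileNotFoundError, IndexError) as err:
        # skip the ticker whichever of the suspected exceptions is raised
        print(f"Error in ticker {ticker}: {err}, skipping...")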
I have read numerous StackOverflow threads about looping during try/except statements, using else and finally, if/else statements, and while statements, but none of them address what I want. That or I don't know how to utilise that information to get what I want done.
Basically, I am trying to get adjusted closing stock prices for various companies on a given date. I pasted some dummy data in the code block below to demonstrate (NOTE: you'll have to install pandas and pandas_datareader to get the dummy code to run). The get_stock_adj_close function returns the adj_close price given a ticker and date. The dummy_dataframe contains 4 companies with their tickers and random dates. And the add_days function takes a date and adds any number of days. I would like to append the adjusted close stock prices for each company in the dataframe on the listed date into the stock_prices list.
Because the yahoo stock price database isn't that reliable for older entries and because some dates fall on days when the market is closed, whenever a price isn't available it raises a KeyError: 'Date'. Thus, what I would like to do is keep adding days indefinitely until it finds a date where a price does exist. The problem is it only adds the day once and then raises the same KeyError. I want it to keep adding days until it finds a day where the database has a stock price available and then return back to the dataframe and keep going with the next row. Right now the whole thing breaks on the first GM date (fourth row), which raises the KeyError and the fifth row/second GM date is ignored. Any help is appreciated!
Dummy data:
from datetime import datetime, date, timedelta
import pandas as pd
import pandas_datareader as pdr
from dateutil.relativedelta import relativedelta
def add_days(d, num_days):
    return d + timedelta(days=num_days)

def get_stock_adj_close(ticker, chosen_date):
    stock_df = pdr.get_data_yahoo(ticker, start=chosen_date, end=chosen_date)
    return stock_df.iloc[0]['Adj Close']

d = {'TICKER': ['AMD', 'AMD', 'CHTR', 'GM'], 'DATE': [datetime(2020,2,4), datetime(2019,2,8), datetime(2019,1,31), datetime(2010,4,7)]}
dummy_dataframe = pd.DataFrame(data=d)

stock_prices = []
for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    try:
        stock_price = get_stock_adj_close(row['TICKER'], given_date)
        print(stock_price)
        stock_prices.append(stock_price)
    except KeyError:
        given_date = add_days(given_date, 1)
        stock_price = get_stock_adj_close(row['TICKER'], given_date)
        stock_prices.append(stock_price)

print(stock_prices)
I think a while loop will help you. For example:
for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    stock_price_found = False
    while not stock_price_found:
        try:
            stock_price = get_stock_adj_close(row['TICKER'], given_date)
            print(stock_price)
            stock_prices.append(stock_price)
            stock_price_found = True
        except KeyError:
            given_date = add_days(given_date, 1)
Or you can also use while True together with break:
for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    while True:
        try:
            stock_price = get_stock_adj_close(row['TICKER'], given_date)
            print(stock_price)
            stock_prices.append(stock_price)
            break
        except KeyError:
            given_date = add_days(given_date, 1)
Don't forget to make sure that you don't get stuck in an infinite loop; it would also help to add another exit condition to the while loop, for example giving up after 10 failures, as sketched below.
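A rough sketch of that suggestion, reusing dummy_dataframe, stock_prices, get_stock_adj_close and add_days from above; the cap of 10 attempts is only an illustrative choice:
MAX_ATTEMPTS = 10  # illustrative cap on how many extra days to try

for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    for attempt in range(MAX_ATTEMPTS):
        try:
            stock_prices.append(get_stock_adj_close(row['TICKER'], given_date))
            break
        except KeyError:
            given_date = add_days(given_date, 1)
    else:
        # for/else: runs only when no break happened, i.e. every attempt failed
        print(f"No price found for {row['TICKER']} within {MAX_ATTEMPTS} days")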
I've used:
data = DataReader("yhoo", "yahoo", datetime.datetime(2000, 1, 1),
datetime.datetime.today())
in pandas (Python) to get historical data for Yahoo, but it cannot show today's price (the market has not yet closed). How can I resolve this problem? Thanks in advance.
import pandas
import pandas.io.data
import datetime
import urllib2
import csv

YAHOO_TODAY = "http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=sd1ohgl1vl1"

def get_quote_today(symbol):
    response = urllib2.urlopen(YAHOO_TODAY % symbol)
    reader = csv.reader(response, delimiter=",", quotechar='"')
    for row in reader:
        if row[0] == symbol:
            return row

## main ##
symbol = "TSLA"

history = pandas.io.data.DataReader(symbol, "yahoo", start="2014/1/1")
print history.tail(2)

today = datetime.date.today()
df = pandas.DataFrame(index=pandas.DatetimeIndex(start=today, end=today, freq="D"),
                      columns=["Open", "High", "Low", "Close", "Volume", "Adj Close"],
                      dtype=float)

row = get_quote_today(symbol)
df.ix[0] = map(float, row[2:])

history = history.append(df)
print "today is %s" % today
print history.tail(2)
Just to complete perigee's answer: it cost me quite some time to find a way to append the data. The output of the code above is:
Open High Low Close Volume Adj Close
Date
2014-02-04 180.7 181.60 176.20 178.73 4686300 178.73
2014-02-05 178.3 180.59 169.36 174.42 7268000 174.42
today is 2014-02-06
Open High Low Close Volume Adj Close
2014-02-05 178.30 180.59 169.36 174.420 7268000 174.420
2014-02-06 176.36 180.11 176.00 178.793 5199297 178.793
One way to work around it is to use urllib to fetch the data from:
http://download.finance.yahoo.com/d/quotes.csv?s=yhoo&f=sd1ohgl1l1v
and then add it to the dataframe.
This code uses the pandas read_csv method to get the new quote from Yahoo, and it checks whether the new quote is an update to the current date or a new date, in order to either update the last record in history or append a new record.
If you add a while True loop and a sleep around the new_quote section, you can have the code refresh the quote during the day (a sketch of this is shown after the code below).
It also duplicates the last trade price to fill in both the Close and the Adjusted Close, given that intraday close and adjusted close are always the same value.
import pandas as pd
import pandas.io.data as web

def get_quote_today(symbol):
    url = "http://download.finance.yahoo.com/d/quotes.csv?s=%s&f=d1t1ohgl1vl1"
    new_quote = pd.read_csv(url % symbol,
                            names=[u'Date', u'time', u'Open', u'High', u'Low',
                                   u'Close', u'Volume', u'Adj Close'])
    # generate timestamp:
    stamp = pd.to_datetime(new_quote.Date + " " + new_quote.time)
    new_quote.index = stamp
    return new_quote.iloc[:, 2:]

if __name__ == "__main__":
    symbol = "TSLA"

    history = web.DataReader(symbol, "yahoo", start="2014/1/1")
    print history.tail()

    new_quote = get_quote_today(symbol)
    if new_quote.index > history.index[-1]:
        if new_quote.index[-1].date() == history.index[-1].date():
            # if both quotes are for the same date, update history's last record.
            history.iloc[-1] = new_quote.iloc[-1]
        else:
            history = history.append(new_quote)
    history.tail()
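A rough sketch of the while True / sleep refresh idea mentioned above, reusing get_quote_today, symbol and history from this code; the 60-second interval is only an illustrative choice:
import time

# Illustrative refresh loop: re-fetch the quote every 60 seconds and
# update or append the last row of history, mirroring the logic above.
while True:
    new_quote = get_quote_today(symbol)
    if new_quote.index[-1] > history.index[-1]:
        if new_quote.index[-1].date() == history.index[-1].date():
            history.iloc[-1] = new_quote.iloc[-1]
        else:
            history = history.append(new_quote)
    print(history.tail(1))
    time.sleep(60)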
From trying this out and looking at the dataframe, it doesn't look possible. You tell it to go from a specific day until today, yet the dataframe stops at May 31st, 2013. This tells me that Yahoo probably has not made the data available for the past couple of days, or somehow pandas is just not picking it up. It is not just missing one day, it is missing three.
If I do the following:
>>> df = DataReader("yhoo", "yahoo", datetime.datetime(2013, 6, 1),datetime.datetime.today())
>>> len(df)
0
it shows me that there simply is no data to pick up in those days so far. If there is some way around this then I cannot figure it out, but it just seems that the data is not available for you yet, which is hard to believe.
The module from pandas doesn't work anymore, because Google and Yahoo no longer provide support for it. So you can create a function to take the data directly from Google Finance using the URL. Here is part of the code to do this:
import csv
import datetime
import re
import codecs
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
You can write a function to get data from Google Finance using the URL; you have to indent the part below.
# You have to indent this part
def get_google_finance_intraday(ticker, period=60, days=1, exchange='NASD'):
    """
    Retrieve intraday stock data from Google Finance.

    Parameters
    ----------------
    ticker : str
        Company ticker symbol.
    period : int
        Interval between stock values in seconds.
        i = 60 corresponds to one minute tick data
        i = 86400 corresponds to daily data
    days : int
        Number of days of data to retrieve.
    exchange : str
        Exchange from which the quotes should be fetched

    Returns
    ---------------
    df : pandas.DataFrame
        DataFrame containing the opening price, high price, low price,
        closing price, and volume. The index contains the times associated with
        the retrieved price values.
    """
    # build url
    url = 'https://finance.google.com/finance/getprices?p={days}d&f=d,o,h,l,c,v&q={ticker}&i={period}&x={exchange}'.format(ticker=ticker, period=period, days=days, exchange=exchange)

    page = requests.get(url)
    reader = csv.reader(codecs.iterdecode(page.content.splitlines(), "utf-8"))
    columns = ['Open', 'High', 'Low', 'Close', 'Volume']
    rows = []
    times = []
    for row in reader:
        if re.match(r'^[a\d]', row[0]):
            if row[0].startswith('a'):
                start = datetime.datetime.fromtimestamp(int(row[0][1:]))
                times.append(start)
            else:
                times.append(start + datetime.timedelta(seconds=period * int(row[0])))
            rows.append(list(map(float, row[1:])))
    if len(rows):
        return pd.DataFrame(rows, index=pd.DatetimeIndex(times, name='Date'), columns=columns)
    else:
        return pd.DataFrame(rows, index=pd.DatetimeIndex(times, name='Date'))
Now you can just call the function with the ticker that you want, in my case AAPL, and the result is a pandas DataFrame containing the opening price, high price, low price, closing price, and volume.
ticker = 'AAPL'
period = 60
days = 1
exchange = 'NASD'
df = get_google_finance_intraday(ticker, period=period, days=days)
df
The simplest way to extract Indian stock price data into Python is to use the nsepy library.
In case you do not have the nsepy library do the following:
pip install nsepy
The following code allows you to extract the HDFC Bank stock price for roughly five years (2015 to 2020).
from nsepy import get_history
from datetime import date
dfc=get_history(symbol="HDFCBANK",start=date(2015,5,12),end=date(2020,5,18))
This is so far the easiest code I have found.
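As a quick sanity check (assuming get_history returns a date-indexed DataFrame with a Close column, which is how nsepy normally behaves), you could inspect and plot the result:
import matplotlib.pyplot as plt

print(dfc.tail())  # last few rows of the downloaded history
dfc['Close'].plot(title='HDFCBANK close price')  # simple sanity-check plot
plt.show()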