Handling Exceptions with Bulk API Requests - python

I am pulling data from an API that allows batch requests, and then storing the data to a DataFrame. When there is an exception with one of the items being looked up via the API, I want to either skip that item entirely (or write zeroes to the DataFrame) and then go on to the next item.
But my issue is that because the API data is being accessed in bulk (i.e., not looping through each item in the list), an exception for any item in the list breaks the program. So how can I elegantly handle exceptions without looping through each individual item in the tickers list?
Note that removing ERROR from the tickers list will enable the program to run successfully:
import os
from iexfinance.stocks import Stock
import iexfinance

# Set IEX Finance API Token (Sandbox)
os.environ['IEX_API_VERSION'] = 'iexcloud-sandbox'
os.environ['IEX_TOKEN'] = 'Tpk_a4bc3e95d4c94810a3b2d4138dc81c5d'

# List of companies to get data for
tickers = ['MSFT', 'ERROR', 'AMZN']
batch = Stock(tickers, output_format='pandas')

income_ttm = 0
try:
    # Get income from last 4 quarters, sum it, and store to temp DataFrame
    df_income = batch.get_income_statement(period="year")
    print(df_income)
except (iexfinance.utils.exceptions.IEXQueryError, iexfinance.utils.exceptions.IEXSymbolError) as e:
    pass

This should do the trick:
import os
from copy import deepcopy
from iexfinance.stocks import Stock
import iexfinance

def find_wrong_symbol(tickers, err):
    # Collect every ticker whose symbol appears in the error message
    wrong_ticker = []
    for one_ticker in tickers:
        if one_ticker.upper() in err:
            wrong_ticker.append(one_ticker)
    return wrong_ticker

# Set IEX Finance API Token (Sandbox)
os.environ['IEX_API_VERSION'] = 'iexcloud-sandbox'
os.environ['IEX_TOKEN'] = 'Tpk_a4bc3e95d4c94810a3b2d4138dc81c5d'

# List of companies to get data for
tickers = ['MSFT', 'AMZN', 'failing']
batch = Stock(tickers, output_format='pandas')

income_ttm = 0
try:
    # Get income from last 4 quarters, sum it, and store to temp DataFrame
    df_income = batch.get_income_statement(period="year")
    print(df_income)
except (iexfinance.utils.exceptions.IEXQueryError, iexfinance.utils.exceptions.IEXSymbolError) as e:
    # Drop the offending tickers, retry the batch, then restore the
    # missing columns with a default value of 0
    wrong_tickers = find_wrong_symbol(tickers, str(e))
    tickers_to_get = deepcopy(tickers)
    assigning_dict = {}
    for wrong_ticker in wrong_tickers:
        tickers_to_get.pop(tickers_to_get.index(wrong_ticker))
        assigning_dict.update({wrong_ticker: lambda x: 0})
    new_batch = Stock(tickers_to_get, output_format='pandas')
    df_income = new_batch.get_income_statement(period="year").assign(**assigning_dict)
I created a small function to find the tickers that are not handled by the API. After deleting the wrong tickers, I call the API again without them, and with the assign function I add the missing columns back with 0 values (it could be anything: NaN or another default value).
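If a NaN marker is preferable to zero, the same assign trick works; a minimal sketch, assuming numpy is available:

import numpy as np

# Broadcast NaN instead of 0 for each failed ticker
assigning_dict.update({wrong_ticker: lambda x: np.nan})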

Related

API CALL STOCK DATA

import requests
import pandas as pd
import os
import io
import time
import csv

# Method 1: PRE-ENTER LIST OF STOCKS INSIDE STOCK LIST
# stock_list = ['QQQ', 'AAPL', 'TSLA', 'AMZN', 'GOOG',
#               'MSFT', 'META', 'BA', 'PFE', 'MRNA', 'BAC']
stock_list = ['TSLA', 'XLE']

for stock in stock_list:
    os.chdir('C:/Users/bean/Desktop')
    path = f'C:/Users/bean/Desktop'
    API = 'APIKEY'
    symbol = stock
    if not os.path.exists(os.path.join(path, symbol)):
        os.makedirs(os.path.join(symbol + '/months'))
        # os.makedirs(os.path.join(symbol))

    # Slice months for API calls.
    month_slices = ['year1month1', 'year1month2', 'year1month3',
                    'year1month4', 'year1month5', 'year1month6',
                    'year1month7', 'year1month8', 'year1month9',
                    'year1month10', 'year1month11', 'year1month12',
                    'year2month1', 'year2month2', 'year2month3',
                    'year2month4', 'year2month5', 'year2month6',
                    'year2month7', 'year2month8', 'year2month9',
                    'year2month10', 'year2month11', 'year2month12']

    # Get all URL links.
    urls = []
    for stock in stock_list:
        for slice in month_slices:
            url = f'https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY_EXTENDED&symbol={stock}&interval=1min&slice={slice}&apikey={API}'
            urls.append(url)
            print(url)

    # Append the data.
    data = []
    counter = 0
    for url in urls:
        response = requests.get(url)
        # df = pd.DataFrame(url)
        df = pd.read_csv(io.BytesIO(response.content))
        df.to_csv(
            f'C:/Users/bean/Desktop/{stock}/months/{stock}_{slice}.csv', index=False)
        data.append(df)
        counter += 1
        if counter % 5 == 0:
            print(
                f'counter is: {counter} for symbol: {stock}. ')
            print(
                'Sleeping one minute. API allows 5 calls per minute; 500 total daily.')
            time.sleep(60)
            counter = 0

    # Combine and save sheets to your destination.
    months_df = pd.concat(data)
    months_df.to_csv(
        f'C:/Users/bean/Desktop/{stock}/combined_{stock}_data.csv', index=False)
    print(f' finished: {months_df}')
Essentially, I am using Alpha Vantage to try to get minute data for the market. Can anyone help me with this code? I made a long version that works, but here I am trying to make it more concise using a loop and a counter. Since the free version of the API only allows 5 calls per minute, I need to make the program sleep. The problem is that with the counter and the loop, when the program sleeps and comes back, it only makes one API call instead of the next 5 like it should. Then the program stops again for 60 seconds and proceeds again one call at a time.
I am not sure why it wouldn't just repeat. I liked the idea of checking the remainder (% 5 == 0), because if I have a lot of symbols in the list it can keep going.
Does this have to do with the indent of the counter?
Thanks
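For what it's worth, here is a minimal sketch of one way to structure that loop: using enumerate instead of a manually reset counter removes the indentation question entirely. The sleep must sit inside the if block, which itself sits inside the for loop.

import io
import time

import pandas as pd
import requests

def fetch_all(urls, calls_per_minute=5):
    # Download each URL as CSV, pausing after every batch to respect the rate limit
    frames = []
    for i, url in enumerate(urls, start=1):
        response = requests.get(url)
        frames.append(pd.read_csv(io.BytesIO(response.content)))
        if i % calls_per_minute == 0 and i < len(urls):
            print(f'{i} calls made; sleeping 60 seconds (5 calls/minute limit).')
            time.sleep(60)
    return frames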

Making the python version of a SAS macro/call

I'm trying to create in Python what a macro does in SAS. I have a list of over 1K tickers that I'm trying to download information for, but doing all of them in one step made Python crash, so I split the data into 11 portions. Below is the code we're working with:
t0 = t.time()
printcounter = 0
for ticker in tickers1:
    printcounter += 1
    print(printcounter)
    try:
        selected = yf.Ticker(ticker)
        shares = selected.get_shares()
        shares_wide = shares.transpose()
        info = selected.info
        market_cap = info['marketCap']
        sector = info['sector']
        name = info['shortName']
        comb = shares_wide.assign(market_cap_oct22=market_cap, sector=sector, symbol=ticker, name=name)
        company_info_1 = company_info_1.append(comb)
    except:
        comb = pd.DataFrame()
        comb = comb.append({'symbol': ticker, 'ERRORFLAG': 'ERROR'}, ignore_index=True)
        company_info_1 = company_info_1.append(comb)
print("total run time:", round(t.time() - t0, 3), "s")
What I'd like to do, instead of re-writing and running this code for all 11 portions of data and manually changing "tickers1" and "company_info_1" to "tickers2"/"company_info_2", "tickers3"/"company_info_3" (and so on), is to see if there is a way to make a Python version of a SAS macro/call so that I can get this data more dynamically. Is there a way to do this in Python?
You need to generalize your existing code and wrap it in a function.
def company_info(tickers):
    company_info = pd.DataFrame()  # collect results for this group of tickers
    for ticker in tickers:
        try:
            selected = yf.Ticker(ticker)  # you may also have to pass the yf object
            shares = selected.get_shares()
            shares_wide = shares.transpose()
            info = selected.info
            market_cap = info['marketCap']
            sector = info['sector']
            name = info['shortName']
            comb = shares_wide.assign(market_cap_oct22=market_cap, sector=sector, symbol=ticker, name=name)
            company_info = company_info.append(comb)
        except:
            comb = pd.DataFrame()
            comb = comb.append({'symbol': ticker, 'ERRORFLAG': 'ERROR'}, ignore_index=True)
            company_info = company_info.append(comb)
    return company_info  # return the dataframe
Create a master dataframe to collect your results from the function call. Loop over the 11 groups of tickers passing each group into your function. Append the results to your master.
# master df to collect results
master = pd.DataFrame()

# assuming you have your tickers in a list of lists,
# loop over each of the 11 groups of tickers
for tickers in groups_of_tickers:
    df = company_info(tickers)  # fetch data from Yahoo Finance
    master = master.append(df)
Please note I typed this on the fly. I have no way of testing this. I'm quite sure there are syntactical issues to work through. Hopefully it provides a framework for how to think about the solution.
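If the tickers start out as one flat list rather than a list of lists, a minimal sketch of the chunking step (all_tickers and the group size of 100 are assumptions for illustration):

# Split a flat list of ~1K tickers into groups of 100
all_tickers = ['AAPL', 'MSFT', ...]  # the full list of ~1K symbols
group_size = 100
groups_of_tickers = [all_tickers[i:i + group_size]
                     for i in range(0, len(all_tickers), group_size)]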

What is the fastest way to iterate through a list of yfinance tickers?

I'm using the Python yfinance Yahoo API for stock data retrieval. Right now I'm getting the PEG ratio, which is an indicator of a company's price relative to its growth and earnings. I have a CSV downloaded from here: https://www.nasdaq.com/market-activity/stocks/screener.
It has exactly 8000 stocks.
What I do is get the symbol list and iterate over it to access each Yahoo ticker. Then I use the ticker.info method, which returns a dictionary, and I repeat this process across the 8000 symbols. It goes at a speed of 6 symbols per minute, which is not viable. Is there a faster way, with another API or another structure? I don't care about the API as long as I can get basic info such as growth, earnings, EPS and those things.
Here is the code:
import pandas as pd
import yfinance as yf

data = pd.read_csv("data/stock_list.csv")
symbols = data['Symbol']

for symbol in symbols:
    stock = yf.Ticker(symbol)
    try:
        if stock.info['pegRatio']:
            print(stock.info['shortName'] + " : " + str(stock.info['pegRatio']))
    except KeyError:
        pass
It seems that when certain data are needed from the Ticker.info attribute, HTTP requests are made to acquire them. Multithreading will help to improve matters. Try this:
import pandas as pd
import yfinance as yf
import concurrent.futures

data = pd.read_csv('data/stock_list.csv')

def getPR(symbol):
    # Fetch the short name and PEG ratio for one symbol; (None, None) on failure
    sn = None
    pr = None
    try:
        stock = yf.Ticker(symbol)
        pr = stock.info['pegRatio']
        sn = stock.info['shortName']
    except Exception:
        pass
    return (sn, pr)

with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = {executor.submit(getPR, sym): sym for sym in data['Symbol']}
    for future in concurrent.futures.as_completed(futures):
        sn, pr = future.result()
        if sn:
            print(f'{sn} : {pr}')
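One design note: ThreadPoolExecutor picks its own default worker count; capping it explicitly is a reasonable guard against hammering the endpoint with thousands of symbols (the value 16 here is an arbitrary assumption, not a yfinance requirement):

# Cap concurrency so 8000 symbols don't open too many connections at once
with concurrent.futures.ThreadPoolExecutor(max_workers=16) as executor:
    futures = {executor.submit(getPR, sym): sym for sym in data['Symbol']}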

yfinance api return multiple ticker data

I am trying to pull multiple tickers' data from the yfinance API and save it to a CSV file (in total I have 1000 tickers I need to get the data for, that data being the entire table of date, open, high, low, close, volume, etc.). So far I am able to successfully get data for 1 ticker using the following Python code:
import yfinance as yf

def yfinance(ticker_symbol):
    ticker_data = yf.Ticker(ticker_symbol)
    tickerDF = ticker_data.history(period='1d', start='2020-09-30', end='2020-10-31')
    print(tickerDF)

yfinance('000001.SS')
However, if I try this on multiple tickers it doesn't work. Following the yfinance docs, which say for multiple tickers to use:
tickers = yf.Tickers('msft aapl goog')
# ^ returns a named tuple of Ticker objects
# access each ticker using (example)
tickers.tickers.MSFT.info
tickers.tickers.AAPL.history(period="1mo")
tickers.tickers.GOOG.actions
I have a couple of issues here. The docs use a string such as 'aapl'; my tickers are all of digit format like '000001.SS', and the ".SS" part is proving to be an issue when passing it into the code:
tickers.tickers.000001.SS.history(period="1mo")
# Clearly this won't work, for a start
The next issue I am having is that even if I pass in, for example, 3 tickers to my function like so:
yfinance('000001.SS 000050.KS 00006.KS')
# similar to yfinance docs of tickers = yf.Tickers('msft aapl goog')
I get errors like:
AttributeError: 'Tickers' object has no attribute '000001.SS'
(I have also tried to run these in a for loop and pass each one to the Tickers object, but I get the same error.)
I'm stuck now; I don't know how to pass multiple tickers to yfinance and get back the data I want, and the docs aren't very helpful.
Is anyone able to help me with this?
Could you not just store them in an array, specifying the type as dtype object, and then use that to pull the data from?
import yfinance as yf
import numpy as np

tickers = ['msft', 'aapl', 'goog']
totalPortfolio = np.empty([len(tickers)], dtype=object)
num = 0
for ticker in tickers:
    totalPortfolio[num] = yf.download(ticker, start='2020-09-30', end='2020-10-31', interval="1d")
    num = num + 1
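As a side note, yf.download can also fetch several tickers in a single call, which avoids the object array entirely; a minimal sketch (group_by='ticker' makes the columns nest per symbol):

import yfinance as yf

# One request covering all three symbols
df = yf.download(['msft', 'aapl', 'goog'],
                 start='2020-09-30', end='2020-10-31',
                 interval='1d', group_by='ticker')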
Take a look at the code below:
test = yf.Tickers("A B C")
# creates test as a yf.Tickers object
test_dict = test.tickers
# creates a dict object containing the individual tickers; can be checked with type()
You are trying to use "tickers.tickers.MSFT.info" to retrieve the ticker data from your dictionary "tickers.tickers", but like your error message says, a dict object has no attributes named after your specific ticker names. This is in general not how you access elements in a dictionary.
Instead you should use the code below (as with all dict objects):
# old code from above
test = yf.Tickers("A B C")
test_dict = test.tickers

# new code accessing the dict correctly
a_data = test_dict["A"]
a_data = test.tickers["A"]  # does the same as the line above
b_data = test.tickers["B"]  # and so on for the other tickers
In a loop this could look something like this:
ticker_list = ["A", "B", "C"]  # add tickers as needed
tickers_data = {}
tickers_history = {}
for ticker in ticker_list:
    tickers_data[ticker] = yf.Ticker(ticker)
    tickers_history[ticker] = tickers_data[ticker].history(period='1d', start='2020-09-30', end='2020-10-31')
# access the dicts as needed using tickers_data["your ticker name"]
Alternatively, you can also use the "yf.Tickers" function to retrieve multiple tickers at once, but because you save the history separately I don't think this will necessarily improve your code much.
You should pay attention, however, that "yf.Ticker()" and "yf.Tickers()" are different functions with differing syntax and are not interchangeable.
You did mix that up when you tried accessing multiple tickers with your custom "yfinance()" function, which was previously defined with the "yf.Ticker()" function and thus only accepts one symbol at a time.
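Applied to the symbols from the question, dictionary-style access sidesteps the ".SS" attribute problem entirely; a minimal sketch:

import yfinance as yf

# Dict lookup works even for symbols containing dots
tickers = yf.Tickers('000001.SS 000050.KS')
history = tickers.tickers['000001.SS'].history(period='1d', start='2020-09-30', end='2020-10-31')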

Checking HTTP Status (Python)

Is there a way to check the HTTP status code in the code below, given that I have not used the requests or urllib libraries, which would allow for this?
import pandas as pd
from pandas.io.excel import read_excel

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'

# check the sheet number, spot: 9/9, short end 7/9
spot_curve = read_excel(url, sheetname=8)  # Creates the dataframes
short_end_spot_curve = read_excel(url, sheetname=6)

# do some cleaning, keep NaN for now, as forward fill NaN is not recommended for yield curve
spot_curve.columns = spot_curve.loc['years:']
valid_index = spot_curve.index[4:]
spot_curve = spot_curve.loc[valid_index]
# remove all maturities within 5 years as those are duplicated in short-end file
col_mask = spot_curve.columns.values > 5
spot_curve = spot_curve.iloc[:, col_mask]

# Providing correct names
short_end_spot_curve.columns = short_end_spot_curve.loc['years:']
valid_index = short_end_spot_curve.index[4:]
short_end_spot_curve = short_end_spot_curve.loc[valid_index]

# merge these two, time index are identical
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer')
# sort the maturity from short end to long end
combined_data.sort_index(axis=1, inplace=True)

def filter_func(group):
    return group.isnull().sum(axis=1) <= 50

combined_data = combined_data.groupby(level=0).filter(filter_func)
In pandas, read_excel tries to use urllib2.urlopen (urllib.request.urlopen instead in py3x) to open the url and immediately takes .read() of the response, without storing the HTTP response object, like:
data = urlopen(url).read()
And even though you need only part of the Excel file, pandas will download the whole file each time. So, I voted for #jonnybazookatone's answer.
It's better to store the Excel file locally first; then you can check the status code and the md5 of the file to verify data integrity, among other things.
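A minimal sketch of that approach, assuming the requests library is acceptable after all (sheetname follows the old pandas spelling used above; newer pandas calls it sheet_name):

import io
import requests
from pandas.io.excel import read_excel

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'

response = requests.get(url)
if response.status_code == 200:
    # One download, reused for both sheets; status checked before parsing
    spot_curve = read_excel(io.BytesIO(response.content), sheetname=8)
    short_end_spot_curve = read_excel(io.BytesIO(response.content), sheetname=6)
else:
    raise RuntimeError(f'Download failed with HTTP {response.status_code}')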
