Retry Single Iteration in For Loop (Python)

Python novice here (sorry if this is a dumb question)! I'm currently using a for loop to download and manipulate data. Unfortunately, I occasionally run into brief network issues that cause portions of the loop to fail.
Originally, I was doing something like this:
# Import Modules
import fix_yahoo_finance as yf
import pandas as pd
from stockstats import StockDataFrame as sdf

# Stock Tickers to Gather Data For - in my full code I have thousands of tickers
Ticker = ['MSFT', 'SPY', 'GOOG']

# Data Start and End Dates
Data_Start_Date = '2017-03-01'
Data_End_Date = '2017-06-01'

# Create Data List to Append
DataList = pd.DataFrame([])

# Initialize Loop
for i in Ticker:
    # Download Data
    data = yf.download(i, Data_Start_Date, Data_End_Date)
    # Create StockDataFrame
    stock_df = sdf.retype(data)
    # Calculate RSI
    data['rsi'] = stock_df['rsi_14']
    DataList = DataList.append(pd.DataFrame(data))

DataList.to_csv('DataList.csv', header=True, index=True)
With that basic layout, whenever I had a network error, it caused the entire program to halt and spit out an error.
I did some research and tried modifying the for loop to the following:
for i in Ticker:
    try:
        # Download Data
        data = yf.download(i, Data_Start_Date, Data_End_Date)
        # Create StockDataFrame
        stock_df = sdf.retype(data)
        # Calculate RSI
        data['rsi'] = stock_df['rsi_14']
        DataList = DataList.append(pd.DataFrame(data))
    except:
        continue
With this, the code always ran to completion, but whenever I encountered a network error it silently skipped whichever tickers it was working on (their data never got downloaded).
I want this to download the data for each ticker once. If it fails, I want it to try again until it succeeds once and then move on to the next ticker. I tried using while True and variations of it, but it caused the loop to download the same ticker multiple times!
Any help or advice is greatly appreciated! Thank you!

If you can continue after you've hit a glitch (some protocols support it), then you're better off not using this exact approach. But for a slightly brute-force method:
for i in Ticker:
    incomplete = True
    tries = 10
    while incomplete and tries > 0:
        try:
            # Download Data
            data = yf.download(i, Data_Start_Date, Data_End_Date)
            incomplete = False
        except:
            tries -= 1
    if incomplete:
        print("Oops, it is really failing a lot, skipping: %r" % (i,))
        continue  # not technically needed, but in case you opt to add
                  # anything afterward ...
    else:
        # Create StockDataFrame
        stock_df = sdf.retype(data)
        # Calculate RSI
        data['rsi'] = stock_df['rsi_14']
        DataList = DataList.append(pd.DataFrame(data))
This is slightly different from Prune's in that it stops after 10 attempts ... if it fails that many times, that indicates you may want to divert some energy into fixing a different problem, such as network connectivity.
If it does get to that point, it still continues through the list of Tickers, so perhaps you can get most of what you need.
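If the failures are transient, a short pause between attempts often helps more than hammering the server ten times in a row. Here is a minimal variation of the loop above (same Ticker list and yf.download call assumed), using time.sleep for a fixed two-second backoff:
import time

for i in Ticker:
    data = None
    for attempt in range(10):
        try:
            data = yf.download(i, Data_Start_Date, Data_End_Date)
            break  # success: stop retrying this ticker
        except Exception:
            time.sleep(2)  # brief pause before the next attempt
    if data is None:
        print("Skipping %r after 10 failed attempts" % (i,))
        continue
    stock_df = sdf.retype(data)
    data['rsi'] = stock_df['rsi_14']
    DataList = DataList.append(pd.DataFrame(data))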

You can use a wrapper loop to continue until you get a good result.
for i in Ticker:
    fail = True
    while fail:  # Keep trying until it works
        try:
            # Download Data
            data = yf.download(i, Data_Start_Date, Data_End_Date)
            # Create StockDataFrame
            stock_df = sdf.retype(data)
            # Calculate RSI
            data['rsi'] = stock_df['rsi_14']
            DataList = DataList.append(pd.DataFrame(data))
        except:
            continue
        else:
            fail = False
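For what it's worth, the same pattern can be written with break instead of a flag; the key detail (and likely what went wrong with your earlier while True attempts) is that break must only run after the download has succeeded. A minimal sketch under the same assumptions:
for i in Ticker:
    while True:
        try:
            data = yf.download(i, Data_Start_Date, Data_End_Date)
        except Exception:
            continue  # transient failure: retry this same ticker
        break  # success: fall through and process this ticker once
    stock_df = sdf.retype(data)
    data['rsi'] = stock_df['rsi_14']
    DataList = DataList.append(pd.DataFrame(data))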

Related

How to retry from the point where an exception broke a for loop

This is my latest attempt:
stockcode = pd.read_csv('D:\Book1.csv')
stockcode = stockcode.Symbol.to_list()
print(len(stockcode))
df = []
for i in stockcode:
    df = isec.get_historical_data(
        interval=time_interval,
        from_date=from_date,
        to_date=to_date,
        stock_code=i,
        exchange_code="NSE",
        product_type="cash"
    )
    df = pd.DataFrame(df["Success"])
    df.append(df)
df = pd.concat(df)
I am trying to fetch data for multiple stocks from my broker's API, but after a few minutes it throws an error.
How can I restart it from the point where it stopped?
I tried a few loop/exception-handling variations, but to no avail; the error thrown was "Output exceeds the size limit".
If I understand you correctly, you want to restart without looping through items you've already processed? In that case, I'd retry manually in code, looping through a copy of your original list with the already-processed items removed:
number_of_retries = 3
retry_counter = 0
is_success = False

while retry_counter < number_of_retries and not is_success:
    try:
        stockcode_process = stockcode.copy()
        for i in stockcode_process:
            # your for-loop body here, with one additional last line
            # to remove the processed item:
            stockcode.remove(i)
        is_success = True
    except Exception:
        retry_counter += 1
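An equivalent sketch that avoids mutating the list you are copying: keep a set of symbols that already succeeded and skip them on each retry. The fetch-and-store body is left as a placeholder, since it depends on your broker's API:
processed = set()
for attempt in range(number_of_retries):
    try:
        for i in stockcode:
            if i in processed:
                continue  # already fetched on an earlier attempt
            # ... your fetch-and-store body for symbol i goes here ...
            processed.add(i)
        break  # the whole list succeeded
    except Exception:
        pass  # fall through and retry the remaining symbols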

Making the Python version of a SAS macro/call

I'm trying to create in Python what a macro does in SAS. I have a list of over 1K tickers that I'm trying to download information for, but doing all of them in one step made Python crash, so I split the data into 11 portions. Below is the code we're working with:
t0 = t.time()
printcounter = 0
for ticker in tickers1:
    printcounter += 1
    print(printcounter)
    try:
        selected = yf.Ticker(ticker)
        shares = selected.get_shares()
        shares_wide = shares.transpose()
        info = selected.info
        market_cap = info['marketCap']
        sector = info['sector']
        name = info['shortName']
        comb = shares_wide.assign(market_cap_oct22=market_cap, sector=sector, symbol=ticker, name=name)
        company_info_1 = company_info_1.append(comb)
    except:
        comb = pd.DataFrame()
        comb = comb.append({'symbol': ticker, 'ERRORFLAG': 'ERROR'}, ignore_index=True)
        company_info_1 = company_info_1.append(comb)
print("total run time:", round(t.time()-t0, 3), "s")
What I'd like to do, instead of re-writing and running this code for all 11 portions of data and manually changing "tickers1" and "company_info_1" to "tickers2"/"company_info_2", "tickers3"/"company_info_3" (and so on), is see if there is a way to make a Python version of a SAS macro/call so that I can get this data more dynamically. Is there a way to do this in Python?
You need to generalize your existing code and wrap it in a function.
def company_info(tickers):
    company_info = pd.DataFrame()  # results for this group of tickers
    for ticker in tickers:
        try:
            selected = yf.Ticker(ticker)  # you may also have to pass the yf object
            shares = selected.get_shares()
            shares_wide = shares.transpose()
            info = selected.info
            market_cap = info['marketCap']
            sector = info['sector']
            name = info['shortName']
            comb = shares_wide.assign(market_cap_oct22=market_cap, sector=sector, symbol=ticker, name=name)
            company_info = company_info.append(comb)
        except:
            comb = pd.DataFrame()
            comb = comb.append({'symbol': ticker, 'ERRORFLAG': 'ERROR'}, ignore_index=True)
            company_info = company_info.append(comb)
    return company_info  # return the dataframe
Create a master dataframe to collect your results from the function call. Loop over the 11 groups of tickers passing each group into your function. Append the results to your master.
# master df to collect results
master = pd.DataFrame()

# assuming you have your tickers in a list of lists,
# loop over each of the 11 groups of tickers
for tickers in groups_of_tickers:
    df = company_info(tickers)  # fetch data from Yahoo Finance
    master = master.append(df)
Please note I typed this on the fly. I have no way of testing this. I'm quite sure there are syntactical issues to work through. Hopefully it provides a framework for how to think about the solution.
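In case it's useful, here is one way to build groups_of_tickers from a single flat list; all_tickers is a stand-in name for your full list of 1K+ symbols:
# split the full ticker list into 11 roughly equal groups
n_groups = 11
group_size = -(-len(all_tickers) // n_groups)  # ceiling division
groups_of_tickers = [all_tickers[i:i + group_size]
                     for i in range(0, len(all_tickers), group_size)]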

Handling Exceptions with Bulk API Requests

I am pulling data from an API that allows batch requests, and then storing the data to a Dataframe. When there is an exception with one of the items being looked up via the API, I want to either skip that item entirely, (or write zeroes to the Dataframe) and then go on to the next item.
But my issue is that because the API data is being accessed in bulk (i.e., not looping through each item in the list), an exception for any item in the list breaks the program. So how can I elegantly handle exceptions without looping through each individual item in the tickers list?
Note that removing ERROR from the tickers list will enable the program to run successfully:
import os
from iexfinance.stocks import Stock
import iexfinance

# Set IEX Finance API Token (Sandbox)
os.environ['IEX_API_VERSION'] = 'iexcloud-sandbox'
os.environ['IEX_TOKEN'] = 'Tpk_a4bc3e95d4c94810a3b2d4138dc81c5d'

# List of companies to get data for
tickers = ['MSFT', 'ERROR', 'AMZN']
batch = Stock(tickers, output_format='pandas')
income_ttm = 0

try:
    # Get income from last 4 quarters, sum it, and store to temp Dataframe
    df_income = batch.get_income_statement(period="year")
    print(df_income)
except (iexfinance.utils.exceptions.IEXQueryError, iexfinance.utils.exceptions.IEXSymbolError) as e:
    pass
This should do the job:
import os
from copy import deepcopy
from iexfinance.stocks import Stock
import iexfinance

def find_wrong_symbol(tickers, err):
    wrong_ticker = []
    for one_ticker in tickers:
        if one_ticker.upper() in err:
            wrong_ticker.append(one_ticker)
    return wrong_ticker

# Set IEX Finance API Token (Sandbox)
os.environ['IEX_API_VERSION'] = 'iexcloud-sandbox'
os.environ['IEX_TOKEN'] = 'Tpk_a4bc3e95d4c94810a3b2d4138dc81c5d'

# List of companies to get data for
tickers = ['MSFT', 'AMZN', 'failing']
batch = Stock(tickers, output_format='pandas')
income_ttm = 0

try:
    # Get income from last 4 quarters, sum it, and store to temp Dataframe
    df_income = batch.get_income_statement(period="year")
    print(df_income)
except (iexfinance.utils.exceptions.IEXQueryError, iexfinance.utils.exceptions.IEXSymbolError) as e:
    wrong_tickers = find_wrong_symbol(tickers, str(e))
    tickers_to_get = deepcopy(tickers)
    assigning_dict = {}
    for wrong_ticker in wrong_tickers:
        tickers_to_get.pop(tickers_to_get.index(wrong_ticker))
        assigning_dict.update({wrong_ticker: lambda x: 0})
    new_batch = Stock(tickers_to_get, output_format='pandas')
    df_income = new_batch.get_income_statement(period="year").assign(**assigning_dict)
I created a small function to find the tickers the API cannot handle, by checking whether each symbol appears in the error message. After deleting the wrong tickers, I call the API again without them, and with the assign function I add the missing columns with 0 values (it could be anything: a NaN or another default value).
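As a side note (untested, and assuming the pandas output of the batch call is indexed by symbol), reindexing against the full ticker list is another way to give failing symbols a default value:
# hypothetical alternative: give every requested ticker a row, zero-filled
df_income = df_income.reindex(tickers, fill_value=0)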

HTS Prophet Holidays Issue

I am attempting to use the htsprophet package in Python, with the example code below, pulled from https://github.com/CollinRooney12/htsprophet/blob/master/htsprophet/runHTS.py . The issue I am getting is a ValueError: "holidays must be a DataFrame with 'ds' and 'holiday' column". I am wondering if there is a workaround, because I clearly have a DataFrame holidays with the two columns ds and holiday. I believe the error comes from one of the dependencies, the forecaster file in fbprophet. I am wondering if there is anything I need to add, or if anyone has added something to fix this.
import pandas as pd
from htsprophet.hts import hts, orderHier, makeWeekly
from htsprophet.htsPlot import plotNode, plotChild, plotNodeComponents
import numpy as np
#%% Random data (Change this to whatever data you want)
date = pd.date_range("2015-04-02", "2017-07-17")
date = np.repeat(date, 10)
medium = ["Air", "Land", "Sea"]
businessMarket = ["Birmingham","Auburn","Evanston"]
platform = ["Stone Tablet","Car Phone"]
mediumDat = np.random.choice(medium, len(date))
busDat = np.random.choice(businessMarket, len(date))
platDat = np.random.choice(platform, len(date))
sessions = np.random.randint(1000,10000,size=(len(date),1))
data = pd.DataFrame(date, columns = ["day"])
data["medium"] = mediumDat
data["platform"] = platDat
data["businessMarket"] = busDat
data["sessions"] = sessions
#%% Run HTS
##
# Make the daily data weekly (optional)
##
data1 = makeWeekly(data)
##
# Put the data in the format to run HTS, and get the nodes input (a list of lists that describes the hierarchical structure)
##
data2, nodes = orderHier(data, 1, 2, 3)
##
# load in prophet inputs (Running HTS runs prophet, so all inputs should be gathered beforehand)
# Made up holiday data
##
holidates = pd.date_range("12/25/2013","12/31/2017", freq = 'A')
holidays = pd.DataFrame(["Christmas"]*5, columns = ["holiday"])
holidays["ds"] = holidates
holidays["lower_window"] = [-4]*5
holidays["upper_window"] = [0]*5
##
# Run hts with the CVselect function (this decides which hierarchical aggregation method to use based on minimum mean Mean Absolute Scaled Error)
# h (which is 52 here) - how many steps ahead you would like to forecast. If you're using daily data you don't have to specify freq.
#
# NOTE: CVselect takes a while, so if you want results in minutes instead of half-hours pick a different method
##
myDict = hts(data2, 52, nodes, holidays = holidays, method = "FP", transform = "BoxCox")
##
The problem lies in the htsprophet package, in its 'fitForecast.py' file. The instantiation of the fbprophet object relies on positional arguments only, but a new argument has been added to the Prophet class, so the positional arguments no longer line up.
You can solve this by patching the module and changing the positional arguments to keyword arguments; fixing lines 73-74 should be sufficient to get it running:
Prophet(growth=growth, changepoints=changepoints1, n_changepoints=n_changepoints1,
        yearly_seasonality=yearly_seasonality, weekly_seasonality=weekly_seasonality,
        holidays=holidays, seasonality_prior_scale=seasonality_prior_scale,
        holidays_prior_scale=holidays_prior_scale,
        changepoint_prior_scale=changepoint_prior_scale, mcmc_samples=mcmc_samples,
        interval_width=interval_width, uncertainty_samples=uncertainty_samples)
I'll submit a bug for this to the creators.

Checking HTTP Status (Python)

Is there a way to check the HTTP status code in the code below? I have not used the requests or urllib libraries, which would allow for this.
from pandas.io.excel import read_excel
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
#check the sheet number, spot: 9/9, short end 7/9
spot_curve = read_excel(url, sheetname=8) #Creates the dataframes
short_end_spot_curve = read_excel(url, sheetname=6)
# do some cleaning, keep NaN for now, as forward fill NaN is not recommended for yield curve
spot_curve.columns = spot_curve.loc['years:']
valid_index = spot_curve.index[4:]
spot_curve = spot_curve.loc[valid_index]
# remove all maturities within 5 years as those are duplicated in short-end file
col_mask = spot_curve.columns.values > 5
spot_curve = spot_curve.iloc[:, col_mask]
#Providing correct names
short_end_spot_curve.columns = short_end_spot_curve.loc['years:']
valid_index = short_end_spot_curve.index[4:]
short_end_spot_curve = short_end_spot_curve.loc[valid_index]
# merge these two, time index are identical
# ==============================================
combined_data = pd.concat([short_end_spot_curve, spot_curve], axis=1, join='outer')
# sort the maturity from short end to long end
combined_data.sort_index(axis=1, inplace=True)
def filter_func(group):
    return group.isnull().sum(axis=1) <= 50
combined_data = combined_data.groupby(level=0).filter(filter_func)
In pandas, read_excel tries to use urllib2.urlopen (urllib.request.urlopen in Python 3) to open the url and immediately calls .read() on the response, without keeping the response object around, roughly like:
data = urlopen(url).read()
So even though you only need part of the excel file, pandas downloads the whole workbook each time, and the status code is never exposed. That is why I voted for #jonnybazookatone's suggestion: it's better to store the excel file locally first; then you can check the status code (and, say, the md5 of the file) to verify data integrity before reading it.
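For example, here is a minimal sketch with the requests library (assuming it is installed): fetch the workbook once, check the status code, and hand the bytes to pandas, so the file is only downloaded a single time:
import requests
from io import BytesIO
from pandas.io.excel import read_excel

url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
response = requests.get(url)
print(response.status_code)  # e.g. 200 on success
response.raise_for_status()  # raise an exception on 4xx/5xx

content = response.content   # the raw bytes of the workbook
spot_curve = read_excel(BytesIO(content), sheetname=8)
short_end_spot_curve = read_excel(BytesIO(content), sheetname=6)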
