The problem I'm having is that the continue statement seems to skip inconsistently. It skips the EBITDA value but pairs the wrong ticker with the remaining values. Why is this? If I use only phm (a ticker that should be skipped) as input, it correctly prints an empty result, but when an invalid ticker sits next to a valid one, the mix-up starts.
import requests
ticker = ['aapl', 'phm', 'mmm']
ebitda = []
for i in ticker:
    r_EV=requests.get('https://query2.finance.yahoo.com/v10/finance/quoteSummary/'+i+'?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com')
    r_ebit = requests.get('https://query1.finance.yahoo.com/v10/finance/quoteSummary/' + i + '?formatted=true&crumb=B2JsfXH.lpf&lang=en-US&region=US&modules=incomeStatementHistory%2CcashflowStatementHistory%2CbalanceSheetHistory%2CincomeStatementHistoryQuarterly%2CcashflowStatementHistoryQuarterly%2CbalanceSheetHistoryQuarterly%2Cearnings&corsDomain=finance.yahoo.com%27')
    data = r_EV.json()
    data1 = r_ebit.json()
    if data1['quoteSummary']['result'][0]['balanceSheetHistoryQuarterly']['balanceSheetStatements'][0].get('totalCurrentAssets') == None:
        continue #skips certain ticker if no Total Current Assets available (like for PHM)
    ebitda_data = data['quoteSummary']['result'][0]['financialData']
    ebitda_dict = ebitda_data['ebitda']
    ebitda.append(ebitda_dict['raw']) #navigates to dictionary where ebitda is stored
ebitda_formatted = dict(zip(ticker, ebitda))
print(ebitda_formatted)
# should print {'aapl': 73961996288, 'mmm': 8618000384}
# NOT: {'aapl': 73961996288, 'phm': 8618000384}
The continue works just fine. You produce this list:
[73961996288, 8618000384]
However, you then zip that list with ticker, which still has 3 elements in it, including 'phm'. zip() stops as soon as the shortest iterable is exhausted, so you get the following tuples:
>>> ebitda
[73961996288, 8618000384]
>>> ticker
['aapl', 'phm', 'mmm']
>>> list(zip(ticker, ebitda))
[('aapl', 73961996288), ('phm', 8618000384)]
If you are selectively adding ebitda values to a list, you'd also have to record which tickers you processed, in a separate list created before the loop:
used_ticker.append(i)
and use that new list in the zip() call instead of ticker.
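A minimal offline sketch of the used_ticker idea. The network calls are replaced by a hypothetical fake_results dict (None stands for a ticker whose totalCurrentAssets is missing, like PHM), so only the bookkeeping is shown; the EBITDA numbers are the ones quoted in the question.

```python
# Hypothetical stand-in for the Yahoo responses; None marks a ticker
# whose totalCurrentAssets is missing (like PHM in the question).
fake_results = {"aapl": 73961996288, "phm": None, "mmm": 8618000384}

ticker = ["aapl", "phm", "mmm"]
ebitda = []
used_ticker = []  # tickers that actually produced an EBITDA value

for i in ticker:
    value = fake_results[i]
    if value is None:
        continue  # skip this ticker, as in the original loop
    ebitda.append(value)
    used_ticker.append(i)  # keep the ticker list in step with ebitda

ebitda_formatted = dict(zip(used_ticker, ebitda))
print(ebitda_formatted)  # {'aapl': 73961996288, 'mmm': 8618000384}
```

Because both lists are appended in the same iteration, they always have the same length and zip() pairs them correctly.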
Or you could just start with an empty ebitda_formatted dictionary and add to that in the loop:
ebitda_formatted[i] = ebitda_dict['raw']
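The dictionary variant can be sketched the same way, again with a hypothetical fake_results dict standing in for the two requests per ticker:

```python
# Hypothetical stand-in for the Yahoo responses (None = data missing).
fake_results = {"aapl": 73961996288, "phm": None, "mmm": 8618000384}

ticker = ["aapl", "phm", "mmm"]
ebitda_formatted = {}  # start empty and fill it inside the loop

for i in ticker:
    value = fake_results[i]
    if value is None:
        continue
    # key and value are written together, so they cannot drift apart
    ebitda_formatted[i] = value

print(ebitda_formatted)  # {'aapl': 73961996288, 'mmm': 8618000384}
```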
Related
I'm trying to gather dividend yields from multiple stocks via yfinance. I have a loop that creates a CSV file with historical data for each ticker.
When I've downloaded dividend data via a function previously, it has worked: basically, I created a function with a for loop and then appended the stocks to a dataframe.
However, now I want to do it the same way but with a boolean expression instead, and it's not working. I'm not getting any errors, but I'm not receiving any ticker symbols (even though I know some satisfy the condition). I've tried to formulate the boolean check differently, without success.
What am I doing wrong? Below is my code:
import yfinance as yf
import pandas as pd
import os
df = pd.read_csv(r'C:\\Users\Name\Stocks\Trading\teststocks.csv')
tickers = df["Symbol"].tolist()
i=0
listlength = len(tickers)
for ticker in tickers:
    i=i+1
    print("Downloading data for",ticker,",",i,"of",listlength)
    df = yf.download(ticker, period = "max", interval = "1wk", rounding = True)
    df.dropna(inplace=True)
    df.to_csv(os.path.join("C:\\Users\Name\Stocks\dataset",ticker + ".csv"))
def dividend(df):
    info = yf.Ticker(ticker).info
    div = info.get("dividendYield")
    if div is None:
        pass
    elif div > 0.04:
        return True
    else:
        return False
for filename in os.listdir("C:\\Users\Name\Stocks\dataset"):
    df = pd.read_csv("C:\\Users\Name\Stocks\dataset\{}".format(filename))
    if dividend(df):
        print("{}".format(filename))
So this code loops through the ticker files in the dataset folder and gets the dividend data from yfinance; however, it doesn't return the tickers that satisfy the condition, which in this case is a dividend yield above 4%. The first dataframe read is a CSV file with the ticker symbols in the OMXS30, so, for example, HM-B.ST should be picked up by the dividend function.
I should add that I'm using the same logic in a function for market cap, which does work. See below:
def marketcap(df):
    info = yf.Ticker(ticker).info
    mcap = info.get("marketCap")
    if mcap is None:
        pass
    elif mcap > 10000000000:
        return True
    else:
        return False
for filename in os.listdir("C:\\Users\Name\Stocks\dataset"):
    df = pd.read_csv("C:\\Users\Name\Stocks\dataset\{}".format(filename))
    if marketcap(df):
        print("{}".format(filename))
I don't know why the dividend boolean expression doesn't work when the marketcap one does.
Thanks in advance.
Neither the function dividend nor marketcap is working as it should. The reason has to do with the following:
for ticker in tickers:
    # do stuff
Here you take a list of tickers and do something for each ticker in the list. This means that by the end of the loop, the variable ticker holds the last item in the list. For example, if tickers = ['HM-B.ST','AAPL'], then after the loop ticker equals 'AAPL'.
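The effect can be reproduced without yfinance. In this small sketch (reads_global is a hypothetical name), the function ignores its argument and reads the module-level ticker, which after the loop is always the last element:

```python
tickers = ["HM-B.ST", "AAPL"]
for ticker in tickers:
    pass  # download and save each ticker here

def reads_global(df):
    # `df` is ignored; `ticker` is looked up in the module scope
    return ticker

print(ticker)                   # AAPL
print(reads_global("ignored"))  # AAPL
```

Whatever file is passed in, the function sees the same ticker, so it returns the same answer for every file.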
Now, let's have a look at your function dividend:
def dividend(df):
    info = yf.Ticker(ticker).info
    div = info.get("dividendYield")
    if div is None:
        pass
    elif div > 0.04:
        return True
    else:
        return False
This function has one argument (df), but it never uses it. Instead, you apply yf.Ticker(...).info to the variable ticker, which is no longer being updated at all. If the function never returns True, that simply means the last ticker (e.g. "AAPL") does not have a dividend yield above 4%. So, to fix this, change the function's input to def dividend(ticker), and then write something like:
for filename in os.listdir("C:\\Users\Name\Stocks\dataset"):
    df = pd.read_csv("C:\\Users\Name\Stocks\dataset\{}".format(filename))
    # e.g. with a filename like "HM-B.ST.csv", strip only the ".csv"
    # extension to get "HM-B.ST" (splitting at every "." would give "HM-B")
    ticker = os.path.splitext(filename)[0]
    if dividend(ticker):
        print("{}".format(filename))
You need to make the same change to your function marketcap. If that function currently returns True, it only means that the last ticker in your list has a market cap above the threshold, so every filename gets printed, not just the ones that qualify.
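A sketch of two small helpers that keep the per-file logic testable. Both names are hypothetical, not part of yfinance: ticker_from_filename strips only the final extension (so the ".ST" exchange suffix survives), and above_threshold takes an info-style dict, so the same check serves both dividendYield and marketCap. The 0.0624 yield is the HM-B.ST value quoted later in this thread.

```python
import os

def ticker_from_filename(filename):
    # Strip only the final extension: "HM-B.ST.csv" -> "HM-B.ST"
    return os.path.splitext(filename)[0]

def above_threshold(info, key, threshold):
    # True only if `key` is present in the info dict and exceeds `threshold`
    value = info.get(key)
    return value is not None and value > threshold

print(ticker_from_filename("HM-B.ST.csv"))  # HM-B.ST
print("HM-B.ST.csv".split(".")[0])          # HM-B (suffix lost)
print(above_threshold({"dividendYield": 0.0624}, "dividendYield", 0.04))  # True
print(above_threshold({}, "dividendYield", 0.04))                          # False
```

With yfinance the info dict would come from yf.Ticker(ticker_from_filename(filename)).info, for example with a threshold of 0.04 for "dividendYield" or 1E10 for "marketCap".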
Edit: Suggested refactored code
import yfinance as yf
import pandas as pd
tickers = ['ABB.ST','TELIA.ST','ELUX-B.ST','HM-B.ST']
def buy_dividend(ticker):
    info = yf.Ticker(ticker).info
    # keys we need
    keys = ['marketCap','trailingPE','dividendYield']
    # store returned vals in a `list`. E.g. for 'HM-B.ST':
    # [191261163520, 13.417525, 0.0624], i.e. mcap, PE, divYield
    vals = [info.get(key) for key in keys]
    # if *any* val is `None` (or 0), `all()` will be `False`
    if all(vals):
        # returns `True` if *all* conditions are met, else `False`
        return (vals[0] > 1E10) & (vals[1] < 20) & (vals[2] > 0.04)
    return False
for ticker in tickers:
    # `progress=False` suppresses the progress print
    df = yf.download(ticker, period = "max", interval = "1wk",
                     rounding = True, progress = False)
    df.dropna(inplace=True)
    if df.empty:
        continue
    # df.to_csv(os.path.join("C:\\Users\Name\Stocks\dataset",ticker + ".csv"))
    # get last close & mean from column `df.Close`
    last_close = df.loc[df.index.max(),'Close']
    mean = df.Close.mean()
    if last_close < mean:
        if buy_dividend(ticker):
            print("{} is a good buy".format(ticker))
        else:
            print("{} is not a good buy".format(ticker))
This will print:
TELIA.ST is not a good buy
ELUX-B.ST is a good buy
HM-B.ST is a good buy
# and will silently pass over 'ABB.ST', since `(last_close < mean) == False` here
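The screening rule inside buy_dividend can be checked without network access by moving it into a pure function that takes the info dict. This is a sketch: meets_criteria and its keyword defaults are hypothetical names, and unlike all(vals) it treats only None as missing, so a genuine value of 0 is compared rather than rejected outright.

```python
def meets_criteria(info, min_mcap=1e10, max_pe=20, min_yield=0.04):
    """Apply the market-cap, P/E and dividend-yield screens to an info-style dict."""
    mcap = info.get("marketCap")
    pe = info.get("trailingPE")
    div = info.get("dividendYield")
    # A missing field means the stock cannot be screened
    if mcap is None or pe is None or div is None:
        return False
    return mcap > min_mcap and pe < max_pe and div > min_yield

# Values from the 'HM-B.ST' comment in the refactored code
hm = {"marketCap": 191261163520, "trailingPE": 13.417525, "dividendYield": 0.0624}
print(meets_criteria(hm))                             # True
print(meets_criteria({**hm, "dividendYield": None}))  # False
```

In the refactored loop, buy_dividend(ticker) could then simply return meets_criteria(yf.Ticker(ticker).info).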
My new function looks like this:
def buy_dividend(ticker):
    if df.empty:
        pass
    else:
        last_close = df[-1:]['Close'].values[0]
        mean = df["Close"].mean()
        if last_close < mean:
            info = yf.Ticker(ticker).info
            mcap = info.get("marketCap")
            if mcap is None:
                pass
            elif mcap > 1E10:
                PE = info.get('trailingPE')
                if PE is None:
                    pass
                elif PE < 20:
                    div = info.get("dividendYield")
                    if div is None:
                        pass
                    elif div > 0.04:
                        return True
for filename in os.listdir("C:\\Users\Andreas\Aktieanalys\dataset"):
    df = pd.read_csv("C:\\Users\Andreas\Aktieanalys\dataset\{}".format(filename))
    if buy_dividend(ticker):
        print("{} is a good buy".format(filename))
But somehow the dividend yield messes things up. If the lines involving div are commented out, the function works perfectly. Why is that?
I'm trying to get some data for multiple stocks, but a simple for loop over the Ticker class does not work.
For example:
In[2]: import yfinance as yf
stock = yf.Ticker('AAPL')
stock.info.get('sharesOutstanding')
Out[2]: 4375479808
But when I try something like:
t = ['AAPL', 'MSFT']
for str in t:
    stock = yf.Ticker(str)
    a = []
    a = stock.info.get('sharesOutstanding')
I only get the shares outstanding for MSFT.
Ideally, the result should be a dataframe like:
      sharesOutstanding
AAPL  4375479808
MSFT  7606049792
Any ideas how to do this? I actually have a list of about 6375 stocks, but if there is a solution for two stocks, I think the code sample can be used for many more.
PROBLEM SOLVING:
a = []
b = []
for str in t:
    try:
        stock = yf.Ticker(str)
        a.append(stock.info.get('sharesOutstanding'))
        b.append(stock.info.get('symbol'))
    except KeyError:
        continue
    except IndexError:
        continue
shares_ots = pd.DataFrame(a, b)
The problem occurs because the list a is re-created inside the loop on every iteration and then immediately overwritten by the single value returned by get(), so only the last value survives.
To solve the issue, create the list once, before the loop, and append to it inside the loop. This way, it retains the values from every iteration.
t = ['AAPL', 'MSFT']
a = []
for str in t:
    stock = yf.Ticker(str)
    a.append(stock.info.get('sharesOutstanding'))
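Side by side, the two patterns behave as follows. This sketch replaces yf.Ticker(...).info with a hypothetical FAKE_INFO dict holding the numbers from the question, so it runs offline:

```python
# Hypothetical stand-in for yf.Ticker(symbol).info, using the numbers
# quoted in the question, so the loop logic can be run offline.
FAKE_INFO = {
    "AAPL": {"sharesOutstanding": 4375479808},
    "MSFT": {"sharesOutstanding": 7606049792},
}
t = ["AAPL", "MSFT"]

# Buggy pattern from the question: `a` is rebound on every pass,
# so only the last ticker's value survives the loop
for symbol in t:
    a = []
    a = FAKE_INFO[symbol].get("sharesOutstanding")
print(a)  # 7606049792

# Fixed pattern: create the list once, then append inside the loop
a = []
for symbol in t:
    a.append(FAKE_INFO[symbol].get("sharesOutstanding"))
print(a)  # [4375479808, 7606049792]
```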
Alternatively, you can use the Tickers class that yfinance provides for multiple symbols, as shown in its docs:
tickers = yf.Tickers('aapl msft')
# ^ returns a named tuple of Ticker objects
# access each ticker
tickers.msft.info.get('sharesOutstanding')
tickers.aapl.info.get('sharesOutstanding')
EDIT
If you prefer, you can simplify the loop with a list comprehension:
t = ['AAPL', 'MSFT']
a = [yf.Ticker(str).info.get('sharesOutstanding') for str in t]
Because the Ticker(str).info object is a Python dictionary, we can pass in an additional argument to the get function to specify a default fallback value.
a = [yf.Ticker(str).info.get('sharesOutstanding', float('nan')) for str in t]
In this case, if the dictionary does not have the 'sharesOutstanding' key, the value defaults to NaN instead of None. Every ticker still gets exactly one entry, so len(a) == len(t).
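The fallback behaviour of dict.get can be checked offline. In this sketch the info dicts are made up (XYZ is a placeholder symbol); float('nan') is used as the default because pandas recognizes it as a missing value, whereas the string 'NaN' would be stored as ordinary text.

```python
import math

# Hypothetical info dicts; the second one lacks 'sharesOutstanding'
infos = {"AAPL": {"sharesOutstanding": 4375479808}, "XYZ": {"symbol": "XYZ"}}
t = ["AAPL", "XYZ"]

a = [infos[s].get("sharesOutstanding", float("nan")) for s in t]
print(a[0])              # 4375479808
print(math.isnan(a[1]))  # True
```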
To create a pandas data frame, try something like
df = pd.DataFrame(a, t, columns=['sharesOutstanding'])
You are re-creating the list on each iteration, and not appending to it anyway (the assignment replaces it with a single value). Try this:
t = ['AAPL', 'MSFT']
a = []
for str in t:
    stock = yf.Ticker(str)
    a.append(stock.info.get('sharesOutstanding'))
I have a dataframe that looks like the following (the first dataframe, shown in the image below, is called relevantdata in the code):
I want the dataframe to be transformed to the following format:
Essentially, I want to get the relevant confirmed number for each Key for all the dates that are available in the dataframe. If a particular date is not available for a Key, we make that value to be zero.
My current code is below. A try/except block is used because some Keys don't cover the whole range of dates, so a KeyError occurs when countrydata.at[date,'Confirmed'] refers to a date that is missing for that Key; the except block then enters a zero into the dictionary for that date.
import pandas
relevantdata = pandas.read_csv('https://raw.githubusercontent.com/open-covid-19/data/master/output/data_minimal.csv')
dates = relevantdata['Date'].unique().tolist()
covidcountries = relevantdata['Key'].unique().tolist()
data = dict()
data['Country'] = covidcountries
confirmeddata = relevantdata[['Date','Key','Confirmed']]
for country in covidcountries:
    for date in dates:
        countrydata = confirmeddata.loc[lambda confirmeddata: confirmeddata['Key'] == country].set_index('Date')
        try:
            if (date in data.keys()) == False:
                data[date] = list()
                data[date].append(countrydata.at[date,'Confirmed'])
            else:
                data[date].append(countrydata.at[date,'Confirmed'])
        except:
            if (date in data.keys()) == False:
                data[date].append(0)
            else:
                data[date].append(0)
finaldf = pandas.DataFrame(data = data)
While the code above produces the dataframe in the format I need, it is far too slow because it loops through every key and date. Is there a better, faster way of doing the same thing without a nested for loop? Thank you for all your help.
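One possible vectorized approach (not from the original thread) is DataFrame.pivot_table, which reshapes the long Date/Key/Confirmed table into one row per Key and one column per Date in a single call, filling missing Key/Date pairs with 0. This sketch uses a tiny made-up sample with placeholder keys AA and BB instead of the real CSV; note that pivot_table sorts the Key rows and Date columns rather than keeping their order of appearance.

```python
import pandas as pd

# Tiny made-up sample in the same long format (Date, Key, Confirmed)
relevantdata = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-02", "2020-03-02"],
    "Key": ["AA", "AA", "BB"],
    "Confirmed": [1, 3, 5],
})

# One row per Key, one column per Date; missing Key/Date pairs become 0
wide = relevantdata.pivot_table(
    index="Key", columns="Date", values="Confirmed",
    aggfunc="sum", fill_value=0,
)

# Turn the Key index into a 'Country' column, as in the original output
finaldf = wide.reset_index().rename(columns={"Key": "Country"})
finaldf.columns.name = None
print(finaldf)
```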
I'm trying to get the historical stock price data for all these tickers going back to 2014. All of these companies went public in 2014, so it will automatically get them from the day they first traded.
What I would like is for stocklist to contain, at the end, a list of dataframes (price histories), one per company, kept separate rather than combined.
So stocklist would hold one dataframe (stock history) per company, e.g. for 'LC', 'ZAYO', etc.
tickers = ['LC', 'ZAYO', 'GPRO', 'ANET', 'GRUB', 'CSLT', 'ONDK', 'QUOT', 'NEWR', 'ATEN']
stocklist = []
for i in tickers:
    stock = Share(i)
    adj = stock.get_historical('2014-1-1', '2016-12-27')
    df = pd.DataFrame(adj)
    df = df.set_index('Date')
    df['Adj_Close'] = df['Adj_Close'].astype(float, errors='coerce')
    price = df.sort()
    i = price
stocklist.append(i)
You're not appending to stocklist inside the loop due to bad indentation.
Also, you're messing with the loop variable i needlessly.
This might work, although it's difficult to test since the Share class is not available:
import pandas as pd
tickers = ['LC', 'ZAYO', 'GPRO', 'ANET', 'GRUB',
           'CSLT', 'ONDK', 'QUOT', 'NEWR', 'ATEN']
stocklist = []
for ticker in tickers:
    stock = Share(ticker)
    adj = stock.get_historical('2014-1-1', '2016-12-27')
    df = pd.DataFrame(adj)
    df.set_index('Date', inplace=True)
    # astype() does not support errors='coerce'; pd.to_numeric does
    df['Adj_Close'] = pd.to_numeric(df['Adj_Close'], errors='coerce')
    df.sort_index(inplace=True)
    stocklist.append(df)
Changes I made:
use tickers as a variable name instead of list which is the name of a built-in type
set index and sort the dataframe in-place instead of making copies
use DataFrame.sort_index() for sorting since DataFrame.sort() is deprecated
fixed indentation so stocklist is populated inside the loop
removed the unnecessary assignment before stocklist appending
It might also be more useful to collect the dataframes in a dictionary keyed by tickers. So you would initialize stocklist = {} and instead of appending do stocklist[ticker] = df.
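A sketch of the dictionary version. Since the Share class isn't available here, a hypothetical fake_history function returns made-up rows in the list-of-dicts shape that the code above feeds to pd.DataFrame; the rows are deliberately out of date order so sort_index has something to do.

```python
import pandas as pd

def fake_history(ticker):
    # Hypothetical stand-in for Share(ticker).get_historical(...):
    # a list of dicts with string values (made-up numbers)
    return [
        {"Date": "2014-01-03", "Adj_Close": "11.5"},
        {"Date": "2014-01-02", "Adj_Close": "10.0"},
    ]

tickers = ["LC", "ZAYO"]
stocklist = {}  # ticker -> its own price-history DataFrame
for ticker in tickers:
    df = pd.DataFrame(fake_history(ticker))
    df.set_index("Date", inplace=True)
    df["Adj_Close"] = pd.to_numeric(df["Adj_Close"], errors="coerce")
    df.sort_index(inplace=True)
    stocklist[ticker] = df

print(stocklist["LC"])
```

Each history stays a separate DataFrame, and stocklist["ZAYO"] retrieves one by ticker instead of by list position.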
For self-practice, I'm writing a dictionary program that stores data in the following structure: [(average,month),(average,month),....,(average,month)]. The data file is called table.csv and can be found at this link:
http://www.cse.msu.edu/~cse231/PracticeOfComputingUsingPython/05_ListsTuples/AppleStock/
My question is: why does the list testList[x][0] go blank when this condition becomes False?
if dates == uniqueDates[x]:
When x = 0 and the condition is True, testList[0][0] is [474.98, 468.22, 454.7, 455.19, 439.76, 450.99]. But when the condition becomes False, that same list, testList[0][0], mysteriously becomes []. Why aren't the values in the list being kept?
f = open('table.csv','r').readlines()
col = 6
testList = []
uniqueDates = []
x = 0
for i in range(1,len(f)):
    dates = f[i].split(',')[0][:7]
    column = float(f[i].split(',')[col])
    if dates not in uniqueDates:
        uniqueDates.append(dates)
        testList.append(())
        testList[x] = [],dates
    if dates == uniqueDates[x]:
        testList[x][0].append(column)
    else:
        testList[x][0].append((mean(testList[x][0]),uniqueDates[x]))
        x += 1
        testList[x][0].append(column)
Consider this section:
if dates not in uniqueDates:
    uniqueDates.append(dates)
    testList.append(())
    testList[x] = [],dates
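This block first executes for line 1, when uniqueDates is still empty and x == 0, so replacing testList[0] is harmless there. It executes again when processing line 7, the first time the month changes. At that point x is still 0, so the last line in this block replaces the first element of testList, which holds the values you collected, with a fresh empty list. I think you want it to replace the new empty element that you just appended.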
I suspect what you want here is to simply combine the last two lines into one:
if dates not in uniqueDates:
    uniqueDates.append(dates)
    testList.append(([],dates))
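For the overall goal of [(average, month), ...], grouping values by month in a dict avoids the index bookkeeping with x entirely. This is a sketch, not the asker's method: it assumes table.csv has the Date,Open,High,Low,Close,Volume,Adj Close layout (so col = 6 is Adj Close), uses statistics.mean for the missing mean function, and reads a small made-up sample instead of the real file.

```python
import csv
import io
from statistics import mean

# Made-up sample in the assumed table.csv layout
sample = """Date,Open,High,Low,Close,Volume,Adj Close
2013-02-08,100,101,99,100,1000,10.0
2013-02-07,100,101,99,100,1000,20.0
2013-01-31,100,101,99,100,1000,30.0
2013-01-30,100,101,99,100,1000,50.0
"""

col = 6  # "Adj Close" column, as in the question
reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header row

# Collect the values for each month, preserving first-seen order
by_month = {}
for row in reader:
    month = row[0][:7]
    by_month.setdefault(month, []).append(float(row[col]))

# Build the [(average, month), ...] structure described in the question
testList = [(mean(values), month) for month, values in by_month.items()]
print(testList)  # [(15.0, '2013-02'), (40.0, '2013-01')]
```

To use the real file, replace io.StringIO(sample) with open('table.csv').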