I'm trying to get some data for multiple stocks, but a simple for loop does not iterate over the class.
For example:
In[2]: import yfinance as yf
stock = yf.Ticker('AAPL')
stock.info.get('sharesOutstanding')
Out[2]: 4375479808
And when I'm trying something like:
t = ['AAPL', 'MSFT']
for str in t:
    stock = yf.Ticker(str)
    a = []
    a = stock.info.get('sharesOutstanding')
I get only MSFT shares outstanding.
Ideally, the result must be a dataframe like:
      sharesOutstanding
AAPL         4375479808
MSFT         7606049792
Any ideas how to achieve this? I actually have a list of about 6375 stocks, but if there is a solution for two stocks, I think the same code sample can be used for the full list.
PROBLEM SOLVING:
a = []
b = []
for str in t:
    try:
        stock = yf.Ticker(str)
        a.append(stock.info.get('sharesOutstanding'))
        b.append(stock.info.get('symbol'))
    except KeyError:
        continue
    except IndexError:
        continue
shares_ots = pd.DataFrame(a, b)
The problem most likely occurs because a is re-created inside the loop and then reassigned to a single value, so whatever it held is overwritten on each iteration.
To solve the issue, we can declare the list outside of the scope of the loop. This way, it can retain its information.
t = ['AAPL', 'MSFT']
a = []
for str in t:
    stock = yf.Ticker(str)
    a.append(stock.info.get('sharesOutstanding'))
Alternatively, you can use another built-in class in the API, as shown in the docs.
tickers = yf.Tickers('aapl msft')
# ^ holds one Ticker object per symbol
# access each ticker (attribute access as in the docs at the time;
# newer releases expose a dict instead, e.g. tickers.tickers['MSFT'])
tickers.msft.info.get('sharesOutstanding')
tickers.aapl.info.get('sharesOutstanding')
EDIT
If you prefer, you can simplify the loop with a list comprehension, as shown:
t = ['AAPL', 'MSFT']
a = [yf.Ticker(str).info.get('sharesOutstanding') for str in t]
Because the Ticker(str).info object is a Python dictionary, we can pass an additional argument to its get method to specify a fallback value.
a = [yf.Ticker(str).info.get('sharesOutstanding', 'NaN') for str in t]
In this case, if the dictionary does not have the 'sharesOutstanding' key, get returns the string 'NaN' instead of None. Either way, a gets exactly one entry per ticker, so len(a) == len(t).
To create a pandas data frame, try something like
df = pd.DataFrame(a, t, columns=['sharesOutstanding'])
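The loop-with-fallback idea above can be tried without touching the network. This is only a sketch: collect_field and fetch_info are hypothetical names, not part of yfinance. fetch_info stands in for something like lambda s: yf.Ticker(s).info, and the data below is made up for illustration.

```python
def collect_field(symbols, fetch_info, field, default=None):
    """Return {symbol: value} for one field, keeping one entry per symbol.

    fetch_info is a hypothetical callable that maps a symbol to a dict,
    for example: lambda s: yf.Ticker(s).info
    """
    result = {}
    for symbol in symbols:
        try:
            info = fetch_info(symbol)
        except Exception:
            # A failed lookup still gets an entry, so no symbol silently disappears.
            info = {}
        result[symbol] = info.get(field, default)
    return result


# Made-up data standing in for the real API responses.
fake_api = {
    "AAPL": {"sharesOutstanding": 4375479808},
    "MSFT": {"sharesOutstanding": 7606049792},
    "XXXX": {},  # symbol that exists but lacks the field
}
shares = collect_field(
    ["AAPL", "MSFT", "XXXX", "GONE"], fake_api.__getitem__, "sharesOutstanding"
)
print(shares)
```

Because the result is keyed by symbol, values and labels cannot drift apart. Passing it to pd.Series(shares, name='sharesOutstanding').to_frame() should give the one-column frame asked for in the question.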
You are re-creating the list on each iteration and then overwriting it with an assignment instead of appending to it. Try this:
t = ['AAPL', 'MSFT']
a = []
for str in t:
    stock = yf.Ticker(str)
    a.append(stock.info.get('sharesOutstanding'))
I'm trying to create in Python what a macro does in SAS. I have a list of over 1K tickers that I'm trying to download information for, but doing all of them in one step made Python crash, so I split the data into 11 portions. Below is the code we're working with:
t0=t.time()
printcounter=0
for ticker in tickers1:
    printcounter+=1
    print(printcounter)
    try:
        selected = yf.Ticker(ticker)
        shares = selected.get_shares()
        shares_wide = shares.transpose()
        info=selected.info
        market_cap=info['marketCap']
        sector=info['sector']
        name=info['shortName']
        comb = shares_wide.assign(market_cap_oct22=market_cap,sector=sector,symbol=ticker,name=name)
        company_info_1 = company_info_1.append(comb)
    except:
        comb = pd.DataFrame()
        comb = comb.append({'symbol':ticker,'ERRORFLAG':'ERROR'},ignore_index=True)
        company_info_1 = company_info_1.append(comb)
print("total run time:", round(t.time()-t0,3),"s")
What I'd like to do is instead of re-writing and running this code for all 11 portions of data and manually changing "tickers1" and "company_info_1" to "tickers2" "company_info_2" "tickers3" "company_info_3" (and so on)... I'd like to see if there is a way to make a python version of a SAS macro/call so that I can get this data more dynamically. Is there a way to do this in python?
You need to generalize your existing code and wrap it in a function.
def company_info(tickers):
    frames = []  # one small dataframe per ticker
    for ticker in tickers:
        try:
            selected = yf.Ticker(ticker)  # assumes yfinance is imported as yf at module level
            shares = selected.get_shares()
            shares_wide = shares.transpose()
            info = selected.info
            market_cap = info['marketCap']
            sector = info['sector']
            name = info['shortName']
            comb = shares_wide.assign(market_cap_oct22=market_cap, sector=sector, symbol=ticker, name=name)
        except Exception:
            # flag tickers that could not be fetched instead of aborting the run
            comb = pd.DataFrame([{'symbol': ticker, 'ERRORFLAG': 'ERROR'}])
        frames.append(comb)
    # pd.concat replaces DataFrame.append, which was removed in pandas 2.0
    return pd.concat(frames) if frames else pd.DataFrame()  # return the dataframe
Create a master dataframe to collect your results from the function call. Loop over the 11 groups of tickers passing each group into your function. Append the results to your master.
# master df to collect results
master = pd.DataFrame()
# assuming you have your tickers in a list of lists
# loop over each of the 11 groups of tickers
for tickers in groups_of_tickers:
    df = company_info(tickers)  # fetch data from Yahoo Finance
    master = pd.concat([master, df])  # DataFrame.append was removed in pandas 2.0
Please note I typed this on the fly. I have no way of testing this. I'm quite sure there are syntactical issues to work through. Hopefully it provides a framework for how to think about the solution.
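If the tickers start out as one long list, the groups themselves can be generated rather than maintained by hand. A minimal sketch, assuming a plain Python list; chunked and process_group are hypothetical names, and process_group merely stands in for the per-group download function.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_group(group):
    """Hypothetical stand-in for the per-group download function."""
    return [symbol.upper() for symbol in group]


all_tickers = ["aapl", "msft", "goog", "amzn", "tsla"]  # illustrative symbols
results = []
for group in chunked(all_tickers, 2):
    results.extend(process_group(group))

groups = list(chunked(all_tickers, 2))
print(groups)
```

With roughly 1K tickers and 11 portions, a size of about 100 would reproduce the manual split without any numbered variables.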
I am trying to pull data for multiple tickers from the yfinance API and save it to a CSV file. In total I have 1000 tickers, and for each one I need the entire table of date, open, high, low, close, volume, and so on. So far I am able to get data for one ticker with the following Python code:
import yfinance as yf
def yfinance(ticker_symbol):
    ticker_data = yf.Ticker(ticker_symbol)
    tickerDF = ticker_data.history(period='1d', start='2020-09-30', end='2020-10-31')
    print(tickerDF)
yfinance('000001.SS')
However if I try on multiple tickers this doesn't work. Following the yfinance docs which say for multiple tickers use:
tickers = yf.Tickers('msft aapl goog')
# ^ returns a named tuple of Ticker objects
# access each ticker using (example)
tickers.tickers.MSFT.info
tickers.tickers.AAPL.history(period="1mo")
tickers.tickers.GOOG.actions
I have a couple of issues here. The docs use a string such as 'aapl', but my tickers are all in digit format like '000001.SS', and the ".SS" part is proving to be an issue when passing it into the code:
tickers.tickers.000001.SS.history(period="1mo")
# Clearly this won't work for a start
The next issue I am having is, even if I pass in for example 3 tickers to my function like so:
yfinance('000001.SS 000050.KS 00006.KS')
# similar to yfinance docs of tickers = yf.Tickers('msft aapl goog')
I get errors like:
AttributeError: 'Tickers' object has no attribute '000001.SS'
(I have also tried to run these into a for loop and pass each on to the Tickers object but get the same error.)
I'm stuck now. I don't know how to pass multiple tickers to yfinance and get back the data I want, and the docs aren't very helpful.
Is anyone able to help me with this?
Could you not just store them in an array with dtype object and then pull the data from that?
import yfinance as yf
import numpy as np
tickers = ['msft', 'aapl', 'goog']
totalPortfolio = np.empty([len(tickers)], dtype=object)
num = 0
for ticker in tickers:
    totalPortfolio[num] = yf.download(ticker, start='2020-09-30', end='2020-10-31', interval="1d")
    num = num + 1
Take a look at the code below:
test = yf.Tickers("A B C")
# creates test as a yf.tickers object
test_dict = test.tickers
# creates a dict object containing the individual tickers. Can be checked with type()
You are trying to use "tickers.tickers.MSFT.info" to retrieve the ticker data from your dictionary "tickers.tickers" but like your error message says, a dict object has no attributes named after your specific ticker names. This is in general not how you access elements in a dictionary.
Instead you should use the code as below (like with all dict objects):
#old code from above
test = yf.Tickers("A B C")
test_dict = test.tickers
#new code accessing the dict correctly
a_data = test_dict["A"]
a_data = test.tickers["A"] #does the same as the line above
b_data = test.tickers["B"] #and so on for the other tickers
In a loop this could look something like this:
ticker_list = ["A", "B", "C"] #add tickers as needed
tickers_data = {}
tickers_history = {}
for ticker in ticker_list:
    tickers_data[ticker] = yf.Ticker(ticker)
    #store each history under its ticker instead of overwriting the dict
    tickers_history[ticker] = tickers_data[ticker].history(period='1d', start='2020-09-30', end='2020-10-31')
#access the dicts as needed using tickers_data[" your ticker name "]
Alternatively, you can use "yf.Tickers" to retrieve multiple tickers at once, but because you save the history separately I don't think this will necessarily improve your code much.
You should pay attention, however, that "yf.Ticker()" and "yf.Tickers()" are different from each other, with differing syntax, and are not interchangeable.
You did mix that up when you tried accessing multiple tickers with your custom "yfinance()" function, that has been previously defined with the "yf.Ticker()" function and thus only accepts one symbol at a time.
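Since the stated goal was a single CSV file, here is one way the per-ticker loop could feed one. It is a sketch under assumptions: write_histories and fetch_rows are hypothetical names, the rows are made up, and the standard csv module is used only so the example runs without yfinance.

```python
import csv
import io


def write_histories(symbols, fetch_rows, out):
    """Write every symbol's rows to one CSV stream, tagged with the symbol.

    fetch_rows is a hypothetical callable returning a list of dicts per symbol.
    """
    writer = csv.DictWriter(out, fieldnames=["symbol", "date", "close"])
    writer.writeheader()
    for symbol in symbols:
        for row in fetch_rows(symbol):
            writer.writerow({"symbol": symbol, **row})


# Made-up rows; a dict keyed by symbol copes with names such as '000001.SS',
# which could never be spelled as attribute access.
fake_history = {
    "000001.SS": [{"date": "2020-10-09", "close": 1.0}],
    "000050.KS": [{"date": "2020-10-09", "close": 2.0}],
}
buffer = io.StringIO()
write_histories(list(fake_history), fake_history.get, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

With real yfinance data each history is a DataFrame, so DataFrame.to_csv would normally take the place of the csv module here; the keyed-by-symbol structure is the part that carries over.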
Here I am trying to calculate a mean value based on the data in two lists of dicts. Although I have used the same code before, I keep getting an error. Is there any solution?
import pandas as pd
data = pd.read_csv('data3.csv',sep=';') # Reading data from csv
data = data.dropna(axis=0) # Drop rows with null values
data = data.T.to_dict().values() # Converting dataframe into list of dictionaries
newdata = pd.read_csv('newdata.csv',sep=';') # Reading data from csv
newdata = newdata.T.to_dict().values() # Converting dataframe into list of dictionaries
score = []
for item in newdata:
    score.append({item['Genre_Name']:item['Ranking']})
from statistics import mean
score={k:int(v) for i in score for k,v in i.items()}
for item in data:
    y= mean(map(score.get,map(str.strip,item['Recommended_Genres'].split(','))))
    print(y)
To see the CSV files: https://repl.it/#rmakakgn/SVE2
The .get method of dict returns None if the given key does not exist, and statistics.mean fails because of that. Consider:
import statistics
d = {"a":1,"c":3}
data = [d.get(x) for x in ("a","b","c")]
print(statistics.mean(data))
This results in:
TypeError: can't convert type 'NoneType' to numerator/denominator
You need to remove Nones before feeding into statistics.mean, which you can do using list comprehension:
import statistics
d = {"a":1,"c":3}
data = [d.get(x) for x in ("a","b","c")]
data = [i for i in data if i is not None]
print(statistics.mean(data))
or filter:
import statistics
d = {"a":1,"c":3}
data = [d.get(x) for x in ("a","b","c")]
data = filter(lambda x:x is not None,data)
print(statistics.mean(data))
(both snippets above will print 2)
In this particular case, you can get the same filtering effect by replacing:
mean(map(score.get,map(str.strip,item['Recommended_Genres'].split(','))))
with:
mean([i for i in map(score.get,map(str.strip,item['Recommended_Genres'].split(','))) if i is not None])
though as with most python built-in and standard library functions accepting list as sole argument, you might decide to not build list but feed created generator directly i.e.
mean(i for i in map(score.get,map(str.strip,item['Recommended_Genres'].split(','))) if i is not None)
For further discussion see PEP 202 (list comprehensions) and PEP 289 (generator expressions).
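One case the filtering above does not cover: if none of the keys are present, the filtered input is empty and statistics.mean raises StatisticsError. A minimal sketch of a guard, where mean_of_known is a hypothetical helper and the rankings are made up:

```python
from statistics import mean


def mean_of_known(keys, lookup):
    """Average lookup[key] over the keys that exist; None if none of them do."""
    values = [lookup[key] for key in keys if key in lookup]
    # statistics.mean raises StatisticsError on empty input, so guard for it.
    return mean(values) if values else None


score = {"Drama": 3, "Comedy": 5}  # made-up rankings
genres = [g.strip() for g in "Drama, Horror, Comedy".split(",")]
print(mean_of_known(genres, score))
print(mean_of_known(["Horror"], score))
```

Returning None for the all-unknown case is a design choice for this sketch; raising a clearer error or returning float('nan') would work just as well, depending on what the caller expects.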
No matter what I do I don't seem to be able to add all the base volumes and quote volumes together easily! I want to end up with a total base volume and a total quote volume of all the data in the data frame. Can someone help me on how you can do this easily?
I have tried summing and saving the data in a dictionary first and then adding it but I just don't seem to be able to make this work!
import urllib.request  # plain "import urllib" does not load the request submodule
import pandas as pd
import json
def call_data(): # Call data from Poloniex
    global df
    datalink = 'https://poloniex.com/public?command=returnTicker'
    df = urllib.request.urlopen(datalink)
    df = df.read().decode('utf-8')
    df = json.loads(df)
    global current_eth_price
    for k, v in df.items():
        if 'ETH' in k:
            if 'USDT_ETH' in k:
                current_eth_price = round(float(v['last']),2)
    print("Current ETH Price $:",current_eth_price)
def calc_volumes(): # Calculate the base & quote volumes
    global volume_totals
    for k, v in df.items():
        if 'ETH' in k:
            basevolume = float(v['baseVolume'])*current_eth_price
            quotevolume = float(v['quoteVolume'])*float(v['last'])*current_eth_price
            if quotevolume > 0:
                percentages = (quotevolume - basevolume) / basevolume * 100
                volume_totals = {'key':[k],
                                 'basevolume':[basevolume],
                                 'quotevolume':[quotevolume],
                                 'percentages':[percentages]}
                print("volume totals:",volume_totals)
                print("#"*8)
call_data()
calc_volumes()
A few notes:
For the next 2 years, don't use the global keyword for anything.
Put function documentation under the function signature, in quotes (a docstring).
Using the requests library will be much easier than urllib. However ...
pandas can fetch the JSON and parse it all in one step.
It doesn't have to be as split up as this; I'm just showing you how to properly pass variables around instead of using globals.
I could not find "ETH" by itself. The data they sent has these 3: ['BTC_ETH', 'USDT_ETH', 'USDC_ETH']. So I used "USDT_ETH"; I hope the substitution is OK.
calc_volumes seems to both do the calculation and act as some sort of filter (it's picky about what it prints). This function needs to be broken up into its two separate jobs: printing and calculating. (Maybe there was a filter step, but I leave that for homework.)
import pandas as pd
eth_price_url = 'https://poloniex.com/public?command=returnTicker'
def get_data(url=''):
    """ Call data from Poloniex and put it in a dataframe"""
    data = pd.read_json(url)
    return data
def get_current_eth_price(data = None):
    """ grab the price out of the dataframe """
    current_eth_price = data['USDT_ETH']['last'].round(2)
    return current_eth_price
def calc_volumes(data=None, current_eth_price=None):
    """ Calculate the base & quote volumes """
    # use the data argument here rather than the module-level df
    data = data[data.columns[data.columns.str.contains('ETH')]].loc[['baseVolume', 'quoteVolume', 'last']]
    data = data.transpose()
    data[['baseVolume','quoteVolume']]*= current_eth_price
    data['quoteVolume']*=data['last']
    data['percentages']=(data['quoteVolume'] - data['baseVolume']) / data['quoteVolume'] * 100
    return data
df = get_data(url = eth_price_url)
the_price = get_current_eth_price(data = df)
print(f'the current eth price is: {the_price}')
volumes = calc_volumes(data=df, current_eth_price=the_price)
print(volumes)
This code seems kind of odd and inconsistent... for example, you're importing pandas and calling your variable df, but you're not actually using dataframes. If you used df = pd.read_json('https://poloniex.com/public?command=returnTicker', orient='index')* to get a dataframe, most of your data manipulation here would become much easier, and wouldn't require any loops either.
For example, the first function's code would become as simple as current_eth_price = df.loc['USDT_ETH','last'].
The second function's code would basically be
eth_rows = df[df.index.str.contains('ETH')]
total_base_volume = (eth_rows.baseVolume * current_eth_price).sum()
total_quote_volume = (eth_rows.quoteVolume * eth_rows['last'] * current_eth_price).sum()
(*The 'index' argument tells pandas to read the JSON dictionary indexed by rows, then columns, rather than columns, then rows.)
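For comparison, the same two totals can be accumulated in plain Python, which may make the arithmetic easier to check by hand. This is a sketch with made-up numbers: total_volumes is a hypothetical helper, and the sample only mimics the shape of the returnTicker payload (string values keyed by pair name).

```python
def total_volumes(ticker_data, eth_price, marker="ETH"):
    """Sum base and quote volumes (in USD) over pairs whose name contains marker."""
    total_base = 0.0
    total_quote = 0.0
    for pair, fields in ticker_data.items():
        if marker not in pair:
            continue
        total_base += float(fields["baseVolume"]) * eth_price
        total_quote += float(fields["quoteVolume"]) * float(fields["last"]) * eth_price
    return total_base, total_quote


# Made-up numbers in the same shape as the payload (values arrive as strings).
sample = {
    "ETH_ZRX": {"last": "0.5", "baseVolume": "10", "quoteVolume": "20"},
    "ETH_GNT": {"last": "0.25", "baseVolume": "4", "quoteVolume": "8"},
    "BTC_XMR": {"last": "0.01", "baseVolume": "99", "quoteVolume": "99"},
}
base, quote = total_volumes(sample, eth_price=200.0)
print(base, quote)
```

The key difference from the original loop is that the totals are running sums updated on every matching pair, instead of a dictionary that is replaced each time around.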
The problem I'm having is that the continue command is skipping inconsistently. It skips the numerical output ebitda but puts the incorrect ticker next to it. Why is this? If I make the ticker just phm an input it should skip, it correctly prints an empty list [] but when an invalid ticker is placed next to a valid one, the confusion starts happening.
import requests
ticker = ['aapl', 'phm', 'mmm']
ebitda = []
for i in ticker:
    r_EV=requests.get('https://query2.finance.yahoo.com/v10/finance/quoteSummary/'+i+'?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com')
    r_ebit = requests.get('https://query1.finance.yahoo.com/v10/finance/quoteSummary/' + i + '?formatted=true&crumb=B2JsfXH.lpf&lang=en-US&region=US&modules=incomeStatementHistory%2CcashflowStatementHistory%2CbalanceSheetHistory%2CincomeStatementHistoryQuarterly%2CcashflowStatementHistoryQuarterly%2CbalanceSheetHistoryQuarterly%2Cearnings&corsDomain=finance.yahoo.com%27')
    data = r_EV.json()
    data1 = r_ebit.json()
    if data1['quoteSummary']['result'][0]['balanceSheetHistoryQuarterly']['balanceSheetStatements'][0].get('totalCurrentAssets') == None:
        continue #skips certain ticker if no Total Current Assets available (like for PHM)
    ebitda_data = data['quoteSummary']['result'][0]['financialData']
    ebitda_dict = ebitda_data['ebitda']
    ebitda.append(ebitda_dict['raw']) #navigates to dictionary where ebitda is stored
ebitda_formatted = dict(zip(ticker, ebitda))
print(ebitda_formatted)
# should print {'aapl': 73961996288, 'mmm': 8618000384}
# NOT: {'aapl': 73961996288, 'phm': 8618000384}
The continue works just fine. You produce this list:
[73961996288, 8618000384]
However, you then zip that list with ticker, which still has 3 elements in it, including 'phm'. zip() stops as soon as the shortest iterable is exhausted, so you produce the following tuples:
>>> ebitda
[73961996288, 8618000384]
>>> ticker
['aapl', 'phm', 'mmm']
>>> list(zip(ticker, ebitda))
[('aapl', 73961996288), ('phm', 8618000384)]
If you are selectively adding ebitda values to a list, you'd also have to record what ticker values you processed:
used_ticker.append(i)
and use that new list.
Or you could just start with an empty ebitda_formatted dictionary and add to that in the loop:
ebitda_formatted[i] = ebitda_dict['raw']
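The dictionary approach can be sketched without any network calls. collect_available is a hypothetical helper, and the lookup table below is a made-up stand-in for the per-ticker request, with None marking a ticker that should be skipped.

```python
def collect_available(symbols, lookup):
    """Map each symbol to its value, leaving out symbols whose value is None."""
    collected = {}
    for symbol in symbols:
        value = lookup(symbol)
        if value is None:
            continue  # skipped symbols never get a key, so nothing can shift
        collected[symbol] = value
    return collected


# Made-up stand-in for the per-ticker request; 'phm' has no usable data here.
fake_ebitda = {"aapl": 73961996288, "phm": None, "mmm": 8618000384}
ebitda_formatted = collect_available(["aapl", "phm", "mmm"], fake_ebitda.get)
print(ebitda_formatted)
```

Because each value is stored under its own ticker at the moment it is read, there is no second list to keep in step and no zip() call that could pair values with the wrong names.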