I'm trying to gather dividend yields for multiple stocks via yfinance. I have a loop which creates a CSV file for each ticker with historical data.
When I've downloaded dividend data via a function previously, it has worked: basically I created a function with a for-loop and then appended the stocks to a dataframe.
However, now I want to do it the same way but with a boolean expression instead, and it's not working. I'm not getting any errors, but I'm not receiving any ticker symbols (which I know satisfy the condition). I've tried formulating the boolean loop differently, without success.
What am I doing wrong? Below is my code:
import yfinance as yf
import pandas as pd
import os
df = pd.read_csv(r'C:\\Users\Name\Stocks\Trading\teststocks.csv')
tickers = df["Symbol"].tolist()
i=0
listlength = len(tickers)
for ticker in tickers:
    i = i + 1
    print("Downloading data for", ticker, ",", i, "of", listlength)
    df = yf.download(ticker, period="max", interval="1wk", rounding=True)
    df.dropna(inplace=True)
    df.to_csv(os.path.join("C:\\Users\Name\Stocks\dataset", ticker + ".csv"))
def dividend(df):
    info = yf.Ticker(ticker).info
    div = info.get("dividendYield")
    if div is None:
        pass
    elif div > 0.04:
        return True
    else:
        return False
for filename in os.listdir("C:\\Users\Name\Stocks\dataset"):
    df = pd.read_csv("C:\\Users\Name\Stocks\dataset\{}".format(filename))
    if dividend(df):
        print("{}".format(filename))
So this function is looping through the ticker symbols from the dataset folder and getting the dividend data from yfinance, however it's not returning the tickers that satisfy the condition - which in this case is a dividend yield higher than 4%. The first dataframe being read is a CSV file with the ticker symbols in the OMXS30, so for example HM-B.ST should appear from the dividend function.
Another thing that I want to add is that I'm using the same logic for a function for marketcap, which does work. See below:
def marketcap(df):
    info = yf.Ticker(ticker).info
    mcap = info.get("marketCap")
    if mcap is None:
        pass
    elif mcap > 10000000000:
        return True
    else:
        return False

for filename in os.listdir("C:\\Users\Name\Stocks\dataset"):
    df = pd.read_csv("C:\\Users\Name\Stocks\dataset\{}".format(filename))
    if marketcap(df):
        print("{}".format(filename))
I do not know why the dividend boolean expression does not work when the marketcap one does.
Thanks in advance.
Neither the function dividend nor marketcap is working as it should. The reason has to do with the following:
for ticker in tickers:
    # do stuff
Here you are taking a list of tickers and doing some stuff for each ticker in this list. This means that by the end of your loop, the variable ticker equals the last item in the list. E.g. suppose tickers = ['HM-B.ST','AAPL'], then ticker will at the end equal AAPL.
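To see what that leaves behind, here is a tiny standalone sketch (not your actual data):

tickers = ['HM-B.ST', 'AAPL']
for ticker in tickers:
    pass  # do stuff with each ticker
print(ticker)  # prints 'AAPL': the loop variable keeps its last value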
Now, let's have a look at your function dividend:
def dividend(df):
    info = yf.Ticker(ticker).info
    div = info.get("dividendYield")
    if div is None:
        pass
    elif div > 0.04:
        return True
    else:
        return False
This function has one argument (df), but it is not actually using it. Instead you are applying yf.Ticker(...).info to a variable ticker, which is no longer being updated at all. If the function is not returning any True values, this must simply mean that the last ticker (e.g. "AAPL") does not represent a dividend stock. So, to fix this you want to change the input for the function: def dividend(ticker). Write something like:
for filename in os.listdir("C:\\Users\Name\Stocks\dataset"):
    df = pd.read_csv("C:\\Users\Name\Stocks\dataset\{}".format(filename))
    # e.g. with a filename like "HM-B.ST.csv", strip only the ".csv"
    # extension so the ticker keeps its exchange suffix
    ticker = filename.rsplit('.', 1)[0]
    if dividend(ticker):
        print("{}".format(filename))
You need to make the same change for your function marketcap. Again, if this function is currently returning True values, this just means that your last list item references a stock that has a higher mcap than the threshold.
Edit: Suggested refactored code
import yfinance as yf
import pandas as pd
tickers = ['ABB.ST','TELIA.ST','ELUX-B.ST','HM-B.ST']
def buy_dividend(ticker):
    info = yf.Ticker(ticker).info
    # keys we need
    keys = ['marketCap', 'trailingPE', 'dividendYield']
    # store returned vals in a `list`. E.g. for 'HM-B.ST':
    # [191261163520, 13.417525, 0.0624], i.e. mcap, PE, divYield
    vals = [info.get(key) for key in keys]
    # if *any* val == `None`, `all()` will be `False`
    if all(vals):
        # returns `True` if *all* conditions are met, else `False`
        return (vals[0] > 1E10) & (vals[1] < 20) & (vals[2] > 0.04)
    return False

for ticker in tickers:
    # `progress=False` suppresses the progress print
    df = yf.download(ticker, period="max", interval="1wk",
                     rounding=True, progress=False)
    df.dropna(inplace=True)
    if df.empty:
        continue
    # df.to_csv(os.path.join("C:\\Users\Name\Stocks\dataset", ticker + ".csv"))
    # get last close & mean from column `df.Close`
    last_close = df.loc[df.index.max(), 'Close']
    mean = df.Close.mean()
    if last_close < mean:
        if buy_dividend(ticker):
            print("{} is a good buy".format(ticker))
        else:
            print("{} is not a good buy".format(ticker))
This will print:
TELIA.ST is not a good buy
ELUX-B.ST is a good buy
HM-B.ST is a good buy
# and will silently pass over 'ABB.ST', since `(last_close < mean) == False` here
The new function looks like this:
def buy_dividend(ticker):
    if df.empty:
        pass
    else:
        last_close = df[-1:]['Close'].values[0]
        mean = df["Close"].mean()
        if last_close < mean:
            info = yf.Ticker(ticker).info
            mcap = info.get("marketCap")
            if mcap is None:
                pass
            elif mcap > 1E10:
                PE = info.get('trailingPE')
                if PE is None:
                    pass
                elif PE < 20:
                    div = info.get("dividendYield")
                    if div is None:
                        pass
                    elif div > 0.04:
                        return True

for filename in os.listdir("C:\\Users\Andreas\Aktieanalys\dataset"):
    df = pd.read_csv("C:\\Users\Andreas\Aktieanalys\dataset\{}".format(filename))
    if buy_dividend(ticker):
        print("{} is a good buy".format(filename))
But somehow the dividend yield is messing things up. If the rows containing "div" are commented out, the function works perfectly and correctly. Why is that?
sample table here
I am trying to look up corresponding commodity prices from the columns (CU00.SHF, AU00.SHF, SC00.SHF, I8888.DCE, C00.DCE) with a new set of timestamps, whose dates are 32 days later than the dates in the column 'history_date'.
I tried .loc and .at in a loop to extract the matching values with the functions below:
latest_day = data.iloc[data.shape[0] - 1, 0].date()

def next_trade_day(x):
    x = pd.to_datetime(x).date()  # imported is_workday function requires datetime type
    while True:
        if is_workday(x + timedelta(32)) != False:
            break
            return (pd.Timestamp((x + timedelta(32))))
        if is_workday(x + timedelta(32)) == False:
            x = x + timedelta(1)
    return pd.Timestamp(x + timedelta(32))

def end_price(x):
    x = pd.Timestamp(x)
    if x <= latest_day:
        return data.at[x, 'CU00.SHF']
    if x > latest_day:
        return 'None'
    return data.at[x, 'CU00.SHF']
but it always gives
KeyError: Timestamp('2023-02-03 00:00:00')
Any idea how I should achieve this?
Thanks in advance!
If you want to work with datetimes:
convert the column to datetime, then check the converted dates and filter on them:

df['your column'] = pd.to_datetime(df['your column'], errors='ignore')
df.loc[df['your column'] > 'your-date']

If both steps work, then check your full code.
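As a minimal, self-contained sketch of that idea (column names follow the question above; the sample dates and prices are made up for illustration):

import pandas as pd

df = pd.DataFrame({'history_date': ['2023-01-02', '2023-02-03', '2023-03-06'],
                   'CU00.SHF': [66000, 68500, 67200]})

# convert the column to real datetimes, then filter on a date
df['history_date'] = pd.to_datetime(df['history_date'], errors='coerce')
print(df.loc[df['history_date'] > '2023-01-31'])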
So I am writing code for a Tkinter GUI, and in it, the code pulls data from FRED and uses it to present graphs. There is an option at the start to save the pulled data in a CSV file so you can run it without the internet. But when the code runs using the CSV, something happens with the scale and it gives me a graph like this. I think it has something to do with the datetime data not being remembered. Current code situation follows:
Imports: from tkinter import *, from tkinter import ttk, pandas_datareader as pdr, pandas as pd, from datetime import datetime
Example of how data is called:
def getBudgetData():
    '''
    PURPOSE: Get the government budget balance data
    INPUTS: None
    OUTPUTS: The dataframe of the selected country
    '''
    global namedCountry
    # Reads what country is in the combobox when selected, then gives the index value so the correct
    # code is used in the graph
    namedCountry = countryCombo.get()
    selectedCountry = countryOptions.index(namedCountry)
    df = dfBudget[dfBudget.columns[selectedCountry]]
    return df
Code for getting/reading the dataframes
def readDataframeCSV():
    global dfCPIQuarterly, dfCPIMonthly, dfGDP, dfUnemployment, dfCashRate, dfBudget
    dfCPIQuarterly = pd.read_csv('dataframes\dfCPIQuarterly.csv', infer_datetime_format = True)
    dfCPIMonthly = pd.read_csv('dataframes\dfCPIMonthly.csv')
    dfGDP = pd.read_csv('dataframes\dfGDP.csv')
    dfUnemployment = pd.read_csv('dataframes\dfUnemployment.csv')
    dfCashRate = pd.read_csv('dataframes\dfCashRate.csv')
    dfBudget = pd.read_csv('dataframes\dfBudget.csv')
def LogDiff(x, frequency):
    '''
    PURPOSE: Transform level data into growth
    INPUTS: x (time series), frequency (frequency of time series)
    OUTPUTS: x_diff (growth rate of time series)
    REFERENCE: Tau, Ran, & Chris Brookes. (2019). Python Guide to accompany
               introductory econometrics for finance (4th Edition).
               Cambridge University Press.
    '''
    x_diff = 100*log(x/x.shift(frequency))
    x_diff = x_diff.dropna()
    return x_diff
def getAllFredData():
    '''
    PURPOSE: Extract all required data from FRED
    INPUTS: None
    OUTPUTS: Dataframes of all time series
    REFERENCE: https://fred.stlouisfed.org/
    '''
    global dfCPIQuarterly, dfCPIMonthly, dfGDP, dfUnemployment, dfCashRate, dfBudget
    # Country codes
    countryCPIQuarterlyCodes = ['AUSCPIALLQINMEI', 'NZLCPIALLQINMEI']
    countryCPIMonthlyCodes = ['CPALCY01CAM661N', 'JPNCPIALLMINMEI', 'GBRCPIALLMINMEI', 'CPIAUCSL']
    countryGDPCodes = ['AUSGDPRQDSMEI', 'NAEXKP01CAQ189S', 'JPNRGDPEXP',
                       'NAEXKP01NZQ189S', 'CLVMNACSCAB1GQUK', 'GDPC1']
    countryUnemploymentCodes = ['LRUNTTTTAUQ156S', 'LRUNTTTTCAQ156S', 'LRUN64TTJPQ156S',
                                'LRUNTTTTNZQ156S', 'LRUNTTTTGBQ156S', 'LRUN64TTUSQ156S']
    countryCashRateCodes = ['IR3TBB01AUM156N', 'IR3TIB01CAM156N', 'INTDSRJPM193N',
                            'IR3TBB01NZM156N', 'IR3TIB01GBM156N', 'FEDFUNDS']
    countryBudgetCodes = ['GGNLBAAUA188N', 'GGNLBACAA188N', 'GGNLBAJPA188N',
                          'NZLGGXCNLG01GDPPT', 'GGNLBAGBA188N', 'FYFSGDA188S']
    # Inflation
    dfCPIQuarterly = pdr.DataReader(countryCPIQuarterlyCodes,
                                    'fred', start, end)
    for country in countryCPIQuarterlyCodes:
        dfCPIQuarterly[country] = pd.DataFrame({"Inflation rate": LogDiff(dfCPIQuarterly[country], 4)})
    dfCPIMonthly = pdr.DataReader(countryCPIMonthlyCodes,
                                  'fred', start, end)
    for country in countryCPIMonthlyCodes:
        dfCPIMonthly[country] = pd.DataFrame({"Inflation rate": LogDiff(dfCPIMonthly[country], 12)})
    # GDP
    dfGDP = pdr.DataReader(countryGDPCodes,
                           'fred', start, end)
    for country in countryGDPCodes:
        dfGDP[country] = pd.DataFrame({"Economic Growth": LogDiff(dfGDP[country], 4)})
    # Unemployment
    dfUnemployment = pdr.DataReader(countryUnemploymentCodes,
                                    'fred', start, end)
    # Cash Rate
    dfCashRate = pdr.DataReader(countryCashRateCodes,
                                'fred', start, end)
    # Budget
    dfBudget = pdr.DataReader(countryBudgetCodes,
                              'fred', start, end)
    print('')
    saveToCSVLoop = True
    while saveToCSVLoop == True:
        saveToCSV = input('Would you like to save the dataframes to a CSV file so start-up will be quicker next (y or n): ')
        if saveToCSV == 'y':
            dfCPIQuarterly.to_csv('dataframes\dfCPIQuarterly.csv', index = True)
            dfCPIMonthly.to_csv('dataframes\dfCPIMonthly.csv', index = False)
            dfGDP.to_csv('dataframes\dfGDP.csv', index = False)
            dfUnemployment.to_csv('dataframes\dfUnemployment.csv', index = False)
            dfCashRate.to_csv('dataframes\dfCashRate.csv', index = False)
            dfBudget.to_csv('dataframes\dfBudget.csv', index = False)
            saveToCSVLoop = False
        elif saveToCSV == 'n':
            saveToCSVLoop = False
        else:
            print('\nNot a valid option')
            sleep(1)
It's hard to help you without the csv data. It could be that the dates aren't saved properly, or aren't interpreted properly. Maybe you could try parsing the datetime first. It kind of looks like there are no years, or something that is expected to be the year is actually a month?
Since it starts at 1970, I have a feeling that it's interpreting your time as unix epoch, not normal yyyymmdd type dates. Try printing dfCPIQuarterly and see if it looks like a date. Maybe you shouldn't use infer_datetime_format = True when reading it from the csv, but it's hard to tell without more details.
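If it is the date index that got lost, a small round-trip sketch like this (with a made-up stand-in frame mimicking what pdr.DataReader returns) shows how to keep the dates as a real DatetimeIndex across the CSV:

import pandas as pd

# stand-in frame with a DatetimeIndex; values are made up for illustration
dfCPIQuarterly = pd.DataFrame({'AUSCPIALLQINMEI': [110.1, 111.3]},
                              index=pd.to_datetime(['2020-03-01', '2020-06-01']))

dfCPIQuarterly.to_csv('dfCPIQuarterly.csv', index=True)  # keep the dates
df_back = pd.read_csv('dfCPIQuarterly.csv', index_col=0, parse_dates=True)
print(df_back.index)  # a DatetimeIndex again, so plots get a proper date axis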
I need to convert the value of the 'Amount' field to dollars, based on the value of another field, 'Currency', but I don't understand why the value of the first record is repeated throughout the dataframe.
Here is my code:
def calculo_dolar_2(data):
    valor = (data*1000)/float(precio_dolar)
    return valor

df_conversion_dolar_2['ED'] = df_conversion_dolar_2['Currency'].apply(lambda x: (df_conversion_dolar_2['Amount'].apply(calculo_dolar_2)) if x=='$$' else df_conversion_dolar_2['Amount'])
df_conversion_dolar_2
I am also trying it this other way, but without success:
precio_dolar = 800

def calculo_dolar_3(data):
    if data == '$$':
        valor = (df_conversion_dolar_2['Amount']*1000)/float(precio_dolar)
    else:
        valor = df_conversion_dolar_2['Amount']
    return valor

df_conversion_dolar_2['ED'] = df_conversion_dolar_2['Currency'].apply(lambda x: df_conversion_dolar_2['Amount'].apply(calculo_dolar_3))
df_conversion_dolar_2
What is it due to?
I haven't tested the code but this is how I would do it:
# make your code clear (what is 2?)
df = df_conversion_dolar_2
precio_dolar = 800
# first, let's make a boolean selector
dolar_select = df['Currency'] == '$$'
# Selecting dollar rows at the column Amount is as follow:
# This line is only to show you what happens and is not
# needed in your final code
df.loc[dolar_select, 'Amount']
# Anyway, now we apply your function to the selected data:
df['ED'] = df.loc[dolar_select, 'Amount'].map(lambda x: (x*1000)/float(precio_dolar))
# Finally, fill the NaN values in your dataframe (the non selected rows)
df.loc[df['ED'].isna(), 'ED'] = df['Amount']
I think what you're trying to do can be accomplished like so:

def calculo_dolar_2(data):
    valor = (data*1000)/float(precio_dolar)
    return valor

# axis=1 applies the lambda row by row, so both 'Currency' and 'Amount'
# of the same record are available at once
df_conversion_dolar_2['ED'] = df_conversion_dolar_2.apply(lambda x: calculo_dolar_2(x['Amount']) if x['Currency']=='$$' else x['Amount'], axis=1)
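A small self-contained check of that row-wise approach (column names follow the question; the sample Currency/Amount values are made up):

import pandas as pd

precio_dolar = 800
df = pd.DataFrame({'Currency': ['$$', 'US$', '$$'],
                   'Amount': [400, 500, 1200]})

def calculo_dolar_2(data):
    return (data*1000)/float(precio_dolar)

# axis=1 passes one row at a time, so each row gets its own conversion
df['ED'] = df.apply(lambda x: calculo_dolar_2(x['Amount'])
                    if x['Currency'] == '$$' else x['Amount'], axis=1)
print(df)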
I have read numerous StackOverflow threads about looping during try/except statements, using else and finally, if/else statements, and while statements, but none of them address what I want. That or I don't know how to utilise that information to get what I want done.
Basically, I am trying to get adjusted closing stock prices for various companies on a given date. I pasted some dummy data in the code block below to demonstrate (NOTE: you'll have to install pandas and pandas_datareader to get the dummy code to run). The get_stock_adj_close function returns the adj_close price given a ticker and date. The dummy_dataframe contains 4 companies with their tickers and random dates. And the add_days function takes a date and adds any number of days. I would like to append the adjusted close stock prices for each company in the dataframe on the listed date into the stock_prices list.
Because the yahoo stock price database isn't that reliable for older entries and because some dates fall on days when the market is closed, whenever a price isn't available it raises a KeyError: 'Date'. Thus, what I would like to do is keep adding days indefinitely until it finds a date where a price does exist. The problem is it only adds the day once and then raises the same KeyError. I want it to keep adding days until it finds a day where the database has a stock price available and then return back to the dataframe and keep going with the next row. Right now the whole thing breaks on the first GM date (fourth row), which raises the KeyError and the fifth row/second GM date is ignored. Any help is appreciated!
Dummy data:
from datetime import datetime, date, timedelta
import pandas as pd
import pandas_datareader as pdr
from dateutil.relativedelta import relativedelta
def add_days(d, num_days):
    return d + timedelta(days=num_days)

def get_stock_adj_close(ticker, chosen_date):
    stock_df = pdr.get_data_yahoo(ticker, start = chosen_date, end = chosen_date)
    return stock_df.iloc[0]['Adj Close']

d = {'TICKER': ['AMD','AMD','CHTR','GM'], 'DATE': [datetime(2020,2,4), datetime(2019,2,8), datetime(2019,1,31), datetime(2010,4,7)]}
dummy_dataframe = pd.DataFrame(data=d)

stock_prices = []
for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    try:
        stock_price = get_stock_adj_close(row['TICKER'], given_date)
        print(stock_price)
        stock_prices.append(stock_price)
    except KeyError:
        given_date = add_days(given_date, 1)
        stock_price = get_stock_adj_close(row['TICKER'], given_date)
        stock_prices.append(stock_price)

print(stock_prices)
I think a while loop will help you. For example:
for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    stock_price_found = False
    while not stock_price_found:
        try:
            stock_price = get_stock_adj_close(row['TICKER'], given_date)
            print(stock_price)
            stock_prices.append(stock_price)
            stock_price_found = True
        except KeyError:
            given_date = add_days(given_date, 1)
Or you can also use while True together with break:
for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    while True:
        try:
            stock_price = get_stock_adj_close(row['TICKER'], given_date)
            print(stock_price)
            stock_prices.append(stock_price)
            break
        except KeyError:
            given_date = add_days(given_date, 1)
Don't forget to make sure that you don't get stuck in an infinite loop; it would also be helpful to add another exit condition to the while loop, for example giving up after 10 failures.
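A minimal sketch of that extra exit condition, reusing the helpers from the question (the 10-attempt cap is just an illustrative choice):

max_attempts = 10
for i, row in dummy_dataframe.iterrows():
    given_date = row['DATE']
    for attempt in range(max_attempts):
        try:
            stock_prices.append(get_stock_adj_close(row['TICKER'], given_date))
            break
        except KeyError:
            given_date = add_days(given_date, 1)
    else:
        # for/else: this branch runs only if the loop never hit `break`
        print("No price found for {} within {} days".format(row['TICKER'], max_attempts))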
The problem I'm having is that the continue command is skipping inconsistently. It skips the numerical ebitda output but puts the incorrect ticker next to it. Why is this? If I make phm the only ticker in the input, it correctly skips it and prints an empty list [], but when an invalid ticker is placed next to a valid one, the confusion starts.
import requests

ticker = ['aapl', 'phm', 'mmm']
ebitda = []
for i in ticker:
    r_EV = requests.get('https://query2.finance.yahoo.com/v10/finance/quoteSummary/'+i+'?formatted=true&crumb=8ldhetOu7RJ&lang=en-US&region=US&modules=defaultKeyStatistics%2CfinancialData%2CcalendarEvents&corsDomain=finance.yahoo.com')
    r_ebit = requests.get('https://query1.finance.yahoo.com/v10/finance/quoteSummary/' + i + '?formatted=true&crumb=B2JsfXH.lpf&lang=en-US&region=US&modules=incomeStatementHistory%2CcashflowStatementHistory%2CbalanceSheetHistory%2CincomeStatementHistoryQuarterly%2CcashflowStatementHistoryQuarterly%2CbalanceSheetHistoryQuarterly%2Cearnings&corsDomain=finance.yahoo.com%27')
    data = r_EV.json()
    data1 = r_ebit.json()
    if data1['quoteSummary']['result'][0]['balanceSheetHistoryQuarterly']['balanceSheetStatements'][0].get('totalCurrentAssets') == None:
        continue  # skips certain ticker if no Total Current Assets available (like for PHM)
    ebitda_data = data['quoteSummary']['result'][0]['financialData']
    ebitda_dict = ebitda_data['ebitda']
    ebitda.append(ebitda_dict['raw'])  # navigates to dictionary where ebitda is stored

ebitda_formatted = dict(zip(ticker, ebitda))
print(ebitda_formatted)
# should print {'aapl': 73961996288, 'mmm': 8618000384}
# NOT: {'aapl': 73961996288, 'phm': 8618000384}
The continue works just fine. You produce this list:
[73961996288, 8618000384]
However, you then zip that list with ticker, which still has 3 elements in it, including 'phm'. zip() stops when the shortest iterable is exhausted, so you produce the following pairs:
>>> ebitda
[73961996288, 8618000384]
>>> ticker
['aapl', 'phm', 'mmm']
>>> zip(ticker, ebitda)
[('aapl', 73961996288), ('phm', 8618000384)]
If you are selectively adding ebitda values to a list, you'd also have to record what ticker values you processed:
used_ticker.append(i)
and use that new list.
Or you could just start with an empty ebitda_formatted dictionary and add to that in the loop:
ebitda_formatted[i] = ebitda_dict['raw']
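A minimal sketch of that second option, with made-up stand-in data in place of the Yahoo requests, to show how skipped tickers simply never get a key:

ticker = ['aapl', 'phm', 'mmm']
# stand-in values: None plays the role of a missing 'totalCurrentAssets'
dummy_assets = {'aapl': 143566000000, 'phm': None, 'mmm': 15403000000}
dummy_ebitda = {'aapl': 73961996288, 'phm': 0, 'mmm': 8618000384}

ebitda_formatted = {}
for i in ticker:
    if dummy_assets[i] is None:
        continue  # 'phm' is skipped and never appears in the dict
    ebitda_formatted[i] = dummy_ebitda[i]

print(ebitda_formatted)  # {'aapl': 73961996288, 'mmm': 8618000384}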