I'm attempting to determine which loans in a loan portfolio exceed the FHFA county loan limit, to project the impact of upcoming law changes for a study. Versions of the code have worked with a small sample set (14k loans), but with the full portfolio (5.6m loans) the code fails. I'm definitely pretty new to Python; my experience is limited to SAS and R, and that's admittedly rusty.
As I don't have access to live data, I'm importing the data with a chunksize of 5k, which has alleviated the memory issues. I've also imported the loan limit data from the FHFA website and created a nested dictionary keyed by year, state, and county code.
I also used pd.to_datetime() and .notnull() in an attempt to remove nulls from the date and county fields.
def loan_calculation_new(row):
    year = row['PROCESSED_DATE'].year
    if row['PROCESSED_DATE'].month > 9:
        year += 1
    state_dict = year_dict[year]
    if row['FIPS_STATE_CODE'] not in state_dict:
        print("No State Code")
        return None
    county_dict = state_dict[row['FIPS_STATE_CODE']]
    if row['FIPS_COUNTY_CODE'] not in county_dict:
        limit = 485300
    else:
        limit = county_dict[row['FIPS_COUNTY_CODE']]
    return int(row['MTGE_LOAN_AMOUNT']) > limit
I keep getting this error when trying to run the calculation:
AttributeError: ("'str' object has no attribute 'year'", 'occurred at index 0')
I'm wondering if the issue is that my data is pipe-delimited and the dates aren't being interpreted as dates. The sample was a .csv and seemed to work.
It seems the column PROCESSED_DATE is a string, so you need to convert it to datetime.
If you have the whole dataframe, you can do:

df['PROCESSED_DATE'] = pd.to_datetime(df['PROCESSED_DATE'])
Or parse the string inside the function:

from datetime import datetime

def loan_calculation_new(row):
    processed = datetime.strptime(row['PROCESSED_DATE'], "<EXPECTED FORMAT>")
    year = processed.year
    if processed.month > 9:
        year += 1
    ...
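Since the full file is pipe-delimited, you can also parse the date at load time and keep the chunked import. A minimal sketch (the file path and column names are assumed from the question; an in-memory string stands in for the real file):

```python
import io
import pandas as pd

# Tiny pipe-delimited sample standing in for the full portfolio file.
raw = "PROCESSED_DATE|MTGE_LOAN_AMOUNT\n2020-10-15|510000\n2019-03-01|300000\n"

# sep='|' handles the pipe delimiter; parse_dates converts the column to
# datetime64 while reading, so .year and .month work on every row.
for chunk in pd.read_csv(io.StringIO(raw), sep="|", chunksize=5000,
                         parse_dates=["PROCESSED_DATE"]):
    years = chunk["PROCESSED_DATE"].dt.year.tolist()
    print(years)  # [2020, 2019]
```

With the real file you would pass its path instead of the StringIO object and apply loan_calculation_new to each chunk.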
I'm trying to get index members using the Bloomberg API in Python. I have no issues getting current constituents, but I want a historical list (for example: what were the Russell 1000 or S&P 500 constituents as of Q1 1995?).
To get the current index members I can use the following.
In Excel I can use INDX_MEMBERS to get the constituents:
=BDS("Index Ticker", INDX_MEMBERS)
In Python:
import pybbg

def Main():
    bbg = pybbg.Pybbg()
    IndexConst = bbg.bds('IndexName', 'INDX_MEMBERS')
or:
from tia.bbg import LocalTerminal
resp = LocalTerminal.get_reference_data(index_ticker + ' INDEX', 'INDX_MEMBERS')
members = resp.as_frame().iloc[0,0]
The question is: how can I get historical index members/constituents? For example, I would generate quarterly dates and then get the list of constituents for each date:
['2020-06-30',
'2020-03-31',
'2019-12-31',
'2019-09-30',
'2019-06-30',
'2019-03-31',
'2018-12-31' ... '1980-06-30',]
I've tried many solutions, including the one below, where I'm getting an empty frame:
import datetime
from tia.bbg import LocalTerminal

date_start = datetime.date(2010, 6, 28)
date_end = datetime.date(2020, 6, 28)
members_russell1000_3 = LocalTerminal.get_historical('RIY Index', 'INDX_MEMBERS', start=date_start, end=date_end).as_frame()
or the solution below, where regardless of the date (now or 20 years ago) I receive the same list of constituents:
from xbbg import blp
members = blp.bds('RIY Index', 'INDX_MEMBERS', DVD_Start_Dt=k[1], DVD_End_Dt=k[1])
Variable explanation for the above examples:
'RIY Index' - Russell 1000 index ticker
'INDX_MEMBERS' - Bloomberg field (flds) for the list of index constituents
Alternatively, I would be happy if I could get a historical list of changes to index constituents with dates (I already have the current constituents).
You need to use the INDX_MWEIGHT_PX field with the END_DATE_OVERRIDE override (date format: yyyymmdd). It is a reference data request, so it is probably bds and not bdh in the Python library, but I've never used it, so I'm not 100% sure; you may need to try a few variations until you find the correct one.
I've found that the below works:

from xbbg import blp

blp.bds('RIY Index', 'INDX_MWEIGHT', END_DATE_OVERRIDE='20210101')

and gives the same results as the Excel query:
=BDS("RIY Index", "INDX_MWEIGHT_HIST", "END_DATE_OVERRIDE", "20210101")
Alternatively, using 'INDX_MWEIGHT_PX' also returns the actual weight and current price values.
I am very, very new to Python (no coding history or skills whatsoever). I have been trying to automate pulling data from Yahoo and have built the following program from whatever I could find on the net, so please excuse the poor coding attempt (it almost works perfectly for me).
I am trying to download listed financial stock data (as you'll see in the code), and I want it downloaded to a specific Excel sheet in its raw form (I link it to another Excel sheet which runs my calculations).
Here is the problem: the following code works perfectly for all US stocks and a bunch of EU stocks, but for Australian/NZ and some EU stocks I get the error: "Length mismatch: Expected axis has 4 elements, new values have 2 elements".
I am absolutely stumped. It was working previously; then I started playing around with matplotlib and now nothing works for Australian/New Zealand (and some EU) stocks.
Any help whatsoever is greatly appreciated, and again, I am brand new to this, so please go easy. Here is my code:
import pandas as pd
import yfinance as yf
import yahoofinancials
import yahoo_fin.stock_info as si
from yahoo_fin.stock_info import *
from openpyxl import load_workbook

x = input("Enter Stock: ")
a = x
datatoexcel = pd.ExcelWriter("File.xlsx", engine='xlsxwriter')
data = stats_df = si.get_stats(a)
stats_df.to_excel(datatoexcel, sheet_name='Stats')
data = StatsVal_df = get_stats_valuation(x)
StatsVal_df.to_excel(datatoexcel, sheet_name='Stats Val')
data = BS_df = si.get_balance_sheet(a)
BS_df.to_excel(datatoexcel, sheet_name='Balance Sheet')
data = IS_df = si.get_income_statement(a)
IS_df.to_excel(datatoexcel, sheet_name='PnL')
data = CF_df = si.get_cash_flow(a)
CF_df.to_excel(datatoexcel, sheet_name='CashFlow')
Data = Data_df = get_data(x)
Data_df.to_excel(datatoexcel, sheet_name='Historical Price History')
datatoexcel.save()
The issue is mainly contained to:
data = stats_df = si.get_stats(a)
stats_df.to_excel(datatoexcel, sheet_name='Stats')
So, for example, I can run "GOOGL" / "AAPL" / "MSFT" / "BSX" / "BMW.DE" and it works perfectly. Yet when I run "NAN.AX" / "CBA.AX" or any other stock like that, I get the error: Length mismatch: Expected axis has 4 elements, new values have 2 elements
If you check the documentation for yahoo_fin, it mentions that data is retrieved by scraping the yahoo page for the selected stock. Without a paid Yahoo account, you can't see data for foreign stocks.
US stock: https://finance.yahoo.com/quote/NFLX/key-statistics?p=NFLX
Foreign: https://finance.yahoo.com/quote/CBA.AX/key-statistics?p=CBA.AX
The documentation can be found here:
http://theautomatic.net/yahoo_fin-documentation/#get_stats
To clarify the issue, you can run this code:
import yahoo_fin.stock_info as si
print('AAPL', len(si.get_stats('AAPL')))
print('CBA.AX', len(si.get_stats('CBA.AX')))
Output
AAPL 50
Traceback (most recent call last):
.....
ValueError: Length mismatch: Expected axis has 4 elements, new values have 2 elements
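The exception itself comes from pandas: yahoo_fin scrapes the page into a table and then renames its columns, and when the foreign page yields a table of a different shape the rename fails. A minimal reproduction of the same ValueError (a hypothetical 4-column frame, not yahoo_fin's actual internals):

```python
import pandas as pd

# A scraped table that came back with 4 columns...
df = pd.DataFrame([[1, 2, 3, 4]])

try:
    # ...being renamed as if it had only 2 columns fails the same way.
    df.columns = ["Attribute", "Value"]
except ValueError as err:
    print(err)  # Length mismatch: Expected axis has 4 elements, new values have 2 elements
```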
I am getting a
HTTPError: Bad response
(actually a 400 Bad Request) when trying to retrieve weather data from the Dark Sky API using darkskylib in Python.
It seems to happen only when I loop through my pandas dataframe rows: when I run the code for a single row I get the correct values, as I do when I use a direct URL request in my browser.
Here is my function, which is called later (with df being the dataframe):
def engineer_features(df):
    from datetime import datetime as dt
    from darksky import forecast

    print("Add weather data...")
    # Add windspeed
    df['ISSTORM'] = 0
    # Add temperature
    df['ISHOT'] = 0
    df['ISCOLD'] = 0
    # Add precipitation probability
    # (because the weather station is not at the coordinates of the taxi,
    # only a probability is added, relative to the center of Porto;
    # otherwise the API calls would have been costly)
    df['PRECIPPROB'] = 0
    # sort data frame
    data_times = df.sort_values(by='TIMESTAMP')
    # initialize variable for the previous day's date (day before the first day)
    prevDay = data_times.at[0, 'TIMESTAMP'].date().day - 1
    # initialize hour counter
    hourCount = 0
    # add personal DarkSky API key and location data
    key = 'MY_API_KEY'
    PORTO = key, 41.1579, -8.6291
    # loop through the sorted dataframe and add weather-related data
    for index, row in data_times.iterrows():
        # if the row is a new day, make a new API call for that day's weather
        if row["TIMESTAMP"].day != prevDay:
            # get weather data
            t = row["TIMESTAMP"].date().isoformat()
            porto = forecast(*PORTO, time=t)
            porto.refresh(units='si')
            ###...more code
My particular issue was that I converted my datetime into a date, dropping the time component the API call needed. So instead of writing
t = row["TIMESTAMP"].date().isoformat()
I needed to write
t = row["TIMESTAMP"].isoformat()
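The difference is easy to see with a plain pandas Timestamp, no API call needed:

```python
import pandas as pd

ts = pd.Timestamp("2013-07-01 14:30:00")

print(ts.date().isoformat())  # 2013-07-01 (date only)
print(ts.isoformat())         # 2013-07-01T14:30:00 (full timestamp)
```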
I am using crime statistics (in a data frame) and I am trying to find when most crimes occur: 12am-8am, 8am-4pm, or 4pm-12am. I have already converted the column to datetime. The code I used is:
#first attempt
df_15['FIRST_OCCURRENCE_DATE']=pd.date_range('01/01/2015',periods=10000,freq='H')
df_15[(df_15['FIRST_OCCURRENCE_DATE'] > '2015-1-1 00:00:00') & (df_15['FIRST_OCCURRENCE_DATE'] <= '2015-12-31 08:00:00')]
#second attempt
df_15 = df_15.set_index(df_15['FIRST_OCCURRENCE_DATE'])
df_15.loc['2015-01-01 00:00:00':'2015-12-31 00:00:00']
#third attempt
date_rng = pd.date_range(start='00:00:00', end='08:00:00',freq='H')
date_rng1 = pd.DataFrame(date_rng)
date_rng1.head(30)
#fourth attempt
df_15.FIRST_OCCURRENCE_DATE.dt.hour
ts = pd.to_datetime('12/31/2015 08:00:00')
df_15.loc[df_15.FIRST_OCCURRENCE_DATE <= ts,:].head()
The results I get include time entries that go outside of 08:00:00.
P.S. All the data is from the same year.
Looks like you can just do a little arithmetic and count:
(df_15['FIRST_OCCURRENCE_DATE'].dt.hour // 8).value_counts()
There are a lot of ways to solve this problem, but this is likely the simplest. Extract the hour of day from each date and floor-divide by 8 to find which slot it belongs to: 0 (12AM-8AM), 1 (8AM-4PM), or 2 (4PM-12AM), then just count these occurrences.
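A runnable sketch with a few made-up timestamps (column name taken from the question, data hypothetical):

```python
import pandas as pd

# Hypothetical sample of occurrence timestamps.
df_15 = pd.DataFrame({"FIRST_OCCURRENCE_DATE": pd.to_datetime(
    ["2015-01-01 03:00", "2015-01-01 09:30", "2015-01-01 17:45",
     "2015-01-02 23:10", "2015-01-03 07:59"])})

# Slot 0 = 12AM-8AM, 1 = 8AM-4PM, 2 = 4PM-12AM.
counts = (df_15["FIRST_OCCURRENCE_DATE"].dt.hour // 8).value_counts().sort_index()
print(counts.to_dict())  # {0: 2, 1: 1, 2: 2}
```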
Currently I have a function which returns the stock ticker with the highest error for the entire data set. What I actually want is to return the stock ticker with the highest error for the current day.
Here is the current function:
#main.route('/api/highest/error')
def get_highest_error():
"""
API which returns the highest stock error for the current day.
:return: ticker of the stock matching the query.
"""
sub = db.session.query(db.func.max(Stock.error).label('max_error')).subquery()
stock = db.session.query(Stock).join(sub, sub.c.max_error == Stock.error).first()
return stock.ticker
Here is what I attempted:
todays_stock = db.session.query(db.func.date(Stock.time_stamp) == date.today())
stock = todays_stock.filter(db.func.max(Stock.error))
return stock.ticker
Unfortunately this is operating on a BaseQuery, which is not what I expected.
I also tried:
stock = Stock.query.filter(db.func.date(Stock.time_stamp) == date.today()).filter(db.func.max(Stock.error)).first()
But this generated an error with the message: aggregate functions are not allowed in WHERE
The error is pretty self-explanatory: you cannot use aggregate functions in the WHERE clause. If you have to eliminate group rows based on aggregates, use HAVING. But that's not what you need here: to fetch the row with the greatest error, order by error in descending order and pick the first row:
stock = Stock.query.\
filter(db.func.date(Stock.time_stamp) == date.today()).\
order_by(Stock.error.desc().nullslast()).\
first()
Unless you have a ridiculous number of Stock rows per day, the sorting should be plenty fast. Note that db.func.date(Stock.time_stamp) == date.today() is not very index-friendly unless your DB supports functional indexes. Instead you could filter on a half-open range:
today = date.today()
...
filter(Stock.time_stamp >= today,
Stock.time_stamp < today + timedelta(days=1)).\
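For reference, the whole pattern can be run end-to-end with plain SQLAlchemy and an in-memory SQLite database. This is a sketch: the model fields are assumed from the question, and no Flask-SQLAlchemy is needed here.

```python
import datetime
from sqlalchemy import create_engine, Column, DateTime, Float, Integer, String
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

# Minimal stand-in for the app's model (field names assumed from the question).
class Stock(Base):
    __tablename__ = "stock"
    id = Column(Integer, primary_key=True)
    ticker = Column(String)
    error = Column(Float)
    time_stamp = Column(DateTime)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

today = datetime.datetime.combine(datetime.date.today(), datetime.time.min)

with Session(engine) as session:
    session.add_all([
        Stock(ticker="AAA", error=0.5, time_stamp=today + datetime.timedelta(hours=9)),
        Stock(ticker="BBB", error=2.5, time_stamp=today + datetime.timedelta(hours=10)),
        Stock(ticker="OLD", error=9.9, time_stamp=datetime.datetime(2000, 1, 1)),
    ])
    session.commit()

    # Half-open range keeps the filter index-friendly; sorting by error
    # replaces the aggregate that was rejected in WHERE.
    stock = (
        session.query(Stock)
        .filter(Stock.time_stamp >= today,
                Stock.time_stamp < today + datetime.timedelta(days=1))
        .order_by(Stock.error.desc())
        .first()
    )
    print(stock.ticker)  # BBB
```

The old row from 2000 is excluded by the range filter even though it has the largest error overall.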