Bloomberg APIs - historical index members in Python

I'm trying to get index members using Bloomberg APIs in Python. I have no issues getting current constituents, but I want a historical list (for example: what were the Russell 1000 or S&P 500 constituents as of Q1 1995?).
To get the current index members I can use the following. In Excel I can use INDX_MEMBERS to get the constituents:
=BDS("Index Ticker", "INDX_MEMBERS")
In Python:
import pybbg

def Main():
    bbg = pybbg.Pybbg()
    IndexConst = bbg.bds('IndexName', 'INDX_MEMBERS')
or:
from tia.bbg import LocalTerminal
resp = LocalTerminal.get_reference_data(index_ticker + ' INDEX', 'INDX_MEMBERS')
members = resp.as_frame().iloc[0,0]
The question is how I can get historical index members/constituents. For example, I would generate quarterly dates and then want to know the list of constituents for each date:
['2020-06-30',
'2020-03-31',
'2019-12-31',
'2019-09-30',
'2019-06-30',
'2019-03-31',
'2018-12-31' ... '1980-06-30',]
I've tried many solutions, including the one below, where I get an empty frame:
import datetime
from tia.bbg import LocalTerminal

date_start = datetime.date(2010, 6, 28)
date_end = datetime.date(2020, 6, 28)
members_russell1000_3 = LocalTerminal.get_historical('RIY Index', 'INDX_MEMBERS',
                                                     start=date_start, end=date_end).as_frame()
or the solution below, where regardless of the date (now or 20 years ago) I receive the same list of constituents:
from xbbg import blp
members = blp.bds('RIY Index', 'INDX_MEMBERS', DVD_Start_Dt=k[1], DVD_End_Dt=k[1])
Explanation of the variables in the examples above:
'RIY Index' - the Russell 1000 index ticker
'INDX_MEMBERS' - the Bloomberg field (flds) for the list of index constituents
Alternatively, I would be happy if I could get a historical list of changes to the index constituents with dates (I already have the current constituents).

You need to use the INDX_MWEIGHT_PX field and the END_DATE_OVERRIDE override (date format: yyyymmdd). It is a reference data request, so it is probably bds rather than bdh in the Python library, but I've never used it, so I'm not 100% sure; you may need to try a few options until you find the correct one.

I've found that the below works
blp.bds('RIY Index', "INDX_MWEIGHT", END_DATE_OVERRIDE="20210101")
and gives the same results as an excel query
=BDS("RIY Index", "INDX_MWEIGHT_HIST", "END_DATE_OVERRIDE",'20210101')
Alternatively, using "INDX_MWEIGHT_PX" also returns the actual weight and current price values.
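To get the historical membership for a whole set of dates, a minimal sketch (untested against a live terminal; it assumes xbbg is installed, a Bloomberg session is running, and that the first column of the returned frame holds the member tickers) is to loop quarter-end dates through the same override:

import pandas as pd
from xbbg import blp

# Example range of quarter-end dates to query
quarter_ends = pd.date_range('2018-12-31', '2020-06-30', freq='Q')

members_by_quarter = {}
for d in quarter_ends:
    frame = blp.bds('RIY Index', 'INDX_MWEIGHT',
                    END_DATE_OVERRIDE=d.strftime('%Y%m%d'))
    # Assumption: the first column contains the member tickers
    members_by_quarter[d.date()] = frame.iloc[:, 0].tolist()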

Related

Downloading weekly Sentinel 2 data using SentinelApi

I'm trying to download weekly Sentinel 2 data for one year. So, one Sentinel dataset within each week of the year. I can create a list of datasets using the code:
from sentinelsat import SentinelAPI
api = SentinelAPI(user, password, 'https://scihub.copernicus.eu/dhus')
products = api.query(footprint,
                     date=('20211001', '20221031'),
                     platformname='Sentinel-2',
                     processinglevel='Level-2A',
                     cloudcoverpercentage=(0, 10))
products_gdf = api.to_geodataframe(products)
products_gdf_sorted = products_gdf.sort_values(['beginposition'], ascending=[False])
products_gdf_sorted
This creates a list of all datasets available within the year, and since data capture is around once every five days you could argue I can work off this list. But instead I would like just one option for each week (Mon - Sun). I thought I could create a dataframe with a start date and an end date for each week and loop that through the api.query code, but I'm not sure how I would do this.
I have created a dataframe using:
import pandas as pd
dates_df = pd.DataFrame({
    'StartDate': pd.date_range(start='20211001', end='20221030', freq='W-MON'),
    'EndDate': pd.date_range(start='20211004', end='20221031', freq='W-SUN'),
})
print (dates_df)
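Something like the sketch below is roughly what I have in mind, although I haven't verified it (it assumes api.query accepts the datetime values from the dataframe, and that the returned geodataframe has a cloudcoverpercentage column as in my query above):

for start, end in zip(dates_df['StartDate'], dates_df['EndDate']):
    weekly = api.query(footprint,
                       date=(start.to_pydatetime(), end.to_pydatetime()),
                       platformname='Sentinel-2',
                       processinglevel='Level-2A',
                       cloudcoverpercentage=(0, 10))
    if weekly:
        weekly_gdf = api.to_geodataframe(weekly)
        # keep, for example, the least cloudy scene of the week
        best = weekly_gdf.sort_values('cloudcoverpercentage').iloc[0]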
Any tips or advice is greatly appreciated. Thanks!

Python: iterate through the rows of a csv and calculate date difference if there is a change in a column

I have only basic knowledge of Python, so I'm not even sure if this is possible.
I have a csv that looks like this:
[1]: https://i.stack.imgur.com/8clYM.png
(This is dummy data, the real one is about 30K rows.)
I need to find the most recent job title for each employee (unique id) and then calculate how long (= how many days) the employee has been on the same job title.
What I have done so far:
import csv
from datetime import datetime

data = open("C:\\Users\\User\\PycharmProjects\\pythonProject\\jts.csv", encoding="utf-8")
csv_data = csv.reader(data)
data_lines = list(csv_data)
print(data_lines)
for i in data_lines:
    for j in i[0]:
        pass  # this is as far as I've got
But then I haven't got anywhere because I can't even conceptualise how to structure this. :-(
I also know that at one point I will need:
datetime.strptime(data_lines[1][2] , '%Y/%M/%d').date()
Could somebody help, please? I just need a new list saying something like:
id jt days
500 plumber 370
Edit to clarify: the dates are data points taken. I need to calculate back from the most recent of those until the job title was something else. So in my example, for employee 5000, from 04/07/2021 back to 01/03/2020.
Let's consider sample data as follows:
id,jtitle,date
5000,plumber,01/01/2020
5000,senior plumber,02/03/2020
6000,software engineer,01/02/2020
6000,software architecture,06/02/2021
7000,software tester,06/02/2019
The following code works.
import pandas as pd
import datetime
# load data
data = pd.read_csv('data.csv')
# convert to datetime object
data.date = pd.to_datetime(data.date, dayfirst=True)
print(data)
# group employees by ID
latest = data.sort_values('date', ascending=False).groupby('id').nth(0)
print(latest)
# find the latest point in time where there is a change in job title
prev_date = data.sort_values('date', ascending=False).groupby('id').nth(1).date
print(prev_date)
# calculate the difference in days
latest['days'] = latest.date - prev_date
print(latest)
Output:
jtitle date days
id
5000 senior plumber 2020-03-02 61 days
6000 software architecture 2021-02-06 371 days
7000 software tester 2019-02-06 NaT
But then I haven't got anywhere because I can't even conceptualise how to structure this. :-(
Have a map (dict) of employee to (date, title).
For every row, check if you already have an entry for the employee. If you don't, just put the information in the map; otherwise compare the date of the row with that of the entry. If the row has a more recent date, replace the entry.
Once you've gone through all the rows, you can just go through the map you've collected and compute the difference between the date you ended up with and "today".
Incidentally, your pattern is not correct: the sample data uses either a %d/%m/%Y (day/month/year) or a %m/%d/%Y (month/day/year) format. The sample data is not sufficient to say which, but it is certainly not year/month/day.
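A minimal sketch of that approach (my own illustration, assuming the day/month/year interpretation discussed above):

import csv
from datetime import datetime, date

latest = {}  # employee id -> (date, title)

with open('data.csv', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for emp_id, title, date_str in reader:
        row_date = datetime.strptime(date_str, '%d/%m/%Y').date()
        # keep only the most recent row per employee
        if emp_id not in latest or row_date > latest[emp_id][0]:
            latest[emp_id] = (row_date, title)

# compute the difference between each employee's latest date and today
for emp_id, (last_date, title) in latest.items():
    print(emp_id, title, (date.today() - last_date).days)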
Seems like I'm too late... Nevertheless, in case you're interested, here's a suggestion in pure Python (nothing wrong with Pandas, though!):
import csv
import datetime as dt
from operator import itemgetter
from itertools import groupby

with open('data.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # Discard header row
    # Read, transform (date), and sort in reverse (id first, then date):
    data = sorted(((i, jtitle, dt.datetime.strptime(date, '%d/%m/%Y'))
                   for i, jtitle, date in reader),
                  key=itemgetter(0, 2), reverse=True)

# Process data grouped by id
result = []
for i, group in groupby(data, key=itemgetter(0)):
    _, jtitle, end = next(group)  # Fetch the last job title and date
    # Search for the first occurrence of a different job title:
    start = end
    for _, jt, start in group:
        if jt != jtitle:
            break
    # Collect results in a list, with datetimes transformed back:
    result.append((i, jtitle, end.strftime('%d/%m/%Y'), (end - start).days))
result = sorted(result, key=itemgetter(0))
The result for the input data
id,jtitle,date
5000,plumber,01/01/2020
5000,plumber,01/02/2020
5000,senior plumber,01/03/2020
5000,head plumber,01/05/2020
5000,head plumber,02/09/2020
5000,head plumber,05/01/2021
5000,head plumber,04/07/2021
6000,electrician,01/02/2018
6000,qualified electrician,01/06/2020
7000,plumber,01/01/2004
7000,plumber,09/11/2020
7000,senior plumber,05/06/2021
is
[('5000', 'head plumber', '04/07/2021', 490),
('6000', 'qualified electrician', '01/06/2020', 851),
('7000', 'senior plumber', '05/06/2021', 208)]

How do I get quarterly S&P500 constituents in Python from the detailed change data?

I want to use S&P 500 company information to calculate an index. However, the constituents of the S&P 500 change frequently; I want to know the constituents for each quarter, but I can only get the most recent list from Wikipedia. The code is below:
table=pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df = table[0]
tickers = df.Symbol.to_list()
'tickers' is a list that contains the tickers of all current S&P 500 companies:
['MMM',
'ABT',
'ABBV',
'ABMD',
'ACN',
'ATVI',
'ADBE',
'AMD',
'AAP',
'AES',
'AFL',
'A',
'APD',
'AKAM',
'ALK',
'ALB',
'ARE',
...]
Now I have found a table that contains the historical change information for S&P 500 constituents. There are dates, changes, and tickers for all the companies: '1' means the company was added to the list, and '-1' means the company was removed from the list. I want to use this information, particularly 'DateAfterChange', to get the lists of companies in the S&P 500 for the past 20 quarters (5 years). A complete list can be found here: https://docs.google.com/spreadsheets/d/1xkq2kkf-iElKl9BhEwqQx3Pgkh0B9dFKJpefQ4oOI_g/edit#gid=455032226.
DateBeforeChange DateAfterChange Change Ticker
20200623 20200624 1 TMUSR
20200618 20200619 1 BIO
20200618 20200619 1 TDY
20200618 20200619 1 TYL
20200618 20200619 -1 ADS
20200618 20200619 -1 HOG
My expected output could be single lists or in a combined format like this:
2019-Q1 2019-Q2 2019-Q3 2019-Q4
A B C D
B C D F
C D E E
D E F G
E F G H
...
What I'm thinking of is to start from the most recent list of companies, first divide the date info in the change data into quarters, and then add back those that were removed and remove those that were added in the past. But I'm just not sure how to do that in Python. Can anyone please help?
This method works:
import pandas as pd

# current list
table = pd.read_html('https://en.wikipedia.org/wiki/List_of_S%26P_500_companies')
df = table[0]
tickers = df.Symbol.to_list()

# your file of changes
change = pd.read_excel("sp500change.xlsx")

# convert DateAfterChange to datetime, set it as the index, and sort
change["DateAfterChange"] = pd.to_datetime(change["DateAfterChange"], format="%Y%m%d")
change.set_index("DateAfterChange", inplace=True)
change = change.sort_index(ascending=False)

# group by quarter, creating lists of tickers added to and dropped from the index
change = change.groupby([pd.Grouper(freq="Q"), "Change"])["Ticker"].agg(lambda x: list(x)).to_frame()

# map the changes to strings and pivot them into "add"/"drop" columns
change = change.reset_index(drop=False).set_index("DateAfterChange")
change["Change"] = change["Change"].map({-1: "drop", 1: "add"})
change = change.pivot(columns="Change")
change.columns = change.columns.droplevel(0)

# series of ticker lists over time, seeded with today's list
tick_series = pd.Series({pd.to_datetime("today"): tickers})
tick_series = pd.concat([tick_series, pd.Series(index=change.index, dtype=object)]).sort_index(ascending=False)

# walk backwards in time, reconstructing each quarter's membership
for i in tick_series.iloc[1:].index:
    tick_series.loc[i] = list(set(tick_series.shift(1).loc[i] + change.loc[i]["drop"])
                              .difference(set(change.loc[i]["add"])))
The for loop takes the previous list (it works backwards, so this is the more recent list), adds the tickers that were dropped during the quarter, and removes those that were added during the quarter. Sets are needed to keep only the differences between the "add" list and the "more recent + drop" list.
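As a toy illustration of one backward step (hypothetical tickers):

recent = ['A', 'B', 'C']   # membership after the change
added = ['C']              # added during the quarter
dropped = ['D']            # dropped during the quarter

# membership before the change: add back the drops, remove the adds
previous = list(set(recent + dropped).difference(set(added)))
# previous -> ['A', 'B', 'D'] (set order is not guaranteed)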
Hopefully you have found a solution by now anyway, and haven't waited for 2 years...

Python: Length mismatch: Expected axis has 4 elements, new values have 2 elements

I am very, very new to Python (no coding history or skills whatsoever). I have been trying to automate pulling data from Yahoo and have built the following program from whatever I could find on the net, so please excuse the poor coding attempt (it almost works perfectly for me).
I am trying to download listed financial stock data (as you'll see in the code)
I want it downloaded to a specific excel sheet - in its raw form (as I link it to another excel sheet which runs my calculations).
Here is the problem: the following code works perfectly for all US stocks and a bunch of EU stocks, but for Australian/NZ and some EU stocks I get the error: "Length mismatch: Expected axis has 4 elements, new values have 2 elements"
I am absolutely stumped. It was working previously - then I started playing around with matplotlib and now nothing is working for Australian/New Zealand (and some EU) stocks.
Any help whatsoever is greatly appreciated, and again, I am brand new to this so please go easy. Here is my code:
import pandas as pd
import yfinance as yf
import yahoofinancials
import yahoo_fin.stock_info as si
from openpyxl import load_workbook

x = input("Enter Stock: ")
a = x
datatoexcel = pd.ExcelWriter("File.xlsx", engine='xlsxwriter')

stats_df = si.get_stats(a)
stats_df.to_excel(datatoexcel, sheet_name='Stats')
StatsVal_df = si.get_stats_valuation(x)
StatsVal_df.to_excel(datatoexcel, sheet_name='Stats Val')
BS_df = si.get_balance_sheet(a)
BS_df.to_excel(datatoexcel, sheet_name='Balance Sheet')
IS_df = si.get_income_statement(a)
IS_df.to_excel(datatoexcel, sheet_name='PnL')
CF_df = si.get_cash_flow(a)
CF_df.to_excel(datatoexcel, sheet_name='CashFlow')
Data_df = si.get_data(x)
Data_df.to_excel(datatoexcel, sheet_name='Historical Price History')
datatoexcel.save()
The issue is mainly contained to:
data = stats_df = si.get_stats(a)
stats_df.to_excel(datatoexcel, sheet_name='Stats')
So, for example, I can run "GOOGL" / "AAPL" / "MSFT" / "BSX" / "BMW.DE" and it works perfectly. Yet when I run "NAN.AX" / "CBA.AX" or any other stock like that, I get the error: Length mismatch: Expected axis has 4 elements, new values have 2 elements
If you check the documentation for yahoo_fin, it mentions that data is retrieved by scraping the yahoo page for the selected stock. Without a paid Yahoo account, you can't see data for foreign stocks.
US stock: https://finance.yahoo.com/quote/NFLX/key-statistics?p=NFLX
Foreign: https://finance.yahoo.com/quote/CBA.AX/key-statistics?p=CBA.AX
The documentation can be found here:
http://theautomatic.net/yahoo_fin-documentation/#get_stats
To clarify the issue, you can run this code:
import yahoo_fin.stock_info as si
print('AAPL', len(si.get_stats('AAPL')))
print('CBA.AX', len(si.get_stats('CBA.AX')))
Output
AAPL 50
Traceback (most recent call last):
.....
ValueError: Length mismatch: Expected axis has 4 elements, new values have 2 elements

How do I fix this attribute error in my calculation step?

I'm attempting to determine which loans in a loan portfolio exceed the FHFA County Loan Limit, to project the impact of upcoming law changes for a study. I've had versions of the code work with a small sample set (14k loans), but when importing the full portfolio (5.6m) the code does not work. I'm definitely pretty new to Python; my experience is limited to SAS and R, and that's admittedly rusty.
As I don't have access to live data, I'm importing the data with a chunksize of 5k, which has alleviated the memory issues, and I've imported the loan limit data from the FHFA website and created a dictionary keyed by year, state, and county code.
I also used pd.to_datetime() and a .notnull() in an attempt to remove nulls from the date and county fields.
def loan_calculation_new(row):
    year = row['PROCESSED_DATE'].year
    if row['PROCESSED_DATE'].month > 9:
        year += 1
    state_dict = year_dict[year]
    if row['FIPS_STATE_CODE'] not in state_dict:
        print("No State Code")
        return None
    county_dict = state_dict[row['FIPS_STATE_CODE']]
    if row['FIPS_COUNTY_CODE'] not in county_dict:
        limit = 485300
        return
    limit = county_dict[row['FIPS_COUNTY_CODE']]
    limit > row['MTGE_LOAN_AMOUNT'].astype(int)
I keep getting this error when trying to run the calculation:
AttributeError: ("'str' object has no attribute 'year'", 'occurred at index 0')
I'm wondering if the issue is that my data is pipe-delimited and the dates are not being interpreted as dates. The sample was a .csv and seemed to work.
It seems the PROCESSED_DATE column is a string, so you need to convert it to datetime. If the rows come from a dataframe, you can do:
df['PROCESSED_DATE'] = pd.to_datetime(df['PROCESSED_DATE'])
from datetime import datetime

def loan_calculation_new(row):
    processed = datetime.strptime(row['PROCESSED_DATE'], "<EXPECTED FORMAT>")
    year = processed.year
    if processed.month > 9:
        year += 1
    ...
