Python: Transform ISIN, WKN or RIC to Yahoo Ticker Symbol?

Based on this post here, I can transform an ISIN into some form of ticker symbol with the help of the investpy library. This transformation is correct for most United States stocks.
But that symbol is not necessarily the same as the ticker symbol I need for a pandas_datareader call. I think it more closely matches the RIC symbol (e.g. look here).
For example if I try the following call:
import investpy
df = investpy.stocks.search_stocks(by='isin', value='DE0006048432')
print(df)
My output is:
country name ... currency symbol
0 germany Henkel VZO ... EUR HNKG_p
1 italy Henkel VZO ... EUR HNKG_p
2 switzerland Henkel VZO ... EUR HNKG_pEUR
but
from pandas_datareader import data as pdr
stock = pdr.DataReader('HNKG_p', data_source="yahoo", start="2021-01-01", end="2021-10-30")
gives me an error.
The correct call I need is:
stock = pdr.DataReader('HEN3.DE', data_source="yahoo", start="2021-01-01", end="2021-10-30")
So my question is:
Is there a way to transform an ISIN (or maybe a WKN or RIC) into the ticker symbol Yahoo needs for the DataReader call?
Or, more generally:
Is there a way to get historical stock data knowing only the ISIN (or maybe the WKN or RIC)?

Super ugly and error prone but better than nothing:
import investpy as ip
import yahooquery as yq
from pandas_datareader import data as pdr
company_name = ip.stocks.search_stocks(by='isin', value='DE0006048432')
company_name = company_name["name"][0].split(' ')[0]
symbol = yq.search(company_name)["quotes"][0]["symbol"]
stock = pdr.DataReader(symbol, data_source="yahoo", start="2021-01-01", end="2021-10-30")
You could extend this code using things like fuzzywuzzy and an ordinary testing module with doctest. Do not use this code in production.
I am not even sure whether this call keeps the order of the returned values:
yq.search(company_name)["quotes"]
so the code might behave unpredictably, but it should give you a direction.
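Depending on the security, Yahoo's search endpoint can sometimes resolve an ISIN directly, which would let you skip the fuzzy name match entirely. A minimal sketch, assuming the first hit is the listing you want (something you should verify, e.g. against the exchange suffix):
import yahooquery as yq
from pandas_datareader import data as pdr

isin = 'DE0006048432'
# Yahoo's search endpoint often accepts ISINs directly; check the hit before trusting it
quotes = yq.search(isin).get('quotes', [])
if quotes:
    symbol = quotes[0]['symbol']  # the first hit is not guaranteed to be the exchange you need
    stock = pdr.DataReader(symbol, data_source="yahoo", start="2021-01-01", end="2021-10-30")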

Related

Is there a way to pull analyst recommendations from Bloomberg into Python using pdblp package?

I am able to pull in prices:
import pdblp
con = pdblp.BCon()
con.start()
start_date = '20220101'
end_date = '20220701'
aapl = con.bdh(['AAPL US Equity'],['PX_LAST'],start_date,end_date) # pull generic prices
But when I look at the ANR function for AAPL, I can see a chart with 12M Tgt Px and other data in a Buy/Hold/Sell bar graph. Is there a way to pull this data using the API? I guess I would need to have the 12M Tgt Px as its own 'Security' so I can call PX_LAST on it, unless there is another parameter in con.bdh that I'm missing.
Thanks.
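As a direction only: analyst target prices and recommendation counts are reference data rather than a time series, so they would come from con.ref rather than con.bdh. This is just a sketch; the field mnemonics below are assumptions and need to be confirmed in the terminal's FLDS screen:
import pdblp

con = pdblp.BCon()
con.start()

# Hypothetical field mnemonics; verify the exact names in FLDS before relying on them
flds = ['BEST_TARGET_PRICE', 'TOT_ANALYST_REC', 'TOT_BUY_REC', 'TOT_HOLD_REC', 'TOT_SELL_REC']
anr = con.ref(['AAPL US Equity'], flds)
print(anr)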

Getting data from World Bank API using pandas

I'm trying to obtain a table containing just the country, year and value from this World Bank API, but I can't seem to filter for just the data I want. I've seen that these types of questions have already been asked, but none of the answers seemed to work.
Would really appreciate some help. Thank you!
import requests
import pandas as pd
from bs4 import BeautifulSoup
import json

url = "http://api.worldbank.org/v2/country/{}/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
country = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP","CZE","DNK","FIN","FRA","GEO","DEU",
           "GRC""HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
           "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
           "GBR","USA","VNM","ZWE"]

html = {}
for i in country:
    url_one = url.format(i)
    html[i] = requests.get(url_one).json()

my_values = []
for i in country:
    value = html[i][1][0]['value']
    my_values.append(value)
Edit
My data currently looks like this; I'm trying to extract the country name, which is in {'country': {'id': 'AO', 'value': 'Angola'}}, along with the 'date' and the 'value'.
Edit 2
Got the data I'm looking for, but each entry is repeated twice.
Note: I assumed it would be better to store the information for all years at once rather than only one year, since that lets you simply filter in later processing. Also take a look: there is a missing "," between your countries "GRC""HUN".
There are different options to achieve your goal; here are two of them to point you in the right direction.
Option #1
Pick information needed from json response, create a reshaped dict and append() it to my_values:
for d in data[1]:
    my_values.append({
        'country': d['country']['value'],
        'date': d['date'],
        'value': d['value']
    })
Example
import requests
import pandas as pd

url = 'http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?format=json'
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP","CZE","DNK","FIN","FRA","GEO","DEU",
             "GRC","HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
             "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
             "GBR","USA","VNM","ZWE"]

my_values = []
for country in countries:
    data = requests.get(url % country).json()
    try:
        for d in data[1]:
            my_values.append({
                'country': d['country']['value'],
                'date': d['date'],
                'value': d['value']
            })
    except Exception as err:
        print(f'[ERROR] country ==> {country} error ==> {err}')

pd.DataFrame(my_values).sort_values(['country', 'date'], ascending=True)
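As noted above, keeping every year in one frame means the original date=2015 request becomes a plain filter afterwards, for example:
df_all = pd.DataFrame(my_values)
df_2015 = df_all[df_all['date'] == '2015']  # the World Bank API returns dates as strings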
Option #2
Create dataframes directly from the json response, concat them and make some adjustments to the final dataframe:
for d in data[1]:
    my_values.append(pd.DataFrame(d))
...
pd.concat(my_values).loc[['value']][['country','date','value']].sort_values(['country', 'date'], ascending=True)
Output

country    date    value
Algeria    1971    341.389
Algeria    1972    442.678
Algeria    1973    554.293
Algeria    1974    818.008
Algeria    1975    936.79
...        ...     ...
Zimbabwe   2016    1464.59
Zimbabwe   2017    1235.19
Zimbabwe   2018    1254.64
Zimbabwe   2019    1316.74
Zimbabwe   2020    1214.51
The pandas read_json method needs a valid JSON string, path object or file-like object, which is not what you passed it.
https://pandas.pydata.org/docs/reference/api/pandas.read_json.html
Try this:
import requests
import pandas as pd

url = "http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP","CZE","DNK","FIN","FRA","GEO","DEU",
             "GRC","HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
             "NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
             "GBR","USA","VNM","ZWE"]

datas = []
for country in countries:
    data = requests.get(url % country).json()
    try:
        values = data[1][0]
        datas.append(pd.DataFrame(values))
    except Exception as err:
        print(f"[ERROR] country ==> {country} with error ==> {err}")

df = pd.concat(datas)
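Regarding the "repeated twice" edit: building a DataFrame straight from one nested record creates one row per sub-key of the nested country/indicator dicts, which is where the duplicates come from. A hedged alternative is to flatten the records with pd.json_normalize and keep only the columns of interest:
frames = []
for country in countries:
    data = requests.get(url % country).json()
    if len(data) > 1 and data[1]:
        # flatten nested dicts into dotted columns instead of exploding them into extra rows
        frames.append(pd.json_normalize(data[1]))

df_flat = pd.concat(frames)[['country.value', 'date', 'value']].rename(columns={'country.value': 'country'})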

error "can only convert an array of size 1 to a Python scalar" when I try to get the index of a row of a pandas dataframe as an integer

I have an excel file with stock symbols and many other columns. I have a simplified version of the excel file below:
   Symbol   Industry
0  AAPL     Technology Manufacturing
1  MSFT     Technology Manufacturing
2  TSLA     Electric Car Manufacturing
Essentially, I am trying to get the Industry based on the Symbol.
For example, if I use 'AAPL' I want to get 'Technology Manufacturing'. Here is my code so far.
import pandas as pd
excel_file1 = 'file.xlsx'
df = pd.read_excel(excel_file1)
stock = 'AAPL'
row_index = df[df['Symbol'] == stock].index.item()
industry = df['Industry'][row_index]
print(industry)
After trying to get row_index, I get an error: "ValueError: can only convert an array of size 1 to a Python scalar".
Can someone solve this? Also, assuming row_index works, is this code (below) correct?
industry = df['Industry'][row_index]
The .item() call raises that error because the boolean mask does not match exactly one row (for example, the symbol appears more than once, or not at all). Instead of going through the index, use:
stock = 'AAPL'
industry = df[df['Symbol'] == stock]['Industry'].iloc[0]
Or, if you want to search via the index, use df.loc:
stock = 'AAPL'
industry = df.loc[df[df['Symbol'] == stock].index, 'Industry'].iloc[0]
But the first one is much better (.iloc[0] takes the first match by position, regardless of the row's index label).
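If the symbols in the sheet are unique, another option (just a sketch under that assumption) is to index the frame by Symbol once and look industries up by label:
import pandas as pd

df = pd.read_excel('file.xlsx')

# Symbol -> Industry lookup; assumes each symbol appears only once in the sheet
industry_by_symbol = df.set_index('Symbol')['Industry']
print(industry_by_symbol['AAPL'])  # Technology Manufacturing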

Python: Length mismatch: Expected axis has 4 elements, new values have 2 elements

I am very new to Python (no coding history or skills whatsoever). I have been trying to automate pulling data from Yahoo and have built the following program from whatever I could find on the net, so please excuse the poor coding attempt (however, it almost works perfectly for me).
I am trying to download listed financial stock data (as you'll see in the code).
I want it downloaded to a specific Excel sheet in its raw form (as I link it to another Excel sheet which runs my calculations).
Here is the problem: the following code works perfectly for all US stocks and a bunch of EU stocks, but for Australian / NZ and some EU stocks I get the error: "Length mismatch: Expected axis has 4 elements, new values have 2 elements".
I am absolutely stumped. It was working previously; then I started playing around with matplotlib and now nothing is working for Australian / New Zealand (and some EU) stocks.
Any help whatsoever is greatly appreciated and again, I am brand new to this, so please go easy. Here is my code:
import pandas as pd
import yfinance as yf
import yahoofinancials
import yahoo_fin.stock_info as si
from yahoo_fin.stock_info import *
from openpyxl import load_workbook
x = input("Enter Stock: ")
a = (x)
datatoexcel = pd.ExcelWriter("File.xlsx",engine='xlsxwriter')
data = stats_df = si.get_stats(a)
stats_df.to_excel(datatoexcel, sheet_name='Stats')
data = StatsVal_df = get_stats_valuation(x)
StatsVal_df.to_excel(datatoexcel, sheet_name='Stats Val')
data = BS_df = si.get_balance_sheet(a)
BS_df.to_excel(datatoexcel, sheet_name='Balance Sheet')
data = IS_df = si.get_income_statement(a)
IS_df.to_excel(datatoexcel, sheet_name='PnL')
data = CF_df = si.get_cash_flow(a)
CF_df.to_excel(datatoexcel, sheet_name='CashFlow')
Data = Data_df = get_data(x)
Data_df.to_excel(datatoexcel, sheet_name='Historical Price History')
datatoexcel.save()
The issue is mainly contained to:
data = stats_df = si.get_stats(a)
stats_df.to_excel(datatoexcel, sheet_name='Stats')
so, for example, I will run "GOOGL" / "AAPL" / "MSFT" / "BSX" / "BMW.DE" and it works perfectly. Yet, when I run "NAN.AX" / "CBA.AX" or any other stock like that, I get the error: Length mismatch: Expected axis has 4 elements, new values have 2 elements
If you check the documentation for yahoo_fin, it mentions that data is retrieved by scraping the yahoo page for the selected stock. Without a paid Yahoo account, you can't see data for foreign stocks.
US stock: https://finance.yahoo.com/quote/NFLX/key-statistics?p=NFLX
Foreign: https://finance.yahoo.com/quote/CBA.AX/key-statistics?p=CBA.AX
The documentation can be found here:
http://theautomatic.net/yahoo_fin-documentation/#get_stats
To clarify the issue, you can run this code:
import yahoo_fin.stock_info as si
print('AAPL', len(si.get_stats('AAPL')))
print('CBA.AX', len(si.get_stats('CBA.AX')))
Output
AAPL 50
Traceback (most recent call last):
.....
ValueError: Length mismatch: Expected axis has 4 elements, new values have 2 elements
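If the statistics pages for those listings really cannot be scraped, the price history and financial statements for ASX/NZX tickers are usually still reachable through yfinance, which the question already imports. A minimal sketch (without the two stats sheets):
import pandas as pd
import yfinance as yf

ticker = yf.Ticker("CBA.AX")

# write each yfinance dataset to its own sheet, mirroring the original layout
with pd.ExcelWriter("File.xlsx", engine="xlsxwriter") as writer:
    ticker.history(period="1y").to_excel(writer, sheet_name="Historical Price History")
    ticker.balance_sheet.to_excel(writer, sheet_name="Balance Sheet")
    ticker.financials.to_excel(writer, sheet_name="PnL")
    ticker.cashflow.to_excel(writer, sheet_name="CashFlow")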

Cannot convert object type to string; and then filter on that string; python pandas dataframe

I am trying to pull all stock tickers from the NYSE, and then filter for only those with a MarketCap above 5B.
I am running into a problem because, based on how my data load comes in, all columns are of data type "Object" and I cannot find any way to convert them to anything else. See my code and comments below:
import pandas as pd
import numpy as np
# NYSE
url_nyse = "http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download"
df = pd.DataFrame.from_csv(url_nyse)
df = df.drop(df.columns[[0, 1, 3, 6,7]], axis=1)
This is my initial data load of NYSE stocks, and then I filter for just MarketCap, Sector, and Industry.
At first I was hoping to filter MarketCap by removing anything with "M" in it and then stripping the first and last characters to get a number, which could then be filtered to keep anything above 5. However, I think because the data types are "Object" and not string, I have not been able to do it directly. So I then created new columns containing only letters or numbers, hoping that I could then convert them to string and float from there.
df['MarketCap_Num'] = df.MarketCap.str[1:-1]
df['Billion_Filter'] = df.MarketCap.str[-1:]
So the MarketCap_Num column has only the numbers (the first and last characters removed) and Billion_Filter is only the last character, where I will remove any value that equals M.
However, even though these columns are just numbers or just strings, I CANNOT find any way to change them from the object datatype, so my filtering is not working at all. Any help is much appreciated.
I have tried .astype(float), pd.to_numeric and type functions with no success.
My filtering code would then be:
df[df.Billion_Filter.str.contains("B")]
But when I run that, nothing happens: there is no error, but also no filtering. When I run this code on a different table it works, so it must be the object data type that is holding it up.
Convert the MarketCap column into floats by first removing the dollar signs and then substituting B with e9 and M with e6. This should make it easy to use .astype(float) on the column to do the conversion.
import pandas as pd
import numpy as np
# NYSE
url_nyse = "http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download"
df = pd.DataFrame.from_csv(url_nyse)
df = df.drop(df.columns[[0, 1, 3, 6,7]], axis=1)
df = df.replace({'MarketCap': {'\$': '', 'B': 'e9', 'M': 'e6', 'n/a': np.nan}}, regex=True)
df.MarketCap = df.MarketCap.astype(float)
print(df[df.MarketCap > 5000000000].head(10))
Yields:
MarketCap Sector industry
Symbol
MMM 1.419900e+11 Health Care Medical/Dental Instruments
WUBA 1.039000e+10 Technology Computer Software: Programming, Data Processing
ABB 5.676000e+10 Consumer Durables Electrical Products
ABT 9.887000e+10 Health Care Major Pharmaceuticals
ABBV 1.563200e+11 Health Care Major Pharmaceuticals
ACN 9.388000e+10 Miscellaneous Business Services
AYI 7.240000e+09 Consumer Durables Building Products
ADNT 7.490000e+09 Capital Goods Auto Parts:O.E.M.
AAP 7.370000e+09 Consumer Services Other Specialty Stores
ASX 1.083000e+10 Technology Semiconductors
You should be able to change the type of the MarketCap_Num column by using:
df['MarketCap_Num'] = df.MarketCap.str[1:-1].astype(np.float64)
You can then check the data types by df.dtypes.
As for the filter, you can simply say
df_filtered = df[df['Billion_Filter'] == "B"].copy()
since you will only have one letter in your Billion_Filter column.
The object dtype works like string here. You should be able to use both str.contains and str.extract to get the number without having to convert the object type to string:
df = df[df['MarketCap'].str.contains('B')].copy()
df['MarketCap'] = df['MarketCap'].str.extract('(\d+\.?\d*)', expand=False)
MarketCap Sector industry
Symbol
DDD 1.12 Technology Computer Software: Prepackaged Software
MMM 141.99 Health Care Medical/Dental Instruments
WUBA 10.39 Technology Computer Software: Programming, Data Processing
EGHT 1.32 Public UtilitiesTelecommunications Equipment
AIR 1.48 Capital Goods Aerospace
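If you go the extract route above, the remaining step for the original 5B cut-off is to cast the extracted strings to float and filter; a short sketch continuing from that dataframe:
# rows were already restricted to 'B' (billions), so the threshold is simply 5
df['MarketCap'] = df['MarketCap'].astype(float)
df_large = df[df['MarketCap'] > 5]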
