I'm trying to obtain a table of data with just the country, year, and value from this World Bank API, but I can't seem to filter for just the data I want. I've seen that questions like this have already been asked, but none of the answers worked for me.
Would really appreciate some help. Thank you!
import requests
import pandas as pd
from bs4 import BeautifulSoup
import json

url = "http://api.worldbank.org/v2/country/{}/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
country = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP","CZE","DNK","FIN","FRA","GEO","DEU",
"GRC""HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
"NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
"GBR","USA","VNM","ZWE"]
html = {}
for i in country:
    url_one = url.format(i)
    html[i] = requests.get(url_one).json()

my_values = []
for i in country:
    value = html[i][1][0]['value']
    my_values.append(value)
Edit
My data currently looks like this. I'm trying to extract the country name, which sits in {'country': {'id': 'AO', 'value': 'Angola'}}, plus the 'date' and the 'value'.
Edit 2
Got the data I'm looking for, but each row is repeated twice.
Note: I assumed it would be better to store the information for all years at once rather than only for one year; that lets you simply filter in later processing. Also, take a look at your country list: there is a missing "," between "GRC" and "HUN".
There are different options to achieve your goal; here are two of them to point you in the right direction.
Option #1
Pick the information you need from the JSON response, create a reshaped dict, and append() it to my_values:
for d in data[1]:
    my_values.append({
        'country': d['country']['value'],
        'date': d['date'],
        'value': d['value']
    })
Example
import requests
import pandas as pd

url = 'http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?format=json'
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP","CZE","DNK","FIN","FRA","GEO","DEU",
"GRC","HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
"NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
"GBR","USA","VNM","ZWE"]
my_values = []
for country in countries:
    data = requests.get(url % country).json()
    try:
        for d in data[1]:
            my_values.append({
                'country': d['country']['value'],
                'date': d['date'],
                'value': d['value']
            })
    except Exception as err:
        print(f'[ERROR] country ==> {country} error ==> {err}')

pd.DataFrame(my_values).sort_values(['country', 'date'], ascending=True)
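Because the example above collects every year, narrowing down to a single year afterwards is a one-line filter. A minimal sketch with toy rows (the numbers are hypothetical):

```python
import pandas as pd

# Toy rows shaped like the collected records (hypothetical values)
df = pd.DataFrame([
    {'country': 'Algeria', 'date': '2015', 'value': 4177.9},
    {'country': 'Algeria', 'date': '2016', 'value': 3946.3},
])

# Keep only 2015 -- the API returns 'date' as a string
gdp_2015 = df[df['date'] == '2015']
print(gdp_2015)
```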
Option #2
Create dataframes directly from the JSON response, concat them, and make some adjustments on the final dataframe:
for d in data[1]:
    my_values.append(pd.DataFrame(d))
...
pd.concat(my_values).loc[['value']][['country','date','value']].sort_values(['country', 'date'], ascending=True)
Output

country     date    value
Algeria     1971    341.389
Algeria     1972    442.678
Algeria     1973    554.293
Algeria     1974    818.008
Algeria     1975    936.79
...         ...     ...
Zimbabwe    2016    1464.59
Zimbabwe    2017    1235.19
Zimbabwe    2018    1254.64
Zimbabwe    2019    1316.74
Zimbabwe    2020    1214.51
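Option #2 can also be traced end-to-end without hitting the API, using records shaped like the World Bank response (the values are hypothetical):

```python
import pandas as pd

# Two records shaped like entries of data[1] from the API (hypothetical values)
data = [None, [
    {'country': {'id': 'DZ', 'value': 'Algeria'}, 'date': '1972', 'value': 442.678},
    {'country': {'id': 'DZ', 'value': 'Algeria'}, 'date': '1971', 'value': 341.389},
]]

my_values = []
for d in data[1]:
    # The nested 'country' dict makes each mini-frame carry
    # rows indexed 'id' and 'value'
    my_values.append(pd.DataFrame(d))

df = (pd.concat(my_values)
        .loc[['value']][['country', 'date', 'value']]  # keep only the 'value' rows
        .sort_values(['country', 'date'], ascending=True))
print(df)
```

The `.loc[['value']]` step is what drops the duplicate rows the nested dict creates, which is also why the asker saw each entry twice.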
The pandas read_json method needs a valid JSON str, path object, or file-like object, but what you passed it is not valid JSON.
https://pandas.pydata.org/docs/reference/api/pandas.read_json.html
Try this:
import requests
import pandas as pd
url = "http://api.worldbank.org/v2/country/%s/indicator/NY.GDP.PCAP.CD?date=2015&format=json"
countries = ["DZA","AGO","ARG","AUS","AUT","BEL","BRA","CAN","CHL","CHN","COL","CYP","CZE","DNK","FIN","FRA","GEO","DEU",
"GRC","HUN","ISL","IND","IDN","IRL","ISR","ITA","JPN","KAZ","KWT","LBN","LIE","MYS","MEX","MCO","MAR","NPL","NLD",
"NZL","NGA","NOR","OMN","PER","PHL","POL","PRT","QAT","ROU","SGP","ZAF","ESP","SWE","CHE","TZA","THA","TUR","UKR",
"GBR","USA","VNM","ZWE"]
datas = []
for country in countries:
    data = requests.get(url % country).json()
    try:
        values = data[1][0]
        datas.append(pd.DataFrame(values))
    except Exception as err:
        print(f"[ERROR] country ==> {country} with error ==> {err}")

df = pd.concat(datas)
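To see the distinction concretely: requests' .json() already returns parsed Python objects, so a plain DataFrame constructor works on them directly, while read_json expects JSON text (or a file-like wrapper). A toy comparison with a hypothetical record:

```python
import json
from io import StringIO

import pandas as pd

# One record in the shape we want to tabulate (hypothetical numbers)
records = [{'country': 'Algeria', 'date': '2015', 'value': 4177.9}]

# read_json wants a JSON document, so serialize first (StringIO keeps
# newer pandas versions from complaining about literal-string input)
df1 = pd.read_json(StringIO(json.dumps(records)), convert_dates=False)

# requests' .json() would already have parsed the payload,
# so the plain constructor is enough
df2 = pd.DataFrame(records)
```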
I am very, very new to Python (no coding history or skills whatsoever). I have been trying to automate pulling data from Yahoo and have built the following program from whatever I could find on the net. So please excuse the poor coding attempt (however, it almost works perfectly for me).
I am trying to download listed financial stock data (as you'll see in the code)
I want it downloaded to a specific excel sheet - in its raw form (as I link it to another excel sheet which runs my calculations).
Here is the problem. The following code works perfectly for all US stocks and a bunch of EU stocks, but for no Australian / NZ stocks and some EU stocks, where I get the error: "Length mismatch: Expected axis has 4 elements, new values have 2 elements"
I am absolutely stumped. It was working previously - then I started playing around with matplotlib and now nothing is working for Australian / New Zealand (and some EU) Stocks.
Any help whatsoever is greatly appreciated and again, I am brand new to this, so please go easy. Here is my code:
import pandas as pd
import yfinance as yf
import yahoofinancials
import yahoo_fin.stock_info as si  # needed for the si.* calls below
from yahoo_fin.stock_info import *
from openpyxl import load_workbook
x = input("Enter Stock: ")
a = (x)
datatoexcel = pd.ExcelWriter("File.xlsx",engine='xlsxwriter')
data = stats_df = si.get_stats(a)
stats_df.to_excel(datatoexcel, sheet_name='Stats')
data = StatsVal_df = get_stats_valuation(x)
StatsVal_df.to_excel(datatoexcel, sheet_name='Stats Val')
data = BS_df = si.get_balance_sheet(a)
BS_df.to_excel(datatoexcel, sheet_name='Balance Sheet')
data = IS_df = si.get_income_statement(a)
IS_df.to_excel(datatoexcel, sheet_name='PnL')
data = CF_df = si.get_cash_flow(a)
CF_df.to_excel(datatoexcel, sheet_name='CashFlow')
Data = Data_df = get_data(x)
Data_df.to_excel(datatoexcel, sheet_name='Historical Price History')
datatoexcel.save()
The issue is mainly contained to:
data = stats_df = si.get_stats(a)
stats_df.to_excel(datatoexcel, sheet_name='Stats')
So, for example, I can run "GOOGL" / "AAPL" / "MSFT" / "BSX" / "BMW.DE" and it works perfectly. Yet, when I run "NAN.AX" / "CBA.AX" or any other stock like that, I get the error: Length mismatch: Expected axis has 4 elements, new values have 2 elements
If you check the documentation for yahoo_fin, it mentions that data is retrieved by scraping the yahoo page for the selected stock. Without a paid Yahoo account, you can't see data for foreign stocks.
US stock: https://finance.yahoo.com/quote/NFLX/key-statistics?p=NFLX
Foreign: https://finance.yahoo.com/quote/CBA.AX/key-statistics?p=CBA.AX
The documentation can be found here:
http://theautomatic.net/yahoo_fin-documentation/#get_stats
To clarify the issue, you can run this code:
import yahoo_fin.stock_info as si
print('AAPL', len(si.get_stats('AAPL')))
print('CBA.AX', len(si.get_stats('CBA.AX')))
Output
AAPL 50
Traceback (most recent call last):
.....
ValueError: Length mismatch: Expected axis has 4 elements, new values have 2 elements
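Until the underlying scrape works for those tickers, one workaround is to skip any ticker whose stats page cannot be parsed and keep going. The sketch below fakes the failure with a stub instead of calling yahoo_fin, so the pattern stays self-contained (the stub and its data are hypothetical):

```python
import pandas as pd

def get_stats_stub(ticker):
    # Stand-in for si.get_stats: mimics the length-mismatch failure
    # described above for foreign tickers (the data here is made up)
    if ticker.endswith('.AX'):
        raise ValueError('Length mismatch: Expected axis has 4 elements, '
                         'new values have 2 elements')
    return pd.DataFrame({'Attribute': ['Market Cap'], 'Value': ['2.41T']})

results = {}
for ticker in ['AAPL', 'CBA.AX']:
    try:
        results[ticker] = get_stats_stub(ticker)
    except ValueError as err:
        # Log and move on instead of letting one ticker abort the run
        print(f'skipping {ticker}: {err}')
```

The same try/except wrapper around the real si.get_stats call keeps the Excel export working for the tickers that do parse.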
I am trying to pull all stock tickers from the NYSE and then filter for only those with a market cap above 5B.
I am running into a problem because of how my data load comes in: all columns are data type "Object" and I cannot find any way to convert them to anything else. See my code and comments below:
import pandas as pd
import numpy as np
# NYSE
url_nyse = "http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download"
df = pd.DataFrame.from_csv(url_nyse)
df = df.drop(df.columns[[0, 1, 3, 6,7]], axis=1)
This is my initial data load of NYSE stocks, and then I filter for just MarketCap, Sector, and Industry.
At first I hoped to filter MarketCap by removing anything with "M" in it, then stripping the first and last characters to get a number that could be filtered to keep anything above 5. However, I think because the data types are "Object" and not string, I have not been able to do it directly. So I then created new columns containing only letters or only numbers, hoping I could convert those to string and float from there.
df['MarketCap_Num'] = df.MarketCap.str[1:-1]
df['Billion_Filter'] = df.MarketCap.str[-1:]
So MarketCap_Num column has only the numbers by removing the first and last characters and Billion_Filter is only the last character where I will remove any value that = M.
However, even though these columns are just numbers or just strings, I CANNOT find any way to convert them from the object datatype, so my filtering is not working at all. Any help is much appreciated.
I have tried .astype(float), pd.to_numeric, type functions to no success.
My filtering code would then be:
df[df.Billion_Filter.str.contains("B")]
But when I run that, nothing happens: no error, but also no filtering. When I run this code on a different table it works, so it must be the object data type that is holding it up.
Convert the MarketCap column into floats by first removing the dollar signs and then substituting B with e9 and M with e6. This should make it easy to use .astype(float) on the column to do the conversion.
import pandas as pd
import numpy as np
# NYSE
url_nyse = "http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download"
df = pd.DataFrame.from_csv(url_nyse)
df = df.drop(df.columns[[0, 1, 3, 6,7]], axis=1)
df = df.replace({'MarketCap': {'\$': '', 'B': 'e9', 'M': 'e6', 'n/a': np.nan}}, regex=True)
df.MarketCap = df.MarketCap.astype(float)
print(df[df.MarketCap > 5000000000].head(10))
Yields:
MarketCap Sector industry
Symbol
MMM 1.419900e+11 Health Care Medical/Dental Instruments
WUBA 1.039000e+10 Technology Computer Software: Programming, Data Processing
ABB 5.676000e+10 Consumer Durables Electrical Products
ABT 9.887000e+10 Health Care Major Pharmaceuticals
ABBV 1.563200e+11 Health Care Major Pharmaceuticals
ACN 9.388000e+10 Miscellaneous Business Services
AYI 7.240000e+09 Consumer Durables Building Products
ADNT 7.490000e+09 Capital Goods Auto Parts:O.E.M.
AAP 7.370000e+09 Consumer Services Other Specialty Stores
ASX 1.083000e+10 Technology Semiconductors
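The substitution trick works on any series of strings in that format; a small standalone check with toy values:

```python
import numpy as np
import pandas as pd

# Toy MarketCap strings in the screener's format (hypothetical caps)
cap = pd.Series(['$1.42B', '$500.5M', 'n/a'])

# Dollar sign dropped, B -> e9, M -> e6, n/a -> NaN, then a float cast
cap = cap.replace({r'\$': '', 'B': 'e9', 'M': 'e6', 'n/a': np.nan}, regex=True)
cap = cap.astype(float)
print(cap)
```

Turning the suffixes into scientific-notation exponents is what lets a single astype(float) handle both billions and millions.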
You should be able to change the type of the MarketCap_Num column by using:
df['MarketCap_Num'] = df.MarketCap.str[1:-1].astype(np.float64)
You can then check the data types by df.dtypes.
As for the filter, you can simply say
df_filtered = df[df['Billion_Filter'] =="B"].copy()
since you will only have one letter in your Billion_Filter column.
Object datatype works like string. You should be able to use both str.contains and extract the number without having to convert the object type to string:
df = df[df['MarketCap'].str.contains('B')].copy()
df['MarketCap'] = df['MarketCap'].str.extract(r'(\d+\.?\d*)', expand=False)
MarketCap Sector industry
Symbol
DDD 1.12 Technology Computer Software: Prepackaged Software
MMM 141.99 Health Care Medical/Dental Instruments
WUBA 10.39 Technology Computer Software: Programming, Data Processing
EGHT 1.32 Public UtilitiesTelecommunications Equipment
AIR 1.48 Capital Goods Aerospace
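The same contains-then-extract steps work on a bare series, which makes the technique easy to verify in isolation (toy values):

```python
import pandas as pd

# Hypothetical MarketCap strings
cap = pd.Series(['$1.42B', '$141.99B', '$500M'])

# Keep the billion-cap entries, then pull out the numeric part
billions = cap[cap.str.contains('B')].copy()
nums = billions.str.extract(r'(\d+\.?\d*)', expand=False).astype(float)
print(nums)
```

Note that the extracted numbers are still "in billions", so the 5B cutoff becomes nums > 5 with this approach.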