Accessing nested dictionary from a JSON, with variable headers - python

I am trying to use json_normalize to parse data from the yahoofinancials package. I seem to be running into an issue when trying to separate the columns out of the last object, a variable date. I believe each date is a dictionary that contains the various balance sheet line items.
My code is:
import json
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import yfinance as yf
from yahoofinancials import YahooFinancials
tickerinput = "AAPL"
ticker = yf.Ticker(tickerinput)
tickerfin = YahooFinancials(tickerinput)
balancesheet = tickerfin.get_financial_stmts('annual', 'balance')
# Flatten with json_normalize
balsheet = pd.json_normalize(balancesheet, record_path=['balanceSheetHistory', tickerinput])
I have also tried the code below, but I get a KeyError even though the key is present in the original JSON output.
balsheet = pd.json_normalize(balancesheet, record_path=['balanceSheetHistory', tickerinput], meta=['2021-09-25', ['totalLiab','totalStockholderEquity','totalAssets','commonStock','otherCurrentAssets','retainedEarnings','otherLiab','treasuryStock','otherAssets','cash','totalCurrentLiabilities','shortLongTermDebt','otherStockholderEquity','propertyPlantEquipment','totalCurrentAssets','longTermInvestments','netTangibleAssets','shortTermInvestments','netReceivables','longTermDebt','inventory','accountsPayable']], errors='ignore')
The main issue is that I get back the data frame below:
Returned dataframe from balsheet
Sample Output of the JSON file:
JSON Output (balancesheet variable)
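A minimal sketch of one way around this, assuming the output of get_financial_stmts('annual', 'balance') has the shape {'balanceSheetHistory': {ticker: [{date: {line_item: value}}, ...]}} as described above. Because each record is a one-key dict whose key is the date, json_normalize likely flattens every date into its own set of prefixed columns (e.g. '2021-09-25.totalAssets'), and meta does not help because it looks up fields on the parent level of each record, not inside it. Merging the records into a single {date: line_items} dict and building the frame with orient='index' gives one row per date instead. The numbers are toy values, not real balance sheet figures.

```python
import pandas as pd

# Hypothetical sample with the assumed shape (toy numbers, not real data):
# {'balanceSheetHistory': {ticker: [{date: {line_item: value, ...}}, ...]}}
balancesheet = {
    "balanceSheetHistory": {
        "AAPL": [
            {"2021-09-25": {"totalAssets": 100, "totalLiab": 60}},
            {"2020-09-26": {"totalAssets": 90, "totalLiab": 55}},
        ]
    }
}

tickerinput = "AAPL"
records = balancesheet["balanceSheetHistory"][tickerinput]

# Merge the one-key dicts into {date: {line_item: value}}
by_date = {date: items for record in records for date, items in record.items()}

# Each date becomes a row; each balance sheet line item becomes a column
balsheet = pd.DataFrame.from_dict(by_date, orient="index")
balsheet.index.name = "date"
print(balsheet)
```

With the real data, the same two steps apply directly to tickerfin.get_financial_stmts('annual', 'balance'); call balsheet.T if you prefer dates as columns.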

Related

Function to export multiple yfinance stocks to csv

I'm trying to define a function that extracts information on stocks over the past 12 months and exports it to a CSV file. It keeps printing 'bad' and I'm not sure where it's going wrong. Any thoughts?
Thanks.
import pandas as pd
import numpy as np
import yfinance as yf
import datetime as dt
from pandas_datareader import data as pdr
from yahoofinancials import YahooFinancials
yf.pdr_override()
now_time = dt.datetime.now()
start_time = dt.datetime(now_time.year - 1, now_time.month, now_time.day)
bad_names = []
def download_stock(stock):
    try:
        print(stock)
        stock_df = pdr.get_yahoo_data(stock, start_time, now_time)
        stock_df['Name'] = stock
        output_name = stock + '_data.csv'
        stock_df.to_csv("./stocks/" + output_name)
    except:
        bad_names.append(stock)
        print('bad: %s' % (stock))
download_stock('AAPL')
A bare try/except block catches any exception and simply runs whatever follows except, so the actual error is hidden from you.
You could try running the code without the try/except block and see what the error is.
Alternatively, you could use
except Exception as e:
    print(e)
so you can see exactly what is going wrong. Looking at it now, I would guess that you are missing one dot in the file path: "../stocks/"+output_name
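Two things in the question's code are worth checking once the error is visible. The call is pdr.get_yahoo_data, but the function pandas-datareader provides (and that yf.pdr_override() patches) is named get_data_yahoo, so the misspelled name alone would raise an AttributeError inside the try. Also, to_csv fails if the stocks folder does not exist. The sketch below reports the exception type and creates the folder first. The fetch parameter is a hypothetical addition so the function can be exercised offline; in real use you would pass something like lambda s: pdr.get_data_yahoo(s, start_time, now_time).

```python
import os
import tempfile

import pandas as pd

bad_names = []


def download_stock(stock, fetch, out_dir="stocks"):
    """Fetch `stock` via the hypothetical `fetch(stock)` callable and save it
    as <out_dir>/<stock>_data.csv. Returns the path, or None on failure."""
    try:
        stock_df = fetch(stock)
        stock_df["Name"] = stock
        # to_csv raises if the folder is missing, so create it up front
        os.makedirs(out_dir, exist_ok=True)
        path = os.path.join(out_dir, stock + "_data.csv")
        stock_df.to_csv(path)
        return path
    except Exception as e:
        # Record the failure, but also report *why* it failed
        bad_names.append(stock)
        print("bad: %s (%s: %s)" % (stock, type(e).__name__, e))
        return None


# Offline demo with stand-in fetchers (no network access needed)
def fake_fetch(stock):
    return pd.DataFrame({"Close": [1.0, 2.0]})


def broken_fetch(stock):
    raise RuntimeError("simulated download failure")


with tempfile.TemporaryDirectory() as tmp:
    ok_path = download_stock("AAPL", fake_fetch, out_dir=os.path.join(tmp, "stocks"))
    ok_exists = ok_path is not None and os.path.exists(ok_path)
    failed = download_stock("MSFT", broken_fetch, out_dir=tmp)
```

The failing call prints "bad: MSFT (RuntimeError: simulated download failure)" instead of a bare "bad", which is usually enough to spot the real problem.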

Importing saved HTML file in Pandas as DataFrame instead of a dict

I have a table saved offline in HTML format. I want to import it into pandas and work on it, but pandas imports it as a dict instead of a DataFrame. Here is my code:
import pandas as pd
import html5lib
option_table = pd.read_html("C:/Users/home-pc/Desktop/operator.html")
print(option_table[['Circle name', 'Code']])
Here is the HTML table which I have saved offline on my computer.
The error I get when I run my code is:
Traceback (most recent call last):
File "C:\Users\home-pc\Desktop\offline.py", line 5, in <module>
print(option_table[['Circle name', 'Code']])
TypeError: list indices must be integers or slices, not list
How can I import my offline HTML page as a DataFrame instead of a dict?
pd.read_html returns a list of DataFrames, one for each table it finds, so here it is getting imported as a one-element list rather than a dict. Indexing that first element and then casting it as a DataFrame worked for me:
import pandas as pd
import html5lib
option_table = pd.read_html("https://sinuateainudog.htmlpasta.com/")
option_table_df = pd.DataFrame(option_table[0])
print(option_table_df[['Circle name', 'Code']])
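A minimal offline sketch, assuming lxml (or BeautifulSoup4 with html5lib) is installed, since pd.read_html needs one of those parsers. The HTML string is a hypothetical stand-in for operator.html; wrapping it in StringIO avoids passing literal HTML, which newer pandas versions deprecate. It shows that read_html always returns a list and that its first element is already a DataFrame.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the saved operator.html
html = """
<table>
  <thead><tr><th>Circle name</th><th>Code</th></tr></thead>
  <tbody>
    <tr><td>Delhi</td><td>DL</td></tr>
    <tr><td>Mumbai</td><td>MB</td></tr>
  </tbody>
</table>
"""

tables = pd.read_html(StringIO(html))  # always a list of DataFrames
print(type(tables), len(tables))

option_table_df = tables[0]  # already a DataFrame, no cast needed
print(option_table_df[["Circle name", "Code"]])
```

For the saved file, pd.read_html("C:/Users/home-pc/Desktop/operator.html")[0] should behave the same way; if the page contains several tables, pick the right index or filter with the match argument.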

Where am I going wrong retrieving stock data from Quandl?

ValueError: The Quandl API key must be provided either through the api_key variable or through the environmental variable QUANDL_API_KEY.
I am trying to retrieve some simple stock data from Quandl. In my actual code I have put the real API key in place of the x in the example below, but I still get this error. Am I missing something?
import datetime as dt
import matplotlib.pyplot as plt
from matplotlib import style
import pandas as pd
import pandas_datareader.data as web
style.use('ggplot')
symbol = 'AAPL'
api_key = 'x'
start = dt.datetime(2015, 1, 1)
end = dt.datetime.now()
df = web.DataReader(symbol, 'quandl', start, end, api_key)
print(df.head())
From the quandl docs:
AUTHENTICATION: The Quandl Python module is free but you must have a Quandl API key in order to download data. To get your own API key, you will need to create a free Quandl account and set your API key. After importing the Quandl module, you can set your API key with the following command:
quandl.ApiConfig.api_key = "YOURAPIKEY"
So you will need to pip install and import quandl. Then you can set the api_key attribute as above.
If you only need the data from Quandl, you could try another approach and use the Quandl package directly:
import pandas as pd
import Quandl
api_key = 'yoursuperamazingquandlAPIkey'
df = Quandl.get('heregoesthequandlcode', authtoken = api_key)
print(df.head())
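One more thing in the question's code: pandas_datareader.data.DataReader is defined as DataReader(name, data_source=None, start=None, end=None, retry_count=3, pause=0.1, session=None, api_key=None), so a fifth positional argument lands in retry_count, not api_key. The sketch uses a hypothetical stand-in function with the same parameter order (no network or pandas-datareader needed) to show how the arguments bind.

```python
def data_reader(name, data_source=None, start=None, end=None,
                retry_count=3, pause=0.1, session=None, api_key=None):
    """Hypothetical stand-in mirroring DataReader's parameter order;
    it only returns what it received for two of the parameters."""
    return {"retry_count": retry_count, "api_key": api_key}


# As in the question: the 5th positional argument binds to retry_count
positional = data_reader("AAPL", "quandl", None, None, "x")
print(positional)  # {'retry_count': 'x', 'api_key': None}

# Passed by keyword, the key ends up where the reader looks for it
keyword = data_reader("AAPL", "quandl", None, None, api_key="x")
print(keyword)  # {'retry_count': 3, 'api_key': 'x'}
```

With the real library that becomes df = web.DataReader(symbol, 'quandl', start, end, api_key=api_key). Setting the QUANDL_API_KEY environment variable named in the error message is the other option.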

Invalid syntax querying JSON LOAD on Python

I'm trying to print out the data from the sig API, but it gives me an error even though the URL is correct.
import requests
import json
from json import loads
import pandas as pd
import matplotlib as plt
requests.get("https://api.meetup.com/2/groups?zip=eh1+1af&offset=0&city=Edinburgh&format=json&lon=-3.19000005722&category_id=34&photo-host=public&page=500&radius=25.0&fields=&lat=55.9500007629&order=id&desc=false&sig_id=243750775&sig=9072b77fb34f5b84a392da2505fd946c58e94fe5")
Apparently the error ("invalid syntax") is on this line:
print json.load(requests.get("https://api.meetup.com/2/groups?zip=eh1+1af&offset=0&city=Edinburgh&format=json&lon=-3.19000005722&category_id=34&photo-host=public&page=500&radius=25.0&fields=&lat=55.9500007629&order=id&desc=false&sig_id=243750775&sig=9072b77fb34f5b84a392da2505fd946c58e94fe5"))
Thank you
Since you're using requests, you should be aware that the Response object has a json method you can call to retrieve JSON from the HTTP response.
r = requests.get(url).json()
print(type(r))  # <class 'dict'>
You can load the JSON response r into a DataFrame if you need to:
df = pd.json_normalize(r['results'])
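The "invalid syntax" itself most likely comes from the Python 2 print statement: under Python 3, print is a function and needs parentheses. Separately, json.load expects a file-like object with a .read() method, while json.loads parses a string. The sketch compiles the question's line to show the SyntaxError and parses a hypothetical response body in place of the live Meetup request.

```python
import json

# The question's Python 2 statement form, compiled under Python 3
try:
    compile("print json.load(response)", "<python2-style>", "exec")
    py2_style_compiles = True
except SyntaxError:
    py2_style_compiles = False
print("Python 2 print statement compiles:", py2_style_compiles)

# Hypothetical body standing in for requests.get(url).text
body = '{"results": [{"id": 1, "name": "Example group"}]}'

# json.loads parses a str; json.load would need a file-like object
data = json.loads(body)
print(data["results"][0]["name"])
```

With requests, the equivalent one-liner is print(requests.get(url).json()).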

Parse requests.get() output into a pandas dataframe

I am following a tutorial and am stuck at parsing the output of requests.get().
My goal is to connect to the API below to pull historical crypto-currency prices and put them into a pandas dataframe for further analysis.
[API: https://www.cryptocompare.com/api/#-api-data-histoday-]
Here's what I have.
import requests
response = requests.get("https://min-api.cryptocompare.com/data/histoday?fsym=ETC&tsym=USD&limit=10&aggregate=3&e=CCCAGG")
print(response.text)
Now I want to output into a dataframe...
pd.DataFrame.from_dict(response)
But I get...
PandasError: DataFrame constructor not properly called!
You can use the json package to convert to dict:
import requests
from json import loads
import pandas as pd
response = requests.get("https://min-api.cryptocompare.com/data/histoday?fsym=ETC&tsym=USD&limit=10&aggregate=3&e=CCCAGG")
dic = loads(response.text)
print(type(dic))
pd.DataFrame.from_dict(dic)
However, as jonrsharpe noted, a much simpler way would be:
import requests
import pandas as pd
response = requests.get("https://min-api.cryptocompare.com/data/histoday?fsym=ETC&tsym=USD&limit=10&aggregate=3&e=CCCAGG")
print(type(response.json()))
pd.DataFrame.from_dict(response.json())
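Passing the whole response dict to from_dict mixes metadata fields with the price rows. A sketch of an alternative, assuming the histoday response wraps the daily rows in a "Data" list of dicts with a Unix "time" field in seconds (check the actual payload; the values below are hypothetical toy numbers): build the frame from that list and turn the timestamps into a datetime index.

```python
import pandas as pd

# Hypothetical payload: metadata plus a "Data" list of row dicts (toy values)
payload = {
    "Response": "Success",
    "Data": [
        {"time": 1514764800, "open": 30.1, "high": 31.0, "low": 29.5, "close": 30.8},
        {"time": 1515024000, "open": 30.8, "high": 32.2, "low": 30.2, "close": 31.9},
    ],
}

# One row per dict in the list, one column per key
df = pd.DataFrame(payload["Data"])

# Unix seconds -> datetimes, used as the index
df["time"] = pd.to_datetime(df["time"], unit="s")
df = df.set_index("time")
print(df)
```

With the live request, the first step would be df = pd.DataFrame(response.json()["Data"]), under the same assumption about the key name.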
