Parse requests.get() output into a pandas dataframe - python

I am following a tutorial and am stuck at parsing the output of requests.get().
My goal is to connect to the API below to pull historical crypto-currency prices and put them into a pandas dataframe for further analysis.
[API: https://www.cryptocompare.com/api/#-api-data-histoday-]
Here's what I have.
import requests
response = requests.get("https://min-api.cryptocompare.com/data/histodayfsym=ETC&tsym=USD&limit=10&aggregate=3&e=CCCAGG")
print(response.text)
Now I want to turn the output into a dataframe...
pd.DataFrame.from_dict(response)
But I get...
PandasError: DataFrame constructor not properly called!

You can use the json module to convert the response text to a dict:
import requests
from json import loads
import pandas as pd

response = requests.get("https://min-api.cryptocompare.com/data/histoday?fsym=ETC&tsym=USD&limit=10&aggregate=3&e=CCCAGG")
dic = loads(response.text)   # parse the JSON body into a dict
print(type(dic))             # <class 'dict'>
pd.DataFrame.from_dict(dic)
However, as jonrsharpe noted, a much simpler way is to use the Response object's built-in json() method:
import requests
import pandas as pd

response = requests.get("https://min-api.cryptocompare.com/data/histoday?fsym=ETC&tsym=USD&limit=10&aggregate=3&e=CCCAGG")
print(type(response.json()))  # <class 'dict'>
pd.DataFrame.from_dict(response.json())
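Note that this endpoint wraps the actual candles in metadata (status fields plus a "Data" list), so the dataframe above will contain those wrapper keys as columns. Assuming the usual response shape of this API, the daily rows can be pulled out directly:
data = response.json()
df = pd.DataFrame(data["Data"])                    # one row per aggregated day
df["time"] = pd.to_datetime(df["time"], unit="s")  # epoch seconds -> timestamps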

Related

I want to fetch data from a website and put it in MySQL Workbench, but it's not working

First-time programmer here, please don't be harsh on me.
I want to fetch data from the URL and put it inside a MySQL Workbench database. The script appears to run without errors, but nothing lands in the database. What is wrong in the script?
# GET ALL WorldRecords from https://api.isuresults.eu/records
import requests
import pandas as pd
from pandas.io.json import json_normalize
from helper_db import make_db_connection

engine = make_db_connection

def get_isu_worldrecord_db(engine):
    URL = "https://api.isuresults.eu/records/?type=WR"
    df_final = pd.DataFrame()
    for i in range(1, 20):
        params = {'page': i}
        api = requests.get(url=URL, params=params)
        data = api.json()
        df = json_normalize(data, 'results')
        df_final = df_final.append(df, ignore_index=True, sort=False)
    df_final = df_final.drop(['laps'], axis=1)
    df_final.to_sql("Tester", con=engine, if_exists="replace", chunksize=1000)
    return
You define this function, but you never call it.
Add one more line at the end of the script:
get_isu_worldrecord_db(engine)
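Note also that engine = make_db_connection assigns the function itself rather than its return value; the call parentheses are missing. With both fixes (and assuming make_db_connection() returns a SQLAlchemy engine, which to_sql requires), the tail of the script becomes:
engine = make_db_connection()     # call the helper, don't just reference it
get_isu_worldrecord_db(engine)    # actually run the fetch-and-store job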

Accessing nested dictionary from a JSON, with variable headers

I am trying to use json_normalize to parse data from the yahoofinancials package. I seem to be running into an issue when trying to separate the columns out from the last object, a variable date. Each date, I believe, is a dictionary containing various balance sheet line items.
My code is:
import json
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
import yfinance as yf
from yahoofinancials import YahooFinancials
tickerinput = "AAPL"
ticker = yf.Ticker(tickerinput)
tickerfin = YahooFinancials(tickerinput)
balancesheet = tickerfin.get_financial_stmts('annual', 'balance')
# Flattening with json_normalize
balsheet = pd.json_normalize(balancesheet, record_path=['balanceSheetHistory', tickerinput])
I have also tried the code below, but I receive a KeyError despite the key being present in the original JSON output.
balsheet = pd.json_normalize(balancesheet, record_path=['balanceSheetHistory', tickerinput], meta=['2021-09-25', ['totalLiab','totalStockholderEquity','totalAssets','commonStock','otherCurrentAssets','retainedEarnings','otherLiab','treasuryStock','otherAssets','cash','totalCurrentLiabilities','shortLongTermDebt','otherStockholderEquity','propertyPlantEquipment','totalCurrentAssets','longTermInvestments','netTangibleAssets','shortTermInvestments','netReceivables','longTermDebt','inventory','accountsPayable']], errors='ignore')
The main issue is that I get back the data frame below:
(screenshot: returned dataframe from balsheet)
Sample output of the JSON file:
(screenshot: JSON output of the balancesheet variable)
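Since each list entry under the ticker is a single-key dict keyed by the statement date, one way to flatten it is to merge those entries into one date-keyed dict and build the frame directly. A sketch, assuming the output has the shape {'balanceSheetHistory': {ticker: [{date: {line_item: value, ...}}, ...]}}:
records = balancesheet['balanceSheetHistory'][tickerinput]
# Merge the one-key-per-entry dicts into a single {date: {item: value}} dict
rows = {date: items for entry in records for date, items in entry.items()}
balsheet = pd.DataFrame(rows)    # line items as the index, dates as columns
balsheet = balsheet.T            # transpose if you want one row per date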

Store RDF data into Triplestore via SPARQL endpoint using python

I am trying to save the data at the following URL as triples in a triple store for future querying. Here is my code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import re

url = 'http://gnafld.net/address/?per_page=10&page=7'
response = requests.get(url)
response.raise_for_status()

# Pull the first address ID (e.g. GAACT...) out of the listing page
results = re.findall('"Address ID: (GAACT[0-9]+)"', response.text)
address1 = results[0]
a = "http://gnafld.net/address/"
new_url = a + address1
r = requests.get(new_url).content
print(r)
After I run the code above, I get output like this:
(screenshot of the raw RDF response)
My question is how to insert this RDF data into a Fuseki server SPARQL endpoint. I tried code like this:
import rdflib
from rdflib.plugins.stores import sparqlstore

# The following SPARQL endpoint is provided by the GNAF website
endpoint = 'http://gnafld.net/sparql'
store = sparqlstore.SPARQLUpdateStore(endpoint)
gs = rdflib.ConjunctiveGraph(store)
gs.open((endpoint, endpoint))
for stmt in r:
    gs.add(stmt)
But it does not seem to work. How can I fix this problem? Thanks for your help!
The output you show in the image is RDF in triple format; it is just not pretty-printed.
To store the RDF data in an RDF store you can use RDFLib. Here is an example of how to do that.
If you use Jena Fuseki server you should be able to access it from python just as you access any other SPARQL endpoint from python.
You may want to see my answer to a related SO question as well.
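A minimal sketch of that flow, assuming a local Fuseki dataset named ds (so its query and update endpoints are /ds/query and /ds/update) and assuming GNAF serves Turtle when asked for it. The downloaded RDF is parsed into an in-memory graph first, then copied into the remote store:
import requests
import rdflib
from rdflib.plugins.stores.sparqlstore import SPARQLUpdateStore

# Parse the downloaded RDF into an in-memory graph. The Accept header
# and the 'turtle' format are assumptions to verify against the API.
resp = requests.get(new_url, headers={'Accept': 'text/turtle'})
local = rdflib.Graph()
local.parse(data=resp.text, format='turtle')

# Hypothetical endpoints for a local Fuseki dataset named "ds".
store = SPARQLUpdateStore()
store.open(('http://localhost:3030/ds/query',
            'http://localhost:3030/ds/update'))
remote = rdflib.Graph(store, identifier=rdflib.URIRef('urn:example:gnaf'))

for triple in local:      # copy each parsed triple into the remote store
    remote.add(triple)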

Invalid syntax querying JSON LOAD on Python

I'm trying to print out the data from the Meetup API, but it is giving me an error although the URL is correct.
import requests
import json
from json import loads
import pandas as pd
import matplotlib as plt
requests.get("https://api.meetup.com/2/groups?zip=eh1+1af&offset=0&city=Edinburgh&format=json&lon=-3.19000005722&category_id=34&photo-host=public&page=500&radius=25.0&fields=&lat=55.9500007629&order=id&desc=false&sig_id=243750775&sig=9072b77fb34f5b84a392da2505fd946c58e94fe5")
The error ("invalid syntax") is raised on this line:
print json.load(requests.get("https://api.meetup.com/2/groups?zip=eh1+1af&offset=0&city=Edinburgh&format=json&lon=-3.19000005722&category_id=34&photo-host=public&page=500&radius=25.0&fields=&lat=55.9500007629&order=id&desc=false&sig_id=243750775&sig=9072b77fb34f5b84a392da2505fd946c58e94fe5"))
Thank you
The SyntaxError itself comes from the Python 2 print statement: in Python 3, print is a function and needs parentheses. Beyond that, since you're using requests, you should be aware that the Response object has a json() method you can call to retrieve the decoded JSON from the HTTP response.
r = requests.get(url).json()
type(r)    # dict
You can load the JSON response r into a dataframe, if you have to.
df = pd.io.json.json_normalize(r['results'])
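On pandas 1.0 and later, json_normalize is exposed at the top level and the pd.io.json path is deprecated, so the equivalent call there is:
df = pd.json_normalize(r['results'])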

Python retrieving data from web HTTP 400: Bad Request Error (Too many Requests?)

I am using a python module (googlefinance) to retrieve stock information. In my code, I create a symbols list which then gets sent into a loop to collect the information for each symbol.
The symbols list contains about 3000 symbols, which is why I think I am getting this error. When I shorten the range of the loop (24 requests), it works fine. I have also tried using a time delay between requests, but no luck. How can I retrieve the information for all specified symbols without getting the HTTP 400 error?
from googlefinance import getQuotes
import pandas as pd
import pymysql
import time
import threading
import urllib.request
import urllib.error    # needed for the except clause below

def createSymbolList(csvFile):
    df = pd.read_csv(csvFile)
    saved_column = df['Symbol']
    return saved_column

def getSymbolInfo(symbolList):
    newList = []
    for i in range(24):    # shortened range that works; the full list triggers the 400
        newList.append(getQuotes(symbolList[i]))
    return newList

nyseList = createSymbolList("http://www.nasdaq.com/screening/companies-by-name.aspx?letter=0&exchange=nyse&render=download")
try:
    l = getSymbolInfo(nyseList)
    print(l)
    print(len(l))
except urllib.error.HTTPError as err:
    print(err)
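One common workaround for burst-related 400s is to batch the symbols and pause between batches. A sketch, assuming getQuotes accepts a list of symbols (as the package README suggests) and that the error is rate-related; batch_size and delay are guesses to tune:
import time
from googlefinance import getQuotes

def getSymbolInfoBatched(symbolList, batch_size=24, delay=2.0):
    quotes = []
    for start in range(0, len(symbolList), batch_size):
        batch = list(symbolList[start:start + batch_size])
        quotes.extend(getQuotes(batch))   # one request per batch, not per symbol
        time.sleep(delay)                 # back off between requests
    return quotes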
