I have the code below, where I am trying to get the data from https://www.quandl.com/data/TSE/documentation/metadata (the "Download detailed data" section).
import requests

for page_number in range(1, 5):
    link = 'https://www.quandl.com/api/v3/datasets.csv?database_code=TSE&per_page=100&sort_by=id&page=' + str(page_number)
    r = requests.get(link, stream=True).text
    print(r)
    # How to put the results in a dataframe?
However, I have trouble putting the results in a dataframe / saving it in a SQLite database. How should I be doing this?
You can use Pandas to read this data directly:
import pandas as pd

url = ("https://www.quandl.com/api/v3/datasets.csv?"
       "database_code=TSE&per_page=100&sort_by=id&page={0}")
dfs = [pd.read_csv(url.format(page_number)) for page_number in range(1, 5)]
To read from a requests response object you can use StringIO:
from io import StringIO
pd.read_csv(StringIO(r.text))
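The question also asks about saving to SQLite; that part can be done with DataFrame.to_sql. A minimal sketch using a small stand-in DataFrame (fetching the real Quandl pages requires an API key, so the data and table name here are illustrative):

```python
import sqlite3
import pandas as pd

# Stand-in for the concatenated Quandl pages; with network access this
# would be something like pd.concat(pd.read_csv(url.format(p)) for p in range(1, 5))
df = pd.DataFrame({"id": [1, 2], "dataset_code": ["TSE/0001", "TSE/0002"]})

con = sqlite3.connect("datasets.db")
# Write the DataFrame to a table, replacing it if it already exists
df.to_sql("tse_datasets", con, if_exists="replace", index=False)
# Read it back to confirm the round trip
out = pd.read_sql("SELECT * FROM tse_datasets", con)
con.close()
print(out)
```

`if_exists="append"` instead of `"replace"` would let you add each page to the same table inside the loop.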
I have been trying to fetch JSON data from an API using Python so that I can transfer that data to an sqlite3 database. The issue is that the data is unbalanced (the arrays have different lengths). My end goal is to transfer this JSON data to a .db file in sqlite3.
Here is what I did:
import pandas as pd
url = "https://baseballsavant.mlb.com/gf?game_pk=635886"
df = pd.read_json(url)
print(df)
This is the error I am getting:
raise ValueError("All arrays must be of the same length")
ValueError: All arrays must be of the same length
It's not obvious what you want your final DataFrame to look like, but passing orient='index' avoids the problem in this case.
import pandas as pd
url = "https://baseballsavant.mlb.com/gf?game_pk=635886"
df = pd.read_json(url, orient='index')
print(df)
You could also request the data with, for example, the requests module and prepare it before loading it into a DataFrame:
import requests
import pandas as pd
url = "https://baseballsavant.mlb.com/gf?game_pk=635886"
response = requests.get(url)
data = response.json()
"""
Do data transformations here
"""
df = pd.DataFrame.from_dict(data)
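One such preparation step is flattening: pandas.json_normalize turns nested dicts into dotted column names, which sidesteps the "All arrays must be of the same length" error. A sketch with a small made-up payload standing in for response.json() (the real scoreboard JSON has its own, much larger structure):

```python
import pandas as pd

# Hypothetical nested payload standing in for response.json()
data = {
    "game_pk": 635886,
    "teams": {"home": {"name": "A", "runs": 3},
              "away": {"name": "B", "runs": 5}},
}

# json_normalize flattens nested dicts into one row with dotted
# column names such as "teams.home.runs"
df = pd.json_normalize(data)
print(df.columns.tolist())
```

From there the flattened frame can go straight into sqlite3 via df.to_sql.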
I want to create a table with the information available on this website. I want the table to have 3 columns: 0 series/date, 1 title and 2 links. I already managed to get the first two columns but I don't know how to get the link for each entry.
import pandas as pd
import requests
url = "http://legislaturautuado.com/pgs/resolutions.php?st=5&f=2016"
r = requests.get(url)
df_list = pd.read_html(r.text)
df = df_list[0]
df.head()
Will it be possible to get what I want by only using pandas?
As far as I know, it's not possible with pandas only. It can be done with BeautifulSoup, though:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = "http://legislaturautuado.com/pgs/resolutions.php?st=5&f=2016"
r = requests.get(url)
html_table = BeautifulSoup(r.text, 'html.parser').find('table')
r.close()
df = pd.read_html(str(html_table), header=0)[0]
df['Link'] = [link.get('href') for link in html_table.find_all('a')]
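One caveat: if some table rows have no <a> tag (or more than one), the list comprehension's href list goes out of sync with the row count. Pulling the link per <tr> keeps them aligned. A sketch against a small inline table, since the real page's markup may differ in detail:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Small inline table standing in for the resolutions page's markup
html = """<table>
<tr><th>Date</th><th>Title</th></tr>
<tr><td>2016-01-01</td><td><a href="/r/1.pdf">R-1</a></td></tr>
<tr><td>2016-02-01</td><td>No link here</td></tr>
</table>"""

rows = BeautifulSoup(html, "html.parser").find("table").find_all("tr")[1:]
records = []
for tr in rows:
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    a = tr.find("a")  # None when a row has no anchor
    records.append(cells + [a.get("href") if a else None])

df = pd.DataFrame(records, columns=["Date", "Title", "Link"])
print(df)
```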
I wrote some code to scrape data off of a web page and put it in CSV format, but the end result isn't what I want. The figures/values are handled properly, but the column headings aren't. Here's what I mean:
Above is how it looks in the spreadsheet.
And this is how it looks in a text editor. I don't understand what's wrong because when I open the array in Spyder it looks alright:
Here's my code:
from numpy import array, insert, savetxt
from bs4 import BeautifulSoup as bs
from requests import get

def obtain_links():
    data = {"inflation": get("https://www.cbn.gov.ng/rates/inflrates.asp").text,
            "crude_oil": get("https://www.cbn.gov.ng/rates/DailyCrude.asp").text,
            "for_res": get("https://www.cbn.gov.ng/IntOps/Reserve.asp").text,
            "exch": get("https://www.cbn.gov.ng/rates/").text,
            "m_market": get("https://www.cbn.gov.ng/rates/mnymktind.asp").text}
    return data

data = obtain_links()

def parse_inf(data=data):
    html = bs(data["inflation"], "lxml")
    year = html.find("td", width="62%").find("h2").text
    months = html.find("div", id="ContentTextinner").find_all("th")
    inf_months = [i.text for i in months]
    inf_months = [f"{i} {year}" for i in inf_months]
    inf_months[0] = inf_months[0][:5]
    inf_months = array(inf_months).transpose()
    measure = html.find("div", id="ContentTextinner").find_all("td", align="left")
    measure = [i.text for i in measure][:-3]
    values = [[i.text[:5] for i in html.find("div", id="ContentTextinner").find_all("td", class_="style2", style="width: 15%")],
              [i.text[:5] for i in html.find("div", id="ContentTextinner").find_all("td", class_="style2", style="width: 16%")],
              [i.text[:5] for i in html.find("div", id="ContentTextinner").find_all("td", class_="style2", width="20%")]]
    values.insert(0, measure)
    values = array(values)
    inf_data = insert(values, 0, inf_months, axis=1)
    return inf_data

inf_data = parse_inf()
savetxt("/home/user/Documents/Scraped Data/Inflation Data.csv", inf_data, fmt="%s", delimiter=",")
Any ideas chaps and chappettes?
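One possible cause of headings that look fine in Spyder yet garbled in a text editor is invisible non-ASCII characters (e.g. \xa0 non-breaking spaces) scraped along with the text. A hedged sketch of normalizing such strings before handing them to savetxt, using made-up header values since the actual bytes aren't shown above:

```python
# Hypothetical scraped header text containing a non-breaking space
# and a zero-width space, which render fine on screen but show up
# as stray bytes in a plain-text editor
headers = ["Month\xa02021", "Core\u200bInflation"]

# Replace NBSP with a regular space and drop zero-width characters
clean = [h.replace("\xa0", " ").replace("\u200b", "") for h in headers]
print(clean)
```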
I have a table that spans across many pages. I'm able to pull the info from a designated page and pull it into a CSV table. My goal now is to have this iterate through all the pages and add it to the bottom of the previous page's info. Here is the code so far that works on a single page:
import requests
import pandas as pd
url = 'https://www.mineralanswers.com/oklahoma/producers?page=1'
html = requests.get(url).content
df_list = pd.read_html(html)
df = df_list[-1]
print(df)
df.to_csv('my data.csv')
The page URL is setup in the "...producers?page=1, ...producers?page=2 ...producers?page=3" format so I feel like it's likely possible using a loop, I just am having trouble amending the data instead of overwriting it.
Here is corrected example code to fetch 3 pages and append them to one DataFrame.
import requests
import pandas as pd

dfs = []
for page in range(1, 4):
    url = 'https://www.mineralanswers.com/oklahoma/producers?page=' + str(page)
    html = requests.get(url).content
    df_list = pd.read_html(html)
    dfs.append(df_list[-1])

# DataFrame.append was removed in pandas 2.0; pd.concat is the supported way
df = pd.concat(dfs, ignore_index=True)
df.to_csv('my data.csv')
I'm trying to make an API call and save the result as a DataFrame.
The problem is that I need the data from the 'result' column, and I didn't succeed in extracting it.
I'm basically just trying to save the API call as a CSV file in order to work with it.
P.S when I do this with a "JSON to CSV converter" from the web it does it as I wish. (example: https://konklone.io/json/)
import requests
import pandas as pd
import json
res = requests.get("http://api.etherscan.io/api?module=account&action=txlist&address=0xddbd2b932c763ba5b1b7ae3b362eac3e8d40121a&startblock=0&endblock=99999999&sort=asc&apikey=YourApiKeyToken")
j = res.json()
j
df = pd.DataFrame(j)
df.head()
Try this
import requests
import pandas as pd
import json
res = requests.get("http://api.etherscan.io/api?module=account&action=txlist&address=0xddbd2b932c763ba5b1b7ae3b362eac3e8d40121a&startblock=0&endblock=99999999&sort=asc&apikey=YourApiKeyToken")
j = res.json()
# print(j)
filename ="temp.csv"
df = pd.DataFrame(j['result'])
print(df.head())
df.to_csv(filename)
Looks like you need:
df = pd.DataFrame(j["result"])