I'm trying to extract currency pairs from the Poloniex API using Python pandas.
I believe the data returned is all just a single column name:
Columns: [{"BTC_BCN":{"BTC":"479.74697466", "BCN":"1087153595.32266165"}, "BTC_BELA":{"BTC":"32.92293515", "BELA":"1807337.13247948"}, "BTC_BLK":{"BTC":"25.70374054", "BLK":"606717.86348734"}, "BTC_BTCD":{"BTC":"24.32220571", "BTCD":"1264.02352237"}, "BTC_BTM":{"BTC":"11.57816905", "BTM":"80673.47934437"}, "BTC_BTS":{"BTC":"1102.88787610", "BTS":"30426626.64558044"}
The result I want: BTC_BCN, BTC_BELA, BTC_BLK, etc...
But I'm not sure whether there is a simple way to get this without string parsing, since they all appear to just be column names.
Code:
from bs4 import BeautifulSoup
import csv
import urllib2
import pandas as pd
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
sock = urllib2.urlopen('https://poloniex.com/public?command=return24hVolume')
link = sock.read()
soup = BeautifulSoup(link, 'lxml')
csv_data = StringIO(soup.text)
df = pd.read_csv(csv_data, delimiter=' *, *', engine='python')
df2 = df.iloc[1:2, 0:20]
You don't need BeautifulSoup here at all. The response is JSON - parse it with pd.read_json() directly:
df = pd.read_json('https://poloniex.com/public?command=return24hVolume')
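If all you need are the pair names, they are simply the DataFrame's column labels. A small offline sketch using a truncated copy of the sample JSON from the question (two pairs only):

```python
import pandas as pd
from io import StringIO

# a small sample of the 24h-volume JSON (two pairs, for illustration)
sample = '''{"BTC_BCN": {"BTC": "479.74697466", "BCN": "1087153595.32266165"},
             "BTC_BELA": {"BTC": "32.92293515", "BELA": "1807337.13247948"}}'''

df = pd.read_json(StringIO(sample))
pairs = df.columns.tolist()  # the currency-pair names are the column labels
print(pairs)
```

Against the live endpoint, `pd.read_json(url).columns.tolist()` gives the same result for the full list of pairs.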
My Python script fetches data from the website 'http://api.sl.se/api2/deviations.json?key=c7606e4606f642a380f7fdd75d683448' into a text file.
Now my aim is to filter out: 'Header', 'Details', 'FromDateTime', 'UpToDateTime' and 'Updated'.
I have tried BeautifulSoup with a text-specific search, but it's not getting there. The code below shows my attempt. Any help would be appreciated :) Sorry if I missed something obvious.
'''
import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd

# Collect and parse first page
page = requests.get('http://api.sl.se/api2/deviations.json?key=c7606e4606f642a380f7fdd75d683448')
soup = BeautifulSoup(page.text, 'html.parser')
#print(soup)

for script in soup(["Header", "Details", "Updated", "UpToDateTime", "FromDateTime"]):
    script.extract()

# get text
text = soup.get_text()
# break into lines and remove leading and trailing space on each
lines = (line.strip() for line in text.splitlines())
# break multi-headlines into a line each
chunks = (phrase.strip() for line in lines for phrase in line.split(" "))
# drop blank lines
text = '\n'.join(chunk for chunk in chunks if chunk)

f1 = open("data.txt", "r")
resultFile = open("out.csv", "w", newline="")
wr = csv.writer(resultFile, quotechar=',')
'''
I expect a csv with the columns Header, Details, Updated, UpToDateTime and FromDateTime.
You are going about it the wrong way. You don't need BeautifulSoup for this task - your API returns data as JSON, and BeautifulSoup is best suited to HTML. For your purpose you can use the pandas and json libraries.
Pandas can read directly from a web resource as well, but since you only want part of the response, you need both libraries.
Here is a snippet which you can use :
import pandas as pd
import requests
import json
page = requests.get('http://api.sl.se/api2/deviations.json?key=c7606e4606f642a380f7fdd75d683448')
data = json.loads(page.text)
df = pd.DataFrame(data["ResponseData"])
df.to_csv("file path")
Change the file path and you get the whole payload inside a csv.
If you want to remove columns or do any other manipulation of the data, you can do that with the pandas DataFrame as well. It is a very powerful library and well worth learning.
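For instance, to keep only the fields the question asks for, select them from the DataFrame before writing. A sketch with made-up sample records, since I'm assuming here that `data["ResponseData"]` is a list of dicts with these keys:

```python
import pandas as pd

# hypothetical sample mirroring the shape of data["ResponseData"]
response_data = [
    {"Header": "Delay", "Details": "Signal fault", "Scope": "Bus 4",
     "FromDateTime": "2020-01-01T10:00", "UpToDateTime": "2020-01-01T12:00",
     "Updated": "2020-01-01T09:55"},
]

df = pd.DataFrame(response_data)
wanted = ["Header", "Details", "FromDateTime", "UpToDateTime", "Updated"]
csv_text = df[wanted].to_csv(index=False)  # only the requested columns
print(csv_text)
```

Passing the same `wanted` list to `df[wanted].to_csv("out.csv", index=False)` writes it to disk instead.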
I found library that allows me to get data from yahoo finance very efficiently. It's a wonderful library.
The problem is, I can't save the data into a csv file.
I've tried converting the data to a pandas DataFrame, but I think I'm doing it incorrectly and I'm getting a bunch of NaNs.
I tried using Numpy to save directly into a csv file and that's not working either.
import yfinance as yf
import csv
import numpy as np
urls = [
    'voo',
    'msft'
]
for url in urls:
    tickerTag = yf.Ticker(url)
    print(tickerTag.actions)
    np.savetxt('DivGrabberTest.csv', tickerTag.actions, delimiter='|')
I can print the data on console and it's fine. Please help me save it into a csv. Thank you!
If you want to store the ticker results for each url in different csv files you can do:
for url in urls:
    tickerTag = yf.Ticker(url)
    tickerTag.actions.to_csv("tickertag{}.csv".format(url))
If you want them all to be in the same csv file you can do:
import pandas as pd
tickerlist = [yf.Ticker(url).actions for url in urls]
pd.concat(tickerlist).to_csv("tickersconcat.csv")
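When concatenating, pd.concat also accepts a keys argument so you can still tell which ticker each row came from. A sketch with stand-in frames (the real ones would come from yf.Ticker(url).actions, and the column values here are made up):

```python
import pandas as pd

# stand-ins for the per-ticker .actions DataFrames
voo = pd.DataFrame({"Dividends": [1.2, 1.3]})
msft = pd.DataFrame({"Dividends": [0.5, 0.6]})

combined = pd.concat([voo, msft], keys=["voo", "msft"])
# the first index level now records the source ticker
print(combined.loc["msft"])
```

`combined.loc["msft"]` then recovers just the rows for that ticker.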
Using requests I am creating an object which is in .csv format. How can I then write that object to a DataFrame with pandas?
To get the requests object in text format:
import requests
import pandas as pd
url = r'http://test.url'
r = requests.get(url)
r.text  # this will return the data as text in csv format
I tried (doesn't work):
pd.read_csv(r.text)
pd.DataFrame.from_csv(r.text)
Try this
import requests
import pandas as pd
import io
urlData = requests.get(url).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
I think you can use read_csv with the url directly:
pd.read_csv(url)
From the docs, filepath_or_buffer can be a str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO).
The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv.
If passing the URL directly doesn't work, fetch the content with requests and wrap the response text in StringIO:
import pandas as pd
import io
import requests
url = r'http://...'
r = requests.get(url)
df = pd.read_csv(io.StringIO(r.text))
Using "read_csv with url" worked:
import pandas as pd
url = 'https://arte.folha.uol.com.br/ciencia/2020/coronavirus/csv/mundo/dados-bra.csv'
corona_bra = pd.read_csv(url)
print(corona_bra.head())
If the url needs no authentication, you can use read_csv(url) directly.
If it does need authentication, use requests to fetch the page, check that the result really is CSV, and then load it with pandas.
You can also work with the csv module directly:
import csv
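A hedged sketch of that csv-module route, using a hard-coded sample string standing in for the text of a response (with real data it would be requests.get(url).text):

```python
import csv
import io

# stand-in for the body of an HTTP response carrying CSV data
text = "date,cases\n2020-03-01,10\n2020-03-02,25\n"

# csv.reader accepts any file-like object; StringIO wraps the text
rows = list(csv.reader(io.StringIO(text)))
print(rows[0])  # header row
```

This gives plain lists of strings rather than a typed DataFrame, which is sometimes all you need.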
I am looking to gather all the data from the penultimate worksheet in this Excel file, along with all the data from "Maturity Years" of 5.5 onward in the last worksheet. The code I have below currently grabs data solely from the last worksheet, and I was wondering what alterations are necessary.
import urllib2
import pandas as pd
import os
import xlrd
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
socket = urllib2.urlopen(url)
xd = pd.ExcelFile(socket)
df = xd.parse(xd.sheet_names[-1], header=None)
print df
I was thinking of using glob but I haven't seen any application of it with an Online Excel file.
Edit: I think the following allows me to combine two worksheets of data into a single DataFrame. However, if there is a better answer, please feel free to show it.
import urllib2
import pandas as pd
import os
import xlrd
url = 'http://www.bankofengland.co.uk/statistics/Documents/yieldcurve/uknom05_mdaily.xls'
socket = urllib2.urlopen(url)
xd = pd.ExcelFile(socket)
df1 = xd.parse(xd.sheet_names[-1], header=None)
df2 = xd.parse(xd.sheet_names[-2], header=None)
bigdata = df1.append(df2, ignore_index=True)
print bigdata
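As an aside, DataFrame.append was deprecated in later pandas releases; the same combination can be written with pd.concat. A sketch with stand-in frames, since the download itself needs network access (the column names here are made up, not taken from the Bank of England file):

```python
import pandas as pd

# stand-ins for the two parsed worksheets
df1 = pd.DataFrame({"maturity": [5.0, 5.5], "yield": [1.1, 1.2]})
df2 = pd.DataFrame({"maturity": [6.0, 6.5], "yield": [1.3, 1.4]})

# ignore_index=True renumbers the rows 0..n-1, as append did
bigdata = pd.concat([df1, df2], ignore_index=True)
print(bigdata)
```

With a real workbook, the two frames would come from xd.parse(xd.sheet_names[-1]) and xd.parse(xd.sheet_names[-2]) as in the edit above.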
How can I read the data of, for example, cell A1?
When I use etree.parse() I get an error, because I don't have an xml file.
It's a zip file:
import zipfile
from lxml import etree
z = zipfile.ZipFile('mydocument.ods')  # an .ods file is a zip archive
data = z.read('content.xml')  # the spreadsheet content lives in here
data = etree.XML(data)
etree.dump(data)
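content.xml is namespace-heavy, so reaching a particular cell means searching with namespace-qualified tag names: the first table-cell of the first table-row is cell A1. A sketch on a minimal hand-made fragment (heavily simplified from real ODS markup; the stdlib ElementTree shown here handles qualified lookups the same way lxml's etree does):

```python
import xml.etree.ElementTree as etree

TABLE_NS = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"

# stripped-down stand-in for a content.xml table
content = (
    '<table:table xmlns:table="%s" xmlns:text="%s">'
    '<table:table-row>'
    '<table:table-cell><text:p>hello</text:p></table:table-cell>'
    '</table:table-row>'
    '</table:table>' % (TABLE_NS, TEXT_NS)
)

root = etree.fromstring(content)
first_row = root.find("{%s}table-row" % TABLE_NS)
a1 = first_row.find("{%s}table-cell" % TABLE_NS)  # first cell of first row = A1
print(a1.find("{%s}p" % TEXT_NS).text)
```

In a real file the table sits deeper inside office:body/office:spreadsheet, so you would walk or findall your way down through those wrappers first.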