Using requests I am fetching an object that is in .csv format. How can I then load that object into a pandas DataFrame?
To get the requests object in text format:
import requests
import pandas as pd
url = r'http://test.url'
r = requests.get(url)
r.text  # this returns the body as text in CSV format
I tried (doesn't work):
pd.read_csv(r.text)
pd.DataFrame.from_csv(r.text)
Try this
import requests
import pandas as pd
import io
urlData = requests.get(url).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
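io.StringIO wraps the decoded string in an in-memory file-like object, which is what read_csv expects when it is not given a path or URL.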
I think you can use read_csv with url:
pd.read_csv(url)
filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)
The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
import pandas as pd
import io
import requests
url = r'http://...'
r = requests.get(url)
df = pd.read_csv(io.StringIO(r.text))
Note that passing the Response object itself, as in io.StringIO(r), fails with TypeError: initial_value must be str or None; StringIO needs the decoded body, which is what r.text provides.
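r.text decodes the body using the encoding requests detects from the response headers; if that detection is wrong, decode r.content yourself, as in the previous answer.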
Using "read_csv with url" worked:
import pandas as pd
url = 'https://arte.folha.uol.com.br/ciencia/2020/coronavirus/csv/mundo/dados-bra.csv'
corona_bra = pd.read_csv(url)
print(corona_bra.head())
If the URL requires no authentication, you can pass it directly to read_csv(url). If it does require authentication, use requests to fetch the resource, check that the response body really is CSV, and then hand it to pandas, as sketched below.
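A minimal sketch of the authenticated path, assuming HTTP basic auth (the endpoint and credentials are placeholders):
import io
import requests
import pandas as pd
# Placeholder endpoint and credentials for illustration
url = 'https://example.com/protected/data.csv'
r = requests.get(url, auth=('user', 'secret'))  # HTTP basic auth
r.raise_for_status()  # fail loudly if authentication was rejected
df = pd.read_csv(io.StringIO(r.text))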
Related
Why is pandas not able to read this CSV file, returning 'UnicodeEncodeError'? I tried a lot of solutions from Stack Overflow (local download, different encodings, changing the engine, ...), but it still doesn't work. How can I fix it?
import pandas as pd
url = 'http://é.com'
pd.read_csv(url,encoding='utf-8')
TL;DR
Your URL contains a non-ASCII character, as the error complains.
Just change:
url = 'http://é.com'
For:
url = 'http://%C3%A9.com'
And the problem is fixed.
Solutions
Automatic URL escaping
Reading the error in depth shows that after executing the request to fetch the resource behind the URL, the read_csv function expects the URL to be ASCII-encoded, which seems not to be the case for this specific resource.
This call that is made by read_csv fails miserably:
import urllib.request
urllib.request.urlopen(url)
The problem is due to the accent in é, which must be escaped to prevent urlopen from failing. Below is a clean way to enforce this requirement:
import urllib.parse
# Percent-encode the path component so the final URL is pure ASCII
result = urllib.parse.urlparse(url)
replaced = result._replace(path=urllib.parse.quote(result.path))
url = urllib.parse.urlunparse(replaced)
pd.read_csv(url)
Handling dataflow by yourself
Alternatively, you can bypass this limitation by handling the complete flow yourself. The following snippet does the trick:
import io
import gzip
import pandas as pd
import requests
url = 'http://é.com'
response = requests.get(url)
file = io.BytesIO(response.content)
with gzip.open(file, 'rb') as handler:
    df = pd.read_csv(handler)
The key is to fetch the HTTP resource, decompress it, and wrap the content in a file-like object, because read_csv does not accept a raw CSV string directly.
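Note that requests already decompresses gzip transfer encoding (Content-Encoding: gzip) transparently; the manual gzip.open step is only needed when the resource itself is a gzip file, such as a .csv.gz download.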
How can I read an Excel file from a URL into a DataFrame?
import requests
request_url = 'https://pishtazfund.com/Download/DownloadNavChartList?exportType=Excel&fromDate=5/9/2008&toDate=2/22/2022&basketId=0'
response = requests.get(request_url, headers={'Accept': 'text/html'})
I cannot convert the response into a DataFrame; any idea or solution is appreciated.
You can use pandas' read_csv():
import pandas as pd
df = pd.read_csv('https://pishtazfund.com/Download/DownloadNavChartList?exportType=Excel&fromDate=5/9/2008&toDate=2/22/2022&basketId=0')
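Despite the exportType=Excel parameter, this endpoint evidently returns CSV, which is why read_csv works. If a URL serves a real Excel workbook, pd.read_excel accepts a URL directly as well (reading .xlsx requires openpyxl); a hypothetical example:
import pandas as pd
# Hypothetical .xlsx endpoint; pd.read_excel fetches URLs directly
df = pd.read_excel('https://example.com/report.xlsx')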
I want to download to a local directory. This code works for CSV but not for xlsx: it writes a file, but the file cannot be opened in Excel.
Any help will be appreciated.
url = 'https://some_url'
resp = requests.get(url)
with open('some_filename.xlsx', 'wb') as f:
    f.write(resp.content)
You could create a DataFrame from the response data and then use its to_excel() method to obtain the xlsx file. This is a tested solution, and it worked for me.
import requests
import pandas as pd
import io
url = 'https://www.google.com'  # as an example
urlData = requests.get(url).content  # get the raw bytes from the URL
dataframe = pd.read_csv(io.StringIO(urlData.decode('latin-1')))
filename="data.xlsx"
dataframe.to_excel(filename)
In pandas you could just do:
import pandas as pd
url = 'https://some_url'
df = pd.read_csv(url)
I want to download the data from https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData into a DataFrame.
I have tried the scripts below, but could not succeed.
import requests, io
import pandas as pd
URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'
#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
print(len(rawData))
Error: Python IDLE Got Stuck
#2
r = requests.get(URL)
urlData = pd.read_csv(io.StringIO(r))
print(len(urlData))
Error:
urlData = pd.read_csv(io.StringIO(r))
TypeError: initial_value must be str or None, not Response
#3
urlData = pd.read_csv(URL, header=None)
print(len(urlData))
I got this working with
import requests, io
import pandas as pd
URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'
#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')), sep="\t")
print(rawData.head())
print(rawData.info())
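The sep="\t" argument is the key: this BLS file is tab-delimited, so with read_csv's default comma separator each row ends up lumped into a single column.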
The simplest way is to use urllib2 (Python 2; on Python 3 the equivalent lives in urllib.request):
import urllib2
url_name = 'http://abc.pdf'
response = urllib2.urlopen(url_name)
file = open(url_name.split('//')[1], 'wb')  # binary mode, since the content is not text
file.write(response.read())
file.close()
I tried to download the data through the URL, and it does take a very long time. I recommend downloading it with wget first and then processing it locally. The script itself seems fine.
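A sketch of that download-then-parse approach in pure Python, in case wget is unavailable (the local filename is arbitrary):
import urllib.request
import pandas as pd
URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'
local_path = 'ln.data.1.AllData'  # arbitrary local filename
# Download once to disk; later runs can skip straight to the parse
urllib.request.urlretrieve(URL, local_path)
df = pd.read_csv(local_path, sep='\t')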
I'm trying to extract currency pairs from the Poloniex API using Python pandas.
I believe the data returned is all just a single column name:
Columns: [{"BTC_BCN":{"BTC":"479.74697466", "BCN":"1087153595.32266165"}, "BTC_BELA":{"BTC":"32.92293515", "BELA":"1807337.13247948"}, "BTC_BLK":{"BTC":"25.70374054", "BLK":"606717.86348734"}, "BTC_BTCD":{"BTC":"24.32220571", "BTCD":"1264.02352237"}, "BTC_BTM":{"BTC":"11.57816905", "BTM":"80673.47934437"}, "BTC_BTS":{"BTC":"1102.88787610", "BTS":"30426626.64558044"}
The result I want: BTC_BCN, BTC_BELA, BTC_BLK, etc...
But I'm not really sure whether there is a simple way to get these without string parsing, since they all appear to just be column names.
Code:
from bs4 import BeautifulSoup
import csv
import urllib2
import pandas as pd
try:
    from StringIO import StringIO  # Python 2
except ImportError:
    from io import StringIO  # Python 3
sock= urllib2.urlopen('https://poloniex.com/public?command=return24hVolume')
link=sock.read()
soup = BeautifulSoup(link,'lxml')
csv_data = StringIO(soup.text)
df=pd.read_csv(csv_data,delimiter=' *, *',engine='python')
df2=df.iloc[1:2,0:20]
You don't need BeautifulSoup here at all. The content of the page is JSON; parse it with pd.read_json() directly:
df = pd.read_json('https://poloniex.com/public?command=return24hVolume')
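To get just the pair names the question asks for (BTC_BCN, BTC_BELA, ...), the column labels of the resulting DataFrame are enough. A small sketch, assuming the API still returns one top-level JSON key per pair plus summary keys such as totalBTC:
import pandas as pd
df = pd.read_json('https://poloniex.com/public?command=return24hVolume')
# Keep only the currency-pair columns; summary keys contain no underscore
pairs = [col for col in df.columns if '_' in col]
print(pairs)  # e.g. ['BTC_BCN', 'BTC_BELA', 'BTC_BLK', ...]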