Convert text data from requests object to dataframe with pandas - python

Using requests I am creating an object which is in .csv format. How can I then write that object to a DataFrame with pandas?
To get the requests object in text format:
import requests
import pandas as pd
url = r'http://test.url'
r = requests.get(url)
r.text #this will return the data as text in csv format
I tried (doesn't work):
pd.read_csv(r.text)
pd.DataFrame.from_csv(r.text)

Try this:
import requests
import pandas as pd
import io
url = 'http://test.url'  # the URL from the question
urlData = requests.get(url).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
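The decode-and-wrap pattern above can be checked without a network call; here the response bytes are simulated (a real run would use requests.get(url).content):

```python
import io
import pandas as pd

# Simulated response body -- a real run would use requests.get(url).content
raw = b"name,score\nalice,1\nbob,2\n"

# Decode the bytes, wrap them in a file-like object, and parse as CSV
df = pd.read_csv(io.StringIO(raw.decode("utf-8")))
print(df.shape)  # (2, 2)
```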

I think you can use read_csv with url:
pd.read_csv(url)
filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)
The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
import pandas as pd
import io
import requests
url = r'http://...'
r = requests.get(url)
df = pd.read_csv(io.StringIO(r.text))
Note that StringIO must be given the decoded text (r.text); passing the Response object r itself raises TypeError: initial_value must be str or None.

Using "read_csv with url" worked:
import pandas as pd
url = 'https://arte.folha.uol.com.br/ciencia/2020/coronavirus/csv/mundo/dados-bra.csv'
corona_bra = pd.read_csv(url)
print(corona_bra.head())

If the URL requires no authentication, you can pass it directly to read_csv(url).
If it does require authentication, use requests to fetch the resource, verify that the response body really is CSV, and then load it with pandas.
Alternatively, if you don't need a DataFrame, the standard library's csv module can parse the text directly (import csv).
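A sketch of the authenticated path; the URL and credentials would be placeholders, and fetch_csv/parse_csv are hypothetical helpers, not a library API:

```python
import io
import pandas as pd

def parse_csv(text):
    # read_csv wants a file-like object (or a URL/path), not a raw CSV string
    return pd.read_csv(io.StringIO(text))

def fetch_csv(url, user=None, password=None):
    """Fetch a URL, optionally with HTTP basic auth, and parse the body as CSV."""
    import requests  # only needed for the network path
    resp = requests.get(url, auth=(user, password) if user else None)
    resp.raise_for_status()  # surface a 401/403 instead of parsing an error page
    return parse_csv(resp.text)

# The parsing half can be exercised offline:
df = parse_csv("pair,volume\nBTC_BCN,479.74\n")
print(df.columns.tolist())  # ['pair', 'volume']
```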


Unicode error when reading a csv file with pandas

Why is pandas unable to read this CSV file, returning a UnicodeEncodeError? I tried a lot of solutions from Stack Overflow (local download, different encodings, changing the engine...), but it still doesn't work. How can I fix it?
import pandas as pd
url = 'http://é.com'
pd.read_csv(url,encoding='utf-8')
TL;DR
Your URL contains a non-ASCII character, as the error says.
Just change:
url = 'http://é.com'
For:
url = 'http://%C3%A9.com'
And the problem is fixed.
Solutions
Automatic URL escaping
Reading the error in depth shows that, after executing the request to fetch the resource behind the URL, the read_csv function expects the URL to be ASCII-encoded, which is not the case for this specific resource.
This call, made internally by read_csv, fails:
import urllib.request
urllib.request.urlopen(url)
The problem is due to the accent in é, which must be escaped to prevent urlopen from failing. Below is a clean way to enforce this requirement:
import urllib.parse
result = urllib.parse.urlparse(url)
replaced = result._replace(path=urllib.parse.quote(result.path))
url = urllib.parse.urlunparse(replaced)
pd.read_csv(url)
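quote() only touches the characters that need escaping, so for an accented path segment the transformation looks like this (the path is illustrative):

```python
import urllib.parse

# 'é' is two bytes in UTF-8 (0xC3 0xA9), hence the two escape sequences
path = urllib.parse.quote("/data-é.csv")
print(path)  # /data-%C3%A9.csv
```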
Handling dataflow by yourself
Alternatively, you can bypass this limitation by handling the complete flow yourself. The following snippet does the trick:
import io
import gzip
import pandas as pd
import requests
url = 'http://é.com'
response = requests.get(url)
file = io.BytesIO(response.content)
with gzip.open(file, 'rb') as handler:
    df = pd.read_csv(handler)
The key is to fetch the HTTP resource, decompress it (this particular resource is gzip-compressed), and wrap the content as a file-like object, because read_csv does not read raw CSV strings directly.
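The decompress-and-wrap flow can be verified offline by compressing a small CSV in memory:

```python
import gzip
import io
import pandas as pd

# Simulate a gzip-compressed CSV body like the one the server returns
body = gzip.compress(b"a,b\n1,2\n")

# Wrap the bytes as a file, decompress, and hand the stream to pandas
with gzip.open(io.BytesIO(body), "rb") as handler:
    df = pd.read_csv(handler)
print(df.shape)  # (1, 2)
```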

Problem in reading Excel file form url into a dataframe

How can I read excel file from a url into a dataframe?
import requests
request_url = 'https://pishtazfund.com/Download/DownloadNavChartList?exportType=Excel&fromDate=5/9/2008&toDate=2/22/2022&basketId=0'
response = requests.get(request_url, headers={'Accept': 'text/html'})
I cannot convert the response into a dataframe; any idea or solution is appreciated.
You can use pandas' read_csv() (despite the exportType=Excel parameter, the endpoint returns CSV):
import pandas as pd
df = pd.read_csv('https://pishtazfund.com/Download/DownloadNavChartList?exportType=Excel&fromDate=5/9/2008&toDate=2/22/2022&basketId=0')
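If the endpoint ever returned a real .xlsx body instead of CSV, read_csv would fail; a hypothetical helper (load_tabular is not a pandas function) can pick a reader by sniffing the bytes, since .xlsx files are zip archives starting with b'PK':

```python
import io
import pandas as pd

def load_tabular(content: bytes) -> pd.DataFrame:
    """Pick a pandas reader based on what the server actually sent."""
    if content[:2] == b"PK":  # zip magic bytes -> treat as .xlsx
        return pd.read_excel(io.BytesIO(content))  # needs openpyxl installed
    return pd.read_csv(io.StringIO(content.decode("utf-8")))

# The CSV branch can be exercised without a real download:
df = load_tabular(b"date,nav\n2008-05-09,1000\n")
print(df.columns.tolist())  # ['date', 'nav']
```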

Download xlsx file with Python

Want to download to local directory. This code works for csv but not xlsx. It writes a file but cannot be opened as Excel.
Any help will be appreciated.
url = 'https://some_url'
resp = requests.get(url)
open('some_filename.xlsx', 'wb').write(resp.content)
You could create a dataframe from the resp data and then use DataFrame.to_excel() to obtain the xlsx file. This is a tested solution, and it worked for me.
import requests
import pandas as pd
import io
url='https://www.google.com' #as an example
urlData = requests.get(url).content #Get the content from the url
dataframe = pd.read_csv(io.StringIO(urlData.decode('latin-1')))
filename="data.xlsx"
dataframe.to_excel(filename)
In pandas you could just do:
import pandas as pd
url = 'https://some_url'
df = pd.read_csv(url)

Download data from URL in Python 3.6

I want to download data from https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData to dataframe.
I have tried the scripts below, but could not succeed.
import requests, io
import pandas as pd
URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'
#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
print(len(rawData))
Error: Python IDLE Got Stuck
#2
r = requests.get(URL)
urlData = pd.read_csv(io.StringIO(r))
print(len(urlData))
Error:-
urlData = pd.read_csv(io.StringIO(r))
TypeError: initial_value must be str or None, not Response
#3
urlData = pd.read_csv(URL, header=None)
print(len(urlData))
I got this working with
import requests, io
import pandas as pd
URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'
#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')), sep="\t")
print(rawData.head())
print(rawData.info())
The simplest way is to use urllib.request (this was urllib2 on Python 2, but the question targets Python 3.6). Note the file must be opened in binary mode:
import urllib.request
url_name = 'http://abc.pdf'
response = urllib.request.urlopen(url_name)
file = open(url_name.split('//')[1], 'wb')
file.write(response.read())
file.close()
I tried to download the data through the URL, and it takes a very long time. I recommend downloading it with wget and then processing it locally; the script itself seems fine.
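For a file this large, streaming the download in chunks avoids holding the whole body in memory; a sketch (download is a hypothetical helper, and requests is imported lazily so the definition stands alone):

```python
def download(url, path, chunk_size=1 << 20):
    """Stream a URL to disk in 1 MiB chunks instead of buffering it all."""
    import requests  # deferred so the definition itself has no hard dependency
    with requests.get(url, stream=True) as resp:
        resp.raise_for_status()
        with open(path, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                fh.write(chunk)
    return path
```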

Extracting data/string from Pandas DF column

I'm trying to extract currency pairs from the Poloniex API using Python pandas.
I believe the data returned is all just a single column name:
Columns: [{"BTC_BCN":{"BTC":"479.74697466", "BCN":"1087153595.32266165"}, "BTC_BELA":{"BTC":"32.92293515", "BELA":"1807337.13247948"}, "BTC_BLK":{"BTC":"25.70374054", "BLK":"606717.86348734"}, "BTC_BTCD":{"BTC":"24.32220571", "BTCD":"1264.02352237"}, "BTC_BTM":{"BTC":"11.57816905", "BTM":"80673.47934437"}, "BTC_BTS":{"BTC":"1102.88787610", "BTS":"30426626.64558044"}
The result I want: BTC_BCN, BTC_BELA, BTC_BLK, etc...
But not really sure if there is a simple way to get this without string parsing since they all appear to just be column names.
Code:
from bs4 import BeautifulSoup
import csv
import urllib2
import pandas as pd
try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
sock= urllib2.urlopen('https://poloniex.com/public?command=return24hVolume')
link=sock.read()
soup = BeautifulSoup(link,'lxml')
csv_data = StringIO(soup.text)
df=pd.read_csv(csv_data,delimiter=' *, *',engine='python')
df2=df.iloc[1:2,0:20]
You don't need BeautifulSoup here at all. The contents of the webpage is JSON - parse it with .read_json() directly:
df = pd.read_json('https://poloniex.com/public?command=return24hVolume')
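With read_json, the currency pairs the asker wants are simply the column labels. An offline check against a trimmed sample of the payload (values shortened):

```python
import io
import pandas as pd

# Trimmed sample of the return24hVolume payload (values shortened)
payload = ('{"BTC_BCN":{"BTC":"479.74","BCN":"1087153595.32"},'
           '"BTC_BELA":{"BTC":"32.92","BELA":"1807337.13"}}')

df = pd.read_json(io.StringIO(payload))
print(list(df.columns))  # ['BTC_BCN', 'BTC_BELA']
```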
