I want to download data from https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData into a dataframe.
I have tried the scripts below, but could not succeed.
import requests, io
import pandas as pd
URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'
#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
print(len(rawData))
Error: Python IDLE Got Stuck
#2
r = requests.get(URL)
urlData = pd.read_csv(io.StringIO(r))
print(len(urlData))
Error:
urlData = pd.read_csv(io.StringIO(r))
TypeError: initial_value must be str or None, not Response
#3
urlData = pd.read_csv(URL, header=None)
print(len(urlData))
I got this working with the following (the file is tab-delimited, hence sep="\t"):
import requests, io
import pandas as pd
URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'
#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')), sep="\t")
print(rawData.head())
print(rawData.info())
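Note that ln.data.1.AllData is large, which is probably why IDLE appeared to hang. Below is a sketch that parses the response in chunks so no single huge DataFrame is built; the browser-like User-Agent header, the timeout, and the chunk size are assumptions, added in case the server rejects the default client:
import io
import requests
import pandas as pd
URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'
# browser-like header is an assumption, in case the server blocks
# the default python-requests User-Agent
resp = requests.get(URL, headers={'User-Agent': 'Mozilla/5.0'}, timeout=120)
resp.raise_for_status()
# parse the tab-delimited text in chunks to keep memory bounded
total_rows = 0
for chunk in pd.read_csv(io.StringIO(resp.text), sep='\t', chunksize=100_000):
    total_rows += len(chunk)
print(total_rows)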
The simplest way is to use urllib2 (Python 2 only):
import urllib2
url_name = 'http://abc.pdf'
response = urllib2.urlopen(url_name)
# open in binary mode so the payload is not mangled
file = open(url_name.split('//')[1], 'wb')
file.write(response.read())
file.close()
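urllib2 exists only on Python 2; on Python 3 the equivalent is roughly this sketch (same placeholder URL as above):
from urllib.request import urlopen
url_name = 'http://abc.pdf'  # placeholder URL from the snippet above
# the last path segment becomes the local filename
with urlopen(url_name) as response, open(url_name.rsplit('/', 1)[-1], 'wb') as f:
    f.write(response.read())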
I tried to download the data through the URL, and it does take a very long time. I recommend downloading it with wget and then processing it locally. The script itself seems fine.
Related
I am trying to download the data from the following URL and to save it as CSV, but the output I am getting is a text file. Can anyone please help with what I am doing wrong here? Also, is it possible to add multiple URLs to the same script and download multiple CSV files?
import csv
import pandas as pd
import requests
from datetime import datetime
CSV_URL = ('https://dsv-ops-toolkit.ihsmvals.com/ftp?config=fenics-bgc&file=IRSDATA_20211129_1700_Intra.csv&directory=%2FIRS%2FIntraday%2FDaily')
with requests.Session() as s:
    download = s.get(CSV_URL)
    decoded_content = download.content.decode('utf-8')
    cr = csv.reader(decoded_content.splitlines(), delimiter=',')
    date = datetime.now().strftime('%y%m%d')
    my_list = list(cr)
    df = pd.DataFrame(my_list)
    df.to_csv(f'RFR_{date}')
You can create a list of your necessary URLs like:
urls = ['http://url1.com','http://url2.com','http://url3.com']
Iterate through the list, and your request code stays the same for each url:
for each_url in urls:
    with requests.Session() as s:
        # your_code_here
Hope you'll find this helpful.
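Putting the two parts together, here is a minimal sketch (the URLs are hypothetical placeholders) that fetches each URL and saves the raw bytes under the filename taken from the URL, keeping the .csv extension so the output opens as CSV rather than plain text:
import requests
urls = ['http://url1.com/data1.csv', 'http://url2.com/data2.csv']  # placeholders
with requests.Session() as s:
    for each_url in urls:
        download = s.get(each_url)
        # name the file after the last URL segment, e.g. data1.csv
        filename = each_url.rsplit('/', 1)[-1]
        with open(filename, 'wb') as f:
            f.write(download.content)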
I want to download to a local directory. This code works for CSV but not XLSX: it writes a file, but the file cannot be opened as Excel.
Any help will be appreciated.
url = 'https://some_url'
resp = requests.get(url)
open('some_filename.xlsx', 'wb').write(resp.content)
You could create a dataframe from the resp data and then use the pd.to_excel() function to obtain the xlsx file. This is a tested solution; it worked for me.
import requests
import pandas as pd
import io
url = 'https://www.google.com'  # as an example
urlData = requests.get(url).content  # get the content from the url
dataframe = pd.read_csv(io.StringIO(urlData.decode('latin-1')))
filename = "data.xlsx"
dataframe.to_excel(filename)  # needs an Excel writer engine such as openpyxl
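Alternatively, if the URL really serves an .xlsx payload rather than CSV, a sketch (the url is the placeholder from the question; read_excel needs an engine such as openpyxl) that keeps the bytes binary end to end:
import io
import requests
import pandas as pd
url = 'https://some_url'  # placeholder from the question
resp = requests.get(url)
resp.raise_for_status()
# parse the binary payload as a workbook instead of decoding it as text
df = pd.read_excel(io.BytesIO(resp.content))
df.to_excel('some_filename.xlsx', index=False)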
In pandas you could just do:
import pandas as pd
url = 'https://some_url'
df = pd.read_csv(url)
I'd like to download multiple txt files that store data by day.
The address is like this:
http://100.200.100.200/cd200730.txt
I'd like to download those txt files given start and end date inputs. I've made it as far as building all the URLs, but haven't found a way to download and save each file under its per-day name: "cd200730.txt", "cd200731.txt", etc.
import requests
import pandas as pd
# set date_range to start and end
date_range=pd.date_range(start='2018-04-24', end='2018-04-27', freq='D')
df=date_range.strftime('%y%m%d')
df2=df.to_frame(index=False,name='date')
df2['date'] = df2['date'].apply(lambda x: f"http://100.200.100.200/cd{x}.txt")
for url in df2.date:
    r = requests.get(url, allow_redirects=True)
    open(url, 'wb').write(r.content)
When I run this, I get the following error:
OSError: [Errno 22] Invalid argument: 'http://10.47.149.67/cd180424.txt'
When I run it with the last line changed to open('url.txt', ...), I get only the last file.
I feel like I need another for loop around the open(url) part.
Is there any way I can complete this work?
---Edited--- v0.1
I've gotten it this far, as follows:
import requests
import pandas as pd
# date_range to start and end
date_range=pd.date_range(start='2018-04-24', end='2018-04-25', freq='D')
df=date_range.strftime('%y%m%d')
df_filename=df.to_frame(index=False,name='file_name')
df_filename['file_name']=df_filename['file_name'].apply(lambda x: f"cd{x}.txt")
df2=df.to_frame(index=False,name='date')
df2['date'] = df2['date'].apply(lambda x: f"http://100.200.100.200/cd{x}.txt")
for url in df2.date:
    r = requests.get(url, allow_redirects=False)
    for name in df_filename['file_name']:
        open(name, 'wb').write(r.content)
---Edited--- v0.2
"v0.1" only saves the same data with various date files (cd200718.csv and cd200719 have the same data)
Something little is missing..
---Edited--- v0.3
Finally, the following works perfectly!
for url, name in zip(df2.date, df_filename['file_name']):
    r = requests.get(url, allow_redirects=False)
    open(name, 'wb').write(r.content)
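A slightly tighter variant of v0.3 (a sketch; the host is the placeholder from the question) derives each filename from the URL itself, so the parallel file_name frame is not needed:
import requests
import pandas as pd
date_range = pd.date_range(start='2018-04-24', end='2018-04-27', freq='D')
for stamp in date_range.strftime('%y%m%d'):
    url = f'http://100.200.100.200/cd{stamp}.txt'
    r = requests.get(url, allow_redirects=False)
    # the last path segment is the per-day filename, e.g. cd180424.txt
    with open(url.rsplit('/', 1)[-1], 'wb') as f:
        f.write(r.content)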
Using requests I am creating an object which is in .csv format. How can I then write that object to a DataFrame with pandas?
To get the requests object in text format:
import requests
import pandas as pd
url = r'http://test.url'
r = requests.get(url)
r.text #this will return the data as text in csv format
I tried (doesn't work):
pd.read_csv(r.text)
pd.DataFrame.from_csv(r.text)
Try this:
import requests
import pandas as pd
import io
urlData = requests.get(url).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
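If the payload's encoding is uncertain, a variant (a sketch; the url is the placeholder from the question) is to hand pandas the raw bytes and name the encoding explicitly:
import io
import requests
import pandas as pd
url = 'http://test.url'  # placeholder from the question
urlData = requests.get(url).content
# pass the raw bytes; set encoding= explicitly if the data is not UTF-8
rawData = pd.read_csv(io.BytesIO(urlData), encoding='utf-8')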
I think you can use read_csv with url:
pd.read_csv(url)
filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)
The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
Or go through requests; note that io.StringIO needs the response text, not the Response object itself (passing r raises the TypeError shown earlier):
import pandas as pd
import io
import requests
url = r'http://...'
r = requests.get(url)
df = pd.read_csv(io.StringIO(r.text))
Using "read_csv with url" worked:
import pandas as pd
url = 'https://arte.folha.uol.com.br/ciencia/2020/coronavirus/csv/mundo/dados-bra.csv'
corona_bra = pd.read_csv(url)
print(corona_bra.head())
If the URL has no authentication, then you can directly use read_csv(url).
If it requires authentication, you can use requests to fetch the content, check that the result really is CSV, and then parse it with pandas.
You can also parse it directly with the standard library's csv module:
import csv
My file named 'blueberry.jpg' begins downloading when I click on the following URL manually, provided that the username and password are typed when asked:
http://example.com/blueberry/download
How can I make that happen using Python?
import urllib.request
url = 'http://example.com/blueberry/download'
data = urllib.request.urlopen(url).read()
fo = open('E:\\quail\\' + url.split('/')[1] + '.jpg', 'w')
print (data, file = fo)
fo.close()
However, the above program does not write the required file. How can I provide the required username and password?
Use requests, which provides a friendlier interface to the various url libraries in Python:
import os
import requests
from urlparse import urlparse
username = 'foo'
password = 'sekret'
url = 'http://example.com/blueberry/download/somefile.jpg'
filename = os.path.basename(urlparse(url).path)
r = requests.get(url, auth=(username, password))
if r.status_code == 200:
    with open(filename, 'wb') as out:
        # the default chunk size is 1 byte, so pick something larger
        for bits in r.iter_content(chunk_size=8192):
            out.write(bits)
UPDATE:
For Python3 get urlparse with: from urllib.parse import urlparse
I'm willing to bet you are using basic auth. So try embedding the credentials in the URL (with @, not #) and writing the bytes in binary mode:
import urllib.request
url = 'http://username:pwd@example.com/blueberry/download'
data = urllib.request.urlopen(url).read()
# take the last path segment for the filename and write binary
fo = open('E:\\quail\\' + url.split('/')[-1] + '.jpg', 'wb')
fo.write(data)
fo.close()
Let me know if this works.
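One caveat: urllib.request does not, by itself, apply credentials embedded in the URL as basic auth, so the explicit stdlib route is an HTTPBasicAuthHandler. A sketch using the same placeholder URL and credentials as above:
import urllib.request
url = 'http://example.com/blueberry/download'
# register the credentials for this URL, whatever the realm
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, 'username', 'pwd')
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(password_mgr))
data = opener.open(url).read()
# write the bytes in binary mode
with open('E:\\quail\\blueberry.jpg', 'wb') as fo:
    fo.write(data)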