Download data from URL in Python 3.6 - python

I want to download data from https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData into a dataframe.
I have tried the scripts below, but could not succeed.
import requests, io
import pandas as pd
URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'
#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
print(len(rawData))
Error: Python IDLE got stuck
#2
r = requests.get(URL)
urlData = pd.read_csv(io.StringIO(r))
print(len(urlData))
Error:-
urlData = pd.read_csv(io.StringIO(r))
TypeError: initial_value must be str or None, not Response
#3
urlData = pd.read_csv(URL, header=None)
print(len(urlData))

I got this working with
import requests, io
import pandas as pd
URL = 'https://download.bls.gov/pub/time.series/ln/ln.data.1.AllData'
#1
urlData = requests.get(URL).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')), sep="\t")
print(rawData.head())
print(rawData.info())
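The decisive detail above is that ln.data.1.AllData is tab-delimited, so sep="\t" is required. A self-contained check with a couple of made-up rows in the same layout (the values are illustrative only, the column names follow the BLS file header):
```python
import io
import pandas as pd

# Two made-up rows in the tab-separated layout of ln.data.1.AllData
sample = (
    "series_id\tyear\tperiod\tvalue\tfootnote_codes\n"
    "LNS11000000\t1948\tM01\t60095\t\n"
    "LNS11000000\t1948\tM02\t60524\t\n"
)
df = pd.read_csv(io.StringIO(sample), sep="\t")
print(df.shape)          # (2, 5)
print(list(df.columns))  # ['series_id', 'year', 'period', 'value', 'footnote_codes']
```
Without sep="\t", each whole line would be parsed as a single comma-separated field.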

A simple alternative is urllib (urllib2 in Python 2; in Python 3 use urllib.request). Note the output file must be opened in binary mode, since response.read() returns bytes:
import urllib.request
url_name = 'http://abc.pdf'
response = urllib.request.urlopen(url_name)
file = open(url_name.split('//')[1], 'wb')
file.write(response.read())
file.close()

I tried to download the data through the URL, and it does take a very long time. I recommend downloading it with wget and then processing the file locally; the script itself seems fine.
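If wget is not an option, the same effect (download to disk first, parse later) can be had from Python by streaming the response in chunks rather than holding the whole file in memory. A sketch using requests:
```python
import requests

def stream_download(url, path, chunk_size=1 << 20):
    """Stream a large file to disk in 1 MB chunks, wget-style,
    instead of buffering the whole response in memory."""
    with requests.get(url, stream=True, timeout=60) as r:
        r.raise_for_status()
        with open(path, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)

# Usage (then parse the local copy):
# stream_download(URL, "ln.data.1.AllData")
# rawData = pd.read_csv("ln.data.1.AllData", sep="\t")
```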

Related

How to convert url data to csv using python

I am trying to download the data from the following URL and trying to save it as CSV data, but the output I am getting is a text file. Can anyone please help with what I am doing wrong here? Also, is it possible to add multiple URLs to the same script and download multiple CSV files?
import csv
import pandas as pd
import requests
from datetime import datetime
CSV_URL = ('https://dsv-ops-toolkit.ihsmvals.com/ftp?config=fenics-bgc&file=IRSDATA_20211129_1700_Intra.csv&directory=%2FIRS%2FIntraday%2FDaily')
with requests.Session() as s:
    download = s.get(CSV_URL)
    decoded_content = download.content.decode('utf-8')
    cr = csv.reader(decoded_content.splitlines(), delimiter=',')
    date = datetime.now().strftime('%y%m%d')
    my_list = list(cr)
    df = pd.DataFrame(my_list)
    df.to_csv(f'RFR_{date}')
You can create a list of your necessary URLs:
urls = ['http://url1.com','http://url2.com','http://url3.com']
Then iterate through the list, keeping your request code as it is for each URL:
for each_url in urls:
    with requests.Session() as s:
        # your_code_here
Hope you'll find this helpful.
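Putting both parts together: the output filename can be recovered from the file= query parameter of each URL, and one session reused across all downloads. A sketch, where csv_name is a hypothetical helper and the URLs in the usage comment are placeholders:
```python
from urllib.parse import urlparse, parse_qs

import requests

def csv_name(url, default="download.csv"):
    """Derive the output filename from the URL's 'file' query parameter."""
    params = parse_qs(urlparse(url).query)
    return params.get("file", [default])[0]

def download_all(urls):
    # Reuse one session (one connection pool) for every URL in the list.
    with requests.Session() as s:
        for each_url in urls:
            download = s.get(each_url)
            download.raise_for_status()
            with open(csv_name(each_url), "w", newline="") as f:
                f.write(download.content.decode("utf-8"))

# download_all(['http://url1.com/ftp?file=a.csv', 'http://url2.com/ftp?file=b.csv'])
```
Writing the decoded text straight to a .csv-named file skips the csv.reader/DataFrame round trip, which only re-serializes the same rows.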

Download xlsx file with Python

I want to download to a local directory. This code works for CSV but not xlsx: it writes a file, but the file cannot be opened in Excel.
Any help will be appreciated.
url = 'https://some_url'
resp = requests.get(url)
open('some_filename.xlsx', 'wb').write(resp.content)
You could create a DataFrame from the response data and then use DataFrame.to_excel() to obtain the xlsx file. This is a tested solution, and it worked for me.
import requests
import pandas as pd
import io
url='https://www.google.com' #as an example
urlData = requests.get(url).content #Get the content from the url
dataframe = pd.read_csv(io.StringIO(urlData.decode('latin-1')))
filename="data.xlsx"
dataframe.to_excel(filename)
In pandas you could just do:
import pandas as pd
url = 'https://some_url'
df = pd.read_csv(url)
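A common reason a downloaded .xlsx "cannot be opened as Excel" is that the server actually returned HTML (an error or login page). Real .xlsx files are zip archives and start with the bytes PK, which allows a cheap sanity check before writing; looks_like_xlsx below is a hypothetical helper, not part of any library:
```python
def looks_like_xlsx(content_type, first_bytes):
    """Heuristic check that a response really carries an .xlsx workbook.

    .xlsx files are zip archives, so the payload starts with b'PK';
    well-behaved servers also send a spreadsheetml content type."""
    return first_bytes[:2] == b"PK" or "spreadsheetml" in content_type

# With a requests response this would be used as:
# resp = requests.get(url)
# if looks_like_xlsx(resp.headers.get("Content-Type", ""), resp.content[:2]):
#     open("some_filename.xlsx", "wb").write(resp.content)

print(looks_like_xlsx("text/html", b"<!doctype html>"))           # False
print(looks_like_xlsx("application/octet-stream", b"PK\x03\x04"))  # True
```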

Download multiple(consecutive) txt from multiple(consecutive day)URLs

I'd like to download multiple txt that saves data by day.
The address is like this:
http://100.200.100.200/cd200730.txt
I'd like to download those txt files with start and end date input. I've made it through to building all the URLs, but haven't found a way to download and save each file under its name for the day: "cd200730.txt", "cd200731.txt", etc.
import requests
import pandas as pd
# set date_range to start and end
date_range=pd.date_range(start='2018-04-24', end='2018-04-27', freq='D')
df=date_range.strftime('%y%m%d')
df2=df.to_frame(index=False,name='date')
df2['date'] = df2['date'].apply(lambda x: f"http://100.200.100.200/cd{x}.txt")
for url in df2.date:
    r = requests.get(url, allow_redirects=True)
    open(url, 'wb').write(r.content)
When I run this, I get the following error:
OSError: [Errno 22] Invalid argument: 'http://10.47.149.67/cd180424.txt'
When I run it with the last line changed to open('url.txt', ...), I get only the last file.
I feel like I need another for loop around the open(url) part.
Is there any way I can complete this work?
---Edited--- v0.1
I've made it through as following:
import requests
import pandas as pd
# date_range to start and end
date_range=pd.date_range(start='2018-04-24', end='2018-04-25', freq='D')
df=date_range.strftime('%y%m%d')
df_filename=df.to_frame(index=False,name='file_name')
df_filename['file_name']=df_filename['file_name'].apply(lambda x: f"cd{x}.txt")
df2=df.to_frame(index=False,name='date')
df2['date'] = df2['date'].apply(lambda x: f"http://100.200.100.200/cd{x}.txt")
for url in df2.date:
    r = requests.get(url, allow_redirects=False)
    for name in df_filename['file_name']:
        open(name, 'wb').write(r.content)
---Edited--- v0.2
"v0.1" only saves the same data with various date files (cd200718.csv and cd200719 have the same data)
Something little is missing..
---Edited--- v0.3
Finally, the following works perfectly!
for url, name in zip(df2.date, df_filename['file_name']):
    r = requests.get(url, allow_redirects=False)
    open(name, 'wb').write(r.content)
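The parallel date frame and filename frame can be dropped entirely by deriving each filename from its URL. A sketch, with the host made up as in the question:
```python
import pandas as pd

def daily_urls(start, end, base="http://100.200.100.200"):
    """Build one cdYYMMDD.txt URL per day in the inclusive range [start, end]."""
    days = pd.date_range(start=start, end=end, freq="D").strftime("%y%m%d")
    return [f"{base}/cd{d}.txt" for d in days]

urls = daily_urls("2018-04-24", "2018-04-27")
names = [u.rsplit("/", 1)[-1] for u in urls]   # 'cd180424.txt', ...
print(names)

# Downloading then pairs each URL with its own name directly:
# for url, name in zip(urls, names):
#     r = requests.get(url, allow_redirects=False)
#     open(name, 'wb').write(r.content)
```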

Convert text data from requests object to dataframe with pandas

Using requests I am creating an object which is in .csv format. How can I then write that object to a DataFrame with pandas?
To get the requests object in text format:
import requests
import pandas as pd
url = r'http://test.url'
r = requests.get(url)
r.text #this will return the data as text in csv format
I tried (doesn't work):
pd.read_csv(r.text)
pd.DataFrame.from_csv(r.text)
Try this
import requests
import pandas as pd
import io
urlData = requests.get(url).content
rawData = pd.read_csv(io.StringIO(urlData.decode('utf-8')))
I think you can use read_csv with url:
pd.read_csv(url)
filepath_or_buffer : str, pathlib.Path, py._path.local.LocalPath or any object with a read() method (such as a file handle or StringIO)
The string could be a URL. Valid URL schemes include http, ftp, s3, and file. For file URLs, a host is expected. For instance, a local file could be file://localhost/path/to/table.csv
Pass the response text, not the Response object itself, to io.StringIO (passing r raises TypeError: initial_value must be str or None, not Response):
import pandas as pd
import io
import requests
url = r'http://...'
r = requests.get(url)
df = pd.read_csv(io.StringIO(r.text))
Using "read_csv with url" worked:
import requests, csv
import pandas as pd
url = 'https://arte.folha.uol.com.br/ciencia/2020/coronavirus/csv/mundo/dados-bra.csv'
corona_bra = pd.read_csv(url)
print(corona_bra.head())
If the URL needs no authentication, then you can directly use read_csv(url).
If it does need authentication, fetch it with requests first, check that the response really is CSV, and then parse it with pandas.
You can also work with the data directly via the built-in csv module (import csv).

Download a file providing username and password using Python

My file named 'blueberry.jpg' begins downloading when I click on the following URL manually, provided that the username and password are typed in when asked:
http://example.com/blueberry/download
How can I make that happen using Python?
import urllib.request
url = 'http://example.com/blueberry/download'
data = urllib.request.urlopen(url).read()
fo = open('E:\\quail\\' + url.split('/')[1] + '.jpg', 'w')
print (data, file = fo)
fo.close()
However above program does not write the required file, how can I provide the required username and password?
Use requests, which provides a friendlier interface to the various url libraries in Python:
import os
import requests
from urlparse import urlparse
username = 'foo'
password = 'sekret'
url = 'http://example.com/blueberry/download/somefile.jpg'
filename = os.path.basename(urlparse(url).path)
r = requests.get(url, auth=(username,password))
if r.status_code == 200:
    with open(filename, 'wb') as out:
        for bits in r.iter_content():
            out.write(bits)
UPDATE:
For Python3 get urlparse with: from urllib.parse import urlparse
I'm willing to bet you are using basic auth. So try putting the credentials in the URL (note the @, and that the bytes must be written in binary mode):
import urllib.request
url = 'http://username:pwd@example.com/blueberry/download'
data = urllib.request.urlopen(url).read()
fo = open('E:\\quail\\' + url.split('/')[-1] + '.jpg', 'wb')
fo.write(data)
fo.close()
Let me know if this works.
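For the standard-library route, credentials embedded in the URL are not reliably honored by urllib.request; basic auth has to be wired up through a handler instead. A sketch with placeholder credentials and the question's example URL:
```python
import urllib.request

url = 'http://example.com/blueberry/download'
username, password = 'foo', 'sekret'   # placeholders

# Register the credentials for this URL, then build an opener that
# answers HTTP 401 basic-auth challenges with them.
mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
mgr.add_password(None, url, username, password)
opener = urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(mgr))

# Fetch and save in binary mode (the payload is bytes, not text):
# data = opener.open(url).read()
# with open('blueberry.jpg', 'wb') as fo:
#     fo.write(data)
```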
