I'm completely new to Python. I want to download a file by sending a request to the server. When I type the URL into my browser, the CSV file is downloaded, but when I send a GET request from Python, it does not return anything. For example:
import urllib2
response = urllib2.urlopen('https://publicwww.com/websites/%22google.com%22/?export=csv')
data = response.read()
print 'data: ', data
It does not print anything. How can I handle that? When I search the web, all the questions are about how to send a GET request. I can send the GET request, but I have no idea how the file can be downloaded, as it is not in the response to the request.
I have no idea how to find a solution for this.
You can use urlretrieve to download the file.
Example:
u = "https://publicwww.com/websites/%22google.com%22/?export=csv"
import urllib
urllib.request.urlretrieve (u, "Ktest.csv")
You can also download the file using the requests module:
import shutil
import requests
url = "https://publicwww.com/websites/%22google.com%22/?export=csv"
response = requests.get(url, stream=True)
with open('file.csv', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response
You could try wget, if you have it:
import os
os.system('wget "https://publicwww.com/websites/%22google.com%22/?export=csv"')
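A safer variant of the same idea, assuming wget is on your PATH, uses subprocess, which sidesteps shell quoting and lets you check the exit status; the output file name here is just an assumption:
import subprocess

result = subprocess.run(
    ["wget", "-O", "Ktest.csv",
     "https://publicwww.com/websites/%22google.com%22/?export=csv"],
    check=False,
)
print(result.returncode)  # 0 means wget reported success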
I know this question has been asked multiple times, but none of the solutions has actually worked so far.
I would like to pull some files into a web tool based on a URL.
This seems to be an FTP share, but using
import ftplib
url = 'ftp://ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS1167'
ftp = ftplib.FTP(url)
fails with:
gaierror: [Errno -2] Name or service not known
It is easy to download single files with wget:
wget.download(url+'/'+filename, out=ms_dir)
However, the Python implementation of wget does not implement all the features of the Linux tool, so something like wget.download(url+'/*.*', out=ms_dir) does not work.
Therefore, I need to pull the list of files I want to download first and then download them one by one. I tried BeautifulSoup, requests, and urllib, but all the solutions either seem over-complicated for a problem that was probably solved a million times ten years ago, or don't work at all.
For example:
import requests
response = requests.get(url, params=params)
InvalidSchema: No connection adapters were found for...
import urllib3
http = urllib3.PoolManager()
r = http.request('GET', url)
URLSchemeUnknown: Not supported URL scheme ftp
And so on. I am not sure what I am doing wrong here.
import ftplib
from urllib.parse import urlparse
def get_files_from_ftp_directory(url):
    url_parts = urlparse(url)
    domain = url_parts.netloc
    path = url_parts.path
    ftp = ftplib.FTP(domain)
    ftp.login()
    ftp.cwd(path)
    filenames = ftp.nlst()
    ftp.quit()
    return filenames
get_files_from_ftp_directory(URL)
Thanks, I was using the whole URL instead of just the domain to log in. I use this function to get the filenames and then download them with the more convenient wget API.
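A minimal sketch of that combination, assuming url and ms_dir are defined as in the question and that the wget package is installed:
import wget

filenames = get_files_from_ftp_directory(url)
for filename in filenames:
    # download each listed file into the output directory
    wget.download(url + '/' + filename, out=ms_dir)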
I am trying to write the output from an API request (passed through a shell command) to a JSON file using Python.
import os
assignments = os.system("curl -u https://apitest.com/api/-u domain:jsdjbfkjsbdfden")
I am getting the response as a string. How can I save this response to a JSON file?
I also tried the requests library in Python with the same domain name and api_key; not sure why I am getting a 404 error: {"error":"Invalid api id. Verify your subdomain parameter"}
import requests
from requests.auth import HTTPBasicAuth
url = "https://apitest.com/api/"
headers = {"SUBDOMAIN":"domain","api_key": "jsdjbfkjsbdfden"}
authParams = HTTPBasicAuth('username#gmail.com', 'password#')
response = requests.get(url,headers=headers,auth = authParams)
Any help would be appreciated.
You should be using the requests library instead of system calls.
import requests
r = requests.get('https://postman-echo.com/get?foo1=bar1&foo2=bar2')
print(r.content)
Writing to a file is covered in many tutorials across the internet such as w3schools and has been covered extensively on StackOverflow already.
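For completeness, a minimal sketch of saving such a response to a JSON file, assuming the API returns JSON (the echo URL above just stands in for your endpoint):
import json
import requests

r = requests.get('https://postman-echo.com/get?foo1=bar1&foo2=bar2')
with open('response.json', 'w') as f:
    json.dump(r.json(), f, indent=2)  # parse the JSON body and write it out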
Is it not easier to use the requests library to make queries?
import requests
link = "" #your link
myobj = {'somekey': 'somevalue'}
r = requests.post(link, data=myobj)
r.status_code
If you have to use a command:
import os
assignments = os.system("curl -u https://apitest.com/api/-u domain:jsdjbfkjsbdfden > somefile")
There's no real reason to use Python's requests module per se except for purity; however, keeping it pure Python helps portability.
I have tried to upload a PDF by sending a POST request to an API in R and in Python, but I am not having a lot of success.
Here is my code in R
library(httr)
url <- "https://envoc-apply-api.azurewebsites.net/api/apply"
POST(url, body = upload_file("filename.pdf"))
The status I received is 500, when I want a status of 202.
I have also tried with the exact path instead of just the filename, but that comes up with a "file does not exist" error.
My code in Python
import requests
url ='https://envoc-apply-api.azurewebsites.net/api/apply'
files = {'file': open('filename.pdf', 'rb')}
r = requests.post(url, files=files)
Error I received
FileNotFoundError: [Errno 2] No such file or directory: 'filename.pdf'
I have been trying to use these two guides as examples.
R https://cran.r-project.org/web/packages/httr/vignettes/quickstart.html
Python http://requests.readthedocs.io/en/latest/user/quickstart/
Please let me know if you need any more info.
Any help will be appreciated.
You need to specify a full path to the file:
import requests
url ='https://envoc-apply-api.azurewebsites.net/api/apply'
files = {'file': open(r'C:\Users\me\filename.pdf', 'rb')}
r = requests.post(url, files=files)
or something like that; otherwise it never finds filename.pdf when it tries to open it.
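Alternatively, a rough sketch that builds the path relative to the script's own location (assuming filename.pdf sits next to the script), so the upload works regardless of the current working directory:
import requests
from pathlib import Path

url = 'https://envoc-apply-api.azurewebsites.net/api/apply'
pdf_path = Path(__file__).parent / 'filename.pdf'  # assumes the PDF is next to this script
with open(pdf_path, 'rb') as f:
    r = requests.post(url, files={'file': f})
print(r.status_code)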
I am writing a Python script which will save a PDF file locally according to the format given in the URL, for example:
https://Hostname/saveReport/file_name.pdf #saves the content in PDF file.
I am opening this URL through a Python script:
import webbrowser
webbrowser.open("https://Hostname/saveReport/file_name.pdf")
The URL contains lots of images and text. Once this URL is opened, I want to save a file in PDF format using the Python script.
This is what I have done so far.
Code 1:
import requests
url="https://Hostname/saveReport/file_name.pdf" #Note: It's https
r = requests.get(url, auth=('usrname', 'password'), verify=False)
file = open("file_name.pdf", 'w')
file.write(r.read())
file.close()
Code 2:
import urllib2
import ssl
url="https://Hostname/saveReport/file_name.pdf"
context = ssl._create_unverified_context()
response = urllib2.urlopen(url, context=context) #How should i pass authorization details here?
html = response.read()
In the above code I am getting: urllib2.HTTPError: HTTP Error 401: Unauthorized
If I use Code 2, how can I pass authorization details?
I think this will work
import requests
import shutil
url="https://Hostname/saveReport/file_name.pdf" #Note: It's https
r = requests.get(url, auth=('usrname', 'password'), verify=False,stream=True)
r.raw.decode_content = True
with open("file_name.pdf", 'wb') as f:
shutil.copyfileobj(r.raw, f)
One way you can do that is:
import urllib3
urllib3.disable_warnings()
url = r"https://websitewithfile.com/file.pdf"
fileName = r"file.pdf"
with urllib3.PoolManager() as http:
    r = http.request('GET', url)
    with open(fileName, 'wb') as fout:
        fout.write(r.data)
You can try something like:
import requests

response = requests.get('https://websitewithfile.com/file.pdf', verify=False, auth=('user', 'pass'))
with open('file.pdf', 'wb') as fout:
    fout.write(response.content)
For some files (at least tar archives, or maybe even all other files) you can use pip:
import sys
from subprocess import call, run, PIPE
url = "https://blabla.bla/foo.tar.gz"
call([sys.executable, "-m", "pip", "download", url], stdout=PIPE, stderr=PIPE)
But you should confirm that the download was successful in some other way, as pip will raise an error for any file that is not an archive containing setup.py, hence stderr=PIPE. (Or maybe you can determine whether the download was successful by parsing the subprocess error message.)
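One rough way to check, assuming pip keeps the file name from the URL when it saves the download into the current directory:
import os
import sys
from subprocess import run, PIPE

url = "https://blabla.bla/foo.tar.gz"
result = run([sys.executable, "-m", "pip", "download", url], stdout=PIPE, stderr=PIPE)
# pip exits non-zero when it cannot fetch or unpack the URL
ok = result.returncode == 0 and os.path.exists("foo.tar.gz")
print(ok, result.stderr.decode()[:200])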
Basically I need a program that, given a URL, downloads a file and saves it. I know this should be easy, but there are a couple of drawbacks here...
First, it is part of a tool I'm building at work. I have everything else besides that, and the URL is HTTPS; it is one of those URLs you would paste into your browser and get a pop-up asking whether you want to open or save the file (.txt).
Second, I'm a beginner at this, so if there's info I'm not providing please ask me. :)
I'm using Python 3.3 by the way.
I tried this:
import urllib.request
response = urllib.request.urlopen('https://websitewithfile.com')
txt = response.read()
print(txt)
And I get:
urllib.error.HTTPError: HTTP Error 401: Authorization Required
Any ideas? Thanks!!
You can do this easily with the requests library.
import requests
response = requests.get('https://websitewithfile.com/text.txt',verify=False, auth=('user', 'pass'))
print(response.text)
To save the file you would type:
with open('filename.txt','w') as fout:
    fout.write(response.text)
(I would suggest you always set verify=True in the requests.get() call.)
Here is the documentation: http://requests.readthedocs.io/
Doesn't the browser also ask you to sign in? Then you need to repeat the request with the added authentication like this:
Python urllib2, basic HTTP authentication, and tr.im
Equally good: Python, HTTPS GET with basic authentication
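A minimal sketch of the basic-auth approach those links describe, using urllib.request from the standard library (the URL and credentials below are just the placeholders from this thread):
import urllib.request

url = 'https://websitewithfile.com/text.txt'
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, 'user', 'pass')  # placeholder credentials
handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(handler)

response = opener.open(url)
print(response.read().decode())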
If you don't have the requests module, then the code below works for Python 2.6 or later. Not sure about 3.x.
import urllib
testfile = urllib.URLopener()
testfile.retrieve("https://randomsite.com/file.gz", "/local/path/to/download/file")
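For 3.x, a rough equivalent (assuming the same placeholder URL and local path) would be:
import urllib.request

# same placeholder URL and local path as above
urllib.request.urlretrieve("https://randomsite.com/file.gz", "/local/path/to/download/file")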
You can try this solution: https://github.qualcomm.com/graphics-infra/urllib-siteminder
import siteminder
import getpass
url = 'https://XYZ.dns.com'
r = siteminder.urlopen(url, getpass.getuser(), getpass.getpass(), "dns.com")
# getpass.getpass() prompts for your password at a "Password:" prompt
data = r.read()  # or pd.read_html(r.read()) -- needs "import pandas as pd" for the second option