django how to download a file from the internet - python

I want to have a user input a file URL and then have my django app download the file from the internet.
My first instinct was to call wget inside my django app, but then I thought there may be another way to get this done. I couldn't find anything when I searched. Is there a more django way to do this?

You are not really dependent on Django for this.
I happen to like using the requests library.
Here is an example:
import requests

def download(url, path, chunk_size=2048):
    # Stream the response so the whole file is never held in memory.
    req = requests.get(url, stream=True)
    if req.status_code == 200:
        with open(path, 'wb') as f:
            for chunk in req.iter_content(chunk_size):
                f.write(chunk)
        return path
    raise Exception('Given url returned status code: {}'.format(req.status_code))
Place this in a file and import it into your module whenever you need it.
Of course this is very minimal but this will get you started.

You can use urlopen from urllib2 (Python 2 only; in Python 3 the same function lives in urllib.request), as in this example:
import urllib2
pdf_file = urllib2.urlopen("http://www.example.com/files/some_file.pdf")
with open('test.pdf', 'wb') as output:
    output.write(pdf_file.read())
For more information, read the urllib2 docs.
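For readers on Python 3, where urllib2 no longer exists, the same idea can be sketched with urllib.request. This is a minimal sketch, not the answerer's code: the function name download and its parameters are illustrative assumptions, and it copies the response in chunks with shutil.copyfileobj instead of reading it all at once.

```python
import shutil
import urllib.request


def download(url, path):
    """Stream the body at `url` into the local file `path` and return `path`."""
    # urlopen returns a file-like response; the context manager closes it.
    with urllib.request.urlopen(url) as response, open(path, "wb") as output:
        # copyfileobj copies in chunks, so the whole file is never held in memory.
        shutil.copyfileobj(response, output)
    return path
```

Inside a Django view this would be called like any other helper function; nothing in it depends on Django.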

Related

Python script to download PDF not downloading the PDF?

I have a Python 3.10 script to download a PDF from a URL. I get no errors, but when I run the code the PDF does not download. I've done a sanity check to ensure the PDF is actually at the URL (which it is).
I'm not sure if this has something to do with HTTP/HTTPS. This site does have an expired HTTPS certificate, but it is a government site and this is really for testing only, so I am not worried about that and can ignore the error.
from fileinput import filename
import os
import os.path
from datetime import datetime
import urllib.request
import requests
import urllib3
urllib3.disable_warnings()
resp = requests.get('http:// url domain .org', verify=False)
urllib.request.urlopen('http:// my url .pdf')
filename = datetime.now().strftime("%Y_%m_%d-%I_%M_%S_%p")
save_path = "C:/Users/bob/Desktop/folder"
Or maybe the issue has something to do with urllib3 ignoring the error and urllib downloading the file?
Redacted the specific URL here
The urllib.request.urlopen method doesn't save the remote URL to a file -- it returns a response object that can be treated as a file-like object. You could do something like:
response = urllib.request.urlopen('http:// my url .pdf')
# Open the target in binary write mode; the default mode is read-only text.
with open('filename.pdf', 'wb') as fd:
    fd.write(response.read())
The urllib.request.urlretrieve method, on the other hand, will take care of writing the remote content to a local file. You would use it like this to write the PDF file to a local file named filename.pdf:
response = urllib.request.urlretrieve('http://my url .pdf',
                                      filename='filename.pdf')
See the documentation for information about the return value from the urlretrieve method.
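As a rough illustration of that return value: urlretrieve returns a tuple of the local file name and the response headers. The sketch below only unpacks that tuple; the wrapper name retrieve and its parameters are assumptions made for the example. Note that the Python documentation describes urlretrieve as a legacy interface.

```python
import urllib.request


def retrieve(url, filename):
    """Save `url` to `filename`; return the local path and the response headers."""
    # urlretrieve returns a (local_filename, headers) tuple.
    local_path, headers = urllib.request.urlretrieve(url, filename=filename)
    return local_path, headers
```

The headers object can then be inspected, for example with headers.get("Content-Type"), to confirm that the server really sent a PDF and not an HTML error page.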

How can I input a filename and download the file in Python?

I have a database of files. I'm writing a program that asks the user to input a file name, uses that input to find the file, downloads it, makes a folder locally, and saves the file there. Which module in Python should be used?
Can be as small as this:
import requests
my_filename = input('Please enter a filename:')
my_url = 'http://www.somedomain/'
r = requests.get(my_url + my_filename, allow_redirects=True)
with open(my_filename, 'wb') as fh:
    fh.write(r.content)
Well, do you have the database online?
If so, I would suggest the requests module, which is very Pythonic and fast.
Another great module based on requests is robobrowser.
Eventually, you may need beautiful soup to parse the HTML or XML data.
I would avoid using selenium because it's designed for web-testing, it needs a browser and its webdriver and it's pretty slow. It doesn't fit your needs at all.
Finally, to interact with the database I'd use sqlite3.
Here is a sample:
import os
import requests

filename = input()
download_path = f'C:\\Users\\{os.getlogin()}\\Downloads\\your application'
with requests.Session() as session:
    url = f'http://www.domain.example/{filename}'
    try:
        response = session.get(url)
        # Treat 4xx/5xx replies (e.g. 404 for a missing file) as errors too.
        response.raise_for_status()
    except requests.exceptions.RequestException:
        print('File not existing')
    else:
        # Create the target folder if needed, then save the downloaded bytes.
        os.makedirs(download_path, exist_ok=True)
        with open(os.path.join(download_path, filename), mode='wb') as dbfile:
            dbfile.write(response.content)
However, you should read how to ask a good question.
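Because the file name comes straight from user input, it may be worth reducing it to a bare name before joining it to the download folder. The helper below is a hedged sketch of that step only; local_target and its parameters are hypothetical names, not part of either answer above.

```python
import os


def local_target(user_filename, download_dir):
    """Return a path inside `download_dir` for a user-supplied file name."""
    # Keep only the final component so input like "../../x" cannot escape the folder.
    name = os.path.basename(user_filename.replace("\\", "/"))
    if name in ("", ".", ".."):
        raise ValueError(f"not a usable file name: {user_filename!r}")
    # Create the folder (and any parents) if it does not exist yet.
    os.makedirs(download_dir, exist_ok=True)
    return os.path.join(download_dir, name)
```

The returned path can be passed directly to open(..., 'wb') in either of the samples above.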

urllib: Get name of file from direct download link

Python 3. Probably need to use urllib to do this,
I need to know how to send a request to a direct download link, and get the name of the file it attempts to save.
(As an example, a KSP mod from CurseForge: https://kerbal.curseforge.com/projects/mechjeb/files/2355387/download)
Of course, the file ID (2355387) will be changed. It could be from any project, but always on CurseForge. (If that makes a difference on the way it's downloaded.)
That example link results in the file:
How can I return that file name in Python?
Edit: I should note that I want to avoid saving the file, reading the name, then deleting it if possible. That seems like the worst way to do this.
Using urllib.request, when you request a response from a url, the response contains a reference to the url you are downloading.
>>> from urllib.request import urlopen
>>> url = 'https://kerbal.curseforge.com/projects/mechjeb/files/2355387/download'
>>> response = urlopen(url)
>>> response.url
'https://addons-origin.cursecdn.com/files/2355/387/MechJeb2-2.6.0.0.zip'
You can use os.path.basename to get the filename:
>>> from os.path import basename
>>> basename(response.url)
'MechJeb2-2.6.0.0.zip'
from urllib import request
url = 'file download link'
filename = request.urlopen(request.Request(url)).info().get_filename()
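The two answers above can be combined, since get_filename() returns None when the server sends no Content-Disposition file name. The following is a sketch under that assumption; filename_from_response is a hypothetical helper name, and it expects the final (post-redirect) URL plus the headers object from the response.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_response(final_url, headers):
    """Prefer the Content-Disposition file name; fall back to the URL path."""
    # get_filename() returns None when there is no Content-Disposition file name.
    name = headers.get_filename()
    if name:
        return name
    # Otherwise use the last path segment of the (post-redirect) URL.
    return posixpath.basename(unquote(urlsplit(final_url).path))
```

With a response obtained from urlopen(url), it would be called as filename_from_response(response.url, response.info()). The request still fetches the headers, but nothing has to be written to disk.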

Download csv file through python (url)

I am working on a project and want to download a CSV file from a URL. I did some research on the site, but none of the solutions presented worked for me.
The URL directly offers to download or open the file, so I do not know how to tell Python to save the file (it would be nice if I could also rename it).
When I open the URL with this code, nothing happens.
import urllib
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
testfile = urllib.request.urlopen(url)
Any ideas?
Try this. Change "folder" to a folder on your machine
import os
import requests
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
response = requests.get(url)
with open(os.path.join("folder", "file"), 'wb') as f:
    f.write(response.content)
You can adapt an example from the docs
import urllib.request
url='https://data.toulouse-metropole.fr/api/records/1.0/download/?dataset=dechets-menagers-et-assimiles-collectes'
with urllib.request.urlopen(url) as testfile, open('dataset.csv', 'w') as f:
    f.write(testfile.read().decode())

downloading a file, not the contents

I am trying to automate downloading a .Z file from a website, but the file I get is 2 KB when it should be around 700 KB, and it contains a list of the contents of the page (i.e., all the files available for download). I am able to download it manually without a problem. I have tried urllib and urllib2 and different configurations of each, but each does the same thing. I should add that the urlVar and fileName variables are generated in a different part of the code, but I have given an example of each here to demonstrate.
import urllib2
urlVar = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z"
fileName = "txga1000.14d.Z"
downFile = urllib2.urlopen(urlVar)
with open(fileName, "wb") as f:
    f.write(downFile.read())
At least the urllib2 documentation suggests you should use the Request object. This works for me:
import urllib2
req = urllib2.Request("ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z")
response = urllib2.urlopen(req)
data = response.read()
Data length seems to be 740725.
I was able to download what seems like the correct size for your file with the following python2 code:
import urllib2
filename = "txga1000.14d.Z"
url = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/{}".format(filename)
reply = urllib2.urlopen(url)
buf = reply.read()
with open(filename, "wb") as fh:
    fh.write(buf)
Edit: The post above mine was answered faster and is much better. I thought I'd post since I tested and wrote this out anyway.
