How to download file using python when url doesn't change [duplicate]

How to download file using python when url doesn't change [duplicate] - python

This question already has answers here:
Download Returned Zip file from URL
(9 answers)
Closed last year.
I want to download a file from a webpage. That webpage only has one .zip file (that's what I want to download), but when I click on the .zip file, it starts download but the URL doesn't change (the URL still remains of the form http://ldn2800:8080/id=2800). How can I download this using python, considering that there is no URL of the form http://example.com/1.zip?
Also, when I directly go to the page http://ldn2800:8080/id=2800, it just opens that page with the .zip file but doesn't download it without clicking. How do download it using python?
UPDATE: Right now I'm doing it this way:
if (str(dict.get('id')) == winID):
#or str(dict.get('id')) == linuxID):
#if str(dict.get('number')) == buildNo:
buildTypeId = dict.get('id')
ID = dict.get('id')
downloadURL = "http://example:8080/viewType.html?buildId=26009&tab=artifacts&buildTypeId=" + ID
directory = BindingsDest + "\\" + buildNo
if not os.path.exists(directory):
os.makedirs(directory)
fileName = None
if buildTypeId == linuxID:
fileName = linuxLib + "-" + buildNo + ".zip"
elif buildTypeId == winID:
fileName = winLib + "-" + buildNo + ".zip"
if fileName is not None:
print(dict)
downloadFile(downloadURL, directory, fileName)
def downloadFile(downloadURL, directory, fileName, user=user, password=password):
if user is not None and password is not None:
request = requests.get(downloadURL, stream=True, auth=(user, password))
else:
request = requests.get(downloadURL, stream=True)
with open(directory + "\\" + fileName, 'wb') as handle:
for block in request.iter_content(1024):
if not block:
break
handle.write(block)
But, it just creates a zip in the required location but that zip can't be opened and has nothing.
Can something like this be done: like searching for the filename on the webpage and then download that pattern matched?

Check the HTTP status code to make sure that no error happened. You may use the builtin method raise_for_status to do so: https://requests.readthedocs.io/en/master/api/#requests.Response.raise_for_status
def downloadFile(downloadURL, directory, fileName, user=user, password=password):
if user is not None and password is not None:
request = requests.get(downloadURL, stream=True, auth=(user, password))
else:
request = requests.get(downloadURL, stream=True)
request.raise_for_status()
with open(directory + "\\" + fileName, 'wb') as handle:
for block in request.iter_content(1024):
if not block:
break
handle.write(block)
Are you sure that there is no networking issue such as proxy/fw/etc ?
EDIT: according to your above comment, I'm not sure that this answers your actual problem. Revised answer:
You access a web page containing a link to a zip file. This link, you say, is the same as the page itself. But if you click on it in a browser, it downloads the file instead of reaching the HTML page again. That's strange but can be explained in various ways. Please copy/paste the whole HTML page code (including the link to the zip file), that will probably help us understanding the issue.

Related

Download a single file from a shared Dropbox's folder without having the share link of this file

As the title says, I have access to a shared folder where some files are uploaded. I just want to donwload an specific file, called "db.dta". So, I have this script:
def download_file(url, filename):
url = url
file_name = filename
with open(file_name, "wb") as f:
print("Downloading %s" % file_name)
response = requests.get(url, stream=True)
total_length = response.headers.get('content-length')
if total_length is None: # no content length header
f.write(response.content)
else:
dl = 0
total_length = int(total_length)
for data in response.iter_content(chunk_size=4096):
dl += len(data)
f.write(data)
done = int(50 * dl / total_length)
sys.stdout.write("\r[%s%s]" % ('=' * done, ' ' * (50-done)) )
sys.stdout.flush()
print(" ")
print('Descarga existosa.')
It actually download shares links of files if I modify the dl=0 to 1, like this:
https://www.dropbox.com/s/ajklhfalsdfl/db_test.dta?dl=1
The thing is, I dont have the share link of this particular file in this shared folder, so if I use the url of the file preview, I get an error of denied access (even if I change dl=0 to 1).
https://www.dropbox.com/sh/a630ksuyrtw33yo/LKExc-MKDKIIWJMLKFJ?dl=1&preview=db.dta
Error given:
dropbox.exceptions.ApiError: ApiError('22eaf5ee05614d2d9726b948f59a9ec7', GetSharedLinkFileError('shared_link_access_denied', None))
Is there a way to download this file?

If you have the shared link to the parent folder and not the specific file you want, you can use the /2/sharing/get_shared_link_file endpoint to download just the specific file.
In the Dropbox API v2 Python SDK, that's the sharing_get_shared_link_file method (or sharing_get_shared_link_file_to_file). Based on the error output you shared, it looks like you are already using that (though not in the particular code snippet you posted).
Using that would look like this:
import dropbox
dbx = dropbox.Dropbox(ACCESS_TOKEN)
folder_shared_link = "https://www.dropbox.com/sh/a630ksuyrtw33yo/LKExc-MKDKIIWJMLKFJ"
file_relative_path = "/db.dat"
res = dbx.sharing_get_shared_link_file(url=folder_shared_link, path=file_relative_path)
print("Metadata: %s" % res[0])
print("File data: %s bytes" % len(res[1].content))
(You mentioned both "db.dat" and "db.dta" in your question. Make sure you use whichever is actually correct.)
Additionally, note if you using a Dropbox API app registered with the "app folder" access type: there's currently a bug that can cause this shared_link_access_denied error when using this method with an access token for an app folder app.

Can you use requests.get() without HTTPS/HTTP

I have a code that takes a random flag from the Flag Mashup Bot and downloads it:
import requests
DIR = 'C:/Users/myUser/Desktop/Flags/'
URL = 'https://flagsmashupbot.pythonanywhere.com/mashup?passwd=fl4gsm4shupb0t'
def download_image(img_url: str, dest_dir: str):
img_data = requests.get(img_url).content
with open(dest_dir, 'wb') as file:
file.write(img_data)
if __name__ == "__main__":
response = requests.get(URL)
if response.ok:
page = response.text
image_url = page[page.find('data:image', page.find('data:image') + 1):page.find('" download=')]
name = page[page.find('" download=') + 12:page.find('_FlagsMashupBot.png"')]
DIR += (name + '.png')
print(DIR)
download_image(image_url, DIR)
When I run it, I get the following error on line 8:
requests.exceptions.InvalidSchema: No connection adapters were found for [image URL]
When I read about it, I realized that it's because the image URLs from the site don't start with "https://" (or at least that's what I understood).
So, is there a way to use requests.get() without having https at the start of the URL?

The reason you would not get an HTTP/HTTPs based URL is since the data is in href format pointing to the base64 encoded version of the image.
You may use urllib to open up the href download link and save the contents to a file:
data = 'data:image/png;charset=utf-8;base64,iVBORw0KGgoAAAANSUhEUgAABwgAAASwCAIAAABggIlUAAAABmJLR0QA/wD/AP+gvaeTAAAgAElEQVR4nOzdaZhd92Hf97Pde+fOYGYAzAx2gCBBivsCgqtEUl7piF.......'
response = urllib.request.urlopen(data)
with open('image.png', 'wb') as f:
f.write(response.file.read())

Download zip file with Django

I'm quite new on Django and i'm looking for a way to dwonload a zip file from my django site but i have some issue when i'm running this piece of code:
def download(self):
dirName = settings.DEBUG_FOLDER
name = 'test.zip'
with ZipFile(name, 'w') as zipObj:
# Iterate over all the files in directory
for folderName, subfolders, filenames in os.walk(dirName):
for filename in filenames:
# create complete filepath of file in directory
filePath = os.path.join(folderName, filename)
# Add file to zip
zipObj.write(filePath, basename(filePath))
path_to_file = 'http://' + sys.argv[-1] + '/' + name
resp= {}
# Grab ZIP file from in-memory, make response with correct MIME-type
resp = HttpResponse(content_type='application/zip')
# ..and correct content-disposition
resp['Content-Disposition'] = 'attachment; filename=%s' % smart_str(name)
resp['X-Sendfile'] = smart_str(path_to_file)
return resp
I get:
Exception Value:
<HttpResponse status_code=200, "application/zip"> is not JSON serializable
I tried to change the content_type to octet-stream but it doesn't work
And to use a wrapper as followw:
wrapper = FileWrapper(open('test.zip', 'rb'))
content_type = 'application/zip'
content_disposition = 'attachment; filename=name'
# Grab ZIP file from in-memory, make response with correct MIME-type
resp = HttpResponse(wrapper, content_type=content_type)
# ..and correct content-disposition
resp['Content-Disposition'] = content_disposition
I didn't find useful answer so far but maybe I didn't search well, so if it seems my problem had been already traited, feel free to notify me
Thank you very much for any help

You have to send the zip file as byte
response = HttpResponse(zipObj.read(), content_type="application/zip")
response['Content-Disposition'] = 'attachment; filename=%s' % smart_str(name)
return response

I would do like this:
(Caveat I use wsl so the python function will make use of cmd lines)
In view:
import os
def zipdownfun(request):
""" Please establish in settings.py where media file should be downloaded from.
In my case is media with a series of other folders inside. Media folder is at the same level of project root folder, where settings.py is"""
file_name = os.path.join(MEDIA_URL,'folder_where_your_file_is','file_name.zip')
"""let us put the case that you have zip folder in media folder"""
file_folder_path = os.path.join(MEDIA_URL,'saving_folder')
"""The command line takes as first variable the name of the
future zip file and as second variable the destination folder"""
cmd = f'zip {file_name} {file_folder_path}'
"""With os I open a process in the background so that some magic
happens"""
os.system(cmd)
"""I don't know what you want to do with this, but I placed the
URL of the file in a button for the download, so you will need
the string of the URL to place in href of an <a> element"""
return render(request,'your_html_file.html', {'url':file_name})
The db I have created, will be updated very often. I used a slightly different version of this function with -r clause since I had to zip, each time, a folder. Why I did this? The database I have created has to allow the download of this zipped folder. This folder will be updated daily. So this function basically overwrites the file each time that is downloaded. It will be so fresh of new data each time.
Please refer to this page to understand how to create a button for the download of the generated file.
Take as reference approach 2. The URL variable that you are passing to the Django template should be used at the place of the file (screenshot attached)
I hope it can help!

App Engine - download files from Cloud Storage

I am using Python 2.7 and Reportlab to create .pdf files for display/print in my app engine system. I am using ndb.Model to store the data if that matters.
I am able to produce the equivalent of a bank statement for a single client on-line. That is; the user clicks the on-screen 'pdf' button and the .pdf statement appears on screen in a new tab, exactly as it should.
I am using the following code to save .pdf files to Google Cloud Storage successfully
buffer = StringIO.StringIO()
self.p = canvas.Canvas(buffer, pagesize=portrait(A4))
self.p.setLineWidth(0.5)
try:
# create .pdf of .csv data here
finally:
self.p.save()
pdfout = buffer.getvalue()
buffer.close()
filename = getgcsbucket() + '/InvestorStatement.pdf'
write_retry_params = gcs.RetryParams(backoff_factor=1.1)
try:
gcs_file = gcs.open(filename,
'w',
content_type='application/pdf',
retry_params=write_retry_params)
gcs_file.write(pdfout)
except:
logging.error(traceback.format_exc())
finally:
gcs_file.close()
I am using the following code to create a list of all files for display on-screen, it shows all the files stored above.
allfiles = []
bucket_name = getgcsbucket()
rfiles = gcs.listbucket(bucket_name)
for rfile in rfiles:
allfiles.append(rfile.filename)
return allfiles
My screen (html) shows rows of ([Delete] and Filename). When the user clicks the [Delete] button, the following delete code snippet works (filename is /bucket/filename, complete)
filename = self.request.get('filename')
try:
gcs.delete(filename)
except gcs.NotFoundError:
pass
My question - given I have a list of files on-screen, I want the user to click on the filename and for that file to be downloaded to the user's computer. In Google's Chrome Browser, this would result in the file being downloaded, with it's name displayed on the bottom left of the screen.
One other point, the above example is for .pdf files. I will also have to show .csv files in the list and would like them to be downloaded as well. I only want the files to be downloaded, no display is required.
So, I would like a snippet like ...
filename = self.request.get('filename')
try:
gcs.downloadtousercomputer(filename) ???
except gcs.NotFoundError:
pass
I think I have tried everything I can find both here and elsewhere. Sorry I have been so long-winded. Any hints for me?

To download a file instead of showing it in the browser, you need to add a header to your response:
self.response.headers["Content-Disposition"] = 'attachment; filename="%s"' % filename
You can specify the filename as shown above and it works for any file type.

One solution you can try is to read the file from the bucket and print the content as the response with the correct header:
import cloudstorage
...
def read_file(self, filename):
bucket_name = "/your_bucket_name"
file = bucket_name + '/' + filename
with cloudstorage.open(file) as cloudstorage_file:
self.response.headers["Content-Disposition"] = str('attachment;filename=' + filename)
contents = cloudstorage_file.read()
cloudstorage_file.close()
self.response.write(contents)
Here filename could be something you are sending as GET parameter and needs to be a file that exist on your bucket or you will raise an exception.
[1] Here you will find a sample.
[1]https://cloud.google.com/appengine/docs/standard/python/googlecloudstorageclient/read-write-to-cloud-storage

urllib2 File Download Fails...due to System Security?

I'm trying to download files (approximately 1 - 1.5MB/file) from a NASA server (URL), but to no avail! I've tried a few things with urllib2 and run into two results:
I create a new file on my machine that is only ~200KB and has nothing in it
I create a 1.5MB file on my machine that has nothing in it!
By "nothing in it" I mean when I open the file (these are hdf5 files, so I open them in hdfView) I see no hierarchical structure...literally looks like an empty h5 file. But, when I open the file in a text editor I can see there is SOMETHING there (it's binary, so in text it looks like...well, binary).
I think I am using urllib2 appropriately, though I have never successfully used urllib2 before. Would you please comment on whether what I am doing is right or not, and suggest something better?
from urllib2 import Request, urlopen, URLError, HTTPError
base_url = 'http://avdc.gsfc.nasa.gov/index.php?site=1480884223&id=40&go=list&path=%2FH2O%2F/2010'
file_name = 'download_2.php?site=1480884223&id=40&go=download&path=%2FH2O%2F2010&file=MLS-Aura_L2GP-H2O_v03-31-c01_2010d360.he5'
url = base_url + file_name
req = Request(url)
# Open the url
try:
f = urlopen(req)
print "downloading " + url
# Open our local file for writing
local_file = open('test.he5', "w" + file_mode)
#Write to our local file
local_file.write(f.read())
local_file.close()
except HTTPError, e:
print "HTTP Error:",e.code , url
except URLError, e:
print "URL Error:",e.reason , url
I got this script (which seems to be the closest to working) from here.
I am unsure what the file_name should be. I looked at the page source information of the archive and pulled the file name listed there (not the same as what shows up on the web page), and doing this yields the 1.5MB file that shows nothing in hdfview.

You are creating an invalid url:
base_url = 'http://avdc.gsfc.nasa.gov/index.php?site=1480884223&id=40&go=list&path=%2FH2O%2F/2010'
file_name = 'download_2.php?site=1480884223&id=40&go=download&path=%2FH2O%2F2010&file=MLS-Aura_L2GP-H2O_v03-31-c01_2010d360.he5'
url = base_url + file_name
You probably meant:
base_url = 'http://avdc.gsfc.nasa.gov/'
file_name = 'download_2.php?site=1480884223&id=40&go=download&path=%2FH2O%2F2010&file=MLS-Aura_L2GP-H2O_v03-31-c01_2010d360.he5'
When downloading a large file, it's better to use a buffered copy from filehandle to filehandle:
import shutil
# ...
f = urlopen(req)
with open('test.he5', "w" + file_mode) as local_file:
shutil.copyfileobj(f, local_file)
.copyfileobj will efficiently load from the open urllib connection and write to the open local_file file handle. Note the with statement, when the code block underneath concludes it'll automatically close the file for you.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

How to download file using python when url doesn't change [duplicate] - python

Related

Download a single file from a shared Dropbox's folder without having the share link of this file

Can you use requests.get() without HTTPS/HTTP

Download zip file with Django

App Engine - download files from Cloud Storage

urllib2 File Download Fails...due to System Security?

Categories

Resources