Download a file without a name using Python

I want to download a file. There is a hyperlink in an HTML page which does not include the file name or extension. How can I download the file using Python?
For example, the link is http://1.1.1.1:8080/tank-20/a/at_download/file,
but whenever I click on it the file downloads and opens in the browser.

Use Python requests to get the body of the response and write it to a file; this is essentially what the browser does when you click the link.
Try the below:
import requests
# define variables
request_url = "http://1.1.1.1:8080/tank-20/a/at_download/file"
output_file = "output.txt"
# send GET request
response = requests.get(request_url)
# response.content is bytes, so open the file in binary mode ('wb');
# 'with' closes the file automatically, so no explicit close() is needed
with open(output_file, 'wb') as fh:
    fh.write(response.content)
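If the server does not suggest a name, a reasonable fallback is to derive one from the last path segment of the URL. A minimal sketch (the helper name and the default value are my own, not from the answer above):

```python
import os
from urllib.parse import urlparse

def filename_from_url(url, default="download.bin"):
    """Use the last path segment of the URL as a fallback file name."""
    name = os.path.basename(urlparse(url).path)
    return name or default
```

For the asker's URL this yields "file", which is at least a stable name to write the bytes to.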

Related

How to download a huge gz file (around 3 GB size) from a URL where there is no file name present using python

I am trying to download a file from a URL using Python. However, it's not working: instead of the file I am getting index.html.
Please help with the same.
import requests
target_url = "https://transparency-in-coverage.uhc.com/?file=2022-07-01_United-HealthCare-Services_Third-Party-Administrator_EP1-50_C1_in-network-rates.json.gz&origin=uhc"
filename = "2022-07-01_United-HealthCare-Services_Third-Party-Administrator_EP1-50_C1_in-network-rates.json.gz"
with requests.get(target_url, stream=True) as r:
    r.raise_for_status()
    with open(filename, "wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            f.write(chunk)
That's because the URL you specified is for an HTML page that subsequently starts the download of the .gz file you want.
This is the link for the file:
https://mrfstorageprod.blob.core.windows.net/mrf-even/2022-07-01_ALL-SAVERS-INSURANCE-COMPANY_Insurer_PS1-50_C2_in-network-rates.json.gz?sv=2021-04-10&st=2022-07-05T22%3A19%3A13Z&se=2022-07-09T22%3A19%3A13Z&skoid=89efab61-5daa-4cf2-aa04-ce3ba9d1d1e8&sktid=db05faca-c82a-4b9d-b9c5-0f64b6755421&skt=2022-07-05T22%3A19%3A13Z&ske=2022-07-09T22%3A19%3A13Z&sks=b&skv=2021-04-10&sr=b&sp=r&sig=NaLrw2KG239S%2BpfZibvw7%2B25AAQsf9GYZ1gFK0KRN20%3D&rscd=attachment
To find it, you need to have the inspector open on the 'Network' tab whilst loading the page (or you can click on the file in the list when it loads the list of files on the page). When the download starts you'll see two files pop-up, one of which is the actual URL of the .gz file.
It does look like the URL has a timestamp in it, so it might not work at a later time.
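For a file around 3 GB, it is worth keeping the streaming pattern from the question so the body is written to disk in chunks rather than buffered in memory. A sketch wrapping it in a function (the function name and the 1 MiB chunk size are my choices, not from the question):

```python
import requests

def download_large(url, filename, chunk_size=1 << 20):
    """Stream the response body to disk in 1 MiB chunks so a ~3 GB
    file never has to fit in memory."""
    with requests.get(url, stream=True) as r:
        # fail loudly on 4xx/5xx instead of silently saving an error page
        r.raise_for_status()
        with open(filename, "wb") as f:
            for chunk in r.iter_content(chunk_size=chunk_size):
                f.write(chunk)
```

Pass the direct blob URL found in the Network tab, not the HTML landing page.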

How can I download files from a URL that does not have a file name in it, and how can I find out the file extension or file type?

Here is an example URL: "https://procurement-notices.undp.org/view_file.cfm?doc_id=257280".
If you put it in the browser, a file will start downloading on your system.
I want to download this file using Python and store it somewhere on my computer.
This is how I tried:
import requests
# first_url = 'https://readthedocs.org/projects/python-guide/downloads/pdf/latest/'
second_url="https://procurement-notices.undp.org/view_file.cfm?doc_id=257280"
myfile = requests.get(second_url , allow_redirects=True)
# this works for the first URL
# open('example.pdf' , 'wb').write(myfile.content)
# this didn't work for either of them
# open('example.txt' , 'wb').write(myfile.content)
# this works for the second URL
open('example.doc' , 'wb').write(myfile.content)
First: if I put first_url in the browser it downloads a PDF file, and second_url downloads a .doc file. How can I know what type of file the URL will give us, so that I use the correct file name in open(...)?
Second: if I use the second URL in the browser, a file with the name "T__proc_notices_notices_080_k_notice_doc_79545_770020123.docx" starts downloading. How can I know this file name when I try to download the file?
If you know any better solution, kindly let me know.
Kindly have a quick look at the question Downloading Files from URLs and zip downloaded files in python as well.
myfile.headers['content-type'] will give you the MIME type of the URL's content, and myfile.headers['content-disposition'] gives you info such as the filename (if the response contains this header at all).
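Extracting the filename from a Content-Disposition header value can be done with the standard library's email parsing machinery, which already understands the header's parameter syntax. A sketch (hedged: servers vary in how they quote and encode the parameter):

```python
from email.message import Message

def filename_from_disposition(header_value):
    """Parse a Content-Disposition header value and return its
    filename parameter, or None if there is none."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()
```

With requests, you would call this as filename_from_disposition(myfile.headers['content-disposition']).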
You can use the response's Content-Type header: for the first URL it is application/pdf and for the second URL it is application/msword, and you can save the file accordingly. You can build an extension dictionary that maps possible MIME types to file extensions and match against it. Your second question is essentially the same as this one, so I am taking your two URLs from that question, and for the file names I am just using integers:
import requests

all_Urls = ['https://omextemplates.content.office.net/support/templates/en-us/tf16402488.dotx',
            'https://procurement-notices.undp.org/view_file.cfm?doc_id=257280']
extension_dict = {'application/vnd.openxmlformats-officedocument.wordprocessingml.document': '.docx',
                  'application/vnd.openxmlformats-officedocument.wordprocessingml.template': '.dotx',
                  'application/vnd.ms-word.document.macroEnabled.12': '.docm',
                  'application/vnd.ms-word.template.macroEnabled.12': '.dotm',
                  'application/pdf': '.pdf',
                  'application/msword': '.doc'}
for i, url in enumerate(all_Urls):
    resp = requests.get(url)
    response_headers = resp.headers
    # the extensions in the dict already include the leading dot
    file_extension = extension_dict[response_headers['Content-Type']]
    with open(f"{i}{file_extension}", 'wb') as f:
        f.write(resp.content)
For MIME types, see this answer.
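Instead of maintaining that dictionary by hand, the standard library's mimetypes module can map many MIME types to an extension. A sketch (note that guess_extension returns None for unknown types, so the fallback default here is my own addition):

```python
import mimetypes

def extension_for(content_type, default=".bin"):
    """Map a Content-Type header value to a file extension,
    e.g. 'application/pdf' -> '.pdf'. Any '; charset=...' suffix
    is stripped before the lookup."""
    mime = content_type.split(";")[0].strip()
    return mimetypes.guess_extension(mime) or default
```

This covers common types out of the box; rarer Office MIME types may still need hand-added entries.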

Download file using requests python

I have a list of URLs of files that open as a download dialogue box with an option to save and open.
I'm using the Python requests module to download the files. While using the Python IDLE I'm able to download a file with the code below.
link = fileurl
r = requests.get(link, allow_redirects=True)
with open("a.torrent", 'wb') as code:
    code.write(r.content)
But when I use this code inside a for loop, the file that gets downloaded is corrupted or cannot be opened.
for link in links:
    name = str(links.index(link)) ++ ".torrent"
    r = requests.get(link, allow_redirects=True)
    with open(name, 'wb') as code:
        code.write(r.content)
If you are trying to download a video from a website, try:
r = requests.get(video_link, stream=True)
with open("a.mp4", 'wb') as code:
    for chunk in r.iter_content(chunk_size=1024):
        code.write(chunk)
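The likely culprit in the asker's loop is `++`, which is not a concatenation operator in Python: `a ++ b` parses as `a + (+b)`, and unary `+` on a string raises TypeError. A corrected sketch of the loop, using enumerate for the index (the function names are mine):

```python
import requests

def torrent_name(i):
    # plain '+' (or an f-string) concatenates strings; '++' does not exist
    return str(i) + ".torrent"

def download_all(links):
    for i, link in enumerate(links):
        r = requests.get(link, allow_redirects=True)
        r.raise_for_status()
        with open(torrent_name(i), "wb") as code:
            code.write(r.content)
```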

Downloading pdf with urllib.requests writes a pdf that cannot be opened

I am trying to download a pdf file from a website with authentication and save it locally. This code appears to run but saves a pdf file that cannot be opened ("it is either not a supported file type or because the file has been damaged").
import urllib.request
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm=None,
                          uri=r'http://website/',
                          user='admin',
                          passwd='pass')
opener = urllib.request.build_opener(auth_handler)
urllib.request.install_opener(opener)
url = 'http://www.website.com/example.pdf'
res = opener.open(url)
urllib.request.urlretrieve(url, "example.pdf")
Sounds like you have a bad URL. Make sure you get the ".pdf" file in your browser when you enter that URL.
EDIT:
I meant to say your URL should be like this: "http://www.cse.msu.edu/~chooseun/Test2.pdf". Your code must be able to pull this PDF from the web address. Hope this helps.
I think the problem is with "urllib.request.urlretrieve(url, "example.pdf")". After you get through the authentication, save the file using something like this instead:
with urllib.request.urlopen(url) as pdfFile:
    with open("example.pdf", 'wb') as file:
        file.write(pdfFile.read())
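The underlying issue is that urlretrieve fetches the URL again with its own default opener, so the authenticated response from `res = opener.open(url)` is never the one written to disk. A sketch that writes directly from the authenticated handle (the function name is mine):

```python
import shutil
import urllib.request

def save_authenticated(opener, url, path):
    """Write the body fetched through the given (authenticated) opener
    to `path`; avoids urlretrieve's separate, unauthenticated request."""
    with opener.open(url) as resp, open(path, "wb") as out:
        shutil.copyfileobj(resp, out)
```

Call it with the opener built from your HTTPBasicAuthHandler: save_authenticated(opener, url, "example.pdf").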

Creating a urlretrieve function using urllib2 in Python

I want a function that can save a page from the web to a designated path using urllib2.
The problem with urllib is that it doesn't check for Error 404, but unfortunately urllib2 doesn't have urlretrieve, although it can check for HTTP errors.
How can I make a function to save the file permanently to a path?
def save(url, path):
    g = urllib2.urlopen(url)
    # *do something to save g to 'path'*
Just use .read() to get the contents and write them to a file path.
def save(url, path):
    g = urllib2.urlopen(url)
    # open in binary mode so binary content (images, PDFs) is not corrupted
    with open(path, "wb") as fH:
        fH.write(g.read())
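Since the question's other concern was detecting a 404, here is a sketch in modern Python 3, where urllib2 was split into urllib.request and urllib.error (the return-value convention is my own):

```python
import urllib.error
import urllib.request

def save(url, path):
    """Save the resource at `url` to `path`; return False instead of
    crashing on HTTP errors such as 404."""
    try:
        with urllib.request.urlopen(url) as g, open(path, "wb") as fh:
            fh.write(g.read())
        return True
    except urllib.error.HTTPError:
        return False
```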
