Downloading zip files with python

I'm using this simple code to download zip files:
import requests
def download_url(url, save_path, chunk_size=128):
    r = requests.get(url, stream=True)
    with open(save_path, 'wb') as fd:
        for chunk in r.iter_content(chunk_size=chunk_size):
            fd.write(chunk)
url = 'https://www1.nseindia.com/content/historical/EQUITIES/1994/NOV/cm03NOV1994bhav.csv.zip'
save_path = 'D:/folder/Programming/Python/trading/Bhavcopy/bhavcopy.csv.zip'
download_url(url, save_path)
The end result is an invalid zip file. Pasting the URL directly into the browser did not give me the file either.
But when I open the link via the original website, i.e. going to the NSE website and clicking the download button, the link works.
Additional data
Here is the page from which you can try downloading the file yourself: https://www1.nseindia.com/products/content/equities/equities/archieve_eq.htm
I'm downloading the file from the first option (Bhavcopy) for the first date for which it is available (3rd Nov 1994).

You need to send a Referer header:
headers = {'Referer': 'https://www1.nseindia.com'}
...
r = requests.get(url, stream=True, headers=headers)
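Putting the header together with the question's function, a minimal sketch could look like the one below. The name download_zip, the timeout value, and the zipfile.is_zipfile check at the end are assumptions added for illustration. Whether the site also wants other headers (a browser-like User-Agent, for example) is not something this thread confirms.

```python
import zipfile

import requests


def download_zip(url, save_path, referer, chunk_size=8192, timeout=30):
    """Stream `url` to `save_path`, sending a Referer header.

    Returns True if the saved file is a readable zip archive.
    """
    headers = {"Referer": referer}
    with requests.get(url, stream=True, headers=headers, timeout=timeout) as r:
        # Fail loudly on 4xx/5xx instead of saving an error page as a .zip.
        r.raise_for_status()
        with open(save_path, "wb") as fd:
            for chunk in r.iter_content(chunk_size=chunk_size):
                fd.write(chunk)
    # An HTML error page saved under a .zip name fails this check.
    return zipfile.is_zipfile(save_path)
```

It would be called as download_zip(url, save_path, referer='https://www1.nseindia.com'); a False return value means that whatever was saved is not a zip archive.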

Related

How to download a huge gz file (around 3 GB size) from a URL where there is no file name present using python

I am trying to download a file from a URL using Python. However, it's not working; instead I am getting index.html.
Please help with this.
import requests
target_url = "https://transparency-in-coverage.uhc.com/?file=2022-07-01_United-HealthCare-Services_Third-Party-Administrator_EP1-50_C1_in-network-rates.json.gz&origin=uhc"
filename = "2022-07-01_United-HealthCare-Services_Third-Party-Administrator_EP1-50_C1_in-network-rates.json.gz"
with requests.get(target_url, stream=True) as r:
    r.raise_for_status()
    with open(filename, "wb") as f:
        for chunk in r.iter_content(chunk_size=1024):
            f.write(chunk)

That's because the URL you specified is for an HTML page that subsequently starts the download of the .gz file you want.
This is the link for the file:
https://mrfstorageprod.blob.core.windows.net/mrf-even/2022-07-01_ALL-SAVERS-INSURANCE-COMPANY_Insurer_PS1-50_C2_in-network-rates.json.gz?sv=2021-04-10&st=2022-07-05T22%3A19%3A13Z&se=2022-07-09T22%3A19%3A13Z&skoid=89efab61-5daa-4cf2-aa04-ce3ba9d1d1e8&sktid=db05faca-c82a-4b9d-b9c5-0f64b6755421&skt=2022-07-05T22%3A19%3A13Z&ske=2022-07-09T22%3A19%3A13Z&sks=b&skv=2021-04-10&sr=b&sp=r&sig=NaLrw2KG239S%2BpfZibvw7%2B25AAQsf9GYZ1gFK0KRN20%3D&rscd=attachment
To find it, you need to have the browser inspector open on the 'Network' tab whilst loading the page (or you can click on the file in the list once the page has loaded the list of files). When the download starts you'll see two requests pop up, one of which is the actual URL of the .gz file.
It does look like the URL has timestamps in it (the st and se parameters), so it might not work at a later time; I don't know.
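One way to notice this situation early is to look at the first bytes of the response before saving it: a gzip stream always starts with the two bytes 0x1f 0x8b, while a landing page starts with HTML markup. The helper below is only a sketch; the function name and the labels it returns are made up for illustration.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def classify_payload(first_bytes, content_type=""):
    """Return 'gzip', 'html', or 'unknown' for the start of a response body."""
    if first_bytes.startswith(GZIP_MAGIC):
        return "gzip"
    head = first_bytes.lstrip().lower()
    # Keep the media type only, dropping parameters such as "; charset=utf-8".
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "text/html" or head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

With requests, the first chunk from r.iter_content(...) can be passed as first_bytes and r.headers.get('Content-Type', '') as content_type; an 'html' result suggests the URL points at a page rather than at the file.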

Download file using requests python

I have a list of file URLs that each open a download dialog box with options to save or open the file.
I'm using the Python requests module to download the files. In Python IDLE I'm able to download a file with the code below.
link = fileurl
r = requests.get(link, allow_redirects=True)
with open("a.torrent", 'wb') as code:
    code.write(r.content)
But when I use this code inside a for loop, the downloaded file is corrupted or cannot be opened.
for link in links:
    name = str(links.index(link)) + ".torrent"
    r = requests.get(link, allow_redirects=True)
    with open(name, 'wb') as code:
        code.write(r.content)

If you are trying to download a video from a website, try streaming the response and writing it in chunks:
import requests
r = requests.get(video_link, stream=True)
with open("a.mp4", 'wb') as code:
    for chunk in r.iter_content(chunk_size=1024):
        code.write(chunk)
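For the loop in the question, chunked writing can be combined with enumerate and a status check, roughly as sketched here. The function name download_all and its parameters are hypothetical, and this does not claim to explain why the original files came out corrupted; it only makes a failed request visible instead of silently saving an error page.

```python
import os

import requests


def download_all(links, dest_dir=".", ext=".torrent", chunk_size=8192):
    """Save each URL in `links` as 0.torrent, 1.torrent, ... in `dest_dir`."""
    saved = []
    # enumerate() yields the position directly; links.index(link) returns the
    # first match, so duplicate URLs would end up sharing one file name.
    for i, link in enumerate(links):
        path = os.path.join(dest_dir, f"{i}{ext}")
        with requests.get(link, stream=True, allow_redirects=True, timeout=30) as r:
            r.raise_for_status()  # do not write an error page to disk
            with open(path, "wb") as out:
                for chunk in r.iter_content(chunk_size=chunk_size):
                    out.write(chunk)
        saved.append(path)
    return saved
```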

Download a file without name using Python

I want to download a file. There is a hyperlink in an HTML page that does not include the file name and extension. How can I download the file using Python?
For example, the link is http://1.1.1.1:8080/tank-20/a/at_download/file,
but whenever I click on it, the file downloads and opens in the browser.

Use Python requests to get the body of the response and write it to a file; this is essentially what the browser does when you click the link.
Try the below:
import requests
# define variables
request_url = "http://1.1.1.1:8080/tank-20/a/at_download/file"
output_file = "output.txt"
# send get request
response = requests.get(request_url)
# response.content is bytes, so open the file in binary mode;
# 'with' closes the file automatically
with open(output_file, 'wb') as fh:
    fh.write(response.content)
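If the server sends a Content-Disposition header with the download, the real file name can often be read from it instead of hard-coding one. Whether the server in the question sends that header is an assumption, so the sketch below falls back to a default name; the function name and the default are made up for illustration.

```python
import os
from email.message import Message


def filename_from_content_disposition(header_value, default="download.bin"):
    """Extract the filename parameter from a Content-Disposition header value."""
    if not header_value:
        return default
    # The stdlib email parser understands the same "name; key=value" syntax.
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()
    if not name:
        return default
    # Keep only the last path component so a hostile header such as
    # filename="../../x" cannot write outside the target directory.
    return os.path.basename(name.replace("\\", "/")) or default
```

With requests this would be called as filename_from_content_disposition(response.headers.get('Content-Disposition')).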

How to download files from website using PHP with Python

I have a Python script that crawls various websites and downloads files from them. My problem is that some of the websites seem to be using PHP; at least that's my theory, since the URLs look like this: https://www.portablefreeware.com/download.php?dd=1159
The problem is that I can't get any file name or extension from a link like this and therefore can't save the file. Currently I'm only saving the URLs.
Is there any way to get to the actual file name behind the link?
This is my stripped-down download code:
r = requests.get(url, allow_redirects=True)
file = open("name.something", 'wb')
file.write(r.content)
file.close()
Disclaimer: I've never done any work with PHP, so please forgive any incorrect terminology or misunderstanding on my part. I'm happy to learn more, though.

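import mimetypes
import requests
response = requests.get('https://www.portablefreeware.com/download.php?dd=1159')
content = response.content
# Keep only the media type, dropping parameters such as "; charset=...".
content_type = response.headers['Content-Type'].split(';')[0].strip()
# guess_extension() returns the extension with its leading dot, or None.
ext = mimetypes.guess_extension(content_type) or ''
print(content)       # the raw bytes of the download
print(ext)           # e.g. .zip
print(content_type)  # e.g. application/zip or application/octet-stream
with open("newFile" + ext, 'wb') as f:
    f.write(content)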

With allow_redirects=True, requests.get automatically follows the URL in the Location header of the response and makes another request, so r.headers holds the headers of the final response rather than those of the redirect (the intermediate responses are still kept in r.history). That is why you can't find the file name information in the headers you are looking at.
You can instead use allow_redirects=False so that you can read the Location header, which contains the actual download URL:
import requests
url = 'https://www.portablefreeware.com/download.php?dd=1159'
r = requests.get(url, allow_redirects=False)
print(r.headers['Location'])
This outputs:
https://www.diskinternals.com/download/Linux_Reader.exe
Demo: https://replit.com/#blhsing/TrivialLightheartedLists
You can then make another request to the download URL, and use os.path.basename to obtain the name of the file to which the content will be written:
import os
url = r.headers['Location']
r = requests.get(url)
# r.content is bytes, so the file must be opened in binary mode
with open(os.path.basename(url), 'wb') as file:
    file.write(r.content)
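Taking os.path.basename of the whole URL works for a plain link like the one above, but it goes wrong once the URL carries a query string or percent-encoded characters. A slightly more defensive sketch, using only the standard library, parses the URL first; the function name and the fallback name are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.bin"):
    """Return the last path component of `url`, ignoring query and fragment."""
    path = urlsplit(url).path
    # Decode escapes such as %20, then keep only the part after the last "/".
    name = posixpath.basename(unquote(path))
    return name or default
```

For the redirect target in this answer it returns Linux_Reader.exe; for the original download.php link it returns download.php, which is why the redirect has to be resolved first.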

You're using requests for downloading. This doesn't work with downloads of this kind.
Try urllib instead:
import urllib.request
urllib.request.urlretrieve(url, filepath)

You can download the file using the file name taken from the response header.
Here's my code for a download with a progress bar and a chunked write buffer.
To display a progress bar, use tqdm (pip install tqdm).
Writing in chunks saves memory during the download.
import os
import requests
import tqdm
url = "https://www.portablefreeware.com/download.php?dd=1159"
# requests.head does not follow redirects by default, so Location is available
response_header = requests.head(url)
file_path = response_header.headers["Location"]
file_name = os.path.basename(file_path)
with open(file_name, "wb") as file:
    response = requests.get(url, stream=True)
    total_length = int(response.headers.get("content-length"))
    for chunk in tqdm.tqdm(response.iter_content(chunk_size=1024), total=total_length / 1024, unit="KB"):
        if chunk:
            file.write(chunk)
            file.flush()
Progress output:
6%|▌ | 2848/46100.1640625 [00:04<01:11, 606.90KB/s]

Redirects can be bounced through a DNS-distributed network to anywhere. So the example answers above show https://www, but in my case the request resolves to Europe, so my fastest local source comes in as
https://eu.diskinternals.com/download/Linux_Reader.exe
By far the simplest approach is to try raw curl first, without bothering to resolve anything; if the result is good, there is no need to inspect or scrape:
curl -o 1159.tmp https://www.portablefreeware.com/download.php?dd=1159
However, I know that in this case that does not give the expected result, so the next level is
curl -I https://www.portablefreeware.com/download.php?dd=1159 |find "Location"
and that gives the result shown by others:
https://www.diskinternals.com/download/Linux_Reader.exe
But that's not the full picture, since if we feed that back in
curl.exe -K location.txt
we get
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved here.</p>
</body></html>
hence the nested redirect to
https://eu.diskinternals.com/download/Linux_Reader.exe
All of that can be scripted on the command line to run in loops in a line or two, but I don't use Python, so you will need to write perhaps a dozen lines to do something similar.
C:\Users\WDAGUtilityAccount\Desktop>curl -O https://eu.diskinternals.com/download/Linux_Reader.exe
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 44.9M  100 44.9M    0     0  3057k      0  0:00:15  0:00:15 --:--:-- 3640k
C:\Users\WDAGUtilityAccount\Desktop>dir /b lin*.*
Linux_Reader.exe
And from the help file, yesterday's extra update (Sunday, September 4, 2022):
curl -O https://eu.diskinternals.com/download/Uneraser_Setup.exe
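For readers who do want the Python side, the nested redirects can be listed in a few lines with requests, since it records every intermediate response in the history attribute. This is only a sketch: the function name is made up, and some servers answer HEAD requests differently from GET, so the chain may need to be checked with a GET as well.

```python
import requests


def redirect_chain(url, timeout=30):
    """Return every URL visited while following redirects, final URL last."""
    # HEAD avoids downloading the body; requests.head() does not follow
    # redirects unless allow_redirects=True is passed explicitly.
    r = requests.head(url, allow_redirects=True, timeout=timeout)
    return [hop.url for hop in r.history] + [r.url]
```

The last element of the returned list is the URL to hand to the actual download call.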

Save streaming audio from URL as MP3, or even just audio file from URL as MP3

I am trying to have my server, in Python 3, go grab files from URLs. Specifically, I would like to pass a URL into a function and have the function grab an audio file (of many varying formats) and save it as an MP3, probably using ffmpeg or ffmpy. If the URL also has a PDF, I would like to save that too, as a PDF. I haven't done much research on the PDF yet, but I have been working on the audio piece and wasn't sure if this was even possible.
I have looked at several questions here, most notably:
How do I download a file over HTTP using Python?
It's a little old, but I tried several methods from there and always hit some sort of issue. I have tried using the requests library, urllib, streamripper, and maybe one other.
Is there a way to do this, and is there a recommended library?
For example, most of the ones I have tried do save something, like the HTML page, or an empty file called 'file.mp3' in this case.
Streamripper returned a "try changing user agents" error.
I am not sure if this is possible, but I am sure there is something I'm not understanding here. Could someone point me in the right direction?
This isn't necessarily the code I'm trying to use, just an example of something I have tried that doesn't work.
import requests
url = "http://someurl.com/webcast/something"
r = requests.get(url)
with open('file.mp3', 'wb') as f:
    f.write(r.content)
# Retrieve HTTP meta-data
print(r.status_code)
print(r.headers['content-type'])
print(r.encoding)
Edit:
import requests
import ffmpy
import datetime
import os
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE AUDIO/MPEG, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.MP3
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE application/pdf, THE FILE WILL
## BE SAVED AS THE CURRENT-DATE-AND-TIME.PDF
##
## THIS SCRIPT CAN BE PASSED A URL AND IF THE URL RETURNS
## HTTP HEADER FOR CONTENT TYPE other than application/pdf, OR
## audio/mpeg, THE FILE WILL NOT BE SAVED
def BordersPythonDownloader(url):
    print('Beginning file download requests')
    r = requests.get(url, stream=True)
    contype = r.headers['content-type']
    if contype == "audio/mpeg":
        print("audio file")
        filename = '[{}].mp3'.format(str(datetime.datetime.now()))
        with open('file.mp3', 'wb+') as f:
            f.write(r.content)
        ff = ffmpy.FFmpeg(
            inputs={'file.mp3': None},
            outputs={filename: None}
        )
        ff.run()
        if os.path.exists('file.mp3'):
            os.remove('file.mp3')
    elif contype == "application/pdf":
        print("pdf file")
        filename = '[{}].pdf'.format(str(datetime.datetime.now()))
        with open(filename, 'wb+') as f:
            f.write(r.content)
    else:
        print("URL DID NOT RETURN AN AUDIO OR PDF FILE, IT RETURNED {}".format(contype))
# INSERT YOUR URL FOR TESTING
# OR CALL THIS SCRIPT FROM ELSEWHERE, PASSING IT THE URL
#DEFINE YOUR URL
#url = 'http://archive.org/download/testmp3testfile/mpthreetest.mp3'
#CALL THE SCRIPT; PASSING IT YOUR URL
#x = BordersPythonDownloader(url)
#ANOTHER EXAMPLE WITH A PDF
#url = 'https://www.cisco.com/c/en/us/td/docs/switches/lan/catalyst6500/ios/12-2SY/configuration/guide/sy_swcg/etherchannel.pdf'
#x = BordersPythonDownloader(url)
Thanks Richard, this code works and helps me understand this better. Any suggestions for improving the above working example?
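Two small improvements that could be suggested for the script above: an exact comparison such as contype == "audio/mpeg" fails when the server appends parameters to the header (for example "audio/mpeg; charset=UTF-8"), and str(datetime.datetime.now()) contains colons and spaces, which Windows does not accept in file names. The helpers below sketch one way around both; their names and the timestamp format are assumptions, not part of the original script.

```python
import datetime

# Media types this sketch knows how to save, mapped to a file extension.
EXTENSIONS = {"audio/mpeg": ".mp3", "application/pdf": ".pdf"}


def extension_for(content_type):
    """Return the extension for a Content-Type header value, or None."""
    # "audio/mpeg; charset=UTF-8" should still match "audio/mpeg".
    media_type = (content_type or "").split(";")[0].strip().lower()
    return EXTENSIONS.get(media_type)


def timestamped_name(ext, now=None):
    """Build a name such as 20220904-153000.mp3 (no colons or spaces)."""
    now = now or datetime.datetime.now()
    return now.strftime("%Y%m%d-%H%M%S") + ext
```

Inside the downloader, extension_for(r.headers.get('content-type')) would replace the if/elif chain, with a None result meaning the file is not saved.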
