Download File or Video from URL (Python 3)

I tried different libraries to download a video from a URL, but none of them worked.
Here is the link I am trying: https://td-cdn.pw/api.php?download=tikdown.org-42500282235.mp4
When it is opened, it directly asks to download the file rather than playing it as an HTML video. I want to save this video to a local folder.
If you could help me I would be very grateful :) (by the way, I have been trying to solve this for the last 4 hours)

There are two steps to downloading a file in Python, and the process is OS independent. I would recommend using the requests library: we use it to make a request to the server and fetch the content, then we write the data into a file in the next step.
import requests

URL = "https://td-cdn.pw/api.php?download=tikdown.org-42500282235.mp4"
FILE_TO_SAVE_AS = "myvideo.mp4"  # the name you want to save the file as

resp = requests.get(URL)  # making the request to the server
with open(FILE_TO_SAVE_AS, "wb") as f:  # opening a file handle to create a new file
    f.write(resp.content)  # writing the content to the file
But this is just a simple example. You can add features like try/except blocks to catch exceptions, or pass custom headers while making the request.
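For instance, a slightly more defensive version could look like the sketch below (the User-Agent value, timeout, and chunk size are arbitrary choices, not requirements of the API):

import requests

URL = "https://td-cdn.pw/api.php?download=tikdown.org-42500282235.mp4"
FILE_TO_SAVE_AS = "myvideo.mp4"

headers = {"User-Agent": "Mozilla/5.0"}  # some servers reject requests without a browser-like User-Agent

try:
    # stream=True avoids loading the whole video into memory at once
    with requests.get(URL, headers=headers, stream=True, timeout=30) as resp:
        resp.raise_for_status()  # raise an exception for 4xx/5xx responses
        with open(FILE_TO_SAVE_AS, "wb") as f:
            for chunk in resp.iter_content(chunk_size=8192):
                f.write(chunk)
except requests.RequestException as e:
    print("Download failed:", e)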

Related

Python Download NetCDF file from a website which provides the file after clicking a button

If you go to this website:
https://ruc.noaa.gov/raobs/Data_request.cgi?byr=2010&bmo=5&bdy=26&bhr=12&eyr=2010&emo=5&edy=27&ehr=15&shour=All+Times&ltype=All+Levels&wunits=Knots&access=WMO+Station+Identifier
Type "72632" into the box, and change "Format" to "NetCDF format (binary)", and then click "Continue Data Access", a NetCDF file is downloaded to your computer.
If I use the Chrome developer tools to track network activity after clicking this button, I can see the "Request URL" which leads to this file being downloaded is:
https://ruc.noaa.gov/raobs/GetRaobs.cgi?shour=All+Times&ltype=All+Levels&wunits=Knots&bdate=2010052612&edate=2010052715&access=WMO+Station+Identifier&view=NO&StationIDs=72632&osort=Station+Series+Sort&oformat=NetCDF+format+%28Binary%29
If you copy and paste that URL into a web browser, the file is downloaded.
What I want to do is use Python to take a URL formatted like the one above, and retrieve the associated NetCDF file.
I've had luck in the past doing something like
url = 'https://ruc.noaa.gov/raobs/GetRaobs.cgi?shour=All+Times&ltype=All+Levels&wunits=Knots&bdate=2010052612&edate=2010052715&access=WMO+Station+Identifier&view=NO&StationIDs=72632&osort=Station+Series+Sort&oformat=NetCDF+format+%28Binary%29'
da = xr.open_dataset(url)
But that doesn't work in this case:
OSError: [Errno -75] NetCDF: Malformed or unexpected Constraint: b'https://ruc.noaa.gov/raobs/GetRaobs.cgi?shour=All+Times&ltype=All+Levels&wunits=Knots&bdate=2010052612&edate=2010052715&access=WMO+Station+Identifier&view=NO&StationIDs=72632&osort=Station+Series+Sort&oformat=NetCDF+format+%28Binary%29'
I've also tried to wget the URL, but that just downloads a ".cgi" file which I don't think is useful.
Thanks for any help!
You could use my package nctoolkit to download the file and then export it to xarray. This saves the file to a temporary directory and removes it once the session is finished.
import nctoolkit as nc
import xarray as xr
ds = nc.open_url("https://ruc.noaa.gov/raobs/GetRaobs.cgi?shour=All+Times&ltype=All+Levels&wunits=Knots&bdate=2010052612&edate=2010052715&access=WMO+Station+Identifier&view=NO&StationIDs=72632&osort=Station+Series+Sort&oformat=NetCDF+format+%28Binary%29")
ds_xr = ds.to_xarray()
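If you would rather avoid an extra dependency, another option is to download the bytes with requests and open the resulting local file with xarray. This is only a sketch, assuming a netCDF backend such as netCDF4 is installed; "sounding.nc" is an arbitrary local filename:

import requests
import xarray as xr

url = "https://ruc.noaa.gov/raobs/GetRaobs.cgi?shour=All+Times&ltype=All+Levels&wunits=Knots&bdate=2010052612&edate=2010052715&access=WMO+Station+Identifier&view=NO&StationIDs=72632&osort=Station+Series+Sort&oformat=NetCDF+format+%28Binary%29"

resp = requests.get(url)
resp.raise_for_status()  # fail early on a 4xx/5xx response
with open("sounding.nc", "wb") as f:
    f.write(resp.content)

da = xr.open_dataset("sounding.nc")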

python request.get response object to download xlsx file from url saves excel file but file is smaller and not accessible

Although I use stackoverflow regularly to solve problems, this is my first post :). I hope you can help!
I have a link that automatically downloads a .xlsx file. You get directed to a simple screen with a link if the download does not start automatically.
I can download the file and save it to disk.
However, the .xlsx file that is saved is smaller (2 kB vs 6 kB) and I cannot open it. When I open it with LibreOffice Calc, it asks me to select a language to use for the import, and then nothing happens.
Maybe some encoding/decoding problem?
I have tried several different methods from various threads to retrieve the file and save it to disk, but often with the same result.
This is my code:
resp = requests.get(url)
with open('filename.xlsx', 'wb') as output:
    output.write(resp.content)
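One way to diagnose this kind of mismatch is to inspect the response before writing it. This is only a diagnostic sketch (it does not fix the download); a real .xlsx file is a ZIP archive, so its first bytes are b'PK':

import requests

resp = requests.get(url)  # url is the same link used above
print(resp.status_code)                  # 200 does not guarantee you received the spreadsheet
print(resp.headers.get("Content-Type"))  # .xlsx is usually served as
                                         # application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
print(resp.content[:4])                  # a real .xlsx starts with the ZIP signature b'PK\x03\x04'

if not resp.content.startswith(b"PK"):
    # if the body is HTML, you probably saved the intermediate "download starts automatically" page
    print(resp.text[:200])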

How to encode a video response in python?

In a request that I made, I received a byte response, and I know it is a video; its status code was 200. But I don't know how to use it. I tried to encode it to UTF-8 and then save it to a file, but it is not playable; media players are unable to read its content. Here is the request that I made:
import requests
resp = requests.get('https://bcboltsony-a.akamaihd.net/media/v1/hls/v4/aes128/5182475815001/4ded6ac4-6f8b-4da2-8194-db2391d5e331/164fe5c5-15a3-4997-b4c6-7dd4b95f9c57/92410c6d-c565-4341-8650-1d40a795ece2/5x/segment1.ts?akamai_token=exp=1589337578~acl=/media/v1/hls/v4/aes128/5182475815001/4ded6ac4-6f8b-4da2-8194-db2391d5e331/164fe5c5-15a3-4997-b4c6-7dd4b95f9c57/92410c6d-c565-4341-8650-1d40a795ece2/*~hmac=bf9745f2a9b51c04d59eb9955de20dcf1b4c8c7e434ad0bdd639f2d80fa10ecc')
open('E:/video.mp4', 'wb').write(bytes(resp.text, encoding='utf-8'))
How can I convert this response to a watchable format?
Try using the wget package, which makes downloading files easier.
Here is a simple example for your situation:
import wget
url = "https://bcboltsony-a.akamaihd.net/media/v1/hls/v4/aes128/5182475815001/4ded6ac4-6f8b-4da2-8194-db2391d5e331/164fe5c5-15a3-4997-b4c6-7dd4b95f9c57/92410c6d-c565-4341-8650-1d40a795ece2/5x/segment1.ts?akamai_token=exp=1589337578~acl=/media/v1/hls/v4/aes128/5182475815001/4ded6ac4-6f8b-4da2-8194-db2391d5e331/164fe5c5-15a3-4997-b4c6-7dd4b95f9c57/92410c6d-c565-4341-8650-1d40a795ece2/*~hmac=bf9745f2a9b51c04d59eb9955de20dcf1b4c8c7e434ad0bdd639f2d80fa10ecc"
wget.download(url, 'c:/users/Yourname/downloads/video.mp4')
If this does not work, the problem may be with the encoding on the URL's side.
Your code is essentially right, but note that if you open this URL in your browser, you will find it is a .ts file, not an .mp4 file.
Also, if you download it directly in the browser, you still cannot play it; on my PC it is reported as damaged.
If you search the internet, you will find that such .ts files are encrypted (for the page behind your URL, the encryption is AES-128), so you may need to take additional steps to decrypt it.
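If the goal is a playable video, one common approach is to let ffmpeg fetch and decrypt the HLS stream for you. This is only a sketch: it assumes ffmpeg is installed and that you have the URL of the .m3u8 playlist that references these segments (the playlist URL below is a placeholder, not taken from the question):

import subprocess

# hypothetical playlist URL; the real one can usually be found in the browser's network tab
playlist_url = "https://example.com/path/to/master.m3u8"

subprocess.run(
    ["ffmpeg", "-i", playlist_url, "-c", "copy", "video.mp4"],
    check=True,  # raise an exception if ffmpeg exits with an error
)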
Replace your code with the code below. I hope it will work :)
open('E:/video.mp4', 'wb').write(resp.content)

Automate .get requests via python

I have a python script that scrapes a page and uses the jinja2 templating engine to output the appropriate HTML; I finally got it working thanks to you kind folks and the people of The Coding Den Discord.
I'm looking to automate the .get request I'm making at the top of the file.
I have thousands of URLs I want this script to run on. What's a good way to go about this? I've tried passing an array of URLs, but requests says no to that; it complains that the argument must be a string. So it seems I need to iterate over the compiledUrls variable each time. Any advice on the subject would be much appreciated.
Build a text file with the urls.
urls.txt
https://www.perfectimprints.com/custom-promos/20267/Pizza-Cutters1.html
https://www.perfectimprints.com/custom-promos/20267/Pizza-Cutters2.html
https://www.perfectimprints.com/custom-promos/20267/Pizza-Cutters3.html
https://www.perfectimprints.com/custom-promos/20267/Pizza-Cutters4.html
https://www.perfectimprints.com/custom-promos/20267/Pizza-Cutters5.html
Then read the URLs and process them:
with open("urls.txt") as file:
for single_url in file:
url = requests.get(single_url.strip())
..... # your code continue here
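Since there are thousands of URLs, you may also want to fetch them in parallel. Below is a sketch using concurrent.futures (the worker count, timeout, and error handling are arbitrary choices):

import concurrent.futures

import requests

def fetch(single_url):
    try:
        return requests.get(single_url, timeout=30)
    except requests.RequestException as e:
        print("failed:", single_url, e)
        return None

with open("urls.txt") as file:
    urls = [line.strip() for line in file if line.strip()]

with concurrent.futures.ThreadPoolExecutor(max_workers=8) as pool:
    for response in pool.map(fetch, urls):
        if response is not None:
            ...  # your scraping/templating code continues here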

Python: save a page with a lot of graphics as a .html file

I want to save a visited page to disk as a file. I am using urllib and URLopener.
I chose the site http://emma-watson.net/. The file is saved correctly as .html, but when I open it I notice that the main picture at the top, which contains bookmarks to other subpages, is not displayed, and neither are some other elements (like the POTD). How can I save the page correctly so that the whole page is stored on disk?
def saveUrl(url):
    testfile = urllib.URLopener()
    testfile.retrieve(url, "file.html")
    ...
saveUrl("http://emma-watson.net")
Screenshot of the real page: [image not shown]
Screenshot of the file opened from my disk: [image not shown]
What you're trying to do is create a very simple web scraper (that is, you want to find all the links in the file, and download them, but you don't want to do so recursively, or do any fancy filtering or postprocessing, etc.).
You could do this by using a full-on web scraper library like scrapy and just restricting it to a depth of 1 and not enabling anything else.
Or you could do it manually. Pick your favorite HTML parser (BeautifulSoup is easy to use; html.parser is built into the stdlib; there are dozens of other choices). Download the page, then parse the resulting file, scan it for img, a, script, etc. tags with URLs, then download those URLs as well, and you're done.
If you want this all to be stored in a single file, there are a number of "web archive file" formats that exist, and different browsers (and other tools) support different ones. The basic idea of most of them is that you create a zipfile with the files in some specific layout and some extension like .webarch instead of .zip. That part's easy. But you also need to change all the absolute links to be relative links, which is a little harder. Still, it's not that hard with a tool like BeautifulSoup or html.parser or lxml.
As a side note, if you're not actually using the URLopener for anything, you're making life harder for yourself for no good reason; just use urlopen. Also, as the docs mention, you should be using urllib2, not urllib; in fact urllib.urlopen is deprecated as of 2.6. And, even if you do need to use an explicit opener, as the docs say, "Unless you need to support opening objects using schemes other than http:, ftp:, or file:, you probably want to use FancyURLopener."
Here's a simple example (enough to get you started, once you decide exactly what you do and don't want) using BeautifulSoup:
import os
import urllib2
import urlparse
import bs4

def saveUrl(url):
    page = urllib2.urlopen(url).read()
    with open("file.html", "wb") as f:  # save the raw page itself
        f.write(page)
    soup = bs4.BeautifulSoup(page)  # parse the downloaded bytes, not the write-only file handle
    for img in soup('img'):
        imgurl = img['src']
        imgpath = urlparse.urlparse(imgurl).path
        imgpath = 'file.html_files/' + imgpath
        os.makedirs(os.path.dirname(imgpath))
        imgdata = urllib2.urlopen(imgurl).read()  # fetch the image bytes
        with open(imgpath, "wb") as f:
            f.write(imgdata)

saveUrl("http://emma-watson.net")
This code won't work if there are any images with relative links. To handle that, you need to call urlparse.urljoin to attach a base URL. And, since the base URL can be set in various different ways, if you want to handle every page anyone will ever write, you will need to read up on the documentation and write the appropriate code. It's at this point that you should start looking at something like scrapy. But, if you just want to handle a few sites, just writing something that works for those sites is fine.
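For example (a sketch in the same Python 2 style as the code above; the page and image paths are made up), urlparse.urljoin resolves a relative src against the page it came from:

import urlparse

base_url = "http://emma-watson.net/some/page.html"
print(urlparse.urljoin(base_url, "images/banner.jpg"))    # http://emma-watson.net/some/images/banner.jpg
print(urlparse.urljoin(base_url, "/potd/today.jpg"))      # http://emma-watson.net/potd/today.jpg
print(urlparse.urljoin(base_url, "http://cdn.example.com/x.jpg"))  # absolute URLs pass through unchanged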
Meanwhile, if any of the images are loaded by JavaScript after page-load time—which is pretty common on modern websites—nothing will work, short of actually running that JavaScript code. At that point, you probably want a browser automation tool like Selenium or a browser simulator tool like Mechanize+PhantomJS, not a scraper.
