I want to download a Video file from internet using requests library and before saving that endit metadata of the video.
import requests
url = 'https://www.sample-videos.com/video123/mp4/720/big_buck_bunny_720p_5mb.mp4'
r = requests.get(url, stream=True)
with open('video.mp4', 'wb') as file:
file.write(r.content)
I just want to change the metadata to video before saving the file.
I don't think that's possible. My approach would be to first download the video and then consider using a library such as tagpy or mutagen.
I would recommend mutagen since I find that it has a good documentation
See here on installation steps for mutagen
Example code using mutagen
>>> import mutagen
>>> mutagen.File("11. The Way It Is.ogg")
{'album': [u'Always Outnumbered, Never Outgunned'],
'title': [u'The Way It Is'], 'artist': [u'The Prodigy'],
'tracktotal': [u'12'], 'albumartist': [u'The Prodigy'],'date': [u'2004'],
'tracknumber': [u'11'],
>>> _.info.pprint()
u'Ogg Vorbis, 346.43 seconds, 499821 bps'
>>>
And then to change the title, you simply access the dictionary key and change the value of it
from mutagen.flac import FLAC
audio = FLAC("example.flac")
audio["title"] = u"An example"
audio.pprint()
audio.save()
To build on AzyCrw4282's answer, mutagen can be used to do what you're looking for before saving the file.
The API docs for mutagen.File() states that it's looking for a filething, which is "a filename or file-like object". This means that you can buffer it to an in-memory location, modify your metadata with Mutagen, then save it to disk. Please be aware that the entire binary response will be in memory, this may cause issues depending on your available system resources.
from io import BytesIO
import requests
import mutagen
with requests.get(url, stream=True) as r:
r.raise_for_status()
buf = BytesIO()
for chunk in r.iter_content(chunk_size=8192):
if chunk:
buf.write(chunk)
buf.seek(0)
video = mutagen.File(buf)
# ... do your modifications
with open('/your/file/path.mp4', 'wb') as f:
f.write(buf.getbuffer())
Related
I'm making a program that downloads PDFs from the internet.
Here's a example of the code:
import httpx # <-- This also happens with the requests module
URL = "http://62.182.86.140/main/0/aee7239ffcf7871e1d6687ced1215e22/Markus%20Nix%20-%20Exploring%20Python-Entwickler%20%282005%29.djvu"
r = httpx.get(URL, timeout=20.0).content.decode("ascii")
with open(f"./example.pdf", "w") as f:
f.write(str(content))
But when I write to a file, none of my pdf viewers (tried okular and zathura) can read them.
But when I download it using a program like wget, there's no problems.
Then when I compare the two files (one downloaded with python, and the other with wget), everything is encoded, and I can't figure out how to decode it (.decode() doesn't work).
import httpx
def main(url):
r = httpx.get(url, timeout=20)
with open('file.djvu', 'wb') as f:
f.write(r.content)
main('http://62.182.86.140/main/0/aee7239ffcf7871e1d6687ced1215e22/Markus%20Nix%20-%20Exploring%20Python-Entwickler%20%282005%29.djvu')
How would I go about navigating to a URL that's stored in a list and downloading the file? I'd preferably like to be able to store the MP4 file as it's clip title. I've used requests to retrieve the urls.
Thanks
list_clips = ['https://clips.twitch.tv/SpeedySneakyHeronKappaClaus', 'https://clips.twitch.tv/SplendidGiantPuffinThunBeast', 'https://clips.twitch.tv/ArtsyAuspiciousHamburgerThisIsSparta', 'https://clips.twitch.tv/BoringNiceHerbsSaltBae']
You can use python's requests module to download the file. Please refer the code below
import requests, os
for clips in list_clips:
clip_title = os.path.basename(clips)
r = requests.get(clips)
with open(clip_title+'.mp4', 'wb') as f:
f.write(r.content)
I was trying to make a script to download songs from internet. I was first trying to download the song by using "requests" library. But I was unable to play the song. Then, I did the same using "urllib2" library and I was able to play the song this time.
Can't we use "requests" library to download songs? If yes, how?
Code by using requests:
import requests
doc = requests.get("http://gaana99.com/fileDownload/Songs/0/28768.mp3")
f = open("movie.mp3","wb")
f.write(doc.text)
f.close()
Code by using urllib2:
import urllib2
mp3file = urllib2.urlopen("http://gaana99.com/fileDownload/Songs/0/28768.mp3")
output = open('test.mp3','wb')
output.write(mp3file.read())
output.close()
Use doc.content to save binary data:
import requests
doc = requests.get('http://gaana99.com/fileDownload/Songs/0/28768.mp3')
with open('movie.mp3', 'wb') as f:
f.write(doc.content)
Explanation
A MP3 file is only binary data, you cannot retrieve its textual part. When you deal with plain text, doc.text is ideal, but for any other binary format, you have to access bytes with doc.content.
You can check the used encoding, when you get a plain text response, doc.encoding is set, else it is empty:
>>> doc = requests.get('http://gaana99.com/fileDownload/Songs/0/28768.mp3')
>>> doc.encoding
# nothing
>>> doc = requests.get('http://www.example.org')
>>> doc.encoding
ISO-8859-1
A similar way from here:
import urllib.request
urllib.request.urlretrieve('http://gaana99.com/fileDownload/Songs/0/28768.mp3', 'movie.mp3')
I'm trying to scrape this image using urllib.urlretrieve.
>>> import urllib
>>> urllib.urlretrieve('http://i9.mangareader.net/one-piece/3/one-piece-1668214.jpg',
path) # path was previously defined
This code successfully saves the file in the given path. However, when I try to open the file, I get:
Could not load image 'imagename.jpg':
Error interpreting JPEG image file (Not a JPEG file: starts with 0x3c 0x21)
When I do file imagename.jpg in my bash terminal, I get imagefile.jpg: HTML document, ASCII text.
So how do I scrape this image as a JPEG file?
It's because the owner of the server hosting that image is deliberately blocking access from Python's urllib. That's why it's working with requests. You can also do it with pure Python, but you'll have to give it an HTTP User-Agent header that makes it look like something other than urllib. For example:
import urllib2
req = urllib2.Request('http://i9.mangareader.net/one-piece/3/one-piece-1668214.jpg')
req.add_header('User-Agent', 'Feneric Was Here')
resp = urllib2.urlopen(req)
imgdata = resp.read()
with open(path, 'wb') as outfile:
outfile.write(imgdata)
So it's a little more involved to get around, but still not too bad.
Note that the site owner probably did this because some people had gotten abusive. Please don't be one of them! With great power comes great responsibility, and all that.
I am trying to automate downloading a .Z file from a website, but the file I get is 2kb when it should be around 700 kb and it contains a list of the contents of the page (ie: all the files available for download). I am able to download it manually without a problem. I have tried urllib and urllib2 and different configurations of each, but each does the same thing. I should add that the urlVar and fileName variables are generated in a different part of the code, but I have given an example of each here to demonstrate.
import urllib2
urlVar = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z"
fileName = txga1000.14d.Z
downFile = urllib2.urlopen(urlVar)
with open(fileName, "wb") as f:
f.write(downFile.read())
At least the urllib2documentation suggest you should use the Requestobject. This works with me:
import urllib2
req = urllib2.Request("ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/txga1000.14d.Z")
response = urllib2.urlopen(req)
data = response.read()
Data length seems to be 740725.
I was able to download what seems like the correct size for your file with the following python2 code:
import urllib2
filename = "txga1000.14d.Z"
url = "ftp://www.ngs.noaa.gov/cors/rinex/2014/100/txga/{}".format(filename)
reply = urllib2.urlopen(url)
buf = reply.read()
with open(filename, "wb") as fh:
fh.write(buf)
Edit: The post above me was answered faster and is much better.. I thought I'd post since I tested and wrote this out anyways.