I have been trying to download a video file with Python and play it in VLC at the same time.
I have tried a few ways. One of them is to download in a single thread, continuously fetching and appending data. This approach is slow, but the video plays. The code is something like the following:
# open the destination in binary mode and stream the response in 8 KB chunks
self.fp = open(dest, "wb")
page = urllib2.urlopen(self.url)          # the urllib2 request
while not self.stop_down and _continue:
    size = 1024 * 8
    data = page.read(size)
    if not data:
        break
    bytes_loaded += size
    self.fp.write(data)
This function takes longer to download, but I am able to play the video while it is loading.
However, I have also been trying to download the file in multiple parts at the same time, with proper threading logic:
req = urllib2.Request(self.url)
req.headers['Range'] = 'bytes=%s-%s' % (self.startPos, self.end)
response = urllib2.urlopen(req)
content = response.read()

if os.path.exists(self.dest):
    out_fd = open(self.dest, "r+b")
else:
    out_fd = open(self.dest, "w+b")

out_fd.seek(self.startPos, 0)
out_fd.write(content)
out_fd.close()
With my threading I am making sure that each part of the file is saved sequentially.
But for some reason I can't play this file at all while it is downloading.
Is there anything I am not doing right? Should the Range header be specified differently?
It turns out that in threaded mode the Range of each block has to overlap the previous one by one byte: if the first block ends at byte 1024, the next one has to start at byte 1023.
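For reference, here is a minimal sketch of that ranged request pulled out into a helper, assuming the server honours inclusive HTTP byte ranges; the file size and part count in the usage lines are hypothetical:
import os
import urllib2

def download_range(url, dest, start, end):
    # Fetch bytes start..end (inclusive) and write them at the matching offset.
    req = urllib2.Request(url)
    req.headers['Range'] = 'bytes=%d-%d' % (start, end)
    content = urllib2.urlopen(req).read()
    mode = "r+b" if os.path.exists(dest) else "w+b"
    out_fd = open(dest, mode)
    out_fd.seek(start, 0)
    out_fd.write(content)
    out_fd.close()

# Hypothetical usage: split a 4 MB file into 8 contiguous parts;
# each worker thread then calls download_range(url, dest, s, e) for its pair.
total_size = 4 * 1024 * 1024
part = total_size // 8
ranges = [(i * part, min((i + 1) * part, total_size) - 1) for i in range(8)]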
To do: download a video from a URL, and stop downloading once a specific file size has been reached.
This is what I have tried so far:
import requests

with open(local_file_name, 'wb') as f:
    r = requests.get(url, stream=True)
    for chunk in r.iter_content(chunk_size=5000000):
        if chunk:
            f.write(chunk)
        break
I used break after writing that chunk of the specified size to the file.
The file is created and its size is 5 MB, but I am not able to view it: it throws an error on opening.
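A minimal sketch of the same idea that writes in binary mode and stops once a byte limit is reached (the 5 MB cutoff and 64 KB chunk size are arbitrary choices here); note that a video cut off partway through may still refuse to play if the container keeps its index at the end of the file:
import requests

MAX_BYTES = 5 * 1024 * 1024             # stop after roughly 5 MB
written = 0

r = requests.get(url, stream=True)
with open(local_file_name, 'wb') as f:  # binary mode
    for chunk in r.iter_content(chunk_size=64 * 1024):
        if not chunk:
            continue
        f.write(chunk)
        written += len(chunk)
        if written >= MAX_BYTES:
            break                       # limit reached, stop downloading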
Description of code
My script below works fine. It basically finds all the data files I'm interested in on a given website, checks whether they are already on my computer (and skips them if they are), and finally downloads them to my computer using cURL.
The problem
The problem I'm having is that sometimes there are 400+ very large files and I can't download them all at the same time. I'll press Ctrl-C, but that seems to cancel only the current cURL download, not the script, so I end up needing to cancel all the downloads one by one. Is there a way around this? Maybe a key command that lets me stop at the end of the current download?
#!/usr/bin/python
import os
import urllib2
import re
import timeit

filenames = []
savedir = "/Users/someguy/Documents/Research/VLF_Hissler/Data/"

# connect to a URL
website = urllib2.urlopen("http://somewebsite")

# read html code
html = website.read()

# use re.findall to get all the data files
filenames = re.findall('SP.*?\.mat', html)

# The following chunk of code checks to see if the files are already downloaded
# and deletes them from the download queue if they are.
count = 0
countpass = 0
for files in os.listdir(savedir):
    if files.endswith(".mat"):
        try:
            filenames.remove(files)
            count += 1
        except ValueError:
            countpass += 1

print "counted number of removes", count
print "counted number of failed removes", countpass
print "number files less removed:", len(filenames)

# saves the file names into an array of html links
links = len(filenames) * [0]
for j in range(len(filenames)):
    links[j] = 'http://somewebsite.edu/public_web_junk/southpole/2014/' + filenames[j]

for i in range(len(links)):
    os.system("curl -o " + filenames[i] + " " + str(links[i]))

print "links downloaded:", len(links)
You could always check the file size using curl before downloading it:
import subprocess, sys

def get_file_size(url):
    """
    Gets the file size of a URL using curl.

    @param url: The URL to obtain information about.
    @return: The file size, as an integer, in bytes.
    """
    # Get the file size in bytes from the Content-Length response header
    p = subprocess.Popen(('curl', '-sI', url), stdout=subprocess.PIPE)
    for s in p.stdout.readlines():
        if 'Content-Length' in s:
            file_size = int(s.strip().split()[-1])
            return file_size

# Your configuration parameters
url = ...       # URL that you want to download
max_size = ...  # Max file size in bytes

# Now you can do a simple check to see if the file size is too big
if get_file_size(url) > max_size:
    sys.exit()

# Or you could do something more advanced
bytes = get_file_size(url)
if bytes > max_size:
    s = raw_input('File is {0} bytes. Do you wish to download? '
                  '(yes, no) '.format(bytes))
    if s.lower() != 'yes':
        sys.exit()
    # Add download code here....
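As a usage sketch, the size check could be dropped straight into a download loop like the one in the question (filenames, links and max_size as defined above):
import subprocess

for i in range(len(links)):
    size = get_file_size(str(links[i]))
    if size is not None and size > max_size:
        print "skipping", filenames[i], "-", size, "bytes"
        continue
    subprocess.call(["curl", "-o", filenames[i], str(links[i])])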
I'm new to Python and was trying to figure out how to write a script that will download the contents of HTML pages. I was thinking of doing something like:
Y = 0
while Y != 500:
    X = "example.com/example/" + str(Y)
    # (code to download X)
    Y += 1
so (Y) is the file name, and I need to download files from example.com/example/1 all the way to file number 500, regardless of the file type.
Read this official docs page:
This module provides a high-level interface for fetching data across the World Wide Web.
In particular, the urlopen() function is similar to the built-in function open(), but accepts Universal Resource Locators (URLs) instead of filenames.
Some restrictions apply — it can only open URLs for reading, and no seek operations are available.
So you have code like this:
import urllib
content = urllib.urlopen("http://www.google.com").read()
#urllib.request.urlopen(...).read() in python 3
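If the goal is simply to save each numbered page to disk, urllib.urlretrieve (urllib.request.urlretrieve in Python 3) does the download-and-save in one call; a sketch, assuming the files really do live at example.com/example/1 through /500:
import urllib

for y in range(1, 501):
    url = "http://example.com/example/{0}".format(y)
    urllib.urlretrieve(url, str(y))  # saves the response body to a local file named after y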
The following code should meet your needs. It will download the contents of 500 pages and save them to disk.
import urllib2

def grab_html(url):
    response = urllib2.urlopen(url)
    mimetype = response.info().getheader('Content-Type')
    return response.read(), mimetype

for i in range(500):
    filename = str(i)  # Use the digit as the filename
    url = "http://example.com/example/{0}".format(filename)
    contents, _ = grab_html(url)
    with open(filename, "w") as fp:
        fp.write(contents)
Notes:
If you need parallel fetching, the concurrent.futures documentation has a great example: https://docs.python.org/3/library/concurrent.futures.html
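A minimal sketch of that parallel approach, reusing the grab_html helper above with a thread pool (the worker count of 10 is arbitrary; on Python 2 this needs the futures backport):
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_and_save(i):
    url = "http://example.com/example/{0}".format(i)
    contents, _ = grab_html(url)
    with open(str(i), "w") as fp:
        fp.write(contents)
    return url

with ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fetch_and_save, i) for i in range(500)]
    for future in as_completed(futures):
        future.result()  # re-raises any exception from the worker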
I have multiple URLs that return zip files. Most of the files I'm able to download using the urllib2 library as follows:
request = urllib2.urlopen(url)
zip_file = request.read()
The problem I'm having is that one of the files is 35 MB in size (zipped) and I'm never able to finish downloading it using this library. I'm able to download it normally using wget and the browser.
I have tried downloading the file in chunks like this:
request = urllib2.urlopen(url)
buffers = []
while True:
    buffer = request.read(8192)
    if buffer:
        buffers.append(buffer)
    else:
        break
final_file = ''.join(buffers)
But this also does not finish the download. No error is raised, so it's hard to debug what is happening. Unfortunately, I can't post an example of the URL / file here.
Any suggestions / advice?
This is copied and pasted from my application, which downloads its own update installer. It reads the file in blocks and immediately saves each block to the output file on disk.
def DownloadThreadFunc(self):
    try:
        url = self.lines[1]
        data = None
        req = urllib2.Request(url, data, {})
        handle = urllib2.urlopen(req)

        # total size from the Content-Length header, used for progress reporting
        self.size = int(handle.info()["Content-Length"])
        self.actualSize = 0
        name = path.join(DIR_UPDATES, url.split("/")[-1])
        blocksize = 64 * 1024

        # read the response in 64 KB blocks and write each one to disk immediately
        fo = open(name, "wb")
        while not self.terminate:
            block = handle.read(blocksize)
            self.actualSize += len(block)
            if len(block) == 0:
                break
            fo.write(block)
        fo.close()
    except (urllib2.URLError, socket.timeout), e:
        try:
            fo.close()
        except:
            pass
        error("Download failed.", unicode(e))
I use self.size and self.actualSize to show the download progress in the GUI thread, and self.terminate to cancel the download from a GUI button if needed.
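If the transfer really is stopping short with no error, a minimal check against Content-Length (assuming the server sends that header) added to the chunked loop from the question will at least make the truncation visible:
import urllib2

request = urllib2.urlopen(url)
expected = int(request.info()["Content-Length"])

buffers = []
while True:
    buffer = request.read(8192)
    if not buffer:
        break
    buffers.append(buffer)

final_file = ''.join(buffers)
if len(final_file) != expected:
    raise IOError("incomplete download: got %d of %d bytes"
                  % (len(final_file), expected))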
So here we have a Python script:
""" Record a few seconds of audio and save to a WAVE file. """
import pyaudio
import wave
import sys
chunk = 1024
FORMAT = pyaudio.paInt16
CHANNELS = 1
RATE = 44100
RECORD_SECONDS = 5
WAVE_OUTPUT_FILENAME = "output.wav"
p = pyaudio.PyAudio()
stream = p.open(format=FORMAT,
                channels=CHANNELS,
                rate=RATE,
                input=True,
                frames_per_buffer=chunk)

print "* recording"

all = []
for i in range(0, RATE / chunk * RECORD_SECONDS):
    data = stream.read(chunk)
    all.append(data)
print "* done recording"
stream.close()
p.terminate()
# write data to WAVE file
data = ''.join(all)
wf = wave.open(WAVE_OUTPUT_FILENAME, 'wb')
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(data)
wf.close()
And this script does what the first comment line says: if you run it in a terminal, it outputs a .wav file in the directory you're in at the moment of execution. What I want to do is to get that file and manipulate it. Instead of writing it to disk, I want to store it in a variable or something like that, and then POST it to a URL, passing some parameters along with it. I saw some interesting examples of posting multipart-encoded files using requests, as you can see here:
http://docs.python-requests.org/en/latest/user/quickstart/
But I made several attempts at achieving what I'm describing in this question and was unlucky. Maybe a little guidance will help with this one :)
To be brief, what I need is to record a WAV file from the microphone and then POST it to a URL (passing data like headers along with it), and then get the output in a print statement or something like that in the terminal.
Thank You!!
wave.open lets you pass either a file name or a file-like object to save into. If you pass in a StringIO object rather than WAVE_OUTPUT_FILENAME, you can get a string back (via getvalue()) that you can use to construct a POST request.
Note that this keeps the whole file in memory; if it might be really long, you might prefer to write to a temporary file and then use that to make your request. Of course, you're already holding all the frames in memory, so maybe that's not an issue.
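A minimal sketch of that idea, keeping the WAV in memory and posting it with requests; the upload URL, form field name, and extra header below are placeholders:
import StringIO  # io.BytesIO on Python 3
import requests
import wave

buf = StringIO.StringIO()
wf = wave.open(buf, 'wb')  # wave accepts a file-like object instead of a filename
wf.setnchannels(CHANNELS)
wf.setsampwidth(p.get_sample_size(FORMAT))
wf.setframerate(RATE)
wf.writeframes(data)       # 'data' is the joined recording from the script above
wf.close()

files = {'file': ('output.wav', buf.getvalue(), 'audio/wav')}
headers = {'X-Example-Header': 'value'}  # placeholder header
response = requests.post('http://example.com/upload', files=files, headers=headers)
print response.text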