Python split video file into smaller parts - python

I am stuck with my project (a web application using Python/Django) on splitting a large file (say 1 GB) into smaller parts using Python. I can split the large file into smaller parts, but the problem is that only part 1 plays; the rest of the files won't open.
I understand that I need to specify the video information before the video data, but I don't know how.
Below is my code; could someone help me split the large file into smaller ones?
[N.B.] I need to split the video from the Django views once the upload is completed.
def video_segments(video):
    loc = settings.MEDIA_ROOT + '/' + format(video.video_file)
    filetype = format(video.video_file).split(".")
    chunk_size = 1024000
    i = 0
    start_index = 0
    end_index = chunk_size
    size = Path(loc).stat().st_size
    file = open(loc, "rb")
    while start_index < size:
        i = i + 1
        file.seek(start_index)
        chunk = file.read(end_index - start_index)
        newfile = open(settings.MEDIA_ROOT + "/" + filetype[0] + format(i) + "." + filetype[1], "wb")
        newfile.write(chunk)
        newfile.close()
        start_index = end_index          # do not skip a byte between parts
        end_index = end_index + chunk_size
    file.close()

I assume you are serving something like H.264 video in an MP4 container to a modern web browser.
If you chop video files into parts like this, the second part won't have any header and therefore will not play in any browser.
The question is why you are doing this at all.
Normally the entire file is served to the browser and the browser fetches the parts it needs with HTTP partial file retrieval (Range requests); modern browsers are smart enough to request only the parts they need, assuming the video file is encoded correctly for this purpose.
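For illustration, here is a minimal sketch of the kind of Range request a browser issues on its own (the URL is hypothetical); a server that supports partial retrieval answers with 206 Partial Content:

import requests

url = "http://example.com/media/video.mp4"     # hypothetical media URL
resp = requests.get(url, headers={"Range": "bytes=0-1023999"})
print(resp.status_code)                         # 206 if byte ranges are supported
print(resp.headers.get("Content-Range"))        # e.g. "bytes 0-1023999/1073741824"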

Related

Python - multiprocessing with large strings slower

I am working on a textual analysis of a large sample of 10-Ks (about 150,000) and am desperately trying to speed up my program with multiprocessing. The relevant function loads the txt files, parses them with some regular expressions, and saves them as "clean":
def plain_10k(f):
    input_text = open(ipath + "\\" + f, errors = "ignore").read()
    # REGEXP
    output_file = open(opath + "\\" + f, "w", errors = "ignore")
    output_file.write(input_text)
    output_file.close()
I try to perform this function over a list of file names as follows:
with Pool(processes = 8) as pool, tqdm(total = len(files_10k)) as pbar:
    for d in pool.imap_unordered(plain_10k, files_10k):
        pbar.update()
Unfortunately, the program seems to be stuck, as it does not produce (i.e. save) any clean txt files. Even with a small list of 10 files, nothing happens.
What is the problem here?
If it is relevant: the size of the input txt files ranges from 10 KB to 10 MB, with the majority being smaller than 1 MB.
I am quite new to Python, so the code above is the result of hours of googling and certainly not very good. I am happy about any comments and suggestions.
Thank you very much in advance!
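For reference, here is a minimal self-contained version of the pattern described above (the directory names and the clean-up regex are hypothetical placeholders). One thing worth checking: the pool creation generally has to live under an if __name__ == "__main__": guard, particularly on Windows, otherwise the workers may never start and the script appears to hang:

import os
import re
from multiprocessing import Pool
from tqdm import tqdm

ipath = "raw_10k"     # hypothetical input directory
opath = "clean_10k"   # hypothetical output directory

def plain_10k(f):
    # read one filing, clean it, and save the result
    with open(os.path.join(ipath, f), errors="ignore") as fin:
        text = fin.read()
    text = re.sub(r"<[^>]+>", " ", text)   # placeholder for the real clean-up regexes
    with open(os.path.join(opath, f), "w", errors="ignore") as fout:
        fout.write(text)

if __name__ == "__main__":
    files_10k = os.listdir(ipath)
    with Pool(processes=8) as pool, tqdm(total=len(files_10k)) as pbar:
        for _ in pool.imap_unordered(plain_10k, files_10k):
            pbar.update()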

Download a large number of multiple files

I have a set of image URLs with an index, and I want to pass them through a downloader that can fetch multiple files at a time to speed up the process.
I tried to put the file names and URLs into dicts (name and d2 respectively) and then use requests and threading to do it:
def Thread(start, stop):
    for i in range(start, stop):
        url = d2[i]
        r = requests.get(url)
        with open('assayImage/{}'.format(name[i]), 'wb') as f:
            f.write(r.content)

for n in range(0, len(d2), 1500):
    stop = n + 1500 if n + 1500 <= len(d2) else len(d2)
    threading.Thread(target = Thread, args = (n, stop)).start()
However, sometimes the connection times out and that file is not downloaded, and after a while the download speed decreases dramatically. For example, in the first hour I can download 10,000 files, but 3 hours later I can only download 8,000. Each file is small, around 500 KB.
So I want to ask: is there any stable way to download a large number of files? I really appreciate your answer.
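One common pattern for this kind of job, sketched below and reusing the d2 and name dicts from the question, is a bounded thread pool with explicit timeouts and a simple retry loop, so a single stalled connection neither blocks a whole 1500-file slice nor silently drops a file:

import concurrent.futures
import requests

def download(i):
    # d2 and name are the url / filename dicts from the question
    for attempt in range(3):                        # retry a few times on failure
        try:
            r = requests.get(d2[i], timeout=30)
            r.raise_for_status()
            with open('assayImage/{}'.format(name[i]), 'wb') as f:
                f.write(r.content)
            return
        except requests.RequestException:
            if attempt == 2:
                raise

with concurrent.futures.ThreadPoolExecutor(max_workers=16) as pool:
    futures = {pool.submit(download, i): i for i in d2}
    for fut in concurrent.futures.as_completed(futures):
        if fut.exception() is not None:
            print("failed:", name[futures[fut]], fut.exception())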

Flask: Get the size of request.files object

I want to get the size of an uploaded image to check whether it is greater than the max file upload limit. I tried this:
@app.route("/new/photo", methods=["POST"])
def newPhoto():
    form_photo = request.files['post-photo']
    print form_photo.content_length
It printed 0. What am I doing wrong? Should I find the size of this image from the temp path of it? Is there anything like PHP's $_FILES['foo']['size'] in Python?
There are a few things to be aware of here - the content_length property will be the content length of the file upload as reported by the browser, but unfortunately many browsers don't send this, as noted in the docs and source.
As for your TypeError, the next thing to be aware of is that file uploads under 500KB are stored in memory as a StringIO object, rather than spooled to disk (see those docs again), so your stat call will fail.
MAX_CONTENT_LENGTH is the correct way to reject file uploads larger than you want, and if you need it, the only reliable way to determine the length of the data is to figure it out after you've handled the upload - either stat the file after you've .save()d it:
request.files['file'].save('/tmp/foo')
size = os.stat('/tmp/foo').st_size
Or if you're not using the disk (for example storing it in a database), count the bytes you've read:
blob = request.files['file'].read()
size = len(blob)
Though obviously, be careful that you're not reading too much data into memory if your MAX_CONTENT_LENGTH is very large.
If you don't want to save the file to disk first, use the following code; it works on an in-memory stream:
import os
file = request.files['file']
# os.SEEK_END == 2
# seek() return the new absolute position
file_length = file.seek(0, os.SEEK_END)
# also can use tell() to get current position
# file_length = file.tell()
# seek back to start position of stream,
# otherwise save() will write a 0 byte file
# os.SEEK_SET == 0
file.seek(0, os.SEEK_SET)
Otherwise, this is better:
request.files['file'].save('/tmp/file')
file_length = os.stat('/tmp/file').st_size
The proper way to set a max file upload limit is via the MAX_CONTENT_LENGTH app configuration. For example, if you wanted to set an upload limit of 16 megabytes, you would do the following to your app configuration:
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024
If the uploaded file is too large, Flask will automatically return status code 413 Request Entity Too Large - this should be handled on the client side.
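If you also want a friendlier response than the default error page when that happens, a small error handler can be registered on the server side as well (a minimal sketch):

from flask import jsonify

@app.errorhandler(413)
def file_too_large(e):
    # triggered automatically when an upload exceeds MAX_CONTENT_LENGTH
    return jsonify(error="File exceeds the upload limit"), 413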
The following piece of code should meet your purpose:
form_photo.seek(0, 2)
size = form_photo.tell()
As someone else already suggested, you should use
app.config['MAX_CONTENT_LENGTH']
to restrict file sizes. But since you specifically want to find out the image size, you can do:
import os
# note: this only works if the upload was spooled to an actual temp file,
# not kept in an in-memory stream
photo_size = os.fstat(request.files['post-photo'].fileno()).st_size
print photo_size
You can also use os.popen.
Save it first:
photo = request.files['post-photo']
photo.save('tmp')
Now just get the size:
os.popen('ls -l tmp | cut -d " " -f5').read()
This is in bytes.
For megabytes or gigabytes, use ls --block-size=M or --block-size=G.
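For what it's worth, the same number is available without spawning a shell, using os.path.getsize on the saved temp file:

import os
size_bytes = os.path.getsize('tmp')   # size of the saved upload in bytes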

Delete / Insert Data in mmap'ed File

I am working on a script in Python that maps a file for processing using mmap().
The task requires me to change the file's contents by:
Replacing data
Adding data to the file at an offset
Removing data from within the file (not just blanking it out)
Replacing data works great as long as the old data and the new data have the same number of bytes:
VDATA = mmap.mmap(f.fileno(),0)
start = 10
end = 20
VDATA[start:end] = "0123456789"
However, when I try to remove data (replacing the range with "") or insert data (replacing the range with content longer than the range), I receive the error message:
IndexError: mmap slice assignment is wrong size
This makes sense.
The question now is, how can I insert and delete data from the mmap'ed file?
From reading the documentation, it seems I can move the file's entire contents back and forth using a chain of low-level actions but I'd rather avoid this if there is an easier solution.
Lacking an alternative, I went ahead and wrote two helper functions - deleteFromMmap() and insertIntoMmap() - to handle the low-level file actions and ease development.
Closing and reopening the mmap instead of using resize() is due to a bug in Python on Unix derivatives that causes resize() to fail. (http://mail.python.org/pipermail/python-bugs-list/2003-May/017446.html)
The functions are included in a complete example.
The use of a global is due to the format of the main project, but you can easily adapt it to match your coding standards.
import mmap

# f contains "0000111122223333444455556666777788889999"
f = open("data", "r+")
VDATA = mmap.mmap(f.fileno(), 0)

def deleteFromMmap(start, end):
    global VDATA
    length = end - start
    size = len(VDATA)
    newsize = size - length
    VDATA.move(start, end, size - end)
    VDATA.flush()
    VDATA.close()
    f.truncate(newsize)
    VDATA = mmap.mmap(f.fileno(), 0)

def insertIntoMmap(offset, data):
    global VDATA
    length = len(data)
    size = len(VDATA)
    newsize = size + length
    VDATA.flush()
    VDATA.close()
    f.seek(size)
    f.write("A" * length)
    f.flush()
    VDATA = mmap.mmap(f.fileno(), 0)
    VDATA.move(offset + length, offset, size - offset)
    VDATA.seek(offset)
    VDATA.write(data)
    VDATA.flush()

deleteFromMmap(4, 8)
# -> 000022223333444455556666777788889999
insertIntoMmap(4, "AAAA")
# -> 0000AAAA22223333444455556666777788889999
There is no way to shift contents of a file (be it mmap'ed or plain) without doing it explicitly. In the case of a mmap'ed file, you'll have to use the mmap.move method.

Writing multiple sound files into a single file in python

I have three sound files, for example a.wav, b.wav and c.wav. I want to write them into a single file, for example all.xmv (the extension could be different), and when I need to, I want to extract one of them and play it (for example, extract a.wav from all.xmv and play it).
How can I do this in Python? I have heard there is a function named blockwrite in Delphi that does what I want. Is there a function in Python like blockwrite in Delphi, or how else can I write these files and play them back?
Would standard tar/zip files work for you?
http://docs.python.org/library/zipfile.html
http://docs.python.org/library/tarfile.html
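For example, a ZIP archive already gives you named members and random access to each one; a minimal sketch using the standard zipfile module:

import zipfile

# pack the three wav files into one archive
with zipfile.ZipFile("all.xmv", "w") as archive:
    for path in ("a.wav", "b.wav", "c.wav"):
        archive.write(path)

# later: pull one member back out to play it
with zipfile.ZipFile("all.xmv") as archive:
    wav_bytes = archive.read("a.wav")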
If the archive idea (which is, by the way, the best answer to your question) doesn't suit you, you can fuse the data from several files into one file, e.g. by writing consecutive blocks of binary data (thus creating an uncompressed archive!).
Let paths be a list of files that should be concatenated:
import io
import os

offsets = [0]  # start offsets of the files, kept for later navigation
last_offset = 0
fout = io.FileIO(out_path, 'w')
for path in paths:
    f = io.FileIO(path)  # stream IO
    fout.write(f.read())
    f.close()
    last_offset += os.path.getsize(path)
    offsets.append(last_offset)
fout.close()

# Pseudo: write the offsets to separate file e.g. by pickling
# ...

# reading the data, given that the offsets[] list is available
file_ID = 10  # e.g. you need to read the 10th file
f = io.FileIO(out_path)
f.seek(offsets[file_ID - 1])  # seek to required position
read_size = offsets[file_ID] - offsets[file_ID - 1]  # get the file size
data = f.read(read_size)  # here we are!
f.close()
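The pickling step left as pseudo-code above could look like this (the .idx filename is just an illustration):

import pickle

# persist the offsets next to the packed file
with open(out_path + ".idx", "wb") as idx:
    pickle.dump(offsets, idx)

# ...and load them again before reading members back out
with open(out_path + ".idx", "rb") as idx:
    offsets = pickle.load(idx)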
