For my project I want to extract frames from a video to make thumbnails. However, every method I find is super slow because it steps through the entire video. Here is what I have tried:
import os
import imageio.v3 as iio

# v (a model holding the video file) and logger come from my application.
with iio.imopen(v.file.path, "r") as vObj:
    metadata = iio.immeta(v.file.path, exclude_applied=False)
    frame_num = int(metadata['fps'] * metadata['duration'] - metadata['fps'])
    for i in range(10):
        x = int((frame_num / 100) * (i * 10))
        frame = vObj.read(index=x)
        path = v.get_thumbnail_path(index=i)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        iio.imwrite(path, frame)
        logger.info('Written video thumbnail: {}'.format(path))
For a long video that takes extremely long. I know video frames are compressed relative to one another, but if I manually open a video in a player and jump to some point, that also does not require walking through the video from the first frame to the last.
I don't care about exact frames, just one roughly every 10%, so sticking to keyframes is fine if it makes things faster.
How can I quickly grab a frame at every 10% of the video?
Thank you.
The way you are approaching it is correct, and a current PR of mine (#939) will make the performance of calling read multiple times competitive.
Small benchmark:
import imageio.v3 as iio
import numpy as np
from timeit import Timer

def bench():
    with iio.imopen("cam1.mp4", "r", plugin="pyav") as file:
        n_frames = file.properties().shape[0]
        read_indices = np.linspace(0, n_frames - 1, 10, dtype=int)
        for count, idx in enumerate(read_indices):
            frame = file.read(index=idx)
            iio.imwrite(f"thumbs/thumbnail_{count}.jpg", frame)

best = min(Timer("bench()", globals=globals()).repeat(5, number=1))
print(f"Best time: {best:.3f}")
Current (v2.25.0): Best time: 2.134
Future (after #939): Best time: 0.924
The above benchmark uses a small, publicly available video. The real-world gain will depend on the specific video being processed (and how keyframes are laid out within it). For example, on a longer video (several minutes, not publicly available) you will notice a real difference:
Current (v2.25.0): Best time: 42.952
Future (after #939): Best time: 1.687
This could be even faster with shorter GOP sizes, but that requires control over how the video is produced and comes at the expense of increased file size.
I don't care about exact frames, just one roughly every 10%, so sticking to keyframes is fine if it makes things faster.
Reading keyframes isn't a reliable approach here unless you can make some assumptions about the videos you are operating on. Many encoders aim for a GOP length of 250 frames but will produce shorter, dynamic lengths depending on how dynamic the encoded content is. You generally can't know where keyframes are in advance, so you may end up with arbitrarily skewed results.
For example, a relatively slow-changing video recorded for 10 seconds at 25FPS (with a realized GOP of 250 frames) will have exactly 1 keyframe (the first one), and seeking to the closest keyframe anywhere in the video will always yield the first frame. Likely this isn't what you have in mind.
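That said, if you do control your inputs and know that keyframes occur frequently enough, here is a minimal sketch of keyframe-level seeking using PyAV directly (the library behind the pyav plugin). The file name is a placeholder, and it assumes the stream reports its duration:

import av

# Sketch: seek near each 10% mark; seek() lands on the nearest earlier keyframe.
with av.open("video.mp4") as container:
    stream = container.streams.video[0]
    duration_s = float(stream.duration * stream.time_base)  # assumes duration is set
    for i in range(10):
        target_s = duration_s * i / 10
        # The offset is expressed in the stream's time_base units.
        container.seek(int(target_s / stream.time_base), stream=stream)
        frame = next(container.decode(stream))  # first decodable frame after the seek
        frame.to_image().save(f"thumb_{i}.jpg")

Because seek() snaps to keyframes, two different targets can land on the same frame, which is exactly the skew described above.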
Related
I was using the skvideo library to extract all frames of video files in two ways. For relatively small videos, I extracted the whole video as a NumPy array:
videodata = skvideo.io.vread(videofile)
And for bigger ones I used a generator, iterating over the frames and running some calculations on each frame separately, to avoid memory issues:
videogen = skvideo.io.vreader(videofile)
for frame in videogen:
    param = calc_func(frame)
However, sometimes I would like to access just a single frame #n without loading all of them. So far the only way I see is to break the loop once the iteration index i == n, and if n is large, that still requires iterating over a lot of frames. So I am wondering if there is a faster way to do this, by just specifying which frame to extract from the video.
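Tying this back to the thread above: imageio's v3 API lets you request a single frame by index and leaves the seeking to the plugin. A minimal sketch, where the file name and frame index are placeholders:

import imageio.v3 as iio

# Sketch: fetch frame #n directly instead of iterating up to it.
frame = iio.imread("video.mp4", index=500, plugin="pyav")  # numpy array (H, W, C)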
Seeking to random points in a video file with OpenCV seems to be much slower than in media players like Windows Media Player or VLC. I am trying to seek to different positions on a video file encoded in H264 (or MPEG-4 AVC (part10)) using VideoCapture and the time taken to seek to the position seems to be proportional to the frame number queried. Here's a small code example of what I'm trying to do:
import cv2

cap = cv2.VideoCapture('example_file')
frame_positions = [200, 400, 8000, 200000]
for frame_position in frame_positions:
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_position)
    ret, img = cap.read()
    cv2.imshow('window', img)
    cv2.waitKey(0)
The perceived times before the images are displayed are proportional to the frame number. That is, frames 200 and 400 barely have any delay, 8000 shows noticeable lag, and 200000 takes almost half a minute.
Why isn't OpenCV able to seek as "quickly" as say Windows Media Player? Could it be that OpenCV is not using the FFMPEG codecs correctly while seeking? Would building OpenCV from sources with some alternate configuration for codecs help? If so, could someone tell me what the configuration could be?
I have only tested this on Windows 7 and 10 PCs, with OpenCV binaries as is, with relevant FFMPEG DLLs in system path.
Another observation: with OpenCV binary versions greater than 2.4.9 (e.g. 2.4.11, 3.3.0), the first seek works, but not the subsequent ones. That is, it can seek to frame 200 from the above example, but not to 400 and the rest; the video just jumps back to frame 0. But since it works for me with 2.4.9, I'm happy for now.
GPU acceleration should not matter for seeking, because you are not decoding frames. In addition, even if you were decoding frames, doing so on the GPU would be slower than on the CPU, because your CPU nowadays has video codecs "soldered" into the chip, which makes video decoding very fast, and there would have to be some book-keeping to shovel data from main memory into the GPU.
It sounds like OpenCV implements a "safe" way of seeking: Video files can contain stream offsets. For example, your audio stream may be set off against your video stream. As another example, you might have cut away the beginning of a video and saved the result. If your cut did not happen precisely at a key frame, video editing software like ffmpeg will include a small number of frames before your cut in the output file, in order to allow the frame at which your cut happened to be decoded properly (for which the previous frames might be necessary). In this case, too, there will be a stream offset.
In order to make sure that such offsets are interpreted the right way, that is, to really hit exactly the desired frame relative to "time 0", the only "easy" but expensive way is to really consume and decode all the video frames. And that's apparently what OpenCV is doing here. Your video players do not bother with this, because everyday users don't notice and the seek controls in the GUI are much too imprecise anyway.
I might be wrong about this. But answers to other questions and some experiments I conducted to evaluate them showed that only the "slow" way of counting the frames in a video gave accurate results.
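For reference, a minimal sketch of that "slow but accurate" counting, decoding every frame instead of trusting the header metadata; the file name is a placeholder:

import cv2

cap = cv2.VideoCapture("example_file")
reported = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))  # header-based, may be inaccurate

# Accurate count: decode every frame until the stream ends.
actual = 0
while True:
    ret, _ = cap.read()
    if not ret:
        break
    actual += 1
cap.release()
print(reported, actual)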
It's likely because that is a very basic code example and the mentioned applications are doing something more clever.
A few points:
Windows Media Player has hardware acceleration
Windows Media Player almost definitely uses your GPU; you could try disabling this to see what difference it makes
VLC is an open source project, so you could check out its code to see how it does video seeking
VLC probably also uses your GPU
OpenCV provides GPU functions that will most likely make your code much quicker
If seeking speed is important, you almost definitely want to work with the GPU when doing video operations:
https://github.com/opencv/opencv/blob/master/samples/gpu/video_reader.cpp
Here are some related github issues:
https://github.com/opencv/opencv/issues/4890
https://github.com/opencv/opencv/issues/9053
Re-encode your video with ffmpeg. It works for me.
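Presumably the re-encode helps because it writes regular, closely spaced keyframes, which makes seeking cheap. A hedged sketch of such a re-encode driven from Python; the file names and the GOP size of 30 (-g 30) are illustrative choices, not something from the original answer:

import subprocess

# Sketch: re-encode with a short keyframe interval so seeks are fast.
subprocess.run(
    ["ffmpeg", "-i", "input.mp4", "-c:v", "libx264", "-g", "30", "output.mp4"],
    check=True,
)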
I have a video file, and all I want for now is to put all of the video's frames into a Python list. I am using Python's OpenCV library to do it, but my laptop can never finish: it just gets stuck and I have to cut the power to restart it. My guess is that the Python list is unable to hold all the frames due to a memory shortage. Here is the code, and I believe the syntax is right. What I need to know is why the laptop gets stuck, and a solution other than using a list.
import cv2

video = cv2.VideoCapture("myvideo.mp4")
all_frames = []
while True:
    ret, frame = video.read()
    if not ret:
        break
    all_frames.append(frame)
Below is some data about the video that might help you:
the video contains 7000 frames
every frame has dimensions (1080, 1920)
You can't afford to do it this way.
When reading, the frames are decompressed from the .mp4 into raw output, e.g. 3 bytes per pixel.
So you want to store 7000 * 3 * 1080 * 1920 bytes in total, which is roughly 43 GB!
Not to mention that the repeated resizing of the list caused by append adds overhead of its own (even though the list only holds references to the frames), so even if you had the memory available, this would take very long.
The idea behind this program is probably to analyse the frames, and for that you don't need all the frames in memory at the same time.
In that case, read a small number of them into a revolving buffer, perform your shape detection or other analysis, store the analysed data (which is much smaller), drop the raw frames, and repeat, as sketched below. (Programs performing real-time analysis cannot store all the data anyway, because they run forever.)
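A minimal sketch of that revolving-buffer idea; calc_func stands in for whatever per-frame analysis you run, and the buffer size of 16 is an arbitrary choice:

from collections import deque
import cv2

video = cv2.VideoCapture("myvideo.mp4")
buffer = deque(maxlen=16)  # keeps only the 16 most recent raw frames
results = []               # analysed data, much smaller than the frames

while True:
    ret, frame = video.read()
    if not ret:
        break
    buffer.append(frame)               # oldest frame is dropped automatically
    results.append(calc_func(frame))   # keep the small result, not the frame
video.release()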
I am working with .h5 files and have little experience with them.
In a script I wrote, I load data from an .h5 file. The shape of the resulting array is [3584, 3584, 75], where the values 3584 denote the number of pixels and 75 denotes the number of time frames. Loading the data and printing the shape takes 180 ms, measured using os.times().
If I now want to look at the data at a specific time frame I use the following piece of code:
data_1 = data[:, :, 1]
The slicing takes a lot of time (1.76 s). I understand that my 2D array is huge, but at some point I would like to loop over time, which will take very long since I'm performing this slice inside the for loop.
Is there a more effective, less time-consuming way of slicing the time frames or handling this type of data?
Thank you!
Note: I'm making assumptions here since I'm unfamiliar with .h5 files and the Python code that accesses them.
I think that what is happening is that when you "load" the array, you're not actually loading an array. Instead, I think that an object is constructed on top of the file. It probably reads in dimensions and information related to how the file is organized, but it doesn't read the whole file.
That object mimics an array so well that when you later perform the slice operation, the normal Python slice operation can be executed, but it is only at this point that the actual data is read. That's why the slice takes so long compared to "loading" all the data.
I arrive at this conclusion because of the following.
If you're reading 75 frames of 3584x3584 pixels, I'm assuming they're uncompressed (H5 seems to be just raw dumps of data), and in that case 75 * 3584 * 3584 = 963,379,200 bytes, which is around 918 MB of data. Couple that with you "reading" it in 180 ms, and we get this calculation:
918MB / 180ms = 5.1GB/second reading speed
Note, this number is for 1-byte pixels, which is also unlikely.
This speed thus seems highly unlikely, as even the best SSDs today reach way below 1GB/sec.
It seems much more plausible that an object is just constructed on top of the file and the slice operation incurs the cost of reading at least 1 frame worth of data.
If we divide the speed by 75 to get per-frame speed, we get 68MB/sec speed for 1-byte pixels, and with 24 or 32-bit pixels we get up to 270MB/sec reading speeds. Much more plausible.
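If the file is read through h5py, here is a hedged sketch of working frame by frame; the file and dataset names are placeholders, and the chunking hint only applies if you control how the file is written:

import h5py

# Sketch: slice single time frames straight from the file; the dataset
# object is lazy, so only the requested frame is read from disk.
with h5py.File("data.h5", "r") as f:
    dset = f["data"]                  # no pixel data read yet
    for t in range(dset.shape[2]):
        frame = dset[:, :, t]         # reads just this frame

# If you write the file yourself, chunking along the time axis makes
# per-frame reads much cheaper, e.g.:
# f.create_dataset("data", shape=(3584, 3584, 75), chunks=(896, 896, 1))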
I have a number of MP3 files containing lectures where the speaker talks very slowly and I would like to alter the MP3 file so that the playback rate is about 1.5 times as fast as normal.
Can someone suggest a good Python library for this? By the way, I'm running Python 2.6 on Windows.
Thanks in advance.
I wrote a library, pydub which is mainly designed for manipulating audio.
I've created an experimental time-stretching algorithm if you're interested in seeing how these sorts of things work.
Essentially you want to throw away a portion of your data, but you can't just play back the waveform faster, because then it all gets high-pitched (as synthesizerpatel mentioned). Instead you want to throw away chunks: 20 Hz is the lowest frequency a human can hear, so dropping 50 ms chunks does not cause audible frequency changes (though there are other artifacts).
PS - I get 50 ms like so:
20 Hz == 20 cycles per second
      == 20 cycles per 1000 ms
      == 1000 ms / 20 == 50 ms per cycle
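A minimal sketch of that chunk-dropping idea with pydub; the file names, the 50 ms chunk size, and the keep-two-drop-one pattern (which yields roughly 1.5x speed) are my illustrative choices:

from pydub import AudioSegment

# Sketch: naive time-stretch by dropping every third 50 ms chunk,
# shortening the audio to 2/3 of its length (about 1.5x speed).
sound = AudioSegment.from_mp3("lecture.mp3")
chunk_ms = 50

chunks = [sound[i:i + chunk_ms] for i in range(0, len(sound), chunk_ms)]
kept = [c for n, c in enumerate(chunks) if n % 3 != 2]  # drop every third chunk

faster = sum(kept[1:], kept[0])  # concatenate the remaining chunks
faster.export("lecture_1.5x.mp3", format="mp3")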
pymedia includes a recode_audio.py example that allows arbitrary input and output formats, available here. This of course requires installing pymedia as well.
Note that, as Nick T notes, if you just change the sample rate without resampling you'll get high-pitched 'fast' audio, so you'll want to employ time-stretching in combination with the sample-rate change.
You can try the _spawn method from audio_segment.py in pydub. Here is an example:

from pydub import AudioSegment
import os

def speed_change(sound, speed=1.0):
    # The same samples tagged with a different frame rate play back
    # faster or slower (with a matching pitch shift).
    return sound._spawn(sound.raw_data, overrides={"frame_rate": int(sound.frame_rate * speed)})

in_path = 'your/path/of/input_file/hello.mp3'
out_dir = 'your/path/of/output_dir'

sound = AudioSegment.from_file(in_path)

# generate a slower audio file, for example
slower_sound = speed_change(sound, 0.5)
slower_sound.export(os.path.join(out_dir, 'slower.mp3'), format="mp3")