Seeking to random points in a video file with OpenCV seems to be much slower than in media players like Windows Media Player or VLC. I am trying to seek to different positions in a video file encoded in H.264 (MPEG-4 AVC, Part 10) using VideoCapture, and the time taken to seek seems to be proportional to the frame number queried. Here's a small code example of what I'm trying to do:
import cv2
cap = cv2.VideoCapture('example_file')
frame_positions = [200, 400, 8000, 200000]
for frame_position in frame_positions:
    cap.set(cv2.cv.CV_CAP_PROP_POS_FRAMES, frame_position)  # the property is POS_FRAMES, not FRAMES
    ret, img = cap.read()  # read() returns a (success, frame) tuple
    cv2.imshow('window', img)
    cv2.waitKey(0)
The perceived delays before the images from the code above are displayed are proportional to the frame number: frames 200 and 400 show barely any delay, 8000 has some noticeable lag, and 200000 takes almost half a minute.
Why isn't OpenCV able to seek as "quickly" as, say, Windows Media Player? Could it be that OpenCV is not using the FFMPEG codecs correctly while seeking? Would building OpenCV from source with some alternate configuration for codecs help? If so, could someone tell me what that configuration could be?
I have only tested this on Windows 7 and 10 PCs, with the prebuilt OpenCV binaries and the relevant FFMPEG DLLs on the system path.
Another observation: With OpenCV (binaries) versions greater than 2.4.9 (for example 2.4.11 and 3.3.0), the first seek works, but not the subsequent ones. That is, it can seek to frame 200 from the above example, but not to 400 and the rest; the video just jumps back to frame 0. But since it works for me with 2.4.9, I'm happy for now.
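For reference, the frame-position property is named cv2.cv.CV_CAP_PROP_POS_FRAMES in the 2.4 Python bindings and cv2.CAP_PROP_POS_FRAMES from 3.x on; a small version-agnostic sketch, using the same example file as above:

import cv2

cap = cv2.VideoCapture('example_file')

# Pick the right constant name depending on the installed OpenCV version.
if hasattr(cv2, 'CAP_PROP_POS_FRAMES'):
    pos_frames = cv2.CAP_PROP_POS_FRAMES        # OpenCV 3.x and newer
else:
    pos_frames = cv2.cv.CV_CAP_PROP_POS_FRAMES  # OpenCV 2.4 bindings

cap.set(pos_frames, 200)
ret, img = cap.read()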
GPU acceleration should not matter for seeking, because you are not decoding frames. In addition, even if you were decoding frames, doing so on the GPU would be slower than on the CPU, because your CPU nowadays has video codecs "soldered" into the chip, which makes video decoding very fast, and there would have to be some book-keeping to shovel data from main memory into the GPU.
It sounds like OpenCV implements a "safe" way of seeking: Video files can contain stream offsets. For example, your audio stream may be set off against your video stream. As another example, you might have cut away the beginning of a video and saved the result. If your cut did not happen precisely at a key frame, video editing software like ffmpeg will include a small number of frames before your cut in the output file, in order to allow the frame at which your cut happened to be decoded properly (for which the previous frames might be necessary). In this case, too, there will be a stream offset.
In order to make sure that such offsets are interpreted the right way, that is, to really hit exactly the desired frame relative to "time 0", the only "easy" but expensive way is to actually read and decode all the video frames. And that's apparently what OpenCV is doing here. Your video players don't bother with this, because everyday users don't notice and the seek controls in the GUI are much too imprecise anyway.
I might be wrong about this. But answers to other questions and some experiments I conducted to evaluate them showed that only the "slow" way of counting the frames in a video gave accurate results.
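A quick sanity check is to time each seek on your own file; a minimal sketch along those lines, using the file and frame numbers from the question and the OpenCV 3.x constant name:

import time
import cv2

cap = cv2.VideoCapture('example_file')  # placeholder file name
for frame_position in [200, 400, 8000, 200000]:
    start = time.time()
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_position)
    ret, img = cap.read()  # the decoding triggered here is where the time goes
    elapsed = time.time() - start
    print('seek to frame %d took %.2f s (read ok: %s)' % (frame_position, elapsed, ret))

If the printed times grow roughly linearly with the frame number, that is consistent with the "decode everything up to the target" behaviour described above.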
It's likely because that is a very basic code example and the mentioned applications are doing something more clever.
A few points:
Windows Media Player has hardware acceleration
Windows Media Player almost definitely uses your GPU; you could try disabling this to see what difference it makes
VLC is an open source project, so you could check out its code to see how it does video seeking
VLC probably also uses your GPU
OpenCV provides GPU functions that will most likely make your code much quicker
If speed for seeking is important, you almost definitely want to work with the GPU when doing video operations:
https://github.com/opencv/opencv/blob/master/samples/gpu/video_reader.cpp
Here are some related github issues:
https://github.com/opencv/opencv/issues/4890
https://github.com/opencv/opencv/issues/9053
Re-encode your video with ffmpeg. It works for me.
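Re-encoding most likely helps because the new file gets more frequent keyframes, so any seek has fewer frames to decode before reaching the target. A rough sketch of driving ffmpeg from Python; the keyframe interval of 30 and the file names are placeholders:

import subprocess

# -g sets the GOP size (maximum keyframe interval); smaller values mean
# more keyframes, slightly larger files, and much cheaper seeks.
subprocess.run([
    'ffmpeg', '-i', 'example_file.mp4',
    '-c:v', 'libx264', '-g', '30',
    '-c:a', 'copy',
    'example_file_reencoded.mp4',
])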
I am using a video which has around 30000 frames, trying to use the below FER code for emotion recognition
The entire process is taking anywhere between 10 and 15 hours just to analyze the video.
Is there a way to speed up the processing time or any other algorithm to detect facial emotion?
Here is the code:
from fer import Video
from fer import FER
import os
import sys
import pandas as pd
location_videofile = "/Users/Akash/Desktop/videoplayback.mp4"
input_video = Video(location_videofile)
face_detector = FER(mtcnn=True)  # detector instance (assumed; its creation was not shown in the original snippet)
processing_data = input_video.analyze(face_detector, display=False, frequency=5)
I tried adding the frequency parameter to the analyze function as well, but it was of no use since the processing time is pretty much the same; I am assuming it affects the output and not the analyze step itself.
With the following answer I will give you several solutions that may or may not work with your particular video.
The FER code relies on tensorflow and opencv for processing the data.
Assuming a default installation of these packages through pip, tensorflow is already running on gpu (you may want to double check that), while opencv is not.
Some of the functionalities of opencv can run on gpu and they may be the ones that FER is using: in this case, you may want to build the opencv package with GPU support (you can take a look here).
Another solution is to downsample the frames of your video yourself before supplying it to FER.
Downsample each frame of the video in order to reduce the number of pixels in each frame. This may give a huge speed-up, if you can afford it (i.e. faces occupy much of the screen and the number of frame pixels is relatively high); a sketch follows after this list.
Multiprocessing. You could split the video into several mini-videos that you can analyse with multiple Python processes. In my opinion, this is the cheapest and most reliable way to deal with the speed issue without losing accuracy.
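A minimal sketch of the downsampling idea: read the video with OpenCV, shrink each frame, and write a smaller file that you then pass to FER. The 0.5 scale factor and the output file name are placeholders:

import cv2

src = cv2.VideoCapture("/Users/Akash/Desktop/videoplayback.mp4")
fps = src.get(cv2.CAP_PROP_FPS)
scale = 0.5  # assumed factor; tune it so faces stay large enough to detect

ret, frame = src.read()
h, w = frame.shape[:2]
out_size = (int(w * scale), int(h * scale))
dst = cv2.VideoWriter("videoplayback_small.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, out_size)

while ret:
    dst.write(cv2.resize(frame, out_size))
    ret, frame = src.read()

src.release()
dst.release()
# then: Video("videoplayback_small.mp4").analyze(face_detector, display=False)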
I'm building a simple Python application that involves altering the speed of an audio track.
(I acknowledge that changing the frame rate of audio also makes the pitch sound different, and I do not care about the pitch being altered.)
I have tried using the solution from abhi krishnan using pydub, which looks like this:
from pydub import AudioSegment
sound = AudioSegment.from_file(…)
def speed_change(sound, speed=1.0):
    # Manually override the frame_rate. This tells the computer how many
    # samples to play per second
    sound_with_altered_frame_rate = sound._spawn(sound.raw_data, overrides={
        "frame_rate": int(sound.frame_rate * speed)
    })
    # convert the sound with altered frame rate to a standard frame rate
    # so that regular playback programs will work right. They often only
    # know how to play audio at standard frame rate (like 44.1k)
    return sound_with_altered_frame_rate.set_frame_rate(sound.frame_rate)
However, the audio with changed speed sounds distorted or crackly, which is not the case when doing the same in Audacity, and I hope to find a way to reproduce in Python how Audacity (or other digital audio editors) changes the speed of audio tracks.
I presume the quality loss is caused by the original audio having a low frame rate (8 kHz), and that .set_frame_rate(sound.frame_rate) tries to resample the speed-altered audio at that original, low frame rate. Simple attempts at setting the frame rate of the original audio, of the speed-altered one, and of the one to be exported didn't work out.
Is there a way in Pydub or in other Python modules that perform the task in the same way Audacity does?
Assuming what you want to do is to play audio back at, say, 1.5x the speed of the original: this is synonymous with resampling the audio down to 2/3 of the original number of samples and pretending that the sampling rate hasn't changed. Assuming this is what you are after, I suspect most DSP packages would support it (search for "audio resampling" as the key phrase).
You can try scipy.signal.resample_poly()
import numpy as np
from scipy.signal import resample_poly
samples = np.array(sound.get_array_of_samples())  # raw_data is bytes; resample_poly needs a numeric array
dec_data = resample_poly(samples, up=2, down=3)
dec_data should have 2/3 of the number of samples of the original audio. If you play dec_data at the sound's original sampling rate, you should get a sped-up version. The downside of resample_poly is that it needs a rational factor, and a large numerator or denominator will make the output less ideal. You can also try scipy's resample function or look for other packages that support audio resampling.
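If you then want a playable result, the resampled array can be wrapped back into an AudioSegment; a sketch assuming a mono track with 16-bit samples (sample_width == 2), with a placeholder output name:

import numpy as np
from pydub import AudioSegment

# Wrap the resampled samples back into an AudioSegment so it can be exported
# or played back; the int16 cast assumes 16-bit samples.
sped_up = AudioSegment(
    data=dec_data.astype(np.int16).tobytes(),
    sample_width=sound.sample_width,
    frame_rate=sound.frame_rate,
    channels=sound.channels,
)
sped_up.export("sped_up.wav", format="wav")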
My goal is to be able to detect a specific noise that comes through the speakers of a PC using Python. That means the following, in pseudo code:
Sound is being played out of the speakers, by applications such as games for example,
ny "audio to detect" sound happens, and I want to detect that, and take an action
The specific sound I want to detect can be found here.
If I break that down, I believe I need two things:
A way to sample the audio that is being streamed to an audio device
I actually have this bit working, using the code found here: https://gist.github.com/renegadeandy/8424327f471f52a1b656bfb1c4ddf3e8. It is based on the sounddevice plotting example, which I combine with an audio loopback device. This allows my code to receive a callback with the data that is played to the speakers.
A way to compare each sample with my "audio to detect" sound file.
The detection does not need to be exact - it just needs to be close. For example, there will be lots of other noises happening at the same time, so it's more about being able to detect the footprint of the "audio to detect" within an audio stream containing a variety of sounds.
Having investigated this, I found technologies mentioned in this post on SO and also this interesting article on Chromaprint. The Chromaprint article uses fpcalc to generate fingerprints, but because my "audio to detect" is only around 1-2 seconds long, fpcalc can't generate a fingerprint for it. I need something that works over shorter time spans.
Can somebody help me with problem #2 as detailed above?
How should I attempt this comparison (ideally with a little example), based on my sampling using sounddevice in the audio_callback function?
Many thanks in advance.
I want to use OpenCV Python to do SIFT feature detection on remote sensing images. These images are high resolution and can be thousands of pixels wide (7000 x 6000 or bigger). I am having trouble with insufficient memory, however. As a reference point, I ran the same 7000 x 6000 image in Matlab (using VLFEAT) without memory error, although larger images could be problematic. Does anyone have suggestions for processing this kind of data set using OpenCV SIFT?
OpenCV Error: Insufficient memory (Failed to allocate 672000000 bytes) in cv::OutOfMemoryError, file C:\projects\opencv-python\opencv\modules\core\src\alloc.cpp, line 55
OpenCV Error: Assertion failed (u != 0) in cv::Mat::create, file
(I'm using Python 2.7 and OpenCV 3.4 in the Spyder IDE on a Windows 64-bit with 32 GB of RAM.)
I would split the image into smaller windows. So long as you know the windows overlap (I assume you have an idea of the lateral shift), the match in any window will be valid.
You can even use this as a check: the translation between feature points in any part of the image must be the same for the transform to be valid.
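A minimal sketch of that tiling approach, assuming OpenCV 3.4 with the contrib xfeatures2d module; the 2000-pixel window, the 200-pixel overlap, and the file name are arbitrary, and keypoints are shifted back into full-image coordinates:

import cv2

img = cv2.imread("remote_sensing_scene.tif", cv2.IMREAD_GRAYSCALE)  # placeholder file
sift = cv2.xfeatures2d.SIFT_create()

tile, overlap = 2000, 200  # assumed window size and overlap in pixels
all_keypoints, all_descriptors = [], []

h, w = img.shape[:2]
for y in range(0, h, tile - overlap):
    for x in range(0, w, tile - overlap):
        window = img[y:y + tile, x:x + tile]
        kps, desc = sift.detectAndCompute(window, None)
        # shift keypoint coordinates from window space back to full-image space
        shifted = [cv2.KeyPoint(kp.pt[0] + x, kp.pt[1] + y, kp.size) for kp in kps]
        all_keypoints.extend(shifted)
        if desc is not None:
            all_descriptors.append(desc)

Keypoints detected twice in the overlap regions may need to be deduplicated before matching.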
There are a few flavours of how to approach SIFT corner detection in this case:
process a single image per unit of time on one core;
multiprocess 2 or more images per unit of time on a single core;
multiprocess 2 or more images per unit of time on multiple cores.
Read "cores" as either CPU or GPU. Threading results in serial processing instead of parallel (in Python, because of the GIL).
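For options 2 and 3, a rough sketch using a process pool so that each image is handled by its own CPU core; the glob pattern and pool size are placeholders, and only a keypoint count is returned because SIFT keypoint objects do not pickle well across processes (assumes a contrib build providing xfeatures2d):

import glob
from multiprocessing import Pool

import cv2

def detect(path):
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    sift = cv2.xfeatures2d.SIFT_create()
    kps, desc = sift.detectAndCompute(img, None)
    return path, len(kps)

if __name__ == "__main__":
    paths = glob.glob("images/*.tif")      # placeholder pattern
    pool = Pool(processes=4)               # one worker per core you want to use
    for path, n in pool.map(detect, paths):
        print(path, n, "keypoints")
    pool.close()
    pool.join()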
As stated, Rebecca has at least 32 GB of internal memory at her disposal on her PC, which is more than sufficient for option 1, processing each image in one go.
So in that light, splitting a single image as suggested by Martin should be a last resort in my opinion.
Why should you avoid splitting a single image into multiple windows during feature detection (when you are not running out of memory)?
Answer:
If a corner is located at the split side of a window boundary, it effectively becomes two or more roughly polygonal, straight-line-like shapes, and you won't find the corner you're looking for unless you have a specialized algorithm to search for those anomalies.
In this case:
In Rebecca's case it's crucial to know which approach she took to processing the image(s): was it one, two, or many more images loaded into memory simultaneously?
If hundreds or thousands of images are simultaneously loaded into memory... you're basically choking the system by taking away its breathing space (in the form of free memory). In addition, we're not talking about other programs that are loaded into memory and claim (reserve) or consume memory space for various background programs. That comes on top of the issue at hand.
Overthinking:
If, as suggested by Martin, there is an issue with the OpenCV library in handling the amount of image data described by Rebecca, do some debugging and then report your findings to OpenCV, or post a question here on SO as she did. But also post code that shows how you handle the image processing from the start; as explained above, that is important. And yes, as Martin stated, don't post wrappers; that is pointless. A referral link (with a version number if possible) is more than enough, or a tag ;-)
I have a video file, and all I want for now is to put all of the video's frames into a Python list. I am using Python's OpenCV library to do it, but my laptop can never finish: it just gets stuck and I have to cut the power to restart it. My guess is that the Python list is unable to handle all the frames due to insufficient memory. Here is the code, and I believe the syntax is right for what I want. What I need to know is why the laptop gets stuck, and a solution other than using a list.
import cv2
video = cv2.VideoCapture("myvideo.mp4")
all_frames = []
while True:
    ret, frame = video.read()
    if ret:
        all_frames.append(frame)
        continue
    break
Below is some data about the video that might help you:
the video contains 7000 frames;
every frame has dimensions (1080, 1920).
You can't afford to do that this way.
When reading, the frames are uncompressed from the .mp4 into raw output, something like 3 bytes per pixel.
So you want to store 7000*3*1080*1920 bytes in total, which is roughly 43 GB!
Not to mention that the constant resizing of the list caused by append creates even more copies, so even if you had the memory available, this would take very long.
The idea behind this program is probably to analyse the frames. So basically you don't need all the frames in memory at the same time.
In that case, read a small number of them (into a revolving buffer), perform your shape detection or whatever analysis you need, store the analysed data (which is much smaller), drop the raw frames, and repeat. (Programs performing real-time analysis cannot store all the data, because they run forever.)
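A minimal sketch of that pattern, with a fixed-size deque as the revolving buffer; the buffer length of 30 and the per-frame mean are placeholders for whatever analysis you actually run:

from collections import deque

import cv2

video = cv2.VideoCapture("myvideo.mp4")
buffer = deque(maxlen=30)   # keeps only the last 30 frames, for analyses that need a short history

results = []
while True:
    ret, frame = video.read()
    if not ret:
        break
    buffer.append(frame)
    # placeholder analysis: store something small per frame instead of the frame itself
    results.append(frame.mean())

video.release()
print(len(results), "frames analysed")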