I'm working on a code, which reads incoming videos from Raspberry Pi, performs face detection on the frames, places frames around the faces, and then write backs the frames into an MP4 file with the same FPS. I use OpenCV to open and read from the PiCam.
When I looked into the saved video, it looks like it's moving too fast. I let my code to run for around 2 minutes, but my video has a length of 30 second. When I disable all post-processings (face detection), I can observe stable speed on the output video.
I can understand that Raspberry Pi has a small processor for heavy computations, but cannot understand why the video length is shorter? Is it possible that my face detection pipeline running much slower than the camera FPS, so the camera buffer should drop frames that are not going to be grabbed by the pipeline in a timely-fashion?
Any help here is highly appreciated!
Related
I am using Python 3.9 and Open-CV (cv2) to read frames from a video stream and save them as JPGs.
My program seems to run OK. It captures the video stream fine, obtains frames, and saves them as JPGs.
However, the frames it is obtaining from the stream are out-of-date - sometimes by several minutes. The clock in the video stream is running accurately, but the clock displays in the JPGs are all identical (to the second - but one or more minutes prior to the datetime in the program's "print()" output (and the saved JPG file time), and moving objects that were in view at the time they were saved are missing completely.
Strangely:
The JPG images are not identical in size. They grow by 10K - 20K as the sequence progresses. Even though they look identical to the eye, they show significant difference when compared using CV2 - but no difference if compared using PIL (which is about 10 - 15 times slower for image comparisons).
The camera can be configured to send a snapshot by email when it detects motion. These snapshots are up-to-date, and show moving objects that were in frame at the time (but no clock display). Enabling or disabling this facility has no effect on the out-of-date issue with JPGs extracted from the video stream. And, sadly, the snapshots are only about 60K, and too low resolution for our purposes (which is an AI application that needs images to be 600K or more).
The camera itself is ONVIF - and things like PTZ work nicely from Python code. Synology Surveillance Station works really well with it in every aspect. This model has reasonably good specs - zoom and good LPR anti-glare functionality. It is made in China - but I don't want to be 'a poor workman who blames his tools'.
Can anyone spot something in the program code that may be causing this?
Has anyone encountered this issue, and can suggest a work-around or different library / methodology?
(And if it is indeed an issue with this brand / model of camera, you are welcome to put in a plug for a mid-range LPR camera that works well for you in an application like this.)
Here is the current program code:
import datetime
from time import sleep
import cv2
goCapturedStream = None
# gcCameraLogin, gcCameraURL, & gcPhotoFolder are defined in the program, but omitted for simplicity / obfuscation.
def CaptureVideoStream():
global goCapturedStream
print(f"CaptureVideoStream({datetime.datetime.now()}): Capturing video stream...")
goCapturedStream = cv2.VideoCapture(f"rtsp://{gcCameraLogin}#{gcCameraURL}:554/stream0")
if not goCapturedStream.isOpened(): print(f"Error: Video Capture Stream was not opened.")
return
def TakePhotoFromVideoStream(pcPhotoName):
llResult = False ; laFrame = None
llResult, laFrame = goCapturedStream.read()
print(f"TakePhotoFromVideoStream({datetime.datetime.now()}): Result is {llResult}, Frame data type is {type(laFrame)}, Frame length is {len(laFrame)}")
if not ".jpg" in pcPhotoName.lower(): pcPhotoName += ".jpg"
lcFullPathName = f"{gcPhotoFolder}/{pcPhotoName}"
cv2.imwrite(lcFullPathName, laFrame)
def ReleaseVideoStream():
global goCapturedStream
goCapturedStream.release()
goCapturedStream = None
# Main Program: Obtain sequence of JPG images from captured video stream
CaptureVideoStream()
for N in range(1,7):
TakePhotoFromVideoStream(f"Test{N}.jpg")
sleep(2) # 2 seconds
ReleaseVideoStream()
Dan Masek's suggestions were very valuable.
The program (now enhanced significantly) saves up-to-date images correctly, when triggered by the camera's inbuilt motion detection (running in a separate thread and communicating through global variables).
The key tricks were:
A much faster loop reading the frames (and discarding most of them). I reduced the sleep to 0.1 (and even further to 0.01), and saved relatively few frames to JPG files only when required
Slowing down the frame rate on the camera (from 25 to 10 fps - even tried 5 at one point). This meant that the camera didn't get ahead of the software and send unpredictable frames.
I am using flask-python framework , opencv and face_recognition library to do face recognition. I'm trying to run the webcam using python -opencv and if a face is detected in the frame, it is sent to the backend where the frame is processed and recognition of the person is performed but this api call with response is taking around 0.3 seconds , this delay is causing a lag in the smooth flow of webcam.
what I would like to have is the webcam run continuously and when the face is detected in the frame it should not block the free flow of webcam streaming, instead the processing of this frame should be handled separately in the background while the main thread is running webcam.
Any help in this regard is much appreciated
Theres a number of things that you can do to resolve this issue,
Dont pass every frame into detection, chances are that you are using a good webcam, which gives you 30FPS or higher, you dont need such a high FPS for real time detection, what you could do is that, only send every 3rd or 4th frame for detection, and for the other frames, just draw the old boxes to the stream, this will give you accurate enough predictions
Shift your model to GPU, face_recognition gives you the choice between using a CPU and GPU(cnn only), shifting it onto any available GPU's will speed up the process
Resize your input frame, you really don't need a 1080p HD frame to detect frames, anything smaller like 480p will also do (though this is entirely dependent on your problem), resize your frame and make it smaller, the smaller the input matrix the quicker it is
Vectorize your operations, instead of using the default face_recognition API for identity comparisons, i suggest you code a vectorized version of comparisons in numpy that calculates the L2 distance in large matrices, instead of having to do it in a loop, this will save alot of computation cost.
Limit the number of comparisons per person. You could have 10 embeddings of person A and 10 of person B, finding the distance of the unknown embeddings with each of these 20 embeddings, can be computationally expensive, resort to doing something like, taking average embeddings, or the centroids of these embeddings, by clustering them
I'm currently using the sony QX1 for wireless transfers for large images. The camera is being triggered over the USB port. Pictures from the camera are being transferred with URLLib to a raspberry pi. (I can't use the api to trigger the camera. It has to be from this external source.)
The camera is triggered around every 2.5 seconds. Through timing testing it seems like I'm able to get the larger picture back to the pi at ~ 3.2 seconds per image.
I've noticed that when the camera is triggered my transfer is terminated. I'm assuming this has to do with the embedded design of the camera itself and there isn't a way to get around this but please correct me if I'm wrong!
Does the camera support the range header? Basically I grab the image size from the header. I'm trying to grab the beginning X bytes until the camera triggers again then grab the next X bytes until I get the entire image.
Thanks for the help and let me know if I need to give a deeper explanation of what is going on here.
I don't know about the range header, but it will still not allow you to take more pictures than your downloadspeed allows (unless you have some larger than 2.5 seconds intervals now and then).
Maybe you can reduce the image resolution to a size that fits into the 2.5 sec interval? Or (just some thinking outside of the box:-) use 2 QX1's switching, so you get a 5 second interval for each...
I am looking into image processing using an SJ4000 camera, linked up via USB to a Raspberry Pi (running Raspbian Jessie) for image processing with OpenCV in Python. I have achieved quite a bit using my webcam but now need to port it into the SJ4000's environment, however I am stuck at this hurdle.
The code I've used is identical to the answer to this question: rotated face detection.
On my laptop's webcam, I get a reasonably good framerate. When the SJ4000 is connected to my laptop via USB as well, I get a good framerate. However, on the Raspberry Pi, when I execute the same code, the image is just frozen for some reason. I then need to force quit the video viewer window which shows up as it's simply frozen.
EDIT 1: After closing the Spyder IDE and loading it up again a few times, and executing the same code, I can see a feed, but the framerate is very low (2-3 seconds per frame) and it will just freeze after some time.
EDIT 2: I've done further testing and find that when I include the face detection code, it takes a long time for the feed to be displayed as there is a TEN second delay. When I forward the feed live without any processing, it's very responsive.
How should I get around this? Is the only way getting a more powerful processor?
Thanks for any help!
Like others said, face detection is very computationally expensive using HOG/Haar descriptors. You won't be able to do real time face detection on the Raspberry Pi. On my Raspberry Pi 3, I can do human body detection on a 300x300 image at around 5 fps.
What I recommend is: Do motion detection. When motion is detected, start face detection.
Further optimization can be done by running face detection in its own thread, and have motion detection feed a FIFO of frames to be analyzed by face detector if motion is detected in a frame. That way, your face detector can operate asynchronously, and not hold up the main thread capturing the video frames, and doing motion detection.
I want to build a webcam based 3D scanner, since I'm going to use a lot of webcams I doing tests before.
I have orderer 3 exact camera that I will drive in python to take snapshot at the same time.
Obviously the bus is going to be saturated when there will be 50 of them.
What I want to know is if the camera are able to hold the picture until they are transfered to the computer.
To simulate this behavior I'd like to slow down the USB bus and make a snapshot with 3 camera,
I'm under windows 7 pro, is this possible?
Thanks.
PS : couldn't I saturate the USB BUS by pluggin some USB external harddrive and doing some file transfert?
What I want to know is if the camera are able to hold the picture until they are transfered to the computer.
That depends on the camera model, but since you mention in your post you are using "webcams", then the answer is almost certainly no. You could slow down the requests you make to the camera to take a picture though.
This sequence of events is possible:
wait
request camera takes picture
camera returns picture as normal
wait
This sequence of events is not possible (with webcams at least)
wait
request camera takes picture
wait
camera returns picture at a significantly later time that you want
to have control over
wait
If you need the functionality displayed in the last sequence I provide (a controllable time between capture and readout of the picture) you will need to upgrade to a better camera, such as a machine vision camera. These cameras usually cost considerably more than webcams and are unlikely to interface over USB (though you might find some that do).
You might be able to find some other solution to your problem (for instance what happens if you request 50 photos from 50 cameras nd saturate the USB bus? Do the webcams you have buffer the data well enough so that it achieves your ultimate goal, or does this affect the quality of the picture?)