I am developing a GUI with PyQt5 and I am stuck.
Because my program is running on a Raspberry Pi 4, I have limited processing power. I am getting video input from my webcam and want to perform face_recognition operations on this input. Due to the limited processing power I need to ignore a lot of input frames and use only every n-th frame for face recognition, to speed up the process.
I tried to program a delay similar to this thread: Call function every x seconds (Python)
but it didn't work. Is there a way to refer to a specific frame directly?
This is the function where I am reading from the webcam:
def run(self):
    checker = 0
    process_this_frame = 0
    # capture from web cam
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 300)
    while True:
        ret, cv_img = cap.read()
        if ret:
            img = cv2.resize(cv_img, (0, 0), fx=0.25, fy=0.25)
            process_this_frame = process_this_frame + 2
            print('process_this_frame: ', process_this_frame)
            if process_this_frame % 20 == 0:
                predictions = predict(img, model_path="trained_knn_model.clf")
                print('showing predicted face')
                cv_img = show_prediction_labels_on_image(cv_img, predictions)
                checker = 1
                self.change_pixmap_signal.emit(cv_img)
            else:
                checker = 0
                self.change_pixmap_signal.emit(cv_img)
Specifically, I am looking for a suitable if condition that executes the predict function only on every n-th frame, while the else case just displays the plain frame when predict is not run on cv_img. I tried multiple modulo conditions but did not find a suitable solution.
How can I do that? It would be nice to refer to a frame count instead of using a time delay, so I can experiment to find the best value for n.
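One way that avoids timing altogether is a plain frame counter with a modulo check. The following is a minimal sketch, not a drop-in replacement: it reuses the question's predict and show_prediction_labels_on_image functions, and n = 10 is an arbitrary starting value to tune on the Pi.
def run(self):
    n = 10  # run recognition on every n-th frame; tune for the Pi
    frame_count = 0
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 300)
    while True:
        ret, cv_img = cap.read()
        if not ret:
            continue
        frame_count += 1
        if frame_count % n == 0:
            # heavy path: recognize on a downscaled copy, label the full frame
            small = cv2.resize(cv_img, (0, 0), fx=0.25, fy=0.25)
            predictions = predict(small, model_path="trained_knn_model.clf")
            cv_img = show_prediction_labels_on_image(cv_img, predictions)
        # emit every frame so the GUI stays live even when recognition is skipped
        self.change_pixmap_signal.emit(cv_img)
Every frame is still emitted to the GUI; only the recognition work is throttled.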
We are designing an algorithm that builds a Merkle tree from a list of perceptual hashes. The hashes are generated for every frame that we capture from a video. The incentive behind this is that we should be able to match hashes even if the video format has changed.
To verify this, we had two videos: Video.mp4 and Video.avi. We extracted frames at 30 fps and ran pHash over these frames. To test our functionality, it is imperative that the two images at every instant (one from .mp4 and one from .avi) stay the same. However, there are still some differences between those two images.
Including code for reference:
Extract frames from video:
import os

import cv2
import imagehash
import numpy as np
from PIL import Image

def extract_frames(file_path, write_to_path, fps=30):
    cap = cv2.VideoCapture(file_path)
    count = 0
    os.mkdir(f'{write_to_path}/frames')
    while cap.isOpened():
        ret, frame = cap.read()
        if ret:
            grayed_image = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            cv2.imwrite(f"{write_to_path}/frames/frame{count}.bmp", grayed_image)
            count += fps  # i.e. at 30 fps, this advances one second
            cap.set(1, count)  # 1 is cv2.CAP_PROP_POS_FRAMES: jump to that frame
        else:
            cap.release()
            break
    print(f"Frame Extraction complete. Extracted {count // fps} frames.")
    return count
Test if two images are similar
def check_images(path_1, path_2):
    img_1 = cv2.imread(path_1, 0)
    img_2 = cv2.imread(path_2, 0)
    if img_1.shape == img_2.shape:
        difference = cv2.subtract(img_1, img_2)
        print(difference)
        result = not np.any(difference)
        return result
    print("Unequal shapes, ", img_1.shape, img_2.shape)
    return False
The Perceptual hash function
def generate_p_hashes(count, frame_path, fps=30):
    count_two = 0
    hashes = []
    # fileToWrite = open('/content/hash.txt', 'a')
    while count_two != count:
        temp_hash = imagehash.phash(Image.open(f"{frame_path}/frames/frame{count_two}.bmp"))
        count_two += fps
        str_temp_hash = str(temp_hash)
        hashes.append(str_temp_hash)
    print(f"PHash generation complete. Generated {count_two // fps} hashes")
    return hashes
Imagehash is a Python package available at: https://github.com/JohannesBuchner/imagehash
The images:
a. Frame captured from .avi file:
b. Frame captured from .mp4 file:
Here's what I've tried:
Convert image to grayscale so color channels are excluded.
Try all different image formats (JPEG, PNG with compression 0, TIFF, BMP)
Sample output:
What is the best way to store these images, so that irrespective of the video source that I am extracting from, the image will stay the same ?
Lossy-compressed files or video streams produced by different codecs will never give you exactly the same pixels from the same original source; that is simply not possible. With a high compression ratio, the images can be quite different.
If the goal is to authenticate, watermark or detect copies, you need to use features that are robust to lossy compression and decompression.
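For instance, rather than requiring pixel-identical frames, the matching step can compare the perceptual hashes themselves and tolerate a small Hamming distance. A minimal sketch using the same imagehash package; the threshold of 8 bits is an assumption to tune on real data, not a recommended value.
import imagehash
from PIL import Image

def frames_match(path_1, path_2, max_distance=8):
    # subtracting two ImageHash objects gives the Hamming distance between them
    hash_1 = imagehash.phash(Image.open(path_1))
    hash_2 = imagehash.phash(Image.open(path_2))
    return (hash_1 - hash_2) <= max_distance
A Merkle tree built over exact hash strings will still diverge on a single flipped bit, so a distance-based comparison like this has to happen before, or instead of, exact equality at the leaves.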
I'm new to programming and I can't seem to figure out how to correctly optimise my project. I have a function which takes two images and uses OpenCV to stitch them together. This process usually takes about 0.5 seconds per image pair, and I would like to optimise it so that the images are stitched together at a faster rate.
At the moment I have two arrays, each containing 800 images, and a function called stitch_images which stitches each corresponding pair together. However, this function uses a while loop to go through each image and stitch it to its counterpart, and this seems to be causing me issues as the while loop blocks the process. I'm also using two shared global variables which contain the images.
Theoretically, what I would like to achieve is four processes, where each process takes a set of images and works on it, effectively reducing the computational time to a quarter.
My question is: how would I go about achieving this? I understand that there are multiple approaches to parallelism in Python, such as threading, multiprocessing and queues. Which would be the best option for me, and if there is an easy way to implement it, would anyone have any example code?
This is my current setup:
import multiprocessing
import time
import cv2

# Global variables:
frames_1 = []
frames_2 = []
panorama = []

# converting the video into frames for individual image processing
def convert_video_to_frames():
    cap = cv2.VideoCapture("Sample_video_1.mp4")
    ret = True
    while ret:
        ret, img = cap.read()  # read one frame from the 'capture' object; img is (H, W, C)
        if ret:
            frames_1.append(img)
    cap = cv2.VideoCapture("Sample_video_2.mp4")
    ret = True
    while ret:
        ret, img = cap.read()  # read one frame from the 'capture' object; img is (H, W, C)
        if ret:
            frames_2.append(img)
    return frames_1, frames_2

# converting final output images back to video
def convert_frames_to_video():
    print("now creating stitched image video")
    height, width, layers = panorama[0].shape
    size = (width, height)
    out = cv2.VideoWriter('project.avi', cv2.VideoWriter_fourcc(*'DIVX'), 15, size)
    for i in range(len(panorama)):
        out.write(panorama[i])
    out.release()

def stitch_images():
    print("image processing starting...")
    stitcher = cv2.Stitcher_create(cv2.STITCHER_PANORAMA)
    while len(frames_1) != 0:
        status, result = stitcher.stitch((frames_1.pop(0), frames_2.pop(0)))
        if status == 0:  # pass
            panorama.append(result)
        else:
            print("image stitching failed")

if __name__ == '__main__':
    convert_video_to_frames()  # dummy function
    start = time.perf_counter()
    stitch_images()
    finish = time.perf_counter()
    print(f'finished in {round(finish - start, 2)} seconds(s)')
    print("now converting images to video...")
    convert_frames_to_video()
Also, I've attempted to use multiprocessing with locks to achieve this by adding:
p1 = multiprocessing.Process(target=stitch_images)
p2 = multiprocessing.Process(target=stitch_images)
p1.start()
p2.start()
p1.join()
p2.join()
but when I run this it seems to skip the while loop altogether?
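A likely cause is that each child process receives its own copy of the module-level lists (on Windows the module is re-imported with empty frames_1 and frames_2, and even with fork the pops are not shared back), so the while condition is false immediately. Here is a hedged sketch of one alternative, passing the frame pairs explicitly to a multiprocessing.Pool instead of relying on shared globals; the worker name, pool size and chunksize are assumptions, and it reuses convert_video_to_frames from the code above.
import multiprocessing
import cv2

def stitch_pair(pair):
    # each worker builds its own Stitcher; it cannot be shared across processes
    frame_1, frame_2 = pair
    stitcher = cv2.Stitcher_create(cv2.STITCHER_PANORAMA)
    status, result = stitcher.stitch((frame_1, frame_2))
    return result if status == 0 else None  # 0 means the stitch succeeded

if __name__ == '__main__':
    frames_1, frames_2 = convert_video_to_frames()
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(stitch_pair, list(zip(frames_1, frames_2)), chunksize=8)
    panorama = [r for r in results if r is not None]
Note that pickling full frames between processes has its own overhead, so the speed-up will be less than 4x and is worth measuring.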
I have written the following code
import cv2
import datetime
import time
import pandas as pd
cascPath = 'haarcascade_frontalface_dataset.xml' # dataset
faceCascade = cv2.CascadeClassifier(cascPath)
video_capture = cv2.VideoCapture('video1.mp4')
frames = video_capture.get(cv2.CAP_PROP_FRAME_COUNT)
fps = int(video_capture.get(cv2.CAP_PROP_FPS))
print(frames) #1403 frames
print(fps) #30 fps
# calculate duration of the video
seconds = int(frames / fps)
print("duration in seconds:", seconds) #46 seconds
df = pd.DataFrame(columns=['Time(Seconds)', 'Status'])
start = time.time()
print(start)
n=5
while True:
    ret, frame = video_capture.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # converts frame to grayscale image
    faces = faceCascade.detectMultiScale(
        gray, scaleFactor=1.1,
        minNeighbors=5,
        minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE
    )
    if len(faces) == 0:
        print(time.time() - start, 'No Face Detected')
        df = df.append({'Time(Seconds)': (time.time() - start), 'Status': 'No Face detected'}, ignore_index=True)
    else:
        print(time.time() - start, 'Face Detected')
        df = df.append({'Time(Seconds)': (time.time() - start), 'Status': 'Face Detected'}, ignore_index=True)
    # Draw a rectangle around the faces
    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    # Display the resulting frame
    cv2.imshow('Video', frame)
    df.to_csv('output.csv', index=False)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        # print(df.head(2))
        break

# When everything is done, release the capture
video_capture.release()
cv2.destroyAllWindows()
If you want to download the video I'm working on, you can download it from here,
and download the haar cascade XML file from here.
I have a few doubts about this:
1. Currently it runs on all 1403 frames of the video. I want to optimize it so that it runs inference only on every n-th frame, where n is customizable; in the code I have set n = 5, so the number of processed frames should be about 1403 / 5 ≈ 280.
2. The timestamps in my CSV are not accurate. I want them to be relative to the video: the Time(Seconds) column should give the time within the video at which each frame occurs, the Status column should hold the detection result (detected/not detected) for that moment, and the column should end at around 46 seconds, the length of the video.
3. My cv2.imshow window is playing the video at roughly 2x speed. I believe I can control the speed with cv2.waitKey(); what would be a suitable parameter for cv2.waitKey so that the output plays at a similar speed to the source video?
Thanks for going through the whole question.
If you want to read every 'n' frames, you can wrap your VideoCapture.read() call in a loop like this:
for a in range(n):
    ret, frame = video_capture.read()
For the timestamps in the csv file, if that came with the dataset I'd trust that. It's possible the camera isn't capturing at a consistent framerate. If you think the framerate is consistent and want to generate the timestamps yourself you can keep track of how many frames you've gone through and divide the video length by that. (i.e. at frame 150 the timestamp would be (150 / 1403) * 46 seconds)
cv2.imshow() just shows frames as fast as the loop runs. This is mostly controlled through cv2.waitKey(milliseconds). If you think the processing that you're doing in the loop takes a negligible amount of time you can just set the time in the waitKey to be ((n / 1403) * 46 * 1000). Otherwise you should use the python time module to track how long the processing takes and subtract that time from the wait.
Edit:
Sorry, I should have been more clear with the first part. That for loop only has the VideoCapture.read() line in it, nothing else. This way you'll read 'n' frames, but only process one out of every 'n' frames. This isn't replacing the overall while loop that you already have. You're just using the for loop to dump the frames you want to skip.
Oh, and you should also have a check for the return value of the read().
if not ret:
    break
The program will probably crash at the end of the video if it doesn't have that check.
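Putting those pieces together, here is a condensed sketch of what the modified loop could look like. It reuses the variables already defined in the question (video_capture, faceCascade, fps, df), derives timestamps from the frame index under the assumption of a constant 30 fps, and drops the rectangle drawing for brevity.
n = 5
frame_idx = 0
while True:
    # read n frames but keep only the last one, dumping the rest
    for _ in range(n):
        ret, frame = video_capture.read()
        if not ret:
            break
        frame_idx += 1
    if not ret:
        break
    video_time = frame_idx / fps  # time within the video, in seconds
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30))
    status = 'Face Detected' if len(faces) else 'No Face Detected'
    df = df.append({'Time(Seconds)': video_time, 'Status': status}, ignore_index=True)
    cv2.imshow('Video', frame)
    # hold each displayed frame for roughly the real-time span of the n frames it represents
    if cv2.waitKey(int(n * 1000 / fps)) & 0xFF == ord('q'):
        break
df.to_csv('output.csv', index=False)
video_capture.release()
cv2.destroyAllWindows()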
I have a scenario where only trucks pass a toll gate, and I want to capture the number plate only when the truck has halted (to get a good quality image to run OCR on). The OCR solution is built, but capturing a frame every time a truck comes to a halt seems tricky to me.
Can you help me with the approach, or similar working code, to achieve this using Python 3.6+ and OpenCV? I'm not willing to run an explicit model to detect motion; a simple background subtraction would do, in order to avoid overhead.
Sample image frame from the video: click here.
Here is the code I'm currently working on. It checks whether the background subtraction between two respective frames exceeds a 10% threshold and, if so, captures the frame. But I have to do just the opposite: if the background subtraction is (near) zero, capture the frame. More logic needs to be added as well, e.g. after capturing a frame, all following static frames (which are true positives) should be skipped until the next truck arrives and comes to a halt.
The code:
import os

import cv2

x_0 = 720
x_1 = 870
y_0 = 190
y_1 = 360

fgbg = cv2.createBackgroundSubtractorMOG2()
cap = cv2.VideoCapture(r"C:\\Users\\aa\\file.asf")
i = 0
while cap.isOpened():
    ret, frame = cap.read()
    if ret:
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame = cv2.GaussianBlur(frame, (21, 21), 0)
        fgmask = fgbg.apply(frame)
        fgmask_crop = fgmask[y_0:y_1, x_0:x_1]
        frame_crop = frame[y_0:y_1, x_0:x_1]
        #out_video.write(frame_crop)
        cv2.imshow("crop", fgmask_crop)
        fg = cv2.copyTo(frame, fgmask)
        bg = cv2.copyTo(frame, cv2.bitwise_not(fgmask))
        pixels = cv2.countNonZero(fgmask_crop)
        image_area = frame_crop.shape[0] * frame_crop.shape[1]
        area_ratio = (pixels / image_area) * 100
        if area_ratio > 10:
            i = i + 1
            print(i)
            target = 'C:\\Users\\aa\\op'
            fileName = "res%d.png" % i
            path_nm = os.path.join(target, fileName)
            cv2.imwrite(path_nm, frame_crop)
        key = cv2.waitKey(25)
        if key == ord('q'):
            break
    else:
        break

cv2.destroyAllWindows()
#out.release()
cap.release()
Any help would be highly appreciated.
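For reference, the capture-on-halt behaviour described above can be expressed as a small state flag: capture only on the first near-static frame after motion, then wait for motion again before capturing anything else. This is a minimal sketch under that assumption; the threshold values and the should_capture name are illustrative, not part of the original code.
def should_capture(area_ratio, was_moving,
                   motion_threshold=10.0, static_threshold=1.0):
    # Returns (capture_now, was_moving) for one iteration of the loop.
    # capture_now is True only on the first static frame after motion,
    # so the remaining static frames of the same halted truck are skipped.
    if area_ratio > motion_threshold:
        return False, True              # truck still moving through the ROI
    if area_ratio < static_threshold and was_moving:
        return True, False              # just halted: capture exactly once
    return False, was_moving            # nothing new in this frame
Inside the loop, the existing "if area_ratio > 10:" branch would be replaced by something like "capture, was_moving = should_capture(area_ratio, was_moving)", writing frame_crop only when capture is True (with was_moving initialised to False before the loop).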
I'm working with videos of eye surgeries in which both a surgical tool and its shadow appear; the frames have a very peculiar lighting condition due to the surgical torch used to light the area being operated on.
I'm trying to detect the shadow of the tool so that I can track it from one frame to the next, but I'm having no success.
What are the most effective and most common techniques to detect shadows?
I tried thresholding to isolate darker areas, CLAHE to enhance contrast, and different colorspaces that better separate intensity and brightness. I also tried background subtraction.
I'd like to have a binary map of the shadow or a list of keypoints lying on the shadow to be able to detect and track it.
This is an example of the frames I'm working on - This is another frame - And another one
As you can see, shadow is not always present in the frame and sometimes it is not-so-sharp.
This is a GIF of a video I'm working on - just focus on the light condition and the shadow; the quality is very low because I compressed it to make it a GIF, while the real videos are Full HD.
Below is the code snippet used for BackgroundSubtractorMOG2:
import cv2 as cv

def run(video_src):
    cam = cv.VideoCapture(video_src)
    cam.set(cv.CAP_PROP_FPS, 10)
    subtractor = cv.createBackgroundSubtractorMOG2(detectShadows=True)
    frame_idx = 0
    l_edge, r_edge = 0, -1
    while True:
        _, frame = cam.read()
        if frame_idx == 0:
            l_edge, r_edge = crop(frame)
        frame_idx += 1
        frame = frame[:1000, l_edge:r_edge, :]
        fgMask = subtractor.apply(frame)
        cv.imshow('FG Mask', fgMask)
        cv.imshow('Frame', frame)
        cv.waitKey()
If the eye doesn't move, you can use BackgroundSubtractorMOG2
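To illustrate that suggestion: with detectShadows=True, MOG2 labels detected shadow pixels with a dedicated gray value (127 by default) in the foreground mask, so a binary shadow map can be obtained by thresholding on that value. A minimal sketch, assuming the mask comes from the run() function above:
import cv2 as cv
import numpy as np

def shadow_mask(fg_mask):
    # MOG2 marks foreground as 255 and shadows as 127 (its default shadow value)
    mask = np.zeros_like(fg_mask)
    mask[fg_mask == 127] = 255
    # small morphological opening to remove speckle noise
    kernel = cv.getStructuringElement(cv.MORPH_ELLIPSE, (5, 5))
    return cv.morphologyEx(mask, cv.MORPH_OPEN, kernel)
Calling shadow_mask(fgMask) inside the loop gives the binary map asked for, which can then feed contour extraction or keypoint detection for tracking.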