I'm new to programming and can't figure out how to optimise my project. I have a function that takes 2 images and uses OpenCV to stitch them together. This usually takes about 0.5 seconds per image pair, and I would like to speed it up.
At the moment I have 2 arrays, each containing 800 images, and a function called stitch_images that stitches each image to its corresponding image in the other array. The function uses a while loop to go through the images one pair at a time, which seems to be the problem because the loop blocks until every pair is done. I'm also using 2 shared global variables to hold the images.
What I would like is 4 processes, each taking a share of the image pairs and working on it, which in theory would cut the computation time to roughly a quarter.
My question is: how would I go about achieving this? I understand there are several ways to do concurrent work in Python, such as threading, multiprocessing and queues. Which would be the best option for me? If there is an easy way to implement this, would anyone have example code?
This is my current setup:
import multiprocessing
import time
import cv2
# Global variables:
frames_1 = []
frames_2 = []
panorama = []
# converting the video into frames for individual image processing
def convert_video_to_frames():
    cap = cv2.VideoCapture("Sample_video_1.mp4")
    ret = True
    while ret:
        ret, img = cap.read()  # read one frame from the 'capture' object; img is (H, W, C)
        if ret:
            frames_1.append(img)
    cap = cv2.VideoCapture("Sample_video_2.mp4")
    ret = True
    while ret:
        ret, img = cap.read()  # read one frame from the 'capture' object; img is (H, W, C)
        if ret:
            frames_2.append(img)
    return frames_1, frames_2
# converting final output images back to video
def convert_frames_to_video():
    print("now creating stitched image video")
    height, width, layers = panorama[0].shape
    size = (width, height)
    out = cv2.VideoWriter('project.avi', cv2.VideoWriter_fourcc(*'DIVX'), 15, size)
    for i in range(len(panorama)):
        out.write(panorama[i])
    out.release()
def stitch_images():
    print("image processing starting...")
    stitcher = cv2.Stitcher_create(cv2.STITCHER_PANORAMA)
    while len(frames_1) != 0:
        status, result = stitcher.stitch((frames_1.pop(0), frames_2.pop(0)))
        if status == 0:  # pass
            panorama.append(result)
        else:
            print("image stitching failed")
if __name__ == '__main__':
    convert_video_to_frames()  # dummy function
    start = time.perf_counter()
    stitch_images()
    finish = time.perf_counter()
    print(f'finished in {round(finish - start, 2)} seconds(s)')
    print("now converting images to video...")
    convert_frames_to_video()
I've also tried using multiprocessing (and adding locks) to achieve this, by adding:
p1 = multiprocessing.Process(target=stitch_images)
p2 = multiprocessing.Process(target=stitch_images)
p1.start()
p2.start()
p1.join()
p2.join()
But when I run this, it seems to skip the while loop altogether?
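A likely explanation for the skipped loop, offered as a sketch rather than a definitive diagnosis: under the "spawn" start method (the default on Windows and macOS), each child process re-imports the module, so frames_1 and frames_2 are empty lists in the child because convert_video_to_frames() only runs under the `__main__` guard. Even under "fork", each child would work on its own copy of the lists and append to its own panorama, which the parent never sees. One way around both problems is to pass each image pair to a worker explicitly and collect the return values. The names `map_pairs` and `stitch_pair` below are hypothetical helpers, not part of the original code, and the worker count and chunk size are assumptions to tune.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def map_pairs(worker, frames_a, frames_b, executor_cls=ProcessPoolExecutor,
              max_workers=4, chunksize=16):
    """Apply worker(a, b) to corresponding items and return results in input order."""
    with executor_cls(max_workers=max_workers) as pool:
        # Executor.map keeps the results in the same order as the inputs.
        return list(pool.map(worker, frames_a, frames_b, chunksize=chunksize))


def stitch_pair(left, right):
    """Hypothetical worker: stitch one pair and return the panorama, or None on failure."""
    import cv2  # imported here so map_pairs stays usable without OpenCV installed
    # The stitcher is created inside the worker instead of being shared across processes.
    stitcher = cv2.Stitcher_create(cv2.STITCHER_PANORAMA)
    status, result = stitcher.stitch((left, right))
    return result if status == 0 else None  # 0 is the "OK" status, as in the question


# Possible usage inside the existing __main__ block (assumes the functions above):
#     frames_1, frames_2 = convert_video_to_frames()
#     results = map_pairs(stitch_pair, frames_1, frames_2)
#     panorama = [p for p in results if p is not None]
```

With a process pool every frame is pickled and copied to a worker, which adds overhead for large images. Passing `executor_cls=ThreadPoolExecutor` avoids the copies and may still run in parallel if the OpenCV call releases the GIL, which its Python bindings generally do for native calls; measuring both on the real data is the only reliable way to choose.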
I am implementing a program where I need an array of images that are reused (for speed). I thought I could simply create a multidimensional numpy array and reuse each plane as a grayscale image without allocating new memory, but I am having trouble doing so. Below is a piece of code I created to illustrate this (a simplified version of what I need):
import cv2
import numpy as np
# Access camera
cap = cv2.VideoCapture(0)
cv2.namedWindow('Main process')
# Confirm we are able to acquire images (and initialize the frame variable)
ret, rgb_frame = cap.read()
if ret is False:
    print('Error, unable to acquire frame...')
    exit(0)
NUM_BUFFERS = 3
gray_frames_array = np.zeros(
    (rgb_frame.shape[0], rgb_frame.shape[1], NUM_BUFFERS),
    dtype=rgb_frame.dtype)
i = 0
while True:
    ret, _ = cap.read(rgb_frame)
    if ret is False:
        print('Error, unable to acquire frame...')
        exit(0)
    cv2.cvtColor(src=rgb_frame, code=cv2.COLOR_BGR2GRAY,
                 dst=gray_frames_array[:, :, i])
    cv2.imshow('Main process', gray_frames_array[:, :, i])
    if cv2.waitKey(5) == 27:
        break
    # Use the next "buffer"
    i = (i+1) % NUM_BUFFERS
print('done')
My mistake is probably from my Matlab background, but, as it is, I would expect this program to just work without any memory being allocated during the "While True" cycle. However, I get the error:
Expected Ptr<cv::UMat> for argument 'dst'
I know that if I use a list of [height, width] numpy arrays instead of a single [height, width, NUM_BUFFERS] array it will work just fine, but I was looking to get this working with a single multidimensional numpy array.
Thanks to Dan Masek for pointing out the correct answer. I just moved the buffer index to the first dimension and it no longer gives an error.
Although the code is very similar, I am leaving it here for future reference.
import cv2
import numpy as np
# Access camera
cap = cv2.VideoCapture(0)
cv2.namedWindow('Main process')
# Confirm we are able to acquire images (and initialize the frame variable)
ret, rgb_frame = cap.read()
if ret is False:
    print('Error, unable to acquire frame...')
    exit(0)
NUM_BUFFERS = 3
gray_frames_array = np.zeros(
    (NUM_BUFFERS, rgb_frame.shape[0], rgb_frame.shape[1]),
    dtype=rgb_frame.dtype)
i = 0
while True:
    ret, _ = cap.read(rgb_frame)
    if ret is False:
        print('Error, unable to acquire frame...')
        exit(0)
    cv2.cvtColor(src=rgb_frame, code=cv2.COLOR_BGR2GRAY,
                 dst=gray_frames_array[i, :, :])
    cv2.imshow('Main process', gray_frames_array[i, :, :])
    if cv2.waitKey(5) == 27:
        break
    # Use the next "buffer"
    i = (i+1) % NUM_BUFFERS
print('done')
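Why moving the buffer index to the first axis helps can be checked with NumPy alone. The usual explanation is that OpenCV can only write into an output array whose memory it can wrap directly, and a plane sliced along the last axis is a strided, non-contiguous view. The sketch below uses small made-up sizes and only inspects the memory layout; it does not call OpenCV.

```python
import numpy as np

height, width, num_buffers = 4, 5, 3  # small illustrative sizes (assumptions)

last_axis = np.zeros((height, width, num_buffers), dtype=np.uint8)
first_axis = np.zeros((num_buffers, height, width), dtype=np.uint8)

plane_last = last_axis[:, :, 0]    # pixels of one plane are num_buffers bytes apart
plane_first = first_axis[0, :, :]  # one solid block of height * width bytes

print(plane_last.flags['C_CONTIGUOUS'])   # False
print(plane_first.flags['C_CONTIGUOUS'])  # True
print(plane_last.strides, plane_first.strides)
```

Both slices are views (no copy is made), so the second layout still reuses the preallocated memory, which was the original goal.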
I am developing a GUI with PyQt5 and I am stuck.
Because my program is running on a Raspberry Pi 4, I have limited processing power. I am getting video input from my webcam and want to perform face_recognition operations on this input. Due to the limited processing power, I need to ignore a lot of input frames and run face recognition only on every n-th frame to speed up the process.
I tried to program a delay similar to this thread: Call function every x seconds (Python)
but it didn't work. Is there a possibility to directly refer to a frame?
This is the function where I am reading from the webcam:
def run(self):
    checker = 0
    process_this_frame = 0
    # capture from web cam
    cap = cv2.VideoCapture(0)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 300)
    while True:
        ret, cv_img = cap.read()
        if ret:
            img = cv2.resize(cv_img, (0, 0), fx=0.25, fy=0.25)
            process_this_frame = process_this_frame+2
            print('process_this_frame: ', process_this_frame)
            if process_this_frame % 20 == 0:
                predictions = predict(img, model_path="trained_knn_model.clf")
                print('showing predicted face')
                cv_img = show_prediction_labels_on_image(cv_img, predictions)
                checker = 1
                self.change_pixmap_signal.emit(cv_img)
            else:
                checker = 0
                self.change_pixmap_signal.emit(cv_img)
Specifically I am looking for a suitable if condition to execute the predict function only on every n-th frame and when I am not doing the predict on the cv_img I want to display just the frame in my else case. I tried with multiple modulo operators but did not find a suitable solution.
How can I do that? It would be cool to refer to a number of frames instead of using a time delay so I can try to find the best solution.
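One way to express "every n-th frame" is to count frames from zero, increment by one per frame, and test the count with a single modulo. The function in the question adds 2 per frame and tests `% 20`, which already amounts to every 10th frame, so a plain counter mainly makes n easier to tune. The helper name `flag_every_nth` is hypothetical, and the sketch works on any iterable of frames so it can be tried without a camera.

```python
def flag_every_nth(frames, every_n):
    """Yield (frame, run_prediction); run_prediction is True on every n-th frame."""
    if every_n < 1:
        raise ValueError("every_n must be at least 1")
    for index, frame in enumerate(frames):
        # index 0, n, 2n, ... get the expensive step; all other frames are passed through
        yield frame, index % every_n == 0


# Stand-in "frames" 0..6 with n = 3: frames 0, 3 and 6 would be sent to predict().
flags = [run for _, run in flag_every_nth(range(7), 3)]
print(flags)  # [True, False, False, True, False, False, True]
```

In the `run()` loop the same test would be `if frame_index % n == 0:` with `frame_index += 1` after every successful `cap.read()`, emitting the unmodified frame in the else branch.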
I was doing some tests with multiprocessing to parallelize face detection and recognition and I came across a strange behaviour, in which detectMultiScale() (that performs the face detection) was running slower inside a child process than in the parent process (just calling the function).
Thus, I wrote the code below in which 10 images are enqueued and then the face detection is performed sequentially with one of two approaches: just calling the detection function or running it inside a single new process. For each detectMultiScale() call, the time of execution is printed. Executing this code gives me an average of 0.22s for each call in the first approach and 0.54s for the second. Also, the total time to process the 10 images is greater in the second approach too.
I don't know why the same code snippet runs slower inside the new process. If only the total time were greater I would understand (given the overhead of setting up a new process), but I don't understand why each individual call is slower. For the record, I'm running this on a Raspberry Pi 3B+.
import cv2
import multiprocessing
from time import time, sleep
def detect(face_cascade, img_queue, bnd_queue):
    while True:
        image = img_queue.get()
        if image is not None:
            gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
            ti = time()
            ########################################
            faces = face_cascade.detectMultiScale(
                gray_image,
                scaleFactor=1.1,
                minNeighbors=3,
                minSize=(130, 130))
            ########################################
            tf = time()
            print('det time: ' + str(tf-ti))
            if len(faces) > 0:
                max_bounds = (0,0,0,0)
                max_size = 0
                for (x,y,w,h) in faces:
                    if w*h > max_size:
                        max_size = w*h
                        max_bounds = (x,y,w,h)
            img_queue.task_done()
            bnd_queue.put('bound')
        else:
            img_queue.task_done()
            break
face_cascade = cv2.CascadeClassifier('../lbpcascade_frontalface_improved.xml')
cam = cv2.VideoCapture(0)
cam.set(cv2.CAP_PROP_FRAME_WIDTH, 2592)
cam.set(cv2.CAP_PROP_FRAME_HEIGHT, 1944)
cam.set(cv2.CAP_PROP_BUFFERSIZE, 1)
img_queue = multiprocessing.JoinableQueue()
i = 0
while i < 10:
    is_there_frame, image = cam.read()
    if is_there_frame:
        image = image[0:1944, 864:1728]
        image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        img_queue.put(image)
        i += 1
bnd_queue = multiprocessing.JoinableQueue()
num_process = 1
ti = time()
# MULTIPROCESSING PROCESS APPROACH
for _ in range(num_process):
    p = multiprocessing.Process(target=detect, args=(face_cascade, img_queue, bnd_queue))
    p.start()
for _ in range(num_process):
    img_queue.put(None)
#
# FUNCTION CALL APPROACH
#img_queue.put(None)
#while not img_queue.empty():
#    detect(face_cascade, img_queue, bnd_queue)
img_queue.join()
tf = time()
print('TOTAL TIME: ' + str(tf-ti))
while not bnd_queue.empty():
    bound = bnd_queue.get()
    if bound != 'bound':
        print('ERROR')
    bnd_queue.task_done()
I am having the same issue, and I think the reason is that the task is partly I/O bound, combined with the overhead of multiprocessing itself.
You can also read this article: https://www.pyimagesearch.com/2019/09/09/multiprocessing-with-opencv-and-python/
The problem you mention with detectMultiScale() specifically is the same as mine. I have also tried serialization and making the variables global or class-level, but nothing helped.
I've a scenario where only trucks will pass a toll gate of which I want to capture the number plate only when the truck has halted (to get a good quality image to run OCR on). The OCR solution is built but capturing a frame every time a truck comes to a halt seems to be tricky to me.
Can you help me with an approach, or similar working code, to achieve this using Python 3.6+ and OpenCV? I don't want to run an explicit model to detect motion; simple background subtraction would do, to avoid the overhead.
Sample image frame from the video: click here.
Here is the code I'm currently working on. It checks whether the foreground area from background subtraction exceeds a 10% threshold and, if so, captures the frame. But I have to do just the opposite, i.e. capture the frame when the background subtraction is zero. More logic also needs to be added: after capturing a frame, we have to skip all the following static frames (which would otherwise also count as hits) until the next truck arrives and comes to a halt.
The code:
import os
import cv2
x_0 = 720
x_1 = 870
y_0 = 190
y_1 = 360
fgbg = cv2.createBackgroundSubtractorMOG2()
cap = cv2.VideoCapture(r"C:\\Users\\aa\\file.asf")
i = 0
while cap.isOpened():
    ret, frame = cap.read()
    if ret == True:
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame = cv2.GaussianBlur(frame, (21, 21), 0)
        fgmask = fgbg.apply(frame)
        fgmask_crop = fgmask[y_0:y_1, x_0:x_1]
        frame_crop = frame[y_0:y_1, x_0:x_1]
        #out_video.write(frame_crop)
        cv2.imshow("crop", fgmask_crop)
        fg = cv2.copyTo(frame, fgmask)
        bg = cv2.copyTo(frame, cv2.bitwise_not(fgmask))
        # percentage of foreground pixels inside the cropped region
        pixels = cv2.countNonZero(fgmask_crop)
        image_area = frame_crop.shape[0] * frame_crop.shape[1]
        area_ratio = (pixels / image_area) * 100
        if area_ratio > 10:
            i = i+1
            print(i)
            target = 'C:\\Users\\aa\\op'
            fileName = ("res%d.png" % (i))
            path_nm = os.path.join(target, fileName)
            cv2.imwrite(path_nm, frame_crop)
        key = cv2.waitKey(25)
        if key == ord('q'):
            break
    else:
        break
cv2.destroyAllWindows()
#out.release()
cap.release()
Any help would be highly appreciated.
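The "capture once when the truck halts, then ignore the static frames until the next truck" logic can be sketched as a small state machine that is fed one motion percentage per frame. This is only a sketch under assumptions: the class name `HaltDetector`, its thresholds, and the number of still frames required are all hypothetical and would need tuning on real footage. The motion percentage could be something like `area_ratio` from the code above, although a background subtractor may keep reporting a stopped truck as foreground for a while, so the difference between consecutive frames (for example via `cv2.absdiff`) may be a better input.

```python
class HaltDetector:
    """Fire once each time motion is followed by a sustained still period."""

    def __init__(self, move_threshold=10.0, still_threshold=1.0, min_still_frames=5):
        self.move_threshold = move_threshold      # percent that counts as "moving"
        self.still_threshold = still_threshold    # percent that counts as "still"
        self.min_still_frames = min_still_frames  # still frames needed to confirm a halt
        self.armed = False    # True once a moving object has been seen
        self.still_count = 0  # consecutive still frames since the last motion

    def update(self, motion_percent):
        """Return True only on the single frame where a halt is confirmed."""
        if motion_percent >= self.move_threshold:
            self.armed = True
            self.still_count = 0
            return False
        if motion_percent > self.still_threshold:
            self.still_count = 0  # neither clearly moving nor clearly still
            return False
        self.still_count += 1
        if self.armed and self.still_count >= self.min_still_frames:
            self.armed = False    # ignore further still frames until new motion
            self.still_count = 0
            return True
        return False


# Made-up motion percentages: empty scene, truck arrives and stops, second truck.
ratios = [0, 0, 0, 20, 30, 1, 0, 0, 0, 0, 0, 25, 0, 0, 0]
detector = HaltDetector(move_threshold=10.0, still_threshold=2.0, min_still_frames=3)
captures = [i for i, r in enumerate(ratios) if detector.update(r)]
print(captures)  # [7, 14]
```

In the video loop, `cv2.imwrite(...)` would be called only on frames where `update()` returns True, which gives one image per halted truck.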
I would like to do video processing on neighboring frames. More specifically, I would like to compute the mean squared error between neighboring frames:
mean_squared_error(prev_frame,frame)
I know how to compute this in a linear straightforward way: I use the imutils package to utilize a queue to decouple loading the frames and processing them. By storing them in a queue, I don't need to wait for them before I can process them. ... but I want to be even faster...
# import the necessary packages to read the video
import imutils
from imutils.video import FileVideoStream
# package to compute mean squared error
from skimage.metrics import mean_squared_error
if __name__ == '__main__':
    # SPECIFY PATH TO VIDEO FILE
    path_video = "VIDEO_PATH.mp4"
    # START IMUTILS VIDEO STREAM
    print("[INFO] starting video file thread...")
    # transform_image is not shown in the question; it is assumed to be defined elsewhere
    fvs = FileVideoStream(path_video, transform=transform_image).start()
    # INITIALIZE LIST to store the results
    mean_square_error_list = []
    # READ PREVIOUS FRAME
    prev_frame = fvs.read()
    # LOOP over frames from the video file stream
    while fvs.more():
        # GRAB THE NEXT FRAME from the threaded video file stream
        frame = fvs.read()
        # COMPUTE the metric
        metric_val = mean_squared_error(prev_frame, frame)
        mean_square_error_list.append(1-metric_val)  # Append to list
        # UPDATE previous frame variable
        prev_frame = frame
Now my question is: how can I multiprocess the computation of the metric to increase speed and save time?
My operating system is Windows 10 and I am using python 3.8.0
There are too many aspects to making things faster, so I'll only focus on the multiprocessing part.
As you don't want to read the whole video at a time, we have to read the video frame by frame.
I'll be using opencv (cv2), numpy for reading the frames, calculating mse, and saving the mse to disk.
First, we can start without any multiprocessing so we can benchmark our results. I'm using a video of 1920 by 1080 dimension, 60 FPS, duration: 1:29, size: 100 MB.
import cv2
import sys
import time
import numpy as np
import subprocess as sp
import multiprocessing as mp
filename = '2.mp4'
def process_video():
    cap = cv2.VideoCapture(filename)
    proc_frames = 0
    mse = []
    prev_frame = None
    ret = True
    while ret:
        ret, frame = cap.read()  # reading frames sequentially
        if ret == False:
            break
        if not (prev_frame is None):
            c_mse = np.mean(np.square(prev_frame-frame))
            mse.append(c_mse)
        prev_frame = frame
        proc_frames += 1
    np.save('data/' + 'sp' + '.npy', np.array(mse))
    cap.release()
    return
if __name__ == "__main__":
    t1 = time.time()
    process_video()
    t2 = time.time()
    print(t2-t1)
In my system, it runs for 142 secs.
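One caveat about the expression `np.mean(np.square(prev_frame-frame))` used above: frames read by OpenCV are normally uint8 arrays, and uint8 subtraction and squaring wrap around modulo 256 instead of going negative or above 255, so the value is not the true mean squared error. The benchmark timings are still comparable between the two approaches because both use the same expression. A minimal sketch of a wrap-free version follows; the helper name `mse` and the tiny arrays are illustrative assumptions, and the float conversion costs extra memory and time.

```python
import numpy as np


def mse(prev_frame, frame):
    """Mean squared error computed in float64 so uint8 values cannot wrap."""
    diff = prev_frame.astype(np.float64) - frame.astype(np.float64)
    return float(np.mean(diff * diff))


# Two tiny uint8 "frames" (illustrative values, not taken from the video).
a = np.array([[10, 200]], dtype=np.uint8)
b = np.array([[20, 100]], dtype=np.uint8)

wrapped = float(np.mean(np.square(a - b)))  # uint8 arithmetic wraps modulo 256
correct = mse(a, b)                         # (10**2 + 100**2) / 2 = 5050.0
print(wrapped, correct)
```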
Now, we can take the multiprocessing approach. The idea can be summarized in the following illustration.
GIF credit: Google
We make some segments (based on how many cpu cores we have) and process those segmented frames in parallel.
import cv2
import sys
import time
import numpy as np
import subprocess as sp
import multiprocessing as mp
filename = '2.mp4'
def process_video(group_number):
    cap = cv2.VideoCapture(filename)
    num_processes = mp.cpu_count()
    frame_jump_unit = cap.get(cv2.CAP_PROP_FRAME_COUNT) // num_processes
    cap.set(cv2.CAP_PROP_POS_FRAMES, frame_jump_unit * group_number)
    proc_frames = 0
    mse = []
    prev_frame = None
    while proc_frames < frame_jump_unit:
        ret, frame = cap.read()
        if ret == False:
            break
        if not (prev_frame is None):
            c_mse = np.mean(np.square(prev_frame-frame))
            mse.append(c_mse)
        prev_frame = frame
        proc_frames += 1
    np.save('data/' + str(group_number) + '.npy', np.array(mse))
    cap.release()
    return
if __name__ == "__main__":
    t1 = time.time()
    num_processes = mp.cpu_count()
    print(f'CPU: {num_processes}')
    # only meta-data
    cap = cv2.VideoCapture(filename)
    width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    fps = cap.get(cv2.CAP_PROP_FPS)
    frame_jump_unit = cap.get(cv2.CAP_PROP_FRAME_COUNT) // num_processes
    cap.release()
    p = mp.Pool(num_processes)
    p.map(process_video, range(num_processes))
    # merging
    # the missing mse will be
    final_mse = []
    for i in range(num_processes):
        na = np.load(f'data/{i}.npy')
        final_mse.extend(na)
        try:
            cap = cv2.VideoCapture(filename)  # you could also take it outside the loop to reduce some overhead
            frame_no = (frame_jump_unit) * (i+1) - 1
            print(frame_no)
            cap.set(cv2.CAP_PROP_POS_FRAMES, frame_no)
            _, frame1 = cap.read()
            #cap.set(1, ((frame_jump_unit) * (i+1)))
            _, frame2 = cap.read()
            c_mse = np.mean(np.square(frame1-frame2))
            final_mse.append(c_mse)
            cap.release()
        except:
            print('failed in 1 case')
            # in the last few frames, nothing left
            pass
    t2 = time.time()
    print(t2-t1)
    np.save(f'data/final_mse.npy', np.array(final_mse))
I'm just using numpy's save to store the partial results; you can try something better.
This one runs for 49.56 secs with my cpu_count = 12. There are definitely some bottlenecks that can be avoided to make it run faster.
The only issue with my implementation is that it misses the MSE at the boundaries where the video was segmented, which is easy to add. OpenCV lets us seek directly to a frame index (the actual seek cost and accuracy depend on the codec and backend), so we can go to those locations, calculate the MSE separately, and merge it into the final result. [Check the updated code; it fixes the merging part.]
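An alternative to patching the boundaries afterwards is to let each segment read one extra frame, so the pair that straddles a boundary is computed by the segment on its left. The sketch below only computes the frame ranges and is independent of OpenCV; the function name `segment_ranges` is hypothetical. It also gives the leftover frames to the last segment when the frame count is not divisible by the number of workers.

```python
def segment_ranges(total_frames, num_segments):
    """Return (start, stop) frame ranges, stop exclusive, overlapping by one frame."""
    if num_segments < 1 or total_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    base = total_frames // num_segments
    ranges = []
    for k in range(num_segments):
        start = k * base
        if k == num_segments - 1:
            stop = total_frames           # last segment absorbs the remainder
        else:
            stop = (k + 1) * base + 1     # +1: also read the next segment's first frame
        ranges.append((start, stop))
    return ranges


ranges = segment_ranges(10, 3)
print(ranges)  # [(0, 4), (3, 7), (6, 10)]

# Each segment yields (stop - start - 1) neighbouring pairs; together they cover
# every pair exactly once, i.e. total_frames - 1 values.
print(sum(stop - start - 1 for start, stop in ranges))  # 9
```

A worker would seek to `start`, read frames up to `stop`, and return its list of values, so concatenating the per-segment results in order gives the full sequence without a separate merge step.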
You can write a simple sanity check to ensure both provide the same result.
import numpy as np
a = np.load('data/sp.npy')
b = np.load('data/final_mse.npy')
print(a.shape)
print(b.shape)
print(a[:10])
print(b[:10])
for i in range(len(a)):
    if a[i] != b[i]:
        print(i)
Now, some additional speedups can come from using a CUDA-compiled opencv, ffmpeg, adding queuing mechanism plus multiprocessing, etc.