I have been trying to build a motion tracker that follows a dog moving in a video (recorded top-down) and returns a cropped video showing only the dog, ignoring the rest of the background.
I first tried object tracking with the algorithms available in OpenCV 3 (BOOSTING, MIL, KCF, TLD, MEDIANFLOW, and GOTURN, which returns an error I haven't been able to solve yet) following this link, and I also tried a basic motion-tracking approach that subtracts the first frame, but none of them gives a good result. Link
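For reference, this is roughly the tracker loop I was following (a minimal sketch of the standard OpenCV 3 tracking API; the exact constructor name varies between 3.x builds, and the video file name is just taken from my code further down):

import cv2

# depending on the OpenCV 3.x build this may be cv2.Tracker_create("KCF") instead
tracker = cv2.TrackerKCF_create()

video = cv2.VideoCapture("Track.avi")
ok, frame = video.read()

# the box around the dog has to be chosen by hand for the first frame
bbox = cv2.selectROI("select", frame, False)
tracker.init(frame, bbox)

while True:
    ok, frame = video.read()
    if not ok:
        break
    ok, bbox = tracker.update(frame)
    if ok:
        x, y, w, h = [int(v) for v in bbox]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:
        break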
I would prefer code that draws a preset rectangular box around the area of motion once it is detected, something like in this video.
I'm not very familiar with OpenCV, but I believe tracking a single moving object should not be an issue, since a lot of work has already been done on it. Should I consider other libraries/APIs, or is there better code or a tutorial I can follow to get this done? My goal is to use this later with a neural network, which is why I'm trying to solve it with Python/OpenCV.
Thanks for any help/advice
Edit:
I removed the previous code to make the post cleaner.
Also, based on the feedback I got and further research, I was able to modify some code to get close to the result I want. However, I still have an annoying problem with the tracking: the first frame seems to affect all later frames, so even after the dog moves, its first location keeps being detected. I tried to limit the tracking to only one action using a flag, but then the detection gets messed up. Here is the code, followed by pictures showing the results:
import imutils
import time
import cv2

previousFrame = None

def searchForMovement(cnts, frame, min_area):
    text = "Undetected"
    flag = 0
    for c in cnts:
        # if the contour is too small, ignore it
        if cv2.contourArea(c) < min_area:
            continue
        # Use the flag to prevent the detection of other motions in the video
        if flag == 0:
            (x, y, w, h) = cv2.boundingRect(c)
            #print("x y w h")
            #print(x,y,w,h)
            cv2.rectangle(frame, (x, y), (x+w, y+h), (0, 255, 0), 2)
            text = "Detected"
            flag = 1
    return frame, text

def trackMotion(ret, frame, gaussian_kernel, sensitivity_value, min_area):
    if ret:
        # Convert to grayscale and blur it for better frame difference
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        gray = cv2.GaussianBlur(gray, (gaussian_kernel, gaussian_kernel), 0)

        global previousFrame
        if previousFrame is None:
            previousFrame = gray
            return frame, "Uninitialized", frame, frame

        frameDiff = cv2.absdiff(previousFrame, gray)
        thresh = cv2.threshold(frameDiff, sensitivity_value, 255, cv2.THRESH_BINARY)[1]
        thresh = cv2.dilate(thresh, None, iterations=2)
        _, cnts, _ = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

        frame, text = searchForMovement(cnts, frame, min_area)
        #previousFrame = gray

        return frame, text, thresh, frameDiff

if __name__ == '__main__':

    video = "Track.avi"
    video0 = "Track.mp4"
    video1 = "Ntest1.avi"
    video2 = "Ntest2.avi"

    camera = cv2.VideoCapture(video1)
    time.sleep(0.25)
    min_area = 5000 #int(sys.argv[1])

    cv2.namedWindow("Security Camera Feed")

    while camera.isOpened():

        gaussian_kernel = 27
        sensitivity_value = 5
        min_area = 2500

        ret, frame = camera.read()

        # Check if the next camera read is not null
        if ret:
            frame, text, thresh, frameDiff = trackMotion(ret, frame, gaussian_kernel, sensitivity_value, min_area)
        else:
            print("Video Finished")
            break

        cv2.namedWindow('Thresh', cv2.WINDOW_NORMAL)
        cv2.namedWindow('Frame Difference', cv2.WINDOW_NORMAL)
        cv2.namedWindow('Security Camera Feed', cv2.WINDOW_NORMAL)
        cv2.resizeWindow('Thresh', 800, 600)
        cv2.resizeWindow('Frame Difference', 800, 600)
        cv2.resizeWindow('Security Camera Feed', 800, 600)

        # uncomment to see the thresh and frame difference displays
        cv2.imshow("Thresh", thresh)
        cv2.imshow("Frame Difference", frameDiff)

        cv2.putText(frame, text, (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 2)
        cv2.imshow("Security Camera Feed", frame)

        key = cv2.waitKey(3) & 0xFF
        if key == 27 or key == ord('q'):
            print("Bye")
            break

    camera.release()
    cv2.destroyAllWindows()
This picture shows how the very first frame is still affecting the frame-difference results, which forces the box to cover an area with no motion.
This one shows a case where real motion is ignored and no-longer-existing motion (the frame difference between the second and first frames of the video) is falsely detected. When I allow multiple detections it tracks both, which is still wrong since it marks an empty area.
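For comparison, my understanding is that the usual frame-differencing examples keep the reference frame moving, either by updating it every iteration or by maintaining a running average with cv2.accumulateWeighted. A minimal sketch of that pattern (separate from my code above, with arbitrary threshold and blur values):

import cv2

camera = cv2.VideoCapture("Ntest1.avi")
avg = None

while camera.isOpened():
    ret, frame = camera.read()
    if not ret:
        break
    gray = cv2.GaussianBlur(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (21, 21), 0)

    if avg is None:
        avg = gray.astype("float")
        continue

    # the reference is a slowly-updating running average, not the first frame
    cv2.accumulateWeighted(gray, avg, 0.05)
    frameDiff = cv2.absdiff(gray, cv2.convertScaleAbs(avg))

    thresh = cv2.threshold(frameDiff, 5, 255, cv2.THRESH_BINARY)[1]
    thresh = cv2.dilate(thresh, None, iterations=2)
    cv2.imshow("Frame Difference", frameDiff)
    cv2.imshow("Thresh", thresh)
    if cv2.waitKey(3) & 0xFF == 27:
        break

camera.release()
cv2.destroyAllWindows()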
Does anyone have an idea where the code is wrong or lacking? I keep trying but cannot get it to work properly.
Thank you in advance !!
To add motion detection, I have created generic components on the NPM registry and Docker Hub.
The client (a React app) detects motion from the web cam and uses a Python server based on OpenCV.
The client just captures webcam images, and the server analyses these images using OpenCV to determine whether there is motion.
The client can specify a callback function, which the server calls each time there is motion.
The server is just a Docker image that you can pull and run, then point the client at its URL.
NPM Registry(Client)
Registry Link:
https://www.npmjs.com/settings/kunalpimparkhede/packages
Command
npm install motion-detector-client
Docker Image (Server)
Link
https://hub.docker.com/r/kunalpimparkhede/motiondetectorwebcam
Command
docker pull kunalpimparkhede/motiondetectorwebcam
You just need to write the following code to get motion detection working.
Usage:
import MotionDetectingClient from './MotionDetectingClient';
<MotionDetectingClient server="http://0.0.0.0:8080" callback={handleMovement}/>
function handleMovement(pixels) {
    console.log("Movement By Pixel=" + pixels)
}
On the server side, just start the Docker server on port 8080:
docker run --name motion-detector-server-app -p 8080:5000 motion-detector-server-app
Related
I'm working on a project with AprilTags and a computer vision system to detect them from a webcam. I have a working system that prints the data to the terminal, but I would like to display this numerical/text data on top of the video window or in another window. I've already tried cv2.putText(), but that only puts static text on the image and can't be updated in real time the way I want. Below is my code, which tries to update a window in real time with the number of tags detected in the webcam video, but it just ends up writing, for example, a 1, and I can't figure out a way to erase that text and update it.
Is this even possible in OpenCV? Or is there another way?
while True:
    success, frame = cap.read()
    if not success:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    detections, dimg = detector.detect(gray, return_image=True)
    print(detections)
    num_detections = len(detections)
    # print('Detected {} tags.\n'.format(num_detections))
    num_detections_string = str(num_detections)
    overlay = frame // 2 + dimg[:, :, None] // 2
    clear_text = ''
    text = checkNumDetections(num_detections, num_detections_string)
    cv2.putText(whiteBackground, clear_text, (100, 100), cv2.FONT_HERSHEY_PLAIN, 10, (0, 255, 0), 2)
    cv2.putText(whiteBackground, text, (100, 100), cv2.FONT_HERSHEY_PLAIN, 10, (0, 255, 0), 2)
    cv2.imshow(window, overlay)
    k = cv2.waitKey(1)
    cv2.imshow(dataWindow, whiteBackground)
    if k == 27:
        break
You need to re-initialise the 'whiteBackground' image in each loop, before you draw anything on it.
I know this will work, but it will give you a black background:
whiteBackground = np.zeros((rows, columns, channels), dtype="uint8")
This should give you a white background, but experiment and see (note that NumPy image arrays are shaped (rows, columns, channels)):
whiteBackground = np.full((rows, columns, channels), 255, dtype="uint8")
I usually work with OpenCV in C++, so I'm not 100% sure of the exact Python syntax, but that should work.
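Putting that together, a minimal sketch of the loop with the background recreated each iteration (reusing cap, detector, window and dataWindow from your question, with a hypothetical size for the text window):

import numpy as np
import cv2

rows, columns, channels = 480, 640, 3   # hypothetical size of the text window

while True:
    success, frame = cap.read()
    if not success:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
    detections, dimg = detector.detect(gray, return_image=True)

    # start from a fresh white canvas every frame so the old number disappears
    whiteBackground = np.full((rows, columns, channels), 255, dtype="uint8")
    cv2.putText(whiteBackground, str(len(detections)), (100, 100),
                cv2.FONT_HERSHEY_PLAIN, 10, (0, 255, 0), 2)

    cv2.imshow(window, frame // 2 + dimg[:, :, None] // 2)
    cv2.imshow(dataWindow, whiteBackground)
    if cv2.waitKey(1) == 27:
        break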
Hello there people of the internet,
The code in question uses Python 3.8.5 and OpenCV 4 (I don't know how to check the exact version, but I know it's OpenCV 4). My team and I are attempting to take a live video feed from a USB webcam and determine the distance between the camera and the object in the feed. We had some success reading the distance from still images taken with the same camera and read via the imutils library, but now we want to calculate that distance live.
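(For reference, the exact OpenCV build can be printed from Python with cv2.__version__:)

import cv2
print(cv2.__version__)  # prints something like 4.5.1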
Our code is below.
from imutils import paths
import numpy as np
import imutils
import cv2
import time
import os

def find_marker(image):
    # convert the image to grayscale, blur it, then detect edges
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    gray = cv2.GaussianBlur(gray, (5, 5), 0)
    edged = cv2.Canny(gray, 35, 125)

    # find the contours in the edged image and keep the largest one;
    # we'll assume that this is our piece of paper in the image
    cnts = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    c = max(cnts, key=cv2.contourArea)

    # compute the bounding box of the paper region and return it
    return cv2.minAreaRect(c)

def distance_to_camera(knownWidth, focalLength, perWidth):
    # compute and return the distance from the marker to the camera
    return (knownWidth * focalLength) / perWidth

# initialize the known distance from the camera to the object
KNOWN_DISTANCE = 22

# initialize the known object width, which in this case (the piece of paper) is 12 inches
KNOWN_WIDTH = 11

# load the first image that contains an object that is known to be 2 feet
# from our camera, then find the paper marker in the image and
# initialize the focal length
rootimage = cv2.imread("/Volumes/404/final_rov_code/Python/images/2ft.jpg")
marker1 = find_marker(rootimage)
marker2 = marker1[0][1] - marker1[1][1]
focalLength = (marker2 * KNOWN_DISTANCE) / KNOWN_WIDTH
print(marker1)
print(marker2)

image = cv2.VideoCapture(0)

# loop over the video frames
while True:
    # load the image, find the marker in the image, then compute the
    # distance to the marker from the camera
    frame, ret = image.read()
    marker = find_marker(ret)
    inches = distance_to_camera(KNOWN_WIDTH, focalLength, marker[1][0])
    print(inches)

    # draw a bounding box around the image and display it
    box = cv2.cv.BoxPoints(marker) if imutils.is_cv2() else cv2.boxPoints(marker)
    box = np.int0(box)
    cv2.drawContours(frame, [box], -1, (0, 255, 0), 2)
    cv2.putText(ret, "%.2fin" % inches,
                (ret.shape[1] - 200, ret.shape[0] - 20), cv2.FONT_HERSHEY_SIMPLEX,
                2.0, (0, 255, 0), 3)
    cv2.imshow("image", ret)

    # if cv2.waitKey(33) == ord('q'):
    #     os.system('pause')
I understand that it should be as minimalistic as possible, but we have no idea what could be causing the program to hang upon reading the first frame of the video feed. Could it be that the processing is taking too many resources on a single thread? (We're all newbies to the advanced sides of OpenCV and Python 3.)
There are no other errors that we are aware of at the moment, so the terminal gives no leads as to where the problem could be coming from.
Thank you in advance.
Your problem is likely a result of not including a waitKey() call at the end of your while loop. It takes time for OpenCV to draw the image, so if the program doesn't pause for long enough for the image to be drawn, the display just doesn't update. Check out this other StackOverflow question for more details.
In addition, you have your ret and frame variables mixed up. ret should be the first one and frame should be the second. Right now, the drawContours() method isn't going to do anything because you're passing it a boolean instead of an image.
Making those changes fixed this for me using Python 3.9 and OpenCV 4.5.
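With those two changes applied, the loop looks roughly like this (a sketch that keeps the rest of the question's code, including find_marker and the calibration, unchanged):

while True:
    # read() returns (ret, frame): ret is the success flag, frame is the image
    ret, frame = image.read()
    if not ret:
        break

    marker = find_marker(frame)
    inches = distance_to_camera(KNOWN_WIDTH, focalLength, marker[1][0])

    # draw the rotated bounding box and the distance on the actual image
    box = cv2.boxPoints(marker)
    box = np.int0(box)
    cv2.drawContours(frame, [box], -1, (0, 255, 0), 2)
    cv2.putText(frame, "%.2fin" % inches,
                (frame.shape[1] - 200, frame.shape[0] - 20),
                cv2.FONT_HERSHEY_SIMPLEX, 2.0, (0, 255, 0), 3)
    cv2.imshow("image", frame)

    # waitKey gives HighGUI time to actually draw the window
    if cv2.waitKey(33) == ord('q'):
        break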
I have a scenario where only trucks pass a toll gate, and I want to capture the number plate only once a truck has halted (to get a good-quality image to run OCR on). The OCR solution is built, but capturing a frame every time a truck comes to a halt seems tricky to me.
Can you help me with an approach, or similar working code, to achieve this using Python 3.6+ and OpenCV? I'm not willing to run an explicit model to detect motion; a simple background subtraction would do, in order to avoid overhead.
Sample image frame from the video: click here.
Here is the code I'm currently working on. It checks whether the background subtraction between two successive frames exceeds a 10% threshold, and if so captures the frame. But I need to do just the opposite: capture the frame when the background subtraction is (close to) zero. More logic also needs to be added, e.g. after capturing a frame, we have to skip all following static frames (which would otherwise be true positives) until the next truck arrives and comes to a halt.
The code:
import os
import cv2

x_0 = 720
x_1 = 870
y_0 = 190
y_1 = 360

fgbg = cv2.createBackgroundSubtractorMOG2()
cap = cv2.VideoCapture(r"C:\\Users\\aa\\file.asf")
i = 0

while cap.isOpened():
    ret, frame = cap.read()
    if ret == True:
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        frame = cv2.GaussianBlur(frame, (21, 21), 0)
        fgmask = fgbg.apply(frame)
        fgmask_crop = fgmask[y_0:y_1, x_0:x_1]
        frame_crop = frame[y_0:y_1, x_0:x_1]
        #out_video.write(frame_crop)
        cv2.imshow("crop", fgmask_crop)
        fg = cv2.copyTo(frame, fgmask)
        bg = cv2.copyTo(frame, cv2.bitwise_not(fgmask))
        pixels = cv2.countNonZero(fgmask_crop)
        image_area = frame_crop.shape[0] * frame_crop.shape[1]
        area_ratio = (pixels / image_area) * 100
        if area_ratio > 10:
            i = i + 1
            print(i)
            target = 'C:\\Users\\aa\\op'
            fileName = ("res%d.png" % (i))
            path_nm = os.path.join(target, fileName)
            cv2.imwrite(path_nm, frame_crop)
        key = cv2.waitKey(25)
        if key == ord('q'):
            break
    else:
        break

cv2.destroyAllWindows()
#out.release()
cap.release()
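Roughly, the capture-on-halt logic I have in mind, expressed as a small state machine fed with the area_ratio computed above (a sketch only; the threshold and frame-count values are hypothetical and would need tuning):

import cv2

MOTION_THRESH = 10   # % of the crop considered "moving" (hypothetical)
STILL_FRAMES = 25    # consecutive quiet frames before calling it "halted" (hypothetical)

state = "waiting"    # waiting -> moving -> captured -> waiting
quiet_count = 0

def on_frame(area_ratio, frame_crop, save_path):
    """Returns True when a plate frame has been captured for the current truck."""
    global state, quiet_count
    if state == "waiting":
        if area_ratio > MOTION_THRESH:        # a truck is entering the crop
            state = "moving"
            quiet_count = 0
    elif state == "moving":
        if area_ratio < 1:                    # crop is (almost) static again
            quiet_count += 1
            if quiet_count >= STILL_FRAMES:   # still long enough: capture once
                cv2.imwrite(save_path, frame_crop)
                state = "captured"
                return True
        else:
            quiet_count = 0
    elif state == "captured":
        if area_ratio > MOTION_THRESH:        # the truck drives off: rearm for the next one
            state = "waiting"
    return False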
Any help shall be highly acknowledged.
I am trying to use the Haar cascade in OpenCV 4.0 to detect faces for emotion, gender and age estimation. Sometimes the detectMultiScale() function returns an empty tuple, which raises an error in the later parts of the recognition.
I tried creating a while loop that runs until a face is detected, but it seems that once a face is not detected it is not detected again (in the same captured frame); I just get empty tuples back. The weird thing is that sometimes the program works flawlessly.
The detection model is being loaded correctly, since cv2.CascadeClassifier.empty(face_cascade) returns False.
There seems to be no problem with the captured frame, since I can display it properly.
After searching, I found that detectMultiScale() does, in fact, return an empty tuple when no faces are detected:
Python OpenCV face detection code sometimes raises `'tuple' object has no attribute 'shape'`
face_cascade = cv2.CascadeClassifier(
    'C:\\Users\\kj\\Desktop\\jeffery 1\\trained_models\\detection_models\\haarcascade_frontalface_alt.xml')
retval = cv2.CascadeClassifier.empty(face_cascade)
print(retval)
returns False
def video_cap(out_queue):
    video_capture = cv2.VideoCapture(0, cv2.CAP_DSHOW)
    #video_capture.set(3, 768)
    #video_capture.set(4, 1024)
    while True:
        ret, bgr_image = video_capture.read()
        cv2.imshow('frame', bgr_image)
        cv2.waitKey(1000)
        cv2.destroyAllWindows()
        if video_capture.isOpened() == False:
            video_capture.open(0)
        if(ret):
            gray_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
            rgb_image = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB)
            faces = detect_faces(face_detection, gray_image)
            ret_list = [gray_image, rgb_image, faces]
            print("DEBUG: VIDEO_CAPTURE MODULE WORKING")
            out_queue.put(ret_list)
            return
The video_cap function is threaded.
def detect_faces(detection_model, gray_image_array):
    faces1 = detection_model.detectMultiScale(gray_image_array, scaleFactor=2, minNeighbors=10, minSize=(64, 64))
    while(len(faces1) == 0):
        faces1 = detection_model.detectMultiScale(gray_image_array, scaleFactor=2, minNeighbors=10, minSize=(64, 64))
        print(faces1)
        if(len(faces1) != 0):
            break
    return faces1
I get the output:
()
()
()
()....
goes on until I terminate.
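For what it's worth, I am wondering whether the loop should simply treat an empty tuple as "no face in this frame" and move on to the next frame, instead of re-running detectMultiScale() on the same image. A standalone sketch of that idea (not a drop-in replacement for my threaded code):

import cv2

# cv2.data.haarcascades assumes the pip opencv-python package; otherwise use a full path
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_alt.xml")
cap = cv2.VideoCapture(0)

while True:
    ret, bgr_image = cap.read()
    if not ret:
        break
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    # an empty tuple just means "no face in this frame" -- draw nothing and move on
    for (x, y, w, h) in faces:
        cv2.rectangle(bgr_image, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("faces", bgr_image)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()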
How do I fix this problem?
This is a snippet of the code I used. I removed the ARGUMENTS in the detectMultiScale() function and it ran fine.
Also, make sure you have the correct path to the xml files.
classifier = cv2.CascadeClassifier("../../../l-admin/anaconda3/lib/python3.6/site-packages/cv2/data/haarcascade_frontalface_default.xml")
img = cv2.imread('../Tolulope/Adetula Tolulope (2).jpg')
face = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
faces = classifier.detectMultiScale(face)
print(type(faces), faces)
for (x, y, w, h) in faces:
    img = cv2.imwrite("facesa.png", cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 3))
On a secondary note, the reason mine worked might be that my camera could locate my face thanks to the lighting. So I suggest you try it out with a picture first before using video.
I had a similar issue when I used the JPG format; the main problem was the image format, because when I used PNG it automatically gave the tuple with the correct values.
classifier = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")
# reading the image
img = cv2.imread('i.png')
# showing the image
#cv2.imshow('shaswat face detection ',img)
# convert the image to grayscale (black and white)
grayscaled_img = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# cv2.imshow('shaswat face detection ',grayscaled_img)
# detect the faces
# returns (x, y, w, h) for each detection
faces = classifier.detectMultiScale(grayscaled_img)
print(faces)
#cv2.rectangle(img , face_coordinates[0] , face_coordinates[1] , (255,0,0) , 10)
the output shows
[[ 87 114 361 361]]
So the popular approach to lane detection using computer vision is to perform these 5 steps:
1. Convert the image to grayscale.
2. Smooth the image with a Gaussian filter.
3. Use the Canny edge function to detect edges (obviously, right?).
4. Mark the region of interest (ROI).
5. Use the Hough transform function to detect straight lines, and a line function to draw them.
That's my approach.
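In code, those steps look roughly like this (a generic sketch; the Canny/Hough parameter values are placeholders, and the hand-picked ROI polygon is exactly the part I want to automate):

import cv2
import numpy as np

def detect_lanes(frame, roi_polygon):
    # roi_polygon: np.int32 array of polygon vertices, chosen by hand per camera
    # 1-3. grayscale, Gaussian smoothing, Canny edges
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    edges = cv2.Canny(blur, 50, 150)

    # 4. keep only the edges inside the manually chosen ROI polygon
    mask = np.zeros_like(edges)
    cv2.fillPoly(mask, [roi_polygon], 255)
    edges = cv2.bitwise_and(edges, mask)

    # 5. Hough transform for straight line segments, then draw them
    lines = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=50,
                            minLineLength=40, maxLineGap=20)
    if lines is not None:
        for x1, y1, x2, y2 in lines[:, 0]:
            cv2.line(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)
    return frame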
But the point here is that we usually need to select the ROI manually. In most cases, when applied to a dash camera on a car, that is OK since the view does not change much.
But my situation is different: I want to detect road lanes from traffic surveillance cameras, and of course there are many of them. Each camera has its own view, so I think there must be a way to automatically separate the road and non-road areas.
My question is: how can I detect the ROI automatically?
My idea is that the road area will have a lot of pixel movement and the non-road area will not. From that idea we should be able to detect the ROI automatically.
I have managed to use OpenCV to extract the background (background subtraction) from this video (https://youtu.be/bv3NEzjb5sU) using the createBackgroundSubtractorMOG2 function.
The Canny edge and Hough transform code is basically fine once we have the ROI. Below is the code that trains and extracts the background; I thought we could modify it to produce a region mask, or something else that can serve as the ROI for the later steps.
Thank you.
import cv2
from pathlib import Path

def bg_train(video_source, number_of_run, number_of_frames):
    cap = cv2.VideoCapture(video_source)
    fgbg = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
    frame_number = -1
    default_background = Path("default_background.png")

    # the number of loops that will run to create a better background
    i = 0

    # check whether the background file already exists
    if default_background.is_file():
        i = 1
        default_background = "default_background.png"
    else:
        default_background = None
        i = 0

    while i < number_of_run:
        # Capture next frame and show in window
        ret, frame = cap.read()
        if not ret:
            print("frame capture failed or not :)")
            break
        cv2.imshow("training_original", frame)

        # subtract foreground and show in new window
        background_read = cv2.imread(default_background)
        fg = fgbg.apply(frame, background_read, -1)
        fg_mask = filter_mask(fg)
        cv2.imshow('Training_FG', fg_mask)

        # subtract background and show in new window
        bg = fgbg.getBackgroundImage(fg_mask)
        cv2.imshow('Training_background', bg)
        print(i, frame_number, bg.shape[0], bg.shape[1], bg.shape[2])

        # Count frames and save the final background after training
        frame_number += 1
        if frame_number == number_of_frames and i < number_of_run - 1:
            i += 1
            cv2.imwrite("default_background.png", bg)
            cap.release()
            cv2.destroyAllWindows()
            cap = cv2.VideoCapture(video_source)
            frame_number = -1
        elif frame_number == number_of_frames and i == number_of_run - 1:
            i += 1
            cv2.imwrite("background_final.png", bg)
            cv2.imshow("final background", bg)
            return 1, bg

        k = cv2.waitKey(1) & 0xff
        if k == 27:
            print("exit by user...")
            return 0, None

    cap.release()
    cv2.destroyAllWindows()

def main():
    video_source = "highway-30s.mp4"
    check, background_after = bg_train(video_source, 2, 500)
    if check == 0:
        return 0
    elif check == 1:
        cv2.imshow("the background, press ESC to close window", background_after)
        c = cv2.waitKey(0)
        if c == 27:
            cv2.destroyAllWindows()

if __name__ == '__main__':
    main()
You could train the algorithm to pick the ROI over an initial period of time by tracking movement in the viewport over a series of frames.
Filter out the movements and create lines/vectors representing their directions. Once you have enough of these samples, you can work out the best ROI by computing a bounding box that encapsulates the vectors.
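A rough sketch of that idea, reusing the MOG2 subtractor from the question and accumulating its foreground mask into a motion map whose bounding box becomes the ROI (the frame count and motion fraction are hypothetical and would need tuning):

import cv2
import numpy as np

def learn_roi(video_source, n_frames=500, motion_frac=0.2):
    """Accumulate MOG2 foreground over n_frames and return (road mask, bounding box)."""
    cap = cv2.VideoCapture(video_source)
    fgbg = cv2.createBackgroundSubtractorMOG2(detectShadows=True)
    accum = None

    for _ in range(n_frames):
        ret, frame = cap.read()
        if not ret:
            break
        fg = fgbg.apply(frame)
        fg = (fg == 255).astype(np.float32)      # ignore shadow pixels (value 127)
        accum = fg if accum is None else accum + fg
    cap.release()

    if accum is None or accum.max() == 0:
        return None, None

    # a pixel belongs to the road if it was foreground often enough
    mask = (accum / accum.max() > motion_frac).astype(np.uint8) * 255
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((15, 15), np.uint8))

    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return mask, None
    x, y = int(xs.min()), int(ys.min())
    w, h = int(xs.max()) - x + 1, int(ys.max()) - y + 1
    return mask, (x, y, w, h)

# usage: road_mask, roi_box = learn_roi("highway-30s.mp4")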