I'm trying to implement a digit-recognition program for video capture in OpenCV. It works with normal (still) pictures as input, but when I add the video-capture functionality it gets stuck while recording if I move the camera around. My code for the program is here:
import numpy as np
import cv2
from sklearn.externals import joblib
from skimage.feature import hog
# Load the classifier
clf = joblib.load("digits_cls.pkl")
# Default camera has index 0 and externally(USB) connected cameras have
# indexes ranging from 1 to 3
cap = cv2.VideoCapture(0)
while(True):
    # Capture frame-by-frame
    ret, frame = cap.read()
    # Convert to grayscale and apply Gaussian filtering
    im_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    im_gray = cv2.GaussianBlur(im_gray, (5, 5), 0)
    # Threshold the image
    ret, im_th = cv2.threshold(im_gray.copy(), 120, 255, cv2.THRESH_BINARY_INV)
    # Find contours in the binary image 'im_th'
    _, contours0, hierarchy = cv2.findContours(im_th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Draw contours in the original image 'im' with contours0 as input
    # cv2.drawContours(frame, contours0, -1, (0,0,255), 2, cv2.LINE_AA, hierarchy, abs(-1))
    # Rectangular bounding box around each number/contour
    rects = [cv2.boundingRect(ctr) for ctr in contours0]
    # Draw the bounding box around the numbers
    for rect in rects:
        cv2.rectangle(frame, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (0, 255, 0), 3)
        # Make the rectangular region around the digit
        leng = int(rect[3] * 1.6)
        pt1 = int(rect[1] + rect[3] // 2 - leng // 2)
        pt2 = int(rect[0] + rect[2] // 2 - leng // 2)
        roi = im_th[pt1:pt1+leng, pt2:pt2+leng]
        # Resize the image
        roi = cv2.resize(roi, (28, 28), im_th, interpolation=cv2.INTER_AREA)
        roi = cv2.dilate(roi, (3, 3))
        # Calculate the HOG features
        roi_hog_fd = hog(roi, orientations=9, pixels_per_cell=(14, 14), cells_per_block=(1, 1), visualise=False)
        nbr = clf.predict(np.array([roi_hog_fd], 'float64'))
        cv2.putText(frame, str(int(nbr[0])), (rect[0], rect[1]), cv2.FONT_HERSHEY_DUPLEX, 2, (0, 255, 255), 3)
    # Display the resulting frame
    cv2.imshow('frame', frame)
    cv2.imshow('Threshold', im_th)
    # Press 'q' to exit the video stream
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
The error I get is that there is no input to the resize of the ROI (region of interest). I find it weird because it works as long as I don't move things around too much in the picture. I'm sure it isn't the camera that is at fault, since I've tried a lot of different cameras. Here is the specific error message:
Traceback (most recent call last):
File "C:\Users\marti\Desktop\Code\Python\digitRecognition\Video_cap.py", line 55, in <module>
roi = cv2.resize(roi, (28, 28), im_th, interpolation=cv2.INTER_AREA)
cv2.error: D:\Build\OpenCV\opencv-3.2.0\modules\imgproc\src\imgwarp.cpp:3492: error: (-215) ssize.width > 0 && ssize.height > 0 in function cv::resize
Picture of the program in action; if I move the numbers around, the program freezes.
You're using a fixed threshold for the preprocessing before trying to find contours. Since cv2.resize() has to resize something, it expects the roi matrix to have non-zero width and height. I'm guessing that at some point while you're moving the camera, you don't detect any digits, because of your non-adaptive preprocessing.
I suggest that you display the thresholded image and an image with the contours superimposed on the frame while moving the camera; this way you'll be able to debug the algorithm. Also, make sure to print(len(rects)) to see whether any rectangles have been detected.
Another trick would be to save the frames and run the algorithm on the last frame saved before crashing, to find out why that frame is causing the error.
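A minimal sketch of that trick, meant to sit right after cap.read() in your loop (the filename is an assumption):

ret, frame = cap.read()
cv2.imwrite('last_frame.png', frame)  # overwritten each iteration, so the file left on disk is the frame that crashed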
Summarizing, you really need to take control over your code if you expect it to produce meaningful results. The solution, depending on your data, might be some kind of contrast enhancement before the thresholding operation and/or using Otsu's method or adaptive thresholding with some additional filtering.
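As a minimal sketch of that last suggestion (the parameter values are assumed starting points to tune against your data, not tested settings), either variant would replace the fixed threshold of 120 in your loop:

# Otsu's method: the threshold value is computed from each frame's histogram,
# so it adapts to lighting changes instead of being hard-coded
ret, im_th = cv2.threshold(im_gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

# Adaptive thresholding: a separate threshold is computed per pixel neighborhood;
# block size 11 and constant 2 are assumed starting values
im_th = cv2.adaptiveThreshold(im_gray, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                              cv2.THRESH_BINARY_INV, 11, 2)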
What about trying this:
if roi.any():
    roi = cv2.resize(roi, (28, 28), frame, interpolation=cv2.INTER_AREA)
    roi = cv2.dilate(roi, (3, 3))
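This guard works because, when a detected box sits close to the frame edge, pt1 or pt2 can go negative, and the slice im_th[pt1:pt1+leng, pt2:pt2+leng] then comes back empty; an empty array makes roi.any() return False, so the resize of a zero-size input (the cause of the (-215) ssize.width > 0 error) is skipped.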
I think this does what you want (I simplified yours for the example):
cap = cv2.VideoCapture(0)
while(True):
    # Capture frame-by-frame
    ret, frame = cap.read()
    frame2 = frame.copy()
    # Convert to grayscale and apply Gaussian filtering
    im_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    im_gray = cv2.GaussianBlur(im_gray, (5, 5), 0)
    ret, im_th = cv2.threshold(im_gray.copy(), 120, 255, cv2.THRESH_BINARY_INV)
    # Find contours in the binary image 'im_th'
    _, contours0, hierarchy = cv2.findContours(im_th, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # Rectangular bounding box around each number/contour
    rects = [cv2.boundingRect(ctr) for ctr in contours0]
    # Draw the bounding box around the numbers
    for rect in rects:
        cv2.rectangle(frame, (rect[0], rect[1]), (rect[0] + rect[2], rect[1] + rect[3]), (255, 0, 255), 3)
        # Make the rectangular region around the digit
        leng = int(rect[3] * 1.6)
        pt1 = int(rect[1] + rect[3] // 2 - leng // 2)
        pt2 = int(rect[0] + rect[2] // 2 - leng // 2)
        roi = im_th[pt1:pt1+leng, pt2:pt2+leng]
        # Resize the image
        if roi.any():
            roi = cv2.resize(roi, (28, 28), frame, interpolation=cv2.INTER_AREA)
            roi = cv2.dilate(roi, (3, 3))
    # Display the resulting frame
    cv2.imshow('frame', frame)
    #cv2.imshow('Threshold', im_th)
    # Press 'q' to exit the video stream
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# When everything done, release the capture
cap.release()
cv2.destroyAllWindows()
I found a very interesting article about detection of moving objects; here is the corresponding link: Detection of moving object
and also the corresponding article: Article about object detection
I followed the code and tried to implement it myself; here is the corresponding code:
import cv2
import numpy as np
import matplotlib.pyplot as plt
from Background_Image_Creation import get_background

cap = cv2.VideoCapture("video_1.mp4")
#print(cap.get(cv2.CAP_PROP_FRAME_COUNT))
#print(cap.get(cv2.CAP_PROP_FPS))
frame_width = int(cap.get(3))
frame_height = int(cap.get(4))
save_name = "Result.mp4"
# define codec and create VideoWriter object
out = cv2.VideoWriter(save_name, cv2.VideoWriter_fourcc(*'mp4v'), 10, (frame_width, frame_height))
background_frame = get_background("video_1.mp4")
background = cv2.cvtColor(background_frame, cv2.COLOR_BGR2GRAY)
print(background.shape)
frame_count = 0
consecutive_frame = 8
#frame_diff_list = []
while cap.isOpened():
    ret, frame = cap.read()
    print(ret)
    print(frame.shape)
    if ret == True:
        frame_count += 1
        orig_frame = frame.copy()
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if frame_count % consecutive_frame == 0 or frame_count == 1:
            frame_diff_list = []
        frame_diff = cv2.absdiff(gray, background)
        ret, thresh = cv2.threshold(frame_diff, 50, 255, cv2.THRESH_BINARY)
        dilate_frame = cv2.dilate(thresh, None, iterations=2)
        frame_diff_list.append(dilate_frame)
        print(frame_diff_list)
        if len(frame_diff_list) == consecutive_frame:
            # add all the frames in the `frame_diff_list`
            sum_frames = sum(frame_diff_list)
            print(sum_frames)
            # find the contours around the white segmented areas
            contours, hierarchy = cv2.findContours(sum_frames, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
            # draw the contours, not strictly necessary
            for i, cnt in enumerate(contours):
                cv2.drawContours(frame, contours, i, (0, 0, 255), 3)
            for contour in contours:
                # continue through the loop if contour area is less than 500...
                # ... helps in removing noise detection
                if cv2.contourArea(contour) < 500:
                    continue
                # get the xmin, ymin, width, and height coordinates from the contours
                (x, y, w, h) = cv2.boundingRect(contour)
                # draw the bounding boxes
                cv2.rectangle(orig_frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.imshow('Detected Objects', orig_frame)
            out.write(orig_frame)
            if cv2.waitKey(100) & 0xFF == ord('q'):
                break
    else:
        break

cap.release()
cv2.destroyAllWindows()
The code for background frame creation is also presented:
import numpy as np
import cv2
import matplotlib.pyplot as plt

def get_background(path):
    cap = cv2.VideoCapture(path)
    frame_indices = cap.get(cv2.CAP_PROP_FRAME_COUNT) * np.random.uniform(size=50)
    frames = []
    for idx in frame_indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ret, frame = cap.read()
        frames.append(frame)
    median_frame = np.median(frames, axis=0).astype(np.uint8)
    return median_frame

#median_frame = get_background("video_1.mp4")
#cv2.imshow("Median_Background", median_frame)
#cv2.waitKey(0)
#cv2.destroyAllWindows()
#plt.show()
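(The idea behind get_background is that the per-pixel median over 50 randomly sampled frames suppresses anything that moves, leaving an estimate of the static background.)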
The code runs fine, but the output video does not contain anything; it is just 1 KB in size. One thing I am wondering about is that this fragment
frame_diff_list.append(dilate_frame)
is highlighted in yellow in my editor; here is a screenshot:
Also, when I try print(frame_diff_list), it prints just one output:
I was even more surprised when I tested
print(ret)
print(frame.shape)
at the beginning of the loop, and it also printed just one output:
True
(360, 640, 3)
It seems that the loop does not cover all the frames, right? Could you please help me figure out what is wrong with my code?
I am trying to display an ROI of the window, as shown in the picture, using the image received from the webcam.
- Conditions
1) The size of the main window is 400x300.
2) ROI is set to 320x240 at (30,30).
3) Put a red border on the ROI.
Here is how I did it:
Create a mask with the same size as the frame.
Make a square ROI on the mask.
Use cv2.add to synthesize frame and mask, and assign it to dst.
Check dst with cv2.imshow.
Here is the code:
import numpy as np, cv2

capture = cv2.VideoCapture(0)
if not capture.isOpened():
    raise Exception("Camera not connected")

capture.set(cv2.CAP_PROP_FRAME_WIDTH, 300)
capture.set(cv2.CAP_PROP_FRAME_HEIGHT, 400)

while True:
    ret, frame = capture.read()
    if not ret:
        break
    mask = np.zeros(frame.shape[:2], np.uint8)
    cv2.rectangle(mask, (30, 30), (320, 240), (0, 0, 255), -1)
    dst = cv2.add(frame, frame, mask=mask)
    cv2.imshow("mask", mask)
    cv2.imshow("frame", frame)
    cv2.imshow("dst", dst)
    if cv2.waitKey(10) == 27:
        break

capture.release()
cv2.destroyAllWindows()
But I couldn't solve it this way. Is there any other way? Thanks for the feedback.
A mask has only one channel. Change this line:
cv2.rectangle(mask, (30, 30), (320, 240), 255, -1)
Additionally, you shouldn't use cv2.add() here because it changes your image (it adds the frame to itself, saturating the pixel values). Better do:
dst = cv2.bitwise_and(frame, frame, mask=mask)
And this is how you can simply crop the image, if you need to:
dst = frame[30:240, 30:320]
Regarding the setting of frame width and height please read this.
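Putting the pieces together, here is a minimal sketch of the loop with both fixes applied (the red border from the original conditions is drawn on the 3-channel result, since a single-channel mask cannot hold a color; camera index 0 is assumed):

import numpy as np, cv2

capture = cv2.VideoCapture(0)
if not capture.isOpened():
    raise Exception("Camera not connected")

while True:
    ret, frame = capture.read()
    if not ret:
        break
    # single-channel mask: a white filled rectangle marks the ROI
    mask = np.zeros(frame.shape[:2], np.uint8)
    cv2.rectangle(mask, (30, 30), (320, 240), 255, -1)
    # keep only the ROI pixels; everything outside stays black
    dst = cv2.bitwise_and(frame, frame, mask=mask)
    # the red border goes on the 3-channel result, not on the mask
    cv2.rectangle(dst, (30, 30), (320, 240), (0, 0, 255), 2)
    cv2.imshow("dst", dst)
    if cv2.waitKey(10) == 27:  # Esc to quit
        break

capture.release()
cv2.destroyAllWindows()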
I want to use OCR (pytesseract) to recognize the text located in images like these:
I have thousands of these arrows. Until now the procedure is as follows: I first resize the image (for another process). Then I crop the image to get rid of the most part of the arrow. Next I draw a white rectangle as a frame to remove further noise but still have distance between text and image borders for better text recognition. I resize the image again to ensure a height of capital letters to ~30 px (https://groups.google.com/forum/#!msg/tesseract-ocr/Wdh_JJwnw94/24JHDYQbBQAJ). Finally I binarize the image with a threshold of 150.
Full code:
import cv2
image_file = '001.jpg'
# load the input image and grab the image dimensions
image = cv2.imread(image_file, cv2.IMREAD_GRAYSCALE)
(h_1, w_1) = image.shape[:2]
# resize the image and grab the new image dimensions
image = cv2.resize(image, (int(w_1*320/h_1), 320))
(h_1, w_1) = image.shape
# crop image
image_2 = image[70:h_1-70, 20:w_1-20]
# get image_2 height, width
(h_2, w_2) = image_2.shape
# draw white rectangle as a frame around the number -> remove noise
cv2.rectangle(image_2, (0, 0), (w_2, h_2), (255, 255, 255), 40)
# resize image, that capital letters are ~ 30 px in height
image_2 = cv2.resize(image_2, (int(w_2*50/h_2), 50))
# image binarization
ret, image_2 = cv2.threshold(image_2, 150, 255, cv2.THRESH_BINARY)
# save image to file
cv2.imwrite('processed_' + image_file, image_2)
# tesseract part can be commented out
import pytesseract
config_7 = ("-c tessedit_char_whitelist=0123456789AB --oem 1 --psm 7")
text = pytesseract.image_to_string(image_2, config=config_7)
print("OCR TEXT: " + "{}\n".format(text))
The problem is that the text located in the arrow is never centered. Sometimes I remove part of the text with the method described above (e.g. in image 50A).
Is there a method in image processing to get rid of the arrow in a more elegant way? For instance using contour detection and deletion? I am more interested in the OpenCV part than the tesseract part to recognize the text.
Any help is appreciated.
If you look at the pictures, you will see that there is a white arrow in the image which is also the biggest contour (especially if you draw a black border on the image). If you make a blank mask and draw the arrow (the biggest contour in the image) on it, then erode it a little bit, you can perform a per-element bitwise conjunction of the actual image and the eroded mask. If it is not clear, look at the code and comments below and you will see that it is actually pretty simple.
# imports
import cv2
import numpy as np
img = cv2.imread("number.png") # read image
# you can resize the image here if you like - it should still work for both sizes
h, w = img.shape[:2] # get the actual images height and width
img = cv2.resize(img, (int(w*320/h), 320))
h, w = img.shape[:2]
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) # transform to grayscale
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)[1] # perform Otsu threshold
cv2.rectangle(thresh, (0, 0), (w, h), (0, 0, 0), 2)
contours = cv2.findContours(thresh, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[0] # search for contours
max_cnt = max(contours, key=cv2.contourArea) # select biggest one
mask = np.zeros((h, w), dtype=np.uint8) # create a black mask
cv2.drawContours(mask, [max_cnt], -1, (255, 255, 255), -1) # draw biggest contour on the mask
kernel = np.ones((15, 15), dtype=np.uint8) # make a kernel with appropriate values - in both cases (resized and original) 15 is ok
erosion = cv2.erode(mask, kernel, iterations=1) # erode the mask with given kernel
reverse = cv2.bitwise_not(img.copy()) # reversed image of the actual image 0 becomes 255 and 255 becomes 0
img = cv2.bitwise_and(reverse, reverse, mask=erosion) # per-element bit-wise conjunction of the actual image and eroded mask (erosion)
img = cv2.bitwise_not(img) # revers the image again
# save image to file and display
cv2.imwrite("res.png", img)
cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
Result:
You can try a simple Python script:
import cv2
import numpy as np
img = cv2.imread('mmubS.png', cv2.IMREAD_GRAYSCALE)
thresh = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY_INV )[1]
im_flood_fill = thresh.copy()
h, w = thresh.shape[:2]
im_flood_fill=cv2.rectangle(im_flood_fill, (0,0), (w-1,h-1), 255, 2)
mask = np.zeros((h + 2, w + 2), np.uint8)
cv2.floodFill(im_flood_fill, mask, (0, 0), 0)
im_flood_fill = cv2.bitwise_not(im_flood_fill)
cv2.imshow('clear text', im_flood_fill)
cv2.imwrite('text.png', im_flood_fill)
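The trick here: after the inverse threshold the dark strokes become white, and the extra white rectangle drawn along the border connects the arrow's outline to the image edge, so flood-filling from the corner (0, 0) with black erases everything connected to that edge. The characters, being isolated blobs, survive, and the final bitwise_not restores dark text on a white background.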
Result:
I wrote code to detect white color using a webcam (Logitech C310 HD Webcam) on a Raspberry Pi 3B+.
The code performs the following functions:
Capture the video using cv2.VideoCapture (0)
Grab each frame of video and look for white objects in that frame.
If a white object is present in that frame, code will encircle it and will print White.
Real video and processed video will be shown using cv2.imshow('frame',frame1) and cv2.imshow('res',res1).
Code is shown below:
import cv2
import numpy as np
from time import sleep

cap1 = cv2.VideoCapture(0)
cap1.set(3, 640)
cap1.set(4, 480)
cap1.set(5, 15)

while(1):
    _, frame1 = cap1.read()
    #hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # define range of white color in HSV
    # change it according to your need !
    lower_white = np.array([150,150,150], dtype=np.uint8)
    upper_white = np.array([255,255,255], dtype=np.uint8)
    # Threshold the HSV image to get only white colors
    mask1 = cv2.inRange(frame1, lower_white, upper_white)
    kernal = np.ones((5, 5), "uint8")
    # Tracking the Red Color
    (_, contours1, hierarchy) = cv2.findContours(mask1, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    for pic, contour1 in enumerate(contours1):
        area1 = cv2.contourArea(contour1)
        #start = time.time()
        if (area1 > 1200):
            print 'white in cam 1'
            #x, y, w, h = cv2.boundingRect(contour)
            #img = cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)
            ellipse1 = cv2.fitEllipse(contour1)
            cv2.ellipse(frame1, ellipse1, (0, 255, 0), 2)
            #cv2.putText(frame, "RED color", (x, y), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255))
    red = cv2.dilate(mask1, kernal)
    # Bitwise-AND mask and original image
    res1 = cv2.bitwise_and(frame1, frame1, mask=mask1)
    cv2.imshow('frame', frame1)
    cv2.imshow('res', res1)
    k = cv2.waitKey(5) & 0xFF
    if k == 27:
        break

cap1.release()
cv2.destroyAllWindows()
My problem is that a few months back this code was working fine in real time, but now when I run the same code again, the video doesn't run in real time; there is a huge time lag. For example, when there is a white object in front of the camera, it takes a few minutes to show it as white. In the past, the video frame window appeared without any time lag, but now the frame window appears after a few minutes.
Please explain how I can solve this problem. Why is it that the same code worked in the past but doesn't now? Is it a problem with Python 2 or Python 3, or with the Raspberry Pi?
I tried to make a motion detector, but as I saw, it is very sensitive to lighting conditions. I would like to detect whether there was a change on a dart board (so when somebody throws a dart, detect it).
This is how I tried it:
"""
Vecsei Gabor
"""
import cv2
#Initialize the first frame in the video stream
prevFrame = None
#Area of the detected contour, below this value it's not counted as detected
dontCare = 500
#Capture from webcam
cap = cv2.VideoCapture(0)
#Limit the FPS to 10 (For this task the lower the better)
cap.set(cv2.cv.CV_CAP_PROP_FPS, 15)
#counter for the detection
i = 0
while(True):
# Capture frame-by-frame
ret, frame = cap.read()
#Blur for better results
output = cv2.GaussianBlur(frame, (21, 21), 0)
#If the first frame is None, initialize it
if prevFrame is None:
prevFrame = output
continue
#Compute the absolute difference between the current frame and
#First frame
frameDelta = cv2.absdiff(prevFrame, output)
#Convert to gray to detect contours
frameDelta = cv2.cvtColor(frameDelta, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(frameDelta, 21, 255, cv2.THRESH_BINARY)[1]
#Dilate the thresholded image to fill in holes, then find contours
#on thresholded image
thresh = cv2.dilate(thresh, None, iterations=2)
cnts, hier = cv2.findContours(thresh.copy(), cv2.RETR_EXTERNAL,
cv2.CHAIN_APPROX_SIMPLE)
cnts_sorted = sorted(cnts, key = cv2.contourArea, reverse = True)[:1]
#Loop over the contours
for c in cnts_sorted:
#If the contour is too small, ignore it
if cv2.contourArea(c) < dontCare:
continue
# compute the bounding box for the contour, draw it on the frame,
# and update the text
(x, y, w, h) = cv2.boundingRect(c)
cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
i+=1
print "Detected something!" + str(i)
print "Area: " + str(cv2.contourArea(c))
prevFrame = output
cv2.imshow('Webcam ',frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()
Basically, I just watch for the differences between two frames, and if there was a change on the frame, I save it as our new reference image, so we can detect new changes.