How can I order the items in a picture from top left to bottom right, as in the image below? I currently get this error with the following code:
a = sorted(keypoints, key=lambda p: (p[0]) + (p[1]))[0] # find upper left point
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
This question is modelled on this one: Ordering coordinates from top left to bottom right
def preprocess(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img_blur = cv2.GaussianBlur(img_gray, (5, 5), 1)
    img_canny = cv2.Canny(img_blur, 50, 50)
    kernel = np.ones((3, 3))
    img_dilate = cv2.dilate(img_canny, kernel, iterations=2)
    img_erode = cv2.erode(img_dilate, kernel, iterations=1)
    return img_erode

image_final = preprocess(cv2.imread("picture_example.png"))
keypoints, hierarchy = cv2.findContours(image_final, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
points = []
while len(keypoints) > 0:
    a = sorted(keypoints, key=lambda p: (p[0]) + (p[1]))[0]  # find upper left point
    b = sorted(keypoints, key=lambda p: (p[0]) - (p[1]))[-1]  # find upper right point
    cv2.line(image_final, (int(a.pt[0]), int(a.pt[1])), (int(b.pt[0]), int(b.pt[1])), (255, 0, 0), 1)
    # convert opencv keypoint to numpy 3d point
    a = np.array([a.pt[0], a.pt[1], 0])
    b = np.array([b.pt[0], b.pt[1], 0])
    row_points = []
    remaining_points = []
    for k in keypoints:
        p = np.array([k.pt[0], k.pt[1], 0])
        d = k.size  # diameter of the keypoint (might be a threshold)
        dist = np.linalg.norm(np.cross(np.subtract(p, a), np.subtract(b, a))) / np.linalg.norm(b)  # distance between keypoint and line a->b
        if d/2 > dist:
            row_points.append(k)
        else:
            remaining_points.append(k)
    points.extend(sorted(row_points, key=lambda h: h.pt[0]))
    keypoints = remaining_points
New Picture:
Reference Ordering Picture:
Will use center of mass to determine center point ordering.
The resulting numbering depends on how many rows you want there to be. With the program shown below, you can specify the number of rows before you run it.
For example, here is the original image:
Here is the numbered image when you specify 4 rows:
Here is the numbered image when you specify 6 rows:
For the other image you provided (with its frame cropped so the frame won't be detected as a shape), you can see there will be 4 rows, so putting 4 into the program will give you:
Let's have a look at the workflow considering 4 rows. The concept I used is to divide the image into 4 segments along the y axis, forming 4 rows. For each segment of the image, find every shape that has its center in that segment. Finally, order the shapes in each segment by their x coordinate.
Import the necessary libraries:
import cv2
import numpy as np
Define a function that takes an image as input and returns a processed version (edges detected, then dilated and eroded) from which contours can later be retrieved:
def process_img(img):
    img_gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    img_canny = cv2.Canny(img_gray, 100, 100)
    kernel = np.ones((2, 3))
    img_dilate = cv2.dilate(img_canny, kernel, iterations=1)
    img_erode = cv2.erode(img_dilate, kernel, iterations=1)
    return img_erode
Define a function that will return the center of a contour:
def get_centeroid(cnt):
    length = len(cnt)
    sum_x = np.sum(cnt[..., 0])
    sum_y = np.sum(cnt[..., 1])
    return int(sum_x / length), int(sum_y / length)
Define a function that will take in a processed image and return the center points of the shapes found in the image:
def get_centers(img):
    contours, hierarchies = cv2.findContours(img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    for cnt in contours:
        if cv2.contourArea(cnt) > 100:
            yield get_centeroid(cnt)
Define a function that takes an image array, img, an array of coordinates, centers, the number of segments for the image, row_amt, and the height of the image, row_h, as input. It yields row_amt arrays (each sorted by x coordinate), each containing every point in centers that lies in the corresponding row of the image:
def get_rows(img, centers, row_amt, row_h):
    centers = np.array(centers)
    d = row_h / row_amt  # height of one segment
    for i in range(row_amt):
        f = centers[:, 1] - d * i
        a = centers[(f < d) & (f > 0)]
        yield a[a.argsort(0)[:, 0]]
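For illustration, the same banding idea can be sketched without NumPy. The helper name rows_by_band and the sample coordinates are made up for this example and are not part of the answer's code. It assumes the centers are plain (x, y) tuples and assigns each one to a band by integer division, so a center that lies exactly on a band boundary is kept rather than excluded by the strict comparisons.

```python
def rows_by_band(centers, row_amt, img_h):
    """Group (x, y) centers into row_amt horizontal bands, each sorted by x."""
    band_h = img_h / row_amt
    rows = [[] for _ in range(row_amt)]
    for x, y in centers:
        # Clamp so that y == img_h still falls into the last band.
        idx = min(int(y // band_h), row_amt - 1)
        rows[idx].append((x, y))
    return [sorted(row) for row in rows]


# Made-up centers for a 400 px tall image split into 4 bands of 100 px.
centers = [(300, 40), (50, 60), (120, 150), (20, 130), (200, 390)]
ordered = [pt for row in rows_by_band(centers, 4, 400) for pt in row]
print(ordered)
```

Flattening the rows in order gives the top-left to bottom-right numbering; empty bands simply contribute nothing.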
Read in the image, get its processed form using the process_img function defined above, and get the center of each shape in the image using the get_centers function:
img = cv2.imread("shapes.png")
img_processed = process_img(img)
centers = list(get_centers(img_processed))
Get the height of the image to use for the get_rows function defined, and define a count variable, count, to keep track of the numbering:
h, w, c = img.shape
count = 0
Loop through the centers of the shapes, divided into 4 rows, drawing a line that connects the centers in each row for visualization:
for row in get_rows(img, centers, 4, h):
    cv2.polylines(img, [row], False, (255, 0, 255), 2)
    for x, y in row:
Add to the count variable, and draw the count at that location on the image:
        count += 1
        cv2.circle(img, (x, y), 10, (0, 0, 255), -1)
        cv2.putText(img, str(count), (x - 10, y + 5), cv2.FONT_HERSHEY_PLAIN, 1, (0, 255, 255), 2)
Finally, show the image:
cv2.imshow("Ordered", img)
cv2.waitKey(0)
This is not completely the same task as in the linked question you took the code from:
You have contours, while the other question has points.
You have to come up with a method to sort contours (they might overlap in one dimension, and so on). There are multiple ways to do that, depending on your use case. The easiest might be to use the center of mass of each contour. This can be done as shown here: Center of mass in contour (Python, OpenCV). Then you can build an array of objects that contain those points and use the code that you found.
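As a rough sketch of that point: sorting raw contour arrays with a key that returns arrays raises the ValueError from the question, while reducing each contour to one point first makes the sort well defined. The two contours below are made up, and the centroid helper uses the mean of the boundary points as a simple stand-in for a moment-based center of mass.

```python
import numpy as np

# Two made-up contours in the (N, 1, 2) layout that cv2.findContours returns.
square_right = np.array([[[50, 10]], [[60, 10]], [[60, 20]], [[50, 20]]])
square_left = np.array([[[5, 12]], [[15, 12]], [[15, 22]], [[5, 22]]])
contours = [square_right, square_left]

try:
    # The key returns whole arrays; comparing two arrays gives an array of
    # booleans, which sorted() cannot reduce to a single True/False.
    sorted(contours, key=lambda p: p[0] + p[1])
    raised = False
except ValueError:
    raised = True


def centroid(cnt):
    """Mean of the contour's points as an (x, y) tuple of floats."""
    cx, cy = cnt.reshape(-1, 2).mean(axis=0)
    return float(cx), float(cy)


# One point per contour, so x + y is a plain number and sorting works.
centers = sorted((centroid(c) for c in contours), key=lambda c: c[0] + c[1])
print(raised, centers)
```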
The code that you found assumes that the points lie more or less on a grid, so all the points 1-5 in your reference image are roughly on a line. In the new picture you posted, this is not really the case. It might be better to go for a clustering approach here: cluster the y coordinates of the center points with some approach (maybe one from here), then, for each cluster, sort the elements by their centers' x coordinate.
As I already said, there are multiple ways of doing this, and the right one depends heavily on your use case.
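One very simple way to cluster the y coordinates, sketched here under the assumption that rows are separated by a clear vertical gap, is to sort the centers by y and start a new row whenever the jump to the next center exceeds a threshold. The function name cluster_rows, the max_gap parameter and the sample points are hypothetical; a proper clustering algorithm may suit irregular layouts better.

```python
def cluster_rows(centers, max_gap):
    """Split (x, y) centers into rows wherever consecutive y values jump by more than max_gap."""
    rows = []
    for pt in sorted(centers, key=lambda c: c[1]):
        if rows and pt[1] - rows[-1][-1][1] <= max_gap:
            rows[-1].append(pt)   # close enough to the previous center: same row
        else:
            rows.append([pt])     # large jump in y: start a new row
    # Within each row, order left to right.
    return [sorted(row, key=lambda c: c[0]) for row in rows]


# Made-up centers forming two uneven rows.
centers = [(210, 48), (30, 55), (120, 40), (40, 160), (200, 150), (110, 171)]
rows = cluster_rows(centers, max_gap=30)
print(rows)
```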
Sorry if the title is unclear. Basically, I've written a program that tracks an object of a certain color as it moves around my webcam's FOV. As I move the object around, the computer places a red dot on the center of the object and moves the dot with the object. However, the object's location doesn't really mean anything yet. I want the frame to be divided into four equal parts, with each part outputting a different number. For example, if the object (dot) is in quadrant one, I want the number 1 to appear on the frame. How would I do this? Can anyone nudge me in the right direction? I'm using OpenCV-Python and am grateful for any help.
Here is the code I have so far.
# import the necessary packages
from collections import deque
from imutils.video import VideoStream
import numpy as np
import argparse
import cv2
import imutils
import time
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video",
    help="path to the (optional) video file")
ap.add_argument("-b", "--buffer", type=int, default=32,
    help="max buffer size")
args = vars(ap.parse_args())
# define the lower and upper boundaries of the "orange"
# fish in the HSV color space
orangeLower = (5, 50, 50)
orangeUpper = (15, 255, 255)
# initialize the list of tracked points, the frame counter,
# and the coordinate deltas
pts = deque(maxlen=args["buffer"])
counter = 0
(dX, dY) = (0, 0)
direction = ""
# if a video path was not supplied, grab the reference
# to the webcam
if not args.get("video", False):
    vs = VideoStream(src=0).start()
# otherwise, grab a reference to the video file
else:
    vs = cv2.VideoCapture(args["video"])
# allow the camera or video file to warm up
time.sleep(2.0)
# keep looping
while True:
    # grab the current frame
    frame = vs.read()
    # handle the frame from VideoCapture or VideoStream
    frame = frame[1] if args.get("video", False) else frame
    # if we are viewing a video and we did not grab a frame,
    # then we have reached the end of the video
    if frame is None:
        break
    # resize the frame, blur it, and convert it to the HSV
    # color space
    frame = imutils.resize(frame, width=600)
    blurred = cv2.GaussianBlur(frame, (11, 11), 0)
    hsv = cv2.cvtColor(blurred, cv2.COLOR_BGR2HSV)
    # construct a mask for the color "orange", then perform
    # a series of dilations and erosions to remove any small
    # blobs left in the mask
    mask = cv2.inRange(hsv, orangeLower, orangeUpper)
    mask = cv2.erode(mask, None, iterations=2)
    mask = cv2.dilate(mask, None, iterations=2)
    # find contours in the mask and initialize the current
    # (x, y) center of the ball
    cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE)
    cnts = imutils.grab_contours(cnts)
    center = None
    # only proceed if at least one contour was found
    if len(cnts) > 0:
        # find the largest contour in the mask, then use
        # it to compute the minimum enclosing circle and
        # centroid
        c = max(cnts, key=cv2.contourArea)
        ((x, y), radius) = cv2.minEnclosingCircle(c)
        M = cv2.moments(c)
        center = (int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"]))
        # only proceed if the radius meets a minimum size
        if radius > 10:
            # draw the circle and centroid on the frame,
            # then update the list of tracked points
            cv2.circle(frame, (int(x), int(y)), int(radius),
                (0, 255, 255), 2)
            cv2.circle(frame, center, 5, (0, 0, 255), -1)
            pts.appendleft(center)
    # loop over the set of tracked points
    for i in np.arange(1, len(pts)):
        # if either of the tracked points are None, ignore
        # them
        if pts[i - 1] is None or pts[i] is None:
            continue
        # check to see if enough points have been accumulated in
        # the buffer
        if counter >= 10 and i == 10 and pts[i-10] is not None:
            # compute the difference between the x and y
            # coordinates and re-initialize the direction
            # text variables
            dX = pts[i-10][0] - pts[i][0]
            dY = pts[i-10][1] - pts[i][1]
            (dirX, dirY) = ("", "")
            # ensure there is significant movement in the
            # x-direction
            if np.abs(dX) > 20:
                dirX = "East" if np.sign(dX) == 1 else "West"
            # ensure there is significant movement in the
            # y-direction
            if np.abs(dY) > 20:
                dirY = "South" if np.sign(dY) == 1 else "North"
            # handle when both directions are non-empty
            if dirX != "" and dirY != "":
                direction = "{}-{}".format(dirY, dirX)
            # otherwise, only one direction is non-empty
            else:
                direction = dirX if dirX != "" else dirY
        # otherwise, compute the thickness of the line and
        # draw the connecting lines
        thickness = int(np.sqrt(args["buffer"] / float(i + 1)) * 2.5)
        cv2.line(frame, pts[i - 1], pts[i], (0, 0, 255), thickness)
    # show the movement deltas and the direction of movement on
    # the frame
    cv2.putText(frame, direction, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
        0.65, (0, 0, 255), 3)
    cv2.putText(frame, "dx: {}, dy: {}".format(dX, dY),
        (10, frame.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX,
        0.35, (0, 0, 255), 1)
    # show the frame to the screen and increment the frame counter
    cv2.imshow("Frame", frame)
    cv2.rectangle(img=frame, pt1=(0, 0), pt2=(300, 225), color=(0, 0, 0), thickness=3, lineType=8, shift=0)
    cv2.rectangle(img=frame, pt1=(300, 1), pt2=(600, 225), color=(0, 0, 0), thickness=3, lineType=8, shift=0)
    cv2.rectangle(img=frame, pt1=(0, 225), pt2=(300, 550), color=(0, 0, 0), thickness=3, lineType=8, shift=0)
    cv2.rectangle(img=frame, pt1=(300, 225), pt2=(600, 550), color=(0, 0, 0), thickness=3, lineType=8, shift=0)
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    counter += 1
    # if the 'q' key is pressed, stop the loop
    if key == ord("q"):
        break
# if we are not using a video file, stop the camera video stream
if not args.get("video", False):
    vs.stop()
# otherwise, release the camera
else:
    vs.release()
# close all windows
cv2.destroyAllWindows()
Here is an image of the frame I get when I run the code.
As you can see, there are lines dividing the image in fourths. These rectangles are where I want the outputs to be.
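One possible way to turn the tracked center into a quadrant number is sketched below. The function name quadrant and the numbering convention (1 = top left, 2 = top right, 3 = bottom left, 4 = bottom right) are assumptions, since the question does not define which quadrant is "one".

```python
def quadrant(center, frame_w, frame_h):
    """Return 1..4 for top-left, top-right, bottom-left, bottom-right."""
    x, y = center
    col = 0 if x < frame_w / 2 else 1   # left or right half
    row = 0 if y < frame_h / 2 else 1   # top or bottom half
    return row * 2 + col + 1


# Made-up centers in a 600 x 450 frame.
print(quadrant((100, 50), 600, 450))
print(quadrant((400, 300), 600, 450))
```

Inside the tracking loop, this could be called as quadrant(center, frame.shape[1], frame.shape[0]) whenever center is not None, and the result drawn on the frame with cv2.putText.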
I'm not sure what is going on here, but when I use the findContours() function with cv2.RETR_EXTERNAL on this image:
it still seems to detect inner contours and calculates their areas in a strange way, which prevents me from filtering out the unwanted contours.
Any clue as to why that is?
Here are the original and dilated threshold images:
Here's the code so far:
import cv2
import PIL
import numpy as np
import imutils
imgAddr = "ADisplay2.jpg"
cropX = 20
cropY = 200
cropAngle = 2
CropIndex = (cropX, cropY, cropAngle)
img = cv2.imread(imgAddr)
cv2.imshow("original image",img)
(h, w) = img.shape[:2]
(cX, cY) = (w / 2, h / 2)
# rotate our image by 45 degrees
M = cv2.getRotationMatrix2D((cX, cY), -1.2, 1.0)
rotated = cv2.warpAffine(img, M, (w, h))
#cv2.imshow("Rotated by 45 Degrees", rotated)
cropedImg = rotated[300:700, 100:1500]
# grab the dimensions of the image and calculate the center of the image
#cv2.imshow("croped img", cropedImg)
grayImg = cv2.cvtColor(cropedImg, cv2.COLOR_BGR2GRAY)
#cv2.imshow("gray scale image", grayImg)
blurredImg = cv2.GaussianBlur(grayImg, (9, 9), 0)
cv2.imshow("Blurred_Img", blurredImg)
(T, threshInvImg) = cv2.threshold(blurredImg, 0, 255,
    cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)  # flags assumed; the original line is cut off here
cv2.imshow("ThresholdInvF.jpg", threshInvImg)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (7,19))
#opening = cv2.morphologyEx(threshInvImg, cv2.MORPH_OPEN, kernel)
#cv2.imshow("openingImg", opening)
dialeteImg = cv2.morphologyEx(threshInvImg, cv2.MORPH_DILATE, kernel)
cv2.imshow("erodeImg", dialeteImg)
cannyImg = cv2.Canny(dialeteImg, 100,200)
cv2.imshow("Canny_img", cannyImg)
hierarchy,cntsImg,_ = cv2.findContours(cannyImg,cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
#print("Img cnts: {}".format(cntsImg))
#print("Img hierarchy: {}".format(hierarchy))
txtOffset = (25, 50)
for cntIdx, cnt in enumerate(cntsImg):
    cntArea = cv2.contourArea(cnt)
    print("Area of contour #{} = {}".format(cntIdx, cntArea))
    (x, y, w, h) = cv2.boundingRect(cnt)
    cv2.rectangle(cropedImg, (x, y), (x + w, y + h), (0, 255, 0), 2)
    txtIdxPos = [x,y]
    txtPos = ((txtIdxPos[0] + txtOffset[0]), (txtIdxPos[1] + txtOffset[1]))
    cv2.putText(cropedImg, "#{}".format(cntIdx), txtPos, cv2.FONT_HERSHEY_SIMPLEX, 1.25, (0, 0, 255), 4)
cv2.imshow("drawCntsImg.jpg", cropedImg)
cv2.waitKey(0)
Thanks for helping :D
What you could do is to only use them if they're within a certain size. For this you could use contourArea(). It seems you already compute this anyhow.
For example:
for cntIdx, cnt in enumerate(cntsImg):
    cntArea = cv2.contourArea(cnt)
    # Skip iteration if area is too big or small to filter out non-digits
    if cntArea < 50 or cntArea > 100: continue # Need to fiddle with these values
    print("Area of contour #{} = {}".format(cntIdx, cntArea))
    (x, y, w, h) = cv2.boundingRect(cnt)
    cv2.rectangle(cropedImg, (x, y), (x + w, y + h), (0, 255, 0), 2)
    txtIdxPos = [x,y]
    txtPos = ((txtIdxPos[0] + txtOffset[0]), (txtIdxPos[1] + txtOffset[1]))
    cv2.putText(cropedImg, "#{}".format(cntIdx), txtPos, cv2.FONT_HERSHEY_SIMPLEX,1.25, (0, 0, 255), 4)
You are already printing out each contour's area. You could use that to get an idea of what sizes to let through.
If the size of the digits might vary between images, it could still be a problem. In that case you could, for example, calculate the average contour area, which should be very close to the typical digit area, and then require each contour to be within some tolerance of that average.
Note: Just remember to make the minimum area large enough to let a 1 through.
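The average-area idea could be sketched like this. The helper name near_average, the 50% tolerance and the sample areas are made up for illustration; in practice the areas would come from cv2.contourArea, and the tolerance must be wide enough to let a narrow digit such as 1 through.

```python
def near_average(areas, tolerance=0.5):
    """Return indices of areas within +/- tolerance (as a fraction) of the mean area."""
    mean_area = sum(areas) / len(areas)
    low = mean_area * (1 - tolerance)
    high = mean_area * (1 + tolerance)
    return [i for i, a in enumerate(areas) if low <= a <= high]


# Made-up contour areas: four digit-sized blobs, one speck and one large frame.
areas = [80.0, 75.0, 90.0, 5.0, 400.0, 85.0]
kept = near_average(areas)
print(kept)
```

Because large outliers pull the mean upwards, using the median instead of the mean might be more robust when many unwanted contours are present.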
If you want to rather use aspect ratio, then it's easy to change your formula, as you already calculate the height and width.
# If height is smaller than 1.5*w or larger than 2.5*w, then skip
if not 1.5 < h/w < 2.5: continue # Need to fiddle with these values
You could even use the width and height to calculate the area. The result may differ from contourArea, because it is the area of the bounding box rather than of the contour itself. For example:
cntArea = w*h
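A minimal sketch of the aspect-ratio test combined with the bounding-box area is shown below. The helper name looks_like_digit and the (w, h) pairs are made up; in the question's code they would come from cv2.boundingRect.

```python
def looks_like_digit(w, h, min_ratio=1.5, max_ratio=2.5):
    """True if the bounding box is between min_ratio and max_ratio times taller than wide."""
    if w <= 0:
        return False
    return min_ratio < h / w < max_ratio


boxes = [(20, 40), (40, 20), (10, 60), (25, 50)]  # made-up (w, h) pairs
kept = [(w, h, w * h) for w, h in boxes if looks_like_digit(w, h)]
print(kept)
```

Note that a very narrow box such as (10, 60) is rejected here, so the upper ratio would need to be raised if a thin digit like 1 must pass.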
I'm trying to apply Kalman filter with opencv in python for tracking position of a ball. I can already detect it but there is still some noise I want to eliminate. There are two variables I measure - x and y position - and there are four variables I would like to get - x and y position and x and y velocity - but I get none. When I display x0, y0, vy and vx on the screen I get "[.0]".
Another problem is that I cannot apply control matrix to kalman.predict() function because I get the following error:
OpenCV Error: Assertion failed (a_size.width == len) in gemm, file /tmp/opencv3-20170518-8732-1bjq2j7/opencv-3.2.0/modules/core/src/matmul.cpp, line 1537
Traceback (most recent call last):
File "", line 128, in <module>
kalmanout = kalman.predict(kalman.controlMatrix)
cv2.error: /tmp/opencv3-20170518-8732-1bjq2j7/opencv-3.2.0/modules/core/src/matmul.cpp:1537: error: (-215) a_size.width == len in function ge
This is the piece of code I'm using for the Kalman filter (to apply the control matrix I use the line kalmanout = kalman.predict(kalman.controlMatrix) at the end):
# import the necessary packages
from collections import deque
import numpy as np
import argparse
import imutils
import cv2
import time
# construct the argument parse and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-v", "--video",
    help="path to the (optional) video file")
ap.add_argument("-b", "--buffer", type=int, default=10,
    help="max buffer size")
ap.add_argument("-a", "--min-area", type=int, default=500, help="minimum area size")
args = vars(ap.parse_args())
# define the lower and upper boundaries of the "blue"
# ball in the HSV color space, then initialize the
# list of tracked points
greenLower = (48, 62, 88)
greenUpper = (151, 238, 255)
pts = deque(maxlen=args["buffer"])
tintervals = deque(maxlen=args["buffer"])
tPrev = 0;
pRad = 0
mapix = 0
mspeed = 0
# if a video path was not supplied, grab the reference
# to the webcam
if not args.get("video", False):
    camera = cv2.VideoCapture(0)
# otherwise, grab a reference to the video file
else:
    camera = cv2.VideoCapture(args["video"])
# initialize background subtraction
fgbg = cv2.createBackgroundSubtractorMOG2()
# keep looping
while True:
    # grab the current frame
    (grabbed, frame) = camera.read()
    displayx = 0
    # start counting time
    tPrev = time.time()
    # if we are viewing a video and we did not grab a frame,
    # then we have reached the end of the video
    if args.get("video") and not grabbed:
        break
    # resize the frame and apply background subtraction
    frame = imutils.resize(frame, width=500)
    mask = fgbg.apply(frame)
    res = cv2.bitwise_and(frame, frame, mask = mask)
    # blur the frame and convert it to the HSV
    blurred = cv2.GaussianBlur(res, (11, 11), 0)
    hsv = cv2.cvtColor(res, cv2.COLOR_BGR2HSV)
    # construct a mask for the color "blue", then perform
    # a series of dilations and erosions to remove any small
    # blobs left in the mask
    mask = cv2.inRange(hsv, greenLower, greenUpper)
    mask = cv2.erode(mask, None, iterations=2)
    mask = cv2.dilate(mask, None, iterations=2)
    # find contours in the mask and initialize the current
    # (x, y) center of the ball
    cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,
        cv2.CHAIN_APPROX_SIMPLE)[-2]
    center = None
    # only proceed if at least one contour was found
    if len(cnts) > 0:
        # find the largest contour in the mask, then use
        # it to compute the minimum enclosing circle and
        # centroid
        c = max(cnts, key=cv2.contourArea)
        ((x, y), radius) = cv2.minEnclosingCircle(c)
        pRad = radius
        M = cv2.moments(c)
        center = (int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"]))
        # only proceed if the radius meets a minimum size
        if radius > 10:
            # draw the circle and centroid on the frame,
            # then update the list of tracked points
            cv2.circle(frame, (int(x), int(y)), int(radius),
                (0, 255, 255), 2)
            cv2.circle(frame, center, 5, (0, 0, 255), -1)
    # update time intervals queue
    tintervals.appendleft(time.time() - tPrev)
    # update the points queue
    pts.appendleft(center)
    # predict position of the ball
    if (pRad > 0 and len(pts) > 5):
        if pts[0] != None and pts[1] != None:
            apix = 98.1/(0.032/pRad)
            mapix = apix
            y0 = pts[0][1]
            x0 = pts[0][0]
            kalmanin = np.array((2,1), np.float32) # measurement
            kalmanout = np.zeros((4,1), np.float32) # tracked / prediction
            kalmanin = np.array([[np.float32(x0)],[np.float32(y0)]])
            tkalman = 0.01
            kalman = cv2.KalmanFilter(4,2)
            kalman.measurementMatrix = np.array([[1,0,0,0],[0,1,0,0]],np.float32)
            kalman.transitionMatrix = np.array([[1,0,tkalman,0],[0,1,0,tkalman],[0,0,1,0],[0,0,0,1]],np.float32)
            kalman.controlMatrix = np.array([[0],[0.5*(tkalman**2.0)], [0],[tkalman]],np.float32) * mapix
            kalman.processNoiseCov = np.array([[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]],np.float32) * 0.03
            kalman.measurementNoiseCov = np.array([[1,0],[0,1]],np.float32) * 0.00009
            kalmanout = kalman.predict(kalman.controlMatrix)
            x0 = kalmanout[0]
            y0 = kalmanout[1]
            vx = kalmanout[2]
            vy = kalmanout[3]
            displayx = x0
            listX = []
            listY = []
            for i in range(1, 11):
                t = 0.01 * i
                y = y0 + vy * t + (apix * (t ** 2)) / 2
                x = x0 + vx * t
                listX.append(int(x))
                listY.append(int(y))
                mspeed = vy
            for i in range(0, 9):
                cv2.line(frame, (listX[i], listY[i]), (listX[i+1], listY[i+1]), (255, 0, 0), 4)
    # loop over the set of tracked points
    for i in range(1, len(pts)):
        # if either of the tracked points are None, ignore
        # them
        if pts[i - 1] is None or pts[i] is None:
            continue
        # otherwise, compute the thickness of the line and
        # draw the connecting lines
        thickness = int(np.sqrt(args["buffer"] / float(i + 1)) * 2.5)
        cv2.line(frame, pts[i - 1], pts[i], (0, 0, 255), thickness)
    cv2.putText(frame, "y axis speed: {}".format(displayx),
        (120, frame.shape[0] - 70), cv2.FONT_HERSHEY_SIMPLEX,
        0.5, (0, 0, 255), 1)
    cv2.putText(frame, "radius in px: {}".format(pRad),
        (120, frame.shape[0] - 30), cv2.FONT_HERSHEY_SIMPLEX,
        0.5, (0, 0, 255), 1)
    cv2.putText(frame, "apix: {}".format(mapix),
        (120, frame.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX,
        0.5, (0, 0, 255), 1)
    if (mapix != 0):
        cv2.putText(frame, "radius in meters: {}".format((9.81*pRad)/mapix),
            (120, frame.shape[0] - 50), cv2.FONT_HERSHEY_SIMPLEX,
            0.5, (0, 0, 255), 1)
    # shows x, y position, (newest input from pts)
    cv2.putText(frame, "x, y: {}".format(pts[0]),
        (10, frame.shape[0] - 10), cv2.FONT_HERSHEY_SIMPLEX,
        0.35, (0, 0, 255), 1)
    # show the frame to our screen
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    # if the 'q' key is pressed, stop the loop
    if key == ord("q"):
        break
# cleanup the camera and close any open windows
camera.release()
cv2.destroyAllWindows()
First of all I would move the initialization of the Kalman filter outside the loop. The main issue with your code is that you have set the control matrix. If I understand your task you are only observing the system, not controlling it. Just skip the kalman.controlMatrix initialization or set it to a zero matrix. In the loop you then just use
kalmanout = kalman.predict()
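To illustrate why the filter state has to live outside the loop, here is a minimal sketch of the same constant-velocity model written in plain NumPy rather than with cv2.KalmanFilter. It is an illustration under assumptions, not the asker's code: time is measured in frames (dt = 1.0), the measurements are made up, and the noise values are borrowed from the question. With OpenCV's class, the two halves of the loop correspond to kalman.predict() and kalman.correct(measurement). A filter that is re-created on every frame and never corrected starts from a zero state each time.

```python
import numpy as np

dt = 1.0  # assumed time step: one frame
A = np.array([[1, 0, dt, 0],
              [0, 1, 0, dt],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)   # constant-velocity transition
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)   # only x and y are measured
Q = np.eye(4) * 0.03                        # process noise covariance
R = np.eye(2) * 0.00009                     # measurement noise covariance

# Created once, before the loop, so the estimate carries over between frames.
state = np.zeros((4, 1))
P = np.eye(4)

for k in range(1, 61):
    # Made-up measurement: the object moves 2 px/frame in x and 1 px/frame in y.
    z = np.array([[2.0 * k], [1.0 * k]])
    # Predict: no control term, because the object is only observed.
    state = A @ state
    P = A @ P @ A.T + Q
    # Correct: blend the prediction with the new measurement.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    state = state + K @ (z - H @ state)
    P = (np.eye(4) - K @ H) @ P

x0, y0, vx, vy = state.ravel()
print(x0, y0, vx, vy)
```

After enough frames the velocity entries settle near the true per-frame displacement even though only positions were ever measured.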
I'm trying to detect whether the user's eyes are open or closed in a live video, using haar cascade algorithm in python.
Unfortunately, it doesn't work well.
As I understand it, "haarcascade_eye.xml" is used to detect open eyes and "haarcascade_lefteye_2splits" is used to detect an eye (closed or open).
I wanted to compare the open eyes against the eyes in general in the video, but it falsely recognizes closed eyes. Are there other or better ways to improve it?
Here is my code:
import numpy as np
import cv2
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('haarcascade_eye.xml')
lefteye_cascade = cv2.CascadeClassifier('haarcascade_lefteye_2splits.xml')
cap = cv2.VideoCapture(0)
while True:
    ret, img = cap.read()
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (255, 0, 0), 2)
        # regions of interest
        roi_gray = gray[y:y + h, (x + w) // 2:x + w]
        roi_color = img[y:y + h, (x + w) // 2:x + w]
        eye = 0
        openEye = 0
        counter = 0
        openEyes = eye_cascade.detectMultiScale(roi_gray)
        AllEyes = lefteye_cascade.detectMultiScale(roi_gray)
        for (ex, ey, ew, eh) in openEyes:
            openEye += 1
            cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0),2)
        for (ex, ey, ew, eh) in AllEyes:
            eye += 1
            cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 0, 40),2)
        if (openEye != eye):
            print ('alert')
    cv2.imshow('img', img)
    k = cv2.waitKey(30) & 0xff
    if k == 27:
        break
cap.release()
cv2.destroyAllWindows()
Eventually I used the dlib library for recognizing facial landmarks :)
Check this out: it gives the eye status. Change the threshold according to the lighting conditions.
resource :
# import the necessary packages
from scipy.spatial import distance as dist
from imutils.video import FileVideoStream
from imutils.video import VideoStream
from imutils import face_utils
import numpy as np
import argparse
import imutils
import time
import dlib
import cv2
def eye_aspect_ratio(eye):
# compute the euclidean distances between the two sets of
# vertical eye landmarks (x, y)-coordinates
A = dist.euclidean(eye[1], eye[5])
B = dist.euclidean(eye[2], eye[4])
# compute the euclidean distance between the horizontal
# eye landmark (x, y)-coordinates
C = dist.euclidean(eye[0], eye[3])
# compute the eye aspect ratio
ear = (A + B) / (2.0 * C)
# return the eye aspect ratio
return ear
# eye aspect ratio below which the eye is treated as closed
# (0.3 is an assumed starting value; the original value is not shown, so tune it)
EYE_AR_THRESH = 0.3
# initialize dlib's face detector (HOG-based) and then create
# the facial landmark predictor
print("[INFO] loading facial landmark predictor...")
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
# grab the indexes of the facial landmarks for the left and
# right eye, respectively
(lStart, lEnd) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(rStart, rEnd) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]
vs = VideoStream(src=0).start()
# vs = VideoStream(usePiCamera=True).start()
# loop over frames from the video stream
while True:
    # grab the frame from the video stream, resize
    # it, and convert it to grayscale
    frame = vs.read()
    frame = imutils.resize(frame, width=450)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # detect faces in the grayscale frame
    rects = detector(gray, 0)
    # loop over the face detections
    for rect in rects:
        # determine the facial landmarks for the face region, then
        # convert the facial landmark (x, y)-coordinates to a NumPy
        # array
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        # extract the left and right eye coordinates, then use the
        # coordinates to compute the eye aspect ratio for both eyes
        leftEye = shape[lStart:lEnd]
        rightEye = shape[rStart:rEnd]
        leftEAR = eye_aspect_ratio(leftEye)
        rightEAR = eye_aspect_ratio(rightEye)
        # average the eye aspect ratio together for both eyes
        ear = (leftEAR + rightEAR) / 2.0
        # compute the convex hull for the left and right eye, then
        # visualize each of the eyes
        leftEyeHull = cv2.convexHull(leftEye)
        rightEyeHull = cv2.convexHull(rightEye)
        cv2.drawContours(frame, [leftEyeHull], -1, (0, 255, 0), 1)
        cv2.drawContours(frame, [rightEyeHull], -1, (0, 255, 0), 1)
        # check to see if the eye aspect ratio is below the
        # threshold, and if so, report the eye as closed
        if ear < EYE_AR_THRESH:
            cv2.putText(frame, "Eye: {}".format("close"), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
            cv2.putText(frame, "EAR: {:.2f}".format(ear), (300, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        # otherwise, the eye aspect ratio is not below the
        # threshold
        else:
            cv2.putText(frame, "Eye: {}".format("Open"), (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
            cv2.putText(frame, "EAR: {:.2f}".format(ear), (300, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
    # show the frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break
# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()
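To get a feel for how the eye aspect ratio behaves, here is a small worked example using only the standard library (math.dist, available from Python 3.8) instead of SciPy. The six landmark coordinates are made up, ordered the same way as in the eye_aspect_ratio function above; they only show that the ratio drops sharply when the eyelids come together.

```python
from math import dist


def eye_aspect_ratio(eye):
    """EAR from six (x, y) landmarks: two vertical distances over the horizontal one."""
    a = dist(eye[1], eye[5])
    b = dist(eye[2], eye[4])
    c = dist(eye[0], eye[3])
    return (a + b) / (2.0 * c)


# Made-up landmark sets: same eye width, different eyelid separation.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
closed_eye = [(0, 0), (1, 0.1), (2, 0.1), (3, 0), (2, -0.1), (1, -0.1)]

ear_open = eye_aspect_ratio(open_eye)
ear_closed = eye_aspect_ratio(closed_eye)
print(round(ear_open, 3), round(ear_closed, 3))
```

A threshold somewhere between the two values separates the states; where exactly depends on the camera, the face and the lighting.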
# Import the necessary packages
import datetime as dt
import matplotlib.pyplot as plt
import matplotlib.animation as animation
from EAR_calculator import *
from imutils import face_utils
from imutils.video import VideoStream
import matplotlib.animation as animate
from matplotlib import style
import imutils
import dlib
import time
import cv2
from playsound import playsound
from scipy.spatial import distance as dist
import os
import csv
import numpy as np
import pandas as pd
from datetime import datetime
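# Declare a constant which will work as the threshold for the EAR value, below which it will be regarded as a blink
EAR_THRESHOLD = 0.3  # assumed value; the original is not shown
# Declare another constant to hold the consecutive number of frames to consider for a blink
CONSECUTIVE_FRAMES = 20  # assumed value; the original is not shown
# Another constant which will work as a threshold for the MAR value
MAR_THRESHOLD = 14  # assumed value; the original is not shown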
# Now, intialize the dlib's face detector model as 'detector' and the landmark predictor model as 'predictor'
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
# Grab the indexes of the facial landamarks for the left and right eye respectively
(lstart, lend) = face_utils.FACIAL_LANDMARKS_IDXS["left_eye"]
(rstart, rend) = face_utils.FACIAL_LANDMARKS_IDXS["right_eye"]
(mstart, mend) = face_utils.FACIAL_LANDMARKS_IDXS["mouth"]
image = cv2.imread("images/raja_sleepy.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Detect faces
rects = detector(image, 1)
if len(rects) == 1:
    for (i, rect) in enumerate(rects):
        shape = predictor(gray, rect)
        # Convert it to a (68, 2) size numpy array
        shape = face_utils.shape_to_np(shape)
        # Draw a rectangle over the detected face
        (x, y, w, h) = face_utils.rect_to_bb(rect)
        cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
        # Extract the eye and mouth landmarks
        leftEye = shape[lstart:lend]
        rightEye = shape[rstart:rend]
        mouth = shape[mstart:mend]
        # Compute the EAR for both the eyes
        leftEAR = eye_aspect_ratio(leftEye)
        rightEAR = eye_aspect_ratio(rightEye)
        # Take the average of both the EAR
        EAR = (leftEAR + rightEAR) / 2.0
        # Compute the convex hull for both the eyes and then visualize it
        leftEyeHull = cv2.convexHull(leftEye)
        rightEyeHull = cv2.convexHull(rightEye)
        # Draw the contours
        cv2.drawContours(image, [leftEyeHull], -1, (0, 255, 0), 1)
        cv2.drawContours(image, [rightEyeHull], -1, (0, 255, 0), 1)
        cv2.drawContours(image, [mouth], -1, (0, 255, 0), 1)
        MAR = mouth_aspect_ratio(mouth)
        # Check if EAR < EAR_THRESHOLD; if so, the eyes are closed
        if EAR < EAR_THRESHOLD:
            cv2.drawContours(image, [leftEyeHull], -1, (0, 0, 255), 1)
            cv2.drawContours(image, [rightEyeHull], -1, (0, 0, 255), 1)
            cv2.putText(image, "Sleepy", (270, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
        # Check if the person is yawning
        if MAR > MAR_THRESHOLD:
            cv2.drawContours(image, [mouth], -1, (0, 0, 255), 1)
            cv2.putText(image, "Yawn ", (270, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 255), 2)
    cv2.imshow("Output", image)
    cv2.waitKey(0)
elif len(rects) == 0:
    print("Face not available")
else:
    print("Multiple face detected")