Related
I have an image processing problem that I can't solve. I have a set of 375 images like the one below (1). I'm trying to remove the background, so to make "background substraction" (or "foreground extraction") and get only the waste on a plain background (black/white/...).
(1) Image example
I tried many things, including createBackgroundSubtractorMOG2 from OpenCV, or threshold. I also tried to remove the background pixel by pixel by subtracting it from the foreground because I have a set of 237 background images (2) (the carpet without the waste, but which is a little bit offset from the image with the objects). There are also variations in brightness on the background images.
(2) Example of a background image
Here is a code example that I was able to test and that gives me the results below (3) and (4). I use Python 3.8.3.
# Function to remove the sides of the images
def delete_side(img, x_left, x_right):
for i in range(img.shape[0]):
for j in range(img.shape[1]):
if j<=x_left or j>=x_right:
img[i,j] = (0,0,0)
return img
# Intialize the background model
backSub = cv2.createBackgroundSubtractorMOG2(history=250, varThreshold=2, detectShadows=True)
# Read the frames and update the background model
for frame in frames:
if frame.endswith(".png"):
filepath = FRAMES_FOLDER + '/' + frame
img = cv2.imread(filepath)
img_cut = delete_side(img, x_left=190, x_right=1280)
gray = cv2.cvtColor(img_cut, cv2.COLOR_BGR2GRAY)
mask = backSub.apply(gray)
newimage = cv2.bitwise_or(img, img, mask=mask)
img_blurred = cv2.GaussianBlur(newimage, (5, 5), 0)
gray2 = cv2.cvtColor(img_blurred, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray2, 10, 255, cv2.THRESH_BINARY)
final = cv2.bitwise_or(img, img, mask=binary)
newpath = RESULT_FOLDER + '/' + frame
cv2.imwrite(newpath, final)
I was inspired by many other cases found on Stackoverflow or others (example: removing pixels less than n size(noise) in an image - open CV python).
(3) The result obtained with the code above
(4) Result when increasing the varThreshold argument to 10
Unfortunately, there is still a lot of noise on the resulting pictures.
As a beginner in "background substraction", I don't have all the keys to get an optimal solution. If someone would have an idea to do this task in a more efficient and clean way (Is there a special method to handle the case of transparent objects? Can noise on objects be eliminated more effectively? etc.), I'm interested :)
Thanks
Thanks for your answers. For information, I simply change of methodology and use a segmentation model (U-Net) with 2 labels (foreground, background), to identify the background. It works quite well.
My task is to detect an object in a given image using OpenCV (I do not care whether it is the Python or C++ implementation). The object, shown below in three examples, is a black rectangle with five white rectagles within. All dimensions are known.
However, the rotation, scale, distance, perspective, lighting conditions, camera focus/lens, and background of the image are not known. The edge of the black rectangle is not guaranteed to be fully visible, however there will not be anything in front of the five white rectangles ever - they will always be fully visible. The end goal is to be able to detect the presence of this object within an image, and rotate, scale, and crop to show the object with the perspective removed. I am fairly confident that I can adjust the image to crop to just the object, given its four corners. However I am not so confident that I can reliably find those four corners. In ambiguous cases, not finding the object is preferred to misidentifying some other feature of the image as the object.
Using OpenCV I have come up with the following methods, however I feel I might be missing something obvious. Are there any more methods available, or is one of these the optimal solution?
Edge based outline
First idea was to look for the outside edge of the object.
Using Canny edge detection (after scaling to known size, grayscaling and gaussian blurring), finding a contour which best matches the outer shape of the object.
This deals with perspective, colour, size issues, but fails when there is a complicated background for example, or if there is something of similar shape to the object elsewhere in the image. Maybe this could be improved by a better set of rules for finding the correct contour - perhaps involving the five white rectangles as well as the outer edge.
Feature detection
The next idea was to match to a known template using feature detecting.
Using ORB feature detecting, descriptor matching and homography (from this tutorial) fails, I believe because the features it is detecting are very similar to other features within the object (lots of coreners which are precisely one-quarter white and three-quarters black). However, I do like the idea of matching to a known template - this idea makes sense to me. I suppose though that because the object is quite basic geometrically, it's likely to find a lot of false positives in the feature matching step.
Parallel Lines
Using Houghlines or HoughLinesP, looking for evenly spaced parallel lines. Have just started down this road so need to investigate the best methods for thresholding etc. While it looks messy for images with complex backgrounds, I think it may work well as I can rely on the fact that the white rectangles within the black object should always be high contrast, giving a good indication of where the lines are.
'Barcode Scan'
My final idea is to scan the image by line, looking for the white to black pattern.
I have not started this method, but the idea is to take a strip of the image (at some angle), convert to HSV colour space, and look for the regular black-to-white pattern appearing five times sequentially in the Value column. This idea sounds promising to me, as I believe it should ignore many of the unknown variables.
Thoughts
I have looked at a number of OpenCV tutorials, as well as SO questions such as this one, however because my object is quite geometrically simple I am having issues implementing the ideas given.
I feel like this is an achievable task, however my struggle is knowing which method to pursue further. I have experimented with the first two ideas quite a bit, and while I haven't achieved anything very reliable, maybe there is something I am missing. Is there a standard way of achieving this task which I have not thought of, or is one of my suggested methods the most sensible?
EDIT: Once the corners are found using one of the above methods (or some other method), I am thinking of using Hu Moments or OpenCV's matchShapes() function to remove any false positives.
EDIT2: Added some more input image examples as requested by #Timo
Orig1
Orig2
Orig3
Extra image 1
Extra image 2
Extra image 3
Extra image 4
I had some time looking into the problem and made a little python script. I'm detecting the white rectangles inside your shape. Paste the code into a .py file and copy all input images in an input subfolder. The final result of the image is just a dummy atm and the script isn't complete yet. I'll try to continue it in the next couple of days. The script will create a debug subfolder where it'll save some images that show the current detection state.
import numpy as np
import cv2
import os
INPUT_DIR = 'input'
DEBUG_DIR = 'debug'
OUTPUT_DIR = 'output'
IMG_TARGET_SIZE = 1000
# each algorithm must return a rotated rect and a confidence value [0..1]: (((x, y), (w, h), angle), confidence)
def main():
# a list of all used algorithms
algorithms = [rectangle_detection]
# load and prepare images
files = list(os.listdir(INPUT_DIR))
images = [cv2.imread(os.path.join(INPUT_DIR, f), cv2.IMREAD_GRAYSCALE) for f in files]
images = [scale_image(img) for img in images]
for img, filename in zip(images, files):
results = [alg(img, filename) for alg in algorithms]
roi, confidence = merge_results(results)
display = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
display = cv2.drawContours(display, [cv2.boxPoints(roi).astype('int32')], -1, (0, 230, 0))
cv2.imshow('img', display)
cv2.waitKey()
def merge_results(results):
'''Merges all results into a single result.'''
return max(results, key=lambda x: x[1])
def scale_image(img):
'''Scales the image so that the biggest side is IMG_TARGET_SIZE.'''
scale = IMG_TARGET_SIZE / np.max(img.shape)
return cv2.resize(img, (0,0), fx=scale, fy=scale)
def rectangle_detection(img, filename):
debug_img = cv2.cvtColor(img, cv2.COLOR_GRAY2BGR)
_, binarized = cv2.threshold(img, 50, 255, cv2.THRESH_BINARY)
contours, _ = cv2.findContours(binarized, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
# detect all rectangles
rois = []
for contour in contours:
if len(contour) < 4:
continue
cont_area = cv2.contourArea(contour)
if not 1000 < cont_area < 15000: # roughly filter by the volume of the detected rectangles
continue
cont_perimeter = cv2.arcLength(contour, True)
(x, y), (w, h), angle = rect = cv2.minAreaRect(contour)
rect_area = w * h
if cont_area / rect_area < 0.8: # check the 'rectangularity'
continue
rois.append(rect)
# save intermediate results in the debug folder
rois_img = cv2.drawContours(debug_img, contours, -1, (0, 0, 230))
rois_img = cv2.drawContours(rois_img, [cv2.boxPoints(rect).astype('int32') for rect in rois], -1, (0, 230, 0))
save_dbg_img(rois_img, 'rectangle_detection', filename, 1)
# todo: detect pattern
return rois[0], 1.0 # dummy values
def save_dbg_img(img, folder, filename, index=0):
'''Writes the given image to DEBUG_DIR/folder/filename_index.png.'''
folder = os.path.join(DEBUG_DIR, folder)
if not os.path.exists(folder):
os.makedirs(folder)
cv2.imwrite(os.path.join(folder, '{}_{:02}.png'.format(os.path.splitext(filename)[0], index)), img)
if __name__ == "__main__":
main()
Here is an example image of the current WIP
The next step is to detect the pattern / relation between mutliple rectangles. I'll update this answer when I make progress.
I've been working on a 'fun' solution that combines multiple videos from a security camera into one video, the idea being to overlay foreground motion over many hours into a few seconds for a 'quick preview'. Thanks to Stephen and Bahramdum for getting me on the right path.
The project is open source, for anyone to see. So far, I've played with the following background extractions:
OpenCV BackgroundSubtraction using a variety of algorithms
Mask R-CNN
Yolov3/TinyYolo
Optical flow
(I haven't yet tried detection+centroid tracking, will do so next)
Based on my experiments so far, OpenCV's background extraction generally works the best due to the fact that it extracts foreground purely based on motion. Plus its very fast. True, that it also extracts things like moving leaves etc, but we can work on removing those.
Here is an example of 3 hours of video blended into one short video.
https://youtu.be/C-mJfzvFTdg
My current challenge is that it is doing a bad job with shiny objects, like cars.
Here is an example:
Background subtraction consistently does a bad job with extracting polygons for shiny objects and findContours does no better either.
I've tried several options, but my current approach is documented here, the gist of which is:
Convert frame to HSV
remove intensity (I read this in another SO thread for shiny objects)
Apply background subtraction
Clean up outside noise with MORPH_OPEN
Blur mask to hopefully connect near while blobs
find contours on new masks
only keep contours of certain min area
create a new mask, where we draw only these contours with fill
Do a final dilation to connect close filled contours of new masks
10.Use this new mask to extract foreground from the frame and overlay it with current blended video
Would anyone have suggestions on how to improve extraction for reflective objects?
self.fgbg = cv2.createBackgroundSubtractorMOG2(detectShadows=False,
history=self.history)
frame_hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
frame_hsv[:,:,0] = 0 # see if removing intensity helps
# gray = cv2.cvtColor(frame_hsv, cv2.COLOR_BGR2GRAY)
# create initial background subtraction
frame_mask = self.fgbg.apply(frame_hsv)
# remove noise
frame_mask = cv2.morphologyEx(frame_mask, cv2.MORPH_OPEN, self.kernel_clean)
# blur to merge nearby masks, hopefully
frame_mask = cv2.medianBlur(frame_mask,15)
#frame_mask = cv2.GaussianBlur(frame_mask,(5,5),cv2.BORDER_DEFAULT)
#frame_mask = cv2.blur(frame_mask,(20,20))
h,w,_ = frame.shape
new_frame_mask = np.zeros((h,w),dtype=np.uint8)
copy_frame_mask = frame_mask.copy()
# find contours of mask
relevant = False
ctrs,_ = cv2.findContours(copy_frame_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
rects = []
# only select relevant contours
for contour in ctrs:
area = cv2.contourArea(contour)
if area >= self.min_blend_area:
x,y,w,h = cv2.boundingRect(contour)
pts = Polygon([[x,y], [x+w,y], [x+w, y+h], [x,y+h]])
if g.poly_mask is None or g.poly_mask.intersects(pts):
relevant = True
cv2.drawContours(new_frame_mask, [contour], -1, (255, 255, 255), -1)
rects.append([x,y,w,h])
# do a dilation to again, combine the contours
frame_mask = cv2.dilate(new_frame_mask,self.kernel_fill,iterations = 5)
frame_mask = new_frame_mask
I found way too many variations on different conditions to have something predictable using openCV's background extraction
So I switched to yolov3 for object identification (added it as a new option, actually). While TinyYOLO was pretty terrible, YoloV3 seems to be adequate, albeit much slower.
Unfortunately, given YoloV3 only does rectangle and not subject masks, I had to switch to addWeighted method of cv2 to blend a new object on top of the blended frame.
Example: https://github.com/pliablepixels/zmMagik/blob/master/sample/1.gif
I have found a plethora of questions regarding finding "things" in images using openCV, et al. in Python but so far I have been unable to piece them together for a reliable solution to my problem.
I am attempting to use computer vision to help count tiny surface mount electronics parts. The idea is for me to dump parts onto a solid color piece of paper, snap a picture, and have the software tell me how many items are in it.
The "things" differ from one picture to the next but will always be identical in any one image. I seem to be able to manually tune the parameters for things like hue/saturation for a particular part but it tends to require tweaking every time I change to a new part.
My current, semi-functioning code is posted below:
import imutils
import numpy
import cv2
import sys
def part_area(contours, round=10):
"""Finds the mode of the contour area. The idea is that most of the parts in an image will be separated and that
finding the most common area in the list of areas should provide a reasonable value to approximate by. The areas
are rounded to the nearest multiple of 200 to reduce the list of options."""
# Start with a list of all of the areas for the provided contours.
areas = [cv2.contourArea(contour) for contour in contours]
# Determine a threshold for the minimum amount of area as 1% of the overall range.
threshold = (max(areas) - min(areas)) / 100
# Trim the list of areas down to only those that exceed the threshold.
thresholded = [area for area in areas if area > threshold]
# Round the areas to the nearest value set by the round argument.
rounded = [int((area + (round / 2)) / round) * round for area in thresholded]
# Remove any areas that rounded down to zero.
cleaned = [area for area in rounded if area != 0]
# Count the areas with the same values.
counts = {}
for area in cleaned:
if area not in counts:
counts[area] = 0
counts[area] += 1
# Reduce the areas down to only those that are in groups of three or more with the same area.
above = []
for area, count in counts.iteritems():
if count > 2:
for _ in range(count):
above.append(area)
# Take the mean of the areas as the average part size.
average = sum(above) / len(above)
return average
def find_hue_mode(hsv):
"""Given an HSV image as an input, compute the mode of the list of hue values to find the most common hue in the
image. This is used to determine the center for the background color filter."""
pixels = {}
for row in hsv:
for pixel in row:
hue = pixel[0]
if hue not in pixels:
pixels[hue] = 0
pixels[hue] += 1
counts = sorted(pixels.keys(), key=lambda key: pixels[key], reverse=True)
return counts[0]
if __name__ == "__main__":
# load the image and resize it to a smaller factor so that the shapes can be approximated better
image = cv2.imread(sys.argv[1])
# define range of blue color in HSV
hsv = cv2.cvtColor(image, cv2.COLOR_BGR2HSV)
center = find_hue_mode(hsv)
print 'Center Hue:', center
lower = numpy.array([center - 10, 50, 50])
upper = numpy.array([center + 10, 255, 255])
# Threshold the HSV image to get only blue colors
mask = cv2.inRange(hsv, lower, upper)
inverted = cv2.bitwise_not(mask)
blurred = cv2.GaussianBlur(inverted, (5, 5), 0)
edged = cv2.Canny(blurred, 50, 100)
dilated = cv2.dilate(edged, None, iterations=1)
eroded = cv2.erode(dilated, None, iterations=1)
# find contours in the thresholded image and initialize the shape detector
contours = cv2.findContours(eroded.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if imutils.is_cv2() else contours[1]
# Compute the area for a single part to use when setting the threshold and calculating the number of parts within
# a contour area.
part_area = part_area(contours)
# The threshold for a part's area - can't be too much smaller than the part itself.
threshold = part_area * 0.5
part_count = 0
for contour in contours:
if cv2.contourArea(contour) < threshold:
continue
# Sometimes parts are close enough together that they become one in the image. To battle this, the total area
# of the contour is divided by the area of a part (derived earlier).
part_count += int((cv2.contourArea(contour) / part_area) + 0.1) # this 0.1 "rounds up" slightly and was determined empirically
# Draw an approximate contour around each detected part to give the user an idea of what the tool has computed.
epsilon = 0.1 * cv2.arcLength(contour, True)
approx = cv2.approxPolyDP(contour, epsilon, True)
cv2.drawContours(image, [approx], -1, (0, 255, 0), 2)
# Print the part count and show off the processed image.
print 'Part Count:', part_count
cv2.imshow("Image", image)
cv2.waitKey(0)
Here's an example of the type of input image I am using:
or this:
And I'm currently getting results like this:
The results clearly show that the script is having trouble identifying some parts and it's true Achilles heel seems to be when parts touch one another.
So my question/challenge is, what can I do to improve the reliability of this script?
The script is to be integrated into an existing Python tool so I am searching for a solution using Python. The solution does not need to be pure Python as I am willing to install whatever 3rd party libraries might be needed.
If the objects are all of similar types, you might have more success isolating a single example in the image and then using feature matching to detect them.
A full solution would be out of scope for Stack Overflow, but my suggestion for progress would be to first somehow find one or more "correct" examples using your current rectangle retrieval method. You could probably look for all your samples that are of the expected size, or that are accurate rectangles.
Once you have isolated a few positive examples, use some feature matching techniques to find the others. There is a lot of reading up you probably need to do on it but that is a potential solution.
A general summary is that you use your positive examples to find "features" of the object you want to detect. These "features" are generally things like corners or changes in gradient. OpenCV contains many methods you can use.
Once you have the features, there are several algorithms in OpenCV you can look at that will search the image for all matching features. You’ll want one that is rotation invariant (can detect the same features arranged in different rotation), but you probably don’t need scale invariance (can detect the same features at multiple scales).
My one concern with this method is that the items you are searching for in your images are quite small. It might be difficult to find good, consistent features to match on.
You're tackling a 2D object recognition problem, for which there are many possible approaches. You've gone about it using background/foreground segmentation, which is ok as you have control on the scene (laying down the background paper sheet). However this will always have fundamental limitations when the objects touch. A simple solution to your problem can be this:
1) You assume that touching objects are rare events (which is a fine assumption in your problem). Therefore you can compute the areas for each segmented region, and compute the median of these, which will give a robust estimate for the object's area. Let's call this robust estimate A (in squared pixels). This will be fine if fewer than 50% of regions correspond to touching objects.
2) You then proceed to measure the number of objects in each segmented region. Let Ai be the area of the ith region. You then compute the number of objects in each region by Ni=round(Ai/A). You then sum Ni to give you the total number of objects.
This approach will be fine as long as the following conditions are met:
A) The touching objects do not significantly overlap
B) You do not have objects lying on their sides. If you do you might be able to deal with this using two area estimates (side and flat). Better to eliminate this scenario if you can for simplicity.
C) The objects are all roughly the same distance to the camera. If this is not the case then the areas of the objects (in pixels) cannot be modelled well by a single value.
D) There are not partially visible objects at the borders of the image.
E) You ensure that only the same type of object is visible in each image.
I am using a microscope to observe the motion of small 4 micron beads. I have a video of the beads moving and I would like to process the video to extract the bead locations as a function of time to get a mathematical model of their motion.
I am currently using opencv and programming in python
My code was importing a video from file, thresholding the image then applying a HoughCircles transform to find the spherical beads.
import numpy as np
import cv2
def nothing(x):
pass
cap = cv2.VideoCapture('testvideo.mp4')
cv2.namedWindow('trackbar')
cv2.createTrackbar('Param1','trackbar',40,255,nothing)
cv2.createTrackbar('Param2','trackbar',10,255,nothing)
cv2.createTrackbar('MaxRadius','trackbar',18,255,nothing)
while(cap.isOpened()):
e1 = cv2.getTickCount()
ret, frame = cap.read()
#get grayscale image for HoughCircles
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
p1 = cv2.getTrackbarPos('Param1','trackbar')
p2 = cv2.getTrackbarPos('Param2','trackbar')
rMax = cv2.getTrackbarPos('MaxRadius','trackbar')
#Threshold grayscale image
ret,th1 = cv2.threshold(gray,127,255,cv2.THRESH_BINARY)
#Find circles in image and store locations to circles list
circles = cv2.HoughCircles(th1, cv2.cv.CV_HOUGH_GRADIENT, 1, 10,
param1=p1, param2=p2,minRadius=0,
maxRadius=rMax)
#Hack for fixing the list if it is empty so program wont crash
if circles == None:
circles = [[[0,0,0.000],[0,0,0.000]]]
#convert circles list to integer list
circles = np.uint16(np.around(circles))
#store points to a file
datafile = file('datafile.txt', 'a')
np.savetxt(datafile, circles[0], fmt ='%i',delimiter=',', newline = ',')
datafile.write('\n')
datafile.close()
for i in circles[0,:]:
# draw the outer circle
cv2.circle(frame,(i[0],i[1]),i[2],(0,255,0),2)
# draw the center of the circle
cv2.circle(frame,(i[0],i[1]),2,(0,0,255),3)
cv2.imshow('detected circles',frame)
cv2.imshow('threshold video',th1)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
e2 = cv2.getTickCount()
time = (e2 - e1)/ cv2.getTickFrequency()
print time
cap.release()
cv2.destroyAllWindows()
Here is a sample frame from the video I am using for detecting the beads.
http://imgur.com/bVZbH3c
Here is an example of the single frame tracking that I did with the previous algorithm
http://imgur.com/4VWJI2F
I don't need to detect every single bead. Just an aggregate would be fine.
The beads are spherical and should look the same, so is there a library that I can use to integrate the bead image over the entire image and see where the points are most correlated in the image? Sometimes the beads are out of focus and that's why the current program I have keeps bouncing around and giving me false positives for the beads.
I eventually need this process to happen realtime so if possible it would be nice to have the algorithm be as efficient as possible.
If anyone knows a good approach to this type of problem it would be appreciated.