I've tried using the findChessboardCorners function in open CV python. But it's not working.
These are the images I'm trying to get it to detect these images.
board.jpg:
board2.jpg:
I want it to be able to detect where the squares are and if a piece is on it.
So far I've tried
import cv2 as cv
import numpy as np
def rescaleFrame(frame, scale=0.75):
#rescale image
width = int(frame.shape[1] * scale)
height = int(frame.shape[0] * scale)
dimensions = (width,height)
return cv.resize(frame, dimensions, interpolation=cv.INTER_AREA)
img = cv.imread("board2.jpg")
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
ret, corners = cv.findChessboardCorners(gray, (8,8),None)
if ret == True:
# Draw and display the corners
img = cv.drawChessboardCorners(img, (8,8), corners,ret)
img=rescaleFrame(img)
cv.imshow("board",img)
v.waitKey(0)
I was expect it to work like how this tutorial shows
The function findChessboardCorners is used to calibrate cameras using a black-and-white chessboard pattern. As far as I know, is not designed to detect the corners of a chess board with chess pieces on it.
This site shows an example of calibration "chess boards." And this site shows how these calibration chess boards are used, this example uses the ROS Library.
You can still use OpenCV but will need to try other functions. Assuming you took the photos yourself, you've also made the problem harder on yourself by using a background that has a lot of lines and corners, meaning you'll have to differentiate between those corners and corners on the board. You can also see that the top corners of the board behind the rooks are occluded. If you can retake the photos, I would take a top-down photo and do it on a blank surface that contrasts with the chessboard.
One example of corner detection in OpenCV is Harris corner detection. I wrote up a short example for you. You'll need to play around with this and other corner detection methods to see what works best. I found that adding a sobel filter to strength the lines in your image gave much better results. But it's still going to detect corners in the background and the corners on the pieces. You'll need to figure out how to filter those out.
import cv2 as cv
from matplotlib import pyplot as plt
import numpy as np
def sobel(src_image, kernel_size):
grad_x = cv.Sobel(src_image, cv.CV_16S, 1, 0, ksize=kernel_size, scale=1,
delta=0, borderType=cv.BORDER_DEFAULT)
grad_y = cv.Sobel(src_image, cv.CV_16S, 0, 1, ksize=kernel_size, scale=1,
delta=0, borderType=cv.BORDER_DEFAULT)
abs_grad_x = cv.convertScaleAbs(grad_x)
abs_grad_y = cv.convertScaleAbs(grad_y)
grad = cv.addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0)
return grad
def process_image(src_image_path):
# load the image
src_image = cv.imread(src_image_path)
# convert to RGB (otherwise when you display this image the colors will look incorrect)
src_image = cv.cvtColor(src_image, cv.COLOR_BGR2RGB)
# convert to grayscale before attempting corner detection
src_gray = cv.cvtColor(src_image, cv.COLOR_BGR2GRAY)
# standard technique to eliminate noise
blur_image = cv.blur(src_gray,(3,3))
# strengthen the appearance of lines in the image
sobel_image = sobel(blur_image, 3)
# detect corners
corners = cv.cornerHarris(sobel_image, 2, 3, 0.04)
# for visualization to make corners easier to see
corners = cv.dilate(corners, None)
# overlay on a copy of the source image
dest_image = np.copy(src_image)
dest_image[corners>0.01*corners.max()]=[0,0,255]
return dest_image
src_image_path = "board1.jpg"
dest_image = process_image(src_image_path)
plt.imshow(dest_image)
plt.show()
Related
I work at a studio that does school photos and we are trying to make a script to eliminate the job of cropping each photo to a template. The photos we work with are fairly uniform but they vary in resolution and head position a bit. I took up the mantle of trying to write the script with my fairly limited Python knowledge and through a lot of trial and error and online resources I think I have got most of the way there.
At the moment I am trying to figure out the best way to have the image crop from the NumPy array with the head where I want and I just cant find a good flexible solution. The head needs to be positioned slightly differently for pose 1 and pose 2 so its needs to be easy to change on the fly (Probably going to implement some sort of simple GUI to input stuff like that, but for now I can just change the code).
I also need to be able to change the output resolution of the photo so they are all uniform (2000x2500). Anyone have any ideas?
At the moment this is my current code, it just saves the detected face square:
import cv2
import os.path
import glob
# Cascade path
cascPath = 'haarcascade_frontalface_default.xml'
# Create the haar cascade
faceCascade = cv2.CascadeClassifier(cascPath)
#Check for output folder and create if its not there
if not os.path.exists('output'):
os.makedirs('output')
# Read Images
images = glob.glob('*.jpg')
for c, i in enumerate(images):
image = cv2.imread(i, 1)
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
# Find face(s) using cascade
faces = faceCascade.detectMultiScale(
gray,
scaleFactor=1.1, # size of groups
minNeighbors=5, # How many groups around are detected as face for it to be valid
minSize=(500, 500) # Min size in pixels for face
)
# Outputs number of faces found in image
print('Found {0} faces!'.format(len(faces)))
# Places a rectangle on face
for (x, y, w, h) in faces:
imgCrop = image[y:y+h,x:x+w]
if len(faces) > 0:
#Saves Images to output folder with OG name
cv2.imwrite('output/'+ i, imgCrop)
I can crop using it like this:
# Crop Padding
left = 300
right = 300
top = 400
bottom = 1000
for (x, y, w, h) in faces:
imgCrop = image[y-top:y+h+bottom, x-left:x+w+right]
but that outputs pretty random resolutions and changes based on the image resolution
TL;DR
To set a new resolution with the dimension, you can use cv2.resize. There may be a pixel loss so you can use the interpolation method.
The newly resized image may be in BGR format, so you may need to convert to RGB format.
cv2.resize(src=crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)
crop = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB) # Make sure the cropped image is in RGB format
cv2.imwrite("image-1.png", crop)
Suggestion:
One approach is using python's face-recognition library.
The approach is using two sample images for training.
Predict the next image based on training images.
For instance, The followings are the training images:
We want to predict the faces in the below image:
When we get the facial encodings of the training images and apply to the next image:
import face_recognition
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image, ImageDraw
# Load a sample picture and learn how to recognize it.
first_image = face_recognition.load_image_file("images/ex.jpg")
first_face_encoding = face_recognition.face_encodings(first_image)[0]
# Load a second sample picture and learn how to recognize it.
second_image = face_recognition.load_image_file("images/index.jpg")
sec_face_encoding = face_recognition.face_encodings(second_image)[0]
# Create arrays of known face encodings and their names
known_face_encodings = [
first_face_encoding,
sec_face_encoding
]
print('Learned encoding for', len(known_face_encodings), 'images.')
# Load an image with an unknown face
unknown_image = face_recognition.load_image_file("images/babes.jpg")
# Find all the faces and face encodings in the unknown image
face_locations = face_recognition.face_locations(unknown_image)
face_encodings = face_recognition.face_encodings(unknown_image, face_locations)
# Convert the image to a PIL-format image so that we can draw on top of it with the Pillow library
# See http://pillow.readthedocs.io/ for more about PIL/Pillow
pil_image = Image.fromarray(unknown_image)
# Create a Pillow ImageDraw Draw instance to draw with
draw = ImageDraw.Draw(pil_image)
# Loop through each face found in the unknown image
for (top, right, bottom, left), face_encoding in zip(face_locations, face_encodings):
matches = face_recognition.compare_faces(known_face_encodings, face_encoding)
face_distances = face_recognition.face_distance(known_face_encodings, face_encoding)
best_match_index = np.argmin(face_distances)
draw.rectangle(((left, top), (right, bottom)), outline=(0, 0, 255), width=5)
# Remove the drawing library from memory as per the Pillow docs
del draw
# Display the resulting image
plt.imshow(pil_image)
plt.show()
The output will be:
The above is my suggestion. When you create a new resolution with the current image, there will be a pixel loss. Therefore you need to use an interpolation method.
For instance: after finding the face locations, select the coordinates in the original image.
# Add after draw.rectangle function.
crop = unknown_image[top:bottom, left:right]
Set new resolution with the size 2000 x 2500 and interpolation with CV2.INTERN_LANCZOS4.
Possible Question: Why CV2.INTERN_LANCZOS4?
Of course, you can select whatever you like, but in this post CV2.INTERN_LANCZOS4 was suggested.
cv2.resize(src=crop, dsize=(2000, 2500), interpolation=cv2.INTER_LANCZOS4)
Save the image
crop = cv2.cvtColor(crop, cv2.COLOR_BGR2RGB) # Make sure the cropped image is in RGB format
cv2.imwrite("image-1.png", crop)
Outputs are around 4.3 MB Therefore I can't display in here.
From the final result, we clearly see and identify faces. The library precisely finds the faces in the image.
Here what you can do:
Either you can use the training images of your own-set, or you can use the example above.
Apply the face-recognition function for each image, using the trained face-locations and save the results in the directory.
here is how I got it to crop how I wanted, this is added right below the "output number of faces" function
#Get the face postion and output values into variables, might not be needed but I did it
for (x, y, w, h) in faces:
xdis = x
ydis = y
w = w
h = h
#Get scale value by dividing wanted head hight by detected head hight
ws = 600/w
hs = 600/h
#scale image to get head to right size, uses bilinear interpolation by default
scale = cv2.resize(image,(0,0),fx=hs,fy=ws)
#calculate head postion for given values
sxdis = int(xdis*ws) #applying scale to x distance and turning it into a integer
sydis = int(ydis*hs) #applying scale to y distance and turning it into a integer
sycent = sydis+300 #adding half head hight to get center
ystart = sycent-700 #subtract where you want the head center to be in pixels, this is for the vertical
yend = ystart+2500 #Add whatever you want vertical resolution to be
xcent = sxdis+300 #adding half head hight to get center
xstart = xcent-1000 #subtract where you want the head center to be in pixels, this is for the horizontal
xend = xstart+2000 #add whatever you want the horizontal resolution to be
#Crop the image
cropped = scale[ystart:yend, xstart:xend]
Its a mess but it works exactly how I wanted it to work.
ended up going with openCV instead of switching to python-Recognition because of speed but I might switch over if I can get multithreading to work in python-recognition.
I've been trying to convert stereo images into a depth map with use of opencv, but not matter what I do it seems to come out unreadable.
I was able to get an accurate depth image of example images that were provided in the opencv tutorial but not on any other image. Even when I attempted to download other premade, calibrated stereo image from online I get terrible results that are neither accurate nor are even close to quality that I get with the example images.
here is my main python script that I use to make the depth map:
import numpy as np
import cv2
from matplotlib import pyplot as plt
imgL = cv2.imread('calimg_L.png',0)
imgR = cv2.imread('calimg_R.png',0)
# imgL = cv2.imread('./images/example_L.png',0)
# imgR = cv2.imread('./images/example_R.png',0)
stereo = cv2.StereoSGBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgR,imgL)
norm_image = cv2.normalize(disparity, None, alpha = 0, beta = 1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
cv2.imwrite("disparityImage.jpg", norm_image)
plt.imshow(norm_image)
plt.show()
where calimg_L.png is a calibrated version of the original image.
Here is the code I use to calibrate my images:
import numpy as np
import cv2
import glob
from matplotlib import pyplot as plt
def createCalibratedImage(inputImage, outputName):
# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(6,5,0)
objp = np.zeros((3*3,3), np.float32)
objp[:,:2] = np.mgrid[0:3,0:3].T.reshape(-1,2)
# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.
# org = cv2.imread('./chess.jpg')
# orig_cal_img = cv2.resize(org, (384, 288))
# cv2.imwrite("cal_chess.jpg", orig_cal_img)
images = glob.glob('./chess_webcam/*.jpg')
for fname in images:
print('file in use: ' + fname)
img = cv2.imread(fname)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Find the chess board corners
ret, corners = cv2.findChessboardCorners(gray, (3,3),None)
# print("doing the thing");
print('status: ' + str(ret));
# If found, add object points, image points (after refining them)
if ret == True:
# print("found something");
objpoints.append(objp)
cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)
imgpoints.append(corners)
# Draw and display the corners
cv2.drawChessboardCorners(img, (3,3), corners,ret)
cv2.imshow('img',img)
cv2.waitKey(500)
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1],None,None)
img = inputImage
h, w = img.shape[:2]
newcameramtx, roi=cv2.getOptimalNewCameraMatrix(mtx,dist,(w,h),1,(w,h))
# undistort
print('undistorting...')
mapx,mapy = cv2.initUndistortRectifyMap(mtx,dist,None,newcameramtx,(w,h),5)
dst = cv2.remap(inputImage ,mapx,mapy,cv2.INTER_LINEAR)
# crop the image
x,y,w,h = roi
dst = dst[y:y+h, x:x+w]
# cv2.imwrite('calibresult.png',dst)
cv2.imwrite(outputName + '.png',dst)
cv2.destroyAllWindows()
original_L = cv2.imread('capture_L.jpg')
original_R = cv2.imread('capture_R.jpg')
createCalibratedImage(original_R, "calimg_R")
createCalibratedImage(original_L, "calimg_L")
print("images calibrated and outputed")
This code was taken from opencv tutorial on how to calibrate images and was provided at least 16 images of the chess board, but was only able to identify the chessboard in about 4 - 5 of them. The reason I used such a relatively small grid search of 3x3 is because anything higher left me without any images to use for calibration due to its inability to find the chessboard.
Here is what I get from an example image(sorry for weird link, couldn't find how to upload):
https://ibb.co/DYMcdZc
here is the original:
https://ibb.co/gMkqyXD
https://ibb.co/YQZY40C
This acts a it should, but when I use it with any other image it gives me a mess, for example:
output:
https://ibb.co/kXwgDVn
looks like just a mess of pixels, to be fair when you put it into 'gray' on imshow it looks more readable but it is not very representative of the image's depth, here are the originals:
https://ibb.co/vqDKGS0
https://ibb.co/f0X1gMB
Even worse so, when I take images myself and do calibrate them through the chessboard code, it comes out as just a random mess of white and black pixels, and values of some goes into negatives and some pixels are impossibly high value.
tl;dr I can't get any stereo images to be made into a depth map even though the example image works just fine, why is that?
First I want to say that obtaining a good depth map is not such a simple task, and using the basic StereoMatching won't always lead to good results. Nevertheless, something better can be achieved.
In order:
Calibration: you should be able to find the checkerboard in more images, 4/5 is a very low number for calibration, it is very hard to estimate correctly the camera parameters with such low number. How do the images look like? Did you read them as grayscale images? Usually also using a different number for row and column (not 3x3 grid, like 4x3) helps to understand the checkerboard position (otherwise it could be ambiguous which side is up or right, for example, a 90 rotation would result in 0 rotation).
Rectification: this can be easily checked by looking at the images. Open two images on two different layers (using GIMP or similar) and check for similar points. After you rectified the images, they should lie on the same line. Are they really on the same line? If yes, rectification work, otherwise, you need a better calibration. The stereo matching won't work without this step.
Stereo Matching: if all above steps are correct, then you may have a problem on the parameters of the stereo matching. First thing to check is disparity range (since it looks like you have different resolution between example images and your images, you should check and adapt that value). Min disparity can also help (if you reduce the disparity range, you reduce the error possibilities) and also block size (15 is quite big, smaller is also enough).
From what you say, my guess would be the problem is on the calibration. You should try to check the rectified images, and if the problem is there try to acquire a new dataset (or find online a better one) and calibrate your images there. Once you can calibrate and rectify your images correctly, you should get better results.
I see the code is similar to the tutorial here so I guess that's correct and the main problem are the images. Hope this can help,I can help you more if you test and see where the probelm is!
I have some hundreds of images (scanned documents), most of them are skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon
filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I) # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))
# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most of the documents, except with some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e it does not make difference between (180 and 0) and (90 and 270)). So I get a lot of upside-down documents.
Here is an example:
The resulted image that I get is the same as the input image.
Is there any suggestion to detect if an image is upside down using Opencv and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, it is possible for Tesseract to detect the orientation. However, when the image has few lines, the orientation angle suggested by Tesseract is usually wrong. So this can not be a 100% solution.
Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score keeping method. Score each image for it's likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability, the full code can be found here.
# Rotate the image around in a circle
angle = 0
while angle <= 360:
# Rotate the source image
img = rotate(src, angle)
# Crop the center 1/3rd of the image (roi is filled with text)
h,w = img.shape
buffer = min(h, w) - int(min(h,w)/1.15)
roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
# Create background to draw transform on
bg = np.zeros((buffer*2, buffer*2), np.uint8)
# Compute the sums of the rows
row_sums = sum_rows(roi)
# High score --> Zebra stripes
score = np.count_nonzero(row_sums)
scores.append(score)
# Image has best rotation
if score <= min(scores):
# Save the rotatied image
print('found optimal rotation')
best_rotation = img.copy()
k = display_data(roi, row_sums, buffer)
if k == 27: break
# Increment angle and try again
angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach; A few lines of you example image
Blue: Original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
line = smooth[start:end]
if np.argmax(line) < len(line)/2:
lower_peaks += 1
print(lower_peaks / len(line_starts))
this prints 0.125 for the given image, so this is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then to use it to deskew images(Taken from the homepage):
from alyn import Deskew
d = Deskew(
input_file='path_to_file',
display_image='preview the image on screen',
output_file='path_for_deskewed image',
r_angle='offest_angle_in_degrees_to_control_orientation')`
d.run()
Note that Alyn is only for deskewing text.
In the original picture, I would like to detect circular regions. (glands) I managed to get to know the outlines of the regions, but because of the many smaller objects (nuclei), I can not go any further.
My original idea was to remove small objects using the cv2.connectedComponentsWithStats function. But unfortunately, as shown in the picture, the glandy regions also contain small objects, they are not connected properly. The function also throws out the small regions that outline the glands, leaving some parts out of the contours.
Can someone help me to find a solution to this problem?
Thank you very much in advance
Original picture
The approximate contour of the glands (with a lot of small objects in it)
After cv2.connectedComponentsWithStats
OpenCV
I think you can solve your task by using the Hough transform. Something like this could work for you (you have to adjust the parameters according to your needs):
import sys
import cv2 as cv
import numpy as np
def main(argv):
filename = argv[0]
src = cv.imread(filename, cv.IMREAD_COLOR)
if src is None:
print ('Error opening image!')
print ('Usage: hough_circle.py [image_name -- default ' + default_file + '] \n')
return -1
gray = cv.cvtColor(src, cv.COLOR_BGR2GRAY)
gray = cv.medianBlur(gray, 5)
rows = gray.shape[0]
circles = cv.HoughCircles(gray, cv.HOUGH_GRADIENT, 1, rows / 32,
param1=100, param2=30,
minRadius=20, maxRadius=200)
if circles is not None:
circles = np.uint16(np.around(circles))
for i in circles[0, :]:
center = (i[0], i[1])
# circle center
cv.circle(src, center, 1, (0, 100, 100), 3)
# circle outline
radius = i[2]
cv.circle(src, center, radius, (255, 0, 255), 2)
cv.imshow("detected circles", src)
cv.waitKey(0)
return 0
if __name__ == "__main__":
main(sys.argv[1:])
Some additional preprocessing might be required, to get rid of the noise, e.g. Morphological Transformations and performing edge detection right before the transformation might be helpful as well.
Neural Networks
Another option would be to use a neural network for image segmentation. A quite successful one is Mask RCNN. There is already a working python implementation on GitHub: Mask RCNN - Nucleus.
I want to calibrate a car video recorder and use it for 3D reconstruction with Structure from Motion (SfM). The original size of the pictures I have took with this camera is 1920x1080. Basically, I have been using the source code from the OpenCV tutorial for the calibration.
But there are some problems and I would really appreciate any help.
So, as usual (at least in the above source code), here is the pipeline:
Find the chessboard corner with findChessboardCorners
Get its subpixel value with cornerSubPix
Draw it for visualisation with drawhessboardCorners
Then, we calibrate the camera with a call to calibrateCamera
Call the getOptimalNewCameraMatrix and the undistort function to undistort the image
In my case, since the image is too big (1920x1080), I have resized it to 640x320 (during SfM, I will also use this size of image, so, I don't think it would be any problem). And also, I have used a 9x6 chessboard corners for the calibration.
Here, the problem arose. After a call to the getOptimalNewCameraMatrix, the distortion come out totally wrong. Even the returned ROI is [0,0,0,0]. Below is the original image and its undistorted version:
You can see the image in the undistorted image is at the bottom left.
But, if I didn't call the getOptimalNewCameraMatrix and just straight undistort it, I got a quite good image.
So, I have three questions.
Why is this? I have tried with another dataset taken with the same camera, and also with my iPhone 6 Plus, but the results are same as above.
Another question is, what is the getOptimalNewCameraMatrix does? I have read the documentations several times but still cannot understand it. From what I have observed, if I didn't call the getOptimalNewCameraMatrix, my image will retain its size but it would be zoomed and blurred. Can anybody explain this function in more detail for me?
For SfM, I guess the call to getOptimalNewCameraMatrix is important? Because if not, the undistorted image would be zoomed and blurred, making the keypoint detection harder (in my case, I will be using the optical flow)?
I have tested the code with the opencv sample pictures and the results are just fine.
Below is my source code:
from sys import argv
import numpy as np
import imutils # To use the imutils.resize function.
# Resizing while preserving the image's ratio.
# In this case, resizing 1920x1080 into 640x360.
import cv2
import glob
# termination criteria
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
# prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(6,5,0)
objp = np.zeros((9*6,3), np.float32)
objp[:,:2] = np.mgrid[0:9,0:6].T.reshape(-1,2)
# Arrays to store object points and image points from all the images.
objpoints = [] # 3d point in real world space
imgpoints = [] # 2d points in image plane.
images = glob.glob(argv[1] + '*.jpg')
width = 640
for fname in images:
img = cv2.imread(fname)
if width:
img = imutils.resize(img, width=width)
gray = cv2.cvtColor(img,cv2.COLOR_BGR2GRAY)
# Find the chess board corners
ret, corners = cv2.findChessboardCorners(gray, (9,6),None)
# If found, add object points, image points (after refining them)
if ret == True:
objpoints.append(objp)
corners2 = cv2.cornerSubPix(gray,corners,(11,11),(-1,-1),criteria)
imgpoints.append(corners2)
# Draw and display the corners
img = cv2.drawChessboardCorners(img, (9,6), corners2,ret)
cv2.imshow('img',img)
cv2.waitKey(500)
cv2.destroyAllWindows()
ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1],None,None)
for fname in images:
img = cv2.imread(fname)
if width:
img = imutils.resize(img, width=width)
h, w = img.shape[:2]
newcameramtx, roi=cv2.getOptimalNewCameraMatrix(mtx,dist,(w,h),1,(w,h))
# undistort
dst = cv2.undistort(img, mtx, dist, None, newcameramtx)
# crop the image
x,y,w,h = roi
dst = dst[y:y+h, x:x+w]
cv2.imshow("undistorted", dst)
cv2.waitKey(500)
mean_error = 0
for i in xrange(len(objpoints)):
imgpoints2, _ = cv2.projectPoints(objpoints[i], rvecs[i], tvecs[i], mtx, dist)
error = cv2.norm(imgpoints[i],imgpoints2, cv2.NORM_L2)/len(imgpoints2)
mean_error += error
print "total error: ", mean_error/len(objpoints)
Already ask someone in answers.opencv.org and he tried my code and my dataset with success. I wonder what is actually wrong.
Question #2:
With cv::getOptimalNewCameraMatrix(...) you can compute a new camera matrix according to the free scaling parameter alpha.
If alpha is set to 1 then all the source image pixels are retained in the undistorted image that is you'll see black and curved border along the undistorted image (like a pincushion). This scenario is unlucky for several computer vision algorithms, because new edges are appeared on the undistorted image for example.
By default cv::undistort(...) regulates the subset of the source image that will be visible in the corrected image and that's why only the sensible pixels are shown on that - no pincushion around the corrected image but data loss.
Anyway, you are allowed to control the subset of the source image that will be visible in the corrected image:
cv::Mat image, cameraMatrix, distCoeffs;
// ...
cv::Mat newCameraMatrix = cv::getOptimalNewCameraMatrix(cameraMatrix, distCoeffs, image.size(), 1.0);
cv::Mat correctedImage;
cv::undistort(image, correctedImage, cameraMatrix, distCoeffs, newCameraMatrix);
Question #1:
It is just my feeling, but you should also take care, if you resize your image after the calibration then the camera matrix must be also "scaled" as well, for example:
cv::Mat cameraMatrix;
cv::Size calibSize; // Image during the calibration, e.g. 1920x1080
cv::Size imageSize; // Your current image size, e.g. 640x320
// ...
cv::Matx31d t(0.0, 0.0, 1.0);
t(0) = (double)imageSize.width / (double)calibSize.width;
t(1) = (double)imageSize.height / (double)calibSize.height;
cameraMatrixScaled = cv::Mat::diag(cv::Mat(t)) * cameraMatrix;
This must be done only for the camera matrix, because the distortion coefficients do not depend on the resolution.
Question #3:
Whatever I think cv::getOptimalNewCameraMatrix(...) is not important in your case, the undistorted image can be zoomed and blurred because you remove the effect of a non-linear transformation. If I were you then I would try the optical flow without calling cv::undistort(...). I think that even a distorted image can contain a lot of good features for tracking.