After trying the example stated in the OpenCV documentation, I tried the same code on a KITTI image pair and I get this:
The code I am using right now looks like this; changing the parameters in StereoBM_create did not help much:
import numpy as np
import cv2
from matplotlib import pyplot as plt
imgL = cv2.imread('000002_left.png',0)
imgR = cv2.imread('000002_right.png',0)
stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
#stereo = cv2.StereoBM_create(numDisparities=64, blockSize=17)
disparity = stereo.compute(imgL,imgR)
cv2.imwrite('depth_map.png', disparity)
disp_v2 = cv2.imread('depth_map.png')
disp_v2 = cv2.applyColorMap(disp_v2, cv2.COLORMAP_JET)
plt.imshow(disp_v2)
cv2.imwrite('depth_map_coloured.png', disp_v2)
plt.show()
My question is: how can I make the depth map better?
In my experience, StereoBM (OpenCV) doesn't work well with KITTI images, probably because the KITTI scenes are much more complex.
But I managed to get good results using this:
https://github.com/ialhashim/DenseDepth
You should adjust the parameters of the stereo matcher in OpenCV.
This is a function from a class I created. You can see that I adjust several parameters, such as the number of disparities, the minimum disparity, etc.:
def get_stereo_map(self, image_idx):
    left_RGB = self.get_left_RGB(image_idx)    # left RGB image
    right_RGB = self.get_right_RGB(image_idx)  # right RGB image
    # compute disparity map from the stereo pair
    stereo = cv2.StereoBM_create()
    stereo.setMinDisparity(0)
    num_disparities = 16 * 5
    stereo.setNumDisparities(num_disparities)
    stereo.setBlockSize(15)
    stereo.setSpeckleRange(16)
    # stereo.setSpeckleWindowSize(45)
    stereo_depth_map = stereo.compute(
        cv2.cvtColor(np.array(left_RGB), cv2.COLOR_RGB2GRAY),
        cv2.cvtColor(np.array(right_RGB), cv2.COLOR_RGB2GRAY))
    # depth = focal length * baseline / disparity; divide the raw values by 16
    # to get true disparities (StereoBM returns fixed-point values)
    stereo_depth_map = (self.storage.focal_pix_RGB * self.storage.baseline_m_RGB) \
        / (stereo_depth_map / 16)
    stereo_depth_map = DataParser.crop_redundant(stereo_depth_map)
    return stereo_depth_map
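As a rough sanity check on the depth equation above: for KITTI the focal length is roughly 721 px and the stereo baseline roughly 0.54 m, so a raw StereoBM value of 160 corresponds to a true disparity of 160 / 16 = 10 px and a depth of about 721 * 0.54 / 10 ≈ 39 m.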
For the full code, refer to my repo: https://github.com/janezlapajne/kitty-stereo-dataset-parser
Ground truth from lidar and the stereo distance map are also included. I hope it helps someone.
I am working on a stereo vision project. I set up my stereo camera (two parallel matrix cameras) and shot a picture, then I read the OpenCV documentation, tried out the examples and other datasets, and it all seems to work just fine. With my own pictures, however, the disparity image is a mess. I tried it with both the BM and SGBM methods. The main question is: has anyone had this type of problem before? Is our camera setup bad, or am I just missing something important?
I attach my code and pictures.
import cv2
import numpy as np
from matplotlib import pyplot as plt
left = cv2.imread("../JR_Pictures/JR_1_Test_left.bmp", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("../JR_Pictures/JR_1_Test_right.bmp", cv2.IMREAD_GRAYSCALE)
left = cv2.resize(left, (0, 0), None, 0.5, 0.5)
right = cv2.resize(right, (0, 0), None, 0.5, 0.5)
fx = 942.8 # 50 # 942.8 # lense focal length
baseline = 58.0 # distance in mm between the two cameras
disparities = 128 # num of disparities to consider
block = 13 # block size to match
units = 0.512 # depth units, adjusted for the output to fit in one byte
sbm = cv2.StereoBM_create(numDisparities=disparities, blockSize=block)
left_matcher = cv2.StereoBM_create(numDisparities=disparities, blockSize=block)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)
disparityL = left_matcher.compute(left, right)
disparityR = right_matcher.compute(right, left)  # the right matcher expects (right, left)
sigma = 1.5
lmbda = 32000.0
wls_filter = cv2.ximgproc.createDisparityWLSFilter(left_matcher)
wls_filter.setLambda(lmbda)
wls_filter.setSigmaColor(sigma)
filtered_disp = wls_filter.filter(disparityL, left, disparity_map_right=disparityR)
# calculate disparities
disparity = sbm.compute(left, right)
numpy_horizontal = np.hstack((left, right))
hori = np.hstack((disparityL, filtered_disp))
cv2.imshow('HorizontalStack1', numpy_horizontal)
cv2.imshow('HoriStack2', hori)
cv2.waitKey(0)
valid_pixels = disparity > 0
# calculate depth data
depth = np.zeros(shape=left.shape).astype("uint8")
depth[valid_pixels] = (fx * baseline) / (units * disparity[valid_pixels])
# visualize depth data
depth = cv2.equalizeHist(depth)
colorized_depth = np.zeros((left.shape[0], left.shape[1], 3), dtype="uint8")
temp = cv2.applyColorMap(depth, cv2.COLORMAP_JET)
colorized_depth[valid_pixels] = temp[valid_pixels]
plt.imshow(colorized_depth)
plt.show()
I tried out several code examples from GitHub, Stack Overflow, and the OpenCV tutorials, but none of them worked well, so I figured the problem is with our camera or our images. I had to downscale them because they were in BMP format and I could not upload that to Stack Overflow. :D
So, these are my left and right raw images.
Left Pic, Right Pic:
And my raw disparity, filtered disparity, and calculated depth map.
If I missed any information, let me know, and thanks for the help.
A couple of things are missing. StereoBM is not magic and doesn't do everything for you.
As I already wrote here, you need to have a calibrated system, where all the intrinsic and extrinsic parameters of the stereo rig are known.
Did you calibrate your system? How did you end up with those values of fx and baseline?
Are you using a stereo rig or those are simply two images done with the same camera?
Why do we need calibration?
First, look at your images: they are not rectified! In rectified images, corresponding points lie on the same horizontal line. Rectification can be done only if you have a calibrated system.
As you may see from the bottom corner of the book, it is not aligned (different height in the left and right images).
Second, you are not considering lens distortion, which can be quite significant on common cameras.
Then, to calculate depth you need the baseline information.
I encourage you to give it a try.
You can find my code to build the depth map here, you may join it with other examples to create your own system.
Here is how I do calibration instead. Good luck.
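If it helps, here is a minimal sketch of the calibrate-then-rectify flow described above (this is not the linked code, just an outline). The chessboard point lists objpoints, imgpoints_l, imgpoints_r, the per-camera intrinsics K1, d1, K2, d2 (from cv2.calibrateCamera), image_size, and the raw left/right images are all placeholders you would supply from your own setup:
import cv2
# Estimate the rotation R and translation T between the two cameras,
# keeping the previously calibrated intrinsics fixed
ret, K1, d1, K2, d2, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints_l, imgpoints_r,
    K1, d1, K2, d2, image_size,
    flags=cv2.CALIB_FIX_INTRINSIC)
# Compute rectification transforms so that epipolar lines become horizontal
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, d1, K2, d2, image_size, R, T)
# Build remap tables and rectify one image pair
map1x, map1y = cv2.initUndistortRectifyMap(K1, d1, R1, P1, image_size, cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, d2, R2, P2, image_size, cv2.CV_32FC1)
rect_left = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
rect_right = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)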
I am trying to figure out how to extract multiple objects from a greyscale image of mineral grains. I need to segment the different grains and then extract each of them to save as a separate image file.
While doing research, I have found that everyone uses skimage. I'm worried that some grains will not be extracted (mineral grains).
Has anyone had to work on a similar problem?
I'm not entirely sure what the mineral grains are in this image, but segmenting and labeling connected components is pretty straightforward using skimage algorithms.
The simplest approach that I know of is accomplished in two major steps:
Generate a binary image. The most straightforward methods for this are thresholding-based, such as those elaborated on here: https://scikit-image.org/docs/dev/auto_examples/segmentation/plot_thresholding.html
Separate connected objects. This is often accomplished with a watershed algorithm, as elaborated on here: https://scikit-image.org/docs/stable/auto_examples/segmentation/plot_watershed.html
Assuming that the mineral grains are the bigger objects in the example image, a first-pass approach may look something like the following:
import numpy as np
from scipy import ndimage as ndi
from skimage import io
from skimage.filters import gaussian, threshold_li
from skimage.morphology import remove_small_holes, remove_small_objects
from skimage.segmentation import watershed
from skimage.feature import peak_local_max
from skimage.measure import label
# read in the image
img = io.imread('fovAK.png', as_gray=True)
gauss_sigma = 1 # standard deviation for Gaussian kernel for removing a bit of noise from image
obj_min_size = 5000 # minimum # of pixels for labeled objects
min_hole_size = 5000 # The maximum area, in pixels, of a contiguous hole that will be filled
ws_footprint = 100 # size of local region within which to search for peaks for watershed
#########################
#make a binary image using threshold
#########################
img_blurred = gaussian(img, gauss_sigma) # remove a bit of noise from image
img_thresh = threshold_li(img_blurred) # use histogram based method to determine intensity threshold
img_binary = img_blurred > img_thresh # make a binary image
# make a border around the image so that holes at edge are not filled
ydims, xdims = img.shape
img_binary[0, :] = 0
img_binary[ydims-1, :] = 0
img_binary[:, 0] = 0
img_binary[:, xdims-1] = 0
img_rsh = remove_small_holes(img_binary, min_hole_size) # removes small holes in binary image
img_rso = remove_small_objects(img_rsh, obj_min_size) # removes small objects in binary image
#############################
#use watershed algorithm to separate connected objects
####################################
distance = ndi.distance_transform_edt(img_rso) # distance transform of image
coords = peak_local_max(distance, labels=img_rso, footprint=np.ones((ws_footprint, ws_footprint))) # local peak coordinates
mask = np.zeros(distance.shape, dtype=bool) # mask for watershed
mask[tuple(coords.T)] = True
markers, _ = ndi.label(mask)
labels = watershed(-distance, markers, mask=img_rso)
This is absolutely a first pass on your example image and is by no means perfect. You may need to use a different thresholding algorithm or tweak other parameters but hopefully this gets you moving in the right direction.
From there you can just iterate over the labeled objects and save individual images.
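For example, a minimal sketch of that last step (assuming img and labels from the snippet above; the output file names are just an illustration):
from skimage import io
from skimage.measure import regionprops
from skimage.util import img_as_ubyte
# crop each labeled grain out of the original grayscale image and save it separately
for region in regionprops(labels):
    minr, minc, maxr, maxc = region.bbox  # bounding box of this grain
    crop = img[minr:maxr, minc:maxc]      # grayscale crop of the grain
    io.imsave(f'grain_{region.label}.png', img_as_ubyte(crop))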
There are definitely more sophisticated approaches to these kinds of problems, ranging from edge detectors and bandpass filters all the way out to ML-based methods, but hopefully this gets you moving in the right direction.
I've been trying to convert stereo images into a depth map with OpenCV, but no matter what I do it seems to come out unreadable.
I was able to get an accurate depth image from the example images provided in the OpenCV tutorial, but not from any other image. Even when I downloaded other premade, calibrated stereo images from online, I get terrible results that are neither accurate nor even close to the quality I get with the example images.
Here is my main Python script that I use to make the depth map:
import numpy as np
import cv2
from matplotlib import pyplot as plt
imgL = cv2.imread('calimg_L.png',0)
imgR = cv2.imread('calimg_R.png',0)
# imgL = cv2.imread('./images/example_L.png',0)
# imgR = cv2.imread('./images/example_R.png',0)
stereo = cv2.StereoSGBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgL, imgR)  # compute() expects the left image first, then the right
norm_image = cv2.normalize(disparity, None, alpha = 0, beta = 1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
cv2.imwrite("disparityImage.jpg", norm_image)
plt.imshow(norm_image)
plt.show()
where calimg_L.png is a calibrated version of the original image.
Here is the code I use to calibrate my images:
import numpy as np
import cv2
import glob
from matplotlib import pyplot as plt
def createCalibratedImage(inputImage, outputName):
    # termination criteria
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    # prepare object points, like (0,0,0), (1,0,0), (2,0,0) ...., (2,2,0)
    objp = np.zeros((3*3, 3), np.float32)
    objp[:, :2] = np.mgrid[0:3, 0:3].T.reshape(-1, 2)
    # Arrays to store object points and image points from all the images.
    objpoints = []  # 3d points in real world space
    imgpoints = []  # 2d points in image plane
    # org = cv2.imread('./chess.jpg')
    # orig_cal_img = cv2.resize(org, (384, 288))
    # cv2.imwrite("cal_chess.jpg", orig_cal_img)
    images = glob.glob('./chess_webcam/*.jpg')
    for fname in images:
        print('file in use: ' + fname)
        img = cv2.imread(fname)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Find the chessboard corners
        ret, corners = cv2.findChessboardCorners(gray, (3, 3), None)
        print('status: ' + str(ret))
        # If found, add object points and image points (after refining them)
        if ret == True:
            objpoints.append(objp)
            corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
            imgpoints.append(corners)
            # Draw and display the corners
            cv2.drawChessboardCorners(img, (3, 3), corners, ret)
            cv2.imshow('img', img)
            cv2.waitKey(500)
    # calibrate the camera from the collected chessboard points
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
    img = inputImage
    h, w = img.shape[:2]
    newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
    # undistort
    print('undistorting...')
    mapx, mapy = cv2.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w, h), 5)
    dst = cv2.remap(inputImage, mapx, mapy, cv2.INTER_LINEAR)
    # crop the image
    x, y, w, h = roi
    dst = dst[y:y+h, x:x+w]
    # cv2.imwrite('calibresult.png', dst)
    cv2.imwrite(outputName + '.png', dst)
    cv2.destroyAllWindows()
original_L = cv2.imread('capture_L.jpg')
original_R = cv2.imread('capture_R.jpg')
createCalibratedImage(original_R, "calimg_R")
createCalibratedImage(original_L, "calimg_L")
print("images calibrated and outputed")
This code was taken from the OpenCV tutorial on camera calibration. I provided it with at least 16 images of the chessboard, but it was only able to identify the chessboard in about 4-5 of them. The reason I used such a relatively small 3x3 grid is that anything larger left me with no images to use for calibration, because the chessboard could not be found.
Here is what I get from an example image (sorry for the weird link, I couldn't find how to upload it):
https://ibb.co/DYMcdZc
Here are the originals:
https://ibb.co/gMkqyXD
https://ibb.co/YQZY40C
This acts as it should, but when I use it with any other image it gives me a mess, for example:
output:
https://ibb.co/kXwgDVn
It looks like just a mess of pixels. To be fair, when you display it with 'gray' in imshow it looks more readable, but it is not very representative of the image's depth. Here are the originals:
https://ibb.co/vqDKGS0
https://ibb.co/f0X1gMB
Even worse, when I take images myself and calibrate them through the chessboard code, the result comes out as just a random mess of white and black pixels; some values go negative and some pixels have impossibly high values.
tl;dr: I can't get any stereo images to be made into a depth map even though the example images work just fine. Why is that?
First, I want to say that obtaining a good depth map is not such a simple task, and using basic stereo matching won't always lead to good results. Nevertheless, something better can be achieved.
In order:
Calibration: you should be able to find the checkerboard in more images; 4-5 is a very low number for calibration, and it is very hard to estimate the camera parameters correctly from so few. What do the images look like? Did you read them as grayscale images? Using a different number of rows and columns (not a 3x3 grid, but e.g. 4x3) also helps to determine the checkerboard orientation (with a square grid it can be ambiguous which side is up or right; for example, a 90° rotation would look the same as no rotation).
Rectification: this can be easily checked by looking at the images. Open the two images on two different layers (using GIMP or similar) and check for corresponding points. After you rectify the images, corresponding points should lie on the same line. Are they really on the same line? If yes, rectification works; otherwise, you need a better calibration. Stereo matching won't work without this step. A quick way to eyeball this is sketched right after this list.
Stereo Matching: if all the above steps are correct, then the problem may be in the parameters of the stereo matching. The first thing to check is the disparity range (since it looks like your images have a different resolution than the example images, you should check and adapt that value). Min disparity can also help (if you reduce the disparity range, you reduce the possibility of errors), and so can the block size (15 is quite big; a smaller value may be enough). A small parameter sketch also follows the list.
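As a quick visual check for the rectification point, a sketch like this draws evenly spaced horizontal lines across the side-by-side pair (assuming the two rectified images, here the calimg_L.png / calimg_R.png from your script, have the same size); corresponding features should sit on the same green line:
import cv2
import numpy as np
left = cv2.imread('calimg_L.png')
right = cv2.imread('calimg_R.png')
pair = np.hstack((left, right))        # side-by-side view
for y in range(0, pair.shape[0], 40):  # a horizontal line every 40 pixels
    cv2.line(pair, (0, y), (pair.shape[1] - 1, y), (0, 255, 0), 1)
cv2.imwrite('rectification_check.png', pair)
And for the matching parameters, a minimal sketch of how they are set on the matcher (the values are placeholders to adapt to your resolution, not tested settings):
stereo = cv2.StereoSGBM_create(
    minDisparity=0,          # lowest disparity to search
    numDisparities=16 * 8,   # disparity search range, must be a multiple of 16
    blockSize=7)             # smaller than 15, as suggested above
# disparity = stereo.compute(rect_left, rect_right)  # left image first, then right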
From what you say, my guess is that the problem is in the calibration. You should check the rectified images, and if the problem is there, try to acquire a new dataset (or find a better one online) and calibrate your images with it. Once you can calibrate and rectify your images correctly, you should get better results.
I see your code is similar to the tutorial here, so I guess that part is correct and the main problem is the images. Hope this helps; I can help you more if you test and see where the problem is!
I cannot get a correct disparity map from a couple of simple images, as shown below:
LEFT
RIGHT
Disparity
The code:
import cv2
import numpy as np
# frames buffer
frames = []
# image categories
cat_selected = 0
cat_list = ['open']
cat_calib = [np.load('LUMIA_CALIB.npy')]
# load images
def im_load(image, calib):
    frame = cv2.imread(image, 0)
    if calib is not None:
        frame = cv2.undistort(cv2.resize(frame, (640, 480)), *calib[0])
        x, y, w, h = calib[1]
        frame = frame[y : y + h, x : x + w]
    return frame
for idx, im in enumerate(['left', 'right']):
    frames.append(im_load('images/%s/%s.jpg' % (cat_list[cat_selected], im),
                          cat_calib[cat_selected]))
    cv2.namedWindow(im, cv2.cv.CV_WINDOW_NORMAL)
    cv2.imshow(im, frames[idx])
    cv2.imwrite('%s.jpg' % im, frames[idx])
stereo = cv2.StereoBM(1, 16, 15)
disparity = stereo.compute(frames[0], frames[1])
cv2.namedWindow('map', 0)
cv2.imshow('map', cv2.convertScaleAbs(disparity))
cv2.imwrite('disparity.jpg', disparity)
cv2.waitKey(0)
cv2.destroyAllWindows()
Questions
What is wrong with the code and how can I fix it?
What are the effects of the distance between cameras while computing depth?
What is the unit of the values in the disparity matrix?
P.S.
The code computes the disparity map for the Tsukuba set of images just fine, though.
I don't know if this is relevant or not, but the distance between the two cameras is 14.85 cm.
Question 1
You seem to have forgotten to rectify your images using the results of a stereo calibration procedure.
Question 2
The distance between the two cameras is called the baseline. In general, the bigger the baseline, the better the accuracy at a distance, at the cost of a narrower field of view. The opposite is true too.
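To get a rough feel for the numbers: depth = focal_length * baseline / disparity, so with the 14.85 cm baseline you mention and an assumed focal length of 700 px (this depends on your camera), a disparity of 10 px corresponds to about 700 * 0.1485 / 10 ≈ 10.4 m, while a disparity of 100 px corresponds to about 1 m.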
Question 3
The disparity is probably in pixels if you're using Python (I'm no expert in the Python API of OpenCV). In general, if you want distance values from the triangulation (in real-world units), you'll want to use reprojectImageTo3D or its equivalent in Python. You will first need to calibrate your stereo rig using a chessboard (knowing the size of its squares).
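A minimal sketch of that last step, assuming you already have the Q matrix from cv2.stereoRectify and a disparity map from the block matcher (both are assumptions, not taken from your script):
import cv2
import numpy as np
disp = disparity.astype(np.float32) / 16.0   # StereoBM returns 16x fixed-point values
points_3d = cv2.reprojectImageTo3D(disp, Q)  # (H, W, 3) coordinates in the calibration units
depth = points_3d[:, :, 2]                   # the Z channel is the depth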
I'm new to Python and stuck.
I want to make a python script that allows me to separate adjacent particles on an image like this:
into separate regions like this:
I was suggested to use the watershed method, which as far as I understand would give me something like this:
EDIT: I actually found out that this is the distance transform and not watershed.
I could then use a threshold to separate them. I followed this OpenCV watershed guide, but it only worked to cut out the particles; I was not able to adapt the code to do what I want.
I then took another approach. I tried to use the OpenCV contours, which gave me good contours of the particles. I have then been looking intensively for an easy way to perform polygon offsetting in order to shrink the edges like this:
Using the centers of the offset contours (polygons) should give me the number of particles. But I just haven't been able to find a simple way to do edge offsetting / polygon shrinking with Python.
Here is a script using numpy, scipy and scikit-image (aka skimage). It makes use of local maxima extraction and watershedding plus labeling (i.e. connected component extraction).
import numpy as np
import scipy.misc
import scipy.ndimage
import skimage.feature
import skimage.morphology
# parameters
THRESHOLD = 128
# read image
im = scipy.misc.imread("JPh65.png")
# convert to gray image
im = im.mean(axis=-1)
# find peaks
peak = skimage.feature.peak_local_max(im, threshold_rel=0.9, min_distance=10)
# make an image with peaks at 1
peak_im = np.zeros_like(im)
for p in peak:
    peak_im[p[0], p[1]] = 1
# label peaks
peak_label, _ = scipy.ndimage.label(peak_im)
# propagate peak labels with watershed
labels = skimage.morphology.watershed(255 - im, peak_label)
# limit watershed labels to area where the image is intense enough
result = labels * (im > THRESHOLD)
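To get the particle count the question asks for, here is a small follow-up on the snippet above (assuming result from the last line; label 0 is the background):
import numpy as np
num_particles = len(np.unique(result)) - 1   # number of distinct non-zero labels
print("particles found:", num_particles)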