I am working on a stereo vision project. I set up my stereo camera (two parallel matrix cameras) and shot a picture, then I read the OpenCV documentation, tried out the examples and other datasets, and it seems to work just fine. With my own pictures, however, the disparity image is a mess. I tried both the BM and the SGBM method. The main question is: has anyone had this type of problem before, is our camera setup bad, or am I just missing something important?
I attach my code and pictures.
import cv2
import numpy
import numpy as np
from matplotlib import pyplot as plt
left = cv2.imread("../JR_Pictures/JR_1_Test_left.bmp", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("../JR_Pictur`enter code here`es/JR_1_Test_right.bmp",cv2.IMREAD_GRAYSCALE)
left = cv2.resize(left, (0, 0), None, 0.5, 0.5)
right = cv2.resize(right, (0, 0), None, 0.5, 0.5)
fx = 942.8 # 50 # 942.8 # lens focal length
baseline = 58.0 # distance in mm between the two cameras
disparities = 128 # num of disparities to consider
block = 13 # block size to match
units = 0.512 # depth units, adjusted for the output to fit in one byte
sbm = cv2.StereoBM_create(numDisparities=disparities, blockSize=block)
left_matcher = cv2.StereoBM_create(numDisparities=disparities, blockSize=block)
wlsFilter = cv2.ximgproc.createDisparityWLSFilter(left_matcher)
right_matcher = cv2.ximgproc.createRightMatcher(left_matcher)
disparityL = left_matcher.compute(left, right)
disparityR = right_matcher.compute(left, right)
sigma = 1.5
lmbda = 32000.0
wls_filter = cv2.ximgproc.createDisparityWLSFilter(left_matcher)
wls_filter.setLambda(lmbda)
wls_filter.setSigmaColor(sigma)
filtered_disp = wls_filter.filter(disparityL, left, disparity_map_right=disparityR)
# calculate disparities
disparity = sbm.compute(left, right)
numpy_horizontal = np.hstack((left, right))
hori = np.hstack((disparityL, filtered_disp))
cv2.imshow('HorizontalStack1', numpy_horizontal)
cv2.imshow('HoriStack2', hori)
cv2.waitKey(0)
valid_pixels = disparity > 0
# calculate depth data
depth = numpy.zeros(shape=left.shape).astype("uint8")
depth[valid_pixels] = (fx * baseline) / (units * disparity[valid_pixels])
# visualize depth data
depth = cv2.equalizeHist(depth)
colorized_depth = numpy.zeros((left.shape[0], left.shape[1], 3), dtype="uint8")
temp = cv2.applyColorMap(depth, cv2.COLORMAP_JET)
colorized_depth[valid_pixels] = temp[valid_pixels]
plt.imshow(colorized_depth)
plt.show()
I tried out several codes from GitHub, Stack Overflow and the OpenCV tutorials, but none of them worked well, so I thought the problem is with our camera or with our images. I had to downscale them because they were in BMP file format and I cannot upload that to Stack Overflow :D
So, these are my left and right raw images.
Left Pic, Right Pic:
And my raw disparity, filtered disparity, and calculated height map.
If I missed any information, let me know, and thanks for the help.
A couple of things are missing. StereoBM is not magic and doesn't do everything for you.
As I already wrote here, you need to have a calibrated system, where all the intrinsic and extrinsic parameters of the stereo rig are known.
Did you calibrate your system? How did you end up with those values of fx and baseline?
Are you using a stereo rig, or are those simply two images taken with the same camera?
Why do we need calibration?
First, look at your images: they are not rectified! Rectified images have corresponding points on a horizontal line. Rectification can be done only if you have a calibrated system.
As you may see from the bottom corner of the book, it is not aligned (different height in left and right).
Secondly, you are not considering lens distortion that can be quite big on common cameras.
Then, to calculate depth you need the baseline information.
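Once the rig is calibrated, rectification itself is only a few lines. A minimal sketch, assuming the intrinsics (K1, D1, K2, D2) and extrinsics (R, T) come from cv2.stereoCalibrate; all names here are placeholders, and left/right are your grayscale pair:
import cv2
# K1, D1, K2, D2, R, T: assumed outputs of cv2.stereoCalibrate (placeholders)
h, w = left.shape[:2]
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(K1, D1, K2, D2, (w, h), R, T)
# build per-camera rectification maps, then warp both images
map1x, map1y = cv2.initUndistortRectifyMap(K1, D1, R1, P1, (w, h), cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(K2, D2, R2, P2, (w, h), cv2.CV_32FC1)
left_rect = cv2.remap(left, map1x, map1y, cv2.INTER_LINEAR)
right_rect = cv2.remap(right, map2x, map2y, cv2.INTER_LINEAR)
After this step, corresponding points should lie on the same image row, and the Q matrix can later be used to turn disparities into metric depth.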
I encourage you to give it a try.
You can find my code to build the depth map here, you may join it with other examples to create your own system.
Here is how I do calibration instead. Good luck.
I am trying to align an RGB image with an IR image (single channel).
The goal is to create a 4 channel image R,G,B,IR.
In order to do this, I am using cv2.findTransformECC as described in this very neat guide. The code is unchanged for now, except for line 13, where the motion is set to Euclidean because I want to handle rotations in the future. I am using Python.
In order to verify the workings of the software, I used the images from the guide. It worked well so I wanted to correlate satellite images from multiple spectra as described above. Unfortunately, I ran into problems here.
Sometimes the algorithm converged (after ages), sometimes it immediately crashed because it can't converge, and other times it "finds" a solution that is clearly wrong. Attached you find two images that, from a human perspective, are easy to match, but the algorithm fails. The images are not rotated in any way; they are just not the exact same image (check the borders), so a translational motion is expected. The images are of Lake Neusiedlersee in Austria; the source is Sentinelhub.
Edit: With "sometimes" I refer to using different images from Sentinel. A given pair of images consistently has the same outcome.
I know that ECC is not feature-based which might pose a problem here.
I have also read that it is somewhat dependent on the initial warp matrix.
My questions are:
Am I using cv2.findTransformECC wrong?
Is there a better way to do this?
Should I try to "Monte-Carlo" the initial matrices until it converges? (This feels wrong)
Do you suggest using a feature-based algorithm?
If so, is there one available or would I have to implement this myself?
Thanks for the help!
Do you suggest using a feature-based algorithm?
Sure.
There are many feature detection algorithms.
I generally choose SIFT because it provides good matching results and its runtime is reasonably fast.
import cv2 as cv
import numpy as np
# read the images
ir = cv.imread('ir.jpg', cv.IMREAD_GRAYSCALE)
rgb = cv.imread('rgb.jpg', cv.IMREAD_COLOR)
descriptor = cv.SIFT.create()
matcher = cv.FlannBasedMatcher()
# get features from images
kps_ir, desc_ir = descriptor.detectAndCompute(ir, mask=None)
gray = cv.cvtColor(rgb, cv.COLOR_BGR2GRAY)
kps_color, desc_color = descriptor.detectAndCompute(gray, mask=None)
# find the corresponding point pairs
if (desc_ir is not None and desc_color is not None and len(desc_ir) >= 2 and len(desc_color) >= 2):
    rawMatch = matcher.knnMatch(desc_color, desc_ir, k=2)
matches = []
# ensure the distance is within a certain ratio of each other (i.e. Lowe's ratio test)
ratio = 0.75
for m in rawMatch:
    if len(m) == 2 and m[0].distance < m[1].distance * ratio:
        matches.append((m[0].trainIdx, m[0].queryIdx))
# convert keypoints to points
pts_ir, pts_color = [], []
for id_ir, id_color in matches:
    pts_ir.append(kps_ir[id_ir].pt)
    pts_color.append(kps_color[id_color].pt)
pts_ir = np.array(pts_ir, dtype=np.float32)
pts_color = np.array(pts_color, dtype=np.float32)
# compute homography
if len(matches) > 4:
    H, status = cv.findHomography(pts_ir, pts_color, cv.RANSAC)
    warped = cv.warpPerspective(ir, H, (rgb.shape[1], rgb.shape[0]))
    warped = cv.cvtColor(warped, cv.COLOR_GRAY2BGR)
# visualize the result
winname = 'result'
cv.namedWindow(winname, cv.WINDOW_KEEPRATIO)
alpha = 5
# res = cv.addWeighted(rgb, 0.5, warped, 0.5, 0)
res = None
def onChange(alpha):
    global rgb, warped, res, winname
    res = cv.addWeighted(rgb, alpha/10, warped, 1 - alpha/10, 0)
    cv.imshow(winname, res)
onChange(alpha)
cv.createTrackbar('alpha', winname, alpha, 10, onChange)
cv.imshow(winname, res)
cv.waitKey()
cv.destroyWindow(winname)
Result (alpha=8)
Edit: It seems like SIFT is not the best option as it fails for some other examples. Example images are in another question.
In this case, I suggest using SURF.
It is a patented algorithm, so it does not come with the latest OpenCV PIP installations.
You can install previous versions of OpenCV or build it from source.
descriptor = cv.xfeatures2d.SURF_create()
Result (alpha=8)
Edit2: It is now clear that the key to achieving this task is to choose the correct feature descriptor. As a final note, I suggest choosing the appropriate motion model. An affine transform fits better than a homography in this case.
H, _ = cv.estimateAffine2D(pts_ir, pts_color)
H = np.vstack((H, [0, 0, 1]))
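With the 2x3 affine estimate you can either pad it to 3x3 as above and keep the earlier cv.warpPerspective call unchanged, or warp directly with cv.warpAffine:
# warp the IR image with the 2x3 affine matrix directly
warped = cv.warpAffine(ir, H[:2], (rgb.shape[1], rgb.shape[0]))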
Affine transform result:
I've been trying to convert stereo images into a depth map with the use of OpenCV, but no matter what I do it seems to come out unreadable.
I was able to get an accurate depth image of the example images that were provided in the OpenCV tutorial, but not of any other image. Even when I attempted to download other premade, calibrated stereo images from online, I get terrible results that are neither accurate nor even close to the quality that I get with the example images.
Here is my main Python script that I use to make the depth map:
import numpy as np
import cv2
from matplotlib import pyplot as plt
imgL = cv2.imread('calimg_L.png',0)
imgR = cv2.imread('calimg_R.png',0)
# imgL = cv2.imread('./images/example_L.png',0)
# imgR = cv2.imread('./images/example_R.png',0)
stereo = cv2.StereoSGBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imgR,imgL)
norm_image = cv2.normalize(disparity, None, alpha = 0, beta = 1, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_32F)
cv2.imwrite("disparityImage.jpg", norm_image)
plt.imshow(norm_image)
plt.show()
where calimg_L.png is a calibrated version of the original image.
Here is the code I use to calibrate my images:
import numpy as np
import cv2
import glob
from matplotlib import pyplot as plt
def createCalibratedImage(inputImage, outputName):
    # termination criteria
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    # prepare object points, like (0,0,0), (1,0,0), (2,0,0) ....,(6,5,0)
    objp = np.zeros((3*3,3), np.float32)
    objp[:,:2] = np.mgrid[0:3,0:3].T.reshape(-1,2)
    # Arrays to store object points and image points from all the images.
    objpoints = [] # 3d point in real world space
    imgpoints = [] # 2d points in image plane.
    # org = cv2.imread('./chess.jpg')
    # orig_cal_img = cv2.resize(org, (384, 288))
    # cv2.imwrite("cal_chess.jpg", orig_cal_img)
    images = glob.glob('./chess_webcam/*.jpg')
    for fname in images:
        print('file in use: ' + fname)
        img = cv2.imread(fname)
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Find the chess board corners
        ret, corners = cv2.findChessboardCorners(gray, (3,3), None)
        # print("doing the thing")
        print('status: ' + str(ret))
        # If found, add object points, image points (after refining them)
        if ret == True:
            # print("found something")
            objpoints.append(objp)
            cv2.cornerSubPix(gray, corners, (11,11), (-1,-1), criteria)
            imgpoints.append(corners)
            # Draw and display the corners
            cv2.drawChessboardCorners(img, (3,3), corners, ret)
            cv2.imshow('img', img)
            cv2.waitKey(500)
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)
    img = inputImage
    h, w = img.shape[:2]
    newcameramtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w,h), 1, (w,h))
    # undistort
    print('undistorting...')
    mapx, mapy = cv2.initUndistortRectifyMap(mtx, dist, None, newcameramtx, (w,h), 5)
    dst = cv2.remap(inputImage, mapx, mapy, cv2.INTER_LINEAR)
    # crop the image
    x, y, w, h = roi
    dst = dst[y:y+h, x:x+w]
    # cv2.imwrite('calibresult.png', dst)
    cv2.imwrite(outputName + '.png', dst)
    cv2.destroyAllWindows()
original_L = cv2.imread('capture_L.jpg')
original_R = cv2.imread('capture_R.jpg')
createCalibratedImage(original_R, "calimg_R")
createCalibratedImage(original_L, "calimg_L")
print("images calibrated and outputed")
This code was taken from the OpenCV tutorial on how to calibrate images and was given at least 16 images of the chessboard, but it was only able to identify the chessboard in about 4-5 of them. The reason I used such a relatively small 3x3 grid is that anything larger left me without any images to use for calibration, due to its inability to find the chessboard.
Here is what I get from an example image (sorry for the weird link, I couldn't find how to upload):
https://ibb.co/DYMcdZc
here is the original:
https://ibb.co/gMkqyXD
https://ibb.co/YQZY40C
This acts as it should, but when I use it with any other image it gives me a mess. For example:
output:
https://ibb.co/kXwgDVn
It looks like just a mess of pixels. To be fair, when you view it as 'gray' with imshow it looks more readable, but it is not very representative of the image's depth. Here are the originals:
https://ibb.co/vqDKGS0
https://ibb.co/f0X1gMB
Even worse, when I take images myself and calibrate them through the chessboard code, the result comes out as just a random mess of white and black pixels; some values go negative and some pixels have impossibly high values.
tl;dr: I can't get any stereo images turned into a depth map even though the example images work just fine. Why is that?
First, I want to say that obtaining a good depth map is not such a simple task, and using basic stereo matching won't always lead to good results. Nevertheless, something better can be achieved.
In order:
Calibration: you should be able to find the checkerboard in more images; 4-5 is a very low number for calibration, and it is very hard to estimate the camera parameters correctly from so few views. What do the images look like? Did you read them as grayscale images? Using a different number of rows and columns (e.g. a 4x3 grid instead of 3x3) also helps to disambiguate the checkerboard pose (with a square grid it can be ambiguous which side is up or right; for example, a 90° rotation would look the same as no rotation).
Rectification: this can be easily checked by looking at the images. Open the two images on two different layers (using GIMP or similar) and check for corresponding points. After rectification, corresponding points should lie on the same horizontal line. Are they really on the same line? If yes, rectification works; otherwise, you need a better calibration. Stereo matching won't work without this step.
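If you prefer to check this in code, here is a small sketch (rect_left and rect_right are placeholder names for your rectified grayscale pair) that stacks the two images side by side and draws horizontal lines; each line should pass through the same features in both halves:
import cv2
import numpy as np
# rect_left / rect_right: your rectified grayscale images (placeholder names)
vis = np.hstack((rect_left, rect_right))
for y in range(0, vis.shape[0], 40):
    cv2.line(vis, (0, y), (vis.shape[1] - 1, y), 255, 1)
cv2.imshow('rectification check', vis)
cv2.waitKey(0)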
Stereo matching: if all the above steps are correct, then you may have a problem with the parameters of the stereo matching. The first thing to check is the disparity range (since you seem to have a different resolution between the example images and your images, you should check and adapt that value). Adjusting the minimum disparity can also help (if you reduce the disparity range, you reduce the chance of errors), as can the block size (15 is quite big; smaller is usually enough), as sketched below.
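To illustrate that last point, here is a minimal sketch of the kind of parameters I mean; the values are only a starting point and need to be tuned for your resolution and baseline:
import cv2
stereo = cv2.StereoSGBM_create(
    minDisparity=0,          # shift if the disparity range does not start at zero
    numDisparities=64,       # must be a multiple of 16; depends on resolution and baseline
    blockSize=7,             # smaller than 15 is usually enough
    speckleWindowSize=100,   # filter out small isolated blobs
    speckleRange=2)
disparity = stereo.compute(imgL, imgR)   # imgL / imgR: the rectified grayscale pair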
From what you say, my guess would be that the problem is in the calibration. You should check the rectified images, and if the problem is there, try to acquire a new dataset (or find a better one online) and calibrate with that. Once you can calibrate and rectify your images correctly, you should get better results.
I see the code is similar to the tutorial here, so I guess that's correct and the main problem is the images. Hope this can help; I can help you more once you test and see where the problem is!
I have a few hundred images (scanned documents), most of them skewed. I wanted to de-skew them using Python.
Here is the code I used:
import numpy as np
import cv2
from skimage.transform import radon
filename = 'path_to_filename'
# Load file, converting to grayscale
img = cv2.imread(filename)
I = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
h, w = I.shape
# If the resolution is high, resize the image to reduce processing time.
if (w > 640):
    I = cv2.resize(I, (640, int((h / w) * 640)))
I = I - np.mean(I) # Demean; make the brightness extend above and below zero
# Do the radon transform
sinogram = radon(I)
# Find the RMS value of each row and find "busiest" rotation,
# where the transform is lined up perfectly with the alternating dark
# text and white lines
r = np.array([np.sqrt(np.mean(np.abs(line) ** 2)) for line in sinogram.transpose()])
rotation = np.argmax(r)
print('Rotation: {:.2f} degrees'.format(90 - rotation))
# Rotate and save with the original resolution
M = cv2.getRotationMatrix2D((w/2,h/2),90 - rotation,1)
dst = cv2.warpAffine(img,M,(w,h))
cv2.imwrite('rotated.jpg', dst)
This code works well with most of the documents, except with some angles: (180 and 0) and (90 and 270) are often detected as the same angle (i.e. it does not distinguish between 0 and 180, or between 90 and 270). So I get a lot of upside-down documents.
Here is an example:
The resulted image that I get is the same as the input image.
Is there any suggestion to detect if an image is upside down using Opencv and Python?
PS: I tried to check the orientation using EXIF data, but it didn't lead to any solution.
EDIT:
It is possible to detect the orientation using Tesseract (pytesseract for Python), but it is only possible when the image contains a lot of characters.
For anyone who may need this:
import cv2
import pytesseract
print(pytesseract.image_to_osd(cv2.imread(file_name)))
If the document contains enough characters, it is possible for Tesseract to detect the orientation. However, when the image has few lines, the orientation angle suggested by Tesseract is usually wrong. So this can not be a 100% solution.
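When the OSD output is usable, the suggested correction angle can be parsed out of the returned text; here is a small sketch (it assumes the OSD text contains the usual "Rotate: <angle>" line):
import re
import cv2
import pytesseract
osd = pytesseract.image_to_osd(cv2.imread(file_name))
# the OSD text contains a line like "Rotate: 180" with the suggested correction
angle = int(re.search(r'Rotate: (\d+)', osd).group(1))
print('suggested correction angle:', angle)
The reported angle is a multiple of 90, so it can be applied with cv2.rotate or fed into the warpAffine-based rotation used above.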
Python3/OpenCV4 script to align scanned documents.
Rotate the document and sum the rows. When the document has 0 and 180 degrees of rotation, there will be a lot of black pixels in the image:
Use a score-keeping method. Score each image for its likeness to a zebra pattern. The image with the best score has the correct rotation. The image you linked to was off by 0.5 degrees. I omitted some functions for readability; the full code can be found here.
# Rotate the image around in a circle
angle = 0
while angle <= 360:
    # Rotate the source image
    img = rotate(src, angle)
    # Crop the center 1/3rd of the image (roi is filled with text)
    h, w = img.shape
    buffer = min(h, w) - int(min(h, w)/1.15)
    roi = img[int(h/2-buffer):int(h/2+buffer), int(w/2-buffer):int(w/2+buffer)]
    # Create background to draw transform on
    bg = np.zeros((buffer*2, buffer*2), np.uint8)
    # Compute the sums of the rows
    row_sums = sum_rows(roi)
    # High score --> Zebra stripes
    score = np.count_nonzero(row_sums)
    scores.append(score)
    # Image has best rotation
    if score <= min(scores):
        # Save the rotated image
        print('found optimal rotation')
        best_rotation = img.copy()
    k = display_data(roi, row_sums, buffer)
    if k == 27: break
    # Increment angle and try again
    angle += .75
cv2.destroyAllWindows()
How to tell if the document is upside down? Fill in the area from the top of the document to the first non-black pixel in the image. Measure the area in yellow. The image that has the smallest area will be the one that is right-side-up:
# Find the area from the top of page to top of image
_, bg = area_to_top_of_text(best_rotation.copy())
right_side_up = sum(sum(bg))
# Flip image and try again
best_rotation_flipped = rotate(best_rotation, 180)
_, bg = area_to_top_of_text(best_rotation_flipped.copy())
upside_down = sum(sum(bg))
# Check which area is larger
if right_side_up < upside_down: aligned_image = best_rotation
else: aligned_image = best_rotation_flipped
# Save aligned image
cv2.imwrite('/home/stephen/Desktop/best_rotation.png', 255-aligned_image)
cv2.destroyAllWindows()
Assuming you did run the angle-correction already on the image, you can try the following to find out if it is flipped:
Project the corrected image to the y-axis, so that you get a 'peak' for each line. Important: There are actually almost always two sub-peaks!
Smooth this projection by convolving with a gaussian in order to get rid of fine structure, noise, etc.
For each peak, check if the stronger sub-peak is on top or at the bottom.
Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.
The peak finding in step 3 is done by finding sections with above average values. The sub-peaks are then found via argmax.
Here's a figure to illustrate the approach, applied to a few lines of your example image:
Blue: Original projection
Orange: smoothed projection
Horizontal line: average of the smoothed projection for the whole image.
here's some code that does this:
import cv2
import numpy as np
# load image, convert to grayscale, threshold it at 127 and invert.
page = cv2.imread('Page.jpg')
page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
# project the page to the side and smooth it with a gaussian
projection = np.sum(page, 1)
gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
gaussian_filter /= np.sum(gaussian_filter)
smooth = np.convolve(projection, gaussian_filter)
# find the pixel values where we expect lines to start and end
mask = smooth > np.average(smooth)
edges = np.convolve(mask, [1, -1])
line_starts = np.where(edges == 1)[0]
line_endings = np.where(edges == -1)[0]
# count lines with peaks on the lower side
lower_peaks = 0
for start, end in zip(line_starts, line_endings):
    line = smooth[start:end]
    if np.argmax(line) < len(line)/2:
        lower_peaks += 1
print(lower_peaks / len(line_starts))
This prints 0.125 for the given image, so it is not oriented correctly and must be flipped.
Note that this approach might break badly if there are images or anything not organized in lines in the image (maybe math or pictures). Another problem would be too few lines, resulting in bad statistics.
Also different fonts might result in different distributions. You can try this on a few images and see if the approach works. I don't have enough data.
You can use the Alyn module. To install it:
pip install alyn
Then, to use it to deskew images (taken from the homepage):
from alyn import Deskew
d = Deskew(
    input_file='path_to_file',
    display_image='preview the image on screen',
    output_file='path_for_deskewed image',
    r_angle='offset_angle_in_degrees_to_control_orientation')
d.run()
Note that Alyn is only for deskewing text.
This is what I get after trying the example given in the OpenCV documentation.
When I tried the same code on a KITTI image pair I get this:
The code I am using right now looks like this; changing the parameters in StereoBM_create did not help much:
import numpy as np
import cv2
from matplotlib import pyplot as plt
imgL = cv2.imread('000002_left.png',0)
imgR = cv2.imread('000002_right.png',0)
stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
#stereo = cv2.StereoBM_create(numDisparities=64, blockSize=17)
disparity = stereo.compute(imgL,imgR)
cv2.imwrite('depth_map.png', disparity)
disp_v2 = cv2.imread('depth_map.png')
disp_v2 = cv2.applyColorMap(disp_v2, cv2.COLORMAP_JET)
plt.imshow(disp_v2)
cv2.imwrite('depth_map_coloured.png', disp_v2)
plt.show()
Question is: How can I make the depth map better?
In my experience, OpenCV's StereoBM doesn't work well with KITTI images, maybe because KITTI scenes are much more complex.
But I achieved good results using this:
https://github.com/ialhashim/DenseDepth
You should adjust the parameters of the stereo matcher in OpenCV.
This is a function inside a class I created. You can see that I adjust some parameters, such as the number of disparities, the minimum disparity, etc.:
def get_stereo_map(self, image_idx):
    left_RGB = self.get_left_RGB(image_idx)  # left RGB image
    right_RGB = self.get_right_RGB(image_idx)  # right RGB image
    # compute depth map from stereo
    stereo = cv2.StereoBM_create()
    stereo.setMinDisparity(0)
    num_disparities = 16*5
    stereo.setNumDisparities(num_disparities)
    stereo.setBlockSize(15)
    stereo.setSpeckleRange(16)
    # stereo.setSpeckleWindowSize(45)
    stereo_depth_map = stereo.compute(
        cv2.cvtColor(np.array(left_RGB), cv2.COLOR_RGB2GRAY),
        cv2.cvtColor(np.array(right_RGB), cv2.COLOR_RGB2GRAY))
    # by equation + divide by 16 to get true disparities
    stereo_depth_map = (self.storage.focal_pix_RGB * self.storage.baseline_m_RGB) \
        / (stereo_depth_map/16)
    stereo_depth_map = DataParser.crop_redundant(stereo_depth_map)
    return stereo_depth_map
For full code refer to my repo: https://github.com/janezlapajne/kitty-stereo-dataset-parser
Ground truth from lidar and the stereo distance map are also included. I hope it helps someone.
I cannot get a correct disparity map from a couple of simple images, as shown below:
LEFT
RIGHT
Disparity
The code:
import cv2
import numpy as np
# frames buffer
frames = []
# image categories
cat_selected = 0
cat_list = ['open']
cat_calib = [np.load('LUMIA_CALIB.npy')]
# load images
def im_load(image, calib):
    frame = cv2.imread(image, 0)
    if calib is not None:
        frame = cv2.undistort(cv2.resize(frame, (640, 480)), *calib[0])
        x, y, w, h = calib[1]
        frame = frame[y : y + h, x : x + w]
    return frame

for idx, im in enumerate(['left', 'right']):
    frames.append(im_load('images/%s/%s.jpg' % (cat_list[cat_selected], im), cat_calib[cat_selected]))
    cv2.namedWindow(im, cv2.cv.CV_WINDOW_NORMAL)
    cv2.imshow(im, frames[idx])
    cv2.imwrite('%s.jpg' % im, frames[idx])

stereo = cv2.StereoBM(1, 16, 15)
disparity = stereo.compute(frames[0], frames[1])
cv2.namedWindow('map', 0)
cv2.imshow('map', cv2.convertScaleAbs(disparity))
cv2.imwrite('disparity.jpg', disparity)
cv2.waitKey(0)
cv2.destroyAllWindows()
Questions
What is wrong with the code and how can I fix it?
What are the effects of the distance between cameras while computing depth?
What is the unit of the members of the disparity matrix's values?
P.S.
The code computes the disparity map for the Tsukuba set of images just fine, though.
I don't know if this is relevant or not, but the distance between the two cameras is 14.85 cm.
Question 1
You seem to have forgotten to rectify your images using the output of a stereo calibration procedure.
Question 2
The distance between the two cameras is called the baseline. In general, the bigger the baseline, the better the depth accuracy at a distance, at the cost of a smaller shared field of view. The opposite is true too.
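A rough way to quantify this: since depth = f*B/d, a one-pixel disparity error corresponds to a depth error of roughly Z^2 / (f*B), so doubling the baseline halves the depth error at a given distance. A tiny numeric sketch (the focal length and distance below are made-up example values):
# depth error caused by a 1-pixel disparity error: dZ ~ Z**2 / (f * B)
f = 700.0                  # focal length in pixels (example value)
Z = 2000.0                 # object distance in mm (example value)
for B in (60.0, 148.5):    # baselines in mm (148.5 mm ~ your 14.85 cm)
    print('baseline %.1f mm -> depth error ~ %.1f mm' % (B, Z ** 2 / (f * B)))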
Question 3
The disparity is probably in pixels if you're using Python (I'm no expert in the Python API of OpenCV). In general, if you want distance values from the triangulation (in real-world units), you'll want to use reprojectImageTo3D or its Python equivalent. You will first need to calibrate your stereo rig using a chessboard (knowing the size of the squares).
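A minimal sketch of that last step, assuming you already have the Q matrix from cv2.stereoRectify (Q is a placeholder name here, and disparity is the matcher output from your code):
import cv2
import numpy as np
# StereoBM returns fixed-point disparities scaled by 16
disp = disparity.astype(np.float32) / 16.0
points_3d = cv2.reprojectImageTo3D(disp, Q)   # per-pixel (X, Y, Z) in calibration units
mask = disp > disp.min()
depths = points_3d[..., 2][mask]              # e.g. centimetres if the square size was given in cm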