Box blur is not any faster than Gaussian blur? - python

I have written some code to apply filters to an image using kernel convolution. Currently, it takes quite a long time, approximately 30 seconds for a 400x400 image. I understand that box blurs are much faster than Gaussian blurs. However, when I change my kernel to a box blur it seems to take as much time as the Gaussian blur. Any ideas?
import cv2
import numpy as np

img = cv2.imread('test.jpg')
img2 = cv2.imread('test.jpg')
height, width, channels = img.shape

GB3 = np.array([[1,2,1], [2,4,2], [1,2,1]])
GB5 = np.array([[1,4,6,4,1], [4,16,24,16,4], [6,24,36,24,6], [4,16,24,16,4], [1,4,6,4,1]])
BB = np.array([[1,1,1], [1,1,1], [1,1,1]])
kernel = BB

#initialise
kernel_sum = 1
filtered_sum_r = 0
filtered_sum_g = 0
filtered_sum_b = 0

for i in range(kernel.shape[0]):
    for j in range(kernel.shape[1]):
        p = kernel[i][j]
        kernel_sum += p

for x in range(1,width-1):
    for y in range(1,height-1):
        for i in range(kernel.shape[0]):
            for j in range(kernel.shape[1]):
                filtered_sum_b += img[y-1+j,x-1+i,0]*kernel[i][j]
                filtered_sum_g += img[y-1+j,x-1+i,1]*kernel[i][j]
                filtered_sum_r += img[y-1+j,x-1+i,2]*kernel[i][j]

        new_pixel_r = filtered_sum_r/kernel_sum
        new_pixel_g = filtered_sum_g/kernel_sum
        new_pixel_b = filtered_sum_b/kernel_sum
        if new_pixel_r > 255:
            new_pixel_r = 255
        elif new_pixel_r < 0:
            new_pixel_r = 0
        if new_pixel_g > 255:
            new_pixel_g = 255
        elif new_pixel_g < 0:
            new_pixel_g = 0
        if new_pixel_b > 255:
            new_pixel_b = 255
        elif new_pixel_b < 0:
            new_pixel_b = 0
        img2[y,x,0] = new_pixel_b
        img2[y,x,1] = new_pixel_g
        img2[y,x,2] = new_pixel_r
        filtered_sum_r = 0
        filtered_sum_g = 0
        filtered_sum_b = 0

#print(kernel_sum)

scale = 2
img_big = cv2.resize(img, (0,0), fx=scale, fy=scale)
img2_big = cv2.resize(img2, (0,0), fx=scale, fy=scale)

cv2.imshow('original', img_big)
cv2.imshow('processed', img2_big)
cv2.waitKey(0)
cv2.destroyAllWindows()

You are using Python loops. That will always be orders of magnitude slower than optimized binary code. Whenever possible, use library functions, e.g. NumPy and OpenCV, or write your critical code as compilable Cython.
Your code's access pattern is also suboptimal. You should move along rows in the inner loop (for y: for x:) because that's how the image is stored. The reason is how your CPU's cache is used: in row-major storage, a cache line contains several pixels of one row, and if you run along columns you only use that cache line once before needing another.
Your code doesn't make use of the property that both types of filter are "separable".
Convolution can also be expressed as an elementwise multiplication in the frequency domain (DFT, multiply, inverse DFT), which is the usual way to perform large convolutions.
Use OpenCV's filter2D function for your convolutions.
As for box blur vs. Gaussian, the only difference is "interesting" weights vs. no weights (all equal). That amounts to a few more multiplications, or not. When the code is optimized, its execution time can be dominated by the time needed to transfer the data from RAM to the CPU. That goes for optimized code, not pure Python loops.
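For illustration, here is a sketch (not from the original answer) of the same blurs done with OpenCV's vectorized functions: filter2D applies a full 2D kernel, and sepFilter2D exploits separability with two 1D passes. The kernel values mirror the question's arrays, normalized so the weights sum to 1.

import cv2
import numpy as np

img = cv2.imread('test.jpg')

# Box and Gaussian kernels, normalized so the weights sum to 1
BB = np.ones((3, 3), np.float32) / 9.0
GB3 = np.array([[1, 2, 1], [2, 4, 2], [1, 2, 1]], np.float32) / 16.0
box_blurred = cv2.filter2D(img, -1, BB)
gauss_blurred = cv2.filter2D(img, -1, GB3)

# Both kernels are separable: two 1D passes give the same result as the 2D kernel
row = np.array([1, 2, 1], np.float32) / 4.0
gauss_separable = cv2.sepFilter2D(img, -1, row, row)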

Related

skeletonization (thinning) of small images not giving expected results - python

I am trying to implement skeletonization of small images, but I am not getting the expected results. I also tried thin() and medial_axis(), but nothing seems to work as expected. I suspect this problem occurs because of the small resolution of the images. Here is the code:
import cv2
from numpy import asarray
import numpy as np

# open image
file = "66.png"
img_grey = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
afterMedian = cv2.medianBlur(img_grey, 3)
thresh = 140

# threshold the image
img_binary = cv2.threshold(afterMedian, thresh, 255, cv2.THRESH_BINARY)[1]

# make binary image
arr = asarray(img_binary)
binaryArr = np.zeros(asarray(img_binary).shape)
for i in range(0, arr.shape[0]):
    for j in range(0, arr.shape[1]):
        if arr[i][j] == 255:
            binaryArr[i][j] = 1
        else:
            binaryArr[i][j] = 0

# perform skeletonization
from skimage.morphology import skeletonize

cv2.imshow("binary arr", binaryArr)
backgroundSkeleton = skeletonize(binaryArr)

# convert to non-binary image
bSkeleton = np.zeros(arr.shape)
for i in range(0, arr.shape[0]):
    for j in range(0, arr.shape[1]):
        if backgroundSkeleton[i][j] == 0:
            bSkeleton[i][j] = 0
        else:
            bSkeleton[i][j] = 255

cv2.imshow("background skeleton", bSkeleton)
cv2.waitKey(0)
The results are:
I would expect something more like this:
This applies to similar shapes also:
Expectation:
Am I doing something wrong? Or will it truly not be possible with such small pictures? I tried skeletonization on bigger images and it worked just fine. Original images:
You could try the skeleton in DIPlib (dip.EuclideanSkeleton):
import numpy as np
import diplib as dip
import cv2
file = "66.png"
img_grey = cv2.imread(file, cv2.IMREAD_GRAYSCALE)
afterMedian = cv2.medianBlur(img_grey, 3)
thresh = 140
bin = afterMedian > thresh
sk = dip.EuclideanSkeleton(bin, endPixelCondition='three neighbors')
dip.viewer.Show(bin)
dip.viewer.Show(sk)
dip.viewer.Spin()
The endPixelCondition input argument can be used to adjust how many branches are preserved or removed. 'three neighbors' is the option that produces the most branches.
The code above produces branches also towards the corners of the image. Using 'two neighbors' prevents that, but produces fewer branches towards the object as well. The other way to prevent it is to set edgeCondition='object', but in this case the ring around the object becomes a square on the image boundary.
To convert the DIPlib image sk back to a NumPy array, do
sk = np.array(sk)
sk is now a Boolean NumPy array (values True and False). To create an array compatible with OpenCV simply cast to np.uint8 and multiply by 255:
sk = np.array(sk, dtype=np.uint8)
sk *= 255
Note that, when dealing with NumPy arrays, you generally don't need to loop over all pixels. In fact, it's worth trying to avoid doing so, as loops in Python are extremely slow.
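For example, the two pixel loops in the question can be replaced by vectorized expressions (a small sketch using the question's variable names):

binaryArr = (img_binary == 255).astype(np.uint8)                    # 0/1 image, no Python loop
bSkeleton = np.where(backgroundSkeleton, 255, 0).astype(np.uint8)   # 0/255 image, no Python loop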
It seems scikit-image is a much better choice than cv2 here.
Since the package defines functions for binary (black-and-white) images, if you are playing with BW images, try its ready-to-use function:
skeletonize
Note: if the process loses image detail, don't upsample the input at first; try the other functions instead. You can also use skimage's morphology functions to enhance detail, in which case your code will work on larger areas of the images too. You could look here.
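A minimal sketch of that route, assuming the same binary image as in the question (img_binary with values 0 and 255):

import numpy as np
from skimage.morphology import skeletonize

binary = img_binary > 0                          # skeletonize wants a boolean (or 0/1) image
skeleton = skeletonize(binary)
skeleton_u8 = skeleton.astype(np.uint8) * 255    # back to 0/255 for display with OpenCV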

Automatic contrast and brightness adjustment of a color photo of a sheet of paper with OpenCV

When photographing a sheet of paper (e.g. with phone camera), I get the following result (left image) (jpg download here). The desired result (processed manually with an image editing software) is on the right:
I would like to process the original image with openCV to get a better brightness/contrast automatically (so that the background is more white).
Assumption: the image has an A4 portrait format (we don't need to perspective-warp it in this topic here), and the sheet of paper is white with possibly text/images in black or colors.
What I've tried so far:
Various adaptive thresholding methods such as Gaussian, OTSU (see OpenCV doc Image Thresholding). It usually works well with OTSU:
ret, gray = cv2.threshold(img, 0, 255, cv2.THRESH_OTSU + cv2.THRESH_BINARY)
but it only works for grayscale images and not directly for color images. Moreover, the output is binary (white or black), which I don't want: I prefer to keep a color non-binary image as output
Histogram equalization
applied on Y (after RGB => YUV transform)
or applied on V (after RGB => HSV transform),
as suggested by this answer (Histogram equalization not working on color image - OpenCV) or this one (OpenCV Python equalizeHist colored image):
img3 = cv2.imread(f)
img_transf = cv2.cvtColor(img3, cv2.COLOR_BGR2YUV)
img_transf[:,:,0] = cv2.equalizeHist(img_transf[:,:,0])
img4 = cv2.cvtColor(img_transf, cv2.COLOR_YUV2BGR)
cv2.imwrite('test.jpg', img4)
or with HSV:
img_transf = cv2.cvtColor(img3, cv2.COLOR_BGR2HSV)
img_transf[:,:,2] = cv2.equalizeHist(img_transf[:,:,2])
img4 = cv2.cvtColor(img_transf, cv2.COLOR_HSV2BGR)
Unfortunately, the result is quite bad since it creates awful micro contrasts locally (?):
I also tried YCbCr instead, and it was similar.
I also tried CLAHE (Contrast Limited Adaptive Histogram Equalization) with various tileGridSize from 1 to 1000:
img3 = cv2.imread(f)
img_transf = cv2.cvtColor(img3, cv2.COLOR_BGR2HSV)
clahe = cv2.createCLAHE(tileGridSize=(100,100))
img_transf[:,:,2] = clahe.apply(img_transf[:,:,2])
img4 = cv2.cvtColor(img_transf, cv2.COLOR_HSV2BGR)
cv2.imwrite('test.jpg', img4)
but the result was equally awful too.
Doing this CLAHE method with LAB color space, as suggested in the question How to apply CLAHE on RGB color images:
import cv2, numpy as np
bgr = cv2.imread('_example.jpg')
lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB)
lab_planes = cv2.split(lab)
clahe = cv2.createCLAHE(clipLimit=2.0,tileGridSize=(100,100))
lab_planes[0] = clahe.apply(lab_planes[0])
lab = cv2.merge(lab_planes)
bgr = cv2.cvtColor(lab, cv2.COLOR_LAB2BGR)
cv2.imwrite('_example111.jpg', bgr)
gave bad result too. Output image:
Doing adaptive thresholding or histogram equalization separately on each channel (R, G, B) is not an option, since it would mess up the color balance, as explained here.
"Contrast stretching" method from scikit-image's tutorial on Histogram Equalization:
the image is rescaled to include all intensities that fall within the 2nd and 98th percentiles
is a little bit better, but still far from the desired result (see image on top of this question).
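For reference, that contrast-stretching step looks roughly like this (a sketch following the scikit-image tutorial; img is the image loaded earlier):

import numpy as np
from skimage import exposure

p2, p98 = np.percentile(img, (2, 98))   # 2nd and 98th percentile intensities
img_rescaled = exposure.rescale_intensity(img, in_range=(p2, p98))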
TL;DR: how to get an automatic brightness/contrast optimization of a color photo of a sheet of paper with OpenCV/Python? What kind of thresholding/histogram equalization/other technique could be used?
Contrast and brightness can be adjusted using alpha (α) and beta (β), respectively. These variables are often called the gain and bias parameters. The expression can be written as g(i,j) = α · f(i,j) + β, where f is the source image and g is the output image.
OpenCV already implements this as cv2.convertScaleAbs() so we can just use this function with user defined alpha and beta values.
import cv2
image = cv2.imread('1.jpg')
alpha = 1.95 # Contrast control (1.0-3.0)
beta = 0 # Brightness control (0-100)
manual_result = cv2.convertScaleAbs(image, alpha=alpha, beta=beta)
cv2.imshow('original', image)
cv2.imshow('manual_result', manual_result)
cv2.waitKey()
But the question was
How to get an automatic brightness/contrast optimization of a color photo?
Essentially the question is how to automatically calculate alpha and beta. To do this, we can look at the histogram of the image. Automatic brightness and contrast optimization calculates alpha and beta so that the output range is [0...255]. We calculate the cumulative distribution to determine where color frequency is less than some threshold value (say 1%) and cut the right and left sides of the histogram. This gives us our minimum and maximum ranges. Here's a visualization of the histogram before (blue) and after clipping (orange). Notice how the more "interesting" sections of the image are more pronounced after clipping.
To calculate alpha, we take the minimum and maximum grayscale values after clipping and divide our desired output range of 255 by their difference:
α = 255 / (maximum_gray - minimum_gray)
To calculate beta, we plug it into the formula where g(i, j)=0 and f(i, j)=minimum_gray
g(i,j) = α * f(i,j) + β
which after solving results in this
β = -minimum_gray * α
For your image we get this
Alpha: 3.75
Beta: -311.25
You may have to adjust the clipping threshold value to refine the results. Here are some example results using a 1% threshold with other images: Before -> After
Automated brightness and contrast code
import cv2
import numpy as np
from matplotlib import pyplot as plt

# Automatic brightness and contrast optimization with optional histogram clipping
def automatic_brightness_and_contrast(image, clip_hist_percent=1):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Calculate grayscale histogram
    hist = cv2.calcHist([gray],[0],None,[256],[0,256])
    hist_size = len(hist)

    # Calculate cumulative distribution from the histogram
    accumulator = []
    accumulator.append(float(hist[0]))
    for index in range(1, hist_size):
        accumulator.append(accumulator[index -1] + float(hist[index]))

    # Locate points to clip
    maximum = accumulator[-1]
    clip_hist_percent *= (maximum/100.0)
    clip_hist_percent /= 2.0

    # Locate left cut
    minimum_gray = 0
    while accumulator[minimum_gray] < clip_hist_percent:
        minimum_gray += 1

    # Locate right cut
    maximum_gray = hist_size -1
    while accumulator[maximum_gray] >= (maximum - clip_hist_percent):
        maximum_gray -= 1

    # Calculate alpha and beta values
    alpha = 255 / (maximum_gray - minimum_gray)
    beta = -minimum_gray * alpha

    '''
    # Calculate new histogram with desired range and show histogram
    new_hist = cv2.calcHist([gray],[0],None,[256],[minimum_gray,maximum_gray])
    plt.plot(hist)
    plt.plot(new_hist)
    plt.xlim([0,256])
    plt.show()
    '''

    auto_result = cv2.convertScaleAbs(image, alpha=alpha, beta=beta)
    return (auto_result, alpha, beta)

image = cv2.imread('1.jpg')
auto_result, alpha, beta = automatic_brightness_and_contrast(image)
print('alpha', alpha)
print('beta', beta)
cv2.imshow('auto_result', auto_result)
cv2.waitKey()
Result image with this code:
Results with other images using a 1% threshold
An alternative version is to add gain and bias to an image using saturation arithmetic instead of OpenCV's cv2.convertScaleAbs(). The built-in method takes an absolute value, which leads to nonsensical results (e.g., a pixel at 44 with alpha = 3 and beta = -210 becomes 78 with OpenCV, when in fact it should become 0).
import cv2
import numpy as np
# from matplotlib import pyplot as plt

def convertScale(img, alpha, beta):
    """Add bias and gain to an image with saturation arithmetics. Unlike
    cv2.convertScaleAbs, it does not take an absolute value, which would lead to
    nonsensical results (e.g., a pixel at 44 with alpha = 3 and beta = -210
    becomes 78 with OpenCV, when in fact it should become 0).
    """
    new_img = img * alpha + beta
    new_img[new_img < 0] = 0
    new_img[new_img > 255] = 255
    return new_img.astype(np.uint8)

# Automatic brightness and contrast optimization with optional histogram clipping
def automatic_brightness_and_contrast(image, clip_hist_percent=25):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Calculate grayscale histogram
    hist = cv2.calcHist([gray],[0],None,[256],[0,256])
    hist_size = len(hist)

    # Calculate cumulative distribution from the histogram
    accumulator = []
    accumulator.append(float(hist[0]))
    for index in range(1, hist_size):
        accumulator.append(accumulator[index -1] + float(hist[index]))

    # Locate points to clip
    maximum = accumulator[-1]
    clip_hist_percent *= (maximum/100.0)
    clip_hist_percent /= 2.0

    # Locate left cut
    minimum_gray = 0
    while accumulator[minimum_gray] < clip_hist_percent:
        minimum_gray += 1

    # Locate right cut
    maximum_gray = hist_size -1
    while accumulator[maximum_gray] >= (maximum - clip_hist_percent):
        maximum_gray -= 1

    # Calculate alpha and beta values
    alpha = 255 / (maximum_gray - minimum_gray)
    beta = -minimum_gray * alpha

    '''
    # Calculate new histogram with desired range and show histogram
    new_hist = cv2.calcHist([gray],[0],None,[256],[minimum_gray,maximum_gray])
    plt.plot(hist)
    plt.plot(new_hist)
    plt.xlim([0,256])
    plt.show()
    '''

    auto_result = convertScale(image, alpha=alpha, beta=beta)
    return (auto_result, alpha, beta)

image = cv2.imread('1.jpg')
auto_result, alpha, beta = automatic_brightness_and_contrast(image)
print('alpha', alpha)
print('beta', beta)
cv2.imshow('auto_result', auto_result)
cv2.imwrite('auto_result.png', auto_result)
cv2.imshow('image', image)
cv2.waitKey()
Robust Locally-Adaptive Soft Binarization! That's what I call it.
I've done similar stuff before, for a slightly different purpose, so this may not be a perfect fit for your needs, but I hope it helps (also, I wrote this code at night for personal use, so it's ugly). In a sense, this code was intended to solve a more general case than yours, where we can have a lot of structured noise on the background (see the demo below).
What does this code do? Given a photo of a sheet of paper, it will whiten it so that it can be perfectly printable. See the example images below.
Teaser: that's how your pages will look after this algorithm (before and after). Notice that even the color marker annotations are gone, so I don't know if this will fit your use case, but the code might be useful:
To get perfectly clean results, you might need to play with the filtering parameters a bit, but as you can see, even with the default parameters it works quite well.
Step 0: Cut the images to fit closely to the page
Let's assume you somehow did this step (it seems like that in the examples you provided). If you need a manual annotate-and-rewarp tool, just pm me! ^^ The results of this step are below (the examples I use here are arguably harder than the one you provided, though they may not exactly match your case):
From this we can immediately see the following problems:
Lighting conditions are not even. This means all simple binarization methods won't work. I tried a lot of solutions available in OpenCV, as well as combinations of them; none of them worked!
A lot of background noise. In my case, I needed to remove the grid of the paper, and also the ink from the other side of the paper that is visible through the thin sheet.
Step 1: Gamma correction
The reasoning behind this step is to balance out the contrast of the whole image (since your image can be slightly overexposed or underexposed depending on the lighting conditions).
This may at first seem like an unnecessary step, but its importance cannot be overstated: in a sense, it normalizes the images to similar distributions of exposure, so that you can choose meaningful hyper-parameters later (e.g. the DELTA parameter in the next section, the noise filtering parameters, parameters for the morphological operations, etc.)
# Somehow I found the value of `gamma=1.2` to be the best in my case
def adjust_gamma(image, gamma=1.2):
    # build a lookup table mapping the pixel values [0, 255] to
    # their adjusted gamma values
    invGamma = 1.0 / gamma
    table = np.array([((i / 255.0) ** invGamma) * 255
        for i in np.arange(0, 256)]).astype("uint8")

    # apply gamma correction using the lookup table
    return cv2.LUT(image, table)
Here are results of gamma adjusting:
You can see that it is a bit more... "balanced" now. Without this step, all parameters that you will pick by hand in later steps will become less robust!
Step 2: Adaptive Binarization to Detect the Text Blobs
In this step, we will adaptively binarize out the text blobs. I will add more comments later, but the idea is basically the following:
We divide the image into blocks of size BLOCK_SIZE. The trick is to choose a size large enough so that you still get a large chunk of text and background (i.e. larger than any symbols that you have), but small enough to not suffer from variations in lighting conditions (i.e. "large, but still local").
Inside each block, we do locally-adaptive binarization: we look at the median value and hypothesize that it is the background (because we chose BLOCK_SIZE large enough for the majority of it to be background). Then we further define DELTA, basically just a threshold for "how far away from the median do we still consider it background?".
The function process_image gets the job done. Moreover, you can modify the preprocess and postprocess functions to fit your needs (however, as you can see from the example above, the algorithm is pretty robust, i.e. it works quite well out of the box without modifying the parameters too much).
The code in this part assumes the foreground to be darker than the background (i.e. ink on paper). But you can easily change that by tweaking the preprocess function: instead of 255 - image, return just image.
# These are probably the only important parameters in the
# whole pipeline (steps 0 through 3).
BLOCK_SIZE = 40
DELTA = 25

# Do the necessary noise cleaning and other stuffs.
# I just do a simple blurring here but you can optionally
# add more stuffs.
def preprocess(image):
    image = cv2.medianBlur(image, 3)
    return 255 - image

# Again, this step is fully optional and you can even keep
# the body empty. I just did some opening. The algorithm is
# pretty robust, so this stuff won't affect much.
def postprocess(image):
    kernel = np.ones((3,3), np.uint8)
    image = cv2.morphologyEx(image, cv2.MORPH_OPEN, kernel)
    return image

# Just a helper function that generates box coordinates
def get_block_index(image_shape, yx, block_size):
    y = np.arange(max(0, yx[0]-block_size), min(image_shape[0], yx[0]+block_size))
    x = np.arange(max(0, yx[1]-block_size), min(image_shape[1], yx[1]+block_size))
    return np.meshgrid(y, x)

# Here is where the trick begins. We perform binarization from the
# median value locally (the img_in is actually a slice of the image).
# Here, the following assumptions are held:
#   1. The majority of pixels in the slice is background
#   2. The median value of the intensity histogram probably
#      belongs to the background. We allow a soft margin DELTA
#      to account for any irregularities.
#   3. We need to keep everything other than the background.
#
# We also do simple morphological operations here. It was just
# something that I empirically found to be "useful", but I assume
# this is pretty robust across different datasets.
def adaptive_median_threshold(img_in):
    med = np.median(img_in)
    img_out = np.zeros_like(img_in)
    img_out[img_in - med < DELTA] = 255
    kernel = np.ones((3,3),np.uint8)
    img_out = 255 - cv2.dilate(255 - img_out,kernel,iterations = 2)
    return img_out

# This function just divides the image into local regions (blocks),
# and applies the `adaptive_median_threshold(...)` function to each
# of the regions.
def block_image_process(image, block_size):
    out_image = np.zeros_like(image)
    for row in range(0, image.shape[0], block_size):
        for col in range(0, image.shape[1], block_size):
            idx = (row, col)
            block_idx = get_block_index(image.shape, idx, block_size)
            out_image[block_idx] = adaptive_median_threshold(image[block_idx])
    return out_image

# This function invokes the whole pipeline of Step 2.
def process_image(img):
    image_in = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    image_in = preprocess(image_in)
    image_out = block_image_process(image_in, BLOCK_SIZE)
    image_out = postprocess(image_out)
    return image_out
The results are nice blobs like this, closely following the ink trace:
Step 3: The "Soft" Part of Binarization
Having blobs that cover the symbols and a little bit more, we can finally do the whitening procedure.
If we look more closely at photos of sheets of paper with text (especially those with handwriting), the transformation from "background" (white paper) to "foreground" (dark ink) is not sharp, but very gradual. Other binarization-based answers in this section propose simple thresholding (even if it is locally adaptive, it is still a threshold), which works okay for printed text, but produces not-so-pretty results with handwriting.
So, the motivation of this section is to preserve that effect of gradual transition from black to white, just as in natural photos of sheets of paper with natural ink. The final purpose is to make it printable.
The main idea is simple: the more the pixel value (after the thresholding above) differs from the local minimum value, the more likely it is to belong to the background. We can express this using a family of sigmoid functions, rescaled to the range of the local block (so that the function is adaptively scaled through the image).
# This is the function used for composing
def sigmoid(x, orig, rad):
    k = np.exp((x - orig) * 5 / rad)
    return k / (k + 1.)

# Here, we combine the local blocks. A bit lengthy, so please
# follow the local comments.
def combine_block(img_in, mask):
    # First, we pre-fill the masked region of img_out to white
    # (i.e. background). The mask is retrieved from previous section.
    img_out = np.zeros_like(img_in)
    img_out[mask == 255] = 255
    fimg_in = img_in.astype(np.float32)

    # Then, we store the foreground (letters written with ink)
    # in the `idx` array. If there are none (i.e. just background),
    # we move on to the next block.
    idx = np.where(mask == 0)
    if idx[0].shape[0] == 0:
        img_out[idx] = img_in[idx]
        return img_out

    # We find the intensity range of our pixels in this local part
    # and clip the image block to that range, locally.
    lo = fimg_in[idx].min()
    hi = fimg_in[idx].max()
    v = fimg_in[idx] - lo
    r = hi - lo

    # Now we use good old OTSU binarization to get a rough estimation
    # of foreground and background regions.
    img_in_idx = img_in[idx]
    ret3,th3 = cv2.threshold(img_in[idx],0,255,cv2.THRESH_BINARY+cv2.THRESH_OTSU)

    # Then we normalize the stuffs and apply sigmoid to gradually
    # combine the stuffs.
    bound_value = np.min(img_in_idx[th3[:, 0] == 255])
    bound_value = (bound_value - lo) / (r + 1e-5)
    f = (v / (r + 1e-5))
    f = sigmoid(f, bound_value + 0.05, 0.2)

    # Finally, we re-normalize the result to the range [0..255]
    img_out[idx] = (255. * f).astype(np.uint8)
    return img_out

# We do the combination routine on local blocks, so that the scaling
# parameters of Sigmoid function can be adjusted to local setting
def combine_block_image_process(image, mask, block_size):
    out_image = np.zeros_like(image)
    for row in range(0, image.shape[0], block_size):
        for col in range(0, image.shape[1], block_size):
            idx = (row, col)
            block_idx = get_block_index(image.shape, idx, block_size)
            out_image[block_idx] = combine_block(
                image[block_idx], mask[block_idx])
    return out_image

# Postprocessing (should be robust even without it, but I recommend
# you to play around a bit and find what works best for your data.
# I just left it blank.
def combine_postprocess(image):
    return image

# The main function of this section. Executes the whole pipeline.
def combine_process(img, mask):
    image_in = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    image_out = combine_block_image_process(image_in, mask, 20)
    image_out = combine_postprocess(image_out)
    return image_out
Some parts are commented out since they are optional. The combine_process function takes the mask from the previous step and executes the whole composition pipeline. You can try to tweak them for your specific data (images). The results are neat:
I will probably add more comments and explanations to the code in this answer, and will upload the whole thing (together with the cropping and warping code) on GitHub.
This method should work well for your application. First you find a threshold value that separates the distribution modes well in the intensity histogram, then rescale the intensity using that value.
from skimage.filters import threshold_yen
from skimage.exposure import rescale_intensity
from skimage.io import imread, imsave
img = imread('mY7ep.jpg')
yen_threshold = threshold_yen(img)
bright = rescale_intensity(img, (0, yen_threshold), (0, 255))
imsave('out.jpg', bright)
I'm using Yen's method here; you can learn more about it on this page.
I think the way to do that is:
1) Extract the chroma (saturation) channel from the HCL colorspace (HCL works better than HSL or HSV). Only colors should have non-zero saturation, so bright and gray shades will be dark.
2) Threshold that result using Otsu thresholding to use as a mask.
3) Convert your input to grayscale and apply local-area (i.e., adaptive) thresholding.
4) Put the mask into the alpha channel of the original and then composite the local-area thresholded result with the original, so that it keeps the colored areas from the original and everywhere else uses the local-area thresholded result.
Sorry, I do not know OpenCV that well, but here are the steps using ImageMagick.
Note that channels are numbered starting with 0. (H=0 or red, C=1 or green, L=2 or blue)
Input:
magick image.jpg -colorspace HCL -channel 1 -separate +channel tmp1.png
magick tmp1.png -auto-threshold otsu tmp2.png
magick image.jpg -colorspace gray -negate -lat 20x20+10% -negate tmp3.png
magick tmp3.png \( image.jpg tmp2.png -alpha off -compose copy_opacity -composite \) -compose over -composite result.png
ADDITION:
Here is Python Wand code, which produces the same output result. It needs Imagemagick 7 and Wand 0.5.5.
#!/bin/python3.7
from wand.image import Image
from wand.display import display
from wand.version import QUANTUM_RANGE
with Image(filename='text.jpg') as img:
    with img.clone() as copied:
        with img.clone() as hcl:
            hcl.transform_colorspace('hcl')
            with hcl.channel_images['green'] as mask:
                mask.auto_threshold(method='otsu')
                copied.composite(mask, left=0, top=0, operator='copy_alpha')
        img.transform_colorspace('gray')
        img.negate()
        img.adaptive_threshold(width=20, height=20, offset=0.1*QUANTUM_RANGE)
        img.negate()
        img.composite(copied, left=0, top=0, operator='over')
        img.save(filename='text_process.jpg')
First we separate the text and the color markings. This can be done in a color space with a color-saturation channel. I instead used a very simple method inspired by this paper: the ratio min(R,G,B) / max(R,G,B) will be near 1 for (light) gray areas and << 1 for colored areas. For dark gray areas we get anything between 0 and 1, but this doesn't matter: either these areas go to the color mask and are then added as-is, or they are not included in the mask and contribute to the output from the binarized text. For black we use the fact that 0/0 becomes 0 when converted to uint8.
The grayscale text image gets locally thresholded to produce a black-and-white image. You can pick your favorite technique from this comparison or that survey. I chose the NICK technique, which copes well with low contrast and is rather robust: a choice of the parameter k between about -0.3 and -0.1 works well for a very wide range of conditions, which is good for automatic processing. For the sample document provided, the chosen technique doesn't play a big role, as it is relatively uniformly illuminated, but in order to cope with non-uniformly illuminated images it should be a local thresholding technique.
In the final step, the color areas are added back to the binarized text image.
So this solution is very similar to fmw42's solution (all credit for the idea goes to him), with the exception of different color-detection and binarization methods.
image = cv2.imread('mY7ep.jpg')
# make mask and inverted mask for colored areas
b,g,r = cv2.split(cv2.blur(image,(5,5)))
np.seterr(divide='ignore', invalid='ignore') # 0/0 --> 0
m = (np.fmin(np.fmin(b, g), r) / np.fmax(np.fmax(b, g), r)) * 255
_,mask_inv = cv2.threshold(np.uint8(m), 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
mask = cv2.bitwise_not(mask_inv)
# local thresholding of grayscale image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
text = cv2.ximgproc.niBlackThreshold(gray, 255, cv2.THRESH_BINARY, 41, -0.1, binarizationMethod=cv2.ximgproc.BINARIZATION_NICK)
# create background (text) and foreground (color markings)
bg = cv2.bitwise_and(text, text, mask = mask_inv)
fg = cv2.bitwise_and(image, image, mask = mask)
out = cv2.add(cv2.cvtColor(bg, cv2.COLOR_GRAY2BGR), fg)
If you don't need the color markings, you can simply binarize the grayscale image:
image = cv2.imread('mY7ep.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
text = cv2.ximgproc.niBlackThreshold(gray, 255, cv2.THRESH_BINARY, 41, -0.3, binarizationMethod=cv2.ximgproc.BINARIZATION_NICK)
This is a C# transpilation (performed via https://github.com/uxmal/pytocs) of nathancy's answer, for the Emgu.CV wrapper library:
/// <summary>
/// <see>https://stackoverflow.com/questions/56905592/automatic-contrast-and-brightness-adjustment-of-a-color-photo-of-a-sheet-of-pape/75455163#75455163</see>
/// </summary>
public static (Mat autoResult, int alpha, int beta) AutomaticBrightnessAndContrast(Mat image, double clipHistPercent = 1)
{
    var gray = new Mat();
    CvInvoke.CvtColor(image, gray, ColorConversion.Bgr2Gray);

    // Calculate grayscale histogram
    var hist = new Mat();
    var grayVector = new VectorOfMat(gray);
    CvInvoke.CalcHist(grayVector, new[] {0}, null, hist, new[] {256}, new[] {0f, 256}, false);
    var histSize = hist.Rows;

    // Calculate cumulative distribution from the histogram
    var accumulator = new List<float> {hist.Get<float>(0, 0)};
    foreach (var index in Enumerable.Range(1, histSize - 1))
        accumulator.Add(accumulator[index - 1] + hist.Get<float>(index, 0));

    // Locate points to clip
    var maximum = accumulator[255];
    clipHistPercent *= maximum / 100.0;
    clipHistPercent /= 2.0;

    // Locate left cut
    var minimumGray = 0;
    while (accumulator[minimumGray] < clipHistPercent)
        minimumGray += 1;

    // Locate right cut
    var maximumGray = histSize - 1;
    while (accumulator[maximumGray] >= maximum - clipHistPercent)
        maximumGray -= 1;

    // Calculate alpha and beta values
    var alpha = 255 / (maximumGray - minimumGray);
    var beta = -minimumGray * alpha;
    var autoResult = new Mat();
    CvInvoke.ConvertScaleAbs(image, autoResult, alpha, beta);
    return (autoResult, alpha, beta);
}

public static class MatExtension
{
    /// <summary>
    /// <see>https://stackoverflow.com/questions/32255440/how-can-i-get-and-set-pixel-values-of-an-emgucv-mat-image/69537504#69537504</see>
    /// </summary>
    public static unsafe T Get<T>(this Mat mat, int row, int col) =>
        new ReadOnlySpan<T>(mat.DataPointer.ToPointer(), mat.Rows * mat.Cols * mat.ElementSize)
            [(row * mat.Cols) + col];
}
If you are using OpenCvSharp, just adapt all the OpenCV invocations with the updated parameters, as in Rotate an image without cropping in OpenCV in C++.
Also note that OpenCvSharp already has a Mat.Set<> method that works the same as mat.at<> in the original OpenCV, so we don't have to copy these methods from How can I get and set pixel values of an EmguCV Mat image?

Optimized skeleton function for opencv with python

So I am using OpenCV on Raspbian (Raspberry Pi 2 Model B). I am doing vision/image processing, obviously, and the Raspberry Pi is what I was given (I would use a computer if I could for this).
I need to run a skeleton function. I found the following implementation:
import cv2
import numpy as np

img = cv2.imread('img.png',0)
size = np.size(img)
skeleton = np.zeros(img.shape,np.uint8)

ret,img = cv2.threshold(img,127,255,0)
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))
finished = False

while(not finished):
    eroded = cv2.erode(img,kernel)
    temp = cv2.dilate(eroded,kernel)
    temp = cv2.subtract(img,temp)
    skeleton = cv2.bitwise_or(skeleton,temp)
    img = eroded.copy()

    zeros = size - cv2.countNonZero(img)
    if zeros==size:
        finished = True

cv2.imshow("skeleton",skeleton)
cv2.waitKey(0)
cv2.destroyAllWindows()
While it runs, it's very, very slow, unsurprisingly (I am doing an FFT and bandpass filtering operation on the image before this, then running the skeleton operation). The other code is slow, but it does complete the operations.
The images are big - I could crop them some, but I don't think it would be enough. I was trying to find an optimized version of this, but so far haven't come up with anything. Any ideas or solutions?
In this answer, I'll focus on improving your implementation, rather than the algorithm. While this won't gain us a significant amount, I think it's still useful to be aware of.
Preparation
Let's begin with some boilerplate -- the necessary imports, a test image, and a few functions to let us compare easily:
from timeit import default_timer as timer
import numpy as np
import cv2

# Create a decent size test image...
img = cv2.imread('cage.png',0)
img = cv2.resize(img, (2048, 2048))
cv2.normalize(img, img, 0, 255, cv2.NORM_MINMAX)

def time_fn(fn, img, iters=1):
    start = timer()
    result = None
    for i in range(iters):
        result = fn(img)
    end = timer()
    return (result,((end - start) / iters) * 1000)

def run_test(fn, img, i):
    res, t = time_fn(fn, img, 4)
    cv2.imwrite("skeleton_%d.png" % i, res[0])

    print "Variant %d" % i
    print "Input size = (%d, %d)" % img.shape[:2]
    print "Ran %d iterations to find skeleton." % res[1]
    print "Avg. find_skeleton time = %0.4f s." % (t/1000)
Variant 1 (Original)
Let's turn your implementation into a function, and remove a few unnecessary bits. Out of curiosity, let's track the number of iterations needed for the skeletonization.
def find_skeleton1(img):
    skeleton = np.zeros(img.shape,np.uint8)
    _,thresh = cv2.threshold(img,127,255,0)

    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))

    iters = 0
    while(True):
        eroded = cv2.erode(thresh, kernel)
        temp = cv2.dilate(eroded, kernel)
        temp = cv2.subtract(thresh, temp)
        skeleton = cv2.bitwise_or(skeleton, temp)
        thresh = eroded.copy()

        iters += 1
        if cv2.countNonZero(thresh) == 0:
            return (skeleton,iters)
And let's see how it performs to set our baseline.
>>> run_test(find_skeleton1, img, 1)
Variant 1
Input size = (2048, 2048)
Ran 338 iterations to find skeleton.
Avg. find_skeleton time = 2.7969 s.
Variant 2
The first improvement we can make is to minimize the number of allocations of new array objects, and reuse as much as possible. We can create a few more temporary arrays (like skeleton), and use the dst parameter of the OpenCV functions in the loop ignoring the return value. Since we provide a destination of correct shape and data type, the existing array gets reused.
def find_skeleton2(img):
    skeleton = np.zeros(img.shape,np.uint8)
    eroded = np.zeros(img.shape,np.uint8)
    temp = np.zeros(img.shape,np.uint8)

    _,thresh = cv2.threshold(img,127,255,0)

    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))

    iters = 0
    while(True):
        cv2.erode(thresh, kernel, eroded)
        cv2.dilate(eroded, kernel, temp)
        cv2.subtract(thresh, temp, temp)
        cv2.bitwise_or(skeleton, temp, skeleton)
        thresh = eroded.copy()

        iters += 1
        if cv2.countNonZero(thresh) == 0:
            return (skeleton,iters)
Let's try this out, and check that the results are the same:
>>> print np.array_equal(find_skeleton1(img)[0], find_skeleton2(img)[0])
True
>>> run_test(find_skeleton2, img, 2)
Variant 2
Input size = (2048, 2048)
Ran 338 iterations to find skeleton.
Avg. find_skeleton time = 1.4356 s.
Variant 3
The next step is to get rid of unnecessary copies -- there's one that's very obvious: thresh = eroded.copy(). Notice that in the following iteration, we immediately overwrite the contents of eroded. Hence, we don't really care what it contains, as long as it's the correct shape and data type. They are, so this means that rather than performing a copy, we can just swap the two objects.
def find_skeleton3(img):
    skeleton = np.zeros(img.shape,np.uint8)
    eroded = np.zeros(img.shape,np.uint8)
    temp = np.zeros(img.shape,np.uint8)

    _,thresh = cv2.threshold(img,127,255,0)

    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS,(3,3))

    iters = 0
    while(True):
        cv2.erode(thresh, kernel, eroded)
        cv2.dilate(eroded, kernel, temp)
        cv2.subtract(thresh, temp, temp)
        cv2.bitwise_or(skeleton, temp, skeleton)
        thresh, eroded = eroded, thresh # Swap instead of copy

        iters += 1
        if cv2.countNonZero(thresh) == 0:
            return (skeleton,iters)
Again, let's verify the results match and do some timing.
>>> print np.array_equal(find_skeleton1(img)[0], find_skeleton3(img)[0])
True
>>> run_test(find_skeleton3, img, 3)
Variant 3
Input size = (2048, 2048)
Ran 338 iterations to find skeleton.
Avg. find_skeleton time = 0.9839 s.
A few simple changes got the timing down to ~35% of the original. Of course, it still does hundreds of iterations processing the entire image. The next step would be to look into ways to reduce the amount of work -- in the later iterations, significant areas of the working image are black and don't contribute anything to the skeleton (a rough sketch of that idea follows below).
NB: Measurements done on i7-4930K. I don't have a raspberry, feel free to add timings from yours, so we see what sort of effect it has.
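As a hedged sketch of that last idea (my addition, not part of the original answer, and untimed): shrink the working area each iteration to the bounding box of the remaining foreground, so later iterations stop processing large all-black regions. It assumes the same inputs as the variants above.

def find_skeleton4(img):
    # Sketch only: crop the processing to the bounding box of the remaining pixels.
    skeleton = np.zeros(img.shape, np.uint8)
    _, thresh = cv2.threshold(img, 127, 255, 0)
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))

    iters = 0
    while True:
        nz = cv2.findNonZero(thresh)
        if nz is None:                      # nothing left to erode
            return (skeleton, iters)
        x, y, w, h = cv2.boundingRect(nz)
        # Pad by one pixel so the 3x3 kernel sees the surrounding zeros.
        y0, y1 = max(y - 1, 0), min(y + h + 1, thresh.shape[0])
        x0, x1 = max(x - 1, 0), min(x + w + 1, thresh.shape[1])
        roi = thresh[y0:y1, x0:x1]

        eroded = cv2.erode(roi, kernel)
        temp = cv2.dilate(eroded, kernel)
        temp = cv2.subtract(roi, temp)
        skeleton[y0:y1, x0:x1] = cv2.bitwise_or(skeleton[y0:y1, x0:x1], temp)
        thresh[y0:y1, x0:x1] = eroded

        iters += 1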

Faster method for adjusting PIL pixel values

I'm writing a script to chroma key (green screen) and composite some videos using Python and PIL (pillow). I can key the 720p images, but there's some leftover green spill. Understandable, but I'm writing a routine to remove that spill... however, I'm struggling with how long it's taking. I could probably get better speeds using numpy tricks, but I'm not that familiar with it. Any ideas?
Here's my despill routine. It takes a PIL image and a sensitivity number, but I've been leaving that at 1 so far... it's been working well. I'm coming in at just over 4 seconds per 720p frame to remove this spill. For comparison, the chroma key routine runs in about 2 seconds per frame.
def despill(img, sensitivity=1):
    """
    Blue limits green.
    """
    start = time.time()
    print '\t[*] Starting despill'
    width, height = img.size
    num_channels = len(img.getbands())
    out = Image.new("RGBA", img.size, color=0)
    for j in range(height):
        for i in range(width):
            #r,g,b,a = data[j,i]
            r,g,b,a = img.getpixel((i,j))
            if g > (b*sensitivity):
                out_g = (b*sensitivity)
            else:
                out_g = g
            # end if
            out.putpixel((i,j), (r,out_g,b,a))
        # end for
    # end for
    out.show()
    print '\t[+] done.'
    print '\t[!] Took: %0.1f seconds' % (time.time()-start)
    exit()
    return out
# end despill
Instead of putpixel, I tried writing the output pixel values to a numpy array and then converting the array to a PIL image, but that was averaging just over 5 seconds... so this was faster somehow. I know putpixel isn't the snappiest option, but I'm at a loss...
putpixel is slow, and loops like that are even slower, since they are run by the Python interpreter, which is slow as hell. The usual solution is to convert the image to a numpy array right away and solve the problem with vectorized operations on it, which run in heavily optimized C code. In your case I would do something like:
arr = np.array(img)
g = arr[:,:,1]
bs = arr[:,:,2]*sensitivity
cond = g>bs
arr[:,:,1] = cond*bs + (~cond)*g
out = Image.fromarray(arr)
(it may not be correct and I'm sure it can be optimized way better, this is just a sketch)
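A slightly different sketch of the same idea (my addition, not the answerer's): np.minimum expresses the "green is limited by blue" rule directly, and doing the math in float avoids uint8 overflow if sensitivity is ever above 1. It assumes an RGBA input, as in the original routine.

import numpy as np
from PIL import Image

def despill_np(img, sensitivity=1.0):
    arr = np.array(img).astype(np.float32)
    # Clamp the green channel to blue * sensitivity, all pixels at once
    arr[..., 1] = np.minimum(arr[..., 1], arr[..., 2] * sensitivity)
    return Image.fromarray(arr.astype(np.uint8), 'RGBA')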

Intensity normalization of image using Python+PIL - Speed issues

I'm working on a little problem in my spare time involving analysis of some images obtained through a microscope. It is a wafer with some stuff here and there, and ultimately I want to make a program to detect when certain materials show up.
Anyway, the first step is to normalize the intensity across the image, since the lens does not give uniform lighting. Currently I use an image with nothing on it, only the substrate, as a background, or reference, image. I find the maximum of the three (intensity) values for RGB.
from PIL import Image
from PIL import ImageDraw

rmax = 0;gmax = 0;bmax = 0;rmin = 300;gmin = 300;bmin = 300

im_old = Image.open("test_image.png")
im_back = Image.open("background.png")

maxx = im_old.size[0] #Import the size of the image
maxy = im_old.size[1]

im_new = Image.new("RGB", (maxx,maxy))

pixback = im_back.load()
for x in range(maxx):
    for y in range(maxy):
        if pixback[x,y][0] > rmax:
            rmax = pixback[x,y][0]
        if pixback[x,y][1] > gmax:
            gmax = pixback[x,y][1]
        if pixback[x,y][2] > bmax:
            bmax = pixback[x,y][2]

pixnew = im_new.load()
pixold = im_old.load()
for x in range(maxx):
    for y in range(maxy):
        r = float(pixold[x,y][0]) / ( float(pixback[x,y][0])*rmax )
        g = float(pixold[x,y][1]) / ( float(pixback[x,y][1])*gmax )
        b = float(pixold[x,y][2]) / ( float(pixback[x,y][2])*bmax )
        pixnew[x,y] = (r,g,b)
The first part of the code determines the maximum intensity of the RED, GREEN and BLUE channels, pixel by pixel, of the background image, but needs only be done once.
The second part takes the "real" image (with stuff on it) and normalizes the RED, GREEN and BLUE channels, pixel by pixel, according to the background. This takes some time, 5-10 seconds for a 1280x960 image, which is way too slow if I need to do this to several images.
What can I do to improve the speed? I thought of moving all the images to numpy arrays, but I can't seem to find a fast way to do that for RGB images.
I'd rather not move away from python, since my C++ is quite low-level, and getting a working FORTRAN code would probably take longer than I could ever save in terms of speed :P
import numpy as np
from PIL import Image

def normalize(arr):
    """
    Linear normalization
    http://en.wikipedia.org/wiki/Normalization_%28image_processing%29
    """
    arr = arr.astype('float')
    # Do not touch the alpha channel
    for i in range(3):
        minval = arr[...,i].min()
        maxval = arr[...,i].max()
        if minval != maxval:
            arr[...,i] -= minval
            arr[...,i] *= (255.0/(maxval-minval))
    return arr

def demo_normalize():
    img = Image.open(FILENAME).convert('RGBA')
    arr = np.array(img)
    new_img = Image.fromarray(normalize(arr).astype('uint8'),'RGBA')
    new_img.save('/tmp/normalized.png')
See http://docs.scipy.org/doc/scipy/reference/generated/scipy.misc.fromimage.html#scipy.misc.fromimage
You can say
databack = scipy.misc.fromimage(pixback)
rmax = numpy.max(databack[:,:,0])
gmax = numpy.max(databack[:,:,1])
bmax = numpy.max(databack[:,:,2])
which should be much faster than looping over all (r,g,b) triplets of your image.
Then you can do
dataold = scipy.misc.fromimage(pixold)
r = dataold[:,:,0] / ( databack[:,:,0] * rmax )
g = dataold[:,:,1] / ( databack[:,:,1] * gmax )
b = dataold[:,:,2] / ( databack[:,:,2] * bmax )
datanew = numpy.array((r,g,b))
imnew = scipy.misc.toimage(datanew)
The code is not tested, but should work somehow with minor modifications.
This is partially from FolksTalk webpage:
from PIL import Image
import numpy as np
# Read image file
in_file = "my_image.png"
# convert('RGB') for PNG file type
image = Image.open(in_file).convert('RGB')
pixels = np.asarray(image)
# Convert from integers to floats
pixels = pixels.astype('float32')
# Normalize to the range 0-1
pixels /= 255.0
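Putting those pieces together for the question's actual task (dividing by the background image), a minimal NumPy sketch might look like the following. The file names follow the question, the per-pixel formula is kept exactly as in the original loop, and the final rescale to 0-255 is my addition (the original stored raw floats into an 8-bit RGB image); it also assumes the background has no zero-valued pixels.

import numpy as np
from PIL import Image

im_old = np.asarray(Image.open("test_image.png"), dtype=np.float64)
im_back = np.asarray(Image.open("background.png"), dtype=np.float64)

# Per-channel maxima of the background (the rmax/gmax/bmax loop, vectorized)
channel_max = im_back.max(axis=(0, 1))

# The question's per-pixel formula, applied to all pixels at once
im_new = im_old / (im_back * channel_max)

# Rescale to 0-255 so the result can be stored as an 8-bit image
im_new = (255 * im_new / im_new.max()).astype(np.uint8)
Image.fromarray(im_new, "RGB").save("normalized.png")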
